Google as a company has always supported open-source software and open data, at least in its public stance. This has become something of a trend among big companies, with both Google and Microsoft making prominent contributions to the open-source scene. In Google’s own words: “Google is also a major contributor to open-source software. Key examples of this include Android, our smartphone operating system, Chromium, the code base for our Chrome browser (now also powering many competitors), and TensorFlow, our machine learning system. Google’s release of Kubernetes changed cloud hosting forever, and has enabled innovation and competition across the cloud industry. Google is also the largest contributor of open source code to GitHub, a shared repository for software development. In 2017, Googlers made more than 250,000 changes to tens of thousands of projects on GitHub alone.”
In most cases, this is less about generosity and more about benefiting from free development and then profiting from widespread adoption. Regardless, these contributions have helped thousands of researchers and have helped standardize how software is developed, which should be celebrated. In a recent blog post, Google highlighted its contributions to open data and open-source software.
With the advent of real-time tracking and the development of driverless cars, a lot of research is being done in computer vision, and Google is one of the companies at the forefront of visual tech.
Our commitment to open source and open data has led us to share datasets, services and software with everyone. For example, Google released the Open Images dataset of 36.5 million images containing nearly 20,000 categories of human-labeled objects. With this data, computer vision researchers can train image recognition systems. Similarly, the millions of annotated videos in the YouTube-8M collection can be used to train video recognition.
– Hal Varian
Chief Economist, Google
Google is also sitting on a great deal of data that can advance NLP research and help computers better understand human speech. In the blog post, Google highlighted the language resources it has shared, stating, “With respect to language processing, we’ve shared the Natural Questions database, which contains 307,373 human-generated questions and answers. We’ve also made available the Trillion Word Corpus, which is based on words used on public web pages, and the Ngram Viewer, that can be used to explore the more than 25 million books in Google Books. These collections can be used for statistical machine translation, speech recognition, spelling correction, entity detection, information extraction and other language research.”
The search engine is one of Google’s core businesses, handling over 63,000 queries every second. This data is very important to the company, which analyzes it for targeted advertising. Still, some aggregate insights are made public through the Google Trends portal.
“Google also offers Google Trends, a free service that enables anyone to see and download aggregate search activity since 2004 for Google Search, Image Search, News Search, Shopping and YouTube. You can get search information for countries, regions, metro areas and cities on a monthly, weekly, daily and even hourly basis. The Trends data is widely used by researchers in fields as varied as medicine and economics. According to Google Scholar, there are more than 21,000 research papers that cite Trends as a data source.”
Why Work On Open Source Projects?
I touched on this briefly at the beginning of the article. A new piece of software can implement a great idea and innovate in its space, but that doesn’t stop others from implementing similar ideas and then improving on them. Many companies have learned this the hard way; Windows Phone is one example. It was a resounding failure for many reasons, but its closed environment and tightly controlled licensing were a big part of it. Google learned the importance of open source early on: Hadoop and HDFS are open-source implementations of the ideas behind MapReduce and the Google File System, which Google had described in published papers but kept in-house. In short, the decision to make IP open source is a strategic one.
In its blog post, Google delves into a few other reasons, stating, “First and foremost, our primary mission is ‘to organize the world’s information and make it universally accessible and useful.’ Certainly one obvious way to make information universally accessible and useful is to give it away!”
The post also explains why some data can’t be released, stating, “Of course, we can’t release all the data we use in our business. We need to protect user privacy, maintain confidentiality for business customers, and protect Google’s own intellectual property. But, subject to such considerations, we generally try to make our data as ‘universally accessible and useful’ as possible.”