Google has announced the availability of multiple datasets comprising diverse but limited numbers of natural images. The search giant is confident the publicly available data will accelerate progress in Machine Learning and Artificial Intelligence while reducing the time taken to train AI models on a minimal amount of data. Google is calling the new initiative ‘Meta-Dataset’, and it is intended to help AI models ‘learn’ from less data. The company’s few-shot approach is designed so that models learn new classes from only a few representative images.
Understanding the need to train AI and Machine Learning models quickly and with less data, Google has launched ‘Meta-Dataset’, a benchmark collection of image datasets that should help reduce the amount of data needed to improve the accuracy of algorithms. The company claims that, using few-shot image classification techniques, AI and ML models can gain the same insights from far fewer representative images.
Google AI Announces Meta-Dataset: A Dataset of Datasets For Few-Shot Learning:
Deep Learning for AI and Machine Learning has been growing rapidly for quite some time. However, the core requirement is the availability of high-quality data, and in large quantities. Large amounts of manually annotated training data are often difficult to procure and can sometimes be unreliable as well. Recognizing the drawbacks of depending on large datasets, Google has announced the availability of Meta-Dataset, a collection of datasets for few-shot learning.
Through “Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples” (presented at ICLR 2020), Google has proposed a large-scale and diverse benchmark for measuring the competence of different image classification models in a realistic and challenging few-shot setting, offering a framework in which one can investigate several important aspects of few-shot classification. Essentially, Google is offering 10 publicly available, free-to-use datasets of natural images. These include ImageNet, CUB-200-2011, Fungi, and datasets of handwritten characters and doodles. The code is public and includes a notebook that demonstrates how Meta-Dataset can be used in TensorFlow and PyTorch.
Few-shot classification goes beyond standard deep learning training: it demands generalization to entirely new classes at test time. In other words, the classes used during testing were never seen in training. In few-shot classification, the training set contains classes that are entirely disjoint from those that will appear at test time. Each test task contains a support set of a few labeled images from which the model can learn about the new classes, and a disjoint query set of examples that the model is then asked to classify.
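To make the support/query structure concrete, here is a minimal, self-contained sketch of a single few-shot test task (a toy illustration, not from Google’s codebase): a 5-way, 3-shot episode where simple nearest-centroid classification on stand-in embeddings is used to label a disjoint query set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "embeddings": 5 novel classes with 3 labeled support images each
# (a 5-way, 3-shot task), plus a disjoint query set of 10 images.
n_way, n_shot, n_query, dim = 5, 3, 10, 16

# Random class centers stand in for the true embedding of each unseen class.
centers = rng.normal(size=(n_way, dim))
support = centers[:, None, :] + 0.1 * rng.normal(size=(n_way, n_shot, dim))
query_labels = rng.integers(0, n_way, size=n_query)
query = centers[query_labels] + 0.1 * rng.normal(size=(n_query, dim))

# A simple few-shot classifier: average the support embeddings into one
# prototype per class, then assign each query to the nearest prototype.
prototypes = support.mean(axis=1)                               # (n_way, dim)
dists = ((query[:, None, :] - prototypes[None]) ** 2).sum(-1)   # (n_query, n_way)
preds = dists.argmin(axis=1)
accuracy = (preds == query_labels).mean()
```

Because the model only ever sees the few support images of each new class, the quality of the embedding, not the amount of labeled data, determines how well the query set is classified.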
Meta-Dataset adds a further, larger-scale component: it studies generalization to entirely new datasets, from which no images of any class were seen in training. This is in addition to the tough generalization challenge to new classes that is inherent in the few-shot setup.
How Does Meta-Dataset Help Deep Learning For AI And Machine Learning Models?
Meta-Dataset represents the largest-scale organized benchmark for cross-dataset, few-shot image classification to date. It also introduces a sampling algorithm for generating tasks of varying characteristics and difficulty, by varying the number of classes in each task, the number of available examples per class, introducing class imbalances, and, for some datasets, varying the degree of similarity between the classes of each task.
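As an illustration of this kind of variable-way, variable-shot sampling, the sketch below draws episodes with a random number of classes and, independently per class, a random number of support examples, which naturally produces class imbalance. It is a hypothetical simplification, not Google’s actual episode generator.

```python
import random

def sample_episode(class_pool, rng, max_way=10, max_shot=5):
    """Sample one few-shot episode of varying characteristics.

    class_pool maps class name -> list of example ids. Each episode draws
    a random number of classes (the "way") and a random per-class number
    of support examples (the "shot"); the query set is disjoint from the
    support set within each class.
    """
    n_way = rng.randint(2, min(max_way, len(class_pool)))
    classes = rng.sample(sorted(class_pool), n_way)
    support, query = {}, {}
    for c in classes:
        examples = class_pool[c][:]
        rng.shuffle(examples)
        # Per-class shot count varies, creating class imbalance.
        n_shot = rng.randint(1, min(max_shot, len(examples) - 1))
        support[c] = examples[:n_shot]
        query[c] = examples[n_shot:n_shot + 2]   # small disjoint query set
    return support, query

rng = random.Random(0)
pool = {f"class_{i}": [f"img_{i}_{j}" for j in range(10)] for i in range(20)}
support, query = sample_episode(pool, rng)
```

Varying the difficulty of tasks in this way forces a model to cope with imbalanced, realistically messy episodes rather than fixed N-way, K-shot problems.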
Meta-Dataset introduces new challenges for few-shot classification. Google’s research is still preliminary and there’s a lot of ground to cover, but the search giant reports that researchers are already achieving success on the benchmark. Notable examples include cleverly designed task conditioning, more sophisticated hyperparameter tuning, a ‘meta-baseline’ that combines the benefits of pre-training and meta-learning, and using feature selection to specialize a universal representation for each task.
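The ‘meta-baseline’ idea of reusing a pre-trained embedding for few-shot tasks can be sketched roughly as follows. This is an illustrative nearest-centroid version; the `embed` function and the data are toy stand-ins, not Google’s implementation.

```python
import numpy as np

def cosine_centroid_predict(embed, support_x, support_y, query_x):
    """Classify queries by cosine similarity to per-class centroids
    computed in the embedding space of a (pre-trained) network `embed`."""
    z_s, z_q = embed(support_x), embed(query_x)
    classes = np.unique(support_y)
    centroids = np.stack([z_s[support_y == c].mean(axis=0) for c in classes])
    # Normalize so the dot product below is cosine similarity.
    z_q = z_q / np.linalg.norm(z_q, axis=1, keepdims=True)
    centroids = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = z_q @ centroids.T
    return classes[sims.argmax(axis=1)]

# Toy usage: the identity function stands in for a pre-trained embedding.
embed = lambda x: x
support_x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_y = np.array([0, 0, 1, 1])
query_x = np.array([[0.95, 0.05], [0.05, 0.95]])
preds = cosine_centroid_predict(embed, support_x, support_y, query_x)
```

The appeal of this style of baseline is that all the heavy lifting happens once, during ordinary pre-training; adapting to a new few-shot task then reduces to averaging a handful of support embeddings.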