Three New USE Multilingual Modules Are Coming To TensorFlow

Google is one of the pioneers of AI research, and many of its projects have turned heads. AlphaZero, from Google's DeepMind team, was a breakthrough because the program taught itself complicated games without human training or intervention. Google has also done excellent work in natural language processing (NLP), which is one of the reasons Google Assistant is so effective at understanding and processing human speech.

Google recently announced the release of three new multilingual Universal Sentence Encoder (USE) modules for TensorFlow, expanding the family of models available for retrieving semantically similar text.

The first two modules provide multilingual models for retrieving semantically similar text, one optimized for retrieval performance and the other for speed and lower memory usage. The third is specialized for question-answer retrieval in sixteen languages (USE-QA) and represents an entirely new application of USE. Per Google, "All three multilingual modules are trained using a multi-task dual-encoder framework, similar to the original USE model for English, while using techniques we developed for improving the dual-encoder with additive margin softmax approach. They are designed not only to maintain good transfer learning performance, but to perform well on semantic retrieval tasks."
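As a concrete illustration, here is a minimal sketch of loading one of the multilingual modules from TensorFlow Hub. The module URL and version number are assumptions; check the module pages on tfhub.dev for the current ones.

```python
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers the custom ops the multilingual modules need

# Module URL/version is illustrative; see tfhub.dev for the current release.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

# One encoder maps sentences from different languages into a shared vector space.
embeddings = embed(["Hello, world!", "Hola, mundo!"])
print(embeddings.shape)  # (2, 512) -- one 512-dimensional vector per sentence
```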

Language processing in software has come a long way, from basic syntax-tree parsing to large vector-association models. Understanding context in text is one of the biggest problems in NLP, and the Universal Sentence Encoder tackles it by converting text into high-dimensional vectors, which makes tasks such as text ranking and semantic similarity comparison easier.
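In practice, comparing meaning reduces to comparing those vectors. Below is a hedged sketch, using the same assumed module URL as above: because USE embeddings are approximately unit-length, the inner product behaves like cosine similarity.

```python
import numpy as np
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

# Embed a paraphrase pair and an unrelated sentence, then compare them.
sentences = ["How old are you?", "What is your age?", "The weather is nice today."]
vectors = embed(sentences).numpy()

# Pairwise similarity matrix: the paraphrase pair should score far higher
# than either sentence does against the unrelated one.
similarity = np.inner(vectors, vectors)
print(np.round(similarity, 2))
```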

[Figure: USE model architecture. Source: Google AI Blog]

All three modules are built on a semantic retrieval architecture, which splits the encoding of questions and answers into separate neural networks so that billions of potential answers can be searched within milliseconds. In other words, this approach enables better indexing of data.

The additive margin softmax approach mentioned above modifies the standard softmax function, which converts a vector of raw scores into a probability distribution by exponentiating each element and dividing by the sum of the exponentials. The additive-margin variant subtracts a fixed margin from the score of each correct pair during training, forcing matching sentences to be separated from non-matching ones by at least that margin.
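To make the idea concrete, here is a small NumPy sketch of plain softmax and the additive-margin variant. The margin value and function shapes are illustrative, not Google's training code.

```python
import numpy as np

def softmax(scores):
    # Shift by the max for numerical stability, exponentiate, then normalize.
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

def additive_margin_softmax(scores, positive_idx, margin=0.3):
    # Sketch of the additive-margin idea: penalize the correct pair's score
    # by a fixed margin, so training must separate it from the negatives
    # by at least that amount. The margin value here is arbitrary.
    adjusted = scores.astype(float).copy()
    adjusted[positive_idx] -= margin
    return softmax(adjusted)

scores = np.array([0.9, 0.2, 0.1])  # similarity of one query to three candidates
print(softmax(scores))
print(additive_margin_softmax(scores, positive_idx=0))
```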

Semantic retrieval architecture

From Google's announcement: "The three new modules are all built on semantic retrieval architectures, which typically split the encoding of questions and answers into separate neural networks, which makes it possible to search among billions of potential answers within milliseconds. The key to using dual encoders for efficient semantic retrieval is to pre-encode all candidate answers to expected input queries and store them in a vector database that is optimized for solving the nearest neighbor problem, which allows a large number of candidates to be searched quickly with good precision and recall."
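The dual-encoder pattern looks roughly like the sketch below. The module URL and signature names are assumptions based on the USE-QA module's documentation; verify them against tfhub.dev before relying on them.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  (registers required ops)

# Assumed USE-QA module URL; check tfhub.dev for the current version.
module = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-qa/3")

# Step 1: pre-encode every candidate answer once, offline.
answers = ["TensorFlow Hub hosts reusable, pretrained model modules.",
           "Paris is the capital of France."]
contexts = [""] * len(answers)  # optional surrounding text for each answer
answer_vecs = module.signatures["response_encoder"](
    input=tf.constant(answers), context=tf.constant(contexts))["outputs"].numpy()

# Step 2: at query time, encode the question and find its nearest answer vector.
query_vec = module.signatures["question_encoder"](
    tf.constant(["Where can I find pretrained models?"]))["outputs"].numpy()
best = int(np.argmax(np.inner(query_vec, answer_vecs)))
print(answers[best])
```

In a real system, the brute-force inner-product scan would be replaced by an approximate nearest-neighbor index over the pre-encoded vectors, which is what makes searching billions of candidates in milliseconds feasible.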

You can download these modules from TensorFlow Hub. For further reading, refer to the full GoogleAI blog post.
