Google’s Real-time Hand Tracking Algorithm Uses Smartphone To Improve Sign Language Recognition

Sign language is used by millions of people around the world. Researchers have long worked on technologies that can recognize its gestures and automatically convert them into written or spoken language. So far, however, such projects have had limited success in terms of accuracy.

Google has recently developed an algorithm for real-time hand tracking. The system uses machine learning to create a map of the hand from the feed of a camera, such as the one on a smartphone. Most existing systems fail to capture quick hand movements accurately, and Google has specifically addressed this problem in its research. Interestingly, the researchers have reduced the amount of data that the algorithms need to process compared with earlier approaches.

How Does The Real-time Hand Tracking Work?

Most existing projects translate sign language by detecting the size and position of the whole hand. In this research, the researchers have eliminated the need to handle rectangular shapes of different sizes. Instead, Google’s system only detects the palm, which fits a square shape. The fingers are then analyzed in a separate step.
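To see why square palm boxes can cut down the work, consider a grid-based detector that tries a set of candidate boxes at every grid cell. The sketch below is only an illustration: the function name, grid sizes, and aspect ratios are hypothetical values chosen for the example, not figures from Google's system.

```python
def count_candidate_boxes(grid_sizes, aspect_ratios):
    """Count candidate boxes for a detector that tries every
    aspect ratio at every cell of every grid."""
    return sum(rows * cols * len(aspect_ratios) for rows, cols in grid_sizes)


# Hypothetical detection grids (rows, cols), from fine to coarse.
GRID_SIZES = [(32, 32), (16, 16), (8, 8)]

# A whole hand can look tall, square, or wide depending on the pose,
# so a full-hand detector needs several box shapes (assumed ratios).
full_hand_boxes = count_candidate_boxes(GRID_SIZES, aspect_ratios=[0.5, 1.0, 2.0])

# A palm is roughly square, so a single square shape is enough.
palm_boxes = count_candidate_boxes(GRID_SIZES, aspect_ratios=[1.0])

print(full_hand_boxes)               # 4032
print(palm_boxes)                    # 1344
print(full_hand_boxes / palm_boxes)  # 3.0
```

With these made-up numbers, a palm-only detector evaluates a third of the candidate boxes. The real saving depends on how many shapes a full-hand detector would otherwise need.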

[Image: Google's Hand Tracking Algorithm. Caption: Hand gestures]

The researchers used around 30,000 hand images to train the machine learning algorithm. These images were captured in different lighting conditions and poses. The system then detects a gesture by comparing the hand pose against a list of known entities, such as a ball or happiness. Google describes the gesture recognition in a blog post:

Then we map the set of finger states to a set of pre-defined gestures. This straightforward yet effective technique allows us to estimate basic static gestures with reasonable quality. The existing pipeline supports counting gestures from multiple cultures, e.g. American, European, and Chinese, and various hand signs including “Thumb up”, closed fist, “OK”, “Rock”, and “Spiderman”.
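The mapping from finger states to gestures can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: the joint-angle test, the 160-degree threshold, the finger order, and the gesture table are all assumptions made for the example. Gestures such as "OK" depend on fingertips touching rather than on straight or bent fingers alone, so they are left out here.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
# Assumed finger order: thumb, index, middle, ring, pinky. True = straight.
FingerStates = Tuple[bool, bool, bool, bool, bool]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at point b, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def is_straight(joints: Sequence[Point], threshold_deg: float = 160.0) -> bool:
    """Treat a finger as straight if both inner joints are nearly flat.

    `joints` holds four points from the base of the finger to its tip.
    The threshold is an assumed value, not one taken from the blog post.
    """
    return all(
        joint_angle(joints[i], joints[i + 1], joints[i + 2]) >= threshold_deg
        for i in range(2)
    )


# Hypothetical gesture table keyed by finger states.
GESTURES: Dict[FingerStates, str] = {
    (False, False, False, False, False): "closed fist",
    (True, False, False, False, False): "Thumb up",
    (False, True, False, False, True): "Rock",
    (True, True, False, False, True): "Spiderman",
}


def classify(states: Sequence[bool]) -> str:
    """Look up a pre-defined gesture, or report that none matches."""
    return GESTURES.get(tuple(states), "unknown")
```

For example, `classify((False, True, False, False, True))` returns `"Rock"`, while a combination missing from the table returns `"unknown"`. A lookup like this ignores hand orientation, so a real system would need more information to tell a thumb pointing up from one pointing sideways.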

The final hand-tracking algorithm produces state-of-the-art results in terms of speed and accuracy, and it runs on the MediaPipe framework. This technique seems like a major advancement in the sign language domain, although there is still a lot of room for improvement before it yields a better understanding of sign language. Anyone can extend this work to take facial expressions and both hands into account to achieve better results.

Although there is no word from Google, it is possible that the company will refine this real-time hand tracking technology for use in its products. Meanwhile, if you want to play around with the code, it is publicly available on GitHub.

Alex Schoff
Alex is a technology reporter with a particular interest in Microsoft and Windows. He keeps a close eye on major developments related to Windows 10, Google Chrome, Office 365, and more.