Microsoft has reportedly been sending audio clips to actual human beings for “analysis and improvement” of its services. After Google and Apple both admitted to using the same controversial methodology, it turns out Microsoft has been doing the same. The fact that personal conversations between Skype users, as well as audio instructions given to Cortana, were deliberately exposed to human listeners is a serious concern for people who have been casually using these services assuming privacy and confidentiality.
Human contractors working for Microsoft have been listening to personal and private conversations of Skype users. The audio clips were obtained through the software’s translation service, reported Motherboard, a website that claims to have a cache of relevant internal documents, screenshots, and audio recordings. Incidentally, Microsoft does openly state on Skype’s official website that the company may analyze the audio of phone calls a user wants translated in order to improve the chat platform’s services. However, the statement omits who or what actually conducts the analysis. This ambiguity, combined with a somewhat convoluted opt-out process and the platform’s many perceived benefits, has ensured that thousands of users continue to actively use the service.
Microsoft Skype And Cortana Audio Clips Accessed And Analyzed By Human Contractors Instead Of AI?
According to Motherboard, which cites an anonymous contractor, Microsoft has been listening in on your Skype and Cortana audio. Much like Google and Apple, which hired human contractors to analyze snippets of audio, Microsoft has been sending small clips of audio to a large number of human contractors around the globe for analysis. The audio clips ranged from 5 to 10 seconds in length. Microsoft likely took precautions to ensure the recordings aren’t tied to user credentials.
However, the primary concern here is that Microsoft never clearly disclosed that actual humans were listening. Microsoft has informed users that some of their audio will be analyzed, but the majority of users would naturally assume that Microsoft relies on Artificial Intelligence (AI) to go through the audio and improve accuracy. An AI-based listening system could certainly offer benefits to many users, including those who suffer from speech impediments.
Is this even a surprise? $MSFT #AI assistants leaning awfully heavy on the human-in-the-loop. Reasonable to suspect pretty much every other big player in the market does the same. Lol are we comparing algos or are we comparing contractor quality? https://t.co/aqZLj66Bok
— Cameron K.F. Koo (@kfungfung) August 7, 2019
Speech recognition systems that power services like Skype, Cortana, Apple’s Siri, Amazon’s Alexa, and Google’s virtual assistant rely heavily on technologies that attempt to improve the clarity of audio and, by extension, its accuracy. Beyond deploying multiple microphones, even smart speakers that listen to audio instructions and relay relevant information or answers benefit immensely from AI-based technologies.
Microsoft Claims Clear Policy On Privacy And User Consent
The absence of clarity about who or what is listening to the audio recordings is certainly a concern. However, Microsoft stresses that it has clear policies on ensuring user privacy. Moreover, the company says users willingly consent to their audio being recorded and agree to it being analyzed to “improve the quality of service”. An FAQ for Skype Translator makes this plain: “Skype collects and uses your conversation to help improve Microsoft products and services. To help the translation and speech recognition technology learn and grow, sentences and automatic transcripts are analyzed and any corrections are entered into our system, to build more performant services.” But nowhere does it clearly say that humans may be listening to audio captured by the translation service.
Although most of the recordings are short, some reports indicate they could be longer as well. The length of the audio recording provided for analysis may depend on the complexity of the speech or other factors, such as coherency, which AI still finds difficult to handle.
Anything marketed as being purely AI probably has human eyes and ears involved at some point in the workflow. https://t.co/wiEu4Eg11u
— Connor Mason (@conmas) August 7, 2019
Nonetheless, the fact that actual humans could be listening to audio recordings, some of which were reportedly quite intimate and private, is certainly concerning. Acknowledging this, Apple and Google recently suspended their use of human transcribers for Siri and Google Assistant, respectively. However, these companies took action only after a severe and persistent backlash that followed similar media reporting on their practices.
Microsoft Skype gained the ability to offer translation services back in 2015, giving users near-real-time audio translations during phone and video calls. Interestingly, before the feature launched, a much-publicized article lauded Microsoft’s skillful use of AI in building its language translator. Undeniably, this created the strong impression that Microsoft predominantly uses AI to improve the service. Although Microsoft relies extensively on AI and has made strong inroads into understanding human speech, machine learning is often augmented with the intelligence and understanding of actual humans. Contractors are responsible for filling in the gaps and helping the AI improve its algorithms.