Deep learning is a rapidly growing branch of machine learning. Although a great deal of research has been done in this domain, open research gaps remain. Google engineers have made significant contributions here and are actively working on large datasets. The search giant recently collaborated with Jigsaw to release a large corpus of deepfake videos. Researchers can now use the freely available corpus to develop synthetic video detection methods.
Andrew Gully, technical research manager at Jigsaw, and Nick Dufour, research scientist at Google, wrote in a blog post:
Since their first appearance in late 2017, many open-source deepfake generation methods have emerged, leading to a growing number of synthesized media clips. While many are likely intended to be humorous, others could be harmful to individuals and society.
Google further stated that it worked with paid, consenting actors to record hundreds of videos. The company then used those videos to produce thousands of deepfakes, so the resulting dataset contains both real and fake samples.
Because deepfake technology is still evolving, the company will continue to add to the corpus.
We firmly believe in supporting a thriving research community around mitigating potential harms from misuses of synthetic media, and today’s release of our deepfake dataset in the FaceForensics benchmark is an important step in that direction.
Deepfake videos were first spotted in 2017. They were originally created as humorous content, but people now use AI-based systems to generate manipulative videos that swap one person's face for another's. Various deepfake-generating applications can be used for that purpose.
The good news is that authorities have taken notice of the issue and are drafting strict laws to increase scrutiny. The deepfake dataset is a major contribution toward resolving it. Google has also worked to mitigate potential abuse of this technology before: the company previously released a synthetic speech corpus, which was later used by more than 150 research studies.