It seems that as the years go by, it will become harder for people to get hold of authentic products. Counterfeiters around the globe are stepping up their production of knock-offs of all kinds of consumer and media goods. Today, there are even fake video clips, audio tracks, and financial transactions. Some people can forge your handwriting so well that no one can tell the difference.
Recent improvements in AI (artificial intelligence) and ML (machine learning) brought a sigh of relief, since they promised to reduce the problem significantly. Sadly, it looks like the unscrupulous may use ML and AI to make even better fakes. In the past, audio and video recordings were accepted without question as reliable evidence. That is rapidly changing because of advancements in AI, as several studies show:
The Development of a Machine Learning System
Researchers at the University of Washington developed a machine learning system in July 2016 and used it to accurately synthesize an individual's voice. That is not all: it also captures vocal mannerisms and lip-syncs the person's words onto a video.
The innovation implies that it is now easy to fake a person’s voice and to create a video of an individual saying whatever you want them to say. The team demonstrated this using footage of the weekly addresses that former President Barack Obama used to give, producing a photorealistic video using only the associated audio track.
This was possible because a recurrent neural network was taught to associate different audio features with the corresponding shapes of the mouth. As the next step, the team generated CGI mouth movements. With the help of 3D pose matching, they then transferred the animated lips onto a separate video of the president.
The technology drew a backlash over the possibility of misapplication. The team behind it was quick to explain that they had more everyday uses in mind for the system, such as:
- Reducing the bandwidth needed for video coding or transmission, because a top-quality video can be created from audio alone.
- Helping hearing-impaired persons. According to the study, video synthesis can enable lip-reading from over-the-phone audio.
- Creating digital humans for games and entertainment applications such as film special effects.
Introduction of the Face2Face System
UW is not the only one experimenting with this kind of technology. A team at Stanford introduced the Face2Face system. Unlike the UW system, which produces video from audio, it generates video from other videos.
The system uses a regular webcam to capture the user's mouth shapes and facial expressions. It then uses those details to deform a target YouTube video to match the user's speech and expressions in real time.
AI-based audio-video translation is a two-way street. A group from MIT’s CSAIL came up with a way to create audio for a silent video reel, and does it well enough to convince human audiences. Andrew Owens, the paper’s lead author, told MIT News that when a person runs a finger across a wine glass, the sound it produces reflects the amount of liquid in it. An algorithm that simulates such sounds can reveal details about the shapes and material types of objects, as well as the force and motion of their interactions with the world.
MIT’s deep learning system was trained over the course of several months. The researchers used a thousand videos containing 46,000 sounds made by various objects being struck, scraped, or poked with a drumstick. The system learned to associate different audio properties with particular on-screen actions, and could then synthesize sounds for a video as it played. The team then ran an online test in which people compared videos with synthesized sounds against videos with the authentic ones. Surprisingly, people picked the fake sound over the real one twice as often as they did for a baseline algorithm.
MIT’s primary reason for conducting the study was to use the technology to give robots better situational awareness. Owens said that with the technology, a robot could look at a sidewalk and immediately know that the cement is hard and the grass is soft, and therefore know what would happen if it stepped on either. He explained that the ability to predict sound is essential when it comes to predicting the consequences of physical interactions with the world.
Research into audio synthesis is not the preserve of universities. Some major corporations are also looking into the technology. For instance, Google has developed WaveNet, a generative model of raw audio waveforms.
The first iterations of computer-generated TTS (text-to-speech) were concatenative. Here, a person records many speech fragments, which are fed into a database that the computer draws on to reconstruct words and sentences. The major issue with this approach is that the results sound like the Moviefone guy.
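The concatenative approach can be illustrated with a minimal sketch. Everything in it is made up for illustration: the fragment names, the tiny lists of numbers standing in for recorded clips, and the `synthesize` helper are assumptions, not part of any real TTS product.

```python
# Hypothetical fragment database: each speech unit maps to a short list of
# made-up sample values standing in for a recorded clip.
FRAGMENTS = {
    "hel": [0.1, 0.3, 0.2],
    "lo": [0.4, 0.1],
    "world": [0.2, 0.5, 0.3, 0.1],
}


def synthesize(units, database=FRAGMENTS):
    """Build an utterance by looking up each unit and joining the clips."""
    output = []
    for unit in units:
        if unit not in database:
            # A concatenative system can only say what was recorded.
            raise KeyError(f"no recorded fragment for {unit!r}")
        output.extend(database[unit])
    return output


speech = synthesize(["hel", "lo", "world"])
```

Because the clips are simply placed end to end, nothing smooths the joins between them, which is one plausible reason such output can sound choppy.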
WaveNet, on the other hand, trains on recordings of people's speech. The system samples the recordings at up to 16,000 data points every second. WaveNet produces sound by using the model to predict what the next sample will be, based on the samples that came before. Although it is a computationally costly process, it produces exceptional audio quality compared with conventional TTS techniques.
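The idea of predicting each new sample from the ones before it can be sketched in a few lines. This is not Google's model: `toy_next_sample` is a hypothetical stand-in that uses a fixed rule producing a pure 440 Hz tone, where a real system would use a trained neural network and would typically choose the next value from a predicted probability distribution. Only the one-sample-at-a-time loop reflects the technique described above.

```python
import math

SAMPLE_RATE = 16_000  # samples per second, the figure quoted in the text
OMEGA = 2 * math.pi * 440 / SAMPLE_RATE  # phase step of a 440 Hz tone


def toy_next_sample(history):
    """Hypothetical stand-in for a trained model.

    Predicts the next sample from the two previous ones using the
    identity sin((n+1)w) = 2*cos(w)*sin(n*w) - sin((n-1)w).
    """
    return 2 * math.cos(OMEGA) * history[-1] - history[-2]


def generate(seed, n_samples, predict=toy_next_sample):
    """Autoregressive loop: every new sample depends on earlier output."""
    samples = list(seed)
    for _ in range(n_samples):
        samples.append(predict(samples))
    return samples


seed = [0.0, math.sin(OMEGA)]
one_second = generate(seed, SAMPLE_RATE - len(seed))
```

The loop runs once per output sample, so one second of audio takes 16,000 sequential predictions. With a large network in place of the toy rule, that sequential dependence is what makes this style of generation expensive.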
Countless other examples of such technology are out there, like the AI-based handwriting mimic that, if misused, could potentially see a robot forge your signature on official documents.
These systems, both the ones made to uncover fakes and the ones used to produce knock-offs, have still not reached their potential. In the future, however, machine learning and artificial intelligence techniques will continue to grow and improve in ways that you may not even imagine. It is horrifying to think about technologies that will be able to create imitations, frauds, and uncannily convincing deceits.
Written by Garrett Parker