A year ago, Google acquired DeepMind, a British artificial intelligence startup that had no consumer products and remained a mystery for quite some time. A year later, however, it all makes sense: DeepMind is now officially part of Alphabet, and more is known about its contribution to Google.
A year ago, DeepMind demoed a new concept called WaveNet, a deep neural network for generating raw audio waveforms that produced better, more natural-sounding speech than existing techniques. Though it was only a prototype back then and not fully functional, it has since improved significantly in both speed and quality. Today the company introduced a fully updated version of WaveNet that is being used to generate more natural-sounding voices for Google Assistant in English and Japanese.
WaveNet uses a convolutional neural network and is trained on a large database of speech. The training phase begins by determining the underlying structure of the speech, such as which tones follow one another and which sound more natural and realistic. Once that phase is complete, the network synthesizes a voice one sample at a time, with each sample taking the properties of the previous samples into account for better continuity. The resulting speech sounds more natural, and its accent depends on the voice the network was trained on.
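The sample-by-sample generation described above can be sketched in a few lines of Python. This is only an illustrative toy, not DeepMind's actual model: the `predict_next` function stands in for the trained convolutional network, and `toy_model`, `context_size`, and all other names here are hypothetical.

```python
import random

def generate_waveform(predict_next, num_samples, context_size=16):
    """Autoregressively generate a waveform one sample at a time.

    Each new sample is predicted from the samples that came before it,
    mirroring WaveNet's sample-by-sample generation. `predict_next`
    stands in for the trained neural network.
    """
    samples = [0.0]  # seed with silence
    for _ in range(num_samples - 1):
        context = samples[-context_size:]  # recent samples the model sees
        samples.append(predict_next(context))
    return samples

# Toy stand-in "model": a damped echo of the last sample plus noise.
def toy_model(context):
    return 0.9 * context[-1] + 0.05 * random.uniform(-1.0, 1.0)

# One second of audio at 24,000 samples per second.
waveform = generate_waveform(toy_model, num_samples=24000)
```

The key point the sketch captures is the dependency chain: sample *n* cannot be produced until sample *n − 1* exists, which is why generation speed was the original model's bottleneck.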
WaveNet now requires just 50 milliseconds to generate one second of speech, 1,000 times faster than the original model. It is also higher fidelity, capable of creating waveforms with 24,000 samples per second. It is currently used for English and Japanese, but will roll out to more languages soon.
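A quick sanity check on those figures (variable names are mine; only the 50 ms, 1,000× and 24,000-sample numbers come from the article):

```python
# Figures reported for the updated WaveNet.
generation_time_s = 0.050   # 50 ms of compute to generate...
audio_length_s = 1.0        # ...one second of speech
speedup = 1000              # reported speedup over the original model
sample_rate = 24000         # waveform samples per second of audio

# Implied time the original model needed for the same second of audio.
original_time_s = generation_time_s * speedup  # 50 seconds

# Real-time factor: seconds of compute per second of audio
# (values below 1 mean faster than real time).
real_time_factor = generation_time_s / audio_length_s

# Samples the new model emits per second of wall-clock time.
samples_per_wall_second = sample_rate * audio_length_s / generation_time_s
```

In other words, the numbers imply the original prototype would have needed roughly 50 seconds of compute per second of audio, while the new model runs 20 times faster than real time, emitting 480,000 samples per wall-clock second.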
Sai Krishna contributed to this post