iQIYI Holds World's First Low-Resource Voice Cloning Challenge to Accelerate Development of AI Voice Technology
iQIYI Inc. (NASDAQ: IQ) has launched the Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC), which runs from Nov 27, 2020, to Feb 11, 2021, and aims to enhance synthetic speech quality. This first-ever low-resource voice cloning challenge encourages participants to improve the intelligibility and naturalness of synthetic speech despite limited training resources. The challenge offers two categories, 'few-shots' and 'one-shot', with evaluations based on speaker similarity, speech quality, style, and pronunciation accuracy. The global speech technology market is predicted to reach $16 billion with a CAGR of 16% in the next 7-8 years, highlighting the industry's growth potential.
- Launch of M2VoC positions iQIYI as an innovator in AI and speech synthesis.
- Challenge encourages collaboration with researchers, potentially leading to breakthroughs in voice cloning technology.
- Participation in a growing market, projected to reach $16 billion with a 16% CAGR.
BEIJING, Dec. 15, 2020 /PRNewswire/ -- iQIYI Inc. (NASDAQ: IQ) ("iQIYI" or the "Company"), an innovative market-leading online entertainment service in China, is pleased to announce that it has partnered with multiple organizations to hold a Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC) scheduled to run from 27 November 2020 to 11 February 2021.
M2VoC aims to enhance the quality of synthetic speech while reducing its dependence on the quantity and quality of training datasets. The Company hopes that participants can improve the intelligibility and naturalness of synthetic speech even when resources are limited.
iQIYI released detailed guidelines for M2VoC, the first low-resource voice cloning challenge in the world, on 27 November. Organized by a team of iQIYI experts and a number of organizations, the challenge aims to provide a common dataset and a fair test platform that facilitate research on voice cloning tasks.
As an ICASSP 2021 Signal Processing Grand Challenge, M2VoC encourages researchers from academia and the computing industry to participate.
The competition comprises two categories, 'few-shots' and 'one-shot'. Target speakers for voice cloning validation and evaluation are provided for both categories.
In the few-shots category, each speaker has a different speaking style with 100 available samples. In the one-shot category, each speaker has a different speaking style with only 5 samples.
For both categories, contestants will be provided with two base datasets for base model training, with each dataset containing 5,000 different training samples of different speech styles.
Winners will be selected in each category based on a weighted combination of four criteria: speaker similarity, speech quality, style/expressiveness and pronunciation accuracy.
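As a rough illustration of how a weighted ranking over the four criteria could work, the following Python sketch combines per-criterion scores into a single value and orders systems by it. The announcement does not state the actual weights, score scale or aggregation formula, so the equal weights, the 1-to-5 scale and the function names here are assumptions made purely for illustration.

```python
# Illustrative only: the weights below are hypothetical. The announcement
# names the four criteria but does not say how they are weighted.
HYPOTHETICAL_WEIGHTS = {
    "speaker_similarity": 0.25,
    "speech_quality": 0.25,
    "style_expressiveness": 0.25,
    "pronunciation_accuracy": 0.25,
}


def weighted_score(scores, weights=HYPOTHETICAL_WEIGHTS):
    """Return the weighted average of the per-criterion scores."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight


def rank_systems(systems, weights=HYPOTHETICAL_WEIGHTS):
    """Return system names ordered from highest to lowest weighted score."""
    return sorted(
        systems,
        key=lambda name: weighted_score(systems[name], weights),
        reverse=True,
    )


# Made-up example scores on an assumed 1-to-5 scale.
example_systems = {
    "system_a": {
        "speaker_similarity": 4.0,
        "speech_quality": 4.0,
        "style_expressiveness": 4.0,
        "pronunciation_accuracy": 4.0,
    },
    "system_b": {
        "speaker_similarity": 3.0,
        "speech_quality": 5.0,
        "style_expressiveness": 3.0,
        "pronunciation_accuracy": 3.0,
    },
}

ranking = rank_systems(example_systems)
```

With different weights the ordering can change, which is why the weighting matters as much as the individual scores.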
As an innovative technology in the field of artificial intelligence (AI), speech synthesis is essential for creating a good interactive experience. As speech synthesis has valuable applications in areas such as voice assistants, broadcasting and audio books, it is a fast-growing field. The global market of speech recognition and speech-related technologies is projected to expand to
Thanks to deep learning, speech synthesis can now produce very realistic and natural-sounding speech in specific domains. However, the technology requires large amounts of training data and highly demanding recording conditions. As a result, technological advancement in the field has been hindered by the capital and time required for dataset creation. There is still much room for improvement in the expressiveness and robustness of synthetic speech across different speakers and styles, especially in real-world or low-resource conditions. iQIYI hopes that M2VoC will help to address these issues and accelerate the development of AI voice technology.
The competition will also drive the development of cutting-edge technologies such as voice cloning and speech recognition, further broadening the application scope of AI and creating new opportunities in the audiovisual industry. Through this challenge, iQIYI hopes to team up with talented researchers and build solutions for low-resource voice cloning with advanced deep-learning technology and multi-stylistic voice morphing technology. The Company also anticipates that M2VoC will further elevate the interactive experience of video and drive the development and application of voice cloning technology.
In recent years, iQIYI has been leveraging AI to enable content creation, enhance users' entertainment experience and improve iQIYI's growing entertainment ecosystem. Currently, iQIYI's AI technology has been applied to a whole set of processes including content creation, production, distribution and commercialization. In the years ahead, iQIYI will continue to explore AI voice technology, unlocking its tremendous potential for use in the multi-media entertainment industry so that the Company can create a better audio-visual world for its users.
View original content: http://www.prnewswire.com/news-releases/iqiyi-holds-worlds-first-low-resource-voice-cloning-challenge-to-accelerate-development-of-ai-voice-technology-301192766.html
SOURCE iQIYI