OpenAI Introduces Voice Cloning Tool, But Waits for Public Release


    10 April 2024

    ChatGPT developer OpenAI has introduced a new tool that it says can reproduce human voices with just a short audio sample.

    The tool is among several developed by technology companies that aim to clone voices with a high degree of accuracy.

    The system is called Voice Engine. OpenAI released details about Voice Engine on March 29. The company also released some examples of how the tool performs.

    The OpenAI logo is seen on a mobile phone in front of a computer screen which displays output from ChatGPT, March 21, 2023, in Boston. (AP Photo/Michael Dwyer, File)

    The company explained in a statement that the system uses written instructions and a 15-second audio sample to produce, or generate, "natural-sounding speech" in the voice of any speaker.

    OpenAI said it has been working on developing the tool since late 2022. It plans to continue developing it before releasing it to the public. The company said the delayed release represents "a cautious and informed" effort to prevent possible misuse.

    OpenAI said it began testing Voice Engine "with a small group of trusted partners." This testing aims to help company officials decide how to improve the tool's safety and learn how it can "be used for good" across different industries.

    There have already been highly publicized examples of false, or fake, voice recordings that sound like well-known people and politicians.

    One such example was an AI-generated voice made to sound like U.S. President Joe Biden. The voice was used in a recorded message that was sent by telephone to voters earlier this year in the state of New Hampshire.

    In the fake audio, Biden appears to urge voters not to take part in the state's presidential primary election. Election officials in New Hampshire are still investigating the incident.

    OpenAI's statement identified such election risks. "We recognize that generating speech that resembles people's voices has serious risks, which are especially top of mind in an election year."

    There have been other examples of so-called "deepfakes" being used in political campaigns around the world. A "deepfake" is a piece of audio or video created to make it appear that people in it are saying or doing things that they never did.

    OpenAI said it is currently discussing how to limit these and other risks of such voice cloning systems. Technology experts have also warned that such tools can be used to carry out financial crimes. For example, a person's voice can be recorded and then reproduced to pretend to be that person in an effort to carry out fraud.

    The company said it has been in contact with U.S. and international partners "from across government, media, entertainment, education, civil society and beyond." The aim is to get advice from people in these groups about the best ways to develop and deploy the system, OpenAI said.

    OpenAI said its early Voice Engine testers have agreed not to imitate a person's voice without that person's permission and to clearly state that the voices are produced by AI. The company is best known for its AI-powered ChatGPT tool, which launched in November 2022.

    There are several other companies developing similar voice cloning tools. One of the most widely known is Descript. The company explains on its website that its AI-powered system can clone any person's voice with a 60-second audio sample.

    Descript also offers other services involving audio and video production. The company's website says users can choose a free plan that includes up to 1,000 words of voice cloning. For $24, users can get unlimited voice generation, Descript said.

    OpenAI listed several possible uses for the tool. These include providing reading assistance to non-readers or children, supporting individuals who cannot speak, and helping patients recover their voice.

    The company said its Voice Engine system can work in several languages. For example, it can take an English voice recording and produce a new audio version in another language. In an example released online, OpenAI showed that a person's voice and style of speaking can remain the same in the translated versions.

    I'm Bryan Lynn.

    Bryan Lynn wrote this story for VOA Learning English, based on reports from OpenAI, The Associated Press and Agence France-Presse.

    __________________________________________

    Words in This Story

    sample – n. a small amount of something that gives you information about the thing it was taken from

    clone – v. to create something that is very similar to something else

    instructions – n. advice and information about how to do or use something

    cautious – adj. taking care to avoid risks or danger

    resemble – v. to look like or be like someone or something

    top of mind – idiom. something that occupies a big part of a person's thoughts

    fraud – n. the crime of doing something illegal in order to get money

    style – n. a way of doing something that is typical of a particular person, group, place, etc.