When OpenAI, the generative artificial intelligence startup, released GPT-4o last week, a demo video of its Voice Mode showed the model's emotive voice answering users' questions, according to The Hollywood Reporter. The launch also sparked a controversy involving Scarlett Johansson.

One of the available voices, Sky, sounded very much like the actress Scarlett Johansson, who voiced the emotive AI Samantha in the 2013 film Her. OpenAI co-founder and CEO Sam Altman posted just the word “her” on his X (formerly Twitter) account during the demo.

After users complained about how similar the voices were, OpenAI announced that it is “pausing” the Sky voice while it addresses the issue.

Black Widow defeats OpenAI 


The company posted on X Monday morning, “We've heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them.”

Johansson released a statement on Monday evening. She said that Altman asked her last September to consider lending her voice to ChatGPT.

“He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI. He said he felt that my voice would be comforting to people,” the actress said in the statement.

“After much consideration and for personal reasons, I declined the offer. Nine months later, my friends, family and the general public all noted how much the newest system named ‘Sky’ sounded like me,” she added.

Johansson said that two days before the demo, Altman contacted her agent to ask her to reconsider. She added that after the company released the ‘Sky’ voice, her lawyers sent letters to OpenAI.

“As a result of their actions, I was forced to hire legal counsel, who wrote two letters to Mr. Altman and OpenAI, setting out what they had done and asking them to detail the exact process by which they created the ‘Sky’ voice. Consequently, OpenAI reluctantly agreed to take down the ‘Sky’ voice,” Johansson stated.

In a blog post, the company acknowledged the concerns and explained how it created the voices, including its extensive casting process.

“We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice — Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice. To protect their privacy, we cannot share the names of our voice talents,” the blog post noted.

Creating the voice of ‘Sky’

OpenAI stated that it has been working with “well-known, award-winning” casting directors and producers since 2023 to select voice actors. The casting call received 400 submissions, which were then narrowed down to 14.

The blog post further explained, “We spoke with each actor about the vision for human-AI voice interactions and OpenAI, and discussed the technology's capabilities, limitations, and the risks involved, as well as the safeguards we have implemented. It was important to us that each actor understood the scope and intentions of Voice Mode before committing to the project.” From the initial 14, the company settled on five.

The five actors then flew to San Francisco for recording sessions, and their voices were added to ChatGPT last fall. OpenAI said it plans to add more voices over time.

“We support the creative community and worked closely with the voice acting industry to ensure we took the right steps to cast ChatGPT's voices. Each actor receives compensation above top-of-market rates, and this will continue for as long as their voices are used in our products,” the blog continued.

Shock, anger and disbelief

In her lengthy statement, Johansson said that when she heard the demo, she was “shocked, angered and in disbelief” that Altman went on to use a voice that “sounded so eerily similar to mine” that even her closest friends and news outlets couldn't tell the two apart.

The actress ended the statement with a call for legislation that would protect individual rights.

GPT-4o, the newest iteration of the model behind ChatGPT, gets its “o” from “omni,” a reference to its text, vision and audio modalities. The update introduces rapid responses to audio input: OpenAI claims an average response time of 320 milliseconds, similar to a human's in conversation, delivered in an AI-generated voice that's supposed to sound human. The model is also supposed to perform “sentiment analysis,” and its voice responses were built to carry emotional nuance.

According to the company, GPT-4o is available in ChatGPT's free tier, and Plus users get up to five times higher message limits. OpenAI said that in the coming weeks it will roll out a new alpha version of Voice Mode with GPT-4o within ChatGPT, most likely either without the ‘Sky’ voice or with a heavily modified version of it.

So say goodbye to Sky's Johansson-like dulcet tones the next time you log in.