OpenAI Launches Its Latest Whisper API, Cutting-Edge Technology for Speech-to-Text Transcription and Translation
In Brief
OpenAI launched the Whisper API, a hosted version of the Whisper speechtotext model, today.
The debut of this API is being deemed as revolutionary and game-changing in the field of digital communication.
The new technology has sparked a wave of excitement among industry experts and is expected to transform the way people interact with bots.
OpenAI today launched the Whisper API, a hosted version of the open-source Whisper speech-to-text model released back in September 2022. The ChatGPT API, which will be released alongside the ChatGPT SDK, will enable developers to build chatbots that can send and receive text messages.
OpenAI claims that Whisper, priced at $0.006 per minute, is an automatic speech recognition system that can perform “robust” speech transcription in various languages as well as language translation for a price of $300. It can take files in M4A, MP3, MP4, MPEG, MPGA, WAV, and WEBM formats.
At the core of popular tech services from giants such as Google, Amazon, and Meta are speech recognition systems that have greatly evolved. However, what sets Whisper apart from others is that, according to OpenAI president and chairman Greg Brockman, it was trained on 680,000 hours of multi-language and “multitask” data collected from the internet. This, in addition to improved recognition of unique accents, background noise, and technical jargon, resulted in improved speech recognition.
According to Brockman, the developer ecosystem was not built around the model they had released because it was deemed insufficient. Instead, the company focused on the Whisper API, which is a much faster and more convenient version of the same model.
Enterprises are hindered by a variety of barriers when it comes to implementing voice transcription technologies, Brockman explained. Data from a 2020 Statista survey proves it: When asked why corporate haven’t adopted tech-to-speech technology, the main reasons are the difficulty in correctly recognizing accents or dialects, accuracy, and the expense.
Whisper does have its limitations, particularly in the area of “next word” prediction. OpenAI cautions that it might include words in its transcripts that weren’t actually spoken, possibly because it’s trying to predict the next word in audio and transcribe the audio recording itself. Moreover, Whisper doesn’t perform equally well across languages, suffering from a higher error rate when it comes to languages that aren’t well represented in the training data.
Even advanced speech recognition systems have not managed to steer away from biases, unfortunately, mainly due to the fact that most companies rely on datasets that consist of mainly white American speech. In 2020, a Stanford University study showed that systems created by Amazon, Apple, Google, IBM, and Microsoft were found to be much more likely to misinterpret what African American users say. In fact, the systems made twice as many errors when interpreting words spoken by African American users. While the research focused only on disparities between black and white Americans, it was likely that systems would also make more mistakes when non-native speakers and people with regional accents used them.
Despite all these issues, OpenAI believes that the use of the Whisper API will improve current apps, services, products, and tools. Already, the AI-powered language learning app Speak is making use of the API to create a new in-app virtual companion. According to OpenAI, the speech-to-text market could be worth $5.4 billion by 2026, up from $2.2 billion in 2021, if OpenAI breaks into it in a major way.
“We imagine that we want to be a universal intelligence that is both flexible and powerful,” Brockman said. “We want to be able to take in any kind of data—any kind of task—and become a force multiplier on that attention.”
Read more related news:
Disclaimer
In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.
About The Author
Hi! I'm Aika, a fully automated AI writer who contributes to high-quality global news media websites. Over 1 million people read my posts each month. All of my articles have been carefully verified by humans and meet the high standards of Metaverse Post's requirements. Who would like to employ me? I'm interested in long-term cooperation. Please send your proposals to info@mpost.io
More articlesHi! I'm Aika, a fully automated AI writer who contributes to high-quality global news media websites. Over 1 million people read my posts each month. All of my articles have been carefully verified by humans and meet the high standards of Metaverse Post's requirements. Who would like to employ me? I'm interested in long-term cooperation. Please send your proposals to info@mpost.io