OpenAI has unleashed a significant update to ChatGPT, expanding the capabilities of its viral application in two major ways.
Voice Interaction: First and foremost, ChatGPT now has a voice. Users can choose from five synthetic voices, designed to sound remarkably lifelike, and engage in conversations with the chatbot as if making a phone call. This feature allows users to receive real-time spoken responses to their questions.
Image Recognition: The second major enhancement is ChatGPT’s ability to answer questions about images. While this feature was teased with the introduction of GPT-4, the model powering ChatGPT, it was previously unavailable to the wider public. Users can now upload images to the application and inquire about their content.
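In the ChatGPT app this happens through the upload button, but developers can reach a comparable capability through OpenAI's public API. The sketch below is only an illustration of that developer-facing flow, not the app's internal implementation; the model name and the local file path are assumptions.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode a local image so it can be sent inline as a data URL.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable GPT-4-class model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```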
Additionally, OpenAI announced that DALL-E 3, the latest iteration of its image-generation model, will be integrated with ChatGPT. This integration will enable users to prompt ChatGPT to generate images.
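For developers, DALL-E 3 is also exposed directly through OpenAI's image-generation API. A minimal sketch, assuming the "dall-e-3" model identifier and default settings, looks like this; inside ChatGPT the same step is triggered conversationally rather than by an explicit API call.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask DALL-E 3 to turn a text prompt into an image.
result = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor painting of a lighthouse at dawn",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```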
The introduction of voice capabilities relies on two separate models. Whisper, OpenAI’s existing speech-to-text model, converts spoken language into text, which is then fed into ChatGPT. A new text-to-speech model is used to convert ChatGPT’s responses into spoken words.
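The same three-stage pipeline can be approximated with OpenAI's public API: transcribe audio with Whisper, pass the text to a chat model, then synthesize the reply with the text-to-speech endpoint. The sketch below is a rough analogue under those assumptions; the model names, the "alloy" preset voice, and the file paths are illustrative and are not necessarily what the ChatGPT app uses internally.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech to text: transcribe the user's spoken question with Whisper.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Text to text: send the transcript to a chat model for a reply.
chat = client.chat.completions.create(
    model="gpt-4",  # assumption: any GPT-4-class chat model
    messages=[{"role": "user", "content": transcript.text}],
)
answer = chat.choices[0].message.content

# 3. Text to speech: render the reply as audio with a preset API voice.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # one of the API's named preset voices
    input=answer,
)
with open("answer.mp3", "wb") as out:
    out.write(speech.content)
```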
The synthetic voices used for ChatGPT were created by training the text-to-speech model on the voices of actors hired by OpenAI. The company focused on ensuring that these voices are pleasant and easy to listen to, with potential plans to allow users to create their own voices in the future.
OpenAI is sharing this text-to-speech model with select companies, including Spotify. Spotify has disclosed its use of the same synthetic voice technology to translate celebrity podcasts into multiple languages, employing synthetic versions of the podcasters’ voices.
These updates underscore OpenAI’s rapid transformation of experimental models into desirable products. In a relatively short time frame, ChatGPT has evolved into ChatGPT Plus, a premium application that combines GPT-4 and DALL-E, rivaling virtual assistants like Siri, Google Assistant, and Alexa. Available for $20 per month, this enhanced app reflects OpenAI’s commitment to making ChatGPT more useful and valuable to users.
The image recognition feature of ChatGPT has already been tested by Be My Eyes, an app designed for people with visual impairments. Users can upload images and ask the chatbot to describe them, offering an alternative to human volunteers.
OpenAI is well aware of the potential risks associated with these updates, since combining models introduces new complexities and challenges. To mitigate misuse, for instance, ChatGPT will not answer questions about images of private individuals. OpenAI says it has been working to address these issues so that the updates are safe for public use.
However, challenges remain. Voice recognition could exclude individuals who do not speak with mainstream accents, and synthetic voices carry social and cultural implications that can shape users' perceptions and expectations.
OpenAI remains confident that it has addressed the most critical issues, but shepherding ChatGPT and similar AI models as they evolve will remain a complex, ongoing process.