Exciting Announcements for AI App Developers
OpenAI DevDay 2024: Realtime API and Vision Fine-Tuning Steal the Show!
Edited By
Mackenzie Ferguson
AI Tools Researcher & Implementation Consultant
Despite recent executive shake-ups, OpenAI's 2024 DevDay unveiled a host of new features, including a public beta of its Realtime API and vision fine-tuning in its API, aimed at making AI app development faster and more versatile.
OpenAI's recent DevDay event showcased several exciting developments for AI app developers, despite a tumultuous week marked by executive departures and significant fundraising activities. During DevDay, OpenAI introduced new tools, including a public beta of its Realtime API designed for building applications with low-latency, AI-generated voice responses.
Chief product officer Kevin Weil emphasized that the departures of key executive members, such as chief technology officer Mira Murati and chief research officer Bob McGrew, would not hinder the company's progress. Weil acknowledged the contributions of these leaders but reassured attendees that the company remains on track with its development goals.
In light of these leadership changes, OpenAI is working to demonstrate that it continues to provide the best platform for AI app development. With a community of over 3 million developers using its AI models, OpenAI is facing increasing competition from major players like Meta and Google. To stay competitive, OpenAI has significantly reduced the costs for developers to access its API by 99% over the past two years.
One of the key highlights of DevDay was the introduction of the Realtime API, which lets developers build near-real-time speech-to-speech experiences into their applications using six distinct voices supplied by OpenAI. Because OpenAI provides the voices itself rather than sourcing them from third parties, it sidesteps potential copyright issues.
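To make the speech-to-speech flow concrete, here is a minimal sketch of the JSON events a client might send over the Realtime API's WebSocket connection. The event names, field names, and voice name below are assumptions based on OpenAI's public beta documentation and should be verified against the current reference before use.

```python
import json

# Hypothetical sketch of client-to-server events for a Realtime API
# session (the beta connects over a WebSocket). Field names are
# assumptions; check OpenAI's current docs before relying on them.

def session_update(voice: str, instructions: str) -> str:
    """Build a session.update event selecting one of the preset voices."""
    event = {
        "type": "session.update",
        "session": {
            "voice": voice,               # e.g. one of the six OpenAI-provided voices
            "instructions": instructions, # system-style guidance for the assistant
            "modalities": ["audio", "text"],
        },
    }
    return json.dumps(event)

def response_create() -> str:
    """Ask the server to start generating a spoken response."""
    return json.dumps({"type": "response.create"})

msg = session_update("alloy", "You are a friendly travel assistant.")
```

In practice an application would send these frames over the open WebSocket and stream the returned audio chunks straight to the user, which is what enables the low-latency voice responses OpenAI demonstrated.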
During the event, Romain Huet, OpenAI’s head of developer experience, demonstrated a trip planning app utilizing the Realtime API. This app allowed users to interact verbally with an AI assistant for low-latency responses about travel plans to London. The API's capabilities extend to integrating tools for enhanced user experiences, such as annotating maps with restaurant locations.
Another notable demonstration used the Realtime API for phone-based interactions, such as calling to ask about food orders for an event. OpenAI's API cannot place calls itself, but it can be paired with telephony APIs such as Twilio to add that functionality. The API does not yet identify itself as an AI on calls, however, so developers are responsible for adding the necessary disclosures, something that may soon be mandated by law.
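One common way to wire this up is Twilio's Media Streams feature, where TwiML tells Twilio to stream call audio to a WebSocket server that the developer would then bridge to the Realtime API. The sketch below builds such a TwiML response by hand; the `/media` endpoint URL is hypothetical, and the spoken line illustrates the kind of AI disclosure developers currently have to add themselves.

```python
# Sketch: TwiML instructing Twilio to stream a phone call's audio to a
# developer-run WebSocket server, which would relay it to the Realtime
# API. The stream URL is a placeholder, not a real endpoint.

def answer_call_twiml(stream_url: str) -> str:
    """Return TwiML that discloses the AI and opens a media stream."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        "<Say>Heads up: you are speaking with an AI assistant.</Say>"
        f'<Connect><Stream url="{stream_url}" /></Connect>'
        "</Response>"
    )

twiml = answer_call_twiml("wss://example.com/media")
```

A real deployment would return this TwiML from the webhook Twilio calls when the phone number is dialed.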
Additionally, OpenAI introduced vision fine-tuning in its API, allowing developers to fine-tune GPT-4o with images as well as text, which could significantly improve tasks that depend on visual understanding. OpenAI says it guards against misuse by prohibiting the upload of copyrighted, violent, or otherwise inappropriate imagery.
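A vision fine-tuning dataset pairs images with text inside ordinary chat-style training examples. The sketch below shows what a single JSONL training line might look like; the message structure with `image_url` content parts follows OpenAI's fine-tuning documentation, but the exact field names should be double-checked, and the image URL and labels are invented for illustration.

```python
import json

# Hypothetical single training example for vision fine-tuning of
# GPT-4o. One such JSON object goes on each line of the JSONL
# training file; the URL and sign labels here are made up.

example = {
    "messages": [
        {"role": "system",
         "content": "Identify the traffic sign shown in the image."},
        {"role": "user",
         "content": [
             {"type": "text", "text": "What sign is this?"},
             {"type": "image_url",
              "image_url": {"url": "https://example.com/sign.jpg"}},
         ]},
        {"role": "assistant", "content": "A stop sign."},
    ]
}

jsonl_line = json.dumps(example)  # one line of the training file
```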
OpenAI is also striving to match features offered by competitors in the AI model licensing space. Its new prompt caching feature, which reduces cost and latency by reusing recently processed prompt content, gives developers a 50% discount on cached input; Anthropic's comparable feature promises up to 90% savings.
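Because the discount applies to repeated prompt prefixes, the usual advice is to place static content (system prompt, policies, few-shot examples) first and per-request content last, so the shared prefix can be cached across calls. A minimal sketch of that ordering, with an invented system prompt:

```python
# Prompt caching rewards identical prompt prefixes across requests.
# Keeping the large, unchanging part first maximizes the cacheable
# portion; only the final user message varies per call.

STATIC_SYSTEM = (
    "You are a support agent for Acme Corp. "  # hypothetical content
    "Follow the refund and shipping policies exactly as written."
)

def build_messages(user_question: str) -> list[dict]:
    """Static prefix first, request-specific content last."""
    return [
        {"role": "system", "content": STATIC_SYSTEM},   # identical every call
        {"role": "user", "content": user_question},     # varies per request
    ]

messages = build_messages("Where is my order?")
```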
Moreover, OpenAI has introduced a model distillation capability that lets developers use the outputs of larger AI models to fine-tune smaller ones, providing cost savings without sacrificing much performance quality. This is complemented by a new beta evaluation tool that helps developers assess the performance of their fine-tuned models within the OpenAI API.
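The distillation workflow, as described, boils down to capturing a large model's outputs and then training a smaller model on them. The sketch below shows the shape of the two requests involved; the `store` and `metadata` parameter names are assumptions based on OpenAI's distillation announcement, and the file ID is a placeholder, so treat this as an outline rather than working code.

```python
# Hypothetical outline of distillation: (1) tag and store completions
# from a large "teacher" model, (2) fine-tune a smaller "student"
# model on the exported completions. Parameter names are assumptions.

teacher_request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize the report."}],
    "store": True,                          # persist the completion for reuse
    "metadata": {"task": "summarize-v1"},   # tag for filtering later
}

# Later, completions tagged "summarize-v1" would be exported to a
# training file and used to fine-tune a smaller model.
student_job = {
    "model": "gpt-4o-mini",
    "training_file": "file-abc123",  # placeholder file ID, not real
}
```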
While DevDay brought many advancements, some anticipated announcements were notably absent. There was no update regarding the GPT Store, which was mentioned in the previous year’s DevDay. Additionally, developers are still waiting for the release of new AI models, such as OpenAI o1 and the video generation model Sora, which remain under development.