Tools Galore for API Agent Development

New tools for building agents with the API

Estimated read time: 1:20


    Summary

    OpenAI is launching a suite of new tools designed to help developers build reliable and efficient agents via its API. An 'agent' is defined as a system capable of performing tasks independently on behalf of a user. This launch introduces three new built-in tools, a new API, and an open-source SDK aimed at simplifying the process for developers. Key offerings include a web search tool, a file search tool, and a computer use tool, all of which extend the capabilities of agents. The new Responses API promises a better developer experience, handling complex multi-turn, multi-tool interactions seamlessly. OpenAI is committed to continuing support for existing systems while providing ample timelines for migration, and predicts that 2025 will be a pivotal year for AI agents, transforming how tasks are automated and executed in the real world.

      Highlights

      • Kevin from OpenAI introduces a suite of new tools for building agents 🌐.
      • Three new built-in tools enhance agent development: web search, file search, and computer use 🔍.
      • The Responses API simplifies complex multi-tool interactions for developers 💡.
      • OpenAI revises Swarm SDK into the production-ready Agents SDK 🌟.
      • The company promises continued support and a gradual transition to the new systems 🔄.
      • 2025 is expected to be transformative for agent capabilities 🚀.

      Key Takeaways

      • OpenAI introduces new agent tools like web search, file search, and computer use, making AI development more robust 🤖.
      • The new Responses API supports complex, multi-tool interactions, enhancing agent capabilities 💪.
      • OpenAI's SDK, previously known as Swarm, is now production-ready and renamed as the Agents SDK 🚀.
      • OpenAI continues to support current systems, ensuring smooth transition paths to new APIs 🛠️.
      • 2025 is anticipated to be the 'year of the agent', promising transformative features 🌟.

      Overview

      OpenAI is revolutionizing the way developers build AI agents by unveiling a comprehensive suite of tools designed to simplify and enhance the process. Their new offerings include three built-in tools, a new API, and an open-source software development kit (SDK). These tools aim to equip developers with the capabilities needed to construct sophisticated agents capable of performing independent tasks efficiently.

        The initiative focuses on three primary tools: a web search tool to access current and factual internet data, a file search tool for navigating private data, and a computer use tool that extends operational capabilities to systems without API access. Together, these tools give developers far more flexibility when building complex, multi-step workflows.
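
        The three tools can be combined in a single Responses API request. The sketch below shows one plausible payload shape; the tool-configuration fields, including the `file_search` filter syntax and the vector store ID, are assumptions based on the demo, not verified documentation.

        ```python
        # Hypothetical sketch of a Responses API request combining built-in
        # tools. The payload shape (tool types, filter syntax) is an
        # assumption; check the official API reference before relying on it.

        def build_stylist_request(vector_store_id: str, username: str) -> dict:
            """Assemble one request: web search for fresh public data,
            file search (filtered to one user) for private preferences."""
            return {
                "model": "gpt-4o",
                "instructions": "You are a personal stylist.",
                "input": "Find me a jacket I would like nearby.",
                "tools": [
                    {"type": "web_search"},  # public, up-to-date data
                    {
                        "type": "file_search",  # private data in a vector store
                        "vector_store_ids": [vector_store_id],
                        "filters": {"type": "eq", "key": "username", "value": username},
                    },
                ],
            }

        payload = build_stylist_request("vs_example_id", "kevin")
        # With an API key and client, the actual call would be roughly:
        # client.responses.create(**payload)
        ```

        The point of a single request carrying several tools is that the model can issue a file search call, then a web search call, and return one final answer without the developer orchestrating intermediate steps.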

          OpenAI tells developers to anticipate exciting developments in 2025, which is slated to be 'the year of the agent'. With ongoing support for existing systems and well-defined transition paths to new technologies, OpenAI promises to elevate the standards of automation and interactivity in real-world applications.

            Chapters

            • 00:00 - 01:00: Introduction The chapter introduces Kevin, who leads product at OpenAI, and discusses the concept of 'agents'. Agents are systems that can act independently on behalf of a user. The focus of the chapter is on the new tools being launched to help developers build reliable and effective agents. Two new agents have been launched in ChatGPT this year.
            • 01:00 - 02:00: Launch of New Tools for Agents The chapter discusses the launch of new tools designed for agents, including 'Operator', which enables web browsing and online task execution, and 'Deep Research', capable of generating comprehensive reports on any given topic in a short amount of time. The tools are well-received and the plan is to now extend them to developers through an API, following positive feedback.
            • 02:00 - 03:30: Web Search Tool The chapter discusses the challenges faced by developers in building agents, highlighting issues such as the complexity and brittleness of using low-level APIs from multiple sources. It notes that while models with advanced reasoning and multimodal understanding are ready to perform complex multi-step workflows required by agents, integrating these technologies remains difficult. The chapter concludes with excitement about a new development aimed at addressing these challenges, promising to simplify the process.
            • 03:30 - 05:00: File Search Tool In this chapter titled 'File Search Tool', the team introduces a series of new tools, an API, and an open-source SDK intended to simplify processes for developers. The team members include Elan from the developer experience team, Steve from the API team, and Nik from the API product team. They reveal the launch of three new built-in tools, alongside a new API and an open-source SDK, enhancing the developer experience.
            • 05:00 - 06:30: Computer Use Tool The chapter introduces the 'web search tool', a new feature designed to enhance the capability of models to access and retrieve up-to-date, factual information from the internet. This tool ensures that user responses are current and accurate by leveraging a model that has been fine-tuned specifically for this purpose. It is the same tool that powers ChatGPT search, utilizing a variant of the GPT-4o model known for its proficiency in navigating and extracting relevant information from large volumes of web data.
            • 06:30 - 08:00: Responses API The "Responses API" chapter discusses various tools and benchmarks related to API responses. The chapter starts by introducing a benchmark known as SimpleQA, where GPT-4o achieves a state-of-the-art score of 90%. This reflects the strength of current response technologies in maintaining high accuracy levels. Additionally, the chapter mentions another tool, the file search tool, which was released last year within the Assistants API. This tool allows developers to upload and embed documents, enhancing the searchability and access to information. This tool is particularly favored by one of the speakers, Steve.
            • 08:00 - 10:00: Personal Stylist Assistant Demo The chapter introduces two new features in the file search tool: metadata filtering and direct search endpoint. Metadata filtering allows users to add attributes to their files to easily filter them, making searches more efficient. The direct search endpoint enables users to search their vector stores directly, without filtering through the model, thus improving search speeds and accuracy for private data.
            • 10:00 - 12:00: Building Agents and SDK In this chapter, the focus is on the introduction of the 'computer use tool' via the API. This tool allows for the control and automation of computers or virtual machines, especially those with only graphical user interfaces and no API access. It is ideal for automating tasks and building applications without direct API interaction. The 'computer use model' employed is the same as that used by Operator in ChatGPT, and it posts state-of-the-art results on the OSWorld benchmark.
            • 12:00 - 15:00: Swarm and Agents SDK The chapter titled 'Swarm and Agents SDK' discusses the development and user feedback on the CUA model and related tools. Early feedback, along with strong results on the WebArena and WebVoyager benchmarks, suggests a positive reception and excitement about what can be built with these tools. The text also briefly mentions the strategic approach taken to designing an effective API for these tools, emphasizing a first-principles approach. Additionally, the release of chat completions in March 2023 alongside GPT-3.5 Turbo is noted.
            • 15:00 - 17:00: Monitoring and Tracing with Agents SDK The chapter discusses the evolution of API interactions, highlighting the transition from text-only exchanges to multimodal capabilities including images, audio, and tool integration. It introduces the Responses API, a flexible API designed to support multiple turns and tools, optimizing behind-the-scenes operations.
            • 17:00 - 19:00: Future of API and Tools In this chapter, the focus is on exploring the potential of API and tool usage in enhancing various applications. It introduces the Responses API, explaining its simplicity and functionality, which is analogous to chat completions. The chapter demonstrates the API's capabilities through an example project of building a personal stylist assistant, showcasing the practical application and power of the API.
            • 19:00 - 20:00: Conclusion The concluding chapter is light and conversational: the personal stylist demo plays out in front of a large live audience, with the team joking about the mild stress of performing in public while wrapping up the presentation with a look ahead.
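
            The computer use chapter above describes an iterative loop: the model receives a screenshot, returns an action, the harness executes it, and the cycle repeats until the model signals completion. A minimal sketch of that loop, with hypothetical `model_step`, `take_screenshot`, and `execute` callables standing in for the real model and automation harness:

            ```python
            # Sketch of the computer-use control loop described in the talk.
            # `model_step`, `take_screenshot`, and `execute` are hypothetical
            # stand-ins for the real model call and automation harness.

            def run_computer_use_loop(model_step, take_screenshot, execute, max_turns=10):
                """Drive the screenshot -> action -> execute cycle until the
                model returns a final answer instead of another action."""
                for _ in range(max_turns):
                    step = model_step(take_screenshot())
                    if step["type"] == "final":
                        return step["answer"]
                    execute(step)  # click, drag, move, type, ...
                raise RuntimeError("no final answer within turn budget")

            # Stubbed demo: a scripted "model" that clicks once, then finishes.
            script = iter([
                {"type": "click", "x": 10, "y": 20},
                {"type": "final", "answer": "ordered the jacket"},
            ])
            result = run_computer_use_loop(
                model_step=lambda screenshot: next(script),
                take_screenshot=lambda: b"fake-png-bytes",
                execute=lambda action: None,
            )
            # result == "ordered the jacket"
            ```

            In the real demo the harness is a local Docker container that captures screenshots and performs the model's actions; the loop structure is the same.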

            New tools for building agents with the API: Transcript

            • 00:00 - 00:30 hey everyone, I'm Kevin and I lead product at OpenAI. Today we're here to talk about developers and agents, and in particular we're excited to launch a bunch of new tools that make it easy for developers to build reliable and useful agents. Now, when we say agent, we mean a system that can act independently to do tasks on your behalf, and we've launched two agents this year in ChatGPT. The
            • 00:30 - 01:00 first is Operator, which can browse the web and do things for you on the web. The second is Deep Research, which can create detailed reports for you on any topic you want. So you give it a topic and it can go off and do what might be a week's worth of research for you and come back with an answer in 15 minutes. Now, the feedback for those has been fantastic, but we want to now launch those tools and more in the API to developers. So we've spent the last
            • 01:00 - 01:30 couple of months going around talking to developers all over the world about how we can make it easy for them to build agents, and what we've heard is that the models are ready. With advanced reasoning and multimodal understanding, our models can now do the kind of complex multi-step workflows that agents need. But on the other hand, developers feel like they're having to cobble together different low-level APIs from different sources. It's difficult, it's slow, it often feels brittle. So today we're really excited to bring that
            • 01:30 - 02:00 together into a series of tools, a new API, and an open-source SDK to make this a lot easier. So with that, let me introduce the team. Hi, I'm Elan, I'm an engineer on the developer experience team. I'm Steve, I'm an engineer on the API team. And I'm Nik, I work on the API product team. So let's dive into all the stuff that we are launching today. Like Kevin mentioned, we have three new built-in tools, a new API, and an open-source SDK. Starting off with the built-in tools, the
            • 02:00 - 02:30 first tool that we're announcing today is called the web search tool. The web search tool allows our models to access information from the internet, so that your responses and the output that you get are up to date and factual. The web search tool is the same tool that powers ChatGPT search, and it's powered by a fine-tuned model under the hood. So this is a fine-tuned GPT-4o or GPT-4o mini that is really good at looking at large amounts of data retrieved from the web, finding the relevant pieces of
            • 02:30 - 03:00 information, and then clearly citing it in its response. In a benchmark that measures these types of things, which is called SimpleQA, you can see that GPT-4o hits a state-of-the-art score of 90%. So that's the first tool. Steve, do you want to tell us about the second one? Yeah, the second tool is actually my favorite tool, and this is the file search tool. Now, we launched the file search tool last year in the Assistants API as a way for developers to upload, chunk, and embed their documents
            • 03:00 - 03:30 and then do RAG really easily over those documents. Now we're really excited to be launching two new features in the file search tool today. The first is metadata filtering: with metadata filtering, you can add attributes to your files to be able to easily filter them down to just the ones that are the most relevant for your query. The second is a direct search endpoint, so now you can directly search your vector stores without your queries being filtered through the model first. Nice. So you have web search for the public data, file search for the private data that you have, and then the
            • 03:30 - 04:00 third tool that we are launching is the computer use tool. The computer use tool is Operator in the API, but it allows you to control the computers that you are operating. So this could be a virtual machine, it could be a legacy application that just has a graphical user interface and you have no API access to it. If you want to automate those kinds of tasks and build applications on top of that, you can use the computer use tool, which comes with the computer use model. This is the same model that is used by Operator in ChatGPT, and it has state-of-the-art benchmarks on OS
            • 04:00 - 04:30 World, WebArena, and WebVoyager. Early user feedback on the CUA model and the tool has been super positive, so I'm really excited to see what all of you build with it. All right, so those are the three tools. And while we were building these tools and thinking of getting them out, we also wanted to take a first-principles approach to designing the best API for these tools. We released chat completions, I think, in March 2023 alongside GPT-3.5 Turbo, and
            • 04:30 - 05:00 every single API interaction at that time was just text in and text out. Since then, we've introduced multimodality, so you have images, you have audio, we're introducing tools today, and you also have products like o1 pro, deep research, and Operator that make multiple model turns and multiple tool calls behind the scenes. So we wanted to build an API primitive that is flexible enough that it supports multiple turns and it supports tools, and we're calling this new API the Responses API. And to show you
            • 05:00 - 05:30 the Responses API, I'm going to hand it over to Steve. Cool, let's go ahead and take a look at the Responses API. If you've used chat completions before, this will look really familiar to you: you select some context, you pick a model, and you get a response. That's pretty simple. It's pretty simple, and it's always hilarious. So, maybe not, I don't know. To demonstrate the power of the Responses API, we're going to be building sort of a personal stylist assistant. So let's start by giving it some instructions: you
            • 05:30 - 06:00 are a personal stylist. You're only typing in front of like 50,000 people right now, don't worry about it. Cool. And we'll get rid of this and we'll say: what are some of the latest trends? The joke is in the context. Let's see what it says. Okay, okay, cool, great. But no
            • 06:00 - 06:30 personal stylist assistant is complete unless it understands what its users like. So in order to demonstrate this, we've created a vector store that has some entries, almost some diary entries, of what people on the team have been wearing. That's not weird at all. It's not weird at all, I would just let it happen. We've kind of been following people around the office and understanding what they've been up to. There's a whole team on it. Yeah. So go ahead and add the file search
            • 06:30 - 07:00 tool, and I'll copy in my vector store ID. And here I can actually filter down the files in this vector store to just the ones that are relevant to the person that we want to style. So in this case, let's start with Elan. We'll go ahead and filter down to his username, and we'll come back here and we'll refresh and we'll say: can you briefly
            • 07:00 - 07:30 summarize what Elan likes to wear? I often ask ChatGPT this question. Yeah, but it never knows. And now it can actually tell you what Elan likes to wear. Cool, so Elan has a distinct and consistent style characterized by Miami chic. That's really awesome. So the file search tool is a great way to bring information about your users into your application, but in order to create a really good application for this personal stylist, we want to be able to bring in fresh data
            • 07:30 - 08:00 from around the web, so that we have both the newest information and also stuff that's really relevant to your users. In order to demonstrate that, I'll add the web search tool. Cool. The web search tool is really great because you can also add data about where your user is located. So let's try with somebody else. Kevin, are you going to be taking any trips anytime soon? Let's say Tokyo. Okay, cool, Tokyo. So I'll put in Tokyo here and we'll swap in Kevin, and the
            • 08:00 - 08:30 Responses API is really cool because it can do multiple things at once: it can call the file search tool, it can call the web search tool, and it can give you a final answer, all in one API response. So in order to tell it exactly what we want, let's give it some instructions. And it'd be good if I knew how to code. Well, great. You say you're an engineer here. Yeah, well, I'm in training. So what we want the model to do is, when it's asked to recommend products, we want it to use the file
            • 08:30 - 09:00 search tool to understand what Kevin likes, and then use the web search tool to find a store near him where he can buy something that he might be interested in. So let's go back and say: find me a jacket that I would like nearby. And what the model will do is issue a file search tool call to understand what kinds of things Kevin likes to wear, and then issue a web search tool call to go and find stuff that Kevin would like based on
            • 09:00 - 09:30 where he is. So the model was able, just in the scope of one API call, to find a bunch of Patagonia stores in Tokyo for you, Kevin, which actually corresponds to Kevin's preferences; he's been wearing a lot of Patagonia around the office. But no personal stylist assistant would be complete unless it could actually go and make purchases on your behalf. So in order to do that, let's demonstrate the computer use tool. We'll go ahead and add this; we're using the computer use preview model and the computer use preview tool,
            • 09:30 - 10:00 and we will ask: help me find my friend Kevin a new Patagonia jacket. What's your favorite color, Kev? Let's go with black. Black, can't have too many black Patagonia jackets. And what the model will do is ask us for a screenshot. We have a Docker container running locally on this computer, and we will go ahead and send that screenshot to the model. It will look at the state of the computer and issue another action: click, drag, move,
            • 10:00 - 10:30 type. And then we will execute that action, take another screenshot, send it back to the model, and it will continue in this fashion until it feels that it's completed the task, and then return a final answer. So while this is going and doing its thing, we'll hand it back to Nik. Yeah, awesome. So these are some really cool tools and a really flexible API for you to build agents, and you have amazing building blocks to do that now. But for those of you who have built more complex applications, like say you're building a customer support agent, it's
            • 10:30 - 11:00 not always about just having one agent that's sort of the personal stylist. You also have some agentic application that's doing your refunds, you have another thing that's answering customer support FAQ queries, you have something else that's dealing with orders and billing, etc. And to make these applications easy to build, we released an SDK last year called Swarm, and Swarm made it easy to do agent orchestration. This was supposed to be an experimental and educational thing, but so many of you took it to production
            • 11:00 - 11:30 anyway, so you're kind of forcing our hand over here. So we've decided to take Swarm and make it production-ready, add a bunch of new features, and we're going to be rebranding it to be called the Agents SDK. Elan helped build Swarm, so I'm going to hand it over to him to tell you more about how it works. Yeah, thanks, Nik. So in my time at OpenAI, I've spent a lot of time working with enterprises and builders to help them build out agentic experiences, and I've seen firsthand how pretty
            • 11:30 - 12:00 simple ideas can actually grow in complexity when you actually go to implement them. So the idea with the Agents SDK is to keep simple ideas simple to implement, while allowing you to build more complex and robust ideas in a pretty straightforward and simple way. So let's take a look at what Steve had before in the demo, but implemented using the Agents SDK. It's going to look very similar at first: we have our agent defined here, we have some instructions, and we also have both of the tools,
            • 12:00 - 12:30 the file search tool and web search tool, that we had before. Is this using responses under the hood? Yeah, by default this is using the Responses API, but we actually support multiple vendors; anything that fits the chat completions shape can work with the Agents SDK. Nice. So during the practice runs we actually accidentally ordered many, many Patagonias, so I'm sorry. I understand. What's the problem? We're helping you here. We want to return some of them, and to do that, I could
            • 12:30 - 13:00 usually just add in a returns tool, add more to this prompt, and get it to work. But the problem with that is you start to mix all of this business logic, which makes your agents a little bit harder to test. And this is the power of multiple agents: you can actually separate your concerns and develop and test them separately. So let's introduce an agent specifically to deal with returns. I'm going to load mine in, and great, so we still have our agent from before, but you can see
            • 13:00 - 13:30 there's also this new agent, the customer support agent, here, and I've defined a couple of tools for it to use: get_past_orders and submit_refund_request. And you might notice these are just regular Python functions. This is actually a feature that people really loved in Swarm that we brought over to the Agents SDK: we'll take your Python functions, look at the type signatures, and then automatically generate the JSON schema that the models need to use to perform
            • 13:30 - 14:00 those function calls. And then once they do, we actually run the code and return the results, so you can just define these functions as they are. Now we have our two agents, right? We have the stylist agent and we have the customer support refunds agent. So how do we interact with both of them as a user? This is where the notion of handoffs comes in. A handoff is actually a pretty simple idea, but it's pretty powerful: it's when you have one conversation where one agent is
            • 14:00 - 14:30 handling it, and then it hands it off to another, where you keep the entire conversation the same but behind the scenes you just swap out the instructions and the tools. This gives you a way to triage conversations and load in the correct context for each part of the conversation. So what we've done here is created this triage agent that can hand off to the stylist agent or the customer support agent. So enough talking, let's actually see this in action. I'm going to save and say: you know, I think we may
            • 14:30 - 15:00 have ordered one too many Patagonias, can you help me return them? I don't understand. I know, I'm so sorry, I can get you one later. So what just happened here is it started off by transferring, remember we're starting with the triage agent, to the customer support agent, and this is just a function call that I'll show you in a second. And then the customer support agent proactively called the get_past_orders function, where we can see all of Kevin's Patagonias. I think you'll be
            • 15:00 - 15:30 okay. Cool. So to actually see what happened behind the scenes, usually you might need to add some debugging statements by hand, but one of the things that the Agents SDK brings right out of the box is monitoring and tracing. So I'm going to go over to the tracing UI that we have on our platform to take a look at what just happened. These are some of the previous runs that we've had. I'm just refreshing the page, and we can see the last one, and in this last one you can see exactly what happened: we started with the triage agent,
            • 15:30 - 16:00 which we sent a request to, it made a handoff, and then switched over to the customer support agent, which called the function. Now we can see what the original input was, and handoffs are first-class objects in this dashboard, so you can see not only which agent we actually handed off to, but any handoffs it had as options that it did not take, which is actually a really useful feature for debugging. Afterward, once we're in the customer support agent, you can see the get_past_orders function call with any input params; here there were none,
            • 16:00 - 16:30 and then the output is, again, just all of Kevin's very monotonous history. And then finally we get to the end, where you get a response. So these are some of the features that you get right out of the box with the Agents SDK. There are a few more: we also have built-in guardrails that you can enable, we have lifecycle events, and importantly, this is an open-source framework, so we're going to keep building it out. And you can install it right now: you can just do pip install openai-agents,
            • 16:30 - 17:00 and we'll have one for JavaScript coming soon. But to close this off, let's actually perform the refund. You know what, I'm sorry Kevin, get rid of all of them. Oh, what am I going to wear? Kevin's going to be cold. Yeah, let's see, it's a lot of them. There we go. It takes a while to return so many Patagonias. And so
            • 17:00 - 17:30 what happens under the hood? How do you debug this, how do you understand more about what's going on? Yeah, that we can all see back in the tracing UI. So this is a pretty nice, straightforward way to build out these experiences. Yeah, awesome. Back to you, Nik. I'm so excited for all of you to have access to all of these tools. Before we wrap up, I wanted to make two additional points. First, we've introduced the Responses API, but the chat completions API is not going away; we're going to continue
            • 17:30 - 18:00 supporting it with new models and capabilities. There will be certain capabilities that require built-in tool use, and there will be certain models and agentic products that we release in the future that will require them, and those will be available in the Responses API only. Responses API features are a superset of what chat completions supports, so whenever you decide to migrate over, it should be a pretty straightforward migration for you, and we hope you love the developer experience of responses, because we put a lot of thought into it. The second point I
            • 18:00 - 18:30 wanted to make was around the Assistants API. We built the Assistants API based on all the great feedback that we got from our beta users, and we wouldn't be here without all the learnings that we had during the Assistants API phase. We are going to be adding more features to the Responses API so that it can support everything that the Assistants API can do, and once that happens, we'll be sharing a migration guide that makes it really easy for all of you to migrate your
            • 18:30 - 19:00 applications from assistants to responses without any loss of functionality or data. We'll give you ample time to move things over, and once we're done with that, we plan to sunset the Assistants API sometime in 2026. We'll be sharing a lot more details about this offline as well. But yeah, that's it for me, I'll hand it over to Kevin to wrap us up. Awesome. Well, we're super excited to announce the Responses API and the idea that we can take a single powerful API and
            • 19:00 - 19:30 bring together a whole bunch of different tools, from RAG and file search to web search to CUA and our Operator computer use APIs. Now you can count on us to continue building powerful new models, bringing more intelligence and more powerful tools to help you build better agents. 2025 is going to be the year of the agent. It's the year that ChatGPT and our developer tools go from just answering questions to actually doing
            • 19:30 - 20:00 things for you out in the real world. We're super excited about that. We're just getting started, we know you are too, and we can't wait to see what you build.
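
            The handoff idea from the Agents SDK portion of the talk can be sketched in plain Python. This is an illustration of the concept only, not the real `openai-agents` API: one conversation, where a handoff swaps which agent's instructions and tools are active. The keyword-based `triage` router below is a toy stand-in for the model-driven triage agent in the demo.

            ```python
            # Conceptual sketch of Agents SDK-style handoffs in plain Python.
            # Not the real openai-agents API; names and routing logic are
            # illustrative stand-ins for the demo's model-driven triage agent.
            from dataclasses import dataclass, field

            @dataclass
            class Agent:
                name: str
                instructions: str
                tools: dict = field(default_factory=dict)

            # Specialist agents with separated concerns, as in the demo.
            stylist = Agent("stylist", "You are a personal stylist.")
            support = Agent(
                "customer_support",
                "Handle returns and refunds.",
                tools={
                    "get_past_orders": lambda: ["patagonia jacket"] * 3,
                    "submit_refund_request": lambda order: f"refunded {order}",
                },
            )

            def triage(message: str) -> Agent:
                """Toy router: a real triage agent lets the model decide;
                here we route on keywords just to show the handoff shape."""
                if any(w in message.lower() for w in ("return", "refund")):
                    return support
                return stylist

            # The conversation stays the same; only the active agent changes.
            active = triage("We ordered one too many Patagonias, can you return it?")
            receipt = active.tools["submit_refund_request"](
                active.tools["get_past_orders"]()[0]
            )
            # receipt == "refunded patagonia jacket"
            ```

            The payoff of separating agents this way, as Elan notes in the talk, is testability: the refund logic and the stylist logic can be developed and verified independently, while the triage layer decides who handles each turn.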