Training Your Own AI Model Is Not As Hard As You (Probably) Think


    Summary

    In this informative video, Steve from Builder.io shares insights on how training your own AI model isn't as daunting as it may seem. With basic development skills, you can achieve faster, cheaper, and more effective results compared to using large, off-the-shelf models like those from OpenAI. Through a step-by-step guide, Steve explains the process of identifying your use case, breaking it down, and utilizing platforms like Google's Vertex AI to train specialized models tailored to your needs.

      Highlights

      • Training specialized AI models can be over 1,000 times faster and cheaper than using large models.
      • Breaking problems into smaller pieces helps identify which tasks can be solved with simple code.
      • Using Google's Vertex AI, training your model without direct coding is possible and efficient.
      • Manually verifying the quality of training data is essential for optimal performance.
      • A combination of specialized models and traditional coding allows for flexible and customizable AI solutions.

      Key Takeaways

      • Training your own AI model can be more efficient and cost-effective than using large, general-purpose models.
      • Breaking down your problem and understanding your specific needs is crucial for customizing AI solutions.
      • Platforms like Google Vertex AI facilitate the training of specialized models without extensive coding.
      • Creating your own data set ensures higher quality training and more reliable AI performance.
      • Despite the convenience of pre-existing models, plain code can sometimes be the simplest and fastest solution.

      Overview

      In today's tech-savvy world, training your own AI model could be a game-changer for developers, offering a more tailored, cost-effective approach. Steve from Builder.io demonstrates how with fundamental development skills, training a specialized AI model can yield superior results in speed, cost, and reliability compared to using large, general-purpose models from firms like OpenAI.

        Steve emphasizes the importance of dissecting your problem into manageable chunks and experimenting with pre-existing models to discern their efficacy for your specific use case. If limitations arise, such as high costs or lack of customization, then opting to train your own model might be a preferable route. This involves identifying suitable models for your requirements and generating exemplary data sets, which Steve clearly illustrates using their experience with turning Figma designs into code.

          Steve highlights that tools like Google's Vertex AI let users train their models without complex coding, which simplifies the development process. He stresses the importance of data quality: the model can only be as good as the data it is trained on. Ultimately, combining specialized models with traditional code gives developers robust, flexible solutions.
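
          The transcript below describes uploading programmatically generated training data as an image path plus a list of bounding boxes. The following is a minimal Python sketch of producing one such record, not the author's actual code. The field names (`imageGcsUri`, `boundingBoxAnnotations`, `displayName`, `xMin`, `yMin`, `xMax`, `yMax`) are assumed to match Vertex AI's JSON Lines import schema for image object detection and should be checked against the current documentation. The bucket path, the `image` label, and the function name are hypothetical.

```python
import json


def to_annotation_line(image_uri, width, height, boxes, label="image"):
    """Convert pixel boxes (left, top, right, bottom) into one JSON Lines record.

    Coordinates are normalized to the 0..1 range relative to the image size.
    """
    annotations = []
    for left, top, right, bottom in boxes:
        # Reject degenerate or out-of-bounds boxes rather than training on them.
        if not (0 <= left < right <= width and 0 <= top < bottom <= height):
            raise ValueError(f"bad box {(left, top, right, bottom)} for {image_uri}")
        annotations.append({
            "displayName": label,
            "xMin": left / width,
            "yMin": top / height,
            "xMax": right / width,
            "yMax": bottom / height,
        })
    return json.dumps({"imageGcsUri": image_uri, "boundingBoxAnnotations": annotations})


# Hypothetical example: one 1000x800 screenshot with a single image region.
line = to_annotation_line(
    "gs://example-bucket/page-001.png", 1000, 800, [(100, 80, 500, 400)]
)
print(line)
```

          The validation step is deliberately strict: a crawler can emit zero-size or off-screen boxes, and failing loudly is one cheap way to keep such records out of the data set before any manual review.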

            Chapters

            • 00:00 - 01:00: Introduction and Motivation for Custom AI Models. This chapter discusses the reasons for training a custom AI model instead of using large language models (LLMs) like OpenAI's GPT-3 and GPT-4. The speaker argues that training your own model is simpler than expected with basic development skills and can lead to significantly faster, cheaper, and more effective outcomes. Applying LLMs to their specific problem performed poorly: the approach turned out to be slow, costly, unpredictable, and difficult to customize.
            • 01:00 - 02:30: Challenges with Large Language Models (LLMs). The narrator discusses the challenges encountered when working with LLMs and the solution they chose. Instead of relying on existing models, they trained their own smaller, specialized models, which proved to be over 1,000 times faster and cheaper than larger, general-purpose models. The new models met their specific needs better and were more predictable, reliable, and customizable. They emphasize breaking problems down into smaller tasks in order to train a specialized AI model efficiently, and share their example of automating the conversion of Figma designs into high-quality code.
            • 02:30 - 04:00: Breaking Down the Problem. This chapter focuses on strategies for model selection and deployment. The first step is to assess whether an existing model can solve your problem, which enables quicker market entry and user testing. It cautions about potential drawbacks such as cost, speed, or lack of customization; if those drawbacks become significant, training a custom model on the side can be a prudent plan.
            • 04:00 - 06:30: Object Detection Model Training. This chapter discusses the poor results obtained from popular general-purpose models, such as feeding them Figma designs and expecting React components. It highlights the inefficiency and unpredictability of these models on specific tasks, emphasizing the need for custom model training.
            • 06:30 - 07:30: Data Generation and Quality Assurance. This chapter discusses the challenges of training large AI models in the context of converting Figma designs into code. It addresses the common misconception that one large model with vast amounts of data can easily resolve complex tasks, and notes that training huge models is expensive in both time and money, which makes every iteration slow.
            • 07:30 - 09:00: Training with Google's Vertex AI. This chapter discusses the iterative nature of model training and the considerable time, cost, and expertise it requires. It looks at the challenge of generating or sourcing large datasets and questions the feasibility of manually creating vast quantities of design-to-code data, particularly across varied styling options such as Tailwind.
            • 09:00 - 10:30: Testing and Confidence Thresholds. This chapter compares using AI models for complex problems with traditional coding. The speaker suggests that relying on a single all-encompassing AI model is probably not the right approach at present. Instead, when faced with such a problem, try to solve as much of it as possible without AI first. Doing so breaks the problem into smaller parts, each manageable with traditional code.
            • 10:30 - 15:00: Combining Specialized Models and LLM. This chapter discusses combining specialized models with large language models (LLMs). Some of the sub-problems turned out to be easy to solve with plain code, while others were harder. It highlights iterating creatively on a problem to find solutions, and introduces training a specialized model to identify images as one part of the solution.
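
            The combination described in the last chapter (specialized models for image identification and layout, plain code for styles and code generation, and an LLM only for final customization) can be sketched as a simple pipeline. This is a hypothetical Python illustration of the shape of that architecture, not Builder.io's implementation. Every function here is a stand-in invented for the example; the real stages would call trained models or an LLM.

```python
from typing import Callable, Optional


def detect_images(design: dict) -> dict:
    # Stand-in for a specialized object detection model.
    return {**design, "images": ["hero-banner"]}


def build_layout(design: dict) -> dict:
    # Stand-in for a second specialized model that infers layout hierarchy.
    return {**design, "layout": ["header", "main"]}


def generate_code(design: dict) -> str:
    # Plain code: deterministic, cheap, and easy to test.
    sections = "".join(
        f"<section id='{name}'></section>" for name in design["layout"]
    )
    return f"<div>{sections}</div>"


def convert(design: dict, customize: Optional[Callable[[str], str]] = None) -> str:
    """Run the specialized stages, then optionally hand baseline code to an LLM step."""
    design = build_layout(detect_images(design))
    code = generate_code(design)
    # The slow, costly LLM step runs only when customization is requested.
    return customize(code) if customize else code


baseline = convert({"name": "landing-page"})
print(baseline)
```

            Keeping the LLM as an optional last stage means the baseline output stays fast and predictable, and the expensive step is paid for only when a user asks for adjustments.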

            Training Your Own AI Model Is Not As Hard As You (Probably) Think Transcription

            • 00:00 - 00:30 Training your own AI model is a lot easier than you probably think. I'll show you how to do it with only basic development skills, in a way that, for us, yielded wildly faster, cheaper, and better results than using an off-the-shelf large model like those provided by OpenAI. But first, why not just use an LLM? In our experience, we tried to apply an LLM to our problem, like OpenAI's GPT-3 and GPT-4, but the results were very disappointing. For our use case it was incredibly slow, insanely expensive, highly unpredictable, and very difficult
            • 00:30 - 01:00 to customize. So instead we trained our own model. It wasn't as hard as we anticipated, and because our models were small and specialized, they were over 1,000 times faster and cheaper. They not only served our use case better but were more predictable, more reliable, and of course far more customizable. So let's break down how you can train your own specialized AI model like we did. First, you need to break down your problem into smaller pieces. In our case, we wanted to take any Figma design and automatically convert that into high-quality code. In
            • 01:00 - 01:30 order to break this problem down, we first explored our options. The first one I'd suggest you always try is basically what I suggested not to do, which is to see if you can solve your problem with a pre-existing model. If you find this effective, it can allow you to get a product to market faster and test on real users, as well as understand how easy to replicate this might be for competitors. And ultimately, if you find this works well for you but some of those drawbacks I mentioned become a problem, such as cost, speed, or customization, you could train your own model on the side and keep refining
            • 01:30 - 02:00 it until it outperforms the LLM you tried first. But in many cases you might find that these popular general-purpose models just don't work well for your use case at all. In our case, we tried feeding it Figma designs as raw JSON data and asking for React components out the other side, and it just frankly did awful. We also tried GPT-4V, taking screenshots of Figma designs and getting code out the other side, and similarly the results were highly unpredictable and often terribly bad. So if you can't just pick up and use a model off the shelf,
            • 02:00 - 02:30 now we need to explore what it would look like to train our own. A lot of people have the intuition: let's just make one big giant model where the input is the Figma design and the output is the fully finished code. We'll just apply millions of Figma designs with millions of code snippets and we'll be done; the AI model will solve all our problems. The reality is a lot more nuanced than that. First, training a large model is extremely expensive. The larger it is and the more data it needs, the more costly it is to train. Large models also take a lot of time to train, so as you iterate
            • 02:30 - 03:00 and make improvements, your iteration cycles can be days at a time waiting for training to complete. And even if you can afford that amount of time and expense, and have the expertise needed to make these large, complicated custom models, you may not have any way to generate all the data you need anyway. If you can't find this data on the open web, then are you really going to pay thousands of developers to hand-code millions of Figma designs into React or any other framework, let alone all the different styling options like Tailwind versus
            • 03:00 - 03:30 Emotion versus CSS Modules? It just becomes an impossibly complex problem to solve, and a super-duper model that just does everything for us is probably not the right approach here, at least not today. When you run into problems like this, I would highly recommend trying to swing the pendulum to the complete other end and try as hard as you can to solve as much of this problem as possible without AI whatsoever. That forces you to break the problem down into lots of discrete pieces that you can write normal, traditional code for, and see how far you can solve this. In my experience,
            • 03:30 - 04:00 however far you think you can solve it, with some iteration and creativity you can get a lot farther than you think. When we tried to break this problem down into just plain code, we realized that there were a few different specific problems we had to solve. In our findings, at least two of the five problems were really easy to just solve with code. Where we hit challenges was in those other three areas. So let's take that first step of identifying images and cover how we can train our own specialized model to solve this use case. You really only need two key things to
            • 04:00 - 04:30 train your own model these days: first, you need to identify the right type of model, and second, you need to generate lots of example data. In our case, we were able to find that a very common type of model that people train is an object detection model, which can take an image and return some bounding boxes on where it found specific types of objects, in this case locating the three cats. And so we asked ourselves, could we train this on a slightly novel use case, which is to take a Figma design as an image, which uses hundreds of vectors throughout, but for a
            • 04:30 - 05:00 website or mobile app certain groups of those should really be compressed into one single image, and it can identify where those image points would be, so we can compress those into one and generate the code accordingly? So that leads us to step two: we need to generate lots of example data and see if training this model accordingly will work out for our use case. We thought, wait a second, could we derive this data from somewhere, somewhere that's public and free? Just like tools like OpenAI did, where they crawl through tons of public data on the web and GitHub and use that as the basis
            • 05:00 - 05:30 of their training. Ultimately we realized, yes. We wrote a simple crawler that pulls up a website in a headless browser and then evaluates some JavaScript on the page to identify where the images are and what their bounding boxes are, which was able to generate a lot of training data for us really quickly. Now keep in mind one critical thing: the quality of your model is entirely dependent on the quality of your data. So out of hundreds of examples we generated, we manually went through and used engineering to verify that every single
            • 05:30 - 06:00 bounding box was correct every time, and used a visual tool to correct it anytime they weren't. In my experience, this can become one of the most complex areas of machine learning, which is building your own tools to generate, QA, and fix data to ensure that your data set is as immaculate as possible, so that your model has the highest-quality information to go off of. Now, in the case of this object detection model, luckily we used Google's Vertex AI, which has that exact tooling built in. In fact, Vertex AI is how we uploaded all that data and
            • 06:00 - 06:30 trained the model without even needing to do that in code at all. All you need to do is go to the Vertex AI section of the Google Cloud console, go to Datasets, and hit Create. We then can choose that we're using an object detection model and hit Create, and now you just need to upload your data. You can do it manually by selecting files from your computer and then use their visual tool to outline the areas that matter to us, which is a huge help that we don't have to build that ourselves. Or in our case, because we generated all of our data programmatically, we can just upload it to Google Cloud in
            • 06:30 - 07:00 this format, where you provide a path to an image and then list out the bounding boxes of the objects you want to identify. Then back in Google Cloud you can manually verify or tweak your data as much as you need, and then once your data set's in shape, all we need to do is train our model. I use all the default settings here and I use the minimum amount of training hours. This is the one piece that will cost you some money. In this case, the minimum amount of training needed costs about $60. Now that's a lot cheaper than buying your own GPU and
            • 07:00 - 07:30 letting it run for hours or days at a time, but if you don't want to pay a cloud provider, training on your own machine is still an option. There are a lot of nice Python libraries that are not complicated to learn where you can do this too. Once you hit Start Training, in our case it took about three real-world hours. Then you can find your training results and deploy your model, which in this case I've already done. That can take a couple of minutes, and then you'll have an API endpoint that you can send an image to and get back a set of bounding boxes with their confidence levels. We
            • 07:30 - 08:00 could also use the UI here as well. So to test it out now in Figma, I'm just going to take a screen grab of a portion of this Figma file, because I'm lazy, and I can just upload it to the UI to test. And there we go. We can see it did a decent job, but there are some mistakes here. But there's something important to know: this UI is showing all possible images regardless of confidence. When I take my cursor and I hover over each area that has high confidence, these are spot-on, these are perfect, look at that. And the
            • 08:00 - 08:30 strange ones are the ones down here with really low confidence. I mean, these are just wrong, but that works as expected. This even gives you an API where you can specify to only return results above a certain confidence threshold. By looking at this, I think we want a threshold of at least 0.2. And there you have it. With a specialized model we can run it wildly faster and cheaper. When we broke down our problem, we found that for image identification a specialized model was a much better solution. For building the layout hierarchy, similarly, we made our own specialized model for that too.
            • 08:30 - 09:00 For styles and basic code generation, plain code was a perfect solution. And don't forget, plain code is always the fastest, the cheapest, the easiest to test, the easiest to debug, the most predictable, and just the best thing. Whenever you can use it, absolutely just do that. And then finally, to allow people to customize their code, name it better, or use different libraries than we already support, we used an LLM for the final step. Now that we're able to take a design and make baseline code, LLMs are very good at taking basic code and
            • 09:00 - 09:30 making adjustments to the code, giving you new code with small changes back. So despite all my complaints about LLMs, and the fact that I still hate how slow and costly that step is in this pipeline, it was and continues to be the best solution for that one specific piece. And now when we bring all that together and launch the Builder.io Figma importer, all I need to do is click Generate Code. We will rapidly run through those specialized models and launch it to the Builder visual editor, where we converted that design into responsive, pixel-perfect code that we can output as high
            • 09:30 - 10:00 quality React, Qwik, Vue, etc. code, and even change options to use popular styling frameworks like Tailwind, etc., doing all this super cool AI magic, and you can just copy and paste right into your codebase. And luckily, because we created this entire toolchain, all of that's in our control. And that's it. To quickly recap: I would always recommend testing an LLM for your use case just for exploratory purposes, but if it's not hitting the mark, write plain old code as much as you possibly can, and where you hit bottlenecks, see if you can find a
            • 10:00 - 10:30 specialized type of model that you can train, generating your own data and using a product like Vertex AI or many others, and create your own robust, incredible toolchain to wow your users with exciting feats of engineering that they maybe have never seen before. For a more detailed breakdown of everything I just showed you here, check out my latest blog post on the Builder.io blog, and I can't wait to see what you go and build.
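
            The confidence threshold discussed at 08:00 in the transcript can also be applied on the client side. The following is a minimal Python sketch, assuming the model's predictions have already been parsed into simple records. The `Detection` type, its field names, and the sample values are hypothetical and are not the Vertex AI response format; only the 0.2 threshold comes from the video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max), normalized to 0..1


def filter_by_confidence(detections, threshold=0.2):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up predictions: two confident hits and one low-confidence false positive.
detections = [
    Detection("image", 0.97, (0.10, 0.05, 0.45, 0.40)),
    Detection("image", 0.62, (0.55, 0.05, 0.90, 0.40)),
    Detection("image", 0.04, (0.20, 0.80, 0.25, 0.85)),
]

kept = filter_by_confidence(detections)
print(len(kept))
```

            The right threshold depends on the model and the data, so it is worth choosing it the way the video does: by inspecting real predictions and seeing where the wrong boxes start to appear.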