The Grand Finale of Gen AI Learning

DAY 5 Livestream - 5-Day Gen AI Intensive Course | Kaggle

Estimated read time: 1:20


    Summary

    In the final session of the Kaggle 5-Day Generative AI Intensive Course, over 250,000 developers tuned in to learn about applying MLOps principles to generative AI applications on platforms like Vertex AI. The session covered topics such as operationalizing GenAI at scale, prompt engineering, fine-tuning, and handling generative model evaluations, with insights shared by experts from Google DeepMind and Google Cloud. It culminated with a capstone project challenge for participants to showcase their learnings.

      Highlights

      • The course culminated with insights into deploying GenAI at scale using Vertex AI 🌐
      • MLOps for GenAI includes new components like prompt engineering and fine-tuning πŸ› οΈ
      • The experts discussed how to manage and monitor agent-based AI applications πŸ€– (a minimal function-calling sketch follows this list)
      • A capstone project was launched for participants to apply their learning from the course πŸŽ“
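
Throughout the session, agents are described as models that call tools ("function calling") to act on the outside world. As a rough illustration of that mechanic, here is a minimal sketch using the google-generativeai Python SDK; the model name, the stub get_order_status function, and the GEMINI_API_KEY environment variable are assumptions for the example, not part of the course materials.

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def get_order_status(order_id: str) -> str:
    """Pretend lookup against an internal system; a stub standing in for a real API call."""
    return f"Order {order_id} shipped yesterday."

# Passing a Python function as a tool lets the SDK build the function declaration for the model.
model = genai.GenerativeModel("gemini-1.5-flash", tools=[get_order_status])
chat = model.start_chat(enable_automatic_function_calling=True)

# The model decides to call get_order_status, the SDK executes it, and the final answer
# incorporates the tool's result.
response = chat.send_message("Where is order 12345?")
print(response.text)
```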

      Key Takeaways

      • Generative AI MLOps requires new tools and workflows like prompt management, RAG, and agent orchestration πŸ› οΈ
      • Vertex AI provides a robust platform for deploying and monitoring generative AI applications πŸš€
      • Evaluating generative models involves complex metrics beyond traditional accuracy, including fluency, relevance, and safety βœ… (an LLM-as-judge sketch follows this list)
      • The introduction of agents in AI brings new complexities, especially in managing interactions and ensuring security πŸ”
      • Developers were encouraged to stay adaptable and continuously learn as AI technologies are rapidly evolving 🌟
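
The evaluation takeaway above is the one the speakers return to most. Below is a minimal, illustrative LLM-as-judge ("autorater") sketch of the pointwise and pairwise checks discussed in the session; it assumes the google-generativeai SDK, a GEMINI_API_KEY environment variable, and an invented rubric and model choice rather than the course's exact setup.

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
judge = genai.GenerativeModel("gemini-1.5-pro")

RUBRIC = """Rate the RESPONSE to the PROMPT on a 1-5 scale for each of:
fluency, relevance, safety. Reply with three lines, e.g. "fluency: 4"."""

def pointwise_eval(prompt: str, response: str) -> str:
    """Score a single response against the rubric (pointwise evaluation)."""
    result = judge.generate_content(
        f"{RUBRIC}\n\nPROMPT:\n{prompt}\n\nRESPONSE:\n{response}"
    )
    return result.text

def pairwise_eval(prompt: str, response_a: str, response_b: str) -> str:
    """Ask the judge which of two candidate responses is better (pairwise, side-by-side)."""
    result = judge.generate_content(
        "Given the PROMPT, say whether RESPONSE A or RESPONSE B is better and why.\n"
        f"PROMPT:\n{prompt}\n\nRESPONSE A:\n{response_a}\n\nRESPONSE B:\n{response_b}"
    )
    return result.text

if __name__ == "__main__":
    print(pointwise_eval(
        "Summarize MLOps for GenAI in one sentence.",
        "MLOps for GenAI adapts CI/CD, evaluation and monitoring to prompts, RAG and agents.",
    ))
```

Pairwise comparison is useful precisely because, as the speakers note, pointwise scores tend to produce ties; asking the judge to pick between two candidates usually discriminates better.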

      Overview

      This intensive course ended with day five focusing on the deployment of generative AI at scale using MLOps principles. The session explored how foundational models are built and maintained, especially when it comes to operationalizing them using platforms like Google’s Vertex AI. The focus was particularly on using MLOps to manage GenAI’s life cycle and orchestration.

        Participants dived into the nitty-gritty of managing prompts, fine-tuning data sets, and creating evaluation and monitoring scripts to ensure performance and governance. The integration of agents, especially in handling complex, multi-step tasks, was heavily analyzed as it marked a shift from traditional ML models.

          To solidify their learning, participants were introduced to a capstone project to demonstrate their acquired skills and knowledge. Through this project, individuals or groups could showcase hands-on application by building a project using generative AI, aiming to foster continued learning and innovation.
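
The overview mentions treating prompts as first-class, versioned artifacts. A minimal sketch of that idea is below, assuming nothing beyond the Python standard library: each prompt template is stored as a record pinned to a model name, so a deployment can reference an exact prompt version. The directory layout and hashing scheme are illustrative choices, not the course's tooling; in practice these files would live in the same repository and CI/CD flow as the rest of the application.

```python
import hashlib
import json
from pathlib import Path

PROMPT_DIR = Path("prompts")  # hypothetical directory, versioned in git alongside the code

def register_prompt(name: str, template: str, model: str) -> str:
    """Save a prompt template pinned to a specific model, keyed by a short content hash."""
    version = hashlib.sha256(f"{model}:{template}".encode()).hexdigest()[:8]
    record = {"name": name, "model": model, "version": version, "template": template}
    PROMPT_DIR.mkdir(exist_ok=True)
    (PROMPT_DIR / f"{name}-{version}.json").write_text(json.dumps(record, indent=2))
    return version

def load_prompt(name: str, version: str) -> dict:
    """Load one exact prompt version so deployments stay reproducible."""
    return json.loads((PROMPT_DIR / f"{name}-{version}.json").read_text())

v = register_prompt(
    "support-summarizer",
    "Summarize the customer ticket below in two sentences:\n{ticket}",
    model="gemini-1.5-flash",
)
print(load_prompt("support-summarizer", v)["template"].format(ticket="My order arrived late."))
```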

            Chapters

            • 00:00 - 01:00: Welcome and Introduction In this chapter, the course organizers welcome participants to the final day of the Kaggle Generative AI intensive course. The course has attracted over 250,000 developers globally who are eager to learn about building AI applications using Gemini APIs, vector databases, and agents. The organizers express excitement about the final session, hinting at surprises planned towards the end, encouraging attendees to stay attentive as the session concludes.
            • 01:00 - 06:00: Course Overview and Day 5 Focus The chapter titled 'Course Overview and Day 5 Focus' features a discussion led by Paige Bailey, the engineering lead for the developer relations team at Google DeepMind. The purpose of this session is to provide an overview of the course with a specific focus on the content of Day 5, which centers around MLOps for generative AI. The host highlights the support of Kaggle as a sponsor, encouraging exploration of their resources such as models and data sets. The session aims to enlighten participants about the steps and considerations in implementing machine learning operations for generative AI applications.
            • 06:00 - 10:00: Operationalizing GenAI at Scale The chapter 'Operationalizing GenAI at Scale' focuses on the process of taking applications built with GenAI technology and integrating them into a robust production system. The chapter acknowledges the hard work and dedication of moderators like Brenda Flynn and Kal, who have tirelessly managed logistics, coordinated digital spaces, and responded to queries on platforms like Discord, ensuring smooth operations behind the scenes.
            • 10:00 - 20:00: Q&A Part 1: MLOps and GenAI The chapter titled 'Q&A Part 1: MLOps and GenAI' discusses the curriculum related to operationalizing Generative AI (GenAI) at scale using MLOps on Google Cloud's Vertex AI. The introduction indicates that the session will go over foundational model-building and the creation of effective prompts to input into these models. It's a continuation of the topics covered in previous days, focusing on scaling and operational practices around GenAI.
            • 20:00 - 31:00: Q&A Part 2: Evaluation and Challenges The chapter 'Q&A Part 2: Evaluation and Challenges' covers the use of embeddings as semantic representations of data, including querying them at scale. It also discusses dynamic agents and their interaction with the external world to solve tasks. Additionally, the chapter explores domain-specific models, highlighting the creation of specialized models for particular tasks and fields, such as in the medical field.
            • 31:00 - 40:00: Demo: Agent Starter Pack The chapter titled 'Demo: Agent Starter Pack' discusses the integration of medicine and cybersecurity with AI, focusing on practical applications. It highlights the need for adapting MLOps principles specifically for the generative AI lifecycle, emphasizing that traditional methods are not directly applicable. The chapter outlines key phases such as discovery, development, experimentation, and evaluation in this context.
            • 40:00 - 47:00: Q&A Part 3: Future of MLOps for GenAI The chapter discusses the evolving aspects of deployment, monitoring, and governance, which need to be specifically adapted for GenAI (Generative AI). It highlights the shift in the development paradigm where GenAI begins with foundational models, adapting them through prompt engineering. Prompts are likened to data and code artifacts, tailored for specific models or model versions. The chapter also touches on the integration of various components using methods like chaining.
            • 47:00 - 61:00: Closing Remarks and Capstone Project Introduction The final chapter, titled "Closing Remarks and Capstone Project Introduction," covered several advanced topics in the realm of machine learning and artificial intelligence. It discussed the use of RAG (retrieval-augmented generation) to enhance the quality of model outputs, which is pivotal for ensuring high-performance results in AI applications (a minimal RAG sketch follows this chapter list). The chapter also delved into model adaptation and data handling practices. Key techniques like supervised fine-tuning and Reinforcement Learning with Human Feedback (RLHF) were highlighted as crucial components of the MLOps pipeline. These approaches are essential for tailoring models to specific tasks and enhancing their performance. Furthermore, the importance of using a diverse range of data was emphasized, which includes prompts, grounding data, task-specific datasets, and human feedback, all critical for improving model accuracy and robustness.
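
Since several chapters reference chaining and RAG, here is a minimal retrieval-augmented generation sketch. It assumes the google-generativeai SDK, numpy, and a GEMINI_API_KEY environment variable; the toy document set and model names are placeholders, and a real system would use a managed vector store rather than an in-memory array.

```python
import os
import numpy as np
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

DOCS = [
    "Vertex AI provides managed endpoints with monitoring for deployed models.",
    "Prompts should be versioned like code because they are tied to a model version.",
]

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts with a Gemini embedding model."""
    resp = genai.embed_content(model="models/text-embedding-004", content=texts)
    return np.array(resp["embedding"])

doc_vectors = embed(DOCS)

def answer(question: str) -> str:
    """Retrieve the closest document by cosine similarity, then ground the answer on it."""
    q = embed([question])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = DOCS[int(scores.argmax())]
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(
        f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    ).text

print(answer("Why should prompts be version controlled?"))
```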

            DAY 5 Livestream - 5-Day Gen AI Intensive Course | Kaggle Transcription

            • 00:00 - 00:30 hello everyone welcome back to our last and final day of the Kaggle Generative AI intensive course where over a quarter of a million developers around the world have tuned in to learn all about how you can build AI applications using the Gemini APIs um uh things like vector databases agents and more um we're so excited uh to have kind of our last session of the week and we also have some fun surprises ready for you um towards the end of the session so make sure to pay attention um as we wrap up
            • 00:30 - 01:00 today and and kind of share more about next steps um as always I'm Paige Bailey um the engineering lead for our developer relations team over at Google DeepMind and I am overjoyed to welcome all of our expert guests today to talk about MLOps for generative AI just as a general reminder this course is sponsored by Kaggle um which I encourage all of y'all to take a look at they have lots of great models data sets and more available for you to use in your projects um this is day five uh MLOps for generative AI so all about how
            • 01:00 - 01:30 you can take these apps that you've built and put them into a robust production uh robust production system um and I want to give a huge virtual round of applause and thank you to all of our wonderful moderators um especially Brenda Flynn and Kal who have been making everything happen um behind the scenes tirelessly working to answer Discord questions make sure that all of the logistics are wrapped up um so uh if you see any of them around the digital space or in person um please make sure
            • 01:30 - 02:00 to say thank you um and with that I'm going to hand it over to Anant for a brief curriculum overview of what we'll be learning today anant take it away thank you Paige hello everyone welcome to the last day today we'll be looking at uh how to operationalize Genai at scale uh using MLOps on Vertex AI so in the first few days we looked at how foundational models are built how you can prompt engineer and craft like good prompts for um for the inputs to get the
            • 02:00 - 02:30 desired outcome we looked at embeddings and how they represent semantic representations of data and um how you can query them at scale then we looked at agents of how they their dynamic um you can use uh dynamic aspects of them where you can see how agents interact with the external world to solve the task at hand then we looked at domain specific models where we saw how you can make specialized models for certain tasks and fields for example medical and
            • 02:30 - 03:00 medicine and cyber security and today we'll be looking at how all how we could kind of combine all and a lot of these and put them in production so you can actually start using them for your applications so um so quick overview of the white paper we read in the white paper how MLOps principles have to be adapted to the unique genai life cycle and they cannot be used as is for example this includes the discovery phase of Genai the development and experimentation phase evaluation
            • 03:00 - 03:30 deployment monitoring and governance these are all aspects which need to be adapted to specifically to Genai then later we saw that the shift in the development paradigm uh Genai starts with foundational models adapting them with prompt engineering uh where prompts are basically kind of data and code artifacts because it their uh prompts are developed for a certain model or a model version and chaining these various components together with techniques like
            • 03:30 - 04:00 RAG or retrieval-augmented generation to improve the output quality of your models this was then followed by model adaptation and data practices for example we looked at fine-tuning techniques like supervised fine-tuning and RLHF uh which can be used as part of your um MLOps pipeline to enhance performance for specific tasks and needs genai uses diverse data which includes prompts grounding data task specific data sets human feedback and even
            • 04:00 - 04:30 synthetic data at times to kind of achieve the um these are the tools which you can kind of uh turn to um when you need to in your MLOps pipeline after that the paper covered evaluation and monitoring which are also critical for genai workloads we moved towards automated evaluation sometimes even using models or autoraters as judges as we saw in day one and uh we also looked at creating custom evaluation data sets for your business and the KPIs
            • 04:30 - 05:00 that you're tracking and continuously monitoring after the model is deployed continuously monitoring the model for skew drift and performance to ensure that it's operating as intended deployment and governance handle the complexity of genai systems but this also involves managing multiple um artifacts for example prompts chains adapters and what have you via CI/CD strict version control optimizing foundational model
            • 05:00 - 05:30 deployments and extending governance across the entire lineage to make it easily trackable and traceable uh a key section which we have added in this uh year's update is basically adapting MLOps to agents or as we call it agent ops this covers the agent life cycle tool orchestration using a tool registry strategic tool selection also at scale agent specific evaluation observability for example memory and tracing as well as deployment pipelines for agent specific workloads and then
            • 05:30 - 06:00 all of that we concluded in the second half of the white paper we we discuss how all of this can be used and developed on uh by you on Vertex AI it's basically a managed pipeline uh platform which uh provides tools across the full life cycle for example uh for discovery um uh prototyping and services for tuning feature uh features for chaining and augmentation evaluation tools scalable endpoints with monitoring
            • 06:00 - 06:30 and robust governance features just to name a few so that will that covers the summary and uh Paige would you like to take over the Q&A thank you yeah thank you Anant uh that was wonderful um so let's move quickly into the Q&A um I am overjoyed today to have a whole bunch of folks uh joining us from uh from Google Cloud to talk more about some of the projects that they've been building uh especially some of the Vertex AI features that we just heard about um and uh we'll get started right away with our
            • 06:30 - 07:00 first question um for Saurabh Tiwary it's uh great to have you here today um uh and I really really love all of the products that you have in your portfolio you know everything from uh kind of Kaggle and Colab some of these beloved tools uh uh that uh that folks have been using um uh throughout uh throughout the course as well as um many many other production grade AI system tooling um for Google cloud so excellent to have you um our first question is the pace of
            • 07:00 - 07:30 development in the field of agents has been tremendous especially productionizing agentic systems what are some recent technologies that can help developers productionize agents so um I mean yes there is a lot of excitement and thank you Paige and and it is great to join as well uh in this in this event hopefully many of you you who are watching and listening are finding it uh useful uh the entire course if you look into the entire agent
            • 07:30 - 08:00 space uh there are many different uh like exciting things happening uh the most exciting I would say is uh LLMs are moving away from prompts getting wrapped around LLMs to actually these what we are calling as agents doing actual things and so within that there are a few things that you can think of which will be very important important one is frameworks how do you manage and orchestrate u agents uh this is where
            • 08:00 - 08:30 there are many open-source uh tools which are there like LangChain uh LangGraph which is there and many many other uh things uh by the way this course happens just a week before Google cloud next uh so we would also have a lot of announcements coming uh next week so just hold on to that um we'll be talking about a lot of things which should be useful to this uh particular audience um then if you look into apart from the
            • 08:30 - 09:00 orchestration framework there is the model which is very important and one particular aspect of the model which is specifically important which is uh function calling because at the end of the day for the agents to be meaningful beyond just giving long responses or descriptive things etc these agents would have to do actual things which means they need to interact with real world system and that means interacting with APIs or data sources etc and so within that the quality of LLMs which
            • 09:00 - 09:30 are good in function calling will have a huge impact now obviously there are other aspects like for example the reasoning portion because when you are expecting the agents to do things they need to iterate on certain options meaning oh they did something just like us as human beings like we try to do something it doesn't work out then you iterate and you're like okay let me try this other thing etc and so on or you take a complex task and you unbundle it into smaller pieces and you co-executing all of them as well those capabilities
            • 09:30 - 10:00 as well the thinking aspect of the models those will become important then access to APIs and data sources that is another critical piece uh which which is there and then the last one is the eval framework there is a lot of excitement I would say in the last couple of years relating to agents and many many demos have been done um one reason why it hasn't picked up even though there is a lot of excitement about it is Because the demos work well
            • 10:00 - 10:30 but when you try to do things in real world or at high level of predictability that is what you want in like real grade production systems etc the quality does not hold up and so having automated eval systems will also be uh important and this includes things like autoraters which need to be there where you can have LLMs in a in a loop um actually if you think about because these agents can and because of the word agent they have agency to to do do things um you will
            • 10:30 - 11:00 have them operate in many different ways and so having kind of unit test or predictable frameworks to test these agents is not an easy way or it's not a scalable way uh so to speak you cannot have human judges kind of labeling whether a particular agent worked this particular uh in this particular scenario or not etc in in hundreds of because it will become very expensive and impractical very very quickly and so having potentially an LLM based eval
            • 11:00 - 11:30 framework for these agents is the other particular direction uh where I think there would be a lot of activity and so you would see uh across uh both Google products as well as um other frameworks uh you'll see a lot of uh capabilities tools etc across all of these different areas uh that are there and I hope to see much more exciting things happening with agents very soon awesome excellent thank you for the for the great answer and folks should remember that earlier
            • 11:30 - 12:00 in this week we learned all about agents and function calling and how you can use um Gemini with those component systems to build your own agents so really excited um to see what folks create um next question for uh for Gabriella um traditional ML ops focuses heavily on managing uh training data and model versioning how does this shift with generative AI where prompt engineering fine-tuning data sets rag um managing
            • 12:00 - 12:30 agentic tool chains become central so what new components or workflows are needed to help folks um manage their MLOps pipelines for generative AI yeah I think this question really hits at the heart of of the heart of how the MLOps landscape is evolving with generative AI if we think about the core principles of MLOps like automation versioning everything code data models continuous integration and delivery robust monitoring and fostering collaboration
            • 12:30 - 13:00 these remain absolutely vital on generative AI as well where things really change is what um how is mentioned in this question is when uh because we are not often training models from the ground up anymore so we usually start with these powerful pre-trained foundation models and and the game shifts um we usually um we it's like less about initial training and more about how we adapt these big models and that means things like is mentioned in this question like prompt engineering RAG building agents etc so basically this
            • 13:00 - 13:30 shift brings um new first class citizens into our MLOps pipelines and this needs exactly the same level of rigorous management so if we break it down for prompts we need to treat parts of the prompt like prompt templates as code versioning them test them deploy them reliably on Vertex AI we are building tools into Vertex AI for this like uh prompt management to store and iterate over prompts on the UI and also on the SDK uh you can also lean on Vertex AI experiments and the evaluation services um so you
            • 13:30 - 14:00 can track different prompt versions log how well they actually work and even wire them into your CI/CD so updates get validated automatically uh for building robust RAG systems you need to manage the data the embedding pipeline and the vector store itself for instance you can use for this Google Cloud Storage for storing your source data Vertex AI Pipelines to automate the data ingestion and embedding generation and leverage um Vertex AI Search as your as your managed vector database so you don't have to run
            • 14:00 - 14:30 it yourself uh and you can also um use tools like BigQuery or cloud monitoring um to to monitor your application and now if we talk about agents uh well agents can get very complex too so you have multiple steps and interactions so you need ways to define them using open source frameworks uh and then you can have you need to have a solid runtime to actually execute them on on Vertex AI we have Agent Engine which gives you this managed runtime uh the nice thing is like
            • 14:30 - 15:00 um is framework agnostic so it helps you with deploying scaling and gives you observability with cloud trace logging and monitoring right out of the box uh and for managing the actual tools the agents might use you can use tools like artifact registry and actually the the whole uh Vertex AI platform and Google cloud is very flexible so you could integrate cloud trace with open telemetry yourself if you wanted to build it like DIY style and I think Elia is going to show how that works later um then if we think
            • 15:00 - 15:30 about fine-tuning especially what we've seen uh folks do a lot are parameter efficient fine-tuning or LoRA tuning so on Vertex AI we also have our Vertex AI tuning service uh which works really closely with Vertex AI experiments and evaluation services so you can keep meticulous track of everything uh what data you use the hyperparameters the results and then the adapter that you create and the models you deploy can get versioned and managed right in the Vertex AI Model Registry so I think I'm covering
            • 15:30 - 16:00 all the points it's a lot to cover and I love the the kind of attention to detail and the the focus on managing and version control for each one of these component pieces of a pipeline um we talked a little bit earlier in the week about how um you know deploying agents feels a lot like uh uh distributed systems tasks um so so uh you know being being mindful and having a good kind of catalog of all of the all of the component pieces um is really going to
            • 16:00 - 16:30 become mission critical for organizations um and I also really love uh just as a kind of quick shout out I love the open source evals framework called promptfoo um which also works really really well with Google cloud and with the Gemini APIs um cool next question um for Anant Nawalgaria um evaluating generative models goes beyond standard accuracy metrics what practical techniques and tools are you using or developing within GCP for evaluating um large language model
            • 16:30 - 17:00 outputs for quality so things like fluency or relevance or lack of hallucination um also safety cost effectiveness during development and continuous monitoring and production and wow that was a mouthful um so so how are you all thinking about uh evaluating generative models so maybe I can start off and uh talk about evaluation in general and Ivan we can I'll leave it to you for talking about agent eval okay perfect so um as learned as you guys saw in day one um uh we talked a lot about
            • 17:00 - 17:30 evaluation I think using autoraters as judges has really taken off um especially when it comes to evaluating uh these various dimensions as you mentioned uh and there's two aspects to that one is pointwise where you kind of look at the response and a prompt and say okay is this safe to go ahead with it is it fluent etc which does not so much depend on the prompt itself but often on the response and you can use pointwise metrics for this but often another approach which works really well
            • 17:30 - 18:00 is pair-wise side by side as as we say it um because sometimes point-wise evaluations um um give a very um give a score which is too broad and you can have a lot of ties and as humans we often need to uh when we look at two different documents or even three different documents we often compare side by side to say okay hm I like this one more than the other one and that's where side-by-side comparison using autorater judges really helps a lot now these basic concepts can also be extended to
            • 18:00 - 18:30 evaluating agent trajectories and the final response as some of it which we covered in the day three agents Ivan would you like to um um shed more color on this yeah yeah so essentially uh as you said Anant I mean uh LLM as a judge for sure uh they are one of the important components in your evaluation toolbox that you want to use with this new generative application but when we talk about agents agents they don't only you know generate uh the response they also use tool uh in order to generate
            • 18:30 - 19:00 that response so that's why in vertex AI you know we integrated our toolkit uh for evaluating generative application with a set of metrics that allows you to uh understand if the agent is capable of uh you know uh taking the right uh path in uh when it decide to uh you know utilize some of the tools that you make available to the agent itself so to the model itself and so with this
            • 19:00 - 19:30 with these metrics you can uh you can try to understand if the agents you know pick the right tool with respect to the user input or if it picks a set of tools uh in the right order or in any order so you can also you know experiment with the variability of the the way the agents decide to use this tool and the one important thing is that uh with the agent I mean this the tool selection of the agent or the way the agent uses this tool is one aspect but then you want you also want to combine like the response
            • 19:30 - 20:00 that the agent generate when you use a subset of tools right and in with in in this sense like the metrics that we provide today they are still like you know in a research space so that's why in um with the Vertex AI genai evaluation services we give you the possibility to define your own custom metrics so you um you can define metrics that um that are related to uh how the response you know kind of
            • 20:00 - 20:30 correlate with the tools that the agent decide to use so in such a way that you know you you can uh you can um evaluate in this sense if the agent not only picks the right tool with respect to uh the user uh input that he receive but also you know if use the tool in the correct way if it is if it is good in the reasoning part in picking these tools and generate the correct response so uh this is very important just
            • 20:30 - 21:00 because um again uh using an autorater is just one way uh you can uh you can use to evaluate generative AI application but uh it has a it has its own uh pros and cons so at the end of the day you really want to have a uh toolkit uh that allows you to implement your evaluation using uh automated methods but also you know custom metrics so it gives you this flexibility to adapt your evaluation with respect to the application that you're trying to build yeah and one small thing to add on to
            • 21:00 - 21:30 that talking about the toolkit argument another trend which has taken off is multimodal evaluation so for those of you who are at ICLR please drop by our booth we are discussing uh text to image and text to video evaluation using um um um a rubrics-driven kind of approach um uh where you kind of take the original instruction make rubrics and use that to evaluate this helps specifying it more to a certain task that also applies to text and video generation so that'll be all thank you awesome excellent and I I
            • 21:30 - 22:00 love the trend towards multimodal evaluation especially as people are getting more and more into uh challenges like video understanding audio understanding um and even generating multimodal outputs from the models um next question for Socrates and IA looking at the full life cycle from data prep uh for fine-tuning and u monitoring deployed agents and models what is currently the most significant bottleneck or challenge in applying MLOps principles effectively to generative AI projects and what
            • 22:00 - 22:30 advancements platform features best practices new tools are most needed to overcome it thank you for this question up to this point we heard everyone talking about evaluation right and this is one of the most important aspects if you want to bring something robust into production so let's start with the technology aspect of things to do evaluation first you need to prepare data so one of the most critical bottlenecks that we have is how we prepare data for evaluation most of uh uh the customers in the
            • 22:30 - 23:00 market what they used to do is they used to incorporate the humans in the loop and what I mean is labelers that they go manually they check what is the the input that we need to to give to the LLM they generate an output and then they use to assess the models manually then we move also to LLM and synthetic data generation that uh it is the cost is lower than having humans but might not be that accurate or we might need to combine both this problem becomes even
            • 23:00 - 23:30 worse whenever we have multimodality as you said right when where we have images audio video how do you prepare this data for them and if we think about uh more recent uh aspects like live streaming API how can I evaluate live streaming i interrupt a model while it talks and then how can I test this as well for agents uh as we already mentioned one of the key aspects is how we call tools right so to create an evaluation data
            • 23:30 - 24:00 set it is not only about input uh query and expected output we need to track the whole tool chain what the function I call when I call and what was the sequence did I stay in the same topic so again the data preparation is difficult and nowadays what we have seen is we are trying to create systems that we record the interaction of a human with an agent with that way we gener generate the data and later on we can use them for evaluation a very important aspect on
            • 24:00 - 24:30 this is memory management and memory management especially when we move in a multi-agent setup we start talking about graph databases and the evaluation becomes even uh more difficult now similar topics uh that we discussed already is about tool management as well imagine in production systems we need to have authentication authorization of all these tools it is not like in the classic machine learning that we had a model that we used to to to trigger and
            • 24:30 - 25:00 get a result back now we have tools that we trigger and we get the result back so we need to orchestrate them with a very nice way next and I used to say that is MLOps or genai ops is not only about technology it's people and processes as well what I mean is uh we have new personas into the whole landscape we had data scientists and machine learning engineers now we have prompt engineers that they need to be specialists on the topic of uh the application that they built we have AI engineers who are
            • 25:00 - 25:30 responsible to know how to call a tool and how to integrate everything in a real application we have devops and app devs that they need to integrate the whole back end that the front engineers and AI engineers have created into uh into an application so as you can imagine this again can be a bottleneck what I mean is the communication among all these different personas and to make them work like a very well-oiled machine and the last point that I want to raise is about knowledge knowledge of genai this is a big bottleneck in the market
            • 25:30 - 26:00 we have seen that most of the software engineers or business personas they want to use genai for everything but actually they need to have the fundamental knowledge to somehow filter which applications can be solved with genai which not so knowledge is very very very important uh tool awesome and Ellia is there anything you would like to add cool sounds sounds good like I I think
            • 26:00 - 26:30 that was very very comprehensive as an answer next question how is GCP cloud AI uh simplifying the MLOps life cycle specifically for generative AI developers um and what are the the kind of key challenges that they face when operationalizing generative AI um so things like managing cost and latency um uh Mike and Ivan do you want to talk a little bit about this one yeah sure thanks Paige i have a couple of thoughts about this um and then Ivan I can hand over to you and get your thoughts on
            • 26:30 - 27:00 this too um yeah I mean as if you know MLOps wasn't complicated enough uh Genai comes along and makes it exponentially more complicated so we're now trying to figure out how to manage this life cycle um I think one of the good things here is that is that Google has a tradition of being a developer focused company which we're a company of engineers so we are really thinking about how to make a really developer friendly platform for Genai development uh one important part of that obviously for Genai is prompt management um prompts are so important
            • 27:00 - 27:30 in Genai i often think about a prompt and a model together as a as a unit we talked about this in the white paper earlier this year um but you really have to manage the prompt itself as an important as a critical component to the application so we've introduced prompt versioning both in the Vertex AI SDK and in Vertex AI Studio so you can systematically manage your prompts as part of the application programmatically manage the the prompts that you're deploying to your application um in the
            • 27:30 - 28:00 same way that you manage the rest of your software supply chain and using the same software supply chain controls so you can version all of the prompt parts of the application as much as the rest of the software um another part of it is that we're working on is infrastructure um obviously fine-tuning Gabriela talked about fine-tuning and as part of the the the way of specializing and optimizing your genai application um we're trying to make that as streamlined as possible through things like Vertex AI custom
            • 28:00 - 28:30 jobs so in a in a couple of lines of code you can spin up your own fine-tuning pre-training jobs on Vertex AI infrastructure using a managed cluster of of of nodes with multiple GPUs multiple regions in the world and you can do this all programmatically again so we're abstracting away a lot of the the complication of deploying you know a cluster of training nodes and and acquiring the uh the GPU compute resources to do that kind of work so again that really takes away some of the
            • 28:30 - 29:00 the complication in the the productionization of a genai pipeline um and then when it comes to deployment another part of that is you know obviously taking the model to production and making it available to your application so you can deploy a pre-trained model your fine-tuned model into the Vertex AI prediction service that gives you an endpoint that your application can hit that has all of the infrastructure controls security um infrastructure um
            • 29:00 - 29:30 uh uh support that you have that you need for a production application and you can do that just at the end of your Vertex AI custom job take the model deploy it to Vertex AI prediction and it really again streamlines that whole process and then as we're talking about agents agents are obviously taking the whole the whole genai world by storm right now you can deploy an agent you know a custom agent into agent engine using one of a number of frameworks that that that's available and you take advantage of the managed service underneath that to make your agent
            • 29:30 - 30:00 available in your Genai application so we're really focusing a lot on these different parts of the process to streamline the whole Genai uh life cycle for MLOps application development um and Ivan are there anything you wanted to add to that as well yeah no I mean uh I just want to briefly touch two aspect that uh they ask in the question like uh what about uh the cost management latency optimization and the continuous evaluation so uh I mean um especially
            • 30:00 - 30:30 when you start you know building a generative uh AI application at scale uh using these models which is are which are powerful model they are they are cheap but they are not free right so I can understand why you are uh you want to monitor your cost and you want to also optimize the latency uh around them so essentially the way you can do that is uh setting up um some uh um monitoring dashboards that allows you you know to count the input and the
            • 30:30 - 31:00 output token that the model generates and um you know uh watch also the latency of a of a of your application and uh I mean when uh when you start building the application during the development for uh during the development phase for sure you can reuse some of the services that we mentioned so far so for example with the Vertex AI genai evaluation services you can compare different models and try to figure it out which is the the best one but also you know uh validate which is the cheapest one that you want to use
            • 31:00 - 31:30 compared to your uh your application so if you have a simple task maybe you want to use um a model that is cheaper uh rather than you know a big big reasoning models and so you can optimize this aspect during the development phase and then again when you go to production you can use some observability dashboard actually recently we uh release this integrated observability dashboard in uh in Vertex AI that you can leverage you know to track the you know uh the model the the
            • 31:30 - 32:00 number of token it consume and so uh like estimate your cost and um the other aspect of continuous uh evaluation is that at the end of the day once you deploy an application this application will generate logs right in the past the log they were the prediction of your um machine learning models today uh they are related to the tracing of the application that you generate especially if you have a chatbot or you have a or you have an agent so what you can do is that you can collect this tracing and uh
            • 32:00 - 32:30 and then um um post like a post processing it in a way that you can structure your evaluation data set and run an evaluation job and I think uh uh actually Elia in his last demo uh he will show you how how you can do that and if you don't want to use the dashboard that we provide you can also use your own tools so it will it will show you how to build a custom uh dashboard to monitor your application so I think this covers uh everything yeah
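
A rough sketch of the token, latency, and cost logging Ivan describes above, assuming the google-generativeai SDK and a GEMINI_API_KEY environment variable; the per-token prices are placeholders rather than real Gemini pricing, and in production these metrics would be exported to a monitoring dashboard instead of printed.

```python
import os
import time
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

PRICE_PER_1K_INPUT = 0.0001   # placeholder price, check current pricing
PRICE_PER_1K_OUTPUT = 0.0004  # placeholder price

def generate_with_metrics(prompt: str) -> dict:
    """Call the model and return the text plus the metrics you would export to monitoring."""
    start = time.perf_counter()
    response = model.generate_content(prompt)
    latency_s = time.perf_counter() - start
    usage = response.usage_metadata  # token counts reported by the API
    cost = (usage.prompt_token_count / 1000) * PRICE_PER_1K_INPUT + \
           (usage.candidates_token_count / 1000) * PRICE_PER_1K_OUTPUT
    return {
        "text": response.text,
        "latency_s": round(latency_s, 3),
        "input_tokens": usage.prompt_token_count,
        "output_tokens": usage.candidates_token_count,
        "estimated_cost_usd": round(cost, 6),
    }

print(generate_with_metrics("Give one tip for monitoring GenAI apps in production."))
```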
            • 32:30 - 33:00 thank you both for the great answer um next question is for Saurabh uh looking ahead five years how do you see MLOps for generative AI evolving to support enterprise applications um particularly in areas like automated model retraining um governance optimization um and how do you think Vertex AI will play into this transformation so five years is a very long time period uh if many of us think about what we were doing even 3 years back uh to now
            • 33:00 - 33:30 uh and the progression of uh deep learning or genai or AI in particular uh it has been quite a journey and uh I would say that the speed of change and acceleration will actually increase if you can imagine that so predicting what will happen at 5 year time period is pretty uh risky and challenging um in terms of the the specifics of the of the question uh itself I would say we need
            • 33:30 - 34:00 to be and I think as many of the earlier speakers mentioned uh particularly I think Socrates was mentioning about having this knowledge or this thirst for knowledge of like what's happening what's changing etc and that is where I would say tools like Colab Kaggle AI Studio become very very uh useful and beneficial because you can get to if you and um if you look at Kaggle right you have right now a K prize happening which is about uh software engineering
            • 34:00 - 34:30 automation uh aspects right and a very challenging goal which is which is there uh that can help give signals into I mean that's just one example there are many other competitions going on in Kaggle If you look into AI studio I think earlier we were mentioning about uh some of the multimodal capabilities there is a multimodal live API which is there which can be leveraged right now to kind of evaluate like when we are when I think Paige was mentioning about you can look at like machines can look
            • 34:30 - 35:00 at video they can look at text they can understand voice and compose all of these information to make the right judgment calls etc whether it be in an auto evaluation kind of a framework or whether it be to actually do the task or to do that agentic work so these things are all accessible right now again on Colab side uh obviously there is a notebook functionality but as I think Paige was mentioning about the data science agent that's just one example where if you look back into if you had a
            • 35:00 - 35:30 large amount of data whether it be in your spreadsheet or whether it be uh in AlloyDB or more of a commercial database etc and if you were to analyze information on top of that you would be spending lots of time about data managing cleaning analysis etc and all of that could be uh automated away so we already are seeing across even these three pieces we already are seeing bits and pieces of how to connect the building blocks etc and the speed of uh
            • 35:30 - 36:00 acceleration will just uh increase meaning um if you have to project I mean I would say even 6 months a year down there will be multiple agents and there will be agent orchestrators will be running on top of some of these very close to semi-autonomous agents etc the way to think about Vertex AI is whatever you see in the open source world or in the developer friendly world we provide the same thing on an enterprise or a managed setting uh etc so lots of lots of acceleration uh I kind of like even
            • 36:00 - 36:30 personally uh given the the part of my job the the type of things that I get to see uh is just fascinating and the the other cool part which is happening is whatever is happening in the research side of like Google deep mind and many other places around the world it is getting propagated at a unprecedented and truly I mean by that unprecedented word word that it is getting dispersed or distributed across the world in a very short period of time and so I would
            • 36:30 - 37:00 say just keep a close look at what is happening uh in in this particular space and be adaptable to the change which is there yep i love that answer and I love the focus on uh kind of being flexible and adaptable and and kind of trying to to focus on outcomes and real problems as opposed to um uh as opposed to uh you know trying to learn um just one specific uh technology or one specific kind of model capability um because wherever we are with the models today in
            • 37:00 - 37:30 two weeks or even a month uh like we're all going to be blown away yet again yeah and uh maybe let me just just add to that that typically in the past most of us have thought about oh here is an area here is a focus area where I will just continue iterating etc and so on this the shape or the landscape is just changing so quick quickly that we need to be really really adaptable to it right we can't be married to oh there is just one way in which we will go so for example oh I will just continue doing
            • 37:30 - 38:00 prompt rewrites to iterate and improve right there are now meta prompt writers where you describe what you want and then there are LLMs which are optimized to write the prompts for the smaller models etc and so on right so there are things as well like those type of things that we we need to be open to yep it's awesome and I love uh like you mentioned I love getting to be embedded in kind of the the research and the the deployment at Google because it feels like we have a front row seat to to a lot of this progress
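
Saurabh's "meta prompt writer" point can be made concrete with a small sketch: a larger model drafts an optimized prompt that a smaller, cheaper model then executes. This assumes the google-generativeai SDK and a GEMINI_API_KEY environment variable; the model pairing and task are illustrative, not a prescribed workflow.

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

writer = genai.GenerativeModel("gemini-1.5-pro")    # drafts the prompt
worker = genai.GenerativeModel("gemini-1.5-flash")  # runs the drafted prompt

def meta_prompt(task_description: str, user_input: str) -> str:
    """Have the larger model write a prompt for the task, then run it on the smaller model."""
    drafted = writer.generate_content(
        "Write a concise, high-quality prompt (with an {input} placeholder) for this task:\n"
        + task_description
    ).text
    return worker.generate_content(drafted.replace("{input}", user_input)).text

print(meta_prompt(
    "Classify a customer support message as billing, technical, or other.",
    "I was charged twice for my subscription this month.",
))
```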
            • 38:00 - 38:30 excellent um so next we are hiring as well yes we are hiring So if folks uh after you've uh learned and built with all of these generative AI tools uh uh Google is definitely hiring um next question from Kush um what fundamental shift will happen in the software development life cycle um uh uh through uh Agentic AI so we already talked a little bit about one the data science agent and collab notebooks what other things have been cooking up um on the uh
            • 38:30 - 39:00 on the Google cloud and kind of broader developer community side socrates do you want to go first yep sure sure sure well uh software development software development is nothing else than writing structured text right this is amazing because LLMs and agents can help us to do do uh to perform many tasks first of all we can start developing uh solutions extremely easy i I I don't know if you have seen that currently we have canvases that we
            • 39:00 - 39:30 can write our prompts see the code and see the preview of our code immediately i used that uh last week to create Tetris in a few minutes right with Gemini it was amazing uh then after boosting the application development the next and the most important thing is about the boosting the testing part testing is something that most of the people they are not really really good at right so this is a by leveraging agents uh it is a very easy way to to
            • 39:30 - 40:00 perform that now the next is about uh let's imagine that we have a scenario that uh we already have built a very complex repository with legacy code very very difficult so I have seen agentic solutions that you can pass the whole code repository to them then they analyze your code they do the architecture design for your new query for example I want to add this functionality to my code and then they start generating the code they run the code they pass through the errors They
            • 40:00 - 40:30 resolve the errors and then they give you a complete solution as you can imagine this is amazing to boost the whole process and to come closer to the data science world right we have seen also solutions from agents that they prefer to generate the code for data scientists to create plots for example or to to create queries for data frames or we have seen even the natural language to SQL uh solutions that helps us a lot to create uh data queries for
            • 40:30 - 41:00 databases and this can make the life for data engineers extremely simple so all these categories can help you to boost the whole development cycle Ivan do you want to add something yeah no I mean uh what what I just would like to say is that uh especially now that we are seeing you know uh Gemini's models that we provide is getting better and better at coding we are seeing that these kind of models they are already integrated in some of the most uh IDE uh you know uh tools so I think I think
            • 41:00 - 41:30 the the main uh the main thing is here here is the way like uh the interaction with this model for coding is changing uh the approach that as developers we have in in coding so everyone right now is talking about this uh this idea of uh vibe coding so which essentially is uh the concept where the user like use AI to you know write the code and then u based on simp simple the description that it provides to the tool uh and so
            • 41:30 - 42:00 and then he relies on the tool you know uh to to get the code that actually can integrate and use uh to build um like the application that he has in mind so to me it's just uh it's just uh the way you know we are approaching coding we are approaching software development together with these uh uh coding pairs that uh I mean is changing and uh I think in a positive way in some sense because as you said so is speed up a lot
            • 42:00 - 42:30 the the process of getting an application uh like a prototype uh up and running yeah excellent answers i've also been really impressed with some of the the developer tools um externally so things like Cursor now supports Gemini um Continue.dev Roo and Cline in the open-source space um Windsurf from the Codeium team as well as Cody from the Sourcegraph folks um they're all adding support for Gemini especially Gemini's uh kind of longer context window which
            • 42:30 - 43:00 can ingest full code bases um and it's been really magical to see how they can all um kind of accelerate software development and even build in follow-up questions and more agent style approaches i've really loved working with Roo Code for that um so uh hope uh excited to see more um applications not just for the in IDE experience but also AI as applied to the full end-to-end software development life cycle including things like code review and code performance and deployments awesome
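
As one concrete illustration of AI applied beyond the IDE, here is a hedged sketch of automated code review over a git diff with a long-context Gemini model; it assumes the google-generativeai SDK, a GEMINI_API_KEY environment variable, and that it runs inside a git repository. The review format is an invented choice, not a Google Cloud feature.

```python
import os
import subprocess
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
reviewer = genai.GenerativeModel("gemini-1.5-pro")

def review_current_diff() -> str:
    """Collect the working-tree diff with git and ask the model for targeted review comments."""
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True, check=True).stdout
    if not diff.strip():
        return "Nothing to review: the working tree is clean."
    prompt = (
        "You are a careful code reviewer. For the diff below, list potential bugs, "
        "missing tests, and style issues, each with the file and hunk it refers to.\n\n"
        + diff
    )
    return reviewer.generate_content(prompt).text

if __name__ == "__main__":
    print(review_current_diff())
```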
            • 43:00 - 43:30 um so next question um for Saurabh do you think there will be a market of agents created by software companies or digital creators um and you mentioned uh Google Cloud Next next week so so I'm wondering if there might be might be anything agent like um announced there i mean we will be talking about agents quite a lot uh next week uh including in Thomas Kurian's keynote who is the Google cloud CEO um but going back to the question I
            • 43:30 - 44:00 think the answer is a big yes um you will see uh multiple agents across multiple frameworks as we talked earlier like there are different frameworks which are there different data sources there is different quality aspects etc there will be a heterogeneity of agents which will be there there will be value which will be attributable to agents uh depending upon how good they perform how much value do they bring in and because of those reasons there will be a
            • 44:00 - 44:30 marketplace uh one thing I would call out which I don't think is getting that level of attention and I feel will become very very important in the agent space is security as agents start having more capabilities and ability to do things then the simpler example I would say is as there were apps in the app store on your phone whether it be Android or iPhone where you have to go through an approval process then there is like controls on cameras and battery and stuff like that
            • 44:30 - 45:00 etc in a similar way if you think about agents can do much much more uh and so uh whether there are inherent by design kind of like uh malicious activities or whether through other attributes some uh bad outcomes could be created right those things authentication is another thing access to content which is another thing right so that entire space I I just want to call out and put a plug for all our developers who are uh thinking
            • 45:00 - 45:30 who are participating in this in this uh course and are thinking about building agents to think along that particular axis because that will be an important play and yes there will be a marketplace with exchanges etc happening as well yeah I've loved seeing the I've loved seeing the the investment that Google cloud is making in things like MCP servers and then also these kind of robust security approaches towards agent-to-agent interactions yeah um the the Stripe team also had a really great example a couple of weeks ago where they were using agent-to-agent uh
            • 45:30 - 46:00 authenticated payments so agents could kind of uh authenticate and pay other agents um uh uh as well so so lots of really interesting work needed to happen in that space to make it enterprise ready um and hopefully some of the folks on the call will be inspired to build businesses around this or to to join existing businesses to make that happen awesome thank you for the great question Peter and with that we are going to head over to the code labs ellia I I'm very
            • 46:00 - 46:30 excited to learn more i hear that this is a live demo and so we will be crossing our fingers and hoping to the demo gods that everything will be going as intended is that correct yeah that's correct fingers crossed yeah uh so I'll share my screen um and we can start with a really really quick uh presentation on uh a resource we created to help you accelerate your agent development uh journey essentially which is called starter pack so uh we are going to start
            • 46:30 - 47:00 with few numbers which is essentially the fact that it took around three months for us who are experienced GCP developers to deploy a dummy genai agent in production so what we see with our Google cloud customers is instead that it takes around 3 to nine months to do the same process and so um this is typically a long process and so we thought why is that you know why what are the reasons that are leading to this long time to
            • 47:00 - 47:30 develop an agent in production uh so we found out that there are uh some common challenges and you know we discovered also by actually talking with our customers in the last two years that typically creating the first agent and you know creating the first prototype is typically the the easiest part so downloading a sample starting you know a notebook and things like these are pretty easy uh but then when it uh when it when it comes time to go to production this is where uh customers
            • 47:30 - 48:00 and developers start encountering challenges and so there are some common challenges we we started to see so uh the first challenge is around customization so customizing the agent based on your business needs or business logic so many agents will for example consume data so how do we make sure this data is accessible high quality and fresh is an open question and so and then we discussed also previously about security and compliance so things related to data privacy access control
            • 48:00 - 48:30 adversarial attack mitigation we discuss around evaluation so measuring the performance of the agent building confidence on the agent is a hard topic and then also on generating synthetic data which can then be used for evaluation purposes and then also for uh tooling purposes then we have the deployment part uh so which which is touching areas like how do we integrate our agent into a into a UI to make a product essentially and then things like CI/CD pipelines testing rapid iteration
            • 48:30 - 49:00 and rollback mechanism are still challenges uh when it comes to building agents and finally infrastructure so creating scalable and robust infrastructure to deploy the agent is an open challenge and finally observability so the areas like how do we monitor the performance of the agent how do we build insights with our agents so that we can iterate back and start again the development process and so hopefully these are challenges that are also resonating with you as well and to support customers and developers in into
            • 49:00 - 49:30 solving a good part of these challenges, we created the agent starter pack, which is a collection of production-ready agent templates designed to cut time to production from months to weeks. It covers all the areas we discussed earlier, from deployment and operations to customization and observability; essentially everything you need to get to production quickly. As we
            • 49:30 - 50:00 discussed, the deployment and operations part includes an API server, a UI playground, and a set of CI/CD and Terraform samples ready for you to start building your agent; the customization part offers a series of ready-to-use AI patterns; and there are observability and evaluation parts as well. Now we can jump directly into a demo showcasing this resource, so I'll share a different screen to show you
            • 50:00 - 50:30 my terminal. Okay, fingers crossed everything goes fine with the demo. Hopefully you can see my terminal. The journey typically starts on GitHub: we have an open-source repository called GoogleCloudPlatform/agent-starter-pack, and you can go there and use its quick start as well. It's essentially a Python package you can install and start
            • 50:30 - 51:00 using right away. I've already installed it in my terminal, so we can use it directly. For example, say I want to create a new agent and call it guide-to-mlops. I hit enter and this takes me into an interactive prompt where I can select which agent template I want to start from. Watch this space closely, because at Cloud Next we are going to introduce more agents here. Let's say I want
            • 51:00 - 51:30 to use a dummy agent. I can also choose how I want to deploy it; in this case I'm going to use Vertex AI Agent Engine, which is a managed runtime for deploying agents. I click next, accept all the defaults, verify that I have enough permissions to call Vertex AI, and in a couple of seconds you will see we have created a new agent. We can look at the agent's README, for example, and the tool has actually
            • 51:30 - 52:00 created the full project structure, including the code, notebooks, tests, a Makefile, and so on. I already have a UI playground open where you can see the agent and start asking it questions directly, and from there you can keep developing, changing the code and iterating until you're ready to deploy the agent to production. (A sketch of the terminal steps is included below.)
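For readers following along, here is a minimal sketch of the terminal steps described in the demo. The package and command names are assumptions reconstructed from the spoken walkthrough of the GoogleCloudPlatform/agent-starter-pack repository; check that repository's README for the authoritative quick start.

```bash
# Install the starter pack CLI (package name assumed from the demo).
pip install agent-starter-pack

# Scaffold a new agent project; an interactive prompt asks which agent
# template to use and which deployment target (e.g. Vertex AI Agent Engine).
agent-starter-pack create guide-to-mlops

# Try the agent in the local UI playground from inside the generated project
# (the make target name is an assumption).
cd guide-to-mlops && make playground
```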
            • 52:00 - 52:30 For the sake of time, I already started that process and pushed the agent we created into a repository. The really interesting part of this repository is that it contains CI/CD pipelines that actually deploy the agent to production. Moving over to Cloud Build, you will see a pipeline that deploys the agent into a staging environment and runs things like load tests against Agent Engine, and then we can see the
            • 52:30 - 53:00 full report of the load tests: how the agent is performing, whether latency is acceptable, and so on. When we are ready, we can deploy to production with one button and have our users start testing and using it. The journey doesn't finish there, because once the agent is in production we want to understand how users are actually using it, and this is where the starter pack offers automated observability. You can go into Cloud Trace, where for
            • 53:00 - 53:30 each call a user makes to the agent you can select a trace and understand how the user is actually using the agent: which steps the agent performed and, for each step, the specific calls that were made. This is really important for building insights. All of that data then lands in BigQuery, so you can build dashboards and evaluation datasets, and we even offer a Looker dashboard so you can see how the agent performs over time. (A small example query follows below.)
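As a hedged illustration of what you might do once that telemetry is in BigQuery, here is a small Python sketch that summarizes agent call volume and latency per day. The project, dataset, table, and column names are hypothetical; the starter pack's actual schema may differ.

```python
# Hypothetical example: summarize agent call volume and latency from telemetry
# that has landed in BigQuery. Table and column names are assumptions.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

query = """
SELECT
  DATE(event_timestamp) AS day,
  COUNT(*) AS calls,
  AVG(latency_ms) AS avg_latency_ms
FROM `my-project.agent_telemetry.events`
GROUP BY day
ORDER BY day
"""

for row in client.query(query).result():
    print(f"{row.day}: {row.calls} calls, {row.avg_latency_ms:.0f} ms average latency")
```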
            • 53:30 - 54:00 Of course, this is all just a template, so everything we provide here you can take and edit according to your needs and then start building your own agents. Hopefully that concludes the demo. I loved seeing how simple it looked to go from GitHub to having an agent you could test and deploy, and I also loved being able to see the logs and telemetry for the agent to understand how it's performing over time,
            • 54:00 - 54:30 what kinds of questions users are asking it, and things like cost and latency. Thank you so much for the demo, it was great, and wonderful to see how quickly all of those things could be done with the agent starter pack. Excellent. Thank you so much to all of our wonderful Q&A presenters; I learned a lot this segment, and I think we're all inspired to incorporate MLOps into all of our apps
            • 54:30 - 55:00 and all of our practices. I'm going to go ahead to the next section, which is our pop quiz. Everybody grab a pen and paper, and we'll go through a few questions before the wrap for today and for our generative AI course. First question: which of these is not a core practice of MLOps for generative AI applications as discussed in the whitepapers? Is it A, prompt engineering and evaluation as an iterative cycle; B, data validation,
            • 55:00 - 55:30 model evaluation, and model monitoring; C, training a foundation model from scratch; or D, managing and versioning prompt templates, chain definitions, and external data sets? Which of these is not a core practice of MLOps for generative AI? I'm going to count down: five, four, three, two, one. And C is not a core practice of MLOps for generative AI applications; we're focused on all of the
            • 55:30 - 56:00 infrastructure around the model, not training a foundation model from scratch. Question number two: what is a prompt template in the context of generative AI? Is it A, a simple text input from the user; B, a set of instructions and examples with placeholders for user input; C, the foundation model itself; or D, the final output generated by the model? Counting down: five, four, three, two, one. The correct answer is B, a set of instructions and examples with placeholders for user input (see the short sketch below).
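To make that answer concrete, here is a minimal illustrative sketch of a prompt template in Python: fixed instructions plus a few-shot example, with a placeholder filled in by the user's input at request time. The template text and function name are invented for illustration, not taken from the course materials.

```python
# A minimal illustration of a prompt template: instructions plus an example,
# with a placeholder that is filled by the user's input at request time.
PROMPT_TEMPLATE = """You are a support assistant. Answer in one sentence.

Example:
Q: How do I reset my password?
A: Use the "Forgot password" link on the sign-in page.

Q: {user_question}
A:"""

def build_prompt(user_question: str) -> str:
    # Versioning this template separately from the application code is one of
    # the prompt-management practices mentioned in the quiz.
    return PROMPT_TEMPLATE.format(user_question=user_question)

print(build_prompt("How do I change my billing address?"))
```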
            • 56:00 - 56:30 Question number three: what is the purpose of chaining in generative AI applications? Is it A, to maintain recency in the model's outputs; B, to avoid hallucination and maintain recency in the model's outputs; C, to increase the complexity of the model; or D, to reduce the efficiency of the model? Counting down: five, four, three, two, one. The correct answer is B, to avoid hallucination and maintain recency in the model's outputs (a small chaining sketch follows below).
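As a hedged sketch of that idea, the example below chains two steps: retrieve fresh context first, then ground the generation step in that context. The functions `search_documents` and `call_model` are hypothetical stand-ins for a retriever and an LLM client, not real APIs from the course.

```python
# A two-step chain: retrieval for recency, then generation grounded in the
# retrieved context to reduce hallucination. Both helpers are placeholders.
def search_documents(query: str) -> list[str]:
    # In a real chain this would hit a vector database or search index.
    return ["Doc snippet 1 about the query...", "Doc snippet 2..."]

def call_model(prompt: str) -> str:
    # In a real chain this would call a foundation model API.
    return "Grounded answer based on the supplied context."

def answer_with_chain(question: str) -> str:
    context = "\n".join(search_documents(question))   # step 1: fetch up-to-date facts
    prompt = (
        f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
    return call_model(prompt)                          # step 2: grounded generation

print(answer_with_chain("What changed in the latest release?"))
```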
            • 56:30 - 57:00 Question number four: why is evaluation a crucial step in the development of generative AI systems? We've been talking about this all week, so hopefully folks get this one right. Is it A, to ensure the model is deployed to the correct infrastructure; B, to optimize resource utilization and reduce latency; C, to track the lineage of data and model versions; or D, to measure the quality and effectiveness of the model's outputs? Counting down: five, four,
            • 57:00 - 57:30 three, two, one. The answer is D: evaluation is crucial for measuring the quality and effectiveness of the model's outputs. Hopefully everybody got this right and is already thinking about how to create your own evals. Question number five: which Vertex AI product allows for recurring execution of evaluation jobs and of production skew and drift detection processes? Is it A, Vertex AI Model Monitoring; B, Vertex AI Pipelines; C, Vertex AI Feature Store;
            • 57:30 - 58:00 or D, Vertex AI Model Registry? Counting down: five, four, three, two, one. The correct answer is B, Vertex AI Pipelines, which can run evaluation and skew/drift detection jobs on a recurring basis (a brief sketch follows below). Hopefully everybody got 100% correct.
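As a hedged illustration only: a recurring evaluation job could be expressed as a small Kubeflow Pipelines definition and compiled for Vertex AI Pipelines. The component logic, names, and dataset path below are placeholders rather than the course's prescribed setup; scheduling the compiled pipeline to run on a recurring basis is then configured in Vertex AI Pipelines.

```python
# A minimal, hypothetical evaluation pipeline for Vertex AI Pipelines (KFP v2).
# Names, the metric logic, and the dataset path are placeholders.
from kfp import dsl, compiler

@dsl.component
def evaluate_outputs(dataset_uri: str) -> float:
    # A real component would score model outputs (fluency, relevance, safety, ...)
    # against the evaluation dataset and return an aggregate quality metric.
    print(f"Evaluating outputs against {dataset_uri}")
    return 0.0

@dsl.pipeline(name="recurring-genai-evaluation")
def eval_pipeline(dataset_uri: str = "gs://my-bucket/eval-set.jsonl"):
    evaluate_outputs(dataset_uri=dataset_uri)

# Compile to a spec that Vertex AI Pipelines can run (and re-run on a schedule).
compiler.Compiler().compile(eval_pipeline, "eval_pipeline.yaml")
```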
            • 58:00 - 58:30 With that, we are heading to the final piece of our Kaggle Generative AI intensive course. We have wrapped up all of our live YouTube streams, and hopefully you now have all of the tools in your toolkit for our final segment: the exciting capstone project. For this capstone you can level up your skills and build your engineering portfolio with a real-world AI engineering project. The competition starts today and ends on April 20th, and your goal is to create a notebook demonstrating a use case that applies the generative AI capabilities you've learned from this week's course, whether agents, vector databases, embeddings, or all of the
            • 58:30 - 59:00 above. For bonus points you can create a blog post or a YouTube video sharing what you've learned and what you built. Evaluation and submission details will be shared today, so keep an eye on Discord and your email. All participants will receive a Kaggle badge and a certificate on their Kaggle profiles by the end of April, and you can either work individually or form teams, so there's no need to go it alone; you
            • 59:00 - 59:30 can do this with a group of your friends. We are so looking forward to seeing what you build; this is very exciting, and the best way to learn is by applying, so hopefully everybody has been thinking of something they'd like to build while tackling the course curriculum this week. We're looking forward to seeing what you create. Thank you so much for tuning in this week. I
            • 59:30 - 60:00 think it's been magical to see, all over the world, over a quarter of a million of y'all sharing your excitement; this is the part of the job that makes everything worth it. Anant, do you want to add anything? Yes: the winners of the capstone project will also have their projects amplified on Kaggle and Google communication channels, so be
            • 60:00 - 60:30 looking forward to seeing what you share. Excellent, I can't wait to take a look. I also just want to say, personally, thank you so much again to all of our wonderful course moderators and all of the people behind the scenes who built things to make sure all of y'all would have the tutorials and educational materials; it really does take a village to make it all happen. So thank you so much, have a beautiful weekend ahead, and we
            • 60:30 - 61:00 appreciate you and are very excited to have you as part of this course and this capstone project.