On .NET Live - Multi-agent workflow using Azure Durable Functions and Semantic Kernel

Estimated read time: 1:20


Summary

In this session of On .NET Live, Scott Addie hosts a discussion on leveraging Azure Durable Functions and Semantic Kernel to build multi-agent workflows. The guest, Divakar Kumar, shares his journey from having limited AI knowledge to developing an open-source project built on these technologies. The conversation covers foundational AI concepts, the transition to multi-agent frameworks for handling complex tasks, and how Azure Durable Functions add scalability and resilience. With detailed discussions of the architecture, the implementation using Azure Cosmos DB, and the integration of various tools for efficient processing, the session offers rich insights for developers keen on advancing in AI applications.

Highlights

• Divakar Kumar shares his transition from AI beginner to advanced AI developer using Azure tools. 🔄
• Introduction to multi-agent systems and their advantages over single-agent systems. 🤔
• Usage of Retrieval Augmented Generation (RAG) to enhance AI responses with context. 🧠
• How Azure Durable Functions contribute to building resilient, scalable AI workflows. 🛠️
• Integrating Semantic Kernel for better maintainability and functionality in code. 💾

Key Takeaways

• Azure Durable Functions are used to create scalable and resilient multi-agent workflows. 🚀
• Divakar Kumar shares his AI journey, highlighting the role developers play in advancing AI applications. 🧑‍💻
• Retrieval Augmented Generation (RAG) is key to improving AI model responses by adding context. 📚
• Azure Cosmos DB can serve as both transactional and vector data storage. 🌐
• Multi-agent frameworks distribute tasks to specialized agents for better handling of complex scenarios. 🤖

Overview

Scott Addie hosts another episode of On .NET Live, focusing on the implementation of multi-agent workflows using Azure Durable Functions and Semantic Kernel. With February coming to a close, the show does its best to "warm up the bits" with fresh insights.

Guest speaker Divakar Kumar narrates his journey into the AI space, from zero exposure to becoming a Microsoft MVP in AI. His story emphasizes how developers, not just data scientists, are pivotal in promoting AI applications from POCs to enterprise-grade solutions, highlighting the role of architecture and design in AI advancements.

The session delves into the architectural nuances of combining Azure Durable Functions with a multi-agent framework to improve scalability, cost-effectiveness, and response accuracy. Divakar explains his project setup: Azure Cosmos DB as a dual-purpose data store, the significance of Retrieval Augmented Generation, and the integration of Semantic Kernel to keep development seamless.

Chapters

• 00:00 - 03:00: Introduction. Opening music only; there is no spoken content to summarize.
• 03:00 - 04:00: Welcome and Host Introduction. The hosts welcome the audience, mention the show's theme, and introduce who will guide the discussion.
• 04:00 - 05:00: Guest Introduction - Divakar Kumar. The guest, Divakar Kumar, is introduced.
• 05:00 - 08:00: Divakar's Journey into AI. Divakar recounts how he went from zero exposure to AI to building with it, the challenges he faced along the way, and why he is optimistic about AI's transformative potential.
• 08:00 - 11:00: Introduction to Prompts and RAG. Introduces the basics of prompts and Retrieval Augmented Generation (RAG), explaining how prompts function as inputs that guide an AI model's responses and how RAG improves answers by combining retrieval techniques with generative models.
• 11:00 - 14:00: Concepts of Agents in LLMs. An overview of agents in large language models (LLMs), opening with a casual welcome to the audience (".NET friends") before explaining the roles agents play in an LLM-based system.
• 14:00 - 17:00: Multi-Agent Frameworks. With February wrapping up and warmer weather anticipated, the hosts restate the On .NET Live show's purpose: empowering .NET community members to achieve their goals.
• 17:00 - 20:00: Scalability and Resilience in AI Architectures. Host Scott Addie and co-hosts Frank Boucher and Cam Soper greet the audience and introduce today's guest, Divakar Kumar, who expresses his gratitude and excitement, setting a positive tone for the discussion.
• 20:00 - 23:00: Mail Chain and Multi-Agent Proposal. Divakar, a technical architect at FOF for more than 10 years, admits that a couple of years ago he knew nothing about AI; he is now a Microsoft MVP in AI, showing how quickly such a transformation is possible.
• 23:00 - 27:00: Durable Functions with Multi-Agent Architecture. Explores the pivotal role of developers in advancing AI applications from proof of concept (POC) to enterprise-level deployment, a space often assumed to be dominated by data scientists, as Divakar shares insights gained from building his open-source application.
• 27:00 - 30:00: Cosmos DB and Semantic Layer. Introduces Cosmos DB and its connection to the semantic layer, with the hosts noting they are still getting up to speed on AI and previewing the personal experiences and insights to come.
• 30:00 - 35:00: Architecture Overview and Cost Considerations. Divakar shares how he began with ChatGPT, was fascinated by the model's output, and downloaded a Postman collection to experiment; he also shares his blog link and GitHub handle.
• 35:00 - 40:00: Demo of Multi-Agent Workflow. Describes the developer's first steps: calling the chat-completions POST endpoint over HTTP, sending request payloads, receiving responses, and discovering that the payload carries a prompt, either a system or a user prompt.
• 40:00 - 45:00: Demo: Jaeger Integration and OpenTelemetry. Explains the system prompt, used to define a persona and set goals for the model, and the user prompt; both accompany each request sent to the large language model (LLM), whose generated response is called a completion.
• 45:00 - 51:00: Code Walkthrough: Durable Functions and Semantic Kernel. Discusses augmenting prompts with knowledge to address the limitation that models are trained on stale data and know nothing about recent events or private information, so they must be supplied with up-to-date content to answer such queries well.
• 51:00 - 55:00: Prompts with Prompty Extension. Discusses handling hallucinations by providing additional information through system prompts, and introduces Retrieval Augmented Generation (RAG) as the answer to the model's limited context window, which constrains how much information can be processed at once.
• 55:00 - 61:00: Open Telemetry Integration in Code. Covers the importance of feeding only relevant information to large language models (LLMs): a vector database organizes and supplies pertinent knowledge to the model, minimizing hallucinated or irrelevant answers and improving output quality.
• 61:00 - 70:00: Live Demo Execution. Describes the shift from storing raw data to storing its meaning: data is converted into mathematical representations (embeddings) held in a vector database, so that queries to the language model can be matched efficiently.
• 70:00 - 77:00: End-to-End Flow and Monitoring with Open Telemetry. Explains query embedding and vector-database lookups: cross-checking against the vector database selects only relevant information, which, when fed into the LLM, yields better responses; a foundational, long-lived concept.
• 77:00 - 81:00: Questions and Comments. Discusses agents built on large language models (LLMs): the real power comes when an LLM can think and act autonomously with tools, such as function calls that reach the outside world or integrations with third-party services.
• 81:00 - 87:00: Cosmos DB API and Vector Search. Explores how internal services, whether monolithic or microservices, are reached via function calls: the application declares a list of functions, and on each prompt the LLM determines the appropriate function for the specific use case.
• 87:00 - 95:00: Natural Language to SQL Queries. Covers converting natural language into SQL and how a model that autonomously uses tools acts as an agent, with a vector or transactional database as memory, and why knowing the right function arguments keeps the decision flow structured.
• 95:00 - 100:00: Durable Functions: Fan-out/Fan-in Pattern. Discusses the fan-out/fan-in pattern in Durable Functions and how feeding conversation history back to the model improves its understanding, part of turning a model into an agent.
• 100:00 - 105:00: Wrapping Up and Closing Remarks. Closes with the move from a single agent to a multi-agent system: the single agent handled only simple scenarios, and real business cases are complex, motivating the shift to a more robust multi-agent approach.

On .NET Live - Multi-agent workflow using Azure Durable Functions and Semantic Kernel Transcription

• 00:00 - 00:30 [Music]
• 00:30 - 01:00 [Music]
• 01:00 - 01:30 [Music]
• 01:30 - 02:00 [Music]
• 02:00 - 02:30 [Music]
• 02:30 - 03:00 Welcome, .NET friends. It's hard to
• 03:00 - 03:30 believe February is coming to a close this week, but here we are, as Frank points out in the chat, warming up the bits. With the end of February hopefully comes warmer weather, and we do our best to warm up the bits here on the On .NET Live show. If you're tuning in for the first time, our goal on the show is to empower all of you, our .NET community members, to achieve
• 03:30 - 04:00 more. So who are we, and who are we talking to today? I'm your host, Scott Addie, joined by co-hosts Frank Boucher and Cam Soper. I'd like to welcome today's guest, Divakar Kumar. Divakar, could you please briefly introduce yourself to the audience? Hello everyone; first of all, thanks for having me here. It's such a great honor for me to speak here, and
• 04:00 - 04:30 welcome, everyone. I'm Divakar, and I currently work as a technical architect at FOF. I have been in this field for more than 10 years, but the beauty of it is that if you had asked me a couple of years ago anything related to AI, I would have told you I had literally zero knowledge of it. Yet here I am, speaking on an AI topic, and I'm a Microsoft MVP in AI. The reason I say this is not to brag but to
• 04:30 - 05:00 give you confidence: developers like us are going to play a crucial role in the AI space, because it is not the data scientist who is going to promote an AI application from POC to enterprise-grade application; it is the developer. That is what we are going to see today. I'm going to walk you through the journey I went through and how I started to build this open-source application,
• 05:00 - 05:30 and then we will see the code part of it. That sounds great. A lot of us are still trying to get up to speed on all things AI, so for me personally it'll be fascinating to hear about your journey and what you've learned. I'd say let's dig in. So, this is just a short intro, which I have already given. If you are interested in the
• 05:30 - 06:00 blogs that I write, you can visit this link, and below is my GitHub handle; the same goes for LinkedIn. With that, let me give you a brief introduction to how I started this journey. Like every one of us, I started interacting with ChatGPT one day, and I was fascinated by the results I received from the model. That is when I downloaded a Postman collection, because that is what we have been doing
• 06:00 - 06:30 our entire lives as developers: interacting over the HTTP protocol. So I downloaded it and made use of the POST endpoint for chat completions; I sent in a request payload and received a response. That is how the journey started. While doing that, I came to know about two things. One is that in the request payload you send something called a prompt, which can be a system prompt or a user prompt.
• 06:30 - 07:00 When I say system prompt, that is where you define a persona for the model, and if you want to set a goal for the model, you do that in the system prompt as well. Then there is the user prompt, where you ask your queries. With each request you send to the LLM, you send along these details, the system prompt and the user prompt, and you get back a response that they call a completion.
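
For reference, here is a minimal sketch of that raw chat-completions call in C#, matching the Postman experiment described above. The resource name, deployment name, key variable, and API version are placeholders, not values from the show.

```csharp
using System.Net.Http;
using System.Net.Http.Json;

// Raw Azure OpenAI chat-completions call: a system prompt (persona/goal)
// plus a user prompt go in; a "completion" comes back.
var http = new HttpClient();
http.DefaultRequestHeaders.Add("api-key", Environment.GetEnvironmentVariable("AOAI_KEY"));

var payload = new
{
    messages = new object[]
    {
        new { role = "system", content = "You are a helpful travel assistant." }, // system prompt
        new { role = "user", content = "Plan a trip from Chennai to Goa." }       // user prompt
    },
    max_tokens = 400
};

var response = await http.PostAsJsonAsync(
    "https://<resource>.openai.azure.com/openai/deployments/<deployment>/chat/completions?api-version=2024-06-01",
    payload);

// The completion text lives at choices[0].message.content in the response body.
Console.WriteLine(await response.Content.ReadAsStringAsync());
```
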
• 07:00 - 07:30 This is how I started, and then I began to augment knowledge into these prompts, because as you know these models are trained on old data; they don't have any information about recent happenings, and they don't know anything about your private information either. So obviously, when you ask the model about those things, it will
• 07:30 - 08:00 hallucinate. What you need to do is provide that information along with the system prompt; then the model can give you a better response. So the next thing I worked with was Retrieval Augmented Generation. The reason we went with Retrieval Augmented Generation is the context window: all these models have a context window beyond which you can't dump knowledge, and once you dump more
• 08:00 - 08:30 and more knowledge into it by way of the prompt, it tends to give you hallucinated answers, or the quality of the answer you expect from the model won't be good. You always need to make sure that whatever information you feed the LLM is relevant to what you are asking. That is when the RAG concept came in, where we make use of a vector database
• 08:30 - 09:00 in which you store the meaning of the data rather than storing the data as-is, as we have been doing our entire careers. We convert the data into a mathematical representation, an embedding, that holds its meaning, and we store it in this new type of database called a vector database. Then, when we ask the LLM a query, we convert that
• 09:00 - 09:30 query into embedding format as well, cross-check it against the data we have in the vector database, and pick only the relevant information from there. If you feed that into the LLM, it can give you a better response. This is a foundational concept I learned when I started this journey, and these foundational concepts are going to be around for a long time.
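
A minimal sketch of that retrieve-then-augment loop using Semantic Kernel's abstractions. The IVectorSearch interface below is a hypothetical stand-in for whatever vector store you query (Cosmos DB vector search in this project); the rest uses standard Semantic Kernel APIs, some of which are marked experimental.

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Embeddings;

#pragma warning disable SKEXP0001 // embedding abstractions are experimental in some SK versions

// Hypothetical vector-store abstraction; swap in your real database query.
public interface IVectorSearch
{
    Task<IReadOnlyList<string>> SearchAsync(ReadOnlyMemory<float> query, int top);
}

public static class RagSketch
{
    public static async Task<string> AskWithRagAsync(
        Kernel kernel, IVectorSearch vectorStore, string question)
    {
        // 1. Convert the user's query into an embedding.
        var embeddings = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
        ReadOnlyMemory<float> queryVector = await embeddings.GenerateEmbeddingAsync(question);

        // 2. Pull only the chunks relevant to the question from the vector store.
        IReadOnlyList<string> chunks = await vectorStore.SearchAsync(queryVector, top: 3);

        // 3. Feed those chunks to the model alongside the question.
        var chat = kernel.GetRequiredService<IChatCompletionService>();
        var history = new ChatHistory(
            "Answer using only the context below.\n" + string.Join("\n---\n", chunks));
        history.AddUserMessage(question);

        var answer = await chat.GetChatMessageContentAsync(history, kernel: kernel);
        return answer.Content ?? string.Empty;
    }
}
```
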
• 09:30 - 10:00 Then I started to learn about agents, and that is where the real power comes in, because now the model you use can think on its own and act with the help of certain tools you provide. When I say tools, that could be a function call with which you interact with the external world, a third-party service you are interacting with, or
• 10:00 - 10:30 a monolithic service or microservice you have within your organization. So now it has the capability to reach the external world, though not directly: you send the list of functions defined in your application along with the prompts in the request payload, and the LLM tells you which function is the right one to pick for that particular use case, and
• 10:30 - 11:00 along with that it also tells you the arguments to call it with. That is how the flow happens within an agent: when your model thinks on its own and makes use of a tool to perform some action, you can call that model an agent. This happens with the help of memory; as we saw earlier, you need a vector database, or any transactional database, where you store data that can be used for
• 11:00 - 11:30 retrieving past conversations, because if you feed the LLM the list of previous conversations you had, it has better knowledge of what you are trying to ask. If you are doing it this way, you can call your model an agent. This is one of the main concepts I learned when I started this journey.
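
Semantic Kernel makes that tool loop concrete: you expose functions as a plugin and let the model decide when to invoke them. A minimal sketch, assuming an Azure OpenAI deployment; the weather plugin and all names are illustrative, not from the repo.

```csharp
using System.ComponentModel;
using Azure.Identity;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// An illustrative tool the model can choose to call.
public class WeatherPlugin
{
    [KernelFunction, Description("Gets the current weather for a city.")]
    public string GetWeather([Description("City name")] string city)
        => $"Sunny, 31°C in {city}"; // stand-in for a real service call
}

public static class AgentSketch
{
    public static async Task<string> RunAsync(string question)
    {
        var builder = Kernel.CreateBuilder();
        builder.AddAzureOpenAIChatCompletion(
            deploymentName: "<deployment>",
            endpoint: "https://<resource>.openai.azure.com",
            credentials: new DefaultAzureCredential()); // token-based auth, no keys
        builder.Plugins.AddFromType<WeatherPlugin>();
        Kernel kernel = builder.Build();

        // The model picks the function and its arguments; SK invokes it.
        var settings = new OpenAIPromptExecutionSettings
        {
            ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
        };
        FunctionResult result = await kernel.InvokePromptAsync(question, new(settings));
        return result.ToString();
    }
}
```
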
• 11:30 - 12:00 Then comes multi-agent. The reason we went with multi-agent is that a single agent worked fine only until we had to handle a complex use case; it worked fine for the simpler use cases, but the businesses around us are never simple, right? We are going to handle complex cases, and when a single agent handled complex use cases it was failing drastically; it could not handle those kinds of complex scenarios. That is
• 12:00 - 12:30 when we segregated the responsibilities to individual agents and formed a multi-agent framework. By doing this, each agent has a single responsibility to handle, and when we did this we got better results. That is also when a lot of frameworks came up, like AutoGen and a few others; these are different
• 12:30 - 13:00 multi-agent frameworks that you can deploy on your infrastructure, and as we saw earlier, everything else remains the same: you make use of tools, also known as plugins, you interact with third-party services or a monolithic or microservice layer, and on top you have a database, your normal transactional database. Alongside that you now also need a vector store, because that is what we saw
• 13:00 - 13:30 in the RAG concept: you need to store the meaning of the data to retrieve the relevant information from the database. But we are talking about developers, right? We always have this developer mindset, so we tend to focus on a few key areas; we want high scalability, high availability, lots of -ilities. We need to build a solution that is highly scalable but at the same
• 13:30 - 14:00 time highly performant, and when we choose the service we want to deploy or host, or the database, it should be cost-effective and highly resilient. How are we handling failures: are we handling them gracefully, have we implemented a retry mechanism? Because now we are dealing with multiple agents, a
• 14:00 - 14:30 failure could happen anywhere. Are we maintaining checkpoints? We need to know from where to continue again, so how are we going to deal with that? These are several things you need to think about when designing this kind of architecture. And finally comes security and monitoring: if you have done all of this but deploy the solution without monitoring, you are going to have a hard time when you promote it to production. You always need a
• 14:30 - 15:00 tracing mechanism within your architecture, because that is how you get an end-to-end visualization of what is happening behind the scenes. So what I did then was initiate a mail chain with other MVPs in AI, with the subject "multi-agent or modular monolith", because coming from a developer
• 15:00 - 15:30 background, when I saw these multi-agent frameworks all I could see was a modular monolithic architecture: you have segregated the responsibility to individual agents, but in turn you are deploying it as a single package onto a single piece of infrastructure. So if you want high scalability, you need to scale up the instance instead of scaling out. That can work for most use cases, but it can
• 15:30 - 16:00 also be problematic for some. So I proposed an initial architecture with Azure Durable Functions showing how we could make sure each of those agents can scale to any extent, with high performance and high resilience, because you get that resiliency out of the box with Durable Functions. Sooner or later this got more traction, and there was a lot of
• 16:00 - 16:30 feedback I received from the MVP mail chain; that is when I started to work on this open-source project. There was one other incident, a Cosmos DB product-group meeting, where we were discussing design patterns related to Cosmos DB, and I raised a question around the materialized view pattern: can't we implement a semantic layer using the materialized view pattern? The reason I brought up this topic is to ensure we get highly
• 16:30 - 17:00 relevant information back from the LLM, because it is no longer just a keyword search you run against the database; that might work for some use cases, but you also need to make sure you are querying on the semantics of the data. Unless and until you have a semantic layer on top of your transactional data, you won't be able to get better output from the LLM. That is when I started this
• 17:00 - 17:30 conversation with the Cosmos DB team. Mark was the product manager of Cosmos DB, and he helped me go through certain references; if this project is in such a stable state, I would say it is because of him and the feedback I received from him and the others on the mail chain. That is when I started to build this durable multi-agent
• 17:30 - 18:00 architecture. Here you can see there is only one difference from the architecture we saw before: now all these agents are deployed onto different infrastructure, which gives you a highly scalable and highly resilient environment; once we look at the end-to-end architecture you will get a better understanding of that. And now you also have the vector and transactional database combined
• 18:00 - 18:30 together, because I'm using Cosmos DB as my database layer. What happens in Cosmos DB is that alongside your transactional data you can also store the vector embeddings, so you don't need to rely on a separate database just for storing embeddings. You can keep them in your single database; you don't need a different database and a separate data pipeline to push that data to
• 18:30 - 19:00 those data layers. All I have is one single database layer, Cosmos DB, which acts as the transactional DB and also as the vector database. On top of that I have a semantic layer, which stores these embeddings along with the transactional data, and I also have a Microsoft Fabric integration with this application. Out of the box, what you get once you clone this
• 19:00 - 19:30 project is different kinds of implementations: one is the custom multi-agent implementation I built, and another is a single agent, so you can compare the performance between the two. I have also made use of certain frameworks already out there, like AutoGen, and the final thing I have is a real-time agent implementation where you can have speech-to-speech communication with the LLM.
• 19:30 - 20:00 So before getting into this architecture, let me know if there are any queries, so we can address those and then move on. Looking at the questions that have come in, the one that sticks out to me is the one at the forefront of my mind: you had a slide where you were talking about cost-effectiveness, and one of our viewers asked about cost, probably referring to, if they were to create a similar
• 20:00 - 20:30 architecture, what are they looking at in terms of, let's say, monthly cost, roughly? Yeah, maybe I can explain with this architecture itself. What I have used here as my hosting service is Durable Functions, which, as you all may know, is a serverless architecture, so you only pay for the time you are using it. And as the database I'm using Azure Cosmos
• 20:30 - 21:00 DB; as I said before, I'm not relying on a separate vector store, say Azure AI Search for that matter. What I'm using is Cosmos DB, and alongside the data I'm storing the vectors as well. So there is cost-effectiveness across the layers I have implemented in this architecture; right from the application layer to the database layer, you can see the services chosen here are mostly cost-effective
• 21:00 - 21:30 ones. Got it. I guess one way to phrase this: if I wanted to create something like this, am I able to use entry-level SKUs of these services, or do I really need to ramp up to more robust SKUs of the various services? For the POC that has been built, I have been running on the
• 21:30 - 22:00 Consumption plan; I haven't created any Premium plan for this one, and it handled things pretty well, and the same goes for the database: there is no special SKU I have used. Even for Microsoft Fabric, since we are all on the free trial right now, I'm using the free trial as well. But beyond that, even on a small SKU, you will be able to handle plenty of traffic with this type of
• 22:00 - 22:30 architecture. Got it. Frank, Cam, did you want to ask anything? I didn't want to ask anything, I just wanted to say I really liked that slide earlier that demonstrated how RAG works, because it had never clicked for me before that what is happening there is a search, and the results of the search are being injected into the prompt that gets
• 22:30 - 23:00 fed to the LLM. I didn't know that, so thank you, Divakar, that really helped. I didn't have any questions. So with that, let's see what this architecture looks like from an end-to-end perspective. For the front end I have been using a Blazor application, and whenever you send a request to the Durable Functions it arrives as an HTTP trigger. If you are familiar with
• 23:00 - 23:30 Durable Functions, and Azure Functions in general, you should be aware of the different triggers; the HTTP trigger is one way you can interact over the HTTP protocol. Inside that I have different orchestrators, which is one of the patterns you can use within Azure Durable Functions. What I have within the Durable Functions is a couple of different sub-orchestrators: one that helps me handle the travel-agency use cases, and another where I'm
• 23:30 - 24:00 handling the general queries, like the frequently asked questions or anything related to my past bookings; those are handled by a separate sub-orchestrator. The way this works is that each of these agents is an individual activity trigger, which you can see here, and that allows you to scale out to any extent. This has been implemented with the help of Semantic Kernel.
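
A minimal sketch of that shape in the .NET isolated worker model: an HTTP trigger schedules an orchestration, and each agent is an activity function that Durable Functions checkpoints and scales independently. Function names and the agent sequence are illustrative, not the repo's actual ones.

```csharp
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.DurableTask;
using Microsoft.DurableTask.Client;

public static class TravelOrchestration
{
    [Function("HttpStart")]
    public static async Task<HttpResponseData> HttpStart(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req,
        [DurableClient] DurableTaskClient client)
    {
        string query = await req.ReadAsStringAsync() ?? "";
        string instanceId = await client.ScheduleNewOrchestrationInstanceAsync(
            nameof(TravelOrchestrator), query);
        // Returns 202 Accepted with status URLs: the "disconnected" scenario.
        return await client.CreateCheckStatusResponseAsync(req, instanceId);
    }

    [Function(nameof(TravelOrchestrator))]
    public static async Task<string> TravelOrchestrator(
        [OrchestrationTrigger] TaskOrchestrationContext context)
    {
        string query = context.GetInput<string>() ?? "";

        // Each agent is an activity; state is checkpointed between calls,
        // so a failure resumes here rather than starting over.
        // (For fan-out/fan-in, start several activities and await Task.WhenAll.)
        string flights = await context.CallActivityAsync<string>("FlightAgent", query);
        string weather = await context.CallActivityAsync<string>("WeatherAgent", flights);
        return await context.CallActivityAsync<string>("BookingAgent", weather);
    }

    [Function("FlightAgent")]
    public static string FlightAgent([ActivityTrigger] string input)
        => $"flights for: {input}"; // the real agent would invoke Semantic Kernel here
}
```
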
• 24:00 - 24:30 For maintaining the prompt files, which we will see in detail once we get into the code, I use a cool extension I came across called Prompty. With these things you can ensure you are building an enterprise-grade application, because the first and foremost thing you get out of these libraries and SDKs is higher maintainability of the code. I haven't seen it in other frameworks or abstraction layers like LangChain
• 24:30 - 25:00 and others, but if you are using Semantic Kernel, one of the main advantages you get is the maintainability of the code, plus obviously the abstractions they have built, with the single core object they call the kernel. I also have a separate pub/sub channel; the reason I have it is to give the user a better experience, because once you send the HTTP trigger to Durable Functions it will
• 25:00 - 25:30 return a 202 Accepted result, which is a kind of disconnected scenario. If you want to mimic a request-reply pattern, you can have this kind of pub/sub mechanism: once you are done with whatever activities you are doing with these multiple agents, you send a notification to the channel, which is subscribed to by your front-end application.
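
A minimal sketch of that notify-on-completion step. The show doesn't name the exact pub/sub service, so Azure Web PubSub here is an assumption; the hub, user ID, and message are placeholders.

```csharp
using Azure.Messaging.WebPubSub;

// After the orchestration completes, push the result to the channel the
// Blazor front end subscribes to, mimicking request-reply on top of the
// disconnected 202 Accepted flow.
var pubsub = new WebPubSubServiceClient(
    Environment.GetEnvironmentVariable("WEBPUBSUB_CONNECTION")!,
    hub: "chat");

await pubsub.SendToUserAsync("user-123", "Your trip from Chennai to Goa is booked!");
```
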
• 25:30 - 26:00 That way the user knows what the conversation looks like, and the chat experience isn't interrupted just because you have a disconnected scenario in your back-end application. The next thing is that, with the help of the change feed, I am building a semantic layer. Change feed is a concept where, whenever you store data in Cosmos DB, there is a CDC mechanism on that stream, so whenever an insert or update
• 26:00 - 26:30 happens on the database's log, it notifies you, and the Azure Functions I have for this listen on this change feed trigger. With that, what I do is fetch information from a lot of different places: within a single database I can gather information from different containers, or it can be different microservices from which I fetch the data, and I feed it into the
• 26:30 - 27:00 semantic layer, so that whenever a person asks a query about things related to the travel agency, I can handle it seamlessly with the help of the semantic layer rather than pulling the information directly from a single table. Within a single table you can have column names understood only by you, because we all have this practice of picking arbitrary column names,
• 27:00 - 27:30 right? But it should be understandable to the LLM what kind of data you are storing in that specific column. For that you always need this kind of semantic layer, so that when you do natural-language-to-SQL conversion it gives a better result.
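
A minimal sketch of listening to the Cosmos DB change feed from an Azure Function (isolated worker) to populate such a semantic layer. Database, container, connection, and type names are placeholders.

```csharp
using Microsoft.Azure.Functions.Worker;

public record Booking(string id, string UserId, string Destination);

public static class SemanticLayerBuilder
{
    // Fires on every insert/update in the monitored container (CDC-style).
    [Function("OnBookingChanged")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "travel",
            containerName: "bookings",
            Connection = "CosmosConnection",
            LeaseContainerName = "leases",
            CreateLeaseContainerIfNotExists = true)] IReadOnlyList<Booking> changes)
    {
        foreach (Booking booking in changes)
        {
            // Here you would enrich the record (join user/flight data),
            // generate an embedding for its text, and upsert it into the
            // semantic-layer container alongside the transactional data.
        }
    }
}
```
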
• 27:30 - 28:00 Finally, I also have database mirroring, which is something new in town, with Microsoft Fabric: you can just mirror the data you have in Cosmos DB; you don't need a separate sync pipeline just to perform analytics. You mirror the data, and once it lands in OneLake you can use any number of services, like a data warehouse or a lakehouse, and you can also use Power BI to visualize it. That is what I have been using here. I believe I covered most of the areas; the other integration
• 28:00 - 28:30 I have is the Jaeger integration. I was talking about the monitoring aspect, right? You need monitoring, and for that I have this tracing mechanism so you can visualize the end-to-end flow, right from the start of the request the user posts to the LLM until the response it generates, with all the multi-agent conversations in between. Everything is traced by this
• 28:30 - 29:00 Jaeger monitoring tool, but you can swap it for any monitoring tool you have, because at the end of the day, if your APM supports the OTLP protocol, you can make use of Grafana, Prometheus, or any sort of APM. And that's it for the architecture; maybe now we can see the code walkthrough. Yeah, let's see some code.
• 29:00 - 29:30 Let me first show you how the repository structure looks. Once you clone this repository, you will find certain folders I segregated: this one is where I store my pipelines, and the main solution structure is inside this durable-agent folder. Here is where our Azure Durable Functions reside, and alongside that we have the services, just to mimic an enterprise-grade application;
• 29:30 - 30:00 obviously you would have a microservices architecture, so to mimic that I have everything in a single repository. Here you can find that I currently have four microservices: one for the booking service, another for the flight service, one for the user service, and one for the weather service. With that, let's see some code inside the Durable Functions. I just want to cover a few parts
• 30:00 - 30:30 related to Semantic Kernel and how it helps us write better code; then I will show you how the Prompty extension helps you maintain those prompt files; then we will see the code we have for the agent implementations; and then we will also see how OpenTelemetry has been integrated. First, let's start with the OpenAI services. The beauty of it is that,
• 30:30 - 31:00 instead of tightly coupling to any specific AI provider, you can have an abstraction layer on top of it, provided by this IChatCompletionService. It is used whenever you interact with the model; the chat completion service is the one that helps you get the response back in text format. And there is another interface called
• 31:00 - 31:30 ITextEmbeddingGenerationService, which allows you to generate embeddings from the text you pass to it; this is what we saw in RAG, where you need to convert the data into embedding format. So this is where I use these two interfaces. Right now I have only one implementation, the Azure OpenAI completion service, but there could be multiple implementations as well, so you could
• 31:30 - 32:00 dynamically switch between those implementations at runtime. I also have a specific place that acts as a single point of contact; this is another layer I have internally. The reason I have this layer is to get a better abstraction on top of what we already have: since we are going to interact with multiple agents, and each agent could use a different model, you can have specific
• 32:00 - 32:30 business logic here. Whenever this is called from a specific agent, say the booking agent, you can have logic here to switch between different models; you just specify which model ID you want to use, and then you can have different settings for that specific agent, such as different temperature settings. These can differ for different
• 32:30 - 33:00 kinds of agents. That is the level of flexibility you get out of Semantic Kernel; you don't need to go to multiple places just to plug these things in, everything you need is in a single place. That is one example I can show you with respect to Semantic Kernel.
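
A minimal sketch of what that abstraction looks like when the kernel is registered, assuming the standard Semantic Kernel builder APIs; deployment names and endpoints are placeholders, and the embedding connector is marked experimental in some SK versions.

```csharp
using Azure.Identity;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Embeddings;

#pragma warning disable SKEXP0001, SKEXP0010 // embedding APIs are experimental

var builder = Kernel.CreateBuilder();

// Both connectors register behind provider-neutral interfaces, so calling
// code depends on IChatCompletionService / ITextEmbeddingGenerationService,
// never on Azure OpenAI directly; swapping providers is a local change.
var credential = new DefaultAzureCredential();
builder.AddAzureOpenAIChatCompletion(
    "<chat-deployment>", "https://<resource>.openai.azure.com", credential);
builder.AddAzureOpenAITextEmbeddingGeneration(
    "<embedding-deployment>", "https://<resource>.openai.azure.com", credential);

Kernel kernel = builder.Build();

// Consumers resolve the abstractions, not the concrete provider.
var chat = kernel.GetRequiredService<IChatCompletionService>();
var embeddings = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
```
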
• 33:00 - 33:30 One thing I really appreciated, I think it was in your Program.cs file, and I wanted to point it out: you're actually using the Azure Identity library, specifically DefaultAzureCredential, to avoid key-based authentication; you're taking it a step further and doing token-based authentication against Entra ID, which is great to see. Exactly. What we are using here is managed identity. Once you deploy this to the
• 33:30 - 34:00 Functions, you need to set the respective RBAC roles for those Azure Functions, but if you are executing this from your local environment, you just need to make sure that whatever account you are currently logged in with in Visual Studio has those permissions as well to interact with these models. That way, when you run this locally you can also use managed identity, so you get a seamless experience
• 34:00 - 34:30 in both the local environment and the deployed environment. Okay, with that, let me go to the prompts; let me expand these agent folders. In each of those agents I have a prompty file, and if I click on this one you can see
• 34:30 - 35:00 this is essentially a Prompty extension file. I will come to how we integrate this with Semantic Kernel in a bit, but what we saw earlier, the system prompt and the user prompt, those are the same concepts you provide in this prompty file. Instead of dumping everything into the HTTP client alongside your request body, you now have a different mechanism, because you have a separate
• 35:00 - 35:30 file for holding that information. For the booking agent I have this separate file, and there are different prompt templates; the templating language can vary. Here I believe I am using the Liquid prompt template, and there are others, like Handlebars templates, as well. By templating language what I mean is that at runtime these values, anything surrounded with curly braces,
• 35:30 - 36:00 will be replaced by Semantic Kernel for you when you provide it with the arguments. And here you can have a lot of logic as well: a for loop, an if condition, anything you want, right in this prompty file, and you can also state that the output should be in a specific format and it will do that on its own.
• 36:00 - 36:30 I have a question here for you: I see the extension on this file is .prompty; does Visual Studio provide any kind of assistance or colorization when you use that extension? Sadly, no; I have been using Visual Studio for my development, but you can use Visual Studio Code, which has this Prompty extension. Download this extension;
• 36:30 - 37:00 this is the Prompty extension I was talking about. Once you download it, you can see Visual Studio Code recognizes the prompty file, attaches the Prompty logo to it, and you can also see it gets colorized and the content is highlighted. Yeah, that's Visual Studio Code for you. There is one more good thing about Prompty: you can also preview the file,
• 37:00 - 37:30 just like you preview a markdown file. Suppose you have some static images here; you can visualize them in the prompty preview. And it doesn't end there: you can also run it before trying it against the deployed service, which helps you work in the local environment. If I just click run, what it is going to do
• 37:30 - 38:00 is use the local credential, the one with the managed-identity permissions, and try to run this prompt file. Since I don't have specific values for the user information, let me show you that; go to verbose mode. Here you can see the role is system, and in the content it has not
• 38:00 - 38:30 replaced my curly braces, because I haven't said which values to bind them with. But if you do that and run it from Visual Studio Code, you can see the response you get from the LLM before even trying it in the deployed environment. You can handle all of this with a single prompty file; you don't need your entire set of .NET code along with it, you
• 38:30 - 39:00 just need a single prompty file and you are good to go. Those are some of the features the prompty file gives you. Going back here, let me go to this prompty service. I have an abstraction for this one, just to keep everything in a single place. What I do here is read all the contents of the prompty file and convert them into
• 39:00 - 39:30 the prompt templates I mentioned before; the Liquid prompt template is what I'm using, but you could use any prompt template factory you are familiar with. Here you can see that while rendering I pass the kernel, and along with it I pass the arguments as well; these are the runtime variables that replace those curly braces, and then you send the result on to the calling application. Suppose this is one
• 39:30 - 40:00 of the calling points where I use this prompty service: I just mention which prompt file it needs to use, pass the kernel along with it, and also pass the context, each of those runtime variables. The context is replaced with the user query and the user ID, and we also pass the chat history so the model knows about the previous conversation we had, along with other information about the currently logged-in person.
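
A minimal sketch of that render step, assuming Semantic Kernel's Liquid template package (Microsoft.SemanticKernel.PromptTemplates.Liquid); the template string and argument names are illustrative, not the repo's actual prompty service.

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.PromptTemplates.Liquid;

public static class PromptyRenderSketch
{
    // Renders a Liquid-style template body (as read from a .prompty file),
    // replacing {{...}} placeholders with runtime arguments.
    public static async Task<string> RenderAsync(
        Kernel kernel, string template, KernelArguments arguments)
    {
        var config = new PromptTemplateConfig(template)
        {
            TemplateFormat = "liquid"
        };

        var factory = new LiquidPromptTemplateFactory();
        if (!factory.TryCreate(config, out IPromptTemplate? prompt) || prompt is null)
            throw new InvalidOperationException("Unsupported template format.");

        return await prompt.RenderAsync(kernel, arguments);
    }
}

// Usage sketch:
// await PromptyRenderSketch.RenderAsync(
//     kernel, "Book a trip for {{context}}.", new() { ["context"] = "user query" });
```
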
• 40:00 - 40:30 With that, let me show you a bit of the OpenTelemetry side. Okay, let me start from the Program.cs OpenTelemetry setup. You have two different ways of instrumenting: one
• 40:30 - 41:00 is automatic instrumentation, where with libraries you inject into your services you get the traces out of the box. But we are developers, right? We always want flexibility, so we tend to do it manually, and that is what I have done in this open-source project as well. It gives you better flexibility, because since we are dealing with multiple agents, you
• 41:00 - 41:30 need to know what kind of tokens you are consuming at each agent and each tool call, because you always need to make sure you stay within the context window we saw at the start of the slides, right? You need to ensure you are tracking those metrics. If you use automatic instrumentation, I believe you won't be able to capture those things, so you need to configure this with manual instrumentation. What I did
• 41:30 - 42:00 here is use the tracer provider builder SDK, and I'm using the OTLP protocol. As I said before, whatever APM you use, if it follows the OTLP protocol, which is a standardized protocol for APMs, you can export whatever logs, metrics, and traces you push from these applications to those OTLP collectors. Alongside that I'm also
• 42:00 - 42:30 using the Azure Monitor trace exporter. This works differently from the OTLP exporter; there is a separate SDK for it with which you can connect directly to Azure Monitor. You need to specify the name of the service you are instrumenting, and you can set up sampling, which could be different for the production environment; right now I have an always-on sampler so I get the full picture, since I'm just dumping all the verbose traces into this APM.
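
A minimal sketch of that registration, assuming the OpenTelemetry .NET SDK plus the Azure.Monitor.OpenTelemetry.Exporter package; the service name, source name, and connection-string variable are placeholders.

```csharp
using Azure.Monitor.OpenTelemetry.Exporter;
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

// Export to both an OTLP-compatible collector (Jaeger, Grafana, etc.)
// and Azure Monitor from the same tracer provider.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("durable-agents"))
    .AddSource("DurableAgents.Tracing")  // ActivitySource used for manual spans
    .SetSampler(new AlwaysOnSampler())   // verbose; tune sampling for production
    .AddOtlpExporter()                   // endpoint comes from OTEL_* env vars
    .AddAzureMonitorTraceExporter(o => o.ConnectionString =
        Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING"))
    .Build();
```
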
• 42:30 - 43:00 That is how you register the OpenTelemetry SDK, and I also have the metrics provider. Along with that, let me show you how you can do this manual instrumentation. Here you can see there are different tracing handlers: one is for the activity trigger tracing, and the
• 43:00 - 43:30 other is for the HTTP interceptors. An HTTP interceptor is called right before you use the HTTP client, so whenever you interact with a third party or with the microservices, you need to make sure you bind those requests with the OpenTelemetry context, so that when the microservices receive the request they know, okay, this is the
• 43:30 - 44:00 correlation ID I need to bind with, and you get the end-to-end view. Let me show you one example. Activity trigger tracing is one thing I created here. What I've done is get the parent tracing details; there are three things you need to capture from the parent span you are
• 44:00 - 44:30 creating under: one is the trace ID, another is the span ID, and the third is the trace flags. With these three parameters you can create a child span, and when that child span is created and the subsequent child spans are tied to the parent span, you can see the visualization of the
• 44:30 - 45:00 entire interaction. As for how I bind this: I have a delegate handler that I use here; let me show you. This is one of the agents, the booking agent, and what I have here is the entire business logic wrapped in a
• 45:00 - 45:30 delegate; then I use the tracing handler we saw before and execute the delegate with its help. What happens is that before it hits the business logic, I bind the child span with the trace IDs, so you get this correlation.
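
A minimal sketch of that parent/child span stitching using System.Diagnostics, which is how manual OpenTelemetry spans are created in .NET; names are illustrative.

```csharp
using System.Diagnostics;

public static class AgentTracing
{
    // Must match the source registered with .AddSource(...) above.
    private static readonly ActivitySource Source = new("DurableAgents.Tracing");

    // Rebuilds the parent context from the three captured values (trace ID,
    // span ID, trace flags), then runs the agent's logic inside a child span.
    public static async Task<T> TraceAsync<T>(
        string spanName, string traceId, string spanId, Func<Task<T>> agentLogic)
    {
        var parent = new ActivityContext(
            ActivityTraceId.CreateFromString(traceId),
            ActivitySpanId.CreateFromString(spanId),
            ActivityTraceFlags.Recorded);

        using Activity? span = Source.StartActivity(
            spanName, ActivityKind.Internal, parent);

        T result = await agentLogic();
        // e.g. span?.SetTag("gen_ai.usage.total_tokens", totalTokens);
        return result;
    }
}
```
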
• 45:30 - 46:00 Maybe if I run this application we can get a better understanding of what we are about to see. Let me run it; I'm already running the Jaeger instance,
• 46:00 - 46:30 which is an exe I downloaded, though you could also use a Docker instance for that. Let me pick this one. So while you're working on that, the viewers might be wondering, hey, where do
• 46:30 - 47:00 I see this code; is this available on GitHub for folks to take a look at? Yes, it is; maybe I can drop the link here. Yeah, if you put that in the People tab we can get it to the viewers. You read my mind, Scott, thank you. Prompty is super cool; I knew the name, but I learned what it really
• 47:00 - 47:30 does at the end of January during the AI Tour, where I was doing a workshop, and it's pretty cool. Sometimes when you run the functions locally, that can take a moment.
• 47:30 - 48:00 Wonderful. So for folks that want to follow along at home and dig deeper into what you're seeing here, take a look at the GitHub repo that I just shared. And yes, Jeremy, this is something you
• 48:00 - 48:30 would have seen Seth demonstrate; I know Seth has spent a lot of time working on Prompty.
• 48:30 - 49:00 Okay, there we go, running a live demo while streaming. Woo! Yeah, that's always challenging. So,
            • 49:00 - 49:30 this is the application that I was talking about um let me pick one of the earlier conversation that we did this one yes I was asking this llm to plan me a trip from tenai to Goa on so and so dat so what it did is like uh you could see the interaction at the right hand side so first we try to communicate with the flight agent uh to look at um what are the flight details available for the specific date so it
            • 49:30 - 50:00 form two specific date for that uh for the departure and for the return flight and then it try to initiate a call to a weather agent so there it uh gets to know about the weather of those uh different water uh times and then it tries to give you uh a better uh time for you to go to uh those locations so it it then finally reached out to the booking agent for you to book the script so these are kind of a multiple agent interactions
            • 50:00 - 50:30 that is happening behind the scenes so what we will do is like let's let's initiate one call so let's see how it goes since um everything is in the server serverless mode as I said before uh those services will be on the sleep mode now so it will take some time to wake up so hope it shouldn't take more time let me open the eager U as
            • 50:30 - 51:00 [Music] well okay so uh yeah now it is interacting with the fight agent so now you can see uh I'm just keeping the uh user interactive because now you will be able to see like what are the interactions that we have behind the scenes so each for those loading screens what I uh used is like I used to uh give the information to the llm that these are the destination that this particular user is trying to travel to so it will
            • 51:00 - 51:30 just pick one cool thing about that particular location and show it here, so you have a better user experience. Now it has reached the booking agent, our final agent. Once it's done interacting, you should get the confirmation mail as well; we will see that in a bit. Cool, so now the booking has been done,
            • 51:30 - 52:00 and as before, you can see there were three agents involved in this entire conversation. Let's see if this has been captured in our OpenLIT instance. Cool, you can see the entire flow has been captured here, right from the place where we
            • 52:00 - 52:30 initiated the web call. It traveled through the orchestrators; as I said, we have different orchestrators, one for the travel bookings and the other one for handling the frequently asked queries. It reached out to the router orchestrator, which knows which orchestrator it needs to hand off to, and that is how it went to the travel orchestrator. The flight agent is then the first agent it needs to ask for the right flight to pick.
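For context, a router orchestrator like the one described can be sketched in the Durable Functions .NET isolated model as below; the intent classification step and all the activity and orchestrator names are illustrative assumptions, not lifted from the repo:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.DurableTask;

public static class RouterOrchestrator
{
    [Function(nameof(RouteRequest))]
    public static async Task<string> RouteRequest(
        [OrchestrationTrigger] TaskOrchestrationContext context)
    {
        string request = context.GetInput<string>()!;

        // An activity (for example, an LLM call) classifies the request first.
        string intent = await context.CallActivityAsync<string>("ClassifyIntent", request);

        // The router then hands off to the matching sub-orchestration.
        return intent == "travel-booking"
            ? await context.CallSubOrchestratorAsync<string>("TravelOrchestrator", request)
            : await context.CallSubOrchestratorAsync<string>("FaqOrchestrator", request);
    }
}
```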
            • 52:30 - 53:00 There are different plugins that I'm using internally here. Let me zoom in a bit; hopefully this is better now. Here you can see that inside the flight agent we made several tool calls. This is what we saw in the agent slide: when your agent is able to think on its own and perform
            • 53:00 - 53:30 actions based on the tools that you provide, then you can call your model an agent, right? So this model chose the airport code plugin; that is what it should pick up. You can see the response data here as well: it resolved the departure city, found that the departure city is Chennai and the destination city is Goa, and you can see that response data here too. So now you have the entire
            • 53:30 - 54:00 picture of how this is happening. Here you can see it has reached out to the flight listing plugin: now that we have found the airport codes for those destinations, the next thing we need is what flights are available between them, so it called that endpoint, that tool. Then it started to interact with the weather agent.
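A rough sketch of what a Semantic Kernel tool like the airport code plugin could look like; the inline lookup is invented for illustration (a real implementation would query a data source):

```csharp
using System.ComponentModel;
using Microsoft.SemanticKernel;

public sealed class AirportCodePlugin
{
    [KernelFunction, Description("Resolves the IATA airport code for a city.")]
    public string GetAirportCode(
        [Description("City name, e.g. Chennai or Goa")] string city) =>
        city.Trim().ToLowerInvariant() switch
        {
            "chennai" => "MAA",
            "goa"     => "GOX", // simplified; a real lookup would hit a data source
            _         => "UNKNOWN"
        };
}

// Registering the plugin lets the model auto-invoke it as a tool call:
// kernel.Plugins.AddFromType<AirportCodePlugin>("AirportCodes");
```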
            • 54:00 - 54:30 Here we have multiple plugin calls to get the weather details for each of those destinations. One other cool thing I did here is, along with the response data, you can also see the token consumption right in your APM. You can see how many tokens you are consuming, so you always keep track of it. Just for the flight agent, I'm using this much
            • 54:30 - 55:00 in total tokens consumed; this is for the weather agent, and the final one reached the booking agent. So you get to know the entire workflow with the help of OpenLIT. Maybe if you have any queries we can take them; if not, I believe that's all I had for this demo.
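One hedged way to get per-agent token counts into an APM: copy the usage metadata from each chat completion onto the current trace span. The `"Usage"` metadata key is what the Semantic Kernel OpenAI connector emits, but its payload type varies by connector version, so treat this as a sketch rather than the project's exact code:

```csharp
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public static class TokenTracking
{
    // Requests a completion and tags the active span with the token usage
    // so the APM can show consumption per agent.
    public static async Task<ChatMessageContent> CompleteWithUsageAsync(
        IChatCompletionService chat, ChatHistory history, Kernel kernel)
    {
        ChatMessageContent result =
            await chat.GetChatMessageContentAsync(history, kernel: kernel);

        if (result.Metadata is not null &&
            result.Metadata.TryGetValue("Usage", out object? usage))
        {
            // Stringified because the payload type is connector-specific.
            Activity.Current?.SetTag("gen_ai.usage", usage?.ToString());
        }

        return result;
    }
}
```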
            • 55:00 - 55:30 Any questions in the chat here? I see we're trying to find the correct Scott; I'm trying to handle the ambiguous Scott exception over here. Was it the Great Scott or the Greater Scott? I forgot; there was a joke once where...
            • 55:30 - 56:00 So I guess I have a question if there aren't any. I know in your architecture slide you showed Cosmos DB. I'm wondering which API specifically you are using in Cosmos DB: is it NoSQL or something else? Yeah, that's a great question. Since I haven't shown that, let me show you that as well. I'm using NoSQL,
            • 56:00 - 56:30 and there is a vector search capability that has been introduced in the NoSQL API. Other APIs have this capability now as well: one is MongoDB vCore, and even in the PostgreSQL API you have this capability of interacting with vector search. But what I want to show you is the natural language to SQL service. You can find this in the source
            • 56:30 - 57:00 code. What I did here is: whatever query you ask the LLM, I convert that natural language into a SQL query, because certain queries are handled better by keyword search and certain queries are handled better with the help of the semantic layer that we built. So what I'm doing here is converting the natural language to SQL, and what I'm using for that is
            • 57:00 - 57:30 an endpoint. This is something you can see in Cosmos DB: if you are familiar with it, once you open a specific document you will also see a query option attached to it where you can prompt with natural language, and if you drill down into the developer tools you can see what network calls are happening behind the scenes. That is how I built the abstraction layer for my own purposes. You could also build
            • 57:30 - 58:00 this on your own by specifying what each keyword means: for whatever collection you have, you just need to specify the properties inside your collection and the responsibility of each of those properties, and then you can build the NL-to-SQL layer on your own. I'm just making use of the Cosmos SQL API endpoint to convert the natural language to SQL.
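A minimal sketch of that home-grown NL-to-SQL layer: describe the container's properties to the model and ask for a query back. The schema text below is illustrative, not the repo's actual container:

```csharp
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public static class NlToSql
{
    // Hypothetical schema description; each property is explained so the
    // model knows which keywords map to which fields.
    private const string Schema = """
        Container 'bookings' documents contain:
          - destinationCity (string): city the user traveled to
          - departureDate (string, ISO 8601): outbound flight date
          - status (string): 'confirmed' or 'cancelled'
        """;

    public static async Task<string> TranslateAsync(IChatCompletionService chat, string question)
    {
        var history = new ChatHistory();
        history.AddSystemMessage(
            "Translate the user's question into a single Cosmos DB NoSQL SELECT query. " +
            $"Use only these properties:\n{Schema}\nReturn only the SQL, no prose.");
        history.AddUserMessage(question);

        ChatMessageContent reply = await chat.GetChatMessageContentAsync(history);
        return reply.Content ?? string.Empty;
    }
}
```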
            • 58:00 - 58:30 That is one thing I'm using with the help of Durable Functions. Let me show you that one: I'm using the fan-out/fan-in pattern. As I said before, Durable Functions comes with a lot of different patterns; one of them lets you split a request across two different services, and those two services will be handled
            • 58:30 - 59:00 in parallel at the same time. So what I'm doing here is making a call to a semantic agent, where we convert the natural language to SQL, and sending a query to a vector search agent, which is where the data is stored in embedding format, with a repository handling that particular task. Both of these tasks run at the same time, and I also have a consolidator agent.
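The fan-out/fan-in pattern he names maps onto an orchestrator roughly like this (Durable Functions isolated worker model); the activity names follow the description above, but the exact code is an assumption:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.DurableTask;

public static class HybridSearchOrchestrator
{
    [Function(nameof(RunHybridSearch))]
    public static async Task<string> RunHybridSearch(
        [OrchestrationTrigger] TaskOrchestrationContext context)
    {
        string query = context.GetInput<string>()!;

        // Fan out: the NL-to-SQL agent and the vector search agent run in parallel.
        Task<string> semantic = context.CallActivityAsync<string>("SemanticAgent", query);
        Task<string> vector = context.CallActivityAsync<string>("VectorSearchAgent", query);
        await Task.WhenAll(semantic, vector);

        // Fan in: the consolidator merges or picks the better answer.
        return await context.CallActivityAsync<string>(
            "ConsolidatorAgent", new[] { semantic.Result, vector.Result });
    }
}
```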
            • 59:00 - 59:30 With the consolidator agent, I just ask it to respond back with the better response. Let me show you. This is one of the queries I asked earlier: have I visited any beach destination in the past? This could be handled better by
            • 59:30 - 60:00 vector search, because it holds the semantics of the data. All you are sending to the database is just the destination name, Goa or Chennai, but the LLM knows Goa is a beach destination, so it says yes, you have visited a beach destination in the past. But you can see here that the semantic agent fails in this case. That is the reason I'm sending two requests in parallel:
            • 60:00 - 60:30 it is a kind of hybrid search happening behind the scenes to give you a better response. There are other places where the semantic agent plays a better role than the vector search agent; maybe I can pick this one. I'm just asking it: can you list my past bookings? Your vector search could have a top-K cutoff or something like that, which will show you only one or two responses, right?
            • 60:30 - 61:00 But the natural language to SQL query does a keyword search, retrieves all the past bookings for you, and can explain what past bookings you made. So these are two places where I have used Cosmos DB. You can also find this implementation here: I trigger two different agents, one being the semantic agent;
            • 61:00 - 61:30 inside this agent I have different plugins. Under the plugins you can see the semantic layer plugin, where I'm getting the SQL query from the semantic layer, and there's one more plugin where I'm doing a similarity search, so there is a service that I call here.
            • 61:30 - 62:00 Based on the similarities, whatever query you send to the LLM is converted into embeddings and compared against the embeddings you have stored in your database, and you retrieve the top N from it. Here I had just one record returned from this repository, so it was failing in cases where it couldn't get all the data from the database. You always need this kind of hybrid search so that you can get more relevant information out of it.
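For reference, a sketch of a top-N similarity query against Cosmos DB NoSQL vector search, assuming a container with an indexed `embedding` vector property; the property and container names are illustrative:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class SimilaritySearch
{
    // VectorDistance is Cosmos DB NoSQL's built-in similarity function;
    // ordering by it returns the closest stored embeddings first.
    public static async Task<List<string>> TopNAsync(
        Container container, float[] queryEmbedding, int topN)
    {
        var query = new QueryDefinition(
                $"SELECT TOP {topN} c.destinationCity FROM c " +
                "ORDER BY VectorDistance(c.embedding, @queryVector)")
            .WithParameter("@queryVector", queryEmbedding);

        var results = new List<string>();
        using FeedIterator<dynamic> feed = container.GetItemQueryIterator<dynamic>(query);
        while (feed.HasMoreResults)
        {
            foreach (var item in await feed.ReadNextAsync())
                results.Add((string)item.destinationCity);
        }
        return results;
    }
}
```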
            • 62:00 - 62:30 Very interesting, awesome. Yeah, lots of good chatter here; some folks asking what Durable Functions is, and I see that was addressed already. Overall, lots of praise in the chat; folks like what they see, very well put together. But with that, we are unfortunately at time for today's show.
            • 62:30 - 63:00 I did want to give a shout-out to all of our viewers for spending the last hour of your day with us; we really appreciate your support and tuning in to hear what our guest has to share. Divar, huge kudos as well for sharing your knowledge and this awesome application you've put together. I know I learned a lot; hopefully others can say the same. As a reminder, today's show was recorded; we record all of our shows, and
            • 63:00 - 63:30 you can find those at dot.net/live; just navigate there in your browser. We hope to see you next time. Thanks again, everyone. Bye-bye. And sorry about that, it's not playing.
            • 63:30 - 64:00 [Music]