Chatbots, Memory, and LangChain

GenAI & LLMs | Video 9 | Part 1 | Mastering Chatbots & Memory with LangChain | Venkat Reddy AI Classes



    Summary

    In this video, Venkat Reddy emphasizes the importance of memory when working with chatbots and large language models (LLMs). He highlights how LLMs typically don't retain context between interactions, which can lead to disjointed dialogues. To address this, memory components such as conversation buffer memory and entity memory can be implemented to maintain continuity by retaining the context of previous exchanges. He also distinguishes OpenAI's completion model from its chat model, which is optimized for conversational handling. The session is structured to enhance chatbot functionality, making bots more interactive and contextually aware.

      Highlights

      • Discussed the role of memory in enhancing chatbot interactions 🌟
      • Explained various types of memory in LangChain 🤖
      • Highlighted the benefits of using OpenAI's chat models for better context handling 📚
      • Emphasized the importance of context-aware chatbots for realistic interactions 🌐
      • Detailed installation and coding steps for setting up memory in chatbots 💡

      Key Takeaways

      • Large Language Models (LLMs) need memory to maintain conversation context.
      • LangChain supports various memory types to enhance chatbot interactions.
      • Conversational buffer memory keeps entire dialogues, while entity memory focuses on key details.
      • OpenAI offers specialized models for better conversational experiences.
      • Memory components enable chatbots to better simulate real human interactions.

      Overview

      In this session, Venkat Reddy presents the concept of memory within chatbots and shows how it improves interactions by retaining conversational context. He starts by discussing the limitations of standard large language models, which typically forget previous interactions, resulting in fragmented dialogue experiences.

        Venkat introduces several memory types available in LangChain, such as conversation buffer memory, conversation summary memory, and entity memory. Each plays a crucial role in maintaining conversational context, ensuring that chatbots can provide more coherent and relevant responses.

          He also distinguishes between using OpenAI's models and other alternatives, highlighting the superior performance of OpenAI's chat-specific models for sustained dialogues. The demonstration includes practical coding examples that illustrate how to implement these memory systems, equipping viewers with the essential skills to develop more interactive and contextually aware chatbots.
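
          As a preview of the pattern the session builds toward, here is a minimal sketch of a memory-backed conversation, assuming the classic langchain 0.x API used in the video and an OPENAI_API_KEY already set in the environment:

              from langchain.chat_models import ChatOpenAI
              from langchain.chains import ConversationChain
              from langchain.memory import ConversationBufferMemory

              llm = ChatOpenAI(temperature=0)
              conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

              conversation.predict(input="When did India win the T20 World Cup?")
              # The first exchange is now stored in memory, so "that team" resolves
              # to the T20 World Cup winners instead of an unrelated team.
              print(conversation.predict(input="Give me some players in that team."))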

            Chapters

            • 00:00 - 00:30: Introduction to Previous Topics. The chapter opens with a review of previously discussed topics. It starts with an introduction to generative AI, explaining what it is along with the concept of large language models. Interaction with these models is facilitated through prompt engineering, which involves providing structured prompts. The chapter then touches on how large language models are constructed, covering regression and logistic regression.
            • 00:30 - 01:00: Machine Learning and Neural Networks. This chapter covers the foundational concepts of machine learning and neural networks. It starts with regression techniques to give an overview of the machine learning model-building workflow, then shifts to artificial neural networks and their role. It also explores natural language processing, specifically converting text data into numerical form: word embeddings, or Word2Vec, in which a neural network transforms text into numerical vectors.
            • 01:00 - 01:30: LangChain Basics and Its Components. The chapter begins with transforming non-numerical data into numbers, followed by Hugging Face models and their usage. It introduces LangChain, its basics, and components such as sequential chains, the concept of chaining, and connecting chains, and discusses model input/output with LangChain, including the multiple ways inputs and outputs can be handled.
            • 01:30 - 02:00: Memory Component Introduction. The chapter begins with a recap of RAG (Retrieval-Augmented Generation): retrieving specific data with an LLM, not from the whole world of data but from your own augmented documents. It then sets the premise for the new concept of the memory component, which is elaborated in the sections that follow.
            • 02:00 - 02:30: Chatbots and Memory Issues. This chapter discusses the concept of memory in the context of large language models. It briefly mentions agents, which are currently very popular, then focuses on how LLMs handle sequential questions: responses often appear disconnected when consecutive questions are asked, highlighting the challenge of maintaining context or continuity in a dialogue.
            • 02:30 - 03:00: Demonstration of Memory in LangChain. The chapter discusses the concept of memory in LangChain: without a memory component, each question is treated independently, but a chatbot requires memory so it can maintain context across interactions. With memory, the chatbot can relate subsequent interactions to previous ones and provide coherent responses that acknowledge past inputs.
            • 03:00 - 03:30: Implementing Various Memory Forms. This chapter focuses on how large language models can recognize and maintain context over multiple questions, for example answering "When did India win the T20 World Cup?" with 2007. Although the example uses an outdated slide, it highlights the need for LLMs to retain information from previous dialogue to provide coherent, relevant responses.
            • 03:30 - 04:00: Incorporating Conversation Entity Memory. This chapter explores conversation entity memory: understanding which entities or subjects a user is referencing. Using the T20 World Cup conversation, it shows how the context of previous questions determines what the user means when, after discussing India's victory, they ask about the players in "that team", and the pitfall where answers get incorrectly associated with unrelated entities such as footballers like Lionel Messi.
            • 04:00 - 04:30: Summary and Future Directions. This chapter discusses the limitations of traditional large language models, particularly their inability to retain context from previous questions and answers, and highlights the necessity of a memory component to improve context-awareness and coherence in responses.

            GenAI & LLMs | Video 9 | Part 1 | Mastering Chatbots & Memory with LangChain | Venkat Reddy AI Classes Transcription

            • 00:00 - 00:30 So until now, if you see, we have discussed several topics. We started with a GenAI introduction: what exactly is generative AI and what are large language models. To interact with large language models we use something called prompt engineering; prompt engineering is the process of giving structured prompts. That was just the introduction, and then we went on to discuss how these large language models are built, so we discussed regression and logistic
            • 00:30 - 01:00 regression to get an idea of the overall machine learning model-building procedure. Then we went on to discuss artificial neural networks, plus natural language processing: converting text into numbers. What is that process called? Converting your text data into numerical vectors is known as word embeddings, or Word2Vec. In Word2Vec we use an artificial neural network to convert the
            • 01:00 - 01:30 non-numerical data into numbers. Then we went on to discuss Hugging Face models: on the Hugging Face platform there are several large language models, and we saw how to use them. Then we introduced LangChain: what is LangChain, some of the basics of LangChain, what are sequential chains, how do you connect various chains, what is the whole concept of a chain and how does it work. Then we went on to discuss model input and output: there are multiple ways to structure your inputs in LangChain and multiple ways you can take your output,
            • 01:30 - 02:00 and then we went on to discuss the most important topic, called RAG: retrieval-augmented generation. We want to retrieve some data by using an LLM, and while we are retrieving the data we don't want to retrieve it from the general whole world of data; we want to retrieve it from the documents that we have. So we want to augment our documents and retrieve the information from there. That is what we have discussed up to this point. Today we are going to discuss another component, called the memory
            • 02:00 - 02:30 component what is the use of it and then later on we are going to discuss something called agents the concept of Agents which is right now very famous in the outside world so let us get started with this new topic called memory right now so what you'll observe when you are working with large language models is until now whatever question that we have asked to a large language model if you are asking one more question right after that the answers that you get from the large language model are disconnected let's say if you ask question number one
            • 02:30 - 03:00 you will get answer number one and then if you ask question number two the answer number two if this question is related to this question the answer may not be related to the previous answer there is no memory component as such let's say if you want to build a chatbot now this chatbot definitely needs memory that means the user will ask question number one then the chatbot will give you answer number one when the user asks question number two obviously this is a chatbot the user thinks that he is
            • 03:00 - 03:30 asking relevant question or related to previous questions the user will expect the large language model or the tool that we have created using the large language model to be understanding whatever was the questions context based on the previous questions or simple example would look like this for example if I say who won the when did India win the T20 World Cup so this is question number one so the answer was India won the T20 World Cup in year 2007 this is slightly out dated slide because
            • 03:30 - 04:00 recently just a couple of weeks back India again won the T20 World Cup but if I go back let's say this is one month back one so if you go back to this uh scenario this is correct answer no problem with that now if I ask question number two give me some players in that team when I say in that team which team I'm asking about from the user point of view what is the user talking about tell me from the user point of view the user is talking about the previous question isn't it T20 World Cup team but what is the answer the answer is Leonel Messi uh like Di Maria Apollo DEA like do you
            • 04:00 - 04:30 think these are the actual guys who are part of that team who won the T20 World Cup not even a single name is matching none of them are part of that team so that is the usual regular large language model what happens with large language model is when you are trying to ask a question related to another question usually these questions are not related the answers are not related we have to implement memory component we have to tell the large language model that you have to remember the previous questions and previous answers the whole context as well when you're answering every
            • 04:30 - 05:00 think these are the actual guys who were part of the team that won the T20 World Cup? Not even a single name is matching; none of them are part of that team. That is the usual, regular large language model: when you ask a question related to another question, the questions are treated as unrelated and the answers are unrelated. We have to implement a memory component; we have to tell the large language model to remember the previous questions and previous answers, the whole context, when answering every other question. Until now that's not the scenario, but from here on we want to make it a habit. As usual, we will go to our code file; I'm going to paste the code file link in the chat window, and you can also access it from the document I have shared. As usual we will install all the packages, and then, for me, I will be using an OpenAI API key; you can use a Cohere API key. You can put your API
            • 05:00 - 05:30 key here, or you can do it the Google Colab user-data way as well; either of the two works. So let it get installed; it's installing OpenAI, LangChain, all the required packages. Use any one of them, so maybe this one you can keep as a comment, and here you can bring your own API key. Usually I keep my API keys here;
            • 05:30 - 06:00 they are easy to access, I can simply copy and paste them. There are Google secrets; you can keep the key there, and if you do, you can use it in this manner. Otherwise you can directly paste it. It's going to take a couple of minutes for installation.
            • 06:00 - 06:30 So it is asking to grant access, because I have kept the key in a secret; I'm granting access for the OpenAI API key and the Cohere API key. Then: from langchain.llms import OpenAI. I'll give you a moment to execute that, and while it is executing I'll explain what I'm actually talking about. First let me tell you why we need memory; I think you already got the point.
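
            A sketch of the setup just described (the package list and secret name OPENAI_API_KEY are assumptions; adjust to your own Colab environment):

                # Install the packages used in the session (Colab shell magic).
                !pip install -q langchain openai

                import os
                from google.colab import userdata  # Colab's Secrets panel

                # Read the key from a Colab secret after granting access...
                os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
                # ...or paste it directly (avoid sharing notebooks containing keys):
                # os.environ["OPENAI_API_KEY"] = "sk-..."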
            • 06:30 - 07:00 For example, I define my large language model: my large language model is equal to OpenAI. I don't want any temperature; max tokens is optional, so I'll remove that. "When did India win the T20 World Cup?" For that question the answer is accurate, there's no problem with that at all: India won the World Cup in 2007. Why is there no mention of the 2024 World Cup? Can somebody tell me why there was no mention of India winning the World
            • 07:00 - 07:30 Cup in 2024 as well? When was the last time this OpenAI model got trained, when was the last data point it took? You remember, OpenAI sometimes says "the last time I saw data was September 2022" or something like that, isn't it? Two years back it was the latest model, so it definitely will not have access to the latest data. But we can give it the data; we can connect it to various tools to search the internet. Later on I'm going to introduce something called agents; to this large language model you can connect
            • 07:30 - 08:00 a Google search agent, and it will not only get the data from its own training, it will also get current information from Google search. But as of now we are relying on the large language model's ability to just give us the answer, and it is giving the answer that India won the T20 World Cup in 2007. Now I want to ask another related question: give me some players in that team. When I say "in that team", I mean: in the T20 World Cup winning team, who are the players?
            • 08:00 - 08:30 But the players it returns are football players; they are not at all connected to the T20 Cricket World Cup team. This is how your large language model behaves most of the time: if you're asking two questions, there is no way you are telling the large language model that they are connected. So what you need to do is introduce a memory component. There are several kinds of memory: conversation buffer memory, conversation buffer window memory, and so on. Basically, you will tell the large language model to remember the previous questions and the previous answers as well.
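
            Here is roughly what that memoryless behaviour looks like, before any memory is added, assuming a langchain version whose LLM wrappers expose .invoke() as in the session:

                from langchain.llms import OpenAI

                llm = OpenAI(temperature=0)

                print(llm.invoke("When did India win the T20 World Cup?"))
                # -> India won the T20 World Cup in 2007.
                print(llm.invoke("Give me some players in that team."))
                # -> Each call is a fresh, contextless request, so "that team" has
                #    nothing to resolve against; in the session it returned footballers.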
            • 08:30 - 09:00 That means: when you are giving me the answer to this question, I want you to consider the previous question as well. Not only that, in OpenAI there are two models. One is OpenAI, which is used for generation. For conversation, when you want a chatbot kind of environment, when you want to introduce memory, there is one more model they have shared, known as ChatOpenAI. You can use either OpenAI or ChatOpenAI, but ChatOpenAI is very much specialized, or optimized, for chat-based use. If you want to build chatbots, if you want to have conversations, not just asking a question and getting
            • 09:00 - 09:30 the answer, then instead of OpenAI you can use something called ChatOpenAI. How does that work? Let us see. I would say: my large language model is equal to, instead of writing OpenAI, ChatOpenAI. Let me give Colab a hint: ChatOpenAI. So, from langchain.chat_models import ChatOpenAI, and then I will try to
            • 09:30 - 10:00 import the memory component, etc. But before that, let me show you how this works. My large language model is equal to ChatOpenAI with temperature zero; this is how you declare your large language model. First I have to execute this, followed by this one. Earlier OpenAI was the model, now it is ChatOpenAI. What is the difference? I have listed a few of the distinctions between OpenAI and ChatOpenAI. The former is the general-purpose one; its primary use is generation. The latter's
            • 10:00 - 10:30 primary use is conversation; it is specially optimized for conversations. So instead of ChatOpenAI can I use OpenAI? You can, there's no problem with that, but ChatOpenAI is much more fine-tuned; you may get better results, especially when you are having a chat or a chatbot kind of conversation. Wherever people use OpenAI, I have seen them use ChatOpenAI, and vice versa in some cases; yes, you can, but ChatOpenAI works slightly better for memory kinds of things. The main functionality of OpenAI
            • 10:30 - 11:00 is broad: text generation, code completion, etc. ChatOpenAI is mainly focused on context and conversation: if you are asking a question right now and that question is related to one asked four or five questions earlier, for those kinds of scenarios ChatOpenAI is much more fine-tuned, it works slightly better. In fact, only OpenAI gives you these two models; if you go to Cohere or any other provider of large language models, you may not have two models available. With
            • 11:00 - 11:30 other providers you may have only one large language model that you have to use both for chat and for content generation, but OpenAI has given two models. One is for usual generation: content creation, data analysis, educational tools. The other is for chatbots, customer service assistants, virtual assistants, etc. In simple words: when you want to implement memory, when you want to build chatbots, when you are building conversation kinds of tools, use ChatOpenAI.
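
            The two wrappers being contrasted, as they are imported in classic LangChain:

                from langchain.llms import OpenAI            # completion-style: one-shot text generation
                from langchain.chat_models import ChatOpenAI  # chat-style: tuned for multi-turn dialogue

                llm = ChatOpenAI(temperature=0)  # the better default when building chatbots with memory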
            • 11:30 - 12:00 For general generation of content, use OpenAI. But if I use only OpenAI instead of ChatOpenAI, am I going to get wrong results? No, you are not going to get wrong results; with this also you will get right results, because we are not relying on just this one model. The most important part is that we are implementing memory; we will implement the memory component around the large language model. So how do you implement the memory component? Apart from importing the model,
            • 12:00 - 12:30 from langchain.memory I will import ConversationBufferMemory. This is one of the most basic forms of memory and one of the most widely used memory types. What does it do? It keeps all the conversations in the buffer: question followed by answer, question followed by answer. All of that is kept in the buffer and all of it is considered while answering the next question you ask. If you're asking the fifth question, the previous four questions and their answers will be kept in the memory. So let us see how this conversation buffer memory works. I will declare my
            • 12:30 - 13:00 large language model, and then I will declare my memory as well: my memory is equal to ConversationBufferMemory, just instantiate it. Then you create a chain, a conversation: my conversation is equal to ConversationChain. Take my large language model, use my memory; as of now I've not used verbose. Now I will use this conversation object, not the large language model directly, for interacting and getting the answers to my questions.
            • 13:00 - 13:30 ConversationChain is not defined, so let us import that: from langchain.chains import ConversationChain. Now let us try to ask the same questions. Earlier I used to say llm.invoke, but now I have created a chain, and for a chain it is predict: I will say conversation.predict. That is the first one; let's
            • 13:30 - 14:00 keep it this way. There's an error about a parent run ID; I think there is one keyword we have to give, that is conversation.predict(input=...). "India won the T20 World Cup in 2007 and 2011"; I think in 2011 it went to the finals, but at least 2007 is correct, based on whatever data it has. Now the most important point is when we ask the next question. The first question was this one; the next question is: give me some of the
            • 14:00 - 14:30 players in that team. Earlier football players were given, not a single player who was part of the 2007 World Cup. But now some of the cricket players in this team are given: Gautam Gambhir, Yuvraj Singh, Virender Sehwag, Harbhajan Singh; these were some of the players. So when you introduce a memory component, it keeps all the information in the memory. Is there a way to see the information that is present in the memory? Yes, you can get the
            • 14:30 - 15:00 information in the memory if you print the conversation memory buffer: what was in the buffer? The human asked this question, "when did India win the T20 World Cup", the AI gave this answer, the human asked another question; now when the AI is answering it will take all of these into consideration, question, answer, question, answer. That means if I ask another question, who was
            • 15:00 - 15:30 the player of the series, the answer is Shahid Afridi, which is correct. Now if I look at the conversation buffer memory: human asked this, AI told this, human asked this, AI told this, human asked this, AI told this. In fact, if you set verbose equal to True here, you can actually see everything that is happening in the background as well. Let's say I try to re-execute
            • 15:30 - 16:00 this. Now if I execute this, what is happening internally? The green-colored text is what is happening internally. Usually we get to see only the final output; we don't get to see what is happening inside. If you want to see what's happening inside, keep verbose equal to True and you will get to know. It says it is entering the chain: "The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a
            • 16:00 - 16:30 question, it truthfully says it does not know," etc.; that is a very lengthy prompt. Then the human asks this, then the AI gives this answer. Now when I execute the second one, again the prompt: the current conversation is kept in the memory, human message, AI message, human message, AI message, and "finished chain". So basically, when you set verbose equal to True, what you get to see is what is happening behind the scenes.
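
            Putting the buffer-memory demo together (a sketch of the classic API; the exact model replies will vary):

                from langchain.chat_models import ChatOpenAI
                from langchain.chains import ConversationChain
                from langchain.memory import ConversationBufferMemory

                llm = ChatOpenAI(temperature=0)
                memory = ConversationBufferMemory()
                # verbose=True prints the full prompt, including the stored history,
                # on every call: the "green text" shown in the session.
                conversation = ConversationChain(llm=llm, memory=memory, verbose=True)

                conversation.predict(input="When did India win the T20 World Cup?")
                conversation.predict(input="Give me some players in that team.")
                conversation.predict(input="Who was the player of the series?")

                # Every human/AI turn so far, exactly as it is prepended to the next prompt:
                print(conversation.memory.buffer)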
            • 16:30 - 17:00 So conversational buffer memory, or in simple words memory, is what we have introduced for connecting questions, and this is very important when you are automating. Let us suppose you have a RAG idea: in that RAG you have all your conversations with your customers. You have a thousand agents, all of them have been talking to your customers, and you have a huge customer question-and-answer database. Now you have millions and millions of questions that were asked, and the answers are there too. Can I use
            • 17:00 - 17:30 that database to create an automatic agent? What do you have? You have multiple questions and answers; you have millions of conversations that have happened. Earlier our agents manually answered all the questions; now I have this huge database. Can I build a RAG on top of it? That means if a customer asks a question, I have to fetch the answer. Up to this point we have already done it with RAG;
            • 17:30 - 18:00 are you with me? Once you load the data and ask a question, the RAG gets the precise answer: it fetches from multiple places, combines, and gives us the answer to that question. Now what we want is to make it like a conversation: you get the first answer, and then if the user asks another question, we keep the earlier exchange in the context and then fetch the next answers. That will be the ultimate automation of all this customer service. Are you getting a feel for what I'm trying to say, all of
            • 18:00 - 18:30 you? What can be the use of this? You can entirely replace the customer service agents if you have a good database with you. Do you agree with that? If you have a good database of questions and answers, you can put a RAG on top of it, introduce memory into it, and create a customer-agent tool. When the user is interacting, it almost feels like they are interacting with a human agent, because it is having a conversation that draws information from the previous exchanges, which is much more authentic and
            • 18:30 - 19:00 realistic. So this is known as conversational buffer memory. Any quick questions before I move on? There are other forms of memory as well; we will try two or three other forms, but what I have seen in my experience is that conversation buffer memory is the most widely used form of memory.
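
            The session does not show code for this RAG-plus-memory idea, but classic LangChain packages it as ConversationalRetrievalChain; a hedged sketch, where vectorstore stands in for an index you have already built over your support Q&A documents:

                from langchain.chat_models import ChatOpenAI
                from langchain.chains import ConversationalRetrievalChain
                from langchain.memory import ConversationBufferMemory

                memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
                qa = ConversationalRetrievalChain.from_llm(
                    llm=ChatOpenAI(temperature=0),
                    retriever=vectorstore.as_retriever(),  # assumes an existing vector index
                    memory=memory,
                )

                qa({"question": "Where is my order?"})
                # Follow-ups are condensed against chat_history before retrieval,
                # so "it" or "that order" resolve to earlier turns.
                qa({"question": "When will it arrive?"})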
            • 19:00 - 19:30 Let us try a few more examples of how this conversation buffer memory works; we want to get a little more familiar with it. So help me: from langchain.memory get ConversationBufferMemory; from langchain.llms import OpenAI, and not only OpenAI, I will import ChatOpenAI, then ConversationBufferMemory, and then ConversationChain. Once you have all these four, you're ready. I will say: my large language model is equal to ChatOpenAI. Not only ChatOpenAI, you
            • 19:30 - 20:00 can use OpenAI, no problem with that. Only the OpenAI company gives ChatOpenAI and OpenAI separately; other companies give a single large language model which you can use for conversations as well as regular generation, because the memory component is the one taking care of the memory. So if you're not using ChatOpenAI you will not get entirely wrong answers; with OpenAI also you will get right answers. Once you declare the large language model and the memory, my conversation is equal to ConversationChain with the large language model and the memory; if you want verbose,
            • 20:00 - 20:30 you can set verbose equal to True. Then you say conversation.predict with whatever input you want to give. Mostly these are used as customer service agents, so if I say "hello", the answer should be almost like a human; the response should be almost the way a
            • 20:30 - 21:00 human would respond. I think I'll turn verbose off now. If I say hello, the output is: "Hello, it's nice to meet you. My name is AI and I'm an artificial intelligence designed to assist and communicate with humans," etc. We can actually set up this name as well: "My name is Rahul and I'm here to help you today"; you can adjust it like that. You won't get the same message every time, it keeps changing: "hello again, do you have a specific question?" It says "hello again" because we are keeping everything in the memory. Okay, so let us suppose I ask a question:
            • 21:00 - 21:30 conversation.predict: explain the difference between classification and regression. "Classification and regression are two types of models," etc., it gives. Now I say: give me the above output, and when I say "above" I mean the previous one, in a markdown tabular format; I want it
            • 21:30 - 22:00 in a table format. When I say "above", I want the system to understand what I mean, and this is what it gives: a markdown table, meaning that if you render it, it looks like a table. So the first question I asked was to explain the difference between regression and classification; then I said give me the above output in markdown table format, and it took that output and gave it to me. Now I want to ask one more
            • 22:00 - 22:30 question: which one is most widely used? When I say "which one is most widely used for customer churn", I mean: is it the regression technique or the classification technique that is most widely used for customer churn? The answer it gives is that classification is the one most widely used for customer churn. Once again, if I look at what is in the buffer: all the questions we have asked and all the answers the AI has presented will be
            • 22:30 - 23:00 in the buffer. So to see what is in the buffer, I will say conversation.memory.buffer. Human said hello (we said hello twice, actually), then we asked this question and the AI gave this answer, we asked this question and the AI gave this answer,
            • 23:00 - 23:30 again the human asked a question; the last question was "which one is most widely used", and the AI gave this answer. So are you getting a feel for what this memory is doing? In simple words, memory keeps a buffer of all the questions and answers we have been exchanging, which helps in answering the next question in the most related manner. Not only that, it is context-aware; it almost looks like we are having a conversation rather than simply asking questions. What we have seen here is conversational buffer memory.
            • 23:30 - 24:00 There are other forms of memory as well. There is something called conversation buffer window memory, which keeps only a window of conversations. Conversational buffer memory tries to keep all the conversations in the memory, but if you use conversation buffer window memory and set the window size to five, only the last five exchanges are kept in the buffer. This is useful if you are sure you don't want to prolong the conversation and only the last five
            • 24:00 - 24:30 exchanges need to stay relevant. Let's say you are talking to an Amazon agent, and today you have asked four or five questions about product A; then you're done with it. After one month you come back asking another question, maybe about a new product; there is no need to carry over all the information from the previous four or five questions. So you can use conversation buffer window memory with a window size.
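
            The only change from the buffer version is the memory class; with k=5, as in the example, only the five most recent exchanges survive:

                from langchain.chat_models import ChatOpenAI
                from langchain.chains import ConversationChain
                from langchain.memory import ConversationBufferWindowMemory

                conversation = ConversationChain(
                    llm=ChatOpenAI(temperature=0),
                    memory=ConversationBufferWindowMemory(k=5),  # keep only the last 5 turns
                )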
            • 24:30 - 25:00 Or there is conversation summary memory: instead of keeping the complete conversation, you create a summary of the whole conversation, or use the conversation summary buffer variant. There is also something called knowledge-graph memory, where all of this is represented as a knowledge graph and saved internally, and conversation entity memory. So multiple forms of memory are available, but I have mostly seen people using conversation buffer memory. If you have to use another form, say conversation summary memory, the only difference is that instead of declaring memory equal to ConversationBufferMemory, you say ConversationSummaryMemory. Let's see an example. From langchain I import
            • 25:00 - 25:30 ConversationBufferMemory, and I will also import, from langchain.memory, ConversationSummaryMemory. There is also a summary buffer variant, which keeps a buffer window as well, but I'm using ConversationSummaryMemory in general: take the complete summary. So again, the large language model
            • 25:30 - 26:00 is equal to OpenAI; my memory this time is not ConversationBufferMemory, it is ConversationSummaryMemory. Now, when you are doing conversation summary memory, since it has to summarize all of this, you have to give it a large language model again; otherwise how would the summarization happen? So I pass a large language model for summarization, which is llm. Then a ConversationChain as usual: my large language model, my memory. I can
            • 26:00 - 26:30 keep verbose on or off. Once I have defined this, let's say I'm having a conversation: conversation.predict, I said "hello", and the answer was given here. Then conversation.predict, my input is "I need help with my order": I have ordered something and I need help with that. It says
            • 26:30 - 27:00 "I'm assisting you here, how can I help you with the order", a relevant answer. "I ordered a pizza an hour back"; now let us see what the output will be. "Hello there, my name is PizzaBot and I'm an assistant; how can I
            • 27:00 - 27:30 help you with your pizza order?" Let's say it is not yet delivered. Every time it says "hello there, my name is PizzaBot" because of this model; let us suppose I use ChatOpenAI here. Those are also right answers, there's nothing wrong with them, but ChatOpenAI is
            • 27:30 - 28:00 slightly better fine-tuned compared to OpenAI. Now if I say "hello" with ChatOpenAI: you have seen those earlier answers, every time it was repeating "my name is this, my name is this". ChatOpenAI will not do that: "Hello, how are you doing today?" "I need help with my order." See the difference between ChatOpenAI and OpenAI: both give right answers, but ChatOpenAI gives a somewhat more fine-tuned answer: "Of course, I'm happy to help you with the order. Can you please provide me with your order number or any order details so I can assist you better?" This almost looks like a conversation. Then "I have ordered pizza an hour back",
            • 28:00 - 28:30 that's what I gave as input. "Hello, how are you today, how can I assist you? Can you provide me more details about your order?" Then I say it is not yet delivered: "I see you mentioned you ordered pizza an hour ago; I can help track the order if you provide me with more details." It is asking for the order number, etc. I say I don't remember the order number, I don't have it:
            • 28:30 - 29:00 "No problem, can you provide me with the name on the order?" It just looks like a person we are talking to. Now, what I want you to focus on here is the conversation summary side of it. Earlier, conversation buffer memory kept every conversation literally, but if you look at the memory buffer now, what's happening inside is that it keeps only a summary of the whole conversation. Print the conversation
            • 29:00 - 29:30 memory buffer: conversation.memory.buffer. This is what has actually been stored internally. Earlier it was the complete conversation; when you looked at the buffer it was human message, AI message, human message, AI message; that's what happens with the regular conversational buffer memory. But with conversation summary memory it is a summary of all the events that have
            • 29:30 - 30:00 happened: "The human greets the AI with a hello and the AI responds with a friendly greeting, asking the human how they are feeling. The human mentions ordering pizza an hour back, to which the AI requests more details," etc. Are you getting what I'm trying to say? Here also we are keeping the memory, but we are summarizing the whole conversation.
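
            A sketch of the summary-memory version; note that the memory class takes its own LLM to do the summarizing:

                from langchain.chat_models import ChatOpenAI
                from langchain.chains import ConversationChain
                from langchain.memory import ConversationSummaryMemory

                llm = ChatOpenAI(temperature=0)
                conversation = ConversationChain(
                    llm=llm,
                    memory=ConversationSummaryMemory(llm=llm),  # the summarizer needs an LLM too
                )

                conversation.predict(input="Hello")
                conversation.predict(input="I need help with my order.")
                conversation.predict(input="I ordered a pizza an hour back.")

                # A running summary ("The human greets the AI ... mentions ordering
                # pizza an hour back ...") instead of the verbatim transcript:
                print(conversation.memory.buffer)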
            • 30:00 - 30:30 If your conversation stays within about 20 inputs from the user and 20 answers from the AI, you can go with the usual conversation buffer memory, the first one we talked about; that means overall you can have 40 dialogues, 20 from your side and 20 from the AI. In most cases we are done within 20 dialogues; do you agree with that? With most customer service agents, or most of the bots we interact with, within 20 dialogues, which take roughly 30 minutes, we get our overall information or our problem is resolved. But if you feel it is a very lengthy conversation, then you can
            • 30:30 - 31:00 try something called conversation summary memory: if you feel it will go for 50 or 60 dialogues, use conversation summary memory. I have not seen many people using it, but if the app or tool you are building goes through lengthy conversations, you can use conversation summary. If you feel it will go even longer, 100 or 200 exchanges, and you still have to keep the context relevant, then you have to use something
            • 31:00 - 31:30 called conversation entity memory. The first method keeps all the information in the buffer; the second method summarizes the information. When you summarize you definitely lose some information, but since you expect many dialogues, that's why we summarize. If you feel it will grow even bigger than that, you can use conversation entity memory: it keeps the entities in the memory. What do I mean by entities? If I mention "hello, my name is Venkat",
            • 31:30 - 32:00 the name is an entity, the name of a person; that will be stored somewhere. "I'm calling you from Bangalore": Bangalore is a place name, that will be kept. "I'm talking to you from DV Analytics, my company": that is an organization name, that will be kept. "I ordered a product on 12th of August": the date is an entity, that will be kept. So only the important entities are kept in the memory. This is especially useful when you are getting into very, very lengthy discussions with the bot. Usually
            • 32:00 - 32:30 conversation buffer memory, the first one we used, is mostly sufficient, but if you feel there is complexity in the system, you try either conversation summary memory or conversation entity memory. Let me give a hint for this one: from langchain.memory, my memory is equal to ConversationEntityMemory.
            • 32:30 - 33:00 Now, what happens in conversation entity memory? We can actually see what is inside: in conversation entity memory there is a prompt template that is used internally, and it can be inspected. Let's look at the entity memory conversation template: what is the template? This is what happens internally: "You are an AI assistant...". That means when we are using
            • 33:00 - 33:30 conversation entity memory, this is the prompt template working internally: you are an AI assistant and your job is to remember all of this; "you are designed to be able to assist with a wide range of tasks, from answering simple questions...", "you are constantly improving", "overall you are a powerful tool", and then the context. So it keeps all the valuable information, and when you go through the template, somewhere in there it is told: you have to remember all the
            • 33:30 - 34:00 entities. So let us try to use conversation entity memory, and then we will go to our next topic. I would say my memory is equal to ConversationEntityMemory. Obviously, when you are doing something beyond buffer memory, which simply keeps question, answer, question, answer and doesn't require a large language model, you need an LLM: summary memory has to summarize the information, entity memory has to find the entities, so they require a large language model. I will say my large language model is ChatOpenAI; I don't really care about the max
            • 34:00 - 34:30 tokens here. Instead of ChatOpenAI you can give another model as well, Cohere or something, but this is a place where you start seeing that models like ChatOpenAI or OpenAI work much better than the other models. If you compare the results given by ChatOpenAI to Cohere or any other model, the results from the other models will be pretty poor. That is the reason ChatOpenAI is very famous right now; it is a little costly, but people use it because of the accuracy. Accuracy-wise, if you're using Cohere, have you observed the difference between the accuracy of this
            • 34:30 - 35:00 model and Cohere? Is Cohere as good as this one? Is anyone doing the exercise along with me? Have you observed the same kind of results, or what results are you getting with the Cohere model? What you'll observe when you start executing with the Cohere model is that it gives slightly less accurate results compared to this ChatOpenAI model; this is one of the best models available right now. So my memory is ConversationEntityMemory,
            • 35:00 - 35:30 my large language model is ChatOpenAI, and now my conversation is a ConversationChain: the large language model, memory is memory, and verbose; let me keep verbose True so we get to see what is happening inside and which entities are stored, so we can easily see them in the entity memory. Let us see: there's an error, "input variables... expects history and input but got entities", etc. For this one we have to give one more input, called prompt. So I will say my prompt is the
            • 35:30 - 36:00 entity memory conversation template, ENTITY_MEMORY_CONVERSATION_TEMPLATE; this is the one we give as input. I would think this should be taken care of automatically, but it expects us to give that prompt, so the prompt is also given. What is that prompt? "You are an AI assistant", you have to keep all of this. So basically, in the conversational chain we are giving the large language model and we are also giving a prompt. Usually in a chain we give the prompt, the large language model, and then the memory; earlier the prompt was not required, but here, specifically, the prompt needs to be mentioned.
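
            A sketch of the entity-memory setup, including the prompt that the error above was asking for (import paths and attribute names vary slightly across classic LangChain versions, and the sample inputs simply mirror the demo):

                from langchain.chat_models import ChatOpenAI
                from langchain.chains import ConversationChain
                from langchain.chains.conversation.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE
                from langchain.memory import ConversationEntityMemory

                llm = ChatOpenAI(temperature=0)
                conversation = ConversationChain(
                    llm=llm,
                    memory=ConversationEntityMemory(llm=llm),    # entity extraction needs an LLM
                    prompt=ENTITY_MEMORY_CONVERSATION_TEMPLATE,  # must be passed explicitly
                    verbose=True,
                )

                conversation.predict(input="My name is Venkat, I'm from Bangalore, and I need help with my order.")
                conversation.predict(input="I ordered a mobile phone on Tuesday, 12th August, at 8:00 p.m.")

                # The extracted entities, stored alongside the running conversation:
                print(conversation.memory.entity_store.store)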
            • 36:00 - 36:30 Then I would say conversation.predict: "My name is Venkat" (a name), "I'm from Bangalore" (a place), "I need help with my order". If I execute this, since I kept verbose equal to True, you can see internally that the context entities, Venkat and Bangalore, have been kept; they will be used later on. The next thing I'm going to say is: I ordered
            • 36:30 - 37:00 a mobile phone on Tuesday night, 8:00 p.m., 12th August. So again, it is keeping the conversation as well as the entities. What entities is it keeping? Context: mobile phone. Let me keep it this way; let us try to get all the entities in
            • 37:00 - 37:30 the buffer. When I say in the conversation "my order number is this, what is the status of my order", the order number again is kept in the context. If I look at the overall buffer, it keeps all the conversations, and apart from that it keeps this entity information as well. Once the order number is tracked, even if you ask about it after 100 exchanges, it will be able to get it for you. Once the name is passed
            • 37:30 - 38:00 on, once the name is stored, even after 20, 30, 40 exchanges it will be able to say "hello Venkat, did I help you well? Hello Venkat, wishing you a very pleasant day." That's it. Entity memory is one of the memory types; if I want to see what is happening in the buffer, the conversation buffer shows: human said this, AI said this, human said this, AI said this, whatever the human said, whatever the AI said, and the
            • 38:00 - 38:30 order number. So this is one of the more complete ways to handle memory, but it is slightly costlier: you're using an LLM in the chain and you're also using an LLM for the memory, which means it is doubly costly. We always have to make that assessment: for the tool I'm building, is it okay to be costly but accurate, or, if we want to save some money and spend fewer tokens, should we use conversation buffer memory instead? So I have introduced three types of memory to you; there are already several memory types, and in future other types of memory
            • 38:30 - 39:00 may get added, but overall the purpose of memory is to keep track of the previous pieces of information so that we can create systems like chatbots. Are you with me, everyone? Are you getting a hang of what the memory is doing here? ("Yes, yes sir.") So the one I have shown you in depth is conversation buffer memory; you can go ahead and use it for starters. Only when you feel there is a need for a different type of
            • 39:00 - 39:30 memory should you do some research on the other forms; maybe you can experiment with conversation summary memory, conversation entity memory, knowledge-graph memory, or other forms of memory that may be introduced later on.