Summary
In this video, Matt Williams discusses the importance of using prefixes when embedding content in AI applications. Initially skeptical, Williams shares his experience of learning about prefixes and their impact on improving the accuracy of responses in various embedding models. Through testing, he demonstrates how the inclusion of prefixes can significantly enhance model performance, especially in retrieving relevant documents and answering queries accurately. He concludes by encouraging viewers to experiment with prefixes in their own projects, emphasizing their surprising effectiveness.
Highlights
Prefixes enhance AI model accuracy by providing context 🧩
Ollama models support diverse prefixes for embedding tasks, though they're undocumented 🤔
Real testing shows prefixed models answer queries more precisely 📊
Prefixed 'Nomic' and 'Snowflake' models outperform non-prefixed counterparts each time ⏫
Llama models lagged behind in embedding task effectiveness 🐌
Key Takeaways
Prefixes can significantly improve embedding accuracy 🎯
Three of the five Ollama embedding models support prefixes, despite a lack of documentation 📚
Using prefixes like 'search_document:', 'search_query:', and classification types can double results accuracy 📈
Experimentation shows noticeable improvement with prefixed models compared to non-prefixed ones 🔬
Dedicated embedding models with prefixes outperform regular LLMs for this task 🚀
Overview
Matt Williams introduces the surprising effectiveness of using prefixes in content embeddings, revealing his own initial misconceptions and newfound understanding. He explains how prefixes, small text additions, serve as context clues that guide embedding models to more accurately process and answer queries.
Testing these prefixes, Williams shares concrete results that suggest marked improvements in output quality. By embedding large datasets using prefixes, models like 'Nomic' and 'Snowflake' were able to deliver significantly more relevant documents and accurate responses compared to their un-prefixed versions.
Throughout the video, Williams contrasts the efficacy of dedicated embedding models with that of regular LLMs, urging viewers to apply and test prefixes themselves and highlighting how this small tweak can lead to significant advancements in AI-powered applications.
Don’t Embed Wrong! Transcription
00:00 - 00:30 You are doing embedding wrong. I was on the founding Ollama team, and until this week I did it wrong too. I read the docs. I even made videos about it. But then Aaron on the Nomic team introduced me to prefixes. Depending on your use case and your content, adding a prefix to your content can make a massive difference to the success of your RAG application. You may get results that are twice as accurate as before. What do I mean by a prefix? Well, before you send a chunk to the embedding model, you insert a piece
00:30 - 01:00 of text in front describing its purpose. Three of the five embedding models in the official Ollama library support prefixes, they're different for each one, and none of the prefixes are part of the Ollama documentation. So let's take a look at them and see how using them compares. This is the Ollama course. Every week I put out another video that teaches you a bit more of everything you need to know about using Ollama to do everything you can do with AI. We're a
01:00 - 01:30 few episodes in, and I have lots more to come. Okay, so I chunked up the scripts to my last few videos and then embedded them with five models without prefixes, then repeated that with the three that support prefixes. Then I asked the question: how do I install n8n with Docker Compose? When using nomic-embed-text without prefixes, I didn't get an answer, but with prefixes in place I got this, and that's a much more complete answer. And this was when I asked it to
01:30 - 02:00 give me a single doc or chunk back from the vector database. In just a second I'll go through all the models and a bunch more questions, so stay on for that. So what are the prefixes? Well, with nomic-embed-text there are two main prefixes to use. For the source documents that you add to the vector store, add search_document: before the chunk of text, and to the query that you want to run the similarity search against, you add search_query:. Snowflake
02:00 - 02:30 Arctic and Mixedbread both use the phrase Represent this sentence for searching relevant passages: instead of search_query:, and they don't use any prefix for the documents. Snowflake and Mixedbread just use that one, but Nomic actually uses a few others as well. If you're doing classification, then use the prefix classification:, but if you're trying to discover common
02:30 - 03:00 topics in the text, or eliminate semantic duplicates in the text, then you want clustering, and the prefix for that is clustering:. If you're using the Nomic API with their hosted service, then there's an option in the API call, but with Ollama it really is just sticking the prefix text in front of the rest of the text. So that's pretty easy, but does it really make a difference? We'll find out here in just a sec.
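To make that "stick the prefix in front" step concrete, here is a minimal TypeScript sketch, assuming a local Ollama server on its default port and its /api/embed endpoint. The prefix strings are the ones named in the video; the helper itself is illustrative, not the repo's code.

```ts
// A minimal sketch, assuming Ollama is running locally on the default port.
const PREFIXES: Record<string, { document: string; query: string }> = {
  "nomic-embed-text": {
    document: "search_document: ",
    query: "search_query: ",
  },
  "snowflake-arctic-embed": {
    document: "", // no document prefix for Snowflake or Mixedbread
    query: "Represent this sentence for searching relevant passages: ",
  },
  "mxbai-embed-large": {
    document: "",
    query: "Represent this sentence for searching relevant passages: ",
  },
};

async function embed(model: string, text: string, prefix = ""): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: prefix + text }),
  });
  const { embeddings } = await res.json();
  return embeddings[0]; // /api/embed returns one vector per input string
}

// Embedding a document chunk vs. a query with nomic-embed-text:
// await embed("nomic-embed-text", chunk, PREFIXES["nomic-embed-text"].document);
// await embed("nomic-embed-text", question, PREFIXES["nomic-embed-text"].query);
```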
03:00 - 03:30 But you know what makes a massive difference to me personally? That would be you going down and clicking the like button and then subscribing to the channel. It helps me know I'm on the right track helping you learn about Ollama. Well, in the GitHub repo for this video, which I've linked to in the description below, I have five main bits of code. First, there's a step to prepare the database, vector-prep I think. This uses ChromaDB to create 16 collections: there are five models, and three of them support prefixing, so
03:30 - 04:00 that's eight collections, and then I have one set of those that includes the questions in the vector store and another set that doesn't include the questions. That's where the 16 comes from. Is it appropriate to include the questions? Well, there are many documents you might want to add that include a question, say as a heading; this simulates that case. There are four scripts that have been added, and there are 13 questions based on them. I chunked up the scripts by paragraph, looking for new
04:00 - 04:30 lines and then getting rid of any empty chunks, and then I embedded them accordingly for each model. Creating embeddings is a core part of any RAG application, as well as of clustering, classification, and other processes. For RAG, you generally get some source text, split it up into smaller pieces, create embeddings from those pieces, and then store that in a data store. Then you ask a question and find pieces similar to the question based on those embeddings.
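A hedged sketch of that prep step, reusing the embed helper above with ChromaDB's JavaScript client (the npm: import assumes Deno, which the video says the repo uses). The collection naming, the chunking regex, and a Chroma server on its default port are assumptions; the repo's vector-prep step may differ.

```ts
// Illustrative prep: chunk a script by paragraph, embed each chunk with the
// model's document prefix, and store it in a ChromaDB collection.
import { ChromaClient } from "npm:chromadb";

const chroma = new ChromaClient({ path: "http://localhost:8000" });

function chunkByParagraph(script: string): string[] {
  // Split on blank lines and drop empty chunks, as described in the video.
  return script.split(/\n\s*\n/).map((c) => c.trim()).filter((c) => c.length > 0);
}

async function prepCollection(model: string, script: string, usePrefix: boolean) {
  const collection = await chroma.getOrCreateCollection({
    name: `${model.replaceAll(":", "-")}-${usePrefix ? "prefixed" : "plain"}`,
  });
  const docPrefix = usePrefix ? (PREFIXES[model]?.document ?? "") : "";
  const chunks = chunkByParagraph(script);
  for (const [i, chunk] of chunks.entries()) {
    await collection.add({
      ids: [`chunk-${i}`],
      embeddings: [await embed(model, chunk, docPrefix)],
      documents: [chunk], // store the plain text: it's what the LLM will see
    });
  }
}
```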
04:30 - 05:00 Then the matching plain-text source chunks get handed off to the model for processing. Again, it's the plain-text source that gets sent to the model, not the embeddings. That's what the docs say, that's what my videos have said, and that's generally true, but prefixes are the little wrinkle that I just learned about. Now, for the first test, I embed the question with and without the prefixes, as needed by each model, and then I find the top two results from each. Since the questions are in the vector store, the question I asked will
05:00 - 05:30 always be the top result because it's a 100% match, so I remove that from the resulting docs. The results of this test show just the docs output from the vector store that come up as a good match. It does this for all the combinations, which is 13 × 8, which is about 104. Not about; it is 104. Then I go through each one and grade it. For the first test, I'm just figuring out if the information provided could potentially answer the question.
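Continuing the sketches above, that retrieval step might look like the following. Asking for one extra result and filtering out the question itself is an assumption about how the exact match was removed.

```ts
// Illustrative retrieval: embed the question with the query prefix, pull the
// top matches, and drop the question itself (it's always a 100% match).
async function topDocs(model: string, question: string, k = 2): Promise<string[]> {
  const collection = await chroma.getOrCreateCollection({
    name: `${model.replaceAll(":", "-")}-prefixed`,
  });
  const qVector = await embed(model, question, PREFIXES[model]?.query ?? "");
  const result = await collection.query({
    queryEmbeddings: [qVector],
    nResults: k + 1, // one extra, since the question will come back first
  });
  const docs = (result.documents?.[0] ?? []).filter(
    (d): d is string => d !== null && d !== question,
  );
  return docs.slice(0, k);
}
```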
05:30 - 06:00 So let's take a look at the results. The first question is: what is n8n? And the docs I get back aren't that promising. Either I get nothing, or I get something about installing it, or I get one of the other questions, but nothing about what it is. The next question is about installing it with Docker Compose. Nomic doesn't give me anything, but Nomic with the prefixes is much more useful. Prefixed Snowflake also gave me a good result. Okay, so how do I run n8n on my Mac?
06:00 - 06:30 Prefixed Nomic is the only one that gives an okay result, especially considering it only has one doc to work with. Now, this video will get really boring really quickly if I go through every result with you, so let me speed it up and just get to the overall results. We see in this small data set, which is definitely not statistically very interesting, that nomic-embed-text with the prefixes comes out ahead, Snowflake with prefixes
06:30 - 07:00 comes second, and Nomic without prefixes is third. But then again, these numbers are so small it's not all that conclusive. If I could hire a young kid to do the grading, or even cheaper, a grad student, then testing on 20 scripts with 200 questions that get asked 10 times each could be more interesting, but I'm not going to sit through grading 30,000 results. So this is what we have, and the prefixes seem to come out ahead. Of course,
07:00 - 07:30 the obvious concern here is that we didn't actually test whether a model can answer the question. So I ran this test again, but instead of outputting docs, I output an answer. The answer is being generated by IBM's new Granite 3 Dense 8-billion-parameter model, which seems to be really good at this task. You can see the prompt I use in the source code. I've asked it to only use the info provided by the docs, but these are LLMs, so the instructions aren't always followed. So if there is an answer, I
07:30 - 08:00 also output the documents to verify the answer came from the documents. And as usual, the source code is in the same repo I always use, and the link is in the description below. So let's see how that does. Things here seem to be generally the same. There were a few instances where it came up with the right answer, but that had to be the knowledge in the model, because there was nothing relevant in the docs provided from the database, so I marked those as a fail.
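For reference, a generation step along these lines might look like the sketch below. The prompt wording is a stand-in for the one in the repo, and granite3-dense:8b is the Ollama tag for the Granite model mentioned.

```ts
// Illustrative answer generation via Ollama's /api/generate endpoint.
async function answer(question: string, docs: string[]): Promise<string> {
  const prompt = [
    "Answer the question using ONLY the information in the documents below.",
    'If the documents do not contain the answer, say "I don\'t know."',
    "",
    ...docs.map((d, i) => `Document ${i + 1}:\n${d}`),
    "",
    `Question: ${question}`,
  ].join("\n");

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "granite3-dense:8b", prompt, stream: false }),
  });
  const { response } = await res.json();
  return response;
}

// e.g. const docs = await topDocs("nomic-embed-text", question, 5);
//      console.log(await answer(question, docs));
```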
08:00 - 08:30 This is interesting, but not really a real-world test. In most cases I'm giving the model a single document from the vector store. In reality, I'd probably retrieve more docs, and the question wouldn't be in there. So that's what we tried next, with the top five docs given to the model and the questions not added to the database. This is test three in the repo. We see that the prefixed versions of Nomic and Snowflake still won, with the unprefixed Nomic still coming in
08:30 - 09:00 third. But one of the interesting things we have seen each time is how prefixed Nomic is a little bit better than unprefixed Nomic, while prefixed Snowflake is a lot better than unprefixed Snowflake. Generally, all the embedding models seem to perform better when we allow more documents to be used from the data store; they're all kind of catching up to the leaders. The final test, test four, just increases the number of documents pulled from the vector store from 5 to 10, and now most of the
09:00 - 09:30 models perform a lot better. In fact, they all seem about equal, with the one exception being the unprefixed Snowflake, which was definitely worse. Of course, all the numbers we're dealing with are small, and to really understand the differences we should be asking a lot more questions, and asking them over and over and over again. We should play with different chunking sizes and the number of chunks delivered from the data store. But at a
09:30 - 10:00 high level, it does seem that adding the prefixes makes a bit of difference most of the time, especially with Snowflake. Now, one of the questions I get every now and then is: how about using the Llama models for embedding? They're a lot bigger, and although they may be slower, maybe they get better results. So I added Llama 3.1 8B, Llama 3.2 3B, and Mistral, which is 7B, to my model config. I ran vector-prep to get
10:00 - 10:30 the vectors added to the collections and then ran test two. That's the one where it generated an answer based on one or two documents pulled from the data store. None of those three models could get any of the questions answered. So then I tried test three. That was the one where the questions were removed and five docs were pulled from the data store. Llama 3.1 was a little bit better than Snowflake with prefixes, but all of the other
10:30 - 11:00 embedding models performed so much better. The embedding models are orders of magnitude faster, and they just come up with better results. Please don't use regular LLMs for generating embeddings. If you're still doing that, stop, and use a model meant for embedding. Now, I would love for you to try this out with your own documents. Play with the chunking sizes. My code is written using Deno 2, which uses TypeScript, but it's not
11:00 - 11:30 that difficult to switch over to Python or whatever you would like to use. What do you think? Were you surprised by anything you saw? It's so easy to add prefixes, there's no reason not to use them, and they definitely get better results. I was so surprised when I heard about prefixes the other day. I knew in my heart that it wouldn't make a lick of difference, and then I was shocked when I saw the difference they
11:30 - 12:00 made. So hopefully you learned something. I certainly did. Prefixes, huh? The way of the future. Thanks so much for watching. Goodbye.