Understanding the Basics: Clustered vs. Nonclustered Indexes
SQL Indexes | Clustered vs. Nonclustered Index | #SQL Course 35
Summary
In this comprehensive video by Data with Baraa, learn how to optimize your database performance using SQL indexes. Understand the significance of indexes, with a focus on two types: clustered and non-clustered indexes. Baraa explains the differences, uses, and trade-offs between these indexing methods, and walks through practical SQL examples to create and manage indexes. This tutorial aims to equip viewers with essential indexing knowledge to enhance their database operations.
Highlights
Indexes improve query performance by reducing the time needed to locate data on large tables. 💡
Clustered indexes physically sort table data, creating a 'table of contents' for quick access. 📋
Non-clustered indexes act like an index in a book, providing detailed pointers without altering data order. 📈
Creating the right index requires understanding of both the database structure and the purpose of the data retrieval. 🔍
Each type of index serves specific use cases, and a balance between clustered and non-clustered is often needed. ⚖️
Key Takeaways
Indexes in SQL help speed up data retrieval by providing a structured way to locate data, similar to a book index. 📚
Clustered indexes sort and store the data rows physically on the disk, improving read performance but slowing down write operations. 🚀
Non-clustered indexes create a separate structure that stores pointers to the rows in the data pages, so a table can have multiple non-clustered indexes. 🎯
Choosing the right type of index involves trade-offs between read speed and write efficiency, depending on database needs. 🔄
Clustered indexes are ideal for columns with unique, rarely changing values, such as primary keys. 🗝️
Overview
In his video tutorial, Baraa dives into SQL indexes, focusing on clustered and non-clustered indexes. He starts by explaining what indexes are and how they speed up database queries, particularly on large tables, much as a book's index or a hotel map saves you from checking every page or every room.
Baraa distinguishes clustered indexes by the way they physically sort and store the table's rows. He compares them to a book's table of contents for quick navigation. This index type boosts read performance at the cost of slower writes, because the rows must be kept in sorted order.
Non-clustered indexes, by contrast, leave the physical order of the data untouched and store pointers to the rows in a separate structure. Baraa likens them to the index at the back of a book; a table can have several of them, each offering another lookup path without rearranging the data. He concludes with practical SQL examples for creating these indexes.
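The effect of adding an index can be observed with a small experiment. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for the SQL Server setup discussed in the video, and the table contents and the index name idx_customers_id are made up. In SQL Server the corresponding statements would be CREATE CLUSTERED INDEX or CREATE NONCLUSTERED INDEX; SQLite's plain CREATE INDEX builds a separate secondary index, which is closest to the non-clustered case.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, first_name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, f"name_{i}", "DE" if i % 2 else "US") for i in range(1, 21)],
)

def plan(sql):
    """Return the planner's textual description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

query = "SELECT * FROM customers WHERE id = 14"

# No index yet: the planner has to scan the whole table.
plan_without_index = plan(query)

# Hypothetical index name; a separate structure is built, the table is untouched.
conn.execute("CREATE INDEX idx_customers_id ON customers (id)")
plan_with_index = plan(query)

print("before:", plan_without_index)
print("after: ", plan_with_index)
```

The exact wording of the plan text differs between SQLite versions, so the sketch only relies on the first plan describing a scan and the second one naming the index.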
Chapters
00:00 - 00:30: Introduction to Database Optimization with Indexes The chapter introduces the concept of database optimization focusing on the use of indexes. It explains the importance of indexes in enhancing database and query performance. The chapter covers two key types of indexing: clustered and non-clustered indexes, detailing their differences and appropriate usage contexts. Additionally, it includes practical SQL instructions on creating these indexes.
00:30 - 01:00: Introduction and Benefits of SQL Indexes Welcome to the video! My name is Baraa, and I lead data projects at renowned companies like Mercedes-Benz. My goal is to share my expertise along with best practices through high-quality video content on this YouTube channel. I encourage new viewers to subscribe if they want to keep updated with the latest insights.
01:00 - 02:00: Understanding Indexes with Analogies An index is compared to a guide or a table of contents for a database, helping to expedite data search processes, especially with large tables. The analogy compares an index to a book's index, highlighting how it allows you to directly locate specific information instead of searching through every page.
02:00 - 03:00: Different Types of Indexes: Structure, Storage, Functions The chapter uses an analogy of a hotel to explain the concept of indexes in databases. It describes the inefficiency of searching for a room without guidance, comparing it to searching data without indexes. The hotel numbering system and maps provided by the reception represent indexes in databases, which help locate data efficiently.
03:00 - 04:30: Database Storage Basics and Data Pages The chapter discusses the importance of indexes in databases, comparing them to maps and signs in a hotel that help you find your room easily. It highlights that indexes are essential for quickly locating data without scanning the entire database. Additionally, it addresses a scenario where someone with a large table seeks to speed up queries using indexes.
04:30 - 05:30: Heap Structures and Full Table Scans This chapter discusses table usage in databases, specifically focusing on whether the table is used for text search or complex analyses. It introduces the concept of different indexes in databases tailored for different purposes and categorizes these indexes as per their organizational and referencing structure in the database system.
05:30 - 09:00: Clustered Indexes: Building and Benefits This chapter classifies indexes along three dimensions. By structure, there are two types, clustered and non-clustered indexes, which are essential for understanding how data is organized and referenced. By storage, which concerns how the data is physically stored, there are row store and column store indexes. By function, there are the unique index and the filtered index.
09:00 - 15:00: Non-Clustered Indexes: Building and Comparison This chapter delves into various index types, specifically focusing on non-clustered indexes. It highlights that each index type has its trade-offs, with some improving read performance and others enhancing insert and update operations. The chapter emphasizes the importance of selecting the appropriate index for specific tasks. It promises a detailed exploration of each index type, starting with the structure of clustered and non-clustered indexes, to understand their functionality and creation.
15:00 - 21:00: Practical Usage and Syntax of SQL Indexes The chapter begins by setting the context for understanding the role and function of indexes in SQL databases. It prompts the reader to consider the consequences of not using an index by imagining a scenario with a database containing a customers table with 20 entries. On the client side, this table resembles a spreadsheet with rows and columns, but the underlying storage structure is more complex and handled differently by the database system.
21:00 - 27:00: Composite Indexes and Best Practices The chapter titled 'Composite Indexes and Best Practices' discusses the storage mechanism of databases. It explains that data is stored in files on disk, specifically in units called pages. Each page is the fundamental storage unit in a database, with a fixed size of 8 kilobytes. The transcript emphasizes understanding the role of pages in efficiently managing and accessing data within a SQL database.
27:00 - 29:00: Conclusion and Next Steps The chapter explains that SQL does not store data as a simple grid of rows and columns but in data pages, which can hold table rows, column metadata, and indexes. Every interaction with data involves reading from and writing to these pages. Rather than selecting a specific column, a query always fetches a whole data page to read the rows inside it. The chapter emphasizes understanding data pages and index pages for efficient data handling.
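The page sizes mentioned here and in the transcript (an 8-kilobyte page with a 96-byte header) allow a rough estimate of how many rows fit on a page. The following is a back-of-the-envelope sketch, not the engine's real accounting: the 2 bytes charged per row for the offset array are an assumption, and the function names are invented for this example.

```python
import math

PAGE_SIZE_BYTES = 8 * 1024   # 8 KB page, as stated in the video
PAGE_HEADER_BYTES = 96       # fixed-size page header, as stated in the video
OFFSET_ENTRY_BYTES = 2       # assumption: one 2-byte offset-array entry per row

def rows_per_page(row_size_bytes):
    """Estimate how many rows of a given size fit on one data page."""
    usable = PAGE_SIZE_BYTES - PAGE_HEADER_BYTES
    return usable // (row_size_bytes + OFFSET_ENTRY_BYTES)

def pages_needed(row_count, row_size_bytes):
    """Estimate how many data pages a table of row_count rows occupies."""
    return math.ceil(row_count / rows_per_page(row_size_bytes))

# Rows of roughly 1600 bytes reproduce the running example of 20 customers
# stored five to a page.
print(rows_per_page(1600), pages_needed(20, 1600))
```

With wide rows only a handful fit per page, which is why the example table of 20 customers spreads over four pages.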
SQL Indexes | Clustered vs. Nonclustered Index | #SQL Course 35 Transcription
00:00 - 00:30 all right friends so now we're going to start talking about how to optimize the performance of your database and queries and one of the most important techniques is to use indexes so in this video you can have like an introduction about what are indexes two very important types of indexing we have the clustered index and the non-clustered index and we're going to learn as well what are the main differences between them and when to use which one and of course we're going to practice in SQL on how to create those indexes so let's go
00:30 - 01:00 hey friends if you're new here welcome my name is Baraa and I lead data projects in big companies like Mercedes-Benz and I'm here to share my knowledge and best practices about data in high quality videos in this YouTube channel so if you don't want to miss anything make sure to subscribe now let's go back to the video so what is an index an index is a data structure that provides a quick access to the row to improve the speed of your
01:00 - 01:30 queries so an index is like a guide for your database in order to speed up the process of searching for data especially if you have like big tables so now in order to understand what are indexes imagine you have a huge book and you want to find a specific topic or a chapter instead of flipping each single page in order to find the topic that you are searching for you would use the index at the back of the book in order to jump straight to the right page and that's exactly what index does but for your data another analogy that I use in order
01:30 - 02:00 to understand indexes is think about the indexes as a big hotel now let's say that in the hotel we don't have any guide and you would like to find the room number let's say 5001 now what you're going to do you're going to go and search for your room floor by floor and checking each room until you find your room but instead of that thankfully hotels have a numbering system and you can ask for a map from the reception in order to understand in which building in
02:00 - 02:30 which floor you can find your room so by just following the map and maybe some signs it's going to be very quickly to locate and find your room in such big hotel and that's exactly what each database needs it needs an index in order to help the database finding and locating the right data without having to scan everything and now let's say that you ask me you know what I have this big table and I would like to speed up the queries using indexes and my first
02:30 - 03:00 question going to be what are you exactly doing with this table are you using this table to search for a text or are you doing like complex analyses with this table and the reason why I'm asking this is that we have different indexes in databases for different purposes so now let's have a quick look to the different types of indexes that we have in database I divide the indexes in databases into three categories the first one is by the structure how the database is organizing and referencing
03:00 - 03:30 the data and here we have two types the clustered index and the non-clustered index those are very important to understand now we have another category for the indexes we can divide them by the storage and in this category we are talking about how the data is stored physically in the database so we have two types we have the row store index and the column store index and the third type is the functions and here we have two types we have the unique index
03:30 - 04:00 and the filtered index now each index type has its own strengths but as well there is always a tradeoff some might improve the read performance the other one might improve the insert and update operations so it's all about choosing the right type of index for the job so now what we're going to do we're going to go and deep dive into each of those types in order to understand how they work and how we can create them and we will start with the first category the structure we have the clustered index and the non-clustered
04:00 - 04:30 index now before we dive into how the indexes work in databases let's understand first what happens to the database tables if you don't use any index when you create a new table in your database like for example the customers table where you have let's say 20 customers inside this table what you're going to see at the client side is like spreadsheets like a table with rows and columns but behind the scenes the database stores it a bit differently
04:30 - 05:00 it's going to store the data in a data file on the disk and inside this file the data is going to be stored inside blocks called pages so it's not like rows and columns data are stored inside data files and inside the data files we have pages so what is a page a page is the unit of data storage in a database and it is a fixed size of 8 kilobytes where the SQL database can store anything inside
05:00 - 05:30 it can store inside it the rows of your tables or columns metadata indexes and every time you are interacting with your data the SQL is reading and writing to those pages so as you can see the SQL is not storing the data inside like rows and columns so if you are running a query the SQL is not like selecting a specific column it always fetches a data page in order to read the rows inside this page and the main two types that we're going to learn is the data page
05:30 - 06:00 and the index page so how the data page looks like it is divided into multiple sections the first section is the page header where the database can store key information about the metadata like the page ID and it has the following format it starts with the file ID like one and then we have a unique number for each page so for example 150 so the page header is a fixed size of 96 bytes now to the next section we're going to have a variable size this is where your data row is going to be stored so your actual
06:00 - 06:30 data row is going to be stored in this section and the SQL going to try and fit as many rows as it can in one single page and this of course depends on the size of each row so if you have like a large table where the rows are really big so SQL can fit only few rows in one single page and now moving on to the last section in the data page we have the offset array this is like a quick index for the rows stored inside this page it keeps track of where each
06:30 - 07:00 row begins so that the SQL can easily locate a specific row without having SQL like scanning the entire page in order to find a row so this is the structure of the data page and this is exactly how the SQL stores data inside the databases so now back to our example where we have the customer table and 20 rows so let's see how SQL going to be creating those pages now if you are not using any index in this table so now what going to happen is SQL going to insert the data
07:00 - 07:30 inside those pages as you are inserting the data inside the customers so maybe first you are inserting the customers like 12 5 15 6 7 and SQL going to insert it to the databases exactly like that so that means SQL is just inserting the data as you insert it to the table so let's say each data page is like fitting only five rows so after we insert five customers SQL going to go and create another data page for the next rows so in the next page the SQL going to insert the next five customers and once
07:30 - 08:00 it's full it's going to create another Data Page in order to start adding the next customer until we have like for example four pages for the 20 customers so now if you check the customers inside those four pages you see that they are not sorted at all and that's because in this scenario we are not using any index so we call this structure as a heap structure so a heap table is a table without a clustered index that means the rows are stored randomly without any
08:00 - 08:30 particular order this is not really bad because it's going to be very quick to insert data inside this table but of course finding something from this table going to be very slow so this is the first tradeoff you have very fast writes but very bad reads think about it like you are throwing all your papers in a drawer without organizing them so you can toss things very quickly in this drawer but if you want to search for specific paper
08:30 - 09:00 it's going to be very long process until you find it because nothing's in order so now let's see how the SQL going to handle if you read something from this table let's say that you are searching for the customer with the ID 14 so now SQL has totally no idea where to find this customer so SQL going to start fetching each data page and start scanning each row so it's going to start with the first data page and start scanning well SQL will not find 14 here so SQL going to go to the next page and start scanning as well searching for the
09:00 - 09:30 ID 14 and nothing going to be found the same thing for the third page as well SQL will not find 14 so SQL going to go to the last data page and there after scanning four rows in this data page finally SQL is going to find the customer number 14 and it's going to return it for the clients so as you can see in order to find one customer SQL did read four different pages and scanned like 19 rows in order to find the customer and this process we call it full table scan
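The scan just described can be replayed in a few lines. This is a toy model for illustration, not how a database engine is implemented: the first page uses the insert order mentioned in the transcript, the remaining IDs are invented so that customer 14 ends up as the fourth row of the last page, and the function name is hypothetical. Stopping at the first match assumes the IDs are unique, as in the narration.

```python
# Four heap pages of five rows each. Only the first page follows the transcript;
# the other IDs are made up for illustration.
heap_pages = [
    [12, 5, 15, 6, 7],
    [1, 19, 8, 2, 16],
    [10, 4, 17, 11, 13],
    [3, 18, 9, 14, 20],
]

def full_table_scan(pages, target_id):
    """Read page by page and row by row; return (found, pages_read, rows_scanned)."""
    pages_read = 0
    rows_scanned = 0
    for page in pages:
        pages_read += 1                  # every page has to be fetched
        for customer_id in page:
            rows_scanned += 1            # every row has to be compared
            if customer_id == target_id:
                return True, pages_read, rows_scanned
    return False, pages_read, rows_scanned

found, pages_read, rows_scanned = full_table_scan(heap_pages, 14)
print(found, pages_read, rows_scanned)
```

With this layout the scan touches all four pages and compares 19 rows before it reaches customer 14, matching the count given in the video. Searching for an ID that does not exist reads every row in the table.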
09:30 - 10:00 so the full table scan means SQL is scanning the entire table page by page and row by row in order to find specific row and of course for this table maybe it's not a big deal but if you have like a big table where you have like hundreds of thousands or maybe millions of rows searching through the heap structure going to be very painful and slow in order to locate one row and here exactly why we need indexes in SQL databases so let's understand the first type of indexes the clustered
10:00 - 10:30 index all right so now let's understand what's going to happen if you create clustered index in your table so say you created clustered index on the ID column of the customers so the first thing that's going to happen is SQL going to physically sort all the data based on the column ID so the rows going to rearrange in each data page from the lowest to the highest so in the first page we're going to have the first customer ID number one then 2 3 4 5
10:30 - 11:00 until we reach in the last page the last customer number 20 so as you can see the first page has the lowest value and the last page has the highest value so that's not all the next step is that SQL going to go and start structuring and building the B-tree so what is a B-tree a B-tree short for balanced tree it is hierarchical structure that store the data as a tree upside
11:00 - 11:30 down it starts with the root the root node and then it keeps branching out until we reach eventually the leaves between the leaf nodes and the root node we call this section the intermediate nodes so it could be like one level or multiple levels between the root and the leaves and once SQL constructs the B-tree it's going to be very easy for SQL to navigate through the B-tree in order to
11:30 - 12:00 find specific information so let's see how SQL is building the B-tree for the clustered index now very important to understand that the leaves the leaf nodes at the B-tree for the clustered index contain the actual data the data pages so all your nice sorted data pages and your data is stored at the leaf level then after that it's going to start building the intermediate nodes and here the database going to use different type of pages we have the
12:00 - 12:30 index page so in the index page we cannot find the actual data the entire rows but instead the index page stores a key value that contain a pointer to another index page or to a data page so for example we have here the value one the key and then the value going to be the ID of the data page so here we don't have like the whole row of the data we have here only a pointer to another data page so here we are telling SQL if you are searching for IDs between one
12:30 - 13:00 and five you can locate it at the data page ID 1:100 and then we can store in this index page another pointer where we can tell SQL if you are searching between 6 and 10 then you can locate it at the second data page so this is the structure of the index page it contains only pointers to another page and the same thing for the second two pages the SQL going to create another index page where it going to say if you are searching for IDs between 11 and 15 you
13:00 - 13:30 can find it at the third page 1:102 and for the last group between 16 and 20 we have another pointer to the last page to the page number 1:103 so as you can see inside those index pages we have like a pointer for each group of IDs for each cluster so for the group of customers between one and five we have one pointer and for the second group between six and 10 we have another pointer so that means we don't have here a pointer for each row
13:30 - 14:00 we have a pointer for each group for each cluster that's why we call it clustered index and now once SQL is done building the intermediate nodes SQL going to go and build the last node the root node where it says if you are searching for customers between 1 and 10 then go to the index page with the ID 1:200 so that means the root node here is pointing to another index page not directly to the data page and
14:00 - 14:30 the same thing we need another pointer for the second index page so the customers between 11 and 20 go to the index page with the ID 1:201 and this is exactly what's going to happen if you create a clustered index in SQL first it's going to go and physically sort all your data in the data pages so if it's from the first time sorted randomly SQL has to arrange everything and sort the data from scratch and then it's going to go and build this
14:30 - 15:00 structure where you have in the root node an index page in the intermediate nodes the index pages but at the leaf level at the leaves we have the actual data the data pages so now let's see what's going to happen if you query the table where you search for the ID number 14 so it's going to check which pointer to use since 14 is in the group between 11 and 20 it's going to go and use the second pointer to the index page with the ID 1:201 and here the SQL going to open this
15:00 - 15:30 index page and check the pointers so since 14 is between 11 and 15 it going to go and use the pointer to the data page 1:102 and with that SQL located the correct data page the third page and now SQL going to open this data page and find the customer ID number 14 so as you can see it was very fast for SQL to locate the correct data page with only three jumps from the root node to the intermediate nodes the SQL was able
15:30 - 16:00 to find fast the correct data page and here SQL needs only to read one data page instead of reading as we saw in the heap structure four different data pages and of course you might say but still here we are reading like three pages well reading an index page is very fast compared to the data page because reading a data page is always slower than reading an index page so as you can see this B-tree structure the clustered index structure did help the SQL and the database to locate the right data in the right
16:00 - 16:30 data page without having unnecessary read operations on different data pages and this is exactly how the clustered index works in the SQL database all right so now we're going to move to the second type and we're going to understand how exactly SQL builds and creates the nonclustered index so let's go so now we are back to the heap structure where our table doesn't have any index and our data are stored randomly
16:30 - 17:00 inside the data pages and now if you go and create a nonclustered index on the customer ID what's going to happen and here the big difference that SQL will not touch or change anything on the physical actual data on the data pages so the data pages are going to stay as they are and nothing going to be changed and the SQL starts immediately building the B-tree structure so it's going to start immediately building an index page and this index page is a little bit different than the one that we have
17:00 - 17:30 learned previously so since it is an index page it going to store pointers but this time SQL going to store in the key the customer ID so one is the customer ID and now the value the pointer it will not be the data page ID we will be more specific so we're going to have like an address where exactly the row is stored so it's going to start with the file ID the page number because the customer ID one is stored in the page one double Point 100 to but SQL going to go add as well the offset
17:30 - 18:00 number of the row where exactly in the page we can find this ID and the whole thing we can call it a RID the row identifier so now let's see quickly how the index page is pointing exactly to the row inside the data page so the first part of the row identifier is mapping to the data page ID and then from the 96 it's going to take us to the offset and that's exactly the location of the row number one so 96 is the point where
18:00 - 18:30 where we're going to start finding the row number one and that's going to T us exactly to the place where we can read the information about the row ID number one so this is how the index page is locating the exact place of the rows so SQL can go and continue and assign for each customer ID pointer to the exact location so as you can see now in the index page we don't have like a pointer for each group of customers like we have
18:30 - 19:00 learned in the clustered index we have now a pointer for each ID and this type of index page we call it row locator page so now SQL going to go and continue and map a pointer for each customer ID that we have inside our table so we will have multiple index pages pointing to our data page so as you can see we have a lot of pointers and the data inside the index page is of course sorted but inside the data pages left as it is and
19:00 - 19:30 now those index pages that has the row identifier going to be stored at the leaf level of the B-tree so at the leaf level we don't have the actual data the data pages we have index pages where we have pointers then to the actual data and then SQL going to go and start building the intermediate nodes it's exactly like the clustered index where it going to point to another index page so between one and five customers it going to be in the index page number 200 so the next step is
19:30 - 20:00 going to go and build the intermediate nodes it's going to be exactly like the clustered index nothing going to be changed it's like the same structure so it is an index page pointing to another index page but this time for a group of customers and then we're going to have as well the root node so again we call this structure as a B-tree structure where they point to another data pages but the data pages are not part of the B-tree so now let's see if you are searching for the customer ID number 14 what's going to happen
20:00 - 20:30 is going to start again from the root node and then it's going to find the pointer to the intermediate node and then jump to the next step to the intermediate node and then it's going to find the pointer to the index page between 11 and 15 and then SQL is going to go and scan this index page and find okay for the customer ID number 14 we have the following address so it's going to go and locate the exact data page and as well the exact place of the row so it can go and jump immediately to the row without scanning anything else so
20:30 - 21:00 here this time with the nonclustered index the SQL did read three different index pages and finally the one data page in order to find the data so if you compare to the clustered index you can see that we have here one extra layer one extra index page to be scanned in order to find the right place of the row and this is how SQL creates the B-tree for the nonclustered index and how it scans it in order to find the information
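The two lookups walked through above can be compared side by side in a toy model. Everything in this sketch is an assumption made for illustration: the dictionaries stand in for pages, the page IDs are invented (including the root pages 1:300 and 1:600 and the index pages 1:4xx and 1:5xx), the heap contents are made up, and a row identifier is simplified to a (page ID, slot number) pair instead of a byte offset. It models only the navigation, not a real B-tree implementation.

```python
import bisect

def pick_child(entries, key):
    """entries is a sorted list of (lowest_key, child_page_id) pointers."""
    lows = [low for low, _ in entries]
    return entries[max(0, bisect.bisect_right(lows, key) - 1)][1]

# Clustered index: root and intermediate index pages, leaves are sorted data pages.
clustered = {
    "1:300": {"kind": "index", "entries": [(1, "1:200"), (11, "1:201")]},
    "1:200": {"kind": "index", "entries": [(1, "1:100"), (6, "1:101")]},
    "1:201": {"kind": "index", "entries": [(11, "1:102"), (16, "1:103")]},
}
for n, low in enumerate(range(1, 21, 5)):
    clustered[f"1:10{n}"] = {"kind": "data", "rows": list(range(low, low + 5))}

# Nonclustered index: the heap stays unsorted, leaves hold one row identifier per key.
heap = {
    "1:100": [12, 5, 15, 6, 7],
    "1:101": [1, 19, 8, 2, 16],
    "1:102": [10, 4, 17, 11, 13],
    "1:103": [3, 18, 9, 14, 20],
}
rid_of = {cid: (pid, slot) for pid, rows in heap.items() for slot, cid in enumerate(rows)}
nonclustered = {
    "1:600": {"kind": "index", "entries": [(1, "1:500"), (11, "1:501")]},
    "1:500": {"kind": "index", "entries": [(1, "1:400"), (6, "1:401")]},
    "1:501": {"kind": "index", "entries": [(11, "1:402"), (16, "1:403")]},
}
for n, low in enumerate(range(1, 21, 5)):
    nonclustered[f"1:40{n}"] = {"kind": "leaf", "rids": {k: rid_of[k] for k in range(low, low + 5)}}

def clustered_lookup(pages, root_id, key):
    """Walk from the root down to a data page; return (found, pages_read)."""
    reads, page_id = 0, root_id
    while True:
        page = pages[page_id]
        reads += 1
        if page["kind"] == "data":
            return key in page["rows"], reads
        page_id = pick_child(page["entries"], key)

def nonclustered_lookup(index_pages, heap_pages, root_id, key):
    """Walk to a leaf index page, then follow the row identifier into the heap."""
    reads, page_id = 0, root_id
    while True:
        page = index_pages[page_id]
        reads += 1
        if page["kind"] == "leaf":
            rid = page["rids"].get(key)
            if rid is None:
                return False, reads
            data_page_id, slot = rid
            reads += 1                   # one extra read for the data page itself
            return heap_pages[data_page_id][slot] == key, reads
        page_id = pick_child(page["entries"], key)

print(clustered_lookup(clustered, "1:300", 14))
print(nonclustered_lookup(nonclustered, heap, "1:600", 14))
```

In this model the clustered lookup reads three pages (two index pages and one data page), while the nonclustered lookup reads four (three index pages and one data page), which is the extra layer the video refers to.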
21:00 - 21:30 all right so now when I think about the clustered index and the non-clustered index I think about a book you can think of the clustered index like the table of contents at the front of the book so the table of contents going to tell you where to find each chapter and the chapters are exactly sorted like the table of contents and this is exactly what the clustered index does but now on the other hand think about the non-clustered index as the index that you can find at the end of the book
21:30 - 22:00 the index of the book is a very detailed list of topics terms and keywords where it points exactly to the location where you can find it in the book and the content and the topic of the book is not sorted like the index of the book and this is exactly what the nonclustered index does it is coexisting with the data it is an extra list where it going to point exactly where we can find the data inside our table all right so now let's put those two indexes side by side
22:00 - 22:30 to understand the differences between them so the structure of the clustered index is a B-tree where it starts with the root node where we have an index page this index page is pointing to the intermediate nodes where we have as well index pages and those index pages are pointing to the actual data to the data pages so at the leaf level of the clustered index we have the data pages the actual data what's special about the clustered index is that it physically sorts
22:30 - 23:00 the data inside those pages so everything here is physically rearranged and sorted now if you are talking about the nonclustered index we have as well a B-tree so the same thing at the root node we have an index page pointing to an intermediate index page but this time the intermediate nodes are pointing to another index page they are not pointing like the clustered index to a data page they are pointing to index page so now if you check this structure you can see that at the leaf level for the clustered
23:00 - 23:30 index we have the actual data the data pages but on the other side at the leaf level for the non-clustered index we don't have the actual data we have index pages but those index pages are pointing to the actual data to the data pages but the big difference is that the data pages are not part of the B-tree the B-tree of the nonclustered index is just a separate structure that does not involve any data so we have only index pages and it just
23:30 - 24:00 points to the data pages without changing anything physically with your data but in reality what happens is that you're going to have those two types of indexes the clustered and the non-clustered indexes in one table so what's going to happen the leaf level of the nonclustered index can be pointing to the data pages of the clustered index because those index pages don't care whether those pages are sorted or not it's just going to go and point to the correct page and to the correct row
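A table that carries both kinds of index at once can be tried out with Python's built-in sqlite3 module. This is a stand-in sketch under stated assumptions rather than the SQL Server setup from the video: in SQLite a table declared with INTEGER PRIMARY KEY is stored in a B-tree keyed by that column, which plays a role similar to a clustered index, and each CREATE INDEX adds a separate secondary index that refers back to the rows by that key. The table contents and index names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The INTEGER PRIMARY KEY is the key of the table's own B-tree in SQLite.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, f"name_{i}", "DE" if i % 2 else "US") for i in range(1, 21)],
)

# Two secondary indexes on the same table (hypothetical names).
conn.execute("CREATE INDEX idx_customers_last_name ON customers (last_name)")
conn.execute("CREATE INDEX idx_customers_country ON customers (country)")

def plan(sql):
    """Return the planner's textual description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_by_id = plan("SELECT * FROM customers WHERE id = 14")
plan_by_name = plan("SELECT * FROM customers WHERE last_name = 'name_14'")
plan_by_country = plan("SELECT * FROM customers WHERE country = 'DE'")

for label, text in [("id", plan_by_id), ("last_name", plan_by_name), ("country", plan_by_country)]:
    print(label, "->", text)
```

Each query can take a different access path into the same table: the primary key for lookups by id, and one of the secondary indexes for lookups by another column.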
24:00 - 24:30 so that means we have now like two different B3 structures that are pointing to the data and here there is like one thing that you have to understand that you can create only one clustered index on a table and this rule really makes sense because you can sort the data only in one way in SQL and that of course makes sense because you can sort the data physically only once and that's why in SQL databases you are allowed to create only one class cled index because physically the data can be
24:30 - 25:00 sorted only in one way but in the other hand in the nonclustered index you can create as many nonclustered index you need so you can create three four and all of them are pointing to the same databases because in the B3 of the nonclustered index you don't store any datab bages we store only pointers to the data and you could have like multiple pointers so this is the most important and the main difference between those two indexes now if you put it side by side we have learned that the clustered index going to go and
25:00 - 25:30 physically sort and store the rows in the B-tree, but the nonclustered index is going to create a separate B-tree structure with pointers to the actual data. And by the way, we call the clustered index the main index of a table, so the clustered index is the main one, the most important one that you can use in each table in your database. Now, as we learned, if we are talking about the number of indexes, you can create a maximum of one clustered index for
25:30 - 26:00 each table, but for the nonclustered index there is practically no limitation: you can create multiple indexes for each table. Now if we compare them on read performance, how fast we can get data, well, the clustered index is faster than the nonclustered index. That's because in the nonclustered index we have this extra layer at the leaf level of the B-tree, and because of this extra layer SQL has to do an extra job in
26:00 - 26:30 order to find the data. That's why the clustered index is faster than the nonclustered index. But on the other hand, if we are talking about write performance, how fast we can insert data into the tables, well, writing data to a table with a clustered index is slower than with a nonclustered index. That's because as you are inserting data into the table, SQL always has to check the data pages: is everything sorted correctly? If not, SQL has to go and start physically sorting the data again in
26:30 - 27:00 order to have the correct order. So there is a lot of effort needed to keep the data sorted with the clustered index, but on the other hand, with the nonclustered index we don't have this: the physical data is going to stay as it is, and we are just creating new pointers. So if you are writing to a table where you have a clustered index, it's going to be slower than writing to a table where you have a nonclustered index. And of course the fastest way to write data to a table is to not have indexes at all, so a heap
27:00 - 27:30 structure, where SQL just starts inserting data into those data pages without creating any extra structures. So as you can see, it's always a trade-off: you can read fast, but you're going to write slower, so you cannot have everything. Now if we are talking about storage efficiency, the clustered index is going to be better with storage than the nonclustered index, and that's for the same reason: with a nonclustered index we have this extra layer of index pages, and index pages
27:30 - 28:00 need storage, and that's why they can use more storage than the clustered index. Now let's talk about the use cases. When to use a clustered index? Well, a column has to meet a few criteria in order to be a good candidate for the clustered index. First, it's good if the values inside the column are unique. Second, and this is way more important, the values of this column should not change a lot, because if this column gets a lot of update operations and the data keeps changing, that means each
28:00 - 28:30 time SQL is going to go and start sorting the data again, left and right. So a column that is frequently changing is not good for a clustered index, and that's why the primary keys of tables are a perfect candidate: first, they are unique, and second, we never go and update a primary key value; we always append a new primary key value. That's why primary keys are perfect for the clustered index. And one more case where I use a clustered index is
28:30 - 29:00 to optimize the performance of a range query: if you are querying the data between one value and another one, the clustered index works really well. Now on the other hand, if we are talking about the nonclustered index, we could use it on columns that are used in search conditions, or if you are joining tables without using the primary keys, then you can apply a nonclustered index in order to have faster joins. Or you can use it to optimize the performance if you are searching for an exact value, an exact match. So those are the
29:00 - 29:30 main and important differences between the clustered and the nonclustered indexes. All right, so now before we go to SQL and start practicing, I would like to show you the syntax of the index. It's very, very simple: it starts with CREATE, then we can define whether it is CLUSTERED or NONCLUSTERED, and then the keyword INDEX. The type is optional, so if you don't define anything, the default is going to be the
29:30 - 30:00 nonclustered. So if you say CREATE INDEX, SQL Server is going to create a nonclustered index. After that we have to define the name of the index, and then we have to tell SQL which table to create the index on, so ON and the table name. Then we can define one column or multiple columns for the index, and we call an index with multiple columns a composite index. So for example, we can create a clustered index using this command: CREATE
30:00 - 30:30 CLUSTERED INDEX, the index name, and then we specify the table and the ID. So we are saying create a clustered index based on this column, the ID, from the table customers. And if you want to create a nonclustered index, you say CREATE NONCLUSTERED INDEX and the same thing. So far we are using one column in the index, but we can create a composite index with multiple columns, like the following example. So we can say CREATE INDEX, and as you can see we skipped defining the type here, and that's because the default is going to be
30:30 - 31:00 a nonclustered index. Now here we are specifying two columns, the last name and the first name, and as you can see we are also telling SQL how to sort the data: last name should be sorted in the index ascending, lowest to highest, but the first name should be the other way around, from the highest to the lowest. So you can control how the data is going to be sorted in the index. So as you can see, it is very simple; this is the syntax for creating an index in SQL.
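The syntax just described can be sketched in T-SQL as follows. The Customers table, its columns (including City) and all index names are hypothetical and exist only so the statements can run on their own; the slides in the video may use different names.

```sql
-- Hypothetical table with no primary key, so it starts out as a heap.
CREATE TABLE Customers (
    ID        INT          NOT NULL,
    FirstName NVARCHAR(50) NULL,
    LastName  NVARCHAR(50) NULL,
    City      NVARCHAR(50) NULL
);

-- General shape:
-- CREATE [CLUSTERED | NONCLUSTERED] INDEX index_name
-- ON table_name (column1 [ASC | DESC], column2 [ASC | DESC], ...);

-- Clustered index on a single column.
CREATE CLUSTERED INDEX idx_Customers_ID
ON Customers (ID);

-- Nonclustered index with the type stated explicitly.
CREATE NONCLUSTERED INDEX idx_Customers_City
ON Customers (City);

-- Type omitted, so it defaults to nonclustered.
-- Composite key with a sort direction per column.
CREATE INDEX idx_Customers_Name
ON Customers (LastName ASC, FirstName DESC);
```

Note that the clustered statement only succeeds here because the hypothetical table has no primary key; a table whose primary key already created a clustered index would reject it.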
31:00 - 31:30 right, so back to SQL, and the first question is: where do we find indexes in the database? Well, you can explore them if you go to the Object Explorer over here and check any table from our sales DB, for example customers. Here you have a folder called Indexes, and if you expand it you will find an index. I didn't create any of those indexes in the database, but in SQL Server, if you define any of the columns as a primary key, SQL Server is going to create by default a
31:30 - 32:00 clustered index for the primary key, because it always makes sense to create a clustered index on the primary key. So this one is created by default, and as you can see, at the start we have a key, primary key, customers, and then it says clustered. Now I would like to start from scratch, so I would like to create a new table without any indexes. What we're going to do is load the table customers into a new table. How can we do that? We're going to say SELECT star FROM sales
32:00 - 32:30 customers, and before the FROM we're going to say INTO a new table, so it's going to be DB customers. Like this; let's execute it. Now if you go to the left side and refresh the tables, you can see we have a new table called DB customers. Let's check whether we have any indexes inside it: so Indexes, and it is empty. We don't have anything, no clustered index or anything else, and this table has a heap structure, so the data are inserted
32:30 - 33:00 there randomly; they are not sorted. And if I go over here and, for example, select from this new table where customer ID equals one and execute it, SQL Server did a full scan of the table in order to find this customer ID. So our new table DB customers is a heap, but let's change that. What we're going to do is create a new clustered index, so we're going to say
33:00 - 33:30 CREATE CLUSTERED INDEX, and then we give the index a name. We usually follow this naming convention: we have idx, for index, as the prefix, then we specify the table name, so DB customers, and then the key of the index, so the column that we are using to index the table. It is important to stick with the same naming convention for the index names, because later, as you are monitoring your indexes, it's going to be really easy to understand: okay,
33:30 - 34:00 this index is for the table DB customers and we are using the customer ID to index. So after that we specify on which table we are creating the index, so ON sales DB customers, and then we specify the column name. So we are saying: build me a clustered index based on the customer ID. Now let's execute it. As you can see, it's very fast because we have only five rows, so the database just rearranged all the data
34:00 - 34:30 pages very fast. Now let's check our new index: let's refresh and go inside, and now we can see that we have our new clustered index based on the customer ID. Now, as we learned, we cannot create multiple clustered indexes, but let's test that. I will just take the whole statement, and let's say I would like to create a clustered index based on the first name as well. So let's execute it. As
34:30 - 35:00 you can see, SQL is saying you cannot create more than one clustered index on this table. That means we can create only one clustered index. Now let's say that after you created the index you realize you chose the wrong column, and you would like to change it to the first name. What we have to do is drop the index: we say DROP INDEX, then you need the index name, it was this one, and then you have to specify which table with ON, so it's going to be sales DB
35:00 - 35:30 customers, like this. If I execute it and refresh again, you can see that we don't have any indexes anymore and the table is back to a heap structure. Now you can create the correct clustered index for this table. But to be honest, I'm going to stick with the customer ID, so I will not create a clustered index on the first name, because the first name of course is not unique, you can have multiple customers with the same name, and updates could also happen on the
35:30 - 36:00 first name, and that's going to be very expensive. So that means I'm going to stick with my index on the customer ID. Let's execute it, and now I have my index on my table again. Now let's say that I have the following SELECT statement on our table, so customers, and I'm searching by the last name, where let's say we are searching for Brown. Let's execute it. Now let's say that we are getting more and more customers, our table is getting
36:00 - 36:30 bigger, and I frequently use this query, so I'm searching for specific customers using the last name. What we can do is create a nonclustered index on the last name in order to improve the performance of this query. So let's create that: we're going to say CREATE NONCLUSTERED INDEX, and now we give it a name using the naming convention, so DB customers, and we're going to use the last name for this index. So ON
36:30 - 37:00 sales DB customers, and we use the column last name for the index. Let's execute it, and now if you go to our indexes and refresh, we will find our new index over here. As you can see, it says it is nonclustered and also non-unique; we will talk about uniqueness later. So as you can see, it's very easy: we have just created a nonclustered index on the last name. And now, as we learned, we can create multiple nonclustered indexes on the same table.
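The demo steps up to this point can be sketched as one T-SQL script. The object names are reconstructed from the spoken walkthrough and should be treated as assumptions: Sales.Customers as the source table, Sales.DBCustomers as the copy, the columns CustomerID, FirstName and LastName, and index names following the idx_Table_Column convention.

```sql
-- 1. Copy the table. SELECT ... INTO creates the new table without
--    any indexes, so Sales.DBCustomers starts out as a heap.
SELECT *
INTO Sales.DBCustomers
FROM Sales.Customers;

-- 2. On a heap, this lookup has to scan the whole table.
SELECT * FROM Sales.DBCustomers WHERE CustomerID = 1;

-- 3. Build the clustered index on the key column.
CREATE CLUSTERED INDEX idx_DBCustomers_CustomerID
ON Sales.DBCustomers (CustomerID);

-- 4. A second clustered index on the same table is rejected.
--    Left commented out because it raises an error:
-- CREATE CLUSTERED INDEX idx_DBCustomers_FirstName
-- ON Sales.DBCustomers (FirstName);

-- 5. To change the clustering column, drop the index first
--    (the table goes back to a heap) ...
DROP INDEX idx_DBCustomers_CustomerID ON Sales.DBCustomers;

-- 6. ... then create the one you want. The demo keeps CustomerID.
CREATE CLUSTERED INDEX idx_DBCustomers_CustomerID
ON Sales.DBCustomers (CustomerID);

-- 7. Support the frequent last-name search with a nonclustered index.
CREATE NONCLUSTERED INDEX idx_DBCustomers_LastName
ON Sales.DBCustomers (LastName);

SELECT * FROM Sales.DBCustomers WHERE LastName = 'Brown';
```

The script assumes Sales.DBCustomers does not exist yet; SELECT ... INTO fails if the target table is already there.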
37:00 - 37:30 Let's say, for example, our query now looks like this: we are searching by the first name using, for example, the value Anna. This query happens a lot and may be slow, so we can create a new nonclustered index. Let me just write it like this. For the nonclustered index you don't always have to specify NONCLUSTERED, since by default it's going to be nonclustered, so we can skip that. And here let's call it first name, and the column that
37:30 - 38:00 we are using is the first name. So let's create this index and refresh our indexes, and as you can see, SQL did create a nonclustered index for the first name. So if you don't specify the type of the index, it's going to be a nonclustered index by default. All right, so now let's talk about the composite index. It is an index that has multiple columns inside the same
38:00 - 38:30 index. So far we have used only one column in the index, but we can specify multiple columns, and that's because sometimes our WHERE conditions are complicated and based on multiple columns. For example, let's say that we are searching for country equal to USA and at the same time we are saying the score should be higher than 500. That means in this condition we are using two columns, and we would like to speed up
38:30 - 39:00 this query. So how are we going to do it? We're going to create, let's say, an index and give it the name DB customers and, let's say, country score, on sales DB customers. Now it is very important to do the following: we have to define the list of columns that you want to be included in this index, and it is crucial that you choose the column order carefully, because the index is sorted by the first column and then, within it, by the second. In
39:00 - 39:30 your query you filter on the country and then on the score, so we do the same thing in the index: the first column is going to be the country and then the score. Let's create this index, and if you go to the indexes over here you can see that we have created our new index. Now, once you create such an index, your table is always going to be updating it, so you have to be committed and
39:30 - 40:00 responsible: in your queries, if you want to filter the data using country and score, always include the country, the leading column, so that the optimizer is able to use the index. If you do it like this, the index is going to work. Keep in mind that what matters is which columns the query filters on, not the order in which you write the conditions in the WHERE clause; if you write the score first and then the country, the optimizer can still use the index. But if your queries filter in a way that does not match the index, either you adjust your query or you have to recreate the index
40:00 - 40:30 with a different column order. So be very careful with composite indexes; the column order is very critical. And now you might say: you know what, now we have a nice index for those two columns, what happens if I use only one of them in my query, for example the country? So the question is, if I execute this query, is SQL using this index even though I don't have the score? Well, yes, because it
40:30 - 41:00 follows the leftmost prefix rule. What this means is that SQL can use the index as long as you are using the leftmost columns. Here in our index, country is on the left, that's why it is working over here, but if you skip the left column it will not work. So if you go over here, for example, and select only by the score, say it is higher than 500, what we have done is we have skipped
41:00 - 41:30 the country in this query, and that's why it will not work. So as long as you are including the left columns it will work, even if it is only one column. In this scenario the first query is going to use the index; the second one will not. Now let me give you a very simple example in order to understand how this works. Let's say that we have an index using four columns: A, B, C, D. In your query, if you target the column A, the index is going to be used. The same thing is going to
41:30 - 42:00 happen if you use A and B: if you're using those two columns, you will be using the index. So those are cases where the index will be used. Now let's look at the scenarios where the index won't be used. For example, if you jump immediately to the column B, you are not using the left column, the A, and that's why you will not be using the index. And likewise, if in your query you are using A and you are skipping the
42:00 - 42:30 B, so you have A and then C, you will not get the full benefit of the index: at best SQL can use it for A alone. So you always have to use the left columns without gaps. Here, if we are using A, B, C, you'll be using the index. And let's say you are using A, B and then you jump and skip to D: again, the index can only help with the A, B part. So this is what we mean by the leftmost prefix rule for the composite index. If you're using multiple columns inside one index, be careful with the order of the columns that you are defining.
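The composite index from the demo and the leftmost prefix rule can be sketched like this. The names Sales.DBCustomers, Country, Score and idx_DBCustomers_CountryScore are assumptions reconstructed from the spoken walkthrough, and the comments describe what the index makes possible rather than what the optimizer is guaranteed to choose.

```sql
-- Composite index: sorted by Country first, then by Score within each country.
CREATE INDEX idx_DBCustomers_CountryScore
ON Sales.DBCustomers (Country, Score);

-- Filters on the leading column and the second column:
-- the index can be used to seek on both.
SELECT * FROM Sales.DBCustomers
WHERE Country = 'USA' AND Score > 500;

-- Filters on the leading column only: still a leftmost prefix,
-- so the index can be used.
SELECT * FROM Sales.DBCustomers
WHERE Country = 'USA';

-- Skips the leading column: there is no leftmost prefix to seek on,
-- so this index cannot be used for a seek.
SELECT * FROM Sales.DBCustomers
WHERE Score > 500;
```

On a table as small as the demo's, the optimizer may well prefer a plain scan for all three queries, so it is worth checking the actual execution plan rather than relying on the rule alone.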
42:30 - 43:00 right, so that's all for this category, clustered and nonclustered indexes. Now we're going to move to the second category, where we talk about indexes by storage: the rowstore and the columnstore. If you like this video and you want me to create more content like this, I would really appreciate it if you support the channel by subscribing, liking, sharing and commenting; all of that helps the channel with the YouTube algorithm, and my content will reach more people as well. So thank you so much for watching, and I will see you in the next tutorial.