Azure Synapse Analytics Tutorial (From Zero to Pro) | Azure Data Engineering
Estimated read time: 1:20
Summary
This tutorial, led by Ansh Lamba, is a comprehensive guide to Azure Synapse Analytics, highlighting its significance in the data engineering landscape. Over four hours, Ansh provides an in-depth, hands-on learning experience covering Azure Synapse architecture, SQL pools, Apache Spark, and data lakes. The video aims to bridge learning gaps, particularly the often-overlooked configuration of external data storage, and offers guidance on creating resource groups, working with serverless and dedicated SQL pools, and using data flows. Supplemental resources, such as links to Azure Data Factory and PySpark tutorials, are also provided.
Highlights
Learned how Azure Synapse integrates data warehousing, ETL, and big data into one platform for seamless workflows. 💡
Discovered the importance of external data storage configuration and its role in real-world applications. 🌍
Explored strategies for distributed table storage in dedicated SQL pools, including hash and round robin. 📦
Gained insights into the benefits of serverless SQL pools for querying data directly from files without heavy costs. 💰
Found out how critical it is to manage identities and data access permissions in Azure services. 🛡️
Key Takeaways
Azure Synapse is a one-stop solution for data engineering, integrating various data functions under one platform. 🎯
Understanding external storage configuration in Synapse is crucial for real-world applications. 🌐
There are distinct strategies for distributed tables in dedicated SQL pools like hash, round robin, and replicated. 🚀
Serverless SQL pools in Azure Synapse are essential for cost-effective data querying directly from files. 📊
Mastering Azure services requires hands-on practice and understanding of credentials and access permissions. 🔑
Overview
Azure Synapse Analytics is transforming how data engineering is approached by providing a unified arena for all data management processes, from warehousing to big data analytics. This video elaborates on its architecture and practical applications. 🎓
Ansh Lamba, through clear and practical demonstrations, delves into crucial topics like SQL pools, serverless data handling, Apache Spark integration, and the handling of external data sources, emphasizing the real-world applicability of the skills learned. 🔍
To enhance learning, Ansh shares additional resources for Azure Data Factory and PySpark, helping viewers progress from novices to proficient Azure data engineers. The tutorial fills notable knowledge gaps, equipping learners with the tools needed in today's data-centric job market. 📚
Chapters
00:00 - 03:00: Introduction and Course Overview The chapter provides an overview of a course designed to train individuals as Azure Data Engineers. The training promises comprehensive hands-on experience with the core technology behind building cloud data warehouses in Azure: Azure Synapse Analytics. Key areas of learning include Azure Synapse architecture and the distinction between control nodes and compute nodes. With a commitment of four hours, participants can significantly enhance their skills for a role in this high-demand area.
03:00 - 16:30: Introduction to Azure Synapse Analytics In the 'Introduction to Azure Synapse Analytics' chapter, the focus is on the key components and functionality of Azure Synapse. It covers the dedicated SQL pool and serverless SQL pool, along with crucial concepts such as database scoped credentials, external data sources, and external file formats. It delves into querying files with OPENROWSET, creating external tables and views in serverless SQL pools, and the distinction between CTAS (Create Table As Select) and CETAS (Create External Table As Select). Distributed table strategies such as round-robin, hash, and replicated tables are explained, and the Apache Spark pool for data loading is introduced; hedged code sketches of these objects appear after this chapter list. The chapter aims to provide a comprehensive overview of managing and querying data within Azure Synapse.
16:30 - 50:30: Creating Resource Groups and Storage Accounts This chapter introduces the process of integrating an external data lake into a dedicated SQL pool using PolyBase and the COPY INTO command. The instructor encourages proactive learning and suggests preparation with tools like notebooks, pens, and highlighters to follow along and upskill effectively.
50:30 - 64:00: Creating Azure Synapse Workspace The chapter introduces the process of creating an Azure Synapse Workspace. It starts with a lighthearted and enthusiastic introduction, with the speaker expressing excitement about discussing Azure Synapse Analytics. The speaker encourages new viewers to subscribe to the channel for more content.
64:00 - 87:00: Introduction to Synapse Studio The chapter introduces Synapse Studio, encouraging readers to subscribe and join the 'data fam' community led by the speaker. The speaker shares their passion for data engineering, promising enjoyable and insightful learning experiences every Sunday. As a personal note, they mention a new bandana they bought while listening to music.
87:00 - 164:00: Understanding Serverless SQL Pools This chapter discusses the rising demand for Synapse Analytics, emphasizing its popularity in the current market. The introduction hints at an exciting exploration, indicating that the reasons for its demand will be revealed shortly. The narrator also includes a brief personal anecdote about wanting to wear something they saw, adding a bit of personality. Overall, the chapter sets the stage for a deeper dive into why Synapse Analytics is in such demand.
164:00 - 237:00: Working with the OPENROWSET Function The chapter "Working with the OPENROWSET Function" focuses on identifying gaps in the learning process for Synapse Analytics. It highlights that the primary issue is incomplete coverage of Synapse Analytics topics in current educational resources, pointing to a need for more comprehensive training material that covers all relevant topics.
237:00 - 297:00: Creating External Tables The chapter highlights a significant gap in the learning resources available for Azure Synapse Analytics, specifically the lack of guidance on dealing with external storage accounts. Most tutorials focus on using the default storage account and neglect the complexities associated with external ones.
297:00 - 339:00: Understanding CTAS and PolyBase Understanding CTAS (Create Table As Select) and PolyBase involves learning more than just the coding steps. It requires a deep understanding of the configurations and external storage setups in Azure Synapse. The process demands significant effort and comprehension to ensure proper data handling and integration. Merely following procedural steps without grasping the underlying reasons can lead to inefficiencies or errors. The chapter emphasizes the importance of knowing the 'why' behind the actions to ensure accurate data management and utilization.
339:00 - 348:00: Loading Data with the COPY INTO Command The chapter discusses loading data into Synapse Analytics using the COPY INTO command (a hedged COPY INTO sketch follows this chapter list). It emphasizes that working with external storage accounts and an external data lake in Synapse Analytics becomes straightforward when following the videos provided. The speaker clarifies that the existing videos are useful, but this explanation intends to address and fill in the missing gaps.
348:00 - 353:00: Understanding Apache Spark Pools The chapter 'Understanding Apache Spark Pools' aims to cover every aspect of Apache Spark Pools comprehensively. It starts by briefly mentioning serverless SQL pools and dedicated SQL pools, noting that these topics will be elaborated on later. The speaker acknowledges briefly skimming over these terms and apologizes for not delving into them immediately, assuring the listener that all components will be thoroughly explained.
353:00 - 355:00: Conclusion and Further Learning Resources The 'Conclusion and Further Learning Resources' chapter emphasizes the foundational knowledge required for mastering synapse analytics. It assures that all necessary topics, such as external data lakes and external storage accounts, are comprehensively covered. The chapter aims to bolster confidence by suggesting that upon completing the material, learners will feel adept in handling real-time, complex scenarios, equipped with a robust understanding of each aspect discussed.
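Several of the chapters above (03:00 and 237:00 in particular) revolve around database scoped credentials, external data sources, external file formats, and external tables in the serverless SQL pool. As a minimal, hedged sketch of how those objects typically fit together, the T-SQL below creates the supporting objects and then a CETAS (Create External Table As Select) statement over an external data lake; every name here (the storage account, container, paths, credential, and table names) is a hypothetical placeholder, and the authentication choice depends on your environment.

```sql
-- All names here (storage account, container, paths, credential) are hypothetical.
-- Run inside a user database on the serverless SQL pool, not in master.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Use-A-Str0ng-Passw0rd!';  -- needed once per database before creating credentials

CREATE DATABASE SCOPED CREDENTIAL cred_workspace_identity
WITH IDENTITY = 'Managed Identity';  -- assumes the workspace identity has Storage Blob Data Contributor on the external account

CREATE EXTERNAL DATA SOURCE eds_external_lake
WITH (
    LOCATION   = 'https://anshdatalake.dfs.core.windows.net/raw',
    CREDENTIAL = cred_workspace_identity
);

CREATE EXTERNAL FILE FORMAT eff_parquet
WITH (FORMAT_TYPE = PARQUET);

-- CETAS: writes the query result back to the lake and registers an external table over it
CREATE EXTERNAL TABLE dbo.sales_ext
WITH (
    LOCATION    = 'curated/sales/',
    DATA_SOURCE = eds_external_lake,
    FILE_FORMAT = eff_parquet
)
AS
SELECT *
FROM OPENROWSET(
    BULK 'sales/*.parquet',
    DATA_SOURCE = 'eds_external_lake',
    FORMAT = 'PARQUET'
) AS src;
```

With these objects in place, the CETAS statement both materializes the query result back into the data lake and registers an external table over it, which is the pattern the external-storage chapters build toward.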
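The 297:00 and 339:00 chapters mention PolyBase and COPY INTO for loading data from an external data lake into a dedicated SQL pool. Below is a minimal, hedged COPY INTO sketch: the target table, storage URL, and credential are assumptions for illustration, and a PolyBase-style load would instead go through external tables plus CTAS.

```sql
-- Hypothetical target table and storage path; run on a dedicated SQL pool.
-- The target table dbo.sales is assumed to exist already.
COPY INTO dbo.sales
FROM 'https://anshdatalake.blob.core.windows.net/raw/sales/*.parquet'
WITH (
    FILE_TYPE  = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')  -- or a SAS token / storage key, depending on your setup
);
```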
Azure Synapse Analytics Tutorial (From Zero to Pro) | Azure Data Engineering Transcription
00:00 - 00:30 Out of thousands of job applicants, you can be placed as an Azure data engineer after completing this four-hour video, because it gives you strong hands-on experience with the most in-demand core technology for building cloud data warehouses in Azure: Azure Synapse Analytics. In this special course you will learn about Azure Synapse architecture, control node versus compute
00:30 - 01:00 node, Azure Synapse dedicated SQL pools and serverless SQL pools, database scoped credentials, external data sources and external file formats, querying files using OPENROWSET, external tables and views in serverless SQL pools, CTAS versus CETAS, distributed table strategies such as round robin, hash, and replicated, the Apache Spark pool, and loading data from an
01:00 - 01:30 external data lake into a dedicated SQL pool using PolyBase and the COPY INTO command. By the way, I think now you know why this video is made for you. If I were you, I would take out my notebook, pen, pencil, highlighter, and laptop and start learning right away, and I know you're going to do that too, so without delaying further, let's get started with this amazing video and upskill yourself.
01:30 - 02:00 So, another Sunday, another video. Not just a video, an amazing video with Ansh Lamba. Just kidding. Welcome to this brand new video on a topic I love, which is none other than Azure Synapse Analytics. If you are new to this channel, just hit the
02:00 - 02:30 Subscribe button. Why? Because if you want to become part of my data fam, hitting Subscribe is how you join: my data fam and I have fun every Sunday and learn a lot in the world of data engineering, and you can become part of that family too. By the way, how is my bandana looking? I just bought it; I was listening to a song
02:30 - 03:00 and saw a guy wearing one, so I wanted to wear it too. I know it looks good, but you can tell me as well. Okay, so why am I so excited about today's video? The thing is, Azure Synapse Analytics is currently very much in demand. You will know the reason in just a few minutes, but listen to me first: it is very much in demand, and
03:00 - 03:30 the main point is that there are currently a lot of gaps in the learning material for Azure Synapse Analytics. What are those gaps? First of all, if you go to learn Azure Synapse Analytics, not every topic is covered.
03:30 - 04:00 Second, the major gap that I personally faced when I was learning Azure Synapse Analytics: no one tells you how to deal with an external storage account. Everyone focuses on showing you how to work with the default storage account. What's the big deal? It's actually one of the biggest
04:00 - 04:30 deals in Azure Synapse Analytics, because you need to configure that external storage in Synapse, and that takes real effort and understanding. It's not just about learning the code or following the steps; you should know why we are doing this. If you skip those
04:30 - 05:00 things, like attaching and working with an external storage account or an external data lake, Synapse looks very easy, but when you work with an external storage account and you are watching my videos, Azure Synapse Analytics stays very easy. I'm not saying the other videos are bad; they are really good. I'm just trying to fill in the gaps that were missed. Those videos are really
05:00 - 05:30 good, but this video will cover everything, right from the serverless SQL pool to the dedicated SQL pool. Hey, stop: what is a serverless SQL pool, what is a dedicated SQL pool? Okay, I know I haven't talked about these yet; I will in just a few minutes. I was just giving examples, my bad. So we will be learning every component
05:30 - 06:00 required to set a strong base in the world of Synapse Analytics. Trust me, everything will be covered, including external data lakes and external storage accounts, and you will definitely learn a lot. Once you complete this video, you'll feel like you can understand real-time scenarios, work with complex real-time scenarios, work with anything, because you'll know
06:00 - 06:30 everything, including the architecture behind it. There are simple, automated ways to get started with Synapse Analytics, but when you actually build projects in the real world you cannot rely on those shortcuts, so I will explain everything. But do hit the Subscribe button and the bell icon as
06:30 - 07:00 well, and if you want to become part of our lovely data engineering family, hit Subscribe. I don't call them subscribers; they are my data fam. Now it's time to get started with this amazing video, which is Azure
07:00 - 07:30 Synapse Analytics. First question: what are the prerequisites, meaning what should you know before learning Synapse Analytics? A PhD? Just kidding. You just need a laptop or PC with a stable internet connection, otherwise you'll be watching the video buffer. Second, an Azure account; even if you do not have one,
07:30 - 08:00 don't worry, I will show you how to create a free one. Third, the most important thing: excitement to learn Azure Synapse Analytics. Otherwise this video is not for you. You should have that excitement, that drive, to learn Azure Synapse Analytics. If you have all three, let's get started. First things
08:00 - 08:30 first, before starting anything: what is Azure Synapse Analytics? What is this technology, and why is every company talking about it? Why do we see it in every Azure data engineer job? You will get the answer today, why companies put Azure Synapse Analytics in their job postings and why you see it as a requirement in
08:30 - 09:00 every job description. So, this is the Azure Synapse Analytics overview, not the full architecture, and I will explain what it shows. Let me explain it verbally first: Azure Synapse Analytics is a unified solution for data engineering. What does that mean? Let
09:00 - 09:30 me repeat. If you are a data engineer and you want to do everything, and everything means ETL, transformation, data warehousing, data lakes, all of it, then this is a unified, one-stop solution in which you can do all of those things under one umbrella, under one platform, and that platform is Azure Synapse Analytics. That is the simple definition; now let's go deeper. When I
09:30 - 10:00 say this is a one-stop solution for everything, obviously the first thing is data loading, data ingestion. For that it has something called data integration. What does that mean? Azure Synapse Analytics has Azure Data Factory embedded inside it, so you can create the same data pipelines in Azure Synapse Analytics.
10:00 - 10:30 So now you do not need to go to Azure Data Factory to create data pipelines. You still can, but you also have the option in Synapse Analytics, and you can build the pipelines directly there; everything is exactly the same. Second thing, you said you can also transform the data. Yes, we have something called runtimes, which is this Apache Spark part. What does that mean? Have you
10:30 - 11:00 heard about Databricks? Databricks is an application, or you can say a management layer, on top of Apache Spark through which we can use Spark clusters and transform data in the form of Spark notebooks. Similarly, Synapse Analytics has created something called the Spark pool. Don't worry, we will discuss it in detail;
11:00 - 11:30 we are just having an overview right now. In that Spark pool you can create notebooks using PySpark functions, or Scala if you are familiar with it, and do essentially the same things you do in Databricks. Almost the same, not 100%, because some functionality is only available in Databricks, but close enough
11:30 - 12:00 that if you are familiar with Databricks you can easily work with Spark pools, and if you are familiar with Spark pools you can easily work with Databricks. Simple. Third thing, you said we can also build a data warehouse. Yes, that is the prime reason behind Synapse Analytics: we have something called SQL pools, and in a SQL pool we can
12:00 - 12:30 create a data warehouse. How is a data warehouse different from a database? You already know: a database is for transactional records, while a data warehouse is OLAP, online analytical processing, meant for storing large amounts of data, where queries read large amounts of data rather than a few records and updates touch large batches rather than one, two, three, or ten rows.
12:30 - 13:00 By the way, in Azure Synapse Analytics we have the SQL pool, and before this we had something else. You can look it up, because I also didn't know this at first: there used to be a service called Azure SQL Data Warehouse, just like we have Azure SQL Database. We had something called
13:00 - 13:30 Azure SQL Data Warehouse, but it has now been renamed the dedicated SQL pool. What is this dedicated SQL pool? You keep repeating this term. Sorry, we will discuss it. So that was the overview of Synapse Analytics: it is an application where you can do everything, and today we will focus on the data warehousing side, because data warehousing is the prime reason behind Synapse Analytics.
13:30 - 14:00 So today we will be learning every component of data warehousing in Synapse Analytics. If you are already familiar with Data Factory, everything is the same in Synapse Analytics; I will still show you some of the UI so you can see it really is identical. If you do not know Data Factory, here's a gift for you: I have created a dedicated four-hour video on Azure Data
14:00 - 14:30 Factory that covers everything from scratch, so once you complete it you will be well versed with Data Factory. I will attach the link in the description. But hold on: you do not need to know Azure Data Factory to learn Azure Synapse Analytics. You can watch that video after this
14:30 - 15:00 one; just click the link and save it to a bookmark, watch list, or playlist. Okay, I'm obsessed with this bandana, I look cute in it. Jokes apart, now a very important thing: the storage account I'm talking about. The point is that you should use an external storage account.
15:00 - 15:30 When you create Synapse Analytics you also get a storage account, but we should not use that one, because in the real world our data resides in an external storage account, so we will use an external storage account, and once you do that you will be well versed. I am repeating this because it is very important. When I was learning Synapse I understood the concepts, but I wanted to work with an external storage account because my data was in
15:30 - 16:00 an external storage account; why would I upload my data into the default storage account? I knew this concept was important, so I dug deeper and found some documentation on Microsoft's pages. You do not need to go there and read everything; there is a guy, Ansh Lamba, who will explain everything to you. Sorted. Now you
16:00 - 16:30 have a good understanding of Azure Synapse Analytics. What's next? Finally: what is an Azure Synapse SQL pool? Any guesses? What do you think after reading this term? Hey, wait, where is your notebook? If you do not
16:30 - 17:00 have your notebook, pause the video and bring it right now, along with your pens, pencils, highlighters, and some snacks, because food is important too. I'm here, I'm not going anywhere; bring
17:00 - 17:30 everything, then restart the video, because now you need to note every point that I'm saying. Okay, I trust that you have your notebooks ready. What is an Azure Synapse SQL pool? Basically, an Azure Synapse SQL pool is a distributed query engine. For a moment, remove the word distributed: it is just a query engine,
17:30 - 18:00 similar to the workbench you use for MySQL, PostgreSQL, or MS SQL. You can relate it to an application, call it Synapse SQL, similar to MySQL or PostgreSQL, and it runs your SQL queries. Simple, because it is just a query engine. But now we have
18:00 - 18:30 attached the word distributed. Let me write it down for you (and if you have not brought your notebook, we are not talking). A distributed query engine processes your SQL queries in a distributed manner. And what
18:30 - 19:00 does that mean? Say you have one machine. Instead of processing all the data on that one machine, it distributes the work using parallel processing: it creates several smaller machines, distributes the query among them, and processes your data faster. Got it? You do not need to grasp
19:00 - 19:30 everything right now; this is just an overview, you will get the details later, and a blurry picture is fine for the moment. That was the distributed query engine. Now, what are the dedicated SQL pool and the serverless SQL pool? These are the two kinds of SQL pools in Azure Synapse Analytics. This is the dedicated one, and it
19:30 - 20:00 is the one that was renamed from Azure SQL Data Warehouse; the serverless SQL pool is the newer one. What is the difference? The serverless pool is the one in demand right now because of the lakehouse concept, because in the dedicated SQL pool you have to load the data into traditional databases, with structured data stored in physical databases,
20:00 - 20:30 but in the serverless SQL pool you do not do that; you just work with files, with the data stored in formats like CSV and Parquet. I will explain this in detail, don't worry. Before that, let me show you something more important, because I think it is the base for everything, and then
20:30 - 21:00 you will get the concept of distributed parallel processing, the distributed query engine, and how it works. So, what is the Azure Synapse SQL architecture? This is really important, and it is not hard; it will show you exactly what I mean by a distributed query engine. For that I will take you to the Microsoft documentation page, not to read the text but to see the visual, because the visual there is very
21:00 - 21:30 nice, and rather than copying it here it is better to go there and explain everything. So this is the architecture I was talking about. You do not need to read everything; let's just focus on this diagram and zoom in. By the way, both pools are the same as far as the
21:30 - 22:00 distributed query engine goes. Behind the scenes, the distributed query engine consists of two things: the control node and the compute nodes. As you can see, this is your control node and these are your compute nodes. If you are familiar with Spark, this is similar to the driver program and worker nodes; if you are not familiar with Spark,
22:00 - 22:30 just ignore what I said, you do not need to learn Spark. Basically, we have a group of machines: a distributed query engine means multiple machines instead of a single machine. When you run PostgreSQL, MS SQL Server, or MySQL, you are running everything on one single machine, your PC. But when you use the cloud, meaning Azure
22:30 - 23:00 Synapse Analytics, there are many machines. How many? It varies from 1 to 60, and I will come back to that. One machine is called the control node, and as the name suggests it controls everything; the control node is the brain of the whole group of machines. So
23:00 - 23:30 whatever query we send to Synapse goes to the control node, which then breaks the work up and shares it with the compute nodes. Why? Because the compute nodes are your actual workers, processing the query, accessing the data, and giving
23:30 - 24:00 the results back to you. The control node is not doing the heavy lifting; it just hands out tasks and information to the compute nodes. That is why the diagram shows your query or application going to the control node and then to the MPP engine. MPP means massively parallel processing, in which the data is queried in parallel, because
24:00 - 24:30 that obviously speeds things up. Let's take an example: you need to complete your homework, and you alone have to write 100 pages. That will take a lot of time, right? But now suppose you have a team, you plus nine friends, so ten people, and you
24:30 - 25:00 have to write those 100 pages. How long will it take? Roughly one tenth of the time: if one person would take 10 hours, ten people will take about one hour (I first said 1.5 hours, assuming some of your friends write slowly, just kidding). That is the power of parallel processing: you can process the data in parallel, and that's
25:00 - 25:30 where the dedicated SQL pool and serverless SQL pool come in: we can process the data in parallel. So these are the compute nodes, they actually process the data, and they get your data from Azure Storage. Now, what is the difference between the dedicated and serverless SQL pools in terms of architecture? Not much, because both work on the same concept, parallel processing. I am
25:30 - 26:00 not yet describing how they function differently, just the massively parallel processing (MPP) architecture, so do not get confused; I will cover that part as well. As a distributed query engine both work the same way: there is one control node and many compute nodes, your query goes to the control node, and multiple compute nodes do the work. Let's
26:00 - 26:30 minimize the diagram and read what is written about the control node. As I said, it is the brain of the architecture; it interacts with the applications or users, while the compute nodes are the real workers that provide the computational power, which you obviously need to process data. The number of compute nodes can be between 1 and 60, so if you are using all 60 compute
26:30 - 27:00 nodes, whatever data you are querying will be distributed among 60 machines; you can imagine the speed. And how do we define the computational power? It depends on Data Warehouse Units (DWU); the documentation confirms that for a dedicated SQL pool the unit of scale is the data warehouse unit, so you define the compute power with DWUs.
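The course overview at the top of the transcript lists round robin, hash, and replicated as the distributed table strategies that spread rows across these compute resources in a dedicated SQL pool. As a hedged illustration only (table names, columns, and strategy choices below are hypothetical), creating a table lets you pick one of those strategies explicitly:

```sql
-- Hypothetical tables and columns; dedicated SQL pool syntax.
CREATE TABLE dbo.fact_sales
(
    sale_id     INT,
    customer_id INT,
    amount      DECIMAL(18, 2)
)
WITH (DISTRIBUTION = HASH(customer_id), CLUSTERED COLUMNSTORE INDEX);  -- same key always lands on the same distribution

CREATE TABLE dbo.stg_sales
(
    sale_id     INT,
    customer_id INT,
    amount      DECIMAL(18, 2)
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);  -- rows spread evenly; a common choice for staging loads

CREATE TABLE dbo.dim_region
(
    region_id   INT,
    region_name VARCHAR(50)
)
WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX);  -- full copy cached on each compute node; suits small dimension tables
```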
27:00 - 27:30 That was the architecture. Now let's see the actual difference. In a dedicated SQL pool you need to pay for
27:30 - 28:00 the storage and for the computation, even if you are not querying anything. If you have data stored in your database, you have to pay for it. Say you have a dedicated SQL database: even if you are not using it, for example you created the database and did not run a single query for a week, you still need
28:00 - 28:30 to pay for that week, because your data is stored there and managed by the dedicated SQL pool. And, as I said, you have to store the data physically in this database, which is why it is expensive. But with the serverless SQL pool, say you create a database but this time you use the serverless SQL
28:30 - 29:00 pool: you do not pay for storing the data, because the data stays in the data lake. Your data is not physically stored in the serverless SQL pool; it queries the data directly on the data lake, so you only pay for the data that you are
29:00 - 29:30 processing. Let me repeat: you only pay for the queries, for the data you process, and if you are not running any queries you will not be charged. Understood? Let me give you a hint of how it works (we will cover the serverless SQL pool in detail later): this is your data lake, your data is stored here, and the pool queries that data directly.
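As a hedged sketch of what "querying the data directly on the data lake" looks like in a serverless SQL pool, the snippet below reads Parquet files in place with OPENROWSET and then wraps them in a view; the storage URL and path are hypothetical, and how the pool authenticates (Microsoft Entra pass-through or a credential) depends on your setup.

```sql
-- Hypothetical storage URL and path; the serverless pool bills only for the data a query actually scans.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://anshdatalake.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;
GO  -- batch separator: CREATE VIEW must start its own batch

-- Optionally expose the files as a view (in a user database) so they can be queried like a table
CREATE VIEW dbo.vw_sales
AS
SELECT *
FROM OPENROWSET(
    BULK 'https://anshdatalake.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;
```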
29:30 - 30:00 How it queries the data, I will explain later, don't worry. Next thing: you already know the Azure Synapse SQL architecture, so what is the Spark pool? For the Spark pool, let's go to the documentation, but before that, let me give you an overview. Basically,
30:00 - 30:30 have you ever used Databricks? (I know I already asked.) Whenever you want to work with Apache Spark, say to run or build notebooks using PySpark, you have to use Spark clusters, but you do not need to manage those clusters yourself. Who manages them? In the case of Synapse Analytics, Synapse manages it, and
30:30 - 31:00 within Synapse the Spark pool manages it, and it is autoscaled. What does that mean? If you suddenly process a large amount of data, it scales automatically. Simple. Let me show you the architecture; there is a very nice diagram in the Spark pool documentation describing how the driver
31:00 - 31:30 node and the worker nodes work. This is about Spark itself; you can read it, or I can explain it in a few seconds. In Spark we have the equivalent of a control node, but we call it the driver program or driver node. Again the same concept: the driver node is the brain, and the compute nodes are called worker nodes. If you want to know the
31:30 - 32:00 architecture, it works like this: you submit your application. Let me draw it: suppose this is you, and you submit your application to the driver program (I am relating it to Synapse so you can follow along). The driver node sends this information
32:00 - 32:30 to the resource manager, and in the case of the Spark pool it uses YARN. YARN allocates the resources, which are worker nodes; say it requires three worker nodes, so it allocates those three resources to the driver program, and then the same pattern applies: just as the control node communicates with the compute
32:30 - 33:00 nodes, the driver node communicates with the worker nodes. Different names, same concept. YARN is the resource manager; it allocates the resources and then essentially tells the driver and workers to talk to each other. So this is the architecture; it relates to Spark generally, and I know it is a bit of a tangent, but it is my duty to share the knowledge. When we talk about the Spark pool, this is again
33:00 - 33:30 Spark: the page compares MapReduce with Apache Spark, which is straightforward. So why use the Spark pool? Speed and efficiency, and one more thing: if you want to create a cluster, a group of machines, it is created in about 2 minutes if you have fewer than 60 nodes (nodes meaning machines) and about 5 minutes if you have
33:30 - 34:00 more than 60. There is also ease of creation: you do not need to manage any cluster, you can directly start working with Spark through the Spark pool. Then there is REST API support and support for Azure data storage, which is the external data storage point I was talking about, and we will see how you can use that. So this was all about the Spark pool. Now, I can read your mind: I know you are
34:00 - 34:30 saying that enough information has been provided and asking whether we can start building our project. Which project? It is the hands-on work we are doing for Synapse Analytics, not a full project as such, though learning Synapse is itself a project. I know you want to actually start with Synapse Analytics, but I think these
34:30 - 35:00 fundamentals are really important, and I hope you are making notes, because these are the concepts. Now that your fundamentals are clear you can relate them to the practical activities you will be doing; without them you could not relate to those practical scenarios. So first we will create our free Azure account, because that is required, and once we do that we will start with Synapse. Let's get started.
35:00 - 35:30 You can go to an incognito window or just use your default browser tab, it is up to you. Go there, type "Azure free account", and hit Enter, then click on the very first link. On the main page click "Try Azure for free"; do not click "Pay as you go", because that is a paid account where you pay for the services you use. I personally
35:30 - 36:00 use pay-as-you-go because I create many projects and keep them around for a long time, but you can click "Try Azure for free"; I also started with the free account and upgraded later as I kept learning. It is your choice: if you are learning, spending some money on your education is reasonable, since you have already spent thousands of dollars on education, and this is education you will actually use.
36:00 - 36:30 Don't worry, I am not pushing you to pay for your usage; I am just saying it is your choice, and if you do pay, treat it as investing in your career and your future rather than a burden, especially compared with what you have already paid for things like web series platforms. If you do
36:30 - 37:00 not have a Microsoft account, just click "Create one" and set it up quickly; you can also use that Microsoft account for email and you get some free storage. Put your credentials in; I already have mine, so I will sign in and show you the next page. I have entered my credentials and the screen is loading, which should take just a few seconds.
37:00 - 37:30 This is a very important screen, because here you fill in all your information: name, email address, where you live, and so on. As you can see, you get popular services free for 12 months, some services that are always free, and although Synapse Analytics is a paid service, you get $200 of credit to spend in the first 30 days, which I think is enough to complete this
37:30 - 38:00 learning video with some credit to spare. What happens after 30 days? Azure will ask you to upgrade your account, and if you ignore that email it will simply shut down your services; it will not charge you at all, so do not worry. Azure is not trying to take money from you; it is promoting its services so that you can
38:00 - 38:30 learn and use them. One important step: after clicking sign up you will be asked to provide card details. This is not for billing; it is just for Azure to confirm that you are the one who will be using the services, and you will not get any bill unless you upgrade to pay-as-you-go. So create your Azure free account and enjoy it for 30 days, and if you want to
38:30 - 39:00 upgrade it you can; if not, forget about it. So that was the Azure account. Now it is time to actually start working with the Azure portal, since we have the account. Let me show you the portal. To go to
39:00 - 39:30 your Azure portal account you can simply type portal.azure.com (let me zoom in for you); it will take you to the portal, just hit Enter. This is your Azure portal account. Welcome! It looks so good; I still remember the first time I landed on this portal.
39:30 - 40:00 I love this UI. Don't worry if you are not seeing the same services in your portal, or if you are seeing different ones; it shows the services you used recently, so it is fine if nothing shows there yet. Just to give you an overview, these are some of the
40:00 - 40:30 functions you can use. We are not going deep into Azure itself, we are learning Synapse Analytics, and honestly there is not much more to it: you can click around the different icons and get to know which service does what. That does not mean I will not explain the concepts in detail; we will discuss everything. So first, let's start with the first thing: what is a resource
40:30 - 41:00 group? If you are familiar with Azure you already know this; if not, do not worry. A resource group is basically a folder. Just as you put documents inside folders on your C, D, or E drive, in Azure we put resources inside a resource group. So the folders are the resource groups, and the services and
41:00 - 41:30 applications that we use are called resources. For example, we know we want to create an Azure Synapse Analytics service: Azure Synapse Analytics is a resource, and it will go inside this resource group. We will also create one more service, a Data Lake storage account. But in Azure there is no feature literally called "Data Lake": we have something called a storage
41:30 - 42:00 account, which creates blob storage, and if you want a data lake from that storage account you enable one special feature and it creates the data lake for you; I will show you. So first of all, let's create a resource group. Simply go to the search bar, type "resource group", and click on the suggestion (don't worry, the ones you see are resource groups I have already created), then click "+ Create". Let me just
42:00 - 42:30 show you what I did: go to Resource groups, click "+ Create", and then you can name it anything; I will name it RG (for resource group), then synapse, then course. That is it; click "Review + create", it validates, and then
42:30 - 43:00 click "Create" one more time and your resource group is created. You can click "Go to resource group" and open it; it is empty, obviously, because we have not created anything yet. First we will create the data lake. To create the data lake inside this resource group, click the "Create" button, which takes you to the Marketplace.
43:00 - 43:30 What is that? If you have used Marketplace on Facebook, it is similar: different brands and service providers sell their services there. We are not interested in those; we just want services by Microsoft. So simply search for "storage account" and click on the
43:30 - 44:00 suggestion. As you can see there are multiple options, but we need the one provided by Microsoft: Storage account. Click on it and click "Create". Now it asks you to complete the configuration, and we will do it together. First, the resource group is already filled in, because we started from the "Create" button inside the resource group; if you had not created one, it would ask you to
44:00 - 44:30 create a new one via the "Create new" button, but we already have RG synapse course. Next, the storage account name: you have to enter a unique name, and the name I use here cannot be reused by you, because a storage account name must be unique across the whole of Azure, not just between you and me but worldwide. So
44:30 - 45:00 let's say I try the name "datalakesynapse"; it says this is not available, someone has already taken it. Fine, we can change it and try "datalakeansh". That is also not available,
45:00 - 45:30 probably because I created it in a previous project, so I will try "anshdatalake". That one is not taken. You can use your own name as a prefix and then "datalake"; it is simple and easy to find as well. The "primary service" option we do not need to set, because we will pick the data lake capability after this step. Now, this next step is very
45:30 - 46:00 important: the data redundancy options for your data lake. We have four options: LRS, ZRS, GRS, and GZRS. I will simply pick LRS. Why? Because LRS is the cheapest option: it keeps a duplicate copy of your data within the same data center. The most expensive one
46:00 - 46:30 is GZRS, which keeps a duplicate copy in another region. So simply pick LRS, the cheapest one; you will not be charged here anyway, but in a company this is how you save money and keep your manager happy. Now the second step: click "Next", scroll down, and you will see an important feature called "Enable
46:30 - 47:00 hierarchical namespace". This is the feature we need to enable, and when we enable it, it creates an Azure data lake instead of Azure blob storage. What is the difference? In both cases you have containers; say one is a data lake container and the other a blob
47:00 - 47:30 container. Within a data lake container you can create folders hierarchically, but in a blob container you cannot. That is the difference, and we want a data lake so that we can store folders within the containers. Now simply click "Review + create"; once validation is done, click "Create" and it will create your storage account.
47:30 - 48:00 It is in progress; it will take a few more seconds to deploy your storage account, which we can also call the data lake. After this we will create the Synapse resource, and then you will learn everything practically and in detail. You will learn a lot today; I know you
48:00 - 48:30 miss me on Sundays. See, it is deployed. You can click "Go to resource", or click the Home button, then Resource groups, and you will see all your resource groups (in your case it will be just one). Let me find mine... here it is. It is no longer empty, because now we have a data lake called anshdatalake. Now it is time to create the Synapse resource as well.
48:30 - 49:00 To create the Synapse workspace, simply click the "Create" button, search for "synapse", click on the suggestion offered by Microsoft, and click "Create". Now you might ask: what is this managed resource group? We just created a resource group and it is already filled in, so why do we need another one,
49:00 - 49:30 and why is it called a managed resource group? The thing is, this resource group will be used by Synapse itself, because it keeps all the resources managed by Synapse only, such as a storage account. Remember the default storage account assigned to Synapse that everyone talks about uploading data to? It is very easy to use precisely because it is managed by Synapse and auto-synced with
49:30 - 50:00 the Synapse workspace. What is its use case? When we create, say, managed tables, the underlying data gets stored inside that storage account, and you can access it directly because Synapse manages it. That is exactly why I created a separate, dedicated data lake: so you learn how to assign roles and grant the right permissions, and then access that data lake, not the
50:00 - 50:30 one managed by Synapse. So yes, we define the managed resource group, and even if you do not, that is fine: Synapse will give it a random name and create it. We should not touch that resource group; it is for Synapse, not for us, so I will leave it as is. Now the workspace name: I will pick a name with "synapse workspace ansh"; you do not have to include ansh, only
50:30 - 51:00 if you love me; if not, it is fine. For the region I will pick, let's say, Central US. Now, what is this account name? If you click the info button it says this account will be the primary storage account for the workspace, holding catalog metadata for the workspace. This is your default storage account, and it is
51:00 - 51:30 suggested that you always create a new storage account rather than assigning a pre-existing one to the workspace, so I will click "Create new". I am repeating: we are creating these storage accounts for Synapse so that it can manage our catalog data and metadata, but our real data will reside in the external data lake we just created. That is why I was saying
51:30 - 52:00 there are gaps that have not been filled elsewhere, and we are filling them now. So I will click "Create new" and name it something like "managedstoragesynapseansh", click OK, and for the file system name click "Create new" and similarly call it "managedfilesystemsynapseansh". I am not worrying about the
52:00 - 52:30 naming convention, because I know I am not going to use this at all; I will be using my own storage account. If I show you the note here, it reads: we will automatically grant the workspace identity data access to the specified Data Lake Storage account, using the Storage Blob Data Contributor role. This is the thing we would have to do
52:30 - 53:00 manually, but because Synapse manages this account it assigns everything itself, all three things: assign other users the Contributor role on the workspace, assign other users Synapse RBAC roles using Synapse Studio, and assign yourself and others the Storage Blob Data Contributor role on the storage account. Synapse wants to manage it, so we do not need to, but when we work with external data we have to do these steps ourselves, and I will show you how,
53:00 - 53:30 because in Synapse we work with external data. Next, security: you need to define your admin username and password, just as you would when setting up MySQL or PostgreSQL, because we will be using databases and these credentials.
53:30 - 54:00 so I will simply say SQL admin user okay let's let's have it or let me just choose admin only it is easy to remember oops admin okay your name login must not SQL identifier oh man come on man let me just pick unch okay simple oops unch okay and then I can just pick a password
54:00 - 54:30 anything can just pick any password you can just put your name your friend's name your girlfriend's name boyfriend's name any anything I have heard like people just put their girlfriend boyfriend's name in the password really bro really come on man grow up what are you doing then click on next networking and then click on on next this is your tag
54:30 - 55:00 Then, as you can see, it shows the serverless SQL pool estimate: around $5 (US) per TB. As I said, the serverless SQL pool charges based on the amount of data you process — process 1 TB and it costs roughly $5. You obviously won't be processing a full terabyte here, and it's free for you anyway if you're on a free Azure account, but it gives you an idea of what to expect.
55:00 - 55:30 Then just click Create and it will create your workspace. Hmm — why did it fail? Let me check whether I missed something. I'll click Redeploy and walk through it again: resource group — select our resource group again, the same one;
55:30 - 56:00 same region; managed virtual network left off; virtual network on the default. What did I miss? Let me click Review + create again and see —
56:00 - 56:30 and click Create. Now it is deploying again; maybe a checkbox in the networking step was the culprit. It says the deployment is in progress, so it should take a minute or two to deploy the Synapse workspace, and once it's deployed we can start working with it. I'll grab some water in the meantime and see
56:30 - 57:00 you soon. As you can see, our Synapse workspace has deployed successfully. Let's open it: go to the Home tab, click into the resource group for this course, and you will see the managed storage account — the one Synapse manages. If you click on it and open Containers, you'll see there is already
57:00 - 57:30 a container there. Remember what that is? It's the file system you named during the Synapse deployment — managedfilesystemsynapseansh. If you open it, it's empty, because we haven't put anything inside it yet. Now back to the resource group. The thing is, if you put data inside this managed storage account, it is very easy
57:30 - 58:00 to work with — apologies for the brief mic trouble there. So: this is the managed storage account, and anything you put inside it you can easily access
58:00 - 58:30 from Synapse, because this storage account is already synced with your Synapse workspace. But the workspace is not synced with your external data lake — your external storage account — and that external account is where the main data goes. If you are working in an organization, or building
58:30 - 59:00 any real project, the external data lake is where the actual data lives, not this Synapse-managed location. If you are only learning, it's very easy to work with the default storage account, and everyone knows how to use Synapse that way. But if you want to be competitive enough to crack
59:00 - 59:30 interviews, to stand out, and to work efficiently in your organization, you need to know how to work with an external storage account — there's no way around it. And now you will. So this is my workspace; let me click on it and click Open. This is your Synapse Studio workspace. If this
59:30 - 60:00 is your first Synapse workspace and you already know Azure Data Factory, you'll notice the UI is almost identical. If you're not familiar with Data Factory, no worries — once you know the Synapse workspace you'll feel at home in the Data Factory interface too. There are a few differences, and I'll point them out, but these
60:00 - 60:30 core pieces are the same: the pipeline/Integrate tab, the Monitor tab, and the Manage tab. Let me expand the left-hand menu so you can read the names. First there's the Home tab, where we are right now; then the Data tab and the Develop tab — ignore those for now; then the Integrate tab. Remember we talked
60:30 - 61:00 about Azure Data Factory and the fact that you get ETL functionality inside Synapse Analytics? Click the Integrate tab, click the plus button, and choose Pipeline — and if you know Data Factory you'll immediately say, "this is exactly the same." If you're not familiar with Data Factory, don't
61:00 - 61:30 worry: these are simply the activities we use in Data Factory, and they are 100% the same — the same linked services, datasets, activities, configurations, parameters, variables, scheduling, and monitoring. That's why we are not covering this part here: we'll focus on the core of Synapse Analytics, since
61:30 - 62:00 this piece is really just Azure Data Factory. You can watch my Data Factory tutorial for that — the link is in the description; it's a four-to-five-hour video that covers everything from scratch, including real-time scenarios, so it's a one-stop resource for learning Azure Data Factory. That was the Integrate hub, and now you know that Synapse gives you ETL/ELT functionality as well within
62:00 - 62:30 Synapse Analytics. Let's close these tabs — I'll just refresh, since I didn't save anything, and everything will reset. Now, the Monitor tab: once you create pipelines you can monitor them here — all your pipeline runs, trigger runs, and the logs for those
62:30 - 63:00 pipelines live in this tab. As you can see it also lists SQL pools and Apache Spark pools (we haven't created any yet — we will), plus trigger runs, integration runtimes, Apache Spark applications, and data flow debug sessions. Then there's the Manage tab, where we configure Git, create triggers, and set up linked service connections. You can see Linked services, Microsoft Purview (a data governance tool),
63:00 - 63:30 triggers, integration runtimes, and so on. You'll also notice a "Built-in" serverless SQL pool already listed. We never created a serverless SQL pool, so why is it there? The answer is that we don't create it — it is provisioned automatically with the Synapse workspace, because any query you run through the serverless SQL pool, or
63:30 - 64:00 any database you create in it, uses that built-in pool, which autoscales and charges you only for the queries it runs — not for the data "inside" the database, because the data doesn't actually live there. We'll explain all of that properly later; I'm only giving an overview now because you can see it on screen and you
64:00 - 64:30 should naturally be curious about it — that's the answer to why it appears on the home page. Now it's finally time to click the Data tab. When I click it, it shows two sections: Workspace and Linked. Under Workspace, the plus button lets me create a SQL database,
64:30 - 65:00 and there are two options for it: serverless SQL pool or dedicated SQL pool — we'll use both shortly. The next option is a Lake database. Why is there a different kind of database alongside the SQL database? If you are building a lakehouse, it's called a Lake database in Synapse terminology — it really is a lakehouse — and if you create a database using PySpark, using Spark, you will be
65:00 - 65:30 creating a Lake database; it is different from a SQL database. Don't worry, we'll explain it in detail. Then there are options for integration datasets, connecting to external data, and browsing the gallery — we'll discuss all of those too. And what is the Linked section? Let's talk about linked services. Say this is your source,
65:30 - 66:00 this is your Synapse workspace, and this is your destination. To talk to the source — to read data from it or write data to it — you need a
66:00 - 66:30 connection. In Synapse we just use a fancier word for that connection: a linked service. Whenever I create a linked service to my external storage account (or anything else), it will show up here under Linked. If I expand what's already there, I can see my Synapse workspace name and "primary" —
66:30 - 67:00 the managed storage account we saw in the resource group. If I click on it, I can see its container. Why is it visible here? Because Synapse automatically created the connection between the workspace and that storage account — this is the default storage account. But we will not use it — not at all, because that is not
67:00 - 67:30 how it's done in practice. We have to learn how to establish the connection to an external storage account — an external data lake — because in the real world we work with a medallion architecture. What is that? It's the pattern we follow to build end-to-end data engineering solutions, with three layers: raw,
67:30 - 68:00 then silver, then gold. We sometimes use a different storage account for each layer — you can't always cram everything into one storage account — and that's exactly why, in the real world, we work with external data lakes. And
68:00 - 68:30 today I'll show you how to do that, don't worry. So now we'll create our first database, and we're going to start with the dedicated SQL pool, because it will feel familiar — it works much like a normal MySQL, PostgreSQL,
68:30 - 69:00 or MS SQL Server database. After that we'll move to the serverless SQL pool, which is very much in demand right now because it works directly with files, and I'll explain exactly how things happen behind the scenes and how the architecture works. Before that, let's quickly cover the dedicated SQL pool, where we have something special:
69:00 - 69:30 distributed tables. What are distributed tables? We already know that whatever data we store and process in a dedicated SQL pool is handled by a distributed set of machines using massively parallel processing — remember the architecture, with its separate compute nodes. So when we create a table, we are
69:30 - 70:00 not creating a simple table; we are creating a table whose records are distributed among compute nodes. What does that mean? Say this is your data table, and this is
70:00 - 70:30 your database — the one you created under the dedicated SQL pool. When you write the data into this database, it shows you a single table, and that's what you query. But behind the scenes, the data
70:30 - 71:00 inside that table is stored across compute nodes — different distributed machines; your data is spread across 60 distributions. We also get the power to define the distribution strategy, and there are three different strategies we
71:00 - 71:30 can pick from, each suited to a particular use case. We have to pick the strategy according to the requirement, and I'll tell you which one to pick in which scenario. Let's walk through all three distribution strategies available in the dedicated SQL pool. I'm referring to the Microsoft documentation here for the visuals, which
71:30 - 72:00 make the idea very easy to grasp — you don't need to read all the text; I'll explain everything. As the documentation itself puts it, a distributed table appears as a single table, but the rows are actually stored across 60
72:00 - 72:30 distributions. Now let's discuss the strategies — there are three in total. The first is hash distribution. Note this down: a common interview question is which strategy you would use, because if you have
72:30 - 73:00 actually built projects, you should know which of the three options you would pick. First strategy: hash distribution. You should pick this for large tables — and in a data model, the largest table is the fact table. So whenever you create a fact table, with numerical
73:00 - 73:30 measures and a huge number of records, you should generally pick hash distribution. How does it work? Say this is your table, and these are the distributions where the data will be stored. Hashing groups identical values together, which makes reads faster. The documentation says it plainly: identical values always hash to the same distribution. So suppose you
73:30 - 74:00 have a key column such as ID: you must define which column to hash on, and if you hash on ID, all rows with the same ID value land in the same distribution, so queries that filter or join on ID can find them quickly. That is hash distribution; a toy sketch of the idea follows below.
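To make that concrete, here is a toy T-SQL illustration — not Synapse's internal hash function, just the "same value, same bucket" idea; the sample values and the 60-bucket modulo are invented for the example.

```sql
-- Toy illustration only: identical id values always map to the same bucket,
-- which is the property hash distribution relies on. Synapse uses its own
-- internal hash function over 60 distributions; this just mimics the idea.
SELECT id,
       ABS(CHECKSUM(id)) % 60 AS distribution_bucket
FROM (VALUES (101), (102), (101), (103), (102)) AS t(id);
-- Every row with id = 101 lands in the same bucket, so a query that filters
-- or joins on id can be answered without shuffling data between distributions.
```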
74:00 - 74:30 The second strategy, which is very popular, is round robin distribution. What does it do? First of all, it is used for staging tables — tables where you don't really need to query the data,
74:30 - 75:00 you just want to write it as fast as possible. Imagine I let you design the distribution yourself: you have a handful of boxes, records keep arriving one at a time, and the rule is that you have to use every
75:00 - 75:30 box, store every record, and do it as fast as you can. What would you do? Let me draw it so it's easier to follow. Say this is you, and you
75:30 - 76:00 have three partitions — three distributions; forget 60 for a moment, just take three. You are the one storing the records. You take the first record and put it in the first box — done;
76:00 - 76:30 you have to use all the boxes, so the second record goes in the second box, the third in the third, the fourth back in the first, the fifth in the second, the sixth in the third. You just cycle through the distributions one by one, without thinking much — you don't
76:30 - 77:00 apply any logic to group similar values together; you simply deal records out in turn — first box, second box, third box, and back around — because you only have three boxes. The same thing happens with round robin distribution: it puts the data in the first distribution, then the second, third, fourth, and so on up to 60 —
77:00 - 77:30 the 60th record goes to the 60th distribution and the 61st wraps back to the first. That's why it is so convenient for staging tables: in a staging table we don't really want to read the data, we just want to dump it in as fast as possible. A toy sketch of that one-by-one assignment follows below.
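As a toy sketch of that one-by-one dealing of rows (again, not Synapse internals — just the idea, with three buckets instead of 60):

```sql
-- Toy illustration only: round robin deals rows out to buckets in turn,
-- ignoring the values, so writes are fast but related rows end up scattered.
SELECT row_id,
       (row_id - 1) % 3 + 1 AS bucket   -- 1, 2, 3, 1, 2, 3, ...
FROM (VALUES (1), (2), (3), (4), (5), (6)) AS t(row_id);
```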
77:30 - 78:00 Now the third strategy: replicated. Let me search for it in the documentation — here it is; it doesn't get its own diagram, but the idea is straightforward. What happens with replicated distribution?
78:00 - 78:30 We use replicated distribution when we are working with small dimension tables. In that scenario, a full copy of the data is saved to every distribution — that's why it's called replicated: the table is replicated to every compute node. Let me draw it for
78:30 - 79:00 you. Forget the hash picture for a moment: this is your table, and it's a very small dimension table. Synapse stores an exact copy of it on each compute node, in each distribution. Remember, it should be small enough to fit comfortably in a single node, so you can easily
79:00 - 79:30 match its values locally, because you'll be joining it to the fact table: the fact data is spread across many distributions, and with the dimension replicated to each node the join can happen locally wherever it's needed. This comes up both in practical scenarios and in interviews, so it's important. Now we know all
79:30 - 80:00 three distribution modes, so now I'll show you how to actually create these distributed tables. We're in the Synapse workspace. To create a table we first need a database, so go to the Data tab, click the plus button, and choose SQL database. It will then ask
80:00 - 80:30 which type of pool you want — serverless or dedicated. We want dedicated, so click Dedicated, and it tells us we haven't created any dedicated SQL pool yet. Time to create one: either click Manage pools, or go to the Manage tab, open SQL pools, and click + New. First, a warning:
80:30 - 81:00 dedicated SQL pools are very expensive, so I'm setting the performance level to the minimum. At the default level the estimated cost is about $15 per hour — yes, that expensive — so keep it minimal while you're learning; at work the company can pay for more. Even at the minimum it is
81:00 - 81:30 still around 1.5 USD per hour, but that's fine for this demo. Even on a free account I wouldn't recommend maxing out the capacity — DW100c is enough — and we'll cover this quickly and then pause the SQL pool so we don't keep getting charged. For the dedicated SQL pool name I'll use something like
81:30 - 82:00 "anshdedicatedpool". Under Additional settings you'll see the collation. Do you know what a collation is? A collation is the set of rules that tells the engine how character data should be sorted and compared — in effect, how it is read. We should make sure the collation is
82:00 - 82:30 used consistently throughout the database: if you pick one collation, stick with it across your tables (the database collation can be changed, but it's disruptive). These are usually DBA-managed settings, but you should still know about them; a quick way to check what collation a database has is sketched below.
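As a minimal sketch (the pool/database name here, anshdedicatedpool, is just the placeholder used in this demo — substitute your own):

```sql
-- List the collation of every database visible on the pool
SELECT name, collation_name
FROM sys.databases;

-- Or check a single database (replace the placeholder name with yours)
SELECT DATABASEPROPERTYEX('anshdedicatedpool', 'Collation') AS collation_name;
```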
82:30 - 83:00 Then click Review + create and create it. You can see it deploying the dedicated SQL pool; it takes a few minutes, and once it's deployed we can create a database inside it and then create all three kinds of tables, so you can see how distributed tables actually get created. After that we'll jump to the serverless SQL pool, because there is one very important dedicated-pool topic — loading data from the data lake into Synapse — that needs
83:00 - 83:30 quite a few concepts first. So we'll learn the serverless SQL pool and then come back to that topic. You may have heard of PolyBase and the COPY INTO command; we'll cover those at the end. For now we're covering distributed tables in the dedicated SQL pool, because PolyBase and COPY INTO
83:30 - 84:00 become very easy once you know the serverless SQL pool concepts — they're almost a byproduct of learning it. The serverless SQL pool is very important, and it's the skill everyone is demanding right now because it works directly with files. I'll show you what I mean by that, and every permission you need to actually work with data sitting in the data lake. While this is being
84:00 - 84:30 deployed I'll grab something to eat. ... And our dedicated SQL pool is ready — you can see the status is Online, which also means it's now billing. Don't worry, I'll cover the cost for this demo so you can learn.
84:30 - 85:00 One more thing: you can pause the pool whenever you're not using it. Since you're learning in pieces, create it, pause it between sessions, and it will charge very little. If you're on a free account you don't need to worry much anyway; this mainly matters for pay-as-you-go accounts. Now let's go to the Data tab, where we should see the SQL database
85:00 - 85:30 that appeared when we created the dedicated SQL pool — refresh if you don't see it. Expand the dedicated pool and you'll find Tables, External tables, and External resources, all empty, because we haven't created anything yet. Time to create the
85:30 - 86:00 tables. Click the three dots next to the database and choose New SQL script, then Empty script — if we want to do anything in SQL we need a script. First, let's cover one small but important thing: you need to name
86:00 - 86:30 your script. In real projects it's important to give scripts meaningful names so you can navigate to them easily, and every name should be self-explanatory — anyone looking at it should know what the script is for. "Tutorial" wouldn't be a good name here;
86:30 - 87:00 since we're creating distributed tables, I'll simply call it "distributed tables". If I click the Properties icon, that panel isn't closed, just hidden — click the icon again to bring it back. I'll hide it so I have more screen space.
87:00 - 87:30 You can see the script is connected to our dedicated SQL pool. The "Use database" dropdown lets you switch between pools if you have more than one. And what is this "master" database — did someone create a pool behind our back? No,
87:30 - 88:00 it is created automatically by Synapse; we don't use the master database, we create our own. Now the database is ready, and the first thing we'll do is create a table. I'll add a comment first — we'll start with a round robin table, and we now have a good
88:00 - 88:30 understanding of how distributed tables work, so it's time to create one. I'm adding comments as I go — you write comments with a double dash. Then: CREATE TABLE, and for the table name I'll use, say, round_table;
88:30 - 89:00 then I define the columns — id as int, name as varchar(4000), salary as int; it's just a hypothetical table. Then we need the WITH clause,
89:00 - 89:30 where we define the strategy: DISTRIBUTION = ROUND_ROBIN. That's your round robin table. Let's click
89:30 - 90:00 Run — oops, I mistyped CREATE, and ROUND_ROBIN needs an underscore. It happens. That's the syntax: with DISTRIBUTION = ROUND_ROBIN defined, it will
90:00 - 90:30 spread the rows across the 60 distributions one by one — we already know the architecture. Click Run. Now let's insert a sample row so we have something to query: INSERT INTO round_table followed by
90:30 - 91:00 VALUES and a random row. Then query it back with SELECT * FROM round_table, and it returns the result. The full working script from this segment is sketched below.
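Putting the pieces from this segment together, the working script looks roughly like this (the table and column names are the throwaway ones from the demo):

```sql
-- Round-robin distributed table: rows are dealt across the 60 distributions
-- one by one, which suits staging tables that just need fast loads.
CREATE TABLE round_table
(
    id     INT,
    name   VARCHAR(4000),
    salary INT
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN
);

-- Insert a sample row and read it back to confirm
INSERT INTO round_table VALUES (1, 'sample', 1000);

SELECT * FROM round_table;
```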
91:00 - 91:30 From here you can build any table and feed in as much data as you want. There is one very important section still to come — bulk-loading data into these tables — which we'll cover after the serverless SQL pool. For now you should have a good understanding of how to create tables to match the requirement. That was the round robin table; now let's quickly create a replicated table. Almost everything is the same. For
91:30 - 92:00 this one I'll also create a schema, because we should use schemas when we store data model tables — and I mentioned that we use replicated distribution for dimension tables. So I'm creating a schema called
92:00 - 92:30 gold, because in the medallion architecture the data model lives in the gold layer. Run that — perfect. Now I'll create a table inside the gold schema: gold.dim_prod, a product dimension. Then
92:30 - 93:00 I define the columns — dim_key_prod and prod_id (both int), and prod_name as varchar(4000) — and then the same WITH clause: DISTRIBUTION = what?
93:00 - 93:30 Replicate, of course. Done. If you want to load data you can use the same INSERT INTO syntax and add rows yourself; I'm not inserting anything here because I'll show you how to load data with COPY INTO and PolyBase later — those are really handy — but feel free to add some rows manually. A sketch of this dimension table script is below.
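A minimal sketch of the schema and dimension table from this segment (names as used in the demo):

```sql
-- Gold-layer schema for the data model (medallion architecture)
CREATE SCHEMA gold;
GO

-- Replicated table: a full copy of this small dimension is kept on every
-- distribution, so joins against the distributed fact table stay local.
CREATE TABLE gold.dim_prod
(
    dim_key_prod INT,
    prod_id      INT,
    prod_name    VARCHAR(4000)
)
WITH
(
    DISTRIBUTION = REPLICATE
);
```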
93:30 - 94:00 Then let's quickly cover the third type of table: hash distribution. This is the special one — that's why I kept it for last — because we also need to define the hashing column. So: CREATE TABLE gold.fact_table, since we use hashing on fact tables.
94:00 - 94:30 The columns are dim_key_prod (int), revenue (int), and one more called cost (also int) — in a fact table we mostly keep numerical values. Now we define the distribution: DISTRIBUTION = what?
94:30 - 95:00 Hash. Is that enough? No — we still need to say which column to hash on. Let's hash on dim_key_prod; we've already discussed what hashing does: it stores identical values
95:00 - 95:30 together in the same distribution. Run it — done. We have now created all three types of distributed table, so the end-to-end picture of distributed tables in Synapse is complete; a sketch of the fact table script is below.
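And a minimal sketch of the hash-distributed fact table (again using the demo's names):

```sql
-- Hash-distributed fact table: rows are assigned to a distribution by hashing
-- dim_key_prod, so identical keys co-locate and joins to the replicated
-- dimension table are efficient.
CREATE TABLE gold.fact_table
(
    dim_key_prod INT,
    revenue      INT,
    cost         INT
)
WITH
(
    DISTRIBUTION = HASH(dim_key_prod)
);
```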
95:30 - 96:00 I'll click Publish all to save my scripts, and at the same time pause the dedicated SQL pool so it stops billing; we'll resume it when we cover PolyBase and COPY INTO. Now it's time to discuss the serverless SQL pool in detail — and let me tell you, the serverless SQL pool is the backbone of Synapse Analytics right now. There are quite a few concepts to understand, and several permissions you
96:00 - 96:30 need to assign to yourself and to Synapse; I'll explain each one. So, without wasting time: the superstar of today's video, the serverless SQL pool. Why is it so important? Because of the lakehouse — and you'll
96:30 - 97:00 get to know exactly what a lakehouse is, don't worry. The industry is moving towards working with files, because storing everything in databases in structured, proprietary format is expensive. Instead we want to work with compressed, columnar file formats and build tables on top of them. Yes, that is possible — let
97:00 - 97:30 me start; hold your excitement. Say this is you, or me. Now, I have a data lake. Just a small request:
97:30 - 98:00 if you haven't taken notes so far, now is the time, because this part is very important. Say this is you, and this is your data residing in the data lake — data lake meaning a storage account. Your data is lying there as CSV, Parquet, or whatever file format — just files.
98:00 - 98:30 Now you want to make use of this data: you want a database — a data warehouse, tables — so that you can build your model and run SQL queries on top of it. So far so good. Now I create a
98:30 - 99:00 database. If I create a dedicated database — a dedicated SQL pool — then I have to load all the data into it before I can query it, and that is very
99:00 - 99:30 expensive. We don't want that; we want to use the serverless SQL pool and build external tables. And it's not only about cost — we can create external tables in the dedicated SQL pool too — it's also that the serverless pool autoscales and we don't manage it; Synapse does. So now this database is your serverless
99:30 - 100:00 database. Note this down: a serverless database is also known as a logical database, or a logical data warehouse. Why "logical"? That was my first question when I
100:00 - 100:30 first learned this: why do we call it a logical data warehouse? Hold on, I'm telling you. So far we haven't done anything special: this is our data, and we want to create a serverless SQL database — we haven't created it yet, but we want to. Now, when you create this serverless SQL data warehouse
100:30 - 101:00 — a.k.a. logical data warehouse — what happens? It creates a logical metadata layer. Write those words down; they are important and they come up in interviews, and you should be able to answer such questions confidently.
101:00 - 101:30 This is your logical metadata layer — metadata meaning all the information about columns, data types, and so on — and it is attached on top of the data lake. So your data actually stays
101:30 - 102:00 in the form of files, but we have created a logical metadata layer on top of those files, and now I can query the data because I have a table-like structure over it. So when this serverless database is created, what it has really created
102:00 - 102:30 is a logical metadata layer on top of your data files. This is very cheap: the data stays in the data lake, and data lake storage is cheap, while the metadata layer holds no data at all — it's just an abstraction over the files. On top of it we can create external tables.
102:30 - 103:00 What are external tables? Think of them as tables defined on top of that metadata layer. As a business owner you'd love the serverless SQL pool because it saves cost, but it isn't trivial to work with
103:00 - 103:30 the serverless pool and external tables: you still have to manage the data yourself — how to read it, how to write it, how to grant access to it — and that's exactly what I'm here to walk you through. That was the overview of the serverless SQL pool and how things behave behind
103:30 - 104:00 the scenes. Now let's cover one important concept: the difference between external tables and managed tables. Say this is your metastore — a metastore is a repository where all the metadata is kept. This is
104:00 - 104:30 your metastore, and this is you. If you want to create a managed table — and note that the serverless SQL pool itself doesn't have managed tables; this is a general concept you should know — then
104:30 - 105:00 what happens is: the metadata (all the columns, the table name, and so on) goes into the metastore, and the actual data is stored in the data lake as usual, but that storage location is managed through the metastore. So whenever you drop the table, it drops
105:00 - 105:30 both the metadata and the data — everything is gone. In the case of external tables, however — say this is you, and this is me —
105:30 - 106:00 when I create an external table, all the metadata is still stored in the metastore, but the data resides in an
106:00 - 106:30 external data lake — remember the external storage account we created; say the data lives there. Now if I drop the table, my data does not get deleted; only the metastore entry is removed. That is the difference between an external table and a managed table. A minimal sketch of an external table, and of what DROP does to it, is shown below.
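As a minimal sketch of an external table in Synapse SQL — the data source, file format, table, and column names here are placeholders, and the data source and credential they reference are exactly what we set up in the following steps:

```sql
-- Assumes an external data source (ext_datalake) and file format (csv_format)
-- have already been created; that setup is covered in the next steps.
CREATE EXTERNAL TABLE dbo.revenue_ext
(
    region  VARCHAR(100),
    revenue INT
)
WITH
(
    LOCATION    = 'revenue/',        -- folder inside the container
    DATA_SOURCE = ext_datalake,
    FILE_FORMAT = csv_format
);

-- Dropping an external table removes only the metadata entry;
-- the underlying files stay untouched in the data lake.
DROP EXTERNAL TABLE dbo.revenue_ext;
```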
106:30 - 107:00 Now it's time to actually implement the solution using the serverless SQL pool. For that we need to discuss one very important thing — if you've watched my earlier videos you may already know the answer, but it's a good revision anyway. We have our Synapse
107:00 - 107:30 workspace, and we have our data lake — the storage account. How will Synapse read the data sitting in that data lake? We're considering the scenario where the data lives in this lake (we'll upload it shortly), so here are the files: how will the Synapse
107:30 - 108:00 workspace read them? Because this is an external data lake, not the managed or default one. With the default data lake it's easy — you can query it directly — but that's not what we want to do. So how do we do it with the external lake?
108:00 - 108:30 One option would be something called a service principal,
108:30 - 109:00 but what we will use is a managed identity. What is that? Think of it as an ID card that your Synapse workspace carries. There is a
109:00 - 109:30 blank ID card, and we write on it that this card can be used to access this data lake. Take an example: imagine an event where a special
109:30 - 110:00 entry pass is required. If you have the pass, you can enter the venue. The managed identity is that entry pass, and we will stamp it to say this card may be used to access
110:00 - 110:30 data in the data lake. So the plan is simple: assign the managed identity a storage contributor role on the data lake so it can read the data. But where does this identity card come from? Good news: Synapse — like other Azure resources — gets a managed identity by default, with the same name as
110:30 - 111:00 the resource, i.e. my Synapse workspace name. So the ID card already exists; I just need to stamp it — assign it a role so that it can read and write data in this data
111:00 - 111:30 lake. Is that the whole story? No, it's only the first step, but it's required before everything that follows. So, step one: assign our managed identity a contributor role on the data lake. Let's do it. I'll go to the Azure portal, click Home, find my resource group, and then I
111:30 - 112:00 click on my external data lake, because that's the account we want to assign the role on. Open it, click on Containers — they're empty, but let's create one called "raw"; we'll add some data to it in a few minutes. First, though, I want to authorise my managed
112:00 - 112:30 identity. Go to Access control (IAM), click Add role assignment, and search for the role Storage Blob Data Contributor, then Next. Among the member options you'll see Managed identity — we use a managed identity because it is the most secure way to establish the connection, and it's what Microsoft recommends,
112:30 - 113:00 so click that, then select the members. Choose the Synapse workspace as the managed identity type, and the name of your Synapse workspace will appear — this is the managed identity you got automatically, for free, when the workspace was created.
113:00 - 113:30 Click it, click Select, then Review + assign (twice), and the role assignment is added. Be aware it can take around 7–10 minutes to propagate in the background, but that's fine. Once it's through, our Synapse workspace can actually read and write data
113:30 - 114:00 from this external data lake. It's connected, but some permissions and steps are still pending — we'll get to them. Meanwhile, while the role assignment
114:00 - 114:30 propagates to the Synapse workspace, we can upload some data. I'll upload a sample file, and don't worry — I'll put the data file in my GitHub repository so you can download the same one. Click on Containers, open the raw container, click Upload, browse to the file, and there it is: revenue.csv. I originally
114:30 - 115:00 downloaded this file from a blog of free sample datasets — you can search for free sample datasets for
115:00 - 115:30 analysis, open one of the first sites that comes up, and download a dataset from there. Mine was an Excel file with multiple sheets, so I converted it to CSV. So this is our data, and it's uploaded.
115:30 - 116:00 The file is the revenue CSV. Now, what's our requirement? We created the raw container and the file sits directly inside it, but it's better practice to create a directory rather than uploading files at the container root — it's fine while you're starting out, though. So
116:00 - 116:30 what should we do? Let's create the directory and follow best practice. I'll create a directory called "revenue" — we should always create a directory for storing files — and save it. Can we move the existing file into it from here? I think that's possible from Storage Explorer, but not obviously from this view, so from this menu I'll
116:30 - 117:00 simply re-upload the file into the directory, then delete the copy sitting at the container root. Now we have the data where we want it. Forget about creating anything for a moment — I want to read this data, using a SQL query. But this is a CSV file; it
117:00 - 117:30 is not a table. How can you read it with SQL, when SQL needs a table? Well, we don't need a table: we can use the concept of the logical data warehouse — the logical metadata layer. There's a function for this; let me go back to Synapse to show it. Go to the Data tab; our dedicated
117:30 - 118:00 SQL pool is paused, which is good. First, before reading any data, we'll create a serverless database: click the plus button, choose SQL database, pick Serverless as the pool type, and give the database a name — I'll call it anshserverless.
118:00 - 118:30 Now if I open the dropdown I should see two databases, with one visual difference: the dedicated one has its own icon and the serverless one has the power-style icon — that's how you tell them apart. I'll click the three dots on the serverless database and choose New empty script, because I want a fresh script for the serverless SQL pool. First things first, we need to rename it,
118:30 - 119:00 and I'll name it "openrowset". What's that? Let me close the other tab and start with the blank script. I was saying that I want to read data that is stored as CSV (or any file format) and is not a table, but I still
119:00 - 119:30 want to run a SQL statement against it. It's not stubbornness — it's genuinely supported, via a function called OPENROWSET. What does it do? It lets us read data with SQL even though the data lives in files —
119:30 - 120:00 CSV, Parquet, and so on — which is pretty amazing. A sketch of what such a query looks like is below.
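A minimal sketch of such an OPENROWSET query, assuming a placeholder storage account, container, and file path — and assuming access to that path has been authorised, which is what the next steps configure:

```sql
-- Ad-hoc query over a CSV file in the data lake, no table required.
-- <storageaccount>, the container and the path are placeholders for your lake.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/raw/revenue/revenue.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS [result];
```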
120:00 - 120:30 Now, if this file were sitting in your default storage account you could read it straight away, but it isn't — it's in the external storage account, so there are a few prerequisites to configure before reading the data. We know that our Synapse managed identity has permission to read the data, but we still need to tell
120:30 - 121:00 Synapse to actually use that identity — because how would Synapse know which authentication method to use? There are several: SAS tokens, service principals, managed identity, and so on. For that we create something called a credential. Let me write it down for
121:00 - 121:30 you: we create a credential. If you are familiar with Databricks Unity Catalog — I've made a dedicated video on it, which you should watch — this is very similar to the storage credential you create there; here it is the credential we create in Synapse. In Databricks you also
121:30 - 122:00 define an access connector as the mediator and put the access connector ID into the credential; here we have a managed identity instead, and we only need to reference it by name, since the name maps to the identity. I know I'm going quite deep here,
122:00 - 122:30 but it is required, and I promise to explain every concept so you don't need to go anywhere else after this video. And don't assume nobody asks about these things — these are exactly the questions that come up. So, why do we need to create a credential?
122:30 - 123:00 Because, as we discussed, we need to tell Synapse which identity to use. We create a credential using the managed identity: we effectively say, "this is your credential, and inside it is your ID card." It's like handing over
123:00 - 123:30 an envelope with the ID card inside: keep it and use it. The credential is the envelope, and the managed identity is the ID card we put inside it. If you haven't grasped 100% of this, don't worry — rewatch this part, because you can't
123:30 - 124:00 grasp everything in one go that's not how you learn you have to rewatch things you have to think on your own about how things are going so it's fine you can just rewatch this part it's totally fine okay so for now what is the syntax to create a credential it is very simple I will show you here as well and in the documentation as well do not worry so first of all to create the credential we have one more prerequisite it is very simple and it
124:00 - 124:30 is very straightforward we need to create the master key yes we need to create the master key let me just take you to the Microsoft documentation page and then I can explain what I actually mean so I can just open incognito mode incognito mode is better right it's good for privacy okay so in incognito mode I started to write Databricks wait why Databricks we are working with Synapse okay Synapse
124:30 - 125:00 data source simple so I will simply click on this thing I think this is not the right documentation this is for Fabric okay I think this is the one yeah so in this documentation just forget everything
125:00 - 125:30 that is written do not worry I will simply take you to the main thing which is the credential yeah simple so this is the credential that we create and to create this credential we need to specify the master key and for that I can just simply search
125:30 - 126:00 master key in Microsoft SQL simple because the SQL Server syntax is standard for all of these this step is standard so this is the syntax why am I so eager to explain this basically this is not something you should worry much about because this is the master key that we just need to create once that's it and we do not
126:00 - 126:30 even need to remember it we just need to give some password and we can forget it after that and why am I showing you the documentation because obviously I cannot show you my password on screen so I will just write this command I can simply copy it and go to my Synapse workspace I will simply write create master key just remove this part and remove these square brackets as
126:30 - 127:00 well and then inside this password we need to define one strong password having multiple special characters you can keep anything you can even keep your ex's name I mean like the variable X okay to be very clear my point was like X Y Z A B C D plus special characters you can just put anything and just
127:00 - 127:30 keep your password complex and you do not need to memorize it or store it somewhere do not worry so I'll simply put the password here and I will just hit the run command I won't show it on screen so you can just do it on your own okay so I have run the command it ran successfully and I have removed the password so I hope that you have also run it successfully okay I can also remove this since I don't need it now simple now our master key is ready
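To keep things concrete, here is a minimal sketch of the master key step described above, run once in the serverless SQL pool database. The password shown is a placeholder, not the one used in the video.

```sql
-- Minimal sketch: create the database master key once per database.
-- 'Str0ng!Placeh0lder#2024' is a placeholder password; use your own strong value.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Str0ng!Placeh0lder#2024';
```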
127:30 - 128:00 now we can just create the credential so first of all I will write a comment create credential why am I adding comments because it is very good for others as well to understand the code and it promotes readability and I was the one during my school days in the whole class who always got scolded for not indenting the code because I was not much into indentation I didn't like to keep my code
128:00 - 128:30 neat and clean I was the messiest guy in the class so my teacher used to scold me what are you doing Ansh how can I read this code and I replied why do you want to read this code just run it in the machine the machine will read it yeah I got a slap in return but I didn't care so create credential I was the most notorious student of my class I must say the
128:30 - 129:00 most notorious student so create credential so here the syntax is very simple I remember it and I will show you how to do it you do not need to remember it because obviously in real life you will just copy the code from the documentation these are the things you do not need to memorize do not worry I remember it that's a different story and obviously when you are building this stuff again and again you will remember the code okay
129:00 - 129:30 but there's no need to remember it create credential sorry database scoped credential it's called a database scoped credential okay and then I will just name it and I will say ansh creds my credential I want to name it like that okay then I will write with and now I will say what type of identity card I am using so I will use identity equals
129:30 - 130:00 to managed identity okay managed identity we are specifying that we are using the managed identity let me just run this okay it ran successfully now I can also show you how you can find this in the documentation so I will just go to the same documentation which was for the external data source right
130:00 - 130:30 now so I will just simply scroll down a little bit and I should see create database scoped credential yeah see here is the code do not worry about this other thing we haven't talked about it yet let me erase it do not look at
130:30 - 131:00 this okay such a mess man you are such a mess okay so we have one thing create master key we have created it okay then this is the database scoped credential so in this example they are using Oracle with identity equals username maybe they are using some username and password but in our scenario we are not using a username and password we are using the managed identity so
131:00 - 131:30 this is the syntax simple and you should always use managed identity it is recommended by Microsoft as well perfect okay now our credential is ready our envelope is ready
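As a quick recap of what was just run, a minimal sketch of the credential step could look like the following; the credential name ansh_creds is only an illustrative name, pick whatever naming convention you like.

```sql
-- Minimal sketch: a database scoped credential that tells Synapse to
-- authenticate to storage with the workspace's managed identity.
-- Requires the master key created earlier in this same database.
CREATE DATABASE SCOPED CREDENTIAL ansh_creds
WITH IDENTITY = 'Managed Identity';
```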
131:30 - 132:00 okay now we want to create something called an external data source if you are familiar with Databricks it is exactly similar to the external location we create in Unity Catalog bro Ansh I love you but if you again talk about Databricks or any other thing we do not know better not use any other tool okay okay my bad I won't mention any other tool what Databricks I don't even like it see
132:00 - 132:30 so now our envelope is ready just get to the point our envelope is ready our credential is ready okay now we will create the external data source why let me tell you this is your container right this is your raw container if I just go to containers this is your raw container right and if I have multiple containers let's say I have another container let me create it for you I can do whatever it takes to
132:30 - 133:00 explain the stuff to you so let's say I have a container called enriched okay enriched this is another container I can only use one container in my external data source so I will create a data source which will be pointing to this container and that will be called the raw
133:00 - 133:30 data source the raw external data source you can give it any name but I'm just using a sensible naming convention right then you can create an external data source for this other container the enriched data source and the good thing is in both scenarios in both the external data sources you can use the same credential why because the credential is holding the ID card which has the
133:30 - 134:00 permission so you do not need to create different ID cards okay I know you didn't get all of it because I still need to show you how to do that and now I think you can relate why it is so difficult not difficult exactly why there's such a big gap in Synapse Analytics learning because these things are really time-taking to explain and you need some extra effort and your bro Ansh Lamba has
134:00 - 134:30 accepted the challenge to fill this gap wow clapping clapping I'm not kidding it takes some effort to explain this stuff ah I need some stretching just a sec ah now I'm good yep now I'm good okay now you got the concept I hope
134:30 - 135:00 so now you will get 100% of it when I actually show you how to do it simply go to the Synapse workspace okay now our credential is ready now we will create something called an external data source very nice I will simply copy this comment okay and I will say create external data source okay and I will create the external data source now you can just
135:00 - 135:30 look at that area which I just cut so now I will simply show you how to do this because I know the code so I can just show you not a big deal okay create external data source simple now we need to name it so we can just say raw because it is pointing to the raw container right it makes sense raw external source let's say okay now it
135:30 - 136:00 needs a few things first of all this is create external data source plus the name and then we can just say with now we need to give the location what is the location location equals to and here is the good thing now we want to get the URL of the location so there are two ways I will show you both ways and I will tell you which is better and you should know both ways you should know both the
136:00 - 136:30 ways one way is you can simply go inside the container and click on these three dots then properties and you will get the URL how just scroll down to this URL field you can simply copy it and you just need to make a small change I will copy this and show you what that change is so let's say this is the URL do not run it I didn't say to run this okay let's say this is your location that you need to use
136:30 - 137:00 and obviously we do not need to keep the location down to the file level we just keep the location at the folder level in the data lake that's a basic thing you should know okay now one thing you need to do is remove this blob and write dfs instead yes this is the thing you need to change here that's it what is the second way and I really like that way why because this is what is adopted in the industry plus if you're
137:00 - 137:30 building big projects you do not need to go there again and again and you can just get the location from here and here we are discussing a new not entirely new concept if you just click on linked you will see all the linked services meaning connections and so far we have just one connection within our Synapse and this is our primary storage account the default storage account we do not want that so here's the recommended thing you can just go to the
137:30 - 138:00 manage tab click on linked services click on plus new and you should create your connection or linked service to your external data lake and I will simply do that I will simply say not blob storage but Azure Data Lake Storage Gen2 click on continue I will name it external data lake because it is an external data lake right then I will
138:00 - 138:30 simply set the authentication type as managed identity system assigned managed identity not user assigned because we have already given permission to use this data lake to the system assigned managed identity simply click on this now which storage account does it need to pick we want to pick our external storage account very nice simply click on create done it is created now if you go here and if you
138:30 - 139:00 just refresh this you should see our external data lake wow this is the recommended approach that you should always use just a tip now if I click on this I will see all the containers see raw enriched now I can just access my data lake from here that is the power simply click on raw and then it will show me all the folders Revenue then from here I can just copy the URL how I can just
139:00 - 139:30 simply click on this select it right click click on properties and then you have the URL see I can simply copy this copied done so I can just use this location here and if I copy it you will see the only difference is dfs instead of blob simple so this is the location that we want to build
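For reference, the change being described is just swapping the Blob endpoint for the Data Lake (dfs) endpoint in the container URL; the account and container names below are placeholders, not the ones from the video.

```sql
-- Placeholder names: <storageaccount> and the container 'raw' are illustrative.
-- Blob endpoint (what the portal shows by default):
--   https://<storageaccount>.blob.core.windows.net/raw
-- Data Lake Gen2 endpoint (what the external data source should use):
--   https://<storageaccount>.dfs.core.windows.net/raw
```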
139:30 - 140:00 but here's the thing do we actually need to create the external data source down to the folder level to this level no why I know you can answer this just use your brain for two minutes why it is very interesting it will wake up your nervous system just think about it why I know you have got very close to it let me just tell you first of all I will just remove this why I do not want to create it down to the folder
140:00 - 140:30 level okay I will simply create my data source at the container level why let me just go to my container inside my raw I just have one folder right now Revenue if I add another folder let's say sales and a third one let's say customers I would have to create an external data source for each of those folders but I can play smart I can
140:30 - 141:00 create the external data source only at the container level and then I can just specify folders whenever I need them because I have access to all the folders inside it this was a smart move right see I'm telling you all the smart things I want you to be the smartest simply remove this and now within this location we are creating the data source but we need to put the credential very good because that is the
141:00 - 141:30 ID card we cannot create anything without that so the name of our credential is ansh creds simple it is done I can simply run it oops incorrect syntax near location equals to why oh I can just use single quotes because sometimes it gives errors like this let
141:30 - 142:00 me just rerun it yeah just single quotes you should always use single quotes because it doesn't like double quotes I don't know Synapse okay be serious Ansh what are you doing is this your computer class I remember those days I can't even tell you how much fun I used to have in the computer class if you would meet any of my
142:00 - 142:30 friends by the way I don't have friends right now not even a single one but if you would ever meet any of my school friends maybe one of them is watching this video they can definitely tell you what my personality was like in the computer class I was the one who used to entertain the whole class because I was the messiest guy in
142:30 - 143:00 the whole computer lab and my computer teacher was my class teacher can you imagine so my father used to go to the PTM with me and I used to be very nice and very calm in all the other subjects but in computer science I was the messiest student in the whole class and my class teacher was the computer teacher so she used to
143:00 - 143:30 complain about all the things this guy does and my father was like every other teacher talks nicely about you why is it just the computer teacher who has not said a single good word about you I said you should attend my computer class you will get the answer okay nostalgic
143:30 - 144:00 things let's get back to work childhood things that phase is gone now you should focus on this okay now our external data source is created okay and now we can just access the data yes now we have the external data source so first of all I will just publish my work you should always publish your work while you are building stuff it is very very important right
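Putting the last few steps together, a minimal sketch of the external data source just created could look like the following; the names raw_external_source and ansh_creds, plus the storage account URL, are placeholders standing in for whatever you actually used.

```sql
-- Minimal sketch: an external data source at container level (not folder level),
-- reusing the database scoped credential created earlier.
-- <storageaccount> is a placeholder for your ADLS Gen2 account name.
CREATE EXTERNAL DATA SOURCE raw_external_source
WITH (
    LOCATION   = 'https://<storageaccount>.dfs.core.windows.net/raw',  -- dfs endpoint, single quotes
    CREDENTIAL = ansh_creds
);
```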
144:00 - 144:30 so I think it is published yeah so now finally I want to read my data from the file sitting in the data lake how can we do that I will simply write the OPENROWSET function finally we are using that function it's called the OPENROWSET function okay perfect so for that I can just show you the documentation for the OPENROWSET function
144:30 - 145:00 syntax you do not need to remember it it is very simple but for CSV files you just need to add a few more options that's it for CSV or JSON files you have to add some options because these are text based files but Parquet files or the Delta file format those are the ideal files because those are columnar based file formats columnar based file
145:00 - 145:30 formats okay the OPENROWSET function let me just show you that okay perfect so I'll simply search OPENROWSET simple perfect so here you will get the syntax so this is the syntax that we use to read the data simple like data
145:30 - 146:00 source provider name catalog and for CSV I can just show you yes so this is for CSV do not worry you do not need to use all of these options and I know it is very confusing when you just read the syntax from these documentation pages because they have used generic code with parameters inside and I also don't like that that's why I'm not showing much I can just show you the example yep so let me just take you to my
146:00 - 146:30 personal documentation which is this okay so the syntax is very simple you simply need to write a select statement I want to apply SQL as I just mentioned I want to apply SQL on top of the data sitting in the data lake so I would normally write select star from table name but here we do not have a table we have the OPENROWSET function so I will simply write select star from let me just indent it okay now I can
146:30 - 147:00 indent okay select star from now instead of using a table name because I don't have any table name I just have a file I will simply use OPENROWSET and within this I need to give the location this function says you can give me any location and I will bring the file for you okay simple
147:00 - 147:30 now for the location we use a special keyword called bulk and then we need to define the location now what will be my location let me give you a hint you will get it okay obviously I will not put everything here I will simply go to the next line and I will write data source equals to what is the name of my data source raw external source okay raw
147:30 - 148:00 external source simple now just tell me what will be my location okay I think you're smart my location will be only the Revenue folder why what is this location the actual location is this let me just copy it for you the actual location I took so much time explaining these concepts the actual location is the
148:00 - 148:30 full path but we have already added part of that location inside the data source we just need to give the rest of the location that is the power of using a data source at the container level I can just use the same data source and put the rest of the path which is the rest of the folders inside this bulk simple was it good I know
148:30 - 149:00 it's good now is it done no not really we just need to specify which type of file we have it is a CSV we need to specify this okay so I will simply say format equals to CSV okay now with CSV we need to use some special configuration like field quote field delimiter all those things one
149:00 - 149:30 option is just going to the documentation reading the whole thing and copying the right code but the easiest way I will tell you the easiest way is simply go to this raw container click on this Revenue folder click on your file and right click on it and you will see new SQL script okay just click on select top 100 rows what it will do is write the OPENROWSET query for you yep you can enjoy just click on this yes
149:30 - 150:00 simple this has written the query for you automation but can you run this query actually not should I run it it will fail you want to test me okay I will run it it will fail why because it is not using any data source so it cannot access the data we have the data source so why is it handy because we can just copy things like parser version 2.0 and this
150:00 - 150:30 thing because these are important I can simply close it I just wanted to show you that you can generate the query like this now just close this go back to our OPENROWSET query now I can just add this parser version equals to 2.0 I think we can also use header row equals to true or one but let's ignore it for now now if I run this it should run fine because we have attached this data source this data source has the credential it needs the credential what is the
150:30 - 151:00 credential the managed identity the managed identity is packed inside the envelope which is the credential and now that envelope is handed over to the OPENROWSET function simple use the real life analogy and be happy in life make things simple why make things complex now I want to run this I will simply click on the run button incorrect syntax near data source
151:00 - 151:30 I was expecting this maybe I just need to adjust how I wrote data source yep click on run again a correlation name must be specified for the bulk rowset in the from clause what does it mean we forgot to put an alias for the query so we just need to define the name if you are familiar with subqueries we just need to define the alias for the query so we can give a name like query one query two or your
151:30 - 152:00 ex I was kidding sorry no no okay only query one then we can just run it and I should see the data in tabular format using SQL without any table or database wow the OPENROWSET function just returned the result boom so this is your SQL table without having a SQL table without having anything
152:00 - 152:30 this is your table this is your SQL you have used SQL on top of the file aren't you feeling happy what are you doing you should just dance because you have just used a SQL query on top of a file learn to celebrate small wins you are learning a lot okay just enjoy the process enjoy the process of learning enjoy the process of
152:30 - 153:00 achieving success enjoy the process you are in that process you are learning you are growing you should feel happy that you have learned how to query a file using SQL without having a SQL database or SQL table we haven't built any table we just used the OPENROWSET function that's called the power of a logical metadata layer simple right boom now if you can see it has returned the columns as column one column two column three column four you can just define the header row equals to
153:00 - 153:30 let's say two and it should work because I rarely work with CSVs I think header row is not the right option let me just check so we can try one thing wait I can just make it true yeah let's try it that's called debugging you can just hit and try and let's see oh it worked man very good now you have all
153:30 - 154:00 the data in a structured way wow man I love it so this was all about the OPENROWSET function and let me just click on publish all done
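For reference, the working query assembled over the last few minutes would look roughly like this; the data source, folder, and alias names are the illustrative ones used above, so adjust them to your own setup.

```sql
-- Minimal sketch: query a CSV folder in the data lake with OPENROWSET,
-- using the container-level external data source and a relative BULK path.
SELECT *
FROM OPENROWSET(
        BULK 'Revenue/',                       -- path relative to the data source location
        DATA_SOURCE = 'raw_external_source',   -- external data source created earlier
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE                      -- first row contains column names
     ) AS query1;                              -- alias is required for the bulk rowset
```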
154:00 - 154:30 now we will actually build tables we will build a database we will build external tables why let me use some drawing I'm an artist let me just use the canvas and draw something so the thing is jokes apart I feel like I can explain better with drawings why am I using drawings I could just speak and explain but when we see things visually you can retain that knowledge in your head for a longer period of time that's why I'm doing all of this for you let's do it okay so let's say you
154:30 - 155:00 have created your database okay why are you creating your database so that you can build reports you can build the reports in Power BI right so now I can see the data using the OPENROWSET function I know this now I want to create tables obviously we can create external tables why because our actual data will
155:00 - 155:30 be residing here in the raw container but we can create external tables on top of it that means when we drop the table our data will still be here it will not be dropped only the metadata will be dropped meaning the table definition will be dropped that's it so now I will create an external table on top of it and there are multiple ways first of all we will just see the basic thing like how we can create the table very basic like an
155:30 - 156:00 external table how we can create an external table on top of any data so now what I'm going to do is create an external table on top of this Revenue folder there are basically two ways to do this one is CETAS and the second is the normal external table definition which we will be looking at in just a few more minutes so
156:00 - 156:30 instead of shuffling everything I will just show you everything one by one and I will create scenarios as well so that you can relate to it okay very good first of all you should celebrate that you know how to work with OPENROWSET now I will simply click on publish all and I will create a separate script for the external table okay let's do it simply click on scripts and here I will go to the data tab and then to workspace and then I will click on this drop down button and then
156:30 - 157:00 if I just click on external resources I should see my external data source I will click on external data sources raw external source see now what is this external file format so basically this is used when we want to create external tables because we need to define what kind of file we have what kind of file format do
157:00 - 157:30 we have so we create something called a file format for that so we can also add one thing to this flow after creating the external data source we can create the external file format as well so if you have CSV files you should have one file format and if you have Parquet files you should have another file format you can create different file formats like this okay so I will simply copy this comment don't worry you will learn everything today I will simply say create
157:30 - 158:00 external file format okay perfect so for that I can simply write create external file format and this is a CSV file format so I will call it CSV format okay and then I need to say with format type equals to delimited text so if you want to see the exact
158:00 - 158:30 syntax I can just go to the documentation page I'll simply search file format in Synapse create external file format and then this is your syntax so if you are creating an external file format on top of the Parquet file format it is very easy you just need to write one line of code if you are creating an external file format for CSV you just
158:30 - 159:00 need to write these things like format options data compression string delimiter what is the delimiter because you're using a text file format right the first row the date format the encoding everything so you need to write all these things under format options it's not a big deal I can just write it for you so I have written these format options and this is the basic thing that you have to write in all the scenarios if you have Parquet here
159:00 - 159:30 you will simply write format type equals to Parquet that's it if you have Delta you will write Delta but this CSV part is special you have to copy it from the documentation and paste it because you need to define the encoding for CSV files and the terminator the delimiter everything so that's why so what happened here the thing is we are using I'm really tired the thing is we are using CSV files so by default the
159:30 - 160:00 delimiter is a comma so we do not need to define it here so I will simply run this it completed with errors unsupported DDL option encoding okay let's remove this encoding then we do not need to worry so I'll simply run it it ran successfully now we have the CSV file format created now most of the time you will create Parquet file formats so do not worry Parquet file formats
160:00 - 160:30 are really easy you simply need to write format type equals to Parquet that's it you do not need to write the format options because those are just for the delimiters field terminator row terminator special encoding like if you want to use a field quote when you have special characters within the columns so you need to define how to quote the values all of these things we only need to handle for CSV so you do not need to worry at all
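As a rough sketch of the delimited-text file format being created here, something like the following should work in a serverless database; the format name is illustrative, and the CSV options shown are the common ones rather than an exhaustive list.

```sql
-- Minimal sketch: a delimited-text (CSV) external file format.
CREATE EXTERNAL FILE FORMAT csv_format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',   -- column delimiter
        STRING_DELIMITER = '"',   -- quote character for text values
        FIRST_ROW = 2             -- skip the header row when reading
    )
);
```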
160:30 - 161:00 now your file format is also done let me click on publish now as we just said we will be creating an external table right so let me just quickly go to my workspace and I will simply create a new script and I will call it create external tables okay perfect now I'll simply copy the comment it says create
161:00 - 161:30 external table okay so in order to create an external table we need just three things and the good thing is we have all three things what are those three things first of all you need the external data source right simple second thing you need the external file
161:30 - 162:00 format and we already have these two things third thing obviously the location like within the data source within the container where you want to create the table on which data you want to create the table so the location simple when you have these three things you are good to go and we have these three things so we are good okay so how can we use this so we will simply write
162:00 - 162:30 create external table and I will call it Revenue external table okay then we want to define the whole schema and for that we can simply go to the file and check the schema like what the columns are so you can just click on the file and click on edit and you will see its content so these are the columns I can simply copy these columns let me just paste them here and let me indent it indentation is really
162:30 - 163:00 important Ansh what are you doing oh man those days okay then we can also define the data types so for CSV it is always better to give all the data types as varchar because by default it reads all the columns as string columns so I will simply say varchar 4000 let me just copy this varchar 4000
163:00 - 163:30 okay why is it showing red here why man so this is the external table okay now we need to define all those three things we will go with with then we will say location and the best thing is we
163:30 - 164:00 already know we just need to define Revenue because we already have the rest of the location in the data source then we will simply say data source okay and the data source is raw external source if I'm not wrong let's see if we have any errors if we have any error then we can just correct it but I don't think we should have any oh no wait I just clicked by mistake I just pressed shift plus enter so sorry that
164:00 - 164:30 was not intentional so sorry calm down nothing happened okay file format equals to CSV format have we run that command yeah we have run that okay good now all the things are in place okay now we can just run this because this will create a table on top of this data on top of this folder it will create an external table so let me just run this okay it ran successfully
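Pulling those three pieces together, a minimal sketch of the external table definition might look like this; the table and column names are shortened placeholders since the actual Revenue schema isn't fully shown in the video.

```sql
-- Minimal sketch: an external table over the Revenue folder.
-- Column names here are illustrative placeholders; use the real CSV header columns.
CREATE EXTERNAL TABLE revenue_ext_table
(
    dealer_id  VARCHAR(4000),
    model_id   VARCHAR(4000),
    revenue    VARCHAR(4000)
)
WITH (
    LOCATION    = 'Revenue/',              -- relative to the data source's container
    DATA_SOURCE = raw_external_source,     -- created earlier
    FILE_FORMAT = csv_format               -- the delimited-text file format
);
```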
164:30 - 165:00 let me just query this data oh man I want to go to the gym I don't think I will go to the gym today I need to complete this session see I'm compromising my gym for you see my love for you select star from one more thing we will be organizing one live session okay one
165:00 - 165:30 live session very soon I will notify you and just tell me one thing should I create one community for us like a permanent hub a permanent place for us where we can communicate with each other and we can have some live sessions maybe once a month it will be fun right it will be so good to talk to you live so I was just thinking about that if you
165:30 - 166:00 want to do that just tell me let me know and I will try to design something and create a kind of community because see this channel is dedicated towards learning right pure learning and you are not a fool you have already observed and explored hundreds of channels right you know what's the difference between this channel and others I'm not comparing myself with
166:00 - 166:30 anyone I don't need to I'm just expressing myself that's it so I think there should be a place see we all are learners I'm not saying that I'm a pro or the best developer no I am still learning but I'm sharing my knowledge with you and after reading your comments I feel like I can share my knowledge with you maybe it's my talent I don't know I
166:30 - 167:00 know how to deliver that knowledge because I imagine all of you what I'm trying to say is I put myself in your shoes in your situation and then I try to explain the stuff as if I am the one who is learning it and I want someone to explain this concept to me nicely or
167:00 - 167:30 easily I just imagine all those things and that's why I can connect with you and I know how to deliver that knowledge you can become a pro developer you could be the best developer in the whole world but does that mean that you can share your knowledge with others just answer this question you are working at some company you are working as a manager for so many years
167:30 - 168:00 blah blah blah can you share the knowledge with others if yes then please do if no then do not stop others from sharing the knowledge so that's my whole point and we all are learners right so let's say there are some very new data enthusiasts who are learning from my videos if they have some doubts
168:00 - 168:30 you should be there for them why because that's how you grow you're helping others right if they have any doubts they can just post that question regarding this concept regarding these videos that's it no here and there talk no garbage just these data engineering concepts right and you can just help them with how to approach or tackle the question so we can build something we'll see okay select star from our table what is
168:30 - 169:00 our table name it's Revenue ext table okay let me just query it okay now that is the biggest difference between the OPENROWSET function and an external table you do not need to write OPENROWSET again and again you have created your external table that's sorted now you will see the data and you can connect this table with your Power BI yes you can I'm not lying I will show you what
169:00 - 169:30 information you need to connect it with your Power BI oh but I can see all the other columns as null why maybe we just need to define the exact schema if I go back to the CSV file maybe there are numbers and special characters okay so we will just correct
169:30 - 170:00 it don't worry so we have some data oh okay I got it all the information is stored in just one column so maybe we just need to define the delimiter in our CSV file format we will correct it no need to worry so the whole intent was we are able to see the data through tables now what I was saying is that actually this table doesn't hold anything see this is just a metadata layer and when we connect
170:00 - 170:30 our table with Power BI or any reporting tool it is actually querying this table and behind the scenes we know that the actual data is stored in the form of files see that is the magic of external tables in Synapse and that is so good and now you know what it has actually done it has created an external table on top of this location the Revenue
170:30 - 171:00 folder the raw data source and this CSV file format so now let's cover one important concept that is CETAS what is CETAS it's a very popular concept by the way so first of all I will simply write CETAS let me just write it in caps okay so it's basically create external table as select what does it mean so when you
171:00 - 171:30 want by the way there are many use cases of this command we will cover some of the scenarios where you can use it first of all the first scenario if you remember we have the OPENROWSET function here right this one and this is just a query right when you want to create a table on top of a query or on top of a view by the way a view is also a query so when we want to create a table on top of a query
171:30 - 172:00 we can use this CETAS statement okay I will just tell you the architecture of it and you will get to know why this is very useful I will show you so let's say you have your query let's say this is your select something from this and this is the result of that query okay this is the result and this is just a query this is not physical data stored somewhere right now we will use
172:00 - 172:30 something called CETAS so I will create an external table on top of this query okay and what it will actually do when I create the external table on top of this query listen to me carefully it will write the result of this query to the given location I will just give the location
172:30 - 173:00 while creating the external table and then it will write the result of that query to that location that means we can use CETAS that is the create external table as select statement to migrate the data as well wow let's say I just run this OPENROWSET query again okay and we know that this is pointing to the data which is here the revenue star dot csv right and we have created one container here which is
173:00 - 173:30 enriched I want to move this data to this location which is enriched can we do that yes I will show you so what we can do is simply copy this query because I will be using it so let's write CETAS it is very important because obviously it creates an external table on top of the query plus it migrates the data as well so I will simply say
173:30 - 174:00 create external table and after saying external table I will simply give it a table name let's call it Revenue CETAS simple okay so now what I will do I do not need to define the schema because we are just writing the result of the query right so I
174:00 - 174:30 will simply say with obviously we need to define the location where we need to write the data okay let's create another scenario instead of writing the data to enriched we can just write the data within raw and create a new directory let's say CETAS Revenue the whole point is to explain that it can move the data why am I not using the enriched container because in that scenario we would have to create
174:30 - 175:00 a new external data source right and I do not want to do that because that step has already been shown and it's out of scope to explain the same thing again and again you can create your external data source one more time it will be good revision for you but the whole intent of this step is to show you that you can create an external table on top of the data plus it will migrate the data as well yes so I will just show you the
175:00 - 175:30 whole step by step execution as well like how it is actually building an external table and how things are moving behind the scenes so I will simply say first of all the location is raw slash CETAS Revenue okay so I don't even need to pre-create the folder I can just create the folder using CETAS as well yes even if I do not have the directory it can still create it so let's say I just give CETAS Revenue
175:30 - 176:00 simple this folder is not there but it will still create the folder for us location and then data source the data source is the raw external data source okay one thing that you need to keep in mind you need to choose your serverless database because by default whenever we click on any new script it picks master by default and we do not want that right so raw external
176:00 - 176:30 source simple then what we'll be doing obviously file format and we will pick CSV format now here's the very important thing for you even if you do not want to store the data in CSV format you can just pick any other format and it will write the data in Parquet format so let's create one Parquet
176:30 - 177:00 format okay let's do it so let's say I want to create one Parquet file format we can just create it here as well no worries so I will simply say create Parquet file format simple we can just do that now how can we do this same
177:00 - 177:30 step create external file format and I will call it the Parquet file format simple and then I can just say with format type equals to Parquet that's it there's one more thing that we need to define that is the data compression so we can just get that compression value from the documentation I can just
177:30 - 178:00 simply close the unnecessary tabs and search file format synapse so we will just grab the compression we need this is the one org apache hadoop something like that so format options equals to data compression and for Parquet I personally use the Snappy codec you can use any one so data compression equals
178:00 - 178:30 to this one simple you can just copy this compression value from here you can pause the video and type it so I will just create this it's done okay now let's use the Parquet format in our CETAS statement simple okay now our external table is all set to be stored in this location
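For reference, a minimal sketch of that Parquet file format, with Snappy compression as discussed, could look like this; the format name is just an illustrative choice.

```sql
-- Minimal sketch: a Parquet external file format with Snappy compression.
CREATE EXTERNAL FILE FORMAT parquet_format
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
```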
178:30 - 179:00 now what is the result the result is the select query oh we already have the query right there in the OPENROWSET we want to store the result of this query okay I will simply paste it here okay simple now I can just run this so it should create the table at this location which is a new one and when I go to my data lake I should see the folder let me just go there let me just refresh perfect and the special thing is the data is converted into Parquet format boom we do
179:00 - 179:30 not have CSV anymore it just converted it for us so if you just look at the data you can expand it and you will see dot parquet see that's the magic am I a magician no but yeah that's magic it just converted the file format as well
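Putting it together, a minimal sketch of the CETAS statement that was just run might look like the following; the table, folder, and object names are the illustrative ones used above.

```sql
-- Minimal sketch: CETAS writes the query result to the data lake as Parquet
-- and registers an external table on top of the written files.
CREATE EXTERNAL TABLE revenue_cetas
WITH (
    LOCATION    = 'cetas_revenue/',        -- new folder; created if it doesn't exist
    DATA_SOURCE = raw_external_source,
    FILE_FORMAT = parquet_format
)
AS
SELECT *
FROM OPENROWSET(
        BULK 'Revenue/',
        DATA_SOURCE = 'raw_external_source',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
     ) AS src;
```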
179:30 - 180:00 so now let me just show you the execution steps the order of execution if you remember we just created this external table okay let me just tell you how it is actually done behind the scenes so let's say this is your query right this will return some data right and this is your data lake okay now the CETAS statement will first transfer your data do not go by the syntax that is written
180:00 - 180:30 here I know that we write create external table first and then the query no that's not how it works behind the scenes it will first compute the result of the query it will transfer that data to this folder first this thing will be done okay once the data is there then it will create the external table on top of this data just write down these steps these
180:30 - 181:00 are the order of execution like how CETAS will be performing these steps okay I know this was a good one so we have covered CETAS we have covered external tables we have covered so much stuff right now if you want to create views on top of this OPENROWSET function can you do that yes you can so for that I can just create a new script a new SQL script and then you just pick the serverless database
181:00 - 181:30 and I will name the script views okay so views are just like queries we are just storing the query definition that's it it will not write any data don't worry and we can simply write create view and the view name will be let's say Revenue view okay as the same
181:30 - 182:00 query simple I will just run this command and it should create the view it will simply create a view a view is just like an object for the query that's it so I can simply query the view as well it is very handy when you want to use this OPENROWSET function but you do not want to create a table and you do not want to store the data anywhere else yet you want to use this query again and again then in that scenario you can just use this view because now you do not need to worry about this code you can simply say
182:00 - 182:30 select star from Revenue view simple right yeah simple you can just get the data perfect
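As a quick sketch of that view over the OPENROWSET query (names again illustrative):

```sql
-- Minimal sketch: a view that wraps the OPENROWSET query so it can be reused
-- like a table, without copying any data.
CREATE VIEW revenue_view
AS
SELECT *
FROM OPENROWSET(
        BULK 'Revenue/',
        DATA_SOURCE = 'raw_external_source',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
     ) AS src;
GO

-- Then query it like any other object:
SELECT * FROM revenue_view;
```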
182:30 - 183:00 oh I know what you are saying you also want to query that table we created okay let's query that as well because that is a Parquet dataset select star from what was the name of that table Revenue CETAS yep let me just query it okay perfect see when Synapse works with Parquet data it works very well because that is you could say the ideal file format for big data it doesn't work as well with JSON or CSV we have to do so many workarounds for those but do not worry about that so now we have a clean table stored in the serverless
183:00 - 183:30 SQL pool so let me just first click on publish all to publish everything okay so this was all about external tables and the OPENROWSET function and one more thing let me share something very interesting that can be a tricky question in your interviews okay we created a table on top of this folder right on top of this folder
183:30 - 184:00 okay what if I create an external table but there's no folder available at that particular location did you get what I'm saying let's say this is my folder CETAS Revenue okay perfect I want to create an external table called CETAS Revenue 2 on top of a folder which is called CETAS Revenue 2 but we
184:00 - 184:30 do not have any CETAS Revenue 2 folder here so what will happen in that scenario do you know so the thing is it will still register the table for that particular folder and whenever there is a folder in the future with the exact schema and the data inside it it will show the result I can just show you this scenario this is very
184:30 - 185:00 important so let's do it let's create another script and call it scenario okay and I will create a table create external table let me just first pick the database and the external table I will call it Revenue 2 let's say okay and obviously I will define the schema so for the schema I can just go to
185:00 - 185:30 the script where we created external tables yep I can simply go there and bring the schema perfect I can just paste the schema here okay this is my table and what will be my location let me just grab that as well don't worry I'm not using the same location because we are covering a special
185:30 - 186:00 scenario okay now this time I want to give the location as Revenue 2 that will make sense okay now we want to register the table will it run successfully obviously yes let me just show you okay but there is no folder there see let me just refresh there's no folder there so if I query this table what will happen it will throw an error it will say the data
186:00 - 186:30 cannot be listed let me just query this Revenue 2 it will say the data is not accessible but the table is there if I just go here in the data tab in the external tables and refresh I have the table Revenue 2 I have the table but I cannot see the data but if I upload the data if I just create the Revenue 2 folder with the
186:30 - 187:00 exact same data file inside it it will just show this data it will just show the records that's it so that is a very tricky scenario and you should be aware of it and now you are aware of it as well perfect so now I think you have learned so much in the world of Synapse Analytics do not worry we are just covering one more important topic which is how we can load data from the data lake into the
187:00 - 187:30 Synapse dedicated SQL pool what do I mean let's say I want to create a table in the dedicated SQL pool right and there we always load the data in physical form into the tables right so how can we load the data let's say I want to load this data the CETAS Revenue Parquet files into a table that I will be creating in the dedicated SQL pool how can we do that there are basically two ways what are those two ways one way
187:30 - 188:00 is the COPY INTO command and the second way is CTAS which is also known as PolyBase it is very famous and it was very popular before but now one second now companies are preferring this
188:00 - 188:30 COPY INTO command it is scalable it is fast it is obviously better than the CTAS approach so there are two things one is CTAS and the second is COPY INTO I will show you both ways do not worry so first of all we will cover this COPY INTO let's
188:30 - 189:00 first cover this COPY INTO how we can load the data into the dedicated SQL pool and yes for that we just need to resume our dedicated SQL pool again are you ready okay so let's do that let's go to the Synapse workspace let me just go to the manage tab and let's go to SQL pools
189:00 - 189:30 and let me just resume it so it will take a few minutes to resume because it will restart everything and once it is resumed then we will start building our table so basically what we'll be doing for the COPY INTO command is we'll just create one simple table with the exact schema then we will run the COPY INTO command I will show you how you can do it it is very simple and it is very handy when you just need to move data from the data lake to your Synapse
189:30 - 190:00 tables so let's see how we can do it as you can see it is resumed now so let's create the table so I will simply go to data and this time click on this dedicated SQL pool and new script let me close all the other script tabs so I will simply name it copy into okay perfect so this time
190:00 - 190:30 I will create the COPY INTO command so I will say copy into okay so now for COPY INTO what is the architecture so this is the architecture you have a table okay in the dedicated SQL pool okay and it is blank this is your data lake and you have a file here you want to
190:30 - 191:00 load this data you want to copy this data into the table that's why we call it copy into into which table into the dedicated table using this file perfect okay so now we will simply create the table first because we need the table for this create table and I will call it copy into table simple and for the schema I can just go to the scripts and select the create external table script because I just want to reuse the
191:00 - 191:30 schema I will simply paste the schema here okay perfect so now we can just define the distribution as you all know we define the distribution as well and let me keep the distribution as round robin because we can treat this as our staging table okay so I will simply say round robin perfect so now should I just run this yes so it is creating our table our table will be ready inside the dedicated SQL pool and now our intention
191:30 - 192:00 is to load the data into this table okay so it is creating the table it is done okay now if I query this table it is empty right the table name is copy into table so if I query this table it should be empty yeah it is empty now I want to load the data into it
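A minimal sketch of that staging table in the dedicated SQL pool, with the round robin distribution mentioned above (column names again being placeholders), might look like this:

```sql
-- Minimal sketch: a staging table in the dedicated SQL pool.
-- ROUND_ROBIN spreads rows evenly across distributions, a common choice for stage tables.
-- HEAP (no clustered index) is an assumption here, kept simple for a staging load.
CREATE TABLE copy_into_table
(
    dealer_id  VARCHAR(4000),
    model_id   VARCHAR(4000),
    revenue    VARCHAR(4000)
)
WITH (
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);
```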
192:00 - 192:30 so I will simply write a comment loading data okay and now I will simply say copy into let me hit enter yeah perfect copy into and the table name is copy into table from so copy into this table from where from a location and how can you get the location simply go to data and linked and then raw click on this and then the CETAS Revenue folder right click on
192:30 - 193:00 it and just select properties and copy the URL simple now we can just paste the URL here in single quotes perfect so now almost everything is done one thing is missing how will it identify the file type so we will say with file type equals to Parquet I don't know if we need to use single quotes or not we will just test it don't worry file type and then we just need to define the data source
193:00 - 193:30 or should we define the credential yeah we can just define the credential because it should have that ID card right so the credential is ansh creds but do we have that credential here I don't think so yep that's a very good catch we do not have the credential registered under the dedicated SQL pool we have the credentials under this tab if I
193:30 - 194:00 go to Data and then Workspace, I have credentials and external resources there — the raw external data source and everything. If I open that script — where is it, the create external table / OPENROWSET one — you can see we created all of those things, the credential, the data source and the file format, under the serverless SQL pool. So if we want to use this credential we just need to copy
194:00 - 194:30 it and create the credential in our database — a database scoped credential. So I will copy this, and actually I will copy the external data source as well, because it will be used in a later step. In fact I will just copy the whole script; that will be better, because we need to register all of those resources under the dedicated SQL pool too. So we'll do that. Okay, so this is my COPY INTO command, and I will create
194:30 - 195:00 another script under the dedicated SQL pool. I will click on the three dots, New SQL script, Empty script, and I will call it external_res — res means resources, right? I doubt this will run cleanly because of the master key, but let's see. First of all, I do not need the CSV file format — we are not working with CSV in the dedicated SQL pool, so that goes. And obviously we are not using the OPENROWSET function, so that goes too. We are keeping just the credential, the data source and the file format.
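The trimmed script ends up looking roughly like this — a sketch with assumed object names and a placeholder storage URL; the master key statement is only needed if the database does not already have one.

    -- Sketch of the objects re-registered under the dedicated SQL pool.
    -- Object names and the storage URL are placeholders, not the exact ones from the video.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword123!>';  -- once per database

    CREATE DATABASE SCOPED CREDENTIAL demo_creds
    WITH IDENTITY = 'Managed Identity';

    CREATE EXTERNAL DATA SOURCE raw_source
    WITH
    (
        LOCATION   = 'https://<storageaccount>.dfs.core.windows.net/<container>',
        CREDENTIAL = demo_creds
    );

    CREATE EXTERNAL FILE FORMAT parquet_format
    WITH (FORMAT_TYPE = PARQUET);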
195:00 - 195:30 That's it — just three things. I will simply run this... oh, it ran, okay, that's fine. So now we have our credential registered under the dedicated SQL pool as well. Now I can use this credential, and I can run the other statements too, because they will be needed when we create external tables under the dedicated SQL pool. That is our next step, and it's called PolyBase. What is PolyBase? Just a fancy name for
195:30 - 196:00 CTAS. There were two methods, CTAS and COPY INTO, and CTAS is also known as PolyBase. Okay, so I have run all the commands; let me click Publish all, and now I can complete my COPY INTO command. One more thing that I forgot to mention: when you are copying data into a dedicated SQL pool table, you can define the column list here as well. So what we'll do is define the
196:00 - 196:30 column names, that's it. Let me just complete it first. So this is my first column, and you need to give the numbering as well: this is your second column, this is your third column, your fourth, your fifth — because there's a special scenario for that. Let me add the commas. Let's say your file has
196:30 - 197:00 shuffled columns — say the file actually has model ID as its second column, but in your dedicated SQL pool table model ID should be the third column. That can happen; it's a real-world scenario. With this method you can define the numbering, and the numbers refer to the file's columns — 1, 2, 3, 4, 5, 6 — while the
197:00 - 197:30 names — dealer ID, model ID and so on — are the actual column names in your dedicated SQL pool table. So you can map them like this. And there's one more thing to add here: the credential. You created your credential using a managed identity, IDENTITY = 'Managed Identity', so you can reference that in the command as well. Let me just run this; it should copy the data into the table.
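Put together, the finished command looks roughly like this — again a sketch: the URL and most column names are placeholders, and the number after each column is its position in the source file.

    -- COPY INTO with an explicit column list (column name + field number in the file)
    -- and the managed-identity credential created earlier.
    COPY INTO dbo.copy_into_table
        (dealer_id 1, model_id 2, branch_id 3, revenue 4, units_sold 5)
    FROM 'https://<storageaccount>.dfs.core.windows.net/<container>/<folder>/'
    WITH
    (
        FILE_TYPE  = 'PARQUET',
        CREDENTIAL = (IDENTITY = 'Managed Identity')
    );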
197:30 - 198:00 And... what is this error, bro? Actually, I think I know this error, so don't worry — I have seen it in a previous project. The thing is, as you know, I stored this data in Parquet format, and in Parquet these two columns are not stored as varchar or string values; they are stored as integer
198:00 - 198:30 columns. So we can just redefine those columns, and that is the power of debugging — you have to deal with errors, that's how you grow. The advantage I have right now is that I hit a very similar error in my previous project (the end-to-end project video I already built), so I can tackle it straight away. And if you see this kind of error in the future, you will know how to tackle these
198:30 - 199:00 errors yourself. That's why I always say errors are good — they teach you something that is really important for your growth. So to fix it I will simply drop this table, because I want to redefine it with the right data types. I will run the DROP command, and then in the CREATE statement I will say BIGINT, because we use BIGINT instead of INT here. Perfect, let me just rerun the
199:00 - 199:30 CREATE TABLE statement. Perfect. Now I can retry the COPY INTO — in my previous project this kind of data type mismatch was exactly what threw the errors, so let's see... oh, it worked! Yeah, I was familiar with this kind of error; I'm not sure it was the exact same one, but it was very similar. So now our data is loaded into the table. If you want to query the table, we can simply say
199:30 - 200:00 SELECT * FROM copy_into_table — that's our table name — and I should see the data. See, the data is loaded. And this is not a serverless SQL pool table or some logical table — no, this is a real table. We have copied data from the data lake — from that folder — into our physical table. Physical
200:00 - 200:30 table, COPY INTO command — yes, this was so good. Now we will use the PolyBase method, which is the second option. It is CTAS, commonly known as PolyBase, and since PolyBase is a very popular term you should remember it: if someone asks "do you know PolyBase?", you should be able to say "PolyBase means CTAS, yes, I know it." Take your time to remember these things. Let me publish first, and then I will
200:30 - 201:00 create one more script for PolyBase. Simply go to the three dots, New SQL script, and I will name it PolyBase (CTAS) — oh, it is not allowing me to use special characters, so I will use an underscore: polybase_ctas. Perfect. So how does PolyBase work? As the name suggests, it is CTAS, and we are already
201:00 - 201:30 familiar with CTAS — it is exactly the same. First we will create an external table (yes, we can create external tables in the dedicated SQL pool as well), and then we will say: create a table as SELECT * from that external table. Simple. That is the concept of CTAS,
201:30 - 202:00 and it is exactly the same here. So let's do it. First we will create an external table on top of the Parquet data that we have. We already know how to create external tables — we just need three things, and now we have all three. So I will say CREATE EXTERNAL TABLE and call it, let's say, parquet_table, and then I can define the schema. I can just copy the schema
202:00 - 202:30 from the other script, because I know we need to keep the schema the same, so I will paste it here. This is the external table that I'm building on top of a location — you will see which one in a second. Then WITH, and for LOCATION I will point it at the revenue folder that we wrote earlier with CETAS —
202:30 - 203:00 that one, the CETAS revenue folder with the underscore in its name — perfect, the location is set. Now, what is the data source? This is why I told you we would need our data source and file format — that's why I created them before (see, I'm smart). So for the data source I will type the name I remember, and then the file format is our Parquet file format.
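So the external table definition ends up roughly like this — a sketch with placeholder names; the data source and file format stand in for the objects registered in the external resources script, and the column list mirrors the staging table above.

    -- External table in the dedicated SQL pool over the Parquet folder in the data lake.
    CREATE EXTERNAL TABLE dbo.parquet_table
    (
        dealer_id  BIGINT,
        model_id   VARCHAR(100),
        branch_id  BIGINT,
        revenue    BIGINT,
        units_sold BIGINT
    )
    WITH
    (
        LOCATION    = '<folder-with-parquet-files>/',
        DATA_SOURCE = raw_source,      -- will need an abfss variant, as we are about to see
        FILE_FORMAT = parquet_format
    );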
203:00 - 203:30 So now this is the definition of our external table. Let me run it... what is that error about the external data source? Oh, sorry — I typed the wrong name; the data source we registered is the raw external source one. Let me fix that and read the next error: "Managed Service Identity is only supported by the abfss scheme." Okay, no
203:30 - 204:00 worries. Do you know what this error means? It is saying that the dedicated SQL pool only accepts the abfss location when we create the external data source. If you go to the Linked tab, right-click on the raw container and open Properties, you get two URLs, right? One is the https URL and one is the abfss path — and it wants that path. Okay, bro, we will just give you that path, no worries: go back to the folder,
204:00 - 204:30 right-click, copy the abfss path, and then we can give that location to the data source. Simple. Go back to our external resources script — we just need to update our external data source. There are two ways, and it's up to you: you can either delete the existing one or create a new one. I think the better way is to create a new one, so I will say CREATE EXTERNAL DATA SOURCE and add abfss to the name, because it uses
204:30 - 205:00 the abfss location. That's the disadvantage of using external tables in the dedicated SQL pool: it insists on the abfss storage path. By the way, Databricks also accepts abfss paths, so it's up to you — I personally like abfss, but while working with Synapse serverless I used https. So we just need to update this path. Where is our path? Oh, I overwrote what I copied — let me recopy it: Properties, then
205:00 - 205:30 copy the abfss path. Perfect. Let me paste it here, and obviously we can remove the folder name from the end — we know the reason: we point the data source at the container level. Let me run it... it ran successfully. Now I just need to use this data source, because Synapse insists on it — it says "I want an abfss path", so we will give it an abfss path, no worries.
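The re-created data source looks roughly like this — placeholders again for the container and storage account, reusing the credential from before.

    -- External data source using the abfss scheme, which the dedicated SQL pool requires
    -- for external tables with a managed-identity credential.
    -- Note: some setups also add TYPE = HADOOP here; the video's script, copied from the
    -- serverless version, does not include it.
    CREATE EXTERNAL DATA SOURCE raw_source_abfss
    WITH
    (
        LOCATION   = 'abfss://<container>@<storageaccount>.dfs.core.windows.net',
        CREDENTIAL = demo_creds
    );
    -- The external table's DATA_SOURCE then points at raw_source_abfss instead.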
205:30 - 206:00 Let me rerun the external table with the new data source... okay, it worked fine. Perfect, now we have the external table ready. Now what will we do? We will write the CTAS — the PolyBase part. It is exactly the same syntax as before, so we will simply say
206:00 - 206:30 CREATE TABLE, and the table name will be, let's say, poly_table. And do we need to define the schema? No — for CTAS we do not. So: CREATE TABLE poly_table AS SELECT * FROM the external table. We want to create this table
206:30 - 207:00 on top of that external table, so it will run the query, and whatever result it gets it will store inside poly_table. CREATE TABLE poly_table and then boom? No, not really — we also need to define the distribution, because it's a dedicated SQL pool. So we will say DISTRIBUTION equals round robin; let's say we just want to use round robin.
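The CTAS statement is then roughly this — a sketch using the placeholder names from above.

    -- CTAS (the "PolyBase" load): materialise the external table's rows
    -- into a physical, distributed table in the dedicated SQL pool.
    CREATE TABLE dbo.poly_table
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT *
    FROM dbo.parquet_table;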
207:00 - 207:30 Let me run it — it should create the table and load the data using that SELECT query. Okay, it's done. I want to test it, so I will simply say SELECT * FROM poly_table. Let me query it... yes! Good job. So now we have the data.
207:30 - 208:00 So these were the two methods of loading data from the data lake. Tell me, how was it, and which one did you like more? If you ask me, I personally like CTAS more, because it feels more familiar: you are already working with a SQL pool, you are just building an external table — and it's very easy to build external tables in the dedicated SQL pool as well — so I personally prefer CTAS. It's up
208:00 - 208:30 to you which one you like. Now, 99% of the tutorial is completed — only 1% left. What's that? Oh, first of all, let me pause my dedicated SQL pool, bro, it's so expensive. Okay, it's paused. So what is that one-percent thing that is missing? We have covered everything, and only one thing is left: if I go to Apache Spark pools,
208:30 - 209:00 we haven't looked at them yet — we discussed the concept, but we also want to see it in the workspace. So click on Apache Spark pools, and we can create a new one. Let's create it. This is the cluster we can create, and we just need to configure it — a cluster means a group of machines, and Synapse will take care of all of those machines. First of all we want to
209:00 - 209:30 give it a name, so I will name my Spark pool. Then it asks whether autoscale is enabled or disabled — it is very good to have autoscale enabled — and it shows the estimated cost per hour. The minimum number of nodes is three and the maximum is ten; let's say I want to reduce the maximum number of nodes to three. Simple. Then I will simply say Review
209:30 - 210:00 + create and click Create. That's all the configuration you need to do — Synapse will take care of all the machines. Now what are we going to do with this pool once it is created? Simple: if you want to transform your data — let's say you have data in the data lake and you want to transform it — you can do
210:00 - 210:30 everything here, because the pool gets attached to a notebook. When you attach this cluster to a notebook, it processes all your data and you can run your PySpark and Spark SQL code. Now, if you want to learn PySpark in detail and become a pro PySpark developer, I have created a dedicated six-hour video just on PySpark, in which you will learn all the available functions, and it will definitely help you a
210:30 - 211:00 lot to become a pro PySpark developer. You can definitely check that video and you will get to know everything related to PySpark — how to read data, how to transform data, how to use countless functions — because I have covered all of them, and trust me, that video is really helpful if you want to learn PySpark. Once you know those functions, you can simply go to the lake database, create a database, create your notebook, and then read
211:00 - 211:30 and write the data — everything is sorted. If we go back to our scripts, these are all the scripts we have created. First of all, let me click Publish all, because we have done so much hard work, right? These are all the SQL scripts we have created. We have learned everything from scratch, and you have learned so much: OPENROWSET, CETAS,
211:30 - 212:00 then PolyBase (CTAS), then views — everything. So I hope you now have a great understanding of Azure Synapse Analytics. Now, if you want to learn Azure Data Factory in detail, just for the pipeline side of things, I have a dedicated video for that as well; you can definitely check it, and after watching it you can build pipelines within Synapse. Second, if you want to become a pro
212:00 - 212:30 PySpark developer, you can watch that six-hour video and it will make you a pro PySpark developer. And yes, I have also created one end-to-end project video in which I used Synapse for data warehousing, so you can definitely watch that as well. All the links will be in the description — you can learn, you can grow — and I hope this video will be very beneficial to you and you will be
212:30 - 213:00 cracking lots of interviews related to data engineering, especially Azure data engineering, because this is all related to Azure, right? I have mentioned so many videos, and I am eagerly waiting for you to click on the video coming up on the screen so I can see you there. Bye-bye, see you next week, happy learning.