Get Certified: DP-600 Fabric Analytics Engineer

DP-600 Fabric Analytics Engineer FREE workshop!

Estimated read time: 1:20


    Summary

    This workshop provided an in-depth overview of the DP-600 Fabric Analytics Engineer certification, focusing on preparing attendees for the exam through various Microsoft Fabric components and tools. Participants learned about data ingestion, transformation, modeling, and deployment pipelines using Microsoft Fabric, while emphasizing practical tips for exam preparation. Additionally, it included engaging activities such as quizzes to reinforce learning.

      Highlights

      • The workshop covered key concepts like data ingestion using pipelines and Dataflows Gen2. 🌊
      • Participants learned to handle different data storage scenarios with Lakehouse and Warehouse. 🏠
      • Security and governance aspects were covered to ensure safe data practices. 🔒
      • Strategies for effective data modeling and optimization were discussed. 📈
      • Exciting quizzes helped reinforce the material in an engaging way! 🎮

      Key Takeaways

      • Get ready to ace the DP-600 exam with Microsoft Fabric! 🎓
      • Explore data ingestion, transformation, and modeling in Microsoft Fabric. 📊
      • Learn practical strategies for exam success. 🚀
      • Engage with fun quizzes to solidify your understanding. 🧠
      • Network with fellow learners and industry experts! 🤝

      Overview

      The DP-600 Fabric Analytics Engineer workshop by Data Mozart was a comprehensive guide to navigating the Microsoft Fabric ecosystem for certification success. Attendees were walked through the intricacies of data handling, starting from ingestion to transformation, and finally modeling data for analytics using various tools within Microsoft Fabric.

        One of the core elements explored was the use of Fabric pipelines and dataflows for efficient data transformation and loading. The workshop also delved into comparing storage solutions with Lakehouse and Warehouse, offering insights into when to use each, coupled with practical scenarios.

          Moreover, the session was peppered with interactive quizzes to not only challenge the participants but to also reinforce the learning in a lively manner. Key exam tips and strategies were shared, giving participants the confidence to tackle the DP-600 exam, with ample opportunities for networking and question-answer sessions making it a rich learning experience.

            DP-600 Fabric Analytics Engineer FREE workshop! Transcription

            • 00:00 - 09:30 [Music]
            • 09:30 - 10:30 Hello, hello, hello! Good morning, good evening, good afternoon, depending on which part of the planet you are joining from. I see a lot of familiar and less familiar faces here, and I couldn't be more excited about what is going to happen in the next, I hope, four hours. Not more than that, but I promise not less than that, so at least four hours. We are going to talk about how to pass the DP-600 exam and hopefully learn some things about Microsoft Fabric along the way. The idea is not just to get you ready for DP-600, although that is of course the main purpose of today, but also to pick up some general learnings about Fabric as a platform.
            • 10:30 - 12:00 Without any further talk, I'll jump first into some housekeeping rules, because there are a lot of us here and we are coming from different countries, continents, cultures, backgrounds, and skill levels. So what I kindly ask of every one of you is to be mindful and respectful of everyone in the audience. That means if you are posting something in the chat window, please make sure that everyone feels safe and comfortable being here today.
            • 12:00 - 13:00 A few more things. If you have questions, feel free to drop them in the chat for this live stream. In a few minutes I will introduce the people who will help this event run seamlessly, and these amazing people in the audience will happily try to answer all of your questions. I will also watch the recording after the stream, and if any question was not addressed during the stream, I'll pick it up and either do a follow-up stream or address it through my blog. Don't forget to prefix your questions with the letter Q, so that the moderators can immediately spot them and differentiate questions from other comments. Of course, feel free to enjoy the discussion and talk about Fabric and everything related to data, because that's why we are all here. I'll start sharing my screen, and let's go.
            • 13:00 - 14:00 The title of this live stream is DP-600 Fabric Analytics Engineer, and in the first few minutes I'll do a short introduction of the exam itself and why you would consider taking it and becoming a certified Fabric Analytics Engineer. Before that, as every good host, I want to introduce myself. My name is Nikola, I'm originally from Belgrade in Serbia, but for the last almost nine years I have lived in the wonderful city of Salzburg in Austria, where I work as an independent data platform consultant and trainer, focusing mostly on Power BI and Microsoft Fabric.
            • 14:00 - 14:30 Living in Salzburg is the reason I chose the nickname Data Mozart. It doesn't have anything to do with a love of classical music, I'm honestly not a big fan, but you probably know that Salzburg is widely known as the birthplace of Wolfgang Amadeus Mozart, so I was, let's say, brave enough to use his last name as part of my nickname, and that's why I'm trying to make music from the data. I'll also try to follow what's going on in the chat, but please don't mind if I don't react to all of your messages immediately; I have a few screens all over the place, so I'll try to cover everything as much as possible.
            • 14:30 - 15:30 What's currently interesting, at least for me: I'm writing a book together with my friend Ben Weissman for O'Reilly, called Fundamentals of Microsoft Fabric. We are almost halfway done, so you can go to O'Reilly's platform and read the chapters that we have already written and published, and we expect to have a printed edition in quarter one or quarter two of next year. It depends on different things, but we hope to have it soon, and we hope there will not be too many changes in Fabric, because we have already had to rewrite our chapters a few times. That's something which is relevant for Fabric: things are developing and evolving so fast that it's hard to stay up to date, but we are doing our best.
            • 15:30 - 16:00 Before we start, a huge round of applause for these wonderful people you see on your screen: Par, Johnny, Govind, Tom, Pragati, Eugene, Miguel, Rubayat, Andy, and Shannon, and I will mention a few more people on top of that. These ten wonderful people will assist you today by providing answers to your questions. Let me briefly give you a little background on how it came to this point.
            • 16:00 - 17:00 When I started this initiative approximately a month ago, I didn't even dream that so many people would be interested in joining. Because I'm alone here and I tend to present, do demos, and show stuff, it's hard to do all of that and also monitor the chat and answer the questions, and I expect there will be many, many questions. So I was thinking about how to handle the questions from the audience, and I reached out to my friends from the community, who immediately responded and wanted to join and help with this initiative. I'm hugely thankful and grateful to all of them, and for everyone else in the audience, I hope you appreciate the effort and time these people will spend over the next four hours helping you master the skills relevant for the DP-600 exam.
            • 17:00 - 18:00 There are two more people I want to explicitly mention. One is Pam Spear, or Spire, sorry Pam if I didn't pronounce your last name correctly, from Microsoft. She is not helping with technical moderation, so she will not answer your technical questions, but for all the questions you have specifically about exam logistics, like whether there are any more free vouchers and so on, please ask, and Pam will make sure to answer them. (It's Spear? Oh, good, okay, then I'm good with that.)
            • 18:00 - 19:00 So what are the session goals for the next four hours? First, obviously, we all came here to learn something, so that's the main goal of today's session. But what shouldn't be underestimated at all is to have some fun. I'm a huge advocate of learning while having fun, because learning just for the sake of learning can sometimes be boring or overwhelming. Remember, in school we always loved those professors and teachers who brought some fun into the process of learning. That's what I'm trying to do. It doesn't always work well, I have to admit, but I'm doing my best, and I'll try again today to help you learn things relevant for passing the DP-600 exam and have fun along the way.
            • 19:00 - 20:00 Let's quickly do a short overview of the DP-600 exam and the certificate it leads to. Once you complete this exam, hopefully after watching this session, completing the challenges, and walking through Microsoft's documentation, the official title you will receive is Fabric Analytics Engineer, and there is a very nice badge that you can proudly put on your shirt. Essentially, DP-600 is a certificate that consists of only one exam, and this exam is called Implementing Analytics Solutions Using Microsoft Fabric.
            • 20:00 - 20:30 Why am I emphasizing that this is one exam? If you took certain Microsoft exams in the past, you're probably aware that certain certificates consisted of multiple exams. We don't need to go too far back: two or three years ago, for the Azure data engineering certificate, you had to pass two separate exams. Now it's one, but it used to be two, let alone going back to the MCSA and MCSE in SQL Server, where there were something like five different exams to pass. So the good news is that you need to learn and pass just the DP-600 exam and you are certified. This certification is for you if you are working, or are interested in working, in the fields of data analysis and data engineering. Obviously no one is prohibited from taking DP-600.
            • 20:30 - 21:00 But I would say the target audience for this certificate is professionals who work, or plan to work, in the data analysis and data engineering realm. And finally, to be able to pass this exam successfully, you should understand various features and services in Microsoft Fabric. What is the desirable candidate profile, so who should take this exam? This is a person who has expertise in designing, creating, and deploying enterprise-scale data analytics solutions. That sounds scary, so let's break it down.
            • 21:00 - 22:00 Some of the responsibilities included in the description of this role are preparing and enriching data for analysis, securing and maintaining analytics assets, and implementing and managing semantic models. You can also see who these professionals should help in their work. I don't want to read through these slides, because I will send the presentation to everyone who registered for this session, so you can read it on your own.
            • 22:00 - 23:00 What is important, and I want to stress this because a lot of people have approached me and asked "I'm doing this and that, I know this technology, I know this language, should I take DP-600 or not": this is the list of skills, in terms of programming languages and concepts, that you need to know in order to complete the DP-600 exam. We are talking about advanced Power BI skills, not data visualization. So if you are great at building nice-looking reports and dashboards, you may still want to spend some time learning the background stuff in Power BI: Power Query and DAX. You don't need to be a Power Query ninja who knows all of M and writes custom connectors, but you need to know Power Query, especially from the perspective of what can be done within it. And obviously DAX: if we talk about advanced Power BI skills, DAX cannot be excluded.
            • 23:00 - 24:00 Then T-SQL and KQL. This part changed slightly: for those of you who were considering taking the DP-600 exam before mid-November, there was no KQL included, it was T-SQL and PySpark. Microsoft then decided to change the DP-600 curriculum (we can talk about that at the end of the session if there is enough time), so PySpark is not officially there anymore. There may still be a question here and there asking how to do certain things in PySpark, but generally and officially, KQL, the Kusto Query Language, is now part of the DP-600 exam. We will come back later to explain what KQL is and in which scenarios it's used.
            • 24:00 - 24:30 There are also a lot of data modeling concepts and techniques you need to know in order to answer the questions related to data modeling. So data modeling in general, but with a special focus, because we are talking about analytics solutions, on dimensional modeling and building a star schema.
            • 24:30 - 25:30 As of the 15th of November the exam curriculum changed, and this is what you are going to be tested on. In terms of the share of questions you can expect during the exam, you can see percentages for each of the skills measured. There are three core skill areas measured in DP-600: maintain a data analytics solution, which increased after the 15th of November; prepare data, which is almost half of the exam; and implement and manage semantic models. We are going to break each of these three areas down into smaller subsets during the session and cover everything included in them.
            • 25:30 - 26:00 But let's take one step back, because I see there are more than 300 people currently on the stream, so maybe not all of you are familiar with Microsoft Fabric. Honestly, if you have never heard about Microsoft Fabric, you probably don't have internet, or you were in a coma for the last few months. Joking aside, let's first explain what Microsoft Fabric is. Those of you who know me know that I'm a huge football fan (or soccer, for our friends from the USA). In a football team you have players with different skills: they play in different positions and have different characteristics.
            • 26:00 - 27:00 For example, you have very fast players who can play on the wings and use their speed to create chances for your team, you have strong and tireless players who excel in defense, and finally you have players who are skilled at scoring goals, a goalkeeper, and so on. Now imagine yourself as the coach of this team. You have all these players at your disposal, and your task is to establish the system that best fits your players, with winning the match as the ultimate goal. Think of Fabric as the data football team: Microsoft provides you with the players, and your task is to integrate these players into a system that will ensure success. Some of these players are already well-known, seasoned stars, such as Azure Data Factory, Azure Synapse Analytics, or Power BI, but there are also some fresh faces, newcomers that should bring fresh blood to the team. We will examine most of these players throughout this session.
            • 27:00 - 28:00 At this moment it's important to understand that Microsoft Fabric is not some new fancy tool. It's essentially a suite of individual analytical tools and services that work together in synergy to provide a unified analytics experience. What does "unified" mean in the context of Microsoft Fabric? First and foremost, it's about a unified experience, whether you look at a single workspace experience, single sign-on, a single storage format, all the way to a single experience for managing security and collaboration.
            • 28:00 - 29:00 So, the players in Microsoft Fabric as of today, and you will hear me repeating "as of today" because things are changing very fast in the Fabric world. If this session had been delivered three or four weeks ago, before Ignite, then this part here, Databases, would not be on the slide. It was introduced at Ignite, and thanks to Microsoft for forcing me to make changes to my slides every few weeks. Never mind. So these are the players in Microsoft Fabric: we have OneLake as the central storage repository of the entire platform, and we are going to talk about OneLake in more detail very soon. Copilot in Fabric and Microsoft Purview are definitely out of the scope of this session, because they are also out of the scope of the exam.
            • 29:00 - 30:00 When it comes to the different ways to process and handle your data, from the perspective of the compute layer there are different experiences that you can see at the top, like Data Factory, Real-Time Intelligence, Databases, and so on. We will cover those that are relevant for the DP-600 exam. So what is out of scope of this session and of DP-600? I'm not saying you don't need to learn these things, but if you are just preparing to pass DP-600, you don't need to know Databases, Industry Solutions, Copilot, or Microsoft Purview. What's in scope: the Data Factory experience, Real-Time Intelligence, and analytics, where under this analytics umbrella we have multiple different things that we are going to examine, obviously including Power BI; we already mentioned that you need advanced Power BI skills.
            • 30:00 - 30:30 Also in scope are some concepts relevant to OneLake. Not necessarily everything that OneLake is, but certain concepts that come into play when we talk about the role of the analytics engineer and how you can leverage that role by using data stored in OneLake.
            • 30:30 - 31:00 Okay, let's start with the basics and break down some of the key terms you will hear when working with Microsoft Fabric. I thought it would make sense to structure this part hierarchically, so we'll kick it off from the highest level, which is the Fabric tenant. A Fabric tenant is a dedicated space for organizations to create, store, and manage Fabric items. There is often a single instance of Fabric for the organization, and it's usually aligned with Microsoft Entra ID, formerly known as Azure Active Directory. The Fabric tenant maps to the root of OneLake and sits at the top of this hierarchy.
            • 31:00 - 32:00 Then we have capacity. A capacity represents a dedicated set of resources that is available at a given time to be used. One tenant can have one or more capacities associated with it, and a capacity defines the ability of a resource to perform an activity or to produce output. Different items can consume different capacities at a certain point in time. Fabric offers capacity through Fabric SKUs (F SKUs) and trials, so if you don't want to buy capacity, you can play on your own using a trial; we will come back to talk more about that.
            • 32:00 - 32:30 The next level is the domain. A domain is a logical grouping of workspaces, and domains are used to organize items in a way that makes sense for your organization. To be honest, the data mesh approach, which became extremely popular in the last few years, introduced this concept of domains, and Fabric simply follows this path of providing tools to support a data mesh approach in your organization. That means you can create a domain for different departments, such as sales, marketing, or HR, and these domains are just a logical grouping of different workspaces.
            • 32:30 - 33:30 Since I mentioned workspaces: a workspace is the next level in this hierarchy. A workspace is a collection of items that brings together different functionality in a single tenant. It acts as a container that leverages capacity for the work that is executed, and it also serves as a control center: you control who can access the items in the workspace by implementing certain permission policies, which we are also going to examine throughout this session. Let's go with a simple example: in a sales workspace, users associated with the sales organization can create different items such as a warehouse, they can run notebooks, create semantic models, and so on.
            • 33:30 - 34:30 At the lowest level of this hierarchy are the items, or Fabric items, and essentially they are the building blocks of the entire platform. Leave aside tenants and capacities: without Fabric items, this whole story doesn't make much sense. Items are the objects that you create and manage in Microsoft Fabric, and there are different types, such as data warehouses, lakehouses, pipelines, semantic models, and many more. We are going to examine a lot of them today, but please keep in mind that although we are covering the items that are relevant for DP-600, there are many more beyond what is mentioned in this session.
            • 34:30 - 35:00 Let me quickly take a look at the chat. I see everything is fine so far, good. Then we are ready to jump into our first big lesson, which is called "Maintain a data analytics solution". This part consists of two subtopics: implement security and governance, and maintain the analytics development lifecycle. I took this from the official Microsoft "skills measured" page for DP-600, which is also included as part of the presentation, and we will go through, one by one, all of the points listed under both implementing security and governance and maintaining the analytics development lifecycle.
            • 35:00 - 36:00 Let's start with implementing security and governance; you can see which subtopics are included here. First, understanding the different levels of control in Microsoft Fabric. Obviously this platform is humongous, and there are different levels at which you can control who can access or see what. Let's start with the highest possible level, the workspace level, and imagine that we are in a huge company with a lot of workspaces, a lot more than these four, but for the sake of simplicity let's say they have four different workspaces: A, B, C, and D.
            • 36:00 - 37:00 In each of these workspaces, as we learned on the previous slide, there are multiple items. For example, in workspace A we can have a warehouse and a lakehouse. Then in workspace B... ah, I forgot to include the line here, so let's say in workspace C we have an eventhouse, and in workspace D we have a semantic model, probably many of them, but again for the sake of simplicity I just want you to be able to distinguish between different Fabric items. Then we have the different objects that can exist within a specific item. To give you a simple example: in my warehouse I will probably have tens or hundreds of tables and some views that I created, within the lakehouse I also have tables in Delta format, and so on. That is the individual object level; there are potentially, and commonly, many objects in each of those Fabric items.
            • 37:00 - 38:00 Finally, the lowest level, the most granular access you can control, is row-level or column-level access. This means that in DimProduct we might have 25 different columns and we want to restrict access to certain columns, like product color, for a specific group of users. That's the lowest level of granularity for controlling who can access what. So this is, from a high-level perspective, how Fabric workloads are structured in terms of granting access at these different levels.
            • 38:00 - 39:00 How does access control work in Fabric? Again, let's take a simple example. I have workspace A with three different items in it: a semantic model, a lakehouse, and a pipeline. If we control access at the workspace level, meaning we assign a user or a group of users to a certain role (and we will discuss which roles currently exist), they will get access to all the items in the workspace. That's workspace-level access. Item-level access, on the other hand, allows me to define a single, individual item within the workspace that can be accessed by a certain user or group of users. It lets me go one level deeper in this granular control and say: look, I don't want to give you access at the workspace level, because then you would also see the semantic model and the pipeline; you just need access to the lakehouse, so I will give you access just to the lakehouse from this workspace. Those are, from a 10,000-foot perspective, the two options for granting access in Microsoft Fabric. Now let's examine in more nuance how this works.
            • 39:00 - 40:30 Let's start with the highest level, which is workspace access control. As I already mentioned, there are four different roles that can be defined at the workspace level: Admin, Member, Contributor, and Viewer. What I strongly suggest you do for DP-600 is to learn and memorize what each of these roles can do at the workspace level, because there will for sure be a question asking something like: user A needs to do this, what is the minimum permission, or the role with least privilege, that allows them to do it? So try to remember this table. It's taken from Microsoft's official documentation, so you can also go there and find it if that's easier for you, but it is definitely super important and it is 100% tested on DP-600.
            • 40:30 - 41:30 So, workspace roles: there are four of them, and you can see what each of them can do. Remember, when you assign a user or a group of users to a specific role, they will have access to all the items in the workspace. If you assign someone to the Viewer role, they will see all the items in the workspace; they can't change them, but they can see them. The other roles can do many more things, depending on the role, so please be aware of this when you assign those roles, because in real life you should think twice. And again, think about whether you assign an individual user or an Entra ID or Microsoft 365 group of users. In theory you can do both, but in real life, especially in a huge organization, you probably don't want to assign individual users to roles; it's a recommended practice to handle this through Entra ID or Microsoft 365 groups.
            • 41:30 - 42:00 So you first assign users to an Entra ID or Microsoft 365 group, and then you assign this group to a specific role; it's much easier to control. Then we move to the next level, which is item-level access control. Item-level permissions are used to control access to individual Fabric items within a workspace. Item permissions are relevant to a particular item and don't apply to other items, so unlike workspace roles, they are relevant just for one specific item. You should use item permissions to control who can view, modify, and manage individual items in a workspace.
            • 42:00 - 43:00 Why is this important? Because you may want to use the item-level permission model to give users access to a single item in a workspace that they otherwise don't have access to. I don't need to have access to workspace A, but if someone shares the lakehouse from workspace A with me, I will have access to that lakehouse. When you share an item with a user or group, you can configure different item permissions, and sharing an item grants the user the Read permission for that item by default. Read permissions allow users to see the metadata for that item and view any reports connected with it; however, they don't allow users to access the underlying data through SQL or OneLake.
            • 43:00 - 45:00 Different Fabric items have different permissions. In this example I'm showing you the item permission model of the Lakehouse item, and I strongly suggest going to this link, "item permission model", where you will see, for each specific item that can exist in Microsoft Fabric, which permissions can be granted and what the effect of granting them is. Granting people access on this lakehouse, if I don't check anything, means they can access the lakehouse from the OneLake hub, but none of the tables can be accessed; so just the lakehouse, but none of the tables. Then it depends on what you set as the permission granted while sharing: Read, Edit, Share, and this one is interesting, "Read all with SQL analytics endpoint", which means, as you see in the table on the right-hand side, they can read data through the SQL analytics endpoint of the lakehouse or warehouse, and so on. There is also "Read with Apache Spark", meaning they can use Spark to read lakehouse data, and also read it through the Lakehouse explorer. This topic alone could take 60 minutes, therefore I included this link, and I strongly encourage you to go there and try to understand this concept of item-level access control.
            • 45:00 - 45:30 The next concept is row-level security, and we are not talking about row-level security in Power BI. That is not a topic for DP-600; I think it is included in PL-300, the Power BI Data Analyst exam. We are talking about implementing row-level security in the warehouse and in the SQL analytics endpoint of the lakehouse. Implementing row-level security there enables you to control access to rows in a table based on user roles and predicates.
            • 45:30 - 46:30 To implement row-level security, you need to ensure that certain prerequisites are in place. The first step is to define a security policy, and it starts with determining the roles and predicates you want to use to control access. Roles define who can access the data, whereas predicates define the criteria for access. This means security predicates are conditions that determine which rows a user can access. You can create these security predicates, as you see in this example, by leveraging inline table-valued functions in T-SQL. The first step is creating a separate schema for security; then comes the function where, essentially, I take the username of the user who is logged in and use this value to filter out the records that this specific user can see; and finally I use this function to create a security policy.
            • 46:30 - 47:00 Once again, this is row-level security at the level of the warehouse or the SQL analytics endpoint of the lakehouse; it has nothing to do with the row-level security you establish on a semantic model for Power BI. To be able to create row-level security, you either need to be an Admin, Member, or Contributor on the workspace, or to have elevated permissions on the warehouse or SQL analytics endpoint. There are certain permissions that are specifically granted for working with warehouses and SQL analytics endpoints, and those are the elevated permissions meant here.
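            The slide itself isn't reproduced in this transcript, but the pattern the speaker describes (a security schema, an inline table-valued function that checks USER_NAME(), and a security policy) corresponds to the standard T-SQL approach. A minimal sketch follows; the table dbo.FactSales, its SalesRep column, and the object names are hypothetical placeholders, not taken from the workshop.

                -- Hypothetical example: filter dbo.FactSales so each user only sees
                -- rows whose SalesRep value matches their login name.

                -- 1. Separate schema that keeps security objects apart from data objects
                CREATE SCHEMA Security;
                GO

                -- 2. Inline table-valued function used as the security predicate:
                --    returns a row only when the predicate column matches USER_NAME()
                CREATE FUNCTION Security.fn_salesrep_predicate (@SalesRep AS VARCHAR(128))
                RETURNS TABLE
                WITH SCHEMABINDING
                AS
                RETURN
                    SELECT 1 AS predicate_result
                    WHERE @SalesRep = USER_NAME();
                GO

                -- 3. Security policy that applies the predicate as a filter on the table
                CREATE SECURITY POLICY Security.SalesRepFilter
                ADD FILTER PREDICATE Security.fn_salesrep_predicate(SalesRep)
                ON dbo.FactSales
                WITH (STATE = ON);

            With the policy enabled, any query against dbo.FactSales returns only the rows whose SalesRep column matches the caller's USER_NAME(); the filtering happens at query time, without changing the queries themselves.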
            • 47:00 - 47:30 The next concept is object-level and column-level security. Object-level security enables access restriction at the database object level, such as a table, whereas column-level security allows you to restrict access to specific columns of a table, so it's more granular than object-level security. Please keep in mind that object-level and column-level security are applied at the database level, which means that whenever any application or reporting platform, including Power BI, tries to access the data in the warehouse or the SQL analytics endpoint of the lakehouse, these rules are enforced.
            • 47:30 - 48:30 This means that if you enforce object-level, row-level, or column-level security in a warehouse or in the SQL analytics endpoint of the lakehouse, and you have a Power BI semantic model in Direct Lake mode, all the queries will automatically fall back to DirectQuery. Please keep this in mind, because it can show up as a trick question in DP-600. I'm not sure if that's going to happen, but this is, I wouldn't say a downside, more of a consideration and limitation in the current architecture and way of working with Direct Lake semantic models: when you implement this, there is no Direct Lake, and DirectQuery will be enforced.
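            As a rough companion to the row-level security sketch above, here is what object-level and column-level security can look like in T-SQL, using standard GRANT and DENY statements. The role SalesAnalysts and the table and column names are hypothetical placeholders used only for illustration.

                -- Hypothetical example of object-level and column-level security
                -- in a warehouse or SQL analytics endpoint.

                -- A database role to which the restrictions apply
                CREATE ROLE SalesAnalysts;

                -- Object-level security: block the role from an entire table
                DENY SELECT ON OBJECT::dbo.FactFinance TO SalesAnalysts;

                -- Column-level security: expose only selected columns of DimProduct,
                -- so other columns (for example a cost column) stay inaccessible
                GRANT SELECT ON dbo.DimProduct
                    (ProductKey, EnglishProductName, Color)
                    TO SalesAnalysts;

            As the speaker notes, enforcing any of these rules on a warehouse or SQL analytics endpoint means a Direct Lake semantic model on top of it falls back to DirectQuery.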
            • 48:30 - 49:00 Again, these are the necessary permissions to create those three. Then we have folder-level access control. OneLake data access roles only apply to users who are accessing OneLake directly. Fabric items such as SQL analytics endpoints, semantic models, and warehouses have their own security models and access OneLake through a delegated identity. This means that users can see different items in each workload if they are given access to multiple items.
            • 49:00 - 49:30 In this case we are talking about what is currently supported: currently it's supported only for the Lakehouse item, but again, as of today, so this will probably be supported for other items in the future. Currently it works only for the lakehouse, and this is what it looks like from the user interface when you define a new role in a lakehouse; you can see that the feature is still in preview.
            • 49:30 - 50:30 In this example I'm creating a role called "Read Adventure Works", and I'm including only the tables that are relevant for the Adventure Works database. I also have some tables from the Contoso database, but when a user who belongs to this role opens this lakehouse, they will not see the Contoso tables at all. So first you create a role, then you choose the tables, or files, since you can also define the role over files in the lakehouse, and finally you assign the role to a user or, which again is the recommended way, to a group of users.
            • 50:30 - 51:00 Okay, so the next topic is sensitivity labels. Sensitivity labels are defined in Microsoft Purview Information Protection and applied on items, and the idea is obviously to protect your sensitive content against unauthorized access. Sensitivity labels are one of the most important components in helping your organization meet its governance and compliance requirements, and labeling your data correctly with sensitivity labels ensures that only authorized people can access the data.
            • 51:00 - 52:00 There are two ways to apply sensitivity labels in Microsoft Fabric, and before I show them, keep in mind that the labels first need to be created in Purview. You can't create a sensitivity label within Fabric itself; you first go to Purview, create the sensitivity labels, and then Fabric will offer you those labels that were previously created in Purview. So, two ways to apply: you can either use the flyout menu or apply the label directly on the item. You open the item and go to the item settings, like in this case where I have a KQL database, and under Sensitivity label I can set different sensitivities; in this example it's Confidential. That means that for everyone who tries to access this data downstream, either by querying it directly or through a reporting platform and so on, this sensitivity label will be propagated and applied.
            • 52:00 - 52:30 And finally we have endorsement, a feature that makes it easier for users in the organization to find high-quality or trustworthy content and data. You can easily recognize endorsed items in a workspace because they are clearly labeled and carry a badge. When we talk about the badges for endorsement, there are three different badge types.
            • 52:30 - 53:00 The first one is Promoted. With Promoted, the item creators think the item is ready for sharing and reuse. Any Fabric or Power BI item except Power BI dashboards can be labeled as promoted, and any user with write permissions on an item can promote it.
            • 53:00 - 53:30 Then we have Certified. Certified means that an organization-authorized reviewer has certified that the item meets the organizational quality standards, can be regarded as reliable, and is ready to use across the entire organization. Again, any Fabric or Power BI item except Power BI dashboards can be certified. Any user can request certification for an item, but only users specified by a Fabric administrator can actually certify items; that's a big difference compared to Promoted.
            • 53:30 - 54:30 And finally we have Master data. This means the data in the item is a core source of organizational data, and it's usually used to indicate that the data item is to be regarded as a single source of truth for certain kinds of organizational or business data, such as product codes or customer lists. The Master data label can be applied to items that contain data, such as lakehouses and semantic models, so you need to have physical data to be able to apply this badge, and again only users specified by the Fabric admin can label data items as Master data. Here is a simple example of these badges in a workspace: you see some items labeled as Master data, some as Certified, some as Promoted, and so on. This is how the badges appear when you open the workspace.
            • 54:30 - 55:00 We are done with the first subtopic, not bad. Let me just take a look... okay, I see that everything is running fine so far. Thanks, everyone in the chat, I really appreciate the help and support; I wouldn't be able to monitor everything and answer all these questions myself, so thanks again, everyone.
            • 55:00 - 55:30 Okay, let's jump into our next subtopic, which is maintaining the analytics development lifecycle. We are starting with a very cool feature called version control for a workspace. This means that you can now synchronize the content of the workspace with your source control system. Doing this is fairly easy and straightforward from an end-user perspective. In the web UI there are just two things you need to pay attention to: Source control and Git status. Git status says, for example, that this report here is not committed, and there is a warning sign on my Source control button telling me I have to pay attention to something.
            • 55:30 - 56:30 Again, from an end-user perspective that's enough: you can simply choose to synchronize the content of your workspace. But I would encourage you, especially if you're working in larger organizations where multi-developer teams create and manage content, to learn and understand the basic concepts relevant for version control. Those are Git concepts like what a branch is, what commits are, what pull requests are, what it means to merge something, and so on. Currently there are two supported providers for this feature in Fabric, Git in Azure Repos, or let's say three: GitHub and GitHub Enterprise as well.
            • 56:30 - 58:00 What can be done? You can sync the content from Git, which will overwrite the content that you currently have in the workspace; you can commit changes to Git, so go in the opposite direction; and you can also branch out the content of the workspace to a brand new workspace. Currently almost all Fabric items are supported, and if there are some unsupported items, this doesn't mean that the process of committing changes or doing a pull request will fail; unsupported items will simply be ignored during the process. It's also important to keep in mind that only a workspace admin can manage connections. I see some donations, I really appreciate that. Thank you, Victor, thank you so much.
            • 58:00 - 59:00 The next topic we are going to cover is the Power BI Desktop project, also known among friends as PBIP. This one is still in preview, but if you ask me, it is one of the greatest enhancements to Power BI in the last few years. What is a Power BI Desktop project? Let's first demystify this. It introduces a new way to author and collaborate on Power BI projects. When you are done with your work and you save it as a Power BI project, there will be two separate folders, and I will show you the demo in a few minutes, I know we all like to see things in action, but bear with me: a report folder and a semantic model folder, both saved as individual plain-text files in a very intuitive form.
            • 59:00 - 59:30 So let's break down the key advantages of using the PBIP format when working with Power BI. First is text editor support. The definition files are JSON-format text files which contain the semantic model and report metadata, and these files are human readable. While project files can be edited with simple text editing tools like Notepad or Notepad++, it's of course better to use a more sophisticated code editor like Visual Studio Code or similar, which provides a rich editing experience including Git integration.
            • 59:30 - 60:30 Then, scripting and editing definitions: you can create scripts using either TMSL or TMDL, the Tabular Model Scripting Language or the Tabular Model Definition Language, which is now the new standard; again, I will show you how it looks. Next, source control: Power BI semantic model and report item definitions can now be stored in a source control system like Git. I keep repeating this: a Power BI semantic model can now be stored in a source control system like Git. Why is this so important? For years, and still today, PBIX, the regular Power BI format we have known since Power BI appeared, has been just a binary file, and you can't really do version control on a binary file, you can't track changes, and so on. This is why the Power BI Desktop project is such a big deal: now you can do proper source control. Also, for CI/CD operations, you can use systems where developers in your organization submit a proposed change, and the system validates the change and performs a number of quality gates before applying the change to the production system. These quality gates can include code reviews, automated testing, automated builds, and so on.
            • 60:30 - 61:30 So let's quickly go and I will show you what I'm talking about. I'm in Power BI Desktop, and the first and most important thing, because the Power BI Desktop project format is still in preview, is that it doesn't let you save your Power BI work as PBIP out of the box. You need to go to File, Options and settings, then under Options find Preview features and enable the Power BI Project save option. There are two additional sub-options you can choose: one of them you probably don't want yet in real life, and the other one says to store the semantic model using the new TMDL format. So you first need to enable this preview feature.
            • 61:30 - 62:30 Then, once you save your Power BI work as PBIP, this is what you get: I saved my Power BI report as PBIP and I have two different artifacts, a report and a semantic model. Let's open the semantic model. Under "definition", for example, I can see different properties of this model, like the compatibility level of the database, and I can also see each individual table. If I click on Collisions, this is what I was talking about: this is TMDL, and it's human readable.
            • 62:30 - 63:30 For this table you can see all of its columns with their formats, data types, source columns, and all the other properties defined for each specific column. Since you now have this in a text file, if someone makes a change, you can easily compare two text files and see what has been changed, let alone that you can apply a change here, save it back, and it will be applied. For proper source control this is really a huge feature. As I said, it's just a text file, so for every other table, and also for the report itself, you can see all the details about the pages, the visuals, and everything else, stored in this nice format which allows you to manipulate it easily.
            • 63:30 - 64:30 Okay, deployment pipelines. That's also one of the improvements that Microsoft introduced not so long ago; let's be honest, I think deployment pipelines were introduced maybe three or four years ago for Power BI items, and now, since Fabric is a thing and Power BI is just part of Fabric, we are talking about deployment pipelines for Fabric items. In a nutshell, they represent an efficient and flexible way of automating the movement of Fabric artifacts through the different stages of the development lifecycle. With that in mind, there are usually three stages, or three environments, in the development lifecycle for Fabric deployment pipelines; I said "usually", and you will soon understand why.
            • 64:30 - 65:00 The first one is development, and you should use the development environment for design, for review, for playing around with Fabric content, and so on. This means you can start small and use minimal datasets for development, and once you confirm that the content is ready for review, you push it to the next stage, which is test. This is a pre-production environment where you should test and verify that the content meets certain criteria in terms of quality, performance, and so on. In this environment you should run tests on larger data volumes that are more realistic for the production environment, and you should also test, for example, whether the Power BI app works properly, to confirm that it's fully ready for end users.
            • 65:00 - 65:30 Once you confirm that the content is at the level necessary to meet end users' expectations, you push it to the third and final stage, which is production. This is the final version of the content, and it needs to provide the highest (whoops, apologies) possible level of quality and data accuracy.
            • 65:30 - 66:30 Please keep in mind, again, that this structure is not set in stone, as you can define between two and ten deployment stages depending on your specific needs. That is why I said development, test, and production is the usual workflow. If you don't need development and just want test and production, that's also fine; if you want to introduce some additional layers in between, that's also fine. So between two and ten different environments are supported for deployment pipelines. You can also choose between full and selective deployment, meaning I can either move all the items from the workspace to the next environment or select just specific items. If I have, say, 100 different items in my development environment and I made changes to just two or three of them, I will push only those two or three items to the next stage; no need to do a full deployment.
            • 66:30 - 67:30 Oh, no moderator? Sorry, give me a second, I didn't see Johnny and Andy. Just a moment, I need to assign them as moderators; this will take me a few seconds, apologies, folks. So, Johnny and Andy, now go, guys, you have the power. Okay, thanks, I just spotted that, sorry about that. Let's move on with deployment pipelines and talk about them in more detail.
            • 67:30 - 68:00 This is the usual workflow that is typical for development lifecycle management. A few things to keep in mind: the content in the workspace can be different in each stage. For example, if you take a closer look at the size of the data source in this illustration, on the left-hand side, you will probably notice that it starts with the smallest one, it's a table, and its size grows as we get closer to the production environment. Next, you may also spot that the report, which is represented by a chart icon here, also changes between the environments. You can also define separate workspaces for each stage. Those are the things you should keep in mind when defining the workflow for deployment pipelines.
            • 68:00 - 68:30 Obviously, you can use REST APIs and DevOps to automate the process, and you can also configure rules to allow changes; we'll talk about defining rules on the next slide. What you should also keep in mind is that once you deploy content from, let's say, the development stage to the test stage, the equivalent existing content will be overwritten. To simplify: if you have a semantic model named Sales in your test environment and you deploy the Sales model from the development environment, the existing Sales semantic model in test will be overwritten.
            • 68:30 - 69:30 These are the items that are currently supported for deployment pipelines in Microsoft Fabric, and again, always check the list of supported items; you see the link down below, and maybe this link should be written in a bigger font. Things are changing rapidly here, as everywhere in Fabric, so the list of supported items is growing, I can't say week over week, but let's say month over month. This is the list of items that are currently supported for deployment pipelines. The same as for workspace roles, try to memorize this list, because you will be tested on it in DP-600.
            • 69:30 - 70:00 For example: you implement a deployment pipeline, you have this and that in your workspace, and the question is which of these items will or will not be moved to the next stage. Therefore, please try to memorize this. And this is the usual deployment pipeline workflow; this is the new user interface that was recently introduced, and once you start creating your deployment pipelines, this is what you will see as an end user.
            • 70:00 - 70:30 Item pairing is the process by which an item in one stage of the deployment pipeline is associated with the same item in the next stage. Paired items appear on the same line in the pipeline content list. Items that are not paired, for whatever reason, appear on a line by themselves; as you see in this illustration, I have a dataflow which appears only in the source but doesn't appear in the next stage, so this one is not paired.
            • 70:30 - 71:30 Keep in mind that paired items remain paired even if you change their names, which brings us to the conclusion that paired items can have different names; that's one thing. Another thing to keep in mind regarding pairing is that items added after the workspace is assigned to a pipeline are not automatically paired. This means you can have identical items in adjacent workspaces that are not paired, so you need to perform the pairing afterwards. These are some useful examples I took from the Microsoft Learn documentation, which show different cases of when pairing occurs and when it does not. Again, try to walk through this and understand it, because item pairing is one of the very important concepts in deployment pipelines.
            • 71:30 - 72:00 very important concepts in deployment pipelines. And finally, I already mentioned deployment rules. When you're working with deployment pipelines, different stages may have different configuration settings; for example, each stage may have a different database or different query parameters. Also, the development stage might be used only to query sample data from a database, while you then have, for example, test and production stages that query the entire database. Configuring deployment rules
            • 72:00 - 72:30 enables you to allow changes to content when you deploy content between different environments. For example, if you want a semantic model in the production stage to point to a production database, you can define a rule for that semantic model. The rule is defined in the production stage, under that specific semantic model, and once you define the rule, content deployed from test to production will inherit the value from the deployment rule; it will always apply as long as the rule is not
            • 72:30 - 73:00 changed and as long as the rule is valid. So let's wrap up this first topic of maintaining a data analytics solution with a story about creating reusable assets in Power BI. Or, sorry, first we are talking about performing impact analysis. Performing impact analysis allows us to understand and
            • 73:00 - 73:30 review the potential impact of changes that we are making to a Fabric item. There are two places where we can perform impact analysis: the first is the lineage view in the workspace (you see this icon in the bottom right corner), and the other one is the item details page. If you click on a specific item and then open the lineage drop-down menu, there is an
            • 73:30 - 74:00 option to see impact analysis. Once you click on impact analysis, you can choose to see it from two different angles, or perspectives. You can browse by item type, as you can see on the left-hand side; this means, for example, that this dataset will have an impact on five different reports and three different dashboards. That's by item type. Or, if I check "browse by workspace", then I can also see what
            • 74:00 - 74:30 the impact of this dataset is in terms of which workspaces it will affect. So: by item type or by workspace. And there is a cool option when you change something: definitely use "notify contacts", so that all the people who are listed as contacts for the given item are notified that there is a change, and then they can act
            • 74:30 - 75:00 accordingly. So, next topic: semantic model deployment through the XMLA endpoint. First of all, what is an XMLA endpoint? Let's take one step back before we explain the process of semantic model deployment. Essentially, the XMLA endpoint is the feature that allows communication with a semantic model: the same way we people communicate with each other, different tools and different services communicate with each other. The semantic model has
            • 75:00 - 75:30 this XMLA endpoint, which is exposed by the semantic model, and different external tools can then use it to communicate with the semantic model. When I say communicate, this means either reading the data from the semantic model, or reading and writing the data to the semantic model. So you are choosing between just allowing someone to access the
            • 75:30 - 76:00 data and read it, or also allowing changes from outside through this XMLA endpoint. By default it's read-only in your Fabric tenant, but read-write definitely allows more flexibility: more options for managing and configuring models, and for implementing some advanced modeling techniques and concepts, such as custom partitions, for example. And when we talk about XMLA endpoints, there are two
            • 76:00 - 76:30 mainstream external tools that are used in conjunction with semantic models and XMLA. The first is DAX Studio; DAX Studio only needs the read-only XMLA endpoint, because it just reads the data from your semantic model. The other is Tabular Editor, in this case the free version (there are two versions of the Tabular Editor tool). As a small digression, I sincerely hope that you are using both DAX Studio and Tabular Editor if you work with
            • 76:30 - 77:00 tabular models; they are amazing tools. Again, DAX Studio is completely free; Tabular Editor has two versions, Tabular Editor 2, which is free, and Tabular Editor 3, which is commercial. For most of your regular tasks, Tabular Editor 2, the free version, is totally fine. And with Tabular Editor 2, if the read-write option on the XMLA endpoint is enabled, you can also save changes: you
            • 77:00 - 77:30 can make changes to your semantic model, apply those changes back, and deploy the model. From Tabular Editor you can deploy a model directly to a workspace, of course assuming that you have enough permissions to do that. And yes, creating reusable Power BI assets: I slightly mixed up the order of slides, so we are wrapping up this topic of maintaining a data analytics solution now.
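            A side note on the XMLA endpoint itself: external tools reach a model through a connection string of the form powerbi://api.powerbi.com/v1.0/myorg/&lt;Workspace Name&gt;, the same endpoint DAX Studio and Tabular Editor use under the hood. Below is one hedged sketch of reading from a model over XMLA from Python; it assumes the pyadomd package and the ADOMD.NET client library are installed (Windows only), and the workspace, model, and table names are placeholders.

            ```python
            from pyadomd import Pyadomd  # assumes pyadomd + the ADOMD.NET client library are installed

            # Workspace and model names are placeholders; copy the real connection string from the
            # workspace settings or from DAX Studio. Authentication details (for example a service
            # principal or User ID/Password) are omitted here for brevity.
            conn_str = (
                "Provider=MSOLAP;"
                "Data Source=powerbi://api.powerbi.com/v1.0/myorg/DP600 Bootcamp;"
                "Catalog=Sales;"
            )

            dax = "EVALUATE TOPN(10, 'DimCustomer')"  # a read-only query works with the default endpoint

            with Pyadomd(conn_str) as conn:
                with conn.cursor().execute(dax) as cur:
                    for row in cur.fetchall():
                        print(row)
            ```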
            • 77:30 - 78:00 So I'll go back and say: when I was a Power BI beginner, at least in my case, one of the first lessons I learned was how to connect and get data from various data sources. Remember Dashboard in a Day and all that stuff: how to connect and get data into Power BI. Think of a SQL database, think of Excel files, and so on. And you know what, I found myself very frequently in a
            • 78:00 - 78:30 position of creating the same, or almost the same, semantic model (it was called a dataset back then) for many of my reports. I would go and grab, say, a sales table, a product table, and a customer table; I had these tables in all of my reports. Now, if I had to repeat this tedious task each and every time I created a new report, I assume other people also had to do the same thing. Wouldn't it
            • 78:30 - 79:00 be great if we could somehow reuse this main set of data and then adjust it according to our specific needs? That's where the concept of reusable datasets, or reusable semantic models in this case, kicks in. Basically, the idea is to create one main semantic model, let's call it a golden dataset or golden semantic model, and then enable multiple users to use
            • 79:00 - 79:30 this golden model as a starting point for building their reports. And it's not only about reusability: once you confirm that the dataset or semantic model contains accurate, curated data, then, going back a few slides, we can label this dataset as promoted or certified, so that others within the organization know that this semantic model was checked and confirmed for
            • 79:30 - 80:00 further usage. From there you can open a new Power BI Desktop file and use this dataset as a source for your new report; I will show you this in a few seconds. Other than that, you can also leverage the Power BI template file. Essentially, when you're done with your work in Power BI Desktop and you want to save it, instead of saving it as a regular PBIX file you can choose to save it as a Power BI template, or PBIT as it's
            • 80:00 - 80:30 abbreviated. The only difference between the two is that the template file doesn't contain any real data; it contains only metadata. So once you or someone else opens this PBIT file, they will need to refresh the data in it to be able to see the real data in Power BI Desktop. Then we have PBIDS, the Power BI data source file, which includes all the connection details that are used in
            • 80:30 - 81:00 a specific file. The purpose of this format is to quickly move connections from one solution to another. Usually, especially when you're building complex reports, you have multiple connections within the file, and instead of opening and creating all of these connections again in a new PBIX file, you can simply export a PBIDS file from the PBIX and import it into your new file.
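            A PBIDS file is just a small JSON document describing connections, so it is easy to generate. The sketch below approximates the documented PBIDS schema for a SQL Server (TDS) connection; the field names are from memory and the server and database names are placeholders, so check the Power BI documentation for the exact shape.

            ```python
            import json

            # Approximate PBIDS shape for a SQL Server (TDS) connection; verify the exact schema
            # in the Power BI docs. Server/database names are placeholders.
            pbids = {
                "version": "0.1",
                "connections": [
                    {
                        "details": {
                            "protocol": "tds",
                            "address": {
                                "server": "myserver.database.windows.net",
                                "database": "AdventureWorksDW",
                            },
                        },
                        "mode": "Import",
                    }
                ],
            }

            with open("AdventureWorksDW.pbids", "w") as f:
                json.dump(pbids, f, indent=2)
            # Double-clicking the resulting file opens Power BI Desktop with the connection pre-filled.
            ```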
            • 81:00 - 81:30 And shared models: the concept we already covered, where multiple users can leverage models that were already created. Of course, there are various permissions here that can additionally be introduced to control what happens and what the rights for further usage of this data are, such as "can modify", "can share onwards", and so on. I want to quickly show you
            • 81:30 - 82:00 how to use a shared semantic model. So I'm in Power BI Desktop; I imported some data here, but let's not focus on that now, it will be used for later demos. If I go here at the top and select Get data, under Get data I have the option to use Power BI semantic models. These are the models that someone already created and deployed to a Power BI workspace, and now, instead of going and creating the same model over
            • 82:00 - 82:30 and over again, I can simply click here on Power BI semantic models, and then all the models where I as a user have access rights can be used to build a report on top of. Let's use something very simple: I'll connect to this CMS import model. So now Power BI establishes a live connection to this semantic model. It will show me all the tables
            • 82:30 - 83:00 that currently exist, and if I don't need all the tables I can also pick just some of them, let's say the first three, or let's take this one, and then submit. So it allows me to connect to this model and then build a report on top of it. This report is a so-called thin report, because it doesn't contain its own semantic model; it leverages another semantic model which is deployed in a Power BI workspace. We are just
            • 83:00 - 83:30 building a report without the model itself. And if I now build a report by using tables from this deployed semantic model and I click here on the Publish button, it will publish the report only. Usually, when you import the data and you publish to a workspace, you get two items: a report and a semantic model. In this case, just the report is published, and this report in the background points to the shared semantic model.
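            Thin reports are one way to reuse a shared model from Power BI Desktop; on the Fabric side, notebooks can reach the same shared semantic models through semantic link (SemPy). A minimal sketch, assuming the semantic-link package that ships with the Fabric Spark runtime; the model, table, and column names are placeholders.

            ```python
            import sempy.fabric as fabric  # semantic link ships with the Fabric Spark runtime

            # List semantic models visible from the current workspace context.
            print(fabric.list_datasets())

            # Run a DAX query against the shared model from the demo (names are placeholders).
            df = fabric.evaluate_dax(
                "CMS import model",
                """
                EVALUATE
                SUMMARIZECOLUMNS('DimCustomer'[Country], "Customers", COUNTROWS('DimCustomer'))
                """,
            )
            display(df)
            ```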
            • 83:30 - 84:00 Okay, good, we covered the first part, so we are ready for the biggest portion of this exam, which is Prepare data. Let's do about 15 more minutes of presentation and then we'll take a short 5-to-10-minute bio break for stretching legs and grabbing a coffee or whatever. So yeah, let's jump into the
            • 84:00 - 84:30 Prepare data part. This one is the biggest one. Previously, "query and analyze data" was a subtopic on its own; now it's incorporated into Prepare data, and we have three huge subtopics within this big one: get data, transform data, and query and analyze data. So let's start with get data. Obviously, the first thing we need to do when we start building our solutions is
            • 84:30 - 85:00 to create a data connection. Now, depending on where your data is, there are different types of sources. The two main groups of data sources are cloud data sources and on-premises data sources. For cloud data sources, like SharePoint, Dataverse, Azure SQL DB, ADLS, and so on, you can simply connect from your Fabric environment. Obviously, you still need to provide
            • 85:00 - 85:30 credentials and all the details for connecting to the data source, but you don't need to introduce any additional components when you're connecting to cloud sources. That is not the case when you're connecting to on-premises data sources such as CSV files, Excel files, SQL Server, an Oracle database, and so on. For on-premises data sources, to be able to connect, you need
            • 85:30 - 86:00 another piece of software, which is called the on-premises data gateway. It's free; you just need to download it and keep it up to date. For all of you who work with Power BI, you're probably already familiar with the on-premises data gateway. It's also updated frequently, I think at almost the same cadence as Power BI Desktop, so every month. Essentially, the on-premises data gateway is software that
            • 86:00 - 86:30 behaves as a bridge between your Fabric environment and your on-prem data sources. It checks the credentials, it checks the queries, it passes the queries along together with the credentials, and once the data is retrieved from the on-prem data source, it's collected on the data gateway machine and pushed back to Fabric. So the main difference between creating a connection to cloud sources and to on-prem data sources is that
            • 86:30 - 87:00 with on-prem data sources you need to have the on-premises data gateway software installed. Now, how can we connect to the world outside of Fabric? There are different options in Microsoft Fabric, and as soon as you start working with Fabric, or when you start learning about Fabric, you will realize that Fabric is all about options: one and the same thing can be done in multiple
            • 87:00 - 87:30 different ways. Almost every task, every operation, can be done in multiple different ways, and it doesn't necessarily mean that one is better than the other. In some circumstances, of course, we know which way to go, but sometimes it boils down to your preference, and creating a data connection is not an exception. So we can create a connection to different data sources by using different items in Fabric. In this session
            • 87:30 - 88:00 we will talk about creating a data connection by using pipelines, by using Dataflows Gen2, and by using notebooks. Of course, you see some other items here; mirrored database is not part of DP-600, but I wanted to include it just to give you a heads-up that there are additional items in Fabric that can be used to create and establish a data connection. Okay, when we talk about ingesting or accessing data in Microsoft Fabric,
            • 88:00 - 88:30 let's start with this high-level overview. This is again a 10,000-foot overview of the Fabric platform in general. The central part probably looks familiar: we have OneLake, we have different compute engines like Spark, T-SQL, KQL, and the Analysis Services engine, and we have different workloads, or different experiences, here
            • 88:30 - 89:00 at the top. This is a little bit outdated because there are no databases on it, but never mind; for what I want to explain here it's absolutely enough. So this is core Fabric, let's say, and when you ingest data into Fabric it will be physically ingested here, within OneLake. But then there are other options, not to bring data in, but to
            • 89:00 - 89:30 leverage data from external services and external sources within Fabric, the same way as if it were natively stored in Fabric. It looks a little bit confusing, but let me explain it one by one and then you will understand for sure. Let's go from the left-hand side: we have mirroring here on the left. Mirroring essentially allows you to establish a connection to different non-Fabric data sources;
            • 89:30 - 90:00 currently, as of today, there are five data sources supported for the mirroring feature: Azure SQL DB, Snowflake, Azure SQL Managed Instance, Azure Databricks, and Cosmos DB. With mirroring, essentially, we are creating a replica of the data that exists in that system, a near-real-time replica within Fabric. In this case, with mirroring, we
            • 90:00 - 90:30 are creating a physical copy of the data. So when you mirror data from, let's say, Azure SQL DB into your Fabric environment, you will have a physical copy of this data, in near real time, in Microsoft Fabric. That's how mirroring works. Open Access APIs: let's leave those for the moment, because they're something that you probably won't use on a daily basis as an analytics engineer. Instead, I want to focus on
            • 90:30 - 91:00 this concept here at the bottom, which is called multicloud shortcuts, and this is, in my opinion, one of the greatest features in the entire Fabric. What shortcuts allow you to do is create a virtual copy of the data in Fabric and use this data the same as if it were stored in Fabric. Currently, for external data sources,
            • 91:00 - 91:30 you see what is supported: ADLS (Azure Data Lake Storage) Gen2, Amazon S3, Google Cloud Platform, and Dataverse. Now let's talk about how all roads in Microsoft Fabric lead to... you know the old saying, all roads lead to Rome; in Fabric's case it's not Rome, it's OneLake. All your data in the end ends up in OneLake. That's why we said at the beginning, when we were discussing the overall
            • 91:30 - 92:00 architecture of Fabric, that OneLake is the central storage repository for all the data that you plan to use within the organization. So no more data silos ("I store part of my data here, part of my data there, then I need to integrate it", and so on). The idea is that all the data is available directly from OneLake: either physically stored in OneLake, or through some data virtualization, which is achieved, for
            • 92:00 - 92:30 example, by these shortcuts. So what are shortcuts, essentially? You all know shortcuts in Windows File Explorer: we all have some document stored somewhere on our D: drive under a series of subfolders, and instead of navigating to that subfolder each and every time, we create a shortcut on our desktop, we
            • 92:30 - 93:00 double-click it, and we access the content of that file. That's how shortcuts work in Windows File Explorer, and it's very similar in Fabric; the concept is very similar. So again, think of the analogy with the file: the data lives somewhere else (I'll explain where this "somewhere else" can be), but by creating a shortcut we can leverage
            • 93:00 - 93:30 this data as if it were natively stored within Fabric. It's a pointer to data that is stored somewhere else, and when we need it for combining with data that is already stored in our Fabric environment, we can simply pull the data directly from its original location, from its data source. There are two types of shortcuts in Fabric: internal and external. Internal
            • 93:30 - 94:00 shortcuts point to other Fabric items; I will explain this and show you in a demo what I'm talking about. So when we talk about internal shortcuts, we are talking about leveraging data from other Fabric items, and when we talk about external shortcuts, those point to systems that are external to Fabric, like ADLS Gen2, Amazon S3, Google Cloud Platform, and so on. How do shortcuts work? Let's say I have workspace A and I
            • 94:00 - 94:30 have two lakehouses in this workspace, Lakehouse A and Lakehouse B, and I have some tables in Lakehouse A and some tables in Lakehouse B. What I can do here is create a shortcut to Lakehouse B and use a shortcut version of table B within my Lakehouse A. You will recognize this by the chain-like icon that shows that a specific
            • 94:30 - 95:00 table is in fact a shortcut to some other data destination. So I can use this Delta table the same way as I can use the two that are natively stored in my Lakehouse A. That's the scenario with one workspace. What if I have another workspace, workspace B, with Lakehouse C and table C? Again, I can do a cross-workspace
            • 95:00 - 95:30 shortcut: if I have proper access to Lakehouse C in workspace B, I can create a shortcut to table C, and then I can leverage all four of these tables as if they were natively stored within my Lakehouse A. Physically, the data stays in its original location; it's a pointer, and when I create my queries, this data will be loaded here and used for those queries.
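            Shortcuts are normally created through the UI wizard shown in the next demo, but they can also be scripted. The sketch below creates an internal (OneLake-to-OneLake) shortcut through the Fabric REST API; treat the endpoint and payload shape as assumptions to verify against the current REST reference, and all IDs and the token are placeholders.

            ```python
            import requests

            # Endpoint and payload shape follow the OneLake shortcuts REST API as I recall it.
            TARGET_WORKSPACE = "<workspace-a-guid>"     # workspace that will hold the shortcut
            TARGET_LAKEHOUSE = "<lakehouse-a-guid>"
            TOKEN = "<aad-access-token>"

            payload = {
                "path": "Tables",                       # create the shortcut in the Tables area
                "name": "dimcustomer",
                "target": {
                    "oneLake": {
                        "workspaceId": "<workspace-b-guid>",  # where the data actually lives
                        "itemId": "<lakehouse-c-guid>",
                        "path": "Tables/dimcustomer",
                    }
                },
            }

            resp = requests.post(
                f"https://api.fabric.microsoft.com/v1/workspaces/{TARGET_WORKSPACE}"
                f"/items/{TARGET_LAKEHOUSE}/shortcuts",
                headers={"Authorization": f"Bearer {TOKEN}"},
                json=payload,
                timeout=60,
            )
            resp.raise_for_status()
            ```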
            • 95:30 - 96:00 Let's go to Fabric and create a shortcut, just to show you this in action. I'm currently on the home page of my Microsoft Fabric, so let's go to Data Engineering and open this DP600 Bootcamp workspace. As you see, I currently have different Fabric items in this workspace: a lakehouse, a warehouse, an eventhouse (we will talk about that later). And, for example, let's say I
            • 96:00 - 96:30 want to create some shortcuts in my DP600 Bootcamp lakehouse. As you see, I have some tables that I already loaded here, so all of those tables are physically ingested and loaded into the DP600 Bootcamp lakehouse. I can simply click on the three dots and say New shortcut. This launches the wizard, and these are the things that I mentioned: internal sources, external
            • 96:30 - 97:00 sources, and you see what is currently supported. Again, things are changing in this area as well, so we will probably have shortcuts supported for other systems in the future. But let's keep it simple here: I'll go to another lakehouse, let's take this one, DP600 LH, which is in a different workspace, the DP600 Playground workspace. From here I can choose the tables and files for which
            • 97:00 - 97:30 I want to create shortcuts. Let's say I need these two Contoso tables, DimCurrency and DimCustomer, so I click Next and then Create, and after a few seconds these two tables are here. You see the small chain icon in the top left corner; this means that the table is a shortcut, and basically it
            • 97:30 - 98:00 points to a location that is outside of this lakehouse: a different lakehouse, a different workspace. But from here I can combine this data with my existing Delta tables in this lakehouse, the same way as if this data were natively stored here. So the concept of shortcuts is super powerful and really enables you to push the limits when it comes to
            • 98:00 - 98:30 combining data without necessarily physically moving it from one place to another. Okay, I suggest we do a short break of about 10 minutes and then we jump into pipelines. So yeah, stretch your legs, refresh, and see you in about ten minutes.
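            Once a shortcut table exists, it is queried exactly like a native Delta table. A minimal sketch from a notebook attached to the DP600 Bootcamp lakehouse; the dimcustomer shortcut is the one from the demo, and the column name is a placeholder.

            ```python
            # `spark` and `display` are provided by the Fabric notebook runtime.
            # dimcustomer is a shortcut, but Spark treats it like any other Delta table.
            df = spark.sql("""
                SELECT GeographyKey, COUNT(*) AS customers
                FROM dimcustomer
                GROUP BY GeographyKey
                ORDER BY customers DESC
            """)
            display(df)
            ```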
            • 98:30 - 99:00 [Music]
            • 99:00 - 99:30 [Music]
            • 99:30 - 100:00 [Music]
            • 100:00 - 100:30 [Music]
            • 100:30 - 101:00 [Music]
            • 101:00 - 101:30 [Music]
            • 101:30 - 102:00 [Music]
            • 102:00 - 102:30 [Music]
            • 102:30 - 103:00 [Music]
            • 103:00 - 103:30 [Music]
            • 103:30 - 104:00 [Music]
            • 104:00 - 104:30 [Music]
            • 104:30 - 105:00 [Music]
            • 105:00 - 105:30 Hello, hello, and welcome back! Hope you
            • 105:30 - 106:00 used these 10 minutes to properly refresh
            • 106:00 - 106:30 your brains and that you're ready for more
            • 106:30 - 107:00 Fabric and DP-600 goodies. So yeah,
            • 107:00 - 107:30 let's proceed with pipelines.
            • 107:30 - 108:00 Pipelines in Microsoft Fabric
            • 108:00 - 108:30 encapsulate a sequence of activities
            • 108:30 - 109:00 that perform data movement and processing tasks. You can use a pipeline to define data transfer and transformation activities, and orchestrate these activities through control flow activities that manage
            • 109:00 - 109:30 branching, looping, and other typical processing logic. Before we proceed: thanks, Martin, thanks for the donation, I really appreciate that, thank you. The graphical pipeline canvas in the Fabric user interface enables you to build complex pipelines with minimal or no coding required. Now, activities: that's the key concept in pipelines. Activities are essentially
            • 109:30 - 110:00 the tasks that we execute in a pipeline. You can define a flow of activities by connecting them in a sequence, and the outcome of a particular activity (such as success, failure, or completion) can be used to direct the flow to the next activity in the sequence. There are two broad categories of activity in a pipeline. Data transformation activities encapsulate data transfer operations, including simple
            • 110:00 - 110:30 Copy data activities that extract data from a source and load it to a destination, as I will show you in a few minutes, as well as more complex Dataflow activities that encapsulate Dataflows Gen2 and apply sometimes complex transformations to the data as it is transferred from the source into Fabric. We also have control flow activities, which are the activities you can use to
            • 110:30 - 111:00 implement loops, conditional branching, or manage variable and parameter values. Since I mentioned parameters: they are also available in Fabric pipelines, and you can parameterize different properties of pipelines and activities. And finally, you can schedule runs of Fabric pipelines. Obviously it's not a good idea to go every hour
            • 111:00 - 111:30 and click manually to execute the pipeline; you want to schedule the run, and depending on your business needs you can schedule the pipeline to run every 10 minutes, every 30 minutes, once per day, and so on. Some of the common activities in a pipeline: first and most important is Copy data. That's one of the most common use cases of a data pipeline, and many pipelines consist of just a single Copy data
            • 111:30 - 112:00 activity that is used to ingest data from an external source into a Fabric lakehouse. The Copy data tool is a very intuitive and straightforward way to do that: when you add a Copy data activity to a pipeline, this graphical tool walks you through all the steps required to configure the data source and destination for the copy operation. A wide range of source connectors is supported, making it possible to ingest data from most common data sources.
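            Besides the UI schedule, a pipeline run can also be triggered programmatically. The sketch below queues an on-demand run through the Fabric job scheduler REST API; the endpoint shape is an assumption to verify against the current REST reference, and the IDs and token are placeholders.

            ```python
            import requests

            # "Run on demand item job" for a data pipeline, as I recall the job scheduler API.
            WORKSPACE_ID = "<workspace-guid>"
            PIPELINE_ID = "<data-pipeline-item-guid>"
            TOKEN = "<aad-access-token>"

            resp = requests.post(
                f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
                f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline",
                headers={"Authorization": f"Bearer {TOKEN}"},
                timeout=60,
            )
            resp.raise_for_status()  # 202 Accepted: the run was queued
            ```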
            • 112:00 - 112:30 Before we move on to dataflows, I just want to show you how easy it is to create a pipeline in Microsoft Fabric. I'll go to my DP600 Bootcamp workspace, and here at the top I'll select New item, and there it is: Data pipeline. I'll click on it and let's call this one "pipeline live stream
            • 112:30 - 113:00 dp600", and I'll click Create. When you create a new pipeline it's just a shell: it doesn't contain anything, it doesn't do anything; you need to add certain activities to get something out of it. And this is what I mentioned: you have the Copy data assistant. If I click on Copy data assistant, it walks me through all the steps necessary to bring the data from an external data source into Microsoft
            • 113:00 - 113:30 Fabric. I can also bring data from another Fabric item, from another lakehouse or warehouse into a different lakehouse, but in this case let's connect to a SQL Server database. I'll provide the name of my local instance of SQL Server, and let's connect to the Adventure Works database, since I already created this connection previously. And when we
            • 113:30 - 114:00 talk about connections, remember: because I'm connecting to SQL Server, this is an on-prem data source, so I need the on-premises data gateway installed on this machine, which I obviously have. I will now connect to my Adventure Works database, and from here I can write queries if I want, so I can combine data from multiple tables, use a query, and bring that data in the form of a Delta table
            • 114:00 - 114:30 into Fabric. Or I can simply choose between the tables that already exist here. Let's keep it simple: I'll take this DimEmployee table, for example, just a single one. So essentially Copy data is a fairly straightforward, low-code/no-code approach that allows you to bring data quickly into Microsoft Fabric. Now I can choose to create a new lakehouse, but in this case I want to use the
            • 114:30 - 115:00 existing one, so I'll use this DP600 Bootcamp lakehouse. I can choose to store this table in the Files or Tables area (we will examine the difference between the Files and Tables areas later), and I can load to an existing table or to a new table; let's load to a new table. Now the Copy data assistant will try to figure out the proper data types, because not all data types that exist in
            • 115:00 - 115:30 certain data sources are supported in Fabric. For example, there is no nvarchar in Spark, so it will be translated internally to the string data type for Spark. But don't bother yourself with that too much; let's keep it simple, click Next, and then Save and run. So now I have my pipeline. Although it's super simple, with just one single activity, Copy data, here at the bottom I can
            • 115:30 - 116:00 configure different properties, like the source (which I can parameterize), the destination, remapping columns in a different way, changing data types, and so on. Once you create this pipeline, it executes and runs automatically. As I mentioned previously, here under Run you can schedule the run of this pipeline or you can run it manually, and you always get a message with
            • 116:00 - 116:30 information on whether the pipeline was successfully executed or not. So under the Run tab at the top you can choose to manually execute the pipeline or schedule it. Under Activities you see a whole bunch of different activities that you can use. So let's say, for example, that once my data is in the lakehouse I want to perform certain transformations by using a notebook once this copy activity is successfully completed, so:
            • 116:30 - 117:00 on success. This can also be a question on DP-600: what is the difference between "on success" and "on completion"? "On completion" doesn't take into account whether the specific activity was successfully executed or not; whether it completed successfully or unsuccessfully, it's completed, and then you need to define what
            • 117:00 - 117:30 happens from there. Thanks, Thomas, for the donation, really appreciate it, thank you. So once this copy is successfully completed, within this notebook (of course, I still don't have a notebook here) I can schedule the notebook to run and do some data transformations, maybe load the data into some other table or some other lakehouse, and so on. I can also control the flow with If conditions. So, for those of you who are old enough, and I know there are some people here
            • 117:30 - 118:00 who are old enough, like me: if you remember... I wouldn't say Azure Data Factory, because it's not old enough, but if you remember SSIS, or how was the tool called before SSIS? Ah, it's on the tip of my tongue but I can't remember... it was DTS, that was the abbreviation. Never mind, I'm too old, so I keep forgetting things. Essentially, if
            • 118:00 - 118:30 you have worked with any of these data orchestration tools, from Microsoft or from other vendors, you will feel very comfortable within Fabric pipelines. If you're not sure: DTS stands for Data Transformation Services, and it really came before SSIS; I'm not sure if we have some older people here who remember it. Okay, never mind, I'll double-check it
            • 118:30 - 119:00 during the next break. So, here under the three dots you can see all the activities that are available, and one of my favorites is Semantic model refresh. Previously we had to do some gymnastics and all kinds of different things to figure out when our ETL process had finished, because we want to refresh our semantic model as the last step of the
            • 119:00 - 119:30 ETL pipeline. Now you don't have to figure it out on your own: you simply add this Semantic model refresh activity at the end of your pipeline, you define which semantic model you want to refresh, and once the entire process is completed, your semantic model is refreshed and your end users will be happy to have the latest data available in their Power BI reports. So this is really how simple and easy it is to use pipelines.
            • 119:30 - 120:00 Of course, in reality you may want to introduce more complex solutions, and usually you want to build something that is more metadata-driven rather than just hardcoding certain values, but from the perspective of a beginner, and of someone who wants to pass DP-600, this is enough to know. So yeah, that's where I stop regarding pipelines.
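            For context on the "gymnastics" mentioned above: before the Semantic model refresh activity existed, a common workaround was to call the Power BI refresh REST API yourself as the last step of the ETL. A minimal sketch using the documented "refresh dataset in group" endpoint; the IDs and token are placeholders.

            ```python
            import requests

            # Queue a refresh of a semantic model (dataset) in a workspace.
            GROUP_ID = "<workspace-guid>"
            DATASET_ID = "<semantic-model-guid>"
            TOKEN = "<aad-access-token>"

            resp = requests.post(
                f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}/datasets/{DATASET_ID}/refreshes",
                headers={"Authorization": f"Bearer {TOKEN}"},
                json={"notifyOption": "MailOnFailure"},
                timeout=60,
            )
            resp.raise_for_status()  # 202 Accepted means the refresh was queued
            ```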
            • 120:00 - 120:30 Data Transformation Services, yes! So I'm not completely lost; thanks, David, for bringing that up. Okay, let's move on to the next feature, the next service that we can use to ingest data into Fabric, and that is Dataflows Gen2. Dataflows Gen2 let you leverage a low-code interface and more than 300 different data and AI-based
            • 120:30 - 121:00 transformations, letting you transform data very easily and with more flexibility than almost any other tool. So what is a dataflow? A dataflow is a type of cloud-based ETL (extract, transform, load, for those of you not familiar with the term) for building and executing scalable data transformation processes. Dataflows allow you to extract data from various sources, transform it using a wide range of transformation operations, and load it
            • 121:00 - 121:30 into a destination. I will make a small digression and compare Dataflows Gen2 with dataflows, let's call them Dataflows Gen1: those are the dataflows that have existed in the Power BI world since approximately 2018. From a user interface perspective, this is the same thing: you have the familiar Power Query Online experience and you do your
            • 121:30 - 122:00 transformations with clicky-clicky, draggy-droppy actions, so no code is required. Of course, if you want to write M code, be my guest, but you're not required to; you can do everything from the user interface. But the similarities between Dataflows Gen2 and Gen1 finish there. The engine is different, but the most important difference is this. With Dataflows Gen
            • 122:00 - 122:30 1, the old ones, once you connect to a data source, do a transformation, and everything else, you store it as an artifact called a dataflow within a workspace, and then only Power BI items can leverage data from this dataflow: you connect to the dataflow from Power BI Desktop, you build your report, and you publish your report. So the dataflow is your data source. Now, with Gen2, you can do the same
            • 122:30 - 123:00 thing, connect to a data source and perform transformations, but now you can output the results of those transformations into your Fabric lakehouse or warehouse. There are also other destinations outside of Fabric that are supported, but for us as Fabric analytics engineers we care about destinations in Microsoft Fabric. So you can output transformed data into a lakehouse or warehouse in Delta format, which means any engine
            • 123:00 - 123:30 that can read the Delta format (the SQL engine, the Spark engine, the Analysis Services engine, so all the engines within Microsoft Fabric, as well as outside engines that can read Delta) can consume the data that you just transformed. That's a huge difference compared to Dataflows Gen1. Again, a dataflow can be considered a standalone item in the Fabric
            • 123:30 - 124:00 world: once you create a dataflow it appears as an item in a workspace, and you can run it independently, but it can also be run as an activity in a pipeline. Just to show you: if I go back to the pipeline that I previously created, there is a Dataflow activity. So I can say, okay, once I've loaded the data with the Copy data activity (because, say, I don't know Spark and I don't want to use a notebook,
            • 124:00 - 124:30 but I know Power Query, I'm good with Power Query, I do everything in dataflows), once this data lands in my Fabric lakehouse, I will use a dataflow to connect to this data from the lakehouse, do some additional transformations, do whatever I need, and then again I can output it somewhere else: for example, I can transform the data and output it into a warehouse, and use the warehouse as my final layer for business reporting.
            • 124:30 - 125:00 So a dataflow can really be executed both as a standalone artifact and as part of the orchestration process within a pipeline. And this slide is important, please, especially for DP-600, because there will be certain questions that ask you to perform operation X, operation Y, and so on, and depending on the requirements in the question you need to choose between a
            • 125:00 - 125:30 pipeline or a dataflow. So it is important to understand the differences between Dataflows Gen2 and pipelines, because people sometimes mix those two up. When you need to perform data transformations beyond some light transformations, use dataflows. By light data transformations I mean what you can do in a pipeline within the Copy activity: you can change a column type, you can
            • 125:30 - 126:00 exclude some columns from loading, but you can't really do any real data transformations; you can't replace values, you can't add new columns, and so on. For data transformation purposes you use dataflows; you can't really do a proper data transformation with the pipeline itself. The same goes for data profiling: you know this great feature within Power
            • 126:00 - 126:30 Query, and also in Dataflows Gen1, where you go to View and turn on column profile, column quality, and column distribution. Whenever you need to perform data profiling operations, it's dataflows; you can't do that in a pipeline. Again, if you are a user who is already familiar with Power Query and that experience, and you know how to leverage Power Query to achieve your business goal, then go with Dataflows Gen2.
            • 126:30 - 127:00 In terms of the number of connectors, dataflows support many more connections out of the box compared to pipelines; currently there are more than 150 connectors that you can use out of the box with a dataflow. A pipeline is predominantly used for orchestration. Yes, I can create a pipeline that consists of just one
            • 127:00 - 127:30 single Copy data activity, but that's not orchestration; that's just ingesting data from point A to point B in Fabric. Whenever you need to include multiple different activities and define some logic between those activities ("when activity A completes, I want to perform activity B", and so on), you use pipelines. You can't orchestrate data with a dataflow; that's a task for pipelines. And whenever you need simple, lightweight transformations,
            • 127:30 - 128:00 Copy data is usually much faster than Dataflows Gen2. And again, a dataflow can be part of the orchestration process, so a Dataflow Gen2 can be part of a pipeline, but not the other way around: a pipeline cannot be part of a Dataflow Gen2. Hopefully this clears up the picture a little bit, but we have more things to consider
            • 128:00 - 128:30 and to choose between in Fabric; remember, Fabric is all about the options. So now: lakehouse, warehouse, or eventhouse, and when to choose what? Let's start with the eventhouse. Whenever we talk about eventhouses, about KQL stuff, whatever has KQL in its name, its main purpose is handling
            • 128:30 - 129:00 streaming data. Yes, you can handle other types of data as well with KQL and a KQL database, but the main purpose is to handle streaming data, data that comes in real time: as soon as data is coming in, you want it stored somewhere within Fabric, and for that purpose we use eventhouses and KQL databases. An eventhouse as such
            • 129:00 - 129:30 is an item in Fabric, but it doesn't really do anything on its own; it's kind of a container for KQL databases. Its purpose is to provide unified monitoring and management across multiple KQL databases, and as soon as you store data in an eventhouse, in a KQL database, it's automatically indexed and partitioned. This is how the architecture looks from a high-level
            • 129:30 - 130:00 perspective: we have a workspace, and we have, let's say, Eventhouse A and Eventhouse B, and then we can have one or multiple KQL databases within each of the eventhouses. So again, the eventhouse is not the artifact that stores the data; it's a container for KQL databases and other KQL-related items, and we will also examine some of them
            • 130:00 - 130:30 throughout the session. That's the eventhouse. We have more "houses" in Fabric, and those two, let's be honest, will be predominantly used: the lakehouse and the warehouse. So, we have a data lakehouse and a data warehouse; what's the difference between these two? First and foremost, the foundation of Microsoft Fabric as a platform is the lakehouse, which is built on top of the
            • 130:30 - 131:00 OneLake storage layer that we already examined. The lakehouse, in this sense, is a unified platform that combines the flexible and scalable storage of a data lake with the ability to query and analyze data like a data warehouse. The pioneer of the term "lakehouse" is Databricks, and since the term was introduced a few years ago it has become generally accepted; everyone talks about the data lakehouse. So essentially, a
            • 131:00 - 131:30 lakehouse presents a database that is built on top of a data lake using Delta-format tables. Lakehouses combine the SQL-based analytical capabilities of a relational data warehouse with the scalability and flexibility of a data lake to store literally whatever you want. Lakehouses store all data formats and can be used with different analytic tools and different programming languages, and as cloud-based
            • 131:30 - 132:00 solutions, lakehouses can scale automatically and provide high availability and disaster recovery. Some of the benefits of a lakehouse: they utilize both the Spark and SQL engines (we will see to what extent they utilize the SQL engine, but both the Spark and SQL engines can be leveraged within a Fabric lakehouse). They adopt a schema-on-read approach, which allows flexible schema definition as needed rather than relying
            • 132:00 - 132:30 on a predefined schema, and they support ACID transactions through Delta Lake tables, ensuring data consistency and data integrity. That's a lakehouse. In a lakehouse we can store any type of data, from video files, images, and JSON files all the way to the Delta format, which is considered structured data; so we can store structured, unstructured, and semi-structured
            • 132:30 - 133:00 data in a lakehouse. A warehouse, on the other hand, is more similar to the traditional data warehousing workloads that we have known for decades: it relies on relational schema modeling and it stores structured data. You can't really store, well, in theory you can, but we are not using a data warehouse to store images and so on; it's not built for that, let's put it that
            • 133:00 - 133:30 way. When you create a lakehouse in your Fabric workspace, once you click "create lakehouse" you get two additional items automatically created for you: the SQL analytics endpoint and the default semantic model. Now let's examine what we have within each of these. In the lakehouse itself we have two areas, which I briefly mentioned when I was creating a shortcut: we have the
            • 133:30 - 134:00 Files area and we have the Tables area. The Files area is an unmanaged area, not managed by the Spark engine, and you can literally store whatever you want in it, from CSV and JSON to images and PDFs. Then we have the Tables area, which is managed by Spark. In theory you can store a CSV file in your Tables area, but in Fabric you don't really want to do that, because this area can
            • 134:00 - 134:30 be leveraged further by other experiences in Fabric, and the overall idea of Fabric is to have one copy of the data that can be leveraged across multiple different workloads. That's why I said "preferably Delta": technically you can store different things, but you really want the Delta format stored in the Tables area. Just to go back and quickly show you what I'm talking about, let me duplicate this tab. And thank you, Matthew, thanks for the
            • 134:30 - 135:00 donation, thank you so much, and greetings to Dragon, thank you guys. Okay, so if I open my lakehouse, here on the left-hand side you see I have Files and I have Tables. Under Files, if I click on it, I have three CSV files; I can store whatever I want here, basically, in this Files area. And in the Tables area I only
            • 135:00 - 135:30 have Delta tables. Let me just refresh this; I get a little bit annoyed when I have something in this "unidentified" folder. So this is the Tables area, and this is how a lakehouse looks. Going back to my workspace, this is what I wanted to show you as well: I have two additional items here, the SQL analytics endpoint and the default semantic model. So let's go back to the slides and explain what these two are.
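            As a quick notebook illustration of the Files vs. Tables split: the sketch below reads a raw CSV from the unmanaged Files area and saves it as a Delta table so it lands in the managed Tables area. It assumes the notebook is attached to this lakehouse as its default, and the file path and table name are placeholders.

            ```python
            # Files area: unmanaged, any format; the relative "Files/..." path resolves against
            # the notebook's default lakehouse.
            raw = (spark.read
                        .option("header", True)
                        .csv("Files/raw/customers.csv"))

            # Tables area: managed Delta table, visible to the other Fabric engines as well.
            (raw.write
                .format("delta")
                .mode("overwrite")
                .saveAsTable("dim_customer"))
            ```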
            • 135:30 - 136:00 The SQL analytics endpoint is something that the lakehouse exposes so that you can use the well-known T-SQL language to query the data from the lakehouse. And when I say query the data, that means it doesn't allow you to perform operations that would change the data in the lakehouse: you can't do updates, deletes, inserts, and so on; you can really just query the data using T-SQL. Of course, you can
            • 136:00 - 136:30 do some administrative stuff, grant roles and permissions and so on, but you can't change the data. You can create stored procedures and you can create views, as long as they don't manipulate the data in the sense that data would be changed in the lakehouse through this SQL analytics endpoint. Only Delta tables are exposed; that's what I mentioned previously: if you have a different format stored in the Tables area,
            • 136:30 - 137:00 it is not queryable by using T-SQL, so it will not be included in the SQL analytics endpoint at all; only the Delta format is included. And the key use case for the SQL analytics endpoint, aside from the possibility to query it using T-SQL, is Direct Lake
            • 137:00 - 137:30 storage mode: when you build a semantic model that uses Direct Lake storage mode, all the queries that cannot use Direct Lake (we will talk about that later) will automatically fall back to DirectQuery mode, and then the data comes through this SQL analytics endpoint. And we have a default semantic model, which we will talk about later in this part, when we talk about Power BI in Microsoft Fabric. What are the tools and techniques
            • 137:30 - 138:00 to explore and transform data in a lakehouse? There are multiple. The main way of doing things in a Fabric lakehouse is using Apache Spark; that's the native engine that works with the lakehouse in Microsoft Fabric, and within the Spark experience we can use notebooks or Spark job definitions to explore and transform data. Of course, we have the SQL analytics endpoint, as I said, as a read-only way
            • 138:00 - 138:30 of querying the data using the T-SQL language. We can use dataflows (we already examined dataflows), so with dataflows we can transform data in a lakehouse; with data pipelines we can ingest the data and move it around; and finally, we can visualize it with Power BI. Let's talk about notebooks. I'm not a Python guy, but I find
            • 138:30 - 139:00 notebooks to be really one of the coolest features. To be honest, I met them for the first time with Fabric; I never used notebooks in my previous life before Fabric, but now I'm becoming more and more of a fan. My friend Tom Martins, who is also here, is an even bigger fan of notebooks than me, and he's always trying to convince me that I should use notebooks more (and that I should use a MacBook, but I'm not taking that one). Okay, so in Fabric notebooks we can
            • 139:00 - 139:30 use different programming languages. The default language is PySpark, but you can also use Scala, R, or Spark SQL, and since recently (I think about two weeks ago) we can use plain Python itself. We can also write T-SQL within the notebook and execute it from a notebook cell, but again, only for reading the data. We can use Markdown to write comments; this is one of my favorite features,
            • 139:30 - 140:00 because I create something, and then after three or four weeks I come back and can't figure out what I did, so this feature for writing comments in Markdown is a lifesaver for me. You can run or freeze individual or multiple cells, and you can use notebooks to ingest and transform data as well. Notebooks support automation, so you can schedule the run of a notebook either as a separate item or through the
            • 140:00 - 140:30 orchestration process within a pipeline; a notebook, same as dataflows, can be scheduled to run outside of a pipeline or as part of a wider orchestration process.
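            A minimal sketch of what that looks like in practice: PySpark is the default cell language, Spark SQL is available through spark.sql() or a %%sql cell magic, and the table and column names below are placeholders.

            ```python
            # PySpark is the default language; `spark` and `display` come with the runtime.
            top_products = spark.sql("""
                SELECT ProductKey, SUM(SalesAmount) AS total_sales
                FROM factinternetsales          -- placeholder table name
                GROUP BY ProductKey
                ORDER BY total_sales DESC
                LIMIT 10
            """)
            display(top_products)

            # An equivalent cell could be switched to Spark SQL with the %%sql cell magic, and the
            # whole notebook can be scheduled on its own or called from a pipeline Notebook activity.
            ```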
            • 140:30 - 141:00 Okay, let's talk about the warehouse, another option for storing data. Same as the lakehouse, it's centered on a single data lake: the storage layer for a warehouse is still OneLake, the same as for a lakehouse. It's powered by Synapse Analytics, and when I say powered by Synapse Analytics, that means the same engine, called Polaris, that was introduced with Azure Synapse Analytics is the engine behind the Fabric warehouse. It fully, or almost fully, supports T-SQL; let's say 98% of the things you can do in traditional T-SQL
            • 141:00 - 141:30 workloads (meaning SQL Server, Azure SQL DB, and so on) can be done in the warehouse, but there are certain features, obviously outside the scope of this session, that are not supported, or not supported yet; we'll see. Think of it as almost fully supporting the T-SQL we know, and it stores data in the Parquet file format, the Delta format in the end. Tools and techniques to explore and transform data: in this case, the SQL end-
            • 141:30 - 142:00 point of the Fabric warehouse, which is built into the warehouse. When you create a warehouse, you don't get a separate SQL analytics endpoint artifact. Just to show you: if I go here, here is my warehouse, and I don't have a SQL analytics endpoint, I just have a default Power BI semantic model. There is no separate artifact, because the SQL endpoint is built into the Fabric warehouse, and this version is both read and write, so you can
            • 142:00 - 142:30 update, delete, and insert data through the SQL endpoint. Again, you can use Dataflows Gen2 or data pipelines to ingest or transform data, and again visualize with Power BI, using Direct Lake mode or any of the other existing modes. So, to recap, if you are coming from a SQL background like me, there are two types of SQL "houses" in Fabric: the SQL endpoint of
            • 142:30 - 143:00 the lakehouse, and the warehouse (it's not called Synapse Data Warehouse anymore, apologies for that, I need to update this slide; it's just "Warehouse" now). The SQL endpoint of the lakehouse is automatically generated and supports only read operations. Please keep this in mind: if you get a question on DP-600 where you need to do X, Y, Z on a lakehouse and you need to choose whether to do that with a notebook or by writing a T-SQL stored procedure, you can't really do that with a T-SQL stored
            • 143:00 - 143:30 procedure, right? You can use a procedure, but you cannot change data in the lakehouse by using a T-SQL stored procedure. Whereas the warehouse, as we already mentioned, is more like traditional data warehousing workloads, where you can do all of this using T-SQL. And just to make sure, because from my experience teaching people Fabric in previous months this is what I heard: people think, okay, the lakehouse stores the data in the
            • 143:30 - 144:00 Delta format and the warehouse stores it in some other format. No: everything is in OneLake, and in the end those are just files, just a bunch of Parquet files with the Delta log as a layer on top that provides metadata information about those files. Both the lakehouse and the warehouse store the data in absolutely the same format; it's just a matter of which engine
            • 144:00 - 144:30 you are using to process the data. When you store the data in the lakehouse, you are using the Spark engine and notebooks to process the data; when you store the data in a warehouse, you are using the Polaris engine and writing T-SQL code to process the data. That's the only difference; from a storage perspective it's the same, there is no difference. Okay, moving on to the next
            • 144:30 - 145:00 topic. One of the features that was also recently introduced is OneLake integration for semantic models and eventhouses, and I think this is a very useful feature; we'll see how it's going to work in real life, but from a conceptual point of view it's really cool. We have different data sources (you see here on the left-hand side I have an Excel file, a SQL database, SharePoint, whatever), and essentially I bring this data into a
            • 145:00 - 145:30 Power BI semantic model. So we create our Power BI semantic model like we did in the previous 10 years: we combine and mash up this data, we create a semantic model in Import storage mode, and this semantic model consists of one, two, three, five, or 100 tables. Now, wouldn't it be cool if we could somehow reuse these tables somewhere else, so they're not locked within the Power BI semantic model? That's where
            • 145:30 - 146:00 this feature kicks in. With OneLake integration for semantic models, all these tables from your Import semantic model are created and included in OneLake in the format of Delta tables. So if I have 50 different tables in my Import-mode model, I will have 50 Delta tables in OneLake, and from there, since you already learned that all the engines within Fabric can read Delta, once
            • 146:00 - 146:30 I have this in my OneLake in Delta format, I can write a notebook to query this data, I can use a warehouse and write T-SQL to query this data, and I can create a Direct Lake model on top of these tables and have a brand-new semantic model that uses Direct Lake storage mode. So the idea is to enable this reusability, to have this data not locked in here, but available also for other experiences in Microsoft
            • 146:30 - 147:00 Fabric. This feature, and I'm not 100% sure, so please take this with a grain of salt, I think is disabled by default, so you need to explicitly enable it, to toggle on this OneLake integration option, and this can be done in two different places:
            • 147:00 - 147:30 under the settings for the model, and within the Fabric tenant, where it says "semantic models can export data to OneLake" and "users can store semantic model tables in OneLake". For the eventhouse it's similar; conceptually it's the same. Because, as we already learned, KQL databases do not store data in the Delta format, it's a different format, but sometimes you want to leverage this data
            • 147:30 - 148:00 too and combine it with data that is stored in a lakehouse or warehouse, and that's not so easy to do. By turning on OneLake integration for the eventhouse, this data will be available in Delta format in OneLake as well. You can choose to do this at the level of the entire KQL database (if you turn it on at the database level, obviously the entire database will be integrated into OneLake), or turn it on for just the specific tables that you need.
            • 148:00 - 148:30 or choose just specific tables that you need we are done with the first part get data let's move on to transform data still a lot of things to cover okay so first before we jump into creating views store procedures uh let's first understand because we want to transform data uh using certain features or services in fabric let's first
            • 148:30 - 149:00 understand take one step back and understand how to organize data in the lake house Lakehouse is the architecture of choice in Microsoft Fabric and although Microsoft makes a distinction between the Lakehouse and Warehouse as you already learned in the previous part this separation in the end might produce some confusion especially for new fabric users uh let me try quickly to explain why uh uh in today's Cloud analytical landscape we have a
            • 149:00 - 149:30 separation between the storage and compute so if we talk about storage data lake house means that data is stored as files in the lake in this acid compliant Delta Lake format and then you choose between different engines like spark or SQL engine and so on to process files in the story and in Microsoft fabric the storage layer is Lake to be more precise one Lake which we previously examined so don't be baffled don't be confused when
            • 149:30 - 150:00 you hear data warehouse don't think about some proprietary formats in the background like we had before fabric it's still Lake it's still Lake it's just a different engine as we as we already explained so let's now try to explain this uh way of organizing date in the lake house the most common pattern for modeling data in the lake house is called Medallion uh a lot of people call this architecture I wouldn't call it architecture let's say data modeling
            • 150:00 - 150:30 approach architecture is I think it's something different but that's that's my opinion sorry for that uh why it's called Medallion uh the same as for the lak house Concepts in general credit for being Pioneers in Medallion approach goes to data bricks and simply said Medallion architecture assumes that your data within the Lakehouse will be organized in three different layers bronze silver and gold that's why it's called The Medallion now you may also hear terms
            • 150:30 - 151:00 such as raw validated enriched or curated which I personally prefer but essentially the idea is the same to have different logical layers of data in the louse that are of different quality and serve different purposes so call it however you want but Medallion architecture is generally accepted term so therefore I will refer from now on I will refer to uh this data modeling approach as Medallion uh let's
            • 151:00 - 151:30 quickly examine all these all these uh three layers and what I also wanted to say uh there is a great video I think Simon Whitley from advancing analytics did this explaining uh basically how you can custom this Medallion architecture doesn't necessarily need to be three layers you can have one two five 10 so it's just up to you to Define what you
            • 151:30 - 152:00 consider the next level of data quality and yeah use that as as multiple uh multiple layer structure but let's let's uh stick with the naming conventions and the approach that is relevant for dp600 okay this the digression from previous five minutes was more uh my not rent but that's more like in real life you will probably see different things yeah okay going back to bronze layer bronze layer is where we
            • 152:00 - 152:30 land the data from external sources in its original in its raw State and data is ingested as is containing only metadata in addition the purpose of the bronze layer is to serve as a repository uh uh of the historical Archive of source data enable quick data reprocessing when necessary without need to connect to bunch of external uh uh systems again it's important to keep in
            • 152:30 - 153:00 mind that uh bronze layer contains unvalidated data in regards to storage format the bronze layer usually stores the data in one of the uh uh uh in in its original format whatever it is I don't know CSV Json whatever usually we store the data in its original form in its or original format then let's move on to Silver layer in the silver layer data from the bronze layer is conformed and cleaned so that all the key business entities uh
            • 153:00 - 153:30 concepts and transactions are available in the form of a so-called enterprise view, which is ready for ad hoc analysis and for machine learning workloads. That means the data is enriched and validated, and from that point it can be trusted downstream for further analytical workloads. From a data modeling perspective, the silver layer usually contains more third-normal-form-like tables, so again, this is not
            • 153:30 - 154:00 set in stone, but it usually looks more like an operational database normalized to third normal form. In the silver layer the data should already be stored in one of the more performant formats, such as Delta or possibly Parquet. And finally we have the gold layer, the icing on the cake in this approach, where data is structured and organized to support specific project requirements. Because
            • 154:00 - 154:30 this is the final stage in the entire process data is additionally refined and cleaned and in the gold layer we also apply various complex business uh rules business logic uh use case specific calculations and so on from the data modeling perspective again usually doesn't necessarily mean that it it should be always like this but it's usually implemented through a kimbal style star schema where the data is
            • 154:30 - 155:00 denormalized to support business reporting requirements in terms of storage data should be stored definitely here in Delta unless you really have a good reason not to do that but should be stored in Delta moving data across Medallion layers refines organizes and prepares it for for Downstream data activities and within Fabrics louse there is as you've already learned previously there is uh more than one way to move data between layers uh there are
            • 155:00 - 155:30 few things to consider when deciding how to move and how to transform data across those layers how much data are you working with uh how complex are transformations that you need to perform how often will you need to move data between the layers and last but definitely not least what tools are you most comfortable with if you don't know uh P spark or or tsql and you don't want to uh burn your chat chat
            • 155:30 - 156:00 GPT, then you probably want to use dataflows. So it also depends on which tools and technologies you are most comfortable with. Fine, let's move on to creating views, functions and stored procedures. In the warehouse it's the most straightforward, the same as in traditional T-SQL workloads, so you can create views, functions and stored procedures there. In the lakehouse you can do that with PySpark or Spark SQL, or you can do it with
            • 156:00 - 156:30 T-SQL, but only for objects that just read the data; remember, it's read-only through the T-SQL analytics endpoint. So views, functions and stored procedures that read the data can be created using the T-SQL language. In the event house we are using KQL, which stands for Kusto Query Language; that's a different language than the other query languages, although similar to
            • 156:30 - 157:00 some, and we will see some examples in the next slides. So for querying KQL databases we use Kusto Query Language. The main difference between an event house, or let's say a KQL database, and warehouses and lakehouses is that you can create materialized views in a KQL database; you can't do that in a lakehouse or warehouse, it's still not supported. It's coming, but it's still not supported. Keep in mind, if you
            • 157:00 - 157:30 rely on building and implementing your logic in database views, which a lot of us used to do in traditional SQL Server workloads, building entire layers using views: if you create views in a warehouse or lakehouse, they will not work with Direct Lake mode, so Direct Lake mode will automatically fall back to DirectQuery whenever those views are used.
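            To make the lakehouse side of this concrete, here is a minimal sketch of creating a read-only view from a notebook with Spark SQL. The table and column names (fact_sales, OrderDate, SalesAmount) are made up for the example, and `spark` is the session a Fabric notebook already provides.

            ```python
            # Minimal sketch: a view over an existing lakehouse Delta table, created from a notebook.
            spark.sql("""
                CREATE OR REPLACE VIEW sales_by_year AS
                SELECT YEAR(OrderDate)  AS OrderYear,
                       SUM(SalesAmount) AS TotalSales
                FROM   fact_sales
                GROUP  BY YEAR(OrderDate)
            """)

            # The view only reads data; remember that if it ends up in a Direct Lake
            # semantic model, queries that touch it fall back to DirectQuery.
            spark.sql("SELECT * FROM sales_by_year ORDER BY OrderYear").show()
            ```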
            • 157:30 - 158:00 Then let's see how to enrich data by adding new columns and new tables. In a lakehouse we use notebooks or Spark jobs for data transformation, and we usually operate at the DataFrame level. A DataFrame is a structure that exists in PySpark; I like to think about it as a temporary table in T-SQL, because as I said I'm coming
            • 158:00 - 158:30 from a T-SQL background. Maybe I'm wrong, but that's how I think about a DataFrame: I do different things on the DataFrame and finally, when I'm ready, I push the data into a real table. With the withColumn and select functions we can create new columns and tables. This is a very simple example, let me zoom in a little bit. withColumn is a method to
            • 158:30 - 159:00 apply on a DataFrame (df stands for DataFrame). withColumn adds a new column in case a column with the same name doesn't already exist in the DataFrame; if a column with the same name already exists, it simply overwrites it. In this case, because I don't have the columns first and last in my DataFrame, this will create two new, additional columns.
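            A minimal sketch along the lines of the slide; the table and column names (customers, full_name) are invented for the example.

            ```python
            from pyspark.sql import functions as F

            # Hypothetical customer table read from the lakehouse.
            df = spark.read.table("customers")

            # withColumn adds a column, or silently overwrites one that already has the same name.
            df = (
                df.withColumn("first", F.split(F.col("full_name"), " ").getItem(0))
                  .withColumn("last",  F.split(F.col("full_name"), " ").getItem(1))
            )

            df.select("customer_id", "first", "last").show(5)
            ```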
            • 159:00 - 159:30 Okay, in a warehouse, and thanks to Andy Cutler, my friend, I know you are here, thanks for the great blog post you wrote about this: remember I mentioned that 98% of traditional T-SQL operations are supported in the Fabric warehouse? This one goes into the 2% that are not fully supported.
            • 159:30 - 160:00 ALTER TABLE ... ADD is supported, so you can add a new column. But once you have a column in a table, if you try to alter that column, which works like a charm outside of the Fabric warehouse, here it doesn't work, you will get an error. ALTER TABLE ... DROP COLUMN also doesn't work; it's not supported yet, we'll see. ADD was also not supported until a few months ago, it now works. So if you want to
            • 160:00 - 160:30 add a column to a table, this operation here, ALTER TABLE dim_product ADD with the column name and data type, should work, but altering an existing column and dropping a column do not work. In KQL, in a Kusto database, the .alter table command is supported, and when you use it please keep in mind that existing, non-specified columns will be
            • 160:30 - 161:00 dropped. So if you just want to include new columns, you need to specify all the columns that are already part of the table. Therefore, first use .show table ... cslschema to get the existing table schema before you change it, so you can see which columns are already included. Once you add a column, it is added as a nullable column at the end of the schema. This is a simple example of how you add
            • 161:00 - 161:30 a new column in a KQL database: .alter table and then adding the column. Staying with KQL databases, there are many operators that don't exist in SQL, for example the extend operator, which creates a calculated column and appends it to the end of the result set. In this example here I have two columns, end time and start time, and I'm adding a
            • 161:30 - 162:00 new calculated column at the end of this table which will calculate the difference between end time and start time value so with the extend operator we are creating calculating column and appending it to the end of the result set great let's move on to Star schema I see a lot of you in audience that I know are huge fans of kimbal and um yeah this one is for you so how to implement star
            • 162:00 - 162:30 schema for a lak house let's first quickly explain because I'm not 100% sure that all of you understand what star schema is so asper Kimbell a guy who uh wrote a book in 1994 five I think uh or introduce his principles of dimensional modeling uh so kimbal says that in data model each table should be classified either as a fact or Dimension
            • 162:30 - 163:00 table so we have a distinction between fact tables and dimension tables Dimension tables are so Dimension is a a rough name so really it doesn't explain what these tables are doing therefore I like to refer them to them as lookup tables so essentially I need an information about something and then I go to lookup table and find this information like description uh or or many other
            • 163:00 - 163:30 different attributes that can describe certain thing I find this in dimension tables and they usually uh answer the question starting with W so who bought something when did they bought uh when did they buy where uh who who when where uh why what so that's how I think about di Manion tables and then we have fact tables that store some date about some events so something that happened this
            • 163:30 - 164:00 can be a transaction on the terminal in in your uh uh local supermarket this can be information about the temperature for today in salsburg or somewhere else so any event anything that happen at a certain point in time in certain location and so on it's stored in a fact table and fact table usually contains only those numeric values plus uh information about key columns uh uh that
            • 164:00 - 164:30 we can use to refer to and pick up the data from these lookup or dimension tables. Within the dimensional model we have a few more concepts to understand. One is about creating unique keys: there are two different types of keys in dimensional modeling, the so-called surrogate key and the alternate key. A surrogate key is a dummy, meaningless integer or any
            • 164:30 - 165:00 other whole-number value that we use to uniquely identify a record in a table. Why are we using surrogate keys and not alternate keys? The alternate key is the unique identifier from the source system, and it can happen that you are loading data into your warehouse from multiple different source systems, that you have a business key in each of these systems, and that there is
            • 165:00 - 165:30 the same value for this key in system A and in system B. So how do you differentiate between these two in your warehouse? You need to add a surrogate key, which serves the purpose of being that unique identifier. That's the difference between the surrogate and the alternate key; you will also hear the term business key for the alternate key. The surrogate key is just a dummy integer value that linearly increases by one and uniquely identifies the record in a table.
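            One simple way to stamp surrogate keys in a lakehouse notebook could look like this; the staging table and column names are assumptions, and in a real load you would also offset by the current maximum key in the target dimension.

            ```python
            from pyspark.sql import functions as F
            from pyspark.sql.window import Window

            # Hypothetical staging table carrying a business (alternate) key per source system.
            stg = spark.read.table("stg_customer")

            # A dummy, increasing integer as the surrogate key. ROW_NUMBER over an
            # unpartitioned window is fine for a modestly sized dimension.
            w = Window.orderBy("source_system", "customer_business_key")
            dim_customer = stg.withColumn("customer_sk", F.row_number().over(w))

            dim_customer.select("customer_sk", "source_system", "customer_business_key").show(5)
            ```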
            • 165:30 - 166:00 Some more concepts that are relevant for star schema: the snowflake schema is another concept in dimensional modeling. It's a star schema which is further normalized, and we are talking about normalizing dimension tables. In this case you see that I have a dim customer table which is further normalized into dim geography and dim store, so more granular dimensions, more
            • 166:00 - 166:30 uh uh Dimensions that uh remind of uh modeling the data in uh traditional transactional databases where we have data normalized to a third normal form some more Concepts relevant for Dimension tables that you need to know calendar Dimension which is which should be used uh in every single data model hopefully and slowly changing Dimension which is important from the perspective of dp600 uh so with slowly changing
            • 166:30 - 167:00 Dimension that's a concept also introduced by uh uh uh kimbal and his team essentially it refers to how you handle changes to the attributes that are happening to your data meaning let's say I'm now located in salsburg I used to live in Bel so I moved to salsburg so my location changed how do I handle this change within the data let's say that I'm a customer in a system and uh uh those guys want to update data about
            • 167:00 - 167:30 myself so how do they handle this data there are multiple different uh ways and uh uh uh options to do this there are different types of slowly changing Dimensions but usually most often and most commonly used uh slowly change in dimension type in analytical workloads is slowly changing Dimension type two essentially let's go with 01 and two those are three that are most commonly uh uh most commonly used generally uh
            • 167:30 - 168:00 not just in analytics but in general. Slowly changing dimension type 0 means there are no changes to attributes, so whatever happens, and I will use myself as an example again, I was born in Belgrade, and no matter where I move throughout my life, my location will stay set to Belgrade. Which is not good, right, because you want to do some data analysis and see how things
            • 168:00 - 168:30 changed over time. Let's say you want to summarize the amount per location: I'm not living in Belgrade anymore, so my sales amount should be assigned to my current location, and so on. So slowly changing dimension type 0 is really something we don't use in analytical workloads. Slowly changing dimension type 1 means you simply overwrite the existing value, meaning I used to be in Belgrade, then in 2016 I moved to
            • 168:30 - 169:00 Salzburg, so from 2016 my location will be updated to Salzburg and it will appear as if I had always lived in Salzburg and never in Belgrade. Again, there is no track of historical changes. That's slightly better, but again not good for analytical workloads, because if we do a comparison between locations in 2015 versus 2016, I
            • 169:00 - 169:30 want to see my sales amount for 2015 in Belgrade and for 2016 in Salzburg, right? This way it will always be Salzburg, until I move somewhere else, to Spain for example, and then all of my sales amount will go to Spain. And then slowly changing dimension type 2, this is the one we want to use, because it adds a new row for every change to the attribute. Meaning: I used to live in Belgrade until 2016, when I moved to Salzburg; a new row
            • 169:30 - 170:00 will be added to the table which says that from 2016 until some future date I live in Salzburg. Once I move to Spain, there will be a third row in this table which says that Nikola, from 2025 hopefully, lives in Spain, and so on. So with slowly changing dimension type 2 we are essentially adding a new row for each change of the attribute.
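            A rough sketch of how a type 2 load can look in a lakehouse notebook with a Delta merge. The table names, the tracked attribute (city) and the housekeeping columns (valid_from, valid_to, is_current) are all assumptions made for the example, and surrogate key handling is left out.

            ```python
            from delta.tables import DeltaTable
            from pyspark.sql import functions as F

            dim = DeltaTable.forName(spark, "dim_customer")   # hypothetical SCD2 dimension
            stg = spark.read.table("stg_customer")            # latest snapshot from the source

            # 1) Close the current row for customers whose tracked attribute changed.
            (dim.alias("d")
                .merge(stg.alias("s"), "d.customer_bk = s.customer_bk AND d.is_current = true")
                .whenMatchedUpdate(
                    condition="d.city <> s.city",
                    set={"is_current": "false", "valid_to": "current_date()"})
                .execute())

            # 2) Append a fresh current row for changed and brand-new customers
            #    (those that no longer have an open row in the dimension).
            current = dim.toDF().where("is_current = true").select("customer_bk")
            new_rows = (stg.join(current, "customer_bk", "left_anti")
                           .withColumn("valid_from", F.current_date())
                           .withColumn("valid_to",   F.lit(None).cast("date"))
                           .withColumn("is_current", F.lit(True)))

            new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")
            ```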
            • 170:00 - 170:30 Let's talk about some other important concepts in data modeling: normalization and denormalization. Normalization is not part of DP-600, but it doesn't make sense to talk about denormalization before explaining what normalization is, because, as you may assume, denormalization is the opposite process of normalization. Normalization is the process of organizing the data in a
            • 170:30 - 171:00 database. We have six different normal forms, and in most cases third normal form is considered optimal. With normalization, the focus is on enabling fast data writing, so it's commonly used in operational databases where we need to improve the data writing speed. With denormalization we are essentially
            • 171:00 - 171:30 creating redundant data in a table, and it's commonly used in analytical, OLAP scenarios where the focus is on enabling the best possible speed of data reading operations. How does denormalization look in reality? Let's say I have a fact table, fact sales, and three dimension tables: dim product, dim product subcategory and dim product category. So
            • 171:30 - 172:00 in this case let's say that I'm interested in the total sales amount per category name. I don't have this information in dim product or dim product subcategory, so what's going to happen? To retrieve the data for the category name I need to join the dim product table, then dim product subcategory, and then dim product category, and this is very often expensive when you need to read the data. What we do with denormalization is
            • 172:00 - 172:30 store this data within the dim product table, and essentially we create redundant data: you see that the value Women repeats in the subcategory name and the value Clothes repeats in the category name. So we create redundant data, but from the perspective of data reading operations this should be faster; not always, but in many cases it's faster.
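            For illustration, the denormalization step could look roughly like this in a notebook; the table and column names follow the slide but are assumptions.

            ```python
            # Hypothetical normalized dimensions, as on the slide.
            dim_product     = spark.read.table("dim_product")               # has ProductSubcategoryKey
            dim_subcategory = spark.read.table("dim_product_subcategory")   # has ProductCategoryKey
            dim_category    = spark.read.table("dim_product_category")

            # Copy SubcategoryName and CategoryName down into the product dimension,
            # accepting repeated (redundant) values in exchange for cheaper reads.
            dim_product_denorm = (
                dim_product
                .join(dim_subcategory, "ProductSubcategoryKey", "left")
                .join(dim_category,    "ProductCategoryKey",    "left")
                .select("ProductKey", "ProductName", "SubcategoryName", "CategoryName")
            )

            dim_product_denorm.write.format("delta").mode("overwrite").saveAsTable("dim_product_denormalized")
            ```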
            • 172:30 - 173:00 The next concept is aggregations. With aggregations the idea is to basically take one big, large fact table and aggregate the data on different attributes. In this oversimplified example I have just three attributes, a date, a customer ID and a product ID; in reality you probably have more than that. What's the idea? By aggregating the data we reduce the number of rows that the engine needs to scan. In this case, instead of having five
            • 173:00 - 173:30 rows (of course, in reality it will be many, many more), I have one table which aggregates data per date and contains two rows, and two more tables, one aggregating data per product and the other per customer, that contain three rows. So essentially we reduce the number of rows, and all the queries that retrieve data per product or per date can read the data from these
            • 173:30 - 174:00 smaller tables. That's the idea of creating aggregations in general. Now let's see how aggregations can be created in Fabric specifically: again we have three different experiences, lakehouse, warehouse and event house. In the lakehouse we do that using PySpark or Spark SQL, in the warehouse with T-SQL, and in the event house with KQL.
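            In a lakehouse that could be a short notebook cell along these lines; the fact table and column names are made up.

            ```python
            from pyspark.sql import functions as F

            fact_sales = spark.read.table("fact_sales")   # hypothetical: Date, CustomerID, ProductID, SalesAmount

            # Pre-aggregate per date and per product: far fewer rows for the engine to scan.
            (fact_sales.groupBy("Date")
                .agg(F.sum("SalesAmount").alias("TotalSales"))
                .write.format("delta").mode("overwrite").saveAsTable("agg_sales_by_date"))

            (fact_sales.groupBy("ProductID")
                .agg(F.sum("SalesAmount").alias("TotalSales"))
                .write.format("delta").mode("overwrite").saveAsTable("agg_sales_by_product"))
            ```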
            • 174:00 - 174:30 The next operation is merging or joining the data. Again we have those three previously mentioned experiences, and again the language of choice is more or less predefined for each of them, but this time, for merging and joining data, we can also leverage the low-code/no-code experience with Dataflow Gen2. So if you don't want to, or don't know how to, write code in any of these languages, you can use Dataflow Gen2 and
            • 174:30 - 175:00 then leverage its rich graphical user interface to perform merge and join operations. When you perform merge or join operations, and this one is really important from the DP-600 perspective, I'm pretty sure you will get a question asking which type of join you need to perform to obtain result X, Y, Z. So let's say we have
            • 175:00 - 175:30 two very simple tables, each containing a column Fruit. In the first we have apple, orange and banana, and in the second we have apple, orange and mango. If you do a left outer join, then all the values from the first selected table, the one we think of as being on the left-hand side, but really think of it as the first selected table in the query, will be included, plus the matching records from the table that we
            • 175:30 - 176:00 are joining; in this case, because banana and mango don't match, mango will be excluded from the result set. A right outer join is, as you may assume, the opposite: we take all the records from the second table and then the matching values from the other one. With an inner join, only matching records between both tables are included, so in this case, because banana and mango are not matching, they will be completely
            • 176:00 - 176:30 omitted from the result set. And finally we have the full outer join, which includes all records from both tables, where the mismatches are represented with a null or blank value.
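            If you want to see that behavior for yourself, the fruit example is easy to reproduce in a notebook; a minimal PySpark sketch:

            ```python
            left  = spark.createDataFrame([("apple",), ("orange",), ("banana",)], ["fruit"])
            right = spark.createDataFrame([("apple",), ("orange",), ("mango",)],  ["fruit"])

            left.join(right, "fruit", "left").show()    # all left rows: banana kept, mango dropped
            left.join(right, "fruit", "right").show()   # all right rows: mango kept, banana dropped
            left.join(right, "fruit", "inner").show()   # only the matches: apple, orange
            left.join(right, "fruit", "full").show()    # everything from both sides
            ```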
            • 176:30 - 177:00 Yeah, you guys have a crystal ball in the chat, I see. Of course, it's not hard to predict the next topic, because I'm just following the official exam study guide, so don't pay anyone money for that. Okay, identifying and resolving duplicate data. In a lakehouse we use PySpark or Spark SQL, and there is this dropDuplicates method. In this case I'm literally telling it: from this DataFrame, check the customer name and email columns, remove all the duplicates across these two columns, and then return those two columns.
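            As a minimal sketch (table and column names assumed):

            ```python
            customers = spark.read.table("customers")     # hypothetical table with possible duplicates

            deduped = (customers
                .dropDuplicates(["customer_name", "email"])   # keep one row per name/email combination
                .select("customer_name", "email"))

            deduped.show(10)
            ```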
            • 177:00 - 177:30 So with dropDuplicates I can remove duplicates. In the warehouse, this is one of the ways to do it, obviously there are several, but you can use a common table expression, partition by the ID and assign a ROW_NUMBER (we will examine the ROW_NUMBER function in more detail very soon), and then simply delete the records where the row number is greater than
            • 177:30 - 178:00 one. And in KQL you can use the arg_max function to find the maximum value and then summarize the data based on that value. Next, identifying and resolving missing data. Again, let's start with the lakehouse, where the tools of choice are PySpark and Spark SQL. You typically want to find
            • 178:00 - 178:30 the null values and replace them with something. In a lakehouse there are two functions, fillna and fill, which essentially return the same results, and you can see some of the code examples that I included here. fillna will simply take all the missing values, in this case in the City column, and fill them with the value Unknown.
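            Roughly like this (column names assumed):

            ```python
            customers = spark.read.table("customers")

            # Replace missing City values with 'Unknown'; df.na.fill(...) behaves the same way.
            filled = customers.fillna({"City": "Unknown"})

            filled.select("customer_name", "City").show(10)
            ```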
            • 178:30 - 179:00 In T-SQL we can use the COALESCE or ISNULL functions. They behave slightly differently: COALESCE searches for the first non-null expression, and you can provide multiple arguments to it, so it will literally first check the value of the Class column, and if Class is null it will proceed and check the Color
            • 179:00 - 179:30 column value; if that is also null, it will go and check the Product Number value. As soon as it finds the first non-null value it stops the evaluation and returns that first non-null value. ISNULL, on the other hand, checks the value in a specific column and replaces it with a specified value; in this case I'm checking whether the value in the Max Quantity column is null, and
            • 179:30 - 180:00 if yes, I'm replacing it with 0. In KQL we have a more nuanced choice: the isnull function, similar to SQL, returns a Boolean result, but only for non-string columns, so for numeric columns, while isempty has to be used if you want to check the result for string
            • 180:00 - 180:30 columns. So in T-SQL, ISNULL is implemented for all data types; in KQL, isempty is for strings and isnull is for numbers, and this table illustrates the difference between the two. Then, if you want to replace missing values, you can use series_fill_const, which replaces missing values in a series with a specified value that you define, and there is also a coalesce function that works exactly the
            • 180:30 - 181:00 same way as in T-SQL. Converting column data types can be done in multiple ways: again, a code-first approach in the lakehouse, warehouse or event house, or a low-code/no-code approach using Dataflows Gen2, and also within pipelines in the Copy data activity, where, as you saw, I can change the column data type while setting it up. In a lakehouse we use the cast operator to change the data type.
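            A small sketch of the code-first option in a lakehouse notebook (table and column names assumed):

            ```python
            from pyspark.sql import functions as F

            orders = spark.read.table("orders")

            # cast changes the column data type; withColumn overwrites the existing column.
            orders = (orders
                .withColumn("OrderQuantity", F.col("OrderQuantity").cast("int"))
                .withColumn("OrderDate",     F.col("OrderDate").cast("date")))

            orders.printSchema()
            ```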
            • 181:00 - 181:30 In a warehouse, remember, ALTER TABLE ... ALTER COLUMN is not supported, so you can't use it, but there is a workaround: you can use the system procedure sp_rename to rename the table and then CREATE TABLE AS SELECT with the new data types. It's a workaround, so you definitely don't need to know this for
            • 181:30 - 182:00 DP-600, but in real life it can be useful; altering the column type directly is not supported. In the event house there are two options, .alter column and .set-or-append. With .alter column it's as simple as it should be (but isn't) in a warehouse: you just specify the table and column name and the type that you want to change the column to,
            • 182:00 - 182:30 and .set-or-append, which creates a new table and preserves the existing data while changing the column data type. Yes, Fernando, there will be a break very soon, just a few more things to cover. Converting column data types: here we have the no-code approach, in Dataflows Gen2, oh sorry, in the pipeline Copy data activity, as you saw in
            • 182:30 - 183:00 the example I did previously, we can change the type of the column in the destination, and in Dataflows Gen2 you can obviously convert column data types the same way you do in Power Query. Filtering data: again, it depends on where your data is stored and where you perform the filter operations. If you do it in a lakehouse, there are two functions to be aware of: select, which
            • 183:00 - 183:30 returns a subset of columns, and filter, which returns a subset of rows. In a warehouse it's SELECT and WHERE, and in the event house project is the equivalent of select, and where, the same as in SQL, returns a subset of rows. Filtering data is also possible within the pipeline and within the dataflow. In a dataflow, again the same as in Power Query, you simply open the
            • 183:30 - 184:00 drop-down arrow and choose the values that you want, and in a pipeline there is a Filter activity that allows you to specify which rows should be loaded into Microsoft Fabric. Okay, we are done with transform data. Let's do a short break and then we
            • 184:00 - 184:30 restart with querying and analyzing data and proceed from there.
            • 189:00 - 189:30 okay welcome back from the last break there are no more breaks sorry about that but yeah uh we are slowly leaning towards uh final parts of this session and please stay here because we are
            • 189:30 - 190:00 going to play quiz and I promise this will be super fun first of all thanks Fernando for the for donation really appreciate that thank you thank you and please excuse if I missed someone's donation to to mention you but I really really appreciate that thank you so much okay so let's jump into final part for uh our prepare data uh topic which is query and analyze data so the first and the most straightforward way especially
            • 190:00 - 190:30 for people who are not coders is to use visual query editor that's a new feature that was introduced with Microsoft Fabric and here for example if you uh expand this new SQL query uh uh drop down menu you will see the option new visual query and this new visual query this is how it looks uh in fabric workspace so as you see it reminds of a data flow gen two and essentially it allows you to drag and drop tables and
            • 190:30 - 191:00 perform uh different transformations to the data by using well-known experience from data flows Gen 2 so you can think of visual SE visual query editor as a data flows on top of SQL tables that you use and you see merge queries with left join and so on choosing columns so uh doing some data cleaning and so on so this is visual query editor the next option is quering data with SQL
            • 191:00 - 191:30 obviously this session will not teach you SQL uh for that you either need to read a book or attend like ah multi-day course but yeah so let's focus on elements of SQL tsql language that is important from the perspective of uh dp600 exam so we will examine specific group of functions in tsql those analytical functions that you can expect to be tested on during dp600 so first
            • 191:30 - 192:00 group is window functions. Window functions consist of a few building blocks: the first one is the OVER clause, which defines a window; then we have the PARTITION BY clause, which is optional and breaks the rows into smaller subsets, so think of it as creating small virtual subtables within the main, big table; and finally we
            • 192:00 - 192:30 have ORDER BY, which, depending on which window function you are using, may be required or not. This is a very simple example of the ROW_NUMBER window function, let me zoom in a little bit. Each window function starts with the OVER keyword, and in this case I'm creating a partition for each customer key, so each customer key value in my big table will be considered a small subset of records, and everything that I perform from there
            • 192:30 - 193:00 will be applied to that small subset of records. In this case I'm also using the ORDER BY clause to sort the records in this subset based on the order date. The next group is ranking functions; there are four of them, ROW_NUMBER, RANK, DENSE_RANK, and the fourth one is NTILE, which I didn't include here. So, just to explain the difference between the results: when
            • 193:00 - 193:30 you see the results you will understand the difference in behavior between these three functions. With ROW_NUMBER, essentially what I'm doing here, let's explain this first, is that I created a partition for each customer; in this case I have a customer with ID 11330, I apply ROW_NUMBER, and I also order by the order date. So you see
            • 193:30 - 194:00 that in this case I have two orders which were made on the 24th of October 2013, and ROW_NUMBER simply increases the value by one; it doesn't take identical values into account. In this case these two values are completely the same, so it assigns the row number in a completely arbitrary way. With RANK and DENSE_RANK
            • 194:00 - 194:30 it is different. With RANK I have these two records on the 24th of October and then the next one on the 4th of November. So what happens in this case? With ROW_NUMBER, again, it's fairly simple. With RANK, because these two are exactly the same from the perspective of the order date, and we are not doing any other check, we just look at the order date, these two are
            • 194:30 - 195:00 treated equally: this one is number six and the next one is also number six. But what I want you to focus on is the next value. The next value starts with eight; there is a gap in between because we have two records with the same value, so there is a gap of one and the sequence continues with number eight. If I had three values with the same date here, the 24th of
            • 195:00 - 195:30 October 2013, then it would be 6, 6, 6 and then 9, because there would be three records with the same value. That's RANK. If we move to DENSE_RANK, it's the same as RANK, the only difference being that with DENSE_RANK the values continue from the next number, so it doesn't matter whether you have
            • 195:30 - 196:00 two, three, five or 100 records with the same dense rank value; the next one will start without the gap, so there is no gap with DENSE_RANK. Please remember this difference, this is 100% tested on DP-600: the difference and the expected result set. Still on querying data with SQL, finally we have the offset functions. There are four of them: LAG, LEAD, FIRST_VALUE and LAST_VALUE. Again we
            • 196:00 - 196:30 are creating a subset of values; in this case I'm partitioning the data per customer, so this is my partition number one, customer ID 11000, this is partition number two, partition number three and so on. The LAG function simply retrieves the value from the previous row, it's as simple as that, whereas LEAD does exactly the opposite: LEAD takes the next value,
            • 196:30 - 197:00 the value from the next record, and puts it here, that's the next order versus the first order. FIRST_VALUE will simply take the first value of the partition and assign it to all the records within this partition, within this subset of records, and LAST_VALUE does the opposite, it takes the last value and places it on all three records here. So these are FIRST_VALUE and LAST_VALUE.
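            The same partition/order mechanics exist in PySpark as well, which can be a handy way to practice the differences between these functions; a rough sketch against a hypothetical orders table (the table and column names are assumptions):

            ```python
            from pyspark.sql import functions as F
            from pyspark.sql.window import Window

            orders = spark.read.table("fact_orders")   # hypothetical: CustomerKey, OrderDate, SalesAmount

            w = Window.partitionBy("CustomerKey").orderBy("OrderDate")

            ranked = (orders
                .withColumn("row_num",      F.row_number().over(w))    # ties broken arbitrarily, no gaps
                .withColumn("rnk",          F.rank().over(w))           # ties share a rank, gaps follow
                .withColumn("dense_rnk",    F.dense_rank().over(w))     # ties share a rank, no gaps
                .withColumn("prev_amount",  F.lag("SalesAmount", 1).over(w))    # previous row in the window
                .withColumn("next_amount",  F.lead("SalesAmount", 1).over(w))   # next row in the window
                .withColumn("first_amount", F.first("SalesAmount").over(w)))    # first row of the partition

            ranked.show(20)
            ```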
            • 197:00 - 197:30 With KQL, we use an item called a KQL queryset, which represents a collection of one or more KQL queries, and the main benefit of the KQL queryset item is that you can easily share queries with others. Be aware of the explain keyword: if you are a SQL person and you don't know KQL, you write your SQL,
            • 197:30 - 198:00 you prefix it with explain keyword and it will translate your SQL to kql statement so the the statement that I wrote here this is tsql and this is the equivalent written in kql so with explain you get kql translation pretty cool you can write this SQL without this explain thingy so you can write this SQL to quer the data from
            • 198:00 - 198:30 kql uh but that's why I put this butt you shouldn't do that kql is not built to uh uh to work to use tsql to quer the data there are many many many functions in kql that are optimized for working with uh uh streaming data that we stored in kql databases so generally you want to use kql tsql should be avoided although it can be used it should be avoided and
            • 198:30 - 199:00 as a uh backup plan you can always use this explain uh and to to translate this to uh to kql I want to show you a few examples of uh let me see if I have some queries no I don't have here maybe if I go here and then warehouses just to show a few examples of uh
            • 199:00 - 199:30 queries this is a visual query I created I think yes so this is an example of visual queries and let me find I'm not sure where I stored aha I have views I have store procedures uh then let's go and show it's better to show than uh kql queries tsql is tsql I already show in these slides a few examples of uh tsql tsql functions that
            • 199:30 - 200:00 you need to learn and understand for uh uh for dp600 I want to show you here what I have in my kql database so uh this is my fundamentals Ms fabric uh art uh uh RTI this is an event house and in this event house I have one kql database as as of today and here this one is a query set that's the the uh item that I mentioned previously so this one allows
            • 200:00 - 200:30 you to store a number of different queries and then you can share this item with the others uh within the organization so this one will simply count records in a table then this one for example will filter the data so just uh give me the data where the state equals New York and event type is winter weather and include only those columns
            • 200:30 - 201:00 remember when you want to filter the data project is used to uh filter out the columns that you want in this case I'm just returning five columns here we are creating a calculated column remember it's simply calculating here in project and I'm returning just top 10 records using summarize operator to show me number of Records grouped by each
            • 201:00 - 201:30 state so for each state I'm counting the number of Records I can also visualize results in this case I'm using bar chart this one is conditional count uh this one uses uh Max function to uh aggregate data and calculate the the maximum number of uh injuries per each state and also uh I'm renaming the column so in this case I'm using Al allias uh
            • 201:30 - 202:00 injuries direct and I'm returning just two columns with project project keyword that's these are some very basic examples of kql kql queries okay cool let's move to the final topic and it's Implement and manage semantic models for all of you powerbi people this one will probably be the favorite so let's let's jump in and see what's in there two main uh the this part uh uh was not changed almost at all
            • 202:00 - 202:30 uh uh as of 15th of November so let's start with designing and building semantic mods you see the subtopics uh that are that are measured skills that are measured within this design and build semantic Ms so let's first examine the feature which is called direct Lake but before we examine it let's take one step back and understand the powerbi architecture from the perspective of someone who doesn't
            • 202:30 - 203:00 use Fabric or didn't use fabric so what we had before fabric we have uh sorry not what we had before fabric but what we have still but it was relevant also before fabric so it's not it's not that those two are not relevant anymore apologize for this that so we have first and default uh Choice when building our parbi semantic models is uh
            • 203:00 - 203:30 import mode uh and as Microsoft suggests and as we witnessed many times in real life import mode should be a default option and uh I like to quote Chris Webb who says you should use uh you should always use import mode unless you have a good reason not to so import mode should be a default Choice when designing uh designing powerbi models especially before this direct L uh became a thing
            • 204:00 - 204:30 So with import mode, we have some data sources here on the left-hand side, those tables, you can think of them as SQL tables, Excel files, SharePoint lists, whatever, and then we take this data and import it into the Power BI model. This cube here in the middle is the Power BI model; that is a database called VertiPaq, where Power BI stores the data. And then we have Power BI reports
            • 204:30 - 205:00 all the way on the right, and when your user interacts with the report, those visuals generate DAX queries in the background, and those DAX queries are then executed against this VertiPaq database, against this local copy of the data that is stored in Power BI. These queries don't touch the original data source. From the perspective of import mode, the best thing is blazing-fast
            • 205:00 - 205:30 performance: we are talking about VertiPaq, a columnar, in-memory database which stores the data in a highly compressed format, so these queries are usually super fast and served from memory. But we have, let's say, two downsides when using import mode. The first is data duplication, because essentially the data we have here on the left-hand side is literally copied, physically, into the Power BI database, so we have two versions of the same data, we
            • 205:00 - 205:30 have we have the same data at two different places and the other problem let's say is data latency why because when you import the data into powerbi you are taking a snapshot of the data from the the data source at certain point in time so now it's 8:15 in salsburg let's say that I imported my data 815 into powerbi database here as long as I don't bring new data
            • 205:30 - 206:00 fresh data from the data source here my Dex queries will retrieve only those only this data that is stored in powerbi database meaning if I brought 10,000 Rec records at 8:15 and then in next 15 minutes I have new thousand new records coming into my uh data source these new thousand records will not be reflected in my
            • 206:00 - 206:30 powerbi model until I refresh it and then I pull another snapshot and bring what is currently stored then at that moment in a data source so this is import mode to resolve some problems of import mode uh we have direct query as the opposite of import mode with direct query no data is stored in power by database so in this case this cube in the middle this vertie
            • 206:30 - 207:00 database stores metadata only so information about the tables columns data types and so on but physical data is not stored here and that means when your user clicks on the visual in the report and generates Dex query that this Tex query will be translated to a SQL and query will be executed against the data source directly which brings us to important conclusion this time there is no data
            • 207:00 - 207:30 latency because whenever user interacts the with the report the visual will generate the query and that query will retrieve what is the latest data at that moment in the data source okay so there is no data latency there is no data application because data stays here it's not retrieved and it's not stored in powerbi database it's just taken during the query execution and populates the visual but it's not stored here so there
            • 207:30 - 208:00 is no application and then the question is yeah why because uh direct query is so cool there is no data duplication there is no data latency why why don't we use uh direct query always because of this and I'm sure all of us whoever use direct query can confirm that yeah uh if you don't talk about few hundreds or few thousands of rows as soon as your data scales you will run into issues with
            • 208:00 - 208:30 performance and this is really something very often which is not acceptable for end users so that's why we are still using import as our default choice and now with fabric we have a new option which is called direct Lake so direct Lake tries to uh uh to leverage all the best things about direct quer and import mode and so the idea is to bring the performance of import mode or very close
            • 208:30 - 209:00 to import mode and avoid data duplication and data latency uh the same as in direct query so what happens with direct Lake it's from conceptual point of view it's same as direct query so user interacts with the report report generates Dex query Dex query is executed directly against data stored in one Lake remember we store data in Delta files in Delta tables in one lake so these queries will be executed directly
            • 209:00 - 209:30 against those files but because Delta or let's say par which is the native format be uh uh in the background under the hood of Delta par stores data in a way which is very similar to this very very similar so it's colum structure and because of that uh powerbi engine is capable of quickly transcoding this data and putting it
            • 209:30 - 210:00 here in its inmemory optimized storage so we are basically getting performance similar to import mode sometimes the same as in import mode depends on different things but let's say similar to import mode with the concept of no data duplication data still stays here physically powerbi just loads certain columns that are needed by query it loads them on demand here in its uh in its vertie uh database
            • 210:00 - 210:30 so that's direct Lake in a nutshell what are the prerequisites for direct Lake uh first fabric capacity or powerbi premium so if you are using premium user you are out of luck so fabric e either F or P license Lakehouse with a SQL end point for direct query fallback we're going to examine this in a few minutes or Warehouse so direct Lake works on both Lakehouse and
            • 210:30 - 211:00 Warehouse Delta tables so if you store data in other format uh than Delta no direct cck there it's not possible and then we have this V ordering where I intentionally put this asterisk at the end because it's not a hard prerequisite it but essentially it will uh it will ensure the best possible read performance uh when your Delta tables are V ordered without going too deep into explaining uh what V ordering is
            • 211:00 - 211:30 think of it as a fabric specific way of additionally optimizing data in parquet files uh during the data writing process so all the engines when they write data into one Lake they they will by default apply this reordering it's kind of reshuffling the data sorting in specific order so once the engine needs to read the data it's already sorted it's quicker to to retrieve the data that's the logic behind reordering some
            • 211:30 - 212:00 limitations uh as of today again uh uh quering one single Lakehouse or Warehouse meaning if you want to combine data from multiple louses it's not possible but you can use this concept of shortcuts remember so I have a Lous a and I have a Lous B let's say in my semantic model direct L semantic model I need tables from both louse A and B I can't do that out of the box but I can
            • 212:00 - 212:30 create a shortcut in my lakehouse A which points to a table in lakehouse B, and from there that table will be accessible from lakehouse A, and the Direct Lake model can read the data from there. T-SQL views are not supported, we already mentioned that; you can include them, you will not get an error or anything like that, you can include them in a Direct Lake model, but all the queries that get
            • 212:30 - 213:00 the data from those views will fall back to DirectQuery. DAX queries that exceed limits based on the size of your Fabric capacity, or that use any unsupported features, will automatically fall back to DirectQuery. No DAX calculated columns: I don't know if this is good or bad, but if you rely on implementing your calculation logic with DAX calculated
            • 213:00 - 213:30 columns, then again you are out of luck, it's not supported in Direct Lake. No composite models, at least not yet, so it's Direct Lake all or nothing: you can't use three tables in Direct Lake and two tables in import mode, and so on. And no relationships on datetime columns; just to be clear, you can have tables in your model with datetime columns, but you can't create relationships on those columns. Again, I've included a link which lists
            • 213:30 - 214:00 all the current limitations and make sure always to to double check uh what's currently uh uh up to date because yeah this list used to be way longer previously when I when I did uh uh uh explanations for direct Lake now uh some of these limitations are lift lifted so yeah let's talk about relationships and how to implement them uh first of all uh unlike in traditional relational
            • 214:00 - 214:30 databases they have in semantic models in part by semantic models they have additional role or additional uh uh purpose to serve as filters uh when we are executing Dex calculations filters are being propagated through relationships so let's first examine the concept of cardinality in relationships and there are four possible options one to many many to one one to one and many to many so let's start with one to many many to
            • 214:30 - 215:00 one the one side means that the column contains unique values whereas the many side means that the column may contain duplicate values one to many and many to one are essentially the same from cardinality point of view the only difference is where the relationship starts from and I'll give you a very simple example let's say that we have two tables in our model product and sales and there is a product key column in both tables and we create a relationship between two tables
            • 215:00 - 215:30 on that column however since the product key in the product table contains unique value if we Define a relationship from this column to the product key column in the sales table this will be one to many relationship if we do the opposite it's going to be a many to one relationship since there can be many product key repeating values in our sales table one to many should be your Preferred Choice when you're designing Power by semantic
            • 215:30 - 216:00 models the no discussion about that so we we can uh talk about yeah uh that in more detail in some other sessions but one to many for multiple different reasons should be uh Preferred Choice whenever possible in powerbi semantic models onet toone relationships are usually a sign of suboptimal model design and should be reconsidered whenever possible so try to avoid one to one it's usually a sign that you didn't do a proper data modeling many to many
            • 216:00 - 216:30 relationship are also considered as exceptions rather than a rule and there are certain scenarios where using the many to many relationship would make sense such as connecting fact tables with different grain or connecting uh many to many dimensions and a good example of this should be uh let's say we have a model that contains two tables customer and account and one customer can have multiple accounts also one account can be shared by multiple
            • 216:30 - 217:00 customers in this scenario a recommended practice in powerbi semantic modeling is to create a so-called Bridge table which stores unique combination of each customer and account and then we create two one to many relationships between our Dimension tables and the bridge table okay another important concept when designing relationships is filter Direction uh there are two possible
            • 217:00 - 217:30 options to choose from single Direction which is a recommended practice and filter in both direction or bir directional filter this option has its use cases but should be implemented with special care uh the arrow on the relationship reveals the filter Direction in this case we are talking about the single filter Direction going from the product category table then to the product table and then all the way to the sales table this means we can filter values in both
            • 217:30 - 218:00 product and sales Tables by selecting a value in the product category table but not the other way around okay but if I set filter direction to both then we would be able to propagate uh filter values in both directions again bir directional relationships should be used with special care because they can cause inaccurate results and performance issues in case there is a
            • 218:00 - 218:30 need for specific calculation to leverage bir directional filtering the recommended practice is to use to implement it by using cross filter Dex function for that specific calculation only and then with cross filter function you can open the relationship in both directions just for a single calculation instead of keeping it open for uh all all use cases okay so next concept is Dex
            • 218:30 - 219:00 Okay, so the next concept is DAX variables, and this one is sometimes overlooked by new users in Power BI. DAX variables have three main advantages: the first one is to reuse expression logic, the second one is to improve performance, because they are evaluated by the engine only once instead of multiple times, and they also make it easier to debug your DAX code. This is an example of using variables versus not
            • 219:00 - 219:30 using variables, and in this very basic example you see how much easier it is to understand and read the first statement on the top versus the second one, where I'm basically repeating the same thing multiple times. So use variables whenever you can.
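            A minimal sketch of the same idea (measure and table names are assumptions): the variable is evaluated once, reused twice, and easy to inspect while debugging.

            ```
            Sales YoY % =
            VAR SalesPY =
                CALCULATE ( [Sales Amount], SAMEPERIODLASTYEAR ( 'Date'[Date] ) )
            RETURN
                DIVIDE ( [Sales Amount] - SalesPY, SalesPY )
            ```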
            • 219:30 - 220:00 Iterator functions are called iterators because they iterate over a table, evaluate an expression for each row, and then finally aggregate the result. So how do you recognize them? They all end with X. They are relatives of scalar functions like AVERAGE, COUNT, SUM, MIN and MAX, so AVERAGEX, COUNTX, SUMX and so on: those are iterator functions. They iterate over each row, calculate the expression, and finally aggregate the result. The aggregation depends on what you want to achieve: an average value, a sum, or
            • 220:00 - 220:30 whatever. Keep in mind that when you write a DAX expression like 'total sales equals SUM of a column within a table', this is internally translated to the SUMX version down below. So even though you don't explicitly write it, the engine will translate the expression like this. And the most important thing, please keep this in mind for DP-600 but also for your work:
            • 220:30 - 221:00 whenever you need to calculate an expression that includes more than one column, you must use an iterator function. You can't use scalar functions if your expression includes multiple columns.
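            A minimal sketch with assumed column names: SUM over a single column is internally rewritten as SUMX over the table, and as soon as the row-level expression touches two columns, only the iterator version is possible.

            ```
            Total Quantity = SUM ( 'Sales'[Order Quantity] )
            -- ...is internally equivalent to:
            Total Quantity X = SUMX ( 'Sales', 'Sales'[Order Quantity] )
            -- Two columns in the row-level expression: an iterator is required.
            Total Sales Amt = SUMX ( 'Sales', 'Sales'[Order Quantity] * 'Sales'[Unit Price] )
            ```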
            • 221:00 - 221:30 Let's now talk about window and info functions. Window functions: we already examined some of them in the SQL world, but they are a relatively new enhancement in the DAX language. Similar to SQL, they provide the possibility to calculate specific expressions over a sorted and partitioned set of rows, so things like sorting customers by total amount spent, comparing the sales from the current year and the previous year, or calculating moving averages and running totals. For those use cases, window functions are a perfect fit. Similar to their SQL
            • 221:30 - 222:00 relatives, they also rely on two arguments, PARTITIONBY and ORDERBY. PARTITIONBY determines the window itself, same as in SQL, so the subset of rows we want to apply our calculation to, whereas ORDERBY defines the sorting order of the results within that window. Let's first introduce the INDEX function, because that's the most
            • 222:00 - 222:30 straightforward one. This function simply returns a particular row of the table based on its position. For example, if you want to return the brand with the highest sales, you can use the INDEX function and specify number one for the best-performing brand; in case you want to return the second, you simply switch this number to two, and so on. In this example I haven't used the PARTITIONBY clause, which means that we perform our calculation on
            • 222:30 - 223:00 the entire table. Now let's imagine that we need to identify the highest-selling brand within each category, so not the overall highest-selling brand, which we already calculated, but the top performer in each category, such as audio, computers, accessories and so on. PARTITIONBY will create smaller subsets of rows, and then we perform our calculation in the scope of each of these smaller subsets of
            • 223:00 - 223:30 rows. That is INDEX. Then we have OFFSET. OFFSET is similar to the LAG and LEAD functions in SQL: with minus one it will return the value from the previous row, if you do minus two it will go two rows back, and so on; if you use positive values instead of negative ones, it will move forward and return the next values.
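            Minimal sketches rather than the exact workshop measures (table, column and measure names are assumptions, and the exact result depends on the evaluation context):

            ```
            -- INDEX: the best-selling brand, i.e. position 1 when sorted by sales descending.
            Top Brand =
            VAR BrandSales =
                ADDCOLUMNS ( ALLSELECTED ( 'Product'[Brand] ), "@Sales", [Sales Amount] )
            RETURN
                MAXX ( INDEX ( 1, BrandSales, ORDERBY ( [@Sales], DESC ) ), 'Product'[Brand] )

            -- OFFSET: the sales of the previous date in the visual, used as a CALCULATE filter.
            Sales Previous Day =
            CALCULATE (
                [Sales Amount],
                OFFSET ( -1, ALLSELECTED ( 'Date'[Date] ), ORDERBY ( 'Date'[Date], ASC ) )
            )
            ```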
            • 223:30 - 224:00 The WINDOW function is probably the most complex one. We have a basic table which shows the total sales amount, not for each year but for each customer, sorry. I'm analyzing two customers here from the Adventure Works data set, Adam Young and Alexander Jenkins, and here is the summary of their orders and total sales amount. The first concept to understand is that we want to treat each customer as a separate entity,
            • 224:00 - 224:30 meaning we want to create a window for each customer and then analyze the figures for that specific set of rows. In our case we will have two windows: my first window is customer Adam Young, my second window is customer Alexander Jenkins. Now, the window boundaries can work in two different ways: either relative to the current row or as an absolute position. In my case I always want my window to start
            • 224:30 - 225:00 at the first row of each partition, so each customer is a partition, and run up to the current row, so this measure allows me to calculate a running total of order quantity and sales for every partition. Now, this looks super complicated, but it's not, trust me. Essentially what you need to take into account is how you create your partitions, how you set your
            • 225:00 - 225:30 arguments for relative versus absolute positions, and how many rows you want to go back and forth within the window. Those are the three key things to keep in mind with the WINDOW function.
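            A simplified sketch rather than the exact workshop measure (names are assumptions): a running total from the first row of the window up to the current row.

            ```
            Running Sales =
            CALCULATE (
                [Sales Amount],
                WINDOW (
                    1, ABS,                          -- from the first row of the window (absolute)
                    0, REL,                          -- to the current row (relative)
                    ALLSELECTED ( 'Date'[Year] ),    -- the set of rows the window runs over
                    ORDERBY ( 'Date'[Year], ASC )
                )
            )
            -- To restart the total per customer, include the customer column in the relation
            -- and add PARTITIONBY ( 'Customer'[Customer] ) as an extra argument.
            ```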
            • 225:30 - 226:00 Information functions are functions that can be used in multiple scenarios; the most common is when you need to obtain information about the data type or the filter context. There are many information functions, but some of the most commonly used are CONTAINSSTRING, which returns TRUE if one text string contains another text string, and HASONEVALUE, which evaluates to TRUE if there is only one value in the specified column. We also have another group of information functions, recently introduced, that start with the keyword INFO followed by a dot; with those you can obtain a bunch of useful information about your semantic model and its core elements, such as tables,
            • 226:00 - 226:30 columns, measures, calculation groups, data sources and many more.
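            Two minimal sketches (column names are assumptions): HASONEVALUE guarding a title measure, and an INFO.* function run as a DAX query, for example in DAX query view.

            ```
            Selected Product Title =
            IF (
                HASONEVALUE ( 'Product'[Product Name] ),
                SELECTEDVALUE ( 'Product'[Product Name] ),
                "Multiple products selected"
            )

            -- INFO functions run as DAX queries against the model, e.g.:
            EVALUATE INFO.TABLES ()
            ```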
            • 226:30 - 227:00 Okay, some more features that are relevant for DP-600: calculation groups. Instead of giving you a dry definition, I'll try to share a descriptive version. Imagine that you have a fact table with ten different metrics, ten different explicit measures such as sales amount, sales quantity, discount amount, return quantity and so on, so you go and create ten measures for these metrics. The next step is to provide your business users with additional information, for example: what was my sales amount in the previous quarter, how has my sales quantity developed year over year, or what is the running total of the return quantity in the selected period, and so
            • 227:00 - 227:30 on. So how do you achieve this? Of course, you'll write a measure to calculate the sales amount for the previous month, a measure for the sales amount for the previous quarter, a measure for the sales amount for the previous year, then you will write the same set of calculations for sales quantity, then for return quantity, and so on, and you quickly end up with 50 or 100 measures in your semantic model. And what's the deal? Most of them
            • 227:30 - 228:00 contain basically redundant logic: you're just copy-pasting the expression and switching the base measure, which is sales amount or sales quantity or whatever it is. Here is where calculation groups come to the rescue. With a calculation group you set a placeholder for the base measure, for example our sales amount, and then you define a set of calculation items that will be
            • 228:00 - 228:30 applied to that placeholder measure. I know it sounds a little bit confusing, so let me show you a simple example of the placeholder measure. The placeholder measure is defined by using the SELECTEDMEASURE function, and this function will evaluate whatever measure is currently in the context. Then we have one or multiple calculation items for the measure in the context; in this case I have two
            • 228:30 - 229:00 calculation items, one for the month-to-date calculation, which is on your right, and year-to-date on your left, and as you see the only difference is the CALCULATE filter modifier. There are no two separate measures anymore; it's just a single base measure in scope, we are talking about sales amount here, and these calculation items are simply applied to the measure in scope.
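            A minimal sketch of the two calculation item expressions (the calculation group and its items are created in the model explorer or Tabular Editor; the date table name is an assumption):

            ```
            -- Calculation item "MTD"
            CALCULATE ( SELECTEDMEASURE (), DATESMTD ( 'Date'[Date] ) )

            -- Calculation item "YTD"
            CALCULATE ( SELECTEDMEASURE (), DATESYTD ( 'Date'[Date] ) )
            ```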
            • 229:00 - 229:30 Okay, the next thing is field parameters, one of my favorite features in Power BI, so let's examine how to leverage this to push your Power BI solutions to a whole new level. In a nutshell, field parameters allow you to perform two actions: dynamically change the attribute for slicing and dicing the data in the visual, meaning dynamically switch between different columns, and dynamically change the metrics displayed in the visual, meaning
            • 229:30 - 230:00 dynamically switch between different measures. And I hear you: "Nikola, we could have done this before field parameters as well." Yes, that's true, but instead of DAX complexity, using TREATAS and all this exotic stuff, you can now set everything up with just a few clicks and without writing a single line of DAX code. So let me quickly show you a demo of field parameters.
            • 230:00 - 230:30 So how do you create field parameters? You first need to enable them in preview features; after more than two years they are still a preview feature, but they work well, and I can confess that I've implemented them in real life multiple times. So, under the Modeling tab, New parameter, choose Fields. But let's first build a visual so I can show you later how to put field parameters in. Let's say that I don't
            • 230:30 - 231:00 need this, and let's say I'm showing total sales amount, which I don't have but will create now: Sales Amount equals SUM of sales amount, and let's create a new one for quantity: Order Quantity equals SUM of order
            • 231:00 - 231:30 quantity. And let's say I'm showing here total sales amount by product color. Now, wouldn't it be cool if, instead of color, I could display my data based on, for example, the education of my customer? So instead of color I want the same calculation based on the customer's education. As I said, this was previously hardly achievable without some
            • 231:30 - 232:00 DAX, or using bookmarks and overlaying visuals and so on, so let me show you how easy it is to do now with field parameters. Again, going to Modeling, New parameter, and then Fields. Let's call this one "attribute switch", and then I'll choose my product color and my English education, and I can choose
            • 232:00 - 232:30 whether to add a slicer to the page or not; in this case I want a slicer to enable me to do this. And what happens when you create a field parameter: there is a new calculated table in your model, called attribute switch, and here is its definition, so you can change this and say, instead of "English education" I just want to call this "education". Now, when I click on this slicer, obviously nothing happens yet, but let's put
            • 232:30 - 233:00 the column from our field parameter table here on the x-axis, and now what happens: if I click on color it's color, now it's education, and so on. That's one thing; I already told you you can dynamically switch attributes between different columns. But wouldn't it be cool if we could also switch between sales amount and order quantity? Again, you can do that with field parameters, so I'll create a new
            • 233:00 - 233:30 one, let's call this one "KPI switch", and put in our order quantity and sales amount, and again leave this in a slicer. So now, instead of sales amount on the y-axis, I basically want to use this KPI switch value. Currently it's order quantity, now it's sales amount, now it's color and sales amount, color and order quantity, so full flexibility. I built this in three minutes: no DAX, no bookmarks, just field parameters. I love them.
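            Roughly the calculated tables Power BI generates behind those two field parameters (column and measure names are assumptions; each definition is its own calculated table):

            ```
            Attribute switch = {
                ( "Color", NAMEOF ( 'Product'[Color] ), 0 ),
                ( "Education", NAMEOF ( 'Customer'[Education] ), 1 )
            }

            KPI switch = {
                ( "Order Quantity", NAMEOF ( [Order Quantity] ), 0 ),
                ( "Sales Amount", NAMEOF ( [Sales Amount] ), 1 )
            }
            ```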
            • 233:30 - 234:00 Okay, going back to the presentation: dynamic format strings, another feature that can help you enhance the look and feel of your Power BI solutions. It helps you determine how measures appear in the visual. It was previously a cumbersome task to dynamically
            • 234:00 - 234:30 format, I don't know, currencies or whatever; now you can conditionally apply a format string with a separate DAX expression, and essentially it allows you to overcome the limitations of the FORMAT function. This is an example of implementing dynamic format strings: under Format for the specific measure you choose Dynamic, and then under Formatting you define the format you want to use. Just to show you how it looks in Power BI, let me go to my order
            • 234:30 - 235:00 quantity: here under Format I select Dynamic, and then I can define, for example, that I don't want the zeros displayed like this, and so on. So you can define how you want to format your measures with dynamic format strings.
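            A sketch of what such a dynamic format string expression can look like (the 'Country'[Currency] column is an assumption): the format follows whatever currency is in the filter context.

            ```
            SWITCH (
                SELECTEDVALUE ( 'Country'[Currency] ),
                "USD", "$#,##0.00",
                "EUR", "€#,##0.00",
                "#,##0.00"
            )
            ```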
            • 235:00 - 235:30 Large semantic models: working with large semantic models in Power BI. We will go over four hours for sure, so I hope you will stay with us. Before I show you how to enable or disable this option for your semantic model in a workspace, I just want to reiterate the different types of tables we are using in our semantic models. We have two main types of tables in our models: we have a tall table, with many rows and not so many columns; we usually build our fact
            • 235:30 - 236:00 tables to be narrow and long, so that's a tall table with many rows and not so many columns. Then we usually have dimension tables, which don't contain millions and millions of records; there are exceptions, but usually your dimension tables are smaller in terms of number of rows, yet they can be very wide, with, I don't know, 50-plus columns. Those are considered wide tables, with many columns and fewer
            • 236:00 - 236:30 rows, and Power BI is built to efficiently handle both of these scenarios. But if you build your table to be both tall and wide, to contain many rows and many columns, in other words if you use one big flat table, this is usually not a good idea, for Power BI at least. There are some tools out there that I'm sure work well with one big flat table; Power BI is
            • 236:30 - 237:00 not one of them, so you don't want tables that are both tall and wide in your models. Then, when you build these models, there is an option called enable large semantic model storage format. This one allows your model to go beyond the default limit, which is 10 gigabytes; essentially, if you don't enable this feature, your model cannot be larger than
            • 237:00 - 237:30 10 gigabytes, but if you enable it, your model can grow up to the maximum size defined by your capacity. If we talk about, for example, F64, which is equivalent to P1, the maximum model size is 25 GB. This doesn't mean you get, say, five models of five gigabytes adding up to that; each individual model can be up to 25 gigabytes with the large semantic model
            • 237:30 - 238:00 storage format enabled, so the limit is per model, not the sum of all models. Also, Microsoft suggests, and I didn't test it myself so I cannot confirm whether it's true or not, that enabling this feature will also help smaller models, so models of a few hundred megabytes, because it will improve XMLA write operation performance. In the semantic model settings there is a property
            • 238:00 - 238:30 called large semantic model storage format where you can turn it on, or you can do it in the workspace settings. The one on the left-hand side turns it on just for a particular semantic model, whereas if you do it on the workspace level, then all the models will be converted to use the large semantic model format. The idea is that with this, Power BI automatically handles semantic
            • 238:30 - 239:00 model eviction from memory, so if the model is not used and there are no operations on it, Power BI will automatically kick it out of memory, and also on-demand load, meaning only those columns that are needed by a DAX query will be loaded into memory. That's how this property works.
            • 239:00 - 239:30 Talking about the composite models feature in Power BI, there are two possible use cases: when you combine two or more different DirectQuery sources, or when you combine one or more DirectQuery sources with import mode. Please be aware that all imported data is considered one source, so never mind if you connect from Power BI to an Excel file, a SharePoint list and a SQL database and bring all this data into Power BI in import mode; from a composite model perspective that's one source,
            • 239:30 - 240:00 not three sources, and then every DirectQuery source is a separate source group within this composite model. You can see in this illustration that I have different types of arrows: some of them are regular arrows, some of them are dashed, and if you're wondering what these dashed lines represent, they represent so-called limited relationships. Limited relationships exist
            • 240:00 - 240:30 between tables that are in different source groups and storage modes. In this case I have one data source which is an Oracle database, one data source which is DirectQuery over a SQL Server database, and my imported data is considered a VertiPaq source. Whenever you establish relationships between tables that are in different source groups, they are considered limited relationships.
            • 240:30 - 241:00 Aggregations in Power BI: we already talked about aggregations in general when we talked about the lakehouse and warehouse. In Power BI, aggregations go one step beyond, because they allow you to define aggregations at the semantic model level so that your queries are automatically remapped, executed, and the results retrieved from those aggregated tables. The key thing here is to make Power BI aware of the aggregated tables. Once you create
            • 241:00 - 241:30 your aggregated tables and put them in the model, that's not enough for Power BI to know about them, so you need to make Power BI aware of them, and this is achieved by using the Manage aggregations option: once you click on a table, there is a Manage aggregations option, and there you define how you want your aggregations to behave. Please keep this slide in mind, it is important, because you will
            • 241:30 - 242:00 for sure get a question on DP-600 about the best storage mode choice for a specific table type in your composite model. When you are creating aggregations in Power BI, the original table must be in DirectQuery storage mode; it's not possible to create aggregations in Power BI if the original table is not in DirectQuery. Then, dimension tables can be both
            • 242:00 - 242:30 import and DirectQuery, but from a performance point of view and to avoid the limitation of limited relationships that we mentioned previously, dimension tables should be set to dual storage mode; dual storage mode is the one with the dashed line on the top, this one here. So dimension tables should use dual storage mode, and then, if the query is served from the aggregated table they behave as import mode, and if the query is served from the DirectQuery table they
            • 242:30 - 243:00 behave as DirectQuery tables, and you will have regular relationships between those tables. And the aggregated tables, the ones where we aggregate data, should be in import mode; technically you can put them in DirectQuery, but they should be in import mode, because the overall idea is to improve performance, and we already learned that performance is significantly better with import mode compared to DirectQuery.
            • 243:00 - 243:30 We are done with designing and building semantic models, and one last thing: optimizing enterprise-scale semantic models. Let's first talk about who complains about the performance, so we know where to look for the problem. If it's your DBA or someone from your IT department, your issues are probably somewhere in the background, and we are talking about data model size or the data refresh process. If
            • 243:30 - 244:00 your end users, your final report consumers, are complaining about the performance, the problem is usually somewhere on the front end, and most commonly it's either DAX measures or visual rendering time. Luckily, Power BI provides a built-in tool called Performance Analyzer in Power BI Desktop that helps you capture key metrics of report performance. What does Performance Analyzer capture? Three items:
            • 244:00 - 244:30 DAX query, which refers to the time needed to return the query results; visual display, which represents the time elapsed for visual rendering; and the final one, which is the trickiest one, called other. Other can mean many things, but in 99% of cases, when the value for other is high, it means that the specific visual had to wait in the
            • 244:30 - 245:00 queue for all the other processes and queries to complete, because don't forget, the formula engine in Power BI works in a single-threaded way. So if you have 20 different visuals on your report page, they will be executed in sequence, not in parallel, and that means the poor 20th guy has to wait for the previous 19 to complete before it gets its turn.
            • 245:00 - 245:30 So how does it look from the Power BI perspective: you can turn on Performance Analyzer from the Optimize tab or from the View tab, then you click Start recording, and once I refresh the visuals you see that, okay, in this case it's super fast because it's simple and I don't have too many visuals. Because we have field parameters, the parameters are also evaluated, but these are the three metrics that I mentioned: DAX query,
            • 245:30 - 246:00 visual display and other. This "other" means that this visual here had to wait 25 milliseconds for all the other processes on the page to complete before this one could be executed. You can sort the results per different items, let's say per total time, and then I can sort in descending order, and this means, for example, this one had to wait 69
            • 246:00 - 246:30 milliseconds, and so on. So Performance Analyzer is a great tool to give you a first hint about what the problem with the report page is, and from there you can explore and try to troubleshoot performance using many different techniques. Eugene has a course on Power BI performance tuning, it's great, so I encourage you to check it out, and you will learn how to optimize
            • 246:30 - 247:00 most of these scenarios. If you have many visuals on the report page, the page will obviously render slower; if you reduce the number of visuals, it will perform better. So among the most common bottlenecks, if we talk about the visuals: reduce the number of visuals, and more often than not, keep it simple; simple is
            • 247:00 - 247:30 better, so don't try to over-impress with too many visuals. Reducing the number of visuals will speed up the report and remove the clutter from the report page, so it's a win-win situation. For DAX queries I don't have good news: you need to learn and understand DAX, and we cannot cover that here, but you know what the best sources are: go to sqlbi.com or check the book The Definitive Guide to DAX, learn and understand how DAX works, and then you will be able
            • 247:30 - 248:00 to optimize your DAX queries. A few more words about the data refresh process in Direct Lake: internally this process is called framing. Essentially, with Direct Lake there is no classic refresh like with import mode models. With import mode models, Power BI will literally pull the physical data from your data source and bring it into its
            • 248:00 - 248:30 local database, called VertiPaq. With Direct Lake it's not pulling any physical data; it's just a metadata refresh process, and that means that even for large tables with hundreds of millions or even billions of rows, this process of refreshing data should be very fast, maybe 20 to 30 seconds max. This is the illustration of how the process works: what I have here is a lakehouse with a Delta table, and I have
            • 248:30 - 249:00 different versions of the files for this Delta table; let's say there were four versions of files for this table so far, and this is my Direct Lake semantic model. Currently, what Power BI will do is create a frame and take the files that are currently relevant for the semantic model. Then let's say a new file pops up; this one is not included in my
            • 249:00 - 249:30 semantic model, so what happens: Power BI will first kick everything out of memory and then extend the frame to include the metadata information about this latest file that came in. This is the process of refreshing data in Direct Lake: no real data from these files goes there, it's just metadata,
            • 249:30 - 250:00 like "I have this version" and some information about data types and so on. This option is turned on by default, meaning Power BI will check every minute or so if there are new records in the underlying table and will synchronize the data with the Direct Lake semantic model. But maybe you want to turn this off for different reasons and, instead of it happening automatically, do
            • 250:00 - 250:30 it with more control, for example once your ETL process is done you refresh your semantic model using pipelines, like we did at the beginning of this session with the semantic model refresh activity.
            • 250:30 - 251:00 One last thing that I want to show you is the Direct Lake behavior property. There are three options to choose from. We were talking about falling back to DirectQuery; remember, I mentioned multiple times that if for whatever reason Direct Lake cannot be achieved, by default the query will fall back to DirectQuery mode and be executed as a regular DirectQuery query against the SQL analytics endpoint. So this is the default behavior if you set this property to Automatic: it means DirectQuery is used if Direct Lake cannot be. You can change this to Direct Lake only, meaning if for whatever
            • 251:00 - 251:30 reason Direct Lake is not possible, you will get an error. And finally DirectQuery only, which, as its name says, means no Direct Lake happening at all. How do you set this? Let me go to my workspace and find some Direct Lake model; let's close this one, and for example I think I have something
            • 251:30 - 252:00 here. I'll click on the three dots and then Open data model, and then here, under the model, it's Direct Lake. I forgot where to find it: tables, relationships, tables... it should be somewhere
            • 252:00 - 252:30 here. Ah, here it is: I need to click on the semantic model here at the top, and then Direct Lake behavior. As you see it's Automatic; it can be Direct Lake only, meaning no DirectQuery, or DirectQuery only, which means the opposite, there is no Direct Lake. So you can change this behavior
            • 252:30 - 253:00 here. Okay, and then let's go back to the presentation.
            • 253:00 - 253:30 Incremental refresh, I think that's the last topic, a way to optimize Power BI workloads. The idea of incremental refresh in general, so not talking specifically about Power BI now, is that we want to synchronize data. We have data coming in from applications and everything else into our transactional database, and we want to synchronize the data between the transactional database and our central storage location, let's say our data warehouse, so we're talking about things before Fabric. If we load the data from the entire table each and every time, this is time- and resource-consuming, so we want to avoid that scenario, and the idea behind incremental refresh is to check when the
            • 253:30 - 254:00 data was last loaded into our central storage repository. If that was, for example, yesterday at midnight, I don't want to touch the records that were already loaded; I just want to load those records that came in after yesterday midnight. In Power BI, incremental refresh works in a slightly specific way:
            • 254:00 - 254:30 all Power BI tables consist of a single partition, so it doesn't matter if your table has two rows or two billion rows, by default it's always a single partition. With incremental refresh, the idea is to split this partition, or split this table, into two partitions: one partition which contains rows that don't change, and another one which contains
            • 254:30 - 255:00 rows that are changing, so the more recent records. This is the incremental refresh workflow that happens in Power BI: you see that what was previously real-time data becomes, after one day, incremental refresh data, and this is the so-called rolling window pattern that is implemented during the incremental refresh process. This process just goes on every single day, moving the window one step
            • 255:00 - 255:30 at a time. Some of the prerequisites: you need to have a date column in the table where you're implementing incremental refresh, and this date column can be either datetime or integer data type; query folding must be in place, query folding is one of my favorite topics in Power BI but it's out of scope here and not really tested on DP-600, but keep in mind that query folding must be in place because incremental refresh uses date range
            • 255:30 - 256:00 parameters, and they must be translated to a WHERE clause in the SQL query. Also, you can define incremental refresh only on a single data source, so you can't combine multiple data sources here. Okay, so we are done with all the skills measured, and now I hope you are ready to play some games.
            • 256:00 - 256:30 Special thanks to my friend Ricardo Ron from Barcelona, and I'll explain why. I expected that a lot of people would be here, and unfortunately my license for creating quizzes supports up to 100 participants, so I asked last evening if someone had a license for a larger number of participants, and Ricardo immediately reached out and offered his Kahoot license. Therefore huge thanks to Ricardo for basically enabling us to play
            • 256:30 - 257:00 this quiz. Let me just quickly go and, okay, there you go. Just to give you a short hint: we have a first question which is like a warm-up, just for you to see how it works, and after that we have 10 questions with points, so
            • 257:00 - 257:30 yeah, all the things that we covered in the previous four hours will be included. I hope you are familiar with Kahoot: you can scan the QR code on your screen, or you can go to kahoot.it and enter this game PIN. I'll give you a few minutes for everyone
            • 257:30 - 258:00 to join. You don't need to provide your real name if you don't want to; feel free to be fabric ninja or fabric master or
            • 258:00 - 258:30 whatever. Come on, I want to see at least 101 participants, because of the license, so I don't feel bad about bothering
            • 258:30 - 259:00 Ricardo
            • 259:00 - 259:30 for for
            • 259:30 - 260:00 okay
            • 260:00 - 260:30 Then I'll slowly start. The first question is just a warm-up for you to understand how Kahoot works; there are no wrong answers, so
            • 260:30 - 261:00 whatever you mark will be a correct answer, but no points. Yeah, great point from Shannon: this game is based on both speed and accuracy, just for your information. Yes, that's a great point, I forgot to mention that. Essentially, the most important thing is that you provide the correct answer, but after that it also matters whether you're fast or
            • 261:00 - 261:30 not. Okay, so now I hope you are ready, let's play question number one. Apologies, the questions are displayed as photos because of the character limitation for the question
            • 261:30 - 262:00 itself, but I will read it aloud so everyone can understand. You have a Fabric workspace that contains a Power BI report. You need to modify the column names in the Power BI report without changing the original names in the underlying Delta table. What warehouse object should you create: columnstore index, table-valued function, schema, or view?
            • 262:00 - 262:30 Okay, most of you answered correctly: it's a view. With a columnstore
            • 262:30 - 263:00 index we don't change column names, and we don't use table-valued functions or schemas to change a column name without changing the original names; for that we use views. Okay, we have a lot of correct answers; let's jump straight into question number two. You need to ingest data into a lakehouse from a large SQL database table
            • 263:00 - 263:30 that contains 500 million plus records. Data should be ingested without applying additional transformations; your solution must support a low-code approach and minimize effort. What should you use: Copy data activity, Dataflows Gen2, a notebook, or a SQL stored
            • 263:30 - 264:00 procedure? Great point from Pam: all of you getting the answer right, go schedule your exam, yes, take it before New Year. Okay, again most of you got it right,
            • 264:00 - 264:30 and that's why I love using these quizzes, because it helps me understand whether I explained certain concepts properly or not. I'm happy that just two of you answered notebooks and SQL stored procedures: the question says your solution must support a low-code approach, and notebooks and SQL stored procedures are code-first approaches. Then it boils down to Dataflows Gen2 versus the Copy data activity, and in this case it
            • 264:30 - 265:00 says without applying transformations, so because we want to minimize effort, Copy data is the way to go; the primary use case for dataflows is when we need to apply transformations. Okay, let's move on to the next one, number three. You have the report Sales with an
            • 265:00 - 265:30 Excel file as a data source. The table contains the following columns: product ID, product color, product name, product category and sales amount. You need to create a star schema model; which columns should remain in the fact table? Data modeling 101.
            • 265:30 - 266:00 okay most of you got it right I'm happy
            • 266:00 - 266:30 because of that congratulations no need to explain anything let's move on to number four you plan to implement a dimensional model in a warehouse the solution must be able to perform point in time analysis whenever customer status changes the change persists in the customer table and the new row is added
            • 266:30 - 267:00 What slowly changing dimension type should you use: 0, 1, 2 or 3? We talked about that. I moved from Belgrade to Salzburg, and now I plan to move to Spain; which slowly changing dimension type should be used to handle my records?
            • 267:00 - 267:30 Most of you got it right: yes, for most analytical scenarios, slowly changing dimension type 2 is used. This one adds a new row to the existing table and handles point-in-time
            • 267:30 - 268:00 analysis, so you can go and check what happened previously. So the correct answer is type 2. No changes at the top. Okay, this will be a tough battle, but I have a little secret: two questions are evaluated with double points, and the next one is one of them, so be careful. You are developing a Power BI
            • 268:00 - 268:30 semantic model. Two tables in the data model are connected with a relationship and a single-direction filter. You need to establish bidirectional filtering between the tables; which DAX function should you use? Ah, good question from 47th place: what can I do to be first? Be faster, and hope that others provide wrong answers, it's as simple as
            • 268:30 - 269:00 that. Yeah, DAX, I know, that's why this one is double points. Oh, okay, it was easier than I thought.
            • 269:00 - 269:30 Yeah, CROSSFILTER, exactly, CROSSFILTER is the correct answer; we use the CROSSFILTER function to enable bidirectional filtering for a specific calculation. Let's see if there are any changes; so yeah, Milan is first now, cool. Let's move on to number six: you're designing a semantic model with 50 measures, such as sales amount, order quantity and so on. For each
            • 269:30 - 270:00 measure you need to create the same set of time intelligence calculations, such as month-to-date, year-to-date, etc. The solution must minimize the effort and maintenance. What should you do?
            • 270:00 - 270:30 Okay, great. Creating a measure folder in the report or model will not help in this case, so calculation group: this
            • 270:30 - 271:00 is a perfect use case for a calculation group. Okay, let's move on, we are getting closer and closer. Which of the following T-SQL functions returns the rank of each row within the result set partition with no gaps in the ranking values?
            • 271:00 - 271:30 Yeah, this one was tough, I have to
            • 271:30 - 272:00 admit. The key thing is "with no gaps in the ranking values", and as I showed you in the example, DENSE_RANK is the function that continues to rank
            • 272:00 - 272:30 each row within the result set partition with no gaps; RANK will introduce gaps, ROW_NUMBER simply goes one, two, three, four, five, and LAG is an offset function, so it doesn't have anything to do with this. Okay, we have textile in first place, but it's a very tough battle with the others, stay tuned. The next question is here: you are using deployment pipelines
            • 272:30 - 273:00 to move Fabric items from the dev to the test environment. Which of the following Fabric items is not supported when deploying the content? Make sure to learn this for the real exam; that's why I included a question like this today.
            • 273:00 - 273:30 okay yeah fair enough this was a tough
            • 273:30 - 274:00 one, for those of you who were not looking at the slide when I introduced the list of items that are currently supported for deployment pipelines. Everything from this list is supported except dashboards, so dashboards are not supported within deployment pipelines and they will not be transferred to the next stage. Okay, tough battle, we have two more
            • 274:00 - 274:30 questions, we have two more questions. You are developing a Power BI solution within a large organization. You need to ensure that the solution supports proper version control and CI/CD requirements. In which format should you save the Power BI file: PBIT, PBIX, PBIP or
            • 274:30 - 275:00 PBIDS? We talked about this topic at the very beginning, so maybe it's not fresh in your mind, but I hope you listened carefully to what I was talking
            • 275:00 - 275:30 about. Yes, most of you got it right, I'm happy about that: it's PBIP, that's the correct answer. Final question; but before the final question let's see the leaderboard. The last question is also double points, so be careful and be ready to answer
            • 275:30 - 276:00 this, and let's see who wins the game. So the final question: you are working with a KQL queryset in Fabric. You need to enrich the existing table by appending a calculated column at the end of the result set. Which KQL operator should you use? A calculated column at the end of the
            • 276:00 - 276:30 result set. A question from Sahar: "so there won't be any Spark questions in the exam, right?" I can't tell you, because I took the exam at the beginning when Spark was still there, but officially there is no more PySpark in the exam. I'm not sure if they removed all the questions from the old version, I can't tell you that, but officially no
            • 276:30 - 277:00 PySpark. Most of you got it right: it's extend. Append will add new rows, not new columns, so append adds new rows and extend adds a calculated column at the end of the result set, so that's the correct answer. Let's see the
            • 277:00 - 277:30 final leaderboard. Number three, Amino, congratulations, great result; Christa, congratulations; and who is the winner? Yosko! Yosko, congratulations, great work all of you, and thanks to the runners-up Milan and textile, a great battle from the very beginning. So congratulations to all, and thanks
            • 277:30 - 278:00 for participating in this quiz. I hope you enjoyed it and again learned or refreshed some knowledge that we covered today. So, slowly wrapping up, we are already, oh ho ho, over time. Some practical exam tips from my side that I want to share: make sure to prepare the room; this is relevant for online takers, if you go to a testing center of course this does not apply, but for online takers, if you take the exam through Pearson
            • 278:00 - 278:30 VUE they will ask you to remove everything from your table and everything that is within reach of your hands, and no multiple screens and so on, so make sure to spend some time preparing the room for the exam. You can use Microsoft Learn during the exam, that's great news, so it's an open-book exam, but don't let it eat up too much of your time. I heard many people saying that
            • 278:30 - 279:00 they relied too heavily on Microsoft Learn and spent a lot of time finding certain things in the Microsoft docs. This means: only if you are not sure about two or three options and you know what to look for in the documentation, go and look for it; but if your plan is "I have 50 questions and I plan to answer 30 of them with Microsoft docs during the exam", better
            • 279:00 - 279:30 don't take the exam before you are ready; it's good just for finding the answer to a few questions. Focus on understanding specific functions in DAX, T-SQL and KQL and their order of execution; there are many questions with, you know, drag-the-correct-order or fill-in-the-blanks or stuff like that, where you really need to know which function does what, like CROSSFILTER, for example,
            • 279:30 - 280:00 and what the order of execution is. If you write a window function in T-SQL, it's always OVER, PARTITION BY, ORDER BY; it can't be in a different order, so make sure to familiarize yourself with this part. Understand when to complete a specific task in a certain way, for example when to ingest data with a pipeline versus a notebook versus Dataflow Gen2 versus using shortcuts; remember, we were talking about all of
            • 280:00 - 280:30 these features. So read through the question stem, read carefully and understand what needs to be done: is it low code, is it code first, is it with or without data transformation, and so on. Some final remarks on where to go from here. I would say the best way to learn is by doing, so set up your Fabric trial account and try it for 60 days for free, that's an amazing starting point;
            • 280:30 - 281:00 there is a link to activate your Fabric trial capacity. The official Microsoft Fabric documentation is a tremendous resource for learning all the stuff related to Microsoft Fabric, not just what is relevant for DP-600. I'll be doing a training on Microsoft Fabric as a whole, not just from the perspective of an analytics engineer, so we cover real-time
            • 281:00 - 281:30 intelligence, data science stuff like semantic link, and so on. This is not a free training, so if you want to learn more about Fabric, that's the one. And there are many free learning resources from the community that I want to share: of course you can refer to my blog, data-mozart.com, but I would also like to share a few more learning resources. Andy Cutler, who was here moderating, runs serverlesssql.com; also our friend Kevin
            • 281:30 - 282:00 Chant has a great blog with many topics covered at kevinrchant.com; and also Will Needham, probably most of you know him, he's running the Learn Microsoft Fabric YouTube channel, which is full of fantastic resources. You can just scan the QR codes for each of these links, go directly there, and pick up where we left off; wherever you go from here,
            • 282:00 - 282:30 it's good. That was all from my side. One last thing: I would like to thank again my friends who joined and moderated the chat; this live stream couldn't be possible without you, so my heartfelt thanks go to all of the friends who managed the chat today. Thanks again to Ricardo for the
            • 282:30 - 283:00 license, thanks to Pam and Shannon from Microsoft for joining and helping to promote this workshop today, and I wish all of you good luck with the exam. I would like to hear whether you pass or not, so please share that with us; we would like to hear from you, and I sincerely hope that you are going to smash it after this workshop and after spending some time learning through the Microsoft documentation. That's all, thank you so
            • 283:00 - 283:30 much, and have a great weekend.