Unify your data with OneLake and Microsoft Fabric | BRK169
Summary
In this video from Microsoft Developer, the presenters explore how OneLake and Microsoft Fabric unify data management for organizations. They discuss the traditional challenges with data lakes and how OneLake addresses them, much as OneDrive transformed file sharing. By supporting multiple compute engines and open formats like Delta Lake and Iceberg, OneLake enables interoperability across platforms and clouds without data movement or duplication. The session highlights improved analytics and AI capabilities, with seamless data access and governance in a single, comprehensive solution.
Highlights
OneLake provides a seamless data lake experience as a service, similar to what OneDrive offered for file storage. 🗂️
It unifies data across global storage accounts, simplifying governance and security within a single tenant. 🌐
Supports various computing engines and open-source formats, allowing fluid data interactions and reducing data silos. 🔄
New features like multicloud shortcuts facilitate data management across different clouds and on-premises environments without data duplication. 🌥️
Integration with AI tools like Copilot streamlines data analysis and reporting, making it accessible for users of all skill levels. 📊
Key Takeaways
OneLake acts like OneDrive for data, simplifying data lake management without needing to build solutions from scratch. 🏞️
OneLake supports open formats (Delta Lake and Iceberg), enabling seamless data access and integration across platforms like Snowflake and Microsoft Fabric. 🔗
The service offers robust data governance, ensuring data security, lineage, and compliance within a single tenant structure. 🛡️
New capabilities such as 'shortcuts' and 'mirroring' reduce redundancy and streamline working across different data sources and clouds. 🚀
Copilot enhances the data warehousing experience in Microsoft Fabric, offering AI-powered assistance for data management and analytics. 🤖
Overview
At the heart of Microsoft's presentation is OneLake, a solution that aims to streamline data management much as OneDrive did for file sharing. Presenting at Microsoft Build, the speakers explored how OneLake addresses data silos, governance challenges, and the complexities of data lake setups by offering a complete service without the need for custom solutions. By providing a single data lake per tenant, OneLake simplifies structure and management for organizations.
OneLake's support for open-source formats such as Delta Lake and Iceberg allows seamless communication between platforms, including powerhouses like Snowflake and Microsoft Fabric. This interoperability facilitates a more cohesive data ecosystem, enabling cross-platform analytics without the need for data replication. This breakthrough is further enhanced by features such as shortcuts and mirroring, which help unify data from multiple clouds and on-premises sources efficiently.
With Copilot's integration, Microsoft Fabric now offers an AI-enhanced data warehousing experience that caters to both novice and pro users. Through intuitive data exploration and analytics tools, Copilot assists in generating insights and writing code efficiently. The collective innovations presented underscore Microsoft's commitment to elevating data operations, ensuring their solutions are future-ready and adaptable across various tech landscapes.
Chapters
00:00 - 03:00: Introduction and Session Overview The chapter begins with a light-hearted acknowledgment of the time constraint before a party, reflecting on the completion of the last session of the conference. Despite the timing, there is a genuine appreciation for the attendees' presence, hinting at the successful conclusion of a significant and presumably enjoyable conference.
03:00 - 09:00: Challenges with Traditional Data Lakes The chapter titled 'Challenges with Traditional Data Lakes' starts with Josh Kaplan, who leads product management for OneLake at Microsoft, setting up the session. Charles Webb is also expected to join the discussion. The session aims to address the challenges organizations face with traditional data lakes and the aspiration of having pristine, efficient data lakes.
09:00 - 17:00: Introducing OneLake and its Benefits The chapter 'Introducing OneLake and its Benefits' discusses the idea of having a centralized location, such as OneLake, to store all data, whether structured or unstructured. The main benefit highlighted is the ability to eliminate data silos since all data is consolidated in one place. This consolidation simplifies blending, analyzing, discovering, sharing, securing, and managing data. The ideal vision of OneLake is compared to the challenges of analytics with traditional data lakes, likening it to the pre-OneDrive and Dropbox era of file sharing.
17:00 - 28:00: Data Storage and Compute in Fabric The chapter discusses the traditional setup of network file shares, where servers are set up with folders to store files and share them with permissions. This is likened to the setup of data lakes, which also require storage before implementing a data lake pattern. The ideal concept of a pristine data lake often falls apart, resulting in more siloed storage solutions.
28:00 - 40:00: Cross-Platform Data Interoperability This chapter discusses the challenge of cross-platform data interoperability. It highlights that the lack of interoperability is not only due to technical reasons but also because of differing processes and resistance to coordination among teams. Teams often prefer to establish and manage their own storage solutions, resulting in isolated silo storage systems. To overcome this fragmentation and achieve unified data management, the chapter suggests that breaking down these silos by moving and copying data to central locations can be an effective strategy.
40:00 - 45:00: Using Shortcuts in OneLake This chapter discusses the challenges of using data stored in data lakes, including the creation of multiple copies and versions of data that can lead to inconsistencies. It addresses the problem of different stakeholders presenting conflicting data due to these discrepancies, and suggests that shortcuts might be a necessary tool to properly manage and reference the original data.
45:00 - 51:00: Introduction to Charles Webb and Data Warehousing in OneLake The introduction discusses the challenges of data management, specifically the issues that arise when multiple copies of the same data exist. These copies must be secured, governed, and kept up-to-date, which requires building extensive systems and processes. It points out that though this work is demanding, it ultimately yields significant value. The chapter draws a parallel between the objectives of OneLake for data lakes and the achievements of OneDrive for file management.
51:00 - 57:00: Co-pilot and Time Travel in SQL The chapter discusses the concept of a data lake as a service, highlighting benefits such as providing value without the need to build infrastructure. It compares this service to platforms like OneDrive and SharePoint, which started as services for file sharing and evolved into collaborative tools. The chapter notes that just as users have had OneDrive for document management, they will now have OneLake for managing data.
57:00 - 66:00: Demo of Mirroring Data and Co-pilot in Action The chapter discusses the concept of having a single lake per Fabric tenant. It is intentionally designed so that each tenant will have exactly one lake, neither zero nor more than one. This design choice is emphasized as a key decision for managing tenants, ensuring consistency and avoiding situations where more than one lake could be created.
66:00 - 72:00: Conclusion and Resources The 'Conclusion and Resources' chapter discusses how the lake works much like Office environments, where one Office tenant and one Fabric tenant are present with a single lake, without the need for extra provisioning or setup. This setup offers a unique advantage of Software as a Service (SaaS) that was not available in the previous PaaS world, namely the concept of a 'tenant.' The tenant structure provides clarity on organizational boundaries and helps establish an automatic governance boundary, effectively governing any data that falls within it.
Unify your data with OneLake and Microsoft Fabric | BRK169 Transcription
00:00 - 00:30 [Music] All right, I think the clock is running, and there's now 45 minutes between you and the party. I appreciate everyone showing up, and I truly mean that, as we're at the last session of the last day. Hopefully everyone had a great conference,
00:30 - 01:00 right? Hopefully we can leave you with one more good session here, and yeah, thanks for sticking with us for this last one. I'm Josh Kaplan, I lead product management for OneLake at Microsoft, and Charles Webb is going to join me on stage in a few minutes. We're here to talk about OneLake, and when we talked to organizations we saw that they had these visions of very pristine data lakes. They saw
01:00 - 01:30 them as one place to land all their data, whether it's structured or unstructured. They saw it as a place where they could break down data silos, because it's in one place, everything is there, it can be blended together, it can be analyzed together. Because it's all there in one place you don't have to go looking for it; it's easy to discover, easy to share, easy to secure, easy to manage. This was the vision. The reality, though: analytics with data lakes is a lot like file sharing prior to OneDrive and Dropbox coming along. Anyone remember how we used to share
01:30 - 02:00 files? You'd go get these network file shares: you'd set up these servers, you'd put some folders on there, you'd store some files on there, and you'd put some permissions on those folders, and it shared files. It works, it shares files, but you're basically building a solution on top of storage. Data lakes are very much the same way. You don't buy a data lake, you buy storage and then you implement a data lake pattern on top of that, and very soon this vision of a very pristine data lake goes out the window and you tend to end up with more siloed lakes. And this happens for many
02:00 - 02:30 reasons, some of them technical, but a lot of them are just people and process. It's easier to go stand up your own storage, do your own thing, and not have to coordinate with other teams and coordinate on standards and all agree on the same patterns that you're going to implement. So you end up with these multiple siloed storage accounts. Now what do you have to do? You want to get back to that vision that you had before, so you've got to break down these silos, and the easiest way to do that still is through data movement: copying data, collocating it with the other
02:30 - 03:00 data you need. And even once you break down these silos and get the data together, still most users and most applications can't work directly over the data, so you build a serving layer: cubes, data marts, data warehouses. And these don't just reference the data in the lake, they're copies of the data in the lake, and often they're copies of copies of the data in the lake. You show up in a meeting, two people presenting the same number, but the number is actually different. Anyone actually experienced that?
03:00 - 03:30 It happens. Anytime you have two numbers it can happen, and two would be a good number; often it's a lot more than that. Multiple copies of the same data all need to be secured, governed, kept up to date, and you have to build systems and processes to make this happen. It's a lot of work, but it's a lot of value when you get it done at the end of the day. So with OneLake we want to do for data lakes what OneDrive did for file
03:30 - 04:00 sharing. OneLake will give you a data lake as a service, give you that value out of the box without you needing to build it. And if you think about what OneDrive and SharePoint did for file sharing: it starts out as a SaaS service for file sharing and it goes much beyond that, it becomes a way to collaborate over files, a way to share files. So you've had OneDrive for your documents for many years; now you're going to have OneLake for your data,
04:00 - 04:30 and we have a blank slide for some reason, interesting, okay. So let's see how this works. You'll only ever have one, excuse me, you'll only ever have one OneLake per Fabric tenant. Yeah, we spent a lot of time naming it, so we can't give you more than one, but you'll never have zero and you'll never have more than one, and that's literally by design. You can never get into a situation now where you create these
04:30 - 05:00 multiple siloed lakes that can't talk to each other. Just like Office, right: you have one Office tenant, you have one Fabric tenant, it comes with one OneLake. Nothing to provision, nothing to set up, it's just there. And we have a unique value that comes with a SaaS service that we didn't really have in the PaaS world, and that's this concept of a tenant. With a tenant we know where your organization begins, we know where it ends, so we can have this governance boundary automatically that goes around your entire organization, and any data that lands within that boundary,
05:00 - 05:30 any data in your organization, can automatically land within that boundary and be governed out of the box: lineage, data protection, also data certifications, any of the catalog integrations that we have will work on top of that data automatically. Now, it's the tenant admin that sets up that boundary, sets up those initial rules, but we don't want that tenant admin to be a gatekeeper. Think about how in Office you don't have to go to your admin every time you want to create a Teams channel or a new
05:30 - 06:00 document site. We have workspaces in Fabric. Workspaces allow different parts of the organization to work independently while all still contributing to the same lake. Workspaces can have their own admins, their own access control, their own billing even, and they inherit the rules set by the tenant admin by default but can be further restricted based on the needs of the workspace. Now everything can start to land in OneLake automatically,
06:00 - 06:30 and OneLake spans the globe. So it's very important in your minds to separate storage from a data lake: a data lake is going to be made up of lots of storage, but you don't have to see that. If you use OneDrive today, it's built on Azure Storage; you don't manage those accounts, you work with OneDrive, you manage OneDrive, you govern OneDrive. Same with OneLake: we're built on top of ADLS Gen 2, we have hundreds of thousands of storage accounts right now that span the globe, and we constantly add them as needed. And if you have data residency requirements, data that must
06:30 - 07:00 live in a certain location, you can assign your workspace to certain capacities and say it's got to be in this region, and that's where that data will reside. We'll use different hardware across the world, but we'll virtualize it into one logical lake for you. And by default we're going to store all your data in zone-redundant storage, so in case something happens to that particular zone of a data center you won't feel a thing. Optionally you can turn on geo-redundancy and actually replicate the data to another region, in case there's a disaster in that region and we were to lose that data center,
07:00 - 07:30 but all this is taken care of for you: the performance, the scale, the management of the actual resources, all done by us under the covers. You work with the data lake; that's that value out of the box. Now, we've talked about the data storage; it's the compute that actually makes things happen. So in Fabric we don't just have one compute engine, we have lots of them, we have lots of experiences, we have lots of what we call workloads, and they're powered by different engines. So all our data
07:30 - 08:00 science, our data engineering, is powered by our Spark engine; the data warehousing workload is powered by our T-SQL engine; all the real-time intelligence is powered by our KQL engine; and the BI and Power BI is all powered by Analysis Services. All of these have now been reworked to work over OneLake, and they will store their data in OneLake by default. All engines now will store their data in OneLake in Delta Parquet
08:00 - 08:30 format. For tabular data it will be in Delta Parquet format; while the data lake will take any format, we made sure that all our engines are working on top of the same open-source Delta Lake format. This is important. Why? Because when they store their data in OneLake, not only are they all storing it there, but now they can read the data from any engine. They can do this without any copy, any import, any export even. So for example, if I write data in a data warehouse I can now read it using Spark
08:30 - 09:00 natively. I don't have to go through the SQL engine to get it; Spark can go directly to storage and read it. This is also true of the new Direct Lake mode in Power BI and Analysis Services. If you remember how you get data into Analysis Services, you either process it into memory or you use what we call DirectQuery, where every time someone runs a query you go off and fetch the data and try to do that as quickly as possible. Direct Lake combines the best of both by skipping the import and just paging the data directly from OneLake
09:00 - 09:30 directly into memory. So I can write data in our data warehouse, or I can write it with Spark, and I can just start building Power BI visuals directly on top of that, with the same performance as import mode in Power BI but with the latency of Direct Lake. So this is really important. Why? Because a lot of times decisions are made about which platform to go with based on how you want to build that data. And so if you decide to go with
09:30 - 10:00 SQL, if that's what your engineering team wants to do, you build a data warehouse with T-SQL and load it with stored procs; they can do that. But you no longer have to tie that to how the data is going to be consumed, so data can be consumed any which way. You're separating how the data is going to be created versus how it's going to be consumed, so how it's going to be created now becomes a matter of preference. Another data engineering team that wants to work with Spark and Python notebooks, they can do
10:00 - 10:30 that, and all the same consumption experiences will work on top of that same data. So let's actually take a look at this real quick. Here I am in Fabric and I'm looking at my list of workspaces. It's really easy to create a new workspace, it's very lightweight, just a couple clicks, nothing actually gets provisioned. But I'll pick an existing workspace. Now here I see my data items; data items are things like lakehouses and warehouses. We're going to go into the warehouse data item here, and I'll see in here I have a single table,
10:30 - 11:00 a single schema, and some data in it. Now, this should look like any SQL experience you've worked with before; it looks a lot like a more modern version of SSMS even, but in the web, and that's done on purpose. It should feel very familiar to you as a SQL developer. This is why we have different data items: they should feel very familiar to what you're doing, to what you're expecting. If you're expecting a data warehouse, it should feel, work, and act like a data warehouse. But with all that said, it's actually storing the data here in OneLake,
11:00 - 11:30 and we can see it. Being the OneDrive for data, we're going to give you a OneDrive-like experience to be able to see the data in OneLake, and you can actually open it up in Windows using our file explorer, which requires two clicks. You'll see the same list of workspaces here, but they're folders now. Inside that workspace I see that data item we were just looking at, the data warehouse. I go inside that data warehouse and I can see a section of tables; in that tables section I'll see one schema and one table. I can add a
11:30 - 12:00 second table, so I'll write a T-SQL script here: it creates a table and inserts some data, inserts one row here. We'll see it pop up; come back to the file explorer, do a little refresh, there's our new table. Inside that table there's the Delta log, and inside the data folder there we actually see a Parquet file; that's what contains our rows. So I don't have to know how to write Delta Lake: if I can work with T-SQL, I can write data to OneLake. If I work
12:00 - 12:30 with an application that knows how to work with T-SQL, it can read and write data to OneLake in open standard formats, in Delta Lake format. Let me create one more item here, and this time I'm going to create a lakehouse. The lakehouse is our most lake-like data item, and what I mean by that is anything you can do in an ADLS Gen 2 container you can really do here in a lakehouse. It supports both structured and unstructured data.
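To make the warehouse step just shown concrete, here is a minimal sketch of the kind of T-SQL script the demo describes, with a hypothetical schema and table name (the demo's actual names are not shown in the recording); a table created this way is persisted by the warehouse engine to OneLake as Delta/Parquet automatically.

    -- Hypothetical example: create a second table in the Fabric warehouse and insert one row.
    -- The warehouse engine writes it to OneLake as a Delta table backed by Parquet files.
    CREATE TABLE dbo.Customer
    (
        CustomerId   INT          NOT NULL,
        CustomerName VARCHAR(100) NOT NULL
    );

    INSERT INTO dbo.Customer (CustomerId, CustomerName)
    VALUES (1, 'Contoso');

    -- Read it back through the same SQL endpoint (or from Spark, with no export).
    SELECT CustomerId, CustomerName FROM dbo.Customer;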
12:30 - 13:00 If you want to be a data lake, you've got to support both. So we have this Files section and we have this Tables section; in the Files section we will take any type of data. So let's go really unstructured here, let's take some images. I have some images sitting locally on my machine, and I'm just going to copy them into this lakehouse here, and all I have to do is copy and paste directly in Windows. Another great thing about the Windows explorer: any application that knows how to save a file in Windows can actually save data to OneLake. Just File, Save As, put it in this directory, and it'll show up in OneLake. Come back here to
13:00 - 13:30 Windows, or sorry, come back here to the web and we'll see those same files in here: images, completely unstructured. Now, there's lots of things that can give you tables, and the nice thing about the lakehouse here is it'll let you load it your way. You can use our Spark engine to do it, but you can bring your own engine: you can bring in Fivetran, you can bring in Databricks, you can bring in HDInsight. Anything that knows how to write data to OneLake can write to OneLake if it knows how to write to ADLS Gen 2, and if it knows how to write Delta Lake,
13:30 - 14:00 we will automatically recognize those files. To get started there, you can actually just right-click on that Tables folder and under Properties you'll see the path to that OneLake location, and you can pop this into any service that understands how to talk to ADLS Gen 2. What we did is we built OneLake on top of ADLS Gen 2, but we also expose the same APIs and SDKs to get this compatibility, so you don't have to worry about different storage accounts or which
14:00 - 14:30 one you're going to connect to. There's only ever one storage account for your tenant, and that's OneLake. Inside that storage account you're going to see all your workspaces showing up where containers would normally be, so in the URL to get to that workspace you're going to put the workspace name where the container is, and then it's just the path to your data item from that point forward. So let's talk about the file formats in here. I said you can put any type of file you want in OneLake.
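As a rough illustration of the path layout just described (assuming the documented OneLake endpoint; the workspace, item, and folder names below are hypothetical), an ADLS Gen 2-compatible tool would address OneLake along these lines, with the workspace sitting where a container normally would:

    https://onelake.dfs.fabric.microsoft.com/MyWorkspace/MyLakehouse.Lakehouse/Files/images/photo1.jpg
    abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse/Tables/Customer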
14:30 - 15:00 I've been talking a lot about Delta Lake, and we've been calling it Delta Parquet. Why would you call it Delta Parquet instead of Delta Lake format? Well, the data itself is actually stored in Parquet; the metadata is what's in this Delta Lake format that sits on top of it. And that's a really important thing to understand, because other formats also store their data in Parquet, particularly Iceberg. So one of the things we're announcing is that OneLake will soon support both Delta Lake and Iceberg,
15:00 - 15:30 and what that means is every table in OneLake, whether you write it in Iceberg format or you write it in Delta Lake format, we will automatically translate into the other format. We don't copy the data, because again the data is still in Parquet, that's the same format; we just translate the metadata virtually for you. So any table in OneLake you'll be able to access using either format. What does this mean? Well, this opens us up to another ecosystem. Snowflake, for example: we announced an expanded partnership with them
15:30 - 16:00 this week to do data interop. Since we'll now be able to understand the same file formats, we'll be able to seamlessly use the same data without any data movement or any data duplication. This now opens us up to the entire ecosystem of Delta Lake and Iceberg. Let me show you what this will look like when we complete this expanded partnership with
16:00 - 16:30 Snowflake. So by the way, who's using both Snowflake and Fabric in their organization? Just Snowflake? Just Fabric? Okay. If you are using both, it's important that these don't become silos of data. Today they would be silos: they'd be in different storage formats, they wouldn't be able to talk to each other. But if I have data in one and need to use it with data in another, I can now combine those.
16:30 - 17:00 So let me actually start in Snowflake. I'm going to create a new Snowflake database here, and you'll see I now have an option to store this data in Iceberg format. When I select that format, not only will it store it in Iceberg format, it'll let me pick a Fabric workspace to store this data in, and that's where our data in OneLake is going to get stored. Setting that up, I can create a new table; we'll give it a table name, and you'll see by default it's going to store it in OneLake in that workspace, it's already set up for me. All I have to do now is create the table. I'm going to create it from a file, upload some data
17:00 - 17:30 along with it, I'll verify the schema here and say okay, and we've got some rows inserted. Now I can see the table, I can see the data here in Snowflake. Flipping over to Fabric, though, going to that same workspace we just connected to, we see a Snowflake database here now in Fabric. Going into that Snowflake database we're going to see that same table, now stored in OneLake in Iceberg format, loaded right here. Now I combine this with other data that's already in Fabric. I use shortcuts
17:30 - 18:00 to do this. Shortcuts let me virtualize data across different domains and different clouds, so I made some shortcuts here to data that's already in OneLake, and this is stored in Delta Lake format. So now I'm mixing Delta Lake and Iceberg, but I can see it all here in Fabric. Not only that, because of this translation, when I switch over to Snowflake I can see it all there too. Not only can I see it, I can query it; not only can I query it, I can query across it. So here I'll do a join between data that was written natively from Snowflake and data that's written in
18:00 - 18:30 Delta Lake format in Fabric, and I'll join them together and get our results. The same is true when I do it in Fabric: all this data is available to all Fabric engines. So I'll open up a new Spark notebook here, I'll run some Copilot, and I'll look for some outliers; it doesn't matter what format it was written in. I can use Power BI on top of this in Direct Lake mode. And finally, since OneLake is integrated with Excel and Office, I can use the OneLake data hub to find data in Excel; it doesn't matter what format it's written in, I can use it right here. You don't have to worry about the
18:30 - 19:00 formats anymore. Different platforms are going to take bets on different formats; we're going to handle the translation back and forth so you can just use it with all those platforms. And we believe so much in these open formats: we've made all our data available in Delta Lake format already, we're going to support Iceberg as the other open format, but we're also going to make sure that the translation between the two stays open. So we're participating in a project called Apache XTable, which is an open-source project that actually handles
19:00 - 19:30 the translation between the two formats. You heard me mention shortcuts before; let me give you a little overview of what's new with shortcuts. And if you don't know what a shortcut is: if you've ever worked with Windows and you've created a shortcut there, it's a lot like that. It's a pointer that points from one file location to another and makes it appear like that other location is now in the shortcut location. So for example, if I have data in a warehouse and I want to make it appear in my lakehouse,
19:30 - 20:00 I don't have to export it to the lakehouse, I don't have to make a copy of it, I just create a shortcut to it and it'll look like it's physically there, but no data has been copied. The same is true if I want it across different workspaces, different domains: I can create a shortcut instead of copying it and the data will appear in that other workspace as if it were physically there, but we haven't changed the ownership of the data. The owners of that data are still the owners; they still maintain the data freshness, they still maintain the security. Just like a regular
20:00 - 20:30 shortcut would be, someone else can just use that without having to worry about: is my data up to date, am I working on the freshest data? Now, that's how shortcuts work within OneLake, but we also have multicloud shortcuts that can actually span clouds, span data that's in any open storage system that's not in Fabric. So let me show you this. You've hopefully seen demos in the past where we've created shortcuts from ADLS Gen 2 as well
20:30 - 21:00 as Amazon S3 and Microsoft Dataverse. Google Cloud Storage we recently added, as well as a whole host of S3-compatible sources, so anything that's S3 compatible: MinIO, Qumulo, and Dell ECS. There are actually dozens and dozens of S3-compatible sources out
21:00 - 21:30 there; it became the de facto standard really for accessing storage, and now we can talk to all of them. So let me show you how you do this. We've come to a lakehouse where I already have data shortcutted in here from Amazon S3 as well as Dataverse and ADLS Gen 2. Let's add some of these new sources. I'm going to create a new shortcut, and this time I'm going to point it to Google Cloud Storage, and all I have to do is put in the path
21:30 - 22:00 to my storage and a key to access it. Once I do that, we now have this new graphical interface that lets you actually explore the storage directly from here, and you can pick the directories that you want to shortcut into OneLake. And you can pick more than one: in this case two, but you can pick a dozen, a hundred, whatever you need. Create those shortcuts, say next, and they will show up in OneLake instantly. It'll look like the data has been copied, it'll feel like the data has been copied, but no data movement has
22:00 - 22:30 actually occurred. You can actually see it load here. Let's create one more; this time we're going to do it to Dell ECS, which is one of those S3-compatible sources. There's something special about this Dell ECS source: not only is it S3 compatible, it's actually running in someone's corporate network, on premises, sitting behind a firewall. All I had to do to connect to this was use the same on-premises data gateway that we use for Power BI, and that we now use for pipelines in Fabric. I
22:30 - 23:00 install that behind the firewall and we're able to connect to it directly on premises, again without any data movement or any data duplication; it shows up here instantly in OneLake. To prove that I didn't actually copy it, let's go back on-prem for a minute. Let's open up the Dell ECS tool here; we're going to just take a bunch of folders that are sitting on my local machine and copy them to the servers that are sitting locally on the network. Nothing's moved to the cloud here, but we just put a bunch of new images in. Coming back here, I just refresh the
23:00 - 23:30 screen and everything's here. So we unified data here across clouds, across domains, and across on premises without any data movement or any data duplication. I have one more, and that one's around data that's already in OneLake. So we could always shortcut to data in OneLake, that was how we started shortcuts, but what about if we have to go across tenants? Let's say I need to share data with another organization
23:30 - 24:00 who's using Fabric. We can now use OneLake data sharing to do this. So let's say someone shared me some data from a different tenant, I'll now receive a notification. Oh, actually, before I do that, let's talk about how shortcuts themselves are implemented at the lowest layer of the stack, at the storage layer. Every engine on top of shortcuts doesn't even know it's a shortcut; they just think it's data coming from OneLake, it looks like data coming from OneLake. That means all the same features of every engine will work
24:00 - 24:30 whether the data is a shortcut or not, whether it's coming from Amazon S3, whether it's coming from on premises, whether it's coming from OneLake; it doesn't matter, it all looks the same to every engine that works on top of it. So we can join across all these sources, we can build Power BI reports in Direct Lake mode that work on top of these sources as if it was one set of data in one physical place. Now let's talk about the cross-tenant case. If you need to share data across tenants, we want to make it as easy as sharing data in Office. So
24:30 - 25:00 once it's shared to me I'll get a notification, an invite to use that data. All I have to do is accept it, and once I accept it I select where I want it to go. So I'll pick a lakehouse in this case, pick the location in that lakehouse where that data is going to show up, and that's it: data will appear there in seconds as if it was physically here. Again, it didn't move, it wasn't
25:00 - 25:30 copied; it's still controlled by the other tenant. If they revoke my access, the access disappears; it's not like a copy of data that will still persist after that happens. So we can now go across clouds, across on premises, across tenants without any data movement or any data duplication. All these sources are available today: Google Cloud Storage, Amazon S3, cross-tenant data sharing, and as of earlier this week the shortcuts to on premises are all available in public
25:30 - 26:00 preview, and there'll be more coming. But shortcuts are great. So let's look at all the ways we got data into OneLake here. We first looked at using the data warehouse to get data into OneLake; any Fabric workload will bring its data to OneLake, including the 100-plus connectors in Azure Data Factory. Any application that knows how to
26:00 - 26:30 work with these workloads can bring data to OneLake or read data from OneLake. Any application that knows how to work with ADLS Gen 2 APIs can directly work with the storage layer of OneLake without having to go through another engine. Then we talked about shortcuts: data that's in open file formats in open file systems, we don't want you to have to copy into OneLake, you can shortcut it, virtually putting it into OneLake, and we support the sources that we just talked about. But what about data that's not in
26:30 - 27:00 an open storage system or in an open file format? This is where mirroring comes in, and Charles Webb's going to come on stage and give us an overview of mirroring. [Applause] Thanks Josh, appreciate it. All right, hey everyone, thank you so much for being here, especially at the end of your day, we really appreciate it.
27:00 - 27:30 I'm a principal PM lead for the data warehousing capabilities in Microsoft Fabric, and I'm going to talk to you a little bit about the data warehousing capabilities, shortcuts, and a number of other announcements that we've made over this week. First I want to just kind of segue: we've heard from Josh about how OneLake specifically helps you alleviate your traditional analytics challenges, and data warehousing in Fabric can help you accelerate analytics to that last mile. So everything from helping you get that discipline at the
27:30 - 28:00 core and the flexibility at the edge, helping you grow your BI adoption, helping you make decisions with data and deliver reporting, analytics, and AI solutions: this is what we made the warehousing capabilities in Fabric for. And when we look at the Fabric capabilities as a whole, data warehousing is just another element of that. So it's still built on top of OneLake, it still can be governed by Purview, it's still got AI built in with things like Copilot, which we will touch on later, but it's a core part of what makes
28:00 - 28:30 Fabric, Fabric. Now, one of the things that makes the data warehousing experience a little bit different in Fabric is that we built these experiences for everyone. So Build, as you know, is a developer conference, but we're really focusing on how we can deliver a capability that lets anyone be a developer, of any skill set. So whether you're an analyst or a business user or a data engineer or a data scientist, we really focused on making sure anyone has the skills to be able to build data warehousing and get value
28:30 - 29:00 out of a data warehouse solution, and this is really different than anything else on the market today. So first I want to talk to you a little bit about these developer experiences and put what I just said in context. Josh mentioned mirroring is one way to get data from the proprietary data sources that you may have. How many of you folks use things like Azure SQL DB? A lot of hands raised. What about Cosmos DB, things like AI applications, building RAG, things like this? And of course many of you have data in multiple clouds, like what Josh touched on. Now
29:00 - 29:30 mirroring is one of those things that helps you get data into the data warehouse more easily. And while we have a multitude of ways to get data into the data warehouse, everything from SQL stored procedures, pro-code data integration with things like pipelines, and no-code experiences with things like dataflows, we're going to touch on the zero-ETL ways to get data in. This is a scenario where you don't have to be a plumber and build pipelines; you can accelerate the way that you get your data with mirroring. And so what this allows you to do is just seamlessly sort
29:30 - 30:00 of connect your databases and any of your data warehouses that may be popular to OneLake, land the data there, and then build on this lakehouse-centric architecture that you've already seen everything in Fabric run on top of. So this means Azure SQL DB, Cosmos DB, and more, and once your data is in the data warehousing capabilities, this allows you to query across clouds, it unlocks data science experiences, and of course Power BI, clicky-clicky draggy-droppy.
30:00 - 30:30 Now, the next thing that I'm going to touch on a little bit here is Copilot. How many of you have heard of our Copilot announcements across Build? Many of you are smiling, and that's fantastic. What we're really making sure we're doing here is, of course we have pro-code and no-code experiences in the warehouse, and of course we already have deep AI investments at the engine layer within the warehouse itself, but we want to make sure we bring AI experiences into how you build a data warehouse, because for our developers we want to make sure this is the best place to do your work. And so everything from exploring
30:30 - 31:00 data to writing code to modeling your data to finally going to the reporting layer can be done with Copilot assistance, and we're going to show you what that looks like. Essentially anything that you need to do in terms of building a data warehouse you can do with Copilot by your side. So whether you need to get data ready and prepare it, or organize, secure, and model your data, to finally just discovering insights, it can be done with this pair programmer, this assistant by your side. Now, how many of you are developers, pure developers that like to write code?
31:00 - 31:30 It turns out Build is the right place for you, but we would be remiss not to show you some of the SQL sugar and investments we've been making at the SQL level to make sure that this is the best place to do your data warehousing. And so we're introducing time travel, a solution for when you need to do historical data analysis, with some very nice sugar for how you would develop and analyze your data over a period of time, or do troubleshooting, or even if you have stable reporting
31:30 - 32:00 or compliance needs. And what this looks like, if you can see the highlighted code here, is really just an option clause that you add. Gone are the days of doing time travel where you need to make multiple copies of the data or overwrite data; now you simply say, hey, as I'm writing a query I want to look back at the last seven days, and with this OPTION clause you can do that. Which means you also don't have to specify it per table in the queries and the joins when you actually need to have the data as of; you can just do it at the very end of your statement.
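As a hedged sketch of the OPTION clause being described (the table names and the timestamp are hypothetical, and the exact timestamp format should be checked against the Fabric documentation), a time travel query looks roughly like this:

    -- Hypothetical example: query the warehouse as of a previous point in time.
    -- The option applies to the whole statement, so no per-table "as of" is needed in the joins.
    SELECT s.SaleId, s.Amount, c.CustomerName
    FROM dbo.Sales AS s
    JOIN dbo.Customer AS c
        ON c.CustomerId = s.CustomerId
    OPTION (FOR TIMESTAMP AS OF '2024-05-14T00:00:00.000');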
32:00 - 32:30 So I know all of you are here to actually see what this all looks like; you're here for the demos, not the slides, so let's just jump in. How many of you have used data warehousing in Fabric already? I see some hands raised and that's amazing. One of the things we're going to show you here is how you can get data into the data warehouse, and I already talked about the no-code and the pro-code capabilities, but there's also mirroring. Josh touched on mirroring a bit, but essentially mirroring is the way to start
32:30 - 33:00 getting your data from any of these popular sources that you see here. How many of you again are using Azure SQL, Cosmos, or other sources like what you see here? Great. So let's make sure this is working.
33:00 - 33:30 All right, so we're going to walk you through what it actually looks like and takes to go mirror your data, and we're actually going to start with one of these sources. Anyone tried mirroring so far? Anyone had a chance to check out this capability? I see a few hands raised.
33:30 - 34:00 Fantastic. So to mirror your data, it's pretty simple: what you do is you select a source, and after you select a source it's as simple as signing in. What you see here is we've provided some connection information, and the next step is you can now very simply see your objects. When you select the objects, what we're doing is we're connected to those objects, we're allowing you to preview them, and then once we've selected what we want to mirror it's as
34:00 - 34:30 simple as giving it a friendly name. So maybe this has a nomenclature based on how you've delivered the capability in Azure, and you want to give it something friendly so that when you're querying it, it makes sense. Now, once you do that, you're done. What we're doing now is we're landing all the data in OneLake for you, and after we've landed that data in OneLake you can start to monitor and see all that data coming in. Any changes that happen are very easy to see, and we're actually keeping all your data in sync, so any changes, inserts, updates, deletes, are managed for you. Now, secondarily, once we've done that, that
34:30 - 35:00 allows you to start querying your data right away. Now the data is in OneLake, but it's also now transformed into a tabular format, which means that when I want to query it, it's as simple as running a SQL query, and you can see here this is just your friendly T-SQL. Now, as you're analyzing data you might say, you know what, I've got data from another database I want to analyze, and similarly it's very easy to get data from, say, Cosmos DB. We know that Cosmos DB has data that is, you know,
35:00 - 35:30 documents, this may be JSON documents or otherwise. And once you know how to mirror your data, you know how to mirror your data across any data source. So very similarly we sign in, and after we sign in it's as simple as either providing a connection name that you already have or using the existing one, but then all your documents are right there. Once you select the documents or the whole database, again it's as simple as saying connect, and then we handle all of those pipelines landing the data. It's truly zero ETL, which means that again it's as
35:30 - 36:00 simple as connecting to your data. Now, the monitoring experiences are also the same affordances that you'd be used to, but very importantly we've gone from structured data to unstructured data, and of course all this data will also get structured for you in the warehouse with nothing you need to do. Now lastly, we support sources that are non-Microsoft, so of course the most popular data sources that you might be keen on you can also mirror, and we're going to do the exact same thing that
36:00 - 36:30 you saw before, which is to make sure that we're landing that data and keeping it in sync. We're using the CDC technology of the data source behind the scenes to make sure that everything is kept very neat for you, and this is really accelerating time to value. To do this today requires a massive amount of cognitive overload to figure out: what's the different source, how am I going to keep it up to date and in sync, and then lastly how am I going to start to work across all these different data clouds and query them in the same place? And because OneLake is a single SaaS
36:30 - 37:00 environment and we've landed all the data in OneLake, it now becomes very easy to see all my data, monitor all my data, and then finally query all my data in place. Now, the next step of my journey may be to say, you know what, I've got data from all of these different clouds and environments, but now I want to really analyze it at scale. I showed you previously the Copilot capabilities, but now let's see what it looks like live. So we can create a new warehouse on the fly, and what's neat about this is now we can connect to those same mirrored
37:00 - 37:30 databases that we had previously and do additional things to deliver downstream analytics value. So you can see how easy it is now to select those same databases that we mirrored previously, and now if we want to do cross-database queries, write stored procedures, or create views, we can do so very easily in a very familiar SQL environment. You can see here, as we navigate through, all those databases that we mirrored are now simply in the Object Explorer and available for us to do additional deep dives.
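As a rough sketch of the cross-database querying being described (the database, schema, and table names are hypothetical stand-ins for the mirrored sources in the demo), a query from the warehouse can join across mirrored databases using three-part names:

    -- Hypothetical example: join data from a mirrored Azure SQL database with data
    -- from a mirrored Cosmos DB database, addressed by three-part names.
    SELECT o.OrderId, o.OrderDate, p.Region
    FROM MirroredSalesDb.dbo.Orders AS o
    JOIN MirroredCosmosDb.dbo.CustomerProfiles AS p
        ON p.CustomerId = o.CustomerId
    WHERE o.OrderDate >= '2024-01-01';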
37:30 - 38:00 Now we can also leverage Copilot. If we don't have the skills of a SQL developer and we want to analyze data at scale, we can ask Copilot for help, and when we ask Copilot for help, because we're leveraging responsible AI, we're going to of course fulfill your request but do it in a way that's highly transparent: we're going to tell you how we plan to execute that query, we're going to give you the query to review, and then after you're ready you can run it at your own speed. And so we accelerated the time to value here, but we still give you the power as a
38:00 - 38:30 developer to do what you need to do, and if you didn't have those superpowers, we just upleveled you. Now, the next step in this journey may be that you want to continue to analyze your data, and you may want to continue to analyze data across different environments; of course Copilot is smart enough to know all the different databases that you may have queried already. Now at some point you might say, I also
38:30 - 39:00 saw some really nice SQL sugar, and I also want to be able to write code, kind of like, you know, GitHub Copilot, I really enjoy that capability. And just like GitHub Copilot, I can add context and comments into the SQL text editor, and then you can see, boom, Copilot will actually complete the code for you. So as a developer, as I'm typing, I enjoy that experience where I type and tab and I have all of that intelligence and auto-completion just built in, and we have the same thing here. So we have multiple modalities to
39:00 - 39:30 work with Copilot and AI inside of this experience. You can also see the Explain and Fix buttons that can be leveraged if I need to understand how code works or to fix errors. Now lastly, there are those scenarios where you need to look at data in the past, and this is where that SQL sugar that I was telling you folks about earlier comes into the picture. Traditionally, to do something like this I would have had to make multiple copies of the data, maybe I had backups; now it's very simple to add that OPTION clause and query data as of a
39:30 - 40:00 previous point in time, so I can look at data right before where I was. And then it's very easy to go further and extend this to Power BI reports with one click from the data warehouse, and you can see now if I wanted to do reporting and look back, you know, right before Build, I could easily do that. And again, to do this in other platforms or solutions would either mean copying the data or would mean literally specifying the as-of date for each particular join; for us it's just an OPTION clause. So, very powerful stuff. Now when we
40:00 - 40:30 look at data warehousing more, you can see that there are other experiences that we talked about. So now if you need to build a star schema, you can do that visually inside of this experience, and this is one of those things where, when you start thinking about AI, data is the fuel that powers AI. So all these semantics that you can physically add to the data warehouse can now power your additional experiences like Power BI, which is deeply connected to the data warehouse, meaning this same report connected to those same tables that were modeled can now leverage Copilot,
40:30 - 41:00 which is able to understand all those semantics. So we've now gone from data warehousing, where we copied data in using mirroring very seamlessly without writing any code, to then writing code with Copilot, to now a Power BI report with clicks, and you can see how Copilot aided us through that whole entire journey. Super powerful stuff. So now I want to take a step back and say Fabric is reasonably new, we went to GA towards the end of November, and at the same time we've had amazing momentum:
41:00 - 41:30 we have thousands of customers, growing daily, and this is fantastic. But I also want to call out the fact that everything that we've been doing is because of you, our customers. So one of the things that we do is we listen a ton to you, to what features and functionality you tell us we have to ship, and we ship every single day. What this means is that if you want to take a step back and see all the things that we're trying to release, you can simply leverage our release plans, which are public, and you can see all the things that we plan to do by quarter, so that you can effectively plan for all this innovation that will always keep
41:30 - 42:00 coming. Separately, it's very easy to get Microsoft Fabric, and the best part about this is that for our developers, of which there are so many in the room and so many out there, we offer 60-day free trials, which means no credit card, no Azure subscription, and we're giving you one of our most powerful SKUs to get started, meaning that you can go end to end just like I showed you in minutes. But also think about all the innovation that you could deliver in 60
42:00 - 42:30 days. Now, for folks who have seen a lot of the amazing goodies that we've showcased here today, I also want to leave you with a few resources. There is the aka.ms Fabric roadmap link; that's where you can find all the goodies around what we're planning to do in the next few quarters. There's also the aka.ms try Fabric link; that's how you can get access to all of these things and try it for 60 days. Separately, I also want to call attention to four days of Microsoft
42:30 - 43:00 learning, connection, and inspiration. We're going to have the product teams, engineering teams, and many more there to work with you and learn from you, and also show you all the new innovation we continue to ship daily. This is in Stockholm, in Sweden, and it's coming very soon, so mark your calendars for the 24th to the 27th of September of this year. Now, thank you so much, all of you, for joining this session. We have a number of additional resources here; you can scan the QR code and you can
43:00 - 43:30 find more details about this session on the details page. But I also want to say a huge thank you; we are keeping you from the end of the day and the long weekend, and I really appreciate all of your time here, spending it with us and learning more about Fabric, OneLake, and the data warehouse. Thank you so [Applause] much