Designing Your Data Warehouse from the Ground Up

Estimated read time: 1:20

Summary

This educational video presented by Pragmatic Works focuses on building a data warehouse from scratch. Hosted by Dustin Ryan and Mitchell Pearson, both business intelligence consultants, the session covers the essential steps in designing a data warehouse, including identifying business processes, establishing a data warehouse grain, and selecting dimensions and measures. They use a fictional business scenario, 'Buster Block' — a video rental store, to go through these steps practically. They also delve into considerations for SQL Server Analysis Services (SSAS), ensuring data warehouse performance, and touch upon advanced configurations such as snowflaking and surrogate keys. Key resources like 'The Data Warehouse Toolkit' are recommended for further learning.

Highlights

Dustin Ryan and Mitchell Pearson from Pragmatic Works guide viewers in designing a data warehouse from scratch, perfect for beginners. 🎙️
The example of 'Buster Block', a fictional video rental store, helps in understanding real-world application of data warehousing techniques. 📼
The importance of establishing a data warehouse grain is emphasized to minimize issues and rework. 🛠️
The session explains differences between transactional systems and data warehouses, with a focus on the analytic benefits of the latter. 🔄
Surrogate keys are crucial in insulating the data warehouse from changes in source systems, ensuring long-term stability. 🏗️
The discussion includes using SQL Server Analysis Services (SSAS) to enhance report generation and data analysis. 📈
Design considerations for future-proofing data warehouses against extensive rework are outlined, stressing the importance of correct initial setup. 🕒

Key Takeaways

Designing a data warehouse involves four critical steps: identifying business processes, determining the grain, selecting dimensions, and choosing measures. 📊
A data warehouse differs significantly from a transactional system, focusing on analytical capabilities rather than operational data. 🚀
Pay attention to surrogate keys and ensuring correct grain levels to avoid extensive rework later. 🔑
SQL Server Analysis Services (SSAS) plays a vital role in optimizing reporting capabilities within a data warehouse. 🖥️
Prioritize business processes that provide high impact and low risk when starting with a data warehouse project. 🚦
Resources like 'The Data Warehouse Toolkit' are invaluable for anyone new to data warehousing. 📚

Overview

In this comprehensive session, Dustin Ryan and Mitchell Pearson, both experts from Pragmatic Works, provide guidance on designing and building a data warehouse. They use a step-by-step approach focusing on four fundamental steps crucial for the data warehousing process. Their approach is pragmatic and centered around real-life examples, making it accessible for anyone interested in business intelligence and data analysis.

The session is structured around the fictional business 'Buster Block', a video rental store, to practically apply the key concepts. This engaging scenario helps participants understand the intricacies of data warehousing in a straightforward manner. By breaking down the process into identifying business processes, defining data warehouse grain, and selecting dimensions and measures, the presenters ensure that viewers can follow and apply these principles in their own projects.

In addition to the demonstration, the workshop includes valuable insights into SQL Server Analysis Services (SSAS) and its significance in optimizing business intelligence operations. The presenters stress the importance of understanding surrogate keys and the potential challenges of snowflaking in data warehousing. Concluding with a recommendation of further reading, they suggest 'The Data Warehouse Toolkit' to deepen understanding, ensuring participants are well-equipped for future endeavors.

Chapters

00:00 - 00:30: Introduction and Welcome The chapter serves as an introduction and welcome to the course on designing a data warehouse from the ground up. It begins with the instructor ensuring that the participants can see the presentation screen and confirming with a participant named Liz. Once the technical setup is confirmed, the instructor expresses excitement and gratitude for the participants' attendance, indicating the start of the course.
00:30 - 01:00: Session Overview and Presenters Introduction In this session titled 'Session Overview and Presenters Introduction', business intelligence consultants Dustin Ryan and Mitchell Pearson from Pragmatic Works guide the audience through the process of designing a data warehouse from scratch. They outline the steps involved in the process, considerations for successful implementation, and highlight key aspects related to analysis services that participants should keep in mind.
01:00 - 03:00: Why a Data Warehouse is Important The chapter focuses on the importance of designing a data warehouse with a specific emphasis on analysis services. It highlights that even if analysis services are not being utilized, the information provided will still be beneficial. The chapter aims to cater to an audience interested in leveraging the reporting strengths of analysis services and mentions the co-presenter, associated with a website or persona closely linked to SQL.
03:00 - 08:00: Differences Between Transactional Systems and Data Warehouses The chapter introduces Dustin Ryan, highlighting his journey at Pragmatic Works where he transitioned from a janitor to a BI consultant over six and a half years. Dustin is an expert in business intelligence (BI) with a focus on SSAS and MDX, and he is well-regarded for his training and blogging with over a hundred blog posts on these topics. Additionally, he is an author, speaker, and has unique skills in training miniature ponies. The chapter sets the stage for discussing the differences between transactional systems and data warehouses, suggesting that Dustin's diverse skills and evolving career provide him with a broad perspective that will be insightful for the chapter's topic.
08:00 - 09:00: Steps to Designing a Data Warehouse The chapter introduces Mitchell Pierson, a BI consultant and trainer at Pragmatic Works. It highlights his expertise in SSIS, showcasing his roles in teaching both introductory and advanced classes on the subject. Additionally, the chapter touches on his passion for model trains, describing it as both uncomfortable and natural, mentioning his involvement with costumes and other related activities.
09:00 - 18:00: Step 1: Identifying the Business Process The chapter starts with an introduction to the purpose of a data warehouse, highlighted with a story about a bank. The speaker hints at benefits and reasons for needing a data warehouse without going into much detail, stressing the importance of understanding business processes beforehand. The mention of another blogger, Mitchell, and his website adds credibility and suggests going deeper into the subject matter through additional resources.
18:00 - 26:00: Step 2: Identifying the Grain The chapter starts with a technical difficulty as participants experience audio issues, making communication challenging. However, these issues are quickly addressed and resolved.
26:00 - 32:00: Step 3: Choosing the Dimensions The chapter titled 'Step 3: Choosing the Dimensions' recounts the story of a small bank that desired advanced reports for data analysis but lacked a data warehouse. They attempted to create these reports using their transactional system. However, this decision led to the system crashing, causing all ATMs to go offline. This incident serves as a lesson on the importance of using appropriate systems for data processing and analysis.
32:00 - 36:00: Factors for Dimension Design and SSAS Considerations The chapter discusses the necessity of using a data warehouse for certain types of reporting, particularly historical reporting. It highlights the importance of distinguishing between data used for business execution and data used for analysis, emphasizing that not all reports should be run directly from the data warehouse. The discussion serves as a cautionary note regarding the potential pitfalls of improper data management and reporting strategies.
36:00 - 43:20: Step 4: Choosing the Measures The chapter titled 'Step 4: Choosing the Measures' discusses the role of data in supporting business operations across various industries such as retail, banking, transportation, energy, and manufacturing. The data is crucial for executing core business functions, such as purchasing products, managing inventory, and performing tasks that sustain the business and drive profit. Additionally, the chapter emphasizes the significance of analyzing business data to answer critical questions like sales figures or account balances, thereby combining operational execution with strategic analysis.
43:20 - 53:00: Resources for Further Learning This chapter provides an analysis of the differences between transactional systems and data warehouses, specifically OLTP systems and dimensional models or star schemas. It emphasizes the importance of understanding these distinctions for business analysis, particularly in contexts like shipping and freight. Additionally, it discusses how users interact differently with these systems.

Designing Your Data Warehouse from the Ground Up Transcription

00:00 - 00:30 [Music] let me go ahead and start the recording you should be able to see my screen right now you should be looking at designing a data warehouse from the ground up you see that okay Liz yes you're good okay great I'm gonna go ahead and we'll get started here okay so welcome everybody I'm very happy that you guys are joining us here for this
00:30 - 01:00 session this session is designing a data warehouse from the ground up well Dustin Ryan I'm joined here by Mitchell Pearson we're two business intelligence consultants here at pragmatic works and so we're gonna be taking you through designing a data warehouse from the ground up which includes the steps that we're going to use some things to consider and then we're also going to specifically point out some things regarding an analysis services some considerations that you just want to keep in the back of your head when
01:00 - 01:30 you're designing your data warehouse for analysis services now if you're not if you're not going to be using analysis services for your data warehouse that's okay you're still in the right place we're just going to point out some extra things along the way for those of you in the audience that will be looking to leverage the really the strengths the reporting strengths of analysis services ok all right so my co-presenter today is sequel dusty calm or Dustin's sequel or
01:30 - 02:00 dustin ryan there we go he's been a pragmatic works for 8 years now now what you might not know is he spent the first six and a half years as the janitor but fortunately for him he's learned a lot in the last year and a half and he has a bi consultants here at pragmatic works he's also a trainer he teaches our SSA has master's class our yum DX class he's a wealth of knowledge if you go to his blog you'll see he has over a hundred blogs on just those two topics he is an author a blogger and a speaker he's written two books you'll see them on your screen here and he is also a trainer of miniature ponies I
02:00 - 02:30 can't make this stuff up they're so cute they're so precious and this here to my left here is Mitchell Pierson he's also a bi consultant and trainer and pragmatic works he teaches the intro to SSIS class he also teaches the SAS master's class so he knows a lot more about me than SSIS great resource there so if you have any questions about so you definitely need to look up niche he's a blogger speaker and he's really has this you know very uncomfortable and natural passion about model trains he's got costumes and everything so I
02:30 - 03:00 don't go too much detail into it but throw that out there and you can find his blog Mitchell sequel wordpress.com alright so we're gonna get started with talking about before we get into designing a day to warehouse why do we want a day to house Oh Mitch think of any good reasons why we might want a data warehouse actually I'll give you a little story I like stories I like stories to once upon a time there was a little bank and this bank wanted to build these really awesome reports but unfortunately they
03:00 - 03:30 hey you guys can you hear me you just lost down guys Hey we're having some audio yeah yours okay yeah there you go all right I'm not sure what the issue is excellent so we were telling a little story I'll start over from the beginning
03:30 - 04:00 here once upon a time there was a little bank and his bank wanted to have these awesome reports for analysis but unfortunately they didn't have a data warehouse and this is the true story we won't name the bank and his bank built these reports off of the air transactional system and unfortunately when they built these reports off their transactional system their transactional system crashed and it took down all of the ATMs for the bank the end okay so what you're telling me is that somebody went home early that day permanently permanently okay so that is one really good reason why a
04:00 - 04:30 data warehouse is something that you want right for a reporting right historical reporting you know what there's certain reports that you definitely don't want to be running off of your data warehouse um now so that's a kind of a scary story but those kinds of things happen and so that's that's one of the really good reasons why we want to consider using a data warehouse now when we're talking about data data really boils down to two different types you have data that supports the execution of the business and then you have data that supports the analysis of
04:30 - 05:00 the business so when we're talking about data that supports the execution of the business it can be you know a retail business it can be a banking could be transportation energy manufacturing it can be any type of business and we have data that supports the business so somebody is buying products somebody is logging inventory somebody's doing something that helps the business run helps them business make money so that's the execution side of the business then we have the analysis of the business how many cheese balls did we sell last quarter what are the account balances
05:00 - 05:30 that we're looking at right now how much Freight did we ship or how much Freight are we going to ship so that's the analysis side of the business and so that's two of the main differences between the OLTP or your transactional system and your data warehouse your dimensional model your star schema whatever you want to call it that's the analysis side of the business so huge difference is there between a transactional system and the data warehouse also you have how does the user interact with that with those two different systems that another big difference between systems
05:30 - 06:00 so when you have a transactional system primarily the interaction is going to be at a very atomic level or a very granular level somebody's going to insert a new transaction a customer walks into the store they buy a product they say give me a cheeseball we insert our record into the database into the transactional database because some new customer has bought another cheat ball okay that's the transactional side now with the data warehouse side the transactions are going to be a lot different right we want to be able to
06:00 - 06:30 upload all of yesterday's transactions which could be 50 million 100 million whatever the number might be we want to insert all 50 million transactions from yesterday up into the data warehouse so that way we can be in reporting lots of reporting on it now it could also be we want to read a 50 million records so I can figure out how many cheeseballs SLA yesterday so the user interaction is going to be different now the the method of interaction where the types of queries that are going to be run are also going to look a lot different when we're talking about a transactional system you're going to have atomic
06:30 - 07:00 inserts atomic updates deletes we're going to insert one record at a time update one record at a time delete one regular time and the primary method of interaction with the data warehouse is going to be is going to be select your gonna have yes you're gonna have inserts bulk inserts right we're going to insert 100 million records at one time from you know whatever happened yesterday in the business or we're gonna select 100 million records at a time because our users are running queries that are good that they're asking show me how many sales I have show me how many how many
07:00 - 07:30 sales transactions I had yesterday or how many cheese balls I sold yesterday those types of queries another difference between your transactional system in your data warehouse is going to be the temporal focus right usually in a transactional system the temporal focus is going to be what's happening then I'm currently in the business we're not tracking a lot of history in the business outside of what's necessary just to keep the business running with the data warehouse we're gonna have what's going on currently right and there may be a little bit of latency there maybe we're only keeping maybe
07:30 - 08:00 there's like a day latency we update the data warehouse daily and so it takes a little time for the current data to make its way into the data warehouse but we also keep history right maybe you want to keep three five ten years history you know I don't know whatever your business determines is appropriate you want to keep that history in your data warehouse as well so that way we can do analysis over over time so are we selling more cheeseballs this year than we sold three years ago now also the difference between the transactional system in the data warehouse is going to be the design
08:00 - 08:30 optimization right typically with a transactional system you use a third normal form type of design because we want to be able to optimize lots of inserts happening at one time so we may have you know 10,000 10,000 purchase orders coming in in a given hour and so we want to be able to you know insert and update those all at the same time that's how we'll optimize our transactional system now with the data warehouse we want to optimize for high performance queries and that's really the main purpose of designing the data
08:30 - 09:00 warehouse the way that we're going to talk about is we want to be able to get the data out of the data warehouse as fast as humanly possible so that way our users queries don't have to run very very long this is also important for analysis services processing and we'll talk about that a little in a little bit here later okay now the four steps that we're going to talk about are identifying the business process identifying the grain choosing
09:00 - 09:30 the dimensions and choosing the measures so these are the kind of the fork and steps that you can use when you start approaching either building a data warehouse from scratch or enhancing a data warehouse maybe you want to bring in a new business process into your existing little warehouse maybe in your data warehouse you've got retail stuff already and you want to bring in some marketing marketing facts so you can use these four steps the kind of approach enhancing a data warehouse as well and as we're going through these steps I
09:30 - 10:00 want you to think about and remember that the purpose of a dimensional model a well-designed dimensional model or star schema is we want it to be simple right we want it to be easy to use we want the ETL to load these tables to be as simple as possible we want our queries to perform well and we want this to perform also perform well with SSAS so let's start walking through these these four steps here and and as Mitch and I go through these
10:00 - 10:30 steps together one thing we're going to we're gonna do is we're actually going to design a data warehouse together so we've got this kind of imaginary business um a very successful business model you've probably heard of this company buster block they said they sell rentals to VHS movies very successful in the past and so that's kind of our imaginary scenario we're going with here we're going to design a simple star schema for this business okay so we're gonna walk
10:30 - 11:00 through these four steps together keeping in mind that we're designing a star schema for Buster block alright so remember that and so we're gonna kind of approach it from that standpoint so that way we can talk about these four steps but you can also kind of see it in practice and you kind of have a good example of you know using this in your particular environment now the the reason we're going with a retail example here is that we don't want to be complicated everybody understands how
11:00 - 11:30 kind of how retail work somebody buying a product right and so we don't want to get lost in the minutiae of you know how the business actually works or you know the technology or from that kind of standpoint so we want to go with something simple so it's easy to understand and we can focus on actually learning the concepts here so we're gonna start with identifying the business process all right Thank You Dustin so identifying the business process is going to be the first step in designing our data warehouse and the question we want to ask there is what is a business process a business process is
11:30 - 12:00 a natural business activity performed in your organization that is supported by some form of data collection it's important to focus on the business process as opposed to the business departments because this will allow us to deliver information across the business in a more consistent manner otherwise we may duplicate data if you think about having a data warehouse in multiple locations in your business for example having a retail data warehouse and marketing and then a retail data warehouse and sales we may have a duplication of data that's not necessary
12:00 - 12:30 and we want to have one centralized location for that information so what you're really talking about here is having that one source of the truth right we want to focus on business is not business department what we don't want to happen is to have marketing have their own data warehouse manufacturing have their own data warehouse and then everybody's showing up to the same meeting saying well this is what our profit should be for this past quarter and everybody's got different numbers so we're talking about focusing on the business process that may span across business departments exactly
12:30 - 13:00 good job Dustin all right so we must first decide what business process we want to model there's two steps that we want to identify when we're modeling our business process we want to look at the impact that this business process is going to have in our data warehouse and we also want to look at the risk that's associated with this business process so the impact is going to be reports that your business want on a daily basis very common everyday impact high availability reports the risk associated with this
13:00 - 13:30 business process are going to be things like data availability can we have access to the data or is it going to be difficult to get do we have data quality problems you might want something in your data warehouse but when you look at the data it's very unclean and it's going to take a lot of work to get that clean for your data warehouse the third thing there is you might have very complex business logic that has to be applied to actually get that data warehouse up and running and it might be difficult to get that information from all the different departments so we need to assess the impact and the risk and both of those have to take place as part
13:30 - 14:00 of choosing the proper choosing the proper business process all right and the other thing we want to do is we want to make sure we choose the business process that has the most impact from the beginning we want to get that low-hanging fruit so when we start our data warehouse project chances are we're going to have multiple business processes that we identify we're going to have more than one we're not going to have you know just one process we're going to see three or four or five different processes that we're interested in all right so chances are we're not going to be able to fit all of those business processes into our data
14:00 - 14:30 warehouse we're gonna have to choose one or two or three or a couple of those but we won't be able to fit all of them in there this is where I mean what's that what's the best way to decide what is the business process we start with if we're stiff if we've got people out there that are starting a data warehouse from scratch how do they pick what business process do we start with it's a great question so a lot of times what you're going to find out especially when you're at a client so chances are the biz is already going to be telling you what those high-impact items are if you're working in a bi environment now then you probably have a user or a group of users
14:30 - 15:00 that come to you on a regular basis and they say can you send me that report again that shows me my profit margins by category or hey can you give me that report that shows me my Freight that was delivered today and what they're really asking me for what they're really telling me is that these are high-impact business processes that I need a lot of visibility into so chances are if you don't have a data warehouse in your environment now then you're probably going to have users that are already telling you that they need lots of visibility into these specific processes
15:00 - 15:30 and they're high impact for the business so a lot of times the business is already going to have reports they want that they're asking for and it's going to be kind of obvious what you want there now you're going to have to do a little bit of additional research to figure out is this a high-risk item or is this low-risk is the data going to be readily available is it going to be dirty data is are we going to be able to get the business calculations and the business logic that we need so that's going to have to take place as part of that process and so that's good another thing that we want to consider Dustin that we previously touched on is that when we
15:30 - 16:00 say identify the business process we're talking about business processes for the company not for an individual department in the company and what we mean by that is you might have a retail data warehouse in your marketing department and the marketing department is going to come to that meeting that Dustin was talking about with their own set of numbers and then you have another retail data warehouse in your sales department and they come to that same meeting and they say well we've had this much in sales and marketing says no we've got this much in sales and we have conflicting numbers in our company so we want to have one centralized data
16:00 - 16:30 warehouse one version of the truth from marketing manufacturing sales and retail we're gonna have all that in one data warehouse so that's something that we want to consider there another way we might determine our business process is by listening to the questions that our business users are asking us so a lot of times we're going to hear a question they're going to say what are the gross profit margins for our product categories for the previous month what are what's our average account balance what is the average rental quantity we're talking about blockbuster block so
16:30 - 17:00 what is our current rental quantity those are questions the business is going to ask they're asking those questions if we pay attention and we listen closely then we're going to know exactly what those business processes are that they're looking for that are going to have high impact immediately kind of that low-hanging fruit in the company right so if you go if you have you know everybody's got that one user that always comes from and says hey can you give me that report hey I need that reporting and hey I need up or you probably all have users like that the ones that they're coming and asking me for the most are probably the processes
17:00 - 17:30 that have the most impact on the business because they're the ones that that are going to know what's going to be high impact so what I hear you say images we want to looks about low-hanging fruit we want to look for those business processes that are gonna have high impact for the business either make a lot of money for the business say that this is a lot of money make the business more efficient whatever but it's gonna be low risk we can get the data we have people who can help us figure out the logic that those kinds of things absolutely exactly okay so in this case we've already identified our business
17:30 - 18:00 process our example in this case is Buster block right our video rental store and so we're going kind of with a retail business process here for this example okay now the next step here is we want to be able to identify the grain of the fact table okay now this is the most important decision that you will make in your design processes we need to identify what is going to be the grain
18:00 - 18:30 in our factory people now if you're if you're new to data warehousing you may not know what a fact table is a fact table is a table that's going to be at the center of our star schema and it's the table that contains all of the facts about the business how successful the business is how many sales did we have how much quantity do we move those types of things so facts are measurements regarding the success or failure of the business depending on what type of business process that we could be modeling now when we when we identify
18:30 - 19:00 the grain we want to determine what does one row in the fact table represent okay so if we're working in some type of retail scenario here where we want to identify you know what does one role in our say fact table represent in the case of Buster block it might be a line item in a transaction so for this business process we're going to decide you know what one row in this fact table represents a line item in a single
19:00 - 19:30 transaction so that could be an example in a retail scenario that's what we're going to go with for this example now for if you're working in some type of you know business process where you want to track inventory maybe you're deciding that you know what one row in this fact table represents a record of inventory on a particular day so every day we go out to our data warehouse we figure out our another data warehouse our physical warehouse and we determine okay we've
19:30 - 20:00 got you know what we've got 500 cheese balls sitting on the shelves we've got you know a thousand a thousand boxes of cereal or we've got you know whatever is in our warehouse and we want to log that every day into our fact table so maybe we in an inventory an example one row would represent a a you know a record of inventory on a given day but when when you determine what one row represents in your fact table it's important to determine to go with the most granular level you want to design your
20:00 - 20:30 dimensional model to support the most atomic or granular level that you have with your data now when we're talking about atomic the most atomic level we're talking about data that cannot be subdivided any more we can't break it down anymore this is as low as the data goes we can't subdivide this anymore and the reason we do this is because data that is at the most atomic level can be rolled up it can be aggregated it can be sliced it can be diced and pretty
20:30 - 21:00 much any way that the users want okay um and this is really good before app Talk queries because the bottom line here is that we can't predict how the users are going to query the data warehouse they might tell us that look we want a report that does XYZ or shows XYZ right but the moment they realize that okay well we can ask other questions of the data warehouse or we can ask other questions of the cube they're going to start firing all kinds of other queries
21:00 - 21:30 connect to the cube with Excel they're gonna slice and dice in ways that you didn't really expect and so that's one of the reasons that we want to store this level data at a at the most atomic level if we if we decide you know what you know I've worked in situations where a client I said what we really only want the data at the month level we don't care about it up add a level you usually still always want to go with a granular level because eventually they're gonna
21:30 - 22:00 ask you hey can we look at this what happened what you know we see it something see it an anomaly at the month level right okay we're going to figure out where this happened that well we can't right because what we've aggregated our data warehouse up is the month level so we can't we can't roll it down and so that's that's always something to be aware of so I got a question for you how much work would it take to rework that I mean if you build your entire data warehouse on a monthly level or maybe a weekly level or daily level and you want to go more granular how much work would that take I mean if you designed your old data warehouse to
22:00 - 22:30 be at the day level or I mean at the month level and you're and then later on down the road you know what we want we want to take this fact table we want to roll it we really want to change it to the day level if you don't have that history somewhere else you're not gonna be able to get it okay so that's that's a problem for one but if you can it's gonna be a lot of work because then you're gonna have to rework the fact table you can have to rework dimensions you're gonna have to rework ETL you're gonna have to do a lot of rework there so that's kind of why I said at the start of this that you want to focus on getting this step right because if you
22:30 - 23:00 mess this step up it's gonna cause problems for you on later on down the road okay so the bottom line here is we can't predict what queries the users are going to run against our data warehouse so we want to store at the most green their levels so that way they can ask any questions they want all right so in this particular case let me bring my virtual machine over here so we can kind of do this together here all right so we're gonna stub out here a data warehouse
23:00 - 23:30 design together and so what was our what was our brain statement one row represents a line item in a transaction at a Buster block store okay and so we're going to start here with the when you table and we're gonna call this fact sales because we're gonna give it the fact moniker because this is a table that is going to contain our back pay to those measurements that the business has identified that they can use to tell how much success they're having in the execution of the business okay oops
23:30 - 24:00 already created that one so we'll call it back video sales how about that okay all right and so when we when we determine the grain of our fact table based on that grain we can determine what kinds of dimensions which we're going to move into here in a minute we can determine which kinds of dimensions that are going to be related to this fact so if we know it's a line
24:00 - 24:30 item on a transaction at a store so we know that that there's going to be a product related to this right somebody's renting a movie maybe they're renting you note Gone with the Wind dances with the wool Saving Private Ryan and Mitch's case all about miniature trains so we know that we're going to have a product dimension that's going to be related to this we also know that a transaction is going to occur on a given day right so we know that there's going to be a date associated with this and a customer is
24:30 - 25:00 going to be associated with the transaction so that we know we're going to have a customer SK there for that and we're going to discuss what an SK is here in a second and then also we've got product we've got date we've got customer and what was the other one we said we're gonna do and we've got store right this is going to be happening in a store okay and so this this factor will here kind of embodies what our grain is right this is a transaction that's
25:00 - 25:30 occurring with a product on a day with a customer at a store and so based on this we can do analysis on how many products are we gonna sell in us how many products we sell in a store last month how many products do we sell on this day how many products are bought by our top 10% of customers how many products are are upset are our stores selling by sword so we can do all kinds of different analysis based on this particular grain so now that we've kind of identified
25:30 - 26:00 what is going to be the brain of our of our business process that we're modeling here now it's time to move into identifying the dimensions what are the dimensions that are going to be related to this particular brain all right Thank You Dustin so what does it dimension a dimension is going to contain descriptive information regarding our data if I told you that we had five million dollars in sales you're probably going to have some questions about that you might be excited that we have five million dollars in sales but it's not
26:00 - 26:30 very descriptive it doesn't tell you anything about our data so what we want to do with our dimensions in our dimensions to answer those questions that we might have questions like what is the breakdown by product category which one of our stores have the highest amount of cells how much did we sell last month and we can break that information down by using those dimensions now these are questions that we answer with dimensions and if you've correctly identified the grain this step is actually going to be very easy as Dustin just shows you the related dimensions will be easily identified for
26:30 - 27:00 instance our grain statement once again is going to be one row represents a line item on a transaction in not fact table all right so we have product Dustin's already showed you that one we're gonna have customer we're gonna have date we're gonna have store Dustin's gonna go ahead and throw those dimensions up there on our our diagram for you and then we'll start talking about those attributes that are related to that but we can get that from the grain statement one row represents a line item on a transaction we know the transaction had a product that was rented we know there was a customer involved we know that it
27:00 - 27:30 happened on a specific date and at a specific store so the grain is very important and everything else that we're going to do in and I'm doing and working actually working through this data warehouse however if you're going through this and you identify a dimension if you're looking at your grain statement you identify a dimension that it's not correctly represented in our string statement then step two in the process identifying the grain must be revisited we have to go back to step two so Dustin this is actually a good little segue here to ask you do you see any chance that our granularity
27:30 - 28:00 identified can actually be subdivided further because it's pretty low right now that's a good question and it's something that you have to take into consideration as you're going through the design where else so in this particular case we've determined that one row and our fact table represents a line item on a transaction right but if we get into the if we get into identifying the dimensions and our business our business users come to us and say hey we want to be able to tie a particular transaction to a special that's going on a coupon
28:00 - 28:30 that the customer used what is some kind of promotion whatever might be going on in the business so that way we can analyze a promotion effectiveness now if we do that and say okay well a customer could there could be one promotion or more than one promotion associated with a single transaction so maybe there was a deal going on rent one movie get one free kind of thing but they also had a coupon for vaga skittles or something like that okay so there's you've got a you've got a customer using the coupon and then
28:30 - 29:00 you've also got in-store promotion going on and so each of those promotions has an effect on that sale and so we may determine that you know what the grain of this fact table isn't really one line item it's a coupon of one coupon per line item so we could have multiple coupons associated with a line item so if that kind of happens where you decide that you know what the grain of our fact table is not correct we've got to go back and revisit this and it's important that at this stage you go back to step
29:00 - 29:30 two because we've got to be brain nailed down this is the most important step because like I said earlier if you mess up the grain and and you and you find that out later it's gonna cause a lot of rework so if you get to identifying the dimensions or if we get this step for identifying the measures and you determine that you know what our brain statement is not completely correct we need to go back and go back to step two and revisit that so that way we can get that correctly because this is really gonna lay the foundation work for the rest of the data warehouse and and and
29:30 - 30:00 the cube as well this is going to play an impact on the cube so if you if you find out you've done the green wrong and you're already into the cube part of it then you've really got a lot of work to do all right good job Thank You Dustin we actually have a couple questions here a lot of people in the webinar noticed that we did not add the employee dimension and they're letting us know about that so if you want to throw that in there you can and then also the next thing that we have a lot of questions what is a surrogate key so I'm coming to that point in just a second there it's like they're reading your mind I know they're ahead of me here
30:00 - 30:30 we're gonna talk faster okay so in my product dimension the first thing we're going to do here in this dimension is we're going to add a database key for the table the database key is an identity key on the column and this is called a surrogate key and this is going to help us with modeling the relationships between the dimensions and the fact tables now the reason we don't use the business key is if we use the business key then we're going to be vulnerable to things like source changes in the transactional system and that might not mean a whole lot to you right now and we don't have time to get deep into detail on that but if something
30:30 - 31:00 changes in your source system like a metadata change the data type of that changes it's going to affect all the relationships between your dimensions and your fact tables so we want to insulate ourselves from those kind of changes and that's what the surrogate key is going to give us so as we import data out of the transactional system and into our data warehouse if there are any changes in the sources that could definitely cause a lot of problems for us so aside from just insulating ourselves from any of those issues in our source system with
31:00 - 31:30 our data warehouse we're also going to create a database key I'm sorry we're going to prove we're going to go ahead and create our product key or our product ID on our product dimension as well now that doesn't kind of catch up here so the product ID is going to be the business key we just moved from our surrogate key which is that primary key in our data warehouse but that business key that's critical to our business is going to be the product ID the product ID is going to be how we can essentially go from our data warehouse back to the
31:30 - 32:00 transactional system and we can kind of track and trace exactly where that product came from and we can see exactly what that product is any of the transactions around the products of the business key relates back to the transactional system but that surrogate key does not that did that unique identifier in our data warehouse does not relate back to the transactional system and there's also a couple other reasons why we want to have that surrogate key other than just insulating ourself from changes in the source system one good example is going to actually optimize the joins and our fact
32:00 - 32:30 tables and our dimensions so now we're going to be joining our FAQ tables and our dimensions on single integer columns on these simple two dollars instead of strings or any of those other data types which is going to simplify our model and improve our joint conditions circa keys are also going to allow us to keep historical information in our dimensions over time so if you're building a very robust data warehouse and you want to be able to track cells when you were selling a product at $5 and when you were selling a product as $6 and how much did we sell did sales
32:30 - 33:00 increase or did they decrease we can keep that kind of information or data warehouse by keeping a history of the price that we sold that item at or that we rented those rentals at so that's what survey keys give us we can't build a slowly changing dimension that changes over time using business keys the other reason that we might want to use or not use the business key is that we might have two different departments in our business that have different product IDs for the same product so you might have marketing that simply calls all baseball
33:00 - 33:30 bats just baseball bats but then our manufacturing department might have a product ID like abc123 so those IDs don't necessarily relate between those two departments go by creating that surrogate key now we can have that one consistency in our data warehouse so those are three other reasons why you might want to have surrogate keys or why we you should absolutely have certain keys in your data warehouse all right so Dustin is filling out your dimensional model right here on the board here so in our product let's see what we have here we have Product ID that's that business
33:30 - 34:00 key we have product name product type to keep this a little bit simpler for a one-hour webinar we're not putting everything in there but just imagine what else you could have in your product to mention you might have product size you might have the date that you started selling the product you might have the list price of the product how much do we sell this product for these are all things that we can track over time that's related to our product I said that's a good one list broad list price this price especially for historical slowly changing dimensions right I'm going to track over time all right so
34:00 - 34:30 you have the sell date you have the customer are we gonna put the store in here alright so in the customer dimension we have our customer first name customer last name and the address is very important where are our rentals coming from what zip code are we getting our rentals from and we can break that down that can be information that we to our marketing department that's very important we might do some other things in customer as well though right we might do gender we might want to know if they're married or not we might want to
34:30 - 35:00 know their date of birth so we can kind of get a an age of our average customer and the genres that they might be coming in there and and rinsing from us so that would be very good information to have all right and now you've thrown the store in there so in the store we're going to have some good information in there as well we're gonna have the store name we're gonna have the store address we're probably gonna have some stuff like maybe the store manager now as you get into a little bit more advanced data warehousing concepts you might break the store manager out into a separate table but we want to keep this as close to a
35:00 - 35:30 star schema as we can so you can you can add the store manager in there absolutely and Dustin's also going to have the employee dimension now I was actually interested in the date dimension that you had it over here Dustin you got some we could add a holiday in there hi any information would be pretty interesting yeah exactly now one thing that I'll mention here with attribute selection is that the best kinds of attributes that you're going to have in your data warehouse are gonna be those attributes that are most descriptive right so what you don't want
35:30 - 36:00 to happen is you don't want to have a dimension like store right that only that it's just full of ID's right so you've got like the store ID maybe you've got the you know geography ID or you've got like all these different types of IDs that are just all these integer values so the best kinds of attributes are going to be very descriptive types of attributes that the users can look at and they don't have to ask a lot of questions about that what you don't want is you don't want your user to have have to have sticky notes all over his monitor because he's trying
36:00 - 36:30 to figure out okay store ID number 12 is the one that I've been watching lately so I need to have that written down somewhere so you want them to be able to look at the attributes and to have a very clear understanding of you know what the data is what it what is what's included in there and they don't have to come to you with a lot of questions about asking you know what does this mean right you want it to be kind of intuitive from that kind of standpoint now I want to point out now that we've kind of we've identified our dimensions here based on our reign statement right
36:30 - 37:00 and so we've determined that on a transaction item level we can we can we can basically understand there's a product associated with that there's a date associated with the sale an employee you know worked with the sale would work with a customer on that sale that happened in a store and a customer bought that product right so we've been able to identify our dimensions there and I want to point out a few things to be aware of when regarding SSAS okay
37:00 - 37:30 you'll notice here that I included a date dimension this is very important for analysis services and so you want to make sure that you always include a date dimension in your data warehouse and some of the other may be thinking okay well that's silly yeah we want a date dimension or day where else but I have I have seen data warehouse masking cubes before that didn't have a date dimension and so it was always it was always just kind of known by the users that okay this is just a point in time right there's the date is going to be whatever is the last
37:30 - 38:00 current date available which is not exactly all that useful because if you don't have a date dimension you can't track things that you know that aren't aren't included with me no basic T sequel functions like company holidays or fiscal calendars those types of things so we needed a dimension to track those kinds of attributes but also an analysis services if you want to do any types of time calculations like last year last year what don't have any cheese balls we sell last year or or how many movies did we're in last year or if
38:00 - 38:30 we want to do have any type of year-over-year growth analysis or if we want to build any trend reports or anything like that we're gonna need add a dimension analysis services to do that now as I'm coming on through the dimensional model here our star schema here and I've put in the attributes into the tables I've just kind of used generic data types and so this is just from a a purely you know exercise standpoint and so the data types here aren't that important but in your environment they are very important you want to make sure
38:30 - 39:00 that you use the smallest data types possible because this is this is going to play ripple effects into analysis services so if if you don't set your data types up correctly and you just set everything in your data warehouse too extreme we're saying you know what everything is varchar' 50 and that's what we're going to go with this this is going to negatively affect the performance of SSAS and the reason for this is because in announced services your string data is actually stored in a separate file and so anytime analysis services has to access a straight string
39:00 - 39:30 data whether it's a customer name a store name and address a month name anything like that it has to go to this other file look up that data and then return to the results so there's going to be a performance impact to to your queries and processing processing of your cubes when you use string data so it's important to use the smallest data type possible if you can store something in an integer or a big int you want to do that because that's gonna that's going to improve the performance of your
39:30 - 40:00 queue also it's important to just just kind of remember that a single field non nullable integer key is that is the best data type we can use so if you can do that do it but like I said the best data is going to be the descriptive data so you don't want to go overboard you want to strike a balance there if you can store a customer a customer name and use the I customer ID and that's an integer as the key value for that column that's something good to do there as well also
40:00 - 40:30 one thing that you want to keep in mind as you're going through the design of your star schema here is use user friendly naming conventions don't go with them you know don't expose things like fact video sales to the users you want to expose it as maybe just video sales in your cube um you know dim employee they're not going to understand that dim is short for dimensions so you want to expose the the objects here in in a user friendly way and the last thing I'll say about this is about dimension optimization is look for opportunities to build in natural higher
40:30 - 41:00 piece into into your interior dimension design and so we're talking about hierarchies we're talking about you know arranging the relationships between the data in a in a natural way such as you have calendar year at the top you know months under that then you have date right so you have this kind of natural hierarchy these natural relationships that exist between the data because you can build those kinds of relationships into analysis services and those relation and when you those those attribute hierarchies or or
41:00 - 41:30 user-defined heart he's correctly built into analysis services it gives you a performance bonus so that's something you want to consider as well all right so I I think I don't think I'm gonna go any farther with this I think you guys get kind of the point here of what we're doing with the dimensional model here and so once we've identified the business process as step one then we've identified the grain and then based on the grain we're able to correctly determine the dimensions and the
41:30 - 42:00 attributes that are going to make those its dimensions it's time to move on to the last step and that's identify the measures that we want to include in our fact table okay so the measures are going to be how does the business measure success okay so this is gonna be you know save things like sales amount you know what was the what was the sale was that line item sale amount what does the product cost associated with that sale how many
42:00 - 42:30 videos were on that line item did they rent you know a DVD copy of you know Dances with Wolves and then a VHS copy you in school you know whatever the case may be or did they get you know ten bags of skills for the 500 kids you know so you're gonna have a quantity associated with that to that we probably want to track now I'll say the same thing here that I said when we were discussing designing or when Mitch was discussing designing the dimensions if we identify and measure that violates our grain what
42:30 - 43:00 do we have to do guys we got to go back to step two right we got to go back to step two because we've come up we've identified something in our model that I that violates our grain so we got to go back to step two let's um revisit the grain statement make sure that we actually have that correct and then we can come back and go back to step three make sure our dimensions are squared away that's all correct and then we can come back to step four where we're going to identify our measures identify the facts so in this particular case and
43:00 - 43:30 pull back up my dimensional model here now we're ready to get started with the step where we're going to build in the measures into our cube so we can solve the business say hey you know what are the measures you on track what are those high impact low risk metrics that we want to track here in our in our in our cube or our in our data warehouse so they may say okay well we want to sales dollar amount right that's important and I'm just gonna put these as in but you know you probably do like decimal or something there's going to be a quantity associated with that
43:30 - 44:00 right and maybe there's also going to be a cost that cost them out right and that would be important to track if we wanted that you know figure out any type of um you know profit or do any profit margin type calculations okay now and I'll just go ahead put those three and we'll just stick with that because those are three pretty common measures that you probably had if you have a retail business process modeled in your data warehouse probably have those types of measures as well but you might have have other things like a shipping amount or a tax
44:00 - 44:30 amount or you know whatever whatever the case may be now when you're identifying measures for your your star schema the best measures that you can put in your star schema are going to be measures that are fully aggregate able RF are fully additive and what I mean by that is measures that can be rolled up they can be sliced and diced in any way shape or form and the number always makes sense so for example if we roll up our
44:30 - 45:00 sales for all geographies I can see what are my sales across all of my geography regions right that number makes sense if I want to see my sales for customers that number still makes sense if I want to see my sales by month I can slice it by month that number always makes sense okay because it's a fully additive fully aggregate able measure those are going to be the best kinds of measures to include in your cube because your user really can't slice that wrong there's they can't there's a little work pretty hard to mess that up right so measures
45:00 - 45:30 that are fully additive fully aggregated are going that are always valid no matter how they slice and dice it those are going to be the best measures to include in your data warehouse theorem you're cute now you may have a requirement for some measures that are not additive at all you can't add them up and have them make sense so an example of something like that would be not profit margin right so we can't put profit margin in the tape and then have the user add that up because that number would be nonsensical right it's not going to make sense to
45:30 - 46:00 the business or the users if they add up profit margins for all geographies and they'll end up with some crazy number that won't make any sense right so measures that are non additive those are measures that are going to be better handled in the in the data access tool so it may not be as I say yes you know you may not be an ssa a shop right you may be using some type of other reporting technology like tableau or something more maybe your your even doing in power bi or you've got something else power pivot or something
46:00 - 46:30 else going on there right those types of calculations like profit margin are going to be better handled in the data access tool and certainly SSAS can handle those types of calculations also if you want to do any type of time calculations like sales amount for last year sales amount or your year-to-date quantity or my year today or my last year year today so that what we can see are our sales growing this year compared to last year are they actually increasing unit price those types of
46:30 - 47:00 measures are going to be measures that you want to handle in the report and the data access tool or SSAS also if you have any kind of type of you know KPI metrics that you used you know kind of the red light green light yellow light type of indicators that's something that's better handled and SSAS and so when your can when you're planning your data warehouse and if SSIS is in the picture you want to keep in mind that a lot of a lot of measures can be better handled with SSAS those measures that
47:00 - 47:30 are not additive like ratios or KPIs or time calculations that those are better to be handled with SAS then in your data warehouse if SSAS is not in the picture then you know you're gonna have to jump through a lot of Hoops to build in some of these dynamic type time calculations that SSAS can spit out in no time also at this point it's also good to begin estimating how many numbers what's the number of rows that are going to be in this fact table right and so it may be a
47:30 - 48:00 small fact table you may have you know you may determine that you know if we're only gonna have three hundred thousand rows in this back table so that's not very big that's not a lot of data and so maybe you know a clever partitioning scheme is an appropriate Nessun area but if you're a big company like Amazon I can I can't even imagine how many transactions a day Amazon has a Walmart something like that you're gonna have a ton of data in here and so you want started thinking about it was my partitioning scheme they'll look like right can we partition this
48:00 - 48:30 table down in it you know if you're big like Walmart you may even partition down at the hourly level so that you know you may have 50 million transaction that an hour if you're something if you're a company like Walmart and if you're going to be partitioning your table your table in your data warehouse you can align the partitions with partitions in the measure group in your cube and that will give you a performance bonus so aligning your SSAS partitions with your table your sequel server table partitions will
48:30 - 49:00 also give you an a performance improvement there okay and so at that point we identified the dimensions the the measures and so we've got a very basic star schema here that we're ready to start I'm kind of playing with a massaging and developing and moving on from here so Mitch did you want to say something no what we do have a lot of questions okay we'll get to the
49:00 - 49:30 questions here in a few minutes here so what I want to kind of do now is I just want to kind of recap really quickly and I want to make sure that we have plenty of time for the question so we've got about 10 minutes left and then in the session here all right so let's recap real quick so the four basic steps to designing your star schema are identify the business process we want we want to and when we say business process we're
49:30 - 50:00 specifically talking about business process not business department we want to identify the grain what does one row in the fact table represent we want to identify the dimensions and if we get to this step and we decide oh we needed to mention that isn't represented by our grain statement we need to go back to step two here and talk about the grain and then if we we make a change the grain then we'll come back to step three choose the dimensions and the attributes and continue development from there and then we want to just go to step four
50:00 - 50:30 which is choose the measures that we want to represent in our fact table now I I have I've been consulting for eight years here at pragmatic works so I've done a lot of data warehouse design I've done a lot of cube design done a lot of ETL development and one of the things we do a pragmatic works is we do what's called kind of a quick start if you're not familiar with that as we go and we design a POC we go from zero to cube in about a week so we design the data warehouse we do ETL we do cube and we do reports in a week and so it's pretty crazy week but I will say that this part
50:30 - 51:00 here designing the data warehouse we usually spend about three days on it can be two to three days depending on the kind of questions we have and and so this and the reason we spend that much time even though we still have ETL cubes and reports to do is because this is by far the most important step getting the data warehouse right if you don't get the data warehouse right you're gonna have lots of problems down the road so this is the most important step and that's why we spend two to three days discussing that now resources if you're
51:00 - 51:30 new to data warehouse design these two resources here are killer these are the best resources in my opinion that you can find the data warehouse toolkit you'll find a lot of the material for this session came from the data warehouse toolkit because it's an excellent book written by ralph kimball Marty Ross you know chances are many of you out there have read this this is a great book and it talks about these four steps and goes into a lot more detail than what we can do here on a one-hour session so I highly recommend this book the data warehouse fool kit also the
51:30 - 52:00 star schema star schema the complete reference excellent book as well these book both of these books in my opinion should be in every professional bi person's library they're they're that great and in my opinion the data warehouse toolkit is a little more instructional and the star schema is a better reference book because if you have special scenarios that you're you want to research both how I handle a fact dimension or a degenerate dimension or a jump dimension those types of things star schema is going to be your go-to resource for that so I highly
52:00 - 52:30 recommend both of these as as resources for you so look those up both of these books in PDF and in paper format Mitch does as well so I agree he's like this is our scheme of the best if your intro to today to warehousing I just think it's easier to read I think it's easier to read it's not heavy on the technology the data warehouse toolkit though you will probably just read the first five chapters and really get everything that we talked about here and a lot more and then the rest of that book goes department by department so it
52:30 - 53:00 has retail Human Resources finance and how you would model each of those different businesses so it's an awesome book and incredible book but I highly recommend both of those books yeah they're great books and and I want to think I'll say the last thing I'll say about those books is that the really nice thing about those books is for the most part their technology agnostic so you don't have to know a lot about sequel server you don't have to know a lot about Oracle whatever your database technology is because they're they don't approach it from a technology standpoint they're talking about it from a modeling conceptual standpoint so that's really
53:00 - 53:30 nice so thank you everybody for attending our webinar I hope that you learned a lot I hope you I hope that you got a lot out of this session and if you have questions you can now's the time to start asking your question I know we've got questions piling up now so and we've got about 500 people in this webinar so it's gonna be impossible for us to answer every single question but we'll get to as many as possible I've put up on the screen mine and Mitch's contact information you can email us at those email addresses if you're a tweeter you can tweet us there a sequel dusty is
53:30 - 54:00 my twitter handle and Mitchell sequel is Mitch's and then there's our websites there we blog about all kinds of different Microsoft BI related things sequel server scripts that we think are useful SSIS design patterns analysis service stuff MDX power bi all kinds of stuff there so check out those blogs you can ask us questions there as well if you have a question and you can't get through on the webinar there's the Twitter handle at PW question I'm sorry hashtag PW question you can use that for
54:00 - 54:30 your and you can log on the Twitter and you can tag your question with that hashtag and we'll go and take a look and we're gonna answer questions here in the webinar for now but then when the webinar is over we're going to take the remaining questions on Twitter so check that out as well so Mitch do we have questions we have more questions than I've ever seen oh my goodness I cannot keep all right what's up we'll handle these as best we can let's try this one right here all right easy question how did you get to
54:30 - 55:00 the database design window how did you get to the design window so this is in sequel server management studio so that's a good question I was pull that up real quick all right so there's my base design window whatever and I went into my database I just got a database here set up this called demo and I went to my database diagrams folder and I just right click and set select the new database diagram and so if you have tables that are already in your database that you want to add you can select those here and click Add or you can just right click and start naming new tables and creating them from scratch so pretty pretty
55:00 - 55:30 simple alright so another question we had was about using this surrogate key using an integer value for the date why do we do that why would you recommend that over using the date one person I believe said that they think that the Kimball book actually the one who recommended actually recommends using a date well the reason that we go with the integer and and I can't remember specifically why the Kimball book said two to go with a date datatype but and and I unless I'm gonna guess here and say that it's because it's a more user
55:30 - 56:00 friendly value but this the circuit keys are important in an integer type format is because they're going to perform the best right and and they're the way that we create relationships between the dimensions and the fact tables and remember the whole goal of the data warehouse is high performance high performance and so a single field non-mobile integer key is going to perform the best okay and so that's the reason we go with a date an integer field for our date key and our date dimension no I'll also mention here that
56:00 - 56:30 this date key is a little bit different than our other instead of being than our other keys in our other dimensions because instead of going with just an identity column we're we're just gonna you know auto increment it one at a time we're gonna go with our date dimension with what's called a smart key and basically the format looks like it'll look like something like this right if you're looking at my screen the format for January 1st 2012 it's going to be an integer value but it's gonna be formatted to look like a year a month in
56:30 - 57:00 a day so it's still going to be user friendly but then at the same time bring me at the performance benefits of having a non nullable single field integer key in the day dimension so that's why we typically go with the date plus if you're trying to find a record in your in your day dimension or in your fact table you don't actually have to join to the date dimension to get it because your fact table already has the date stored in it in the data SK and so you can read that and so that's kind of nice too
57:00 - 57:30 that's a good question as well if you want to try to take one more this one's a little tough but there was a lot of questions around this one it's something that Dustin will have a blog out there with a lot of these questions answer for you but they wanted to know when you would have a dimension with a dimension when you would you break off for a snowflake there's a lot of questions around that yeah so and and there's there's a that's kind of a big hot debate topic because some people will say well you might want a snowflake right if you if you if you want to maybe
57:30 - 58:00 you want it for instance something like product and product category right maybe you want to break product category out because we might have any other fact table that doesn't relate to product but it relates to product category maybe we've got a forecast fact table right now we don't forecast at the individual product level we forecast at a product category level and so we might have a forecast fact table and so that may be a case where you'll see some people say hey we want to snowflake that out now
58:00 - 58:30 that that's fine and you can convince me that you may want to do that but one thing that I'll mention in regards to analysis service is that when you begin snowflake in your data warehouse there's going to be an impact on the design of your cube so this is you have a a what's called a reference dimension right where you use an intermediate dimension to relate that product category to our fact sales table and so that's gonna there's there's some impact there with having a reference to mention which you can find on my blog on sequel dusty WordPress com
58:30 - 59:00 if you do a quick search let's see I've got it up here pull it up real quick so there's my blog sequel dusty calm so if you go to sequel gusting user research reference to me
59:00 - 59:30 there's there's a lot of reasons why you might consider going with a star schema and typically you see that with people who are more come from a transaction or an OLTP develop on the side because they want didn't want to kind of normalize things a little bit because that's what they've always done it's kind of leaning
59:30 - 60:00 towards that more for third normal form kind of way of doing things but you'll get better performance and you and you'll get better performance with your data warehouse loading your data where I've seen clearing your de warehouse if you if you try not if you try not to normalize things if you stick with a more denormalized design because remember the bottom line is we want a good performance right we're willing to sacrifice some storage or we'll if we're willing to sacrifice some redundancy or some duplication of data rather for
60:00 - 60:30 better performance that's really the end goal here is query performance query performance query performance that's a good question as well all right I think that's all we have time for today but we're gonna be doing some questions afterwards is that correct yep yep so if you have further questions we'll take questions for about 15 20 minutes or so on Twitter depending on how crazy it gets we'll try to get to all the questions on Twitter there's the hash tag hash tag PW question if you have further questions that we just didn't have time to get to
60:30 - 61:00 here in the webinar log on the Twitter hash tag PW question you can hashtag your question with that and then we'll try to get to those as many as possible but thank you so much everybody for coming to our webinar we hope that you found it useful we enjoy doing these kinds of webinars so if you have further questions about any of this kind of stuff log on the Twitter check out our blogs go to pragmatic course com lots of great resources at Freddie Mac works comm and that is where the recording will be posted on pragmatics works comm
61:00 - 61:30 [Music]