Acceldata webinar: Product Overview for Data Observability Cloud
Estimated read time: 1:20
Summary
The Acceldata webinar offers an insightful walkthrough of its Data Observability Cloud, led by Loretta Jones, VP of Growth, and Tristan Spalding, Head of Product. The session focuses on their cloud solution designed to enhance data efficiency, reliability, and performance across platforms like Snowflake and Databricks. It highlights how organizations can navigate modern data architectures effectively using comprehensive insights for better pipeline reliability, compute performance, and cost management. The demo showcases various features and tools for optimizing data pipelines, ensuring data quality, and mitigating common data management risks. The session concludes with opportunities for hands-on trials and workshops for further exploration.
Highlights
Tristan Spalding illuminates the benefits of their Data Observability Cloud, focusing on data efficiency and reliability. 🌟
The cloud solution simplifies navigating data architecture complexities and offers comprehensive insights. 🔍
Key advantages include cost control and performance optimization across data platforms. 📈
The presentation emphasizes a shift towards using data products that drive revenue and internal innovation. 🏦
Tristan showcases a seamless integration and robust alerting framework that doesn't rely on Snowflake credits. 💡
Key Takeaways
Acceldata's Data Observability Cloud enhances data efficiency and reliability across cloud platforms like Snowflake and Databricks. ☁️
The solution helps address costs, risks, and data management challenges with a user-friendly, insightful interface. 💼
Gartner's four aspects of data observability (monitoring upstream, downstream, usage, and the underlying compute layers) are effectively addressed by this solution. 📊
The tool is available as both a managed service and software within your Virtual Private Cloud (VPC). 🛠️
Implementation of Acceldata's platform is swift, taking around 15 minutes, offering immediate insights for users. ⏱️
Overview
Acceldata's recent webinar, presented by Loretta Jones and Tristan Spalding, delves into the capabilities of their Data Observability Cloud. The session begins with an overview of its positioning within modern data architecture and quickly transitions to a detailed demo. This cloud platform, which ensures enhanced data reliability and efficiency, is touted to be indispensable for managing data operations across platforms like Snowflake and Databricks.
During the webinar, Tristan demonstrates the significant advantages offered by the Data Observability Cloud. By integrating seamlessly with existing data architectures, it helps businesses manage costs and optimize pipeline performance. With Gartner's comprehensive data observability guidelines in mind, the platform covers monitoring requirements for upstream, downstream, usage, and underlying compute layers, which is crucial for modern organizations leveraging big data technologies.
The presentation effectively highlights the transformative potential of data products to drive revenue and innovation. More than just a data management tool, the Observability Cloud provides advanced features like real-time analytics, user-friendly interfaces, and swift deployment capabilities. The session wraps up by encouraging participants to engage in workshops or trials to experience the platform's benefits firsthand.
Chapters
00:00 - 01:00: Introduction and Overview The chapter titled 'Introduction and Overview' starts with a welcome note from Loretta Jones, the VP of Growth at Acceldata, who appreciates the audience for joining the session. She introduces Tristan Spalding, the head of product at Acceldata. Tristan is set to present some slides providing an overview of their product, the Data Observability Cloud, and the specific problem it aims to solve. A demo will follow the presentation, and the session includes time for additional discussions or Q&A.
01:00 - 03:00: Understanding Data Observability The chapter titled 'Understanding Data Observability' begins with an invitation for questions from the audience, encouraging them to participate actively via the Q&A or the chat feature. The focus then shifts to Tristan, who takes over the presentation. Tristan expresses gratitude to the participants and acknowledges the collaborative efforts of Acceldata's various teams, including engineers, the support team, marketing, and sales personnel. Tristan sets the stage for an introductory discussion about the core subject matter.
03:00 - 07:00: Data Platforms and Their Challenges The chapter begins with a brief overview of the company's position and quickly transitions into a focus on data observability in the cloud. An introduction to its placement within the modern data architecture is provided, followed by a demonstration of the product. The chapter concludes with a discussion on opportunities for practical, hands-on experience with the product, such as workshops.
07:00 - 10:00: Acceldata's Solution for Data Efficiency In this chapter, the focus is on setting up and utilizing Acceldata's cloud service for better data efficiency. The session includes signing up for a free trial and engaging in hands-on activities with your data. The speaker emphasizes the shift in perspective for companies — moving away from viewing data merely as a cost center to recognizing it as a valuable asset that drives value. This transformation is facilitated by using data products.
10:00 - 13:00: Multilayer Insights in Cloud Data Platforms This chapter explores the evolving role of data products within organizations, emphasizing their increasing significance in driving revenue. Data products are now frequently utilized in data exchanges and marketplaces, and they also support analytics and machine learning models. There is considerable excitement and grand visions about the potential and future of data products. From the perspective of data teams within organizations, the chapter delves into the transformative impact and functionalities of these data products.
13:00 - 15:00: Operational Command Center In this chapter titled 'Operational Command Center', the focus is on the challenges of navigating complex data environments without clear visibility. It highlights the growing importance of data observability as a major priority for organizations. The discussion aligns with Gartner's definition of data observability, emphasizing that there are four key aspects to consider in understanding and managing data effectively. The transcript suggests that data is a multifaceted entity that cannot be approached in a simplistic manner.
15:00 - 20:00: Pipeline Management and Data Reliability The chapter titled 'Pipeline Management and Data Reliability' focuses on understanding the different components of data management. It highlights the importance of knowing the upstream and downstream processes, the users involved, and the computational layers which directly affect costs and performance. A report by Gartner is mentioned as a pivotal resource that explains these aspects comprehensively. The chapter suggests that the Acceldata product aligns well with these outlined criteria, addressing various aspects of data observability. The current offering of the cloud data product is also discussed, emphasizing its alignment with these principles. Overall, the chapter weaves these elements together to present a coherent picture of pipeline management and data reliability.
20:00 - 50:00: Live Demo: Features and Functionalities The chapter titled 'Live Demo: Features and Functionalities' discusses a comprehensive tool designed to enhance data stack efficiency, pipeline reliability, compute performance, and cost management. It emphasizes that while the focus is on cloud services, the tool is versatile and can be deployed as a managed service or installed directly in a VPC (Virtual Private Cloud).
Acceldata webinar: Product Overview for Data Observability Cloud Transcription
00:00 - 00:30 okay everyone let's get started thanks so much for joining us today I'm Loretta Jones VP of growth at Acceldata and appreciate you giving us your time today to learn more about our data observability cloud uh with me I have Tristan Spalding who is the head of product at Acceldata so he'll go through a few slides just to give you um an overview of the product and overview of the solution what we're trying to do and the problem that we're trying to solve and then we'll go right into a demo we also have left time for
00:30 - 01:00 questions so if you do have any questions please put them in the Q&A or in the chat and we'll go over those questions at the end of the demo so Tristan why don't you take it away thank you and thanks everyone for joining today uh as Loretta mentioned I get to work uh all across Acceldata uh with our great engineers our great support team great marketing team great sellers things like that and uh today I just wanted to give an introduction to a little bit of who
01:00 - 01:30 we are as a company and very quickly get into the data observability cloud and what exactly we mean by that and so we'll start off with a little bit of an overview on where this sits you know kind of in the the modern data architecture the modern data landscape and we'll pretty quickly jump over into an actual uh tour through the product and uh you know again as Loretta said happy to take questions and uh you know we'll wrap up basically by showing how you can get Hands-On with the product either at a workshop that will will uh
01:30 - 02:00 we'll be setting up and running or actually directly uh in our cloud service today uh signing up for a free trial and getting hands-on with your data there so without further ado uh let me start by sort of framing where we are so I think for a lot of companies uh there's a vision which is partly realized but certainly a major initiative of using data products reframing how we think about data from sort of a uh more of a cost center and something that we're always chasing to something that's really driving value whether that's data
02:00 - 02:30 products that are reused inside an organization or increasingly data products are actually used to drive revenue uh either directly by uh using them in data exchanges data marketplaces or indirectly by offering analytics uh or powering machine learning models on top of that so I think there's a lot of grand vision a lot of excitement as we look at how we see these data products going and what we see them doing for those of us you know sort of more on the on the inside and the data teams this is a little more what the perspective looks like it's a little bit
02:30 - 03:00 you know a little bit blurry a little bit cloudy uh and trying to figure out how we're going to navigate all this at high speed without visibility is it's hard to imagine and so you know really data observability there's a reason that started to move uh move to the forefront in terms of some of the biggest priorities for many organizations and so for us you know we're very aligned with Gartner's definition here I know everyone has their own definition we sort of go with Gartner's around you know there's really four aspects around this you need to understand data is a complicated thing you can't just look at
03:00 - 03:30 one side you need to understand what's upstream what's downstream who's using it what the underlying compute layers are which are actually consuming costs and driving performance and so Gartner has a great report kind of articulating this and uh you know you'll see as we go through this how the Acceldata product uh really hits all of these points together so data observability cloud the product that we're offering now uh this this is basically what we try to do and I'll uh
03:30 - 04:00 I'll also I'll clarify one question in advance there just as we go through but essentially what this is going to let you do is get comprehensive insight into your whole data stack in order to improve data efficiency pipeline reliability compute performance and basically spending as well so my clarification on this is that while what we'll talk about a lot is around uh around cloud around cloud services this is actually available not only as a managed service but also as software that can be installed in your VPC and in fact that's one of the largest
04:00 - 04:30 deployment scenarios that we we observe here uh so the question is why now what's happening now that requires this so I think it is around these data platforms like Snowflake and Databricks and Confluent there's many more that are coming out around here and and having tremendous success and they're having success because they offer incredible benefits I mean I think for people that have been using data for a long time you know you understand how how easy it is to get started with these how powerful they are the ability to scale up instantly by you
04:30 - 05:00 know selecting a different uh warehouse size from a drop-down in a beautiful graphical user interface it's amazing that you can get this acceleration it's also amazing that this gets upgraded for you so those of you who've you know upgraded your uh your traditional databases that's a that's a huge process instead with with Snowflake with Databricks with these cloud data platforms they're getting updated all the time transparently to you and new features are available so there's some tremendous benefits
05:00 - 05:30 these also have a flip side and it's just something that you know uh with with the benefits comes some of the costs and some of the risks so I think what you find it's so easy to get started that you don't need to take a 90 you know 90 minute or a full week training course to understand how to use snowflake uh you know databricks maybe does have a little bit of a learning curve as well uh but you're up and running you're doing that learning curve you know often in the platform itself and that has a lot of good things it also potentially leads to some you know mistakes and those mistakes you know are
05:30 - 06:00 not free necessarily uh I think the other aspect you know that's that's interesting is that they're upgrading all the time so those of you who who use this depending on where you sit in the organization whether you're a data engineer a data platform owner uh even a data analyst you may or may not be reading uh the details of every release note from every cloud data platform but they are putting out great things every month I I always encourage people you know uh customers to look at what's come out in October September from these these companies it's great but did you update
06:00 - 06:30 your workloads and your own configurations to take advantage of that is often a different question so there's some lag there's a little bit of free lunch here that people can pick up I think the last bit with this that people may have experienced depending on you know where you are in your adoption is around you know how predictable and really how how level is this growth so in many cases you know you can be doing everything right you can be putting in good practices and suddenly there's a data problem there's a query that's
06:30 - 07:00 particularly bad there's a table that didn't have a timeout on it and you know you realize not right away but a few weeks down the line hey there's a huge mistake here basically like we've been burned on this we've we've burned a bunch of credits this can never happen again and this is the type of thing that gets into sort of the CFO chain uh type of thing which is not necessarily where where you know we as data people you know want to start off and so all of these aspects you know what we're trying to offer here essentially with data observability uh from Acceldata is
07:00 - 07:30 helping you make the most of these the leading cloud data platforms take advantage of all of these you know key traits that aren't available with traditional platforms uh but really avoid you know these risks so get the best of both worlds here so where does this sit so you know I think one of the things that you know we have experience with just you know at our company a long history of people working in the data world is that it's not always simple there there are companies that uh have sort of been born
07:30 - 08:00 in the cloud and uh their services even then are a little bit spread out across different vendors different platforms but for enterprises uh that really have decades of data investments uh it's not simple and things don't necessarily you know originate in Databricks or originate in Snowflake they go through a lot of steps and so the Acceldata perspective on data observability is that you really need to push a lot of the analysis left so there's you know people use the term shift left well for
08:00 - 08:30 us shift left means push all of the analysis we're going to do as far back as we can look at the files look at the streams you know even look at the the applications basically that are feeding us upstream and the more that you can get visibility here and validate things here and apply efficiency here the more that you you avoid spending time and credits basically on firefighting and on queries down at the other end of the spectrum and so you'll see you know how some of our technology choices we've chosen a quite
08:30 - 09:00 different architecture under the covers that allows this and you'll see some of the benefits as we go through but broadly we look at you know kind of this window of not just these platforms not just the warehouse but actually the data upstream and even these systems uh whether it's Airflow or or Prefect or whatever the system might be you know even for data science MLflow or something like that uh you know these systems that are orchestrating and reading and writing data and transforming data as it goes actually between each of these steps
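As a concrete illustration of the shift-left idea described here, below is a minimal sketch of the kind of check that could run against a landed file before anything is loaded downstream. This is not Acceldata's implementation; the file path, column names, and thresholds are hypothetical placeholders.

```python
# Minimal shift-left sketch: validate a landed file before it feeds anything downstream,
# so bad data is caught before it burns warehouse credits. Path, columns, and thresholds
# are illustrative only.
import pandas as pd

def validate_landing_file(path: str, min_rows: int = 1000, max_null_rate: float = 0.05) -> list:
    """Return a list of problems found in the file; an empty list means it looks loadable."""
    problems = []
    df = pd.read_csv(path)

    # Volume check: did we receive roughly the amount of data we expect?
    if len(df) < min_rows:
        problems.append(f"only {len(df)} rows, expected at least {min_rows}")

    # Completeness check: are the key columns present and mostly populated?
    for col in ("station_id", "last_reported"):
        if col not in df.columns:
            problems.append(f"missing expected column {col!r}")
        elif df[col].isna().mean() > max_null_rate:
            problems.append(f"{col!r} null rate {df[col].isna().mean():.1%} exceeds {max_null_rate:.0%}")

    return problems

if __name__ == "__main__":
    issues = validate_landing_file("landing/station_status.csv")
    if issues:
        # Fail fast here instead of firefighting in the warehouse later.
        raise SystemExit("Blocking load: " + "; ".join(issues))
```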
09:00 - 09:30 so when we look at what this is and you're getting a little more concrete around what are these capabilities and why do they matter uh you know we basically break this down into four elements that we think are quite important so this this data reliability shifting left that we just spoke about the multi-layer insights the fact that something is an operational control center something you're going to react to uh today you know and not necessarily retrospectively look back every quarter
09:30 - 10:00 every month and say hey we should have done better these are the things we could have done better and then finally the idea of complete data coverage so not needing to make such severe compromises when it comes to understanding and protecting essentially the reliability of data based on platform based on cost based on the complexity of the type of rule you're trying to express okay so to dig a little deeper on this here um you know we talked about files and streams this is quite quite important right and so basically this is this is
10:00 - 10:30 powered by you know one of the choices we've made here is to use Spark actually and deploy our managed Spark or use your Databricks cluster uh in order to actually run even very complicated even multi-data-source cross-data-source uh data reliability checks and I'll show you in the product a little about what those look like but being able to express this up here is a significant difference when you start thinking about what data pipelines look like how long they take and the costs they incur uh so what we think this does you know of course this is a you know sort of
10:30 - 11:00 data engineer oriented company things like that so we've got our little you know Pareto frontier here of we think you know with this with our Spark-based data observability you're able to do more basically in a given amount of time given budget and not make those choices so we've seen some great success um with immense you know uh data companies that make their business off data who've been able to dramatically speed up and increase the coverage that they have while reducing cost by using this rather
11:00 - 11:30 than uh let's say alternative uh data reliability data quality tools so multi-layer insights so this is what this is trying to say here as well is that um when you're when you're analyzing data when you're looking at these pipelines so as a data engineer you know as a data platform owner many of you may have been in a situation where you see that something is going wrong right so we notice we get an alert we see that there's an issue and the question is why what caused
11:30 - 12:00 that and how quickly can we go from noticing this incident to actually resolving this incident and so what you'll find is often this actually requires jumping between areas that can sort of bridge skill sets one person isn't necessarily an expert both in Databricks and in Spark uh and in the actual data sets that are going through and so what we're able to do is do things like hey this data uh this data is late the data is delayed uh was that because the query
12:00 - 12:30 was queued and was the query queued because the Snowflake warehouse was too small and was the Snowflake warehouse too small because this group was forecasted to go over their budget so we constrained the warehouse and it went down these are especially in the cloud data platform world where you may have decentralized teams operating these these tools and their own accounts a little bit autonomously being able to see this context and move between layers becomes quite essential I'll highlight another one for you so one of the big differences with cloud data platforms is that your queries on bad data they're not just producing bad
12:30 - 13:00 results they're actually burning compute cycles as well so it's not just the time that you spend going back and fixing this and the trust that erodes those are all there they've always been there you know um but uh it's actually you know queries uh inserts of bad data like these cost at least as much uh as as they do with good data so this is another aspect where if you're looking at I've got this bill I've got this trend line that I don't feel great about I need to prepare a budget where can I you know sort of trim the fat or where
13:00 - 13:30 can I align things so that I'm still achieving the success that these platforms can deliver I'm still achieving these transformative goals but you know I'm staying on a reasonable budget curve this is one of the things that's not obvious unless you are looking at multiple layers so we see this is a big big thing as well the last bit so data pipelines so the data pipeline concept is one of the most exciting ones that we've seen uh recently and it continues to explode in terms of the ways that people are expressing things like whether people are expressing them in dbt whether they're
13:30 - 14:00 writing them in Python whether they're orchestrating with Airflow or one of the many others or whether it's these traditional you know workhorse drag-and-drop ETL tools pipelines are all over the place and they're quite complicated right because in many cases they're bringing data from different systems combining them they're running at particular times they need to deliver data at certain times and so one of the things that we really aim at is orienting our platform around pipelines and capturing instrumenting those pipelines pulling all the information back into our central hub and really giving the the notifications and the
14:00 - 14:30 insights and alerts you need in order to really diagnose where are the problems here let's fix it uh let's not say hey this is someone else's problem bring it to them this is someone else's problem bring it to them but actually say let's go in we have the tools to fix this and we understand the context of what we're doing enough to prioritize the next bit so we go into we go into this so for an operational command center you know now we've talked about we're trying to find the issue we're trying to fix it so what you'll you'll know if you've
14:30 - 15:00 dabbled with each of these and uh as we go into the demo you'll see this as well these are quite different systems so even taking Snowflake and databricks as kind of a couple of the big sort of uh big giants you know in in this area these are significantly different platforms the functionality is um to some extent you know you can see these these starting to overlap more and more but the concepts the terminology the details are quite different between them and so what we find is it's often rare for a given data team in an organization even a very large organization to really have deep expertise in both and so we sort of take
15:00 - 15:30 that as part of our mission our reason for existence as a company is to really stay on top of all of this so read the release notes you know hire experts train experts partner deeply with each of these providers in order to know what's coming be prepared for it incorporate it really before it comes out so our hope is that you know even if you're very experienced with Snowflake or very experienced with Databricks with the Acceldata data observability cloud you're going to get augmented uh with
15:30 - 16:00 some of some of the time that that we spend uh trying to comb through all of this and aggregate data and synthesize it into a product for you we'll bring these into one tool many times you'll get you know uh nice views in one area nice views in other areas but you won't have the ability to pull them together and why does that matter well one because things are entangled data pipelines might start by you know they wouldn't start but they might have a phase where they put data in S3 it's then processed by Databricks and then that's loaded into Snowflake so when you think about how are we going to optimize this how are we even going to measure
16:00 - 16:30 the impact and the cost of this well having these in one view is quite essential the other one that may be on people's minds you know given where we are on the calendar is budgeting so understanding how these systems are being used and really what their growth rate is among these different groups is a little bit hard to reconcile unless you're able to pull it into one view and this is something that we offer uh with Acceldata here so the last bit here really comes into data reliability so we think you know as I mentioned we think this is an
16:30 - 17:00 essential piece often overlooked piece when it comes to uh efficiency with cloud data platform investments so one of the challenges with data reliability is always how do you get going quickly like how can you cover this data which systems do you support how much work do I need to do to get started and then can this even capture what I needed to capture so for example you know there are some simple checks that that make sense to you right like check if data is late check if there's
17:00 - 17:30 more or less data than expected check if values are within a certain range things like that there's some simple rules and these are things that Acceldata automates a lot of other tools automate as well that's sort of the basic thing that you point at your data sources uh and uh and get an initial measure but in many cases especially for organizations that have been monetizing their data or really building data products maybe even in advance of the the term data product becoming popular there's a lot of custom rules like it's not enough to just sort
17:30 - 18:00 of point the machine at it and say figure it out and tell me something interesting that's a good part but you know the extra part comes after that and we see we work with organizations that have hundreds and hundreds of rules that they've expressed in many languages some proprietary some not uh over time and it's really essential that they're able to author and manage and apply these at scale with all of the sort of bells and whistles and fanciness around segmentation and alerting and reporting all of these things that you expect from sort of a modern tool and so our claim
18:00 - 18:30 you know and one of our guiding lights at Acceldata is really to take this uh premise this span of simple checks on tables you know and broaden it all the way to complex user-defined cross-table you know cross-file cross-stream connections between these and so what you'll see as we go through the product you'll see kind of basic checks automated you'll see medium checks like uh data drift and anomaly detection uh that might be more
18:30 - 19:00 appropriate for data science workloads uh you'll see those turn into one-click operations to add and you'll also see you know custom UDFs that we're writing and applying those and segmenting those across the board so we think it's essential great to have data quality but unless you're easily able to apply it and affordably apply it at scale to everything it doesn't do a lot of good to have those capabilities so that's where we we really look at you know our four principles just to sum up um and hopefully that gives a little bit of an orientation to some of the key
19:00 - 19:30 features and uh you know with just a little more summary and intro I will then jump in and give everyone uh just an initial sort of experience and tour of what the product looks like so just a couple of things to keep um keep an eye out for you know depending on your orientation and where your focus is as we look into Snowflake and Databricks um so you'll see keep an eye out for the specific alerts that we have keep an eye out for the guardrails uh the automatic
19:30 - 20:00 you know sort of resource controls that you have on here and keep an eye out for how we look at the reliability of the tables alongside the cost similarly for Databricks Databricks is a very deep platform as many people have experienced and it's getting deeper and it's also getting broader so what we'll show you with with this is really how we can consolidate a lot of this richness and really technological power into something that's uh hopefully you know what we aim for is a very coherent cohesive experience on finding the
20:00 - 20:30 insights and the information you need um from all the information that's really emitted uh you know at the workflow level at the the executor level and everywhere sort of up to the you know up to the entire account level and workspace level so we'll keep an eye out for that all right so with that let me jump over and um unless Loretta you think we should should pause for questions or um we can we can wait until later so why don't you go into the demo and then we'll take some questions after thanks great sounds good
20:30 - 21:00 all right so let me jump over to the demo here um and uh this is pulling up uh pulling up our our cloud instance here so our cloud control plane and um what I'll do is basically sort of take you through a little bit of a tour of the different capabilities here and show off how this is going to help us get some mastery over our key cloud data platforms okay
21:00 - 21:30 so where we're starting off at the top level is really a view of both the data sources that we're looking at these data platforms as well as the data assets underneath it so we're seeing what our costs are for these over time what we're going to do quite shortly is dig into these kind of diagnose where there are efficiencies where there are inefficiencies where there are issues we're seeing what is this going to cost us next year things like that um but underneath this just to show you
21:30 - 22:00 as well this is actually looking at the data assets as well so each of these platforms uh across snowflake across databricks you can see here you know these have hundreds of tables uh you know thousands of columns and of course there could be many many more and so what we're able to do this is where this multi-level piece comes in is we're able to connect these two and so if we're spending a lot of money on something that we're either getting exceptionally poor data reliability on or we're not even testing for data reliability we're not even executing any uh any checks on
22:00 - 22:30 if this data is reliable this would be kind of a key action I might want to take if I wanted to understand you know how I can really get results from my data platform my Snowflake or my Databricks spend without uh without excess you know really exceeding any budgets that I might have set up okay so the first thing we're going to do is kind of walk through here and take a look at uh this Databricks instance and uh we'll then sort of go through Snowflake and then we'll see a little bit more about the data sets and the
22:30 - 23:00 data pipelines underneath all of this so across here uh what we're basically going to see is a breakdown of some of the usage here so I'll just bring us back to what what's a good it's a good time period maybe the last seven days last uh let's see the last 14 days um you know basically see a little bit around the cost Trends so where actually
23:00 - 23:30 you know are we spending resources uh on this and uh and and by whom and how efficient is that so from the top level what are we spending on databricks versus what are we spending on the underlying compute um in this case this is AWS it could also be Azure um what are we spending on the underlying compute um and then breaking this down you know not only into the type of uh of of resource that we're using but also into
23:30 - 24:00 the individual job uh so let's reload some of these um so basically then now into the core usage and then the efficiency as well as you can see here we're capturing some of these these assets and then we're also looking at basically the efficiency though so this is one of the challenges with Databricks often is that it's incredibly powerful there's a ton of scale it's also very hard for new users to really understand and optimize uh the
24:00 - 24:30 resources so what size instance should we be using so that we're not leaving cores idle throughout our job and you'll see a lot more as we go through this but you know you'll see lots of things I think the other classic one is using sort of interactive clusters versus all-purpose clusters um you know there's there's a lot of knobs you can tune and tuning them well can make a big big difference uh so with this we're basically flagging you know a few people that maybe are running you know some applications that are not that effective so we'll look at one user's share so with
24:30 - 25:00 that I'm going to come back and look you know a little closer at these applications uh and sort of see across the board um what are they using in aggregate we can filter this and I'm going to click on that user here and actually get a list of these applications themselves so you can see hopefully how we're going you know pretty quickly between a high level bit of information around where are we spending money where are we spending resources down into actually the logs um and then even into uh even into
25:00 - 25:30 the applications let me apply this filter um and so what we'll do is you know instead of these many times these can get scattered they can get lost Acceldata is just pulling them all in doing some analysis straight away but also making them available for deeper analysis and action so with this I'll actually dig in to this application or or now what we call the workflow um and we're going to get a ton of metrics on this like more metrics than um than like a casual user would
25:30 - 26:00 probably pull together but you know really the set of metrics that you're going to need in aggregate in order to really solve problems quickly and uh and introduce some efficiency here so just to give you a little bit of a sense of what we're capturing you know we get this aggregate information on this but we can go down really to you know to the second uh of the execution within each of these so we're able to correlate uh where you are taking time how much you are shuffling like we'll go through here you'll see some information on how skewed the tasks are
26:00 - 26:30 which in this case actually is quite significant and these are factors that can really introduce inefficiencies in how long this is taking and you know by really altering this and setting up the partitioning strategy to match your data as it's genuinely coming in you'll see a lot of improvements in terms of your efficiency um we'll bring in a timeline as well where are you spending time well in this case we're spending much of the time in the driver so that might be okay that might not be what we want to do and this gives a little bit of a diagnostics into
26:30 - 27:00 how we might be able to do that all right so again these sort of correlated views debugging information so that if I'm given a job I want to understand why is it slower than it used to be um now I'm able to actually pull in and diagnose that and resolve it as we go through you'll see some some information on runs that we can compare here so many times these applications as we'll see actually from some of our pipelines in a few minutes uh these can be run you know
27:00 - 27:30 every day multiple times a day but in many cases you're expecting some amount of stability and some amount of recurring you know work to happen this will automatically flag for you what's been what's different about this uh versus the previous run and really help again with sort of the diagnostics on is this behaving as expected or is there something unusual that we need to investigate here okay so this gives you hopefully a little bit of a sense of the the depth that we go into on the Databricks side and some of the capabilities that allow
27:30 - 28:00 you to really wield this this platform really effectively Snowflake is is more interesting and this is one of the the nuances you know across these platforms Snowflake itself does not necessarily offer the same uh the same details um that Databricks does for example Snowflake doesn't actually necessarily publish the exact instance sizes that you're using let alone sort of the memory consumption uh within within each of these all the time so a lot of the
28:00 - 28:30 details that you need for Databricks that we've sort of you know built up our expertise on and and built into the platform it's a little bit different for Snowflake and so what our Snowflake experts have done here is really curate a little bit of a different experience so this will show you and break down basically at the service level at the warehouse level at the query level even user level all of these dimensions what things are costing you and where there are inefficiencies so as we go across we'll just get a sense even at the most basic level okay how is this trending over time okay up
28:30 - 29:00 and to the right I think that's that's typical what's sometimes interesting though is when you start seeing these Spike if you start seeing storage Spike so storage usually is not the headline item but it can actually in aggregate accumulate uh accumulate some significant costs as as well um so it's worth breaking these aspects down what you also see you know commonly as well is basically uh you know where this is trending So based on historical usage
29:00 - 29:30 you know where do we sort of see this going and how does that relate to the capacity that you've purchased up front uh so in many cases with Snowflake people are so happy with it it's so easy to use they start using it a lot and then it's maybe not as trued up with what was purchased as people would like and we end up with surprises in this case you're not going to get surprises right you're not going to get surprises because it's telling you what things are costing and because you're basically consistently getting alerts across this on where things are failing where these uh credits are being consumed faster where queries are queued
29:30 - 30:00 things are failing where where these uh credits are being consumed faster where queries are acute so let me just give you take a second here to flag this a little bit and just you know show a little bit of what comes out of the box on this as well as what you can compose yourself uh as you need to extend this right so basically automatically as soon as you are connected to your snowflake cluster and we get some basic information on what you're using we initialize these alerts and basically you know you'll identify
30:00 - 30:30 where you'd like this to go Slack email your webhooks embed them into other tools all sorts of choices and you'll you'll start getting alerts on these uh when you exceed uh where you should be and that could be things like you know hey queries are queued maybe we need to increase the size of the warehouse or we're spending too much maybe we need to throttle this down down to things like hey we have a new user that signed up that you know has permissions that maybe they shouldn't have or maybe has a default warehouse size that's an extra
30:30 - 31:00 large and maybe it should be small so all of this is really in the service of helping you uh avoid getting into a situation where you wake up one day you know things have been going great no news is good news you know we're we're demonstrating some great impact with applications you're building on Snowflake and realizing that maybe you're on a path to be way over you know the credits you purchased in advance by the way there's a similar you know kind of set of resources here for for Databricks as well
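The product configures these alerts through its own UI; purely to illustrate the kind of guardrail being described, here is a rough standalone sketch that audits Snowflake warehouse settings with snowflake-connector-python and posts findings to a Slack incoming webhook. The thresholds, allowed sizes, and environment variable names are assumptions for the example, not product defaults.

```python
# Rough sketch of a guardrail check like the ones described: audit warehouse settings
# and raise an alert when something looks too loose. Thresholds, allowed sizes, and
# the environment variables are placeholders for illustration.
import os
import requests
import snowflake.connector

MAX_AUTO_SUSPEND_SECONDS = 600                   # flag anything above this (or disabled)
ALLOWED_SIZES = {"X-Small", "Small", "Medium"}   # flag unexpectedly large warehouses

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
)
cur = conn.cursor()
cur.execute("SHOW WAREHOUSES")
columns = [c[0].lower() for c in cur.description]

findings = []
for row in cur.fetchall():
    wh = dict(zip(columns, row))
    auto_suspend = wh.get("auto_suspend")
    if not auto_suspend or int(auto_suspend) > MAX_AUTO_SUSPEND_SECONDS:
        findings.append(f"warehouse {wh['name']}: auto_suspend={auto_suspend}")
    if wh.get("size") not in ALLOWED_SIZES:
        findings.append(f"warehouse {wh['name']}: size={wh.get('size')}")

if findings:
    # Route the alert wherever you like; a Slack incoming webhook is one simple option.
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json={"text": "\n".join(findings)})
```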
31:00 - 31:30 okay the other sort of cross-cutting piece we'll look at and then we'll talk about what some of the levers are that we can use to really reduce this the other part that's important to understand is uh basically a chargeback concept or a showback concept so what we've done here is essentially set up cost centers uh which can be attributed either to Databricks resources or to Snowflake resources or to both and by linking up these cost centers we're able to see you know basically where where is the usage
31:30 - 32:00 right which groups in our organization are consuming resources and hopefully they're getting some benefit out of it but that's something we can interrogate but which groups are uh consuming resources how is that trending you know is there anything unusual we've seen in terms of the adoption and so I think this this becomes quite helpful in trying to understand you know what do we need to forecast what do we negotiate sort of for the next year or whatever the time period is in order to you know sort of stay on plan here so for example and it's not always enough just to come in and look at sort of the
32:00 - 32:30 headline like that because these platforms are so powerful and they grow so quickly that you end up with a scenario like this where you know someone starts using Databricks in September you know suddenly it's you know four thousand dollars a month um you know in the next uh next month and potentially that's only going to grow uh as things scale so this I think becomes a useful tool to kind of you know flag where there might be some uh issues worth investigating now
32:30 - 33:00 I'll talk about a few of the things you can do to actually drive this down a little bit without sacrificing any any functionality and in fact while accelerating the success on these platforms so the first thing uh you know you can do on this front um again quite different from the Databricks side or just a different angle that that we found effective here um is around accounts and users so here we're basically auditing tracking all the uh all the users that we have here understanding what their roles are
33:00 - 33:30 uh understanding you know certain key traits like uh their default warehouse size uh which one that actually is you know do they have multi-factor authentication like are they actually properly set up things like that because I think one of the most common aspects and certainly Acceldata I will admit was was guilty of this uh in our initial adoption of Snowflake is you know sharing passwords uh sharing accounts reusing things like that really not having any insight into that and so one of the easiest things you can do um you know to kind of flag at least
33:30 - 34:00 understand where you are is flagging you know who's actually using this platform are they set up right and uh and do they have the roles they should have or do they have access over and above that the next tier on this is that what you'll often find is people whether experienced users or not will often spin up new resources uh like new warehouses for example and these may or may not have the controls that you really should have and they may not be as tight as they should be so whether it's things haven't been
34:00 - 34:30 suspended warehouses aren't set up for auto-suspend or they're a bit too lenient with that or whether it's there isn't a resource monitor there isn't a statement timeout or it's too high there's no measure on these things like this will often happen and this is how things sometimes slip between the cracks even in really tightly managed organizations when that management isn't automated because it just takes one warehouse and one query to really create some of these horror stories that we've heard you know where unbelievable amounts of money gets spent
34:30 - 35:00 you know and no no one no one ends up happy with that it's just sort of something that slipped through the cracks a little bit so this will flag that you'll also get an alert but this is always a place where we go in and basically audit okay does this have a resource monitor does this have a statement timeout that it should have uh and if not that's something we can easily apply the other aspect I'll highlight uh you know is really you know there's two sides of it one is things that are not being used so I think there's a general you know even if tables even if storage is not your um
35:00 - 35:30 not your most obvious headline item in many cases like there can be a silent sort of harm done here well it's not a harm but like a silent cost through um you know auto-clustering and even just keeping this data around so we'll flag users that aren't doing anything uh warehouses that aren't doing anything tables that aren't doing anything views that aren't doing anything especially by the way with some of the you know enhancements around query acceleration service and materialized views this this management of use also
35:30 - 36:00 becomes an interesting sort of constant question uh that people need to balance uh in order to really you know use this platform as effectively as they can okay the last bit you know that's kind of another free lunch type of thing is flagging people that are using drivers that are out of date so it's not the most you know glamorous thing but it is the type of thing that adds up so this driver is actually quite quite old by this point I think the latest is something like 3.13 or thereabouts um so this is actually missing quite a few updates
36:00 - 36:30 um you know security fixes functional you know new features as well as performance improvements so this is another you know kind of easy thing to do once but not having it be automated means people you know don't always pick it up it's not top of your list well in this case we're providing a list for you so it's much easier to do so let's dig in now into a couple of the biggest drivers really of where there can be issues so uh I'll talk primarily about two things
36:30 - 37:00 actually let's do three so queries you know warehouses and then tables so for queries you know this is what you're running right and we do a lot of breaking down of this uh across you know a number of dimensions so understanding where things are spending time where there's execution time where things are taking a long time to compile and so forth I think the more interesting part of this and the most interesting parts of this end up being around uh queries that you know have certain bad
37:00 - 37:30 traits to them so they're either running for a long time um they are doing scans you know across the entire table you know they're spilling to disk uh these are all issues and you know this is all interactive so let me get rid of these these extra Smalls like I don't think we care too much about the extra smalls so let's filter these out uh and apply that and get to some some queries that are you know spending a little more resources here so if we do this you know we're now uh let's get rid of the nulls too
37:30 - 38:00 um we're now going to see you know a little more um you know in terms of some significant queries that are running and what this does is give us the opportunity to basically identify you know which are running well which could run better and uh and basically where are we spending you know where are we spending time and what are we getting from that on the warehouse front the warehouses are really you know primarily what's going to be driving cost in many cases what we're flagging here with this
38:00 - 38:30 warehouse load view is basically where there's less load than the warehouse is sized for right so if you've got something and uh we've got this default warehouse so really sixty percent of the time you know this warehouse is not really at capacity you know it's over-provisioned in a sense so this is something where you know given that larger warehouses are more expensive than smaller warehouses it's something where you can really win by reducing at least considering reducing the size of
38:30 - 39:00 this and seeing what the results are similar here you know we're running these there's not a lot of load for much of the time here um and so potentially this could be this could be resized you'll also see you know in some cases we don't have any examples here but you will see cases where something is has reached High load and the warehouse is really um the warehouse is really saturated and you know assuming that's powering a valuable application assuming the data is good all of these things that they're the strong case you know empirical case
39:00 - 39:30 based on the data we have to consider really up-leveling that and resizing it and getting some benefit from that so this is just a tool that provides you know it takes us beyond you know in some ways like a finger in the wind or a little bit of guesswork in terms of understanding you know should the warehouse be larger or should it be smaller the last piece I'll call out here is around uh around tables so tables you know what we're really flagging here is what are people accessing so are these tables being used
39:30 - 40:00 we're giving you some stats around uh what these end up costing or sorry like how effective they actually are uh across these and then the other aspect that's you know fairly unique here is really looking at the quality of the tables themselves so on all these dimensions I mean let me go back even to the cost side on all of these dimensions um you know you'll notice this this view is always available right so there's a view to understand great we're querying
40:00 - 40:30 this table we're running against it it cost us this much you know here's what it is we could have made it X percent more efficient but we're also going to say hey is this data even any good so we're going to use this example around uh you know stations and weather and bike sharing in Chicago here just as an example but we're basically going to say hey you know what this is flagging is like this data is not necessarily good so is there enough data this is sort of volume reconciliation timeliness this is data drift this is data quality what we're seeing is is
40:30 - 41:00 this data isn't necessarily very good and we can do this for any of the tables that we have and going back to one of the principles we shared earlier you know Acceldata makes it very easy to actually apply this you know and cost-effective as well to apply this at a pretty large scale both in terms of number of tables number of things that you check and uh volume of data itself and so with that why don't we actually cross over and say okay let's say we've done what we can you know on kind of are these queries running well now let's get into is this data even any good
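For readers who want a feel for the categories mentioned here, below is a minimal pandas sketch of the kinds of checks described: volume, completeness, range, and a simple cardinality-drift signal. The column names follow the bike-share example, but the thresholds and comparison baseline are invented for illustration and are not the product's rules.

```python
# Minimal sketch of the check categories described: volume, completeness, range, and a
# crude cardinality-drift signal. Column names follow the bike-share example; thresholds
# are invented for illustration.
import pandas as pd

def run_basic_checks(current: pd.DataFrame, previous: pd.DataFrame) -> dict:
    results = {}

    # Volume: did we get at least half as many rows as the previous snapshot?
    results["volume_within_expectation"] = len(current) >= 0.5 * len(previous)

    # Completeness: the key identifier should never be null.
    results["station_id_not_null"] = current["station_id"].notna().all()

    # Range: bikes available should be a sane non-negative number.
    results["num_bikes_available_in_range"] = current["num_bikes_available"].between(0, 500).all()

    # Drift: a cardinality blow-up in a low-cardinality field usually signals corrupted
    # data rather than a genuine change in the world.
    prev_card = previous["station_status"].nunique()
    curr_card = current["station_status"].nunique()
    results["station_status_cardinality_stable"] = curr_card <= max(2 * prev_card, prev_card + 3)

    return results
```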
41:00 - 41:30 okay so I'll just cross over and what this is now showing is basically you know one of the tables underneath this uh so this is this actually happens to be in a snowflake Warehouse uh but it could equally be a file or things like that although you wouldn't see it on the other side uh if it was a file so what we're showing here is basically you know a couple of the ways that this can get broken down and how how we profile that um so just give a little bit of a sense of of what we mean by some of these
41:30 - 42:00 terms they're sometimes used differently data quality is really these deterministic checks that we look at right like are there nulls uh are things within the range that they should be are the values that we expect showing up or are we seeing erroneous values that don't make any sense reconciliation can be you know are we seeing exactly the same rows or are we seeing the right number of rows uh we also look at you know data drift schema drift um and sort of these these sort of checks to understand you know this is a
42:00 - 42:30 standard type of thing but something where you're able to understand you know are there any statistical differences between the data we're seeing now and the data we've seen earlier sometimes that's something that's relevant for a machine learning model that's sensitive to the distributions of data being fed into it and will give bad predictions if it hasn't been trained in that area uh sometimes it's actually just a great way to catch data quality errors right if you see that the number of uh distinct values for something the cardinality of a string field has blown up immensely you know that's probably
42:30 - 43:00 not a genuine shift in the world that's probably a corruption of data and this isolates that for you very quickly um great so with this let me just highlight a couple of these like we uh we just applied these there's a lot of sophistication around uh how these policies get managed how they can get authored I'll try to show a little of that in here um but let me just show one element of this that's a little bit cool uh especially as you get into quite complex scenarios so what we're going to
43:00 - 43:30 do here is this was kind of basic right like we're looking at these these attributes um we're looking at are they kind of covering their Dimensions what we can also do is actually split this up by segment and by complex segment at that so what we're going to do here is basically split by a couple things the station status and the station type you know apply these to your situation it could really be anything across here and we're going to do is take every rule that we did every check that we did and basically put this in to break this down by segment and flag on okay this was
43:30 - 44:00 high quality this is not so where this ends up being extremely useful is when you have something like devices or you have something like data providers where you're aggregating things from you very easily now become able to say oh great well the aggregate value you know this is 90 percent quality okay that's okay but this provider is responsible for all of that right all of their data was terrible and I can tell them or I can tell anyone else who needs to know exactly where it failed which rows actually even failed uh if we go through this which rows were good what
44:00 - 44:30 the traits of that are so there's a lot of diagnostics beneath this where you're able to really spotlight instantly uh what these are so this is a simple example but I think this is one of the more powerful features where um you can start to segment and intersect this as well so right so you can pick multiple dimensions that you want to segment on uh there's a lot that can happen here and it provides a great way to really isolate where the problems are and where you need to go to fix them
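To show why the segmentation described here is so useful, below is a small pandas sketch that evaluates one rule per segment, so a single bad provider or station type cannot hide inside an acceptable aggregate score. The rule and column names are illustrative, not the product's.

```python
# Sketch of segmented data quality: evaluate the same rule per segment so an aggregate
# pass rate can't hide one segment that is entirely bad. Rule and columns are illustrative.
import pandas as pd

def failure_rate_by_segment(df: pd.DataFrame) -> pd.DataFrame:
    # The rule: num_bikes_available must be present and within a sane range.
    failed = df["num_bikes_available"].isna() | ~df["num_bikes_available"].between(0, 500)
    return (
        df.assign(rule_failed=failed)
          .groupby(["station_type", "station_status"])["rule_failed"]
          .mean()
          .rename("failure_rate")
          .sort_values(ascending=False)
          .reset_index()
    )

# An aggregate 10% failure rate might look tolerable, but this view can reveal that one
# station_type accounts for nearly all of it, which is exactly where to go fix things.
```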
44:30 - 45:00 The other piece I'll highlight here: we talked a little bit about drift, and there's a lot you can author there. What I want to show is a highlight of the more complex end of the spectrum. We've covered a lot of the simple checks, which will be familiar to people; what I want to talk about a little more is something like this. I'm going to take one of these templates and give you a sense of some of the things you can
45:00 - 45:30 do in this platform. In Python, Java, JavaScript, or Scala, you're able to write your own expression, validate it, and then save and publish it back to your organization. In this case, we're hashing two strings and comparing whether they're different or not.
45:30 - 46:00 In many cases, as I said earlier, you'll see teams that have literally hundreds of these. They know what good data looks like for them; they don't want someone to approximate it for them, they want to express it exactly and directly. This is something that allows you to do that, to express and share it, and to make it easy to author. By the way, one of the great things about this is that it also executes in pretty much the same way: all that segmentation I
46:00 - 46:30 showed, all that analysis, all that Spark layer, that's where this is being executed. So you can layer your own checks, which you might have been running for years at this point, on top of some of the more novel insights around segmentation and lineage and things like that. There's a lot we could go through on this, but what I want to do now is zoom back out.
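To make the custom-expression idea concrete, here is a minimal Python sketch of the kind of check described (hash two string fields and compare). The function name, signature, and column names are hypothetical; in the product, an expression like this would be authored from a template and published for reuse.

```python
import hashlib
import pandas as pd

def hashes_match(row: pd.Series, left_col: str = "source_key",
                 right_col: str = "target_key") -> bool:
    """Custom check: hash two string fields and compare the digests.

    Column names are hypothetical; the real expression would be written
    against whatever fields your team cares about.
    """
    left = hashlib.sha256(str(row[left_col]).encode("utf-8")).hexdigest()
    right = hashlib.sha256(str(row[right_col]).encode("utf-8")).hexdigest()
    return left == right

# Applied row by row, the check yields a boolean column that the same
# segmentation and diagnostics shown above can be layered on top of:
# df["keys_consistent"] = df.apply(hashes_match, axis=1)
```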
46:30 - 47:00 Take another look at what we're looking at. This is the prod table, station status. The thing about this table is that the data didn't fall out of the air: if we look at the profile, it's drawing on quite a bit of information, some of it about weather, some of it about bikes. So let's zoom back out to the pipeline itself. The node we were just looking at is this one, and it's actually downstream of many other transformations that might come from
47:00 - 47:30 other tools and other platforms. Our premise, and this goes back to our shift-left tenet around data reliability, is that we want to shift our analysis back over to the left and say: let's go all the way back and instrument this pipeline first. This is an Airflow pipeline running in Composer; you can see we're running it every three hours or so, and have been for quite some time, 380 executions now. What this
47:30 - 48:00 is doing is that every step along the way here also has its own data quality policies, in many cases. At any moment along this, we can eject, we can say stop. We don't do that automatically; it's something we provide guides for, and there are ways to do it very easily, but we think pipelines don't necessarily need more help breaking than they already have, so we make it
48:00 - 48:30 optional. What you can do is define these policies centrally; you can import them and shift them between assets. The key point I'm trying to convey is that you define the policies centrally, so the data engineers out there actually authoring these pipelines don't need to maintain the policies in code every time, and on the other hand you're not dependent on them getting it right every time. Instead, you define those quality policies centrally, and the
48:30 - 49:00 person authoring the pipeline has the tools to break the pipeline, to eject, if we're putting data through that's no good. Acceldata can also help quarantine those rows and route only the good ones through; there are other extensions on top of this. But by shifting this all the way over here, we're able to kill this pipeline early. And instead of scanning all these runs, if we go back to some of the other ones, we can see how long they took and break them down:
49:00 - 49:30 we can find which nodes processed on time, which passed, which failed, which were slowest, things like that. There's a lot of analysis that comes on top of this, and it's a pretty active area of work for us, so you'll see quite a bit more to come. But this essentially gives you a whole-pipeline view of reliability, even on files, even outside of warehouses, and by stopping things up here you really get a number of benefits.
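Here is a minimal sketch of the circuit-breaker pattern just described, assuming a plain Airflow 2.x TaskFlow DAG and a hypothetical `evaluate_policies()` helper standing in for whatever evaluates your centrally defined quality policies. It is not Acceldata's integration, just the shape of "break the pipeline when the data is no good."

```python
from datetime import datetime

from airflow.decorators import dag, task

def evaluate_policies(table: str) -> dict:
    """Hypothetical stand-in: fetch results of the centrally defined quality
    policies for a table (in practice this would call your quality tooling)."""
    return {"table": table, "pass_rate": 0.99, "failed_rows": 12}

@dag(schedule="0 */3 * * *", start_date=datetime(2024, 1, 1), catchup=False)
def station_status_pipeline():

    @task
    def quality_gate(table: str = "prod.station_status") -> None:
        result = evaluate_policies(table)
        # Optional circuit breaker: failing this task stops the downstream
        # transformation, which is the "eject" behaviour described above.
        if result["pass_rate"] < 0.95:
            raise ValueError(f"Quality gate failed for {table}: {result}")

    @task
    def transform() -> None:
        ...  # downstream transformation runs only if the gate passed

    quality_gate() >> transform()

station_status_pipeline()
```

Because the gate is a separate task fed by centrally defined policies, the pipeline author chooses whether to wire it in, matching the "optional, not automatic" stance in the demo.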
49:30 - 50:00 All right, there are two other bits I want to highlight before we pause for questions and wrap up. The first is that we really do treat this as an operational platform at one level; one half of it is really operational. As an indication of that, I wanted to give you a view into our incidents and overview command center. What we're trying to do is give you this view across platforms; we're looking across Redshift and other platforms as well, but here across
50:00 - 50:30 Databricks and Snowflake, we're flagging when incidents are being opened and when they're being cleared. So both an individual data engineer responsible for a given set of pipelines, and people at different levels of an organization looking at an entire portfolio of hundreds or tens of thousands of pipelines and data assets, are able to get a sense of how they're faring across the board and what that looks like. Part of our vision and our premise as a company is that data is an essential,
50:30 - 51:00 mission-critical asset for more and more organizations, and so the operational insights and operational rigor established around software applications over the last 10 to 20 years are coming to data as well; we're very much fans of bringing that forward. The last bit I'll highlight before we pause for questions is the other, macro-level view: the strategic analysis of how we're faring on our data
51:00 - 51:30 coverage as we go. What we're looking at here is a breakdown of all the data assets we've analyzed: how fresh that analysis is and how fully those assets are covered. Does this data set have all the tests it should have? Is it fully covered for data reliability? We know if data is late, we know if data is missing, we know whether it's passing all of our custom checks; that's on this axis, versus how much it's being used. So
51:30 - 52:00 if you've got assets down here that essentially are not being queried or used by anyone, those are ones you might consider pruning if they're generating cost and no value. The real hot spot, though in this demo we don't have anything there, is up here: things that are being used a lot. If you imagine thousands and thousands of assets, there are going to be some that are used quite a lot and are also low reliability, whether that means they're not even being tested for reliability, or they are
52:00 - 52:30 being tested and consistently failing. This is where those get flagged, and it really gives you a great starting point, from a data reliability focus, to say: where can I invest my time and resources in order to uplift my organization to where we really want it to be?
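A minimal sketch of the usage-versus-reliability view just described, assuming you already have per-asset usage counts and test pass rates in a DataFrame; the column names and thresholds are hypothetical.

```python
import pandas as pd

def classify_assets(assets: pd.DataFrame,
                    min_usage: int = 100,
                    min_reliability: float = 0.95) -> pd.DataFrame:
    """Bucket assets by usage vs. reliability coverage.

    Expects hypothetical columns: asset, queries_30d, pass_rate
    (NaN if the asset has no reliability checks at all).
    """
    out = assets.copy()
    heavily_used = out["queries_30d"] >= min_usage
    unreliable = out["pass_rate"].isna() | (out["pass_rate"] < min_reliability)

    out["bucket"] = "ok"
    # Hot spot: heavily used but untested or consistently failing.
    out.loc[heavily_used & unreliable, "bucket"] = "hot_spot"
    # Pruning candidates: unused assets that still generate cost.
    out.loc[out["queries_30d"] == 0, "bucket"] = "prune_candidate"
    return out.sort_values(["bucket", "queries_30d"], ascending=[True, False])
```

The `hot_spot` bucket is the "invest here first" list the speaker refers to; the `prune_candidate` bucket is the cost-without-value corner.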
52:30 - 53:00 So with that, that's a quick tour through the key highlights of the platform. It's a very rich platform, there's a lot of depth in here, so I'm happy to take questions, and would really encourage people to get hands-on, whether at a workshop or by signing up for a trial. Great, thanks so much, Tristan. We do have a few questions, so I'll read them out loud so everyone on the call can benefit from the answers. One of them is: what is the implementation of this like? I've already invested in Snowflake and Databricks, so what would it take to implement Acceldata to support both of those applications you've demoed? Yes, it's quite simple, actually.
53:00 - 53:30 This ends up being about a 15-minute experience for most customers. Basically, we go in, add a data source, pick what we want to do, and put in the credentials. If we wanted to do, say, Snowflake, we'd come in and turn on the types of analysis we want. We can install a data plane on Azure, AWS, or GCP; that's what's going to analyze your data for reliability and emit the key metrics back to our hub, and
53:30 - 54:00 it's as simple as that. We put a lot of work into automating this; it should not be hard to get set up. We think pretty quickly people are going to be gathering insights into where their data is good and where they're spending resources that maybe they should or shouldn't be. It's not even a full-day process to get that information; within an hour you'll see something.
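For a sense of what "add a data source and put in the credentials" typically involves, here is a hypothetical sketch of the connection details a Snowflake source usually needs. The keys, the role and warehouse names, and the `analysis` field are illustrative assumptions, not Acceldata's actual configuration schema.

```python
import os

# Hypothetical connection details for registering a Snowflake data source.
# A read-only role scoped to metadata and the schemas you want profiled is
# usually sufficient; the exact fields depend on the product's setup flow.
snowflake_source = {
    "name": "prod-snowflake",
    "account": os.environ["SNOWFLAKE_ACCOUNT"],   # e.g. "xy12345.us-east-1"
    "user": os.environ["SNOWFLAKE_USER"],
    "password": os.environ["SNOWFLAKE_PASSWORD"],
    "warehouse": "OBSERVABILITY_WH",              # small, dedicated warehouse
    "role": "OBSERVABILITY_RO",                   # read-only role
    "analysis": ["compute_insights", "data_reliability"],  # features to enable
}
```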
54:00 - 54:30 Great. The next question is: can you give us a high-level comparison of the cost insights Acceldata provides compared to what Snowflake provides? We're part of both the Snowflake and Databricks partner programs, so we have strong official relationships with both of those companies, but Tristan, if you could give a sense of what we're adding over what those platforms offer in that particular area, that'd be great. Yeah, for sure. I think the first
54:30 - 55:00 thing, as you mentioned, is that this is something Snowflake is quite thoughtful about in terms of consistently trying to drive value for their customers. We're partners with them, and they're supportive of others offering this, because they believe in the strength of the platform they offer, as we all do, and trust that if people see what they're spending on it and the value they're getting from it, they'll see it's a great deal and want to use more of it. So that's one aspect at a macro level.
55:00 - 55:30 You can also get into some more detailed pieces here, so let me go over to the compute side and show a couple of these. The first part of the question to answer is what we mean by insights. A lot of this data is available in Snowflake if you go to the right metadata tables, grab it, pull it, analyze it, and slice it. What's not necessarily there, and
55:30 - 56:00 I think is something that's coming as well, is operational insights and alerting: having a full-fledged alerting framework, configured out of the box or tailored, that's going to flag these issues and help identify resolutions. That's a little different; it's not different information, it's a different delivery of that information, in a way that allows faster response. So I think that's one aspect this sort of platform offers.
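As an example of "the right metadata tables," here is a minimal sketch of pulling warehouse credit usage from Snowflake's documented SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view with the official Python connector. The connection parameters are placeholders, and note that running this kind of query yourself consumes warehouse credits, which is part of the trade-off discussed next.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder connection parameters; in practice these come from a secrets
# manager or environment variables.
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
)

# Credits consumed per warehouse over the last 30 days, pulled from the
# documented ACCOUNT_USAGE metadata view.
CREDITS_BY_WAREHOUSE = """
    SELECT warehouse_name,
           SUM(credits_used) AS credits_30d
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
    ORDER BY credits_30d DESC
"""

cur = conn.cursor()
try:
    for warehouse, credits in cur.execute(CREDITS_BY_WAREHOUSE):
        print(f"{warehouse}: {credits:.1f} credits")
finally:
    cur.close()
    conn.close()
```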
56:00 - 56:30 There's another aspect, which is a delivery of information that maybe isn't obvious until you try it, and this is one reason I'd encourage people to try the product with their own data: the speed of the analytics on this. Not to get too much into architecture, but we've made a different architectural choice here than some have, which is to index a lot of that information for rapid querying. So as I go through here, why don't we jack this up to three months, looking at a significant
56:30 - 57:00 amount of information. I'm going to run this; we can slice and dice it, go across these tables, filter into warehouses, and this is all quite fast, and all using zero Snowflake credits. This querying is done on our indexed analytical database, which we've set up separately from Snowflake. What that means is you're able to get a lot deeper analysis; all the forecasting we do also runs there, not in Snowflake. So there's a bit of a trade-off: I can get this
57:00 - 57:30 information, interact with it, and find the root cause of an issue faster, because I'm able to have a truly interactive experience without the concern that by analyzing Snowflake I'm also racking up my Snowflake costs, that sort of dilemma. The third aspect I'd call out, which I think is quite essential and encourage people to try with their data, is the data quality insight. Snowflake has a lot of great partnerships,
57:30 - 58:00 including with us, but what they don't have in the UIs there is context on the data quality itself. You can eyeball it, if you look at given result sets, but to genuinely be able to look in and say: okay, I'm spending time on this; by the way, what is this query, what is this table, what are we doing with it, is the data good, where did this data come from? That part is not present in Snowflake's
58:00 - 58:30 experience and isn't really part of the goals there, I think, and it very much is a part of what we offer with our perspective on data reliability. So those are a few dimensions where I think there are differences. The other macro difference I'll call out, zooming back out, is for organizations that have more than Snowflake: that also have Databricks, many other cloud data platforms, and some of the cloud service providers'
58:30 - 59:00 own native systems. Our ability to give you perspective on Snowflake and Databricks today, expanding quite significantly to many other platforms in the future, also becomes a key trait in accomplishing these goals around budgeting and forecasting. Should I run something on Snowflake or on Databricks? Databricks says run it on Databricks, Snowflake says run it on Snowflake, GCP or whichever vendor says run it there. Having some facts around that, and a neutral place to look at these side by side, I think is a useful trait as well.
59:00 - 59:30 Great, and I think you actually preempted the next question a little bit, which is: obviously we're showcasing Snowflake and Databricks here, but what other kinds of technologies do we support? You touched on that a little, but maybe you can go into it a bit more. Yeah, sure. This is the key thing, because people see this and realize there's a lot you can do with it. We really look at it
59:30 - 60:00 in a couple of directions: one is breadth of clouds, and the other is breadth of capabilities. When we look at breadth of clouds, we've started with the two more prominent multi-cloud offerings, and we've also put in work so that this works with data on all sorts of clouds: this works on Snowflake on GCP, data reliability works on GCP, data reliability works on Azure. We try
60:00 - 60:30 to keep this matrix simple, so people aren't wondering, well, what about this over here and that over there? The second piece is going around to a lot of these cloud providers and saying: okay, Redshift, we connect and do data reliability on Redshift; we do data reliability on Athena; there's more on Azure as we set those up; we'll have BigQuery and GCS coming. So what you're likely to see is a lot of coverage filling out in that direction. Then the other direction you'll end up seeing, and something
60:30 - 61:00 where our customers have guided us as well, is really thinking about this pipeline view. If you zoom these pipelines back out, a lot of the time there's a streaming platform way up at the front. If we go back to our stylized diagram of the architecture and where Acceldata plugs in, a lot of the time those upstream systems are streaming systems, whether that's raw Kafka, managed Kafka, or one of the alternative stream data processing systems emerging right now.
61:00 - 61:30 There's a lot up there and a lot of complexity in there. So in the short term you're likely to see a lot of the cloud data platforms get fleshed out; of course there's always more work on Databricks and Snowflake, since those platforms are innovating so fast and we very much try to keep up with that. But I think you'll start to see this expand into even more diverse types of processing as we go. Great. All right, I'm going to be cognizant of everyone's time since it's after 11, and I think those are all the questions we have, so we definitely
61:30 - 62:00 appreciate your time. As Tristan mentioned, we will be offering workshops, so we'll be sending an email out to let you know about a data observability workshop, either in your area or online; be on the lookout for that. Again, we appreciate your time; I know it's super valuable. If you have any questions, feel free to reach out to me, Loretta, at acceldata.io. Have a great day, and again, thanks so much for giving us your time. Bye-bye.