Unlock the Power of Visuals in Big Data
CS5229 - Lecture 10 - Big Data Visualisation (2023)
Summary
In Lecture 10 of CS5229, Nisansa de Silva delves into the art of Big Data visualization, emphasizing its critical role in making sense of vast datasets. Throughout the lecture, listeners are exposed to different techniques and methods to enhance data comprehension and communication. Visualization not only aids in pattern recognition and insight discovery but also bridges the understanding gap between raw data and human interpretation. From understanding the necessity of simplifying complex information to the ethical considerations of visualization, this lecture encapsulates the essence of why and how big data needs visualization for effective communication and decision-making.
Highlights
- Nisansa de Silva introduces the lecture by discussing the immense challenges posed by big data visualization and the importance of simplifying complex datasets into digestible visuals.
- The lecture covers techniques for extracting the relevant dimensions of a dataset, including PCA, neural networks, and dimensionality reduction, which support effective data examination and interpretation.
- Participants learn the significance of aligning visualization strategies with audience needs and data types to maximize impact.
- There's a focus on ensuring visualizations are not cluttered and maintain a good data-to-ink ratio, enhancing readability and comprehension.
- Real-world examples such as social network visualizations and geographical data representation underscore the principles discussed.
- Interactive visualization techniques enabling users to explore and manipulate data are highlighted as a critical tool in modern data analysis.
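The PCA step mentioned in the highlights can be sketched in a few lines. This is a minimal illustration, not the lecturer's code: it uses only the standard library, finds just the first principal component by power iteration (a stand-in for a full eigendecomposition), and runs on hypothetical toy data in which the second column is twice the first and the third column is small noise.

```python
import math

def center(rows):
    """Subtract each column's mean so the data is centred on the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def first_component(cov, iterations=100):
    """Power iteration: repeatedly apply cov to a vector and renormalise.
    The vector converges to the direction of largest variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Reduce each row to a single coordinate along the component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]

# Hypothetical toy data: three dimensions, but almost all variance
# lies along the direction (1, 2, 0).
points = [[t, 2.0 * t, 0.1 * ((7 * t) % 3)] for t in range(10)]
centered = center(points)
component = first_component(covariance(centered))
scores = project(centered, component)  # one number per point, ready to plot
```

Here three dimensions collapse to one coordinate per point, which is the kind of "selecting a subset of the dimensions" the lecture describes before anything is drawn.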
Key Takeaways
- Visualization is crucial to understanding big data and helps in identifying patterns that aren't evident in raw data.
- There are two primary goals in data visualization: to present and communicate or to discover and explore.
- Creating effective visualizations requires understanding your audience and the type of data you're dealing with.
- Interactive and engaging visuals are more effective in conveying information and helping users explore data.
- Ethical considerations and conventions are important when designing data visualizations.
- Poorly executed data visualizations can mislead the audience, hence accuracy in visual representation is essential.
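The last takeaway can be made measurable with the lie factor defined in the lecture: the size of the effect shown in the graphic divided by the size of the effect in the data. A value around 1 is honest, above 1 is exaggerated, below 1 is understated. The sketch below assumes "size of effect" means relative change, and it uses the figures from Tufte's published fuel economy example (18 and 27.5 miles per gallon drawn as lines of 0.6 and 5.3 inches). The line lengths are an assumption taken from that published example; the transcript's figure for the shorter line reads differently and may be a transcription error.

```python
def relative_change(first, second):
    """Size of an effect, expressed as change relative to the starting value."""
    return (second - first) / first

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by effect present in the data."""
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    return shown / actual

# Assumed values from Tufte's fuel economy example (see the note above).
factor = lie_factor(graphic_first=0.6, graphic_second=5.3,
                    data_first=18.0, data_second=27.5)
print(f"lie factor: {factor:.1f}")
```

A factor this far above 1 means the drawing exaggerates the change in the data many times over.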
Overview
The lecture begins with an introduction to why big data visualization is essential in understanding vast datasets. Nisansa de Silva explains that effective visualization can simplify the data complexity, making it accessible and insightful for exploring patterns and gaining actionable insights. This approach helps illustrate the transformation of data from a raw state to a structured one that humans can easily interpret and use to make informed decisions.
A deep dive into the importance of creating visualizations that cater to different audiences and objectives forms a core part of the lecture. Emphasizing visual design techniques, the lecture covers concepts like aggregation, pattern recognition, and the ethical aspects of data representation. It mentions that good visualizations not only communicate information efficiently but also adhere to conventions and reduce cognitive load on the viewer.
The session also explores various real-world visualization examples that capture dynamic data interactions and trends. From social networks to geographical mappings, the lecture showcases how visual tools allow users to not only see data differently but to interact with it for deeper understanding. Techniques to enhance user interaction, such as zooming, filtering, and connecting data points, are discussed as pivotal in transforming data analysis into a more fruitful endeavor.
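The pipeline the lecture walks through (raw data, relevant dimensions, filtering, aggregation, shapes, scales, rendering) can be sketched end to end. This is a minimal illustration under stated assumptions: the records, field names and the text-based bar renderer are all hypothetical, and a real system would render to a screen or printout instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records; "clerk_id" is a dimension we do not want to show.
RAW = [
    {"region": "north", "year": 2022, "sales": 80, "clerk_id": 7},
    {"region": "north", "year": 2023, "sales": 100, "clerk_id": 7},
    {"region": "north", "year": 2023, "sales": 140, "clerk_id": 9},
    {"region": "south", "year": 2023, "sales": 60, "clerk_id": 3},
    {"region": "south", "year": 2023, "sales": 60, "clerk_id": 4},
]

def select_dimensions(rows, keys):
    """Keep only the dimensions relevant to the question being asked."""
    return [{k: row[k] for k in keys} for row in rows]

def filter_rows(rows, predicate):
    """Drop records outside the slice of interest."""
    return [row for row in rows if predicate(row)]

def aggregate(rows, group_key, value_key):
    """Summarise each group by its mean (could equally be max, sum, ...)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}

def scale(values, max_width=40):
    """Map data values to bar lengths, measured from zero so that the
    lengths stay proportional to the data."""
    top = max(values.values())
    return {k: round(v / top * max_width) for k, v in values.items()}

def render(bars):
    """Create the shapes (here: text bars) and put them where a person can see them."""
    return "\n".join(f"{label:<6}|{'#' * width}" for label, width in bars.items())

rows = select_dimensions(RAW, ["region", "year", "sales"])
rows = filter_rows(rows, lambda r: r["year"] == 2023)
bars = scale(aggregate(rows, "region", "sales"))
chart = render(bars)
print(chart)
```

Each function corresponds to one pipeline stage, so swapping the aggregation or the renderer does not disturb the other steps.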
Chapters
- 00:00 - 00:30: Introduction and Overview In the 'Introduction and Overview' chapter, the focus is on Big Data analysis technologies. The chapter provides information about assignments and announces a video resource, which will be shared at the lecture's end. The video serves as an additional learning tool, not created by the lecturer.
- 00:30 - 02:30: Big Data Visualization Importance The chapter discusses the importance of Big Data visualization by mentioning a person who established visualization at Uber and then founded their own visualization company, which was subsequently acquired. Before delving deeper into Big Data visualization, it suggests the need to consider foundational aspects.
- 02:30 - 04:00: Visualization Techniques The chapter 'Visualization Techniques' begins by addressing the importance of visualization in data interpretation. It notes the variety of data types and the need for effective visualization techniques to translate data and statistics into comprehensible forms for human analysis and understanding.
- 05:00 - 08:00: Pipeline of Visualization The chapter titled 'Pipeline of Visualization' emphasizes the importance of visualizations in understanding data patterns and behaviors. It suggests that visualizations can often provide clearer insights than raw data or purely statistical methods. The chapter underscores that the main objective of data visualization is to convey data effectively and intuitively.
- 09:00 - 18:00: Analytical Models and Predictive Analysis The chapter discusses the significance of visualization in the context of Big Data. It emphasizes the challenges posed by big data due to its vast and complex nature, making it difficult for individuals or even a single computer to process and understand without advanced analytical models and predictive analysis tools.
- 23:00 - 27:00: Data Exploration and Presentation The chapter titled 'Data Exploration and Presentation' emphasizes the necessity of creating visualizations to understand complex data. It asserts that humans are unable to easily comprehend vast amounts of raw data, making visualization a key step. The process involves distilling the data to identify patterns and behaviors, which are then effectively communicated through visual means.
- 30:00 - 35:00: Visualization Principles In the chapter titled 'Visualization Principles', the focus is on the nuances and considerations when visualizing data. The key takeaway is that visualization involves either projecting, extracting, or selecting parts of big data to highlight the patterns within, rather than merely presenting the raw data itself. This approach ensures that the visualization communicates meaningful information effectively.
- 45:00 - 60:00: Constraints in Big Data Visualization The chapter discusses the multiple reasons why humans are adept at using visualization tools, highlighting the importance of pattern detection, context remembrance, and intuition. Visualization serves dual purposes: presenting information and effectively communicating it.
- 67:00 - 78:00: Examples of Visualization This chapter discusses the two primary purposes of data visualization: presentation and exploration. The chapter elaborates on scenarios where data is presented to clients or peers, typically after finalizing the data. It also highlights the importance of visualization during the data exploration process, which involves understanding the meaning and implications of the data.
CS5229 - Lecture 10 - Big Data Visualisation (2023) Transcription
- 00:00 - 00:30 welcome to Big Data Analysis Technologies lecture 10 I will tell you about the assignment hopefully live at the lecture time at the end of this lecture I will give you a link to a video a separate video that is not done by me to watch it's
- 00:30 - 01:00 from the person who established visualization in Uber and who went on to create her own visualization company before it was bought over so let's talk about big data visualization now before getting into big data visualization we need to look at
- 01:00 - 01:30 why why should we do visualization so we have various ways of indicating various types of data so we have the raw data then we have statistics from the same data and then we have the visualization now as a human that would be looking at
- 01:30 - 02:00 this data stats and visualization most of you would be able to understand the patterns and the behavior in this particular data set from this visualization rather than from either the data or the stats so this is the idea this is the whole point of data
- 02:00 - 02:30 visualization and especially in the case of Big Data this is important because on one hand it is entirely impossible for a human being to just look at that amount of data because big data by definition is a whole lot of data it is by definition intractable even by a single computer so
- 02:30 - 03:00 it is impossible to ask a human being to look at that data and get an overall clear idea so it is important to create visualizations after dialing down or condensing or summarizing what the patterns and behaviors of that data are so that is important so when we are visualizing
- 03:00 - 03:30 something of big data we are visualizing a projection or an extraction or a selected set of that data or the patterns that the data encapsulates we will look at each of these parts but keep that in mind there is usually no point in just showing the data as they are now
- 03:30 - 04:00 also you do need to remember the reasons why these things are good for humans so there are reasons such as pattern detection remembering context they are very good for intuition and they can be used for predicting all right so visualization can be done for two different reasons one is to present something and communicate
- 04:00 - 04:30 and the other one is to discover and explore so if you have finalized what you have and you are just showing it to your client or your peer so it doesn't matter who you are but if you are doing it in that scenario you'd be doing it as a present or communicate but if you are in the middle of exploring your data as in you are in the middle of understanding what your data means you
- 04:30 - 05:00 will have an explore or discover type visualization where you try to visualize it to better understand what really your data means so then you will go with this type of data so you can see that these look more technical while these look flashy okay so visualization happens in a pipeline
- 05:00 - 05:30 you have your raw data from raw data you would be extracting the relevant dimensions and to extract relevant dimensions you can either hand pick them or you can do something like PCA or you can take a decision that would lie in between or actually you can run a neural network and do some dimensionality reduction to get the
- 05:30 - 06:00 relevant dimensions whatever you do you would most probably be selecting a subset of the dimensions that you started with next you move on to applying or bucketing these dimensions as to what you want to show so at most you have three dimensions right most of your visualizations would be two-dimensional but there is the
- 06:00 - 06:30 possibility of making things 3D there are ups and downs of making something 3D sometimes it makes things more cluttered but on the other hand sometimes it makes things easier to understand so we'll talk about these questions later as well but keep in mind that these things can change from problem to problem or again as I said from implementation to implementation so from then you do a
- 06:30 - 07:00 filtering and after the filtering you might need to do an aggregation so this is where you'd be taking these averages boxes weapons means maxes and so on and so forth to extract statistics or patterns that you need to visualize then you need to create shapes and create shapes means are you going to do it as a
- 07:00 - 07:30 pie chart are you doing it as a bar graph are you going to use a line graph is it going to be a scatter plot whatever it is you are going to create the shapes that are relevant to whatever the data points or classes or the units that you have decided on next you have to assign the scales to the shapes you need to
- 07:30 - 08:00 scale them according to the numbers that you have so let's say if something is big in data it should be within we will get to that when we are talking about the lie factor and so forth next you need to render it to the screen or print it out so you can see there can be an ambiguity there but it is not very important as long as you are
- 08:00 - 08:30 involving a method that a person can see so you can render it to the screen you can print it out you can project it to a wall you can communicate it to a VR headset it doesn't matter the idea here is that you put it in a format put it in a manner that the person can just look at it that is the whole point of visualizations all right so then you have the option to
- 08:30 - 09:00 select various visualizations and when doing this you need to look at your data where it is coming from what type of data do you have and what is your objective what are you trying to communicate what type of idea are you trying to pass to your audience and as we are talking about audience the last point to
- 09:00 - 09:30 look at is the audience what type of an audience do you have as the image shows is your audience very technical people or are they just lay people that do not have much knowledge or interest in technical data so you need to decide on these main three ideas as to what type
- 09:30 - 10:00 of a visualization that you wish to do all right now with this in mind let us look at analytical models so analytical models are what we use to process whatever data that we have so we have the difficulty scale here and the value scale here so you have the
- 10:00 - 10:30 information side and the optimization side so if you are on this side you have more information and if you are on this side you have more optimization so neither of them is better than the other it is just how the two things can be handy right so in this end we will be talking about
- 10:30 - 11:00 questions such as what happened why did it happen so slowly we are rising up with this arrow what will happen so here we are in the predictive category especially important with big data it is a selling point for big data to ask what will happen and then we have the question of how can
- 11:00 - 11:30 we make it happen so if there is a desirable result how can we make sure that this desirable outcome comes to pass so again here we have hindsight here we have insight and here we have foresight so as you can see it depends on
- 11:30 - 12:00 what type of analysis that we are doing so if we are doing if you are answering questions on what happened you will be doing descriptive analysis if you are on why did it happen you will be at diagnostic analysis so if you are at what will happen you will be at predictive analysis and
- 12:00 - 12:30 finally if you are trying to do how can we make it happen it is prescriptive analysis so you are describing what happened you are trying to diagnose what had happened you are trying to predict what will happen and here you are going to give people instruction on how to make something happen so depending on these analyses
- 12:30 - 13:00 the visualization that you pick the manner in which you are going to present the data changes now so let's look at descriptive analysis what is it that you should do for this if you are reporting or you are doing it online and if you are showing it on dashboards
- 13:00 - 13:30 the easiest way is to have an updating table or working pie charts or changing graphs so here you are looking at what are the data that we have when you are trying to describe it so that people can look at it and say okay now I understand what your business process is or what your data is what type of
- 13:30 - 14:00 problems you're solving and so forth these are the core of traditional business intelligence so before big data and before all of this happened this is the type of analytics that was mostly used in businesses so they would collect the data most probably in the manual way and some manual analysts would run through this data and do this digestion this analytics and create the charts and
- 14:00 - 14:30 reports for the executives they would sit them down and show the charts and also then submit the reports and that is how things used to happen so it mainly talks about what has happened what occurred so then the humans the executives would look at this and think on their own
- 14:30 - 15:00 about what they can do what actions can be taken and then do the appropriate changes which would change the data and then the data analysts can come back after a month or whatever and show the new descriptive charts and graphs and so forth so there's nothing wrong with that it is just one way of doing visualization that is one way one use of visualization
- 15:00 - 15:30 all right and then we have the predictive analysis where the analysis tries to predict what would happen depending on the data that you have so regression machine learning neural networks you train with the current and historical data and try to get the system to predict what would happen next so if you are doing
- 15:30 - 16:00 something like stock trading with massive amounts of data this is what you will be looking at because if you are trying to do stock trading it is always better to know what to do or know how the market would behave rather than just sitting there looking at okay we should have bought something at that point and so on we should have sold our stuff before the power packs happen and so on and so forth right
- 16:00 - 16:30 so the question is what will happen and marketing is a usual target for many predictive analyses as in what would the market do what are the products that the people would like to have in the upcoming months and so forth so that type of visualization and analysis yeah so you would do
- 16:30 - 17:00 descriptive type visualization all the way here to do the predictive and prescriptive analysis so we are looking at now prescriptive it's prescriptive analysis to tell the humans what to do basically so it is rather advanced
- 17:00 - 17:30 helps you allocate scarce resources so it can be capital it can be individuals it can be any other resource instead of just saying what has happened and what will happen this type of analysis has a goal in mind and knows what has happened and what might happen and helps you
- 17:30 - 18:00 achieve that goal from what it is going to prescribe to you so prescribe like how a doctor would prescribe medicine so it is in the domain of optimization and on the question what should happen so strategic planning on let's say Healthcare
- 18:00 - 18:30 so you can look at economic data population demographic trends health trends to plan for investment as in where should we have hospitals where would we have let's say an ambulance bay and so forth so that's just an example so if you know that a certain population node is going to be overwhelmed
- 18:30 - 19:00 your system could predict and prescribe to allocate more resources at that particular node so in this case the node can be a city or a population center and so forth so it's just one example but the point is by analyzing our big data the analysis itself would
- 19:00 - 19:30 give a prescription saying okay do this or do that or sometimes they can also say okay out of these options pick something so that is also prescriptive all right now when we are doing visualizations we need to decide what should the
- 19:30 - 20:00 viewers see because that is the medium that we are using so there are various ways of deciding on this but here we have given this theory of Jacques Bertin's seven visual variables so it is about position so if you are doing something like a scatter plot you
- 20:00 - 20:30 know that the position of these dots means something and the size if you are doing a bubble chart or let's say a pie chart the size of the bubble or the pie slice means something the shape let's say you are creating a Venn diagram or a spiral diagram the shape would matter obviously you know how color works it's
- 20:30 - 21:00 the very basic thing that you have learned when you were school children where you were drawing bar charts you know the importance of using color for the different bars brightness can help you understand how things change so you might be able to use brightness or gradient of brightness to show uh how things change as things get more
- 21:00 - 21:30 and more intense or more and more lax orientation would matter you would be able to show how two things would align or how two things would slowly depart so if you are using something like a line graph the orientation matters then you have texture texture is similar to what I talked about in brightness where texture can be shown to indicate how things would interact or
- 21:30 - 22:00 how intense they are as opposed to how they were previously and so on and so forth all right so I am sure that you have seen examples of all of these in various visualizations that we have seen right so visuals are not read in a predictable and linear way
- 22:00 - 22:30 so here you the creator might think okay I created this visualization in a certain way and people would read it in the same way as well that is not so especially look at this and think whether the meme was correct on how you read the various texts on it
- 22:30 - 23:00 it is right so most of the time you would follow that pattern but if we were just taking vanilla reading patterns into our consideration you would have thought that the person would read from left to right and top to bottom but that is not true it is true for text if it is just text but if it is a visualization that doesn't happen the reading patterns
- 23:00 - 23:30 differ so here is an example of an eye tracker which was run on a number of people trying to read the graph so you can see they would follow the usual pattern of how the graph goes and then they want to know what exactly did I see what are the things
- 23:30 - 24:00 this graph shows and only then would they go and read the title and then from the title they would move on to see the axes and here to the end so it becomes rather a loop
- 24:00 - 24:30 so as you can see no one reads visualizations top to bottom and left to right all right so you have to be mindful when you are creating visualizations next remember we talked about color and patterns and so on and so forth it is important to have whatever we want to show standing out
- 24:30 - 25:00 so here when we show you this graph obviously you would be looking at the solid parts more than the others the next one is that humans can only process few visuals at once do not clutter so you can see how cluttered this visualization is and it's a bad
- 25:00 - 25:30 example so the reason that I have put it there is for you not to do it you might have seen the popular examples of how Indian elections are handled online where there is an overload of information an overload of visualizations on the screen next
- 25:30 - 26:00 now there is a saying that nothing that is reported in America is reported with the metric scale and even a few days ago there was a report of an asteroid and that asteroid was reported as half the size of a giraffe and people were making fun of it saying okay it would have been better to actually
- 26:00 - 26:30 say the size of the asteroid in meters and so on but the thing is humans are not very keen on numbers so metric measures or not it doesn't matter they are not very keen on numbers so it is easier for humans to get an understanding if they can make a connection to something that is relatable or something that is known to them as
- 26:30 - 27:00 a comparative thing so in this image you are visualizing a human and a dinosaur so instead of just putting the numbers and saying okay the dinosaur is five meters tall and 13 meters wide they are including a human as well and saying okay a human would be 1.8 meters in height so it's from
- 27:00 - 27:30 Europe that is why it has a comma instead of a dot but in the Sri Lankan way of writing this down it would be 1.8 meters it is not a 180 meter tall human anyway the point here is that including this human gives the humans who are reading this an idea by way of comparison or relatability
- 27:30 - 28:00 all right the next one is rely on conventions and metaphors that you know so what I have here is a bad example by design so if you create a system where go is red and stop is green you are going to have a very bad time because that is not the convention that people are familiar with if you are trying to create
- 28:00 - 28:30 a trick game or something that is fine but make sure your visualization would adhere to the known conventions now there might be prejudices and so and so forth you might have ethical concerns on some of the conventions that people use but if you have something like that it is
- 28:30 - 29:00 better to always be ethical so you should always be careful but if there is no ethical concern so there is no one going to be offended by red being stop and go being green so unless there is an ethical concern it is always better to go with the conventions that are widely accepted by people all right now there is a choice of
- 29:00 - 29:30 visualization there are various ways to visualize whatever the data that you have so here is a little bit of a definition of things that you might already know but what you need to understand here is that to
- 29:30 - 30:00 show a similar type or the same data there might be various ways that you might be able to go through so as you can see I have shown various types of visualization map visualizations projections heat maps binary diagrams tree diagrams
- 30:00 - 30:30 pie charts images and so forth so there is a whole stack of visualization choices and it is up to you to find the best now this is supposed to be a question that I am asking live in class but which one of these visualizations do you like better is it going to be
- 30:30 - 31:00 this side or this side I'm sure most of you would say you prefer this side rather than this side so aesthetics matter when you are doing a visualization it is better to make it pretty and enjoyable
- 31:00 - 31:30 now how do you do a good data visualization you need to remember the objective of providing a clear understanding of the patterns of data and hidden insights you need to condense information remember if they can read the entire thing look at the entire data set they would not need the visualization especially in this big data domain where it is impossible to look at all the data it is very important to condense information
- 31:30 - 32:00 it is better to use a single color or a single pattern remember we talked about various ways of visualizing things but the point here is to be consistent on visualizing the same type of data be careful about positive and negative numbers it is better if you have the zero line make sure that you have sufficient
- 32:00 - 32:30 contrast between colors if you are visualizing something and it is just various levels of blue it is not going to help you I will put links to two Reddit forums data is beautiful and data is ugly where you can look at good visualizations on data is beautiful and you can look at the bad or wrong
- 32:30 - 33:00 or difficult visualizations in data is ugly so when they say avoid patterns it means things that do not convey information you need to avoid them select colors appropriately so this both means to have a variety in colors when you pick them and also to be conscious of the conventions that
- 33:00 - 33:30 we may have so it is better to have a disturbing trend in red and an encouraging trend in green or yellow than the other way around and also not to use more than six colors in a single layout don't overwhelm your audience
- 33:30 - 34:00 so analytic design you need to show causality mechanism explanation systematic structure and your data should show more than one variable it is real estate basically so don't waste
- 34:00 - 34:30 so you have a limited amount of space to show something and if you can show more things go ahead but don't clutter it you need to show the context of the data that you're showing just reporting things does not make sense unless you show the context you can use numbers to highlight the most important part of data so even if you are using a graph let's say a line
- 34:30 - 35:00 graph like this and if you want to know the exact number here at this peak instead of just assuming from these numbers you can put that number here saying at the peak we have this number even without saying the words if you just put that number people would understand it is better to show comparison so as you can see this is how the things
- 35:00 - 35:30 have happened in the previous week and this week so these comparisons would help people to decide so here we are looking at the dashboard design rules of thumb you need to be conscious of the audience that you are going to have it should be comprehensive it should have
- 35:30 - 36:00 enough context I told you about context in the previous slide you need to highlight the important data see how colors have been used to highlight the data and the sizes see the number sizes have been used to convey the most important information over here use graphics when important here we are using graphics for some things and numbers and letters for some things
- 36:00 - 36:30 have a good choice of graphics and design it should be aesthetically pleasing as you can see these are pleasing to look at then if it is a predictive or prescriptive analysis there should be enough information to take that decision generally if you are presenting a visualization it is better if the person doesn't have to scroll so if you have
- 36:30 - 37:00 a zoom and then once you zoom you're allowed to scroll that's totally okay but by default it is better to have no scrolling in the visualization are you showing real-time data or are you showing batch data let's say if you have something like election results you will be showing the data in batches you are not showing each
- 37:00 - 37:30 ballot that is counted you are showing it as the updates come from each electorate on the other hand you can have real-time data where say you have an online poll or something and you can show how people change their ideas as they come in you might have seen this in various reality TV shows where they show on the screen how the popular ideas
- 37:30 - 38:00 change on various candidates as the mobile voting works it should be clearly organized as in the same type or same class or same group of information should be put together rather than all over the place all right so let's look at visualization types
- 38:00 - 38:30 so you have the qualitative information side and you have the quantitative information side then you have the purpose so you have a purpose to make a statement or the purpose to look for new ideas so qualitative we have renamed to
- 38:30 - 39:00 conceptual and quantitative to data driven and we have declarative here and exploratory here so exploration is about trying to learn about the data set explore multiple hypotheses manipulate data freely maybe discarded after the completion so maybe most of the time that would be
- 39:00 - 39:30 internal and on the other side presentation or declarative is for communication most probably external it would contain interaction for a client to work on it trying to look through it and visual style is more important in that type of data so we will see how they are handled so we have a declarative and conceptual visualization here
- 39:30 - 40:00 and a data treatment and declarative one here yeah even a client can look at this and see how it is so see how more abstract this is rather than that this is more concrete here this is a exploratory and data premium so somebody can look at this and take some uh decision and here a more conceptual diagram
- 40:00 - 40:30 somebody would just go and we have every day database on here when we say everyday database it means data visualization that is used in day-to-day business all right
- 40:30 - 41:00 let's look at some basic principles. The chart that you are drawing should tell a story; the graphics should be 15 on their board; the description should enable meaningful comparison; it should yield insight beyond the text
- 41:00 - 41:30 that you have. And this is a saying by Tufte, which is: if the statistics are boring, you have the wrong numbers. Right, remember I told you about looking at the lie factor? The idea is that your graphs and charts and whatever you are doing should not lie.
- 41:30 - 42:00 This might not be a thing that you've done consciously, but sometimes visualizations can convey wrong information even without planning it. So the idea is a lie
- 42:00 - 42:30 factor for visualization, which means the size of the effect shown in the graphic divided by the size of the effect in the data. It should be around one; if it is greater than one it means the effect is exaggerated, and if it is less than one it is understated.
- 42:30 - 43:00 So here is an example, of fuel economy. As you can see, if you look at these numbers and the lengths, if you follow an actual progression this image should have looked like this. For these lines you have the lengths that they are
- 43:00 - 43:30 saying they are. So 27 versus 18 should look something like this instead of this. The lie factor here is 27.5 over 18 on 5.3 over 0.16: this is what the ratio is expected to be, but this is the ratio that we
- 43:30 - 44:00 have actually gotten here. So this is the lie factor: how much this visualization lies to the audience. Here is another one. All right, since this is not a dedicated class on visualization, we are not going to go
- 44:00 - 44:30 very much in depth into data visualization; we are more focused on the big data section, how big data can be visualized. We mentioned these because they are important for any visualization, including big data, but we can't go too much into detail because that would be distracting from the focus of this particular course, which is big data.
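The lie factor defined above can be written out as a short calculation. This is a minimal sketch, assuming "size of the effect" means relative change (the usual reading of Tufte's definition). The function name `lie_factor` is made up for illustration, and the input figures are the ones commonly quoted for the fuel-economy chart (18 to 27.5 miles per gallon, drawn as lines 0.6 and 5.3 inches long); the numbers spoken in the recording are partly garbled, so treat these as assumptions.

```python
def lie_factor(graphic_start, graphic_end, data_start, data_end):
    """Relative change shown in the graphic divided by relative change in the data."""
    effect_in_graphic = (graphic_end - graphic_start) / graphic_start
    effect_in_data = (data_end - data_start) / data_start
    return effect_in_graphic / effect_in_data


# Assumed figures for the fuel-economy chart (not taken verbatim from the recording):
# line lengths in inches for the graphic, miles per gallon for the data.
lf = lie_factor(graphic_start=0.6, graphic_end=5.3, data_start=18.0, data_end=27.5)
print(f"lie factor = {lf:.1f}")
```

A value near 1 would mean the drawing is proportional to the data; with these assumed inputs the result is far above 1, which is the exaggeration the lecture is pointing at.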
- 44:30 - 45:00 All right, so basic principles on drawing a graph or a chart: simpler is better. The idea is again another ratio, data versus ink. The data-ink ratio is the ink dedicated to showing data versus the total
- 45:00 - 45:30 ink used in the graphic. It should be around one. If it is less than one, that means there is non-data-related ink in the graphic, things that are not giving data information to the viewer. So here you have a lower data-ink ratio; this is worse. So you had
- 45:30 - 46:00 this data, and now you have a lot of ink here but it is only conveying the same information as before. Here is the higher data-ink ratio, where it is using less ink than here but it shows all the data that somebody could have read from here. In fact, here they can get a more accurate
- 46:00 - 46:30 number, more accurate information on where each of these dots is, rather than looking here and guessing where it would fall. All right, the next point is minimizing chartjunk, that is, data ink running amok or going wild: unnecessary visual clutter. So as you can
- 46:30 - 47:00 see, there are patterns over here, and this pattern is not going to give out any information. Why do you even have this pattern here? It is not giving out any information. So here the moiré effects give illusions of movement for no reason; they stand out in a bad way. So, for visualization in big data
- 47:00 - 47:30 you need to be effective. Apart from the volume of big data, the intrinsic constraints generated by the typical characteristics of the data are as follows: you have real-time changes, an extreme variety of sources, different levels of data structuring,
- 47:30 - 48:00 and the simultaneous use of several visualizations is advisable. On a PC, let's say, you would run out of screen to draw all the data points, so you would have about 10 to the power 6 data points on the screen. If you are just looking them up, it would take about 10 to the
- 48:00 - 48:30 power 9 amount of time, and you would start to run out of space at 10 to the power 12. So these are the limits that you are working with when you are working on big data. The solution space can be aggregates: aggregate means a single visual point would represent multiple data points, so they can be averages, they can be maxes,
- 48:30 - 49:00 means, whatever. On the other hand, you can have samples: instead of taking aggregates, you can get a sample of data from the distribution that you have and then visualize it. Going further: one central location to store cyber security data; data is collected at
- 49:00 - 49:30 once; and third-party software. So this is just implementation: you have to have scalability and interoperability, more than deploying something directly from a vendor. So just putting your big data on a framework and showing it might not cut it.
- 49:30 - 50:00 The way that you use data is influenced by, and influences, data formats and technologies, so be mindful of this. Search: in fact, next week we will be talking about big data search. Analytics: relationships, distributed processing, whether you have correlation or
- 50:00 - 50:30 statistical summarization when you are showing big data. What do you do with the context in which you join? In this case there are hard problems on phrasing. Can you refresh something? You can have a common naming scheme. Where do you store, and how are these analytics stored? Are you going to generate them each time somebody requests them? Would that be too expensive, or would that be better, would that be more online?
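The two options named in the solution space a little earlier, aggregation (one visual point standing for many data points) and sampling, can be sketched with NumPy. This is only an illustration under stated assumptions: the synthetic data, the bin count `n_bins`, and the sample size are all made up, and a real pipeline would pick them from the screen resolution and the data at hand.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for a large data set: far more points than a screen can show.
n_points = 200_000
x = rng.uniform(0.0, 100.0, size=n_points)
y = np.sin(x / 10.0) + rng.normal(scale=0.3, size=n_points)

# Option 1: aggregate. Each bin along x becomes a single visual point.
n_bins = 50  # assumed number of visual marks available
edges = np.linspace(0.0, 100.0, n_bins + 1)
bin_idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
counts = np.bincount(bin_idx, minlength=n_bins)
sums = np.bincount(bin_idx, weights=y, minlength=n_bins)
bin_means = sums / np.maximum(counts, 1)   # average per bin
bin_maxes = np.full(n_bins, -np.inf)
np.maximum.at(bin_maxes, bin_idx, y)       # maximum per bin

# Option 2: sample. Draw a random subset and plot only those points.
sample_size = 2_000  # assumed plotting budget
sample_idx = rng.choice(n_points, size=sample_size, replace=False)
x_sample, y_sample = x[sample_idx], y[sample_idx]

print(f"{n_points} points -> {n_bins} aggregated marks or {sample_size} sampled marks")
```

Aggregation keeps every point's contribution but hides individual values; sampling keeps real individual points but may miss rare ones, so the choice depends on whether outliers matter for the question being asked.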
- 50:30 - 51:00 Then how you access data: you can use this we will even in handy, but then again he talked about no experiences. Right, now again we are looking at when data becomes big: data can be in motion, data can be at scale, we already know these, data can be in many forms. We know the three Vs,
- 51:00 - 51:30 and extreme scale comes together, complex information spaces come together. And the critical things in visualization science: inclusion of visual analytics, and mainly the active involvement of a human. With this we are going to look at when visualization becomes important in
- 51:30 - 52:00 the big data life cycle. So we have the collection process, the cleaning process, integration, and then we have visualization, and this visualization can go back and impact other things as well, and go forward and impact. After visualization we have analysis, presentation and dissemination. So visualization can impact the way that you clean your data, it can impact the way that you analyze
- 52:00 - 52:30 your data or present your data, and of course how you disseminate the information. It will play a specific role in several places of the big data life cycle. The data type can affect the visualization design and, as we saw earlier, it can inform the data cleaning process and
- 52:30 - 53:00 the choice of the analytic algorithms that you would be using. So the three phases that you would be seeing in the big data life cycle are the pre-processing, staging and handling stage, the exploratory analysis stage, and the presentation stage. On each of these, visualization would have a feedback loop or an
- 53:00 - 53:30 update. Here is a style analysis of big data visualization: you can do data reduction in order to move from big data to medium data to small data and show it, or you can have visual
- 53:30 - 54:00 interaction, where you can mix two types of big data or data points to create an interactive interpretation, or you can process it with divide and conquer and parallel computation. All right, we are moving on to visualizing big
- 54:00 - 54:30 data on official statistics. There might be dangers, and most of the time the idea is to look at the basic opportunities, such as automated analysis tools and interactivity methods that can be used for your visualizations. Moving on, we
- 54:30 - 55:00 can have data visualization technologies. You can look at various types; one especially is mentioned in the video that you are supposed to watch at the end of this lecture. There are traditional analytical approaches, there are various analytical platforms, and presentation tools.
- 55:00 - 55:30 All right, automated analysis and interactive visual methods: we are looking at how they would help in the entire life cycle of the data. We looked at it from a different perspective just a few slides ago: the analytical capability, and how you would integrate it for human
- 55:30 - 56:00 analysis. Remember, we talked about this earlier as well: the objective of data visualization has always been to facilitate a human. It is especially true in the case of big data because, as we discussed earlier, it is impossible for a human to comprehend or just take in the entirety of
- 56:00 - 56:30 big data. So any big data, by definition, would be very hard, well, impossible for a human to take in; therefore we have to do the visualization. So remember the three Vs: these come together and create the appropriate definition in the context of design and implementation.
- 56:30 - 57:00 All right, you already know about this: we talked about visual forms of knowledge and the basics of visualization in the previous part of this lecture. I tried to keep it as general as possible so that you would get the understanding of visualization as a whole, and now we are trying to be more specific
- 57:00 - 57:30 on big data. So, the idea of how automated analysis can be done: basically, for big data, these highlighted ones are the most relevant, but you can look at how various processes would be taking place in these other
- 57:30 - 58:00 levels as well. But mostly what is important for big data visualization is pattern recognition, visual query analysis, visual interpretation, evaluation and representation. All right, so this is again a citation, this is something from research. The idea of automated analysis of
- 58:00 - 58:30 big data is methods to make more sense of data. So we have producing the data and the mapping into representations, and then going on to specific techniques.
- 58:30 - 59:00 I am not going to read out the slides. I have tried to make the slides as descriptive or wordy as possible, so that even if you just read the slides you will get an idea. This is a visualization lecture, so it is more about you being able to just look at the slides
- 59:00 - 59:30 themselves and understand. So I won't be just reading most of the slides; you should get access to the slides themselves and this recording as well. All right, so let's look at how an interactive visual analytics system would work for big data. We have the data pre-processing, then you might have data mining, machine
- 59:30 - 60:00 learning and statistical methods, which, as we have been reiterating, give what we need to show in a big data visualization: patterns, outliers, clusters, gaps. So interactive visualization would give the functionalities of browse, search and monitor, and
- 60:00 - 60:30 it would help people see more interesting relationships, explore what-if scenarios, verify the presence of biases, and simulate changes in impact. If you are trying to disseminate information, the objective is to show the data, enlighten the sense of the data, and tell stories that are comprehensive and have a great flow, and
- 60:30 - 61:00 remember, not lie about the data that you have; it should be representative of what you are doing. Right, so we are going further on interactive visualizations. So what can you interact with? I have this nice table here, taken from
- 61:00 - 61:30 this research. So if you have an interactive visualization you may select, you may explore, you may reconfigure (you can drag and drop), you can encode, you can abstract (zoom out), you can filter, or you can connect. As you can see, select is the ability to mark data items of interest and highlight them; you can click on data items. These kinds of
- 61:30 - 62:00 things would help with outliers; it would make you able to identify outliers. You can explore: you can click on something and explore further, go beyond; it's like opening a directory structure. You can pan across the data. Reconfigure: you can use different perspectives, you can rotate things, you can look at different
- 62:00 - 62:30 ways of different connections between data. There you might see hidden patterns: if you are looking at two types of connections, maybe there is a pattern in how those connections work with each other. You can encode: transforming the basic elements of human vision, let's say colors, shapes and dimensions; remember the things that we mentioned earlier,
- 62:30 - 63:00 colors and positions and so forth. You should be able to increase or reduce the details of the visualization; let's say you can hide or show labels, you can zoom in and zoom out, that kind of thing. Filtering: you should be able to filter. Let's say it is a huge graph and maybe you only want to look at people. Let's say there are actors and
- 63:00 - 63:30 directors and movies and so forth. So there should be a possibility to only look at the actors with the movies that they are in, and then maybe look at all directors and how they are connected, and so forth. Connect is a little higher level, where you emphasize relationships and associations if they already exist. So for interactive visualization, if you follow this guideline, if you are
- 63:30 - 64:00 having an interactive visualization for your big data and you follow this, it should make it more accessible and more useful for your audience. So, computational problems with big data: obviously, as usual with big data, traditional methods might not work, especially with new data sources.
- 64:00 - 64:30 Summarizing is a big deal because, as I have emphasized about three times already, the whole point of having visualization is to be a way of summarizing big data, which is incomprehensible as it is. Making sense through inferences: there might be patterns that are hiding among the data. Finding, detecting and
- 64:30 - 65:00 defining anomalies: what is an anomaly in a certain context, and then finding them. So we have human perception, and in human perception objects can be larger or smaller; if they are larger, humans might have difficulty in extracting information. So you have limited screen space.
- 65:00 - 65:30 In that you might have clutter; remember we were talking about reducing clutter. So if you are trying to show too much stuff there might be issues of readability, just a lot of words and numbers on a screen which no one can understand. Scalability: we are looking at how
- 65:30 - 66:00 you can scale what you are doing, what you are showing. So you can do dimensionality reduction; we talked about this in an earlier slide as well. Instead of showing all the dimensions, all the features that you have, you can use various methods such as PCA or autoencoders and so forth to reduce the dimensions and only show the important ones. Or you
- 66:00 - 66:30 can cluster the data and show the cluster centers; it's a type of aggregation. Remember the types that we discussed earlier: aggregation is a type, and clustering is a way of aggregation. You can use machine learning to extract the patterns and only show the patterns instead of showing the data themselves. You can use data mining, again to get patterns out of the data that you have
- 66:30 - 67:00 and show them instead of the raw data. Right, so we are looking at the constraints one would have when one tries to visualize data: we have data processing constraints, data communication issues, and data aggregation and other design problems. All right, so with this let's look at some
- 67:00 - 67:30 visualizations of big data. Right, so this is an example of visualizing the Bible. The bar graph here shows the chapters, and
- 67:30 - 68:00 the books alternate in color, white and light gray, over there as you can see. For people who are unfamiliar, the Bible is a collection of books, so you can see those various books as you look through here. The length of these bars shows the number
- 68:00 - 68:30 of verses in the chapter, and these lines are the cross references, mentions from one of these chapters to another. So the color just shows the distance
- 68:30 - 69:00 between the two chapters: if it is very close it is purple, and when it is very far it is yellow. So you can see how it creates a nice rainbow in the visualization. As you can see, it shows you 63,000 cross references; this is how you nicely visualize big data.
- 69:00 - 69:30 Here's another example, on visualizing social networks: the mood in the United States inferred from 300 million tweets. You can see it being projected on the map of the United States, and you might see that this is not the regular United States. So this is
- 69:30 - 70:00 what you would see: this is the land mass, the realistic projection, and here the size is determined by the number of tweets that you have. So more populous areas like California have more space than their actual land mass; that is, for example, here.
- 70:00 - 70:30 So the way that it is shown is called a density-preserving cartogram. As you can see, when you are visualizing big data it is important to show the data as it is; remember, never lie. So if you just use the physical locations on the map you would have
- 70:30 - 71:00 a wrong perception of the amount of tweets that it is representing, but when they are using this density-preserving cartogram it is obvious when there is more data from a smaller land area. Right, here is the human Source information,
- 71:00 - 71:30 basically a history of the world in 100 seconds. This was created by this particular FlowingData website. What they have done is they have taken 14,000 Wikipedia articles with dates mentioned, years and days mentioned, and tried to visualize them on a world map so that it
- 71:30 - 72:00 would show as nice dots. Let me run it for you. As you can see, it shows various Wikipedia articles using the dates of these events, and they are persistent: the dot creates a little puddle when it comes and then stays persistent.
- 72:00 - 72:30 As you can see, as the time goes, the reporting of events... now you might look at this and understand that this is very European-biased, and after America is discovered it becomes more Western-biased as a whole. Now, as you can see, it has Europe, it has maybe the Middle East, it sometimes has China,
- 72:30 - 73:00 it has United States city, a little bit of Africa and some East Asia as the time goes. Now you are in the thousands; you can see the Crusades and stuff happening, you can see the ca probability of the 1200s. Now we are almost coming to the Age of
- 73:00 - 73:30 Enlightenment. Okay, now we are in the 1500s, more exploration is happening. Now the United States comes out, the United States has just fought for independence, Manifest Destiny starts to happen, and we are at the end, the dots showing us a
- 73:30 - 74:00 diagram, showing us a world map even without the world map, at the end, because of how events have taken place. And as you can see, we are very underrepresented, not because the people who made this are biased, but because we are very underrepresented in Wikipedia. All right, so here is another portrayal, of digital
- 74:00 - 74:30 cities: how you would visualize text messages. You can see in London how a hurricane, how messaging changed in London, and also when Obama
- 74:30 - 75:00 visited to see the damage. And very importantly, you can see that after a day people stopped talking about the hurricane. This is how the news cycle works: unless information comes consistently, people forget that something happened very fast, even if it was a shock yesterday.
- 75:00 - 75:30 All right, here is another visualization, on a music festival which happened in 2008. You can see how various events show up as very bright spots, and how parallel events would be shown as
- 75:30 - 76:00 parallel bright spots, and how the focus of activities would move from certain areas to different areas. When more popular things happen in a certain area there would be more activity there, and when that ends and a different popular thing comes up at a different point, the flow of activity would
- 76:00 - 76:30 move to that area. There is another visualization, on Singapore, how Live Singapore works, and let me actually run it for you. There you are.
- 76:30 - 77:00 Thank you. As you can see, information about sports
- 77:30 - 78:00 people playing in flying out
- 78:00 - 78:30 Going forward, here is a similar visualization on San Francisco transportation, on how we move about.
- 78:30 - 79:00 So black lines are very slow movement, red is less than 19, blue is less than 43, and green is faster than 43. So you can see that along the highways they are faster, and on the tributaries that go from them there are blues, then reds in residential
- 79:00 - 79:30 areas, and blacks at junctions and very distinct areas such as schools etc., where there are speed limits. You might see the same thing with Google as well. So I have put some more examples over here, on various websites; you can click on
- 79:30 - 80:00 these and look at these visualizations. Let me actually show you this, which is basically a visualization of word embeddings. So this is Word2Vec 10K; these are nodes, and these are the words that you would have in
- 80:00 - 80:30 Word2Vec. As you can see, you can move about and see how these are embedded in the vector space. You can even see what the words are that are close by. So you can see there are thousands and thousands of embedded words; you can see what the outliers are: unpaid, governments,
- 80:30 - 81:00 policy. So see, this is the good thing about Word2Vec, right? As you might know, the related words come together, so governments and policies are rather close by. Let's see what the outliers are here: simple, application, interface, input. So interface and input make sense to be close by. So when you click on, remember we talked about clicking on items, right, you click on
- 81:00 - 81:30 something and you can see what the nearest points are. So the nearest point to input is output, ah, what a surprise. Let's click on something else: ecological, environmental, biod... is it? See, now you might think, okay, even though they are saying nearest points, they look like they are all over the place. But no, the point is humans can only see 3D, so it has to be
- 81:30 - 82:00 projected to a 3D space, but these things exist in a higher-dimensional space, where these have 200 dimensions. So they are just projected into the 3D space; when you click something, the things that are close by might not be close by in this particular
- 82:00 - 82:30 projection. All right, you can click on these links in the PDF and go to each of these and have a look. So that ends the lecture part of this lecture. Here are the references, and
- 82:30 - 83:00 next you are supposed to watch the video which I will be providing you, from the Uber visualization expert. Thanks.
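The remark in the word-embedding demo, that nearest neighbours in the 200-dimensional space need not look nearest once everything is projected to 3D, can be checked with a small experiment. This is a minimal sketch under assumptions: random vectors stand in for real word embeddings, PCA (one of the reduction methods mentioned in the lecture) is implemented via SVD as the projection, Euclidean distance is used, and the helper names `pca_project` and `nearest` are made up for illustration. The tool shown in the lecture may use a different distance or projection.

```python
import numpy as np


def pca_project(X, n_components=3):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def nearest(X, query_idx, k=5):
    """Indices of the k rows closest (Euclidean) to row query_idx, excluding itself."""
    distances = np.linalg.norm(X - X[query_idx], axis=1)
    order = np.argsort(distances)
    return [int(i) for i in order if i != query_idx][:k]


rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 200))   # stand-in for 200-dimensional word vectors
projected = pca_project(vectors, 3)      # what the viewer actually sees

full_nn = nearest(vectors, 0)            # neighbours in the original space
proj_nn = nearest(projected, 0)          # neighbours as they appear in 3D
shared = set(full_nn) & set(proj_nn)
print(f"{len(shared)} of {len(full_nn)} true neighbours are also neighbours in 3D")
```

With a linear projection like this, distances can only shrink, so points that are far apart in 200 dimensions may land next to each other in 3D and crowd out the true neighbours. That is why an interactive tool that reports neighbours computed in the original space is more trustworthy than eyeballing the projected plot.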