Data knowledge and skills tutorial - Part 2: authenticating data
Summary
In this second video of a series on data and data handling, Martin Levins focuses on authenticating data, with references to the Australian Curriculum: Digital Technologies. The previous video explored data from a Year 5-8 perspective; this installment is aimed at Year 7-10 students and covers structured data and how to clean data taken from external sources. Levins uses a Wikipedia list of minimum wages by country as his data source and explains why he considers it authentic despite debate over Wikipedia's credibility: each entry is backed by references, mostly to government and newspaper sources. He then pastes Wikipedia's table into a Word document so that unwanted information can be removed before analysis, and previews the next video, which uses Word's search-and-replace features to clean the data further. The series aims to demystify data preparation and authenticity for students and build their analytical skills in an engaging way.
Highlights
Martin Levins continues his series on data handling, focusing on authenticating data for the Australian Curriculum. 🎓
The video transitions from exploring Big Mac pricing to understanding structured data for older students. 🍔➡️📈
Wikipedia's role and reliability as a data source spark discussions on authenticity. 🌐
Converting data tables from Wikipedia into Word helps streamline data cleaning. 📝
Preparing students with practical data management skills! 🎓
Key Takeaways
Understanding structured data is crucial for students in Years 7-10. 📊
Wikipedia can be a starting point, but authenticity checks are essential! 🤔
Convert and clean data using Word for easier manipulation. 💻
Demonstrating data analysis tailored for the Australian Curriculum is fun and educational! 🇦🇺
Further data refinement techniques teased for the next video. 🚀
Overview
Martin Levins takes the audience deeper into the world of data with the second video in his series referencing the Australian Curriculum: Digital Technologies, this time aimed at Years 7-10. This installment is all about structuring and authenticating data. Building on the Big Mac pricing exercise from the previous video, Levins now shifts attention to what structured data means and why it matters in real-world applications. 📚
The conversation pivots to evaluating data sources, with Wikipedia taking center stage. Although Wikipedia is often scrutinized for reliability, its comprehensive, cross-referenced entries provide an excellent platform for teaching data authenticity. Levins gets hands-on by showing how to paste a Wikipedia table into a Word document so that unwanted content can be stripped out, a valuable skill for budding data analysts. 🔍
Looking ahead, the video teases a deeper dive into data cleaning in the upcoming tutorial. This series not only aligns with educational standards but also captivates students' imaginations by making complex data concepts accessible and fun. It's about equipping young minds with the tools they need to navigate an information-driven world confidently! 🌏
Chapters
00:00 - 00:30: Introduction to the Series and Recap. This chapter introduces the second video in a series on data and data handling, with specific references to the Australian Curriculum: Digital Technologies. The previous video covered data concepts for Year 5 to 8 students through an exercise that priced a Big Mac in different countries and plotted the prices on a map.
00:30 - 02:00: Data Visualization and Pricing Exercise. This chapter introduces structured data: what it means, how to achieve it, and how to clean data obtained from another source. As a practical example, it revisits the previous video's map of Big Mac prices in various countries, expressed in US dollars so that they can be compared.
02:00 - 03:30: Authenticating and Evaluating Data Sources. This chapter compares Big Mac prices across countries, highlighting the Philippines and South Africa, and imagines a teenager deciding to move to South Africa because Big Macs are cheaper there. It then asks how long you would have to work to afford a Big Mac in each country, placing the economic data in context.
03:30 - 06:00: Data Cleaning Challenges and Methods. This chapter tidies up the map: its style is changed so that the labels show dollar prices, which makes it easier to navigate because you no longer have to open and close windows constantly. It then asks how much time it takes to earn $2.15 in South Africa compared with $4.45 in Australia, showing how visualization can help reveal these economic differences.
06:00 - 09:30: Preparing Data for Analysis in Word. This chapter gathers minimum wage data for various countries from a Wikipedia article and stresses the importance of verifying that a source is authentic, acknowledging that some may question Wikipedia's reliability.
09:30 - 10:30: Conclusion and Next Steps. This chapter explains the choice of Wikipedia as a source: its data are comprehensive and backed up by a variety of authentic sources.
Data knowledge and skills tutorial - Part 2: authenticating data Transcription
00:00 - 00:30 [music]
Martin Levins: This is the second in a series of videos about data and data handling with
specific references to the Australian Curriculum: Digital Technologies. In the previous video, we looked at the concept
of data from a Year 5-6 and possibly 7-8 perspective, where we looked at the pricing of a Big Mac
and put that onto a map, which we'll look
00:30 - 01:00 at in a sec. This video is going to concentrate more on
the upper end of Years 7-8 and Years 9-10 to look specifically at things like structured data
and try to explain what that means, how we can achieve it, and how we can clean up data
that we get from another source. At the end of the last video, we had this
situation where we had plotted out the price of Big Macs in every country, so in Australia it was $4.45,
and these are in US dollars to make the prices
01:00 - 01:30 comparable. In the Philippines, it's $2.81, and down here
in South Africa, it's $2.15. If you're a 15- or 16-year-old, you might think,
"Let's all move to South Africa, because the price of Big Macs is very low." What we're going to ask is: How long do you
have to work to earn that amount of money?
01:30 - 02:00 First of all, we're going to clean up this
map a little bit and make it look a little bit nicer. Now I want to change the style so that the
labels actually show us the dollar price. This map now becomes a little bit easier to navigate
because you don't have to click in and close windows all the time, but it still
raises the question: "How much time do I need to devote to earn that? How long does it take
me to earn $2.15 in South Africa compared to how much time it takes to earn $4.45 in
Australia?", all in US dollars.
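The question posed here reduces to a division: the time needed to earn a Big Mac is its price divided by the hourly wage, with both amounts in the same currency. The Python sketch below is illustrative only; the function name `minutes_of_work` is hypothetical, and the example uses the two Australian figures quoted later in the video (a US$4.45 Big Mac and a minimum wage of US$14.56), assuming the wage is an hourly rate.

```python
def minutes_of_work(price_usd: float, hourly_wage_usd: float) -> float:
    """Return the minutes of work at `hourly_wage_usd` needed to earn `price_usd`.

    Both amounts must be in the same currency (the video uses US dollars).
    """
    if hourly_wage_usd <= 0:
        raise ValueError("hourly wage must be positive")
    # hours = price / wage, then convert hours to minutes
    return price_usd / hourly_wage_usd * 60


if __name__ == "__main__":
    # Figures quoted in the video for Australia; the wage is assumed to be hourly.
    big_mac_au = 4.45
    min_wage_au = 14.56
    print(f"Australia: {minutes_of_work(big_mac_au, min_wage_au):.1f} minutes")
```

With these numbers the script prints `Australia: 18.3 minutes`. The same function can answer the South Africa half of the question once a minimum wage figure for South Africa has been read from the data.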
02:00 - 02:30 I went to try to find out how much people
earn in each different country. I did a search for the list of minimum wages
by country, and I got a Wikipedia article. One of the things about data is checking its
authenticity. Some people may challenge the idea that Wikipedia
is an authentic source for data.
02:30 - 03:00 In this particular instance, I want to explain
why I chose Wikipedia. I chose it because it gave me the data that
I wanted and insofar as authenticity is concerned, every one of these entries is backed up with
data from a whole variety of different sources, most of which are either newspapers or government
sources themselves.
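The authenticity argument here rests on what kinds of sources the entries cite. One rough way to check a claim such as "most of them are government sources" is to tally the reference URLs by host name. The sketch below is a hypothetical heuristic, not a feature of Wikipedia or a method shown in the video: it assumes the reference URLs have already been collected into a list, and it treats any host containing a label such as `gov` or `gob` as a government site. That will miss some government domains and is no substitute for actually reading the sources.

```python
from collections import Counter
from urllib.parse import urlparse

# Host-name labels that commonly mark government websites. This is a rough
# heuristic chosen for illustration, not an authoritative list.
GOVERNMENT_LABELS = {"gov", "gob", "gouv", "govt"}


def classify_source(url: str) -> str:
    """Label a reference URL as 'government' or 'other' from its host name."""
    host = urlparse(url).hostname or ""
    labels = set(host.lower().split("."))
    return "government" if GOVERNMENT_LABELS & labels else "other"


def tally_sources(urls: list[str]) -> Counter:
    """Count how many reference URLs fall into each category."""
    return Counter(classify_source(u) for u in urls)


if __name__ == "__main__":
    # Hypothetical reference URLs standing in for the article's reference list.
    refs = [
        "https://www.example.gov.au/minimum-wage",
        "https://news.example.com/wages-2013",
        "https://labour.example.gob.mx/salario",
    ]
    print(tally_sources(refs))
```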
03:00 - 03:30 Most of them are government sources. We can see that there's a lot of data here. We've got the country, we've got the minimum
wage and an explanation for anything to do with that minimum wage. We've got the nominal amount in US dollars
that people will earn annually. Then there's a thing called PPP, or purchasing power parity. This crops up quite a bit, and it's an
economic term which tries to take into account not only what you earn, but what
its buying power is, taking into account
03:30 - 04:00 things like home mortgages and the rest, how
much it costs to buy a car and all that sort of stuff. Because we are looking at it from the perspective
of a Year 8, 9 or 10 student who probably won't be buying a home anytime soon, I'm going
to take just the US dollars. I'm not going to look at that economic rationalisation
of the number. I'm just going to look at the minimum wage
as it is, and you'll see that the minimum wage in Afghanistan
04:00 - 04:30 is 50 cents an hour in US dollars, compared to Australia, where the
minimum wage is 14.56 in US dollars. Now, that's pretty close, I reckon, to what
you'd earn as a burger jockey in McDonald's in Australia. We can still get a comparison, though, across
the whole lot. If we scroll right to the bottom of this,
here we go, at the bottom we can
04:30 - 05:00 see all of those references adding to the
authenticity of these data. Here we've got a whole bunch of data and we
are not able to download this into Excel or anything else, so we're going to copy and
paste and see how we go. It's in a table, right. We've got the country, bit of discussion,
few bits of pricing stuff, and when that value of the minimum wage was determined, and most
of them are pretty current.
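Because the table can't be downloaded directly, copy and paste is the way in. When a browser table is copied as plain text, the cells typically arrive separated by tabs, one row per line. The sketch below assumes that layout; the column names and rows in its example are hypothetical placeholders, not values from the Wikipedia article, and `parse_pasted_table` and `stale_rows` are illustrative names. It parses the pasted text into records and flags rows whose effective date is older than a chosen year, the kind of check that surfaces the 2012 and 2013 entries mentioned in the video.

```python
import csv
import io
import re

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")  # first four-digit year in a cell


def parse_pasted_table(text: str) -> list[dict[str, str]]:
    """Parse tab-separated text (header row first) into one dict per row."""
    # QUOTE_NONE: pasted text is not CSV-quoted, so treat quote marks literally.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # Skip overflow cells (key None) and turn missing cells (None) into "".
        rows.append({key.strip(): (value or "").strip()
                     for key, value in row.items() if key is not None})
    return rows


def stale_rows(rows: list[dict[str, str]], year_column: str, cutoff: int) -> list[dict[str, str]]:
    """Return rows whose first year in `year_column` is earlier than `cutoff`."""
    stale = []
    for row in rows:
        match = YEAR.search(row.get(year_column, ""))
        if match and int(match.group()) < cutoff:
            stale.append(row)
    return stale


if __name__ == "__main__":
    # Hypothetical pasted text; real column names depend on the source table.
    pasted = (
        "Country\tHourly (US$)\tEffective\n"
        "Country A\t1.00\t2013\n"
        "Country B\t2.00\t2019\n"
    )
    table = parse_pasted_table(pasted)
    for row in stale_rows(table, "Effective", cutoff=2015):
        print(row["Country"], row["Effective"])
```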
05:00 - 05:30 You will see a few that are 2012, 2013, and
that in itself provides an interesting discussion point for the student. Why is it all different? I've found that if you're copying these things
and you want to put them into Excel or Word or whatever, it's best to copy from the bottom
right of the table. I don't know why, but it just seems to work
a little bit more reliably. Now, this is a big table, so I'm going to
scroll right to the top. Shift-clicking doesn't seem to work as reliably
as this method.
05:30 - 06:00 Even though it takes a little bit longer,
that's okay, we'll get there. We're getting close to the D's, C's, B's and
now we're into the A's and there is our country. We'll come back down and copy that. Now, that's a fairly large data set. What I want to do is to paste that into Word
first. Now, this may seem a little bit odd if we
wanted to do our ultimate work in Excel, but
06:00 - 06:30 pasting it into Word gives us an ability to
get rid of a lot of things that we don't necessarily want. I'm particularly interested in this flag and
country, which are right alongside one another, and that may upset any referencing that I
want to do in Excel. What we're going to do is to take a Word document. We've already got that stuff copied to the
pasteboard. I'll now paste that in, and because it's a
reasonably large amount of data, it's going
06:30 - 07:00 to take a little while for that to happen. When it does happen, it's going to come into
Word as a table in Word. Tables can tend to be a bit nasty when you're
dealing with data. What I want to do is to change the structure
of the data. There it is, coming in; it's gone right to the
bottom, which is cool. We'll scroll to the top. There we are. It's all come in.
07:00 - 07:30 You beaut, terrific. We've now got our document in Word, and, as I said, we're going
to use some text editing on this because we need to clean it up. The document is structured, or the data, at
least, is structured, but it's not structured the way we want and it's not structured so
that it's clean. Let me explain why. Across the top here, we've got a country and
then we've got all the countries down here, but I'll come back to that in a minute. Then we've got the minimum wage. Then we've got these things.
07:30 - 08:00 Really, all we want is either the annual salary
or the hourly rate. It depends on how we want to compare the data,
so we'll look at that later on. Then here, we've got everything in the right
columns, but at the top here, we've got these fellows, which are two columns in one. Someone has merged this to make it look a
little bit better. That's going to confuse our data analysis
a little bit later on. The biggest thing, the biggest problem is
this.
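Two of the structural problems described here, merged header cells that hold "two columns in one" and the choice between annual and hourly figures, can be handled by giving every column a single, unique name and then keeping only the columns needed. The sketch below assumes you record how many sub-columns each top-row heading spans; the heading labels and the row in its example are hypothetical and not copied from the article, and `flatten_header` and `select_columns` are illustrative names.

```python
def flatten_header(top: list[tuple[str, int]], sub: list[str]) -> list[str]:
    """Merge a two-row header into one row of column names.

    `top` lists each top-row heading with the number of columns it spans;
    `sub` is the second header row, with "" where a column has no sub-heading.
    """
    # Repeat each merged heading once per column it covers.
    expanded = [label for label, span in top for _ in range(span)]
    if len(expanded) != len(sub):
        raise ValueError("top-row spans must cover every sub-column")
    names = []
    for upper, lower in zip(expanded, sub):
        # Append the sub-heading only when it adds information.
        names.append(upper if not lower or lower == upper else f"{upper} {lower}")
    return names


def select_columns(header: list[str], rows: list[list[str]], wanted: list[str]) -> list[list[str]]:
    """Keep only the `wanted` columns, in the order given."""
    positions = [header.index(name) for name in wanted]  # ValueError if a name is missing
    return [[row[i] for i in positions] for row in rows]


if __name__ == "__main__":
    # Hypothetical headings: "Annual" and "Hourly" each merged over two sub-columns.
    header = flatten_header(
        [("Country", 1), ("Annual", 2), ("Hourly", 2)],
        ["", "Nominal (US$)", "PPP", "Nominal (US$)", "PPP"],
    )
    print(header)
    rows = [["Country A", "1000", "2000", "0.50", "1.00"]]
    print(select_columns(header, rows, ["Country", "Hourly Nominal (US$)"]))
```

Naming columns explicitly, rather than relying on their position, keeps a later lookup working even if the source table gains or loses a column.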
08:00 - 08:30 We've got a graphic and then we've got a link. I just want Afghanistan. I just want the name of the country to be
in there. I don't want its flag and I don't want the
rest because I want to be able to look up that country so that I can find either its
annual or its hourly wages so that I can compare that to the cost of a Big Mac. We need to clean these data and make sure
that they are structured in the way that we
08:30 - 09:00 want them, and that's what we'll deal with
in the next video. In our next video, Data III, we'll look at
how we're going to use Word to clean up those data. We're going to use Word as if it's a text
editor. Now on Windows, normally you would use something
like Data Studio, or there are a number of other programs available, a number of other
applications. On a Mac, you'd use something like BBEdit,
but because a lot of schools and a lot of
09:00 - 09:30 teachers won't necessarily have access to
those (they are free programs, but teachers may not be able to download them), we're going
to use Word, and in doing so, we'll explore some of the powerful search and replace options
that are available within Word on both Mac and Windows. [music]
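The next video does this cleanup with Word's search and replace. For readers who want to see the same idea as code, the sketch below uses Python regular expressions, whose syntax differs from Word's wildcard mode. It assumes that, once pasted as text, a country cell holds the country name plus leftovers such as non-breaking spaces where the flag image sat and bracketed footnote markers like `[5]`; the actual leftovers depend on how the table was copied, so treat the patterns as assumptions to adjust. `clean_country` is a hypothetical name.

```python
import re

FOOTNOTE = re.compile(r"\[[^\]]*\]")  # any bracketed marker, e.g. "[5]" or "[a]"
WHITESPACE = re.compile(r"\s+")       # runs of spaces, tabs and newlines


def clean_country(cell: str) -> str:
    """Reduce a pasted country cell to just the country name."""
    text = cell.replace("\u00a0", " ")  # non-breaking spaces -> ordinary spaces
    text = FOOTNOTE.sub("", text)       # drop footnote markers
    text = WHITESPACE.sub(" ", text)    # collapse leftover whitespace
    return text.strip()


if __name__ == "__main__":
    # Hypothetical pasted cells; real ones may carry different debris.
    for raw in ["\u00a0Afghanistan[5] ", "South\u00a0Africa [12][a]"]:
        print(repr(clean_country(raw)))
```

The bracket pattern removes any bracketed text, which is safe for a column that should contain only a country name but would be too aggressive for free-text columns such as the notes.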