Data knowledge and skills tutorial - Part 3: structuring data in a spreadsheet
Estimated read time: 1:20
Learn to use AI like a Pro
Get the latest AI workflows to boost your productivity and business performance, delivered weekly by expert consultants. Enjoy step-by-step guides, weekly Q&A sessions, and full access to our AI workflow archive.
Summary
In this informative tutorial, the Australian Curriculum explores how to structure data in a spreadsheet, making it easier to work with in Excel. The guide takes you through converting data tables to text for manipulation, addressing issues like unwanted formatting marks and links, and finally converting text back to organized tables. This step-by-step approach highlights both the challenges and solutions in prepping data for analysis, ensuring it is clean and structured for optimal use. The video emphasizes the importance of understanding non-breaking spaces and using advanced search and replace functions to streamline data organization.
Highlights
Convert tables to text to manipulate and clean data easily in Excel. π
Understand the role of formatting marks and how to deal with them. π«
Replace non-breaking spaces with tab characters for better table organization. β
Fix hyperlinks to avoid accidental website navigation while working. π
Learn to effectively use the search and replace function to manage data spacing. π
Key Takeaways
Learn how to convert tables to text for better data manipulation in Excel. π§
Discover the importance of handling non-breaking spaces in data. βοΈ
Fix pesky links in your spreadsheet that might cause navigation annoyances. π
Master the art of search and replace to clean up your data efficiently. π
Use simple keyboard shortcuts to transform your data experiences. β¨οΈ
Overview
The magic of structuring data in spreadsheets is unraveled in this compelling session. You start by converting data tables to textβaddressing issues like invisible formatting marks and unwanted hyperlinks that can disrupt your workflow. Converting back to tables allows for a more organized data set, making manipulation in Excel seamless and efficient.
Say goodbye to data-related headaches such as pesky links and invisible characters! With the Australian Curriculum's tutorial, you become adept at using advanced search and replace techniques, transforming non-breaking spaces into useful tab delimiters, and ensuring your spreadsheet is clean, structured, and free from annoying distractions.
Embrace the power of your computer as you tackle complex data management with ease. From converting text back to well-organized tables to performing quick fixes with simple commands, this tutorial offers a deep dive into the efficient management of spreadsheet dataβensuring you're prepared for any data-driven challenge.
Chapters
00:00 - 00:30: Introduction The chapter titled 'Introduction' outlines the initial steps of a data exercise. It involves gathering data, verifying its authenticity, and then organizing it for use in Excel. The process begins with converting a data table into text for further manipulation.
00:30 - 03:30: Text Processing and Data Structure This chapter focuses on the concept of text processing as distinct from word processing. It describes a process of modifying the structure of data from a column format to a continuous format separated by tabs. This transformation is explained to have a particular purpose, which is intended to be discussed further. Additionally, the importance of a specific tool or technique to assist in this task is implied but not fully articulated in the excerpt provided.
03:30 - 05:30: Advanced Search and Replace Chapter Title: Advanced Search and Replace
Summary: The chapter highlights the functionality of the 'show invisibles' feature which reveals formatting marks such as tabs, carriage returns, and paragraph markers. These visual cues help in understanding how data is organized within a document, providing a foundation for advanced search and replace operations.
05:30 - 07:00: Manual Data Cleanup The chapter 'Manual Data Cleanup' discusses the intricacies involved in identifying and removing unwanted spaces in a dataset. It highlights the difference between regular spaces and non-breaking spaces, emphasizing the need to develop a pattern or method that can efficiently identify and eliminate these unnecessary elements. The goal is to clean up the data by removing these flags without disturbing the necessary content.
07:00 - 09:30: Finalizing Data Structure In the chapter titled 'Finalizing Data Structure,' the discussion revolves around the peculiarities encountered when importing data from web pages or tables, such as inconsistent paragraph separations. The focus is on addressing these issues by reverting changes to check the initial table format. The aim is to fine-tune the data structure for further processing.
09:30 - 10:00: Conclusion The Conclusion chapter discusses some challenges encountered with text manipulation due to embedded links in the document. It highlights the inconvenience caused by accidental clicks leading to webpages, which disrupts the text editing process. The chapter briefly mentions a solution to separate flags and country names using a specific symbol, anticipating its usefulness for future adjustments.
Data knowledge and skills tutorial - Part 3: structuring data in a spreadsheet Transcription
00:00 - 00:30 So our data exercise so far has seen us
gather some data, check its authenticity, and now we've got to structure it so that we
can work with it in Excel. Okay, so the first thing I'm going to do is I'm going
to just click anywhere in that table and then go and convert that table to text
because I want to do some text
00:30 - 01:00 processing, which is quite different to
word processing, and you'll see why in a minute. I want to separate the text with
tabs which means that the structure of the data at the moment in columns is
going to be turned into continuous data separated by tabs instead of a column,
and I need to do that for a reason I'll go into in a minute or so. So once we've
done that, it comes up like this, straight text, and what's really helpful at this
point in time is to turn on the little
01:00 - 01:30 'show invisibles' up the top here, the
formatting marks. If I click that I get to see all these little bits of business
that determine how the data are put together. So I can see these little
arrows are tabs. This little backward- facing P thing is a carriage return or a
paragraph marker and I can also see some
01:30 - 02:00 other funny-looking things. This thing
here is a space and this thing here is a hard or non-breaking space. Now, it's
kind of interesting, because what I'm looking for here is a pattern which will
enable me to get rid of these flags because I don't want them. And you may
also notice that some of the lines of
02:00 - 02:30 data are separated with two paragraphs
and some of them are separated with one. This is just one of the weird things
that happens when you bring stuff in from a webpage or a web table. Right, so, what I'm going to do is just to go back to
where I was so I'm going to undo and go back to the tables and the reason why I
want to do that is that I think I can
02:30 - 03:00 use this little thing here β whoops, that's
the other problem with this β it's got links in it; everything's linked, so if
you click on it, off you go to a webpage. That's a bit annoying if we want to
be able to manipulate text because we'll accidentally click on things that will
go off. Anyway, we'll we'll fix that later on. But to separate the flag and the name
of the country there's that little round, that little circley thing. Now that
may come in handy
03:00 - 03:30 because what I could do is to replace
that with a tab character and that would give me the flags all in its own column
when I convert it back to a table, which is what we'll do after we have fiddled
with the text so that we can put the table into Excel. The problem is that
over here we've got other little circles like that, and that's going to cause some
issues, so what I'm going to do is to
03:30 - 04:00 undo and take ourselves back to here and
just make that a little bit smaller so that we can see what's going on. There we
go. What am I interested in? I'm actually interested in this column here;
that's the only column that I'm interested in. Before we start
manipulating the text I'm going to get rid of things and that will get rid of these
little circley things here because the only circley things that we'll have,
non-breaking spaces, are going to be between the flag and the name of the
country. Okay, so I've got all that there,
04:00 - 04:30 so all selected, so what we can do now is
to delete that column. Beauty, that's gone. Now we'll do the same thing here and
delete those columns. Now we've got a little bit of a problem here because 'Hourly' is
spread across two columns, which is a bit annoying, so what I'm going to do is to
delete that cell and I'm going to β let's try shifting
cells left and see what happens. There we
04:30 - 05:00 go. Okay, well that's not too bad.
And I'll delete those cells, and let's try the same thing. Beautiful! Okay, so now
we've just got 'Country' and 'Nominal US' and we've got these PPP
things and whatnot so let's get rid of those. That's going to be a bit interesting.
Let's get rid of Country as well. Let's delete that whole row. Ah, there we go.
Beautiful! And now we can get rid of
05:00 - 05:30 these things here. I'm going to select
those, and we might want to keep the reference, the date reference there. It
doesn't appear to have any of those little circley things, so that's okay.
We'll probably keep that because that might be handy to refer back to later,
and I'm now going to right-click and right-click β thank you, come on Word β and
I'm going to delete those two columns.
05:30 - 06:00 So now I've got some quite nice structured
data that has the country and then the US price etc. In fact, we probably don't even
need that whole thing but we'll leave that in place for the moment. Now what I
want to do is I want to fix these links. You notice that every time you go near
the country name or the flag β or the country name anyway β it turns into a link
so that if I click on that accidentally I'm going to go into a world of pain
because it's going to take me off onto a
06:00 - 06:30 different website. So I'm going to turn
this back into just straight ordinary text and again I'm going to go up to the
table here and convert it from table back into text. I'm going to separate
them by tabs, as we've seen before, and now we're set up and we're ready to take
the next step. You'll notice that here is that little non-breaking space, so what I
want to do is to replace all of those
06:30 - 07:00 with a tab character so that then
should give me the flags in one column, the country in another, the
dollar price in US dollars per hour in another, and then when it was measured in
the last one. And that's going to be pretty handy. So I need to do a pretty advanced
search and replace. Ctrl H on Macs or Windows will bring up a search and
replace box over here on the left-hand side and we can put in
a word or whatever we want to find and
07:00 - 07:30 replace but we've also got a dropdown
here which enables us to change things. Down the bottom here we've got
non-breaking spaces so I want to change a non-breaking space. The code for that
is caret s, and what I'd like to change that into is a tab character. Okay, so,
let's see how we go. Let's do a replace all. They made 203 replacements and now
we've got the little tab characters
07:30 - 08:00 between the flag and the name of the
country. You beauty! But we've still got this problem where we have two paragraph
markers separating some and one paragraph marker separating the other. So
that's going to give us some empty rows when we get it into Excel. But we can use
exactly the same technique. We can say: take a paragraph mark and wherever you
see two of them β I'll have to type this
08:00 - 08:30 in manually, shift 6 and P β whenever you
see two paragraph markers, I want you to replace it with one paragraph marker. Just move that to the right a little bit so you can see that a little bit more
easily. I'm going to take two paragraph markers and replace them with
one and I'm going to say replace all and look at that, all fixed. You beauty! So now
I'm going to click once more into here β
08:30 - 09:00 actually, I'll get rid of that nominal US
dollar line; I think I'll get rid of the whole thing. I don't really need that for the moment. I will now select all β CTRL A on a PC,
Command A on a Mac β and I'll go back up into my
table menu and I will convert that text back into a table and we want to
separate that with tabs and let's just see what that looks like.
Look at that, fantastic! Now we've got a
09:00 - 09:30 few empty columns here but that doesn't
really matter, that's fine. I notice that there's no value for Austria. Oh well,
there you go. I'm just going to take a quick scroll through
here and have a look and see that everything's behaving itself, and that
certainly looks to be the case. We could scroll down to the bottom and see if
there's any β¦ no, it all looks good. Fantastic! Wait a minute, I saw some
interesting stuff down the bottom here.
09:30 - 10:00 What happened there? Aha, looks like Nepal
has played up a bit. So I'm going to take Nepal and copy it and put it over here,
and take that one and that one and put it back into the right place. It looks
like it's just Nepal that's done that but we'll β oh, we've still got those
blessed links, which we'll have to get rid of in a minute. Okay, so I'm going to
take Nepal and copy it or cut it and
10:00 - 10:30 paste it in over here. Sometimes
you need to do a little bit of manual labour on data when you're cleaning
it up and if we were dealing with millions of records there are more
powerful tools than Word that we could use to do that. So it looks like it's
just that; we'll just do a quick scan through to see if there are any others. Ah, there's a Switzerland; typical of Switzerland! Okay, so we'll go and grab that and it
didn't have any data by the look of
10:30 - 11:00 things. We'll cut that out, and
pop that into there. Okay, so we won't miss out on information about
Switzerland. So there you go. Everything looks okay. Terrific!
Right, so now we can click up the top here to select that entire column, and
guess what? We're now able to delete that column and get rid of all the flags.
Isn't that fantastic! Okay, one more thing to do is to get rid of all these links. So when you click over here you've got
11:00 - 11:30 links. Again, this has to be a text
operation; straight text without a tabular format so before we go any
further I'll just get rid of these things here so that they're out of the
road β delete that column, thank you very much. And now we have to invoke something which is quite special. Now this is documented on the Microsoft site
but how do I get rid of all of those links? Now I could go and select it
carefully and then right-click it and go
11:30 - 12:00 down to Link here and remove the link
but that's going to take a long time. We've got a computer so let's make it do
stuff for us a little bit more easily. So to save you the hassle of going to find out
where this is, on Windows if you hold those keys down β CTRL, SHIFT and then
press F9, which is a function key, or on a Mac you might need to add the function
command so that the F9 key doesn't do
12:00 - 12:30 fast-forward on your keyboard but it
does F9 stuff in Word. So let's put that window back to where it came from. Select all, and we will do β because I'm on a Mac I'll do function, command,
shift and press F9. Right, so now you'll
12:30 - 13:00 notice when I hover over the top of these
things it's no longer a link. Fantastic! Okay, select all. Turn it back into a table. Strictly speaking, this isn't necessary
because Excel will understand tab delimited text. I'm just doing this for
for the sake of it. So copy and now swap back to Excel and the Excel
document that we want to work on is the one that has the Big Mac pricing in it.
You know, that came from the previous blog. But
13:00 - 13:30 before I paste stuff in I'm going to
paste it into a separate sheet and I'll just call this 'Minimum wage'. Click in row 2 because I want to put labels at the top here to help me, and
paste. Again, it's a fairly large dataset, might take a minute. There it goes. So now
we can put in a header for this at the top; we can say that that's Country, and
this is the US dollar per hour, and then
13:30 - 14:00 here is when that data was collected if
we need to refer back to that at some stage later. Let's select all three of
those and double-click the little division between them. You'll notice that
the cursor changes shape and that will automatically size the things for
the contents. Sometimes that doesn't quite work properly so Bosnia and
Herzegovina, for argument's sake β¦ There we
14:00 - 14:30 go, that's a bit better. We just do that
one manually. Okay, so now we've got every single country listed and where possible we've got their minimum wage. Great, so what to do with that? Because back here we have got all the countries. Now they're not of all
countries in the world, as we saw on the map. So which countries are here and
which countries are here and how do I
14:30 - 15:00 get them to line up is going to be a
pain in the neck if I do that manually but again we've got a computer so let's
do something computery. And that's what we'll do in the next video, where we'll
finish this task.
[JS1]