DescriptiveStats1
Estimated read time: 1:20
Summary
In this lecture led by Erin Heerey, descriptive statistics are discussed as foundational tools for summarizing data. The session highlights the importance of graphical representations of data, referencing Florence Nightingale’s contributions. Heerey also recounts historical anecdotes such as Dr. James Lind’s scurvy trial, illustrating the need for accurate theories and representative samples in research. The lecture underscores that descriptive statistics are vital for drawing meaningful conclusions from data and for understanding and estimating population characteristics.
Highlights
- Descriptive statistics help summarize and find patterns in data 📊.
- Florence Nightingale used data visuals to shape hospital sanitation policy 🌸.
- Dr. Lind's scurvy experiment highlights the need for accurate theories, not just good data 🍋.
- Descriptive statistics are foundational to making inferences from statistical data 🔍.
- Graphs and charts reveal data patterns that are easy to miss in spreadsheets 📉.
Key Takeaways
- Descriptive statistics are essential for summarizing data and understanding its patterns 📊.
- Florence Nightingale pioneered the use of graphical data representation to influence policy decisions 🌸.
- Dr. James Lind's scurvy experiment showcased the significance of accurate theories and proper sampling 🍋.
- Understanding graphical representations helps in viewing data patterns that are not evident in raw numbers 📉.
- Descriptive statistics are not just about numbers; they are about drawing insights and conclusions about populations 🔬.
Overview
Erin Heerey begins the lecture by delving into descriptive statistics, highlighting how they serve as foundational tools for summarizing data. The focus is on understanding central tendency and dispersion, and how graphical representations like charts and graphs can reveal hidden patterns in data. Heerey underscores the importance of visuals for better comprehension and informed decision-making.
The lecture is sprinkled with engaging historical anecdotes, most notably Florence Nightingale’s use of graphical data to influence sanitation practices during the Crimean War. Her ‘Rose Diagram’ (also called the polar area diagram) showed that many deaths in military hospitals were caused by poor sanitation and were preventable, making the case for policy changes and demonstrating the practical application of descriptive statistics in real-world scenarios.
Finally, Heerey shares the 18th-century story of Dr. James Lind, who conducted one of the earliest clinical trials aboard a British Navy ship. His experiment, and the incomplete theory that followed from it, underscores the importance of accurate theories and representative samples in research, showing the relationship between theory, data, and inference. It's a compelling narrative on how descriptive statistics go beyond numbers to offer insights about populations.
Chapters
- 00:00 - 00:30: Introduction to Descriptive Statistics. The chapter introduces the concept of descriptive statistics, which focuses on summarizing and describing data. It covers key topics such as measures of central tendency and dispersion, as well as interpreting graphical representations of statistical data. The chapter begins with a question about recognizing a famous statistician.
- 00:30 - 01:30: Florence Nightingale's Contribution. Florence Nightingale is a notable figure both as a nurse and a statistician. She was the first woman to be elected as a Fellow of the Royal Statistical Society, and is recognized for her innovative approach to presenting data visually.
- 03:00 - 05:00: The Importance of Descriptive Statistics. This chapter explains why descriptive statistics are fundamental to understanding a data set. Because most studies sample only part of a population, summary statistics are needed to reveal patterns that raw data hide and to quantify the uncertainty in a sample, providing the facts on which later inferences depend. The chapter defines descriptive statistics as analysis that produces meaningful summaries of data and identifies patterns, such as how data vary across groups or conditions.
- 10:30 - 12:00: Dr. James Lind's Scurvy Experiment. The chapter describes the final treatment groups in Lind's experiment: an extra portion of ship's grog, apple cider vinegar, and lemon and lime juice mixed with grog. Within about a week and a half, the two sailors in the citrus group had recovered enough to return to duty, while the vinegar group improved more slowly and less thoroughly, leading to the conclusion that citrus juice was good for scurvy.
DescriptiveStats1 Transcription
- 00:00 - 00:30 All right. Let's begin the material for this course. We're going to start by talking about descriptive statistics. The goal for this lecture is to talk about these basic summaries of data including quantifying measures of central tendency and dispersion. We're also going to talk about how to read graphical representations of these summaries. I'll start by asking, does anyone recognize this famous statistician?
- 00:30 - 01:00 Odds are, some of you thought you recognized this person until I said the word statistician. This is also a famous nurse. Her name is Florence Nightingale, and she was the first woman ever to be elected as a Fellow of the Royal Statistical Society. She's interesting because she really started this idea of showing data, not just presenting
- 01:00 - 01:30 lots and lots of numbers and little spreadsheets that are crabbed and difficult to make sense of, but she helped pioneer this idea of using graphical displays of data to help shape understanding of the data and therefore shape policy. A famous quote that she wrote in a letter in August of 1857 was, "Whenever I am infuriated I avenge myself with a new diagram." So her diagram,
- 01:30 - 02:00 this is a picture of one of her most famous diagrams. It used to be called the Nightingale Rose Diagram; it's also known as the polar area diagram. What she did was diagram the excess deaths associated with the Crimean War. The British were fighting in the Crimean War, and there were a lot of excess deaths associated with poor sanitation conditions in hospitals. She documented these deaths and showed that they were preventable and, in fact, how to prevent
- 02:00 - 02:30 them. These were things like digging latrines far away from drinking water, and having doctors wash, or at least in some way sanitize, their hands between seeing patients to avoid the spread of communicable diseases. So she's responsible for some of the earliest sanitation practices that we see in hospital settings around that time, and she did that by making graphs. So looking
- 02:30 - 03:00 at your data is probably one of the most important things you can do, because a graph or visualization of the data shows patterns that you wouldn't see if you were just looking at a sheet with hundreds or thousands or millions of numbers on it. We're going to talk about a number of different graphs over the course of this lecture. I'm going to talk to you about how to read them and what they mean, and we're going to talk about the statistics that they show. So let's first begin with the idea of descriptive statistics. Why are we learning this? Let's talk about the
- 03:00 - 03:30 elephant in the room. I hope this helps you understand why this is probably one of the more important elements of statistics that you can and should take away from this course. Descriptives are absolutely fundamental to understanding a data set. We can't draw conclusions about what data mean without a summary of those data, so these summary statistics allow us to see patterns in the data where the raw data would be too detailed to reveal them. One reason that's important
- 03:30 - 04:00 is that most of the studies we run only sample the available population; we don't sample everyone in a population. Populations like, say, all the people in the world are just far too large. So descriptive statistics can really help us quantify the uncertainty in the sample, and that eventually allows us to make inferences (we're not talking about that yet). Without descriptive statistics, we
- 04:00 - 04:30 can't make inferences about how things work or about populations, because those inferences need to be based on facts. They need to be based on things we know about the data we have observed, and descriptive statistics quantify some of those facts from the data we specifically collect. So what do we mean by descriptive statistics? It's the term given to the form of data analysis that produces meaningful summaries of data and identifies patterns in the data, like how
- 04:30 - 05:00 data vary across groups or across conditions, or the relationships between different variables (we won't talk about relationships in this lecture; we'll give that its own lecture another time). Then there's also the notion of uncertainty: the variability within a data set tells us how certain we can be about the patterns that we see. We're going to talk about how to get measures of that variability in this lecture, as well as in this week's lab.
- 05:00 - 05:30 So descriptive statistics are facts that describe a specific data set, and they are critical to capturing how well a sample fits the population from which it was drawn. They're also critical to our ability to make estimates about a population. Why are they important? Well, they help us understand the world and how it works by allowing us to gain insight from the specific data that we have collected.
- 05:30 - 06:00 So descriptive statistics allow us to describe a data set that we have collected. The statistics we calculate become estimators that tell us about the population, and without accurate descriptions of a sample we cannot make good inferences. And remember, there's a lot more to making good inferences than just descriptive statistics. In
- 06:00 - 06:30 fact, we need research methods for that as well. Accurate inferences require accurate descriptions of data, but they also require, among other things, a good experimental design that allows us to rule out extraneous or confounding variables. We need representative samples, so that the population is accurately represented by the sample we've drawn, and the population needs to be the correct population with respect to whom
- 06:30 - 07:00 the theory we're interested in applies to. And finally, we need accurate theories. Theories are always oversimplifications; they are always abstractions of how a process works in real life. But a theory needs to be close enough to the ground truth to be a reasonable description of that process. Now, this isn't going to be on the test, but I'll give you an example. One of the very first randomized clinical trials was conducted by Dr. James Lind in 1747. Dr. Lind
- 07:00 - 07:30 was a physician, or a surgeon as he was known at the time, on board a British Navy ship. Here's what was happening during the 1700s and in the centuries before. Basically, between about 1500 and probably 1850 or so, what were the British out doing? They were out colonizing various places in the world. If you're going to colonize the world,
- 07:30 - 08:00 you have to sail for a pretty long time before you get to a friendly port where you can do things like take on fresh water and fresh food. So, as the British began to colonize the world, and they certainly did that, they were going on very long voyages; they would be out at sea for three or four or sometimes even six months at
- 08:00 - 08:30 a time, and during those voyages sailors would fall ill with a disease called scurvy. Scurvy is awful: your bones disintegrate inside your body, and you develop all kinds of lesions. We know today that scurvy is caused by a deficiency in vitamin C. Vitamin C is a substance that our bodies cannot make; we have to take in food that contains it in order to avoid scurvy.
- 08:30 - 09:00 Because they were out sailing for such long periods of time, sailors on every voyage would fall ill with scurvy, and many of them would die. Dr. James Lind was on one of these voyages, and at one point (because this tended to happen relatively regularly) he had 12 sailors who were all suffering from scurvy, to the extent that they
- 09:00 - 09:30 were no longer able to carry out their ship's duties. They were consigned to the sick bay. Dr. Lind decided that he would try some different treatments to see what worked. He had six treatments, all of which, or at least most of which, involved adding something to the sailors' diet. One of the treatments was, of course, a control treatment in which nothing special was added to
- 09:30 - 10:00 the sailors' diet; they got the same rations, the same water, the same everything else. The second one was adding salt water to the sailors' diet. The idea there was that salt kills lots of bacteria, so it's a good preservative. For example, if you've ever eaten a piece of beef jerky, you've eaten a lot of salt, and there are all kinds of cultures in which salt is used to preserve various vegetables and meats. Lind thought that drinking salt water might be one
- 10:00 - 10:30 possible way of helping to cure scurvy, and so the sailors who were assigned to that condition got a portion of salt water added to their diet. The third condition that he used was something called elixir of vitriol. Don't try this at home, folks. Elixir of vitriol is sulfuric acid. It's relatively poisonous.
- 10:30 - 11:00 Another group of sailors got an extra portion of ship's grog added to their diet, which I'm sure nobody was disappointed about. Ship's grog is rum, so they had an extra portion of watered rum as part of their diet. And then the final two groups: one of those groups got apple cider vinegar as part of the treatment, and the other group got the juice of lemons and limes every day, mixed together with their ship's
- 11:00 - 11:30 grog, and hence we have a cocktail. It turns out that the group getting citrus juice mixed in with their grog was doing relatively well. In fact, within the first week or week and a half of the trial, both of the men who had been randomly assigned to that group (these were all male sailors on this boat)
- 11:30 - 12:00 had healed enough to return to their normal duties. The group that got apple cider vinegar also got a little better, not as quickly and not as thoroughly as the citrus group, but they got better as well. So the idea became that lemon and lime juice were good for scurvy, and that was all well and good: scurvy was considered cured for a very long time as far as the British Navy was concerned.
- 12:00 - 12:30 Now, the problem is, scurvy wasn't really cured. Here's what happened, and this is the bit where you need an accurate theory. Lind publishes a paper in which he claims that lemon and lime juice, the juice of citrus fruits, cures scurvy. On the back of this advice, the British Navy starts to bottle lemon and lime juice and send it out with their sailors. It's boiled to purify it and then it's bottled
- 12:30 - 13:00 so that it can be kept over the long term. Now, what you might not know is that boiling citrus juice destroys the vitamin C. Vitamin C is very unstable and breaks down when it's heated, so that turned out not to be a good idea. Or rather, it was a good idea, but it
- 13:00 - 13:30 wasn't quite the right theory, and nobody knew it, because by the time they started doing this, the journeys sailors were making were substantially shorter. With shorter travel times, sailors got off the ships more quickly. They got on land more quickly, where they could eat fresh fruits and vegetables and replenish the vitamin C in their systems
- 13:30 - 14:00 before getting back on the boat again. So scurvy didn't rear its ugly head again until the Shackleton expeditions, where they were walking toward the poles for very long periods of time, carrying things like dried beef and not much in the way of dried vegetables to go with it. Scurvy reappeared on those expeditions, and that's when they discovered
- 14:00 - 14:30 vitamin C, which was the active ingredient in the treatment. So, in order to make good inferences, we actually need to have all the pieces in place. We need our descriptive statistics, and we need a good experimental design, and James Lind's design was pretty good. His sample was actually representative of the population of human beings for this purpose, even though he had only men, and probably only white men.
- 14:30 - 15:00 My guess is he didn't have a very diverse crew. However, people are people in this domain: if you don't get enough vitamin C, you will eventually get scurvy. So in this case it didn't matter that he didn't have a truly representative sample; his sample was representative of the group to which the theory applied. And finally, they needed an accurate theory. The theory that was created on the back of this experiment wasn't
- 15:00 - 15:30 quite the right one; it was an oversimplification of what was really needed. So this is an example of the kinds of things we need to combine when we think about statistics. We can't just think about the statistics. You have to think about all the other pieces as well when you are considering what the real story is and what a statistic tells you. So descriptive statistics start with the facts. They give us the facts
- 15:30 - 16:00 from specific data sets, and the inferences we make rely on the presence of those facts. The facts are the things we know about our data: What population did we sample? What sampling methods were used? Was it a random sample or a convenience sample? How were participants assigned to groups? Were they randomized? In the James Lind example, he used a die: by casting it, he assigned participants to groups
- 16:00 - 16:30 until he had two people in each group. What are the relationships between the variables? What do the distributions of the data look like: the range, the standard deviation, and the variance? What do the central tendencies in the data look like? These facts we use when we talk about data sets are critically important elements of descriptive statistics, and that's why we need them. I'm going to cut the lecture here, and we'll pick up in the next portion of the lecture.
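The lecture names the range, the standard deviation, the variance, and measures of central tendency as core descriptive facts about a data set. As a minimal sketch (not part of the course materials), the Python below computes them with the standard-library `statistics` module. The `describe` helper and the scores are hypothetical and were invented purely for illustration. It assumes the data are a sample, so it uses the n - 1 (sample) variance and standard deviation.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers (hypothetical helper).

    Uses the sample (n - 1) variance and standard deviation, the usual choice
    when the data are a sample drawn from a larger population.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate variability")
    return {
        "n": len(values),
        # Central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Dispersion
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance: divides by n - 1
        "sd": statistics.stdev(values),           # square root of the sample variance
    }


if __name__ == "__main__":
    # Hypothetical scores, made up for illustration only.
    scores = [4, 8, 6, 5, 3, 7, 9, 6]
    for name, value in describe(scores).items():
        print(f"{name}: {value}")
```

For these made-up scores, the mean and median are both 6, the range is 6, the sample variance is 4, and the standard deviation is 2. If the values covered an entire population rather than a sample, `statistics.pvariance` and `statistics.pstdev` (which divide by n) would be the matching choices.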