Understanding Samples and Populations

Sampling&Distributions1

Summary

In this lecture, Erin Heerey introduces sampling and distributions, two ideas that underpin statistical inference. She stresses the distinction between sampling distributions and population distributions and explains how a sample acts as a model for the population it was drawn from, so that researchers can generalize from concrete data to abstract population characteristics. The lecture also touches on methods of sampling and how they influence research conclusions. To illustrate the two levels of thinking involved, she compares statistics to a car: a driver understands it abstractly, while a mechanic also needs concrete knowledge of how the parts fit together, and statistics calls for both perspectives.

Highlights

Erin Heerey breaks down the difference between sampling and population distributions seamlessly. It's like a trip into the land of statistics made fun! 🎢
Understanding samples as models for populations helps us see the big picture in research. 🖼️
Think of statistics like car mechanics; both need a balance of abstract knowledge and concrete action. Rev up your engines! 🚘
Distributions help describe data points and their density, offering insights into the broader population landscape. 🌍
Sampling is a crucial part of statistical inference, essential for painting a true picture of the population. 🎨

Key Takeaways

Sampling and distributions are foundational concepts in statistics; they let us draw conclusions about a population from the data in a sample. 📊
Samples are tools for abstraction, allowing researchers to make informed generalizations about populations. 🧠
Understanding the levels of abstraction in statistics—concrete (like specific samples) and abstract (like general population models)—is essential. 🤔
The lecture draws parallels between statistical thinking and mechanics, illustrating the importance of both high-level understanding and detailed knowledge. 🚗
Population distributions, often theoretical in psychology, require careful sampling to support sound real-world conclusions. 🔍

Overview

In this chapter of our statistical journey, Erin Heerey introduces the fascinating world of sampling and distributions. We learn the importance of distinguishing between sampling distributions and population distributions, both crucial in understanding the broader scope of statistics and research. Erin lays the groundwork by explaining how samples serve as abstract models for studying populations, providing insights into the complexities of human and social phenomena. The discussion also touches on the methodologies of sampling, showing how they influence the outcomes of research.

Erin continues to unravel these complex ideas by comparing the statistical process to the workings of a car. Just as driving requires an understanding of the abstract functions of a car alongside the intricate mechanics, mastering statistics demands a balance of concrete and abstract thinking. This analogy helps demystify the abstraction inherent in statistical analysis, making it more relatable and easier to grasp for learners.

The final segments of the lecture underscore the concept of distributions, which describe the 'sample space' by detailing potential data ranges and densities. By illustrating the differences and relationships between sample and population distributions, Erin helps us appreciate the theoretical nature of population distributions in psychology. She emphasizes the importance of sampling as a proxy for real populations, guiding researchers to make accurate generalizations while avoiding common pitfalls. Erin leaves us with a tantalizing glimpse of what lies ahead in the lecture series, promising an in-depth exploration of probability next.

Chapters

00:00 - 00:30: Introduction to Sampling and Distributions This chapter introduces the concept of sampling and distributions, focusing on sampling distributions, population distributions, and distributions of statistics. It also covers methods of sampling as part of the lecture's aim to familiarize students with these fundamental concepts essential to the coursework.
00:30 - 01:30: Understanding Samples as Models The chapter "Understanding Samples as Models" discusses the concept of samples being models for populations. It begins with an introduction to distributions and explains how a representative sample can provide insights into the behaviors of a population. The chapter emphasizes the abstract nature of samples because there is always uncertainty in how accurately they reflect the population.
01:30 - 02:30: Questions Answered by Samples The chapter discusses the importance of comparing models to real-world situations and the necessity of making generalizations. It elaborates on the process of conducting experiments where researchers draw a sample, run an experiment on this sample, and make conclusions about the broader population based on the sample. The chapter emphasizes that the ability to generalize findings to the population depends heavily on how representative the sample is of the actual population.
03:00 - 05:00: Concrete vs Abstract in Statistics This chapter discusses the role of samples as models in statistics, focusing on how they help answer questions about the form of a topic of interest. It explores how the topic is distributed within a population, its shape, and its appearance.
06:00 - 10:30: Introduction to Sampling Distributions The chapter "Introduction to Sampling Distributions" questions how a certain function or process works in a population and its interaction with other topics, participant characteristics, or environments. It emphasizes the significance of characteristics and models in understanding phenomena and their interactions.
13:00 - 14:00: Population vs Sample Distributions This chapter discusses the concept of discerning underlying traits that shape form or function based on the characteristics of a model. It relates the way we can think about models in a similar fashion to architectural models, such as those for buildings or bridges. The relationship between models built from samples and their relevance or connection to larger populations is emphasized.
15:00 - 18:00: Visualizing Sample and Population Differences The chapter focuses on understanding differences between sample and population data using visualization techniques. It compares the role of statistical models to architectural models in understanding interactions and environmental conditions, highlighting the need to work at different levels of abstraction when analyzing samples.
21:00 - 22:00: Conclusion and Next Steps This chapter emphasizes the importance of understanding statistical concepts at both an abstract and practical level. It highlights the distinction between model population functions and concrete examples such as hand calculations or coding tasks, like calculating the standard deviation covered in the previous lab. The chapter encourages integrating both levels of understanding for a comprehensive grasp of statistics.

Sampling&Distributions1 Transcription

00:00 - 00:30 This week's lecture is going to cover two topics. The first one is distributions and sampling and the second one is probability. We're going to start out with distributions and sampling. Now this is a topic that is extremely important to what we do, so the goal of the lecture will be to acquaint you with what we call sampling distributions and to talk about the differences between sampling distributions and population distributions and to talk about distributions of statistics. We'll also talk a little bit here about methods of sampling,
00:30 - 01:00 and we'll be going into that a couple of times over the course of this term. Let's start with distributions and what they tell us. Now one way to think about a sample is to think about a sample being a model for the population from which it was drawn. If that sample is representative, it can give you insight into how a population works on an abstract level; and it's abstract because we're never certain
01:00 - 01:30 how well the model compares to the real thing and that means we need to make generalizations. For example, if you are running an experiment, you draw a sample, you run those people in your experiment, and then you make conclusions about the average person in your population based on that sample. The degree to which you can make generalizations about your population from your sample will depend on how representative your sample is of your actual population.
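The point about representativeness can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of ages (uniform between 0 and 90), the sample size of 200, and the "ages 18 to 25" subgroup are all hypothetical choices made for illustration and do not come from the lecture.

```python
import random
import statistics

# Hypothetical population: 100,000 ages drawn uniformly between 0 and 90.
rng = random.Random(0)
population = [rng.randint(0, 90) for _ in range(100_000)]

# A random sample gives every member of the population the same chance
# of being selected.
representative = rng.sample(population, 200)

# An unrepresentative sample: only people aged 18 to 25 can be selected
# (for example, if only students were recruited).
young_adults = [age for age in population if 18 <= age <= 25]
unrepresentative = rng.sample(young_adults, 200)

# Compare how far each sample mean lands from the population mean.
population_mean = statistics.mean(population)
representative_error = abs(statistics.mean(representative) - population_mean)
unrepresentative_error = abs(statistics.mean(unrepresentative) - population_mean)

print(f"population mean age:         {population_mean:.1f}")
print(f"random sample is off by:     {representative_error:.1f} years")
print(f"restricted sample is off by: {unrepresentative_error:.1f} years")
```

In this setup the random sample's mean should land close to the population mean, while the restricted sample's mean cannot, because everyone in it is between 18 and 25.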
01:30 - 02:00 So, in thinking about samples as models, they answer a couple of different kinds of questions. The first question they answer for us is a question about form - what does our topic of interest look like within our population, how is it distributed, what shape does it take, what does it look like?
02:00 - 02:30 We can then move on to function and ask the question, how does this thing that we're interested in work in this population? How does it work or interact with other topics? How does it work or interact with participant characteristics or processes or environments? And finally we can think about the characteristics and model those. Models can tell us something about the characteristics of a phenomenon,
02:30 - 03:00 so the idea here is whether or not we can discern underlying traits that shape form or function, based on the characteristics of the model. We can think about these models in some ways the same way you might think of an architectural model for a building or a bridge or something like that. The models that we build from our samples relate to our populations; and in
03:00 - 03:30 some ways function like architectural models to understand how people will interact with those structures, how environmental conditions will interact with those structures, and so forth. The other thing that you need to understand is that when we're considering statistics we need to consider the level of abstraction. Here, we need to be working at two different levels. In order to consider the workings of a sample, which are very concrete, and the way in which it might
03:30 - 04:00 model population functions - those are very abstract elements. To understand statistics we need to be working at both of those levels. If we're thinking about hand calculations or if we're thinking about writing code that calculates, for example, the standard deviation, as you did in last week's lab, that's a very concrete example. It's local, it's specific to the very piece of information, the
04:00 - 04:30 variable, that you're computing. It's specific to your population. So these are things that are local, concrete, or physical - the facts you get from your observed data. These are computations. They're statistics and the relationships within your sample. They're the graphs you get and so forth. But then we also need to think at a more global level - this more abstract level where we need to think about estimation of populations. We need to think about, could we model a population
04:30 - 05:00 by simulating it? What about the relationships? Could we see those in our models of the population? Are they likely to exist? These are things like understanding knowledge creation, hypothesis testing and so forth. In some senses, this is a little bit like thinking about driving a car. When we drive a car, we work at a very abstract level, right? You have the steering wheel - you know what it does. You have the gas pedal and the brake pedal - you know what those
05:00 - 05:30 do. And for most of us, if we opened the hood we wouldn't know what was going on underneath the hood. There are lots of little elements in there; we know they all work together. Most of us know where to put the windscreen washer fluid and oil in, but beyond that, most people don't have a very good working knowledge of what happens under the hood. Your mechanic, on the other hand, should have a very concrete working knowledge of how those individual pieces fit together and how they function as a system. So your auto
05:30 - 06:00 mechanic needs this kind of global abstract level to be able to drive cars and to work in that way, but they also need the local concrete physical elements and they need to understand the relationships between those in order to make your car work when you take it in for repair. So it's important to think at both of these levels. In statistics, what we're going to be doing is we're
06:00 - 06:30 going to be thinking at both the local concrete physical level when we think about specific samples and characteristics of samples, but we're also going to be thinking about these more global and abstract ideas - how do we generalize from a sample to a population? How do we make inferences? And how do we understand how our specific theories or phenomena work in a broader population? One of the things that allows us to connect the local physical elements of a sample and the larger
06:30 - 07:00 population is "sampling distributions". So we're going to start by unpacking what distributions are. First, distributions describe what we call the "sample space." They describe the range of possible data points for a given variable. They describe the density of scores at different values within the range. This is a little graphic of a distribution here
07:00 - 07:30 and what you can see is the data are more densely distributed at this data point than they are at this data point (there are more of them here). So we can look at the density of scores at different values over the range of possible scores. They (sampling distributions) are extremely important to statistics. The inferences that we can make depend on the distribution type, so the likelihood of getting any specific value in a random sample depends on how those data points are distributed within the broader population.
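The idea that a distribution describes both the range of possible values and the density of scores at each value can be sketched in a few lines. The scores below are made-up questionnaire values and the function name `relative_frequencies` is a hypothetical helper; neither comes from the lecture.

```python
from collections import Counter

# Hypothetical questionnaire scores, one per participant.
scores = [2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7]

def relative_frequencies(values):
    """Return the proportion of observations found at each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: counts[value] / total for value in sorted(counts)}

# The range of observed values and the density of scores at each value.
score_range = (min(scores), max(scores))
density = relative_frequencies(scores)

print(f"range: {score_range[0]} to {score_range[1]}")
for value, proportion in density.items():
    # One '#' per observation gives a rough text histogram.
    bar = "#" * scores.count(value)
    print(f"{value}: {bar} ({proportion:.2f})")
```

Values with longer bars are where the data are more densely distributed, so a value drawn at random from these scores is more likely to come from there.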
07:30 - 08:00 We also know that distributions come in different shapes. So scores or values are arranged in some order and they're plotted according to a frequency and there are a number of parameters that describe these shapes. We'll talk about those in a couple of slides. Let's think about this very concretely for now. Let's think about the distribution of a sample. We'll drop in a sample - this is what I'm creating here: a histogram of values of data.
08:00 - 08:30 These are data points in a single sample. Each point is a single observation. To interpret this sample, we need to know how likely each of these data points is to occur. So each of these data points represents the score of a different person. These are all... think about one person filling out a questionnaire and then maybe you calculate the average value for
08:30 - 09:00 the items in that questionnaire and that's what you're plotting on this histogram. So everybody in this histogram is contributing one score to the sample. That's the distribution of a sample. To understand how this sample is representative of, or how it might match the population, what we need to know is how likely each of these data points is to occur in the population distribution.
09:00 - 09:30 Another kind of distribution we can think about is a population distribution. The current population of Canada as of the 30th of December last year is 38 million give or take. The average age of this population is 41.9 with a median of 41.6 so that's a population distribution. Each point is a single observation from one participant just like the
09:30 - 10:00 distribution of a sample. Here, however, every possible participant is represented. So a sample is a selection, hopefully a random one, of people from a population. And a population is everyone who's in that population. What a population is, is going to depend on how you define it. So I could say the population of all Canadians and that would be a population distribution. I
10:00 - 10:30 could say the population of all Americans, and that would be a population distribution, so how I define it is going to shape what my population distribution looks like. But regardless, each data point in that distribution is a single observation from one participant. So in that way a population distribution and a sample distribution are similar. Now we don't really ever sample every person from a population unless your population is very
10:30 - 11:00 very small. So occasionally I see an article in Psychology where we have a population distribution that gets measured because, for example, you might be dealing with a very rare illness, in which case every possible member of that population might be measurable, but most of the time, population distributions are sort of theoretical.
11:00 - 11:30 So let's just review the difference between a sample and a population, because I might ask you to differentiate these graphs on an exam. If we think about samples, these are going to be small samples of people, randomly selected from this particular population. And these samples, some of them look very different, right? We have a sample here that looks like this - it was sampled from this population with these characteristics and what you can see is these
11:30 - 12:00 are all samples of exactly the same size and they look very different. They have different ranges. This one ranges from age 15 to age 40. This one ranges from 20 to 80. Here's one that ranges from 0 to 80. And they have different distributions of individuals within those samples; just from pure random sampling they look different. So sample distributions - one of
12:00 - 12:30 the things you'll notice is that they tend to be more irregular than population distributions simply because there are fewer people in them. Now that's not always true: this is a sample distribution, it looks quite regular, in fact it looks relatively normal. But we also can see that our frequency, our sample count here (in the sample), is a very small number whereas our population count here (in the population) is a very large number. So you can kind of see; one of the ways to tell the difference between a sample and a population distribution is how
12:30 - 13:00 big does it go. If we're dealing with super huge numbers we're probably dealing with populations. So are they real in psychology? As I said, they are mainly theoretical. We try to generalize to huge populations of people living all over the world and most of those populations need to be sampled. That means that we assume their distributions based
13:00 - 13:30 on the sample data. So we use a sample to be a proxy for that population - we are almost always generalizing from samples. I will stop there and we'll continue in the next section of the lecture.
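The comparison of same-size samples drawn from one population can be reproduced with a short simulation. This is a sketch under assumptions that are not from the lecture: the population is 100,000 simulated ages (normal around 42 with a standard deviation of 20, clipped to 0-100), each sample holds 20 people, and the helper `simulated_age` is hypothetical.

```python
import random
import statistics

rng = random.Random(42)

def simulated_age():
    """One hypothetical age: normal around 42 (SD 20), clipped to 0-100."""
    return min(100, max(0, round(rng.gauss(42, 20))))

# The "population": every possible participant is represented.
population = [simulated_age() for _ in range(100_000)]

# Three random samples of exactly the same size from that population.
sample_size = 20
samples = [rng.sample(population, sample_size) for _ in range(3)]

print(f"population: n={len(population)}, "
      f"range {min(population)}-{max(population)}, "
      f"mean {statistics.mean(population):.1f}")
for i, sample in enumerate(samples, start=1):
    print(f"sample {i}:   n={len(sample)}, "
          f"range {min(sample)}-{max(sample)}, "
          f"mean {statistics.mean(sample):.1f}")
```

Each sample is a proxy for the same population, yet the printed ranges and means will typically differ from sample to sample purely through random selection, and the counts show the size difference used above to tell a sample from a population.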