Understanding Samples and Populations

Sampling&Distributions1

    Summary

    In this engaging lecture, Erin Heerey elucidates the nuances of sampling and distributions, both critical to understanding population dynamics and statistical analysis. The session distinguishes between various distributions and their importance in statistical inference, stressing the distinction between sampling distributions and population distributions. The lecture further explores how samples act as models for populations, enabling insights into population characteristics and abstract generalizations. This exploration extends into methods of sampling and their influence on research conclusions. Her lecture is like a journey through statistics, where abstract concepts and concrete data points interact. She likens this statistical exploration to understanding a car, both from the user's abstract viewpoint and the mechanic's detailed perspective, highlighting the need for multifaceted understanding in research and statistics.

      Highlights

      • Erin Heerey breaks down the difference between sampling and population distributions seamlessly. It's like a trip into the land of statistics made fun! 🎢
      • Understanding samples as models for populations helps us see the big picture in research. 🖼️
      • Think of statistics like car mechanics; both need a balance of abstract knowledge and concrete action. Rev up your engines! 🚘
      • Distributions help describe data points and their density, offering insights into the broader population landscape. 🌍
      • Sampling is a central part of statistical inference, crucial for painting a true picture of the population. 🎨

      Key Takeaways

      • Sampling and distributions are foundational concepts in statistics, helping us understand population dynamics with precision and insight. 📊
      • Samples are tools for abstraction, allowing researchers to make informed generalizations about populations. 🧠
      • Understanding the levels of abstraction in statistics—concrete (like specific samples) and abstract (like general population models)—is essential. 🤔
      • The lecture draws parallels between statistical thinking and mechanics, illustrating the importance of both high-level understanding and detailed knowledge. 🚗
      • Population distributions, often theoretical in psychology, require careful sampling to make effective real-world conclusions. 🔍

      Overview

      In this chapter of our statistical journey, Erin Heerey introduces the fascinating world of sampling and distributions. We learn the importance of distinguishing between sampling distributions and population distributions, both crucial in understanding the broader scope of statistics and research. Erin lays the groundwork by explaining how samples serve as abstract models for studying populations, providing insights into the complexities of human and social phenomena. The discussion also touches on the methodologies of sampling, showing how they influence the outcomes of research.

        Erin continues to unravel these complex ideas by comparing the statistical process to the workings of a car. Just as driving requires an understanding of the abstract functions of a car alongside the intricate mechanics, mastering statistics demands a balance of concrete and abstract thinking. This analogy helps demystify the abstraction inherent in statistical analysis, making it more relatable and easier to grasp for learners.

          The final segments of the lecture underscore the concept of distributions, which describe the 'sample space' by detailing potential data ranges and densities. By illustrating the differences and relationships between sample and population distributions, Erin helps us appreciate the theoretical nature of population distributions in psychology. She emphasizes the importance of sampling as a proxy for real populations, guiding researchers to make accurate generalizations while avoiding common pitfalls. Erin leaves us with a tantalizing glimpse of what lies ahead in the lecture series, promising an in-depth exploration of probability next.

            Chapters

            • 00:00 - 00:30: Introduction to Sampling and Distributions This chapter introduces the concept of sampling and distributions, focusing on sampling distributions, population distributions, and distributions of statistics. It also covers methods of sampling as part of the lecture's aim to familiarize students with these fundamental concepts essential to the coursework.
            • 00:30 - 01:30: Understanding Samples as Models The chapter "Understanding Samples as Models" discusses the concept of samples being models for populations. It begins with an introduction to distributions and explains how a representative sample can provide insights into the behaviors of a population. The chapter emphasizes the abstract nature of samples because there is always uncertainty in how accurately they reflect the population.
            • 01:30 - 02:30: Questions Answered by Samples The chapter discusses the importance of comparing models to real-world situations and the necessity of making generalizations. It elaborates on the process of conducting experiments where researchers draw a sample, run an experiment on this sample, and make conclusions about the broader population based on the sample. The chapter emphasizes that the ability to generalize findings to the population depends heavily on how representative the sample is of the actual population.
            • 03:00 - 05:00: Concrete vs Abstract in Statistics This chapter discusses the role of samples as models in statistics, focusing on how they help answer questions about the form of a topic of interest. It explores how the topic is distributed within a population, its shape, and its appearance.
            • 06:00 - 10:30: Introduction to Sampling Distributions The chapter "Introduction to Sampling Distributions" questions how a certain function or process works in a population and its interaction with other topics, participant characteristics, or environments. It emphasizes the significance of characteristics and models in understanding phenomena and their interactions.
            • 13:00 - 14:00: Population vs Sample Distributions This chapter discusses the concept of discerning underlying traits that shape form or function based on the characteristics of a model. It relates the way we can think about models in a similar fashion to architectural models, such as those for buildings or bridges. The relationship between models built from samples and their relevance or connection to larger populations is emphasized.
            • 15:00 - 18:00: Visualizing Sample and Population Differences The chapter focuses on understanding differences between sample and population data using visualization techniques. It compares the role of statistical models to architectural models in understanding interactions and environmental conditions, highlighting the need to work at different levels of abstraction when analyzing samples.
            • 21:00 - 22:00: Conclusion and Next Steps This chapter emphasizes the importance of understanding statistical concepts at both an abstract and practical level. It highlights the distinction between model population functions and concrete examples such as hand calculations or coding tasks, like calculating the standard deviation covered in the previous lab. The chapter encourages integrating both levels of understanding for a comprehensive grasp of statistics.

            Sampling&Distributions1 Transcription

            • 00:00 - 00:30 This week's lecture is going to cover two topics.  The first one is distributions and sampling and   the second one is probability. We're going  to start out with distributions and sampling.   Now this is a topic that is extremely  important to what we do, so the goal   of the lecture will be to acquaint you with what  we call sampling distributions and to talk about   the differences between sampling distributions  and population distributions and to talk about   distributions of statistics. We'll also talk  a little bit here about methods of sampling,
            • 00:30 - 01:00 and we'll be going into that a couple  of times over the course of this term. Let's start with distributions  and what they tell us.   Now one way to think about a sample is to think  about a sample being a model for the population   from which it was drawn. If that sample is  representative, it can give you insight into   how a population works on an abstract level;  and it's abstract because we're never certain
            • 01:00 - 01:30 how well the model compares to the real thing, and that means we need to make generalizations. For example, if you are running an experiment, you draw a sample, you run those people in your experiment, and then you make conclusions about the average person in your population based on that sample. The degree to which you can make generalizations about your population from your sample will depend on how representative your sample is of your actual population.
            • 01:30 - 02:00 So, in thinking about samples as models,   they answer a couple of different kinds of  questions. The first question they answer   for us is a question about form - what does our  topic of interest look like within our population,   how is it distributed, what shape  does it take, what does it look like?
            • 02:00 - 02:30 We can then move on to function and ask the question, how does this thing that we're interested in work in this population? How does it work or interact with other topics? How does it work or interact with participant characteristics or processes or environments? And finally we can think about the characteristics and model those. Models can tell us something about the characteristics of a phenomenon,
            • 02:30 - 03:00 so the idea here is whether or not we can discern  underlying traits that shape form or function,   based on the characteristics of the model. We  can think about these models in some ways the   same way you might think of an architectural  model for a building or a bridge or something   like that. The models that we build from our  samples, relate to our populations; and in
            • 03:00 - 03:30 some ways function like architectural models to understand how people will interact with those structures, how environmental conditions will interact with those structures, and so forth. The other thing that you need to understand is that when we're considering statistics, we need to consider the level of abstraction. Here, we need to be working at two different levels: the workings of a sample, which are very concrete, and the way in which it might
            • 03:30 - 04:00 model population functions, which is very abstract. To understand statistics we need to be working at both of those levels. If we're thinking about hand calculations or if we're thinking about writing code that calculates, for example, the standard deviation, as you did in last week's lab, that's a very concrete example. It's local, it's specific to the very piece of information, the
            • 04:00 - 04:30 variable, that you're computing. It's specific  to your population. So these are things that are   local, concrete, or physical - the facts you get  from your observed data. These are computations.   They're statistics and the relationships within  your sample. They're the graphs you get and so   forth. But then we also need to think at a more  global level - this more abstract level where we   need to think about estimation of populations. We  need to think about, could we model a population
            • 04:30 - 05:00 by simulating it? What about the relationships?  Could we see those in our models of the   population? Are they likely to exist? These are  things like understanding knowledge creation,   hypothesis testing and so forth. In some senses,  this is a little bit like thinking about driving   a car. When we drive a car, we work at a very  abstract level right? You have the steering   wheel - you know what it does. You have the gas  pedal and the brake pedal - you know what those
            • 05:00 - 05:30 do. And for most of us, if we opened the hood we wouldn't know what was going on underneath the hood. There are lots of little elements in there; we know they all work together. Most of us know where to put the windscreen washer fluid and oil in, but beyond that, most people don't have a very good working knowledge of what happens under the hood. Your mechanic, on the other hand, should have a very concrete working knowledge of how those individual pieces fit together and how they function as a system. So your auto
            • 05:30 - 06:00 mechanic needs this kind of global abstract level  to be able to drive cars and to work in that way,   but they also need the local concrete physical  elements and they need to understand the   relationships between those in order to make your  car work when you take it in for repair. So it's   important to think at both of these levels. In  statistics, what we're going to be doing is we're
            • 06:00 - 06:30 going to be thinking at both the local concrete  physical level when we think about specific   samples and characteristics of samples, but we're  also going to be thinking about these more global   and abstract ideas - how do we generalize from a  sample to a population? How do we make inferences?   And how do we understand how our specific theories  or phenomena work in a broader population? One of the things that allows us to connect the  local physical elements of a sample and the larger
            • 06:30 - 07:00 population is "sampling distributions".  So we're going to start by unpacking what   distributions are. First, distributions  describe what we call the "sample space."   They describe the range of possible data points  for a given variable. They describe the density   of scores at different values within the range.  This is a little graphic of a distribution here
            • 07:00 - 07:30 and what you can see is that the data are more densely distributed at this data point than they are at this data point (there are more of them here). So we can look at the density of scores at different values over the range of possible scores. They (sampling distributions) are extremely important to statistics. The inferences that we can make depend on the distribution type, so the likelihood of getting any specific value in a random sample depends on how those data points are distributed within the broader population.
            • 07:30 - 08:00 We also know that distributions come in  different shapes. So scores or values are   arranged in some order and they're plotted  according to a frequency and there are a   number of parameters that describe these shapes.  We'll talk about those in a couple of slides.   Let's think about this very concretely for  now. Let's think about the distribution of   a sample. We'll drop in a sample - this is what  I'm creating here: a histogram of values of data.
            • 08:00 - 08:30 These are data points in a single sample. Each point is a single observation. To interpret this sample, we need to know how likely each of these data points is to occur. So each of these data points represents the score of a different person. These are all... think about one person filling out a questionnaire and then maybe you calculate the average value for
            • 08:30 - 09:00 the items in that questionnaire and that's what  you're plotting on this histogram. So everybody   in this histogram is contributing one score to  the sample. That's the distribution of a sample.   To understand how this sample is representative  of, or how it might match the population, what   we need to know is how likely each of these data  points is to occur in the population distribution.
            • 09:00 - 09:30 Another kind of distribution we can  think about is a population distribution.   The current population of Canada as  of the 30th of December last year is 38 million give or take. The average age of this  population is 41.9 with a median of 41.6 so that's   a population distribution. Each point is a single  observation from one participant just like the
            • 09:30 - 10:00 distribution of a sample. Here, however, every  possible participant is represented. So a sample   is a selection, hopefully a random one, of people  from a population. And a population is everyone   who's in that population. What a population  is, is going to depend on how you define it.   So I could say the population of all Canadians  and that would be a population distribution. I
            • 10:00 - 10:30 could say the population of all Americans, and  that would be a population distribution so how I   define it is going to shape what my population  distribution looks like. But regardless,   each data point in that distribution is a  single observation from one participant.   So in that way a population distribution  and a sample distribution are similar.   Now we don't really ever sample every person  from a population unless your population is very
            • 10:30 - 11:00 very small. So occasionally I see an article in psychology where we have a population distribution that gets measured because, for example, you might be dealing with a very rare illness, in which case every possible member of that population might be measurable, but most of the time, population distributions are sort of theoretical.
            • 11:00 - 11:30 So let's just review the difference between a sample and a population, because I might ask you to differentiate these graphs on an exam. If we think about samples, these are going to be small samples of people, randomly selected from this particular population. And these samples, some of them look very different, right? We have a sample here that looks like this - it was sampled from this population with these characteristics - and what you can see is that these
            • 11:30 - 12:00 are all samples of exactly the same size and they look very different. They have different ranges. This one ranges from age 15 to age 40. This one ranges from 20 to 80. Here's one that ranges from 0 to 80. And they have different distributions of individuals within those samples; just from pure random sampling, they look different. So sample distributions - one of
            • 12:00 - 12:30 the things you'll notice is that they tend to be more irregular than population distributions, simply because there are fewer people in them. Now that's not always true: this is a sample distribution, and it looks quite regular; in fact, it looks relatively normal. But we also can see that our frequency, our sample count here (in the sample), is a very small number, whereas our population count here (in the population) is a very large number. So you can kind of see; one of the ways to tell the difference between a sample and a population distribution is how
            • 12:30 - 13:00 big does it go. If we're dealing with super huge  numbers we're probably dealing with populations. So are they real in psychology? As  I said, they are mainly theoretical.   We try to generalize to huge populations of  people living all over the world and most   of those populations need to be sampled. That  means that we assume their distributions based
            • 13:00 - 13:30 on the sample data. So we use a sample to be a  proxy for that population - we are almost always   generalizing from samples. I will stop there and  we'll continue in the next section of the lecture.
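
The point made in the transcript, that equally sized random samples from one population can look quite different from each other and from the population, can be tried out with a short simulation. The following is only a minimal sketch, not something shown in the lecture: the population is a simulated set of ages (not real census data), and the names `draw_samples` and `summarize`, the population size of 100,000, the sample size of 30, and the seeds are all assumptions chosen for illustration.

```python
import random
import statistics


def draw_samples(population, sample_size, n_samples, seed=0):
    """Draw n_samples simple random samples (without replacement), all the same size."""
    rng = random.Random(seed)
    return [rng.sample(population, sample_size) for _ in range(n_samples)]


def summarize(values, is_population=False):
    """Return the count, range, mean and standard deviation of a list of numbers.

    For a full population the population formula (pstdev) is used;
    for a sample the n - 1 formula (stdev) is used.
    """
    sd = statistics.pstdev(values) if is_population else statistics.stdev(values)
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": sd,
    }


# Hypothetical population: 100,000 simulated ages clipped to 0..100.
# This is illustrative data only, not the Canadian population from the lecture.
_rng = random.Random(42)
population = [min(100, max(0, round(_rng.gauss(41, 22)))) for _ in range(100_000)]

# Three samples of exactly the same size, drawn at random from that population.
samples = draw_samples(population, sample_size=30, n_samples=3, seed=1)

print("population:", summarize(population, is_population=True))
for i, sample in enumerate(samples, start=1):
    print(f"sample {i}:", summarize(sample))
```

Running this prints one summary for the population and one per sample. Comparing the `min`, `max`, `mean` and `sd` values across the three samples gives a concrete feel for how much same-sized samples can vary, and the `n` values show the "small count versus very large count" cue for telling a sample from a population.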