Sampling&Distributions2

    Summary

    In this video, Erin Heerey explores how to link sample scores to populations using probability density functions. She emphasizes the significance of the normal distribution, its characteristics, and the role of random sampling in making inferences about populations. The lecture also covers uniform, binomial, and normal distributions and introduces two key statistical concepts, skewness and kurtosis, which are important for understanding how data are distributed in psychological research.
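For reference, the normal probability density function can be written in its usual textbook form. The video does not show the formula, so the notation below is an assumption. Here μ is the population mean and σ the standard deviation. The standard normal is the special case μ = 0, σ = 1. A score x can be rescaled to a z-score on that standard scale, and the probability that a score falls in a band [a, b] is the area under the curve over that band.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),
\qquad
z = \frac{x-\mu}{\sigma},
\qquad
\varphi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}

P(a \le X \le b) = \int_{a}^{b} f(x \mid \mu, \sigma)\,dx
```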

      Highlights

      • Understanding probability density functions is crucial for linking samples to populations 🤓
      • The standard normal distribution has a mean of zero and a standard deviation of one 💡
      • Uniform distributions apply to scenarios like fair dice rolls or prize draws 🎲
      • Binomial distributions describe win/lose experiments repeated a fixed number of times, such as coin tosses 🪙
      • Skewness and kurtosis are key parameters of a distribution, describing its asymmetry and tail heaviness 📈

      Key Takeaways

      • Probability density functions help link sample scores to populations 📊
      • The normal distribution, the familiar bell curve, is central 🎯
      • Uniform distributions have equally likely outcomes, like dice rolls 🎲
      • Binomial distributions involve experiments with two outcomes, like coin tosses 🪙
      • Skewness indicates asymmetry in data distribution, with income data often positively skewed 💸
      • Kurtosis measures tail heaviness; the t-test statistic, for example, follows a leptokurtic (heavy-tailed) distribution 🧪

      Overview

      In this session, Erin Heerey delves into sampling and distributions, focusing on the probability density function. She explains how this function links sample data to the population it was drawn from, relates it to the kernel density estimates from the previous lecture, and uses the normal distribution, the familiar bell curve, as her main example.
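The lecture's point that each standard-deviation band of the normal curve holds a fixed proportion of scores can be checked numerically. The sketch below is a minimal illustration, assuming Python's standard library; `normal_cdf` and `band_proportion` are hypothetical helper names, and the cumulative probability is computed from `math.erf` rather than read from the lecture's figure.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal (mean 0, SD 1) up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def band_proportion(lo: float, hi: float) -> float:
    """Proportion of a standard normal distribution lying between lo and hi SDs."""
    return normal_cdf(hi) - normal_cdf(lo)


if __name__ == "__main__":
    # Positive-side bands; by symmetry, the negative-side bands are identical.
    for lo in range(4):
        hi = lo + 1
        print(f"mean + {lo} SD to mean + {hi} SD: {100 * band_proportion(lo, hi):.2f}%")
    print(f"beyond mean + 3 SD: {100 * (1 - normal_cdf(3)):.3f}%")
```

Run as a script, it should report roughly 34.13%, 13.59%, 2.14%, and 0.13% for the four positive bands. Because any normal score can be rescaled to a z-score, the same proportions hold whatever the mean and standard deviation.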

        The lecture then moves on to several distribution types encountered in psychology and statistics. Erin discusses uniform distributions using the relatable example of dice rolls, where every outcome is equally probable. She then turns to binomial distributions, which fit scenarios with binary outcomes, such as flipping a coin multiple times.
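The uniform and binomial examples can be sketched in a few lines. This is an illustrative sketch, assuming Python 3.8+ for `math.comb`; the helper names, the random seed, and the 0 to 10 metre range for the rolled ball are hypothetical choices, not values from the lecture. The binomial probability of exactly k wins in n independent trials with win probability p is C(n, k)·p^k·(1 − p)^(n − k).

```python
import math
import random
from collections import Counter


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n independent trials, each won with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least_one_win(n: int, p: float) -> float:
    """P(at least one win in n trials) is 1 minus P(zero wins)."""
    return 1 - binomial_pmf(0, n, p)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed for a reproducible run

    # Discrete uniform: each face of a fair six-sided die has probability 1/6,
    # so each count should be close to 10,000 out of 60,000 rolls.
    rolls = Counter(rng.randint(1, 6) for _ in range(60_000))
    print("die face counts:", sorted(rolls.items()))

    # Continuous uniform: a hypothetical ball that stops anywhere between 0 and 10 m
    # with equal likelihood; there are infinitely many possible distances.
    print("ball distance (m):", round(rng.uniform(0.0, 10.0), 3))

    # Binomial: number of heads in 20 independent tosses of a fair coin.
    for k in range(0, 21, 5):
        print(f"P({k:2d} heads in 20 tosses) = {binomial_pmf(k, 20, 0.5):.4f}")
    print(f"P(at least one head in 20 tosses) = {prob_at_least_one_win(20, 0.5):.7f}")
```

The last line supports the lecture's point that 20 fair tosses almost guarantee at least one head: the probability is 1 − 0.5^20, a little above 0.999999.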

          The session wraps up with skewness and kurtosis, two parameters that describe the shape of a distribution. Erin illustrates skewness with the distribution of income and explains kurtosis in terms of how often extreme scores occur in the tails of a data set.
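Skewness can be made concrete with a short simulation. The sketch below assumes Python's standard library and the common moment-based skewness estimator; the lecture gives no formula, so that choice is an assumption. The "income-like" values are simulated from a log-normal distribution purely for illustration and are not real income data, and the comparison "heights" are likewise simulated.

```python
import random
import statistics
from typing import Sequence


def skewness(xs: Sequence[float]) -> float:
    """Moment-based sample skewness: mean cubed deviation divided by SD cubed.

    Positive means a long right tail, negative a long left tail, and a value
    near zero a roughly symmetric distribution.
    """
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment (divides by n)
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible run
    # Hypothetical, simulated income-like data (log-normal): most values are
    # modest and a few are very large. These are not real income figures.
    incomes = [30_000 * rng.lognormvariate(0.0, 0.8) for _ in range(50_000)]
    # Hypothetical, simulated heights (normal), as a symmetric comparison.
    heights = [rng.gauss(170.0, 10.0) for _ in range(50_000)]

    for name, data in [("income-like", incomes), ("normal", heights)]:
        print(
            f"{name:12s} mean={statistics.mean(data):>10,.1f} "
            f"median={statistics.median(data):>10,.1f} "
            f"skewness={skewness(data):+.2f}"
        )
```

For the income-like sample, the mean should come out above the median and the skewness clearly positive; for the normal sample, the mean and median should nearly coincide and the skewness should be close to zero.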

            Chapters

            • 00:00 - 01:30: Introduction to Probability Density Function and Normal Distribution. This chapter introduces the probability density function (PDF) as the way to link a score from a sample to the population it was drawn from, which the lecture describes as a non-trivial problem. It relates the PDF to the kernel density estimates discussed in the previous lecture, describing it as a smoothed histogram, and introduces the normal distribution (the standard bell curve), whose standard form has a mean of zero and a standard deviation of one.
            • 01:30 - 04:00: Random Sampling and Probability. This chapter uses the analogy of drawing marbles at random from a very large barrel, each marble labelled with a value in proportion to the normal distribution. Values near zero are the most common and therefore the most likely to be drawn, while extreme values (out toward plus or minus four) are rare, though not impossible. This is one way to picture what the PDF tells us: how likely different values are to be sampled.
            • 04:00 - 05:00: Overview of Common Distributions. This chapter explains that random sampling lets us make inferences about a population, and that knowing how likely a value is tells us how it relates to the population, for example how far it lies from the population mean. It then introduces the three distribution types covered in the lecture (uniform, binomial, and normal) and notes that each statistical test has its own distribution, to be covered later in the course.
            • 05:00 - 07:00: Uniform Distribution Explained. This chapter describes the uniform distribution, in which all outcomes are bounded, known, and equally likely. Examples include rolling a fair six-sided die and a single-entry raffle or prize draw. It also introduces the rarer continuous uniform distribution, illustrated by how far a ball rolls between two points, where there is an infinite number of possible outcomes.
            • 07:00 - 10:00: Binomial Distribution Discussion. This chapter defines the binomial distribution in terms of win-or-lose outcomes in an experiment repeated a fixed, pre-specified number of times, such as a 50-trial task scored right or wrong, or repeated coin tosses. The trials must be independent, with the same probability of a win on every trial.
            • 10:00 - 13:00: Normal Distribution Characteristics. This chapter covers the normal (Gaussian) distribution: it is perfectly symmetric, its mean, median, and mode coincide, and any normal distribution has the same proportions of scores within each standard-deviation band, whatever its mean and standard deviation. Many psychological variables are approximately normal because they reflect the combined influence of many smaller variables.
            • 13:00 - 15:00: Skewness and Kurtosis in Distributions. This chapter covers skewness, the asymmetry of a distribution (income is a classic positively skewed example, with the mean above the median and the mode), and kurtosis, the heaviness of its tails. The normal distribution is mesokurtic, the uniform distribution is platykurtic (light tails), and the distribution of the t-test statistic is leptokurtic (heavier tails than the normal).
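The marble analogy can be simulated directly. This sketch assumes Python 3.9+ and its standard library; `make_barrel` and `share` are hypothetical helpers. The marbles carry standard normal values rounded to 0.01 and cut off at plus or minus four, mirroring the lecture's "let's pretend the distribution ends there", and 200,000 marbles stand in for the lecture's 10 million to keep the run fast.

```python
import random


def make_barrel(n_marbles: int, seed: int = 0) -> list[float]:
    """Build a hypothetical barrel of marbles, each labelled with a value drawn
    from a standard normal distribution (mean 0, SD 1), rounded to 0.01.

    Values beyond plus or minus four are discarded, mirroring the lecture's
    'let's pretend the distribution ends there'.
    """
    rng = random.Random(seed)
    barrel: list[float] = []
    while len(barrel) < n_marbles:
        value = round(rng.gauss(0.0, 1.0), 2)
        if -4.0 <= value <= 4.0:
            barrel.append(value)
    return barrel


def share(barrel: list[float], lo: float, hi: float) -> float:
    """Fraction of marbles whose absolute value lies in the interval [lo, hi)."""
    return sum(lo <= abs(v) < hi for v in barrel) / len(barrel)


if __name__ == "__main__":
    barrel = make_barrel(200_000)
    draw_rng = random.Random(1)
    # Mix the barrel and draw ten marbles at random (with replacement).
    print("ten random draws:", [draw_rng.choice(barrel) for _ in range(10)])
    print(f"share with |value| < 0.5: {share(barrel, 0.0, 0.5):.3f}")
    print(f"share with |value| >= 3:  {share(barrel, 3.0, 4.01):.4f}")
```

Roughly 38% of the marbles should have an absolute value below 0.5, versus only about 0.3% at 3 or beyond, which is why draws near the center of the distribution come up far more often.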

            Sampling&Distributions2 Transcription

            • 00:00 - 00:30 One problem we need to consider is how we link a score from a sample to the population from which it was drawn, and that's not a trivial problem. The way we do this is to consider what we call the 'probability density function' of the sample. The probability density function is what a distribution plot shows. Remember, last lecture we talked about 'kernel density estimates', so this is similar to that. It's a smoothed
            • 00:30 - 01:00 histogram that describes the probability that any given score falls within any band in this graph. So the distribution plot is a plot of the probability density function for a given variable, and it shows the relative likelihood of observing a given data value, a particular data point, within a distribution. Let's take this distribution. It's a very special distribution called the normal distribution. You've heard of this distribution: it's the standard bell curve, and it has a couple
            • 01:00 - 01:30 of interesting properties. In its standard form, it's described by a mean of zero and a standard deviation of exactly one. What that means is that within each band, so from the mean to minus one standard deviation, from the mean to plus one standard deviation, from plus one to plus two, plus two to plus three, and so on, there is a specific, fixed proportion of the
            • 01:30 - 02:00 data that falls within that band. Now let's unpack that, because this is a really important concept. Imagine that I want to take a random sample from this population. Let's pretend for a minute that I have a lot of little marbles, and each marble has a value from this distribution on it. Some of those marbles will have a value of zero, some will have a value
            • 02:00 - 02:30 of 0.01, some will have a value of 0.02, some will have a value of 0.03, and there will be values of 0.1, 0.2, and so on, all the way up through plus four and down through minus four. Let's pretend the distribution ends there. There won't be very many of those more extreme values; in fact, it's less than 0.13 percent. So what we're looking at here is the frequency with which those values occur.
            • 02:30 - 03:00 A value of zero is going to be the most commonly occurring marble. So if I put, let's say, 10 million of these marbles, with values in proportion to this normal distribution, into what I guess would be a very large barrel, mixed them up, and drew one at random, what number would I get? Well, we'll be talking about probability in the next part of the lecture,
            • 03:00 - 03:30 but if we think about it, when there are more of a particular value within a distribution, I am more likely to draw that value. So there are more values of 0, 0.01, and 0.02 than there are values of 3.8, 3.9, and so forth. There are very few of these higher values, so those values aren't likely to come up very often. It doesn't mean they won't: the first
            • 03:30 - 04:00 marble I pick could be one of those, but on average I'm going to pick more marbles with values closer to the center of this distribution, simply because there are more of those marbles in the barrel. So that's one way to imagine what the probability density function is telling us: how likely different values are to be sampled.
            • 04:00 - 04:30 Ideally, we use random sampling to take samples from populations, and those samples allow us to make generalizations to, or inferences about, that population. Understanding the likelihood of a given value in a distribution allows us to characterize the sample, and therefore the population, more accurately. It tells us something about the relationship between the value and the population. For example, is the value very far away from the population mean? How often is that value likely
            • 04:30 - 05:00 to come up? That's what these probability density functions, or distributions, do for us. There are lots of different types of distributions that we use regularly in Psychology. We regularly use uniform distributions, binomial distributions, and normally (or Gaussian) distributed data or variables, so we'll be talking about these particular distribution types today. But please know that when we talk
            • 05:00 - 05:30 about specific statistical tests in the second part of this class, we will also be talking about distributions that are unique to those tests. Each test statistic has its own distribution, and we will talk specifically about those distributions when we get to those tests. For now, we're going to concentrate on three common distribution types. The first one is what we call a uniform distribution. You might get a uniform distribution if all of the outcomes
            • 05:30 - 06:00 that you have are bounded, you know what they are, and they're all equally likely. For example, take the roll of a single die, a six-sided die, let's say. We know that you can get a one, two, three, four, five, or six. You can't get anything else: there's no seven, no zero, no 16. You can only get the values one through six. And you also know, assuming the die is fair,
            • 06:00 - 06:30 that each one of those values is equally likely to be rolled. Similarly, if you enter a single-entry raffle or prize draw, your number is just as likely to come up as everyone else's. In discrete uniform distributions like these, there is a finite number of outcomes that can occur. We can also have uniform distributions that are continuous; these are much
            • 06:30 - 07:00 rarer, and they can have an infinite number of outcomes. One good example is distance between two points. Imagine you roll a ball down the street: how far does the ball go between two points? There is an infinite number of possible outcomes, because you can keep slicing the space into smaller and smaller and smaller
            • 07:00 - 07:30 values. That's a continuous distribution, and depending on the roll of the ball and how hard you rolled it, it can land in any portion of your sample space, and all positions can be equally likely. A binomial distribution is the distribution of the number of wins in a win-or-lose experiment that's repeated multiple times. So this
            • 07:30 - 08:00 prefix 'bi' in binomial means 'two': there are two possible outcomes. This might be a coin toss, or whether you throw a dart and it hits or misses the target. The number of observations in these experiments must be fixed, so an experiment must be repeated x times, where x is specified beforehand. You can think about an experiment that has 50 trials, where on each trial participants will either
            • 08:00 - 08:30 get it right or get it wrong. The number of correct trials would follow a binomial distribution. Another example is a coin toss. You have a 50% chance of getting heads in a single coin toss, and if you toss a coin 20 times, you have close to a 100% chance (about 99.9999%) of getting at least one head in those 20 tosses. One of the important things about a binomial distribution
            • 08:30 - 09:00 is that all the trials are independent: what happens on trial one is totally unrelated to what happens on trial two. The probability of a win is also identical from one trial to the next, and that is how we get a binomial distribution. You will notice that these are discrete points, distributed across the potential outcomes in a roughly normal-looking fashion.
            • 09:00 - 09:30 And then finally, we have our standard normal distribution. This is the one we're going to deal with most often. You've seen this picture before; it's also known as a Gaussian distribution, after the German mathematician Gauss. It is perfectly symmetric, so exactly the same proportion of scores lies to the left of the central mean as to the right, and it's described by a mean of zero and a standard deviation of one.
            • 09:30 - 10:00 Now, of course, we can have normally distributed data with different means and different standard deviations, because we're measuring something other than this standard Gaussian distribution, but any normal distribution has the same proportions of data falling within each standard deviation band. As you see in the normal distribution plot, more scores cluster around the mean, and the further you get from the mean, the fewer scores there are. Most of the variables we deal with in Psychology are
            • 10:00 - 10:30 approximately normally distributed. One of the reasons for that is that they reflect the influence of lots and lots of smaller variables, and when different variables influence things in different ways, what you end up with is a kind of hodgepodge where most people are closer to the mean and fewer people are farther away from it. That gets you what is essentially a normal distribution. Now, there are two other parameters we need to talk about when we're thinking about
            • 10:30 - 11:00 distributions. One of those is called skewness. You can have positive skew and you can have negative skew. Here's our symmetrical distribution, and it has a specific characteristic: the mean, the median, and the mode are all identical values. They line up right on top of one another, or pretty darn close. When we have positive skew, the mode tends to be lower than the median, which in turn tends
            • 11:00 - 11:30 to be lower than the mean. That usually means there are some very high scores in these distributions. A classic positively skewed distribution is the distribution of income, right? Most of us don't make that much money, but there are some people who make a lot, a lot, a lot of money, and that skews the data. We can also have the opposite type of skew, negative skew, where very few people have very low scores, and as we get higher up the number line here, we have more people
            • 11:30 - 12:00 clustered around higher points. So when we have negative skew, the mean tends to be lower than the median, which in turn tends to be lower than the mode. The other parameter we need to think about is called 'kurtosis', and that's a measure of how heavy the tails of a distribution are. Here is the symmetrical normal distribution. The normal
            • 12:00 - 12:30 distribution is called 'mesokurtic', which means it has nicely balanced tails: there aren't too many scores in the middle and there aren't too few scores at the ends. We can also get distributions that are 'platykurtic'. The uniform distribution is platykurtic: it has very light tails, and you can sort of see this if you look at the colors here. This purple distribution, this one here,
            • 12:30 - 13:00 has many fewer scores in the tails, and they do not extend very far. And then we can think about a 'leptokurtic' distribution, this blue one here, which is very peaked in the center, and what you can see is that there are more scores out in the tails than a normal distribution would have. So that's a leptokurtic distribution. And we'll see that there is a very
            • 13:00 - 13:30 famous and frequently used test, the t-test, whose test statistic has a leptokurtic distribution. So in general, kurtosis is a measure of how many scores are in the tails of these distributions, and you'll want to be thinking about that when you're evaluating whether or not something is normally distributed. I'll break it there, and we'll pick up the next section in the next video.
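The three kurtosis shapes described in the lecture can be compared numerically. The sketch below is an illustration under stated assumptions. It uses Python 3.9+ and its standard library, and it relies on the common moment-based excess-kurtosis estimator (m4/m2² − 3, so a normal distribution scores about 0), which the lecture does not specify. Student's t draws are built from a normal variable and a chi-square (gamma) variable. The helper names, sample sizes, and df = 10 are hypothetical choices.

```python
import math
import random


def excess_kurtosis(xs: list[float]) -> float:
    """Moment-based sample excess kurtosis, m4 / m2**2 - 3.

    About 0 for a normal (mesokurtic) distribution, negative for light tails
    (platykurtic), and positive for heavy tails (leptokurtic).
    """
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def student_t(rng: random.Random, df: float) -> float:
    """One draw from Student's t with df degrees of freedom: a standard normal
    divided by sqrt(chi-square / df). A chi-square(df) variable is generated as
    gamma(shape=df/2, scale=2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed for a reproducible run
    n = 100_000
    samples = {
        "uniform (platykurtic)": [rng.uniform(-1.0, 1.0) for _ in range(n)],
        "normal (mesokurtic)": [rng.gauss(0.0, 1.0) for _ in range(n)],
        "t, df=10 (leptokurtic)": [student_t(rng, 10) for _ in range(n)],
    }
    # Theoretical excess kurtosis: uniform -1.2, normal 0, t with df > 4: 6 / (df - 4).
    for name, xs in samples.items():
        print(f"{name:24s} excess kurtosis = {excess_kurtosis(xs):+.2f}")
```

Expect values near −1.2 for the uniform sample, near 0 for the normal sample, and clearly positive values for the t sample. The theoretical value for df = 10 is 1.0, although heavy tails make the estimate noisy.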