Understanding Point Estimates in Statistics

Estimation3


    Summary

    In this lecture, Erin Heerey discusses point estimates, where a single statistic is used to estimate a population parameter based on a sample. The focus is twofold: whether the estimator is unbiased, and how efficient it is. An unbiased estimator averages out to the population parameter over many samples, e.g., the sample mean for the population mean. Efficiency concerns the sampling variability of an estimator: in skewed distributions, the median might be more efficient than the mean because its sampling distribution can have a lower standard deviation. The issue with point estimates is their apparent precision, which gives a false sense of accuracy, since a single sample rarely matches the true population parameter exactly. A solution to this issue will be shared in the next video.
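
    The unbiasedness claim in the summary can be checked with a small simulation. The sketch below is not from the lecture: it assumes a normal population with hypothetical values for mu and sigma, reuses the sample sizes the lecture mentions (3, 5, 10, 20), and uses NumPy as an assumed tool. It averages the sample mean over many repeated samples to see whether it lands near mu.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Hypothetical population parameters, chosen only for illustration.
MU, SIGMA = 100.0, 15.0
N_SAMPLES = 20_000  # number of repeated samples drawn per sample size


def mean_of_sample_means(n):
    """Draw N_SAMPLES samples of size n and average their sample means."""
    samples = rng.normal(loc=MU, scale=SIGMA, size=(N_SAMPLES, n))
    return samples.mean(axis=1).mean()


# Sample sizes mentioned in the lecture.
results = {n: mean_of_sample_means(n) for n in (3, 5, 10, 20)}
for n, avg in results.items():
    print(f"n={n:2d}: average of sample means = {avg:.2f} (mu = {MU})")
```

    Individual sample means still land above or below mu; it is only their long-run average that should sit close to it, at every sample size.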

      Highlights

      • Point estimates offer a single statistic to represent a population parameter based on a sample. 🔍
      • Unbiased estimators average out to the population parameter, balancing over and under-estimations. ⚖️
      • An estimator might be chosen based on efficiency, particularly in skewed distributions, where the median can be more reliable than the mean. 📊
      • The sample standard deviation is a biased estimator: it is more likely to underestimate than overestimate the population standard deviation, which Bessel's correction (n minus 1 in the denominator) partly addresses. ✂️
      • Point estimates, though appearing precise, rarely match the true population parameter precisely; a solution will be discussed in the next segment. 🔮
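
      The two variance formulas behind Bessel's correction, which the highlights mention, can be written out as follows. The lecture describes them only in words, so the notation here is an assumption: N for the number of individuals in the population, n for the sample size, x_i for an individual score, and x-bar for the sample mean.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2
\qquad\text{versus}\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```

      The standard deviation in each case is the square root of the corresponding variance. Dividing by n minus 1 rather than n makes the result slightly larger, which offsets the tendency to undershoot.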

      Key Takeaways

      • Point estimates use a single statistic from a sample to estimate a population parameter. 🎯
      • Unbiasedness ensures estimators average out to target the population parameter. It seeks balance in over/underestimation over many samples. ⚖️
      • Efficiency in estimators involves selecting one with lower variability, especially in non-normal distributions. 📉
      • Point estimates often seem overly precise, leading to a false sense of accuracy. Estimators vary across samples. 🔍
      • The standard deviation as an estimator is often biased towards underestimation, corrected using Bessel’s correction. 📏
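
      The bias in the last takeaway can be seen in a simulation similar in spirit to the lab exercise the lecture refers to. This is a minimal sketch, not the lab's code: the population values are hypothetical, and NumPy's `ddof` argument is used to switch between dividing by n (`ddof=0`) and dividing by n minus 1 (`ddof=1`).

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so the run is repeatable

SIGMA = 10.0         # hypothetical population standard deviation
N = 5                # small samples make the bias easy to see
N_SAMPLES = 50_000   # number of repeated samples

samples = rng.normal(loc=0.0, scale=SIGMA, size=(N_SAMPLES, N))

# ddof=0 divides by n; ddof=1 divides by n - 1 (Bessel's correction).
var_n = samples.var(axis=1, ddof=0).mean()
var_n_minus_1 = samples.var(axis=1, ddof=1).mean()
sd_n = samples.std(axis=1, ddof=0).mean()
sd_n_minus_1 = samples.std(axis=1, ddof=1).mean()

print(f"population variance:          {SIGMA ** 2:.2f}")
print(f"average variance, divide by n:     {var_n:.2f}")
print(f"average variance, divide by n - 1: {var_n_minus_1:.2f}")
print(f"population SD:                {SIGMA:.2f}")
print(f"average SD, divide by n:     {sd_n:.2f}")
print(f"average SD, divide by n - 1: {sd_n_minus_1:.2f}")
```

      With the correction, the average sample variance should come out close to the population variance. The average corrected standard deviation typically still sits a little below sigma, which fits the lecture's remark that the correction is not perfect.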

      Overview

      In this engaging lecture, Erin Heerey dives deep into the world of point estimates in statistics. These estimates use a single statistic from a sample to infer details about a larger population, aiming to find the most representative 'point' to anchor your findings. A common example in practice is using the sample mean to estimate the population mean.

        Unbiasedness and efficiency are two pivotal concepts discussed here. While an unbiased estimator helps avoid systematic error by ensuring your sample estimates closely target the true population parameter over multiple trials, efficiency involves selecting the most reliable measure, particularly in datasets with non-normal distributions. This might involve choosing the median over the mean to counteract variability brought by outliers.
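
        The efficiency comparison described above can be sketched by building the sampling distributions of the mean and the median from a skewed population. This is an illustration under assumptions, not the lecture's own demonstration: a log-normal distribution stands in for an income-like variable, and the sample size and number of repetitions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # fixed seed so the run is repeatable

N = 25               # hypothetical sample size
N_SAMPLES = 20_000   # number of repeated samples

# Log-normal draws stand in for a right-skewed variable such as income.
samples = rng.lognormal(mean=0.0, sigma=1.0, size=(N_SAMPLES, N))

# One mean and one median per sample: two sampling distributions.
sample_means = samples.mean(axis=1)
sample_medians = np.median(samples, axis=1)

# The spread of each sampling distribution is what efficiency compares.
sd_of_means = sample_means.std(ddof=1)
sd_of_medians = sample_medians.std(ddof=1)

print(f"SD of sample means:   {sd_of_means:.3f}")
print(f"SD of sample medians: {sd_of_medians:.3f}")
```

        For this population the medians should cluster more tightly than the means. Note that in a skewed population the mean and the median are different quantities, so this sketch compares only the spread of the two estimators, not which target is the right one.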

          Heerey also underscores a core issue with point estimates: their deceptive precision. While these figures might appear accurate, they rarely hit the bullseye of the true population parameter because samples vary greatly, leading to a false sense of certainty. As a teaser, she assures that solutions to enhance estimate accuracy are on the horizon in the subsequent part of the series.

            Chapters

            • 00:00 - 00:30: Introduction to Point Estimates In the section titled 'Introduction to Point Estimates,' the lecture focuses on explaining the concept of point estimates. It defines a point estimate as a single statistic used to infer a population parameter based on a sample. The term 'point estimate' arises from using a single value or point to best represent the population value being estimated. An example given is estimating the population mean (mu) by calculating the sample mean.
            • 00:30 - 01:00: Choosing the Best Estimator This chapter discusses how to choose the best estimator when relating a sample to a population. It emphasizes that the ideal estimator is the one that provides the closest estimate to the true mean or central point of the population. In Psychology, the sample mean is commonly used as an estimator of the population mean (mu), but alternatives such as the median or mode can sometimes be better choices depending on the context.
            • 01:00 - 01:30: Key Properties of Estimators This chapter discusses key properties to consider when choosing the most accurate estimator for a population parameter. The focus is on the property known as 'unbiasedness', emphasizing the importance of not introducing bias into the estimation process.
            • 01:30 - 02:00: Unbiasedness in Estimators The chapter discusses the importance of minimizing bias in statistical models. An estimator is deemed unbiased if, on average, it produces estimates that accurately target the population parameter. This implies that the average value of the estimator aligns with the population parameter.
            • 02:00 - 02:30: The Role of Random Sampling The chapter discusses the importance of randomness in sampling. It highlights how random sampling aims to minimize bias in estimating population parameters. By selecting samples randomly, we ensure that there's an equal chance of overestimating or underestimating the population mean, thus providing a balanced and unbiased estimate.
            • 02:30 - 03:00: Understanding Sampling Distribution In this chapter, we discuss the concept of sampling distribution, focusing on the behavior of an estimator in relation to the population parameter. The essence is that while an estimator may vary slightly from the true population parameter in individual samples, across many samples, it tends to average out to approximate the population parameter. This was illustrated with slides in a previous lecture that depicted the sampling distribution of the sample mean, providing a visual understanding of how sampling from a normal distribution behaves over the long run.
            • 03:00 - 03:30: Biased vs. Unbiased Estimators The chapter 'Biased vs. Unbiased Estimators' discusses the concept of sampling distributions and how sample means converge on the population parameter. The focus is on the mean of the sampling distribution, where different sample sizes (e.g., 3, 5, 10, or 20) are considered. Observations show that on average, the mean of the sampling distribution aligns with the population parameter, albeit with some variability where it might be slightly higher or lower.
            • 03:30 - 04:00: Standard Deviation as a Biased Estimator This chapter explains the concept of a biased estimator, particularly in the context of standard deviation. It differentiates between unbiased and biased estimators, indicating that a biased estimator tends to consistently overestimate or underestimate the parameter rather than accurately estimating it. Unlike unbiased estimators, which have an equal likelihood of overestimating or underestimating, biased estimators show a propensity to err consistently in one direction.
            • 04:00 - 05:00: Correcting Bias in Standard Deviation The chapter titled 'Correcting Bias in Standard Deviation' discusses the concept of bias in statistical estimators. It explains that an estimator is unbiased if it is equally likely to overshoot or undershoot the population parameter. However, if the estimator tends to fall more often on one side of the true population parameter, it is considered biased. The chapter specifically highlights that the mean is an unbiased estimator as it accurately represents the population mean (mu).
            • 05:00 - 06:00: Explanation of Bessel's Correction This chapter explains Bessel's correction, which is used to reduce the bias in estimating a population's standard deviation. The sample standard deviation tends to underestimate the population standard deviation, as the sampling distribution of sample standard deviations built in the previous week's lab showed. To correct for this, the sample variance formula divides by n minus 1, whereas the population variance (sigma squared) divides by the number of individuals in the population. The lecture stresses that this is a correction for undershooting rather than degrees of freedom.
            • 11:00 - 11:30: Introduction to Efficiency in Estimators This chapter recaps unbiasedness: an unbiased estimator produces estimates that are, on average, equal to the population parameter, whereas a biased statistic tends to underestimate or overestimate it. It then notes that although the sample mean is the most common estimator of mu, the median sometimes makes more sense.
            • 11:30 - 12:00: Mean vs. Median as Estimators The chapter introduces efficiency, which concerns the sampling variability of an estimator. For skewed or non-normal data, the median might be a better estimator than the mean because it is less sensitive to outliers.
            • 12:00 - 12:30: Efficiency and Skewed Distributions Using income as an example, the chapter explains that a few very high earners substantially inflate the mean but cause much less of a problem for the median. When two competing estimators are both unbiased, the one with the smaller variance for a given sample size is called more efficient.
            • 14:30 - 15:00: Problems with Point Estimates The chapter titled 'Problems with Point Estimates' discusses how point estimates come from single samples and look very precise (for example, a mean of 24.76), which gives a false sense of the precision with which a population parameter can be estimated. Single samples almost never give exactly the true population parameter.
            • 15:30 - 16:00: Conclusion The conclusion restates that point estimates seem very precise but can vary substantially from one sample to the next, and notes that a solution to this problem will be covered in the next video.

            Estimation3 Transcription

            • 00:00 - 00:30 In the next part of the lecture we're going to talk about what we call "point" estimates. So this is the use of a single statistic to make an estimate about a population parameter based on a sample. These are called point estimates because they are based on a single point or a single value that best represents the population value that you're estimating. For example we often estimate mu, or the mean of the population, by taking the sample mean.
            • 00:30 - 01:00 And the best estimate, as we're relating a sample to a population, is the one that's going to get us the closest estimate to the true mean of that population, or the true central point. Now often, certainly in Psychology, our estimator of mu is the sample mean. But we could also pick the median or even the mode. Sometimes the median is actually a better
            • 01:00 - 01:30 estimator for the population median than the sample mean is. So what we're going to do is look at how we choose. So there are two key properties of estimators that we need to consider if we're thinking about choosing the best, most accurate estimator of a population parameter. The first property that we need to consider is a property that I'm calling here 'unbiasedness'. An estimator is unbiased..., so remember we don't want to introduce bias into
            • 01:30 - 02:00 our statistics or we want to do that as little  as we can (there's always some that creeps in)   but we want to minimize that level of bias in our  statistical models. So an estimator is unbiased if   on average it produces estimates that target  the population parameter. That means that the   average value of the estimator is  equal to the population parameter.
            • 02:00 - 02:30 Now we also talked about randomness in sampling. If I randomly sample you, who might be extroverted, and I don't randomly sample your friend who's introverted, that might change the value of my sample mean or my estimator for a population parameter. But the idea here is that because we're selecting randomly (we want to pick the most unbiased estimate), our random sample should be as likely to overestimate the population mean as it is to underestimate it. So
            • 02:30 - 03:00 the estimator is sometimes a little bit greater  than the population parameter and sometimes a   little bit lower than that population  parameter, but over a lot of samples,   over the long haul, on average it works out  to be similar. And we saw that a little bit   when we talked about the sampling distribution  of the sample mean. So if you remember back to   the slides from that prior lecture, I showed you  a picture where we were sampling from a normal
            • 03:00 - 03:30 distribution and then there were sample sizes  of 3, 5, 10, I think somewhere like 20 maybe,   and what you noticed was that on average the  mean of the sampling distribution, when it was   plotting means not individual participants,  but when it was plotting means, on average   the mean of the sampling distribution  of sample means really did converge on   the population parameter. Sometimes it  was a bit higher sometimes it was a bit
            • 03:30 - 04:00 lower just due to pure random differences in sampling. But that is an unbiased estimator. A biased estimator is more likely to either overestimate the parameter or to underestimate it than to get it right, so it's more likely to fall on one side of this equation; it's not equally likely to overestimate as to underestimate it. So the mean is equally likely to overestimate as to
            • 04:00 - 04:30 underestimate the population parameter because it's equally likely to fall on either side of it, and so it is unbiased. If, however, your population estimator is more likely to land on one side than the other of the true population parameter, then it is biased and there is a systematic difference between the estimator and the parameter. So the mean, as I said, is an unbiased estimator because it overshoots the population mean mu as
            • 04:30 - 05:00 often as it undershoots it, right? That's why  we get that mean [of the DOSM] falling right in   the middle with this nice symmetric normal  distribution around it. Now interestingly,   the standard deviation is what we call a  'biased' estimator because it is more likely   to underestimate sigma the population standard  deviation than it is to overestimate sigma.   That doesn't mean you'll never overestimate  sigma. It just means that on average you're
            • 05:00 - 05:30 more likely to underestimate than  to overestimate it and in fact, in last week's lab, you built a sampling  distribution of sample standard deviations   and if I were betting, my bet is that it was  slightly less than the population standard   deviation [in the Monte Carlo simulation],  and that's because the standard deviation   is on average a biased estimator. Now we do our  best to correct that bias when we calculate the
            • 05:30 - 06:00 standard deviation of a sample; and we do that by  using a correction called Bessel's correction. You   also saw this in the lab. We used that in the  denominator of the standard deviation formula.   So if you think back all the way to the very  first lecture, the very first material that you   encountered in the class when we were talking  about descriptive statistics, I showed you a   formula for the variance and then subsequently  the standard deviation, and what you'll notice   was that I showed you a formula for sigma and I  showed you a formula for the standard deviation
            • 06:00 - 06:30 and both of those, they had a slightly different denominator, right? The formula for sigma, or sigma squared really, because we were talking about the variance, the formula for sigma squared used the number of individuals within the population as its denominator and the formula for the sample variance, that one included n minus 1 in its denominator, and I said I would explain
            • 06:30 - 07:00 that in a future lecture. We're there now! So because the standard deviation is more likely to underestimate sigma than to overestimate it we correct that formula by using n minus 1 in the denominator. This is not degrees of freedom! It is a correction for the fact that we undershoot more often than we overshoot the true population parameter. Why do we do that?
            • 07:00 - 07:30 Well, again I want you to think back to the sampling distributions lecture, and think about how the scores are distributed. So we know if you look at the picture of the normal curve that you've already seen a bunch of times in this class, we know that about 68% of individuals, 68% of scores, fall within one standard deviation either above or below the mean. So within one standard deviation of the mean it is 68% of the sample; within two standard deviations
            • 07:30 - 08:00 of the mean we're now getting into the 96-ish percent domain and it goes out from there. So there are fewer more extreme scores in our sample so we are less likely to select them. And that doesn't mean you won't, right? You sometimes will. But if you happen to get a 'normal' pick what you'll end up with is scores that are much closer to the true population mean
            • 08:00 - 08:30 and that means you are likely to underestimate  the population standard deviation because you're   really not sampling those outliers with the same  probability that you're sampling the scores that   are closer to the mean. So the farther away the  score is from the mean the fewer of those scores   exist in your population and the less likely you  are to come up with one by pure random chance.
            • 08:30 - 09:00 So go back and think about your M&Ms, the pretend M&Ms that we made in lab, when you were sampling. If you were to think about selecting one of the rare ones in your sample... So you had a sample of 25. What if you were to select something that only came up once? If you had one of those in your sample, its probability of being selected would have been much lower than the probability of an M&M that occurred more frequently. So the same thing is happening here, because there are some scores where there are just more of them in the population.
            • 09:00 - 09:30 You're more likely to sample those  scores and as a result these outliers,   these scores that are farther from the  mean, those are less likely to come   up [because there are fewer of them  in the population], and that means   that the standard deviation is likely to be  smaller than the true standard deviation in   the population. It's more likely to underestimate  that quantity. So we have to correct it in the
            • 09:30 - 10:00 formula. It's not a perfect correction, by the way. When we take our sample standard deviation or our sample variance formula, we calculate the sum of the squared deviations and then we divide by n minus one. It's not a perfect correction but it's closer, right? It's closer to what it should be than we would get if on average we used n in our
            • 10:00 - 10:30 denominator. And what you can do if you're not sure why we're using n minus 1, is take a number that you can pretend is the variance, divide it by a number n, and then divide it by n minus 1 instead, and see what happens to it. The outcome of your division operation, the numerator divided by the denominator, should be just a little bit bigger when we decrease the denominator, right? If you think about taking half a pizza, or how about a third of a pizza? If you take one over three
            • 10:30 - 11:00 [a third], versus one over two, where two is n minus one, you get a little more pizza, right? So the same thing is happening here. We're reducing the size of the denominator and that increases the outcome of that calculation. So that's the idea of unbiasedness. If the estimator is unbiased then it produces estimates of a target population parameter that are, on average,
            • 11:00 - 11:30 equal to that population parameter. If a statistic is biased, then it tends to either underestimate or overestimate the true population parameter and we're aiming to have these unbiased estimators of the population. Now I said, a couple slides ago, that the mean was our most common estimator for mu, the population mean. And that's sometimes true but sometimes the median makes more sense. Here's
            • 11:30 - 12:00 why, so we have another element that we need  to think about when we're thinking about how   to take a sample and estimate a population,  and that's an element we call "efficiency".   This is about the sampling variability of  an estimator. Now let's pretend you have   a skewed distribution or a non-normal data  distribution. In that case the median might   be a better estimator than the mean because  it's less sensitive to outliers. So think
            • 12:00 - 12:30 about a population or a variable like income, right? We've used this one as an example before. And if you think about income, most of us earn some money but there are a few people out there who earn a lot of money - the Elon Musks of the world. There are some people out there whose scores really genuinely and quite substantially inflate the estimator when we calculate the mean.
            • 12:30 - 13:00 However, when we calculate the median, those inflated scores are higher than the median, but the degree to which they're inflated is not as significant - it doesn't cause a significant problem when we're calculating the median. And so in distributions like that, we might think about the median as a more efficient estimator because its sampling distribution may actually
            • 13:00 - 13:30 have a lower standard deviation, and may be less sensitive to outliers. So in a highly skewed, non-normal data distribution one of the things you can think about is that if you calculate the sampling distribution of a median (rather than the mean) by taking random samples from a population that's highly skewed, if we have two competing estimators that are both unbiased, and the mean and the median are
            • 13:30 - 14:00 both unbiased, the one with the smaller variance for a given sample size, we call that one 'more efficient'. So if we have two estimators, and their variation looks like this versus like this, the one with the smaller variance here, this green one, Estimator 2 will be more efficient and its sampling distribution will be more tightly clustered around the true population estimate.
            • 14:00 - 14:30 So we don't always use the sample mean to estimate the population mean. Sometimes we use the sample median, especially when we have a lot of outliers or when our distribution is highly skewed, because in that case the median will be more efficient; it will sample with greater precision because it has a lower standard deviation. So that's the idea with efficiency.
            • 14:30 - 15:00 Now there's a little problem with point estimates. We make point estimates from single samples and we assume these things are very precise. I calculate a mean, maybe it's 24.76, and that seems like a really precise number and it gives us a false sense of the precision with which we can estimate a true population parameter. But as we know and as we've seen, single samples almost never give exactly the true population parameter. It's easy to miss this. It's like taking your fishing rod and casting it in and seeing if
            • 15:00 - 15:30 you get any hits. And random sampling we  know leads to variability in the sample   makeup and therefore the estimator varies from the  quantity it should estimate. We almost never get   an exact estimation of our population mean,  for example if that's what we're using.   We almost never get an estimate of the  population mean that is exactly the same   as the true population mean, which by the  way we never actually know what that is.
            • 15:30 - 16:00 So point estimates they seem very  precise, but in reality they can   vary substantially from one sample to the next  and that's the problem with point estimates.   There's a solution to that problem. We're  going to talk about that in the next video.
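
            The sample-to-sample variation described at the end of the transcript can be made concrete with a few repeated draws. This is a minimal sketch with hypothetical population values and an arbitrary sample size, assuming NumPy; it prints the point estimate from ten separate samples of the same population.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # fixed seed so the run is repeatable

# Hypothetical population mean, standard deviation and sample size.
MU, SIGMA, N = 25.0, 5.0, 30

# Ten independent samples from the same population, one point estimate each.
sample_means = [rng.normal(MU, SIGMA, size=N).mean() for _ in range(10)]
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i:2d}: mean = {m:.2f}")

# How far apart the estimates are, and how many hit mu exactly.
spread = max(sample_means) - min(sample_means)
exact_hits = sum(m == MU for m in sample_means)
print(f"range of the ten estimates: {spread:.2f}")
print(f"estimates exactly equal to mu: {exact_hits}")
```

            Each printed mean looks precise to two decimal places, yet the ten values differ from one another and from mu, which is the false sense of precision the lecture warns about.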