Decoding the Mysteries of Single Sample Tests

SingleSample1


Summary

In this lecture, Erin Heerey introduces single sample tests, focusing on z-tests and t-tests. The discussion begins with a review of the logic of hypothesis testing, contrasting inferential with descriptive statistics. Heerey stresses that statistics cannot assess whether a study design is sound or whether a theory is true, and cautions against equating statistical significance with practical significance. Using a meta-analysis of melatonin for sleep disorders as an example, the lecture covers how to interpret p-values and why effect size and statistical power matter. Heerey closes by urging careful interpretation of statistical results and notes that the single sample tests themselves are covered in the next video.

Highlights

Erin Heerey kicks off with a review of inferential and descriptive statistics, setting the stage for understanding z-tests and t-tests. 🎓
Exploration of how statistical tests can guide decisions but can't reveal flaws in study design, plus a reminder that absence of evidence is not evidence of absence. 🤔
Discussion about the implications of p-values and the potential pitfalls of drawing conclusions based solely on statistical significance. 🚩
A funny example with melatonin highlights the difference between statistical and practical significance. 🛌
Heerey wraps up by discussing the importance of effect size and statistical power, hinting at a deeper dive in subsequent lectures. 📚

Key Takeaways

Inferential statistics help us draw inferences about a population from sample data, but they can't judge study design or theoretical soundness. 🔍
P-values indicate the probability of observing a statistic at least as extreme as yours assuming the null hypothesis is true; they don't give the probability that the null hypothesis is true or false. 🎲
Statistical significance doesn't necessarily equate to practical significance. Always consider the real-world importance of your results. 🌍
Effect size provides a standardized measure of the practical significance of a result and, unlike the p-value, is largely independent of sample size. 📏
Statistical power, influenced by effect size and sample size, indicates a study's ability to detect an effect. 📈
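
The interplay between effect size, sample size, and the p-value in the takeaways above can be sketched numerically. This is a minimal illustration, not something shown in the lecture: it assumes a two-sided one-sample z-test with known population standard deviation, where the test statistic is z = d * sqrt(n) for a standardized effect d. The function name `two_sided_p_from_effect` and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_from_effect(d, n):
    """Two-sided p-value of a one-sample z-test.

    d is the observed standardized mean difference (an assumed,
    illustrative effect size) and n is the sample size.
    """
    z = d * sqrt(n)  # z statistic when the population SD is known
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Same formula, very different conclusions depending on n.
    for d, n in [(0.8, 5), (0.1, 25), (0.1, 1000)]:
        p = two_sided_p_from_effect(d, n)
        verdict = "significant" if p < 0.05 else "not significant"
        print(f"d={d}, n={n}: p={p:.4f} ({verdict} at 0.05)")
```

Under these assumptions, a large effect in a tiny sample can miss the 0.05 threshold while a small effect in a large sample passes it, which is why the effect size needs to be reported alongside the p-value.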

Overview

In this informative lecture, Erin Heerey lays the groundwork for single sample tests, specifically z-tests and t-tests. By revisiting inferential statistics, Heerey contrasts them with descriptive statistics to build the foundation needed for hypothesis testing. With an engaging approach, Heerey explains not only the technical details but also the logical reasoning behind these statistical methods.

Heerey proceeds to explore the broader implications of p-values, emphasizing that while they offer insight into the likelihood of observing certain data under a null hypothesis, they do not confirm or deny the hypothesis itself. She humorously illustrates the gap between statistical and practical significance with an example on melatonin's effect on sleep, encouraging a more nuanced view of statistical outcomes. This discussion serves as a reminder that statistics alone cannot inform us about the practical applicability of results.
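
In the lecture, Heerey notes that p = 0.049 and p = 0.051 carry nearly the same strength of evidence even though they fall on opposite sides of the 0.05 threshold. A rough way to see this, assuming a two-sided z-test (an assumption made here for illustration), is to convert each p-value back to the z statistic that would produce it. The helper name `z_for_two_sided_p` is hypothetical.

```python
from statistics import NormalDist


def z_for_two_sided_p(p):
    """Absolute z statistic that yields a two-sided p-value of p."""
    return NormalDist().inv_cdf(1 - p / 2)


if __name__ == "__main__":
    for p in (0.049, 0.050, 0.051):
        print(f"p={p:.3f} corresponds to |z|={z_for_two_sided_p(p):.4f}")
```

The three z values differ only in the second decimal place, so treating one result as a finding and the other as a failure reflects the arbitrary cut-off rather than a real difference in evidence.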

The lecture wraps up with an insightful discussion on the significance of effect size and statistical power, both crucial for understanding the practical implications of a study. Heerey cleverly integrates these concepts, preparing the audience for the next session's focus on single sample tests. These insights ensure that viewers are equipped to critically evaluate statistical results beyond conventional thresholds of significance.
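
The three influences on power that the lecture lists (significance level, effect size, and sample size) can be explored with a small calculation. The sketch below assumes a two-sided one-sample z-test with known standard deviation; the lecture does not give a formula, so the function `z_test_power` and its parameters are illustrative choices rather than the lecturer's method.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(d, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    d: true standardized effect size (assumed known for illustration)
    n: sample size
    alpha: significance level (type 1 error rate)
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)  # critical value for rejection
    shift = d * sqrt(n)                 # mean of z under the alternative
    # Probability that z lands in either rejection region.
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


if __name__ == "__main__":
    print("baseline       :", round(z_test_power(0.5, 30), 3))
    print("stricter alpha :", round(z_test_power(0.5, 30, alpha=0.01), 3))
    print("larger effect  :", round(z_test_power(0.8, 30), 3))
    print("larger sample  :", round(z_test_power(0.5, 100), 3))
```

Running it with different arguments shows the pattern described in the lecture: a stricter alpha lowers power, while a larger effect size or a larger sample raises it.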

Chapters

00:00 - 00:30: Introduction to Single Sample Tests The chapter 'Introduction to Single Sample Tests' discusses the basics of conducting single sample tests, emphasizing z-tests and t-tests. It starts with a brief review of the logic of hypothesis testing, highlighting the distinction between inferential and descriptive statistics.
00:30 - 01:00: Descriptive vs Inferential Statistics Descriptive statistics focus on the properties of observed data. They provide factual insights that can be discovered from a sample. This includes distributions, measures of central tendency, and measures of dispersion or spread. These facts allow for making inferences about the properties of the underlying population distributions.
01:00 - 01:30: Hypothesis Testing and Statistical Decisions The chapter introduces hypothesis testing and its role in statistical decision-making. Testing hypotheses means testing ideas about population parameters, and inferential statistics also include deriving estimates of population characteristics from what was observed in a sample. The lecture then cautions that statistics support only a statistical decision: they cannot reveal a flawed design, inappropriate data collection, an unrepresentative sample, or a false theory, and absence of evidence is not the same as evidence of absence.
03:00 - 03:30: Understanding P-Values The chapter "Understanding P-Values" reviews what a p-value does and does not mean. A p-value is not the probability that the null hypothesis is true given the data. Instead, it is the probability of obtaining the observed statistic if the null hypothesis were true. Rejecting the null hypothesis therefore does not confirm that it is false.
03:30 - 04:30: Type 1 and Type 2 Errors This chapter covers the two ways a statistical decision can go wrong. Rejecting the null hypothesis means only that the observed data are unlikely if the null is true, so a type 1 error is still possible. Failing to reject the null hypothesis does not mean it is true, because a type 2 error may have occurred. Even when the type 1 error rate is well controlled, the type 2 error rate can be high for reasons such as a small effect size, imprecise or noisy measurement, or low power.
05:30 - 06:00: Statistical vs Practical Significance In the chapter 'Statistical vs Practical Significance,' the discussion centers on the fact that a statistically significant result is not necessarily an important one. Statistical significance means a result is unlikely to be due to chance alone. The example is a meta-analysis of melatonin for primary sleep disorders, in which melatonin reduced sleep latency by about seven minutes and increased total sleep time by about eight minutes: statistically significant, but small.
08:00 - 09:00: Effect Size and Sample Size The chapter 'Effect Size and Sample Size' presents effect size as a standardized measure of practical significance, such as the magnitude of a difference between groups or the strength of a correlation. Because sample size strongly affects the p-value, a large effect may miss statistical significance in a very small sample, while a very small effect can lead to rejecting the null hypothesis in a very large sample. Unlike the p-value, the effect size is in theory not affected by sample size.
10:00 - 11:00: Statistical Power The chapter discusses the concept of statistical power in hypothesis testing. It explains that statistical power is the probability of correctly rejecting a false null hypothesis, thus avoiding a type 2 error. Power depends on the significance level, the effect size, and the sample size: a more stringent alpha lowers power, while a larger effect size or a larger sample raises it.

SingleSample1 Transcription

00:00 - 00:30 in this lecture we're going to be talking about single sample tests before we get started with the two single sample tests we'll talk about z-tests and t-tests we'll just run over a couple of elements of a review of the logic of hypothesis testing for starters we have inferential statistics and we've talked about this before and I'd like to discriminate inferential and descriptive statistics a
00:30 - 01:00 little bit more clearly descriptive statistics concern the properties of observed data so these are actual facts that we can discover from a particular sample we've taken these include distributions they include measures of central tendency they include measures of dispersion or spread and these facts that we discover about a specific data set allow us to make inferences or infer the properties of the underlying population distributions
01:00 - 01:30 that give rise to these specific facts within a sample so this includes hypothesis testing so when we test hypotheses we are testing ideas about population parameters they also include deriving estimates so we are deriving estimates about what it is that's going on in a particular population based on what we've seen or observed in our sample so that's inferential statistics
01:30 - 02:00 but what I would like to remind you of is that you can crunch all the numbers you want but those numbers cannot tell you whether the design of a study is flawed whether the data were appropriately collected whether your sample is representative of the population from which you drew it whether your theory is true or false so remember no matter what statistics you compute they won't tell you about any of these
02:00 - 02:30 other things they will allow you to make a statistical decision but only a statistical decision the rest of your decision needs to come from you I'd also like to remind you that absence of evidence is not the same as evidence of absence so evidence of absence is evidence that suggests that something does not exist however just because evidence for a particular phenomenon is missing does
02:30 - 03:00 not necessarily mean that it doesn't exist so when you go ahead and retain your null hypothesis or fail to reject your null hypothesis that does not necessarily mean that your research hypothesis is false there's also debate about what should be considered evidence whether a result really describes the existence of a phenomenon and what is the meaning of null results so these are all things that we need to think about and consider and that a simple statistical test
03:00 - 03:30 cannot tell you by way of review of p-values a p-value does not tell you the probability that the null hypothesis is true given the data it does tell you the probability of obtaining data or specifically the statistic you observed given that the null hypothesis is true so rejecting the null hypothesis does not mean that that null hypothesis is false
03:30 - 04:00 it simply means that the observed data are unlikely if the null hypothesis is true you might still have a type 1 error failing to reject the null hypothesis does not mean that the null hypothesis is true you could be making a type 2 error and please understand that even when the type 1 error rate is well controlled the type 2 error rate can be high and that can be high for many reasons your data might fail to
04:00 - 04:30 distinguish between your null and your research hypothesis because you have a small effect size maybe there's low precision in your measurement maybe you have a lot of measurement noise maybe you have low power and so forth so there are lots of reasons why you can get a type 2 error and they're not very well controlled I will also remind you that we consider a p-value of less than 0.05 statistically significant and a p-value that is not less than 0.05 not
04:30 - 05:00 statistically significant and that means that you are drawing an artificial boundary between a p-value of 0.049 and a p-value of 0.051 which give nearly the same strength of evidence against the null hypothesis both of those p-values tell you that a result is promising but not yet conclusive so that means we can't really make firm conclusions in either direction it would be absolutely wrong to fully discount P equals
05:00 - 05:30 0.051 it would be equally wrong to bet on the replicability of a result where P was equal to 0.049 so you need to remember that P equals 0.05 is an arbitrary threshold that is a cut off that people have decided is good enough but bear in mind that when you have p-values that are close to that threshold they might be inconclusive they might
05:30 - 06:00 make it harder to distinguish from a result and so what should you do if you get that actually a replication study is not the worst idea we should also talk about statistical versus practical significance just because a result is statistically significant does not mean that it is important statistical significance means that it is unlikely that that result is due to chance alone does that mean it has some kind of real world significance or value well let's
06:00 - 06:30 take this example article here this is a meta-analysis which is sort of a statistical analysis of previous studies on melatonin for the treatment of primary sleep disorders lots of people take melatonin they try to self-treat their sleep disorders lots of doctors prescribe melatonin especially naturopaths but it turns out that in randomized controlled trials which are double-blinded melatonin supplements
06:30 - 07:00 versus placebo doesn't actually do a heck of a lot melatonin supplements on average decrease sleep latency so how quickly you fall asleep by about seven minutes the total sleep time is increased by approximately eight minutes when you are taking melatonin well it sounds kind of expensive I don't know if anybody takes this but if you do please bear in mind that even though there is a statistically significant result this is a statistically
07:00 - 07:30 significant reduction in sleep latency and a statistically significant increase in sleep time but it's not very big seven minutes falling asleep seven minutes earlier sleeping for eight minutes longer think about whether that's worth it so it turns out that just because something is statistically significant doesn't make it particularly meaningful
07:30 - 08:00 and that's another thing that your statistical test can't tell you you have to decide that on your own so the effect size of a test gives us a standardized measure of its practical significance the idea there is if you gain an effect size measure that effect size should give you an approximate reasonable metric for understanding how important that effect is out in the real world for example that might be what's the
08:00 - 08:30 magnitude of the difference between a comparison group and a target group what's the strength of a correlation between an X variable and a y variable the effect size becomes especially important when sample sizes are very small or very large because the sample size affects the p-value relatively quite a lot so if you have a very small sample size even a large effect size might not reach statistical significance and conversely if you have a very large
08:30 - 09:00 sample size pretty super small effect sizes can result in the rejection of the null hypothesis now unlike the p-value the effect size is not affected by sample size and I say not with an asterisk because in theory it's not affected by sample size but there's still a little indirect element of sample size in the computational formula so you kind of need to take this with a sort of partial grain of salt and finally we have this notion of
09:00 - 09:30 statistical power so power we've said before is the probability that a study will correctly reject its null hypothesis when the null hypothesis is false so this is the likelihood of detecting an effect if there's one out there to detect power is related to the significance level so the more stringently you set your alpha criterion so if you take alpha of 0.05 and switch to alpha of 0.01 because you really don't want to make a type 1 error that's going to
09:30 - 10:00 increase the likelihood of your making a type 2 error and decrease your statistical power effect size the greater your effect size the greater your statistical power because you can imagine when you move the means of two distributions further apart there's less overlap between them they become more distinguishable more easily distinguishable and that's what we're talking about when we're talking about effect size and then finally sample size larger samples
10:00 - 10:30 provide greater power to detect statistical effects so we'll move on to single sample t-tests or single sample tests rather in the next video