Unpacking Single Sample Tests

SingleSample2


Summary

In this engaging exploration of single sample tests, Erin Heerey explains the methodology and reasoning behind comparing the mean of a single sample against a theoretical value. Through concrete examples, such as testing the protein content of nutrition bars and assessing the IQ scores of Western University (UWO) students, Heerey shows how researchers identify the population parameter, state null and research hypotheses, and interpret the result of a test statistic such as the z-score. She also introduces Cohen's d as an effect size, which expresses the difference between means in standard deviation units. Heerey emphasizes planning each test before data collection (fixing the hypotheses, significance level, and rejection region in advance) and cautions that the z-test assumes normally distributed scores, so conclusions need extra care when the population is non-normal and the sample is small.

Highlights

Single sample tests measure mean differences against theoretical values. 📊
These tests are crucial for checking claims, like a protein bar's advertised content. 🍫
The classic z-test helps determine whether a sample's mean IQ differs from the population mean. 📉
Through Erin's talk, we learn about calculating z-scores and effect sizes. 📐
Understanding normal distributions is key to interpreting single sample test results. 📎

Key Takeaways

Single sample tests compare a group's mean to a theoretical or known mean. 📊
They're useful for testing hypotheses against population parameters, like the normed mean IQ of 100. 🧠
Z-scores and effect sizes help quantify these differences statistically. 📈
The standard normal distribution supplies the critical values for comparison. 🔄
Single sample tests need careful setup before data collection to be valid. ✅

Overview

Erin Heerey captivates her audience with a deep dive into single sample tests and how they work. Single sample tests assess whether a group's mean is consistent with a known or theoretical mean. Whether checking the advertised protein content of a nutrition bar or comparing Western students' IQ scores with the population norm, single sample tests provide the framework for such comparisons. At their core, these tests are hypothesis tests: a prediction about the sample is evaluated against an established reference value.

Throughout the session, Heerey uses clear, tangible examples to illuminate these concepts. She explains how statistics such as the z-score and Cohen's d quantify how far a sample differs from a population parameter. The z-score shows where the sample mean falls on a known distribution, while Cohen's d measures the effect size, showing the practical magnitude of the observed difference. Heerey's explanations make these statistical concepts accessible and easy to follow.
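As a minimal sketch of the z-score idea (assuming Python 3.9+ and only its standard `statistics` module; `z_score` and `standardize` are hypothetical helper names, not from the talk), standardizing subtracts the mean and divides by the standard deviation, so a converted set of scores ends up with a mean of 0 and a standard deviation of 1:

```python
from statistics import mean, pstdev


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


def standardize(scores: list[float]) -> list[float]:
    """Convert raw scores to z-scores using the scores' own mean and
    population standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [z_score(x, mu, sigma) for x in scores]


# An IQ of 115 on a test normed to mean 100, SD 15 is one SD above the mean.
print(z_score(115, 100, 15))  # 1.0

# Any set of standardized scores has mean 0 and SD 1.
zs = standardize([90.0, 100.0, 110.0, 120.0])
print(round(mean(zs), 10), round(pstdev(zs), 10))  # 0.0 1.0
```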

A pivotal takeaway from Heerey's talk is the importance of planning. She stresses that the hypotheses, significance level, and rejection region should all be fixed before data collection begins, so that conclusions drawn from single sample tests are valid and robust. The z-test also assumes normally distributed scores, so this assumption deserves critical scrutiny, especially with smaller samples. Overall, Heerey's talk gives listeners a solid grounding in single sample testing.
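The planning step can be sketched as a small frozen record of the decisions made before any data arrive. This assumes the significance level and the direction of the test are the only choices being fixed, and `ZTestPlan` is a hypothetical name. The critical values come from `statistics.NormalDist().inv_cdf`, which returns the z below which a given proportion of the standard normal distribution falls:

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class ZTestPlan:
    """Hypothetical record of choices fixed before data collection."""
    alpha: float = 0.05  # significance level (Type I error rate)
    tail: str = "two"    # "two", "upper" (predict higher) or "lower" (predict lower)

    def critical_value(self) -> float:
        """Positive cutoff on the standard normal (mean 0, SD 1) distribution."""
        std_normal = NormalDist()
        if self.tail == "two":
            # Split alpha across both tails, e.g. 2.5% in each for alpha = .05.
            return std_normal.inv_cdf(1 - self.alpha / 2)
        if self.tail in ("upper", "lower"):
            # Put all of alpha in the predicted tail.
            return std_normal.inv_cdf(1 - self.alpha)
        raise ValueError(f"unknown tail: {self.tail!r}")

    def reject(self, z: float) -> bool:
        """True if a computed z falls in the planned rejection region."""
        cutoff = self.critical_value()
        if self.tail == "two":
            return abs(z) > cutoff
        if self.tail == "upper":
            return z > cutoff
        return z < -cutoff  # "lower": rejection region in the lower tail


one_tailed = ZTestPlan(alpha=0.05, tail="upper")
two_tailed = ZTestPlan(alpha=0.05, tail="two")
print(round(one_tailed.critical_value(), 3))  # 1.645
print(round(two_tailed.critical_value(), 2))  # 1.96
print(one_tailed.reject(4.0))                 # True
```

Freezing the dataclass mirrors the advice in the talk: once the plan is written down, its cutoff cannot be quietly changed after the data have been seen.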

Chapters

00:00 - 00:30: Introduction to Single Sample Tests This chapter introduces single sample tests, which compare the mean of a single sample to a theoretical comparison value.
00:30 - 01:00: The Hypothetical Population Mean The comparison value can be a hypothetical population mean, a population mean estimated from a prior sample, or a target value, such as a protein bar's claim of 20 grams of protein per bar.
01:00 - 01:30: Examining Protein Bar Claims A sample of protein bars can be chemically analyzed to measure the grams of protein in each bar. Batches vary, so individual bars fall a little above or below 20 grams, and the sample mean is compared with the target value advertised on the packaging.
01:30 - 02:00: Details of Single Sample Tests The test statistic is a signal-to-noise ratio: the difference between the sample mean and the comparison value (the signal) divided by a measure of sampling error (the noise).
02:00 - 03:00: Using the Zed or Z-Test A single sample test compares sample data with a known or hypothesized population parameter. The running example asks whether Western students are smarter than average, using IQ tests normed to a mean of 100 with a standard deviation of 10 or 15, depending on the test. The simplest such comparison is the z-test.
03:00 - 04:30: Violin Plot of the Sample Data IQ scores from 132 UWO students recruited through a mass email have a mean of 103.48 and a standard deviation of 8.65. A violin plot with an orange reference line at 100 shows the median (about 103.5) sitting slightly above the population parameter, along with the interquartile range and the range of the data excluding outliers.
04:30 - 05:30: Understanding Z Scores Z-scores are standardized scores that put values from different scales on a common scale. The standard normal distribution of z-scores has a mean of 0 and a standard deviation of 1, so a z-score gives the number of standard deviations a score lies above or below its mean.
05:30 - 07:30: Steps in the Z Test Identify the population parameter (an average IQ of 100), state the null and research hypotheses in directional or non-directional form, choose the test statistic and significance level, and identify the rejection region and any exclusion criteria, all before data collection begins.
07:30 - 09:00: Directional vs Non-Directional Tests A one-tailed test at p = .05 places the whole rejection region in the predicted tail, with a critical value of z = 1.645. A two-tailed test splits the 5% across both tails (2.5% each), giving critical values of ±1.96.
09:00 - 11:30: Conducting the Z Test Because its critical values come directly from the normal distribution, the z-test is simple to apply. The statistic is z = (M - μ)/(σ/√n), where M is the sample mean, μ the population mean, σ the population standard deviation, and n the sample size. For this sample, z = (103.48 - 100)/(10/√132) = 3.48/0.87 ≈ 4.
11:30 - 12:30: Statistical Significance and Effect Size A z of about 4 lies far inside the rejection region, so the sample of Western students scores significantly above the population average of 100. How large that difference is in practical terms is a separate question, answered by an effect size measure.
12:30 - 14:00: Cohen's D for Effect Size Cohen's d = (M - μ)/σ expresses the difference in standard deviation units. Here d = 3.48/10 = 0.348, judged against the conventional benchmarks of 0.2 (small), 0.5 (medium), and 0.8 (large). Cohen's d is also the effect size used for t-tests.
14:00 - 15:00: Importance of Normal Distribution A z-test shows, in standardized terms, how far a sample mean is from its population parameter. It assumes normally distributed scores, so when the population is strongly non-normal, conclusions need rethinking, especially with small samples.
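The IQ example worked through in the talk (sample mean 103.48, normative mean 100, normative standard deviation 10, n = 132) can be reproduced in a few lines. This is a plain-Python sketch with hypothetical helper names; `label_d` applies the conventional 0.2/0.5/0.8 benchmarks to the magnitude of d, and calling anything under 0.2 "below small" is an assumption of this sketch:

```python
from math import sqrt


def one_sample_z(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """Signal (sample_mean - mu) divided by noise (standard error sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu) / standard_error


def cohens_d(sample_mean: float, mu: float, sigma: float) -> float:
    """Mean difference expressed in units of the population standard deviation."""
    return (sample_mean - mu) / sigma


def label_d(d: float) -> str:
    """Conventional benchmarks applied to |d|: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


z = one_sample_z(103.48, 100, 10, 132)
d = cohens_d(103.48, 100, 10)
print(round(z, 2), round(d, 3), label_d(d))  # 4.0 0.348 small
```

Unrounded, z comes out near 3.998, which is the "z-score of four" quoted in the talk; d = 0.348 sits between the small and medium benchmarks.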

SingleSample2 Transcription

00:00 - 00:30 single sample tests when we use a single sample test what we're interested in doing is we're interested in comparing the mean of a particular group a single sample to a theoretical comparison value so the logic of single sample tests is like this we measure differences between a sample and a theoretical value and we compare that sample to for example a
00:30 - 01:00 hypothetical population mean and that might also be a previously estimated population mean for example if you took a prior sample and you had a mean or a population estimate from there and you wanted to see if a new sample was the same or different or some kind of a target value for example if someone makes the claim let's say you have a protein bar and your protein bar is claiming that each bar contains 20 grams of protein well you could actually do the
01:00 - 01:30 chemical analysis and decomposition of your protein bar and you could look at how much protein was in each protein bar and different batches will have different levels so they're probably 20 grams on average some of them will be a little more some of them will be a little less and that will depend so you could actually take a sample of protein bars and you could decompose them and you could measure how many grams of protein each bar had and then you can compare it to the target value that was being advertised on the outside of the packaging
01:30 - 02:00 so that's what we mean when we're doing a single sample test what we're doing is we're examining the difference between the sample mean and the comparison value so our test statistic is computed as a difference in the numerator divided by a measure of the sampling error this is a classic signal to noise ratio the signal is the difference between the
02:00 - 02:30 sample mean and the comparator and the noise is the sampling error so a single sample test is a hypothesis test in which data from a sample are compared with a known or hypothesized population parameter for example scores or percentages and we often use these to compare a sample of individuals with some known score or population value so we could ask the question are Western students smarter than average well what's average
02:30 - 03:00 average might actually be the normative value on the IQ test so many IQ tests are normed so that they have a mean of 100 and a standard deviation of some number maybe 10 maybe 15 it depends on the test but what we could do is we could take a random sample of Western students we could administer an IQ test and we could compare the average result to 100 which is our normative value this is the value that the average person
03:00 - 03:30 scores on that IQ test and the simplest comparison we could make is a comparison called the zed or z-test so let's imagine that we have some data so these are data actually taken from 132 UWO students who responded to a mass email recruitment advertisement they took part in a study and I won't tell you what study it was because it might still be ongoing but they took an
03:30 - 04:00 IQ test as part of a larger study so below is a violin plot of the data or at least of the data at the time when I talked to the researcher the mean of these data is 103.48 and this data set has a standard deviation of 8.65 so this is a violin plot of the data you can see what I've done is drawn an orange line here at the comparison parameter of 100 and you can see that the median here
04:00 - 04:30 with this little white dot is a little bit higher it's about 103.5 ish give or take so the median's a little bit higher than the population parameter that we're comparing it to and you can see that it has this nice distribution here's the bottom and the top of the IQR the interquartile range which holds the middle 50 percent of the scores and this line here is the range of all the data excluding any outliers
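In a violin plot like the one described, the thick inner bar typically marks the interquartile range: the span from the first quartile (Q1) to the third quartile (Q3), which holds the middle 50 percent of the scores, with the median inside it. A minimal sketch using Python's `statistics.quantiles` with its default "exclusive" method (plotting libraries may use a slightly different quartile convention); `box_summary` is a hypothetical helper and the scores are made up for illustration, not the study's data:

```python
from statistics import median, quantiles


def box_summary(scores: list[float]) -> dict[str, float]:
    """Median, quartiles, and IQR: the values a violin plot's inner bar marks."""
    q1, _q2, q3 = quantiles(scores, n=4)  # three cut points split data into quarters
    return {"median": median(scores), "q1": q1, "q3": q3, "iqr": q3 - q1}


# Hypothetical IQ-like scores for illustration only.
print(box_summary([95, 98, 100, 102, 103, 104, 106, 109, 112]))
# {'median': 103, 'q1': 99.0, 'q3': 107.5, 'iqr': 8.5}
```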
04:30 - 05:00 so before we move on to the z-test let's review what we mean by a z-score so a z-score is what we call a standardized score standardization allows us to compare values that would otherwise be difficult to compare because they're on a different scale or perhaps because they're qualitatively different right this is like comparing apples to oranges so the standard normal distribution is a distribution of z-scores it has a mean
05:00 - 05:30 of zero and a standard deviation or sigma of one the scores are standardized by converting them to z-scores and so a z-score describes how many standard deviations a score is above or below its mean so a z-score of one means that a score is exactly one standard deviation above the mean so the steps in the z-test are first of course we need to identify the population parameter for comparison so
05:30 - 06:00 here we're using an average IQ of 100 as our comparison parameter for the population then we define the null value and state the null hypothesis so the null value is the value of the population parameter that we're comparing the sample to so here our null value is 100 and we're asking whether the average Western student is different from or similar to 100.
06:00 - 06:30 now we can do this in a directional or non-directional way so we're going to state our null hypothesis if we are making a non-directional hypothesis then we might say Western students do not differ from the average IQ that's a non-directional null hypothesis a directional null hypothesis might be that Western students are either the same as or lower than the average IQ we also need to state our research hypotheses
06:30 - 07:00 so these can be non-directional as in Western students differ from the average IQ or directional as in Western students are smarter than average we could predict or hypothesize or suppose so these are our hypotheses so we have our population parameter we have our hypotheses the next thing we need to do is get all of the little pieces together so we get the formula for the computed value of the test statistic we decide what kind of test we're using are we
07:00 - 07:30 using a z-test a one-sample t-test are we testing proportions what are we doing are we excluding any data what are we doing with our sample and then we're selecting a significance level or p-value threshold and we're going to identify the rejection region for that value as well as any criteria for rejecting or excluding participants all of these steps we're doing before we begin data collection so what we're trying to do is determine where our rejection region is so if we
07:30 - 08:00 have a directional or one-tailed test we are looking for a specific cutoff in our distribution where 95 percent of the distribution falls below it if we're predicting that Western students are smarter than average we're looking at this upper end if we're predicting Western students are less smart than average then we would be looking down here at the lower end instead but we're predicting that Western students are going to be smarter than average and so we have our rejection region which starts at a z-score of 1.645
08:00 - 08:30 if you're curious about where that number comes from go back and look at the normal distribution we'll talk about that a little bit later if we have a non-directional or two-tailed test then we have to make a non-directional cutoff value so what we're doing is we're taking our p equals 0.05 which is a five percent chance of making an error of making a Type I error let's be specific and we're taking that five percent
08:30 - 09:00 chance and we're splitting it across the two tails so what we are doing is we are taking five dividing by two that gives us 2.5 and so we're looking for critical values of z where 2.5 percent of this curve is below the lower one and 2.5 percent of the curve is above the upper one and so anything that's more extreme than these lines falls into the rejection region the critical value of z there is plus or minus 1.96 again we'll talk about this
09:00 - 09:30 when we talk about the normal curve we then compute the formula elements of the test statistic and we make our statistical decision about whether or not to reject the null hypothesis based on whether the computed value of our z statistic is in the rejection region and I will reiterate that steps one through five should be done before data collection begins so a note about zed tests they're
09:30 - 10:00 relatively easy because they're directly compared to the normal distribution so we have these very fancy critical or threshold values that in a z-test we can just pull directly off the normal distribution in other tests they need to be determined based on theoretical distributions they can also be determined based on empirically derived ones but we often use these theoretical distributions so the Z test formula is actually relatively simple Z is equal to
10:00 - 10:30 the mean of the sample minus the mean of the population so remember we were comparing Western students' IQ which was 103.48 and we're comparing that against our population parameter of 100 and then what we're doing is we're dividing by sigma divided by the square root of n so sigma is the standard deviation of
10:30 - 11:00 the population this is our normative standard deviation from our IQ test and n is the number of participants in the sample and we take the square root of that number so it'll depend on how many participants were in the sample that will give us the square root of n so when we calculate that out for this specific sample what we're interested in looking at is the average Western student's IQ is 103.48
11:00 - 11:30 minus 100 we have a standard deviation for the distribution of IQ scores as related to that normative test that standard deviation is 10 and we had 132 students in the sample so we're dividing 10 by the square root of 132. so if we actually kind of calculate that out 103.48 minus 100 equals 3.48 and then we have 10 divided by 11.49 which is the
11:30 - 12:00 square root of 132 if you look at your calculator if we further simplify that it's 3.48 divided by 0.87 because this ratio comes out to 0.87 and if we divide 3.48 by 0.87 what we get is a z-score of about four now is that statistically significant well it sure is it falls way up here in our z distribution it's four standard deviations above the mean so what we can
12:00 - 12:30 say is that Western students are significantly smarter than the population average of 100. now one question we can ask is is this a big effect and to do that we can compute an effect size measure now when we're talking about effect size measures for the differences between means the effect size measure we typically use is called Cohen's d and Cohen's d is nice because it's interpreted in units of standard deviation
12:30 - 13:00 and the Cohen's d that we calculate has a very simple formula d equals the mean of the sample minus the mean of the population divided by the standard deviation of the population so here we have 3.48 which is the mean of our sample minus the mean of our population divided by 10 which is the standard deviation of IQ scores in the population and that gives us a Cohen's d of 0.348 so
13:00 - 13:30 we are 0.348 standard deviations above the mean so a Cohen's d of 0.2 is usually considered small a Cohen's d of 0.5 is considered medium and a Cohen's d of 0.8 is considered large I will tell you that the size of Cohen's d has a minimum of zero but there's no upper limit and so when we're thinking about effect
13:30 - 14:00 sizes we want to consider how big the standardized difference is between the means we just examined the mean of our sample relative to that population mean and we'll use this Cohen's d effect size metric when we do t-tests as well and there are several kinds of t-tests all of which use Cohen's d as a metric so essentially what we can learn from a z-test is in standardized terms how
14:00 - 14:30 different a sample mean is relative to its population parameter now this assumes a normal distribution of scores and in some ways the test also seems to enforce it because scores are converted to z-scores but converting to z-scores only rescales the scores and does not change the shape of their distribution so if our population is very non-normally distributed we always need to be careful and rethink our conclusions especially if the sample size is on the smaller end
14:30 - 15:00 and we'll leave off there and pick up in the next video