Chi Sq 1

Summary

In this lecture, Erin Heerey introduces tests of association for categorical data and reviews the core elements of hypothesis testing. She walks through the steps of the process: designing a study, specifying hypotheses, and setting the rejection criterion and data-handling rules before data collection to prevent bias, then collecting data, checking the assumptions of the chosen test, and reaching a statistical decision by comparing the observed statistic with a critical value. The lecture covers directional and non-directional hypotheses, Type 1 and Type 2 errors, what the p = 0.05 criterion does and does not tell you, and how effect size and the position of the decision criterion change the error rates. It sets the stage for the chi-squared tests and categorical data covered next.

Highlights

Erin Heerey explores tests of association specifically for categorical data and hypothesis testing. 🎓
Designing a study, formulating hypotheses, and setting criteria are crucial before starting data collection to avoid bias. 🧐
The lecture highlights the importance of understanding errors, particularly Type 1 and Type 2 errors in hypothesis testing. 🚨
Heerey emphasizes the role of p-values in hypothesis testing and decision-making processes. 📊
A larger effect size pulls the group means further apart, which reduces the overlap between the distributions and the chance of a Type 2 error. 📈
Moving the decision line on the null distribution trades one error for the other: a stricter criterion lowers Type 1 errors but raises Type 2 errors. 🧠

Key Takeaways

Understanding tests of association with categorical data is crucial for accurate analysis. 🤓
Good study design and hypothesis specification are essential to prevent bias. 📊
Recognizing Type 1 and Type 2 errors helps refine statistical decision-making. ⚖️
The p = 0.05 criterion is tied to Type 1 errors: it is the accepted probability of rejecting the null hypothesis when it is true. 📉
A larger effect size reduces the likelihood of Type 2 errors. 🚀
Shifting the decision criterion lowers one error probability at the cost of raising the other. 🎯

Overview

Erin Heerey opens the lecture by introducing tests of association for categorical data and reviewing the hypothesis testing process. She covers designing the study, specifying hypotheses, setting the rejection criterion, and fixing data-handling rules before any data are collected in order to limit bias. The comparison of categorical and continuous data is left for the next lecture video.

The lecture then turns to statistical errors, in particular Type 1 and Type 2 errors. Heerey explains that the p = 0.05 criterion is the probability of rejecting the null hypothesis when it is true, so it is tied to Type 1 errors and says nothing about the likelihood of a Type 2 error.
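
The four possible outcomes of a decision described in the lecture (two correct decisions, a Type 1 error, and a Type 2 error) can be sketched as a small lookup. The function name `classify_outcome` is hypothetical, and the second argument stands for the true state of the world, which in a real study is never known.

```python
def classify_outcome(reject_null: bool, null_is_true: bool) -> str:
    """Label a decision against the true state of the world (unknown in practice)."""
    if reject_null and null_is_true:
        # Rejected a null hypothesis that was actually true.
        return "Type 1 error"
    if not reject_null and not null_is_true:
        # Failed to reject when the research hypothesis was actually true.
        return "Type 2 error"
    return "correct decision"


for reject_null in (False, True):
    for null_is_true in (True, False):
        decision = "reject" if reject_null else "fail to reject"
        truth = "null true" if null_is_true else "research hypothesis true"
        print(f"{decision:15s} | {truth:25s} | {classify_outcome(reject_null, null_is_true)}")
```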

Finally, Heerey discusses effect size, the difference between the means of the null and research distributions. A larger effect size reduces the overlap between the two distributions and therefore the chance of a Type 2 error, while moving the critical value (for example from the 95th to the 99th percentile) lowers the Type 1 error rate at the cost of more Type 2 errors. The lecture ends by pointing ahead to chi-squared tests and categorical data in the next video.
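
As a rough illustration of the effect-size point, the sketch below computes the Type 2 error rate for several mean separations while keeping the criterion at the 95th percentile of the null distribution. It assumes, purely for illustration, that both distributions are normal with a standard deviation of 1; the lecture does not specify their shape, and in a real study the research distribution is unknown, so this rate cannot be computed from the data. The function name `type2_error_rate` is hypothetical.

```python
from statistics import NormalDist


def type2_error_rate(effect_size: float, alpha: float = 0.05) -> float:
    """P(fail to reject | research hypothesis true) for an upper-tail test.

    Assumption for illustration: the null distribution is standard normal and
    the research distribution has the same spread, shifted up by effect_size.
    """
    null_dist = NormalDist(mu=0.0, sigma=1.0)
    research_dist = NormalDist(mu=effect_size, sigma=1.0)
    # Decision line at the (1 - alpha) quantile of the null distribution.
    criterion = null_dist.inv_cdf(1.0 - alpha)
    # Research-distribution scores below the line lead to "fail to reject".
    return research_dist.cdf(criterion)


for effect_size in (0.5, 1.0, 2.0, 3.0):
    rate = type2_error_rate(effect_size)
    print(f"mean separation {effect_size:.1f}: Type 2 error rate = {rate:.3f}")
```

Under these assumptions the printed rate falls as the means move apart, which is the pattern the lecture describes.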

Chapters

00:00 - 00:30: Introduction to Test of Association and Hypothesis Testing This chapter introduces the concept of testing for associations, focusing specifically on how it applies to categorical data. It provides an overview of analyzing dependencies in categorical data, along with a brief review of some fundamental elements of hypothesis testing.
00:30 - 01:00: Steps in Conducting Hypothesis Tests When conducting hypothesis tests, there are multiple key steps involved. Initially, one must design a study and concurrently specify the hypotheses, as hypothesizing without a study idea is illogical. Consequently, the first step is forming the study idea and establishing the hypotheses. Following this, a criterion for rejecting the null hypothesis is set, which involves determining the probability of Type I errors.
01:00 - 01:30: Design and Specification of Hypotheses The chapter on 'Design and Specification of Hypotheses' discusses the importance of establishing clear criteria for managing data prior to data collection. It highlights the necessity of pre-determining methods for scoring, cleaning, and excluding data or participants to avoid bias. The chapter emphasizes that these processes should be defined before collecting any data, ensuring objectivity and reliability. After data collection, the chapter proceeds to explain the steps of calculating descriptive statistics and creating visualizations.
01:30 - 02:00: Data Collection and Descriptive Analysis This chapter covers conducting the hypothesis test, starting with checking the assumptions of the test being used. It recalls that z-tests and formula-based confidence intervals rely on assumptions that must hold for the theoretical formula to be valid, in contrast to the bootstrapped confidence intervals computed earlier in the course.
02:00 - 02:30: Conducting Hypothesis Tests and Checking Assumptions This chapter focuses on the process of conducting hypothesis tests and ensuring that the assumptions associated with each test are properly checked. It highlights the importance of making statistical decisions based on the observed score in relation to a critical value. The interpretative aspect of hypothesis testing is also discussed, emphasizing the necessity of understanding where the observed data falls in comparison to a predefined criterion.
02:30 - 03:00: Statistical Decision and Critical Values The chapter explains the concept of statistical decision-making and critical values, focusing on how to interpret critical values in statistical tests. It discusses the use of critical values in a randomization test, specifically the point in a distribution where 95% of the scores fall below. Anything beyond this 95% threshold falls into the rejection region, which is a crucial part of making statistical decisions.
03:00 - 03:30: Directional and Non-directional Tests This chapter explains that the null hypothesis is rejected when the observed test statistic is more extreme than the critical value. For a directional hypothesis that expects a larger value, for example that the mean of group 2 is bigger than the mean of group 1, the critical value is the point below which 95% of the distribution falls, and the rejection region lies beyond it.
03:30 - 04:00: Understanding p-values and Errors in Hypothesis Testing This chapter introduces the concept of p-values and errors in hypothesis testing. It emphasizes the importance of determining regions under the distribution curve where a significant portion of scores fall, such as identifying areas where 95% of scores are below a certain threshold. The chapter uses an example of a treatment program, like a running plan, to illustrate how expectations of outcomes (e.g., faster running times) can be evaluated using these statistical concepts.
04:00 - 04:30: Distributions and Decision Making in Hypothesis Testing The chapter continues the running example: as people keep training, their running times are expected to drop, so the treatment group should have lower values. For a directional test in that direction, the critical value is the score above which 95% of the distribution lies, so only the lowest (fastest) 5% of scores fall in the rejection region. The chapter closes by introducing the non-directional test.
04:30 - 05:00: Type 1 and Type 2 Errors This chapter covers the non-directional test, in which no direction is specified for the effect and the hypothesis is only that the groups differ. The focus is on the proportion of error that can be tolerated: with p = 0.05, the researcher is willing to be wrong 5% of the time. This error is split between the two tails, and anything outside the central region falls in the rejection region.
05:00 - 05:30: Overlapping Distributions and Errors The chapter discusses overlapping distributions focusing on the rejection region, which is defined by two critical values. It begins by explaining that 2.5% of the distribution lies above one critical value, and 2.5% lies below the other. Any score that falls in these regions indicates a more extreme result and leads to the rejection of the null hypothesis. If a score does not fall in the rejection region, it is considered closer to the mean, suggesting the null hypothesis cannot be rejected.
05:30 - 06:00: Effect Size and Error Rate Management This chapter discusses the concept of effect size and error rate management in hypothesis testing. It highlights the decision-making process regarding rejecting or retaining the null hypothesis based on the mean and the rejection region. The chapter also delves into the role of p-values in determining the probability of rejecting the null hypothesis both when it is true and when it is false.
06:00 - 06:30: Error Zones in Distributions The chapter titled 'Error Zones in Distributions' deals with the concept of decision-making in the face of uncertainty, particularly in scientific experimentation. It emphasizes that the true state of the world is unknown, and this unknown is a fundamental reason for conducting experiments or scientific inquiries—to seek answers or uncover truths. The chapter underscores that we never truly know if the null hypothesis is true or false, but decisions must be made regardless of these uncertainties.
06:30 - 07:00: Introduction to Chi-Squared and Categorical Data This chapter explains that statistical decisions rest on where the observed statistic falls relative to its critical value. It describes the two correct outcomes: failing to reject a null hypothesis that is true, and rejecting a null hypothesis that is false, in which case the research hypothesis is true.
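
The critical values described in the chapters above (upper tail, lower tail, and two-tailed) can be sketched on a null distribution built by a randomization test. Everything specific here is an assumption for illustration: the scores are made up, the statistic is a difference in group means, and `randomization_null` and `percentile` are hypothetical helpers, not the procedure used in the course lab.

```python
import random


def randomization_null(group_a, group_b, n_shuffles=5000, seed=0):
    """Sorted differences in means (B minus A) after shuffling group labels."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        diffs.append(mean_b - mean_a)
    return sorted(diffs)


def percentile(sorted_values, p):
    """Simple percentile lookup on an already sorted list (p between 0 and 100)."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]


# Hypothetical scores, invented for this example.
group_a = [12, 15, 11, 14, 13, 16, 12, 15]
group_b = [14, 17, 15, 18, 13, 16, 17, 19]

null = randomization_null(group_a, group_b)
observed = sum(group_b) / len(group_b) - sum(group_a) / len(group_a)

upper = percentile(null, 95)   # directional, expecting B larger: reject above this
lower = percentile(null, 5)    # directional, expecting B smaller: reject below this
two_tailed = (percentile(null, 2.5), percentile(null, 97.5))  # non-directional

print(f"observed difference: {observed:.3f}")
print(f"upper-tail critical value: {upper:.3f}, reject: {observed > upper}")
print(f"lower-tail critical value: {lower:.3f}, reject: {observed < lower}")
print(f"two-tailed critical values: {two_tailed[0]:.3f}, {two_tailed[1]:.3f}, "
      f"reject: {observed < two_tailed[0] or observed > two_tailed[1]}")
```

Splitting the 5% between two tails pushes each critical value further from the center than in the one-tailed case, which is why a non-directional test needs a more extreme statistic to reject.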

Chi Sq 1 Transcription

00:00 - 00:30 so in this lecture we're going to be talking about tests of association and specifically the kind of test of association that we use with categorical data as an overview we will also in addition to talking about categorical data and how we analyze dependencies in those we will just review a couple of elements of hypothesis testing so in terms of the hypothesis testing
00:30 - 01:00 process when we conduct hypothesis tests we do a number of steps the first thing we do is design a study and related to that study design we specify hypotheses it doesn't really make sense to specify hypotheses when we don't have a study idea in mind so we develop our study idea and we specify our hypotheses we then set a criterion for rejecting our null hypothesis so this is the probability of type 1 errors we're
01:00 - 01:30 willing to accept and we specify a criterion for handling things like data how we will score the data how we will clean the data how we will exclude participants and so forth we do all of these processes before we ever collect any data and it's good to do that in advance because that prevents bias in your data we then collect the data and calculate descriptive statistics and visualizations we then conduct our
01:30 - 02:00 hypothesis test so that includes checking the assumptions associated with whatever hypothesis test we have so remember when we talked about z-tests or when we talked about calculating confidence intervals there were some assumptions we needed to make in order to use a theoretical formula so this was not for calculating the confidence interval from the bootstrapping test that we did this is when we have a distribution we
02:00 - 02:30 have to make sure each test that we do has assumptions associated with it so we need to check our assumptions and finally the last thing we do is we interpret our results and come to a statistical decision so the hypothesis testing process looks something like that and the statistical decision we make will depend on where our observed score for whatever our test is where it falls relative to a critical value relative to
02:30 - 03:00 a threshold so what we did when we did a randomization test in the lab was we looked at a critical value the point in a distribution where 95% of the scores fall below that particular point and anything more extreme than that 95% threshold fell into our rejection region and so that's how we make these statistical decisions if our
03:00 - 03:30 observed test statistic is more extreme than our critical value then we can reject the null hypothesis so what we need to do is we need to find our 95% rejection region if we have a directional hypothesis so we might speculate that something improves treatment so we're expecting a larger number we're expecting for example the mean of group 2 to be bigger than the mean of
03:30 - 04:00 group 1 or a stronger effect in a positive direction whichever way we're going um then what we need to make sure we do is we want to find the place in our distribution where 95% of the scores fall below that region sometimes we're expecting reductions where we might say oh people who do a treatment program those people are going to get faster for example if you had a running program and you wanted to help people you know a couch to 5K plan where
04:00 - 04:30 people are going to little by little start running and as they engage in that process their running times will reduce and that makes sense so here what you might expect is your treatment group gets faster so they have lower values and there you would want to find a critical value where 95% of your distribution was above that score is that starting to make sense if you have a non-directional test oops
04:30 - 05:00 you're not specifying a direction for the effect you're just going to say there's a difference between these groups you're not going to specify what that difference is then what we're doing is we're splitting the proportion of error we're willing to accept so if P equals 0.05 we're willing to be wrong five percent of the time and so what we have to do there is we have to split that error into two and so what we're saying is anything outside of the central region
05:00 - 05:30 here of this graph where 2.5 percent of the distribution is above and 2.5 percent is below those critical values anything more extreme than either one of those so it could be in a positive direction or it could be in a negative direction we're not specifying um any score that falls into that region is in the rejection region and we reject the null hypothesis if the score does not fall into the rejection region it falls closer to the
05:30 - 06:00 mean than the rejection region then we need to retain the null hypothesis we cannot reject the null hypothesis so that's what we're doing when we make that rejection decision and finally p-values the probability of rejecting the null hypothesis when it is true and when it is not true that depends on the p-value so if we have a and remember from before
06:00 - 06:30 we have a decision and we have the true state of the world and remember the true state of the world is something we don't actually know there's no point in conducting an experiment in doing science when we know already the answer so this is something that we don't know it's a question we want to answer something we want to find out so we never know whether the null hypothesis is really true or whether it's really false we never know that um and then we make decisions based on
06:30 - 07:00 the relative difference between an observed statistic and its critical value is the observed statistic in the rejection region so when we fail to reject the null hypothesis and it's true that's a correct decision when we reject the null hypothesis here and the null hypothesis is actually false the research hypothesis is true
07:00 - 07:30 then we've also made a correct decision when we reject the null hypothesis and the null hypothesis is true that's an error that's a type 1 error in fact and when we fail to reject the null hypothesis when the research hypothesis is true that's a type 2 error so remember what those are the p-value is the probability of rejecting the null hypothesis given that the null hypothesis is true
07:30 - 08:00 and it holds no information about what happens when the research hypothesis is true we have no ability to guess the likelihood of the research hypothesis being true we can only guess the likelihood that the null hypothesis is false so what does this look like so usually we might have so let's say we have two groups right we have two distributions so we have a distribution here this blue one here
08:00 - 08:30 its mean is here and for any score from this distribution the null hypothesis is true in the orange distribution the research hypothesis is true you can see the big thick orange line in the center of that distribution is the mean of that distribution and this blue line here is at the 95th percentile of the blue distribution so
08:30 - 09:00 this is where we say any score bigger than this dashed blue line here will reject the null hypothesis and any score smaller than that blue line will retain the null hypothesis so these are our decisions the decisions that we make so any score that's on this side of this little blue line right here we retain the null hypothesis we fail to reject the null hypothesis
09:00 - 09:30 and usually we say fail to reject the null hypothesis rather than retain but it was shorter to write and on this side of the line we reject the null hypothesis so that's going to be our decision criterion this is the 95th percentile of the blue distribution so but what you can see is that actually there's a fair bit of the orange distribution that kind of overlaps with
09:30 - 10:00 the blue distribution and we know there are some values in the blue distribution here these values right here where you could get the blue data point here that's sitting right underneath this orange distribution that blue data point could be the one you observed it's bigger than your rejection level right here it's bigger than 95 percent and so if you got that blue data point
10:00 - 10:30 here you would be making a statistical error which kind of error would you be making I want you to think about that now it's also the case that this orange distribution overlaps relatively deeply into the blue territory right so the research hypothesis is true for any value that comes from this orange distribution so you could get this value right here you could get this value right down here this is even lower than the mean of the blue distribution isn't it but if you
10:30 - 11:00 got that value way down there your decision would be to retain the null hypothesis or fail to reject the null hypothesis you would say oh nothing's going on in the data and in fact your score is from the orange distribution but you simply can't see that so when you reject the null hypothesis and the null hypothesis is true right down here these blue these blue
11:00 - 11:30 values I guess they're sort of a I don't even know what color that is it's orange overlaid by blue or blue overlaid by orange really um so this little area right here this 95th percentile of the distribution where the null hypothesis is true that's your type 1 error when you reject the null hypothesis and the null hypothesis is true your type 2 error is down here you retain the null hypothesis when in fact
11:30 - 12:00 the research hypothesis is true so this is kind of a way to think about where these errors fall in these distributions and how they happen and what you can see is because we know what the true distribution of the null hypothesis is and we can generate it in fact you've generated several of them you've generated some distributions of sample means you generated a distribution of sample percentages in the lab I think last lab you calculated the
12:00 - 12:30 randomization test on the gender data so you know what the distribution looks like when the null hypothesis is true and we draw our critical value our criterion for rejection on that distribution so we don't actually know how much overlap there is between the orange distribution where the research hypothesis is true and the blue distribution where the null hypothesis is true it could be a little it could be a lot so we never actually know what our
12:30 - 13:00 probability of making a type 2 error is and all of these orange values that fall to this side here the left of our decision criterion are ones where we're going to fail to reject the null hypothesis we're going to retain the null hypothesis even though those scores come from the alternate hypothesis distribution because we never know that right if you get a score that's on this side of the
13:00 - 13:30 line you're going to retain the null hypothesis if your score is on this side of the line you're going to reject the null hypothesis now you can see a little bit how these errors occur and why the p-value that P equals 0.05 is strongly tagged to type 1 errors here but not at all to type 2 errors and we can't understand what the likelihood is of a type 2 error given the p-value it doesn't tell us anything about that so the effect size is the difference
13:30 - 14:00 between this mean and this mean so one of the ways to reduce your likelihood of making type 2 errors is to make sure those means are further apart so if you can find a stronger measure a measure that has better ability to discriminate group A from group B right whether your research hypothesis is true or not that's an increase in your effect size and that will pull those distributions
14:00 - 14:30 further apart and create less overlap down here that's going to reduce your likelihood of making type 2 errors if you want to reduce your likelihood of making type 1 errors you can actually move this line so this is the 95th percentile right here you could move it further this way to the let's say the 99th percentile over here now that would increase your probability of making a type 2 error but it would certainly decrease your probability of making a type 1 error you can sort of see if you think about moving that decision line around what that would look like
14:30 - 15:00 so that's the idea with these kind of error zones and how we think about them under the distributions when the null hypothesis is true and when the research hypothesis is true so we're going to move on now to categorical data and we're going to talk about chi-squared and we're going to start with a review of categorical versus continuous data I'm going to do that in the next lecture video
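
The trade-off from moving the decision line can be sketched numerically. The example below assumes, for illustration only, that the blue (null) and orange (research) distributions are both normal with a standard deviation of 1 and means two units apart; the lecture does not give these values, and in a real study the orange distribution is unknown. The helper `error_rates` is hypothetical.

```python
from statistics import NormalDist

# Assumed shapes for illustration: both normal, equal spread, means 2 apart.
null_dist = NormalDist(mu=0.0, sigma=1.0)      # "blue": null hypothesis true
research_dist = NormalDist(mu=2.0, sigma=1.0)  # "orange": research hypothesis true


def error_rates(criterion_percentile: float) -> tuple[float, float, float]:
    """Return (criterion, Type 1 rate, Type 2 rate) for an upper-tail decision line."""
    criterion = null_dist.inv_cdf(criterion_percentile)
    type1 = 1.0 - null_dist.cdf(criterion)  # null scores beyond the line
    type2 = research_dist.cdf(criterion)    # research scores short of the line
    return criterion, type1, type2


for pct in (0.95, 0.99):
    criterion, type1, type2 = error_rates(pct)
    print(f"line at {pct:.0%} of null: criterion = {criterion:.3f}, "
          f"Type 1 = {type1:.3f}, Type 2 = {type2:.3f}")
```

Moving the line from the 95th to the 99th percentile lowers the Type 1 rate and raises the Type 2 rate, matching the description in the transcript.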