Exploring the Mechanics of ANOVA

ANOVA3

Estimated read time: 1:20

Summary

This lecture by Erin Heerey explains the fundamentals and workings of ANOVA (Analysis of Variance), a method used in statistics to prevent inflation of Alpha values by controlling the family-wise error rate. ANOVA is a conceptual extension of the t-test, analyzing the variance between and within groups through the F-test, which is a signal-to-noise ratio. The lecture covers the ANOVA framework, hypothesis testing, variance partitioning, checking assumptions, post-hoc testing, and dealing with outliers. Visual examples illustrate key concepts and statistical tables underscore the mathematical underpinnings of ANOVA, highlighting its practical application and potential pitfalls in statistical analysis.

Highlights

ANOVA is an extension of the t-test, focused on variance analysis 🎓.
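F-statistic is key in determining whether group means differ by more than sampling error would predict 📈.
Hypothesis testing in ANOVA revolves around comparing group means: the null hypothesis is that all groups are the same within sampling error 📐.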
Variance partitioning in ANOVA separates treatment effects from errors 🔍.
Handling outliers is crucial to maintain the validity of ANOVA results 🚨.

Key Takeaways

ANOVA helps control the family-wise error rate, avoiding inflated Alpha values 📉.
The F-test in ANOVA compares variance between groups (signal) and within groups (noise) 📊.
Partitioning variance is crucial for distinguishing treatment effects from sampling error 🤓.
Understanding assumptions like independence, normality, and homogeneity of variance helps ensure reliable ANOVA results 💡.
Post-hoc tests and dealing with outliers are essential for deeper analysis post-ANOVA 🔍.

Overview

In Erin Heerey's lecture on ANOVA, the mechanics of the Analysis of Variance are unveiled, describing its utility in managing family-wise error rates. ANOVA extends the principles of the t-test, primarily focusing on variance comparison among different groups through an F-test, which acts as a signal-to-noise indicator.
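To make the family-wise error rate concrete, here is a minimal sketch of how the chance of at least one false positive grows with the number of tests. It assumes the tests are independent and each uses the same alpha; the function name `familywise_error_rate` is hypothetical and not from the lecture.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n_tests tests.

    Assumes the tests are independent and each is run at the same alpha.
    """
    return 1 - (1 - alpha) ** n_tests


# Three groups give three pairwise comparisons.
for n_tests in (1, 3, 10):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests} tests at alpha 0.05 -> family-wise rate {rate:.3f}")
```

Running separate t-tests on every pair of groups is what inflates this rate; a single omnibus ANOVA keeps it at the stated alpha.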

The lecture walks through the procedure, including hypothesis testing and variance partitioning. By comparing the variance between groups with the variance within them, ANOVA tests whether the group means differ by more than sampling error would predict. The process is illustrated with visual aids and simplified, conceptual equations.
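The partitioning described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the lecture's own code: the scores are invented, the function name `one_way_anova` is hypothetical, and the treatment sum of squares uses the standard group-size weighting.

```python
from statistics import mean


def one_way_anova(groups):
    """Partition variance for a one-way design.

    groups maps a group label to a list of scores.
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)

    # Total: every score against the grand mean, ignoring group membership.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Treatment: each group mean against the grand mean, weighted by group size.
    ss_treatment = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
    # Error: every score against its own group mean.
    ss_error = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

    df_treatment = len(groups) - 1
    df_error = len(all_scores) - len(groups)
    f_ratio = (ss_treatment / df_treatment) / (ss_error / df_error)
    return {
        "ss_total": ss_total,
        "ss_treatment": ss_treatment,
        "ss_error": ss_error,
        "df_treatment": df_treatment,
        "df_error": df_error,
        "f": f_ratio,
    }


# Hypothetical anxiety scores, invented for illustration only.
scores = {"exposure": [2, 3, 4], "mindfulness": [4, 5, 6], "control": [6, 7, 8]}
result = one_way_anova(scores)
print(result)
```

The treatment and error sums of squares add up to the total, which is the built-in check the lecture points out.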

Also covered are the assumptions of independence, normality, and homogeneity, which are vital for accurate ANOVA application. The session highlights the importance of post-hoc tests for further insights and discusses strategies for managing outliers, underscoring the thorough approach needed for effective statistical analysis.

Chapters

00:00 - 01:30: Introduction to ANOVA In the 'Introduction to ANOVA' chapter, the primary focus is on explaining the purpose and functioning of ANOVA (Analysis of Variance). The discussion begins by addressing the use of ANOVA to control the family-wise error rate and prevent inflation of alpha values. It is explained that ANOVA is conceptually an extension of the t-test, which is essentially the difference between means divided by an estimate of the pooled variance.
01:30 - 03:30: Understanding ANOVA: Concepts and Framework The chapter 'Understanding ANOVA: Concepts and Framework' explains the fundamentals of ANOVA (Analysis of Variance) as a statistical tool used to compare differences between groups. It emphasizes ANOVA's function as a signal to noise ratio, through the computation of the F statistic. This involves comparing the variance between groups to the variance within groups, similar to the function of a t-test.
05:00 - 08:00: Visual Example of Group Differences The chapter 'Visual Example of Group Differences' explores the concept of measuring and visualizing differences between groups, particularly through statistical methods such as the t-test and variance analysis. It explains how these methods help in understanding group differences by comparing within-group differences to between-group differences. The discussion is framed within the standard research and hypothesis testing framework used in frequentist statistics.
08:00 - 13:00: Partitioning Variance in ANOVA The chapter titled 'Partitioning Variance in ANOVA' discusses the framework of frequentist statistics, focusing on how studies are designed and hypotheses are formulated. It explains that in analysis of variance (ANOVA), the null hypothesis assumes no difference between groups beyond what can be expected from sampling error. Alternatively, the research hypothesis suggests that there are differences between the groups. The chapter outlines the expectations and implications of these hypotheses in the context of ANOVA.
15:00 - 25:00: ANOVA Source Table and Degrees of Freedom This chapter introduces the concept of the ANOVA source table and its role in statistical analysis. It begins by outlining the importance of identifying differences between groups, emphasizing the relevance of specifying which groups are expected to differ and how they might relate to each other within the hypothesis testing process. Although the current focus is on a broader research hypothesis, the chapter also highlights the need to state the significance level and delineate data handling practices and procedures, key aspects of executing ANOVA effectively.
30:00 - 37:00: Assumptions in ANOVA Testing This chapter discusses the assumptions involved in ANOVA (Analysis of Variance) testing. The fundamental steps include data collection, visualization, calculation of descriptive statistics, checking assumptions, computation of test statistics and p-values, and finally reaching a conclusion. The process remains consistent with standard statistical procedures, with a focus on analyzing group differences, possibly exemplified by data on participants' anxiety levels.
37:00 - 45:00: Post-hoc Tests and Outliers The chapter discusses the analysis of treatment effects across different groups using post-hoc tests while handling outliers. The focus is on visual comparisons over numerical scaling to calculate group means. A visual representation shows the means of three groups: exposure therapy, mindfulness meditation, and control, allowing for a comparison of their effects.
47:00 - 54:00: Visualizing Data and Detecting Outliers The chapter focuses on understanding the relationships between data points, particularly in the context of different groups such as control therapy and exposure therapy. It highlights the need to explore whether these groups differ from each other in terms of anxiety levels. To assess this, the chapter suggests looking at the variance within groups, examining how much individual data points deviate from the group mean. This analysis helps in visualizing data and detecting outliers within the dataset.
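The source table arithmetic covered in the lecture can be reproduced with a short sketch. It uses the made-up numbers given in the transcript (sum of squares treatment 12, sum of squares error 45, three groups, 18 participants); the function name `source_table` and the dictionary layout are hypothetical.

```python
def source_table(ss_treatment, ss_error, n_groups, n_participants):
    """Fill in a one-way ANOVA source table from the two sums of squares."""
    df_treatment = n_groups - 1            # k - 1
    df_error = n_participants - n_groups   # N - k
    df_total = n_participants - 1          # N - 1
    # Built-in check: treatment plus error must equal the total.
    assert df_treatment + df_error == df_total

    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "treatment": {"ss": ss_treatment, "df": df_treatment, "ms": ms_treatment},
        "error": {"ss": ss_error, "df": df_error, "ms": ms_error},
        "total": {"ss": ss_treatment + ss_error, "df": df_total},
        "f": ms_treatment / ms_error,
    }


# Numbers taken from the lecture's made-up example.
table = source_table(ss_treatment=12, ss_error=45, n_groups=3, n_participants=18)
for row, values in table.items():
    print(row, values)
```

The F ratio is reported together with both degrees of freedom, here 2 and 15.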

ANOVA3 Transcription

00:00 - 00:30 all right so we've talked about why we use Anova and we use it to control that family-wise error rate keep our Alpha values from becoming over inflated by accident so let's talk about how we do this how does the Anova test work and so it turns out that the Anova is actually a conceptual extension of the t-test it's so the t-test as you'll remember is the difference between means divided by an estimate of the pooled variance so it's
00:30 - 01:00 a signal to noise ratio so Anova is also a signal to noise ratio and what we're doing when we look at Anova we're Computing a test now called f f is the statistic and we're looking at the variance between groups divided by the variance within groups so again it's a signal to noise ratio and so it's similar to the t-test in terms of looking at how much difference there is between groups
01:00 - 01:30 relative to how much difference there is within them which is exactly what we're getting here in the t-test although here the difference between the groups is a little bit more explicitly stated here we're using variance which is another way of getting or understanding group differences so how does this process work well of course we have a sort of standard research testing hypothesis framework that we use in frequentest
01:30 - 02:00 frequentist statistics where we I if we we first we designed the study and then we State our null and research hypotheses our null hypothesis in analysis of variance is that the groups are not different to one another they're all the same obviously within sampling error right because they're not going to be exactly the same because different people are in different groups so they won't all have identical means but you know within sampling error are they close our research or alternate hypothesis is
02:00 - 02:30 that there's a difference between at least one pair of groups now at this stage it's a good idea to talk about which groups you think will differ and how they'll relate to one another but that that's part of this hypothesis testing process but for now for this class we will consider a more General research hypothesis we State the significance level we State our data handling practices and procedures
02:30 - 03:00 then we collect the data and after we collect the data we make our visualizations we calculate our descriptive statistics we check our assumptions we then calculate our test statistic and p-value and finally we draw a conclusion so the process here is not different to the process anywhere else we're going to just do something a little bit different with the data so let's take a visual example of the differences between groups so this is the amount of anxiety participants
03:00 - 03:30 report after they've completed the treatment and I've made this that it's not scaled there's no scale level here and I did that just because I want us to have a visual comparison rather than a numeric one so we can calculate the mean of these groups so I've drawn those means in little lines and so the mean of the exposure therapy group is about here here's the mindfulness meditation group and here's the control group and what you can see
03:30 - 04:00 is that there's some parametric relationship between them but we're not sure if these groups are different from one another obviously the control Therapy Group it looks like they have more anxiety than the exposure therapy but we don't know what those relationships really look like so then we look at variance within groups so how much variance is there within the groups how far apart is the average data point from its from its neighbor from its mean so
04:00 - 04:30 we look at the variance within the groups and that's our measure of sampling error right so which is usually how we do it right so the t-test we do that very explicitly by um pooling the variance within the groups to get the denominator of the t-test which is going to be our differences estimate and now in Anova we're looking at the variance between groups so this is the
04:30 - 05:00 difference between this mean and that mean and this mean so these are the variances between the group so here we're looking at how much we're treating these means now as basically a sample and we're looking at the variance between those means and now we're asking that's our signal we're asking whether there's a difference whether the signal to noise ratio is greater than one right if the numerator and the denominator of equation are the same the resulting
05:00 - 05:30 um the resulting answer to the to the division problem is one so for example if you have two divided by 2 you get one so what we're asking here is is the variance between the groups greater than the variance within the groups and that's what we're doing with this math with the math in this equation and we're comparing things to a grand mean this is it so to get our variance between groups we compare these means
05:30 - 06:00 to a grand mean of all the data points if they weren't in groups so just all the data points doesn't matter what group they're in their overall mean is calculated so to get the variance between the means we take the we take this very similar to a sum of squares but it's between the mean within a group and the overall grand mean is what we call that an analysis of variance so that's where
06:00 - 06:30 we're getting these these ratios the variance between the groups is the variance due to treatment the variance within the groups is the variance due to sampling error so that's our signal to noise ratio right there so to get that signal to noise ratio we do a process called partitioning the variance so we have our total variance which is calculated as the total sum of squares um and you're thinking sum of squares we usually do that relative to a mean right
06:30 - 07:00 well we can also do that relative to the overall grand mean so this is on the previous slide it's this kind of dashed magenta colored line here the grand mean so we're looking at each individual score independent of its group how that deviates from the grand mean so the scores are squared and summed I will also warn you right now that the formulas I'm sort of telling you I want you to think about these conceptually the math will not work out if you use
07:00 - 07:30 these formulas these are not formulas that you can calculate with but conceptually this is what we are doing so to get the total variance we take the scores minus the grand mean square and sum those deviations so it's a sum of squares relative to the overall grand mean of the study our treatment effect is that's called the sum of squares treatment
07:30 - 08:00 and what we're considering there is the group mean minus the grand mean squared and then summed and then we need to do um another little scaling process for each one of the scores in those in those groups so we need to you need to be a little bit careful there so this one really really won't work if you try to calculate it just straight up by hand and then finally we have unexplained variance or error variance this is our sampling error it's known in the Anova
08:00 - 08:30 framework as the sum of squares error and it's the score minus the group mean squared and then summed again there's a mathematical shift on it so that's what we're doing when we partition the variance we're taking our total variance and we're breaking it up into two pieces the part that is associated with the treatment which I have highlighted here in green and the part which is associated with the sampling error which I've highlighted in Red so that's the individual differences and all the rest of the you know sort of
08:30 - 09:00 you know how somebody was feeling a particular day whether they were happy with the world whether they thought they were going to do well in their exam all the rest of this you know individual differences that are not predictable based on the treatment so that's what we're doing when we partition the variance we're breaking it into pieces that are associated with either the treatment or the sampling error when we calculate Anova we calculate
09:00 - 09:30 what we call a source table you will see one of these on an exam and there are there's going to be in the next lecture there'll be another kind of Anova table I want you to look very carefully at those because I always put one on um you'll need to know a little bit about how these work so an Anova Source table is divided into rows and columns The Columns each have a particular label and you'll see the same pretty much the same labels regardless of what um of where you go there's a source of variation then we have sum of squares
09:30 - 10:00 we have degrees of freedom we have what we call the mean squared and then we have an F so our treatment Factor is sometimes called the the source of variation here's either called treatment it can be called between subjects it can be called Factor so it has some different names depending on on where you go and the sum of squares treatment is that partition that's associated with a treatment the degrees of freedom associated with
10:00 - 10:30 it is a number that's often referred to as K minus 1 where K is the number of levels of the factor so in our particular example that we have we have we have three different groups right we have exposure therapy mindfulness meditation and control so three minus one that would give us two degrees of freedom associated with that treatment effect the mean squared for treatment this is a
10:30 - 11:00 so this num this letter with this word mean here means just what you think it is it's the sum of squares treatment divided by the degrees of freedom the appropriate degrees of freedom so sum of squares treatment is divided by the degrees of freedom for the treatment this will turn out to the be the numerator of our F ratio so remember our F ratio is that signal to noise ratio so this will be our numerator
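Written out in standard notation, the partition and the treatment mean square look as follows. The symbols are assumptions, not the lecture's: $x_{ij}$ is the score of participant $i$ in group $j$, $\bar{x}_j$ is the mean of group $j$, $\bar{x}$ is the grand mean, $n_j$ is the size of group $j$, and $k$ is the number of groups. The $n_j$ weight is presumably the extra scaling step mentioned for the treatment sum of squares.

```latex
\begin{aligned}
SS_{\text{total}}     &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}\right)^2 \\
SS_{\text{treatment}} &= \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{x}\right)^2 \\
SS_{\text{error}}     &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2 \\
SS_{\text{total}}     &= SS_{\text{treatment}} + SS_{\text{error}} \\
MS_{\text{treatment}} &= \frac{SS_{\text{treatment}}}{k - 1}
\end{aligned}
```

With the three groups in the running example, $k - 1 = 2$, which matches the two degrees of freedom for the treatment effect.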
11:00 - 11:30 then we have a source of variation we call error meaning sampling error sometimes this is also called within as in within groups so the amount of variation within groups and I'm gonna you'll hear me using this term error more frequently than within although know that you can see it represented both ways so we have the sum of squares error the degrees of freedom there are n minus K where K we already know is the number of levels n is the number of participants we had so if we had 20
11:30 - 12:00 participants minus two levels our degrees of freedom would be 18. so this is and the degrees of freedom is the only math I'm going to ask you to calculate by hand in this class you don't need a calculator I won't give you any hard maths and you will not be allowed a calculator but degrees of freedom is something really important to look at because it tells you where people are making mistakes this is something that you should be
12:00 - 12:30 able to recognize right away because it tells you something about the design of the of the study and sometimes you'll see papers where a test statistic is reported along with its degrees of freedom and if the degrees of freedom don't match the number of participants they have then you have to wonder what happened to the data so sometimes researchers when data are missing they can lose degrees of freedom and not notice it so this is really really important from a um from a design perspective as well to
12:30 - 13:00 be able to understand how these degrees of freedom work so I would practice that a little bit and I will give you some chances to practice that um so the degrees of freedom for the within subjects or error sum of squares is the number of participants we have minus the number of levels and to get the mean Square we do exactly the same thing we did on in this row right here we take our sum of squares error and we divide it by the degrees of freedom error and that gives us our mean
13:00 - 13:30 squared error and that's the numerator sorry the denominator of the F ratio so then our final row is total we have our total sum squares the total variance in the whole study our degrees of freedom total is n minus one where n is the number of participants so if we had 20 participants 20 minus and we have
13:30 - 14:00 well if we had 20 participants it would just be 20 minus 1 or 19. now the important thing is that all of these degrees of freedom should add so this plus this should equal this and this is a really nice thing about anovas is that you can check the math really easily the treatment plus the error should equal the total so that's how these this is a nice sort of
14:00 - 14:30 table that allows you to check a lot of times in statistics we don't give you the chance to check very well you just have to learn and be and sort of Intuit whether your numbers are right the Anova is great because you can check it and there's always a check if your sum of squares treatment plus the error does not equal the calculation you made for sum of squares total then you have a mistake in your math somewhere likewise here for these degrees of freedom this plus this should equal this so to get the final F score
14:30 - 15:00 it's the mean squared for treatment so the number in this box right here divided by the mean squared for error which is the number in this box here we don't put anything in these boxes so that's where the F comes from and that's how we sort of divide up this variance when we talk about partitioning the variance now I've given you some made up data a theoretical number of participants that works with our sample um where the sum of squares treatment is 12 the sum of squares error is 45 and
15:00 - 15:30 the total is 57 and you can see that they add it this way these numbers I made up they're not based on any data um and then we have our degrees of freedom for this design so we have three minus one so we had three groups minus one gives us two degrees of freedom for the numerator 18 minus three gives us 15 degrees of freedom for the denominator and 18 minus 1 is 17 and we know that 15 plus 2 equals 17. so we know that we've
15:30 - 16:00 done our math correctly there our mean squared so for our F ratio our mean squared for treatment is 12 divided by two that gives us 6. our mean squared for the error is 45 divided by 15 which gives us three so our F then is 6 divided by 3. and that F comes out to a value of 2. so we can compare the calculated F to our critical value which is our threshold this by the way is an uh is a
16:00 - 16:30 theoretical threshold rather than an empirical one so we have gotten this by looking it up in an F table and not by using randomization or some other method to derive it ourselves so if the F is greater than the F critical then we can reject the null hypothesis so here the F critical is 2.49 so I got this based on the table in the back of a textbook the numerator degrees of freedom is 2 and the denominator degrees of freedom is 15. so
16:30 - 17:00 we always report when we report an F statistic we always report the numerator and denominator degrees of freedom with it I will also give you a reminder that because the F statistic is based on a sum of squares we know sums of squares cannot be negative the F can't be negative either so if you get a negative F you have a math mistake somewhere so what does our F distribution look like that's a tricky one um because the F distribution depends on
17:00 - 17:30 both the numerator and the denominator degrees of freedom because we have two values that we're comparing now um for the T distribution by the way the numerator degrees of freedom is simply one and it's always one it's not dependent on the type of design we have so because it's always one and only one we don't have to worry about it in the T distribution because it doesn't vary for the F distribution the numerator degrees of freedom does vary because it depends
17:30 - 18:00 on the number of levels and the number of factors I'm using right so and the number of participants I'm using which is related to our denominator degrees of freedom so different so f distributions with different levels of numerator and denominator degrees of freedom are differently shaped so if we want the F distribution with a
18:00 - 18:30 numerator of three so three degrees of freedom in the numerator and 5 in the denominator now this is pretty theoretical we wouldn't really do this we wouldn't have only five participants and three um and only five denominated degrees of freedom and three numerator degrees of freedom that's something we don't really do usually we try to create larger sample sizes but what you can see and it's most dramatically shown in this case that the
18:30 - 19:00 F is highly positively skewed so when the tail carries out to the positive end of the graph we're positively skewed and all of these distributions have that kind of level of skew where if the null hypothesis is true on average the mean is going to be somewhere around one um now of course it doesn't work out like that perfectly but when you have a reasonable number of
19:00 - 19:30 denominator degrees of freedom and numerator degrees of freedom that is what you'll end up getting um but typically we have a very small number of numerator degrees of freedom and a very large number of denominator degrees of freedom so most of the time our F distribution is going to look a lot more like the bright red distribution here where it'll have a mean somewhere around one or a big big lump somewhere around one and then it will tail off toward the positive end of that distribution and as you as you get
19:30 - 20:00 more extreme into this distribution um you will see that your p-value goes declines across this distribution here so what assumptions do we need to make when we are testing an analysis of variance obviously we've talked about independence we've talked about normality we've talked about homogeneity of variance these are essentially the same
20:00 - 20:30 assumptions we make for t-test um Independence as I said we are for the purposes of this class we're going to be establishing that based on the design of the study or the experiment rather than on a formal statistical test although in reality you would want to do this with a formal statistical test we can also talk about normality and here we do have a formal statistical test we have the Shapiro Wilk test and that tests the hypothesis that the sample came from a normally distributed population
20:30 - 21:00 and finally we have homogeneity of variance so we can use Levene's test there which tests the null hypothesis that the samples were drawn from populations with equal variances so um when we do Levene's test we're asking the question were however many levels however many samples we have so this is how many levels we have how many groups we're comparing were these samples likely drawn from
21:00 - 21:30 populations with equal variances or not and then we also test with the Shapiro-Wilk test the assumption of normality so we're testing the null hypothesis that the sample came from a normally distributed population and here in both of these cases we do not wish to reject the null hypothesis we wish to fail to reject it we wish to retain the null hypothesis for these tests because that will tell us that our data are normally distributed and that our population our
21:30 - 22:00 samples have equal variances so we use these tests to formally check our hypothesis assumptions before we do the hypothesis testing so we'll do that in the lab this week and then assuming we get a significant Omnibus test many times we go on and do post-hoc tests so if the Omnibus result is significant then you might need to test the data further to find out which groups differ and how they differ because that's
22:00 - 22:30 probably what you really want to know so as I alluded to earlier we can do that using planned comparisons which are hypothesis driven tests for specific pairs of means so I could ask is mean one different from mean three and that might be the only set of means that I'm really interested in we can also do post hoc tests where we do all the possible pairwise comparisons um and for post hoc tests we need to be really really careful about controlling the family-wise type one error and so
22:30 - 23:00 what we typically do when we do post hoc tests is we apply a correction to the significance level so we keep the overall significance level at 0.05 and if we had five comparisons what we might want to use is a p-value of .01 as our threshold level because then when we added up those all those .01s together we would end up with a value an overall family-wise error still of 0.05 so then we wouldn't be inflating our type one error rate
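The correction described here, dividing the overall significance level by the number of comparisons, is commonly called a Bonferroni correction (a name the lecture itself does not use). A minimal sketch follows; the function names and the p-values are hypothetical and invented for illustration.

```python
def per_comparison_alpha(family_alpha, n_comparisons):
    """Split the family-wise alpha evenly across the comparisons."""
    return family_alpha / n_comparisons


def significant_after_correction(p_values, family_alpha=0.05):
    """Flag each p-value that falls below the corrected threshold."""
    threshold = per_comparison_alpha(family_alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p-values from five pairwise comparisons.
p_values = [0.004, 0.03, 0.2, 0.009, 0.012]
threshold = per_comparison_alpha(0.05, len(p_values))
flags = significant_after_correction(p_values)
print(f"threshold per comparison: {threshold:.3f}")
print(flags)
```

Note that 0.03 and 0.012 would pass an uncorrected 0.05 threshold but do not pass the corrected one.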
23:00 - 23:30 so that's typically how we then further examine the data in analysis of variance and the other thing that we need to consider in analysis of variance are outliers and invalid data so because we are partitioning variance and outliers increase variance statistical tests like the t-test and the Anova are very sensitive to outliers in the data so one rule of thumb we often use and you'll often see in the literature is plus or
23:30 - 24:00 minus three standard deviations from the mean um so people will exclude statistical outliers that are plus minus three standard deviations from the mean if the data are really skewed we use a different type of test we'll make a filter that's based on the interquartile range um so sometimes that works better but it depends on the data set however in some designs it makes sense to retain statistical outliers where they might
24:00 - 24:30 not be the same as their other values of data they might not be close to your mean but those outliers might be actually contributing good and valid data and when they are it doesn't really make sense to exclude them because you don't want to throw away good data right if somebody's a statistical outlier because they were you know snoozing in your task or because they took a phone call or because they stopped your task to watch a television show and they came back and couldn't remember the directions um when that happens
24:30 - 25:00 you can get data that are not valid and that turn out to be outliers and those data you definitely want to exclude but in some designs if you're looking at the preference for one thing versus another right we could we could think about you know what people like to I don't know what people like to eat for breakfast um and we could take a poll we could look at the number of people in each of the categories people could rank their food or rate their food preferences rating is probably better um and we could have serious outliers let's
25:00 - 25:30 say lots of people like to eat porridge or oatmeal I'm not sure what you call it over here but porridge or oatmeal for breakfast but some people really really just hate that stuff so those people might be giving you data that look like statistical outliers but in that case it's really valid data and it wouldn't make sense to exclude them so outliers can can be caused by participants submitting invalid data
25:30 - 26:00 because they're not following directions they're responding kind of randomly they're just pressing the same button over and over again they have reaction times that are super fast where they're just pushing as quickly as they can or super slow where they're you know taking a phone call or like doing something else during your task um so participants do submit invalid data they do that all the time um and those data points need to be removed because they're not valid data so so
26:00 - 26:30 often outliers are associated with the submission of invalid data but not always so you need to be really careful about the design and consider to what extent are the data valid versus invalid and how they link together these types of data points these statistical outliers can interfere with the validity of a hypothesis test remember because the theoretical versions of these hypothesis tests the ones based on kind of these theoretical distributions
26:30 - 27:00 um those data points can cause problems with validity regardless of whether the data are valid or not right so if you have if you're looking at preferences and somebody rates something really really low and it makes them a statistical outlier or really really high and that makes them a statistical outlier um you know those kinds of statistical outliers are still valid data and you might not want to exclude them for that reason but that doesn't mean they won't cause you
27:00 - 27:30 errors cause you to make statistical errors in your conclusions because they change the value of your P values or the value of your test statistics so if the outlier is an outlier because a participant has submitted invalid data which you would define before data collection begins then they should be excluded and sometimes you can also make an ordinary statistical argument for excluding these data points but in
27:30 - 28:00 general I would rather see people doing and learning to do randomization testing and more randomization testing rather than throwing out valid data just because it's an outlier how do you how do you tell well it turns out that visualizations are a good way to see them but some are better than others so here's an example of the data from this particular study so we have our exposure data our mindfulness data and our control conditions so we have our three conditions this is the amount
28:00 - 28:30 of anxiety they report going into the first final exam after they've done their exam anxiety program and it turns out that we can we can't really see the outliers in this display so this is an ordinary bar plot with uh error bars on the top and we you know you can see that that error bar is a little bit smaller than that one that one's maybe a little bit longer but actually we can't really tell the difference we can't see outliers here
28:30 - 29:00 you can see outliers in both of these plots so when you have a violin plot that has a sort of you know that actually looks more like the top of a violin right so it stretches really long in One Direction that's usually an indicator that there's an outlier in a particular condition box plots show you outliers really well because you get these far statistical outliers which show up and look like this so this is a box plot created in Seaborn and
29:00 - 29:30 you can see the outlier right there so this is a really good way of seeing and finding those outliers and knowing that you have a problem that you're going to need to deal with in your data and I will leave it there for analysis of variance
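The two outlier rules of thumb mentioned in the lecture can be sketched with the standard library alone. This is an illustration under assumptions: the scores are invented, the function names are hypothetical, and the 1.5 x IQR fence is the conventional box plot rule rather than something the lecture specifies.

```python
from statistics import mean, quantiles, stdev


def flag_by_sd(scores, n_sd=3):
    """Flag scores more than n_sd sample standard deviations from the mean."""
    centre, spread = mean(scores), stdev(scores)
    return [x for x in scores if abs(x - centre) > n_sd * spread]


def flag_by_iqr(scores, k=1.5):
    """Flag scores beyond k * IQR outside the quartiles (box plot fences)."""
    q1, _, q3 = quantiles(scores, n=4)
    iqr = q3 - q1
    return [x for x in scores if x < q1 - k * iqr or x > q3 + k * iqr]


# Hypothetical anxiety scores for one condition; 40 is the suspicious value.
scores = [12, 14, 13, 15, 14, 13, 12, 15, 14, 13, 14, 13, 12, 15, 14, 40]
sd_flags = flag_by_sd(scores)
iqr_flags = flag_by_iqr(scores)
print("3 SD rule:", sd_flags)
print("IQR rule:", iqr_flags)
```

Flagging a score only marks it for inspection. Whether it is excluded should depend on the validity criteria defined before data collection, as the lecture stresses.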