Exploring Null Hypothesis Significance Testing

NHST4

    Summary

    In this lecture, Erin Heerey delves into null hypothesis significance testing (NHST) using a classic 1974 experiment as a case study. The experiment involved 48 male bank managers who each decided whether to recommend promotion based on a personnel file. The files were identical except for the applicant's first name, which signaled gender. Males were favored: 87.5% of male files were recommended for promotion compared with 58.3% of female files, raising the question of gender bias. The lecture contrasts the null hypothesis, which states that gender and promotion are independent, with the research (alternate) hypothesis that men are more likely to be promoted. It uses a courtroom analogy to explain hypothesis testing, emphasizes that evidence leads only to rejecting or failing to reject the null hypothesis, and highlights the complementary nature of the null and research hypotheses.
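
The promotion rates quoted in the summary follow directly from the counts given in the lecture (21 of 24 male files and 14 of 24 female files recommended). A minimal sketch of that arithmetic is below; the dictionary layout and the function name `promotion_rate` are illustrative choices, not something taken from the lecture.

```python
# Contingency table built from the counts reported in the lecture:
# 24 files per first name, 21 male and 14 female files recommended.
counts = {
    "male": {"promoted": 21, "not_promoted": 3},
    "female": {"promoted": 14, "not_promoted": 10},
}


def promotion_rate(group: str) -> float:
    """Conditional proportion P(promoted | gender of the name on the file)."""
    row = counts[group]
    return row["promoted"] / (row["promoted"] + row["not_promoted"])


p_male = promotion_rate("male")      # 21 / 24
p_female = promotion_rate("female")  # 14 / 24
diff = p_male - p_female             # difference in proportions

print(f"P(promoted | male)   = {p_male:.3f}")
print(f"P(promoted | female) = {p_female:.3f}")
print(f"difference           = {diff:.3f}")
```

The difference is a gap between two proportions, so it is best read as 29.2 percentage points rather than as a relative 29.2% increase.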

      Highlights

      • A 1974 experiment tested promotion bias with resumes differing only in the applicant's gender. 📑
      • Men were promoted at a rate 29.2 percentage points higher than women (87.5% vs. 58.3%) despite identical resumes. 🤔
      • The lecture uses a courtroom analogy to explain NHST, reflecting on evidence and verdicts. 🏛️
      • Key concepts include null hypothesis (status quo) and research hypothesis (challenge to status quo). ⚖️
      • The null and research hypotheses must together cover the complete sample space for the logic of the test to hold. 🧪

      Key Takeaways

      • NHST helps assess if observed outcomes are due to chance or another factor like bias 🧐.
      • The 1974 experiment highlighted potential gender bias using identical resumes 🍀.
      • Hypotheses in NHST mirror the two sides of a trial: presumed innocence (null) vs. guilt (alternative) ⚖️.
      • Decisions rest on evidence: the null hypothesis is never proven true, only rejected or not rejected 🔍.
      • Complete and mutually exclusive hypotheses are essential for accurate testing 🎯.

      Overview

      Erin Heerey dives into the world of null hypothesis significance testing through the lens of a thought-provoking 1974 study. Here, bank managers unwittingly participated in an experiment probing gender bias in promotion decisions, with the applicant's gender signaled only by the first name on the file – John versus Jane. What initially seems like an administrative exercise quickly morphs into a revealing sociological study where the only difference between files is gender, yet the promotion outcomes are starkly different.

        The heart of this lecture beats with the rhythm of courtroom drama. Erin likens the hypothesis testing process to a trial, with the null hypothesis representing the presumption of innocence – or in our case, the belief that no gender bias exists in promotions. Just as jurors presume a defendant innocent until proven guilty, researchers hold the null hypothesis as the status quo, gently nudging it with evidence to see if it holds or folds.

          As the lecture unfolds, we're reminded that in hypothesis testing, as in life, it's less about proving someone's innocence or guilt and more about evaluating the strength and weight of the evidence. Through empirical and theoretical methods, the dance of null versus research hypothesis is ever dynamic, highlighting the necessity for full and fair sample spaces to draw trustworthy conclusions.

            Chapters

            • 00:00 - 00:30: Introduction to Null Hypothesis Significance Testing The chapter introduces the concept of null hypothesis significance testing (NHST). It sets the stage for a deeper dive into NHST by discussing an experiment from 1974 involving 48 randomly selected bank managers or supervisors. The purpose of the chapter is to explore NHST in the context of real-world experiments.
            • 00:30 - 01:00: Experiment Overview: 1974 Bank Manager Study In the 1974 Bank Manager Study, the participants, all of them male bank supervisors, were each given a personnel file and asked to judge whether the individual should be promoted to a routine branch manager position. The supervisors are described as people who might oversee several bank branches. The study aimed to evaluate promotion decisions based on the resumes or CVs presented in the files.
            • 01:00 - 03:00: Study Design: Identical Resumes with Different Names This chapter discusses a study design where identical resumes were used with only the first names differing by gender: John for male and Jane for female. These resumes were randomly assigned to supervisors to evaluate, aiming to explore potential bias in promotion decisions based on gender perceived through names.
            • 03:00 - 05:30: Results and Analysis of the Experiment The chapter titled 'Results and Analysis of the Experiment' examines the outcomes of an experiment designed to assess potential gender bias in promotion recommendations. In the experiment, 35 out of 48 resumes were recommended for promotion. The resumes were identical except for the gender suggested by the applicant's first name, allowing the experiment to focus on whether females were unfairly discriminated against. The independent variable in this study is the gender indicated by the name on the resume.
            • 05:30 - 10:00: Evaluating Hypotheses: Null vs. Research Hypothesis The chapter focuses on the evaluation of hypotheses, distinguishing between the null hypothesis and the research hypothesis. It provides an example involving promotion decisions within a bank, based on gender indicated in resumes. The chapter details a contingency table that crosses the gender on the resume (male or female) with the promotion decision (promoted or not promoted). In the study, 48 bank managers took part, with 24 randomly assigned to each gender category. The dependent variable in this example is the promotion decision.
            • 10:00 - 15:00: Concept of Hypothesis Testing as a Trial This chapter discusses the concept of hypothesis testing using a trial setup. It describes an experiment involving male and female candidates with identical CVs to test for gender bias in promotions within a bank. The results show that 87.5% of male candidates were promoted compared to only 58.3% of female candidates, highlighting a potential gender bias in the promotion process. The chapter likely delves into the analysis of these probabilities and how they contribute to hypothesis testing.
            • 15:00 - 20:00: Conducting Hypothesis Tests The chapter delves into the critical question of whether observed results in an experiment could be attributed to random chance or whether there is a systematic bias, such as favoritism. A specific case is discussed where promotion decisions in a study favor males over females by 29.2 percentage points, the difference between the proportion of males promoted and the proportion of females promoted. The chapter then poses a hypothetical question about the outcomes if the experiment were to be repeated.
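
The question of whether a 29.2-point gap could plausibly arise by chance can also be approached by direct calculation. The sketch below is one possible way to do that and is not a method shown in the lecture: it assumes the marginal totals are fixed (35 of 48 files recommended, 24 files per name) and that, under independence, every way of distributing the 35 recommendations across the files is equally likely. This is the hypergeometric model behind a one-sided Fisher exact test. All constant and function names are illustrative.

```python
from math import comb

N_FILES = 48                  # total personnel files
N_MALE = 24                   # files carrying a male first name
N_PROMOTED = 35               # files recommended for promotion overall
OBSERVED_MALE_PROMOTED = 21   # male files recommended in the study


def prob_male_promoted(k: int) -> float:
    """P(exactly k of the 35 recommendations fall on male files) if name
    and decision are independent and the margins are held fixed."""
    ways = comb(N_MALE, k) * comb(N_FILES - N_MALE, N_PROMOTED - k)
    return ways / comb(N_FILES, N_PROMOTED)


# One-sided tail: outcomes at least as favorable to males as the observed one.
upper = min(N_MALE, N_PROMOTED)
tail_prob = sum(
    prob_male_promoted(k) for k in range(OBSERVED_MALE_PROMOTED, upper + 1)
)

print(f"P(at least 21 male promotions | independence) = {tail_prob:.4f}")
```

A small tail probability means the observed split would be unusual if names and decisions were unrelated. How small is small enough is a separate judgment that the excerpt above does not specify.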

            NHST4 Transcription

            • 00:00 - 00:30 so in the next section of the lecture we're going to really get into the meat of null hypothesis significance testing and we're going to do that in the context of an experiment so this is an experiment that was completed back in 1974 or it was published in 1974 what happened in this experiment was that 48 randomly selected bank managers or bank supervisors who are
            • 00:30 - 01:00 all male received a version of a personnel file and they were asked to judge whether the person in the file should be promoted to a routine branch manager job so you might have sort of an overarching supervisor of a bank that manages a bunch of different branches and it's this person saying you know whether or not the person whose resume or CV is being received should that person be promoted to a routine bank manager job now what happened in this study is that all
            • 01:00 - 01:30 the files that were given to the branch managers were identical except the name so in half the files the person's name the person in the file had a male first name and in the other half the file contained a female first name John and Jane the files were randomly assigned to supervisors and essentially each Supervisor was
            • 01:30 - 02:00 saying would you promote this person or would you not 35 of the 48 files were recommended for promotion and so the question that's being asked is were the females unfairly discriminated against remember the resumes are identical the only difference is the first name of the applicant here's what the results look like so the independent variable is whether the resume had a male or a female first
            • 02:00 - 02:30 name the dependent variable here is promotion decisions whether the bank manager said go ahead and promote this person or don't so here's what the what the contingency table looks like so we have the gender on the resume which was either male or female and then we have the promotion decision whether they were promoted or not promoted again there were a total of 48 bank managers who were surveyed 24 were
            • 02:30 - 03:00 randomly assigned to female CVs and 24 to male CVs and here's what the decisions look like so 21 out of 24 men were promoted and only 14 out of 24 women were promoted so the probability of promotion given that you're a male is 21 divided by 24 or 87.5 percent the probability of being promoted if you're a female with an identical resume except for your first name is 14 of 24 so 58 percent
            • 03:00 - 03:30 hmm so what we really want to ask is to what extent could these results be due to chance or is there favoritism happening so promotion decisions favored males by 29.2 percent over females that's the difference between the proportion of males that were promoted and the proportion of females that were promoted so based on this information which of the following statements could be true if this experiment were repeated more
            • 03:30 - 04:00 women would certainly get promoted promotion is dependent on gender men are more likely to be promoted and the results show evidence of gender discrimination the difference in promotions received by men versus women is due to chance not evidence of gender discrimination women are less qualified than men and this is why fewer women get promoted now of course some of these items we can totally rule out women are less qualified than men and this is why fewer women get promoted but their resume was identical
            • 04:00 - 04:30 except for the first name so there was no difference in the experience that the men and women had in this experiment could we say for certain that more women would get promoted no but what about these other two well both of them could be likely both of them could happen if there's gender discrimination well then maybe men are more likely to be promoted and the results show evidence of that it might also be an alternate hypothesis
            • 04:30 - 05:00 might actually be that the reason that more men versus women were promoted is that maybe those files were randomly assigned to the more cranky of the bank managers so maybe there are some cranky bank managers that just happen to be assigned to female CVs and as a result they wouldn't have promoted anybody male or
            • 05:00 - 05:30 female but just by random assignment to groups we are randomly assigning more cranky bank managers who are going to say don't promote regardless of who the person is so maybe it's due to chance and both of these are plausible hypotheses and that's what happens when we do experimentation we have competing claims that need to be ruled out one against the other so our null hypothesis
            • 05:30 - 06:00 this idea that nothing is going on is that promotion and gender are independent there's no gender discrimination here and the observed proportions in the data are due to chance we want to put that against a research hypothesis which will be the sort of something is going on hypothesis where promotion and gender are dependent on one another that men are more likely to be promoted than women there is gender discrimination the observed proportions are not due to chance
            • 06:00 - 06:30 so those are our competing claims here and that's a reasonable thing to suggest um the nice thing and the important thing here is that these are complementary claims the null hypothesis and the research hypothesis are complementary to one another they fully complete the sample space one of these hypotheses must be true the probabilities of these hypotheses are not symmetric so one of them might be more likely than the other
            • 06:30 - 07:00 so how do we evaluate these two competing claims and one way we can think about how to test hypotheses is by likening the idea of a hypothesis test to the idea of a trial so we can think about hypothesis testing very like the way we think about a trial in a court of law where there's a judge and a jury and they're making a decision about whether or not a defendant is guilty
            • 07:00 - 07:30 the null hypothesis is the defendant is perhaps innocent and the research hypothesis or the alternate hypothesis is the defendant is guilty evidence is collected and presented in research we collect data and we describe our data and that evidence is then judged so could the data plausibly have happened by chance if the null hypothesis were true oops and the question is if the data are very
            • 07:30 - 08:00 unlikely to have occurred by chance then the evidence is beyond a reasonable doubt so the decision how likely or unlikely is this we can quantify that with statistical evidence and statistics so if the evidence is not strong enough to reject the presumption of innocence remember we presume that a defendant is innocent until proven
            • 08:00 - 08:30 guilty we have a presumption of innocence that's our null hypothesis if the evidence is not strong enough to reject that presumption of innocence the jury must return a verdict of not guilty it's not the same as being innocent it means that there's not enough evidence to convict the defendant might be innocent but they might not be and the jury can never be sure of this remember this is when we're thinking about populations and the relationship between a sample and a population again we can
            • 08:30 - 09:00 never be sure um how closely we match in statistics we fail to reject the null hypothesis so in statistics we presume that the null hypothesis is true unless it's beyond a reasonable doubt if there's enough evidence to reject it the null hypothesis is never declared to be true because we never know whether or not it's really true we do not accept the null hypothesis instead we either
            • 09:00 - 09:30 reject or fail to reject the null hypothesis we never make decisions about the research hypothesis so the jury is calculating the conditional probability of the set of evidence that's been presented given that the defendant is innocent and remember the innocence is never proven it's only rejected just like with our null hypothesis so to recap this idea we start with a
            • 09:30 - 10:00 null hypothesis H0 that represents the status quo the idea that nothing is going on we pit that null hypothesis against a research hypothesis H1 that represents the research question what we are testing for we conduct a hypothesis test under the assumption that the null hypothesis is true and we can do that either via randomization which is what we're going to do this week or that's an empirical method for testing the truth of a null hypothesis we can also do this by
            • 10:00 - 10:30 theoretical methods we'll learn about those later in the course if the test results show that the data do not provide convincing evidence for the research hypothesis then we must retain the null hypothesis otherwise we reject the null hypothesis in favor of the research hypothesis again we never know if the research hypothesis is true or the null hypothesis is true we don't know these things all we know is what the evidence shows when we make a
            • 10:30 - 11:00 conclusion to the best of our ability based on the evidence that we have and complementarity here is extremely extremely important so remember the paper I showed you earlier it suggested the research hypothesis that one glass of red wine per day improves cardiovascular health so what null hypothesis would be complementary to that red wine does nothing for cardiovascular health red wine makes cardiovascular
            • 11:00 - 11:30 health worse or both of these outcomes together in order for the logic of null hypothesis significance testing to work the sample space needs to be complete and without a complete sample space we cannot accurately compute the probabilities so remember what we're looking for is the probability that the null hypothesis is likely to be true given the data and so without a complete complementary set of hypotheses we can't compute those
            • 11:30 - 12:00 probabilities accurately so we need to make sure that our sample space is fully covered and that our research hypothesis and our null hypothesis really genuinely are mutually exclusive hypotheses this is really easy with our bag of marbles what's the probability of picking an orange one out of our set and we said it was 1 in 14 it's much harder to do when we're talking about psychological constructs
            • 12:00 - 12:30 the things that we measure in regular and standard experimentation so we'll continue in the next part of the lecture video
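
The transcript mentions randomization as the empirical way to test a null hypothesis. A minimal sketch of what such a test might look like for the promotion data is below. It is an illustration under stated assumptions, not the procedure used in the course: the 35 "promote" and 13 "don't promote" decisions are held fixed, reshuffled across the 24 male and 24 female files to mimic independence, and the share of shuffles with a gap at least as large as the observed one is reported. The function name, the number of simulations, and the seed are all hypothetical choices.

```python
import random


def randomization_test(n_sims: int = 10_000, seed: int = 0) -> float:
    """Estimate how often shuffled decisions favor males at least as
    strongly as the observed 21/24 vs. 14/24 split (one-sided)."""
    rng = random.Random(seed)
    # 1 = recommended for promotion, 0 = not recommended (35 and 13 files).
    decisions = [1] * 35 + [0] * 13
    observed_diff = 21 / 24 - 14 / 24

    at_least_as_extreme = 0
    for _ in range(n_sims):
        # Under the null hypothesis the name on the file is irrelevant,
        # so any reassignment of decisions to files is equally likely.
        rng.shuffle(decisions)
        male, female = decisions[:24], decisions[24:]
        diff = sum(male) / 24 - sum(female) / 24
        if diff >= observed_diff - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims


p_value = randomization_test()
print(f"share of shuffles with a gap >= observed: {p_value:.4f}")
```

If only a small share of shuffles reproduces a gap this large, the data are unlikely under the null hypothesis. In the courtroom terms used in the lecture, that is the point at which the evidence begins to look strong enough to reject the presumption of innocence.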