Probability4


    Summary

    In this lecture, Erin Heerey discusses contingency tables and their role in understanding dependency between events. A key example involves a study on treatments for cocaine addiction, showing how relapse rates differ across three treatments: desipramine, lithium, and placebo. Heerey also covers marginal, joint, and conditional probabilities, using them to illustrate independence and the gambler's fallacy. The lecture concludes with a discussion of probability distributions and their relevance to statistical sampling, highlighting how a sample mean converges on the population mean.

      Highlights

      • Contingency tables display the frequencies of categorical variables, such as treatment and outcome, to show how events depend on each other. 📊
      • In a cocaine addiction study, desipramine, lithium, and placebo were compared on how many of the 72 participants relapsed. 💊
      • Marginal probabilities describe the likelihood of an outcome regardless of the other variable, using row or column totals. 🔢
      • Conditional probability gives the likelihood of one event given that another has occurred, revealing dependencies. 🔄
      • Independent events are those whose outcomes do not influence each other's probabilities. 🎲
      • The gambler's fallacy: independent events do not alter each other's probabilities, even after a long run of repeated trials. 🎲
      • Understanding probability distributions is key to comprehending how sample means align with population means. 📈

      Key Takeaways

      • Contingency tables help illustrate dependencies between categorical variables like coin flips and treatment outcomes. 🎲
      • The lecture uses a study on cocaine addiction treatments to compare relapse rates and treatment efficacy. 💊
      • Marginal probabilities are calculated by dividing a row or column total by the overall total. 🔢
      • Conditional probabilities explore the chance of one event given another, essential for understanding dependencies. 🔍
      • Independent events have probabilities that remain unchanged regardless of other outcomes, debunking the gambler's fallacy. 🎲
      • Probability distributions help in understanding sample means and their relation to population means, crucial for statistical analyses. 📊
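
The takeaway about independent events and the gambler's fallacy can be checked by counting outcomes. This is a minimal sketch assuming two rolls of a fair six-sided die, the lecture's final example; the helper `prob` is a hypothetical name introduced for illustration. It enumerates all 36 equally likely pairs and shows that a three on the second roll has probability 1/6 whether or not the first roll was a two.

```python
from fractions import Fraction
from itertools import product

# Sample space for two rolls of a fair six-sided die: 36 equally likely (first, second) pairs.
space = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Probability of an event (a predicate on (first, second)) by counting outcomes."""
    return Fraction(sum(1 for roll in space if event(roll)), len(space))


p_first_2 = prob(lambda r: r[0] == 2)
p_second_3 = prob(lambda r: r[1] == 3)
p_both = prob(lambda r: r[0] == 2 and r[1] == 3)

# Conditional probability: P(second is 3 | first is 2) = P(both) / P(first is 2).
p_second_3_given_first_2 = p_both / p_first_2

print(p_second_3, p_second_3_given_first_2)  # 1/6 1/6
print(p_both == p_first_2 * p_second_3)      # True: joint = product of marginals
```

Because the conditional probability equals the unconditional one, the first roll carries no information about the second, which is exactly why a "due" win is a fallacy.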

      Overview

      The lecture starts by introducing contingency tables, tools that display dependencies between categorical variables such as treatment type and outcome. These tables show how often events occur in combination with one another. A study of cocaine addiction treatments illustrates how different treatments have different relapse rates, offering a real-world application of these concepts.
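
The cocaine-treatment contingency table described in the lecture can be rebuilt from the counts it reports. This is a minimal Python sketch; the dictionary layout and the `print_table` helper are illustrative choices, not anything from the lecture. It adds the row and column totals (the margins) from which marginal probabilities are read.

```python
# Relapse counts from the cocaine-treatment study in the lecture:
# 72 participants, 24 randomly assigned to each treatment.
counts = {
    "desipramine": {"relapse": 10, "no relapse": 14},
    "lithium": {"relapse": 18, "no relapse": 6},
    "placebo": {"relapse": 20, "no relapse": 4},
}
outcomes = ["relapse", "no relapse"]

# Row margins: participants per treatment.
row_totals = {treatment: sum(cells.values()) for treatment, cells in counts.items()}
# Column margins: participants per outcome.
col_totals = {outcome: sum(cells[outcome] for cells in counts.values()) for outcome in outcomes}
grand_total = sum(row_totals.values())


def print_table() -> None:
    """Print the contingency table with its row and column totals."""
    print(f"{'treatment':<12}" + "".join(f"{o:>12}" for o in outcomes) + f"{'total':>8}")
    for treatment, cells in counts.items():
        row = "".join(f"{cells[o]:>12}" for o in outcomes)
        print(f"{treatment:<12}{row}{row_totals[treatment]:>8}")
    footer = "".join(f"{col_totals[o]:>12}" for o in outcomes)
    print(f"{'total':<12}{footer}{grand_total:>8}")


print_table()
```

The bottom row (48 relapsed, 24 did not) and the right-hand column (24 per treatment) are the margins; every cell and margin together cover the full sample space of 72 participants.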

        Moving deeper, the lecture explains marginal probabilities: the probability of a single outcome regardless of any other variable, such as the proportion of all participants who did not relapse. It stresses the importance of understanding both marginal and conditional probabilities, particularly in experiments and data analysis, where assumptions about the dependence and independence of events play a critical role.
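
To make marginal, joint, and conditional probabilities concrete, here is a sketch that computes them from the study's counts using exact fractions. Marginal probabilities divide a row or column total by the grand total, joint (AND) probabilities divide a single cell by the grand total, and conditional probabilities divide a joint probability by the marginal probability of the condition, P(X | Y) = P(X and Y) / P(Y). The function names are hypothetical, chosen for illustration. The last line compares P(relapse | desipramine) with P(relapse); because they differ, treatment and outcome are not independent in these data.

```python
from fractions import Fraction

# Counts from the lecture's cocaine-treatment study (rows: treatment, columns: outcome).
counts = {
    "desipramine": {"relapse": 10, "no relapse": 14},
    "lithium": {"relapse": 18, "no relapse": 6},
    "placebo": {"relapse": 20, "no relapse": 4},
}
total = sum(sum(cells.values()) for cells in counts.values())  # 72 participants


def p_outcome(outcome: str) -> Fraction:
    """Marginal P(outcome): column total divided by the grand total."""
    return Fraction(sum(cells[outcome] for cells in counts.values()), total)


def p_treatment(treatment: str) -> Fraction:
    """Marginal P(treatment): row total divided by the grand total."""
    return Fraction(sum(counts[treatment].values()), total)


def p_joint(treatment: str, outcome: str) -> Fraction:
    """Joint P(treatment AND outcome): a single cell divided by the grand total."""
    return Fraction(counts[treatment][outcome], total)


def p_outcome_given_treatment(outcome: str, treatment: str) -> Fraction:
    """Conditional P(outcome | treatment) = P(outcome AND treatment) / P(treatment)."""
    return p_joint(treatment, outcome) / p_treatment(treatment)


def p_treatment_given_outcome(treatment: str, outcome: str) -> Fraction:
    """Conditional P(treatment | outcome) = P(treatment AND outcome) / P(outcome)."""
    return p_joint(treatment, outcome) / p_outcome(outcome)


print(p_outcome("no relapse"))            # 1/3
print(p_joint("desipramine", "relapse"))  # 5/36, i.e. 10/72 or about 0.139

# Relapse rate within each treatment row.
for treatment in counts:
    p = p_outcome_given_treatment("relapse", treatment)
    print(f"P(relapse | {treatment}) = {p} = {float(p):.1%}")

# Conditioning the other way uses the column total (48 relapsers).
print(p_treatment_given_outcome("desipramine", "relapse"))  # 5/24, about 0.208

# Independence check: does P(relapse | desipramine) equal P(relapse)?
print(p_outcome_given_treatment("relapse", "desipramine") == p_outcome("relapse"))  # False
```

Rearranging the same formula gives the multiplication rule, P(X and Y) = P(X | Y) × P(Y): multiplying `p_outcome_given_treatment("relapse", "desipramine")` by `p_treatment("desipramine")` recovers the joint probability 10/72.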

          To wrap up, the session explains why independent events, such as two rolls of a fair die, cannot influence each other, which is the error behind the gambler's fallacy. It then turns to probability distributions and their role in statistics, particularly how the mean of a large random sample converges on the population mean.
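
The one-standard-deviation figures quoted in the lecture (34.13% on each side of the mean, about 68% in total) can be reproduced from the standard normal cumulative distribution function. This sketch assumes Python 3.8 or later for the standard-library `statistics.NormalDist`; the helper names `p_between_mean_and` and `p_within` are hypothetical.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # standard normal: mean 0, SD 1


def p_between_mean_and(k: float) -> float:
    """P(0 <= Z <= k): share of scores between the mean and k SDs above it."""
    return z.cdf(k) - z.cdf(0.0)


def p_within(k: float) -> float:
    """P(-k <= Z <= k): chance a randomly drawn score lies within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2):
    print(f"mean to +{k} SD: {p_between_mean_and(k):.4f}   within {k} SD: {p_within(k):.4f}")
```

For k = 1 this gives about 0.3413 and 0.6827, matching the lecture's 34.13% and 68%; for k = 2 the within-range probability is about 0.9545.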

            Chapters

            • 00:00 - 00:30: Introduction to Contingency Tables This chapter introduces contingency tables, which describe how events depend on each other. They display categorical variables, such as the outcome of a coin flip or the sex of a kitten, in terms of their frequencies or relative frequencies. The chapter sets up a well-known example to illustrate the concept.
            • 00:30 - 01:00: Study on Cocaine Addiction Treatments This chapter introduces a study of treatments for cocaine addiction, which is notoriously hard to treat; the instructor notes that certain antidepressants have had some success. The study, conducted a long time ago, randomly assigned 72 people recovering from cocaine addiction to treatment groups.
            • 01:00 - 01:30: Description of the Study and Treatments The three treatments, given to people who had completed a treatment program, were the antidepressant desipramine, the mood stabilizer lithium, and a placebo. The outcome measured was the number of participants in each group who relapsed, meaning they started using again after treatment. The chapter ends by introducing the contingency table of results.
            • 01:30 - 02:00: Analysis of the Relapse Data The contingency table shows the number of participants in each condition who experienced each outcome. Of the 72 participants, 24 were randomly assigned to each group; the instructor notes that a random number generator usually handles this assignment.
            • 02:00 - 02:30: Understanding Contingency Tables The table records how many people in each group relapsed and how many did not, covering the full sample space. In the desipramine group 10 people relapsed and 14 did not; in the lithium group 18 relapsed and 6 did not; and in the placebo group 20 relapsed and 4 did not.
            • 02:30 - 05:00: Marginal and Joint Probabilities The marginal probability that a participant did not relapse is the no-relapse column total divided by the total number of participants: 24/72 ≈ 0.33, so a third of the sample did not relapse, which the instructor notes is not bad for this addiction. The joint (AND) probability that a participant took desipramine and relapsed uses a single cell instead of a margin: 10/72 ≈ 0.139, or about 14 percent.
            • 05:00 - 06:30: Independent vs Conditional Probability Two variables are independent if the probability of one event given the other equals the probability of that event on its own, as with two separate coin tosses, or a sunny day and eating chocolate for dessert. When events are not independent, conditional probability applies; for example, the chance of rain depends on conditions such as cloud cover and temperature.
            • 06:30 - 10:30: Calculating Conditional Probabilities After noting that the sex of Poppy's second kitten is independent of the sex of her first, the instructor introduces the formula P(X | Y) = P(X and Y) / P(Y). From the table, P(relapse | desipramine) = (10/72) / (24/72) = 10/24 ≈ 42 percent, compared with 75 percent for lithium and about 83 percent for placebo. Conditioning the other way, P(desipramine | relapse) = 10/48 ≈ 21 percent. Rearranging the formula gives the multiplication rule, P(X and Y) = P(X | Y) × P(Y).
            • 10:30 - 13:00: Gender and Faculty Example A made-up survey of 100 students at the UCC records whether each identifies as a woman or a man and whether they study in Social Science or Science. The probability that a randomly selected student is in Social Science is 60/100 = 0.6, which fits the UCC's proximity to the Social Science building. Among Social Science students, the probabilities of identifying as a woman and as a man are equal (30/60 each), and the instructor uses the table to show that faculty is independent of gender.
            • 13:00 - 15:30: Independent Events and the Gambler's Fallacy Events X and Y are independent if P(X | Y) = P(X), meaning that knowing Y tells us nothing about X; equivalently, X and Y are independent if and only if P(X and Y) = P(X) × P(Y). For two rolls of a fair die, the probability of rolling a three is 1/6 whether or not the previous roll was a two. The gambler's fallacy is the belief that a long run of bad luck makes a win more likely on the next try; independent dice rolls and roulette spins do not change one another's probabilities.
            • 15:30 - 17:00: Normal Distribution and Sampling In a normal distribution, about 34.13% of scores fall between the mean and one standard deviation below it, and another 34.13% between the mean and one standard deviation above it, so a randomly drawn score has about a 68% chance of falling within one standard deviation of the mean. Probability distributions tell us how likely particular scores are to be drawn from a population.
            • 17:00 - 18:30: Conclusion Because scores near the mean are more likely to be sampled than scores far from it, the mean of a large, representative random sample (for example, 150, 200, or 400 people) converges on the true population mean. This is the take-home point of the lecture.

            Probability4 Transcription

            • 00:00 - 00:30 In this final part of the lecture, we're going to talk about what we call contingency tables. These describe how events depend upon one another. A contingency table is a display of categorical variables, like a coin flip or the sex of one kitten, in terms of their frequencies or relative frequencies. The example that I'm going to show you is a rather famous one, and it's about
            • 00:30 - 01:00 the efficacy of various treatments for cocaine addiction. Cocaine addiction is a notoriously problematic addiction, and it's very hard to treat. But what has had some success are certain kinds of antidepressants. So what I'm showing you are some data from a study that was conducted a very long time ago, in which a total of 72 people who were recovering from cocaine addiction were randomly assigned to
            • 01:00 - 01:30 one of three treatments, and what was measured was the number of people in each group who relapsed, meaning they started to use again after they had done their treatment program. The three treatments were an antidepressant called desipramine, a mood stabilizer called lithium, and placebo. And what I'm showing you here is the contingency
            • 01:30 - 02:00 table for the number of participants in each condition that experienced each outcome. What you can see is there were a total of 72 participants in the study. 24 of them were assigned to each group, and they were randomly assigned, so basically there was some dicing out of which participant numbers were in which group. Usually a random number generator does this, so there are no actual dice involved. But what we're showing here is the sum of the number of
            • 02:00 - 02:30 people in each group who relapsed and who did not relapse, so we have the full sample space covered. What you can see in the relapse condition is that 10 people in the desipramine group relapsed, 18 people in the lithium group did, and 20 people in the placebo group did. On the non-relapse side of the equation, there were 14 non-relapsers in desipramine, six in lithium, and four in placebo. So we can see how the numbers sum up. And that's what a contingency
            • 02:30 - 03:00 table looks like. So the first thing we're going to do is calculate what we call some 'marginal' probabilities. What is the likelihood that a participant did not relapse? What's the likelihood that they were in this column? We know there are 24 people in that column, and to calculate that marginal probability we are going to take the number 24 and divide it by the total number of participants there were, and
            • 03:00 - 03:30 that's going to give us a 33% probability: 24 divided by 72 is 0.33, and so a third of the sample did not relapse. [This is a marginal calculation because the numbers are in the table margins.] Now, for this addiction, that's actually not bad. I know it seems terrible, but it's not bad. Let's look at some of these other cells now. What is the probability that a participant
            • 03:30 - 04:00 took desipramine and relapsed? Now we're looking at the intersection of the outcome here, relapse versus no relapse, and the treatment type that they were in. So how do we get this number? Well, we have the number of people who were in desipramine and the relapse column, so that's this cell right here that's highlighted in blue, that has a little blue circle over it, and to look at the conjunction here we need to divide that by the total number of participants
            • 04:00 - 04:30 we have. So 10 divided by 72 gives us the probability of being in the desipramine group and relapsing. If we do the math on our calculators, 10 divided by 72 comes out to 0.139, so, if we round, about 14 percent of participants were on desipramine and relapsed.
            • 04:30 - 05:00 So that's an AND conjunction. When we're doing AND conjunctions, we're taking an individual cell and dividing it by the total, whereas for a marginal we're taking a marginal cell rather than an individual cell. Here we take the marginal total and divide it by the overall total to get the marginal probability, the probability of being in one group or another: the probability of being here or here, regardless of which treatment you got. So the joint probability is
            • 05:00 - 05:30 the probability of getting a specific treatment and being in one of these outcome cells. So here we're looking at the probability of desipramine and relapse. So let's talk about the idea of independence. Two variables are independent if the probability of one event, event X, given the other event, is the same as the probability of that event by itself. So given that coin toss one was heads, what's the probability that coin toss two is heads? Given the fact that it's a sunny day today, what
            • 05:30 - 06:00 is the probability that I will eat a chocolate for dessert? These two events are totally unrelated. If events are non-independent, then we need to think about a thing called conditional probability. You can think about that in the context of my rain example from earlier: the chances of rain given the environmental conditions. If it's a nice sunny day and there's not a cloud in the sky, the chance of rain is probably lower.
            • 06:00 - 06:30 If it's cloudy and grey and it's warmer than about three degrees Celsius, the chance of rain becomes higher, because the chance of rain is dependent on the other environmental conditions. So when we're thinking about the idea of independence, we're asking whether the presence of one event makes a second event more or less likely. Two events are independent if the presence of one of them does not change the probability of the other. So if we think about
            • 06:30 - 07:00 Poppy's first kitten being male, the probability that her second kitten is male is not affected by whether her first one was male. The sex of kitten number two is totally independent of the sex of kitten number one. So we can think about conditional probabilities in these tables as well. So conditional probability
            • 07:00 - 07:30 is the outcome of interest, event X, given the presence of another condition, event Y. We usually write that using an equation that looks like this: the probability of X, then this vertical bar, which you read as 'given', then Y. So the probability of X given Y equals the probability of X and Y divided by the probability of Y [P(X | Y) = P(X and Y) / P(Y)]. So let's look at where those numbers come from in this table. We have the probability of relapse and desipramine, which
            • 07:30 - 08:00 we've already calculated: the probability of relapse and desipramine is 10 divided by 72. And we need to divide that by the probability of desipramine. We know that 24 of 72 people were assigned to that condition, so we get that number by dividing the marginal total for that condition, 24, by the total number of participants, 72. Now we're going to simplify that math,
            • 08:00 - 08:30 so the 72s are both going to go away [review your algebra if you don't remember this]. This is going to become 10 divided by 24, and that gives us a probability of 41.7 percent. So the probability of relapse given desipramine is about 42 percent. When you look at the numbers, that makes sense: 10 of the people in desipramine relapsed out of a total of 24. That's give or take 42 percent.
            • 08:30 - 09:00 Now we can think about conditional probabilities and getting those from tables. We're going to look at these across the rows here, because that makes the most sense: we have treatment type in rows and outcomes in columns. We usually think about the independent variable, or treatment, in an experiment as the cause of the outcome, so that's why we're looking at it in the rows. We
            • 09:00 - 09:30 could do it the other way as well. So this is the probability of relapse given desipramine; we already said that was about 42 percent. We can look at the probability of relapse given lithium: 18 people out of 24 relapsed in the lithium group, which is 75 percent. And we can look at the probability of relapse given placebo: 20 out of 24 people relapsed, so that probability is about 83%. It actually looks like desipramine, even though it has a 42% relapse
            • 09:30 - 10:00 rate, is still a pretty good treatment compared with lithium and placebo. And that is what we're doing when we talk about conditional probability. Now you can see that this becomes interesting when we think about psychological experiments and testing psychological hypotheses. We can, of course, do this the other way around as well. We can calculate, if a participant relapsed, the probability that they took desipramine, in which case we use the totals in the columns, not the rows. Again, this direction doesn't make as much sense here. I'm showing it to you just so you can get
            • 10:00 - 10:30 a feel for where these numbers come from. So the probability of desipramine given relapse is 10 divided by 48, or about 21 percent, and we can carry it forward from there. And that brings us to the multiplication rule. If X and Y are two outcomes or events, then the probability of X and Y equals the probability of X given Y times the probability of Y [P(X and Y) = P(X | Y) × P(Y)]. Remember,
            • 10:30 - 11:00 you've seen a formula that looks really similar to this earlier: it's the conditional probability formula, just rearranged a bit. It's useful to think about X as the outcome of interest and Y as the condition that caused it or was associated with it. So here is a hypothetical probability distribution. It's totally made up, and the numbers were chosen because they work out in a very nice way. So we could talk about gender identity and faculty. We could randomly sample a total of 100 students
            • 11:00 - 11:30 in the UCC who are walking through. We could have them identify as either a woman or a man on a little survey, and then tell us whether they are from the Faculty of Social Science or studying in Science. Now, the UCC is of course very close to the social science building and farther away from the science building, so we get more social science students than science students, and that makes sense. So the probability that a randomly selected student is in social science,
            • 11:30 - 12:00 and you should think about doing that math before I show you the answer, is 60 out of 100, or 0.6, so we have a 60% likelihood that a randomly sampled student in the UCC is from Social Science. They're more likely to be from social science than from science because social science is simply in closer proximity. Where are you going to get a coffee if you're in science? Well, you're probably going to go to one of the coffee places that's closer to your building, whereas if you're in social science
            • 12:00 - 12:30 you're probably going to go to the Starbucks at the UCC or maybe the Timmy's at the UCC. But we can also ask about gender identity and faculty: what's the probability that a randomly selected social science student identifies as a woman? Again, think about where the numbers come from before I tell you the answer. You can pause, do your calculations, and check whether you're right.
            • 12:30 - 13:00 That probability happens to be 30 divided by 60: the probability that a randomly selected social science student identifies as a woman is 30 out of 60. And the probability that a random social science student identifies as a man is actually equal to the probability that they identify as a woman.
            • 13:00 - 13:30 And both of those are equal to the overall probability of identifying as a woman, or as a man, across the whole sample. That means that the probability of being in social science is independent of gender. And that's one way to think about these independence equations in the context of probability. So to sum that up, if the probability of event X given event Y equals the probability of X, then event X and event Y are independent. At a conceptual level, that means knowing something
            • 13:30 - 14:00 about Y doesn't tell us anything about X. Mathematically, X and Y are independent if and only if the probability of X AND Y equals the probability of X times the probability of Y [P(X and Y) = P(X) × P(Y)]. So the joint probability is the product of their two marginal probabilities if they are independent.
            • 14:00 - 14:30 Let's do one final experiment: two rolls of a fair die. What's the probability that you roll a three given that you just rolled a two? The sample space, of course, is here; we've identified this, so the probability of rolling a two is one in six. What's the probability that you roll a three on the next roll? Think about that very carefully before you answer the question. The probability of getting a three is
            • 14:30 - 15:00 also one in six. Are they related to one another, or are they independent of one another? It turns out that they're independent of one another. With two rolls of the same fair die, if we roll a two on our first roll, what's the probability that we then roll a three? It's still one in six. And this is the gambler's fallacy that you need to be careful of here. The
            • 15:00 - 15:30 gambler's fallacy states that if you have a long run of bad luck, you're due to get lucky. Now, you might eventually get lucky, but every roll of the dice is independent; so is every flip of a card, if you're thinking about black cards or reds and the deck is reshuffled each time, or every spin of the roulette wheel, which is a good one, blacks or reds. These are independent probabilities, and it's important to remember that if you have two events that are unrelated to one another, the probability of one does not affect the probability of the other.
            • 15:30 - 16:00 So if these events are independent, the probability of a three given that you've just rolled a two is equal to the probability that you will roll a three. How does this relate to statistics? Remember a little while ago when we talked about samples and distributions? What does this mean for sampling? We're going to revive this normal distribution - you're going to see this picture a lot in this class - so consider that normal
            • 16:00 - 16:30 distribution: what's the probability of randomly drawing a score that falls within one standard deviation of the mean? Well, 34.13% of the scores fall between the mean and one standard deviation below it, and 34.13% of scores (these are rounded to four decimal places here) are between the mean of zero and plus one standard deviation.
            • 16:30 - 17:00 So we have a total chance of a little over 68 percent: 34 plus 34 is 68. We have about a 68% chance of selecting a score, by pure random chance, that falls within one standard deviation of the mean. Our chances of selecting a score within two standard deviations are this plus this, plus this, plus this [about 95%]. And so the thing to remember about probability distributions,
            • 17:00 - 17:30 and why they're important in statistics, is that they tell us something about the means, about the scores that we are drawing, and about the likelihood of those scores being drawn from the population. And that's an important thing to keep in mind as you move forward. It's also why the mean of a representative sample converges on the true population mean. So imagine we take a representative sample, a totally random sample,
            • 17:30 - 18:00 from this population, and it's large enough to be representative. So let's say it's not two or three people; let's say we take a good-sized sample like 150, or 200, or 400 people. If we have a good representative sample, the mean of those participants will converge on the true population mean, because we are substantially more likely to sample scores that are close to the mean than scores that are far away from it. So that's the take-home
            • 18:00 - 18:30 point for this probability idea, or at least one important one. Thanks for listening.
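
The lecture's take-home point, that the mean of a large random sample converges on the population mean, can be seen in a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 100 and standard deviation of 15 (values chosen only for illustration, not from the lecture), and scores are drawn with Python's `random.Random.gauss`. The helper names are hypothetical.

```python
import random
from statistics import fmean, pstdev

# Hypothetical population (illustrative values, not from the lecture).
POP_MEAN = 100.0
POP_SD = 15.0

rng = random.Random(2024)  # fixed seed so the run is reproducible


def sample_mean(n: int) -> float:
    """Mean of n scores drawn independently at random from the population."""
    return fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))


def spread_of_sample_means(n: int, repeats: int = 500) -> float:
    """SD of many sample means of size n; smaller means tighter convergence."""
    return pstdev([sample_mean(n) for _ in range(repeats)])


for n in (3, 150, 400):
    print(f"n = {n:>3}: one sample mean = {sample_mean(n):6.2f}, "
          f"spread of sample means = {spread_of_sample_means(n):5.2f}")
```

Any single small sample can land close to 100 by luck, but the spread of sample means shrinks roughly like 15 / sqrt(n) as n grows, so samples of a few hundred people reliably land near the population mean.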