Sampling Bias and Error in Research

Estimation2

    Summary

    In "Estimation2," Erin Heerey distinguishes sampling error from sampling bias in research. Sampling "error" is not a mistake; it is the random variation that arises because one person rather than another happens to be drawn into a sample. Sampling "bias," in contrast, is systematic and arises from factors such as non-response, self-selection, and convenience sampling, all of which can skew results. Heerey's historical example is the 1948 U.S. presidential election, in which, as she describes it, a telephone poll led the Chicago Tribune to predict the wrong winner. She also discusses the precision of measurement tools and the non-sampling errors (measurement, calculation, and interpretation mistakes) that determine whether conclusions drawn from data accurately represent the studied population.

      Highlights

      • Understanding the difference between sampling error and sampling bias is crucial in research. 📚
      • Sampling error involves random sample variation, while sampling bias is systematic and related to sample selection. 🎯
      • Factors like non-response, self-selection, and convenience samples can introduce bias into research. 🙅
      • The 1948 U.S. election poll mistakes highlight the danger of biased sampling methods. 🗳️
      • Non-sampling errors include measurement issues, leading to inaccurate data conclusions. ⚠️

      Key Takeaways

      • Sampling error is more like 'noise' than a mistake, representing random variation in sample selection. 🤔
      • Sampling bias occurs when the sample does not represent the population properly, often due to non-responding or self-selection. 📊
      • Convenience samples are often used in psychology, but they're prone to bias. Be cautious about generalizing results! 🤯
      • A famous biased sample was the 1948 election's phone survey, predicting the wrong winner due to limited phone ownership. 📞
      • Precise and valid measurement tools are crucial for accurately interpreting population data, avoiding excess noise. 🛠️

      Overview

      In the world of research, not all errors are created equal. Erin Heerey explains that what we often call a sampling 'error' in statistics isn't actually a mistake. It's more akin to 'noise'—an unavoidable product of random sample variations. But when it comes to sampling 'bias,' that's where things get a bit more dicey. Bias occurs when our sample skews toward a certain group for various reasons, like only sampling people who are willing to show up in person or take a survey. This skewed sampling can lead to incorrect conclusions, prompting researchers to exercise caution and awareness.
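The idea of sampling error as 'noise' can be sketched with a small simulation. This is only an illustration under stated assumptions: the population of 10,000 trait scores, its mean and spread, and the helper name `sample_means` are all made up for the example and do not come from the lecture.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: 10,000 scores on some trait (values are invented).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

means = sample_means(population, sample_size=100, n_samples=500)

# Each sample mean misses the true mean a little (sampling error), but the
# misses fall on both sides of the truth and roughly cancel out on average.
errors = [m - true_mean for m in means]
share_overshooting = sum(e > 0 for e in errors) / len(errors)

print(f"true mean:                 {true_mean:.2f}")
print(f"average of sample means:   {statistics.mean(means):.2f}")
print(f"spread of sample means:    {statistics.stdev(means):.2f}")
print(f"share of samples too high: {share_overshooting:.2f}")
```

No individual sample is "wrong" here. Every one was drawn at random, and the scatter around the true mean is exactly the random variation the term "sampling error" refers to.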

        Take, for instance, the infamous 1948 U.S. presidential election poll. Back then, not everyone had a landline; those who did were mostly wealthier households with stable addresses. As Heerey tells it, the Chicago Tribune ran a nationwide phone survey and predicted that Dewey would defeat Truman. That prediction was wrong, and Truman's victory showed how a sample that is drawn at random, but from a biased list, can lead a newsroom astray. Avoiding sampling bias is a crucial step toward reliable results.
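The mechanism in this example, a random draw from a list that leaves part of the population out, can be sketched in a few lines. Every number below is invented for illustration and is not 1948 data: the 40% phone ownership rate, the voting probabilities, and the candidate label "A" are all assumptions.

```python
import random

def build_population(n, seed=1):
    """Hypothetical electorate; all probabilities are made up for illustration."""
    rng = random.Random(seed)
    voters = []
    for _ in range(n):
        has_phone = rng.random() < 0.40           # assumed phone ownership rate
        # Assumption: phone owners favor candidate "A" more than non-owners do.
        p_vote_a = 0.60 if has_phone else 0.40
        voters.append({"has_phone": has_phone, "votes_a": rng.random() < p_vote_a})
    return voters

def share_for_a(people):
    """Fraction of the given people who vote for candidate A."""
    return sum(p["votes_a"] for p in people) / len(people)

population = build_population(100_000)
rng = random.Random(2)

# A random draw from everyone versus a random draw from phone owners only.
full_frame_sample = rng.sample(population, 2_000)
phone_owners = [p for p in population if p["has_phone"]]
phone_frame_sample = rng.sample(phone_owners, 2_000)

print(f"true share for A:   {share_for_a(population):.3f}")
print(f"full-frame sample:  {share_for_a(full_frame_sample):.3f}")
print(f"phone-only sample:  {share_for_a(phone_frame_sample):.3f}")
```

Both samples are random and equally large, yet only the phone-only sample points to the wrong winner. Under these assumptions, a larger sample from the same list would not help, because the error comes from who is on the list rather than from chance.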

          Beyond sampling bias, Heerey emphasizes the importance of precision in measurement tools. Whether a survey is designed to capture extroversion or another personality trait, the instrument's accuracy directly affects how the data can be interpreted. Any lapse in measurement validity, such as poorly phrased questions or a failure to reverse-score items, can introduce unwanted noise or outright errors. Careful planning and design are therefore integral to sidestepping non-sampling errors and ensuring that the findings genuinely reflect the studied population.
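Reverse-scoring is easy to get wrong, so a minimal sketch may help. It assumes a 1-5 agreement scale and a two-item scale whose wording echoes the party and stay-at-home items quoted in the lecture; the function names, item keys, and responses are hypothetical.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a Likert response so that high values always mean more of the trait."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} is outside {scale_min}-{scale_max}")
    return scale_max + scale_min - response

def score_scale(responses, reversed_items, scale_min=1, scale_max=5):
    """Average the item responses after reverse-scoring the listed items."""
    adjusted = [
        reverse_score(value, scale_min, scale_max) if item in reversed_items else value
        for item, value in responses.items()
    ]
    return sum(adjusted) / len(adjusted)

# One hypothetical, strongly extroverted respondent on a 1-5 agreement scale.
responses = {
    "loves_parties": 5,   # "I really love to go to parties"
    "stay_at_home": 1,    # "I'm more of a stay-at-home kind of person"
}

naive = sum(responses.values()) / len(responses)              # forgets to reverse
correct = score_scale(responses, reversed_items={"stay_at_home"})

print(f"naive score:   {naive}")
print(f"correct score: {correct}")
```

Without the reversal, this respondent's two answers cancel out and they land at the scale midpoint. This is the kind of calculation error that distorts an analysis even when the sample itself is sound.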

            Chapters

            • 00:00 - 00:30: Introduction to Sampling Bias and Error This chapter introduces the concept of sampling bias and the distinction between 'sampling error' and mistakes. The term 'error' in the context of sampling does not imply a mistake, but rather represents 'noise' or random variation that occurs when selecting one sample over another.
            • 00:30 - 01:00: Differentiating Sampling Error from Bias Sampling error is the random variation that occurs in a truly random sample: some individuals are included and others excluded purely by chance. It is a natural product of random selection, not the result of a systematic mistake, and this chapter sets it apart from bias.
            • 01:00 - 01:30: Examples and Types of Sampling Bias This chapter introduces sampling bias, which occurs when the way participants are selected makes the sample unrepresentative of the population. The example given is in-person lab studies, where only people able to come to campus are eligible. The chapter then turns to the online term, which changes who is within reach.
            • 01:30 - 02:30: Geographic Constraints and Non-Responding A student taking the class online from London, England, is too far away and too expensive to bring to the lab, so a truly random sample is out of reach. The chapter then opens the list of sampling bias types with non-responding, using the example of a study advert about how people talk to strangers on different topics.
            • 02:30 - 03:30: Non-Response Bias Traits such as introversion can influence who responds to a study advert, and when a systematic trait drives who does and does not respond, the sample is biased. This holds even when participants are approached at random, for example by emailing a randomly selected group of people at a university: some will take part and others will decline.
            • 03:30 - 05:30: In-Study Non-Response, Self-Selection, and Convenience Sampling Non-response can also arise inside a study, when participants leave uncomfortable questions blank, so a measurement can become biased even if the sample was not. Widely advertised studies invite self-selection, because the topic attracts some people and puts others off. Most samples in psychology are convenience samples, so generalizing to a larger population requires care.
            • 05:30 - 06:30: The 1948 U.S. Presidential Election Poll As described in the lecture, the Chicago Tribune polled voters drawn at random from a national telephone directory and, on the strength of a large sample, predicted that Thomas Dewey would defeat Harry Truman, even running a "Dewey Defeats Truman" cover story.
            • 06:30 - 09:30: Issues with Representativeness in Sampling Truman won. The lecture attributes the miss to the telephone poll: in 1948 phone owners tended to be wealthier homeowners with stable addresses, who were also more likely to vote Republican. A random sample can still be biased when the criterion for the draw excludes certain people, as with the difficulty of reaching homeless populations in a census.
            • 09:30 - 13:00: Precision of Measurement and Non-Sampling Errors A measurement instrument must cover its construct fully and validly without adding excess noise, illustrated with a questionnaire measure of extroversion. The chapter then covers non-sampling errors: measurement error, recording mistakes, calculation errors such as forgetting to reverse-score an item, choosing the wrong analysis, and misinterpreting results.
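The non-response bias described in the chapters above can be sketched as a simulation: people are invited completely at random, but whether they respond depends on a trait. The trait scale, the response probabilities, and the function name `simulate_invitations` are assumptions made for illustration only.

```python
import random
import statistics

def simulate_invitations(n_invited, seed=3):
    """Invite people at random; the chance of responding rises with extroversion."""
    rng = random.Random(seed)
    invited, responders = [], []
    for _ in range(n_invited):
        extroversion = rng.uniform(0, 1)       # hypothetical trait score in [0, 1]
        invited.append(extroversion)
        # Assumption: response probability runs from 0.1 (most introverted)
        # up to 0.9 (most extroverted).
        if rng.random() < 0.1 + 0.8 * extroversion:
            responders.append(extroversion)
    return invited, responders

invited, responders = simulate_invitations(20_000)

print(f"mean extroversion, everyone invited: {statistics.mean(invited):.3f}")
print(f"mean extroversion, responders only:  {statistics.mean(responders):.3f}")
print(f"response rate:                       {len(responders) / len(invited):.2f}")
```

The invited group is a fair random sample, yet the people who actually contribute data are noticeably more extroverted on average. Random invitation alone does not protect against this kind of bias.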

            Estimation2 Transcription

            • 00:00 - 00:30 Now, I said we were going to talk about sampling bias. Before we do that I want to unpack this word here - this word "error". When we make sampling "errors", because we randomly sample one person and not another, it's not an "error" as in a mistake that we've made. It's an error more like 'noise' so it's not something that we've done wrong, it's not technically a mistake that we've made. So the term "error" when we're talking about it here,
            • 00:30 - 01:00 is not actually a mistake that you make. It just means "random variation", and that's the way you should think about it. Now what I'd like to do is differentiate sampling error, which is random variation in a truly random sample, in which one person gets included by random chance and another person gets excluded by random chance, that's sampling error;
            • 01:00 - 01:30 and I'd like to differentiate that from  sampling "bias" because there are lots of   ways in which our samples can be biased. One  of which is, I do a lot of in-person studies   and what that means is that only certain people  are eligible to be in my studies and those are   people who happen to be able to come into  campus and participate in a study in my lab.   For example this term we're  online, and if we're online, then
            • 01:30 - 02:00 you might be taking this class from London, England. And if you're taking this class from London, England, you are not geographically desirable as a participant. It would take too much money for me to fly you over here and I don't have that big of a government grant. So I can't take a truly random sample. My samples may be biased for another reason.
            • 02:00 - 02:30 I may have sampling error based on some of these other reasons. So let's talk about sampling bias. There's a long list of things that are included in types of sampling bias. I'm highlighting a couple of them here. One of the sampling biases that we regularly encounter is... there's one called 'non-responding'. So let's say I post an advert for my study, and let's say my study is a study of how people talk to strangers about different kinds of topics.
            • 02:30 - 03:00 And maybe you're really introverted. Do you want  to come in and talk to strangers? Maybe you don't.   So, some people may read my study advert and they  decide, "I don't want to be part, I don't want to   do that," and so they don't contribute data. That  will bias my sample, in particular if there's a   systematic trait that motivates who does and does  not respond to my ad that can lead to bias. Even
            • 03:00 - 03:30 if I approach participants randomly - so even if I don't sample by ad and instead I have a random selection of email addresses of people at the University and I randomly email out to just a randomly selected group, when I give them my study advert some people are going to be like, "Oh yeah that sounds fun, I think I'll do that," other people are going to say "Don't call me, I'll call you." And they're not going to want to take part in my study and that can lead to bias even though I've approached participants randomly. We can also get non-response bias that happens directly in a
            • 03:30 - 04:00 study. So after people decide to sign up for your study, they consent to your study; let's say I have a questionnaire that asks difficult questions that might make people uncomfortable. If my questionnaire makes people with a particular trait uncomfortable and they decide to leave those questions blank, which is the right of research participants - they don't have to answer questions that they don't want to answer. So in that case I will get non-response bias as well, and that can
            • 04:00 - 04:30 lead to a biased [measurement], even if [the sample] wasn't biased originally, it can be biased in the context of the data collection. So that's also a non-response or non-responder bias. We can also get self-selection. If I advertise my study widely and see who responds which is usually what I do, my survey topic can encourage some people to respond and actually turn other people off, so that's kind of a self-selection bias. If I'm doing, for example a study of a treatment
            • 04:30 - 05:00 for a particular problem that people might have, maybe I have an anti-procrastination treatment so I send out an advert and some people are like, "Nah I don't want to learn how not to procrastinate, I like my procrastination." And we all know people like that, in fact sometimes I'm one of those. So I might choose not to participate in that study so again, I'm going to select myself into your study or deselect myself based on traits, it's a bit similar to this kind
            • 05:00 - 05:30 of non-responding bias idea. And then we have  a convenience sample. So we had the convenience   sample of squirrels earlier - these squirrels  I could sample right here on the UWO campus.   Most of the samples that we get in Psychology are  samples of convenience - so that means we need to   be really, really careful about how we generalize  those up to a larger population, because sampling   bias can lead to wrong conclusions. I'm  going to give you an example of that.
            • 05:30 - 06:00 So there was an example that happened in the 1948 U.S. presidential election. The Chicago Tribune did a poll, a nationwide telephone poll in which it asked voters who they were going to vote for. The two choices were Harry Truman who was a Democrat and Thomas Dewey who was a Republican. This guy on the top here is Truman and this guy down here is Dewey. So what happened was the Tribune took a random selection of individuals from the Bell System directory.
            • 06:00 - 06:30 At that time there was a national phone directory run by a company called Bell, which is here in Canada as well but they are different companies. Now, it had a large sample size, and on the basis of that poll, the Chicago Tribune predicted a win for Dewey and in fact, overnight they even ran a cover story suggesting that Dewey had won. Now here's [a photo of] the cover
            • 06:30 - 07:00 story: The Chicago Daily Tribune, "Dewey Defeats Truman." Does this look like Dewey there? Does this look like a man who lost the election? It does not. This is President Truman, the day after his election. The mistake the Tribune made was to use a telephone poll. Why was this a mistake? So back in 1948 we didn't have mobile phones. Nobody had a phone in their pocket and everyone
            • 07:00 - 07:30 didn't have a phone. People who had a phone  were people who tended to own their own houses.   People who owned their own houses were more  likely to have a telephone line than people   who didn't. So if you were renting an apartment,  for example, or a house, if you were renting you   probably didn't have a telephone line directly  into your house and instead what you would have   done was you would have used a pay phone that  might have been available. Back in the olden days,
            • 07:30 - 08:00 you could put coins into telephones and make phone  calls that way. We don't have that anymore really,   in fact I don't think I've seen a pay phone in  ages except in a museum and that's mostly because   most of us carry around our phones. In fact at my  house, because I bought this house after I moved   to Canada - I moved to Canada in 2015. So after I  moved to Canada and bought my house, I didn't even   put in a landline so I don't have a landline,  all I have is the phone in my pocket.
            • 08:00 - 08:30 So the mistake there was to use a telephone poll  and in 1948 because most people didn't have a   private phone, the people who had phones tended  to be wealthier, they tended to be homeowners,   and they tended to be people with stable  addresses; and those people are also more   likely to vote Republican. And so what that  led to was a sample, even though it was random,
            • 08:30 - 09:00 that was not representative of the true  population. The representativeness of the   sample is a major factor in whether you have  an accurate estimate of your population. If   the sample is not representative, the  conclusions that you make from that   sample might be biased. And even with random  sampling these estimates can still be biased   if the criterion that you base your random  draw on is likely to exclude certain people.
            • 09:00 - 09:30 For example, one of the really big problems in census data is connecting with homeless populations. So I might get a card that comes through my post flap at my house - where, when the Canadian census happens, I get a card and then I go online and do the survey. But what if you're homeless? Where does that card go? So if you don't have a stable address, you can end up with a biased sample. So there's
            • 09:30 - 10:00 actually a lot of outreach that happens with  the community shelters and organizers to try   to survey people who are homeless; otherwise  that's a population that tends to be excluded. Another thing we should talk about is the  precision of a measurement. So the design   of a measurement instrument is a critical  element in your ability to make inferences   about the population. Your measurement needs to  be precise enough for the purposes of your study,
            • 10:00 - 10:30 but shouldn't introduce extra noise. If your  measurement isn't carefully selected and   carefully calibrated, your conclusions may not  generalize beyond the lab. So that's something   that's really important to consider. Again, I  gave the example earlier about a questionnaire   measure of extroversion. What items do you put  in? You need it to be precise enough to cover your   construct fully and to be a valid measure of that  construct. So it needs to really cover all the
            • 10:30 - 11:00 elements of the construct. You might think about extroversion as having an element of sociability, and of enjoyment of social situations, feeling energized by social situations, but also a little bit of risk-taking, right? Because people who are extroverted are much more likely to introduce themselves to strangers than people who are introverts who might smile shyly from the other side of the room. So if all of those constructs that make up the idea of extroversion
            • 11:00 - 11:30 are not properly measured what you can end  up with is a measurement that doesn't quite   work and only selects or only identifies some of  your population but not your whole extroverted   population. Or it introduces noise, because of  course, it's rare that everybody ticks all the   boxes. Often when we have things that we're trying  to measure, some people [with the trait] tick some
            • 11:30 - 12:00 boxes and other people don't tick those  boxes and they tick different boxes.   So when we're making measurements, we need to  make sure that they're covering our construct   fully and without introducing excess noise. So  we don't want to have questions that are going   to be so specific that only a few people are  going to want to answer them, because things   like that will introduce extra noise and  then our measurements may not generalize.   And that leads us into non-sampling errors. So  we can also have errors, and these are more like
            • 12:00 - 12:30 mistakes, where we have measurement error - so  measurements that lack validity or reliability. We   can also get measurement error in terms of making  a mistake with our measurement recording - so   responses are not accurately recorded.  That can be a cause of measurement error.   We can also make calculation or mathematical  errors in data analysis, so data could be   inaccurately summarized for example in  a questionnaire. So in my extroversion
            • 12:30 - 13:00 questionnaire, I might have a question that  says, "I really love to go to parties",   and another question that says, "I'm more of  a stay-at-home kind of person", both of those   measure extroversion but in different directions  so I need to make sure that I reverse score that   second question so that it's coded in the same  direction. Items like that can lead to errors in   statistical analysis. I can also do the wrong  analysis for what data I have and that'll be   another version of an error. And finally we  can have errors of misinterpretation where
            • 13:00 - 13:30 we don't accurately interpret our results or our  statistical analyses, or where our interpretation   oversteps our methods. For example, if I give  you a questionnaire about your social behavior   and then, in the conclusions of my study, I  talk about your social behavior rather than your   report of your social behavior (which is all I've  measured), then what I'm doing is I'm overstepping   my method. So I can be misinterpreting my  methods. If I only ask you about your behavior,
            • 13:30 - 14:00 I can only make conclusions about what  you tell me and I can't make conclusions   about your actual behavior because what you  tell me might be totally true as you see it,   but that might not be actually what you do. And  many of us don't have as much insight into our   real social behaviors as we think we do. So  that can lead to a different kind of error,   an error of misinterpretation. We'll  continue the lecture in the next video.