The era of blind faith in big data must end | Cathy O'Neil
Summary
In her TED talk, Cathy O'Neil challenges the widespread belief in the infallibility of algorithms and big data. She argues that algorithms often embed biases, reflecting historical patterns of discrimination and injustice. Drawing on examples from education, policing, hiring, and the media, she calls for transparency and accountability in their use. O'Neil advocates what she calls an 'algorithmic audit' to ensure fairness, urges data scientists to act as translators of ethical discussions rather than arbiters of truth, and emphasizes that algorithmic accountability is a political fight, not just a technical problem.
Highlights
A complex, secret algorithm was used to score teachers, producing near-random results and wrongful firings 📉.
Algorithms can result in bias against minorities in many sectors, like law enforcement and hiring practices ⚖️.
Data laundering hides biases inside supposedly 'objective' black-box algorithms, which O'Neil calls 'weapons of math destruction' 💣.
Algorithmic audits are needed to ensure fairness and identify bias before deployment 🛠️.
O'Neil stresses that discussions around algorithms should be ethical and socio-political, not just mathematical 🔍.
Predictive policing algorithms trained on biased arrest data can perpetuate racial bias by sending police back to minority neighborhoods 🚓.
Transparent and fair algorithms can help ensure accurate assessments and decisions across industries 🌟.
Key Takeaways
Algorithms are not objective; they carry the biases of their creators and historical data 📊.
Secret algorithms can lead to unfair consequences, affecting jobs and opportunities 🤖.
Algorithmic accountability and transparency are crucial for fairness 🔍.
Algorithms may perpetuate existing social biases and inequalities 🚨.
The power held by private companies in creating and deploying algorithms needs scrutiny and regulation 🏢.
An 'algorithmic audit' can help check biases in data and definitions of success ☑️.
Faulty algorithms can cause widespread harm over time without being noticed ⚠️.
Ethical considerations should be at the forefront of data science 🎓.
Blind application of algorithms can automate and perpetuate the status quo 🌐.
Overview
In the TED talk, Cathy O'Neil dismantles the myth of algorithmic objectivity, urging transparency and ethical oversight. She provides compelling examples, such as the wrongful firing of teachers in Washington D.C. based on a flawed scoring algorithm. Such systems, she argues, can cause profound harm when their intricacies remain opaque and unquestioned.
O'Neil outlines the inherent biases lurking in algorithms, influenced by historical data and subjective definitions of success. By sharing vivid examples, such as biased justice-system data and the potential for discriminatory hiring algorithms, she paints a worrying picture of the current reliance on these 'black box' systems. These algorithms, she suggests, often enshrine societal inequities rather than solving them.
To combat this, O'Neil proposes conducting 'algorithmic audits' to root out biases and ensure fairness—the digital equivalent of a blind audition for orchestras. Her call to action is clear: data scientists must partake in ethical discussions rather than act solely as technical operators. The ultimate goal is accountability, transforming this debate from a mathematical puzzle into a pressing political issue.
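As a rough sketch of what the 'blind audition' idea could look like in code (this is an editorial illustration, not O'Neil's own method; the field names, weights, and example record are all hypothetical), a screening score can be computed only from attributes the evaluators have deliberately chosen to keep:

```python
# Minimal sketch of "blind audition" screening: strip attributes that the
# evaluators have decided should not matter before any score is computed.
# Field names, weights, and the example record are hypothetical.

BLINDED_FIELDS = {"name", "gender", "zip_code"}  # direct attributes and likely proxies

def blind(application: dict) -> dict:
    """Return a copy of the application without the blinded fields."""
    return {k: v for k, v in application.items() if k not in BLINDED_FIELDS}

def score(application: dict) -> float:
    """A toy scoring rule over the fields we chose to keep (an opinion embedded in code)."""
    return 2.0 * application.get("years_experience", 0) + 5.0 * application.get("audition_rating", 0)

candidate = {
    "name": "A. Candidate", "gender": "F", "zip_code": "11201",
    "years_experience": 6, "audition_rating": 4.5,
}
print(score(blind(candidate)))  # scored "behind the sheet": 34.5
```

The point mirrors the orchestra example: deciding which fields to blind is itself a human judgment about what should count.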
Chapters
00:00 - 00:30: Introduction to Algorithms The chapter 'Introduction to Algorithms' discusses the pervasive nature of algorithms in everyday life. It explores how algorithms influence various aspects of our lives, from job opportunities to financial products such as credit card offers, determining winners and losers based on algorithmic sorting and decision-making processes.
00:30 - 01:00: Secret Formulas and Definitions of Success The chapter "Secret Formulas and Definitions of Success" explores the complexity and opacity of the algorithms that score us. These algorithms are often based on secret formulas and past data, with no clear system of appeal. The chapter asks what happens when such algorithms are wrong, especially when they are trained on potentially flawed or biased data against a narrow, chosen definition of success. It emphasizes the need to assess these algorithms critically, since they simply learn which patterns are associated with success as it has been defined, without considering broader or alternative understandings of what success could mean.
01:00 - 02:00: Personal Algorithm Example The chapter discusses the idea that everyone uses algorithms in daily life, even without writing them as formal code. The example given is the algorithm the speaker uses to prepare family meals: the data is the ingredients in her kitchen, the time she has, and her ambition, and success is defined, by her, as a meal in which her kids eat vegetables. She humorously notes that she does not count ramen noodles as real food.
02:00 - 02:30: Marketing Myths of Algorithms The chapter 'Marketing Myths of Algorithms' challenges the common perception that algorithms are objective and scientific. It highlights that algorithms are actually opinions embedded in code. The chapter uses an analogy of a meal's success to describe how subjective opinions influence the definition of success. Just as a parent's opinion on a successful meal (including children eating vegetables) differs from a child's preference (eating Nutella), the opinions of those who create algorithms are embedded within them, affecting their outcomes.
02:30 - 03:30: Algorithmic Impact on Education In the chapter titled 'Algorithmic Impact on Education,' the focus is on the practical effects of algorithms and big data within educational systems. The chapter opens by noting how marketing uses the mystique of algorithms, and people's trust in and fear of mathematics, to intimidate and to build unearned confidence, and it cautions against placing blind faith in big data. The narrative is exemplified by Kiri Soares, a high school principal in Brooklyn, who recounted in 2011 that her teachers were being evaluated with a complex, secret algorithm. This situation highlights issues of transparency and accountability when algorithmic processes are implemented in educational settings.
03:30 - 04:30: Problems with Consistency in Algorithms The chapter titled 'Problems with Consistency in Algorithms' discusses difficulties with the 'value-added model' used in education. The narrator recounts how a Department of Education contact dismissively told the principal that the formula was math and she wouldn't understand it. The situation escalates when the New York Post obtains teachers' names and scores through a Freedom of Information Act request and publishes them as an act of teacher-shaming. The narrator's own attempts to obtain the formula and source code through the same channels were denied, underscoring the lack of transparency around such algorithms.
04:30 - 06:00: Potential for Bias in Hiring Algorithms The chapter 'Potential for Bias in Hiring Algorithms' notes that nobody in New York City, not even within the city government, understood the teacher-scoring formula. Gary Rubinstein analyzed the published New York Post data and found 665 teachers who had received two scores because they taught two different grades; plotting the score pairs showed results that looked almost random, illustrating the need for transparency and scrutiny in algorithmic decision-making.
06:00 - 08:00: Bias in Law Enforcement Algorithms The chapter discusses how flawed algorithms can harm individuals, highlighting the case of Sarah Wysocki. Despite strong recommendations from her principal and her students' parents, Sarah was fired along with 205 other teachers from the Washington, DC school district on the basis of a scoring model that behaved almost like a random number generator. The chapter questions the validity and transparency of such assessments and challenges the data scientists and AI experts in the audience to consider the implications.
08:00 - 09:30: Profit and Bias in Private Algorithms Algorithms, even when designed with good intentions, can go wrong. Unlike a badly designed airplane that crashes visibly, a flawed algorithm can cause damage over time without leaving a trace.
09:30 - 11:00: Algorithmic Audits and Fairness The chapter introduces Roger Ailes, who founded Fox News in 1996. More than 20 women complained of sexual harassment and said they were not allowed to succeed there; Ailes was ousted last year, but the problems have persisted. The chapter then poses a thought experiment: what would happen if Fox News replaced its hiring process with a machine-learning algorithm, and would that produce a more equitable outcome?
13:00 - 14:00: Conclusion - Ethical Responsibility in Algorithms The chapter discusses the choice of data and the definition of success in algorithm development, returning to the hiring thought experiment: the training data could be the last 21 years of applications to Fox News, and success could be defined as staying for four years and being promoted at least once. Training an algorithm to find the traits of historically 'successful' applications underscores the ethical responsibility involved in choosing the data and defining success.
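To make that point concrete, here is a minimal, invented sketch (not from the talk; the data, field names, and naive 'training' rule are all hypothetical) of how a model fit to historical outcomes reproduces the bias baked into those outcomes:

```python
# Minimal sketch of how a "definition of success" learned from biased history
# reproduces that history. Data, fields, and the training rule are invented.
from collections import defaultdict

# Hypothetical past applications: success = stayed four years and was promoted once.
history = [
    {"gender": "M", "degree": "BA", "successful": True},
    {"gender": "M", "degree": "BA", "successful": True},
    {"gender": "M", "degree": "MS", "successful": True},
    {"gender": "F", "degree": "MS", "successful": False},  # equally qualified, but
    {"gender": "F", "degree": "BA", "successful": False},  # not promoted in the past
]

# "Training": estimate the success rate per gender from the historical labels.
counts, successes = defaultdict(int), defaultdict(int)
for row in history:
    counts[row["gender"]] += 1
    successes[row["gender"]] += row["successful"]

def predicted_success(applicant: dict) -> float:
    """Score a new applicant purely by how people 'like them' fared in the past."""
    g = applicant["gender"]
    return successes[g] / counts[g]

print(predicted_success({"gender": "F", "degree": "MS"}))  # 0.0 -> filtered out
print(predicted_success({"gender": "M", "degree": "BA"}))  # 1.0 -> advanced
```

Because the success label is inherited straight from the past, applicants who do not resemble past winners are filtered out regardless of qualification, which is exactly the failure mode the talk describes.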
The era of blind faith in big data must end | Cathy O'Neil Transcription
00:00 - 00:30 Algorithms are everywhere. They sort and separate
the winners from the losers. The winners get the job or a good credit card offer. The losers don't even get an interview or they pay more for insurance.
00:30 - 01:00 We're being scored with secret formulas
that we don't understand that often don't have systems of appeal. That begs the question: What if the algorithms are wrong? To build an algorithm you need two things: you need data, what happened in the past, and a definition of success, the thing you're looking for
and often hoping for. You train an algorithm
by looking, figuring out. The algorithm figures out
what is associated with success.
01:00 - 01:30 What situation leads to success? Actually, everyone uses algorithms. They just don't formalize them
in written code. Let me give you an example. I use an algorithm every day
to make a meal for my family. The data I use is the ingredients in my kitchen, the time I have, the ambition I have, and I curate that data. I don't count those little packages
of ramen noodles as food. (Laughter) My definition of success is:
01:30 - 02:00 a meal is successful
if my kids eat vegetables. It's very different
from if my youngest son were in charge. He'd say success is if
he gets to eat lots of Nutella. But I get to choose success. I am in charge. My opinion matters. That's the first rule of algorithms. Algorithms are opinions embedded in code. It's really different from what you think
most people think of algorithms. They think algorithms are objective
and true and scientific.
02:00 - 02:30 That's a marketing trick. It's also a marketing trick to intimidate you with algorithms, to make you trust and fear algorithms because you trust and fear mathematics. A lot can go wrong when we put
blind faith in big data. This is Kiri Soares.
She's a high school principal in Brooklyn. In 2011, she told me
her teachers were being scored with a complex, secret algorithm
02:30 - 03:00 called the "value-added model." I told her, "Well, figure out
what the formula is, show it to me. I'm going to explain it to you." She said, "Well, I tried
to get the formula, but my Department of Education contact
told me it was math and I wouldn't understand it." It gets worse. The New York Post filed
a Freedom of Information Act request, got all the teachers' names
and all their scores and they published them
as an act of teacher-shaming. When I tried to get the formulas,
the source code, through the same means,
03:00 - 03:30 I was told I couldn't. I was denied. I later found out that nobody in New York City
had access to that formula. No one understood it. Then someone really smart
got involved, Gary Rubinstein. He found 665 teachers
from that New York Post data that actually had two scores. That could happen if they were teaching seventh grade math and eighth grade math. He decided to plot them. Each dot represents a teacher.
03:30 - 04:00 (Laughter) What is that? (Laughter) That should never have been used
for individual assessment. It's almost a random number generator. (Applause) But it was. This is Sarah Wysocki. She got fired, along
with 205 other teachers, from the Washington, DC school district, even though she had great
recommendations from her principal and the parents of her kids. I know what a lot
of you guys are thinking, especially the data scientists,
the AI experts here.
04:00 - 04:30 You're thinking, "Well, I would never make
an algorithm that inconsistent." But algorithms can go wrong, even have deeply destructive effects
with good intentions. And whereas an airplane
that's designed badly crashes to the earth and everyone sees it, an algorithm designed badly can go on for a long time,
silently wreaking havoc. This is Roger Ailes. (Laughter)
04:30 - 05:00 He founded Fox News in 1996. More than 20 women complained
about sexual harassment. They said they weren't allowed
to succeed at Fox News. He was ousted last year,
but we've seen recently that the problems have persisted. That begs the question: What should Fox News do
to turn over another leaf? Well, what if they replaced
their hiring process with a machine-learning algorithm? That sounds good, right? Think about it.
05:00 - 05:30 The data, what would the data be? A reasonable choice would be the last
21 years of applications to Fox News. Reasonable. What about the definition of success? Reasonable choice would be, well, who is successful at Fox News? I guess someone who, say,
stayed there for four years and was promoted at least once. Sounds reasonable. And then the algorithm would be trained. It would be trained to look for people
to learn what led to success, what kind of applications
historically led to success
05:30 - 06:00 by that definition. Now think about what would happen if we applied that
to a current pool of applicants. It would filter out women because they do not look like people
who were successful in the past. Algorithms don't make things fair if you just blithely,
blindly apply algorithms. They don't make things fair. They repeat our past practices,
06:00 - 06:30 our patterns. They automate the status quo. That would be great
if we had a perfect world, but we don't. And I'll add that most companies
don't have embarrassing lawsuits, but the data scientists in those companies are told to follow the data, to focus on accuracy. Think about what that means. Because we all have bias,
it means they could be codifying sexism or any other kind of bigotry.
06:30 - 07:00 Thought experiment, because I like them: an entirely segregated society -- racially segregated, all towns,
all neighborhoods and where we send the police
only to the minority neighborhoods to look for crime. The arrest data would be very biased. What if, on top of that,
we found the data scientists and paid the data scientists to predict
where the next crime would occur? Minority neighborhood.
07:00 - 07:30 Or to predict who the next
criminal would be? A minority. The data scientists would brag
about how great and how accurate their model would be, and they'd be right. Now, reality isn't that drastic,
but we do have severe segregations in many cities and towns, and we have plenty of evidence of biased policing
and justice system data. And we actually do predict hotspots,
07:30 - 08:00 places where crimes will occur. And we do predict, in fact,
the individual criminality, the criminality of individuals. The news organization ProPublica
recently looked into one of those "recidivism risk" algorithms, as they're called, being used in Florida
during sentencing by judges. Bernard, on the left, the black man,
was scored a 10 out of 10. Dylan, on the right, 3 out of 10. 10 out of 10, high risk.
3 out of 10, low risk.
08:00 - 08:30 They were both brought in
for drug possession. They both had records, but Dylan had a felony but Bernard didn't. This matters, because
the higher score you are, the more likely you're being given
a longer sentence. What's going on? Data laundering. It's a process by which
technologists hide ugly truths inside black box algorithms and call them objective;
08:30 - 09:00 call them meritocratic. When they're secret,
important and destructive, I've coined a term for these algorithms: "weapons of math destruction." (Laughter) (Applause) They're everywhere,
and it's not a mistake. These are private companies
building private algorithms for private ends. Even the ones I talked about
for teachers and the public police, those were built by private companies
09:00 - 09:30 and sold to the government institutions. They call it their "secret sauce" -- that's why they can't tell us about it. It's also private power. They are profiting from wielding
the authority of the inscrutable. Now you might think,
since all this stuff is private and there's competition, maybe the free market
will solve this problem. It won't. There's a lot of money
to be made in unfairness. Also, we're not economic rational agents.
09:30 - 10:00 We all are biased. We're all racist and bigoted
in ways that we wish we weren't, in ways that we don't even know. We know this, though, in aggregate, because sociologists
have consistently demonstrated this with these experiments they build, where they send a bunch
of applications to jobs out, equally qualified but some
have white-sounding names and some have black-sounding names, and it's always disappointing,
the results -- always. So we are the ones that are biased,
10:00 - 10:30 and we are injecting those biases
into the algorithms by choosing what data to collect, like I chose not to think
about ramen noodles -- I decided it was irrelevant. But by trusting the data that's actually
picking up on past practices and by choosing the definition of success, how can we expect the algorithms
to emerge unscathed? We can't. We have to check them. We have to check them for fairness. The good news is,
we can check them for fairness.
10:30 - 11:00 Algorithms can be interrogated, and they will tell us
the truth every time. And we can fix them.
We can make them better. I call this an algorithmic audit, and I'll walk you through it. First, data integrity check. For the recidivism risk
algorithm I talked about, a data integrity check would mean
we'd have to come to terms with the fact that in the US, whites and blacks
smoke pot at the same rate but blacks are far more likely
to be arrested -- four or five times more likely,
depending on the area.
11:00 - 11:30 What is that bias looking like
in other crime categories, and how do we account for it? Second, we should think about
the definition of success, audit that. Remember -- with the hiring
algorithm? We talked about it. Someone who stays for four years
and is promoted once? Well, that is a successful employee, but it's also an employee
that is supported by their culture. That said, it can also be quite biased. We need to separate those two things. We should look to
the blind orchestra audition
11:30 - 12:00 as an example. That's where the people auditioning
are behind a sheet. What I want to think about there is the people who are listening
have decided what's important and they've decided what's not important, and they're not getting
distracted by that. When the blind orchestra
auditions started, the number of women in orchestras
went up by a factor of five. Next, we have to consider accuracy. This is where the value-added model
for teachers would fail immediately. No algorithm is perfect, of course,
12:00 - 12:30 so we have to consider
the errors of every algorithm. How often are there errors,
and for whom does this model fail? What is the cost of that failure? And finally, we have to consider the long-term effects of algorithms, the feedback loops that they are engendering. That sounds abstract, but imagine if Facebook engineers
had considered that before they decided to show us
only things that our friends had posted.
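To make the accuracy step concrete, here is a small illustrative check (an editorial sketch, not part of the talk; the data and field names are invented) that asks "for whom does this model fail?" by tabulating error rates separately for each group:

```python
# Rough sketch of a per-group error check for an audit. All data is invented.
def error_rates(records):
    """records: list of dicts with 'group', 'predicted_high_risk', 'reoffended'."""
    stats = {}
    for r in records:
        g = stats.setdefault(r["group"], {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
        if r["reoffended"]:
            g["pos"] += 1
            g["fn"] += not r["predicted_high_risk"]   # missed a real reoffender
        else:
            g["neg"] += 1
            g["fp"] += r["predicted_high_risk"]       # flagged someone who did not reoffend
    return {grp: {"false_positive_rate": s["fp"] / max(s["neg"], 1),
                  "false_negative_rate": s["fn"] / max(s["pos"], 1)}
            for grp, s in stats.items()}

sample = [
    {"group": "A", "predicted_high_risk": True,  "reoffended": False},
    {"group": "A", "predicted_high_risk": True,  "reoffended": True},
    {"group": "B", "predicted_high_risk": False, "reoffended": False},
    {"group": "B", "predicted_high_risk": False, "reoffended": True},
]
print(error_rates(sample))  # unequal error rates across groups are what an audit should surface
```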
12:30 - 13:00 I have two more messages,
one for the data scientists out there. Data scientists: we should
not be the arbiters of truth. We should be translators
of ethical discussions that happen in larger society. (Applause) And the rest of you, the non-data scientists: this is not a math test. This is a political fight. We need to demand accountability
for our algorithmic overlords.
13:00 - 13:30 (Applause) The era of blind faith
in big data must end. Thank you very much. (Applause)