FutureLaw 2024 - Generative AI and Intellectual Property
Summary
The FutureLaw 2024 panel at Stanford Law School discussed the implications of generative AI on intellectual property (IP). Led by Professor Lemley, various experts, including Angela Dunning and Max Sills, debated the legality of training AI models using copyrighted data. Key issues included whether AI training constitutes fair use and the potential for licensing markets. The discussion emphasized the complexity of balancing innovation with creative rights, reflecting diverse legal and emotional perspectives. While fair use and transformative use were central themes, the panel acknowledged that legal outcomes are unpredictable due to evolving technological and societal contexts.
Highlights
Professor Mark Lemley moderated the panel at Stanford Law. 🎓
Discussion focused on the law of AI rather than AI for law. ⚖️
Experts debated if AI training is fair use or needs licensing. 🤔
Training using copyrighted works poses existential questions for AI. 🧠
Potential licensing schemes for AI training are complicated. 💼
Key Takeaways
Generative AI is reshaping the legal landscape of intellectual property! 🤖
The distinction between AI training and outputs is crucial. 🎓
Fair use and transformative use are key legal concepts under scrutiny. ⚖️
Emotions around AI and copyright vary from excitement to concern. 😅
The potential for licensing AI training data is being explored but is complex. 🔍
Overview
The FutureLaw 2024 session at Stanford Law School brought together experts to discuss a hot topic: generative AI and its implications for intellectual property. With Professor Mark Lemley at the helm, the panel explored the legality and ethical considerations of AI learning from copyrighted materials. Does it fall under fair use? Or is there a need for new licensing frameworks? These questions were at the heart of the discussion, reflecting the complex interplay between law, innovation, and creativity.
One of the key discussions was the separation between AI training and the resulting outputs. It's a nuanced distinction that's crucial in legal debates, especially regarding fair use and transformative use. Panelists shared differing opinions, from those who see AI's ability to learn as a natural extension of human creativity, to those worried about the economic implications and potential misuse of intellectual property.
The conversation also touched on emotional and societal perspectives, illustrating how people’s feelings about AI's role in IP range from innovative excitement to protective concern. As industries grapple with these changes, the possibility of licensing training data emerges, though it presents logistical challenges. Ultimately, the panel underscored that the outcomes of these legal battles are unpredictable, significantly influenced by technological advancements and cultural shifts.
Chapters
00:00 - 00:30: Introduction and Overview The chapter introduces a panel discussion on Generative AI (Gen AI) and Intellectual Property (IP), moderated by Professor Lemley. It humorously notes that this is Professor Lemley's appearance outside of his 'Jedi outfit'. The initial remarks set a light-hearted tone, jesting about the AI's potential uses, including creating a picture of Professor Lemley as a Jedi.
00:30 - 01:00: Moderator Introduction - Professor Lemley The chapter titled 'Moderator Introduction - Professor Lemley' introduces Professor Lemley, who is the Faculty Director of the Law, Science, and Technology program. The speaker humorously mentions seeing actual pictures of themselves in a Jedi outfit, joking that AI is not needed for that. Professor Lemley is described as the speaker's boss at the Law School, and is noted to be one of the most published and cited authors, not only in intellectual property (IP) law but in all areas of legal academia.
01:00 - 02:00: Panel Introduction The chapter titled 'Panel Introduction' introduces the panel's focus on Generative AI and Intellectual Property. The speaker highlights a prominent figure in legal technology, who is a creator of a legal tech project that evolved into the well-known startup Lex Machina. This individual will be leading the discussion for the panel. The audience is invited to welcome the panel and its leader, Mark.
02:00 - 03:00: Panel Focus: Law of AI The chapter titled 'Panel Focus: Law of AI' begins with the moderator addressing the audience, expressing gratitude, and setting the stage for the panel discussion. The panel is unique in the context of the conference as it shifts the focus from AI for law to the law governing AI. The panelists are introduced and there is an emphasis on the significance of this discussion because it addresses key questions that will influence the future and legal landscape of AI technologies like ChatGPT and other similar tools.
03:00 - 04:00: Issues in AI Ownership and Control The chapter discusses the complex issues surrounding ownership and control of artificial intelligence (AI). Legal considerations are highlighted as critical factors that will influence the development and use of AI, particularly in legal contexts. The chapter emphasizes that current legal challenges related to AI will significantly impact its future development and application in society. Mark Lemley, a notable figure in the legal field, is mentioned, indicating the discussion draws on expert insights to explore these issues.
04:00 - 06:00: Introduction of Panelists The chapter 'Introduction of Panelists' begins with panelists being introduced, highlighting their professional backgrounds and areas of expertise. A lawyer from Lex Lumina mentions litigating some relevant cases to be discussed. Angela Dunning self-introduces as a litigation partner at Cleary in Palo Alto, specializing in copyright, trademark, false advertising, and right of publicity cases. She has been practicing for approximately 25 years in the Valley and also engages in teaching.
06:00 - 08:00: Overview of AI-related Litigation The chapter discusses AI-related litigation, specifically focusing on copyright class action lawsuits linked to the development, training, and output of generative AI tools. The narrator shares their background in trademark law and expresses enthusiasm about discussing these matters with esteemed colleagues.
08:00 - 12:00: Fair Use in AI Training This chapter features a discussion with Max Sills, who is the General Counsel of Midjourney and runs Open Advisory Services, a consulting firm for AI startups. It also includes Danielle Van Leer, a former Senior Assistant General Counsel at SAG-AFTRA, where she worked on legal issues concerning AI, performer rights, name, image, and likeness rights, publicity rights, intellectual property, and privacy. The conversation touches upon the complex legal landscape of AI, especially regarding contracts and compliance matters that do not easily fit into collective bargaining frameworks.
12:00 - 16:00: Fair Use Defense and Case Study The chapter titled 'Fair Use Defense and Case Study' begins with a character reflecting on their career, contemplating a change in direction after a strike disrupted their field. The narrative introduces Paul Goldstein, a long-serving faculty member at an academic institution, specializing in teaching copyright law, particularly focusing on international intellectual property. Goldstein shares insights from his extensive experience dating back to 1975, offering a perspective that predates many in his audience.
16:00 - 20:00: Transformative Use and Copyright Challenges This chapter discusses the ongoing legal battles concerning the use of copyrighted material in AI training, with a focus on the numerous lawsuits that address these issues in the United States and internationally. The legal landscape is complex, with over 20 active lawsuits in the US alone. This indicates the significant legal challenges faced by AI developers in normalizing the use of copyrighted content for machine learning purposes.
20:00 - 26:00: Licensing and Market Effects This chapter discusses the early stages of creating generative AI, focusing on the training process and the construction of datasets. It highlights the immense volume of content required for these datasets to understand language, concepts, and image relationships. Additionally, the chapter touches on the prevalent issue of copyright, noting that nearly everything in today's world is protected by copyright laws.
26:00 - 30:00: International Regulation and Compliance This chapter discusses the legality of training generative AI databases, specifically focusing on the aspects of copyright law. It highlights notable exceptions in the law and questions whether it is legal to use copyrighted material in training datasets. The speaker Angela suggests that while it would be ideal to declare it legal unambiguously, there remain challenges and complexities to navigate within this legal sphere.
30:00 - 35:20: Economic Realities of Licensing This chapter discusses the ongoing legal challenges related to the use of licensing in the context of training AI models. The central legal question being addressed is whether the act of making copies for AI training can be considered fair use. Although many lawsuits have been filed, few have directly put this fair use question to the court. The resolution of these cases is expected to hinge on this crucial issue. As it stands, the definitive legal determination on this matter is still forthcoming.
35:20 - 43:00: Output Infringement The chapter titled 'Output Infringement' discusses a legal case involving music publishers seeking a preliminary injunction against Anthropic. The concern is over Anthropic's Claude model being trained on content owned by the music publishers, aiming to prevent further training that infringes on their rights.
43:00 - 46:00: Future of AI and Legal Implications The chapter explores the future of Artificial Intelligence and its legal ramifications. It delves into a case involving plaintiffs where the central legal question revolves around the 'fair use' doctrine and its application. The discussion highlights the importance of establishing whether plaintiffs are likely to succeed on the merits of their claims, which is crucial for determining the provision of injunctive relief. Both sides have presented arguments regarding the legality and implications of training AI on certain content, making the issue of 'fair use' a pivotal point in this legal debate.
46:00 - 47:00: Closing and Audience Q&A The chapter covers the closing remarks and an audience Q&A session. It highlights a legal case involving former Arkansas Governor Mike Huckabee and other writers in the Southern District of New York. The case involved multiple parties, most of whom were either dismissed or transferred out, leaving Bloomberg as the remaining party.
FutureLaw 2024 - Generative AI and Intellectual Property Transcription
00:00 - 00:30 Alright, this is a highly anticipated
panel on Gen AI and IP. We're continuing with our next session, which will be moderated
by our very own Professor Lemley. This is how, this is how he looks when he's not wearing his
Jedi outfit. I, I, I have to say all the uses of AI that could be made and a picture of
me in a Jedi outfit, it seems particularly
00:30 - 01:00 useless because you can find actual pictures
of me in a Jedi outfit with illegal eyes.
You don't need AI for that. Anyway. So please
come on in. And so, and Professor Lemley is of course the Faculty Director of our program
in Law, Science, and Technology. He's my boss here at the Law School, so I have to be on my
best behavior here today. But yeah, he's not only one of the most published and cited
authors in IP, but in all of legal academia.
01:00 - 01:30 And he's also a legal tech innovator. He created
a legal tech project here several years ago, which became a startup that we all know Lex
Machina. And he will be running this Gen AI and IP panel here for us today. So please join me
in welcoming our panel and over to you, Mark.
01:30 - 02:00 All right. Thanks everybody. So I'm going to,
let me just, I want to ask the panelists to self introduce and in a minute, but just this panel
is a little different than everything else at the conference. We're going to talk about the
law of AI rather than AI for law. And in part that's because it's kind of an interesting
question, but in part because I think it's a question that is going to determine whether
we have things like ChatGPT and other tools
02:00 - 02:30 that can be used to do these things, who owns
them, who controls them and how they work.
And so a lot of the sort of legal issues that are
going on right now are, I think, going to significantly impact the way AI develops
and how it can or can't be used for law. So I am, as Roland said, Mark Lemley here at the
law school. I will note that I am also a practicing
02:30 - 03:00 lawyer at Lex Lumina, right, where I am litigating
some of the cases that we are gonna talk about and we'll mention that as relevant and let me ask just
going down the line, Angela, to self introduce.
Hi everyone, I'm Angela Dunning. I'm a litigation
partner at Cleary based just down the street in Palo Alto. I've been practicing copyright and
trademark, false advertising, right of publicity cases for the better part of about a quarter
century now here in the Valley. I also teach
03:00 - 03:30 trademark law at a little law school across the
bay that won't be mentioned in this environment.
But I, like Mark, am litigating several of
the copyright class action lawsuits that have been directed to the development, training,
and output of generative AI tools. And I'm excited to be here to talk to you about
that with my esteemed co panelists. Hi,
03:30 - 04:00 I'm Max Sills. I'm the General Counsel of
Midjourney, and I also run Open Advisory Services, which is an advisory practice for AI startups.
I'm Danielle Van Leer, until about a month ago, I was a Senior Assistant GC for Contracts and
Compliance at SAG-AFTRA, where I worked on AI and performer rights issues, name, image,
likeness rights, rights of publicity, IP, privacy, all kinds of stuff. Whatever kind
of didn't fit into the collective bargaining
04:00 - 04:30 hopper kind of fell in my area. Now I'm trying
to figure out what I want to be when I grow up because I needed a change after the strike.
I'm Paul Goldstein. I've been on the faculty here since 1975 before some of you were
born. And teaching principally copyright and generally around intellectual property,
but mostly copyright and international and
04:30 - 05:00 comparative copyright. Like Angela, like Mark,
I am involved in some of the litigation that surrounds training, AI training as well.
Great. So, I want to start with that set of litigation. There are, at my last count,
20 lawsuits going on in the United States as well as several in other countries. And
most of those lawsuits are focused right now
05:00 - 05:30 at least on the sort of early stage creation
of generative AI: the training, the building and use of a data set to train generative AI.
Those data sets of course take enormous amounts of content, to try to learn how language works, to
try to learn concepts and image relationships.
And because almost everything in the world is
copyrighted in the modern era almost everything
05:30 - 06:00 that goes into a training dataset is copyrighted.
There are some notable exceptions, particularly in law that we'll talk about. So where are we
with the sort of fundamental question, right?
Is it even legal to train a generative
AI database? Angela? Well, I'd love to declare that it is and have that be the end of
it. But I think that we've got a little ways
06:00 - 06:30 to go. So as Mark said, there are an awful lot of
lawsuits. To date, very few of them have actually put the question to the court yet directly with
respect to whether the training of an AI model constitutes fair use in connection with the
making of copies for purposes of training.
That issue will likely be the decisive factor
in most of these cases. But we are a ways from
06:30 - 07:00 getting a ruling. I would just highlight a
couple of cases that are out in front. One is the case filed by music publishers against
Anthropic in which a preliminary injunction has been sought to block the further future training
of Anthropic's Claude model on content owned by
07:00 - 07:30 those plaintiffs. And in that case the fair
use question has been briefed in connection with the inquiry into whether plaintiffs are
likely to succeed on the merits of their claim, which is a key factor in determining whether
injunctive relief should be granted. And there, there are arguments that have been put in both
ways on why the training on that content is
07:30 - 08:00 fair or not but there is not yet a ruling.
And then I would also just point out to the room the case that was filed by former Arkansas
Governor Mike Huckabee among a class of writers in the Southern District of New York. That case
originally was filed against a number of parties. All have been transferred out or dismissed except
Bloomberg. And in Bloomberg's motion to dismiss,
08:00 - 08:30 which is not yet fully briefed, they have sought
dismissal at the pleading stage on fair use grounds arguing that the tool they developed has
never been commercially released and has never been used by anyone, hasn't generated any revenue.
It was a research tool in and of itself, and a research tool deployed by a news organization.
And so, therefore, on the papers it falls very squarely within what fair use was intended to protect.
So we're watching those cases and it'll be some
08:30 - 09:00 time before fair use issues properly bubble up
in the other cases that we're working on. Yeah, and so just to note on the procedure there,
right? I mean, that's because these cases are all at the motion to dismiss stage. Fair use is
a defense, so it's not part of the pleading.
So you've got to wait until the case is at issue.
What we've seen, I think, in the cases is the whittling out of a lot of sort of ancillary
theories of liability right? So the courts
09:00 - 09:30 have pretty much across the board rejected claims
that you are violating Section 1202 of the Digital Millennium Copyright Act by removing copyright
management information in the training data set.
They have across the board rejected the theory
that the model itself is somehow a derivative work of all of the billions of works that
went into training it. Some of the state law cases have gone away. But, so let's talk
about the sort of heart of the fair use question,
09:30 - 10:00 right? Is there, uh, I mean, there is
a lot of copying going on here, right?
I mean, these models are built on a database,
right? That trains maybe on Common Crawl, maybe on the LAION image database, right? But that are
billions of different copyrighted works. That at least at the outset, right, in their entirety
go into the database. Is that fair use?
10:00 - 10:30 Why? Or why not? And Angela, you're
welcome to jump in, but anybody's welcome to jump in with thoughts on this.
I'll start us off. I, I think fundamentally it has to be. What we as human beings have done from
the dawn of time is ingested what knowledge exists in the world. Through the reading of books,
the viewing of art, the general perception
10:30 - 11:00 of language and all forms of learning. We take
that information, we then think on it, iterate on it in our minds and produce new content.
Obviously if the content we produce is substantially similar in protected expression to
somebody else's book or somebody else's artwork, then that may raise a real copyright
concern. But the learning aspect is not just permissible. That's the whole point of
the Copyright Act. That's the whole point of
11:00 - 11:30 the constitutional provision that guarantees
this limited copyright monopoly for purposes of ensuring that there is a promotion of
the development of the arts and sciences.
We want information in the form of literature,
text, data, music, art, to be made available to all so that it can be learned from and developed
further. And at the training stage in, when you're talking about an AI model, you're, you're talking
about the making of copies, not so those copies
11:30 - 12:00 can then be reproduced, which may raise issues,
but so that you can take the information from those copies, whether it's art or literature
or prose, figure out how language works, figure out what a cat is from its spatial dimensions so
that the AI can produce new content the same way that humans can. Maybe just to add, I completely
agree, just to add to that I think it's a good
12:00 - 12:30 time to just check in on how we're feeling. So I
think that people are overloading copyright with a lot of feelings that, that we have around AI.
So one issue that's going on is we're having another period of industrialization and
automation. And people are upset because it seems like the monetary gains of that
are accruing to a small number of people. That's a separate question from what is current
copyright law. There's also the idea of what,
12:30 - 13:00 what do we want copyright law to be?
So just to underline the point Angela was making I think if you feel upset, and you want
money, it makes sense to advance a theory that you own a property interest in something. But,
the whole reason we gave people those property interests was to advance culture. Advance
society. And, I think we might be at a point where we're forgetting the merger doctrine.
Forgetting the idea expression dichotomy.
13:00 - 13:30 Because we're confusing feelings of upset
about the economic gains of automation with why we have this other law over here. So
I kind of think that asking whether, under current law, it is fair use is a boring question
with a boring answer. It's yes, clearly. So I think we should be prepping ourselves
for like, what do we really want to say?
What direction do we want society to move in?
We probably want society to maximize creative
13:30 - 14:00 expression rather than let everyone assert a
property interest over every idea they have.
Alright, I'm going to disagree and, you know.
And for the feelings perspective, I actually, I, one thing I didn't mention is that I'm a
screenwriter and photographer on the side. So, I, you know, I do have an interest on kind
of on all sides. I do use, you know, ChatGPT, for instance, in kind of ideating things, just
trying to help with writer's block. One of
14:00 - 14:30 the guys I play D&D with uses Midjourney
all the time to create really cool photos. But, you know, there's a, the flip side of that is one
is like where the training data is coming from.
I mean, these are like, a lot of this training
data is coming from being harvested off the internet that people didn't necessarily,
you know, anticipate it would be used and maybe they would have kept their IP private if
they, if that was the case. I mean, certainly I have photographs that I wouldn't have posted
publicly if I knew that that was going to happen.
14:30 - 15:00 I'm not even thrilled that I have photos up on
Getty, iStock and Shutterstock. I'm not thrilled that they licensed my photos for training data.
But I think it, you know, I think there is a, it depends context here too because it depends
on the, the particular algorithm, right? I mean, if you look at, say, going back to the face swap
videos, where they are literally taking clips from films and repurposing those as the output.
Is that training data fair use when the purpose
15:00 - 15:30 is to reproduce that exact clip
or those exact people? You know, I do think. I think this is something that,
I am a proponent of very strong copyright, but I also have friends that, like I said, have
used Midjourney for stuff. I played around with it once to create my D&D character picture.
You know, her, my little avatar. So, you know, I think there's a
middle ground here that we're not finding it right
15:30 - 16:00 now because these are all shaking out everything
is at the far ends of the extreme and we're not seeing many middle ground positions right now, but
but I do think it depends on the intent behind the training in large part when you start looking at
the four factors. And it, it does depend on — I don't know that everybody here even knows
the — are there folks who don't know the U.S. four factors of fair use, or four-plus
factors? I'm sure the answer to that is yes;
16:00 - 16:30 it's not a copyright crime. So, you know,
under U.S. copyright law, the fair use defense entails balancing four
factors, but there's some question, depending who you ask, if it's exclusively four factors or if it's
more. I'm on the side that advocates that it is more than just the four factors. And I always
screw them up when I'm just speaking off the cuff, but it's the, the nature and purpose of the use.
Like, so are you using it for for-profit purposes?
16:30 - 17:00 So, you know, are they training the algorithm to
make money? The type of work that's being trained on, and that can be whether it's a creative work
or a, like, factual work. That gets to the idea-expression dichotomy. The, you know, the amount
used — essentially, like I said, I always screw it up when I'm thinking off the cuff here — the
amount of the work that you're using: are you training it on the whole book, the whole
movie, or are you training it on snippets?
17:00 - 17:30 And also on the, the effect on the market
for that work. So if, you know, and that can include taking away licensing revenue. So
we do have to keep all that in mind. I mean, that was intended to balance, you know, the
common law approach to limiting copyrights, so I think you know, not to mention when you
start looking internationally, you get into a whole other mess, where the Berne Convention,
which most countries, most major countries, are
17:30 - 18:00 signatories to, has more restrictions,
and I think we're starting to see some interesting things happen internationally.
I believe Japan is allowing training on data, and the EU is adopting a notice-and-opt-out
process. So you know, I think we can't look at it just with a U.S. lens. We need to also
look at it in an international context. Yeah, just building on Max's observation about feelings
and, and Danielle's earlier beginning points.
18:00 - 18:30 I think there are feelings in conflict here on
the one side. Yeah. You have the feeling that we should be advancing research. The Constitution
authorizes copyright to promote the progress of science and the useful arts, and certainly
that's what training is doing. On the other hand, there is a feeling, certainly in the creative
community of what my colleague in the political
18:30 - 19:00 science department has characterized in
this context as a need for reparation.
And it's interesting to take the reparation
discussion and plant it squarely in the consciousness of people who feel that they've
been ripped off, and whatever the technicalities of the law may be, deserve to be compensated
in some way. I think that's the feelings that, that are in conflict. On the technicalities
of, of fair use you know, Mark said, well
19:00 - 19:30 there's just a lot of stuff that we're copying.
Yeah, the 20 million books that Google digitized for the Google Books project was also a lot of
copying and was held to be fair use. I think when the fair use question is addressed in the
ongoing litigation, the question will be does the rule of the Google Books case from the Second
Circuit apply here. Most of the litigation is
19:30 - 20:00 in the Ninth Circuit, although there is in, in
the second and I imagine that what courts might have a close eye on is that the second circuit
decision in Google Books turned on the notion of transformative use. The opinion was written
by a judge who invented transformative use.
Transformative use has come under subsequent
review in the Warhol case, a quite different
20:00 - 20:30 context, but it leads to some uncertainty
about what does transformative use mean today, post Warhol, or outside the Second Circuit.
Yeah, so I'd love to just build on a couple of points that were made and Danielle
and I have consensus on a number of things we, we typically find actually.
I mean, I think it's really important in
20:30 - 21:00 setting the stage that we distinguish between the
outputs of these models and the act of training, right? So again, any particular output that
is generated from a model. If you put that up against an original work on which it was trained,
or, or any copyrighted work for that matter, and it is substantially similar in protected
expression and copied from the original, there may very well be a copyright problem.
And we're going to turn to that issue next. Yeah,
21:00 - 21:30 so I'll, I'll skip that. That's a contextual
analysis of that particular output. What we're talking about in this question is just the act
of training an AI tool to be able to generate language or to be able to generate images
of any kind. Right. If I want to create an image of a cat on a surfboard in Venezuela,
that image probably doesn't exist anywhere.
It's not infringing anybody's copyright, but
without a tool that's been trained on lots and lots and lots of images. There's no way of
doing that. So, first setting the stage there,
21:30 - 22:00 I think we're talking just about training. Then
I think it's important to put in context this, the concept generally of fair use
and this idea expression dichotomy.
So, the idea expression dichotomy is
just the basic rule that nobody can own an idea. Nobody can own a concept. Nobody
can own information. Why not? Because we want everybody to be able to use those ideas and
concepts and information to create new works, to write about them, to expand knowledge.
So what we protect under copyright law is
22:00 - 22:30 just the particular expression of that idea, the
actual words used, the actual image created, the particular notes and rhythms that create a song.
And so separating those two things is important because again, everybody is free to take the
idea, the Supreme Court has said absolutely, definitively, over and over. You are allowed to
use the ideas and concepts from other people's work. That's what all knowledge is based
on. So then the style comes up, right?
22:30 - 23:00 In the context of these cases, there are
artists and writers who say, you're taking my style. But style's an idea. Nobody owns a
style either. I can go to a museum and do my darndest to emulate Picasso's style. Now, I'll
never be as good as Picasso. I may not be able to compete with him. But I want to make a
thousand works in the style of Picasso.
So long as I'm not taking his expression.
Copyright law permits that. And so in the
23:00 - 23:30 context of training, what we're talking
about is ingesting copies of works. So that you can take information from those
works about how language works, syntax, structure, how often words appear next to
other words. This kind of information is arguably not even protected by copyright, and
if what you're doing is taking that for the purposes of creating a completely new ability to
generate new language and content, I would argue that is quintessentially transformative
and exactly what the courts have held,
23:30 - 24:00 including in the Google Books case, is permitted.
So, so the— Well, go ahead. Yeah, sorry, Mark. That's, that's why I kind of
said we need to look also at the intent, right? Because I think if you were to take the
scripts from all the Star Wars movies to train an AI with the purpose of creating —
I mean, you know, the same person creating, or even just a company doing it — training an
AI with all the Star Wars scripts for the purpose of creating Star Wars content.
And I think, you know, you start getting to
24:00 - 24:30 that intent and I think you do obviate the
fair use. So I do think that is going to potentially. So this, I mean, both of these
comments to me sort of get to the question, get to the problem that is actually really
hard to get people to focus on the distinction between the training and the output, right?
So to me, right, if I, if I decide to train only on Star Wars content I don't think there's
anything inherently problematic about that. Except that it's almost certainly going to give you
output that is substantially similar to Star
24:30 - 25:00 Wars content. And if that's a, if that's what
I'm going to do, then I think we're going to be in a different and more challenging problem.
Right. But can I add one more thing? I'm sorry, Mark. Even the creation of identical
works, even with that intent, that doesn't mean that it isn't fair use. You
know, every time I go to photocopy something, I'm making an identical work. But if I'm doing
that for purposes of scholarship, research, teaching, criticism, that may very well be fair.
If I'm a brand owner, I may very well want to upload my content for purposes of… But
again, we are wandering past training to
25:00 - 25:30 content. I want to talk about that, but I want
to close the loop on training. Right? But the very fact that it is hard to separate
these two, I think is really important.
Right? Because this is what we're seeing happen
in the, in the, in the lawsuits. Right? Why are people upset about training? Maybe they're upset
about training because they sort of conceptually don't like the idea that a machine might learn
from their work. Even if the machine is gonna produce cats surfing in Venezuela, right,
that has nothing to do with their work.
25:30 - 26:00 That's a kind of weird objection though,
right? I, and when you get down to it, I think the objections mostly end up turning
out to be I'm afraid, either one of two forms, right? One is I'm afraid that the output will be
too similar, right? That you will in fact sort of end up copying my work in the output.
Right? And that's an issue we're going to turn to in a minute, right? Or the objection is,
right, I'm afraid of competition from something that isn't infringing my work, right? And that's
an objection and it goes a little bit to Max's point, right? But it's not an
objection copyright law cares about.
26:00 - 26:30 So let me just sort of close the loop
on training with two points to make, right? One is the reason that the lawsuits
so far are mostly focused on training and not output is that training is an
existential question for AI. It's an existential question because of the way we have
structured copyright's remedies regime, right?
Copyright has a statutory damages provision.
So for every registered work if you show infringement, you can get not just the actual
harm you suffered you can get a minimum of $750
26:30 - 27:00 per work and a maximum of $150,000 per work,
depending on intent. If in fact training on 2 billion images is an act of infringement of all of
those 2 billion images then even assuming we pick the minimum threshold, we don't assume, you know,
we pick the minimum statutory damages that we are required by law to give, right, we're at
$1.5 trillion in damages. Plaintiff's
27:00 - 27:30 class action lawyers can do that math,
right? And $1.5 trillion in damages sounds pretty good; even if you're not gonna
get $1.5 trillion, maybe there's a settlement, maybe there's a lot of money to be had here. And
so rather than the hey, sometimes occasionally there is an output that's substantially similar
and that's infringing, we want the big hit right?
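(To make that statutory damages arithmetic concrete, here is a minimal sketch. The work count is the panel's hypothetical, and the per-work figures are the range the panel cites under 17 U.S.C. § 504(c); actual awards depend on registration, willfulness, and the court.)

```python
# Back-of-the-envelope statutory damages exposure for a training set,
# using the panel's hypothetical of 2 billion registered, infringed works.
STATUTORY_MIN = 750        # statutory minimum per infringed registered work
STATUTORY_MAX = 150_000    # statutory maximum per work for willful infringement

def damages_range(num_works: int) -> tuple[int, int]:
    """Return (minimum, maximum) exposure if every work counts as infringed."""
    return num_works * STATUTORY_MIN, num_works * STATUTORY_MAX

low, high = damages_range(2_000_000_000)
print(f"${low:,} to ${high:,}")  # $1,500,000,000,000 to $300,000,000,000,000
```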
We want to say the whole thing is infringing
and we are entitled to damages for all of that. Whether that's true, of course, depends a little
bit on the issue we've also talked about, right,
27:30 - 28:00 which is transformative use, but it also depends
on another question, the fourth factor of fair use, right, which is the market effect.
Now traditionally we think of the market effect as is my work substituting for yours,
right? Are people buying my copy of this song rather than your copy of this song? If
so, that's unlikely to be a fair use. But as Paul previously mentioned right, one of
the things we think about in market effect is also the possibility of a licensing market.
And so one of the things that I think is an
28:00 - 28:30 unsettled question in the fair
use inquiry is: will we see, can we see, a market for licensing training data?
Is this a feasible thing? And Paul, I'd like, love to hear your thoughts on sort of, is the
world going to end up not with lawsuits and statutory damages, but with a kind of, like—
I think the answer to that Mark is yes, but it is a highly qualified yes. I think that however the
current raft of lawsuits resolve themselves there
28:30 - 29:00 is going to be licensing going forward. There is
a very substantial chance that there will be, in part because of this instinct about reparation.
But also in part stuff that is happening in Europe right now, and Danielle properly, properly alluded
to foreign activities as being very important. And
29:00 - 29:30 I'll get to that at the end; it's the fourth of
my licensing possibilities. But there are four ways in which licensing of training activities,
exclusively training activities could go forward.
One is the traditional negotiated two party
license. There is some precedent for that. I think one of the early ones was OpenAI's
license with the Associated Press and there
29:30 - 30:00 are others going forward. The problem with
that is scalability and transaction costs. There are just too many licenses to be negotiated.
There is a variable factor there, though. There was a really ill conceived bill introduced in
the Congress this week by Congressman Schiff that would impose a duty of transparency on platforms
that are doing training and require them to list
30:00 - 30:30 all copyrighted works that they've trained on.
It's — happily, it's a very short bill. But it is just totally ill-conceived. And
among other things, the sanction for noncompliance is a one-time fine of $5,000, which I —
well, let's put that to the side. I think individual negotiated licensing is probably a
nonstarter. The second and third are collective
30:30 - 31:00 licensing and compulsory licensing.
Both topics that the Copyright Office asked for input on in their current notice of
inquiry, and we will be getting a report from them, I think, starting in a couple of months. And there
will be a series of reports. In the responses to that, for collective licensing there was some
support, among authors' groups particularly,
and that is, if you think of ASCAP, BMI and SESAC in the music area, having collective licensing
and the music area of having collective licensing by collective management organizations, CMOs.
It got some support. The problem in the U.S. with collective licensing as a solution to transaction
costs is that the U.S. has four CMOs for musical
31:30 - 32:00 performance rights and virtually no CMOs for
all the other kind of content that is subject to copyright. You contrast that with Europe and
Latin America and in Asia, where there is a single CMO for each — at least one, but typically one
CMO for each area, for photography, for visual arts generally, for writing and so the thought of
getting those collecting organizations in place in
32:00 - 32:30 the U.S. is certainly problematic. There was less
support for compulsory licensing, and for good reason. The major proponents of it were
student groups, one from the University of Hawaii, and the other from Brooklyn Law School.
The two student groups liked compulsory licensing, nobody else seemed to care for it.
And again, for good reason compulsory
32:30 - 33:00 licenses are frowned upon as Danielle alluded to,
there is the Berne Convention, which in Article 9.2 puts limits on a country's
ability to subject normal free-market licenses to compulsory licensing. The fourth kind of licensing
and the one that I think is most likely to come into place, and to do so within the
next two years, is automated licensing.
33:00 - 33:30 Metadata attached to individual works that will
communicate with the platform prior to training saying, I don't want to be copied. I will agree
to be copied, but I need to be compensated in this amount. Or, or, you know, go ahead and
copy, but give me this other consideration. I'm sure many of you recognize that as
Content ID, which makes YouTube possible
33:30 - 34:00 within a world of safe harbors where
otherwise they're subject to notice and takedown. I think that's likely for
a couple of reasons. It has precedent, the Content ID precedent. It's an elegant, low-
cost solution. And probably the most compelling reason is we're not gonna have any choice.
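(As an illustration only: a minimal sketch of what such machine-readable licensing metadata might look like in a training pipeline. The `TrainingPolicy` type and its fields are hypothetical, not any existing standard.)

```python
# Hypothetical sketch of metadata-driven training permissions, loosely analogous
# to YouTube's Content ID: each work carries a machine-readable reservation of
# rights that a training pipeline checks before ingesting the work.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainingPolicy:
    allow_training: bool          # "go ahead and copy" vs. "I don't want to be copied"
    fee_per_use: Optional[float]  # compensation demanded, in dollars (None = free)

def may_ingest(policy: TrainingPolicy, budget_per_work: float) -> bool:
    """Ingest a work only if its attached policy permits it within our budget."""
    if not policy.allow_training:
        return False  # an opt-out must be honored (cf. DSM Directive Art. 4(3))
    return policy.fee_per_use is None or policy.fee_per_use <= budget_per_work

# Example: an opted-out photo is skipped; a half-cent license fits a one-cent budget.
print(may_ingest(TrainingPolicy(False, None), 0.01))   # False
print(may_ingest(TrainingPolicy(True, 0.005), 0.01))   # True
```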
Some of you may be familiar with Article 4.3 of
34:00 - 34:30 the Digital Single Market Copyright Directive, which
carves out an exception for training but says that, in the event a rights holder
gives notice that it objects to the training, that notice must be honored. Well, Article 4.3,
the opt-out provision, applies only within —
34:30 - 35:00 applies only to copying, training, going
on in Europe, in the individual countries, under the principle of territoriality.
What's happened more recently, last month was the adoption in the European Union of the
AI Act which in Article 53C makes the Article
35:00 - 35:30 4.3 obligation an operational obligation across
the board. Not only for training that occurs within the European, the countries of the
European Union, but that occurs anywhere.
Let me just raise real quick: there's an obligation
on model providers to put in place a policy to comply with Union copyright law, Article 4.3, and in
particular to identify and comply with, including
35:30 - 36:00 through state-of-the-art technology, a reservation
of rights expressed pursuant to Article 4.3. Now that is going to apply extraterritorially.
It's going to apply to any platform that does its training anywhere, so long as they're
doing business in the European Union. It's akin in that respect to the GDPR, the General Data
Protection Regulation, which similarly imposes
36:00 - 36:30 an extraterritorial obligation respecting data
privacy on companies outside the European Union, if they are doing business in the European Union.
And that has had the effect among American companies that are in that business of
conforming their conduct, in part because of a vacuum of privacy law under U.S. law,
but in part because they need to do business in the European Union. This is the so-called
Brussels effect, and we will find ourselves,
36:30 - 37:00 I believe, when the AI Act comes into force it's
about two years and a couple of months from now.
In a position where that kind of compliance
will be required. And, and worth noting, there is no fair use doctrine in Europe. There
is no fair use doctrine in Europe. So, so, I, I want to say on the licensing issue, I mean, I,
I am troubled by this, right, because I think the economics are fundamentally different.
Then the economics of other places where
37:00 - 37:30 licensing is work, right? We have, licensing
for satellite broadcasts of transmissions, right? We have licensing for kind of covers
of songs, right? And that works because the thing I'm using is one, or maybe a couple of
individual copyrighted works. And so we know who the people we want to pay are and so forth.
I, I, but I struggle with sort of what it would mean to say we'll pay a fee for training
on two billion images selected from the
37:30 - 38:00 LAION database. So Stability AI, right — the whole
company's worth two billion dollars, maybe less right now after recent developments, right? But
even if we said, all right, you know what, we're gonna take half of the entire value of
the company and pay it in compulsory license fees to copyright owners, everybody gets fifty cents.
I don't think, when people think about compulsory licensing, they think,
I want my 50 cents. That's not $0.50 per use, that's $0.50, period, for training, right?
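(A quick sanity check of that arithmetic — a sketch using the panel's round numbers; the actual company valuation and dataset size are, of course, moving targets.)

```python
# Divide a hypothetical licensing pool across every work in the training set.
company_value = 2_000_000_000        # Stability AI's rough valuation, per the panel
licensing_pool = company_value / 2   # suppose half the company's value is paid out
num_works = 2_000_000_000            # images in the training set

per_work_payout = licensing_pool / num_works
print(f"${per_work_payout:.2f} per work, one time")  # $0.50 per work, one time
```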
And so what I worry about is that if in fact
38:00 - 38:30 we're in a world that Paul's talking about,
right, the what we're gonna see is a bunch of people who say, Sure, I'll license this.
my thing to be trained for $5,000, right? Or maybe $500, right? And that's just impractical, right? No
one can build a large training data set. There may be specialized ones, right? It may be that you
want to train on a few particular things. Music might be a great example of where people would
be willing to pay a certain amount of money.
38:30 - 39:00 Right. To train a music data set, although
that's often going to be because we want to generate things that seem very similar to
your to your song. And that might not be a very popular. So I worry a little bit that sort of the
practical effect of this is it's not going to be, we'll get a licensing scheme that works.
I just don't think the economics work for it. It's good. We'll get a bunch of people
who say. Right? Sure, I'll do it for money. And then we just opt out completely. Right? That's
a — the result will have to be: they're just,
39:00 - 39:30 we can't train on anyone who doesn't give
it to us for free. Or, we can't train unless we happen to be Google or some other company
that has gotten the ability to collect this information for other purposes and has put
somewhere in their terms and conditions that we can use this for whatever purpose we want.
If my royalty statements for my photos are any indication, it's about a hundredth of
a cent for a photograph, give or take,
39:30 - 40:00 I think — I can't remember if it was a
thousandth of a cent or a hundredth of a cent. So, don't, don't spend it all in one place. I
know, it's, you know, I think the total for the number of photos I have up in that library
for that training data was about a penny.
Mark, two responses, Mark. One is, you started off
talking about compulsory licensing, which is going to be a non starter. But notwithstanding that,
and bear in mind, we do have a compulsory license that was implemented in the Music Modernization
Act a couple of years ago that creates a blanket
40:00 - 40:30 license for making copies of all music subject to
a rate set by some administrators in Washington.
So we do have that possibility for dealing with
millions of works. It's a so-called — it's sort of a hybrid. It's a compulsory blanket license,
and so everybody's work comes under it. On the second observation you made, which I
think relates more to content ID, which I think
40:30 - 41:00 is right, is going to be the path going forward.
It's a content ID like system. The situation there is, sure, there can be plenty of people who
say, I want $5,000. And they're going to get, they're going to get rejected. And pretty quickly
the market is going to drop down to the 0.5 cents per use, or per training use, whatever it is,
because that's the most you're going to be paid.
It's like a Spotify royalty, which is, you know,
close to zero. So I think the market is going to
41:00 - 41:30 drive down those demands. I'm not saying
$5,000 is unreasonable. What I'm saying is the market is not going to sustain it. Alright,
so people wanted earlier to talk about output infringement and I think we want to talk about
the shift from training to output infringement.
So, I, I noted earlier that training is an
existential question, right, because it's sort of the, the potential amounts of money here if
you can't do it legally are, are enormous. Output
41:30 - 42:00 infringement by contrast is is a more specific
and targeted problem, right? The vast majority of things generated by generative AI are not
substantially similar to any copyrighted work. Alright, they are not infringing, they are not
a problem. But sometimes it happens. Right, why does it happen? One reason it happens
is what we call in computer science the deduplication problem. Right, so it's not
that generative AI decided to copy your
42:00 - 42:30 particular text from this particular instance.
Right, it turns out there are 10,000 or 15,000 copies of this particular image, or of the Harry
Potter books floating around on the internet that got into the training data set. And so when
you ask it a specific enough prompt, right, you get a result that is an amalgam of the closest
universe of things and all of those closest universe of things turn out to be the same image.
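(To illustrate the deduplication problem being described, here is a minimal sketch of collapsing exact duplicates in a corpus before training. Real pipelines typically use fuzzier near-duplicate detection, such as MinHash, rather than exact hashing.)

```python
import hashlib

def dedupe(documents: list[bytes]) -> list[bytes]:
    """Keep one copy of each exact duplicate so no single work is over-represented."""
    seen: set[str] = set()
    unique: list[bytes] = []
    for doc in documents:
        digest = hashlib.sha256(doc).hexdigest()
        if digest not in seen:  # skip the other thousands of copies of the same work
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = [b"chapter one ...", b"chapter one ...", b"unrelated text"]
print(len(dedupe(corpus)))  # 2
```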
Right and so the result looks like the same image
42:30 - 43:00 or looks like the same text. Second reason it
might happen is sort of deliberate prompting of infringement. So when I work with the
folks in the computer science department, right — if you ask ChatGPT to give
you a story about kids who go to a wizarding school in Britain, it doesn't give you Harry
Potter or anything similar to Harry Potter.
But if you ask it to give you a story about
kids who go to a wizarding school in Britain
43:00 - 43:30 that begins with — and then you go feed into the
prompt the first paragraph of Harry Potter — well, then it actually does spit out something
relatively close to the first chapter, because it recognizes that that particular
unique combination of words is likely to occur only in specific contexts, along with
other combinations of words. And we see some examples of the prompting infringement problem in
the New York Times lawsuit, right? Where the New York Times says, Hey, look OpenAI spat out our
news story. And if you go look behind the scenes
at the exhibits, right? It turns out that
OpenAI spat out the Times's news story when the Times said, give us a New York Times story with this title
that begins with the first nine paragraphs.
And, you know, that's, I mean, that is
an act of infringement, I think, alright, although the question of who's responsible for
it, I think, is an interesting one. And then there's another set of problems which I think
of as the Baby Yoda problem, right, and that is, there may just be concepts, right,
that the software recognizes as a
44:00 - 44:30 concept in the same way it recognizes coffee cups
to generate an image. There are enough Baby Yodas out there in the world and they all look similar
enough that if you ask it, give me Baby Yoda, it knows what a Baby Yoda is and it will give you
a very realistic looking image of something that turns out is copyrighted. I, what, so, I think,
Angela said earlier, this is a potential problem.
Right? I mean, even if you think training is
fair use, generating an image of Baby Yoda
44:30 - 45:00 seems much less likely to be a fair use. What do
we do about it? I think we're pattern matching to stuff we saw before. And that's, I mean,
this is a legal audience. Yeah, that's what, I mean, that's our bread and butter.
But I think that, I just want to see if we can forecast a year or two in the future
about what these outputs are. What is AI for, at all? Why do we have it? People are still trying to figure
that out. But I want to relate a very short story.
45:00 - 45:30 Someone emailed me, and things were getting a
little tense. They sent me a winky face emoji and that was like a hit of dopamine.
I was like, Oh, maybe, maybe things are okay. It wasn't being conveyed
in the language, but in that emoji, there was something being communicated that they
couldn't before. And at least what we're trying to do, and I think what a lot of procedural generation
companies are trying to do — we're trying to expand
45:30 - 46:00 the way that people can make outputs at all.
Expand the vocabulary that you, you can, we want you to be able to communicate new things
with each other. And so from that perspective, I think like we're getting, we're getting very
stuck just pattern matching to the world we see. And it's like caveman thoughts, like: money
there. It's all, I made it. I want it.
Give me money. Or also like, I'm scared
because new, how do I do new? What is new for? But I think like very soon. Generative AI
is going to be in, it's going to be stuck to
46:00 - 46:30 human culture. It's going to be like language.
We're not going to be able to communicate with each other without it intermediating. So
I think it's really important to think now how we want, what do we want the laws to be.
Do we want copyright law to restrict how we think when we start using it as such an inseparable
tool in our thinking? Do we want copyright law
46:30 - 47:00 to restrict how we communicate with the people
we love? I think we're going to see it go, you know, taking Max's challenge looking a
year or two ahead. I think we're going to see some of the litigation go some of the same
ways that we saw with the file sharing cases.
And I do think, you know, maybe the Grokster
case, the inducement liability, might — you know, to the example I gave earlier about
if you're training an AI algorithm specifically
47:00 - 47:30 for a specific purpose, so you're creating
one, training it on the Star Wars universe to create Star Wars content and, you know, to
encourage people to create Star Wars content that is infringing, I think we're going to
see, you know, something along the lines of an inducement liability on something like that.
I think it'll be interesting to see with the Midjourney-type and OpenAI ChatGPT stuff
where it's a more general algorithm,
47:30 - 48:00 more general purpose algorithm. It's gonna be
interesting to see where those go. But I do think there, in some cases, we're going to see
an inducement form of liability come into play.
So, I'll just add on to that a, a few things
and this is the point that maybe I started to make on the, on the training question that was
better suited here. If you imagine a scenario in which a work is generated through an AI tool
that plainly incorporates the, you know,
48:00 - 48:30 the protected expression of a work,
or is an identical copy. I mean, there are easier ways to do that than trying
to intentionally engineer a prompt to deliver that. Again, back to our photocopier example.
But assume that you are able to generate that. There are any number of perfectly valid and
appropriate reasons you might want to do that.
If I'm a brand owner that has a really fantastic
logo, for instance, and I want to feed that into
48:30 - 49:00 a tool to figure out and ideate around potential
variations, ways that I might expand upon that. I have every right to do that and it would return
content that I own. If I want to ask a tool to produce some image for me that I want to use in
an article to highlight, for instance, concerns over the creative content of someone,
I'm allowed to use that for news reporting, even if in another context, with another intent,
it might be infringing. So there is this,
49:00 - 49:30 there is this desire, I think, as we have this
conversation, to put everything in, in yes or no, black or white boxes, but it doesn't quite
work that way. It's very contextual.
There is no question that these tools
can be misused for ill purposes and the copyright law does not allow that. What can't
be said is that the fact that these tools are
49:30 - 50:00 capable of generating content that in certain
instances will look very much like content, copyrighted content on which it is
trained, is necessarily improper.
And I want to take us back to Warhol. So the
professor mentioned that in Warhol, Andy Warhol had been, through a license, entitled
to make a silkscreen of a Lynn Goldsmith photograph of Prince. Now, he made more than
one; he exceeded the scope of the license; he made 16. And later, when Prince died, a
magazine wanted to write an article about Prince.
50:00 - 50:30 They went back to the Andy Warhol Foundation and
found out that there were additional silk screens and used one that hadn't been authorized. And the
court found that's not okay. There was actually a licensing market for this work. In fact, that's
how you got your hands on it in the first place. And you used it for the specific purpose
for which it had originally been licensed, right?
You used the photograph of Prince to create
a work for illustration of him in an article.
50:30 - 51:00 What the court stopped short of saying is that
the making of the work in and of itself, the other silkscreens that never appeared anywhere,
was not fair. Because until those are deployed, until we know how they're being used,
we don't know if it's fair or not.
It's sort of a Heisenberg Uncertainty
Principle, or, you know. There is an element of, if the painting had been hung in a gallery, or
if it had been used for educational purposes,
51:00 - 51:30 that may very well have been a fair use.
And the court said, we can't decide that here. We're not going to decide that here.
We're going to decide on this particular use. So I think it's right to say this:
output infringement, unlike training infringement, which is a kind of existential question, is a
fact-specific, contextual question, right? What particular thing has been generated? And who
has generated it, although I do think, to go to Daniel's point, one difference here between this
and the internet cases, which may matter a lot for the AI companies, is it's not obvious that
your liability is only for inducement, right?
51:30 - 52:00 And so one of the questions we resolved early
on in the internet era was this sort of question of volition, right? If I ask a machine to give
me something and it gives me something, who's making the copy? Who's the
direct infringer, right? On the internet, we were happy to say that if I'm just
hosting content that other people put up there, the fact that I sort of deliver it in an automated
way doesn't make me the maker of the copy,
52:00 - 52:30 it's the person who's asking for it.
AI looks a little different though, right? Because I am generating a new thing
in response to a prompt. I think there may be a line at which a specific enough prompt
that is clearly designed to misuse the system might make the prompter, and not the AI, directly
responsible for infringement. But I think there are going to be a bunch of cases where a prompt
generates some output that is infringing,
but it is the AI itself that is making the
thing, and therefore it's the AI itself that is
52:30 - 53:00 the direct infringer. That matters, I think,
because copyright is a strict liability offense. The fact that you didn't intend to
do it, the fact that you took efforts to try to prevent it from happening,
won't necessarily mean that you avoid liability.
Max. I guess I wonder who would do that?
Like, I mean, if you want a direct image, you can just go find it on the internet. Who
is using generative AI tools to intentionally copy what's in there? So, plaintiffs' lawyers is
one answer, right? That's kind of a very small
53:00 - 53:30 answer, right? Most of those examples have in
fact been the plaintiffs' lawyers in the cases.
But it's a fair question, right?
My guess is it's not that. It's a very bad copy machine, a very inefficient thing to
use as a copy machine. So I agree with that, in contrast to Grokster. But what it's going
to be, I think, is more the style case. It's going to be the sort of, I want something
that looks similar to but not identical to this thing, and maybe it's too similar, maybe it's not.
But that again is going to be sort of
53:30 - 54:00 case by case. Well, it's also the derivative works question
more than direct infringement, just to answer that. Alright. So, we have lots
more things that I would love to talk about, but I also see people lining up to ask questions, and so
maybe we should hear what you want to talk about.
Pablo. Alright. So, Professor Goldstein, on fair
use: I'm going to use this question to raise what is sort of an existential issue. You mentioned it sort
of coming down to the transformation question, and noted that there's fuzziness around that,
which is not surprising given where that doctrine has to operate.
But it occurs to me that, you know,
54:00 - 54:30 back when you taught us copyright, if you
had said we're now going to all go brainstorm the most transformational use cases we can think
of, like what is the most, almost a caricature of transforming something, nobody could have come
up with: I'm going to create 30 billion parameters, point them randomly, and then have them
reorient themselves so that they can guess the next word of these texts.
Like, that would have been, and so even
in a world where there's some fuzziness, what I don't understand is how this is
anywhere close to a close call. And
54:30 - 55:00 tell me what I'm missing, because it seems to me
that, any fuzziness of the doctrine aside, we're so
off the charts for transformation that then it's game over, no licensing, nothing, we just all go back to, you know.
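(For readers unfamiliar with the training step Pablo is describing, here is a toy, purely illustrative sketch in Python of next-word prediction: randomly initialized parameters are gradually "reoriented" so the model gets better at guessing the next word. The corpus and every name here are hypothetical, and real systems use billions of parameters rather than one small table.)

    import numpy as np

    # Toy corpus; a real model trains on vastly more text.
    text = "the cat sat on the mat the dog sat on the rug".split()
    vocab = sorted(set(text))
    ix = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(V, V))  # parameters "pointing randomly"

    # Training pairs: each word and the word that follows it.
    pairs = [(ix[a], ix[b]) for a, b in zip(text, text[1:])]

    for step in range(500):
        for prev, nxt in pairs:
            logits = W[prev]
            p = np.exp(logits - logits.max())
            p /= p.sum()                 # softmax: the model's guess distribution
            grad = p.copy()
            grad[nxt] -= 1.0             # gradient of the cross-entropy loss
            W[prev] -= 0.1 * grad        # reorient parameters toward the data

    print(vocab[int(np.argmax(W[ix["sat"]]))])  # the model's guess after "sat": "on"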
Yeah. As usual, Pablo, you're not far off the mark. You weren't in class, and you weren't there.
No, I didn't mean to say you weren't present in class. You were always present in class.
No, I think, you know, the way transformative
55:00 - 55:30 use worked in the Google Books case turned on the use
to which the copied works, the snippets from the copied works, were ultimately put.
And so I think transformative use here, in looking at training activities, would ask: to
what use are these measurements, these tokens, effectively being put? And they're being put to
a use that is totally transformative as compared
55:30 - 56:00 to the material on which they were
trained. So, does that take care of it?
Good. Okay. And I would just add to that, though: I wrote an amicus brief in the Warhol case,
and there were a huge number of amicus briefs. And, you know, I think one
of the concerns that we had, and that others have had, was that the focus of fair
use analysis has turned on, is it transformative?
56:00 - 56:30 Forgetting that there are three other factors, and
even "transformative" appears nowhere in the fair use statute. It's just been something that has become
an easy way to say, oh, it's transformative, so clearly it's fair use. And I
believe, if I recall correctly from the opinion, the court said we have to analyze
all four factors.
56:30 - 57:00 And I think they even did in the Oracle v. Google
case, which also turned on one particular factor. But what's gotten lost in fair use
jurisprudence is that transformativeness is not the only determinative factor. I wondered if you
could comment on the acquisition of the training data, in particular terms of service on websites.
Say I downloaded, I don't know, millions of hours of YouTube videos, let's say, and then
built a video-generating system, you know,
57:00 - 57:30 as an example. Yeah. Just a random example.
You know, just a random example. And, you know, I didn't do this, but one
may have done this. Yeah. Some would. Yeah, so I think the answer is probably not a copyright
problem per se, but quite potentially a sort of breach of contract, breach of terms of use
problem, if in fact that's how you got it, right?
Now, most of the databases, I think
most of the generalized ones, right, have used Common Crawl, which respects the
robots.txt file. So they said, we're gonna go
57:30 - 58:00 crawl the internet for the universe of things
that have said, yes, please crawl me and index me, in a technical system. I think probably
many of the people who set robots.txt
to yes didn't have generative AI in mind.
They had, I want to appear in search engines, in mind, and it might be that we, in the
future, start to distinguish those things, right? Or the LAION database, right, which is a
sort of database of image categorizations that is in turn taken out of Common Crawl.
That is an effort to sort of get around those
58:00 - 58:30 problems. I don't think it gets around
all of the copyright problems, in part
because many of the people who put up
information or data on a website and said, sure, please index me, might have illegally taken
that information, right? The Books3 dataset might be an example of that. But I do think that if
you are going to a specific website to crawl it, and it does not have a
robots.txt policy that says no problem, you ought to
58:30 - 59:00 be worried about that.
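(As a technical aside: checking a site's robots.txt is mechanical. Below is a minimal, illustrative sketch using only Python's standard library; the crawler name and URLs are hypothetical placeholders, and, as the panel notes, a robots.txt "yes" was typically written with search indexing, not AI training, in mind.)

    from urllib import robotparser

    # Fetch and parse the site's robots.txt.
    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # Ask whether our (hypothetical) crawler may fetch a given page.
    if rp.can_fetch("MyCrawler", "https://example.com/some/page.html"):
        print("robots.txt permits crawling this page")
    else:
        print("robots.txt disallows crawling this page")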
Hello, so my question is more about what the model would be for compensating creators for their content, whether
that is text, like a webpage, a video uploaded to YouTube, or a work of photography.
So as industry and technology developed, it seems that people found ways to compensate
the creators. For the web, we can embed AdSense,
59:00 - 59:30 and Google pays web page
creators for their work. For video, we have YouTube, determining algorithmically how
to pay YouTube creators, and, you know, Spotify is doing that for audio.
My question is, do you think there would be an emergence of maybe a Spotify or YouTube or
AdSense, some type of technology that would
59:30 - 60:00 be embedded in the content itself
that could help companies like Midjourney or OpenAI to actually compensate creators?
Paul, you want to take it first?
Yeah, I think it's going to be various.
It's a good question. You know, it's interesting that you use the example of Spotify, which is
a smoothly functioning, albeit not very high-paying,
operation, which rests on a compulsory
60:00 - 60:30 license. It's the digital phonorecord delivery
compulsory license of Section 115.
And that is basically what drives Spotify, and
they periodically have to negotiate rates between them and the publishers before the Copyright Royalty
Board. But that's one way to do it. Another way is the YouTube way. Interestingly, whether it's an
60:30 - 61:00 effectively negotiated license with Content ID
or a compulsory license with Spotify,
the legal infrastructure at the end of the
day makes very little difference. It's the economic arrangements built on top of it that
really count. Angela, real quickly, and then I think maybe one more. I just wanted to
say it really depends, too, on whether you're suggesting that there should be some sort of
licensing regime for outputs versus training.
61:00 - 61:30 I mean, to Mark's point earlier, the only
conceivable regime would involve some mechanism for a single training license fee. You cannot
conceivably imagine that everybody who's taken a picture of a mountain somewhere,
where a platform has been trained on two million pictures of mountains, could suggest that
an output that may have a mountain in it that looks nothing like those two million mountains
would be entitled to any kind of compensation.
61:30 - 62:00 That's never how copyright law was meant
to work. And it would destroy the technology before it's had the ability to
come into full fruition, to solve all sorts of problems we haven't even talked about on
this panel. I mean, there are amazing, as yet undiscovered reasons we need these technologies
developed: drug discovery, disease pathways, solving traffic, solving climate.
Big questions, right? Not just cats on
62:00 - 62:30 surfboards in Venezuela. And I'm so going
to Google that now, Angela. If I can just make
one comment on the business models, because
one thing we didn't get to, that we were going to talk about, was the celebrity impersonation
and digital avatars, digital replicas.
There are a ton of companies out there,
in music too, actually, trying to develop compensation and tracking models and
different products to try to handle
62:30 - 63:00 this licensing concept. So I think
there is that, and to me, that's going to work, because it is specific
If I want to use your
image, right, then I ought to be paying you. Right. I think it's much harder if
it's everyone who's ever taken a picture of a mountain getting a billionth of a cent. All
right. Last question. Yeah, a question regarding the current court cases. What is unique about them
compared to some of the historical ones regarding, you know, copyrighted data, IP indexed by some of
the search engines, which we have seen in so many cases?
63:00 - 63:30 What makes the current cases different
regarding usage of the data? I mean, so, well, they're current, they're ongoing, so we don't
know how they're going to come out. But I think, as Angela has kind of suggested,
and Paul suggested in response to Pablo's question, if the question is, what
does existing precedent say as to training?
Existing precedent lines up pretty strongly
in favor of: this is a very different purpose,
63:30 - 64:00 it's a new technology, it's going to be
a fair use. Things that might change that, right? A licensing market. If we
thought there was a working licensing market, and therefore you're depriving
people of revenue, that can change the fourth factor, and it could change the analysis.
And techlash, right? We are in a moment of sort of AI moral panic. And I
think that affects judges, and it might well be that a legal decision that, with a different
technology, in a different kind of psychological
64:00 - 64:30 era, would clearly have come out in favor of
the tech company might come out differently because people are afraid of AI.
That's not what should happen, but it might happen. And I think with that,
we're going to have to stop. Thank you all.