FutureLaw 2024 - Generative AI and Intellectual Property


    Summary

    The FutureLaw 2024 panel at Stanford Law School discussed the implications of generative AI for intellectual property (IP). Led by Professor Mark Lemley, a panel of experts including Angela Dunning and Max Sills debated the legality of training AI models on copyrighted data. Key issues included whether AI training constitutes fair use and the potential for licensing markets. The discussion emphasized the complexity of balancing innovation with creative rights, reflecting diverse legal and emotional perspectives. While fair use and transformative use were central themes, the panel acknowledged that legal outcomes are unpredictable due to evolving technological and societal contexts.

      Highlights

      • Professor Mark Lemley moderated the panel at Stanford Law. 🎓
      • Discussion focused on the law of AI rather than AI for law. ⚖️
      • Experts debated if AI training is fair use or needs licensing. 🤔
      • Training using copyrighted works poses existential questions for AI. 🧠
      • Potential licensing schemes for AI training are complicated. 💼

      Key Takeaways

      • Generative AI is reshaping the legal landscape of intellectual property! 🤖
      • The distinction between AI training and outputs is crucial. 🎓
      • Fair use and transformative use are key legal concepts under scrutiny. ⚖️
      • Emotions around AI and copyright vary from excitement to concern. 😅
      • The potential for licensing AI training data is being explored but is complex. 🔍

      Overview

      The FutureLaw 2024 session at Stanford Law School brought together experts to discuss a hot topic: generative AI and its implications for intellectual property. With Professor Mark Lemley at the helm, the panel explored the legality and ethical considerations of AI learning from copyrighted materials. Does it fall under fair use? Or is there a need for new licensing frameworks? These questions were at the heart of the discussion, reflecting the complex interplay between law, innovation, and creativity.

      One of the key discussions was the separation between AI training and the resulting outputs. It's a nuanced distinction that's crucial in legal debates, especially regarding fair use and transformative use. Panelists shared differing opinions, from those who see AI's ability to learn as a natural extension of human creativity, to those worried about the economic implications and potential misuse of intellectual property.

      The conversation also touched on emotional and societal perspectives, illustrating how people's feelings about AI's role in IP range from innovative excitement to protective concern. As industries grapple with these changes, the possibility of licensing training data emerges, though it presents logistical challenges. Ultimately, the panel underscored that the outcomes of these legal battles are unpredictable, significantly influenced by technological advancements and cultural shifts.

            Chapters

            • 00:00 - 00:30: Introduction and Overview. The chapter introduces a panel discussion on Generative AI (Gen AI) and Intellectual Property (IP), moderated by Professor Lemley. It humorously notes that this is Professor Lemley's appearance outside of his 'Jedi outfit'. The initial remarks set a light-hearted tone, jesting about AI's potential uses, including creating a picture of Professor Lemley as a Jedi.
            • 00:30 - 01:00: Moderator Introduction - Professor Lemley. Professor Lemley is introduced as the Faculty Director of the Law, Science, and Technology program. The speaker humorously mentions having seen actual pictures of themselves in a Jedi outfit, joking that AI is not needed for that. Professor Lemley is described as the speaker's boss at the Law School, and is noted to be one of the most published and cited authors, not only in intellectual property (IP) law but in all of legal academia.
            • 01:00 - 02:00: Panel Introduction. The panel's focus is Generative AI and Intellectual Property. The speaker highlights a prominent figure in legal technology, the creator of a legal tech project that evolved into the well-known startup Lex Machina, who will be leading the discussion. The audience is invited to welcome the panel and its leader, Mark.
            • 02:00 - 03:00: Panel Focus: Law of AI. The moderator addresses the audience, expresses gratitude, and sets the stage for the panel discussion. The panel is unique in the context of the conference as it shifts the focus from AI for law to the law governing AI. The panelists are introduced, with emphasis on the significance of the discussion because it addresses key questions that will influence the future and legal landscape of AI technologies like ChatGPT and similar tools.
            • 03:00 - 04:00: Issues in AI Ownership and Control. The chapter discusses the complex issues surrounding ownership and control of artificial intelligence (AI). Legal considerations are highlighted as critical factors that will influence the development and use of AI, particularly in legal contexts. Current legal challenges related to AI will significantly impact its future development and application in society. Mark Lemley, a notable figure in the legal field, is mentioned, indicating the discussion draws on expert insights.
            • 04:00 - 06:00: Introduction of Panelists. The panelists are introduced, highlighting their professional backgrounds and areas of expertise. A lawyer from Lex Lumina mentions litigating some relevant cases to be discussed. Angela Dunning introduces herself as a litigation partner at Cleary in Palo Alto, specializing in copyright, trademark, false advertising, and right of publicity cases. She has been practicing for approximately 25 years in the Valley and also teaches.
            • 06:00 - 08:00: Overview of AI-related Litigation. The chapter discusses AI-related litigation, specifically the copyright class action lawsuits linked to the development, training, and output of generative AI tools. The narrator shares their background in trademark law and expresses enthusiasm about discussing these matters with esteemed colleagues.
            • 08:00 - 12:00: Fair Use in AI Training. This chapter features Max Sills, General Counsel of Midjourney, who also runs Open Advisory Services, a consulting firm for AI startups. It also introduces Danielle Van Leer, a former Senior Assistant General Counsel at SAG-AFTRA, where she worked on legal issues concerning AI, performer rights, name, image, and likeness rights, publicity rights, intellectual property, and privacy. The conversation touches on the complex legal landscape of AI, especially regarding contracts and compliance matters that do not easily fit into collective bargaining frameworks.
            • 12:00 - 16:00: Fair Use Defense and Case Study. The chapter begins with a speaker reflecting on their career, contemplating a change in direction after a strike disrupted their field. It introduces Paul Goldstein, a long-serving faculty member specializing in copyright law, particularly international and comparative copyright. Goldstein shares insights from his experience dating back to 1975, offering a perspective that predates many in his audience.
            • 16:00 - 20:00: Transformative Use and Copyright Challenges. This chapter discusses the ongoing legal battles concerning the use of copyrighted material in AI training, focusing on the numerous lawsuits addressing these issues in the United States and internationally. The legal landscape is complex, with over 20 active lawsuits in the US alone, indicating the significant legal challenges faced by AI developers seeking to normalize the use of copyrighted content for machine learning purposes.
            • 20:00 - 26:00: Licensing and Market Effects. This chapter discusses the early stages of creating generative AI, focusing on the training process and the construction of datasets. It highlights the immense volume of content required for these datasets to capture language, concepts, and image relationships. It also touches on the prevalent issue of copyright, noting that nearly everything in today's world is protected by copyright law.
            • 26:00 - 30:00: International Regulation and Compliance. This chapter discusses the legality of training generative AI systems under copyright law, highlighting notable exceptions in the law and asking whether it is legal to use copyrighted material in training datasets. Angela suggests that while it would be ideal to declare it legal unambiguously, challenges and complexities remain.
            • 30:00 - 35:20: Economic Realities of Licensing. This chapter discusses the ongoing legal challenges related to licensing in the context of training AI models. The central legal question is whether the act of making copies for AI training can be considered fair use. Although many lawsuits have been filed, few have directly put this fair use question to the court, and the resolution of these cases is expected to hinge on it. A definitive legal determination is still forthcoming.
            • 35:20 - 43:00: Output Infringement. This chapter discusses a legal case in which music publishers seek a preliminary injunction against Anthropic, concerned that Anthropic's Claude model has been trained on content they own and aiming to prevent further training that infringes their rights.
            • 43:00 - 46:00: Future of AI and Legal Implications. The chapter explores the future of artificial intelligence and its legal ramifications, delving into a case in which the central legal question revolves around the fair use doctrine and its application. The discussion highlights the importance of establishing whether plaintiffs are likely to succeed on the merits of their claims, which is crucial for determining injunctive relief. Both sides have presented arguments on the legality of training AI on the content at issue, making fair use the pivotal point in the dispute.
            • 46:00 - 47:00: Closing and Audience Q&A. The chapter covers the closing remarks and an audience Q&A session. It highlights a case brought by former Arkansas Governor Mike Huckabee and other writers in the Southern District of New York. The case originally involved multiple parties, most of whom were dismissed or transferred out, leaving Bloomberg as the remaining defendant.

            FutureLaw 2024 - Generative AI and Intellectual Property Transcription

            • 00:00 - 00:30 Alright, this is a highly anticipated  panel on Gen AI and IP. We're continuing   with our next session, which will be moderated  by our very own Professor Lemley. This is how,   this is how he looks when he's not wearing his  Jedi outfit. I, I, I have to say all the uses   of AI that could be made and a picture of  me in a Jedi outfit, it seems particularly
            • 00:30 - 01:00 useless, because you can find actual pictures of me in a Jedi outfit with illegal eyes. You don't need AI for that. Anyway. So please come on in. And so, and Professor Lemley is of course the Faculty Director of our program in Law, Science, and Technology. He's my boss here at the Law School, so I have to be on my, my best behavior here today. But yeah, he's not only one of the most published and cited authors in IP, but in all of legal academia.
            • 01:00 - 01:30 And he's also a legal tech innovator. He created  a legal tech project here several years ago,   which became a startup that we all know Lex  Machina. And he will be running this Gen AI and   IP panel here for us today. So please join me  in welcoming our panel and over to you, Mark.
            • 01:30 - 02:00 All right. Thanks everybody. So I'm going to,  let me just, I want to ask the panelists to self   introduce and in a minute, but just this panel  is a little different than everything else at   the conference. We're going to talk about the  law of AI rather than AI for law. And in part   that's because it's kind of an interesting  question, but in part because I think it's   a question that is going to determine whether  we have things like ChatGPT and other tools
            • 02:00 - 02:30 that can be used to do these things, who owns them, who controls them and how they work. And so a lot of the sort of legal issues that are going on right now, I think, are going to be dealt with in ways that significantly impact the way AI develops and how it can be used or not used for law. So I am, as Roland said, Mark Lemley here at the law school. I will note that I am also a practicing
            • 02:30 - 03:00 lawyer at Lex Lumina, right, where I am litigating some of the cases that we are gonna talk about, and we'll mention that as relevant. And let me ask, just going down the line, Angela, to self-introduce. Hi everyone, I'm Angela Dunning. I'm a litigation partner at Cleary, based just down the street in Palo Alto. I've been practicing copyright and trademark, false advertising, right of publicity cases for the better part of about a quarter century now here in the Valley. I also teach
            • 03:00 - 03:30 trademark law at a little law school across the  bay that won't be mentioned in this environment.   But I, like Mark, am litigating several of  the copyright class action lawsuits that have   been directed to the development, training,  and output of generative AI tools. And I'm   excited to be here to talk to you about  that with my esteemed co panelists. Hi,
            • 03:30 - 04:00 I'm Max Sills. I'm the General Counsel of Midjourney, and I also run Open Advisory Services, which is an advisory practice for AI startups. I'm Danielle Van Leer; until about a month ago, I was a Senior Assistant GC for Contracts and Compliance at SAG-AFTRA, where I worked on AI and performer rights issues, name, image, and likeness rights, rights of publicity, IP, privacy, all kinds of stuff. Whatever didn't fit into the collective bargaining
            • 04:00 - 04:30 hopper kind of fell in my area. Now I'm trying  to figure out what I want to be when I grow up   because I needed a change after the strike. I'm Paul Goldstein. I've been on the faculty   here since 1975 before some of you were  born. And teaching principally copyright   and generally around intellectual property,  but mostly copyright and international and
            • 04:30 - 05:00 comparative copyright. Like Angela, like Mark, I am involved in some of the litigation that surrounds training, AI training, as well. Great. So, I want to start with that set of litigation. There are, at my last count, 20 lawsuits going on in the United States as well as several in other countries. And most of those lawsuits are focused right now,
            • 05:00 - 05:30 at least, on the sort of early-stage creation of generative AI: the training, the building and use of a data set to train generative AI. Those data sets of course take enormous amounts of content to try to learn how language works, to try to learn concepts and image relationships. And because almost everything in the world is copyrighted in the modern era, almost everything
            • 05:30 - 06:00 that goes into a training dataset is copyrighted.  There are some notable exceptions, particularly   in law that we'll talk about. So where are we  with the sort of fundamental question, right?   Is it even legal to train a generative  AI database? Angela? Well, I'd love to   declare that it is and have that be the end of  it. But I think that we've got a little ways
            • 06:00 - 06:30 to go. So as Mark said, there are an awful lot of  lawsuits. To date, very few of them have actually   put the question to the court yet directly with  respect to whether the training of an AI model   constitutes fair use in connection with the  making of copies for purposes of training.   That issue will likely be the decisive factor  in most of these cases. But we are a ways from
            • 06:30 - 07:00 getting a ruling. I would just highlight a  couple of cases that are out in front. One   is the case filed by music publishers against  Anthropic in which a preliminary injunction has   been sought to block the further future training  of Anthropic's Claude model on content owned by
            • 07:00 - 07:30 those plaintiffs. And in that case the fair  use question has been briefed in connection   with the inquiry into whether plaintiffs are  likely to succeed on the merits of their claim,   which is a key factor in determining whether  injunctive relief should be granted. And there,   there are arguments that have been put in both  ways on why the training on that content is
            • 07:30 - 08:00 fair or not but there is not yet a ruling. And then I would also just point out to the   room the case that was filed by former Arkansas  Governor Mike Huckabee among a class of writers   in the Southern District of New York. That case  originally was filed against a number of parties.   All have been transferred out or dismissed except  Bloomberg. And in Bloomberg's motion to dismiss,
            • 08:00 - 08:30 which is not yet fully briefed, they have sought dismissal at the pleading stage on fair use grounds, arguing that the tool they developed has never been commercially released, has never been used by anyone, and hasn't generated any revenue. It was a research tool, and a research tool deployed by a news organization. And so, therefore, on the papers, very squarely within what fair use was intended to protect. So we're watching those cases, and it'll be some
            • 08:30 - 09:00 time before fair use issues properly bubble up  in the other cases that we're working on. Yeah,   and so just to note on the procedure there,  right? I mean, that's because these cases are   all at the motion to dismiss stage. Fair use is  a defense, so it's not part of the pleading.   So you've got to wait until the case is at issue.  What we've seen, I think, in the cases is the   whittling out of a lot of sort of ancillary  theories of liability right? So the courts
            • 09:00 - 09:30 have pretty much across the board rejected claims  that you are violating Section 1202 of the Digital   Millennium Copyright Act by removing copyright  management information in the training data set.   They have across the board rejected the theory  that the model itself is somehow a derivative   work of all of the billions of works that  went into training it. Some of the state   law cases have gone away. But, so let's talk  about the sort of heart of the fair use question,
            • 09:30 - 10:00 right? Is there, uh, I mean, there is a lot of copying going on here, right? I mean, these models are built on a database, right? That trains maybe on Common Crawl, maybe on the LAION image database, right? But those are billions of different copyrighted works that, at least at the outset, right, in their entirety go into the database. Is that fair use?
            • 10:00 - 10:30 Why? Or why not? And Angela, you're welcome to jump in, but anybody's welcome to jump in with thoughts on this. I'll start us off. I, I think fundamentally it has to be. What we as human beings have done from the dawn of time is ingest what knowledge exists in the world: through the reading of books, the viewing of art, the general perception
            • 10:30 - 11:00 of language and all forms of learning. We take  that information, we then think on it, iterate   on it in our minds and produce new content. Obviously if the content we produce is   substantially similar in protected expression to  somebody else's book or somebody else's artwork,   then that may raise a real copyright  concern. But the learning aspect is not   just permissible. That's the whole point of  the Copyright Act. That's the whole point of
            • 11:00 - 11:30 the constitutional provision that guarantees this limited copyright monopoly for purposes of ensuring that there is a promotion of the development of the arts and sciences. We want information in the form of literature, text, data, music, art, to be made available to all so that it can be learned from and developed further. And at the training stage, when you're talking about an AI model, you're talking about the making of copies, not so those copies
            • 11:30 - 12:00 can then be reproduced, which may raise issues, but so that you can take the information from those copies, whether it's art or literature or prose, figure out how language works, figure out what a cat is from its spatial dimensions, so that the AI can produce new content the same way that humans can. Maybe just to add, I completely agree, just to add to that, I think it's a good
            • 12:00 - 12:30 time to just check in on how we're feeling. So I  think that people are overloading copyright with a   lot of feelings that, that we have around AI. So one issue that's going on is we're having   another period of industrialization and  automation. And people are upset because   it seems like the monetary gains of that  are accruing to a small amount of people.   That's a separate question from what is current  copyright law. There's also the idea of what,
            • 12:30 - 13:00 what do we want copyright law to be? So just to underline the point Angela was   making I think if you feel upset, and you want  money, it makes sense to advance a theory that   you own a property interest in something. But,  the whole reason we gave people those property   interests was to advance culture. Advance  society. And, I think we might be at a point   where we're forgetting the merger doctrine. Forgetting the idea expression dichotomy.
            • 13:00 - 13:30 Because we're confusing feelings of upset about the economic gains of automation with why we have this other law over here. So I kind of think that asking whether, under current law, this is fair use is a boring question with a boring answer. It's yes, clearly. So I think we should be prepping ourselves for, like, what do we really want to say? What direction do we want society to move in? We probably want society to maximize creative
            • 13:30 - 14:00 expression rather than let everyone assert a property interest over every idea they have. Alright, I'm going to disagree and, you know, on the feelings perspective, I actually, one thing I didn't mention is that I'm a screenwriter and photographer on the side. So I do have an interest on kind of all sides. I do use, you know, ChatGPT, for instance, in kind of ideating things, just trying to help with, you know, writer's block. One of
            • 14:00 - 14:30 the guys I play D&D with uses Midjourney all the time to create really cool photos. But, you know, the flip side of that is, one is like where the training data is coming from. I mean, a lot of this training data is coming from being harvested off the internet, where people didn't necessarily, you know, anticipate it would be used, and maybe they would have kept their IP private if that was the case. I mean, certainly I have photographs that I wouldn't have posted publicly if I knew that that was going to happen.
            • 14:30 - 15:00 I'm not even thrilled that I have photos up on Getty, iStock and Shutterstock. I'm not thrilled that they licensed my photos for training data. But I think there is an it-depends context here too, because it depends on the, the particular algorithm, right? I mean, if you look at, say, going back to the face-swap videos, where they are literally taking clips from films and repurposing those, the output of that: is that training data fair use when the purpose
            • 15:00 - 15:30 is to reproduce that exact clip or those exact people? You know, I do think, I think this is something that, I am a proponent of very strong copyright, but I also have friends that, like I said, have used Midjourney for stuff. I played around with it once to create my D&D character picture. You know, her, my little avatar. So, you know,
            • 15:30 - 16:00 now because these are all shaking out everything  is at the far ends of the extreme and we're not   seeing many middle ground positions right now, but  but I do think it depends on the intent behind the   training in large part when you start looking at  the four factors and it, it does depend on I, I,   I, I don't know that everybody here even knows  the, the, are there folks who don't know the   U.S four factors of fair use or four plus  factors. I'm sure the answer to that is yes,
            • 16:00 - 16:30 it's not a copyright crime. So there's, you know,  under US copyright law, there's a, a, the defense,   you know, in, in fair use entails balancing four  factors, but there's some question, depending who   you ask, if it's exclusively four factors, if it's  more, I'm on the side that advocates that it is   more than just the four factors, but, and I always  screw them up when I'm just speaking off the cuff,   but it's the, the nature and purpose of the use. Like, so are you using it for for-profit purposes?
            • 16:30 - 17:00 So, you know, are they training the algorithm to  make money? The type of work that's being trained,   and that can be whether it's a creative work,  or a like factual work. That gets to the idea   expression dichotomy. The, you know, the amount  used, essentially, like I said, I always screw   it up when I'm thinking off the cuff here the  amount of the work that you're using, are you,   you training it on the whole book, the whole  movie, or are you training it on snippets?
            • 17:00 - 17:30 And also on the, the effect on the market  for that work. So if, you know, and that   can include taking away licensing revenue. So  we do have to keep all that in mind. I mean,   that was intended to balance, you know, the  common law approach to limiting copyrights,   so I think you know, not to mention when you  start looking internationally, you get into a   whole other mess where the Berne Convention has  which most countries, most major countries are
            • 17:30 - 18:00 signatories to, have more restrictions,  and I think we're starting to see some   interesting things happen in internationally. I believe Japan is allowing training data and   the EU is adopting an opt out or notice an opt out  process. So you know, this is, I think we can't   look at it just with a U. S. lens. We need to also  look at it in the context of international. Yeah,   just building on Max's observation about feelings  and, and Danielle's earlier beginning points.
            • 18:00 - 18:30 I think there are feelings in conflict here on  the one side. Yeah. You have the feeling that we   should be advancing research. The Constitution  authorizes copyright to promote the progress   of science and the useful arts, and certainly  that's what training is doing. On the other hand,   there is a feeling, certainly in the creative  community of what my colleague in the political
            • 18:30 - 19:00 science department has characterized in  this context as a need for reparation.   And it's interesting to take the reparation  discussion and plant it squarely in the   consciousness of people who feel that they've  been ripped off, and whatever the technicalities   of the law may be, deserve to be compensated  in some way. I think that's the feelings that,   that are in conflict. On the technicalities  of, of fair use you know, Mark said, well
            • 19:00 - 19:30 there's just a lot of stuff that we're copying. Yeah, the 20 million books that Google digitized   for the Google Books project was also a lot of  copying and was held to be fair use. I think   when the fair use question is addressed in the  ongoing litigation, the question will be does the   rule of the Google Books case from the Second  Circuit apply here. Most of the litigation is
            • 19:30 - 20:00 in the Ninth Circuit, although there is some in the Second, and I imagine that what courts might have a close eye on is that the Second Circuit decision in Google Books turned on the notion of transformative use. The opinion was written by the judge who invented transformative use. Transformative use has come under subsequent review in the Warhol case, a quite different
            • 20:00 - 20:30 context, but it leads to some uncertainty about what transformative use means today, post-Warhol, or outside the Second Circuit. Yeah, so I'd love to just build on a couple of points that were made, and Danielle and I have consensus on a number of things, we typically find, actually. I mean, I think it's really important in
            • 20:30 - 21:00 setting the stage that we distinguish between the outputs of these models and the act of training, right? So again, take any particular output that is generated from a model. If you put that up against an original work on which it was trained, or any copyrighted work for that matter, and it is substantially similar in protected expression and copied from the original, there may very well be a copyright problem. And we're going to turn to that issue next. Yeah,
            • 21:00 - 21:30 so I'll, I'll skip that. That's a contextual  analysis of that particular output. What we're   talking about in this question is just the act  of training an AI tool to be able to generate   language or to be able to generate images  of any kind. Right. If I want to create an   image of a cat on a surfboard in Venezuela,  that image probably doesn't exist anywhere.   It's not infringing anybody's copyright, but  without a tool that's been trained on lots   and lots and lots of images. There's no way of  doing that. So, first setting the stage there,
            • 21:30 - 22:00 I think we're talking just about training. Then  I think it's important to put in context this,   the concept generally of fair use  and this idea expression dichotomy.   So, the idea expression dichotomy is  just the basic rule that nobody can   own an idea. Nobody can own a concept. Nobody  can own information. Why not? Because we want   everybody to be able to use those ideas and  concepts and information to create new works,   to write about them, to expand knowledge.  So what we protect under copyright law is
            • 22:00 - 22:30 just the particular expression of that idea, the  actual words used, the actual image created, the   particular notes and rhythms that create a song. And so separating those two things is important   because again, everybody is free to take the  idea, the Supreme Court has said absolutely,   definitively, over and over. You are allowed to  use the ideas and concepts from other people's   work. That's what all knowledge is based  on. So then the style comes up, right?
            • 22:30 - 23:00 In the context of these cases, there are  artists and writers who say, you're taking   my style. But style's an idea. Nobody owns a  style either. I can go to a museum and do my   darndest to emulate Picasso's style. Now, I'll  never be as good as Picasso. I may not be able   to compete with him. But I want to make a  thousand works in the style of Picasso.   So long as I'm not taking his expression.  Copyright law permits that. And so in the
            • 23:00 - 23:30 context of training, what we're talking  about is ingesting copies of works. So   that you can take information from those  works about how language works, syntax,   structure, how often words appear next to  other words. This kind of information is   arguably not even protected by copyright, and  if what you're doing is taking that for the   purposes of creating a completely new ability to  generate new language and content, I would argue   that is quintessentially transformative  and exactly what the courts have held,
            • 23:30 - 24:00 including in the Google Books case, is permitted. So, so the, well, go ahead. Yeah, sorry, Mark. That's, that's why I kind of said we need to look also at the intent, right? Because I think if you were to take the scripts from all the Star Wars movies to train an AI with the purpose of creating, I mean, you know, the same person, or even just a company, training an AI with all the Star Wars scripts for the purpose of creating Star Wars content. And I think, you know, you start getting to
            • 24:00 - 24:30 that intent, and I think you do obviate the fair use. So I do think that is going to, potentially. So this, I mean, both of these comments to me sort of get to the question, get to the problem that it is actually really hard to get people to focus on the distinction between the training and the output, right? So to me, right, if I decide to train only on Star Wars content, I don't think there's anything inherently problematic about that. Except that it's almost certainly going to give you output that is substantially similar to Star
            • 24:30 - 25:00 Wars content. And if that's a, if that's what  I'm going to do, then I think we're going to   be in a different and more challenging problem. Right. But can I add one more thing? I'm sorry,   Mark. Even the creation of identical  works, even with that intent,   that doesn't mean that it isn't fair use. You  know, every time I go to photocopy something,   I'm making an identical work. But if I'm doing  that for purposes of scholarship, research,   teaching, criticism, that may very well be fair. If I'm a brand owner, I may very well want   to upload my content for purposes of… But  again, we are wandering past training to
            • 25:00 - 25:30 content. I want to talk about that, but I want  to close the loop on training. Right? But the   very fact that it is hard to separate  these two, I think is really important.   Right? Because this is what we're seeing happen  in the, in the, in the lawsuits. Right? Why are   people upset about training? Maybe they're upset  about training because they sort of conceptually   don't like the idea that a machine might learn  from their work. Even if the machine is gonna   produce cats surfing in Venezuela, right,  that has nothing to do with their work.
            • 25:30 - 26:00 That's a kind of weird objection though, right? And when you get down to it, I think the objections mostly end up turning out to take one of two forms, right? One is, I'm afraid that the output will be too similar, right? That you will in fact sort of end up copying my work in the output, right? And that's an issue we're going to turn to in a minute, right? Or the objection is, I'm afraid of competition from something that isn't infringing my work, right? And that's an objection, and it goes a little bit to Max's point, right? But it's not an objection copyright law cares about.
            • 26:00 - 26:30 So let me just sort of close the loop on training with two points, right? One is, the reason that the lawsuits so far are mostly focused on training and not output is that training is an existential question for AI, because of the way we have structured copyright's remedies regime, right? Copyright has a statutory damages provision. So for every registered work, if you show infringement, you can get not just the actual harm you suffered, you can get a minimum of $750
            • 26:30 - 27:00 per work and a maximum of $150,000 per work, depending on intent. If in fact training on 2 billion images is an act of infringement of all of those 2 billion images, then even assuming we pick the minimum statutory damages that we are required by law to give, right, we're at $1.5 trillion in damages. Plaintiffs' class action lawyers can do that math, right? And $1.5 trillion in damages
            • 27:00 - 27:30 sounds pretty good. Even if you're not gonna get $1.5 trillion, maybe there's a settlement, maybe there's a lot of money to be had here. And so rather than the, hey, sometimes occasionally there is an output that's substantially similar and that's infringing, we want the big hit, right? We want to say the whole thing is infringing and we are entitled to damages for all of that.
            • 27:30 - 28:00 Whether that's true, of course, depends a little bit on the issue we've also talked about, right, which is transformative use. But it also depends on another question, the fourth factor of fair use, right, which is the market effect. Now, traditionally we think of the market effect as, is my work substituting for yours, right? Are people buying my copy of this song rather than your copy of this song? If so, that's unlikely to be a fair use. But as Paul previously mentioned, right, one of the things we think about in market effect is also the possibility of a licensing market. And so one of the things that I think is an
            • 28:00 - 28:30 unsettled question in the fair use inquiry is, will we see, can we see, a market for licensing training data? Is this a feasible thing? And Paul, I'd love to hear your thoughts on, sort of, is the world going to end up not with lawsuits and statutory damages, but with a kind of… I think the answer to that, Mark, is yes, but it is a highly qualified yes. I think that however the current raft of lawsuits resolve themselves, there
            • 28:30 - 29:00 is going to be licensing going forward. There is  a very substantial chance that there will be, in   part because of this instinct about reparation. But also in part stuff that is happening in Europe   right now, and Danielle properly, properly alluded  to foreign activities as being very important. And
            • 29:00 - 29:30 I'll get to that at the end; it's the fourth of my licensing possibilities. But there are four ways in which licensing of training activities, exclusively training activities, could go forward. One is the traditional negotiated two-party license. There is some precedent for that. I think one of the early ones was OpenAI's license with the Associated Press, and there
            • 29:30 - 30:00 are others going forward. The problem with  that is scalability. And transaction costs.   There's just too many licenses to be negotiated. There is a variable factor there, though. There   was a really ill conceived bill introduced in  the Congress this week by Congressman Schiff that   would impose a duty of transparency on platforms  that are doing training and require them to list
            • 30:00 - 30:30 all copyrighted works that they've trained on. It's, happily, a very short bill. But it is just totally ill-conceived. And among other things, the sanction for noncompliance is a one-time fine of $5,000, which I… Well, let's put that to the side. I think individual negotiated licensing is probably a non-starter. The second and third are collective
            • 30:30 - 31:00 licensing and compulsory licensing. Both are topics that the Copyright Office asked for input on in their current notice of inquiry, and we will be getting a report from them, I think, starting in a couple of months. And there will be a series of reports. In the responses to that, on collective licensing there was some support among authors' groups particularly,
            • 31:00 - 31:30 and that is, if you think of ASCAP, BMI and SESAC in the music area, having collective licensing by collective management organizations, CMOs. It got some support. The problem in the U.S. with collective licensing as a solution to transaction costs is that the U.S. has four CMOs for musical
            • 31:30 - 32:00 performance rights and virtually no CMOs for all the other kinds of content that are subject to copyright. You contrast that with Europe and Latin America and Asia, where there is a single CMO for each, at least one, but typically one CMO for each area: for photography, for visual arts generally, for writing. And so the thought of getting those collecting organizations in place in
            • 32:00 - 32:30 the U.S. is certainly problematic. There was less support for compulsory licensing. And for good reason; the major proponents of it were student groups, one from the University of Hawaii, and the other from Brooklyn Law School. The two student groups liked compulsory licensing; nobody else seemed to care for it. And again, for good reason: compulsory
            • 32:30 - 33:00 licenses are frowned upon; as Danielle alluded to, there is the Berne Convention, which in Article 9.2 puts limits on a country's ability to subject normal free-market licenses to compulsory licensing. The fourth kind of licensing, and the one that I think is most likely to come into place, and to do so within the next two years, is automated licensing:
            • 33:00 - 33:30 metadata attached to individual works that will communicate with the platform prior to training, saying, I don't want to be copied. Or, I will agree to be copied, but I need to be compensated in this amount. Or, you know, go ahead and copy, but give me this other consideration. I'm sure many of you recognize that as Content ID, which makes YouTube possible
            • 33:30 - 34:00 within a world of safe harbors where otherwise platforms are subject to notice and takedown. I think that's likely for a couple of reasons. It has precedent, the Content ID precedent. It's an elegant, low-cost solution. And probably the most compelling reason is, we're not gonna have any choice.
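            A minimal sketch of what the automated, Content ID-style licensing check Goldstein describes could look like in code. Everything here is invented for illustration: the RightsMeta schema, its field names, and the budget rule correspond to no real standard or product.

            ```python
            # Hypothetical per-work rights metadata consulted before a work is
            # ingested into a training set; purely illustrative, not a real schema.
            from dataclasses import dataclass
            from typing import Optional

            @dataclass
            class RightsMeta:
                work_id: str
                opt_out: bool             # "I don't want to be copied"
                fee_usd: Optional[float]  # "copy me, but compensate me this amount"

            def may_train_on(meta: RightsMeta, budget_per_work_usd: float) -> bool:
                """Decide, prior to training, whether a work may enter the data set."""
                if meta.opt_out:
                    return False          # an Article 4.3-style reservation is honored
                if meta.fee_usd is not None:
                    return meta.fee_usd <= budget_per_work_usd  # license only if affordable
                return True               # no reservation expressed

            corpus = [
                RightsMeta("img-001", opt_out=True, fee_usd=None),
                RightsMeta("img-002", opt_out=False, fee_usd=0.005),
                RightsMeta("img-003", opt_out=False, fee_usd=5000.0),
            ]
            print([m.work_id for m in corpus if may_train_on(m, budget_per_work_usd=0.01)])
            # ['img-002']: opt-outs and above-budget asks are excluded
            ```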
            • 34:00 - 34:30 Some of you may be familiar with Article 4.3 of the Digital Single Market Copyright Directive, which carves out an exception for training but says that, in the event a rights holder gives notice that it objects to the training, that notice must be honored. Well, Article 4.3, the opt-out provision, applies only to copying, to training, going
            • 34:30 - 35:00 on in Europe, in the individual countries, under the principle of territoriality. What's happened more recently, last month, was the adoption in the European Union of the AI Act, which in Article 53(c) makes the Article
            • 35:00 - 35:30 4.3 obligation an operational obligation across the board, not only for training that occurs within the countries of the European Union, but training that occurs anywhere. Let me just raise real quick: there's an obligation on platforms to put in place a policy to comply with Union copyright law, Article 4.3, and in particular to identify and comply with, including
            • 35:30 - 36:00 through state-of-the-art technology, a reservation of rights expressed pursuant to Article 4.3. Now, that is going to apply extraterritorially. It's going to apply to any platform that does its training anywhere, so long as they're doing business in the European Union. It's akin in that respect to the GDPR, the General Data Protection Regulation, which similarly imposes
            • 36:00 - 36:30 an extraterritorial obligation respecting data privacy on companies outside the European Union, if they are doing business in the European Union. And that has had the effect among American companies that are in that business of conforming their conduct, in part because of a vacuum of privacy law under U.S. law, but in part because they need to do business in the European Union. This is the so-called Brussels effect, and we will find ourselves,
            • 36:30 - 37:00 I believe, when the AI Act comes into force, it's about two years and a couple of months from now, in a position where that kind of compliance will be required. And, and worth noting, there is no fair use doctrine in Europe. There is no fair use doctrine in Europe. So, I want to say on the licensing issue, I mean, I am troubled by this, right, because I think the economics are fundamentally different than the economics of other places where
            • 37:00 - 37:30 licensing has worked, right? We have licensing for satellite broadcasts of transmissions, right? We have licensing for kind of covers of songs, right? And that works because the thing I'm using is one, or maybe a couple of, individual copyrighted works. And so we know who the people we want to pay are, and so forth. But I struggle with sort of what it would mean to say we'll pay a fee for training on two billion images selected from the
            • 37:30 - 38:00 LAION database. So Stability AI, right, the whole company's worth two billion dollars, maybe less right now after recent developments, right? But even if we said, all right, you know what, we're gonna take half of the entire value of the company and pay it in compulsory license fees to copyright owners, everybody gets fifty cents. I don't think, when people think about compulsory licensing, they think, I want my 50 cents. That's not $0.50 per use, that's $0.50, period, for training, right? And so what I worry about is that if in fact
            • 38:00 - 38:30 we're in a world that Paul's talking about, right, what we're gonna see is a bunch of people who say, sure, I'll license this, my thing, to be trained on for $5,000, right? Or maybe $500, right? And that's just impractical, right? No one can build a large training data set. There may be specialized ones, right? It may be that you want to train on a few particular things. Music might be a great example of where people would be willing to pay a certain amount of money.
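            Mark's fifty-cent figure, made explicit (illustrative arithmetic using the round numbers from the discussion, not real financials):

            ```python
            # Per-work payout if half a $2B company funded a one-time blanket license
            # across a two-billion-image corpus; round numbers from the panel.
            company_value = 2_000_000_000      # rough valuation, USD
            license_fund = company_value / 2   # suppose half the company's value is paid out
            works = 2_000_000_000              # images in the training corpus

            print(f"${license_fund / works:.2f} per work, one time")  # $0.50 per work
            ```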
            • 38:30 - 39:00 Right, to train a music data set, although that's often going to be because we want to generate things that seem very similar to your song. And that might not be very popular. So I worry a little bit that the practical effect of this is that it's not going to be, we'll get a licensing scheme that works. I just don't think the economics work for it. We'll get a bunch of people who say, right, sure, I'll do it for money, and then we just opt out completely, right? The result will have to be,
            • 39:00 - 39:30 we can't train on anyone who doesn't give it to us for free. Or, we can't train unless we happen to be Google or some other company that has gotten the ability to collect this information for other purposes and has put somewhere in their terms and conditions that we can use this for whatever purpose we want. If my royalty statements for my photos are any indication, it's about a hundredth of a cent for a photograph, give or take,
            • 39:30 - 40:00 I think, is what, I can't remember if it was a thousandth of a cent or a hundredth of a cent, so. Don't, don't spend it all in one place. I know. I think the total for the number of photos I have up in that library for that training data was about a penny. Mark, two responses. One is, you started off talking about compulsory licensing, which is going to be a non-starter. But notwithstanding that, bear in mind, we do have a compulsory license that was implemented in the Music Modernization Act a couple of years ago that creates a blanket
            • 40:00 - 40:30 license for making copies of all music, subject to a rate set by some administrators in Washington. So we do have that possibility for dealing with millions of works. It's a so-called, it's sort of a hybrid: a compulsory blanket license. And so everybody's work comes under it. On the second observation you made, which I think relates more to Content ID, which I think
            • 40:30 - 41:00 is right, is going to be the path going forward: it's a Content ID-like system. The situation there is, sure, there can be plenty of people who say, I want $5,000. And they're going to get rejected. And pretty quickly the market is going to drop down to the 0.5 cents per use, or per training use, whatever it is, because that's the most you're going to be paid. It's like a Spotify royalty, which is, you know, close to zero. So I think the market is going to
            • 41:00 - 41:30 drive down those demands. I'm not saying $5,000 is unreasonable. What I'm saying is, the market is not going to sustain it. Alright, so people wanted earlier to talk about output infringement, and I think we want to talk about the shift from training to output infringement. So, I noted earlier that training is an existential question, right, because the potential amounts of money here, if you can't do it legally, are enormous. Output
            • 41:30 - 42:00 infringement, by contrast, is a more specific and targeted problem, right? The vast majority of things generated by generative AI are not substantially similar to any copyrighted work. Alright, they are not infringing, they are not a problem. But sometimes it happens. Right, why does it happen? One reason it happens is what we call in computer science the deduplication problem. Right, so it's not that generative AI decided to copy your
            • 42:00 - 42:30 particular text from this particular instance. Right, it turns out there are 10 or 15,000 copies of this particular image, or of the Harry Potter books, floating around on the internet that got into the training data set. And so when you ask it a specific enough prompt, right, you get a result that is an amalgam of the closest universe of things, and all of those closest universe of things turn out to be the same image, and so the result looks like the same image or looks like the same text.
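            A minimal sketch of the deduplication idea Lemley is describing: collapse repeated copies of the same work before training, so one heavily mirrored text or image cannot dominate what the model memorizes. This illustrates only the exact-duplicate case via content hashing; production pipelines also catch near-duplicates (for example with MinHash over shingles), which this does not attempt.

            ```python
            # Exact-duplicate removal by content hash; illustrative only.
            import hashlib

            def dedupe(corpus: list[str]) -> list[str]:
                seen: set[str] = set()
                unique = []
                for doc in corpus:
                    digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
                    if digest not in seen:  # keep only the first copy of each work
                        seen.add(digest)
                        unique.append(doc)
                return unique

            corpus = ["the same mirrored paragraph"] * 15_000 + ["unrelated text"]
            print(len(dedupe(corpus)))  # 2: the 15,000 mirrored copies collapse to one
            ```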
            • 42:30 - 43:00 or looks like the same text. Second reason it  might happen is sort of deliberate prompting   of infringement. So when I work with the  folks in the computer science department,   right, if you ask if you ask ChatGPT to give  you a story about kids who go to a wizarding   school in Britain, it doesn't give you Harry  Potter or anything similar to Harry Potter.   But if you ask it to give you a story about  kids who go to a wizarding school in Britain
            • 43:00 - 43:30 that begins with, and then goes feed it into the  prompt, the first paragraph of Harry Potter. Well,   then it actually does spit out something  relatively close to the first chapter   because it recognizes that that particular  unique combination of words is likely to   occur only in specific contexts, along with. Other combinations of words and we see some   examples of the prompting infringement problem in  the New York Times lawsuit, right? Where the New   York Times says, Hey, look OpenAI spat out our  news story. And if you go look behind the scenes
            • 43:30 - 44:00 at the exhibits, right? It turns out that New York  Times OpenAI spat out our news story when we said,   give us a New York Times story with this title  that begins with the first nine paragraphs.   And, you know, that's, I mean, that is  an act of infringement, I think, alright,   although the question of who's responsible for  it, I think, is an interesting one. And then   there's another set of problems which I think  of as the baby Yoda problem, right and that is,   there may just be concepts, right that the,  that the, the, that the software recognizes as a
            • 44:00 - 44:30 concept in the same way it recognizes coffee cups to generate an image. There are enough Baby Yodas out there in the world, and they all look similar enough, that if you ask it, give me Baby Yoda, it knows what a Baby Yoda is and it will give you a very realistic-looking image of something that turns out to be copyrighted. So, as I think Angela said earlier, this is a potential problem.
            • 44:30 - 45:00 Right? I mean, even if you think training is fair use, generating an image of Baby Yoda seems much less likely to be a fair use. What do we do about it? I think we're pattern matching to stuff we saw before. And that's, I mean, this is a legal audience. Yeah, that's our bread and butter. But I just want to see if we can forecast a year or two into the future about what these outputs are. What is AI for at all? Why do we have it? People are still trying to figure that out. But I want to relate a very short story.
            • 45:00 - 45:30 Someone emailed me, and things were getting a little tense. They sent me a winky face emoji, and that was like a hit of dopamine. I was like, oh, maybe, maybe things are okay. It wasn't being conveyed in the language, but in that emoji there was something being communicated that couldn't be before. And at least what we're trying to do, and I think what a lot of procedural generation companies are trying to do, is expand
            • 45:30 - 46:00 the way that people can make outputs at all. Expand the vocabulary; we want you to be able to communicate new things with each other. And so from that perspective, I think we're getting very stuck just pattern matching to the world we see. And it's like caveman thoughts: money there. I made it. I want it. Give me money. Or also like, I'm scared because new. How do I do new? What is new for? But I think very soon, generative AI is going to be stuck to
            • 46:00 - 46:30 human culture. It's going to be like language.  We're not going to be able to communicate with   each other without it intermediating. So  I think it's really important to think   now how we want, what do we want the laws to be. Do we want copyright law to restrict how we think   when we start using it as such an inseparable  tool in our thinking? Do we want copyright law
            • 46:30 - 47:00 to restrict how we communicate with the people  we love? I think we're going to see it go,   you know, taking Max's challenge looking a  year or two ahead. I think we're going to   see some of the litigation go some of the same  ways that we saw with the file sharing cases.   And I do think, you know, the maybe the Grokster  case the inducement liability might, you You,   you know, to the example I gave earlier about  if you're training a AI algorithm specifically
            • 47:00 - 47:30 for a specific purpose, so you're creating  one to training it on the Star Wars universe   to create Star Wars content and, you know, to  encourage people to create Star Wars content   that is infringing, I think we're going to  see, you know, something along the lines of   an inducement liability on something like that. I think it'll be interesting to see with the mid   journey type and OpenAI chat GPT stuff  where it's a more general algorithm,
            • 47:30 - 48:00 more general purpose algorithm. It's gonna be  interesting to see where those go. But I do   think there, in some cases, we're going to see  an inducement form of liability come into play.   So, I'll just add on to that a, a few things  and this is the point that maybe I started to   make on the, on the training question that was  better suited here. If you imagine a scenario in   which a work is generated through an AI tool  that is plainly incorporates the, you know,
            • 48:00 - 48:30 the protected expression of a work, or is an identical copy. I mean, there are easier ways to do that than intentionally engineering a prompt to deliver it; again, back to our photocopier example. But assume that you are able to generate that. There are any number of perfectly valid and appropriate reasons you might want to do so. If I'm a brand owner with a really fantastic logo, for instance, and I want to feed that into
            • 48:30 - 49:00 a tool to ideate around potential variations, ways that I might expand upon it, I have every right to do that, and it would return content that I own. If I want to ask a tool to produce an image for me to use in an article highlighting, for instance, concerns over someone's creative content, I'm allowed to use that for news reporting, even if in another context, with another intent, it might be infringing. So,
            • 49:00 - 49:30 there is this desire, I think, as we have this conversation, to put everything in yes-or-no, black-or-white boxes, but it doesn't quite work that way. It's very contextual. There is no question that these tools can be misused for ill purposes, and copyright law does not allow that. What can't be said is that the fact that these tools are
            • 49:30 - 50:00 capable of generating content that in certain instances will look very much like the copyrighted content on which they were trained is necessarily improper. And I want to take us back to Warhol. The professor mentioned that in Warhol, Andy Warhol had been entitled, through a license, to make a silkscreen of a Lynn Goldsmith photograph of Prince. Now, he made more than one; he exceeded the scope of the license. He made 16. And later, when Prince died, a magazine wanted to write an article about Prince.
            • 50:00 - 50:30 They went back to the Andy Warhol Foundation, found out that there were additional silkscreens, and used one that hadn't been authorized. And the court found that's not okay. There was actually a licensing market for this work; in fact, that's how you got your hands on it in the first place. And you used it for the specific purpose for which it had originally been licensed, right? You used the photograph of Prince to create a work illustrating him in an article.
            • 50:30 - 51:00 What the court stopped short of saying is that the making of the work in and of itself, the other silkscreens that never appeared anywhere, was not fair. Because until those are deployed, until we know how they're being used, we don't know whether it's fair or not. It's sort of a Heisenberg uncertainty principle. If the painting had been hung in a gallery, or if it had been used for educational purposes,
            • 51:00 - 51:30 that may very well have been a fair use. And the court said, we can't decide that here; we're not going to decide that here. We're going to decide on this particular use. So I think it's right to say that output infringement, unlike training infringement, which is a kind of existential question, is a fact-specific, contextual question, right? What particular thing has been generated, and who has generated it? Although I do think, to go to Daniel's point, one difference here between this and the internet cases, which may matter a lot for the AI companies, is that it's not obvious your liability is only for inducement, right?
            • 51:30 - 52:00 And so one of the questions we resolved early on with the internet was this question of volition, right? If I ask a machine to give me something and it gives me something, who's making the copy? Who's the direct infringer? On the internet, we were happy to say that if I'm just hosting content that other people put up there, the fact that I deliver it in an automated way doesn't make me the maker of the copy;
            • 52:00 - 52:30 it's the person who's asking for it. AI looks a little different, though, right? Because I am generating a new thing in response to a prompt. I think there may be a line at which a specific enough prompt, clearly designed to misuse the system, might make the prompter and not the AI directly responsible for infringement. But I think there are going to be a bunch of cases where a prompt generates some output that is infringing, and it is the AI itself that is making the thing, and therefore it's the AI itself that is
            • 52:30 - 53:00 the direct infringer. That matters, I think, because copyright is a strict liability offense. The fact that you didn't intend to do it, the fact that you took efforts to try to prevent it from happening, won't necessarily mean that you avoid liability. Max. I guess I wonder who would do that? Like, if you want an exact image, you can just go find it on the internet. Who is using generative AI tools to intentionally copy what's in there? So, plaintiffs' lawyers is one answer, right? That's kind of a very small
            • 53:00 - 53:30 answer, right? Most of those examples have in fact been produced by the plaintiffs' lawyers in the case. But it's a fair question, right? My guess is it's not that. It's a very bad copy machine, a very inefficient thing to use as a copy machine, so I agree with that, in contrast to Grokster. What it's going to be, I think, is more the style case. It's going to be the sort of, I want something that looks similar to but not identical to this thing, and maybe it's too similar, maybe it's not. But that again is going to be a sort of
            • 53:30 - 54:00 case-by-case question. Well, it's also the derivative works more than the direct infringement, just to answer that question. Alright. So, we have lots more things I would love to talk about, but I also see people lining up with questions, so maybe we should hear what you want to talk about. Pablo. Alright. So, Professor Goldstein, I'm going to direct this fair use question to you, which is sort of an existential issue. You mentioned it sort of coming down to the transformation question and noted that there's fuzziness around that, which is not surprising given where that doctrine has to operate. But it occurs to me that, you know,
            • 54:00 - 54:30 back when you taught us copyright, if you had said we're all going to go brainstorm the most transformational use cases we can think of, almost a caricature of transforming something, nobody could have come up with: I'm going to create 30 billion parameters pointing randomly and then have them reorient themselves so that they can guess the next word of these texts. So even in a world where there's some fuzziness, what I don't understand is how this is anywhere close to a close call. And
            • 54:30 - 55:00 tell me what I'm missing, because it seems to me that, any fuzziness of the doctrine aside, we're so off the charts for transformation that it's game over: no licensing, nothing. Yeah. As usual, Pablo, you're not far off the mark. You weren't in class... no, I didn't mean to say you weren't present in class. You were always present in class. No, I think the way transformative
            • 55:00 - 55:30 use worked in the Google Books case was the use to which the copied works, the snippets from the copied works, were ultimately put. And so I think transformative use here, in looking at training activities, would ask: to what use are these measurements, these tokens, effectively being put? And they're being put to a use that is totally transformative as compared
            • 55:30 - 56:00 to the material on which they were trained. So, does that take care of it? Good. Okay. And I would just add to that: I wrote an amicus brief in the Warhol case, and there were a huge number of amicus briefs. And one of the concerns that we had, and that others have had, was that the focus of fair use analysis has turned on: is it transformative?
            • 56:00 - 56:30 Forgetting that there are three other factors, and even "transformative" is nowhere in the fair use statute. It's just become an easy shorthand: oh, it's transformative, so clearly it's fair use. But I believe, if I recall correctly from the opinion, the court said we have to analyze all four factors.
            • 56:30 - 57:00 And I think they even did in the Oracle v. Google case, which also turned on one particular factor. What's gotten lost in fair use jurisprudence is that transformativeness is not the only determinative factor. I wondered if you could comment on the acquisition of the training data, in particular terms of service on websites. Say I downloaded, I don't know, millions of hours of YouTube videos, and then built a video-generating system, you know,
            • 57:00 - 57:30 as an example. Yeah, just a random example. And, you know, I didn't do this, but one may have done this. Yeah. So I think the answer is probably not a copyright problem per se, but quite potentially a breach-of-contract, breach-of-terms-of-use problem, if in fact that's how you got it, right? Now, most of the datasets, I think most of the generalized ones, have used Common Crawl, which respects the robots.txt file. So they said, we're going to go
            • 57:30 - 58:00 crawl the internet for the universe of things that have said, yes, please crawl me and index me, in a technical system. I think many of the people who set robots.txt to yes didn't have generative AI in mind; they had "I want to appear in search engines" in mind, and it might be that in the future we start to distinguish those things. Or the LAION database, which is a sort of database of image categorizations that is in turn taken out of Common Crawl. That is an effort to sort of get around those
            • 58:00 - 58:30 problems. I don't think it gets around all of the copyright problems, in part because many of the people who put up information or data on a website and said, sure, please index me, might have illegally taken that information, right? The Books3 database might be an example of that. But I do think that if you are going to a specific website to crawl it and it does not have a robots.txt file that says crawling is no problem, you ought to be worried about that.
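For context on the mechanism being described: robots.txt is a plain-text file a site publishes to tell automated crawlers which paths they may fetch. A minimal sketch of such a check, using only Python's standard library, might look like the following; the crawler name "ExampleBot" and the URL are hypothetical examples, not anything named on the panel.

```python
# Minimal sketch: ask a site's robots.txt whether a given crawler may fetch a URL.
# "ExampleBot" and the target URL are hypothetical placeholders.
from urllib import robotparser
from urllib.parse import urlparse

def may_crawl(url: str, user_agent: str = "ExampleBot") -> bool:
    """Fetch the site's robots.txt and report whether user_agent may crawl url."""
    parts = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()  # download and parse the site's robots.txt file
    return rp.can_fetch(user_agent, url)

print(may_crawl("https://example.com/some/page"))
```

As the panel notes, though, a robots.txt entry written with search indexing in mind says nothing about whether the site owner contemplated generative AI training, and a "yes, crawl me" signal does not cure upstream copyright problems in the hosted material.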
            • 58:30 - 59:00 Hello, so my question is more about what the model would be for compensating the creators of content, whether that is text, like a webpage, a video, maybe uploaded to YouTube, or a photographic work. As industry and technology have developed, it seems people have found ways to compensate creators. So, for the web, we can embed AdSense
            • 59:00 - 59:30 and Google pays web page creators for their work. For video, we have YouTube, kind of determining algorithmically how to pay YouTube creators, and Spotify is doing that for audio. My question: do you think there will be the emergence of a Spotify or YouTube or AdSense, some type of technology that would
            • 59:30 - 60:00 be embedded in the content itself that could help companies like Midjourney or OpenAI actually compensate those creators? Paul, you want to take it first? Yeah, I think it's going to vary. It's a good question. You know, it's interesting that you use the example of Spotify, which is a smoothly functioning, albeit not very high-paying, operation, which rests on a compulsory
            • 60:00 - 60:30 license. It's the digital phonorecord delivery compulsory license of Section 115. And that is basically what drives Spotify, and they periodically have to negotiate rates between them and the publishers before the Copyright Royalty Board. But that's one way to do it. Another way is the YouTube way. Interestingly, whether it's an
            • 60:30 - 61:00 effectively negotiated license with Content ID or a compulsory license with Spotify, the legal infrastructure at the end of the day makes very little difference. It's the economic arrangements built on top of it that really count. Angela, real quickly, and then I think maybe one more. I just wanted to say it really depends, too, on whether you're suggesting that there should be some sort of licensing regime for outputs versus training.
            • 61:00 - 61:30 I mean, to Mark's point earlier, the only conceivable regime would involve some mechanism for a single training license fee. You cannot conceivably imagine that everybody who has taken a picture of a mountain somewhere, where a platform has been trained on two million pictures of mountains, would be entitled to any kind of compensation for an output that may have a mountain in it but looks nothing like those two million mountains.
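To make the scale problem concrete, here is a purely illustrative back-of-the-envelope calculation; both the flat fee and the training-set size are invented numbers, not figures from the panel.

```python
# Purely illustrative: split a hypothetical one-time training-license fee
# pro rata across every work in a training set. Both numbers are invented.
training_license_fee = 100_000_000       # hypothetical $100M flat fee
works_in_training_set = 5_000_000_000    # hypothetical five billion works

per_work_payout = training_license_fee / works_in_training_set
print(f"${per_work_payout:.2f} per work")  # prints: $0.02 per work
```

Even under these generous assumptions, the per-work payout is pennies, and the cost of identifying and paying each rightsholder would dwarf it; a pro-rata, per-work regime collapses in exactly the way the "billionth of a cent" remark below suggests.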
            • 61:30 - 62:00 That's never how copyright law was meant to work. And it would destroy the technology before it's had the ability to come to full fruition, to solve all sorts of problems we haven't even talked about on this panel. I mean, there are amazing, as yet undiscovered reasons we need these technologies developed: drug discovery, disease pathways, solving traffic, solving climate. Big questions, right? Not just cats on
            • 62:00 - 62:30 surfboards in Venezuela. And I'm so going to Google that now, Angela. If I can just make one comment on the business models, because one thing we didn't get to, that we were going to talk about, was celebrity impersonation and digital avatars, digital replicas. There are a ton of companies out there, in music too, trying to develop compensation and tracking models and different products to try to handle
            • 62:30 - 63:00 this licensing concept. So I think there is something there, and to me, that's going to work because it is specific to an individual or a few individuals, right? If I want to use your image, then I ought to be paying you. Right. I think it's much harder if it's everyone who's ever taken a picture of a mountain should get a billionth of a cent. All right, last question. Yeah, a question regarding the current court cases. What is unique about them compared to some of the historical ones regarding, you know, copyrighted data and IP indexed by search engines, as we have seen in so many cases?
            • 63:00 - 63:30 What makes the current cases different regarding the usage of the data? I mean, well, they're current, they're ongoing, so we don't know how they're going to come out. But I think, as Angela has kind of suggested, and as Paul suggested in response to Pablo's question, if the question is what existing precedent says as to training, existing precedent lines up pretty strongly in favor of: this is a very different purpose,
            • 63:30 - 64:00 it's a new technology, that's going to be a fair use. Things that might change it? A licensing market, right? If we thought there was a working licensing market, and therefore you're depriving people of revenue, that can change the fourth factor and it could change the analysis. And techlash, right? We are in a moment of sort of AI moral panic. And I think that affects judges, and it might well be that a legal decision that, with a different technology, in a different kind of psychological
            • 64:00 - 64:30 era would clearly have come out in favor of  the tech company might come out differently   because people are afraid of AI. That's not what should happen,   but it might happen. And I think with that,  we're going to have to stop. Thank you all.