Yann LeCun: Human Intelligence is not General Intelligence // AI Inside 63
Summary
In this illuminating episode of the "AI Inside" podcast, Yann LeCun, Chief AI Scientist at Meta and Turing Award winner, discusses the current limitations and future directions of AI technology, particularly focusing on Large Language Models (LLMs). LeCun provides insights into why the promise of Artificial General Intelligence (AGI) isn't just around the corner, emphasizing the need for machines that truly understand and interact with the physical world. He critiques the hype surrounding LLMs, asserting that while they're useful, they fall short of representing a comprehensive model of intelligence. Instead, he advocates for a future where AI progresses through open collaboration, akin to the early days of the internet, and stresses diversity in AI systems to truly capture global perspectives.
Highlights
Yann LeCun, often dubbed the 'godfather of AI,' sheds light on why LLMs like ChatGPT are not the ultimate path to AGI 📚.
LeCun emphasizes the importance of AI that understands and interacts with the real world 🌍.
He argues that human intelligence is specialized, not general, challenging the notion of AGI as a close reality 🔍.
Meta's strategy of open-sourcing LLAMA is aimed at democratizing AI development and fostering faster innovation 🌐.
LeCun envisions a future where AI assistants aid daily life, but notes achieving human-level intelligence is a significant challenge 🌟.
Key Takeaways
LLMs are useful tools, but they aren't the breakthrough to general intelligence 🎯.
AI needs to understand the physical world and have a world model to advance 🚀.
True intelligence isn't just about solving known problems but adapting to new ones 🌟.
Open-source AI, like Meta's LLAMA, opens doors for innovation beyond a few big players 🤝.
We shouldn't feel threatened by AI; it should empower us, working as our 'smart assistants' 🔧.
Overview
Yann LeCun, the Chief AI Scientist at Meta, candidly discusses the real-world applications and limitations of Large Language Models on the AI Inside podcast. He shares his skepticism about the prevalent notion that AI is on the brink of achieving Artificial General Intelligence.
LeCun proposes a forward-thinking perspective that AI systems need to deeply understand and model the physical world to reach new heights. In his view, creating open-source platforms like Meta's LLAMA is pivotal. This openness is intended to democratize AI research, accelerating innovation through global collaboration.
Looking to the future, LeCun imagines a world where AI assistants become integral companions in daily life. However, he maintains that reaching human-level AI isn't just around the corner—it requires groundbreaking advancements in understanding and technology.
Chapters
00:00 - 01:30: Introduction and Guest Welcome The chapter begins with an introduction to the main topics and themes that will be covered in the discussion. The host warmly welcomes the guest, outlining his background and expertise related to the subject matter. The conversation is set to explore various aspects of the topic, promising an in-depth and insightful dialogue.
01:30 - 05:00: Discussion on LLMs and AI Reliability In this chapter, Jason welcomes Yann LeCun, the chief AI scientist at Meta, to discuss topics related to Large Language Models (LLMs) and the reliability of AI systems.
05:00 - 10:00: Limits of AI and Future Research Directions The chapter is titled 'Limits of AI and Future Research Directions' and features a dialogue with a Turing Award winner, often referred to as the godfather of AI. The chapter opens with a welcome to Yann, who is a guest on the show, setting the stage for a discussion likely centered around the current limitations of artificial intelligence and potential avenues for future research in the field. The presence of a prominent figure like Yann suggests that the conversation will offer deep insights into AI challenges and innovations.
10:00 - 15:00: Challenges in Achieving Human-Level Intelligence The chapter opens with a conversation between Jason and Yann, where Jason refers to Yann as the 'godfather of AI.' Yann modestly responds by saying he shuts his ears to avoid turning red, indicating his humility despite the recognition.
15:00 - 30:00: Meta's Open Source Strategy and AI Future The chapter discusses the dynamics of the AI industry, specifically focusing on the differing opinions on the effectiveness and potential of Large Language Models (LLMs). Despite OpenAI securing substantial funding due to its success with LLM technology, there are concerns about diminishing returns. The chapter explores why some companies continue to heavily invest in generative AI and LLMs, possibly overlooking the limitations highlighted by certain experts.
30:00 - 45:30: Reflecting on Education and AI's Potential The chapter discusses the current and potential future applications of Large Language Models (LLMs) in various fields, particularly in coding and as AI assistants. Yann acknowledges the usefulness of these models but points out that they are not yet fully reliable, especially when considering their application as agentic systems.
Yann LeCun: Human Intelligence is not General Intelligence // AI Inside 63 Transcription
00:00 - 00:30
00:30 - 01:00 Jason: I am thrilled to welcome to AI Inside
Yann LeCun, chief AI scientist at Meta,
01:00 - 01:30 Turing Award winner, known by many as
the godfather of AI. Welcome to the show, Yann. It's really nice to meet you. Yann: Thanks for having me on.
01:30 - 02:00 Jason: Does it ever get old
hearing someone introduce you as the godfather of AI? Are you
kind of like, yeah, here we go again. Yann: I shut my ears so I don't turn red. Jason: But you can accept it at this point because
it's the truth. Well, there's so many different directions that we can go on this conversation.
And we end up talking about your work a lot, obviously, the work of Meta and this moment in
LLMs. I think the question that I have to kick things off is that we are so firmly implanted
into the current realm of artificial intelligence which really seems to be the LLM generation, and
there's probably something on the horizon around
02:00 - 02:30 that. But we're still firmly implanted there, and
you've been pretty opinionated on the limits of LLMs at a time when we're also seeing things like
OpenAI securing a record-breaking round of funding largely built on its success in LLM technology.
And so I see diminishing returns on one side. On the other, companies betting everything on
generative AI and LLMs. And I'm curious to know what you think as far as why they might not be
seeing what you're seeing about this technology.
02:30 - 03:00 Or maybe they are and they're just approaching
it differently. What are your thoughts there? Yann: Oh, maybe they are. There's no
question that LLMs are useful. I mean, particularly for coding assistants and
stuff like that. And in the future, probably for more general AI assistant jobs.
People are talking about agentic systems. It's still not totally reliable yet. For these
kinds of applications, the main issue, and it's
03:00 - 03:30 been a recurring problem with AI and computer
technology more generally, is the fact that you can see impressive demos. But when it comes
time to actually deploy a system that's reliable enough that you put it in the hands of
people and they use it on a daily basis, there's a big distance. It's much harder to
make those systems reliable enough. Right? Ten years ago, we were seeing demos of
cars driving themselves in the countryside, on streets, for about ten minutes before you had
to intervene. And we made a lot of progress
03:30 - 04:00 but we're still not to the point of having cars
that can drive themselves as reliably as humans, except if we cheat, which is fine, which is what
Waymo and others are doing. So there's been sort of a repeated history over the last seventy years
in AI of people coming up with a new paradigm and then claiming, okay, that's it. This is going to
take us to human level AI. Within ten years
04:00 - 04:30 the most intelligent entity on
the planet would be a machine. And every time it's turned out to be
false because the new paradigm either hits a limitation that people didn't see or
turned out to be really good at solving a subcategory of problems that didn't turn
out to be the general intelligence problem.
04:30 - 05:00 And so there's been generation
after generation of AI researchers, and industrialists, and founders, making those
claims, and they've been wrong every time. So, I don't want to poo-poo LLMs. They're
very useful. There should be a lot of investment in them. There should be a lot
of investment in infrastructure to run them, which is where most of the money is
going, actually. It's not to train them or anything. It's to run them in the end
serving billions of users potentially. But,
05:00 - 05:30 like every other computer technology, it can be
useful even if it's not human level intelligence. Now if we want to shoot for human
level intelligence, I think we should. We need to invent new techniques.
We're just nowhere near matching that. Jeff: I'm really grateful you're here, Yann, because I quote you constantly on this show and
elsewhere because you are the voice of realism, I think, in AI. And I don't hear you
spouting the hype that I hear elsewhere.
05:30 - 06:00 And you've been very clear about where
we are now. I think you've equated us to, maybe we're getting to the point
of a smart cat or a three year old. Yann: Not even
Jeff: Right. And I think you've also talked about
that we've hit the limits of what LLMs can do. So there is a next paradigm, a next leap. And I
think you've talked about that being about understanding reality better. But can you talk about where
you think research, where you're taking it
06:00 - 06:30 or where it should be going next? Where we should
be putting resources next, to get more out of AI? Yann: So I wrote a long paper three years ago
where I explained where I think AI research should go over the next ten years. This
was before the world learned about LLMs. And, of course, I knew about it because
we were working on it before. But this vision hasn't changed. It has not
been affected by the success of LLMs.
06:30 - 07:00 So here's the thing. We need machines
that understand the physical world. We need machines that are capable of
reasoning and planning. We need machines that have persistent memory. And we need
those machines to be controllable and safe, which means that they need to be driven by
objectives we give them. We give them a task, they accomplish it, or they give us
the answer to the question we ask, and that's it. Right? And they can't escape
whatever it is that we're asking them to do.
07:00 - 07:30 So, what I explained in that document
is one way we might potentially get to that point. And it's centered on a
central concept called "world model." So we all have world models in our head. And
animals do too, right? And it's basically the mental model that we have in our head that
allows us to predict what's going to happen in the world. Either because the world is being
the world or because of an action we might
take. So if we can predict the consequences of
our actions, then what we can do with that is, if we set ourselves an objective, a goal, a task
to accomplish, we can, using our world model, imagine whether a particular sequence of
actions will actually fulfill that goal. Okay? And that allows us to plan. So planning
and reasoning really is manipulating our
08:00 - 08:30 mental model to figure out if a particular
sequence of actions is going to accomplish a task that we set for ourselves. Okay? So that is
what psychologists call "System 2." A deliberate, I don't want to say conscious
because it's a loaded term, but deliberate process of thinking about
how to accomplish a task, essentially.
08:30 - 09:00 And that we don't know how to do really. I mean, we're making some progress at the research level.
A lot of the most interesting research in that domain is done in the context of robotics.
Because when you need to control a robot, you need to know in advance what the effect
of sending a torque on an arm is going to be. So this process, in fact, in control theory
and robotics, of imagining the consequences
09:00 - 09:30 of a sequence of actions and then basically, by
optimization, searching for a sequence of actions that satisfies the task, even has a name, even has
an acronym. It's called Model Predictive Control, MPC. It's a very classical method in
optimal control going back decades. The main issue with this is that in robotics
and control theory, the way this works, the world model is a bunch of equations that are
written by someone, by an engineer. If you want to control a robot arm or a rocket or something, you
can just write down the dynamical equations of it.
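To make the idea concrete, here is a minimal, purely illustrative sketch of Model Predictive Control in Python. The point-mass dynamics, cost, and random-shooting search are invented for illustration and are not from the episode; real controllers use far more sophisticated models and optimizers.

```python
import numpy as np

def dynamics(state, action, dt=0.1):
    """Hand-written world model: a 1-D point mass.
    state = (position, velocity); action = force."""
    pos, vel = state
    vel = vel + action * dt
    pos = pos + vel * dt
    return np.array([pos, vel])

def rollout(state, actions):
    """Imagine the consequences of a sequence of actions."""
    for a in actions:
        state = dynamics(state, a)
    return state

def plan(state, goal_pos, horizon=20, candidates=500, rng=np.random.default_rng(0)):
    """Model Predictive Control by random shooting: sample many action
    sequences, score each imagined outcome, keep the best one."""
    best_cost, best_seq = np.inf, None
    for _ in range(candidates):
        seq = rng.uniform(-1.0, 1.0, size=horizon)
        final = rollout(state, seq)
        cost = (final[0] - goal_pos) ** 2 + 0.1 * final[1] ** 2
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq

state = np.array([0.0, 0.0])
actions = plan(state, goal_pos=1.0)
print("first planned action:", actions[0])
```

In practice only the first planned action is executed and the plan is recomputed at the next step. The point LeCun goes on to make is that the `dynamics` function here is written by hand; for general AI, that world model would have to be learned.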
09:30 - 10:00 But what we need to do for AI systems,
we need this world model to be learned from experience or learned from observation.
So this is the kind of process that seems to be
maybe humans, infants, learning how the world works by observation. That's the part
that seems really complicated to reproduce. Now, this can be based on a very simple principle
which people have been playing with for a long
10:00 - 10:30 time without much success, called Self-supervised
Learning. And Self-supervised Learning has been incredibly successful in the context of natural
language understanding and LLMs and things like that. In fact, it's the basis of LLM. Right?
So you take a piece of text, and you train a big neural net to predict the next word in the
text. Okay? That's basically what it comes down to.
10:30 - 11:00 There are tricks to make this efficient and
everything. But that's the basis of LLMs. You just train it to predict the next word
in the text. And then when you use it, you have it predict the next word, shift
the predicted word into its viewing window, and then predict the second word, then
shift that in, pick the third. Right? That's autoregressive prediction. That's
what LLMs are based on. And the trick is how much money you can afford to hire
people to fine-tune it so it can answer
lot of money is going into right now.
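As a toy illustration of that shift-and-predict loop, here is a sketch in which a word-level bigram counter stands in for a trained neural net; this is not how any production LLM is implemented, just the shape of the loop.

```python
from collections import Counter, defaultdict

# "Training": count which word follows which, in a tiny toy corpus.
corpus = "the cat sat on the mat and the cat slept on the mat".split()
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(context):
    """Toy stand-in for an LLM: most likely next word given the last word."""
    last = context[-1]
    if last not in next_counts:
        return "<eos>"
    return next_counts[last].most_common(1)[0][0]

# Autoregressive generation: predict a word, shift it into the window, repeat.
context = ["the"]
for _ in range(6):
    word = predict_next(context)
    if word == "<eos>":
        break
    context.append(word)   # shift the prediction into the viewing window
print(" ".join(context))   # a toy continuation of "the"
```

A real LLM replaces the lookup table with a large neural net that outputs a probability distribution over a whole vocabulary, plus the fine-tuning step mentioned above, but the generation loop is the same: predict, shift the prediction into the window, predict again.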
11:00 - 11:30 So you could imagine using this principle
of Self-supervised Learning for learning representations of images, learning to predict
what's going to happen in a video. Right? So if you show a video to a computer, and train some big
neural net to predict what's going to happen next in a video, if the system is capable of learning
this and doing a good job at that prediction, it will probably have understood a
lot about the underlying nature of
11:30 - 12:00 the physical world. The fact that objects
move according to particular laws, right? So animate objects can move in ways that are more
unpredictable but still satisfy some constraints. Objects that are not
supported fall because of gravity, etcetera. Right? Now, human babies take nine months
to learn about gravity. It's a long process.
12:00 - 12:30 Young animals, I think, learn this much quicker,
but they don't have the same kind of grasp of what gravity really is in the end. Although cats
and dogs are really good at this, obviously. So how do we reproduce this kind of training? So if
we do the naive thing, which I've been working on for twenty years, doing a similar thing as taking
a piece of text, but just taking a video and then training a system to predict what happens next in
the video, it doesn't really work. So if you're
12:30 - 13:00 training to predict the next frame, it doesn't
learn anything useful because it's too easy. If you're training to predict longer term, it
really cannot predict what's going to happen in the video because there's a lot of plausible
things that might happen. Okay? So in the case of text, that's a very simple problem
because you only have a finite number of words in the dictionary. And so you can never
predict exactly what word follows a sequence, but you can predict a probability distribution
of all words in the dictionary. And that's good
13:00 - 13:30 enough. You can represent uncertainty in the
prediction. You can't do this with video. We do not know how to represent an appropriate probability
distribution over the set of all images or video frames or a video segment, for that matter. It's
actually a mathematically intractable problem. So it's not just a question of we don't have
big enough computers. It's just intrinsically intractable. So until maybe five, six
years ago, I didn't have any solution
13:30 - 14:00 to this. I don't think anybody had any
solution to this. And one solution that we came up with is a kind of architecture
that changes the way we would do this. Instead of predicting everything that happens
in the video, we basically train a system to learn a representation of the video, and we
make the prediction in that representation space. And that representation eliminates
a lot of details in the video that are just not predictable or impossible to figure out.
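A deliberately oversimplified sketch of that idea, predicting in a learned representation space rather than in pixel space: random tensors stand in for pairs of consecutive frames, the collapse-prevention machinery that real joint-embedding methods need is omitted, and this is in no way Meta's actual training code.

```python
import torch
import torch.nn as nn

FRAME_DIM, LATENT_DIM = 1024, 64   # made-up sizes for illustration

encoder = nn.Sequential(nn.Linear(FRAME_DIM, 256), nn.ReLU(), nn.Linear(256, LATENT_DIM))
predictor = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

frames = torch.randn(32, 2, FRAME_DIM)   # stand-in batch: (current frame, next frame)

for step in range(100):
    z_now = encoder(frames[:, 0])          # representation of the current frame
    with torch.no_grad():
        z_next = encoder(frames[:, 1])     # target representation, no gradient
    loss = ((predictor(z_now) - z_next) ** 2).mean()   # predict in latent space
    opt.zero_grad()
    loss.backward()
    opt.step()
```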
That kind of architecture is called a JEPA,
14:00 - 14:30 Joint Embedding Predictive Architecture. And
I can tell you a little bit about that later. But what may be surprising
about this is that it's not generative. So everybody is talking about
generative AI. My hunch is that the next generation of AI systems will be based
on non-generative models, essentially. Jason: So, what occurs to me in hearing you
talk about the real limitations of where we're at when we take a look at what everybody seems to
be claiming is so great about LLMs and that "we're right on the precipice of AGI, Artificial
General Intelligence, and here's the reason
14:30 - 15:00 why." It depends on who you ask. Right?
Some people are like, "it's right around the corner." Other people are like, "oh, it's already
here. Take a look at this. Isn't that amazing?" Jeff: Others are "it'll never be here." Jason: Yes. And then others are, "it'll never be
here." I think often on this show where we talk about this topic a little bit in disbelief, and I
think what you just said kind of punctuates that a little bit for me. How do you model around or
create a model that can really analyze all aspects of what you're talking about? Like, we've got LLMs
focusing on reasoning. Although maybe it's a different type of reasoning compared to what
we are looking at now. Maybe that is an actual reasoning in the way that humans reason. But
then you've got the physical world. You've got the planning, this persistent memory. All those
components that you talk about, when you put it that way, it really makes me more confident that
AGI is not right around the corner, that AGI is
15:00 - 15:30 really this distant theory that may never or,
at least a very, very long time down the road, come true? What are your thoughts on that?
Yann: Okay. So, first of all, there is absolutely
no question in my mind that at some point in the future, we'll have machines that are at least as
smart as humans in all the domains where humans are smart. Okay? That's not a question. People
have kind of had big philosophical questions about this. A lot of people still believe
that human nature is kind of impalpable, and we're never going to be able
to reduce this to computation.
15:30 - 16:00 I'm not a skeptic on that dimension. There's
no question in my mind that at some point, we'll have machines that are more intelligent than
us. They already are in narrow domains. Right? So then there is the question of what does
AGI really mean? What do you mean by general intelligence? Do you
mean intelligence that is as general as
16:00 - 16:30 human intelligence? If that's the case, then
okay, you can use that phrase, but it's very misleading because human intelligence is not
general at all. It's extremely specialized. We are shaped by evolution to only do the tasks
that are worth accomplishing for survival. And, we think of ourselves as having general
intelligence, but we're just not at all general. It's just that all the problems that we're not
able to apprehend, we can't think of them. And
16:30 - 17:00 so that makes us believe that we have general
intelligence, but we absolutely do not have general intelligence. Okay. So I think this phrase
is nonsense, first of all. It is very misleading. I prefer the phrase we use to
designate the concept of human level intelligence within Meta, which is AMI,
Advanced Machine Intelligence. Okay? This
17:00 - 17:30 is kind of a much more open concept.
We actually pronounce it "ami", which means friend in French. But let's call
it human level intelligence if you want. Right? So no question it will happen. It's
not going to happen next year. It's not going to happen two years from now. It may happen or
happen to some degree within the next ten years. Okay. So it's not that far away. If all of the
things that we are working on at the moment turn
17:30 - 18:00 out to be successful, then maybe within ten years,
we'll have a good handle on whether we can reach that goal. Okay. But it's almost certainly harder
than we think. And probably much harder than we think because it's always harder than we think.
Over the history of AI, it's always been harder than we think. You know, it's the story I was
telling you earlier. So, I'm optimistic. Okay?
18:00 - 18:30 I'm not one of those pessimists
who say we'll never get there. I'm not one of those pessimists that
says all the stuff we're doing right now is useless. It's not true.
It's very useful. I'm not one of those people who say we're going to need some quantum computing
or some completely new principle, blah blah blah. No. I think it's going to
be based on deep learning, basically. And that underlying principle,
I think, is going to stay with us for a long time. But within this domain
the type of things that we need to
18:30 - 19:00 discover and implement, we're not there
yet. We're missing some basic concepts. And the best way to convince yourself of this is
to say, okay. We have systems that can answer any question that has a response somewhere on the
Internet. We have systems that can pass the bar exam, which is basically information retrieval
to a large extent. We have systems that can
19:00 - 19:30 shorten a text and help us understand it. They
can criticize a piece of writing that we're doing, they can generate code. But generating
code is actually, to some extent, relatively simple because the syntax is
strong, and a lot of it is stupid. Right? We have systems that can solve equations,
that can solve problems as long as they've been trained to solve those problems.
If they see a new problem from scratch,
19:30 - 20:00 current systems just cannot find a solution.
There was actually a paper just recently that showed that if you test all the best LLMs on
the latest math Olympiad, they basically get zero performance, because there are new problems
they have not been trained to solve. So okay. So we have those systems that can manipulate
language, and that fools us into thinking that they are smart because we're
used to smart people being able to
20:00 - 20:30 manipulate language in smart ways. Okay.
But where is my domestic robot? Where is my level five self driving car? Where
is a robot that can do what a cat can do? Even a simulated robot that can do what a cat can
do. What cats can do. Right? And the issue is not that we can't build a robot. We can actually
build robots that have the physical abilities.
20:30 - 21:00 It's just that we don't know how to
make them smart enough. And it's much, much harder to deal with the real world and
to deal with systems that produce actions than to deal with systems that understand language.
And again, it's related to the part that I was mentioning before. Language is
discrete. It has strong structure. The real world is a huge mess,
and it's unpredictable. It's not deterministic. You know? It's
high dimensional. It's continuous.
21:00 - 21:30 It's got all the problems. So let's try to build something that can learn as
fast as a cat, first of all. Jeff: I've got so many questions for you, but
I'm going to stay on this for another minute. Should human level activity or thought even be the
model? Is that limiting? There's a wonderful book from some years ago by Alex Rosenberg called How
History Gets Things Wrong, in which he debunks the theory of mind, arguing that we
don't have this reasoning that we go through.
21:30 - 22:00 That in fact, we're kind of doing what an LLM does
in the sense that we have a bunch of videotapes in our head. And when we hit a circumstance, we find
the nearest videotape and play that and decide yes or no in that way. And so that does sound like
the human mind a bit. But the model we tend to have for the human mind is one of reasoning
and weighing things and so on. And, also, as you say, we are not generally intelligent, but
the machine conceivably could do things that we, right now it does things we cannot do.
It could do more. So when you think about
success and that goal, what is that model? A
cat would be a big victory, to get to the point of being a cat. But what's your larger goal? Is
it human intelligence, or is it something else? Yann: Well, it's a type of intelligence that
is similar to human and animal intelligence in the following way. Current AI systems
have a very hard time solving new problems
22:30 - 23:00 that they've never faced before. Right?
So they don't have this mental model, this world model I was telling you about
earlier, that allows them to kind of imagine what the consequences of their actions will be, or whatever.
They don't reason in that way. Right? I mean, an LLM certainly doesn't because the only
way it can do anything is just produce words, produce tokens. Right? So one way you trick an LLM
into spending more time thinking about a question,
23:00 - 23:30 a complex question than a simple question, is you
ask it to go through the steps of reasoning. And as a consequence, it produces more tokens and
then spends more computation answering that question. But it's a horrible trick. It's
a hack. It's not the way humans reason. Another example of what LLMs do is, for
writing code or answering questions,
23:30 - 24:00 you get an LLM to generate lots
and lots of sequences of tokens that have some decent level of probability or
something like that. And then you have a second neural net that sort of tries to evaluate each of
those and then picks the one that is best. Okay? It's sort of like producing lots and lots of
answers to a question and then having a critic tell you which of those answers is the best.
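Schematically, that sample-and-rank pattern looks like the sketch below; the `generate` and `score` functions are trivial stand-ins for a sampled LLM and a learned critic, invented here purely for illustration.

```python
import random

def best_of_n(question, generate, score, n=16):
    """Sample many candidate answers, then let a critic pick the best one."""
    candidates = [generate(question) for _ in range(n)]
    return max(candidates, key=score)

# Trivial stand-ins so the sketch runs on its own:
def generate(question):
    # A real system would sample an LLM with some temperature here.
    return f"candidate answer #{random.randint(0, 9)} to: {question}"

def score(answer):
    # A real critic would be a trained verifier or reward model.
    return random.random()

print(best_of_n("What is 2 + 2?", generate, score))
```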
Now there are a lot of AI systems that work this
24:00 - 24:30 way, and it works in certain situations. If you
want a system, your computer system to play chess, that's exactly how it works. It produces a tree
of all the possible moves from you, and then from your opponent, and then from you, and then from
your opponent. That tree grows exponentially. So you can't generate the entire tree. You have
to have some smart way of only generating a piece of the tree. And then you have what's called an
evaluation function or value function that picks
24:30 - 25:00 out the best branch in the tree that results in
a position that is most likely to win. And all of those things are trained nowadays. Okay?
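In miniature, that combination of expanding part of the game tree and scoring frontier positions with a value function looks like this sketch; a trivial counting game and a hand-written value function stand in for chess and the trained networks.

```python
def minimax(state, depth, maximizing, moves, apply_move, value):
    """Depth-limited game-tree search: expand part of the tree, then fall back
    on a value function to judge the positions at the frontier."""
    legal = moves(state)
    if depth == 0 or not legal:
        return value(state)
    scores = (minimax(apply_move(state, m), depth - 1, not maximizing,
                      moves, apply_move, value) for m in legal)
    return max(scores) if maximizing else min(scores)

# Toy stand-in "game": players alternately add 1 or 2 to a counter;
# the maximizing player wants it high, the minimizer wants it low.
print(minimax(0, depth=3, maximizing=True,
              moves=lambda s: [1, 2],
              apply_move=lambda s, m: s + m,
              value=lambda s: s))   # hand-written value; modern engines learn this
```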
They're neural nets basically that generate
the good branch in the trees and select it. That's a limited form of reasoning. Why
is it limited? And it's, by the way, a type of reasoning that humans are terrible at.
The fact that a $30 gadget that you buy at a toy
25:00 - 25:30 store can beat you at chess demonstrates that
humans totally suck at this kind of reasoning.
Okay? We're just really bad at it. We just don't have the memory capacity, the computing speed,
and everything. Right? So we're terrible at this. What we are really good at, though, the
kind of reasoning that we and cats and dogs and rats are really good at, is sort of planning
actions in the real world and planning them
25:30 - 26:00 in a hierarchical manner. So knowing that if we
want to, let me take an example in the human domain, but there are similar ones in sort of animal
tasks. Right? I mean, you see cats learning to open jars and jump on doors to open them and
open the lock of a door and things like that. So, they learn how to do this, and they learn how to
plan that sequence of actions to arrive at a goal,
26:00 - 26:30 which is getting to the other side,
perhaps to get food or something. You see squirrels doing this. Right? I
mean, they're pretty smart actually in the way they learn how to do this kind of stuff.
Now this is a type of planning that we don't know how to reproduce with machines.
And all of it is completely internal. It has nothing to do with language. Right? We
think, as humans, that thinking is related to language, but it's not. Animals
can think. People who don't talk can think.
26:30 - 27:00 And there are types of reasoning, in fact
most types of reasoning, that have nothing to do with language. So if I tell you: imagine
a cube floating in the air in front of you, or in front of us. Okay? Now rotate that
cube 90 degrees along a vertical axis.
the cube was horizontal, that the bottom was horizontal. You didn't imagine a cube that
was kind of sideways. And then you rotate it 90 degrees, and you know that it looks just
like the cube you started with because it's a
27:00 - 27:30 cube. It's got 90 degree symmetry. There's
no language involved in this reasoning. It's just, you know, images and sort of abstract
representations of the situation. And how do we do this? Like, we have those abstract representations
of thought, and then we can manipulate those representations through sort of virtual actions
that we imagine taking, like rotating the cube,
27:30 - 28:00 and then imagine the result. Right? And that
is what allows us to actually accomplish tasks, in the real world, at an abstract level. It doesn't matter what the cube is made
of, how heavy it is, whether it floats in front of us or not. You know? I mean, all of the
details don't matter, and the representation is abstract enough to really not care about those
details. If I plan to, I'm in New York. Right? If I plan to be in Paris tomorrow, I could
try to plan my trip to Paris in terms of
28:00 - 28:30 elementary actions I can take, which basically are
millisecond by millisecond controls of my muscles. But I can't possibly do this because it's several
hours of muscle control, and it will depend on information I don't have. Like, I can go on the
street and hail a taxi. I don't know how long it's going to take for a taxi to come by. I don't
know if the light is going to be red or green.
28:30 - 29:00 I cannot plan my entire trip. Right? So I have to
do hierarchical planning. I have to imagine that if I were to be in Paris tomorrow, I first have
to go to the airport and catch a plane. Okay. Now I have a subgoal: going to the airport.
How do I go to the airport? I'm in New York, so I can go down to the street, hail a taxi.
How do I go down to the street? Well, I have to walk to the elevator or the stairs, hit
the button, go down, walk out of the building.
29:00 - 29:30 And before that I have a subgoal: going to the
elevator or to the stairs. How do I even stand up from my chair? So can you explain in words
how you climb a stair or you stand up from your chair? You can't. Like this is low
level understanding of the real world. And at some point, in all those subgoals that I
just described, you get to a situation where you can just accomplish the task without really kind
of planning and thinking because you're used to
29:30 - 30:00 standing up from your chair. But the complexity of
this process, of imagining what the consequences of your actions are going to be with your internal
world model and then planning a sequence of actions to accomplish this task, that's the
big challenge of AI for the next few years. We're not there yet.
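A purely illustrative sketch of that hierarchical decomposition: the subgoal table below is hard-coded for the New York to Paris example from the conversation, and producing such a decomposition automatically, from a learned world model, is exactly the open problem being described.

```python
# Hard-coded hierarchy for the New York -> Paris example.
SUBGOALS = {
    "be in Paris tomorrow": ["get to the airport", "catch the plane"],
    "get to the airport": ["go down to the street", "hail a taxi"],
    "go down to the street": ["stand up from the chair", "take the elevator",
                              "walk out of the building"],
}

def plan(goal, depth=0):
    """Recursively expand a goal into subgoals until we reach actions
    that can be executed without really planning and thinking."""
    print("  " * depth + goal)
    for sub in SUBGOALS.get(goal, []):   # leaves are primitive, well-practiced actions
        plan(sub, depth + 1)

plan("be in Paris tomorrow")
```

The hard part pointed to here is that a real agent would have to produce this decomposition itself, at the right level of abstraction for each step.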
Jeff: So one question I've been wanting to ask.
This has been a great lesson, professor. I'm
30:00 - 30:30 really grateful for that, but I also want to get
to the current view of Meta's strategy on this. And the fact that Meta has decided
to go, what do we call it, open source or open or available or whatever,
but LLAMA is a tremendous tool. I, as an educator myself, am grateful. I'm
Emeritus of CUNY, but now I'm at Stony Brook, and it's because of LLAMA that universities
can run models and learn from them and build things. And it struck me, and I've said this
often, that I think that the Meta strategy,
30:30 - 31:00 your strategy here on LLAMA and company,
is a spoiler for much of the industry, but an enabler for tremendous open development,
whether it's academic or entrepreneurial. And so I'd love to hear from the horse's
mouth here, what's the strategy behind opening up LLAMA in the way that you've done? Yann: Okay. It's a spoiler
for exactly three companies. Jeff: Yeah. Well, exactly.
31:00 - 31:30 Yann: It's an enabler
for thousands of companies. So obviously, from a pure ethical point
of view, it's obviously the right thing to do. Right? I mean, LLAMA, LLAMA two, the
release of LLAMA two in qualified open source, has basically completely jump-started the AI
ecosystem not just in industry and startups,
31:30 - 32:00 but also in academia, as you were
saying. Right? I mean, academia basically doesn't have the means to train their own foundation
models at the same level as companies.
platform to be able to make contributions to AI research. And that's kind of one of the main
reasons for Meta to actually release those foundation models in open source is to
enable innovation, faster innovation.
32:00 - 32:30 And the question is not whether this or that
company is three months ahead of the other, which is really the case right now. The question
is, do we have the capabilities in the AI systems that we have at the moment to enable the
products we want to build? And the answer is no. The product that Meta wants to build
ultimately is an AI assistant, or maybe a collection of AI assistants, that is with us
at all times, maybe lives in our smart glasses,
32:30 - 33:00 that we can talk to. Maybe it displays
information in the lens and everything. And for those things to be maximally useful,
they would need to have human level intelligence. Now we know that moving towards human level
intelligence is not going, so first of all, it's not going to be an event. There's not
going to be a day where we don't have AGI and a day after which we have AGI. It's
just not going to happen this way.
33:00 - 33:30 Jeff: I'll buy you the drinks if that happens. Yann: Well, I should be buying you
drinks because it's not happening. It's not going to happen this way. Right? So
the question really would be how do we make fastest possible progress towards human level
intelligence? And since it's one of the biggest
33:30 - 34:00 scientific and technological challenges that we've
faced, we need contributions from anywhere in the world. There are good ideas that can come from
anywhere in the world. And we've seen an example with DeepSeek recently, right, which
surprised everybody in Silicon Valley. Didn't surprise many of us in the open
source world that much. Right? I mean, that's the point. It's sort of validation
of the whole idea of open source. And so good ideas can come from
anywhere. Nobody has a monopoly
34:00 - 34:30 on good ideas, except people who have an
incredibly inflated superiority complex. Jeff: Not that we're talking about
anybody in particular. Right? Yann: No. No. We're not talking about
anybody in particular. There is a high concentration of those people in
certain areas of the country. So and, of course, they have a vested interest in sort
of disseminating this idea that they somehow,
34:30 - 35:00 they are better than everybody else. So I
think it's still a major scientific challenge, and we need everybody to contribute. So
the best way we know how to do this in the context of academic research is you publish your
research, you publish your code in open source, as much as you can, and you get people to
contribute. And I think the history of AI over the last dozen years really shows that, I
mean, the progress has been fast because people were sharing code and scientific information.
And some, a few players in the space, started
35:00 - 35:30 closing up over the last three years because they
need to generate revenue from the technology. Now at Meta, we don't generate revenue from the
technology itself. We generate revenue from ads, and those ads rely on the quality of products that
we build on top of the technology. And they rely on the network effect of the social networks and
as a conduit to the people and the users. And so
35:30 - 36:00 the fact that we distribute our technology doesn't
hurt us commercially. In fact, it helps us. Jason: Yeah. 100%. Hearing you talk, you mentioned
the topic of wearables and glasses, and that, of course, always sparks my attention. I had
the opportunity to check out Google's Project Astra Glasses last December. And it has stuck
with me ever since, and really solidified my
36:00 - 36:30 view of, and we're not talking AI ten, twenty
years down the line and what it will become, but kind of more punctuating this moment in AI,
and that being a really wonderful next step for contextualizing the world while wearing a piece of
hardware that we might already be wearing. If it's a pair of glasses and looks like our normal
glasses, suddenly we have this extra context. And I guess the line that I've been able to draw
in talking with you between where we are now and where we're going potentially is not only
the context that experience gives the wearer, but for you, for Meta and for those creating these
systems, smart glasses out in the real world, taking in information on how humans live
and operate in our physical world could
36:30 - 37:00 be a really good source of knowledge to pull
from for what you were talking about earlier. Am I on the right track, or is that just one
piece, one very small piece of the puzzle? Yann: Well, it's a piece, an important piece.
But, yeah, I mean, the idea that you have an assistant with you at all times that
sees what you see, hears what you hear, if you let it, obviously.
See if you let it for sure. You know, but to some extent, is your confidant
and can help you perhaps even better than
37:00 - 37:30 how a human assistant could help you. I mean,
that's certainly an important vision. In fact, the vision is that you won't have a single
assistant. You will have a whole staff of intelligent virtual assistants, working along
with you. It's like all of us would be a boss. Okay? I mean, people feel threatened. Some
people feel threatened by the fact that machines would be smarter than us, but
we should feel empowered by it. I mean,
37:30 - 38:00 they're going to be working for us,
you know? I don't know about you, but as a scientist or as a manager in industry,
the best thing that can happen to you is you hire students or engineers or people
working for you that are smarter than you. That's the ideal situation. And
you shouldn't feel threatened by that. You should feel empowered
by it. So I think that's the future we should envision: a smart collection of
assistants that helps you in your daily life.
38:00 - 38:30 Maybe smarter than you. You give
them a task, they accomplish it, perhaps better than you. And that's great.
Now that connects to another point I wanted to make related to the previous question,
which is about open source. Which is that, in that future, most of our interactions with
the digital world will be mediated by AI systems. Okay. And that's why Google is a little
frantic right now because they know that
38:30 - 39:00 nobody is going to go to a search engine
anymore. Right? You're just going to talk to your AI assistant. So they're trying
to experiment with this within Google. That's going to be through glasses, so
they realize they probably have to build those. Like, I realized this several years
ago. So we have a bit of a head start, but that's really what's going to happen.
We're going to have those AIs sitting with us at all times. And they're going
to mediate all of our information diet.
39:00 - 39:30 Now if you think about this, if you
are a citizen anywhere in the world, you do not want your information diet to come from
AI assistants built by a handful of companies on the West Coast of the US or China. You want a high
diversity of AI assistants that, first of all, speak your own language, whether it's an obscure
dialect or a local language. Second of all,
39:30 - 40:00 understand your culture, your value system, your
biases, whatever they are. And so we need a high diversity of such assistants for the same reason
we need a high diversity of the press. Right? And I realize I'm talking to a journalism
professor here. But am I right?
40:00 - 40:30 Jeff: Amen. In fact, I think that's
what I celebrate: what the Internet and next AI can do is to tear down the
structure of mass media and open up media once again at a human level.
AI lets us be more human, I hope. Yann: I hope too. So the only way we can
achieve this with current technology is if the people building those assistants
with cultural diversity and everything,
40:30 - 41:00 have access to powerful open source foundation
models. Because they're not going to have the resources to train their own models. Right?
We need models that speak all the languages in the world, understand all the value
systems, and have all the biases that you can imagine in terms of culture,
political biases, whatever you want. And so there's going to be thousands of those
that we're going to have to choose from, and they're going to be built by small shops,
everywhere around the world. And they're going to have to be built on top of foundation
models trained by a large company like Meta or
41:00 - 41:30 maybe an international consortium that trains
those foundation models. The picture I see, the evolution of the market that I see, is
similar to what happened with the software infrastructure of the Internet in the
late nineties or the early two thousands, where in the early days of the Internet, you had
Sun Microsystems, Microsoft, HP, IBM, and a few others kind of pushing to provide the hardware
and software infrastructure of the Internet,
41:30 - 42:00 their own version of UNIX or whatever, or Windows
NT, and their own web server, and their own racks, and blah blah blah. All of this got completely
wiped out by Linux and commodity hardware. Right? And the reason it got wiped out is because
Linux as a platform software is more portable, more reliable, more secure, cheaper,
everything. And so Google was one of the first to do this, building infrastructure on commodity
hardware and open source operating system. Meta,
42:00 - 42:30 of course, did exactly the same thing, and
everybody is doing it now, even Microsoft. So, I think there's going to be a similar pressure
from the market to make those AI foundation models open and free because it's an infrastructure
like the infrastructure of the Internet. Jeff: How long have you been teaching? Yann: Twenty-two years. Twenty-two years.
42:30 - 43:00 Jeff: So what differences do you see in students
and their ambitions today in your field? Yann: I don't know. It's hard for me to
tell, because in the last dozen years or so, I've only taught graduate students. So I don't
see any significant change in PhD students, other than the fact that they come from all over
the world. I mean, there is something absolutely
43:00 - 43:30 terrifying happening in the US right now
where funding for research is being cut, and then there's sort of threats of visas not
being given to foreign students and things like that. I mean, it's completely going to
destroy the technological leadership in the US, if it's actually implemented the way it
seems to be going. Most PhD students in STEM,
43:30 - 44:00 science, technology, engineering,
mathematics, are foreign. And it's even higher in most engineering
disciplines at the graduate level. It's mostly foreign students. Most founders
or CEOs of tech companies are foreign-born.
the opportunity for American researchers to go there. I've got one more
question for you. Do you have a cat?
44:00 - 44:30 Yann: I don't, but, our youngest son has
a cat, and we watch the cat occasionally. Jeff: Okay. I wondered if that was your model. Jason: Alright. Well, Yann, this has been
wonderful. I know we've kept you just a slight bit longer than we had agreed to for your
schedule. So we really appreciate you carving out some time. Yeah. It's been really wonderful, and
it's wonderful to hear some of this, as Jeff said,
44:30 - 45:00 from the horse's mouth earlier because you
come up in our conversations quite a lot, and we really appreciate your perspective,
in the world of AI and all the work that you've done over the years. Thank you for
being here with us. This has been an honor. Jeff: Thank you for the sanity
you bring to the conversation. Yann: Well, thank you so much. It's
really been a pleasure talking with you.