Yann LeCun: Human Intelligence is not General Intelligence // AI Inside 63
Summary
In this illuminating episode of the "AI Inside" podcast, Yann LeCun, Chief AI Scientist at Meta and Turing Award winner, discusses the current limitations and future directions of AI technology, particularly focusing on Large Language Models (LLMs). LeCun provides insights into why the promise of Artificial General Intelligence (AGI) isn't just around the corner, emphasizing the need for machines that truly understand and interact with the physical world. He critiques the hype surrounding LLMs, asserting that while they're useful, they fall short of representing a comprehensive model of intelligence. Instead, he advocates for a future where AI progresses through open collaboration, akin to the early days of the internet, and stresses diversity in AI systems to truly capture global perspectives.
Highlights
Yann LeCun, often dubbed the 'godfather of AI,' sheds light on why LLMs like ChatGPT are not the ultimate path to AGI 📚.
LeCun emphasizes the importance of AI that understands and interacts with the real world 🌍.
He argues that human intelligence is specialized, not general, challenging the notion of AGI as a close reality 🔍.
Meta's strategy of open-sourcing LLAMA is aimed at democratizing AI development and fostering faster innovation 🌐.
LeCun envisions a future where AI assistants aid daily life, but notes achieving human-level intelligence is a significant challenge 🌟.
Key Takeaways
LLMs are useful tools, but they aren't the breakthrough to general intelligence 🎯.
AI needs to understand the physical world and have a world model to advance 🚀.
True intelligence isn't just about solving known problems but adapting to new ones 🌟.
Open-source AI, like Meta's LLAMA, opens doors for innovation beyond a few big players 🤝.
We shouldn't feel threatened by AI; it should empower us, working as our 'smart assistants' 🔧.
Overview
Yann LeCun, the Chief AI Scientist at Meta, candidly discusses the real-world applications and limitations of Large Language Models on the AI Inside podcast. He shares his skepticism about the prevalent notion that AI is on the brink of achieving Artificial General Intelligence.
LeCun proposes a forward-thinking perspective that AI systems need to deeply understand and model the physical world to reach new heights. In his view, creating open-source platforms like Meta's LLAMA is pivotal. This openness is intended to democratize AI research, accelerating innovation through global collaboration.
Looking to the future, LeCun imagines a world where AI assistants become integral companions in daily life. However, he maintains that reaching human-level AI isn't just around the corner—it requires groundbreaking advancements in understanding and technology.
Chapters
00:00 - 01:30: Introduction and Guest Welcome The chapter begins with an introduction to the main topics and themes that will be covered in the discussion. The host warmly welcomes the guest, outlining his background and expertise related to the subject matter. The conversation is set to explore various aspects of the topic, promising an in-depth and insightful dialogue.
01:30 - 05:00: Discussion on LLMs and AI Reliability In this chapter, Jason welcomes Yann LeCun, the chief AI scientist at Meta, to discuss topics related to Large Language Models (LLMs) and the reliability of AI systems.
05:00 - 10:00: Limits of AI and Future Research Directions The chapter is titled 'Limits of AI and Future Research Directions' and features a dialogue with a Turing Award winner, often referred to as the godfather of AI. The chapter opens with a welcome to Yann, who is a guest on the show, setting the stage for a discussion likely centered around the current limitations of artificial intelligence and potential avenues for future research in the field. The presence of a prominent figure like Yann suggests that the conversation will offer deep insights into AI challenges and innovations.
10:00 - 15:00: Challenges in Achieving Human-Level Intelligence The chapter opens with a conversation between Jason and Yann, where Jason refers to Yann as the 'godfather of AI.' Yann modestly responds by saying he shuts his ears to avoid turning red, indicating his humility despite the recognition.
15:00 - 30:00: Meta's Open Source Strategy and AI Future The chapter discusses the dynamics of the AI industry, specifically focusing on the differing opinions on the effectiveness and potential of Large Language Models (LLMs). Despite OpenAI securing substantial funding due to its success with LLM technology, there are concerns about diminishing returns. The chapter explores why some companies continue to heavily invest in generative AI and LLMs, possibly overlooking the limitations highlighted by certain experts.
30:00 - 45:30: Reflecting on Education and AI's Potential The chapter discusses the current and potential future applications of Large Language Models (LLMs) in various fields, particularly in coding and as AI assistants. Yann acknowledges the usefulness of these models but points out that they are not yet fully reliable, especially when considering their application as agentic systems.
Yann LeCun: Human Intelligence is not General Intelligence // AI Inside 63 Transcription
00:00 - 00:30
00:30 - 01:00 Jason: I am thrilled to welcome to AI Inside
Yann LeCun, chief AI scientist at Meta,
01:00 - 01:30 Turing Award winner, known by many as
the godfather of AI. Welcome to the show, Yann. It's really nice to meet you. Yann: Thanks for having me on.
01:30 - 02:00 Jason: Does it ever get old
hearing someone introduce you as the godfather of AI? Are you
kind of like, yeah, here we go again. Yann: I shut my ears so I don't turn red. Jason: But you can accept it at this point because
it's the truth. Well, there's so many different directions that we can go on this conversation.
And we end up talking about your work a lot, obviously, the work of Meta and this moment in
LLMs. I think the question that I have to kick things off is that we are so firmly implanted
into the current realm of artificial intelligence which really seems to be the LLM generation, and
there's probably something on the horizon around
02:00 - 02:30 that. But we're still firmly implanted there, and
you've been pretty opinionated on the limits of LLMs at a time when we're also seeing things like
OpenAI securing a record-breaking round of funding largely built on its success in LLM technology.
And so I see diminishing returns on one side. On the other, companies betting everything on
generative AI and LLMs. And I'm curious to know what you think as far as why they might not be
seeing what you're seeing about this technology.
02:30 - 03:00 Or maybe they are and they're just approaching
it differently. What are your thoughts there? Yann: Oh, maybe they are. There's no
question that LLMs are useful. I mean, particularly for coding assistants and
stuff like that. And in the future, probably for more general AI assistant jobs.
People are talking about agentic systems. It's still not totally reliable yet. For these
kinds of applications, the main issue, and it's
03:00 - 03:30 been a recurring problem with AI and computer
technology more generally, is the fact that you can see impressive demos. But when it comes
time to actually deploy a system that's reliable enough that you put it in the hands of
people and they use it on a daily basis, there's a big distance. It's much harder to
make those systems reliable enough. Right? Ten years ago, we were seeing demos of
cars driving themselves in the countryside, on streets, for about ten minutes before you had
to intervene. And we made a lot of progress
03:30 - 04:00 but we're still not to the point of having cars
that can drive themselves as reliably as humans, except if we cheat, which is fine, which is what
Waymo and others are doing. So there's been sort of a repeated history over the last seventy years
in AI of people coming up with a new paradigm and then claiming, okay, that's it. This is going to
take us to human level AI. Within ten years
04:00 - 04:30 the most intelligent entity on
the planet would be a machine. And every time it's turned out to be
false because the new paradigm either hits a limitation that people didn't see or
turned out to be really good at solving a subcategory of problems that didn't turn
out to be the general intelligence problem.
04:30 - 05:00 And so there's been generation
after generation of AI researchers, and industrialists, and founders, making those
claims, and they've been wrong every time. So, I don't want to poo-poo LLMs. They're
very useful. There should be a lot of investment in them. There should be a lot
of investment in infrastructure to run them, which is where most of the money is
going, actually. It's not to train them or anything. It's to run them in the end
serving billions of users potentially. But,
05:00 - 05:30 like every other computer technology, it can be
useful even if it's not human level intelligence. Now if we want to shoot for human
level intelligence, I think we should. We need to invent new techniques.
We're just nowhere near matching that. Jeff: I'm really grateful you're here, Yann, because I quote you constantly on this show and
elsewhere because you are the voice of realism, I think, in AI. And I don't hear you
spouting the hype that I hear elsewhere.
05:30 - 06:00 And you've been very clear about where
we are now. I think you've equated us to, maybe we're getting to the point
of a smart cat or a three year old. Yann: Not even
Jeff: Right. And I think you've also talked about
that we've hit the limits of what LLMs can do. So there is a next paradigm, a next leap. And I
think you've talked about that being about understanding reality better. But can you talk about where
you think research, where you're taking it
06:00 - 06:30 or where it should be going next? Where we should
be putting resources next, to get more out of AI? Yann: So I wrote a long paper three years ago
where I explained where I think AI research should go over the next ten years. This
was before the world learned about LLMs. And, of course, I knew about it because
we were working on it before. But this vision hasn't changed. It has not
been affected by the success of LLMs.
06:30 - 07:00 So here's the thing. We need machines
that understand the physical world. We need machines that are capable of
reasoning and planning. We need machines that have persistent memory. And we need
those machines to be controllable and safe, which means that they need to be driven by
objectives we give them. We give them a task, they accomplish it, or they give us
the answer to the question we ask, and that's it. Right? And they can't escape
whatever it is that we're asking them to do.
07:00 - 07:30 So, what I explained in that document
is one way we might potentially get to that point. And it's centered on a
central concept called "world model." So we all have world models in our head. And
animals do too, right? And it's basically the mental model that we have in our head that
allows us to predict what's going to happen in the world. Either because the world is being
the world or because of an action we might
take. So if we can predict the consequences of
our actions, then what we can do with that is, if we set ourselves an objective, a goal, a task
to accomplish, we can, using our world model, imagine whether a particular sequence of
actions will actually fulfill that goal. Okay? And that allows us to plan. So planning
and reasoning really is manipulating our
08:00 - 08:30 mental model to figure out if a particular
sequence of actions is going to accomplish a task that we set for ourselves. Okay? So that is
what psychologists call "System 2." A deliberate, I don't want to say conscious
because it's a loaded term, but deliberate process of thinking about
how to accomplish a task, essentially.
08:30 - 09:00 And that we don't know how to do really. I mean, we're making some progress at the research level.
A lot of the most interesting research in that domain is done in the context of robotics.
Because when you need to control a robot, you need to know in advance what the effect
of sending a torque on an arm is going to be. So this process, in fact, in control theory
and robotics, of imagining the consequences
09:00 - 09:30 of a sequence of actions and then basically, by
optimization, searching for a sequence of actions that satisfies the task, even has a name, even has
an acronym. It's called Model Predictive Control, MPC. It's a very classical method in
optimal control going back decades. The main issue with this is that in robotics
and control theory, the way this works, the world model is a bunch of equations that are
written by someone, by an engineer. If you want to control a robot arm or a rocket or something, you
can just write down the dynamical equations of it.
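To make the idea concrete, here is a minimal, purely illustrative sketch of Model Predictive Control in Python. The point-mass dynamics, cost, and random-shooting search are invented for illustration and are not from the episode; real controllers use far more sophisticated models and optimizers.

```python
import numpy as np

def dynamics(state, action, dt=0.1):
    """Hand-written world model: a 1-D point mass.
    state = (position, velocity); action = force."""
    pos, vel = state
    vel = vel + action * dt
    pos = pos + vel * dt
    return np.array([pos, vel])

def rollout(state, actions):
    """Imagine the consequences of a sequence of actions."""
    for a in actions:
        state = dynamics(state, a)
    return state

def plan(state, goal_pos, horizon=20, candidates=500, rng=np.random.default_rng(0)):
    """Model Predictive Control by random shooting: sample many action
    sequences, score each imagined outcome, keep the best one."""
    best_cost, best_seq = np.inf, None
    for _ in range(candidates):
        seq = rng.uniform(-1.0, 1.0, size=horizon)
        final = rollout(state, seq)
        cost = (final[0] - goal_pos) ** 2 + 0.1 * final[1] ** 2
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq

state = np.array([0.0, 0.0])
actions = plan(state, goal_pos=1.0)
print("first planned action:", actions[0])
```

In practice only the first planned action is executed and the plan is recomputed at the next step. The point LeCun goes on to make is that the `dynamics` function here is written by hand; for general AI, that world model would have to be learned.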
09:30 - 10:00 But what we need to do for AI systems,
we need this world model to be learned from experience or learned from observation.
So this is the kind of process that seems to be
maybe humans, infants, learning how the world works by observation. That's the part
that seems really complicated to reproduce. Now, this can be based on a very simple principle
which people have been playing with for a long
10:00 - 10:30 time without much success, called Self-supervised
Learning. And Self-supervised Learning has been incredibly successful in the context of natural
language understanding and LLMs and things like that. In fact, it's the basis of LLM. Right?
So you take a piece of text, and you train a big neural net to predict the next word in the
text. Okay? That's basically what it comes down to.
10:30 - 11:00 There are tricks to make this efficient and
everything. But that's the basis of LLMs. You just train it to predict the next word
in the text. And then when you use it, you have it predict the next word, shift
the predicted word into its viewing window, and then predict the second word, then
shift that in, pick the third. Right? That's autoregressive prediction. That's
what LLMs are based on. And the trick is how much money you can afford to hire
people to fine-tune it so it can answer
lot of money is going into right now.
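As a toy illustration of that shift-and-predict loop, here is a sketch in which a word-level bigram counter stands in for a trained neural net; this is not how any production LLM is implemented, just the shape of the loop.

```python
from collections import Counter, defaultdict

# "Training": count which word follows which, in a tiny toy corpus.
corpus = "the cat sat on the mat and the cat slept on the mat".split()
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(context):
    """Toy stand-in for an LLM: most likely next word given the last word."""
    last = context[-1]
    if last not in next_counts:
        return "<eos>"
    return next_counts[last].most_common(1)[0][0]

# Autoregressive generation: predict a word, shift it into the window, repeat.
context = ["the"]
for _ in range(6):
    word = predict_next(context)
    if word == "<eos>":
        break
    context.append(word)   # shift the prediction into the viewing window
print(" ".join(context))   # a toy continuation of "the"
```

A real LLM replaces the lookup table with a large neural net that outputs a probability distribution over a whole vocabulary, plus the fine-tuning step mentioned above, but the generation loop is the same: predict, shift the prediction into the window, predict again.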
11:00 - 11:30 So you could imagine using this principle
of Self-supervised Learning for learning representations of images, learning to predict
what's going to happen in a video. Right? So if you show a video to a computer, and train some big
neural net to predict what's going to happen next in a video, if the system is capable of learning
this and doing a good job at that prediction, it will probably have understood a
lot about the underlying nature of
11:30 - 12:00 the physical world. The fact that objects
move according to particular laws, right? So animate objects can move in ways that are more
unpredictable but still satisfy some constraints. Objects that are not
supported fall because of gravity, etcetera. Right? Now, human babies take nine months
to learn about gravity. It's a long process.
12:00 - 12:30 Young animals, I think, learn this much quicker,
but they don't have the same kind of grasp of what gravity really is in the end. Although cats
and dogs are really good at this, obviously. So how do we reproduce this kind of training? So if
we do the naive thing, which I've been working on for twenty years, doing a similar thing as taking
a piece of text, but just taking a video and then training a system to predict what happens next in
the video, it doesn't really work. So if you're
12:30 - 13:00 training to predict the next frame, it doesn't
learn anything useful because it's too easy. If you're training to predict longer term, it
really cannot predict what's going to happen in the video because there's a lot of plausible
things that might happen. Okay? So in the case of text, that's a very simple problem
because you only have a finite number of words in the dictionary. And so you can never
predict exactly what word follows a sequence, but you can predict a probability distribution
of all words in the dictionary. And that's good
13:00 - 13:30 enough. You can represent uncertainty in the
prediction. You can't do this with video. We do not know how to represent an appropriate probability
distribution over the set of all images or video frames or a video segment, for that matter. It's
actually a mathematically intractable problem. So it's not just a question of we don't have
big enough computers. It's just intrinsically intractable. So until maybe five, six
years ago, I didn't have any solution
13:30 - 14:00 to this. I don't think anybody had any
solution to this. And one solution that we came up with is a kind of architecture
that changes the way we would do this. Instead of predicting everything that happens
in the video, we basically train a system to learn a representation of the video, and we
make the prediction in that representation space. And that representation eliminates
a lot of details in the video that are just not predictable or impossible to figure out.
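A deliberately oversimplified sketch of that idea, predicting in a learned representation space rather than in pixel space: random tensors stand in for pairs of consecutive frames, the collapse-prevention machinery that real joint-embedding methods need is omitted, and this is in no way Meta's actual training code.

```python
import torch
import torch.nn as nn

FRAME_DIM, LATENT_DIM = 1024, 64   # made-up sizes for illustration

encoder = nn.Sequential(nn.Linear(FRAME_DIM, 256), nn.ReLU(), nn.Linear(256, LATENT_DIM))
predictor = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

frames = torch.randn(32, 2, FRAME_DIM)   # stand-in batch: (current frame, next frame)

for step in range(100):
    z_now = encoder(frames[:, 0])          # representation of the current frame
    with torch.no_grad():
        z_next = encoder(frames[:, 1])     # target representation, no gradient
    loss = ((predictor(z_now) - z_next) ** 2).mean()   # predict in latent space
    opt.zero_grad()
    loss.backward()
    opt.step()
```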
That kind of architecture is called a JEPA,
14:00 - 14:30 Joint Embedding Predictive Architecture. And
I can tell you a little bit about that later. But what may be surprising
about this is that it's not generative. So everybody is talking about
generative AI. My hunch is that the next generation of AI systems will be based
on non-generative models, essentially. Jason: So, what occurs to me in hearing you
talk about the real limitations of where we're at when we take a look at what everybody seems to
be claiming is so great about LLMs and that "we're right on the precipice of AGI, Artificial
General Intelligence, and here's the reason
14:30 - 15:00 why." It depends on who you ask. Right?
Some people are like, "it's right around the corner." Other people are like, "oh, it's already
here. Take a look at this. Isn't that amazing?" Jeff: Others are "it'll never be here." Jason: Yes. And then others are, "it'll never be
here." I think often on this show where we talk about this topic a little bit in disbelief, and I
think what you just said kind of punctuates that a little bit for me. How do you model around or
create a model that can really analyze all aspects of what you're talking about? Like, we've got LLMs
focusing on reasoning. Although maybe it's a different type of reasoning compared to what
we are looking at now. Maybe that is an actual reasoning in the way that humans reason. But
then you've got the physical world. You've got the planning, this persistent memory. All those
components that you talk about, when you put it that way, it really makes me more confident that
AGI is not right around the corner, that AGI is
15:00 - 15:30 really this distant theory that may never or,
at least a very, very long time down the road, come true? What are your thoughts on that?
Yann: Okay. So, first of all, there is absolutely
no question in my mind that at some point in the future, we'll have machines that are at least as
smart as humans in all the domains where humans are smart. Okay? That's not a question. People
have kind of had big philosophical questions about this. A lot of people still believe
that human nature is kind of impalpable, and we're never going to be able
to reduce this to computation.
15:30 - 16:00 I'm not a skeptic on that dimension. There's
no question in my mind that at some point, we'll have machines that are more intelligent than
us. They already are in narrow domains. Right? So then there is the question of what does
AGI really mean? What do you mean by general intelligence? Do you
mean intelligence that is as general as
16:00 - 16:30 human intelligence? If that's the case, then
okay, you can use that phrase, but it's very misleading because human intelligence is not
general at all. It's extremely specialized. We are shaped by evolution to only do the tasks
that are worth accomplishing for survival. And, we think of ourselves as having general
intelligence, but we're just not at all general. It's just that all the problems that we're not
able to apprehend, we can't think of them. And
16:30 - 17:00 so that makes us believe that we have general
intelligence, but we absolutely do not have general intelligence. Okay. So I think this phrase
is nonsense, first of all. It is very misleading. I prefer the phrase we use to
designate the concept of human level intelligence within Meta, which is AMI,
Advanced Machine Intelligence. Okay? This
17:00 - 17:30 is kind of a much more open concept.
We actually pronounce it "ami", which means friend in French. But let's call
it human level intelligence if you want. Right? So no question it will happen. It's
not going to happen next year. It's not going to happen two years from now. It may happen or
happen to some degree within the next ten years. Okay. So it's not that far away. If all of the
things that we are working on at the moment turn
17:30 - 18:00 out to be successful, then maybe within ten years,
we'll have a good handle on whether we can reach that goal. Okay. But it's almost certainly harder
than we think. And probably much harder than we think because it's always harder than we think.
Over the history of AI, it's always been harder than we think. You know, it's the story I was
telling you earlier. So, I'm optimistic. Okay?
18:00 - 18:30 I'm not one of those pessimists
who say we'll never get there. I'm not one of those pessimists that
says all the stuff we're doing right now is useless. It's not true.
It's very useful. I'm not one of those people who say we're going to need some quantum computing
or some completely new principle, blah blah blah. No. I think it's going to
be based on deep learning, basically. And that underlying principle,
I think, is going to stay with us for a long time. But within this domain
the type of things that we need to
18:30 - 19:00 discover and implement, we're not there
yet. We're missing some basic concepts. And the best way to convince yourself of this is
to say, okay. We have systems that can answer any question that has a response somewhere on the
Internet. We have systems that can pass the bar exam, which is basically information retrieval
to a large extent. We have systems that can
19:00 - 19:30 shorten a text and help us understand it. They
can criticize a piece of writing that we're doing, they can generate code. But generating
code is actually, to some extent, relatively simple because the syntax is
strong, and a lot of it is stupid. Right? We have systems that can solve equations,
that can solve problems as long as they've been trained to solve those problems.
If they see a new problem from scratch,
19:30 - 20:00 current systems just cannot find a solution.
There was actually a paper just recently that showed that if you test all the best LLMs on
the latest math Olympiad, they basically get zero performance, because there are new problems
they have not been trained to solve. So okay. So we have those systems that can manipulate
language, and that fools us into thinking that they are smart because we're
used to smart people being able to
20:00 - 20:30 manipulate language in smart ways. Okay.
But where is my domestic robot? Where is my level five self driving car? Where
is a robot that can do what a cat can do? Even a simulated robot that can do what a cat can
do. What cats can do. Right? And the issue is not that we can't build a robot. We can actually
build robots that have the physical abilities.
20:30 - 21:00 It's just that we don't know how to
make them smart enough. And it's much, much harder to deal with the real world and
to deal with systems that produce actions than to deal with systems that understand language.
And again, it's related to the part that I was mentioning before. Language is
discrete. It has strong structure. The real world is a huge mess,
and it's unpredictable. It's not deterministic. You know? It's
high dimensional. It's continuous.
21:00 - 21:30 It's got all the problems. So let's try to build something that can learn as
fast as a cat, first of all. Jeff: I've got so many questions for you, but
I'm going to stay on this for another minute. Should human level activity or thought even be the
model? Is that limiting? There's a wonderful book from some years ago by Alex Rosenberg called How
History Gets Things Wrong, in which he debunks the theory of mind, arguing that we
don't have this reasoning that we go through.
21:30 - 22:00 That in fact, we're kind of doing what an LLM does
in the sense that we have a bunch of videotapes in our head. And when we hit a circumstance, we find
the nearest videotape and play that and decide yes or no in that way. And so that does sound like
the human mind a bit. But the model we tend to have for the human mind is one of reasoning
and weighing things and so on. And, also, as you say, we are not generally intelligent, but
the machine conceivably could do things that we, right now it does things we cannot do.
It could do more. So when you think about
success and that goal, what is that model? A
cat would be a big victory, to get to the point of being a cat. But what's your larger goal? Is
it human intelligence, or is it something else? Yann: Well, it's a type of intelligence that
is similar to human and animal intelligence in the following way. Current AI systems
have a very hard time solving new problems
22:30 - 23:00 that they've never faced before. Right?
So they don't have this mental model, this world model I was telling you about
earlier, that allows them to kind of imagine what the consequences of their actions will be, or whatever.
They don't reason in that way. Right? I mean, an LLM certainly doesn't because the only
way it can do anything is just produce words, produce tokens. Right? So one way you trick an LLM
into spending more time thinking about a question,
23:00 - 23:30 a complex question than a simple question, is you
ask it to go through the steps of reasoning. And as a consequence, it produces more tokens and
then spends more computation answering that question. But it's a horrible trick. It's
a hack. It's not the way humans reason. Another example of what LLMs do is, for
writing code or answering questions,
23:30 - 24:00 you get an LLM to generate lots
and lots of sequences of tokens that have some decent level of probability or
something like that. And then you have a second neural net that sort of tries to evaluate each of
those and then picks the one that is best. Okay? It's sort of like producing lots and lots of
answers to a question and then having a critic tell you which of those answers is the best.
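Schematically, that sample-and-rank pattern looks like the sketch below; the `generate` and `score` functions are trivial stand-ins for a sampled LLM and a learned critic, invented here purely for illustration.

```python
import random

def best_of_n(question, generate, score, n=16):
    """Sample many candidate answers, then let a critic pick the best one."""
    candidates = [generate(question) for _ in range(n)]
    return max(candidates, key=score)

# Trivial stand-ins so the sketch runs on its own:
def generate(question):
    # A real system would sample an LLM with some temperature here.
    return f"candidate answer #{random.randint(0, 9)} to: {question}"

def score(answer):
    # A real critic would be a trained verifier or reward model.
    return random.random()

print(best_of_n("What is 2 + 2?", generate, score))
```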
Now there are a lot of AI systems that work this
24:00 - 24:30 way, and it works in certain situations. If you
want a system, your computer system to play chess, that's exactly how it works. It produces a tree
of all the possible moves from you, and then from your opponent, and then from you, and then from
your opponent. That tree grows exponentially. So you can't generate the entire tree. You have
to have some smart way of only generating a piece of the tree. And then you have what's called an
evaluation function or value function that picks
24:30 - 25:00 out the best branch in the tree that results in
a position that is most likely to win. And all of those things are trained nowadays. Okay?
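In miniature, that combination of expanding part of the game tree and scoring frontier positions with a value function looks like this sketch; a trivial counting game and a hand-written value function stand in for chess and the trained networks.

```python
def minimax(state, depth, maximizing, moves, apply_move, value):
    """Depth-limited game-tree search: expand part of the tree, then fall back
    on a value function to judge the positions at the frontier."""
    legal = moves(state)
    if depth == 0 or not legal:
        return value(state)
    scores = (minimax(apply_move(state, m), depth - 1, not maximizing,
                      moves, apply_move, value) for m in legal)
    return max(scores) if maximizing else min(scores)

# Toy stand-in "game": players alternately add 1 or 2 to a counter;
# the maximizing player wants it high, the minimizer wants it low.
print(minimax(0, depth=3, maximizing=True,
              moves=lambda s: [1, 2],
              apply_move=lambda s, m: s + m,
              value=lambda s: s))   # hand-written value; modern engines learn this
```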
They're neural nets basically that generate
the good branch in the trees and select it. That's a limited form of reasoning. Why
is it limited? And it's, by the way, a type of reasoning that humans are terrible at.
The fact that a $30 gadget that you buy at a toy
25:00 - 25:30 store can beat you at chess demonstrates that
humans totally suck at this kind of reasoning.
Okay? We're just really bad at it. We just don't have the memory capacity, the computing speed,
and everything. Right? So we're terrible at this. What we are really good at, though, the
kind of reasoning that we and cats and dogs and rats are really good at, is sort of planning
actions in the real world and planning them
25:30 - 26:00 in a hierarchical manner. So knowing that if we
want to, let me take an example in the human domain, but there are similar ones in sort of animal
tasks. Right? I mean, you see cats learning to open jars and jump on doors to open them and
open the lock of a door and things like that. So, they learn how to do this, and they learn how to
plan that sequence of actions to arrive at a goal,
26:00 - 26:30 which is getting to the other side,
perhaps to get food or something. You see squirrels doing this. Right? I
mean, they're pretty smart actually in the way they learn how to do this kind of stuff.
Now this is a type of planning that we don't know how to reproduce with machines.
And all of it is completely internal. It has nothing to do with language. Right? We
think, as humans, that thinking is related to language, but it's not. Animals
can think. People who don't talk can think.
26:30 - 27:00 And there are types of reasoning, in fact
most types of reasoning, that have nothing to do with language. So if I tell you: imagine
a cube floating in the air in front of you, or in front of us. Okay? Now rotate that
cube 90 degrees along a vertical axis.
the cube was horizontal, that the bottom was horizontal. You didn't imagine a cube that
was kind of sideways. And then you rotate it 90 degrees, and you know that it looks just
like the cube you started with because it's a
27:00 - 27:30 cube. It's got 90 degree symmetry. There's
no language involved in this reasoning. It's just, you know, images and sort of abstract
representations of the situation. And how do we do this? Like, we have those abstract representations
of thought, and then we can manipulate those representations through sort of virtual actions
that we imagine taking, like rotating the cube,
27:30 - 28:00 and then imagine the result. Right? And that
is what allows us to actually accomplish tasks, in the real world, at an abstract level. It doesn't matter what the cube is made
of, how heavy it is, whether it floats in front of us or not. You know? I mean, all of the
details don't matter, and the representation is abstract enough to really not care about those
details. If I plan to, I'm in New York. Right? If I plan to be in Paris tomorrow, I could
try to plan my trip to Paris in terms of
28:00 - 28:30 elementary actions I can take, which basically are
millisecond by millisecond controls of my muscles. But I can't possibly do this because it's several
hours of muscle control, and it will depend on information I don't have. Like, I can go on the
street and hail a taxi. I don't know how long it's going to take for a taxi to come by. I don't
know if the light is going to be red or green.
28:30 - 29:00 I cannot plan my entire trip. Right? So I have to
do hierarchical planning. I have to imagine that if I were to be in Paris tomorrow, I first have
to go to the airport and catch a plane. Okay. Now I have a subgoal: going to the airport.
How do I go to the airport? I'm in New York, so I can go down to the street, hail a taxi.
How do I go down to the street? Well, I have to walk to the elevator or the stairs, hit
the button, go down, walk out of the building.
29:00 - 29:30 And before that I have a subgoal: going to the
elevator or to the stairs. How do I even stand up from my chair? So can you explain in words
how you climb a stair or you stand up from your chair? You can't. Like this is low
level understanding of the real world. And at some point, in all those subgoals that I
just described, you get to a situation where you can just accomplish the task without really kind
of planning and thinking because you're used to
29:30 - 30:00 standing up from your chair. But the complexity of
this process, of imagining what the consequences of your actions are going to be with your internal
world model and then planning a sequence of actions to accomplish this task, that's the
big challenge of AI for the next few years. We're not there yet.
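A purely illustrative sketch of that hierarchical decomposition: the subgoal table below is hard-coded for the New York to Paris example from the conversation, and producing such a decomposition automatically, from a learned world model, is exactly the open problem being described.

```python
# Hard-coded hierarchy for the New York -> Paris example.
SUBGOALS = {
    "be in Paris tomorrow": ["get to the airport", "catch the plane"],
    "get to the airport": ["go down to the street", "hail a taxi"],
    "go down to the street": ["stand up from the chair", "take the elevator",
                              "walk out of the building"],
}

def plan(goal, depth=0):
    """Recursively expand a goal into subgoals until we reach actions
    that can be executed without really planning and thinking."""
    print("  " * depth + goal)
    for sub in SUBGOALS.get(goal, []):   # leaves are primitive, well-practiced actions
        plan(sub, depth + 1)

plan("be in Paris tomorrow")
```

The hard part pointed to here is that a real agent would have to produce this decomposition itself, at the right level of abstraction for each step.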
Jeff: So one question I've been wanting to ask.
This has been a great lesson, professor. I'm
30:00 - 30:30 really grateful for that, but I also want to get
to the current view of Meta's strategy on this. And the fact that Meta has decided
to go, what do we call it, open source or open or available or whatever,
but LLAMA is a tremendous tool. I, as an educator myself, am grateful. I'm
Emeritus of CUNY, but now I'm at Stony Brook, and it's because of LLAMA that universities
can run models and learn from them and build things. And it struck me, and I've said this
often, that I think that the Meta strategy,
30:30 - 31:00 your strategy here on LLAMA and company,
is a spoiler for much of the industry, but an enabler for tremendous open development,
whether it's academic or entrepreneurial. And so I'd love to hear from the horse's
mouth here, what's the strategy behind opening up LLAMA in the way that you've done? Yann: Okay. It's a spoiler
for exactly three companies. Jeff: Yeah. Well, exactly.
31:00 - 31:30 Yann: It's an enabler
for thousands of companies. So obviously, from a pure ethical point
of view, it's obviously the right thing to do. Right? I mean, LLAMA, LLAMA two, the
release of LLAMA two in qualified open source, has basically completely jump-started the AI
ecosystem not just in industry and startups,
31:30 - 32:00 but also in academia, as you were
saying. Right? I mean, academia basically doesn't have the means to train their own foundation
models at the same level as companies.
platform to be able to make contributions to AI research. And that's kind of one of the main
reasons for Meta to actually release those foundation models in open source is to
enable innovation, faster innovation.
32:00 - 32:30 And the question is not whether this or that
company is three months ahead of the other, which is really the case right now. The question
is, do we have the capabilities in the AI systems that we have at the moment to enable the
products we want to build? And the answer is no. The product that Meta wants to build
ultimately is an AI assistant, or maybe a collection of AI assistants, that is with us
at all times, maybe lives in our smart glasses,
32:30 - 33:00 that we can talk to. Maybe it displays
information in the lens and everything. And for those things to be maximally useful,
they would need to have human level intelligence. Now we know that moving towards human level
intelligence is not going, so first of all, it's not going to be an event. There's not
going to be a day where we don't have AGI and a day after which we have AGI. It's
just not going to happen this way.
33:00 - 33:30 Jeff: I'll buy you the drinks if that happens. Yann: Well, I should be buying you
drinks because it's not happening. It's not going to happen this way. Right? So
the question really would be how do we make fastest possible progress towards human level
intelligence? And since it's one of the biggest
33:30 - 34:00 scientific and technological challenges that we've
faced, we need contributions from anywhere in the world. There are good ideas that can come from
anywhere in the world. And we've seen an example with DeepSeek recently, right, which
surprised everybody in Silicon Valley. Didn't surprise many of us in the open
source world that much. Right? I mean, that's the point. It's sort of validation
of the whole idea of open source. And so good ideas can come from
anywhere. Nobody has a monopoly
34:00 - 34:30 on good ideas, except people who have an
incredibly inflated superiority complex. Jeff: Not that we're talking about
anybody in particular. Right? Yann: No. No. We're not talking about
anybody in particular. There is a high concentration of those people in
certain areas of the country. So and, of course, they have a vested interest in sort
of disseminating this idea that they somehow,
34:30 - 35:00 they are better than everybody else. So I
think it's still a major scientific challenge, and we need everybody to contribute. So
the best way we know how to do this in the context of academic research is you publish your
research, you publish your code in open source, as much as you can, and you get people to
contribute. And I think the history of AI over the last dozen years really shows that, I
mean, the progress has been fast because people were sharing code and scientific information.
And some, a few players in the space, started
35:00 - 35:30 closing up over the last three years because they
need to generate revenue from the technology. Now at Meta, we don't generate revenue from the
technology itself. We generate revenue from ads, and those ads rely on the quality of products that
we build on top of the technology. And they rely on the network effect of the social networks and
as a conduit to the people and the users. And so
35:30 - 36:00 the fact that we distribute our technology doesn't
hurt us commercially. In fact, it helps us. Jason: Yeah. 100%. Hearing you talk, you mentioned
the topic of wearables and glasses, and that, of course, always sparks my attention. I had
the opportunity to check out Google's Project Astra Glasses last December. And it has stuck
with me ever since, and really solidified my
36:00 - 36:30 view of, and we're not talking AI ten, twenty
years down the line and what it will become, but kind of more punctuating this moment in AI,
and that being a really wonderful next step for contextualizing the world while wearing a piece of
hardware that we might already be wearing. If it's a pair of glasses and looks like our normal
glasses, suddenly we have this extra context. And I guess the line that I've been able to draw
in talking with you between where we are now and where we're going potentially is not only
the context that experience gives the wearer, but for you, for Meta and for those creating these
systems, smart glasses out in the real world, taking in information on how humans live
and operate in our physical world could
36:30 - 37:00 be a really good source of knowledge to pull
from for what you were talking about earlier. Am I on the right track, or is that just one
piece, one very small piece of the puzzle? Yann: Well, it's a piece, an important piece.
But, yeah, I mean, the idea that you have an assistant with you at all times that
sees what you see, hears what you hear, if you let it, obviously.
See if you let it for sure. You know, but to some extent, is your confidant
and can help you perhaps even better than
37:00 - 37:30 how a human assistant could help you. I mean,
that's certainly an important vision. In fact, the vision is that you won't have a single
assistant. You will have a whole staff of intelligent virtual assistants, working along
with you. It's like all of us would be a boss. Okay? I mean, people feel threatened. Some
people feel threatened by the fact that machines would be smarter than us, but
we should feel empowered by it. I mean,
37:30 - 38:00 they're going to be working for us,
you know? I don't know about you, but as a scientist or as a manager in industry,
the best thing that can happen to you is you hire students or engineers or people
working for you that are smarter than you. That's the ideal situation. And
you shouldn't feel threatened by that. You should feel empowered
by it. So I think that's the future we should envision: a smart collection of
assistants that helps you in your daily life.
38:00 - 38:30 Maybe smarter than you. You give
them a task, they accomplish it, perhaps better than you. And that's great.
Now that connects to another point I wanted to make related to the previous question,
which is about open source. Which is that, in that future, most of our interactions with
the digital world will be mediated by AI systems. Okay. And that's why Google is a little
frantic right now because they know that
38:30 - 39:00 nobody is going to go to a search engine
anymore. Right? You're just going to talk to your AI assistant. So they're trying
to experiment with this within Google. That's going to be through glasses, so
they realize they probably have to build those. Like, I realized this several years
ago. So we have a bit of a head start, but that's really what's going to happen.
We're going to have those AIs sitting with us at all times. And they're going
to mediate all of our information diet.
39:00 - 39:30 Now if you think about this, if you
are a citizen anywhere in the world, you do not want your information diet to come from
AI assistants built by a handful of companies on the West Coast of the US or China. You want a high
diversity of AI assistants that, first of all, speak your own language, whether it's an obscure
dialect or a local language. Second of all,
39:30 - 40:00 understand your culture, your value system, your
biases, whatever they are. And so we need a high diversity of such assistants for the same reason
we need a high diversity of the press. Right? And I realize I'm talking to a journalism
professor here. But am I right?
40:00 - 40:30 Jeff: Amen. In fact, I think that's
what I celebrate: what the Internet and next AI can do is to tear down the
structure of mass media and open up media once again at a human level.
AI lets us be more human, I hope. Yann: I hope too. So the only way we can
achieve this with current technology is if the people building those assistants
with cultural diversity and everything,
40:30 - 41:00 have access to powerful open source foundation
models. Because they're not going to have the resources to train their own models. Right?
We need models that speak all the languages in the world, understand all the value
systems, and have all the biases that you can imagine in terms of culture,
political biases, whatever you want. And so there's going to be thousands of those
that we're going to have to choose from, and they're going to be built by small shops,
everywhere around the world. And they're going to have to be built on top of foundation
models trained by a large company like Meta or
41:00 - 41:30 maybe an international consortium that trains
those foundation models. The picture I see, the evolution of the market that I see, is
similar to what happened with the software infrastructure of the Internet in the
late nineties or the early two thousands, where in the early days of the Internet, you had
Sun Microsystems, Microsoft, HP, IBM, and a few others kind of pushing to provide the hardware
and software infrastructure of the Internet,
41:30 - 42:00 their own version of UNIX or whatever, or Windows
NT, and their own web server, and their own racks, and blah blah blah. All of this got completely
wiped out by Linux and commodity hardware. Right? And the reason it got wiped out is because
Linux as a platform software is more portable, more reliable, more secure, cheaper,
everything. And so Google was one of the first to do this, building infrastructure on commodity
hardware and open source operating system. Meta,
42:00 - 42:30 of course, did exactly the same thing, and
everybody is doing it now, even Microsoft. So, I think there's going to be a similar pressure
from the market to make those AI foundation models open and free because it's an infrastructure
like the infrastructure of the Internet. Jeff: How long have you been teaching? Yann: Twenty-two years. Twenty-two years.
42:30 - 43:00 Jeff: So what differences do you see in students
and their ambitions today in your field? Yann: I don't know. It's hard for me to
tell, because in the last dozen years or so, I've only taught graduate students. So I don't
see any significant change in PhD students, other than the fact that they come from all over
the world. I mean, there is something absolutely
43:00 - 43:30 terrifying happening in the US right now
where funding for research is being cut, and then there's sort of threats of visas not
being given to foreign students and things like that. I mean, it's completely going to
destroy the technological leadership in the US, if it's actually implemented the way it
seems to be going. Most PhD students in STEM,
43:30 - 44:00 science, technology, engineering,
mathematics, are foreign. And it's even higher in most engineering
disciplines at the graduate level. It's mostly foreign students. Most founders
or CEOs of tech companies are foreign-born.
the opportunity for American researchers to go there. I've got one more
question for you. Do you have a cat?
44:00 - 44:30 Yann: I don't, but, our youngest son has
a cat, and we watch the cat occasionally. Jeff: Okay. I wondered if that was your model. Jason: Alright. Well, Yann, this has been
wonderful. I know we've kept you just a slight bit longer than we had agreed to for your
schedule. So we really appreciate you carving out some time. Yeah. It's been really wonderful, and
it's wonderful to hear some of this, as Jeff said,
44:30 - 45:00 from the horse's mouth earlier because you
come up in our conversations quite a lot, and we really appreciate your perspective,
in the world of AI and all the work that you've done over the years. Thank you for
being here with us. This has been an honor. Jeff: Thank you for the sanity
you bring to the conversation. Yann: Well, thank you so much. It's
really been a pleasure talking with you.