Summary
AMD's MI300 is a groundbreaking leap in the world of semiconductors, incorporating five key technologies crucial for competitive chip design. The MI300 melds chiplets, advanced packaging, SoC designs, unified memory, and AI acceleration into one powerful product. It features an astounding 147 billion transistors, integrating more than 10,000 CDNA3 GPU cores with 24 Zen 4 CPU cores. This innovative chip is aimed at advancing efficiency and performance, particularly on the road to future zettascale computing. With the MI300, AMD sets the stage for future developments and challenges the dominance of competitors like Nvidia.
Highlights
AMD MI300 integrates revolutionary technologies for the next-gen semiconductor. 🤖
The extensive use of chiplets allows flexible and efficient design beyond traditional monolithic chips. 🌌
Advanced packaging with true 3D stacking makes efficient power and data management a reality. 🏗️
Unified memory architecture simplifies programming and enhances system-wide efficiency. 🧠
MI300's breakthrough in AI acceleration positions AMD as a strong competitor to Nvidia. 🏎️
Key Takeaways
AMD MI300 combines five crucial technologies: chiplets, advanced packaging, SoC design, unified memory, and AI acceleration. 🚀
It's a powerhouse with 147 billion transistors and integrates more than 10,000 CDNA3 GPU cores and 24 Zen 4 CPU cores. 💪
The MI300 is a fully functioning APU, breaking new ground in high-performance computing efficiency. 🌟
It aims to challenge Nvidia and to drive progress towards zettascale computing through improved efficiency. ⚡
It represents AMD's first step towards achieving integrated systems aimed at extreme performance and efficiency. 🌐
Overview
AMD's MI300 is nothing short of a marvel in the tech world, marrying five game-changing technologies into a single semiconductor chip. Each one of these technologies—chiplets, advanced packaging, SoC (System on Chip) design, unified memory, and AI acceleration—significantly enhances the performance and efficiency of the MI300, setting a new standard for high-performance computing and AI applications.
This technological symphony results in a chip boasting an overwhelming 147 billion transistors, equipped with over 10,000 CDNA3 GPU cores and 24 Zen 4 CPU cores. It's designed not just to perform but to redefine performance benchmarks, posing a serious challenge to industry giants like Nvidia. AMD's MI300 isn't just about raw power; it's about delivering power smartly, optimizing efficiency to work towards the colossal aim of zettascale computing.
What makes MI300 truly revolutionary is its design approach. By embracing a fully integrated APU that merges CPUs and GPUs into one harmonious unit, it breaks away from traditional concepts and opens new frontiers in efficiency and performance. This chip is AMD’s bold step towards a future where zettascale isn't a dream, it's an upcoming reality, driving everything from supercomputers to comprehensive AI solutions.
Deep-dive into the technology of AMD's MI300 Transcription
00:00 - 00:30 The semiconductor landscape is rapidly
changing. Over the past couple of years, five new technologies have surfaced, which I think will
become fundamental requirements for anyone trying to design and produce competitive silicon chips in
the future. Every single one of these technologies on its own is already very impactful, but combining
all five of them will be the ultimate challenge. And that's exactly what AMD is trying to do with
its next-gen HPC and AI accelerator MI300. Let's take a closer look at what these five important
technologies are, how AMD is using them for
00:30 - 01:00 MI300 and what all of this has to do with the
race to the first zettascale supercomputer. Until now the design of silicon chips
primarily revolved around two key aspects: architecture and process node. However, modern
chips have evolved to become much more than just the sum of these two elements. Architecture and
process node have expanded to encompass broader, overarching concepts. This shift has given rise
to next generation technologies. Number one won't
01:00 - 01:30 be a surprise for you, if you have watched
this channel before, it's chiplets. Instead of a single monolithic chip, almost all future
semiconductors will be chiplet based. The benefits of chiplet designs are plentiful and range from
simple cost savings, due to increased yield, the ability to use the perfect process node for
each individual chiplet, all the way to scaling up an entire product stack based on only a few
tapouts. There's no future without them. Number two is advanced packaging and 3D stacking. The
shift towards chiplets has made it essential to
01:30 - 02:00 closely integrate each individual chiplet into the
final product. While the process node was once the primary focus, packaging is increasingly gaining
prominence. In the future I expect that packaging will surpass the importance of the process node,
reflecting a fundamental change in chip design priorities. Number three is the transition
towards semiconductors that are primarily SoCs or APUs. These chips consolidate functions
that were previously separated into CPU, GPU and
02:00 - 02:30 I/O components. Apple has been a pioneer in this
area and many current CPUs essentially function as SoCs. This development also relies heavily
on the chiplet approach, with each building block of the SoC potentially serving as its own
distinct chiplet. Number four is the adoption of a unified memory architecture. As systems
become more tightly integrated, the importance of the memory architecture grows. Unified memory
enables all components of the SoC to access the
02:30 - 03:00 same memory pool, resulting in more efficient
memory allocation and simplified programming, since every part of the system can access the
same data. We have to move away from the idea that each component needs its own dedicated memory.
And last, but certainly not least, we have AI. While you could argue it's already part of an SoC
design, for me the integration of AI and machine learning acceleration is such an important aspect
of any future semiconductor, that it has its own dedicated point. And as we will learn later,
AI also plays a very important role when trying
03:00 - 03:30 to achieve zettascale. By the way, all the logos
I'm using here are AI generated. In a nutshell, future semiconductors will be chiplet based,
rely on advanced packaging, function as an SoC with unified memory and accelerate AI and machine
learning code. AMD's MI300 is such an interesting product because it's the first of its kind to
incorporate all five technologies. Before we dive into each of these technologies and examine how
the MI300 implements them, let's briefly review
03:30 - 04:00 the specifications. AMD's Instinct accelerators
are designed for supercomputers and as such MI300 is targeting the HPC and AI market. Its
predecessor, MI250X, powers Frontier, the world's first exascale supercomputer. I've talked about
very large chips before, but compared to MI300 and its 147 billion transistors all other chips seem
tiny. It's truly a monster of a chip and easily outclasses Intel's 100 billion transistor Ponte
Vecchio or Nvidia's 80 billion Hopper GPU. These
04:00 - 04:30 numbers are a clear indication of AMD's ambitions
for MI300 and while we don't have an official die size yet MI300 will use well over 1000 millimeters
squared of active silicon. In total MI300 combines more than 10,000 CDNA3 GPU cores with 24 Zen 4
CPU cores, lots of I/O, lots of cache and 128 gigabytes of HBM3 memory. The whole chip is
manufactured in a combination of TSMC N6 and
04:30 - 05:00 N5 process nodes, which brings us right to the
first of the five technologies: chiplets. Just by looking at pictures of MI300 the complexity
of its chiplet design isn't really apparent. It doesn't look that different to its predecessor
MI200 with what seems like four medium-sized dies in the middle, surrounded by a couple of smaller
chiplets, most likely the HBM3 memory. But this appearance is very deceiving, because what we are
looking at is the absolutely crazy combination of a large interposer with four six nanometer
base layer dies on top, on top of which AMD
05:00 - 05:30 stacked another nine, yes nine, five nanometer
chiplets. To round it off the interposer also connects eight 16 gigabyte stacked HBM3 memory
modules and another 8 smaller unidentified chips, most likely spacers used for physical integrity.
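As a quick sanity check on the layout just described, the component counts can be tallied in a few lines of Python. This is only an illustrative sketch: the class and field names are made up for this example, and the numbers are the ones quoted in the video, not an official AMD specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageLayout:
    """Component counts for the MI300 package as described in the video.

    Hypothetical helper class for illustration, not an AMD data structure.
    """
    base_dies_6nm: int = 4          # base layer dies sitting on the interposer
    compute_chiplets_5nm: int = 9   # chiplets stacked on top of the base dies
    hbm_stacks: int = 8             # HBM3 stacks connected via the interposer
    gb_per_hbm_stack: int = 16      # capacity per stack in gigabytes
    spacer_chips: int = 8           # small unidentified chips, presumed spacers

    def total_hbm_gb(self) -> int:
        # Total on-package memory capacity.
        return self.hbm_stacks * self.gb_per_hbm_stack

    def active_logic_dies(self) -> int:
        # Base dies plus compute chiplets; HBM and spacers are not counted.
        return self.base_dies_6nm + self.compute_chiplets_5nm


layout = PackageLayout()
print(layout.total_hbm_gb())       # 128
print(layout.active_logic_dies())  # 13
```

The tally matches the 128 gigabytes of HBM3 mentioned in the specifications and gives 13 logic dies in total.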
To understand the complexity of this layout let's take a look at this mock-up AMD provided. The
foundation of MI300 is the interposer, it connects the HBM3 memory and the base chiplets. The marked
area in yellow might not be the exact size,
05:30 - 06:00 but it's good enough to understand the concept.
On top of the interposer we have 8 stacked HBM3 memory chips, each adding 16 gigabytes for a
total of 128 gigabytes of memory. On pictures of the physical chip we can see smaller chips
sitting in between the HBM stacks that aren't present in this illustration, which means they
are most likely inert silicon used as spacers and don't have any active functionality. Next are
the four six nanometer base chips marked in pink. They sit on top of the interposer and underneath
the five nanometer compute chiplets. Since the
06:00 - 06:30 base chiplets are active silicon that most
likely houses I/O functionality and cache, AMD is using true 3D stacking. More on packaging
and 3D stacking in just a bit. And finally, on top of the four base chiplets, we have six
five nanometer CDNA3 GCDs containing the GPU cores and three five nanometer Zen 4
CCDs with the CPU cores. Each CCD holds 8 cores for a total of 24 Zen 4 cores on MI300. I think
this overview is the best way to understand how
06:30 - 07:00 AMD counts four 6 nanometer and nine 5 nanometer
chiplets, and the interesting part is that while it's a complicated layout from a packaging point of
view, it's a very simple design from a chiplet perspective. All AMD has to do is to tape out
one six nanometer cache and I/O base chiplet, one five nanometer CDNA3 GCD chiplet and
use the already existing 5 nanometer 8-core Zen 4 chiplet. It's literally just three different
chiplets and only two new individual tape outs
07:00 - 07:30 for MI300. All these chiplets are rather small
and not too complex in design. We know how tiny a single Zen 4 CCD is and while a CDNA3 GCD
will be slightly bigger, at this size we can still expect close to perfect yields. It's a mix
and match system that allows AMD to effortlessly scale their HPC and AI products. You could also
match nine Zen 4 CCDs with two CDNA3 GCDs or basically any other configuration.
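To make the yield argument concrete, here is a minimal sketch using the textbook Poisson yield model, where the share of defect-free dies falls exponentially with die area. The defect density and the chiplet area are assumptions picked purely for illustration; they are not AMD or TSMC figures. Only the roughly 1000 square millimeters of total silicon comes from the video.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson defect model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 equals 1 cm^2
    return math.exp(-defects_per_cm2 * area_cm2)


# Hypothetical inputs, chosen only to illustrate the trend.
DEFECTS_PER_CM2 = 0.1         # assumed defect density
CHIPLET_AREA_MM2 = 100.0      # assumed area of one small chiplet
MONOLITHIC_AREA_MM2 = 1000.0  # a single die with about MI300's total silicon area

chiplet_yield = poisson_yield(CHIPLET_AREA_MM2, DEFECTS_PER_CM2)
monolithic_yield = poisson_yield(MONOLITHIC_AREA_MM2, DEFECTS_PER_CM2)

print(f"small chiplet yield:  {chiplet_yield:.1%}")
print(f"monolithic die yield: {monolithic_yield:.1%}")
```

With these assumed inputs, roughly nine out of ten small chiplets come out defect-free, while a single huge die would be usable only about a third of the time. Real yield models are more involved, but the direction of the effect is the same.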
Depending on what the customer wants,
07:30 - 08:00 AMD always delivers the perfect product. And if
you are building a supercomputer, semi-custom silicon is not uncommon. We have seen chiplet
based architectures since the release of Zen 2 and AMD's X3D CPUs took it up a notch in verticality.
But nothing comes close to MI300! This chip is what AMD's R&D has been working towards for the
past decade, it's basically science fiction, the end goal: a truly modular chiplet design.
And it's not some far away product either, MI300 already exists in silicon and will launch
in the second half of this year. I think it's
08:00 - 08:30 impossible to overstate the importance of a design
like this! What if I told you that I didn't study anything related to semiconductors or computer
science and everything I know is based on many years of just following my passion and trying to
absorb as much knowledge as I can? That's why I'm really excited that brilliant.org is sponsoring
this video! The cool thing about Brilliant is that it doesn't feel like working, more like
exploring new topics with its intuitive and
08:30 - 09:00 interactive concepts. Brilliant offers thousands
of lessons related to math and computer science, from beginners to advanced levels. Just recently
they added a whole new introductory course on artificial intelligence, which really helps you
understand how neural networks function. AI stops being just "magic" and turns into this tangible
thing that you now understand. The best way to reach your goals and increase your knowledge about
the things you are interested in is to never stop learning. Visit brilliant.org/HighYield for a free
30-day trial plus the first 200 of you also get 20%
09:00 - 09:30 off Brilliant's annual premium subscription. If you
are curious and enjoy my videos I think you will love Brilliant! With that, let's get back to MI300
and its advanced packaging technology. If chiplets are the brain of MI300, the advanced packaging has
to be the heart. It distributes all the energy and data and is an equally important part of the
chip. Without proper packaging MI300 would be just a collection of useless chiplets. For MI300 AMD
uses a mix of 2.5D and 3D stacking. The easiest
09:30 - 10:00 way to explain the difference between 2.5 and 3D
stacking is that while in both cases a silicon chip sits on top of another silicon chip, only
with 3D stacking both of these chips are actually active. The 5 nanometer GCD and CCD chiplets on
top of the 6 nanometer I/O and cache base chiplets are true 3D stacking, while the HBM memory on
top of the interposer is 2.5D. It's vertical
10:00 - 10:30 but the interposer is only used for data and power
routing. A really interesting aspect is that the five nanometer compute chiplets sit on top of the
Infinity Cache, located in the base layer, while AMD's 3D V-Cache has it the other way around. On
a 7950X3D the cache chiplet sits on top of the CPU chiplet, creating rather interesting
engineering problems, something I extensively talked about in my previous video. MI300 proves
that AMD has the technology to implement a reverse stacking method. Currently we can only speculate
on the specific 3D stacking technology AMD and
10:30 - 11:00 TSMC are using to enable MI300. The first thing
that comes to mind is the same TSMC SoIC and TSV based technology as used in Zen 4 X3D, where the
base chiplet would use TSVs to directly wire into the compute chiplet above. The only question is if
this method provides enough bandwidth and if it's the most efficient and cost effective solution.
Another way would be using the large interposer to route the data and power connections. The idea
here is that the six nanometer base chiplets might
11:00 - 11:30 actually be a bit smaller than the 5 nanometer
compute chiplets stacked on top. This approach would allow for direct data and power routing,
without having to use TSVs to go through the chiplet itself. Twitter user AMDFanUwe created
some very interesting drawings which illustrate this concept very well. Such an approach would be
less complex and most likely cheaper to implement. The challenge is to create a stacking method that
fulfills the bandwidth needs, is low latency, doesn't complicate the power routing and at the
same time isn't too expensive. As you can see,
11:30 - 12:00 a super easy task. I'm hoping AMD will share more
information on the exact technology used sooner than later. Let me know in the comments down below
what technology you think AMD is using for MI300 and how you would approach the 3D stacking of
MI300. Next let's talk about another first in the HPC space. MI300 isn't a simple GPU like
its predecessors, but a fully featured APU combining CPU and GPU cores into one single
package, and with the included I/O functionality
12:00 - 12:30 of the base chiplets it is actually a true SoC. We
have seen the rise of SoCs in mobile and desktop, but high performance computing is an entirely new
frontier. If we talk about power draw in modern computers, most of us probably think of power
hungry GPUs and high-speed CPUs, but data interconnects are becoming another huge factor for power
consumption. As chips become faster, interconnects need to keep up. Transporting large amounts of
data at high speeds between physically separated
12:30 - 13:00 chips like CPU and GPU consumes a lot of power. By
reducing the physical distance you not only reduce latency, but also virtually eliminate
the energy requirements for data transfers. When I first heard about APU and SoC concepts, I
primarily saw it as space optimization, but now I understand that the main benefit is the increase
in efficiency. Especially supercomputers use a lot of their power budget for data connection,
the less data you have to move off silicon the better. With MI300's APU approach not only will
the interconnect power usage decrease drastically,
13:00 - 13:30 but motherboards also don't need to be as complex.
Two huge advantages in the server space! A single MI300 is already a fully functioning chip in and of
itself. Somewhat connected to the SoC design and another very impactful technology is the use
of unified memory. Apple's M-chips are already a great example of how unified memory benefits the
entire system. There are two reasons why unified memory is beneficial: first is the software.
If all parts of your system have access to the
13:30 - 14:00 same data, because they use the same memory pool,
programming becomes easier. The other reason is hardware related: having unified memory eliminates
redundant memory copies, which also means you need less total physical memory. And as you can
guess, on-package memory is more efficient too. In a nutshell, with unified memory you need less
memory capacity, it's on package and thus power efficient, it reduces mainboard complexity and
simplifies programming. A well-implemented unified
14:00 - 14:30 memory architecture has many advantages over the
legacy approach, but of course you need to be able to build such a system with the proper GPU and
CPU IP. And currently only AMD is capable of that, although Nvidia's Grace Superchip isn't far
behind, something we will look at in a future video. On the topic of Nvidia, let's talk about AI
acceleration. Artificial intelligence and machine learning are currently the number one topic
in tech and there is only one company that's clearly leading the pack: Nvidia. With MI300 AMD
is trying its best to catch up. CDNA3 focuses on
14:30 - 15:00 machine learning acceleration, AMD claims up to 8
times the AI training performance and 5 times the AI power efficiency. MI300 will also support new
math formats, most likely low precision INT and floating point operations to further accelerate
deep neural networks. This won't be enough to surpass Nvidia in my opinion, especially since
Nvidia's advantage is deeply rooted in its CUDA software package, something AMD has to invest a
lot of time and money into if they really want to challenge Nvidia. MI300 isn't only for AI, there
are plenty of other applications within the HPC
15:00 - 15:30 space it will excel at, but the overall direction
of the design is clear. AMD's goal is to challenge Nvidia and if MI300 won't bring victory, MI400
might. Now that we have talked about all five technologies AMD implemented with MI300, let me
tell you why AMD is so focused on integrating them into a single product. It was only a few weeks
ago, while watching an ISSCC presentation from AMD CEO Lisa Su, that I finally understood AMD's
goal. There's actually a bigger picture behind
15:30 - 16:00 all of this: achieving Zettascale performance.
Not with MI300, but with one of its hopefully many successors. I highly recommend you watch
the keynote yourself, but here's the gist of it: going from exascale to zettascale is not a
performance problem, eventually we will get there, it's an efficiency problem. Currently supercomputer
efficiency roughly doubles every 2.2 years, which doesn't sound too bad, does it? But if
we continue this trend and we finally are able
16:00 - 16:30 to build a zettascale supercomputer in the
mid-2030s, even if we achieve the same 2x efficiency increase every two years until then,
a single zettascale supercomputer would consume about 500 megawatts. That's a lot, that's nuclear
power plant levels of energy draw! Cooling such a system alone would be impossible! That's
why, if we want to enter the zettascale age, we need to find a way to drastically increase
efficiency, well above the current levels. And MI300 is exactly that, it's AMD's first
step towards a fully integrated system,
16:30 - 17:00 with a focus on not only increasing performance
but specifically using every technology available to be as efficient as possible. The chiplets, the
advanced packaging, the SoC design, the unified memory and the focus on AI all target efficiency.
You could get a similar performance with less engineering expense, with less R&D and with a less
complex layout, but the goal is not to create the cheapest or most convenient product, the goal
is a first step towards continuous efficiency
17:00 - 17:30 scaling. Reaching zettascale will require a
lot more R&D over the years, it will require technology that's not available yet. Off-chip
communication will be optical in the future, huge AI networks will exponentially increase
efficiency for specific calculations and 3D stacking will get even more crazy. But eventually
we will get there. Now it's your turn! Let me know your thoughts about the design of MI300 and
to be honest, I'm even more interested in how you think we can reach zettascale in the future. What
technologies will stay and which ones will fail?
17:30 - 18:00 Leave a comment down below and let's see if we can
correctly predict the future of semiconductors. You know what to do if you found this video
interesting and see you in the next one :)