Summary
AMD's MI300 is a groundbreaking leap in the world of semiconductors, incorporating five key technologies crucial for competitive chip design. The MI300 melds chiplets, advanced packaging, SoC design, unified memory, and AI acceleration into one powerful product. It features an astounding 147 billion transistors, integrating over 10,000 CDNA3 GPU cores with 24 Zen 4 CPU cores. This innovative chip is aimed at advancing efficiency and performance, particularly in achieving future zettascale computing capabilities. With the MI300, AMD sets the stage for future developments and challenges the dominance of competitors like Nvidia.
Highlights
AMD MI300 integrates revolutionary technologies for the next-gen semiconductor. 🤖
The extensive use of chiplets allows flexible and efficient design beyond traditional monolithic chips. 🌌
Advanced packaging with true 3D stacking makes efficient power and data management a reality. 🏗️
Unified memory architecture simplifies programming and enhances system-wide efficiency. 🧠
MI300's breakthrough in AI acceleration positions AMD as a strong competitor to Nvidia. 🏎️
Key Takeaways
AMD MI300 combines five crucial technologies: chiplets, advanced packaging, SoC design, unified memory, and AI acceleration. 🚀
It's a powerhouse with 147 billion transistors, integrating over 10,000 CDNA3 GPU cores and 24 Zen 4 CPU cores. 💪
The MI300 is a fully functioning APU, breaking new ground in high-performance computing efficiency. 🌟
Aims at challenging Nvidia and driving zettascale computing advancement through enhanced efficiency. ⚡
It represents AMD's first step towards achieving integrated systems aimed at extreme performance and efficiency. 🌐
Overview
AMD's MI300 is nothing short of a marvel in the tech world, marrying five game-changing technologies into a single semiconductor chip. Each one of these technologies—chiplets, advanced packaging, SoC (System on Chip) design, unified memory, and AI acceleration—significantly enhances the performance and efficiency of the MI300, setting a new standard for high-performance computing and AI applications.
This technological symphony results in a chip boasting an overwhelming 147 billion transistors, equipped with over 10,000 CDNA3 GPU cores and 24 Zen 4 CPU cores. It's designed not just to perform but to redefine performance benchmarks, posing a serious challenge to industry giants like Nvidia. AMD's MI300 isn't just about raw power; it's about delivering power smartly, optimizing efficiency to work towards the colossal aim of zettascale computing.
What makes MI300 truly revolutionary is its design approach. By embracing a fully integrated APU that merges CPUs and GPUs into one harmonious unit, it breaks away from traditional concepts and opens new frontiers in efficiency and performance. This chip is AMD’s bold step towards a future where zettascale isn't a dream, it's an upcoming reality, driving everything from supercomputers to comprehensive AI solutions.
Deep-dive into the technology of AMD's MI300: Transcription
00:00 - 00:30 The semiconductor landscape is rapidly
changing. Over the past couple of years, five new technologies have surfaced, which I think will
become fundamental requirements for anyone trying to design and produce competitive silicon chips in
the future. Each of these technologies on its own is already very impactful, but combining
all five of them will be the ultimate challenge. And that's exactly what AMD is trying to do with
its next-gen HPC and AI accelerator MI300. Let's take a closer look at what these five important
technologies are, how AMD is using them for
00:30 - 01:00 MI300 and what all of this has to do with the
race to the first zettascale supercomputer. Until now the design of silicon chips
primarily revolved around two key aspects: architecture and process node. However, modern
chips have evolved to become much more than just the sum of these two elements. Architecture and
process node have expanded to encompass broader, overarching concepts. This shift has given rise
to next generation technologies. Number one won't
01:00 - 01:30 be a surprise for you, if you have watched
this channel before, it's chiplets. Instead of a single monolithic chip, almost all future
semiconductors will be chiplet based. The benefits of chiplet designs are plentiful and range from
simple cost savings due to increased yield, to the ability to use the perfect process node for
each individual chiplet, all the way to scaling up an entire product stack based on only a few
tape-outs. There's no future without them.
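To put a number on the yield benefit, here's a minimal sketch using the textbook Poisson yield model; the defect density and die areas are illustrative assumptions, not figures from the video or TSMC:

```cpp
// Simple Poisson yield model Y = exp(-D * A): small dies dodge defects.
// Defect density and die areas below are illustrative assumptions only.
#include <cmath>
#include <cstdio>

double yield(double defects_per_cm2, double area_mm2) {
    return std::exp(-defects_per_cm2 * area_mm2 / 100.0);  // 100 mm^2 = 1 cm^2
}

int main() {
    const double d = 0.1;  // assumed defects per cm^2 on a mature node

    // One hypothetical ~1000 mm^2 monolithic die vs. a ~70 mm^2 chiplet.
    printf("hypothetical 1000 mm^2 monolithic die: %.0f%% yield\n",
           100.0 * yield(d, 1000.0));   // ~37% of dies are good
    printf("hypothetical   70 mm^2 chiplet:        %.0f%% yield\n",
           100.0 * yield(d, 70.0));     // ~93% of dies are good
    return 0;
}
```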
Number two is advanced packaging and 3D stacking. The shift towards chiplets has made it essential to
01:30 - 02:00 closely integrate each individual chiplet into the
final product. While the process node was once the primary focus, packaging is increasingly gaining
prominence. In the future I expect that packaging will surpass the importance of the process node,
reflecting a fundamental change in chip design priorities. Number three is the transition
towards semiconductors that are primarily SoCs or APUs. These chips consolidate functions
that were previously separated into CPU, GPU and
02:00 - 02:30 I/O components. Apple has been a pioneer in this
area and many current CPUs essentially function as SoCs. This development also relies heavily
on the chiplet approach, with each building block of the SoC potentially serving as its own
distinct chiplet. Number four is the adoption of a unified memory architecture. As systems
become more tightly integrated, the importance of the memory architecture grows. Unified memory
enables all components of the SoC to access the
02:30 - 03:00 same memory pool, resulting in more efficient
memory allocation and simplified programming, since every part of the system can access the
same data. We have to move away from the idea that each component needs its own dedicated memory.
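As a concrete illustration of the programming-model difference, here's a minimal sketch in AMD's HIP runtime; the kernel and buffer are made up for the example, and this is generic managed-memory code, not anything MI300-specific:

```cpp
// Sketch: one allocation visible to both CPU and GPU -- no explicit
// host<->device copies. Generic HIP managed memory, not MI300-specific.
#include <hip/hip_runtime.h>
#include <cstdio>

// Toy kernel: every GPU thread scales one element in place.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;

    // Single shared allocation instead of a host copy plus a device copy.
    hipMallocManaged((void**)&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // CPU writes the buffer...
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // ...the GPU updates it in place
    hipDeviceSynchronize();

    printf("data[0] = %.1f\n", data[0]);             // ...and the CPU reads the result
    hipFree(data);
    return 0;
}
```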
And last, but certainly not least, we have AI. While you could argue it's already part of an SoC
design, for me the integration of AI and machine learning acceleration is such an important aspect
of any future semiconductor, that it has its own dedicated point. And as we will learn later,
AI also plays a very important role when trying
03:00 - 03:30 to achieve zettascale. By the way, all the logos
I'm using here are AI generated. In a nutshell, future semiconductors will be chiplet based,
rely on advanced packaging, function as an SoC with unified memory and accelerate AI and machine
learning code. AMD's MI300 is such an interesting product because it's the first of its kind to
incorporate all five technologies. Before we dive into each of these technologies and examine how
the MI300 implements them, let's briefly review
03:30 - 04:00 the specifications. AMD's Instinct accelerators
are designed for supercomputers and as such MI300 is targeting the HPC and AI market. Its
predecessor, MI250X, powers Frontier, the world's first exascale supercomputer. I've talked about
very large chips before, but compared to MI300 and its 147 billion transistors all other chips seem
tiny. It's truly a monster of a chip and easily outclasses Intel's 100 billion transistor Ponte
Vecchio or Nvidia's 80 billion Hopper GPU. These
04:00 - 04:30 numbers are a clear indication of AMD's ambitions
for MI300 and while we don't have an official die size yet, MI300 will use well over 1000 millimeters
squared of active silicon. In total MI300 combines more than 10,000 CDNA3 GPU cores with 24 Zen 4
CPU cores, lots of I/O, lots of cache and 128 gigabytes of HBM3 memory. The whole chip is
manufactured in a combination of TSMC N6 and
04:30 - 05:00 N5 process nodes, which brings us right to the
first of the five technologies: chiplets. Just by looking at pictures of MI300 the complexity
of its chiplet design isn't really apparent. It doesn't look that different to its predecessor
MI200 with what seems like four medium-sized dies in the middle, surrounded by a couple of smaller
chiplets, most likely the HBM3 memory. But this appearance is very deceiving, because what we are
looking at is the absolutely crazy combination of a large interposer with four six nanometer
base layer dies on top, on top of which AMD
05:00 - 05:30 stacked another nine, yes nine, five nanometer
chiplets. To round it off the interposer also connects eight 16 gigabyte stacked HBM3 memory
modules and another 8 smaller unidentified chips, most likely spacers used for physical integrity.
To understand the complexity of this layout let's take a look at this mock-up AMD provided. The
foundation of MI300 is the interposer, it connects the HBM3 memory and the base chiplets. The marked
area in yellow might not be the exact size,
05:30 - 06:00 but it's good enough to understand the concept.
On top of the interposer we have 8 stacked HBM3 memory chips, each adding 16 gigabytes for a
total of 128 gigabytes of memory. On pictures of the physical chip we can see smaller chips
sitting in between the HBM stacks that aren't present in this illustration, which means they
are most likely inert silicon used as spacers and don't have any active functionality. Next are
the four six nanometer base chips marked in pink. They sit on top of the interposer and underneath
the five nanometer compute chiplets. Since the
06:00 - 06:30 base chiplets are active silicon that most
likely houses I/O functionality and cache, AMD is using true 3D stacking. More on packaging
and 3D stacking in just a bit. And finally, on top of the four base chiplets, we have six
five nanometer CDNA3 GCDs containing the GPU cores and three five nanometer Zen 4
CCDs with the CPU cores. Each CCD holds 8 cores for a total of 24 Zen 4 cores on MI300. I think
this overview is the best way to understand how
06:30 - 07:00 AMD counts four 6 nanometer and nine 5 nanometer
chiplets, and the interesting part is that while it's a complicated layout from a packaging point of
view, it's a very simple design from a chiplet perspective. All AMD has to do is to tape out
one six nanometer cache and I/O base chiplet, one five nanometer CDNA3 GCD chiplet and
use the already existing 5 nanometer 8-core Zen 4 chiplet. It's literally just three different
chiplets and only two new individual tape-outs
07:00 - 07:30 for MI300. All these chiplets are rather small
and not too complex in design. We know how tiny a single Zen 4 CCD is and while a CDNA3 GCD
will be slightly bigger, at this size we can still expect close to perfect yields. It's a mix
and match system that allows AMD to effortlessly scale their HPC and AI products. You could also
match nine Zen 4 CCDs with two CDNA3 GCDs, or basically any other configuration.
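A toy way to picture this mix-and-match system: treat the three tape-outs as building blocks and describe each product as a parts list. MI300's own counts are from the video; the second configuration is the hypothetical CPU-heavy variant mentioned above:

```cpp
// Toy model of the "three tape-outs, many products" idea.
// MI300's mix (4 base dies, 6 GCDs, 3 CCDs) is from the video; the second
// entry is the hypothetical variant mentioned there, not a real SKU.
#include <cstdio>

struct Config { const char* name; int base_dies, gcds, ccds; };

int main() {
    const Config lineup[] = {
        {"MI300 (as described)",   4, 6, 3},
        {"hypothetical CPU-heavy", 4, 2, 9},  // nine CCDs, two GCDs
    };
    for (const Config& c : lineup)
        printf("%-22s %d base / %d GCD / %d CCD -> %2d Zen 4 cores\n",
               c.name, c.base_dies, c.gcds, c.ccds, c.ccds * 8);
    return 0;
}
```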
Depending on what the customer wants,
07:30 - 08:00 AMD always delivers the perfect product. And if
you are building a supercomputer, semi-custom silicon is not uncommon. We have seen chiplet
based architectures since the release of Zen 2, and AMD's X3D CPUs took verticality up a notch.
But nothing comes close to MI300! This chip is what AMD's R&D has been working towards for the
past decade, it's basically science fiction, the end goal: a truly modular chiplet design.
And it's not some far away product either, MI300 already exists in silicon and will launch
in the second half of this year. I think it's
08:00 - 08:30 impossible to overstate the importance of a design
like this! What if I told you that I didn't study anything related to semiconductors or computer
science and everything I know is based on many years of just following my passion and trying to
absorb as much knowledge as I can? That's why I'm really excited that brilliant.org is sponsoring
this video! The cool thing about Brilliant is that it doesn't feel like working, more like
exploring new topics with its intuitive and
08:30 - 09:00 interactive concepts. Brilliant offers thousands
of lessons related to math and computer science, from beginners to advanced levels. Just recently
they added a whole new introductory course on artificial intelligence, which really helps you
understand how neural networks function. AI stops being just "magic" and turns into this tangible
thing that you now understand. The best way to reach your goals and increase your knowledge about
the things you are interested in is to never stop learning. Visit brilliant.org/HighYield for a free
30-day trial plus the first 200 of you also get 20%
09:00 - 09:30 off Brilliant's annual premium subscription. If you
are curious and enjoy my videos I think you will love Brilliant! With that, let's get back to MI300
and its advanced packaging technology. If chiplets are the brain of MI300, the advanced packaging
has to be the heart. It distributes all the energy and data and is an equally important part of the
chip. Without proper packaging MI300 would be just a collection of useless chiplets. For MI300 AMD
uses a mix of 2.5D and 3D stacking. The easiest
09:30 - 10:00 way to explain the difference between 2.5 and 3D
stacking is that while in both cases a silicon chip sits on top of another silicon chip, only
with 3D stacking both of these chips are actually active. The 5 nanometer GCD and CCD chiplets on
top of the 6 nanometer I/O and cache base chiplets are true 3D stacking, while the HBM memory on
top of the interposer is 2.5D. It's vertical
10:00 - 10:30 but the interposer is only used for data and power
routing. A really interesting aspect is the five nanometer compute chiplets, which sit on top of the
Infinity Cache located in the base layer, while AMD's 3D V-Cache has it the other way around. On
a 7950X3D the cache chiplet sits on top of the CPU chiplet, creating rather interesting
engineering problems, something I extensively talked about in my previous video. MI300 proves
that AMD has the technology to implement a reverse stacking method. Currently we can only speculate
on the specific 3D stacking technology AMD and
10:30 - 11:00 TSMC are using to enable MI300. The first thing
that comes to mind is the same TSMC SoIC and TSV based technology as used in Zen 4 X3D, where the
base chiplet would use TSVs to directly wire into the compute chiplet above. The only question is if
this method provides enough bandwidth and if it's the most efficient and cost effective solution.
Another way would be using the large interposer to route the data and power connections. The idea
here is that the six nanometer base chiplets might
11:00 - 11:30 actually be a bit smaller than the 5 nanometer
compute chiplets stacked on top. This approach would allow for direct data and power routing,
without having to use TSVs to go through the chiplet itself. Twitter user AMDFanUwe created
some very interesting drawings which illustrate this concept very well. Such an approach would be
less complex and most likely cheaper to implement. The challenge is to create a stacking method that
fulfills the bandwidth needs, is low latency, doesn't complicate the power routing and at the
same time isn't too expensive. As you can see,
11:30 - 12:00 a super easy task. I'm hoping AMD will share more
information on the exact technology used sooner rather than later. Let me know in the comments down below
what technology you think AMD is using for MI300 and how you would approach the 3D stacking of
MI300. Next let's talk about another first in the HPC space. MI300 isn't a simple GPU like
its predecessors, but a fully featured APU combining CPU and GPU cores into a single
package, and with the included I/O functionality
12:00 - 12:30 of the base chiplets it's actually a true SoC. We
have seen the rise of SoCs in mobile and desktop, but high performance computing is an entirely new
frontier. If we talk about power draw in modern computers, most of us probably think of power
hungry GPUs and high-speed CPUs, but data interconnects are becoming another huge factor for power
consumption. As chips become faster, interconnects need to keep up. Transporting large amounts of
data at high speeds between physically separated
12:30 - 13:00 chips like CPU and GPU consumes a lot of power. By
reducing the physical distance you not only reduce latency, but you also virtually eliminate
the energy requirements for data transfers. When I first heard about APU and SoC concepts, I
primarily saw them as a space optimization, but now I understand that the main benefit is the increase
in efficiency. Supercomputers in particular use a lot of their power budget for data connections;
the less data you have to move off silicon the better. With MI300's APU approach not only will
the interconnect power usage decrease drastically,
13:00 - 13:30 but motherboards also don't need to be as complex.
Two huge advantages in the server space! A single MI300 is already a fully functioning chip in and of itself.
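To get a feel for the numbers behind this efficiency argument, here's a back-of-the-envelope sketch; the picojoule-per-bit figures are generic orders of magnitude commonly cited for board-level versus on-package links, not MI300 measurements:

```cpp
// Ballpark energy cost of moving data between CPU and GPU. The pJ/bit
// values are generic, commonly cited orders of magnitude -- assumptions,
// not MI300 specs.
#include <cstdio>

int main() {
    const double traffic_bits_per_s = 1.0 * 8e12;  // assume 1 TB/s of CPU<->GPU traffic

    const double off_package_pj_per_bit = 10.0;    // board-level link between sockets
    const double on_package_pj_per_bit  = 1.0;     // interposer-level link

    printf("off-package: ~%.0f W just for data movement\n",
           traffic_bits_per_s * off_package_pj_per_bit * 1e-12);  // ~80 W
    printf("on-package:  ~%.0f W for the same traffic\n",
           traffic_bits_per_s * on_package_pj_per_bit * 1e-12);   // ~8 W
    return 0;
}
```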
Somewhat connected to the SoC design, another very impactful technology is the use
of unified memory. Apple's M-chips are already a great example of how unified memory benefits the
entire system. There are two reasons why unified memory is beneficial: the first is software.
If all parts of your system have access to the
13:30 - 14:00 same data, because they use the same memory pool,
programming becomes easier. The other reason is hardware related: having unified memory eliminates
redundant memory copies, which also means you need less total physical memory. And as you can
guess, on-package memory is more efficient too. In a nutshell, with unified memory you need less
memory capacity, it's on package and thus power efficient, it reduces mainboard complexity and
simplifies programming. A well-implemented unified
14:00 - 14:30 memory architecture has many advantages over the
legacy approach, but of course you need to be able to build such a system with the proper GPU and
CPU IP. And currently only AMD is capable of that, although Nvidia's Grace Superchip isn't far
behind, something we will look at in a future video. On the topic of Nvidia, let's talk about AI
acceleration. Artificial intelligence and machine learning are currently the number one topic
in tech and there is only one company that's clearly leading the pack: Nvidia. With MI300 AMD
is trying its best to catch up. CDNA3 focuses on
14:30 - 15:00 machine learning acceleration; AMD claims up to 8
times the AI training performance and 5 times the AI power efficiency. MI300 will also support new
math formats, most likely low precision INT and floating point operations to further accelerate
deep neural networks. This won't be enough to surpass Nvidia in my opinion, especially since
Nvidia's advantage is deeply rooted in its CUDA software package, something AMD has to invest a
lot of time and money into if they really want to challenge Nvidia. MI300 isn't only for AI; there
are plenty of other applications within the HPC
15:00 - 15:30 space it will excel at, but the overall direction
of the design is clear. AMD's goal is to challenge Nvidia, and if MI300 doesn't bring victory, MI400 might.
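To illustrate the low-precision math formats mentioned above, here's a minimal sketch of symmetric INT8 quantization; this is a generic example of why smaller number formats accelerate deep learning, not AMD's actual CDNA3 format:

```cpp
// Why low-precision formats speed up AI: a quarter of the bytes per value.
// Simple symmetric INT8 quantization -- illustrative only; the exact CDNA3
// formats were not public at the time of the video.
#include <cstdint>
#include <cmath>
#include <cstdio>

int main() {
    const float weights[4] = {0.81f, -0.32f, 0.05f, -0.97f};

    // Map [-max_abs, +max_abs] onto the INT8 range [-127, 127].
    float max_abs = 0.0f;
    for (float w : weights) max_abs = std::fmax(max_abs, std::fabs(w));
    const float scale = max_abs / 127.0f;

    for (float w : weights) {
        int8_t q = (int8_t)std::lround(w / scale);
        printf("%+.2f -> %+4d (dequantized: %+.2f)\n", w, q, q * scale);
    }
    // 1 byte instead of 4 per value: 4x the data per memory transfer, and
    // hardware can typically execute several times more INT8 ops per cycle.
    return 0;
}
```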
Now that we have talked about all five technologies AMD implemented with MI300, let me
tell you why AMD is so focused on integrating them into a single product. It was only a few weeks
ago, while watching an ISSCC presentation from AMD CEO Lisa Su, that I finally understood AMD's
goal. There's actually a bigger picture behind
15:30 - 16:00 all of this: achieving zettascale performance.
Not with MI300, but with one of its hopefully many successors. I highly recommend you watch
the keynote yourself, but here's the gist of it: going from exascale to zettascale is not a
performance problem, eventually we will get there, it's an efficiency problem. Currently supercomputer
efficiency roughly doubles every 2.2 years, which doesn't sound too bad, does it? But if
we continue this trend and we finally are able
16:00 - 16:30 to build a zettascale supercomputer in the
mid-2030s, even if we achieve the same 2x efficiency increase every two years until then,
a single zettascale supercomputer would consume about 500 megawatts. That's a lot, that's nuclear
power plant levels of energy draw! Cooling such a system alone would be impossible! That's
why, if we want to enter the zettascale age, we need to find a way to drastically increase
efficiency, well above the current levels. And MI300 is exactly that, it's AMD's first
step towards a fully integrated system,
16:30 - 17:00 with a focus on not only increasing performance
but specifically using every technology available to be as efficient as possible. The chiplets, the
advanced packaging, the SoC design, the unified memory and the focus on AI all target efficiency.
You could get similar performance with less engineering expense, with less R&D and with a less
complex layout, but the goal is not to create the cheapest or most convenient product, the goal
is a first step towards continuous efficiency
17:00 - 17:30 scaling. Reaching zettascale will require a
lot more R&D over the years, it will require technology that's not available yet. Off-chip
communication will be optical in the future, huge AI networks will exponentially increase
efficiency for specific calculations and 3D stacking will get even more crazy. But eventually
we will get there. Now it's your turn! Let me know your thoughts about the design of MI300 and
to be honest, I'm even more interested in how you think we can reach zettascale in the future. What
technologies will stay and which ones will fail?
17:30 - 18:00 Leave a comment down below and let's see if we can
correctly predict the future of semiconductors. You know what to do if you found this video
interesting and see you in the next one :)