AMD's MI300: The Future of Semiconductors!

Deep-dive into the technology of AMD's MI300

Estimated read time: 1:20

    Summary

    AMD's MI300 combines five technologies that are becoming essential for competitive chip design: chiplets, advanced packaging, SoC design, unified memory, and AI acceleration. It packs 147 billion transistors and integrates more than 10,000 CDNA3 GPU cores with 24 Zen 4 CPU cores. The chip targets efficiency as much as raw performance, with zettascale computing as the long-term goal. With the MI300, AMD sets the stage for future designs and challenges the dominance of competitors like Nvidia.

      Highlights

      • AMD MI300 integrates revolutionary technologies for the next-gen semiconductor. 🤖
      • The extensive use of chiplets allows flexible and efficient design beyond traditional monolithic chips. 🌌
      • Advanced packaging with true 3D stacking makes efficient power and data management a reality. 🏗️
      • Unified memory architecture simplifies programming and enhances system-wide efficiency. 🧠
      • MI300's breakthrough in AI acceleration positions AMD as a strong competitor to Nvidia. 🏎️
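
The yield benefit behind the chiplet highlight can be illustrated with a simple Poisson defect model, a common textbook approximation. This is a minimal sketch: the defect density and die areas are hypothetical values chosen for illustration, not AMD or TSMC figures, and `poisson_yield` is a name introduced here.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson defect model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


# Hypothetical inputs for illustration only (not AMD or TSMC data).
DEFECTS_PER_MM2 = 0.001  # assumed defect density
MONOLITHIC_AREA = 800.0  # assumed large monolithic die, in mm^2
CHIPLET_AREA = 100.0     # assumed small chiplet, in mm^2

monolithic_yield = poisson_yield(MONOLITHIC_AREA, DEFECTS_PER_MM2)
chiplet_yield = poisson_yield(CHIPLET_AREA, DEFECTS_PER_MM2)

print(f"monolithic die yield: {monolithic_yield:.1%}")
print(f"single chiplet yield: {chiplet_yield:.1%}")
```

Under this model yield falls exponentially with die area, which is why several small dies tend to waste less silicon than one large die of the same total area.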

      Key Takeaways

      • AMD MI300 combines five crucial technologies: chiplets, advanced packaging, SoC design, unified memory, and AI acceleration. 🚀
      • It's a powerhouse with 147 billion transistors, integrating more than 10,000 CDNA3 GPU cores and 24 Zen 4 CPU cores. 💪
      • The MI300 is a fully functioning APU, breaking new ground in high-performance computing efficiency. 🌟
      • Aims at challenging Nvidia and driving zettascale computing advancement through enhanced efficiency. ⚡
      • It represents AMD's first step towards achieving integrated systems aimed at extreme performance and efficiency. 🌐

      Overview

      AMD's MI300 is nothing short of a marvel in the tech world, marrying five game-changing technologies into a single semiconductor chip. Each one of these technologies—chiplets, advanced packaging, SoC (System on Chip) design, unified memory, and AI acceleration—significantly enhances the performance and efficiency of the MI300, setting a new standard for high-performance computing and AI applications.

        This technological symphony results in a chip boasting an overwhelming 147 billion transistors, equipped with over 10,000 CDNA3 GPU cores and 24 Zen 4 CPU cores. It's designed not just to perform but to redefine performance benchmarks, posing a serious challenge to industry giants like Nvidia. AMD's MI300 isn't just about raw power; it's about delivering power smartly, optimizing efficiency to work towards the colossal aim of zettascale computing.

          What makes MI300 truly revolutionary is its design approach. By embracing a fully integrated APU that merges CPUs and GPUs into one harmonious unit, it breaks away from traditional concepts and opens new frontiers in efficiency and performance. This chip is AMD’s bold step towards a future where zettascale isn't a dream, it's an upcoming reality, driving everything from supercomputers to comprehensive AI solutions.
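
As a quick sanity check, the chiplet counts given in the transcript (four 6 nm base dies, six 5 nm CDNA3 GCDs, three 5 nm Zen 4 CCDs with 8 cores each, and eight 16 GB HBM3 stacks) can be tallied in a few lines. This is only a sketch: the `Chiplet` class and its field names are hypothetical and exist purely for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chiplet:
    """Hypothetical record for one kind of die in the package."""
    name: str
    node: str
    count: int
    cpu_cores_each: int = 0


# Counts as stated in the transcript.
stack = [
    Chiplet("base die (I/O and cache)", "N6", 4),
    Chiplet("CDNA3 GCD", "N5", 6),
    Chiplet("Zen 4 CCD", "N5", 3, cpu_cores_each=8),
]
HBM3_STACKS = 8
GB_PER_HBM3_STACK = 16

n6_chiplets = sum(c.count for c in stack if c.node == "N6")
n5_chiplets = sum(c.count for c in stack if c.node == "N5")
cpu_cores = sum(c.count * c.cpu_cores_each for c in stack)
hbm_capacity_gb = HBM3_STACKS * GB_PER_HBM3_STACK

print(f"N6 base chiplets:    {n6_chiplets}")
print(f"N5 compute chiplets: {n5_chiplets}")
print(f"Zen 4 CPU cores:     {cpu_cores}")
print(f"HBM3 capacity:       {hbm_capacity_gb} GB")
```

The tally shows only three distinct chiplet designs behind thirteen logic dies, which is the "mix and match" point the video makes.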

            Deep-dive into the technology of AMD's MI300 Transcription

            • 00:00 - 00:30 The semiconductor landscape is rapidly changing. Over the past couple of years, five new technologies have surfaced, which I think will become fundamental requirements for anyone trying to design and produce competitive silicon chips in the future. Each one of these technologies on its own is already very impactful, but combining all five of them will be the ultimate challenge. And that's exactly what AMD is trying to do with its next-gen HPC and AI accelerator MI300. Let's take a closer look at what these five important technologies are, how AMD is using them for
            • 00:30 - 01:00 MI300 and what all of this has to do with the  race to the first zettascale supercomputer.   Until now the design of silicon chips  primarily revolved around two key aspects:   architecture and process node. However, modern  chips have evolved to become much more than just   the sum of these two elements. Architecture and  process node have expanded to encompass broader,   overarching concepts. This shift has given rise  to next generation technologies. Number one won't
            • 01:00 - 01:30 be a surprise for you, if you have watched  this channel before, it's chiplets. Instead   of a single monolithic chip, almost all future  semiconductors will be chiplet based. The benefits   of chiplet designs are plentiful and range from  simple cost savings, due to increased yield,   the ability to use the perfect process node for  each individual chiplet, all the way to scaling   up an entire product stack based on only a few  tapouts. There's no future without them. Number   two is advanced packaging and 3D stacking. The  shift towards chiplets has made it essential to
            • 01:30 - 02:00 closely integrate each individual chiplet into the  final product. While the process node was once the   primary focus, packaging is increasingly gaining  prominence. In the future I expect that packaging   will surpass the importance of the process node,  reflecting a fundamental change in chip design   priorities. Number three is the transition  towards semiconductors that are primarily   SoCs or APUs. These chips consolidate functions  that were previously separated into CPU, GPU and
            • 02:00 - 02:30 I/O components. Apple has been a pioneer in this area and many current CPUs essentially function as SoCs. This development also relies heavily on the chiplet approach, with each building block of the SoC potentially serving as its own distinct chiplet. Number four is the adoption of a unified memory architecture. As systems become more tightly integrated, the importance of the memory architecture grows. Unified memory enables all components of the SoC to access the
            • 02:30 - 03:00 same memory pool, resulting in more efficient memory allocation and simplified programming, since every part of the system can access the same data. We have to move away from the idea that each component needs its own dedicated memory. And last, but certainly not least, we have AI. While you could argue it's already part of an SoC design, for me the integration of AI and machine learning acceleration is such an important aspect of any future semiconductor that it has its own dedicated point. And as we will learn later, AI also plays a very important role when trying
            • 03:00 - 03:30 to achieve zettascale. By the way, all the logos  I'm using here are AI generated. In a nutshell,   future semiconductors will be chiplet based,  rely on advanced packaging, function as an SoC   with unified memory and accelerate AI and machine  learning code. AMD's MI300 is such an interesting   product because it's the first of its kind to  incorporate all five technologies. Before we dive   into each of these technologies and examine how  the MI300 implements them, let's briefly review
            • 03:30 - 04:00 the specifications. AMD's Instinct accelerators are designed for supercomputers and as such MI300 is targeting the HPC and AI market. Its predecessor, MI250X, powers Frontier, the world's first exascale supercomputer. I've talked about very large chips before, but compared to MI300 and its 147 billion transistors all other chips seem tiny. It's truly a monster of a chip and easily outclasses Intel's 100 billion transistor Ponte Vecchio or Nvidia's 80 billion Hopper GPU. These
            • 04:00 - 04:30 numbers are a clear indication of AMD's ambitions for MI300, and while we don't have an official die size yet, MI300 will use well over 1000 square millimeters of active silicon. In total MI300 combines more than 10,000 CDNA3 GPU cores with 24 Zen 4 CPU cores, lots of I/O, lots of cache and 128 gigabytes of HBM3 memory. The whole chip is manufactured in a combination of TSMC N6 and
            • 04:30 - 05:00 N5 process nodes, which brings us right to the  first of the five technologies: chiplets. Just   by looking at pictures of MI300 the complexity  of its chiplet design isn't really apparent.   It doesn't look that different to its predecessor  MI200 with what seems like four medium-sized dies   in the middle, surrounded by a couple of smaller  chiplets, most likely the HBM3 memory. But this   appearance is very deceiving, because what we are  looking at is the absolutely crazy combination of   a large interposer with four six nanometer  base layer dies on top, on top of which AMD
            • 05:00 - 05:30 stacked another nine, yes nine, five nanometer chiplets. To round it off the interposer also connects eight 16 gigabyte stacked HBM3 memory modules and another 8 smaller unidentified chips, most likely spacers used for physical integrity. To understand the complexity of this layout, let's take a look at this mock-up AMD provided. The foundation of MI300 is the interposer, it connects the HBM3 memory and the base chiplets. The marked area in yellow might not be the exact size,
            • 05:30 - 06:00 but it's good enough to understand the concept. On top of the interposer we have 8 stacked HBM3 memory chips, each adding 16 gigabytes for a total of 128 gigabytes of memory. On pictures of the physical chip we can see smaller chips sitting in between the HBM stacks that aren't present in this illustration, which means they are most likely inert silicon used as spacers and don't have any active functionality. Next are the four six nanometer base chips marked in pink. They sit on top of the interposer and underneath the five nanometer compute chiplets. Since the
            • 06:00 - 06:30 base chiplets are active silicon that most likely houses I/O functionality and cache, AMD is using true 3D stacking. More on packaging and 3D stacking in just a bit. And finally, on top of the four base chiplets, we have six five nanometer CDNA3 GCDs containing the GPU cores and three five nanometer Zen 4 CCDs with the CPU cores. Each CCD holds 8 cores for a total of 24 Zen 4 cores on MI300. I think this overview is the best way to understand how
            • 06:30 - 07:00 AMD counts four 6 nanometer and nine 5 nanometer chiplets, and the interesting part is that while it's a complicated layout from a packaging point of view, it's a very simple design from a chiplet perspective. All AMD has to do is to tape out one six nanometer cache and I/O base chiplet, one five nanometer CDNA3 GCD chiplet and use the already existing 5 nanometer 8-core Zen 4 chiplet. It's literally just three different chiplets and only two new individual tape outs
            • 07:00 - 07:30 for MI300. All these chiplets are rather small and not too complex in design. We know how tiny a single Zen 4 CCD is and while a CDNA3 GCD will be slightly bigger, at this size we can still expect close to perfect yields. It's a mix and match system that allows AMD to effortlessly scale their HPC and AI products. You could also match nine Zen 4 CCDs with two CDNA3 GCDs or basically any other configuration. Depending on what the customer wants,
            • 07:30 - 08:00 AMD always delivers the perfect product. And if you are building a supercomputer, semi-custom silicon is not uncommon. We have seen chiplet based architectures since the release of Zen 2, and AMD's X3D CPUs took it up a notch in verticality. But nothing comes close to MI300! This chip is what AMD's R&D has been working towards for the past decade, it's basically science fiction, the end goal: a truly modular chiplet design. And it's not some far away product either, MI300 already exists in silicon and will launch in the second half of this year. I think it's
            • 08:00 - 08:30 impossible to overstate the importance of a design  like this! What if I told you that I didn't study   anything related to semiconductors or computer  science and everything I know is based on many   years of just following my passion and trying to  absorb as much knowledge as I can? That's why I'm   really excited that brilliant.org is sponsoring  this video! The cool thing about Brilliant is   that it doesn't feel like working, more like  exploring new topics with its intuitive and
            • 08:30 - 09:00 interactive concepts. Brilliant offers thousands of lessons related to math and computer science, from beginner to advanced levels. Just recently they added a whole new introductory course on artificial intelligence, which really helps you understand how neural networks function. AI stops being just "magic" and turns into this tangible thing that you now understand. The best way to reach your goals and increase your knowledge about the things you are interested in is to never stop learning. Visit brilliant.org/HighYield for a free 30-day trial plus the first 200 of you also get 20%
            • 09:00 - 09:30 off Brilliant's annual premium subscription. If you are curious and enjoy my videos I think you will love Brilliant! With that, let's get back to MI300 and its advanced packaging technology. If chiplets are the brain of MI300, the advanced packaging has to be the heart. It distributes all the energy and data and is an equally important part of the chip. Without proper packaging MI300 would be just a collection of useless chiplets. For MI300 AMD uses a mix of 2.5D and 3D stacking. The easiest
            • 09:30 - 10:00 way to explain the difference between 2.5D and 3D stacking is that while in both cases a silicon chip sits on top of another silicon chip, only with 3D stacking both of these chips are actually active. The 5 nanometer GCD and CCD chiplets on top of the 6 nanometer I/O and cache base chiplets are true 3D stacking, while the HBM memory on top of the interposer is 2.5D. It's vertical
            • 10:00 - 10:30 but the interposer is only used for data and power routing. A really interesting aspect is that the five nanometer compute chiplets sit on top of the Infinity Cache, located in the base layer, while AMD's 3D V-Cache has it the other way around. On a 7950X3D the cache chiplet sits on top of the CPU chiplet, creating rather interesting engineering problems, something I extensively talked about in my previous video. MI300 proves that AMD has the technology to implement a reverse stacking method. Currently we can only speculate on the specific 3D stacking technology AMD and
            • 10:30 - 11:00 TSMC are using to enable MI300. The first thing that comes to mind is the same TSMC SoIC and TSV based technology as used in Zen 4 X3D, where the base chiplet would use TSVs to directly wire into the compute chiplet above. The only question is if this method provides enough bandwidth and if it's the most efficient and cost effective solution. Another way would be using the large interposer to route the data and power connections. The idea here is that the six nanometer base chiplets might
            • 11:00 - 11:30 actually be a bit smaller than the 5 nanometer compute chiplets stacked on top. This approach would allow for direct data and power routing, without having to use TSVs to go through the chiplet itself. Twitter user AMDFanUwe created some very interesting drawings which illustrate this concept very well. Such an approach would be less complex and most likely cheaper to implement. The challenge is to create a stacking method that fulfills the bandwidth needs, is low latency, doesn't complicate the power routing and at the same time isn't too expensive. As you can see,
            • 11:30 - 12:00 a super easy task. I'm hoping AMD will share more information on the exact technology used sooner rather than later. Let me know in the comments down below what technology you think AMD is using for MI300 and how you would approach the 3D stacking of MI300. Next let's talk about another first in the HPC space. MI300 isn't a simple GPU like its predecessors, but it's a fully featured APU combining CPU and GPU cores into one single package, and with the included I/O functionality
            • 12:00 - 12:30 of the base chiplets it is actually a true SoC. We have seen the rise of SoCs in mobile and desktop, but high performance computing is an entirely new frontier. If we talk about power draw in modern computers, most of us probably think of power hungry GPUs and high-speed CPUs, but data interconnects are becoming another huge factor for power consumption. As chips become faster, interconnects need to keep up. Transporting large amounts of data at high speeds between physically separated
            • 12:30 - 13:00 chips like CPU and GPU consumes a lot of power. By reducing the physical distance you not only reduce latency, you also virtually eliminate the energy requirements for data transfers. When I first heard about APU and SoC concepts, I primarily saw it as space optimization, but now I understand that the main benefit is the increase in efficiency. Supercomputers especially use a lot of their power budget for data connections, so the less data you have to move off silicon the better. With MI300's APU approach not only will the interconnect power usage decrease drastically,
            • 13:00 - 13:30 but motherboards also don't need to be as complex. Two huge advantages in the server space! A single MI300 is already a fully functioning chip in and of itself. Somewhat connected to the SoC design and another very impactful technology is the use of unified memory. Apple's M-chips are already a great example of how unified memory benefits the entire system. There are two reasons why unified memory is beneficial: first is the software. If all parts of your system have access to the
            • 13:30 - 14:00 same data, because they use the same memory pool,  programming becomes easier. The other reason is   hardware related: having unified memory eliminates  redundant memory copies, which also means you   need less total physical memory. And as you can  guess, on-package memory is more efficient too.   In a nutshell, with unified memory you need less  memory capacity, it's on package and thus power   efficient, it reduces mainboard complexity and  simplifies programming. A well-implemented unified
            • 14:00 - 14:30 memory architecture has many advantages over the legacy approach, but of course you need to be able to build such a system with the proper GPU and CPU IP. And currently only AMD is capable of that, although Nvidia's Grace Superchip isn't far behind, something we will look at in a future video. On the topic of Nvidia, let's talk about AI acceleration. Artificial intelligence and machine learning are currently the number one topic in tech and there is only one company that's clearly leading the pack: Nvidia. With MI300 AMD is trying its best to catch up. CDNA3 focuses on
            • 14:30 - 15:00 machine learning acceleration, AMD claims up to 8  times the AI training performance and 5 times the   AI power efficiency. MI300 will also support new  math formats, most likely low precision INT and   floating point operations to further accelerate  deep neural networks. This won't be enough to   surpass Nvidia in my opinion, especially since  Nvidia's advantage is deeply rooted in its CUDA   software package, something AMD has to invest a  lot of time and money into if they really want to   challenge Nvidia. MI300 isn't only for AI, there  are plenty of other applications within the HPC
            • 15:00 - 15:30 space it will excel at, but the overall direction  of the design is clear. AMD's goal is to challenge   Nvidia and if MI300 won't bring victory, MI400  might. Now that we have talked about all five   technologies AMD implemented with MI300, let me  tell you why AMD is so focused on integrating them   into a single product. It was only a few weeks  ago, while watching an ISSCC presentation from   AMD CEO Lisa Su, that I finally understood AMD's  goal. There's actually a bigger picture behind
            • 15:30 - 16:00 all of this: achieving zettascale performance. Not with MI300, but with one of its hopefully many successors. I highly recommend you watch the keynote yourself, but here's the gist of it: going from exascale to zettascale is not a performance problem, eventually we will get there, it's an efficiency problem. Currently supercomputer efficiency roughly doubles every 2.2 years, which doesn't sound too bad, does it? But if we continue this trend and we finally are able
            • 16:00 - 16:30 to build a zettascale supercomputer in the mid-2030s, even if we achieve the same 2x efficiency increase every two years until then, a single zettascale supercomputer would consume about 500 megawatts. That's a lot, that's nuclear power plant levels of energy draw! Cooling such a system alone would be impossible! That's why, if we want to enter the zettascale age, we need to find a way to drastically increase efficiency, well above the current levels. And MI300 is exactly that, it's AMD's first step towards a fully integrated system,
            • 16:30 - 17:00 with a focus on not only increasing performance but specifically using every technology available to be as efficient as possible. The chiplets, the advanced packaging, the SoC design, the unified memory and the focus on AI all target efficiency. You could get a similar performance with less engineering expense, with less R&D and with a less complex layout, but the goal is not to create the cheapest or most convenient product, the goal is a first step towards continuous efficiency
            • 17:00 - 17:30 scaling. Reaching zettascale will require a  lot more R&D over the years, it will require   technology that's not available yet. Off-chip  communication will be optical in the future,   huge AI networks will exponentially increase  efficiency for specific calculations and 3D   stacking will get even more crazy. But eventually  we will get there. Now it's your turn! Let me   know your thoughts about the design of MI300 and  to be honest, I'm even more interested how you   think we can reach zettascale in the future. What  technologies will stay and which ones will fail?
            • 17:30 - 18:00 Leave a comment down below and let's see if we can  correctly predict the future of semiconductors.   You know what to do if you found this video  interesting and see you in the next one :)
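
The efficiency argument near the end of the transcript can be turned into a rough back-of-the-envelope calculation. This is only a sketch: the 2.2-year doubling period comes from the transcript, while the baseline of 50 GFLOPS per watt is an assumed round number for a current exascale-class system and the function names are hypothetical.

```python
ZETTAFLOPS = 1e21  # one zettaFLOPS, in FLOPS


def efficiency_at(years_elapsed: float,
                  baseline_flops_per_watt: float,
                  doubling_period_years: float = 2.2) -> float:
    """Projected FLOPS per watt if efficiency keeps doubling at a fixed pace."""
    return baseline_flops_per_watt * 2 ** (years_elapsed / doubling_period_years)


def power_megawatts(target_flops: float, flops_per_watt: float) -> float:
    """Power draw in megawatts needed to sustain target_flops."""
    return target_flops / flops_per_watt / 1e6


# Assumption for illustration: roughly 50 GFLOPS/W as the starting point.
BASELINE_FLOPS_PER_WATT = 50e9

for years in (0, 5, 10, 13):
    eff = efficiency_at(years, BASELINE_FLOPS_PER_WATT)
    print(f"+{years:2d} years: {power_megawatts(ZETTAFLOPS, eff):10.0f} MW for 1 zettaFLOPS")
```

With these assumed inputs, a zettascale machine built today would need on the order of 20 gigawatts, and even after 13 years of steady doubling the projection remains in the hundreds of megawatts, the same order of magnitude as the figure quoted in the video.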