Unraveling Processor Dynamics with Dr. Hernandez
Computer Architecture Course - Chapter 4 - Processor - Part 5
Estimated read time: 1:20
Summary
In this segment of the Computer Architecture course, Dr. Orlando J. Hernandez dives deep into dynamic branch prediction as a solution to reduce stalling in processor execution. The discussion critically explores pipelines, super scalar organizations, and the impact of dynamic branch prediction on system performance. Dr. Hernandez elaborates on the nuances of branch prediction using one-bit and two-bit predictors, highlighting their implications on nested loops and overall accuracy. Further, the lecture transitions into the intricacies of handling exceptions and interrupts, pivotal in maintaining efficient processor performance. With a focus on ARM's ISA and the architectural challenges associated with exceptions and interrupt handling, Dr. Hernandez elucidates strategies to mitigate pipeline stalls and control hazards, emphasizing the broader implications in complex, multi-stage pipelines.
Highlights
- Dynamic branch prediction helps reduce pipeline stalls during branch execution. 🚦
- Exploring one-bit vs. two-bit predictors shows the benefit of remembering more outcomes. 🔍
- Super scalar systems replicate pipelines for parallel instruction execution, boosting efficiency. 🌟
- Exceptions vs. interrupts: understanding the difference is crucial in system architecture. ⚙️
- Handling unexpected events with Vector Interrupts helps streamline processor control. 🖥️
- ARM's ISA defines steps to manage exceptions, improving stability and performance. 🎯
Key Takeaways
- Dynamic branch prediction is key to minimizing execution stalls in pipelines. 🚀
- One-bit predictors fall short with nested loops, prompting the need for two-bit predictors. 🔄
- Pipelines and superscalar architectures enhance performance but introduce complexity. 🏗️
- Handling exceptions and interrupts requires careful architectural considerations to avoid processor inefficiencies. 🔧
- ARM's approach to exceptions involves dedicated registers like the Exception Link Register (ELR) to manage control flow. 📝
- Efficient prediction and exception handling ensure robust processor performance, even with complex architectures. 📈
Overview
Dr. Orlando J. Hernandez introduces dynamic branch prediction, an important concept for reducing stalls in pipeline executions. Understanding this is crucial for students as it lays the groundwork for managing more complicated processor operations involving super scalar architectures. Super scalar architectures, as detailed, replicate pipelines to execute instructions in parallel, significantly boosting processing efficiency, albeit at the cost of increased complexity.
The lesson delves into the mechanics of branch prediction using both one-bit and two-bit predictors. While one-bit predictors are simpler, they struggle with nested loops, commonly found in software, leading to inaccuracies. Two-bit predictors improve on this by tracking more outcomes, effectively reducing incorrect predictions. The insights provided into these predictors aid in grasping how modern CPUs handle branching and help build a foundation for advanced computing topics.
In addressing exceptions and interrupts, Dr. Hernandez provides clarity on these often confusing but critical areas. By detailing ARM's strategies, including the use of Exception Link Registers (ELRs) and Vector Interrupts, he highlights how processors maintain smooth operation amidst unexpected events. This part of the lesson underscores the necessity for robust error management in CPUs, ensuring continued performance even in complex, multi-stage pipelines.
Chapters
- 00:00 - 01:30: Introduction to Dynamic Branch Prediction The chapter introduces the concept of Dynamic Branch Prediction as a method to minimize stalls during the execution of branch instructions. This approach is presented as a solution to a significant issue discussed in previous sessions, highlighting its importance in improving execution flow in computing.
- 01:30 - 03:00: Challenges with Deeper Pipelines and Superscalar Organizations The chapter discusses the challenges associated with deeper pipelines in computing systems, specifically focusing on organizations with pipeline stages ranging from 9 to 13 stages. It emphasizes the increased penalty in terms of cycles when the pipeline must be flushed and stall cycles are introduced, as the number of cycles affected grows larger. The chapter also touches on superscalar organizations, which will be elaborated on later. These organizations involve executing multiple instructions per cycle, though specifics are briefly mentioned here.
- 03:00 - 04:30: Dynamic Branch Prediction Techniques The chapter discusses the concept of dynamic branch prediction techniques within computer architecture. It explains that multiple pipelines can operate in parallel to increase processing efficiency. Each pipeline contains multiple stages, allowing numerous instructions to be executed simultaneously. The text implies that dynamic branch prediction is crucial in managing these concurrent executions, especially when the execution process must be 'rewound' or adjusted in case of incorrect predictions, thereby ensuring efficiency and performance within a computing process.
- 04:30 - 10:00: One-bit and Two-bit Predictors This chapter discusses methods to mitigate performance impacts in pipelines by utilizing Dynamic Branch prediction. It specifically focuses on branch prediction buffers, which act as history tables to track previous prediction outcomes.
- 10:00 - 20:00: Handling Control Hazards: Exceptions and Interrupts The chapter delves into handling control hazards focusing on exceptions and interrupts. It discusses using a branch target buffer to improve computing efficiency by storing recently used branch target addresses. This storage allows for immediate access during branch prediction, removing the need for constant recomputation. The strategy forms a crucial part in dynamic branch prediction methodologies aimed at enhancing processing speed and accuracy.
- 20:00 - 27:00: ARM ISA for Handling Exceptions This chapter discusses how ARM ISA deals with exceptions, focusing on the pipeline's response to branch instructions. When a branch prediction is made, the CPU begins fetching instructions either from the fall-through path or the branch target. If the prediction is incorrect, the pipeline must be flushed.
- 27:00 - 35:00: Overflow and Pipeline Flushing Mechanism The chapter titled 'Overflow and Pipeline Flushing Mechanism' delves into the strategies for minimizing errors in predictive mechanisms, particularly in computing contexts. Despite the inevitability of mistakes in any prediction scheme, the focus is on reducing the frequency and impact of these errors. It introduces the concept of a 'one bit predictor' as a simplistic approach to making predictions, highlighting the importance of efficient error management in computational processes.
- 35:00 - 43:00: Hardware for Handling Exceptions The chapter discusses the concept of predictors in handling exceptions, particularly in the context of nested loops in software. It highlights the drawback of predictors running twice in a row when dealing with nested loops, which is a common occurrence in many software applications.
- 43:00 - 46:00: Complexity of Precise Exceptions in Superscalar Pipelines The chapter discusses the limitations of the one-bit predictor in superscalar pipelines. It explains that a one-bit predictor often starts with the assumption that a branch will be taken, which is particularly relevant in loops where branching back occurs. The chapter uses a simulation to illustrate the behavior and common issues associated with one-bit predictors in pipeline processes.
Computer Architecture Course - Chapter 4 - Processor - Part 5 Transcription
- 00:00 - 00:30 so this is where we left last time and we were going to start to um discuss Dynamic Branch prediction as a solution to decrease the likelihood of having to stall during the execution of a branch instruction now this is obviously as I explained last time a larger problem in
- 00:30 - 01:00 deeper pipelines where you have pipeline stages in the order of 9 11 13 uh pipeline stages and if you have to flush the pipeline and introduce stall Cycles the number of cycles and the penalty for that is much larger also where you have a a super scaler organization which we will see later on what that is exactly but that essentially is uh where you
- 01:00 - 01:30 replicate pipelines and they may not be uh for the same functionality but you have pipelines working in parallel to execute the stream of instructions now you have multiple pipelines obviously multiple stages per pipeline so you may have a large number of instructions uh being executed in the process or any given time and if you have to rewind your execution uh
- 01:30 - 02:00 stream and flush the pipeline or and the PIP or the pipelines that may cause a lot of um it may be a big impact on performance so that way one way to try to mitigate that is to use Dynamic Branch prediction and there are several aspects to that so one aspect of that is uh a branch prediction buffer which is really u a history table of what has happened to the um to the prediction that you
- 02:00 - 02:30 have done before and also you can have a branch Target buffer so instead of computing the branch uh Target addresses every time you have stored the recently used uh Branch Target addresses so you don't have to compute them you have them readily available to use them in your branch prediction uh scheme so in branch in Dynamic brand prediction uh the essential steps are
- 02:30 - 03:00 the following uh you check the table and you can expect the same outcome that you have had recently and you start fetching from the uh that Target which is either the fall through which is the instruction immediately following the branch instruction or the Target that has to do with that prediction of course if you are wrong you do have to flush the pipeline uh
- 03:00 - 03:30 reverse your prediction and start fetching from the other Target but what we want to do is minimize the number of times that we are wrong eventually we are going to be wrong in any prediction scheme but the question is how do we minimize that and minimize the the penalty for being wrong one uh the simplest type of predictor is what we call a bit one bit
- 03:30 - 04:00 predictor so we're going to uh just predict uh the outcome that was successful last now this has a a shortcoming in the sense that when you have um nested Loops you're going to be run twice in a row you're always going to be run twice in a row and uh a lot of software have you know uh nested Loop so that's that's a that's a
- 04:00 - 04:30 a common construct so that is a big um drawback to the one bit predictor so we can see here uh a simulation of a one bit predictor so let's say that we start arbitrarily with the uh assumption that the branch is going to be taken and since we're talking about loops we're going to be branching back so
- 04:30 - 05:00 we predict that it's going to be taken and we are correct and we're going to follow that Trend and going to be correct for as many times as the loop is is um is in effect but when we are going to exit the loop we are going to be incorrect and then we are going to predict the reverse we're going to predict that the branch is not taken
- 05:00 - 05:30 because we were wrong last time so we revers our prediction and then we are incorrect again because we are exiting the second Loop and we're not branching back uh or we're branching actually back in the other loop uh rather so we have that situation for nested loops with a one bit predictor so in the in this simple simulation the accuracy is um
- 05:30 - 06:00 80% uh nothing magical about that number that's just a function of the length of this um simulation and we can have another type of uh simulation where we actually the loops are shorter in the sense uh in the way that they are arranged so in that sense with a one bit predictor if the we start with
- 06:00 - 06:30 assuming that the branch was taken we assume that and our prediction is correct then uh the branch is not taken and we're incorrect then we assume that it's not taken but it is taken so we are incorrect again now we flip our prediction but we are incorrect again and you can see how our performance can
- 06:30 - 07:00 really suffer uh as a function of the existence of nested uh loops and the fact that we are flipping our prediction every time that we are wrong but we are not taking into account for the nesting of the loops so one way to solve that is to use a two-bit predictor which instead of of having two states as implied by the one
- 07:00 - 07:30 bit predictor appro predictor approach we have four states which is what we can afford with uh two bits and here we are able to remember we're able to remember two consecutive outcomes so if we start predicting that the branch is taken as long as our prediction is correct we stay we stay in that state if the branch is not taken and we
- 07:30 - 08:00 are wrong we move to another state but we still predict that the branch is taken we don't change our prediction but we by switching States we remember that we were wrong ones if the branch is taken we were right and we go back to the original state so essentially resetting our Trend but if we are wrong again we go to a third state and at that point
- 08:00 - 08:30 we have remembered through the state machine that the brand we were run twice and only when we are run twice is that we switch our prediction if um if when we are predicting that the branch is not taken we are wrong we go back to the previous state so that way we are not we're absorbing the fact that we are flipping the outcome of the uh the branch
- 08:30 - 09:00 outcomes if the branch is not taken so we were right we predict that it's not taken but now we have remembered that we were right twice and we stay there as long as our prediction is correct if we start being wrong in this state we basically reverse the pa the state path just like we did before and we start remembering that we were wrong once and if we are here and we are
- 09:00 - 09:30 wrong uh we go to this state and we have remembered that we were wrong twice and switch our prediction if we are right here we go back to this state essentially absorbing the fact that we were wrong once so this is the two predictor uh the two bit predictor D State diagram so let's see how that fares with the um what we were looking at before with the simulation
- 09:30 - 10:00 so here's the um the first simulation which gave us 80% accuracy with a one bit predictor so in this case we start uh predicting that the branch is taken so this demonstrates that if you you start with a wrong assumption this
- 10:00 - 10:30 can lower this can lower your accuracy but that's a little bit of an unfair comparison before be uh with the one bit predictor uh simulation of before because before we uh we were right in our Branch outcome if we had been right here then our accuracy would have been 90% and let's look at the second one that had uh that was a more complex uh
- 10:30 - 11:00 set of outcomes so here we're incorrect so it does improve our accuracy from 40% to 50% that's a 25% increase the way that we would look at that but it does show that your accuracy does increase with a two bit uh predictor in the case that you have more
- 11:00 - 11:30 heterogeneous outcomes in the uh actual outcomes of the of the branch um of the branch being taken versus non- taken so going back to what I introduced at the beginning that even with the predictor you still need to calculate the target address and that causes a cycle even in the case of the simple pipeline that we are
- 11:30 - 12:00 seeing uh for the arm uh version that we're looking at so again one way to mitigate that is to only calculate the branch Target once and keep around the recently used uh Branch Target addresses in a branch Target buffer so in essentially caching the target addresses and this uh buffer is indexed by the PC
- 12:00 - 12:30 when that Branch instruction is fetched so the buffer and the branch Target address is linked to the address to the um the code address that has that Branch instruction so if we are fetching a branch with at that program counter address then that will be indexed the branch Target buffer and we already
- 12:30 - 13:00 have the branch Target address and in the case that we predict that the branch is taken we can just use that that value immediately so this gives you an idea of how to mitigate the installing and the flushing of the pipeline and how to deal with control hazards uh which is really the subject of Branch
- 13:00 - 13:30 prediction and so far we have uh looked at how to deal with hazards in general for pipelines for structural hazards data hazards and control hazards what we have not talked about are the effect that hazards have in in the flow control when we deal with exceptions and interrupts so this on expected events that require the
- 13:30 - 14:00 vectoring to a service routine is really the hardest of all although they are part of the control type of Hazard but it's not really a branch um a branch type of Hazard it is a an exception or an interrupt uh Hazard the difference between exceptions and interrupts are that exceptions uh they come about within the CPU so some
- 14:00 - 14:30 some event some uh unexpected event happened uh in the CPU it could be that there was an undefined up code um because of noise you can have alpha particles that can change a bit in memories so if there is an undefined of code or there was an overflow from an arithmetic instruction or the operating system uh requires a system call these
- 14:30 - 15:00 are things that happen inside the CPU we call those exceptions interrupts on the other hand are generated by external entities to the CPU such as peripherals and uh they interrupt the CPU to let the CPU know that this peripheral this entity that is external to the processor requires the attention of the CPU for example all
- 15:00 - 15:30 external IO controllers uh will behave in this manner from your mouse your keyboard your disc drive your communication type of uh peripherals so the processor has to go on service this entity service this peripheral via an interrupt service routine so given the fact that this uh types of calls are asynchronous to the operation of the program that is being executed in
- 15:30 - 16:00 the processor it is really hard dealing with them without sacrificing the performance so this is a one part that a lot of work has been done on how to minimize the impact of servicing exceptions and interrupts in uh pipeline processors essentially the ones that are that have um High number of stages and have a super scaler organization with multiple pipelines uh running at the
- 16:00 - 16:30 same time so what would we do to handle exceptions well for handling exceptions uh the ISA that we are studying for arm has defined the following steps we have to save the program counter of the instruction that cost the the exception and we're going to save that in the exception link register which is
- 16:30 - 17:00 an special register defined in the ISA and then the we're going to save the indication of the problem in what is called an exception syndrome register so this is the cost of um of the the the exception and uh let's say that we assume for the purpose of the discussion that we have a one bit that this is a one bit register and that our exception is going to be uh whether we have an
- 17:00 - 17:30 undefined up code and or an overflow so zero is going to be undefined up code one is going to be for overflow those are the two exceptions that we are um discussing at this at this moment another mechanism is to have what is called Vector interrupts which um there is a Handler that is predefined this is actual software
- 17:30 - 18:00 that is uh predefined that determines uh or handles they interrupt and this Handler is determined by the cost so an exception Vector register or or vector address uh has to be added to this Vector table from some base register so if we have a base register that has the addresses of all this exception handlers
- 18:00 - 18:30 you could have a vector table that is an offset from the base where the code to handle the exceptions start so as zero which would be no displacement from the base register there we have the code of to handle an unknown type of uh exception and here on this one that's where the code would start to handle
- 18:30 - 19:00 overflow and so forth okay so uh the instruction can deal with the interrupt or uh jump to another Handler so from here you can just have some small code that handles the exception or from here you can jump to a larger uh software Handler for the exception or
- 19:00 - 19:30 interrupt so what are the C what are the actions that normally this uh handlers would um would do so obviously read the cost and transfer to the relevant re relevant Handler if you require something more complex and in there you would determine the action that is uh required if uh if the cost is deemed to
- 19:30 - 20:00 be restorable meaning that you can absorb that error and gracefully continue with the operation of the problem or the the program you would take that small corrective action and use the use the uh exception link register to return to the program at the offending instruction and restart the program if uh if the cost is deemed that is the program is not restartable
- 20:00 - 20:30 because it's a catastrophic failure then you would terminate the program uh and report that error using the um exception link register and the cost register so that's what the ISA for the arm has specified to handle this events uh but the issues that would um that would be a catalyst for termination of a program are the things that you know you
- 20:30 - 21:00 would see in Windows when you get like a blue screen or something like that so these exceptions as I mentioned are another form of control uh Hazard okay so uh let's say we consider an overflow on an addition instruction in the execution stage where you are adding the contents of registers two and one
- 21:00 - 21:30 and putting the result in register one one thing that we could do if we detect an overflow is preventing register one from being overridden why because we have an overflow it's not a valid result so we don't want to uh do away with our previous VA value which was a valid
- 21:30 - 22:00 value we may complete any uh instructions that came before this addition that caused the problem and then at that point flush the addition and all the subsequent instructions because we don't know if those instructions were relying on the new value of X1 and as we mentioned um as part of the mechanism we would said the ESR and ELR
- 22:00 - 22:30 register values and transfer control to the Handler okay the good news is that a lot of the same Hardware that can that can be used or was used to um to handle branches uh can be reused because this this phenomenon behaves much of the in the same way as a mispredicted branch in terms of having to flush the pipeline uh
- 22:30 - 23:00 finishing the uh the previous instructions and things like that so now let's take a look at our diagram with some Hardware added to handle exceptions here we have a the address of a vector for handling exceptions we added that in front of the mock of the program counter so we can decide so so the controller can decide whether we are
- 23:00 - 23:30 going to take the next instruction we had that before the branch instruction so this is for handling branches or the exception Master Vector Handler but we still have all the same Hardware in in [Music] place but we still had the same um previous Hardware in place for flushing
- 23:30 - 24:00 the pipeline and uh and things like that so again remember that we can have restorable exceptions uh which is just we just flush the pipeline uh execute whatever the Handler needs to do and then restart the instruction uh and that we save the PC in the exception link register which is what
- 24:00 - 24:30 identifies the cing instruction and also the PC that we have the next PC which is PC plus4 is also save and the Handler must adjust that if he going to restart the instruction so that's a small point to um to take there so let's look at the
- 24:30 - 25:00 um the same instruction that we were pointing at here it causes an an overflow we prevent register one from being uh written we finish executing the previous instructions but then we jump to where the Handler is so we force the this value into the PC see and we start fetching from here and you can see how
- 25:00 - 25:30 the first instruction here is or the first couple of instructions are the stores so we are storing important registers um that we need for the handling of the of the exception so let's look look at that sequence in our pipeline diagram here
- 25:30 - 26:00 the first instruction was a substract so that is being fetched it moves to the next stage and we fetch the second instruction the end now we fetch the or and now we fetch the add when the add gets to
- 26:00 - 26:30 the execute stage over here that's where the over when the that's where the Overflow is going to be detected and where we have the ESR and the ELR and the values that go there are just going to be sent down the pipeline through the pipeline registers so now we're going to get an overflow at this
- 26:30 - 27:00 point we have to flush the pipeline but let all the instructions that were before which finish with the or instruction execute we flush the pipeline and install in this case three Cycles transfer control through here to the Handler and
- 27:00 - 27:30 we'll start executing the instructions at the Handler so when we have multiple exceptions this issue becomes even more complex okay if the uh in like like in the case of super scaler that we're going to see uh next time we meet but if
- 27:30 - 28:00 you can visualize multiple pipelines in parallel uh and exceptions that are generated that overlap multiple instructions they could generate multiple exceptions at once okay so a simple approach uh is to deal with the exception from the earliest instruction and then flush everything else we call that approach um and phenomenon precise
- 28:00 - 28:30 exceptions however as we'll see next time in complex pip pipelines where you have a super scalar organization with parallel pipelines to keep these pipelines busy at all times you need to start issuing instructions uh out of order
- 28:30 - 29:00 from how the compiler placed them in the instruction stream as long as there are no dependencies you can do that and we'll discuss Hardware to do that uh and keep the pipelines busy so now you have out of order uh execution and out of order completion multiple instructions uh issued per cycle to keep the multiple pipelines uh busy so it really um can
- 29:00 - 29:30 become difficult maintaining that preciseness in exceptions so that is a big uh problem that is really beyond the scope of the course but uh we did want to raise it so you know that um it exists and is a real issue okay