Exploring Cryptographic Vulnerabilities

16. Side-Channel Attacks

Estimated read time: 1:20


    Summary

    In MIT's lecture on side-channel attacks, the focus is on how cryptographic implementations can leak information unintentionally through side channels. These channels may be physical, such as power usage, sound, or timing discrepancies. Often overlooked, side channels can reveal secret keys by exposing system behavior outside the expected data flows. The lecture examines techniques such as using timing variations to extract cryptographic keys from seemingly secure systems, with a specific focus on attacks carried out over networks and on implementation details, underscoring the real-world seriousness of these vulnerabilities.

      Highlights

      • The lecture introduces the concept of side-channel attacks, which focus on unintentional data leaks in systems📡.
      • Historical examples include RF emissions from teletypes in the 1940s and power analysis of computers📠.
      • A modern case presented involves timing attacks over networks, highlighting how such vulnerabilities manifest online💻.
      • Cryptographic implementations, like RSA, can be vulnerable because of the optimizations used in their modular arithmetic📊.
      • Attacks often target cryptographic keys because keys are short, so even a few leaked bits are far more valuable than bits of bulk data📬.

      Key Takeaways

      • Side-channel attacks exploit unintentional information leaks in systems, posing significant security threats⚠️.
      • Timing analysis can reveal cryptographic keys, even over noisy networks like Ethernet🌐.
      • Optimizations like the Chinese Remainder Theorem and sliding windows are double-edged swords, enhancing efficiency but also exposing systems to attacks⚔️.
      • The methods of carrying out these attacks are intricate, often involving clever manipulation of equations and observations👀.
      • Security measures must account for these vulnerabilities, ensuring that systems are not just theoretically secure but also practically safe🔒.

      Overview

      Side-channel attacks exploit the unexpected leaks of information that appear when cryptographic systems are deployed. They take advantage of any information inadvertently exposed through system behavior, such as timing and power usage.

        The lecture goes on to illuminate how attackers can extract secret information by monitoring seemingly innocuous aspects of a computation, such as timing discrepancies when decrypting messages with RSA. This involves intricate mathematical and computational techniques that expose vulnerabilities even in complex systems.

          By dissecting these channels, MIT's lecture series aims to equip listeners with an understanding of the fragile nuances of cryptography in real-world applications, urging a careful consideration of implementation details in security-critical systems.

            Chapters

            • 00:30 - 03:00: Introduction to Side-Channel Attacks The chapter introduces the concept of side-channel attacks. The discussion opens with a reference to the support needed for MIT OpenCourseWare, acknowledging the Creative Commons license under which the content is provided. The focus then shifts to defining side-channel attacks, indicating that the session will delve into this specialized topic.
            • 03:00 - 06:00: Examples of Side-Channel Attacks This chapter introduces the concept of side-channel attacks. It discusses how they occur when unintentional information is revealed through an overlooked channel, rather than direct data breaches. The focus is on the indirect ways systems can leak data, such as through user-server communications, where systems might inadvertently reveal unconsidered information despite secure direct communications.
            • 06:00 - 11:00: RSA Cryptosystem Basics This chapter delves into the fundamentals of the RSA cryptosystem and the side channels that can inadvertently reveal information about it. First identified in the 1940s, side channels remain a critical security concern because they allow hidden data to be deduced from, among other things, the timing of communications. The chapter emphasizes the importance of understanding and mitigating these potential vulnerabilities so that an RSA implementation remains robust against such indirect exposure. A minimal textbook-RSA sketch appears after this chapter list.
            • 11:00 - 17:00: RSA Exponentiation Techniques The chapter discusses side channels in the context of RSA exponentiation, with historical anecdotes such as the observation of RF radiation from teletype machines, a classic example of a side channel that can be exploited to reveal information such as the characters being typed. Such side channels pose a significant risk in cryptographic systems, where unintended information leakage can occur through paths beyond the intended input-output interface. Sketches of the repeated-squaring and windowed-exponentiation techniques appear after this chapter list.
            • 17:00 - 21:00: Montgomery Multiplication The chapter on Montgomery multiplication examines side channels that can leak sensitive information during computation. It highlights that power usage varies with the computational tasks being performed, which might allow attackers to deduce what is being computed, and it presents an intriguing example involving sound as a side channel, where researchers could decipher printed characters by analyzing the audio produced by a printer. A sketch of Montgomery reduction appears after this chapter list.
            • 21:00 - 26:00: Karatsuba Multiplication The chapter discusses side channels in everyday devices. It opens with a mention of dot matrix printers and their distinctive sound, setting a backdrop for the quirks of technology that leak information, and Kevin is mentioned as having introduced the audience to interesting side channels from his own research. A significant focus is the side-channel analysis by David Brumley and Dan Boneh, who about a decade ago published a paper showing how they extracted a cryptographic key, a crucial case study in how computational processes can unintentionally reveal sensitive information, especially in the context of cryptography. A sketch of Karatsuba's multiplication trick appears after this chapter list.
            • 26:00 - 33:30: Combining Optimizations in RSA Pipeline The chapter discusses side-channel attacks on cryptographic keys within an RSA pipeline. It explores how an adversarial client might exploit timing differences in server responses, particularly targeting cryptographic keys because of their sensitivity: even small amounts of leaked key information can be highly valuable, making keys a common target in side-channel attacks. A sketch of the CRT decryption optimization appears after this chapter list.
            • 33:30 - 41:00: Timing Attack Strategy In this chapter, the concept of timing attacks is explored, specifically how attackers can extract small amounts of sensitive information, such as cryptographic keys, from a system. Although only about 200 to 256 bits might be extracted, that is enough to break the cryptographic key of a web server. The chapter contrasts this with the extraction of larger datasets, such as databases full of social security numbers, which would require leaking many more bits. The focus of side-channel attacks is therefore often on small, critical secrets like cryptographic keys. A sketch of the repeated-measurement timing strategy appears after this chapter list.
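
            The sketches below are not part of the lecture; they are minimal Python illustrations of the techniques summarized in the chapters above and walked through in the transcript below, using toy parameters and hypothetical helper names. This first one shows textbook RSA key generation, encryption, and decryption (tiny primes, no OAEP padding), matching the math the lecture describes:

                # Toy textbook RSA; real deployments use ~512-bit or larger primes and padding.
                p, q = 61, 53                # the two secret primes
                n = p * q                    # public modulus
                phi = (p - 1) * (q - 1)      # Euler's phi of n
                e = 17                       # public encryption exponent, coprime with phi
                d = pow(e, -1, phi)          # private exponent: inverse of e mod phi (Euclidean algorithm)

                m = 42                       # message encoded as an integer less than n
                c = pow(m, e, n)             # encrypt: c = m^e mod n
                assert pow(c, d, n) == m     # decrypt: c^d mod n recovers m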
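
            A minimal sketch of the Chinese-remainder-theorem optimization, reusing the toy p, q, d, n, c from the textbook-RSA sketch: decryption is done as two half-size exponentiations mod p and mod q and then recombined, which is roughly twice as fast as one full-size exponentiation.

                def crt_decrypt(c, p, q, d):
                    # Compute c^d mod (p*q) via two half-size exponentiations (hypothetical helper).
                    dp, dq = d % (p - 1), d % (q - 1)   # reduced exponents mod p-1 and q-1
                    mp = pow(c, dp, p)                  # c^d mod p
                    mq = pow(c, dq, q)                  # c^d mod q
                    q_inv = pow(q, -1, p)               # constant a real library would precompute
                    h = (q_inv * (mp - mq)) % p
                    return mq + h * q                   # unique value mod p*q agreeing with mp and mq

                assert crt_decrypt(c, p, q, d) == pow(c, d, n)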
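
            Repeated squaring (square-and-multiply), as described in the exponentiation discussion: one squaring per exponent bit plus an extra multiply for every one bit. A minimal sketch; the data-dependent multiply is exactly the sort of detail timing attacks latch onto.

                def mod_exp(c, d, n):
                    # Left-to-right square-and-multiply for c^d mod n (illustrative only).
                    result = 1
                    for bit in bin(d)[2:]:              # exponent bits, most significant first
                        result = (result * result) % n  # square once for every bit
                        if bit == '1':
                            result = (result * c) % n   # multiply by c only when the bit is 1
                    return result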
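
            The sliding-window optimization precomputes a table of powers of c so that several exponent bits cost only one table multiply. The sketch below is a simplified fixed-window (2^w-ary) variant, not OpenSSL's actual sliding window (which stores only odd powers), but it shows the same trade-off: fewer multiplies by c in exchange for a precomputed table.

                def window_exp(c, d, n, w=5):
                    # Fixed-window modular exponentiation (simplified cousin of sliding windows).
                    table = [1] * (1 << w)
                    for i in range(1, 1 << w):
                        table[i] = (table[i - 1] * c) % n         # table[i] = c^i mod n
                    bits = bin(d)[2:]
                    bits = bits.zfill(-(-len(bits) // w) * w)     # pad to a multiple of w bits
                    result = 1
                    for i in range(0, len(bits), w):
                        for _ in range(w):
                            result = (result * result) % n        # w squarings per window
                        chunk = int(bits[i:i + w], 2)
                        if chunk:
                            result = (result * table[chunk]) % n  # one table lookup-and-multiply
                    return result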
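
            Montgomery reduction replaces the expensive mod-q division after each multiply with the trick the lecture describes: add the multiple of q that zeroes the low bits, then shift. A minimal sketch with a toy R (real code uses R = 2^512 or larger); the conditional final subtraction is the "extra reduction" whose timing the attack measures.

                R_BITS = 16                      # toy size; OpenSSL-scale code uses 512 or more
                R = 1 << R_BITS

                def montgomery_reduce(t, q):
                    # Return t * R^-1 mod q for t < q*R, without a real division (q must be odd).
                    q_neg_inv = pow(-q, -1, R)   # precomputable constant: -q^-1 mod R
                    m = (t * q_neg_inv) % R      # multiple of q that zeroes the low R_BITS bits
                    u = (t + m * q) >> R_BITS    # low bits are now zero, so shifting divides by R
                    if u >= q:
                        u -= q                   # the extra reduction
                    return u

                def mont_mul(a_bar, b_bar, q):
                    # Multiply two values already in Montgomery form (x_bar = x*R mod q).
                    return montgomery_reduce(a_bar * b_bar, q)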
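
            Karatsuba multiplication, as worked through at the end of the transcript, replaces the four half-size multiplies of the schoolbook method with three. A recursive sketch (purely illustrative, since Python integers already multiply natively):

                def karatsuba(a, b, bits=512):
                    # Multiply a and b using three half-size multiplies instead of four.
                    if bits <= 32:                                # small enough for one machine multiply
                        return a * b
                    half = bits // 2
                    a1, a0 = a >> half, a & ((1 << half) - 1)     # high and low halves of a
                    b1, b0 = b >> half, b & ((1 << half) - 1)     # high and low halves of b
                    z2 = karatsuba(a1, b1, half)                  # a1*b1
                    z0 = karatsuba(a0, b0, half)                  # a0*b0
                    mid = karatsuba(a1 - a0, b1 - b0, half)       # (a1-a0)*(b1-b0), may be negative
                    z1 = z2 + z0 - mid                            # equals a1*b0 + a0*b1
                    return (z2 << (2 * half)) + (z1 << half) + z0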
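
            Finally, a sketch of the measurement side of the timing-attack strategy: because each request differs by only a microsecond or two, the attacker repeats the measurement many times and takes a robust statistic to average out network noise. The send_ciphertext callable, the trial count, and the use of the median are assumptions for illustration, not details from the paper.

                import time
                import statistics

                def measure_decrypt_time(send_ciphertext, c, trials=100):
                    # Time how long the server takes to reject ciphertext c (hypothetical helper:
                    # send_ciphertext is assumed to send c and block until the rejection comes back).
                    samples = []
                    for _ in range(trials):
                        start = time.perf_counter()
                        send_ciphertext(c)
                        samples.append(time.perf_counter() - start)
                    return statistics.median(samples)   # robust against outliers from network noise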

            16. Side-Channel Attacks Transcription

            • 00:00 - 00:30 the following content is provided under a Creative Commons license your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free to make a donation or to view additional materials from hundreds of MIT courses visit MIT OpenCourseWare at ocw.mit.edu all right guys let's uh get started uh so today we're going to talk about what's called side-channel attacks which
            • 00:30 - 01:00 is a well a general class of problems that uh comes up in all kinds of systems um sort of broadly side-channel attacks are situations where you haven't thought about some information that your system might be revealing so typically you have you know multiple components communicating maybe a user talking to some server and you're thinking great I know exactly all the bits going over some wire between the user and the server and those are secure but it's often easy to miss some information revealed either by the user
            • 01:00 - 01:30 or by the server so the example that the paper for today talks about is a situation where the timing of the messages between the user and the server reveals some additional information that you wouldn't have otherwise learned by just observing the bits flowing between these two guys but in fact there's sort of a much broader class of side channels you might worry about originally side channels showed up or sort of people discovered them in the 40s when they
            • 01:30 - 02:00 discovered that when you started typing characters on a teletype the electronics or the electrical Machinery in the teletype would emit RF radiation and you could sort of hook up an oscilloscope nearby and just watch the characters being typed out by sort of monitoring the frequency uh RF frequency sort of going out of this machine right so RF radiation is a sort of classic example of a side channel that you might worry about um and there's sort of lots of examples lots of other examples that
            • 02:00 - 02:30 people have looked at sort of almost anything right so power usage is another uh kind of side channel you might worry about so your computer is probably going to use different amounts of power depending on what exactly it's computing uh and there's been sort of other clever examples so sound turns out to also leak stuff uh there's a a cute paper that you can look at where people sort of listen to a printer and based on the sounds the printer is making you can sort of tell what characters it's printing uh this is especially easy to do for
            • 02:30 - 03:00 dot matrix printers that make this very annoying sound when they're printing um and uh sort of in general it's a sort of good thing to think about uh Kevin in Monday's lecture also mentioned uh some interesting side channels that he's run into in his research but uh in particular um here we're going to look at the specific side channel that David Brumley and Dan Boneh looked at in their paper I guess about 10 years ago now where they're able to extract a cryptographic key out
            • 03:00 - 03:30 of a web server running Apache by measuring the timing of different responses to different input packets from the sort of an adversarial client and in this particular case they're sort of going after a cryptographic key and in fact many side-channel attacks target cryptographic keys partly because it's a little bit tricky to get lots of data through a side channel and cryptographic keys are one situation where getting even a small number of bits helps you a lot so in
            • 03:30 - 04:00 their attack they're able to extract you know maybe about 200 256 bits or so and just from those 200ish bits they're able to break the cryptographic key of this web server whereas if you're trying to leak some database full of social security numbers then that'll be a lot of bits you have to leak to get a lot of data out of this uh database so that's why many of these side channels if you'll sort of see them uh later on they often focus on getting uh small Secrets out might be cryptographic keys or
            • 04:00 - 04:30 passwords but in general this is applicable to lots of other situations as well and one sort of cool thing about this paper before we sort of jump into the details is that they show that you can actually do this over the network so as you probably figured out reading this paper they have to do a lot of careful work to like tease out these minute differences in timing information so if you actually compute out the numbers from this paper it turns out that each request that they sent to the server uh
            • 04:30 - 05:00 differs from potentially another request by on the order of one to two microseconds which is pretty tiny so um you have to sort of be quite careful and uh over a network it might be hard to tell whether uh some server took one or two microseconds longer to process your request or not and as a result uh it was not so clear for a while whether you could mount this kind of an attack over a very
            • 05:00 - 05:30 noisy Network and these guys were one of the first people to show that you can actually do this over a real you know ethernet network with a server sitting in one place a client sitting somewhere else and you could actually measure these differences partly by averaging partly through other tricks all right does that make sense so overall side Channel stuff all right so the plan for the rest of this lecture is we'll first dive into the details of this RSA crypto system that these guys used and we'll not look at exactly why
            • 05:30 - 06:00 it's secure or not but we'll look at how do you implement it because that turns out to be critical for exploiting uh this particular side Channel they carefully leverage various details of the implementation to figure out when are something's faster or slower and then we'll sort of pop back out once we understand how RSA is implemented then we'll come back and figure out how do you attack it how do you attack all these different optimizations that RSA has sounds good all right so I guess let's start out by looking at the high
            • 06:00 - 06:30 level plan for RSA so RSA is a pretty widely used uh public key crypto system we sort of mentioned these guys a couple of weeks ago in general in certificates in the context of certificates uh but now we're going to look at actually how it works so typically there's three things you have to worry about so there's generating a key encrypting and decrypting so for RSA the way you generate a key is you actually pick two
            • 06:30 - 07:00 large Prime integers so you're going to pick two primes p and Q and in the paper um these guys uh focus on P and Q which are about 512 bits each so this is typically called 1024 bit RSA because the resulting product of these primes which we're going to use in a second is a thousand bit integer number these days it's probably not a particularly good choice for for the size of your RSA key
            • 07:00 - 07:30 because it makes it relatively easy to for attackers to factor this uh not trivial but certainly viable so if you know 10 years ago this seemed like a potentially sensible parameter now if you're actually building a system you should probably pick a 2,000 or 3,000 or maybe even 4,000 bit RSA key but that's what sort of RSA you know key size means is the size of these primes and then uh for convenience we're going to talk about the number n which is just a product of these two primes P *
            • 07:30 - 08:00 Q all right so now we know how to generate a key now we need to figure out well this is at least part of a key uh now we're going to have to figure out how we're going to encrypt and decrypt messages and the way we're going to encrypt and decrypt messages is by exponentiating numbers modulo this number n so it seems a little weird but let's sort of go with it for a second so if you want to encrypt a message
            • 08:00 - 08:30 then we're going to take a message M and transform it into m to the power e mod n so e is going to be some exponent we'll talk about how to choose it in a second but this is how we're going to encrypt a message we'll just take this message as an integer number and just exponentiate it and then we'll see why this works in a second but uh let's call this guy C the ciphertext
            • 08:30 - 09:00 then to decrypt it we're going to somehow find an interesting other exponent where you can take a ciphertext C and if you exponentiate it to some power D mod n then you'll magically get back the same message M so this is the general plan to encrypt you exponentiate to decrypt you exponentiate by another exponent and in general it seems a little hard to figure out how we're going to come up with these two magic
            • 09:00 - 09:30 numbers that somehow end up giving us back the same message but it turns out that if you look at uh sort of how exponentiation works or multiplication works modulo this number n then there's this cool property that if you um have any number x and you raise it to what's called the Euler phi function of n sorry well maybe I'll use more board
            • 09:30 - 10:00 space for this this seems actually important uh so if you take x and you raise it to phi of n then this is going to be equal to 1 mod n and this phi function for our particular choice of n is pretty straightforward it's actually p minus 1 times q minus 1 so this sort of gives us hope that maybe if we pick e d so that e times d is phi of n plus 1
            • 10:00 - 10:30 then we're in good shape because then any message M we exponentiate it to e and d we get back 1 times M because our e d product is going to be roughly phi of n plus 1 or maybe some constant you know alpha times phi of n plus 1 so this roughly makes sense this is sort of why the message is going to get decrypted correctly and turns out that
            • 10:30 - 11:00 there's a reasonably straightforward algorithm if you know this phi value for how to compute D given an e or e given a d all right question is one mod n just one yeah so that's why we add one more sorry like up over there yeah this one yeah yeah isn't one mod n just one yeah sorry I mean this so when I I say this mod n it means that both sides taken mod
            • 11:00 - 11:30 n are equal so what this means is like if if you want to think of mod as literally an operator you would write okay this guy mod n equals one mod n so that's what sort of mod n on the side means like the whole equality is mod n yeah sorry for the notation yeah make sense all right so what this basically means for RSA is that we're going to uh pick some value e so e is
            • 11:30 - 12:00 going to be our encryption value and then from e we're going to generate D to be basically uh 1 over e mod phi of n and there's a you know Euclidean algorithm you can use to do this computation efficiently but in order to do this you actually have to know this phi of n which requires knowing the factorization of our number n into p and
            • 12:00 - 12:30 Q all right so finally so so RSA ends up being um sort of a system where the public key is this number n and this encryption exponent e so n and E are public and D should be private so then anyone can exponentiate a message to encrypt it for you but only you know this value D and therefore can decrypt messages and as long as you don't know this factorization of p and Q or of n
            • 12:30 - 13:00 into p and Q then you don't know what this phi value is and as a result it's actually kind of difficult to compute this D value so this is roughly what RSA is at a high level this roughly make sense all right so there's two things I want to talk about now that we at least have the basic notation for RSA um there's sort of tricks to use it correctly and sort of pitfalls in how to use RSA and then there's all kinds of implementation tricks tricks of how do you actually implement uh sort of code
            • 13:00 - 13:30 to do these exponentiations and do it efficiently and it's actually non-trivial because these are all large numbers these are you know thousand bit integers that you can't just do a multiply instruction for uh it's probably going to take the CPU a fair amount of time to do these operations all right so the first thing I want to mention is uh sort of various RSA uh pitfalls one of them we're actually going to rely on in a little bit um so one property uh is that it's
            • 13:30 - 14:00 multiplicative so what I mean by this is that suppose we have two messages you know suppose we have m0 and M1 and suppose I encrypt these guys so I encrypt m0 I'm going to get m0 to the power e mod n and if I encrypt M1 then I get M1 to the E mod
            • 14:00 - 14:30 n the problem is that or you know not necessarily a problem but could be a surprise to someone using RSA it's very easy to generate an encryption of m0 * M1 because you just multiply these two numbers so if you multiply these guys out you're going to get m0 M1 to the E mod n this is a correct encryption under this simplistic use of RSA for the value
            • 14:30 - 15:00 m0 * M1 I mean at this point it's not a huge problem because you aren't able to decrypt it you're just able to construct this encrypted message but it might be that the overall system you maybe allows you to decrypt certain messages and if it allows you to decrypt this message that you construct yourself maybe you can now go back and figure out what were these messages so it's maybe not a great plan to sort of be ignorant of this fact so this
            • 15:00 - 15:30 has certainly come back to bite a number of protocols that use RSA so that's one property we'll actually use it as a defensive mechanism towards the end of the lecture another property of RSA that you probably want to watch out for uh is the fact that it's deterministic so in this naive implementation that I described here if you take a message M and you encrypt it you're going to get m to the e mod n
            • 15:30 - 16:00 which is the deterministic function of the message so if you encrypt it again you'll get exactly the same encryption this is not surprising but it might not be a desirable property because if I see you send some message encrypted with RSA and I want to know what it is it might be hard for me to decrypt it but I could try different things and I can see well are you sending this message I'll encrypt it and let see if you get the same Cipher text and if so I'll know oh yeah that's what you encrypted because all I need to encrypt a message is the publicly known public key which is n and the number e so that's not so great uh
            • 16:00 - 16:30 and you might want to watch out for this property if you're actually using RSA so all these sort of Primitives are probably a little bit hard to use uh directly what people do in practice uh in order to avoid these problems with RSA is they encode the message in a certain way before encrypting it so instead of directly exponentiating a message they actually take some function of a message and then they encrypt
            • 16:30 - 17:00 that mod n and this function f uh the sort of right one to use these days is probably something called optimal asymmetric encryption padding OAEP you can sort of look it up but it's um some encoding that has two interesting properties first of all it injects randomness so you can sort of think of f of m as generating a 1,000 bit message that you're going to encrypt part of this message is going to be your message
            • 17:00 - 17:30 m in the middle here so that you can get it back when you decrypt of course but there's two interesting things you want to do you want to put in some Randomness here some value R so that when you encrypt the message multiple times you'll get different results out each time so then it's not deterministic anymore and in order to defeat this multiplicative property and other kinds of problems you're going to put in some fixed padding here you can sort of think of this as maybe an alternating sequence of one 0 one 0 one 0 you can do better things but it's some predictable sequence that you put in here and
            • 17:30 - 18:00 whenever you decrypt you make sure the sequence is still there if you did multiplication then it's going to destroy this bit pattern and then you should declare that oh someone tampered with my message and reject it and if it's still there then presumably sometimes provably no one tampered with your message and as a result you you should be able to accept it and treat message as being correctly encrypted by someone make sense yeah the attacker knows how
            • 18:00 - 18:30 big the pad is can't they can't they put a one in the lowest place and then a bunch of zeros so that they preserve the pad under multiplication yeah maybe it's a little bit tricky because this randomness is going to bleed over so the particular construction of this OAEP is a little bit more sophisticated than this uh but if you imagine uh this is like integer multiplication not sort of bitwise multiplication and uh so you know this randomness is going
            • 18:30 - 19:00 to bleed over somewhere and it's uh you can construct an OAEP scheme such that this doesn't uh happen this is a somewhat simplified version make sense all right so it turns out basically you shouldn't really use this RSA math directly you should use some library in practice that implements all these things correctly for you and use it just as an encrypt decrypt primitive but turns out these details will come in and matter for us because we're actually trying to figure out how how to break or how to attack an existing RSA
            • 19:00 - 19:30 implementation so in particular the attack from this paper is going to exploit the fact that the server is going to check for this padding when they get a message so this is how we're going to time the how long it takes a server to decrypt we're going to send some random message or some carefully constructed message but the message wasn't constructed by taking a real M and encrypting it we're going to construct a careful Cipher text integer
            • 19:30 - 20:00 value and the server is going to decrypt it it's going to decrypt to some nonsense and the padding is going to not match with a very high probability and immediately the server is going to reject it and the reason this is going to be good for us is because it'll tell us exactly how long it took the server to get to this point to just do the RSA decryption get this message check the padding and reject it so that's what we're going to be measuring in this attack from the paper that make sense so there's some Integrity component to the message that allows us to to time the decryption leading up to
            • 20:00 - 20:30 it all right so now let's talk about how do you actually Implement RSA so the core of it is really this exponentiation which is not exactly trivial to do as I was mentioning earlier because all these numbers are very large integers so the message itself is going to be a at least in this paper a thousand bit integer and the exponent itself is also going to be pretty large the encryption exponent is at least well known but the decryption exponent better be also a large integer also on the order of a thand bits so if
            • 20:30 - 21:00 you have a thousand bit integer you want to exponentiate it to another thousand bit integer power modulo some other thousand bit integer n that's going to be a little messy uh if you just do the naive thing so almost everyone has lots of optimizations in their RSA implementations to make this go a little bit faster and there's sort of four four optimizations that matter for the purpose of this attack there's actually more tricks that you could play but the most important ones are these so first there's something called the Chinese
            • 21:00 - 21:30 remainder theorem or CRT and just to remind you from like grade school or something high school maybe uh what uh this remainder theorem says um it actually says that if you have um two numbers and um so if you have some value x and you know that x is equal to a1 mod p and you know that x is equal to a2 mod q where p and q are prime
            • 21:30 - 22:00 numbers and this is you know this modular equality applied to the whole equation then it turns out that there's a unique solution to this mod pq so there is some you know x equal to some x prime mod pq and in fact there's a unique such x prime and it's actually very efficient to compute so the Chinese remainder theorem sort of also comes with an algorithm for
            • 22:00 - 22:30 how do you compute this unique x prime that's equal to x mod pq given the values a1 and a2 mod p and q respectively that make sense roughly okay so how could you use this Chinese remainder theorem to speed up modular exponentiation so the way this is going to help us is that if you notice all the time we're doing this computation of something mod n which is p * q and the
            • 22:30 - 23:00 Chinese remainder theorem says that well if you want the value of something mod P * Q it suffices to compute the value of that thing mod P and the value of that thing mod q and then use the Chinese remainder theorem to figure out the unique solution to what this thing is mod mod P * Q all right why is this faster seems like you're basically doing the same thing twice and then some more work Rec combine it is this Gonna Save anything
            • 23:00 - 23:30 yeah if T Q are small enough to F inside machine size well they're certainly smaller they're not that small right so P so N is a th000 bits p and Q are both 500 bits so they're not quite to the machine word size yet but it is going to help us because most of the stuff we're doing in in this computation is all these multiplications and roughly multiplication is quadratic in the size of the thing you're multiplying because because like you know the sort of grade school method of multiplication you take
            • 23:30 - 24:00 all the digits and you multiply by all the other digits in the number and as a result doing uh exponentiation and multiplication is roughly quadratic in the input size so if we shrink the value of P well we basically go from 1,000 bits to 512 bits we've reduced the size of our input by two so this means that all this multiplication exponentiation is going to be roughly four times cheaper so even though we do it twice each time is four times faster so overall the CRT optimization is going to
            • 24:00 - 24:30 give us basically a 2X performance boost for doing any RSA operations both on the encryption and decryption side that make sense roughly all right so that's the sort of first optimization that uh most people use um the second thing that um most implementations do is a technique called sliding windows
            • 24:30 - 25:00 and we'll sort of look at this optimization in two steps so this optimization is going to be concerned with what basic operations we're going to perform to do this exponentiation so suppose you have some ciphertext C that's now 500 bits because we're now doing mod P or mod Q so we have that 500 bit C and similarly you know roughly a 500 bit d as well so how do we raise C to the power D
            • 25:00 - 25:30 I guess the completely stupid way to do it is to take C and keep multiplying it D times but D is very big it's two to the 500 so that's never going to finish so a more amenable or more performant plan is to do what's called repeated squaring so that's sort of the step before sliding windows so this technique called repeated squaring
            • 25:30 - 26:00 kind of looks like this so if you want to compute c um to you know the power two uh X then you could actually compute C to the X and then Square it so in our naive plan Computing C to the 2x would have involved us making twice as many iterations of multiplying because it's multiplying C twice as many times but in fact you can be clever and just compute C to the X and then Square it later so this works well and you know this
            • 26:00 - 26:30 means that if you're computing C to some even exponent this works and conversely if you're computing C to some 2x + 1 then you can imagine this is just C to the x, squared, times another C so this is what's called repeated squaring and this now allows us to compute these exponentiations or modular exponentiations in a time that's basically linear in the size of the
            • 26:30 - 27:00 exponent so for every bit in the exponent we're going to either Square something or Square something and do an extra multiplication so that's sort of the plan for uh repeated squaring so now we can you know at least have non- embarrassing run times for computing modular exponents does this make sense why this is working and why it's faster all right so what's the sliding Windows trick that the paper talks about so this is a little bit uh more sophisticated
            • 27:00 - 27:30 than this repeated squaring business and basically the squaring is going to be pretty much inevitable but what the sliding windows optimization is trying to do is reduce the overhead of multiplying by this extra C down here so suppose you have some number that has several one bits in the exponent for every one bit in the exponent in the binary representation you're going to have to do this step instead of this step because for every sort of odd
            • 27:30 - 28:00 number you're going to have to multiply by C so these guys would like to not multiply by this c as often so the plan is to precompute different powers of C so instead of so what we're going to do is we're going to generate a table that says well here's the value of C to the x uh sorry uh C to the 1 here's the value of C to the 3 C to the 7 and I think in OpenSSL it goes up to C to the
            • 28:00 - 28:30 31st so this table is going to just be precomputed when you want to do some modular exponentiation you're going to precompute all the slots in this table and then when you want to do this exponentiation instead of doing the repeated squaring and multiplying by this C every time what you're basically going to do is use a different formula it says well if if you have C to the 32x plus some y well you can do C to the X and you can do repeated squaring you
            • 28:30 - 29:00 know very much like before this is to get to 32 this is like 5 powers of two here times C to the Y and C to the Y you can get out of this table so you can see that we're doing the same number of squarings as before here but we don't have to multiply by c as many times you're going to fish it out of this table and do several multiplies by C for the cost of a single multiply this make sense yeah how do you determine X Y in
            • 29:00 - 29:30 the first place how do you determine X and Y oh okay so yeah let's look at that so uh for for repeated squaring uh well actually in both cases what you want to do is you want to look at the exponent that you're trying to use in a binary representation so suppose I'm trying to compute the value of C to the exponent I don't know one 0 1 1 0 1 0 and maybe
            • 29:30 - 30:00 there's more bits okay so if we want to do repeated squaring then you look at the lowest bit here it's zero so what you're going to write down is this is equal to C to the 1 0 1 1 01 squared okay so now you know if only you knew this value then you could just Square it okay now we're going to compute this guy so C to the 1 0 1 1 0 1 is equal to well okay so here we can't use this rule because it's not 2x it's going to be 2 x + 1 so now we're going
            • 30:00 - 30:30 to write okay this is C to the 1 0 1 1 0 squared time another C because it's sort of this prefix * 2 plus this one at the end so that's how you fish it out for repeated squaring and for sliding window you just grab more bits from the low end so if you wanted to do the uh sliding window trick here instead of taking one C out suppose we do do uh you know instead of this giant table maybe we do three bits at a
            • 30:30 - 31:00 time so we go up to C to the 7th so here you would grab the first three bits and that's what you would compute here C to the 101 to the 8th power and then the rest is C to the 101 power here it's a little unfortunate these are the same thing but really there's like more bits here uh but here this is the thing that you're going to look up in the table this is C to the 5th in decimal and this is you're going to keep doing the
            • 31:00 - 31:30 sliding window to compute this value make sense all right so this just saves on how many times you have to multiply by C by pre-multiplying it a bunch of times and the OpenSSL guys at least 10 years ago thought that you know going up to the 32nd power was the best plan in terms of efficiency because there's some trade-off here right you know you you spend time pre-computing this table but then you know if the table is too giant you're not going to use some entries because you know if you run this table out to I don't know C to the 128 but
            • 31:30 - 32:00 you're computing just like 512-bit exponents maybe you're not going to use all these entries it's going to be a waste of time question other than uh timing side channels is there a reason not to compute the table lazily so you like allocate enough memory for 128 entries and then you just fill them in whenever you end up needing them um it ends up being the case that you don't want uh well there's two things going on one is that you'll have now code to check whether the entry is
            • 32:00 - 32:30 filled in or not and that'll probably reduce your branch predictor accuracy on the CPU so it'll run slower in the common case because you keep having to check whether the entry is there the other slightly annoying thing is that turns out this entry leaks stuff through a different side channel namely cache access patterns so if you have some other process on the same CPU you can sort of see which cache addresses are getting evicted out of the cache or are slower because someone accessed this
            • 32:30 - 33:00 entry or this entry and the bigger this table gets the easier it is to tell what the exponent bits were like in the limit this table is you know gigantic and just telling just being able to tell which uh cache address on a CPU you know had a Miss tells you that the encryption process must have accessed that entry in the table and tells you that oh that long bit sequence appears somewhere in your secret key exponent um so I guess the answer is
            • 33:00 - 33:30 mathematically you could totally fill this in on demand uh in practice you probably don't want it to be that giant and also if it's particularly giant then you're not going to be able to reuse entries as efficiently as well you can sort of reuse these entries as you're computing it right it's not actually that expensive because you use C cubed when you're computing C to the 7th and so on and so forth um so it's not that bad make sense other questions all right so this is the sort
            • 33:30 - 34:00 of repe uh repeated squaring and sliding window optimization that uh OpenSSL uh implements still does I don't actually know whether they still have the same size of the sliding window or not but it does actually give you a fair bit of speedup so before you had to square for every bit in the exponent and then you'd have to have a multiply for every one bit so if you have a 500 bit exponent then you're going to do 500
            • 34:00 - 34:30 squarings and on average roughly 256 multiplications by C so with sliding windows you're going to still do the 512 squarings because there's no sort of getting around that but instead of doing 256 multiplies by C you're going to hopefully do way fewer uh maybe something on the order of like 32 maybe multiplies by some entry in this table so that's sort of the general plan or that's why it sort of saves something not as dramatic as CRT not 2x
            • 34:30 - 35:00 but could save you like close to 1.5x maybe um well depending on exactly what uh the numbers look like make sense other questions about this all right so these are the sort of relatively easier optimizations um and then there's two sort of clever tricks playing with numbers for how to do just a multiplication more efficiently so the first one of these
            • 35:00 - 35:30 optimizations that we're going to look at um I think I'll erase this board um is called this Montgomery representation and we'll see in a second why it's particularly important for us so the problem that this Montgomery representation op ization is trying to solve for us is the fact that every time
            • 35:30 - 36:00 we do a multiply we get a number that keeps growing and growing and growing in particular you know both uh or in sliding windows or in repeated squaring actually you when you square you do multiply two numbers together when you multiply by C to the Y you multiply two numbers together and the problem is that if the inputs to the multiplication were let's say 512 bits each then the result of the multiplication is going to be a
            • 36:00 - 36:30 thousand bits and then you take this 1,000 bit result and you multiply it again by something like 512 bits now it's 1500 bits 2,000 bits 2500 bits and it keeps growing and growing and we really don't want this because multiplication is quadratic in the size of the number we're multiplying so we have to keep the size of our number as small as possible which means basically 512 bits because all this computation is mod P or mod Q yeah what you want to find is just the remainder so
            • 36:30 - 37:00 you can just take the remainder out that's right yeah so the cool thing is that we can keep this number down because what we do let's say we want to compute you know C to the X just from this example squared squared again you know squared again so what you could do is you compute C to the X then you take it mod P let's say right then you square it then you do mod P again then you square it again and then you do mod P again and so on so this is basically what you're proposing so this is great and in fact it keeps
            • 37:00 - 37:30 the size of our numbers to basically 512 bits which is as small as we could get this is good in terms of keeping down the size of these numbers for multiplication but it's actually kind of expensive to do with this mod P operation because the way that you do mod P of something is you basically have to do division and division is way worse than multiplication I'm not going to go through sort of the various algorithms for division but it's uh really slow you
            • 37:30 - 38:00 really want to avoid division as much as possible because it's not even just sort of a straightforward quadratic thing you have to do some approximation algorithm sort of like Newton's method of some sort and just keep iterating it's going to be slow and in the naive implementation this actually turns out to be the slowest part of doing multiplications the multiplication is cheap but then doing mod P or mod Q to bring it back down in size is going to be actually more expensive than the multiply so that's actually kind of a
            • 38:00 - 38:30 bummer so the way that we're going to get around this is by doing this multiplication in this clever other representation and I'll sort of show you the trick here um let's see sort of well bear with me for a second and then we'll sort of at the end see why it's actually so fast to use this Montgomery trick and the the basic idea is to represent numbers so these are like regular numbers that you might actually want to multiply and we're going to have a
            • 38:30 - 39:00 different representation for these numbers called the Montgomery representation and the representation is actually very easy uh we just take the value a and we multiply it by some magic value R I'll tell you what this R is in a second uh but let's first figure out if we picked some arbitrary value R what's going to happen here so we take two numbers A and B their representations are you know sort of expectedly a is a R b is b R and if you
            • 39:00 - 39:30 want to compute the product of a * b well in Montgomery space you can also multiply these guys out you can take a R multiply by b R and what you get here is a b * R squared so there's two Rs now that's kind of annoying but you can sort of divide it back by R and we get a b * R so this is probably weird in the sense that why why would you multiply by this
            • 39:30 - 40:00 extra number but let's first figure out whether this is correct and then we'll figure out why this is going to be faster so it's correct in the sense that it's very easy if we want to multiply some numbers we just multiply by this R value and we get the Montgomery representation then we can do all these multiplications on these Montgomery forms and it's just every time we multiply two numbers we have to divide by R we'll get the Montgomery form of the multiplication result and then when we're done doing all of our squarings multiplications and all this stuff we're going to move back to the normal
            • 40:00 - 40:30 regular form by just dividing by R One Last Time nice number yeah so we're going to now pick R to be a very nice number and in particular we're going to pick R to be a very nice number to make this division by R very fast and the cool thing is that if this division by R is going to be very fast then this is going to be a small number and we're not going to have to do this mod Q very often in particular AR let's
            • 40:30 - 41:00 say is also going to be roughly 500 bits because it's all actually mod P or mod Q so AR is 500 bits BR is going to be also 500 bits so this product is going to be a th000 bits this R is going to be this nice 500 roughly bit number same size as p and if we can make this division to be fast then the result is going to be roughly a 500 bit number here so we were able to do the multiply without having to do an extra divide dividing by R
            • 41:00 - 41:30 cheaply gives us the small result getting us out of doing a mod P for most situations okay so what is this weird number that I keep talking about well R is just going to be 2 to 512 so it's going to be like one followed by a ton of zeros so multiplying by it is easy you just append a bunch of zeros to a number dividing you know could be easy if the
            • 41:30 - 42:00 low bits of the result are all zeros right so if you have a value that's a bunch of bits followed by 512 zeros then dividing by 2 to the 512 is cheap you just discard the zeros on the right hand side and that's actually a correct division does that make sense so slight problem is that we actually don't have the zeros on the right hand side when we do this multiplication we're going to get these are like real 512 bit numbers with all the 512 bits used so this will be a thousand bit number with all the bits also sort of set to randomly
            • 42:00 - 42:30 zero or one depending on what's going on so we can't just discard the low bits but the cleverness comes from the fact that the only thing we care about is the value of this thing mod P so we can always add multiples of P to this value without changing it what it's equivalent to mod P and as a result we can add multiples of P to get the low bits to be all zeros so let's look through some simple
            • 42:30 - 43:00 examples I'm not going to write out 512 bits on the board uh but uh suppose that here's a sort of a short example suppose that we have a situation where our value R is 2 to 4th so it's 1 followed by 4 zos this is a much smaller example than the real thing but let's see actually how this Montgomery division is going to work out so suppose we're going to try to compute Stuff Mod Q uh where Q let's say is
            • 43:00 - 43:30 maybe 7 so this is one one one in binary form and what we're going to try to do um is maybe we did some multiplication and this value a r * BR is equal to uh this binary representation 1 1 0 1 0 so this is going to be sort of the value of a r * BR how do we divide it by R so
            • 43:30 - 44:00 clearly the low four bits aren't all zero so we can't just divide it out but we can add multiples of Q in particular we can add 2 * Q so 2 Q is equal to 1 1 1 0 and now what we get is 0 0 carry a one zero carry a one one uh carry a 1 0 1 hope I did that right so
            • 44:00 - 44:30 this is what we get so now we get a R b R plus 2 Q but we actually don't care about the plus 2 Q it's actually fine because all we care about is the value mod well Q or something right and now we're closer we have three zero bits at the bottom now we can add another multiple of Q this time it's going to be probably 8 Q so we add one one one here 0 0 0 and if we add it we're going to get let's say 0 0 0 at the end add these two
            • 44:30 - 45:00 guys zero carry a one zero carry a one one one I think that's right but now we have our original a R b R plus 2 Q + 8 Q is equal to this thing and finally we can divide this thing by R very cheaply because we just discard the low four zeros make sense question is a R b R always going to end in I
            • 45:00 - 45:30 guess 4 zeros no and the reason is that a ah okay yeah here's the thing that's maybe confusing a was let's say 512 bits then you multiply it by R so here you're right this value is a 1,000 bit number where the high bits are a the high 512 bits are a and the low bits are all zeros but then you're going to do it mod Q to bring it down to make it smaller and in general this is going to be the case because it only sort of has these
            • 45:30 - 46:00 low zeros the first time you convert it but after you do a couple of multiplications they're going to be arbitrary bits um so these guys right so so yeah I really should have written you know mod Q here and you compute this mod Q as soon as you do the conversion uh to keep the whole value small make sense other question expensive yeah so the initial conversion is expensive or at least it's as expensive as doing a regular mod modulus during the multiplication the cool thing
            • 46:00 - 46:30 is that you pay this cost just once when you do the conversion into Montgomery form and then instead of converting it back at every step you just keep it in Montgomery form so remember that in order to do an exponentiation to an exponent which has 500 bits we are saying we're going to have to do you know over 500 multiplications because we have to do at least 500 squarings plus then some so you do these mod Q twice and then you get a lot of cheap divisions if you stay in this form and
            • 46:30 - 47:00 then you do a division by R to get back to this form again so instead of doing 500 mod Q's for every multiplication step you do it twice mod Q and then you keep doing these divisions by R cheaply using this trick question um so when you're adding the multiples of Q and then dividing ah because it's actually mod Q means the remainder when you divide by Q so you know X plus you know y * Q mod
            • 47:00 - 47:30 Q is just X so in this case uh dividing by so another sort of nice property is that because it's all modulo a prime number um it's also true that if you have x + yq divided by R mod Q this is actually the
            • 47:30 - 48:00 same thing as x / R mod Q uh the way to sort of think of it is that there's no real division in modular arithmetic it's just an inverse so what this really says is this is actually x + yq times some number called R inverse and then you compute this whole thing mod q and then you could sort of think of this as well x * R inverse mod Q Plus y q r inverse mod
            • 48:00 - 48:30 q and this thing cancels out because it's something times Q make sense all right and uh you know there's some closed form for this thing so here I did it like bit by bit okay well 2 Q then 8 Q etc there's actually a nice closed formula that you can compute it's in the lecture notes but it's not probably worth spending time on the board here for how do you figure out what multiple
            • 48:30 - 49:00 of Q should you add to get all the low bits to turn to zero so then it turns out that in order to do this division by R you just need to compute this magic multiple of Q add it and then discard the low bits and that brings your number back to 512 bits or whatever the sizes make sense okay and here's the subtlety like the only reason we're talking about this is that there's something funny going on here that is going to allow us
            • 49:00 - 49:30 to learn timing information and in particular even though we divide it by R we know the result is going to be 512 bits but it still might be greater than Q because Q isn't exactly 2 to 512 it's some 512 bit number so it might be a little bit less than R so it might be that after we do this cheap division by R we might have to subtract out Q one more time because we get something that's small but not quite small enough
            • 49:30 - 50:00 so there's a chance that after doing this division we maybe have to also you know subtract Q again and this subtraction is going to be part of what this attack is all about so it turns out that subtracting this Q adds time and someone figured out uh not these guys but some previous work actually showed that this probability of doing this thing this is called an extra
            • 50:00 - 50:30 reduction um this probability uh sort of depends on the particular value that you're exponentiating so if you're computing x to the D mod Q the probability of an extra reduction at some point while computing x to the D mod Q is going to be equal to x mod Q divided by 2
            • 50:30 - 51:00 R all right so let me just of separate out so if we're going to be Computing x to the D mod Q then depending on what the value of x mod Q is whether it's big or small you're going to have either more or less of these extra reductions and just to show you where where this is going to fit in this is actually going to happen in the decrypt step because during the decrypt step the server is going to be Computing C to the D and this says that well the extra reductions are going to be proportional
            • 51:00 - 51:30 to how close X or C in this case is to the value Q so this is going to be worrisome right because the attacker gets to choose the input C and the number of extra reductions is going to be proportional to how close the C is to one of the factors the Q and this is how you're going to tell that oh I'm getting close to the Q or I've overshot Q and all of a sudden there's no extra reductions probably because x mod Q is very small because X is Q plus a little epsilon and it's very small so that's
            • 51:30 - 52:00 one part of the timing attack we're going to be looking at in a second make sense I don't have any proof that this is actually true but we'll sort of take it on faith that these extra reductions work like this yeah question what happens if you don't do this extra reduction ah what happens if you don't do this extra reduction yeah you know it's you can avoid this extra reduction uh and then you're just gonna have to do
            • 52:00 - 52:30 some extra you know probably modular reductions later I think the math just works out nicer this way for uh the Montgomery form I think for many of these things it's actually like once you look at it as a timing channel then you're thinking well yeah yeah maybe we should just like not do this at all or maybe we should uh do some other plan so you're right I think you can probably avoid this extra reduction and probably just do the mod Q perhaps at the end I haven't actually tried implementing this but seems like it could work uh it might be that you just have to do mod Q once at the end which you'll have to probably
            • 52:30 - 53:00 do anyway so it's not super clear uh maybe it saves you a little bit of something but uh probably not huge so in light of the fact that the sty attack exists maybe you should avoid this U but I I actually I shouldn't speak authoritative to this I haven't actually tried implementing this so maybe there's some like deep reason why this extra reduction has to happen I couldn't think of one but uh all right other questions all right so here's the last piece of the puzzle for how op SSL this library
            • 53:00 - 53:30 that this paper attacks, implements multiplication. This Montgomery trick is great for avoiding the mod q part during modular multiplication, but then there's the question of how you actually multiply two numbers together, so we're going lower and lower level. Suppose you have a raw multiplication, so this isn't even modular multiplication: suppose you have two
            • 53:30 - 54:00 numbers a and b, and both of them are 512-bit numbers. How do you multiply them together when your machine is only a 32-bit machine, like the machines in the paper, or a 64-bit machine, which is the same story? How would you implement multiplication of these values? Any suggestions? Well, there's a straightforward plan, which is that you just represent a and b
            • 54:00 - 54:30 as a sequence of machine words and then do the quadratic product of the two. Let me do a simple example: instead of 512-bit numbers, let's think of these as 64-bit numbers on a 32-bit machine. The value a is going to be represented by two 32-bit words, call them a1 and a0, where a0 is the low bits and a1 is the high bits, and similarly we're going to
            • 54:30 - 55:00 represent b as two words, b1 and b0. Then a naive way to compute a*b is to multiply all of these out, so you get a product with three word-aligned terms: the high term is a1*b1, the low term is a0*b0, and the middle term is a1*b0 plus a0*b1.
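            As a quick Python sketch, just to illustrate the identity on the board (not OpenSSL's code): with 32-bit limbs a = a1*2^32 + a0 and b = b1*2^32 + b0, the naive two-word product costs four word multiplies:

                def schoolbook_2x2(a1, a0, b1, b0):
                    # a*b = a1*b1*2^64 + (a1*b0 + a0*b1)*2^32 + a0*b0
                    return (a1 * b1 << 64) + ((a1 * b0 + a0 * b1) << 32) + a0 * b0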
            • 55:00 - 55:30 So this is how you do the multiplication. Make sense? Yeah, question? "Are you getting to the Karatsuba method?" Yes, so there is a clever alternative method for doing this multiplication which doesn't involve four steps; here we have to do four multiplications, and there's this clever other method, Karatsuba. I don't know, do they teach it in 6.01 or something these days? In 042? Excellent. It's a very nice method, and almost every cryptographic library implements it. And for those of you who
            • 55:30 - 56:00 weren't undergrads here, since we have grad students, maybe you haven't seen Karatsuba, so I'll just write it out on the board. It's a clever thing the first time you see it. What you do is compute three values: you compute a1*b1, you also compute (a1 - a0) times
            • 56:00 - 56:30 (b1 - b0), and a0*b0. That's three multiplications instead of four, and it turns out you can actually reconstruct the full product from these three multiplication results. The particular way you do it, well, let me write it out in a different form. We're going to have
            • 56:30 - 57:00 2^64 times, sorry, (2^64 + 2^32) times a1*b1, plus 2^32 times minus that little term in the middle,
            • 57:00 - 57:30 (a1 - a0)*(b1 - b0), and finally (2^32 + 1) times a0*b0. In other words, a*b = (2^64 + 2^32)*a1*b1 - 2^32*(a1 - a0)*(b1 - b0) + (2^32 + 1)*a0*b0. It's a little messy, but if you work through the details you'll hopefully convince yourself that this value is exactly the same as the one above. It's a little clever, but it saves you one
            • 57:30 - 58:00 multiplication. The way you apply this to much larger multiplications is that you just keep going down recursively: if you have 512-bit values, you break it down into three 256-bit multiplications, and each of those you do using the same Karatsuba trick recursively, until eventually you get down to machine word size, which you can handle with a single machine instruction, and then you reconstruct the value at the top. Make sense?
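            Here's a toy Python version of the recursive Karatsuba split just described (a sketch of the idea, not OpenSSL's implementation), using the same subtraction form of the identity as on the board:

                def karatsuba(a, b, bits=512):
                    if bits <= 32:                        # machine-word size: one hardware multiply
                        return a * b
                    half = bits // 2
                    mask = (1 << half) - 1
                    a1, a0 = a >> half, a & mask          # split into high/low halves
                    b1, b0 = b >> half, b & mask
                    hi = karatsuba(a1, b1, half)          # a1*b1
                    lo = karatsuba(a0, b0, half)          # a0*b0
                    mid = karatsuba(abs(a1 - a0), abs(b1 - b0), half)
                    sign = -1 if (a1 >= a0) == (b1 >= b0) else 1   # sign of the -(a1-a0)*(b1-b0) term
                    # (2^bits + 2^half)*hi - 2^half*(a1-a0)*(b1-b0) + (2^half + 1)*lo
                    return (((1 << bits) + (1 << half)) * hi
                            + sign * (1 << half) * mid
                            + ((1 << half) + 1) * lo)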
            • 58:00 - 58:30 So what's the timing attack here? How do these guys exploit this Karatsuba multiplication? Well, it turns out that OpenSSL worries about basically two kinds of multiplications it might need to do. One is a multiplication between two large numbers that are about the same size. This happens a lot during modular exponentiation, because all the values we're multiplying are roughly 512 bits in size, so when we're
            • 58:30 - 59:00 multiplying by c^y or doing a squaring, we're multiplying two things that are about the same size, and then this Karatsuba trick makes a lot of sense, because instead of computing the product in time quadratic in the input size, Karatsuba is roughly n^1.58, something like that, so it's much faster. But then there's this other situation where OpenSSL might be multiplying two numbers that are very different in size, one very big and one very small, and in that case you could use
            • 59:00 - 59:30 Karatsuba, but it's actually going to be slower than doing the naive thing. Suppose you're multiplying a 512-bit number by a 64-bit number: you'd rather just do the straightforward thing, where you multiply by each of the words in the 64-bit number; that's 2n instead of n^1.58. So as a result, the OpenSSL guys try to be clever, and that's where problems often start. They decided to switch dynamically between this efficient Karatsuba thing and this
            • 59:30 - 60:00 grade-school method of multiplication. Their heuristic was basically: if the two things you're multiplying are exactly the same number of machine words, so they have the same number of bits up to 32-bit units, then do Karatsuba; and if the two things being multiplied have a different number of 32-bit units, then do the quadratic, straightforward, normal multiplication. And there you can
            • 60:00 - 60:30 see that if your number all of a sudden switches to being a little bit smaller, then you're going to switch from the efficient method to the other multiplication method, and presumably the cutoff isn't going to be exactly smooth, so you'll be able to tell: oh, all of a sudden it's now taking a lot longer, or a lot less time, to multiply than before. And that's what these guys exploit in their timing attack. Again, does that make sense, what's going on with those multiplies?
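            A small sketch of that switching heuristic (hypothetical Python, reusing the karatsuba sketch above for the equal-size branch; this only illustrates the dispatch, not OpenSSL's real routines):

                WORD_BITS = 32

                def words(x):
                    # number of 32-bit machine words needed to represent x
                    return max(1, (x.bit_length() + WORD_BITS - 1) // WORD_BITS)

                def schoolbook(a, b):
                    # grade-school multiply: one word of a at a time, quadratic overall
                    result, shift = 0, 0
                    while a:
                        result += ((a & 0xFFFFFFFF) * b) << shift
                        a >>= WORD_BITS
                        shift += WORD_BITS
                    return result

                def multiply(a, b):
                    # same number of machine words -> Karatsuba (~n^1.58);
                    # different number of words    -> quadratic grade-school method
                    if words(a) == words(b):
                        return karatsuba(a, b)
                    return schoolbook(a, b)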
            • 60:30 - 61:00 All right, so I think I'm now done telling you about all the weird implementation tricks that people play when implementing RSA in practice. Now let's try to put them back together into an entire web server and figure out how you tickle all these interesting bits of the implementation from the input network packet. What happens in a web server is that the server, if you remember from the HTTPS lecture, has a
            • 61:00 - 61:30 secret key, and it uses the secret key to prove that it's the correct owner of its certificate in this HTTPS protocol, or in TLS. The way this works is that the client sends some randomly chosen bits, encrypted using the server's public key, and the server in this TLS protocol decrypts the message; if the message checks out, it uses those random bits to establish a session key. But in this case
            • 61:30 - 62:00 the message isn't going to check out: the message is carefully chosen, the padding bits aren't going to match, and the server is going to return an error as soon as it finishes decrypting our message. That's what we're going to time here. So the server, you can think of it as Apache with OpenSSL, is going to get a message from the client, and you can think of this as a ciphertext c, or a hypothetical ciphertext that the client might have produced. And the first thing
            • 62:00 - 62:30 we do with a ciphertext c is decrypt it, using roughly this formula. If you remember, the first optimization we're going to apply is the Chinese remainder theorem, so the first thing we do is split our pipeline into two parts: we do one thing mod p, another thing mod q, and then recombine the results at the end of the day. So the first thing we do is take c and compute, let's call it c0, which is going to be equal to c
            • 62:30 - 63:00 mod q, and we're also going to have a different value, call it c1, which is going to be c mod p. Then we do the same thing to each of these values to compute c^d mod p and c^d mod q. And here, right after the CRT split, we're going to switch into
            • 63:00 - 63:30 Montgomery representation, because that's going to make our multiplies very fast. So the next thing OpenSSL does to your number is compute what I'll denote c0', which is c0 * R mod q, and the same thing down below; I'm not going to write out the bottom pipeline, it's exactly the same. And now that we've switched into Montgomery form, we can finally do our
            • 63:30 - 64:00 multiplications, and here's where we're going to use the sliding-window technique. Once we have c0', we can try to compute c0' raised to the d, mod q, and as we're computing this value to the d we're going to use sliding windows. So here we're doing sliding windows over the bits of this d
            • 64:00 - 64:30 exponent, and we're also going to do Karatsuba or regular multiplication depending on exactly what the sizes of our operands are. So if it turns out that the things we're multiplying, c0' and maybe the previously squared result, are the same size, we do Karatsuba; if c0' is tiny but the previous thing we're
            • 64:30 - 65:00 multiplying it by is big, then we do quadratic, normal multiplication. So there are sliding windows coming in here, we also have this Karatsuba-versus-normal choice, and also in this step the extra reductions come in, because at every multiply the extra reductions are going to be proportional to the thing we're exponentiating, mod q. So just to
            • 65:00 - 65:30 plug in that formula over here: the probability of an extra reduction is going to be proportional to the value of c0' mod q, divided by 2R. Make sense? So this is where the really timing-sensitive bit comes in, and there are actually two effects here: there's the Karatsuba-versus-normal choice, and then there's the number of extra reductions you're going to be
            • 65:30 - 66:00 making. Okay, we'll see how we exploit this in a second, but once you get this result mod q, you're going to get a similar result mod p, and you can finally recombine the top and the bottom using CRT. And what you get out of CRT is, sorry, I guess we first need to convert back down into non-Montgomery form, so first we're going to get
            • 66:00 - 66:30 c0' to the d, divided by R, mod q, and because c0' was c0 * R mod q, if we do this we get back out our value of c^d mod q. So we get c^d mod q here, we get c^d mod p on the bottom version of this pipeline, and we can use CRT to get the value of c
            • 66:30 - 67:00 to the d mod n; sorry for the small font size, but roughly it's the same thing we're expecting here. We finally get our result, and we get our message m. So the server takes an incoming packet, runs it through this whole pipeline, does the two halves of the pipeline, and ends up with a decrypted message m that's equal to c^d mod n. Then it's going to check the padding on this message, and in this particular
            • 67:00 - 67:30 attack, because we carefully construct the value c, the padding is not going to match up; we choose c according to other heuristics, not by encrypting a real message with the correct padding. So the padding is going to be a mismatch, the server is going to immediately report an error back to the client and close the connection, and that's the time we're going to measure to figure out how long this whole pipeline took. Does that make sense? Any questions about this pipeline, putting all these optimizations together?
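            Here's a rough Python sketch of that pipeline (CRT split, per-prime exponentiation, recombination). Plain big-integer pow() stands in for OpenSSL's sliding-window, Montgomery, and Karatsuba internals, and the Garner-style recombination is one standard way to do the CRT step:

                def rsa_decrypt_crt(c, p, q, d):
                    dp, dq = d % (p - 1), d % (q - 1)   # per-prime exponents
                    c1, c0 = c % p, c % q               # CRT split of the ciphertext
                    m_p = pow(c1, dp, p)                # bottom pipeline: c^d mod p
                    m_q = pow(c0, dq, q)                # top pipeline: c^d mod q (the
                                                        #   timing-sensitive half here)
                    q_inv = pow(q, -1, p)               # recombine: m = c^d mod (p*q)
                    h = ((m_p - m_q) * q_inv) % p
                    return m_q + h * q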
            • 67:30 - 68:00 "The c0 and c1, when you assemble them back..." Yeah, you're probably right: c1 to the d, c0 to the d, right, this should be c0, correct. Question? "When you divide by R, isn't there the question of how many q's you have to add to get the zeros,
            • 68:00 - 68:30 and also an extra reduction?" Yeah, so there might be extra reductions in this final phase as well; you're right. We have to do this divide by R correctly, so we probably have to do exactly the same thing as we saw for the Montgomery reductions when we do this divide by R to convert back: it's not clear exactly how many q's we should add, so we figure out how many q's to add, add that many, kill the low zeros, and then do the mod q again, maybe with an extra reduction. So you're absolutely right, this is exactly the
            • 68:30 - 69:00 same kind of divide by R mod q as we do for every Montgomery multiplication step. Make sense? All right, any other questions? All right, so how do we exploit this? How does an attacker actually figure out what the secret key of the server is by measuring the time of this entire pipeline? These guys have a plan that basically involves guessing one bit of
            • 69:00 - 69:30 the private key at a time. And what they actually mean by guessing the private key: you might think the private key is this decryption exponent d, because e and n are the public key and the only thing you don't know is d. But in fact, in this attack they don't go for the exponent d directly; that's a little harder to guess. Instead they go for the value q, or the value p, it doesn't really matter which one. Once you guess what p or q is, then,
            • 69:30 - 70:00 given n, you can factor it into p * q. And if you know the values p and q, you can compute that phi function we saw before, and that allows you to get the value d from the value e. So this factorization of the value n is hugely important; it has to stay secret for RSA to remain secure. And these guys are going to try to guess what the value of q is by timing this pipeline.
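            To spell out why recovering q is enough (a sketch of the standard RSA math, in Python):

                def private_key_from_factor(n, e, q):
                    p = n // q                    # the other factor of n
                    assert p * q == n
                    phi = (p - 1) * (q - 1)       # Euler's phi of n
                    d = pow(e, -1, phi)           # private exponent: d = e^(-1) mod phi(n)
                    return d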
            • 70:00 - 70:30 All right, so how do they actually do it? Well, they construct carefully chosen inputs c into this pipeline, and, as I keep saying, they keep measuring the time for it. There are two parts to the attack: you have to bootstrap it a little bit to guess the first couple of bits, and then once you have the first couple of bits you can guess the next bit. Let me not
            • 70:30 - 71:00 say exactly how they guess the first couple of bits, because it's actually much more interesting to see how they guess the next bit; we'll come back, if we have time, to how they guess the first couple of bits, and they say it in the paper. Basically, suppose you have a guess g about what the bits of this value q are. So you know that q has some bits g0, g1, g2, etc., and actually these are not even
            • 71:00 - 71:30 g's, these are the real bits of q, so let me write it that way: you know q bit zero, q bit one, q bit two, the highest bits of q, and then you're trying to guess lower and lower bits. So suppose you know the value of q up to bit j, and from that point on your guess is all zeros; you have no idea what those bits are. These guys are going to try to get this guess g into the place in
            • 71:30 - 72:00 the pipeline where there are two timing effects: this choice of Karatsuba versus normal multiplication, and this different number of extra reductions depending on the value of c0'. They're actually going to try to get two different guess values into that place in the pipeline: one that looks like this, and one that they call g_hi, which has all the same high
            • 72:00 - 72:30 bits q0, q1, q2, up through bit j, and for the next bit, which they don't know, guess g has a zero while g_hi has a one there, and all zeros after that. So how does this help them figure out what's going on? Suppose we get this guess g to be the value of c0'; you can think of this g and g_hi as being the c0' value on that left board over there.
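            A small sketch of how the two guesses might be built, with known_high_bits holding the j recovered top bits of q as an integer (a hypothetical helper, not code from the paper):

                def make_guesses(known_high_bits, j, qbits=512):
                    # g:    recovered high bits, then bit j = 0, then all zeros
                    # g_hi: recovered high bits, then bit j = 1, then all zeros
                    g = known_high_bits << (qbits - j)
                    g_hi = g | (1 << (qbits - j - 1))
                    return g, g_hi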
            • 72:30 - 73:00 It's actually fairly straightforward for them to do this, because c0' is computed pretty deterministically from the input ciphertext c0: you just multiply it by R. So in order to get some value in here as a guess, they just need to take their guess and first divide it by R, so divide by 2^512 mod something, and then inject that; the server is going
            • 73:00 - 73:30 to multiply by R, and off you go. Make sense? All right, so suppose we managed to get our particular chosen integer value into that c0' spot. What's going to be the time to compute c0' to the d mod q? There are two possible options for where q falls in this picture. It might be that q is between these two
            • 73:30 - 74:00 values, because the next bit of q is zero, so this value is going to be less than q but this one is going to be greater than q; that happens if the next bit of q is zero. Or it might be that q lies above both of these values, if the next bit of q is one. So now we can try to tell, okay, what's going to be the timing of decrypting these two values if q lies in between them, or if q lies above both of
            • 74:00 - 74:30 them. Let's look at the situation where q lies above both of them. Well, in that case everything is pretty much the same, right, because both of these values are smaller than q, so the value of these things mod q is going to be roughly the same; they're a little different because of that extra bit, but more or less they're the same magnitude. And the number of extra reductions is also probably not going to be hugely
            • 74:30 - 75:00 different, because it's proportional to the value of this thing mod q, and both of these values are a little bit smaller than q, so they're about the same; neither of them is going to exceed q and all of a sudden have many fewer extra reductions. So if q is greater than both of these guesses, then Karatsuba versus normal is going to stay the same; the server is going to do basically the same thing for both g and g_hi in terms of Karatsuba versus normal, and the server is going to do about the same number of extra reductions for both of them as
            • 75:00 - 75:30 well. So if you see that the server takes the same amount of time to respond to these two guesses, then you should probably guess that q has a one bit here. On the other hand, if q lies in the middle, then there are two possible things that could trigger and change the timing. One possibility is that, because g_hi is just a little bit larger than q, the number of extra reductions is going to be proportional to this value mod q, which is very small, because c0' is q plus just a little bit in these
            • 75:30 - 76:00 extra bits. So the number of extra reductions is going to plummet, and all of a sudden it will be faster. Another thing that could happen is that maybe the server will decide, oh, now it's time to do normal multiplication instead of Karatsuba: maybe for this value c0' had the same number of bits as q, but if g_hi is above q, then g_hi mod q is potentially
            • 76:00 - 76:30 going to have fewer bits, and if that crosses a 32-bit boundary, then the server is suddenly going to do normal multiplication. That pushes in the other direction: if you cross over, normal multiplication kicks in and things get a lot slower, because normal multiplication is quadratic instead of the nicer, faster Karatsuba. Question? Yeah, because the number of extra reductions is proportional, from the formula above, to c0' mod q. So if c0',
            • 76:30 - 77:00 which is this value, is just a little over q, then c0' mod q is tiny; as opposed to this one, which is basically the same as q, or all the high bits are the same as q, and then it's big. So there will be this difference that you can try to measure. There's one interesting thing here, actually a couple of interesting things: these effects work in different directions. If you hit a 32-bit boundary and Karatsuba versus normal switches, then all of a sudden it takes much longer to decrypt this message; on the other hand, if
            • 77:00 - 77:30 it's not at a 32-bit boundary, maybe the extra-reduction effect is what tells you what's going on. So you have to watch for different effects: if you're not guessing a bit that's at a multiple of 32 bits, then you should probably expect the time to drop because of extra reductions; on the other hand, if you're trying to guess a bit that is at a multiple of 32, then maybe you should expect the time to jump a lot, or maybe drop if it was already doing normal multiplication. So what these guys say in the paper is that it doesn't really matter
            • 77:30 - 78:00 whether there's a jump up or a jump down in time: if the next bit of q is one, you should expect these two guesses to take almost the same amount of time, and if the next bit of q is zero, you should expect them to have a noticeable difference, whether it's big or small, positive or negative. So they measure this, and it turns out to work pretty well. They have to do two interesting tricks to make this work out.
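            That decision rule, as a tiny sketch (the threshold is a tuning parameter I'm assuming, not a value from the paper):

                def guess_next_bit(time_g, time_g_hi, threshold):
                    # Similar times => q lies above both guesses => the next bit of q is 1.
                    # A noticeable gap (in either direction) => q lies between g and g_hi
                    # => the next bit of q is 0.
                    return 1 if abs(time_g - time_g_hi) < threshold else 0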
            • 78:00 - 78:30 If you remember, the timing difference was tiny, on the order of 1 to 2 microseconds, so it's going to be hard to measure this over a network, over an Ethernet switch for example. So what they do is two kinds of measurement, two kinds of averaging. For each guess that they send, they actually send it several times; in the paper they say they send it something like seven times. So what kind of noise do you think this helps them with, just resending the same guess over and over? Yeah, the jitter, the
            • 78:30 - 79:00 latency, right. So if the network keeps adding different latency, you just try the same thing many times; the thing at the server should take exactly the same amount of time every time, so you just average out the network noise. In the paper they say they take the median value; I actually don't understand why they take the median, I think they should be taking the minimum, since that's the real thing that's going on. But anyway, this lets them average out the network noise. But then they do this other odd thing, which is that when they're sending a guess, they don't just send the same guess seven times; they actually send a neighborhood
            • 79:00 - 79:30 of guesses, and each value in the neighborhood itself gets sent seven times. So they send g seven times, then g + 1 also seven times, then g + 2 also seven times, and so on, up to g + 400 in the paper. Why do they do this kind of averaging over different g values as well, instead of just sending g 7 times 400
            • 79:30 - 80:00 times? It seems more straightforward. Yeah, so what's going on is that we're trying to measure exactly how long this one piece of the computation takes, but there's lots of other stuff; for example, the other pipeline at the bottom is doing all the work mod p, and it's also going to take different amounts of time depending on exactly what the input is. The cool thing is that if you perturb the value of your guess g by
            • 80:00 - 80:30 adding one, two, three, whatever, it just twiddles the low bits, so the timing effect we just looked at isn't going to change, because that depended on this middle bit flipping. But everything that's happening on the bottom side of the pipeline, mod p, is going to be totally randomized by this, because when you work mod p, adding an extra bit can shift things around quite a bit mod p. So it averages out other kinds of computational noise that's deterministic for a particular value but not related to this part of the computation we're trying to go after. Make sense?
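            Putting those two tricks together, the measurement loop would look roughly like this (time_decryption_attempt is a hypothetical helper that sends one ciphertext and returns the observed response time; the counts are the ones quoted from the paper):

                import statistics

                def measure_guess(g, neighborhood=400, samples=7):
                    total = 0.0
                    for delta in range(neighborhood):
                        times = [time_decryption_attempt(g + delta) for _ in range(samples)]
                        total += statistics.median(times)   # the paper takes the median per value
                    return total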
            • 80:30 - 81:00 "How do they do that when they're trying to guess the low bits?" Ah, so actually they use another mathematical trick to only bother guessing the top half of the bits of q: it turns out that if you know the top half of the bits of q, there's some math you can rely on to factor the number, and then you're in good shape. So you can always twiddle the low bits and basically not worry about it. Make sense? Yeah,
            • 81:00 - 81:30 question? "How do you construct this value c0?" Well, you want the c0', so you construct a value c by taking your c0' and multiplying it by R inverse mod n. Then when the server takes this value, it pushes it through here: it computes c0, which is c mod q, so that value is going to be c0' times R inverse mod q; then it multiplies by R, so you get rid of the R
            • 81:30 - 82:00 inverse, and you end up with the guess exactly in this position. The cool thing is that all the manipulations leading up to here are just multiplying by this R, and you know what R is going to be, it's 2^512, so that's going to be relatively straightforward to do. Make sense? Other questions?
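            As a sketch of that construction (assuming, as in the lecture, that R = 2^512 and that the attacker knows the public modulus n):

                def ciphertext_for_guess(g, n, R=1 << 512):
                    # Send c = g * R^(-1) mod n; the server reduces mod q and then multiplies
                    # by R when converting to Montgomery form, so g lands in the c0' slot.
                    r_inv = pow(R, -1, n)       # R is invertible mod n because n is odd
                    return (g * r_inv) % n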
            • 82:00 - 82:30 "Could we just cancel out the time from the bottom of the pipeline by making c a multiple of q, or a multiple of p?" Well, if you knew p you'd be in business; that's the thing. You don't know what p is, but you just want to randomize it out, and that's what's going on. Other questions? All right, we ran over a little bit, but thanks for sticking around. We'll start talking about other kinds of problems next week. See you then.