# Hardware Security Side-channel attacks R. Pacalet Telecom Paris / EURECOM /73 Telecom Paris / EURECO Hardware Security — Side-channel attacks ## Section 1 ## Introduction ## Outline - 1 Introduction - 2 Timing attacks - 3 Power attacks - 4 Solutions of exercises 3 Teleco Hardware Security - Side-channel attacks ## Hardware leaks information #### Eventually security always implemented in hardware - Electronic devices: - Consume energy - Take time to compute - Emit electromagnetic radiations - Plus temperature, noise. . . - These *side-channels* usually correlated with processing - In security applications side-channels can be used to retrieve embedded secrets - Few hundreds power traces can be sufficient to retrieve secret key - Even from theoretically unbreakable system - Unlike in quantum cryptography information leakage usually undetectable - True for time - Almost true for power ## A bit of history (1/2) #### An old idea - 1956: MI5/GCHQ against Egyptian Embassy in London - Communications embassy ↔ Cairo encrypted (Hagelin crypto machine) - Suez crisis, MI5/GCHQ want to read the Egyptian cipher - They play the old faulted phone system trick to plant microphones - They record the clicks during the machine reset every morning. . . "Spy Catcher: The Candid Autobiography of a Senior Intelligence Officer", Peter M. Wright, July 1, 1988, Dell Publishing Hardware Security — Side-channel attacks ## A bit of history (2/2) #### An old idea - 1996: P. Kocher timing attacks on RSA, DH, DSS - Applied with success in 2003 against OpenSSL 0.9.6 - 1999: P. Kocher power-attacks DES, AES, etc. (SPA, DPA) - Successful against smart cards, FPGAs... - 2000-: New attacks (SEMA, DEMA, TPA). - Importance of hardware security increases (CHES) - Huge scientific literature on side-channel attacks - Software and hardware implementations consider them - Not always History repeats itself (2/5) - Not always seriously enough - Certification authorities take them into account The attack can extract full 4096-bit RSA decryption keys from laptop computers (of various models), within an hour, using the sound generated by the computer during the decryption of some chosen ciphertexts. We experimentally demonstrate that such attacks can be carried out, using either a plain mobile phone placed next to the computer, or a more sensitive microphone placed 4 meters away. Hardware Security - Side-channel attacks ## History repeats itself (1/5) #### Ultrasounds - 2013. 18<sup>th</sup> of December Daniel Genkin (Technion and Tel Aviv University), Adi Shamir (Weizmann Institute of Science), Eran Tromer (Tel Aviv University) RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis. - Target: regular computer computing RSA - GnuPG's current implementation - Vibrations of electronic components (capacitors and coils) - Parts of voltage regulation circuit - Regulate voltage across large fluctuations in power consumption - http://www.cs.tau.ac.il/~tromer/acoustic/ ## History repeats itself (3/5) http://www.cs.tau.ac.il/~tromer/acoustic/img/gnupg-manykeys-downshifted.mp3 https://perso.telecom-paris.fr/pacalet/HWSec/misc/gnupg-manykeysdownshifted.mp3 (the RSA Paso Doble) Telecom Paris / EURECOM Hardware Security — Side-channel attacks ## History repeats itself (4/5) Source ## CVE-2023-0361 CVE-2023-0361 Description A timing side-channel in the handling of RSA ClientKeyExchange messages was discovered in GnuTLS. This side-channel can be sufficient to recover the key encrypted in the RSA ciphertext across a network in a Bleichenbacher style attack. To achieve a successful decryption the attacker would need to send a large amount of specially crafted messages to the vulnerable server. By recovering the secret from the ClientKeyExchange message, the attacker would be able to decrypt the application data exchanged over that connection. > CVE (at NVD; CERT, LWN, oss-sec, fulldisc, bugtrag, EDB, Metasploit, Red Hat, Ubuntu, Gentoo, SUSE bugzilla/CVE, Mageia, GitHub advisories/code/issues, web search, more) References DLA-3321-1, DSA-5349-1 Hardware Security - Side-channel attacks # History repeats itself (5/5) Even if a memory location is only accessed during out-of-order execution, it remains cached. Iterating over the 256 pages of probe array shows one cache hit, exactly on the page that was accessed during the out-of-order execution. https://meltdownattack.com/ ## Timing attacks ## History and notations #### History - First published by Paul Kocher (CRYPTO'96) - Possibly known before that - Implemented by Dhem, Quisquater, et al. (CARDIS'98) - Used by Canvel, Hiltgen, Vaudenay, and Vuagnoux to attack OpenSSL - CRYPTO'03 #### Notations - Unless otherwise stated all numbers are natural integers - $\mathbf{v}[i]$ : $i^{th}$ component of vector $\mathbf{v}$ - $\mathbf{w}[i,j]$ : element of row i, column j of matrix w - $b_{n-1}b_{n-2}...b_1b_0$ : binary representation of number b on n bits - b<sub>0</sub>: Least Significant Bit (LSB) - *b<sub>i</sub>*: *i*<sup>th</sup> bit - b<sub>n-1</sub>: Most Significant Bit (MSB) Telecom Paris / EURECOM Hardware Security — Side-channel attacks ## Example 1 (1/4) #### Exponentiation: $m, d \mapsto m^d$ - $\blacksquare$ $d = d_{w-1}d_{w-2}\dots d_2d_1d_0$ , w-bits exponent - $x^{a+b} = x^a \times x^b; x^{a \times b} = (x^a)^b = (x^b)^a$ - $\blacksquare m^d = m^{d_0} \times m^{d_1 \times 2} \times m^{d_2 \times 4} \times m^{d_3 \times 8} \times \cdots \times m^{d_{w-2} \times 2^{w-2}} \times m^{d_{w-1} \times 2^{w-1}}$ - $m^d = m^{d_0} \times \left(m^{d_1}\right)^2 \times \left(m^{d_2}\right)^4 \times \left(m^{d_3}\right)^8 \times \cdots \times \left(m^{d_{w-2}}\right)^{2^{w-2}} \times \left(m^{d_{w-1}}\right)^{2^{w-1}}$ - $m^d = m^{d_0} \times \left( m^{d_1} \times \left( m^{d_2} \right)^2 \right)^2 \times \left( m^{d_3} \right)^8 \times \cdots \times \left( m^{d_{w-2}} \right)^{2^{w-2}} \times \left( m^{d_{w-1}} \right)^{2^{w-1}}$ - $\blacksquare m^d = m^{d_0} \times \left(m^{d_1} \times \left(m^{d_2} \times \left(m^{d_3}\right)^2\right)^2\right)^2 \times \cdots \times \left(m^{d_{w-2}}\right)^{2^{w-2}} \times \left(m^{d_{w-1}}\right)^{2^{w-1}}$ Hardware Security — Side-channel attacks ## Example 1 (2/4) ## Exponentiation: $m, d \mapsto m^d$ $$m^d = m^{d_0} imes \left( m^{d_1} imes \left( m^{d_2} imes \left( m^{d_3} imes \left( \cdots imes \left( m^{d_{w-2}} imes \left( m^{d_{w-1}} ight)^2 \right)^2 \dots \right)^2 \right)^2 \right)^2 \right)$$ ## Example 1 (3/4) ## Multiply and Square algorithm, 1 exponent bit per iteration (MS1) ■ MSB first, iterations k = w - 1 down to 0, temporary variables a, b ## **Algorithm** MS1 exponentiation | 1: | a ← 1 | | |-----|---------------------------------------|----------------------------------| | 2: | for $k \leftarrow w - 1$ down to 0 do | ▶ From MSB to LSB of d | | 3: | if $d_k = 1$ then | $\triangleright k^{th}$ bit of d | | 4: | $b \leftarrow a \times m$ | | | 5: | else | | | 6: | $b \leftarrow a$ | | | 7: | end if | | | 8: | $a \leftarrow b^2$ | ⊳ Square | | 9: | end for | | | 10: | return b | $\triangleright b = m^d$ | #### **Algorithm** Variant: SM1 1: a ← 1 2: for $k \leftarrow w - 1$ down to 0 do $a \leftarrow a^2$ if $d_k = 1$ then 7: end for 8: return a ## Example 1 (4/4) #### Modular exponentiation, pseudo-code ■ TA prone? Why? What can we expect from TA? #### Algorithm MS1 modular exponentiation ``` 1: a ← 1 ⊳ From MSB to LSB of d 2: for k \leftarrow w - 1 down to 0 do \triangleright k^{th} bit of d if d_{\nu} = 1 then b \leftarrow a \times m \mod n ▶ Modular multiplication else b \leftarrow a 6: 7: end if a \leftarrow b^2 \mod n ▶ Modular square 9: end for \triangleright b = m^d \mod n 10: return b ``` Hardware Security — Side-channel attacks ## Example 2 #### What if computation time of modular product and square is data-dependent? ■ What can we expect from TA? #### **Algorithm** MS1 modular exponentiation ``` 1: a ← 1 ⊳ From MSB to LSB of d 2: for k \leftarrow w - 1 down to 0 do \triangleright k^{th} bit of d if d_{\nu} = 1 then b \leftarrow a \times m \mod n ▶ Modular multiplication 4: 5: else 6: b \leftarrow a end if a \leftarrow b^2 \mod n ▶ Modular square 9: end for \triangleright b = m^d \mod n 10: return b ``` Principle of attack Example 2: P. Kocher attack $k \leftarrow w-1 \text{ down to } r+1 \text{ do}$ $a \leftarrow b^2 \mod n$ $b \leftarrow a \times m \mod n$ 12: for $k \leftarrow r - 1$ down to 0 do 13: ... 14: $a \leftarrow b^2 \mod n$ 15: end for 16: return $b \rhd b = m^d \mod n$ 5: end for 8: else 9: b ← a 6: if $d_r = 1$ then 7: $b \leftarrow a \times n$ 10: end if $a \leftarrow b^2 \mod n$ Hardware Security - Side-channel attacks ## Example 2: P. Kocher attack ``` k \leftarrow w - 1 \text{ down to } r + 1 \text{ do} 3: ... 4: a ← 5: end for a \leftarrow b^2 \mod n 6: if d_r = 1 then 7: b \leftarrow a \times m 8: else 9: b \leftarrow a b \leftarrow a \times m \mod n 9: b \leftarrow a 10: end if 11: a \leftarrow b^2 \mod n 12: for k \leftarrow r - 1 down to 0 do 13: ... 14: a \leftarrow b^2 \mod n 15: end for 16: return b \rightarrow b = m^d \mod n ``` #### Principle of attack - Target computes x modexp - Known plaintexts m[i]; 1 < i < x - $\forall 1 < i < x$ attacker measures **total** modexp computation time $t[i] = TIME(m[i]^d \mod n)$ - W.l.o.g. attacker already knows leading bits of d - $d_{w-1} \dots d_{r+1}$ , for some 0 < r < w-1 - $\bigcirc$ None if r = w 1 - Attacker extracts next unknown bit $d_r$ : - $\forall m[i]$ attacker computes input a of iteration r - $\forall m[i]$ attacker **measures or estimates** computation time of modular multiplication at iteration r: $\widehat{t_{\times}}[i,r] = \text{TIME}(a \times m[i] \mod n)$ Hardware Security - Side-channel attacks ■ Assume for some m[i], $\widehat{t_{\times}}[i, r]$ significantly $(slow) \Rightarrow d_r = 1 (probably)$ ■ Else $d_r = 0$ (probably) confidence Victim probably computed Victim probably skipped $b \leftarrow a \times m[i] \mod n \Rightarrow t \uparrow$ $b \leftarrow a \times m[i] \mod n \Rightarrow t \rightsquigarrow$ greater/smaller than average (attacker can ■ In average, if t[i] large (slow) when $\widehat{t_{\times}}[i,r]$ large ■ The larger the difference, the higher the attacker's ■ Do you understand difference with example 1? distinguish "slow" and "fast" cases from "average") ## Example 2: P. Kocher attack ``` k \leftarrow w - 1 \text{ down to } r + 1 \text{ do} a \leftarrow b^2 \mod n 5: end for 6: if d_r = 1 then 7: b \leftarrow a \times m \mod n 8: else 9: b ← a 9: b \leftarrow a 10: end if 11: a \leftarrow b^2 \mod n 12: for k \leftarrow r - 1 down to 0 do 13: ... 14: a \leftarrow b^2 \mod n 15: end for 16: return b \rightarrow b = m^d \mod n ``` #### Let's formalize a bit - $\forall 1 \leq i \leq x, m[i]$ : $i^{th}$ plaintext - $\mathbf{t}[i] = \text{TIME}\left(m[i]^d\right)$ (measured) - $t[i] = e[i] + \sum_{k=0}^{k=w-1} t[i, k]$ - e[i]: measurement error - t[i, k]: computation time of iteration k of $m[i]^d$ - Attacker knows t[i] - Attacker ignores e[i] and t[i, k] - Compute w 1 r first iterations - $m[i], n, d_{w-1} \dots d_{r+1}$ known - $\widehat{t_{\times}}[i,r]$ : attacker's estimate for TIME $(a \times m[i] \mod n)$ at iteration r Hardware Security — Side-channel attacks ## Example 2: P. Kocher attack $k \leftarrow w-1 \text{ down to } r+1 \text{ do}$ $a \leftarrow b^2 \mod n$ 5: end for 6: if $d_r = 1$ then $b \leftarrow a \times m \mod n$ 8: else 9: b ← a 10: end if 11: $a \leftarrow b^2 \mod n$ 12: for $k \leftarrow r - 1$ down to 0 do 13: ... 14: $a \leftarrow b^2 \mod n$ 15: end for 16: return $b \triangleright b = m^d \mod n$ #### Let's formalize a bit - S: indexes of 10% slowest cases - $\forall i \in \mathcal{S}, \forall j \notin \mathcal{S}, \widehat{t_{\times}}[i, r] > \widehat{t_{\times}}[j, r]$ - |S| = x/10 $\sum_{i=1}^{n} t[i]$ - $t_{\mathcal{S}} = \frac{i \in \mathcal{S}}{x/10}$ - $\blacksquare$ $\mathcal{F}$ : indexes of 10 % fastest cases - $\forall i \in \mathcal{F}, \forall j \notin \mathcal{F}, \widehat{t_{\times}}[i, r] < \widehat{t_{\times}}[j, r]$ - $|\mathcal{F}| = x/10$ - $t_{\mathcal{F}} = \frac{\sum_{i \in \mathcal{F}} t[i]}{x/10}$ - Difference of computation time averages: $$\Delta = t_{\mathcal{S}} - t_{\mathcal{F}}$$ TELECOM Paris Hardware Security - Side-channel attacks ## Example 2: P. Kocher attack #### Let's formalize a bit - $\blacksquare \tau$ : threshold - $\Delta > \tau \Rightarrow d_r = 1$ - $\Delta < \tau \Rightarrow d_r = 0$ - Continue with next bit $(d_{r-1})$ until all bits of d"known" - Attack targets one bit at a time - Attack efficiency depends on: - Number *x* of acquisitions - Variability of $\widehat{t_{\times}}[i, r]$ (data dependency) - Noise *e*[*i*] - |S| and |F| (10% in our example) - Threshold $\tau$ TELECOM Paris ## P. Kocher attack on RSAREF (1/2) FIGURE 1: RSAREF Modular Multiplication Times - $\mathbf{I}_{\times} = \text{TIME}(a \times b \mod n)$ - $\mathbb{E}(t_{\times}) \approx 1167.8 \times 10^{-6}$ sec. - $\sigma(t_{\times}) \approx 12.01 \times 10^{-6}$ sec. - $\blacksquare \mathbb{E}(t_{\text{exp}}) \approx 419901 \times 10^{-6} \text{ sec.}$ - $\sigma(t_{exp}) \approx 235 \times 10^{-6}$ sec. ## P. Kocher attack on RSAREF (2/2) FIGURE 1: RSAREF Modular Multiplication Times FIGURE 2: RSAREF Modular Exponentiation Times - RSAREF (functional) reference software library - 512 bits exponentiation, 256 bits exponent (speed up) - With 250 measurements probability of correct decision at any step. . . **✓** 0.885 Hardware Security — Side-channel attacks # P IP PARIS ## OpenSSL BN library - $\mathbb{E}(t_{exp}) \approx 482206$ clock cycles - $\sigma(t_{exp}) \approx 11453$ clock cycles $\blacksquare t_{\times} = TIME(a \times b \mod n)$ $\sigma(t_{\times}) \approx 130$ clock cycles ■ $\mathbb{E}(t_{\times}) \approx 19929$ clock cycles Hardware Security — Side-channel attacks ## **Optimizations** #### a ← 1 -1 down to r+1 do $a \leftarrow b^2 \mod n$ end for if $d_r = 1$ then $b \leftarrow a \times m \mod n$ else 10: end if 10: end if 11: $a \leftarrow b^2 \mod n$ 12: for $k \leftarrow r - 1$ down to 0 do 13: ... $a \leftarrow b^2 \mod n$ 15: end for 16: return b $\triangleright b = m^d \mod n$ #### Cross-check on modular square computation time - $\hat{t}_2[i, r]$ : attacker's estimate of TIME $(b^2 \mod n)$ at iteration r - Once $d_r$ "known", verification on $\hat{t_2}[i, r]$ - In average, is total computation time t[i] large (small) when $\hat{t_2}[i, r]$ large (small)? - Yes: better confidence in decision - No: doubt about decision ## **Optimizations** #### Detect wrong decisions and fix them - $d_r$ hypothesis wrong $\Rightarrow \forall k \leq r, \hat{a} \neq a \land \hat{b} \neq b$ - Correlation with measured computation time not observable any more - Attack improvement - Keep list of decisions - Keep likelihood $t_S t_F$ - Likelihood-driven back-tracking - Hard decisions → soft decisions - Resembles channel decoding - More memory and CPU usage - Reduce number of acquisitions ## **Optimizations** #### 1: a ← 1 2: for $k \leftarrow w - 1$ down to r + 1 do 3: ... 4: $a \leftarrow b^2$ mo 5: end for 6: if $d_r = 1$ then 7: $b \leftarrow a \times m$ $a \leftarrow b^2 \mod n$ $b \leftarrow a \times m \mod n$ 8: else 9: b b ← a 10: end if 10: end if 11: $a \leftarrow b^2 \mod n$ 12: for $k \leftarrow r - 1$ down to 0 do 13: ... 14: $a \leftarrow b^2 \mod n$ 15: end for 16: return *b* $\triangleright b = m^d \mod n$ #### Observe variance of computation time residue - Following works iff absolute estimates - Not with relative estimates... - ... But relative / absolute ratio can be estimated - t[i, k] = TIME (iteration k of $m[i]^d \mod n$ ) - $\hat{t}[i,k]$ : attacker's estimate for iteration r: • $$\widehat{t_{\times}}[i,r] + \widehat{t_2}[i,r] (d_r = 1) \text{ or } \widehat{t_2}[i,r] (d_r = 0)$$ Hardware Security — Side-channel attacks ## **Optimizations** #### $k \leftarrow w - 1$ down to r + 1 do $a \leftarrow b^2 \mod n$ 5: end for 6: if $d_r = 1$ then 7: $b \leftarrow a \times n$ $b \leftarrow a \times m \mod n$ 8: else 10: end if 11: $a \leftarrow b^2 \mod n$ 12: for $k \leftarrow r - 1$ down to 0 do 13: 13: ... 14: $a \leftarrow b^2 \mod n$ 15: end for 16: return b #### Observe variance of computation time residue - If $d_{w-1} \dots d_{r+1}$ correct - $\widehat{t}[i, k > r] \approx t[i, k > r]$ $\Delta[i] = e[i] + \sum_{k=0}^{k=r} t[i, k] + \sum_{k=r+1}^{k=w-1} (t[i, k] \widehat{t}[i, k])$ - $\Delta[i] \approx e[i] + \sum_{k=0}^{k=r} t[i, k]$ - $\mathbb{V}(\Delta[.]) \approx \mathbb{V}(e[.]) + \mathbb{V}\left(\sum_{k=0}^{k=r} t[., k]\right)$ - $\mathbb{V}(\Delta[.]) \approx \mathbb{V}(e) + (r+1) \times \mathbb{V}(t)$ - $\blacksquare \ \mathbb{V}(\Delta[.]) \downarrow \text{ when } r \downarrow$ TELECOM Paris 画家[]別 DIP PARIS Hardware Security - Side-channel attacks ## **Optimizations** #### a ← 1 2: for $k \leftarrow w - 1 \text{ down to } r + 1 \text{ do}$ 3: ... 4: $a \leftarrow b^2$ mod 5: end for 6: if $d_r = 1$ then 7: $b \leftarrow a \times m$ $a \leftarrow b^2 \mod n$ $b \leftarrow a \times m \mod n$ 8: else 9: b $b \leftarrow a$ 10: end if 10: end if 11: $a \leftarrow b^2 \mod n$ 12: for $k \leftarrow r - 1$ down to 0 do 13: ... 14: $a \leftarrow b^2 \mod n$ 15: end for 16: return $b > b = m^d \mod n$ #### Observe variance of computation time residue - If $d_{w-1} \dots d_{s+1}$ correct but... - $\blacksquare$ ... $d_s$ ... $d_{r+1}$ wrong for some s > r+1 - $\widehat{t}[i, k > s] \approx t[i, k > s]$ $\widehat{t}[i, s \geq k > r] \neq t[i, s \geq k > r]$ $\Delta[i] = e[i] + \sum_{k=0}^{k=r} t[i, k] + \sum_{k=r+1}^{k=w-1} (t[i, k] \widehat{t}[i, k])$ - $\Delta[i] \approx e[i] + \sum_{k=0}^{k=r} t[i, k] + \sum_{k=r+1}^{k=s} (t[i, k] \hat{t}[i, k])$ - $\Delta[i] \approx e[i] + \sum_{k=0}^{k=s} t[i,k] \sum_{k=r+1}^{k=s} \hat{t}[i,k]$ - $\mathbb{V}(\Delta[.]) \approx \mathbb{V}(e) + (s+1) \times \mathbb{V}(t) + (s-r) \times \mathbb{V}(t)$ - $\mathbb{V}(\Delta[.]) \approx \mathbb{V}(e) + (2 \times s r + 1) \times \mathbb{V}(t)$ - $\blacksquare V(\Delta[.]) \uparrow \text{ when } r \downarrow$ ## Wrap up on TA (1/3) #### Where does it come from? ■ Time: processing data takes time #### How does it work? - Acquisition phase - Same secret - Sufficient number of acquisitions with different input messages - Build database of {input, computation time} pairs (may also work with outputs, how?) - Analysis phase (usually off-line) - Attacker tries to retrieve part s of secret (e.g., 1 bit) - Attacker builds "computation time models" $TM_{\sigma}(i)$ - Of one part of computation with input m[i] (e.g., 1 modmul) - Under assumption that s = g - Attacker estimates correlations between $TM_{\varepsilon}$ and measured total times - $TM_g$ with best correlation $\Rightarrow g$ best candidate for s ## Wrap up on TA (2/3) #### Principle - Our example attacks one exponent bit at a time - For each attacked bit $d_k$ , 2 computation time models: - $TM_0(i) = 0 \ (d_k = 0 \ case)$ - $TM_1(i) = TIME(a \times m[i] \mod n)$ at iteration $r(d_r = 1 \text{ case})$ - $t_S t_F$ : estimator of correlation between $TM_1$ and t - Note: there are better correlation estimators - Pearson correlation coefficient, Kolmogorov-Smirnov test... - Correlation( $TM_1, t$ ) > $\tau \Rightarrow d_r = 1$ ( $\tau$ : threshold) - Correlation( $TM_1, t$ ) $< \tau \Rightarrow d_r = 0$ Telecom Paris / EURECOM Hardware Security - Side-channel attacks ## Wrap up on TA (3/3) #### Principle - Other examples with $TM_0(i) \neq 0$ : - $TM_0(i) = TIME(b \leftarrow a; a \leftarrow b^2 \mod n)$ at iteration r - $TM_1(i) = TIME(b \leftarrow a \times m \mod n; a \leftarrow b^2 \mod n)$ at iteration r - Then correlations can be compared (no need for threshold) - Correlation $(TM_1, t) > \text{Correlation}(TM_0, t) \Rightarrow d_r = 1$ - Correlation $(TM_1, t) < Correlation(TM_0, t) \Rightarrow d_r = 0$ - Whatever the statistical tool, principle remains the same: - $TM_g$ with best correlation $\Rightarrow g$ best candidate for s - Analysis usually off-line - But interactive, adaptive attacks also exist Hardware Security - Side-channel attacks ## Pearson correlation coefficient #### A better statistical tool than partitioning - **s**: portion of the secret under attack - x: number of computation time measurements - $\bullet$ t[i], 1 < i < x: computation time measurements - $\blacksquare$ m[i], 1 < i < x: input messages - $TM_{\sigma}(i)$ : attacker's computation time estimate - Of part of $m[i]^d$ (e.g. one iteration) - With hypothesis g on s - $PCC(TM_g, t)$ : estimator of correlation between the t[i] and $TM_g(i)$ $(1 \le i \le x)$ - g with highest $PCC(TM_g, t) \Rightarrow$ best candidate for s ## Exercises on timing attacks (1/2)Check your understanding - ② Ex. 1: List the hypotheses for a timing attack to be practical - ② Ex. 2: Do you think the modified implementation below is protected? ## **Algorithm** Modified MS1 modular exponentiation, version 1 ``` \triangleright From MSB to LSB of d 2: for k \leftarrow w - 1 down to 0 do ▶ Modular multiplication f \leftarrow a \times m \mod n \triangleright k^{th} bit of d if d_k = 0 then b \leftarrow a else 6: b \leftarrow f end if a \leftarrow b^2 \mod n ▶ Modular square 10: end for ``` $\triangleright b = m^d \mod n$ 11: return b ## Exercises on timing attacks (2/2) #### Check your understanding ② Ex. 3: Do you think the modified implementation below is protected? #### **Algorithm** Modified MS1 modular exponentiation, version 2 ``` 2: for k \leftarrow w - 1 down to 0 do ⊳ From MSB to LSB of d f \leftarrow a \times m \mod n ▶ Modular multiplication g \leftarrow a^2 \mod n ▶ Modular square h \leftarrow f^2 \mod n ▶ Modular square \triangleright k^{th} bit of d if d_k = 0 then 7: b \leftarrow a 8: a \leftarrow g 9: else 10: 11: a \leftarrow h end if 12: 13: end for 14: return b \triangleright b = m^d \mod n ``` Hardware Security — Side-channel attacks ## Section 3 ## Power attacks ## Homework on timing attacks #### For next time (please do it) - Imagine countermeasures, evaluate their cost and efficiency - Study and understand P. Kocher paper - Especially the blinding countermeasure he proposes (section 10) - Prepare questions Hardware Security — Side-channel attacks ## Power in CMOS logic (1/2) ## Energy and cell output transitions are correlated ■ E falling edge $\Rightarrow S$ rising edge Current observed across $R_p$ resistor: I(VDD) $I_{\uparrow} = I_{short} + I_{L}$ Telecom Paris / EURECOM Hardware Security — Side-channel attacks Hardware Security — Side-channel attacks ## Power in CMOS logic (2/2) #### Energy and cell output transitions are correlated ■ E rising edge $\Rightarrow S$ falling edge ## Power analysis setup (1/2) ## Power analysis setup (2/2) Figure 1: Floorplan and acquisition board of the SecMat v3 ASIC. ## Simple power analysis, example 1 #### The power trace of a "naive" DES hardware implementation leaks a lot of information - CMOS structures consume energy when switching - Hamming distance of register transitions - Hamming weights in some implementations - Clock spikes Hardware Security - Side-channel attacks ## Simple power analysis, example 2 #### Multiply and square exponentiation - Iterate on exponent bits - From MSB to LSB - $\mathbf{r}$ $c \leftarrow m^d$ ## Algorithm MS1 exponentiation | | • | | |-----|---------------------------------------|--------------------------| | 1: | $a \leftarrow 1$ | | | 2: | for $k \leftarrow w - 1$ down to 0 do | | | 3: | if $d_k = 1$ then | | | 4: | $b \leftarrow a \times m$ | ⊳ Mult | | 5: | else | | | 6: | $b \leftarrow a$ | | | 7: | end if | | | 8: | $a \leftarrow b^2$ | ⊳ Square | | 9: | end for | | | 10∙ | return b | $\triangleright h = m^c$ | Hardware Security - Side-channel attacks ## Mount SPA of DES key schedule #### 6502: 8 bits CPU - 8b accumulator A, 1b carry flag C - Left rotate 28-bits half-key - Channel: energy per instruction - Transitions in *C*, *A* - $0 < n_t < 9$ - ② Ex. 4: What's your attack? 3 **ROL** $\# C/A \leftarrow A/C$ 4 STA a $\# mem[a] \leftarrow A$ 5 LDA a+1 # A <- mem[a+1]6 ROL $\# C/A \leftarrow A/C$ 7 STA a+1 # mem[a+1] < - A8 LDA a+2 # A <- mem[a+2]9 ROL $\# C/A \leftarrow A/C$ 10 STA a+2 # mem[a+2] <- A11 LDA a+3 # A <- mem[a+3] $\# C/A \leftarrow A/C$ 12 **ROL** 13 **STA** a+3 # mem[a+3] < -A# A < -000/A[4..0]14 AND 0x1f 15 **LSR** # C/A <- A[0]/0/A[7..1]16 LSR # C/A <- A[0]/0/A[7..1]17 **LSR** # C/A <- A[0]/0/A[7..1]18 LSR $\# C/A \leftarrow A[0]/0/A[7..1]$ 19 **ORA** a $\# A \leftarrow A \text{ or } mem[a]$ 20 STA a $\# mem[a] \leftarrow A$ Hardware Security — Side-channel attacks # C <- 0 $\# A \leftarrow mem[a]$ 1 CLC 2 **LDA** a ## P. Kocher attack on DES (1/2) #### Last round of DES, 1 SBox at a time - *n* {power trace, ciphertext} pairs • $\{T[i], c[i]\}$ - $\forall 1 \le i \le n, R_{15} = L_{16} \in c[i]$ known - Add hypothesis g on 6 bits of $K_{16}$ Compute" 4 bits of L<sub>15</sub> - As a function of g and c[i] - Focus on 1 bit only: b[i](g) - Split traces in 2 sets - $S_0(g) = \{ T[i] \mid b[i](g) = 0 \}$ - $S_1(g) = \{ T[i] \mid b[i](g) = 1 \}$ - If guess g correct sets should exhibit different energy consumptions - $score(g) = \mathbb{E}(S_0(g)) \mathbb{E}(S_1(g))$ ## P. Kocher attack on DES (2/2) ## Differential Power Analysis (DPA) - The larger the score (difference), the more likely the hypothesis - Hence the name "Differential Power Analysis" (DPA) ## Wrap up on DPA (1/4) #### Foundations - ? Where does it come from? - Power: computing consumes energy #### Power attack principles - Target small portion s of secret - 6 bits of last round key of DES - 1 bit of secret exponent of RSA... - Use intermediate value v as oracle - One bit of $L_{15}$ - Modular product or square. . . - Computed (stored) by same hardware element - Based on hypothesis of data-dependent power consumption - Compute (store) different $\nu$ values $\Rightarrow$ consume different energy Hardware Security — Side-channel attacks ## Wrap up on DPA (2/4) #### Acquisition phase - Same secret, different input messages - Sufficient number of acquisitions $\Rightarrow$ database of acquisitions - {T[i], m[i]} - Pairs of {power trace, input (output) message} #### Analysis phase (usually off-line) - lacksquare Attacker build "power models" $PM_g$ of target implementation - g hypothesis (guess) on portion of secret - $\blacksquare PM_{\sigma}(i) = \text{estimate of energy of specific operation (computation, storage...)}$ - For input (output) m[i] - If *g* correct - As many $PM_g$ models as hypotheses g - $\blacksquare$ $PM_{\sigma}$ that best correlates with actual power traces $\Rightarrow$ most likely hypothesis g Hardware Security - Side-channel attacks ## Wrap up on DPA (3/4) #### Ones do not consume more than zeros - CMOS logic has (almost) no static power consumption - It is only transitions $v_0 \rightarrow v_1$ that consume energy (dynamic power) #### From states to transitions - What if only $v_1$ can be "computed"? - Assume rising and falling transitions consume differently: $I_{\uparrow} I_{\bot} = \epsilon \neq 0$ - Assume *v*<sub>0</sub> uncorrelated - $v_1 = 0 \Rightarrow$ falling transition with probability 1/2, no transition with probability 1/2 - $v_1 = 1 \Rightarrow$ rising transition with probability 1/2, no transition with probability 1/2 - On large number of traces average difference $\approx \epsilon/2$ # Wrap up on DPA (4/4) #### Two types of attacks - "Compute" only $v_1 \Rightarrow \text{Hamming weight of } v_1$ - "Compute" $v_0$ and $v_1 \Rightarrow$ Hamming distance $v_0 \rightarrow v_1$ ## Exercises on DPA #### Check your understanding - Two power models - Hamming weight (based on current state only) - Hamming distance (based on previous and current states) - ② Ex. 5: Most efficient power model? - ② Ex. 6: Differences with timing attacks? - ② Ex. 7: DPA easier or more difficult than TA? Why? - ② Ex. 8: List the hypotheses for a DPA to be practical - ② Ex. 9: Design countermeasures - ② Ex. 10: What if energy depends only on key? - ② Ex. 11: What if energy depends only on input messages? - ② Ex. 12: What if energy depends neither on key nor on input messages? Telecom Paris / EURECOM Hardware Security - Side-channel attacks ## Homework on power attacks #### For next time (please do it) - Read and understand every detail of original paper by P. Kocher - Imagine attack against hardware DES implementation - Ciphertexts are known - $L_0R_0, \ldots, L_{15}R_{15}, R_{16}L_{16}$ values stored successively in **same** 64 bits register - Attacker monitors current on power supply side - Describe in deep details your algorithm - Search ways to blind DES #### Prepare first lab (please do it) - Read directions - Look at provided software libraries - Imagine what you will do, why and how Hardware Security — Side-channel attacks ## Section 4 ## Conclusion ## Further Reading ## In EURECOM library - The Hardware Hacking Handbook, Breaking Embedded Security with Hardware Attacks, Jasper van Woudenberg and Colin O'Flynn, No Starch Press, 2022, 515 p. - Security of Ubiquitous Computing Systems, Selected Topics, Avoine, Gildas; Hernandez-Castro, Julio, Springer, 2021, 268 p. - Power Analysis Attacks: Revealing the Secrets of Smart Cards, Stefan Mangard and Elisabeth Oswald and Thomas Popp, Springer, 2007, 337 p. (PDF available) - Smart Card Handbook, Wolfgang Rankl and Wolfgang Effing, John Wiley and Sons, 2004, 1088 p. - Embedded Cryptographic Hardware: Design & Security, Nadia Nedjah and Luiza de Macedo Mourelle, Nova Publishers, 2005, 255 p. - And a lot more... Questions? Section 5 Solutions of exercises Hardware Security - Side-channel attacks Hardware Security - Side-channel attacks Hardware Security - Side-channel attacks ## List the hypotheses for a timing attack to be practical (1/3) #### Ex. 1: H1: Cryptosystem takes different time to process different input data - Performance optimizations (bypass useless operations) - Branching, conditional statements - Cache misses A Note: intuition suggests leak of small amount of information - Like Hamming weight of key - ▲ Intuition usually wrong (same in probabilities and security) - ▲ Never trust your intuition ## List the hypotheses for a timing attack to be practical (2/3) #### H2: Attacker can build models of timing based on - Knowledge of inputs (or outputs) - Guesses on secret key - Knowledge of implementation - Attacker can redo computations and measure time - With similar timing characteristics - Or use another estimate - Hamming weight of operand... - Relative estimate sufficient ## H3: "Sufficient" number of different inputs, same secret key - Victim runs algorithm - Attacker records timing Telecom Paris / EURECOM ## List the hypotheses for a timing attack to be practical (3/3) #### H4: Total number of guesses on secret key "manageable" - Such that attacker can check them all for correctness - $\odot$ 1 bit $\Rightarrow$ 2 different guesses - $\ \, \odot \,$ 128 bits secret key $\Rightarrow 2^{128} \approx 3.4 \times 10^{38}$ different guesses - $\triangle$ One billion per second $\Rightarrow$ 720 billion times age of universe (brute force attack) - If 128 bits can be split in 16 bytes $\Rightarrow$ 16 $\times$ 28 = 4096 guesses to check... #### H5: SNR good enough ✓ Timing model of right guess match actual measurements "significantly" better TELECOM Paris SE 101 Telecom Paris / EURECON Hardware Security — Side-channel attacks ## Do you think the modified implementation below is protected? # $\begin{array}{lll} 1: & a \leftarrow 1 \\ 2: & \text{for } k \leftarrow w - 1 \text{ down to } 0 \text{ do} \\ 3: & f \leftarrow a \times m \mod n \\ 4: & \text{if } c_k = 0 \text{ then} \\ 5: & b \leftarrow a \\ 6: & \text{else} \\ 7: & b \leftarrow f \\ 8: & \text{end if } b^2 \\ 9: & a \leftarrow b^2 \mod n \\ \end{array}$ ``` Ex. 2: No, it is not better protected ``` - © Can even be worse - Modular multiplication always computed ⇒ higher signal - © branches of if perfectly balanced but... - ... Timing of modular square still depends on b - $\triangle$ That is, on taken branch, that is, on $d_r$ - Attacker can build 2 timing models - $TM_0 = TIME(a^2 \mod n)$ - $TM_1 = TIME(f^2 \mod n)$ - ✓ Timing model that better correlates with measured total times $\Rightarrow$ most likely $d_r$ value 54 / 73 lecom Paris / EURECO Hardware Security — Side-channel attacks ## Do you think the modified implementation below is protected? # $a \leftarrow 1$ for $k \leftarrow w - 1$ down to 0 do $\begin{array}{lll} 3: & f \leftarrow a \times m \bmod n \\ 4: & g \leftarrow a^2 \bmod n \\ 5: & h \leftarrow f^2 \bmod n \\ 6: & \text{if } d_k = 0 \text{ then} \\ 7: & b \leftarrow a \\ 8: & a \leftarrow g \\ 9: & \text{else} \\ 10: & b \leftarrow f \\ 11: & a \leftarrow h \\ 12: & \text{end if} \\ \end{array}$ 13: end for 14: return b - Ex. 3: No, it is not better protected - ② Can even be worse ♪ 3 modular operations ⇒ twice higher signal - © branches of if perfectly balanced but... - $\odot$ ... Timing of modular operations at iteration r-1 depend on a at end of iteration r - f A That is on taken branch, that is on $d_r$ - Attacker can build 2 timing models - $TM_0 = TIME$ (iteration $r 1 \mid d_r = 0$ ) - $TM_1 = \text{TIME} (\text{iteration } r 1 \mid d_r = 1)$ - ✓ Timing model that better correlates with measured total times $\Rightarrow$ most likely $d_r$ value ## Mount SPA of DES key schedule #### Ex. 4: Consumption of last LSR (line 18) - lacktriangle At beginning, before call number 1 - Memory = $k_1 \dots k_{28}$ - Before call $2 \le i \le 28$ - Memory $= k_i \dots k_{28} k_1 \dots k_{i-1}$ - $\blacksquare$ Before last LSR operation of call i - $A = 000000k_ik_{i+1}, C = k_{i+2}$ - After last LSR operation of call i• $A = 0000000k_i$ , $C = k_{i+1}$ - $n_t = k_i + k_{i+1} \oplus k_i + k_{i+2} \oplus k_{i+1}$ - $n_t = \kappa_i + \kappa_{i+1} \oplus \kappa_i + \kappa_{i+2} \oplus \kappa_{i+1}$ • $n_t = 0 \Leftrightarrow k_i k_{i+1} k_{i+2} \in \{000\}$ - $n_t = 1 \Leftrightarrow k_i k_{i+1} k_{i+2} \in \{001, 011, 111\}$ - $n_t = 2 \Leftrightarrow k_i k_{i+1} k_{i+2} \in \{010, 100, 110\}$ - $n_t = 3 \Leftrightarrow k_i k_{i+1} k_{i+2} \in \{101\}$ - ✓ 28 independent equations of 28 variables 56 / 73 ecom Paris / FURECOM ## Most efficient power model? #### Ex. 5: Hamming distance more efficient - $I_{short}$ , $I_L$ : short circuit and charging currents - Assume attacker monitors current on power supply side only - With Hamming distance: - If no transition $(0 \rightarrow 0 \text{ or } 1 \rightarrow 1)$ , $I_{\rightarrow} = 0$ - If rising transitions $(0 \rightarrow 1)$ , $I_{\uparrow} = I_{short} + I_{L}$ - If falling transitions (1 $\rightarrow$ 0), $I_{\downarrow} = I_{short}$ - With Hamming weight: - If ending state = 1 (? $\rightarrow$ 1), in average, $l_{\uparrow?} = \frac{l_{\rightarrow} + l_{short} + l_{L}}{2} = \frac{l_{short} + l_{L}}{2}$ - If ending state = 0 (? $\rightarrow$ 0), in average, $I_{\downarrow ?} = \frac{I_{\rightarrow} + I_{short}}{2} = \frac{I_{short}}{2}$ - ✓ Difference between ending states 1 and 0 (average): $d_{hw} = \frac{I_{short} + I_L}{2} \frac{I_{short}}{2} = \frac{I_L}{2} < d_{hd}$ 73 Telecom Paris / EURECO Hardware Security — Side-channel attacks ## Differences with timing attacks? #### Ex. 6: Pros and cons - Richer side channel (full vector instead of single scalar) - Remote attacks not practical any more (almost)More expensive acquisition phase - More expensive acquisition phase - More processing in analysis phase - Attack could be detectable - More difficult to compute with constant power than with constant time - ? How would you compute with constant power? - ✓ More on this later 58 / 73 com Paris / EURECO! Hardware Security - Side-channel attacks ## DPA easier or more difficult than TA? Why? ## Ex. 7: Pros and cons - $\hfill\blacksquare$ More difficult to monitor energy than time - More data to process - More information in side channel - Extra difficulties like perfect synchronization, for instance - More difficult to compute with constant power than with constant time - ✓ More on this later # List the hypotheses for a DPA to be practical (1/2) #### Ex. 8: H1: Power consumption depends on processed data - Activity of CMOS logic (number of node switches) data-dependent - Dynamic power of CMOS logic is driven by activity #### H2: Attacker can build models of power, based on - Knowledge of inputs (or outputs) - Guesses on secret key - Knowledge of implementation - Attacker can "simulate" some internal states (or transitions) of victim - With similar power consumptions - Or use another estimate - Hamming weight or distance... - Relative estimate sufficient ## List the hypotheses for a DPA to be practical (2/2) #### H3: "Sufficient" number of different inputs, same secret key - Victim runs algorithm - Attacker records power #### H4: Total number of guesses on secret key "manageable" ■ Same reasons as for timing attacks ## H5: SNR good enough ✔ Power models of right guess match actual power traces "significantly" better ## Design countermeasures (1/4) ## Ex. 9: H1: Power consumption depends on processed data - **©** Compute with constant power consumption - (2) Much more difficult than constant timing - But the data dependency can be reduced - © Rather expensive (slower, energy hungry, larger silicon area) - More on this next slide #### H2: Attacker can build models of power, based on knowledge of inputs (outputs) - Prevent attacker from accessing inputs or outputs - Not Always Practical (NAP) - More on this later ## Constant power computation #### 13 hardware implementations of AES SBox - I Std cells, decomposed in GF(16) - 2 Std cells, lookup table - 3 Std cells, decode-permute-encode - 4 ROM - **5** WDDL basic - 6 + Symmetric placement - **7** + Symmetric routing + dummies - 8 + Shielding - 9 + Symmetric gates - SecLib, symmetric placement - $\mathbf{II}$ + Symmetric routing + dummies - **■** + Shielding - **I** + EMA coating plane ## Prevent attacker from accessing inputs or outputs (1/4) #### H2: Attacker can build models of power, based on knowledge of inputs (outputs) Solution Blinding ## Mix input with random seed before processing - Compute complementary transform after processing - © Simple with math-based algorithms (RSA) - ② Difficult with symmetric block ciphers (DES) - Has a cost, may be sensitive to higher order DPA ## Prevent attacker from accessing inputs or outputs (2/4) #### Example of blinding on a public key algorithm ■ RSA, $c = m^d \mod n = (m \times r)^d \times s \mod n$ where $r^d \times s \mod n = 1$ #### **Algorithm** Without blinding ``` 1: procedure RSA-PURE(m, n, d) a \leftarrow 1 for k \leftarrow w - 1 down to 0 do a \leftarrow a^2 \mod n if d_k = 1 then 5: a \leftarrow a \times m \mod n 6: end if end for return a 10: end procedure ``` #### Algorithm With blinding ``` 1: procedure RSA-BLIND(m, n, d) let r^d \times s \mod n = 1 in a \leftarrow 1 m' \leftarrow m \times r \mod n for k \leftarrow w - 1 down to 0 do a \leftarrow a^2 \mod n if d_k = 1 then a \leftarrow a \times m' \mod n 9: end if 10: end for 11: return a \times s \mod n 12: end procedure ``` Hardware Security - Side-channel attacks ## Prevent attacker from accessing inputs or outputs (3/4) #### Example of blinding on a public key algorithm - RSA, $c = m^d \mod n = (m \times r \mod n)^d \times s \mod n$ where $r^d \times s \mod n = 1$ - $\triangle$ $m \times r \mod n$ is prone to timing attacks - $\Rightarrow$ r could be revealed - $\mathbb{Q}$ Change (r,s) after each modular exponentiation - $\odot$ Computing $r^d \mod n$ and $s = (r^d \mod n)^{-1} \mod n$ time consuming - $(r,s) \leftarrow (r^2 \mod n, s^2 \mod n)$ - $\blacksquare$ ... or any other update preserving $r^d \times s \mod n = 1$ Hardware Security - Side-channel attacks # Prevent attacker from accessing inputs or outputs (4/4) #### Masking - Notations: $\neg$ (NOT), $\lor$ (OR), $\land$ (AND), $\oplus$ (XOR), . (product in $GF(2^n)$ ) - Randomly split data in two (or more) parts, process sub-parts, merge results - Applied at level of elementary boolean operations - Example of first order boolean masking - Kind of blinding with similar advantages and drawbacks - Works also with symmetric block ciphers - (2) Has a cost, may be sensitive to higher order DPA - $x \to (m_1.x) \oplus m_0$ (second order affine masking) ## Design countermeasures (2/4) - H2: Attacker can build models of power, based on guesses on secret key - ? How to prevent somebody from guessing? - H2: Attacker can build models of power, based on knowledge of implementation - Auguste who, did you say? ## H3: "Sufficient" number of different inputs, same secret key - Prevent access to the device (NAP) - Authentication and limited number of tries; same as PIN codes (NAP) - ♀ Variant: if authentication fails add exponential delay before next try (NAP) - - ⇒ Secret keys used only for small number of processing - © Not enough traces with same secret key - (NAP) ## Design countermeasures (3/4) H4: Total number of guesses on secret key "manageable" ∇ Increase number of required hypothesis PPA "easy" against DES or AES Because small width of last (first) logic cones A But combinational functions with large number of inputs... 🙁 ... Expensive ☼ ...Slow ② And attacks could also be mounted against internal nodes of logic cone TELECOM Paris 図を設け © IP PARIS Telecom Paris / EURECON Hardware Security - Side-channel attacks ## Design countermeasures (4/4) ## H5: SNR good enough - S Desynchronize traces - Random clock drift - © Clock spikes still there - Random dummy clock cycles - ▲ Dummy clock cycles could be found out - **?** Add some power noise - ✓ Increases number of required traces TELECOM PAIRS SERVI com Paris / EURECO Hardware Security — Side-channel attacks ## What if energy depends only on key? #### Ex. 10: No statistical analysis any more - C Equivalent to Simple Power Analysis - One trace only - <sup>3</sup> But several traces can be averaged to improve SNR - As with SPA, in most cases, best attacker can expect... - $\bullet$ ... $\approx$ Hamming weight of secret key. . . - ✓ ... but not always (see SPA examples and exercise) ## What if energy depends only on input messages? #### Ex. 11: Energy does not depend on secret - $\Rightarrow$ Energy contains no information on secret - $\Rightarrow$ There is nothing to expect about secret from energy - ✔ Power attacks (simple and differential) do not work any more ## What if energy depends neither on key nor on input messages? ## Ex. 12: Energy is constant - $\Rightarrow$ Energy contains no information - $\Rightarrow$ There is nothing to expect about anything from energy - ✔ Power attacks (simple and differential) do not work any more Telecom Paris / EURECOM Hardware Security — Side-channel attacks