Improved Alpha-Information Bounds for Higher-Order Masked Cryptographic Implementations

Authors

Yi Liu

Julien Béguinot

Wei Cheng

Sylvain Guilley

Loïc Masure

Olivier Rioul

François-Xavier Standaert

Abstract

Embedded cryptographic devices are usually protected against side-channel attacks by masking strategies. In this paper, the security of protected cryptographic implementations is evaluated for any masking order, using alpha-information measures. Universal upper bounds on the probability of success of any type of side-channel attack are derived. These also provide lower bounds on the minimum number of queries required to achieve a given success rate. An important issue, solved in this paper, is to remove the loss factor due to the masking field size.

Introduction

When a cryptographic device is operating, any kind of unintended leakage (time, power, electromagnetic, etc.) can be exploited by an attacker. By querying the device multiple times, measuring the corresponding leakages, and correlating them with internal sensitive values, the attacker is able to guess the secret key with a given success probability.

Therefore, evaluating the security of cryptographic devices against side-channel attacks has become a major concern. Information-theoretic metrics turn out to be effective and have been used in many studies: Using classical metrics such as mutual information and Fano inequality, Chérisey et al. (de Chérisey et al. 2019) established several universal bounds on the probability of success and minimum number of queries required to achieve success. This approach has been extended to conditional \(\alpha\)-informational quantities in (Liu et al. 2021). Both (de Chérisey et al. 2019) and (Liu et al. 2021), however, were restricted to unprotected cryptographic devices.

Masking is one of the best-established protections with provable security. Several works (Duc, Faust, and Standaert 2015; Cheng et al. 2022; Masure, Rioul, and Standaert 2022; Ito, Ueno, and Homma 2022) have evaluated the security of masked implementations against side-channel attacks. To review the state of the art, we follow the framework and notations from (Heuser, Rioul, and Guilley 2014; de Chérisey et al. 2019; Cheng et al. 2022).

Background and Notations

Let \(K\) be the secret key and \(T\) be a public variable (usually, plain or ciphertext) known to the attacker. Both \(K\) and \(T\) are \(n\)-bit variables, uniformly distributed, and independent of each other. The field size is \(M=2^n\). The cryptographic algorithm operates on \(K\) and \(T\) to compute an \(n\)-bit sensitive variable \(X=f(K,T)\). In a masking scheme of order \(d\), the sensitive variable is randomly split into \(d+1\) shares and cryptographic operations are performed on each share separately. Thus \(X=X_0 \oplus X_1 \oplus \cdots \oplus X_d\), where each share \(X_i\) is an \(n\)-bit variable and \(\oplus\) is the additive operation in the underlying field (or Abelian group). A typical example is “Boolean masking,” for which \(\oplus\) is the bitwise XOR operation. During computation, side-channel information on \({X}=(X_0, X_1, \ldots, X_d)\) is leaking and can be measured as a noisy “trace” by the attacker, denoted by \({Y}=({Y}_0, {Y}_1, \ldots, {Y}_d)\). We assume that \({Y}\) is the output of a memoryless side-channel with input \({X}\). Since masking shares are drawn uniformly and independently, both \({X}\) and \({Y}\) are i.i.d. sequences.
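
To make the model concrete, the following short sketch simulates shared values and their leakage traces, assuming the Hamming-weight plus additive-Gaussian-noise model used in the numerical results of Section 4 (the NumPy implementation, noise level, and function names are illustrative choices only, not part of the scheme itself).

```python
import numpy as np

rng = np.random.default_rng(0)

def hamming_weight(x: np.ndarray) -> np.ndarray:
    """Hamming weight of each byte of x."""
    return np.unpackbits(x.astype(np.uint8)[..., None], axis=-1).sum(axis=-1)

def leak_shares(x: np.ndarray, d: int, sigma: float) -> np.ndarray:
    """Split each 8-bit sensitive value in x into d+1 Boolean shares and
    return one noisy Hamming-weight leakage per share (shape (d+1, len(x)))."""
    # draw d shares uniformly at random; the last one completes the XOR to x
    shares = rng.integers(0, 256, size=(d, x.size), dtype=np.uint8)
    last = x.astype(np.uint8)
    for s in shares:
        last = last ^ s
    all_shares = np.vstack([shares, last[None, :]])
    # memoryless side channel: Hamming weight plus Gaussian noise, independently per share
    return hamming_weight(all_shares) + rng.normal(0.0, sigma, size=all_shares.shape)

# example: five traces of a first-order (d = 1) masked sensitive byte
x = rng.integers(0, 256, size=5, dtype=np.uint8)
print(leak_shares(x, d=1, sigma=2.0))
```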

The attacker measures \(m\) traces \({Y}^m=({Y}_1,{Y}_2,\ldots,{Y}_m)\) corresponding to the independent text sequence \(T^m=(T_1,T_2,\ldots,T_m)\) (assumed independent of the secret \(K\)), and exploits her knowledge of \({Y}^m\) and \(T^m\) to produce an estimate \(\hat{K}\) of the secret key. Again, since the side channel is memoryless, \({X}^m\) and \({Y}^m\) are i.i.d. sequences. Let \(\mathbb{P}_s=\mathbb{P}(K=\hat{K})\) be the probability of success of the attack upon observing \(T^m\) and \({Y}^m\). In theory, maximum success is obtained by the MAP (maximum a posteriori probability) rule with success probability denoted by \(\mathbb{P}_s=\mathbb{P}_s(K|{Y}^m,T^m)\). The whole process is illustrated in Fig. [model1].

State-of-the-art

Duc et al. (Duc, Faust, and Standaert 2015) derived a lower bound on the minimum number \(m\) of queries required to achieve a given probability of success \(\mathbb{P}_s\), which can be rewritten as: \[\label{duc} m \geq \frac{\log (1-\tfrac{1}{M})-\log(1-\mathbb{P}_s)}{-\log \Big( 1-(\frac{M}{\sqrt{2 \log e}})^{d+1} \prod_{i=0}^{d} \sqrt{I(X_i;{Y}_i)} \Big)}\] where \(d+1\) is the number of shares, \(M\) is the field size, and \(I(X_i;{Y}_i)\) is the mutual information between each share and its corresponding leakage. They also showed that this bound was quite loose in practice and conjectured that when the leakage of shares is sufficiently noisy (and independent among shares), the lower bound on \(m\) should take the approximate form \[\label{eq:duc:conj} m \gtrsim \frac{\beta(\mathbb{P}_s)}{\prod_{i=0}^{d} I(X_i;{Y}_i)}\] where \(\beta\) is a “small constant depending on \(\mathbb{P}_s\)” (Duc, Faust, and Standaert 2019, 1279).

The bound [duc] was improved recently in (Masure, Rioul, and Standaert 2022): \[\label{loic} m \geq \frac{d_1(\mathbb{P}_s\|\frac{1}{M})}{\log \Big( 1+\tfrac{M}{2} (\tfrac{2}{\log e})^{d+1} \prod_{i=0}^{d} I(X_i;{Y}_i) \Big)}\] where \(d_1(\cdot \| \cdot)\) is the binary Kullback–Leibler divergence. A similar bound was derived independently in (Ito, Ueno, and Homma 2022). Although this greatly improves [duc] for small \(M\), when the field size \(M\) is large, the \(M\) factor in the denominator loosens the bound by a substantial amount. Therefore, an important issue is to find out whether this factor \(M\) can be removed.

Outline

In this paper, we have two main contributions. First, we generalize the “linear bound” \(d_1(\mathbb{P}_s\|\frac{1}{M})\leq mI(X;{Y})\) in (de Chérisey et al. 2019) to \(\alpha\)-informational quantities where the usual linear bound is recovered by letting \(\alpha\to 1\). Second, we derive the following novel bound which removes the loss caused by the field size: \[m \geq \frac{d_2(\mathbb{P}_s\|\frac{1}{M})}{ \log \Bigl( 1+ \prod_{i=0}^d ( e^{I^R_2(X_i;Y_i)} -1) \Bigr) } .\] Here, instead of using usual Kullback–Leibler divergence and mutual information, we consider the \(\alpha\)-divergence and the Rényi \(\alpha\)-mutual information for \(\alpha=2\): \(d_2\) and \(I^R_2\). This particular value of \(\alpha\) allows one to link \(\mathbb{P}_s\) to \(\alpha\)-information via a quadratic version of the total variation distance.

Our bounds are particularly useful under the usual “high noise assumption,” that is, when the side channel of Fig. [model1] has low capacity. Then, values of \(I^R_2(X_i;Y_i)\) will be small, and the lower bound on \(m\) is approximately equal to: \[m \gtrsim \frac{d_2(\mathbb{P}_s\|\frac{1}{M})}{\prod_{i=0}^d I^R_2(X_i;Y_i)}.\] This is very similar to the conjectured bound [eq:duc:conj], except for the use of \(I^R_2\) instead of \(I\). Additionally, we show that when \(M\) is large, the numerator does not lose tightness compared to that of [loic].

In the remainder of the paper, we first recall some definitions and properties of \(\alpha\)-informational quantities in Section 2, and then derive the \(\alpha\)-extension of the main inequality (“linear bound”) in Section 3. The main result is then derived in Section 4 and illustrated by numerical simulations. Section 5 gives some perspectives.

\(\alpha\)-Information Measures

\(\alpha\)-Entropy and \(\alpha\)-Divergence

Assume that either \(0<\alpha<1\) or \(1<\alpha<+\infty\) (the limiting values \(0,1,+\infty\) will be obtained by taking limits). We consider probability distributions \(P,Q\) with a dominating measure, with respect to which they follow densities denoted by the corresponding lower-case letters \(p,q\).

We follow the notations of (Liu et al. 2021) in the following definitions:

\[\begin{aligned} H_\alpha(P)&= \tfrac{\alpha}{1-\alpha} \log\|p\|_\alpha\\ D_\alpha (P\|Q)&= \tfrac{1}{\alpha-1} \log\langle p\|q\rangle^\alpha_\alpha \label{divergence}\end{aligned}\] with the following special notation: \[\begin{aligned} \|p\|_\alpha &= \bigl(\int |p|^\alpha d\mu \bigr)^{1/\alpha} \label{alpha-norm}\\ \langle p\|q\rangle_\alpha &= \bigl(\smash{\int} p^\alpha q^{1-\alpha} d\mu \bigr)^{1/\alpha} \end{aligned}\]

The usual entropy and Kullback-Leibler divergence are recovered by letting \(\alpha\to 1\).
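
For discrete distributions, these quantities are straightforward to evaluate numerically. The sketch below is a direct transcription of the above definitions, working in nats; it is an illustrative helper, not code from the paper.

```python
import numpy as np

def renyi_entropy(p: np.ndarray, alpha: float) -> float:
    """H_alpha(P) = (alpha/(1-alpha)) * log ||p||_alpha, in nats."""
    p = np.asarray(p, dtype=float)
    if np.isclose(alpha, 1.0):                      # Shannon limit
        return float(-np.sum(p[p > 0] * np.log(p[p > 0])))
    return float(alpha / (1.0 - alpha) * np.log(np.sum(p ** alpha) ** (1.0 / alpha)))

def renyi_divergence(p: np.ndarray, q: np.ndarray, alpha: float) -> float:
    """D_alpha(P||Q) = 1/(alpha-1) * log sum_x p(x)^alpha q(x)^(1-alpha), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.isclose(alpha, 1.0):                      # Kullback-Leibler limit
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    return float(np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0))

p = np.array([0.5, 0.25, 0.25])
q = np.full(3, 1.0 / 3.0)
print(renyi_entropy(p, 2.0), renyi_divergence(p, q, 2.0))
print(renyi_divergence(p, q, 1.0001))  # close to the KL divergence for alpha near 1
```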

Conditional \(\alpha\)-Entropy

Many different definitions of conditional \(\alpha\)-entropy \(H_\alpha(X|Y)\) were proposed in the literature (see, e.g., (Fehr and Berens 2014)). Any reasonable definition should at least yield the classical definition of conditional entropy as \(\alpha\to 1\), and satisfy the property that conditioning reduces entropy (CRE): \(H_\alpha(X|Y)\leq H_\alpha(X)\), where equality holds if and only if \(X\) and \(Y\) are independent. At least four definitions are often used:

  1. \(\tilde{H}_\alpha^{(o)} (X|Y) =H_\alpha(X,Y)-H_\alpha(Y)\)

  2. \(\tilde{H}_\alpha^{(i)} (X|Y)= \mathbb{E}_Y H_\alpha(X|Y=y)\)

  3. \(\tilde{H}_\alpha^{(ii)} (X|Y) =\frac{1}{1-\alpha}\log \mathbb{E}_Y \| P_{X|Y} \|^{\alpha}_\alpha\)

  4. \(\tilde{H}_\alpha (X|Y) =\frac{\alpha}{1-\alpha}\log \mathbb{E}_Y \| P_{X|Y} \|_\alpha\)

The first two definitions appear in (Jizba and Arimitsu 2004, sec. 2.2) (see also (Golshani, Pasha, and Yari 2009 equation (2.10))) and in (Cachin 1997 equation (2.15)). However, both violate the CRE property (Fehr and Berens 2014). The last two definitions were proposed by Hayashi (Hayashi 2011) and Arimoto (Arimoto 1975) respectively. Both satisfy the CRE property. In the sequel, we use Arimoto’s definition which we simply denote as \(H_\alpha(X|Y)\).

\(\alpha\)-Information

Again, many different definitions of \(\alpha\)-information \(I_\alpha(X;Y)\) were proposed in the literature. Any reasonable definition should at least yield the classical definition of mutual information as \(\alpha\to 1\), and possibly also satisfy the following useful properties:

  • independence: \(I_\alpha(X;Y) \geq 0\) with equality if and only if \(X\) and \(Y\) are independent;

  • data post-processing inequality (post-DPI): if \(X-Y-Z\) forms a Markov chain, then post-processing cannot increase the information, i.e., \(I_\alpha(X;Z) \leq I_\alpha(X;Y)\);

  • data pre-processing inequality (pre-DPI): if \(X-Y-Z\) forms a Markov chain, then pre-processing cannot increase the information, i.e., \(I_\alpha(X;Z) \leq I_\alpha(Y;Z)\);

  • monotonicity: \(I_\alpha(X;Y)\) is nondecreasing as \(\alpha\) increases;

  • closed-form expression amenable to efficient numerical estimation.

At least four definitions are used in the literature:

  1. \(I^A_\alpha(X;Y) = H_\alpha(X)-H_\alpha(X|Y)\)

  2. \(I^C_\alpha(X;Y) = \min_{Q_Y} \mathbb{E}_X (D_\alpha(P_{Y|X}\| Q_Y))\)

  3. \(I^R_\alpha(X;Y) = D_\alpha(P_{XY}\|P_X \times P_Y)\)
    \(\hphantom{I^R_\alpha(X;Y) }=\frac{1}{\alpha-1} \log \mathbb{E}_Y \langle p_{X|Y}\|p_X\rangle^\alpha_\alpha\).

  4. \(I_\alpha(X;Y) = \min_{Q_Y} D_\alpha(P_{XY}\|P_X \times Q_Y)\)
    \(\hphantom{I_\alpha(X;Y) }= \frac{\alpha}{\alpha-1} \log \mathbb{E}_Y \langle p_{X|Y}\|p_X\rangle_\alpha\),

which somehow parallel the corresponding ones for conditional entropy. The first definition was proposed by Arimoto (Arimoto 1975). It is easily seen to satisfy both the independence and post-DPI property because of the CRE property of Arimoto’s conditional entropy. However, it does not satisfy monotonicity because \(I^A_\alpha(X;X)=H_\alpha(X)\) can be decreasing in \(\alpha\). The second definition is from Csiszár (Csiszar 1995). It does not seem to admit a closed-form expression, and the minimization is hard to solve analytically even in simple examples (Verdú 2015). However, one can prove monotonicity and the independence property, based on the properties of the \(\alpha\)-divergence.

The third definition requires no minimization and appears in (Tomamichel and Hayashi 2017 equation (50)). We call it Rényi’s \(\alpha\)-mutual information because it is a natural definition from Rényi’s divergence, just as in the classical case \(\alpha=1\). Also, it is mutual in the sense that \(I^R_\alpha(X;Y)=I^R_\alpha(Y;X)\). From the nonnegativity of \(\alpha\)-divergence: \(D_\alpha(P\|Q)\geq 0\) with equality if and only if \(P=Q\), it is easily seen that \(I^R_\alpha(X;Y)\) satisfies the independence property. From the monotonicity property of \(\alpha\)-divergence, it also satisfies monotonicity. One can also check the post-DPI and pre-DPI properties by the same line of reasoning as in the proof of (Liu et al. 2021 Property 12), replacing \(Q_{Y|T}, Q_{Z|T}\) by \(P_{Y|T}, P_{Z|T}\), respectively.

Finally, the fourth definition is due to Sibson (Sibson 1969) (see also (Verdú 2015)). In contrast to Rényi \(\alpha\)-mutual information, symmetry does not hold in general: \(I_\alpha(X;Y)\ne I_\alpha(Y;X)\). However, it is known to satisfy the independence property, monotonicity, and the pre- and post-DPI (Polyanskiy and Verdú 2010) (see also (Rioul 2021)). See Table 1 for a summary of all properties. In the sequel, we use Sibson’s definition in connection with a generalized Fano inequality and simply denote it as \(I_\alpha(X;Y)\).

Summary of properties for various definitions of \(\alpha\)-information.[tab]

| Def. | Independence | Post-DPI | Pre-DPI | Monotonicity | Closed-form |
|------|--------------|----------|---------|--------------|-------------|
| \(I^A_\alpha\) | yes | yes |  | no | yes |
| \(I^C_\alpha\) | yes |  |  | yes | no |
| \(I^R_\alpha\) | yes | yes | yes | yes | yes |
| \(I_\alpha\) | yes | yes | yes | yes | yes |

Since \(\min_{Q_Y} D_\alpha(P_{XY}\|P_X \times Q_Y) \leq D_\alpha(P_{XY}\|P_X \times P_Y)\), Sibson’s \(\alpha\)-information cannot exceed the Rényi mutual information: \[\label{SibsonRenyi} I_\alpha(X;Y) \leq I^R_\alpha(X;Y).\]
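
For finite alphabets, both \(I^R_\alpha\) and Sibson’s \(I_\alpha\) admit closed forms that are easy to evaluate. The following sketch (illustrative only) computes both from a joint pmf and checks inequality [SibsonRenyi] on a random example.

```python
import numpy as np

rng = np.random.default_rng(1)

def renyi_mutual_information(pxy: np.ndarray, alpha: float) -> float:
    """I^R_alpha(X;Y) = D_alpha(P_XY || P_X x P_Y), in nats; pxy indexed [x, y], alpha != 1."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    return float(np.log(np.sum(pxy ** alpha * (px * py) ** (1.0 - alpha))) / (alpha - 1.0))

def sibson_information(pxy: np.ndarray, alpha: float) -> float:
    """Sibson's I_alpha(X;Y) = alpha/(alpha-1) * log sum_y (sum_x p(x,y)^alpha p(x)^(1-alpha))^(1/alpha)."""
    px = pxy.sum(axis=1, keepdims=True)
    inner = np.sum(pxy ** alpha * px ** (1.0 - alpha), axis=0) ** (1.0 / alpha)
    return float(alpha / (alpha - 1.0) * np.log(np.sum(inner)))

pxy = rng.random((4, 6))
pxy /= pxy.sum()                         # random joint pmf
for a in (0.5, 2.0, 4.0):
    s, r = sibson_information(pxy, a), renyi_mutual_information(pxy, a)
    assert s <= r + 1e-12                # inequality [SibsonRenyi]
    print(f"alpha={a}: Sibson {s:.4f} <= Renyi {r:.4f}")
```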

Lower Bound on Sibson’s \(\alpha\)-Information

The first result of this paper is based on the following generalized Fano inequality (Rioul 2021). Assume \(K\) is discrete and estimated from \(Y\) using the MAP rule, with (maximal) probability of success \(\mathbb{P}_s=\mathbb{P}_s(K|Y)=\mathbb{E}\sup_k p_{K|Y}(k|Y)\). Also let \(\mathbb{P}_s(K)=\sup p_K\) be the probability of success when guessing \(K\) without even knowing \(Y\).

[lem-gen-fano] \[d_\alpha\bigl(\mathbb{P}_s(K|Y)\|\mathbb{P}_s(K)\bigr) \leq I_\alpha(K;Y)\] where \(d_\alpha (p\|q)\) is the binary \(\alpha\)-divergence: \[\label{eq-binary-div} d_\alpha (p\|q)= \tfrac{1}{\alpha-1} \log \bigl( p^\alpha q^{1-\alpha} + (1-p)^\alpha (1-q)^{1-\alpha} \bigr).\]

Bounding Success by Sibson’s \(\alpha\)-Information

In Fig. [model1], the sensitive variable \(X^m\) is a function of \(K\) and \(T^m\); \(\hat{K}\) is a function of \(({{Y}}^m,T^m)\). It is easily seen from the figure that the following Markov chains hold: \[\begin{aligned} & K \longleftrightarrow ({{Y}}^m,T^m) \longleftrightarrow \hat{K}, \\ & (K,T^m) \longleftrightarrow X^m \longleftrightarrow {{Y}}^m. \end{aligned}\] The probability of success of the side-channel attack is \(\mathbb{P}_s=\mathbb{P}_s(K|{{Y}}^m,T^m)\). Using Lemma [lem-gen-fano], one has \(d_\alpha\bigl(\mathbb{P}_s\| \frac{1}{M}\bigr) \leq I_\alpha (K;{{Y}}^m,T^m)\). Now, the following lemma is proved in Appendix 5.1:

[lemma3] \(I_\alpha (K;{{Y}}^m,T^m) \leq I_\alpha (K,T^m;{{Y}}^m)\).

It follows that the generalized Fano inequality implies \[\label{fano} d_\alpha\bigl(\mathbb{P}_s\| \frac{1}{M}\bigr) \leq I_\alpha (K,T^m;{{Y}}^m) .\] Because \((K,T^m) \leftrightarrow X^m \leftrightarrow {{Y}}^m\) forms a Markov chain, using the DPI of Sibson’s \(\alpha\)-information we have \[\label{markov} I_\alpha (K,T^m;{{Y}}^m ) \leq I_\alpha (X^m;{{Y}}^m ).\] Also, when \(T^m\) is not observed, the components of \(X^m\) are i.i.d., and since the side-channel is memoryless, \((X^m,{{Y}}^m)\) is an i.i.d. sequence. It easily follows from the definition that \[\label{linear} I_\alpha (X^m;{{Y}}^m) = m I_\alpha (X;{Y}).\] From [fano], [markov], and [linear], we arrive at the main result of this section:

[thm2] [linear-r] \(d_\alpha\bigl(\mathbb{P}_s\| \frac{1}{M}\bigr) \leq m I_\alpha (X;{Y})\).

Note that since \(d_\alpha\bigl(p\| q\bigr)\) is increasing in \(p\) when \(p \geq q\), Theorem [thm2] gives an upper bound on the probability of success \(\mathbb{P}_s\).
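
For instance, Theorem [thm2] can be inverted numerically: given \(m\), \(M\), and a value of \(I_\alpha(X;{Y})\) (assumed here purely for illustration), the largest \(\mathbb{P}_s\) compatible with the bound is obtained by a root search, as in the following sketch.

```python
import numpy as np
from scipy.optimize import brentq

def d_alpha(p: float, q: float, alpha: float) -> float:
    """Binary alpha-divergence d_alpha(p||q) of [eq-binary-div], in nats (alpha != 1)."""
    return np.log(p ** alpha * q ** (1 - alpha) + (1 - p) ** alpha * (1 - q) ** (1 - alpha)) / (alpha - 1)

def success_upper_bound(m: int, info: float, M: int, alpha: float) -> float:
    """Largest P_s in [1/M, 1) with d_alpha(P_s || 1/M) <= m * I_alpha(X;Y) (Theorem [thm2])."""
    q = 1.0 / M
    f = lambda p: d_alpha(p, q, alpha) - m * info
    if f(1.0 - 1e-12) <= 0:           # constraint not binding: no nontrivial bound
        return 1.0
    return brentq(f, q, 1.0 - 1e-12)  # d_alpha(p||q) is increasing in p for p >= q

# example: alpha = 2, M = 256, assumed I_2(X;Y) = 0.01 nat per trace
for m in (10, 100, 1000):
    print(m, success_upper_bound(m, info=0.01, M=256, alpha=2.0))
```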

Comparison with the Classical Bound

A natural question is to compare [linear-r] with the classical bound for \(\alpha=1\), especially in terms of how it depends on \(M\). Since \(d_\alpha\) and \(I_\alpha\) are non-decreasing in \(\alpha\), a precise answer is not obvious. One can argue as follows. Assume \(\mathbb{P}_s\) is fixed in \((0,1)\). For \(\alpha=1\), one has at first order \[d_1\bigl(\mathbb{P}_s\| \tfrac{1}{M}\bigr) = \log M - (1-\mathbb{P}_s)\log(M-1) - h(\mathbb{P}_s) \approx \mathbb{P}_s \log M\] where \(h(\mathbb{P}_s)\) is the binary entropy function. For \(\alpha<1\), \(d_\alpha \bigl(\mathbb{P}_s\| \frac{1}{M}\bigr)\leq d_1\bigl(\mathbb{P}_s\| \frac{1}{M}\bigr)\), which does not grow faster than \(O(\log M)\). For \(\alpha>1\), one has at first order \[d_\alpha \bigl(\mathbb{P}_s\| \tfrac{1}{M}\bigr)= \log M+ \tfrac{1}{\alpha-1}\log \Big( \mathbb{P}_s^\alpha + \frac{(1-\mathbb{P}_s)^\alpha}{(M -1)^{\alpha-1}} \Big) \approx \log M.\] Thus the \(O(\log M)\) behavior applies for any \(\alpha\), and the bound in [linear-r] does not become looser than the classical bound as the field size \(M\) increases.

Upper Bound on Rényi Mutual Information

Euclidean Distance to the Uniform

In the field of cryptography, the total variation distance \(\| P - U \|_1\) of a given \(M\)-ary distribution \(P\) to the uniform distribution \(U \sim \mathcal{U}(M)\) is a common criterion to evaluate randomness. For \(\alpha\ne 1\) we have the following definition.

Let \(X\) be an \(M\)-ary random variable. The “\(\alpha\)-distance” between \(P_X\) and a uniform distribution \(U \sim \mathcal{U}(M)\) is defined as \[\| P_X - U \|_\alpha = \Bigl(\sum_x \bigl| p_X(x)-\tfrac{1}{M} \bigr| ^{\alpha} \Bigr)^{\frac{1}{\alpha}} .\]

In this section we focus on the Euclidean distance (\(\alpha=2\)) because of the following lemma.

[divergence-delta] With the same notations, one has \[D_2(P_X \| U) = \log (1+ M \cdot \| P_X - U \|_2^2). \label{pinsker}\]

One has \(\| P_X - U \|_2^2= \sum_x ( p_X(x)- \frac{1}{M})^2 =\sum_x p_X^2(x)- \frac{1}{M}\). Since \(D_2(P_X \| U) = \log \bigl(M \cdot \sum_x p_X^2(x)\bigr)\), the result follows.
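
A quick numerical sanity check of this identity on a random 16-ary distribution (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 16
p = rng.random(M)
p /= p.sum()                                   # random M-ary pmf
u = np.full(M, 1.0 / M)
d2 = np.log(M * np.sum(p ** 2))                # D_2(P_X || U) in nats
print(np.isclose(d2, np.log1p(M * np.sum((p - u) ** 2))))   # True: Lemma [divergence-delta]
```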

The following important lemma is known as the XOR Lemma in the case of Boolean masking (Masure, Rioul, and Standaert 2022).

[lemma6] Let \(X_1\), \(X_2\) be independent random variables over a finite Abelian group \(\mathcal{X}\) of size \(M\), and \(U\sim \mathcal{U}(\mathcal{X})\). Let \(X= X_1 \oplus X_2\), where \(\oplus\) denotes the group operator in \(\mathcal{X}\). One has \[\label{two-element} \| P_X - U \|_2^2 \leq M \cdot \| P_{X_1} - U \|_2^2 \cdot \| P_{X_2} - U \|_2^2.\]

By finite induction, if \(X\) is split into \(d+1\) independent shares: \(X= X_0 \oplus X_1 \oplus \cdots \oplus X_d\), one has \[\label{convo-delta} \| P_X - U \|_2^2 \leq M^{d} \| P_{X_0} - U \|_2^2\| P_{X_1} - U \|_2^2 \cdots \| P_{X_d} - U \|_2^2 .\]

Since \(X= X_1 \oplus X_2\), one has \(P_X=P_{X_1}\ast P_{X_2}\) where \(\ast\) denotes the convolution operator over the Abelian group. It is easy to check that \(P_X-U=(P_{X_1}-U)\ast (P_{X_2}-U)\), and by the Cauchy-Schwarz inequality, \(|(P_{X_1}-U)\ast (P_{X_2}-U)|\leq \|P_{X_1}-U\|_2\|P_{X_2}-U\|_2\) pointwise. Squaring and summing over the \(M\) values of \(X\) gives [two-element].
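
The bound [two-element] can likewise be checked numerically; in the sketch below the group is, for illustration, \((\mathbb{Z}/2\mathbb{Z})^4\) under bitwise XOR.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 16                                  # group (Z/2Z)^4 under XOR
u = np.full(M, 1.0 / M)
p1 = rng.random(M); p1 /= p1.sum()
p2 = rng.random(M); p2 /= p2.sum()

# distribution of X = X1 xor X2 (convolution over the group)
px = np.zeros(M)
for a in range(M):
    for b in range(M):
        px[a ^ b] += p1[a] * p2[b]

lhs = np.sum((px - u) ** 2)
rhs = M * np.sum((p1 - u) ** 2) * np.sum((p2 - u) ** 2)
print(lhs <= rhs + 1e-15)               # True: Lemma [lemma6]
```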

Lemmas [divergence-delta] and [lemma6] do not seem to be easily generalized to other values of \(\alpha\ne 2\). This is the main reason why we focus on \(\alpha=2\) in this paper.

Upper Bound on Rényi \(2\)-Information for Each Share

Since Sibson’s \(\alpha\)-information does not exceed Rényi mutual information (inequality [SibsonRenyi]), Theorem [thm2] implies \[d_\alpha\bigl(\mathbb{P}_s\| \frac{1}{M}\bigr) \leq m I^R_\alpha (X;{Y}) .\] We now upper bound \(I^R_\alpha (X;{Y})\) by noting that, since \(X\) is uniformly distributed, by definition \[\begin{split} I^R_2 (X;{{Y}}) &= \log \mathbb{E}_{{{Y}}} \exp{ D_2(P_{X|{Y}} \| U)} \\&= \log (1+ M \cdot \mathbb{E}_{{{Y}}}\| P_{X|{Y}} - U \|_2^2). \end{split}\] Since \(\{X_i,{Y}_i\}_{i=0,\ldots,d}\) are mutually independent, [convo-delta] applies for \(X|{Y}\) and we have

\[\begin{aligned} I^R_2 (X;{\boldsymbol{Y}}) &\leq \log \bigl(1+ M \cdot \mathbb{E}_{{{Y}}} M^{d} \prod_{i=0}^d \| P_{X_i|Y_i} - U \|_2^2 \bigr)\\ &=\log \bigl(1+ \prod_{i=0}^d M\cdot \mathbb{E}_{Y_i}\| P_{X_i|Y_i} - U \|_2^2 \bigr) \\ & = \log\bigl(1+ \prod_{i=0}^d ( \exp{I^R_2(X_i;{Y}_i)} -1) \bigr). \label{eq:I2:shares}\end{aligned}\]

Putting all inequalities together yields the main result of this paper:

[thm:main:I2] The number of traces \(m\) can be lower bounded by \[m \geq \frac{d_2(\mathbb{P}_s\|\frac{1}{M})}{ \log \Bigl( 1+ \prod_{i=0}^d ( \exp{I^R_2(X_i;Y_i)} -1) \Bigr) }.\]

Note that from Subsection 3.2 with \(\alpha=2\), the numerator does not lose tightness compared to the case \(\alpha=1\) (compare [loic]).
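
For concreteness, the following sketch evaluates the new bound of Theorem [thm:main:I2] alongside [loic] for given per-share information values. The numerical values are assumed for illustration only (e.g. as obtained by the Monte-Carlo estimation of the next subsection); all quantities are in nats, so that \(\log e = 1\).

```python
import numpy as np

def d1(p: float, q: float) -> float:
    """Binary Kullback-Leibler divergence, in nats."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def d2(p: float, q: float) -> float:
    """Binary Renyi divergence of order 2, in nats."""
    return np.log(p ** 2 / q + (1 - p) ** 2 / (1 - q))

def m_bound_new(ps: float, M: int, i2_shares) -> float:
    """Lower bound on m from Theorem [thm:main:I2]; i2_shares = per-share I^R_2(X_i;Y_i) in nats."""
    return d2(ps, 1.0 / M) / np.log1p(np.prod([np.expm1(i) for i in i2_shares]))

def m_bound_loic(ps: float, M: int, i_shares) -> float:
    """Lower bound on m from [loic]; with natural logs, log e = 1, so (2/log e)^(d+1) = 2^(d+1)."""
    d = len(i_shares) - 1
    return d1(ps, 1.0 / M) / np.log1p(0.5 * M * 2 ** (d + 1) * np.prod(i_shares))

# illustrative (assumed) per-share values, M = 256, target success rate 80%
i2 = [0.02, 0.02]      # I^R_2(X_i;Y_i) in nats
i1 = [0.02, 0.02]      # I(X_i;Y_i) in nats
print(m_bound_new(0.8, 256, i2), m_bound_loic(0.8, 256, i1))
```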

Numerical Results

In this subsection, we validate our results by simulation. The side-channel settings of § 1.1 are as follows:

  • the field of variables is the AES (Advanced Encryption Standard) field with \(n=8\), thus \(M=256\);

  • side-channel information is generated by taking the Hamming weight leakage model and additive white Gaussian noise (one of the most commonly adopted models (Mangard, Oswald, and Popp 2007));

  • Boolean masking is considered with orders \(d\in\{0,1,2\}\).

Shannon and Rényi mutual information (MI) are evaluated by Monte-Carlo simulation. In particular, we compare the Rényi MI bound [eq:I2:shares] with the following bound \[\label{eq:I:shares} I (X;{\boldsymbol{Y}}) \leq \log \bigl(1+\tfrac{M}{2} (\tfrac{2}{\log e})^{d+1} \prod_{i=0}^{d} I(X_i;{Y}_i) \bigr)\] used in [loic]. Fig. [fig:num:res] compares MI and Rényi MI for \(d=0,1,2\). Our bound based on Rényi MI significantly narrows the gap between the per-share bound and the direct evaluation. This leads to a more accurate prediction of the number of queries \(m\) required to achieve a given success rate \(\mathbb{P}_s\).
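
A sketch of such a Monte-Carlo evaluation for a single share is shown below; it exploits the fact that the leakage depends on \(X_i\) only through its Hamming weight. The sample size and implementation details are illustrative and not the code used to produce the figures.

```python
import numpy as np
from math import comb
from scipy.stats import norm

rng = np.random.default_rng(4)

def mi_estimates(sigma: float, n_bits: int = 8, n_samples: int = 200_000):
    """Monte-Carlo estimates (in nats) of I(X_i;Y_i) and I^R_2(X_i;Y_i) for one share,
    under the Hamming-weight + Gaussian noise model: Y = HW(X) + N(0, sigma^2)."""
    M = 2 ** n_bits
    w = np.arange(n_bits + 1)                                 # possible Hamming weights
    pw = np.array([comb(n_bits, k) for k in w]) / M           # P(HW(X) = w)
    # sample leakages Y ~ p_Y by first drawing the weight class, then the noise
    wx = rng.choice(w, size=n_samples, p=pw)
    y = wx + rng.normal(0.0, sigma, size=n_samples)
    # conditional densities p(y | HW = w) for every sample and weight class
    cond = norm.pdf(y[:, None], loc=w[None, :], scale=sigma)  # shape (n_samples, n_bits+1)
    py = cond @ pw                                            # marginal density p(y)
    i1 = np.mean(np.log(cond[np.arange(n_samples), wx] / py))   # I(X;Y)
    i2 = np.log(np.mean((cond ** 2 @ pw) / py ** 2))            # I^R_2(X;Y)
    return i1, i2

print(mi_estimates(sigma=np.sqrt(8)))   # noise variance sigma^2 = 8, as in Fig. 1
```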

Fig. 1 confirms this improvement on the bounds for the success rate as a function of \(m\), for \(d=1\) and \(d=2\). Our new bounds are significantly more accurate than the state of the art: for \(\mathbb{P}_s = 80\%\) and \(d=1\), the ML attack requires about \(m\geq 60\) traces, our new bound gives \(m\geq 25\), while [loic] gives only \(m\geq 1\). A similar improvement is observed for \(d=2\).

\(\mathbb{P}_s\) vs \(m\) in attacks and the corresponding bounds for noise variance \(\sigma^2 = 8\). The plain curves show the results of direct maximum likelihood (ML) attacks (Heuser, Rioul, and Guilley 2014); the dotted curves show the predictions by Theorem [thm:main:I2]; the dashed curves are for the state-of-the-art bound [loic].

Perspectives

Similar improved bounds (removing the field size loss) can also be obtained in the cases of Boolean masking and arithmetic masking modulo a power of two, using “Mrs. Gerber’s lemma”, see (Beguinot et al. 2022). Interestingly, our result may also be related to (Prest et al. 2019), since [eq:I2:shares] has the same form as (Prest et al. 2019 Theorem 3), but with different information-theoretic metrics. It would be interesting to relate and compare various information metrics used in security proofs.

Acknowledgments

François-Xavier Standaert is a Senior Research Associate of the Belgian Fund for Scientific Research (FNRS-F.R.S.). This work has been funded in part by the ERC project SWORD (724725).

Proof of Lemma [lemma3]

\[\begin{aligned} I_\alpha (K;& {{Y}}^m,T^m) = \frac{\alpha}{\alpha-1} \log \mathbb{E}_{{{Y}}^m,T^m} \langle p_{K|{{Y}}^m,T^m}\|p_K\rangle_\alpha \\ &= \frac{\alpha}{\alpha-1} \log \mathbb{E}_{T^m} \int_{{{Y}}^m} p_{{{Y}}^m|T^m}\Big( \sum_k p^\alpha_{K|{{Y}}^m,T^m} p^{1-\alpha}_{K} \Big) ^{\frac{1}{\alpha}} \\ &= \frac{\alpha}{\alpha-1} \log \mathbb{E}_{T^m} \int_{{{Y}}^m} \Big( \sum_k p^\alpha_{K,{{Y}}^m|T^m} p^{1-\alpha}_{K} \Big) ^{\frac{1}{\alpha}} \\ & \stackrel{{\scriptstyle(\star)}}= \frac{\alpha}{\alpha-1} \log \mathbb{E}_{T^m} \int_{{{Y}}^m} \Big( \sum_k p^\alpha_{{{Y}}^m|K,T^m} p_{K|T^m} \Big) ^{\frac{1}{\alpha}} \\ & \stackrel{{\scriptstyle(\star\star)}}\leq \frac{\alpha}{\alpha-1} \log \int_{{{Y}}^m} \Big( \mathbb{E}_{T^m} \sum_k p^\alpha_{{{Y}}^m|K,T^m} p_{K|T^m} \Big) ^{\frac{1}{\alpha}} \\ &= \frac{\alpha}{\alpha-1} \log \int_{{{Y}}^m} \Big( \sum_{k,t^m} p^\alpha_{{{Y}}^m|K,T^m} p_{K,T^m} \Big) ^{\frac{1}{\alpha}} \\ &= \frac{\alpha}{\alpha-1} \log \int_{{{Y}}^m} p_{{{Y}}^m}\Big( \sum_{k,t^m} p^\alpha_{K,T^m|{{Y}}^m} p^{1-\alpha}_{K,T^m} \Big) ^{\frac{1}{\alpha}} \\ &= \frac{\alpha}{\alpha-1} \log \mathbb{E}_{{{Y}}^m} \langle p_{K,T^m|{{Y}}^m}\|p_{K,T^m}\rangle_\alpha \\ &= I_\alpha (K,T^m;{{Y}}^m) \end{aligned}\] where \((\star)\) holds since \(p_{K}=p_{K|T^m}\) (\(K\) and \(T^m\) are independent) and \(p^\alpha_{K,{{Y}}^m|T^m} p^{-\alpha}_{K|T^m}= p^\alpha_{{{Y}}^m|K,T^m}\); \((\star\star)\) is Jensen’s inequality: when \(\alpha>1\), \(x^{\frac{1}{\alpha}}\) is concave and \(\frac{\alpha}{\alpha-1}\) is positive; when \(0<\alpha<1\), \(x^{\frac{1}{\alpha}}\) is convex and \(\frac{\alpha}{\alpha-1}\) is negative. In both cases the inequality holds in the same direction.


References

Arimoto, Suguru. 1975. Information Measures and Capacity of Order \(\alpha\) for Discrete Memoryless Channels.” In Topics in Information Theory, Proc. 2nd Colloq. Math. Societatis János Bolyai, 16:41–52.
Beguinot, Julien, Wei Cheng, Sylvain Guilley, Yi Liu, Loïc Masure, Olivier Rioul, and François-Xavier Standaert. 2022. Removing the Field Size Loss from Duc et al.’s Conjectured Bound for Masked Encodings.” IACR Cryptol. ePrint Arch., 1–18. https://eprint.iacr.org/2022/1738.
Cachin, Christian. 1997. Entropy Measures and Unconditional Security in Cryptography.” PhD thesis, ETH Zurich.
Cheng, Wei, Yi Liu, Sylvain Guilley, and Olivier Rioul. 2022. Attacking Masked Cryptographic Implementations: Information-theoretic Bounds.” In 2022 IEEE International Symposium on Information Theory (ISIT), 654–59. IEEE.
Csiszar, I. 1995. Generalized Cutoff Rates and Renyi’s Information Measures.” IEEE Transactions on Information Theory 41 (1): 26–34. https://doi.org/10.1109/18.370121.
de Chérisey, Eloi, Sylvain Guilley, Olivier Rioul, and Pablo Piantanida. 2019. Best Information is Most Successful: Mutual Information and Success Rate in Side-Channel Analysis.” IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019: 49–79. https://doi.org/10.13154/tches.v2019.i2.49-79.
Duc, Alexandre, Sebastian Faust, and François-Xavier Standaert. 2015. Making Masking Security Proofs Concrete - Or How to Evaluate the Security of Any Leaking Device.” In Advances in Cryptology - EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part I, edited by Elisabeth Oswald and Marc Fischlin, 9056:401–29. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-662-46800-5\_16.
———. 2019. Making Masking Security Proofs Concrete (Or How to Evaluate the Security of Any Leaking Device), Extended Version.” J. Cryptol. 32 (4): 1263–97. https://doi.org/10.1007/s00145-018-9277-0.
Fehr, Serge, and Stefan Berens. 2014. On the Conditional Rényi Entropy.” IEEE Transactions on Information Theory 60: 6801–10. https://doi.org/10.1109/TIT.2014.2357799.
Golshani, Leila, Einollah Pasha, and Gholamhossein Yari. 2009. Some Properties of Rényi Entropy and Rényi Entropy Rate.” Information Sciences 179 (14): 2426–33. https://doi.org/https://doi.org/10.1016/j.ins.2009.03.002.
Hayashi, Masahito. 2011. Exponential Decreasing Rate of Leaked Information in Universal Random Privacy Amplification.” IEEE Transactions on Information Theory 57 (6): 3989–4001. https://doi.org/10.1109/TIT.2011.2110950.
Heuser, Annelie, Olivier Rioul, and Sylvain Guilley. 2014. Good is Not Good Enough — Deriving Optimal Distinguishers from Communication Theory.” In 16th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2014), 8731:55–74. Lecture Notes in Computer Science. Busan, South Korea: Springer.
Ito, Akira, Rei Ueno, and Naofumi Homma. 2022. On the Success Rate of Side-Channel Attacks on Masked Implementations: Information-Theoretical Bounds and Their Practical Usage.” In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-11, 2022, edited by Heng Yin, Angelos Stavrou, Cas Cremers, and Elaine Shi, 1521–35. ACM. https://doi.org/10.1145/3548606.3560579.
Jizba, Petr, and Toshihico Arimitsu. 2004. The World According to Rényi: Thermodynamics of Multifractal Systems.” Annals of Physics 312 (1): 17–59.
Liu, Yi, Wei Cheng, Sylvain Guilley, and Olivier Rioul. 2021. On Conditional Alpha-Information and its Application to Side-Channel Analysis.” In IEEE Information Theory Workshop, ITW 2021, Kanazawa, Japan, October 17-21, 2021, 1–6. IEEE. https://doi.org/10.1109/ITW48936.2021.9611409.
Mangard, Stefan, Elisabeth Oswald, and Thomas Popp. 2007. Power Analysis Attacks — Revealing the Secrets of Smartcards. Springer.
Masure, Loïc, Olivier Rioul, and François-Xavier Standaert. 2022. A Nearly Tight Proof of Duc et al.’s Conjectured Security Bound for Masked Implementations.” In Smart Card Research and Advanced Applications - 21st International Conference, CARDIS 2022, Birmingham, UK, November 7-9, 2022, Revised Selected Papers, edited by Ileana Buhan and Tobias Schneider, 13820:69–81. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-031-25319-5\_4.
Polyanskiy, Yury, and Sergio Verdú. 2010. Arimoto Channel Coding Converse and Rényi Divergence.” In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 1327–33. https://doi.org/10.1109/ALLERTON.2010.5707067.
Prest, Thomas, Dahmun Goudarzi, Ange Martinelli, and Alain Passelègue. 2019. Unifying Leakage Models on a Rényi Day.” In Advances in Cryptology–CRYPTO 2019: 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18–22, 2019, Proceedings, Part I, 683–712. Springer.
Rioul, Olivier. 2021. A Primer on Alpha-Information Theory with Application to Leakage in Secrecy Systems.” In Geometric Science of Information - 5th International Conference, GSI 2021, Paris, France, July 21-23, 2021, Proceedings, edited by Frank Nielsen and Frédéric Barbaresco, 12829:459–67. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-80209-7\_50.
Sibson, Robin. 1969. Information Radius.” Zeitschrift für Wahrscheinlichkeitstheorie Und Verwandte Gebiete 14: 149–60.
Tomamichel, Marco, and Masahito Hayashi. 2017. Operational Interpretation of Rényi Information Measures via Composite Hypothesis Testing against Product and Markov Distributions.” IEEE Transactions on Information Theory 64 (2): 1064–82.
Verdú, Sergio. 2015. \(\alpha\)-Mutual Information.” In IEEE Information Theory and Applications Workshop (ITA2015), 1–6. San Diego, USA. https://doi.org/10.1109/ITA.2015.7308959.