Maximal Leakage of Masked Implementations
Using Mrs. Gerber’s Lemma for Min-Entropy

Authors

Julien Béguinot

Yi Liu

Olivier Rioul

Wei Cheng

Sylvain Guilley

Abstract

A common countermeasure against side-channel attacks on secret key cryptographic implementations is \(d\)th-order masking, which splits each sensitive variable into \(d+1\) random shares. In this paper, maximal leakage bounds on the probability of success of any side-channel attack are derived for any masking order. Maximal leakage (Sibson’s information of order infinity) is evaluated between the sensitive variable and the noisy leakage, and is related to the conditional “min-entropy” (Arimoto’s entropy of order infinity) of the sensitive variable given the leakage. The latter conditional entropy is then lower-bounded in terms of the conditional entropies for each share using majorization inequalities. This yields a generalization of Mrs. Gerber’s lemma for min-entropy in finite Abelian groups.

Introduction

When a cryptographic device is operating, any kind of physical leakage (time, power, electromagnetic emanations, etc.) can be exploited by an attacker. The attacker queries the device multiple times, and measures the corresponding leakages to infer the secret key. The security of devices against side-channel attacks has become a major concern.

To evaluate the probability of success for any side-channel attack, information-theoretic metrics turn out to be effective and have been used in many studies. Using conditional mutual information and Fano’s inequality, de Chérisey et al. (de Chérisey et al. 2019) established several universal bounds on the probability of success for a given number of queries, or equivalently, the minimum number of queries required to achieve a given level of success. This approach has been extended to conditional Sibson’s \(\alpha\)-information by Liu et al. (Liu et al. 2021). However, both (de Chérisey et al. 2019) and (Liu et al. 2021) were restricted to unprotected cryptographic devices.

Masking is one of the most well-established countermeasures. The main issue in this context is the fact that a direct evaluation of the information leakage requires data and computational complexities that increase rapidly with the masking order (Cheng et al. 2022). Therefore, it is important to derive bounds in terms of the individual information leakages for each share. Duc et al. (Duc, Faust, and Standaert 2015) conjectured a general form of such bounds. Rigorous bounds were obtained in two independent recent works by Ito et al. (Ito, Ueno, and Homma 2022) and Masure et al. (Masure, Rioul, and Standaert 2022). Even more recently, Béguinot et al. (Béguinot, Cheng, Guilley, Liu, et al. 2022) improved these results using Mrs. Gerber’s lemma (Wyner and Ziv 1973; Jog and Anantharam 2013) to derive sharp bounds in terms of mutual information for masking in additive groups of order \(2^n\).

In the case of unprotected implementations (without masking), simulations in (Liu et al. 2021) show that Sibson’s \(\alpha\)-information evaluates the probability of success of a side-channel attack all the more accurately as \(\alpha\) increases. Therefore, mutual information, which corresponds to \(\alpha=1\), is not optimal. This motivates the derivation of new bounds in the limiting case \(\alpha=+\infty\).

The usual setup of masking countermeasures involves bitwise XOR (exclusive or) operations, which are particularly well suited to symmetric cryptographic algorithms like AES. However, modern cryptography also relies on operations performed in groups of prime order, and masking can also be multiplicative (Akkar and Giraud 2001) and not only additive (Goubin and Patarin 1999). For all these reasons, there is a strong incentive to extend the previous bounds to arbitrary finite Abelian groups. This motivates the generalization of Mrs. Gerber’s lemma to any such Abelian group.

Mrs. Gerber’s lemma was initially derived by Wyner and Ziv (Wyner and Ziv 1973) to lower bound the entropy of a modulo 2 addition of binary random variables in terms of the entropies of each summand. It was extended by Jog and Anantharam (Jog and Anantharam 2013) to the case of additive groups of order \(2^n\), and by Hirche (Hirche 2020) to the case of Rényi entropy of binary variables. The general case of additive groups was only considered by Tao (Tao 2009) for Shannon entropy and independent copies of two shares, in relation to sumset theory. While the original binary Mrs. Gerber’s lemma was used to derive a binary version of the entropy power inequality (Shamai and Wyner 1990), a generalization of the entropy power inequality to any prime cyclic additive group and Rényi entropy was investigated by Madiman et al. (Madiman, Wang, and Woo 2021), but does not reduce to an explicit “Mrs. Gerber’s lemma”-type inequality. Therefore, it appears that the case of min-entropy (Rényi entropy of order \(\infty\)) and additive groups of any order has not been investigated yet in our context.

Contributions

In this paper, we show that when evaluating the performance of side-channel attacks of masked implementations using conditional Sibson’s \(\alpha\)-information, the exact performance of optimal maximum likelihood attacks is attained in the limiting case \(\alpha=+\infty\). This motivates the investigation of Mrs. Gerber’s lemma for conditional min-entropy (Arimoto’s conditional entropy of order \(\infty\)). We derive a variation of such Mrs. Gerber’s lemma for any finite Abelian group and for any masking order.

The remainder of this paper is organized as follows. Section 2 gives some notations and preliminaries on \(\alpha\)-informational quantities. Section 3 shows that the optimal evaluation of side-channel attack success by Fano’s inequality is achieved in the limiting case \(\alpha=+\infty\) and derives the corresponding bound in terms of the information between the sensitive variable and the leakage, which is linear in the number of queries. Section 4 derives Mrs. Gerber’s lemma for min-entropy, first for two summands in any finite Abelian group, then extends it to the general case of \(d+1\) summands. Section [sec-concl] concludes and gives some perspectives.

Preliminaries and Notations

Framework and Notations

Let \(K\) be the secret key and \(T\) be a public variable (usually plaintext or ciphertext) known to the attacker. It is assumed that \(T\) is independent of \(K\), and \(K\) is uniformly distributed over an Abelian group \(\mathcal{G}\) of order \(M\). The cryptographic algorithm operates on \(K\) and \(T\) to compute a sensitive variable \(X\), which takes values in the same group \(\mathcal{G}\) and is determined by \(K\) and \(T\), in such a way that \(X\) is also uniformly distributed over \(\mathcal{G}\).

In a masking scheme of order \(d\), the sensitive variable \(X\) is randomly split into \(d+1\) shares \(X_0\), \(X_1\), …, \(X_d\) and cryptographic operations are performed on each share separately. Thus, \(X=X_0 \oplus X_1 \oplus \cdots \oplus X_d\), where each share \(X_i\) is a uniformly distributed random variable over \(\mathcal{G}\) and \(\oplus\) is the group operation in \(\mathcal{G}\). For this group operation, we let \(\ominus g\) denote the opposite of \(g \in \mathcal{G}\). A typical example is “Boolean masking”, for which \(\oplus\equiv\ominus\) is the bitwise XOR operation.

During computation, shares \({X}=(X_0, X_1, \ldots, X_d)\) are leaking through some side channel. Noisy “traces,” denoted by \({Y}=({Y}_0, {Y}_1, \ldots, {Y}_d)\), are measured by the attacker, where \({Y}\) is the output of a memoryless side channel with input \({X}\). Since masking shares are drawn uniformly and independently, both \({X}\) and \({Y}\) are i.i.d. sequences. The attacker measures \(m\) traces \({Y}^m=({Y}_1,{Y}_2,\ldots,{Y}_m)\) corresponding to the i.i.d. text sequence \(T^m=(T_1,T_2,\ldots,T_m)\), then exploits her knowledge of \({Y}^m\) and \(T^m\) to guess the secret key \(\hat{K}\). Again, since the side-channel is memoryless, both \({X}^m\) and \({Y}^m\) are i.i.d. sequences.

Let \(\mathbb{P}_s=\mathbb{P}(K=\hat{K})\) be the probability of success of the attack upon observing \(T^m\) and \({Y}^m\). In theory, maximum success is obtained by the MAP (maximum a posteriori probability) rule with success probability denoted by \(\mathbb{P}_s=\mathbb{P}_s(K|{Y}^m,T^m)\). The whole process is illustrated in Fig. [model1].

Rényi’s \(\alpha\)-Entropy and Arimoto’s Conditional \(\alpha\)-Entropy

Assume that either \(0<\alpha<1\) or \(1<\alpha<+\infty\) (the limiting values \(0,1,+\infty\) can be obtained by taking limits). We consider probability distributions \(P,Q\) with a dominating measure \(\mu\), with respect to which they follow densities denoted by the corresponding lower-case letters \(p,q\). We follow the notations of (Liu et al. 2021):

\[\begin{aligned} H_\alpha(P)&= \tfrac{\alpha}{1-\alpha} \log\|p\|_\alpha\\ D_\alpha (P\|Q)&= \tfrac{1}{\alpha-1} \log\langle p\|q\rangle^\alpha_\alpha \label{divergence} \end{aligned}\] with the special notation: \[\begin{aligned} \|p\|_\alpha &= \bigl(\int |p|^\alpha d\mu \bigr)^{1/\alpha} \label{alpha-norm}\\ \langle p\|q\rangle_\alpha &= \bigl(\smash{\int} p^\alpha q^{1-\alpha} d\mu \bigr)^{1/\alpha}. \end{aligned}\]

The usual Shannon entropy and Kullback-Leibler divergence are recovered by letting \(\alpha\to 1\). The \(\alpha\)-entropy is nonincreasing in \(\alpha\) and attains its minimum, the min-entropy \(H_\infty\), in the limit \(\alpha \to \infty\):

For a probability distribution \(P\) over a finite alphabet, the min-entropy is \[H_\infty(P)=- \log ( \max~ p).\]

Many different definitions of conditional \(\alpha\)-entropy \(H_\alpha(X|Y)\) were proposed in the literature. We use Arimoto’s definition, which is argued to be the most promising one (Fehr and Berens 2014):

The conditional \(\alpha\)-entropy of \(X\) given \(Y\) is defined as \[H_\alpha(X|Y) = \frac{\alpha}{1-\alpha}\log \mathbb{E}_Y \|p_{X|Y}\|_\alpha.\]

Assuming \(X\) takes values in a finite alphabet, the conditional min-entropy can be obtained by letting \(\alpha \to \infty\) in \(H_\alpha(X|Y)\):

\[H_\infty(X|Y)= - \log ( \mathbb{E}_Y \max\limits_{x} p_{X|Y} ) = - \log \mathbb{P}_s(X|Y)\] where \(\mathbb{P}_s(X|Y)\) is the maximum average probability of success in estimating \(X\) having observed \(Y\), by the MAP rule.
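For illustration, here is a minimal Python sketch (with a made-up joint distribution, not taken from any actual leakage model) that computes the MAP success probability \(\mathbb{P}_s(X|Y)\) and checks the identity \(H_\infty(X|Y)=-\log \mathbb{P}_s(X|Y)\) on finite alphabets.

```python
import math

# Hypothetical joint pmf p(x, y) over small finite alphabets (rows: x, columns: y).
p_xy = [
    [0.20, 0.05, 0.05],
    [0.05, 0.25, 0.10],
    [0.05, 0.05, 0.20],
]

# MAP success probability: E_Y max_x p_{X|Y}(x|y) = sum_y max_x p(x, y).
p_success = sum(max(p_xy[x][y] for x in range(len(p_xy)))
                for y in range(len(p_xy[0])))

# Arimoto conditional min-entropy (in bits): H_inf(X|Y) = -log P_s(X|Y).
h_inf_cond = -math.log2(p_success)
print(f"P_s(X|Y) = {p_success:.4f},  H_inf(X|Y) = {h_inf_cond:.4f} bits")
```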

Sibson’s \(\alpha\)-Information and Liu et al.’s Conditional Version

Again, several different definitions of \(\alpha\)-information \(I_\alpha(X;Y)\) have been proposed, and Sibson’s \(\alpha\)-information is perhaps the most appropriate one because it satisfies several useful properties that other definitions do not (Verdú 2015).

\[\begin{aligned} I_\alpha(X;Y) &= \min_{Q_Y} D_\alpha(P_{XY}\|P_X \times Q_Y) \\ &= \tfrac{\alpha}{\alpha-1} \log \mathbb{E}_Y \langle p_{X|Y}\|p_X\rangle_\alpha. \end{aligned}\]

[maximal] Assuming \(X,Y\) are discrete random variables, one has \[\label{max-i} I_\infty (X;Y) = \log \sum_y \sup\limits_{x:p_X(x)>0} p_{Y|X}(y|x).\]

Max-information is also studied in (Issa, Wagner, and Kamath 2020) as maximal leakage.
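For discrete alphabets, Definition [maximal] can be evaluated directly from the channel matrix. The sketch below (with an arbitrary hypothetical channel \(W\)) computes \(I_\infty(X;Y)\) in bits, assuming every \(x\) has positive probability under \(P_X\).

```python
import math

# Hypothetical channel matrix W[x][y] = p_{Y|X}(y|x); each row sums to 1.
W = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]

def maximal_leakage_bits(W):
    """I_inf(X;Y) = log2 sum_y max_x p_{Y|X}(y|x), assuming supp(P_X) is the whole alphabet."""
    return math.log2(sum(max(W[x][y] for x in range(len(W)))
                         for y in range(len(W[0]))))

print(f"I_inf(X;Y) = {maximal_leakage_bits(W):.4f} bits")
```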

Again, there are many different proposals for conditional \(\alpha\)-information. We use the following definition which seems most appropriate in the context of side-channel analysis (Liu et al. 2021):

\[\begin{aligned} I_\alpha(X;Y|Z) &= \min\limits_{Q_{YZ}}D_\alpha(P_{XYZ}\|P_{X|Z}Q_{YZ}) \\ &= \tfrac{\alpha}{\alpha-1} \log \mathbb{E}_{YZ} \langle p_{X|YZ}\|p_{X|Z}\rangle_\alpha. \end{aligned}\]

Fano’s Equality for Order \(\infty\): Linear Bound

Fano Inequality for Conditional \(\alpha\)-Information as \(\alpha\to\infty\)

Using conditional \(\alpha\)-information, Liu et al. (Liu et al. 2021) derived a universal bound on the probability of success as follows.

\[\label{fano} I_\alpha(K;{Y}^m|T^m) \geq d_\alpha(\mathbb{P}_s(K|{Y}^m,T^m) \| (\mathbb{P}_s(K)))\] where \(d_\alpha(p\|q)\) is the binary \(\alpha\)-divergence: \[d_\alpha(p\|q) =\tfrac{1}{\alpha-1} \log (p^\alpha q^{1-\alpha} +(1-p)^\alpha (1-q)^{1-\alpha}).\]

When \(\alpha \to 1\), this bound recovers the previous bound in (de Chérisey et al. 2019). The simulation results in (Liu et al. 2021) show that [fano] is tighter as \(\alpha\) increases.

In this section, we prove that Fano’s inequality for conditional \(\alpha\)-information becomes an equality in the limiting case \(\alpha = \infty\). Thus, conditional max-information can accurately characterize the probability of success.

[thm:1] For a uniformly distributed secret \(K\), \[\begin{split} \label{eq:1} I_\infty(K;{Y}^m|T^m) &= d_\infty(\mathbb{P}_s(K|{Y}^m,T^m) \| (\mathbb{P}_s(K))) \\ &=\log (M\mathbb{P}_s) \end{split}\] where \(d_\infty(p \|q)=\lim\limits_{\alpha \to \infty} d_\alpha(p\|q)=\log \max\limits_{x,q(x)>0} (p(x)/q(x))\), \(\mathbb{P}_s=\mathbb{P}_s(K|{Y}^m,T^m)\) is the optimal probability of success, and \(\mathbb{P}_s(K)=1/M\) is the corresponding probability of success in the case of blind estimation (without any observation).

To prove this theorem, we need the explicit expression of conditional max-information.

Assuming \(X\) takes values in a finite alphabet, one has \[I_\infty(X;Y|Z) = \log \mathbb{E}_Z \int_y ( \mathop{\max}\limits_{x:p_{X|Z}(x|z)>0} p_{Y|XZ}) ~d\mu_Y \label{eq:I_infty}.\]

This result easily follows from the following Lemmas [lemma1] and [lemma2], which are proved in Appendices 5.2 and 5.3 respectively. In (Issa, Wagner, and Kamath 2020), conditional maximal leakage is defined as a maximum over \(Z\), while our conditional max-information is averaged over \(Z\)—which is less than or equal to the conditional maximal leakage of (Issa, Wagner, and Kamath 2020).

[lemma1] Given any fixed \(y,z\), we have \[\lim_{\alpha \to \infty} ~p_{Y|Z} \cdot \langle p_{X|YZ}\|p_{X|Z}\rangle_\alpha = \max_{x:p_{X|Z}(x|z)>0} p_{Y|XZ}.\]

[lemma2] \[\begin{aligned} \lim_{\alpha \to \infty} \log & ~\mathbb{E}_{YZ} \langle p_{X|YZ}\|p_{X|Z}\rangle_\alpha \notag \\ &= \log \mathbb{E}_{Z} \int_y \lim_{\alpha \to \infty} ~p_{Y|Z} \cdot \langle p_{X|YZ}\|p_{X|Z}\rangle_\alpha. \end{aligned}\]

Under the MAP rule, the probability of success reads \[\begin{aligned} \mathbb{P}_s &= \mathbb{E}_{{Y}^mT^m} (\max_{k} ~p_{K|{Y}^m,T^m}) \notag \\ &= \mathbb{E}_{T^m} \int_{{y}^m}(\max_{k} ~p_{{Y}^m|K,T^m}p_{K|T^m}) d\mu_{{Y}^m} \label{ps}. \end{aligned}\] Recall \(K\) is uniformly distributed and independent of \(T^m\). Therefore, [ps] becomes \[\label{P_s} \mathbb{P}_s= \frac{1}{M} \cdot \mathbb{E}_{T^m} \int_{{y}^m} \bigl(\max_{k} ~p_{{Y}^m|K,T^m} \bigr) d\mu_{{Y}^m}.\] Combining [eq:I_infty] and [P_s], we have \(I_\infty(K;{Y}^m|T^m) = \log(M \mathbb{P}_s)\). Since \(\mathbb{P}_s\geq 1/M\), one has \(\mathbb{P}_s\cdot M \geq (1-\mathbb{P}_s) \cdot M/(M-1)\) and \(d_\infty(\mathbb{P}_s(K|{Y}^m,T^m) \| (\mathbb{P}_s(K))) = \log (M\mathbb{P}_s)\), which proves [eq:1].

Linear Bound Using Maximal Leakage \(I_\infty(X;\mathbf{Y})\)

Evaluating \(I_\infty(K;{Y}^m|T^m)\) directly turns out to be cumbersome (see Remark [con-i-remark] below). Instead, we use the unconditional max-information measure, i.e., the maximal leakage \(I_\infty(X;\mathbf{Y})\), to bound the probability of success; the resulting bound is linear in the number \(m\) of measurements:

[thm:linear-bound] \[\log(M\mathbb{P}_s) \leq m I_\infty (X;{Y}).\] [thm:linearBound]

By Definition [maximal], \[I_\infty (K,T^m;{Y}^m) = \log \int_{{y}^m} \max\limits_{k,t^m} ~p_{{Y}^m|K,T^m} d\mu_{{Y}^m}.\] Because \(\max\limits_{k,t^m} ~p_{{Y}^m|K,T^m} \geq \mathbb{E}_{T^m}~(\max_{k}~p_{{Y}^m|K,T^m})\), by [eq:1] and [eq:I_infty] we have \[I_\infty (K,T^m;{Y}^m) \geq I_\infty (K;{Y}^m|T^m) = \log (M\mathbb{P}_s).\] Because \((K,T^m) \leftrightarrow X^m \leftrightarrow {{Y}}^m\) forms a Markov chain, using the data processing inequality (DPI) for Sibson’s \(\alpha\)-information (Polyanskiy and Verdú 2010; Rioul 2021) we have \[\label{markov} I_\alpha (K,T^m;{{Y}}^m ) \leq I_\alpha (X^m;{{Y}}^m ).\] Also, when \(T^m\) is not observed, each component of \(X^m\) is i.i.d., and since the side-channel is memoryless, \((X^m;{{Y}}^m)\) is an i.i.d. sequence. It easily follows from the definition that \[\label{linear} I_\alpha (X^m;{{Y}}^m) = m I_\alpha (X;{Y}).\] Letting \(\alpha \to \infty\) in [markov] and [linear] we have \(I_\infty (K,T^m;{Y}^m) \leq m I_\infty (X;{Y})\).

[con-i-remark] For conditional \(\alpha\)-information we have the inequality \(I_\alpha(K;{Y}^m|T^m) \leq I_\alpha(X^m;{Y}^m|T^m)\) similar to [markov]. However, one does not have an equality similar to [linear] when \(T^m\) is observed.

This proof cannot use the result of (Issa, Wagner, and Kamath 2020 Theorem 1) because \({Y^m}\) does not take values in a finite alphabet here. Moreover, if we used Definition 1 and Theorem 1 of (Issa, Wagner, and Kamath 2020) we would obtain \[I_{\infty}(X^m;{Y}^m, T^m) \geq \log (M\cdot \mathbb{P}_s(K|{Y}^m,T^m))\] whereas \(I_{\infty}(X^m;{Y}^m)\) is smaller than \(I_{\infty}(X^m;{Y}^m, T^m)\).
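As a numerical sanity check of Theorem [thm:linearBound], the following brute-force sketch (toy parameters \(M=4\), \(m=2\), an arbitrary discrete channel acting directly on the sensitive variable, i.e., no masking) enumerates all key, text and trace sequences for \(X=K\oplus T\) over \(\mathbb{Z}_M\), computes \(\mathbb{P}_s(K|{Y}^m,T^m)\), and compares \(\log(M\mathbb{P}_s)\) with \(m I_\infty(X;{Y})\).

```python
import math
from itertools import product

M, m = 4, 2   # toy group Z_M and number of traces
# Hypothetical discrete memoryless channel W[x][y] = p(y|x) on Z_M.
W = [[0.6, 0.2, 0.1, 0.1],
     [0.1, 0.5, 0.3, 0.1],
     [0.2, 0.1, 0.5, 0.2],
     [0.1, 0.2, 0.1, 0.6]]

# Maximal leakage of a single trace, I_inf(X;Y), in bits.
i_inf = math.log2(sum(max(W[x][y] for x in range(M)) for y in range(M)))

# MAP success probability P_s(K | Y^m, T^m) by exhaustive enumeration:
# p(k, t^m, y^m) = (1/M) * (1/M^m) * prod_j W[(k + t_j) mod M][y_j].
p_s = 0.0
for t_seq in product(range(M), repeat=m):        # uniform i.i.d. texts
    for y_seq in product(range(M), repeat=m):    # all trace sequences
        best = max(math.prod(W[(k + t) % M][y] for t, y in zip(t_seq, y_seq))
                   for k in range(M))
        p_s += best / (M * M ** m)

print(f"log2(M * P_s) = {math.log2(M * p_s):.4f} <= m * I_inf = {m * i_inf:.4f}")
```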

Mrs. Gerber’s Lemma for Min-Entropy in Any Finite Abelian Group

To benefit from Theorem [thm:linear-bound] it remains to upper bound \(I_\infty (X;{Y})\). Since \(X\) is uniformly distributed, it is easily seen from the definition that \(I_\infty (X;{Y})=\log M - H_\infty (X|{Y})\). Thus, it remains to lower bound the conditional min-entropy \(H_\infty (X|{Y})\). This can be seen as an extension of Mrs. Gerber’s lemma to min-entropy in finite additive groups.

Mrs. Gerber’s Lemma for Two Random Variables

  Wyner and Ziv (Wyner and Ziv 1973) lower bounded the entropy of a sum of binary random variables with the entropies of each summand. This is known as Mrs. Gerber’s lemma.

Let \(X_0,X_1\) be two independent \(\mathbb{Z}_2\)-valued random variables with side information \(\mathbf{Y}=(Y_0,Y_1)\) and sensitive bit \(X=X_0\oplus X_1\). Then \[H(X|\mathbf{Y}) \geq h( h^{-1}(H(\!X_0 |Y_0)) \star h^{-1}(H(X_1 |Y_1)))\] where \(h(p) = - p \log p - \bar{p} \log \bar{p}\), \(a\star b= a \bar{b} + \bar{a} b\) and \(\bar x =1-x\). [thm:originalMGL]

Jog and Anantharam (Jog and Anantharam 2013) extended Mrs. Gerber’s lemma to additive groups of order \(2^n\). Hirche (Hirche 2020) extended Mrs. Gerber’s lemma for binary random variables to the case of Rényi entropies. In particular, for the min-entropy, one has equality:

Let \(X_0,X_1\) be two independent \(\mathbb{Z}_2\)-valued random variables with side information \(\mathbf{Y} = (Y_0,Y_1)\) and \(X=X_0\oplus X_1\). Then \[H_\infty(X|\mathbf{Y}) = h_\infty( h_\infty^{-1}(H_\infty(X_0 |Y_0)) \star h_\infty^{-1}(H_\infty(\!X_1 |Y_1))) \] where \(h_\infty(p)= - \log \max \{p,\bar{p}\}\). [thm:inftyMGL]
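In the unconditional special case (trivial side information), this equality is easy to check numerically. The sketch below (arbitrary biases \(a,b\)) compares \(H_\infty(X_0\oplus X_1)\) with \(h_\infty(p\star q)\), where \(p,q\in[0,\tfrac12]\) are the representatives satisfying \(h_\infty(p)=H_\infty(X_0)\) and \(h_\infty(q)=H_\infty(X_1)\).

```python
import math

def h_inf(p):
    """Binary min-entropy in bits: h_inf(p) = -log2 max(p, 1-p)."""
    return -math.log2(max(p, 1.0 - p))

def star(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a * (1.0 - b) + (1.0 - a) * b

# Arbitrary biases for two independent bits X0 ~ Bern(a), X1 ~ Bern(b).
a, b = 0.30, 0.15

# Left-hand side: min-entropy of X0 xor X1, whose bias is a * b.
lhs = h_inf(star(a, b))

# Right-hand side: representatives p, q in [0, 1/2] with the same min-entropies.
p, q = min(a, 1 - a), min(b, 1 - b)
rhs = h_inf(star(p, q))

print(f"H_inf(X0 xor X1) = {lhs:.6f} bits,  h_inf(p*q) = {rhs:.6f} bits")
```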

In this section, Mrs. Gerber’s lemma is extended to the min-entropy in any finite additive group:

Let \(X_0,X_1\) be two independent \(\mathcal{G}\)-valued random variables with side information \(\mathbf{Y}=(Y_0,Y_1)\) and sensitive variable \(X=X_0\oplus X_1\). Then for \(k = \max \{ \lfloor p^{-1} \!\rfloor , \lfloor q^{-1} \!\rfloor \}\), one has the optimal bound \[\exp(\!-\!H_\infty\!(X|\mathbf{Y})) \!\leq\! \begin{cases} k p q +\!(1\!-\!kp)(1\!-\!kq) & \hspace{-1em} \text{ if } \!\frac{1}{k+1} \leq p,q \leq\! \frac{1}{k} \\ \min \{ p,q\} & \hspace{-1em} \text{otherwise,} \end{cases} \label{eq-condcase}\] where \(p = \exp(\!-\!H_\infty(X_0|Y_0))\) and \(q = \exp(\!-\!H_\infty(X_1|Y_1))\). [thm:gerber-infty]

Since \(k p q + (1\!-\!kp)(1\!-\!kq) \!=\! \tfrac{1}{k+1} \!+\! \tfrac{k}{k+1} (\!(k\!+\!1)p\!-\!1)(\!(k\!+\!1)q \!-\! 1)\), \(\frac{1}{k+1} \leq p,q \leq \frac{1}{k}\) implies \(\frac{1}{k\!+\!1} \leq k p q \!+\! (1\!-\!kp)(1\!-\!kq) \leq \frac{1}{k}\). Thus, if both \(H_\infty(X_0|Y_0)\) and \(H_\infty(X_1|Y_1)\) lie in the interval \([\log k,\log(k+1)]\), then so does the corresponding bound on \(H_\infty(X|\mathbf{Y})\).

We first prove the inequality in the unconditional case. The probability mass function of \(X_0\oplus X_1\) is given by the convolution with respect to \(\mathcal{G}\) of the probability mass functions of \(X_0\) and \(X_1\). That is, for any \(x \in \mathcal{G}\), \[\mathbb{P}(X_0 \oplus X_1=x) = \sum_{i \in \mathcal{G}} \mathbb{P}(X_0 = x \oplus i) \mathbb{P}(X_1=\ominus i). \] In particular, \[\exp(\!-\!H_\infty(X_0 \oplus X_1)) = \max_{x \in \mathcal{G}} \sum_{i \in \mathcal{G}} \mathbb{P}(X_0 = x\oplus i) \mathbb{P}(X_1= \ominus i). \] Hence the problem reduces to upper-bounding \[\max_{x \in \mathcal{G}} \sum_{i \in \mathcal{G}} \mathbb{P}(X_0 = x \oplus i) \mathbb{P}(X_1= \ominus i). \] Since \(\exp(\!-\!H_\infty(X_0 \ominus x)) = \exp(\!-\!H_\infty(X_0))\) we can assume without loss of generality that the maximum is reached for \(x=0\) and the problem reduces to the maximization of \[\sum_{i \in \mathcal{G}} \mathbb{P}(X_0 = i) \mathbb{P}(X_1= \ominus i).\] Let \((1),\ldots,(M) \in \mathcal{G}\) be an ordering of the group elements so that \(\mathbb{P}(X_0 = (1)) \geq \mathbb{P}(X_0 = (2)) \geq \ldots \geq \mathbb{P}(X_0 = (M))\). The problem is to maximize \[\sum_{i=1}^M \underbrace{\mathbb{P}(X_0 = (i))}_{p_{(i)}} \underbrace{\mathbb{P}(X_1 = \ominus (i))}_{q_{(i)}}. \] The min-entropy of \(X_1\) is invariant under any permutation of its probability mass function. Furthermore, by the rearrangement inequality (Lemma [rearrangment-lemma] in Appendix 5.1) a permutation of the probability mass function of \(X_1\) maximizing the sum is such that \(\mathbb{P}(X_1 \!=\! \ominus (1)) \geq \mathbb{P}(X_1 \!=\! \ominus (2)) \geq \ldots \geq \mathbb{P}(X_1 \!=\! \ominus (\!M\!))\). Finally, the problem is reduced to \[\max_{\mathbf{p},\mathbf{q}} \phi(\mathbf{p},\mathbf{q}) \triangleq \sum p_{(i)} q_{(i)} \label{eqn:max-two-pmfs} \] under the constraint that \(\exp(\!-\!H_\infty(X_0))= p_{(1)} = p\) and \(\exp(\!-\!H_\infty(X_1)) = q_{(1)} = q\). Moreover, \(\phi\) is Schur-convex in \(\mathbf{p}\) when \(\mathbf{q}\) is fixed and vice-versa (see Lemma [schur-combination] in Appendix 5.1). Hence the maximum in [eqn:max-two-pmfs] is reached for the least spread out probability mass function under the min-entropy constraints. That is (Lemma [majorization-lemma] in Appendix 5.1), \[\begin{cases} (p_{(1)},\ldots,p_{(M)}) = (p,\ldots,p,1-k p,0,\ldots,0) \\ (q_{(1)},\ldots,q_{(M)}\,) = (q,\ldots,q,1-l\,q,0,\ldots,0) \end{cases}\] where \(k = \lfloor p^{-1} \rfloor\) and \(l = \lfloor q^{-1} \rfloor\). Hence we obtain the bound \[\exp(\!-\!H_\infty(X)) \leq \begin{cases} k p q + (1 - k p) (1 - k q) & \hspace{-1em} \text{ if } k=l \\ \min \{ p, q \} & \hspace{-1em} \text{ otherwise.} \end{cases} \label{eq-uncondcase}\] It remains to prove that [eq-uncondcase] carries over to the conditional case. Note that the bound is concave in \(p\) for a fixed \(q\) and vice-versa. Indeed, let \(\frac{1}{k+1} \leq q \leq \frac{1}{k}\) be fixed. Then the inequality is piecewise linear in \(p\), equal to \[\begin{cases} p & \text{if } p \leq \frac{1}{k+1} \\ k pq + (1-k p) (1-k q) & \text{if } \frac{1}{k+1} \leq p \leq \frac{1}{k} \\ q &\text{otherwise.} \end{cases}\] The three successive slopes are \(1\), \(k(k+1)q-k\) and \(0\). Since \(k(k+1)q-k \in [0,1]\), these slopes are in decreasing order and the function is indeed concave. Therefore, applying Jensen’s inequality (twice) proves [eq-condcase].
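The unconditional bound [eq-uncondcase] can be checked by brute force. The sketch below (random pmfs over \(\mathbb{Z}_6\), an arbitrary choice of group and sample size) draws a thousand random pairs and verifies the inequality on the group convolution.

```python
import math
import random

def group_conv(p, q, M):
    """pmf of X0 + X1 (mod M) for independent X0 ~ p and X1 ~ q."""
    return [sum(p[(x - i) % M] * q[i] for i in range(M)) for x in range(M)]

def two_share_bound(p_max, q_max):
    """Right-hand side of the two-summand bound (Theorem thm:gerber-infty)."""
    k = max(math.floor(1 / p_max), math.floor(1 / q_max))
    if 1 / (k + 1) <= min(p_max, q_max) and max(p_max, q_max) <= 1 / k:
        return k * p_max * q_max + (1 - k * p_max) * (1 - k * q_max)
    return min(p_max, q_max)

M = 6
random.seed(0)
for _ in range(1000):
    raw_p = [random.random() for _ in range(M)]
    raw_q = [random.random() for _ in range(M)]
    p = [x / sum(raw_p) for x in raw_p]
    q = [x / sum(raw_q) for x in raw_q]
    # exp(-H_inf(X0 + X1)) must not exceed the bound evaluated at max(p), max(q).
    assert max(group_conv(p, q, M)) <= two_share_bound(max(p), max(q)) + 1e-12
print("bound verified on 1000 random pairs of pmfs over Z_6")
```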

Extension to \(d+1\) Summands

Jog and Anantharam (Jog and Anantharam 2013) extended their generalization of Mrs. Gerber’s lemma (for Shannon entropy) for random variables in groups of order \(2^n\) from two to more summands by repeating their inequality. In the same fashion, Theorem [thm:gerber-infty] is extended to \(d+1\) summands by repeated application of Theorem [thm:gerber-infty]:

Let \(p_i = \exp(\!-\!H_\infty(X_i|Y_i))\), without loss of generality assume \(p_0 \leq p_1 \leq \ldots \leq p_d\). Let \(k = \lfloor p_0^{-1} \rfloor\), \(r = \max \{ i | p_i \leq \frac{1}{k}\}\). Then \(H_d=H_\infty(X|\mathbf{Y})\) is lower bounded as \[H_d \geq - \log \biggl(\frac{1}{k+1} + \frac{k^r}{k+1} \prod_{i=0}^r ((k+1)p_i-1) \biggl).\] [thm:extended-gerber-infty]

See Appendix 5.4.
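In practice, the bound of Theorem [thm:extended-gerber-infty] is cheap to evaluate from the per-share quantities \(p_i=\exp(-H_\infty(X_i|Y_i))\). A direct transcription in Python (with made-up example values for the \(p_i\)) is sketched below.

```python
import math

def min_entropy_lower_bound_bits(p_list):
    """Lower bound of Theorem thm:extended-gerber-infty on H_inf(X|Y), in bits,
    for X = X_0 + ... + X_d, given p_i = exp(-H_inf(X_i|Y_i))."""
    p = sorted(p_list)                       # p_0 <= p_1 <= ... <= p_d
    k = math.floor(1 / p[0])
    in_range = [(k + 1) * pi - 1 for pi in p if pi <= 1 / k]   # indices 0, ..., r
    r = len(in_range) - 1
    val = 1 / (k + 1) + (k ** r) / (k + 1) * math.prod(in_range)
    return -math.log2(val)

# Made-up example: three shares over a group of order M = 16, each leaking a little.
p_shares = [1.02 / 16, 1.03 / 16, 1.04 / 16]
print(f"H_inf(X|Y) >= {min_entropy_lower_bound_bits(p_shares):.4f} bits (log2 M = 4)")
```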

In the side-channel context, it is particularly interesting to characterize the behavior of the inequality in the high entropy regime in terms of maximal leakage. This corresponds to the high noise regime of Theorem [thm:linear-bound].

Let \(I_d = I_\infty(X;\mathbf{Y})\) in bits. Then, as \(I_\infty(X_i;Y_i) \rightarrow 0\), \[I_d \leq C_d \prod_{i=0}^d I_\infty(X_i;Y_i) + o\biggl( \prod_{i=0}^d I_\infty(X_i;Y_i) \biggr)\] where \(C_d = (M-1)^d (\ln 2)^d\). [thm:taylor1]

See Appendix 5.5.

Refined Unconditioned Extension to \(d+1\) Summands

In contrast to Theorem [thm:gerber-infty], Theorem [thm:extended-gerber-infty] is not guaranteed to be optimal when \(d>1\). The inequality can be improved by exploiting the structure of the sum of multiple random variables. We derive an improved bound which is optimal for entropies in the range \([\log(k\!-\!1),\log(k)]\) provided that there is a subgroup of \(\mathcal{G}\) of order \(k\). In particular, it is optimal in the high entropy regime \([\log(M\!-\!1),\log(M)]\) (since the group itself is a subgroup of order \(M\)).

Let \(p_i = \exp(\!-\!H_\infty(X_i))\); without loss of generality we assume \(p_0 \leq p_1 \leq \ldots \leq p_d\). Let \(k = \lfloor p_0^{-1} \!\rfloor\), \(r = \max \{ i | p_i \leq \frac{1}{k}\}\), and \(H_d=H_\infty(X)\). Then \[H_d \! \geq \! \begin{cases} \!-\! \log \bigl( \frac{1}{k+1} \!+\! \frac{1}{k+1} \prod \limits_{i=0}^r ((k\!+\!1)p_i \!-\! 1) \bigr) & \hspace{-1em} \text{ if $r$ is even,} \\ \!-\! \log \bigl( \frac{1}{k+1} \!+\! \frac{k}{k+1} \prod \limits_{i=0}^r ((k\!+\!1)p_i \!-\! 1) \bigr) & \hspace{-1em} \text{ if $r$ is odd.} \end{cases}\] [thm:refined-extension]

See Appendix 5.6. Contrary to Theorem [thm:extended-gerber-infty], Theorem [thm:refined-extension] does not apply to conditional min-entropy in general. In fact, when all the variables are fixed except one, the bound inside the logarithm is piecewise linear but discontinuous at \(\frac{1}{k}\) when \(r\) is even. This discontinuity breaks the convexity of the inequality. Ensuring continuity for the desired convexity, we are led back to the expression of Theorem [thm:extended-gerber-infty]. However, under the assumption that \[\frac{1}{M} \leq \exp(\!-\!H_\infty(X_i|Y_i=y)) \leq \frac{1}{M-1} \label{eq:hypothesis}\] for all \(i\) and \(y\), the bound of Theorem [thm:refined-extension] inside the logarithm is linear and we do obtain a conditional inequality. Fortunately, assumption [eq:hypothesis] makes sense in the side-channel context. In fact, a common leakage model is \(Y_i = f_i(X_i) + \sigma \mathcal{N}(0,1)\) where \(f_i\) is a fixed (possibly unknown) leakage function, such as the Hamming weight or a linear combination of the bits of the variable \(X_i\). In particular, [eq:hypothesis] holds for large enough \(\sigma\) (high noise regime). Then we have the following:

Assume [eq:hypothesis] and let \(I_d = I_\infty(X;\mathbf{Y})\) in bits. Then, as \(I_\infty(X_i;Y_i) \rightarrow 0\), \[I_d \leq C_d \prod_{i=0}^d I_\infty(X_i;Y_i) + o\biggl( \prod_{i=0}^d I_\infty(X_i;Y_i) \biggr)\] where \[C_d = \begin{cases} (\ln 2)^d & \text{ if $d$ is even,} \\ (M-1) (\ln 2)^d & \text{ if $d$ is odd.} \end{cases} \label{equ:cd-pri}\] [thm:taylor2]

The proof follows by Taylor expansion of the exponential about \(0\) and of the logarithm about \(1\). Theorem [thm:taylor2] is particularly interesting because it suggests that, with respect to the worst-case leakage distribution, masking of odd order \(d\) is not useful compared to masking of order \(d-1\) at high noise. In practice, however, this phenomenon may not apply for observed leakages. Theorem [thm:taylor2] differs from Theorem [thm:taylor1] in that the constant \(C_d\) is greatly improved, although it requires the high-noise assumption [eq:hypothesis] to hold.

Finally, combining Theorem [thm:taylor2] and Theorem [thm:linearBound] yields a bound on the probability of success:

[cor-bound_PS] For \(m\) traces, as \(\mathbb{P}_s \rightarrow \tfrac{1}{M}\), \[\mathbb{P}_s \leq \frac{\exp(m I_\infty(X;\mathbf{Y}))}{M} \approx \frac{1}{M} + \frac{m C_d}{M} \prod_{i=0}^d I_\infty(X_i;Y_i).\]

This is to be compared with the bound of (Béguinot, Cheng, Guilley, Liu, et al. 2022 Eqn. 8):

As \(\mathbb{P}_s \rightarrow \tfrac{1}{M}\), \[\mathbb{P}_s \leq \frac{1}{M} + \sqrt{m} A_d \biggl(\prod_{i=0}^d I(X_i,Y_i)\biggl)^{\tfrac{1}{2}}\] where \(A_d = \sqrt{M-1} (2 \ln 2)^{\tfrac{d+1}{2}} M^{-1}\). [prop:cosade-eq]

  See Appendix 5.7.

As expected both bounds decrease exponentially in \(d\) to the minimum value \(\frac{1}{M}\). Although \(I\) and \(I_\infty\) are different metrics, we observe that

  • the constant factor \(C_d/M\) for \(I_\infty\) in [equ:cd-pri] is exponentially lower in \(d\) than the factor \(A_d\) for \(I\);

  • the exponential decay in \(d\) is twice as fast for \(I_\infty\);

  • the inequality scales better for \(I\) than for \(I_\infty\) in terms of the number \(m\) of traces (since both bounds are compared for \(\mathbb{P}_s \approx \frac{1}{M}\), \(m\) is not necessarily large).

Finally, we can contrast both bounds on a toy example. Let \(Y_i\) be uniformly distributed in \(\{ x \in \mathcal{G} | x \neq X_i\}\). Then it is easily seen that \(I(X_i,Y_i)=I_\infty(X_i,Y_i)=\log(\frac{M}{M-1})\). In this case, the bound of this paper outperforms the bound of (Béguinot, Cheng, Guilley, Liu, et al. 2022) in the high noise regime (\(\mathbb{P}_s\to \frac{1}{M}\)). Both bounds are compared numerically in Figs. [fig:d1] and [fig:d2] in Appendix 5.9 for \(d=1\) and \(2\), respectively, and \(M=256\).
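This toy example is easy to verify numerically. The sketch below (a direct transcription of the toy leakage with \(M=256\), as in the figures) computes \(I(X_i;Y_i)\) and \(I_\infty(X_i;Y_i)\) and checks that both equal \(\log_2\frac{M}{M-1}\).

```python
import math

M = 256
# Toy leakage: Y_i uniform on {x in G : x != X_i}, i.e. p(y|x) = 1/(M-1) for y != x.
W = [[0.0 if y == x else 1.0 / (M - 1) for y in range(M)] for x in range(M)]
p_x = [1.0 / M] * M                                     # X_i is uniform
p_y = [sum(p_x[x] * W[x][y] for x in range(M)) for y in range(M)]

# Mutual information I(X_i;Y_i) in bits.
mi = sum(p_x[x] * W[x][y] * math.log2(W[x][y] / p_y[y])
         for x in range(M) for y in range(M) if W[x][y] > 0)

# Maximal leakage I_inf(X_i;Y_i) in bits.
leak = math.log2(sum(max(W[x][y] for x in range(M)) for y in range(M)))

print(mi, leak, math.log2(M / (M - 1)))                 # the three values coincide
```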

Conclusion and Perspectives

 [sec-concl]

We have shown that maximal leakage for masked implementations can be used to bound the probability of success of any side-channel attack. Maximal leakage is bounded by an efficiently computable bound based on a new variation of Mrs. Gerber’s lemma for min-entropy. The tightness of the bound is discussed for some example groups and probability mass functions, with figures in Appendix 5.8.

Improving the inequality when there is no subgroup of order \(k+1\) in \(\mathcal{G}\) is an interesting perspective. Indeed, groups of prime order which have no subgroup except the trivial ones are of major interest for their application to masking in asymmetric cryptographic schemes (especially post-quantum schemes). Besides, it would also be of interest to check whether the parity of \(d\) does play a practical role in the efficiency of masked implementations.

Background on Majorization

We recall definitions and basic results of majorization theory. An extensive presentation can be found in the reference textbook (Marshall, Olkin, and Arnold 1980).

If \(\mathbf{p} = (p_1,\ldots,p_M)\) is a probability mass function, an arrangement \((1),(2),\ldots,(M)\) of \(\mathbf{p}\) so that \(p_{(1)} \geq \ldots \geq p_{(M)}\) is said to be the statistical ordering of \(\mathbf{p}\). The associated cumulative mass function is denoted by \(P_{(i)} = p_{(1)} + \ldots + p_{(i)}\), where \(P_{(0)}=0\) by convention.

Let \(\mathbf{p},\mathbf{q}\) be two probability mass functions. We say that \(\mathbf{q}\) majorizes \(\mathbf{p}\) and write \(\mathbf{p} \preceq \mathbf{q}\) if \[P_{(i)} \leq Q_{(i)} \qquad (i=1,\ldots,M).\] This partial order on the probability mass functions quantifies whether a distribution is more spread out than the other.

\(f : \mathbf{p} \mapsto f(\mathbf{p}) \in \mathbb{R}\) is said to be Schur-convex if it is increasing with respect to majorization i.e. \(\mathbf{p} \preceq \mathbf{q} \implies f(\mathbf{p}) \leq f(\mathbf{q})\).

If \(\alpha_1 \geq \ldots \geq \alpha_M\) then \((p_1,\ldots,p_M) \mapsto \sum_{i=1}^M \alpha_i p_{(i)}\) is Schur-convex. [schur-combination]

This can be shown by an Abel transform as pointed out in (Béguinot, Cheng, Guilley, and Rioul 2022 Remark 2). \[\begin{aligned} \sum \alpha_i p_{(i)} &= \sum \alpha_i (P_{(i)} - P_{(i-1)}) \\ &= \alpha_M P_{(M)} - \alpha_1 P_{(0)} - \sum_{i=1}^{M-1} (\alpha_{i+1}-\alpha_i) P_{(i)} \\ &= \alpha_M - \sum_{i=1}^{M-1} (\alpha_{i+1}-\alpha_i) P_{(i)}. \end{aligned}\] Since \(\alpha_{i+1}-\alpha_i \leq 0\) the Schur-convexity follows from the definition.

Let \(\mathbf{p}\) be a probability mass function whose min-entropy is equal to \(-\log p\), and let \(k = \lfloor p^{-1} \!\rfloor\). Then \[(p,\frac{1\!-\!p}{M\!-\!1},\ldots,\frac{1\!-\!p}{M\!-\!1}) \preceq \mathbf{p} \preceq (p,\ldots,p,1\!-\!kp,0,\ldots,0).\] [majorization-lemma]

Let \((a_1,\ldots,a_n)\), \((b_1,\ldots,b_n) \in \mathbb{R}^{+n}\) be two sequences in descending order. Then for all permutations \(\sigma\) of \(\{1,\ldots,n\}\) it holds that \[\sum a_i b_{n+1-i} \leq \sum a_i b_{\sigma(i)} \leq \sum a_i b_i.\] [rearrangment-lemma]

See (Marshall, Olkin, and Arnold 1980) for a proof using majorization.
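Lemma [majorization-lemma] is also easy to test numerically. The sketch below (random pmfs over an alphabet of size 8, chosen arbitrarily) checks both majorization relations for the two extremal pmfs built from \(p=\max \mathbf{p}\) and \(k=\lfloor p^{-1}\rfloor\).

```python
import random

def ordered_cumsums(p):
    """Cumulative sums P_(1), ..., P_(M) of the statistically ordered pmf."""
    out, s = [], 0.0
    for x in sorted(p, reverse=True):
        s += x
        out.append(s)
    return out

def majorizes(q, p, tol=1e-12):
    """True if q majorizes p, i.e. P_(i) <= Q_(i) for every i."""
    return all(pi <= qi + tol for pi, qi in zip(ordered_cumsums(p), ordered_cumsums(q)))

random.seed(1)
M = 8
for _ in range(500):
    raw = [random.random() for _ in range(M)]
    p = [x / sum(raw) for x in raw]
    pmax = max(p)
    k = int(1 / pmax)
    concentrated = ([pmax] * k + [1 - k * pmax] + [0.0] * M)[:M]   # least spread out
    flat = [pmax] + [(1 - pmax) / (M - 1)] * (M - 1)               # most spread out
    assert majorizes(concentrated, p) and majorizes(p, flat)
print("Lemma [majorization-lemma] checked on 500 random pmfs")
```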

Proof of Lemma [lemma1]

Method 1:
By Theorem 6 of (Erven and Harremos 2014) we have \[\begin{aligned} \lim\limits_{\alpha \to \infty} \langle p_{X|YZ}\|p_{X|Z}\rangle_\alpha = \exp \big( D_{\infty} (P_{X|YZ}\|P_{X|Z}) \big) \\ = \max_{x:p_{X|Z}(x|z)>0} \frac{p_{X|YZ}}{p_{X|Z}}. \end{aligned}\] Because \(p_{Y|Z}\cdot p_{X|YZ}/p_{X|Z} =p_{Y|XZ}\), the proof is finished.
Method 2:

We use the \(L^{\infty}\)-norm to prove this lemma. \[\begin{aligned} p_{Y|Z} & \langle p_{X|YZ} \|p_{X|Z}\rangle_\alpha = p_{Y|Z} ~\bigl( \sum_{x \in \mathcal{X}} p^{\alpha}_{X|YZ} ~p^{1-\alpha}_{X|Z} \bigr) ^{\frac{1}{\alpha}} \notag \\ &= \bigl( \sum_{x \in \mathcal{X}} p^{\alpha}_{XY|Z}~ p^{1-\alpha}_{X|Z} \bigr) ^{\frac{1}{\alpha}} = \Bigl( \sum_{x \in \mathcal{X}} \bigl( p_{XY|Z}~ p^{\frac{1-\alpha}{\alpha}}_{X|Z} \bigr)^{\alpha} \Bigr) ^{\frac{1}{\alpha}} \notag \\ &= \Bigl( \sum_{x \in \mathcal{X}} \bigl( p_{Y|XZ}~ p^{\frac{1}{\alpha}}_{X|Z} \bigr)^{\alpha} \Bigr) ^{\frac{1}{\alpha}}. \label{expression} \end{aligned}\] For any \(\varepsilon >0\), there exists a sufficiently large \(\alpha >0\) such that \[\label{ineq} p_{Y|XZ} -\varepsilon \leq p_{Y|XZ}~ p^{\frac{1}{\alpha}}_{X|Z} \leq p_{Y|XZ}.\] Because \(\mathcal{X}\) is finite, one always has a sufficiently large \(\alpha >0\) such that [ineq] holds for any \(x \in \mathcal{X}\). By the \(L^{\infty}\)-norm limit we have \[\lim_{\alpha \to \infty} \Bigl(\! \sum_{x:p_{X|Z}(x|z)>0} \hspace{-1em} \bigl( p_{Y|XZ} -\varepsilon \bigr)^{\alpha} \Bigr) ^{\frac{1}{\alpha}} = \hspace{-1em} \max_{x:p_{X|Z}(x|z)>0} p_{Y|XZ} -\varepsilon.\] Since \(\varepsilon >0\) is arbitrary, the squeeze theorem finishes the proof.

Proof of Lemma [lemma2]

By definition we have \[\begin{aligned} & \lim\limits_{\alpha \to \infty} \log \mathbb{E}_{YZ} \langle p_{X|YZ}\|p_{X|Z}\rangle_\alpha \notag\\ &= \lim\limits_{\alpha \to \infty} \log \mathbb{E}_{YZ} \exp \bigl(\tfrac{\alpha-1}{\alpha} D_{\alpha} (P_{X|YZ}\|P_{X|Z})\bigr) . \end{aligned}\] This value is bounded because \(I_\alpha(X;Y|Z) \leq \log M\). Since \(\tfrac{\alpha-1}{\alpha}D_{\alpha}(P_{X|YZ}\|P_{X|Z})\) is increasing in \(\alpha\), the lemma follows from the monotone convergence theorem.

Proof of Theorem [thm:extended-gerber-infty]

We prove the inequality by induction. Theorem [thm:gerber-infty] settles the case of \(d+1=2\) variables. We assume it is true for all sets of at most \(d+1\) variables and show it is true for all sets of at most \(d+2\) variables. Let \(k,r_{d+1}\) be the values of \(k,r\) in the theorem associated with \(X_0,\ldots,X_{d},X_{d+1}\). If \(r_{d+1} < d+1\) we lower bound the min-entropy of the sum \(X_0 \oplus \ldots \oplus X_{d+1}\) by the entropy of \(X_0 \oplus \ldots \oplus X_{d}\). We conclude by applying the induction hypothesis to this sum of \(d+1\) random variables. Otherwise \(r_{d+1} = d+1\). Since \(X_0 \oplus \ldots \oplus X_d \oplus X_{d+1} = (X_0 \oplus \ldots \oplus X_{d}) \oplus X_{d+1}\), we apply the induction hypothesis \(\mathcal{H}_d\) to \(X_0,\ldots,X_d\), then we apply Theorem [thm:gerber-infty] to \(X_{d+1}\) and \(X_{0} \oplus \ldots \oplus X_d\). Let \(K = \exp(\!-\!H_\infty(X_0 \oplus\ldots\oplus X_{d+1}|Y_0\ldots Y_{d+1}))\). \[\begin{aligned} K &\overset{(a)}{\leq} 1 - k (p_{d+1} + \exp(\!-\!H_\infty(X_0\!\oplus\!\ldots\!\oplus\!X_d|Y_0\!\ldots\!Y_d))) \nonumber \\ &+ k(k+1) p_{d+1} \exp(\!-\!H_\infty(X_0\!\oplus\!\ldots\!\oplus\!X_d|Y_0\!\ldots\!Y_d)) \\ &\overset{(b)}{\leq} 1 - k( p_{d+1} + \frac{1}{k+1} + \frac{k^d}{k+1} \prod_{i=0}^d ((k+1)p_i-1)) \nonumber \\ &+ k(k+1)p_{d+1} (\frac{1}{k+1} + \frac{k^d}{k+1} \prod_{i=0}^d ((k+1)p_i-1)) \\ &= \frac{1}{k+1} + \frac{k^{d+1}}{k+1} \prod_{i=0}^d ((k+1)p_i-1) \, ((k+1)p_{d+1}-1) \\ &= \frac{1}{k+1} + \frac{k^{d+1}}{k+1} \prod_{i=0}^{d+1} ((k+1)p_i-1) \end{aligned}\] where \((a)\) holds by \(\mathcal{H}_1\) and \((b)\) holds by \(\mathcal{H}_d\). Being a repeated application of Theorem [thm:gerber-infty], the inequality naturally extends to the conditional case.

Proof of Theorem [thm:taylor1]

We upper bound \(I_d= \log M - H_\infty(X|\mathbf{Y})\) using the lower bound on the min-entropy. At high entropy, \(k=M-1\), hence \[\log M - I_d \geq -\log\biggl(\frac{1}{M} + \frac{(M-1)^d}{M} \prod_{i=0}^d (Mp_i-1)\biggr)\] where \[p_i = \frac{\exp(I_\infty(X_i;Y_i))}{M}.\]

\[\begin{aligned} I_d &\leq \log\biggl( 1 + (M-1)^d \prod_{i=0}^d \bigl(\exp(I_\infty(X_i;Y_i)) -1\bigl)\biggl) \\ &= (\!M\!-\!1\!)^d \! (\ln 2)^d \! \prod_{i=0}^d I_\infty(X_i;Y_i) \!+\! o\biggl( \prod_{i=0}^d I_\infty(\!X_i;Y_i\!) \!\biggl) \end{aligned}\]

Proof of Theorem [thm:refined-extension]

We first prove the following useful lemma. Intuitively, it states that, to minimize the min-entropy, the pmf should not spread out to other values. For instance, when the summed random variables are supported on a subgroup, their sum is confined to this subgroup.

If \(X_0\) and \(X_1\) have pmfs up to permutation \((q,\ldots,q,1-kq,0,\ldots,0)\) and \((p,\ldots,p,1-kp,0,\ldots,0)\) then the pmf of \(X_0 \oplus X_1\) is majorized by the pmf \((r,\ldots,r,1-kr,0,\ldots,0)\) where \(r=p+q-(k+1)pq\). There is equality when \(X_0,X_1\) are supported on the coset of a subgroup of \(\mathcal{G}\) of order \(k+1\). [convolution-lemma]

The convolution involves \((k+1)^2\) strictly positive terms. Namely \[\begin{cases} pq & k^2 \text{ times} \\ p(1-kq) & k \text{ times } \\ q(1-kp) & k \text{ times } \\ (1-kp)(1-kq) & \text{ once } \end{cases}.\] Further, each mass of the result is a sum of at most \(k+1\) of these terms, among which a term containing \((1-kp)\) or \((1-kq)\) appears at most once. Let us assume that \(q \geq 1-kq\) and \(p \geq 1-kp\), or \(1-kq \geq q\) and \(1-kp\geq p\). By the rearrangement inequality (Lemma [rearrangment-lemma]), \(kpq + (1-kp)(1-kq)\) is the largest mass that can be obtained. The \(2^{\text{nd}}\) to \((k\!+\!1)\)-th largest masses are \((k\!-\!1)pq + p(1\!-\!kq)+q(1\!-\!kp)=p\!+\!q\!-\!(k\!+\!1)pq\). This majorizes all possible results: since each term of the statistical ordering is maximized, the sequence of cumulative masses is also maximized. If \(q \geq 1-kq\) and \(1-kp \geq p\), or \(1-kq\geq q\) and \(p \geq 1-kp\), the proof is the same but the \(1^{\text{st}}\) to \(k\)-th largest masses are \((k\!-\!1)pq + p(1\!-\!kq)+q(1\!-\!kp)=p\!+\!q\!-\!(k\!+\!1)pq\) and the \((k\!+\!1)\)-th largest mass is \(kpq + (1-kp)(1-kq)\), which gives the same pmf up to a permutation. The case of equality is clear.
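The equality case of Lemma [convolution-lemma] can be observed directly by convolving two extremal pmfs supported on a subgroup. The sketch below uses the subgroup \(\{0,2,4\}\) of \(\mathbb{Z}_6\) (order \(k+1=3\), hence \(k=2\)) with arbitrary values of \(p\) and \(q\).

```python
def group_conv(p, q, M):
    """pmf of X0 + X1 (mod M) for independent X0 ~ p and X1 ~ q."""
    return [sum(p[(x - i) % M] * q[i] for i in range(M)) for x in range(M)]

# Extremal pmfs of the form (p, ..., p, 1-kp, 0, ..., 0) supported on the
# subgroup {0, 2, 4} of Z_6; the weights p and q are arbitrary.
k, M = 2, 6
p, q = 0.40, 0.45
vec_p = [p if x in (0, 2) else (1 - k * p if x == 4 else 0.0) for x in range(M)]
vec_q = [q if x in (0, 2) else (1 - k * q if x == 4 else 0.0) for x in range(M)]

r = p + q - (k + 1) * p * q
predicted = sorted([r, r, 1 - k * r] + [0.0] * (M - 3), reverse=True)
observed = sorted(group_conv(vec_p, vec_q, M), reverse=True)
print("observed :", [round(x, 4) for x in observed])
print("predicted:", [round(x, 4) for x in predicted])   # equality case: they match
```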

We derive the inequality of Thm. [thm:refined-extension]. The proof is composed of three steps. The first step is to prove that the extremum is achieved for pmfs of the form \(\mathbf{p_j} = (p_{j},\ldots,p_{j},1-k_j p_{j},0,\ldots,0)\). The second step is to majorize the resulting convolution by induction. The final step is to conclude the majorization argument. As in the case of two summands, the problem is to maximize \[\max_{x \in \mathcal{G}} \sum_{i_0,i_1,\ldots,i_{d-1} \in \mathcal{G}} \biggl(\prod_{j=0}^{d-1} \mathbb{P}(X_j = i_j)\biggr) \mathbb{P}(X_d=x\ominus\bigoplus_{j=0}^{d-1} i_j).\]

Without loss of generality, we can assume that the maximum is reached in \(x=0\), it remains to upper bound \[\phi(\mathbf{p_0},\ldots,\mathbf{p_d}) \triangleq \hspace{-2em} \sum_{i_0,i_1,\ldots,i_{d-1} \in \mathcal{G}} \biggl(\prod_{j=0}^{d-1} \mathbb{P}(X_j \!=\! i_j)\!\biggl) \mathbb{P}(X_d=\ominus\bigoplus_{j=0}^{d-1} i_j).\]

We fix \(\mathbf{p_1},\ldots,\mathbf{p_d}\). The maximization can be written as \[\sum_{i_0 \in \mathcal{G}} \mathbb{P}(X_0=i_0) \alpha_{i_0}\] where \[\alpha_{i_0} = \hspace{-0.6em}\sum_{i_1,\ldots,i_{d-1} \in \mathcal{G}} \biggl(\prod_{j=1}^{d-1} \mathbb{P}(X_j = i_j)\biggl) \mathbb{P}(X_d= \ominus\bigoplus_{j=0}^{d-1} i_j).\]

This is equivalent to maximizing \[\sum_{i=1}^M \mathbb{P}(X_0=(i)) \alpha_{(i)} \label{eqn:alpha-i-sum}\] where \((1),\ldots,(M)\) are such that \(\mathbb{P}(X_0=(1)) \geq \ldots \geq \mathbb{P}(X_0=(M))\). By rearrangement (Lemma [rearrangment-lemma]), [eqn:alpha-i-sum] is maximized when \(\alpha_{(1)} \geq \ldots \geq \alpha_{(M)}.\) By Lemma [schur-combination], this mapping is Schur-convex in \(\mathbf{p_0}\), hence by Lemma [majorization-lemma] it is maximized for a statistical ordering of the probability mass function of \(X_0\) of the form \[\mathbf{p_0} = (p_0,\ldots,p_0,1-k_0p_0,0,\ldots,0) \label{eqn:shape}\] where \(k_0 = \lfloor p_0^{-1} \!\rfloor\). Equation [eqn:shape] does not depend on the fixed probability mass functions of \(X_1,\ldots,X_d\). By symmetry, we also obtain that for \(j=0,\ldots,d\) the statistical ordering of the probability mass function of \(X_j\) is of the form \[\mathbf{p_j} = (p_j,\ldots,p_j,1-k_jp_j,0,\ldots,0)\] where \(k_j = \lfloor p_j^{-1} \!\rfloor\). This concludes the first step of the proof. As in the previous proof, we can further assume without loss of generality that \(k_j\) is constant, equal to \(k\), for all \(j\). It remains to determine for which permutation of these probability mass functions we obtain the lowest min-entropy.

Now we fix the pmfs \(\mathbf{p_2},\ldots,\mathbf{p_d}\) and consider the maximization with respect to the pmf of \(X_0\oplus X_1\). By Lemma [schur-combination], the expression is Schur-convex, hence it is maximized for the least spread out pmf. By Lemma [convolution-lemma], this pmf is majorized by \((r,\ldots,r,1-kr,0,\ldots,0)\) where \[r=p+q-(k+1)pq. \label{eqn:r-rec}\]

We can proceed by induction to majorize the sum of \(d+1\) random variables. Let \(\mathcal{H}_d\) be the induction hypothesis: The probability mass function of the sum of \(d+1\) random variables is majorized by \((r,\ldots,r,1-kr,0,\ldots,0)\) where \[(k+1)r = 1 + (-1)^d \prod_{i=0}^d ((k+1)p_i-1).\] The initialization \(\mathcal{H}_1\) is true from [eqn:r-rec]. We assume that \(\mathcal{H}_j\) holds and prove that \(\mathcal{H}_{j+1}\) holds. Using [eqn:r-rec] with \(\mathcal{H}_j\) we obtain that the convolution is majorized by \((r,\ldots,r,1-kr,0,\ldots,0)\) with \[\begin{aligned} r &= p_{j+1} + \frac{1}{k+1} + \frac{(-1)^j}{k+1} \prod_{i=0}^j ((k+1)p_i-1) \\ &- (k+1)p_{j+1} \left( \frac{1}{k+1} + \frac{(-1)^j}{k+1} \prod_{i=0}^j ((k+1)p_i-1) \right) \\ &= \frac{1}{k+1} + \frac{(-1)^j}{k+1} \prod_{i=0}^j ((k+1)p_i-1) (1 - (k+1)p_{j+1}). \end{aligned}\] This proves \(\mathcal{H}_{j+1}\), and we conclude by induction. This concludes the second step of the proof; it remains to conclude the majorization argument.

We proved that the probability mass function of the sum of \(d+1\) random variables is majorized by \((r,\ldots,r,1-kr,0,\ldots,0)\) where \[(k+1)r = 1 + (-1)^d \prod_{i=0}^d ((k+1)p_i-1). \label{eqn:r-refined}\]

This shows that \[\begin{aligned} \exp( \!-\! H_d ) &\leq \! \begin{cases} r & \text{ if $d$ is even} \\ 1 - k r & \text{ if $d$ is odd} \end{cases} \\ &\!= \! \begin{cases} \frac{1}{k+1} \!+\! \frac{1}{k+1} \prod_{i=0}^d ((k\!+\!1)p_i\!-\!1) & \hspace{-1em} \text{ ($d$ even) } \\ \frac{1}{k+1} \!+\! \frac{k}{k+1} \prod_{i=0}^d ((k\!+\!1)p_i\!-\!1) & \hspace{-1em} \text{ ($d$ odd) } \end{cases}. \end{aligned}\]

Proof of Proposition [prop:cosade-eq]

Using Fano’s inequality, de Chérisey et al. (de Chérisey et al. 2019 Eqn. 11) have shown that \[m I(X;\mathbf{Y}) \geq \log(M) - h(\mathbb{P}_s) - (1-\mathbb{P}_s)\log(M-1). \label{eq:eloi-linear}\] This can be made explicit by computing a second-order Taylor expansion of the binary entropy function at \(\mathbb{P}_s=\tfrac{1}{M}\), \[\begin{aligned} h(\mathbb{P}_s) &= h(\tfrac{1}{M}) + h'(\tfrac{1}{M}) (\mathbb{P}_s-\tfrac{1}{M}) \nonumber \\ &+\frac{h''(\tfrac{1}{M})}{2} (\mathbb{P}_s-\tfrac{1}{M})^2 + o( (\mathbb{P}_s-\tfrac{1}{M})^2) \\ &= \log(M) \!-\! (1\!-\!\frac{1}{M}) \log(M\!-\!1) \!+\! \log(M \!-\! 1) (\mathbb{P}_s \!-\! \tfrac{1}{M}) \nonumber \\ &- \frac{M^2 \log(e)}{2(M-1)} (\mathbb{P}_s-\tfrac{1}{M})^2 + o( (\mathbb{P}_s-\tfrac{1}{M})^2). \end{aligned}\] In particular, [eq:eloi-linear] reduces to \[m I(X;\mathbf{Y}) \geq \frac{M^2 \log(e)}{2(M-1)} (\mathbb{P}_s-\tfrac{1}{M})^2 + o( (\mathbb{P}_s-\tfrac{1}{M})^2), \label{eq:lin-lin}\] where we leveraged the following equalities \[h(\tfrac{1}{M}) = \log(M) - (1-\frac{1}{M}) \log(M-1),\] \[h'(\tfrac{1}{M})=\log(M-1) \text{ and } h''(\tfrac{1}{M})= \frac{- M^2 \log(e)}{M-1}.\] In particular, [eq:lin-lin] shows that \[\begin{aligned} \mathbb{P}_s &\leq \frac{1}{M} + \sqrt{ \frac{2 \ln 2(M-1) m }{M^2} I(X,\mathbf{Y}) } \\ &\approx \frac{1}{M} + \sqrt{m} A_d \sqrt{ \prod_{i=0}^d I(X_i,Y_i) } & \text{(with (Béguinot, Cheng, Guilley, Liu, et al. 2022 Eqn.~8))} \end{aligned}\] where \[A_d = \frac{\sqrt{ (M-1) (2 \ln 2)^{d+1}}}{M}.\]

Discussion on the Bound Optimality

To investigate the tightness of the bound, we compute and plot in Figs. [fig:z14], [fig:z13], and [fig:z5] the sequence \(\mathbf{p}_d\) of pmfs supported on a finite additive group \(\mathcal{G}\), defined by a fixed pmf \(\mathbf{p}_0\) and the recursion \(\mathbf{p}_{d+1} = \mathbf{p}_d * \mathbf{p}_0\), where \(*\) is the convolution with respect to the group \(\mathcal{G}\). In other words, \(\mathbf{p}_d\) is the pmf of the sum of \(d+1\) i.i.d. \(\mathcal{G}\)-valued random variables with law \(\mathbf{p}_0\).
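These sequences can be regenerated in a few lines. The sketch below (initial pmf supported on \(\{\bar 0,\bar 7\}\), with illustrative weights) iterates the group convolution in \(\mathbb{Z}_{14}\) and \(\mathbb{Z}_{13}\) and prints the min-entropy of \(\mathbf{p}_d\).

```python
import math

def group_conv(p, q, M):
    """pmf of the sum (mod M) of two independent G-valued random variables."""
    return [sum(p[(x - i) % M] * q[i] for i in range(M)) for x in range(M)]

for M in (14, 13):
    p0 = [0.0] * M
    p0[0], p0[7] = 0.6, 0.4   # initial pmf supported on {0, 7} (illustrative weights)
    pd, ents = p0[:], []
    for d in range(5):        # d = 0, ..., 4
        ents.append(-math.log2(max(pd)))
        pd = group_conv(pd, p0, M)
    print(f"Z_{M}: H_inf(sum of d+1 shares) for d = 0..4:",
          [round(h, 3) for h in ents])
```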

Figs. [fig:z14] and [fig:z5] show that the presented bound is tight in two situations:

  1. When the support of the random variables is in the coset of a sub-group of order \(k+1\) the inequality is tight. This is the case in Fig. [fig:z14] as \(\{\bar{0};\bar{7}\}\) is a finite sub-group of \(\mathbb{Z}_{14}\) with two elements.

  2. In the high entropic regime, \(k=M-1\) and there is always a sub-group, the group itself. This is the case in Fig. [fig:z5].

However, when there is no subgroup of order \(k+1\), the inequality can be strict (the bound is no longer attained), as shown by Fig. [fig:z13]. Figs. [fig:z13] and [fig:z14] differ only in their group structure, changed from \(\mathbb{Z}_{14}\) to \(\mathbb{Z}_{13}\), yet the effect on the actual entropy of the sum is huge. Indeed, \(\{\bar{0};\bar{7}\}\) is not a coset of a subgroup of \(\mathbb{Z}_{13}\); it even spans the whole group. As reported in (Madiman, Wang, and Woo 2021), the Cauchy-Davenport inequality shows that for \(A,B\) two subsets of \(\mathbb{Z}_p\) (\(p\) prime), \(|A+B| \geq \min \{ |A|+|B|-1, p \}\). As a consequence, the support of the sum must spread to the whole group very quickly. The investigation of this result may improve the presented inequalities.

In Fig. [fig:z5], we observe that the min-entropy does not visibly increase, either from \(d=0\) to \(d=1\) or from \(d=2\) to \(d=3\). This supports the observation drawn from Theorem [thm:refined-extension] that masking with odd order might not be relevant with respect to the worst-case leakages as measured by the min-entropy.

Bound Comparison

Figs. [fig:d1] and [fig:d2] compare both bounds for the toy example introduced above. Although the bound obtained with \(I_\infty\) does not change significantly from \(d=1\) to \(d=2\), the new bound performs better for this leakage, especially for a moderate number of traces (less than \(10^5\)). The main limitation of this bound is that it relies on a high-noise assumption.


References

Akkar, Mehdi-Laurent, and Christophe Giraud. 2001. “An Implementation of DES and AES, Secure Against Some Attacks.” In Cryptographic Hardware and Embedded Systems - CHES 2001, Third International Workshop, Paris, France, May 14-16, 2001, Proceedings, edited by Çetin Kaya Koç, David Naccache, and Christof Paar, 2162:309–18. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/3-540-44709-1\_26.
Béguinot, Julien, Wei Cheng, Sylvain Guilley, Yi Liu, Loı̈c Masure, Olivier Rioul, and François-Xavier Standaert. 2022. “Removing the Field Size Loss from Duc Et Al.’s Conjectured Bound for Masked Encodings.” IACR Cryptol. ePrint Arch., 1–18. https://eprint.iacr.org/2022/1738.
Béguinot, Julien, Wei Cheng, Sylvain Guilley, and Olivier Rioul. 2022. “Be My Guess: guessing Entropy vs. Success Rate for Evaluating Side-Channel Attacks of Secure Chips.” In 25th Euromicro Conference on Digital System Design, DSD 2022, Maspalomas, Spain, August 31 - Sept. 2, 2022, 496–503. IEEE. https://doi.org/10.1109/DSD57027.2022.00072.
Cheng, Wei, Yi Liu, Sylvain Guilley, and Olivier Rioul. 2022. “Attacking Masked Cryptographic Implementations: information-Theoretic Bounds.” In 2022 IEEE International Symposium on Information Theory (ISIT), 654–59. IEEE.
de Chérisey, Eloi, Sylvain Guilley, Olivier Rioul, and Pablo Piantanida. 2019. “Best Information Is Most Successful: mutual Information and Success Rate in Side-Channel Analysis.” IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019: 49–79. https://doi.org/10.13154/tches.v2019.i2.49-79.
Duc, Alexandre, Sebastian Faust, and François-Xavier Standaert. 2015. “Making Masking Security Proofs Concrete - or How to Evaluate the Security of Any Leaking Device.” In Advances in Cryptology - EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part I, edited by Elisabeth Oswald and Marc Fischlin, 9056:401–29. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-662-46800-5\_16.
Erven, Tim van, and Peter Harremos. 2014. “Rényi Divergence and Kullback-Leibler Divergence.” IEEE Transactions on Information Theory 60 (7): 3797–3820. https://doi.org/10.1109/TIT.2014.2320500.
Fehr, Serge, and Stefan Berens. 2014. “On the Conditional Rényi Entropy.” IEEE Transactions on Information Theory 60: 6801–10. https://doi.org/10.1109/TIT.2014.2357799.
Goubin, Louis, and Jacques Patarin. 1999. “DES and Differential Power Analysis (the ‘Duplication’ Method).” In Cryptographic Hardware and Embedded Systems, First International Workshop, CHES’99, Worcester, MA, USA, August 12-13, 1999, Proceedings, edited by Çetin Kaya Koç and Christof Paar, 1717:158–72. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/3-540-48059-5\_15.
Hirche, Christoph. 2020. “Rényi Bounds on Information Combining.” In 2020 IEEE International Symposium on Information Theory (ISIT), 2297–2302. https://doi.org/10.1109/ISIT44484.2020.9174256.
Issa, Ibrahim, Aaron B. Wagner, and Sudeep Kamath. 2020. “An Operational Approach to Information Leakage.” IEEE Transactions on Information Theory 66 (3): 1625–57. https://doi.org/10.1109/TIT.2019.2962804.
Ito, Akira, Rei Ueno, and Naofumi Homma. 2022. “On the Success Rate of Side-Channel Attacks on Masked Implementations: information-Theoretical Bounds and Their Practical Usage.” In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-11, 2022, edited by Heng Yin, Angelos Stavrou, Cas Cremers, and Elaine Shi, 1521–35. ACM. https://doi.org/10.1145/3548606.3560579.
Jog, Varun, and Venkat Anantharam. 2013. “The Entropy Power Inequality and Mrs. Gerber’s Lemma for Groups of Order \(2^n\).” 2013 IEEE International Symposium on Information Theory (ISIT), 594–98.
Liu, Yi, Wei Cheng, Sylvain Guilley, and Olivier Rioul. 2021. “On Conditional Alpha-Information and Its Application to Side-Channel Analysis.” In IEEE Information Theory Workshop, ITW 2021, Kanazawa, Japan, October 17-21, 2021, 1–6. IEEE. https://doi.org/10.1109/ITW48936.2021.9611409.
Madiman, Mokshay, Liyao Wang, and Jae Oh Woo. 2021. “Entropy Inequalities for Sums in Prime Cyclic Groups.” SIAM Journal on Discrete Mathematics 35 (3): 1628–49.
Marshall, Albert W., Ingram Olkin, and Barry C. Arnold. 1980. Inequalities: Theory of Majorization and Its Applications. Springer.
Masure, Loı̈c, Olivier Rioul, and François-Xavier Standaert. 2022. “A Nearly Tight Proof of Duc Et Al.’s Conjectured Security Bound for Masked Implementations.” IACR Cryptol. ePrint Arch., 600. https://eprint.iacr.org/2022/600.
Polyanskiy, Yury, and Sergio Verdú. 2010. “Arimoto Channel Coding Converse and Rényi Divergence.” In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 1327–33. https://doi.org/10.1109/ALLERTON.2010.5707067.
Rioul, Olivier. 2021. “A Primer on Alpha-Information Theory with Application to Leakage in Secrecy Systems.” In International Conference on Geometric Science of Information, 459–67. Springer.
Shamai, S., and A. D. Wyner. 1990. “A Binary Analog to the Entropy-Power Inequality.” IEEE Transactions on Information Theory 36 (6): 1428–30. https://doi.org/10.1109/18.59938.
Tao, Terence. 2009. “Sumset and Inverse Sumset Theory for Shannon Entropy.” Combinatorics, Probability and Computing 19: 603–39.
Verdú, Sergio. 2015. “\(\alpha\)-Mutual Information.” In IEEE Information Theory and Applications Workshop (ITA2015), 1–6. San Diego, USA. https://doi.org/10.1109/ITA.2015.7308959.
Wyner, Aaron D., and Jacob Ziv. 1973. “A Theorem on the Entropy of Certain Binary Sequences and Applications-I.” IEEE Transactions on Information Theory 19: 769–72.