Be My Guesses:
The Interplay Between Side-Channel Leakage Metrics

Authors

Julien Béguinot

Wei Cheng

Sylvain Guilley

Olivier Rioul

Abstract

In a theoretical context of side-channel attacks, optimal bounds between success rate, guessing entropy and statistical distance are derived with a simple majorization (Schur-concavity) argument. They are further theoretically refined for different versions of the classical Hamming weight leakage model, in particular assuming a priori equiprobable secret keys and additive white Gaussian measurement noise. Closed-form expressions and numerical computation are given. A study of the impact of the choice of the substitution box with respect to side-channel resistance reveals that its nonlinearity tends to homogenize the expressivity of success rate, guessing entropy and statistical distance. The intriguing approximate relation between guessing entropy and success rate \(GE=1/SR\) is observed in the case of \(8\)-bit bytes and low noise. The exact relation between guessing entropy, statistical distance and alphabet size \(GE = \tfrac{M+1}{2}-\tfrac{M}{2}SD\) for deterministic leakages and equiprobable keys is proved.

Side-channel analysis, Guessing entropy, Success rate, Statistical distance, Schur concavity

Introduction

Side-channel analysis (SCA) is a well-known threat to secure chips in embedded symmetric crypto-systems. Such attacks aim at recovering the key, byte by byte, in a divide-and-conquer approach, by exploiting the leakage information. The attacker guesses one key byte \(K\) from several side-channel observations \(Y\) (modeled as a random vector), knowing the corresponding plain or cipher text bytes \(T=t\) and leveraging a (noiseless or noisy) leakage model.

There are two main figures of merit to characterize the efficiency of secret recovery: the success rate SR and the guessing entropy GE. Roughly speaking, SR is the (empirical) probability that the best ranked (most likely) key happens to be the correct one, while GE relates to the number of tries that the attacker has to make before finding the actual secret, thereby estimating the brute-force effort to find the correct key by exhaustive search. On the one hand, GE is more informative insofar as it depends on the whole key ranking distribution for a given number of leakage traces. On the other hand, SR computation scales easily to the whole multibyte key (the global SR being the byte-wise product of SRs), while GE is much harder to estimate in a multibyte context.

In principle, it is desirable to evaluate both SR and GE during the attack because this gives a trade-off between the required number of observations (traces) and the remaining effort for key enumeration. Of course, there is a strong correlation between SR and GE: a lower GE will generally mean a higher SR and vice versa. This is true not only for a given attack on a given device as the number of traces increases, but also when comparing different attacks or different devices endowed with different countermeasures against SCA. In this respect, these metrics are relevant for the “black hat” attacker, the “white hat” evaluator, and the “blue hat” defender alike.

Another ubiquitous metric in the cryptographic community is the statistical distance (SD) to the uniform distribution. It quantifies how far a cryptographic object is from ideal randomness. While guessing entropy and success rate are explicitly related to the ranking distribution of an attacker, the statistical distance lacks a direct operational interpretation in terms of attack performance. Still, it can be used as a measure of information leakage. In particular, a small statistical distance ensures that no statistical test can distinguish the considered distribution from the uniform distribution. As a consequence, no attack performs better than a random guess.

However, there remains a missing theoretical link between SR, GE and SD that could be exploited to estimate one metric knowing the other. Obviously there is no one-to-one relation between them, but we show that one metric can be lower and upper bounded as a function of another, and that these bounds can be optimally determined for a given leakage model. This extended version complements the conference version (Beguinot et al. 2022) with results from Rioul (Rioul 2022a, 2023) applied to the statistical distance, which we discuss in the side-channel context.

State-of-the-art

Some previous approaches attempted to bridge the gap by extending the definition of SR to the probability SR\(_i\) that the correct key belongs to the list of the first \(i\) best key guesses (Standaert, Malkin, and Yung 2009). For instance, (Veyrat-Charvillon et al. 2012) compares various key enumeration algorithms that allow one to estimate SR\(_i\) based on the knowledge of the key bytes’ likelihoods. In (Prest et al. 2019) the authors attempt to link statistical distance, Euclidean norm, relative error and average relative error. They derive approximations for Hamming weight leakage models, a large number of bits and large noise.

While computing GE can be intractable in practice, (Veyrat-Charvillon, Gérard, and Standaert 2013) heuristically approximates GE by considering “security graphs” summarizing both SR and GE for a given number of traces in the same visual representation.

Chérisey et al. (Chérisey et al. 2019) evaluate side-channel attacks through SR with inequalities derived from mutual information. They also improve an inequality on GE, yet the relation between the two metrics is not investigated.

A very different approach in (Choudary and Popescu 2017) derives fairly tight mathematical bounds to estimate GE from the entropy or the Rényi entropy of order \(1/2\). From a purely theoretical viewpoint, (Sason and Verdú 2018) derives optimal bounds in very generic settings for the “guessing moments” with Rényi entropies of various orders. In this respect, considering the entropy of infinite order and the first-order guessing moment yields optimal bounds between SR and GE. In a similar approach, (Rioul 2022a, 2023) derives optimal inequalities for all randomness measures using majorization theory.

Contribution

In this paper, we first present simple and intuitive arguments to derive the optimal bounds between the three metrics SR, GE and SD. Such bounds are all the tighter as the key space is small. We then refine the relationship in various SCA scenarios and leakage models, providing closed-form expressions for GE in these scenarios. We observe that the bounds become tighter as the leakage model becomes more nonlinear (a property of the S-Box in a block cipher), which tends to explain why the expressivity of SR and GE becomes similar. This accounts for their interchangeable use as an attack working factor in the SCA literature.

Outline

The remainder of this paper is organized as follows. The notions of SR, GE and SD are introduced in Section 2, with emphasis on their similar properties such as data processing inequalities. Section 3 establishes the Schur-concavity of GE using majorization theory, which allows one to derive simple and intuitive bounds between GE and SR; SR and SD are further recalled to be Schur-convex, which permits obtaining all the optimal relations between the three metrics. Section 4 derives the optimal inequalities between the three metrics. The important cases of the Hamming weight leakage model, with an S-Box, and with noise, are mathematically developed in Section 5.

Section 6 concludes the paper.

Definitions and Basic Properties

In this section, we define success rate, guessing entropy and statistical distance with emphasis on their similar properties.

Basic Notations

We consider an \(M\)-ary secret \(K\in\{1,2,\ldots,M\}\) taking \(M=2^n\) values and some side-channel observation \(Y\) used to produce the key guess \(\hat{K}\). Observation \(Y\) gathers several measurements with known plain or cipher text bytes \(T=t\). Since \(\hat{K}\) depends on the actual secret key \(K\) only through \(Y\), the triple \(K-Y-\hat{K}\) forms a Markov chain. The guess \(\hat{K}\) is said to be blind if it does not depend on the observation \(Y\). For any finite set \(A\), \(|A|\) denotes its cardinality.

Success Rate

The success rate of \(\hat{K}\) denoted \(\mathbb{P}_s\) is the probability that \(\hat{K}\) guesses the secret, \[\mathbb{P}_s = \mathbb{P}(\hat{K}=K).\]

[thm:SR] The maximal success rate is attained with the MAP rule \(\hat{k}(y) \in \arg\max_k\; \mathbb{P}(K=k|Y=y)\) and is given by \[\mathbb{P}_s(K|Y) = \mathop{\mathrm{\mathbb{E}}}_Y\bigl( \max_k\; \mathbb{P}(K=k|Y)\bigr). \label{eq:PsK|Yformula}\] In particular, for a blind guess, we write \[\mathbb{P}_s(K) = \max_k \;\mathbb{P}(K=k) \geq \frac{1}{M}.\]

Since \(K-Y-\hat{K}\) is a Markov chain, \(\mathbb{P}(\hat{K}=\hat{k}|Y,K)=\mathbb{P}(\hat{K}=\hat{k}|Y)\) so that \[\begin{aligned} \mathbb{P}_s &= \mathop{\mathrm{\mathbb{E}}}_Y \bigl(\mathbb{P}(\hat{K}= K |Y) \bigr)\\ &=\mathop{\mathrm{\mathbb{E}}}_Y \bigl( {\sum\nolimits_{k}} \mathbb{P}(K=k|Y) \mathbb{P}(\hat{K}=k|Y) \bigr)\\ &\leq \mathop{\mathrm{\mathbb{E}}}_Y\bigl( \max_k\; \mathbb{P}(K=k|Y)\bigr) \end{aligned}\] with equality if and only if \(\mathbb{P}(\hat{K}=\hat{k}|Y)=1\) for some \(\hat{k}\in \arg\max_k \mathbb{P}(K=k|Y)\).

[thm:dpi-ps] One has \[\label{eq:PsK|Y} \mathbb{P}_s(K) \leq \mathbb{P}_s(K|Y)\] (observing side channel information always increases success). More generally, if \(K-Y-Z\) is a Markov chain, then \[\mathbb{P}_s(K|Z) \leq \mathbb{P}_s(K|Y)\] (data processing can only reduce success).

Since \(\mathbb{P}(K=k|Y)\leq \max_k\; \mathbb{P}(K=k|Y)\), taking the expectation over \(Y\) gives \(\mathop{\mathrm{\mathbb{E}}}_Y\mathbb{P}(K=k|Y) \leq \mathop{\mathrm{\mathbb{E}}}_Y \max_k\; \mathbb{P}(K=k|Y)\) for every \(k\), hence \[\max_k\; \mathop{\mathrm{\mathbb{E}}}_Y\mathbb{P}(K=k|Y) \leq \mathop{\mathrm{\mathbb{E}}}_Y \max_k\; \mathbb{P}(K=k|Y)\] which is [eq:PsK|Y]. This in turn implies \(\mathbb{P}_s(K|Z) \leq \mathbb{P}_s(K|Y,Z)\) by considering each fixed value \(Z=z\) and taking the expectation over \(Z\). Finally, \(\mathbb{P}_s(K|Y,Z)=\mathbb{P}_s(K|Y)\) because \(K|Y,Z\) is distributed as \(K|Y\) since \(K-Y-Z\) is a Markov chain.
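
To make the MAP formula [eq:PsK|Yformula] and the data processing inequality concrete, here is a minimal numerical sketch in Python; the joint distribution is an arbitrary toy example, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary toy joint distribution p(k, y) over M keys and L observation values.
M, L = 4, 3
joint = rng.random((M, L))
joint /= joint.sum()

p_K = joint.sum(axis=1)                      # marginal of K
p_Y = joint.sum(axis=0)                      # marginal of Y
post = joint / p_Y                           # columns are P(K = k | Y = y)

Ps_blind = p_K.max()                         # blind guess: max_k P(K = k)
Ps_cond = (p_Y * post.max(axis=0)).sum()     # E_Y[ max_k P(K = k | Y) ]

print(f"Ps(K)   = {Ps_blind:.4f}")
print(f"Ps(K|Y) = {Ps_cond:.4f}")
assert Ps_cond >= Ps_blind - 1e-12           # side information can only help
```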

Guessing Entropy

In a guessing problem, key candidates are guessed one by one in a sequence \((1),(2),\ldots,(M)\). Such a sequence is a permutation of \(\{1,2,\ldots,M\}\) where \((i)\) denotes the \(i\)th ranked key for \(i=1,2,\ldots,M\). Thus, first \((1)\) is guessed, then \((2)\), etc. The number of key guesses before the actual secret \(K=(I)\) is found is \(I\), a random variable which depends upon the observation \(Y\). Hence, \(K-Y-I\) forms a Markov chain.

The guessing entropy is the average number of guesses: \[G=\mathop{\mathrm{\mathbb{E}}}_{K,Y}(I)\]

Notice that some previous works define GE as \(I\) itself (Choudary and Popescu 2017; Zhang, Ding, and Fei 2020).

Let \(p_{(i)|y}=\mathbb{P}(K=(i)|Y=y)\) be the probability of the \(i\)th ranked key given observation \(Y=y\).

The minimal guessing entropy is attained with the ranking rule \[p_{(1)|y}\geq p_{(2)|y}\geq \cdots \geq p_{(M)|y}\] and is given by \[\label{eq:GK|Y} G(K|Y) = \mathop{\mathrm{\mathbb{E}}}_Y \biggl(\sum_{k=1}^M k\; p_{(k)|Y}\biggr).\] In particular, for a blind guess, this reduces to \(G(K) =\sum_{k=1}^M k p_{(k)}\), where the \(p_{(k)}=\mathbb{P}(K\!=\!(k))\) are in descending order.

Often \(G(K)\) is simply referred to as the guessing entropy of \(K\) while \(G(K|Y)\) is known as the conditional guessing entropy of \(K\) given \(Y\).

By the law of total expectation, \[G=\mathop{\mathrm{\mathbb{E}}}_Y\mathop{\mathrm{\mathbb{E}}}_{K}(I|Y) = \mathop{\mathrm{\mathbb{E}}}_Y \biggl(\sum_{i=1}^M i\cdot \mathbb{P}(K=(i)|Y)\biggr).\] By the rearrangement inequality (Hardy, Littlewood, and Pólya 1934 Thm. 368), since \((i)\) is an increasing sequence, the minimum \(G\) is obtained when the probabilities \(\mathbb{P}(K\!=\!(i)|Y)\) are in descending order.

[thm:dpi-ge] One has \[\label{eq:DPIG} G(K) \geq G(K|Y)\] (observing side channel information improves guessing).

More generally, if \(K-Y-Z\) is a Markov chain, then \[G(K|Z) \geq G(K|Y)\] (data processing can only worsen guessing).

Without loss of generality assume that \(K\)’s probability distribution is in descending order \(p_1\geq p_2\geq \cdots\geq p_M\) so that \(I=K\) and \(G(K)=\mathop{\mathrm{\mathbb{E}}}(K)\). Then by definition of minimum guessing, \(G(K|Y=y)\leq \mathop{\mathrm{\mathbb{E}}}(K|Y=y)\). Taking the expectation over \(Y\) gives \(G(K|Y)\leq \mathop{\mathrm{\mathbb{E}}}_Y\mathop{\mathrm{\mathbb{E}}}(K|Y)=\mathop{\mathrm{\mathbb{E}}}(K)=G(K)\) by the law of total expectation. This proves [eq:DPIG]. This in turn implies \(G(K|Z) \geq G(K|Y,Z)\) by considering each fixed value of \(Z=z\) and taking the expectation over \(Z\). Finally, \(G(K|Y,Z)=G(K|Y)\) because \(K|Y,Z\) is distributed as \(K|Y\) since \(K-Y-Z\) is a Markov chain.
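
The same toy setting as above illustrates the optimal ranking rule [eq:GK|Y] and the data processing inequality [eq:DPIG]; again the joint distribution is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

M, L = 4, 3
joint = rng.random((M, L))
joint /= joint.sum()

p_K = joint.sum(axis=1)
p_Y = joint.sum(axis=0)
post = joint / p_Y                                   # P(K = k | Y = y)

ranks = np.arange(1, M + 1)                          # guess indices 1..M

def guessing_entropy(p):
    """G = sum_k k * p_(k) with p_(k) sorted in descending order."""
    return ranks @ np.sort(p)[::-1]

G_blind = guessing_entropy(p_K)
G_cond = sum(p_Y[y] * guessing_entropy(post[:, y]) for y in range(L))

print(f"G(K)   = {G_blind:.4f}")
print(f"G(K|Y) = {G_cond:.4f}")
assert G_cond <= G_blind + 1e-12                     # observing Y improves guessing
```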

Statistical Distance to the Uniform

[def:disting] Let \(A\) be an event. The distinguishability of the random variable \(K\) from the uniform random variable \(U\) under event \(A\) is defined as \[\Delta_A(K) = |\mathbb{P}(K \in A) - \mathbb{P}(U \in A)|.\] If \(\Delta_A(K)\) is significantly large then \(K\) can be distinguished from the uniform distribution. In the following we consider the optimal distinguishability.

The optimal distinguishability corresponds to the statistical distance (SD) to the uniform random variable i.e. \[\begin{aligned} \label{eq:delta-opt} \Delta(K) = \max_A \Delta_A(K) &= \frac{1}{2} \sum_k |\mathbb{P}(K=k)- \frac{1}{M}| \\ \nonumber &= \sum_k \left(\mathbb{P}(K=k)- \frac{1}{M} \right)^+ \leq 1 \end{aligned}\] where \((x)^+ = \max(0,x)\) is the positive part function. In the conditional case we write the average optimal distinguishability \[\Delta(K|Y) = \mathop{\mathrm{\mathbb{E}}}_Y[\Delta(K|Y=y)].\]

The expression with the positive part is direct from Definition [def:disting]. Equality with [eq:delta-opt] is well known, we recall a simple proof for completeness. Let \(A^+ = \{ k | p(k) \geq \frac{1}{M} \}\). Then \(\sum_k \left(\mathbb{P}(K=k)- \frac{1}{M} \right)^+ = \mathbb{P}(K \in A^+) - \frac{|A^+|}{M} = (1 - \mathbb{P}(K \not\in A^+))-(1 - \frac{M-|A^+|}{M})= \frac{M-|A^+|}{M}-\mathbb{P}(K \not\in A^+)=\sum_k \left(\frac{1}{M} - \mathbb{P}(K=k)\right)^+\). This concludes the proof since \(x^+ + (-x)^+ = |x|\).
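
The equality between the two expressions in [eq:delta-opt] is easily checked numerically; a small sketch on an arbitrary distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

M = 8
p = rng.random(M)
p /= p.sum()                                   # arbitrary distribution over M keys
u = 1.0 / M

sd_l1 = 0.5 * np.abs(p - u).sum()              # half L1 distance to uniform
sd_pos = np.maximum(p - u, 0.0).sum()          # sum of positive parts

print(sd_l1, sd_pos)
assert np.isclose(sd_l1, sd_pos)
```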

Duc et al. (Duc, Faust, and Standaert 2015) use the statistical distance to the uniform, which we term distinguishability, as a metric to measure the security of implementations against side-channel analysis. The distinguishability is sometimes referred to as the total variation in the blind guess setting (no conditioning) and as the statistical distance when side-channel information is available (conditional version).

This notion is relevant in the cryptographic context. A small statistical distance means that the random variable is indistinguishable from the uniform random variable. As is clear from its definition, in any statistical test the probability of success minus the probability of success of a random guess is upper bounded by the distinguishability. Hence a small distinguishability implies a probability of success close to that of a random guess. It is related to the distinguishing advantage of an adversary in a cryptographic context.

[thm:dpi-sd] \[\Delta(K|Y) \geq \Delta(K)\] (observing side channel information increases distinguishability).

If \(K-Y-Z\) forms a Markov chain, then \[\Delta(K|Y) \geq \Delta(K|Z)\] (data processing can only decrease distinguishability).

The proof relies on the convexity of the absolute value combined with Jensen’s inequality. \[\begin{aligned} \Delta(K|Y) &= \mathop{\mathrm{\mathbb{E}}}_Y [ \frac{1}{2} \sum_k |p(k|Y) - \frac{1}{M}|] \\ &= \frac{1}{2} \sum_k \mathop{\mathrm{\mathbb{E}}}_Y [|p(k|Y) - \frac{1}{M}|] \\ &\geq \frac{1}{2} \sum_k |\mathop{\mathrm{\mathbb{E}}}_Y [p(k|Y) - \frac{1}{M}]| \\ &= \frac{1}{2} \sum_k |p(k)-\frac{1}{M}| \\ &= \Delta(K). \end{aligned}\] This in turn implies \(\Delta(K|Z) \leq \Delta(K|Y,Z)\) by considering each fixed value of \(Z=z\) and taking the expectation over \(Z\). Finally, \(\Delta(K|Y,Z)=\Delta(K|Y)\) because \(K|Y,Z\) is distributed as \(K|Y\) since \(K-Y-Z\) is a Markov chain.

More generally (Rioul 2022a) unified Thms. [thm:dpi-ps], [thm:dpi-ge], [thm:dpi-sd] by showing that all “randomness measures” verify a data processing inequality where being a “randomness measure” essentially means being a Schur-concave function.

Schur Properties

Key Concepts of Majorization Theory

We first introduce some notations for the theory of majorization (Marshall, Olkin, and Arnold 2011). Hereafter we let \(p_{(1)}, p_{(2)}, \ldots, p_{(M)}\) denote the vector \(p=(p_1,p_2,\ldots,p_M)\) of non-negative elements arranged in descending order \(p_{(1)}\geq p_{(2)}\geq \cdots\geq p_{(M)}\). We also use the cumulative sum notation \[P_{(k)}= p_{(1)}+p_{(2)}+\cdots+p_{(k)}\qquad (k=1,\ldots, M)\] with the convention \(P_{(0)}=0\).

We say that \(q\) majorizes \(p\), and we write \(p\preceq q\) if \[P_{(k)} \leq Q_{(k)} \qquad (k=1,\ldots, M-1)\] and \(P_{(M)}=Q_{(M)}\). (Notice that this latter condition is always satisfied when \(p\) and \(q\) are probability distributions since \(P_{(M)}=\sum_k p_k = 1\) and \(Q_{(M)}=\sum_k q_k = 1\).)

The intuition behind majorization is that \(p\preceq q\) means that \(p\) is more “spread out” than \(q\). Thus in the case of a probability distribution \(p\), the minimum spread is for a deterministic (Dirac) distribution and the maximum spread is for a uniform distribution. Indeed, it is easily checked that \[\label{eq:majminmax} (\tfrac{1}{M}, \tfrac{1}{M}, \ldots,\tfrac{1}{M}) \preceq p \preceq (1, 0, 0, \ldots, 0)\] for any probability distribution \(p=(p_1,p_2,\ldots,p_M)\). More generally (Marshall, Olkin, and Arnold 2011), \[(\tfrac{P_{(M)}}{M}, \tfrac{P_{(M)}}{M}, \ldots,\tfrac{P_{(M)}}{M}) \preceq p \preceq (P_{(M)}, 0, 0, \ldots, 0)\] for any vector \(p=(p_1,p_2,\ldots,p_M)\).
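
A minimal sketch of the majorization test from this definition (comparing sorted cumulative sums); the example vectors are arbitrary.

```python
import numpy as np

def majorizes(q, p, tol=1e-12):
    """Return True if p ⪯ q, i.e. q majorizes p (equal totals assumed)."""
    P = np.cumsum(np.sort(p)[::-1])            # partial sums of p in descending order
    Q = np.cumsum(np.sort(q)[::-1])
    return np.isclose(P[-1], Q[-1]) and np.all(P <= Q + tol)

M = 4
uniform = np.full(M, 1.0 / M)
dirac = np.eye(M)[0]
some_p = np.array([0.4, 0.3, 0.2, 0.1])

print(majorizes(some_p, uniform))   # True: uniform ⪯ p
print(majorizes(dirac, some_p))     # True: p ⪯ (1, 0, ..., 0)
```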

A function \(G\) is Schur-concave if \(p\preceq q\implies G(p)\geq G(q)\). Similarly, \(G\) is Schur-convex if \(p\preceq q\implies G(p)\leq G(q)\).

In other words, a Schur-concave function is large for “spread out” distributions and small for “condensed” distributions, while inversely for a Schur-convex function.

Guessing Entropy is Schur-Concave, Probability of Success and Distinguishability are Schur-Convex

It is well known that entropy (Marshall, Olkin, and Arnold 2011), and more generally the Rényi entropy of any order (Ho and Verdú 2015) (e.g., min-entropy, collision entropy, etc.) is Schur-concave. Perhaps lesser known is that guessing entropy is Schur-concave:

[thm:schur] One has

  • Guessing entropy \(G(K)= \sum_{k=1}^M k p_{(k)}\) is Schur-concave in \(p\).

  • Probability of Success \(\mathbb{P}_s(K)\) is Schur-convex in \(p\).

  • Distinguishability \(\Delta(K)\) is Schur-convex in \(p\).

Using summation by parts, \[\begin{aligned} \sum_{k=1}^M k p_{(k)} &= \sum_{k=1}^M k (P_{(k)} -P_{(k-1)})\\[-1ex] &=MP_{(M)} - P_{(0)} + \sum_{k=1}^{M-1} \bigl(k-(k+1)\bigr) P_{(k)}\\ &= M - P_{(1)} - P_{(2)} - \cdots - P_{(M-1)}. \end{aligned}\] The Schur-concavity of \(G(K)\) is now obvious from the definitions. The Schur-convexity of the probability of success is immediate from the definitions. Let \(p \preceq q\). Let \(t\) be the largest index such that \(p_{(t)} \geq \tfrac{1}{M}\). Then the distinguishability for \(p\) is \(P_{(t)}-\tfrac{t}{M}\). Moreover the distinguishability for \(q\) is at least \(Q_{(t)}-\tfrac{t}{M}\). The Schur-convexity follows from the definition as \(Q_{(t)}\geq P_{(t)}\).
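
As a numerical sanity check of Theorem [thm:schur], one may apply a “Robin Hood” transfer, which produces a distribution majorized by the original, and verify that GE does not decrease while SR and SD do not increase; the random distribution below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

M = 16
q = rng.random(M); q /= q.sum()

# Robin Hood (T-)transform: move some mass from a larger entry to a smaller one.
# The resulting p satisfies p ⪯ q by construction.
p = q.copy()
i, j = np.argmax(p), np.argmin(p)
eps = 0.5 * (p[i] - p[j]) * rng.random()
p[i] -= eps
p[j] += eps

ranks = np.arange(1, M + 1)
G = lambda r: ranks @ np.sort(r)[::-1]               # guessing entropy
Ps = lambda r: r.max()                               # success rate (blind MAP)
SD = lambda r: 0.5 * np.abs(r - 1.0 / M).sum()       # statistical distance

assert G(p) >= G(q) - 1e-12     # Schur-concave: larger on the more spread-out p
assert Ps(p) <= Ps(q) + 1e-12   # Schur-convex
assert SD(p) <= SD(q) + 1e-12   # Schur-convex
print(G(p), G(q), Ps(p), Ps(q), SD(p), SD(q))
```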

Recent works on guessing such as (Graczyk and Sason 2021) state the Schur-concavity of Rényi entropy but do not mention the same property for GE. During the review process we became aware that the Schur-concavity of GE was observed earlier by Khouzani and Malacaria (Khouzani and Malacaria 2019), among many other types of entropies. They established Schur-concavity by stating (without proof) that \(G(K)\) is symmetric and concave in the probability distribution of \(K\). While symmetry is obvious here, concavity of GE is precisely established by inequality [eq:DPIG] above. The Schur-convexity of \(\Delta\) is also shown in (Rioul 2022a, 2023) by stating that it is symmetric and convex.

The proof of this Theorem carries over verbatim for any function of the form \(\sum_{k=1}^M \alpha_k p_{(k)}\) where \((\alpha_k)\) is an increasing sequence. In particular for guessing moments (Arikan 1996):

\(G_\rho(K) = \sum_{k=1}^M k^\rho p_{(k)}\) is Schur-concave in \(p\).

These results are in line with the known inequalities between guessing entropy (or guessing moments) and entropy (or Rényi entropies) as established in (Arikan 1996; Rioul 2022b).

Since guessing entropy is Schur-concave, it follows from [eq:majminmax] that guessing entropy is minimized for the deterministic distribution and maximized for the uniform distribution, which gives the trivial bounds \(1\leq G(K) \leq \frac{M+1}{2}\).

Optimal Bound Derivation

In this section, we present the optimal bounds for SR, SD and GE explicitly. The results are recapped in Fig. [fig:tri-force], which depicts the overall connections among SR, SD and GE.

Optimal Bounds between GE and SR

[thm:optimalbounds] For a fixed success rate \(\mathbb{P}_s(K)\), the optimal lower and upper bound on guessing entropy \(G(K)\) are \[\bigl(1+\lfloor\tfrac{1}{\mathbb{P}_s(K)}\rfloor\bigr) \bigl(1-\tfrac{1}{2}\lfloor\tfrac{1}{\mathbb{P}_s(K)}\rfloor \mathbb{P}_s(K)\bigr) \leq G(K) \leq 1+\frac{M}{2}(1-\mathbb{P}_s(K)).\]

From Theorem [thm:schur], for a fixed \(p_{(1)}\), \(G(K)-\mathbb{P}_s(K) = \sum_{k=2}^M k p_{(k)}\) is Schur-concave in \((p_{(2)}, \ldots, p_{(M)})\). It follows that this quantity is maximum for the uniform distribution \((p_{(2)},\ldots,p_{(M)})=(\frac{1-\mathbb{P}_s}{M-1}, \frac{1-\mathbb{P}_s}{M-1}, \ldots, \frac{1-\mathbb{P}_s}{M-1})\) and minimum for the least spread out distribution \((p_{(2)},\ldots,p_{(M)})\) with \(p_{(k)}\leq \mathbb{P}_s\). It is easily seen that the latter (least spread out) distribution is of the form \((p_{(2)},\ldots,p_{(M)})=(\mathbb{P}_s,\ldots,\mathbb{P}_s, x, 0, \ldots, 0)\) where \(x<\mathbb{P}_s\) is such that \(\sum_{k=2}^M p_{(k)}=1\), that is, \(x=1- \lfloor1/\mathbb{P}_s\rfloor \mathbb{P}_s\). Plugging these values of \((p_{(1)},p_{(2)},\ldots,p_{(M)})\) into the expression of the guessing entropy gives the announced lower and upper bounds.
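
A quick numerical check of Theorem [thm:optimalbounds] on random distributions (the test parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 8

def ge_bounds_from_sr(ps, M):
    """Optimal lower/upper bounds on G(K) for a given success rate ps."""
    m = np.floor(1.0 / ps)
    lower = (1 + m) * (1 - 0.5 * m * ps)
    upper = 1 + 0.5 * M * (1 - ps)
    return lower, upper

for _ in range(10_000):
    p = rng.random(M); p /= p.sum()
    ps = p.max()
    ge = np.arange(1, M + 1) @ np.sort(p)[::-1]
    lo, hi = ge_bounds_from_sr(ps, M)
    assert lo - 1e-9 <= ge <= hi + 1e-9
print("bounds verified on 10000 random distributions")
```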

Fig. [fig:bounds:ps-ge] illustrates the corresponding optimal regions (in blue) between \(\mathbb{P}_s\) and \(G\) for \(M=2^n\) with \(n=2, 4, 8\), respectively.

If \(X\) is a geometric random variable with parameter \(p=\mathbb{P}_s\) defined over the positive integers \(\{1,2,\ldots\}\), then the guessing entropy of \(X\) is exactly the reciprocal \(1/p\) of the optimal probability of guessing \(X\) in one try. This suggests that if a random variable is well approximated by a geometric random variable, then the approximation that the guessing entropy is the reciprocal of the probability of success holds.

[thm:ps-ge] \[(1 + \bigl\lfloor \tfrac{1}{\mathbb{P}_s(K|Y)} \bigr\rfloor)(1 - \bigl\lfloor\tfrac{1}{\mathbb{P}_s(K|Y)}\bigr\rfloor \frac{\mathbb{P}_s(K|Y)}{2}) \leq G(K|Y) \leq 1 + \frac{M}{2} (1 - \mathbb{P}_s(K|Y)). \label{conditioned-bound}\]

Applying Theorem [thm:optimalbounds] to the random variable \(K|Y=y\) for every value \(y\) gives \((1 + \lfloor\tfrac{1}{\mathbb{P}_s(K|Y=y)}\rfloor)(1 - \lfloor\tfrac{1}{\mathbb{P}_s(K|Y=y)}\rfloor \tfrac{\mathbb{P}_s(K|Y=y)}{2}) \leq G(K|Y=y) \leq 1 + \frac{M_y}{2} (1 - \mathbb{P}_s(K|Y=y))\) where \(M_y\leq M\) is the number of possible keys given \(Y=y\). Taking the expectation over \(Y\) we obtain lower and upper bounds on \(G(K|Y)=\mathop{\mathrm{\mathbb{E}}}_y G(K|Y=y)\). By Theorem [thm:SR], \(\mathbb{P}_s(K|Y)=\mathop{\mathrm{\mathbb{E}}}_y \mathbb{P}_s(K|Y=y)\), we obtain the announced upper bound \(G(K|Y) \leq 1 + \frac{M}{2} (1 - \mathbb{P}_s(K|Y))\).

The lower bound, of the form \(\phi(p)=(1+\lfloor\frac{1}{p}\rfloor)(1-\lfloor\frac{1}{p}\rfloor \frac{p}{2})\), is piecewise linear and convex in \(p=\mathbb{P}_s\). Indeed, its value at \(p=\frac{1}{k}\) for positive integer \(k\) is \((1 + k)(1 - \frac{k}{2k} )=\frac{1+k}{2}\), hence its successive slopes between \(p=\frac{1}{k-1}\) and \(p=\frac{1}{k}\) are \(\frac{1/2}{\frac{1}{k}-\frac{1}{k-1}}=-\frac{k(k-1)}{2}\), which is increasing as \(p=\frac{1}{k}\) increases. Thus, by Jensen’s inequality, we have \(\mathop{\mathrm{\mathbb{E}}}_{y}[\phi(\mathbb{P}_s(K|Y=y))]\geq \phi(\mathop{\mathrm{\mathbb{E}}}_{y}[\mathbb{P}_s(K|Y=y)])=\phi(\mathbb{P}_s(K|Y))\), which gives the announced lower bound.
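
The conditional bounds [conditioned-bound] can be checked in the same way on random joint distributions; a short sketch with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
M, L = 8, 5
ranks = np.arange(1, M + 1)

def phi(ps):
    """Lower bound (1 + floor(1/ps)) (1 - floor(1/ps) ps / 2)."""
    m = np.floor(1.0 / ps)
    return (1 + m) * (1 - 0.5 * m * ps)

for _ in range(1000):
    joint = rng.random((M, L)); joint /= joint.sum()
    p_Y = joint.sum(axis=0)
    post = joint / p_Y
    ps = (p_Y * post.max(axis=0)).sum()                       # Ps(K|Y)
    ge = (p_Y * (ranks @ np.sort(post, axis=0)[::-1])).sum()  # G(K|Y)
    assert phi(ps) - 1e-9 <= ge <= 1 + 0.5 * M * (1 - ps) + 1e-9
print("conditional SR/GE bounds verified")
```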

At high noise when \(\frac{1}{M} \leq \mathbb{P}_s(K|Y) \leq \frac{1}{M-1}\) this simplifies to \[\tfrac{M+1}{2} - \tfrac{M(M-1)}{2} (\mathbb{P}_s(K|Y) - \tfrac{1}{M}) \leq G(K|Y) \leq \tfrac{M+1}{2} - \tfrac{M}{2} (\mathbb{P}_s(K|Y) - \tfrac{1}{M}).\] At low noise when \(\mathbb{P}_s(K|Y) = 1 - \epsilon\) where \(\epsilon \leq \frac{1}{M}\), this simplifies to \[1 + \epsilon \leq G(K|Y) \leq 1 + \tfrac{M}{2} \epsilon.\]

It is immediate from its proof that a refinement of the upper bound of Theorem [thm:ps-ge] is given by \[G(K|Y) \leq 1 + \frac{\max_y M_y}{2} (1 - \mathbb{P}_s(K|Y)). \label{eq:refB}\] This is particularly interesting for deterministic (noiseless) leakage since, as shown in the next Section, \(M_y\) decreases rapidly as the number of traces increases.

Optimal Bounds on GE for a Given SD

[thm:sd-ge-blind] Let \(U(K)=M(1-\Delta(K))\), then for a blind guess, \[1 + \lfloor U(K) \rfloor \frac{2 U(K) - \lfloor U(K) \rfloor -1}{2M} \leq G(K) \leq \frac{1+U(K)}{2}. \label{eqn:ge-sd-blind}\]

This is proved using majorization in (Rioul 2022a, 2023). It corresponds to optimal Pinsker and reverse Pinsker inequalities.
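
A quick numerical check of the blind-guess bounds [eqn:ge-sd-blind] on arbitrary random distributions:

```python
import numpy as np

rng = np.random.default_rng(4)
M = 8
ranks = np.arange(1, M + 1)

for _ in range(10_000):
    p = rng.random(M); p /= p.sum()
    sd = 0.5 * np.abs(p - 1.0 / M).sum()          # Delta(K)
    ge = ranks @ np.sort(p)[::-1]                 # G(K)
    U = M * (1 - sd)
    lower = 1 + np.floor(U) * (2 * U - np.floor(U) - 1) / (2 * M)
    upper = 0.5 * (1 + U)
    assert lower - 1e-9 <= ge <= upper + 1e-9
print("GE/SD bounds verified on 10000 random distributions")
```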

[thm:sd-ge] Let \(U(K|Y)=M(1-\Delta(K|Y))\), then with Side-Channel-Information \(Y\), \[1 + \lfloor U(K|Y) \rfloor \frac{2 U(K|Y) - \lfloor U(K|Y) \rfloor -1}{2M} \leq G(K|Y) \leq \frac{1+U(K|Y)}{2}. \label{eqn:ge-sd}\]

This is proved in (Rioul 2022a, 2023). The upper bound is linear and the lower bound is convex in \(U(K|Y)\), hence Jensen’s inequality can be applied.

We are often interested in the behavior of the metrics in the high noise scenario. Hence we make the bound explicit when \(\Delta(K|Y) < \frac{1}{M}\): \[\frac{M+1}{2} - (M-1)\Delta(K|Y) \leq G(K|Y) \leq \frac{M+1}{2} - \frac{M \Delta(K|Y)}{2}.\] As expected, we obtain a bound with a constant term \(\frac{M+1}{2}\) corresponding to a blind guess, from which a term linear in the statistical distance is subtracted. Another interesting case is the noiseless scenario where \(\Delta(K|Y) = 1 - \frac{1}{M} - \epsilon\) with \(\epsilon \leq \frac{1}{M}\): \[1 + \epsilon \leq G(K|Y) \leq 1 + \frac{M}{2} \epsilon.\] As expected, we obtain a constant term \(1\) corresponding to a perfect guess, to which a term linear in \(\epsilon\) is added.

Fig. [fig:bounds:sd-ge] illustrates the corresponding optimal regions (in blue) between \(\Delta\) and \(G\) for \(M=2^n\) with \(n=2, 4, 8\), respectively.

Optimal Bounds between SD and SR

Finally, we present the optimal Fano and reverse Fano inequalities between the probability of success and the statistical distance. Equivalently, we also use the optimal Pinsker and reverse Pinsker inequalities to obtain the optimal regions.

[thm:ps-sd-uncond] For a blind guess, \[\label{eq:Pinsker-1} \Delta(K) + \frac{1}{M} \geq \mathbb{P}_s(K) \geq \frac{1}{M} + \frac{\Delta(K)}{ \lfloor M(1-\Delta(K)) \rfloor}\] or equivalently \[\label{eq:Fano-1} \begin{split} \mathbb{P}_s(K) - \tfrac{1}{M} \leq\Delta(K) \leq \frac{1}{2} \left( \tfrac{M-1}{M} + (\mathbb{P}_s(K) - \tfrac{2}{M}) \lfloor \mathbb{P}_s(K)^{-1} \rfloor \right. \\ \left. \quad\qquad + |1 - \mathbb{P}_s(K) \lfloor \mathbb{P}_s(K)^{-1} \rfloor - \tfrac{1}{M} | \right). \end{split}\]

This is also an application of majorization theory, as shown in (Rioul 2022a, 2023). The application of optimal Pinsker and reverse Pinsker inequalities yields [eq:Pinsker-1], while the application of optimal Fano and reverse Fano inequalities yields [eq:Fano-1].

Notice that [eq:Pinsker-1] and [eq:Fano-1] are equivalent because they correspond to the same region, which is optimally characterized. This equivalence would have been difficult to derive directly.

In the high noise scenario where \(\frac{1}{M} \leq \mathbb{P}_s(K) \leq \frac{1}{M-1}\) it simplifies to \[\mathbb{P}_s(K) - \frac{1}{M} \leq \Delta(K) \leq (M-1) (\mathbb{P}_s(K) - \frac{1}{M}).\] In the noiseless scenario where \(\mathbb{P}_s(K) = 1 - \epsilon\) with \(\epsilon \leq \frac{1}{M}\) it boils down to \[\frac{M-1}{M} - \epsilon \leq \Delta(K) \leq \frac{M-1}{M} - \frac{2}{M} \epsilon.\]
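
The optimal Fano region [eq:Fano-1] can likewise be checked numerically on arbitrary random distributions:

```python
import numpy as np

rng = np.random.default_rng(5)
M = 8

for _ in range(10_000):
    p = rng.random(M); p /= p.sum()
    ps = p.max()                                   # Ps(K)
    sd = 0.5 * np.abs(p - 1.0 / M).sum()           # Delta(K)
    m = np.floor(1.0 / ps)
    upper = 0.5 * ((M - 1) / M + (ps - 2 / M) * m + abs(1 - ps * m - 1 / M))
    assert ps - 1 / M - 1e-9 <= sd <= upper + 1e-9
print("SD/SR Fano bounds verified on 10000 random distributions")
```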

[thm:ps-sd-cond] Let \(U(K|Y)=M(1-\Delta(K|Y))\), then with side-channel information \(Y\), \[\label{eq:cond-pinsker} \mathbb{P}_s(K|Y) \geq 1-\frac{U(K|Y)-1+\lfloor U(K|Y) \rfloor \lfloor U(K|Y) -1 \rfloor }{ \lfloor U(K|Y) \rfloor \lfloor U(K|Y) + 1 \rfloor}\] or equivalently, \[\begin{gathered} \label{eq:cond-rev-fano} \Delta(K|Y) \leq \tfrac{M-1}{M} - (\lfloor \mathbb{P}_s(K|Y)^{-1} + 1 \rfloor \mathbb{P}_s(K|Y) -1) \lfloor \mathbb{P}_s(K|Y)^{-1} \rfloor \tfrac{\lfloor \mathbb{P}_s(K|Y)^{-1} \rfloor\!-\!1}{M} \\- (1-\lfloor \mathbb{P}_s(K|Y)^{-1} \rfloor \mathbb{P}_s(K|Y)) \lfloor \mathbb{P}_s(K|Y)^{-1} + 1 \rfloor \tfrac{ \lfloor \mathbb{P}_s(K|Y)^{-1} \rfloor}{M}. \end{gathered}\]

The convex envelope of [eq:Pinsker-1] yields [eq:cond-pinsker]. The concave envelope of [eq:Fano-1] yields [eq:cond-rev-fano].

Once again we know by construction that [eq:cond-pinsker] and [eq:cond-rev-fano] are equivalent which is not obvious from the formulas.

Fig. [fig:bounds:ps-sd] illustrates the corresponding optimal regions (in blue) between \(\mathbb{P}_s\) and \(\Delta\) for \(M=2^n\) with \(n=2, 4, 8\), respectively.

Refined Bounds for Hamming Weight Leakage Model

Deterministic Leakage for One Observed Trace

A well-known leakage model of an embedded cryptographic device in a noiseless scenario is the Hamming weight model \[\label{eq:HWmodel} Y=w_H(K \oplus t)\] where \(w_H\) is the bitwise Hamming weight operator (Mangard, Oswald, and Popp 2006), \(\oplus\) denotes the XOR operation and \(T=t\) is a given value of the plain or cipher text. Let \(\mathcal{Y}=\{0,1,\ldots,n\}\) be the set of all values taken by \(Y\) and \(\mathcal{K}_y\) be the set of key values \(k\) for fixed \(Y=y\).

For the Hamming weight model, the region [conditioned-bound] reduces (improves) to the following values of SR and GE: \[\begin{aligned} G(K|Y) &\leq 1 + \frac{1}{2} \binom{n}{\lfloor\frac{n+1}{2}\rfloor} (1-\mathbb{P}_s(K|Y)) \label{eq:rouge}\\ \mathbb{P}_s(K|Y) &\geq \binom{n}{\lfloor\frac{n+1}{2}\rfloor}^{-1}. \label{eq:lowps} \end{aligned}\]

For observed \(Y=y\), \(M_y=|\mathcal{K}_y|\) is the number of \(n\)-bit vectors having Hamming weight \(y\), that is, \(M_y=\tbinom{n}{y}\) in the improved bound [eq:refB]. Since \(\max_y M_y= \tbinom{n}{\lfloor\frac{n+1}{2}\rfloor}\), this gives [eq:rouge].

Since \(K|Y=y\) has \(M_y\) possible values, \(\mathbb{P}_s(K|Y=y) =\max_k \mathbb{P}(K=k|Y=y) \geq \frac{1}{M_y}\geq {{1}/{\tbinom{n}{ \lfloor\frac{n+1}{2}\rfloor}}}\). Averaging over \(Y\) gives [eq:lowps]. Equality holds if and only if \(K\) is uniformly distributed over the largest class \(\mathcal{K}_y\).

Figure [fig:bounds:ps-ge] illustrates the improvement for \(n=2\), \(4\), and \(8\) bits, where the red curves correspond to the reduced upper bound [eq:rouge]. It can be observed that the case of equality in [eq:lowps] corresponds to the points where the upper bound [eq:rouge] (red curve) and the lower bound in [conditioned-bound] (blue curve) meet. In particular, for \(M=2^2\) our improved upper bound coincides with the lower bound. This proves that in this case SR and GE are in one-to-one correspondence under a Hamming weight leakage model.

Case of Equiprobable Keys




A usual assumption is that \(K\) is a priori uniformly distributed over \(M\) values. In this case the following exact formulas hold. Similar formulas for SR and GE can be found in (Cheng 2021).

[th:psy] \[\begin{split} \mathbb{P}_s(K|Y) &= \frac{|\mathcal{Y}|}{M} \\ G(K|Y) &= \frac{1}{2} + \frac{1}{2M} \sum \limits_{y \in \mathcal{Y}} M_y^2, \end{split} \label{eq:gen-formula}\] \[\Delta(K|Y) = 1 - \frac{1}{M^2} \sum_{y \in \mathcal{Y}} M_y^2. \nonumber\] More generally, these formulas hold when \(Y\) is any deterministic function of \(K\). It is interesting to remark that the statistical distance and the guessing entropy are in one-to-one relationship in this case. Indeed, \[\Delta(K|Y) = 1 - \frac{2 G(K|Y) - 1}{M}. \label{eq:SD-G-1to1}\] In the special case of the Hamming weight model [eq:HWmodel], this gives \[\label{eq:nonoise} \mathbb{P}_s(K|Y) = \frac{n+1} {2^n}, \quad G(K|Y) = \frac{1 + 2^{-n}\tbinom{2n}{n}}{2},\] \[\text{and } \Delta(K|Y) = 1 - 2^{-2n}\tbinom{2n}{n}.\] If the leakage consists of \(k\) bits of the key, this yields \[\mathbb{P}_s(K|Y) = 2^k / M, \quad G(K|Y) = \frac{1}{2} + \frac{M 2^{-k}}{2}, \quad\Delta(K|Y)=1-2^{-k}.\]

If \(K\) is equiprobable and \(Y=y\) is fixed (with probability \(\mathbb{P}(Y=y)=\frac{M_y}{M}\)), then \(K|Y=y\) is equiprobable over \(M_y=|\mathcal{K}_y|\) values so that \(\mathbb{P}_s({K}|Y=y)=\frac{1}{M_y}\). Taking the average over \(Y\) gives \(\mathbb{P}_s(K|Y)=\sum_y \frac{M_y}{M} \frac{1}{M_y}\), which yields the announced expression for SR. Similarly \(G(K|Y=y)= \frac{M_y+1}{2}\) for a uniform guess, and taking the average over \(Y\) gives \(G(K|Y)=\sum_y \frac{M_y}{M} \frac{M_y+1}{2}\), which yields the announced expression for GE. If \(Y=y\) is fixed then \(\Delta(K|Y=y) = \frac{1}{2} ( M_y (\frac{1}{M_y} - \frac{1}{M}) + (M-M_y) \frac{1}{M}) = 1 - \frac{M_y}{M}\). Taking the average over \(Y\) yields \(\sum_y \frac{M_y}{M} (1 - \frac{M_y}{M}) = 1 - \frac{1}{M^2} \sum_{y \in \mathcal{Y}} M_y^2\). The Hamming weight case follows from Vandermonde's identity \(\sum_{k=0}^n \binom{n}{k}^{\!2} = \binom{2n}{n}\).
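
The exact formulas of Theorem [th:psy] are easy to reproduce by direct enumeration of the key space; a minimal sketch for the Hamming weight model with \(n=8\), checking [eq:nonoise] and the one-to-one relation [eq:SD-G-1to1]:

```python
from math import comb, isclose

n = 8
M = 2 ** n
t = 0x00                                  # a known plain text byte (WLOG)

# Partition the key space according to the deterministic leakage y = w_H(k XOR t).
classes = {}
for k in range(M):
    classes.setdefault(bin(k ^ t).count("1"), []).append(k)
M_y = [len(ks) for ks in classes.values()]

Ps = len(classes) / M                                # |Y| / M
G = 0.5 + sum(m * m for m in M_y) / (2 * M)          # 1/2 + (1/2M) sum M_y^2
SD = 1 - sum(m * m for m in M_y) / M ** 2            # 1 - (1/M^2) sum M_y^2

assert isclose(Ps, (n + 1) / M)
assert isclose(G, (1 + comb(2 * n, n) / M) / 2)
assert isclose(SD, 1 - comb(2 * n, n) / M ** 2)
assert isclose(SD, 1 - (2 * G - 1) / M)              # relation [eq:SD-G-1to1]
print(Ps, G, SD)
```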

It is easily seen that we recover the well-known expressions \(\mathbb{P}_s(K)=\frac{1}{M}\), \(G(K)=\frac{M+1}{2}\), and \(\Delta(K)=0\) for a blind guess.

Deterministic Leakage for Multiple Observed Traces

Consider multiple observed traces (\(Q\) queries) \(Y=(Y_1,Y_2,\ldots,Y_Q)\), where \[Y_i=w_H(K \oplus t_i)\qquad (i=1,2,\ldots,Q)\] for fixed and distinct plain or cipher texts \(t_1,t_2,\ldots,t_Q\). In this case we are faced with a combinatorial problem since letting \(Y=y\) determines the intersection of \(Q\) Hamming spheres.

To simplify the analysis we consider \(Q=2\) and the computation of SR. Without loss of generality we can set \(t_1=0\) and consider variable \(t_2=t\).

Let \(w=w_H(t)\). Then \[\mathbb{P}_s(K|Y) = \frac{(w+1)(n-w+1)}{2^n}\]

In particular for \(8\)-bit bytes (\(n=8\)), one obtains: \[\mathbb{P}_s= \begin{cases} \frac{n+1}{2^n} &\text{for $w\in\{0, 8\}$} \\ \frac{2n}{2^n} & \text{for $w\in\{1, 7\}$} \\ \frac{3(n-1)}{2^n} & \text{for $w\in\{2, 6\}$} \\ \frac{4(n-2)}{2^n} & \text{for $w\in\{3, 5\}$} \\ \frac{(\frac{n}{2}+1)^2}{2^n} & \text{for $w=4$.} \end{cases}\]

We show that \(|\mathcal{Y}|=(w+1)(n-w+1)\) in [eq:gen-formula] as illustrated in Fig. [fig:latt] (a), where the set \(\mathcal{Y}\) of points \((y_1=w_H(k),y_2=w_H(k\oplus t))\) forms a (rotated) \((w+1)\times(n-w+1)\) rectangle. Indeed, let \(\bar{t}\) be the binary complement of \(t\) and write the decomposition \(w_H(k)=w_H(k\cdot t)+w_H(k\cdot \bar{t})\) where \(\cdot\) denotes the bitwise product. For fixed \(w_H(t)=w\), \(w_H(k\cdot t)\) can take \(w+1\) values and \(w_H(k\cdot\bar{t})\) takes \((n-w)+1\) independent values. Since \(w_H(k\oplus t)=w_H(k)+w_H(t)-2w_H(k\cdot t)=w-w_H(k\cdot t)+w_H(k\cdot\bar{t})\), we have \((y_1,y_2)=\bigl(w_H(k\cdot t)+w_H(k\cdot \bar{t}),\, w-w_H(k\cdot t)+w_H(k\cdot\bar{t})\bigr)\), from which the pair \((w_H(k\cdot t), w_H(k\cdot\bar{t}))\) can be recovered; hence \((y_1,y_2)\) takes all possible \((w+1)(n-w+1)\) values.

More generally, the set \(\mathcal{Y}\) can be determined by exhaustive enumeration of Hamming weights. We computed numerically the resulting SR and GE for \(Q=1\), 2, 3, and 4 traces. They are plotted as green dots in Fig. [fig:bounds:ps-ge] for different values of \(M\).
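
A minimal sketch of this exhaustive enumeration for \(Q\) traces, using the exact formulas of Theorem [th:psy]; the plaintext values below are arbitrary examples:

```python
from collections import Counter

n = 8
M = 2 ** n

def sr_ge(plaintexts):
    """Exact SR and GE (Theorem [th:psy]) for uniform keys and deterministic
    leakage y = (w_H(k ^ t_1), ..., w_H(k ^ t_Q))."""
    counts = Counter(
        tuple(bin(k ^ t).count("1") for t in plaintexts) for k in range(M)
    )
    sr = len(counts) / M                                    # |Y| / M
    ge = 0.5 + sum(m * m for m in counts.values()) / (2 * M)
    return sr, ge

print(sr_ge([0x00]))               # Q = 1: SR = (n+1)/2^n
print(sr_ge([0x00, 0x0F]))         # Q = 2, w_H(t_2) = 4: SR = 25/256
print(sr_ge([0x00, 0x0F, 0x35]))   # Q = 3 (arbitrary example texts)
```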

Role of the S-Box in the Hamming Weight Model

To prevent differential and linear cryptanalyses, block ciphers include non-linear operations. This non-linearity is provided by a substitution box (S-Box). We investigate different choices for the S-Box to observe its effect on SR and GE with respect to SCA resistance. We consider \[S_i(x)= a x^i \oplus b\in \mathbb{F}_{2^n}\] for exponents \(i\in\{1,7,19,101,254\}\) and constants \(a,b \in \mathbb{F}_{2^n}\).

As an illustration, Fig. [fig:latt] plots the various sets \(\mathcal{Y}\) of Hamming weight leakage values for \(S_1\) (linear), \(S_7\), and the AES standard \(S_{254}\) (highly nonlinear). We observe that the cardinality \(|\mathcal{Y}|\) increases as exponent \(i\) increases. This shows that SR (as given by Theorem [th:psy]) increases as nonlinearity increases. Fig. [fig:latt3] (the 3-D extension of Fig. [fig:latt]) also plots \(M_y\) as a function of \(y\in\mathcal{Y}\). Here we observe that \(M_y\) tends to globally decrease as exponent \(i\) increases, which shows that GE (as given by Theorem [th:psy]) decreases as nonlinearity increases.

Therefore, the non-linearity of the S-Box diminishes the side channel resistance. The geometrical explanation of this phenomenon is that the scatter plots of Fig. [fig:latt] and [fig:latt3] tend to spread out for nonlinear S-Boxes. This confirms the observation of (Heuser, Rioul, and Guilley 2014) on the effect of the S-Box on the confusion coefficient, which for monobit leakage relates to both SR and GE (Chérisey, Guilley, and Rioul 2018).
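
A sketch of this kind of computation for the power S-Boxes, assuming the two-trace leakage pair \((w_H(S_i(k)), w_H(S_i(k\oplus t)))\), the AES reduction polynomial, and \(a=1\), \(b=0\) for simplicity; these modeling choices are ours and need not match the exact setting of Fig. [fig:latt]:

```python
from collections import Counter

n = 8
M = 2 ** n

def gf_mul(a, b, poly=0x11B):
    """Carry-less multiplication in GF(2^8) with the AES reduction polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(x, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, x)
    return r

def metrics(exponent, t=0x0F):
    """SR and GE from Theorem [th:psy] for the assumed two-trace leakage
    (w_H(S(k)), w_H(S(k ^ t))) with S(x) = x^exponent (a = 1, b = 0)."""
    counts = Counter(
        (bin(gf_pow(k, exponent)).count("1"),
         bin(gf_pow(k ^ t, exponent)).count("1"))
        for k in range(M)
    )
    sr = len(counts) / M                                    # |Y| / M
    ge = 0.5 + sum(m * m for m in counts.values()) / (2 * M)
    return sr, ge

for e in (1, 7, 254):
    sr, ge = metrics(e)
    print(f"S_{e}: SR = {sr:.4f}, GE = {ge:.1f}")
```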

Hamming Weight Leakage Model With Gaussian Noise

In this section we derive the expressions of SR, GE and SD in a Hamming weight leakage scenario \[Y=w_H(K \oplus t)+N\] in the presence of additive white Gaussian noise (AWGN) \(N\sim\mathcal{N}(0,\sigma^2)\). Let \(f_Y\) and \(\phi_{\sigma}\) denote the p.d.f. of \(Y\) and \(N\), respectively. Thus \(\phi_{\sigma}(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2}} \exp(- \frac{x^2}{2 \sigma^2})\). Let \(Q\) denote the standard \(Q\)-function \(Q(x) = \int_{x}^\infty \frac{e^{-\frac{u^2}{2}}}{\sqrt{2 \pi}} du\).

[thm:noisy:hw] \[\begin{aligned} \mathbb{P}_s(K|Y) &= \frac{n+1}{M} - \frac{2n}{M} Q\bigl(\frac{1}{2 \sigma}\bigr), \label{eq:PsK|Ynoisy} \\ G(K|Y) &= \frac{1}{2} + \frac{\tbinom{2n}{n}}{2 M} + \frac{2\tbinom{2n}{n+1}}{M} Q\bigl(\frac{1}{2\sigma}\bigr) + \sum \limits_{i=2}^{2n} f_i(n) Q\bigl(\frac{i}{2\sigma}\bigr), \label{eq:GK|Ynoisy} \\ \Delta(K|Y) &= 1 - \frac{\binom{2n}{n}}{M^2} - 2\left(1+\tfrac{1-M+\binom{2n+1}{n+1}-2\binom{2n}{n}}{M^2} \right) Q(\frac{1}{2\sigma}) + O\left(Q(\frac{3}{2\sigma})\right) \label{eq:SD|Ynoisy}\end{aligned}\] where the latter sum is negligible at first order in \(\sigma\) and where the \(f_i\) are rational functions in \(n\) and \(M\).

For low noise one recovers [eq:nonoise]. The proof is deferred to Section 7.
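
A Monte Carlo cross-check of [eq:PsK|Ynoisy] (one trace, \(t=0\) without loss of generality; the noise level and trial count below are arbitrary):

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(6)

n, sigma, trials = 8, 0.3, 100_000
M = 2 ** n

weights = np.array([bin(k).count("1") for k in range(M)])
keys_by_weight = [np.flatnonzero(weights == w) for w in range(n + 1)]

# Monte Carlo MAP attack on one noisy Hamming weight trace (t = 0 WLOG).
k_true = rng.integers(0, M, size=trials)
y = weights[k_true] + sigma * rng.standard_normal(trials)
w_hat = np.clip(np.rint(y).astype(int), 0, n)       # nearest Hamming weight class
# All keys of that weight are equally likely a posteriori: break ties uniformly.
guesses = np.array([rng.choice(keys_by_weight[w]) for w in w_hat])
ps_mc = np.mean(guesses == k_true)

Q = lambda x: 0.5 * erfc(x / sqrt(2.0))             # Gaussian tail function
ps_th = (n + 1) / M - (2 * n / M) * Q(1 / (2 * sigma))
print(f"Monte Carlo Ps(K|Y) ~ {ps_mc:.5f}   formula [eq:PsK|Ynoisy]: {ps_th:.5f}")
```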

Validation by Simulation

We evaluated numerically the relation between SR and GE for different noise levels \(\sigma^2\) and different number of traces. The evaluation has been performed by \(10^3\) repetitions of maximum likelihood attacks on synthetically generated leakages.
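
A minimal sketch of such a maximum likelihood simulation (the noise level, trace counts and repetition count below are arbitrary and need not match the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(7)

n, M = 8, 256
weights = np.array([bin(k).count("1") for k in range(M)])
cand = np.arange(M)

def simulate(q_traces, sigma, repetitions=1000):
    """Empirical SR and GE of a maximum likelihood (template) attack on
    synthetic Hamming weight leakages with additive Gaussian noise."""
    succ = np.empty(repetitions)
    rank = np.empty(repetitions)
    for r in range(repetitions):
        k_true = int(rng.integers(0, M))
        texts = rng.integers(0, M, size=q_traces)
        leaks = weights[k_true ^ texts] + sigma * rng.standard_normal(q_traces)
        # Gaussian noise: the ML score of a candidate is minus its squared
        # distance to the hypothetical noiseless leakage vector.
        scores = -((leaks - weights[cand[:, None] ^ texts]) ** 2).sum(axis=1)
        better = np.sum(scores > scores[k_true])
        ties = np.sum(scores == scores[k_true]) - 1
        succ[r] = (better == 0) / (ties + 1)   # MAP with uniform tie-breaking
        rank[r] = 1 + better + ties / 2.0      # average rank within a tie class
    return succ.mean(), rank.mean()

for q in (1, 2, 5, 10):
    sr, ge = simulate(q, sigma=1.0)
    print(f"Q={q:2d}  SR={sr:.3f}  GE={ge:.1f}")
```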

Figures [fig:simul], [fig:simul-ps-sd] and [fig:simul-sd-ge] plot the resulting values for various noise levels and S-Boxes. We observe that for low noise the approximation \(G(K|Y) \approx \mathbb{P}_s(K|Y)^{-1}\) still holds (yellow curve). As the noise increases, GE increases for a given SR, and the latter approximation is no longer valid. The S-Box nonlinearity accentuates this effect because it decreases the minimum distance between points in \(\mathcal{Y}\) in Fig. [fig:latt] and, therefore, makes the maximum likelihood attack less robust to noise. As expected, for low noise the approximation GE\(\approx\tfrac{M+1}{2}-\tfrac{M}{2}\)SD also holds.

Validation on real traces from DPA Contest V4.2

Figure 1 plots the values of SR and GE computed on the first three folders of the DPA Contest V4.2 with a Hamming weight template attack with known mask. As expected from the simulations, the guessing entropy is lower bounded by \(SR^{-1}\).

Results on Traces from DPA Contest v4.2

Conclusion

In this paper, optimal bounds between success rate, guessing entropy and distinguishability are derived with a simple majorization argument, and further improved for the Hamming weight leakage model, in particular under the classical assumptions of a priori equiprobable secret keys and additive white Gaussian measurement noise. Closed-form expressions and numerical computations are given for various leakage scenarios. A study of the impact of the choice of the S-Box with respect to SCA resistance confirms that the nonlinearity of the S-Box tends to tighten the bounds between SR and GE. We established that distinguishability and guessing entropy are in one-to-one relationship for uniform keys and deterministic leakage models. In particular, for low noise we have the approximation \(GE \approx \tfrac{M+1}{2} - \tfrac{M}{2} SD\). The approximate relation \(GE=1/SR\) holds in the case of \(8\)-bit bytes and low noise. This in turn implies that for \(8\)-bit bytes and low noise, \(1/SR \approx \tfrac{M+1}{2} - \tfrac{M}{2} SD\). We observed that for the probability of success and distinguishability, the optimal reverse Pinsker inequality corresponds to the optimal Fano inequality and the optimal reverse Fano inequality corresponds to the optimal Pinsker inequality.

As a perspective, we notice that our methodology can easily be generalized to the definition of the \(i\)th order success rate SR\(_i\) (Standaert, Malkin, and Yung 2009) vs. GE. However, as pointed out in (Zhang, Ding, and Fei 2020), such theoretical work assumes perfect knowledge of the distribution of \(K\) given observation \(Y\). This generally underestimates the practical GE for a non-optimal attack, because such a practical attack generally produces a suboptimal key ranking. Thus the results of this paper should yield adequate estimates only for optimal template attacks. The determination of more precise SR vs. GE regions for other types of attacks is a topic for future investigation.

Proof of Thm. [thm:noisy:hw]

\(\pi_i(y)\) for \(i=0,1,2,3\). We can observe that the \(\pi_i\) are step functions: they are constant on intervals of the form \([\frac{p}{2},\frac{p+1}{2})\) for every integer \(p\).

[fig:pi]

For \(j=0,\ldots,n\) let \(\pi_j(y)\) denote the \((j+1)\)-th closest point to \(y\) in \(\mathcal{Y}\). In particular, \(\pi_0(y)\) is the closest point to \(y\) in \(\mathcal{Y}\). It can be checked with the help of Fig. 2 that \[ \pi_0(y)= \begin{cases} 0 &\text{for $y \leq -\frac{1}{2}$} \\ i & \text{for $y \in [i-\frac{1}{2},i+\frac{1}{2})$} \\ n & \text{for $y \geq n + \frac{1}{2}$.} \end{cases}\] From [eq:PsK|Yformula], one has \[\begin{aligned} \mathbb{P}_s &= \frac{1}{M} \int \phi_{\sigma}(y - \pi_0(y)) \,\mathrm{d}y \\ &= \frac{1}{M} \biggl( 2 Q(\frac{1}{2\sigma}) + \sum_{i=0}^n \int \limits_{i-\frac{1}{2}}^{i+\frac{1}{2}} \phi_{\sigma}(y-i)\biggr) \\ &= \frac{1}{M} \biggl( 2 Q(\frac{1}{2\sigma}) + (n+1) \int_{-\frac{1}{2}}^{\frac{1}{2}} \phi_{\sigma}(y)\biggr) \\ &= \frac{1}{M} ( 2 Q(\frac{1}{2\sigma}) + (n+1) (1 -2 Q(\frac{1}{2\sigma}))) \end{aligned}\] which after simplification proves [eq:PsK|Ynoisy].

Now from [eq:GK|Y], one has \[G(K|Y) = \int f_Y(y) \sum \limits_{k=1}^{M} k\, p_{(k)|y} \,\mathrm{d}y\] Since the noise is Gaussian, the \(p_{(k)|y}\) are sorted by Euclidean distance. Applying Bayes’ rule we obtain \[p_{(k)|y} = \phi_{\sigma}(y - \pi_j(y)) \frac{{1}/{M}}{f_Y(y)}, ~~ k=S_{j-1}(y)+1,\ldots,S_j(y).\] where \(S_j(y)=\sum_{i=0}^{j} \tbinom{n}{\pi_i(y)}\) for \(j=0,\ldots,n\) with the convention \(S_{-1}(y)=0\). Therefore, \[\begin{aligned} G(K|Y) &= \!\int \!f_Y(y) \sum \limits_{j=0}^{n} \sum_{k = S_{j-1}(y)+1}^{S_{j}(y)} \!\!\!\!\!\!\!\! k\, \phi_{\sigma}(y \!-\! \pi_j(y)) \frac{{1/M}}{f_Y(y)} \,\mathrm{d}y \\ &= \frac{1}{M} \sum \limits_{j=0}^{n} \int \sum \limits_{k = S_{j-1}(y)+1}^{S_{j}(y)} k \phi_{\sigma}(y - \pi_j(y)) \,\mathrm{d}y \\ &= \frac{1}{M} \sum \limits_{j=0}^{n} \int C_j(y) \phi_{\sigma}(y - \pi_j(y)) \,\mathrm{d}y \end{aligned}\] where \[\begin{aligned} C_j(y) &= \frac{S_j(y)(S_j(y)+1) - S_{j-1}(y)(S_{j-1}(y)+1)}{2} \\ &= \tfrac{1}{2} \tbinom{n}{\pi_j(y)} \bigl(2 S_{j-1}(y) + \tbinom{n}{\pi_j(y)} + 1\bigr). \end{aligned}\]

The \(j=0\) term can be written as \[\begin{aligned} &\int \frac{ S_0(y)(1 + S_0(y))}{2} \phi_{\sigma}(y - \pi_0(y)) \,\mathrm{d}y \\ &= 2 \int \limits_{\frac{1}{2}}^{\infty} \phi_{\sigma}(y)\,\mathrm{d}y + \sum \limits_{i=0}^{n} \frac{\tbinom{n}{i} (1 + \tbinom{n}{i})}{2} \int \limits_{i-\frac{1}{2}}^{i+\frac{1}{2}} \phi_{\sigma}(y-i)\,\mathrm{d}y \\ &= 2Q(\frac{1}{2\sigma}) + \sum \limits_{i=0}^{n} \frac{\tbinom{n}{i}(1 + \tbinom{n}{i})}{2} (1 -2Q(\frac{1}{2\sigma})) \\ &= \frac{M}{2} + \frac{1}{2} \binom{2n}{n} - Q(\frac{1}{2\sigma}) (M + \binom{2n}{n} - 2). \end{aligned}\] We now compute the \(j=1\) term. It can be checked with the help of Fig. 2 that \[\pi_1(y)= \begin{cases} 1 &\text{for $y \leq -\frac{1}{2}$} \\ i-1 & \text{for $y \in [i-\frac{1}{2},i)$} \\ i+1 & \text{for $y \in [i,i+\frac{1}{2})$} \\ n-1 & \text{for $y \geq n + \frac{1}{2}$.} \end{cases}\] In the \(j=1\) term, the contributions of the two tail integrals, over \((-\infty,-\frac{1}{2}]\) and \([n+\frac{1}{2},\infty)\), each yield a term of value \(\frac{n(n+3)}{2}Q(\frac{3}{2\sigma})\). The contribution of the integral over \([i-\frac{1}{2},i)\) yields \[\frac{1}{2} \tbinom{n}{i-1} [2 \tbinom{n}{i} + \tbinom{n}{i-1} + 1](Q(\frac{1}{2\sigma}) -Q(\frac{1}{\sigma}))\] and that over \([i,i+\frac{1}{2})\) yields \[\frac{1}{2} \tbinom{n}{i+1} [2 \tbinom{n}{i} + \tbinom{n}{i+1} + 1](Q(\frac{1}{2\sigma}) -Q(\frac{1}{\sigma})).\] Summing these contributions yields, after some calculation, \[n(n+3) Q(\frac{3}{2\sigma}) + [M-2+2\tbinom{2n+1}{n+1}-\tbinom{2n}{n}](Q(\frac{1}{2\sigma})-Q(\frac{1}{\sigma})).\] Here we have used the following Vandermonde identities: \[\begin{aligned} \sum \limits_{i=0}^{n} \tbinom{n}{i+1}^2 &= \sum \limits_{i=0}^{n} \tbinom{n}{i-1}^2 = \tbinom{2n}{n} - 1 \\ \sum \limits_{i=0}^{n} \tbinom{n}{i} \tbinom{n}{i-1} &= \sum \limits_{i=0}^{n} \tbinom{n}{i} \tbinom{n}{i+1} \\ &= \sum \limits_{i=0}^{n} \tbinom{n}{i} [\tbinom{n+1}{i} - \tbinom{n}{i}]\\ &=\tbinom{2n + 1}{n+1} - \tbinom{2n}{n}. \end{aligned}\] Summing the \(j=0\) and \(j=1\) terms simplifies to the first three terms in [eq:GK|Ynoisy].

One can go further and compute the terms corresponding to \(j=2,3,\ldots,n\). It is easily seen from the above derivation that splitting the integral with the Chasles relation over the intervals on which the \(\pi_j\) are constant yields a weighted sum of terms \(Q(\frac{i}{2\sigma})\), as shown in [eq:GK|Ynoisy].

We finally prove the expression for the statistical distance. \[\begin{aligned} \Delta(K|Y) &= \int_\mathbb{R}f_Y(y) \Delta(K|Y=y) dy \\ &= \int_\mathbb{R}f_Y(y) \sum_{k} \left(p(k|y)-1/M \right)^+ dy \\ &= \int_\mathbb{R}\sum_{k} \left( f_Y(y) p(k|y)- f_Y(y)/M \right)^+ dy \\ &= \frac{1}{M} \int_\mathbb{R}\sum_{k} \left( \phi_\sigma(y - w_H(k)) - f_Y(y) \right)^+ dy \\ &= \frac{1}{M^2} \sum_{w=0}^n \binom{n}{w} \int_\mathbb{R}\biggl( \sum_{j=0}^n \binom{n}{j} (\phi_\sigma(y-w) - \phi_\sigma(y-j)) \biggr)^+ dy \\ &= \frac{1}{M^2} \sum_{w=0}^n \binom{n}{w} \int_\mathbb{R}\biggl( \sum_{j=0}^n \binom{n}{j} (\phi_\sigma(y) - \phi_\sigma(y-(j-w))) \biggr)^+ dy . \end{aligned}\] At this point it is hard to determine when the integrand is positive in order to simplify the positive part. Hence we split the integral over multiple slices. Let \[I_{a,b}(w) = \int_a^b \biggl( \sum_{j=0}^n \binom{n}{j} \left(\phi_\sigma(y) - \phi_\sigma(y-(j-w))\right) \biggr)^+ dy\] where we omit the subscripts when \(a=-\infty\) and \(b=\infty\). Then \[\Delta(K|Y) = \frac{1}{M^2} \sum_{w=0}^n \binom{n}{w} I(w).\] If \(w=0\) then for \(\sigma\) small enough the integrand is positive on \((-\infty,\frac{1}{2}]\) and negative elsewhere. Using an Abel summation technique, it follows that (writing \(Q_\sigma(x)=Q(x/\sigma)\) for brevity) \[\begin{aligned} I(0) &= I_{-\infty,\frac{1}{2}}(0) \\ &= (M-1) (1-Q_\sigma(\frac{1}{2})) - n Q_\sigma(\frac{1}{2}) - \sum_{j=0}^{n-2} \binom{n}{j+2} Q_\sigma(\frac{3}{2}+j) \\ &= (M-1) (1-Q_\sigma(\frac{1}{2})) - n Q_\sigma(\frac{1}{2}) + O(Q(\frac{3}{2\sigma})). \end{aligned}\]

If \(w=n\), then for \(\sigma\) small enough the integrand is positive on \([-\frac{1}{2},\infty)\) and negative elsewhere; it follows by symmetry that \(I(n)=I(0)\). Otherwise \(1 \leq w \leq n-1\), and for \(\sigma\) small enough the integrand is positive on \([-\frac{1}{2},\frac{1}{2}]\) and negative elsewhere. It follows that \[\begin{aligned} &I(w) = I_{-\frac{1}{2},\frac{1}{2}}(w) \\ &= M \left(1-2Q_\sigma\left(\frac{1}{2}\right)\right) - \sum_{j=0}^n \binom{n}{j} \left( Q_\sigma\left(w-\frac{1}{2}-j\right)-Q_\sigma\left(\frac{1}{2}+w-j\right) \right) \\ &= M \left(1-2Q_\sigma\left(\frac{1}{2}\right)\right) - \sum_{j=0}^n \binom{n}{j} Q_\sigma\left(w-\frac{1}{2}-j\right) + \sum_{j=0}^n \binom{n}{j} Q_\sigma\left(\frac{1}{2}+w-j\right) \\ &= M \left(1-2Q_\sigma\left(\frac{1}{2}\right)\right) - \sum_{j=0}^n \binom{n}{j} Q_\sigma\left(w-\frac{1}{2}-j\right) + \sum_{j=-1}^{n-1} \binom{n}{j+1} Q_\sigma\left(w-\frac{1}{2}-j\right) \\ &= M \left(1-2Q_\sigma\left(\frac{1}{2}\right)\right) + Q_\sigma(\frac{1}{2}+w)-Q_\sigma(w-\frac{1}{2}-n) \nonumber \\ &\qquad - \sum_{j=0}^{n-1} \left(\tbinom{n}{j}-\tbinom{n}{j+1} \right) Q_\sigma\left(w-j-\frac{1}{2}\right) \\ &= M \left(1-2Q_\sigma\left(\frac{1}{2}\right)\right) - 1 - \sum_{j=0}^{n-1} \left(\tbinom{n}{j}-\tbinom{n}{j+1} \right) Q_\sigma\left(w-j-\frac{1}{2}\right) +O(Q(\frac{3}{2\sigma})) \\ &= M \left(1-2Q_\sigma\left(\frac{1}{2}\right)\right) - 1 - \sum_{j=w-1}^{n-1} \left(\tbinom{n}{j}-\tbinom{n}{j+1} \right) Q_\sigma\left(w-j-\frac{1}{2}\right) +O(Q(\frac{3}{2\sigma})) . \end{aligned}\] It only remains to sum the integrals over \(w\), up to an additive \(O(Q(\frac{3}{2\sigma}))\) term: \[\begin{split} \sum_{w=1}^{n-1} \binom{n}{w} I(w) =~& (M-2) \left( M \left(1-2Q_\sigma\left(\frac{1}{2}\right)\right) - 1 \right) \\ &- \sum_{w=1}^{n-1} \binom{n}{w} \sum_{j=w-1}^{n-1} \left(\tbinom{n}{j}-\tbinom{n}{j+1} \right) Q_\sigma\left(w-j-\frac{1}{2}\right). \end{split}\]

We compute the sum \(S = \sum_{w=1}^{n-1} \binom{n}{w} \sum_{j=w-1}^{n-1} \left(\tbinom{n}{j}-\tbinom{n}{j+1} \right) Q_\sigma\left(w-j-\frac{1}{2}\right)\) to simplify the expression. \[\begin{aligned} &S = \sum_{w=1}^{n-1} \binom{n}{w} \left(\binom{n}{w-1}-\binom{n}{w} \right) Q_\sigma\left(\frac{1}{2}\right) + \sum_{w=1}^{n-1} \binom{n}{w} \left(\binom{n}{w}-\binom{n}{w+1} \right) (1-Q_\sigma\left(\frac{1}{2}\right)) \nonumber \\ &\qquad + \sum_{w=1}^{n-1} \binom{n}{w} \sum_{j=w+1}^{n-1} \left(\binom{n}{j}-\binom{n}{j+1} \right) (1-Q_\sigma\left(j+\frac{1}{2}-w\right)) \\ &= \sum_{w=1}^{n-1} \binom{n}{w} \left(\binom{n}{w-1}-\binom{n}{w} \right) Q_\sigma\left(\frac{1}{2}\right) \nonumber \\ &\quad + \sum_{w=1}^{n-1} \binom{n}{w} \left(\binom{n}{w}-\binom{n}{w+1} \right) (1-Q_\sigma\left(\frac{1}{2}\right)) + \sum_{w=1}^{n-1} \binom{n}{w} \left(\binom{n}{w+1}-\binom{n}{n} \right) \\ &= \sum_{w=1}^{n-1} \binom{n}{w} \left(\binom{n}{w-1}-\binom{n}{w}-\binom{n}{w}+\binom{n}{w+1} \right) Q_\sigma\left(\frac{1}{2}\right) \nonumber \\ &\qquad + \sum_{w=1}^{n-1} \binom{n}{w} \left(\binom{n}{w}-\binom{n}{w+1}+\binom{n}{w+1}-\binom{n}{n} \right) \\ &= \sum_{w=1}^{n-1} \binom{n}{w} \left(\binom{n}{w-1}-2\binom{n}{w}+\binom{n}{w+1} \right) Q_\sigma\left(\frac{1}{2}\right) + \sum_{w=1}^{n-1} \binom{n}{w} \left(\binom{n}{w}-1 \right) \\ &= \sum_{w=1}^{n-1} \binom{n}{w} \left(\binom{n}{w-1}-2\binom{n}{w}+\binom{n}{w+1} \right) Q_\sigma\left(\frac{1}{2}\right) + \binom{2n}{n}-M \\ &= \left( 2 \binom{2n+1}{n+1} - 4 \binom{2n}{n} - 2n + 4 \right) Q_\sigma\left(\frac{1}{2}\right) + \binom{2n}{n}-M . \end{aligned}\] Summing everything, we finally obtain, up to \(O\left(Q\left(\frac{3}{2\sigma}\right)\right)\) terms, \[\begin{aligned} M^2 \Delta(K|Y) &=(M-2)M(1-2Q_\sigma(\frac{1}{2})) -(M-2) \nonumber \\ &\qquad - \left( 2 \binom{2n+1}{n+1} - 4 \binom{2n}{n} - 2n + 4 \right) Q_\sigma\left(\frac{1}{2}\right) \nonumber \\ &\qquad -\binom{2n}{n} + M + 2(M-1) (1-Q_\sigma(\frac{1}{2})) -2n Q_\sigma(\frac{1}{2}) \\ &= \left( M^2 - \binom{2n}{n} \right) - 2\left(M^2-M+1+\binom{2n+1}{n+1}-2\binom{2n}{n} \right) Q_\sigma\left(\frac{1}{2}\right). \end{aligned}\]


References

Arikan, Erdal. 1996. An Inequality on Guessing and its Application to Sequential Decoding.” IEEE Transactions on Information Theory 42 (1): 99–105.
Beguinot, Julien, Wei Cheng, Sylvain Guilley, and Olivier Rioul. 2022. Be My Guess: Guessing Entropy vs. Success Rate for Evaluating Side-Channel Attacks of Secure Chips.” In 25th Euromicro Conference on Digital System Design, DSD 2022, Maspalomas, Spain, August 31 - Sept. 2, 2022, 496–503. IEEE. https://doi.org/10.1109/DSD57027.2022.00072.
Cheng, Wei. 2021. What can information guess? : Towards information leakage quantification in side-channel analysis. (Qu’est ce que l’information permet de deviner? : Vers une quantification des fuites d’informations dans l’analyse de canaux auxiliaires).” PhD thesis, Polytechnic Institute of Paris, France. https://tel.archives-ouvertes.fr/tel-03504182.
Chérisey, Éloi de, Sylvain Guilley, and Olivier Rioul. 2018. “Confused yet Successful: - Theoretical Comparison of Distinguishers for Monobit Leakages in Terms of Confusion Coefficient and SNR.” In Information Security and Cryptology - 14th International Conference, Inscrypt 2018, Fuzhou, China, December 14-17, 2018, Revised Selected Papers, edited by Fuchun Guo, Xinyi Huang, and Moti Yung, 11449:533–53. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-14234-6\_28.
Chérisey, Éloi de, Sylvain Guilley, Olivier Rioul, and Pablo Piantanida. 2019. Best Information is Most Successful — Mutual Information and Success Rate in Side-Channel Analysis.” IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019 (2): 49–79. https://doi.org/10.13154/tches.v2019.i2.49-79.
Choudary, Marios O., and Pantelimon George Popescu. 2017. Back to Massey: Impressively Fast, Scalable and Tight Security Evaluation Tools.” In Cryptographic Hardware and Embedded Systems - CHES 2017 - 19th International Conference, Taipei, Taiwan, September 25-28, 2017, Proceedings, edited by Wieland Fischer and Naofumi Homma, 10529:367–86. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-319-66787-4\_18.
Duc, Alexandre, Sebastian Faust, and François-Xavier Standaert. 2015. Making Masking Security Proofs Concrete - Or How to Evaluate the Security of Any Leaking Device.” In Advances in Cryptology - EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part I, edited by Elisabeth Oswald and Marc Fischlin, 9056:401–29. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-662-46800-5_16.
Graczyk, Robert, and Igal Sason. 2021. On Two-Stage Guessing.” Information 12 (4): 159.
Hardy, Godfrey Harold, John Edensor Littlewood, and George Pólya. 1934. Inequalities. Cambridge Univ. Press.
Heuser, Annelie, Olivier Rioul, and Sylvain Guilley. 2014. A Theoretical Study of Kolmogorov-Smirnov Distinguishers — Side-Channel Analysis vs. Differential Cryptanalysis.” In Constructive Side-Channel Analysis and Secure Design - 5th International Workshop, COSADE 2014, Paris, France, April 13-15, 2014. Revised Selected Papers, edited by Emmanuel Prouff, 8622:9–28. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-319-10175-0_2.
Ho, Siu-Wai, and Sergio Verdú. 2015. “Convexity/Concavity of Rényi Entropy and \(\alpha\)-Mutual Information.” In Proceedings of the IEEE International Symposium on Information Theory (ISIT). Hong Kong, China.
Khouzani, MHR, and Pasquale Malacaria. 2019. Generalized Entropies and Metric-Invariant Optimal Countermeasures for Information Leakage Under Symmetric Constraints.” IEEE Transactions on Information Theory 65 (2): 888–901.
Mangard, Stefan, Elisabeth Oswald, and Thomas Popp. 2006. Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer.
Marshall, Albert W., Ingram Olkin, and Barry C. Arnold. 2011. Inequalities: Theory of Majorization and Its Applications. 2nd ed. Springer.
Prest, Thomas, Dahmun Goudarzi, Ange Martinelli, and Alain Passelègue. 2019. “Unifying Leakage Models on a rényi Day.” In Advances in Cryptology - CRYPTO 2019 - 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2019, Proceedings, Part I, edited by Alexandra Boldyreva and Daniele Micciancio, 11692:683–712. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-26948-7\_24.
Rioul, Olivier. 2022a. What Is Randomness? The Interplay between Alpha Entropies, Total Variation and Guessing.” Physical Sciences Forum 5 (1). https://doi.org/10.3390/psf2022005030.
———. 2022b. Variations on a Theme by Massey.” IEEE Transactions on Information Theory 68 (5): 2813–28.
———. 2023. “The Interplay between Error, Total Variation, Alpha-Entropy and Guessing: Fano and Pinsker Direct and Reverse Inequalities.” Entropy 25 (7): 978 (Special Issue MaxEnt 2022, the 41st International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering).
Sason, Igal, and Sergio Verdú. 2018. “Improved Bounds on Lossless Source Coding and Guessing Moments via Rényi Measures.” IEEE Transactions on Information Theory 64 (6): 4323–46. https://doi.org/10.1109/TIT.2018.2803162.
Standaert, François-Xavier, Tal Malkin, and Moti Yung. 2009. A Unified Framework for the Analysis of Side-Channel Key Recovery Attacks.” In EUROCRYPT, 5479:443–61. LNCS. Springer.
Veyrat-Charvillon, Nicolas, Benoı̂t Gérard, Mathieu Renauld, and François-Xavier Standaert. 2012. An Optimal Key Enumeration Algorithm and Its Application to Side-Channel Attacks.” In Selected Areas in Cryptography, edited by Lars R. Knudsen and Huapeng Wu, 7707:390–406. Lecture Notes in Computer Science. Springer.
Veyrat-Charvillon, Nicolas, Benoı̂t Gérard, and François-Xavier Standaert. 2013. “Security Evaluations Beyond Computing Power.” In Advances in Cryptology - EUROCRYPT 2013, 32nd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Athens, Greece, May 26-30, 2013. Proceedings, edited by Thomas Johansson and Phong Q. Nguyen, 7881:126–41. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-642-38348-9_8.
Zhang, Ziyue, A. Adam Ding, and Yunsi Fei. 2020. A Fast and Accurate Guessing Entropy Estimation Algorithm for Full-key Recovery.” IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020 (2): 26–48. https://doi.org/10.13154/tches.v2020.i2.26-48.