Formal Security Proofs via Doeblin Coefficients:

Optimal Side-channel Factorization from Noisy Leakage to Random Probing

Authors

Julien Béguinot

Wei Cheng

Sylvain Guilley

Olivier Rioul

Abstract

Masking is one of the most popular countermeasures to side-channel attacks, because it can offer provable security. However, depending on the adversary’s model, useful security guarantees can be hard to provide. Masking was first shown secure against \(t\)-threshold probing adversaries by Ishai et al. at Crypto’03. It was then shown secure in the more generic random probing model by Duc et al. at Eurocrypt’14. Prouff and Rivain introduced the noisy leakage model to capture more realistic leakage at Eurocrypt’13. A reduction from noisy leakage to random probing was introduced by Duc et al. at Eurocrypt’14, and security guarantees were improved for both models by Prest et al. at Crypto’19, Duc et al. in Eurocrypt’15/J. Cryptol’19, and Masure and Standaert at Crypto’23. Unfortunately, as it turns out, we found that previous proofs in both the random probing and the noisy leakage models are flawed, and such flaws do not appear easy to fix.

In this work, we show that the Doeblin coefficient allows one to overcome these flaws. In fact, it yields optimal reductions from noisy leakage to random probing, thereby providing a correct and usable metric to properly ground security proofs. This exhibits the inherent cost of any reduction from the noisy leakage model to the random probing model. We show that it can also be used to derive direct formal security proofs using the subsequence decomposition of Prouff and Rivain.

Introduction

Context

All cryptographic implementations leak some side information about the sensitive variables they manipulate through so-called side-channels. These leakages can be of different natures: timing (Dhem et al. 1998), power consumption (P. C. Kocher, Jaffe, and Jun 1999; P. Kocher et al. 2018), or electromagnetic emanations (Gandolfi, Mourtel, and Olivier 2001; Agrawal et al. 2002). The corresponding side-channel attacks can be very harmful if there is no countermeasure or if the countermeasure is not carefully implemented. One may classify countermeasures into three categories, which can be jointly implemented:

Masking is one of the most effective countermeasures known so far. It is especially relevant because of its provable security (Ishai, Sahai, and Wagner 2003; Rivain and Prouff 2010a; Prouff and Rivain 2013; Duc, Dziembowski, and Faust 2014; Duc, Faust, and Standaert 2015, 2019; Barthe et al. 2016; Béguinot, Cheng, et al. 2023; Masure and Standaert 2023). Previously published security proofs for masking fall into two classes:

  • Simulation paradigm (indirect approach): A black-box adversary is modeled by an algorithm that only accesses the public information, which corresponds to the usual cryptanalysis. By contrast, the side-channel adversary is given both public and side-channel information. If there always exists a black-box adversary whose output is indistinguishable from the side-channel adversary’s output, then the implementation is considered secure.

  • Information Theoretic Paradigm (direct approach): The implementation is considered secure if the mutual information (or some other informational metric) between the side-channel leakage and the corresponding sensitive variable is negligible, given the available public information. Equivalently, the number of side-channel queries required to achieve a given success rate is prohibitively large for any attack.

Those two approaches have been respectively termed indirect and direct by Prest et al. (Prest et al. 2019). Both approaches have their pros and cons. On the one hand, the security proof based on information theory is conceptually simpler and provides more realistic security parameters. On the other hand, the simulation-based approach is very generic and can be applied to a whole cipher at once. Indeed, as remarked in (Duc, Dziembowski, and Faust 2014 Footnote 4), a few pairs of plaintext/ciphertext completely reveal the key of an AES in the information-theoretic sense. Hence block ciphers such as AES are not secure in this sense, a fortiori in the presence of side-channel information. The security of AES relies on a one-way computational assumption which cannot be taken into account in the information theoretic paradigm.

To prove the security of a cryptographic implementation in any of the two paradigms above, it is necessary to define the side-channel adversary’s model. Some restrictions should be imposed on the adversary, since if she/he were allowed to observe all variables manipulated in the circuit, then the implementation would be trivially broken. Micali and Reyzin introduced physically observable cryptography (Micali and Reyzin 2004), in which only computation can leak information. Leakage resilient cryptography (Dziembowski and Pietrzak 2008; Kalai and Reyzin 2019) also considers memory leakage models. In this context, the simplest model is the \(t\)-threshold probing model (Ishai, Sahai, and Wagner 2003), in which the adversary is only allowed to probe the values of \(t\) wires within the circuit. In the more elaborate region probing model (Goudarzi et al. 2022), the circuit is divided into small regions and the adversary can probe \(t\) values in each region. A more realistic model is the random probing model (Duc, Dziembowski, and Faust 2014), where the side-channels correspond to erasure channels. The most generic model is the noisy leakage model of Prouff and Rivain (Prouff and Rivain 2013), where the \(\mathcal{D}\)-noisy adversary has minimal distortion \(\mathcal{D}\) between the channel input and output. In this list of models, the more complex and realistic the model, the harder the security proof is to establish.

Contributions

In this paper, we aim at grounding security proofs of side-channel analysis countermeasures on solid mathematical foundations. This work has the following contributions.

  1. We carry out a systematic mathematical study of the complementary Doeblin coefficient (\(\mathrm{CDC}\)). This coefficient was originally used to study Markov chains (Doeblin 1937) and appeared in the side-channel literature as the value of \(\epsilon\) in (Duc, Dziembowski, and Faust 2019 Eqn. 9, Proof of Lemma 4). We show that the \(\mathrm{CDC}\) provides the optimal reduction from a noisy leakage model to the random probing model. Since the reduction is optimal, it exhibits the unavoidable loss incurred by any security proof based on the random probing model.

  2. Bounds on the success rate (SR) and guessing entropy (GE) of a side-channel attack are derived using the CDC. Such bounds hold with equality for erasure side-channels, scale well with the number of channel queries, apply to adaptive adversaries, and are amenable to practical evaluation, e.g., under additive Gaussian noise (Hamming weight or least significant bit model, etc.).

  3. A new direct security proof is presented based on the CDC and on the Prouff-Rivain subsequence decomposition. As supplementary material, some flaws in previous direct security proofs for masking in the noisy leakage model are identified (this does not necessarily mean that the corresponding results cannot hold) and some patches or bypasses are presented. Namely, this concerns Lemma 4 (hence Theorem 3) in (Prouff and Rivain 2013), Lemma 8 (hence Theorem 6, Corollary 4) in (Prest et al. 2019), and Theorem 5 in (Masure and Standaert 2023). Details are in Appendix 11.

  4. A new methodology providing indirect security proofs is presented, based on the optimal CDC reduction from the noisy leakage model to the random probing model. As supplementary material, minor errors are also corrected in the original proof of the reduction to the \(t\)-threshold probing model (Lemma 4 in (Duc, Dziembowski, and Faust 2014)). As a result, the bounds derived in (Duc, Dziembowski, and Faust 2014; Duc, Faust, and Standaert 2015, 2019; Prest et al. 2019) that leveraged this lemma can be improved significantly. Again, details can be found in Appendix 11.

Outline

The remainder of this paper is structured as follows. Preliminaries and mathematical results on channels, leakage measures (including CDC) and figures of merit are presented in Section 2. The key property of the \(\mathrm{CDC}\) and the resulting bounds on figures of merit are presented in Section 3, along with some theoretical expressions for concrete evaluation. Direct security proofs leveraging \(\mathrm{CDC}\) based on Prouff-Rivain subsequence decompositions are provided in Section 4. A new methodology for the derivation of indirect proofs is shown in Section 5. Section 6 concludes.

This paper also contains supplementary material in the Appendix, which is not necessary in order to follow the main arguments of the article. Appendix 7 provides a formal channel definition, relating it to the notion of random function. Appendices 8, 9 and 10 contain technical proofs and results. Appendix 11 provides a comprehensive list of flaws that we identified in state-of-the-art papers, along with patches we devised. We have obtained confirmation of these flaws from the authors of the papers concerned. To fit page limitations, most proofs and the descriptions of the existing flaws are deferred to the full version of the article (Béguinot et al. 2024), available at https://eprint.iacr.org/2024/199.

Overview of formal security proofs, organized in four levels. Novelty is in blue, revisions of the state of the art are in red, and the state of the art is in black.

Figure 1 updates Fig. 1 from (Prest et al. 2019): the black arrows indicate the state of the art, the red arrows indicate flaws that we identified and revised, and the blue arrows correspond to our new derivations using the \(\mathrm{CDC}\). The figure is organized in four levels: The top level corresponds to the Hamming weight leakage model. The second level contains the main leakage measures, corresponding to different noisy leakage adversaries. Each arrow label from the first to the second level indicates how the leakage measure scales with respect to the number \(n\) of bits, while each arrow label between two leakage models indicates the appropriate reference of the reduction from one model to another. The third layer contains the various adversarial models based on probing, and the bottom level contains the secure compilers that generate secure circuits against a given adversarial model. Each arrow label from the second or third to the fourth level indicates the appropriate reference of the corresponding security proof. The comparison with the other informational leakage measures is elaborated in Tab. [tab:comp-metrics] below.

Detailed Technical Overview

Theorem [thm:doeblin-degradation] provides the optimal factorization of a given channel into an erasure channel followed by another channel. As a consequence, any channel can be seen as a stochastically degraded erasure channel with the largest possible erasure probability. In particular, this implies that the optimal reduction of a side-channel adversary from the noisy leakage model (arbitrary channel) to the random probing model (erasure channel) is measured by the \(\mathrm{CDC}\).

The \(\mathrm{CDC}\) equally applies to multivariate leakages. Lemma [lemma:adaptive-single-letter] shows that the \(\mathrm{CDC}\) with multiple traces is bounded in terms of the \(\mathrm{CDC}\) with one trace, even for an adaptive “chosen channel” adversary.

The main figures of merit (Definition [def:fig-merit]) satisfy the data processing inequality (DPI) recalled in Lemma [lemma:DPI-metrics]. In other words, the stochastically degraded adversary can only perform worse than the non-degraded one. As a consequence, the performance of any side-channel attacker can be bounded in terms of \(\mathrm{CDC}\) as shown in Proposition [prop:doeblin-to-guessing]. Intuitively, these bounds result from averaging two extreme cases: Either the leakage value is an erasure symbol and the figure of merit is that of a blind guess, or it is probed and the figure of merit is that of a disclosed value. This applies even for computationally bounded adversaries, allowing one to avoid complex simulation arguments (Duc, Dziembowski, and Faust 2014).

Lemma [prop:doeb-concat] shows that the \(\mathrm{CDC}\) satisfies a strengthened data processing inequality which is useful in the derivation of the security proof.

Theorem [thm:doeblin-pr] gives a direct security bound for ISW masked computations of an AES following the subsequence decomposition of Prouff and Rivain (Prouff and Rivain 2013). To achieve this, we derive a security lemma for each type of subsequence. For type 1 and 2 subsequences, we prove an analog of Mrs Gerber’s Lemma (MGL) (Wyner and Ziv 1973) in Lemma [MGL:doeblin] which shows that the \(\mathrm{CDC}\) between the leakage and a masked value is upper bounded by the product of the CDCs share by share. This is expected since a sensitive value is probed if and only if all shares are probed. For type 3 subsequences, Lemma [lemma:type3] provides a security bound for the cross-wise terms, in terms of the domination polynomial of the rook graph of Definition [def:rook-poly]. The idea is that a value is probed if and only if all shares are probed at least once through cross-wise terms of one of the two shared inputs.

Theorem [th:random-probing] explains how any formal security proof in the random probing model can be lifted to the noisy leakage model using the \(\mathrm{CDC}\). This is illustrated for the security proof of Duc et al. (Duc, Dziembowski, and Faust 2014).

The descriptions of the flaws appearing in previous derivations are deferred to Appendix 11 of the long version of the article (Béguinot et al. 2024). To summarize, the derivations of (Prest et al. 2019) and (Prouff and Rivain 2013) are invalidated because of an incorrect chain rule on probabilities. The flaw in (Masure and Standaert 2023) is due to the fact that the bound appearing in the MGL for mutual information is separately but not jointly convex in the variables. The \(\mathrm{CDC}\) overcomes these difficulties by allowing a direct bound which is then degraded in terms of the corresponding leakage measure through Lemma [lemma:sandwich-bound].

Mathematical Framework

In this Section, we present the mathematical framework of side-channel analysis that we use in our analysis. The notations are given in Subsection 2.1. The formal definition of a side-channel is given in Subsection 2.2. The main informational leakage measures are recalled in Subsection 2.3. Some useful properties of the complementary Doeblin coefficient (\(\mathrm{CDC}\)) such as an adaptive single letterization and strengthened data processing inequality (DPI) are provided in Subsection 2.4. The model for a side-channel attack is described in Subsection 2.5. Finally, the figures of merit to evaluate the advantage of a side-channel adversary are introduced in Subsection 2.6.

Notations

Random variables are denoted by uppercase letters like \(X,Y\). The corresponding set of values taken by the random variables are denoted by the corresponding calligraphic letters like \(\mathcal{X},\mathcal{Y}\). Lowercase letters denote values taken by random variables, e.g., \(x\in\mathcal{X}\), \(y\in\mathcal{Y}\). Bold letters denote random vectors \(\mathbf{X}\) taking vector values \(\mathbf{x}\). The probability distribution of \(X\) is denoted \(P_X\); we write \(X\sim P_X\).

  • When \(X\) is discrete, taking values in a discrete set \(\mathcal{X}\) of cardinality \(|\mathcal{X}|\), its probability mass function (pmf) is noted \(p_X(x)= \mathbb{P} (X=x)\);

  • When \(X\) is continuous, its probability density function (pdf) is also noted \(p_X(x)\) where \(\mathrm{d}P_X(x)=p_X(x) \;\mathrm{d}x\).

We use the unified notation \(\int\) which is a sum in the discrete case and an integral in the continuous case. Therefore, we write \(\mathbb{P} ( X \in E) = \int_{x\in E} p_X(x)\). Expectation is denoted by \(\mathbb{E} _X[\cdot]\). The \(p\)-norm is noted \(\|\cdot\|_p\).

  • The uniform distribution on a set \(\mathcal{X}\) is denoted by \(\mathcal{U}(\mathcal{X})\);

  • \(\mathcal{B}(p)\) denotes the Bernoulli distribution with parameter \(p\) and \(\mathcal{B}(n,p)\) denotes the Binomial distribution with parameters \(n,p\). The survival function of \(B \sim \mathcal{B}(n,p)\) is noted \(Q_B(x,n,p) \triangleq \mathbb{P} ( B > x) = \mathbb{P} ( B \geqslant x + 1)\).

  • \(\mathcal{N}(\mu,\sigma^2)\) denotes the Gaussian distribution of mean \(\mu\) and variance \(\sigma^2\). The survival function of the standard Gaussian \(\mathcal{N}(0,1)\) is denoted by \(Q\).

The joint probability distribution of \((X,Y)\) is noted \(P_{X,Y}\) with pmf or pdf \(p_{X,Y}\). When \(X\) and \(Y\) are independent, \(P_{X,Y}=P_X P_Y\), that is, \(p_{X,Y}(x,y)=p_X(x)p_Y(y)\). The conditional probability distribution of \(Y\) given \(X\) is denoted \(P_{Y|X}\) where \(p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}\).

Finally, the positive (resp. negative) part of \(x\) is \(x^+ \triangleq \max( 0 , x)\) (resp. \(x^- \triangleq \max(-x,0)\)), and the complement of \(x \in [0,1]\) is \(\overline{x}\triangleq1-x\).

Side-Channels

A random transformation with input \(X\) and output \(Y\) is defined by a transitional probability distribution \(P_{Y|X}\), also known as a Markov kernel (Polyanskiy and Wu 2023). For example, when \(X\) is discrete, one has \(p_Y(y)=\sum_x p_X(x) p_{Y|X}(y|x)\). When \(X\) is continuous, one has \(p_Y(y)=\int p_X(x) p_{Y|X}(y|x) \;\mathrm{d}x\). This random transformation is noted \(X \to \boxed{ P_{Y|X}} \to Y\) or \(X \to Y\) for short.

In the sequel, a side-channel is defined as a random transformation \(X \to Y\), and we shall refer to any transformation \(X\to Y\) as a “channel”. In the side-channel literature, it is also commonly defined as a random function \(Y=F(X)\) where \(F=f\) is picked at random among a set of deterministic functions \(f\) according to some probability distribution \(P_F\). It is true, but not obvious, that the two descriptions coincide: see Appendix 7 for details. We say that the channel \(X \to \boxed{ P_{Y|X} } \to Y\) is opaque if \(Y\) is independent of its input \(X\), that is, \(p_{Y|X}(y|x)=p_Y(y)\) for all \(x\) and \(y\).

Notice that any deterministic function \(Y=f(X)\) can be seen as a “random” transformation where \(p_{Y|X}(y|x)=\delta(y=f(x))\) (Dirac distribution). This functional channel will be denoted by \(X \to \boxed{ f } \to Y\). A functional channel with constant \(f\) is opaque. If \(f\) is the identity, the corresponding functional channel is named identity channel. When we write \(X \to Y \to Z\) we always assume that it forms a Markov chain.

Additive masking of order \(d \geqslant 0\) can be seen as a channel \(X \to \boxed{ \mathrm{Mask}_d } \to \mathbf{X}\), where \(\mathbf{X} = (X_0,\ldots,X_d) \triangleq (R_0,\ldots,R_{d-1},X-\sum_{i=0}^{d-1} R_i)\) where the \(R_i\) are independent and uniformly distributed \(R_i \sim \mathcal{U}(\mathcal{X})\). The \(d+1\) components of \(\mathbf{X}\) are called shares of \(X\). By the well-known secret sharing property, any subset of at most \(d\) shares of \(X\) is independent of \(X\).
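As an illustration, additive masking over \(\mathcal{X}=\mathrm{GF}(2)^n\), where the group operation (and hence subtraction) is XOR, can be sketched as follows; the function names are ours:

```python
import secrets

def mask(x: int, d: int, nbits: int = 8) -> list[int]:
    """Split x into d+1 shares (R_0, ..., R_{d-1}, x - sum R_i).

    Over GF(2)^nbits subtraction is XOR; the first d shares are
    uniform and independent, so any d shares reveal nothing about x.
    """
    shares = [secrets.randbits(nbits) for _ in range(d)]
    last = x
    for r in shares:
        last ^= r  # x "minus" the sum of the random shares
    return shares + [last]

def unmask(shares: list[int]) -> int:
    """Recombine: the XOR of all d+1 shares recovers x."""
    acc = 0
    for s in shares:
        acc ^= s
    return acc
```

Any subset of at most \(d\) shares is uniformly distributed independently of \(x\), which is the secret-sharing property stated above.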

An important class of channels is as follows:

The channel \[X \to \boxed{ \mathrm{EC}_{{\mathcal{E}}}^{{\bot}} } \to Y\] is said to be an erasure channel with erasure probability \({\mathcal{E}}\in[0,1]\) and special erasure symbol \(\bot\) if on input \(x\), \(\mathrm{EC}_{{\mathcal{E}}}^{{\bot}}\) outputs \(x\) with probability \[\overline{{\mathcal{E}}}=1-{\mathcal{E}}\] and the special erasure symbol \({\bot}\) otherwise (with probability \({\mathcal{E}}\)). That is \[\begin{cases} p_{Y|X}(\bot | x) = {\mathcal{E}} \\ p_{Y|X}(x|x) =\overline{{\mathcal{E}}} \end{cases} \qquad (\forall x \neq \bot)\]

When convenient, we also consider \(\bot\) as input value, and let \(p_{Y|X}(\bot|\bot)=1\). Notice that an erasure channel with erasure probability \(\mathcal{E} = 1\) is opaque.

The notation \(\mathcal{E}\) for the erasure probability is classical in information theory. A few articles such as (Duc, Dziembowski, and Faust 2014; Prest et al. 2019) use the complementary \(1 - \mathcal{E}\) instead. Here we follow the standard information theoretic convention.

Erasure channels satisfy useful properties that are easy to check:

[lemma:erasure-channel-commutes] Let \(P_{Y|X}\) be any channel from \(\mathcal{X}\) to \(\mathcal{Y}\) and \(P_{Y|X}^{\bot}\) its extension to the input \(\bot\) by setting \(P_{Y|X}^{\bot}(\bot|\bot) = 1\). Then \[\left ( X \to \boxed{ \mathrm{EC}_{{\mathcal{E}}}^{\bot} } \to \boxed{ P_{Y|X}^{\bot} } \to Y' \right ) = \left ( X \to \boxed{ P_{Y|X} } \to \boxed{ \mathrm{EC}_{{\mathcal{E}}}^{\bot} } \to Y' \right ).\] Note that on the left-hand side the erasure channel is defined on \(\mathcal{X}\) while on the right-hand side it is defined on \(\mathcal{Y}\).

See Appendix 8.1.
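A finite-alphabet sanity check of this commutation property, with channels represented as row-stochastic transition matrices (a sketch in our own notation; the last index plays the role of \(\bot\)):

```python
import numpy as np

def ec_matrix(n_sym: int, eps: float) -> np.ndarray:
    """Erasure channel on {0..n_sym-1, bot}: pass w.p. 1-eps, else erase."""
    P = np.zeros((n_sym + 1, n_sym + 1))
    P[:n_sym, :n_sym] = (1 - eps) * np.eye(n_sym)
    P[:n_sym, n_sym] = eps
    P[n_sym, n_sym] = 1.0  # bot stays bot
    return P

def extend_bot(P: np.ndarray) -> np.ndarray:
    """Extend a channel matrix with the rule bot -> bot w.p. 1."""
    nx, ny = P.shape
    Q = np.zeros((nx + 1, ny + 1))
    Q[:nx, :ny] = P
    Q[nx, ny] = 1.0
    return Q

rng = np.random.default_rng(1)
P = rng.random((3, 4))
P /= P.sum(axis=1, keepdims=True)        # random channel from |X|=3 to |Y|=4
eps = 0.4
left = ec_matrix(3, eps) @ extend_bot(P)   # erase first, then apply P^bot
right = extend_bot(P) @ ec_matrix(4, eps)  # apply P, then erase its output
assert np.allclose(left, right)
```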

[prop:EC-composition] Let \({\mathcal{E}}_0,{\mathcal{E}}_1 \in [0,1]\) and set \(\overline{{\mathcal{E}}} = \overline{{\mathcal{E}}}_0 \overline{{\mathcal{E}}}_1\). Then \[\left ( X \to \boxed{ \mathrm{EC}_{{\mathcal{E}}_1}^{\bot} } \to \boxed{ \mathrm{EC}_{{\mathcal{E}}_0}^{\bot} } \to Y \right ) = \left ( X \to \boxed{ \mathrm{EC}_{{\mathcal{E}}}^{\bot} } \to Y \right )\]

The output is not erased if and only if it is not erased in both channels, hence with probability \(\overline{{\mathcal{E}}} = \overline{{\mathcal{E}}}_0 \overline{{\mathcal{E}}}_1\).
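The composition law can be checked exactly on transition matrices (a sketch; the helper name is ours and the last index stands for \(\bot\)):

```python
import numpy as np

def ec_matrix(n_sym: int, eps: float) -> np.ndarray:
    """Erasure channel on {0..n_sym-1, bot}: pass w.p. 1-eps, else erase."""
    P = np.zeros((n_sym + 1, n_sym + 1))
    P[:n_sym, :n_sym] = (1 - eps) * np.eye(n_sym)
    P[:n_sym, n_sym] = eps
    P[n_sym, n_sym] = 1.0  # bot stays bot
    return P

e0, e1 = 0.3, 0.5
# Chaining the two channels multiplies their transition matrices ...
composed = ec_matrix(4, e1) @ ec_matrix(4, e0)
# ... and the result is an erasure channel with 1-E = (1-E0)(1-E1).
target = ec_matrix(4, 1 - (1 - e0) * (1 - e1))
assert np.allclose(composed, target)
```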

Informational Leakage Measures

There exist many noisiness metrics in the literature that quantify how noisy a channel \(X \to Y\) is. In this Subsection, we list the different leakage measures used in this paper.

The correlation coefficient is widely adopted in side-channel analysis for its simplicity, e.g., in the associated correlation power analysis (CPA) (Brier, Clavier, and Olivier 2004).

\[\rho(X;Y) \triangleq \frac{ \mathbb{E} _{XY} \left [ (X - \mathbb{E} [X]) (Y - \mathbb{E} [Y] ) \right ]} { \sqrt{ \mathbb{E} _X \left [ (X - \mathbb{E} [X])^2 \right ] \mathbb{E} _Y \left [ (Y- \mathbb{E} [Y])^2 \right ] } } .\]

The correlation coefficient is symmetric \(\rho(X;Y) = \rho(Y;X)\). Note, however, that \(\rho(X;Y) = 0\) does not imply that \(X\) and \(Y\) are statistically independent.

Let \(P,Q\) be two probability distributions with respective pdf or pmf \(p,q\) defined over \(\mathcal{X}\). The Kullback–Leibler (KL) divergence between \(P\) and \(Q\) is \[D_{ \mathrm{KL} }(P \| Q) \triangleq \int_{\mathcal{X}} p \log \frac{p}{q}\] and the total variation distance (TV) between \(P\) and \(Q\) is \[D_{ \mathrm{TV} }(P \| Q) = \frac{1}{2} \int_{\mathcal{X}} |p-q| = \frac{1}{2} \| p - q \|_1.\]
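For discrete distributions these two divergences can be computed directly from the definitions (a minimal sketch; the function names are ours):

```python
import numpy as np

def d_kl(p, q):
    """Kullback-Leibler divergence sum_x p(x) log(p(x)/q(x)), with 0 log 0 = 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    support = p > 0
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

def d_tv(p, q):
    """Total variation distance: half the L1 distance between the pmfs."""
    return 0.5 * float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())

p = [0.5, 0.5]
q = [0.75, 0.25]
assert d_tv(p, q) == d_tv(q, p) == 0.25   # TV is symmetric
assert abs(d_kl(p, q) - d_kl(q, p)) > 0   # KL is not
```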

KL divergence is not symmetric in general, \(D_{ \mathrm{KL} }(P \| Q) \neq D_{ \mathrm{KL} }(Q \| P)\), whereas total variation is symmetric, \(D_{ \mathrm{TV} }(P \| Q) = D_{ \mathrm{TV} }(Q \| P)\).

Total variation is known to characterize indistinguishability in the sense that no statistical test can distinguish \(P\) and \(Q\) if \(D_{ \mathrm{TV} }(P \| Q)\) is negligible (Polyanskiy and Wu 2023, sec. 7.3). Both \(D_{\mathrm{TV}}\) and \(D_{\mathrm{KL}}\) are particular instances of \(f\)-divergences, that satisfy a data processing inequality (see (Polyanskiy and Wu 2023 Def. 7.1)).

The mutual information (\(\mathrm{MI}\)) is the KL divergence between the joint distribution of \((X,Y)\) and the product of its marginals: \[I(X;Y) \triangleq D_\mathrm{KL} (p_{X,Y}\|p_X p_Y) = \int_{ \mathcal{X} \times \mathcal{Y} } p_{X,Y}(x,y) \log \frac{ p_{X,Y}(x,y) }{ p_X(x) p_Y(y)}.\]

\(\mathrm{MI}\) is symmetric \(I(X;Y)=I(Y;X)\). It is a measure of statistical dependence: if \(X\) and \(Y\) are statistically independent then \(p_{XY}=p_X p_Y\) so that \(I(X;Y) = 0\).

The total variation information (\(\mathrm{TVI}\)) is the TV distance between the joint distribution of \((X,Y)\) and the product of its marginals: \[\begin{aligned} \Delta(X;Y) &\triangleq D_\mathrm{TV} (p_{XY}\|p_X p_Y) = \frac{1}{2} \| p_{XY} - p_X p_Y \|_1 \\ &= \frac{1}{2} \int_{ \mathcal{X} \times \mathcal{Y} } | p_{XY}(x,y) - p_X(x) p_Y(y) |. \end{aligned}\]

Note that \(\mathrm{TVI}\) is symmetric, \(\Delta(X;Y) = \Delta(Y;X)\). A negligible \(\mathrm{TVI}\) implies that no test can distinguish \(p_{XY}\) from \(p_X p_Y\), that is, no test can exhibit a statistical dependence between \(X\) and \(Y\). \(\mathrm{TVI}\) can be seen as a particular \(f\)-information (Polyanskiy and Wu 2023 Eqn. 7.46). In (Prest et al. 2019), TV is referred to as Statistical Distance (SD).

The maximal leakage quantifies the maximal advantage in estimating \(X\) from the side-channel information \(Y\): \[\mathcal{L}(X \to Y) = \log \int_{ \mathcal{Y} } \sup_{x \in \mathcal{X} } p_{Y|X}(y|x).\]

We use an arrow instead of a semicolon in the definition of \(\mathcal{L}(X \to Y)\) because it depends only on the channel \(X \to Y\) and not on the input probability distribution of \(X\). Note that maximal leakage is not symmetric \(\mathcal{L}(X \to Y) \neq \mathcal{L}(Y \to X)\).

The Euclidean norm bias (EN) is the expected Euclidean distance between the posterior distribution \(p_{X|Y}\) and its prior \(p_X\): \[\beta(X;Y) \triangleq \mathbb{E} _Y \| p_{X|Y}(\cdot|Y) - p_X \|_2 .\]

\(\beta(X;Y)\) is similar to \(\Delta(X;Y)\) where \(\|\cdot\|_1\) is replaced by \(\|\cdot\|_2\) inside the expectation. However, \(\beta(X;Y)\) is not equal to \(\| p_{XY} - p_X p_Y \|_2\) because of the square root appearing in the Euclidean norm. In particular it is not symmetric \(\beta(X;Y) \neq \beta(Y;X)\). A similar quantity \(\Delta_2(X;Y)\) with squared norm, related to the Rényi \(2\)-information, is used for side-channel leakage evaluation in (Liu et al. 2023).

The relative error (\({\mathrm{RE}}\)) is the worst-case relative deviation of the posterior from the prior: \[{\mathrm{RE}}(X;Y) \triangleq \sup_{x,y} \left| \frac{p_{X|Y}(x|y)}{p_X(x)} - 1 \right| = \sup_{x,y} \left| \frac{p_{XY}(x,y)}{p_X(x) p_Y(y)} - 1 \right|.\]

The average relative error (\({\mathrm{ARE}}\)) replaces the supremum over \(y\) by an expectation: \[{\mathrm{ARE}}(X;Y) \triangleq \mathbb{E} _Y \left[ \sup_{x} \left| \frac{p_{X|Y}(x|Y)}{p_X(x)} - 1 \right| \right].\]

While the relative error is symmetric, \({\mathrm{RE}}(X;Y)={\mathrm{RE}}(Y;X)\), the average relative error is not: \({\mathrm{ARE}}(X;Y) \neq {\mathrm{ARE}}(Y;X)\).

This paper focuses on another important quantity, the complementary Doeblin coefficient (\(\mathrm{CDC}\)):

\[\begin{aligned} \overline{{\mathcal{E}}}&(X \to Y) = 1 - \int_y \inf_x p_{Y|X}(y|x) = \int_y \sup_x \left ( p_Y(y) - p_{Y|X}(y|x) \right ) \\ &= \mathbb{E} _Y \left [ \sup_x \left ( 1 - \frac{p_{Y|X}(Y|x)}{p_Y(Y)} \right ) \right ] = \mathbb{E} _Y \left [ \sup_x \left ( 1 - \frac{p_{X|Y}(x|Y)}{p_X(x)} \right ) \right ] . \end{aligned}\]

Doeblin’s coefficient appeared implicitly in (Doeblin 1937, 1), later explicitly in (Seneta 1973 Eqn. 2.6) and has been recently known as the “Doeblin ergodicity coefficient”, see e.g., (Chestnut and Lladser 2010 Eqn. 10).

While the expression of the \(\mathrm{CDC}\) resembles both that of maximal leakage and \({\mathrm{ARE}}\), it is fundamentally different. The \(\mathrm{CDC}\) is non-symmetric: \(\overline{{\mathcal{E}}}(X \to Y) \neq \overline{{\mathcal{E}}}(Y \to X)\). As for maximal leakage, we use an arrow instead of a semicolon in \(\overline{\mathcal{E}}(X\to Y)\) because it depends only on \(X \to Y\) and not on \(P_X\). The original Doeblin coefficient is \({\mathcal{E}}(X \to Y)=1-\overline{{\mathcal{E}}}(X \to Y)\).
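For a discrete channel the \(\mathrm{CDC}\) reduces to \(1-\sum_y \min_x p_{Y|X}(y|x)\), which is straightforward to evaluate (a sketch; the helper name is ours):

```python
import numpy as np

def cdc(P: np.ndarray) -> float:
    """Complementary Doeblin coefficient 1 - sum_y min_x p(y|x).

    P is row-stochastic: P[x, y] = p_{Y|X}(y|x).
    """
    return 1.0 - float(P.min(axis=0).sum())

# Erasure channel with erasure probability 0.75 on a 2-symbol alphabet;
# columns are the outputs (0, 1, bot).
ec = np.array([[0.25, 0.0, 0.75],
               [0.0, 0.25, 0.75]])
assert abs(cdc(ec) - 0.25) < 1e-12  # CDC = 1 - erasure probability
```

As expected, for an erasure channel the \(\mathrm{CDC}\) coincides with the probability \(\overline{\mathcal{E}}\) that the input is not erased, and for the identity channel it equals \(1\).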

Maximal leakage is a particular Sibson’s \(\alpha\)-information (Verdú 2015; Esposito, Vandenbroucque, and Gastpar 2022) of order \(\alpha=+\infty\): \(\mathcal{L}(X \to Y) = I_\infty(X;Y)\), while mutual information can be seen as Sibson’s \(\alpha\)-information of order \(\alpha=1\): \(I(X;Y)=I_1(X;Y)\). The Doeblin coefficient is the exponential of minus Sibson’s \(\alpha\)-information of order \(\alpha=-\infty\), also known as maximal cost leakage: \(\mathcal{E}(X \to Y) = \exp( - I_{-\infty}(X;Y) )\) (Esposito, Vandenbroucque, and Gastpar 2022).

Properties of the Complementary Doeblin Coefficient

In Lemma [prop:EC-composition] we have seen the composition property of erasure channels sharing the same erasure symbol. What happens now if we compose two erasure channels with different erasure symbols? The following Lemma shows that even though the resulting channel is not an erasure channel, its \(\mathrm{CDC}\) still factorizes as a product:

[lemma:composition-different-erasures] For the channel \(X \to \boxed{ \mathrm{EC}_{{\mathcal{E}}_1}^{\bot_1} } \to Y \to \boxed{ \mathrm{EC}_{{\mathcal{E}}_0}^{\bot_0} } \to Z\), \[\overline{{\mathcal{E}}}( X \rightarrow Z ) = \overline{{\mathcal{E}}}( X \rightarrow Y ) \overline{{\mathcal{E}}}( Y \rightarrow Z ).\]

Straightforward from the definitions (see Appendix 8.4).

Consider a channel \(X\to Y\) and suppose one has a post-processing \(Y\to Z\) (such that \(X \to Y \to Z\) forms a Markov chain (Polyanskiy and Wu 2023)). Intuitively, \(Z\) does not contain more information about \(X\) than \(Y\), and we have the following:

[lemma:doeb-consistencency] For any \(X \to Y \to Z\), one has \[\label{eqn:consistency} {\mathcal{E}}(X \rightarrow (Y,Z) ) = {\mathcal{E}}( X \rightarrow Y).\]

Straightforward from the definitions (see Appendix 8.5).

A sensitive variable \(X\) may leak several times in a side-channel attack. For instance, the adversary may access two side-channels \(X \to Y_1\) and \(X \to Y_2\). What can be said about the \(\mathrm{CDC}\) of the combined leakages? The following Lemma provides an answer.

[lemma:single-letter] For the multi-channel \(X\substack{\nearrow \boxed{\!P_{Y_1|X}\!\!} \to Y_1\\\searrow \boxed{\!P_{Y_2|X}\!\!} \to Y_2}\) denoted by \(X\to Y_1,Y_2\), we have \[{\mathcal{E}}( X \rightarrow Y_1,Y_2) \geqslant{\mathcal{E}}( X \rightarrow Y_1) {\mathcal{E}}( X \rightarrow Y_2).\] Generally with \(q\) channels in parallel i.e. \(P_{Y_1,\ldots,Y_q | X} = \prod_{i=1}^{q} P_{Y_i|X}\) we have \[\label{eqn:single-letter} {\mathcal{E}}( X \rightarrow Y_1, \cdots ,Y_q) \geqslant\prod_{i=1}^q {\mathcal{E}}( X \rightarrow Y_i).\] In terms of \(\mathrm{CDC}\), equation [eqn:single-letter] reformulates as \[\overline{{\mathcal{E}}}( X \rightarrow Y_1,\ldots, Y_q) \leqslant 1-\prod_{i=1}^q \left ( 1-\overline{{\mathcal{E}}}( X \rightarrow Y_i) \right ) \leqslant\sum_{i=1}^q \overline{{\mathcal{E}}}( X \rightarrow Y_i) .\]

See Appendix 8.6.
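The product bound of Lemma [lemma:single-letter] can be verified exactly for parallel discrete channels, since the joint channel's rows are outer products of the individual rows (a sketch; `random_channel` is our own helper):

```python
import numpy as np

def cdc(P: np.ndarray) -> float:
    """Complementary Doeblin coefficient 1 - sum_y min_x p(y|x)."""
    return 1.0 - float(P.min(axis=0).sum())

rng = np.random.default_rng(0)

def random_channel(nx: int, ny: int) -> np.ndarray:
    """A random row-stochastic transition matrix, for testing only."""
    P = rng.random((nx, ny))
    return P / P.sum(axis=1, keepdims=True)

P1, P2 = random_channel(3, 4), random_channel(3, 5)
# Joint channel of two parallel observations: row x is the outer product
# p_{Y1|X}(.|x) (x) p_{Y2|X}(.|x), flattened over the pairs (y1, y2).
joint = np.stack([np.kron(P1[x], P2[x]) for x in range(3)])
# Doeblin coefficients E = 1 - CDC satisfy E(joint) >= E(P1) * E(P2).
assert (1 - cdc(joint)) >= (1 - cdc(P1)) * (1 - cdc(P2)) - 1e-12
```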

In the adaptive setting, the adversary may observe \(Y_1\) through the side-channel \(X \to Y_1\) and then choose the channel \(X \to Y_2\) based on the observation of \(Y_1\). The following adaptive single letterization lemma extends Lemma [lemma:single-letter] by showing how the \(\mathrm{CDC}\) of the combined leakages can be bounded even when the channels are chosen adaptively:

[lemma:adaptive-single-letter] In the adaptive setting where all channels satisfy \(\overline{{\mathcal{E}}}(X\to Y_i)\leqslant\overline{{\mathcal{E}}}\), we still have \[{\mathcal{E}}( X \rightarrow Y_1,Y_2) \geqslant{\mathcal{E}}^2.\] More generally we have \({\mathcal{E}}( X \rightarrow Y_1,\ldots, Y_q) \geqslant{\mathcal{E}}^q.\)

See Appendix 8.7.

Side-Channel Attack Models

We use the following terminology from (Prouff and Rivain 2013; Prest et al. 2019; Masure and Standaert 2023).

A channel \(X\to\!\boxed{ P_{Y|X}}\!\to Y\) is said to be \(\delta\)-noisy for input \(X\) with respect to some metric \(\mathcal{D}\) if \(\mathcal{D}(X;Y) \leqslant\delta\). For short, it is said to be \(\delta\)-noisy with respect to \(\mathcal{D}\) (without reference to \(X\)) when \(X\) is uniformly distributed, \(X \sim \mathcal{U}(\mathcal{X})\). \(\mathcal{D}\) should be understood as a distortion measure of the channel. For instance, \(\mathcal{D}\) can be \(\rho,I,\Delta,\mathcal{L},\beta,{\mathrm{RE}},{\mathrm{ARE}}\) or \(\overline{\mathcal{E}}\). The lower \(\delta\), the noisier the channel.

Consider a set of \(l\) sensitive values \((X_1,\ldots,X_l)\). A side-channel adversary obtains multiple side information \((Y_1,\ldots,Y_l)\) through the channels \(\varphi_i = (X_i \to Y_i)\), \(i=1,\ldots,l\). The tuple of channels \(\varphi= (\varphi_1,\ldots,\varphi_l)\) is restricted so that the adversary’s ability is limited. Typically, the adversary is said to be:

  • \(t\)-threshold probing (Ishai, Sahai, and Wagner 2003): if \(\varphi\) contains at most \(t\) identity channels and opaque channels on the remaining positions;

  • \(\overline{{\mathcal{E}}}\)-random probing (Duc, Dziembowski, and Faust 2014): if \(\varphi\) is made of \({\mathcal{E}}\)-erasure channels;

  • \(\delta\)-noisy (Prouff and Rivain 2013; Prest et al. 2019): if \(\varphi\) contains only \(\delta\)-noisy channels with respect to some metric \(\mathcal{D}\);

  • \((\sigma,f)\)-additive: if \(\varphi\) is made of channels of the form \(X \to Y \triangleq f(X) + \sigma N\) where \(f\) is a fixed deterministic leakage function and \(N \sim \mathcal{N}(0,1)\) is an independent additive Gaussian noise. Typically, \(f\) can be the Hamming weight function or the least significant bit function. When \(X\) is a bit leaking as \(X \to Y = X + \sigma N\) it specializes to the leakage model of Chari et al. (Chari et al. 1999).
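For concreteness, the \((\sigma,f)\)-additive model with the Hamming weight function can be simulated in a few lines (a minimal sketch of ours; the helper names `hamming_weight` and `leak` are not from the paper):

```python
# Minimal simulation of the (sigma, f)-additive leakage model with the
# Hamming weight function (helper names are ours, for illustration only).
import random

def hamming_weight(x: int) -> int:
    return bin(x).count("1")

def leak(x: int, sigma: float, f=hamming_weight) -> float:
    """One noisy observation Y = f(X) + sigma * N with N ~ N(0, 1)."""
    return f(x) + sigma * random.gauss(0.0, 1.0)

random.seed(0)
x = 0xA7                                 # 8-bit value, Hamming weight 5
samples = [leak(x, sigma=2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)       # concentrates around f(x) = 5
print(round(mean, 2))
```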

A cryptographic implementation is classically modeled as a circuit \(\Gamma\). There are two main paradigms for the channel inputs, both of which are legitimate. Either it is assumed that every wire within the circuit leaks, as in (Duc, Dziembowski, and Faust 2014), or it is assumed that every gate within the circuit leaks, as in (Prouff and Rivain 2013). In the latter case, the channel takes as input the operands of the gate. For unary gates both models are exactly equivalent; they differ whenever a gate processes multiple operands. Assuming that the wires leak leads to tighter security bounds, while assuming that the gates leak seems closer to the physical nature of leakages. (Duc, Dziembowski, and Faust 2014, sec. 5.5) discusses the trade-off between the two models in more depth.

One concern in side-channel analysis is to cover adaptive adversaries \(\mathcal{A}\). This term can be confusing as it is used with different meanings. To avoid any ambiguity, we make the terminology more precise with the following definitions:

We clarify the different notions of adaptivity in the context of side-channel attacks:

  1. When \(\mathcal{A}\) is allowed to choose sequentially the public information (Masure and Standaert 2023) used by \(\Gamma\), then she/he is a chosen public information adversary. This corresponds to the usual setting of chosen-plaintext or chosen-ciphertext adversaries in cryptology.

  2. When \(\mathcal{A}\) is allowed to specify \(\mathbf{\varphi}\) sequentially (Duc, Dziembowski, and Faust 2014), she/he is said to be a chosen channel adversary. This differs from a chosen public information adversary: in this setting, the adversary is allowed to move the side-channel acquisition instruments (probes) from one query to the next.

  3. If \(\mathcal{A}\) can specify \(\varphi_1,\ldots,\varphi_l\) sequentially within a query (Duc, Dziembowski, and Faust 2014), she/he is said to be a strong chosen channel adversary. The adversary is even allowed to move the position of its probes within a query. This last type of adaptivity is, however, unrealistic in most practical settings.

The activity of a side-channel adversary \(\mathcal{A}\) with \(q\) queries can be viewed as a game. This game unfolds differently depending on the side-channel adaptivity and on the gate/wire leakage model. After collecting the side-channel leakages, the adversary exploits them to distinguish the correct key \(K\) and outputs \(\mathrm{out}_{\mathcal{A}}(K)\).

The complete acquisition and attack led by \(\mathcal{A}\) is formalized by Alg. [alg:SCA-setup]. In practice \(\mathrm{out}_{\mathcal{A}}(K)\) can be a score vector sorting the key hypotheses (or parts of the key). If \(\mathcal{A}\) is restricted to opaque channels she/he does not learn anything through them. In this case, \(\mathcal{A}\) is said to be a black-box adversary.

Oracle \(\mathcal{O}\) draws uniformly at random a secret key \(K\).

for \(i = 1 ,\ldots,q\) do

\(\mathcal{A}\) specifies a public information \(\mathbf{t}_i\) and sends it to \(\mathcal{O}\).

\(\mathcal{O}\) draws uniformly at random the randomness \(\mathbf{R}_i\) and computes the corresponding wire values \(\mathbf{X}_i\).

for \(j =1,\ldots,l\) do

\(\mathcal{A}\) specifies \(\varphi_j\) to \(\mathcal{O}\).

\(\mathcal{O}\) sends back the corresponding leakage \(y_j\) from side-channel \(\varphi_j\) to \(\mathcal{A}\), under the constraint that \(\mathbf{\varphi}\) is an allowed tuple of channels.

\(\mathcal{A}\) outputs \(\mathrm{out}_{\mathcal{A}}(K)\)
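The game of Alg. [alg:SCA-setup] can be illustrated by a deliberately simplified sketch (entirely ours, not from the paper): a one-gate circuit \(X = K \oplus t\), an \(\overline{{\mathcal{E}}}\)-random-probing channel tuple, and an adversary that inverts the gate whenever a probe is not erased:

```python
# Deliberately simplified instance of the acquisition game: the circuit is
# a single XOR gate X = K ^ t; channels are erasure channels (random
# probing). All names and parameters here are ours, for illustration only.
import random

def erasure_channel(x, probing_prob):
    """Reveal x with probability E-bar, return None (erasure) otherwise."""
    return x if random.random() < probing_prob else None

def game(n_queries=50, probing_prob=0.2, key_bits=4):
    K = random.randrange(2 ** key_bits)        # oracle draws the secret key
    recovered = None
    for _ in range(n_queries):
        t = random.randrange(2 ** key_bits)    # chosen public information
        x = K ^ t                              # wire value of the gate
        y = erasure_channel(x, probing_prob)   # allowed channel tuple
        if y is not None:
            recovered = y ^ t                  # invert the known gate
    # out_A(K): recovered key if some probe succeeded, else a blind guess
    out = recovered if recovered is not None else random.randrange(2 ** key_bits)
    return out == K

random.seed(1)
success_rate = sum(game() for _ in range(2000)) / 2000
print(success_rate)   # close to 1 - (1 - 0.2)**50, i.e. essentially 1
```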

Figures of Merit

When \({\mathrm{out}}_{\mathcal{A}}( K)\) is a key ranking, the performance of the adversary is measured via three classical figures of merit: the success rate (\(\mathrm{SR}\)) \(\mathbb{P} _s\), the success rate of order \(o\) (\({\mathrm{SR}}_o\), success rate in \(o\) trials) \(\mathbb{P} _{s,o}\) (Standaert, Malkin, and Yung 2009) and the guessing entropy \(\mathrm{GE}\) (Massey 1994). We follow Ito et al. (Ito, Ueno, and Homma 2022, sec. 2.3) and express these metrics in terms of the a posteriori \(\mathrm{rank}\) of the key hypothesis given the side-information.

[def:fig-merit] Let \(K\) be a secret random variable taking values in a finite set \(\mathcal{K}\). Let \(Y\) be an arbitrary random variable representing a side-information. The success rate (\(\mathrm{SR}\)) is given by the Maximum a Posteriori (MAP) rule \[\mathbb{P} _s( K | Y )\triangleq \mathbb{P} (\mathrm{rank}(K|Y) = 1).\] The success rate of order \(o\), \({\mathrm{SR}}_o\), is \[\mathbb{P} _{s,o}( K | Y ) \triangleq \mathbb{P} (\mathrm{rank}(K|Y) \leqslant o).\] The guessing entropy (\(\mathrm{GE}\)) is the minimum average number of guesses of an optimal guessing strategy \[G( K | Y ) \triangleq \mathbb{E} \{ \mathrm{rank}(K|Y) \}.\] In general, \(\mathrm{rank}\) is defined as a function such that for each \(y\), \(k \mapsto \mathrm{rank}(k|y)\) is a permutation of the key space \(\mathcal{K}\). In an optimal guessing strategy, for each \(y \in \mathcal{Y}\), \(\mathrm{rank}(k|y) \in \{1,\ldots,|\mathcal{K}|\}\) is the rank of \(p_{K|Y}(k|y)\) in the list \(\{p_{K|Y}(k|y) \mid k \in \mathcal{K} \}\) sorted in decreasing order. In case of collisions, ties are resolved randomly, which does not change the statistical quantities at stake.

Since \(\mathbb{P} _{s}( K | Y ) = \mathbb{P} _{s,o=1}( K | Y )\) holds, results for \({\mathrm{SR}}\) will be derived from results in terms of \({\mathrm{SR}}_o\).

When no side-information is available, the adversary performs a blind guess whose figures of merit are constants depending only on the a priori key distribution. Namely, \[\mathbb{P} _{s,o}(K) \triangleq \mathbb{P} (\mathrm{rank}(K) \leqslant o) \qquad\text{and}\qquad G( K ) \triangleq \mathbb{E} \{ \mathrm{rank}(K) \}\] where \(\mathrm{rank}(k) \in \{1,\ldots,|\mathcal{K}|\}\) is the rank of \(p_{K}(k)\) in the list \(\{p_{K}(k) \mid k \in \mathcal{K} \}\) sorted in decreasing order.
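These rank-based definitions can be made concrete on a toy joint distribution (our own example: \(K\) uniform on four keys, \(Y\) an erased copy of \(K\)); here ties are broken by key index, which, as noted in Definition [def:fig-merit], leaves the statistical quantities unchanged:

```python
# Exact figures of merit for a toy distribution: K uniform on {0,1,2,3},
# Y = K with probability 1/2 and erased (None) otherwise. Ties in the MAP
# ordering are broken by key index, which leaves SR and GE unchanged.
from fractions import Fraction

quarter, half = Fraction(1, 4), Fraction(1, 2)
keys = [0, 1, 2, 3]

def posterior(y):
    if y is None:                            # erasure: posterior = prior
        return {k: quarter for k in keys}
    return {k: Fraction(int(k == y)) for k in keys}

def rank(k, post):
    """Rank of key k under the MAP ordering (descending posterior)."""
    order = sorted(keys, key=lambda j: (-post[j], j))
    return order.index(k) + 1

# All (k, y, probability) events of the joint distribution.
events = [(k, k, quarter * half) for k in keys] + \
         [(k, None, quarter * half) for k in keys]

SR = sum(p for k, y, p in events if rank(k, posterior(y)) == 1)
GE = sum(p * rank(k, posterior(y)) for k, y, p in events)
print(SR, GE)   # 5/8 and 7/4
```

The blind-guess baselines here are \(\mathbb{P}_{s}(K) = 1/4\) and \(G(K) = 5/2\), so both figures of merit improve with the side-information, as expected.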

The advantage of the adversary is quantified by \(\mathbb{P} _{s,o}(K|Y) - \mathbb{P} _{s,o}(K) \geqslant 0\), \(G(K) - G(K|Y) \geqslant 0\) and for statistical tests \(\Delta(K;Y) \geqslant 0\). If further \(K\) is uniformly distributed then \(\mathbb{P} _{s,o}(K) = \frac{o}{|\mathcal{K}|}\) and \(G(K)=\frac{|\mathcal{K}|+1}{2}\).

If an adversary \(\mathcal{A}\) is computationally bounded then she/he may not be able to fully exploit the side-information \(Y\). The corresponding figures of merit are denoted by \(\mathbb{P} _s^{\mathcal{A}}( K | Y )\), \(\mathbb{P} _{s,o}^{\mathcal{A}}( K | Y )\) and \(G^{\mathcal{A}}( K | Y )\). Obviously \(\mathbb{P} _s^{\mathcal{A}}( K | Y ) \leqslant \mathbb{P} _s( K | Y )\), \(\mathbb{P} _{s,o}^{\mathcal{A}}( K | Y ) \leqslant \mathbb{P} _{s,o}( K | Y )\) and \(G^{\mathcal{A}}( K | Y ) \geqslant G( K | Y )\).

Mathematical Key Properties of the CDC

In this section we derive the key mathematical properties of the \(\mathrm{CDC}\) that will be useful to derive security bounds. In Subsection 3.1 we exhibit the optimal factorization of a side-channel into a stochastically degraded erasure channel. This shows that the \(\mathrm{CDC}\) is the optimal parameter in the reduction from noisy leakages to the random probing model. The word optimal refers to the reduction from noisy leakage to a random probing adversary; we do not claim, however, that the \(\mathrm{CDC}\) yields an optimal bound on the success rate of a side-channel attack. In Subsection 3.2 we show how the figures of merit of a side-channel attack can be bounded using the \(\mathrm{CDC}\), leveraging the Data Processing Inequality (DPI). We show that the \(\mathrm{CDC}\) is amenable to evaluation in Subsection 3.3. Finally, we compare the \(\mathrm{CDC}\) to the informational measures (introduced in Subsection 2.3) in Subsection 3.4.

Optimal Channel Degradation

It is known that security in the noisy leakage model can be reduced to security in the random probing model. In (Duc, Dziembowski, and Faust 2014 Lemma 3), security in the noisy leakage model measured by \(\mathrm{TVI}\) is reduced to security in the random probing model. In (Prest et al. 2019 Lemma 3), security in the noisy leakage model measured by \(\mathrm{ARE}\) is reduced to security in the random probing model. Finally, (Duc, Faust, and Standaert 2019 Theorem 3) proves security in the noisy leakage model measured by \(\mathrm{MI}\) by upper bounding (Duc, Dziembowski, and Faust 2014 Lemma 3) using Pinsker’s inequality (Polyanskiy and Wu 2023 Thm 7.10).

The key property is that any channel can be seen as a stochastically degraded erasure channel. This stochastic degradation can be seen as a factorization like (Duc, Dziembowski, and Faust 2014 Lemma 4) of \(\mathrm{Noise}(X)\) into \(\mathrm{Noise}'(\varphi(X))\) where \(\varphi\) is an erasure channel (termed \(\epsilon\)-identity function in their presentation). In this section, we derive the optimal parameter in this reduction and show that it corresponds to the complementary Doeblin coefficient (\(\mathrm{CDC}\)). This unifies previous results, which can be seen as weakened versions of our reduction obtained by upper bounding the \(\mathrm{CDC}\).

The channel \(X \to Z\) is said to be stochastically degraded with respect to the channel \(X \to Y\) if there exists a channel \(Y \to Z\) such that \[\left (X \to \boxed{ P_{Y|X}} \to Y \to \boxed{ P_{Z|Y}} \to Z \right ) = \left ( X \to \boxed{ P_{Z|X} } \to Z \right ).\]

[thm:doeblin-degradation] Any channel \(X\to \boxed{ P_{Y|X}} \to Y\) is a stochastically degraded erasure channel: \[\label{eqn-degradation} X \to \boxed{\mathrm{EC}_{{\mathcal{E}}}^{\bot}} \to X' \to \boxed{ P_{Y|X'}} \to Y\] with the maximum erasure probability \[\label{eq:cdc-def} {\mathcal{E}}(X \rightarrow Y) = \int_{y} \inf \limits_{x \in \mathcal{X}} p_{Y|X}(y|x).\]

We provide a proof for completeness in Appendix 8.3.

\({\mathcal{E}}(X \rightarrow Y)\) is known in the literature as the Doeblin coefficient of ergodicity of the channel  (Doeblin 1937; Dobrushin 1956; Makur 2020; Makur and Singh 2023). In our context, \({\mathcal{E}}(X \rightarrow Y)\) represents the erasure probability while \(\overline{{\mathcal{E}}}(X \rightarrow Y)\) represents the probing probability. Theorem [thm:doeblin-degradation] was proved for binary input channels in (Bloch and Barros 2011 Prop. 6.4) in the context of physical layer security and wiretap channels, and in the general case for network coding in (Makur 2020 Lemma 6) and key agreement in (Gohari, Günlü, and Kramer 2020 Lemma 5). The \(\mathrm{CDC}\) appears for the first time in the side-channel literature as the value of \(\epsilon\) in (Duc, Dziembowski, and Faust 2019 Eqn. 9, Proof of Lemma 4).

Maximum erasure probability in Theorem [thm:doeblin-degradation] means that there exists at least one channel degradation (in the form of Equation [eqn-degradation]) achieving \({\mathcal{E}}= \mathcal{E}(X \to Y)\) and that there does not exist any channel degradation with \({\mathcal{E}}> \mathcal{E}(X \to Y)\). In this sense, the \(\mathrm{CDC}\) is the optimal parameter in the reduction from noisy leakage to the random probing model.

Obviously, the erasure channel is optimally degraded into itself, that is, \({\mathcal{E}}( X \to \boxed{ \mathrm{EC}_{{\mathcal{E}}}^{\bot}(X) } \to Y ) = {\mathcal{E}}\).

If \(X\) is a binary random variable and \(X \to Y\) a BSC with crossover probability \(0 \leqslant p \leqslant\frac{1}{2}\) then \(\mathcal{E}(X \to Y) = 2 p\). The factorization given by Theorem [thm:doeblin-degradation] is shown in Figure 2.

If \(X\) is binary and \(X \to Y\) is a \(Z\)-channel with parameter \(0 \leqslant e \leqslant 1\) then \(\mathcal{E}(X \to Y) = e\). The factorization given by Theorem [thm:doeblin-degradation] is shown in Figure 3.

Figure 2: Factorization of the BSC (Theorem [thm:doeblin-degradation]).

Figure 3: Factorization of the Z-channel (Theorem [thm:doeblin-degradation]).
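The two examples above can be checked numerically from the discrete form of Equation [eq:cdc-def], \({\mathcal{E}}= \sum_y \min_x p(y|x)\) (a small sketch of ours):

```python
# CDC of the BSC and the Z-channel via the discrete form of the Doeblin
# coefficient, E(X -> Y) = sum_y min_x p(y|x).
def doeblin(P):
    return sum(min(row[y] for row in P) for y in range(len(P[0])))

def bsc(p):        # binary symmetric channel with crossover probability p
    return [[1 - p, p], [p, 1 - p]]

def z_channel(e):  # input 1 flips to 0 with probability e; 0 is noiseless
    return [[1.0, 0.0], [e, 1 - e]]

for p in (0.0, 0.1, 0.25, 0.5):
    assert abs(doeblin(bsc(p)) - 2 * p) < 1e-12      # E = 2p
for e in (0.0, 0.3, 1.0):
    assert abs(doeblin(z_channel(e)) - e) < 1e-12    # E = e
print("verified: E(BSC_p) = 2p and E(Z_e) = e")
```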

Theorem [thm:doeblin-degradation] implies the following corollary on simulatability, analogous to (Duc, Dziembowski, and Faust 2014 Lemma 3), expressed in terms of the \(\mathrm{CDC}\):

Any \(\overline{\mathcal{E}}\)-noisy adversary \(\mathcal{A}\) with respect to \(\mathrm{CDC}\) can be perfectly simulated by a \(\overline{\mathcal{E}}\)-random probing adversary \(\mathcal{S}\).

Let us assume that \(\mathcal{A}\) receives the side information \(Y\) about \(K\) through the channel \(K \to Y\). By assumption \(\overline{\mathcal{E}}(K \to Y) \leqslant \overline{\mathcal{E}}\), so Theorem [thm:doeblin-degradation] implies that \((K \to Y) = (K \to K' \to Y)\) where \(K \to K'\) is an erasure channel with erasure probability at least \(\mathcal{E} = 1 - \overline{\mathcal{E}}\). An adversary \(\mathcal{S}\) that receives the side information \(K'\) about \(K\) through \(K \to K'\) is therefore \(\overline{\mathcal{E}}\)-random probing. Now \(\mathcal{S}\) can sample \(\tilde{Y}\) by passing its observation \(K'\) through the channel \(K' \to Y\). By construction, \(\mathcal{S}\) obtains \(\tilde{Y}\) equal in law to \(Y\).

Strengthened Data Processing Inequality

By Lemma [prop:EC-composition], the composition of several erasure channels is an erasure channel with a larger erasure probability. In general, composing channels can only reduce the leaked information. This is formalized by a strengthened data processing inequality (DPI):

[prop:doeb-concat] For any \(X \to Y \to Z\), \(\mathrm{CDC}\) satisfies the following strengthened-DPI \[\overline{{\mathcal{E}}}(X \rightarrow Z) \leqslant\overline{{\mathcal{E}}}(X \rightarrow Y) \overline{{\mathcal{E}}}(Y \rightarrow Z)\] which implies a preprocessing-DPI (preprocessing \(X\to Y\) can only reduce leakage): \(\overline{{\mathcal{E}}}(X \rightarrow Z) \leqslant\overline{{\mathcal{E}}}(Y \rightarrow Z)\), and a post-processing-DPI (post-processing \(Y\to Z\) can only reduce leakage) \(\overline{{\mathcal{E}}}(X \rightarrow Z) \leqslant\overline{{\mathcal{E}}}(X \rightarrow Y)\). In a nutshell, stochastic degradation reduces the value of the \(\mathrm{CDC}\).

By Theorem [thm:doeblin-degradation] we can degrade the channels \(X \to Y\) and \(Y \to Z\) so that the channel \(X \to Z\) rewrites as \[X \to \boxed{\mathrm{EC}_{{\mathcal{E}}(X \rightarrow Y)}^{\bot_x}} \to X' \to \boxed{ P_{Y|X'}} \to Y \to \boxed{\mathrm{EC}_{{\mathcal{E}}(Y \rightarrow Z)}^{\bot_y}} \to Y' \to \boxed{ P_{Z|Y'}} \to Z.\] Since by Lemma [lemma:erasure-channel-commutes] an erasure channel commutes with any other channel, the channel is equivalent to \[X \to \boxed{\mathrm{EC}_{{\mathcal{E}}(X \rightarrow Y)}^{\bot_x}} \to X' \to \boxed{\mathrm{EC}_{{\mathcal{E}}(Y \rightarrow Z)}^{\bot_y}} \to X'' \to \boxed{ P_{Y'|X''}^{\bot_y}} \to Y' \to \boxed{ P_{Z|Y'}} \to Z.\] We have two different erasure symbols, but by Lemma [lemma:composition-different-erasures] the concatenation of the two erasure channels is stochastically degraded with respect to an erasure channel with \(\overline{{\mathcal{E}}} = \overline{{\mathcal{E}}}(X \rightarrow Y) \overline{{\mathcal{E}}}(Y \rightarrow Z)\). So we have \[X \to \boxed{\mathrm{EC}_{{\mathcal{E}}}^{\bot}} \to \tilde{X} \to \boxed{ P_{X''|\tilde{X}}} \to X'' \to \boxed{ P_{Y'|X''}^{\bot_y}} \to Y' \to \boxed{ P_{Z|Y'}} \to Z.\] Now let \(p_{Z|\tilde{X}} = p_{Z|Y'} \circ p_{Y'|X''}^{\bot_y} \circ p_{X''|\tilde{X}}\), so that \[X \to \boxed{\mathrm{EC}_{{\mathcal{E}}}^{\bot}} \to \tilde{X} \to \boxed{ P_{Z|\tilde{X}} } \to Z.\] Since \({\mathcal{E}}(X \to Z)\) is the maximum erasure probability over such factorizations (Theorem [thm:doeblin-degradation]), we have \(\overline{{\mathcal{E}}}(X \to Z) \leqslant\overline{{\mathcal{E}}} = \overline{{\mathcal{E}}}(X \rightarrow Y) \overline{{\mathcal{E}}}(Y \rightarrow Z)\), which concludes the proof.

Makur and Singh (Makur and Singh 2023), with a very different proof, established a similar property of \(\mathrm{CDC}\) in the discrete setting and interpreted it as the sub-multiplicativity of \(\mathrm{CDC}\).
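The sub-multiplicativity of Proposition [prop:doeb-concat] is also easy to check numerically on discrete channels (a sketch of ours; the channel matrices are made up):

```python
# Numerical check of the strengthened DPI (sub-multiplicativity of the
# CDC): E-bar(X -> Z) <= E-bar(X -> Y) * E-bar(Y -> Z).
def doeblin(P):
    return sum(min(row[y] for row in P) for y in range(len(P[0])))

def compose(PYX, PZY):
    """P(z|x) = sum_y P(y|x) P(z|y) (row-stochastic matrix product)."""
    return [[sum(PYX[x][y] * PZY[y][z] for y in range(len(PZY)))
             for z in range(len(PZY[0]))] for x in range(len(PYX))]

PYX = [[0.8, 0.1, 0.1],                     # channel X -> Y
       [0.2, 0.5, 0.3]]
PZY = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]  # channel Y -> Z

cdc = lambda P: 1.0 - doeblin(P)            # probing probability E-bar
lhs, rhs = cdc(compose(PYX, PZY)), cdc(PYX) * cdc(PZY)
assert lhs <= rhs + 1e-12                   # 0.14 <= 0.6 * 0.3 = 0.18
print(lhs, rhs)
```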

Bounds on the Figures of Merit

An adversary tries to recover the sensitive variable \(X\) with the help of the side-information \(Z\) through the side-channel \(X \to Z\), which is stochastically degraded with respect to the channel \(X \to Y\). Intuitively, she/he can only perform worse than an adversary that accesses \(Y\). This intuition is formalized by a data processing inequality (DPI).

[lemma:DPI-metrics] Consider the channel \(U \to V \to W \to X\). Then \[I(V;W) \geqslant I(U;X) \quad \text{and} \qquad \Delta(V;W) \geqslant\Delta(U;X).\]

Consider the channel \(X \to Y \to Z\), where \(X\) is valued in the finite set \(\mathcal{X}\), the \({\mathrm{SR}}_o\) and \(\mathrm{GE}\) verify a post-processing DPI, \[\mathbb{P} _{s,o}(X|Y) \geqslant \mathbb{P} _{s,o}(X|Z) \qquad\text{and}\qquad G(X|Y) \leqslant G(X|Z).\]

It is well known that the \({\mathrm{SR}}_o\) and \(\mathrm{GE}\) verify a post-processing DPI (see for instance (Béguinot et al. 2022; Rioul 2023)). As shown in (Polyanskiy and Wu 2023 Theorem 7.16), the \(f\)-information also verifies a DPI which includes \(\mathrm{MI}\) and \(\mathrm{TVI}\).

[prop:doeblin-to-guessing] Let \(\lambda_{{\mathrm{SR}}_o} = (1 - \mathbb{P} _{s,o}(K)), \lambda_{ {\mathrm{GE}}} = (G( K ) - 1), \lambda_{{\mathrm{TVI}}}= (1 - \mathrm{exp}( -H_2(K)) )\) be three constants that only depend on the a priori secret key distribution (where \(H_2\) is the collision entropy). The adversary’s advantage for \({\mathrm{SR}},{\mathrm{GE}}\) and \({\mathrm{TVI}}\) can be bounded as follows: \[\begin{array}{rcccl} 0 & \leqslant& \mathbb{P} _{s,o}( K |Y) - \mathbb{P} _{s,o}(K) &\leqslant& \overline{{\mathcal{E}}} ( K \rightarrow Y ) \lambda_{{\mathrm{SR}}_o} , \\ 0 & \leqslant& G( K ) - G( K |Y) &\leqslant& \overline{{\mathcal{E}}} ( K \rightarrow Y ) \lambda_{ {\mathrm{GE}}} , \\ 0 & \leqslant& \Delta( K ;Y) &\leqslant& \overline{{\mathcal{E}}} ( K \rightarrow Y ) \lambda_{{\mathrm{TVI}}} . \end{array}\]

See Appendix 8.2.
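As an illustration (ours, with a made-up channel), the \({\mathrm{SR}}\) bound of Proposition [prop:doeblin-to-guessing] can be verified exactly on a small channel with uniform \(K\):

```python
# Exact check of the SR bound: 0 <= P_s(K|Y) - P_s(K) <= E-bar * (1 - P_s(K))
# on a small made-up channel with K uniform on {0, 1, 2}.
P = [[0.6, 0.3, 0.1],     # p(y|k): rows are keys k, columns observations y
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]
N = 3
prior = 1.0 / N

cdc_bar = 1.0 - sum(min(P[k][y] for k in range(N)) for y in range(N))
ps = sum(max(prior * P[k][y] for k in range(N)) for y in range(N))  # MAP SR
advantage = ps - prior            # blind guess succeeds with probability 1/N
bound = cdc_bar * (1.0 - prior)   # lambda_SR = 1 - P_s(K) for uniform K
assert 0.0 <= advantage <= bound + 1e-12
print(advantage, bound)
```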

We cannot directly deduce from Lemma [lemma:DPI-metrics] that \(\mathbb{P} _s^{\mathcal{A}}( K | Y )\), \(\mathbb{P} _{s,o}^{\mathcal{A}}( K | Y )\) and \(G^{\mathcal{A}}( K | Y )\) verify a DPI. But as shown by Duc et al. (Duc, Dziembowski, and Faust 2014 Lemma 2), if the channel \(K' \to Y\) can be efficiently sampled then \(\mathcal{A}\) can efficiently reproduce \(\tilde{Y}\), equal in distribution to \(Y\), from \(K'\). As a consequence, under this hypothesis we can assume that \(\mathbb{P} _s^{\mathcal{A}}( K | Y )\), \(\mathbb{P} _{s,o}^{\mathcal{A}}( K | Y )\) and \(G^{\mathcal{A}}( K | Y )\) also verify the bounds from Proposition [prop:doeblin-to-guessing]. As shown by Brian et al. (Brian et al. 2022), a large class of noisy channels \(K' \to Y\) can indeed be simulated almost for free. In fact, we do not even need an efficient simulation of the channel noise \(K' \to Y\). Indeed, since \(K \to K'\) is an erasure channel we obtain \[\mathbb{P} _s^{\mathcal{A}}(K|Y) \overset{(a)}{\leqslant} \mathbb{P} _s(K|Y) \overset{(b)}{\leqslant} \mathbb{P} _s(K|K') \overset{(c)}{=} \mathbb{P} _s^{\mathcal{A}}(K|K')\] where \((a)\) holds because a computationally bounded adversary can only perform worse than the optimal unbounded adversary, \((b)\) is the usual DPI, and \((c)\) is due to the fact that for an erasure channel an optimal attack is efficiently computable. For example, the attack that outputs the key when it is not erased and a random ranking otherwise is both optimal and efficient. The same derivation holds for \({\mathrm{GE}}\) (with reversed inequalities).

Discussion on the Optimality of the \(\mathrm{CDC}\)

The trade-off between a \(\mathrm{CDC}\)-based bound and an \(\mathrm{MI}\)-based bound depends on the nature of the channel \(K \to Y\), which is factorized optimally by the \({\mathrm{CDC}}\) into \(K \to K' \to Y\).

  • If \(K \to Y\) is an erasure channel then the factorization is \(K \to K' = Y\) so that using the DPI to consider \(K'\) instead of \(Y\) as a leakage can be done without any degradation of the final bound. In this case the bound using \(\mathrm{CDC}\) is optimal and holds with equality.

  • If the channel \(K \to Y\) is far from an erasure channel then the channel \(K' \to Y\) can be noisy. As a consequence, using the DPI to consider \(K'\) as the leakage instead of \(Y\) incurs an unavoidable loss. In this case the \(\mathrm{CDC}\)-based bound can be loose, and another informational leakage measure such as \(\mathrm{MI}\) or \(\mathrm{TVI}\) may be more suitable to capture the noise in the side-channel. A very bad channel in this respect is the channel \(K \to Y\) where \(K\) and \(Y\) both take their values in \(\{1,\ldots,|\mathcal{X}|\}\), \(p_{Y|K}(y|k) = (|\mathcal{X}|-1)^{-1}\) if \(y \neq k\) and \(p_{Y|K}(y|k) = 0\) otherwise. In this case, \(\overline{{\mathcal{E}}}( K \to Y) = 1\) while \(I(K;Y) = \log(|\mathcal{X}|/(|\mathcal{X}|-1))\) is small.

Both extreme cases are toy examples that do not occur in practice. Depending on the nature of the practical side-channel, the \(\mathrm{CDC}\)-based bound will be more or less tight.
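The second toy channel can be made explicit (a sketch of ours, with \(|\mathcal{X}| = 16\)): the anti-identity channel has maximal \(\mathrm{CDC}\) while its mutual information is tiny:

```python
# The "anti-identity" channel: p(y|k) = 1/(N-1) for y != k, 0 for y = k.
# Its CDC is maximal (E-bar = 1) although I(K;Y) = log(N/(N-1)) is tiny.
from math import log2

N = 16
P = [[0.0 if y == k else 1.0 / (N - 1) for y in range(N)] for k in range(N)]

cdc_bar = 1.0 - sum(min(P[k][y] for k in range(N)) for y in range(N))
# For uniform K, Y is uniform too, so I(K;Y) = H(Y) - H(Y|K) in bits:
mi = log2(N) - log2(N - 1)
assert cdc_bar == 1.0
print(cdc_bar, mi)   # mi is about 0.093 bits
```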

Theoretical Expressions for Concrete Evaluation

In this Subsection we show how the \(\mathrm{CDC}\) can be evaluated in a practical setting. We first derive a closed-form expression for univariate functional channels perturbed by additive Gaussian noise. (This corresponds to \((\sigma,f)\)-additive adversaries.) Then we show how this allows us to bound the \(\mathrm{CDC}\) in the multivariate case perturbed by additive Gaussian noise with a given covariance matrix \(\mathbf{\Sigma}\). This corresponds to the widely used setting of template attacks (TA) (Chari, Rao, and Rohatgi 2002; Lemke-Rust and Paar 2007; Unterluggauer et al. 2017).

We show that in this model the \(\mathrm{CDC}\) is suitable for concrete evaluation, even when the noise is multivariate and potentially high-dimensional.

Univariate Case

The real-valued r.v. \(Z\) is said to be radially symmetric decreasing if \(p_Z(z)=p_Z(|z|)\) and \(p_Z\) is decreasing in \(|z|\).

We derive a closed-form expression for the \(\mathrm{CDC}\) when the channel is functional, perturbed by a radially symmetric decreasing additive noise. This includes the Gaussian, Laplacian and Cauchy distributions, for example. As the \(\mathrm{CDC}\) only depends on the channel, the result does not depend on the probability distribution of \(X\). As expected, the \(\mathrm{CDC}\) tends to zero as the noise increases.

[prop:closed-form-univariate] Let \(X\) be a random variable taking values in \(\mathcal{X}\) and \(Y = f(X) + Z\) where \(f\) is an arbitrary real-valued function and \(Z\) is a radially symmetric decreasing noise with survival function \(S\). Let \(m = \inf \limits_{x \in \mathcal{X}} f(x)\) and \(M = \sup \limits_{x \in \mathcal{X}} f(x)\). Then \[{\mathcal{E}}( X \rightarrow Y) = 2 S \left ( \frac{ M - m}{ 2 } \right ).\] For the widely adopted linear leakage model, \(\mathcal{X} = \mathbb{F}_2^n\) and \(f(X) = \sum_{i=1}^n a_i X_i\) where \(\mathbf{a} = (a_1,\ldots,a_n) \in \mathbb{R}^n\) and \(X_i\) denotes the \(i\)-th bit in the binary representation of \(X\), we have \(m = -\sum_i a_i^{-}\) and \(M = \sum_i a_i^+\) so that \(M-m = \sum_i |a_i| = \| \mathbf{a} \|_1\) and the expression simplifies to \[{\mathcal{E}}( X \rightarrow Y) = 2 S \left ( \frac{ \| \mathbf{a} \|_1}{ 2 } \right ).\] If \(Z \sim \sigma \mathcal{N}(0,1)\), the survival function is \(S(t) = Q(t/\sigma)\) where \(Q\) is the Gaussian \(Q\)-function, and \[\overline{{\mathcal{E}}}( X \rightarrow Y) = 1 - 2 Q \left ( \frac{ \| \mathbf{a} \|_1}{ 2 \sigma } \right ) \overset{\sigma \to \infty}{=} \frac{ \| \mathbf{a} \|_1 }{\sqrt{ 2 \pi}} \frac{1}{\sigma} + O \left ( \sigma^{-3} \right ).\] For the classical Hamming weight (HW) model, \(\mathbf{a} = (1,\ldots,1)\) and \(\| \mathbf{a} \|_1 = n\).

See Appendix 8.8.

When \(f\) is constant, \(m=M\) and we obtain \(\mathcal{E}(X \to Y)=1\). This is expected as in this case the channel is opaque.
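Proposition [prop:closed-form-univariate] can be cross-checked against a direct numerical evaluation of \(\int_{y} \inf_{x} p_{Y|X}(y|x)\,\mathrm{d}y\) in the Gaussian HW model (a sketch of ours, pure standard library):

```python
# Cross-check of the closed form E = 2*Q(n/(2*sigma)) for the 4-bit HW
# model against a direct Riemann sum of the integral of inf_x p(y|x).
from math import erfc, sqrt, pi, exp

def Q(x):                       # Gaussian tail function Q(x) = P(N > x)
    return 0.5 * erfc(x / sqrt(2.0))

n, sigma = 4, 2.0               # HW levels f(x) in {0, ..., n}
closed_form = 2 * Q(n / (2 * sigma))

def gauss_pdf(y, mu, s):
    return exp(-((y - mu) ** 2) / (2 * s * s)) / (s * sqrt(2 * pi))

dy, lo, hi = 1e-3, -30.0, 34.0  # the tails beyond [lo, hi] are negligible
numeric = sum(min(gauss_pdf(lo + i * dy, mu, sigma) for mu in range(n + 1)) * dy
              for i in range(int((hi - lo) / dy)))

assert abs(closed_form - numeric) < 1e-4
print(closed_form)              # 2*Q(1), about 0.317
```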

Multivariate Case

Let \(f : x \in \mathbb{F}\mapsto f(x) \in \mathbb{R}^m\) be a multivariate leakage function and \(\mathbf{Y} = f(X) + \mathbf{Z}\) with \(\mathbf{Z} \sim \mathcal{N}(0, \mathbf{\Sigma})\). Then \(\tilde{\mathbf{Y}} \triangleq \mathbf{W} \mathbf{Y} = ( \mathbf{W} \cdot f)(X) + \tilde{\mathbf{Z}}\) where \(\mathbf{W}\) is a given whitening matrix (e.g. \(\mathbf{W} = \mathbf{\Sigma}^{ - \frac{1}{2}}\)) so that \(\tilde{\mathbf{Z}} = \mathbf{W} \mathbf{Z} \sim \mathcal{N}(0,\mathbf{I}_m)\). Given \(X\), \(\tilde{\mathbf{Y}}\) is a Gaussian vector whose covariance matrix is diagonal. Hence, by the standard properties of Gaussian vectors, the different components of \(\tilde{\mathbf{Y}}\) are independent given \(X\), and Lemma [lemma:single-letter] implies that \[\label{eq:multivariate} {\mathcal{E}}(X \to \mathbf{Y}) = {\mathcal{E}}(X \to \tilde{\mathbf{Y}}) \geqslant\prod_{i=1}^{m}{\mathcal{E}}( X \to \tilde{Y}_i ).\] Since every channel \(X \to \tilde{Y}_i\) is a univariate channel with additive Gaussian noise, its \(\mathrm{CDC}\) is given by Proposition [prop:closed-form-univariate]. This methodology yields a conservatively biased estimate of \({\mathcal{E}}(X \to \mathbf{Y})\) (it can only underestimate it) from unbiased estimates of each \(\mathcal{E}( X \to \tilde{Y}_i )\). This approach is conservative but ensures that we do not overestimate the security parameter, which would result in a false sense of security. This is in contrast with the perceived information (PI (Renauld et al. 2011 Eqn. 3)), which is a negatively biased estimator of \(\mathrm{MI}\), as shown in (Bronchain et al. 2019; Ito, Ueno, and Homma 2022).

As an example, consider the channel \(\mathbf{Y} \triangleq ( f(X), f(X))^T + \mathbf{Z}\) where \(f\) is a univariate leakage function and \(\mathbf{Z} \sim \mathcal{N}(0,\mathbf{\Sigma})\) with covariance matrix \(\mathbf{\Sigma} = \sigma^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\) where \(\sigma\) is the noise standard deviation and \(\rho \in [-1,1]\) is the correlation coefficient of the noise components. Let \(m = \inf_x f(x)\) and \(M = \sup_x f(x)\). With a little linear algebra we observe that \(\mathbf{\Sigma} = \mathbf{P} \mathbf{D} \mathbf{P}^T\) where \(\mathbf{P} = \mathbf{P}^T= \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\) is orthonormal and \(\mathbf{D} = \sigma^2 \begin{pmatrix} 1\!+\!\rho & 0 \\ 0 & 1\!-\!\rho \end{pmatrix}\) is diagonal. We compare two noise-whitening techniques.

Karhunen–Loève transform (PCA Whitening): Let \(\mathbf{W} = \mathbf{D}^{-\frac{1}{2}} \mathbf{P}^T\) be the noise-whitening matrix of the Karhunen–Loève transform, with \(\mathbf{W} \mathbf{Z} \triangleq \tilde{\mathbf{Z}} \sim \mathcal{N}(0,\mathbf{I}_2)\); then \[\tilde{\mathbf{Y}} = \mathbf{W} \mathbf{Y} = \sqrt{\frac{2}{1+\rho}} \frac{1}{\sigma} \begin{pmatrix} f(X) \\ 0 \end{pmatrix} + \tilde{\mathbf{Z}}.\] By Proposition [prop:closed-form-univariate], \({\mathcal{E}}(X \to \tilde{Y}_1) = 2 Q \Bigl ( \sqrt{\frac{2}{1+\rho}} \frac{1}{\sigma} \frac{M-m}{2} \Bigr ).\) Furthermore, \({\mathcal{E}}(X \to \tilde{Y}_2) = 1\), so that Equation [eq:multivariate] becomes an equality here: \[\label{eqn:gaussian} {\mathcal{E}}(X \to \mathbf{Y}) = 2 Q \Bigl ( \sqrt{\frac{2}{1+\rho}} \frac{1}{\sigma} \frac{M-m}{2} \Bigr ) = 1 - \frac{1}{\sigma} \sqrt{\frac{2}{1+\rho}}\frac{M-m}{\sqrt{2\pi}} + O(\sigma^{-2}).\] Equation [eqn:gaussian] is consistent:

  • When \(\rho = 1\), the leakage is repeated and as expected from Lemma [lemma:doeb-consistencency] the \(\mathrm{CDC}\) remains unchanged.

  • When \(\rho=0\), we have two independent noises, and it is optimal to average the samples such that the global noise variance is halved.

  • As \(\rho \to -1\), \({\mathcal{E}}(X \to \mathbf{Y}) \to 0\) which is expected since averaging both components completely cancels out the noise and \(f(X)\) is revealed.

Mahalanobis transform (ZCA Whitening): Let \(\mathbf{W} = \mathbf{\Sigma}^{ - \frac{1}{2}} = \mathbf{P} \mathbf{D}^{-\frac{1}{2}} \mathbf{P}^T\) be the noise-whitening matrix of the Mahalanobis transform, with \(\mathbf{W} \mathbf{Z} \triangleq \tilde{\mathbf{Z}} \sim \mathcal{N}(0,\mathbf{I}_2)\); then \[\tilde{\mathbf{Y}} = \mathbf{\Sigma}^{ - \frac{1}{2} } \mathbf{Y} = \sqrt{\frac{1}{1+\rho}} \frac{1}{\sigma} \begin{pmatrix} f(X) \\ f(X) \end{pmatrix} + \tilde{\mathbf{Z}}.\] By Proposition [prop:closed-form-univariate], \({\mathcal{E}}(X \to \tilde{Y}_i) = 2 Q \Bigl ( \sqrt{\frac{1}{1+\rho}} \frac{1}{\sigma} \frac{M-m}{2} \Bigr ).\) Equation [eq:multivariate] becomes \[{\mathcal{E}}(X \to \mathbf{Y}) \geqslant 4 Q^2 \Bigl ( \sqrt{\frac{1}{1+\rho}} \frac{1}{\sigma} \frac{M-m}{2} \Bigr ).\] Interestingly, with the Mahalanobis transform we only obtain an inequality in [eq:multivariate], which shows that the choice of the whitening technique can affect the tightness of the bound.
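The gap between the two whitening choices can be quantified directly from the closed forms above (a small sketch of ours; the parameter values are arbitrary):

```python
# PCA whitening gives the exact CDC of the bivariate example, while the
# Mahalanobis (ZCA) product is only a lower bound; parameter values are
# arbitrary (e.g. a 4-bit HW leakage with M - m = 4).
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

span, sigma, rho = 4.0, 2.0, 0.5      # span = M - m

pca_exact = 2 * Q(sqrt(2 / (1 + rho)) * span / (2 * sigma))
zca_bound = 4 * Q(sqrt(1 / (1 + rho)) * span / (2 * sigma)) ** 2

assert zca_bound <= pca_exact          # the ZCA product bound is looser
print(pca_exact, zca_bound)
```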

Comparison with the Other Informational Leakage Measures

Let \(X\) be a random variable taking values in a finite set \(\mathcal{X}\) and consider a channel \(X \to Y\). The following inequalities, proved in Appendix 10, hold:

[lemma:sandwich-bound] \[\left . \begin{array}{l} \dfrac{I(X;Y)}{ \log |\mathcal{X}|} \leqslant\dfrac{I(X;Y)}{ H(X)} \\[2ex] \left . \begin{array}{l} \dfrac{ {\mathrm{ARE}}(X;Y)}{2 \gamma_X \lambda_{{\mathrm{TVI}}}} \\[2ex] \dfrac{\beta(X;Y)}{ 2 \lambda_{{\mathrm{TVI}}} } \end{array} \right \} \leqslant\dfrac{\Delta(X;Y)}{\lambda_{{\mathrm{TVI}}}} \\[2ex] \dfrac{ \exp \left ( \mathcal{L}(X \to Y) \right ) - 1}{|\mathcal{X}| - 1} \end{array} \right \} \leqslant\overline{{\mathcal{E}}}(X \to Y) \leqslant\left \{ \begin{array}{l} {\mathrm{ARE}}(X;Y) \leqslant{\mathrm{RE}}(X;Y) \\[1ex] \gamma_X \beta(X;Y) \\[1ex] \gamma_X \Delta(X;Y) \leqslant\gamma_X \left ( \dfrac{I(X;Y)}{2 \log e} \right )^{\frac{1}{2}} \\[1ex] ( |\mathcal{X}| - 1) ( \exp \left ( \mathcal{L}(X \to Y) \right ) - 1 ) \end{array} \right .\] where \(H\) is the Shannon entropy, \(H_2\) is the collision entropy, \(\lambda_{{\mathrm{TVI}}} = 1 - \mathrm{exp}( - H_2(X) )\) and \(\gamma_X \triangleq \bigl ( \inf \limits_{x \in \mathcal{X}} p_{X}(x) \bigr )^{-1}\). If \(X \sim \mathcal{U}(\mathcal{X})\) then \(\gamma_X = |\mathcal{X}|\) and \(\lambda_{{\mathrm{TVI}}} = 1 - \frac{1}{|\mathcal{X}|}\).

Lemma [lemma:sandwich-bound] does not lower bound the \(\mathrm{CDC}\) in terms of \(\mathrm{RE}\) because no meaningful such bound exists. Indeed, as remarked by Masure and Standaert (Masure and Standaert 2023), if \(X \to Y\) is an erasure channel with an arbitrarily small parameter \({\mathcal{E}}> 0\) then \({\mathrm{RE}}(X;Y) = |\mathcal{X}|-1\). As a consequence, \(\mathrm{RE}\) cannot provide a smooth reduction from noisy leakages to the random probing model. We compare the different leakage measures in Table [tab:comp-metrics] via three criteria:

  • the ratio of their lower bound to their upper bound in Lemma [lemma:sandwich-bound] when \(X \sim \mathcal{U}(\mathcal{X})\) (as a measure of relative looseness);

  • their maximal value (which measures their normalization);

  • their asymptotic values in the Hamming weight leakage model when \(X \sim \mathcal{U}(\mathbb{F}_2^n)\) hence \(|\mathcal{X}|=2^n\) (which measures their performance for a typical leakage model).

This allows us to label the introductory Figure 1.

[tab:comp-metrics]

Table [tab:comp-metrics] shows that \(\mathrm{CDC}\) and \(\mathrm{ARE}\) have the same asymptotic expression in the Hamming weight leakage model at high noise. While \(\mathrm{ARE}\) is suboptimal and does not satisfy the properties of the \(\mathrm{CDC}\), it provides a tight reduction to the random probing model in this scenario. However, the range of \(\mathrm{ARE}\) is \([0,|\mathcal{X}|-1]\) and its relative looseness is \(2 (|\mathcal{X}| - 1)\), which indicates that, in a sense, \({\mathrm{ARE}}\) contains the field size in its definition. In any case, it remains preferable to use the \(\mathrm{CDC}\), which provides the optimal reduction from noisy leakage to random probing.

Direct Proofs via CDC and Prouff-Rivain Subsequences

In this section we revisit the direct security proof in the noisy leakage model based on Prouff and Rivain’s subsequence decomposition (Prouff and Rivain 2013) to obtain a new derivation in terms of \(\mathrm{CDC}\). The subsequence decomposition of Prouff and Rivain is recalled in Subsection 4.1. We first prove security for subsequences of type 1 and 2 in Subsection 4.2. The security of type 3 and type 4 subsequences is obtained in Subsection 4.3 and Subsection 4.4 respectively. Finally, we combine the security bounds obtained for each subsequence into a security bound for the whole circuit in Subsection 4.5 and compare it with a \(\mathrm{MI}\) based bound in Subsection 4.6.

Subsequence Decomposition

For typical block ciphers like the AES, featuring substitution boxes (denoted by Sboxes), Prouff and Rivain (Prouff and Rivain 2013, sec. 4.2) decompose the computations in four different types of subsequences:

Type 1

\((z_i \leftarrow g(x_i))_i\) where \(g\) is a linear function (of the block cipher)

Type 2

\((x_i \leftarrow g(y_i))_i\) where \(g\) is an affine function (of Sbox evaluation)

Type 3

\((v_{i,j} \leftarrow a_i b_j)_{i,j}\) (First step of non-linear secure multiplication)

Type 4

\((t_{i,j} \leftarrow t_{i,j-1} + v_{i,j})_{i,j}\) (Last step of non-linear secure multiplication)

This decomposition has become standard to derive security proofs (Prest et al. 2019; Masure and Standaert 2023). Note that in this model it is classically assumed that the gates leak.

The first type of subsequences considers linear operations on a shared uniform variable. The second type considers affine operations on a shared polynomial expression of a uniform variable; this is typically the case of linear operations within Sboxes. The third type deals with the first part of the ISW multiplications, involving the cross products of the input shares of two (not necessarily independent) random variables. Finally, type 4 subsequences correspond to the compression layer of the ISW multiplication.
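The data flow of the type 3 and type 4 subsequences can be illustrated on a toy single-bit example. The Python sketch below works over \(\mathbb{F}_2\); it deliberately omits the fresh randomness that the actual ISW multiplication interleaves during compression, so it only illustrates the data flow of the subsequences, not a probing-secure gadget.

```python
import secrets
from functools import reduce
from operator import xor

def share(x, d):
    # (d+1)-share additive sharing of a bit x over GF(2): x = s_0 ^ ... ^ s_d
    s = [secrets.randbelow(2) for _ in range(d)]
    return s + [x ^ reduce(xor, s, 0)]

def type3_cross_products(a_sh, b_sh):
    # Type 3: cross products v_{i,j} = a_i * b_j; the (d+1)^2 values v_{i,j}
    # form a sharing of a*b since XOR_{i,j} a_i b_j = (XOR_i a_i)(XOR_j b_j).
    return [[ai & bj for bj in b_sh] for ai in a_sh]

def type4_compression(v):
    # Type 4: cumulative sums t_{i,j} = t_{i,j-1} ^ v_{i,j}; the row totals
    # (t_{i,d})_i form again a (d+1)-share encoding of a*b.
    return [reduce(xor, row, 0) for row in v]

d, a, b = 3, 1, 1
a_sh, b_sh = share(a, d), share(b, d)
c_sh = type4_compression(type3_cross_products(a_sh, b_sh))
print(reduce(xor, c_sh) == (a & b))   # True: shares recombine to a*b
```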

Flaws for Type 3 Subsequences in the State of the Art

In Appendix 11 of the long version, we list some flaws in the preceding direct proofs in the noisy leakage model (Prouff and Rivain 2013; Prest et al. 2019; Masure and Standaert 2023). While these three proofs are different in nature, the flaws appear at a similar step: proving security for type 3 subsequences. For (Prouff and Rivain 2013; Prest et al. 2019), it is due to an incorrect derivation of the chain rule for conditional probabilities. For (Masure and Standaert 2023), it is due to the fact that the function used in Mrs. Gerber’s Lemma is convex in each variable when the others are fixed, but not jointly convex. We patch part of the flaws in Appendix 11 using the reductions of \(\mathrm{MI}\), \(\mathrm{ARE}\) and EN to the \(\mathrm{CDC}\) presented in Subsection 3.4. In this section, \(X\) is a sensitive random variable taking values in \(\mathcal{X} = \mathbb{F}_2^n\) that can be expressed as a function of the secret \(K\) and public information (e.g., plaintext or ciphertext).

Security of Type 1 and Type 2 Subsequences

In (Béguinot, Cheng, et al. 2023 Coro. 1), Béguinot et al. leveraged Mrs. Gerber’s Lemma (MGL) to derive security bounds for encodings in terms of \(\mathrm{MI}\). Masure and Standaert (Masure and Standaert 2023 Coro. 4 & 5) showed how to exploit such an MGL to prove the security of type 1 and type 2 subsequences. We now show that the \(\mathrm{CDC}\) also satisfies a form of MGL that quantifies the security of both type 1 (\(f = \mathrm{id}\)) and type 2 subsequences (generic \(f\)).

[MGL:doeblin] Let \(\mathbf{G} =(G_i)_{i=0}^d\) be a \(d\)-th order encoding of \(G=g(X)\) where \(g\) is a given function. Assume that each share leaks independently through the side-channels \((G_i \to Y_i)_{i=0}^d\). Let \(\mathbf{Y} \triangleq (Y_0,\ldots,Y_d)\) then, \(\overline{{\mathcal{E}}} ( X \rightarrow \mathbf{Y}) \leqslant\prod_i\overline{{\mathcal{E}}}( G_i \rightarrow Y_i).\)

The key observation is that by the secret sharing property a masked value is probed if and only if all of its shares are probed. See Appendix 8.9.

Security of Type 3 Subsequences

Let \(\mathbf{G} =(G_i)_{i=0}^d\) and \(\mathbf{H} =(H_i)_{i=0}^d\) be \(d\)-th order encodings of \(g(X)\) and \(h(X)\), where \(g,h\) are given functions. This section proves the security of type 3 subsequences involving the computations with the pairs \((G_i,H_j)\). We need to introduce a family of polynomials to express the security bound:

[def:rook-poly] Let \((E_{i,j})_{0 \leqslant i,j \leqslant d}\) be a collection of independent events with respective probabilities \((\overline{{\mathcal{E}}}_{i,j})_{0 \leqslant i,j \leqslant d}\). Let \[\label{eqn:def-rook-poly} \Upsilon( (\overline{{\mathcal{E}}}_{i,j})_{0 \leqslant i,j \leqslant d} ) \triangleq \mathbb{P} \left ( (\cap_{i=0}^d \cup_{j=0}^d E_{i,j}) \cup (\cap_{j=0}^d \cup_{i=0}^d E_{i,j}) \right ).\] For short, \(\Upsilon_d(\overline{{\mathcal{E}}}) \triangleq \Upsilon( (\overline{{\mathcal{E}}}_{i,j})_{0 \leqslant i,j \leqslant d} )\) when \(\overline{{\mathcal{E}}}_{i,j}=\overline{{\mathcal{E}}}\) for all \(i,j\).

In fact, \(\Upsilon_d\) corresponds to the domination polynomial of the rook graph (Mertens 2024). It can be sandwiched explicitly as follows:

\[\begin{aligned} \max \{ \prod \limits_{i=0}^d \bigl ( 1 - \prod \limits_{j=0}^d &{\mathcal{E}}_{i,j} \bigr ) , \prod \limits_{j=0}^d \bigl ( 1 - \prod \limits_{i=0}^d {\mathcal{E}}_{i,j} \bigr )\} \leqslant \Upsilon( (\overline{{\mathcal{E}}}_{i,j})_{0 \leqslant i,j \leqslant d} ) \\ &\leqslant \min \{\prod \limits_{i=0}^d \bigl ( 1 - \prod \limits_{j=0}^d {\mathcal{E}}_{i,j} \bigr ) + \prod \limits_{j=0}^d \bigl ( 1 - \prod \limits_{i=0}^d {\mathcal{E}}_{i,j} \bigr ),1 \}. \end{aligned}\] In particular when for all \(i,j\), \(\overline{{\mathcal{E}}}_{i,j}=\overline{{\mathcal{E}}}\) it yields: \[\label{eq:sandwich-doeb} \left ( 1 - \mathcal{E}^{d+1} \right )^{d+1} \leqslant\Upsilon_d(\overline{\mathcal{E}}) \leqslant\min \{ 2 \left ( 1 - \mathcal{E}^{d+1} \right )^{d+1} , 1 \}.\]

By monotonicity of probability, \[\begin{aligned} \Upsilon( (\overline{\mathcal{E}}_{i,j})_{0 \leqslant i,j \leqslant d} ) &= \mathbb{P} \left ( (\cap_{i=0}^d \cup_{j=0}^d E_{i,j}) \cup (\cap_{j=0}^d \cup_{i=0}^d E_{i,j}) \right ) \\ &\geqslant \mathbb{P} (\cap_{i=0}^d \cup_{j=0}^d E_{i,j}) = \prod_{i=0}^d \bigl ( 1 - \prod_{j=0}^d \mathcal{E}_{i,j} \bigr ). \end{aligned}\] And similarly \(\Upsilon( (\overline{\mathcal{E}}_{i,j})_{0 \leqslant i,j \leqslant d} ) \geqslant\prod_{j=0}^d \bigl ( 1 - \prod_{i=0}^d \mathcal{E}_{i,j} \bigr )\). As a consequence \[\max \{ \prod_{i=0}^d \bigl ( 1 - \prod_{j=0}^d \mathcal{E}_{i,j} \bigr ) , \prod_{j=0}^d \bigl ( 1 - \prod_{i=0}^d \mathcal{E}_{i,j} \bigr )\} \leqslant\Upsilon( (\overline{\mathcal{E}}_{i,j})_{0 \leqslant i,j \leqslant d} ).\] Also, by the union bound, \[\begin{aligned} \Upsilon( (\overline{\mathcal{E}}_{i,j})_{0 \leqslant i,j \leqslant d} ) &\leqslant \mathbb{P} (\cap_{i=0}^d \cup_{j=0}^d E_{i,j}) + \mathbb{P} (\cap_{j=0}^d \cup_{i=0}^d E_{i,j}) \\ &= \prod_{i=0}^d \bigl ( 1 - \prod_{j=0}^d \mathcal{E}_{i,j} \bigr ) + \prod_{j=0}^d \bigl ( 1 - \prod_{i=0}^d \mathcal{E}_{i,j} \bigr ). \end{aligned}\] Finally, \(\Upsilon( (\overline{{\mathcal{E}}}_{i,j})_{0 \leqslant i,j \leqslant d} ) \leqslant 1\) since it is a probability.

Stephan Mertens (Mertens 2024 Thm. 4) provides a recursive formula for this polynomial, which gives an efficient way to compute exhaustively the coefficients of \(\Upsilon_d\). Additional properties are mentioned in Appendix 9.
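For small orders, \(\Upsilon_d\) can also be checked against the sandwich bound of Equation [eq:sandwich-doeb] by brute force. The Python sketch below enumerates all \(2^{(d+1)^2}\) probe patterns, which is feasible only for small \(d\); the recursive formula of (Mertens 2024) is the efficient route.

```python
from itertools import product

def upsilon(d, p):
    # Upsilon_d(p): probability that, on a (d+1)x(d+1) board of independent
    # events E_{i,j} of probability p, some event occurs in every row or in
    # every column. Brute force over all 2^((d+1)^2) patterns (small d only).
    n = d + 1
    total = 0.0
    for pat in product((0, 1), repeat=n * n):
        rows = [pat[i * n:(i + 1) * n] for i in range(n)]
        cover_rows = all(any(r) for r in rows)
        cover_cols = all(any(rows[i][j] for i in range(n)) for j in range(n))
        if cover_rows or cover_cols:
            k = sum(pat)
            total += p ** k * (1 - p) ** (n * n - k)
    return total

d, p = 2, 0.3          # p = probe probability (CDC); erasure prob is 1 - p
e = 1 - p
lo = (1 - e ** (d + 1)) ** (d + 1)
hi = min(2 * (1 - e ** (d + 1)) ** (d + 1), 1.0)
print(lo <= upsilon(d, p) <= hi)   # True: sandwich of Eq. [eq:sandwich-doeb]
```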

Rationale For The Rook Domination Polynomial

Within the type 3 subsequences of ISW, the cross-wise terms \(G_i H_j\) are computed, so that each pair \((G_i, H_j)\) leaks. After degradation into an erasure channel, for each pair \((i,j)\) the degraded adversary \(\mathcal{A}\) either probes \((G_i, H_j)\) or receives an erasure symbol. Let \(E_{i,j}\) be the event that \(\mathcal{A}\) probes the pair \((G_i, H_j)\). By the secret sharing property, the sensitive value leaks if and only if \(\mathcal{A}\) probes each \(G_i\) or each \(H_j\).

Let \(E \triangleq (\cap_{i=0}^d \cup_{j=0}^d E_{i,j}) \cup (\cap_{j=0}^d \cup_{i=0}^d E_{i,j})\). Equation [eqn:def-rook-poly] defines \(\Upsilon \triangleq \mathbb{P} (E)\). If one represents the events \(E_{i,j}\) on a checkerboard, the sensitive value is probed if and only if some event \(E_{i,j}\) occurs in each line (\(\cap_{i=0}^d \cup_{j=0}^d E_{i,j}\)) or in each column (\(\cap_{j=0}^d \cup_{i=0}^d E_{i,j}\)). This results in Equation [eqn:def-rook-poly].

Considering that \(E_{i,j}\) occurs when a rook is placed at coordinates \((i,j)\) of the checkerboard, the event \(E\) is realized if and only if every position in the checkerboard is attacked (or occupied) by at least one rook. The domination polynomial of the rook graph counts the number of such configurations of \(m\) rooks. Therefore, it allows one to compute \(\Upsilon_d\) when all \(E_{i,j}\) are equiprobable.

We now show the security of type 3 subsequences using the domination polynomial of the rook graph:

[lemma:type3] Consider the channels \(((G_i,H_j) \to Y_{i,j})_{0\leqslant i,j \leqslant d}\) and let \(\mathbf{Y} \triangleq (Y_{i,j})_{0\leqslant i,j \leqslant d}\). Then one has \[\overline{{\mathcal{E}}}(X \rightarrow \mathbf{Y} ) \leqslant\Upsilon( (\overline{{\mathcal{E}}} ( (G_i,H_j) \rightarrow Y_{i,j} ))_{0 \leqslant i,j \leqslant d} ).\]

See Appendix 8.10.

Security of Type 4 Subsequences

Type 4 subsequences consider the compression layer in multiplication gadgets. At this stage, the sensitive variable is masked in \((d+1)^2\) shares. In the compression layer, the shares are grouped in \(d+1\) groups each of size \(d+1\), and are recombined to obtain a \(d\)-th order encoding of the sensitive variable. Formally, let \((V_{i,j})\) be an encoding in \((d+1)^2\) shares of \(f(X)\) where \(f\) is a given function. Let \(T_{i,0} = V_{i,0}\) and \(T_{i,j} = T_{i,j-1} \oplus V_{i,j}\). In particular \((T_{i,d})_{i=0}^d\) is a \(d\)-th order encoding of \(f(X)\).

[lemma:recombination-doeb] Consider the cumulative sum function \(h_d : (x_0,\ldots,x_d) \in \mathbb{F}^{d+1} \mapsto (x_0,x_0+x_1,\ldots,x_0+\ldots+x_d) \in \mathbb{F}^{d+1}\). Then \((V_{i,0},\ldots,V_{i,d})\) is a \(d\)-th order sharing of \(T_{i,d}\), and since \(T_{i,j} - T_{i,j-1} = V_{i,j}\), a channel from \((T_{i,j-1},V_{i,j})\) can be seen as a channel from the pair \((T_{i,j-1},T_{i,j})\). Let \[\mathbf{T}_i = (T_{i,0},\ldots,T_{i,d}) \to \boxed{ p_{\mathbf{Y}| \mathbf{T}_i } \triangleq \prod_{j=0}^d p_{Y_{i,j} | (T_{i,j-1},T_{i,j})} } \to \mathbf{Y}_i=(Y_{i,0},\ldots,Y_{i,d})\] with the convention that \(T_{i,-1}=0\). Then the channel factorizes as \[T_{i,d} \to \boxed{ \mathrm{Mask}_d } \to \mathbf{V}_i = (V_{i,0},\ldots,V_{i,d}) \to \boxed{ h_d } \to \mathbf{T}_i \to \boxed{ p_{\mathbf{Y}| \mathbf{T}_i } } \to \mathbf{Y}_i\] and \(\overline{{\mathcal{E}}}( T_{i,d} \rightarrow \mathbf{Y}_i ) \leqslant\overline{{\mathcal{E}}}( (T_{i,d-1},V_{i,d}) \rightarrow Y_{i,d} ).\)

See Appendix 8.11.

We obtain now the following security result for type 4 subsequences:

[lemma:type4] Consider \(( (T_{i,j-1},V_{i,j}) \to Y_{i,j} )_{0 \leqslant i,j \leqslant d}\) and let \(\mathbf{Y} = (Y_{i,j})_{0 \leqslant i,j \leqslant d}\) then, \(\overline{{\mathcal{E}}}( X \rightarrow \mathbf{Y} ) \leqslant\prod_{i=0}^d \overline{{\mathcal{E}}}( (T_{i,d-1},V_{i,d}) \rightarrow Y_{i,d} ) .\)

Combine Lemma [MGL:doeblin] with Lemma [lemma:recombination-doeb].

Security for the Whole Circuit

Combining Lemmas [MGL:doeblin],[lemma:type3],[lemma:type4] for the different subsequences together with the single-letterization Lemma [lemma:adaptive-single-letter] we obtain a security guarantee for the whole circuit:

[thm:doeblin-pr] Consider an implementation with \(n_i\) subsequences of type \(i\) (\(i=1,2,3,4\)) and a \(\overline{{\mathcal{E}}}\)-noisy adversary with respect to \(\mathrm{CDC}\) with \(q\) queries. Let \(\mathbf{Y}\) be the vector of all corresponding side information acquired by the chosen channel adversary. Then one has \[\label{eq:main-doeb-bound} 0 \leqslant\overline{{\mathcal{E}}}( K \rightarrow \mathbf{Y} ) \leqslant 1 - \bigl ( \bigl ( 1- \overline{{\mathcal{E}}}^{d+1} \bigr )^{n_1 + n_2 + n_4} \bigl ( 1 - \Upsilon_d(\overline{{\mathcal{E}}}) \bigr )^{n_3} \bigr )^q \leqslant 1.\] The upper bound can be weakened via the union bound to \[\label{eq:ub-main} \overline{{\mathcal{E}}}( K \rightarrow \mathbf{Y} ) \leqslant q \left ( (n_1 + n_2 + n_4) + 2 n_3 (d+1)^{d+1} \right )\overline{{\mathcal{E}}}^{d+1}.\] Also, using the asymptotic equivalent of the domination polynomial of the rook graph, the upper bound of Equation [eq:main-doeb-bound] is asymptotically equivalent to \[q \left (n_1 + n_2 + \left ( 2 (d+1)^{d+1} - (d+1)! \right ) n_3 +n_4 \right )\overline{{\mathcal{E}}}^{d+1}.\]

Equation [eq:ub-main] is of a similar form as (Masure and Standaert 2023 Thm. 5) but the constants \(n_i\) are not scaled by a term depending on the field size \(|\mathcal{X}|\), contrary to the \(t_i\) occurring in (Masure and Standaert 2023 Thm. 7).

See Appendix 8.12.
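A numerically convenient, slightly weakened form of Equation [eq:main-doeb-bound] replaces \(\Upsilon_d(\overline{\mathcal{E}})\) by its upper bound from Equation [eq:sandwich-doeb], which can only loosen (never invalidate) the bound. The sketch below (Python, with illustrative values for \(n_i\), \(d\), \(q\) and \(\overline{\mathcal{E}}\)) compares it with the union-bound weakening of Equation [eq:ub-main].

```python
def circuit_bound(eps_bar, d, n1, n2, n3, n4, q):
    # Weakened Eq. [eq:main-doeb-bound]: the exact rook polynomial
    # Upsilon_d is replaced by its upper bound min{2(1 - E^{d+1})^{d+1}, 1}
    # with E = 1 - eps_bar (Eq. [eq:sandwich-doeb]).
    E = 1 - eps_bar
    ups_ub = min(2 * (1 - E ** (d + 1)) ** (d + 1), 1.0)
    inner = (1 - eps_bar ** (d + 1)) ** (n1 + n2 + n4) * (1 - ups_ub) ** n3
    return 1 - inner ** q

def union_bound(eps_bar, d, n1, n2, n3, n4, q):
    # Eq. [eq:ub-main]: linear (union-bound) weakening.
    return q * ((n1 + n2 + n4) + 2 * n3 * (d + 1) ** (d + 1)) * eps_bar ** (d + 1)

args = (1e-2, 2, 16, 16, 9, 9, 100)   # illustrative circuit parameters
print(circuit_bound(*args), union_bound(*args))
```

With these parameters the nonlinear bound is already slightly tighter than the linear one, and the gap widens as \(q\) grows.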

De Chérisey et al. obtained a lower bound on the minimum number of queries required by the adversary to achieve a given advantage in terms of \(\mathrm{SR}\) in the unprotected setting with \(\mathrm{MI}\) (Chérisey et al. 2019 Thm. 2, Eqn. 4). Béguinot et al. derived a tight bound for masked encoding with \(\mathrm{MI}\) (Béguinot, Cheng, et al. 2023 Coro. 2) and maximal leakage (Béguinot, Liu, et al. 2023 Coro. 1). Liu et al. derived a tight bound for masked encoding with Sibson’s \(\alpha\)-information of order \(2\) (Liu et al. 2023 Thm. 2). The combination of Theorem [thm:doeblin-pr] with Proposition [prop:doeblin-to-guessing] yields a lower bound on the minimum number of queries required by the adversary to achieve a given advantage in terms of \(\mathrm{SR}\), \(\mathrm{GE}\) or \(\mathrm{TVI}\) for the entire protected computations (not only encodings).

[thm:lower-bound-q] Let \[\begin{aligned} \lambda(\overline{{\mathcal{E}}},d) &= - \bigl ( \ln \bigl (\bigl ( 1- \overline{{\mathcal{E}}}^{d+1} \bigr )^{n_1 + n_2 + n_4} \left ( 1 - \Upsilon_d(\overline{{\mathcal{E}}}) \right )^{n_3} \bigr ) \bigr )^{-1} \\ &= - \bigl ( (n_1 + n_2 + n_4) \ln \bigl (1- \overline{{\mathcal{E}}}^{d+1}\bigr ) + n_3 \ln \bigl ( 1 - \Upsilon_d(\overline{{\mathcal{E}}}) \bigr ) \bigr )^{-1} \\ &\approx \bigl ( \left ( n_1 + n_2 + n_4 + n_3 (2 (d+1)^{d+1} - (d+1)!) \right ) \overline{{\mathcal{E}}}^{d+1} \bigr )^{-1}. \end{aligned}\] The number of queries to achieve \(\mathbb{P}_{s,o}(K|\mathbf{Y}) = \mathbb{P}_{s,o}\), \(G(K|\mathbf{Y}) = G\) or \(\Delta(K;Y) = \Delta\) is at least: \[\begin{array}{rl} q_{\mathrm{sr}} &\geqslant\lambda(\overline{{\mathcal{E}}},d) \ln \left ( (1 - \mathbb{P}_{s,o})^{-1} \lambda_{{\mathrm{SR}}_o} \right ), \\ q_{\mathrm{ge}} &\geqslant\lambda(\overline{{\mathcal{E}}},d) \ln \left ( (G - 1)^{-1} \lambda_{{\mathrm{GE}}} \right ), \\ q_{\mathrm{tvi}} &\geqslant\lambda(\overline{{\mathcal{E}}},d) \ln \left ( \Delta^{-1} \lambda_{{\mathrm{TVI}}} \right ). \end{array}\]

Theorem [thm:lower-bound-q] is illustrated in Figure [fig:lb-q]: Figure 4 shows how the bound behaves for a fixed number of queries and an increasing noise level, and Figure 5 how it behaves for a fixed value of the \(\mathrm{CDC}\) and an increasing number of queries \(q\). We observe in Figure 4 that there is an optimal masking order with respect to Theorem [thm:lower-bound-q], depending on the noise level. Further, Figure 5 shows that the bound benefits from the single-letterization of Lemma [lemma:single-letter] and Lemma [lemma:adaptive-single-letter], as it is an “S”-shaped curve. The weakened version of Lemma [lemma:adaptive-single-letter] into a linear bound corresponds to the dotted line.

Figure 4: One query (\(q=1\)) and variable \(\sigma\) under the Hamming weight leakage model.

Figure 5: \(\overline{{\mathcal{E}}} = 2 \times 10^{-3}\) and increasing number of queries \(q\).

Comparison With Bounds Based on Mutual Information

Theorem [thm:doeblin-pr] provides bounds on all figures of merit, which we compare with bounds based on \(\mathrm{MI}\). Let \(K \to Y\) be a side-channel of a uniform key \(K \sim \mathcal{U}(\mathbb{F}_2^n)\). Let \(Y = \mathrm{HW}(K) + N\) where \(N \sim \mathcal{N}(0,\sigma^2)\) is an additive Gaussian noise. Let \(\mathbf{Y} = (Y_1,\ldots,Y_q)\) be a side-channel leakage with \(q\) queries.

No Constraint on the Noise Level

A strength of Theorem [thm:doeblin-pr] is that the bound on the \(\mathrm{CDC}\) (Equation [eq:main-doeb-bound]) is always at most \(1\). In particular, our bound does not require any constraint on the noise level to apply, in contrast with \(\mathrm{MI}\)-based bounds such as (Masure and Standaert 2023).

Scaling with the Number of Queries

De Chérisey et al. (Chérisey et al. 2019 Lemma 5) established an upper bound on the \(\mathrm{MI}\) for side-channels that grows linearly with respect to \(q\), i.e., \(I(K;\mathbf{Y}) \leqslant q I(K;Y)\). However, the \(\mathrm{MI}\) \(I(K;\mathbf{Y})\) is bounded above by the constant \(\log |\mathcal{K}|\). Therefore, the linear bound cannot be tight for large values of \(q\). Finding an alternative practical non-linear bound on the \(\mathrm{MI}\) that remains bounded as \(q\) increases remains an interesting open question. By contrast, our bound \(\overline{\mathcal{E}}(K \to \mathbf{Y} ) \leqslant 1 - (1- \overline{\mathcal{E}}(K \to Y))^q \leqslant q \overline{\mathcal{E}}(K \to Y)\) of Lemma [lemma:adaptive-single-letter] for the \(\mathrm{CDC}\) is nonlinear with respect to \(q\) and can be weakened into a linear bound using Boole’s inequality. This linear approximation is tight when \(\overline{\mathcal{E}} \to 0\).

Attack with One Trace (SPA) and Multiple Traces (DPA)

On the one hand, Béguinot et al. (Béguinot, Liu, et al. 2023 Eqn. 87) showed that as \(I(K;\mathbf{Y}) \rightarrow 0\), we have \[\label{eqn:fano-taylor} \mathbb{P} _s(K| \mathbf{Y}) - \mathbb{P} _s(K) \leqslant\sqrt{ \frac{2(2^n-1)}{2^{2n}} \frac{q I(K;Y)}{ \log e} }.\] On the other hand, Proposition [prop:doeblin-to-guessing] yields \[\label{eqn:cdc-vs-fano} \mathbb{P} _s(K|\mathbf{Y}) - \mathbb{P} _s(K) \leqslant(1 - (1-\overline{{\mathcal{E}}}(K \to Y))^q) \lambda_{ \mathrm{SR} } \approx q \overline{{\mathcal{E}}}(K \to Y) \lambda_{ \mathrm{SR} }.\] While \(\delta_{\mathrm{MI}}\propto \frac{1}{\sigma^2}\) (Battistello et al. 2016 Appendix A) and \(\delta_{{\mathrm{CDC}}} \propto \frac{1}{\sigma}\) (Proposition [prop:closed-form-univariate]), the square root in the \(\mathrm{MI}\)-based security bounds (e.g., Eqn. [eqn:fano-taylor]) makes both bounds on the \(\mathrm{SR}\) advantage \(O( \sigma^{-1})\). As a consequence, the \(\mathrm{CDC}\)-based bound is comparable with the \(\mathrm{MI}\)-based bound for single-trace attacks \((q=1)\) at large noise.

However, in the context of a DPA, where the goal is to lower bound the minimum number of queries \(q\) to achieve a given figure of merit, the \(\mathrm{MI}\)-based bound of Eqn. [eqn:fano-taylor] is \(O(\sigma^2)\) while the \(\mathrm{CDC}\)-based bound of Eqn. [eqn:cdc-vs-fano] is only \(O(\sigma)\). In the presence of masking, the same observation applies to Theorem [thm:lower-bound-q] and (Masure and Standaert 2023 Theorem 7), where \(\sigma\) is replaced by \(\sigma^{d+1}\). In conclusion, the \(\mathrm{CDC}\) may not be the most suitable informational noisiness measure to capture the leakage in a DPA with many traces. However, since the factorization is optimal, this leads to the important conclusion that such a loss is inherent to any reduction from noisy leakages to the random probing model. For this reason, informational bounds based on \(\mathrm{MI}\) remain an interesting tool for side-channel analysis.

Indirect Proofs via CDC With Random Probing

In this section, we explain how existing proofs in the random probing model can be combined with the \(\mathrm{CDC}\). We show how a security proof in the random probing model can be lifted to the noisy leakage model by improving (Duc, Dziembowski, and Faust 2014 Thm. 1) in Subsection 5.1. The resulting bound is shown to yield an upper bound on the side-channel adversary’s advantage in Subsection 5.2. The asymptotic behavior of the security bound is analyzed in Subsection 5.3. This analysis confirms theoretically the finding of Battistello et al. (Battistello et al. 2016) that indefinitely increasing the masking order of ISW gadgets whose noise rate is not constant can be detrimental to security.

Lifting Security Proof in the Random Probing Model to Noisy Leakage

We first recall existing security proofs in the random probing model. Cassiers et al. (Cassiers et al. 2021) showed how to derive tight security bounds in the random probing model using probe distribution tables (PDT). Belaïd et al. (Belaı̈d et al. 2020; Belaı̈d, Rivain, and Taleb 2021) also derived security proofs in the random probing model based on an expansion strategy for small gadgets. Their improvements are based on the fact that when an adversary probes more than \(t\) wires, it does not necessarily learn any information about the sensitive variable. The \(t\)-threshold probing security ensures that no subset of at most \(t\) wires leaks information; however, some subsets of more than \(t\) wires do not leak information either. By carefully determining the subsets of leaking wires and the subsets of non-leaking wires, the so-called probability of simulation failure can be reduced.

Our goal is to show how to lift a proof in the random probing model to the noisy leakage model using the \(\mathrm{CDC}\). To do so, we show how to improve (Duc, Dziembowski, and Faust 2014 Thm. 1). In the same way, the above-mentioned proofs in the random probing model can be lifted to the noisy leakage model using the \(\mathrm{CDC}\).

In this section, the circuit \(\Gamma\) is decomposed into \(|\Gamma|\) regions (gadgets) whose numbers of wires are specified by the sequence \((l_i)\). We assume that \(\Gamma\) is secure in the region probing model, i.e., any set of at most \(t\) (probed) wires in each region of the circuit is independent of the secret key.

[th:random-probing] Let \(\mathcal{A}\) be a \(\overline{{\mathcal{E}}}\)-noisy adversary with respect to \(\mathrm{CDC}\) with \(q\) queries. Let \(\mathbf{Y}\) be the vector of all corresponding side information acquired by the chosen channel adversary. Then one has \[\overline{{\mathcal{E}}}(K \to \mathbf{Y}) \leqslant\mathrm{fail}(t,(l_i),{\overline{{\mathcal{E}}}},q) \triangleq 1 - \prod_{i=1}^{|\Gamma|} \Bigl( 1 - Q_B(t,l_i, {\overline{{\mathcal{E}}}}) \Bigr )^{q}.\]

Here the wires leak, whereas in Section 4 the gates leak.

See Appendix 8.13.
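To make the theorem concrete, the following Python sketch evaluates \(\mathrm{fail}(t,(l_i),\overline{\mathcal{E}},q)\). The paper defines \(Q_B\) earlier than this excerpt; we assume here the binomial tail \(Q_B(t,l,\overline{\mathcal{E}}) = \mathbb{P}[\mathrm{Bin}(l,\overline{\mathcal{E}}) > t]\), i.e., the probability that a random-probing adversary obtains more than \(t\) of the \(l\) wires of a region. The region sizes and noise level are illustrative.

```python
from math import comb

def Q_B(t, l, p):
    # Assumed form of Q_B (defined earlier in the paper, not restated in this
    # section): P[Bin(l, p) > t], the probability that a random-probing
    # adversary with probe probability p gets more than t of the l wires.
    return sum(comb(l, k) * p ** k * (1 - p) ** (l - k)
               for k in range(t + 1, l + 1))

def fail(t, ls, eps_bar, q):
    # fail(t, (l_i), E-bar, q) of Theorem [th:random-probing]
    prod = 1.0
    for l in ls:
        prod *= (1 - Q_B(t, l, eps_bar)) ** q
    return 1 - prod

# Illustrative regions: two multiplication gadgets and one linear gadget
print(fail(2, [27, 27, 9], 0.05, 10))
```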

Theorem [th:random-probing] is very generic: it is “agnostic” to the countermeasure implemented to achieve security against a \(t\)-threshold probing adversary in the region probing model. Masure and Standaert observed in (Masure and Standaert 2023 Tab. 1) that (Duc, Dziembowski, and Faust 2014 Thm. 1) provides no incentive for noisier leakage. Theorem [th:random-probing] does not suffer from this weakness: in particular, the adversary’s advantage vanishes as the noise level increases. Theorem [th:random-probing] depends on five parameters:

  1. The noise level as quantified by \({\overline{{\mathcal{E}}}}\).

  2. The security order of the gadgets as quantified by \(t\).

  3. The “temporal attack surface” quantified by the number of queries \(q\).

  4. The attack surface of the adversary within a gadget as measured by the number \(l\) of wires potentially probed within the gadgets.

  5. The attack surface of the adversary on the whole circuit as measured by \(|\Gamma|\).

Bounds on SCA Advantage

If a cryptographic algorithm is insecure against a black-box adversary, then it is also insecure against a side-channel adversary. For this reason, we are interested in the advantage of the side-channel adversary over its black-box counterpart. In Proposition [prop:sca:adv] we upper bound this advantage; contrary to the usual bounds for side-channel attacks, it depends on the computational assumption made on the adversary through the term \(\mathbb{P} _{s,o}^{\mathrm{BB}}(q)\).

[prop:sca:adv] Let \(\mathcal{A}\) be a \(\overline{{\mathcal{E}}}\)-noisy adversary with respect to \(\mathrm{CDC}\) with \(q\) queries. Let \(\mathbb{P} _{s,o}^{\mathrm{BB}}(q)\) and \(\mathbb{P} _{s,o}^{\mathrm{SCA}}(q,\overline{{\mathcal{E}}})\) be the respective \({\mathrm{SR}}_o\) of the best black-box and side-channel adversaries with \(q\) queries. Then, \[\left\{ \begin{array}{llll} & \mathbb{P} _{s,o}^{\mathrm{SCA}}(q,\overline{{\mathcal{E}}}) - \mathbb{P} _{s,o}^{\mathrm{BB}}(q) &\leqslant\left ( 1 - \mathbb{P} _{s,o}^{\mathrm{BB}}(q) \right ) &\mathrm{fail}(t,(l_i),\overline{{\mathcal{E}}},q) \\ &{\mathrm{GE}}^{\mathrm{BB}}(q) - {\mathrm{GE}}^{\mathrm{SCA}}(q,\overline{{\mathcal{E}}}) &\leqslant\left ( {\mathrm{GE}}^{\mathrm{BB}}(q) - 1 \right ) &\mathrm{fail}(t,(l_i),{\overline{{\mathcal{E}}}},q). \end{array} \right.\]

This is a direct consequence of Theorem [th:random-probing] and Proposition [prop:doeblin-to-guessing] applied to a (possibly computationally bounded) adversary.

Properties of the Failure Probability Function

We analyze how our bound depends on the order of the implemented countermeasure. Consider a masking countermeasure of order \(d\), and for illustrative purposes assume that \(t=d\) (as would be obtained for the implementation of (Rivain and Prouff 2010b) separated by leak-free refreshes). Also assume that the \(l_i\) associated with the multiplication gadgets grow quadratically with respect to \(d\) (e.g., ISW). We use the shorthand notations \(l(d) = \max_i l_i(d)\), \(\mathrm{fail}(d,\overline{{\mathcal{E}}} ) = \mathrm{fail}(t(d),(l_i(d)),\overline{{\mathcal{E}}},1)\) and \(\mathrm{fail}(d,\overline{{\mathcal{E}}},q) = \mathrm{fail}(t(d),(l_i(d)),\overline{{\mathcal{E}}},q)\). We can then define an optimal masking order with respect to Theorem [th:random-probing]: \[d^*({\overline{{\mathcal{E}}}}) \triangleq \arg \min_{d \in \mathbb{N}} \mathrm{fail}(d,{\overline{{\mathcal{E}}}},q).\] Since \(\mathrm{fail}(d,{\overline{{\mathcal{E}}}},q) = 1 - \left ( 1 - \mathrm{fail}(d,{\overline{{\mathcal{E}}}}) \right )^q\), we have \(d^*({\overline{{\mathcal{E}}}}) = \arg \min_{d \in \mathbb{N}} \mathrm{fail}(d,{\overline{{\mathcal{E}}}})\), so that the optimal masking order is independent of the number of queries.

It is not true in general that \(\mathrm{fail}(d,{\overline{{\mathcal{E}}}})\) is a decreasing function of the masking order \(d\). However, we show that it essentially holds in the limit of high noise.

[prop-essentialy-decreasing] For all \(d_1 < d_2\) there exists a noise threshold \(\overline{{\mathcal{E}}}_0\) such that for all noise levels \(\overline{{\mathcal{E}}} \leqslant\overline{{\mathcal{E}}}_0\), \(\mathrm{fail}(d_1,\overline{{\mathcal{E}}}) > \mathrm{fail}(d_2,\overline{{\mathcal{E}}}),\) which indicates that \(d^*(\overline{{\mathcal{E}}}) \overset{\overline{{\mathcal{E}}} \to 0 }{\longrightarrow} \infty.\)

See Appendix 8.14.

Proposition [prop-essentialy-decreasing] means that, while the bound is not decreasing with respect to the masking order \(d\) in general, there always exists a noise level for which masking at higher order is more beneficial.

[prop:finite-masking-order] If \(l(d) / t(d) \overset{ d \to \infty }{\longrightarrow} \infty\) then \(\mathrm{fail}(d,{\overline{{\mathcal{E}}}}) \overset{ d \to \infty }{\longrightarrow} 1.\) Therefore, there exists a finite optimal masking order with respect to the noise level, \(d^*({\overline{{\mathcal{E}}}}) < \infty\), and \(\mathrm{fail}(d,{\overline{{\mathcal{E}}}})\) cannot be reduced further than \(\mathrm{fail}(d^*,{\overline{{\mathcal{E}}}}) > 0\). This in turn implies that the adversary’s advantage cannot be made arbitrarily small.

See Appendix 8.15.

Proposition [prop:finite-masking-order] means that, for a fixed noise level, increasing the masking order indefinitely is detrimental to the security bounds. As a consequence, there exists a finite optimal masking order with respect to the noise level.

[prop:error-exp] If \({\overline{{\mathcal{E}}}} < {\overline{{\mathcal{E}}}}_0 \leqslant\frac{t}{l}\) then \(Q_B(t,l,{\overline{{\mathcal{E}}}}) \leqslant\mathrm{exp}\left ( - l d_{\mathrm{KL}} \left ( t/l \middle\| {\overline{{\mathcal{E}}}} \right ) \right ) \leqslant\mathrm{exp}\left ( - l d_{\mathrm{KL}} \left ( {\overline{{\mathcal{E}}}}_0 \middle\| {\overline{{\mathcal{E}}}} \right ) \right )\) where \(d_{ \mathrm{KL} }(p\|q) \triangleq D_{ \mathrm{KL} }( \mathcal{B}(p) \| \mathcal{B}(q) ) = p \log \frac{p}{q} + \bar{p} \log \frac{ \bar{p}}{ \bar{q}}\). In this case \(d^*({\overline{{\mathcal{E}}}}) = \infty\).

We apply the Chernoff-Hoeffding bound (Hoeffding 1994).

Proposition [prop:error-exp] shows that when the gadget size grows at most linearly with respect to the security parameter \(t\), masking provides an exponential gain with respect to the gadget size. The coefficient in the exponential is lower bounded by a binary divergence between the protection rate \(t/l\) and the leakage rate \(\overline{{\mathcal{E}}}\).
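The Chernoff-Hoeffding step can be checked numerically. The sketch below again assumes the binomial-tail form \(Q_B(t,l,\overline{\mathcal{E}}) = \mathbb{P}[\mathrm{Bin}(l,\overline{\mathcal{E}}) > t]\) (\(Q_B\) is defined earlier in the paper), with illustrative values where the protection rate \(t/l\) exceeds \(\overline{\mathcal{E}}\), and verifies the first inequality of Proposition [prop:error-exp].

```python
from math import comb, exp, log

def Q_B(t, l, p):
    # Assumed form of Q_B: P[Bin(l, p) > t] (defined earlier in the paper)
    return sum(comb(l, k) * p ** k * (1 - p) ** (l - k)
               for k in range(t + 1, l + 1))

def d_kl(p, q):
    # Binary KL divergence in nats: d(p||q) = p ln(p/q) + (1-p) ln((1-p)/(1-q))
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

t, l, eps = 3, 30, 0.05   # protection rate t/l = 0.1 exceeds the CDC 0.05
print(Q_B(t, l, eps) <= exp(-l * d_kl(t / l, eps)))   # True: Chernoff-Hoeffding
```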

The original expression of (Duc, Dziembowski, and Faust 2014 Thm. 1) can be misinterpreted, as it suggests that the probability of error converges to \(0\) as \(d\) increases. But when the gadget size \(l\) grows quadratically with respect to \(d\), the maximum value of \(d\) tolerated by the proof is upper bounded by a function inversely proportional to the noise level. Hence, it is incorrect to say that the advantage of the adversary decreases exponentially with respect to \(d\). This confirms the observations of Battistello et al. (Battistello et al. 2016), who noted that the noise is expected to decrease linearly with the number of shares and that this assumption is not met in practice. Figure 6, derived from Theorem [th:random-probing], shows that for quadratic gadgets at a fixed noise level, the advantage of the adversary is either increasing, or decreasing and then increasing, with respect to the masking order, depending on the noise level. For linear gadgets, Figure 7 shows that masking does provide an exponential gain with respect to the adversary’s advantage. Note that this is a weakness of the ISW gadget, whose noise rate is not constant, and not a weakness of the bound. This emphasizes the importance of obtaining gadgets with improved noise rates (Cassiers and Standaert 2019). In particular, quasi-linear masking schemes (Goudarzi, Joux, and Rivain 2018; Goudarzi et al. 2022; Carlet et al. 2024) and Toom-Cook based gadgets (Plançon 2022) are promising approaches.

Figure 6: Quadratic gadget (multiplication) with \(t=d\) and \(l=3d^2\).

Figure 7: Linear gadget (linear operation) with \(t=d\) and \(l=3d\).

Proposition [prop:finite-masking-order] may appear to contradict (Duc, Faust, and Standaert 2019 Coro. 2). It turns out that (Duc, Faust, and Standaert 2019 Coro. 2) is incorrectly derived from (Duc, Faust, and Standaert 2019 Thm. 3). See Appendix 11.6 for more details.

Conclusion

We showed how the complementary Doeblin coefficient (\(\mathrm{CDC}\)) can be used to optimally reduce a noisy adversary to a random probing adversary. This allows us to exhibit the unavoidable inherent cost of a reduction from noisy leakages to the random probing model. We derived a set of properties of the \(\mathrm{CDC}\) which make it a sound leakage measure that is easy to manipulate, and showed that it is amenable to evaluation in a multivariate setting.

The \(\mathrm{CDC}\) yields security bounds for all figures of merit that scale well with the number of side-channel queries (single-letterization property). As a byproduct, we also lower bounded the minimum number of queries required to achieve a given figure of merit.

Furthermore, security bounds in terms of \(\mathrm{CDC}\) are easily derived using the Prouff-Rivain subsequence decomposition or can be naturally combined with existing security proofs in the random probing model for any type of countermeasures. We analyzed the asymptotic behavior of the obtained bounds in terms of countermeasure order and confirmed the existence of an optimal masking order with respect to the security bounds.

Overall, we believe that these contributions are essential to ground the security of masked implementations in the noisy leakage model on solid foundations. As perspectives, we would like to obtain direct security proofs of code-based masking implementations using \(\mathrm{CDC}\), leveraging a new appropriate subsequence decomposition. Investigating formal security proofs of masking combined with shuffling (Azouaoui et al. 2022) using \(\mathrm{CDC}\) could also be relevant as a way to reduce the physical noise requirement needed to obtain relevant security parameters.

Acknowledgements

We are grateful to Stephan Mertens, who provided us with an efficient way to compute the coefficients of the rook domination polynomials appearing in Type 3 subsequences. We also thank him for sharing with us the explicit values of the coefficients of the domination polynomials up to \(d=20\). We are also grateful to Loïc Masure, François-Xavier Standaert, Matthieu Rivain and Thomas Prest for very insightful discussions. We would also like to thank the anonymous reviewers for their in-depth reviews of the article and their extremely valuable suggestions. Secure-IC acknowledges partial funding from the European Union’s Horizon Europe research and innovation programme through the ALLEGRO project, under grant agreement No. 101070009. Julien Béguinot is a PhD candidate funded by Institut Mines-Télécom through the Futur & Ruptures program. Wei Cheng is also partially supported by National Key R&D Program of China No. 2022YFB3103800.

— SUPPLEMENTARY MATERIAL —

[sec-appendix]

Two Equivalent Descriptions of a Channel

In this section, we provide a proof sketch that any random transformation \(X \to \boxed{ P_{Y|X} } \to Y\) can be seen as a random function \(Y=F(X)\) where \(F\) is distributed according to a distribution \(P_F\) (\(F\sim P_F\)).

On one hand, the random function description \(Y=F(X)\) is obviously a particular case of a channel \(X \to \boxed{ P_{Y|X} } \to Y\) where for every input \(x\) and every event \(E\subset \mathcal{Y}\), \[P_{Y|X}(Y\in E | X=x) = \mathbb{P} (F(x)\in E),\] where the probability on the right-hand side is taken over \(F\sim P_F\).

Conversely, any channel \(X \to \boxed{ P_{Y|X} } \to Y\) can be seen as a random function \(Y=F(X)\) for a suitably defined \(F\sim P_F\). This fact was proved rigorously in (Weizsäcker 1974) and can be seen as follows. For each fixed input \(x\) define the random variable \(F(x)\) by the probability distribution \[\mathbb{P} (F(x)\in E) = P_{Y|X}(Y\in E | X=x)\] for any event \(E\subset \mathcal{Y}\). This defines a stochastic process \(F = \{ F(x) |x \in \mathcal{X} \}\) indexed by \(\mathcal{X}\) that follows some probability distribution \(P_F\). It can be shown (Weizsäcker 1974) that \(F\) is the desired random function such that \(X \to \boxed{ P_{Y|X} } \to Y=F(X)\).
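For finite alphabets, the construction can be illustrated directly: it suffices to draw \(F(x)\) independently for each input \(x\), since for a single use of the channel only the marginals of \(F\) matter. A minimal Python sketch with a hypothetical binary channel:

```python
import random

random.seed(0)
# Hypothetical binary channel P_{Y|X}: one row of output probabilities per input.
P = {0: [0.8, 0.2], 1: [0.3, 0.7]}

def sample_F():
    """One random function F: an independent draw F(x) ~ P_{Y|X=x} per input x."""
    return {x: random.choices((0, 1), weights=P[x])[0] for x in P}

# The marginal of F(x) recovers the channel law for each fixed input x.
n = 100_000
freq = sum(sample_F()[0] for _ in range(n)) / n
assert abs(freq - P[0][1]) < 0.01   # P(F(0) = 1) is close to P_{Y|X}(1|0) = 0.2
```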

Technical Proofs

Proof of Lemma [lemma:erasure-channel-commutes]

We verify that we obtain the same transition probability. Let \(x \in \mathcal{X}\) and \(y' \in \mathcal{Y}\). First on the left-hand side: \[\begin{cases} p_{Y' |X}( \bot | x) &\hspace*{-0.2cm}= \int_{ \mathcal{ X} \cup \{ \bot \} } \hspace*{-0.7cm} p_{X'|X}(x'|x) p_{Y|X}^{\bot}(\bot|x') = \bar{ {\mathcal{E}}} p_{Y|X}^{\bot}(\bot|x) + {\mathcal{E}}p_{Y|X}^{\bot}(\bot|\bot) = {\mathcal{E}} \\ p_{Y' |X}( y' | x) &\hspace*{-0.2cm}= \bar{ {\mathcal{E}}} p_{Y|X}^{\bot}(y' |x) + {\mathcal{E}}p_{Y|X}^{\bot}( y'|\bot) = \bar{ {\mathcal{E}}} p_{Y|X}(y' |x). \end{cases}\] On the right-hand side: \[\begin{cases} p_{Y' |X}( \bot | x) &= \int_{ \mathcal{ Y} } p_{Y|X}( y|x) p_{Y'|Y}(\bot|y) = \int_{ \mathcal{ Y} } p_{Y|X}( y|x) {\mathcal{E}} = {\mathcal{E}} \\ p_{Y' |X}( y' | x) & = \int_{ \mathcal{ Y} } p_{Y|X}( y|x) p_{Y'|Y}(y'|y) = \bar{ {\mathcal{E}}} p_{Y|X}(y' |x). \end{cases}\]
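This verification can also be carried out by enumeration on a small hypothetical discrete channel (the alphabets, transition probabilities and \({\mathcal{E}}=0.3\) below are illustrative):

```python
BOT = "erased"   # erasure symbol
E = 0.3          # erasure probability (illustrative)
# Hypothetical channel P_{Y|X} with X = {0, 1} and Y = {0, 1, 2}.
P = {0: {0: 0.5, 1: 0.3, 2: 0.2}, 1: {0: 0.1, 1: 0.6, 2: 0.3}}
Ys = [0, 1, 2]

def lhs(x, y_prime):
    """Erasure channel on the input, then the channel extended by p(BOT | BOT) = 1."""
    p_from_bot = 1.0 if y_prime == BOT else 0.0
    p_from_x = 0.0 if y_prime == BOT else P[x][y_prime]
    return (1 - E) * p_from_x + E * p_from_bot

def rhs(x, y_prime):
    """The channel first, then an erasure channel on the output."""
    if y_prime == BOT:
        return sum(P[x][y] * E for y in Ys)
    return P[x][y_prime] * (1 - E)

# Both compositions yield the same transition probabilities.
for x in P:
    for yp in Ys + [BOT]:
        assert abs(lhs(x, yp) - rhs(x, yp)) < 1e-12
```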

Proof of Proposition [prop:doeblin-to-guessing]

This is a consequence of a DPI. By Theorem [thm:doeblin-degradation], we obtain \[K \to \boxed{ \mathrm{EC}_{{\mathcal{E}}( K \to Y)}^{\bot} } \to K' \to Y.\] Hence, by the DPI on the \(\mathrm{SR}\) (Lemma [lemma:DPI-metrics]), we have \[\mathbb{P} _{s,o}( K | Y) \leqslant \mathbb{P} _{s,o}(K|K') = \overline{{\mathcal{E}}}( K \to Y) +{\mathcal{E}}( K \to Y) \mathbb{P} _{s,o}(K).\] Further, by the DPI on the \(\mathrm{GE}\) (Lemma [lemma:DPI-metrics]), we have \[G(K|Y) \geqslant G(K|K') = \overline{{\mathcal{E}}}( K \to Y) +{\mathcal{E}}( K \to Y) G(K).\] Finally, by the DPI on the \(\mathrm{TVI}\) (Lemma [lemma:DPI-metrics]), we have \[\begin{aligned} \Delta(K|Y) \leqslant\Delta(K|K') &= \overline{{\mathcal{E}}}( K \to Y) \sum_{k \in \mathcal{K}} p_K(k) (1 - p_K(k)) \\ &= \overline{{\mathcal{E}}}( K \to Y) \left (1 - \mathrm{exp} \left ( -H_2(K) \right ) \right ). \end{aligned}\]

Proof of Theorem [thm:doeblin-degradation]

Consider a channel \(P_{Y|X}\) with a given Doeblin coefficient \(\mathcal{E}\). We show that there exists a channel \(P_{Y|X'}\) such that the channel \(P_{Y|X}\) factorizes into \(P_{X'|X} P_{Y|X'}\) where \(P_{X'|X}\) is an erasure channel with parameter \(\mathcal{E}\). Let \[\begin{cases} p_{Y|X'}(y|\bot) = \mathcal{E}^{-1} \inf \limits_{x \in \mathcal{X}} p_{Y|X}(y|x) \\ p_{Y|X'}(y|x) = \overline{\mathcal{E}}^{-1} ( p_{Y|X}(y|x) - \inf \limits_{x \in \mathcal{X}} p_{Y|X}(y|x)) \end{cases}.\] This is a well defined channel since all terms are nonnegative and \(\int_{\mathcal{Y}} p_{Y|X'}(y|\bot) =\int_{\mathcal{Y}} p_{Y|X'}(y|x) = 1\). This gives the desired factorization since \(\mathcal{E} \mathcal{E}^{-1} \inf \limits_{x \in \mathcal{X}} p_{Y|X}(y|x) + \overline{\mathcal{E}} \overline{\mathcal{E}}^{-1} ( p_{Y|X}(y|x) - \inf \limits_{x \in \mathcal{X}} p_{Y|X}(y|x)) = p_{Y|X}(y|x)\).

Conversely, assume that there exists a channel \(P_{Y|X'}\) such that the channel \(P_{Y|X}\) factorizes into \(P_{X|X'} P_{Y|X'}\) where \(P_{X|X'}\) is an erasure channel with parameter \(\mathcal{E}\). Then for any pair \(x,y\) we have \(p_{Y|X}(y|x) = \overline{\mathcal{E}} p_{Y|X'}(y|x) + \mathcal{E} p_{Y|X'}(y|\bot) \geqslant\mathcal{E} p_{Y|X'}(y|\bot)\). Since this is true for all \(x\), this is also true by taking the infimum over \(x\): \(\inf_x p_{Y|X}(y|x) \geqslant\mathcal{E} p_{Y|X'}(y|\bot)\). Summing over \(y\) and noting that \(\int_{y \in \mathcal{Y}} p_{Y|X'}(y|\bot) = 1\) gives \(\int_{y \in \mathcal{Y}} \inf_{x \in \mathcal{X}} p_{Y|X}(y|x) \geqslant\mathcal{E}\).
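Both directions can be illustrated numerically. The following sketch (with an arbitrary \(2\times 3\) transition matrix) computes the Doeblin coefficient, builds the channel \(P_{Y|X'}\) as in the proof, and verifies that it is well defined and recovers the factorization:

```python
# Hypothetical discrete channel P_{Y|X} as rows p(y|x); X = {0, 1}, Y = {0, 1, 2}.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3]]

col_inf = [min(P[x][y] for x in range(2)) for y in range(3)]
E = sum(col_inf)                    # Doeblin coefficient: here 0.1 + 0.3 + 0.2 = 0.6

# The two conditional laws of the constructed channel P_{Y|X'}.
p_bot = [v / E for v in col_inf]                                   # p(y | erasure)
p_x = [[(P[x][y] - col_inf[y]) / (1 - E) for y in range(3)] for x in range(2)]

assert abs(sum(p_bot) - 1) < 1e-12                  # well-defined distributions
assert all(abs(sum(row) - 1) < 1e-12 for row in p_x)
for x in range(2):                                  # E p(y|bot) + (1-E) p(y|x)
    for y in range(3):
        assert abs(E * p_bot[y] + (1 - E) * p_x[x][y] - P[x][y]) < 1e-12
```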

Proof of Lemma [lemma:composition-different-erasures]

The composition can be computed explicitly \[\begin{cases} p_{Z|X}( z | x ) = 0 & \text{ if } x \neq z \\ p_{Z|X}(z | x) = \overline{{\mathcal{E}}}_0 \overline{{\mathcal{E}}}_1 & \text{ if } x = z \\ p_{Z|X}( \bot_0 | x) = {\mathcal{E}}_0 & \text{ for all $x$} \\ p_{Z|X}( \bot_1 | x) = \overline{{\mathcal{E}}}_0 {\mathcal{E}}_1 & \text{ for all $x$} \end{cases}\] Hence the result follows by evaluating Equation [eq:cdc-def]: \({\mathcal{E}}=0+{\mathcal{E}}_0+ \overline{{\mathcal{E}}}_0 {\mathcal{E}}_1=1 - \overline{{\mathcal{E}}}_0 \overline{{\mathcal{E}}}_1\).
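As a quick Monte-Carlo illustration of this composition rule (with illustrative erasure probabilities), a symbol survives the composed channel exactly when it survives both erasure stages:

```python
import random

random.seed(0)
E0, E1 = 0.2, 0.5     # illustrative erasure probabilities
n = 100_000
# A symbol passes the composed channel iff it survives both erasures.
survive = sum(random.random() >= E0 and random.random() >= E1 for _ in range(n)) / n
assert abs(survive - (1 - E0) * (1 - E1)) < 0.01    # CDC of the composition
```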

Proof of Lemma [lemma:doeb-consistencency]

\[\begin{aligned} \int_{ \mathcal{Y} \times \mathcal{Z} } \inf_{x \in \mathcal{X}} p(yz|x) &= \int_{ \mathcal{Y} \times \mathcal{Z} } \inf_{x \in \mathcal{X}} p_{Y|X}(y|x) p_{Z|XY}(z|xy) \\ &= \int_{ \mathcal{Y} \times \mathcal{Z} } \inf_{x \in \mathcal{X}} p_{Y|X}(y|x) p_{Z|Y}(z|y) \\ &= \int_{ \mathcal{Y} \times \mathcal{Z} } p_{Z|Y}(z|y) \inf_{x \in \mathcal{X}} p_{Y|X}(y|x) \\ &= \int_{ \mathcal{Y} } \left ( \int_{\mathcal{Z}} p_{Z|Y}(z|y) \right ) \inf_{x \in \mathcal{X}} p_{Y|X}(y|x) \\ &= \int_{ \mathcal{Y} } \inf_{x \in \mathcal{X}} p_{Y|X}(y|x) . \end{aligned}\]

Proof of Lemma [lemma:single-letter]

Very recently, Makur and Singh (Makur and Singh 2023) established the same property of the \(\mathrm{CDC}\) in the discrete setting (with transition matrices) for Bayesian networks (Friedman, Geiger, and Goldszmidt 1997).

\[\begin{aligned} \int_{ \mathcal{Y}_1 \times \mathcal{Y}_2 } \inf_{x} p_{Y_1Y_2|X}(y_1y_2|x) &= \int_{ \mathcal{Y}_1 \times \mathcal{Y}_2 } \inf_{x} p_{Y_1|X}(y_1|x) p_{Y_2|X Y_1}(y_2|x y_1) \\ &= \int_{ \mathcal{Y}_1 \times \mathcal{Y}_2 } \inf_{x} p_{Y_1|X}(y_1|x) p_{Y_2|X}(y_2|x) \\ &\geqslant\int_{ \mathcal{Y}_1 \times \mathcal{Y}_2 } \inf_{x} p_{Y_1|X}(y_1|x) \inf_{x} p_{Y_2|X}(y_2|x) \\ &= \int_{ \mathcal{Y}_1 } \inf_{x} p_{Y_1|X}(y_1|x) \int_{ \mathcal{Y}_2 } \inf_{x} p_{Y_2|X}(y_2|x). \end{aligned}\] The result with \(q\) channels follows by induction.

The inequality is a consequence of the union bound. Indeed, if \(0\leqslant x_1,\ldots,x_q \leqslant 1\), we can view each \(x_i\) as the probability of an event \(E_i\), where the \(E_i\) are mutually independent, \(x_i = \mathbb{P} (E_i)\), and \(1-\prod_{i=1}^q (1-x_i) = \mathbb{P} ( \cup_{i=1}^q E_i) \leqslant\sum_{i=1}^q \mathbb{P} ( E_i ) = \sum_{i=1}^q x_i\).

Proof of Lemma [lemma:adaptive-single-letter]

The key point is that for any observation \(y_1\), the channel \(X \to \boxed{ P_{Y_2|X,Y_1=y_1}} \to Y_2\) is \(\overline{{\mathcal{E}}}\)-noisy with respect to \(\mathrm{CDC}\). This exactly means that for all \(y_1\), \(\int_{\mathcal{Y}_2} \inf_{x} p(y_2|x y_1 ) = {\mathcal{E}}_{y_1}(X \to Y_2)\geqslant{\mathcal{E}}\). \[\begin{aligned} \int_{ \mathcal{Y}_1 \times \mathcal{Y}_2 } \hspace*{-.3cm} \inf_{x} p_{Y_1Y_2|X}(y_1y_2|x) &= \int_{ \mathcal{Y}_1 \times \mathcal{Y}_2 } \inf_{x} p_{Y_1|X}(y_1|x) p_{Y_2|X Y_1}(y_2|x y_1) \\ &\geqslant\int_{ \mathcal{Y}_1 \times \mathcal{Y}_2 } \inf_{x} p_{Y_1|X}(y_1|x) \inf_{x} p(y_2|x y_1 ) \\ &= \int_{ \mathcal{Y}_1} \hspace*{-.2cm} \left ( \! \left ( \int_{\mathcal{Y}_2} \inf_{x} p_{Y_2|X Y_1}(y_2|x y_1 ) \! \right ) \inf_{x} p_{Y_1|X}(y_1|x) \right ) \\ &\geqslant\int_{ \mathcal{Y}_1} \left ( {\mathcal{E}}\inf_{x} p_{Y_1|X}(y_1|x) \right ) \\ &\geqslant{\mathcal{E}}^2. \end{aligned}\] The result with \(q\) channels follows by induction.

Proof of Proposition [prop:closed-form-univariate]

Since the channel noise is additive, \(p_{Y|X}(y|x) = p_Z( y - f(x))\). As the noise is radially symmetric, \(p_Z( y - f(x))= p_Z( | y - f(x) | )\). Since the noise is radially symmetric and decreasing, \(\inf_x p_{Y|X}(y|x) = p_Z ( \sup_x | y - f(x) | )\). If \(y \geqslant\frac{m+M}{2}\) then \(\sup_x | y - f(x) | = y - m\), else \(\sup_x | y - f(x) | = M-y\). The result follows by splitting the integral over \(\mathcal{Y}\) into two parts: \[{\mathcal{E}}(X \to Y) = \int_{-\infty}^{ \frac{m+M}{2}} p_Z(M-y) \;\mathrm{d}y + \int_{ \frac{m+M}{2}}^{\infty} p_Z(y-m) \;\mathrm{d}y = 2 \int_{ \frac{M-m}{2}}^{\infty} p_Z(u) \;\mathrm{d}u.\] Since \(Q(x) = \frac{1}{2} - \frac{x}{ \sqrt{2 \pi }} + O ( x^3 )\), we also obtain the announced approximation for high noise and additive Gaussian noise.
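Proposition [prop:closed-form-univariate] can be checked numerically. The sketch below uses a hypothetical Hamming-weight-style leakage on \(8\) bits (\(m=0\), \(M=8\)) with Gaussian noise of standard deviation \(\sigma=2\), and compares the closed form \(2Q(\frac{M-m}{2\sigma})\) with a direct numerical integration of \(\inf_x p_{Y|X}(y|x)\):

```python
import math

sigma = 2.0
f_vals = range(9)                  # hypothetical sensitive values: m = 0, M = 8
m, M = min(f_vals), max(f_vals)

def phi(u):                        # N(0, sigma^2) density
    return math.exp(-u * u / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def Q(x):                          # Gaussian tail function
    return 0.5 * math.erfc(x / math.sqrt(2))

closed = 2 * Q((M - m) / (2 * sigma))      # closed-form Doeblin coefficient

# Direct integration: inf_x p_Z(y - f(x)) = p_Z(max(|y - m|, |y - M|)) (trapezoid).
a, b, n = m - 10 * sigma, M + 10 * sigma, 200_000
h = (b - a) / n
vals = [phi(max(abs(a + i * h - m), abs(a + i * h - M))) for i in range(n + 1)]
numeric = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
assert abs(closed - numeric) < 1e-6
```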

Proof of Lemma [MGL:doeblin]

Consider the channel \[X \to \boxed{ f } \to G \to \boxed{ \mathrm{Mask}_d } \to \mathbf{ G } \to \boxed{ \prod_{i=0}^d p_{Y_i|G_i} } \to \mathbf{Y}.\] Since, by the strengthened DPI (Lemma [prop:doeb-concat]), \(\overline{{\mathcal{E}}}( X \rightarrow \mathbf{Y} ) \leqslant\overline{{\mathcal{E}}}( X \rightarrow G ) \overline{{\mathcal{E}}}( G \rightarrow \mathbf{Y} ) \leqslant\overline{{\mathcal{E}}}( G \rightarrow \mathbf{Y} )\), we can limit ourselves to the channel \(G \to \mathbf{Y}\). By Theorem [thm:doeblin-degradation], \(G_i \to \boxed{ P_{Y_i|G_i} } \to Y_i\) is a stochastically degraded erasure channel with erasure probability \({\mathcal{E}}_i = {\mathcal{E}}( G_i \rightarrow Y_i)\). Consequently, the channel rewrites \[G_i \to \boxed{ \mathrm{EC}_{{\mathcal{E}}_i}^{{\bot}_i} } \to G_i' \to \boxed{ P_{Y_i | G_i'}} \to Y_i,\] so that the overall channel is \[G \to \boxed{ \mathrm{Mask}_d } \to \boxed{ \prod_{i=0}^d \mathrm{EC}_{{\mathcal{E}}_i}^{{\bot}_i} } \to \mathbf{Y}' \to \boxed{ \prod_{i=0}^d p_{Y_i|G_i'} } \to \mathbf{Y}.\] Hence, the channel \(G \to \mathbf{Y}\) is stochastically degraded with respect to the channel \(G \to \mathbf{Y}'\).
By the strengthened DPI (Lemma [prop:doeb-concat]) we obtain \[\overline{{\mathcal{E}}}(X \rightarrow \mathbf{Y}) \leqslant\overline{{\mathcal{E}}}( G \rightarrow \mathbf{Y} ) \leqslant\overline{{\mathcal{E}}}( G \rightarrow \mathbf{Y}' ).\] Recall that by definition \[\overline{{\mathcal{E}}}( G \rightarrow \mathbf{Y}' ) = \mathbb{E} _{Y_0',\ldots,Y_d'} \left [ \sup \limits_{ g \in f(\mathcal{X}) } \left ( 1 - \frac{ p(g|Y_0',\ldots,Y_d')}{ p(g) } \right )\right ].\] If there exists an index \(i\) such that \(y_i'={\bot}_i\) then by secret sharing property \(p(g|y_0',\ldots,y_d') = p(g)\) so that \[\sup \limits_{ g \in f(\mathcal{X}) } \left ( 1 - \frac{ p(g|y_0',\ldots,y_d')}{ p(g) } \right ) = 0.\] Otherwise, for all \(i \in \{0,\ldots,d\}\), \(y_i' \neq {\bot}_i\) and \[\begin{cases} p(g|y_0',\ldots,y_d') = 1 & \text{ if } g = y_0'+\ldots+y_d' \\ p(g|y_0',\ldots,y_d') = 0 & \text{ if } g \neq y_0'+\ldots+y_d' \end{cases}\] so that \[\sup \limits_{ g \in f(\mathcal{X}) } \left ( 1 - \frac{ p(g|y_0',\ldots,y_d')}{ p(g) } \right ) = \sup (1 , 1-\frac{1}{ p( y_0'+\ldots+y_d')}) = 1.\] As a consequence, \[\overline{{\mathcal{E}}}( G \rightarrow \mathbf{Y}' ) = \mathbb{E} _{Y_0',\ldots,Y_d'} \left [ {1}_{ Y_0' \neq {\bot}_0,\ldots,Y_d' \neq {\bot}_d} \right ] = \mathbb{P} \left ( Y_0' \neq {\bot}_0,\ldots,Y_d' \neq {\bot}_d \right ) = \prod_{i=0}^d \overline{{\mathcal{E}}}_i.\]

Proof of Lemma [lemma:type3]

By Theorem [thm:doeblin-degradation], \((G_i,H_j) \rightarrow Y_{i,j}\) is stochastically degraded with respect to an erasure channel with erasure probability \({\mathcal{E}}_{i,j} \triangleq {\mathcal{E}}( (G_i,H_j) \rightarrow Y_{i,j} )\). Hence, we can write \[(G_i,H_j) \to \boxed{ \mathrm{EC}_{ {\mathcal{E}}_{i,j} }^{ {\bot}_{i,j}}} \to (G_i',H_j') \to Y_{i,j}.\] As a consequence the channel \(X \to \mathbf{Y}\) is stochastically degraded with respect to the channel \(X \to ((G_i',H_j'))_{0 \leqslant i,j \leqslant d}\). By the strengthened DPI (Lemma [prop:doeb-concat]) we have \({\mathcal{E}}(X \to \mathbf{Y}) \geqslant{\mathcal{E}}(X \to ((G_i',H_j'))_{0 \leqslant i,j \leqslant d})\). It only remains to show that \(\overline{\mathcal{E}}(X \to ((G_i',H_j'))_{0 \leqslant i,j \leqslant d}) = \Upsilon( (\overline{{\mathcal{E}}}_{i,j})_{0 \leqslant i,j \leqslant d} )\).

By the secret sharing property we have \(p(x | ((g_i',h_j'))_{i,j}) = p(x)\) whenever \(\exists i, \forall j, (g_i',h_j') = {\bot}_{i,j}\) and \(\exists j, \forall i, (g_i',h_j') = {\bot}_{i,j}\). In this case \[\sup \limits_{x \in \mathcal{X}} \left ( 1 - \frac{ p(x | ((g_i',h_j'))_{i,j}) }{p(x)} \right ) = 0.\] Otherwise, when \(\forall i, \exists j, (g_i',h_j') \neq {\bot}_{i,j}\) or \(\forall j, \exists i, (g_i',h_j') \neq {\bot}_{i,j}\), as in the previous proof, \[\sup \limits_{x \in \mathcal{X}} \left ( 1 - \frac{ p(x | ((g_i',h_j'))_{i,j}) }{p(x)} \right ) = 1.\] Let \(E_{i,j}\) be the event \((G_i',H_j') \neq {\bot}_{i,j}\). Then the \(E_{i,j}\) are mutually independent events with probabilities \(\overline{{\mathcal{E}}}_{i,j}\). Further we have \[\sup \limits_{x \in \mathcal{X}} \left ( 1 - \frac{ p(x | ((G_i',H_j'))_{i,j}) }{p(x)} \right ) = 1\] on the event \[E = (\cap_{i=0}^d \cup_{j=0}^d E_{i,j}) \cup (\cap_{j=0}^d \cup_{i=0}^d E_{i,j})\] and \(0\) otherwise. As a consequence, \[\overline{\mathcal{E}}(X \to ((G_i',H_j'))_{0 \leqslant i,j \leqslant d}) = \mathbb{E} \left [ {1}_{E} \right ] = \mathbb{P} (E).\]

Proof of Lemma [lemma:recombination-doeb]

By Theorem [thm:doeblin-degradation] the channel \((T_{i,j},T_{i,j-1}) \to Y_{i,j}\) rewrites \[(T_{i,j},T_{i,j-1}) \to \boxed{\mathrm{EC}_{ \mathcal{E}_{i,j} }^{{\bot}_{i,j}}} \to Y_{i,j}' \to Y_{i,j}\] where \(\mathcal{E}_{i,j} \triangleq \mathcal{E}( (T_{i,j},T_{i,j-1})\to Y_{i,j})\). Hence, \(T_{i,d} \to \mathbf{Y}_i\) is stochastically degraded with respect to \[T_{i,d} \to \mathbf{T}_i \to \boxed{ \prod_{j=0}^d \mathrm{EC}_{ \mathcal{E}_{i,j}}^{{\bot}_{i,j}}} \to \mathbf{Y}_i'.\] Thus, by the strengthened DPI (Lemma [prop:doeb-concat]) \({\mathcal{E}}( T_{i,d} \rightarrow \mathbf{Y}_i ) \geqslant{\mathcal{E}}( T_{i,d} \rightarrow \mathbf{Y}_i' )\). It remains to show that \({\mathcal{E}}( T_{i,d} \rightarrow \mathbf{Y}_i' ) = {\mathcal{E}}( (T_{i,d},T_{i,d-1}) \rightarrow Y_{i,d} )\). By the secret sharing property, if \(y_{i,d}' = {\bot}\) then \(p(t|y_{i,1}',\ldots,y_{i,d}') = p(t)\). Hence, \[\sup_{t} \left ( 1 - \frac{ p(t|y_{i,1}',\ldots,y_{i,d}')}{ p(t)} \right ) = 0\] in this case. Otherwise, \(y_{i,d} = (t_{i,d},t_{i,d-1})\) so that \(p(t|y_{i,1}',\ldots,y_{i,d}'=(t_{i,d},t_{i,d-1})) = {1}_{t=t_{i,d}}\). Hence, \[\sup_{t} \left ( 1 - \frac{ p(t|y_{i,1}',\ldots,y_{i,d}')}{ p(t)} \right ) = 1\] in this case. Finally, \[\overline{{\mathcal{E}}}(T_{i,d} \to \mathbf{Y}_i') = \mathbb{E} _{ \mathbf{Y}_i' }[{1}_{Y_{i,d}' \neq {\bot}}] = \mathbb{P} (Y_{i,d}' \neq {\bot})=\overline{{\mathcal{E}}}( (T_{i,d},T_{i,d-1}) \rightarrow Y_{i,d} ).\] This concludes the proof since \(\overline{{\mathcal{E}}}( (T_{i,d-1},V_{i,d}) \rightarrow Y_{i,d} ) = \overline{{\mathcal{E}}}( (T_{i,d},T_{i,d-1}) \rightarrow Y_{i,d} )\).

Proof of Theorem [thm:doeblin-pr]

Each sensitive variable in the different subsequences is a function \(f\) of the secret key \(K\) and public information \(T\), i.e., \(X = f(K,T)\). By Lemma [prop:doeb-concat], we have \(\overline{{\mathcal{E}}}(K \to Y) \leqslant\overline{{\mathcal{E}}}(X \to Y)\). Since the \(\mathrm{CDC}\) does not depend on the input distribution and, by Lemma [lemma:adaptive-single-letter], carries over to adaptively chosen channels, we can recombine the different leakages from the different queries and subsequences. Since the subsequences are separated by leakage-free refreshing and the channel noise is independent from one query to another, we can use Lemma [lemma:adaptive-single-letter] to combine the different leakages of each subsequence and query. Further, Lemmas [MGL:doeblin], [lemma:type3] and [lemma:type4] provide the bounds for each subsequence. The asymptotic equivalence is a consequence of Proposition [prop:upsilon].

Proof of Theorem [th:random-probing]

Let \(\mathbf{Y} = (Y_{j,i})\) be the vector of all leakages, where \(Y_{j,i}\) corresponds to the leakage of the \(l_i\) wires of the \(i\)-th gadget for the \(j\)-th trace. By Lemma [lemma:adaptive-single-letter] we obtain, for an adaptively chosen channel adversary, that \[\overline{\mathcal{E}}(K \to \mathbf{Y}) \leqslant 1 - \prod_{j=1}^q \prod_{i=1}^{|\Gamma|} \left ( 1 - \overline{\mathcal{E}}(K \to Y_{j,i}) \right ).\] It remains to show that \(\overline{\mathcal{E}}(K \to Y_{j,i}) \leqslant Q_B(t,l_i, {\overline{{\mathcal{E}}}})\) for all \((j,i)\). Let \(X_1,\ldots,X_{l_i}\) be the \(l_i\) considered wires and \(L_1,\ldots,L_{l_i}\) be the corresponding leakages for the \(j\)-th trace, \(Y_{j,i} = (L_1,\ldots,L_{l_i})\). The channel \(K \to Y_{j,i}\) is factorized by Theorem [thm:doeblin-degradation] to \[K \to (X_1,\ldots,X_{l_i}) \to (X_1',\ldots,X_{l_i}') \to Y_{j,i}\] where each \(X_k \to X_k'\) is an erasure channel with erasure probability \(\mathcal{E}\). By the DPI (Lemma [prop:doeb-concat]), \(\overline{\mathcal{E}}(K \to Y_{j,i}) \leqslant\overline{\mathcal{E}}(K \to (X_1',\ldots,X_{l_i}'))\).

Now \(\sup_{k} 1 - \frac{p(k|x_1',\ldots,x_{l_i}')}{p(k)} = 0\) provided that at most \(t\) values are non-erased in the tuple \((x_1',\ldots,x_{l_i}')\). Otherwise, \(\sup_{k} 1 - \frac{p(k|x_1',\ldots,x_{l_i}')}{p(k)} \leqslant 1\). Hence \(\overline{\mathcal{E}}(K \to (X_1',\ldots,X_{l_i}')) \leqslant \mathbb{E} _{X_1',\ldots,X_{l_i}'} {1}_E = \mathbb{P} (E)\) where \(E\) is the event that more than \(t\) values are non-erased in the tuple \((X_1',\ldots,X_{l_i}')\). Since the number of non-erased values follows a binomial distribution with parameters \(l_i\) and \(\overline{\mathcal{E}}\), we obtain \(\mathbb{P} (E) = Q_B(t,l_i, {\overline{{\mathcal{E}}}})\), which concludes the proof.

Proof of Proposition [prop-essentialy-decreasing]

As \({\overline{{\mathcal{E}}}} \rightarrow 0\) we have \[\mathrm{fail}(d_1,{\overline{{\mathcal{E}}}}) \sim \left (\sum_i \binom{ l_i(d_1) }{ t(d_1) + 1} \right ) \overline{{\mathcal{E}}}^{d_1} = C_{d_1} \overline{{\mathcal{E}}}^{d_1}\] and similarly for \(d_2\). As a consequence, \[\frac{ \mathrm{fail}(d_2,{\overline{{\mathcal{E}}}}) } {\mathrm{fail}(d_1,{\overline{{\mathcal{E}}}})} \sim \frac{C_{d_2}}{C_{d_1}} {\overline{{\mathcal{E}}}}^{d_2-d_1}.\] Since \(d_2 > d_1\), the ratio becomes strictly less than \(1\) whenever \({\overline{{\mathcal{E}}}} < \left ( \frac{C_{d_1}}{ C_{d_2}} \right )^{\frac{1}{d_2-d_1}}\). Hence \(\mathrm{fail}(d_1,{\overline{{\mathcal{E}}}}) > \mathrm{fail}(d_2,{\overline{{\mathcal{E}}}})\) if \({\overline{{\mathcal{E}}}}\) is small enough.

Proof of Proposition [prop:finite-masking-order]

We assume \({\overline{{\mathcal{E}}}} < 1\). We have \[Q_B( t(d), l(d) , {\overline{{\mathcal{E}}}}) = 1 - Q_B( l(d) - t(d) + 1 , l(d), {\overline{{\mathcal{E}}}} ).\] Since \(\frac{ l(d) }{ t(d) } \overset{ d \rightarrow \infty }{\longrightarrow} \infty\), we have \({\overline{{\mathcal{E}}}} l(d) < l(d) - t(d) + 1\) for \(d\) large enough, and we can apply the Chernoff-Hoeffding bound (Hoeffding 1994): \[1 \geqslant Q_B( t(d) , l(d) , {\overline{{\mathcal{E}}}}) \geqslant 1 - \mathrm{exp} \left ( - l(d) d_{\mathrm{KL}} \left ( \frac{l(d) - t(d) + 1 }{ l(d)} \middle\| {\overline{{\mathcal{E}}}} \right ) \right )\] where \(d_{ \mathrm{KL} }(p\|q) \triangleq D_{ \mathrm{KL} }( \mathcal{B}(p) \| \mathcal{B}(q) ) = p \log \frac{p}{q} + \bar{p} \log \frac{ \bar{p}}{ \bar{q}}\).

We have \(d_{\mathrm{KL}} \left ( \frac{ l(d) - t(d) + 1 }{ l(d) } \middle\| {\overline{{\mathcal{E}}}} \right ) \rightarrow d_{\mathrm{KL}}(1 \| {\overline{{\mathcal{E}}}}) > 0\) and \(l(d) \rightarrow \infty\), so by the sandwich theorem \(1-Q_B( t(d), l(d) , {\overline{{\mathcal{E}}}})\) converges exponentially fast to \(0\). Then \[0 \leqslant\prod_{i} \left ( 1 - Q_B(t(d),l_i(d), {\overline{{\mathcal{E}}}}) \right ) \leqslant\left ( 1 - Q_B(t(d), l(d), {\overline{{\mathcal{E}}}}) \right ).\] As a consequence, \(\mathrm{fail}(d,{\overline{{\mathcal{E}}}}) \overset{ d \rightarrow \infty }{\longrightarrow} 1\).
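The divergence of the failure probability can be observed numerically. The sketch below (illustrative non-erasure rate \(0.1\), quadratic gadget \(l(d)=3d^2\), \(t(d)=d\), and \(Q_B\) taken as the upper binomial tail) shows the tail tending to \(1\) as \(d\) grows:

```python
import math

def binom_tail(t, l, e):
    """P(Bin(l, e) >= t), summed in log space to avoid overflow for large l."""
    def log_pmf(k):
        return (math.lgamma(l + 1) - math.lgamma(k + 1) - math.lgamma(l - k + 1)
                + k * math.log(e) + (l - k) * math.log(1 - e))
    return sum(math.exp(log_pmf(k)) for k in range(t, l + 1))

# Quadratic gadget l(d) = 3 d^2 with threshold t(d) = d and a fixed
# non-erasure rate: l(d)/t(d) -> infinity, so the tail tends to 1.
rate = 0.1
tails = [binom_tail(d, 3 * d * d, rate) for d in (2, 5, 10, 20)]
assert tails[0] < tails[-1] and tails[-1] > 0.999
```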

Technical Properties of the Rook Domination Polynomial

[prop:upsilon] \(\Upsilon_d\) verifies the following properties:

  • Lower and upper bounds: \[\label{eq:annex-sandwich-doeb} \left ( 1 - {\mathcal{E}}^{d+1} \right )^{d+1} \leqslant\Upsilon_d(\overline{{\mathcal{E}}}) \leqslant\min \{ 2 \left ( 1 - {\mathcal{E}}^{d+1} \right )^{d+1} , 1 \}.\]

  • \(\Upsilon_d\) is a polynomial function of degree \((d+1)^2\). The multiplicity of zero as a root of \(\Upsilon_d\) is \(d+1\). Namely we can write: \[\Upsilon_d(\overline{{\mathcal{E}}} ) = \sum_{j = d+1 }^{(d+1)^2} a_j\overline{{\mathcal{E}}}^j {\mathcal{E}}^{(d+1)^2-j}.\]

  • For all \(j \in \{d+1,\ldots,(d+1)^2\}\), \(a_j \in \mathbb{N}\) and \[\begin{cases} a_{d+1} = 2 (d+1)^{d+1} - (d+1)! \\ 0 \leqslant a_k \leqslant\binom{(d+1)^2}{k} - \binom{d^2}{k} & d+2 \leqslant k \leqslant d^2 \\ a_{k}= \binom{(d+1)^2}{k} & k \geqslant d^2 + 1. \end{cases}\] In particular, \(\sum_{j = d+1 }^{(d+1)^2} a_j < 2^{(d+1)^2} - 2^{d^2}\).
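The sandwich bound [eq:annex-sandwich-doeb] can be checked by Monte-Carlo simulation of the leakage event \(E\) on the \((d+1)\times(d+1)\) grid (illustrative parameters \(d=2\), \({\mathcal{E}}=0.5\)):

```python
import random

random.seed(1)
d = 2            # (d+1) x (d+1) = 3 x 3 grid of bilinear leakages
E = 0.5          # per-cell erasure probability (illustrative)

def leaks():
    """Sample the event E: every row has a non-erased cell, or every column has one."""
    cell = [[random.random() > E for _ in range(d + 1)] for _ in range(d + 1)]
    rows = all(any(row) for row in cell)
    cols = all(any(cell[i][j] for i in range(d + 1)) for j in range(d + 1))
    return rows or cols

n = 100_000
upsilon_hat = sum(leaks() for _ in range(n)) / n
lower = (1 - E ** (d + 1)) ** (d + 1)
upper = min(2 * lower, 1.0)
assert lower - 0.02 <= upsilon_hat <= upper + 0.02   # estimate within the sandwich
```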

See Stephan Mertens’s article (Mertens 2024) on the domination polynomial of the rook graph.

Comparison with Other Noisiness Metrics

The lower bound is obtained via the DPI. Namely, we degrade the channel \(X \to Y\) as \[X \to \boxed{ \mathrm{EC}_{ {\mathcal{E}}}^{\bot} } \to X' \to Y\] where \({\mathcal{E}}= {\mathcal{E}}(X \to Y)\). Then by the DPI (Lemma [lemma:DPI-metrics]) we obtain \[I(X;X') \geqslant I(X;Y).\] Since \(I(X;X') = H(X) - H(X|X') = H(X) - {\mathcal{E}}H(X) - \bar{ {\mathcal{E}}} \cdot 0 = \bar{ {\mathcal{E}}} H(X)\), we obtain \[\bar{ {\mathcal{E}}} \geqslant\frac{I(X;Y)}{ H(X)} \geqslant\frac{I(X;Y)}{ \log |\mathcal{X}|}.\]

For \({\mathrm{TVI}}\) we proceed similarly to proposition [prop:doeblin-to-guessing]. By DPI Lemma [lemma:DPI-metrics], \(\Delta(X;X') \geqslant\Delta(X;Y)\) and \[\begin{aligned} \Delta(X;X') &= \sum_{ \mathcal{X}} p_{X'}(x) (1 - p_X(x)) \\ &= \bar{ {\mathcal{E}}} \sum_{ \mathcal{X}} p_X(x) (1 - p_X(x)) \\ &= \bar{ {\mathcal{E}}} \bigl ( 1 - \sum_{ \mathcal{X}} p_X(x)^2 \bigr ) \\ &= \bar{ {\mathcal{E}}} \bigl ( 1 - \mathrm{exp}( - H_2(X) ) \bigr ) \\ &\leqslant\bar{ {\mathcal{E}}} \bigl (1 - \frac{1}{ |\mathcal{X}|} \bigr ). \end{aligned}\] We obtain \[\bar{ {\mathcal{E}}} \geqslant\frac{\Delta(X;Y)}{ 1 - \mathrm{exp}( - H_2(X) ) } \geqslant\frac{\Delta(X;Y) |\mathcal{X}|}{ |\mathcal{X}| - 1 }.\]

For maximal leakage we can also leverage the DPI, \[\log(1+ (|\mathcal{X}|-1) \bar{ {\mathcal{E}}}(X \to Y) ) = \mathcal{L}(X \to X') \geqslant\mathcal{L}(X \to Y).\]

By Jensen’s inequality, as in Rivain’s HDR (Rivain 2022), we obtain that \(\beta(X;Y) \leqslant 2 \Delta(X;Y)\). Also, \[\begin{aligned} {\mathrm{ARE}}(X;Y) &= \mathbb{E} _{y} \sup_x \left | \frac{p_{X|Y}(x|y)}{p_X(x)} - 1 \right | \\ &= \int_{ y \in \mathcal{Y} } \sup_{x \in \mathcal{X}} \frac{1}{p_X(x)} \left | p_{X,Y}(x,y) - p_X(x) p_Y(y) \right | \\ &\leqslant 2 \gamma_X \Delta(X;Y). \end{aligned}\]

We have \[\overline{{\mathcal{E}}}(X \rightarrow Y) = 1 - \int_y \inf_x p(y|x) = \int_y \sup_x \left ( p(y) - p(y|x) \right ) = \mathbb{E} _{y} \left [ \sup_x \left ( 1 - \frac{p(y|x)}{p(y)} \right ) \right ].\] Hence, similarly to (Prest et al. 2019 Proposition 1), we obtain \[\overline{{\mathcal{E}}}(X \rightarrow Y) \leqslant \mathbb{E} _{y} \left [ \sup_x \left |1 - \frac{p(y|x)}{p(y)} \right | \right ] = \mathrm{ARE}(X;Y) \leqslant\mathrm{RE}(X;Y).\] For \(\mathrm{TVI}\) we simplify the derivations from (Duc, Dziembowski, and Faust 2014 Appendix C, Eqn. 14) and use Pinsker’s inequality as in (Duc, Faust, and Standaert 2019) to obtain a bound for \(\mathrm{MI}\). \[\begin{aligned} \overline{{\mathcal{E}}}(X \rightarrow Y) &\leqslant \mathbb{E} _{y} \left [ \sup_x \left (1 - \frac{p(y|x)}{p(y)} \right )^+ \right ] \\ &\leqslant \mathbb{E} _{y} \left [ \sum_x \left (1 - \frac{p(y|x)}{p(y)} \right )^+ \right ] \\ &= \mathbb{E} _{y} \left [ \sum_x \frac{1}{p(x)} \left (p(x) - p(x|y) \right )^+ \right ] \\ &\leqslant\frac{1}{ \inf \limits_{x \in \mathcal{X}} p_{X}(x)} \Delta(X;Y) \\ &\leqslant\frac{1}{ \inf \limits_{x \in \mathcal{X}} p_{X}(x)} \sqrt{ \frac{I(X;Y)}{2 \log e}}. \end{aligned}\] Finally, by adding positive terms, we obtain the bound for \(\mathrm{EN}\): \[\begin{aligned} \overline{{\mathcal{E}}}(X \rightarrow Y) &\leqslant \mathbb{E} _{y} \left [ \sqrt{ \sum_x \left (1 - \frac{p(y|x)}{p(y)} \right )^2 } \right ] \\ &= \mathbb{E} _{y} \left [ \sqrt{ \sum_x \frac{1}{p_X(x)^2 }\left ( p(x) - p(x|y) \right )^2 } \right ] \\ &\leqslant\frac{1}{ \inf \limits_{x \in \mathcal{X}} p_X(x)} \beta(X;Y). \end{aligned}\]

The upper bound in terms of maximal leakage is obtained as follows, let \(x_y\) be a value of \(x\) achieving \(\inf_x p(y|x)\) then \[\begin{aligned} \inf_x p(y|x) &= \sum_x p(y|x) - \sum_{x \neq x_y} p(y|x) \\ &\geqslant\sum_x p(y|x) - (|\mathcal{X}|-1) \sup_x p(y|x). \end{aligned}\] Summing over \(y\) yields the announced result.

Extensive State of the Art: Flaws and Patches

“Masking against Side-Channel Attacks: a Formal Security Proof”

In their seminal work, Prouff and Rivain (Prouff and Rivain 2013) introduced the noisy leakage model and provided for the first time a direct proof of security for masking in this model. The proofs of the technical lemmas used in (Prouff and Rivain 2013) are available in Rivain’s HDR (Rivain 2022). However, we identified a critical flaw in (Prouff and Rivain 2013 Lemma 4), which is used to derive (Prouff and Rivain 2013 Thm. 3) and the main theorem. Recall (Prouff and Rivain 2013 Lemma 4):

Let \(A,B\) be two uniform and mutually independent random variables defined over \(\mathbb{F}\). Let \(L = f(A,B)\) be a leakage corresponding to the side-channel \(f : (a,b) \in \mathbb{F}^2 \mapsto l \in \mathcal{L}\). Then for every \(a,b \in \mathbb{F}\) and \(l \in \mathcal{L}\) we have \[\beta( A ; f(A,b) = l ) \leqslant|\mathbb{F}| \beta( (A,B) ; L = l).\]

If \(\mathbb{F}= \mathbb{F}_2\) and \(L = f(A,B) = A \cdot B\) then \[|\mathbb{F}| \beta((A,B) ; L=0) = \frac{1}{\sqrt{3}} < \frac{1}{\sqrt{2}} = \beta(A ; f(A,1)=0),\] which strictly violates the statement of (Prouff and Rivain 2013 Lemma 4).

If \(f(A,1) =0\) then \(A = 0\) and \(\beta^2(A; f(A,1)=0)=\frac{1}{2}\). If \(L=0\) then there are three equally likely cases \((A,B) \in \{ (0,0); (0,1) ; (1,0) \}\), hence \(|\mathbb{F}|^2 \beta^2((A,B);L=0) = 4 \left(3 \left(\frac{1}{3} - \frac{1}{4}\right)^2 + \left(\frac{1}{4}\right)^2 \right) =\frac{1}{3}\).
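The counter-example can be verified numerically. The sketch below takes \(\beta(X; Y=y)\) as the Euclidean distance between the posterior and the prior, matching the computation above:

```python
import math

def beta(prior, posterior):
    """beta(X ; Y=y) as the Euclidean distance between posterior and prior."""
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(prior, posterior)))

# A, B uniform on F_2, L = A * B.  Given L = 0, the three pairs
# (0,0), (0,1), (1,0) are equally likely and (1,1) is excluded.
prior_AB = [0.25] * 4                  # order: (0,0), (0,1), (1,0), (1,1)
post_AB = [1 / 3, 1 / 3, 1 / 3, 0.0]
lhs = 2 * beta(prior_AB, post_AB)      # |F| beta((A,B) ; L = 0) = 1/sqrt(3)

# Given f(A, 1) = A = 0, the posterior of A is deterministic.
rhs = beta([0.5, 0.5], [1.0, 0.0])     # beta(A ; f(A,1) = 0) = 1/sqrt(2)

assert abs(lhs - 1 / math.sqrt(3)) < 1e-12
assert abs(rhs - 1 / math.sqrt(2)) < 1e-12
assert lhs < rhs                       # the lemma's inequality fails strictly
```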

The error comes from a flawed statement of the chain rule for conditional probabilities in the third equation of the proof of the lemma. Indeed, it is not true that \(p(a|bl) p(b) = p(ab|l)\), since \(p(b|l) \neq p(b)\) in general. We provided a counter-example to (Prouff and Rivain 2013 Lemma 4). As a consequence, the proof of Theorem 3 in (Prouff and Rivain 2013), recalled below, is also flawed.

Let \(A\) and \(B\) be two random variables uniformly distributed over some finite set \(\mathcal{X}\). Let \(d\) be a positive integer, and let \((A_i)_i\) and \((B_j)_j\) be two \(d\)th-order encodings of \(A\) and \(B\), respectively. Let \(\mathcal{E}\) be a real number such that \(\mathcal{E} \leqslant\frac{\alpha}{(d+1) |\mathcal{X}|^2}\) for some \(\alpha \in (0,1]\) and let \((f_{i,j})_{i,j}\) be noisy leakage functions defined over \(\mathcal{X} \times \mathcal{X}\) and belonging to \(\mathcal{N}(\mathcal{E})\). We have: \[\beta( (A,B) ; (f_{i,j}(A_i,B_j))_{i,j}) \leqslant 2 |\mathcal{X}|^{ \frac{3d+2}{2} } ( \lambda(\mathcal{E},d) \mathcal{E})^{d+1}\] where \(\lambda(\mathcal{E},d) = \inf_{ \alpha \in [(d+1) |\mathcal{X}|^2 \mathcal{E}, 1]} \frac{e^\alpha-1}{\alpha} d + \frac{e^\alpha-1}{\alpha} + e^\alpha\).

We propose the following patch to their main theorem based on our derivations. A weakened version of Theorem [thm:doeblin-pr] using Lemma [lemma:sandwich-bound] yields:

\[\begin{aligned} \frac{1}{2} \left (\frac{ |\mathcal{X}|}{ |\mathcal{X}|-1} \right ) &\beta(X; \mathbf{Y}) \leqslant\overline{{\mathcal{E}}}( X \rightarrow \mathbf{Y} ) \\ &\leqslant 1 - \left ( \left ( 1- \overline{{\mathcal{E}}}^{d+1} \right )^{n_1 + n_2 + n_4} \left ( 1 - \Upsilon_d(\overline{{\mathcal{E}}}) \right )^{n_3} \right )^q \\ & \leqslant 1 - \left ( \left ( 1- (|\mathcal{X}| \beta)^{d+1} \right )^{n_1 + n_2 + n_4} \left ( 1 - \Upsilon_d( |\mathcal{X}|^2 \beta) \right )^{n_3} \right )^q \\ &\leqslant q \left ( (n_1 + n_2 + n_4) + 2 n_3 (d+1)^{d+1} |\mathcal{X}|^{d+1} \right ) |\mathcal{X}|^{d+1} \beta^{d+1}.\end{aligned}\]

“Unifying Leakage Models on a Rényi Day”

The proof of (Prest et al. 2019 Lemma 8) is flawed, which also affects (Prest et al. 2019 Thm. 6, Corollary 4) and the main result. Recall Lemma 8 from (Prest et al. 2019):

Let \(A,B\) be two independent random variables over a finite field \(\mathbb{F}\) of cardinality \(|\mathbb{F}|\). Let \(f : (a,b) \in \mathbb{F}^2 \mapsto l \in \mathcal{L}\) be a side-channel for \((A,B)\), i.e., \(L = f(A,B)\). Then for every \(b \in \mathbb{F}\), \[{\mathrm{RE}}(A | f(A,b) ) \leqslant{\mathrm{RE}}( (A,B) | L).\]

The flaw is similar to that in Lemma 4 of (Prouff and Rivain 2013): the chain rule of probability is applied incorrectly, since \(p(b|l) \neq p(b)\). As a consequence, (Prest et al. 2019 Thm. 6), recalled below, is unfortunately incorrect.

Let \(A\), \(B\) be two uniform random variables over a finite field \(\mathcal{X}\), let \(d\) be a positive integer, and let \((A_i)\), \((B_i)\) be \((d+1)\)-additive sharings of \(A\) and \(B\) respectively. Let \(\delta_{\mathrm{RE}} \in \mathbb{R}\) be such that \(\delta_{\mathrm{RE}} \leqslant\frac{1}{2d+1}\), and let \((f_{i,j})_{i,j}\) be a family of randomized and mutually independent functions such that each \(f_{i,j}\) is \(\delta_{\mathrm{RE}}\)-noisy with respect to \(\mathrm{RE}\). We have: \[{\mathrm{RE}}( (A,B) | (f_{i,j}(A_i,B_j))_{i,j}) \leqslant 3 \left ( \frac{(d+1) \delta_{\mathrm{RE}}}{1-d \delta_{\mathrm{RE}}} \right )^{d+1}.\]

We revise the asymptotic evaluation of \(\mathrm{RE}\) in (Prest et al. 2019 Prop. 3):

Let \(\mathcal{X} = \mathbb{F}_2^n\) and consider \(X \sim \mathcal{U}(\mathcal{X})\). Let \(Y = w_H(X) + Z\) where \(Z \sim \sigma \mathcal{N}(0,1)\) is an independent additive Gaussian noise. Then \[{\mathrm{RE}}(X;Y) = 2^n - 1 = |\mathcal{X}| - 1.\]

First, \(p_Y(y) \sim \frac{1}{2^n} \binom{n}{n} \varphi_{\sigma}(y - n)\) as \(y \rightarrow \infty\). Second, for \(x=(1,\ldots,1)\) we have \(p_{Y|X}(y|x) = \varphi_{\sigma}(y-n)\). Hence \(\frac{ p_{Y|X}(y|x)}{p_Y(y)} \rightarrow 2^n\) as \(y \rightarrow \infty\). By the sandwich theorem, \(2^n - 1 \geqslant\sup_x \left | \frac{ p_{Y|X}(y|x)}{p_Y(y)} - 1 \right | \geqslant\frac{ p_{Y|X}(y|(1,\ldots,1))}{p_Y(y)} - 1\), and the latter tends to \(2^n - 1\) as \(y \rightarrow \infty\). This in turn implies that \({\mathrm{RE}}(X;Y) = \sup_y \sup_x \left | \frac{ p_{Y|X}(y|x)}{p_Y(y)} - 1 \right | = 2^n - 1 = |\mathcal{X}|-1\).
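This blow-up of the likelihood ratio is easy to observe numerically. A minimal sketch, with illustrative parameters \(n=4\) and \(\sigma=1/2\) of our choosing:

```python
from math import comb, exp, pi, sqrt

def phi(y, sigma):
    """Gaussian density with standard deviation sigma."""
    return exp(-y * y / (2 * sigma * sigma)) / (sigma * sqrt(2 * pi))

def likelihood_ratio(y, n, sigma):
    """p_{Y|X}(y | x = (1,...,1)) / p_Y(y) for Y = w_H(X) + sigma * N(0,1)."""
    p_y = sum(comb(n, w) * phi(y - w, sigma) for w in range(n + 1)) / 2 ** n
    return phi(y - n, sigma) / p_y

n, sigma = 4, 0.5
# Far in the right tail the ratio approaches 2**n, so RE(X;Y) = 2**n - 1 = 15.
print(likelihood_ratio(n + 8, n, sigma))  # ≈ 16.0 = 2**n
```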

A potential fix would be to redefine \(\mathrm{RE}\) so as to exclude the worst values of \(y\), which are extremely unlikely.

We relax the definition of \(\mathrm{RE}\) to allow it to be large on a set whose probability is less than \(\mathcal{E}\), \[{\mathrm{RE}}_{{\mathcal{E}}}(X;Y) \triangleq \inf_{ \substack{ A \subseteq \mathcal{Y} \\ \mathbb{P} (Y \in A^c) \leqslant{\mathcal{E}}}} \sup_{ y \in A} \sup_{ x \in \mathcal{X} } \left | \mathrm{PMI}(x;y) - 1 \right |.\]

However, this patch would require re-deriving all results in terms of the new definition. We instead suggest the following patch in terms of \(\mathrm{ARE}\), weakening Theorem [thm:doeblin-pr] with Lemma [lemma:sandwich-bound]:

\[\begin{aligned} \frac{1}{ 2 (|\mathcal{X}| - 1 )} {\mathrm{ARE}}(X; \mathbf{Y}) &\leqslant\overline{{\mathcal{E}}}( X \rightarrow \mathbf{Y} ) \\ &\leqslant 1 - \left ( \left ( 1- \overline{{\mathcal{E}}}^{d+1} \right )^{n_1 + n_2 + n_4} \left ( 1 - \Upsilon_d(\overline{{\mathcal{E}}}) \right )^{n_3} \right )^q \\ &\leqslant 1 - \left ( \left ( 1- \delta_{{\mathrm{ARE}}}^{d+1} \right )^{n_1 + n_2 + n_4} \left ( 1 - \Upsilon_d(\delta_{{\mathrm{ARE}}}) \right )^{n_3} \right )^q .\end{aligned}\]

“Prouff & Rivain’s Formal Security Proof of Masking, Revisited”

The proof of (Masure and Standaert 2023 Thm. 5) is flawed, which affects the main result (Masure and Standaert 2023 Thm. 7). In the last step of the proof, the bound is averaged over the values of \(\mathbf{b}\) in equation (27). However, the \(i\)-th coordinate depends on all of \(b_0,\ldots,b_d\) and not only on \(b_i\); indeed, it is \(A_i\) that is fixed on this coordinate. Hence the average on the \(i\)-th coordinate should be over \(b_0,\ldots,b_d\) and not only over \(b_i\). But the inequality cannot be obtained this way, because the bound provided by Mrs. Gerber's lemma is convex in one variable when the others are fixed, but is not jointly convex. Recall (Masure and Standaert 2023 Thm. 5):

Let \(A,B\) be two independent and uniform random variables over a finite field \(\mathbb{F}\). Let \((A_i)_{0 \leqslant i \leqslant d},(B_i)_{0 \leqslant i \leqslant d}\) be \(d\)-encodings of \(A\) and \(B\) respectively. Let \(L_{i,j}\) be the side-channel for the cross term \(A_i,B_j\), i.e., we observe \(L_{i,j}(A_i,B_j)\). Let \(\mathbf{L}=(L_{i,j})_{0\leqslant i,j \leqslant d}\). Further, it is assumed that \[I((A_i,B_j); L_{i,j}(A_i,B_j)) \leqslant\delta_{i,j} \leqslant\delta_{\mathrm{MI}}\leqslant 1.\] Then, \[\begin{aligned} I((A,B); \mathbf{L}) &\leqslant f_{\mathrm{MGL}}( \sum_j \delta_{0,j}, \ldots, \sum_j \delta_{d,j}) + f_{\mathrm{MGL}}( \sum_i \delta_{i,0}, \ldots, \sum_i \delta_{i,d}) \end{aligned}\]

We suggest the following patch based on the \(\mathrm{CDC}\). A weakened version of Theorem [thm:doeblin-pr] using Lemma [lemma:sandwich-bound] yields:

\[\begin{aligned} \frac{I(X; \mathbf{Y})}{ \log |\mathcal{X}|} &\leqslant\overline{{\mathcal{E}}}( X \rightarrow \mathbf{Y} ) \\ &\leqslant 1 - \left ( \left ( 1- \overline{{\mathcal{E}}}^{d+1} \right )^{n_1 + n_2 + n_4} \left ( 1 - \Upsilon_d \left (\overline{{\mathcal{E}}} \right ) \right )^{n_3} \right )^q \\ &\leqslant 1 - \left ( \left ( 1 - \left ( \delta_{{\mathrm{CDC}}}^A \right )^{d+1} \right )^{n_1 + n_2 + n_4} \left ( 1 - \Upsilon_d \left ( \delta_{{\mathrm{CDC}}}^B \right ) \right )^{n_3} \right )^q\end{aligned}\] where \[\begin{cases} \delta_{{\mathrm{CDC}}}^A = \min \left ( 1, |\mathcal{X}| \left ( \frac{ \delta_{{\mathrm{MI}}}}{2 \log e} \right )^{\frac{1}{2}} \right ) \\ \delta_{{\mathrm{CDC}}}^B = \min \left ( 1, |\mathcal{X}|^2 \left ( \frac{ \delta_{{\mathrm{MI}}}}{2 \log e} \right )^{\frac{1}{2}} \right ). \end{cases}\]

The choice of the uniform prior in the definition of a noisy adversary is arbitrary, but makes sense for most cryptographic sensitive variables. However, replacing the uniform prior by the maximum over all input distributions, so as to obtain the analog of a communication “capacity” (Cover and Thomas 2006), is also meaningful.

The capacity of the channel \(X \to \boxed{ P_{Y|X} } \to Y\) is \[C(X \to Y) \triangleq \sup_{p_X} I(X;Y).\]
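For a finite channel this supremum can be computed numerically with the Blahut–Arimoto iteration. A minimal sketch of ours (neither the algorithm below nor the BSC test channel is taken from the papers under discussion; they only illustrate the definition):

```python
import math

def blahut_arimoto(W, iters=2000):
    """Capacity in bits of a discrete channel W[x][y] = P(Y = y | X = x)."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx  # start from the uniform prior
    q = [[0.0] * ny for _ in range(nx)]
    for _ in range(iters):
        # q[x][y]: posterior of X given Y = y under the current prior p
        for y in range(ny):
            col = sum(p[x] * W[x][y] for x in range(nx))
            for x in range(nx):
                q[x][y] = p[x] * W[x][y] / col if col > 0 else 0.0
        # p[x] proportional to exp(sum_y W[x][y] * ln q[x][y])
        r = [math.exp(sum(W[x][y] * math.log(q[x][y])
                          for y in range(ny) if W[x][y] > 0)) for x in range(nx)]
        total = sum(r)
        p = [v / total for v in r]
    # at convergence, capacity = max_x sum_y W[x][y] * log2(q[x][y] / p[x])
    return max(sum(W[x][y] * math.log2(q[x][y] / p[x])
                   for y in range(ny) if W[x][y] > 0) for x in range(nx))

# Binary symmetric channel with crossover 0.1: capacity = 1 - h2(0.1) bits
h2 = lambda e: -e * math.log2(e) - (1 - e) * math.log2(1 - e)
print(blahut_arimoto([[0.9, 0.1], [0.1, 0.9]]), 1 - h2(0.1))
```

Each iteration alternates between the optimal posterior for the current prior and the prior that maximizes mutual information for the current posterior, so the iterates increase towards \(C(X \to Y)\).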

As pointed out by Masure & Standaert (Masure and Standaert 2023), we can bound the capacity (for \(\mathrm{MI}\)) by a function of the \(\delta\)-noisiness using results on the universality of the uniform prior (Shulman and Feder 2004). This roughly leads to an overhead of \(O(|\mathcal{X}|)\) in the worst case, though. Note that, like the \(\mathrm{CDC}\), the channel capacity is a property of the channel (hence the notation with the arrow). This provides another patch for (Masure and Standaert 2023 Thm. 5).

Let \(A,B\) be two independent and uniform random variables over a finite field \(\mathbb{F}\). Let \((A_i)_{0 \leqslant i \leqslant d},(B_i)_{0 \leqslant i \leqslant d}\) be \(d\)-encodings of \(A\) and \(B\) respectively. Consider the side-channels \(((A_i,B_j) \to Y_{i,j})_{0 \leqslant i,j \leqslant d}\). Further, assume that the adversary is a \(\delta_C\)-capacity-noisy adversary (i.e., \(C( (A_i,B_j) \to Y_{i,j} ) \leqslant\delta_C\) for all \(i,j\)). Then, \[\begin{aligned} I((A,B); \mathbf{L}) \leqslant 2 f_{ \mathrm{MGL}}( (d+1) \delta_C,\ldots, (d+1) \delta_C). \end{aligned}\]

We upper-bound every term within the MGL function in (Masure and Standaert 2023 Eqn. 27) by \((d+1) \delta_C\). We no longer need to take the average, since we upper-bounded  (Masure and Standaert 2023 Eqn. 27) uniformly in \(\mathbf{b}\).

Also, (Masure and Standaert 2023 Coro. 2) can be slightly improved:

Let \(Y\) be uniformly distributed over \(\mathbb{F}\). Let \(Y \to Y^k \to L\) be a channel that is \(\delta_{\mathrm{MI}}\)-noisy with respect to \(\mathrm{MI}\). Then, \[I(Y; L ) \leqslant{\mathrm{GCD}}(k, |\mathcal{X}|-1) \delta_{\mathrm{MI}}\] where we removed the \(\frac{|\mathcal{X}|-1}{|\mathcal{X}|}\) overhead factor from the original lemma.

Indeed, for \(s = y^k\), \(s \neq 0\), \[\mathbb{P} (Y^k=s | Y \neq 0) = \frac{ {\mathrm{GCD}}(k,|\mathcal{X}|-1)}{|\mathcal{X}|-1}.\] Hence, \[\begin{aligned} \mathbb{P} (Y^k=s) &= \mathbb{P} (Y \neq 0) \mathbb{P} (Y^k=s | Y \neq 0) \\ &= \frac{|\mathcal{X}|-1}{|\mathcal{X}|} \frac{ {\mathrm{GCD}}(k,|\mathcal{X}|-1)}{|\mathcal{X}|-1} \\ &= \frac{ {\mathrm{GCD}}(k,|\mathcal{X}|-1)}{|\mathcal{X}|}.\end{aligned}\] We can conclude using the original proof from (Masure and Standaert 2023 Coro. 2).
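The counting step can be checked by brute force on a small prime field; the choice \(p=11\), \(k=4\) below is ours, for illustration only:

```python
from collections import Counter
from math import gcd

def power_map_distribution(p, k):
    """Distribution of Y**k mod p when Y is uniform over the field F_p."""
    counts = Counter(pow(y, k, p) for y in range(p))
    return {s: c / p for s, c in counts.items()}

p, k = 11, 4
dist = power_map_distribution(p, k)
g = gcd(k, p - 1)
# Each nonzero s in the image of y -> y**k is hit with probability g / p.
assert all(abs(pr - g / p) < 1e-12 for s, pr in dist.items() if s != 0)
assert abs(dist[0] - 1 / p) < 1e-12
print(f"P(Y^{k} = s) = {g}/{p} for every nonzero s in the image")
```

The multiplicative group of \(\mathbb{F}_p\) is cyclic of order \(p-1\), so \(y \mapsto y^k\) is exactly \(\mathrm{GCD}(k,p-1)\)-to-one on nonzero elements, as the assertions confirm.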

Now we can state a revisited (Masure and Standaert 2023 Thm. 7):

Consider a \(\delta_C\)-noisy adversary with respect to capacity on the same setting of (Masure and Standaert 2023 Thm. 7). \[I( Y , \mathbf{L} ) \leqslant t_3 2 f_{ \mathrm{MGL} }( (d\!+\!1) \delta_C, \ldots, (d\!+\!1) \delta_C) \!+\! t_{1,2,4} f_{ \mathrm{MGL} }( \delta_C, \ldots, \delta_C)\] where \[t_3 = \sum_{p,q \in \mathcal{M} } \varphi(p,q,|\mathcal{Y}|), \qquad t_{1,2,4} = \sum_{p,q \in \mathcal{M} } \xi(p,q,|\mathcal{Y}|) + \sum_{k \in \mathcal{S}} \psi(k,|\mathcal{Y}|)\] and \[\begin{cases} \varphi(p,q,|\mathcal{Y}|) = |\mathcal{Y}| \min \{ {\mathrm{GCD}}(p,|\mathcal{Y}|-1) , {\mathrm{GCD}}(q,|\mathcal{Y}|-1) \} \\ \xi(p,q,|\mathcal{Y}|) = {\mathrm{GCD}}(p+q,|\mathcal{Y}|-1) \\ \psi(k,|\mathcal{Y}|) = {\mathrm{GCD}}(k,|\mathcal{Y}|-1). \end{cases}\]

“Making Masking Security Proofs Concrete”

We revise several claims from (Duc, Faust, and Standaert 2019). Recall (Duc, Faust, and Standaert 2019 Thm. 2):

Let \(d\) be the number of shares used for a key encoding, \(m\) be the number of measurements, and \(I(Y_i ;\mathbf{L}_{Y_i}) \leqslant\delta_{{\mathrm{MI}}}\) for some \(\delta_{\mathrm{MI}}\leqslant\frac{2}{|\mathcal{X}|^2}\). Then, if we refresh the encoding in a leak-free manner between each measurement, the success probability of a key recovery adversary under independent leakage satisfies: \[\label{eqn-flawed-dfs} \mathbb{P} _s \leqslant 1 - \left ( 1 - \left ( |\mathcal{X}| \sqrt{ \frac{\delta_{\mathrm{MI}}}{2} } \right )^d \right )^m.\]

Equation [eqn-flawed-dfs] is slightly incorrect: even when the side-information is “erased,” the adversary still guesses the secret with probability \(\frac{1}{|\mathcal{X}|}\). Also, the unit of the \({\mathrm{MI}}\) is implicitly nats here. We prefer to normalize the inequality by \(\log e\), so that the result holds in any base, and to make explicit that \(\frac{\delta_{\mathrm{MI}}}{\log e}\) is dimensionless:

\[\mathbb{P} _s \leqslant 1 - \left ( 1 - \left ( |\mathcal{X}| \sqrt{ \frac{\delta_{\mathrm{MI}}}{2 \log e} } \right )^d \right )^m \left ( 1 - \frac{1}{|\mathcal{X}|} \right ).\]

As in (Duc, Faust, and Standaert 2019), the adversary is reduced to a random probing adversary with probing probability \(\overline{\mathcal{E}} = |\mathcal{X}| \sqrt{ \frac{\delta_{\mathrm{MI}}}{2 \log e} }\). The adversary probes all \(d\) shares in a given query with probability \(\overline{\mathcal{E}}^d\), hence probes all shares in none of the \(m\) queries with probability \(\left (1 - \overline{\mathcal{E}}^d \right )^m\). In this case the \(\mathrm{SR}\) is \(\frac{1}{|\mathcal{X}|}\); otherwise it is at most \(1\), so that \[\mathbb{P} _s \leqslant\left (1 - \overline{\mathcal{E}}^d \right )^m \frac{1}{|\mathcal{X}|} + 1 - \left (1 - \overline{\mathcal{E}}^d \right )^m = 1 - \left (1 - \overline{\mathcal{E}}^d \right )^m \left ( 1 - \frac{1}{|\mathcal{X}|} \right ).\]
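Under the simplifying assumption that a fully probed query always reveals the key, this expression is exact, which a direct Monte-Carlo simulation confirms (parameter values and function names below are illustrative, not from the paper):

```python
import random

def success_bound(eps, d, m, X):
    """Revised bound: 1 - (1 - eps**d)**m * (1 - 1/X)."""
    return 1 - (1 - eps ** d) ** m * (1 - 1 / X)

def simulate(eps, d, m, X, trials=200_000, seed=1):
    """Empirical success rate: the adversary wins outright if some query
    reveals all d shares, and otherwise guesses with probability 1/X."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        probed_all = any(all(rng.random() < eps for _ in range(d))
                         for _ in range(m))
        wins += 1 if probed_all else (rng.random() < 1 / X)
    return wins / trials

eps, d, m, X = 0.5, 2, 4, 4
print(simulate(eps, d, m, X), success_bound(eps, d, m, X))
```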

(Duc, Faust, and Standaert 2019 Coro. 1) is revised accordingly:

In the same setting as (Duc, Faust, and Standaert 2019 Coro. 1) we have: \[0 \leqslant \mathbb{P} _s - \frac{1}{|\mathcal{X}|} \leqslant m \left ( 1 - \frac{1}{|\mathcal{X}|} \right ) \mathrm{exp} \left ( - \alpha d \right )\] where \[\alpha = - \log \left ( |\mathcal{X}| \sqrt{ \frac{\delta_{\mathrm{MI}}}{2 \log e} } \right ) > 0.\]

“Unifying Leakage Models: from Probing Attacks to Noisy Leakage”

Tightness

Recall Theorem 1 from (Duc, Dziembowski, and Faust 2014).

Let \(\Gamma\) be an arbitrary stateful arithmetic circuit over some field \(\mathbb{F}\). Let \(\Gamma'\) be the circuit derived from \(\Gamma\) using the ISW compiler. Then \(\Gamma'\) is a \((\delta, |\Gamma| \mathrm{exp}( - \frac{d}{12}) )\)-noise-resilient implementation of \(\Gamma\) where \[\delta \triangleq ((28d + 16) |\mathbb{F}|)^{-1} = O( (d |\mathbb{F}|)^{-1}).\]

As pointed out by Masure & Standaert in (Masure and Standaert 2023), (Duc, Dziembowski, and Faust 2014 Thm. 1) is limited in the sense that the security bound does not depend on the noise level: it only depends on the masking order. Further, the number of shares required to obtain a negligible probability of error is prohibitive. Indeed, the bound on the adversary’s advantage is \[|\Gamma| \mathrm{exp} \left ( - \frac{d}{12} \right ).\] Since the advantage is always upper-bounded by \(1\), this bound is non-trivial only if \(|\Gamma| e^{ - \frac{d}{12} } \leqslant 1\), which happens when \(d \geqslant\left\lceil 12 \ln |\Gamma| \right\rceil\). If we consider a very minimalistic implementation with only \(|\Gamma|=2\) gadgets, then we already need the impractical masking order \(d \geqslant 9\). This non-tightness essentially comes from the use of a concentration inequality, applied in a non-asymptotic context to the number of wires \(l\) in a gadget. (Duc, Dziembowski, and Faust 2014) is improved by removing this concentration inequality.
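The crossover point quoted above can be checked directly (the function name is ours, for illustration):

```python
import math

def min_useful_order(num_gadgets):
    """Smallest d with num_gadgets * exp(-d/12) <= 1."""
    d = 0
    while num_gadgets * math.exp(-d / 12) > 1:
        d += 1
    return d

# For |Gamma| = 2 the bound becomes non-trivial only at d = ceil(12 ln 2) = 9.
assert min_useful_order(2) == math.ceil(12 * math.log(2)) == 9
print(min_useful_order(2))  # 9
```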

Random Probing to t-Threshold Probing.

(Duc, Dziembowski, and Faust 2014 Lemma 4) shows that security in the random probing model can be reduced to security in the \(t\)-threshold probing model using a Markov-style argument. The original proof contains two minor flaws. First, the simulator presented in (Duc, Dziembowski, and Faust 2014 Appendix E) is not \(2 {\overline{{\mathcal{E}}}} l -1\)-threshold probing but \(\lceil 2 {\overline{{\mathcal{E}}}} l - 1 \rceil\)-threshold probing. Second, (Duc, Dziembowski, and Faust 2014 Equation 23) does not hold; we clarify why in our proof. We revise (Duc, Dziembowski, and Faust 2014 Lemma 4) as follows:

Let \(\mathcal{A}\) be an \({\overline{{\mathcal{E}}}}\)-random-probing adversary on \(\mathcal{X}^l\). Then for all \(t \in \mathbb{N}\), there exists a \(t\)-threshold probing adversary \(\mathcal{S}\) on \(\mathcal{X}^l\) such that for every \((x_1,\ldots,x_l) \in \mathcal{X}^l\) we have, \[\Delta( \mathrm{out}_{\mathcal{A}}(x_1,\ldots,x_l) ; \mathrm{out}_{\mathcal{S}}(x_1,\ldots,x_l)) \leqslant Q_B(t,l,{\overline{{\mathcal{E}}}})\] where \(Q_B(\cdot,l,{\overline{{\mathcal{E}}}})\) is the survival function of the binomial distribution with parameters \((l,{\overline{{\mathcal{E}}}})\).

We follow the line of proof of  (Duc, Dziembowski, and Faust 2014). Let \(\mathbf{x} \triangleq (x_1,\ldots,x_l) \in \mathcal{X}^l\). As explained in (Duc, Dziembowski, and Faust 2014), we can assume without loss of generality that the adversary outputs all the information that he obtains, i.e., \[\mathrm{out}_{\mathcal{A}}( \mathbf{x}) = \left ( \varphi_1(x_1), \ldots, \varphi_l(x_l) \right )\] where \(\varphi_1,\ldots,\varphi_l\) are erasure channels with probability of erasure \(1-{\overline{{\mathcal{E}}}}\). We show that \(\mathcal{A}\) can be simulated by a \(t\)-threshold probing adversary \(\mathcal{S}\). First, \(\mathcal{S}\) draws \(l\) i.i.d. Bernoulli random variables \(B_1,\ldots,B_l\) with parameter \({\overline{{\mathcal{E}}}}\). Let \(S_l = B_1 + \cdots + B_l\), so that \(S_l \sim \mathcal{B}(l,{\overline{{\mathcal{E}}}})\).

  • If \(S_l \leqslant t\) then \(\mathcal{S}\) queries the values \(x_i\) such that \(B_i = 1\). Then \(\mathcal{S}\) outputs the vector \((y_1,\ldots,y_l)\) where \(y_i=x_i\) if \(B_i=1\) and \(y_i=\bot\) if \(B_i=0\).

  • Else (\(S_l > t\)), \(\mathcal{S}\) fails and outputs the error vector \((\bot,\ldots,\bot)\).

Let \(N_{\overline{{\mathcal{E}}}}\) be the random variable that counts the number of values that \(\mathcal{A}\) receives without erasure (i.e., \(l-N_{\overline{{\mathcal{E}}}}\) is the number of erasures that \(\mathcal{A}\) receives). The distribution of \(\mathrm{out}_\mathcal{A}( \mathbf{x})\) given the event \((N_{\overline{{\mathcal{E}}}} \leqslant t)\) is equal to the distribution of \(\mathrm{out}_\mathcal{S}( \mathbf{x})\) given the event \((S_l \leqslant t)\). Let \(\Delta \triangleq \Delta( \mathrm{out}_{\mathcal{A}}( \mathbf{x}) ; \mathrm{out}_{\mathcal{S}}( \mathbf{x}) )\) and \(y \in \mathcal{Y} \triangleq \left ( \mathcal{X} \cup \{ \bot \} \right )^l\). By the law of total probability, \(\mathbb{P} ( \mathrm{out}_{\mathcal{A}}( \mathbf{x}) = y) = \mathbb{P} ( N_{\overline{{\mathcal{E}}}} \leqslant t ) \mathbb{P} ( \mathrm{out}_{\mathcal{A}}( \mathbf{x}) = y | N_{\overline{{\mathcal{E}}}} \leqslant t) + \mathbb{P} ( N_{\overline{{\mathcal{E}}}} > t ) \mathbb{P} ( \mathrm{out}_{\mathcal{A}}( \mathbf{x}) = y | N_{\overline{{\mathcal{E}}}} > t),\) and \(\mathbb{P} ( \mathrm{out}_{\mathcal{S}}( \mathbf{x} ) = y) = \mathbb{P} ( S_l \leqslant t ) \mathbb{P} ( \mathrm{out}_{\mathcal{S}}( \mathbf{x} ) = y | S_l \leqslant t) + \mathbb{P} ( S_l > t ) \mathbb{P} ( \mathrm{out}_{\mathcal{S}}( \mathbf{x}) = y | S_l > t).\) Further, \[\begin{cases} \mathbb{P} ( S_l \leqslant t ) = \mathbb{P} ( N_{\overline{{\mathcal{E}}}} \leqslant t ) \\ \mathbb{P} ( S_l > t ) = \mathbb{P} ( N_{\overline{{\mathcal{E}}}} > t ) = Q_B(t,l,{\overline{{\mathcal{E}}}}) \\ \mathbb{P} ( \mathrm{out}_{\mathcal{S}}( \mathbf{x}) = y | S_l \leqslant t) = \mathbb{P} ( \mathrm{out}_{\mathcal{A}}( \mathbf{x}) = y | N_{\overline{{\mathcal{E}}}} \leqslant t) \end{cases}\] so some of the terms cancel, and we obtain \(\mathbb{P} ( \mathrm{out}_{\mathcal{A}}( \mathbf{x}) = y) - \mathbb{P} ( \mathrm{out}_{\mathcal{S}}( \mathbf{x}) = y) = \mathbb{P} ( S_l > t ) \left ( \mathbb{P} ( \mathrm{out}_{\mathcal{A}}( \mathbf{x}) = y | N_{\overline{{\mathcal{E}}}} > t) - \mathbb{P} ( \mathrm{out}_{\mathcal{S}}( \mathbf{x}) = y | S_l > t) \right ).\)

Now let \(U\) be a random variable that follows the distribution of \(\mathrm{out}_{\mathcal{A}}( \mathbf{x})\) given \(N_{\overline{{\mathcal{E}}}} > t\) and \(V\) be a random variable that follows the distribution of \(\mathrm{out}_{\mathcal{S}}( \mathbf{x})\) given \(S_l > t\). We can conclude the proof since \[\begin{aligned} \Delta &= \frac{1}{2} \sum_{ y \in \mathcal{Y} } \left | \mathbb{P} ( \mathrm{out}_{\mathcal{A}}( \mathbf{x}) = y) - \mathbb{P} ( \mathrm{out}_{\mathcal{S}}( \mathbf{x}) = y) \right | \\ &= \frac{1}{2} \sum_{ y \in \mathcal{Y} } \left | Q_B(t,l,{\overline{{\mathcal{E}}}}) \left ( \mathbb{P} ( \mathrm{out}_{\mathcal{A}}( \mathbf{x}) = y | N_{\overline{{\mathcal{E}}}} > t) - \mathbb{P} ( \mathrm{out}_{\mathcal{S}}( \mathbf{x}) = y | S_l > t) \right ) \right | \\ &= Q_B(t,l,{\overline{{\mathcal{E}}}}) \frac{1}{2} \sum_{ y \in \mathcal{Y} } \left | \mathbb{P} ( \mathrm{out}_{\mathcal{A}}( \mathbf{x}) = y | N_{\overline{{\mathcal{E}}}} > t) - \mathbb{P} ( \mathrm{out}_{\mathcal{S}}( \mathbf{x}) = y | S_l > t) \right | \\ &= Q_B(t,l,{\overline{{\mathcal{E}}}}) \Delta ( U ; V ) \\ &\leqslant Q_B(t,l,{\overline{{\mathcal{E}}}}) . \end{aligned}\]
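The lemma can be verified exactly by enumerating all \(2^l\) erasure patterns for a small \(l\). When the values \(x_1,\ldots,x_l\) are pairwise distinct, \(\Delta(U;V)=1\) and the bound is attained with equality. A sketch of ours, with arbitrary parameters \(l=4\), \(t=1\), \({\overline{{\mathcal{E}}}}=0.3\):

```python
from collections import defaultdict
from itertools import product
from math import comb

def output_distributions(x, eps, t):
    """Exact output distributions of the eps-random-probing adversary A
    and of its t-threshold simulator S, by enumerating erasure patterns."""
    l = len(x)
    p_A, p_S = defaultdict(float), defaultdict(float)
    for pattern in product([0, 1], repeat=l):
        w = sum(pattern)
        prob = eps ** w * (1 - eps) ** (l - w)
        out = tuple(x[i] if pattern[i] else None for i in range(l))  # None = erasure
        p_A[out] += prob
        p_S[out if w <= t else (None,) * l] += prob  # S fails when w > t
    return p_A, p_S

def tv_distance(p, q):
    """Statistical (total variation) distance between two distributions."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in set(p) | set(q))

x, eps, t = (1, 2, 3, 4), 0.3, 1
p_A, p_S = output_distributions(x, eps, t)
l = len(x)
# survival function Q_B(t, l, eps) of the binomial distribution
q_B = sum(comb(l, w) * eps ** w * (1 - eps) ** (l - w) for w in range(t + 1, l + 1))
print(tv_distance(p_A, p_S), q_B)  # equal: the bound is tight for distinct shares
```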

(Duc, Dziembowski, and Faust 2014 Lemma 4) is used in (Duc, Faust, and Standaert 2019 Thm. 3) and (Prest et al. 2019 Lemma 5) so our patch directly improves their corresponding results.

Optimal Masking Order in Other Bounds

Proposition [prop:finite-masking-order] may appear to contradict (Duc, Faust, and Standaert 2019 Coro. 2). It turns out that (Duc, Faust, and Standaert 2019 Coro. 2) is incorrectly derived from (Duc, Faust, and Standaert 2019 Thm. 3). Namely, inverting equation \((13)\) from (Duc, Faust, and Standaert 2019 Thm. 3) yields \[d \leqslant d_{\mathrm{DFS}}^* \triangleq \left \lfloor \frac{1 - 16 |\mathcal{X}| \sqrt{\frac{ \delta_{\mathrm{MI}}}{2 \log e}}} {28|\mathcal{X}| \sqrt{\frac{ \delta_{\mathrm{MI}}}{2 \log e}}} \right \rfloor = \left\lfloor \frac{1}{28 |\mathcal{X}| } \left (\frac{ \delta_{\mathrm{MI}}}{2 \log e} \right )^{-\frac{1}{2}} - \frac{4}{7} \right\rfloor\] while the inequality is in the opposite direction in (Duc, Faust, and Standaert 2019 Coro. 2). It follows that equation \((15)\) in (Duc, Faust, and Standaert 2019 Coro. 2) does not hold. This corollary may lead to the misleading conclusion that, provided the masking order is high enough, the adversary’s advantage can be made arbitrarily small. The patched inequality, however, indicates that the bound derived in (Duc, Faust, and Standaert 2019 Thm. 3) is only valid up to a given maximal masking order \(d_{\mathrm{DFS}}^*\). In (Duc, Dziembowski, and Faust 2014 Thm. 1), it is required that \(\delta_{ {\mathrm{TVI}}}^{-1} \geqslant(28 d + 16) |\mathcal{X}|\), which, for a fixed noise level and because \(d\) is an integer, implies that \[d \leqslant d_{\mathrm{DDF}}^* \triangleq \left\lfloor \frac{ \delta_{ {\mathrm{TVI}}}^{-1}}{28 |\mathcal{X}| } - \frac{4}{7} \right\rfloor.\] The security bound provided in (Duc, Dziembowski, and Faust 2014 Thm. 1) being decreasing in the masking order \(d\), \(d_{\mathrm{DDF}}^*\) is the optimal masking order predicted by the inequality. 
The advantage of the adversary cannot be reduced further than \[|\Gamma| \mathrm{exp}\left (- \frac{d_{\mathrm{DDF}}^*}{12} \right ) = |\Gamma | \mathrm{exp}\left (- \frac{1}{12} \left\lfloor \frac{ \delta_{ {\mathrm{TVI}}}^{-1}}{28 |\mathcal{X}| } - \frac{4}{7} \right\rfloor \right )\] with this security proof. Once again, this is in line with Proposition [prop:finite-masking-order] and the observation of Battistello et al. (Battistello et al. 2016): beyond a certain point, increasing \(d\) increases the bound on the adversary’s advantage in ISW gadgets whose noise rate is not constant.

In (Prouff and Rivain 2013 Coro. 2) it is required that \(d \leqslant\left \lfloor \delta_{{\mathrm{EN}}}^{-1} |\mathcal{X}|^{-2} \right \rfloor\), which implies the existence of a finite optimal masking order \(d_{{\mathrm{EN}}}^* \leqslant\left \lfloor \delta_{{\mathrm{EN}}}^{-1} |\mathcal{X}|^{-2} \right \rfloor\) with respect to the bound. Likewise, (Prest et al. 2019 Thm. 6) requires \(\delta_{{\mathrm{RE}}}^{-1} \geqslant 2d +1\), i.e., the derivation holds for \(d \leqslant\left \lfloor \frac{1}{2} (\delta_{\mathrm{RE}}^{-1} - 1) \right \rfloor\), so the security bound presented in (Prest et al. 2019) also has an optimal masking order \(d_{{\mathrm{RE}}}^* \leqslant\left \lfloor \frac{1}{2} (\delta_{\mathrm{RE}}^{-1} - 1) \right \rfloor\). In (Masure and Standaert 2023) it is required that \(\delta_{{\mathrm{MI}}}^{-1} \geqslant d+1\), which in turn implies that there is an optimal masking order \(d_{\mathrm{MI}}^* \leqslant\lfloor \delta_{\mathrm{MI}}^{-1} \rfloor\). In these three bounds, a term of the form \(((d+1) \delta)^{d+1}\) appears, which is minimized for \(d+1 \in \{ \lfloor (\delta e)^{-1} \rfloor , \lceil (\delta e)^{-1} \rceil \}\). This indicates that, with respect to these security bounds, it is sub-optimal to maximize the masking order.
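The minimizer of \(((d+1)\delta)^{d+1}\) can be located numerically; working in log scale avoids underflow. The continuous minimizer of \(u \mapsto (u\delta)^u\) is \(u = (\delta e)^{-1}\) (set the derivative of \(u \ln(u\delta)\) to zero). The value \(\delta = 0.01\) below is an arbitrary illustration:

```python
import math

def log_bound_term(d, delta):
    """Natural log of ((d+1) * delta) ** (d+1); log scale avoids underflow."""
    u = d + 1
    return u * math.log(u * delta)

delta = 0.01
best_d = min(range(400), key=lambda d: log_bound_term(d, delta))
u_star = 1 / (delta * math.e)  # continuous minimizer of (u * delta) ** u
print(best_d + 1, u_star)
assert best_d + 1 in (math.floor(u_star), math.ceil(u_star))
```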

In the direct security proof (Theorem [thm:doeblin-pr]), the bottleneck is associated with the type-3 subsequences. While it is complicated to derive analytically the minimal value of \(\Upsilon_d( \overline{{\mathcal{E}}} )\) with respect to \(d\), we know that \((1- \overline{{\mathcal{E}}}^{d+1} )^{d+1} \leqslant\Upsilon_d( \overline{{\mathcal{E}}} ) \leqslant 1\). Since \((d+1) \ln ( 1 - \overline{{\mathcal{E}}}^{d+1}) \sim - (d+1) \overline{{\mathcal{E}}}^{d+1} \to 0\), we have \((1- \overline{{\mathcal{E}}}^{d+1} )^{d+1} \to 1\), and by the sandwich theorem \(\Upsilon_d( \overline{{\mathcal{E}}} ) \to 1\) as \(d \to +\infty\). This implies the existence of an optimal masking order \(d^*({\mathcal{E}})\) with respect to the direct security proof, which can be computed numerically.


References

Abdalla, Michel, and Mihir Bellare. 2000. “Increasing the Lifetime of a Key: A Comparative Analysis of the Security of Re-Keying Techniques.” In International Conference on the Theory and Application of Cryptology and Information Security, 546–59. Springer.
Agrawal, Dakshi, Bruce Archambeault, Josyula R. Rao, and Pankaj Rohatgi. 2002. The EM Side-Channel(s).” In CHES, 2523:29–45. LNCS. Springer.
Azouaoui, Melissa, Olivier Bronchain, Vincent Grosso, Kostas Papagiannopoulos, and François-Xavier Standaert. 2022. “Bitslice Masking and Improved Shuffling: How and When to Mix Them in Software?” IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022 (2): 140–65. https://doi.org/10.46586/tches.v2022.i2.140-165.
Barthe, Gilles, Sonia Belaı̈d, François Dupressoir, Pierre-Alain Fouque, Benjamin Grégoire, Pierre-Yves Strub, and Rébecca Zucchini. 2016. “Strong Non-Interference and Type-Directed Higher-Order Masking.” In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, edited by Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and Shai Halevi, 116–29. ACM. https://doi.org/10.1145/2976749.2978427.
Battistello, Alberto, Jean-Sébastien Coron, Emmanuel Prouff, and Rina Zeitoun. 2016. Horizontal Side-Channel Attacks and Countermeasures on the ISW Masking Scheme.” In Cryptographic Hardware and Embedded Systems - CHES 2016 - 18th International Conference, Santa Barbara, CA, USA, August 17-19, 2016, Proceedings, edited by Benedikt Gierlichs and Axel Y. Poschmann, 9813:23–39. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-662-53140-2_2.
Béguinot, Julien, Wei Cheng, Sylvain Guilley, Yi Liu, Loı̈c Masure, Olivier Rioul, and François-Xavier Standaert. 2023. Removing the Field Size Loss from Duc et al.’s Conjectured Bound for Masked Encodings.” In Constructive Side-Channel Analysis and Secure Design: 14th International Workshop, COSADE 2023, Munich, Germany, April 3–4, 2023, Proceedings, 86–104. Springer.
Béguinot, Julien, Wei Cheng, Sylvain Guilley, and Olivier Rioul. 2022. “Be My Guess: Guessing Entropy Vs. Success Rate for Evaluating Side-Channel Attacks of Secure Chips.” In 25th Euromicro Conference on Digital System Design, DSD 2022, Maspalomas, Spain, August 31 - Sept. 2, 2022, 496–503. IEEE. https://doi.org/10.1109/DSD57027.2022.00072.
Béguinot, Julien, Yi Liu, Olivier Rioul, Wei Cheng, and Sylvain Guilley. 2023. “Maximal Leakage of Masked Implementations Using Mrs. Gerber’s Lemma for Min-Entropy.” CoRR abs/2305.06276. https://doi.org/10.48550/arXiv.2305.06276.
Belaı̈d, Sonia, Jean-Sébastien Coron, Emmanuel Prouff, Matthieu Rivain, and Abdul Rahman Taleb. 2020. “Random Probing Security: Verification, Composition, Expansion and New Constructions.” In Advances in Cryptology - CRYPTO 2020 - 40th Annual International Cryptology Conference, CRYPTO 2020, Santa Barbara, CA, USA, August 17-21, 2020, Proceedings, Part I, edited by Daniele Micciancio and Thomas Ristenpart, 12170:339–68. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-56784-2\_12.
Belaı̈d, Sonia, Matthieu Rivain, and Abdul Rahman Taleb. 2021. “On the Power of Expansion: More Efficient Constructions in the Random Probing Model.” In Advances in Cryptology - EUROCRYPT 2021 - 40th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Zagreb, Croatia, October 17-21, 2021, Proceedings, Part II, edited by Anne Canteaut and François-Xavier Standaert, 12697:313–43. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-77886-6\_11.
Bloch, Matthieu R., and João Barros. 2011. Physical-Layer Security: From Information Theory to Security Engineering. Cambridge University Press. https://doi.org/10.1017/CBO9780511977985.
Brian, Gianluca, Antonio Faonio, Maciej Obremski, João Ribeiro, Mark Simkin, Maciej Skórski, and Daniele Venturi. 2022. “The Mother of All Leakages: How to Simulate Noisy Leakages via Bounded Leakage (Almost) for Free.” IEEE Trans. Inf. Theory 68 (12): 8197–227. https://doi.org/10.1109/TIT.2022.3193848.
Brier, Éric, Christophe Clavier, and Francis Olivier. 2004. “Correlation Power Analysis with a Leakage Model.” In Cryptographic Hardware and Embedded Systems - CHES 2004: 6th International Workshop Cambridge, MA, USA, August 11-13, 2004. Proceedings, edited by Marc Joye and Jean-Jacques Quisquater, 3156:16–29. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-540-28632-5\_2.
Bronchain, Olivier, Julien M. Hendrickx, Clément Massart, Alex Olshevsky, and François-Xavier Standaert. 2019. “Leakage Certification Revisited: Bounding Model Errors in Side-Channel Security Evaluations.” In Advances in Cryptology - CRYPTO 2019 - 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2019, Proceedings, Part I, edited by Alexandra Boldyreva and Daniele Micciancio, 11692:713–37. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-26948-7\_25.
Carlet, Claude, Abderrahman Daif, Sylvain Guilley, and Cédric Tavernier. 2024. Quasi-linear masking against SCA and FIA, with cost amortization.” IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024 (1): 398–432. https://doi.org/10.46586/TCHES.V2024.I1.398-432.
Cassiers, Gaëtan, Sebastian Faust, Maximilian Orlt, and François-Xavier Standaert. 2021. “Towards Tight Random Probing Security.” In Advances in Cryptology - CRYPTO 2021 - 41st Annual International Cryptology Conference, CRYPTO 2021, Virtual Event, August 16-20, 2021, Proceedings, Part III, edited by Tal Malkin and Chris Peikert, 12827:185–214. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-84252-9\_7.
Cassiers, Gaëtan, and François-Xavier Standaert. 2019. “Towards Globally Optimized Masking: From Low Randomness to Low Noise Rate or Probe Isolating Multiplications with Reduced Randomness and Security Against Horizontal Attacks.” IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019 (2): 162–98. https://doi.org/10.13154/TCHES.V2019.I2.162-198.
Chari, Suresh, Charanjit S. Jutla, Josyula R. Rao, and Pankaj Rohatgi. 1999. Towards Sound Approaches to Counteract Power-Analysis Attacks.” In CRYPTO, edited by Michael J. Wiener, 1666:398–412. Lecture Notes in Computer Science. Springer.
Chari, Suresh, Josyula R. Rao, and Pankaj Rohatgi. 2002. “Template Attacks.” In Cryptographic Hardware and Embedded Systems - CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, August 13-15, 2002, Revised Papers, edited by Burton S. Kaliski Jr., Çetin Kaya Koç, and Christof Paar, 2523:13–28. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/3-540-36400-5\_3.
Chérisey, Éloi de, Sylvain Guilley, Olivier Rioul, and Pablo Piantanida. 2019. Best Information is Most Successful — Mutual Information and Success Rate in Side-Channel Analysis.” IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019 (2): 49–79. https://doi.org/10.13154/tches.v2019.i2.49-79.
Chestnut, Stephen, and Manuel E. Lladser. 2010. “Occupancy Distributions in Markov Chains via Doeblin’s Ergodicity Coefficient.” In DMTCS Proceedings Vol. AM, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10). https://dmtcs.episciences.org/2789.
Coron, Jean-Sébastien, and Lorenzo Spignoli. 2021. “Secure Wire Shuffling in the Probing Model.” In Advances in Cryptology - CRYPTO 2021 - 41st Annual International Cryptology Conference, CRYPTO 2021, Virtual Event, August 16-20, 2021, Proceedings, Part III, edited by Tal Malkin and Chris Peikert, 12827:215–44. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-84252-9\_8.
Cover, Thomas M., and Joy A. Thomas. 2006. Elements of Information Theory (2nd Edition). Wiley. http://www.elementsofinformationtheory.com/.
Dhem, Jean-François, François Koeune, Philippe-Alexandre Leroux, Patrick Mestré, Jean-Jacques Quisquater, and Jean-Louis Willems. 1998. “A Practical Implementation of the Timing Attack.” In CARDIS, edited by Jean-Jacques Quisquater and Bruce Schneier, 1820:167–82. Lecture Notes in Computer Science. Springer.
Dobrushin, Roland L. 1956. “Central Limit Theorem for Nonstationary Markov Chains. I.” Theory of Probability & Its Applications 1 (1): 65–80.
Doeblin, Wolfgang. 1937. “Le cas discontinu des probabilités en chaîne” [The Discontinuous Case of Chain Probabilities]. Publ. Fac. Sci. Univ. Masaryk (Brno), 236:1–13.
Duc, Alexandre, Stefan Dziembowski, and Sebastian Faust. 2014. “Unifying Leakage Models: From Probing Attacks to Noisy Leakage.” In Advances in Cryptology - EUROCRYPT 2014 - 33rd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Copenhagen, Denmark, May 11-15, 2014. Proceedings, edited by Phong Q. Nguyen and Elisabeth Oswald, 8441:423–40. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-642-55220-5_24.
———. 2019. “Unifying Leakage Models: From Probing Attacks to Noisy Leakage.” J. Cryptol. 32 (1): 151–77. https://doi.org/10.1007/S00145-018-9284-1.
Duc, Alexandre, Sebastian Faust, and François-Xavier Standaert. 2015. “Making Masking Security Proofs Concrete - Or How to Evaluate the Security of Any Leaking Device.” In Advances in Cryptology - EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part I, edited by Elisabeth Oswald and Marc Fischlin, 9056:401–29. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-662-46800-5_16.
———. 2019. “Making Masking Security Proofs Concrete (Or How to Evaluate the Security of Any Leaking Device), Extended Version.” J. Cryptol. 32 (4): 1263–97. https://doi.org/10.1007/s00145-018-9277-0.
Dziembowski, Stefan, and Krzysztof Pietrzak. 2008. “Leakage-Resilient Cryptography.” In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, 293–302. IEEE Computer Society. https://doi.org/10.1109/FOCS.2008.56.
Esposito, Amedeo Roberto, Adrien Vandenbroucque, and Michael Gastpar. 2022. “On Sibson’s \(\alpha\)-Mutual Information.” In IEEE International Symposium on Information Theory, ISIT 2022, Espoo, Finland, June 26 - July 1, 2022, 2904–9. IEEE. https://doi.org/10.1109/ISIT50566.2022.9834428.
Friedman, Nir, Dan Geiger, and Moisés Goldszmidt. 1997. “Bayesian Network Classifiers.” Mach. Learn. 29 (2-3): 131–63. https://doi.org/10.1023/A:1007465528199.
Gandolfi, Karine, Christophe Mourtel, and Francis Olivier. 2001. “Electromagnetic Analysis: Concrete Results.” In CHES, 2162:251–61. LNCS. Springer.
Gohari, Amin, Onur Günlü, and Gerhard Kramer. 2020. “Coding for Positive Rate in the Source Model Key Agreement Problem.” IEEE Trans. Inf. Theory 66 (10): 6303–23. https://doi.org/10.1109/TIT.2020.2990750.
Goudarzi, Dahmun, Antoine Joux, and Matthieu Rivain. 2018. “How to Securely Compute with Noisy Leakage in Quasilinear Complexity.” In Advances in Cryptology - ASIACRYPT 2018 - 24th International Conference on the Theory and Application of Cryptology and Information Security, Brisbane, QLD, Australia, December 2-6, 2018, Proceedings, Part II, edited by Thomas Peyrin and Steven D. Galbraith, 11273:547–74. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-03329-3_19.
Goudarzi, Dahmun, Thomas Prest, Matthieu Rivain, and Damien Vergnaud. 2022. “Probing Security Through Input-Output Separation and Revisited Quasilinear Masking.” IACR Cryptol. ePrint Arch., 45. https://eprint.iacr.org/2022/045.
Hoeffding, Wassily. 1994. “Probability Inequalities for Sums of Bounded Random Variables.” The Collected Works of Wassily Hoeffding, 409–26.
Ishai, Yuval, Amit Sahai, and David Wagner. 2003. “Private Circuits: Securing Hardware against Probing Attacks.” In CRYPTO, 2729:463–81. Lecture Notes in Computer Science. Springer.
Ito, Akira, Rei Ueno, and Naofumi Homma. 2022. “Perceived Information Revisited: New Metrics to Evaluate Success Rate of Side-Channel Attacks.” IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022 (4): 228–54. https://doi.org/10.46586/TCHES.V2022.I4.228-254.
Kalai, Yael Tauman, and Leonid Reyzin. 2019. “A Survey of Leakage-Resilient Cryptography.” In Providing Sound Foundations for Cryptography: On the Work of Shafi Goldwasser and Silvio Micali, edited by Oded Goldreich, 727–94. ACM. https://doi.org/10.1145/3335741.3335768.
Kocher, Paul C., Joshua Jaffe, and Benjamin Jun. 1999. “Differential Power Analysis.” In CRYPTO, edited by Michael J. Wiener, 1666:388–97. Lecture Notes in Computer Science. Springer.
Kocher, Paul, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. 2018. “Spectre Attacks: Exploiting Speculative Execution.” CoRR abs/1801.01203. http://arxiv.org/abs/1801.01203.
Lagasse, Jacqueline, Christopher Bartoli, and Wayne P. Burleson. 2019. “Combining Clock and Voltage Noise Countermeasures Against Power Side-Channel Analysis.” In 30th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019, New York, NY, USA, July 15-17, 2019, 214–17. IEEE. https://doi.org/10.1109/ASAP.2019.00009.
Lee, JongHyeok, and Dong-Guk Han. 2020. “Security Analysis on Dummy Based Side-Channel Countermeasures - Case Study: AES with Dummy and Shuffling.” Appl. Soft Comput. 93: 106352. https://doi.org/10.1016/j.asoc.2020.106352.
Lemke-Rust, Kerstin, and Christof Paar. 2007. “Gaussian Mixture Models for Higher-Order Side Channel Analysis.” In CHES, edited by Pascal Paillier and Ingrid Verbauwhede, 4727:14–27. LNCS. Springer.
Liu, Yi, Julien Béguinot, Wei Cheng, Sylvain Guilley, Loïc Masure, Olivier Rioul, and François-Xavier Standaert. 2023. “Improved Alpha-Information Bounds for Higher-Order Masked Cryptographic Implementations.” In IEEE Information Theory Workshop, ITW 2023, Saint-Malo, France, April 23-28, 2023, 81–86. IEEE. https://doi.org/10.1109/ITW55543.2023.10161608.
Makur, Anuran. 2020. “Coding Theorems for Noisy Permutation Channels.” IEEE Trans. Inf. Theory 66 (11): 6723–48. https://doi.org/10.1109/TIT.2020.3009468.
Makur, Anuran, and Japneet Singh. 2023. “Doeblin Coefficients and Related Measures.” arXiv Preprint arXiv:2309.08475.
Massey, James L. 1994. “Guessing and Entropy.” In Proceedings of 1994 IEEE International Symposium on Information Theory, 204. https://doi.org/10.1109/ISIT.1994.394764.
Masure, Loïc, and François-Xavier Standaert. 2023. “Prouff and Rivain’s Formal Security Proof of Masking, Revisited - Tight Bounds in the Noisy Leakage Model.” In Advances in Cryptology - CRYPTO 2023 - 43rd Annual International Cryptology Conference, CRYPTO 2023, Santa Barbara, CA, USA, August 20-24, 2023, Proceedings, Part III, edited by Helena Handschuh and Anna Lysyanskaya, 14083:343–76. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-031-38548-3_12.
Mertens, Stephan. 2024. “Domination Polynomial of the Rook Graph.” Journal of Integer Sequences, Vol. 27, Article 24.3.7 and arXiv Preprint arXiv:2401.00716.
Micali, Silvio, and Leonid Reyzin. 2004. “Physically Observable Cryptography.” In TCC, 2951:278–96. Lecture Notes in Computer Science. Springer.
Moradi, Amir, Mohammad Taghi Manzuri Shalmani, and Mahmoud Salmasizadeh. 2009. “Dual-rail transition logic: A logic style for counteracting power analysis attacks.” Computers & Electrical Engineering 35 (2): 359–69.
Plançon, Maxime. 2022. “Exploiting Algebraic Structures in Probing Security.” IACR Cryptol. ePrint Arch., 1540. https://eprint.iacr.org/2022/1540.
Polyanskiy, Yury, and Yihong Wu. 2023. Information Theory, from Coding to Learning (1. Ed.). Cambridge University Press. https://people.lids.mit.edu/yp/homepage/data/itbook-export.pdf.
Prest, Thomas, Dahmun Goudarzi, Ange Martinelli, and Alain Passelègue. 2019. “Unifying Leakage Models on a Rényi Day.” In Advances in Cryptology - CRYPTO 2019 - 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2019, Proceedings, Part I, edited by Alexandra Boldyreva and Daniele Micciancio, 11692:683–712. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-26948-7_24.
Prouff, Emmanuel, and Matthieu Rivain. 2013. “Masking against Side-Channel Attacks: A Formal Security Proof.” In Advances in Cryptology - EUROCRYPT 2013, 32nd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Athens, Greece, May 26-30, 2013. Proceedings, edited by Thomas Johansson and Phong Q. Nguyen, 7881:142–59. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-642-38348-9_9.
Renauld, Mathieu, François-Xavier Standaert, Nicolas Veyrat-Charvillon, Dina Kamel, and Denis Flandre. 2011. “A Formal Study of Power Variability Issues and Side-Channel Attacks for Nanoscale Devices.” In Advances in Cryptology - EUROCRYPT 2011 - 30th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tallinn, Estonia, May 15-19, 2011. Proceedings, edited by Kenneth G. Paterson, 6632:109–28. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-642-20465-4_8.
Rioul, Olivier. 2023. “The Interplay between Error, Total Variation, Alpha-Entropy and Guessing: Fano and Pinsker Direct and Reverse Inequalities.” Entropy 25 (7): 978.
Rivain, Matthieu. 2022. “On the Provable Security of Cryptographic Implementations.” Habilitation thesis. Personal website.
Rivain, Matthieu, and Emmanuel Prouff. 2010a. “Provably Secure Higher-Order Masking of AES.” IACR Cryptology ePrint Archive 2010: 441.
———. 2010b. “Provably Secure Higher-Order Masking of AES.” In CHES, edited by Stefan Mangard and François-Xavier Standaert, 6225:413–27. LNCS. Springer.
Seneta, Eugene. 1973. “On the Historical Development of the Theory of Finite Inhomogeneous Markov Chains.” Mathematical Proceedings of the Cambridge Philosophical Society 74 (3): 507–13.
Shulman, Nadav, and Meir Feder. 2004. “The Uniform Distribution as a Universal Prior.” IEEE Trans. Inf. Theory 50 (6): 1356–62. https://doi.org/10.1109/TIT.2004.828152.
Standaert, François-Xavier, Tal Malkin, and Moti Yung. 2009. “A Unified Framework for the Analysis of Side-Channel Key Recovery Attacks.” In EUROCRYPT, 5479:443–61. LNCS. Springer.
Ueno, Rei, Naofumi Homma, Akiko Inoue, and Kazuhiko Minematsu. 2024. “Fallen Sanctuary: A Higher-Order and Leakage-Resilient Rekeying Scheme.” IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024 (1): 264–308. https://doi.org/10.46586/TCHES.V2024.I1.264-308.
Unterluggauer, Thomas, Thomas Korak, Stefan Mangard, Robert Schilling, Luca Benini, Frank K. Gürkaynak, and Michael Muehlberghuber. 2017. “Leakage Bounds for Gaussian Side Channels.” In Smart Card Research and Advanced Applications - 16th International Conference, CARDIS 2017, Lugano, Switzerland, November 13-15, 2017, Revised Selected Papers, edited by Thomas Eisenbarth and Yannick Teglia, 10728:88–104. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-319-75208-2_6.
Verdú, Sergio. 2015. “\(\alpha\)-Mutual Information.” In 2015 Information Theory and Applications Workshop, ITA 2015, San Diego, CA, USA, February 1-6, 2015, 1–6. IEEE. https://doi.org/10.1109/ITA.2015.7308959.
Veyrat-Charvillon, Nicolas, Marcel Medwed, Stéphanie Kerckhof, and François-Xavier Standaert. 2012. “Shuffling against Side-Channel Attacks: A Comprehensive Study with Cautionary Note.” In ASIACRYPT, edited by Xiaoyun Wang and Kazue Sako, 7658:740–57. Lecture Notes in Computer Science. Springer.
Weizsäcker, Heinrich von. 1974. “Zur Gleichwertigkeit zweier Arten der Randomisierung” [On the Equivalence of Two Kinds of Randomization]. Manuscripta Mathematica 11: 91–94.
Wyner, Aaron D., and Jacob Ziv. 1973. “A Theorem on the Entropy of Certain Binary Sequences and Applications-I.” IEEE Trans. Inf. Theory 19 (6): 769–72. https://doi.org/10.1109/TIT.1973.1055107.

Footnotes

  1. Unless, for a given round in a divide-and-conquer attack, the round’s output is assumed not to be disclosed to the attacker, because it is hidden by the one-way computational assumption in the subsequent rounds.↩︎

  2. This term is usually reserved for communication problems.↩︎

  3. However, it is not true that, conditioned on the event \(S_l < t\), \(\mathrm{out}_\mathcal{S}( \mathbf{x})\) is equal in distribution to \(\mathrm{out}_{\mathcal{A}}( \mathbf{x})\), as claimed in (Duc, Dziembowski, and Faust 2014). Indeed, \(\mathrm{out}_{\mathcal{A}}( \mathbf{x})\) is independent of the event \(S_l < t\), hence its distribution is unchanged by conditioning on this event. On the one hand, it is possible that \(S_l < t\) while \(\mathrm{out}_{\mathcal{A}}( \mathbf{x})\) outputs all the values \((x_1,\ldots,x_l)\) without any erasure. On the other hand, conditioned on the event \(S_l < t\), \(\mathrm{out}_{\mathcal{S}}(\mathbf{x})\) produces at least \(l-t\) erasures.↩︎
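
The independence argument in this footnote can be checked numerically. Below is a toy Monte-Carlo sketch, not the constructions of the paper: both outputs are modeled as independent erasure channels with the same per-coordinate reveal probability, and \(S_l\) is taken to count the simulator's non-erased coordinates. All names (`probe`, `out_S`, `out_A`) and parameter values are illustrative assumptions.

```python
import random

random.seed(0)

l, t, p = 8, 3, 0.5  # illustrative: l values, threshold t, reveal probability p
x = list(range(l))   # stand-in for the sensitive values (x_1, ..., x_l)

def probe(values, reveal_prob):
    """Toy erasure channel: each value is revealed with probability
    reveal_prob, otherwise replaced by an erasure (None)."""
    return [v if random.random() < reveal_prob else None for v in values]

trials_with_Sl_small = 0  # trials where the event S_l < t occurs
full_reveals_of_A = 0     # among those, trials where out_A erases nothing

for _ in range(100_000):
    out_S = probe(x, p)   # simulator's output; defines S_l
    out_A = probe(x, p)   # adversary's output, drawn *independently*
    S_l = sum(v is not None for v in out_S)
    if S_l < t:
        trials_with_Sl_small += 1
        # Conditioned on S_l < t, out_S always has >= l - t erasures ...
        assert sum(v is None for v in out_S) >= l - t
        # ... whereas out_A, independent of {S_l < t}, may reveal everything.
        if all(v is not None for v in out_A):
            full_reveals_of_A += 1

# Both counters come out positive, so the two conditional distributions
# cannot be equal: out_A retains positive probability of zero erasures.
print(trials_with_Sl_small > 0 and full_reveals_of_A > 0)
```

In this toy model, conditioning on the simulator's event leaves the adversary's erasure pattern untouched, which is exactly the gap in the claimed distributional equality.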