Removing the Field Size Loss from Duc et al.’s Conjectured Bound for Masked Encodings
At Eurocrypt 2015, Duc et al. conjectured that the success rate of a side-channel attack targeting an intermediate computation encoded in a linear secret-sharing, a.k.a. masking with \({d}+1\) shares, could be inferred by measuring the mutual information between the leakage and each share separately. This way, security bounds can be derived without having to mount the complete attack. So far, the best proven bounds for masked encodings were nearly tight with the conjecture, up to a constant factor overhead equal to the field size, which may still give loose security guarantees compared to actual attacks. In this paper, we improve upon the state-of-the-art bounds by removing the field size loss, in the cases of Boolean masking and arithmetic masking modulo a power of two. As an example, when masking in the AES field, our new bound outperforms the former ones by a factor \(256\). Moreover, we provide theoretical hints that similar results could hold for masking in other fields as well.
Introduction
While side-channel analysis (sca) remains a critical threat against the security of cryptography on embedded devices, it is no longer an intractable one. Over the past decades, the masking counter-measure (Chari et al. 1999; Goubin and Patarin 1999) has gained increasing adoption among designers and developers, both from an implementation and from a theoretical point of view. Masking can be seen as a linear secret sharing applied to each intermediate computation in the implementation of a cryptographic primitive that depends on some secret. In a nutshell, masking increases the attack complexity of any sca adversary exponentially fast with the number of shares (provided that the leakages are sufficiently noisy and independent), while increasing the runtime and memory overhead at most quadratically (Ishai, Sahai, and Wagner 2003a). This makes masking a theoretically sound counter-measure.
The Evaluation Challenge.
Despite these achievements, the evaluation of a protected implementation remains hindered by various technical and even conceptual difficulties. One way for evaluators to assess the security level of an implementation is to mount some known end-to-end attacks and to infer a security level from the outcomes of these attacks. Nevertheless, this relies on the assumption that the attacks mounted by the evaluators accurately reflect the optimal attacks that any adversary could realize. For example, while end-to-end attacks using deep learning (dl) closely approach optimal attacks for masking with two shares (Maghrebi, Portigliatti, and Prouff 2016; Benadjila et al. 2020), this is no longer true when masking uses more shares (Bronchain and Standaert 2020; Loïc Masure et al. 2023). This could result in a false sense of security, and leaves the developers in an uncomfortable situation where implementations become increasingly hard to evaluate as their security level increases.
The Paradigm of Worst-Case Attacks.
One way to circumvent this issue is to consider attacks in a so-called worst-case evaluation setting (Azouaoui et al. 2020). The core idea is to apply Kerckhoffs's principle to side-channel security, by granting the adversary all the knowledge of the target, e.g., the random nonces used during the encryption, except the knowledge of the secret to guess. This way, the evaluator can efficiently profile the target implementation in order to (more) easily mount online attacks that approach the optimal ones. She can also analyze the leakage of the shares independently, in order to take advantage of masking security proofs to bound the security level under some assumptions.
Indeed, a series of theoretical works on masking makes it possible to bound the amount of information leaked by a masked secret, depending on the amount of information leaked by each share separately, under the assumption that the shares' leakages are independent. Such bounds can be expressed, e.g., in terms of the mutual information (mi), and in turn be translated into the success rate (sr) of any attack, as shown by Duc et al. at Eurocrypt 2015 (Duc, Faust, and Standaert 2015). Nevertheless, most of the current masking security proofs provide conservative bounds, possibly due to technical artifacts. In particular, they generally require more noise and more shares than suggested by the best known attacks in order to reach a given security level (Prouff and Rivain 2013; Duc, Dziembowski, and Faust 2014; Dziembowski, Faust, and Skorski 2015; Prest et al. 2019).
Duc et al.’s Conjecture.
Confronting this observation with empirical evidence, Duc et al. conjectured that the number of queries to the target device needed to recover the target secret of a sca is inversely proportional to the product of the mis of each share (Duc, Faust, and Standaert 2015): \[N_{a}(\mathop{\mathrm{{\mathsf{SR}}}}) \approx \frac{f(\mathop{\mathrm{{\mathsf{SR}}}})}{\prod_{i=0}^{{d}} \mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}}_i ; {{L}}_i\right)} \enspace ,\] where \({d}\) stands for the masking order, 1 and \(f\) is a “small constant depending on the target sr” (Duc, Faust, and Standaert 2019, 1279). Later, at Ches 2019, Chérisey et al. bounded this constant based on the entropy of the target secret (de Chérisey et al. 2019).
Due to its practical relevance, this conjecture recently gained traction with two independent and simultaneous works, by Ito et al. (Ito, Ueno, and Homma 2022) at CCS 2022 and by Masure et al. (Loïc Masure, Rioul, and Standaert 2023) at CARDIS 2022. Using different approaches, both works prove a nearly-tight version of Duc et al.’s conjecture for masked encodings, up to a constant factor equal to the field size \(M\) of the encoding.
This represents a significant improvement over the previously proven bounds, e.g., \(\mathcal{O}\!\left({M}^{d}\right)\) in Duc et al.’s proof (Duc, Faust, and Standaert 2015), which additionally suffers from a reduced noise amplification rate. But it remains loose compared to empirical attacks performed against implementations of concrete ciphers like the aes. At a high level, Ito et al.’s and Masure et al.’s approaches go back and forth between the mi and other metrics, such as the total variation (tv) (Loïc Masure, Rioul, and Standaert 2023) or the Euclidean norm (en) (Ito, Ueno, and Homma 2022), in order to state noise amplification lemmata. 2 While these conversions are tight when taken separately, their combination introduces an \(\mathcal{O}\!\left({M}\right)\) overhead, raising the question, on which we focus, of whether tighter bounds can be proved.
Our Contribution.
In this paper, we positively address the latter question, by removing this field size loss for masked encodings. At a high level, we do that by working directly with noisy leakages, without relying on reductions to more abstract (e.g., random probing) leakage models. Technically, our approach consists in stating the amplification lemma directly in terms of the mi, without any lossy conversion to other statistical distances. This idea is implemented using a result from Information Theory called Mrs. Gerber's Lemma (mgl) (Cheng 2014; Jog and Anantharam 2014). The mgl allows us to bound the mi between the secret and the whole leakage by a function of the mis between each share and its corresponding leakage. Moreover, the bound given by the mgl is proved to be tight, in the sense that there exist some leakage distributions for which the inequality from the mgl is actually an equality. The only limitation compared to the previous works is that our bound only holds for fields whose size is a power of \(2\). Thankfully, this limitation is not prohibitive, since our result covers, e.g., Boolean masking or arithmetic masking modulo \(2^{n}\). Nevertheless, we argue at the end of this paper that similar results could also be obtained in other fields, whose size is not necessarily a power of \(2\). More generally, and since our results are for now specialized to masked encodings, it remains a natural question whether they generalize to computations, as also conjectured by Duc et al. (Duc, Faust, and Standaert 2015).
Statement of the Problem
We start the paper by stating the problem under consideration, before providing the solution in the next section, and discussing some perspectives in the conclusion.
Notations and Background
Side-Channel Attack.
Let \(({\mathcal{Y}}, {\oplus})\) be a group of finite order, denoted by \({M}\). Let \({{K}}\in {\mathcal{Y}}\) be the secret key chunk to guess. To this end, we consider that the adversary knows a sequence of \(N_{a}\) plaintexts \(\left\{{{P}}\right\}_{N_{a}}\), and can observe the sequence of leakages \(\left\{{{\textbf{L}}}\right\}_{N_{a}}\) associated to the corresponding intermediate computations \(\left\{{{Y}}= \mathop{\mathrm{\mathbf{C}}}\!\left({{K}}, {{P}}\right)\right\}_{N_{a}}\). Based on this side-channel information, the adversary returns a key guess \(\widehat{{{K}}}\). We define the success rate (sr) as \(\mathop{\mathrm{{\mathsf{SR}}}}= \mathop{\mathrm{Pr}}\!\left({{K}}= \widehat{{{K}}}\right)\). Since the sr increases with the number of observed traces \(N_{a}\), we next define the quantity \(N_{a}(\mathop{\mathrm{{\mathsf{SR}}}},{\mathcal{Y}})\) as the minimal number of leakage traces required for any adversary to reach a success rate of at least \(\mathop{\mathrm{{\mathsf{SR}}}}\).
Masking.
In order to protect cryptographic secrets against side-channel leakage, we consider the intermediate computation \({{Y}}\) (assumed to be uniformly distributed) to be masked. 3 Let \({{Y}}_0, \ldots, {{Y}}_{d}\) be \({d}+1\) random variables, out of which \(d\) are uniformly drawn from \({\mathcal{Y}}\), that we call the shares, and denote by \({{Y}}= {{Y}}_0 {\oplus}\ldots {\oplus}{{Y}}_{d}\) the random variable to protect, that we call the secret. Concretely, for each trace \({{\textbf{L}}}= ({{L}}_0, \ldots, {{L}}_{d})\), the adversary observes a leakage \({{L}}_i\) whose distribution, conditioned on \({{Y}}_i\), is independent of all the other random variables. In our setting, we assume that an evaluator has been able to characterize the amount of uncertainty about \({{Y}}_i\) that is removed by observing \({{L}}_i\), measured in terms of the mi, whose definition is recalled hereafter.
[def:mi] Let \(\mathop{\mathrm{{\mathsf{p}}}}, \mathop{\mathrm{{\mathsf{m}}}}\) be two pmfs over the finite set \({\mathcal{Y}}\). 4 We denote by \(\mathsf{D_{KL}}(\mathop{\mathrm{{\mathsf{p}}}} \| \mathop{\mathrm{{\mathsf{m}}}})\) the Kullback-Leibler divergence between \(\mathop{\mathrm{{\mathsf{p}}}}\) and \(\mathop{\mathrm{{\mathsf{m}}}}\):
\[\mathsf{D_{KL}}(\mathop{\mathrm{{\mathsf{p}}}} \| \mathop{\mathrm{{\mathsf{m}}}}) = \sum_{{\mathnormal{y}}\in {\mathcal{Y}}} \mathop{\mathrm{{\mathsf{p}}}}\!\left({\mathnormal{y}}\right) \log_2\!\left(\frac{\mathop{\mathrm{{\mathsf{p}}}}\!\left({\mathnormal{y}}\right)}{\mathop{\mathrm{{\mathsf{m}}}}\!\left({\mathnormal{y}}\right)}\right) \enspace . \label{eq:def_kldiv}\] Then, we define the mi between a discrete random variable \({{Y}}\) and a continuous random vector \({{\textbf{L}}}\) as follows: \[\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{\textbf{L}}}\right) = \underset{{{\textbf{L}}}}{\mathbb{E}}\left[ \mathsf{D_{KL}}(\mathop{\mathrm{{\mathsf{p}}}}_{{{Y}}\;\mid\;{{\textbf{L}}}} \|\mathop{\mathrm{{\mathsf{p}}}}_{{{Y}}}) \right] \enspace , \label{eq:def_mi}\] where \(\mathop{\mathrm{{\mathsf{p}}}}_{{{Y}}}\) and \(\mathop{\mathrm{{\mathsf{p}}}}_{{{Y}}\;\mid\;{{\textbf{L}}}}\) respectively denote the pmf of \({{Y}}\) and the pmf of \({{Y}}\) given a realization \({\mathbf{l}}\) of the random vector \({{\textbf{L}}}\), with the expectation taken over \({{\textbf{L}}}\).
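To make Definition [def:mi] concrete, here is a minimal Monte-Carlo sketch of the mi between a uniform secret and a noisy leakage, using the identity \(\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{L}}\right) = \mathsf{H}\!\left({{Y}}\right) - \mathsf{H}\!\left({{Y}}\;\mid\;{{L}}\right)\). The identity leakage model and all parameter values are illustrative assumptions, not part of the paper's setting.

```python
import numpy as np

# Monte-Carlo sketch of MI(Y; L) for Y uniform over M values and
# L = Y + N(0, sigma^2); the leakage model is a toy assumption.
rng = np.random.default_rng(0)
M, sigma, n_samples = 16, 2.0, 200_000

y = rng.integers(0, M, n_samples)
l = y + rng.normal(0.0, sigma, n_samples)

# Posterior p(y | l) via Bayes, with a uniform prior and Gaussian likelihoods.
ys = np.arange(M)
lik = np.exp(-((l[:, None] - ys[None, :]) ** 2) / (2 * sigma**2))
post = lik / lik.sum(axis=1, keepdims=True)

# MI(Y; L) = H(Y) - E_L[ H(Y | L) ], in bits.
cond_ent = -(post * np.log2(post + 1e-300)).sum(axis=1).mean()
print(f"estimated MI(Y; L) = {np.log2(M) - cond_ent:.4f} bits")
```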
In the remainder of this paper, we will assume that for each share \({{Y}}_i\) and its corresponding sub-leakage \({{L}}_i\), we have a bound \(\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}}_i ; {{L}}_i\right) \leq \delta_i\). Intuitively, the lower the \(\delta_i\), the less informative the leakages, and the lower the sr. Moreover, we do not focus on the potential implementation overhead of masking in this paper, which can grow quadratically in memory and runtime (Ishai, Sahai, and Wagner 2003b), and only focus on the security aspect of the counter-measure.
Problem and Conjecture
The problem that we consider here is to obtain lower bounds on the attack complexity of the shape: \[\label{eq:conjecture} N_{a}(\mathop{\mathrm{{\mathsf{SR}}}}, {\mathcal{Y}}) \geq \frac{f(\mathop{\mathrm{{\mathsf{SR}}}}, {\mathcal{Y}})}{\prod_{i=0}^{d}(\delta_i/\tau)^r} \enspace ,\] where \(f(\mathop{\mathrm{{\mathsf{SR}}}}, {\mathcal{Y}})\) is a constant depending only on the target sr and \({\mathcal{Y}}\), \(\tau\) is the so-called noise threshold, i.e., the maximum amount of information each share may leak such that the masking counter-measure remains sound, and \(r\) is the amplification rate. Duc et al. (Duc, Faust, and Standaert 2015) conjectured that \(N_{a}\) satisfies a bound of the shape of Eqn. [eq:conjecture] with \(\tau \approx 1\) and \(r=1\).
A Reduction to mi Maximization.
At Ches 2019, Chérisey et al. showed that \(N_{a}\) can be linked to the mi as follows: \[N_{a}(\mathop{\mathrm{{\mathsf{SR}}}},{\mathcal{Y}}) \geq \frac{f(\mathop{\mathrm{{\mathsf{SR}}}}, {\mathcal{Y}})}{\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{\textbf{L}}}\right)} \enspace , \label{eq:cherisey}\] where \(f\) is a known, computable function of \(\mathop{\mathrm{{\mathsf{SR}}}}\) that can be bounded based on the entropy of \(Y\), so that \(f(\mathop{\mathrm{{\mathsf{SR}}}}, {\mathcal{Y}}) = \mathcal{O}\!\left(\log({M})\right)\) (de Chérisey et al. 2019). In other words, it is possible to reduce the problem of bounding the security level of a masked implementation to the problem of bounding the mi: \[\begin{aligned} \max_{\mathop{\mathrm{Pr}}\!\left({{L}}_i \;\mid\;{{Y}}_i\right), i \in [| 0, {d}|]} \quad & \mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{\textbf{L}}}\right) \\ \textrm{s.t.} \quad & \mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}}_i ; {{L}}_i\right) \leq \delta_i \end{aligned} \label{eq:pb_MI_reduced} \enspace .\] Following the previous conjecture, we expect \(\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{\textbf{L}}}\right) \approx \prod_{i=0}^{d}(\delta_i/\tau)^r\) to be a valid upper bound for this problem, where \(\tau \approx 1\) and \(r = 1\), whereas it could so far only be proven that \(\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{\textbf{L}}}\right) \lesssim M \prod_{i=0}^{d}(\delta_i/\tau)\).
We note that the optimization problem defined in Eqn. [eq:pb_MI_reduced] is convex, with convex constraints, as stated hereafter. 5
[prop:optim_convex] The optimization problem defined in Eqn. [eq:pb_MI_reduced] is convex.
Let \({\mathbf{l}}\) be fixed. The mapping \[\mathop{\mathrm{Pr}}\!\left({{Y}}_0 \;\mid\;{{L}}_0 = {\mathnormal{l}}_0\right), \ldots, \mathop{\mathrm{Pr}}\!\left({{Y}}_{d}\;\mid\;{{L}}_{d}= {\mathnormal{l}}_{d}\right) \mapsto \mathop{\mathrm{Pr}}\!\left({{Y}}\;\mid\;{{\textbf{L}}}= {\mathbf{l}}\right)\] is a convolution product (Loïc Masure et al. 2023 Prop. 1), so it is \(({d}+1)\)-linear, and thereby convex in each of its arguments. Hence, since the mapping \(\mathop{\mathrm{Pr}}\!\left({{Y}}\;\mid\;{{\textbf{L}}}= {\mathbf{l}}\right) \mapsto -\mathsf{H}\!\left({{Y}}\;\mid\;{{\textbf{L}}}= {\mathbf{l}}\right)\) is also convex, the composition of both mappings remains convex. Since \(\mathop{\mathrm{Pr}}\!\left({{Y}}\;\mid\;{{\textbf{L}}}\right) \mapsto -\mathsf{H}\!\left({{Y}}\;\mid\;{{\textbf{L}}}\right)\) is the expectation of the latter composed mappings, it remains convex. Adding the constant \(\mathsf{H}\!\left({{Y}}\right) = \log_2({M})\) keeps the convexity property unchanged. As a result of this convexity, the optimal solution to the optimization of Eqn. [eq:pb_MI_reduced] is necessarily such that for each \(i \in [| 0, {d}|]\), we have \(\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}}_i ; {{L}}_i\right) = \delta_i\).
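To illustrate the convolution property invoked in this proof, the following sketch computes the posterior of a two-share Boolean secret from hypothetical per-share posteriors over a small group; all distribution values are toy numbers.

```python
import numpy as np

# For a Boolean sharing Y = Y0 xor Y1, the posterior of Y given the full
# leakage is the xor-convolution of the shares' posteriors.
def xor_convolve(p, q):
    """Distribution of A xor B for independent A ~ p, B ~ q over {0..M-1}."""
    M = len(p)
    r = np.zeros(M)
    for a in range(M):
        for b in range(M):
            r[a ^ b] += p[a] * q[b]
    return r

p0 = np.array([0.7, 0.1, 0.1, 0.1])   # hypothetical posterior of Y0 given l0
p1 = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical posterior of Y1 given l1
print(xor_convolve(p0, p1))           # posterior of Y0 xor Y1 given (l0, l1)
```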
Serial vs. Parallel Leakages.
In this section, we have implicitly assumed that the leakages occur serially, which mostly depicts what happens in a software implementation. We stress that our results also extend without loss of generality to leakages occurring in parallel, e.g., leakages of the form \({{\textbf{L}}}= {{L}}_0 + \ldots + {{L}}_{d}\), provided that the independence assumption remains verified. It suffices to reduce to the serial case, thanks to the data processing inequality (dpi): \[\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{L}}_0 + \ldots + {{L}}_{d}\right) \leq \mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{L}}_0, \ldots, {{L}}_{d}\right) \enspace .\]
A Proof without Field Size Loss
We now provide our main result, namely a solution to the optimization problem stated in Eqn. [eq:pb_MI_reduced]. Compared to previous works, we introduce a mild additional assumption on the group \({\mathcal{Y}}\), namely that its order is a power of two. Nevertheless, this assumption covers Boolean masking and arithmetic masking modulo \(2^{n}\). To this end, we first need to introduce some definitions.
Introducing the mgl
We first recall the definition of the entropy for a binary random variable.
Let \[\begin{array}{l|rcl} H_b: & [0, 1] & \longrightarrow & [0,1] \\ & p & \longmapsto & -p\log_2(p) - (1-p)\log_2(1-p) \end{array}\] be the binary entropy function. Let \(H_b^{-1} : [0,1] \to \left[0, \frac{1}{2} \right]\) be the inverse of \(H_b\) restricted to \(\left[0, \frac{1}{2} \right]\).
Likewise, we introduce the convolution for a binary random variable.
Let \[\begin{array}{l|rcl} \star: & [0,1]^2 & \longrightarrow & [0,1] \\ & x, y & \longmapsto & (1-x) y + x (1-y). \end{array}\]
Note that when \(\mathop{\mathrm{\large \star}}\) is iterated, it can be replaced by a product, as stated next.
For \(x_0, \ldots, x_{d}\in [0, 1]\), the iterated \(\star\) operation reduces to a product when the operands are written as biases: \[\mathop{\mathrm{\large \star}}_{i=0}^d \left( \frac{1}{2} -x_i\right) = \frac{1}{2} - 2^d \prod_{i=0}^d x_i.\] [prop:iteratedStar]
This is proved by induction on \(d\): the base case follows from \(\left(\frac{1}{2}-x\right) \star \left(\frac{1}{2}-y\right) = \frac{1}{2} - 2xy\), and the induction step applies this identity to \(\left(\frac{1}{2} - 2^{d-1}\prod_{i=0}^{d-1}x_i\right) \star \left(\frac{1}{2} - x_d\right)\).
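The identity can also be checked numerically; a small sketch with arbitrary test values follows.

```python
from functools import reduce

# Numerical check of Prop. [prop:iteratedStar]; the x_i are arbitrary.
def star(x, y):
    return (1 - x) * y + x * (1 - y)

xs = [0.11, 0.27, 0.42, 0.05]
d = len(xs) - 1
lhs = reduce(star, [0.5 - x for x in xs])
rhs = 0.5 - 2**d * reduce(lambda a, b: a * b, xs)
assert abs(lhs - rhs) < 1e-12
```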
For any positive integers \({n}, p\), let \(\mathop{\mathrm{f}}_{\mathsf{H}, 2^{n}}: [0, 1]^{p+1} \rightarrow [0, 1]\) be the function defined by \[{\mathop{\mathrm{f}}_{\mathsf{H}, 2^{n}}\!\left(x_0,\ldots, x_p\right)} = H_b\!\left( \mathop{\mathrm{\large \star}}_{i=0}^p H_b^{-1}\!\left(x_i\right) \right) \enspace .\] Moreover, we also define the function \(\mathop{\mathrm{f}}_{\mathsf{MI}, 2^{n}}: [0,1]^{p+1} \rightarrow [0, 1]\) as \[{\mathop{\mathrm{f}}_{\mathsf{MI}, 2^{n}}\!\left(\delta_0,\ldots, \delta_p\right)} = 1 - {\mathop{\mathrm{f}}_{\mathsf{H}, 2^{n}}\!\left(1 - \delta_0,\ldots, 1 - \delta_p\right)}.\]
The function \(\mathop{\mathrm{f}}_{\mathsf{MI}, 2^{n}}\) is increasing with respect to each of its inputs, and is equal to 0 when every \(\delta_i = 0\).
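These definitions translate directly into code. A minimal sketch follows, assuming SciPy's root finder for the numeric inverse of \(H_b\); the test values in the final checks are arbitrary.

```python
import numpy as np
from functools import reduce
from scipy.optimize import brentq

def Hb(p):
    """Binary entropy function, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def Hb_inv(h):
    """Inverse of Hb restricted to [0, 1/2], by root finding."""
    if h <= 0.0:
        return 0.0
    if h >= 1.0:
        return 0.5
    return brentq(lambda p: Hb(p) - h, 1e-15, 0.5)

def star(x, y):
    """Binary convolution (1-x)*y + x*(1-y)."""
    return (1 - x) * y + x * (1 - y)

def f_H(xs):
    return Hb(reduce(star, [Hb_inv(x) for x in xs]))

def f_MI(deltas):
    return 1.0 - f_H([1.0 - d for d in deltas])

# Sanity checks matching the remark above: zero at zero, and larger
# per-share leakages give a larger bound.
assert f_MI([0.0, 0.0]) == 0.0
assert f_MI([0.02, 0.01]) > f_MI([0.01, 0.01])
```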
We are now equipped to introduce the technical lemma that sets the ground for our result, namely the so-called mgl. The mgl was first established by Wyner and Ziv (Wyner and Ziv 1973) for a two-element group \({\mathcal{Y}}\), and was extended to any Abelian group whose order is a power of two by Jog and Anantharam (Jog and Anantharam 2014).
[thm:extendedMGL] Let \(({\mathcal{Y}},\oplus)\) be any Abelian group of order \({M}=2^{n}\). Let \({{Y}}_0,\ldots,{{Y}}_{d}\) be \({d}+1\) independent \({\mathcal{Y}}\)-valued random variables with side information \({{L}}_0,\ldots,{{L}}_{d}\). We assume that, conditioned on \({{Y}}_i\), \({{L}}_i\) is independent of any other random variable. Define \(x_i=\mathsf{H}\!\left({{Y}}_i \;\mid\;{{L}}_i\right)\), and without loss of generality assume that \(x_0 \geq \ldots \geq x_d\). Let \(k = \big\lfloor x_0\big\rfloor\) and \(p = \max \left\{i \middle| \left\lfloor x_i\right\rfloor \geq k\right\}\). Then \[k + {\mathop{\mathrm{f}}_{\mathsf{H}, 2^{n}}\!\left(x_0 - k,\ldots,x_p - k\right)} \leq \mathsf{H}\!\left({{Y}}_0 \oplus \ldots \oplus {{Y}}_{d}\;\mid\;{{L}}_0,\ldots, {{L}}_{d}\right). \label{eqn:extendedMGL}\]
Let us denote \({{Y}}= {{Y}}_0 \oplus \ldots \oplus {{Y}}_{d}\) for short, and for any \(i \in [| 0, {d}|]\), let \({{X}}_i({\mathnormal{l}}) = \mathsf{H}\!\left({{Y}}_i \;\mid\;{{L}}_i = {\mathnormal{l}}\right)\), such that \(\underset{{{L}}_i}{\mathbb{E}}\left[{{X}}_i({{L}}_i)\right] = x_i\). Moreover, notice that by assumption, all the \({{X}}_i({{L}}_i)\) are mutually independent.
Jog and Anantharam claim (Jog and Anantharam 2014 Thm. V.1) that for a fixed leakage \({\mathbf{l}}= ({\mathnormal{l}}_0, \ldots, {\mathnormal{l}}_{d})\), we have \[\varphi\!\left( {{X}}_0({\mathnormal{l}}_0), \ldots, {{X}}_{d}({\mathnormal{l}}_{d}) \right) \leq \mathsf{H}\!\left( {{Y}}\;\mid\;{{\textbf{L}}}= {\mathbf{l}} \right) \enspace , \label{eqn:extendedMGL_fixed}\] for some function \(\varphi\) that is convex with respect to each variable when the remaining ones are kept fixed (Jog and Anantharam 2014 Cor. V.1). Combining this property with the independence of the \({{X}}_i({{L}}_i)\), we may apply Jensen’s inequality \({d}+1\) times: \[\varphi\!\left( x_0, \ldots, x_{d} \right) \leq \underset{{\mathbf{l}}}{\mathbb{E}}\left[ \varphi\!\left( {{X}}_0({\mathnormal{l}}_0), \ldots, {{X}}_{d}({\mathnormal{l}}_{d}) \right) \right] \leq \underset{{\mathbf{l}}}{\mathbb{E}}\left[ \mathsf{H}\!\left( {{Y}}\;\mid\;{{\textbf{L}}}= {\mathbf{l}} \right) \right] = \mathsf{H}\!\left({{Y}}\;\mid\;{{\textbf{L}}}\right) \enspace .\] Finally, replacing \(\varphi\) by its expression from (Jog and Anantharam 2014 Thm. V.1) results in Eqn. [eqn:extendedMGL].
In this paper, for better readability, all the logarithms are taken in base \(2\), whereas the results we rely on were established with logarithms in natural base. Thankfully, the proof of the mgl for \({M}= 2\) straightforwardly extends to logarithms in any base (Cheng 2014 Thm. 1). Likewise, all the technical results used in Jog and Anantharam’s proof are insensitive to the base, as they essentially involve computing ratios of logarithms (Jog and Anantharam 2014, sec. 2).
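The binary (\(n = 1\)) equality case of the mgl can be checked numerically, assuming the helpers `Hb`, `Hb_inv`, and `star` from the earlier sketch are in scope; the crossover probabilities below are arbitrary test values.

```python
# If L_i = Y_i xor E_i with E_i ~ Bernoulli(p_i), then H(Y_i | L_i) = Hb(p_i),
# and the conditional entropy of Y0 xor Y1 matches the mgl bound exactly.
p0, p1 = 0.12, 0.2
x0, x1 = Hb(p0), Hb(p1)                   # per-share conditional entropies
bound = Hb(star(Hb_inv(x0), Hb_inv(x1)))  # f_{H,2}(x0, x1)
exact = Hb(star(p0, p1))                  # H(Y0 xor Y1 | L0, L1)
assert abs(bound - exact) < 1e-9
```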
Application of mgl to Masking
Using the mgl, we prove the following upper bound on the side-channel information leaked by a masked encoding.
[cor:mgl] Let \({M}=2^{n}\) for a positive integer \({n}\), and let \({d}\) be a positive integer. Let \({{Y}}_0, \ldots, {{Y}}_{d}\) be a \(({d}+1)\)-sharing of the uniform random variable \({{Y}}\), and let \({{\textbf{L}}}= ({{L}}_0, \ldots, {{L}}_{d})\) be such that, conditioned on \({{Y}}_i\), the variable \({{L}}_i\) is independent of the others. For all \(i \in [| 0, {d}|]\), define \(\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}}_i ; {{L}}_i\right) = \delta_i\), and assume without loss of generality that there is a positive integer \(p\) such that \(\delta_i \leq 1\) for all \(i \leq p\) and \(\delta_i \geq 1\) for all \(i > p\). Then \[\label{eq:cor:mgl} \mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{\textbf{L}}}\right) \leq {\mathop{\mathrm{f}}_{\mathsf{MI}, 2^{n}}\!\left(\delta_0, \ldots, \delta_p\right)} \enspace .\] [thm:sharp]
We upper-bound \(\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{\textbf{L}}}\right) = \mathsf{H}\!\left({{Y}}\right) - \mathsf{H}\!\left({{Y}}\;\mid\;{{\textbf{L}}}\right) = {n}- \mathsf{H}\!\left({{Y}}\;\mid\;{{\textbf{L}}}\right)\) by lower-bounding \(\mathsf{H}\!\left({{Y}}\;\mid\;{{\textbf{L}}}\right)\), using Theorem [thm:extendedMGL]. Since \(\mathsf{H}\!\left({{Y}}_i \;\mid\;{{L}}_i\right) = {n}- \delta_i \geq {n}- 1\) for every \(i \leq p\), the theorem applies with \(k = {n}- 1\): \[\begin{aligned} \mathsf{H}\!\left({{Y}}\;\mid\;{{\textbf{L}}}\right) &= \mathsf{H}\!\left({{Y}}_0 \oplus \ldots \oplus {{Y}}_{d}\;\mid\;{{\textbf{L}}}\right) \\ &\geq {n}- 1 + {\mathop{\mathrm{f}}_{\mathsf{H}, 2^{n}}\!\left( \mathsf{H}\!\left({{Y}}_0 \;\mid\;{{L}}_0\right) - ({n}-1), \ldots, \mathsf{H}\!\left({{Y}}_p \;\mid\;{{L}}_p\right) - ({n}-1) \right)}\\ &= n-1+{\mathop{\mathrm{f}}_{\mathsf{H}, 2^{n}}\!\left( 1 - \mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}}_0 ; {{L}}_0\right), \ldots, 1 - \mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}}_p ; {{L}}_p\right) \right)} \\ &= {n}- {\mathop{\mathrm{f}}_{\mathsf{MI}, 2^{n}}\!\left( \mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}}_0 ; {{L}}_0\right), \ldots, \mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}}_p ; {{L}}_p\right) \right)} \end{aligned}\]
Comparison with Former Upper Bounds
The removal of the field size loss in Theorem [thm:sharp] is illustrated by the following comparison. The graph depicts the upper bounds on \(\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{L}}\right)\) (in bits) with respect to the noise parameter \(\delta\), assuming that the \(\delta_i\) are all equal. The dotted curves correspond to the bounds given by Masure et al. (Loïc Masure, Rioul, and Standaert 2023) 6 for different masking orders, whereas the dashed curves are obtained with our new bound. It can for example be noticed that for \({d}=1\) and \(\delta_i = 2^{-7}\), the bound from Ito et al. and Masure et al. (Loïc Masure, Rioul, and Standaert 2023) is roughly equal to \(2^{-5}\), whereas our upper bound is less than \(2^{-12}\): the gain is roughly \(2^{12-5}\), which corresponds to half the field size. A similar factor is observed for larger \({d}\) values.
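This numeric example can be reproduced with `f_MI` from the earlier sketch. The former bound is taken in the simplified form \(M \prod_i (\delta_i/\tau)\) from the problem statement, with \(\tau = (2\ln 2)^{-1}\); this is our approximation of Masure et al.'s result, not its exact statement.

```python
import math

# d = 1, delta_i = 2^-7, M = 256: former O(M) bound vs. the new mgl bound.
M, delta, d = 256, 2.0**-7, 1
tau = 1 / (2 * math.log(2))
former = M * (delta / tau) ** (d + 1)    # approx. of the prior bound
new = f_MI([delta] * (d + 1))            # Theorem [thm:sharp]
print(f"former ~ 2^{math.log2(former):.1f}, new ~ 2^{math.log2(new):.1f}")
```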
We also add the following proposition (proven in the Appendix) that gives a more intuitive view of our results and makes the removal of the field size loss explicit.
[prop:approx_0] The first-order Taylor expansion of the mgl bound around zero is the following: \[\label{eq:prop:approx_0} {\mathop{\mathrm{f}}_{\mathop{\mathrm{{\mathsf{MI}}}}, 2^{n}}\!\left(\delta_0, \ldots, \delta_{d}\right)} = \eta \prod_{i=0}^d \frac{\delta_i}{\eta} + o\!\left( \prod_{i=0}^d \delta_i \right) \enspace ,\] where \(\eta = (2 \ln 2)^{-1} \approx 0.72\).
We note that the \(\eta\) parameter does not exactly correspond to the noise threshold \(\tau\) of Eqn. [eq:conjecture], since the exact threshold depends on the noise level. But for high noise levels, where the first-order Taylor expansion is accurate, its value of \(0.72\) corresponds to the noise threshold in the CCS 2022 and the CARDIS 2022 papers. 7
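The expansion can be checked numerically, assuming `f_MI` from the earlier sketch is in scope; the ratio between the exact bound and its first-order approximation tends to \(1\) as the \(\delta_i\) shrink.

```python
import math

# Checking Prop. [prop:approx_0] for d = 1 and shrinking deltas.
eta = 1 / (2 * math.log(2))
for delta in (1e-2, 1e-3, 1e-4):
    exact = f_MI([delta, delta])
    approx = eta * (delta / eta) ** 2
    print(delta, exact / approx)   # -> ratios approaching 1
```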
The MGL: Tighter or Tight?
[sub:tight] We have shown above that our upper bound obtained from the mgl is tighter than the one achieved by Ito et al. (Ito, Ueno, and Homma 2022) and Masure et al. (Loïc Masure, Rioul, and Standaert 2023). We may therefore wonder to what extent the new mi upper bound is tight. In other words, are there leakage models such that the mi between the secret and the leakage of all shares equals the mgl function? In this respect, Jog and Anantharam’s results can be interpreted as saying that the bound given by the mgl is at least locally tight, as stated hereafter.
[prop:locally_tight] For all \((x_0, \ldots, x_{d}) \in \left[0, {n}\right]^{{d}+1}\), there exists a leakage distribution \(({{\textbf{L}}} \;\mid\;{{Y}})\) such that:
For all \(i \in [| 0, {d}|]\), we have \(\mathsf{H}\!\left({{Y}}_i \;\mid\;{{L}}_i = {\mathnormal{l}}_i\right) = x_i\) and
\(\mathsf{H}\!\left({{Y}}\;\mid\;{{\textbf{L}}}= ({\mathnormal{l}}_0, \ldots, {\mathnormal{l}}_{d})\right) = k + {\mathop{\mathrm{f}}_{\mathsf{H}, 2^{n}}\!\left(x_0 - k, \ldots, x_p - k \right)},\)
where \(k\) and \(p\) are the parameters defined in Theorem [thm:extendedMGL].
In other words, without further assumptions on the leakage model, the bound given by the mgl is the best possible. We next investigate whether it is actually tight for practically-relevant leakage functions, by confronting the bounds from the mgl with the direct computation of the mi for a shared secret. For this purpose, we assume that each share leaks a deterministic function of its value with an additive Gaussian noise, similarly to the experiments conducted by Ito et al. (Ito, Ueno, and Homma 2022, sec. 7.1) and Masure et al. (Loïc Masure, Rioul, and Standaert 2023, sec. 3.1). In particular, we consider two deterministic leakage functions, namely the least significant bit (lsb) of the share and its Hamming weight. The mi is estimated with Monte-Carlo methods by sampling \(N_{v}= 10,000\) leakages. Then, for each simulated leakage, the conditional pmf can be exactly computed using a soft-analytical side-channel attack (sasca) (Veyrat-Charvillon, Gérard, and Standaert 2014). 8
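A simplified sketch of this simulation follows: two-share Boolean masking of an 8-bit secret, each share leaking its lsb or Hamming weight with additive Gaussian noise, and exact per-share posteriors combined by xor-convolution. The sample size and noise level are illustrative, not the paper's exact experimental parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
M, sigma, n_val = 256, 1.0, 2_000
idx = np.arange(M)

for model in ("lsb", "hw"):
    table = (idx & 1).astype(float) if model == "lsb" \
        else np.array([bin(y).count("1") for y in idx], dtype=float)
    shares = rng.integers(0, M, (n_val, 2))
    leaks = table[shares] + rng.normal(0.0, sigma, (n_val, 2))
    cond_ent = 0.0
    for l in leaks:
        # Exact per-share posteriors via Bayes, with a uniform prior.
        posts = [np.exp(-((li - table) ** 2) / (2 * sigma**2)) for li in l]
        posts = [p / p.sum() for p in posts]
        # Posterior of Y = Y0 xor Y1 given both leakages (xor-convolution).
        pY = np.zeros(M)
        for a in range(M):
            pY[a ^ idx] += posts[0][a] * posts[1]
        cond_ent += -(pY * np.log2(pY + 1e-300)).sum()
    print(f"{model}: MI(Y; L) ~ {np.log2(M) - cond_ent / n_val:.5f} bits")
```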
The results are depicted below for Boolean sharings with \(2\) shares and \(3\) shares, for lsb (Figs. [fig:HWGN:lsb1], [fig:HWGN:lsb2]) and Hamming weight leakages (Figs. [fig:HWGN:hw1], [fig:HWGN:hw2]). Each plot depicts the mi of the secret, depending on the variance \(\sigma^2\) of the additive Gaussian noise.
As one can observe in Figs. [fig:HWGN:lsb1] and [fig:HWGN:lsb2], the bounds obtained by the mgl, depicted in dashed curves, are tight with respect to the plain curves computed from the sasca for the lsb leakage model. However, for the Hamming weight leakage model, we observe a gap between our upper bound and the ground truth. Moreover, the gap between the dashed curve and the plain curve in Fig. [fig:HWGN:hw2] seems wider than the one in Fig. [fig:HWGN:hw1]. This shows that the Hamming weight leakage model does not attain the equality case of Prop. [prop:locally_tight]. The combination of these observations confirms that no significant improvement of the bound can be obtained without making additional assumptions on the leakage function.
Linking the MI with the Success Rate
Having upper bounded the mi between the secret and one side-channel trace, we may then lower bound the required number of queries for any sca adversary, by leveraging Chérisey et al.’s \(f(\mathop{\mathrm{{\mathsf{SR}}}},{\mathcal{Y}})\) function, as stated hereafter.
[cor:cherisey] In the same setting as in Theorem [thm:sharp], \[\label{eq:cor:cherisey} N_{a}(\mathop{\mathrm{{\mathsf{SR}}}}) \geq \frac{f(\mathop{\mathrm{{\mathsf{SR}}}}, {\mathcal{Y}})}{{\mathop{\mathrm{f}}_{\mathop{\mathrm{{\mathsf{MI}}}}, 2^{n}}\!\left(\delta_0, \ldots, \delta_p\right)}} \enspace \cdot\]
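As an illustration, the corollary can be evaluated numerically. The sketch below uses a Fano-type expression for \(f(\mathop{\mathrm{{\mathsf{SR}}}}, {\mathcal{Y}})\), namely \(\log_2(M) - H_b(\mathop{\mathrm{{\mathsf{SR}}}}) - (1-\mathop{\mathrm{{\mathsf{SR}}}})\log_2(M-1)\); this is our reading of de Chérisey et al.'s bound and should be treated as an assumption, as should the illustrative mi value.

```python
import math

# Hedged sketch of Corollary [cor:cherisey]: the Fano-type form of f(SR, Y)
# is an assumption on our part, not a verbatim statement from the paper.
def Hb(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

M, target_sr = 256, 0.9
f_val = math.log2(M) - Hb(target_sr) - (1 - target_sr) * math.log2(M - 1)
mi_bound = 2.0**-13            # e.g. f_MI for d = 1 and delta_i = 2^-7
print(f"N_a >= {f_val / mi_bound:.0f} traces")
```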
We compare the bound obtained by combining Eqn. [eq:cherisey] with Theorem [thm:sharp] to a simulated sasca attack in Figs. [fig:sigma_high] and [fig:sigma_low], for the two leakage models investigated above. The plain curves denote the attack complexity obtained from a key recovery. There, the success rate is estimated by re-sampling from a validation set of \(N_{v}= 10,000\) traces. More precisely, the \(N_{v}\) validation traces are re-shuffled between \(100\) and \(1,000\) times to emulate different attack sets. While this method is prone to bias when \(N_{a}\) is close to \(N_{v}\), it remains sound if the success rate converges towards \(1\) within \(N_{v}\) traces, as this cancels the bias. 9 The dotted green curves correspond to Equation [eq:cherisey], where the direct estimation of the mi between the shared secret and the leakage of the shares (from the previous subsection) is used. The dashed red curves correspond to the bound given by Corollary [cor:cherisey].
Figure: simulated attack complexities for the two settings, \(\sigma^2 = 2^5, {d}= 1\) [fig:sigma_high] and \(\sigma^2 = 2^2, {d}=2\) [fig:sigma_low].
One can notice that the plain curves and the dotted curves are always close to each other, meaning that Chérisey et al.’s function is reasonably tight in our context. Moreover, similarly to what was noticed in the previous subsection, the bound provided by Corollary [cor:cherisey] is tight for the lsb leakage model, but remains loose for the Hamming weight one.
On the Dependence on the Group Structure
In our previous derivations, we assumed that the size of the field in which masking is applied is a power of two. Since this is the only limitation compared to the results of Ito et al. (Ito, Ueno, and Homma 2022) and Masure et al. (Loïc Masure, Rioul, and Standaert 2023), we finally discuss whether this additional assumption is crucial. To this end, we show that Masure et al.’s approach using the Pinsker (Cover and Thomas 2006 Lemma 11.6.1) and reverse Pinsker (Sason and Verdú 2015 Thm. 1) inequalities can be improved using the theory of majorization (Marshall, Olkin, and Arnold 1980).
In a nutshell, majorization can be seen as a partial order on pmfs quantifying how “spread out” a pmf is compared to another. The most spread out pmf is the uniform distribution, so majorization can be used to assess how close to uniform a given pmf is. Hereupon, Rioul recently characterized optimal Pinsker-like and reverse Pinsker-like inequalities (Rioul 2022). While the optimal Pinsker inequality does not improve upon Pinsker’s inequality, the optimal reverse Pinsker inequality does improve upon its classical counterpart. Leveraging this improvement, the results of Masure et al. (Loïc Masure, Rioul, and Standaert 2023) can be refined for arbitrary field sizes, as stated hereafter.
[thm:informal] Let \({\mathcal{Y}}\) be a group of order \({M}\), and let \({{Y}}, {{\textbf{L}}}\) denote the joint distribution of a \(({d}+1)\)-shared secret and its corresponding leakage. Let \(\tau = \left(2 \ln 2\right)^{-1} \approx 0.72\), and let \(P = \frac{1}{4} \prod_{i=0}^{d}\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}}_i ; {{L}}_i\right)\tau^{-1}\). Then, for any \(\alpha \in [0,1]\), there exists a constant \(C_\alpha \in [\log_2({M}), 2{M}]\) such that \[\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{\textbf{L}}}\right) \leq C_\alpha P^{\alpha} \enspace .\] In particular, for \(\alpha = \frac{1}{2}\), we have \[\mathop{\mathrm{{\mathsf{MI}}}}\!\left({{Y}} ; {{\textbf{L}}}\right) \leq \log({M}) \left(1 + \frac{1}{{M}}\right)P^{1/2} \enspace .\]
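As an illustration, the \(\alpha = \frac{1}{2}\) bound can be evaluated for a field whose size is not a power of two; the modulus and per-share mi values below are illustrative assumptions.

```python
import math

# Evaluating the alpha = 1/2 bound of the theorem above for M = 251.
M, mis = 251, [2.0**-7, 2.0**-7]
tau = 1 / (2 * math.log(2))
P = 0.25 * math.prod(mi / tau for mi in mis)
bound = math.log2(M) * (1 + 1 / M) * math.sqrt(P)
print(f"MI(Y; L) <= {bound:.3e} bits")
```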
The bounds in Theorem [thm:informal], whose formal statement is given and proven in the Appendix, are not as tight as the ones from Theorem [thm:sharp], but they hold for any field size \({M}\), which makes them interesting when \({M}\) is not a power of two.
Fig. [fig:Eqn51] depicts the range of \(C_\alpha\) depending on \(\alpha\).
It illustrates that there is a trade-off between the constant factor overhead \(C_\alpha\) and the effective number of shares \(\alpha \cdot \left({d}+1\right)\). Overall, this provides good hints towards the conjectured absence of constant factor overhead in non-binary fields, and opens some perspectives towards a formal proof of the masked encoding bound in this context.
Conclusion and Perspectives
From a practical perspective, our work contributes to formalizing the soundness of so-called shortcut evaluations, where the security level of an implementation protected with higher-order masking is assessed based on the security of its individual shares. By performing our proofs directly with noisy leakages, we show that such shortcuts are actually tight for masked encodings.
As mentioned in the introduction, a natural extension of this work is to explore the tightness of bounds for masked computations (e.g., multiplications) and not only encodings. Besides, our results of Section [sub:tight] show that while the bound we provide is locally tight (i.e., tight for some leakage functions), it is not tight for other practically-relevant leakage functions like the Hamming weight one (and, in general, for leakage functions having preimages of different sizes). It could therefore be interesting to study whether a mild characterization of the leakages could be used to improve the shortcut evaluation of masking for these functions. Another possible track of research is to study whether improved connections between the mutual information and the success rate can be obtained: while the bounds of Corollary [cor:cherisey] already give good evaluations, there remains a small gap that could possibly be removed (e.g., by taking advantage of other information-theoretic metrics like the Alpha-Information (Liu et al. 2021)). Finally, yet another question is whether these bounds, for now studied in a known (random) plaintext context, cover adaptive chosen-plaintext side-channel attacks (Veyrat-Charvillon and Standaert 2010).
Acknowledgments
François-Xavier Standaert is a Senior Research Associate of the Belgian Fund for Scientific Research (FNRS-F.R.S.). This work has been funded in part by the ERC project number 724725 (acronym SWORD). This work has also partly benefited from the bilateral MESRI-BMBF project “APRIORI” from the ANR cybersecurity 2020 call. The authors also acknowledge financial support from the French national Bank (BPI) under Securyzr-V grant (Contract DOS0144216/00), a RISC-V centric platform integrating security co-processors.
Proof of Prop. [prop:approx_0]
The first inequality combines Theorem [thm:sharp] with Prop. [prop:iteratedStar]; the expansions then follow from \(1 - H_b\!\left(\frac{1}{2} - \epsilon\right) = 2\log_2(e)\, \epsilon^2 + o(\epsilon^2)\). \[\begin{aligned} \mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y ; \mathbf{L}\right) &\leq 1 - H_b\Bigg ( \frac{1}{2} - 2^d \prod_{i=0}^{d} \biggl ( \frac{1}{2} - H_b^{-1} \big (1 - \mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y_i ; L_i\right) \big ) \biggr ) \Bigg ) \\ &= \frac{2 \log(e)}{4} \prod_{i=0}^d 4 \left( \frac{1}{2} - H_b^{-1}(1 - \mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y_i ; L_i\right))\right)^2 \nonumber \\ &\qquad+ o \bigg( \prod_{i=0}^d 4 \left( \frac{1}{2} - H_b^{-1}(1 - \mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y_i ; L_i\right))\right)^2 \bigg) \\ &= \frac{2 \log(e)}{4} \prod_{i=0}^d 4 \frac{\mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y_i ; L_i\right)}{2 \log e} + o\bigg( \prod_{i=0}^d 4 \frac{\mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y_i ; L_i\right)}{2 \log e} \bigg) \end{aligned}\] That is, in normalized form: \[\begin{aligned} \mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y ; \mathbf{L}\right) &\leq \frac{\log(e)}{2} \prod_{i=0}^d \mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y_i ; L_i\right) \frac{2}{\log e} + o\bigg( \prod_{i=0}^d \mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y_i ; L_i\right)\bigg) \\ &= \eta \prod_{i=0}^d \frac{\mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y_i ; L_i\right)}{\eta} + o\bigg( \prod_{i=0}^d \mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y_i ; L_i\right)\bigg). \end{aligned}\]
Technical Statements and Proofs for Theorem [thm:informal]
[prop:optRevPin] Let \(f_{\mathsf{opt}} : \left[0, 1-\frac{1}{M}\right] \to \mathbb{R}_+\) be the optimal reverse Pinsker function, i.e., \[f_{\mathsf{opt}}(\Delta) = \frac{1}{M} \Big((1 + M \Delta) \log(1 + M \Delta) + \big( (1-\Delta)M - L\big) \log\big( (1-\Delta)M - L\big)\Big) \enspace ,\] where \(L=\lfloor M (1 - \Delta) \rfloor\). For every pmf \(P\), we have \(\mathsf{D_{KL}}(P\|U) \leq f_{\mathsf{opt}}(\Delta(P,U))\).
This follows by applying the entropy, which is Schur-concave, to Eqn. 51 in (Rioul 2022); factoring \(-\log M\) in each term of the inequality yields Prop. [prop:optRevPin].
Let \(\mathcal{H}\) be the class of functions that are increasing, concave when composed with the square root, and lower bounded by \(f_{\mathsf{opt}}\). Let \(P = \frac{1}{4} \prod_{i=0}^d C \mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y_i ; L_i\right)\), where \(C = \tau^{-1} = 2\ln 2\) is the constant from Pinsker’s inequality. We have \[\mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y ; \mathbf{L}\right) \leq \inf_{f \in \mathcal{H}} (f\circ \sqrt{\cdot}\,) (P). \label{eqn:classIneq}\] Let \(C_{\alpha}=\sup_{\Delta \in ]0,1-\frac{1}{M}]} f^*(\Delta) \Delta^{-2\alpha} = \max_{\Delta = k/M,\, k \in \{1,\ldots, M-1\}} f^*(\Delta) \Delta^{-2\alpha}\). We have \[\begin{aligned} \mathop{\mathrm{{\mathsf{MI}}}}\!\left(Y ; \mathbf{L}\right) &\leq \min \biggl (\log(1 + M^2(4^{\frac{1}{M}}-1) P ) ,\ \Big(\frac{1}{M} + \sqrt{P}\Big) \log(1+M \sqrt{P}) \biggr ) \label{eqn:f1f2} \\ &\leq \inf_{\alpha \in [0,1]} C_{\alpha} \cdot P^{\alpha} \label{eqn:ineqPoly} \\ &\leq \log(M) \Big(1 + \frac{1}{M}\Big) \cdot P^{ \frac{1}{2} }. \label{eqn:ineqSQRT} \end{aligned}\] [thm:Eqn51]
The infimum in Eqn. [eqn:classIneq] can be computed with the Legendre-Fenchel transform \(f \mapsto f^*\) (i.e., \(f^*(p) = \sup_x \{ px-f(x) \}\)). Indeed, it is given by \(\Delta \mapsto - (-f_{\mathsf{opt}}\circ\sqrt{\cdot})^{**}(\Delta^2)\) by applying Thm. 10 in (Touchette 2005).
The different inequalities are shown in Fig. [fig:Eqn51]: \(f_1\) is the tightest for \(\Delta \leq 1/M\), and \(f_2\) is the tightest otherwise. Eqn. [eqn:ineqSQRT] shows that if we reduce the security exponent to \(\frac{1}{2}\), we can obtain a mild (logarithmic) dependency on the field size.
All derivations of (Loïc Masure, Rioul, and Standaert 2023) hold for \(f \in \mathcal{H}\), which shows Eqn. [eqn:classIneq]. Indeed, \[\begin{aligned} \mathsf{D_{KL}}(Y|\mathbf{L} \,\|\, U) &\leq f_{\mathsf{opt}}(\Delta) && \text{Prop.~[prop:optRevPin]} \\ &\leq f(\Delta) && f_{\mathsf{opt}} \leq f \\ &\leq f\Big( \frac{1}{2} \prod_{i=0}^d 2 \Delta((Y_i|L_i);U)\Big) && \text{XOR Lemma} \\ &= (f \circ \sqrt{\cdot}\, )\Big(\frac{1}{4} \prod_{i=0}^d 4 \Delta((Y_i|L_i);U)^2\Big) \\ &\leq (f \circ \sqrt{\cdot}\, )\Big(\frac{1}{4} \prod_{i=0}^d C \mathsf{D_{KL}}(Y_i|L_i \,\|\, U)\Big) && \text{Pinsker} \end{aligned}\] Since \((f \circ \sqrt{\cdot}\,)\) is concave, we apply Jensen’s inequality and take the expectation to obtain the desired inequality; the expectation of the product simplifies into the product of the expectations by independence of the terms.

Let \(f_1 : \Delta \mapsto \log(1 + M^2(4^{\frac{1}{M}}-1) \Delta^2)\) and \(f_2 : \Delta \mapsto (\Delta + \frac{1}{M}) \log(1 + M \Delta)\). We show that \(f_1\) and \(f_2\) are in \(\mathcal{H}\). For \(f_2\), it is clear since \(f_2\) is \(f^*\) where we removed the negative \(1/M\)-periodic term. \(f_1\) is clearly concave in \(\Delta^2\) and increasing. To ensure that \(f_1 \geq f_{\mathsf{opt}}\), we consider the case of equality at \(\frac{1}{M}\). This imposes \(\log(1 + \beta_M M^{-2}) = \frac{2}{M}\), where \(\beta_M = M^2(4^{\frac{1}{M}}-1)\). For \(\Delta \leq \frac{1}{M}\), \(M f_{\mathsf{opt}}(\Delta) = (1+M\Delta)\log(1+M\Delta) +(1-M\Delta)\log(1-M\Delta) \leq 2 \log(1+M^2\Delta^2)\) by Jensen’s inequality, hence \(f_{\mathsf{opt}}(\Delta) \leq \log\big((1+M^2\Delta^2)^{\frac{2}{M}}\big)\). We compare this upper bound of \(f_{\mathsf{opt}}\) to \(f_1\): since \(\log\) is increasing, \(f_1\) dominates it if and only if \(1+\beta_M \Delta^2 \geq (1+M^2\Delta^2)^{\frac{2}{M}}\). Equality holds at \(0\) and \(1/M\), and we show that the derivative of the difference is increasing then decreasing. The derivative is given by \(2\Delta(\beta_M-2M(1+M^2\Delta^2)^{\frac{2}{M}-1})\), and its sign is given by \(\beta_M - 2M(1+M^2\Delta^2)^{\frac{2}{M}-1}\). This quantity is positive at \(0\) and monotonically decreasing, hence the result.

It remains to prove the inequality for \(\Delta \geq \frac{1}{M}\). To do so, we show that \(f_1(\Delta) \geq \log(1+M\Delta) \geq \frac{1+M\Delta}{M} \log(1+M\Delta)\). Since \(\log\) is increasing, it is enough to have \(1+\beta_M \Delta^2 \geq 1 + M \Delta\), that is, \(\Delta \geq M/\beta_M\), i.e., \(4^{\frac{1}{M}} - 1 \geq \frac{1}{M}\). This holds since \(e^x-1 \geq x\) by convexity of the exponential. This shows that \(f_1 \in \mathcal{H}\), and Eqn. [eqn:f1f2] is proved.

Let \(\mathcal{H}_{\text{poly}} = \{ f_{\alpha} : \Delta \mapsto C_\alpha \Delta^{2\alpha} \mid \alpha \in [0,1]\}\); we show that \(\mathcal{H}_{\text{poly}} \subset \mathcal{H}\). The functions of \(\mathcal{H}_{\text{poly}}\) are concave when composed with the square root since \(\alpha \leq 1\), increasing since \(\alpha \geq 0\), and lower bounded by \(f_{\mathsf{opt}}\) by definition of \(C_\alpha\). This proves Eqn. [eqn:ineqPoly].

To prove Eqn. [eqn:ineqSQRT], we observe that \(C_0 = \log(M)\) and that \(C_{\alpha}\) is continuous and increasing in \(\alpha\). Consider the values of \(\Delta\) for which the \(\sup\) in the definition of \(C_\alpha\) is reached. Since \(f_{\mathsf{opt}}\) is square-root convex on the intervals \([k/M,(k+1)/M]\) and \(\Delta \mapsto C_\alpha \Delta^{2\alpha}\) is square-root concave, equality can only occur at points of the form \(\frac{k+1}{M}\). This shows that \(C_\alpha = \max_{\Delta = k/M,\, k \in \{1,\ldots, M-1\}} f^*(\Delta) \Delta^{-2\alpha}\). We finally verify that the maximum is reached at \(1-1/M\) when \(\alpha= \frac{1}{2}\): the ratio of consecutive terms of the sequence \(f^*(k/M) (\frac{k}{M})^{-2\alpha}\) is larger than \(1\), which proves Eqn. [eqn:ineqSQRT].
References
Footnotes
i.e., the number of shares is \({d}+1\) if the independence assumption is met.↩︎
e.g., Young-Minkowski’s convolution inequality for the tv (Loïc Masure, Rioul, and Standaert 2023) or Plancherel’s formula combined with the convolution theorem for the en (Ito, Ueno, and Homma 2022).↩︎
For cryptographic reasons, the vast majority of the intermediate computations are uniformly distributed, including the inputs and outputs of the Sboxes, provided that the plaintext and the key are uniformly distributed as well. The only non-uniform intermediate computations of a block cipher may be the intermediate calculations within an Sbox.↩︎
We assume without loss of generality that \(\mathop{\mathrm{{\mathsf{m}}}}\) has full support over \({\mathcal{Y}}\).↩︎
The interested reader may find a similar convexity result, stated in terms of statistical distance, in the works of Dziembowski et al. (Dziembowski, Faust, and Skórski 2016 Cor. 2).↩︎
We have afterwards noticed that the proof of Masure et al.’s bound (Loïc Masure, Rioul, and Standaert 2023 Prop. 2, Thm. 3) was suboptimal, so the bound of Masure et al. is actually slightly better than Ito et al.’s by a factor \(\frac12\). That is why, in the remainder of this paper, we mainly compare against the bound of Masure et al.↩︎
For low noise levels, it gets gradually closer to one, but this gain has limited practical relevance since masking only provides high security with sufficient noise.↩︎
This condition is verified retrospectively in Figs. [fig:sigma_high] and [fig:sigma_low].↩︎