What can Information Guess? Guessing Advantage vs. Rényi Entropy for Small Leakages

Author

Julien Béguinot and Olivier Rioul

Abstract

We leverage the Gibbs inequality and its natural generalization to Rényi entropies to derive closed-form parametric expressions of the optimal lower bounds of the \(\rho\)th-order guessing entropy (guessing moment) of a secret taking values on a finite set, in terms of the Rényi-Arimoto \(\alpha\)-entropy. This is carried out in a non-asymptotic regime when side information may be available. The resulting bounds yield a theoretical solution to a fundamental problem in side-channel analysis: ensure that an adversary will not gain much guessing advantage when the leakage information is sufficiently weakened by proper countermeasures in a given cryptographic implementation. Practical evaluations for classical leakage models show that the proposed bounds greatly improve previous ones for analyzing the capability of an adversary to perform side-channel attacks.

Introduction

Guessing entropy (Massey 1994), also known as guesswork (Pliam 2000), is perhaps the most popular security metric in the context of side-channel attacks on embedded cryptographic devices, such as the cryptographic microcontrollers used in banking smartcards (Standaert, Malkin, and Yung 2009; Choudary and Popescu 2017; Tănăsescu et al. 2021). Such attacks exploit leakage information to recover the secret key, byte by byte, in a divide-and-conquer strategy in which each subkey byte \(K\in\{1,\ldots,M\}\) (typically \(M=128\) or \(256\)) is targeted independently of the others. The secret key \(K\) is generally assumed uniformly distributed, but in a more general framework, any type of secret \(X\in\{1,\ldots,M\}\) (e.g., passwords, sensitive personal information, etc.) can be targeted from some disclosed side information \(Y\).

Guessing entropy (Massey 1994) \(G(X|Y)\), or more generally, \(\rho\)th order guessing moments (Arikan 1996) \(G_\rho(X|Y)\), relate to the number of tries that the attacker has to make to find the actual secret \(X\) for a given leakage side information \(Y\), thereby estimating the brute force effort to find \(X\) by exhaustive search. The popularity of \(G_\rho\) (particularly \(G=G_1\)) as a security criterion comes from the fact that it is particularly informative, as it is computed from the whole key ranking distribution (Poussier, Standaert, and Grosso 2016). The adversary’s guessing advantage is defined as \[\Delta G_\rho (X;Y) \triangleq G_\rho(M)-G_\rho(X|Y)\] where \(G_\rho(M)=G_\rho(K)\) corresponds to a blind estimation (no leakage) of a uniformly distributed \(M\)-ary secret \(K\).

Side-channel leakage has been evaluated by information-theoretic measures such as mutual information \(I(X;Y)\) (Standaert, Malkin, and Yung 2009; Chérisey et al. 2019a, 2019b; Cheng et al. 2022), maximal leakage (Issa, Wagner, and Kamath 2019; Béguinot, Liu, et al. 2023), and more generally, Arimoto’s \(\alpha\)-information (Arimoto 1977) \(I_\alpha(X;Y)=H_\alpha(X)-H_\alpha(X|Y)\) or Sibson’s \(\alpha\)-information (Sibson 1969; Verdú 2015; Liao et al. 2019; Liu et al. 2021). The latter two \(\alpha\)-informations coincide for uniform secrets: \(I_\alpha(K;Y)=\log M -H_\alpha(K|Y)\) (Hayashi and Tan 2017). In many practical cases, such as for protected implementations with masking or low-SNR noise (Ishai, Sahai, and Wagner 2003; Cheng et al. 2022; Masure, Rioul, and Standaert 2023; Béguinot, Cheng, et al. 2023; Béguinot, Liu, et al. 2023), the information leakage is small, which means that the conditional Rényi-Arimoto entropy (Fehr and Berens 2014) \(H_\alpha(X|Y)\) approaches its maximum value \(\log M\) in the case of uniform secrets. Thus, to evaluate the impact of leakage in cryptographic implementations, a significant quantity is the “information advantage” \[\Delta H_\alpha(X;Y) \triangleq \log M - H_\alpha(X|Y).\]

A fundamental problem in side-channel analysis is to ensure that an adversary will not gain much guessing advantage when the leakage is sufficiently weakened by proper countermeasures in some cryptographic implementation (Choudary and Popescu 2017; Tănăsescu et al. 2021). In other words, it is important to upper bound \(\Delta G_\rho (X;Y)\) in terms of \(\Delta H_\alpha(X;Y)\), or equivalently, to lower bound \(G_\rho (X|Y)\) in terms of \(H_\alpha(X|Y)\). As explained in (Choudary and Popescu 2017; Tănăsescu et al. 2021), this is also useful to practically evaluate guessing advantage for a full key (e.g., 256-bit key) whose direct evaluation is not tractable.

Many such bounds have been derived in the literature. Massey’s original inequality (Massey 1994) is for \(\alpha=\rho=1\) and was improved by Rioul (Rioul 2022) into an asymptotically optimal inequality (Tănăsescu et al. 2021) as \(M\to+\infty\). Arikan’s inequalities (Arikan 1996) are for \(\alpha=\frac{1}{1+\rho}\) and exhibit asymptotic equivalence. More general bounds for various ranges of \(\alpha\) and \(\rho\) were established in (Rioul 2022). These works, however, are optimal only as \(M\to+\infty\) and do not consider the non-asymptotic scenario of small leakage, i.e., around the corner \((H_\alpha(X|Y)\approx \log M,G_\rho(X|Y)\approx G_\rho(M))\) for relatively small \(M\).

Optimal joint range region between \(H(X|Y)\) and \(G(X|Y)\) for different values of \(M\). The (optimal) upper bound is that of McEliece and Yu (McEliece and Yu 1995). The optimal lower bound is derived in this paper. The black dotted and dash-dotted curves correspond to Massey’s (Massey 1994) and Rioul’s (Rioul 2022) inequalities, which do not depend on \(M\).

As an illustration for \(\alpha=\rho=1\), Figure 1 shows that for any fixed value of \(M\), there is a multiplicative gap of approximately \(e/2\) in guessing entropy from this corner point \((\log M, \frac{M+1}{2})\) compared to Rioul’s inequality. Sason and Verdú (Sason and Verdú 2018) improved Arikan’s inequalities in a non-asymptotic regime for any ranking function but did not obtain the exact locus of attainable values of \(G_\rho\) vs. \(H_\alpha\), nor closed-form expressions for lower bounds of \(G_\rho\) vs. \(H_\alpha\).

In this paper, we leverage the Gibbs inequality (Cover and Thomas 1st Ed. 1990, 2nd Ed. 2006) and its natural generalization to Rényi entropies (Rioul 2020) to derive closed-form parametric expressions of the optimal lower bounds of \(G_\rho(X|Y)\) vs. \(H_\alpha(X|Y)\) (or upper bounds of \(\Delta G_\rho(X ;Y)\) vs. \(\Delta H_\alpha(X ;Y)\)). We obtain an explicit, easily computable, first-order bound of the form \(\Delta G_\rho(X;Y)\leqslant c \sqrt{\Delta H_\alpha(X;Y)}\) for small leakages. We then evaluate and refine our bounds in the Hamming weight and random probing leakage models.

Notations

The secret is modeled as a random variable \(X\) with pmf \(p_X\) taking values in a finite set \(\{1,\ldots,M\}\). The side channel \(X\to Y\) leaks some side information \(Y\), modeled as an arbitrary random variable (discrete or continuous). For most cryptographic applications, \(X\) is a uniformly distributed key \(K \sim \mathcal{U}(M)\); in this case the side channel is denoted \(K \to Y\).

We let \(\alpha > 0\) and \(\rho > 0\) be the entropy and guessing entropy orders, respectively. The Rényi entropy of order \(\alpha\) is \(H_\alpha(X) \triangleq - \alpha' \log \| p_X \|_\alpha\) where \(\| p \|_\alpha \triangleq ( \sum p^\alpha )^{{1}/{\alpha}}\) denotes the “\(\alpha\)-norm”, the logarithm is taken to an arbitrary base, and \(\alpha'\) is the Hölder conjugate of \(\alpha\): \(\frac{1}{\alpha} + \frac{1}{\alpha'} = 1\), i.e. \(\alpha' = \frac{\alpha}{\alpha-1}\) or \((\alpha'-1)(\alpha-1) = 1\). We also write \(H_\alpha(p)=- \alpha' \log \| p \|_\alpha\). The limiting case \(\alpha\to 1\) is the Shannon entropy \(H(X)=H_1(X)\). The limiting case \(\alpha\to\infty\) is the min-entropy \(H_\infty(X) \triangleq - \log \max p_X\).
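
As a quick numerical companion (ours, not part of the original text), the following Python sketch computes \(H_\alpha(p)\) from a pmf given as an array; the function name `renyi_entropy` and the base-2 default are illustrative choices.

```python
import numpy as np

def renyi_entropy(p, alpha, base=2.0):
    """Rényi entropy H_alpha(p) = -alpha' * log ||p||_alpha, with alpha' = alpha/(alpha-1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # drop zero masses (0 log 0 = 0 convention)
    if np.isclose(alpha, 1.0):            # Shannon entropy as the limit alpha -> 1
        return float(-np.sum(p * np.log(p)) / np.log(base))
    if np.isinf(alpha):                   # min-entropy H_inf = -log max p
        return float(-np.log(p.max()) / np.log(base))
    norm_alpha = np.sum(p ** alpha) ** (1.0 / alpha)      # ||p||_alpha
    return float((alpha / (1.0 - alpha)) * np.log(norm_alpha) / np.log(base))

# A uniform pmf on M values has H_alpha = log M for every order alpha:
M = 256
print(renyi_entropy(np.full(M, 1.0 / M), alpha=0.5))      # -> 8.0 bits
```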

The \(\rho\)-guessing entropy (a.k.a. guessing \(\rho\)th-order moment) is \(G_\rho(X) \triangleq \min_{ \sigma} \sum_{i = 1}^{ M} p_X( \sigma(i) ) i^\rho = \min_{ \sigma} \mathbb{E} (\sigma(X)^\rho)\), where the minimum is over all permutations \(\sigma\) of the secret values. We also write \(G_\rho(p)\) when \(X\) has pmf \(p\). By the rearrangement inequality, since the sequence \(i^\rho\) is increasing, the optimal guessing strategy is such that the \(p_X( \sigma(i) )\) are arranged in decreasing order. When \(X=K\) is uniformly distributed, its \(\rho\)-guessing entropy is also denoted \(G_\rho(M) \triangleq \frac{1}{M} \sum_{i = 1}^{M} i^\rho\).
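
Similarly, a minimal sketch (ours) of the guessing moment, which directly implements the rearrangement argument above; names are illustrative.

```python
import numpy as np

def guessing_moment(p, rho=1.0):
    """G_rho(p): the optimal strategy guesses the most likely values first (rearrangement
    inequality), so sort p in decreasing order and weight rank i by i**rho."""
    p_sorted = np.sort(np.asarray(p, dtype=float))[::-1]
    ranks = np.arange(1, len(p_sorted) + 1)
    return float(np.sum(p_sorted * ranks ** rho))

def guessing_moment_uniform(M, rho=1.0):
    """Blind guessing of a uniform M-ary secret: G_rho(M) = (1/M) * sum_i i**rho."""
    return float(np.mean(np.arange(1, M + 1) ** rho))

print(guessing_moment_uniform(256))       # (M + 1)/2 = 128.5 for rho = 1
```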

In the presence of leakage \(Y\), the most natural definition of conditional \(\alpha\)-entropy is that of Arimoto (Arimoto 1977; Fehr and Berens 2014): \(H_\alpha(X|Y) \triangleq - \alpha' \log \mathbb{E} _Y \| p_{X|Y} \|_\alpha\). In particular \(H_\infty(X|Y) \triangleq - \log \mathbb{E} _Y \max p_{X|Y}\) where \(\mathbb{E} _Y \max p_{X|Y}\) is the optimal success probability as given by the MAP rule. Following Hirche (Hirche 2020) we also define \[\begin{aligned} \label{eqKalpha} K_\alpha(X) &\triangleq \|p_X\|_\alpha =\exp \bigl ( \tfrac{1-\alpha}{\alpha} H_\alpha(X) \bigr ) \\ K_\alpha(X|Y) &\triangleq \mathbb{E} _Y \|p_{X|Y}\|_\alpha = \exp \bigl ( \tfrac{1-\alpha}{\alpha} H_\alpha(X|Y) \bigr ) \end{aligned}\] so that \(K_\alpha(X|Y) = \mathbb{E} _y K_\alpha(X|Y=y)\). Likewise, the conditional \(\rho\)-guessing entropy is \(G_\rho(X|Y) \triangleq \mathbb{E} _{y } G_\rho(X|Y=y)\). In particular for \(\rho=1\), \(G(X|Y)\triangleq \mathbb{E} _{y } G(X|Y=y)\).
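
For a discrete side channel represented by a joint pmf matrix `p_xy[x, y]`, the conditional quantities can be evaluated as follows (our sketch; it assumes \(\alpha\notin\{1,\infty\}\) and uses natural logarithms).

```python
import numpy as np

def arimoto_conditional_entropy(p_xy, alpha):
    """Arimoto conditional entropy H_alpha(X|Y) = -alpha' * log E_Y ||p_{X|Y}||_alpha (nats)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_y = p_xy.sum(axis=0)                              # marginal of Y
    K = 0.0                                             # K_alpha(X|Y) = E_Y ||p_{X|Y}||_alpha
    for y in range(p_xy.shape[1]):
        if p_y[y] > 0:
            p_cond = p_xy[:, y] / p_y[y]
            K += p_y[y] * np.sum(p_cond ** alpha) ** (1.0 / alpha)
    return (alpha / (1.0 - alpha)) * np.log(K)

def conditional_guessing_moment(p_xy, rho=1.0):
    """G_rho(X|Y) = E_Y G_rho(X|Y=y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_y = p_xy.sum(axis=0)
    G = 0.0
    for y in range(p_xy.shape[1]):
        if p_y[y] > 0:
            q = np.sort(p_xy[:, y] / p_y[y])[::-1]
            G += p_y[y] * np.sum(q * np.arange(1, len(q) + 1) ** rho)
    return float(G)

# Independent X and Y: conditioning does not change the entropy (here log 4 nats).
p_xy = np.outer(np.full(4, 0.25), [0.5, 0.5])
print(arimoto_conditional_entropy(p_xy, alpha=2.0), np.log(4.0))
```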

Optimal Lower Bound on \(G_\rho(X|Y)\) vs. \(H_\alpha(X|Y)\)

Case \(\rho=1\): \(G(X|Y)\) vs. \(H(X|Y)\)

[th:GH] The optimal lower bound on \(G(X|Y)\) vs. \(H(X|Y)\) is given by the parametric curve for \(\gamma\in(0,1)\):

\[\label{eq:GH} \begin{cases} G(X|Y) = \frac{1}{1-\gamma} - \frac{M \gamma^M}{1-\gamma^M} \\ H(X|Y) = \log \bigl( \gamma \frac{1 - \gamma^{M}}{1 -\gamma} \bigr) - (\log \gamma) \bigl( \frac{1}{1-\gamma} - \frac{M \gamma^M}{1-\gamma^M} \bigr) \end{cases}\] where the limiting case \(\gamma\to 1\) gives \(G=\frac{M+1}{2}\) and \(H=\log M\), attained for the uniform distribution.

The optimal upper bound on \(\Delta G(X ; Y) = \frac{M+1}{2}-G(X|Y)\) vs. \(\Delta H(X ; Y) = \log M -H(X|Y)\) is given by the parametric curve for \(\mu\in(0,+\infty)\): \[\label{eq:DeltaGDeltaH} \begin{cases} \Delta G(X ; Y) = \frac{1}{2} \bigl( M\coth( M\mu) - \coth( \mu ) \bigr) \\ \Delta H(X;Y) = \log \frac{ M\sinh \mu}{\sinh( M\mu)} + 2\mu (\log e)\, \Delta G(X;Y). \end{cases}\]
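
For reference, here is a small sketch (ours) that traces the parametric curve [eq:GH] numerically, in nats; as \(\gamma\to1\) the curve approaches the corner \((\log M,\frac{M+1}{2})\).

```python
import numpy as np

def optimal_GH_curve(M, gammas):
    """Parametric optimal lower-bound curve (H(X|Y), G(X|Y)) of Theorem [th:GH],
    evaluated in nats for gamma in (0, 1)."""
    g = np.asarray(gammas, dtype=float)
    G = 1.0 / (1.0 - g) - M * g ** M / (1.0 - g ** M)
    H = np.log(g * (1.0 - g ** M) / (1.0 - g)) - np.log(g) * G
    return H, G

# The curve approaches the corner (log M, (M + 1)/2) as gamma -> 1:
H, G = optimal_GH_curve(M=256, gammas=np.array([0.9, 0.99, 0.999]))
print(H, G)     # H -> log 256 ~ 5.545 nats, G -> 128.5
```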

Here \(\alpha=\rho=1\) is the classical situation studied by Massey (Massey 1994), where the essential ingredient is the Gibbs inequality (Cover and Thomas 1st Ed. 1990, 2nd Ed. 2006): In the unconditional case, for any pmf \(q\), \[H( X ) \leqslant - \mathbb{E} _X \log q(X)\] with equality iff \(q=p_X\). One may always assume that \(p_X(x)\) is nonincreasing in \(x\), in which case \(G(X)= \mathbb{E} (X)\). Therefore, we choose \(q\) such that \(\log q(x) = a + b x\) for real constants \(a,b\), so that \(\mathbb{E} _X \log q(X) = a + b G(X)\). To allow equality in the Gibbs inequality (\(q=p_X\)), it is necessary that \(q(x)\) be nonincreasing in \(x\), i.e., \(b\leqslant 0\). Now \(q(x)\) rewrites as a truncated geometric pmf \(q(x)= c_\gamma \gamma^x\) where \(\gamma = \exp(b) \in (0,1]\) and \(c_\gamma = ( \sum_{i=1}^M \gamma^i)^{-1} > 0\) is a normalization factor. Thus one obtains \[H(X) \leqslant -\log c_\gamma - (\log \gamma) G(X)\] for all \(\gamma \in (0,1)\), with equality iff \(p_X=q\), in which case \(H(X)=H(q)\) and \(G(X)=G(q)\).

In the conditional case, we similarly have \(H(X|Y=y) \leqslant-\log c_\gamma - (\log \gamma) G(X|Y=y)\) for every \(y\). Taking the expectation over \(Y\) yields the same inequality for conditioned entropies: \[H(X|Y) \leqslant-\log c_\gamma - (\log \gamma) G(X|Y),\] that is, \(G(X|Y) \geqslant- (\log \gamma)^{-1} ( H(X|Y) + \log c_\gamma)\). The equality case still corresponds to \(H(X|Y)=H(q)\) and \(G(X|Y)=G(q)\). Since \(q\) approaches the uniform distribution as \(\gamma \to 1\) and the Dirac distribution as \(\gamma \to 0^+\), all possible values of entropies are attainable.

We thus obtain the following parameterization of the optimal lower bound on \(G(X|Y)\) vs. \(H(X|Y)\): \[\begin{cases} G(X|Y) = G(q)= c_\gamma \sum_{i=1}^M i \gamma^i \\ H(X|Y) = H(q)= - \log c_\gamma - G(q) \log \gamma \end{cases}\] where \(\gamma \in (0,1]\). The case \(\gamma=1\) gives \(H=\log M\) and \(G=\frac{M+1}{2}\) attained for the uniform distribution. For \(0<\gamma<1\), a straightforward calculation gives [eq:GH]. Setting \(\mu \triangleq - \frac{1}{2} \ln \gamma \in (0,\infty)\), i.e., \(\gamma=e^{-2\mu}\) gives [eq:DeltaGDeltaH] for \(\Delta G(X ; Y) = \frac{M+1}{2}-G(X|Y)\) and \(\Delta H(X ; Y) = \log M -H(X|Y)\).

\(G_\rho(X|Y)\) vs. \(H(X|Y)\)

[th:GrhoH] The optimal lower bound of \(G_\rho(X|Y)\) vs. \(H(X|Y)\) is given by the parametric curve for \(\gamma \in (0,1]\):

\[\label{eq:GrhoH} \begin{cases} G_\rho(X|Y) = \bigl(\sum_{i=1}^M i^{\rho} \gamma^{i^\rho}\bigr) \bigl(\sum_{i=1}^M \gamma^{i^\rho}\bigr)^{-1} \\ H(X|Y) = \log \bigl( \sum_{i=1}^M \gamma^{i^\rho} \bigr) - (\log \gamma) \frac{\sum_{i=1}^M i^{\rho} \gamma^{i^\rho}}{ \sum_{i=1}^M \gamma^{i^\rho} } \end{cases}\]

The proof is analogous to that of the preceding subsection, now with \(\log q(x) = a + b x^\rho\), \(q(x)=c_\gamma \gamma^{x^\rho}\), \(\gamma = \exp b \in (0,1]\) and \(c_\gamma = ( \sum_{i=1}^M \gamma^{i^\rho} )^{-1} > 0\). Again \(q\) approaches the uniform distribution as \(\gamma \to 1\) and the Dirac distribution as \(\gamma \to 0^+\), hence all possible values of entropies are attainable. One readily obtains \[\begin{cases} G_\rho(X|Y) = G_\rho(q)= c_\gamma \sum_{i=1}^M i^{\rho} \gamma^{i^\rho} \\ H(X|Y) = H(q)= - \log c_\gamma - G_\rho(q) \log \gamma \end{cases}\] which gives [eq:GrhoH].
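
A direct numerical transcription of [eq:GrhoH] (our sketch, in nats) reads:

```python
import numpy as np

def optimal_GrhoH_curve(M, rho, gammas):
    """Parametric lower-bound curve (H(X|Y), G_rho(X|Y)) of Theorem [th:GrhoH],
    obtained from the pmf q(x) proportional to gamma**(x**rho), gamma in (0, 1]."""
    i = np.arange(1, M + 1, dtype=float)
    H, G = [], []
    for g in np.atleast_1d(np.asarray(gammas, dtype=float)):
        w = g ** (i ** rho)                  # unnormalized q(i)
        Z = w.sum()                          # Z = 1 / c_gamma
        Grho = float(np.sum(i ** rho * w) / Z)
        G.append(Grho)
        H.append(float(np.log(Z) - np.log(g) * Grho))
    return np.array(H), np.array(G)
```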

\(G_\rho(X|Y)\) vs. \(H_\alpha(X|Y)\)

The most general optimal lower bound is given by the following Theorem. It generalizes Theorems [th:GH] and [th:GrhoH], which can be recovered in the limiting case \(\alpha\to 1\).

[th:GrhoHalpha] When \(0<\alpha<1\), the optimal lower bound of \(G_\rho(X|Y)\) vs. \(H_\alpha(X|Y)\) is given by the parametric curve for \(\gamma \in (0,\infty)\): \[\label{eq:GrhoHalpha<1} \begin{cases} G_\rho(X|Y) = 1+ \gamma^{-1} \bigl( \frac{ \sum_{i=1}^M ( 1-\gamma + \gamma i^\rho )^{\alpha'} }{ \sum_{i=1}^M ( 1-\gamma + \gamma i^\rho )^{\alpha'-1} } - 1 \bigr) \\ H_\alpha(X|Y) = \alpha' \log \sum_{i=1}^M ( 1 -\gamma + \gamma i^\rho )^{\alpha'-1} \\ \hphantom{H_\alpha(X|Y) } + (1-\alpha') \log \sum_{i=1}^M ( 1 -\gamma + \gamma i^\rho )^{\alpha'}. \end{cases}\] When \(\alpha > 1\), the optimal lower bound of \(G_\rho(X|Y)\) in terms of \(H_\alpha(X|Y)\) is given by the parametric curve for \(\gamma \in (0,1)\): \[\label{eq:GrhoHalpha>1} \begin{cases} G_\rho(X|Y) = \gamma^{-1} \bigl( 1- \frac{ \sum_{i=1}^M (1-\gamma i^\rho)_+^{\alpha'} }{ \sum_{i=1}^M (1-\gamma i^\rho)_+^{\alpha'-1} } \bigr) \\ H_\alpha(X|Y) = \alpha' \log \sum_{i=1}^M (1-\gamma i^\rho)_+^{\alpha'-1} \\ \hphantom{H_\alpha(X|Y) } + (1-\alpha') \log \sum_{i=1}^M (1-\gamma i^\rho)_+^{\alpha'} \end{cases}\] where \(x_+=\max(x,0)\) denotes the positive part of \(x\).
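
Both branches of Theorem [th:GrhoHalpha] can be traced numerically as follows (our sketch, in nats, assuming \(\alpha\neq 1\); the parameter ranges follow the theorem).

```python
import numpy as np

def optimal_curve_alpha(M, rho, alpha, gammas):
    """Parametric optimal lower-bound curves of Theorem [th:GrhoHalpha] (in nats):
    [eq:GrhoHalpha<1] for 0 < alpha < 1 (gamma in (0, inf)),
    [eq:GrhoHalpha>1] for alpha > 1     (gamma in (0, 1))."""
    a_conj = alpha / (alpha - 1.0)                     # Hölder conjugate alpha'
    i = np.arange(1, M + 1, dtype=float)
    H, G = [], []
    for g in np.atleast_1d(np.asarray(gammas, dtype=float)):
        if alpha < 1.0:
            t = 1.0 - g + g * i ** rho                 # 1 + gamma*(i**rho - 1) >= 1
        else:
            t = np.maximum(1.0 - g * i ** rho, 0.0)    # positive part (1 - gamma*i**rho)_+
        S1 = np.sum(t ** (a_conj - 1.0))
        S2 = np.sum(t ** a_conj)
        if alpha < 1.0:
            G.append(1.0 + (S2 / S1 - 1.0) / g)
        else:
            G.append((1.0 - S2 / S1) / g)
        H.append(a_conj * np.log(S1) + (1.0 - a_conj) * np.log(S2))
    return np.array(H), np.array(G)
```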

The proof relies on the following Gibbs inequality for Rényi entropies (Rioul 2020, Prop. 8):

For any pmf \(q\), \[\label{eqn:alpha-Gibbs} H_\alpha(X) \leqslant- \alpha' \log \mathbb{E} _{X} q_\alpha^{{1}/{\alpha'}}(X)\] with equality iff \(p_X=q\). Here \(q_\alpha\) is the escort distribution (Rioul 2020) of \(q\), defined by \(q_\alpha(x) =q^\alpha(x) / \| q \|_\alpha^{\alpha}\).

The proof given in (Rioul 2020) was for random variables having pdfs with respect to the Lebesgue measure, but applies verbatim to discrete random variables having pmfs with respect to the counting measure on \(\{1,2,\ldots,M\}\). The Gibbs inequality [eqn:alpha-Gibbs] can easily be rewritten directly in terms of \(q(x)\) as \[\label{eq:GibbsHalpha} H_\alpha(X) \leqslant(1-\alpha) H_\alpha(q) - \alpha' \log \mathbb{E} _X q^{\alpha-1}(X).\] In terms of [eqKalpha] it also rewrites as \[\label{eq:GibbsKalpha} K_\alpha(X) \lessgtr \mathbb{E} _{X} q_\alpha^{{1}/{\alpha'}}(X) =\frac{ \mathbb{E} _X q^{\alpha-1}(X)}{\|q\|_\alpha^{\alpha-1}}\] where \(\lessgtr\) denotes \(\geqslant\) for \(0<\alpha<1\) and \(\leqslant\) for \(\alpha>1\).

First consider the unconditional case. One may always assume that \(p_X(x)\) is nonincreasing in \(x\), in which case \(G_\rho(X)= \mathbb{E} (X^\rho)\). Therefore, we wish to choose \(q\) such that \(q^{\alpha-1}(x) = a + b x^\rho\) for real constants \(a,b\), i.e., \(q(x)=(a + b x^\rho)^{\alpha'-1}\), so that \(\mathbb{E} _X q^{\alpha-1}(X)=a + b\,G_\rho(X)\).

When \(0<\alpha<1\), \(\alpha'<0\), to allow equality in the Gibbs inequality (\(p_X=q\)), it is necessary that \(a + b x^\rho\geqslant 0\) with nonempty support and that \((a + b x^\rho)^{\alpha'-1}\) is nonincreasing for \(x=1,2,\ldots,M\). This gives the conditions \(b\geqslant 0\) and \(a>-b\). Rewriting \(a + b x^\rho = (a+b) + b(x^\rho-1)\) we obtain \(q(x) =c_\gamma (1 + \gamma (x^\rho -1))^{\alpha'-1}\) where \(\gamma \triangleq \frac{b}{a+b}\in[0,+\infty)\) and \(c_\gamma = ( \sum_{i=1}^M (1 + \gamma (i^\rho-1))^{\alpha'-1} )^{-1} > 0\) is a normalization factor. Note that \(q\) is the uniform distribution when \(\gamma =0\) and approaches the Dirac distribution as \(\gamma \to +\infty\), hence all possible values of entropies are attainable.

When \(\alpha>1\), \(\alpha'>0\), we choose the positive part \(q(x)=(a + b x^\rho)_+^{\alpha'-1}\). Again, to allow equality in the Gibbs inequality (\(p_X=q\)), it is necessary that \(q(x)\) have a nonempty support and be nonincreasing for \(x=1,2,\ldots,M\). This gives the conditions \(b\leqslant 0\) and \(a>-b\). Factoring out \(a>0\) we obtain \(q(x) =c_\gamma (1 - \gamma x^\rho)_+^{\alpha'-1}\) where \(\gamma\triangleq \frac{-b}{a} \in (0,1)\) and \(c_\gamma = ( \sum_{i=1}^M (1 - \gamma i^\rho)_+^{\alpha'-1} )^{-1} > 0\) is a normalization factor. Note that \(q\) approaches the uniform distribution when \(\gamma \to 0\) and the Dirac distribution when \(\gamma \to 1\) (in fact, it is the Dirac distribution for all \(\gamma \in [2^{-\rho},1)\)). Hence all possible values of entropies are again attainable.

In both cases, the Gibbs inequality [eq:GibbsKalpha] takes the form¹ \(K_\alpha(X) \lessgtr \varphi\bigl(G_\rho(X)\bigr)\) for some linear function \(\varphi\), with equality iff \(p_X=q\), in which case \(H_\alpha(X)=H_\alpha(q)\) and \(G_\rho(X)=G_\rho(q)\). In the conditional case, we similarly have \(K_\alpha(X|Y=y) \lessgtr \varphi\bigl(G_\rho(X|Y=y)\bigr)\) for every \(y\), and taking the expectation over \(Y\) yields the same inequality for conditioned entropies: \(K_\alpha(X|Y) \lessgtr \varphi\bigl(G_\rho(X|Y)\bigr)\). The equality case still corresponds to \(H_\alpha(X|Y)=H_\alpha(q)\) and \(G_\rho(X|Y)=G_\rho(q)\).

When \(0<\alpha<1\), \(\mathbb{E} _X q^{\alpha-1}(X) = c_\gamma^{\alpha - 1} (1 + \gamma (G_\rho(X)-1))\) and the equality case of the Gibbs inequality [eq:GibbsHalpha] rewrites \(H_\alpha(q)=-\frac{\alpha'}{\alpha} \log \mathbb{E} _X q^{\alpha-1}(X) = -\frac{\alpha'}{\alpha} \log \bigl(1 -\gamma+ \gamma G_\rho(q) \bigr )-\log c_\gamma\). Since by definition \(H_\alpha(q)=- \alpha' \log \| q \|_\alpha\) we obtain the parametric curve \[\begin{cases} G_\rho(X|Y) = \gamma^{-1} ( \|q \|_\alpha^\alpha c_\gamma^{1-\alpha} - 1) + 1 \\ H_\alpha(X|Y) = - \alpha' \log \| q \|_\alpha. \end{cases}\] Substituting \(\| q \|_\alpha^\alpha = c_\gamma^\alpha \sum_{i=1}^M (1+\gamma (i^\rho -1))^{\alpha'}\) gives [eq:GrhoHalpha<1].

When \(\alpha > 1\), \(\mathbb{E} _X q^{\alpha-1}(X) = c_\gamma^{\alpha - 1} (1 - \gamma G_\rho(X))\), and the equality case of the Gibbs inequality [eq:GibbsHalpha] rewrites \(H_\alpha(q)=-\frac{\alpha'}{\alpha} \log \mathbb{E} _X q^{\alpha-1}(X) = -\frac{\alpha'}{\alpha} \log(1 - \gamma G_\rho(q))-\log c_\gamma\). We similarly obtain the parametric curve \[\begin{cases} G_\rho(X|Y) = \gamma^{-1} (1- \|q \|_\alpha^\alpha c_\gamma^{1-\alpha}) \\ H_\alpha(X|Y) = - \alpha' \log \| q \|_\alpha. \end{cases}\] Substituting \(\| q \|_\alpha^\alpha = c_\gamma^\alpha \sum_{i=1}^M (1-\gamma i^\rho)_+^{\alpha'}\) gives [eq:GrhoHalpha>1].

Sason and Verdú (Sason and Verdú 2018) stated an implicit lower bound on the guessing moment vs. Rényi entropy in the unconditional case, by specifying only the minimizing pmf (equation (59) in (Sason and Verdú 2018)) for a guessing strategy which is not necessarily optimal. Their minimizing pmf is not the same as in the above proof when \(0<\alpha<1\) because it does not satisfy the constraint that \(p_X(x)\) should be nonincreasing in \(x\). Additionally, it can be checked that it cannot approach the Dirac distribution, hence it does not cover the full range of entropy values.

As an important consequence, an explicit first-order upper bound on \(\Delta G_\rho(X;Y)\) can be obtained, which is easy to compute for any adversary observing small leakages.

[thm:explicit] As \(\Delta H_\alpha(X;Y) \to 0\), up to first order, \[\label{eq:GrhoHalpha1storder} \Delta G_\rho(X;Y) \lesssim \sqrt{ \frac{2 (G_{2\rho}(M)-G^2_\rho(M))}{\alpha} } \sqrt{ \frac{\Delta H_\alpha(X;Y)}{\log e} }.\] In particular, \(\Delta G(X;Y) \lesssim \sqrt{\frac{M^2-1}{6 \alpha}} \sqrt{ \frac{\Delta H_\alpha(X;Y)}{\log e} }\).

One has \(\Delta G_\rho(q)\to0\) and \(\Delta H_\alpha(q) \to0\) when \(q\) approaches the uniform distribution, i.e., when \(\gamma\to 0\) in [eq:GrhoHalpha<1] or [eq:GrhoHalpha>1]. Taylor expansion about \(\gamma=0\) in both cases gives \[\begin{cases} \Delta G_\rho(X;Y) = \gamma |1 - \alpha'| (G_{2\rho}(M)-G^2_\rho(M)) + O(\gamma^2) \\ \frac{\Delta H_\alpha(X;Y)}{\log e} = \tfrac{|\alpha'(1-\alpha')|}{2} (G_{2\rho}(M)-G_\rho^2(M)) \gamma^2 + O(\gamma^3) \end{cases}\] which yields [eq:GrhoHalpha1storder]. This is also valid for \(\alpha=1\) by taking the limit \(\alpha\to 1\).
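
The first-order bound [eq:GrhoHalpha1storder] is straightforward to evaluate; the following sketch (ours) computes the coefficient and checks that it reduces to \(\sqrt{(M^2-1)/(6\alpha)}\) for \(\rho=1\).

```python
import numpy as np

def first_order_bound(M, rho, alpha, delta_H_nats):
    """Explicit first-order bound of Corollary [thm:explicit]:
    Delta G_rho <~ sqrt(2*(G_{2 rho}(M) - G_rho(M)^2)/alpha) * sqrt(Delta H_alpha),
    with Delta H_alpha given in nats (so that Delta H / log e = Delta H)."""
    i = np.arange(1, M + 1, dtype=float)
    G_rho_M = np.mean(i ** rho)
    G_2rho_M = np.mean(i ** (2 * rho))
    coeff = np.sqrt(2.0 * (G_2rho_M - G_rho_M ** 2) / alpha)
    return coeff * np.sqrt(delta_H_nats)

# For rho = 1 the coefficient reduces to sqrt((M**2 - 1) / (6 * alpha)):
M, alpha = 256, 1.0
print(first_order_bound(M, 1.0, alpha, 0.01),
      np.sqrt((M ** 2 - 1) / (6.0 * alpha)) * np.sqrt(0.01))
```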

Fig. [fig:illus-main-theorem] illustrates the results of this section. Next we give two examples for which explicit upper and lower bounds can be derived directly.

[Figure [fig:illus-main-theorem]: illustration of the optimal bounds of this section (panels labeled fig:illustration_card=8, i.e. \(M=8\)).]

Let \(\mathcal{B}(p)\) be the Bernoulli distribution with parameter \(p \in [0,1]\) and define \(h_\alpha(p) \triangleq H_\alpha( \mathcal{B}(p))=\frac{1}{1-\alpha}\log((1-p)^\alpha+p^\alpha)\) and \(k_\alpha(p) \triangleq K_\alpha( \mathcal{B}(p))=((1-p)^\alpha+p^\alpha)^{1/\alpha}\). We let \(h_\alpha^{-1}\) and \(k_\alpha^{-1}\) be their inverses when restricted to \([0,\frac{1}{2}]\). Note that \(k_\alpha\) is concave and increasing if \(\alpha \in (0,1)\), and convex and decreasing if \(\alpha > 1\).

For \(M=2\) in the unconditional case (without side information) we simply have \(G_\rho(X)=(1-p)+p2^\rho=1 + (2^\rho -1) p= 1 + (2^\rho -1) k_\alpha^{-1}( K_\alpha(X) )\). Since \(k_\alpha^{-1}\) is convex, by Jensen’s inequality, we obtain the desired lower bound explicitly. The upper bound is similarly obtained by taking the chord of \(k_\alpha^{-1}\): \[\begin{aligned} 1 + (2^\rho -1)& h_\alpha^{-1}( H_\alpha(X|Y) ) \leqslant G_\rho(X|Y) \\ &\leqslant 1 + \tfrac{2^\rho -1}{2} \frac{\exp{ \left ( \frac{1-\alpha}{\alpha} H_\alpha(X|Y) \right ) } -1 }{2^\frac{1-\alpha}{\alpha} - 1}. \end{aligned}\] Since \(h_\alpha^{-1}(\log 2 - \delta) \approx \frac{1}{2} - \sqrt{ \frac{1}{2\alpha} \frac{\delta}{\log e } }\) as \(\delta \to 0\) we recover Corollary [thm:explicit] as \(\delta=\Delta H_\alpha(X;Y) \to 0\), which for \(M=2\) takes the form \[\label{eqn:taylor2} \Delta G_\rho(X;Y) \lesssim (2^\rho -1) \sqrt{\frac{\Delta H_\alpha(X;Y)}{2\alpha\log e} }.\] In the other direction we also obtain at first order that \[\Delta G_\rho(X;Y)\gtrsim \tfrac{2^\rho - 1}{2} \tfrac{1-\alpha}{\alpha} \tfrac{2^{\frac{1-\alpha}{\alpha}}}{2^{\frac{1-\alpha}{\alpha}}-1} \frac{\Delta H_\alpha(X;Y)}{\log e}\] which reads \(\Delta G_\rho(X;Y)\gtrsim \tfrac{2^\rho - 1}{2} \frac{\Delta H(X;Y)}{\log e}\) in the limiting case \(\alpha\to 1\).
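
Here is a numerical sketch (ours) of these \(M=2\) bounds, inverting \(h_\alpha\) by bisection on \([0,\frac12]\) where it is increasing; it assumes \(\alpha\neq1\) and natural-log units.

```python
import numpy as np

def h_alpha(p, alpha):
    """Binary Rényi entropy h_alpha(p) in nats (alpha != 1)."""
    return np.log((1.0 - p) ** alpha + p ** alpha) / (1.0 - alpha)

def h_alpha_inv(h, alpha, iters=80):
    """Inverse of h_alpha restricted to [0, 1/2], where it is increasing (bisection)."""
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h_alpha(mid, alpha) < h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def binary_guessing_bounds(H_cond, alpha, rho):
    """Explicit lower/upper bounds on G_rho(X|Y) for M = 2 displayed above (H_cond in nats)."""
    lower = 1.0 + (2 ** rho - 1) * h_alpha_inv(H_cond, alpha)
    ratio = np.expm1((1.0 - alpha) / alpha * H_cond) / (2 ** ((1.0 - alpha) / alpha) - 1.0)
    upper = 1.0 + 0.5 * (2 ** rho - 1) * ratio
    return lower, upper

print(binary_guessing_bounds(np.log(2.0), alpha=0.5, rho=1.0))   # both bounds -> 1.5 (no leakage)
```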

When \(\alpha = \infty\), explicit bounds can also be derived. Sason and Verdú (Sason and Verdú 2018, Theorem 9) provided the joint range between the probability of error \(\epsilon_{X|Y}\) and the \(\rho\)th-order guessing moments. Since \(H_\infty(X|Y) = - \log (1-\epsilon_{X|Y})\), their results can easily be rewritten in terms of min-entropy. This gives \[\begin{aligned} K \sum_{i = 1}^{ \lfloor K^{-1} \rfloor } i^\rho &+ ( 1 - K \lfloor K^{-1} \rfloor ) ( 1 + \lfloor K^{-1} \rfloor )^\rho \\ \leqslant G_\rho(X|Y) &\leqslant 1 + \frac{ M}{ M- 1} ( G_\rho(M) -1 ) (1 - K ) \end{aligned}\] where \(K = K_\infty(X|Y) = \exp \left ( -H_\infty(X|Y) \right )\). For \(\rho =1\), as shown in (Rioul 2023), this simplifies to \(( \lfloor K^{-1} \rfloor + 1 ) ( 1 - \frac{\lfloor K^{-1} \rfloor}{2} K ) \leqslant G(X|Y) \leqslant 1 + \frac{ M}{2} (1 - K).\)

For small leakages of uniform secrets with maximal leakage \(I_\infty(K;Y) = H_\infty(K) - H_\infty(K|Y) \leqslant\log \frac{ M}{ M-1}\), this rewrites \(\frac{1}{2} ( e^{I_\infty(K;Y)} - 1 ) \leqslant\Delta G(K ; Y) %\\ \leqslant\frac{ M- 1}{2} ( e^{I_\infty(K;Y)} - 1 ).\) As \(I_\infty(K;Y) \rightarrow 0\) this yields the first order approximation, \[\tfrac{1}{2} \tfrac{I_\infty(K;Y)}{\log e} \lesssim \Delta G(K ; Y) \lesssim \tfrac{ M- 1}{2} \tfrac{I_\infty(K;Y)}{\log e}.\] The upper bound is linear in \(I_\infty(K;Y)\) which agrees with the fact that in [eq:GrhoHalpha1storder] the term in \(\sqrt{\Delta H_\alpha(K;Y)}=\sqrt{I_\alpha(K;Y)}\) vanishes as \(\alpha\to+\infty\).
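
The min-entropy bounds displayed above are also easy to evaluate numerically; in the following sketch (ours), \(H_\infty(X|Y)\) is given in nats, and as a sanity check both bounds collapse to \(G_\rho(M)\) when there is no leakage.

```python
import numpy as np

def guessing_bounds_from_min_entropy(M, rho, H_inf_nats):
    """Bounds on G_rho(X|Y) from K = exp(-H_inf(X|Y)), as displayed above
    (Sason-Verdú rewritten in terms of min-entropy)."""
    K = np.exp(-H_inf_nats)
    i = np.arange(1, M + 1, dtype=float)
    G_rho_M = np.mean(i ** rho)
    m = int(np.floor(1.0 / K))                        # floor(K^{-1})
    lower = K * np.sum(np.arange(1, m + 1) ** rho) + (1.0 - K * m) * (1.0 + m) ** rho
    upper = 1.0 + M / (M - 1.0) * (G_rho_M - 1.0) * (1.0 - K)
    return lower, upper

# No leakage for a uniform key: K = 1/M and both bounds equal G_rho(M) = 128.5 here.
print(guessing_bounds_from_min_entropy(M=256, rho=1.0, H_inf_nats=np.log(256.0)))
```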

Evaluation in the Hamming Weight Leakage Model

A standard leakage model in side-channel analysis (Standaert, Malkin, and Yung 2009; Choudary and Popescu 2017; Chérisey et al. 2019a, 2019b; Cheng et al. 2022; Liu et al. 2021; Masure, Rioul, and Standaert 2023; Béguinot, Cheng, et al. 2023) is the Hamming weight leakage model \(Y = w_H(K) + N\), in which the Hamming weight \(w_H(K)\) of the binary representation of the secret byte \(K \sim \mathcal{U}(M)\), \(M=2^n\), leaks under additive Gaussian noise \(N \sim \mathcal{N}(0,\sigma^2)\). Fig. 4 compares our bounds to previous ones and to the numerical evaluation of the actual guessing advantage. While the case \(\alpha=1\) (Thm. [th:GH]) is slightly better than \(\alpha = \frac{1}{2}\) (Thm. [th:GrhoHalpha]), both bounds are very close to the exact value as the noise increases, with huge gaps compared to the earlier bounds of Arikan (Arikan 1996) and Rioul (Rioul 2022) which were used in (Choudary and Popescu 2017; Tănăsescu et al. 2021) for practical evaluation of full-key guessing entropy.

Bound on the guessing advantage in the Hamming weight leakage model for increasing noise variance \(\sigma^2\).
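
As an indication of how such curves can be reproduced, here is a Monte-Carlo sketch (ours; the sample size, noise level, and estimation procedure are illustrative choices, not the paper's evaluation setup) that estimates \(\Delta H(K;Y)\) in the Hamming weight model and evaluates the \(\alpha=\rho=1\) first-order bound of Corollary [thm:explicit].

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_hamming_weight_bound(n_bits=8, sigma=1.0, n_samples=5000):
    """Estimate Delta H(K;Y) = log M - H(K|Y) for Y = w_H(K) + N, N ~ N(0, sigma^2),
    K uniform, then evaluate the alpha = rho = 1 first-order bound (all logs in nats)."""
    M = 2 ** n_bits
    w = np.array([bin(k).count("1") for k in range(M)], dtype=float)
    k = rng.integers(0, M, size=n_samples)
    y = w[k] + sigma * rng.normal(size=n_samples)
    H_cond = 0.0
    for yi in y:
        log_post = -(yi - w) ** 2 / (2.0 * sigma ** 2)   # log p(k|y) up to a constant (uniform prior)
        log_post -= log_post.max()
        post = np.exp(log_post)
        post /= post.sum()
        nz = post > 0
        H_cond += -np.sum(post[nz] * np.log(post[nz]))
    H_cond /= n_samples                                  # Monte-Carlo estimate of H(K|Y) in nats
    delta_H = np.log(M) - H_cond
    delta_G_bound = np.sqrt((M ** 2 - 1) / 6.0) * np.sqrt(delta_H)
    return delta_H, delta_G_bound

print(mc_hamming_weight_bound(sigma=4.0))
```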

Evaluation in the Random Probing Model

The random probing model (Duc, Dziembowski, and Faust 2014; Belaı̈d et al. 2020; Cassiers et al. 2021) is a well-known setting for the security evaluation of cryptographic circuits. The side channel is noiseless but has a random state: \(Y =(Z,f_Z(K))\) where \(f_z\) is a deterministic leakage function depending on the state \(z\). Typically \(Y\) is either erased or leaks \(f(K,T)\) for some publicly available plaintext or ciphertext \(T\).

[thm:improved-random-probing] In the random probing model, \[1 + \tfrac{M-1}{2} \tfrac{ \exp ( \tfrac{1-\alpha}{\alpha} H_\alpha(K|Y) ) - 1}{M^{(1-\alpha)/{\alpha}}-1} \lessgtr G(K|Y) \lessgtr \tfrac{1 + \exp ( H_\alpha(K|Y)) }{2}.\] where \(\lessgtr\) denotes \(\geqslant\) for \(\alpha \geqslant{1}/{2}\) and \(\leqslant\) for \(\alpha \leqslant{1}/{2}\). In the limiting case \(\alpha = 1\), \[\frac{1+\exp H(K|Y)}{2} \leqslant G(K|Y) \leqslant 1 + \frac{M-1}{2} \frac{H(K|Y)}{\log M}.\]

First, consider a fixed function \(f = f_z\) with pre-image cardinality \(M_y= |f^{-1}( \{ y \} )|\). Then \(H_\alpha(K|Y=y) = \log M_y\) and \(G(K|Y=y) = \frac{M_y + 1}{2}=\frac{ \exp ( H_\alpha(K|Y=y) ) + 1}{2}= \frac{ K_\alpha(K|Y=y)^{\frac{\alpha}{1-\alpha}} + 1}{2}.\) If \(\alpha = 1\), since \(x \mapsto \frac{1 + \exp (x) }{2}\) is convex we obtain by Jensen’s inequality that \(G(K|Y) \geqslant\frac{\exp \left ( H(K|Y) \right ) + 1}{2}\). If \(\alpha \neq 1\), \(x \mapsto \frac{1 + x^{ \frac{\alpha}{1-\alpha} } }{2}\) is convex if \(\alpha \geqslant\frac{1}{2}\) and concave otherwise, hence \(G(K|Y) \lessgtr \frac{ K_\alpha(K|Y)^{\frac{\alpha}{1-\alpha}} + 1}{2}\). The other bound is obtained by taking the chord of \(x \mapsto \frac{1 + x^{ \frac{\alpha}{1-\alpha} } }{2}\). The region when \(f_z\) is drawn at random is obtained by taking the convex hull of the \((K_\alpha,G)\) region, which is already convex.
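
The building block of this proof (a fixed leakage function \(f\) described by its preimage sizes \(M_y\)) can be checked numerically with the following sketch (ours, in nats, \(\alpha\neq1\)); for \(\alpha=\frac{1}{2}\) both sides coincide with the exact \(G(K|Y)\), in line with the remark below.

```python
import numpy as np

def random_probing_region(preimage_sizes, alpha):
    """Fixed leakage function f: Y reveals which preimage f^{-1}(y) contains K.
    `preimage_sizes` lists the sizes M_y (a partition of M); alpha != 1, logs in nats."""
    sizes = np.asarray(preimage_sizes, dtype=float)
    M = sizes.sum()
    p_y = sizes / M                                      # P(Y = y) for a uniform key
    # Exact values: H_alpha(K|Y=y) = log M_y and G(K|Y=y) = (M_y + 1)/2
    K_alpha = np.sum(p_y * sizes ** ((1.0 - alpha) / alpha))   # E_Y ||p_{K|Y}||_alpha
    H_alpha = (alpha / (1.0 - alpha)) * np.log(K_alpha)
    G_exact = np.sum(p_y * (sizes + 1.0) / 2.0)
    # The two sides of Theorem [thm:improved-random-probing]
    jensen_side = 0.5 * (1.0 + K_alpha ** (alpha / (1.0 - alpha)))          # (1 + exp H_alpha)/2
    chord_side = 1.0 + 0.5 * (M - 1.0) * (K_alpha - 1.0) / (M ** ((1.0 - alpha) / alpha) - 1.0)
    return H_alpha, G_exact, jensen_side, chord_side

# Example: M = 8 and f reveals whether K equals one particular value (sizes 1 and 7);
# with alpha = 1/2 both sides equal the exact G(K|Y) = 3.625.
print(random_probing_region([1, 7], alpha=0.5))
```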

Fig. [fig:illus] shows that Theorem [thm:improved-random-probing] greatly improves Theorem [th:GrhoHalpha] in the random probing model. Interestingly, the case \(\alpha = \frac{1}{2}\) gives equality \(\Delta G(K ; Y) = \frac{M}{2} ( 1 - \exp ( - I_{\frac{1}{2}}(K;Y) ) ) \approx \frac{M}{2} \frac{I_{\frac{1}{2}}(K;Y)}{\log e}\) where the approximation holds for small leakages.

[Figure [fig:illus]: comparison in the random probing model (panels labeled fig:illustration_card=8 and fig:illustration_card=32, i.e. \(M=8\) and \(M=32\)).]

Conclusion

We derived closed-form optimal regions with explicit bounds on the guessing advantage \(\Delta G_\rho\) of a secret random variable in terms of the \(\alpha\)-information advantage \(\Delta H_\alpha\). An important outcome is that it decreases as the square root \(\Delta G_\rho\leqslant c \sqrt{\Delta H_\alpha}\) for small leakages. Simulations in the classical Hamming weight leakage model show that our bounds are much tighter than previous ones, especially for large noise (small leakage). We further sharpened the bounds in the random probing model where the guessing advantage is actually equal to a function of the Rényi-Arimoto entropy of order \(1/2\).

As possible extensions of this work, it would be valuable to obtain sharpened bounds for any additive noise leakage model \(Y = f(X) + Z\), for secrets with a known (nonuniform) prior pmf \(p_X\), and to generalize to negative values of \(\alpha\) (Esposito, Vandenbroucque, and Gastpar 2022).


References

Arikan, E. 1996. “An Inequality on Guessing and Its Application to Sequential Decoding.” IEEE Transactions on Information Theory 42 (1): 99–105. https://doi.org/10.1109/18.481781.
Arimoto, Suguru. 1977. “Information Measures and Capacity of Order \(\alpha\) for Discrete Memoryless Channels.” Topics in Information Theory (Colloquia Mathematica Societatis Janos Bolyai) 16: 41–52.
Béguinot, Julien, Wei Cheng, Sylvain Guilley, Yi Liu, Loı̈c Masure, Olivier Rioul, and François-Xavier Standaert. 2023. “Removing the Field Size Loss from Duc et al.’s Conjectured Bound for Masked Encodings.” In Constructive Side-Channel Analysis and Secure Design (COSADE 2023), 13979:86–104. LNCS. Springer Nature.
Béguinot, Julien, Yi Liu, Olivier Rioul, Wei Cheng, and Sylvain Guilley. 2023. “Maximal Leakage of Masked Implementations Using Mrs. Gerber’s Lemma for Min-Entropy.” In 2023 IEEE International Symposium on Information Theory (ISIT), 654–59.
Belaı̈d, Sonia, Jean-Sébastien Coron, Emmanuel Prouff, Matthieu Rivain, and Abdul Rahman Taleb. 2020. “Random Probing Security: Verification, Composition, Expansion and New Constructions.” In Advances in Cryptology - CRYPTO 2020, Santa Barbara, CA, USA, August 17-21, 2020, 12170:339–68. LNCS. Springer.
Cassiers, Gaëtan, Sebastian Faust, Maximilian Orlt, and François-Xavier Standaert. 2021. “Towards Tight Random Probing Security.” In Advances in Cryptology - CRYPTO 2021, Virtual Event, August 16-20, 2021, Proceedings, Part III, 12827:185–214. LNCS. Springer.
Cheng, Wei, Yi Liu, Sylvain Guilley, and Olivier Rioul. 2022. “Attacking Masked Cryptographic Implementations: Information-Theoretic Bounds.” In 2022 IEEE International Symposium on Information Theory (ISIT), 654–59. https://doi.org/10.1109/ISIT50566.2022.9834556.
Chérisey, Éloi de, Sylvain Guilley, Olivier Rioul, and Pablo Piantanida. 2019a. “An Information-Theoretic Model for Side-Channel Attacks in Embedded Hardware.” In 2019 IEEE International Symposium on Information Theory (ISIT), 310–15. https://doi.org/10.1109/ISIT.2019.8849763.
———. 2019b. “Best Information Is Most Successful: Mutual Information and Success Rate in Side-Channel Analysis.” IACR Transactions on Cryptographic Hardware and Embedded Systems (CHES 2019), February, 49–79.
Choudary, Marios Omar, and Pantelimon George Popescu. 2017. “Back to Massey: Impressively Fast, Scalable and Tight Security Evaluation Tools.”
Cover, Thomas M., and Joy A. Thomas. 1st Ed. 1990, 2nd Ed. 2006. Elements of Information Theory. John Wiley & Sons.
Duc, Alexandre, Stefan Dziembowski, and Sebastian Faust. 2014. Unifying Leakage Models: From Probing Attacks to Noisy Leakage.” In Advances in Cryptology - EUROCRYPT 2014, Copenhagen, Denmark, May 11-15, 2014, 8441:423–40. LNCS. Springer.
Esposito, Amedeo Roberto, Adrien Vandenbroucque, and Michael Gastpar. 2022. “On Sibson’s \(\alpha\)-Mutual Information.” In IEEE International Symposium on Information Theory, ISIT 2022, Espoo, Finland, June 26 - July 1, 2022, 2904–9.
Fehr, Serge, and Stefan Berens. 2014. “On the Conditional Rényi Entropy.” IEEE Transactions on Information Theory 60 (11): 6801–10.
Hayashi, Masahito, and Vincent Y. F. Tan. 2017. “Equivocations, Exponents, and Second-Order Coding Rates Under Various Rényi Information Measures.” IEEE Transactions on Information Theory 63 (2): 975–1005.
Hirche, Christoph. 2020. “Rényi Bounds on Information Combining.” In IEEE International Symposium on Information Theory, ISIT 2020, Los Angeles, CA, USA, June 21-26, 2020, 2297–2302.
Ishai, Yuval, Amit Sahai, and David Wagner. 2003. Private Circuits: Securing Hardware against Probing Attacks.” In CRYPTO, 2729:463–81. Lecture Notes in Computer Science. Springer.
Issa, Ibrahim, Aaron B Wagner, and Sudeep Kamath. 2019. “An Operational Approach to Information Leakage.” IEEE Transactions on Information Theory 66 (3): 1625–57.
Liao, Jiachun, Lalitha Sankar, Oliver Kosut, and Flavio P. Calmon. 2019. “Robustness of Maximal \(\alpha\)-Leakage to Side Information.” In 2019 IEEE International Symposium on Information Theory (ISIT), 642–46. https://doi.org/10.1109/ISIT.2019.8849769.
Liu, Yi, Wei Cheng, Sylvain Guilley, and Olivier Rioul. 2021. “On Conditional Alpha-Information and Its Application to Side-Channel Analysis.” In 2021 IEEE Information Theory Workshop (ITW), 1–6. https://doi.org/10.1109/ITW48936.2021.9611409.
Massey, James L. 1994. “Guessing and Entropy.” In Proceedings of 1994 IEEE International Symposium on Information Theory, 204.
Masure, Loı̈c, Olivier Rioul, and François-Xavier Standaert. 2023. “A Nearly Tight Proof of Duc et al.’s Conjectured Security Bound for Masked Implementations.” In Smart Card Research and Advanced Applications (CARDIS 2022), 13820:69–81. LNCS. Springer Nature.
McEliece, Robert J, and Zhong Yu. 1995. “An Inequality on Entropy.” In Proceedings of 1995 IEEE International Symposium on Information Theory, 329.
Pliam, John O. 2000. “Guesswork and Variation Distance as Measures of Cipher Security.” In Selected Areas in Cryptography, edited by Howard Heys and Carlisle Adams, 62–77.
Poussier, Romain, François-Xavier Standaert, and Vincent Grosso. 2016. “Simple Key Enumeration (and Rank Estimation) Using Histograms: An Integrated Approach.” In Cryptographic Hardware and Embedded Systems - CHES 2016, 9813:61–81. LNCS. Springer.
Rioul, Olivier. 2020. “Rényi Entropy Power and Normal Transport.” In International Symposium on Information Theory and Its Applications, ISITA 2020, Kapolei, HI, USA, October 24-27, 2020, 1–5.
———. 2022. “Variations on a Theme by Massey.” IEEE Transactions on Information Theory 68 (5): 2813–28.
———. 2023. “The Interplay Between Error, Total Variation, Alpha-Entropy and Guessing: Fano and Pinsker Direct and Reverse Inequalities.” Entropy 25 (7): 978.
Sason, Igal, and Sergio Verdú. 2018. “Improved Bounds on Lossless Source Coding and Guessing Moments via Rényi Measures.” IEEE Transactions on Information Theory 64 (6): 4323–46.
Sibson, Robin. 1969. “Information Radius.” Zeitschrift Wahrscheinlichkeitstheorie Verwandte Gebiete 14: 149–60.
Standaert, François-Xavier, Tal G. Malkin, and Moti Yung. 2009. “A Unified Framework for the Analysis of Side-Channel Key Recovery Attacks.” In Advances in Cryptology - EUROCRYPT 2009, edited by Antoine Joux, 443–61.
Tănăsescu, Andrei, Marios O. Choudary, Olivier Rioul, and Pantelimon George Popescu. 2021. “Tight and Scalable Side-Channel Attack Evaluations Through Asymptotically Optimal Massey-Like Inequalities on Guessing Entropy.” Entropy 23 (11): 1–10.
Verdú, Sergio. 2015. \(\alpha\)-Mutual Information.” La Jolla, CA.

Footnotes

  1. For \(\alpha>1\), the Gibbs inequality takes this form because when \(p_X=q\), these pmfs have the same support, so that equality also holds in the inequality \(\mathbb{E} _X (a + b X^\rho)_+ \geqslant \mathbb{E} _X (a + b X^\rho ) \iff \mathbb{E} _X q^{\alpha-1}(X) \geqslant a+bG_\rho(X)\).↩︎