A Template Attack to Reconstruct the Input of
SHA-3 on an 8-bit Device

Shih-Chun You? and Markus G. Kuhn

Department of Computer Science and Technology,
University of Cambridge, Cambridge CB3 0FD, UK

{scy27,mgk25}@cl.cam.ac.uk

Abstract. We present an enumeration procedure based on a template
attack to recover the complete input text of a SHA-3 implementation
on an 8-bit microprocessor from a single trace of a power-analysis side
channel. This attack targets 600 bytes of triple-redundant internal state
in each invocation of the permutation used by SHA-3. We first build
templates that can generate for each of these bytes a rank table of all
256 candidates. The templates we obtained for our 8-bit target CPU
nearly identified the correct value of most target bytes directly, rather
than just gathering information about their Hamming weights. We then
search the full intermediate state of the Keccak permutation to elimi-
nate remaining uncertainties about the recovered byte values. From the
resulting intermediate states we finally reconstruct both the input and
output of SHA-3 and verify the output. In our experimental evaluation
of this procedure we achieved success rates higher than 99%.

Keywords: Template attack · SHA-3 · Keccak · enumeration trees.

1 Introduction

In 2015, the National Institute of Standards and Technology (NIST) standard-
ized Secure Hash Algorithm 3 (SHA-3) [16], which is based on the Keccak sponge
function and the Keccak-f permutation designed by Bertoni et al. [2,3]. Keccak-
f consists of multiple rounds, each of which consists of five steps known as θ,
ρ, π, χ and ι. The Keccak-f permutation is not only the main building block
of the SHA-3 family of hash functions, but is also used in the SHAKE family
of extendable-output functions, and can be used in many other contexts, such
as key-derivation functions, message-authentication codes, and key-agreement
schemes (e.g., NewHope [1]), where either its inputs or outputs can be confiden-
tial data for which side-channel attacks may be a concern.

Previous papers discussed side-channel attacks to recover keys used in the
generation of Keccak-based message authentication codes (MAC-Keccak). Taha
and Schaumont mainly used Differential Power Analysis (DPA) to attack step θ
to recover a fixed-length key and discussed the relationship between key-length

? supported by the Cambridge Trust and the Ministry of Education, Taiwan

to be presented at COSADE 2020, Lugano, 5–7 October 2020


2 S.-C. You, M.G. Kuhn

and the DPA resilience of MAC-Keccak [23]. They later applied similar attacks
to recovering MAC-Keccak keys with arbitrary length [22]. Luo et al. modified
this attack to determine the intermediate state after a complete round of Keccak-
f [10], applying DPA after the non-linear step χ.

Such DPA-style attacks can effectively recover a MAC-Keccak key K, but
they do not extend to other applications where there is no fixed key K, as they
require leakage traces of many thousand repeated executions of SHA-3(K‖M)
with known variable input message M . For example, a DPA-style attack could
not reconstruct the complete input of MAC-Keccak. Instead, we focus here on
attacking a single invocation of Keccak-f in order to reconstruct both its input
and output. To achieve this, we require a template attack (TA) [4]. We then use
this capability to demonstrate recovery of a complete SHA-3 input given a single
power trace and then verify the results with the given output of SHA-3. Our
technique therefore not only can recover MAC-Keccak keys of arbitrary length
without prior knowledge of the message M . It naturally also extends to other
Keccak-f applications with confidential inputs or outputs, such as random-bit
generation.

Since each step of the Keccak-f permutation is invertible, given its full output
state we can calculate the input state of the step. Likewise, if we can determine a
complete intermediate state, we can calculate from that both the input and out-
put of the entire permutation. Having reconstructed the output of one Keccak-f
invocation and the input of the next, we merely have to XOR these together in
order to reconstruct one block of input of a SHA-3 execution.

We first tried to use the template attack to determine the value of every byte
in a single full intermediate state. However, there is no room for mistakes: the dif-
fusion of Keccak-f means that even a single bit error will result in a completely
different input or output. Therefore, we combined a kind of template attack
with an enumeration technique around the mathematical structure of Keccak-f
to correct errors. We first use a template attack to estimate the likelihood of
each of the 256 possible values of each byte in three consecutive intermediate
states. Since the state of Keccak-f contains 200 bytes (1600 bits), there will be
200 per-byte rank tables associated with each observed intermediate state, that
is 600 rank tables in total. In a pair of (nearly) consecutive intermediate states,
each byte will only depend on a small number of bytes in neighboring states: the
avalanche effect takes multiple rounds to come into effect. This makes it pos-
sible to eliminate errors by combining likelihood information from neighboring
intermediate states and using the result to build rank tables for combinations of
bytes. We repeat this until we obtain the (top of the) rank table for the entire
state.

In this paper, we discuss the details of the template attack we performed
on SHA-3 to obtain rank tables of all bytes of three consecutive intermediate
states (Section 4), and then present the search procedure we used to recover
the complete intermediate states (Section 5). Finally, we evaluate the success
probability of recovering the inputs of SHA-3 by this method (Section 5.4).


A Template Attack to Reconstruct the Input of SHA-3 on an 8-bit Device 3

2 Preliminaries and notation

2.1 Keccak-f [1600] and SHA-3

Our terminology and notation related to SHA-3 and the Keccak-f permuta-
tion closely follow NIST FIPS 202 [16]. The SHA-3 algorithm is based on the
Keccak-f [1600] permutation, which consists of a sequence of five steps that it-
erates 24 times on a 1600-bit state.

Each of the steps θ, ρ, π, χ and ι results in an intermediate state of 1600
bits. In this paper, we refer to these intermediate states as αω, α′ω, βω and β′ω
as follows:

Input
θ−→ α0

ρ,π−−→ α′0
χ−→ β0

ι−→ β′0
θ−→ α1

ρ,π−−→ · · · χ−→ β23
ι−→ Output

The round index ω runs from 0 to 23 in Keccak-f [1600]. We use the term in-
termediate byte to refer to one of the 200 bytes in an intermediate state of
Keccak-f [1600]. The SHA-3 standard describes these states as a 5 × 5 × 64-bit
cube with an x, y and z axis. Since we used an 8-bit processor in our exper-
iments, we refer to the 64 bits along the z axis as 8 bytes. For example, we
describe an intermediate byte in state α0 as α0[i, j, k], where i, j, k are the x, y,
z coordinates with 0 ≤ i ≤ 4, 0 ≤ j ≤ 4, 0 ≤ k ≤ 7. The least significant bit in
this byte we denote by α0[i, j, k][0] and its most significant bit by α0[i, j, k][7].
We call the five bytes with the same y and z coordinates a byte row, and the 25
bytes with the same z coordinate a byte slice.

All five steps in a Keccak-f [1600] round are practical to invert [2] and the
Keccak team provides C++ implementations of the corresponding inverse func-
tions [9]. In other words, the input, output, and all intermediate states of a
Keccak-f [1600] execution can be converted into each other efficiently.

Fig. 1. The diagram of the Keccak sponge function from NIST FIPS 202 [16]. In this
diagram, N is the arbitrary-length input sequence and Z is the d-bit output sequence.


4 S.-C. You, M.G. Kuhn

The Keccak[c](N, d) function is based on the Keccak-f [1600] permutation
[16]. It first “absorbs” an arbitrary-length input bit sequence into its internal
state and then can “squeeze” out an arbitrary-length output bit sequence, and so
is described as a “sponge function”. Figure 1 shows how Keccak[c](N, d) absorbs
the input bit string N and squeezes out a d-bit result. Input message N is first
padded and then split into blocks of r = 1600−c bits, where parameter c is called
the capacity and parameter r the rate. The input and output of Keccak-f [1600]
each consist of r + c = 1600 bits, which we denote accordingly by R‖C. After
all r-bit blocks have been absorbed, in the squeezing stage the output sequence
is generated by concatenating the R fragment being output by each iteration of
Keccak-f [1600] until the concatenated sequence is at least of the required length
d, and it is then truncated to d bits.

The SHA-3 family is finally defined for input messages M using Keccak[c]
for the output sizes d ∈ {224, 256, 384, 512} bits as

SHA3-d(M) = Keccak[2d](M‖01, d).

In addition, SHA-3 defines two extendable-output functions (XOFs) as

SHAKE128(M,d) = Keccak[256](M‖1111, d)

SHAKE256(M,d) = Keccak[512](M‖1111, d)

where users have free choice over the output length d.
We ran all experiments in this paper on SHA3-512(M) because this is the

SHA-3 algorithm with the largest capacity c, i.e. the largest security margin.
The technique works equally well on the other SHA-3 algorithms.

2.2 Template attack

The traditional template attack. Chari et al. introduced a powerful side-
channel exploitation technique called Template Attack (TA) [4]. It consists of two
stages, profiling and attack. During profiling, we build templates that model the
leakage traces of different candidate secrets from traces recorded while a known
secret is processed. Then, we record an attack trace while an unknown secret
is processed. We compare that with all the templates, and predict the secret as
the candidate with the template most similar to the attack trace.

In this approach, attackers need to collect a sizable number of profiling traces.
These will be separated into subsets according to the secret value targeted. If we
target one intermediate byte, the number of subsets will be 256. From the trace
subset corresponding to intermediate byte b, we construct a template consisting
of an expected trace x̄b ∈ Rm and a covariance matrix Sb ∈ Rm×m, as

x̄b =
1

nb

nb∑
t=1

xb,t, Sb =
1

nb − 1

nb∑
t=1

(xb,t − x̄b)(xb,t − x̄b)
T,

where nb is the number of profiling traces in this subset, and xb,t is the tth

profiling trace with corresponding intermediate byte b, each trace containing m
points in time.


A Template Attack to Reconstruct the Input of SHA-3 on an 8-bit Device 5

Later, when we obtain an attack trace xa, we can calculate as a likelihood
function a probability-density value for each template with

f(xa|x̄b,Sb) =
1√

(2π)m|Sb|
exp

(
−1

2
(xa − x̄b)

TS−1b (xa − x̄b)

)
.

Then we can sort the 256 results into a rank table, where the top entry is the
most likely candidate.

The template attack with stochastic models. The previous approach,
where the arithmetic mean of the traces in each subset is used to estimate
their expected value, needs a large total number of profiling traces. Based on
the stochastic model F9 by Schindler et al. [19], Choudary and Kuhn used an
alternative solution [6]. They treat each bit, b[0] to b[7], in the targeted interme-
diate byte as an independent variable and then use multivariate linear regression
to calculate coefficients c0 to c7 and a constant c8 for predicting the expected
values of single points on a trace as x̂b =

∑7
l=0(b[l] · cl) + c8 and equivalently as

x̂b =

7∑
l=0

(b[l] · cl) + c8

for an entire trace, where c0, . . . , c8 ∈ Rm are the vectors of coefficients and
constants previously estimated by multivariate linear regression.

They also modified the way to calculate the covariance matrices Sb as

Sb =
1

nb − 1

nb∑
t=1

(xb,t − x̂b)(xb,t − x̂b)
T, Spooled =

1∑255
b=0 nb

255∑
b=0

(nb − 1)Sb.

Instead of a different Sb in each template, they used one single pooled covariance
matrix estimate, Spooled, which is the weighted average of the Sb, because previ-
ous studies [17,8] had suggested this is a more effective estimate when the actual
covariance matrix can be assumed to be independent of the targeted value b.
The function to calculate the probability density value then becomes

f(xa|x̂b,Spooled) =
1√

(2π)m|Spooled|
exp

(
−1

2
(xa − x̂b)

TSpooled
−1(xa − x̂b)

)

Data compression with linear discriminant analysis. Choudary and Kuhn
also integrated Fisher’s Linear Discriminant Analysis (LDA), as proposed by
Standaert and Archambeau [20], into their approach [6]. This is a procedure to
project the traces onto a subspace with higher signal-to-noise ratio (SNR), as
determined by two covariance matrices B and Σ, where B is the inter-class scat-
ter representing the signal, while Σ is the total intra-class scatter representing
the noise. When recovering 8-bit secrets, these two matrices can be calculated


6 S.-C. You, M.G. Kuhn

from the profiling traces as

B =
1∑255
b=0 nb

255∑
b=0

nb(x̂b − x̄)(x̂b − x̄)T,

Σ =
1∑255
b=0 nb

255∑
b=0

nb∑
t=1

(xb,t − x̂b)(xb,t − x̂b)
T = Spooled,

where x̄ = 256−1
∑255
b=0 x̂b = c8 + 1

2

∑7
l=0 cl is the arithmetic mean of the ex-

pected values x̂b.

We then build a matrix A ∈ Rm×m′
where the columns are the m′ normalized

eigenvectors of the matrix Σ−1B corresponding to its m′ largest eigenvalues (see
also [7, footnote 6]). The LDA projection of a raw trace xa onto the resulting
m′-dimensional subspace is then xproj = ATxa.

In our experiments, we follow Choudary and Kuhn’s approach [6] as outlined
above, firstly using multivariate linear regression to build matrices Σ and B,
secondly calculating the projection matrix A, then using that to project all
profiling traces onto the subspace with high SNR. From these projected traces,
we then build very compact templates, again using multivariate linear regression.
The resulting template information consists of a new covariance matrix Sproj ∈
Rm′×m′

, 256 new expected traces x̂b,proj ∈ Rm′
, along with A.

2.3 Combining multiple likelihood tables

With ideal templates, attackers should find the full state of a secret by simply
taking the most likely candidate from each part of the secret and concatenating
them. However, template attacks are noise sensitive, so the correct candidate
will not always top the rank table. Therefore, Veyrat-Charvillon et al. intro-
duced an optimal key enumeration algorithm to search the correct key across
several ranked likelihood tables of the sub-keys of AES [24]. Given two rank
tables in descending order of likelihood, each with 28 values, there will be 216

possible combinations. Their approach searches the 216 possible combinations in
descending order of their joint likelihood until the correct combination is found,
without calculating the joint likelihoods of all 216 combinations. They general-
ized this method using a recursive tree structure that combines two tables at a
time to combine the results of more than two rank tables. With this algorithm,
it becomes practical to search the correct combination of the sub-keys when cor-
rect candidates do not top the tables. This increases the noise resiliency of the
attack significantly.

We applied their method in our experiments to enumerate the intermediate
states instead of any key. We will refer to their tree-structured algorithm as an
enumeration tree in this paper.


A Template Attack to Reconstruct the Input of SHA-3 on an 8-bit Device 7

?
=

⊕ ⊕ ⊕
2©5©

6©

?
=

?
=
4©7©

0c
?
=

8©

3© 1©

f

C C C C C C C C

f f f

R R R R R R R R

?
=

SHA-3
output

SHA-3 input
with padding

Fig. 2. The procedure to reconstruct SHA-3 inputs by template attack: 1© reconstruct
an intermediate state of the last Keccak-f [1600] permutation and calculate its input
and output; 2© verify the correctness by checking whether the first 512 bits in the
output match the SHA3-512 output; 3© repeat 1© on other permutations but 4© verify
the correctness by checking whether the C of the output matches that of the input
in the following permutation; 5© XOR the R of the two consecutive permutations to
calculate each part of the SHA-3 input; 6© in the special case of the first r bits of the
SHA-3 input, that part is identical to the R part of the input of the first Keccak-f [1600]
permutation and 7© the C part of that permutation should be c 0 bits; 8© concatenate
each part to form the complete SHA-3 input with padding.

3 Attack Strategy

Because of the invertibility of every step in Keccak, attackers can access not only
the output but also the input of a Keccak-f [1600] permutation once they obtain
any intermediate state. Figure 2 depicts how we can use this to recover an input
of SHA3-512.

First, we use template attacks to reconstruct all the bytes in an intermediate
state of the last Keccak-f [1600] permutation. After, for example, state α′0 is
reconstructed, we can calculate the inverses of π, ρ, and θ to find out the input
of this Keccak-f [1600] permutation, and then its output. We can verify the
correctness of the latter by checking whether its first 512 bits match the SHA3-
512 output.

Second, we can repeat what we have done on the last Keccak-f [1600] per-
mutation for its predecessor, and verify the correctness of its output by checking
whether its last c = 1024 bits C match those of the input of its successor. The
input of the first Keccak-f [1600] permutation has C equal to an all-zero string.

Third, we can calculate each part of the SHA-3 input by XOR-ing the R part
of the input and the output of two consecutive Keccak-f [1600] permutations.
In the special case of the first r bits of the SHA-3 input, that is identical to
the R part of the input of the first Keccak-f [1600] permutation. Finally, after


8 S.-C. You, M.G. Kuhn

concatenating all the parts and removing the padding, the input of SHA3-512 is
recovered.

To target SHAKE128 or SHAKE256, we only need to attack permutations
in the absorbing stage, as the squeezing stage fully depends on the output of the
former, and recall that SHAKE uses slightly different padding.

4 Template Attack on SHA-3

Now the problem remains how to successfully recover at least one intermediate
state in each invocation of the Keccak-f [1600] permutation in the SHA3-512
procedure by template attack. We chose the intermediate states α′0, β0, and α1

to build our templates, in order to cover a non-linear step (χ) while limiting the
dependency on bits from other slices. (Any other choice of target round should
work equally well.)

4.1 Target hardware device and measurement setup

Our SHA3-512 implementation is based on the Keccak-f [1600] implementation
in the official C reference code, the Extended Keccak Code Package [25]. We ran
it on a power-analysis test board designed by Choudary [5, Section 2.2.2].

The target processor is the 8-bit microcontroller ATxmega256A3U [12]. We
supply it with an external 2 MHz square wave clock signal generated by a Na-
tional Instruments PXIe-5423 [15] wave generator that is configured to use the
same reference clock as the NI PXIe-5160 [14] oscilloscope that we used to record
the traces of power consumption. This way, with a sampling rate of 250 MHz,
each clock cycle contains exactly 125 data points, with phase jitter about 8 ps
standard deviation. The power supply was an NI PXI-4110 [13].

We recorded 32 000 profiling traces and 1000 evaluation traces of the Keccak-
f [1600] permutation with random inputs to build the templates and evaluate
their quality. For testing, we also recorded two sets of SHA3-512 traces. The
first one contains 1000 random inputs with length shorter than 71 bytes, so
it needs one Keccak-f [1600] permutation to absorb the input. The second set
contains 1000 random inputs whose lengths range from 216 to 287 bytes, so they
need four Keccak-f [1600] permutations to be absorbed. Since our target states
are α′0, β0, and α1, we only recorded the traces covering the power consumption
of the first two rounds of one Keccak-f [1600] permutation, and each raw trace
contained 40 000 clock cycles or 5 000 000 samples.

4.2 Interesting clock cycle detection

Since our raw traces were too long for building templates directly, we first deter-
mined the clock cycles that contain information about the targeted intermediate
states, which in the Keccak-f [1600] permutation each contain 200 intermediate
bytes. We tested each clock cycle to find out whether it is related to any of the
intermediate bytes we target. We used the 8 bits in the intermediate bytes as 8


A Template Attack to Reconstruct the Input of SHA-3 on an 8-bit Device 9

binary variables in a multivariate linear regression to analyze their correlation
with the peak current in each clock cycle.

Fig. 3. Comparison of the highest R2 coefficient and SNR value in each clock cycle.

We decided whether the correlation is sufficiently high via the coefficient of
determination (R2), as estimated by the regression. The clock cycles with R2

higher than a threshold were added to the set of interesting clock cycles. Since
traditionally the interval −0.3 < R < 0.3 indicates a variable of low correlation,
we selected clock cycles based on the threshold R2 > 0.09. The multivariate
linear regression and R2 were calculated using the LinearRegression class in
the Python library scikit-learn [18]. Fig. 3 shows the resulting highest R2

value occuring in each clock cycle, along with SNR value [11]

SNR(s) =

∑255
b=0 nb(x̄b[s]− x̄[s])2∑255

b=0

∑nb
t=0(xb,t[s]− x̄b[s])2

at each per-clock-cycle peak time s. (Our R2 > 0.09 threshold is approximately
equivalent to an SNR > 7 threshold.)

Let A′0,[i,j,k] be the set of interesting clock cycles for intermediate byte

α′0[i, j, k], B0,[i,j,k] that of β0[i, j, k], and A1,[i,j,k] that of α1[i, j, k]. The clock
cycles that leak these 3 × 200 = 600 intermediate bytes should be sufficient for
building working templates, but we found a method to consider more clock cy-
cles at the same time. Between the intermediate states α0 and α′0 are the steps
ρ and π, which are both transposition steps. We give an example here how the
eight bits in α′0[2, 1, 1] match those from up to two bytes in α0:

α′0[2, 1, 1][0] = α0[0, 2, 0][5], α′0[2, 1, 1][1] = α0[0, 2, 0][6],

α′0[2, 1, 1][2] = α0[0, 2, 0][7], α′0[2, 1, 1][3] = α0[0, 2, 1][0],

α′0[2, 1, 1][4] = α0[0, 2, 1][1], α′0[2, 1, 1][5] = α0[0, 2, 1][2],

α′0[2, 1, 1][6] = α0[0, 2, 1][3], α′0[2, 1, 1][7] = α0[0, 2, 1][4].

Therefore we extend the set of interesting clock cycles for α′0[2, 1, 1] fromA′0,[2,1,1]
to A′0,[2,1,1]∪A0,[0,2,0]∪A0,[0,2,1]. This similarly applies to the intermediate state
α1, but the other way round.


10 S.-C. You, M.G. Kuhn

Table 1. The number of interesting clock cycles for each byte in α′
0[i, j, k] (left) and

β0[i, j, k] (right). The numbers for α1 (omitted here) look similar to those for α′
0.

(i, j)
k

0 1 2 3 4 5 6 7

(0, 0) 36 38 33 33 32 33 42 33

(1, 0) 112 114 102 96 100 109 98 106

(2, 0) 107 103 96 96 98 103 94 98

(3, 0) 115 122 103 84 78 89 92 103

(4, 0) 134 124 82 74 74 87 95 100

(0, 1) 110 116 102 94 80 91 93 105

(1, 1) 109 117 95 83 77 88 97 102

(2, 1) 107 87 75 75 72 82 94 108

(3, 1) 109 109 96 93 97 102 92 100

(4, 1) 118 112 97 93 88 106 122 121

(0, 2) 90 75 75 73 69 70 84 97

(1, 2) 113 99 82 73 77 85 98 110

(2, 2) 86 86 94 85 70 69 76 81

(3, 2) 50 38 35 33 32 30 51 37

(4, 2) 103 99 87 71 65 72 80 100

(0, 3) 99 101 98 91 82 88 91 97

(1, 3) 108 112 104 99 95 97 97 103

(2, 3) 110 99 77 73 70 78 89 96

(3, 3) 127 114 79 70 73 87 89 99

(4, 3) 44 44 45 41 46 45 60 45

(0, 4) 127 119 104 98 97 112 127 125

(1, 4) 117 109 98 92 96 110 112 111

(2, 4) 115 110 100 103 94 89 94 98

(3, 4) 87 88 88 87 98 95 86 83

(4, 4) 93 87 89 83 72 80 90 104

(i, j)
k

0 1 2 3 4 5 6 7

(0, 0) 34 39 34 31 30 29 37 33

(1, 0) 25 26 23 23 30 26 32 27

(2, 0) 28 28 25 29 27 24 31 30

(3, 0) 26 32 30 25 27 24 34 28

(4, 0) 29 38 24 25 24 24 31 30

(0, 1) 27 25 25 27 24 24 34 29

(1, 1) 27 29 23 25 23 24 34 29

(2, 1) 27 28 23 25 24 27 36 37

(3, 1) 26 30 25 26 28 29 34 31

(4, 1) 30 29 24 27 28 22 34 35

(0, 2) 27 27 23 24 23 23 35 34

(1, 2) 30 24 22 24 21 21 29 30

(2, 2) 27 28 28 25 21 21 30 28

(3, 2) 32 24 23 24 23 23 30 31

(4, 2) 28 28 21 23 21 23 29 29

(0, 3) 28 26 26 29 26 26 33 28

(1, 3) 25 25 22 26 27 28 32 28

(2, 3) 32 26 23 25 25 25 35 33

(3, 3) 31 36 22 28 24 25 35 30

(4, 3) 30 29 25 27 29 29 45 34

(0, 4) 28 36 23 27 24 26 36 36

(1, 4) 27 32 25 25 27 29 42 30

(2, 4) 28 32 26 31 31 25 35 30

(3, 4) 27 29 25 30 28 22 35 28

(4, 4) 26 33 26 30 56 32 35 40


A Template Attack to Reconstruct the Input of SHA-3 on an 8-bit Device 11

Table 1 lists the number of interesting clock cycles selected for each interme-
diate byte after that extension. In state α′0, the numbers in lanes (0, 0), (3, 2),
and (4, 3) are smaller because step ρ rotates the bits in these lanes by multiples
of eight. For example, we always have α′0[3, 2, 0] = α0[4, 3, 7], which implies that
A′0,[3,2,0] = A0,[4,3,7] = A′0,[3,2,0] ∪ A0,[4,3,7], and that does not extend the set of
clock cycles.

4.3 Building templates

Pre-processing. When targeting a specific byte, we select only the samples
in the interesting clock cycle set of this byte. For example, when building the
template for α′0[2, 1, 1], the profiling traces reassembled this way cover 87 clock
cycles with 87× 125 = 10875 samples.

Since the 125 samples per clock cycle still lead to too long execution times
for building the templates, we reduced the sampling rate further by a factor 5,
averaging five consecutive samples into a new sample.

Templates with LDA compression. After the detection and pre-processing
steps, we now have shorter traces for building templates for each of 600 bytes. We
apply Choudary et al.’s method [6] (see Section 2.2). In the LDA compression, we
chose only the first m′ = 8 eigenvectors to form the projection matrices since the
other eigenvalues are negligible. Besides the projection matrices, our templates
therefore contain 8× 8 covariance matrices and 8-point expected traces.

4.4 Evaluating the quality of templates

Having built the templates, we use the 1000 evaluation traces to estimate tem-
plate quality, resulting in 600 rank tables for each evaluation trace.

As figures of merit, we use both the first-order success rate and the guessing
entropy as defined by Standaert et al [21]. Table 2 shows the resulting success
rates for states α′0 and β0, i.e. the fraction of these 1000 evaluation where the
correct candidate topped the rank table. Table 3 shows the guessing entropy for
each byte of states α′0 and β0, i.e. the average rank of the correct candidates in
these 1000 evaluations (top rank = 1).

5 Searching the correct intermediate states

The results of the template evaluations show that it is improbable that all 200
bytes of an intermediate state can be directly recovered by combining only the
top-ranking candidates. Therefore attackers will need a search scheme to find the
correct combination of high-ranking candidates. One obvious choice is to build
an enumeration tree [24] to successively combine the rank tables for individual
target bytes into tables for larger byte sequences, until the high-ranking com-
binations of all 200 bytes of an intermediate state are determined. While this


12 S.-C. You, M.G. Kuhn

Table 2. Success rates on α′
0[i, j, k] (left) and β0[i, j, k] (right). The rates for α1 (omit-

ted here) look similar to those for α′
0.

(i, j)
k

0 1 2 3 4 5 6 7

(0, 0) 0.924 0.924 0.598 0.749 0.485 0.542 0.946 0.931

(1, 0) 0.995 0.994 0.931 0.957 0.971 0.965 0.999 0.991

(2, 0) 0.993 0.978 0.937 0.936 0.963 0.918 0.981 0.992

(3, 0) 0.999 0.997 0.983 0.787 0.771 0.878 0.967 0.969

(4, 0) 0.999 0.999 0.769 0.736 0.669 0.831 0.979 0.995

(0, 1) 1.000 1.000 0.982 0.956 0.846 0.780 0.999 0.986

(1, 1) 0.995 0.997 0.931 0.905 0.794 0.903 0.984 0.991

(2, 1) 1.000 0.925 0.811 0.819 0.655 0.879 0.987 0.998

(3, 1) 0.997 0.978 0.923 0.946 0.995 0.949 0.988 0.988

(4, 1) 1.000 0.975 0.877 0.921 0.896 0.943 0.998 1.000

(0, 2) 0.998 0.951 0.829 0.803 0.657 0.695 0.999 1.000

(1, 2) 0.998 0.997 0.836 0.726 0.669 0.838 0.995 0.998

(2, 2) 0.972 0.989 0.984 0.853 0.719 0.664 0.969 0.990

(3, 2) 0.998 0.816 0.642 0.536 0.579 0.616 0.973 0.991

(4, 2) 0.997 0.977 0.810 0.679 0.677 0.747 0.984 0.997

(0, 3) 1.000 1.000 0.968 0.945 0.816 0.846 0.994 0.980

(1, 3) 0.990 0.996 0.941 0.979 0.959 0.945 0.988 0.994

(2, 3) 0.999 0.942 0.823 0.728 0.703 0.658 0.986 1.000

(3, 3) 0.999 1.000 0.732 0.715 0.632 0.834 0.964 0.994

(4, 3) 0.911 0.878 0.791 0.759 0.850 0.972 0.997 0.987

(0, 4) 1.000 1.000 0.897 0.889 0.880 0.961 1.000 1.000

(1, 4) 1.000 0.998 0.879 0.895 0.896 0.978 1.000 0.991

(2, 4) 0.992 0.996 0.935 0.984 0.984 0.749 0.970 0.991

(3, 4) 0.982 0.939 0.905 0.977 0.992 0.832 0.972 0.989

(4, 4) 0.991 0.947 0.914 0.959 0.727 0.768 0.999 1.000

(i, j)
k

0 1 2 3 4 5 6 7

(0, 0) 0.803 0.872 0.718 0.587 0.413 0.528 0.801 0.677

(1, 0) 0.530 0.654 0.255 0.226 0.354 0.274 0.522 0.314

(2, 0) 0.487 0.592 0.334 0.262 0.263 0.355 0.475 0.351

(3, 0) 0.529 0.683 0.309 0.220 0.294 0.275 0.498 0.355

(4, 0) 0.526 0.651 0.299 0.207 0.235 0.351 0.490 0.353

(0, 1) 0.373 0.365 0.286 0.305 0.274 0.306 0.536 0.483

(1, 1) 0.293 0.348 0.327 0.280 0.272 0.376 0.608 0.449

(2, 1) 0.259 0.353 0.262 0.240 0.291 0.298 0.596 0.533

(3, 1) 0.290 0.346 0.290 0.267 0.352 0.376 0.544 0.485

(4, 1) 0.358 0.385 0.295 0.390 0.362 0.259 0.619 0.437

(0, 2) 0.277 0.300 0.340 0.322 0.200 0.263 0.569 0.325

(1, 2) 0.289 0.300 0.309 0.354 0.216 0.259 0.553 0.341

(2, 2) 0.224 0.299 0.339 0.358 0.197 0.258 0.541 0.281

(3, 2) 0.275 0.244 0.327 0.269 0.233 0.270 0.508 0.341

(4, 2) 0.284 0.230 0.236 0.293 0.173 0.263 0.530 0.315

(0, 3) 0.301 0.252 0.291 0.289 0.444 0.319 0.638 0.374

(1, 3) 0.312 0.256 0.260 0.257 0.438 0.344 0.700 0.336

(2, 3) 0.383 0.225 0.274 0.268 0.347 0.328 0.661 0.396

(3, 3) 0.379 0.285 0.270 0.265 0.311 0.307 0.695 0.340

(4, 3) 0.337 0.262 0.260 0.247 0.425 0.340 0.696 0.401

(0, 4) 0.351 0.413 0.241 0.225 0.256 0.326 0.612 0.474

(1, 4) 0.338 0.393 0.260 0.216 0.228 0.332 0.593 0.332

(2, 4) 0.299 0.350 0.282 0.299 0.302 0.318 0.616 0.493

(3, 4) 0.303 0.326 0.271 0.290 0.253 0.262 0.649 0.400

(4, 4) 0.319 0.783 0.528 0.516 0.828 0.601 0.587 0.670

Table 3. Guessing entropy on α′
0[i, j, k] (left) and β0[i, j, k] (right). The entropy for

α1 (omitted here) look similar to those for α′
0.

(i, j)
k

0 1 2 3 4 5 6 7

(0, 0) 1.095 1.109 2.336 1.616 3.215 2.592 1.074 1.096

(1, 0) 1.005 1.006 1.085 1.049 1.033 1.048 1.001 1.009

(2, 0) 1.007 1.024 1.074 1.070 1.044 1.102 1.022 1.008

(3, 0) 1.001 1.003 1.018 1.377 1.424 1.185 1.035 1.034

(4, 0) 1.001 1.001 1.452 1.575 1.680 1.297 1.028 1.005

(0, 1) 1.000 1.000 1.021 1.053 1.255 1.440 1.002 1.014

(1, 1) 1.005 1.003 1.084 1.127 1.353 1.147 1.020 1.009

(2, 1) 1.000 1.089 1.325 1.347 1.756 1.208 1.014 1.002

(3, 1) 1.003 1.022 1.092 1.066 1.006 1.056 1.013 1.012

(4, 1) 1.000 1.027 1.187 1.107 1.158 1.076 1.002 1.000

(0, 2) 1.003 1.057 1.294 1.377 1.833 1.819 1.001 1.000

(1, 2) 1.002 1.003 1.275 1.565 1.670 1.269 1.005 1.002

(2, 2) 1.031 1.012 1.020 1.274 1.625 1.947 1.035 1.010

(3, 2) 1.002 1.341 2.042 2.546 2.370 2.100 1.027 1.009

(4, 2) 1.003 1.026 1.395 1.709 1.832 1.508 1.019 1.003

(0, 3) 1.000 1.000 1.035 1.075 1.297 1.294 1.008 1.026

(1, 3) 1.010 1.004 1.068 1.024 1.053 1.072 1.012 1.008

(2, 3) 1.001 1.072 1.355 1.575 1.710 1.812 1.015 1.000

(3, 3) 1.001 1.000 1.594 1.618 1.959 1.324 1.050 1.006

(4, 3) 1.121 1.194 1.443 1.525 1.301 1.054 1.003 1.013

(0, 4) 1.000 1.000 1.140 1.175 1.156 1.054 1.000 1.000

(1, 4) 1.000 1.002 1.216 1.177 1.142 1.024 1.000 1.009

(2, 4) 1.010 1.005 1.083 1.020 1.022 1.491 1.030 1.009

(3, 4) 1.023 1.078 1.131 1.028 1.008 1.318 1.032 1.012

(4, 4) 1.009 1.060 1.122 1.052 1.652 1.492 1.001 1.000

(i, j)
k

0 1 2 3 4 5 6 7

(0, 0) 1.296 1.178 1.622 2.351 3.931 2.629 1.391 1.715

(1, 0) 2.643 1.954 7.313 9.001 5.537 7.692 2.752 5.906

(2, 0) 2.675 2.241 4.973 8.000 6.842 4.567 2.914 4.949

(3, 0) 2.371 1.778 7.058 8.803 6.444 6.724 2.959 5.089

(4, 0) 2.433 1.794 6.284 9.404 6.959 4.883 3.105 5.764

(0, 1) 4.583 5.037 6.780 7.534 5.965 6.288 2.697 3.360

(1, 1) 6.258 5.443 5.074 7.012 7.183 4.046 2.053 3.480

(2, 1) 6.325 5.132 7.682 8.731 6.660 6.622 2.468 2.980

(3, 1) 6.103 5.088 6.765 7.806 5.521 4.701 2.317 3.210

(4, 1) 5.267 4.972 6.526 5.000 4.129 7.227 2.214 3.897

(0, 2) 7.704 6.183 5.059 5.273 9.640 7.801 2.431 6.919

(1, 2) 5.800 7.270 6.671 4.691 9.212 6.722 2.723 5.457

(2, 2) 8.800 7.315 5.902 4.676 9.164 7.875 2.852 7.929

(3, 2) 6.875 8.534 6.677 6.691 8.061 8.670 2.906 6.216

(4, 2) 7.238 8.397 8.326 6.095 9.477 9.050 2.687 7.163

(0, 3) 5.747 7.825 6.600 6.936 3.231 5.893 2.140 4.747

(1, 3) 5.547 8.029 7.555 7.707 3.502 5.444 1.716 5.898

(2, 3) 4.549 8.766 7.473 6.990 4.631 5.860 1.899 3.982

(3, 3) 4.746 6.739 7.764 7.300 5.486 6.208 1.648 5.044

(4, 3) 5.313 8.414 8.048 7.751 3.531 5.413 1.796 4.470

(0, 4) 5.294 3.874 7.979 9.418 8.310 6.139 2.309 3.309

(1, 4) 5.309 3.939 7.766 8.770 7.162 6.030 2.335 5.722

(2, 4) 5.261 4.359 6.343 6.365 6.494 6.079 2.259 3.364

(3, 4) 6.766 4.995 7.510 7.268 7.313 7.794 1.929 4.508

(4, 4) 5.753 1.355 2.426 3.045 1.295 2.164 2.393 2.405


A Template Attack to Reconstruct the Input of SHA-3 on an 8-bit Device 13

approach is practical to search for moderately-sized states (e.g., 16-byte AES
keys), we found that, when it comes to our much larger 200-byte states, it would
still require unrealistically accurate templates for the search time to be tolerable.

To avoid directly combining the rank tables of our 200 target bytes, we built
a three-layer scheme that can gradually combine the probabilistic information
available about these bytes into a full state. In addition, rather than targeting
just 200 bytes, our scheme actually takes 600 rank tables into consideration,
to consider per-byte likelihoods from three intermediate states: α′0, β0, and α1.
At the bottom, Layer 1 first merges the rank tables associated with five bytes
in the same byte row, updates the likelihood of each combination, and then
generates in total 40 new rank tables that cover entire byte rows. Layer 2 then
combines five byte rows in the same byte slice, updates their likelihood values,
and then generates eight new rank tables for byte slices. Finally, Layer 3 just
concatenates the eight top candidates from each byte-slice rank table, and verifies
the correctness of the resulting full intermediate state.

5.1 Layer 1: generating tables for byte rows

Between the intermediate states α′0 and β0 is step χ. It can be calculated within
a byte row, without any influence from other byte rows, which allows us to split
the combination of these intermediate states into 40 mutually independent parts.
Therefore we can combine per-byte rank tables using a practical enumeration
tree that covers only five bytes at a time. We use the first byte row (j = 0, k = 0)
here to demonstrate this.

First, we initialize the number T of combinations we want to collect in the
resulting byte-row rank table to T = 2500. The five bytes of state α′0 in the
first byte row are α′0[0, 0, 0], α′0[1, 0, 0], α′0[2, 0, 0], α′0[3, 0, 0], α′0[4, 0, 0], and we
use the five variables A′0, A

′
1, A

′
2, A

′
3, A

′
4 to represent their values. As likelihood

functions we use the Gaussian multivariate probability-density values provided
by the template attack: L(α′0[0, 0, 0] = A′0) = fα′

0[0,0,0]
(xproj|x̂A′

0,proj
,Sproj), etc.

With the rank tables of these five bytes, we build an enumeration tree to search
the first T combinations in descending order of joint likelihood of a byte row.
Assuming independence, our first estimate of their joint likelihood is

Lrow(α′0[·, 0, 0] = (A′0, A
′
1, A

′
2, A

′
3, A

′
4)) :=

4∏
i=0

L(α′0[i, 0, 0] = A′i).

Now the top-T combinations and their corresponding joint likelihoods form a
truncated rank table for this byte row.

For these T combinations, we calculate the values of state β0 in this byte row
as

(B0, B1, B2, B3, B4) = χ(A′0, A
′
1, A

′
2, A

′
3, A

′
4).

Since we also have ranked likelihood tables for all bytes in state β0, we now can
similarly calculate the likelihood for any combination (B0, B1, B2, B3, B4), and


14 S.-C. You, M.G. Kuhn

update the above top-T joint likelihoods by multiplying with the likelihood of
β0, that is

Lnew
row (α′0[·, 0, 0] = (A′0, A

′
1, A

′
2, A

′
3, A

′
4)) :=

4∏
i=0

L(α′0[i, 0, 0] = A′i)L(β0[i, 0, 0] = Bi).

Then, we sort these T combinations again in descending order of their updated
joint likelihood, and obtain the new rank table of this byte row.

5.2 Layer 2: generating tables for byte slices

We then use a method similar to Layer 1 to combine five byte-row rank tables into
a byte-slice rank table. We use here the first byte slice (k = 0) to demonstrate
this. Let R′j represent a byte row value of state α′0[·, j, 0] in this byte slice, such
that it contains five bytes, where R′j = (A′0,j , A

′
1,j , A

′
2,j , A

′
3,j , A

′
4,j).

We use the rank tables of the five byte rows again to build an enumeration
tree, and search the first T combinations in descending order of joint likelihood
of a byte slice. Number T is as in Layer 1. Our initial joint likelihood estimate
for a byte slice is

Lslice(α
′
0[·, ·, 0] = (R′0, R

′
1, R

′
2, R

′
3, R

′
4)) :=

4∏
j=0

Lnew
row (α′0[·, j, 0] = R′j) =

4∏
j=0

4∏
i=0

L(α′0[i, j, 0] = A′i,j)L(β0[i, j, 0] = Bi,j).

Similar as in Layer 1, we now update these joint likelihoods by taking the rank
tables of α1 into account. We use variable Ai,j to represent the candidates of
intermediate byte α1[i, j, 0], and with Rj = (A0,j , A1,j , A2,j , A3,j , A4,j) have

(R0, R1, R2, R3, R4) = θ∗(ι∗0,k(χ(R′0), χ(R′1), χ(R′2), χ(R′3), χ(R′4)), τ).

where ι∗0,k is ι in round 0 with input and output truncated to byte slice k, and

θ∗(. . . , τ) is θ applied to just one byte slice, where τ ∈ {0, 1}5 represents the
five bits of column-parity information taken by θ from the previous byte slice.
Since step χ operates within a byte row, it will not use any data outside the byte
slice. Likewise, step ι XORs with a round constant, so it too is independent of
other byte slices. However, when executing step θ on only a byte slice, we will
lack information about five bits, because bit rotations are involved in step θ and
hence these five bits come from another byte slice. Without that information τ ,
step θ∗ on only one byte slice will have 32 possible outcomes. It is reasonable to
choose the combination τ that maximizes the joint likelihood of α1[·, ·, 0], which
is

max
τ∈{0,1}5

4∏
j=0

4∏
i=0

L(α1[i, j, 0] = Ai,j).


A Template Attack to Reconstruct the Input of SHA-3 on an 8-bit Device 15

Then, we can update the joint likelihood of this byte slice by multiplying with
the joint likelihood of α1, that is

Lnew
slice(α

′
0[·, ·, 0] = (R′0, R

′
1, R

′
2, R

′
3, R

′
4)) :=

4∏
j=0

4∏
i=0

L(α′0[i, j, 0] = A′i,j)L(β0[i, j, 0] = Bi,j)L(α1[i, j, 0] = Ai,j).

We then again sort these T combinations in descending order of the updated
joint likelihoods to form a new rank table for this byte slice.

5.3 Layer 3: consistency checking

In Layer 3, we could again form an enumeration tree to combine the top-T
entries in the eight byte-slice rank tables from Layer 2 into a single top-T rank
table of the full 200-byte state. In practice, however, we found that this was
never necessary, as in all our experiments if a byte-slice rank table contained
the correct combination, it was already ranked top. Therefore, Layer 3 actually
only needed to concatenate the top-ranked combinations from all eight byte-slice
tables together and then can calculate the corresponding input and output of
the Keccak-f [1600] permutation. Then, we can check consistency of these with
available SHA-3 data, as described in Section 3 and Fig. 2.

If the top combination fails that consistency check, most likely the correct
candidate was already missing in the tables produced by layers 1 or 2. Therefore
we quadruple T and restart the search from Layer 1 in such cases. (In our ex-
periments, we gave up after still not finding a correct solution with T = 640 000,
but this limit can of course be raised given sufficient computing resources.)

Layer 1

Layer 2

3. Combine tables in the same byte row of α′
0

to 40 byte-row tables with the first T com-
binations

7. Update the joint likelihoods using tables
for α1, sort the tables

6. Combine tables in the same byte slice to 8
byte-slice tables with the first T combina-
tions

Layer 3

Failed

10. Quadruple T

Tables for the 8 byte slices

Tables for the 40 byte rows

9. Concatenate the top-ranked combination in
all byte-slice tables, then check consistency

Recovered states

Passed

4. Update the joint likelihoods using tables
for β0, sort the tables

11. Restart from Layer 1
2. Tables for α′

0, β0

1. Initialize T = 2500

5. Tables for α1

8. Either SHA-3
output or C part of
the input of the
next permutation

Fig. 4. The procedure to combine a full state from 600 tables.


16 S.-C. You, M.G. Kuhn

5.4 Results

SHA3-512 with only one Keccak-f [1600] invocation. We first evaluated
our attack using 1000 test traces of SHA3-512 executions, each with a random
input shorter than 72 bytes. This is the simplest case, where a SHA3-512 execu-
tion invokes Keccak-f [1600] only once to digest the input. We only need to apply
the template attack here to obtain the 600 rank tables of intermediate bytes in
that one Keccak-f [1600] invocation, apply our three-layer search to find the
correct combination, and calculate the input and output of the Keccak-f [1600]
invocation. Its correctness can be verified by checking whether the first 512 bits
of the output match the SHA-3 output and whether the last 1024 bits of the
input are all zero. If both checks pass, the input of SHA-3 can be reconstructed
by removing the padding from the first 576 bits of the recovered Keccak-f [1600]
input.

In these 1000 tests, we successfully reconstructed the SHA3-512 input 999
times, while we failed to recover one remaining one even with T = 640 000. The
number of additional traces for which we recovered the correct input was for
each T value

T 2500 10 000 40 000 160 000 640 000 failed
new traces recovered 873 77 33 11 5 1

cumulative %age 87.3 95.0 98.3 99.4 99.9 100
CPU time avr. [s] 8.20 24.98 90.53 431.35 2605.88 N/A
CPU time std. [s] 0.23 0.44 1.70 7.65 23.86 N/A

SHA3-512 with multiple Keccak-f [1600] invocations. Generally, where
the input is longer than 72 bytes, it takes multiple Keccak-f [1600] invocations to
digest. There we need to use the templates to obtain the 600 rank tables of the
three intermediates states in every invocation, and we then start the three-layer
search for each, from the last invocation to the first. We verify the correctness
and calculate the SHA-3 input as described in Section 3.

While the success probability for each Keccak-f [1600] invocation is the same,
the success rate of reconstructing the entire SHA-3 input should drop with
increasing number of invocations, as the failure to recover the state of any
Keccak-f [1600] invocation means that two SHA-3 input blocks cannot be re-
covered. If the success rate of reconstructing the state of one Keccak-f [1600]
permutation is p, then the success rate of reconstructing SHA3-512 inputs of L

bytes length will be pd
L+1
72 e.

We also tried to recover SHA3-512 inputs ranging from 216 bytes to 287
bytes, where Keccak-f [1600] was invoked four times. Of 1000 attempted traces,
we successfully reconstructed the SHA-3 input 999 times, while in the only un-
successful one the search failed for one invocation of the permutation. While we
would normally expect the success rate of attacking SHA-3 with shorter input
to be higher than with longer inputs, in these experiments the success rates were
both too close to 1 to be distinguishable.


A Template Attack to Reconstruct the Input of SHA-3 on an 8-bit Device 17

6 Discussion and conclusion

Search time and success rate may be optimized further by adjusting the rank-
table length T for each byte row or slice separately, depending on the relative
likelihoods involved. So far we used the same T for all 40 byte rows in Layer 1
and all eight byte slices. From the numbers in Table 2, it is evident that the
success rates are much better for some byte locations, and for these, smaller
initial values of T may lead to a faster hit.

Our method could be extended by also building templates of intermediate
states in later rounds, such as a combination of α′1, β1, α2. When attackers
fail to recover the state in the first round, they could then try to search other
rounds and do a similar search as they have done in the first round. Although ι
is different in each round, there may be scope for reusing at least some templates
across rounds. In total there should be 23 combinations of intermediate states
that attackers could target using our search method.

With our method, we demonstrated that it is practical to reconstruct the in-
puts of an unprotected SHA-3 software implementation on an ATxmega256A3U
8-bit microcontroller using a template attack, even where the templates fail to
rank some correct bytes highest. In future work, we hope to extend this attack
procedure to work on 32-bit devices, where the success rates of templates can
be far worse.

References

1. Alkim, E., et al.: NewHope: Algorithm specifications and supporting documenta-
tion (2019), https://newhopecrypto.org/

2. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: The Keccak reference (Jan
2011)

3. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: Keccak. In: Johansson, T.,
Nguyen, P.Q. (eds.) Advances in Cryptology – EUROCRYPT 2013. pp. 313–314.
Springer, Berlin, Heidelberg (2013)

4. Chari, S., Rao, J.R., Rohatgi, P.: Template attacks. In: International Workshop on
Cryptographic Hardware and Embedded Systems. pp. 13–28. Springer (2002)

5. Choudary, M.O.: Efficient multivariate statistical techniques for extracting secrets
from electronic devices. Tech. Rep. UCAM-CL-TR-878, University of Cambridge
(2015), PhD thesis

6. Choudary, M.O., Kuhn, M.G.: Efficient stochastic methods: profiled attacks be-
yond 8 bits. In: International Conference on Smart Card Research and Advanced
Applications. pp. 85–103. Springer (2014)

7. Choudary, M.O., Kuhn, M.G.: Efficient, portable template attacks. IEEE Trans-
actions on Information Forensics and Security 13(2), 490–501 (Feb 2018)

8. Choudary, O., Kuhn, M.G.: Efficient template attacks. In: International Conference
on Smart Card Research and Advanced Applications. pp. 253–270. Springer (2013)

9. KeccakTools, https://github.com/KeccakTeam/KeccakTools
10. Luo, P., Fei, Y., Fang, X., Ding, A.A., Kaeli, D.R., Leeser, M.: Side-channel analysis

of MAC-Keccak hardware implementations. In: Proceedings of the Fourth Work-
shop on Hardware and Architectural Support for Security and Privacy (HASP ’15).
Association for Computing Machinery (2015)

https://newhopecrypto.org/
https://github.com/KeccakTeam/KeccakTools


18 S.-C. You, M.G. Kuhn

11. Mangard, S.: Hardware countermeasures against DPA – a statistical analysis of
their effectiveness. In: Topics in Cryptology – CT-RSA 2004. Springer, Berlin,
Heidelberg (2004)

12. Microchip: ATxmega256A3U, accessed February 2020,
https://www.microchip.com/wwwproducts/en/atxmega256a3u

13. National Instruments: PXI-4110 programmable power supply,
http://www.ni.com/en-gb/support/model.pxi-4110.html

14. National Instruments: PXIe-5160 oscilloscope,
http://www.ni.com/en-gb/support/model.pxie-5160.html

15. National Instruments: PXIe-5423 waveform generator,
http://www.ni.com/en-gb/support/model.pxie-5423.html

16. NIST: SHA-3 standard: permutation-based hash and extendable-output functions
(Aug 2015), http://dx.doi.org/10.6028/NIST.FIPS.202, FIPS PUB 202

17. Oswald, D., Paar, C.: Breaking Mifare DESFire MF3ICD40: power analysis and
templates in the real world. In: International Workshop on Cryptographic Hard-
ware and Embedded Systems. pp. 207–222. Springer (2011)

18. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: Machine
learning in Python. Journal of Machine Learning Research 12, 2825–2830 (Oct
2011)

19. Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side chan-
nel cryptanalysis. In: International Workshop on Cryptographic Hardware and
Embedded Systems. pp. 30–46. Springer (2005)

20. Standaert, F.X., Archambeau, C.: Using subspace-based template attacks to com-
pare and combine power and electromagnetic information leakages. In: Interna-
tional Workshop on Cryptographic Hardware and Embedded Systems. pp. 411–
425. Springer (2008)

21. Standaert, F.X., Malkin, T.G., Yung, M.: A unified framework for the analysis of
side-channel key recovery attacks. In: Annual International Conference on the The-
ory and Applications of Cryptographic Techniques. pp. 443–461. Springer (2009)

22. Taha, M., Schaumont, P.: Differential power analysis of MAC-Keccak at any key-
length. In: Sakiyama, K., Terada, M. (eds.) Advances in Information and Computer
Security. pp. 68–82. Springer, Berlin, Heidelberg (2013)

23. Taha, M., Schaumont, P.: Side-channel analysis of MAC-Keccak. In: 2013 IEEE
International Symposium on Hardware-Oriented Security and Trust (HOST). pp.
125–130. IEEE (2013)

24. Veyrat-Charvillon, N., Gérard, B., Renauld, M., Standaert, F.X.: An optimal key
enumeration algorithm and its application to side-channel attacks. In: International
Conference on Selected Areas in Cryptography. pp. 390–406. Springer (2012)

25. Extended Keccak code package, https://github.com/XKCP/XKCP, accessed April
2019, lib/low/KeccakP-1600/Compact64/KeccakP-1600-compact64.c

https://www.microchip.com/wwwproducts/en/atxmega256a3u
http://www.ni.com/en-gb/support/model.pxi-4110.html
http://www.ni.com/en-gb/support/model.pxie-5160.html
http://www.ni.com/en-gb/support/model.pxie-5423.html
http://dx.doi.org/10.6028/NIST.FIPS.202
https://github.com/XKCP/XKCP

	A Template Attack to Reconstruct the Input of SHA-3 on an 8-bit Device