Supplementary Information

Unraveling the Mechanics of a Repeat-Protein Nanospring: From Folding of Individual Repeats to Fluctuations of the Superhelix

Marie Synakewicz, Rohan S. Eapen, Albert Perez-Riba, Daniela Bauer, Andreas Weißl, Gerhard Fischer, Marko Hyvönen, Matthias Rief, Laura S. Itzhaki, Johannes Stigler∗

∗ To whom correspondence should be addressed: L.S. Itzhaki (lsi10@cam.ac.uk), M. Synakewicz
(m.synakewicz@bioc.uzh.ch) and J. Stigler (stigler@genzentrum.lmu.de)



CONTENTS

I. Supplementary figures
II. Supplementary tables
III. Materials
IV. Protein Sequences
V. Experimental methods
   A. Molecular biology
      1. Mutagenesis
      2. General repeat array construction
      3. Construction of yCTPRrv3y and yCTPRrv5y
      4. Construction of yCTPRrv10y, yCTPRrv20y and yCTPRrv26y
      5. Construction of yCTPRa5y, yCTPRa9y, cCTPRrv5c and cCTPRa5c
   B. Protein preparation
   C. Equilibrium denaturation
   D. Crystallography
   E. Calculation of plane angles
   F. Circular dichroism spectroscopy
   G. Force spectroscopy experiments
      1. Sample preparation
      2. Data acquisition
VI. Data analysis of raw FECs and FDCs
   A. Fitting of raw FECs
   B. Extracting average unfolding and refolding forces
   C. Estimating the work done by the trap/protein from constant velocity data
VII. Mechanical Ising models
   A. Structure information
   B. Interaction models
      1. Homopolymer repeat model
      2. Homopolymer helix model
      3. Heteropolymer helix model
      4. Heteropolymer helix nearest & next-nearest (NNN) model
   C. Calculation of force-distance curves
   D. Calculation of unfolding profile
   E. Minimal folding unit under load
   F. Minimal folding unit in the absence of load
   G. Computation and simplification
      1. Skip approximation
      2. Zipper approximation
      3. Verification
   H. Error estimation and propagation
References


I. SUPPLEMENTARY FIGURES

[Figure S1. Panel A: mean residue ellipticity [θMR] versus wavelength (200 to 280 nm) for CTPRa5, cCTPRa5c, yCTPRa5y, CTPRrv5, cCTPRrv5c and yCTPRrv5y. Panel B: mean residue ellipticity at 222 nm [θMR,222] for the same six constructs.]

FIG. S1. Circular dichroism (CD) data of all 5-repeat constructs used in this study, reported as mean
residue ellipticity (θMR). (A) CD spectra are shown as the mean and estimated error with line and shaded
area, respectively. Although the signal at 222 nm remains largely unchanged, the mutations in the rv-type
arrays appear to decrease the signal at 208 nm relative to that of the CTPRa arrays. This may either
reflect the changes in helix coiling within the tertiary structure, or it is simply due to the loss of aromatics
which are known to contribute to the CD signal at these wavelengths. (B) The changes in mean residue
ellipticity at 222 nm, indicative of α-helicity, are small if not negligible due to the uncertainty in the protein
concentration measurements between different samples (approximately 10%).


FIG. S2. Crystal structure of CTPRrv. (A) Structures of two macromolecules (marine blue, cartoon
representation) present in the asymmetric unit with 2Fo-Fc maps (grey) contoured at 1.5σ. (B) Zoomed
view of chain A, showing clear density for backbone atoms. (C) Structural deviations are minimal between
chains A (marine blue) and B (dark blue), an alignment having a backbone RMSD of 0.446 Å.

[Figure S3. Both panels: fraction unfolded versus [GdnHCl] (M), 0 to 6 M. Panel A legend: CTPRrv5, cCTPRrv5c, yCTPRrv5y, CTPRrv10, CTPRrv10y, CTPRa5, cCTPRa5c, yCTPRa5y. Panel B legend: CTPRrv2, CTPRrv4, CTPRrv5, CTPRrv8, CTPRrv10. Fit annotation: ∆Gunit = 0.2 ± 0.05, ∆Gnn = −6.8 ± 0.1.]

FIG. S3. Equilibrium denaturation data of CTPR arrays using guanidine hydrochloride. (A) Attachment
variants of CTPRrv5, CTPRrv10 and CTPRa5 were tested to examine the effect of the added ybbR-tag
or cysteine residues at the N- and C-termini. While cysteine modifications did not alter the unfolding
profile, the ybbR-tag slightly altered both the transition mid-point and the slope of the transition. We
intentionally did not display any fits, since (i) TPRs with more than three repeats clearly deviate from
two-state behaviour and (ii) the number of variants was not sufficient to build ensemble heteropolymer Ising
models that treated the ybbR-tag as a separate helix with different intrinsic stability and interaction energy
at the N- and C-terminal interfaces of the CTPR array. (B) Ensemble Ising models require a global fitting
procedure to denaturation data of a series of rv-type arrays with increasing number of repeats. Here, the
fits to a homopolymer repeat model with the resulting values for ∆Gunit and ∆Gnn are displayed. A
heteropolymer helix model that treated the A- and B-helices differently was not fitted as it would result in
over-parametrization of the data (6 free parameters versus only 3 used for the homopolymer repeat model).
Experiments were performed in technical triplicates in 96-well plate format, and all data are represented as
averages with corresponding standard errors.

[Figure S4. Panels A to D: force-distance traces with scale bars of 5 pN or 10 pN and 100 nm; traces are labelled with pulling speeds of 10, 100, 500 and 1000 nm/s, plus 2000 and 5000 nm/s. Panel E: hysteresis (kBT) versus pulling speed (nm/s, logarithmic axis from 10^1 to 10^3) for rv-type and a-type arrays.]

FIG. S4. Hysteresis of CTPR unfolding increases slightly with higher loading rates. (A,B) Consecutive
FDCs of one CTPRa9 (green) and one CTPRrv5 molecule (blue) acquired at 1 µm/s highlight the variation
observed within a single molecule in the force response at higher pulling speeds. (C,D) Representative traces
of the same molecules collected at five different pulling speeds. In all cases the unfolding (darker colours)
and refolding traces (lighter colours) are overlaid to highlight the absence or presence of hysteresis. (E) The
area under the FDCs was calculated to obtain first estimates of the unfolding and refolding free energies.
Using the unfolding and refolding energies it is possible to quantify the hysteresis for individual stretch-relax
cycles, here shown as mean with corresponding standard deviations to highlight the increase in variation at
the higher pulling speeds. More importantly, this graph clearly shows that hysteresis is negligible for pulling
speeds ≤100 nm/s.
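
The area-based estimate described in (E) can be sketched as follows. This is a minimal illustration rather than the analysis code used for this work: the function names, the synthetic plateau curves and the value kBT ≈ 4.11 pN nm (room temperature) are assumptions introduced here.

```python
import numpy as np

KBT_PN_NM = 4.11  # assumed thermal energy near 298 K, in pN nm


def work_kbt(distance_nm, force_pn):
    """Trapezoidal area under a force-distance curve, converted to kBT."""
    d = np.asarray(distance_nm, dtype=float)
    f = np.asarray(force_pn, dtype=float)
    area_pn_nm = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(d))
    return area_pn_nm / KBT_PN_NM


def hysteresis_kbt(distance_nm, stretch_force_pn, relax_force_pn):
    """Stretch work minus relax work for one stretch-relax cycle."""
    return work_kbt(distance_nm, stretch_force_pn) - work_kbt(distance_nm, relax_force_pn)


# Synthetic cycle (hypothetical): 10 pN plateau on stretching, 9 pN on relaxing
distance = np.linspace(0.0, 100.0, 201)
stretch = np.full_like(distance, 10.0)
relax = np.full_like(distance, 9.0)
cycle_hysteresis = hysteresis_kbt(distance, stretch, relax)
```

Applied to every cycle of a molecule, the mean and standard deviation of `cycle_hysteresis` would give the kind of summary plotted in panel E.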



FIG. S5. Zipper and skip approximations of the heteropolymer helix model result in comparable values
for ∆Gtot, ∆Gunit and ∆Gnn. (A,C) Scatter plot of resulting intrinsic repeat energy and next-neighbour
interaction energy for each molecule obtained from a heteropolymer helix model with either skip or zipper
approximation. Colours/symbols: filled – rv-type, empty – a-type, circles – ybbR attachments, squares
– cysteine attachments, colours represent array lengths (see (B,D)). (B,D) The respective total energy
∆Gtot = N∆Gunit + (N − 1)∆Gnn for each array length of rv-type (filled symbols) and a-type (empty
symbols). Error bars represent the SEM, but are too small to be seen.

FIG. S6. Model selection. (A) The homopolymer repeat model (dashed line) fails to reproduce the curvature
at the transition between the DNA stretch response and the protein unfolding plateau for CTPRrv10, while
the heteropolymer helix model (continuous line) fits well. (B) The homopolymer helix model has higher fit
residuals (top) than the heteropolymer helix model (middle) when fitting CTPRrv5 data (bottom). Black
line: fit line of heteropolymer helix model. (C) Akaike information criterion (AIC) for the four different
interaction models. Reported is the average over all molecules. (D) Comparison of the AIC calculated for
the Zipper and Skip approximations of all molecules (N = 3, 5, 9, 10, 20) to which the Skip approximation
could be fitted.
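
As a reminder of how such a comparison works, the sketch below computes the AIC in its common least-squares form, AIC = n ln(RSS/n) + 2k, for two hypothetical sets of fit residuals. Which form of the AIC was used for panels C and D is not stated here, so the Gaussian-residual form, the parameter counts and the synthetic residuals are all assumptions made for illustration.

```python
import numpy as np


def aic_least_squares(residuals, n_params):
    """AIC = n ln(RSS/n) + 2k, assuming independent Gaussian residuals."""
    r = np.asarray(residuals, dtype=float)
    rss = float(np.sum(r ** 2))
    return r.size * np.log(rss / r.size) + 2 * n_params


rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.1, size=500)                   # hypothetical noise floor (pN)
systematic = 0.3 * np.sin(np.linspace(0.0, 6.0, 500))    # hypothetical misfit of a simpler model

aic_simple = aic_least_squares(noise + systematic, n_params=2)
aic_complex = aic_least_squares(noise, n_params=4)
preferred = "complex" if aic_complex < aic_simple else "simple"
```

The model with the lower AIC is preferred; each extra parameter costs 2 units, so added parameters only pay off if they reduce the residuals enough.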



FIG. S7. Predicted unfolding of a 26-repeat protein. (A) Experimental force-distance profile (purple) fitted
with a heteropolymer zipper model (continuous black line). Dashed black line: Corresponding prediction
from the Skip approximation. Roman letters point to corresponding panels in B. (B) Individual columns
represent the ten likeliest configurations at the indicated distances. The likelihood of a particular configuration
is shown on top. Color code: Colored stretches are folded, grey/white stretches are unfolded.

[Figure S8. Each panel: force (pN) versus trap distance (nm), with a colour map of the folding probability p_folded (0.0 to 1.0) per helix number, N- and C-termini marked. Panel A: CTPRrv3, CTPRrv5, CTPRrv10, CTPRrv20, CTPRrv26. Panel B: CTPRa5, CTPRa9. Panel C: zoom of CTPRa9 with the annotations "pair" and "single helix".]

FIG. S8. Unfolding profiles for all measured CTPRrv (A) and CTPRa (B) constructs. Colour maps represent
the probability for each helix to be folded as a function of trap distance (please note that indexing proceeds
from the C-terminus to the N-terminus in this case). (C) Using a zoom of the CTPRa9 data to exemplify
how unfolding starts at the N- and C-termini: in all cases, unfolding starts with the C-terminal helix, and
proceeds with the unfolding of (more or less) paired helices from both ends.

[Figure S9. Histograms of frequency versus ∆Lseed (nm). Annotations: mean seed contour length 32.7 ± 0.2 nm; mean seed contour length 35.0 ± 0.3 nm.]

FIG. S9. Contour length histograms of the final “dip”, as measured (roughly) from the end of the plateau
to the unfolded contour. Shown are data extracted from FDCs collected at 10 and 100 nm/s of CTPRrv
(blue) and CTPRa (green). The mean and standard errors for each repeat type are shown. As a reference, the
expected contour length increase corresponding to on average 6 helices unfolding is approximately 34 nm,
while that of 7 helices unfolding is 38 nm (differences between the two repeat types are less than 1 nm).

[Figure S10. Panels A and B: force (pN) versus distance (nm), each with a colour map of the folding probability p_folded (0 to 1.0) per repeat (repeat # 1 to 5), N- and C-termini marked.]

FIG. S10. Simulated FDCs for a consensus ankyrin repeat protein in (A) an optical tweezers set-up and
(B) under conditions similar to AFM in which linker molecules are much shorter and the protein is tethered
between a surface and a much stiffer cantilever. Here we used the structure of the consensus ankyrin NI3C
modelled using the I-Tasser webserver (using all default values [1]), and previously reported values for the
energetic parameters of ∆Gunit = 5.56 kBT and ∆Gnn = −24 kBT [2]. Please note that, given this particular
structure, our results indicate unfolding from the N-terminus to the C-terminus, which is contrary to previous
findings.

[Figure S11. Panel A: ∆Lc (nm) versus the DNA parameters KD (pN), pD (nm) and LD (nm). Panel B: apparent LD (nm) versus number of repeats, with LN indicated. Linear fit: 0.8(±0.1) nm N + 349(±2) nm.]

FIG. S11. Fitting DNA-WLCs to raw FDCs without explicitly using a model for protein folding (Ising or
other). (A) There is no indication for a dependence of the protein contour length on any of the DNA
parameters. (B) The fitted contour lengths of the tethered constructs are compatible with predictions from
the crystal structure. With a rough linear fit, we can estimate an end-to-end distance for CTPRrv20,
LN ≈ 16 nm, based on the increase in the contour length of the full construct (comprising DNA and folded
protein) with increasing number of repeats. This value agrees with the crystallographic value (Fig. 1).
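
The estimate in (B) follows directly from the slope of the linear fit, as the short sketch below shows. The variable names and the first-order propagation of the slope uncertainty are assumptions made here for illustration.

```python
# Linear fit reported in Fig. S11B: apparent L_D = slope * N + intercept
slope_nm_per_repeat = 0.8
slope_err_nm_per_repeat = 0.1
intercept_nm = 349.0

n_repeats = 20  # CTPRrv20

# Contribution of the folded protein to the apparent contour length
l_n_nm = slope_nm_per_repeat * n_repeats
l_n_err_nm = slope_err_nm_per_repeat * n_repeats   # assumed: only the slope error is propagated

# Apparent contour length of the full construct (DNA plus folded protein)
apparent_l_d_nm = intercept_nm + l_n_nm
```

With these numbers `l_n_nm` is 16 nm, the value quoted in the caption.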

II. SUPPLEMENTARY TABLES

TABLE S1. Data collection, phasing and BUSTER refinement statistics for the CTPRrv4 structure. Values
in parentheses are for the outermost shell.

Parameters and statistics PDB ID: 7obi

Data collection

Space group P31 2 1

Unit cell, a, b, c (Å), 58.912 58.912 189.517

α, β, γ (◦) 90.00, 90.00, 120.00

Resolution range, Å 51.02 - 3.00 (3.11 - 3.00)

Total reflections 16284 (1550)

Unique reflections 8153 (775)

Multiplicity 2.0 (2.0)

Completeness, % 99.6 (96.9)

I/σI 20.0 (1.3)

Rmerge 0.017 (0.530)

CC1/2 1.000 (0.858)

Refinement

Rwork/Rfree, % 0.226/0.271

Unique reflections used 8152

R.m.s deviations:

bond lengths, Å 0.009

bond angles, ◦ 0.96

Ramachandran analysis:

Favoured, % 98.11

Allowed, % 2.89

Outliers, % 0.00

Number of atoms

(average B-factor, Å2):

Protein 2187 (131.04)

Ligands 20 (177.48)

Mean/Wilson B-factor, Å2 131.46/114.92

TABLE S2. Repeat plane angles calculated for both CTPRa and CTPRrv arrays. Values are presented as
mean ± s.e.m. of the three repeat interfaces present in the unit cell of the crystal structure, or of the 19
interfaces present in the structure of a 20 repeat model based on symmetry transformation. Cumulative
angles are shown to highlight the differences between the repeat types in small and long arrays. Chain A
and B of the CTPRrv crystallographic units produced values within error, hence only values for chain A are
shown here.

Type     Number   Curvature [°]            Twist [°]                   Bending [°]
                  x̄            Σx          x̄               Σx          x̄            Σx
CTPRa    4        28 ± 1       83 ± 4      13.07 ± 0.03    39 ± 0.12   22.7 ± 0.4   68 ± 1.6
         20                    497 ± 20                    256 ± 0.6                444 ± 8
CTPRrv   4        32 ± 2       95          12 ± 1          37          18 ± 2       55
         20       31.6 ± 0.6   601         11.4 ± 0.7      217         19.1 ± 0.7   364


TABLE S3. Fitted energy parameters in units of kBT. N is the number of repeats (Zipper approximation).
Intrinsic repeat energy ∆Gunit and repeat next-neighbour interaction energy ∆Gnn (see eq. (S15)). ∆Gtot =
N∆Gunit + (N − 1)∆Gnn is the total energy for an N-mer.

                  Heteropolymer helix model                   Heteropolymer helix NNN model
Type  N           ∆Gtot          ∆Gunit       ∆Gnn           ∆Gtot          ∆Gunit       ∆Gnn
rv    3           −18.4 ± 0.9    0.8 ± 1.1    −10.3 ± 1.5    −18.4 ± 1.0    0.0 ± 1.3    −9.2 ± 1.4
      5           −39.7 ± 0.4    1.5 ± 0.3    −11.8 ± 0.3    −39.4 ± 0.5    1.3 ± 0.3    −11.5 ± 0.3
      10          −87.0 ± 2.7    1.2 ± 0.3    −11.0 ± 0.1    −87.1 ± 2.6    1.3 ± 0.4    −11.2 ± 0.3
      20          −173.3 ± 2.3   1.0 ± 0.3    −10.2 ± 0.2    −173.7 ± 2.6   1.2 ± 0.3    −10.4 ± 0.3
      26          −236.7 ± 2.4   0.5 ± 0.2    −10.0 ± 0.3    −238.9 ± 2.2   0.5 ± 0.2    −10.1 ± 0.2
      combined                   1.1 ± 0.2    −11.0 ± 0.2                   1.0 ± 0.2    −10.8 ± 0.2
a     5           −61.3 ± 0.6    −2.4 ± 0.4   −12.4 ± 0.4    −61.6 ± 0.6    −2.8 ± 0.3   −11.9 ± 0.4
      9           −117.9 ± 1.6   −1.3 ± 0.3   −13.3 ± 0.4    −119.0 ± 1.5   −1.6 ± 0.4   −13.1 ± 0.3
      combined                   −1.9 ± 0.3   −12.7 ± 0.3                   −2.3 ± 0.3   −12.4 ± 0.3
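
The relation between the tabulated quantities can be checked with a few lines of Python. The function name is ours; the inputs are the rounded Table S3 entries for the rv-type 5-mer (heteropolymer helix model). For other rows the rounded entries reproduce the tabulated ∆Gtot only approximately, presumably because the table reports averages over individually fitted molecules.

```python
def total_energy(n_repeats, dg_unit, dg_nn):
    """Total energy of an N-mer in kBT: N * dG_unit + (N - 1) * dG_nn."""
    return n_repeats * dg_unit + (n_repeats - 1) * dg_nn


# rv-type, N = 5, heteropolymer helix model (rounded values from Table S3)
dg_tot_rv5 = total_energy(5, 1.5, -11.8)
```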



III. MATERIALS

All reagents were purchased from Sigma Aldrich, New England Biolabs (NEB), ThermoFisher, Merck or Asco Chemicals unless otherwise stated. 2x yeast tryptone (2xYT) and Lysogeny Broth (LB) Miller were purchased from Formedium. Unmodified DNA oligonucleotides were purchased from Integrated DNA Technologies (IDT) or Sigma Aldrich. Synthetic genes were purchased from IDT. FastDigest restriction enzymes (ThermoFisher), Phusion High-Fidelity DNA polymerase (NEB), and QuickStick Ligase (Bioline, discontinued) or the Anza T4 Ligase Master Mix (Invitrogen) were used for all cloning processes. E. coli strains for molecular biology were purchased from Bioline (α-select Competent Cells, Gold/Bronze Efficiency, discontinued) or NEB (NEB 5-alpha Competent E. coli, High efficiency). E. coli cells for expression were generated in house from C41 cells obtained from the Kommander Lab (MRC-LMB, Cambridge). All constructs were expressed in vectors based on a pRSET backbone (Ampicillin resistance).

IV. PROTEIN SEQUENCES

The majority of CTPRs used for this study are based on the consensus sequence containing (a) the terminal RS residues arising from the BglII restriction site that is required for constructing longer repeat arrays [3, 4], and (b) the QK mutation for charge balancing of the final repeat protein [5]. The four-repeat construct used for crystallography was purchased as a synthetic gene, and contained the consensus asparagine residues at the repeat termini as well as a solvating helix. In the following sequences the pre/suffixes c and y identify cysteine and ybbR-tag attachment points for handles.

(CTPRrv)_{N}
MRGSHHHHHHGLVPRGS(AEALNNLGNVYREQGDYQKAIEYYQKALELDPRS)_{N}

y(CTPRrv)_{N}y
MRGSHHHHHHGLVPRGSDSLEFIASKLA(AEALNNLGNVYREQGDYQKAIEYYQKALELDPRS)_{N}DSLEFIASKLA

c(CTPRrv)_{N}c
MRGSHHHHHHNNNNNNNNNNENLYFQGCGS(AEALNNLGNVYREQGDYQKAIEYYQKALELDPRS)_{N}KLC

CTPRrv4 (crystallography)
MRGSHHHHHHGLVPRGS(AEALNNLGNVYREQGDYQKAIEYYQKALELDPNN)_{4}AEALNNLGNVQRKQG

(CTPRa)_{N}
MRGSHHHHHHGLVPRGS(AEAWYNLGNAYYKQGDYQKAIEYYQKALELDPRS)_{N}

y(CTPRa)_{N}y
MRGSHHHHHHNNNNNNNNNNENLYFQGDSLEFIASKLAGS(AEAWYNLGNAYYKQGDYQKAIEYYQKALELDPRS)_{N}KLDSLEFIASKLA

c(CTPRa)_{N}c
MRGSHHHHHHNNNNNNNNNNENLYFQGCGS(AEAWYNLGNAYYKQGDYQKAIEYYQKALELDPRS)_{N}KLC
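
To make the repeat notation concrete, the following sketch expands two of the rv-type templates above into full sequences for a chosen number of repeats. The constant and function names are ours and purely illustrative; the sequence fragments are copied from the list above.

```python
N_TERMINAL_LEADER = "MRGSHHHHHHGLVPRGS"               # H6-tag and linker preceding the repeats
CTPRRV_REPEAT = "AEALNNLGNVYREQGDYQKAIEYYQKALELDPRS"  # one 34-residue rv-type repeat
YBBR_TAG = "DSLEFIASKLA"                               # 11-residue ybbR-tag


def ctprrv(n_repeats):
    """Sequence of the untagged (CTPRrv)N construct."""
    return N_TERMINAL_LEADER + CTPRRV_REPEAT * n_repeats


def y_ctprrv_y(n_repeats):
    """Sequence of y(CTPRrv)Ny with ybbR-tags flanking the repeat array."""
    return N_TERMINAL_LEADER + YBBR_TAG + CTPRRV_REPEAT * n_repeats + YBBR_TAG


ctprrv5 = ctprrv(5)
yctprrv5y = y_ctprrv_y(5)
print(len(ctprrv5), len(yctprrv5y))
```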


V. EXPERIMENTAL METHODS

A. Molecular biology

1. Mutagenesis

For Round-the-Horn site-directed mutagenesis (RTH-SDM, [6, 7]), 100 µM primers containing the required mutation/insertion in the overhang were phosphorylated using polynucleotide kinase (ThermoFisher) according to the manufacturer’s protocol. Phosphorylated primers were stored at −20 °C until required. The mutation was inserted by PCR, and products were DpnI-digested and gel-purified. About 50 to 100 µg of DNA material was added to 1 µL Anza T4 Ligase Master Mix in a total volume of 4 µL, incubated for 10 to 20 min at room temperature and transformed into E. coli. Plasmids were isolated from individual colonies and tested for the presence of the correct mutation/insertion by Sanger sequencing (Eurofins).

[Figure S12 schematic. Panel A: a vector containing TPR_M (digested with BglII + HindIII) and an insert containing TPR_N (PCR product or vector, digested with BamHI + HindIII) are ligated and transformed to give TPR(M+N); elements shown are H6, thrombin site, BamHI, BglII, stop codons (TAA TAA) and HindIII. Panel B: H6, N10, TEV site, ybbR/cys, BamHI (GS), protein of interest, HindIII (KL), ybbR/cys, TGA.]

FIG. S12. Schematics illustrating (A) the BamHI-BglII cloning method required to create longer CTPR
arrays, and (B) the vector backbone construct developed to facilitate N- and C-terminal modification of
proteins for force spectroscopy.

2. General repeat array construction

DNA constructs of CTPR proteins in a pRSET backbone were built sequentially from one, two and four repeat modules using BamHI/BglII cloning as previously described [8]. CTPR repeats are preceded by a BamHI restriction site and followed by a BglII restriction site, double stop codon and HindIII restriction site (Fig. S12). A vector containing M repeats was digested using BglII, HindIII and FastAP Thermosensitive Alkaline Phosphatase (ThermoFisher) according to the manufacturers’ specifications, and purified using the QIAquick gel extraction protocol. Inserts of up to two repeats were produced by PCR amplification using T7-forward and -terminator sequencing primers. The PCR product was purified according to the QIAquick PCR purification protocol, and digested using BamHI and HindIII followed by heat-inactivation of the enzymes according to the manufacturers’ specifications. Inserts containing more than two repeats were obtained by restriction digest using BamHI and HindIII and gel extraction. Since BamHI and BglII produce the same 5’-overhangs, the N-repeat construct was then ligated directly into the vector using QuickStick (according to the manufacturer’s protocol) or Anza T4 ligase (reduced reaction volume as described above), transformed into high efficiency E. coli cells, and plasmid purified according to QIAGEN protocols. The whole procedure was repeated until the desired number of repeats was obtained. Using synthetic genes of single repeats, all constructs without tags for DNA attachment were generated this way, and were subsequently used to produce the tagged variants. The construct used for crystallization was obtained as a synthetic gene (Integrated DNA Technologies) and was sub-cloned using the BamHI and HindIII restriction sites. For short arrays (e.g. up to 8 repeats) DNA sequencing could verify the exact number of repeats. Longer arrays were sequenced from both termini to verify the exact cloning boundaries and digested using BamHI and HindIII to determine the number of repeats.
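
The compatibility of the two enzymes can be illustrated with a few lines of Python. The recognition sequences (BamHI: G^GATCC, BglII: A^GATCT) are standard; the helper names, and the assumption that the junction is read in frame starting at its first base, are ours. In this sketch the ligated junction is recognised by neither enzyme, and in the assumed frame it still encodes the RS residues mentioned in Section IV.

```python
BAMHI_SITE = "GGATCC"  # BamHI cuts G^GATCC
BGLII_SITE = "AGATCT"  # BglII cuts A^GATCT


def top_strand_fragments(site):
    """Split a site after its first base, where both enzymes cut the top strand."""
    return site[:1], site[1:]


def overhang(site):
    """The 4-nt 5'-overhang left on the downstream fragment."""
    return top_strand_fragments(site)[1][:4]


# The BglII-cut vector keeps the upstream 'A'; the BamHI-cut insert starts with 'GATCC'.
vector_end, _ = top_strand_fragments(BGLII_SITE)
_, insert_start = top_strand_fragments(BAMHI_SITE)
junction = vector_end + insert_start

CODONS = {"AGA": "R", "TCT": "S", "TCC": "S"}  # only the codons needed here


def translate(dna):
    """Translate a short in-frame DNA string using the minimal codon table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(overhang(BAMHI_SITE), overhang(BGLII_SITE), junction, translate(junction))
```

Because the junction is no longer a BglII site, only the BglII site at the end of the newly added insert is cut in the next round, which is what allows the procedure to be repeated.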

3. Construction of yCTPRrv3y and yCTPRrv5y

Using RTH-SDM, the 11-amino acid ybbR-tag (DSLEFIASKLA) was inserted sequentially between (a) the BamHI restriction site and a TPR, and (b) the BglII site and the stop codons in a construct containing only one repeat (see Fig. S12A, Tab. S4). After digestion with BglII, two and four repeats obtained from BamHI-BglII digests were added at once. The correct orientation of the inserts was identified by restriction digest and Sanger sequencing.

4. Construction of yCTPRrv10y, yCTPRrv20y and yCTPRrv26y

First, ybbR-tags were introduced by RTH mutagenesis directly adjacent to the repeat sequence either N-terminally or C-terminally of a single repeat, giving rise to yCTPRrv1 and CTPRrv1y, respectively. Second, the required number of repeats were added to yCTPRrv1 two or four repeats at a time, resulting in yCTPRrv9, yCTPRrv19 and yCTPRrv25. Last, the C-terminally tagged repeat was added to produce constructs with 10, 20 and 26 repeats that contained both N- and C-terminal ybbR-tags.

5. Construction of yCTPRa5y, yCTPRa9y, cCTPRrv5c and cCTPRa5c

To facilitate ybbR-tagged construct generation, a pRSET vector was modified using RTH-SDM to contain an N-terminal ybbR-tag between TEV cleavage and BamHI restriction sites, and a C-terminal ybbR-tag between the HindIII restriction site and a stop codon (Fig. S12B, Tab. S4). The restriction sites give rise to additional amino acids between the individual ybbR-tags and the protein: GS at the N-terminus and KL at the C-terminus. CTPRa5 and CTPRa10 were assembled in this vector by BamHI/BglII cloning. However, the last two repeats inserted were obtained by a PCR omitting the stop codons (Tab. S4) such that the C-terminal ybbR-tag was in frame. Recombination of CTPRa10 by E. coli resulted in a 9-repeat instead of a 10-repeat construct. Since the exact repeat number was irrelevant to our study, we proceeded with this construct. Due to recombination it was not possible to obtain any CTPRa constructs with ≥ 10 repeats.

Proteins containing terminal cysteine residues were created in a similar manner using the same vector but with each ybbR-tag exchanged to a single cysteine (Tab. S4). The CTPRa5 was transferred directly from the corresponding ybbR construct, while the CTPRrv5 had to be re-assembled from a 4-repeat construct fused to a repeat obtained by PCR and without stop codon (Tab. S4).

TABLE S4. Sequences of DNA oligonucleotides used for molecular biology.

Name                      DNA sequence (5’ → 3’)
NybbR Fw                  TGCTAGTAAGCTTGCGGCAGAAGCACTGAATAATCTGGG
NybbR Rev                 ATAAATTCAAGAGAATCGGATCCACGCGGAACCAG
CybbR Fw                  TGCTAGTAAGCTTGCGTAATAAAAGCTTGATCCGGC
CybbR Rev                 ATAAATTCAAGAGAATCAGATCTCGGGTCCAGTTCC
pRSETa NybbR Fwd          TGCTAGTAAACTTGCGGGATCCGACCTCGAGATCTGC
pRSETa NybbR Rev          ATAAATTCAAGAGAATCGCCCTGAAAATACAGGTTTTCGTTG
pRSETa CybbR Fwd          TGCTAGTAAACTTGCGTGAGATCCGGCTGCTAACAAAGCCC
pRSETa CybbR Rev          ATAAATTCAAGAGAATCAAGCTTCGAATTCCATGGTACC
CTPRa2 BamHI Fwd          TGCATGCGGATCCGCCGAGGCGTGGTATAATCTAGG
CTPRa2 RS+HindIII Rev     GCATGCATAAGCTTAGATCTTGGGTCGAGTTCTAGGGCC
pRSET Ncys Fwd            TGTGGATCCGACCTCGAGATCTGC
pRSET Ncys Rev            GCCCTGAAAATACAGGTTTTCGTTG
pRSET Ccys Fwd            TGCTGAGATCCGGCTGCTAACAAAGCCC
pRSET Ccys Rev            AAGCTTCGAATTCCATGGTACCAGC
CTPR RV1 BamHI Fwd        TGCATGCGGATCCGCAGAAGCACTGAATAATCTGGGTAATGTTTATCG
CTPR RV1 HindIII Rev      GCATGCATAAGCTTAGATCTCGGGTCCAGTTCCAGCGC

B. Protein preparation

N-terminally H6-tagged CTPR proteins were transformed in C41 E. coli and plated on LB Agar containing 100 µg/mL ampicillin. All colonies were used to inoculate 0.5 L of 2xYT media and grown at 37 °C until an optical density between OD600 = 0.6 and OD600 = 0.8 was reached, and protein expression was induced with 0.5 mM IPTG over 3-5 hours at 37 °C. After lysis the cell suspension was heated to 70 to 80 °C in a water bath to denature the majority of soluble cellular contaminants. The soluble protein was separated from denatured and insoluble protein fractions by centrifugation for 30 min at 35 000×g, filtered through a 0.22 µm PES membrane and applied to a 5 mL HisTrap Excel column connected to an Äkta Pure chromatography system and equilibrated in wash buffer (50 mM Tris-HCl pH 7.5, 500 mM NaCl, 20 mM imidazole, SIGMAFAST Protease Inhibitor Cocktail (Sigma), DnaseI (Sigma), Lysozyme (Sigma)). The column was washed with 20 column volumes of wash buffer before proteins were eluted using a high-imidazole buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 300 mM imidazole). All fractions containing protein were pooled, and if necessary, concentrated using a Vivaspin centrifugal concentrator (Sartorius) with the appropriate molecular weight cutoff. The protein was then further purified by size exclusion chromatography using a HiLoad 26/600 Superdex 75 pg or HiLoad 16/600 Superdex 75 pg (GE Healthcare) equilibrated in either Tris or phosphate buffer (50 mM Tris-HCl pH 7.5 or 50 mM sodium phosphate pH 6.8, 150 mM NaCl). Constructs with 10 repeats or more exhibited significant recombination resulting in proteins that had a decreasing number of repeats. Hence, only the first few fractions of the elution peak were pooled for concentration, while >60 % of the fractions had to be discarded.

The CTPRrv4 construct used for crystallography was purified essentially as above but in 50 mM sodium phosphate pH 6.8, 150 mM NaCl based buffers. After elution from the resin with buffer containing imidazole, the protein was dialysed against 50 mM sodium phosphate pH 6.8, 150 mM NaCl for 18 hours, in the presence of thrombin (MP biomedical) to remove the H6-tag from the construct. The protein was further purified using a HiLoad 26/600 Superdex 75 pg column (GE Healthcare) equilibrated in 10 mM HEPES pH 7.5, 150 mM NaCl, and concentrated to 20 mg/mL.

The intact mass of all constructs was confirmed by mass spectrometry.

C. Equilibrium denaturation

Samples of a total volume of 150 µL were prepared in a 96-well format (Greiner, medium-binding), in 50 mM sodium phosphate pH 6.8, 150 mM NaCl with guanidinium hydrochloride (GdnHCl) gradients of 0 to 4.5 M (CTPRrv2 and yCTPRrv3y) or 0 to 7 M (all other proteins) [9]. The exact denaturant concentration was calculated using the refractive indices of the native and denaturing buffers. A semi-automatic Hamilton Syringe unit was used to dispense the denaturant gradient. The final protein concentration was adjusted for each construct, depending on repeat type (presence/absence of one tryptophan per repeat) and array length, and ranged from <1 µM (large CTPRrv and all CTPRa constructs) to >11 µM (CTPRrv2). Samples were incubated on an orbital shaker at 25 °C for 2 h. Tryptophan residues were excited at 295 ± 10 nm and fluorescence was monitored at 360 ± 10 nm using a CLARIOStar microplate reader (BMG Labtech). Due to the deletion of tryptophan residues from the CTPRrv variant, tyrosine residues were excited at 280 ± 10 nm and their fluorescence measured at 330 ± 10 nm. The data from 9 reads were averaged and normalised. The resulting fluorescence curve, F, was converted to the fraction of folded, θ, or unfolded protein, 1 − θ, using

\[ F = (\alpha_N + \beta_N D)\,\theta + (\alpha_U + \beta_U D)\,(1 - \theta) \tag{S1} \]

or

\[ 1 - \theta = \frac{-F + \alpha_N + \beta_N D}{\alpha_N - \alpha_U + (\beta_N - \beta_U) D}, \tag{S2} \]

where $\alpha_N + \beta_N D$ and $\alpha_U + \beta_U D$ describe the baselines at low (native) and high (unfolded) denaturant concentrations. Parameters for the baselines were extracted either by fitting a two-state unfolding equation to the whole data set or by two separate linear fits to the baselines only.
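
The conversion in eq. (S2) can be sketched in a few lines of NumPy. The function names, the baseline parameters and the synthetic transition are hypothetical, and D is taken here to be the denaturant concentration; the sketch only checks that eq. (S2) inverts eq. (S1).

```python
import numpy as np


def signal(theta, D, alpha_n, beta_n, alpha_u, beta_u):
    """Eq. (S1): fluorescence as a baseline-weighted mix of folded and unfolded protein."""
    return (alpha_n + beta_n * D) * theta + (alpha_u + beta_u * D) * (1.0 - theta)


def fraction_unfolded(F, D, alpha_n, beta_n, alpha_u, beta_u):
    """Eq. (S2): fraction unfolded, 1 - theta, from the measured fluorescence F."""
    F = np.asarray(F, dtype=float)
    D = np.asarray(D, dtype=float)
    native = alpha_n + beta_n * D
    unfolded = alpha_u + beta_u * D
    return (native - F) / (native - unfolded)


# Synthetic round trip with hypothetical baselines and a sigmoidal transition at 3 M
D = np.linspace(0.0, 6.0, 25)
theta_true = 1.0 / (1.0 + np.exp(4.0 * (D - 3.0)))
baselines = dict(alpha_n=1.0, beta_n=-0.02, alpha_u=0.2, beta_u=0.01)
F = signal(theta_true, D, **baselines)
recovered = fraction_unfolded(F, D, **baselines)
```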

To extract the intrinsic and interfacial energies (∆Gunit and ∆Gnn) a homopolymer repeat Ising model was globally fitted to denaturation data of un-tagged constructs with N = 2, 4, 5, 8 and 10 repeats using the PyFolding suite [10], the code of which is based on the formalism developed by Barrick and co-workers [11]. We did not fit a heteropolymer helix model as this would lead to overparametrization (6 free parameters vs. 5 data sets).

D. Crystallography

CTPRrv4 at 20 mg/mL was crystallised in JCSG-plus screen, well B10 (0.2 M MgCl2, 0.1 M sodium cacodylate, pH 6.5 and 50% v/v PEG 200, Molecular Dimensions) in sitting drop plates (SwissSci, Molecular Dimensions) with 600 nL droplets in 1:1 and 1:2 ratios of protein to well solution. Crystals were looped and flash frozen without further cryoprotectants. Crystals diffracted to 3.0 Å resolution on beamline I04 at Diamond Light Source (Oxford, UK). The data were processed using autoPROC [12] with the determination of diffraction limits set by a local I/σI ≥ 1.50. The phase was solved by molecular replacement using a CTPRa4 structure (PDB accession code: 2hyz) with two molecules in the asymmetric unit. Refinements were performed using BUSTER version 2.10.3 [13, 14] and iterative model building in Coot [15]. We conservatively modelled phosphate molecules in the concave face of the TPR superhelix, since this buffer was present during all purification steps prior to size exclusion chromatography. Further details on collection and refinement statistics can be found in Table S1. Models of proteins containing more than 4 repeats were created by symmetry transformation in PyMOL, and missing residues and peptide bonds, e.g. between individual 4-mers, were added using MODELLER [16].

E. Calculation of plane angles

Changes in geometry between different repeat protein structures can be measured on two levels: (a) by comparing the whole repeat array (e.g. the superhelical arrangement in the case of TPRs), or (b) by comparing the angular differences between repeat planes. Dimensions of the TPR superhelix were estimated using the "Structure Measurements" tool of UCSF Chimera [17] and 20-repeat models of both repeat types. Calculations for obtaining angles between repeat planes were adapted from Forwood et al. [18]. In brief, a principal component analysis (PCA) is performed on the Cα-atom coordinates of each repeat, omitting the inter-repeat loops, to calculate the principal components (PCs, Fig. S13A) that are orientated along the length (PC1, purple), width (PC2, blue) and depth (PC3, green) of the repeat. As previously reported, curvature is defined as the angle between the respective PC2s of repeats $i$ and $i+1$ projected onto the plane of repeat $i+1$, twist is the angle between PC1s projected onto the plane formed by $\mathrm{PC1}_{i+1}$ and $\mathrm{PC3}_{i+1}$, and lateral bending is the angle of PC3s projected onto the plane formed by $\mathrm{PC1}_{i+1}$ and $\mathrm{PC3}_{i+1}$ (Fig. S13B). Next, some conventions were introduced to ensure the correct direction (positive or negative) of the angle: (i) PC1 always has the same orientation as the superhelical axis, which is defined by the right-hand rule from the N- to C-terminal direction of the polypeptide chain [19], (ii) PC3 points in the same direction as a vector from the centroid of repeat $i$ to the centroid of repeat $i+1$, and (iii) PC2 has the same direction as the cross product of PC3 with PC1. All calculations were performed using custom-written Python scripts with NumPy and Matplotlib extensions [20–23].
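The two building blocks of this analysis, a PCA of the Cα coordinates and an angle between vectors projected onto a plane, can be sketched with NumPy as follows. This is a minimal illustration and not the authors' script: the function names are hypothetical, the coordinates are synthetic, and the sign conventions (i) to (iii) are not applied here.

```python
import numpy as np

def principal_axes(ca_coords):
    """Return PC1, PC2, PC3 as rows (unit vectors) for an (n_atoms, 3) array."""
    centred = ca_coords - ca_coords.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt

def projected_angle(u, v, normal):
    """Signed angle in degrees from u to v after projecting both onto the
    plane perpendicular to `normal`."""
    n = normal / np.linalg.norm(normal)
    u_p = u - np.dot(u, n) * n
    v_p = v - np.dot(v, n) * n
    sin_term = np.dot(np.cross(u_p, v_p), n)
    cos_term = np.dot(u_p, v_p)
    return float(np.degrees(np.arctan2(sin_term, cos_term)))

# Synthetic "repeat" (placeholder points, not real coordinates): longest along
# x, intermediate along y, thinnest along z.
toy_repeat = np.array([[3, 0, 0], [-3, 0, 0],
                       [0, 2, 0], [0, -2, 0],
                       [0, 0, 1], [0, 0, -1]], dtype=float)
pc1, pc2, pc3 = principal_axes(toy_repeat)

# A vector rotated by 30 degrees about z, with an out-of-plane component that
# the projection removes.
rotated = np.array([np.cos(np.radians(30)), np.sin(np.radians(30)), 0.5])
angle = projected_angle(np.array([1.0, 0.0, 0.0]), rotated,
                        np.array([0.0, 0.0, 1.0]))
```

Because the sign of each principal component returned by an SVD is arbitrary, orientation conventions such as those listed above are needed before signed angles between consecutive repeats can be compared.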

FIG. S13. Visualisation of principal components fitted to repeat planes. (A) Sketch of alignment of PC1-3
with TPR repeats. (B) Schematic representation of how PC1-3 are used to calculate angles for curvature,
twist and bending.

F. Circular dichroism spectroscopy

Proteins used for circular dichroism (CD) spectroscopy were buffer exchanged into 10 mM sodium phosphate pH 6.8, 50 mM NaCl, 1 mM DTE using PD10 minitrap columns (Cytiva), and diluted to approximately 2 µM. CD measurements were performed on a Chirascan CD spectrometer (Applied Photophysics) using 1 mm path-length cuvettes (Precision Cells, 110-QS, Hellma Analytics). CD spectra were recorded between 200 and 280 nm at a bandwidth of 1 nm with a rate of 0.5 s/nm. The data of five scans were averaged and converted to mean residue ellipticity to account for differences in the measured concentrations and in construct length (see Section IV). Uncertainties were estimated based on the standard error of the mean of the CD readings and a 10% error to approximate uncertainties in concentration.
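The conversion to mean residue ellipticity and the associated uncertainty can be sketched as below. The normalisation by the number of peptide bonds and the combination of the two error sources in quadrature are assumptions of this sketch; the text above does not state which convention was used. All numerical inputs are placeholders.

```python
import math

def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Mean residue ellipticity in deg cm^2 dmol^-1 from a signal in mdeg.

    Assumes normalisation by the number of peptide bonds (n_residues - 1).
    """
    n_peptide_bonds = n_residues - 1
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_peptide_bonds)

def mre_uncertainty(mre, theta_mdeg, theta_sem_mdeg, rel_conc_error=0.10):
    """Combine the relative SEM of the reading with a relative concentration
    error in quadrature (an assumption of this sketch)."""
    relative_error = math.hypot(theta_sem_mdeg / theta_mdeg, rel_conc_error)
    return abs(mre) * relative_error

# Placeholder example: -4 mdeg at 2 uM in a 1 mm (0.1 cm) cuvette, 101 residues.
example_mre = mean_residue_ellipticity(-4.0, 2e-6, 0.1, 101)
example_err = mre_uncertainty(example_mre, -4.0, 0.05)
```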

G. Force spectroscopy experiments

1. Sample preparation

Protein-DNA chimeras based on Sfp-mediated conjugation were produced essentially as described previously [24, 25]. Reaction volumes of 50 to 100 µL containing 50 mM HEPES pH 7.5, 10 mM MgCl2, 10 µM ybbR-tagged protein, 20 µM CoA-oligo (Biomers) and 10 µM Sfp-synthase (made in-house; the plasmid was a kind gift from the Gaub Lab at the LMU, Munich) were incubated overnight at room temperature. If necessary, yields of the desired product were increased by performing the reaction with 40 µM CoA-oligo and 20 µM Sfp-synthase.

Protein-DNA chimeras based on cysteine-maleimide reactions were produced as described previously [26]. In brief, proteins were reduced with a 10-fold excess of TCEP (Sigma Aldrich) for at least 30 min, desalted into phosphate-buffered saline (PBS) using a HiTrap Desalting 5 mL column (GE Healthcare), and reacted with a 10-fold excess of DBCO-maleimide (Sigma Aldrich) for at least 2 h. After renewed desalting, 10 µM protein was reacted with 20 µM azide oligo (Integrated DNA Technologies) in 100 µL volumes overnight at 37 °C in an orbital shaker.

Samples were purified using a Superdex 200 10/300 GL (GE Healthcare) or YMC Pack Diol-300 (Yamamura Chemical Research) column equilibrated in 50 mM Tris-HCl pH 7.5, 150 mM NaCl. Fractions containing protein conjugated to two oligos were identified by SDS-PAGE, and 4 to 10 µL of those fractions were incubated with 100 to 200 ng biotin- or digoxigenin-functionalised DNA handles at room temperature for at least 30 min. Less than 1 µL of that mixture was added to anti-digoxigenin beads in 10 µL measuring buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl) and incubated for less than 5 min. Then, 0.5 to 0.7 µL of this mixture was added to 50 µL containing streptavidin beads and an oxygen scavenger system consisting of 0.65% (w/v) glucose (Sigma), 13 U/mL glucose oxidase (Sigma) and 8500 U/mL catalase (Calbiochem). Anti-digoxigenin and streptavidin beads were produced in-house using carboxyl-functionalised 1 µm beads (Bangs Laboratories) [27]. The final mixture was introduced into a home-built chamber that had been blocked with 10 mg/mL BSA for at least 5 min and washed twice with measuring buffer.

2. Data acquisition

All experiments were conducted on a custom-built, dual-beam set-up with back-focal plane detection, with both traps having a stiffness of 0.25 to 0.35 pN/nm. An acousto-optical deflector was used to move one bead away from (or towards) the other at speeds ranging from 10 nm/s to 5 µm/s. Bead positions were tracked using a photo-diode detector. Signals were filtered at 50 kHz using an 8-pole Bessel filter, acquired at 100 kHz and downsampled to 20 kHz before storage. Averaged force-distance curves (FDCs) were obtained from constant-velocity pulling cycles at ≤ 100 nm/s, where there was no detectable hysteresis, by binning and averaging several stretch FDCs at typically 100 different trap distances.
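The binning and averaging step can be sketched as follows. This is an illustrative NumPy implementation under the assumption of equally spaced bins in trap distance; the function name and the synthetic traces are hypothetical.

```python
import numpy as np

def average_fdc(distances, forces, n_bins=100):
    """Average several stretch traces in bins of trap distance.

    `distances` and `forces` are lists of 1D arrays, one pair per trace.
    Returns bin centres and mean force for all non-empty bins.
    """
    d = np.concatenate(distances)
    f = np.concatenate(forces)
    edges = np.linspace(d.min(), d.max(), n_bins + 1)
    # Bin index of every sample; the maximum distance goes into the last bin.
    idx = np.clip(np.digitize(d, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=f, minlength=n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    filled = counts > 0
    return centres[filled], sums[filled] / counts[filled]

# Two synthetic traces (placeholders) offset by +1 and -1 pN around 0.1 * d.
d_trace = np.linspace(0.0, 100.0, 1001)
centres, mean_force = average_fdc([d_trace, d_trace],
                                  [0.1 * d_trace + 1.0, 0.1 * d_trace - 1.0],
                                  n_bins=10)
```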

VI. DATA ANALYSIS OF RAW FECS AND FDCS

A. Fitting of raw FECs

Force-extension curves (FECs) were fit with

$$F_{\mathrm{eWLC}}(\xi) = \frac{k_\mathrm{B}T}{p_D}\left[\frac{1}{4\left(1-\frac{\xi}{L_D}\right)^{2}} - \frac{1}{4} + \frac{\xi}{L_D} - \frac{F_{\mathrm{eWLC}}}{K}\right] \tag{S3}$$

to model the DNA force response [28] and

$$F_{\mathrm{WLC}}(\xi, c) = \frac{k_\mathrm{B}T}{p_p}\left[\frac{1}{4\left(1-\frac{\xi}{L_c}\right)^{2}} - \frac{1}{4} + \frac{\xi}{L_c}\right] \tag{S4}$$

to model the unfolded polypeptide [29], where $\xi$ is the extension, $k_\mathrm{B}$ is the Boltzmann constant, $T$ the temperature, $p_D$ the persistence length of DNA, $L_D$ the contour length of the DNA and $K$ its elastic stretch modulus, and $p_p$ and $L_c$ are the persistence and contour length of the protein, respectively. Theoretical and measured protein contour lengths are listed in Tab. S5. On average, we found that $p_D = 21.6 \pm 0.6$ nm, $K = 730 \pm 70$ pN and $p_p = 0.70 \pm 0.01$ nm (mean ± SEM). $L_D$ correlated with the number of repeats (see Fig. S11).

TABLE S5. Expected and measured contour lengths of CTPRa proteins. End-to-end distances |Δr| are measured between the Cα atoms of the first and last amino acids. The exact length of the ybbR-tags differs between CTPRrv (12 amino acids) and CTPRa (16 amino acids) constructs due to cloning boundaries. All values are in nm. Calculated contour lengths of the attachment tags are 4.38 nm and 5.84 nm for the ybbR tags of the rv- and a-type proteins, respectively, and 2.19 nm for the cysteine attachments. Measured values are reported as the mean of all molecules and the corresponding standard error.

Protein       No. molecules   Mean no. of traces used for averaging   Lcalc (a)   |Δr|     L*calc (b)   Lc
yCTPRrv3y     4               12                                      41.61       3.07     34.16        30.9 ± 0.4
yCTPRrv5y     5               7                                       66.43       4.19     57.86        56.9 ± 0.7
yCTPRrv10y    4               10                                      128.48      7.22     117.22       116 ± 2
yCTPRrv20y    7               6                                       252.58      14.59    233.61       222 ± 3
yCTPRrv26y    12              5                                       327.04      18.86    303.8        297 ± 1

yCTPRa5y      11              5                                       67.89       4.69     57.36        55.7 ± 0.5
yCTPRa9y      15              6                                       117.53      7.65     104.04       97.8 ± 0.8

cCTPRrv5c     19              7                                       64.24       4.19     57.86        52.2 ± 0.5
cCTPRa5c      12              5                                       64.24       4.69     57.36        52.6 ± 0.6

(a) Lcalc = 0.365 nm · Nresidues
(b) L*calc = Lcalc − |Δr| − Ltag
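As a worked check of footnotes (a) and (b), the short sketch below reproduces the tag lengths quoted in the caption and the L*calc entry for yCTPRrv3y from the tabulated values. The function names are hypothetical.

```python
import math

L_AA = 0.365  # nm per residue, as in footnote (a)

def calculated_contour_length(n_residues):
    """Footnote (a): contour length of a fully stretched chain in nm."""
    return L_AA * n_residues

def expected_contour_gain(l_calc, end_to_end, l_tag):
    """Footnote (b): L*calc = Lcalc - |dr| - Ltag, all in nm."""
    return l_calc - end_to_end - l_tag

# ybbR tags: 12 residues (rv-type) and 16 residues (a-type).
tag_rv = calculated_contour_length(12)   # 4.38 nm
tag_a = calculated_contour_length(16)    # 5.84 nm

# yCTPRrv3y row: Lcalc = 41.61 nm, |dr| = 3.07 nm, rv-type ybbR tags.
gain_rv3 = expected_contour_gain(41.61, 3.07, tag_rv)   # 34.16 nm
```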

B. Extracting average unfolding and refolding forces

Due to the nature of the unfolding transition of the CTPR proteins, it was not possible to extract unfolding forces in the traditional sense, i.e. the force at which a protein or a subdomain unfolds completely (the force peak). The force data were processed using Igor Pro (Wavemetrics) and analysed further in Python. The data of each force curve were binned into a histogram, giving rise to clear peaks corresponding to the baseline and the unfolding plateau (Figure S14A). The positions of these peaks were extracted by fitting the histogram with a sum of two Gaussian functions and a linear dependence of the background noise on force (force clamping):

$$P(F) = mF + c + a_1 \exp\left[-\frac{1}{2}\left(\frac{F-\mu_1}{\sigma_1}\right)^{2}\right] + a_2 \exp\left[-\frac{1}{2}\left(\frac{F-\mu_2}{\sigma_2}\right)^{2}\right], \tag{S5}$$

where $P(F)$ is the probability density of force values, $m$ and $c$ are the slope and intercept of the noise level, and $a_i$, $\mu_i$ and $\sigma_i$ are the scaling factor, mean and standard deviation of the $i$-th Gaussian.
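A fit of eq. (S5) can be sketched with `scipy.optimize.curve_fit` as shown below. The choice of SciPy, the function names, the starting values and the synthetic histogram are all assumptions of this example; the text above only states that the analysis was carried out in Python.

```python
import numpy as np
from scipy.optimize import curve_fit

def force_histogram_model(f, m, c, a1, mu1, s1, a2, mu2, s2):
    """Eq. (S5): linear background plus two Gaussian peaks."""
    g1 = a1 * np.exp(-0.5 * ((f - mu1) / s1) ** 2)
    g2 = a2 * np.exp(-0.5 * ((f - mu2) / s2) ** 2)
    return m * f + c + g1 + g2

def fit_peaks(centres, density, p0):
    """Least-squares fit; returns best-fit parameters and their covariance."""
    return curve_fit(force_histogram_model, centres, density, p0=p0)

# Synthetic, noise-free histogram (placeholder values): baseline peak near
# 0.5 pN and plateau peak near 8 pN.
f_grid = np.linspace(-2.0, 12.0, 281)
true_params = (0.002, 0.01, 1.0, 0.5, 0.4, 0.6, 8.0, 0.6)
density = force_histogram_model(f_grid, *true_params)

start = (0.0, 0.0, 0.8, 0.3, 0.5, 0.5, 7.5, 0.8)
popt, pcov = fit_peaks(f_grid, density, start)
baseline_force, plateau_force = popt[3], popt[6]
```

For measured data, the histogram itself could be built with `np.histogram(forces, bins=..., density=True)` and the bin centres passed as `centres`.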

C. Estimating the work done by the trap/protein from constant velocity data

Force-extension curves taken at 10 nm/s and 100 nm/s were fitted with WLC models for both the DNA and fully extended protein. The non-equilibrium energies, or the work done by or on the system, $W$, were then extracted from force-distance curves (FDCs) [30]. The work done on the protein, or the unfolding energy, is simply the difference between the unfolding trace, $U(d)$, and the FDC of the fully extended protein, $C(d)$:

$$W_U = \int_{d_1}^{d_2} U(d)\,\mathrm{d}d - \int_{d_1}^{d_2} C(d)\,\mathrm{d}d, \tag{S6}$$

which corresponds to the area between those two curves (Figure S14B). The work done by the protein, or the refolding energy, is the difference between the force response of the unfolded protein and the refolding trace $R(d)$:

$$W_F = \int_{d_1}^{d_2} C(d)\,\mathrm{d}d - \int_{d_1}^{d_2} R(d)\,\mathrm{d}d. \tag{S7}$$

FIG. S14. Calculating the forces and energies of TPR unfolding transitions. (a) The mean unfolding force is extracted by fitting a Gaussian function (red) to a histogram of forces (right) which was derived from the raw data (left, plotted as force against its index array). (b) The non-equilibrium energies of unfolding are simply the area (shaded light blue) between the unfolding curve and the contour of the fully extended construct.
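Equations (S6) and (S7) are differences of areas under sampled curves and can be evaluated numerically, for example with the trapezoidal rule. The sketch below assumes that all traces are sampled on a common grid of trap distances between d1 and d2; the function names and the synthetic curves are hypothetical.

```python
import numpy as np

def trapezoid(y, d):
    """Trapezoidal integral of y over the sample points d."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(d)))

def unfolding_work(d, unfold_force, contour_force):
    """Eq. (S6): area between unfolding trace U(d) and contour C(d)."""
    return trapezoid(unfold_force, d) - trapezoid(contour_force, d)

def refolding_work(d, refold_force, contour_force):
    """Eq. (S7): area between contour C(d) and refolding trace R(d)."""
    return trapezoid(contour_force, d) - trapezoid(refold_force, d)

# Synthetic curves (placeholders): forces in pN, distances in nm, so the work
# is in pN nm.
d = np.linspace(0.0, 10.0, 101)
contour = 0.5 * d
w_unfold = unfolding_work(d, contour + 2.0, contour)   # 20 pN nm
w_refold = refolding_work(d, contour + 1.5, contour)   # -15 pN nm
```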

VII. MECHANICAL ISING MODELS

A microscopic conformation $c = \{c_1, \ldots, c_N\}$ of a protein consisting of $N$ subunits was described by a bit-word of length $N$, where ones indicate folded subunits and zeros indicate unfolded subunits. In the case of $N$ subunits there are $2^N$ possible microscopic conformations, e.g. for $N = 3$, $c \in \{000, 100, 010, 001, 110, 101, 011, 111\}$.
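The conformational space can be enumerated directly for small N, as in the following sketch (the function name is hypothetical):

```python
import itertools

def conformations(n):
    """All 2**n bit-words of length n; '1' marks a folded subunit and '0' an
    unfolded subunit."""
    return ["".join(bits) for bits in itertools.product("01", repeat=n)]

three_subunits = conformations(3)
```

The exponential growth of this list with N is what makes exhaustive enumeration impractical for long arrays.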

The full Hamiltonian of the entire system at a trap distance $d$ is given by

$$H_d(x, c) = H^{\mathrm{int}}(c) + H^{\mathrm{mech}}_d(x, c), \tag{S8}$$

where $H^{\mathrm{int}}(c)$ describes the conformation-dependent internal energy and $H^{\mathrm{mech}}_d(x, c)$ describes the mechanical energy stored in the system.

The energy for mechanically stretching the system consisting of linker and the Hookean spring of the optical trap is

$$H^{\mathrm{mech}}_d(x, c) = \int_0^{d-x} F_{\mathrm{construct}}(\xi, c)\,\mathrm{d}\xi + \frac{1}{2}kx^2. \tag{S9}$$

In the experimental configuration, the two mechanical parts consisting of dsDNA and unfolded polypeptide are in series (see Fig. S15). Hence, the extension of the full linker consisting of dsDNA and unfolded polypeptide is given by

$$\xi_{\mathrm{construct}}(F, c) = \xi_{\mathrm{eWLC}}(F) + \xi_{\mathrm{WLC}}(F, c) + \xi_{\mathrm{folded}}(c), \tag{S10}$$

where $\xi_{\mathrm{eWLC}}$ and $\xi_{\mathrm{WLC}}$ are given by eq. (S3) and eq. (S11), respectively. The extension of the folded protein $\xi_{\mathrm{folded}}$ was assumed to be independent of force, but dependent on the particular configuration $c$ of the protein, i.e. it contained information on the protein structure (see Fig. S16G and Section VII A below). The inverse of eq. (S10) yields the force on the construct as a function of the length of unfolded polypeptide and the total extension, $F_{\mathrm{construct}}(\xi, c)$.

FIG. S15. Lengths and quantities used in the compliance model for a two-bead configuration (top) and the
equivalent one-bead configuration (bottom).

The mechanical properties of the dsDNA linker were modelled using eq. (S3), and the mechanical properties of the polypeptide part were modelled using

$$F_{\mathrm{WLC}}(\xi, c) = \frac{k_\mathrm{B}T}{p_p}\left[\frac{1}{4\left(1-\frac{\xi}{L_c(c)}\right)^{2}} - \frac{1}{4} + \frac{\xi}{L_c(c)}\right], \tag{S11}$$

where $L_c(c) = \left(N - \sum_{i=1}^{N} c_i\right) \cdot L_{\mathrm{aa}} + L_{\mathrm{tag}}$ is the contour length of the unfolded polypeptide when the protein is in conformation $c$, $p_p$ is the persistence length of the unfolded polypeptide, $L_{\mathrm{tag}}$ is the contour length of the attachment tag and $L_{\mathrm{aa}} = 0.365$ nm is the length of a single amino acid [31].
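Equation (S11) gives force as a function of extension, whereas eq. (S10) needs extension as a function of force. Because the WLC force increases monotonically with extension, the inverse can be obtained numerically, for instance by bisection. The sketch below illustrates this for the polypeptide term only; the value of the thermal energy (4.11 pN nm, roughly room temperature), the use of bisection and all function names are assumptions of this example.

```python
KBT = 4.11    # pN nm, assumed thermal energy near room temperature
P_P = 0.70    # nm, mean polypeptide persistence length from Section VI A
L_AA = 0.365  # nm, contour length per amino acid

def contour_length(conf, l_tag):
    """L_c(c) as defined below eq. (S11); conf is a sequence of 0/1 values."""
    n_unfolded = len(conf) - sum(conf)
    return n_unfolded * L_AA + l_tag

def wlc_force(xi, l_c, p=P_P, kbt=KBT):
    """Eq. (S11): force in pN at extension xi (nm), for 0 <= xi < l_c."""
    rel = xi / l_c
    return kbt / p * (0.25 / (1.0 - rel) ** 2 - 0.25 + rel)

def wlc_extension(force, l_c, p=P_P, kbt=KBT, tol=1e-9):
    """Numerical inverse of wlc_force by bisection on [0, l_c)."""
    lo, hi = 0.0, l_c * (1.0 - 1e-12)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, l_c, p, kbt) < force:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: two unfolded subunits plus a 4.38 nm tag.
l_c = contour_length((0, 1, 1, 0), l_tag=4.38)
force_at_3nm = wlc_force(3.0, l_c)
recovered_extension = wlc_extension(force_at_3nm, l_c)
```

The same strategy could be applied to the sum in eq. (S10), since each term is a non-decreasing function of force.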

A. Structure information

As highlighted in the main text, the models only accurately described the experimental data when the superhelical nature of CTPR proteins was considered. We incorporated this structural information into eq. (S10) by setting $\xi_{\mathrm{folded}}(c)$ to the sum of the end-to-end distances (Cα to Cα) of all folded stretches of helices, as given by the crystal structure.

For example, for a configuration 0111001111, we set $\xi_{\mathrm{folded}} = \xi_{2 \ldots 4} + \xi_{7 \ldots 10}$, where $\xi_{i \ldots j}$ is the crystal-structure end-to-end distance from the start of helix $i$ to the end of helix $j$.
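The decomposition of a configuration into folded stretches can be sketched as follows. The lookup table of end-to-end distances is a hypothetical placeholder; in practice its values would come from the crystal structure.

```python
import re

def folded_stretches(conf):
    """Return 1-based (first helix, last helix) pairs for every contiguous run
    of folded subunits in a bit-word such as '0111001111'."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"1+", conf)]

def folded_extension(conf, end_to_end):
    """Sum the end-to-end distances of all folded stretches.

    `end_to_end` maps (i, j) to the distance from the start of helix i to the
    end of helix j.
    """
    return sum(end_to_end[stretch] for stretch in folded_stretches(conf))

stretches = folded_stretches("0111001111")

# Placeholder distances in nm, for illustration only.
toy_distances = {(2, 4): 1.5, (7, 10): 2.0}
toy_extension = folded_extension("0111001111", toy_distances)
```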

B. Interaction models

We considered four different interaction models of subunits and their coupling. For all models, the folded protein extension $\xi_{\mathrm{folded}}(c)$ was obtained from the crystal structure for each possible configuration (see Fig. S16).

FIG. S16. Different Ising models were tested to describe the folding of TPR proteins. In all models, red arrows indicate the interactions between respective subunits and ξfolded represents the end-to-end distance of the folded portion. (A) In the homopolymer repeat model, subunits consist of whole repeats (i.e. two helices). (B) In the homopolymer helix model, subunits consist of individual helices that are treated exactly the same. (C) In the heteropolymer helix model, the structural repeat is divided into its A and B helices with respective energies. (D) The heteropolymer helix model can be extended to include nearest & next-nearest neighbour interactions (NNN) that may occur e.g. due to structural contacts.

1. Homopolymer repeat model

In models based on a whole repeat (i.e. one A- and one B-helix) as the smallest independent protein unit, the internal energy of the protein is

$$H^{\mathrm{int}}(c) = \Delta G_{\mathrm{unit}} \sum_{i=1}^{N} c_i + \Delta G_{\mathrm{nn}} \sum_{i=1}^{N-1} c_i c_{i+1}, \tag{S12}$$

where $\Delta G_{\mathrm{unit}}$ is the energy of a folded subunit and $\Delta G_{\mathrm{nn}}$ describes the energy of the next-neighbour interactions between two adjacent folded subunits (Fig. S16A). This is the simplest form of a one-dimensional Ising model.
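Equation (S12) translates directly into a few lines of code. The sketch below uses placeholder energies in arbitrary units; the function name is hypothetical.

```python
def h_int_repeat(conf, dg_unit, dg_nn):
    """Eq. (S12): internal energy of a conformation given as a sequence of
    0/1 values, one per subunit."""
    n_folded = sum(conf)
    # Count adjacent pairs in which both subunits are folded.
    n_pairs = sum(a * b for a, b in zip(conf, conf[1:]))
    return dg_unit * n_folded + dg_nn * n_pairs

# Placeholder energies: an unfavourable intrinsic term and a favourable
# interface term.
partly_folded = h_int_repeat((1, 1, 0, 1), dg_unit=2.0, dg_nn=-5.0)
fully_folded = h_int_repeat((1, 1, 1, 1), dg_unit=2.0, dg_nn=-5.0)
```

For the fully folded conformation, the result equals N·ΔGunit + (N − 1)·ΔGnn.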

2. Homopolymer helix model

The homopolymer helix model is equivalent to the homopolymer repeat model, but subunits consist of helices instead of repeats. Just as for the repeat model, interaction energies only affect next neighbours (Fig. S16B).

3. Heteropolymer helix model

This model takes into account that the two alpha helices in a repeat are different and thus may be parameterized by different energies. Only next-neighbour energies are allowed. The internal energy is given by

$$H^{\mathrm{int}}(c) = n_A \Delta G_A + n_B \Delta G_B + n_{AB} \Delta G_{AB} + n_{BA} \Delta G_{BA}, \tag{S13}$$

where $n_A$ is the number of folded A-helices in conformation $c$, $n_{AB}$ is the number of folded pairs of A and B helices, $n_{BA}$ is the number of folded pairs of B and A helices, etc. (see Fig. S16C).


4. Heteropolymer helix nearest & next-nearest (NNN) model

This model accounts for contacts between adjacent A-A and B-B helices found in the crystal structure and assigns corresponding energies (Fig. S16D). The internal energy of the protein is

$$H^{\mathrm{int}}(c) = n_{AB} \Delta G_{AB} + n_{BA} \Delta G_{BA} + n_{AA} \Delta G_{AA} + n_{BB} \Delta G_{BB} + n_A \Delta G_A + n_B \Delta G_B. \tag{S14}$$

Here, $n_{AB}$ is the number of adjacent folded A and B helices, and so on. Unfolded helices are considered to break contacts between next-nearest neighbours, such that a configuration ABA would contribute toward $n_{AA}$, but A-A would not.

We note that both the heteropolymer helix model and the heteropolymer helix NNN model can be mapped to the repeat model when

$$\begin{aligned} \Delta G_{\mathrm{unit}} &= \Delta G_A + \Delta G_B + \Delta G_{AB} \quad \text{and} \\ \Delta G_{\mathrm{nn}} &= \Delta G_{BA} + \Delta G_{AA} + \Delta G_{BB}. \end{aligned} \tag{S15}$$

For all models, the total energy of a protein with $N$ repeats is then

$$\Delta G_{\mathrm{tot}} = N\,\Delta G_{\mathrm{unit}} + (N - 1)\,\Delta G_{\mathrm{nn}}. \tag{S16}$$
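The counting rules behind eq. (S14) and the mapping of eqs. (S15) and (S16) can be checked with a short sketch. It assumes that helices alternate A, B, A, B, ... starting with an A-helix, and it uses placeholder energies; the function names are hypothetical.

```python
def helix_counts(conf):
    """Count folded helices and folded contacts for a helix-level bit-word.

    Next-nearest contacts (AA, BB) are only counted when the intervening
    helix is folded as well.
    """
    counts = {"A": 0, "B": 0, "AB": 0, "BA": 0, "AA": 0, "BB": 0}
    kinds = ["A" if i % 2 == 0 else "B" for i in range(len(conf))]
    for i, folded in enumerate(conf):
        if not folded:
            continue
        counts[kinds[i]] += 1
        if i + 1 < len(conf) and conf[i + 1]:
            counts[kinds[i] + kinds[i + 1]] += 1
            if i + 2 < len(conf) and conf[i + 2]:
                counts[kinds[i] + kinds[i + 2]] += 1
    return counts

def h_int_nnn(conf, dg):
    """Eq. (S14) with `dg` mapping 'A', 'B', 'AB', 'BA', 'AA', 'BB' to energies."""
    counts = helix_counts(conf)
    return sum(counts[key] * dg[key] for key in counts)

# Placeholder energies (arbitrary units).
dg = {"A": 1.0, "B": 2.0, "AB": -3.0, "BA": -4.0, "AA": -1.0, "BB": -0.5}
n_repeats = 3
dg_unit = dg["A"] + dg["B"] + dg["AB"]     # eq. (S15)
dg_nn = dg["BA"] + dg["AA"] + dg["BB"]     # eq. (S15)
dg_tot = n_repeats * dg_unit + (n_repeats - 1) * dg_nn   # eq. (S16)
fully_folded_energy = h_int_nnn((1,) * (2 * n_repeats), dg)
```

Setting the AA and BB energies to zero recovers the heteropolymer helix model of eq. (S13).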

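C. Calculation of force-distance curves

Under equilibrium conditions, the mean bead deflection $x$ for a given trap distance $d$ is

$$\langle x(d) \rangle = \frac{\int_x \sum_c x \exp\left(-\frac{H_d(x,c)}{k_\mathrm{B}T}\right)\mathrm{d}x}{\int_x \sum_c \exp\left(-\frac{H_d(x,c)}{k_\mathrm{B}T}\right)\mathrm{d}x}, \tag{S17}$$

where $H_d(x, c)$ is the full Hamiltonian of the system (eq. (S8)), which also depends on the model-dependent energies (e.g. $\Delta G_{\mathrm{nn}}$, $\Delta G_{\mathrm{unit}}$), which are omitted here for ease of notation.

Consequently, a force-distance curve (FDC) can be calculated using

$$F(d) = \langle x(d) \rangle \cdot \left(\frac{1}{k_1} + \frac{1}{k_2}\right)^{-1}, \tag{S18}$$

where $k_1$ and $k_2$ are the spring constants of the two traps, so that the prefactor is the effective stiffness of the two traps acting in series.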

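D. Calculation of unfolding profile

Similarly, the probability of a subunit $i$ to be folded at a given trap distance $d$ is

$$p_i(d) = \frac{\int_x \sum_c \delta_i(c) \exp\left(-\frac{H_d(x,c)}{k_\mathrm{B}T}\right)\mathrm{d}x}{\int_x \sum_c \exp\left(-\frac{H_d(x,c)}{k_\mathrm{B}T}\right)\mathrm{d}x}, \tag{S19}$$

where

$$\delta_i(c) = \begin{cases} 1, & \text{if the } i\text{-th bit of word } c \text{ is set} \\ 0, & \text{otherwise.} \end{cases} \tag{S20}$$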
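The mean deflection of eq. (S17) and the folding probabilities of eq. (S19) share the same Boltzmann weights, so both can be computed from one table of weights over conformations and bead deflections. The sketch below uses a uniform grid in x (so that the grid spacing cancels in the ratios) and a deliberately simple toy Hamiltonian in place of eq. (S8); the thermal energy, the toy Hamiltonian and all names are assumptions of this example, not the authors' CUDA implementation.

```python
import itertools
import numpy as np

KBT = 4.11  # pN nm, assumed thermal energy near room temperature

def boltzmann_weights(d, x_grid, confs, hamiltonian, kbt=KBT):
    """Weights w[c, x] proportional to exp(-H_d(x, c) / kBT)."""
    energies = np.array([[hamiltonian(d, x, c) for x in x_grid] for c in confs])
    # Subtracting the minimum avoids overflow and cancels in all ratios.
    return np.exp(-(energies - energies.min()) / kbt)

def mean_deflection(weights, x_grid):
    """Eq. (S17) on a uniform x grid."""
    return float((weights * x_grid).sum() / weights.sum())

def folded_probabilities(weights, confs):
    """Eq. (S19): probability of each subunit being folded."""
    per_conf = weights.sum(axis=1)
    return per_conf @ np.array(confs, dtype=float) / per_conf.sum()

def toy_hamiltonian(d, x, c, dg=0.0, k=0.15):
    """Placeholder energy: a harmonic well centred at 0.1 * d plus an energy
    dg per folded subunit. It only stands in for eq. (S8)."""
    return dg * sum(c) + 0.5 * k * (x - 0.1 * d) ** 2

confs = list(itertools.product((0, 1), repeat=2))
x_grid = np.linspace(-15.0, 25.0, 2001)
weights = boltzmann_weights(50.0, x_grid, confs, toy_hamiltonian)
x_mean = mean_deflection(weights, x_grid)
p_folded = folded_probabilities(weights, confs)
```

With the toy Hamiltonian, the mean deflection sits at the centre of the well and each subunit is folded with probability one half, which makes the sketch easy to verify.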



E. Minimal folding unit under load

To determine the size of the minimal folded unit under force conditions, we first numerically determined $d^* = d \mid p(c = 0) = \tfrac{1}{2}$, i.e. the distance at which the unfolded configuration is as populated as all other configurations combined, where

$$p(c) = \frac{\int_x \exp\left(-\frac{H_d(x,c)}{k_\mathrm{B}T}\right)\mathrm{d}x}{\int_x \sum_{c'} \exp\left(-\frac{H_d(x,c')}{k_\mathrm{B}T}\right)\mathrm{d}x} \tag{S21}$$

is the relative population of conformation $c$.

The minimal folded unit was then calculated as the mean number of folded subunits of all other configurations $c \neq 0$, weighted by their population.
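Given the populations of eq. (S21) at the distance d*, the weighted mean can be sketched as follows. The populations in the example are placeholders and the function name is hypothetical.

```python
def minimal_folded_unit(populations):
    """Population-weighted mean number of folded subunits over all
    conformations except the fully unfolded one.

    `populations` maps conformation tuples of 0/1 values to their relative
    population at d*.
    """
    folded = {c: p for c, p in populations.items() if any(c)}
    total = sum(folded.values())
    return sum(sum(c) * p for c, p in folded.items()) / total

# Placeholder populations at d*: the unfolded state holds one half.
toy_populations = {(0, 0): 0.5, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.3}
toy_unit = minimal_folded_unit(toy_populations)   # (0.1 + 0.1 + 0.6) / 0.5
```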

F. Minimal folding unit in the absence of load

We define the minimal folding unit in the absence of load as the smallest number of subunits for which the total energy of the protein (eq. (S16)) becomes negative.
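This definition amounts to finding the smallest N for which N·ΔGunit + (N − 1)·ΔGnn is negative. A minimal sketch, with placeholder energies and a hypothetical function name, is:

```python
def minimal_folding_unit_no_load(dg_unit, dg_nn, n_max=1000):
    """Smallest number of subunits N with N*dg_unit + (N-1)*dg_nn < 0,
    or None if no such N exists up to n_max."""
    for n in range(1, n_max + 1):
        if n * dg_unit + (n - 1) * dg_nn < 0:
            return n
    return None

# Placeholder energies: one subunit alone is unstable (+2), but the interface
# formed by two subunits (-5) makes the pair stable.
toy_minimal_unit = minimal_folding_unit_no_load(2.0, -5.0)
```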

G. Computation and simplification

FDCs were calculated by numerically evaluating eq. (S18) using custom-written CUDA software on a GeForce RTX 2080 graphics card (Nvidia). Even though massive parallelization greatly accelerated the computation, the calculations were still too expensive for long repeat molecules, such as the 26-repeat protein in the helix models with a conformational space of size $2^{52} \approx 5 \times 10^{15}$. A matrix formalism, which was previously employed to reduce model complexity in chemical unfolding [11], could not be used to describe the mechanical unfolding because of the non-linear contributions of the linker molecules (DNA and unfolded polypeptide) to the mechanical energy. Instead, we considered two simplifications that reduced the conformational space by eliminating extremely unlikely high-energy configurations.

1. Skip approximation

In helix models, we excluded all configurations in which an individual helix was folded without adjacent folded neighbours (e.g. 010111), or in which two adjacent helices were folded without a stabilising neighbour (e.g. 110111). These simplifications were in accordance with previous experimental findings that individual repeats are not stable in solution and resulted in a reduction of the computational complexity from $O(2^N)$ to $< O(1.65^N)$.

The simplifications allowed us to calculate FDCs for molecules of all repeat lengths. However, the computational cost for the longest molecules was still very high (≈ 60 h per iteration for one FDC with $\approx 4 \times 10^{10}$ configurations of a 26-mer in the Skip approximation) and prevented us from using these approximations in a fit function.
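One possible reading of the exclusion rule is that every contiguous run of folded helices must contain at least three helices. The sketch below implements that reading as a filter. This interpretation is an assumption and may not reproduce the authors' exact configuration counts; the function names are hypothetical.

```python
import itertools
import re

def allowed_in_skip(conf, min_run=3):
    """True if every run of folded helices ('1') has at least min_run members.

    min_run=3 encodes the assumed reading: isolated single helices and
    isolated pairs of helices are excluded.
    """
    return all(len(run) >= min_run for run in re.findall(r"1+", conf))

def skip_configurations(n):
    """Yield all allowed bit-words of length n (brute force, small n only)."""
    for bits in itertools.product("01", repeat=n):
        conf = "".join(bits)
        if allowed_in_skip(conf):
            yield conf

six_helix_confs = list(skip_configurations(6))
```

For six helices, the filter keeps 11 of the 64 possible configurations; both examples named in the text (010111 and 110111) are rejected.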

2. Zipper approximation

Therefore, we also considered a zipper approximation, in which unfolding always occurs from the ends and configurations such as 11101111 do not exist. This model was of complexity $O(N^2)$ and could easily be fitted to all molecules.
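In the zipper picture, each allowed configuration is a single contiguous folded block (or the fully unfolded chain), which gives 1 + N(N + 1)/2 configurations and hence the quadratic scaling. A small enumeration sketch, with a hypothetical function name, is:

```python
def zipper_configurations(n):
    """All bit-words of length n with at most one contiguous folded block."""
    confs = ["0" * n]  # fully unfolded chain
    for start in range(n):
        for end in range(start + 1, n + 1):
            confs.append("0" * start + "1" * (end - start) + "0" * (n - end))
    return confs

eight_subunit_confs = zipper_configurations(8)
```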



3. Verification

In practice, we obtained the energy parameters by fitting the zipper approximation to molecules of all repeat lengths. We then verified that FDCs obtained from the Skip approximation, with the same energy parameters, closely reproduced the prediction of the zipper model (see Fig. S5A). The resulting energies for all molecules for which the computation was feasible were identical within errors when comparing the Skip approximation and the zipper approximation (see Table 1 in the main text).

H. Error estimation and propagation

To determine the errors of the reported energies $\Delta G_{\mathrm{unit}}$, $\Delta G_{\mathrm{nn}}$ and $\Delta G_{\mathrm{tot}}$ (eqs. (S15) and (S16)), we performed model fits to each individual molecule. The reported errors were then calculated by Gaussian error propagation based on the covariance matrix of the individual values of $\Delta G_A$, $\Delta G_B$, $\Delta G_{AB}$, $\Delta G_{BA}$, $\Delta G_{AA}$ and $\Delta G_{BB}$ and reported as standard error of the mean (SEM) [32].
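Because the derived energies of eqs. (S15) and (S16) are linear combinations of the six fitted parameters, Gaussian error propagation reduces to a quadratic form with the covariance matrix. The sketch below shows one way this could be done; the exact procedure used by the authors is not specified beyond the description above, and the per-molecule values are placeholders.

```python
import numpy as np

# Parameter order: dG_A, dG_B, dG_AB, dG_BA, dG_AA, dG_BB.
J_UNIT = np.array([1, 1, 1, 0, 0, 0], dtype=float)   # eq. (S15), dG_unit
J_NN = np.array([0, 0, 0, 1, 1, 1], dtype=float)     # eq. (S15), dG_nn

def propagated_sem(fits, jacobian):
    """SEM of a linear combination of fit parameters.

    `fits` has shape (n_molecules, 6); the sample covariance of the
    parameters is propagated through the Jacobian of the combination.
    """
    fits = np.asarray(fits, dtype=float)
    cov = np.cov(fits, rowvar=False)
    variance = jacobian @ cov @ jacobian
    return float(np.sqrt(variance / fits.shape[0]))

# Placeholder fit results for four molecules (arbitrary units).
toy_fits = np.array([[1.0, 2.0, -3.0, -4.0, -1.0, -0.5],
                     [1.2, 1.9, -3.1, -4.2, -0.9, -0.4],
                     [0.9, 2.2, -2.8, -3.9, -1.1, -0.6],
                     [1.1, 2.1, -3.0, -4.1, -1.0, -0.5]])
sem_unit = propagated_sem(toy_fits, J_UNIT)

n_repeats = 5
j_tot = n_repeats * J_UNIT + (n_repeats - 1) * J_NN   # eq. (S16)
sem_tot = propagated_sem(toy_fits, j_tot)
```

Including the off-diagonal covariance terms matters here, since fitted intrinsic and interfacial energies are typically correlated.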

[1] J. Yang, R. Yan, A. Roy, D. Xu, J. Poisson, and Y. Zhang, The I-TASSER Suite: protein structure and function prediction, Nature Methods 12, 7 (2015).

[2] S. K. Wetzel, G. Settanni, M. Kenig, H. K. Binz, and A. Plückthun, Folding and unfolding mechanism of highly stable full-consensus ankyrin repeat proteins, Journal of Molecular Biology 376, 241 (2008).

[3] E. R. Main, Y. Xiong, M. J. Cocco, L. D’Andrea, and L. Regan, Design of stable α-helical arrays from an idealized TPR motif, Structure 11, 497 (2003).

[4] T. Kajander, A. L. Cortajarena, S. Mochrie, and L. Regan, Structure and stability of designed TPR protein superhelices: unusual crystal packing and implications for natural TPR proteins, Acta Crystallographica Section D 63, 800 (2007).

[5] A. L. Cortajarena, T. Kajander, W. Pan, M. J. Cocco, and L. Regan, Protein design to understand peptide ligand recognition by tetratricopeptide repeat proteins, Protein Engineering, Design and Selection 17, 399 (2004).

[6] A. Hemsley, N. Arnheim, M. D. Toney, G. Cortopassi, and D. J. Galas, A simple method for site-directed mutagenesis using the polymerase chain reaction, Nucleic Acids Research 17, 6545 (1989).

[7] S. Moore, 'Round the horn site-directed mutagenesis, https://openwetware.org/wiki/%27Round-the-horn_site-directed_mutagenesis.

[8] T. Kajander, A. L. Cortajarena, E. R. G. Main, S. G. J. Mochrie, and L. Regan, A new folding paradigm for repeat proteins, Journal of the American Chemical Society 127, 10188 (2005).

[9] A. Perez-Riba and L. S. Itzhaki, A method for rapid high-throughput biophysical analysis of proteins, Scientific Reports 7, 9071 (2017).

[10] A. R. Lowe, A. Perez-Riba, L. S. Itzhaki, and E. R. Main, PyFolding: Open-source graphing, simulation, and analysis of the biophysical properties of proteins, Biophysical Journal 114, 511 (2018).

[11] T. Aksel and D. Barrick, Analysis of repeat-protein folding using nearest-neighbor statistical mechanical models, in Biothermodynamics, Part A, Methods in Enzymology, Vol. 455, edited by M. L. Johnson, J. M. Holt, and G. K. Ackers (Academic Press, 2009) Chap. 4, pp. 95–125.

[12] C. Vonrhein, C. Flensburg, P. Keller, A. Sharff, O. Smart, W. Paciorek, T. Womack, and G. Bricogne, Data processing and analysis with the autoPROC toolbox, Acta Crystallographica Section D 67, 293 (2011).

[13] B. G., B. E., B. M., F. C., K. P., P. W., R. P, S. A., S. O.S., V. C., and W. T.O., BUSTER (2020).

[14] O. S. Smart, T. O. Womack, C. Flensburg, P. Keller, W. Paciorek, A. Sharff, C. Vonrhein, and G. Bricogne, Exploiting structure similarity in refinement: automated NCS and target-structure restraints in BUSTER, Acta Crystallographica Section D 68, 368 (2012).

[15] P. Emsley, B. Lohkamp, W. G. Scott, and K. Cowtan, Features and development of Coot, Acta Crystallographica Section D 66, 486 (2010).

[16] A. Šali and T. L. Blundell, Comparative protein modelling by satisfaction of spatial restraints, Journal of Molecular Biology 234, 779 (1993).

[17] E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M. Greenblatt, E. C. Meng, and T. E. Ferrin, UCSF Chimera – a visualization system for exploratory research and analysis, Journal of Computational Chemistry 25, 1605 (2004).

[18] J. K. Forwood, A. Lange, U. Zachariae, M. Marfori, C. Preast, H. Grubmüller, M. Stewart, A. H. Corbett, and B. Kobe, Quantitative structural analysis of importin-β flexibility: Paradigm for solenoid protein structures, Structure 18, 1171 (2010).

[19] B. Kobe, T. Gleichmann, J. Horne, I. G. Jennings, P. D. Scotney, and T. Teh, Turn up the HEAT, Structure 7, R91 (1999).

[20] K. J. Millman and M. Aivazis, Python for scientists and engineers, Computing in Science & Engineering 13, 9 (2011).

[21] T. E. Oliphant, Python for scientific computing, Computing in Science & Engineering 9, 10 (2007).

[22] S. v. d. Walt, S. C. Colbert, and G. Varoquaux, The NumPy array: A structure for efficient numerical computation, Computing in Science & Engineering 13, 22 (2011).

[23] J. D. Hunter, Matplotlib: A 2D graphics environment, Computing in Science & Engineering 9, 90 (2007).

[24] J. Yin, P. D. Straight, S. M. McLoughlin, Z. Zhou, A. J. Lin, D. E. Golan, N. L. Kelleher, R. Kolter, and C. T. Walsh, Genetically encoded short peptide tag for versatile protein labeling by Sfp phosphopantetheinyl transferase, Proceedings of the National Academy of Sciences 102, 15815 (2005).

[25] M. Synakewicz, D. Bauer, M. Rief, and L. S. Itzhaki, Bioorthogonal protein-DNA conjugation methods for force spectroscopy, Scientific Reports 9, 13820 (2019).

[26] A. Mukhortava and M. Schlierf, Efficient formation of site-specific protein-DNA hybrids using copper-free click chemistry, Bioconjugate Chemistry 27, 1559 (2016).

[27] K. Tych and G. Žoldák, Stable Substructures in Proteins and How to Find Them Using Single-Molecule Force Spectroscopy, Methods in Molecular Biology 1958, 263 (2019).

[28] M. D. Wang, H. Yin, R. Landick, J. Gelles, and S. M. Block, Stretching DNA with optical tweezers, Biophysical Journal 72, 1335 (1997).

[29] C. Bustamante, J. F. Marko, E. D. Siggia, and S. B. Smith, Entropic elasticity of lambda-phage DNA, Science 265, 1599 (1994).

[30] J. C. M. Gebhardt, T. Bornschlögl, and M. Rief, Full distance-resolved folding energy landscape of one single protein molecule, Proceedings of the National Academy of Sciences 107, 2013 (2010), https://doi.org/10.1073/pnas.0909854107.

[31] H. Dietz and M. Rief, Exploring the energy landscape of GFP by single-molecule mechanical experiments, Proceedings of the National Academy of Sciences 101, 16192 (2004), https://doi.org/10.1073/pnas.0404549101, https://www.pnas.org/content/101/46/16192.full.pdf.

[32] I. G. Hughes and T. P. A. Hase, Measurements and their Uncertainties: A Practical Guide to Modern Error Analysis (Oxford University Press, 2010).
