Supplementary Information

Unraveling the Mechanics of a Repeat-Protein Nanospring: From Folding of Individual Repeats to Fluctuations of the Superhelix

Marie Synakewicz, Rohan S. Eapen, Albert Perez-Riba, Daniela Bauer, Andreas Weißl, Gerhard Fischer, Marko Hyvönen, Matthias Rief, Laura S. Itzhaki, Johannes Stigler∗

∗ To whom correspondence should be addressed: L.S. Itzhaki (lsi10@cam.ac.uk), M. Synakewicz (m.synakewicz@bioc.uzh.ch) and J. Stigler (stigler@genzentrum.lmu.de)

CONTENTS
I. Supplementary figures
II. Supplementary tables
III. Materials
IV. Protein Sequences
V. Experimental methods
   A. Molecular biology
      1. Mutagenesis
      2. General repeat array construction
      3. Construction of yCTPRrv3y and yCTPRrv5y
      4. Construction of yCTPRrv10y, yCTPRrv20y and yCTPRrv26y
      5. Construction of yCTPRa5y, yCTPRa9y, cCTPRrv5c and cCTPRa5c
   B. Protein preparation
   C. Equilibrium denaturation
   D. Crystallography
   E. Calculation of plane angles
   F. Circular dichroism spectroscopy
   G. Force spectroscopy experiments
      1. Sample preparation
      2. Data acquisition
VI. Data analysis of raw FECs and FDCs
   A. Fitting of raw FECs
   B. Extracting average unfolding and refolding forces
   C. Estimating the work done by the trap/protein from constant velocity data
VII. Mechanical Ising models
   A. Structure information
   B. Interaction models
      1. Homopolymer repeat model
      2. Homopolymer helix model
      3. Heteropolymer helix model
      4. Heteropolymer helix nearest & next-nearest (NNN) model
   C. Calculation of force-distance curves
   D. Calculation of unfolding profile
   E. Minimal folding unit under load
   F. Minimal folding unit in the absence of load
   G. Computation and simplification
      1. Skip approximation
      2. Zipper approximation
      3. Verification
   H. Error estimation and propagation
References

I. SUPPLEMENTARY FIGURES

[Figure S1. Panel A: [θMR] versus wavelength (200 to 280 nm). Panel B: [θMR,222] for each construct. Constructs: CTPRa5, cCTPRa5c, yCTPRa5y, CTPRrv5, cCTPRrv5c, yCTPRrv5y.]

FIG. S1. Circular dichroism (CD) data of all 5-repeat constructs used in this study, reported as mean residue ellipticity (θMR). (A) CD spectra are shown as the mean and estimated error with line and shaded area, respectively. Although the signal at 222 nm remains largely unchanged, the mutations in the rv-type arrays appear to decrease the signal at 208 nm relative to that of the CTPRa arrays. This may either reflect the changes in helix coiling within the tertiary structure, or it is simply due to the loss of aromatics which are known to contribute to the CD signal at these wavelengths. (B) The changes in mean residue ellipticity at 222 nm, indicative of α-helicity, are small if not negligible due to the uncertainty in the protein concentration measurements between different samples (approximately 10%).

[Figure S2. Panels A to C: crystal structure views with N- and C-termini labelled.]

FIG. S2. Crystal structure of CTPRrv. (A) Structures of two macromolecules (marine blue, cartoon representation) present in the asymmetric unit with 2Fo-Fc maps (grey) contoured at 1.5σ. (B) Zoomed view of chain A, showing clear density for backbone atoms. (C) Structural deviations are minimal between chains A (marine blue) and B (dark blue), an alignment having a backbone RMSD of 0.446 Å.

[Figure S3. Panels A and B: fraction unfolded versus [GdnHCl] (M). Panel A constructs: CTPRrv5, cCTPRrv5c, yCTPRrv5y, CTPRrv10, CTPRrv10y, CTPRa5, cCTPRa5c, yCTPRa5y. Panel B constructs: CTPRrv2, CTPRrv4, CTPRrv5, CTPRrv8, CTPRrv10, with fit results ∆Gunit = 0.2 ± 0.05 and ∆Gnn = −6.8 ± 0.1.]

FIG. S3. Equilibrium denaturation data of CTPR arrays using guanidine hydrochloride. (A) Attachment variants of CTPRrv5, CTPRrv10 and CTPRa5 were tested to examine the effect of the added ybbR-tag or cysteine residues at the N- and C-termini. While cysteine modifications did not alter the unfolding profile, the ybbR-tag slightly altered both the transition mid-point and the slope of the transition. We intentionally did not display any fits, since (i) TPRs with more than three repeats clearly deviate from two-state behaviour and (ii) the number of variants was not sufficient to build ensemble heteropolymer Ising models that treated the ybbR-tag as a separate helix with different intrinsic stability and interaction energy at the N- and C-terminal interfaces of the CTPR array. (B) Ensemble Ising models require a global fitting procedure to denaturation data of a series of rv-type arrays with increasing number of repeats. Here, the fits to a homopolymer repeat model are displayed together with the resulting values for ∆Gunit and ∆Gnn. A heteropolymer helix model that treated the A- and B-helices differently was not fitted as it would result in over-parametrization of the data (6 free parameters versus only 3 used for the homopolymer repeat model). Experiments were performed in technical triplicates in 96-well plate format, and all data are represented as averages with corresponding standard errors.
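
The fraction-unfolded values plotted in Fig. S3 follow from the raw fluorescence signal through the linear baseline correction of eqs. (S1) and (S2) in Section V C. The following is a minimal Python sketch of that conversion, not the authors' analysis code: the function name, the synthetic sigmoidal transition and all baseline values are assumptions made purely for illustration.

```python
import numpy as np


def fraction_unfolded(F, D, alpha_N, beta_N, alpha_U, beta_U):
    """Return 1 - theta following eq. (S2).

    F        : fluorescence signal (array-like)
    D        : denaturant concentration in M (array-like, same shape as F)
    alpha_N, beta_N : intercept and slope of the native baseline
    alpha_U, beta_U : intercept and slope of the unfolded baseline
    """
    F = np.asarray(F, dtype=float)
    D = np.asarray(D, dtype=float)
    native = alpha_N + beta_N * D      # baseline at low denaturant
    unfolded = alpha_U + beta_U * D    # baseline at high denaturant
    return (native - F) / (native - unfolded)


# Round trip on synthetic data (all numbers below are hypothetical).
D = np.linspace(0.0, 6.0, 7)
theta = 1.0 / (1.0 + np.exp(3.0 * (D - 3.0)))   # made-up folded fraction
# Forward model, eq. (S1): signal as a population-weighted sum of baselines.
F = (1.0 + 0.02 * D) * theta + (0.3 + 0.01 * D) * (1.0 - theta)
recovered = fraction_unfolded(F, D, 1.0, 0.02, 0.3, 0.01)
```

In practice the four baseline parameters would come from the fits described in Section V C; here they are simply chosen so that the round trip can be checked.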

[Figure S4. Panels A to D: force-distance traces (scale bars 5 or 10 pN and 100 nm) at pulling speeds from 10 nm/s to 5000 nm/s. Panel E: hysteresis (kBT) versus pulling speed (nm/s, logarithmic axis) for rv-type and a-type arrays.]

FIG. S4. Hysteresis of CTPR unfolding increases slightly with higher loading rates. (A,B) Consecutive FDCs of one CTPRa9 (green) and one CTPRrv5 molecule (blue) acquired at 1 µm/s highlight the variation observed within a single molecule in the force response at higher pulling speeds. (C,D) Representative traces of the same molecules collected at five different pulling speeds. In all cases the unfolding (darker colours) and refolding traces (lighter colours) are overlaid to highlight the absence or presence of hysteresis. (E) The area under the FDCs was calculated to obtain first estimates of the unfolding and refolding free energies. Using the unfolding and refolding energies it is possible to quantify the hysteresis for individual stretch-relax cycles, here shown as mean with corresponding standard deviations to highlight the increase in variation at the higher pulling speeds. More importantly, this graph clearly shows that hysteresis is negligible for pulling speeds ≤100 nm/s.

FIG. S5. Zipper and skip approximations of the heteropolymer helix model result in comparable values for ∆Gtot, ∆Gunit and ∆Gnn. (A,C) Scatter plot of resulting intrinsic repeat energy and next-neighbour interaction energy for each molecule obtained from a heteropolymer helix model with either skip or zipper approximation. Colours/symbols: filled symbols, rv-type; empty symbols, a-type; circles, ybbR attachments; squares, cysteine attachments; colours represent array lengths (see (B,D)). (B,D) The respective total energy ∆Gtot = N∆Gunit + (N − 1)∆Gnn for each array length of rv-type (filled symbols) and a-type (empty symbols). Error bars represent the SEM, but are too small to be seen.

FIG. S6. Model selection.
(A) The homopolymer repeat model (dashed line) fails to reproduce the curvature at the transition between the DNA stretch response and the protein unfolding plateau for CTPRrv10, while the heteropolymer helix model (continuous line) fits well. (B) The homopolymer helix model has higher fit residuals (top) than the heteropolymer helix model (middle) when fitting CTPRrv5 data (bottom). Black line: fit line of heteropolymer helix model. (C) Akaike information criterion (AIC) for the four different interaction models. Reported is the average over all molecules. (D) Comparison of the AIC calculated for the Zipper and Skip approximations of all molecules (N = 3, 5, 9, 10, 20) to which the Skip approximation could be fitted.

FIG. S7. Predicted unfolding of a 26-repeat protein. (A) Experimental force-distance profile (purple) fitted with a heteropolymer zipper model (continuous black line). Dashed black line: Corresponding prediction from the Skip approximation. Roman letters point to corresponding panels in B. (B) Individual columns represent the ten likeliest configurations at the indicated distances. The likelihood of a particular configuration is shown on top. Color code: Colored stretches are folded, grey/white stretches are unfolded.
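
Fig. S6C,D rank the interaction models by their AIC, where a lower value indicates a better trade-off between goodness of fit and number of free parameters. The text here does not state which form of the AIC was used, so the sketch below assumes the common least-squares form, AIC = n ln(RSS/n) + 2k, with n data points, k free parameters and RSS the residual sum of squares. The function name, the residual values and the parameter counts are hypothetical.

```python
import numpy as np


def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit, assuming Gaussian errors of unknown variance.

    AIC = n * ln(RSS / n) + 2 * k  (constant terms dropped, so only
    differences between models fitted to the same data are meaningful).
    """
    r = np.asarray(residuals, dtype=float)
    n = r.size
    rss = float(np.sum(r ** 2))
    return n * np.log(rss / n) + 2 * n_params


# Hypothetical fit residuals (in pN) of two models fitted to the same curve.
residuals_repeat = [0.4, -0.5, 0.6, -0.4, 0.5, -0.6]   # fewer parameters, worse fit
residuals_helix = [0.1, -0.1, 0.2, -0.1, 0.1, -0.2]    # more parameters, better fit

aic_repeat = aic_least_squares(residuals_repeat, n_params=3)
aic_helix = aic_least_squares(residuals_helix, n_params=6)

# The model with the lower AIC is preferred.
preferred = "helix" if aic_helix < aic_repeat else "repeat"
```

With these made-up residuals the improvement in fit outweighs the penalty for the three extra parameters, which is the kind of comparison summarised per molecule in Fig. S6C.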

[Figure S8. Panels A to C: force (pN) versus trap distance (nm) for CTPRrv3, CTPRrv5, CTPRrv10, CTPRrv20, CTPRrv26, CTPRa5 and CTPRa9, each above a colour map of helix # (N- to C-terminus) versus trap distance, colour scale p folded (0 to 1). Panel C annotations: pair, single helix.]

FIG. S8. Unfolding profiles for all measured CTPRrv (A) and CTPRa (B) constructs. Colour maps represent the probability for each helix to be folded as a function of trap distance (please note that indexing proceeds from the C-terminus to the N-terminus in this case). (C) A zoom of the CTPRa9 data exemplifies how unfolding starts at the N- and C-termini: in all cases, unfolding starts with the C-terminal helix, and proceeds with the unfolding of (more or less) paired helices from both ends.

[Figure S9. Histograms of frequency versus ∆Lseed (nm). Annotations: mean seed contour length 32.7 ± 0.2 nm; mean seed contour length 35.0 ± 0.3 nm.]

FIG. S9. Contour length histograms of the final “dip”, as measured (roughly) from the end of the plateau to the unfolded contour. Shown are data extracted from FDCs collected at 10 and 100 nm/s of CTPRrv (blue) and CTPRa (green). The mean and standard errors for each repeat type are shown.
As a reference, the expected contour length increase corresponding to on average 6 helices unfolding is approximately 34 nm, while that of 7 helices unfolding is 38 nm (differences between the two repeat types are less than 1 nm).

[Figure S10. Panels A and B: force (pN) versus distance (nm), each above a colour map of repeat # (1 to 5, N- to C-terminus) versus distance, colour scale p folded (0 to 1).]

FIG. S10. Simulated FDCs for a consensus ankyrin repeat protein in (A) an optical tweezers set-up and (B) under conditions similar to AFM in which linker molecules are much shorter and the protein is tethered between a surface and a much stiffer cantilever. Here we used the structure of the consensus ankyrin NI3C modelled using the I-Tasser webserver (using all default values [1]), and previously reported values for the energetic parameters of ∆Gunit = 5.56 kBT and ∆Gnn = −24 kBT [2]. Please note that, given this particular structure, our results indicate unfolding from the N-terminus to the C-terminus, which is contrary to previous findings.

[Figure S11. Panel A: ∆Lc (nm) versus KD (pN), pD (nm) and LD (nm). Panel B: apparent LD (nm) versus number of repeats, with LN indicated; linear fit: 0.8(±0.1) nm × N + 349(±2) nm.]

FIG. S11. Fitting DNA-WLCs to raw FDCs without explicitly using a model for protein folding (Ising or other). (A) There is no indication for a dependence of the protein contour length on any of the DNA parameters. (B) The fitted contour lengths of the tethered constructs are compatible with predictions from the crystal structure. With a rough linear fit, we can estimate an end-to-end distance for CTPRrv20, LN ≈ 16 nm, based on the increase in the contour length of the full construct (comprising DNA and folded protein) with increasing number of repeats. This value agrees with the crystallographic value (Fig. 1).

II. SUPPLEMENTARY TABLES

TABLE S1.
Data collection, phasing and BUSTER refinement statistics for the CTPRrv4 structure. Values in parentheses are for the outermost shell.

Parameters and statistics                   PDB ID: 7obi
Data collection
  Space group                               P31 2 1
  Unit cell a, b, c (Å)                     58.912, 58.912, 189.517
  Unit cell α, β, γ (°)                     90.00, 90.00, 120.00
  Resolution range (Å)                      51.02 - 3.00 (3.11 - 3.00)
  Total reflections                         16284 (1550)
  Unique reflections                        8153 (775)
  Multiplicity                              2.0 (2.0)
  Completeness (%)                          99.6 (96.9)
  I/σI                                      20.0 (1.3)
  Rmerge                                    0.017 (0.530)
  CC1/2                                     1.000 (0.858)
Refinement
  Rwork/Rfree, %                            0.226/0.271
  Unique reflections used                   8152
  R.m.s. deviations:
    bond lengths (Å)                        0.009
    bond angles (°)                         0.96
  Ramachandran analysis:
    Favoured (%)                            98.11
    Allowed (%)                             2.89
    Outliers (%)                            0.00
  Number of atoms (average B-factor, Å2):
    Protein                                 2187 (131.04)
    Ligands                                 20 (177.48)
  Mean/Wilson B-factor (Å2)                 131.46/114.92

TABLE S2. Repeat plane angles calculated for both CTPRa and CTPRrv arrays. Values are presented as mean ± s.e.m. of the three repeat interfaces present in the unit cell of the crystal structure, or of the 19 interfaces present in the structure of a 20 repeat model based on symmetry transformation. Cumulative angles are shown to highlight the differences between the repeat types in small and long arrays. Chain A and B of the CTPRrv crystallographic units produced values within error, hence only values for chain A are shown here.

Type     Number   Curvature [°]             Twist [°]                    Bending [°]
                  x̄            ∑x           x̄              ∑x            x̄            ∑x
CTPRa    4        28 ± 1       83 ± 4       13.07 ± 0.03   39 ± 0.12     22.7 ± 0.4   68 ± 1.6
CTPRa    20                    497 ± 20                    256 ± 0.6                  444 ± 8
CTPRrv   4        32 ± 2       95           12 ± 1         37            18 ± 2       55
CTPRrv   20       31.6 ± 0.6   601          11.4 ± 0.7     217           19.1 ± 0.7   364

TABLE S3. Fitted energy parameters in units of kBT. N is the number of repeats (Zipper approximation). Intrinsic repeat energy ∆Gunit and repeat next-neighbour interaction energy ∆Gnn (see eq. (S15)). ∆Gtot = N∆Gunit + (N − 1)∆Gnn is the total energy for an N-mer.

                 Heteropolymer helix model                    Heteropolymer helix NNN model
Type  N          ∆Gtot          ∆Gunit       ∆Gnn            ∆Gtot          ∆Gunit       ∆Gnn
rv    3          −18.4 ± 0.9    0.8 ± 1.1    −10.3 ± 1.5     −18.4 ± 1.0    0.0 ± 1.3    −9.2 ± 1.4
rv    5          −39.7 ± 0.4    1.5 ± 0.3    −11.8 ± 0.3     −39.4 ± 0.5    1.3 ± 0.3    −11.5 ± 0.3
rv    10         −87.0 ± 2.7    1.2 ± 0.3    −11.0 ± 0.1     −87.1 ± 2.6    1.3 ± 0.4    −11.2 ± 0.3
rv    20         −173.3 ± 2.3   1.0 ± 0.3    −10.2 ± 0.2     −173.7 ± 2.6   1.2 ± 0.3    −10.4 ± 0.3
rv    26         −236.7 ± 2.4   0.5 ± 0.2    −10.0 ± 0.3     −238.9 ± 2.2   0.5 ± 0.2    −10.1 ± 0.2
rv    combined                  1.1 ± 0.2    −11.0 ± 0.2                    1.0 ± 0.2    −10.8 ± 0.2
a     5          −61.3 ± 0.6    −2.4 ± 0.4   −12.4 ± 0.4     −61.6 ± 0.6    −2.8 ± 0.3   −11.9 ± 0.4
a     9          −117.9 ± 1.6   −1.3 ± 0.3   −13.3 ± 0.4     −119.0 ± 1.5   −1.6 ± 0.4   −13.1 ± 0.3
a     combined                  −1.9 ± 0.3   −12.7 ± 0.3                    −2.3 ± 0.3   −12.4 ± 0.3

III. MATERIALS

All reagents were purchased from Sigma Aldrich, New England Biolabs (NEB), ThermoFisher, Merck or Asco Chemicals unless otherwise stated. 2x yeast tryptone (2xYT) and Lysogeny Broth (LB) Miller were purchased from Formedium. Unmodified DNA oligonucleotides were purchased from Integrated DNA Technologies (IDT) or Sigma Aldrich. Synthetic genes were purchased from IDT. FastDigest restriction enzymes (ThermoFisher), Phusion High-Fidelity DNA polymerase (NEB), and QuickStick Ligase (Bioline, discontinued) or the Anza T4 Ligase Master Mix (Invitrogen) were used for all cloning processes. E. coli strains for molecular biology were purchased from Bioline (α-select Competent Cells, Gold/Bronze Efficiency, discontinued) or NEB (NEB 5-alpha Competent E. coli, High efficiency). E. coli cells for expression were generated in house from C41 cells obtained from the Kommander Lab (MRC-LMB, Cambridge). All constructs were expressed in vectors based on a pRSET backbone (Ampicillin resistance).

IV. PROTEIN SEQUENCES

The majority of CTPRs used for this study are based on the consensus sequence containing (a) the terminal RS residues arising from the BglII restriction site that is required for constructing longer repeat arrays [3, 4], and (b) the QK mutation for charge balancing of the final repeat protein [5].
The four-repeat construct used for crystallography was purchased as a synthetic gene, and contained the consensus asparagine residues at the repeat termini as well as a solvating helix. In the following sequences the pre/suffixes c and y identify cysteine and ybbR-tag attachment points for handles. A subscript after a closing parenthesis, written here as _N or _4, gives the number of times the bracketed repeat occurs.

(CTPRrv)N
MRGSHHHHHHGLVPRGS(AEALNNLGNVYREQGDYQKAIEYYQKALELDPRS)_N

y(CTPRrv)Ny
MRGSHHHHHHGLVPRGSDSLEFIASKLA(AEALNNLGNVYREQGDYQKAIEYYQKALELDPRS)_NDSLEFIASKLA

c(CTPRrv)Nc
MRGSHHHHHHNNNNNNNNNNENLYFQGCGS(AEALNNLGNVYREQGDYQKAIEYYQKALELDPRS)_NKLC

CTPRrv4 (crystallography)
MRGSHHHHHHGLVPRGS(AEALNNLGNVYREQGDYQKAIEYYQKALELDPNN)_4AEALNNLGNVQRKQG

(CTPRa)N
MRGSHHHHHHGLVPRGS(AEAWYNLGNAYYKQGDYQKAIEYYQKALELDPRS)_N

y(CTPRa)Ny
MRGSHHHHHHNNNNNNNNNNENLYFQGDSLEFIASKLAGS(AEAWYNLGNAYYKQGDYQKAIEYYQKALELDPRS)_NKLDSLEFIASKLA

c(CTPRa)Nc
MRGSHHHHHHNNNNNNNNNNENLYFQGCGS(AEAWYNLGNAYYKQGDYQKAIEYYQKALELDPRS)_NKLC

V. EXPERIMENTAL METHODS

A. Molecular biology

[Figure S12. Panel A: TPR(M) vector digested with BglII + HindIII and TPR(N) insert (PCR product or vector) digested with BamHI + HindIII, ligated and transformed to give TPR(M+N). Panel B: vector with H6 tag, N10 spacer, thrombin or TEV site, BamHI (GS), protein of interest, HindIII (KL), ybbR/cys attachment sites and stop codons.]

FIG. S12. Schematics illustrating (A) the BamHI-BglII cloning method required to create longer CTPR arrays, and (B) the vector backbone construct developed to facilitate N- and C-terminal modification of proteins for force spectroscopy.

1. Mutagenesis

For Round-the-Horn site-directed mutagenesis (RTH-SDM, [6, 7]), 100 µM primers containing the required mutation/insertion in the overhang were phosphorylated using polynucleotide kinase (ThermoFisher) according to the manufacturer’s protocol. Phosphorylated primers were stored at −20 °C until required. The mutation was inserted by PCR, and products were DpnI-digested and gel-purified.
About 50 to 100 µg of DNA material was added to 1 µL Anza T4 Ligase Master Mix in a total volume of 4 µL, incubated for 10 to 20 min at room temperature and transformed into E. coli. Plasmids were isolated from individual colonies and tested for the presence of the correct mutation/insertion by Sanger sequencing (Eurofins).

2. General repeat array construction

DNA constructs of CTPR proteins in a pRSET backbone were built sequentially from one, two and four repeat modules using BamHI/BglII cloning as previously described [8]. CTPR repeats are preceded by a BamHI restriction site and followed by a BglII restriction site, double stop codon and HindIII restriction site (Fig. S12). A vector containing M repeats was digested using BglII, HindIII and FastAP Thermosensitive Alkaline Phosphatase (ThermoFisher) according to the manufacturer's specifications, and purified using the QIAquick gel extraction protocol. Inserts of up to two repeats were produced by PCR amplification using T7-forward and -terminator sequencing primers. The PCR product was purified according to the QIAquick PCR purification protocol, and digested using BamHI and HindIII followed by heat-inactivation of the enzymes according to the manufacturer's specifications. Inserts containing more than two repeats were obtained by restriction digest using BamHI and HindIII and gel extraction. Since BamHI and BglII produce the same 5’-overhangs, the N-repeat construct was then ligated directly into the vector using QuickStick (according to the manufacturer’s protocol) or Anza T4 ligase (reduced reaction volume as described above), transformed into high efficiency E. coli cells, and the plasmid was purified according to QIAGEN protocols. The whole procedure was repeated until the desired number of repeats was obtained.
Using synthetic genes of single repeats, all constructs without tags for DNA attachment were generated this way, and were subsequently used to produce the tagged variants. The construct used for crystallization was obtained as a synthetic gene (Integrated DNA Technologies) and was sub-cloned using the BamHI and HindIII restriction sites. For short arrays (e.g. up to 8 repeats) DNA sequencing could verify the exact number of repeats. Longer arrays were sequenced from both termini to verify the exact cloning boundaries and digested using BamHI and HindIII to determine the number of repeats.

3. Construction of yCTPRrv3y and yCTPRrv5y

Using RTH-SDM, the 11-amino acid ybbR-tag (DSLEFIASKLA) was inserted sequentially between (a) the BamHI restriction site and a TPR, and (b) the BglII site and the stop codons in a construct containing only one repeat (see Fig. S12A, Tab. S4). After digestion with BglII, two and four repeats obtained from BamHI-BglII digests were added at once. The correct orientation of the inserts was identified by restriction digest and Sanger sequencing.

4. Construction of yCTPRrv10y, yCTPRrv20y and yCTPRrv26y

First, ybbR-tags were introduced by RTH mutagenesis directly adjacent to the repeat sequence either N-terminally or C-terminally of a single repeat, giving rise to yCTPRrv1 and CTPRrv1y, respectively. Second, the required number of repeats were added to yCTPRrv1 two or four repeats at a time, resulting in yCTPRrv9, yCTPRrv19 and yCTPRrv25. Last, the C-terminally tagged repeat was added to produce constructs with 10, 20 and 26 repeats that contained both N- and C-terminal ybbR-tags.

5. Construction of yCTPRa5y, yCTPRa9y, cCTPRrv5c and cCTPRa5c

To facilitate ybbR-tagged construct generation, a pRSET vector was modified using RTH-SDM to contain an N-terminal ybbR-tag between TEV cleavage and BamHI restriction sites, and a C-terminal ybbR-tag between the HindIII restriction site and a stop codon (Fig. S12B, Tab. S4). The restriction sites give rise to additional amino acids between the individual ybbR-tags and the protein: GS at the N-terminus and KL at the C-terminus. CTPRa5 and CTPRa10 were assembled in this vector by BamHI/BglII cloning. However, the last two repeats inserted were obtained by a PCR omitting the stop codons (Tab. S4) such that the C-terminal ybbR-tag was in frame. Recombination of CTPRa10 by E. coli resulted in a 9-repeat instead of a 10-repeat construct. Since the exact repeat number was irrelevant to our study, we proceeded with this construct. Due to recombination it was not possible to obtain any CTPRa constructs with ≥ 10 repeats.

Proteins containing terminal cysteine residues were created in a similar manner using the same vector but with each ybbR-tag exchanged to a single cysteine (Tab. S4). The CTPRa5 was transferred directly from the corresponding ybbR construct, while the CTPRrv5 had to be re-assembled from a 4-repeat construct fused to a repeat obtained by PCR and without stop codon (Tab. S4).

B. Protein preparation

N-terminally H6-tagged CTPR proteins were transformed into C41 E. coli and plated on LB Agar containing 100 µg/mL ampicillin. All colonies were used to inoculate 0.5 L of 2xYT media, the culture was grown at 37 °C until an optical density between OD600 = 0.6 and OD600 = 0.8 was reached, and protein expression was induced with 0.5 mM IPTG over 3 to 5 hours at 37 °C. After lysis the cell suspension was heated to 70 to 80 °C in a water bath to denature the majority of soluble cellular contaminants.
The soluble protein was separated from denatured and insoluble protein fractions by centrifugation for 30 min at 35 000×g, filtered through a 0.22 µm PES membrane and applied to a 5 mL HisTrap Excel column connected to an Äkta Pure chromatography system and equilibrated in wash buffer (50 mM Tris-HCl pH 7.5, 500 mM NaCl, 20 mM imidazole, SIGMAFAST Protease Inhibitor Cocktail (Sigma), DnaseI (Sigma), Lysozyme (Sigma)). The column was washed with 20 column volumes of wash buffer before proteins were eluted using a high-imidazole buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 300 mM imidazole). All fractions containing protein were pooled, and if necessary, concentrated using a Vivaspin centrifugal concentrator (Sartorius) with the appropriate molecular weight cutoff.

TABLE S4. Sequences of DNA oligonucleotides used for molecular biology.

Name                      DNA sequence (5’ → 3’)
NybbR Fw                  TGCTAGTAAGCTTGCGGCAGAAGCACTGAATAATCTGGG
NybbR Rev                 ATAAATTCAAGAGAATCGGATCCACGCGGAACCAG
CybbR Fw                  TGCTAGTAAGCTTGCGTAATAAAAGCTTGATCCGGC
CybbR Rev                 ATAAATTCAAGAGAATCAGATCTCGGGTCCAGTTCC
pRSETa NybbR Fwd          TGCTAGTAAACTTGCGGGATCCGACCTCGAGATCTGC
pRSETa NybbR Rev          ATAAATTCAAGAGAATCGCCCTGAAAATACAGGTTTTCGTTG
pRSETa CybbR Fwd          TGCTAGTAAACTTGCGTGAGATCCGGCTGCTAACAAAGCCC
pRSETa CybbR Rev          ATAAATTCAAGAGAATCAAGCTTCGAATTCCATGGTACC
CTPRa2 BamHI Fwd          TGCATGCGGATCCGCCGAGGCGTGGTATAATCTAGG
CTPRa2 RS+HindIII Rev     GCATGCATAAGCTTAGATCTTGGGTCGAGTTCTAGGGCC
pRSET Ncys Fwd            TGTGGATCCGACCTCGAGATCTGC
pRSET Ncys Rev            GCCCTGAAAATACAGGTTTTCGTTG
pRSET Ccys Fwd            TGCTGAGATCCGGCTGCTAACAAAGCCC
pRSET Ccys Rev            AAGCTTCGAATTCCATGGTACCAGC
CTPR RV1 BamHI Fwd        TGCATGCGGATCCGCAGAAGCACTGAATAATCTGGGTAATGTTTATCG
CTPR RV1 HindIII Rev      GCATGCATAAGCTTAGATCTCGGGTCCAGTTCCAGCGC

The protein was then further purified by size exclusion chromatography using a HiLoad 26/600 Superdex 75 pg or HiLoad 16/600 Superdex 75 pg (GE Healthcare) equilibrated in either Tris or phosphate buffer (50 mM Tris-HCl pH 7.5 or 50 mM sodium phosphate pH 6.8, 150 mM NaCl). Constructs with 10 repeats or more exhibited significant recombination resulting in proteins that had a decreasing number of repeats. Hence, only the first few fractions of the elution peak were pooled for concentration, while >60 % of the fractions had to be discarded.

The CTPRrv4 construct used for crystallography was purified essentially as above but in 50 mM sodium phosphate pH 6.8, 150 mM NaCl based buffers. After elution from the resin with buffer containing imidazole, the protein was dialysed against 50 mM sodium phosphate pH 6.8, 150 mM NaCl for 18 hours, in the presence of thrombin (MP biomedical) to remove the H6-tag from the construct. The protein was further purified using a HiLoad 26/600 Superdex 75 pg column (GE Healthcare) equilibrated in 10 mM HEPES pH 7.5, 150 mM NaCl, and concentrated to 20 mg/mL.

The intact mass of all constructs was confirmed by mass spectrometry.

C. Equilibrium denaturation

Samples of a total volume of 150 µL were prepared in a 96-well format (Greiner, medium-binding), in 50 mM sodium phosphate pH 6.8, 150 mM NaCl with guanidinium hydrochloride (GdnHCl) gradients of 0 to 4.5 M (CTPRrv2 and yCTPRrv3y) or 0 to 7 M (all other proteins) [9]. The exact denaturant concentration was calculated using the refractive indices of the native and denaturing buffers. A semi-automatic Hamilton Syringe unit was used to dispense the denaturant gradient. The final protein concentration was adjusted for each construct, depending on repeat type (presence/absence of one tryptophan per repeat) and array length, and ranged from <1 µM (large CTPRrv and all CTPRa constructs) to >11 µM (CTPRrv2).
Samples were incubated on an orbital shaker at 25 °C for 2 h. Tryptophan residues were excited at 295 ± 10 nm and fluorescence was monitored at 360 ± 10 nm using a CLARIOStar microplate reader (BMG Labtech). Due to the deletion of tryptophan residues from the CTPRrv variant, tyrosine residues were excited at 280 ± 10 nm and their fluorescence measured at 330 ± 10 nm. The data from 9 reads were averaged and normalised. The resulting fluorescence curve, F, was converted to the fraction of folded, θ, or unfolded protein, 1 − θ, using

\[ F = (\alpha_N + \beta_N D)\,\theta + (\alpha_U + \beta_U D)\,(1 - \theta) \tag{S1} \]

or

\[ 1 - \theta = \frac{-F + \alpha_N + \beta_N D}{\alpha_N - \alpha_U + (\beta_N - \beta_U) D}, \tag{S2} \]

where D is the denaturant concentration, and αN + βND and αU + βUD describe the baselines at low (native) and high (unfolded) denaturant concentrations. Parameters for the baselines were extracted either by fitting a two-state unfolding equation to the whole data set or from two separate linear fits to the baselines only.

To extract the intrinsic and interfacial energies (∆Gunit and ∆Gnn) a homopolymer repeat Ising model was globally fit to denaturation data of un-tagged constructs with N = 2, 4, 5, 8 and 10 repeats using the PyFolding suite [10], the code of which is based on the formalism developed by Barrick and co-workers [11]. We did not fit a heteropolymer helix model as this would lead to overparametrization (6 free parameters vs. 5 data sets).

D. Crystallography

CTPRrv4 at 20 mg/mL was crystallised in JCSG-plus screen, well B10 (0.2 M MgCl2, 0.1 M sodium cacodylate, pH 6.5 and 50% v/v PEG 200, Molecular Dimensions) in sitting drop plates (SwissSci, Molecular Dimensions) with 600 nL droplets in 1:1 and 1:2 ratios of protein to well solution. Crystals were looped and flash frozen without further cryoprotectants. Crystals diffracted to 3.0 Å resolution on beamline I04 at Diamond Light Source (Oxford, UK). The data were processed using autoPROC [12] with the determination of diffraction limits set by a local I/σI ≥ 1.50.
The phase was solved by molecular replacement using a CTPRa4 structure (PDB accession code: 2hyz) with two molecules in the asymmetric unit. Refinements were performed using BUSTER version 2.10.3 [13, 14] and iterative model building in Coot [15]. We conservatively modelled phosphate molecules in the concave face of the TPR superhelix, since this buffer was present during all purification steps prior to size exclusion chromatography. Further details on collection and refinement statistics can be found in Table S1. Models of proteins containing more than 4 repeats were created by symmetry transformation in PyMOL, and missing residues and peptide bonds, e.g. between individual 4-mers, were added using MODELLER [16].

E. Calculation of plane angles

Changes in geometry between different repeat protein structures can be measured on two levels: (a) by comparing the whole repeat array (e.g. the superhelical arrangement in the case of TPRs), or (b) by comparing the angular differences between repeat planes. Dimensions of the TPR superhelix were estimated using the “Structure Measurements” tool of UCSF Chimera [17] and 20-repeat models of both repeat types. Calculations for obtaining angles between repeat planes were adapted from Forwood et al. [18]. In brief, a principal component analysis (PCA) is performed on the Cα-atom coordinates of each repeat, omitting the inter-repeat loops, to calculate the principal components (PCs, Fig. S13A) that are orientated along the length (PC1, purple), width (PC2, blue) and depth (PC3, green) of the repeat. As previously reported, curvature is defined as the angle between the respective PC2s of repeats i and i + 1 projected onto the plane of repeat i + 1, twist is the angle between PC1s projected onto the plane formed by PC1i+1 and PC3i+1, and lateral bending is the angle of PC3s projected onto the plane formed by PC1i+1 and PC3i+1 (Fig. S13B). Next, some conventions were introduced to ensure the correct direction (positive or negative) of the angle: (i) PC1 always has the same orientation as the superhelical axis, which is defined by the right-hand-rule from the N- to C-terminal direction of the polypeptide chain [19], (ii) PC3 points into the same direction as a vector from the centroid of repeat i to the centroid of repeat i + 1, and (iii) PC2 has the same direction as the cross product of PC3 with PC1. All calculations were performed using custom-written Python scripts with NumPy and Matplotlib extensions [20–23].

FIG. S13. Visualisation of principal components fitted to repeat planes. (A) Sketch of alignment of PC1-3 with TPR repeats. (B) Schematic representation of how PC1-3 are used to calculate angles for curvature, twist and bending.

F. Circular dichroism spectroscopy

Proteins used for circular dichroism spectroscopy (CD) were buffer exchanged into 10 mM sodium phosphate pH 6.8, 50 mM NaCl, 1 mM DTE using PD10 minitrap columns (Cytiva), and diluted to approximately 2 µM. CD measurements were performed on a Chirascan CD spectrometer (Applied Photophysics) using 1 mm path-length cuvettes (Precision Cells, 110-QS, Hellma Analytics). CD spectra were recorded between 200 and 280 nm at a bandwidth of 1 nm with a rate of 0.5 s/nm. The data of five scans were averaged and converted to mean residue ellipticity to account for differences in the measured concentrations and in construct length (see Section IV). Uncertainties were estimated based on the standard error of the mean of the CD readings and a 10% error to approximate uncertainties in concentration.

G. Force spectroscopy experiments

1. Sample preparation

Protein-DNA chimeras based on Sfp-mediated conjugation were essentially produced as described previously [24, 25].
Reaction volumes of 50 to 100 µL containing 50 mM HEPES pH 7.5, 10 mM MgCl2, 10 µM ybbR-tagged protein, 20 µM CoA-oligo (Biomers) and 10 µM Sfp-synthase (made in-house; the plasmid was a kind gift from the Gaub Lab at the LMU, Munich) were incubated overnight at room temperature. If necessary, yields of the desired product were increased by performing the reaction with 40 µM CoA-oligo and 20 µM Sfp-synthase.

Protein-DNA chimeras based on cysteine-maleimide reactions were produced as described previously [26]. In brief, proteins were reduced with a 10-fold excess of TCEP (Sigma Aldrich) for at least 30 min, desalted into phosphate-buffered saline (PBS) using a HiTrap Desalting 5 mL column (GE Healthcare), and reacted with a 10-fold excess of DBCO-maleimide (Sigma Aldrich) for at least 2 h. After renewed desalting, 10 µM protein was reacted with 20 µM azide oligo (Integrated DNA Technologies) in 100 µL volumes overnight at 37 °C in an orbital shaker.

Samples were purified using a Superdex 200 10/300 GL (GE Healthcare) or YMC Pack Diol-300 (Yamamura Chemical Research) column equilibrated in 50 mM Tris-HCl pH 7.5, 150 mM NaCl. Fractions containing protein conjugated to two oligos were identified by SDS-PAGE, and 4 to 10 µL of those fractions were incubated with 100 to 200 ng biotin- or digoxigenin-functionalised DNA handles at room temperature for at least 30 min. Less than 1 µL of that mixture was added to anti-digoxigenin beads in 10 µL measuring buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl) and incubated for less than 5 min. Then, 0.5 to 0.7 µL of this mixture was added to 50 µL containing streptavidin beads and an oxygen scavenger system consisting of 0.65% (w/v) glucose (Sigma), 13 U/mL glucose oxidase (Sigma), and 8500 U/mL catalase (Calbiochem). Anti-digoxigenin and streptavidin beads were produced in-house using carboxyl-functionalised 1 µm beads (Bangs Laboratories) [27].
The final mixture was introduced into a home-built chamber that had been blocked with 10 mg/mL BSA for at least 5 min and washed twice with measuring buffer.

2. Data acquisition

All experiments were conducted on a custom-built, dual-beam setup with back-focal plane detection, with both traps having a stiffness of 0.25 to 0.35 pN/nm. An acousto-optical deflector was used to move one bead away from (or towards) the other at speeds ranging from 10 nm/s to 5 µm/s. Bead positions were tracked using a photo-diode detector. Signals were filtered at 50 kHz using an 8-pole Bessel filter, acquired at 100 kHz and downsampled to 20 kHz before storage. Averaged force-distance curves (FDCs) were obtained from constant-velocity pulling cycles at ≤100 nm/s, where there was no detectable hysteresis, by binning and averaging several stretch FDCs at typically 100 different trap distances.

VI. DATA ANALYSIS OF RAW FECS AND FDCS

A. Fitting of raw FECs

Force-extension curves (FECs) were fit with

$$F_\mathrm{eWLC}(\xi) = \frac{k_BT}{p_D}\left[\frac{1}{4\left(1-\frac{\xi}{L_D}\right)^2} - \frac{1}{4} + \frac{\xi}{L_D} - \frac{F_\mathrm{eWLC}}{K}\right] \tag{S3}$$

to model the DNA force response [28] and

$$F_\mathrm{WLC}(\xi, c) = \frac{k_BT}{p_p}\left[\frac{1}{4\left(1-\frac{\xi}{L_c}\right)^2} - \frac{1}{4} + \frac{\xi}{L_c}\right] \tag{S4}$$

to model the unfolded polypeptide [29], where $\xi$ is the extension, $k_B$ is the Boltzmann constant, $T$ the temperature, $p_D$ the persistence length of DNA, $L_D$ the contour length of the DNA and $K$ its elastic stretch modulus, and $p_p$ and $L_c$ are the persistence and contour length of the protein, respectively. Theoretical and measured protein contour lengths are listed in Tab. S5. On average, we found that $p_D = 21.6 \pm 0.6$ nm, $K = 730 \pm 70$ pN and $p_p = 0.70 \pm 0.01$ nm (mean ± SEM). $L_D$ correlated with the number of repeats (see Fig. S11).

TABLE S5. Expected and measured contour lengths of CTPRa proteins. End-to-end distances $\lvert\Delta\vec{r}\rvert$ are measured between the Cα atoms of the first and last amino acids.
The exact length of the ybbR-tags differs between CTPRrv (12 amino acids) and CTPRa (16 amino acids) constructs due to cloning boundaries. All values are in nm. Calculated contour lengths of the attachment tags are 4.38 nm and 5.84 nm for the ybbR tags of the rv- and a-type proteins, respectively, and 2.19 nm for the cysteine attachments. Measured values are reported as the mean of all molecules and the corresponding standard error.

| Protein | No. molecules | Mean no. of traces used for averaging | $L_\mathrm{calc}$ (a) | $\lvert\Delta\vec{r}\rvert$ | $L^*_\mathrm{calc}$ (b) | $L_c$ |
|---|---|---|---|---|---|---|
| yCTPRrv3y | 4 | 12 | 41.61 | 3.07 | 34.16 | 30.9 ± 0.4 |
| yCTPRrv5y | 5 | 7 | 66.43 | 4.19 | 57.86 | 56.9 ± 0.7 |
| yCTPRrv10y | 4 | 10 | 128.48 | 7.22 | 117.22 | 116 ± 2 |
| yCTPRrv20y | 7 | 6 | 252.58 | 14.59 | 233.61 | 222 ± 3 |
| yCTPRrv26y | 12 | 5 | 327.04 | 18.86 | 303.8 | 297 ± 1 |
| yCTPRa5y | 11 | 5 | 67.89 | 4.69 | 57.36 | 55.7 ± 0.5 |
| yCTPRa9y | 15 | 6 | 117.53 | 7.65 | 104.04 | 97.8 ± 0.8 |
| cCTPRrv5c | 19 | 7 | 64.24 | 4.19 | 57.86 | 52.2 ± 0.5 |
| cCTPRa5c | 12 | 5 | 64.24 | 4.69 | 57.36 | 52.6 ± 0.6 |

(a) $L_\mathrm{calc} = 0.365\,\mathrm{nm} \cdot N_\mathrm{residues}$
(b) $L^*_\mathrm{calc} = L_\mathrm{calc} - \lvert\Delta\vec{r}\rvert - L_\mathrm{tag}$

B. Extracting average unfolding and refolding forces

Due to the nature of the unfolding transition of the repeat arrays, it was not possible to extract unfolding forces in the traditional sense, i.e. the force at which a protein or a subdomain unfolds completely (the force peak). The force data were processed using Igor Pro (Wavemetrics) and analysed further in Python. The data of each force curve were binned into a histogram, giving rise to clear peaks corresponding to the baseline and the unfolding plateau (Figure S14A). The positions of these peaks were extracted from the histogram by fitting a sum of two Gaussian functions and a linear dependence of the background noise on force (force clamping):

$$P(F) = mF + c + a_1 e^{-\frac{1}{2}\left(\frac{F-\mu_1}{\sigma_1}\right)^2} + a_2 e^{-\frac{1}{2}\left(\frac{F-\mu_2}{\sigma_2}\right)^2}, \tag{S5}$$

where $P(F)$ is the probability density of force values, $m$ and $c$ are the slope and intercept of the noise level, and $a_j$, $\mu_j$ and $\sigma_j$ are the scaling factor, mean and standard deviation of Gaussian $j$.

C.
Estimating the work done by the trap/protein from constant velocity data

Force-extension curves taken at 10 nm/s and 100 nm/s were fitted with WLC models for both the DNA and the fully extended protein. The non-equilibrium energies, or the work done by or on the system, $W$, were then extracted from force-distance curves (FDCs) [30]. The work done on the protein, or the unfolding energy, is simply the difference between the unfolding trace, $U(d)$, and the FDC of the fully extended protein, $C(d)$:

$$W_U = \int_{d_1}^{d_2} U(d)\,\mathrm{d}d - \int_{d_1}^{d_2} C(d)\,\mathrm{d}d, \tag{S6}$$

which corresponds to the area between those two curves (Figure S14B). The work done by the protein, or the refolding energy, is the difference between the force response of the unfolded protein and the refolding trace $R(d)$:

$$W_F = \int_{d_1}^{d_2} C(d)\,\mathrm{d}d - \int_{d_1}^{d_2} R(d)\,\mathrm{d}d. \tag{S7}$$

FIG. S14. Calculating the forces and energies of TPR unfolding transitions. (A) The mean unfolding force is extracted by fitting a Gaussian function (red) to a histogram of forces (right) which was derived from the raw data (left, plotted as force against its index array). (B) The non-equilibrium energies of unfolding are simply the area (shaded light blue) between the unfolding curve and the contour of the fully extended construct.

VII. MECHANICAL ISING MODELS

A microscopic conformation $c = \{c_1, \ldots, c_N\}$ of a protein consisting of $N$ subunits was described by a bit-word of length $N$, where ones indicate folded subunits and zeros indicate unfolded subunits. In the case of $N$ subunits there are $2^N$ possible microscopic conformations, e.g.
for $N = 3$, $c = \{000, 100, 010, 001, 110, 101, 011, 111\}$.

The full Hamiltonian of the entire system at a trap distance $d$ is given by

$$H_d(x, c) = H_\mathrm{int}(c) + H^\mathrm{mech}_d(x, c), \tag{S8}$$

where $H_\mathrm{int}(c)$ describes the conformation-dependent internal energy and $H^\mathrm{mech}_d(x, c)$ describes the mechanical energy stored in the system.

The energy for mechanically stretching the system consisting of the linker and the Hookean spring of the optical trap is

$$H^\mathrm{mech}_d(x, c) = \int_0^{d-x} F_\mathrm{construct}(c, \xi)\,\mathrm{d}\xi + \frac{1}{2} k x^2. \tag{S9}$$

In the experimental configuration, the two mechanical parts consisting of dsDNA and unfolded polypeptide are in series (see Fig. S15). Hence, the extension of the full linker consisting of dsDNA and unfolded polypeptide is given by

$$\xi_\mathrm{construct}(F, c) = \xi_\mathrm{eWLC}(F) + \xi_\mathrm{WLC}(F, c) + \xi_\mathrm{folded}(c), \tag{S10}$$

where $\xi_\mathrm{eWLC}$ and $\xi_\mathrm{WLC}$ are obtained from eq. (S3) and eq. (S11), respectively. The extension of the folded protein $\xi_\mathrm{folded}$ was assumed to be independent of force, but dependent on the particular configuration $c$ of the protein, i.e. it contained information on the protein structure (see Fig. S16G and Section VII A below). The inverse of eq. (S10) yields the force on the construct as a function of the length of unfolded polypeptide and the total extension, $F_\mathrm{construct}(\xi, c)$.

FIG. S15. Lengths and quantities used in the compliance model for a two-bead configuration (top) and the equivalent one-bead configuration (bottom).

The mechanical properties of the dsDNA linker were modelled using eq. (S3), and the mechanical properties of the polypeptide part were modelled using

$$F_\mathrm{WLC}(\xi, c) = \frac{k_BT}{p_p}\left[\frac{1}{4\left(1-\frac{\xi}{L_c(c)}\right)^2} - \frac{1}{4} + \frac{\xi}{L_c(c)}\right], \tag{S11}$$

where $L_c(c) = \left(N - \sum_{i=1}^{N} c_i\right) \cdot L_\mathrm{aa} + L_\mathrm{tag}$ is the contour length of the unfolded polypeptide when the protein is in conformation $c$, $p_p$ is the persistence length of the unfolded polypeptide, $L_\mathrm{tag}$ is the contour length of the attachment tag and $L_\mathrm{aa} = 0.365$ nm is the length of a single amino acid [31].

A.
Structure information

As highlighted in the main text, the models only accurately described the experimental data when the superhelical nature of CTPR proteins was considered. We incorporated this structural information into eq. (S10) by setting $\xi_\mathrm{folded}(c)$ to the sum of the end-to-end distances (Cα to Cα) of all folded stretches of helices, as given by the crystal structure.

For example, for a configuration 0111001111, we set $\xi_\mathrm{folded} = \xi_{2 \ldots 4} + \xi_{7 \ldots 10}$, where $\xi_{i \ldots j}$ is the crystal-structure end-to-end distance from the start of helix $i$ to the end of helix $j$.

B. Interaction models

We considered four different interaction models of subunits and their coupling. For all models, the folded protein extension $\xi_\mathrm{folded}(c)$ was obtained from the crystal structure for each possible configuration (see Fig. S16).

FIG. S16. Different Ising models were tested to describe the folding of TPR proteins. In all models, red arrows indicate the interactions between respective subunits and $\xi_\mathrm{folded}$ represents the end-to-end distance of the folded portion. (A) In the homopolymer repeat model, subunits consist of whole repeats (i.e. two helices). (B) In the homopolymer helix model, subunits consist of individual helices that are treated exactly the same. (C) In the heteropolymer helix model, the structural repeat is divided into its A and B helices with respective energies. (D) The heteropolymer helix model can be extended to include nearest & next-nearest neighbour interactions (NNN) that may occur e.g. due to structural contacts.

1. Homopolymer repeat model

In models based on a whole repeat (i.e.
one A- and B-helix) as the smallest independent protein unit, the internal energy of the protein is

$$H_\mathrm{int}(c) = \Delta G_\mathrm{unit} \sum_{i=1}^{N} c_i + \Delta G_\mathrm{nn} \sum_{i=1}^{N-1} c_i c_{i+1}, \tag{S12}$$

where $\Delta G_\mathrm{unit}$ is the energy of a folded subunit and $\Delta G_\mathrm{nn}$ describes the energy of the nearest-neighbour interaction between two adjacent folded subunits (Fig. S16A). This is the simplest form of a one-dimensional Ising model.

2. Homopolymer helix model

The homopolymer helix model is equivalent to the homopolymer repeat model, but subunits consist of helices instead of repeats. Just as for the repeat model, interaction energies only affect nearest neighbours (Fig. S16B).

3. Heteropolymer helix model

This model takes into account that the two alpha helices in a repeat are different and thus may be parameterized by different energies. Only nearest-neighbour interactions are allowed. The internal energy is given by

$$H_\mathrm{int}(c) = n_A \Delta G_A + n_B \Delta G_B + n_{AB} \Delta G_{AB} + n_{BA} \Delta G_{BA}, \tag{S13}$$

where $n_A$ is the number of folded A-helices in conformation $c$, $n_{AB}$ is the number of folded pairs of A and B helices, $n_{BA}$ is the number of folded pairs of B and A helices, etc. (see Fig. S16C).

4. Heteropolymer helix nearest & next-nearest (NNN) model

This model accounts for contacts between adjacent A-A and B-B helices found in the crystal structure and assigns corresponding energies (Fig. S16D). The internal energy of the protein is

$$H_\mathrm{int}(c) = n_{AB} \Delta G_{AB} + n_{BA} \Delta G_{BA} + n_{AA} \Delta G_{AA} + n_{BB} \Delta G_{BB} + n_A \Delta G_A + n_B \Delta G_B. \tag{S14}$$

Here, $n_{AB}$ is the number of adjacent folded A and B helices and so on. Unfolded helices are considered to break contacts between next-nearest neighbours, such that a configuration ABA would contribute toward $n_{AA}$, but A-A would not.

We note that both the heteropolymer helix model and the heteropolymer helix NNN model can be mapped to the repeat model when

$$\begin{aligned} \Delta G_\mathrm{unit} &= \Delta G_A + \Delta G_B + \Delta G_{AB} \quad \text{and} \\ \Delta G_\mathrm{nn} &= \Delta G_{BA} + \Delta G_{AA} + \Delta G_{BB}. \end{aligned} \tag{S15}$$
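The counting behind eqs. (S13) and (S14) can be illustrated with a short Python sketch. It is not the authors' code: the bit-string layout (helices alternating A, B, A, B, ... starting with an A-helix), the function names `count_terms` and `h_int`, and the numerical energies are assumptions chosen purely for illustration.

```python
from collections import Counter


def count_terms(conf: str) -> Counter:
    """Count folded helices and folded contacts in a helix bit-string.

    Assumed layout: '1' = folded, '0' = unfolded, helices alternate
    A, B, A, B, ... starting with an A-helix.
    """
    n = Counter()
    kinds = "AB"
    for i, bit in enumerate(conf):
        if bit != "1":
            continue
        kind = kinds[i % 2]
        n[kind] += 1  # n_A or n_B
        # nearest-neighbour contact with helix i+1 (n_AB or n_BA)
        if conf[i + 1 : i + 2] == "1":
            n[kind + kinds[(i + 1) % 2]] += 1
            # next-nearest contact (n_AA or n_BB) only if the helix
            # in between is folded as well, as stated for eq. (S14)
            if conf[i + 2 : i + 3] == "1":
                n[kind + kind] += 1
    return n


def h_int(conf: str, dg: dict) -> float:
    """Internal energy of eq. (S14); set dg['AA'] = dg['BB'] = 0 for eq. (S13)."""
    n = count_terms(conf)
    return sum(n[key] * value for key, value in dg.items())


# Placeholder energies (arbitrary units), NOT fitted values from the paper.
DG = {"A": 1.0, "B": 2.0, "AB": -3.0, "BA": -4.0, "AA": -0.5, "BB": -0.25}

if __name__ == "__main__":
    print(count_terms("111011"))
    print(h_int("111111", DG))
```

With this layout, a fully folded array of $N$ repeats gives $n_A = n_B = n_{AB} = N$ and $n_{BA} = n_{AA} = n_{BB} = N - 1$, which is why the helix models collapse onto the repeat model under the mapping of eq. (S15).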
For all models, the total energy of a protein with $N$ repeats is then

$$\Delta G_\mathrm{tot} = N\,\Delta G_\mathrm{unit} + (N - 1)\,\Delta G_\mathrm{nn}. \tag{S16}$$

C. Calculation of force-distance curves

Under equilibrium conditions, the mean bead deflection $\langle x \rangle$ for a given trap distance $d$ is

$$\langle x(d) \rangle = \frac{\int_x \sum_c x \exp\left(-\frac{H_d(x,c)}{k_BT}\right) \mathrm{d}x}{\int_x \sum_c \exp\left(-\frac{H_d(x,c)}{k_BT}\right) \mathrm{d}x}, \tag{S17}$$

where $H_d(x, c)$ is the full Hamiltonian of the system (eq. (S8)), which also depends on the model-dependent energies (e.g. $\Delta G_\mathrm{nn}$, $\Delta G_\mathrm{unit}$), which are omitted here for ease of notation.

Consequently, a force-distance curve (FDC) can be calculated using

$$F(d) = \langle x(d) \rangle \cdot \left(\frac{1}{k_1} + \frac{1}{k_2}\right)^{-1}, \tag{S18}$$

where $k_1$ and $k_2$ are the spring constants of the two traps, so that the prefactor is the effective stiffness of the two traps in series.

D. Calculation of unfolding profile

Similarly, the probability of a subunit $i$ to be folded at a given trap distance $d$ is

$$p_i(d) = \frac{\int_x \sum_c \delta_i(c) \exp\left(-\frac{H_d(x,c)}{k_BT}\right) \mathrm{d}x}{\int_x \sum_c \exp\left(-\frac{H_d(x,c)}{k_BT}\right) \mathrm{d}x}, \tag{S19}$$

where

$$\delta_i(c) = \begin{cases} 1, & \text{if the } i\text{-th bit of word } c \text{ is set} \\ 0, & \text{otherwise.} \end{cases} \tag{S20}$$

E. Minimal folding unit under load

To determine the size of the minimal folded unit under force, we first numerically determined $d^* = d \mid p(c = 0) = \tfrac{1}{2}$, i.e. the distance at which the fully unfolded configuration is as populated as all other configurations combined, where

$$p(c) = \frac{\int_x \exp\left(-\frac{H_d(x,c)}{k_BT}\right) \mathrm{d}x}{\int_x \sum_{c'} \exp\left(-\frac{H_d(x,c')}{k_BT}\right) \mathrm{d}x} \tag{S21}$$

is the relative population of conformation $c$.

The minimal folded unit was then calculated as the mean number of folded subunits of all other configurations $c \neq 0$, weighted by their population.

F. Minimal folding unit in the absence of load

We define the minimal folding unit in the absence of load as the minimal number of subunits that are necessary such that the total energy of the protein becomes negative.

G. Computation and simplification

FDCs were calculated by numerically evaluating eq. (S18) using custom-written CUDA software on a GeForce RTX 2080 graphics card (Nvidia).
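The structure of eqs. (S17) and (S18), a Boltzmann-weighted sum over conformations combined with an integral over the bead deflection, can be sketched for a toy system in Python with NumPy. This is a deliberately reduced illustration and not the CUDA implementation: it assumes the homopolymer repeat model (eq. (S12)) with three subunits, omits the DNA handle of eq. (S3) and the folded extension $\xi_\mathrm{folded}$, restricts $x$ to $[0, d]$, and uses placeholder energies, a placeholder trap stiffness and an assumed contour length per subunit (34 residues at 0.365 nm).

```python
from itertools import product

import numpy as np

KT = 4.11             # thermal energy near room temperature, pN nm
P_P = 0.70            # polypeptide persistence length, nm (Section VI A)
L_TAG = 4.38          # contour length of the attachment tags, nm (Table S5)
L_SUB = 12.4          # ASSUMED contour length released per unfolded subunit, nm
K_EFF = 0.15          # ASSUMED effective trap stiffness, pN/nm
DG_UNIT = 3.0 * KT    # ASSUMED placeholder energies, not fitted values
DG_NN = -12.0 * KT


def wlc_energy(xi, contour):
    """Integral of the WLC force (eq. S11) from 0 to xi; inf at or beyond the contour."""
    r = xi / contour
    ok = r < 1.0
    r = np.where(ok, r, 0.0)  # dummy value keeps the formula finite
    e = KT * contour / P_P * (0.25 / (1.0 - r) - 0.25 - 0.25 * r + 0.5 * r**2)
    return np.where(ok, e, np.inf)


def force(d, n_sub=3, n_grid=2001):
    """Toy version of eqs. (S17) and (S18) on a uniform grid of deflections x."""
    x = np.linspace(0.0, d, n_grid)
    rows = []
    for c in product((0, 1), repeat=n_sub):
        # internal energy, eq. (S12)
        h_int = DG_UNIT * sum(c) + DG_NN * sum(a * b for a, b in zip(c, c[1:]))
        # contour length of unfolded polypeptide in conformation c
        contour = (n_sub - sum(c)) * L_SUB + L_TAG
        # mechanical energy, eq. (S9), with the linker reduced to one WLC
        rows.append(h_int + wlc_energy(d - x, contour) + 0.5 * K_EFF * x**2)
    h = np.array(rows)
    w = np.exp(-(h - h.min()) / KT)   # shift by the minimum for numerical stability
    mean_x = (w * x).sum() / w.sum()  # grid spacing cancels in the ratio
    return K_EFF * mean_x


if __name__ == "__main__":
    for d in (10.0, 30.0, 60.0):
        print(d, force(d))
```

The cost of the loop over `product((0, 1), repeat=n_sub)` doubles with every added subunit, which is the scaling problem that the approximations of this section address.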
Even though massive parallelization greatly reduced the computation time, the calculations were still too expensive for long repeat molecules, such as the 26-repeat protein in the helix models with a conformational space size of $2^{52} \approx 5 \times 10^{15}$. A matrix formalism, which was previously employed to reduce model complexity in chemical unfolding [11], could not be used to describe the mechanical unfolding because of the non-linear contributions of the linker molecules (DNA and unfolded polypeptide) to the mechanical energy. Instead, we considered two simplifications that reduced the conformational space by eliminating extremely unlikely high-energy configurations.

1. Skip approximation

In helix models, we excluded all configurations in which an individual helix was folded without adjacent folded neighbours (e.g. 010111), or in which two adjacent helices were folded without a stabilising neighbour (e.g. 110111). These simplifications were in accordance with previous experimental findings that individual repeats are not stable in solution and resulted in a reduction of the computational complexity from $O(2^N)$ to $< O(1.65^N)$.

The simplifications allowed us to calculate FDCs for molecules of all repeat lengths. However, the computational cost for the longest molecules was still very high (≈60 h per iteration for one FDC with $\approx 4 \times 10^{10}$ configurations of a 26-mer in the Skip approximation) and prevented us from using these approximations in a fit function.

2. Zipper approximation

Therefore, we also considered a zipper approximation, in which unfolding always occurs from the ends and configurations such as 11101111 do not exist. This model was of complexity $O(N^2)$ and could easily be fitted to all molecules.

3. Verification

In practice, we obtained the energy parameters by fitting the zipper approximation to molecules of all repeat lengths.
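The two reduced conformational spaces described above can be enumerated for short chains with the following Python sketch. The rules are an interpretation of the text rather than the authors' implementation: the Skip approximation is read as "every contiguous folded stretch spans at least three helices", and the zipper approximation as "at most one contiguous folded stretch". All function names are hypothetical.

```python
import re
from itertools import product


def folded_runs(conf):
    """Lengths of all contiguous folded stretches in a bit-string."""
    return [len(run) for run in re.findall(r"1+", conf)]


def allowed_skip(conf):
    # assumed reading: no isolated helix and no isolated helix pair
    return all(n >= 3 for n in folded_runs(conf))


def allowed_zipper(conf):
    # assumed reading: a single folded block that frays from its ends
    return len(folded_runs(conf)) <= 1


def enumerate_confs(n, rule):
    """Brute-force list of all length-n bit-strings that satisfy `rule`."""
    return [c for c in map("".join, product("01", repeat=n)) if rule(c)]


def count_skip(n):
    """Count Skip-allowed strings of length n without enumerating them.

    State = length of the trailing folded stretch: 0, 1, 2 or >= 3.
    """
    s0, s1, s2, s3 = 1, 0, 0, 0
    for _ in range(n):
        s0, s1, s2, s3 = s0 + s3, s0, s1, s2 + s3
    return s0 + s3  # strings ending in a stretch of 1 or 2 are not allowed


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, 2**n, count_skip(n), len(enumerate_confs(n, allowed_zipper)))
```

Under these readings the zipper space contains $N(N+1)/2 + 1$ configurations, consistent with the quoted $O(N^2)$ complexity, and `count_skip(52)` gives the size of the Skip space for a 26-repeat array in a helix model.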
We then verified that FDCs obtained from the Skip approximation, with the same energy parameters, closely reproduced the prediction of the zipper model (see Fig. S5A). The resulting energies for all molecules for which the computation was feasible were identical within errors when comparing the Skip approximation and the zipper approximation (see Table 1 in the main text).

H. Error estimation and propagation

To determine the errors of the reported energies $\Delta G_\mathrm{unit}$, $\Delta G_\mathrm{nn}$ and $\Delta G_\mathrm{tot}$ (eqs. (S15) and (S16)), we performed model fits to each individual molecule. The reported errors were then calculated by Gaussian error propagation based on the covariance matrix of the individual values of $\Delta G_A$, $\Delta G_B$, $\Delta G_{AB}$, $\Delta G_{BA}$, $\Delta G_{AA}$ and $\Delta G_{BB}$ and reported as standard error of the mean (SEM) [32].

References

[1] J. Yang, R. Yan, A. Roy, D. Xu, J. Poisson, and Y. Zhang, The I-TASSER Suite: protein structure and function prediction, Nature Methods 12, 7 (2015).

[2] S. K. Wetzel, G. Settanni, M. Kenig, H. K. Binz, and A. Plückthun, Folding and unfolding mechanism of highly stable full-consensus ankyrin repeat proteins, Journal of Molecular Biology 376, 241 (2008).

[3] E. R. Main, Y. Xiong, M. J. Cocco, L. D'Andrea, and L. Regan, Design of stable α-helical arrays from an idealized TPR motif, Structure 11, 497 (2003).

[4] T. Kajander, A. L. Cortajarena, S. Mochrie, and L. Regan, Structure and stability of designed TPR protein superhelices: unusual crystal packing and implications for natural TPR proteins, Acta Crystallographica Section D 63, 800 (2007).

[5] A. L. Cortajarena, T. Kajander, W. Pan, M. J. Cocco, and L. Regan, Protein design to understand peptide ligand recognition by tetratricopeptide repeat proteins, Protein Engineering, Design and Selection 17, 399 (2004).

[6] A. Hemsley, N. Arnheim, M. D. Toney, G. Cortopassi, and D. J.
Galas, A simple method for site-directed mutagenesis using the polymerase chain reaction, Nucleic Acids Research 17, 6545 (1989).

[7] S. Moore, 'Round-the-horn site-directed mutagenesis, https://openwetware.org/wiki/%27Round-the-horn_site-directed_mutagenesis.

[8] T. Kajander, A. L. Cortajarena, E. R. G. Main, S. G. J. Mochrie, and L. Regan, A new folding paradigm for repeat proteins, Journal of the American Chemical Society 127, 10188 (2005).

[9] A. Perez-Riba and L. S. Itzhaki, A method for rapid high-throughput biophysical analysis of proteins, Scientific Reports 7, 9071 (2017).

[10] A. R. Lowe, A. Perez-Riba, L. S. Itzhaki, and E. R. Main, PyFolding: Open-source graphing, simulation, and analysis of the biophysical properties of proteins, Biophysical Journal 114, 511 (2018).

[11] T. Aksel and D. Barrick, Analysis of repeat-protein folding using nearest-neighbor statistical mechanical models, in Biothermodynamics, Part A, Methods in Enzymology, Vol. 455, edited by M. L. Johnson, J. M. Holt, and G. K. Ackers (Academic Press, 2009) Chap. 4, pp. 95–125.

[12] C. Vonrhein, C. Flensburg, P. Keller, A. Sharff, O. Smart, W. Paciorek, T. Womack, and G. Bricogne, Data processing and analysis with the autoPROC toolbox, Acta Crystallographica Section D 67, 293 (2011).

[13] G. Bricogne, E. Blanc, M. Brandl, C. Flensburg, P. Keller, W. Paciorek, P. Roversi, A. Sharff, O. S. Smart, C. Vonrhein, and T. O. Womack, BUSTER (2020).

[14] O. S. Smart, T. O. Womack, C. Flensburg, P. Keller, W. Paciorek, A. Sharff, C. Vonrhein, and G. Bricogne, Exploiting structure similarity in refinement: automated NCS and target-structure restraints in BUSTER, Acta Crystallographica Section D 68, 368 (2012).

[15] P. Emsley, B. Lohkamp, W. G. Scott, and K. Cowtan, Features and development of Coot, Acta Crystallographica Section D - Biological Crystallography 66, 486 (2010).

[16] A. Šali and T. L. Blundell, Comparative protein modelling by satisfaction of spatial restraints, Journal of Molecular Biology 234, 779 (1993).

[17] E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M. Greenblatt, E. C. Meng, and T. E. Ferrin, UCSF Chimera: a visualization system for exploratory research and analysis, J Comput Chem 25, 1605 (2004).

[18] J. K. Forwood, A. Lange, U. Zachariae, M. Marfori, C. Preast, H. Grubmüller, M. Stewart, A. H. Corbett, and B. Kobe, Quantitative structural analysis of importin-β flexibility: Paradigm for solenoid protein structures, Structure 18, 1171 (2010).

[19] B. Kobe, T. Gleichmann, J. Horne, I. G. Jennings, P. D. Scotney, and T. Teh, Turn up the HEAT, Structure 7, R91 (1999).

[20] K. J. Millman and M. Aivazis, Python for scientists and engineers, Computing in Science & Engineering 13, 9 (2011).

[21] T. E. Oliphant, Python for scientific computing, Computing in Science & Engineering 9, 10 (2007).

[22] S. v. d. Walt, S. C. Colbert, and G. Varoquaux, The NumPy array: A structure for efficient numerical computation, Computing in Science & Engineering 13, 22 (2011).

[23] J. D. Hunter, Matplotlib: A 2D graphics environment, Computing in Science & Engineering 9, 90 (2007).

[24] J. Yin, P. D. Straight, S. M. McLoughlin, Z. Zhou, A. J. Lin, D. E. Golan, N. L. Kelleher, R. Kolter, and C. T. Walsh, Genetically encoded short peptide tag for versatile protein labeling by Sfp phosphopantetheinyl transferase, Proceedings of the National Academy of Sciences 102, 15815 (2005).

[25] M. Synakewicz, D. Bauer, M. Rief, and L. S. Itzhaki, Bioorthogonal protein-DNA conjugation methods for force spectroscopy, Sci Rep 9, 13820 (2019).

[26] A. Mukhortava and M. Schlierf, Efficient formation of site-specific protein-DNA hybrids using copper-free click chemistry, Bioconjugate Chemistry 27, 1559 (2016).

[27] K. Tych and G. Žoldák, Stable Substructures in Proteins and How to Find Them Using Single-Molecule Force Spectroscopy, Methods Mol Biol 1958, 263 (2019).

[28] M. D. Wang, H. Yin, R. Landick, J. Gelles, and S. M. Block, Stretching DNA with optical tweezers, Biophysical Journal 72, 1335 (1997).

[29] C. Bustamante, J. F. Marko, E. D. Siggia, and S. B. Smith, Entropic elasticity of lambda-phage DNA, Science 265, 1599 (1994).

[30] J. C. M. Gebhardt, T. Bornschlögl, and M. Rief, Full distance-resolved folding energy landscape of one single protein molecule, Proceedings of the National Academy of Sciences 107, 2013 (2010), https://doi.org/10.1073/pnas.0909854107.

[31] H. Dietz and M. Rief, Exploring the energy landscape of GFP by single-molecule mechanical experiments, Proceedings of the National Academy of Sciences 101, 16192 (2004), https://doi.org/10.1073/pnas.0404549101.

[32] I. G. Hughes and T. P. A. Hase, Measurements and their Uncertainties: A Practical Guide to Modern Error Analysis (Oxford University Press, 2010).