Title: Evolution of oligomeric state through allosteric pathways that mimic ligand binding

Authors: Tina Perica1,2,§, Yasushi Kondo2, Sandhya P. Tiwari3, Stephen H. McLaughlin2, Katherine R. Kemplen4, Xiuwei Zhang1, Annette Steward4, Nathalie Reuter3, Jane Clarke4 and Sarah A. Teichmann1*

Affiliations:
1 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
2 MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK.
3 Department of Molecular Biology, University of Bergen and Computational Biology Unit, Department of Informatics, University of Bergen, PO Box 7803, N-5020 Bergen, Norway.
4 Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK.
§ Current address: Department of Bioengineering and Therapeutic Sciences, California Institute for Quantitative Biosciences, University of California, San Francisco, 1700 4th St., San Francisco, CA 94143, USA
*Correspondence to: saraht@ebi.ac.uk

Abstract: Evolution and design of protein complexes are almost always viewed through the lens of amino acid mutations at protein interfaces. We showed previously that residues not involved in the physical interaction between proteins make important contributions to oligomerisation by acting indirectly or allosterically. Here, we sought to investigate the mechanism by which allosteric mutations act using the example of the PyrR family of pyrimidine operon attenuators. In this family, a perfectly sequence-conserved helix that forms a tetrameric interface is exposed as solvent-accessible surface in dimeric orthologues. This means that mutations must be acting from a distance to destabilize the interface. We identified eleven key mutations controlling oligomeric state, all distant from the interfaces and outside ligand-binding pockets.
Finally, we show that the key mutations introduce conformational changes equivalent to the conformational shift between the free and the nucleotide-bound conformations of the proteins.

One Sentence Summary: This work probes the mechanism of indirect, allosteric mutations that employ the intrinsic dynamics of the protein involved in allosteric regulation by small molecules.

Main Text:

Introduction

Proteins diverge during the course of evolution and experience a continuous trade-off between selection for function and stability (1). Gould and Lewontin described how organisms adapt to different competing demands, while at the same time accumulating traits that occur either due to drift or correlations with selected features (2). This view can also be applied to proteins, where mutations of individual residues interact and determine fitness, similar to mutations in genes at the level of organisms (3). Selection is then determined by conditions, both internal (interactions with other macromolecules in the cell), and external (environmental variables, e.g. temperature or pH). Furthermore, due to the difference in the sizes of sequence and structure space, proteins can accumulate destabilizing mutations, as long as they remain stable enough at given conditions (4). In previous work, we showed that mutations outside protein interfaces are as important for the evolution of quaternary structure/oligomeric state as mutations directly within interfaces (5). This raised the following question: by what mechanism do mutations outside interfaces affect their formation? The most likely hypothesis is that these mutations act by changing either protein conformation or conformational dynamics, analogous to the ways in which allosteric ligands introduce conformational change. Thus we referred to the indirect mutations as allosteric mutations.
Furthermore, the conformational dynamics of proteins enable functional features such as ligand binding, and also contribute to evolutionary plasticity, i.e. “evolvability”. Protein dynamics are essential for the functions of many proteins (6), and are more conserved at the superfamily level than sequence (7). Selection favours mutations of side chain interactions that promote acquisition of the folded state. In the same way, selection is stronger on functionally relevant conformations of the entire protein structural ensemble (8). Importantly, the conformation under strongest constraint is not the one with the lowest free energy, but rather the one most similar to the functional, often ligand-bound, state. The protein family we study here is a group of pyrimidine attenuator regulatory proteins, PyrR, present in the Bacillaceae family as well as in some other bacterial species (9). The PyrR family shows clear evidence of mutations acting allosterically with respect to the protein interface. The change from homodimeric to homotetrameric family members is unmistakably brought about by allosteric mutations: homologues with different oligomeric states share a helix whose surface is 100% conserved in sequence. This helix forms the tetrameric interface in the homotetrameric family members, but is solvent-exposed in the dimeric family members. Bacillaceae live at a wide range of temperatures, to which the PyrR proteins have adapted. At the same time PyrR is constrained by the need to conserve its dsRNA-binding ability and allosteric regulation by nucleotides. PyrR binds to a stem-loop structure in the nascent mRNA of the pyr operon, which induces formation of the termination loop and attenuates transcription. UMP and GMP allosterically regulate binding of PyrR to RNA, reflecting the ratio between purines and pyrimidines in the cell (10, 11) (Figure 1). As the name suggests, excess pyrimidines as reflected in UMP binding attenuate further transcription of the pyr operon. 
In this work we use ancestral sequence reconstruction to infer the allosteric mutations that changed the oligomeric state and thermostability in the PyrR family during the course of evolution. We identify eleven allosteric mutations that decrease thermostability in all PyrR proteins, but change the oligomeric state only in the context of inferred ancestral PyrR proteins, and not in thermophilic PyrR. We show how these mutations affect oligomeric state indirectly, and describe this allosteric mechanism: the same internal conformational switch in PyrR proteins is toggled both by an allosteric ligand (GMP), and by a small number of mutations.

Results and discussion

Close homologues of PyrR have conserved interface amino acids but different oligomeric states.

Using size-exclusion chromatography coupled with multi-angle light scattering (SEC-MALS) at room temperature and velocity analytical ultracentrifugation (AUC) at 10 °C, we observed that the PyrR oligomeric state differs between B. caldolyticus and B. subtilis homologues, and is affected by allosteric regulators such as GMP (Figure 2). Bacillus caldolyticus has an optimal growth temperature of 72 °C, and its PyrR (BcPyrR) elutes as one peak corresponding to a tetramer (Figure 2). Velocity AUC experiments show that the majority of BcPyrR sediments as a tetramer with a minor monomeric species and no apparent dimeric species (Figure S4). Bacillus subtilis has an optimal growth temperature of 25 °C, and BsPyrR elutes from the size-exclusion column as a broad peak with a range of molecular masses between those of the monomeric, dimeric and tetrameric oligomeric states (Figure 2). In the velocity AUC experiment for BsPyrR, two species were observed to sediment, calculated to have molecular masses corresponding to that of a PyrR dimer and tetramer respectively (Figure S4).
Therefore, BsPyrR exists as a dimer in equilibrium with the monomeric and tetrameric species at low micromolar concentrations, which correspond to the physiological range, as the estimated average concentration of PyrR in B. subtilis cells is 0.4 µM (12). BcPyrR and BsPyrR have the highest sequence identity of all PyrR homologues of known 3D structure with different oligomeric states: 73% sequence identity over 180 residues, corresponding to 49 substitutions, most of which are on the solvent-exposed surface of the protein. Interestingly, the residues involved in the tetrameric (dimer-of-dimers) interface are 100% sequence-identical. These interface residues are also likely involved in RNA-binding (13), and hence under purifying selection (Figure S2).

A small number of allosteric mutations change the oligomeric state of PyrR

The PyrR protein is present in various Bacillus species with diverse optimal growth temperatures, as well as the distantly related bacteria Mycobacterium tuberculosis and Thermus thermophilus, as shown in the phylogenetic tree of this protein family (Figure 1B). In order to trace the mutations changing the oligomeric state between the thermophilic BcPyrR (red) and mesophilic BsPyrR (blue), we focused on the two internal nodes in the phylogenetic tree after the split of the BcPyrR from the last common ancestor of the Bacillus sp. PyrR (LCABacillusPyrR). We reconstructed the most likely ancestral sequences of the internal nodes (14). Please refer to the Methods for details. In analogy to the colour wheel, we named the two inferred ancestral proteins AncORANGEPyrR and AncGREENPyrR (Figure 1B). SEC-MALS revealed that AncORANGEPyrR formed a stable tetramer, and that AncGREENPyrR showed a decrease in the average molecular mass only at the lowest concentration (1 µM), from which we infer the presence of a small population of lower oligomeric state species (Figure 2).
In AUC analysis, both ancestral proteins displayed similar distributions of sedimenting species, as seen for BcPyrR (Figure S4). Therefore, we infer that evolution of the dimeric state occurred toward the terminal branches of the PyrR phylogenetic tree, between AncGREENPyrR and BsPyrR. There are twelve substitutions and three insertions/deletions (a set of fifteen mutations we refer to as m3) between AncGREENPyrR and BsPyrR (Figure S19). As our goal is to identify the smallest subset of allosteric mutations that clearly change the oligomeric state we excluded four of these mutations: three insertions/deletions and one (E4Q) substitution which is a revertant to the same amino acid as in the tetrameric BcPyrR. Two of the insertions/deletions were at each of the termini, and the third one was in a flexible loop. We would not expect these three changes to have a significant effect on the structure. Eleven substitutions (11/m3) were different between BsPyrR and all the tetrameric PyrRs (BcPyrR, AncORANGEPyrR and AncGREENPyrR). In order to confirm their role in the shift of the oligomeric state, we inserted them into the stable PyrR tetramer AncORANGEPyrR, producing the engineered protein VIOLETPyrR (Figure 3). The eleven substitutions did indeed destabilize the AncORANGEPyrR tetramer: VIOLETPyrR has similar SEC-MALS and AUC profiles as BsPyrR (Figures 2 and S4). Thus, remarkably, these mutations, none of which are in the tetrameric interface, shift the oligomeric state through an indirect, allosteric mechanism. Do these mutations turn any PyrR homologue into a dimer? We grafted the eleven allosteric mutations into the tetrameric BcPyrR, forming the engineered protein PURPLEPyrR. Surprisingly, PURPLEPyrR remains tetrameric, even at lowest concentration (Figures 3B and S4). This implies that these eleven allosteric mutations have an epistatic interaction with the 32 m1 mutations that separate BcPyrR and AncORANGEPyrR. 
Epistasis between amino acid substitutions is known to be ubiquitous in proteins, as described in multiple recent publications (3, 15, 16). In order to pinpoint the key oligomeric state-switching mutations, we further tested the effect of two non-overlapping sets of residues within the eleven allosteric mutations, represented by PLUMPyrR (3/m3) and MAGENTAPyrR (8/m3). We selected the three mutations in PLUMPyrR based on the proximity of these residues to the dimeric interface, expecting them to have the largest impact on the inter-subunit geometry. Both subsets of mutations had independent effects on the oligomeric state (Figure 3), as the equilibria of both PLUMPyrR and MAGENTAPyrR were shifted towards the dimeric state at the lowest protein concentrations in SEC-MALS, with the appearance of a dimeric species in AUC (Figure S4). This implies that these two small sets of mutations contribute to the oligomeric shift in a cumulative manner.

The eleven mutations that shift PyrR oligomeric state are part of a downhill adaptation to temperature

Members of the Bacillus genus live in dramatically different environments, the most notable difference being ambient temperature. Hobbs et al. (17) have shown that the Bacillus species have adapted to different temperatures multiple times in evolution. B. subtilis (with the homodimeric BsPyrR) lives in soil with optimal growth at 25 °C, and B. caldolyticus (with the homotetrameric BcPyrR) in alkaline hot springs with optimal growth at 72 °C (18). We recorded the circular dichroism (CD) spectra at temperatures from 20 to 90 °C for all of our PyrR constructs (Figure 3C). Their thermal unfolding was irreversible, and all but BsPyrR and PURPLEPyrR unfolded in a single phase. This was sufficient to estimate the thermal stability of PyrR proteins along the evolutionary tree. BcPyrR shows no variation in the CD spectra up to 70 °C, when it suddenly unfolds cooperatively.
BsPyrR however, exhibits changes in helicity at temperatures as low as 35 °C, finally unfolding completely at 75 °C. We could not determine from the CD spectra which of the secondary structure changes occur at low temperatures in BsPyrR. However, plotting ellipticity for different wavelengths suggests an exchange between α helical and β sheet structure (Fig. S5). AncORANGEPyrR thermal unfolding follows the same pattern as that of BcPyrR, with unfolding taking place at 80 °C rather than 70 °C. This stabilization of 10 °C is most probably an artefact of ancestral protein sequence reconstruction, which has been suggested to overestimate protein stability due to a bias towards more stabilizing mutations in the evolutionary substitution models (19). Notably, VIOLETPyrR, which differs from AncORANGEPyrR by just the eleven mutations, unfolded at a significantly lower temperature than AncORANGEPyrR, while PLUMPyrR and MAGENTAPyrR have intermediate thermostability (Figure 3C). This raises the question as to the mechanism for the thermal destabilizing effect of the eleven mutations. They could be affecting thermal destabilisation through the switch in oligomeric state, or by changing the polarity of the protein surface. Both residue composition and oligomeric state have been suggested to play a role in protein thermostability (20). A higher oligomeric state is proposed to increase thermostability by burying more residue surface area. It has also been repeatedly observed that thermophilic bacteria have more charged residues and fewer polar residues compared to mesophilic organisms. This bias is especially pronounced when only surface residues are taken into account (21). To deconvolute whether it is the residue propensity or the oligomeric state that plays the main role in the differences in thermostability of PyrR, we took advantage of the engineered PURPLEPyrR, which has the eleven allosteric mutations, but is still tetrameric. 
Although all but the eleven allosteric residues of PURPLEPyrR are the same as in the thermophilic BcPyrR, the CD measurements show that tetrameric PURPLEPyrR has significantly decreased thermostability compared to that of BcPyrR. We thus infer that it is the change in thermophilic propensity of surface residues, and not the change in the oligomeric state, that plays a major role in the change of PyrR thermostability. In order to dissect this in detail, we bioinformatically define the residue thermophilic propensity as the log ratio of amino acid frequencies between the solvent-exposed surfaces of proteins from thermophilic and mesophilic organisms (21). Thus we calculated how mutations along the PyrR tree change this thermophilic propensity. As expected, the mutations increase the thermophilic propensity on the branches from AncORANGEPyrR towards BcPyrR, and decrease towards BsPyrR. Moreover, the largest decrease in thermophilic propensity occurs between AncGREENPyrR and BsPyrR, the branch that also corresponds to the switch towards the dimeric state (Figure S6). From this, we infer that the eleven allosteric mutations are part of a more general “downhill” adaptation to lower temperatures. Thus the switch in oligomeric state of free PyrR co-occurs with the evolutionary adaptation to lower temperatures of B. subtilis as compared to B. caldolyticus. How is this dimer/tetramer switch affected by mutations that are distant from all the inter-subunit interfaces? To answer this question, we investigated the oligomeric states during allosteric regulation by ligands in this protein family.

The allosteric regulators UMP and GMP control oligomeric state

Previous in vivo and biochemical experiments showed that the PyrR binding to the leader RNA sequence (PyrR binding loop) of the pyr operon is regulated by small molecules such as UMP and GMP (10, 11).
In summary, higher concentrations of pyrimidines increase the affinity of PyrR for the PyrR binding loop, and in turn attenuate transcription of the pyrimidine synthesis operon. Higher concentrations of purines, on the other hand, decrease the affinity for the binding loop, which in turn increases the transcription of the pyrimidine synthesis operon (9) (Figure 1). As allosteric regulation usually affects conformational change, we wanted to investigate how GMP and UMP influence PyrR conformation and oligomeric state. We analysed the oligomeric state of BsPyrR and BcPyrR by SEC-MALS upon addition of allosteric ligands, and observed that both UMP and GMP stabilize the tetrameric state of PyrR (Figures 2 and S3). This is especially prominent in the case of BsPyrR, where addition of nucleotides shifts the equilibrium towards a higher oligomeric state. The RNA-bound form of PyrR, not investigated here, is likely to be dimeric, based on analytical ultracentrifugation (11) and mutagenesis experiments (13). This means that the effect of the eleven allosteric mutations is similar to that of RNA and opposite to the nucleotide ligands, which stabilize the tetrameric state. Interestingly, the eleven allosteric mutations are also allosteric with respect to the nucleotide-binding site: each of the eleven residues is 10 Å or more away from the bound GMP molecules. Overall, while different ligand-bound and RNA-bound forms of the protein sample both dimeric and tetrameric states, the eleven mutations shift the free protein equilibrium towards the dimeric state in this landscape of different conformations.

Both mutations and ligands shift oligomeric state by changing inter-subunit geometry.

In order to determine the structural changes that occur when the allosteric mutations switch the oligomeric state in the PyrR family, we solved four new X-ray crystal structures: AncORANGEPyrR, AncGREENPyrR, VIOLETPyrR, and BsPyrR+GMP (Table S2).
We then compared these structures to those of BcPyrR and BsPyrR (22, 23). In our previous work, we hypothesized that evolutionary changes in oligomeric state can arise from differences in inter-subunit geometry within a protein complex (5). If this were true for the PyrR family, we would expect the dimeric structures to have distinct inter-subunit geometries as compared to the tetrameric structures. Superimposing the dimeric BsPyrR on the BcPyrR tetramer shows an 8° rotation around the dimeric interface (Figures 4A and S1). This conformation of BsPyrR is not compatible with the tetrameric oligomeric state, as the two helices that would form the dimer-of-dimers interface are pulled apart by more than 5 Å. The eleven mutations produce the same difference in conformation. Dimeric VIOLETPyrR and tetrameric AncORANGEPyrR, which differ by the eleven mutations, exhibit the same relative rotations between subunits within the dimer (Figure 4). The subset of three allosteric mutations introduced into PLUMPyrR from AncORANGEPyrR leads to the same geometric change, where PLUMPyrR has a 9° inter-subunit rotation as compared to AncORANGEPyrR (Figure S8). How does this compare to the difference between free, dimeric BsPyrR and tetrameric GMP-bound BsPyrR? Addition of GMP introduces a 10° rotation, changing the protein conformation into the one compatible with forming the dimer of dimers (Figure 4). The tetrameric GMP-bound BsPyrR structure is similar to the tetramers formed by free BcPyrR and AncORANGEPyrR (Figure S8). There is a subtle 3.6° subunit rotation around the dimeric interface between the GMP-bound form of BsPyrR and AncORANGEPyrR. However, their tetrameric interfaces superimpose almost perfectly, with an average atomic distance difference in the tetrameric interface helices of less than 1 Å. In summary, homologous dimers all have a similar set of inter-subunit geometries, equally different from the inter-subunit geometries of tetramers.
The tetramers exhibit limited variations in their geometries, all of which are significantly smaller than the differences between the dimers and tetramers. As the geometries of the dimers and tetramers are so clearly distinct from each other, we conclude that the eleven allosteric mutations affect oligomeric state in a manner almost identical to the allosteric ligand GMP. How does this change in inter-subunit geometry come about? This is not immediately evident from inspecting the individual monomeric subunits (Figure S9) or the dimeric interfaces (5) of the dimeric and tetrameric proteins, as they all superpose well. To look for the subtle structural differences that could account for the observed differences in conformation and oligomeric state, we used the residue-residue interaction network approach (24-27). With this approach, the protein structure is reduced to a network where each node represents a residue and each edge represents a physical interaction between two residues. This allows for an unbiased analysis of structures using graph theoretical methods, as illustrated in Figure S10. The tetrameric AncORANGEPyrR and dimeric VIOLETPyrR contact networks differ in about 15% of their contacts, and these differences are non-uniformly distributed around the network (Figures 4B and S11). To estimate how much each residue contributes to the difference in residue-residue contacts, we determined the number of contact changes in two shells around the amino acid of interest. To maintain information on residue connectivity, which has been shown to determine residue evolvability (28), we used the absolute number of rewired residue contacts rather than normalizing by total number of contacts. This is because rewiring the contacts of a buried residue with a high connectivity will have a larger structural impact than rewiring a residue with lower connectivity.
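The contact-rewiring count described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the analysis itself: each residue is represented by a single coordinate, a contact is any pair closer than a distance cutoff, and "two shells" is read as the residue of interest plus its direct neighbours in either structure. The function names are hypothetical.

```python
import numpy as np

def contact_set(coords, cutoff=8.0):
    """Residue pairs (i, j), i < j, whose representative points lie within cutoff (in Å)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i_idx, j_idx = np.where(np.triu(dist < cutoff, k=1))
    return {(int(i), int(j)) for i, j in zip(i_idx, j_idx)}

def neighbours(contacts, n_res):
    """Adjacency list of the contact network."""
    nbrs = [set() for _ in range(n_res)]
    for i, j in contacts:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs

def rewiring_score(residue, contacts_a, contacts_b, n_res):
    """Absolute number of contacts gained or lost around one residue.

    Assumed shell definition: a changed contact is counted if it touches
    the residue itself (first shell) or any residue that contacts it in
    either structure (second shell). The count is deliberately not
    normalised by the total number of contacts.
    """
    union_nbrs = neighbours(contacts_a | contacts_b, n_res)
    region = {residue} | union_nbrs[residue]
    changed = contacts_a ^ contacts_b  # contacts present in only one structure
    return sum(1 for i, j in changed if i in region or j in region)
```

With two residue-matched structures, `contact_set` would be called once per structure and `rewiring_score` once per residue; the fraction of differing contacts is `len(a ^ b) / len(a | b)`.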
Three out of the eleven allosteric mutations, L68I, K84D, and A118G, exhibit dramatic rewiring of contacts (Figure 4B). Specifically, when comparing AncORANGEPyrR with the dimeric structures (BsPyrR, VIOLETPyrR and PLUMPyrR), these residues rewire between one and two standard deviations more contacts than the average buried PyrR residue (Figure S12). L68 and A118 are completely buried in the protein interior, while K84 is in the more flexible part of the protein, changing its accessible surface area between different conformations. Moreover, L68 and A118 are the two residues at the centre of the largest rewiring events in the transition from the dimeric BsPyrR to the tetrameric BsPyrR+GMP. This means that the structural changes due to the L68I and A118G mutations “mimic” the key residue rewiring events that occur upon GMP binding. Thus the evolutionary mutations and the allosteric ligand GMP share a common mechanism for achieving an identical inter-subunit rotation, leading to the same shift in oligomeric state.

The stability difference between PyrR tetramers is coupled to changes in the dynamics of dimeric units.

Above, we observed that small differences, either via mutation or ligand binding, affect the wiring of residue contact networks. This suggests a small energy difference between the two main conformations of PyrR. Thus we might expect both these conformations to be sampled by the vibrational normal modes that describe the intrinsic dynamics of dimeric PyrR. We compared the similarity of the intrinsic dynamics of different PyrR dimers (both the dimeric PyrRs and the halves of tetrameric PyrRs), by comparing their flexibility calculated using elastic network modeling (ENM) (29). ENM provides a distribution of fluctuations around the equilibrium conformation for each structure, and the overlap of these distributions between different structures can be described by the Bhattacharyya coefficient (BC).
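As an illustration of how the BC measures the overlap of two fluctuation distributions, the sketch below uses the textbook closed form for two zero-mean multivariate Gaussians. It assumes both covariance matrices are positive definite, for example after projection onto a shared set of internal modes. Any further normalisation applied in the family-level comparison (30) is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def bhattacharyya_coefficient(cov_a, cov_b):
    """BC between two zero-mean Gaussians with covariances cov_a and cov_b.

    BC = det(A)^(1/4) * det(B)^(1/4) / det((A + B) / 2)^(1/2),
    evaluated through log-determinants for numerical stability.
    Returns 1.0 for identical distributions and tends to 0 as they diverge.
    """
    cov_a = np.asarray(cov_a, dtype=float)
    cov_b = np.asarray(cov_b, dtype=float)
    mean_cov = 0.5 * (cov_a + cov_b)
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_b = np.linalg.slogdet(cov_b)
    _, logdet_mean = np.linalg.slogdet(mean_cov)
    # Bhattacharyya distance for equal (zero) means
    distance = 0.5 * logdet_mean - 0.25 * (logdet_a + logdet_b)
    return float(np.exp(-distance))
```

Because the score depends only on the shapes of the two fluctuation distributions, two structures with identical sequence can still score below 1 if their conformations, and hence their elastic networks, differ.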
We have previously shown that the BC ranges between 0.85 and 1 for members of the same protein family (30). Accordingly, the BC scores between all PyrR proteins lie within or just below this range (between 0.83 and 0.98). Furthermore, it is clear that the pattern of flexibility is more similar amongst the three dimers (BsPyrR, VIOLETPyrR, PLUMPyrR) than amongst the three tetramers (AncORANGEPyrR, PURPLEPyrR+UMP, and BsPyrR+GMP) (Figure 5A). The BsPyrR dimer and BsPyrR+GMP tetramer had a similarity score of 0.87, while BsPyrR and the other dimers (VIOLETPyrR and PLUMPyrR) had higher pairwise similarity scores, reflecting more similar dynamics. Importantly, this illustrates that similarities in intrinsic dynamics amongst the PyrR proteins are not a simple function of sequence identity, given that BsPyrR+GMP and BsPyrR have the same sequence and VIOLETPyrR is closer in sequence to AncORANGEPyrR than it is to BsPyrR. The clustering based on the BC score seen in Figure 5A matches the clustering based on their RMSD (Figure S14). While BC and RMSD compare different properties of structures, here the structures with the highest BC score also have the lowest RMSD, which confirms that the structural changes are encoded in the intrinsic dynamics of these structures. It has been repeatedly shown that the conformational difference, either between functional conformations or even homologues, can be sampled by a combination of a few low-frequency normal modes of the protein (31-35). We found that a few of the lowest-frequency modes of PyrR proteins describe the transition between a tetramer and a dimer, both when the transition is induced by allosteric ligands and when it is induced by allosteric mutations. In particular, the second lowest frequency mode of the BsPyrR+GMP protein captured 44% of the transition from this tetramer to the dimeric BsPyrR structure. This mode is very similar to the three lowest frequency modes of tetrameric AncORANGEPyrR.
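How much of a conformational transition a single normal mode captures is commonly expressed as the overlap between the mode vector and the displacement between the two structures. The sketch below is a generic version of that measure. It assumes the two structures are already superimposed and residue-matched, and the function name is hypothetical. Whether the percentage quoted above refers to this overlap or to its square is not stated in this section, so the plain overlap is returned.

```python
import numpy as np

def mode_overlap(mode, coords_start, coords_end):
    """Overlap between a normal mode and the displacement between two conformations.

    mode         : array of shape (3N,) or (N, 3), the mode's displacement vector
    coords_start : (N, 3) coordinates of the starting structure
    coords_end   : (N, 3) coordinates of the end structure, superimposed on the start
    Returns |mode . d| / (|mode| |d|), a value between 0 and 1.
    """
    mode = np.asarray(mode, dtype=float).ravel()
    displacement = (np.asarray(coords_end, dtype=float)
                    - np.asarray(coords_start, dtype=float)).ravel()
    return float(abs(mode @ displacement)
                 / (np.linalg.norm(mode) * np.linalg.norm(displacement)))
```

An overlap of 1 means the transition follows the mode exactly, while 0 means the mode is orthogonal to it. Summing squared overlaps over an orthonormal set of modes gives the cumulative fraction of the displacement they describe.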
The second lowest frequency mode of this tetramer also contributes most to the conformational change between AncORANGEPyrR and the dimeric VIOLETPyrR, which differ by just the eleven mutations (Figure S15C). For both tetramers, the transition to the dimer occurs via the low frequency normal modes that describe the same type of motions in the structure (Videos S3 – S5). This mode corresponds to the overall subunit rotation (and translation) required to go from one state to the other as described in Figure 4A. The differences in residue-motion correlations between a dimer and a tetramer were particularly clear when comparing the correlations of dimeric and tetrameric interface residues between AncORANGEPyrR and VIOLETPyrR. In AncORANGEPyrR, the residues from the tetrameric interface exhibit correlations across the subunit and between the two tetrameric interface helices of the two subunits of a dimer, while in VIOLETPyrR the majority of the specific correlations are located close to the dimeric interface (Figures 5B and S16). Furthermore, we observed that the residues corresponding to three of the eleven allosteric mutations (K62P, L68I, and A118G) are within the largest regions that undergo a collective change in intrinsic dynamics from one oligomeric state to the other. The residue corresponding to the V8I mutation also experiences a notable gain in correlation in both tetramers, related to its proximity to the tetrameric interface and the overall difference seen in that region (Figure S16C). (Details of the statistical analysis of the correlated dynamics are described in the Supplementary Material and Figure S17.) It is important to emphasize that the dynamics were calculated for the dimeric halves of the tetrameric PyrR structures.
This means that the observed differences do not stem from the additional residue contacts in the tetrameric (dimer-of-dimers) interface, but are solely due to the conformational differences between the dimer and the equivalent half of the tetramer. Thus the conformational differences between different PyrR proteins are encoded in their intrinsic dynamics and, remarkably, relatively small effects, such as ligand binding or a small number of strategic mutations, can utilise these dynamics to toggle an intrinsic conformational switch and change the quaternary structure.

Conclusions

Reconstituting the PyrR sequences and analysing their biophysical properties enables us to recapitulate the evolutionary history of the family (Figure 1B). In one part of the phylogenetic tree, PyrR has adapted to remain stable and functional at extremely high temperatures (BcPyrR), while in the other part (after the AncGREENPyrR node) the organisms have adapted to life at lower temperatures (25 °C). It is known that many proteins maintain marginal stability, and one would thus expect the protein stability to reflect the differences in environmental temperatures. This can be explained by the simple fact that the selection pressure for increased stability is relaxed at mesophilic temperatures, meaning that proteins can accumulate destabilizing mutations until they reach marginal stability (4). However, high stability can come at the expense of increased conformational rigidity, and a protein adapted to be stable at high temperatures may not be flexible enough to perform its function at a lower temperature (36). Whether it was adaptational or simply drift, in the case of PyrR, “downhill” mutations lowered thermostability, while at the same time selection for maintaining the RNA-binding site and allosteric regulation acted continuously throughout evolution.
Interestingly, the accumulated “downhill” mutations caused small and cumulative changes sufficient to switch oligomeric state in the absence of mutations in the actual tetrameric interface. This change in the stability of the tetramer may have been an evolutionary by-product, demonstrating the power and importance of indirect and structurally allosteric mutations. We have shown that the change in oligomeric state occurred through an interplay of mutations impacting residue contact networks, inter-subunit geometry, intrinsic dynamics and thermostability. At the same time, for six out of the eleven mutations, we were able to estimate their relative contributions to each of these properties (Figure 6). Interestingly, the K84D mutation, which is in the part of the structure that seems to be disordered in some of the conformations, is predicted to affect all three properties simultaneously. G172Q and K181N are mutations in surface residues, predicted to significantly contribute to the change in thermostability. L68I and A118G are mutations in buried residues at the centre of a large residue-residue rewiring event in the transition from dimer to tetramer, both in evolution and ligand binding. Both residues are also part of a region of the protein with highly correlated dynamics. K62P is a mutation in a surface residue, weakly connected to the rest of the structure, but part of a region with highly correlated motions where a change has a significant effect on protein dynamics. Here we showed compellingly how mutations in residues outside the interface can introduce rearrangements that have a knock-on effect on the interface itself. We hope that the importance of mechanisms of allosteric mutations will become increasingly clear with the advancement of methods that accurately predict effects of mutations, as well as methods for engineering proteins with multiple functional conformations. 
Methods Ancestral sequence reconstruction To reconstruct the ancestral PyrR sequences between the dimeric BsPyrR and the tetrameric BcPyrR we retrieved all the PyrR protein sequences from UniProtKB, including the sequences of two outliers: PyrR from M. tuberculosis and T. thermophilus. We used MUSCLE (37) to calculate a multiple sequence alignment of the PyrR proteins (Figure S18). We performed Bayesian inference with MrBayes version 3.1 (38). The evolutionary tree topology, branch lengths and the sequences of ancestral nodes were calculated from a PyrR protein alignment by using an estimated fixed-rate evolutionary model. The gaps in the ancestral sequences were determined using the F81-like model for binary data implemented in MrBayes (39). Please refer to the Supplementary Materials for more details. Oligomeric state analysis by SEC-MALS We resolved the protein samples on a Superdex S-200 10/300 analytical gel filtration column (GE Healthcare), pre-equilibrated with 50 mM Tris pH 7.5, 150 mM NaCl, and 1 mM DTT, at 0.5 ml/min. We performed the measurements using an online Dawn Heleos II 18-angle light scattering instrument (Wyatt Technologies Corp.) coupled to an Optilab rEX online refractive index detector (Wyatt Technologies Corp.) in a standard SEC-MALS format. We used the ASTRA v5.3.4.20 software (Wyatt Technologies Corp.) to analyse the light scattering and differential refractive index data and to determine the absolute molecular mass from the intercept of the Debye plot using Zimm’s model (40). We determined the protein concentration from the excess differential refractive index based on a dn/dc of 0.186 ml/g. In order to determine the inter-detector delay volumes, band broadening constants and the detector intensity normalization constants for the instrument, we used BSA as a standard prior to sample measurement. X-ray crystallography All crystallisation trials were performed with 15-20 mg/ml of protein, and the sample buffer was supplemented with 10 mM MgCl2.
AncPURPLE crystallised with a 1.2-fold excess of UMP and BsPyrR with a 2-fold excess of GMP. AncGREENPyrR and BsPyrR were additionally supplemented with 400 mM (NH4)2SO4 in order to obtain crystals. We set up 100 nl protein drop crystallisation trials with the in-house LMB screen (41). We collected the X-ray diffraction data at the Diamond Synchrotron (Oxford, UK). The data were processed in the CCP4 suite (42). Please refer to Supplementary Material for details on crystallisation conditions and data processing. All the structures were solved using molecular replacement with Phaser (43), rebuilt with Coot (44), and refined with Refmac5 (45). Structural superpositions and inter-subunit geometry comparisons Inter-subunit geometries of all the PyrR structures (Figures 4 and S8) were compared as described previously in (5) and illustrated in Figure S1. We superimposed individual subunits using the sievefit approach described by Arthur Lesk, used notably in (46) and implemented more recently in the Bio3D package for R (47). With sievefit, subunits are structurally aligned using only residues that are superposable with an RMSD below 0.5 Å. These residues then define the structural core of the subunit with its corresponding centre of mass. We first sievefit only subunit A of the complex, and then re-sievefit the B subunit, noting how much the centre of mass needs to deviate from its original position (as an angle of rotation and vector of translation). Normal mode analysis We studied the intrinsic dynamics of all the PyrR proteins for which we had high quality structures (meaning solved to a high resolution and not having more than a few residues missing from the structure). These structures were: AncORANGEPyrR, VIOLETPyrR, PLUMPyrR, PURPLEPyrR with UMP, as well as the wild type B. subtilis PyrR with and without GMP.
We performed the calculations using the Cα atom elastic network model implemented in the Molecular Modeling ToolKit (48), on the dimeric units (dimers and halves of tetramers). The GMP and UMP ligands were modelled into the B. subtilis PyrR and the PURPLEPyrR structures by placing dummy nodes at the C4’, N9, and N1 or C4’, N1, and C4 positions of the bound nucleotides in both subunits, respectively. To compare the intrinsic dynamics of these structures, we used a structural alignment obtained from MUSTANG (49). The Bhattacharyya coefficient (BC) (30, 50) was used as a measure of similarity in flexibility, with a score from 0 (completely dissimilar) to 1 (identical). Furthermore, the correlation matrices were calculated from the 100 lowest frequency modes (51). (For more details, refer to Supplementary Material.) To identify the modes that contribute to the transition from the tetrameric state to the dimeric state, we performed a conformational overlap analysis of BsPyrR and BsPyrR+GMP, as well as of AncORANGEPyrR with PLUMPyrR and VIOLETPyrR, according to Reuter et al. (34). Here, we calculated overlaps between the modes of dimeric halves of tetramers and the structural difference vectors between the dimeric half of the tetramer and the corresponding dimer. Thermostability We estimated the thermostability of the PyrR proteins by measuring the circular dichroism (CD) spectrum of each protein from 210 to 260 nm over a range of temperatures (from 20 to 90 °C). We heated the proteins gradually and continuously (0.2 °C per minute) and collected the spectrum every 5 °C. The proteins were measured at an approximate concentration of 5 µM. All the measurements were done on a Chirascan™ CD Spectrometer (Applied Photophysics). Mean residue ellipticity for each protein at each 5 °C temperature point was calculated as the degrees of CD corrected by exact protein concentration and the length of the protein (number of amino acids). References 1. N. Tokuriki, D. S.
Tawfik, Stability effects of mutations and protein evolvability. Current opinion in structural biology 19, 596 (Oct, 2009). 2. S. J. Gould, R. C. Lewontin, The Spandrels of San Marco and the Panglossian Paradigm: A Critique of the Adaptationist Programme. Proceedings of the Royal Society B: Biological Sciences 205, 581 (Sep 21, 1979). 3. M. Kaltenbach, N. Tokuriki, Dynamics and constraints of enzyme evolution. Journal of experimental zoology. Part B, Molecular and developmental evolution, (Mar 13, 2014). 4. D. M. Taverna, R. A. Goldstein, Why are proteins marginally stable? Proteins 46, 105 (Feb 01, 2002). 5. T. Perica, C. Chothia, S. A. Teichmann, Evolution of oligomeric state through geometric coupling of protein interfaces. Proceedings of the National Academy of Sciences of the United States of America 109, 8127 (Jun 22, 2012). 6. M. Karplus, J. Kuriyan, Molecular dynamics and protein function. Proceedings of the National Academy of Sciences of the United States of America 102, 6679 (Jun 10, 2005). 7. S. Maguid, S. Fernandez-Alberti, G. Parisi, J. Echave, Evolutionary conservation of protein backbone flexibility. Journal of molecular evolution 63, 448 (Oct, 2006). 8. E. Juritz, N. Palopoli, M. S. Fornasari, S. Fernandez-Alberti, G. Parisi, Protein conformational diversity modulates sequence divergence. Molecular biology and evolution 30, 79 (Feb, 2013). 9. C. L. Turnbough, R. L. Switzer, Regulation of pyrimidine biosynthetic gene expression in bacteria: repression without repressors. Microbiology and molecular biology reviews : MMBR 72, 266 (Jul 01, 2008). 10. E. R. Bonner, J. N. D'Elia, B. K. Billips, R. L. Switzer, Molecular recognition of pyr mRNA by the Bacillus subtilis attenuation regulatory protein PyrR. Nucleic acids research 29, 4851 (Dec 01, 2001). 11. C. M. Jørgensen et al., pyr RNA binding to the Bacillus caldolyticus PyrR attenuation protein. Characterization and regulation by uridine and guanosine nucleotides. The FEBS journal 275, 655 (2008). 12. 
S. Maass et al., Efficient, global-scale quantification of absolute protein amounts by integration of targeted mass spectrometry and two-dimensional gel-based proteomics. Analytical chemistry 83, 2677 (May 01, 2011). 13. H. K. Savacool, R. L. Switzer, Characterization of the interaction of Bacillus subtilis PyrR with pyr mRNA by site-directed mutagenesis of the protein. Journal of bacteriology 184, 2521 (Jun, 2002). 14. M. J. Harms, J. W. Thornton, Analyzing protein structure and function using ancestral gene reconstruction. Current opinion in structural biology 20, 360 (Jul 01, 2010). 15. J. T. Bridgham, E. A. Ortlund, J. W. Thornton, An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. Nature 461, 515 (Sep 24, 2009). 16. L. I. Gong, M. A. Suchard, J. D. Bloom, Stability-mediated epistasis constrains the evolution of an influenza protein. eLife 2, e00631 (2013). 17. J. K. Hobbs et al., On the origin and evolution of thermophily: reconstruction of functional precambrian enzymes from ancestors of bacillus. Molecular biology and evolution 29, 825 (Mar 01, 2012). 18. U. J. Heinen, W. Heinen, Characteristics and properties of a caldo-active bacterium producing extracellular enzymes and two related strains. Archiv für Mikrobiologie 82, 1 (Jul 19, 1971). 19. P. D. Williams, D. D. Pollock, B. P. Blackburne, R. A. Goldstein, Assessing the accuracy of ancestral protein reconstruction methods. PLoS computational biology 2, e69 (Jul 23, 2006). 20. M. Robinson-Rechavi, A. Alibés, A. Godzik, Contribution of electrostatic interactions, compactness and quaternary structure to protein thermostability: lessons from structural genomics of Thermotoga maritima. Journal Of Molecular Biology 356, 547 (Mar 17, 2006). 21. S. Fukuchi, K. Nishikawa, Protein surface amino acid compositions distinctively differ between thermophilic and mesophilic bacteria. Journal Of Molecular Biology 309, 835 (Jul 15, 2001). 22. P. 
Chander et al., Structure of the nucleotide complex of PyrR, the pyr attenuation protein from Bacillus caldolyticus, suggests dual regulation by pyrimidine and purine nucleotides. Journal of bacteriology 187, 1773 (Apr 01, 2005). 23. D. R. Tomchick, R. J. Turner, R. L. Switzer, J. L. Smith, Adaptation of an enzyme to regulatory function: structure of Bacillus subtilis PyrR, a pyr RNA-binding attenuation protein and uracil phosphoribosyltransferase. Structure (London, England : 1993) 6, 337 (Apr 15, 1998). 24. G. Amitai et al., Network analysis of protein structures identifies functional residues. Journal Of Molecular Biology 344, 1135 (Dec 03, 2004). 25. V. Soundararajan, R. Raman, S. Raguram, V. Sasisekharan, R. Sasisekharan, Atomic interaction networks in the core of protein domains and their native folds. PLoS ONE 5, e9391 (2010). 26. A. J. Venkatakrishnan et al., Molecular signatures of G-protein-coupled receptors. Nature 494, 185 (Mar 14, 2013). 27. X. Zhang, T. Perica, S. A. Teichmann, Evolution of protein structures and interactions from the perspective of residue contact networks. Current opinion in structural biology, (Jul 25, 2013). 28. E. Dellus-Gur, A. Tóth-Petróczy, M. Elias, D. S. Tawfik, What Makes a Protein Fold Amenable to Functional Innovation? Fold Polarity and Stability Trade-offs. Journal Of Molecular Biology, (Apr 28, 2013). 29. K. Hinsen, A. J. Petrescu, S. Dellerue, M. C. Bellissent-Funel, G. R. Kneller, Harmonicity in slow protein dynamics. Chemical Physics 261, 25 (2000). 30. E. Fuglebakk, J. Echave, N. Reuter, Measuring and comparing structural fluctuation patterns in large protein datasets. Bioinformatics (Oxford, England) 28, 2431 (Oct 01, 2012). 31. J. Echave, Evolutionary divergence of protein structure: The linearly forced elastic network model. Chemical Physics Letters 457, 413 (2008). 32. A. Leo-Macias, P. Lopez-Romero, D. Lupyan, D. Zerbino, A. R. Ortiz, An analysis of core deformations in protein superfamilies. 
Biophysical journal 88, 1291 (Mar 01, 2005). 33. F. Raimondi, M. Orozco, F. Fanelli, Deciphering the deformation modes associated with function retention and specialization in members of the Ras superfamily. Structure (London, England : 1993) 18, 402 (Apr 10, 2010). 34. N. Reuter, K. Hinsen, J.-J. Lacapère, Transconformations of the SERCA1 Ca-ATPase: a normal mode study. Biophysical journal 85, 2186 (Oct, 2003). 35. F. Tama, Y. H. Sanejouand, Conformational change of proteins arising from normal mode calculations. Protein engineering 14, 1 (2001). 36. P. Závodszky, J. Kardos, Svingor, G. A. Petsko, Adjustment of conformational flexibility is a key event in the thermal adaptation of proteins. Proceedings of the National Academy of Sciences of the United States of America 95, 7406 (Jul 23, 1998). 37. R. C. Edgar, MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research 32, 1792 (2004). 38. F. Ronquist, J. P. Huelsenbeck, MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics (Oxford, England) 19, 1572 (Aug 12, 2003). 39. J. Felsenstein, Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of molecular evolution 17, 368 (1981). 40. B. H. Zimm, The Scattering of Light and the Radial Distribution Function of High Polymer Solutions. Journal of Chemical Physics 16, 1093 (Dec, 1948). 41. D. Stock, O. Perisic, J. Löwe, Robotic nanolitre protein crystallisation at the MRC Laboratory of Molecular Biology. Progress in biophysics and molecular biology 88, 311 (Jul, 2005). 42. M. D. Winn et al., Overview of the CCP4 suite and current developments. Acta crystallographica Section D, Biological crystallography 67, 235 (May, 2011). 43. A. J. McCoy et al., Phaser crystallographic software. Journal of applied crystallography 40, 658 (Aug 01, 2007). 44. P. Emsley, K. Cowtan, Coot: model-building tools for molecular graphics. 
Acta crystallographica Section D, Biological crystallography 60, 2126 (Dec, 2004). 45. G. N. Murshudov et al., REFMAC5 for the refinement of macromolecular crystal structures. Acta crystallographica Section D, Biological crystallography 67, 355 (May, 2011). 46. M. Gerstein, C. Chothia, Analysis of protein loop closure. Two types of hinges produce one motion in lactate dehydrogenase. Journal Of Molecular Biology 220, 133 (Jul 05, 1991). 47. B. J. Grant, A. P. C. Rodrigues, K. M. ElSawy, J. A. McCammon, L. S. D. Caves, Bio3d: an R package for the comparative analysis of protein structures. Bioinformatics (Oxford, England) 22, 2695 (Nov 01, 2006). 48. K. Hinsen, The molecular modeling toolkit: A new approach to molecular simulations. Journal of Computational Chemistry 21, 79 (Feb 30, 2000). 49. A. S. Konagurthu, J. C. Whisstock, P. J. Stuckey, A. M. Lesk, MUSTANG: a multiple structural alignment algorithm. Proteins 64, 559 (Aug 15, 2006). 50. E. Fuglebakk, N. Reuter, K. Hinsen, Evaluation of Protein Elastic Network Models Based on an Analysis of Collective Motions. Journal of Chemical Theory and Computation 9, 5618 (Dec 10, 2013). 51. T. Ichiye, M. Karplus, Collective motions in proteins: A covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. Proteins 11, 205 (Nov, 1991). Acknowledgments: The authors wish to thank Prof. Robert L. Switzer (University of Illinois) for the cDNA of B. subtilis and B. caldolyticus PyrR. We thank Eviatar Natan and Dominika Gruszka for practical help, and Christine Vogel, Edvin Fuglebakk, and Nobuhiko Tokuriki for helpful discussions and insights. We thank Dr. Kiyoshi Nagai for generous access to his laboratory. We also thank Emmanuel Levy, Joseph Marsh, Roman Laskowski, Merridee Wouters and Boris Lenhard for feedback on the manuscript. YK was supported by Nakajima Foundation. 
XZ is supported by an Early Postdoc Mobility Fellowship from the Swiss National Science Foundation (SNSF, Grant Number PBELP2_143538). JC is a Senior Wellcome Trust Research Fellow (grant number 095195). ST and NR are supported by Bergen Forskningsstiftelse. This work was supported by the Medical Research Council, a Lister Research Prize to SAT, and a Henry Wellcome Postdoctoral Fellowship to TP. Coordinates and structure factors have been deposited with the Protein Data Bank (entry codes 4P80, 4P81, 4P82, 4P83, 4P84, 4P86 and 4P3K). Figures Fig. 1 (A) Schematic representation of the pyrimidine operon attenuator system in Bacillus sp. The attenuator protein, PyrR, binds to the PyrR binding loop as a dimer. UMP allosterically promotes the binding of RNA, while addition of GMP decreases the affinity for RNA (11). Different Bacillus species live in different environments and are adapted to different optimal growth temperatures. (B) The phylogenetic tree of Bacillus PyrR proteins (inferred using Bayesian MCMC) shows the variety of optimal growth temperatures for different Bacillus species: Bacillus caldolyticus lives at temperatures higher than 70 °C and, at room temperature, its PyrR is a homotetramer. The optimal growth temperature of Bacillus subtilis is 25 °C, and at room temperature, its PyrR is in equilibrium between a homodimer and a homotetramer (illustrated as just a dimer for simplicity). Analysis of the reconstructed ancestral sequences shows that the change from a tetramer to a dimer occurred on the final (blue) branches of the tree, where 15 allosteric mutations (m3) turn a tetrameric AncGREENPyrR into a dimeric BsPyrR. A subset of eleven of those allosteric mutations (11/m3) also switches the oligomeric state in the context of the ancestral AncORANGEPyrR. Fig. 2 Analysis of PyrR oligomeric states.
Samples of AncORANGEPyrR, AncGREENPyrR, Bacillus subtilis PyrR (BsPyrR), BsPyrR + GMP, Bacillus caldolyticus PyrR (BcPyrR), BcPyrR+GMP, and VIOLETPyrR at varying concentrations were separated by size-exclusion chromatography prior to determination of the excess refractive index and multi-angle light scattering (SEC-MALS), from which the molecular masses were determined. Horizontal dashed lines represent the expected masses for monomeric, dimeric and tetrameric PyrR species, respectively. Fig. 3 Allosteric mutations affect oligomeric state and thermostability. (A) Summary of the effects of allosteric mutations on oligomeric state along the PyrR phylogenetic tree. (B) Two non-overlapping subsets of the eleven allosteric mutations (3/m3 mutations (PLUMPyrR), and 8/m3 mutations (MAGENTAPyrR)) are each enough to overcome the threshold and cause sufficient instability of the tetramer at 1 µM protein concentration. All eleven allosteric mutations (11/m3) together (VIOLETPyrR) have a larger effect on the stability of the tetramer than the 3/m3 and the 8/m3 individual subsets of mutations. The eleven allosteric mutations change oligomeric state only in the context of AncORANGEPyrR (or AncGREENPyrR), but not in the context of BcPyrR. This is due to the epistasis of (a subset of) m1 mutations over the m3 mutations. (C) Oligomeric state and thermostability are coupled in PyrR, but it is the thermophilic propensity of residues, not the oligomeric state, that determines thermostability. Thermal unfolding of PyrR homologues, inferred from circular dichroism at 222 nm at temperatures ranging from 30 to 90 °C (from 20 to 90 °C for BsPyrR and PURPLEPyrR). Loss of CD signal at 222 nm is interpreted as the loss of helicity. The circular dichroism (CD) signal is plotted as the mean residue ellipticity corrected for protein concentration. Fig.
4 (A) Change in oligomeric state, whether through evolutionary variation, functional allostery or engineering, is always coupled to the same difference in inter-subunit geometry. The three superimposed pairs of structures are: (i) Bacillus subtilis PyrR (BsPyrR, pdb:1a3c) dimers superimposed on Bacillus caldolyticus PyrR (BcPyrR, pdb:1non) tetramer; (ii) VIOLETPyrR dimers superimposed on AncORANGEPyrR tetramer; (iii) Bacillus subtilis PyrR (BsPyrR, pdb:1a3c) dimers superimposed on tetrameric Bacillus subtilis PyrR in complex with GMP. The inter-subunit geometry of free BsPyrR is incompatible with formation of the dihedral tetramer; however, GMP binding introduces a 10° inter-subunit geometric change and BsPyrR forms a tetramer. The VIOLETPyrR inter-subunit geometry is not compatible with the formation of the tetramer formed by AncORANGEPyrR, and this difference of conformation and oligomeric state is brought about by eleven allosteric mutations. (B) The tetrameric AncORANGEPyrR and dimeric VIOLETPyrR residue-residue contact networks differ by 15% of their contacts. Orientation of networks can be further explored in Supplementary Videos S1 and S2. L68I, K84D and A118G are central hubs and are involved in the majority of contact rewiring between the dimeric and tetrameric networks. Fig. 5 PyrR intrinsic dynamics and oligomeric state. (A) All PyrR proteins have similar intrinsic dynamics, but the three dimeric proteins are more similar than the tetramers. This difference is most pronounced when comparing the sets of dimeric and tetrameric interface residues with correlated dynamics specific for the dimeric VIOLETPyrR or the tetrameric AncORANGEPyrR.
(B) Both structures are represented by only their Cα atoms, connected by green or yellow edges if at least one of the residues is involved either in the dimeric or the tetrameric interface and only if the pair of residues is moving in a concerted, correlated manner either only in the dimeric VIOLETPyrR (yellow edges) or only in the tetrameric AncORANGEPyrR (green edges). The residues corresponding to the eleven allosteric mutations (11/m3) are coloured in red. The sets of residues with correlation differences shown here have a cluster size of more than three, and fall within the correlation difference threshold of 0.1. (Both threshold values were chosen for the sake of clarity; please see Figures S16 and S17 for a more exhaustive analysis of the correlation differences). Fig. 6 Summary of mutational mechanisms. A small number of allosteric mutations are responsible for the evolutionary difference in oligomeric state, thermostability and dynamics of PyrR homologues. We show the mechanism(s) by which each mutation acts, and summarize similar mutations using the same colour. Supplementary Materials: Figures S1 - S18 Tables S1 and S2 Videos S1 – S4 Supplementary Materials and Methods References 52-68 Figure S1 BcPyrR is in a conformation compatible with the formation of a homotetramer with dihedral symmetry. Conformation of BsPyrR dimer (rotated 8° around the dimeric interface) is not compatible with the BcPyrR tetramers, as the tetrameric interface helices are approximately 5 Å apart. Conformational change of relative positions of subunits (their centres of mass) around the dimeric interface (described as a rotation of approx. 8°). We compared the inter-subunit geometries of PyrR homologues using sievefit, an approach where subunits are structurally aligned using only residues that are superposable with an RMSD below 0.5 Å.
We first sievefit only subunit A of the complex, and then re-sievefit the B subunit, noting how much the centre of mass needs to deviate from its original position (as an angle of rotation and vector of translation). Figure S2 Bacillus subtilis PyrR surface residues coloured by their Rate4Site score (52). Residues with lower Rate4Site score are more conserved (i.e. evolve more slowly, have lower evolutionary rate). The surface on the right-hand side is more conserved and corresponds to the tetrameric interface helix (highlighted with a purple rectangle), as well as the putative RNA loop binding site. Figure S3 SEC-MALS analysis of PyrR from B. subtilis and B. caldolyticus in the presence of UMP. Figure S4 Analytical ultracentrifugation (AUC) analysis of the distribution of oligomeric species of PyrR proteins. We show only the c(s) distributions at 2 µM for the majority of samples for clarity of lowly populated species. Figure S5 Circular dichroism (CD) spectra for PyrR proteins. CD spectra for B. subtilis PyrR and B. caldolyticus PyrR at a range of temperatures. B. caldolyticus PyrR unfolds cooperatively at approx. 75 °C, while the CD spectrum of B. subtilis PyrR changes differently for different wavelengths. Plotting change in ellipticity with temperature for different wavelengths shows that while BcPyrR and VIOLETPyrR lose ellipticity for all wavelengths cooperatively, BsPyrR and PURPLEPyrR only partially lose ellipticity, albeit at lower temperatures. Figure S6 Evolutionary change in thermophilic propensity in the PyrR family. (A) Thermophilic propensity of a residue is defined as a log ratio of amino acid frequencies in proteins from thermophilic versus proteins from mesophilic organisms. Amino acid frequencies in thermophilic and mesophilic organisms were calculated in (21). (B) Thermophilic propensity increases in the PyrR family towards the thermophilic BcPyrR and decreases towards the mesophilic BsPyrR.
(C) Eleven allosteric mutations contribute significantly to the decrease in thermophilic propensity between AncGREENPyrR and BsPyrR. According to the calculated thermophilic propensity, mutations in three surface residues (G172Q, K181N and K84D) contribute most towards the decrease in thermostability. Figure S7 Binding of nucleotides to PyrR. UMP and GMP both bind to the same nucleotide binding pocket. Our crystal structures also show a second GMP binding site, stacking between the subunits. Most X-ray crystal structures of PyrR homologues have a sulphate ion (SO4) bound in the nucleotide pocket, if crystallised without the nucleotides. The binding sites of UMP and GMP overlap very well, and SO4 binds close to the phosphate of the nucleotides. Figure S8 Inter-subunit geometry structural comparisons of extant, ancestral and engineered PyrR proteins. Although there are 23 substitutions between AncORANGEPyrR and BcPyrR (which is almost half of the mutations between BsPyrR and BcPyrR), the two proteins have very similar inter-subunit geometries. The further 19 substitutions, between AncORANGEPyrR and AncGREENPyrR, introduce more variation in the inter-subunit geometry, as well as a slight shift in the elution peak in SEC-MALS (Figure 2). Also shown is AncGREENPyrR, crystallised in a dimeric crystal form. Its inter-subunit geometry, however, still exhibits an 8° rotation difference from BsPyrR, and structural superposition of AncGREENPyrR with the tetrameric BcPyrR shows that AncGREENPyrR inter-subunit geometry is compatible with tetramer formation, unlike that of BsPyrR. The engineered dimeric VIOLETPyrR has a very similar inter-subunit geometry to BsPyrR, incompatible with tetramer formation. It is important to note that the packing in different crystal forms is not sufficient to explain the differences in the inter-subunit geometry we observe. All PyrR dimers crystallised in a C121 (or I121) form, but so did AncGREENPyrR and the tetrameric BsPyrR with GMP.
Also, in our previous work we controlled for the variation due to the crystal packing in all the families analysed, including PyrR (5). For all the cases where a change in inter-subunit geometry could explain the difference in oligomeric state, the difference in inter-subunit geometry between dimers and tetramers was larger than the difference between different crystal forms. Figure S9 Structural backbone superpositions of single subunits of different extant, ancestral and engineered PyrR proteins. All versions of the individual subunits superpose very well, illustrating the close similarity between their individual folds. Figure S10 Method for comparing structures using residue-residue networks. Figure S11 Residue-residue contact network differences between AncORANGEPyrR and VIOLETPyrR. Figure S12 Eleven allosteric mutations and structural changes. For each residue in the dimeric PyrR structures we calculated the number of rewired contacts in its subnetwork (radius 2, see Figure S10). The kernel density plot shows the distribution of the number of rewired contacts for all ~360 residues of the dimer (gray distribution), and the distribution for only residues buried in the protein interior (plotted in black). The distributions change depending on which sets of structures are compared, for example, comparison of dimers (VIOLETPyrR with PLUMPyrR and BsPyrR) shows on average much less residue-residue contact rewiring than the comparison of VIOLETPyrR and AncORANGEPyrR. 48 to 56 out of 360 (13-16 %) residues show contact rewiring more than one standard deviation above that of an average residue (number shown in parentheses under the 1σ label) and thus form the region that undergoes significant structural change. Out of the residues changed by the eleven allosteric mutations, residues K/D 84, L/I 68 and A/G 118 fall into this group.
Figure S13 Comparison of the residue-residue contact rewirings in which the eleven allosteric mutations are involved, for three levels of residue-residue contact shells. One shell of contacts considers only residues that are in direct contact. Three out of the eleven allosteric residues (K/D 84, L/I 68, and A/G 118) show a significant level of residue-residue contact rewiring when considering both one and two shells of contacts. Figure S14 Hierarchical clustering of PyrR structures based on their intrinsic dynamics compared to the clustering based on RMSD. The left panel shows the clustering based on the intrinsic dynamics as quantified by the BC score (refer to Figure 5A in the main text), where high BC score denotes higher similarity in intrinsic dynamics. The right panel shows the RMSD obtained from the multiple structure alignment performed to determine the corresponding Cα atoms between the structures in Cartesian coordinate space, where low RMSD denotes high structural similarity. Both measures agree well with each other, even though they are comparing different properties. The BC provides a pairwise comparison of the covariances calculated from the normal mode vectors of each structure, from the regions that correspond in all of the structures, as they encode the changes in dynamics that are conferred by the static change in atomic positions quantified by the RMSD. Figure S15 Overlap between the conformational transition from tetrameric to dimeric state, and the calculated normal modes. The left column (A and C) corresponds to the squared overlap of the first two hundred normal modes of the dimeric units of the tetramers with the structural difference vectors between the tetrameric and dimeric conformations, while the right column (B and D) shows their respective cumulative squared overlap plots for all the normal modes calculated.
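The squared overlap and cumulative squared overlap referred to in this caption can be illustrated with a minimal sketch. It assumes the commonly used definition of overlap as the normalised projection (cosine) of a normal mode onto the structural difference vector; the function names and the toy vectors are hypothetical and are not taken from the original analysis.

```python
import math

def squared_overlap(mode, diff):
    """Squared cosine of the angle between a mode vector and a difference vector."""
    dot = sum(m * d for m, d in zip(mode, diff))
    norm_mode = math.sqrt(sum(m * m for m in mode))
    norm_diff = math.sqrt(sum(d * d for d in diff))
    return (dot / (norm_mode * norm_diff)) ** 2

def cumulative_squared_overlap(modes, diff):
    """Running sum of squared overlaps, with modes ordered from lowest frequency up."""
    total = 0.0
    running = []
    for mode in modes:
        total += squared_overlap(mode, diff)
        running.append(total)
    return running

# Toy example (assumed data): three orthonormal "modes" and one difference vector.
modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
diff = [3.0, 4.0, 0.0]

overlaps = [squared_overlap(m, diff) for m in modes]
cumulative = cumulative_squared_overlap(modes, diff)
print(overlaps)
print(cumulative)
```

For a complete set of orthonormal modes the cumulative squared overlap reaches 1, so the rate at which the curve rises indicates how far the transition is captured by the lowest-energy modes.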
(A) The overlap of BsPyrR + GMP with BsPyrR shows that the second lowest energy normal mode is the top contributing normal mode (0.44 overlap) to the transition between the two oligomeric states. (B) The cumulative plot of the BsPyrR + GMP and BsPyrR shows that close to 70% of this transition overlaps with the three lowest energy modes. (C) A similar trend is observed between AncORANGEPyrR and VIOLETPyrR, with the second lowest energy normal mode being the top contributor (0.14). (D) The cumulative overlap between AncORANGEPyrR and VIOLETPyrR shows a much gentler ascent, with more than half of the transition lying within the 100 lowest energy normal modes. Figure S16 Dynamics of PyrR in evolution and function. (A) Normalised atomic displacement of Mode 2 of two tetrameric PyrRs: AncORANGEPyrR (orange) and BsPyrR + GMP (blue). The displacement is plotted as the residue index according to the structural alignment (x axis) versus the normalised atomic displacement (y axis). Videos S3 and S4 illustrate the corresponding structural change. The positions of the dimeric and the tetrameric interface in both subunits are indicated as dark blue and dark green dots, respectively. This low energy mode shows that the displacement of the tetrameric interface is greater than that of the dimeric interface, in both subunits. (B) Correlation matrix heatmap of BsPyrR. The motions of individual residues in the normal modes of proteins can range from being highly anti-correlated (-1, blue) to highly correlated (1, red), with 0 representing no correlation. Both the dark red and dark blue colours should be interpreted as regions with high correlations. We observe high intra- and inter-subunit correlations across the structure. The latter are highlighted by the large box on the top left part of the map.
The tetrameric helix (small black box) correlates particularly well with other secondary structure elements, right up to the elements of the second subunit (highlighted with the dashed lines extended from the small square box). We infer that changes in any of those parts of the structure could affect the tetrameric helices. (C) Correlation difference heatmaps of tetrameric interface residues in dimeric halves of tetramers versus the dimers: PURPLEPyrR - PLUMPyrR+UMP, AncORANGEPyrR – VIOLETPyrR, and BsPyrR+GMP - BsPyrR. We observe higher correlations between the two tetrameric interfaces in tetramers than in the dimers. The pale red patch on the plot indicates a gain of above 0.1 in correlation in the tetramers, and includes the region corresponding to the tetrameric helix in both subunits. The tetrameric helix region is indicated by dotted lines at residue alignment positions 12 – 29 (first monomer) and 197 – 216 (second monomer).

Figure S17 Statistical analysis of clusters of correlation differences between the dimeric units of AncORANGEPyrR and VIOLETPyrR (A). The clusters are chosen such that each possesses a minimum size of 1 pair of residues with a correlation difference score at or above +0.05, or below -0.05, reflecting a gain in correlation for the first and second structures, respectively. The kernel density plot shows the distributions of the correlation difference clusters in total (black), of clusters that involve the mutations (red), and of clusters without mutations (blue). The distribution shifts significantly when clusters of correlation differences that include the mutated amino acids are considered in relation to all the clusters collected in the analysis, where the peak of the mutations distribution increases to a cluster size of 7 amino acids. The box-and-whisker plots show the range of sizes (y-axis) of the clusters that reflect a gain in correlation in one structure compared to the other structure, for each of the mutated positions.
(B) AncORANGEPyrR (left panel, orange) and VIOLETPyrR (right panel, violet), (C) AncORANGEPyrR (left panel, orange) and PLUMPyrR (right panel, plum) and (D) VIOLETPyrR (left panel, violet) and PLUMPyrR (right panel, plum). The extreme outliers were excluded in all cases. Nevertheless, both the positions Q/N and V/I in (B) and the corresponding Q/Q and V/V positions in (C) are associated with the largest cluster (of size 628) due to their proximity to the tetrameric interface, as shown by the large pink patches in Fig. S16C. The tetramer AncORANGEPyrR gains a greater number of correlation difference clusters that are larger in size than the dimers VIOLETPyrR and PLUMPyrR, with significant contributions from the three mutated positions K/P, L/I and A/G. The difference between the dimers VIOLETPyrR and PLUMPyrR exhibits smaller clusters compared to the difference between AncORANGEPyrR and the dimers. PLUMPyrR is consistently associated with smaller clusters of correlation gain when compared to AncORANGEPyrR.

Figure S18 Multiple sequence alignment produced by MUSCLE (37). We reconstructed ancestral sequences based on this alignment with MrBayes (version 3.1) (38).

Figure S19 Multiple protein sequence alignment of BsPyrR, BcPyrR and all inferred ancestral as well as engineered PyrR proteins described in this study. PyrR proteins start with a Met residue, but due to the His-tag purification and His-tag cleavage with thrombin, all the biophysical and structural experiments were performed with PyrR proteins that have a Gly-Ser instead of a Met at the beginning of the sequence. Residue numbers in all the figures and throughout the text, as well as in the files deposited in the PDB, are according to this alignment.

Table S1 Sedimentation coefficients s(20, w) from velocity sedimentation AUC experiments. The s(20, w) value is a sedimentation coefficient corrected for viscosity and density of the solvent, relative to that of water at 20 °C.
Values are shown as means with their standard deviations.

Table S2 Crystallographic data collection and refinement statistics
a) Merging R factor: $R_{\mathrm{merge}} = \sum_{hkl}\sum_{i} \left| I_i(hkl) - \langle I(hkl) \rangle \right| \, / \, \sum_{hkl}\sum_{i} I_i(hkl)$
b) Calculated in Refmac (45)
c) Calculated in MolProbity (53)

Supplementary Videos 1 and 2 – residue-residue contact networks
Supplementary Videos 1 and 2 show a 720° view of subunits A and B of AncORANGEPyrR and VIOLETPyrR represented as residue-residue contact networks. Contacts conserved between these two crystal structures are shown in pale violet, and the ones specific for AncORANGEPyrR and VIOLETPyrR in orange and violet, respectively. For orientation, the subunits C and D of the AncORANGEPyrR tetramer are shown in a cartoon representation.

Supplementary Videos 3, 4 and 5 – normal mode analysis
Applying Mode 2 to the trace of the X-ray structures of AncORANGEPyrR and of the BsPyrR + GMP complex, we generated 50 conformations along Mode 2 to produce a movie illustrating the associated displacement. Note that the amplitude of the movement is chosen arbitrarily, as the elastic network model predicts only the directions and not the amplitudes of movements.

Video S3 – Animation of the AncORANGEPyrR dimeric units transformed along Mode 2. Mode 2 is the second lowest frequency mode obtained from AncORANGEPyrR, which was found to be the largest contributor to the conformational transition from tetramer to dimer in Figure S15. The tetrameric helices (in cyan) are part of larger amplitude movements with respect to the rest of the structure, as also shown in the normalised atomic displacement plot in Figure S16A.

Video S4 – In this video the dynamics animation is obtained in the same way as in Video S3, but with both dimeric units of AncORANGEPyrR shown moving towards and away from each other as they transform along the mode, using 100 conformations for each unit.

Video S5 – Animation of the BsPyrR + GMP dimeric units transformed along Mode 2.
Mode 2 is the second lowest frequency mode obtained from BsPyrR + GMP, which was found to be the largest contributor to the conformational transition from tetramer to dimer in Figure S15. The tetrameric helices (in cyan) are among the regions displaced the most along this mode, as also shown in the normalised atomic displacement plot in Figure S16A.

Supplementary Methods

Ancestral sequence reconstruction
We performed the Bayesian inference using MrBayes version 3.1 (38). The evolutionary tree topology, branch lengths and the sequences of ancestral nodes were calculated from a PyrR protein alignment using an estimated fixed-rate evolutionary model and PyrR from Mycobacterium tuberculosis for rooting the tree. MrBayes version 3.1 has nine default fixed rate models: Dayhoff (54), mtREV (55), MtMam (56), WAG (57), RtREV (58), CpREV (59), VT (60) and Blosum62 (61). In our analysis we did not determine the fixed rate model a priori, but allowed MrBayes to jump between the models during the calculation until it converged, with each of the models contributing in proportion to its posterior probability. For all the analyses, both those estimating the evolutionary tree and those calculating the ancestral sequences, the model with the highest posterior probability was Blosum62, followed by the WAG model. Each analysis was performed in four independent runs with identical settings, and each run went through 1 000 000 generations, with the Markov chain Monte Carlo (MCMC) chain being sampled every 100 generations. When inferring sequences of ancestral nodes, each node is calculated separately, as suggested by the authors of MrBayes (v. 3.1) (38), in order to integrate the uncertainty in the rest of the tree. The program provides the probability of each amino acid for each state (i.e. position in the sequence).
The most rigorous way of inferring the ancestral sequence using the Bayesian principle would be to sample a variety of likely ancestral sequences based on the posterior probabilities for each position. However, as a series of low-throughput experiments had to be performed on each of the ancestral proteins, we chose to test experimentally the most likely sequence, i.e. a sequence where each position has the amino acid with the highest posterior probability. All of the ancestral sequences selected in that way expressed and purified as well as the extant proteins. MrBayes (v 3.1) treats alignment gaps as missing data. In practice, this means the inferred sequence will be as long as the multiple sequence alignment used to infer it. MrBayes (v 3.1), therefore, implements a simple F81-like (39) model for binary data, which can be used to infer gap positions. The F81 model assumes that all the sites in the sequence are independent, and that the probability of change from i to j is proportional to the frequency of state j. In this simplified binary model, there are only two states, 1 representing a gap, and 0 representing the absence of a gap. The entire multiple sequence alignment was translated into a binary form (gaps (1), and absence of gaps (0)) and MrBayes (v3.1) was run for each node under the binary model. As in the case of inferred ancestral protein sequences, the program outputs a probability of a gap for each sequence position. We combined the binary and amino acid ancestral sequences in order to obtain ancestral protein sequences of the correct length.

Protein cloning, expression and purification
The cDNA of the wild type B. subtilis and B. caldolyticus PyrR was a gift from Prof. Robert L. Switzer from the University of Illinois. Ancestral PyrR sequences were synthesized by GeneArt (Life Technologies), and we obtained the point mutants using the QuikChange protocol.
We cloned all the constructs into a pRSET vector with a C-terminal His-tag and expressed them in OverExpress™ C41 (DE3) cells. The lysis buffer contained 50 mM Tris pH 7.5, 300 mM NaCl, 1 M urea, and 2 mM β-mercaptoethanol. The proteins were eluted from the Ni-NTA column with 250 mM imidazole, and the His-tag was cleaved with thrombin overnight at room temperature during dialysis into a buffer with 0.5 M urea (50 mM Tris pH 7.5, 300 mM NaCl and 5 mM β-mercaptoethanol). We then additionally purified the proteins on a size exclusion column (HiLoad 26/60 Superdex 200) and eluted them with a buffer containing 50 mM Tris pH 7.5, 150 mM NaCl and 1 mM DTT. Before setting up crystallization trays we additionally purified the proteins on a MonoQ column (GE Healthcare) and eluted them using a gradient of 50 mM to 1 M NaCl.

Analytical ultracentrifugation
Purified proteins were subjected to analytical ultracentrifugation using an Optima XL-I analytical ultracentrifuge (Beckman) at concentrations ranging from 2 to 40 µM. Velocity sedimentation was carried out at 45,000 rpm at 10 °C in 50 mM Tris, pH 7.5, 150 mM NaCl, and 1 mM DTT using 12 mm double sector cells in an An60Ti rotor. The sedimentation coefficient distribution function, c(s), was analyzed using the Sedfit program, version 13.0 (62), with floated frictional ratios (f/fo) of between 1.27 and 1.32. Masses of sedimenting species were calculated assuming a constant f/fo. The partial-specific volume (v-bar), solvent density and viscosity were calculated using Sednterp (Dr. Thomas Laue, University of New Hampshire).

Crystallisation conditions
AncORANGEPyrR, PLUMPyrR, VIOLETPyrR and BsPyrR supplemented with a two-fold excess of GMP were crystallized in Cryo 1 screen (Emerald BioStructures) condition 45, Cryo 2 screen (Emerald BioStructures) condition 5, Cryo 2 screen condition 39, and CS Cryo (Hampton Research) condition 9, respectively. The crystals were flash-frozen in liquid nitrogen before data collection.
BsPyrR was crystallised in the Wizard 2 screen (Emerald BioStructures) condition 38 and cryo-protected by soaking in the crystallization buffer supplemented with 25% glycerol before flash freezing. PURPLEPyrR supplemented with a 1.2-fold excess of UMP was crystallized in CS Lite screen (Hampton Research) condition 6 and cryo-protected by soaking in crystallization buffer supplemented with 30% glycerol before flash freezing. AncGREENPyrR was crystallized in 1.2 M ammonium sulfate, 0.08 M Na acetate pH 4.8, 20% glycerol. Diffraction data were collected at Diamond Light Source beam lines I02 and I04-1. AncORANGEPyrR, AncGREENPyrR, PURPLEPyrR, and VIOLETPyrR data were manually integrated with iMosflm (63) and scaled with Aimless (64). PLUMPyrR and BsPyrR data were processed with CCP4 (42), Pointless (64), XDS (65), and xia2 (66). A structure of the PyrR monomer (PDB ID: 1a3c) was used as a search model for molecular replacement with Phaser (43). The model was rebuilt using Coot (44) and the structure was refined with Refmac (45).

Thermophilic propensity
We used the frequencies of amino acids in proteins from mesophilic and thermophilic organisms from (21) and calculated the thermophilic propensity of an amino acid as its frequency in thermophilic proteins divided by its frequency in mesophilic proteins. In order to estimate the change of thermophilic propensity between two structures, for each mutation we subtracted the propensity of the amino acid in the second structure from that of the amino acid in the first. Fukuchi and Nishikawa (21) have shown that differences in propensity are more pronounced when considering only surface residues, rather than all the residues of the protein. We defined surface residues based on the percentage of accessible surface area of the residue in the structure, using thresholds defined by Levy (67).
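The propensity calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the treatment of gaps, and the optional surface filter are hypothetical, and the frequencies in the demo are made-up placeholders rather than the values from Fukuchi and Nishikawa (21).

```python
def thermophilic_propensity(freq_thermo, freq_meso):
    """Propensity of each amino acid: its frequency in thermophilic proteins
    divided by its frequency in mesophilic proteins."""
    return {aa: freq_thermo[aa] / freq_meso[aa] for aa in freq_thermo}


def propensity_changes(seq_first, seq_second, propensity, surface_positions=None):
    """Per-mutation change in propensity between two aligned sequences.

    For every aligned position where the residues differ, return
    (position, propensity[first] - propensity[second]).
    Assumptions (not from the source): sequences are pre-aligned and of equal
    length, gap columns ('-') are skipped, and `surface_positions`, if given,
    is a set of 0-based alignment positions classified as surface residues.
    """
    if len(seq_first) != len(seq_second):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for pos, (a, b) in enumerate(zip(seq_first, seq_second)):
        if a == b or a == "-" or b == "-":
            continue  # not a substitution
        if surface_positions is not None and pos not in surface_positions:
            continue  # restrict to surface residues when requested
        changes.append((pos, propensity[a] - propensity[b]))
    return changes


if __name__ == "__main__":
    # Placeholder frequencies for four amino acids only (illustrative values).
    thermo = {"K": 0.08, "E": 0.09, "Q": 0.02, "A": 0.08}
    meso = {"K": 0.05, "E": 0.06, "Q": 0.04, "A": 0.08}
    prop = thermophilic_propensity(thermo, meso)
    for pos, delta in propensity_changes("KEQA", "QEAA", prop):
        print(pos, round(delta, 3))
```

Returning one value per mutation, rather than a single total, leaves the choice of how to aggregate (sum, mean, surface-only subset) to the caller, since the text does not specify it.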
Comparison of residue-residue contact networks
We define a residue-residue contact as a pair of residues in an X-ray crystal structure that have at least one pair of heavy atoms whose distance is less than the sum of their van der Waals radii (as defined in (68)), allowing for a 0.5 Å error to accommodate the uncertainties in atomic positions. Each residue is represented as a single node (illustrated by a Cα atom in Figures 4, 5, S10, and S11), but the existence of a residue-residue contact was determined based on the distances between all heavy atoms, in both residue backbones and side chains. When we compared a pair of structures, different contacts were those that existed between a pair of residues in one structure but not in the other. In cases where we compared sets of structures (in Figures S12 and S13), different contacts were the ones conserved in all structures from the first group and not existing in any of the structures from the second group. To estimate the impact each individual mutation has on the residue-residue contact network (i.e. structure), we counted the number of different contacts from the first, first two, and first three shells of residue-residue contacts around the residue of interest (Figures S12 and S13). An average residue (when considering both common and different contacts between AncORANGEPyrR and VIOLETPyrR structures) has 8.7 ± 5.5 first shell contacts (residues making direct residue-residue contacts), 48 ± 26 first+second shell contacts, and 142 ± 65 first+second+third shell contacts. An average PyrR dimer between AncORANGEPyrR and VIOLETPyrR has 974 residue-residue contacts (or 443 per monomer). That means that three shells of contacts around an average residue cover roughly a third of the residue-residue contacts in a monomer (142/443 ≈ 0.32). At the same time, considering only the first shell of residue-residue contacts can be misleading, for example in the case of residues on the surface, or in weakly connected regions, that have more flexible side chains.
They can rewire a relatively large number of contacts but this change might not propagate through the structure. We used the igraph package (http://igraph.sourceforge.net) for R for all network analyses.

Algorithm for identifying correlation differences between protein structures
We constructed correlation matrices from the normal modes as described by Ichiye and Karplus (51). The matrix elements have values between -1 and +1 for each pair of residues in a structure. Values of -1 and +1 indicate pairs with highly correlated displacements in opposite and parallel directions, respectively. A value of zero means no correlation. We compared the matrices of two proteins by subtracting the absolute values of corresponding residue pairs according to the structural alignment. The resulting difference correlation matrices (ΔC) for two proteins inform us about gains and losses of correlations in one structure compared to the other. For example, when comparing a tetramer (T) with a dimer (D), D is subtracted from T (T-D); thus, positive values in the difference map correspond to gains of correlation in the tetramer and negative values to losses of correlation in the tetramer (all relative to the dimeric units). The procedure used to define the clusters consists of parsing the ΔC matrix to find regions (groups of pairs of neighbouring residues) that undergo a gain or loss of correlations. When ΔC is plotted, these regions appear as red or blue ‘patches’. In practice, the search for clusters is performed iteratively as follows:
1) Starting from an amino acid of interest (e.g. a mutation point), one iterates over all its correlation values with other amino acids. On the plot of the ΔC matrix, this means starting from the amino acid at position j on the Y-axis and moving along the X-axis through all the points that have (Yj, X(1 to N)) as coordinates, N being the total number of amino acids.
2) A point (a pair of amino acids) with a value above the chosen threshold (for example, 0.1) is stored in a list, and the search moves from this point by 1 position along the X-axis, to the right and to the left (if that point has not been visited before), and along the Y-axis (up and down). If the values of these four pairs are above the threshold, the pairs are stored in the cluster list and the search continues in all four directions from each of them.
3) The search grows in this way, and stops in a given direction when a ΔCij score below the predefined threshold is reached. When the boundary of a given cluster has been explored, one continues iterating along the X-axis to find other clusters involving the amino acid at position j (the next red or blue ‘patch’ on the map). The coordinates of the clusters previously defined are stored to avoid visiting the same cluster more than once.
The clusters are selected using two parameters: i) the minimum cluster size (number of points in a cluster), and ii) the minimum score. Only unique clusters are retained in this search, where uniqueness is judged by length (if the same points appear together with additional neighbouring points, only the larger cluster is retained) and by the indexing of the points (for example, a smaller cluster is kept if its points differ from the ones collected before). The threshold value is chosen so that the statistical cluster analysis captures the correlation differences revealed by the difference between the correlation maps of the two structures compared. For example, for the statistical analysis, the threshold of above +0.05 or below -0.05 fully samples the pink patches that define the largest correlation difference in the tetrameric interface regions shown in Figure S16C. In general, the differences in correlation usually do not exceed 0.15 for pairs of residues that are away from the diagonal of the correlation matrix, thus 0.05 is a reasonable cut-off for sampling them.
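The cluster search described in steps 1 to 3 is essentially a four-neighbour flood fill on the ΔC matrix, seeded from the row of the residue of interest. The following Python sketch illustrates that idea under stated assumptions; it is not the authors' implementation. The function names, the use of plain nested lists, the breadth-first traversal order, and the exact inequality conventions (at or above +threshold for gains, below -threshold for losses) are all choices made here for illustration.

```python
from collections import deque


def difference_matrix(corr_first, corr_second):
    """Delta-C: element-wise |C_first| - |C_second| for two aligned,
    equally sized correlation matrices (nested lists)."""
    return [[abs(a) - abs(b) for a, b in zip(row1, row2)]
            for row1, row2 in zip(corr_first, corr_second)]


def clusters_for_residue(delta, j, threshold=0.05, min_size=1, gain=True):
    """Find connected patches of Delta-C that touch row j.

    A point (y, x) belongs to a patch if its value is at or above +threshold
    (gain=True) or below -threshold (gain=False). Patches are grown in the
    four directions (up, down, left, right) and returned as sorted lists of
    (y, x) index pairs with at least `min_size` points.
    """
    n = len(delta)

    def passes(y, x):
        value = delta[y][x]
        return value >= threshold if gain else value < -threshold

    visited = set()   # points already assigned to a patch
    clusters = []
    for x0 in range(n):           # step 1: scan along row j
        if (j, x0) in visited or not passes(j, x0):
            continue
        queue = deque([(j, x0)])
        visited.add((j, x0))
        patch = []
        while queue:              # steps 2 and 3: grow until the boundary
            y, x = queue.popleft()
            patch.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < n and 0 <= nx < n
                if inside and (ny, nx) not in visited and passes(ny, nx):
                    visited.add((ny, nx))
                    queue.append((ny, nx))
        if len(patch) >= min_size:
            clusters.append(sorted(patch))
    return clusters
```

Because every visited point is recorded, a patch that touches row j at several places is reported only once, which corresponds to storing the coordinates of previously defined clusters. Calling the function once with `gain=True` and once with `gain=False` separates gains of correlation in the first structure from gains in the second.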
References
52. T. Pupko, R. E. Bell, I. Mayrose, F. Glaser, N. Ben-Tal, Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics (Oxford, England) 18 Suppl 1, S71 (2002).
53. V. B. Chen et al., MolProbity: all-atom structure validation for macromolecular crystallography. Acta crystallographica Section D, Biological crystallography 66, 12 (Feb, 2010).
54. R. M. Schwartz, M. O. Dayhoff, Chapter 22: A model of evolutionary change in proteins. In Atlas of protein sequence and structure, (1978).
55. J. Adachi, M. Hasegawa, Model of amino acid substitution in proteins encoded by mitochondrial DNA. Journal of molecular evolution 42, 459 (May, 1996).
56. Y. Cao et al., Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. Journal of molecular evolution 47, 307 (Sep, 1998).
57. S. Whelan, N. Goldman, A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Molecular biology and evolution 18, 691 (Jun, 2001).
58. M. W. Dimmic, J. S. Rest, D. P. Mindell, R. A. Goldstein, rtREV: an amino acid substitution matrix for inference of retrovirus and reverse transcriptase phylogeny. Journal of molecular evolution 55, 65 (Jul, 2002).
59. J. Adachi, P. J. Waddell, W. Martin, M. Hasegawa, Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA. Journal of molecular evolution 50, 348 (May, 2000).
60. T. Müller, M. Vingron, Modeling amino acid replacement. Journal of computational biology : a journal of computational molecular cell biology 7, 761 (2000).
61. S. Henikoff, J. G. Henikoff, Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America 89, 10915 (Nov 15, 1992).
62. P. Schuck, Size-distribution analysis of macromolecules by sedimentation velocity ultracentrifugation and Lamm equation modeling. Biophysical journal 78, 1606 (Apr, 2000).
63. A. Leslie, H. R. Powell, Processing diffraction data with MOSFLM. Evolving methods for macromolecular crystallography, (2007).
64. P. Evans, Scaling and assessment of data quality. Acta crystallographica Section D, Biological crystallography 62, 72 (Feb, 2006).
65. W. Kabsch, XDS. Acta crystallographica Section D, Biological crystallography 66, 125 (Mar, 2010).
66. G. Winter, xia2: an expert system for macromolecular crystallography data reduction. Journal of applied crystallography 43, 186 (Dec 01, 2009).
67. E. D. Levy, A simple definition of structural regions in proteins and its use in analyzing interface evolution. Journal of molecular biology 403, 660 (Nov 05, 2010).
68. J. Tsai, R. Taylor, C. Chothia, M. Gerstein, The packing density in proteins: standard radii and volumes. Journal of molecular biology 290, 253 (Jul 02, 1999).