Department of Chemistry
University of Cambridge

Anaerobic, NADH-Dependent Haem Breakdown in a Family of Haemoproteins – An Experimental and Computational Study

This dissertation is submitted to the University of Cambridge for the degree of Doctor of Philosophy.

Alasdair Donald Keith
Jesus College
September 2021

Declaration

The work described in this dissertation was carried out by the author in the Department of Chemistry at the University of Cambridge between October 2017 and September 2021. The contents are the original work of the author except where otherwise indicated and contain nothing that is the outcome of collaboration. The contents have not previously or concurrently been submitted for any other degree or qualification at the University of Cambridge or any other institution. All resulting publications and the research data are made publicly available as required by the Open Access policies of the University of Cambridge and the Engineering and Physical Sciences Research Council. The number of words in this dissertation does not exceed 60,000.

Alasdair Donald Keith
September 2021

Abstract

Anaerobic, NADH-Dependent Haem Breakdown in a Family of Haemoproteins
Alasdair Donald Keith

Many pathogens function by internalising the haem molecules of their host organism and breaking down the porphyrin scaffold to sequester the Fe²⁺ ion. Typically, this breakdown mechanism is mediated by a haem oxygenase. However, a novel class of reaction has been discovered, which can be performed anaerobically using ‘nature’s reductant’, NADH, and the Yersinia enterocolitica protein, HemS. To study the features of this reaction in more detail, conventional experimental methods were combined with Energy Landscape Theory. Deuterium labelling demonstrated that the reaction was initiated by hydride transfer and stopped-flow spectroscopy showed that the reaction proceeded via a short-lived intermediate.
Since no structural information regarding NADH-binding to HemS was available, computational calculations were used to sample the conformational space around possible NADH-protein binding sites and to construct kinetic transition networks. From these networks, pathways showing the unfolding and approach of NADH to haem inside the pocket were determined. These pathways highlighted the roles of various residues, thus allowing for a targeted mutagenesis study. This study, carried out using both computation and laboratory-based experimentation, was especially focussed on a double phenylalanine gate located in the centre of the main cavity. Key insight concerning how this feature regulates the access of NADH to haem was gained.

Computational results suggested that the HemS homologues, HmuS, ChuS and ShuS, were also capable of promoting anaerobic haem breakdown, but that catalysis by ChuS and ShuS may be limited by competing functions. Bioinformatics was used to gauge what these possible alternative functions could be, and to place HemS within its wider phylogenetic context. The computational predictions were then tested in the laboratory. The three homologues were all shown to engage in the reductive haem breakdown process but to varying degrees of efficacy. These findings demonstrate that this novel haem breakdown reaction is not unique to HemS, but instead is a feature of a wider class of haemoproteins. A subset of these haemoproteins are known to bind certain DNA promoter regions, suggesting not only that they can catalytically degrade haem, but that they are also involved in transcriptional modulation responding to haem flux.

Many of the bacterial species responsible for this class of protein (including those that produce HemS, ChuS and ShuS) are known to specifically target oxygen-depleted regions of the gastrointestinal tract.
A deeper understanding of anaerobic haem breakdown processes engaged in by these pathogens could therefore prove useful in the development of future strategies for disease prevention.

Acknowledgements

I have always been an unashamedly sentimental type, so this acknowledgements page shall go on for a while...

I thank my two supervisors, Professor David Wales and Dr Paul Barker, for their wisdom, guidance and forbearance. I thank them for suggesting this interesting, enthralling project and I hope I’ve been of help in unravelling the mysteries of HemS! Much of this project took me out of my comfort zone – I’m glad they were both there to guide me along the way.

I thank Dr Konstantin Röder, Daniel Sharpe and Luke Dicks from the Wales group. Konstantin has proven to be a good friend, mentor and, most importantly, drinking buddy, whilst I have thoroughly enjoyed these last four years with Daniel and Luke as fellow PhD students.

I thank Dr Sally Boss, Dr George Biggs, Jamie Klein and Victoria Daramy-Williams from the Barker group. Their presence certainly made for some interesting group meetings, not least Sally’s cat – a regular and popular attendee over Zoom. I am indebted to both George and Jamie for their regular collaborations in the lab, in particular the many mass spec samples they submitted for me!

I thank James Cole and Yuhang Xie, my two Part III students. Both proved to be model students and excellent companions in the lab, each with that slight hint of eccentricity required to work with me. I will never forget James’ face (nor he mine I should imagine!) when he lost half his protein sample down the sink, nor will I forget Yuhang’s many zany (yet always somehow relevant) cartoons in his group presentations.

Thank you also to the many others I have gotten to know in Cambridge. Dr Alex Thom most definitely deserves a mention, especially considering the many bar tabs he’s insisted on paying over the years. I must owe him a fortune.
I’m sure Alex would put my reluctance to pay down to my being a Scot.

I thank the crowd from Dundee, including Vera Ross, Jack & Margaret Scott and Philma Fotheringham, all of whom unfortunately are long gone but always took a keen interest in my education and motivated me to succeed when I was young. I must also thank Sheila Whammond, who likewise encouraged me to stick in at school, and John Smith, who has always been available for advice, a cup of coffee and a good laugh.

I thank my friends from school, including but not limited to Stuart Cant, Cameron Ireland, Chris Milne, Junaid Rasul, Jamie Stewart, Paul Whelan, Ryan Barnett and James Todd. I also thank my friends from undergraduate at Heriot-Watt University, in particular those who I lived with, Ryan Crowe, Callum Keanie and Sam Penman.

I thank Nikol Kadeřábková for the years we had together, and wish her every success in the future. I also wish Archie good health and much love.

I thank my uncle, Brian Keith, his wife Marian, and my two cousins Claire and Niall. Here’s to more visits to Ibrox together!

I thank my uncle, Derek Keith, who has often been more than just an uncle to me. His staunch defence of the rights of the people of Scotland to walk freely in their own country and to fish its rivers has always served as an inspiration to me.

I thank those who I lived with in Malcolm Street and Wesley House over the last fifteen months of my PhD. The crowd at Wesley House, in particular, managed to temper my pre-viva nerves and grumpiness with much fun and laughter, including but not limited to THAT Halloween prank, heated debates about Kenyan cats, and enthusiastic Christmas decorating & carolling.

From Wesley House, I especially thank Allison Burnette. From talks over cuppas to late-night laughter, I’ve loved spending so much time together.
I’ve never known someone with so much positivity, energy, and zeal for ‘making merry.’ I feel like we fell out of a lucky tree, hit every branch on the way down, and ended up in a pool full of cash and Sour Patch Kids.

I thank my ‘adopted Grandad’, Magne Røinås. I don’t think I have ever met anyone more generous. Nor have I known someone so close to nature. When I saw him swimming with seals in the North Sea, I realised then that he was a true Viking! I’m not sure whether Jennifer was so pleased – she is sorely missed.

I thank my Nana, Phyllis McLaren. From her I learned the wonderful Doric word, drochle, which makes for a great insult. She was of a generation of Scots long since gone – a stern Presbyterian on the surface, but a vivacious and loving woman to those who knew her best. I also thank her husband and my Great-Grandfather, Donald McLaren, Spitfire pilot during World War II, who made the ultimate sacrifice so we could live in peace and security today.

I never got to know my Grandad, James Martin (1941-1965), nor my Granny, Catherine Keith (1931-1993). Both died far too young. However, through their examples, both taught me to love the people of this country. My Grandad died as an RAF officer, whilst my Granny spent much of her life standing up for ordinary working-class people. I know she would have been proud of her four grandchildren going to university, and especially of her youngest son being elected a Labour Party councillor.

I thank my Step-Granny, Jean Keith, who sadly passed away the day I submitted this thesis for examination. There is no doubt she was a blessing to my Grandad in his later years. The way she told the story about my Grandad singing at their wedding always made me laugh. Jean was always so kind to my brother and me. She truly was part of the family, and will be sorely missed.

I thank my brother, Martin. I could not wish for a better brother. He is not only that. He is my best friend.
Though nearly 6 years younger than me, he’s taught me so much (not least because he’s far more ‘with it’ than me!). Though I was the brother blessed with the superior looks and charm (of course!), his wit has always cracked me up. I’ll never tire of Mum and me in hysterics at something Martin has said. Poor Dad, who usually bears the brunt of it all!

Yes, my Dad, Alexander, has to put up with a lot. I’ve never known somebody so selfless. He has always put his family first, his community second, and himself firmly last. It was he who taught me, as a young boy, the importance of serving others. As a local councillor, he never once wavered in his principles, which were to stand up for the poor and speak truth to power. Sometimes I wish I had just an ounce of the steel and determination he has. I could not have wished for a better role model.

Lastly, I thank my Mum, Susan. She, like my Dad, spent many years in service to the community. As a social worker, she has seen the very best and very worst of society, and she can be proud of what she did to make it that bit better. But it’s always been clear, to me at least, that my Mum’s first love is her family. She has often been a crutch for me to lean on. She has always known what to say and how to make things better. Though I hate to say it, I think it may just be true that ‘Mums are always right.’ She has never let me down.

This thesis is dedicated to my Grandad, Alexander Keith (1928-2015), and to my Granny, Anne Martin (1940-2019). Both inspired me deeply. My Grandad imparted two important life lessons – the virtue of patience and contentment, and a love for Rangers FC. I was especially close to my Granny, and I miss her every day. She had a hard life but remained happy and was able to light up any room. She showed you could make something out of nothing. She always used to say of her own Grandmother, ‘She had a heart of gold. She would have given a beggar her last penny.’ The same was true of herself.
I will always look up to her.

Glossary of Abbreviations

5′-dA•: 5′-deoxyadenosyl-5′-radical
ABC: ATP-binding cassette
AMBER: Assisted model building with energy refinement
AMBER12: AMBER package version 12
AMBER16: AMBER package version 16
ATP: Adenosine triphosphate
BH: Basin-hopping
BTP: Bis-tris propane
BVR: Biliverdin reductase
CCDC: Cambridge Crystallographic Data Centre
CDC: The Centers for Disease Control and Prevention
CIP: Calf intestinal alkaline phosphatase
CPR-NADPH: Cytochrome P450 Reductase-NADPH
CPU: Central processing unit
Cryo-EM: Cryogenic electron microscopy
DAB: Deuteroanaerobilin
DMSO: Dimethyl sulfoxide
DNA: Deoxyribonucleic acid
DNEB: Doubly-nudged elastic band
DPS: Discrete path sampling
dsDNA: Double-stranded DNA
DTT: Dithiothreitol
EF: Eigenvector-following
ELT: Energy landscape theory
EPR: Electron paramagnetic resonance
ESI-MS: Electrospray ionisation mass spectrometry
FBS: Fur-binding site
FES: Free energy surface
ff99SB: Force field 99 with structure balance
FFS: Forward flux sampling
Fur: Ferric uptake regulator
GB: Generalised Born
GPU: Graphical processing unit
HBP: Haem breakdown product
HEF: Hybrid eigenvector-following
HEPES: 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
His: Histidine
HIV-1: Human immunodeficiency virus 1
HO: Haem oxygenase
HPLC: High performance liquid chromatography
IDA: Iminodiacetic acid
IPTG: Isopropyl-β-D-thiogalactopyranoside
IR: Infrared
I-TASSER: Iterative threading assembly refinement
IUPAC: International Union of Pure and Applied Chemistry
LB: Lysogeny broth
L-BFGS algorithm: Limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm
LCMS: Liquid chromatography mass spectrometry
LEaP: Link, edit and parm.
LED: Light-emitting diode
MAB: Mesoanaerobilin
MBH: Mutational basin-hopping
MCM: Monte Carlo plus energy minimisation
MD: Molecular dynamics
min: Minimum/minima
mRNA: Messenger RNA
MS: Mass spectrometry
MSⁿ: Tandem mass spectrometry
NAD+: Nicotinamide adenine dinucleotide (oxidised)
NADD: Deuterated NADH
NADH: Nicotinamide adenine dinucleotide (reduced)
NADP+: Nicotinamide adenine dinucleotide phosphate (oxidised)
NADPH: Nicotinamide adenine dinucleotide phosphate (reduced)
ndMS: Non-denaturing mass spectrometry
NGT: New graph transformation
NIR: Near-infrared
NMR: Nuclear magnetic resonance
OD: Optical density
ORF: Open reading frame
PBT: Periplasmic binding-protein-dependent transport
PCR: Polymerase chain reaction
PDA: Photodiode-array
PDB: Protein data bank
PEG: Polyethylene glycol
PES: Potential energy surface
Phe-: Phenylalanine
PPI: Protein-protein interaction
PPIX: Protoporphyrin IX
QM/MM: Quantum mechanics/molecular mechanics
RD: Restriction digest
RMS: Root mean square
RNA: Ribonucleic acid
RNAP: RNA polymerase
RRKM theory: Rice-Ramsperger-Kassel-Marcus theory
RSMT: Radical SAM methyltransferase
SAM: S-adenosylmethionine
SASA: Solvent accessible surface area
SD: Sub-database
SDS-PAGE: Sodium dodecyl sulphate-polyacrylamide gel electrophoresis
SEC: Size exclusion chromatography
SOC: Super optimal broth with catabolite repression
SP: Stationary point
SPR: Surface plasmon resonance
sRNA: Small RNA
ssDNA: Single-stranded DNA
SVD: Singular value decomposition
ToF: Time of flight
TPS: Transition path sampling
TS: Transition state
TST: Transition state theory
UPLC: Ultra performance liquid chromatography
UV-Vis: Ultraviolet-visible
WT: Wild type

Glossary of Symbols

$A$, $B$ and $I$: Reactant, product and intervening sets from DPS
$A_{ij}$: Electrostatic force constant of repulsion between i and j
$A_{\lambda}$: Absorbance at a given wavelength, λ
$B_{ij}$: Electrostatic force constant of attraction between i and j
$C^{A}_{b}$ / $C^{B}_{a}$: Committor probabilities
CH(T): Heat capacity
$c$: Concentration
$D(i, j)$: Euclidean distance between i and j
$E$: Potential energy
$E_{\mathrm{AMBER}}$: Energy derived from AMBER force field
$E_{n}$: Energy at which the superbasin analysis, n, was performed
$E_{s}$: Set of edges in a weighted, directed graph
$FE_{i}(T)$: Free energy of minimum i
$F_{i}$: Component of the gradient along each eigenvector
$F_{\alpha}$: Force on particle α
$\mathbf{G}$: Gradient of the potential energy surface
$G$: Magnitude of $\mathbf{G}$
$g_{i}$: Change in occupation probability of minimum i
$g$: Gradient of the ‘true’ potential energy (PE) (DNEB)
$\tilde{g}$: Gradient of the spring PE (DNEB)
$g_{\mathrm{DNEB}}$: Gradient of the doubly-nudged elastic band
$g_{\mathrm{NEB}}$: Gradient of the nudged elastic band
$g_{\parallel}$: Gradient of the parallel component of the ‘true’ PE
$g_{\perp}$: Gradient of the perpendicular component of the ‘true’ PE
$\tilde{g}_{\parallel}$: Gradient of the parallel component of the spring PE
$\tilde{g}_{\perp}$: Gradient of the perpendicular component of the spring PE
$H$: Hessian matrix
$h$: Planck constant
$K_{a}$: Acid dissociation constant
$K_{D}$: Dissociation constant
$K_{M}$: Michaelis constant
$K_{r}$: Force constant (AMBER bonds)
$K_{\theta}$: Force constant (AMBER angles)
$k_{AB}$ / $k_{BA}$: Rate constants from region B to A / A to B
$k^{\mathrm{NSS}}_{AB}$ / $k^{\mathrm{NSS}}_{BA}$: Non-steady state rate constants from region B to A / A to B
$k_{ab}$ / $k_{ba}$: Rate constants from minimum b to a / a to b
$k_{B}$: Boltzmann constant
$k_{\mathrm{spr}}$: Spring constant
$k^{\dagger}_{i}$: Unimolecular rate constant through transition state † from minimum i
$l$: Optical path length
$M_{\alpha}$: Mass of particle α
$M_{s}$: Set of minima in a weighted, directed graph
$\mathrm{min}$: Local energy minimisation
$N$: Number of atoms
$n$: Multiplicity
$n_{i}$: Number of distinct permutational isomers in minimum i
$n_{\max}$: Maximum number of allowed connection attempts
$n_{\mathrm{spr}}$: Number of springs
$n_{u}$: Number of connection attempts
$P_{a}$ / $P_{b}$ / $P_{i}$: Occupation probability of minimum a / b / i
$P^{\mathrm{eq}}_{A}$ / $P^{\mathrm{eq}}_{B}$: Equilibrium occupation probability of region A / B
$P^{\mathrm{eq}}_{a}$ / $P^{\mathrm{eq}}_{b}$: Equilibrium occupation probability of minimum a / b
$P(t)$: Occupation probability vector for all states at time t
$P^{\mathrm{eq}}(t)$: Equilibrium occupation probability vector at time t
$q_{i}$: Charge on atom i
$R_{i}$ / $R_{j}$: Effective Born radii
$r_{b}$: Bond length
$r_{\mathrm{eq}}$: Equilibrium bond length
$r_{ij}$: Distance between atoms i and j
$s$: Integrated path length
$T$: Temperature
$t$: Time
$t_{a}$ / $t_{b}$: Mean waiting times under the non-steady state condition
$V_{i}$: Potential energy for minimum i
$V_{n}$: Force constant (AMBER dihedrals)
$V_{t}$: ‘True’ potential in DNEB
$V(X)$: Potential energy surface
$V^{*}(X)$: Transformed potential energy surface
$\tilde{V}$: Spring potential in DNEB
$W$: Transition matrix
$w(i, j)$: Edge weight of minima pair (i, j) in a weighted, directed graph
$X$: Nuclear coordinates
$X_{\alpha}$: Nuclear coordinates at particle α
$x$: Small displacement from X
$x_{\mathrm{NR}}$: Newton-Raphson step
$Z(T)$: Canonical partition function
$Z_{i}(T)$: Contribution to canonical partition function from minimum i
$Z^{\dagger}(T)$: Modified canonical partition function of the TS
$\beta$: Thermodynamic beta, $1/k_{B}T$
$\gamma$: Phase angle
$\delta E$: Potential energy change
$\Delta G_{\mathrm{el}}$: Electrostatic solvation free energy
$\Delta G_{\mathrm{np}}$: Non-polar solvation free energy
$\Delta G_{\mathrm{solv}}$: Total solvation free energy
$\delta t$: Short time step
$\Delta V^{\dagger}_{i}$: Energy difference between minimum i and TS †
$\epsilon$: Relative permittivity of medium
$\epsilon_{\mathrm{protein}}$: Relative permittivity of protein
$\epsilon_{\mathrm{SM}}$: Extinction coefficient at the Soret maximum
$\epsilon_{\mathrm{solv}}$: Relative permittivity of solvent
$\epsilon_{\lambda}$: Extinction coefficient at a given wavelength, λ
$\zeta$: Weighted, directed graph
$\theta$: Bond angle
$\theta_{\mathrm{eq}}$: Equilibrium bond angle
$\kappa$: Debye-Hückel screening parameter
$\tilde{\lambda}_{i}$: Eigenvalue i
$\lambda$: Wavelength
$\nu_{i}$: Eigenvector i
$\nu_{i}$: Geometric mean vibrational frequency of minimum i
$\rho_{i}$: Van der Waals radius of atom i
$\tilde{\rho}_{i}$: Intrinsic radius of atom i
$\phi$: Torsional angle
$\chi$: Number of vibrational degrees of freedom
$\Omega(E)$: Density of states
$\Omega_{i}(E)$: Contribution to density of states from minimum i

Contents

1 Introduction
  1.1 Iron in Biology
    1.1.1 The ‘Iron Paradox’
    1.1.2 Haem
  1.2 Haemoproteins
    1.2.1 Protein Evolution, Structure and Function
    1.2.2 Haem Acquisition
    1.2.3 Haem Transport
    1.2.4 Haem Breakdown
  1.3 Operons
    1.3.1 Operon Structure
    1.3.2 The Hem Operon of Yersinia enterocolitica
      1.3.2.1 The Bacterium
      1.3.2.2 The Operon
    1.3.3 The Hmu Operon of Yersinia pestis
      1.3.3.1 The Bacterium
      1.3.3.2 The Operon
    1.3.4 The Chu Operon of Escherichia coli
      1.3.4.1 The Bacterium
      1.3.4.2 The Operon
    1.3.5 The Shu Operon of Shigella dysenteriae
      1.3.5.1 The Bacterium
      1.3.5.2 The Operon
    1.3.6 The Phu Operon of Pseudomonas aeruginosa
      1.3.6.1 The Bacterium
      1.3.6.2 The Operon
  1.4 HemS and its Homologues
    1.4.1 HemS
    1.4.2 Haem-Binding in HemS
    1.4.3 HmuS
    1.4.4 ChuS
    1.4.5 ShuS
    1.4.6 PhuS
  1.5 Novel, Anaerobic Haem Breakdown Discovered in HemS
    1.5.1 NADH Structure and Properties
    1.5.2 Biophysical Research into Haem Breakdown in HemS
    1.5.3 Limitations of the Biophysical Approach
  1.6 Bioinformatics and Computational Biochemistry
    1.6.1 Principles of Bioinformatics
    1.6.2 Principles of Computational Biochemistry
    1.6.3 Energy Landscape Theory
    1.6.4 Some Successful Applications of ELT
  1.7 Previous Work Using Computational Methods to Investigate HemS
    1.7.1 Bioinformatics as Applied to the HemS-NADH Binding Problem
    1.7.2 Energy Landscape Theory as Applied to the HemS-NADH Binding Problem
2 Project Outline
3 Experimental Methods
  3.1 General Conditions
  3.2 Buffers
  3.3 Plasmid Preparation & Protein Expression / Purification
    3.3.1 HemS
    3.3.2 HemS Mutants
    3.3.3 HmuS, ChuS and ShuS
  3.4 SDS-Polyacrylamide Gel Electrophoresis
  3.5 Ultraviolet-Visible Spectroscopy
    3.5.1 Haem-Binding
    3.5.2 Steady State Reaction of holo-HemS with NADH
  3.6 Pre-Steady State Reaction Time-Course Using Stopped-Flow
  3.7 Anaerobic Reaction
  3.8 Extraction and Purification of the NADH-Dependent Haem Breakdown Product
  3.9 Nuclear Magnetic Resonance Spectroscopy
  3.10 Mass Spectrometry
  3.11 X-ray Crystallography
    3.11.1 Crystallisation
    3.11.2 Data Collection and Analysis
  3.12 Synthesis and Characterisation of (R)/(S)-NADD
4 Computational Methods
  4.1 AMBER Potential and Force Field
  4.2 Small Molecule Parameterisation
  4.3 Implicit Solvent Model
  4.4 Generating AMBER Input Files
  4.5 Basin-Hopping and Minima
  4.6 Transition States
  4.7 Discrete Path Sampling
  4.8 From Initial Pathways to Complete Representations
  4.9 Disconnectivity Graphs
  4.10 Implementation of Wales Group Methods on GPUs
5 Further Experimental Insight into Haem Breakdown by HemS
  5.1 Aims
  5.2 Proof of Anaerobic Reaction
  5.3 Deuterium Labelling to Determine Hydride Transfer
  5.4 Identification of an Intermediate
  5.5 Attempting to Determine the Haem Breakdown Product Structure
    5.5.1 NMR
    5.5.2 Crystallisation
  5.6 Product Inhibition and NAD+
  5.7 Discussion and Summary
6 Computational Method Development
  6.1 Aims
  6.2 Expanding connectunc
  6.3 checkspmutate
    6.3.1 Test Tripeptide System
    6.3.2 Point Mutations
    6.3.3 Homologues
    6.3.4 Post-Processing
  6.4 Discussion and Summary
7 Computational Comparison of HemS with its Mutants and Homologues
  7.1 Aims
  7.2 Expansion of the Wild Type HemS Database, and Further Analysis of the Double Phe-Gate
  7.3 Identifying Residues to Mutate
  7.4 NADPH
  7.5 Mutant and Homologue Systems
    7.5.1 Analysing the Databases
    7.5.2 Lowest Energy Minima from Each Database
  7.6 Discussion and Summary
8 Bioinformatic Study of HemS Homologues
  8.1 Aims
  8.2 Phylogenetics
  8.3 Sequence Analysis and Conservation
  8.4 DNA-Binding Exhibited by Some Homologues
  8.5 Discussion and Summary
9 Experimental Characterisation of Mutants and Homologues
  9.1 Aims
  9.2 Expression and Purification
  9.3 Haem-Binding Properties
  9.4 Reaction with NADH as Monitored by UV-Visible Spectroscopy
  9.5 Stopped-Flow Spectroscopy
    9.5.1 Deconvolution of the Stopped-Flow Spectra
    9.5.2 Dependence of the Intermediate on NADH Concentration
    9.5.3 Effect of Mutants and Homologues on Intermediate Formation and Consumption
  9.6 Crystallography
  9.7 Discussion and Summary
10 Conclusions and Future Work
  10.1 Conclusions
  10.2 Future Work
  10.3 A Broader View
A % Homologies of Operon Proteins
B Gene / Protein Sequences
C Computing Free Energies
D Specifics of Method Development
E Full Phylogenetic Tree
F Sequence Conservation
G Stopped-Flow Curve Fitting
References

Chapter 1

Introduction

“Life itself is a school, and Nature always a fresh study.”
Hugh Miller

In 1997, leading geneticists Craig Venter and Daniel Cohen famously stated that, ‘If the 20th century was the century of physics, the 21st century will be the century of biology.’ [1] In many respects, this prediction has been borne out, with significant advances, for example, in epigenetics [2–4], directed evolution [5, 6], genome editing [7–9] and cryo-electron microscopy [10–12]. Chemistry has often acted as an essential bridge so that past advances in physics can be translated through to those in biology. A key aspect of this process is the interplay between theoretical and computational modelling on the one hand and bench-top experimentation on the other. Such an approach is especially useful for understanding protein folding and protein-ligand interactions, since these techniques can access different realms of information, and so can be used to complement each other in a wider research context.

Examples of successful collaborations between experimentalists and theoreticians within biology include the use of a computationally designed protein probe to isolate a broadly neutralising antibody found in HIV-1-infected patient serum [13], and work to better understand tumour growth in hypoxic tissues [14–16]. A further example would be the construction of protein-protein interaction (PPI) networks to chart the dynamic interactions between proteins in malaria-spreading Plasmodium falciparum [17]. These examples were all deep-level collaborations but, more broadly, advances made in modelling packages, analysis software and database organisation have greatly expanded experimental horizons.
In this thesis, both experiment and theory were used to help uncover the behaviour of the haem-binding protein, HemS, and a selection of its homologues and mutants. The enzymatic reactivity of haem with the biomolecule, NADH, first discovered in HemS by the Barker Group, was investigated in detail using both of these approaches.

In this Introduction, the background to these proteins, their context within their wider operons, and the ways in which these systems fulfil certain aspects of bacterial haem acquisition and utilisation strategies are discussed. This presentation is supplemented by a discussion of the principles behind computational biochemistry, and its expanding role in life sciences research.

1.1 Iron in Biology

1.1.1 The ‘Iron Paradox’

The primary isotope of iron (⁵⁶Fe) is among the most stable of all known nuclides [18], and iron is the fourth most abundant element in the Earth’s crust [19]. These features, coupled with its chemical versatility, made it a logical, and perhaps even ideal, candidate for use in early biological systems. Its ability to act as a Lewis acid and to occupy a variety of oxidation (primarily II and III) and spin (high and low) states under physiological conditions [20, 21] renders it suitable for a wide range of biochemical processes.

There are certain limitations to using free iron in biological systems, however, which can partly be explained by evolutionary history. In the era of the ‘primordial soup’, when life first began on Earth, iron primarily resided in its stable ferrous form, Fe²⁺. However, the expansion of oxygenic photosynthesis, which the evolution of early microorganisms afforded, led to the ‘pollution’ of the atmosphere with molecular oxygen. This negatively impacted iron-based biochemistry in two ways. Firstly, it resulted in a gradual shift in iron’s natural state, from the ferrous to the ferric form (Fe²⁺ → Fe³⁺).
This shift had serious implications for availability in biological systems, as the solubilities of ferrous and ferric iron in aqueous solution at pH 7.0 are 0.1 M and $10^{-18}$ M respectively.20 Secondly, the use of ferrous iron became compromised due to its ability to coordinate O2 and initiate Fenton chemistry.20;22

\begin{align*}
\mathrm{Fe^{2+} + O_2} &\longrightarrow \mathrm{Fe^{3+} + O_2^{-}} \\
\mathrm{2\,O_2^{-} + 2\,H^{+}} &\longrightarrow \mathrm{H_2O_2 + O_2} \\
\mathrm{Fe^{2+} + H_2O_2} &\longrightarrow \mathrm{Fe^{3+} + OH^{-} + OH^{\bullet}}
\end{align*}

Scheme 1: Iron oxidation, followed by Fenton chemistry. This process produces highly reactive hydroxyl radicals, which are destabilising for cellular environments and can cause extensive tissue damage.

This dilemma is commonly known as the ‘iron paradox’ – life on Earth is reliant on iron, and yet its two main ions are either toxic or insoluble inside cells. Nature has developed a variety of methods to overcome this problem, such as storage as a crystallite in lactoferrin,23 or being bound as a single ion (primed to react) within a lipoxygenase.24 Another solution is to store the iron in a porphyrin scaffold, an example of which, haem, has become pervasive across the biological realm.

1.1.2 Haem

Haem is thought to be a vital component in every living organism. It can exist in a variety of subtly different forms, the most common of which is haem b, which is the focus of this thesis. This form consists of an iron ion coordinated via four chelating nitrogen atoms to the cyclic tetrapyrrole, protoporphyrin IX (Fig. 1.1). The ring surrounding the iron ion is conjugated and planar, leaving two free axial sites, which proteins often exploit by coordinating to the iron as Lewis bases. Such coordination can significantly disrupt the planarity of the ring. Additionally, the ring itself is functionalised by two propionate groups, two vinyl groups and four methyl groups, in an arrangement that makes the molecule asymmetrical about any axis.

Figure 1.1: Labelled structure of haem b.

These groups afford further opportunities for proteins to bind to haem.
Examples include salt bridge formation between cationic residues (such as arginine, histidine or lysine) and the propionates,25 and covalent attachment of cysteine residues to the vinyl groups,26 as well as less rigid pi–pi aromatic interactions, polar contacts and dispersion forces.

A significant complicating factor, however, is that free haem is cytotoxic.27 This is due to its hydrophobicity, which causes it to partition into membranes or even to stack together at high concentrations. The first of these possibilities can lead to lipid peroxidation due to the ability of the iron ion to initiate the Fenton reaction, despite being in a scaffold.28 It is therefore essential for haem to be contained at all times. To achieve this containment, haemoproteins have evolved to fulfil three basic functions: acquisition, transportation and utilisation.

1.2 Haemoproteins

1.2.1 Protein Evolution, Structure and Function

Proteins are remarkable chemical machines. They are recognised as the ‘workhorses of the cell’, due to their abundance (comprising 55% of the dry mass of a typical bacterial cell29) and diversity in structure and function. Proteins are constructed from only six elements (carbon, hydrogen, oxygen, nitrogen and small amounts of sulfur and phosphorus), which are arranged into approximately 20ᵃ different amino acid building blocks, and are made via condensation polymerisation of these building blocks in a tightly controlled reaction directed by DNA-derived RNA templates. This control over the sequence is essential for the reliable synthesis of proteins, which typically range from 50 to 2000 amino acid units.30 Assuming all sequential arrangements over this range to be possible gives a theoretical number of configurations of:

\[
\sum_{n=50}^{2000} 20^{n} = 1.209 \times 10^{2602}. \tag{1.1}
\]

This value vastly exceeds the number of atoms ($\sim 10^{80}$) in the known universe. In reality, however, only a small subset of these configurations is ever accessed.
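The count in Eq. (1.1) can be checked directly, since Python integers have arbitrary precision. The following is a minimal sketch rather than part of the original analysis; the function names `count_sequences` and `scientific_parts` are illustrative choices, and the calculation assumes exactly 20 building blocks and chain lengths of 50 to 2000 inclusive, as stated above.

```python
def count_sequences(alphabet_size=20, min_length=50, max_length=2000):
    """Sum alphabet_size**n over every chain length n in the inclusive range."""
    return sum(alphabet_size ** n for n in range(min_length, max_length + 1))


def scientific_parts(value, digits=4):
    """Split a large positive integer into (mantissa, exponent) in base 10.

    The mantissa is rounded to `digits` significant figures.
    """
    text = str(value)
    exponent = len(text) - 1
    # Take two guard digits beyond the requested precision before rounding.
    leading = int(text[:digits + 2])
    mantissa = round(leading / 10 ** (digits + 1), digits - 1)
    return mantissa, exponent


total = count_sequences()
mantissa, exponent = scientific_parts(total)
print(f"{mantissa} x 10^{exponent}")

# Orders of magnitude by which the total exceeds ~10^80 atoms.
print(exponent - 80)
```

The sum is a geometric series, so the same total follows from the closed form $(20^{2001} - 20^{50})/19$. It is dominated by the longest chains: the n = 2000 term alone contributes about 95% of the total.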
This situation reflects the evolutionary history of proteins, whereby they (and their associated information storage partners, DNA and RNA) had to adapt to given geological conditions. As life proliferated, competitive pressures (particularly due to viruses and their hosts) further necessitated protein diversification and specialisation.

ᵃ There was a long-established consensus that 20 amino acids were required to make all proteins. However, in recent years, selenocysteine and pyrrolysine have come to be regarded as the 21st and 22nd amino acids. There are perhaps more to be discovered.

To fulfil their required functions, different proteins have adopted a wide range of structural conformations. Each type of amino acid possesses a different side chain, which can broadly be categorised as charged, polar or non-polar. Most of the non-polar side chains are hydrophobic, and so a protein in aqueous solution will preferentially fold to bury as many of them as possible. This condition, combined with salt-bridge formations between the charged groups and polar-polar/hydrogen-bond contacts between the polar groups, gives a protein its overall conformation. Two main motifs that arise from such contacts are α-helices and β-sheets, which are depicted in Fig. 1.2.

Figure 1.2: Top: A representative α-helix, taken from the first twelve residues (SIYEQYLQAKAD) of HemS. Bottom: A representative β-sheet, taken from residues 259-278 (KVTPHQDWINVFNQRFTLHL) of HemS. In the ball-and-stick models, the side chains have been removed for clarity. Dashed black lines show hydrogen-bonding between backbone atoms. There is a classic i + 4 → i pitch for the α-helix, and the β-sheet is in an antiparallel arrangement.

Protein structure is essential for function, and the organisation of its sequence determines how a protein folds in aqueous solution. Different folds are necessary for different purposes.
There are four main classes of protein – intrinsically disordered, fibrous, membrane and globular – within which there are also significant variations. Roughly speaking, fibrous proteins, such as collagen and elastin, tend to be rich in glycine and proline. Their hydrophobicity governs the formation of long, rigid regions which can aggregate, and are relatively resistant to denaturation, two features necessary for structural integrity in cartilage and tendons, for example. Transmembrane proteins (one of many membrane sub-classes), on the other hand, require a large concentration of the residues necessary for α-helix and β-sheet formation, since their membrane-spanning domains require these motifs.31 Such proteins often produce ion channels, thus necessitating a large hydrophobic region through which the ion can travel. Globular proteins, meanwhile, tend to be more flexible in their conformations and have a degree of water solubility. They tend to be more spherical in shape, with hydrophobic residues pointing towards their interior regions, and hydrophilic residues comprising their surfaces. This arrangement is typically necessary to store or utilise small biomolecules in the interior, whilst also allowing them to travel throughout the cell. As such, many globular proteins have roles in acquisition, transport and/or catalysis.

Proteins that specifically interact with haem are known as haemoproteins. They can be split into different classes, as discussed over the following sections.

1.2.2 Haem Acquisition

The first class of haemoproteins engages in haem acquisition. These proteins are particularly prevalent in pathogens, which do not typically synthesise their own haem, and so have to ‘steal’ it from other species. There are two main strategies a pathogen engages in to scavenge haem.
The first is the secretion of siderophores, which are small biomolecules that chelate iron with high affinity, thus solubilising it and transporting the iron to specific membrane receptors. Siderophores that acquire haem specifically are known as haemophores. Though widely used by pathogens, such a strategy is costly and even wasteful – there is an energy penalty for producing such molecules and only a fraction will return following secretion.32

The second uptake strategy is to have proteins capable of direct contact with exogenous iron-containing compounds (typically haem) distributed across the outside surface of the cell. These proteins are anchored in the membrane, and belong to a class known as cell surface receptors. Those proteins within this class that acquire haem typically have a β-barrel motif running through the membrane, as well as flexible extracellular loops, which can bind to haem, host haemoproteins or pathogenic haemophores.33;34 Most also have characteristic FRAP/NPNL domains34;35 and a histidine residue to coordinate to iron directly. These features are illustrated in Fig. 1.3. Pumping molecules across membranes requires energy and so all such haemoproteins require a TonB motif, which is discussed in more detail in Section 1.3.

Figure 1.3: Example of a bacterial haem acquisition protein. H86 and H428 are highlighted in magenta – these residues are important for initial coordination to haem. Also highlighted are the conserved FRAP (blue) and NPNL (orange) domains. The β-barrel structure shields haem from the outer membrane in which these proteins are typically embedded. This structure is of Shigella dysenteriae ShuA (aka ShuR, PDB Code 3FHH),36 an important protein which is discussed further in Section 1.3.

1.2.3 Haem Transport

The location of haem once it has been acquired by a pathogen rarely corresponds to the region where it is ultimately utilised.
Generally, in a Gram-negative pathogen the haem needs to be transported from the outer membrane through the periplasm and to the cytosol for use. This process has been termed the ‘periplasmic binding-protein-dependent transport (PBT)’ system.37 A complicated system of proteins is typically required to pump haem across the inner membrane. To shuttle haem through the periplasm or cytoplasm, haem transport proteins (otherwise known as haem chaperones) are required. These proteins need to be conformationally flexible to be effective at binding and releasing haem. Haem chaperones, particularly Gram-negative pathogenic varieties, are still poorly understood, with most of the current knowledge having been derived from proteins produced by the operons to be discussed in Section 1.3.

Periplasmic haem chaperones are those encountered once a haem molecule has been transported through the outer membrane. As the name would imply, a periplasmic haem chaperone then shuttles the haem through the periplasm, to be deposited at the inner cytoplasmic membrane surface. Due to the relaxed porosity in the outer membrane, which allows for the influx of small, hydrophilic molecules, the periplasm tends to equilibrate to the pathogen’s surrounding environment. Therefore, the periplasmic environment is highly variable in pH and salt levels. In addition, the periplasm contains a layer of peptidoglycan and has a high concentration of other proteins.38 Periplasmic haem chaperones therefore tend to have robust yet flexible structures, making them relatively resistant to unfolding whilst tolerating small degrees of conformational change.

Figure 1.4: Representation of a typical bacterial haem-uptake system. A purple box indicates a haemoprotein, whereas the other proteins are auxiliaries. Haem progress through the system is charted by red arrows. ABC is the ATP-binding cassette.
Also, because of the dearth of adenosine 5′-triphosphate (ATP) in the periplasm and therefore of ready external energy, many chaperones are able to store energy upon substrate-binding, which is then used to power transport.38 Though the study of these proteins has been limited, a common feature seems to be that haem is bound via a conserved tyrosine residue, which binds to Fe with its hydroxyl group.39 The haem molecule is typically bound in a cleft between two subdomains of the protein,38 and is oriented to face the relevant channel once at the inner membrane.

Once at the inner membrane, haem encounters a permease. Typically, this is quite a complex setup, involving not only the permease, but also a partner protein to induce ATP hydrolysis for the energy required to pump haem across the membrane. As with haem acquisition proteins found at the outer membrane, these permeases tend to have a large funnel running through them, thus providing a channel for the haem to travel through. They also typically have a series of residues pointing into the funnel, such as arginine or histidine, which are capable of binding haem via its iron ion, facilitating translocation through the membrane.40 The ATP-binding protein, meanwhile, typically binds non-covalently to the cytosolic region of the permease. The energy it produces from ATP hydrolysis induces conformational changes in the permease, driving haem movement through the membrane.

Once in the cytosol, haem is picked up either by an enzyme (to be discussed in Section 1.2.4) or another haem chaperone. Cytosolic haem chaperones are commonly found in eukaryotic cells. Haem synthesis in these cells is a complex process, requiring some steps to be undertaken in the mitochondrion and some in the cytoplasm. Once haem is synthesised, haem chaperones are then required to shuttle the haem to its required destination, whether that is somewhere else in the cytoplasm or to a target beyond the cell.
As mentioned above, pathogens do not produce their own haem, and so there are fewer clear roles for haem chaperones to play in the cytoplasm. It is common for pathogenic cytosolic haemoproteins to be inconsistently labelled in the literature, as there is often uncertainty surrounding their function. This shall be a major topic of discussion in this thesis, where it shall be argued that the cytosolic recipients of haem derived from the Hem, Hmu, Chu and Shu operons show enzymatic behaviour, despite them commonly being classified as simple haem chaperones. The ambiguity arises because these proteins do not engage in typical haem oxygenase activity. Furthermore, there is evidence that at least some of these proteins or their close relatives engage in chaperoning haem to other haem breakdown proteins as well.

1.2.4 Haem Breakdown

Pathogens require haem for a number of processes. Some bacterial proteins require the haem ligand itself in order to function properly, and so the harvested haem is transferred directly to them. Many other processes require the iron instead, and so the haem molecule is broken down. This extracted iron could then be reconstituted for use in other iron-containing proteins.

Therefore, to extract iron, various haem breakdown strategies have been developed. Current research would suggest that the vast majority of pathogenic haem breakdown enzymes follow what is commonly known as the ‘canonical haem oxygenase (HO) pathway’. This complex mechanism, which sequesters iron and produces biliverdin with the release of carbon monoxide (CO), is shown in Fig. 1.5. HOs themselves are found in almost all living organisms. In mammals, for example, the production of biliverdin from HO activity accounts for the greenish-blue colour of bruises.41 As such, the mechanism is well-characterised – the breakdown of one haem molecule requires the consumption of three O2 molecules, proceeding through the intermediates shown in Fig.
1.5.42–44 Despite their prevalence, the first crystal structure of a HO was only solved in 1999.45 This result sparked an interest in solving such structures, and now certain HO motifs have been discovered and scrutinised, from a wide range of sources. One such motif is GXXXG, a conserved monomeric α-helical fold, which, in combination with a proximal histidine residue, is needed for catalysis.46 As noted by Sawyer, the fact that these features are so well conserved across such a diverse range of sources (including bacteria, plants and humans) would suggest that other haem-degrading enzymes without these motifs are not strictly haem oxygenases.42

Figure 1.5: Schematic of the canonical HO mechanism, inspired by Unno et al.44 Ferric haem is first reduced to ferrous haem so that it can coordinate O2. Further reduction gives a ferric hydroperoxo haem. The hydroxyl group is then transferred to the porphyrin ring itself at the α-meso-position. Oxidation at this position yields verdohaem and expels CO. Verdohaem can then be cleaved, releasing iron and producing biliverdin.

Quite apart from the useful extraction of iron, the other products from this mechanism have been shown to have beneficial properties over a wide range of downstream processes. For example, the released CO, as well as being an essential cell signalling molecule in neurons, is a potent antioxidant in higher order species such as humans. Some strains of pathogenic bacteria, such as E. coli, have been shown to take advantage of this effect by appropriating and sabotaging this inflammatory response.47 The biliverdin product can also engage in important biological processes. For example, it has been shown to inhibit viral replication.48;49 Biliverdin can also be converted to bilirubin by biliverdin reductase (BVR).
Bilirubin is an effective antimutagen, as it can scavenge hydroperoxyl radicals,50;51 thus making it a powerful antioxidant.52;53

The canonical haem oxygenase mechanism is unusual in a number of ways. Firstly, it utilises haem both as a substrate and a cofactor.54 Its controlled release of CO is also an important feature to prevent the inhibition of further haem breakdown, since CO binds to ferrous haem with a higher affinity than O2. Self-hydroxylation of the α-meso-carbon is also unusual as other haem proteins, such as cytochrome P450s and NO synthases, tend to heterolytically cleave a ferryl (i.e. Fe4+) intermediate instead.42

Considering the prevalence of this reaction across such a wide range of species as one piece of evidence, and the product utility as another, it would appear as if this canonical HO mechanism is the preferred option for pathogenic haem breakdown. However, one of the main contentions of this thesis is that some pathogens may have developed alternative pathways, which can act as a ‘backup’ when the conditions for the canonical HO mechanism are not favourable. These conditions could include either a lack of oxygen or a lack/excess of haem.

1.3 Operons

1.3.1 Operon Structure

Proteins tend not to operate in isolation. Almost always, they are essential components within some wider biochemical framework, operating together to fulfil a specific function. For example, some enzymes need co-enzymes or chaperones to fulfil their function. Often, protein expression is therefore controlled, not on an individual basis, but by clusters of genes, which are transcribed and translated together. One example of this is an operon, which is a cluster of genes controlled by a single promoter.
More precisely, operons consist of three DNA components: the structural genes, which code for the non-regulatory proteins required by the cell; the promoter, which is recognised by RNA polymerase, initiating transcription; and the operator, to which a repressor protein can bind. Regulation of the operator varies between operons. For some, the presence of the repressor is controlled by another region in the DNA known as the regulator gene. Other biochemical stimuli control whether this gene is switched on or off, thus controlling whether the repressor is produced. In other operons, it is possible for an inducer (i.e. a small biomolecule) to displace the repressor from the operator. In both cases, the removal of the repressor allows for the transcription of the DNA.

Figure 1.6: Diagram of a typical operon. The promoter, P, and operator, O, control the expression of the structural genes, A, B, C, etc. A regulator gene, R, controls synthesis of a repressor protein, which can bind to the operator. This gene is not necessarily situated near the operon.

Since the key goal of any organism is to thrive and replicate, often in harsh conditions or with a lack of resources, adaptability and energy/material conservation confer an advantage. This condition explains the utility behind the ability to switch particular operons on and off upon certain environmental signals. Protein synthesis can be a costly process, both in terms of energy and materials, and so it would be wasteful to produce some non-essential molecules during periods of scarcity. Alternatively, during periods of glut, it can be important to down-regulate the production of certain proteins, often because they interfere with or direct resources away from other proteins dealing with the excess.

Pathogens, by their very nature, are often exposed to harsh and changeable environments. They have therefore evolved a series of sophisticated operons to deal with these conditions.
One such class of operon regulates iron content within the cell, often encoding many of the protein types discussed in Section 1.2. Bacterial iron homeostasis, as described by Andrews et al., is governed by five strategies. An operon could cover some or all of these strategies, which are described in Table 1.1.20;42

| Strategy | Problem | Response |
| Acquisition | Pathogen requires exogenous iron | Iron scavenged using high-affinity transport system |
| Storage | Exogenous iron supplies are limited | Internal deposits that were made during glut can be utilised |
| Consumption | External and internal iron supplies are limited | Down-regulation of certain, non-essential iron-containing proteins |
| Protection | Too many iron-induced reactive oxygen species | Up-regulation of proteins to store iron / degrade these species |
| Regulation | Combination of the above | Overall strategy to maintain iron homeostasis captured in operon |

Table 1.1: Strategies undertaken by bacterial cells to maintain iron homeostasis.

The haemoproteins of interest in this thesis – HemS, HmuS, ChuS and ShuS – are all coded for as part of wider operons. These operons, along with the one that codes for PhuS, are all related, yet display interesting variations in their constructions. Their properties, similarities and differences shall now be described over the following sections. The hem operon of Yersinia enterocolitica shall be discussed in most detail since it codes for the Hem family of proteins, which includes HemS, the most studied protein in this work.

1.3.2 The Hem Operon of Yersinia enterocolitica

1.3.2.1 The Bacterium

Yersinia enterocolitica is a Gram-negative, pathogenic bacterium. It is capable of infecting a wide range of species, including human beings, other mammals, reptiles and birds. Typically, it enters at the gastrointestinal tract and so the main cause of infection for human beings is through the ingestion of uncooked meat or contaminated water. Y.
enterocolitica is the species chiefly responsible for yersiniosis, an infectious diarrhoeal disease which most commonly affects children. Though symptoms tend to be relatively benign (for example, abdominal pain and fever), these can develop into more serious conditions, such as acute gastroenteritis,55 endocarditis,56 mesenteric lymphadenitis,57 or fulminant septicemia.58 The Centers for Disease Control and Prevention (CDC), a US Federal Agency, estimates that Y. enterocolitica causes 117,000 illnesses, 640 hospitalisations and 35 deaths in the United States every year.59 Research into this bacterium is therefore not only of academic interest but of medical importance as well. As detailed in the sections below, access to exogenous iron is critical for the success and survival of the bacterium. Therefore, a deeper understanding of its haem-uptake and breakdown strategy could prove useful in future efforts to fight infection.

1.3.2.2 The Operon

The hem operon of Yersinia enterocolitica was the first operon that codes for haem-sequestering proteins to be discovered.60 It consists of 9 genes, hemWXYPRSTUV, although two intergenic regions separate them into hemWXY, hemPR and hemSTUV. A restriction map showing this arrangement is given in Fig. 1.7.

Figure 1.7: Restriction map of the hem operon in Yersinia enterocolitica. Genes are shown as colour-coded cylinders, separated by intergenic regions. Cylinder width corresponds to the number of base pairs in the gene.

It was not realised in the early studies that the hemWXY section was part of the operon. Therefore, it was thought that the first open reading frame (ORF) was for hemP, which codes for the poorly understood protein, HemP. In their original studies, Stojiljkovic & Hantke60 were unable to isolate this protein, which is 81 amino acids long and approximately 8.5 kDa in weight, by sodium dodecylsulfate-polyacrylamide gel electrophoresis (SDS-PAGE).
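As a quick plausibility check on the quoted size of HemP, a protein's mass can be estimated from its length. The sketch below assumes the common rule-of-thumb average of about 110 Da per residue; this value is an assumption introduced here, not a figure from the source, and the function name `estimate_mass_kda` is illustrative.

```python
# Assumption: rule-of-thumb average residue mass, not a value from the thesis.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Rough protein mass in kDa from its length in residues."""
    return n_residues * residue_mass_da / 1000.0


# HemP is quoted as 81 amino acids long.
hemp_estimate_kda = estimate_mass_kda(81)
print(f"Estimated HemP mass: {hemp_estimate_kda:.2f} kDa")
```

The estimate lands within about 5% of the quoted ~8.5 kDa. The remaining difference is the sort of deviation expected from amino acid composition, which a length-only estimate ignores.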
However, they did demonstrate that this protein is needed (along with HemR and HemS) for Yersinia to grow successfully under iron-limiting conditions.42;60 Since then, it does not seem as if any studies have been carried out to uncover more of the behaviour of HemP in Y. enterocolitica. However, studies on other pathogenic hemin uptake systems have suggested that their versions of HemP/HmuP are involved in transcriptional activation of the genes encoding the outer membrane haem receptors.61–63

It is hypothesised by this author that HemP operates in conjunction with the Fur box, to better control haem homeostasis. Before examining this idea, the role of the Fur box must therefore be explained. The Ferric Uptake Regulator (Fur) protein is a transcription factor that binds Fe2+, which causes its affinity for certain regions of DNA, known as Fur-binding sites (FBSs), to increase 1000-fold.42;64 These FBSs overlap with the −35 and −10 regions within the promoter. Therefore, when Fur binds, RNA polymerase is prevented from gaining access, and so transcription of the downstream genes is suppressed. The Fur box is therefore an effective feedback loop. When Fe2+ is relatively scarce, the Fur box sequence is free and so transcription proceeds, producing HemR, to acquire haem, and downstream proteins, to process it once in the cell. When the surroundings are rich in Fe2+, the Fur box sequence is blocked and transcription is repressed. This effect limits the amount of haem entering the pathogen, mitigating the production of reactive oxygen species. A typical Fur box is shown in Fig. 1.8.

Figure 1.8: Model of Fur repression. Under iron limiting conditions, the RNA polymerase (RNAP) is free to bind to the −35 and −10 sites of the promoter, which allows the transcription of genes. Under iron replete conditions, the coordination of Fe2+ to Fur causes a conformational change, significantly increasing its affinity for the Fur-binding sites (FBS). As these overlap with the −35 and −10 sites, this effect prevents the coordination of RNAP to the promoter, and hence the transcription of downstream genes. The colour-coding and labelling of the genes is as in Fig. 1.6.

It is possible that HemP adds another layer of complexity to the Fur box. Indeed, the overall Fur box overlaps the gene for HemP: in their original study of the hem system, Stojiljkovic & Hantke showed that the Fur box spans 412 nucleotides, starting at nucleotide 379, whereas the ORF for HemP ranged between nucleotides 373-615.60 Though they did not consider HemP in their studies, Jacobi et al. observed three different levels of HemR expression when Y. enterocolitica was placed in various mouse tissues.65 In tissues from the liver and intestinal lumen, expression was weak, reflecting the high iron content of these organs (to reiterate, Y. enterocolitica must not become overloaded with iron, and so HemR production is repressed to avoid excessive absorption of haem under these conditions). Meanwhile, in splenic tissues, which have moderate iron content, there is a moderate expression of HemR. Interestingly, however, in peritoneal tissue, where there is a dearth of iron, HemR is ‘hyperexpressed’. Jacobi et al. commented that ‘Yersiniae carrying fyuA or hemR reporter fusions exhibited threefold-stronger signals when grown in the peritoneal cavity of mice than those growing under iron derepression in vitro’.65 They proposed that such an increase could be controlled by additional activators besides the Fur box.
Considering the recent findings that HemP in related systems could be involved in transcriptional activation, this author tentatively speculates that the hemPR operon is controlled in the manner described in Fig. 1.9. Further studies will be required to test this idea.

The hemR gene produces HemR, a 78 kDa outer membrane haem acquisition protein.60 Unfortunately, structural characterisation of HemR has been limited. LaCross et al. investigated HemR of another Gram-negative bacterium, Nontypeable Haemophilus influenzae,66 using the I-TASSER server for protein structure and function prediction.67–69 They also compared this result against the predicted structures of related proteins, some with very low sequence homologies, showing that they were very consistent. This observation suggests that HemR in Yersinia enterocolitica contains the motifs shown in these structures, namely those that are standard for haem acquisition proteins, as described in Section 1.2.2. These motifs are a β-barrel running through the membrane in which the protein is embedded, as well as extracellular dangling loops, which act as receptors.

HemR in Y. enterocolitica is TonB-dependent,60;70 a feature found in all iron acquisition systems described from Gram-negative bacteria. TonBs are a remarkable family of proteins, which span the periplasmic space of such bacteria. Their large, β-barrel structures allow them to transmit signals from the outer membrane to the cytoplasm. In Y. enterocolitica, the TonB box is located at the amino-terminus of HemR,60 and so signalling to the cytoplasm (and therefore haemoprotein production) is intimately related to the concentration of extracellular haem being sensed by HemR.

Between hemR and the next ORF (required for hemSTUV) is an intergenic region of 120 nucleotides. Part of this region constitutes an inverted repeat,60 probably to terminate hemR transcription.
As with hemR, there is a Shine-Dalgarno sequence71 approximately 10 nucleotides upstream of the start codon. Due to this intergenic region, the hemSTUV operon can be considered as a separate entity from the hemPR operon.

Figure 1.9: Proposed model for HemR expression based on iron availability. With moderate and high iron, expression behaves similarly to a standard Fur model, as described in Fig. 1.8. Symbols also correspond to that figure. However, at very low iron concentrations, HemR is hyperexpressed. It is thought this effect could be due to transcriptional activation by HemP (brown hexagon), although how such activation can only occur at very low iron concentrations is not properly understood.

Whereas hemPR is concerned with haem acquisition, hemSTUV is regarded as a haem-specific periplasmic binding-protein-dependent transport (PBT) system. Indeed, Y. enterocolitica hemSTUV was the first PBT to be characterised from any Gram-negative bacterium.37;42 PBTs contain at least one of each of the following three classes of protein: a periplasmic binding protein; a hydrophobic protein, for spanning the inner membrane; and a hydrophilic protein which can bind ATP. Typically, they also have a fourth: a cytoplasmic protein, implicated either in haem transport or haem utilisation. In the case of hemSTUV, this is HemS. The structure and function of HemS shall be the focus of much of this thesis, and an introduction to this haemoprotein is presented in Section 1.4.1.

The last codon of hemS overlaps with the first of hemT, implying that they are translationally coupled. The protein, HemT, is poorly characterised in the literature. Its sequence, however, suggests it has a hydrophobic α-helical core. Its homologues HmuT, ShuT and PhuT have all been crystallised successfully. Of these proteins, HmuT, which was derived from Y. pestis, has the closest sequence homology (91%).
This homology is so close that HemT is assumed to behave in a very similar manner to HmuT. HmuT (and therefore presumably HemT) is different from ShuT and PhuT in that it can bind two stacked haem molecules in its large central cleft.39 This dimer binds securely via a tyrosine (which is conserved across all periplasmic haem transport proteins) at the free axial site of one of the Fe2+ ions. A further histidine residue (less well conserved, reflecting the fact that not all of its homologues bind two haems) lies at the free axial site of the Fe2+ ion on the other haem. This structure provides a secure method with which to shuttle haem to the inner membrane.

Once at the inner membrane, haem is taken up by HemU, the next protein coded for in the hemSTUV operon. HemU is a permease, and as such is very hydrophobic. As with HemT, this is a poorly characterised protein, but the close homologue, HmuU from Y. pestis, has been more deeply studied. In 2012, Woo et al. solved the structure for HmuU,72 revealing that it exists as a heterodimer with HmuV, where the fully assembled transporter has stoichiometry HmuU2V2. The two HmuU subunits occupy the transmembrane domain whereas the two HmuV subunits are situated at the nucleotide-binding domain.

As clearly shown in Fig. 1.10, HmuU is rich in α-helices. Indeed, each subunit has ten transmembrane helices, giving a total of twenty in the fully assembled transporter. This structure provides the large, membrane-spanning funnel required of a permease, as described in Section 1.2.3.

Figure 1.10: Proposed alignment of the periplasmic haem transporter, HmuT (PDB 3MD9),39 with the membrane-spanning downstream transport system, HmuU2V2 (PDB 4G1U), according to Woo et al.72 Glu77 and Glu206, thought to be important to the docking event, are highlighted. It is thought that two stacked haem molecules can be transported through the membrane together.

The crystal structures obtained by Woo et al.
strongly reinforced some proposals concerning permease-ATPase (i.e. HemU/HmuU–HemV/HmuV) interactions made by Stojiljkovic & Hantke when they first examined the system.37;72 Namely, it had been noticed that HemU, in common with other PBT permeases, such as the vitamin B12 transporter BtuC, has a conserved EAAX3GX9LLLL sequence. It was suggested that this motif interacts closely with the ATPase of the PBT (i.e. HemV/HmuV). Though not remarked upon by Woo et al., it can be seen from their crystal structure of the overall HmuU2V2 system that each of the HmuU units has a sequence, EAHYLGVNVRQAKLRLLLL, that very nearly corresponds to this motif. Furthermore, these motifs are intimately associated with the respective HmuV units, including a salt bridge between the first glutamic acid of the sequence quoted above and an arginine on HmuV. This salt bridge, plus other noncovalent bonds, is typical of ATP-binding proteins in haem-uptake systems (cf. Section 1.2.3), which are required to associate with the cytosolic region of the permease to provide energy.

HemV itself is highly homologous to other known PBT ATP-binding proteins. In order to generate energy, HemV is required first to bind ATP and prime it for hydrolysis. It achieves this goal through two Walker motifs,73 which are prevalent across nucleotide-binding proteins: the GX4GK Walker A motif, expressed as GPNGAGK in HemV, which is important for phosphoryl-binding; and the hhhhDE (where h is a hydrophobic residue) Walker B motif, expressed as WLFLDE in HemV, in which aspartate, D, is required to bind the Mg2+ ion and glutamate, E, is required to effect the hydrolysis.74

Stojiljkovic & Hantke conducted studies to determine the importance of the hemTUV genes. They found that inactivation of hemTUV limited, but did not stop, bacterial growth. As the hemPR operon was left unaffected, HemR was still able to transport haem into the periplasm.
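The motif comparisons described above can be checked with a short pattern-matching sketch. This is an illustration only: the regular expressions are direct transcriptions of the consensus motifs quoted in the text, the set of residues counted as hydrophobic (`h`) is an assumption made here, and the relaxed permease pattern is a hypothetical construct, not one taken from the literature.

```python
import re

# Consensus motifs as quoted in the text. The residues treated as
# "hydrophobic" (h) are an ASSUMPTION chosen for illustration.
WALKER_A = re.compile(r"G.{4}GK")            # GX4GK
WALKER_B = re.compile(r"[AVLIMFW]{4}DE")     # hhhhDE
EAA_STRICT = re.compile(r"EAA.{3}G.{9}LLLL")  # EAAX3GX9LLLL

# Hypothetical relaxed form: tolerates the substituted third residue and
# the one-residue-shorter spacer seen in the HmuU sequence.
EAA_RELAXED = re.compile(r"EA.{3,4}G.{9}LLLL")


def has_motif(pattern, sequence):
    """Return True if the motif occurs anywhere in the sequence."""
    return pattern.search(sequence) is not None


# Sequence fragments quoted in the text.
hemv_walker_a = "GPNGAGK"
hemv_walker_b = "WLFLDE"
hmuu_segment = "EAHYLGVNVRQAKLRLLLL"

print("Walker A in HemV fragment:", has_motif(WALKER_A, hemv_walker_a))
print("Walker B in HemV fragment:", has_motif(WALKER_B, hemv_walker_b))
print("Strict EAA motif in HmuU:", has_motif(EAA_STRICT, hmuu_segment))
print("Relaxed EAA motif in HmuU:", has_motif(EAA_RELAXED, hmuu_segment))
```

Under these assumptions, the two HemV fragments satisfy their Walker consensus patterns. The HmuU segment fails the strict permease pattern but passes the relaxed one, which is consistent with the description of it as "very nearly" corresponding to the motif.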
Though the standard mechanism for transporting haem across the inner membrane was removed, at high enough haem concentrations it was possible for haem to seep into the cytoplasm regardless. In another study, Stojiljkovic & Hantke demonstrated that removal of any one of HemP, HemR or HemS caused cell death under iron-limiting conditions.60 No equivalent study appears to have been carried out under iron-replete conditions. An overall picture of the Hem set of proteins is provided in Fig. 1.11.

Figure 1.11: Adaptation of Fig. 1.4, applied to the Hem system of proteins. For clarity, HemP,W,X,Y are not included.

1.3.3 The Hmu Operon of Yersinia pestis

1.3.3.1 The Bacterium

Notorious for spreading the Black Death, or Bubonic Plague, Yersinia pestis is the most deadly pathogen in human history. It is a close relative of Yersinia enterocolitica and Yersinia pseudotuberculosis, the only two other members of the Yersinia genus that are pathogenic to humans. Whereas the other two target the intestine of their host and so are largely restricted to that region, Y. pestis developed a distinct inoculation route during the course of its divergence from an ancestral Y. pseudotuberculosis.75 This distinct route is, famously, the flea bite. In common with the other pathogenic Yersinia, Y. pestis is lymphotropic75 (i.e. it can quickly travel to and colonise lymphoid tissues) but, unlike its close relatives, is not largely localised to the intestine, thereby making it a more potent pathogen.

1.3.3.2 The Operon

The hmu (short for ‘hemin utilisation’) operon is the closest known relative of hem. Indeed, they and their proteins are so close it could be argued they should be regarded as interchangeable, as they often are in online databases such as UniProt.76 As detailed above, the hmu operon studied in this work comes from Y. pestis and, as such, has a few subtle differences from the hem operon of Y. enterocolitica.
These differences arise despite the operons being structured in the same way, i.e. both have three independent promoter regions, effectively splitting the operon into hem/hmuWXY, hem/hmuPR and hem/hmuSTUV (cf. Fig. 1.7). Both also have the same Fur box setup. The first piece of evidence for subtle divergences came when Perry et al. showed that there was only modest hybridisation between Y. pestis hmu and Y. enterocolitica hem DNA.77 In a follow-up study, they showed that these differences were mainly manifested in the HmuP and HmuR proteins.78 In the case of HmuP, it was found to be only 41 amino acids long, nearly half the length of HemP (at 81 amino acids). This difference in length lends credence to the finding by Amarelle et al., when investigating HmuP from Sinorhizobium meliloti, that this set of proteins operates as transcriptional regulators,61 as protein length tends to be less important for function in such regulators than it is in other protein classes. HmuR, meanwhile, was found to lack a region towards the carboxy-terminus that is present in HemR, a deletion it shares with ChuA and ShuA.78 It is thought this deletion could affect the outer membrane receptors. Differences in the other proteins do not appear to be significant: in each case, the main motifs, as described in Section 1.3.2.2, are all retained.

1.3.4 The Chu Operon of Escherichia coli

1.3.4.1 The Bacterium

Escherichia coli has been studied more than any other bacterium, due to the ease with which it can be inexpensively grown and cultured in the laboratory. E. coli exists in many strains, most of which are harmless. Like Y. enterocolitica, they are Gram-negative and typically reside in the intestines of endotherms. Unlike Y. enterocolitica, however, E. coli tends not to be pathogenic, and indeed can have a symbiotic relationship with its host to protect against pathogenic infection. The chu operon has been most widely studied from the E.
coli O157:H7 serotype, however, which is a pathogenic variety.

1.3.4.2 The Operon

The chu (from E. coli haem-utilisation) operon shows both similarities and striking differences to hem or hmu. This operon is able to produce homologues of HemR (ChuA), HemS (ChuS), HemT (ChuT), HemU (ChuU) and HemV (ChuV). An equivalent for HemP is not included, but there are three other genes present, which code for ChuW, ChuX and ChuY, respectively. The overall operon structure is also controlled by four promoters, not three, and could therefore be considered as being split into chuA, chuS, chuTWXY and chuUV, respectively.

The expression of ChuA is, like that of HemR, regulated by a Fur box.79 Without an equivalent for HemP (which is also thought to be involved in HemR regulation, see Section 1.3.2.2), ChuA expression does not appear to be as tightly controlled.

The members of the periplasmic-to-cytoplasmic haem transport system, ChuT, ChuU and ChuV, have not been studied rigorously, although it is commonly accepted that they operate in a similar manner to HemT, HemU and HemV.80 The transported haem is then picked up by ChuS in the cytoplasm. The role of ChuS, like that of HemS and its other homologues, is controversial, as discussed in Section 1.4.4.

The genes for ChuW, ChuX and ChuY differ from those for their homologues in Y. enterocolitica or Y. pestis in that they are clearly integrated within the overall chu operon. In the two Yersinia species, these genes precede hemP and, as such, have traditionally not been considered as part of the hem/hmu operons. Their characterisation is very limited; indeed, it was only through work done by the present author and, in particular, Yuhang Xie (also of the Barker Group), that it was realised that these proteins are homologous to ChuW, ChuX and ChuY.81 The obvious names to give these proteins would be HemW, HemX and HemY in Y. enterocolitica and HmuW, HmuX and HmuY in Y. pestis.
However, to confuse matters, there are other proteins coded for at different parts of the Y. enterocolitica, Y. pestis and E. coli genomes, respectively, which are often designated HemW, HemX and HemY. These proteins, though also involved in haem utilisation, do not have any significant homologies with those associated with the hem/hmu/chu operons. To avoid confusion between these proteins, and those associated with the hem operon in Y. enterocolitica, the latter shall be referred to in the rest of this report as HemW′, HemX′ and HemY′ respectively. Finer details concerning this topic, as well as a more detailed investigation into homologies between the proteins coded by the hem, hmu, chu, shu and phu operons, are in Appendix A.

It is perhaps because of their central position in the operon that the ChuW, ChuX and ChuY proteins have been studied more rigorously than any of their homologues. In 2009, Suits et al. investigated ChuX.82 Their analysis suggested that ChuX is a cytosolic haem transport protein, much like ChuS was thought to be. Indeed, as noted by Suits et al., ChuS and ChuX bear a structural resemblance, despite having low sequence homologies. ChuX (approximately half the size of ChuS) was found to dimerise, creating a fold similar to the ChuS monomer. However, they also showed that haem could bind to ChuX in a 1:1 ratio, meaning that the homodimer could bind two haem molecules, rather than the one that ChuS typically accommodates. Unlike ChuS (which is known to be capable of breaking down haem using ascorbate or Cytochrome P450-Reductase, as shall be described in Section 1.4.4), ChuX was found not to have any haem breakdown capabilities. As ChuX was shown to bind a molar equivalent of haem, and given its high KD value, Suits et al. noted that such features had been observed in other related cytosolic proteins which had been determined to be involved in haem storage.
They therefore suggested a similar function for ChuX.82

In 2016, LaMattina et al. built on this research, this time considering ChuW.83 This work shifted the paradigm concerning bacterial haem catabolism, as they demonstrated a novel mechanism whereby this process could occur anaerobically. Key to this process were the ChuW, ChuX and ChuY proteins. Firstly, they noted that ChuW was a member of the radical S-adenosylmethionine (SAM) superfamily. Such enzymes coordinate redox active [4Fe-4S] clusters. These clusters are then used to reductively cleave SAM, creating a highly oxidising 5′-deoxyadenosyl-5′-radical (5′-dA•) that can then be used in catalysis. More precisely, ChuW belongs to the radical SAM methyltransferase (RSMT) family, which is known to transfer methyl groups to otherwise unreactive carbon atoms. Sometimes this transfer is accompanied by appreciable chemical rearrangements.84;85 In the case of ChuW, LaMattina et al. found that the transfer involved cleavage of the haem porphyrin ring. The full assay, which included E. coli flavodoxin, E. coli oxidoreductase and NADPH, produced a novel compound, which LaMattina et al. called ‘anaerobilin’. Its proposed structure is shown in Fig. 1.12, along with those for ‘deuteroanaerobilin’ (DAB) and ‘mesoanaerobilin’ (MAB). DAB and MAB were produced from the reduction of deuterohaem and mesohaem respectively.83 Due to its greater solubility than haem under these conditions, deuterohaem proved to be a useful alternative, as it allowed for some downstream processes to be investigated in more detail. Tandem mass spectrometry (MS2) studies on DAB and MAB (and therefore, by implication, on anaerobilin) suggested that the porphyrin had been cleaved at the α-meso position, similar to the canonical HO mechanism described in Fig. 1.5. The observed increase in labile iron would seem to support this conclusion.

LaMattina et al. then introduced DAB (still working as a surrogate for anaerobilin) to ChuY.
Based on its sequence, ChuY is considered to be part of the NAD(P)H oxidoreductase family. Therefore, in the assay, NADPH was also included. UV-Vis results showed the signature peaks for DAB decreasing over time. Furthermore, increasing ChuY concentration led to a linear increase in this activity, suggesting an enzymatic reaction was occurring. From these results, LaMattina et al. concluded that the purpose of ChuY was to reduce a potentially toxic haem breakdown product.

Though LaMattina et al. were not able to test this hypothesis, it seems reasonable to assume that the role of ChuX in vivo in this context would be to shuttle anaerobilin from ChuW to ChuY, where it can then be broken down further. When discussing the role of ChuS and its homologues, many parallels between them and this ChuW,X,Y system shall become apparent.

Figure 1.12: Proposed breakdown structures from LaMattina et al.83 The methyl groups derived from SAM are shown in red. Top left and top right: proposed structures for anaerobilin. It is unknown which pyrrole the α-meso-carbon of haem remains attached to, or if both are produced, so the two possibilities are shown. Bottom left: deuteroanaerobilin (DAB) product. With respect to anaerobilin, DAB has two fewer vinyl groups, replacing them with H atoms. Bottom right: mesoanaerobilin (MAB) product. With respect to anaerobilin, MAB has two fewer vinyl groups, replacing them with ethyl groups. As with anaerobilin, it is unclear for both DAB and MAB which pyrrole the α-meso-carbon remains attached to. Only one possibility for each is shown.

1.3.5 The Shu Operon of Shigella dysenteriae

1.3.5.1 The Bacterium

Shigella is a genus of bacteria whose members, S. dysenteriae, S. flexneri, S. boydii and S. sonnei, it has been argued, would be better classified as strains of E. coli.86;87 Infection typically spreads due to ingestion or handling of contaminated food and water.
All members of the Shigella genus have been implicated in shigellosis, an infection of the intestines, which typically causes diarrhoea and fever. S. dysenteriae is recognised as the most dangerous Shigella serogroup to human beings. It is the leading cause of dysentery epidemics worldwide, which often arise in refugee camps. It is estimated that Shigella accounts for over 1 million deaths across the world every year.88

1.3.5.2 The Operon

Considering that phylogenetic studies of E. coli O157:H7 suggest it should be reclassified under the Shigella subgenus (particularly as it contains the Shiga toxin), it is perhaps unsurprising that the shu and chu operons of S. dysenteriae and of E. coli O157:H7 are very similar. As with chu, the shu operon is split into four regions under different transcriptional controls: shuR (also known as shuA), shuS, shuTWXY and shuUV. Each of the genes within the operon shows significant homology to its counterpart in chu. There is a significant difference in the intergenic regions between shu/chuR and shu/chuT but Wyckoff et al. concluded that promoter elements and a Fur box were retained.89

The Shu proteins have not been as well studied as the Chu proteins. The work that has been done suggests close similarities. However, there are interesting differences between ShuS and ChuS, which shall be discussed primarily in Section 8.4.

1.3.6 The Phu Operon of Pseudomonas aeruginosa

The haem utilisation operon of Pseudomonas aeruginosa, phu, is not investigated in this work. However, it has intriguing similarities to and differences from those operons which are studied, and so it is worth briefly considering.

1.3.6.1 The Bacterium

Though a Gram-negative member of the Gammaproteobacteria like the other species considered thus far, P. aeruginosa is not a close relative of any of them.
It is considered to be opportunistic, primarily attacking organisms that are immunocompromised or suffering from existing diseases, and can enter a human host via a variety of routes such as the urinary tract, burns or exposed wounds. As such, it is not as specific as the bacteria considered so far (Y. pestis excluded), which primarily target the gut, yet it still requires exogenous iron to survive and replicate. One iron-uptake strategy it uses is to produce siderophores, which capture and transport iron. This method can be wasteful, however, since the iron-siderophore complexes are not specific. This issue is exacerbated in populations of P. aeruginosa containing forms that do produce siderophores (known as cooperators) and mutated forms (known as cheaters) which do not. The cheaters can pick up siderophores, which they themselves did not produce, thus depriving the cooperators, which expend a significant amount of energy to produce these siderophores in the first place. In a mixed population, the cheaters therefore outcompete the cooperators, weakening the overall population over time and decreasing virulence.90;91 Mitigating this effect are alternative haem-uptake strategies, such as that produced by the phu operon, as shall now be described.

1.3.6.2 The Operon

Unlike the operons discussed so far, the phu operon does not contain genes that code for the W, X and Y proteins. Those homologous proteins that are present are split into two regions under the control of different promoters and Fur boxes: phuR and phuSTUV. Following phuV, a further protein is coded for which was originally regarded as PhuW.92 However, it bears little homology to other W proteins (with 11% identity to each of ChuW and HemW′).
Instead, this molecule is now regarded as a ChaN lipoprotein, since it was found to have a 30% identity with ChaN from Campylobacter jejuni.93 Rather than acting as a cytosolic haem breakdown enzyme, ChaN lipoproteins are thought to associate with the outer membrane, operating in some sort of partnership with ChaR (PhuR in P. aeruginosa).94 Ochsner et al. showed that removal of this chaN gene limited, but did not stop, bacterial growth when haemin was the only exogenous source of iron. The same limitation was observed when the phuR gene or the phuSTUV operon was removed.92

Whereas homologies between the proteins in the operons discussed previously tend to be relatively high (tending not to drop below 50% identity or 70% similarity), those proteins that are shared between them and the phu operon tend to be less similar. This situation is not surprising, since P. aeruginosa is not of the same taxonomic order as the other bacteria considered. Despite this difference, the R, T, U and V proteins are thought to fulfil the same functions as those from the other operons. As far as the author is aware, none of these proteins has been extensively characterised (e.g. by crystallography or NMR studies). The exception is the S protein, PhuS, which has been shown to have interesting structural and functional differences compared to its homologues. These differences shall be discussed in Section 1.4.6.

1.4 HemS and its Homologues

1.4.1 HemS

HemS from Yersinia enterocolitica is the main focus of this thesis. The precise function of this protein has long been under debate. When they first examined the hem operon, Stojiljkovic & Hantke concluded that HemS ‘could be either a cytoplasmic membrane permease that transfers hemin into the cytoplasm or a hemin-degrading enzyme.’60 Though it has been shown since then that HemS is not a permease, it has remained unclear whether it is a haem transfer protein or a haem-degrading enzyme.
Although its function was unknown, it was clear from the beginning that HemS was a key component of the hem operon. In their original study, Stojiljkovic & Hantke showed that HemR and HemS were both required for haem to be used as an iron source: gene knockouts led to cell death.60

The crystal structure of HemS was solved in 2006 by Schneider & Paoli, making HemS the first cytosolic haemoprotein of its class to be resolved in both its apo- (without haem, PDB code 2J0R)95 and its holo- (haem-bound, PDB code 2J0P)96 forms. These studies showed that HemS is a large, 41 kDa protein consisting of two topologically homologous domains joined by an unstructured loop to give a pair of large, stacked β-sheets.95 The fact that these domains are so similar has led to speculation that HemS is actually a fusion of two originally separate proteins. In the closely related chu operon (discussed in Section 1.3.4.2), for example, Suits et al. noted a close structural similarity between ChuS and the ChuX dimer, and hypothesised that both these proteins were fulfilling a similar function.82 The equivalent protein in hem, HemX′, is poorly studied; it would be of interest to determine whether it bears the same relationship to HemS as ChuX does to ChuS.

The unstructured loop connecting these domains is a poorly understood region, with its high conformational entropy making crystallographic resolution difficult. In the literature, the structure of this region is not properly known. In this thesis, a combination of further computational modelling and further X-ray crystallography shall shed more light on this region.

Figure 1.13: Representations of HemS. Left: holo-HemS (PDB 2J0P, green) superimposed on apo-HemS (PDB 2J0R, cyan). Middle: rotated version of left-hand representation. Right: Surface representation of holo-HemS. Dark blue and red colours represent nitrogen and oxygen atoms, respectively. In all of the representations, the black circle corresponds to the large cavity, and the purple circle to the small cavity. A dashed line is provided to show the missing loop region from the holo- structure. The apo- loop is also incomplete. The structural overlay clearly shows the large cavity ‘clamping down’ upon haem-binding when compared against the apo-form.

The stacked central β-sheets of HemS are twisted and capped by α-helices to form two distinct pockets. Haem can bind within the larger, deeper pocket. A histidine (H196) coordinates to one of the free axial sites of the iron, anchoring the haem inside the pocket. It is less clear what is at the sixth coordination site, since previous crystallisation studies around this region could not be tightly refined, though it is strongly suspected that it is either a H2O molecule or OH− ion, clamped in place by an arginine (R102) residue.

As well as binding to iron, further residues in HemS appear to have a role in binding to the propionate groups of haem. R209, K294, Q316, Y318 and R321 together form a polar, solvent-inaccessible region suited to these propionates. Whereas R321 coordinates to the more exposed propionate, R209, K294, Q316 and Y318 all form polar bonds with the propionate which is more deeply buried inside the pocket.

Inclusion of haem within this pocket leads to an induced fit conformational change. The C-domain moves towards the N-domain to facilitate H196-binding to the iron ion, clamping the haem molecule more tightly in place. This change causes the further burial of 350 Å² of solvent accessible surface area (SASA) on top of that occupied by the haem itself,21 giving the docking event a considerable entropic drive to complement the favourable enthalpic drive caused by the new intermolecular interactions.

Figure 1.14: Close-up representation of the holo-HemS haem-binding pocket from 2J0P. The most important residues implicated in haem-binding, as detailed in the text, are shown in cyan.
The tight binding between H196 and the iron ion results in significant buckling and distortion of the porphyrin ring, suggesting that HemS is priming haem for degradation, rather than just transporting it within the cytoplasm. The following sections show that there are indeed molecules that can react with holo-HemS to give novel breakdown products. As these molecules, in particular NADH, are discussed, further residues shall be highlighted that are thought to be important to NADH-binding/regulation. First, however, a deeper discussion of the nature of haem-binding in HemS is required.

1.4.2 Haem-Binding in HemS

An in-depth study of haem-binding was conducted by Sawyer as part of her PhD at the University of Cambridge.42 Though unpublished, these data are important for a proper understanding of this protein’s relationship with haem.

Sawyer showed that the UV-Vis spectrum of holo-HemS is dependent on pH. As pH was increased from 4 to 9, the Soret band shifted from 403 nm to 408 nm. This shift was accompanied by an increase in intensity, indicating stronger haem-binding at higher pH. It must be remembered that haem typically has lower solubility in more acidic solution, and that protonation of amino acid side chains around the haem-binding pocket is possible. However, this change in intensity was regarded as too great to be assigned just to these effects. Instead, it was thought mainly to be due to changing affinities of the water/hydroxide acting as the sixth haem iron ligand, and to the pKa of this group. Data fitting showed this pKa to be 5.5, lower than for other reported proteins with hydroxide ligands.97 Increasing the pH also led to a decrease in absorbance at ∼380 nm and at ∼650 nm, suggesting a transition to low-spin iron. Overall, therefore, it would seem that at high pH, haem is a low-spin, 6-coordinate, hydroxide complex.
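To give a feel for what a pKa of 5.5 implies across the pH range studied, the sketch below evaluates a simple two-state Henderson-Hasselbalch model for the water/hydroxide ligand. It is an illustration only: the single-site model and the function name are assumptions made here, and this is not the fitting procedure used by Sawyer.

```python
def hydroxide_fraction(ph, pka=5.5):
    """Fraction of the sixth ligand present as hydroxide rather than water.

    ASSUMPTION: a single ionisable group in simple two-state equilibrium
    (Henderson-Hasselbalch). The default pKa of 5.5 is the fitted value
    quoted in the text.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Evaluate across the pH 4 to 9 range covered by the titration.
for ph in (4.0, 5.5, 7.0, 9.0):
    print(f"pH {ph:.1f}: hydroxide fraction = {hydroxide_fraction(ph):.3f}")
```

In this idealised picture the ligand is half water and half hydroxide at pH 5.5, almost entirely water at pH 4 and almost entirely hydroxide at pH 9, in line with the assignment of the high-pH species as a hydroxide complex.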
At low pH, the wavelength of the Soret peak would suggest a high-spin, 6-coordinate, water complex but the pKa would instead indicate a 5-coordinate complex. Some sort of equilibrium is assumed, and because water is a weak ligand, it is recognised that haem iron may fluctuate between high- and low-spin states.

Over a series of difference spectra, in which a constant concentration of HemS was mixed with different concentrations of haem solution, Sawyer showed that the Soret peak reaches maximum absorbance once haem is equimolar to the protein. Thereafter, this peak remains approximately steady, but the overall spectrum continues to change. This observation suggests that there is saturation of the known binding site, but then further protein-haem interactions can occur elsewhere. In other words, there is one favoured binding site but other (relatively spectroscopically ‘silent’) ones may be possible.

To determine whether HemS could accommodate more than one haem at a time, further studies were required. There was also a question as to whether these excess haem molecules were binding separately or as dimers. An analysis using non-denaturing mass spectrometry (ndMS) was therefore carried out. Here, it was important to be able to run MS both on the entire samples injected and on ions of particular m/z values selected in the quadrupole. In the first case, all the species could be separated at the end of the time of flight (ToF) tube and then detected. This setup allows for an overall picture of protein-ligand binding stoichiometries to be developed. In the second case, only the daughter ions of the species selected in the quadrupole could be detected. This latter case is known as tandem MS. By explicitly selecting a particular protein-ligand stoichiometric combination, the ligand molecules stripped from the protein could be unambiguously identified with that parent ion.
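The way binding stoichiometries are read from such spectra can be illustrated with a short calculation of the expected m/z values for HemS carrying different numbers of haem molecules. This is a rough sketch under stated assumptions: the protein mass is the nominal 41 kDa quoted in Section 1.4.1 rather than an exact sequence mass, the haem mass is the average mass of haem b, and the charge states and number of haems shown are hypothetical choices for illustration.

```python
PROTON_MASS = 1.00728  # Da
HAEM_MASS = 616.49     # Da; ASSUMED average mass of haem b (C34H32FeN4O4)
HEMS_MASS = 41000.0    # Da; ASSUMED nominal value from the ~41 kDa in the text


def complex_mass(n_haem, protein_mass=HEMS_MASS, ligand_mass=HAEM_MASS):
    """Neutral mass of the protein carrying n_haem bound haem molecules."""
    return protein_mass + n_haem * ligand_mass


def mz(mass, charge):
    """m/z of a positive ion formed by adding `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge


# Hypothetical charge states, chosen only to illustrate the peak spacing.
for n_haem in range(0, 6):
    peaks = ", ".join(f"{mz(complex_mass(n_haem), z):.1f}" for z in (12, 13, 14))
    print(f"{n_haem} haem: m/z at z = 12, 13, 14 -> {peaks}")
```

Each additional haem shifts a peak of charge z by roughly 616.49/z, so successive stoichiometries appear as evenly spaced peaks within each charge state. The spacing narrows at higher charge, which is one reason crowded regions of a spectrum can be hard to deconvolute.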
Results from this experiment showed that HemS was indeed capable of binding more than one haem molecule at a time. Ratios up to holo-5-HemS were unambiguously observed; it is possible that higher stoichiometries were present, but the spectra become difficult to deconvolute in the regions where these species would be expected. Furthermore, haem could bind as monomers or dimers. There was a particular preference for monomers, especially at low haem : protein ratios.42

Choy, as part of his PhD at the University of Cambridge, probed this hypothesis further by computation.21 He showed that it might be possible either for haem to bind to the N-terminal pseudo-pocket in addition to the main pocket (termed 1,2-bishaem-HemS) or for it to bind in the main pocket as a dimer (1,2-dihaem-HemS).

The pseudo-pocket has some properties that would appear to make it suitable for haem-binding. This situation was commented upon in the previous section when discussing the possible origins of HemS, and how it may have been a fusion of two smaller haem-binding proteins. This pseudo-pocket has an R-Q-Y-K-R line of residues in common with the main pocket, where both arginines and the lysine are implicated in propionate recognition, and tyrosine in iron coordination. Choy showed, by superimposing a second haem molecule onto his holo-HemS structure from experiment and then optimising, that haem could also bind inside this pocket and interact with these residues. However, the space available in this pocket was found to be very tight. It was further noted that there was a lack of conformational flexibility, and that there was an absence of some further propionate-recognising residues otherwise found in the main pocket. Altogether, this situation suggested that HemS would only use this pocket for haem-binding if the other was already occupied, in agreement with experiment.

Choy was also able to superimpose a haem dimer into the main pocket and optimise to stable structures.
He showed that these dimers were able to overlap in slipped-parallel mode with the propionates either perpendicular or antiparallel to one another. Having a second haem in this pocket precludes NADH from entering, thereby preventing the novel haem breakdown discussed in this thesis (to be introduced in Section 1.5).

1.4.3 HmuS

The structure of Yersinia pestis HmuS has not been resolved in either its apo- or holo-forms. It does, however, have 89.6% identity to HemS, retaining all of the important residues listed above (i.e. R102, H196, R209, K294, Q316, Y318 and R321). Where residues do differ, the changes tend to be cosmetic (the similarity between the two proteins is 94.8%). It is therefore thought, in the absence of detailed evidence, that the structures of HmuS and HemS are essentially identical.

1.4.4 ChuS

The structure and function of ChuS have both been studied in great detail. Its apo- crystal structure was first resolved by Suits et al. in 2005 (PDB Code 1U9T),80 with the holo-form following the year after (PDB Code 4CDP).98 Because the crystal structures for ChuS were solved only a matter of months after those for HemS, a detailed comparison between them was not made at the time of publication. However, the similarities are abundant. As with HemS, it was found that the N- and C-terminal halves of ChuS represented a structural duplication, giving a root mean square deviation of 2.1 Å between the repeats, despite them only having 19% sequence identity.80 From this result, Suits et al. concluded that ChuS was fulfilling a similar role to ChuX, which could dimerise to give very similar pockets to those in ChuS, as noted in Section 1.4.1. Just like HemS, these two domains in ChuS are connected by an unstructured loop, which could not be entirely resolved in either the apo- or holo-forms. All of those residues important for haem docking in HemS noted in Section 1.4.1 (R102, H196, R209, K294, Q316, Y318, R321) are present in ChuS.
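The percentage identity and similarity figures quoted for these homologues come from pairwise sequence alignments. The sketch below shows how such figures can be derived from an alignment, under stated assumptions: the residue similarity groups are a simplified, hypothetical grouping (real tools use substitution matrices), gapped columns are excluded from the denominator (conventions differ between tools), and the aligned fragments are toy sequences, not taken from HemS or its homologues.

```python
# ASSUMED, simplified similarity groups; alignment tools normally score
# similarity with a substitution matrix instead.
SIMILAR_GROUPS = [set("AVLIM"), set("FWY"), set("KRH"), set("DE"), set("STNQ")]


def percent_identity_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) for two pre-aligned sequences.

    Gaps are written as '-'. Columns containing a gap are skipped, so the
    denominator is the number of residue-residue columns (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no residue-residue columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy aligned fragments (hypothetical, for illustration only).
identity, similarity = percent_identity_similarity("RHRKQYR", "RHKKQFR")
print(f"identity = {identity:.1f}%, similarity = {similarity:.1f}%")
```

Similarity is always at least as large as identity, since it additionally counts conservative substitutions. This is why a pair such as HemS and HmuS can show 89.6% identity but 94.8% similarity.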
There are points of deviation, however, between HemS and ChuS. Firstly, their % Identity and % Similarity are 66.8% and 78.2%, respectively. A region with many differences, including the deletion of a glutamic acid found in HemS but not in ChuS, is the outermost α-helix of the α-loop-α-loop-α motif in the C-terminal domain. These differences result in this helix being shifted further from the central cavity in the holo-form (when compared against HemS). Due to the intimate connections between the α-helices of this motif, this change causes the innermost helix (that which directly forms the cavity) to also be shifted away, thus creating a larger cavity. However, when haem is not present, this effect is reversed; in other words, the HemS cavity is wider. Taken together, these observations suggest HemS ‘clamps down’ more than ChuS upon haem inclusion, as illustrated in Fig. 1.15. This difference between the two homologues would perhaps suggest that HemS is more effective at haem-binding than ChuS, as discussed in the Results.

Figure 1.15: Comparison of HemS and ChuS structures. Left: overlay of apo-HemS (2J0R, green) and apo-ChuS (1U9T, cyan).80 Here, the α-helix capping the central cavity is buried deeper in ChuS than in HemS (black square), thus restricting the pocket size. Right: overlay of holo-HemS (2J0P, green) and holo-ChuS (4CDP, cyan).98 Here, the relative positions of the capping α-helices (black square) have now been reversed: the helix for HemS is now buried deeper in the pocket. It appears that inclusion of haem causes a more significant ‘clamping’ effect in HemS than it does in ChuS, thus explaining this reversal. The purple squares in each representation also highlight a conformational difference in a central loop between the HemS and ChuS structures. H196 is the preferred residue for haem iron-binding. However, this highlighted loop contains two further histidines – H85 and H89 in HemS, and H83 and H87 in ChuS. Their close proximity to H196 would suggest some sort of role in haem-binding. Therefore, such conformational differences in the loop could affect the relative abilities of HemS/ChuS to ‘capture’ and store haem.

The holo-ChuS crystal structure was of high resolution, allowing haem-binding to be closely investigated. A histidine, H193, coordinates to the iron ion at the proximal side, whereas an arginine, R100, is involved in coordination at the distal side, much as in HemS. Unlike HemS, however, this distal region was clearly resolved, showing two water molecules involved in coordination situated between R100 and the iron ion. Spectrally, haem-binding in this pocket results in a Soret maximum at ∼408 nm (plus further sets of peaks for the β-band at ∼545 nm and for the α-band at ∼580 nm).80 Suits et al. showed that this Soret band depended on the H193 residue binding to haem: mutations to the histidine, such as H193N, were shown to broaden the absorbance and shift the maximum to ∼390 nm.98 In the wild type, ChuS was further shown to ‘clamp down’ on haem when the latter was introduced to the pocket, bringing the histidine closer to the iron ion. As mentioned above, however, this conformational change was less drastic than with HemS.

In their study of the structure of holo-ChuS, Suits et al. became the first to discover that this class of proteins can act as enzymes to promote haem degradation. Noting that typical bacterial HOs use either the Cytochrome P450 Reductase (CPR)-NADPH system, flavodoxin, ferredoxin or ascorbic acid as reducing partners in vitro,80 they opted for ascorbic acid, introducing it to a holo-ChuS solution. Over time, the Soret band at 408 nm was shown to decrease, with a complementary increase then decrease in the near-IR (suggesting an intermediate) and a slower but steadier increase of absorbance at ∼560 nm (suggesting a final breakdown product).
Further studies showed significant CO production when holo-ChuS was treated with ascorbic acid, an effect that could not be replicated when the protein was not present. They further showed that CPR-NADPH could be used as a reducing partner, although this produced less CO.

In 2016, Ouellet et al. continued this study to a detailed level. They proved that this reaction is aerobic and that hydrogen peroxide, H2O2, is a crucial component. More precisely, they demonstrated that the sodium ascorbate (cf. ascorbic acid) was used to produce H2O2, which would then react with haem.99 Adding catalase (a family of enzymes that decompose H2O2 to water and oxygen) to the reaction mixture slowed haem-degradation in a concentration-dependent manner. This observation proved the presence of H2O2 in solution, differentiating this reaction from those that follow a ‘canonical’ HO pathway, as described in Section 1.2.4 and in Fig. 1.5.

It is not entirely clear how this H2O2 is produced. Ouellet et al. did, however, show that the synthesis of H2O2 in a solution of ascorbate increased upon the addition of holo-ChuS. As this increase did not correlate linearly with an increase in ChuS, it was concluded that this non-correlation must be due to the competing haem breakdown reaction (which results in H2O2 consumption), which ChuS then promotes. Ouellet et al. suggested that peroxide would be ‘produced indirectly after the formation of superoxide.’99 To demonstrate conclusively that H2O2 in solution was required for the reaction, Ouellet et al. set up an assay whereby peroxide was added to holo-ChuS rather than ascorbate. The same products were formed.

Ouellet et al. also monitored holo-ChuS and H2O2 under anaerobic conditions.
Interestingly, the Soret maximum disappears over time, suggesting haem is either breaking down or being displaced, but there is no concomitant increase of absorbance at ∼560 nm.99 Unfortunately, these data have never been made publicly available as far as this author is aware, so a comparison with the novel reaction discussed in this thesis cannot be made.

Returning to the aerobic reaction, Ouellet et al. also closely studied the breakdown products. The peak at 565 nm did not indicate biliverdin formation, the expected product of a canonical HO reaction (which typically has peaks at ∼373 nm and ∼668 nm). However, upon complexation of the intermediate species with pyridine (i.e. pyridine was added to the reaction mixture when the near-IR species was most prevalent), the resulting spectrum was characteristic of a bis-pyridine Fe2+-verdohaem complex,99 with absorption maxima at 395, 499, 533 and 660 nm.100 This structure was illustrated as part of the reaction pathway for a canonical HO shown in Fig. 1.5. As shown in that figure, verdohaem appears to be the penultimate structure before biliverdin, thus suggesting that the aerobic reaction of H2O2 with holo-ChuS perhaps overlaps at least to some degree with a typical HO reaction, but then diverges when it comes to the further breakdown of verdohaem. Pyridine was shown to have little effect on the spectrum of the final breakdown product, with the peak at ∼565 nm remaining, suggesting it was unable to coordinate.

Using a combination of electrospray ionisation mass spectrometry (ESI-MS), electron paramagnetic resonance (EPR) spectroscopy and nuclear magnetic resonance (NMR) spectroscopy, Ouellet et al. further characterised this breakdown product. From MS, they saw ions at 514.32 m/z, which disappeared as time progressed, and 437.19 m/z (for comparison, the masses of haem and PPIX are 616 Da and 563 Da respectively).
The mass at 437.19 m/z corresponds to C24H26N3O5, suggesting a net loss of 10 C, 8 H, 1 N and a gain of 1 O with respect to PPIX (C34H34N4O4). NMR studies clearly identified the formation of hematinic acid, a molecule with one pyrrole and formula C8H9NO4. This formation would strongly suggest that PPIX loses one of its four pyrroles during the course of the reaction, and that the 437.19 m/z species is therefore a tripyrrole product. This reaction constitutes a significant difference from a canonical HO: even though the final product of a canonical HO, biliverdin, is cleaved at one of its meso-positions, it retains all four of its pyrroles. Ouellet et al. were unable to determine the exact structure of this tripyrrole, but their working hypothesis for what this could be is shown in Fig. 1.16.

Figure 1.16: Confirmed hematinic acid fragment and proposed tripyrrole structure from Ouellet et al.99 It would seem unlikely that the m/z fragment would retain a hydroxide counterion. Without the benefit of further evidence, this author would therefore suggest that the hydroxide should instead be incorporated covalently within the tripyrrole structure, though it is not clear how.

Recently, Mathew et al. proposed that ChuS could also act as a haem chaperone under anaerobic conditions.101 They noted that, in the absence of molecular oxygen, ChuS can still bind, but not degrade, haem. Indeed, holo-ChuS can be isolated at substantial yields;80;98 Mathew et al. quote 2 mM.101 It was therefore concluded that ChuS could store haem under anaerobic but iron-replete conditions (thus preventing toxicity caused by high cytosolic haem levels), and perhaps even deliver haem to ChuW for this latter enzyme to catalyse a SAM-mediated anaerobic breakdown (see Section 1.3.4.2 for details). Mathew et al. tested this hypothesis by running two ChuW-based assays where the only difference was in the substrate used: haem in one, and holo-ChuS in the other.
The resulting UV-Vis spectra showed that anaerobilin (which has distinct peaks at 445 nm and 795 nm, and to a lesser extent at 570 nm) was produced in each case, strongly suggesting that holo-ChuS was transferring haem to ChuW for the latter to then break down. NADPH was used in both these assays, which was of interest to the author because NAD(P)H is a reagent required for the anaerobic breakdown of haem in ChuS and its homologues, as shall be described in the Results. It is unclear whether the inclusion of other reagents in these assays precluded this reaction from occurring or whether it was outcompeted by the ChuW-mediated reaction, with the spectra for anaerobilin masking that for the haem breakdown product from ChuS (which has overlapping maxima at ∼590 nm and ∼810 nm).

1.4.5 ShuS

As with HmuS, the crystal structure for ShuS has not yet been determined. It does, however, retain 98.5% sequence identity with ChuS, and so it is predicted that the two would be structurally very similar. Paradoxically, however, investigations into the properties of ShuS (undertaken by a different group from that which studied ChuS above) have led in a very different direction.

As far as this author is aware, the possibility of haem breakdown in ShuS has not been investigated in detail. The Wilks group at the University of Maryland tried to determine whether there was any HO activity in ‘standard NADPH or ascorbate based assays of ShuS’ and found there was none.102 Unfortunately, the exact details of these assays and their results were not published. This study is particularly relevant to this thesis as it has been determined that holo-ShuS can indeed react with NAD(P)H, albeit not via a typical HO pathway (details of which shall be forthcoming in Chapter 9).

Instead, the binding of haem to ShuS in competition with DNA was the primary focus. Wilks was the first to show that DNA can bind competitively to apo-ShuS.102 Such DNA-binding is non-sequence-specific.
At first, it was thought that this DNA-binding could be analogous to Dps, a protein which can sequester and protect cellular DNA from oxidative stress or periods of low nutrient availability. However, a later study by Kaur & Wilks showed that the breakdown of DNA was only limited to a small degree by ShuS when exposed to oxidative stress.103 Why DNA binds to ShuS is still a mystery, although a recent study, also from the Wilks group, on the homologue PhuS may be able to shed some light on this result, as discussed presently.

1.4.6 PhuS

PhuS is a more distant relative of HemS than any of the other homologues discussed thus far. Its % Identities with HemS, HmuS, ChuS and ShuS are quite low for them to be considered as homologues, at 42.6%, 43.8%, 41.1% and 40.8% respectively. This protein was not part of the present study, but some of its structural and binding properties are worth considering. PhuS has been demonstrated to fulfil all of the proposed functions suggested for its homologues above: it has been shown to chaperone haem to other haem breakdown enzymes, degrade haem itself and bind to DNA.

Wilks et al. have, as with ShuS, taken the lead in the study of PhuS. In 2006, they discovered that haem in PhuS could be broken down by ascorbate or the CPR-NADPH system.104 They noted that this was equivalent to experiments conducted on ChuS by the Jia group (mostly led by Suits) at Queen’s University, Kingston,80 as was discussed in Section 1.4.4. The Wilks group drew different conclusions from their results compared to the Jia group. Whereas Jia had taken this reaction to indicate HO activity, Wilks urged caution and argued that such a reaction did not conform to the ordinary standards for a HO. Instead, it was taken to be a coupled oxidation reaction.
This is a nonenzymatic class of reaction discovered nearly a century ago105–107 which does not even necessarily require the presence of a protein; proposed mechanisms for this process, as gathered from the literature (especially Avila et al.),108 are given in Fig. 1.17. Rather, haem in pyridine/water can be degraded aerobically by hydrazine or ascorbate. Though verdohaem is produced (and this verdohaem can then be hydrolysed to biliverdin upon addition of KOH and HCl), this reaction is unlike the one promoted by a canonical HO as the products are not regioselective. A canonical HO reaction almost exclusively attacks haem at its α-meso-carbon, whereas a coupled oxidation reaction can attack any of the four meso-carbons.

In both a canonical HO reaction and a coupled oxidation, however, CO is produced. As the Jia group had claimed HO activity for ChuS based on their observation of the production of CO,80 the Wilks group were critical of this inference.104 Wilks also found that adding catalase to the reducing mixture stopped the reaction with PhuS,104 contradicting the claims of Jia that in ChuS, ‘addition of catalase and superoxide dismutase did not affect the trends of the spectral change.’80;98 Since catalase breaks down H2O2 that is free in solution, and since a coupled oxidation reaction uses H2O2 whereas a canonical HO does not, Jia used their data to indicate that ChuS was indeed a HO. However, in a later paper, their position changed, and they too stated that addition of catalase inhibits the breakdown of haem in ChuS.99 These findings all seemed to reinforce Wilks’ position that the ascorbate-based assays conducted on ChuS and PhuS were both indicative of coupled oxidation.

However, in this same later paper by Jia, the products were closely studied, as was described in Section 1.4.4. They found that the breakdown of haem in ChuS was regioselective, producing hematinic acid and a single tripyrrole.
This outcome would suggest an enzymatic role for ChuS after all. It is therefore still unclear whether ChuS/PhuS are simply able to produce H2O2 from ascorbate, which then attacks haem via a coupled oxidation process, or whether they are indeed aerobic haem breakdown enzymes. Due to the regioselectivity of the reaction, it would seem to be the latter situation, although the reaction does not seem to bear as much resemblance to a canonical HO as originally thought.

Figure 1.17: Proposed scheme for the coupled oxidation mechanism. Reaction at the α-meso-carbon atom is depicted, although the β, γ and δ positions can also be targeted. Oxy haem can be converted to ferric hydroperoxo haem directly. Alternatively, superoxide can be released to regenerate ferric haem; a second superoxide causes dismutation to peroxide, which can attack free ferric haem, thus generating ferric hydroperoxo haem. The hydroxyl can then attack one of the porphyrin meso-positions, producing a meso-hydroxyhaem. Conversion to verdohaem and biliverdin then proceeds via the same mechanism as in the canonical haem oxygenase case.

Due to their scepticism surrounding the enzymatic role of PhuS, the Wilks group focussed more on its context within the cell, and whether it could be shuttling haem to other haem breakdown enzymes. They demonstrated that PhuS can transport haem to a bona fide haem oxygenase found in Pseudomonas aeruginosa, known in the literature as pa-HO or HemO. By observing the signature Soret bands (410 nm for holo-PhuS and 406 nm for holo-pa-HO), it was shown in the assays prepared that haem readily transfers from PhuS to pa-HO but that the reverse is not true.
Surface Plasmon Resonance (SPR) studies revealed a KD of 64 nM, a value considered consistent with PhuS associating to pa-HO in a specific and physiologically relevant manner.104

It is also thought that a particular arrangement of histidine residues in PhuS could be conducive to the release of haem. Various studies by Wilks et al.109;110 showed that PhuS contains an extra two histidines, H210 and H212, in addition to H209, the residue typically implicated in haem-binding (and the one shown to bind haem in the crystal structure, PDB 4MF9).111 These three histidines are all in close proximity, and interestingly H210 and H212 do not have any equivalents in the aforementioned homologues of PhuS. Wilks et al. therefore suggested that haem, once PhuS was anchored to pa-HO, was transferred from H209 to H212, a more exposed histidine, to facilitate haem transfer. An H212A mutation confirmed this hypothesis. Though this is evidence from only one system and much more research into this area is needed, it suggests that haem transfer from pathogenic cytosolic haem chaperones requires multiple histidine residues at the haem-binding site in order to ultimately release the haem. Close relatives of PhuS, which lack these extra histidines, have not been implicated in haem transfer to HOs.

Within the last year, the Wilks group have demonstrated that PhuS is also capable of binding DNA.112 Similar to their work with ShuS, they showed that such binding is mutually exclusive with respect to haem. However, in this case the binding was shown to be sequence-specific. Namely, PhuS was shown to target the prrF1 promoter, an iron-responsive region directly downstream of the phu operon. PrrF bacterial small RNAs (sRNAs) are important during iron starvation as they cause mRNA degradation of nonessential iron-containing proteins,113–115 thus directing iron to where it is needed most.
In addition to this function, a haem-dependent read-through of the prrF1 terminator yields a longer PrrH transcript, PrrH being a recently discovered sRNA with suspected importance for infection.112 From these findings, Wilks therefore proposes that PhuS fulfils a dual function: regulating haem flux through HemO, and modulating the transcription of PrrF/PrrH in conjunction with haem itself.

Research into HemS and its homologues (and their context within their operons) has been pulled in different directions by different groups. This research has led to some fascinating insights, and much debate and confusion surrounding the true roles of these interesting proteins. Table 1.2 and Fig. 1.18 are provided to summarise the data discussed above, highlighting some key points.

Figure 1.18: Summary of different known haem breakdown pathways. Canonical HO: using oxygen and reductants such as NADPH/Cytochrome P450 reductase, Fe2+ is extracted from haem, and biliverdin and CO are produced with ferrous verdohaem as the final intermediate. ChuS: non-canonical ‘HO’ reaction with ascorbate in oxygen releases CO, Fe3+, hematinic acid and a tripyrrole product. Ferric verdohaem is the final intermediate. PhuS: oxidative haem degradation reaction that can use ascorbate or CPR-NADPH as reductants. Products of this reaction are not yet characterised but verdohaem is one of the intermediates and biliverdin is not produced. ChuW: anaerobic haem degradation using radical SAM mechanism to produce anaerobilin. HemS: anaerobic reductive haem degradation using NADH and producing a novel uncharacterised tetrapyrrole product, to be discussed below.
Figure and caption reproduced from Xie.81

Species                     Protein   Chaperone   DNA Binding   Haem Breakdown   holo/apo Crystal Form
Yersinia enterocolitica     HemS      – a         –             ✓ 21;42          Monomer/Monomer
Yersinia pestis             HmuS      –           –             –                N/A
Escherichia coli O157:H7    ChuS      ✓ 101 b     –             ✓ 80;99 c        Dimer/Monomer
Shigella dysenteriae        ShuS      –           ✓ 102;103     – d              N/A
Pseudomonas aeruginosa      PhuS      ✓ 104 e     ✓ 112         ✓ 104 f          Dimer/Dimer

Table 1.2: Summary of homologues selected for study/consideration in this thesis. Together, they represent a wide range of homology with HemS. A ✓ indicates that the feature has been demonstrated using the protein in question, whereas a – indicates that it has not (although this could be because no group has tested for the activity under consideration).
a HemS perhaps acts as a haem chaperone when the haem concentration is high.21;42
b Demonstrated to transport haem to ChuW.
c Reaction with CPR-NADPH system demonstrated. Requires aerobic conditions.
d Wilks attempted standard NADPH and ascorbate based assays but found no HO activity.
e Demonstrated to transport haem to HemO.
f Reaction with CPR-NADPH system demonstrated, although it was suggested the observations may be due to a coupled oxidation process instead.

1.5 Novel, Anaerobic Haem Breakdown Discovered in HemS

1.5.1 NADH Structure and Properties

Sawyer tested a variety of small molecules to determine whether any of them disrupted haem-binding in HemS. It was found that glycerol, triethanolamine, maltose, trehalose, imidazole, ATP, galactose, NADH and NADPH could all do so. It is interesting to note that the latter four molecules all have an adenine group, since DNA-binding has been reported in ShuS and PhuS. In each case, these molecules did not appear to induce any other activity, except for NADH and NADPH.
For reasons mainly associated with cost (NADPH tends to be ten times more expensive than NADH), NADH became the focus of further research, although where NADPH has been used instead of NADH, the reaction has behaved identically.

NADH is a common biological cofactor and can be used as a reducing agent and hydride source.116;117 It consists of two ribose groups attached via phosphate groups at their 5′-positions. Both of these ribose units have substituents attached to their 1′-positions: an adenine at one end, and a nicotinamide at the other. NADH can exist as two different diastereoisomers, but the β-form is found almost exclusively in a biological context.

As has been demonstrated experimentally116;117 and computationally,21 NADH preferentially occupies a folded-up conformation when free in solution, with ribose-ribose stacking. However, this shape prohibits access to many protein pockets and so to fulfil its biological role, NADH is often required to unfold. Many crystal structures of NADH bound inside proteins show it in a stretched conformation. Certain residues can assist with this unfolding process (particularly by providing hydrogen-bonds to the phosphate backbone), and identifying which ones can do so has become a recent area of interest. This process will be discussed with respect to HemS and its homologues more fully in the Results section.

Figure 1.19: NAD(P)H/NAD(P)+ structures. NAD(P)H (left) loses a hydride to produce NAD(P)+ (right). R = H gives NADH/NAD+. R = PO3 gives NADPH/NADP+.

Once inside the protein pocket, NADH functions by transferring its nicotinamide hydride, H−, to its target, leaving NAD+. The standard electrode potential of this NAD+/NADH redox pair is −0.32 V, making NADH a strong reducing agent.118

1.5.2 Biophysical Research into Haem Breakdown in HemS

Incubation of NADH with HemS and haem resulted in dramatic changes to the UV-Vis spectrum.
Most noticeable is the large increase in absorbance at 591 nm, which is accompanied by the solution turning from light yellow to purple. Variations in the wavelength at ∼800 nm were noted but not studied – this effect shall be expanded on in the Results section. Unfortunately, NADH, which has a high extinction coefficient with a peak at 340 nm, was used in such high concentrations that the spectrum of the HemS complex was effectively masked from 250 to 400 nm. Therefore, it was difficult to track the Soret band, and to establish whether it was decreasing as the peak at 591 nm increased. A series of absolute spectra, showing the formation of this purple compound, is given in Fig. 1.20.

Figure 1.20: Representative UV-Vis absolute spectra showing haem breakdown by NADH in HemS. Evolution is charted in 1-minute intervals, from 1 min (red) to 20 min (blue), and reveals the formation of a purple haem breakdown product (591 nm). Reactants were mixed in the ratio 5 µM HemS : 20 µM Haem : 2000 µM NADH. The inset is from Sawyer,42 and shows the absolute spectrum of the haem breakdown product following HPLC.

A variety of tests were undertaken to determine whether this peak could be an experimental artefact. It was shown that the holo-HemS complex was stable with no haem breakdown for a period of over two months; nor did any reaction occur when free haem was incubated with NADH. An assay with cytochrome b562 showed that NADH could reduce haem but no enzymatic activity was exhibited. Tests were also conducted, by Gregory,119 to determine whether the iron ion was an essential component of the reaction, or whether it could proceed with other transition metals, or even non-complexed PPIX. He showed that Co(III)-PPIX, Zn(II)-PPIX and PPIX were all incapable of reacting, suggesting that iron was essential.

Increasing the concentration of the protein or the NADH was demonstrated to increase the rate of formation of this 591 nm species.
However, the concentration dependence on haem proved to be more complicated, as shown in Fig. 1.21. This figure shows that the initial rate of reaction increases with haem concentration when the haem concentration is low, but that the rate then decreases again when the haem concentration becomes too high. This behaviour would therefore suggest that, at high concentrations, haem is inhibiting its own breakdown. Whether this inhibition is due to the haem : protein ratio being too high, or whether it is due to the absolute concentration of haem being too high, is unclear. Regardless, this behaviour suggested that HemS has a dual role: a haem breakdown enzyme at low haem concentrations to release iron; and a haem-storage protein at higher haem concentrations, protecting against toxicity and retaining the excess haem for future use.

Figure 1.21: Plot of the initial rates of the NADH-dependent holo-HemS reaction vs haem concentration. This experiment was conducted at four different protein concentrations (red: 0.1 µM; blue: 1 µM; green: 5 µM; black: 20 µM). At low haem concentrations, the initial rate of reaction increases. A maximum is reached, however, after which the rate decreases with increasing haem concentration. Figure reproduced from Sawyer42 and Choy.21

As the properties of this 591 nm species do not match any known porphyrin structure in the literature, Sawyer attempted to isolate and characterise it. This investigation has proved to be a difficult process. It was suspected that the butanone extraction method used to separate this breakdown product from HemS was leading to its further breakdown, reflected by a further colour change from purple to yellow.42 The extraction appeared to have removed iron completely, giving sharp resonances in a 1H spectrum. Nevertheless, the resulting NMR analyses proved too difficult to interpret.
Paradoxically, butanone extractions carried out by this author and George Biggs appear to have been more successful in retaining the integrity of the haem breakdown product (i.e. the purple colour remained), but the iron still present is paramagnetic and so causes peak-broadening, making the NMR data uninterpretable. This problem shall be discussed further in Chapter 5.

Mass spectrometry has proved more fruitful. High performance liquid chromatography (HPLC) was used to separate the product from the rest of the HemS reaction mixture, so that it could then be characterised by liquid chromatography mass spectrometry (LCMS). These results are shown in Fig. 1.22A, taken from Sawyer.42

Figure 1.22: LCMS analysis of haem breakdown product. (A) Complete mass spectrum. (B) Magnified spectra of highlighted major components from (A), giving a clear picture of the isotope patterns for each. (C) MS2 spectrum of the parent ion, m/z 613.3. (D) MS3 spectrum of the base peak, m/z 569.3. Reproduced from Sawyer.42

This complete spectrum indicates that the parent ion (and so presumably the haem breakdown product) has a m/z ratio of 613.3. The base peak, meanwhile, has a m/z ratio of 569.3. This value corresponds to the loss of a carboxylate group, which was later confirmed by an accurate mass analysis. Another notable peak in the spectrum is at m/z 462.2.b MSn analyses of m/z 613.3 and 569.3 showed that these ions can both break down to form this m/z 462.2 species, as shown in Fig. 1.22C and D. Accurate mass analysis shows clearly that, to give the m/z 569.3 or 462.2 species, iron is not lost in either case.
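The carboxylate assignment above rests on simple neutral-loss arithmetic, which can be sketched as follows. This is an illustrative check only, not the analysis used by Sawyer: it assumes both ions are singly charged, and the alternative 44 Da candidates (C2H4O and C3H8) are hypothetical comparators chosen by this sketch rather than fragments considered in the source. The function names are likewise invented for the example.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'CO2' or 'C8H9NO2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def neutral_loss(parent_mz, fragment_mz):
    """Mass of the neutral fragment lost, assuming both ions carry a single charge."""
    return parent_mz - fragment_mz


# m/z values quoted in the text for the parent ion and the base peak.
loss = neutral_loss(613.3, 569.3)

# CO2 is the assignment discussed in the text; the other two are
# hypothetical alternatives with the same nominal mass of 44 Da.
candidates = ["CO2", "C2H4O", "C3H8"]
for formula in candidates:
    mass = monoisotopic_mass(formula)
    print(f"{formula:6s} {mass:9.4f} Da  deviation {loss - mass:+.4f} Da")

closest = min(candidates, key=lambda f: abs(monoisotopic_mass(f) - loss))
print("closest candidate:", closest)
```

At one decimal place the candidates differ from one another by less than the rounding uncertainty of the quoted m/z values, which is consistent with the assignment needing confirmation by accurate mass analysis.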
The following scheme shows which fragments in principle may be lost to produce the m/z 462.2 species, and proposes what they may be:

613.2643 − 462.2010 = 151.0633 = C8H9NO2, propionate-containing pyrrole
569.2739 − 462.2010 = 107.0729 = C7H9N, vinyl-containing pyrrole

This result is problematic because haem contains both of these potential fragments: it has two propionate-bearing pyrroles and two vinyl-bearing pyrroles. The question was therefore whether there was any way of determining whether it is one or the other, or a mixture of the two, which is lost during the fragmentation process. Sawyer determined conclusively that it is a vinyl-containing pyrrole that is always lost,42 implying that the m/z 462.2 ion is a tripyrrole with the haem molecule’s two original propionate-bearing pyrroles and one vinyl-bearing pyrrole still present. This result suggests that either the β-meso-carbon or the δ-meso-carbon must be broken along with the α-meso-carbon to form this 462.2 ion. It is now known that it is the β-meso-carbon that is broken, as shall be described later in this Introduction. Returning to Sawyer’s original experiment, the discovery that it is a vinyl-containing pyrrole that is always lost was made through comparative studies with deuterohaem and mesohaem. These molecules, chemically very similar to haem, can react in the same way with NADH in HemS. It was shown that the differences in mass between haem and deuterohaem or mesohaem were replicated in the higher order products (i.e. those with m/z 613.3 and 569.3), but that these differences were halved when it came to m/z 462.2. This result is shown in Fig. 1.23 and Table 1.3.

The fact that there are no other ions of significant abundance would suggest that the m/z 613.3 and 569.3 ions are breaking down consistently. This situation would perhaps suggest that the haem breakdown product already has its ring cleaved at one of its meso-positions before ionisation in the spectrometer. Were this not the case, and the cyclic tetrapyrrole were still intact, it would seem likely that there would not be a significant preference for which meso-carbon had one of its bonds broken first, resulting in a greater variety of fragments being lost. Such an inference is speculative, but it would seem sensible since haem breakdown proteins do typically cleave the PPIX scaffold of haem in order to reduce the effectiveness of iron chelation, and therefore to extract the iron.

b The m/z value of 462.3 quoted in Fig. 1.22 is a typo.

Figure 1.23: Haem with its deutero- and meso- alternatives.

LCMS Masses        Haem     Deuterohaem              Mesohaem
                   Mass     Mass     Δ from Haem     Mass     Δ from Haem
Before Reaction    616.2    564.2    −52             620.2    +4
LCMS Fragments     613.3    561.6    −52             617.2    +4
                   569.3    517.2    −52             573.3    +4
                   462.2    436.2    −26             464.2    +2

Table 1.3: Comparison between the masses, in Da, produced by haem from the NADH-induced reaction and some of its derivatives. The halved mass differences on the bottom line indicate that one of the pyrroles with a different substituent between the haem versions has been lost. Data from Sawyer.42

1.5.3 Limitations of the Biophysical Approach

Sawyer estimated that NADH-binding in HemS has a KM of 1823 µM at pH 7.0, which was taken to be very weak.42 The difficulties associated with crystallisation of proteins plus the suspected transitory nature of NADH-binding therefore discouraged an attempt at X-ray crystallography. Furthermore, it was realised that unless the reaction could be quickly arrested, it would be difficult to crystallise HemS with NADH and haem together in the pocket.

In order to probe this possible NADH-binding site and how NADH could interact with haem, bioinformatics and computational modelling were employed. Sawyer, through informed choices of mutations to the two residues which coordinate with the iron of haem (e.g.
H196A, R102W and H196W), had shown that the reaction with NADH could be severely limited.42 These mutations were all made with a view to disrupting haem-binding. The question then arose whether any residues could be identified whose mutation would disrupt NADH-binding, or the reaction between NADH and haem. A combination of bioinformatics and modelling again seemed a promising way to answer it.

1.6 Bioinformatics and Computational Biochemistry

1.6.1 Principles of Bioinformatics

Data science is the extraction of knowledge from data using a variety of scientific methods, algorithms and processes. Its purpose is to aid comprehension of data and to identify trends. Within a generation, it has grown to become one of the major tools underpinning modern society, from recruitment120 and manufacturing121 to healthcare122 and social networking.123

Bioinformatics is a subset of data science which focusses on developing software to better understand biological data. This field arose in order to handle the large quantities of data being generated when protein sequences first began to be determined in the 1950s. It took on further importance in the 1970s when first-generation sequencing technologies for DNA started to emerge. Today, bioinformatics plays a key role in biological research, having driven, for example, the discovery of several targeted anti-cancer drugs124 and the completion of the human genome project.125

Large quantities of genomic (DNA-based), proteomic (protein-based) and structural data are stored online in various databases. This allows for comparison between sequences and structures. Newly-discovered molecules can therefore be sequenced and/or have their structures determined, and then be compared against other molecules already in these databases. A range of algorithms has been developed to expedite these comparisons, allowing tens of thousands of scans to be made in a few seconds.
One area of bioinformatics that is receiving increasing attention is protein-ligand/protein-cofactor docking. This approach is particularly useful if conventional experiments show that the ligand binds too transiently for full characterisation (as in the case of NADH in HemS), or if a pocket appears capable of hosting a generic ligand whose properties are otherwise unknown. Using the databases available, models can be developed that ‘score’ existing, known protein-ligand interactions based on residue-ligand compatibilities and certain structural motifs. A comparison can then be made with the new protein-ligand combination(s). ‘Hits’ then catalogue areas of potential similarity between this new combination and those in the database, with higher scores typically indicating better alignment. Sophisticated methods to determine the relevant criteria for these hits and scoring methods have been developed, which are discussed in more detail later in this Introduction, and in the Methods section.

1.6.2 Principles of Computational Biochemistry

Computational biochemistry is often considered as synonymous with, or a subset of, bioinformatics. The terms are nebulous, but in this thesis the two fields shall be considered distinct. Whereas bioinformatics handles, processes, and then makes predictions based on deposited data (i.e. it is a ‘data-based’ approach), computational biochemistry uses ab initio or semi-empirical models to evaluate the behaviour of certain biological processes (i.e. it is a ‘calculation-based’ approach).

Computational biochemistry has also proven to be a useful new tool in the biochemist’s repertoire.
For example, it has been demonstrated to accurately predict the 3D structures of protein loops, which are usually difficult to determine experimentally due to conformational flexibility.126 This capability will be important when it comes to discussing the Results, since the published X-ray structures of the homologues under investigation all have at least part of an important protein loop missing.

Computational biochemistry encompasses a wide range of methods. Ab initio techniques are the most accurate, but the computational resources required preclude them from the study of all but the smallest of molecules. Coarse-grained methods provide a way of studying very large molecules or biological systems, but do not provide atomic resolution. For the study of proteins, semi-empirical methods tend to be a sensible compromise, as they sacrifice some accuracy for computational expediency and yet maintain atomic resolution.

Semi-empirical methods consist of a force field and a parameter set. (Despite this term, a force field typically refers to an equation describing the potential energy of a system, from which forces can be derived.) A force field adheres to a carefully constructed functional form designed to accurately model covalent bonds, as well as electrostatic and van der Waals interactions. Though the overall form tends to be similar, details can vary between the functions on offer from different companies and research groups. The functional form for the AMBER family of force fields, used in the calculations throughout this thesis, is as follows:

\[
E_{\mathrm{AMBER}} = \sum_{\mathrm{bonds}} K_r (r_b - r_{\mathrm{eq}})^2 + \sum_{\mathrm{angles}} K_\theta (\theta - \theta_{\mathrm{eq}})^2 + \sum_{\mathrm{dihedrals}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right] + \sum_{i} \ldots
\]

\[
\begin{cases}
\text{Accept} & (V_{\mathrm{new}} < V_{\mathrm{old}}) \text{ or } (V_{\mathrm{new}} > V_{\mathrm{old}}) \text{ and } \exp[(V_{\mathrm{old}} - V_{\mathrm{new}})/k_B T] > \mathrm{Ran}[0, 1] \\
\text{Reject} & (V_{\mathrm{new}} > V_{\mathrm{old}}) \text{ and } \exp[(V_{\mathrm{old}} - V_{\mathrm{new}})/k_B T] < \mathrm{Ran}[0, 1]
\end{cases}
\tag{4.8}
\]

In other words, the step is always accepted if the energy of the new minimum is less than the one preceding it.
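The acceptance rule of Eq. (4.8) is compact enough to sketch directly. The snippet below is a minimal illustration and not the gmin implementation: the function name `metropolis_accept` and the `rng` argument are hypothetical, and `kT` stands for the product of the Boltzmann constant and the effective temperature, in the same energy units as the two minima.

```python
import math
import random


def metropolis_accept(v_old, v_new, kT, rng=random.random):
    """Decide whether to accept a step from a minimum of energy v_old to v_new.

    A downhill step is always accepted. An uphill step is accepted only if
    exp[(v_old - v_new) / kT] exceeds a random number drawn from [0, 1).
    """
    if v_new < v_old:
        return True
    return math.exp((v_old - v_new) / kT) > rng()


# Example: an uphill step of 1 energy unit at kT = 1 is accepted with
# probability exp(-1), i.e. roughly 37% of the time.
trials = 10_000
accepted = sum(metropolis_accept(0.0, 1.0, 1.0) for _ in range(trials))
print(f"accepted {accepted} of {trials} uphill steps")
```

Passing the random number generator as an argument is a design choice made here only so that the rule can be exercised deterministically.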
If the new minimum is higher in energy, the step is only accepted if a function of the energy increase is greater than a threshold value, which is generated randomly for every step taken.201–203 The structure can then either be reset to that corresponding to $V_{\mathrm{new}}$ or $V_{\mathrm{old}}$ (depending upon whether the step was accepted or rejected), as is the case in the MCM procedure, or allowed to vary continuously.133 A significant advantage of this scheme is that the effective temperature is the only variable when a fixed average acceptance ratio is achieved via dynamic step size adjustment.204 This scheme can therefore be transferred between different, unrelated systems. Perturbations are also local, and so the transformation expressed by Eq. (4.7) is not applied to the surface as a whole for each step.

This search strategy, however, still depends on an effective minimisation procedure. It is particularly advantageous to use the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS)205–210 algorithm for larger systems. This is a quasi-Newton optimisation algorithm where an approximation of $H^{-1}$, the inverse Hessian, is stored implicitly. This representation allows the required storage of data to scale with respect to $N^2$ as opposed to $N^3$, an important consideration when the number of atoms, $N$, is large.133 A line search is typically used to dictate a suitable step size. However, a different algorithm has been implemented in gmin,158 which has been determined to be more efficient.211 Rather than having to calculate a specific step size along the descent direction found, this algorithm is less stringent, instead ensuring that the step along the descent direction does not produce an excessive energy rise or exceed a maximum step size. The step size is scaled down if either of these conditions is breached.211

4.6 Transition States

To obtain accurate kinetic information for the PESs investigated in this thesis, the transition states connecting the minima need to be identified.
A minimum on the PES is a stationary point with positive curvature in all directions aside from the degrees of freedom corresponding to overall translations and rotations. A transition state is a stationary point whose Hessian has exactly one negative eigenvalue, and hence negative curvature in one direction (the reaction coordinate), but which retains positive or zero curvature in all other directions.212 Therefore, to find a transition state, the energy must be minimised along all modes apart from the reaction coordinate, along which the energy should be maximised. The system is balanced in one degree of freedom, thus requiring a precise search strategy.

The strategy implemented in the program OPTIM213 is to use the doubly-nudged elastic band (DNEB)214;215 algorithm to generate transition state candidates, and then to use hybrid eigenvector-following (HEF)216–218 to refine them.

DNEB transition state searches begin by attempting to connect two minima. A direct path may not necessarily connect these two minima, in which case more (intermediate) minima and transition states may need to be considered (for more details, see Section 4.7). In the simple case where a direct path is possible, the DNEB search operates as follows.214;215 After receiving the two minima geometries, $X_0$ and $X_{n_{\mathrm{spr}}+1}$, as input, a series of interpolated geometries known as images, $\{X_1, X_2, \ldots, X_{n_{\mathrm{spr}}}\}$, are generated. $X_i$ is simply a vector used to represent the coordinates of an endpoint or image, $i$. An attractive spring interaction is placed between the images to ensure that they do not simply collapse to the endpoint minima. This spring is what gives rise to the elastic band appellation: it can be imagined that an elastic band has been stretched across the PES and terminates at the endpoint structures. These springs connect atoms of adjacent images, and so the spring potential, $\tilde{V}$, is distinct from the ‘true’ potential, $V_t$, which binds the atoms within each image together.
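The construction of the initial band can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the OPTIM implementation: the images are generated by plain linear interpolation, the springs are assumed to be harmonic with a single force constant `k_spr`, and both function names are hypothetical.

```python
def interpolate_images(x_start, x_end, n_images):
    """Return the two endpoints plus n_images linearly interpolated images."""
    band = []
    for i in range(n_images + 2):
        fraction = i / (n_images + 1)
        band.append([(1 - fraction) * a + fraction * b
                     for a, b in zip(x_start, x_end)])
    return band


def spring_energy(band, k_spr):
    """Sum harmonic spring terms over every pair of adjacent band members."""
    total = 0.0
    for previous, current in zip(band, band[1:]):
        total += sum((c - p) ** 2 for p, c in zip(previous, current))
    return 0.5 * k_spr * total


# Two endpoint 'minima' in a toy three-dimensional coordinate space.
band = interpolate_images([0.0, 0.0, 0.0], [4.0, 0.0, 0.0], n_images=3)
print(len(band), "band members (2 endpoints + 3 images)")
print("spring energy:", spring_energy(band, k_spr=10.0))
```

With three images there are four springs, one more than the number of images, because the first and last images are also tethered to the fixed endpoints.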
In order to connect all of the images interpolated between the endpoints, $n_{\mathrm{spr}}+1$ springs are required. The overall ‘spring’ potential introduced is therefore:

\[
\tilde{V} = \frac{1}{2} k_{\mathrm{spr}} \sum_{i=1}^{n_{\mathrm{spr}}+1} |X_i - X_{i-1}|^2. \tag{4.9}
\]

Optimisation is carried out to minimise the forces acting on the images. In practice, this is difficult because the ‘spring’ potential interferes with the ‘true’ potential, the magnitude of such interference being system dependent.219 To overcome this problem, both the ‘true’ and ‘spring’ gradients, $g$ and $\tilde{g}$, are split into components parallel and perpendicular to the path: $g_\parallel$, $g_\perp$, $\tilde{g}_\parallel$ and $\tilde{g}_\perp$. ‘Nudging’ the elastic band results in $g_\parallel$ and $\tilde{g}_\perp$ being projected out, flattening the PES and pulling the images down, respectively.134;220–222 Taken together, this procedure results in a more stable band, and the new gradient is simplified to

\[
g_{\mathrm{NEB}} = g_\perp + \tilde{g}_\parallel. \tag{4.10}
\]

Following tests, Trygubenko & Wales214 found that giving the band a ‘second nudge’ (encapsulated by $\tilde{g}^*$ in Eq. (4.11)) reduces the divergence of images from the path. This new ‘doubly-nudged’ gradient, which reincorporates an element of the perpendicular component of the spring gradient, can therefore be expressed as:

\[
g_{\mathrm{DNEB}} = g_\perp + \tilde{g}_\parallel + \tilde{g}^*, \tag{4.11}
\]

where

\[
\tilde{g}^* = \tilde{g}_\perp - (\tilde{g}_\perp \cdot \hat{g}_\perp)\hat{g}_\perp. \tag{4.12}
\]

The L-BFGS algorithm is then used to optimise the images with respect to $g_{\mathrm{DNEB}}$. The evolution of the band during this DNEB process is illustrated in Fig. 4.2. Once the images have been optimised to below a specified root mean square (RMS) force, HEF is then used to refine candidates which appear to be maxima.

Figure 4.2: Representation of the evolution of an interpolation band as the DNEB algorithm progresses. Blue areas correspond to regions of low potential energy (the endpoint minima being connected) and red areas to regions of high potential energy. The line with fine dashes represents the original linear interpolation. The dashed and solid lines show progressively more optimised bands. Local maxima are taken as transition state candidates for refinement by hybrid eigenvector-following (HEF). Figure reproduced from Röder.134

HEF,216–218 in contrast to DNEB, is a single-ended method for determining transition state structures.

EF in general is a geometry optimisation method, which uses first and, optionally, second derivatives of the potential. Specifically, a Taylor expansion around the present geometry, $X$, can be implemented:

\[
V(X + x) = V(X) + G(X)^T x + \frac{1}{2} x^T H(X) x. \tag{4.13}
\]

Here, $G(X)$ and $H(X)$ are the gradient and Hessian at $X$ respectively, with $x$ being a small displacement. If the following condition is applied,

\[
\frac{dV(X + x)}{dx} = 0, \tag{4.14}
\]

then the standard Newton-Raphson formula can be derived:133

\[
x_{\mathrm{NR}} = -H^{-1} G. \tag{4.15}
\]

Diagonalisation of the Hessian,

\[
H \nu_i = \tilde{\lambda}_i \nu_i, \tag{4.16}
\]

can then be carried out, giving a Newton-Raphson step of

\[
x_{\mathrm{NR}} = \sum_{i}^{3N} -\frac{F_i}{\tilde{\lambda}_i} \nu_i, \tag{4.17}
\]

where $\tilde{\lambda}_i$ and $\nu_i$ are eigenvalue and eigenvector $i$, $N$ is the number of atoms, and $F_i$ is the component of the gradient along each eigenvector. The step in Eq. (4.17) decreases the energy when a positive eigenvalue is being followed and increases it when a negative eigenvalue is being followed.

This method has an important drawback, however, in that the size of the Hessian scales quadratically with the number of atoms. At very large numbers of atoms, therefore, the Hessian becomes very substantial, rendering its calculation and diagonalisation (which scales as $N^3$) computationally expensive.

A significant advantage of HEF is that it does not require the Hessian matrix of second derivatives to be calculated at any point during its operation. Instead, a variational approach can be used to find the smallest eigenvalue along with its associated eigenvector.133;223 First, a Rayleigh-Ritz ratio for a displacement $x$ from the present geometry is defined:

\[
\tilde{\lambda}(x) = \frac{x^T H x}{x^2}. \tag{4.18}
\]

Though it may appear from this equation that the Hessian, $H$, is still essential to solving this problem, it is possible to use a numerical second derivative to calculate $\tilde{\lambda}(x)$, where $\xi \ll 1$:211

\[
\tilde{\lambda}(x) \approx \frac{\{\nabla V(X + \xi x) - \nabla V(X - \xi x)\} \cdot x}{2 \xi x^2}. \tag{4.19}
\]

By minimising this ratio iteratively, the smallest non-zero eigenvalue can be determined. This, in turn, allows its associated eigenvector to be calculated. Uphill steps can therefore be taken in this direction whilst the orthogonal subspace is minimised. It has been found that the L-BFGS algorithm is also effective in carrying out these calculations, and so it is used to carry out the HEF steps.224

To summarise, the DNEB & HEF approach is relatively inexpensive computationally, as it is capable of determining the eigenvalue and eigenvector of a transition state without needing to calculate or diagonalise the Hessian. After the eigenvalue-eigenvector pair has been found, an uphill step is taken in the direction of the eigenvector, followed by partial minimisation orthogonally. This procedure is repeated iteratively until a user-defined RMS force convergence criterion is achieved.

4.7 Discrete Path Sampling

The previous section showed how a transition state can be found by interpolating in a region I between two endpoint minima, A and B. Thus far, however, no method has been provided to determine whether this transition state is indeed directly connected to these starting minima. For long, complicated pathways, this situation is unlikely, as it is probable that there will be intervening local minima in I. Therefore, a methodology is required to determine how individual stationary points are connected within the pathway.

Discrete path sampling (DPS) achieves this objective, first by determining the two local minima to which a transition state is directly connected. This connection is known as a minimum-TS-minimum ‘triple’.
DPS can then connect these ‘triples’ to give an overall chain between the two original endpoint minima. This chain is known as a discrete path.

The minima directly connected to transition states are determined using steepest-descent algorithms. The steepest-descent paths, defined in Eq. (4.20), can be followed in both the parallel and antiparallel directions to give two connected local minima (with any such minima belonging to the intervening I set):

\[
\frac{dX}{ds} = -\frac{G(X)}{|G(X)|}. \tag{4.20}
\]

In this steepest-descent equation, $s$ is the integrated path length between two points on the curve, $G(X)$, as in previous definitions, is the gradient vector, and $|G(X)|$ is the magnitude of the gradient.133 When launched from a true transition state, there are only two solutions to Eq. (4.20).225

One of the minima the steepest-descent path reaches will sometimes be that from which the transition state search was first instigated. A decision then has to be made whether to remain at this original minimum or to move to the newly connected one in preparation for the next interpolation and transition state search. It is also possible that the transition state is not connected to either of the original minima at all, in which case these new minima can be saved separately and then checked periodically to determine whether they have been connected to the database in any subsequent search.133

When the two endpoints chosen are far apart in configuration space, the number of possible connections that could be made grows combinatorially. An efficient selection process for searches must therefore be utilised. DPS uses the Dijkstra shortest path algorithm,135 a greedy algorithm, which describes the total set of minima (whether they belong to A, B or I) as a weighted, directed graph, $\zeta(M_s, E_s)$.226 Here, $M_s$ is the set of minima and $E_s$ the set of edges connecting them. In other words, a complete graph is defined from a set of edges, with an edge assigned to each pair of $(i, j)$ minima.
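The way a shortest-path search over such a complete graph can pick out the next pairs of minima to connect is sketched below. This is an illustration only and not the OPTIM or pathsample code: it assumes an edge weight of zero for pairs already joined by a known path, infinity for pairs whose connection attempts are exhausted, and otherwise the squared Euclidean distance, which is an assumed stand-in for the distance-based weighting function. All names are hypothetical.

```python
import heapq
import math


def make_weight(coords, connected, attempts, n_max):
    """Build an edge-weight function for pairs of minima."""
    def weight(i, j):
        pair = frozenset((i, j))
        if pair in connected:
            return 0.0                       # a path is already known
        if attempts.get(pair, 0) >= n_max:
            return math.inf                  # too many failed attempts
        return math.dist(coords[i], coords[j]) ** 2  # assumed form of f(D)
    return weight


def best_path(minima, start, end, weight):
    """Dijkstra search on the complete graph; return the path and its gaps."""
    dist = {m: math.inf for m in minima}
    prev = {}
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                         # stale heap entry
        if u == end:
            break
        for v in minima:
            if v == u:
                continue
            candidate = d + weight(u, v)
            if candidate < dist[v]:
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    if math.isinf(dist[end]):
        return None, []
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    # Pairs with non-zero weight are the gaps to target with new searches.
    gaps = [(a, b) for a, b in zip(path, path[1:]) if weight(a, b) > 0]
    return path, gaps


coords = {"A": (0.0,), "X": (1.0,), "Y": (2.0,), "B": (3.0,)}
connected = {frozenset(("A", "X"))}
w = make_weight(coords, connected, attempts={}, n_max=3)
print(best_path(list(coords), "A", "B", w))
```

In this toy case the search reuses the known A to X connection and proposes the two short gaps X to Y and Y to B, rather than a single long A to B attempt.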
An edge weight, $w(i, j)$, defines the state of connection of these pairs:226;227

\[
w(i, j) =
\begin{cases}
0, & \text{if a known path exists between } i \text{ and } j \\
\infty, & \text{if } n_u(i, j) = n_{\max} \\
f(D(i, j)), & \text{otherwise.}
\end{cases}
\tag{4.21}
\]

The rationale for this algorithm is to identify the shortest connected path and, if that is not possible due to gaps in the pathway, to identify the most likely pairs of minima to bridge these gaps. An edge weight of 0 indicates that $i$ and $j$ are already connected. The shorter a gap is, the more likely it is to be selected. This behaviour becomes apparent when $f(D(i, j))$ is broken down. $D(i, j)$ is the Euclidean distance between the locally permutationally aligned minima $i$ and $j$,228 and $f(D(i, j))$ is a weighting function based on the distance. The edge weight is set to $\infty$ for a pair of minima if the number of connection attempts, $n_u$, reaches a user-defined maximum, $n_{\max}$; this condition is applied to prevent the same pair of minima from being attempted repetitively.

The Dijkstra algorithm is considered complete either once a path between the endpoints can be extracted or if a user-defined limit of searches is exhausted. The first scenario occurs when the total of the edge weights between the two selected endpoints reaches zero, which indicates that there are no gaps in the path. If the pathway between the endpoints still includes at least one edge weight which is non-zero, then more DNEB connection attempts will have to be made between those $i$ and $j$ minima that have non-zero weightings, in a new attempt to find transition states between them. This process cycles until all $w(i, j)$ values required to connect the endpoints become 0 (i.e. the path is completely connected). The entire discrete path sampling process is outlined in Fig. 4.3.

4.8 From Initial Pathways to Complete Representations

Discrete Path Sampling is a useful tool for identifying possible pathways for chemical processes.
However, as discussed in Section 1.6.3, protein-ligand interactions are typically complicated processes, involving many steps and multiple possible pathways. Two problems arise if DPS is used simply as described in the previous section:

1. The landscape is almost certainly undersampled. DPS has indeed found and identified a fastest pathway between the two starting endpoints, but it is unlikely to be as efficient as possible (in other words, the fastest path identified is not necessarily the true fastest path). There may be artificially high barriers and/or superfluous intermediate structures present. Further sampling is therefore required.

2. DPS gives the fastest single pathway. However, as discussed in Section 1.6.3, in order to derive relevant thermodynamic and kinetic information, a consideration of the FES (which constitutes multiple pathways connecting basins rather than individual minima) rather than the PES is required.

Figure 4.3: Discrete path sampling process to find a pathway. Small filled circles represent transition states. Large circles represent minima. Dashed lines between minima and transition states indicate a direct connection. If a minimum is filled in blue with a question mark, this means it has been selected as a candidate for an attempted connection. If filled in green with a tick, this indicates that the minimum is connected to two transition states in the pathway. Progression flows from top to bottom. The first step illustrates the selection of two endpoint minima A and B. A search is made in the intervening space, I, typically through the use of DNEB and HEF. Steepest-descent algorithms are then used to find the minima directly connected to this transition state candidate. These new minima are added to a stationary point database. A Dijkstra analysis is then used to select two points in the expanding database. Again, these two minima are represented by blue circles containing question marks, and an attempt is made to connect them in the same way that a connection between A and B was attempted before. This process continues until a fully connected path is found, or some user-defined number of connection attempts, nmax, is exhausted. This figure is inspired by, but significantly adapted from, Whittleston.229

To tackle the first of these issues, four main strategies are typically employed. These are listed and described below and in Fig. 4.4.

1. Shortening Pathways. Most often, DPS will find a fully connected pathway which is unnecessarily long. Due to the bias for connection attempts between minima that are conformationally similar, such pathways tend to consist of many intermediate, largely identical structures. This path is not necessarily the most efficient way of getting from A to B. Therefore, the shortcut scheme,226;230 employed in the program pathsample,231 selects pairs of minima on this ‘fastest’ path with a user-defined number of intervening transition states between them, and tries to connect them directly. If the connection is successful, many superfluous intermediates are removed from the pathway.

2. Removing Large Barriers. A fully connected pathway found by DPS may also have artificially high barriers. During interpolation between minima, it is possible that the transition state(s) identified are high in energy. Unless there are specific conditions in place to ignore high-energy TSs, these are incorporated into the pathway. The final fully connected pathway therefore contains these high-energy TSs, significantly slowing down the fastest path. The shortcut barrier scheme230 selects the pairs of minima on either side of these high barriers, and tries to reconnect them via lower energy alternatives.

3. Removing Kinetic Traps.
It is also possible that kinetic traps are present, here defined as low-lying minima separated by large barriers from A and/or B.134 The untrap scheme230 selects minima that have high ratios of the energy barrier separating them from A or B versus the energy difference to those states. This scheme is therefore similar to the shortcut barrier scheme but tends to select minima further apart in conformational space. As such, it is not included in Fig. 4.4.

4. Connecting Stationary Points to the AB Set. As a database of stationary points expands, many transition states are located which are not connected to either the A or B endpoints. By the end of the DPS scheme and the identification of a fully connected pathway, many of these transition states and their steepest-descent-derived minima are still not connected in any way to either A or B. In other words, they are not part of the AB connected set. Sometimes these AB-unconnected stationary points are themselves part of large, connected clusters. The database therefore contains a lot of information on relevant transition states and minima, which are nevertheless not included in the determination of the fastest fully connected path. The connectunc scheme232 has therefore been developed in order to connect these stationary points to the AB set. This scheme first works by splitting the minima in the database into two sets, AB and its complement, $\overline{AB}$. A minimum $i$ within $\overline{AB}$ is then selected, either explicitly by the user or by allowing connectunc to identify and choose the lowest energy minimum, and the $n$ closest minima in AB are then identified. Connections are then attempted. This is an efficient method for expanding the AB set, using stationary points already in the database. In Chapter 6, a new feature added to connectunc shall be discussed.
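The selection step of the connectunc scheme described in item 4 can be sketched as follows. This is a hypothetical illustration and not the pathsample implementation: the database layout, the function name `connectunc_candidates`, and the use of a plain Euclidean distance without any structural alignment are all assumptions made for the example.

```python
import math


def connectunc_candidates(minima, ab_set, n_closest, chosen=None):
    """Pick an unconnected minimum and its n closest minima in the AB set.

    minima maps a label to (energy, coordinates); ab_set holds the labels
    of minima already connected to the A or B endpoints.
    """
    unconnected = [label for label in minima if label not in ab_set]
    if not unconnected:
        return None, []
    if chosen is None:
        # Default choice: the lowest-energy minimum outside the AB set.
        chosen = min(unconnected, key=lambda label: minima[label][0])

    def distance(label):
        return math.dist(minima[chosen][1], minima[label][1])

    targets = sorted(ab_set, key=distance)[:n_closest]
    return chosen, targets


database = {
    "A": (0.0, (0.0, 0.0)),
    "I1": (1.0, (1.0, 0.0)),
    "B": (0.5, (5.0, 0.0)),
    "U1": (-2.0, (2.0, 0.0)),   # unconnected, lowest in energy
    "U2": (3.0, (9.0, 0.0)),    # unconnected
}
chosen, targets = connectunc_candidates(database, {"A", "I1", "B"}, n_closest=2)
print(chosen, targets)
```

Each pair formed from the chosen minimum and one of the returned targets would then be passed to a double-ended connection attempt.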
The second issue, namely the need to derive thermodynamic and kinetic properties from multiple pathways rather than from a single pathway, is dealt with by considering free energies. Unfortunately, a full consideration of free energies proved beyond the scope of the current work. However, as a derivation of these values would most likely be the next stage of this project, the principles behind the computation of free energies are given in Appendix C.

4.9 Disconnectivity Graphs

Due to the sheer number of degrees of freedom possible in a large biomolecular system, the number of possible minima and TSs, which scales exponentially with system size,233 is immense. The representation of these data therefore becomes problematic. Textbook-style depictions of PESs, as in Fig. 1.24, are simply not suitable, as these are merely 3D representations of landscapes, whereas the problems being studied in this work have much higher dimensionalities to consider. As it is not possible for human beings to visualise high-dimensional objects, an alternative strategy to represent these stationary points faithfully is required.

One such strategy is to use disconnectivity graphs.133;234;235 Such graphs, also known as ‘trees’, consist of nodes which correspond to superbasins and line endpoints which correspond to minima. The y-axis captures the energy of the system, whereas the x-axis is a free variable which can be altered by the user to give the clearest horizontal representation of the data. Superbasins arise by running analyses at fixed energy intervals, En, from which minima are classified into disjoint sets. Minima are considered as being in the same set (or superbasin) if a discrete path, consisting of intermediate minima and TSs, is possible between them for which no energy exceeds En. Each superbasin is marked on the graph by a node at this corresponding energy, En. Therefore, minima in separate superbasins need to traverse this node (i.e. one or more stationary points in the pathway separating them is above En) in order to interchange. By sampling at various energies, En, the disconnectivity graph takes on a structure which can provide a useful pictorial guide to the basic properties of the system being investigated. This construction is illustrated in Fig. 4.5.

Figure 4.4: Illustration of refinement schemes. The red line in each chart shows a connected pathway between endpoints A (the locus at the far left) and B (the locus at the far right). Loci corresponding to an odd number in the integrated pathway are minima, and loci with even numbers are TSs. A: shortcut. A shorter path between minima 5 and 13 via a new TS was found, shortening the overall pathway. B: shortcut barrier. An alternative TS with a lower barrier was found between minima 3 and 5. C: connectunc. A min-TS-min triple unconnected to the main AB set was connected and incorporated into the pathway, giving an alternative route with lower barriers.

4.10 Implementation of Wales Group Methods on GPUs

As emphasised in Section 1.7.2, previous computational work on the HemS system by Choy21 and Shang159 was severely limited by computational speed. This problem was largely due to these calculations necessarily having to be run on central processing units (CPUs). Since then, however, the basin-hopping, DNEB and HEF routines (including interfaces with AMBER) have been made compatible with graphics processing units (GPUs). GPUs require fewer resources to be allocated to data-caching and flow control than CPUs, freeing up more transistors for data-processing. This design increases efficiency for high intensity arithmetic calculations.
Benchmarking has shown that Wales group methods typically run two orders of magnitude faster on GPUs than on CPUs.211 This speed-up was exploited in the work described in this thesis: all calculations were run on GPUs.

Figure 4.5: Pictorial correspondence between ordinary 2D-representations of a PES and disconnectivity graphs for three different energy landscapes, inspired by Wales.133 E corresponds to energy, and En to energies at which a superbasin analysis was performed. A: ‘Palm tree’ motif, which arises when the landscape corresponds to a steep funnel with low barriers. Protein folding and protein-ligand systems tend to have landscapes like this. B: ‘Weeping willow’ motif, which arises when there is a shallow funnel with large barriers. C: ‘Banyan tree’ motif, which arises from a ‘rough’ landscape and many competing low-energy minima.

Chapter 5
Further Experimental Insight into Haem Breakdown by HemS

5.1 Aims

There were some outstanding issues from the experimental work on HemS previously done in the Barker group. Sawyer had clearly demonstrated that addition of NADH leads to the breakdown of haem in the HemS pocket to produce a novel product.42 The structure of this product, however, remained unknown, despite a large volume of biophysical data having been gathered on it. The mechanism of the reaction was also unknown. Indeed, it was not even certain whether NADH was directly involved in haem breakdown. It was proposed, from the modelling done by Choy,21 that NADH transferred a hydride directly to haem, but this had not been proven experimentally. There were also unanswered questions concerning the low turnover rate of the reaction. Was it, for example, due to inhibition by either of the products, and, if so, which one? Furthermore, though Sawyer had noted that the reaction was capable of proceeding anaerobically, these were preliminary results.

Therefore, the aims of further research into this breakdown reaction were:

1. To establish whether the reaction can indeed proceed anaerobically, and more generally to note any points of differentiation from the aerobic reaction.

2. To investigate possible product inhibition in greater detail.

3. To conduct deuterium labelling experiments to determine whether or not a hydride is transferred from NADH.

4. To determine the structure of the final haem breakdown product.

Outcomes from experiments inspired by these aims are described in the following sections.

5.2 Proof of Anaerobic Reaction

The NADH-dependent breakdown of haem in HemS was discovered under aerobic conditions,42 and these are the conditions under which the reaction has typically been carried out in the laboratory. Sawyer showed that the reaction was possible under anaerobic conditions too, with a HemS : haem : NADH ratio of 5 µM : 20 µM : 2000 µM, and that it proceeded at approximately the same rate as under aerobic conditions.42 These were preliminary results, and so the experiment was repeated by the author using the procedure described in Section 3.7. This repeat confirmed the discovery by Sawyer that the reaction could proceed without oxygen.

This analysis was extended to investigate anaerobic reactivity under different haem : protein ratios. Sawyer had demonstrated in UV-Vis studies that increasing this ratio under aerobic conditions led to a greater likelihood of a further peak at ∼700 nm developing over time, in addition to the standard haem breakdown product peak at 591 nm. Increasing this ratio still further then led to the total shutdown of the reaction. From this result, it had been concluded that haem simply inhibited the reaction once its concentration was too high, and that this was most likely due to its propensity to dimerise at high concentrations.

Anaerobic studies with haem : protein ratios higher than 4:1 shed light on the identity of this ∼700 nm species.
At a ratio of 20:1, it simply did not appear, as shown in Fig. 5.1. This figure shows two sets of difference spectra. Baselines consisted of SEC buffer, as well as the appropriate concentrations of HemS and haem. Haem was given time (ca. 30 minutes) to bind to HemS before the reaction was initiated by the addition of NADH.

Figure 5.1: UV-Vis difference spectra charting the progress of NADH-dependent breakdown of haem in HemS over time. Experiments were run in SEC buffer, and the pH was set to 6.5. Spectra were recorded for 20 minutes in 1 minute increments, indicated by the colour scheme which runs from red (1 minute) to blue/purple (20 minutes). Exact stoichiometries were 1 µM HemS : 20 µM Haem : 2000 µM NADH. Left: Aerobic conditions. A peak at ∼700 nm develops over time. Right: Anaerobic conditions. No such peak develops.

The fact that this extra peak at ∼700 nm occurs in the aerobic case but not the anaerobic one strongly suggests that there is a competing side-reaction that requires oxygen. The greater loss of haem in the aerobic case, indicated by a greater loss of absorbance at the Soret band (408 nm), also supports this conclusion. Furthermore, it is apparent that this side-reaction only becomes competitive when the haem concentration appreciably exceeds that of HemS. Verdohaem and biliverdin under similar conditions are known to display peaks in the 700 nm range.236 It was proposed, therefore, that this side-reaction was a coupled oxidation reaction, and was operating non-regiospecifically and without the need for any enzyme. This type of reaction was discussed in Section 1.4.6, where the Wilks group had pointed out that such reactions are often confused with genuine regiospecific haem oxygenase ones. NADH is a mild reductant, just like ascorbate, and is capable of reacting with oxygen to generate hydrogen peroxide, thus initiating non-enzymatic haem breakdown.
As this is a non-catalytic process, this would also explain why it only becomes competitive when haem (and NADH) vastly exceed the protein concentration; otherwise, the catalytic haem breakdown process to form the 591 nm species dominates.

To test this theory, LCMS was performed on both the aerobic and anaerobic product mixtures, following a reaction in the ratio 1 µM HemS : 40 µM haem : 2000 µM NADH. As displayed in Fig. 5.2, the aerobic sample gives a large, clear m/z peak at 583.3 in addition to the signature haem breakdown product peaks at 569.3 and 462.2. The value of 583.3 corresponds to the mass of biliverdin. In the anaerobic LCMS spectrum, this 583.3 peak was negligible. This result was tangible proof, therefore, that a coupled oxidation reaction was capable of competing as a side-reaction at high enough haem concentrations, but only under aerobic conditions. To minimise this side-reaction, most experiments following this discovery were therefore run anaerobically, particularly if high product purity was sought.

Figure 5.2: LCMS data for the aerobic reaction of holo-HemS with NADH, at a HemS : haem ratio of 1 : 40. The m/z peak at 583.3 corresponds to biliverdin.

5.3 Deuterium Labelling to Determine Hydride Transfer

To better understand the role of NADH in the breakdown of haem, deuterium labelling experiments were conducted. NADH is known both as a hydride donor and as a reducing agent. As discussed in Sections 1.4.4, 1.4.5 and 1.4.6, NAD(P)H can constitute one part of a group of reactants which produces hydrogen peroxide, and thus initiate coupled oxidation, which leads to the breakdown of haem and formation of biliverdin. However, direct transfer of hydride from NADH to haem has never, as far as the author is aware, been demonstrated in HemS or any of its homologues.

One of the most common isotopes used in labelling studies is deuterium, which contains one neutron, whereas the common isotope of hydrogen, protium, has none.
This makes deuterium almost twice as heavy as hydrogen, which leads to some different physical properties, such as a higher viscosity and decreased quantum tunnelling efficiency, but its chemical properties remain largely the same. These properties make deuterium ideal for isotopic labelling experiments, especially when a particular hydrogen atom is of interest.

Choy’s calculations suggested that the nicotinamide head of NADH points towards the haem once they are in close proximity. This nicotinamide head contains two hydride atoms, and their proximity to haem would suggest at least one of them is involved in haem breakdown. Tracking one or both of these hydrides would therefore reveal whether such an involvement is indirect or direct; the latter scenario would be indicated by the labelled atom becoming part of the protoporphyrin structure.

Two stereoisomers of deuterated NADH, (R)-NADD and (S)-NADD, were therefore synthesised using the methods described in Section 3.12. NMR studies confirmed that these were the correct (R)- and (S)-forms respectively, and that they were of high purity, as shown in Fig. 5.3.

Figure 5.3: Overlay of NMR spectra taken for NADH, (R)-NADD and (S)-NADD. Black boxes highlight the region corresponding to the hydride signals. The X label corresponds to the rest of the NADH molecule.

Three reactions were performed in parallel. Conditions were all the same, except that NADH was used in the first reaction, (R)-NADD in the second, and (S)-NADD in the third. Following the reactions, LCMS was performed on the product samples to determine whether deuterium transferred to the product fragments in any of the reactions. These spectra are shown in Fig. 5.4.
The signals around the m/z 613.3 region were difficult to deconvolute, but the 569.3 and 462.2 regions showed clear variations in the isotope patterns in the two deuterated cases with respect to the NADH sample. In both the deuterated samples, the signals for 570.3 and 463.2 were greater than those at 569.3 and 462.2, which is a reversal of the NADH-based sample. This difference suggests that deuterium can be transferred to the porphyrin ring and, because each of the experiments was conducted in non-deuterated solvents, that the deuterium must have originated from NADD. Therefore, haem breakdown does occur via deuteride (and therefore, by implication, hydride) transfer.

The fact that both the (R)-NADD and (S)-NADD experiments showed significant deuterium transfer suggests that the reaction is not stereospecific, and that hydride can therefore be released from either side of the nicotinamide head. This suggestion is reinforced by the fact that in each case, a mixture of deuterated and non-deuterated products was found at the fragment regions centred round 569.3 and 462.2, rather than the deuterated or non-deuterated products being found exclusively.

Figure 5.4: LCMS data for the product mixtures following reactions with NADH, (R)-NADD and (S)-NADD, respectively. Data for the molecular ion (m/z of 613.3 in the NADH-based sample) are inconclusive, but the peaks based at 569.3 and 462.2 have different isotope patterns in the (R)-NADD and (S)-NADD spectra compared to NADH.

These reactions had all been monitored using UV-Vis spectroscopy at a HemS : Haem : NADH/NADD ratio of 10 µM : 10 µM : 1000 µM, which provided some low-resolution preliminary kinetic data, as shown in Fig. 5.5. The first thing to note about these spectra is their different evolution at the 408 nm Soret peak.
Rather than pre-equilibrating HemS and haem for these reactions (the typical procedure used throughout this thesis), HemS and NADH/NADD were pre-equilibrated first, a baseline was taken, and the reactions were then initiated by the addition of haem. This procedure was to prevent NADH/NADD distorting the spectra around the 400 nm region. NADH/NADD has a broad, strong absorbance at 340 nm, which extends above 400 nm at high concentrations. Typically, experiments were run from the same NADH stock, and so these absorbances could be easily accounted for. However, for these reactions, three different NADH/NADD stocks were required and so, in case of slight concentration variations when added, it was decided that a baseline should be taken to correct for this possibility before initiating the reaction. This procedure has the benefit that the Soret peak seen at 408 nm for these experiments is purely due to HemS-haem binding, with no interference from NADH/NADD absorbance.

These Soret peaks grew significantly, up to absorbances of ∼0.8, in the two deuterated experiments, whereas in the non-deuterated case the peak did not even reach half that value. The conclusion reached was that in all three cases, haem was entering the pocket and coordinating to HemS quickly. However, in the case of NADH, conversion to the HBP was fast (thus preventing a build-up of HemS coordinated to haem, as indicated by the peak at 408 nm) whereas for (R)-NADD and (S)-NADD it was slow (where such a build-up did occur). This effect is reflected by the growth of absorbance at 591 nm, indicating the growth in concentration of the HBP. In the NADH case, this increase occurs quickly so that it has effectively reached a maximum at 0.25 after 40 minutes, as shown by the inset in Fig. 5.5. For (R)-NADD, it appears that the absorbance is beginning to level off after 160 minutes, again at an absorbance of around 0.25.
(S)-NADD appears to be slower still: it is still rising, and has only reached an absorbance of 0.2, after 160 minutes. These relative rates between the two forms of NADD are reflected by the data at the Soret peak. For (R)-NADD, this peak has decreased almost to zero after 160 minutes, suggesting that the haem added is almost used up. However, after the same amount of time in the (S)-NADD case, the Soret peak is still far from zero, indicating that a significant amount of haem is still to be converted. It is therefore clear that both deuterated forms of NADH slow down the reaction significantly, but one more so than the other; it is not entirely apparent why.

Figure 5.5: UV-Visible spectra comparing the rates of reaction between NADH, (R)-NADD and (S)-NADD. Reactants were mixed in the ratio 10 µM WT HemS : 10 µM Haem : 1000 µM NADH/NADD. Spectra were collected every minute for 10 minutes, then every 5 for 150 minutes, indicated by the colour change from red (1 minute) to blue/purple (160 minutes). Insets chart the changing absorbance at 591 nm over time.

The LCMS data in Fig. 5.4 unambiguously show that each form of NADD leads to a mixture of deuterated and non-deuterated product fragments. This mixture would suggest that both the hydride and the deuteride from each stereoisomer are capable of being transferred to haem. Presumably, deuteride transfer is slower than hydride transfer. The kinetic isotope effect is a well-known phenomenon in chemistry, where a bond to a heavier isotope, courtesy of its lower vibrational frequency and hence lower zero-point energy, requires more energy to reach a given transition state. This difference would explain why both forms of NADD are slower than NADH, but does not explain the discrepancy between the (R)- and (S)-NADD forms.
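As a rough illustration of the size of effect the kinetic isotope argument implies, the following Python sketch estimates the semiclassical upper limit for a primary C–H/C–D isotope effect from zero-point energies alone. It is not part of the original analysis. The C–H stretching wavenumber (2900 cm−1), the diatomic reduced-mass scaling and the assumption that the stretch is lost completely at the transition state are all illustrative assumptions, and the function name is hypothetical.

```python
import math

# Physical constants (CODATA exact values)
PLANCK = 6.62607015e-34        # J s
LIGHT_SPEED_CM = 2.99792458e10 # cm / s
BOLTZMANN = 1.380649e-23       # J / K


def semiclassical_kie(nu_h_cm, temperature_k=298.15,
                      mass_c=12.0, mass_h=1.008, mass_d=2.014):
    """Estimate k_H / k_D from the zero-point energy of one stretch.

    Assumptions (illustrative only): the C-H bond behaves as a diatomic
    harmonic oscillator, the force constant is unchanged on deuteration,
    and the stretch vanishes entirely at the transition state.
    Returns (C-D wavenumber in cm^-1, k_H / k_D).
    """
    mu_h = mass_c * mass_h / (mass_c + mass_h)   # reduced mass of C-H
    mu_d = mass_c * mass_d / (mass_c + mass_d)   # reduced mass of C-D
    nu_d_cm = nu_h_cm * math.sqrt(mu_h / mu_d)   # same k, heavier mass
    # Difference in zero-point energy, (1/2) h c (nu_H - nu_D)
    delta_zpe = 0.5 * PLANCK * LIGHT_SPEED_CM * (nu_h_cm - nu_d_cm)
    kie = math.exp(delta_zpe / (BOLTZMANN * temperature_k))
    return nu_d_cm, kie


if __name__ == "__main__":
    nu_d, kie = semiclassical_kie(2900.0)  # 2900 cm^-1 is an assumed value
    print(f"C-D stretch: {nu_d:.0f} cm^-1, k_H/k_D: {kie:.1f}")
```

Under these assumptions the estimate comes out at roughly 6 to 7 at 298 K. Real isotope effects are usually smaller, because some zero-point energy is retained at the transition state, or larger if tunnelling contributes, so this number is only an order-of-magnitude guide to why both NADD forms react more slowly than NADH.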
In the LCMS spectrum for NADH, the isotope patterns surrounding the 569.3 and 462.2 m/z peaks show smaller but significant amounts of species with one additional mass unit. This result is to be expected considering that isotopes, most commonly 13C, occur naturally. Meanwhile, Fig. 5.4 shows that, when either (R)-NADD or (S)-NADD is used, these heavier 570.3 and 463.2 m/z fragments are more abundant than their 569.3 and 462.2 non-deuterated alternatives. Therefore, accounting for naturally occurring isotopes, it appears that the proportion of deuterated and non-deuterated fragments when either (R)-NADD or (S)-NADD is used is approximately 1:1. This ratio suggests there is no, or very little, preference for which hydride/deuteride transfers to haem.

This result, at first glance, appears to contradict the slower rate of reaction (S)-NADD displays compared to (R)-NADD. A possible explanation is that there are, broadly speaking, two different conformations NADH/NADD can adopt when transferring hydride/deuteride to haem, and that these different conformations are each exclusive in terms of which hydride (either (R)- or (S)-) it can transfer. Furthermore, it is surmised that both conformations are almost equally likely but one of them results in a slower rate of transfer of the hydride to haem (perhaps because a greater distance has to be traversed between the molecules). Since hydrides transfer faster than deuterides, hydride transfer will govern the overall rate. In (S)-NADD, the deuteride is at the (S)-position and the hydride is therefore at the (R)-position. As the rate of reaction is so much slower when the hydride is in this position (as opposed to (R)-NADD, when the hydride is at the (S)-position), it was concluded that transfer is more difficult from the (R)-position. In other words, transfer of a hydride from the (S)-position is faster.
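As an aside on the approximately 1:1 proportion estimated earlier in this section, the correction for naturally occurring isotopes can be sketched in Python as follows. This is a simplified illustration and not the procedure used in the thesis: it considers only the M and M+1 peaks, takes the natural M+1/M ratio from the unlabelled NADH sample, and the intensity ratios in the example are hypothetical values chosen for illustration.

```python
def deuterated_fraction(natural_ratio, observed_ratio):
    """Estimate the fraction of singly deuterated product.

    natural_ratio  : I(M+1) / I(M) in the unlabelled (NADH) sample,
                     i.e. the contribution of natural heavy isotopes.
    observed_ratio : I(M+1) / I(M) in the NADD sample.

    Model (an assumption): a fraction f of the product carries one
    deuterium. Unlabelled product contributes 1 at M and r at M+1;
    labelled product contributes 1 at M+1. Hence
        R = ((1 - f) * r + f) / (1 - f)
    which rearranges to f = (R - r) / (1 + R - r).
    """
    excess = observed_ratio - natural_ratio
    if excess < 0:
        raise ValueError("observed ratio is below the natural ratio")
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Hypothetical intensity ratios, not measured values
    r_natural = 0.35
    r_observed = 1.35
    print(deuterated_fraction(r_natural, r_observed))
```

In this model a deuterated : non-deuterated proportion of 1:1 corresponds to the M+1/M ratio in the NADD sample exceeding the natural ratio by exactly one.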
The deuterides do not enter into this consideration because of their relatively slow rate of transfer with respect to hydrides.

Because these different rates result in no noticeable difference between the proportion of the deuterated and non-deuterated forms in the LCMS spectra for either the (R)-NADD or (S)-NADD cases, it is further proposed that NADH/NADD must ‘commit’ to a conformation before transferring one of its hydrides. In other words, even though a hydride at the (R)-position transfers more slowly to haem than it does from the (S)-position, this process is still faster than the conformational changes that would be required for the nicotinamide head to switch round and present its (S)-hydride instead. Possible conformations of the nicotinamide head of NADH shall be discussed further in Chapter 7. Efforts were made to characterise these deuterated haem breakdown products but, as with the non-deuterated products, this task proved to be difficult.

5.4 Identification of an Intermediate

Sawyer had shown that the formation of the 591 nm species from UV-Vis spectroscopy was directly dependent on haem breakdown. However, it was unclear whether this process was a straightforward conversion without any intermediates. It was noticed by the author after UV-Vis experiments under a variety of conditions that a further ‘peak’ seemed to arise, but then quickly disappear, at the edge of the spectra. It was realised that this peak must actually be in the near-IR (NIR) region, as only its shoulder could be observed due to the UV-Vis spectrophotometer cutoff at 800 nm. Often, this shoulder was only fleeting, particularly when the rate of formation of the 591 nm species was fast. This observation suggested that the shoulder represented an intermediate species on the pathway between haem and the HBP.
To investigate this problem further, a stopped-flow spectrometer was used, which has the double benefit of being able to access the NIR region and of giving high resolution over short timescales.

Using an absorbance detector coupled with appropriate glass filters, a series of experiments charting the reaction of NADH with pre-incubated holo-HemS over a wide range of wavelengths (515–875 nm) was performed. The precise ratio used for these experiments was 5 µM HemS : 20 µM Haem : 2000 µM NADH, and the wavelengths were measured in increments of 40 nm. Results are shown in Fig. 5.6.

Figure 5.6: Stopped-flow spectra showing 5 µM HemS : 20 µM Haem : 2000 µM NADH reaction progress at selected wavelengths. 200 time points were recorded over 1000 s. Spectra from 515–635 nm show a growth in absorbance over time, which begins to level off after ∼1000 s. This absorbance growth is greatest at 595 nm, consistent with the HBP peak seen in other UV-Vis studies. The behaviour is different from 675–875 nm. At these wavelengths, a short burst in absorbance (0–20 s), then a longer decrease which eventually levels off (20–1000 s), is observed. This result suggests intermediate formation, which then converts to the HBP. The absorbance growth is greatest at 795 nm for this intermediate within the wavelengths sampled.

These experiments showed that the absorbances at the selected wavelengths in the 515–635 nm range grew steadily and then began to level off over the 1000 s time period measured. The absorbance was greatest at 595 nm, which is consistent with the peak for the HBP identified in other UV-Vis studies. The selected wavelengths between 675–875 nm, however, revealed that this change was not simply a conversion from haem to the HBP. These spectra showed a short burst in absorbance over the first 20 s, followed by a steadier decrease. This decreasing absorbance then began to level off around the 1000 s mark. The wavelength where this short-timescale ‘burst’ was shown to give the greatest absorbance was 795 nm. As other UV-Vis studies using the Cary spectrophotometers clearly showed that this peak must have a higher wavelength than 800 nm, it must therefore be located somewhere between 800 nm and the next increment tested by stopped-flow, 835 nm.

These results suggest that an intermediate, with a signature peak at approximately 800 nm, is formed first from haem breakdown, and then converts to the HBP (which has its signature peak at 591 nm). Having an intermediate with a peak at such a long wavelength could suggest that initial haem breakdown involves a disruption of aromaticity and/or a change in the iron oxidation state. Given the porphyrin is eventually cleaved open to produce the HBP, it may be the case that some sort of sigmatropic rearrangement then occurs.

Further stopped-flow experiments were then conducted to determine the exact wavelength of this intermediate peak using a shorter range of wavelengths (780–820 nm) with shorter increments (2 nm). The location for this peak turned out to be at 806 nm, as shown in Fig. 5.7.

Figure 5.7: Stopped-flow spectra showing 8 µM HemS : 8 µM Haem : 2000 µM NADH reaction progress at selected wavelengths and high resolution. 1000 time points were recorded over 60 s. The greatest growth in absorbance is at 806 nm, indicating this is the peak for the intermediate species.

Due to the fleeting nature of this intermediate, it was recognised that it would be difficult to extract from the protein pocket, isolate and characterise. Focus, instead, turned back to the final HBP as well as the precise role of NADH in the breakdown of haem. This intermediate shall be discussed in more detail in Chapter 9.
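The burst-then-decay behaviour described in this section is what the simplest kinetic scheme containing an intermediate, two consecutive irreversible first-order steps (haem → intermediate → HBP), would produce. The Python sketch below illustrates that scheme only; it is not a fit to the stopped-flow data. The rate constants are hypothetical, chosen so that the intermediate peaks near 20 s, and treating each step as first-order is an assumption (the real reaction is enzyme-mediated and also depends on NADH).

```python
import math


def consecutive_first_order(t, a0, k1, k2):
    """Concentrations (A, I, P) at time t for A -> I -> P.

    Both steps are irreversible and first-order with rate constants
    k1 and k2 (k1 != k2). a0 is the initial concentration of A.
    """
    a = a0 * math.exp(-k1 * t)
    i = a0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    p = a0 - a - i  # mass balance
    return a, i, p


def time_of_max_intermediate(k1, k2):
    """Time at which the intermediate concentration is greatest."""
    return math.log(k2 / k1) / (k2 - k1)


if __name__ == "__main__":
    # Hypothetical rate constants in s^-1 (illustrative, not fitted)
    K_FORM, K_DECAY = 0.2, 0.005
    HAEM_0 = 20.0  # µM, matching the haem concentration used above

    print(f"t_max = {time_of_max_intermediate(K_FORM, K_DECAY):.1f} s")
    for t in (0, 5, 20, 100, 500, 1000):
        a, i, p = consecutive_first_order(t, HAEM_0, K_FORM, K_DECAY)
        print(f"{t:5d} s  haem {a:6.2f}  intermediate {i:6.2f}  HBP {p:6.2f}")
```

With a fast first step and a slow second step, the intermediate trace rises sharply and then decays slowly while the product trace grows and levels off, mirroring the qualitative shapes seen at 795 nm and 595 nm respectively.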
5.5 Attempting to Determine the Haem Breakdown Product Structure

The exact structure of the HBP was still unknown, despite advances in understanding the reaction mechanism. Attempts were therefore made to determine this structure. This task was difficult, though, as the yield of product was consistently low. One reason for this low yield is product inhibition; it is suspected that, in its biological context, HemS has an accompanying protein which it can transfer the HBP to. Another reason is that the reaction effectively shuts down at high haem concentrations.

5.5.1 NMR

Sawyer42 and Choy21 had both reported difficulties in preparing HBP samples for NMR spectroscopy. They each tried different approaches: Sawyer tried a butanone extraction, and Choy used a C18 solid phase extraction column. Sawyer had shown that it was possible to extract the HBP from the protein on a large scale using 2-butanone. However, this procedure resulted in an immediate colour change of the overall solution from purple to light yellow, suggesting that iron had been removed from the product. This iron seemed to be important to the structural integrity of the multipyrrole, as a subsequent NMR analysis by Sawyer suggested that multiple species were present due to the sheer number of different resonances observed.

It is interesting that iron was removed, as evidence from other experiments – albeit in the gas phase – would suggest that it binds quite strongly to the product multipyrrole. All three of the main LCMS fragments (613.3, 569.3 and 462.2 m/z) are known from accurate mass analysis to still have iron bound. Furthermore, samples of the product mixture have been seen by the author to retain their purple colour (and so presumably the iron ion is still bound to the protein) when kept at 4 °C over a matter of months.

An attempt was therefore made to repeat Sawyer’s experiment, but with an effort to keep iron inside the HBP.
Butanol was used instead of butanone since it was reckoned that this solvent would give a better separation from the aqueous layer. The reaction was performed under anaerobic conditions, as described in Section 3.8, and then kept under vacuum for the extraction process and subsequent removal of butanol by rotary evaporation. This procedure gave a deep purple residue, suggesting that iron was still coordinated to the HBP.

This sample was redissolved in d6-DMSO and submitted for 1H NMR analysis. It was hoped that, since the structure of the HBP seemed to have been kept intact, the spectrum would be cleaner than that achieved by Sawyer. This point was rendered moot, however, because of the paramagnetism of the iron still left in the sample, which resulted in extreme peak broadening in the NMR spectrum, to such an extent that no peaks could be resolved.

Though this problem meant that the structure of the HBP still could not be resolved by NMR, it at least revealed some details concerning the oxidation state of iron. Fe3+ is necessarily paramagnetic due to its odd number of electrons. Fe2+, meanwhile, can be diamagnetic or paramagnetic depending on the splitting energies of the ligands it is coordinated to. In the samples submitted for NMR, therefore, it is likely that iron is either in its Fe3+ state or in a high-spin Fe2+ state.

5.5.2 Crystallisation

NMR analysis therefore presented a paradox. If extraction successfully retained the iron ion, paramagnetism caused peak broadening to such an extent that the spectra became uninterpretable. On the other hand, if iron was released during extraction, it seemed that the main HBP disintegrated, giving a mixture of compounds that could not be deconvoluted. Alternative methods of characterisation were therefore sought. One such method was X-ray crystallography.
Despite the fact that the HBP had been observed to be capable of retaining its purple colour (implying stability) at low temperatures over a long period of time, it was not clear how crystallisation of the isolated HBP should be approached. Even though crystallisation of haem (in its haemin form at least) has been known for a long time, with Teichmann crystals discovered in 1853,237 it was thought unlikely that the HBP would crystallise under the same conditions, given that it is most likely cleaved at one of the meso-carbons, resulting in a more flexible structure that is perhaps more of a linear than a cyclic tetrapyrrole. It was suspected that, even if crystallisation was successful, it would only occur after further breakdown of the product.

Therefore, crystallisation with the HBP still bound inside the protein was attempted instead. As detailed in Section 1.4.1, suitable conditions have been developed for HemS to crystallise in both its apo- and holo-forms. The conditions from Schneider & Paoli163 were largely followed with minor adaptations, which are detailed in Section 3.11.1. Samples of holo-HemS (i.e. HemS with unreacted haem bound) were crystallised as a standard in parallel with samples of HemS containing the HBP.

Before getting to this stage, a method of concentrating the HBP-containing HemS samples and exchanging into the appropriate crystallisation buffer was required. It was found that concentration could be achieved using centrifugal filters, with the HBP remaining in the protein pocket.

This approach solved an important paradox which could have prevented crystallisation altogether. As noted in Section 1.5.2 (particularly in Fig. 1.21), the NADH-dependent breakdown of haem does not occur at high haem concentrations (above ∼20 µM it begins to shut down).
Meanwhile, crystallisation of HemS requires concentrated samples (Schneider & Paoli quote 30 mg mL−1,163 which works out as approximately 750 µM). If as many HemS molecules are to contain the HBP as possible, then the starting concentration of haem would need to be approximately 750 µM as well if it were not for this post-reaction concentration step. In short, the ability to concentrate the HemS and HBP samples via a method unlikely to significantly disrupt the structure of the protein was required in order for crystallisation to be possible, and those conditions were fulfilled by the ultrafiltration step.

Crystal growth proved to be inconsistent between samples. Those stored at 4 °C tended to grow better than those stored at 25 °C. For those samples that did show growth, the crystals tended to be small. Most were long, thin and blue, and none appeared to exceed 0.16 × 0.04 mm.

With the help of Dr Paul Brear, Facility Manager of the Crystallographic X-ray Facility at the Department of Biochemistry, University of Cambridge, the most suitable crystals were treated with cryoprotectant and sent to Diamond Light Source, as described in Section 3.11.2. The resolution of the protein itself in these samples typically proved to be of good quality (at around 1.67 Å). In many cases, there was clearly electron density in the pocket corresponding to the region where haem binds. This species was therefore taken to be the HBP. However, the resolution of this electron density was too low to determine a structure exactly. One side appeared more ‘open’ than the others, suggesting that the porphyrin ring had been cleaved, as expected. There was no obvious electron density corresponding to iron, suggesting that it had been removed, although it was unknown whether this loss occurred during crystallisation or cryoprotection.
Without iron to bind to, this result would suggest the multipyrrole had a degree of conformational flexibility, thus giving rise to a degree of disorder and therefore potentially explaining its low resolution within the protein pocket. Alternatively, these densities in the protein pocket could be showing multiple stages of the degradation of the HBP at once; inconsistencies between unit cells would explain the low resolution. A representation of the electron density of the HBP from one of the crystals is given in Fig. 5.8.

Figure 5.8: Electron density from a crystal of WT HemS after haem breakdown. Unassigned electron density is shown as a green mesh. The haem molecule, in cyan, is overlaid from a crystal of holo-HemS to provide a reference point. The shape of the unassigned electron density suggests the tetrapyrrole is still intact but has been cleaved. The absence of a region of large electron density, indicative of iron, suggests that iron has been extracted.

The unstructured loop of HemS discussed in detail in Sections 1.4.1 and 1.7.1 was also of too low a resolution to accurately resolve the structure in any of the wild type (WT) crystal structures considered. However, crystallisation was also performed on the double mutant, F104AF199A, where the resolution was good enough to construct the loop. This experiment shall be discussed further in Section 9.6.

Figure 5.9: UV-Visible difference spectra comparing a standard 5 µM WT HemS : 20 µM Haem : 2000 µM NADH reaction with one which also has 2000 µM NAD+ present. Spectra were collected every minute for 20 minutes, indicated by the colour change from red (1 minute) to blue/purple (20 minutes). Insets chart the changing absorbance at 591 nm over time. Results show that the presence of NAD+ has little effect on the reaction.
5.6 Product Inhibition and NAD+

It was unclear why the turnover for the NADH-dependent haem breakdown reaction is so low. Experiments by the author exhibited a diminishing rate of conversion upon further haem addition, even when it was assumed that all of the original stock of haem had been converted. This result suggested that one or both of the reaction products, the HBP and/or NAD+, were still occupying the pocket and involved in product inhibition.

Therefore, two reactions were run in parallel and tracked by UV-Vis spectroscopy. In both cases, 5 µM WT HemS and 20 µM haem were pre-incubated for 30 minutes, but in one of them 2000 µM NAD+ was also present. Baselines were then taken. NADH was then added to 2000 µM in each, and the reactions monitored for 20 minutes. The results are shown in Fig. 5.9.

It had been hypothesised that pre-soaking with NAD+ would result in NADH being blocked from accessing the protein pocket. However, the results in Fig. 5.9 show little difference between the reactions run with and without NAD+. The peak at 591 nm grows to an absorbance of ∼0.25 in each case over a similar time period. After 20 minutes, the reaction without NAD+ still appears to be rising slightly whereas the reaction with NAD+ appears to have reached a maximum and levelled off. However, this difference is only slight, so no meaningful conclusion can be drawn from it. A slight discrepancy is in the absolute values of the Soret band at 408 nm between the two experiments. In the case without NAD+, the absorbance starts at approximately −0.25, and drops to roughly −0.60 over 20 minutes. In the case with NAD+, meanwhile, the absorbance starts at approximately −0.55 and decreases to roughly −0.80 over 20 minutes. The greater decrease in absorbance in the non-NAD+ case suggests that haem is being used up faster in a given amount of time.
This difference is perhaps because NADH does not have to displace any NAD+ molecules from the protein pocket, so can access haem to break it down more readily. The reason why the absolute absorbance values at 408 nm are higher in the case without NAD+ is not clear, but is perhaps due to an inadequate amount of time being allowed for HemS-haem equilibration before the baseline was taken.

These experiments show that NAD+ does not inhibit the haem breakdown reaction to any significant degree. The NAD+/NADH ratio is therefore presumably not of great importance to the reaction. If product inhibition is a factor limiting the turnover, it is perhaps due to the HBP or some sort of [HBP-NAD+] complex instead.

5.7 Discussion and Summary

Greater insight had been attained from these further investigations into the NADH-dependent haem breakdown reaction in HemS.

Firstly, the possibility of the reaction taking place anaerobically had been confirmed. Furthermore, performing this reaction anaerobically also demonstrated minimal side-product formation from a non-regiospecific, non-enzymatic coupled oxidation process, unlike under aerobic conditions, where this process competes. This side-reaction had not been noticed in previous studies but it can produce a significant quantity of biliverdin, particularly when the haem concentration is high. These facts taken together – that NADH-dependent haem breakdown can occur aerobically or anaerobically, but that competition only occurs aerobically – would suggest that HemS is a haem breakdown enzyme specifically designed to work under anaerobic conditions.

As discussed in Section 1.3.2.1, the bacterium that produces HemS, Yersinia enterocolitica, generally attacks the gastrointestinal tract of its host organism. It then typically travels to the terminal ileum to replicate.238 This is a region of low oxygen content239 and so Y. enterocolitica, an aerobic but facultative (i.e.
discretionary) anaerobic bacterium, has to adapt to its new conditions. Whatever the conditions, the bacterium still needs to obtain iron. The hem operon has been demonstrated to be able to obtain such iron in the form of haem, with HemS purportedly the end user. Either the hem operon itself adapts to the oxygen-limiting conditions, or it may be a backup iron-uptake system, which only switches on when oxygen levels are low. Whatever the case, it provides an anaerobic method of haem uptake and, now that HemS has been demonstrated to be capable of breaking down haem without oxygen, utilisation.

The reaction mechanism was also investigated in more detail. It is now clear, from stopped-flow experiments, that an intermediate with a peak at 806 nm is produced before the final HBP, which has its signature at 591 nm. This intermediate is transient, typically lasting for only 20 s under the conditions investigated, and its loss corresponds to the production of the final HBP. It was unclear what structure this intermediate could have but the long wavelength associated with it implied that the conjugation of the porphyrin had been broken and/or the iron oxidation state had changed.

Further insight into the reaction mechanism was gained from deuterium labelling experiments, which proved that NADH was acting as a direct hydride donor to haem. These experiments further showed that the reaction was not stereospecific in terms of which face of the nicotinamide head the hydride was delivered from. Indeed, it appeared that there was an approximately 50 : 50 chance of delivery from either face. Despite this ratio, the rate of transfer of the hydride was clearly faster from the (S)-position than it was from the (R)-position.
At first sight, this result seemed to be a contradiction: if transfer from one of the faces was markedly faster than transfer from the other, then how did the deuterated : non-deuterated product ratio end up near to 1 : 1 no matter whether (R)-NADD or (S)-NADD was used? The current hypothesis is that NADH unfolds in the pocket in such a manner that either its (S)- or (R)-face is presented towards haem, and that the energy that would be required to change this conformation so that the alternative hydride is presented exceeds the energy required for the hydride to transfer, even if it is the less favoured face that is directed towards haem. Thus, the more difficult hydride transfer is still achieved, albeit at a slower rate.

Further attempts were also made to determine the structure of the HBP. A method of HBP extraction from the protein using butanol was found to successfully retain the integrity of the HBP. However, 1H NMR spectroscopy on these samples was unsuccessful due to extreme peak broadening, which made the spectra difficult to interpret. This problem was most likely due to paramagnetic effects caused by the retention of the iron ion inside the HBP. One conclusion from this NMR analysis was possible though: the very fact that paramagnetism was occurring suggested that this iron ion was in either an Fe3+ or a high-spin Fe2+ state.

Crystallisation and X-ray crystallography of the HBP inside HemS were also attempted. To achieve a low enough haem concentration for the reaction itself and then arrive at high enough concentrations for crystallisation, samples were concentrated by ultrafiltration. Crystal growth between samples was inconsistent. Nevertheless, the resolution of the protein molecules inside the crystals selected for analysis at Diamond proved to be good. The resolution of the HBP, on the other hand, was never high enough for an accurate structural determination.
This could have been due to a number of factors. The HBP may have been degraded by the centrifugation step required to concentrate the samples, although this seems unlikely given the samples each retained their purple colour. It may also have been degraded during the crystallisation process itself as the crystallants used, such as PEG 400, may have reacted with it to cause further breakdown. Furthermore, the cryoprotectants added by Dr Paul Brear may have caused breakdown. Despite the poor resolution, it was clear from all of the crystals that the multipyrrole was not a full ring, with one of the sides longer than the others, and with density missing from its midpoint. This structure strongly suggested that it had been cleaved at one of the meso-carbon positions.

Due to the low turnover of this reaction, the possibility of product inhibition (specifically from NAD+) was investigated. The results showed that NAD+ caused no such inhibition. Either product inhibition is caused by the HBP or by a [HBP-NAD+] complex, or the low turnover rate of this reaction is not due to the products at all.

Further experimental results are discussed in Chapter 9.

Chapter 6 Computational Method Development

A glossary and some diagrams are provided in Appendix D to clarify the terms and concepts discussed in this method development section. It is therefore recommended to refer to this resource whilst reading this chapter.

6.1 Aims

Due to the weak and transient nature of NADH-binding inside the HemS pocket, the types of laboratory-based experiments that could be used to further elucidate NADH-residue interactions were limited. The overall aim of computational research into the [WT HemS + Haem + NADH] system (footnote a) was therefore to find pathways for NADH approaching haem within the pocket (and thereby build up a thermodynamic and kinetic profile).
As outlined in the Methods, this analysis is most easily achieved if a single pathway is first searched for using discrete path sampling (DPS). Furthermore, such an approach requires good starting points. For a pathway as long and involved as the one under study, it was decided it should be split into smaller segments, with NADH at various stages of progression along the pocket. These starting structures (five in all) were sampled extensively before attempts were made to connect all of them together using DPS. This approach resulted in five large ‘sub-databases’ of stationary points, and so the question became one of how to connect them to uncover a complete pathway.

Footnote a: Please note that for the rest of this chapter, whenever the WT HemS system (or any of its mutants or homologues) is referred to, this implies inclusion of haem and NADH unless explicitly stated otherwise.

It was mainly due to the implementation of Wales group software on GPUs since the studies of Choy [21] and Shang [159] that far larger databases of stationary points describing the WT HemS system could be generated. However, the system under study contains a large number of atoms, as well as multiple biomolecules, implying that many intermolecular forces in addition to the intramolecular forces of the protein must be included. Coupled with the fact that the conformational gaps between the sub-databases were large (even despite the attempt to split the pathway into more manageable segments), it was clear that efficient selection methods would be required to ensure sensible connection attempts were made. Therefore, the connectunc keyword (which was explained in Section 4.8) was expanded to reflect this particular issue of selecting efficiently for large sub-database connection attempts. Despite this development, it still took a long time to find the pathway for the WT HemS system.
It was the intention of the author to study pathways from selected mutants and homologues too, but it was apparent that building these other databases from scratch using standard methods and the computational resources available would take too long. Therefore, a method was developed to take the stationary points in the WT HemS database, use them as templates, apply mutations to the relevant residues, reoptimise, and then fill in any gaps in the pathway where reoptimisation failed. This strategy led to checkspmutate, a subroutine now implemented in the Wales group code.

Therefore, method development projects pertinent to this thesis were:

1. To develop an efficient procedure for selecting minima far apart in conformational space and part of different sub-databases for connection attempts.

2. To develop a routine that can ‘mutate’ selected residues for all of the stationary points on a given pathway, and reoptimise, reconnect, and refine them, thereby giving a pathway for a new, mutated system.

Details of these methods, and how well they perform, are provided in the following sections.

6.2 Expanding connectunc

As detailed in Section 1.7.2, Shang had identified, using gmin, three possible binding modes for NADH in the HemS pocket in addition to the one that had been identified by Choy using Relibase+ [21, 159]. Further studies by the author identified another site. Therefore, five possible binding sites for NADH in the holo-protein pocket had been identified, all at different stages of unfolding and proximity to haem. These sites are numbered from 1 to 5, with site 1 representing the case where NADH is furthest from haem and at its most folded, and site 5 representing the case where NADH is closest to haem and is most unfolded. It was recognised that connecting these sites would therefore give a pathway showing NADH unfolding and travelling along the pocket towards haem.
Despite there being three intermediate stages of NADH progression along this pathway to start calculations with, rather than just the start and end points, it was realised that these structures would still be difficult to connect via DPS. The sheer size of the system (5501 atoms in total), plus the requirement to model intermolecular as well as intramolecular interactions, promised that such connection attempts would be computationally expensive. Furthermore, these connections between binding sites often looked as if they would require significant conformational changes, both from the protein residues associated with the pocket and from the NADH molecule itself.

To better understand the local energy landscapes around these different sites, basin-hopping runs using short step sizes were first run for each. This approach allowed for many minima close in conformational space to the site being studied to be discovered. These minima were then connected using DPS, as described in Sections 4.6 and 4.7. Minima derived from the same site were far easier to connect to each other than minima derived from different binding sites, because they were far closer in conformational space.

This approach resulted in five ‘sub-databases’, each comprising the original binding site minimum structure discovered by Choy, Shang or the author, and a large number of minima and connecting TSs close in conformational space. It can therefore be said that this conformational space around each site has been well sampled, increasing the possibility that there are minima in these sites that can connect with less difficulty to minima in other sites. Because all of the minima in the sub-databases are connected to one another, just one successful connection between two minima in two different sub-databases would mean that all of the minima in these two sub-databases become connected.
To save computational resources, an efficient scheme to choose which two minima to connect based on the likelihood of success was therefore developed. The criterion was Euclidean distance, as it was assumed that minima closer together in conformational space would be more likely to connect successfully in a given number of cycles. In other words, the shorter the Euclidean distance between a pair of minima, the more likely they would be selected for a connection attempt.

The connectunc keyword in the pathsample program (see the discussion of connectunc in Section 4.8) was therefore extended to achieve this selection. It was first ensured that the database being studied contained only the two sub-databases to be connected (footnote b). connectunc then splits the database into the AB set (one of the sub-databases) and the $\overline{\mathrm{AB}}$ set (the other sub-database). If the lowest option is applied to connectunc, the lowest energy minimum from $\overline{\mathrm{AB}}$ is identified, and the n closest minima in AB are chosen to try to connect to this $\overline{\mathrm{AB}}$ minimum.

For this work, lowest was adapted to give another option, designated lowesttest. Instead of submitting a job after the closest minima in AB to the lowest minimum in $\overline{\mathrm{AB}}$ were identified, lowesttest stores this information and then cycles to the next-lowest minimum in $\overline{\mathrm{AB}}$. The closest minima of AB to this minimum are then identified too, and this information is also stored. This continues until the closest minima from AB have been identified for all of the minima from $\overline{\mathrm{AB}}$. In other words, the Euclidean distances for each and every combination of minima between AB and $\overline{\mathrm{AB}}$ are identified. A connection is then attempted between the two minima with the shortest distance between them. Because this pair is separated by the shortest distance, it should, in general, be the most likely to connect successfully.
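The exhaustive pair search described above can be sketched in a few lines of Python. This is only an illustration of the selection idea and is not taken from the pathsample source: the function names `euclidean` and `closest_pair` are hypothetical, each minimum is assumed to be a flat list of Cartesian coordinates, and the structures are assumed to be already expressed in a common frame (no alignment is attempted).

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two flattened coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_pair(set_a, set_b):
    """Return (distance, index_in_a, index_in_b) for the closest cross-set pair.

    Every minimum in set_a is compared with every minimum in set_b, so the
    cost grows as len(set_a) * len(set_b). Ties keep the first pair found.
    """
    best = None
    for (i, a), (j, b) in product(enumerate(set_a), enumerate(set_b)):
        d = euclidean(a, b)
        if best is None or d < best[0]:
            best = (d, i, j)
    return best


# Hypothetical toy data: two 'sub-databases' of one-atom minima.
sub_db_1 = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
sub_db_2 = [[1.0, 0.0, 0.0], [9.0, 9.0, 9.0]]
distance, idx_1, idx_2 = closest_pair(sub_db_1, sub_db_2)
print(distance, idx_1, idx_2)
```

The brute-force loop mirrors the description in the text, where a distance is stored for every combination of minima before a single connection attempt is chosen.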
As sites (and therefore sub-databases) 1 through to 5 approximately charted the progression of NADH through the pocket (with 1 showing NADH just as it enters the pocket, and 5 showing NADH in close proximity to haem), it was clear which sub-databases to try to connect. It would make far more sense to try to connect sub-databases 2 and 3 together as well as 3 and 4, rather than trying to connect 2 and 4 directly, as the Euclidean distance between any two minima in the latter case is likely to be very high. However, the author could envisage a scenario (totally removed from the WT HemS system) where far more than five sub-databases are brought together for connection attempts, and there would be no obvious relationship between them. Therefore, the connectunc lowesttest algorithm was generalised still further, so that, as well as choosing which minima between two sub-databases are most suitable to select for connection attempts, the best sub-databases to select for connection attempts in the first place are considered as well.

Footnote b: If there were further minima and/or TSs in the database not belonging to either of the sub-databases of interest then these were removed, after being copied and saved elsewhere, to be added back in later. There are subroutines, such as retainsp and removesp, implemented in pathsample which can do this.

6.3 checkspmutate

Despite the use of connectunc lowesttest to connect the five sub-databases, plus the benefit of GPU-acceleration, it still took over 12 months to find a fully connected pathway showing NADH unfolding and moving from a position at the edge of the pocket to one where its nicotinamide head is in close contact with haem.
When we consider the computational efforts expended by Choy and Shang to generate the original binding sites in the first place, whether from Relibase+ or large step size basin-hopping, it is clear that building such a pathway from scratch constitutes an 18-24 month project. Analysis of this pathway had suggested seven mutants of HemS to study (to be discussed in the next chapter), and the literature indicated at least four homologues (HmuS, ChuS, ShuS and PhuS) of interest. Pathways were therefore sought for these other ten systems (PhuS was discontinued after failing the reoptimisation process, as shall be discussed in Section 6.3.3) to determine whether NADH interacts with the protein in a similar manner in each. Clearly, this task would not have been possible if the methods employed to find the WT HemS pathway had been followed, even if it was possible to run some of the systems in parallel.

Therefore, it was decided that these other systems for the mutants and homologues should not be derived from scratch. Instead, the WT HemS system was used as a ‘template’ for the others. Specifically, all of the stationary points (minima and TSs) on the Dijkstra shortest pathway for WT HemS were reoptimised after having applied the necessary mutations. The question, therefore, was how to apply these mutations before reoptimisation. Furthermore, how were the homologues to be handled? Not only did they require changes to multiple residues throughout the sequence, but they also required residues to be added/removed from certain points in the sequences.

6.3.1 Test Tripeptide System

In developing the checkspmutate subroutine, a small tripeptide, RQQ, was used as a test system, which also provides a useful conceptual framework to explain the operation of the algorithm.

The simplest mutation to model is a single-point mutation which reduces the overall number of atoms.
Increasing the size of the residue is more likely to lead to reoptimisation issues as it increases the steric bulk of the molecule under consideration. Therefore, the first mutation attempted was a change of the central glutamine to a glycine, i.e. RQQ → RGQ. The first thing to consider was therefore using the LEaP program in AMBER to generate a new topology file for RGQ (see Section 4.4 for further details on topology file generation). This is a simple, well-documented process which provides the necessary information on atomic type, bond strengths/angles etc. required for any system to run on AMBER.

However, generating the new set of input coordinates for the RGQ peptide presents more difficulties. As the overarching intention of the ‘template-based’ approach was to preserve the conformations of the reoptimised stationary points as far as possible, a method of ensuring the mutated residues were as close conformationally to what they had been changed from was required. Fortunately, this problem had been addressed by Röder when he developed the Mutational Basin-Hopping (MBH) subroutine for the Wales group [134]. MBH considers both the original and mutated residues as individual graphs, each consisting of nodes (which represent atoms) and edges (which represent covalent bonds). Each node stores information on the atom name, type and hybridisation. All nodes that are consistent between the original and mutated residues (i.e. have the same atomic properties) and are at the same position within their respective graphs are considered as preserved, and so the coordinates they had in the original residue are carried over to those for the mutated residue. Backbone atoms are therefore always preserved, and usually a given number of side-chain atoms are too. This process therefore preserves the overall orientation of the residue.
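The preserved-atom step described above can be illustrated with a minimal Python sketch. It is not the MBH implementation: the `Atom` class and `carry_over` function are hypothetical, "same position within the graph" is approximated simply by matching atom names, the atom-type and hybridisation labels are placeholders rather than entries from any real force field, and the residues are truncated for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str           # atom name within the residue, e.g. "CA"
    atom_type: str      # placeholder atom-type label (illustrative only)
    hybridisation: str  # placeholder hybridisation label


def carry_over(original, coords, mutated):
    """Split the mutated residue into preserved atoms and atoms to build.

    An atom counts as preserved when an atom with the same name, type and
    hybridisation exists in the original residue; its coordinates are copied.
    All other atoms are returned by name for later construction.
    """
    by_name = {atom.name: atom for atom in original}
    preserved, to_build = {}, []
    for atom in mutated:
        if by_name.get(atom.name) == atom:
            preserved[atom.name] = coords[atom.name]
        else:
            to_build.append(atom.name)
    return preserved, to_build


# Hypothetical, truncated Gln -> Gly example with made-up coordinates.
gln = [Atom("N", "N", "sp2"), Atom("H", "H", "s"), Atom("CA", "CT", "sp3"),
       Atom("HA", "H1", "s"), Atom("CB", "CT", "sp3"),
       Atom("C", "C", "sp2"), Atom("O", "O", "sp2")]
gly = [Atom("N", "N", "sp2"), Atom("H", "H", "s"), Atom("CA", "CT", "sp3"),
       Atom("HA2", "H1", "s"), Atom("HA3", "H1", "s"),
       Atom("C", "C", "sp2"), Atom("O", "O", "sp2")]
gln_coords = {atom.name: (float(i), 0.0, 0.0) for i, atom in enumerate(gln)}

preserved, to_build = carry_over(gln, gln_coords, gly)
print(sorted(preserved), to_build)
```

In this toy case the backbone atoms inherit their coordinates, while the two glycine alpha-hydrogens are left for the build-up stage that the text goes on to describe.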
For those mutated atoms not assigned coordinates at this stage, a subgraph is created to allow for the coordinates to be created one-by-one, thus ‘building up’ the remainder of the mutated residue. This construction is achieved first by locating all of the atoms in the subgraph that have a connection to a node already assigned coordinates (known as the host atom). Unit vectors from the host atom, which take into account its hybridisation state, are constructed to generate coordinates for those atoms connected to the host. These unit vectors are then adapted (i.e. scaled) to account for the properties (atom type and hybridisation state) of the new atoms to give a final set of coordinates. This procedure is then repeated iteratively until all the mutated atoms have been assigned a position. There are caveats for certain residue types, particularly ring systems. Also, methods have been put in place to deal with potential chirality problems. For more details, please refer to the PhD thesis of Röder [134].

By interfacing checkspmutate with this method to generate starting coordinates for mutated residues, a new overall structure with mutations applied to the user-specified residues could be created, ready for optimisation. If the starting stationary point had been a minimum, reoptimisation used the L-BFGS routine as described in Section 4.5. Otherwise, if the stationary point had been a TS, reoptimisation was carried out using hybrid eigenvector-following (HEF), as described in Section 4.6. This process was cycled until all stationary points in the pathway had been mutated and an attempt made to reoptimise them.

Figure 6.1: RQQ to RGQ mutations and reoptimisations. The top structures show two different minima (labelled 1 and 2) from a database describing the RQQ tripeptide. A Q → G mutation was applied to the central residue in each case, and the resulting set of coordinates reoptimised.
The general conformational properties of the RQQ minima are retained in the new RGQ minima in each case.

To illustrate this approach, Fig. 6.1 shows two different stationary points of the RQQ tripeptide and their equivalent stationary points in the RGQ tripeptide after mutation and reoptimisation.

However, mutations that increased the overall number of atoms in the residue being mutated also had to be considered. Furthermore, it was recognised that for homologues several residues would need to be mutated, plus some may need to be deleted from or added to the sequence. Therefore, checkspmutate was developed further to account for each of these possibilities. In all, four types of transformation were attempted on the test system, RGQ. These were:

RGQ → RFR. Two mutations attempted at once. Both constituted an increase in the number of atoms.

RGQ → RQ. Deletion of the central G residue.

RGQ → RGQFF. Insertion of two residues at the C terminal (which is located at Q).

RGQ → RQFF. Deletion of the central G residue and insertion of two residues at the C terminal.

Figure 6.2: RGQ mutations and reoptimisations. Each mutation was from the same RGQ minimum. All reoptimisations converged to stable minima, giving mutated minima of similar conformational properties to the original RGQ tripeptide minimum.

The results from each of these mutations, all applied to the same RGQ minimum, are shown in Fig. 6.2. Reoptimisations following mutations were generally successful. However, particularly for cases where the mutations resulted in steric clashes, certain minima and TSs did not converge to within a user-specified RMS force criterion.

6.3.2 Point Mutations

checkspmutate had been demonstrated to successfully mutate and reoptimise the majority of stationary points for the tripeptides studied. This subroutine was therefore applied to the WT HemS system.
The Dijkstra shortest pathway was identified for this system and extracted to give a smaller, more manageable database. This database was then copied eleven times, once for each HemS point mutant or HemS homologue to be studied. checkspmutate was then applied to each one.

System        No. Minima   % Successfully Converted   No. Transition States   % Successfully Converted
Wild Type     1235         N/A                        1234                    N/A
F104A         1105         89.5%                      803                     65.1%
F104AF199A    1059         85.7%                      557                     45.1%
F104I         1110         89.9%                      979                     79.3%
F199A         1068         86.5%                      838                     67.9%
R209A         1050         85.0%                      642                     52.0%
R209K         1090         88.3%                      960                     77.8%
Q210A         1079         87.4%                      660                     53.5%

Table 6.1: Success rate of point mutations and reoptimisations from the minima and transition states of the WT HemS Dijkstra shortest pathway.

Analysis of the WT HemS pathway (see the next chapter) had inspired further study of six different single-point mutations (F104A, F104I, F199A, R209A, R209K and Q210A) and one double-point mutation (F104AF199A). As these each constituted only one or two mutations in the protein out of a possible 338, and all such mutations led to a decrease in the number of atoms in the residue(s) being changed, the success rate of reoptimisations proved to be high. The success rates are listed in Table 6.1.

With so many stationary points successfully mutated and reoptimised in each case, the pathways for the mutants were mainly complete, and the gaps to fill in were generally small. Strategies to fill in these gaps will be explained in Section 6.3.4.

6.3.3 Homologues

Conversion to homologues is a far more involved process than single- or double-point mutations. Even HmuS, which has an 89.6% sequence identity with HemS, required 34 residues to be converted. ChuS, meanwhile, required the conversion of 111 residues as well as the deletion of one midway through the sequence and another at the C terminus. ShuS required a change of 115 residues and the deletion of one near the middle of the sequence.
Conversion to PhuS was also attempted, a transformation requiring 199 residues to change, 5 to be added at various points in the sequence, and one to be removed. The sequences themselves are given for each homologue in Appendix B.

It was therefore not surprising that the rate of reoptimisation success was not as high for these homologues as it had been for the point mutations to HemS. Nevertheless, other than for PhuS, these rates remained good enough to give fairly complete pathways, as shown in Table 6.2. It is unsurprising that, as the % Identity of the homologue with respect to HemS decreases, the number of minima and TSs successfully reoptimised also decreases.

Not a single minimum or TS was successfully mutated and reoptimised to PhuS. This failure was probably due to certain mutations causing steric clashes within the structure common to all of the stationary points along the pathway. The crystallographic structures of PhuS and HemS (PDB codes 4mf9 [111] and 2J0P [96], respectively) reveal that haem in PhuS is less tightly clamped, probably due to extra steric bulk in that region. Trying to fit these bulkier residues in a narrower pocket, as checkspmutate using HemS as a template does, may have resulted in overlapping residues, precluding successful structural reoptimisation. This result is a reminder that checkspmutate is only feasible where the structure of the new system is sufficiently similar to the template system. Results will vary between different protein-ligand systems, but it would seem, from these data on HemS, that an advisable rule-of-thumb would be to avoid trying to mutate and reoptimise to homologues where the sequence identity falls below 65%.

Protein   % Identity to HemS   No. Minima   % Successfully Converted   No. Transition States   % Successfully Converted
HemS      100%                 1235         N/A                        1234                    N/A
HmuS      89.6%                674          54.6%                      706                     57.2%
ChuS      66.8%                208          16.8%                      255                     20.7%
ShuS      66.2%                131          10.6%                      184                     14.9%
PhuS      42.6%                0            0%                         0                       0%

Table 6.2: Success rate of homologue conversions and reoptimisations from the minima and transition states of the WT HemS Dijkstra shortest pathway. % Identities are taken from experimental homology calculations using the sequences from the accession numbers listed in Appendix A.

6.3.4 Post-Processing

Though PhuS failed, all of the other point mutations and homologues studied gave reasonably complete pathways. It should be stressed that, even for ShuS, where only 131 minima and 184 TSs were successfully reoptimised (% conversions of 10.6% and 14.9%, respectively), this still provides far more to start with than had been the case with WT HemS, where just five minima (the original binding sites identified by Choy, Shang and the author) were used to connect the entire pathway using DPS. Even accounting for the fact that certain regions of the pathway would be poorly optimised (if one stationary point failed to reoptimise, it was probable that its neighbouring stationary points, most likely having similar conformational characteristics, would also not reoptimise successfully), this result suggested that the gaps to fill in would be far shorter than those which had had to be tackled for HemS (footnote c).

Footnote c: Even though each of these five sites was sampled locally before connections were attempted, thus giving more minima to choose from, it must be remembered that they were all concentrated in particular regions of the landscape, whereas the mutation and reoptimisation scheme of checkspmutate provides for a wider spread of minima, reducing the likelihood that any large gaps will need to be connected.
To fill in these gaps for each system, steepest-descent paths were followed for all of the TSs to find the minima they were directly connected to, which were often the reoptimised minima already identified. However, a number of new minima were also found. Attempts were then made to connect the gaps in the pathway. Because the gaps arose from stationary points that failed to converge when reoptimised, each gap was bridged by selecting, for connection attempts, the successfully reoptimised minima that lay closest along the template pathway to those that had failed. This approach generally worked well because the gaps tended to be small. Therefore, in each mutated system (whether point mutation of HemS, or homologue) the majority of the gaps were filled. However, large gaps (which can arise, for example, if a long chain of stationary points from the original pathway is not reoptimised successfully) proved to be harder to bridge.

When taken together, all of the DPS runs, whether they had succeeded in connecting gaps or not, generated a large number of new minima and TSs. Therefore, each of the sub-databases within the overall database had grown large (footnote d). connectunc lowesttest, discussed in the previous section, was therefore used to choose which sub-databases would be best to try to connect, and which minima within these respective sub-databases should be selected to attempt these connections.

Overall, it took approximately six months to find a fully connected pathway for all ten mutated systems (seven point mutations and three homologues). These runs were done in parallel using limited computational resources, and so if each had been performed separately they could each have taken approximately three weeks.
This timescale is a significant speed-up from the original WT HemS system, where it took over twelve months to find a fully connected pathway from the five starting structures that had been identified by Choy, Shang and the author.

Care must be taken, however, not to assume that these new mutated pathways are optimal. These new pathways are liable to have artificially high barriers, and so refinement using the methods outlined in Section 4.8 (i.e. shortcut, shortcut barrier, untrap and standard connectunc) is required. This refinement is particularly important, as an inherent bias was introduced by using the WT HemS pathway as a template. Because of this bias, the new pathways for these mutated systems are likely to follow a route more akin to the original WT HemS pathway than possible alternative, lower energy pathways unique to the new system. Sufficient refinement and sampling is expected to iron out these differences, provided the point mutations administered or homologues considered do not significantly alter the protein structure. Experimental evidence would suggest both of these assumptions are justified for the systems under study in this work. The apo-structures of the mutants studied, following certain biophysical investigations (see Chapter 9), showed little evidence of misfolding and the homologues, where crystal structures are available (e.g. for HemS and ChuS), show similar overall structures.

Footnote d: Recall that a sub-database in this context is a collection of minima and TSs all connected to each other, whether directly or indirectly. Therefore, a certain segment of a pathway containing no gaps can be considered as a sub-database, and it is separated from other sub-databases by gaps in the pathway.

6.4 Discussion and Summary

This chapter described the development of two different strategies to assist in the overall connection of difficult pathways.
The first scheme uses connectunc lowesttest, a simple algorithm that determines, from a set of sub-databases, the fewest and narrowest set of conformational gaps that need to be filled in order to connect all of the sub-databases (and therefore, by implication, all of the stationary points contained within each). It does this both by considering which sub-databases would be best to try to connect, as well as which minima to select within those sub-databases.

connectunc lowesttest was first applied to the WT HemS system in order to best select the minima from each of the five original sites identified by Choy, Shang and the author for connection attempts. Due to the extensive sampling of the landscapes surrounding these sites before connectunc lowesttest was used, these regions were well-characterised. This situation contrasts with the regions of the landscape between sites, which were not characterised at all until connectunc lowesttest was used. There is therefore an issue with this overall connectunc lowesttest strategy in that it leads to some regions of conformational space being well sampled (those regions surrounding the original sites) whereas other regions are less well sampled (those between the original sites). This imbalance can be solved by pathway refinement, where the refinement schemes described in Section 4.8 can be set to focus more on the stationary points of these less sampled regions.

It is difficult to determine how much computational time, if any, the strategy using connectunc lowesttest saved with respect to finding a fully connected pathway for the WT HemS system, as a parallel study using more conventional selection methods to fill gaps was not conducted.
As the strategy was able to identify the closest two minima between two separate sub-databases, and it was later demonstrated by the work with checkspmutate just how important it is to try to minimise the gap size between attempted connections, it is likely that this strategy saved a significant amount of computational time and resources.

The second strategy developed could only be applied to systems where a pre-existing system with similar characteristics (in terms of sequence and suspected structure) was already available. This strategy was a ‘template-based’ approach, and depended upon the subroutine, checkspmutate. In this strategy, all of the stationary points from the original system (in this study, describing the pathway showing movement of NADH towards haem in WT HemS) were mutated at specified residues and the resulting coordinates reoptimised. In addition to this procedure, provisions were made so that residues could be inserted or deleted if required. The success rate for reoptimisations was typically good, but generally fell as more residues had to be mutated. This trend was confirmed when HemS was transformed to PhuS (a homologue with 42.6% homology) and not a single minimum or transition state was successfully reoptimised. The success rate for reoptimisation was never 100%, so gaps appeared in the pathways for the new systems. Most of these gaps were short and could easily be bridged by considering where these gaps would have appeared in the original WT HemS system, and connecting the closest successfully reoptimised minima either side of this gap. For larger gaps that could not be easily connected, connectunc lowesttest was used to select the closest pairs of minima for connection attempts in what were by now quite extensive sub-databases, as well as which sub-databases would be best to try to connect in the first place.
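The gap-selection idea behind connectunc lowesttest, as summarised in this section, can be illustrated with a short sketch. The code below is not the implementation used in this work. It treats each sub-database as a node, takes the distance between the closest pair of minima in two sub-databases as the width of the gap between them, and then uses Kruskal's minimum spanning tree algorithm to pick the fewest, narrowest gaps that join every sub-database. The data layout, the function names and the use of a plain Euclidean distance are all assumptions made for illustration; a real comparison of structures would presumably require them to be aligned first. The coordinates are toy two-dimensional points, not real data.

```python
import math
from itertools import combinations


def closest_pair(db_a, db_b):
    """Return (distance, id_a, id_b) for the closest minima in two sub-databases.

    Each sub-database is a dict mapping a minimum's index to its coordinates.
    Euclidean distance is a placeholder metric (assumption).
    """
    return min(
        (math.dist(xa, xb), ia, ib)
        for ia, xa in db_a.items()
        for ib, xb in db_b.items()
    )


def select_gaps(sub_dbs):
    """Choose the narrowest gaps that connect all sub-databases (Kruskal)."""
    # One candidate gap per pair of sub-databases, narrowest first.
    gaps = sorted(
        closest_pair(sub_dbs[a], sub_dbs[b]) + (a, b)
        for a, b in combinations(sorted(sub_dbs), 2)
    )
    parent = {name: name for name in sub_dbs}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    chosen = []
    for dist, id_a, id_b, a, b in gaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # this gap joins two still-separate groups
            parent[root_a] = root_b
            chosen.append((a, id_a, b, id_b, dist))
    return chosen


# Hypothetical example: three sub-databases with toy coordinates.
example = {
    "site1": {1: (0.0, 0.0), 2: (1.0, 0.0)},
    "site2": {3: (3.0, 0.0), 4: (4.0, 0.0)},
    "site3": {5: (10.0, 0.0)},
}
for gap in select_gaps(example):
    print(gap)
```

For n sub-databases the sketch always returns n − 1 gaps, each identified by the pair of minima that would be passed to a connection attempt.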
Fully connected pathways were found for all of the new systems (ten in all) within six months. Had each system been allowed to run individually, rather than computational resources being pooled between all ten, it is estimated that each one would have completed within three weeks. This timescale is a significant speed-up with respect to the original approach used to find a fully connected pathway for the WT HemS system, which took over twelve months.

Caution must be exercised, however, when it comes to using this ‘template-based’, checkspmutate approach. It is only suitable when similar pockets within a protein are being investigated, without large-scale differences between the pathways of the different systems. For example, this method would not be suitable for investigating the majority of protein-folding problems, where the conformational changes a protein undergoes are often very significant, unless it is apparent that the two proteins fold via very similar pathways. One of just a few scenarios where this exemption would apply would perhaps be if a single-point mutation was applied to a protein that was expected to only affect protein folding at a local level. Instead, this ‘template-based’ approach is more suited to studying protein-ligand interactions (as in this thesis) as such interactions tend to be localised in a particular section of the protein. As long as the mutations do not significantly affect the overall shape of the binding pocket, there is justification for using the pathway from one system to derive the other. Although this approach biases the mutated pathway to be like the template, the fact that the mutated coordinates are reoptimised allows for new energies to be obtained, showing how the landscape transforms with respect to the new mutations.
Sufficient refinement of the pathway should also identify any possible alternative pathways, which are likely to be similar since the binding pockets between the template and the mutated systems are related. Provided these refinement conditions are met, this ‘template-based’ approach can be an effective method for deriving pathways for new, yet related, systems at a fraction of the computational cost these would otherwise require. The ‘template-based’ approach therefore allows for multiple systems of mutated proteins or homologues to be constructed quickly, in preparation for thermodynamic and kinetic comparisons.

Chapter 7

Computational Comparison of HemS with its Mutants and Homologues

7.1 Aims

Using GPUs and the novel methods described in the previous chapter, it became possible to construct a large stationary point database for the [WT HemS + Haem + NADH] system for the first time. From this database, it was possible to find fully connected pathways showing NADH moving along the main cavity towards haem. It was thought that these pathways would shed light on the residues involved in NADH-binding, unfolding and movement. The double phenylalanine gate was of particular interest due to its suspected role in regulating NADH access to haem.

Once a fully connected pathway had been found and the WT HemS system analysed, further aims were introduced. Certain residues had been identified as important for NADH-binding, and so these were mutated to ascertain what the effects would be. A system using NADPH in place of NADH was also briefly investigated to determine whether it behaved in the same manner. Finally, three HemS homologues, HmuS, ChuS and ShuS, were investigated to determine whether they too could facilitate NADH unfolding and movement inside their pockets to promote reaction with haem. These aims were all attempted with the aid of the checkspmutate subroutine.

To summarise, the aims of the computational investigation were:

1. To construct a large stationary point database for the [WT HemS + Haem + NADH] system.
2. To find a fully connected pathway showing NADH moving towards haem within the WT HemS pocket.
3. To investigate which residues are of importance to NADH movement and unfolding within the WT HemS pocket, with a special emphasis on the double phenylalanine gate.
4. To use the [WT HemS + Haem + NADH] pathway as a template to find pathways for other systems, including selected single-point mutants, homologues and alternatives to NADH.
5. To compare the structural and energetic properties of these systems.

Outcomes from calculations inspired by these aims are described in the following sections.

7.2 Expansion of the Wild Type HemS Database, and Further Analysis of the Double Phe-Gate

The author inherited the investigation into the relationship between the double phe-gate and NADH distance from haem started by Shang.¹⁵⁹ This work was discussed towards the end of Section 1.7.2, particularly in Fig. 1.28. Due to the enormous increase in computational efficiency afforded by GPUs, this work could be continued on a much larger scale. For certain calculations, this speed-up amounted to two orders of magnitude. Therefore, where the number of basin-hopping steps had typically been 100 in previous studies for an individual run, this value was increased to 10,000, allowing for larger regions of local landscapes to be sampled.

To recap, Shang had identified three binding sites (optimised minima structures) for NADH at various stages of its progression towards haem in the protein pocket, in addition to the original site identified by Choy using Relibase+.²¹ These sites were numbered from 1 to 4, with 1 indicating the site where NADH was furthest from haem, and 4 indicating the site where NADH was closest.
Through manual manipulation with PyMOL²⁴⁰ and subsequent reoptimisation, the author identified a further site (labelled 5), where NADH was even closer to haem. Basin-hopping was therefore carried out starting at each of these sites (and also from a holo-HemS system, i.e. without NADH) using short step sizes to better sample the local landscapes surrounding each of these sites. Ten separate basin-hopping runs of 10,000 steps were conducted at each site. The lowest minimum structure from each of these runs was then selected for further study.

The overall aim of this set of calculations was to determine the relative stabilities of the phe-gate conformations at various stages of the progression of NADH along the pocket (and also when NADH was not present at all). It was therefore necessary to define exactly what constituted open and closed gates. As the double gate consisted of two residues, F104 and F199, it could be described as having one of four conformations: closed-closed (CC, i.e. C_F104C_F199), closed-open (CO), open-closed (OC) or open-open (OO). Four representative examples of these conformations were provided in Fig. 1.27 in the Introduction. The labels describing these conformations were introduced by Shang. However, a systematic way of determining what constituted a closed or an open state had not been necessary at that time; it was possible for Shang to look at the relatively few minima available on a visualisation package, and determine what conformation the two phenylalanine residues were in by sight. However, it was anticipated that many thousands of stationary points would be generated using the GPUs, and so a more rigorous, mathematical method for classifying which conformations the phenylalanines were in without the need to visualise them was developed.
Due to the desire to distinguish between OC and CO states, it was realised that a metric defined for the two phenylalanines with respect to each other would not work, as the distances and angles between these two residues would be very similar for these OC and CO conformations. Instead, each phenylalanine was defined with reference to the haem molecule, whose location between stationary points was very consistent, not least because of the Fe–H196 Nε bond introduced by Choy.²¹ Specifically, for each phenylalanine, a dihedral angle, θ, between it and the haem was defined according to three of its carbon atoms (α, γ and ζ) and the β-meso-carbon of haem, in the order Cα–Cγ–Cζ–Cβ-meso. Through observation and testing, the four conformational states were classified as follows:

– CC: -90° ≤ θ_F104 < 80° and 0° ≤ θ_F199 < 175°
– CO: -90° ≤ θ_F104 < 80° and (θ_F199 < 0° or θ_F199 ≥ 175°)
– OC: (θ_F104 < -90° or θ_F104 ≥ 80°) and 0° ≤ θ_F199 < 175°
– OO: (θ_F104 < -90° or θ_F104 ≥ 80°) and (θ_F199 < 0° or θ_F199 ≥ 175°)

The conformational state of the double phe-gate for the lowest minimum from each of the BH runs was determined. The conformations of these two phenylalanines were then changed (by substitution, using phenylalanine coordinates from other stationary points) to reflect the other three possibilities, and these resulting structures reoptimised. Therefore, the four different double phe-gate conformations could be compared for minima which were otherwise identical.ᵃ As such, a direct comparison between the relative stabilities of these double phe-gate conformations could be made. These four minima were then connected to one another in order to identify possible barriers and to allow for them to be plotted as disconnectivity graphs. Typically, pathways between these four minima involved intermediates, and so the disconnectivity graphs tended to grow to more than four minima.
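The dihedral-based classification of the double phe-gate given above can be expressed compactly in code. The following is a minimal Python sketch and not the script used in this work. Only the angular thresholds are taken from the text. The function names, the coordinate format (plain 3-tuples) and the sign convention of the dihedral are assumptions made for illustration; the sign convention in particular would need to match the one used when the thresholds were chosen.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees, between -180 and 180.

    Intended atom order (from the text): C-alpha, C-gamma, C-zeta of the
    phenylalanine, then the beta-meso carbon of haem.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def f104_closed(theta):
    """F104 is closed for -90 <= theta < 80 (thresholds from the text)."""
    return -90.0 <= theta < 80.0


def f199_closed(theta):
    """F199 is closed for 0 <= theta < 175 (thresholds from the text)."""
    return 0.0 <= theta < 175.0


def classify_gate(theta_f104, theta_f199):
    """Return 'CC', 'CO', 'OC' or 'OO'; first letter F104, second F199."""
    first = "C" if f104_closed(theta_f104) else "O"
    second = "C" if f199_closed(theta_f199) else "O"
    return first + second
```

A stationary point would then be labelled by computing one dihedral per phenylalanine and passing both angles to `classify_gate`, which removes the need to inspect each structure by eye.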
ᵃ Identical save for very small differences introduced when the substituted structures were reoptimised.

As stated above, ten different BH runs were performed on each site (plus holo-HemS), giving sixty BH runs in all. The lowest minimum from each of these was extracted, and the phe-gate substitutions/subsequent connections applied to each. This procedure gave sixty disconnectivity graphs. As a result, at each of the six sites, there were ten separate sets of minima describing how the energy of the system varied when the F104 and F199 gates were in different (i.e. CC, CO, OC or OO) conformations. These ten sets were averaged and plotted, as shown in Fig. 7.1, allowing for general trends between the sites (i.e. as NADH moved through the pocket towards haem) to be identified. This analysis is discussed further in the caption to Fig. 7.1, but the main findings were that, as NADH moved further into the pocket, the general stability of the system increased, and that the OO state became more favoured over the other conformations.

Figure 7.1: Charts showing the relative stabilities of the double phe-gate conformations when NADH is located in different parts of the pocket. h represents the holo-HemS system (i.e. without NADH), whereas 1–5 represent an increasing degree of NADH unfolding and progression towards haem within the pocket. Bars are colour-coded according to the double phe-gate conformation: CC, black; CO, yellow; OC, green; OO, purple. The y-axis gives a measure of the potential energy of the system, in kcal mol⁻¹. The scale varies between the systems. In moving from 1 to 5, the energies for each of the conformations drop significantly, indicating that the movement of NADH through the pocket is enthalpically favourable. In the holo-system, the energy differences between conformations are small, with the OO conformation marginally favoured over the others and CC marginally disfavoured. Upon inclusion of NADH, at site 1, the CO conformation becomes significantly more favoured than the others. At 2, CO remains the most favoured conformation. OC is particularly disfavoured, being even higher in energy than the CC conformation. These two features also prove to be the case at 3 and 4, although the OO conformation is becoming relatively more stable. At 5, a switch occurs, whereupon the OC conformation becomes more stable than CC, and the OO conformation becomes more stable than CO, and thereby the most stable conformation of them all.

The ten outputs from each site were then connected to each other using Discrete Path Sampling (DPS). The sites themselves were then connected to one another using the connectunc lowesttest strategy described in the previous chapter, producing a large database of all of the stationary points identified for the [WT HemS + Haem + NADH] system. Refinement of the Dijkstra shortest pathway resulted in further minima and TSs, giving a total of 21,852 connected minima in the disconnectivity graphs shown in Figures 7.2 and 7.3.

The two disconnectivity graphs in Figures 7.2 and 7.3 represent the same database but are colour-coded differently. One represents the double phe-gate conformations within the respective minima, whereas the other is coded according to the NADH-haem distance. Specifically, this NADH-haem distance was determined using the β-meso-carbon of haem and the hydride-bearing carbon of NADH, as it was these two carbon atoms which were thought to be most involved in the hydride transfer process.

Figure 7.2: Disconnectivity graph of [WT HemS + Haem + NADH], colour-coded according to double phe-gate conformation. Minima with a CC conformation are coloured black, CO yellow, OC green and OO purple. The vast majority of minima in the database displayed a CO conformation, suggesting most minima have a preference for this T-shaped pi-pi bonding interaction. The majority of the lowest energy minima, however, have an OO conformation, suggesting that the most stable minima prefer an arrangement where the two phenylalanine residues point away from each other, and away from the centre of the pocket. [Scale bar: 10 kcal/mol.]

Figure 7.3: Disconnectivity graph of [WT HemS + Haem + NADH], colour-coded according to NADH-haem distance, as shown by the colour-bar, which is demarcated in Angstroms. The main funnel is dominated by minima with low NADH-haem distances, suggesting that the approach of NADH to haem has a stabilising effect. [Scale bar: 10 kcal/mol. Colour bar: NADH-haem distance, 4.0 to 20.0 Å.]

Fig. 7.2 shows that the CO conformation is preferred by the vast majority of minima in the database. This result is most likely due to the stabilising effect of a T-shaped pi-pi bonding interaction that this conformation affords. However, as NADH approaches, F104 is required to open to allow passage of the ligand through to haem. Therefore, when NADH is close to haem, the OO conformation becomes the preferred option. This conclusion can be established by comparing Figures 7.2 and 7.3, where the minima with low NADH-haem distances are seen to overlap almost exactly with the minima possessing an OO conformation. Furthermore, these minima appear at the bottom of the main funnel, suggesting that the NADH approach to haem is an optimised process (i.e. that the protein has evolved to perform this function or one very much like it).
The appearance of the disconnectivity graph in Fig. 7.2 therefore led to a revision of the double phe-gate hypothesis. Rather than occupying one of four conformations, it is clear (once NADH has been introduced to the system, at least) that two conformations (CO and OO) are far more prevalent than the other two. Furthermore, the CO conformation predominates in all cases other than when NADH is close to haem. This result suggests that the CO conformation is the most stable option, save for those instances when NADH occupies the space between the two residues. Even though this CO conformation is possible when NADH comes between the two phenylalanines, it is clear from a steric consideration that the protein retaining this CO conformation would lead to a very crowded region in the cavity. It is therefore not surprising that F104 flips round so that it points away from NADH, giving the nicotinamide head more space to orient itself towards haem. More properly, therefore, the [WT HemS + Haem + NADH] system should perhaps be considered as only having two possible conformations at the ‘double’ phe-gate: the CO conformation, which should be considered simply as the closed state, and the OO conformation, which should be considered as the open state. Meanwhile, the CC and OC states, though possible, are so rare that they should be removed from consideration as meaningful competitors.

This result is especially interesting because the holo-HemS (i.e. without NADH) system shows no such bias towards the CO (closed) state, as illustrated in Fig. 7.4. Though this holo-HemS system was not as thoroughly sampled, the disconnectivity graph shows a clear preference for the OO conformation, followed by OC. This observation would suggest that CO preference is dependent upon the inclusion of NADH, whereupon F104 switches from an open to a closed state.
It was unclear why this preference would be the case; perhaps the inclusion of NADH, even at the very edge of the pocket, leads to some sort of structural change within the protein, which has a knock-on effect on residues deeper within the pocket, culminating in the change of F104 conformation.

Further investigation of the stationary points upon NADH inclusion revealed the precise reasons behind this effect.ᵇ When NADH is introduced to the pocket, its adenine nucleobase positions itself such that it binds to two residues at opposite sides of the cavity (namely, the backbone of P169 and the carboxylate group of D310). This binding serves to narrow the entrance to the cavity significantly, holding the NADH molecule in place. This alteration to the cavity affects the unstructured loop further down its length, bringing a glutamate residue (E174) into much closer vicinity to an asparagine (N106). The glutamate hydrogen-bonds to the asparagine, preventing it from bonding to the pi system of an open F104. Without this stabilising binding interaction to asparagine, F104 moves to a closed position. As NADH unfolds and its nicotinamide head approaches the double phe-gate, E174 is ‘pushed’ upwards again, causing the E174-N106 hydrogen-bond to break. This change leaves N106 free to bind to F104 again, thus causing F104 to open, and thereby let NADH slip past. Three representative structures (two from the database with NADH, and the other from the database without) are shown in Fig. 7.5, with the relevant residues and bond lengths highlighted, to illustrate the interactions involved.

Figure 7.4: Disconnectivity graph of holo-HemS, colour-coded according to double phe-gate conformation. The OO conformation is preferred, and there are few instances of CO. [Scale bar: 10 kcal/mol.]
ᵇ These findings regarding the change in F104 conformation when NADH is introduced to the pocket were only discovered towards the end of the project, hence the residues concerned were not included in the mutagenesis study.

Figure 7.5: Front and side views illustrating the long-range effect of NADH-binding on F104 conformation. Haem is represented with a magenta skeleton, NADH (where present) in orange, F104 in white, N106 in purple, P169 in blue, E174 in salmon, F199 in cyan and D310 in yellow. Top left & right: front and side views of a selected minimum from the holo-HemS database. The cavity is open wide, with P169 and D310 far apart. E174 is also far from N106 (12.6 Å), which frees up N106 to form a polar contact with F104, keeping the latter in an open conformation. Middle left & right: front and side views of a minimum from the [WT HemS + Haem + NADH] database, where NADH has not unfolded enough to reach the double phe-gate. P169 and D310 hydrogen-bond to the adenine nucleobase of NADH from different sides (2.0 Å and 4.7 Å respectively), narrowing the cavity entrance. This movement brings E174 in close contact (1.9 Å) with N106. N106 therefore shifts in conformation to make this bond, leaving F104 free. F104 changes from an open to closed conformation in order to form a T-shaped pi-pi bond with F199. This movement blocks NADH from easy access to haem. Bottom left & right: front and side views of a minimum from the [WT HemS + Haem + NADH] database, where NADH has extended fully and slipped past the double phe-gate. P169 and D310 still dock the adenine nucleobase of NADH in place (2.0 Å and 5.7 Å respectively). However, the approach of the nicotinamide head of NADH to the double phe-gate has caused E174 to shift back upwards. To remain bonded, N106 therefore changes its conformation considerably, ultimately yielding a bond 4.6 Å long. This movement of N106 frees up the space required for F104 to flip back to an open conformation, which in turn provides more space for NADH to slip between the two phenylalanine residues and access the haem molecule.

7.3 Identifying Residues to Mutate

As the double phe-gate appears to be key to the regulation of NADH access to haem, the F104 and F199 residues were chosen to form the basis of a mutagenesis study. In all, four mutants involving this double phe-gate were chosen: F104A, F199A, F104I and F104AF199A. The two single-point mutations to alanine were to determine if a reduction in the steric bulk at either of the gates led to NADH being able to access haem more easily. F104AF199A was chosen for similar reasons, only this time the entire double gate would be removed. F104I was chosen as isoleucine is of a similar size to phenylalanine but is aliphatic rather than aromatic. This choice was to probe whether a change in electronics (specifically, the disruption of the T-shaped pi-pi interaction between F104 and F199) would have an effect on the gate, rather than simply a change of sterics.

In addition to the double phe-gate, two other residues were selected for mutagenesis, namely R209 and Q210. The first of these, R209, was of interest because it can bind to both haem and NADH simultaneously. R209 is one of the residues (detailed in Section 1.4.1) that binds to the buried propionate group of haem, helping to anchor the haem in the pocket. NADH at the edge of the pocket is too far away from R209 to interact. However, as it approaches haem, the ribose adjoining the nicotinamide head can interact with the free –NH2 of R209. Depending on the orientation of the ribose, this interaction is through either the C2′-hydroxyl or the ring-based oxygen.
In the former case, as NADH slips past the double phe-gate, the hydrogen-bond formed from the free –NH2 of R209 shifts from the C2′-hydroxyl of the ribose to the C3′-hydroxyl. Relevant structures are shown in Fig. 7.6.

Figure 7.6: Selected minima highlighting the different bonding capabilities of R209. Haem is represented with a magenta skeleton, NADH with an orange skeleton, and R209 in cyan. In every case, one hydrogen from each of R209’s two –NH2 groups forms a tight hydrogen-bond with one of the two oxygens of the innermost haem propionate. Top left: The free (i.e. non-haem bound) hydrogen of the –NH2 group of R209 more deeply embedded in the pocket hydrogen-bonds to the ring-based oxygen of the nicotinamide-bearing ribose of NADH. The double phe-gate (not shown) blocks NADH access to haem. Top right: The free hydrogen of R209 hydrogen-bonds to the C2′-hydroxyl oxygen of the NADH ribose. The double phe-gate has opened, allowing NADH to slip past. Bottom left: The free hydrogen of R209 hydrogen-bonds to the C3′-hydroxyl oxygen of the NADH ribose. NADH has slipped further along the pocket, with the nicotinamide head approaching the β-meso-position of haem. Bottom right (i): Segment of a selected minimum from the WT HemS database, showing the hydrogen-bonds formed between R209, haem and NADH. (ii): The same minimum following checkspmutate mutation and reoptimisation to R209A. No hydrogen-bonds are formed between the alanine and haem or NADH. (iii): The same minimum following checkspmutate mutation and reoptimisation to R209K. The lysine is hydrogen-bonded to the haem propionate. Other minima studied showed lysine was also capable of hydrogen-bonding to NADH, but it could not bond to both haem and NADH simultaneously.

Analysis of the [WT HemS + Haem + NADH] database suggested that NADH did not readily slip through the double phe-gate when the ring-based oxygen of its ribose was hydrogen-bonded to R209, yet slipped through more readily when the ribose was oriented so that its C2′/C3′-hydroxyls could bond to R209. It is important to note that the two sites identified by Shang¹⁵⁹ and the author, which were in closest proximity to haem (sites 4 and 5), both had the NADH oriented so that its C2′/C3′-hydroxyls would bond to R209, rather than the ring-based oxygen. On the one hand, the very fact that these were identified by BH suggests that this is the preferred orientation. However, on the other hand, the fact that these geometries were used as starting structures in connecting the pathway, and that structures with the ring-based oxygen bonded to R209 were therefore precluded, suggests that there may have been a bias against identifying these latter structures. With that caveat in place, it is nevertheless notable that no structures were identified that had the nicotinamide head of NADH past the double phe-gate and the ring-based oxygen of the ribose bound to R209 simultaneously.

This apparent selection for a particular orientation of the NADH molecule as it approaches haem suggests that R209 not only helps to bring NADH and haem into close proximity by hydrogen-bonding to both simultaneously, but also plays a role in ensuring that NADH obeys some aspect of stereospecificity. On the face of it, this result is surprising because the deuterium labelling experiments detailed in Section 5.3 showed that both the (R)- and the (S)-hydrides of NADH could be transferred over to haem. However, it was also shown in these deuterium labelling experiments that transfer from the (R)-position was far slower than that from the (S)-position. It is perhaps the case that R209 plays some role in this difference, although it is not apparent how or for what reason.

Two mutations were applied to R209.
One was to mutate to alanine, which was of interest because it reduced the hydrogen-bonding capabilities to zero, and significantly reduced the steric bulk. In case this mutation was to prove too disruptive when applied in the laboratory, a less drastic mutation to lysine was also investigated. This residue retains the ability to hydrogen-bond, but only via one amino side chain group, rather than via the three present in the guanidino group of arginine. It would therefore not be possible for this lysine group to anchor both NADH and haem in place at the same time.

The final residue of interest was the one adjacent to R209, Q210. This residue was shown to change conformation considerably as NADH progressed through the pocket. Generally, for minima in the holo-HemS (i.e. without NADH) database, Q210 hydrogen-bonds to E288, and thus points away from the main cavity. Such a conformation has been termed the ‘resting position’ for Q210 by the author. The top left image of Fig. 7.7 shows that Q210 remains in this resting position when NADH is at the edge of the pocket. However, as NADH begins to unfold, Q210 swings round to hydrogen-bond first to the G311 backbone. As NADH extends along the pocket still further, Q210 flips over again, so that it can hydrogen-bond to both G311 and the NADH diphosphate backbone. This change most likely stabilises NADH enough for it to overcome its final barriers to full extension. Once NADH is fully extended, Q210 swings back to its original resting position, hydrogen-bonded to E288. This configuration perhaps encourages NAD+ to fold back up after the transfer of hydride to haem. Since Q210 appears to have a key role in stabilising extended conformations of NADH, it was hypothesised that its removal may disrupt, or even stop, the transfer of hydride from NADH to haem. Q210A was therefore chosen as a mutant to study as a change to alanine would remove this residue’s hydrogen-bonding capabilities.
Figure 7.7: Selected minima highlighting the conformational changes of Q210 as NADH progresses through the pocket. Haem is represented with a magenta skeleton and NADH with an orange skeleton. Residues Q210 (centre, cyan), E288 (left, cyan) and G311 (right, cyan) are represented explicitly. Top left: NADH is folded and at the edge of the pocket. A hydrogen from the –NH2 group of the nicotinamide hydrogen-bonds with the G311 backbone. Meanwhile, Q210 is hydrogen-bonded to E288. Top right: NADH has partially unfolded and progressed further along the pocket. Its nicotinamide is no longer hydrogen-bonded to G311. However, Q210 has swung around to hydrogen-bond with G311, thus abandoning E288. Bottom left: NADH has extended further into the pocket. Q210 has flipped over to another conformation, so that the hydrogen of its –NH2 group now bonded to G311 is the opposite one to that bound previously. The hydrogen atom that had originally been hydrogen-bonded to G311 is therefore freed, and bonds to the diphosphate backbone of NADH. This presumably helps to stabilise NADH as it occupies a high-energy extended conformation, and perhaps helps it to overcome some further barriers to achieve full extension. Bottom right: NADH is now fully extended within the pocket. Q210 has swung back round to hydrogen-bond to E288.

Analysis of the stationary points and pathways generated for the WT HemS system had therefore helped to develop a deeper understanding of residue-ligand bonding within the protein pocket. Furthermore, it is unlikely that the behaviour of Q210 would have been identified without a fully connected pathway, as such a pathway allowed for the residue (and its changing conformations) to be monitored continuously as NADH progressed along the pocket. In all, this analysis had identified a set of residues (F104, F199, R209 and Q210) with interesting properties, which were thought to be suitable for parallel computational and experimental mutagenesis studies.

7.4 NADPH

Before examining any of the mutated (or homologue) systems, a scenario where NADH was replaced with NADPH was briefly considered. Experimental studies had revealed that NADPH can react with haem to give the same haem breakdown product as NADH. In essence, this result is unsurprising given that both molecules have similar structures and are capable of donating hydrides. However, these two molecules tend to have different roles in the cell, with NADH primarily being used for catabolic purposes, and NADPH for anabolic purposes. Both NADH and NADPH are found almost ubiquitously in cells, but there are many cases in biology of an enzyme being specific for only one of these molecules. Typically, this specificity occurs when the extra phosphate group of NADPH (see Fig. 1.19) interferes within the binding pocket. Computational studies of the WT HemS system with NADH suggest that this extra phosphate group would point away from the pocket and is therefore unlikely to destabilise NADPH docking. Though checkspmutate is capable of ‘mutating’ ligands (as long as the AMBER parameters for the new ligand are uploaded) as well as protein residues, the decision was made to not do a full study of the system with NADPH. However, a selection of stationary points was studied, and showed that the extra phosphate group does indeed point away from the pocket. In fact, this phosphate group was shown to be capable of forming a stabilising hydrogen-bond with the hydroxyl group of the T312 residue, perhaps indicating that NADPH is the natural ligand after all, and not NADH. This bond is shown in Fig. 7.8.
7.5 Mutant and Homologue Systems

7.5.1 Analysing the Databases

Using the WT HemS system as a template, databases were generated and grown for the F104A, F104AF199A, F104I, F199A, R209A, R209K and Q210A mutants, and the HmuS, ChuS and ShuS homologues. The databases are at various stages of completion. Those databases describing the mutants can all be considered ‘complete’, as their disconnectivity graphs (shown in Figures 7.9 and 7.10) display clearly the general features of the system, and the rate constants for the pathways investigated have converged (footnote c). The databases describing the homologues cannot be considered complete. In each case, the disconnectivity graphs, though showing many mis-assigned minima, are still able to reveal interesting general features of the homologues. However, successive attempts to refine the pathways of interest for these homologues were still giving large changes to the features of these pathways and their rate constants. Further sampling (beyond the scope of this project) will therefore be required to complete these databases.

Figure 7.8: Selected minimum showing NADPH extended along the WT HemS pocket. The extra phosphate group NADPH has with respect to NADH is shown hydrogen-bonding to the T312 residue of HemS. This phenomenon suggests that NADPH is perhaps even more suited to docking inside the HemS pocket than NADH, suggesting that it is the natural ligand instead.

The disconnectivity graphs for the mutants and homologues were each colour-coded according to the haem-NADH distance, as defined in Fig. 7.3 for the WT HemS system. Two minima are highlighted on each graph, labelled S for start, and F for finish, respectively. For the original, template WT HemS system (shown again in Fig. 7.9), S represents a manually selected minimum which is folded up and at the edge of the pocket.
F, meanwhile, represents the minimum with the lowest NADH-haem distance in the entire database, as defined by the β-meso-carbon of haem and the hydride-bearing carbon of NADH (consistent with the colouring scheme for the disconnectivity graphs). These two minima, S and F, were those that were selected for the fastest pathway analysis, as it was thought that they were suitable representative structures for the endpoints of the movement of NADH within the protein pocket.

The disconnectivity graphs for the mutants and homologues were also labelled S and F. The minimum representing S in each case was the reoptimised S minimum from the WT HemS system. It was not possible in every case for the minimum representing F from the WT HemS system to be successfully reoptimised following mutation. Therefore, F for each mutated or homologue system was the one with the lowest NADH-haem distance, as was used to define F in the WT HemS case. For some of these mutated systems, the minimum with the lowest NADH-haem distance coincided with that from the WT HemS system. As in the WT HemS case, these two S and F minima were those used to define the endpoints for any subsequent fastest pathway analysis.

Figures 7.12 and 7.13 show the fastest pathways calculated for the WT HemS and mutant systems studied. Due to their databases not being sufficiently refined, which can give rise to artificial barriers and kinetic traps, pathways for the homologues are not provided.

Footnote c: However, it must be noted that the databases describing these systems cannot be considered as being truly complete, due to the very high degrees of freedom these systems each contain. Though this observation implies that there are many possible unsampled minima and transition states in each of these systems, the assumption is made that the refinement schemes were robust enough to identify those structures most relevant to the lowest energy pathways.
For those pathways that are provided, the starting minima (point 0 along the integrated path length) and the finishing minima (the last point along the integrated path length for that system) correspond to the S and F minima in the disconnectivity graphs for their respective systems.

Figure 7.9: WT, F104A, F104AF199A and F104I HemS Disconnectivity Graphs. [Panels: Wild Type HemS, F104A, F104AF199A, F104I. Each panel marks minima S and F, carries a 10 kcal/mol scale bar, and is colour-coded by NADH-Haem Distance on a scale from 4.0 to 20.0.]

Figure 7.10: F199A, R209A, R209K and Q210A HemS Disconnectivity Graphs. [Panels: F199A, R209A, R209K, Q210A. Each panel marks minima S and F, carries a 10 kcal/mol scale bar, and is colour-coded by NADH-Haem Distance on a scale from 4.0 to 20.0.]

Figure 7.11: HmuS, ChuS and ShuS Disconnectivity Graphs. [Panels: HmuS, ChuS, ShuS. Each panel marks minima S and F, carries a 10 kcal/mol scale bar, and is colour-coded by NADH-Haem Distance on a scale from 4.0 to 20.0.]

Figure 7.12: Dijkstra fastest pathways between minima S and F for WT, F104A, F104AF199A and F104I HemS. ΔE represents the overall potential energy change. Blue boxes highlight a region of NADH phosphate backbone twisting common to all systems, and green boxes highlight a region of NADH ribose twisting, also common to all systems. [Values printed on the panels, in extraction order: ΔE = −60.43, −41.28, −50.43 and −48.01 kcal mol⁻¹.]
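The caption of Figure 7.12 refers to Dijkstra fastest pathways between S and F. The following is a minimal sketch of how such a path can be extracted from a small kinetic transition network, and is not the implementation used in this work. It assumes that each edge is weighted by the negative logarithm of a branching probability, so that the lowest-cost path is the one with the largest product of branching probabilities. The function names (`branching_weights`, `dijkstra_path`), the four-minimum toy network and its rate constants are all hypothetical.

```python
import heapq
import math


def branching_weights(rates):
    """Convert rate constants into non-negative edge weights.

    rates[a][b] is a (hypothetical) rate constant for the step a -> b.
    ASSUMPTION: the weight of a -> b is -ln(P_ab), where
    P_ab = k_ab / sum over c of k_ac is the branching probability.
    """
    edges = {}
    for a, out in rates.items():
        total = sum(out.values())
        edges[a] = [(b, -math.log(k / total)) for b, k in out.items()]
    return edges


def dijkstra_path(edges, start, finish):
    """Return (cost, path) for the lowest-cost route from start to finish."""
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == finish:
            break
        for neighbour, weight in edges.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if finish not in best:
        return math.inf, []
    # Walk back from finish to start to recover the path.
    path = [finish]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return best[finish], path


# Toy network (invented numbers): S and F are the endpoints,
# A and B are intermediate minima.
toy_rates = {
    "S": {"A": 9.0, "B": 1.0},
    "A": {"S": 1.0, "F": 1.0},
    "B": {"S": 1.0, "F": 9.0},
    "F": {"A": 1.0, "B": 1.0},
}
cost, path = dijkstra_path(branching_weights(toy_rates), "S", "F")
print(path, math.exp(-cost))  # path and its product of branching probabilities
```

In this toy case the route through A is preferred because the product of its branching probabilities (0.9 × 0.5) exceeds that of the route through B (0.1 × 0.9). An insufficiently sampled database can distort such weights, which is consistent with the decision above not to report pathways for the homologues.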
Figure 7.13: Dijkstra fastest pathways between minima S and F for F199A, R209A, R209K and Q210A HemS. ΔE represents the overall potential energy change. Blue boxes highlight a region of NADH phosphate backbone twisting common to all systems, and green boxes highlight a region of NADH ribose twisting, also common to all systems. [Values printed on the panels, in extraction order: ΔE = −53.14, −47.93, −47.06 and −45.84 kcal mol⁻¹.]

When its disconnectivity graph is compared against the other systems, it is apparent that the WT HemS system contains many more low-lying minima with short NADH-haem distances. Though it is not complete, it appears that the HmuS graph is the only other one that comes close to WT HemS in this regard. This result would suggest that the WT forms for both of these proteins are efficiently optimised to bring NADH deeper into the pocket, and that any single-point mutations can disrupt this fine balance. Analysis of the fastest pathways for WT HemS and its mutants supports this conclusion: the movement of NADH towards haem in the WT HemS pocket results in an overall potential energy change of −60.43 kcal mol⁻¹, which is a steeper decrease than for any of the mutants studied.

Though not complete, certain conclusions can be drawn from the ChuS and ShuS disconnectivity graphs. It is clear from each that the approach of NADH towards haem does not give a potential energy decrease to the same extent as any of the HemS systems, nor of HmuS. Indeed, in the case of ShuS, minimum F is actually higher in energy than minimum S. In both the ChuS and ShuS databases, there are few minima where the NADH-haem distance is small, suggesting that there is a relatively small drive for NADH to approach and break down haem in these proteins.
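The comparison of overall potential energy changes can be checked with a few lines of arithmetic. The sketch below ranks the ΔE values printed on Figures 7.12 and 7.13. Only the WT value (−60.43 kcal mol⁻¹) is assigned explicitly in the text; pairing the remaining values with systems in caption order is an assumption made here for illustration.

```python
# Overall potential energy changes (kcal/mol) printed on Figures 7.12 and 7.13.
# ASSUMPTION: values are paired with systems in the order the captions list them;
# only the WT value is confirmed by the running text.
delta_e = {
    "WT": -60.43,
    "F104A": -41.28,
    "F104AF199A": -50.43,
    "F104I": -48.01,
    "F199A": -53.14,
    "R209A": -47.93,
    "R209K": -47.06,
    "Q210A": -45.84,
}

# Most negative (steepest decrease) first.
ranked = sorted(delta_e.items(), key=lambda item: item[1])
steepest = ranked[0][0]

# How much of the WT stabilisation each mutant loses (positive = less favourable).
loss_vs_wt = {
    name: round(value - delta_e["WT"], 2)
    for name, value in delta_e.items()
    if name != "WT"
}

for name, value in ranked:
    print(f"{name:12s} {value:8.2f}")
```

Under the stated pairing, every mutant gives a less negative ΔE than the WT, in line with the conclusion drawn above.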
The disconnectivity graph for ShuS is complete enough to show a funnel developing. It is noteworthy that this funnel is totally removed from the basin that contains minimum F, as it suggests that there is a different function this protein may be engaged in. Furthermore, the barrier to move from S to F (i.e. the conventional unfolding of NADH and its approach to haem) is very large, whereas the route from S to the most stable structure appears to be almost entirely downhill. The fastest pathway to move from S to this lowest energy minimum was therefore calculated. Rather than charting movement of NADH through the protein pocket, this pathway instead showed significant movement of the N-terminal α-helix, which opens up the smaller cavity. Snapshots of this pathway are shown in Fig. 7.14. The stabilisation brought about by this α-helix movement suggests that ShuS may be priming itself to bind ligands within this smaller cavity. Analysis with bioinformatics (see Section 8.4 in the following chapter) strongly suggests that the preferred ligand is double-stranded DNA.

Figure 7.14: Overlay of the starting, S (green cartoon), and lowest energy, L (cyan cartoon), minima from the ShuS database. Left: The haem and NADH conformations forming part of the S minimum are colour-coded in magenta and orange, respectively. For the L minimum, haem and NADH are colour-coded in yellow and dark blue, respectively. In the L minimum, NADH is further inside the pocket, but the nicotinamide head is still folded back towards the phosphate backbone. The most significant region of structural change between the S and L minima is in the N-terminal α-helix, highlighted by a black box. Right: Reverse angle, magnified image of the entrance to the small cavity. This image shows that the movement of the N-terminal α-helix in the transition from the S minimum to the L minimum results in a significant enlargement of the entrance to the small cavity.

The plotted pathways in Figures 7.12 and 7.13 show that the fastest pathway is roughly the same length for the WT and the mutants (ranging from approximately 1600–1900 stationary points). The shortest pathway is with F104AF199A. It would seem that the removal of the entire double phe-gate opens up the cavity such that NADH can move through it and approach haem more directly.

There are two highlighted regions in each of these plotted pathways, indicated by blue and green boxes, respectively. The blue boxes cover the region of the pathway in which a certain twisting motion of one of the NADH phosphate-ribose linking bonds takes place. This region is highlighted because the twist is typically a high-energy process, and it often forms the highest energy region across the entire pathway. All mutants retain this region, but the energy required to traverse it varies. For the F104A and F104I mutants, the overall barrier is particularly high, suggesting that the F104 residue is important for stabilising this high-energy backbone twist. The key stationary points involved in this twisting process are shown in Fig. 7.15, as demonstrated with the WT HemS system. This figure shows that, as the amide of the nicotinamide unfolds and thus moves away from the phosphate backbone, where it was engaged in a stabilising hydrogen-bond, the energy increases, giving the maximum along the region of the pathway encapsulated by the blue box. The system is then stabilised when the C11–C12 dihedral angle shifts from −85.9° to −177.8°. None of the residues that were selected for mutation appear to be directly involved in this process.

The other region common to all of the pathways, and highlighted inside a green box in Figures 7.12 and 7.13, is concerned with a change of conformation of the nicotinamide-bearing ribose.
It occurs late on in the pathway in each case, where the overall energy of the system tends to be lower, and yet it constitutes a significant barrier.

Figure 7.15: Illustration of the high-energy NADH conformational change that occurs along the fastest pathway, as encapsulated by the blue box for the WT system in Fig. 7.12. The green representation is for the low-lying minimum at position 529 along the pathway before the barrier (−15227.52 kcal mol⁻¹), the cyan representation for the highest transition state at position 584 (−15201.10 kcal mol⁻¹), and the magenta representation for the low-lying minimum at position 609 on the other side of the barrier (−15222.45 kcal mol⁻¹). Top left: Overlay of these three stationary points. Top right: Minimum 529. One of the hydrogen atoms of the nicotinamide amide hydrogen-bonds to both the cyclic oxygen of the ribose and to the closest backbone phosphate oxygen. The bond lengths are 3.4 and 4.3 Å, respectively. The dihedral angle around the C11–C12 covalent bond (i.e. the bond linking the phosphate backbone to the nicotinamide-bearing ribose) is −83.1°. Bottom left: Transition state 584. The nicotinamide amide group has moved away from the backbone, increasing the bond length from the nicotinamide amide hydrogen to the closest backbone phosphate oxygen from 4.3 to 4.8 Å. This hydrogen-bond therefore becomes less stabilising. The C11–C12 dihedral angle has remained approximately the same (shifting from −83.1° to −85.9°), giving a high energy conformation. Bottom right: Minimum 609. Twisting this C11–C12 bond (changing the dihedral angle from −85.9° to −177.8°) stabilises the structure again.
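The size of the blue-box barrier follows directly from the three energies quoted for Fig. 7.15. The short sketch below works through that arithmetic; the helper name `barrier_summary` is hypothetical, and the calculation treats these three stationary points in isolation rather than accounting for the many intervening stationary points along the pathway.

```python
def barrier_summary(e_before, e_ts, e_after):
    """Return (forward barrier, reverse barrier, net change), all in kcal/mol.

    Hypothetical helper: compares one transition state with the low-lying
    minima on either side of it along the pathway.
    """
    forward = round(e_ts - e_before, 2)
    reverse = round(e_ts - e_after, 2)
    net = round(e_after - e_before, 2)
    return forward, reverse, net


# Potential energies quoted for Fig. 7.15 (WT HemS, kcal/mol).
minimum_529 = -15227.52
ts_584 = -15201.10
minimum_609 = -15222.45

forward, reverse, net = barrier_summary(minimum_529, ts_584, minimum_609)
print(f"forward barrier: {forward} kcal/mol")
print(f"reverse barrier: {reverse} kcal/mol")
print(f"net change:      {net} kcal/mol")
```

On these numbers, the transition state lies about 26 kcal mol⁻¹ above minimum 529, and minimum 609 sits about 5 kcal mol⁻¹ above minimum 529.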
Residue R209 is closely involved with the process, binding alternately to the C2′-hydroxyl oxygen and C3′-hydroxyl oxygen of the ribose as the NADH conformation changes. Mutation of this residue to a lysine (R209K) does not significantly change the shape or size of this barrier, but a mutation to alanine (R209A) was shown to reduce it significantly. R209 is a very well conserved residue (see Section 8.3 for details), and so despite the alanine residue providing a smoother alternative for this particular region of the fastest pathway, there must be other reasons for HemS and its homologues to retain it so consistently. Some possible reasons, such as stereospecific considerations, were suggested in Section 7.3. The conformational changes of NADH involved in this highlighted green region, along with the involvement of R209, are shown in Fig. 7.16, as demonstrated with the WT HemS system.

7.5.2 Lowest Energy Minima from Each Database

The disconnectivity graphs in Figures 7.9 and 7.10 show that the lowest energy minimum does not always have a short NADH-haem distance. This feature is typically seen for those mutants where the double phe-gate has been mutated in some way. The lowest energy minima from each of these databases were extracted, and are shown in a reduced form in Fig. 7.17.

For those cases where the double phe-gate has not been mutated (WT, R209A, R209K and Q210A), atoms C5 (a methyl carbon) of haem and (R)-hydride of NADH are in close proximity, with the hydride pointing directly at this methyl carbon. Given these are the lowest energy minima in their respective databases, this result would suggest that these minima are the true endpoints of the movement of NADH through the HemS pocket, and that hydride transfer is to this methyl carbon of haem, rather than the β-meso-carbon, as previously assumed. There is no experimental evidence to definitively show which haem carbon is attacked.
Were it to be via this methyl carbon, then some sort of rearrangement would be required to bring about the cleaving of the ring at the β-meso-carbon position, which experiment has shown does occur. Furthermore, given the reaction can proceed with either the (R)- or the (S)-hydride of NADH (see Section 5.3), an alternative mechanism whereby the (S)-hydride is brought into close proximity to this haem methyl carbon (of which there is little evidence in the databases studied) would be required.

When the double phe-gate is disrupted, the lowest energy structure can change dramatically. For F104A, F104I and F199A, these structures occur before NADH has passed through the gate. This information suggests that the double phe-gate does indeed organise the energetics of the pocket to facilitate the controlled movement of NADH towards haem. Interestingly, the lowest energy structure of the F104AF199A double mutant shows NADH unfolded to the same extent as the cases where the double phe-gate is not disrupted at all, but its hydride is not directly pointing towards the C5 methyl carbon of haem. This finding therefore suggests that the double phe-gate not only facilitates favourable energetics for NADH to unfold and approach haem, but that it also plays a role in orienting the nicotinamide head once in that close proximity.

Figure 7.16: Illustration of the high-barrier nicotinamide-bearing ribose conformational change which occurs along the fastest pathway, as encapsulated by the green box for the WT system in Fig. 7.12. Segments of haem, NADH and R209 (functional group only) are shown. The green representation is for the low-lying minimum at position 1221 along the pathway before the barrier (−15263.39 kcal mol⁻¹), the cyan representation for the highest transition state at position 1276 (−15229.24 kcal mol⁻¹), and the magenta representation for the low-lying minimum at position 1329 on the other side of the barrier (−15262.52 kcal mol⁻¹). Top left: Overlay of these three stationary points. Top right: Minimum 1221. Residue R209 is hydrogen-bonded to both of the oxygen atoms of the deeply buried propionate of haem. It is also hydrogen-bonded to the C3′-hydroxyl oxygen of the ribose. Bottom left: Transition state 1276. NADH has twisted away from R209, leaving the cyclic oxygen of the ribose closest. Bottom right: Minimum 1329. NADH has twisted further, resulting in the C2′-hydroxyl oxygen of the ribose hydrogen-bonding with residue R209 instead. This conformation is primed to unfold, with few large barriers to traverse along the remainder of the pathway. As part of the NADH unfolding process, R209 bonds again to the C3′-hydroxyl oxygen, as shown in Fig. 7.6.

Figure 7.17: Lowest energy structures for the respective WT and mutated HemS systems (panels: Wild Type, F104A, F104AF199A, F104I, F199A, R209A, R209K, Q210A), represented in a reduced form. Haem is represented in magenta, NADH in orange, and residues 104 and 199 in cyan. Methyl carbon C5 of haem (see Fig. 1.1 for a labelled diagram of the haem structure) and the (R)-hydride of NADH are shown as expanded spheres. In the WT, R209A, R209K and Q210A cases, these two atoms are in close proximity. Given that these are the lowest energy minima, this result suggests this location may be where the hydride transfer occurs. For F104A, F104I and F199A, the lowest energy structure occurs before NADH has passed through the gate, suggesting that these mutations give proteins that are not as suitable for facilitating hydride transfer.

7.6 Discussion and Summary

Computational work helped to deepen an understanding of the WT HemS system, as well as of selected mutants and homologues.
Taking advantage of GPU implementation, as well as development of the algorithms discussed in the previous chapter, the [WT HemS + Haem + NADH] database was grown to a much greater extent (approximately one thousandfold) than had previously been possible. Plotting this database as a disconnectivity graph revealed that the system exhibits funnel-like behaviour. Furthermore, Fig. 7.3 revealed that descent down the funnel correlates with a reduction of the NADH-haem distance. This structure implies that the approach of NADH towards haem within the pocket is a favourable process, strengthening the case that HemS is indeed an enzyme that has evolved to bring NADH and haem together in order to react. From this database, a fastest pathway could be identified, which clearly showed a favourable route with few large barriers for NADH to unfold and approach haem.

Having a large database also allowed for the behaviour of the double phe-gate to be investigated more fully. This analysis showed that, in the absence of NADH, the gate is typically fully open. However, once NADH is introduced at the edge of the pocket, F104 flips to a closed conformation, resulting in a T-shaped pi-pi bonding interaction between the two gates. Only once NADH is in close proximity to these gates does F104 open up again in order to allow it to pass through to haem. This behaviour is due to a series of intramolecular residue interactions instigated by the presence of NADH, as described in Fig. 7.5. It seems that the overarching purpose of these interactions is to allow NADH to be held in place effectively throughout its time inside the pocket.

The double phe-gate was clearly of importance to the reaction and so residues F104 and F199 were chosen for mutagenesis studies. Two other residues were also selected. The first, R209, was chosen because of its ability to bind to both NADH and haem when they are in close proximity.
It was therefore thought that this residue may be important in bringing the two ligands close to one another, or orienting them in a specific way. The second was Q210, which was selected due to the significant conformational changes it engages in, seemingly to stabilise the phosphate backbone of NADH at certain stages along the pathway.

Before engaging in mutagenesis, a short study on NADPH was performed, with the aim being to explain the experimental finding that both NADH and NADPH can engage in the breakdown of haem. This study showed that the extra phosphate group of NADPH points away from the centre of the pocket, thus explaining why both of these ligands can access haem and react in the same way. It was further shown that this extra phosphate group can hydrogen-bond to residue T312, suggesting that NADPH association with HemS is more stabilising, and it is therefore more likely to be the natural ligand, compared to NADH.

Using checkspmutate and the WT HemS system as a template, databases were created and refined for the F104A, F104AF199A, F104I, F199A, R209A, R209K and Q210A mutants, and the HmuS, ChuS and ShuS homologues. The databases describing the mutants can all be considered as being complete. The disconnectivity graphs arising from these databases (see Figures 7.9 and 7.10) revealed that each had a far smaller proportion of minima with low NADH-haem distances compared to the WT, suggesting they were less capable of directing NADH towards haem. The mutations to F104 and F199 (other than F104AF199A) were shown to be particularly disruptive, where the lowest energy minima showed NADH still partially folded up, with the nicotinamide head pointing away from haem.

The Dijkstra fastest pathway between two minima specially selected to demonstrate NADH unfolding and approach to haem was derived from each database describing a mutant, and compared against the WT pathway.
These pathways showed that the change in potential energy, ΔE, was negative for each mutant, but not by as much as for the WT. The two most prominent barriers in the WT were highlighted, and the effect of each mutation on these barriers investigated. The barrier earlier in the pathway pertained to a twisting motion of the NADH phosphate-ribose linking bond. This barrier grew significantly when F104 was mutated, demonstrating that this phe-gate is important for stabilising the dihedral changes required of NADH to allow it to unfold. The second highlighted barrier was concerned with the conformation of the nicotinamide-bearing ribose, a process which a mutation to the R209 residue disrupts.

Though they were not entirely complete, certain conclusions could be drawn from the databases describing HmuS, ChuS and ShuS. The disconnectivity graph for HmuS looked very similar to the one for WT HemS, containing a wide funnel with many minima where the two ligands are in close proximity, suggesting that these systems behave in a similar manner. The disconnectivity graphs for ChuS and ShuS, however, were entirely different. Neither had an obvious funnel, and each had very few minima where NADH and haem were close. The appearance of these graphs strongly suggests that these two proteins are not optimised to facilitate the reaction of NADH with haem. This structure does not preclude their ability to perform this reaction (as shall be demonstrated in Chapter 9), but it does suggest these proteins have different possible functions. Analysis of the lowest energy structures of the ShuS database gave an indication as to what this alternative function could be. These structures showed the N-terminal α-helix in a markedly different position from where it is typically located, with this movement corresponding to an opening up of the smaller cavity.
The following chapter shall show, via bioinformatics, why this is likely to be part of an alternative DNA-binding function.

Overall, computational analysis therefore provided a number of predictions to be tested in the laboratory. All of the mutants were demonstrated to lessen, but most likely not entirely prevent, the ability of NADH to access haem. It also showed that, of the homologues, HmuS was most likely to engage in this NADH-dependent haem breakdown reaction but that ChuS and ShuS might be more limited by competing alternative functions.

Chapter 8

Bioinformatic Study of HemS Homologues

The bioinformatics work in this chapter was undertaken by Yuhang Xie (King’s College, Cambridge, 2020-21) as part of his Master’s project, under the day-to-day supervision of the author. The following discussion is original to the author apart from the figures and passages highlighted.

8.1 Aims

HmuS, ChuS and ShuS had been chosen as alternative proteins to HemS for computational study, as they were the best characterised in the literature. However, these homologues together only provide a small snapshot of the overall phylogenetic context of HemS. Given the advances in the field of bioinformatics since Choy’s original study,21 it was reasoned that a fresh study could shed new light. With proteomic databases having expanded significantly, it was possible new proteins had been identified with NADH-binding homology to HemS. A further aim was to identify a consensus sequence within the HemS family and determine whether the residues selected for mutation are conserved. Due to the reported DNA-binding properties of ShuS and PhuS (see Sections 1.4.5 and 1.4.6 in the Introduction, respectively) an investigation of possible DNA-binding sites was also conducted. To summarise, the aims of the bioinformatic investigation were:

1. To investigate the phylogenetic context of HemS.

2. To search for newly-discovered proteins in the literature with possible links to NADH metabolism or NADH binding modes homologous to those in HemS.

3. To identify a consensus sequence for HemS and use this information to determine whether the residues predicted by computation to be important for NADH-binding are conserved.

4. To investigate possible regions for DNA-binding in HemS and its homologues.

Outcomes from this research are described in the following sections.

8.2 Phylogenetics

Xie began by placing HemS within its phylogenetic context. He amassed homologous sequences of HemS using the default settings in pBLAST,241 except that the maximum number of sequences was increased to 5000. The sequences identified came from a total of 218 different genera. One sequence was manually selected from each of these genera, and a maximum-likelihood phylogenetic tree created in MEGA-X242 using default settings, except that the number of bootstrap replications was increased to 2000. iTOL243 was then used to visualise, manipulate and export the tree. This tree is provided in Fig. 8.1.

Fig. 8.1 clearly shows that HemS and HmuS are closely related, as are ChuS and ShuS. This result is not surprising, considering that HemS/HmuS both belong to the Yersinia genus, and ChuS/ShuS respectively belong to Escherichia and Shigella, which are, as noted in Section 1.3.5.1, arguably the same genus. Interestingly, though, these HemS/HmuS and ChuS/ShuS pairs are shown to be only distantly related to one another, with many intervening genera between them. The same is true for PhuS, which is shown to be only distantly related to both the HemS/HmuS and ChuS/ShuS pairs. This situation serves to emphasise the sheer number of bacterial species which contain a version of this protein.
Those that appear in genera intervening between the HemS/HmuS and ChuS/ShuS pairs can be reasonably assumed to engage in anaerobic haem breakdown, since all four of these proteins have been shown to catalyse this reaction with NADH (see the next chapter for details). Xie further noted that these homologues originated from both pathogenic and non-pathogenic bacteria, suggesting these proteins (and the operons they form part of) do not necessarily adversely affect the host.

Xie also noted that all 5000 of the homologues were listed as either haem transport or hemin degradation factors. It is interesting that so many of these homologues have been listed as haem degrading enzymes, without any apparent experimental evidence. This classification was probably the result of Stojiljkovic & Hantke’s original investigations60 into the hem operon, where they speculated that HemS itself could be a haem degrading enzyme (see Section 1.4.1). It is therefore satisfying that such a haem breakdown process has now been identified in at least some of these homologues (i.e. HemS, HmuS, ChuS and ShuS), using NADH as a hydride donor. This result is remarkable because none of the homologues gathered by Xie are annotated as being able to bind NAD(P)H.

Figure 8.1: Maximum likelihood phylogenetic tree for HemS and its homologues. 218 different genera are represented, including both pathogenic and non-pathogenic bacteria. Selected species are highlighted. Red: Yersinia enterocolitica (HemS), Yersinia pestis (HmuS). Purple: Escherichia coli (ChuS), Shigella dysenteriae (ShuS). Green: Pseudomonas aeruginosa (PhuS). A fuller version, which includes the Accession Numbers for each sequence and the bootstrap values for the nodes, is included in Appendix E. Figure reproduced from Xie,81 with minor adaptations.
This observation implies that NADH binds in the HemS pocket in a manner not seen before, as Choy had inferred when his Relibase+ searches only gave 'weak' hits. None of the homologues were known or speculated to bind to nucleic acids either (apart from ShuS and PhuS), a feature which shall be of interest in Section 8.4.

8.3 Sequence Analysis and Conservation

The 218 sequences selected to construct the phylogenetic tree were aligned, and the consensus sequence was taken to be 345 residues long. This alignment revealed that 177 of the 345 residues were conserved where the consensus threshold was set to 50%, that 105 were conserved where the threshold was set to 70%, and that 59 were conserved where the threshold was set to 90%. Of those very highly conserved residues, 44.1% were involved in the large cavity and 27.1% in the small cavity. Fig. 8.2 shows the residues which form the large and small cavities respectively, as assigned by MetaPocket [244].

Those residues that were suspected to be involved in haem-binding (see Fig. 1.14) showed remarkable conservation. R102, H196 and R209 all showed 100% conservation [a] across the 218 homologues, and K294 and R321 98.1% and 99.1%, respectively. The residue at 318 showed 69.7% conservation for phenylalanine, although HemS has a tyrosine at this position. Tyrosine and phenylalanine are both aromatic residues, and so presumably this change does not significantly alter the haem-binding properties. Together, phenylalanine and tyrosine account for 97.2% of residues at this position. Q316, meanwhile, is not as strictly conserved across the homologues, at 71.1%.

Those residues identified by Choy [21] as possibly being involved in NADH-binding (see Fig. 1.26) did not show as high levels of conservation. These, respectively, were: Q132, 29.8%; S171, 4.1%; K203, 43.1%; R250, 61.5%; and T312, 30.7%.
The fact that conservation of haem-binding residues is so much higher than the conservation of the purported NADH-binding residues suggests that all of the homologues are involved in haem-binding, but not all of them are haem degrading enzymes, or at least not all of them have haem degradation as their primary function.

[a] Unless otherwise stated, all % conservation values listed are % identities rather than % similarities.

Figure 8.2: PyMOL representation of HemS, using structure 2J0R [95] from the Protein Data Bank. Residues were assigned to the large or small cavities using the default values in MetaPocket [244]. Those residues determined to be part of the large cavity are highlighted in cyan, and those determined to be part of the small cavity are highlighted in salmon. Where MetaPocket assigned a residue to both cavities, the residue was manually assigned to the cavity where its ranking was highest.

It was also important to note the extent of conservation of the residues that had become the focus of both the computational and experimental investigations in this thesis. The first, R209, which was already mentioned above as a haem-binding residue, is 100% conserved, suggesting that it is important at least for binding haem, if not for enzymatic activity. Q210 is only conserved in 42.2% of the homologues, suggesting it is not as important for protein function. This conclusion assumes, however, that all of the homologues are designed to bind to NADH. If this is not the case, then it is not particularly surprising that Q210, a residue predicted to bind to NADH, and not to haem, is not as strictly conserved. Importantly, this residue is conserved in HmuS, ChuS and ShuS, which were all shown to be capable of utilising NADH to break down haem (see Section 9.4 in the next chapter for details).
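As an illustration only, the kind of counting that underlies the conservation values in Section 8.3 can be sketched as follows. This is not the procedure used by Xie: the function names and the five-sequence toy alignment are hypothetical, conservation is taken as % identity with the most common residue in each column, and gaps are assumed to count against conservation.

```python
from collections import Counter


def column_conservation(alignment):
    """Return (most common residue, fraction of sequences carrying it) per column.

    Gaps ('-') are never reported as the consensus residue, but gapped
    sequences still count in the denominator (an assumption of this sketch).
    """
    n_seqs = len(alignment)
    result = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
        else:
            residue, count = "-", 0
        result.append((residue, count / n_seqs))
    return result


def count_conserved(alignment, threshold):
    """Number of columns whose consensus residue meets the identity threshold."""
    return sum(1 for _, frac in column_conservation(alignment) if frac >= threshold)


def grouped_conservation(alignment, position, residues):
    """Fraction of sequences with any of `residues` at a 0-based column index."""
    column = [seq[position] for seq in alignment]
    return sum(1 for res in column if res in residues) / len(column)


# Hypothetical toy alignment (five sequences, four columns); not real HemS data.
toy_alignment = ["RFHY", "RFHF", "RFHF", "RYQF", "RF-H"]

for threshold in (0.5, 0.7, 0.9):
    print(threshold, count_conserved(toy_alignment, threshold))

# Pooling aromatic residues, as was done for position 318 (Phe + Tyr).
print(grouped_conservation(toy_alignment, 3, "FY"))
```

Pooling residues in `grouped_conservation` mirrors the way phenylalanine and tyrosine were combined at position 318, and histidine and tyrosine with phenylalanine at position 199.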
Considering that they are not suspected to be directly involved in binding to haem or NADH, the two phenylalanine residues that make up the double phe-gate are remarkably well conserved. F104 is conserved across 98.6% of the homologues, and F199 across 82.6% of them (rising to 90.8% when the other possible aromatic residues, histidine and tyrosine, are included). This conservation lends weight to the notion that these two residues are performing some sort of regulatory function in controlling access of other ligands to haem. It is thought that the large cavity of HemS may have been the result of two smaller cavities merging in an ancestor, and that the two phenylalanines are situated where those cavities merged. Their presence at the centre of the pocket may therefore have originated as some sort of evolutionary 'accident'. However, their remarkably high rates of conservation would suggest that these two residues then became essential to protein function.

8.4 DNA-Binding Exhibited by Some Homologues

Sections 1.4.5 and 1.4.6 described the DNA-binding properties of ShuS and PhuS, and their link to iron and haem regulation, as discovered by the Wilks group. As far as the author is aware, no DNA-binding studies have been carried out on any of the other homologues (including HemS itself, HmuS and ChuS). No such studies were carried out in this work either. However, it was found that the expression and purification of ShuS (to be described in the next chapter) required DNase/RNase treatment while lysing the cells. The presumption is that this requirement arose because DNA was binding to ShuS and causing it to aggregate, a phenomenon not found in any of the other homologues.

It was remarkable that ShuS required this treatment with benzonase nuclease, and yet ChuS did not. The two homologues have 98.5% sequence identity, varying in only five of their residues.
Xie therefore used bioinformatics to investigate why two such similar proteins have such apparently different DNA-binding characteristics. Xie first showed that ChuS and ShuS have very similar predicted isoelectric points (5.5 and 5.7, respectively) and somewhat similar net charges at pH 7.4 (−12.7 and −10.7, respectively) [81]. The study was therefore extended to determine whether there were any specific areas of the respective homologue surfaces that would be suitable for DNA-binding. Coulombic surface charge calculations were carried out on the apo-forms of ChuS, ShuS, HemS and PhuS [b]. These calculations showed that both the large and the small cavity (as shown in Fig. 1.13) are cationic, in contrast to the rest of the surface, which is either anionic or neutral, making them prime candidates for binding to the highly anionic DNA. A charged surface for ShuS is shown in Fig. 8.3.

[b] Surfaces could readily be derived for ChuS, HemS and PhuS from their published crystal structures. HmuS was not included in this study because no such crystal structure was readily available. The same was true for ShuS, but because a comparison between the DNA-binding properties of ChuS and ShuS was sought, a ShuS structure was modelled upon the ChuS structure using SwissModel [245].

The DNAbind package [246] was then used to predict the number of DNA-binding residues for each homologue, estimating ten apiece for ShuS and ChuS, eight for HemS and seventeen for PhuS. For HemS, these residues were spaced far apart at seemingly random positions of the surface, suggesting that HemS did not have a contiguous DNA-binding surface. For PhuS, the binding residues clustered around the large cavity (suggesting that is where DNA would bind), whereas for ShuS and ChuS the residues clustered around the small cavity.
This clustering perhaps explains the laboratory-based findings by the Wilks group, where they determined that PhuS is sequence specific, but ShuS is not [102;112]. In addition to this specificity, PhuS can readily dimerise, and it is apparent from structural studies that the small cavity of PhuS is involved in this [111]. DNA-binding at this smaller cavity would interfere with this dimerisation process.

None of the predicted binding residues (Y18, R20, R29, R67, T205, R206, F243, G245, N246 and R247) overlap with the five residues that differ between ChuS and ShuS (ChuS/ShuS: A/V19, H/R46, T/A78, S/R123 and T/M151). However, two of those five residues are located close together in the small cavity. One of those residues is at position 123 and is a serine in ChuS but an arginine in ShuS. As serine is a polar, neutral residue but arginine is cationic, this substitution may explain why the ChuS pocket does not bind to anionic DNA but ShuS does. The other residue close by in this cavity that differs between the two homologues is at position 19. In ChuS, it is an alanine, but in ShuS it is a valine. Both are hydrophobic residues, but valine has significantly more steric bulk, which allows the valine in ShuS to interact with R123, whereas the alanine in ChuS is too far away to do the same with S123, as shown in Fig. 8.3. Having R123 anchored to the centre of the cavity may assist ShuS to bind to DNA, although further studies would be required to determine if this is indeed the case.

Wilks had noted that both ShuS and PhuS can bind to DNA when in their apo-forms but not when in their holo-forms [103;112]. Assuming it is true that DNA binds to the large cavity of PhuS, this behaviour would therefore seem to be a simple case of direct competition between DNA and haem. However, assuming that DNA binds to the small cavity of ShuS, there would be no such direct competition to explain this phenomenon.
Rather, for ShuS, it may be that certain conformational changes induced by haem-binding could preclude the small cavity from binding to DNA. To date, there are no crystal structures published for ShuS. Wilks was able to infer from circular dichroism that the secondary structure of ShuS does not change significantly with haem-binding [102]. Crystal structures for ChuS are available [80;98], which suggest minimal structural changes once haem binds. Haem docking to ChuS does, however, result in a subtle movement of the very first α-helix at the N-terminus, which serves to narrow the small cavity. This movement of the N-terminal α-helix reflects certain computational findings regarding ShuS, which were expressed in Section 7.5.1 of the previous chapter. Experiments are needed to investigate this effect further, but this conformational change caused by haem-binding may be enough to prevent DNA from binding to ShuS.

Figure 8.3: DNA-binding in ChuS and ShuS. Top left: Charged surface representation of ShuS, calculated using Chimera [247]. Red represents regions of negative charge, blue positive regions and white neutral regions. The small cavity is circled. Top right: Same structure, rotated 180°. The large cavity is circled. Bottom left: Small cavity of ChuS. Residues A19 and S123 are highlighted in cyan. Bottom right: Small cavity of ShuS. Residues V19 and R123 are highlighted in cyan. Both residues are larger than their equivalents in ChuS. It is perhaps the case that V19 anchors R123 in a position suitable for binding to DNA.

The fact that DNA can bind to the apo-forms of PhuS and ShuS, but not the holo-forms, points to another aspect of binding, which DNAbind unfortunately cannot capture. Namely, these predictions assume that DNA-binding itself does not lead to significant changes in the protein structure.
However, this assumption is not reflected by the experiments carried out by the Wilks group, which found that DNA-binding to ShuS led to significant aggregation [103]. The author experienced this aggregation as well when trying to lyse cells containing ShuS without DNase. It is perhaps the case that dimerisation/oligomerisation of ShuS gives a larger contiguous cationic surface capable of binding DNA [81]. Presumably, if haem is already bound to ShuS, the protein is afforded a certain structural integrity, which would prevent this collapse to a protein-DNA aggregate. This is perhaps, therefore, an alternative explanation for why apo-ShuS can bind to DNA, but holo-ShuS cannot.

The Wilks group have provided strong evidence that PhuS binds to the prrF1 promoter, which regulates haem flux [112]. However, ShuS has only been shown to bind DNA non-specifically. Furthermore, to reiterate, the research by Xie has suggested that PhuS and ShuS bind to DNA at different cavities. It is even possible that ShuS is intended to bind ssDNA or RNA rather than dsDNA.

Both experimental research by the author and bioinformatic studies by Xie suggest there is no obvious binding site for DNA in HemS or ChuS. Further research into this interesting area is required to establish the DNA-binding capabilities of these homologues.

8.5 Discussion and Summary

This chapter used recent advances in bioinformatics to study the phylogenetic context of HemS. Using pBLAST, the 5000 closest sequences to HemS were identified. These sequences came from a total of 218 different genera, including both pathogenic and non-pathogenic varieties. This diversity suggests that these proteins are not necessarily, or solely, for the extraction and breakdown of haem from host species. Of the 5000 sequences investigated, none had been reported as capable of catalysing the reductive degradation of haem.
The Jia group had identified CPR-NADPH as a possible oxidative agent for haem breakdown in ChuS, but the Wilks group, in similar studies on ShuS and PhuS, suggested this observation may just be a coupled oxidation reaction (see Section 1.4.6 for details). Other than these studies, it does not appear that NAD(P)H interactions with any of the homologues have been investigated. Coupled with the weak hits Choy received when using Relibase+ to investigate NADH-binding in the HemS pocket, this result implies that the method of NAD(P)H-binding inside HemS, and within a certain number of its homologues, is novel.

Analysis of the 218 sequences selected from the different genera revealed remarkable conservation of the haem-binding residues, therefore indicating that all of the homologues bind haem. However, the conservation of the NADH-binding residues was not as high, suggesting that not all of the homologues are haem breakdown enzymes. The conservation of F104 and F199, the two phenylalanine residues involved in the double phe-gate, was very high. It is interesting that these residues, which were thought to be important for controlling NADH access to haem, are conserved to a far higher extent than the supposed NADH-binding residues themselves. This conservation suggests that this double phe-gate has a role in controlling not just the access of NADH to haem, but of alternative ligands that some of the homologues are perhaps capable of binding to. Whatever the case, it is noteworthy that these two residues, which are not directly involved in haem-binding, should be so well conserved, suggesting that regulation of ligand access to haem is of foremost importance.

A phylogenetic tree was constructed, with HemS, HmuS, ChuS, ShuS and PhuS highlighted. This tree emphasised how closely related the members of each pair, HemS/HmuS and ChuS/ShuS, are to one another.
Indeed, ChuS and ShuS only vary in five residues, and yet their DNA-binding and reductive haem breakdown properties have been found to be very different. The next chapter will describe the alterations to the protein expression and purification protocols that had to be made for ShuS because of its propensity to aggregate and precipitate out of solution upon cell lysis, which is thought to result, at least in part, from DNA-binding. Such treatment was not required for ChuS.

The bioinformatic package, DNAbind, identified ten residues apiece in ChuS and ShuS which could potentially bind to DNA. The majority of these residues were clustered around the small cavity but none of them coincided with the five residues which differ between the homologues. It is speculated that having R123 in ShuS, rather than the S123 found in ChuS, could increase the positive charge of the small cavity such that it becomes more attractive to anionic DNA.

DNAbind was also used to study HemS and PhuS. This analysis found no obvious region for DNA-binding in HemS, whilst in PhuS the most likely region for binding was determined to be the large rather than the small cavity. This difference may explain why experiments by other groups have suggested that ShuS binds to DNA non-specifically whereas PhuS binds selectively to the prrF1 promoter.

Overall, incorporating a bioinformatic element into this study has allowed for analysis of HemS within its broader context. Given the wide range of (often seemingly contradictory) roles assigned to HemS, HmuS, ChuS, ShuS and PhuS in the literature, phylogenetic and residue conservation studies have been very useful in providing possible reasons for this diversity. Clearly, these proteins have evolved from a common ancestor and retained their unique haem-binding cavity.
Yet, as evolution has proceeded, it seems these homologues have diverged to exploit different niches depending upon the requirement of the species producing them, the two obvious functions being to control haem flux via promoter suppression and to degrade haem reductively.

Chapter 9 Experimental Characterisation of Mutants and Homologues

9.1 Aims

Computation predicted clear mutants and homologues to test experimentally. First, therefore, efforts were made to express and isolate these proteins. This process proved successful, with any deviations from the standard WT HemS growth and purification protocol noted. The haem-binding characteristics of these proteins were investigated, as was their ability to catalyse the NADH-dependent haem breakdown reaction. Stopped-flow spectroscopy was used to investigate the kinetics of these reactions and to test for the presence of the intermediate identified in the WT reaction. Crystallisation of selected mutants was also attempted. Therefore, to summarise, the aims of the characterisation of HemS mutants and homologues were:
1. To determine whether the mutant and homologous proteins could be grown successfully in E. coli cells using the same protocol that was developed for WT HemS.
2. To compare haem-binding characteristics between WT HemS and these mutants and homologues.
3. To determine whether these mutants and homologues could also catalyse the NADH-dependent breakdown of haem.
4. To use stopped-flow spectroscopy to compare the kinetics of reaction between NADH and haem in the mutants and homologues, with respect to WT HemS.
5. To attempt crystallisation and X-ray crystallography of selected mutants to determine the effect the mutations have on the protein structure, and to attempt crystallisation and X-ray crystallography of the homologues, particularly those that do not have published structures.
Outcomes from experiments inspired by these aims are described in the following sections.

9.2 Expression and Purification

The method of expression and purification used for HemS proved to be suitable for all of its mutants and homologues, with all giving similar yields. The one exception was ShuS. The first attempt to lyse E. coli cells containing over-expressed ShuS resulted in the protein aggregating and precipitating out of solution. ShuS is known to be capable of binding to DNA [102;103], and so a fresh batch of cells was lysed in the presence of benzonase nuclease and protease inhibitor, which prevented aggregation, and the purification process proceeded as normal. However, over time, it was found that ShuS was still precipitating out of solution when stored in high concentrations. Increasing the pH to 8.0 from 6.5 prevented this aggregation [a].

It was considered noteworthy that none of the other homologues required this treatment, which implied that ShuS was the only one capable of binding to DNA. As ChuS and ShuS share 98.5% sequence identity, a further study into why these two homologues behave so differently was instigated, which was carried out by Xie [81] (see Section 8.4).

An SDS-PAGE gel of all of the mutants and homologues studied is provided in Fig. 9.1, highlighting their respective purities and variations in mass. The exact masses, confirmed by mass spectrometry, are given in Table 9.1.

[a] It is therefore not entirely clear what causes ShuS to precipitate out of solution, making this effect worthy of further study. Solutions treated with benzonase nuclease then stored at pH 6.5 in low concentrations (< 100 µM) do not appear to precipitate out of solution faster than any of the other homologues, thus suggesting that the presence of DNA during cell lysis is indeed a problem. However, at higher concentrations, ShuS precipitates faster than any of its homologues, even if it was treated with benzonase nuclease. This problem can be minimised either by reducing the concentration or increasing the pH to 8.0.

Protein           Expected Mass / Da   Actual Mass / Da   Difference / Da
Wild Type HemS    39,360               39,360             0
F104A             39,284               39,284             0
F104AF199A        39,208               39,208             0
F104I             39,326               39,326             0
F199A             39,284               39,284             0
R209A             39,275               39,275             0
R209K             39,332               39,337             +5
Q210A             39,303               39,303             0
HmuS              39,104               39,104             0
ChuS              38,845               38,845             0
ShuS              38,831               38,833             +2

Table 9.1: Accurate masses of the proteins. Mass spectrometry was used to determine the actual masses. The masses for the R209K HemS mutant and ShuS were both slightly different from their respective expected values. Care must therefore be taken when interpreting the data involving these proteins.

9.3 Haem-Binding Properties

The haem-binding capabilities of the mutants and homologues were investigated to determine whether there were any differences between them and WT HemS. It was not expected that there would be any significant differences, since the homologues were already known haemoproteins, and none of the mutated residues studied (except R209) were thought to be directly involved in haem-binding.

UV-Visible spectroscopy experiments were carried out as described in Section 3.5.1. The intention had been to run a set of experiments for all mutants and homologues at pH values of 8.0, 6.5 and 5.0. However, it was found that the protein samples were not stable enough at pH 5.0, typically precipitating out of solution within minutes, and so measurements at this pH value were abandoned. Furthermore, the F104A and R209K samples had aggregated over time in the 4 °C refrigerator and so were not suitable for study at all.

Spectra are shown in Fig. 9.2, and numerical results in Table 9.2. The results show, consistent with the research carried out by Sawyer [42], that the wavelength and the intensity of the Soret band of WT HemS both increase as the pH is increased.
The increase in the wavelength was attributed to the fact that high-spin water complexes, which are more likely to arise at lower pH, tend to have shorter wavelength Soret peaks (∼406 nm) than low-spin hydroxide complexes (∼408 nm). The increase in intensity could be attributed to haem-binding being more efficient at higher pH, a change in the electronic conditions of the porphyrin, or a change in pKa due to deprotonation. This increase in intensity with respect to increasing pH is a property shared by all of the mutants and homologues apart from F104AF199A (where it decreased) and F199A (where it effectively stayed the same).

R209A stood out, as the wavelengths for its Soret maxima at both pH values were noticeably lower than for any of the other proteins. Furthermore, it did not have the signature set of peaks for the β-band at ∼545 nm and for the α-band at ∼580 nm. It was also unique, in that the wavelength decreased in going from pH 6.5 to pH 8.0. The shorter wavelengths perhaps suggest that R209A causes the pKa of the distal water/hydroxide ligand to change, shifting the equilibrium in favour of water. This shift in turn could have been the result of changed electronic properties of the haem (as a result of R209 no longer binding to one of its propionates) or of a change in the environment in which water/hydroxide was situated [161]. Furthermore, this mutation will alter the pKa values of the propionates themselves, relative to the WT protein. Therefore, the removal of the interaction between the R209 residue and the more deeply buried of these propionates could significantly influence the pH dependence of the spectra as well.

Figure 9.1: SDS-PAGE gel of the proteins studied in this work. All have had their His-tags cleaved. Lanes are numbered from left-to-right. Lanes 1 and 10 are markers, using PageRuler Protein Ladder, where the bands range from 10 kDa to 180 kDa. The samples are all pure, and the proteins are all situated close to the fifth band, which corresponds to 40 kDa. Lane 2: WT HemS; 3: F104A; 4: F104AF199A; 5: F104I; 6: F199A; 7: R209A; 8: R209K; 9: Q210A; 11: ChuS; 12: ShuS; 13: HmuS; 14: WT HemS (repeated).

                  pH 6.5                    pH 8.0
Protein           Soret Max.   εSM          Soret Max.   εSM
WT HemS           409.00       103,000      410.25       123,900
F104AF199A        406.50       104,300      409.75       101,800
F104I             408.50       93,900       410.50       98,800
F199A             408.00       104,400      410.50       104,500
R209A             406.50       93,600       405.25       113,000
Q210A             408.50       99,400       409.75       125,100
HmuS              409.00       109,500      410.25       120,300
ChuS              409.50       112,300      410.50       130,500
ShuS              410.00       101,600      411.00       120,500

Table 9.2: Haem-binding properties of WT HemS and selected mutants and homologues. The Soret maximum is quoted in nm, and εSM is the extinction coefficient in M⁻¹ cm⁻¹ at that Soret maximum. The ShuS values are lower than those reported previously (Wilks reported an ε410 value of 159 mM⁻¹ cm⁻¹ at pH 7.4) [102], but appear to be consistent with the other homologues investigated in this study.

Figure 9.2: Selection of UV-Visible haem-binding spectra at pH 8.0. All spectra were normalised by setting the absorbance at 800 nm to 0, thus giving a consistent baseline. The absorbance at each Soret peak was then set to the appropriate εSM values given in Table 9.2, therefore yielding molar absorbances. Top: Homologues. All of the proteins show similar haem-binding properties. Middle: Most of the HemS mutants show similar haem-binding characteristics with respect to the WT. Bottom: WT vs R209A HemS. R209A was least similar to the WT of all of the mutants. The removal of bonding to one of the haem propionates was most likely responsible, and possible implications of this removal on the cavity environment are given in the text.
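The normalisation described in the caption to Fig. 9.2 amounts to a baseline shift followed by a rescaling, and a minimal sketch of it is given here. The function name, the 400-420 nm Soret search window and the absorbance readings are all assumptions made for illustration; only the εSM value of 123,900 M⁻¹ cm⁻¹ (WT HemS at pH 8.0) is taken from Table 9.2.

```python
def normalise_spectrum(wavelengths, absorbances, epsilon_soret,
                       baseline_nm=800.0, soret_window=(400.0, 420.0)):
    """Shift a spectrum so A(baseline_nm) = 0, then scale the Soret peak to epsilon_soret.

    The Soret search window is an assumption made for this sketch.
    Returns (Soret maximum wavelength, list of molar absorbances).
    """
    # Use the measured point closest to the baseline wavelength.
    baseline_idx = min(range(len(wavelengths)),
                       key=lambda i: abs(wavelengths[i] - baseline_nm))
    baseline = absorbances[baseline_idx]
    shifted = [a - baseline for a in absorbances]

    # Locate the Soret maximum inside the search window.
    in_window = [i for i, w in enumerate(wavelengths)
                 if soret_window[0] <= w <= soret_window[1]]
    soret_idx = max(in_window, key=lambda i: shifted[i])

    # Rescale so that the Soret maximum equals the extinction coefficient.
    scale = epsilon_soret / shifted[soret_idx]
    return wavelengths[soret_idx], [s * scale for s in shifted]


# Invented absorbance readings (not measured data).
wavelengths = [400.0, 410.25, 420.0, 591.0, 800.0]
absorbances = [0.50, 0.65, 0.40, 0.07, 0.05]

soret_nm, molar = normalise_spectrum(wavelengths, absorbances,
                                     epsilon_soret=123_900)
print(soret_nm, [round(m) for m in molar])
```

Because every spectrum is pinned to zero at 800 nm and to its own εSM at the Soret maximum, differences between proteins appear in band shape and position rather than in sample concentration.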
9.4 Reaction with NADH as Monitored by UV-Visible Spectroscopy

Once all of the mutants and homologues had been expressed and purified, their reactivity with haem and NADH was tested using UV-Visible spectroscopy. The spectra are given in Fig. 9.3, and these all show growth in absorbance at 591 nm, implying that every mutant and homologue studied is capable of producing the HBP.

The fact that all of the mutants are capable of reactivity suggests that the structural integrity of the proteins has been retained and that none of those residues that were mutated were essential for the reaction to proceed. It is not a surprise that the F104 and F199 mutations did not stop reactivity, as these residues are mainly suspected to be involved in regulating NADH access to haem, rather than in allowing or preventing that access outright. It is interesting that neither of the R209 mutations, nor Q210A, appear to significantly reduce the rate of reaction, although this shall be discussed further when considering the stopped-flow data.

All three homologues, HmuS, ChuS and ShuS, were also shown to be able to catalyse this haem breakdown process, which is proof, for the first time, that proteins other than HemS can produce this novel 591 nm HBP. This finding strongly suggests that proteins have evolved specifically to carry out this reaction, and that the ability to catalyse haem breakdown via this reductive route is not merely an incidental artefact unique to HemS.

However, the ability of ShuS to catalyse the reaction differed markedly from the other homologues. On the face of it, this is surprising since ShuS only differs from ChuS by five residues. However, as discussed in Section 8.4, ShuS can bind DNA yet ChuS seemingly cannot, suggesting a different role for ShuS. Nevertheless, ShuS can still catalyse the breakdown of haem with NADH, albeit at a much reduced rate. A second experiment was run where the pH was set to 8.0 rather than 6.5, as ShuS is less prone to precipitate out of solution at this higher pH. Though loss of haem was observed (indicated by the decreasing absorbance of the Soret band at ∼408 nm), there appeared to be only little growth at ∼591 nm, suggesting that haem was perhaps being broken down by some other mechanism.

Figure 9.3: UV-Visible difference spectra of the mutants and homologues studied. Reaction stoichiometry was 8 µM protein : 8 µM haem : 2000 µM NADH. Scans were run every minute for 50 minutes, indicated by the colour scheme change from red (1 minute) to blue/purple (50 minutes), except for R209K, which only ran for 20 minutes. The positive absorbance values between 400-450 nm for the WT, F104A, R209A and R209K HemS samples indicate that haem had not properly equilibrated before the baseline was taken.

9.5 Stopped-Flow Spectroscopy

As explained in Chapter 5, stopped-flow spectroscopy is a technique with high temporal resolution, making it suitable for further kinetic studies into the haem breakdown process in HemS and its homologues, and on the intermediate formation and consumption in particular. This intermediate had been identified in studies on HemS in Chapter 5. Were it to be identified in studies on the homologues, this confirmation would provide further evidence that they are all promoting the same reaction. First, however, the correlation of the intermediate formation with the loss of haem was examined.

9.5.1 Deconvolution of the Stopped-Flow Spectra

Experiments were run using a PDA detector over the range 400-722 nm, the latter being the long-wavelength limit for the detector. As the peak corresponding to the intermediate at 806 nm is broad, readings at 722 nm can capture the evolution of this intermediate qualitatively.
Therefore, experiments run with a detector operating between 400-722 nm can chart the progress of haem loss (at 408 nm), HBP gain (at 591 nm) and intermediate evolution (at 722 nm) over time. In order to find out more about these individual species (namely, protein-bound haem, the HBP and the intermediate), an effective means of deconvoluting the overall spectra was sought. Singular value decomposition (SVD) provides a ready means for doing this analysis.

The time course for the reaction of pre-incubated 5 µM WT HemS and 5 µM haem with 1000 µM NADH is shown in Fig. 9.4, along with decomposed spectra. SVD suggested that the 3D spectrum consisted of three primary components. The first two corresponded closely to the Soret band at 408 nm and to the HBP peak at 591 nm, respectively. The third revealed a shoulder of a peak at the upper wavelength limit of 722 nm which is consistent with the intermediate species, which is known to have a broad peak centred around 806 nm. SVD also suggested that this component has a further, sharper peak at ∼433 nm. Such a peak is always masked by the Soret band, explaining why it has not been observed before. Much of the spectrum of the intermediate between 400-650 nm is obscured by haem and the HBP, but these data suggest that the intermediate species has two signature peaks in the 400-850 nm region, one around 430-440 nm and the 806 nm absorbance itself. These are significantly red-shifted compared with haem and suggest that the conjugated bonding in the porphyrin has moved to lower energy, but that the tetrapyrrole remains cyclic.

The SVD amplitudes, shown in the bottom right of Fig. 9.4, strongly indicate a direct inverse correlation between haem loss and intermediate formation. At the beginning of the reaction, the component representing haem steeply declines, which is matched by a sharp increase in the component representing the intermediate.
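For readers unfamiliar with the technique, the way SVD separates a time-resolved spectrum into a small number of component spectra and amplitudes can be sketched with NumPy on synthetic data. Everything in this example is an assumption made for illustration: the Gaussian band shapes and heights, the sequential A → I → P concentration profiles, the rate constants, the noise level and the significance threshold are invented, and only the band positions (408, 433, 591 and 806 nm) and the axis ranges are taken from the text. It is not the software or the data used in this work.

```python
import numpy as np


def gaussian(x, centre, width):
    """Unit-height Gaussian band used as a stand-in for an absorbance peak."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)


# Axes matching the ranges quoted in the text (400-722 nm, 0-1020 s).
wavelengths = np.linspace(400.0, 722.0, 162)
times = np.linspace(0.0, 1020.0, 205)

# Invented component spectra: haem (408 nm), HBP (591 nm) and an intermediate
# with a sharp 433 nm band plus the shoulder of a broad 806 nm band.
haem = gaussian(wavelengths, 408.0, 12.0)
hbp = 0.3 * gaussian(wavelengths, 591.0, 20.0)
intermediate = (0.4 * gaussian(wavelengths, 433.0, 10.0)
                + 0.2 * gaussian(wavelengths, 806.0, 60.0))

# Invented concentration profiles for a sequential A -> I -> P model.
k1, k2 = 0.01, 0.004  # arbitrary rate constants / s^-1
c_haem = np.exp(-k1 * times)
c_int = k1 / (k2 - k1) * (np.exp(-k1 * times) - np.exp(-k2 * times))
c_hbp = 1.0 - c_haem - c_int

# Time-resolved data matrix (rows: time points, columns: wavelengths) plus noise.
data = (np.outer(c_haem, haem) + np.outer(c_int, intermediate)
        + np.outer(c_hbp, hbp))
data += np.random.default_rng(0).normal(scale=1e-5, size=data.shape)

# SVD: rows of vt are basis spectra; columns of u scaled by s are their amplitudes.
u, s, vt = np.linalg.svd(data, full_matrices=False)

# Count components standing clear of the noise floor (threshold is a judgement call).
n_significant = int(np.sum(s > 1e-3 * s[0]))
basis_spectra = vt[:n_significant]
amplitudes = u[:, :n_significant] * s[:n_significant]
reconstruction = amplitudes @ basis_spectra

print(n_significant, float(np.max(np.abs(reconstruction - data))))
```

The basis spectra returned by SVD are mutually orthogonal mathematical constructs, so in general each one is a linear combination of the physical species spectra; assigning a component to haem, the HBP or the intermediate therefore relies on the component resembling a known band, as was the case for the data in Fig. 9.4.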
Then, as the intermediate reaches its maximum value and starts to decline, this change is mirrored by the time at which the decline of haem starts to level off. The rates at which both the haem and intermediate level off then appear to match up until the reaction cutoff of 1020 s. The component representing the HBP, however, behaves differently. At a very short timescale (within the first 10 s), there is a slight lag, before the HBP concentration starts to increase. This increase is not as rapid as that for the intermediate, but it lasts longer, with the formation of the HBP beginning to level off after ∼400 s.

9.5.2 Dependence of the Intermediate on NADH Concentration

The rates of formation and consumption of the intermediate at 806 nm were investigated over a range of NADH concentrations. Parallel investigations were run with the WT and F104AF199A HemS proteins. The time courses are shown in Fig. 9.5. Increasing the initial concentration of NADH leads to an increase in both the rate of formation and the rate of consumption of the intermediate. Attempts were made to fit curves to these two sets of data using KinTek.166 Even with the introduction of normalisation constants, this fitting proved too difficult, perhaps due to issues with the data themselves. In the case of the WT, it does appear as if the absolute absorbances at an NADH concentration of 400 µM are consistently too high, and the readings at 150 µM and 250 µM appear to be too close to one another to fit the overall trend. In the case of F104AF199A, meanwhile, some of the time courses start at negative absorbances. Another explanation for the failure to derive accurate curves from the data could be that the proposed reaction mechanism, given in Scheme 2 and described below, is wrong.

Figure 9.4: SVD analysis of time course for pre-incubated 5 µM WT HemS + 5 µM haem with 1000 µM NADH, recorded from 400-722 nm. Top: The 3D spectrum. The Soret band at 408 nm decreases over time, indicating loss of haem. A peak centred around 591 nm increases over time, indicating product formation. At 722 nm, there is a small but fast initial increase in absorbance, followed by a steady decline, which indicates intermediate formation then consumption. Bottom left: SVD component spectra. The green curve corresponds closely to the Soret band at 408 nm, and so gives an indication of haem concentration. The blue curve peaks at 591 nm, therefore reflecting HBP concentration. The red curve appears to contain a broad peak, with the wavelength cutoff at 722 nm capturing its shoulder. This is to be expected for the intermediate species, and it is assumed that this peak will be centred at 806 nm. This red curve also reveals a further peak, located at ∼433 nm. This peak is masked by the Soret band in convoluted spectra. Bottom right: SVD amplitudes. The colour-coding corresponds to that found in the SVD component spectra. These amplitudes show haem being consumed over time, an initial burst of intermediate formation before a steady decline, and a slower rate of product formation, which begins to level off and then slowly decline after ∼400 s.

Figure 9.5: 806 nm time courses for WT HemS (top) and F104AF199A HemS (bottom), with varying starting NADH concentrations, described in the legend in units of µM. In each experiment, NADH was added to a pre-incubated mixture of 5 µM HemS and 5 µM haem. The rates of formation and consumption of the intermediate both increase with increasing NADH concentration.

Scheme 2: Proposed reaction scheme for the overall haem breakdown process.
E corresponds to the enzyme, H to haem, N to NADH, M to NAD+, I to the intermediate with a characteristic peak at 806 nm and P to the haem breakdown product. The hydride transfer and subsequent step are considered as being irreversible. Evidence from Sections 5.5.2 and 5.6 suggests that NAD+, M, leaves the pocket faster than the haem breakdown product, P, thus explaining the order in which they leave in the scheme. Furthermore, the experiments were all performed on samples where haem, H, and the enzyme, E, were pre-incubated, and so the scheme reflects this by showing H being introduced to the enzyme before NADH, N. The assumption was made that only one intermediate, I, was formed throughout the scheme.

As simulations run using this scheme did not yield any fully converged curves, alternative mechanisms were attempted, including ones that considered allostery, but none of these proved successful either.

In the absence of a more detailed analysis, conclusions on the dependence of the intermediate on NADH concentration can only be tentative. One observation is that the rates of both the formation and the consumption of the intermediate increase with increasing NADH concentration, suggesting that the second phase of the reaction depends on NADH concentration in addition to the first phase. As the NADH concentration increases, the time at which the absorbance reaches a maximum decreases, although this maximum rises in absolute value.

Though a global fit had failed, fitting of the individual kinetic traces was achieved. This fitting used a simple two-step model, where the first step consisted of the irreversible conversion of holo-HemS and non-bound NADH to the intermediate species, and the latter step comprised the irreversible conversion of this intermediate to the final product, as shown in Scheme 3.
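A two-step irreversible model of this kind has a closed-form solution if the first step is treated as pseudo-first-order, i.e. if NADH is in sufficient excess that its concentration can be absorbed into k1. That treatment is an assumption made here for illustration and is not a statement of how the fits in this work were carried out. The function names and rate constants in the sketch below are hypothetical.

```python
import math

def eim(t, k1, k2, c0=1.0):
    """Intermediate concentration for EH -> EIM -> EPM (both steps irreversible).

    Assumes pseudo-first-order kinetics for the first step; c0 is the
    starting concentration of holo-protein.
    """
    if math.isclose(k1, k2):
        # Limiting form when the two rate constants coincide.
        return c0 * k1 * t * math.exp(-k1 * t)
    return c0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))

def signal(t, k1, k2, a, b):
    """Absorbance model of the form a * EIM + b."""
    return a * eim(t, k1, k2) + b

def peak_time(k1, k2):
    """Time at which EIM is maximal, obtained from d(EIM)/dt = 0 (k1 != k2)."""
    return math.log(k2 / k1) / (k2 - k1)

# Illustrative rate constants only (s^-1); not fitted values.
k1, k2 = 0.05, 0.005
t_star = peak_time(k1, k2)
print(t_star, signal(t_star, k1, k2, a=0.01, b=0.0))
```

Within this simplified model the peak time shortens when either rate constant increases, which is one way to rationalise an earlier maximum at higher NADH concentration.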
The coefficients and associated errors derived from these fits are given in Table 9.3, and plots of the fits overlaid on top of the data are shown in Appendix G. For the Wild Type case, fits to the data gathered at NADH concentrations of 5 µM and 25 µM could not be achieved, and for the F104AF199A case, fits could not be generated when [NADH] was 5 µM, 25 µM or 50 µM. These fits could not be accurately derived as the absorbance at 806 nm (i.e. the concentration of the intermediate) was still rising after 1000 s, thus complicating the estimation of the rate of the conversion of the intermediate to the product.

EH + N --k1--> EIM
EIM --k2--> EPM
Therefore, equation to fit: a * EIM + b

Scheme 3: Simple two-step mechanism for haem breakdown, which was used to derive fits for the individual kinetic traces gathered by stopped-flow spectroscopy. The symbols follow the same logic as in Scheme 2.

Protein  Coefficient  50 µM           75 µM           100 µM          150 µM          250 µM          400 µM          750 µM          1000 µM
WT       k1           0.001392(1068)  0.001037(747)   0.000732(90)    0.000548(33)    0.000497(63)    0.000278(42)    0.000303(146)   0.000266(70)
WT       k2           0.000316(40)    0.000301(27)    0.000296(30)    0.000665(16)    0.000797(27)    0.001710(551)   0.004112(1151)  0.004791(500)
WT       a            0.000806(6)     0.002396(12)    0.004353(24)    0.006549(9)     0.006368(13)    0.008727(46)    0.008295(89)    0.008038(37)
WT       b            0.003531(31)    0.000727(64)    0.001159(132)   0.000336(45)    0.000299(63)    0.003926(19)    0.006054(27)    0.007168(11)
DM       k1           –               0.000810(256)   0.000558(113)   0.000419(103)   0.000385(50)    0.000378(73)    0.000236(62)    0.000375(118)
DM       k2           –               0.000111(45)    0.000241(36)    0.000333(40)    0.000809(24)    0.001483(736)   0.002123(568)   0.004836(636)
DM       a            –               0.002855(57)    0.004337(33)    0.005894(37)    0.008768(16)    0.006160(43)    0.007763(42)    0.007280(43)
DM       b            –               0.000576(306)   0.000980(186)   0.000303(208)   0.000103(76)    0.003358(23)    0.005638(18)    0.004890(14)

Table 9.3: Coefficients and associated errors derived from fitting to individual kinetic traces, using Scheme 3 as the model. Column headings give the initial NADH concentration. WT stands for Wild Type HemS and DM for the double mutant of HemS, F104AF199A. k1, representing irreversible conversion of the haem to the intermediate, is shown to decrease with increasing NADH concentration. This is due to the faster consumption of this intermediate, as represented by the increasing value of k2 with increasing NADH concentration.

From these data, plots of k1 and k2 versus NADH concentration were made, as shown in Fig. 9.6. These plots show that, whereas k1 decreases with increasing NADH, k2 increases. Such a phenomenon was concluded to be due to the faster conversion of the intermediate species to the HBP masking any possible increase in the rate of the conversion of the reactants to the intermediate (which is possible since the rates are both ultimately derived from the absorbance levels at the signature peak for the intermediate at 806 nm). Fig. 9.7 gives the variation in the initial formation rates of the intermediate species with respect to NADH concentration. This figure shows that the initial rate of intermediate formation increases with increasing NADH concentration, but then begins to level off as NADH reaches millimolar quantities. These data would therefore suggest that saturation of the enzyme occurs when NADH is in approximately 100-fold excess over the HemS : haem mixture.

Figure 9.6: k1 and k2 values with respect to initial NADH concentration. In each experiment, NADH was added to a pre-incubated mixture of 5 µM HemS and 5 µM haem. Due to the extinction coefficient at 806 nm being unknown, these values are simply for the change in absorbance over the given time interval, rather than a direct description of the change in intermediate concentration.

Figure 9.7: Initial rates of formation of the intermediate species with respect to initial NADH concentration.
In each experiment, NADH was added to a pre-incubated mixture of 5 µM HemS and 5 µM haem. These rates were calculated from the first five seconds of the respective reactions, where the rises in absorbances are linear. Due to the extinction coefficient at 806 nm being unknown, these rates are simply for the rise in absorbance over a given time period, rather than a direct description of the change in intermediate concentration.

9.5.3 Effect of Mutants and Homologues on Intermediate Formation and Consumption

Time courses for each of the homologues at 806 nm are provided in Fig. 9.8. These time courses show that, in the presence of each homologue, there is a quick initial rise in absorbance at 806 nm, followed by a more gradual decrease, when NADH is added in excess to pre-incubated holo-protein. These changes in absorbance, as with HemS, suggest the formation and consumption of an intermediate species. The curves charting HemS and HmuS fall away more rapidly than for the other homologues (with the ShuS samples being especially slow), suggesting that the intermediate is being used up more rapidly in the presence of these two homologues. These data therefore reinforce the conclusion drawn from the previous spectroscopic data that HemS and HmuS are more effective at breaking down haem to produce the HBP than ShuS. It appears that build-up of the intermediate is greatest with HmuS. However, care should be taken when analysing absolute absorbances. Despite the best efforts of the author, including repeating all experiments in triplicate (see Section 3.6 in the Methods for further details), it is recognised that the Xe lamps used in typical stopped-flow setups can fluctuate. This problem is especially relevant to the current experiments, where the absorbance values being detected were often low and the experiments were conducted over several days.

Protein      A0.125 s  tpeak / s  Apeak   A250 s  Rform. / µs-1  Rcons. / µs-1
WT HemS      0.0049    14.000     0.0436  0.0194  2787           102
F104AF199A   0.0044    15.625     0.0459  0.0245  2676           91
F104I        0.0042    25.250     0.0509  0.0298  1859           94
F199A        0.0047    16.125     0.0448  0.0252  2507           84
R209A        0.0043    43.375     0.0284  0.0177  556            52
Q210A        0.0226    23.375     0.0720  0.0487  2128           103
HmuS         0.0103    18.125     0.0611  0.0358  2822           109
ChuS         0.0059    19.500     0.0462  0.0309  2078           66
ShuS pH 6.5  0.0057    24.625     0.0377  0.0300  1308           34
ShuS pH 8.0  0.0030    64.125     0.0396  0.0338  572            31

Table 9.4: Rates of formation and consumption of the intermediate species. tpeak corresponds to the time at which the intermediate absorbance at 806 nm reaches a maximum, Apeak is the absorbance at this time, A0.125 s is the absorbance after 0.125 s, and A250 s is the absorbance after 250 s. Rform. and Rcons. are simple approximations for the rates of formation and consumption of the intermediate species, respectively. Rform. is calculated by taking the difference between Apeak and A0.125 s, and dividing by the difference in time between those two readings. Rcons. is the same but for Apeak and A250 s instead. Extinction coefficients at 806 nm are unknown, and so absorbance values are compared directly, rather than concentrations. In interpreting results, an assumption is made that the extinction coefficient will be the same for all mutants and homologues. The data show that all mutations to HemS give a slower rate of formation, and that all mutations other than Q210A result in a slower rate of consumption of the intermediate. Furthermore, HemS and HmuS promote similar rates of formation and consumption, whereas ShuS gives the slowest rates of any homologue. F104A and R209K are not included in this study because these samples had degraded before the assays could be prepared.

The time courses at 806 nm with the mutants of HemS were also studied, and are provided in Fig. 9.8.
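The approximations Rform. and Rcons. defined in the caption of Table 9.4 amount to two finite differences. The short sketch below reproduces them for the WT HemS row. The function name is hypothetical, and the unit "µs-1" is interpreted here as 10^-6 absorbance units per second, which is an assumption; with it, the tabulated values are recovered to within rounding of the reported absorbances.

```python
def approx_rates(a_start, t_peak, a_peak, a_end, t_start=0.125, t_end=250.0):
    """Return (R_form, R_cons) in 1e-6 absorbance units per second.

    R_form: rise from the first reading (t_start) to the peak.
    R_cons: fall from the peak to the reading at t_end.
    """
    r_form = (a_peak - a_start) / (t_peak - t_start)
    r_cons = (a_peak - a_end) / (t_end - t_peak)
    return r_form * 1e6, r_cons * 1e6

# WT HemS row of Table 9.4: A0.125 s, tpeak, Apeak, A250 s.
r_form, r_cons = approx_rates(0.0049, 14.000, 0.0436, 0.0194)
print(round(r_form), round(r_cons))
```

Because both quantities are differences of small absorbances, they inherit any baseline drift between runs, which is consistent with the caution expressed above about absolute absorbance values.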
Here again, care must be taken when interpreting the absolute absorbance values. It is apparent, though, that the rate of consumption of the intermediate is greater for the WT than for most of the mutants. A numerical comparison of these rates of consumption, along with those from the homologues, is provided in Table 9.4.

9.6 Crystallography

Attempts were made to crystallise a selection of the mutants and homologues. A limited number of crystals did form for the majority of them. However, these crystals either proved not to be of a high enough quality for diffraction, or were destroyed by the cryoprotectants added to them.

Figure 9.8: Time courses of the intermediate concentration, as reported by the absorbance at 806 nm. In each experiment, 1000 µM NADH was added to a pre-incubated mixture of 5 µM protein and 5 µM haem. Top: Homologues. All plots show an initial fast rise in absorbance, followed by a gradual decrease, suggesting that an intermediate is formed in the presence of each homologue. The consumption of the intermediate is more rapid for HemS and HmuS than it is for the other homologues. Bottom: Mutants. All plots also show an initial fast rise in absorbance, followed by a gradual decrease, suggesting that an intermediate is formed in the presence of each mutant. Apart from Q210A, the consumption of the intermediate is more rapid for the WT than it is for the mutants, suggesting that each of the mutations made deleteriously affects the conversion of the intermediate to the product.

As the double phe-gate is central to the main pocket, separating the haem-binding and NADH-binding regions of the protein, it was hypothesised that removing it might lead to significant structural changes. Therefore, additional assays were run to crystallise F104AF199A HemS. For reference, the results for WT HemS can be found in Section 5.5.2.
That section discussed how, consistent with all previous crystallographic studies on HemS and its homologues other than the PhuS dimer,111 the resolution in the unstructured loop region was too low for the loop to be accurately resolved. However, the resolution with F104AF199A samples proved to be higher, so that this ‘missing loop’ could be constructed for the first time. This is therefore the first X-ray structure of HemS or one of its homologues (other than the PhuS dimer)111 where all residues are resolved. The structure is shown in Fig. 9.9, superimposed upon the WT structure.

Superimposing resolved crystal structures for the WT and for F104AF199A shows that the main pocket changes in subtle ways upon removal of the double phe-gate. Indeed, the main pocket is more open, with the capping α-helix less deeply buried. However, simulations suggest the reverse is true, with this α-helix more tightly clamped down in the F104AF199A case than in the WT case. As the simulations contained haem and NADH, whereas the crystal structures did not, it may be the case that these simulations are not inaccurate, but that they are instead showing that F104AF199A clamps down more significantly than the WT does once haem is included. Since the removal of the double phe-gate affords more space in the central cavity, this conclusion would seem to be consistent with basic steric considerations. Looking from one perspective, it is curious that HemS has therefore evolved to contain this double phe-gate, since its absence results in the protein being better able to clamp down on haem, and therefore presumably better hold it in place. However, the fact that the mutated protein shows greater conformational flexibility between its apo- and holo-forms perhaps points to a greater inherent instability in its structure. When left at refrigerated temperatures, it was found that samples of F104AF199A did tend to aggregate more quickly than the WT.
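Superposition of two structures, as used in the comparison above, is commonly performed by least-squares fitting of equivalent atoms. The sketch below illustrates one standard approach, the Kabsch algorithm, using NumPy. It is offered only as an illustration under stated assumptions: the coordinates are random synthetic points rather than structural data, the function name is hypothetical, and nothing here is claimed about the software actually used to prepare Fig. 9.9.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Rows of p and q must refer to equivalent atoms in the same order.
    """
    p_c = p - p.mean(axis=0)              # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                       # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rot.T                   # rotate p onto q
    return float(np.sqrt(np.mean(np.sum((p_rot - q_c) ** 2, axis=1))))

# Synthetic check: a rotated and translated copy should superimpose exactly.
rng = np.random.default_rng(0)
coords = rng.normal(size=(20, 3))
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
moved = coords @ rz.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(coords, moved))
```

For a comparison such as the one described here, the fit would typically be restricted to a shared core (for example, backbone atoms away from the capping α-helix), so that displacement of the mobile region shows up in the superposition rather than being averaged into it.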
9.7 Discussion and Summary

The research described in this chapter used experimental techniques to better understand the properties of the selected mutants and homologues of HemS, as well as to deepen an understanding of the anaerobic breakdown of haem.

Figure 9.9: WT and F104AF199A HemS structures. The WT is represented in green and F104AF199A in cyan. Top left: The structures resolved from X-ray crystallography. The (cyan) capping α-helix of F104AF199A is shown to be further from the central cavity compared to the WT capping α-helix. Top right: The starting structures in the Dijkstra fastest pathways, as derived from computation. Bottom left: The final structures in the Dijkstra fastest pathways, as derived from computation. For clarity, only NADH and haem from the WT structure are shown in each case. Bottom right: Two representations of F104AF199A superimposed on one another, to demonstrate the movement of the unstructured loop and the pocket-capping α-helix upon NADH- and haem-binding. The cyan representation is the structure resolved from X-ray crystallography and the salmon representation is the final structure in the Dijkstra fastest pathway.

Firstly, it was shown that all of the mutants and homologues were able to bind to haem, with each giving a clear Soret peak at ∼408 nm. The most significant conclusion from this chapter is that this haem breakdown reaction is not unique to HemS, but can also occur in the homologues, HmuS, ChuS and ShuS. These results would therefore suggest that the anaerobic breakdown of haem is indeed an important strategy engaged in by pathogenic bacteria to extract iron from their host organisms. As was predicted from both computational and bioinformatic studies, these homologues had different propensities to catalyse the reaction, with ShuS being the least effective enzyme.
Studies with the bioinformatic package, DNAbind, had suggested that ShuS may have competing DNA-binding properties in addition to its ability to bind to haem. Though no studies in this current work were specifically designed to test this hypothesis, its veracity was inferred from the fact that the expression of ShuS required the inclusion of benzonase nuclease to prevent DNA-protein aggregates from forming. All of the mutants were able to catalyse the reaction as well, showing that none of the residues selected for mutation – which were all predicted to influence the NADH unfolding and its access to haem – are essential for the reaction to proceed, and that HemS can therefore tolerate a wide range of mutations to the NADH-binding region.

Stopped-flow spectroscopy was used to further probe how the reaction varies between the mutants and homologues. The original intention had been to engage in a full kinetic study with global analysis. A combination of the mechanism’s complexity and suspected inconsistencies in absorbances arising from lamp fluctuations between runs precluded this analysis.

Stopped-flow experiments were run where the detector was set to the wavelength range 400-722 nm. This setting allowed the concentrations of haem and the HBP to be tracked, along with that of the intermediate, albeit imperfectly (since its peak is at 806 nm and only its shoulder can be observed at 722 nm). Analysis by SVD gave three primary components whose spectra matched these three species (i.e. haem, the intermediate and the HBP) closely. The component corresponding to the intermediate revealed that this species has a peak at ∼433 nm, which tends to be masked by the haem Soret peak, in addition to the one at 806 nm. Tracking these components over the time course showed that there is a direct inverse correlation between haem and the intermediate, demonstrating that haem breakdown is indeed directly correlated to intermediate formation.
This result strengthens the hypothesis that the intermediate is the immediate product resulting from hydride transfer from NADH to haem.

The dependence of intermediate formation and consumption on NADH concentration was then investigated. It was shown that both rates increase with increasing NADH concentration, suggesting that, in addition to its formation step, the intermediate breakdown step also depends on NADH. Furthermore, the time at which the intermediate peaks in concentration shortens, and its concentration at this time increases. The increase in intermediate concentration may be due to faster rates of NADH association with the protein, unfolding and hydride transfer, before the proposed sigmatropic rearrangement to form the final HBP takes meaningful effect. However, it is thought that the shift to shorter time is due to the rate for this proposed sigmatropic rearrangement increasing faster than for any of these preceding steps in the mechanism.

It was shown that all of the selected mutations to HemS lead to a reduced rate of intermediate formation, and that all mutations other than Q210A result in a reduced rate of consumption. The fact that this Q210A mutant behaves this way is not surprising. Unlike the other residues that were mutated, Q210 is not located near to the region of the pocket where hydride transfer is expected. Instead, it is located further towards the edge of the pocket, where it was hypothesised from the computational data (see Section 7.3) that it helps NADH to unfold by hydrogen-bonding to the phosphate backbone. Its mutation to alanine would explain the decreased rate of formation of the intermediate, as alanine cannot form these hydrogen bonds, and so, under this hypothesis, it would become more difficult for NADH to unfold and therefore it would take longer to reach haem.
However, once in close proximity to haem, NADH would presumably be positioned and oriented similarly to the WT case, given that the residues in that region are all retained. Assuming NADH is involved in the breakdown of haem following the transfer of hydride, perhaps just by influencing the electronic environment, the position and orientation of NADH (or, more correctly, NAD+) could have a significant influence on the rate. The WT and Q210A cases should provide similar electronic environments, which would explain why they give similar rates of intermediate consumption whereas all of the other mutants give reduced values. The R209A mutation reduces the rates of formation and consumption of the intermediate the most. If R209 binds both the NADH and haem molecules, that such a mutation should be so disruptive is also not surprising.

Qualitatively, it was shown that consumption of the intermediate species for HemS and HmuS was significantly faster than for the other homologues, reinforcing the conclusions drawn from the standard UV-Visible spectroscopic data, as well as from computational and bioinformatic predictions, that ChuS and ShuS are less effective enzymes.

Attempts to crystallise the majority of the mutants and homologues either failed or yielded crystals that were subsequently destroyed by cryoprotectants or that diffracted to too low a resolution for accurate structural determination. Crystals of F104AF199A, however, were successfully grown and their structures resolved, using the procedures outlined in Section 3.11. These structures showed that, in the absence of haem or NADH, the main cavity of F104AF199A HemS is more open than that of the WT. As the computational work consistently shows that, with haem and NADH present, the reverse is true, one of two conclusions can be drawn. The first is that the simulations do not accurately represent the WT and/or F104AF199A proteins.
The second is that the removal of the double phe-gate allows for greater conformational flexibility, meaning the double mutant can ‘clamp down’ more strongly on haem. The second conclusion implies that F104AF199A may not be as stable in its apo-protein form, since it seems unable to maintain its cavity size and shape as consistently, perhaps therefore explaining the observation that it can aggregate more readily than the WT. From these F104AF199A crystals, the unstructured loop spanning the entrance to one side of the main cavity was resolved fully for the first time for HemS or any of its close homologues. This structure, therefore, is the first one that has been fully resolved for this class of haemoprotein other than the PhuS dimer.

Chapter 10 Conclusions and Future Work

10.1 Conclusions

The most significant conclusion to be drawn from this work is that the novel anaerobic haem breakdown reaction discovered by Sawyer is not unique to HemS. Instead, such breakdown has been demonstrated to occur, to various extents, in the HemS homologues, HmuS, ChuS and ShuS. These proteins are a small subset of a wider class, which are typically annotated as ‘haem transport’ or ‘haem-degrading’ factors. The research by Sawyer, Choy and the current author is the first demonstration that this class of protein can indeed promote the breakdown of haem.

Novel features of this reaction were investigated in more detail. The hypothesis that haem breakdown was initiated by reductive hydride transfer from NADH was confirmed through a series of deuterium labelling experiments. Both stereoisomers of NADD produced deuterated and non-deuterated HBPs, suggesting that the reaction is not stereospecific. Such reactions are unusual but not unheard of in enzymology. Transfer of a hydride from the (S)-position of the nicotinamide head of NADH was shown to be significantly faster than from the (R)-position.
It is perhaps the case that, once NADH has committed to unfolding in a certain manner and presenting one of its hydrides to haem, the barrier to changing conformation is higher than that for the transfer of the less-favoured hydride. As NADPH hydrogen-bonds to an additional residue (T312) when it associates with HemS, this structure may induce the stereospecificity not found with NADH. Studies on deuterated NADPH would be required to determine whether this is the case.

The reaction mechanism was elucidated further through the discovery of a short-lived intermediate. This intermediate has signature UV-Visible absorption maxima at 806 nm and 433 nm, the latter only becoming apparent after SVD analysis of the time evolution of the UV-Visible spectra. Together, these peaks suggest a molecule that is still largely similar to haem structurally, but which has had its conjugated system moved to lower energy. It was shown that the formation of the intermediate coincides with the loss of the haem absorbance in the Soret band. The working hypothesis is therefore that this intermediate is the immediate result of hydride transfer, and that a subsequent sigmatropic rearrangement causes the cyclic porphyrin to open to produce the final HBP, a linear tetrapyrrole.

The ability of the reaction to proceed anaerobically was demonstrated definitively. Under aerobic conditions, there is an indirect competition for haem breakdown, which arises via coupled oxidation and produces biliverdin non-regioselectively. Under certain protein : haem ratios, this competition significantly reduces formation of the HBP. All of the proteins studied in this work are from pathogenic bacteria, which are known to thrive in areas of the human gut where oxygen levels are low. It would therefore not be surprising for these bacteria to have developed alternative strategies for the controlled breakdown of haem, which are not dependent on oxygen.
Further to this conclusion, there was no evidence of ‘canonical haem oxygenase’ activity in any of the proteins studied.

Attempts to characterise the HBP by NMR proved unsuccessful. Other than trying paramagnetic NMR, it would seem that the electronic properties of this species, as well as its instability outside the protein, would preclude further study by this method. X-ray crystallography has proven to be more fruitful. Post-reaction structures of HemS have been successfully crystallised and resolved. The structure of the HBP in the pocket has not proved to be as straightforward to resolve. This problem may be due to an inherent instability in the structure caused by loss of iron. It is thought that the cryoprotectants used may have been causing iron to escape from the cleaved porphyrin prematurely. If this is the case, then alternative cryoprotectants may prove capable of retaining the HBP for accurate resolution.

Previous studies had shown little homology between HemS and other proteins known to bind to NADH, suggesting that the details of NADH-binding in HemS and its homologues are unprecedented. As NADH binds transiently in the pocket, computational experiments were required to probe this novel interaction more closely. All-atom models using AMBER and the programs developed by the Wales group at the University of Cambridge provided the means for this analysis. Previous studies had been limited by the computational methods and resources available at the time. However, for this study, calculations could be run on GPUs, giving speed-ups of up to two orders of magnitude.

Even with this increase in efficiency, further strategies were sought to expedite these calculations. To accelerate growth of the WT HemS database, a method for
To accelerate growth of the WT HemS database, a method for 197 Conclusions and Future Work choosing unconnected minima in separate sub-databases for connection attempts was developed, based both on minimising the number of attempts required to fully connect all of the sub-databases, and on choosing the minima closest in conforma- tional space to make those connections. Once this WT HemS database was complete, a new subroutine was written, which could use the stationary points (or a selection thereof) in this database as a template to seed new databases where the protein has either been subject to single-point mutation(s), transformed into a homologue, or the ligand(s) changed. This subroutine removes the need to build up new databases from scratch, and thus allowed fresh databases describing seven HemS mutants and three homologues to be constructed and connected in a fraction of the time it would have required by alternative strategies. The construction of large databases afforded an opportunity for the general prop- erties of these systems to emerge, particularly when displayed as disconnectivity graphs. For those databases which could be considered as being ‘complete’ (i.e. all of those involving HemS), each showed one distinct low-energy funnel in their respective graphs. Further analysis showed that the bottoms of these funnels gen- erally corresponded to the minima with the shortest NADH-haem distances. This structure was especially true for WT HemS, and it was noted that some of the single-point mutations disrupted this feature of the landscape. A steep funnel in an energy landscape with few competing kinetic traps is indicative in a biological context of a system which is optimised (i.e. has evolved) to achieve a particular structure. Typically, this principle has been used to determine the folding charac- teristics of proteins and nucleic acids. In this context, the steep funnels show that HemS is optimised to bring haem and NADH into close proximity. 
Although none of the homologue databases is fully complete, it would appear that HmuS has a similar funnel, but that ChuS and ShuS do not.

Another advantage of being able to build up the WT HemS database quickly was that a pathway showing NADH unfolding and approaching haem could be fully connected for the first time. Further refinement of the database then allowed for alternative pathways to be identified, giving a detailed picture of the routes by which NADH can most effectively reach haem. The Dijkstra fastest pathway showed that this process was energetically favourable, with few large barriers to be traversed. These pathways identified an interesting conformational change in residue Q210. This residue was shown to swing towards the pocket as NADH moved further inside, with other residues anchoring it in a convenient location to hydrogen-bond to the phosphate backbone of the ligand. This structure appeared to facilitate a number of the unfolding steps required of NADH to reach haem and, once these processes were completed, Q210 then swung back to its original position. Such behaviour would be difficult to identify by experiment or using computational methods which only consider snapshots in the pathway or rely on coarse-graining the protein. R209 was also identified as a residue of interest due to its ability to hydrogen-bond to both haem and NADH when the two ligands are in close proximity.

The double phe-gate, first identified by Choy and Shang, was also studied in greater detail. It was shown that association of NADH with the edge of the main cavity caused the preferred conformation of the double phe-gate to change from open-open (OO) to closed-open (CO). This change was due to NADH hydrogen-bonding to a pair of residues, N106 and P169, which brought about a series of conformational changes, culminating in F104 closing and forming a T-shaped pi-pi interaction with F199.
As NADH approaches haem, F104 typically opens up again, which alleviates steric crowding at that region of the pocket. It is possible, therefore, that this double phe-gate is engaging in a sophisticated regulatory mechanism to control the access of NADH to haem, which only comes into effect once NADH associates with the protein.

Bioinformatics was also utilised to further understand the phylogenetic context of HemS. This work showed that HemS is a member of a family of at least 5000 related haemoproteins derived from 218 different genera. The majority of these genera are pathogenic bacteria although some are non-pathogenic. The working hypothesis developed as a result of this project is that HemS and its homologues are used to break down haem under anaerobic conditions as a means of allowing a bacterium to extract iron from its host organism. Non-pathogenicity would suggest that for some of these bacteria, this process is either benign to the host or does not occur at all.

Analysing the sequences of these homologues revealed remarkable rates of conservation among some of the residues. Those directly involved with haem-binding (such as R102, H196 and R209) showed 100% rates of conservation, suggesting that all of the homologues are at least involved in binding to haem. Those residues thought to be involved in NADH-binding were not nearly as faithfully conserved, suggesting either that this region of the pocket is only required to bind to NADH weakly (as implied by the weak Michaelis constant for NADH determined by Sawyer) and therefore tolerates a wider range of possible residues, or that the breakdown of haem by NADH is not the primary function of all of the homologues. The double phe-gate, which occupies a section of the cavity between the haem-binding and NADH-binding regions, has a much higher degree of conservation than any of the residues comprising the NADH-binding area.
One possible reason for this difference is that the double phe-gate helps to maintain the structural integrity of the main cavity, as determined from crystallographic studies and computational simulations, which together showed that F104AF199A HemS cannot maintain the pocket size as well as the WT. Another possible reason is that the double phe-gate is not just involved in regulating the approach of NADH to haem, but of alternative ligands as well.

One possible alternative function for certain haemoproteins in this class, which has already been demonstrated in the literature, is DNA-binding. It has been shown that PhuS, when not bound to haem, is capable of binding to the prrF1 promoter, which is known to regulate the expression of certain nonessential iron-containing proteins. This homologue, at least, is therefore potentially capable of controlling its own concentration in the cell. ShuS has also been demonstrated to bind to DNA, but non-specifically. Studies with the analysis package, DNAbind, in the present work showed that PhuS and ShuS are likely to bind to DNA at different cavities, suggesting that the ability of ShuS to bind to DNA is for reasons other than to control its own expression via prrF1. DNAbind also provided good reasons as to why ChuS has not been shown to bind DNA, yet ShuS has, despite the two proteins differing in only five residues.

All but one of the mutants and homologues selected for the joint computational and experimental analyses were successfully synthesised using the protocol designed for WT HemS. In order to extract ShuS from the E. coli cells it was grown in, the protocol had to be adapted to account for possible aggregation with DNA. All of the mutants and homologues were demonstrated to bind haem similarly to WT HemS, and to catalyse its reductive breakdown by NADH to give the same intermediate species and final HBP.
All of the mutants were shown to deleteriously affect the rate of the breakdown of haem, although not by as much as expected. It therefore seems that HemS is optimised to break down haem, but that it can also tolerate a wide range of mutations to its NADH-binding residues and double phe-gate. Previous studies by Sawyer had shown that mutations to the residues that bind haem at its iron centre, R102 and H196, effectively prevent haem from binding and therefore, by extension, severely limit the breakdown process. Together, these findings reinforce the conclusion derived from the bioinformatic study that these homologues have primarily evolved to bind to haem, and that they may be engaging in several functions, one of which is anaerobic, reductive degradation of the haem by NADH.

10.2 Future Work

Despite the investment of much effort and the use of a wide range of characterisation techniques, the exact structure of the HBP remains unsolved. The most promising avenue for definitively resolving this structure appears to be X-ray crystallography. Post-reaction crystals with the intact HBP still inside (as determined by their bright purple colour) can reliably be produced, and crystallographic studies have shown that the protein structures can be clarified to high resolution. The next stage of this research is to find a method of protecting the integrity of the HBP as cryoprotectant is added. A series of assays using different combinations of cryoprotectants at different concentrations may alight on a formulation that does not lead to HBP degradation. Were this approach to work, further studies using crystallography to determine the intermediate structure may be possible as well. By soaking holo-HemS crystals with NADH and freezing collections of them rapidly through a series of time-steps, it may be possible to track the evolution of haem to the HBP over time.
Depending on how much the reaction can be slowed down and how far the temporal resolution can be increased, such experiments could show exactly which atom of haem the hydride from NADH is transferred to.

As stocks have been retained of all of the homologues made, further assays could be attempted to crystallise and resolve these proteins too. This information would be particularly helpful for HmuS and ShuS, which have never been crystallised before; comparing these structures against HemS and ChuS, respectively, could shed new light on the subtle differences between these HemS/HmuS and ChuS/ShuS homologue pairs.

Marked progress has been made in developing an understanding of the kinetics of the reaction, and how it varies between the mutants and homologues. Unfortunately, neither the computational nor the experimental approach produced a full quantitative kinetic analysis. To expedite the expansion of databases, the frequencies of new stationary points were not calculated from a full consideration of the Hessian. These frequencies are not required when considering potential energies, but are important when considering free energies or actual rates. There are routines used by the Wales group that can readily calculate and add in these frequencies, although such calculations would most likely take 3-4 months to run. In terms of the experimental kinetic analysis, it is suspected that limitations to the stopped-flow technique prevented its full realisation. Although stopped-flow spectroscopy is a powerful method for tracking reactions over short timescales, the lamp can fluctuate and is difficult to normalise. For many reactions, where strong absorbances are involved, this fluctuation does not prove to be a significant issue. However, for the breakdown of haem by NADH, absorbances tend to be low, particularly for the intermediate species, making these fluctuations more prominent.
A possible way to resolve this problem would be to change the light source from a xenon arc-lamp to monochromatic LEDs. Such LEDs have been shown to be very intense and highly stable [248]. Experiments using LEDs would have to be run at specific wavelengths rather than over a range, as is possible with the xenon lamp. For experiments where the relationship between haem and the HBP is of interest, these LEDs would therefore not be suitable. They would be suitable, however, if the object of the experiment was to track the behaviour of a single compound (e.g. the intermediate at 806 nm). Another possible reason that curve fitting did not prove possible is that the assumed mechanism is not actually correct. There could, for example, be further, silent intermediate species and/or alternative, competing pathways. A deeper understanding of the mechanism, which could possibly be achieved by the crystallography experiments described above, could lead to a model being developed which would allow for more accurate simulation and fitting to the data.

To further understand the NADH-binding site, alternative mutations and combinations thereof could be applied. Mutating R250 and Q313 could prove to be interesting, as these residues are both implicated in binding to the NADH phosphate backbone, and are therefore possibly important in stabilising the molecule as it unfolds.

The comparison between homologues could also be expanded. Those chosen (HmuS, ChuS and ShuS) all had at least 66% identity with HemS. By extending this study to more distant homologues, the juncture at which haem breakdown ceases (if indeed that does happen) could be pinpointed, and the combination of residues required to bring this change about could perhaps even be identified. PhuS would be a sensible homologue to start with, considering that this protein has already been studied in detail by other groups.
It would also be interesting to study homologues from non-pathogenic bacteria to determine whether any of these haemoproteins can catalyse the haem breakdown reaction.

PhuS, together with ShuS, could be subjected to DNA-binding assays to test the hypotheses suggested by the bioinformatic study. HemS, HmuS and ChuS should also be subjected to these assays to determine definitively whether these proteins are indeed not able to bind DNA. Attempts could be made to crystallise PhuS bound to the prrF1 promoter. Provided it does not aggregate, crystallisation could also be attempted on ShuS bound to certain DNA sequences. The structures obtained could then be compared against apo-forms or those with haem bound instead, in order to determine what kind of changes, if any, DNA-binding causes.

To further understand the role of these S proteins within their biological contexts, protein-protein interactions could be investigated by pull-down assays. By running these assays under both haem-rich and haem-limiting conditions, it may even be possible to determine if such interactions with the S proteins are dependent upon the presence of haem or not. A series of gene knockouts could also be applied to the hem operon before attempting to grow the parent organism under iron-replete and iron-limiting conditions. The working hypothesis of this thesis is that the catalytic constant of HemS is low due to the breakdown reaction being limited by product inhibition, and that this inhibition may be due to the absence of a haemoprotein which can capture the HBP from HemS and transport it to another location. Pull-down assays and gene knockout studies provide a promising avenue for confirming/disproving this hypothesis.

The most immediate concern of future computational work on this project should be to complete the refinement of the HmuS, ChuS and ShuS databases.
Refinement of the HmuS database would allow for a more detailed comparison with HemS to be made, whereas refinement of the ChuS and ShuS databases may cause alternative funnels to emerge and therefore reveal more about possible alternative functions. These databases could then be subjected to detailed kinetic studies with a full consideration of the frequencies too.

If checkspmutate could be made more robust to comprehensive changes to the protein sequence, then a wider range of homologues could be studied. Attempts to transform HemS to PhuS could be revisited. These two proteins have 42.6% sequence identity (Table A.4), and none of the transformed stationary points could be successfully reoptimised during the course of the project. It is thought that this rate of failure was due to three related reasons: firstly, the large number of residues along the sequence being mutated increased the likelihood that at least one would lead to steric clashes; secondly, five residues, including two pairs, had to be added at various points in the sequence, which significantly disrupts those local environments and also leads to increased likelihood of steric clashes; and thirdly, the main cavity of PhuS has a larger volume than that of HemS, and so the cavity as found in the HemS stationary points may not be compatible when these stationary points are set to use PhuS residues. A possible solution to these issues would be to expand the main cavity to give a buried volume akin to that found experimentally with PhuS. This expansion would be implemented after the transformation of the sequence, but before any attempt to optimise the new structures. The expansion would also be required to preserve the conformations of the residues in the chains that define the cavity. It could be achieved by artificially extending the bond lengths making up the protein backbone that surround the cavity. This change would need to be done in very small increments.
On the assumption that such an expansion would ultimately yield a structure with no steric clashes between the residues in the main cavity, and that none had been introduced to other parts of the protein in the meantime, optimisation could then be performed, which would relax these backbone bonds, bringing them back to their original lengths. Such a procedure would increase the likelihood that individual residue conformations would change significantly during the optimisation and so the stationary point of the transformed PhuS protein would not necessarily be a faithful replication of its HemS equivalent. This sacrifice may just be necessary to achieve a reasonable number of successfully reoptimised stationary points for proteins with low % sequence homology. Thorough refinement of the new database would be essential.

Rather than changing the protein, extra focus could also be given to transforming the ligands. This feature is already possible with checkspmutate, with tests having been carried out on changing NADH to NADPH and NAD+, respectively. However, care must be taken to ensure that the new ligand shares some features with the ligand found in the template. Otherwise, the template becomes largely redundant, as the new ligand is more likely to bind to the protein via different residues and in different conformational arrangements.

The computational approach could also be expanded to investigate the haem breakdown mechanism itself. Calculations in this thesis were concerned primarily with the approach of NADH to haem, and the way these two ligands orientate themselves prior to hydride transfer. However, using QM/MM, proposed intermediate and product structures could be tested. There is still uncertainty over which atom in haem the transferred hydride bonds to. A reasonable assumption is that it attaches to the β-meso-carbon of haem, as one of the bonds this carbon is engaged in must break to open up the tetrapyrrole.
However, the lowest energy minima in most of the databases studied featured a hydride within 3 Å of the haem C5 methyl. Comparing the stabilities of molecules where hydride has attached to either of these positions could therefore provide clues as to where the hydride transfer occurs. QM/MM calculations are typically computationally expensive and so starting parameters for intermediate and product candidate structures should be chosen carefully, ideally only after accurate resolution of the HBP and, if possible, the 806 nm intermediate by X-ray crystallography. These calculations could then shed light on the molecular orbitals involved in hydride transfer and possible subsequent sigmatropic rearrangements, as well as giving an indication of the kinetic barriers involved in this haem breakdown process. Detailed QM/MM and QM-cluster calculations have already been carried out by a group at Shahid Beheshti University together with Ulf Ryde at Lund University, which show the breakdown of haem via the ‘canonical HO mechanism’ to verdohaem and biliverdin [249–252]. Their approach has proved to be in good agreement with experiment, and could perhaps be adapted to investigate this novel reductive haem breakdown process as well.

10.3 A Broader View

In July of this year, two ground-breaking papers were published in Nature by the developers of AlphaFold, an artificial intelligence program managed by Google’s DeepMind which uses deep learning to predict protein structure [253, 254]. The same month, a further paper was published in Science by the developers of an alternative protein-prediction tool, known as RoseTTAFold [255]. Together, these papers showed that protein structures can regularly be predicted to atomic levels of accuracy, even when there are no homologous structures available from experiment. Structure prediction is also scalable, and this feature was exploited using AlphaFold to predict 98.5% of the entire human proteome.
This achievement is remarkable, especially when it is considered that it has taken many decades of diligent work for experimentalists to structurally determine 17% of the total number of residues in human protein sequences [256]. The ability to predict protein structures so quickly and to such consistently high levels of accuracy constitutes a significant advancement in the field of proteomics. It is not difficult to see how an accurate knowledge of the protein structures making up the human proteome will provide valuable insight for pharmaceutical and medical researchers.

However, to gain a true understanding of the human proteome, or of the proteomes of other species, a deeper comprehension of protein function is also required. As shown by a combination of experimental, computational and bioinformatic research in this thesis, homologues with very similar structures can engage in a variety of different functions. Care must therefore be taken by the scientific community going forward not to assign protein function based on structural homologies alone.

Thorough experimentation in the laboratory is without a doubt the gold standard approach to elucidating protein function. However, given constraints on both time and cost, computational research should not be discounted as an increasingly capable alternative. The methods used by the Wales group have been proven to accurately predict protein folding pathways for a wide variety of protein types and in a broad range of simulated environments. Whilst there has not been much atomistic protein-ligand research done in the group, it is hoped that this research on HemS and its homologues demonstrates that this avenue is both feasible and capable of faithfully replicating or predicting features identified by experiment.
The checkspmutate subroutine, in particular, could hopefully prove to be useful in studying other protein-ligand problems, where a knowledge of the effects of mutations or a comparison between closely-related homologues is required. The ‘template-based’ approach for seeding and growing new databases was very much a proof-of-principle in this thesis to determine how well this method reflected findings in the laboratory. In the future, it is hoped that it may be possible to use such a method to effectively screen certain mutations to determine whether they give feasible proteins with interesting properties, before committing to full experimental studies.

Recent advances in artificial intelligence, such as those achieved by AlphaFold and RoseTTAFold, are helping to rapidly expand the collective knowledge of the genomic and proteomic sciences. With such knowledge, it is important to develop a detailed understanding of the accumulated data, one aspect of which is being able to relate protein structure to function accurately. Bench-top experimentation, computational simulation and bioinformatic analysis have all been demonstrated to be important means of effecting such development, particularly when used together. The 21st century has indeed proved to be the ‘century of biology’ thus far, and Nature still has many secrets to share.

Appendix A

% Homologies of Operon Proteins

There are both similarities and differences between the five operons of interest in this report. A simple, but informative, approach to determining the degree of similarity of proteins is to compare their sequences.

To make these comparisons, particular genomes from each of the five types of bacteria were considered. The Accession Numbers for these genomes are listed in Table A.1. The annotations which came with these were often incomplete or inconsistent with the naming conventions used in this report. Therefore, RefSeq (NCBI Reference Sequence) accession numbers are provided in Table A.2.
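As an illustration of the kind of sequence comparison described above, the following minimal Python sketch computes a percentage identity from two sequences that have already been aligned. It is not the procedure used to produce the tables in this appendix: the function name, the gap character and the convention of dividing by the number of aligned columns are all assumptions (other conventions divide by the shorter sequence length), and the two short sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns holding the same residue in both sequences.

    Assumes the two strings come from a pairwise alignment (equal length,
    gaps marked with `gap`). Dividing by the number of aligned columns is
    one convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Discard columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are identical.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical eight-column alignment, for illustration only.
toy_a = "MSKSIYEQ"
toy_b = "MSK-IYDQ"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")  # 6 of 8 columns match
```

A percentage similarity would be computed in the same way, except that a column would also count as a match when a substitution matrix scores the two residues as similar.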
ChuW, ChuX and ChuY were discovered to be an integral part of the chu operon. In contrast, the roles of HemW, HemX and HemY in hem are less clear. Complicating matters is the fact that there are other proteins coded for in the Y. enterocolitica, Y. pestis, E. coli, S. dysenteriae and P. aeruginosa genomes that are commonly given the appellations HemW, HemX and HemY as well. These genes are far away from the respective hem/hmu/chu/shu/phu operons, suggesting they are part of alternative haem-uptake mechanisms available to these bacteria.

Using these reference sequences, percentage homologies between the proteins could be determined. These results are split into individual Tables A.3 to A.13.

Bacterium                    Accession Number
Yersinia enterocolitica      NC_017564.1
Yersinia pestis              NC_017168.1
Escherichia coli O157:H7     NC_002695.2
Shigella dysenteriae         CP000034.1
Pseudomonas aeruginosa       NC_002516.2

Table A.1: Accession numbers which provide the bacterial proteomes used to select the proteins and their sequences for the derivations of % homologies.

XxxX             Y. enterocolitica   Y. pestis        E. coli        S. dysenteriae   P. aeruginosa
XxxP             HemP                HmuP             ChuP           ShuP             PhuP
                 WP_005181153.1      No RefSeq        N/A            N/A              N/A
XxxR             HemR                HmuR             ChuR           ShuR             PhuR
                 WP_005181150.1      WP_002209062.1   NP_312407.1    WP_000089574.1   NP_253398.1
XxxS             HemS                HmuS             ChuS           ShuS             PhuS
                 WP_005156541.1      WP_002209061.1   NP_312406.1    WP_001017208.1   NP_253397.1
XxxT             HemT                HmuT             ChuT           ShuT             PhuT
                 WP_005156544.1      WP_002209060.1   NP_312409.3    WP_001081846.1   NP_253396.1
XxxU             HemU                HmuU             ChuU           ShuU             PhuU
                 WP_005156547.1      WP_002209059.1   NP_312413.1    WP_005019013.1   NP_253395.1
XxxV             HemV                HmuV             ChuV           ShuV             PhuV
                 WP_005156550.1      WP_002209058.1   NP_312414.1    WP_001626196.1   NP_253394.1
XxxW (Operon)    HemW′               HmuW             ChuW           ShuW             PhuW
                 WP_005156531.1      WP_002209066.1   NP_312410.1    No RefSeq        N/A
XxxX (Operon)    HemX′               HmuX             ChuX           ShuX             PhuX
                 WP_005156534.1      WP_002209065.1   NP_312411.1    WP_000020038.1   N/A
XxxY (Operon)    HemY′               HmuY             ChuY           ShuY             PhuY
                 WP_013649113.1      WP_002209064.1   NP_312412.1    WP_000189360.1   N/A
XxxW (Other)     HemW                HemW             HemW           HemW             HemW
                 WP_013649113.1      WP_002209064.1   NP_312412.1    WP_000239935.1   NP_249077.1
XxxX (Other)     HemX                HemX             HemX           HemX             HemX
                 WP_005166100.1      WP_002211463.1   NP_312760.1    WP_000138987.1   NP_253945.1
XxxY (Other)     HemY                HemY             HemY           HemY             HemY
                 WP_005166099.1      WP_002211462.1   NP_312759.1    WP_000921781.1   NP_253944.1

Table A.2: Accession numbers for each of the protein sequences used in the derivation of % homologies. The W, X and Y proteins corresponding to those found in the hem operon are labelled with a dash, ′. The W, X and Y proteins annotated (Other) correspond to those found outside of the hem, hmu, chu, shu or phu operons.

% Identity (upper triangle)
        HemR    HmuR    ChuR    ShuR    PhuR
HemR    –       84.7    66.3    66.1    24.8
HmuR    91.7    –       68.0    67.7    24.3
ChuR    79.4    81.0    –       99.5    25.7
ShuR    79.4    81.0    99.8    –       25.4
PhuR    38.0    38.3    39.9    39.8    –
% Similarity (lower triangle)

Table A.3: XxxR Homologies.

% Identity (upper triangle)
        HemS    HmuS    ChuS    ShuS    PhuS
HemS    –       89.6    66.8    66.2    42.6
HmuS    94.8    –       67.1    66.5    43.8
ChuS    78.2    78.8    –       98.5    41.1
ShuS    77.9    78.5    98.5    –       40.8
PhuS    56.5    58.3    56.8    56.5    –
% Similarity (lower triangle)

Table A.4: XxxS Homologies.
% Identity (upper triangle)
        HemT    HmuT    ChuT    ShuT    PhuT
HemT    –       90.7    36.2    36.6    29.1
HmuT    94.3    –       34.8    35.2    29.2
ChuT    55.9    57.0    –       97.4    36.1
ShuT    55.5    56.6    99.3    –       36.5
PhuT    33.1    32.2    58.0    58.0    –
% Similarity (lower triangle)

Table A.5: XxxT Homologies.

% Identity (upper triangle)
        HemU    HmuU    ChuU    ShuU    PhuU
HemU    –       93.1    66.9    66.6    49.2
HmuU    97.0    –       67.5    67.8    48.6
ChuU    81.1    80.4    –       99.4    48.8
ShuU    80.4    80.4    99.4    –       48.8
PhuU    65.2    64.4    67.8    67.8    –
% Similarity (lower triangle)

Table A.6: XxxU Homologies.

% Identity (upper triangle)
        HemV    HmuV    ChuV    ShuV    PhuV
HemV    –       90.3    57.9    58.7    45.5
HmuV    95.7    –       59.2    60.0    44.0
ChuV    72.0    73.3    –       98.4    41.2
ShuV    71.7    72.9    99.6    –       41.2
PhuV    58.4    57.6    57.6    57.6    –
% Similarity (lower triangle)

Table A.7: XxxV Homologies.

% Identity (upper triangle)
        HemW′   HmuW    ChuW    ShuW    PhuW
HemW′   –       93.1    58.5    x       x
HmuW    97.5    –       57.8    x       x
ChuW    72.0    71.8    –       x       x
ShuW    x       x       x       –       x
PhuW    x       x       x       x       –
% Similarity (lower triangle)

Table A.8: XxxW (Operon) Homologies.

% Identity (upper triangle)
        HemX′   HmuX    ChuX    ShuX    PhuX
HemX′   –       87.6    60.4    59.1    x
HmuX    91.7    –       59.4    58.1    x
ChuX    77.4    76.9    –       98.2    x
ShuX    76.1    75.6    98.2    –       x
PhuX    x       x       x       x       –
% Similarity (lower triangle)

Table A.9: XxxX (Operon) Homologies.

% Identity (upper triangle)
        HemY′   HmuY    ChuY    ShuY    PhuY
HemY′   –       85.2    55.5    55.9    x
HmuY    92.6    –       55.9    55.9    x
ChuY    67.8    68.2    –       97.1    x
ShuY    68.2    68.7    99.0    –       x
PhuY    x       x       x       x       –
% Similarity (lower triangle)

Table A.10: XxxY (Operon) Homologies.

% Identity (upper triangle)
           Y. ent.   Y. pest.   E. coli   S. dys.   P. aer.
Y. ent.    –         95.2       83.2      83.2      55.1
Y. pest.   97.1      –          81.4      81.4      54.0
E. coli    89.4      88.8       –         98.9      55.3
S. dys.    89.4      88.8       99.2      –         55.3
P. aer.    71.0      69.9       69.7      69.4      –
% Similarity (lower triangle)

Table A.11: HemW (Non-Operon) Homologies.

% Identity (upper triangle)
           Y. ent.   Y. pest.   E. coli   S. dys.   P. aer.
Y. ent.    –         90.9       68.0      68.3      27.1
Y. pest.   94.7      –          68.1      68.1      26.8
E. coli    81.6      81.2       –         97.7      26.5
S. dys.    81.6      80.9       98.0      –         26.2
P. aer.    44.5      44.5       47.0      46.2      –
% Similarity (lower triangle)

Table A.12: HemX (Non-Operon) Homologies.

% Identity (upper triangle)
           Y. ent.   Y. pest.   E. coli   S. dys.   P. aer.
Y. ent.    –         92.0       70.9      70.6      26.9
Y. pest.   95.5      –          70.6      70.4      26.9
E. coli    86.3      84.6       –         99.7      25.1
S. dys.    86.1      84.3       99.7      –         25.1
P. aer.    49.6      49.1       48.0      47.7      –
% Similarity (lower triangle)

Table A.13: HemY (Non-Operon) Homologies.

Appendix B

Gene / Protein Sequences

DNA Sequences

Each line of data spans 50 bases. Red boxes denote areas of differentiation between the gene used in this thesis and that quoted in the literature – for HemS, this is European Nucleotide Archive (ENA) entry CAA54865.1, from Stojiljkovic & Hantke [37]. Table B.1 catalogues these differences. Formatting for sequences is from the dnaseq package [257].

HmuS, ChuS and ShuS inserts were bought in commercial plasmids from Thermo Fisher Invitrogen GeneArt. Following reconstitution into an appropriate plasmid (pET11d), each insert was sequenced. These sequences were then compared against those quoted in the literature (specifically, ENA entry AAC64867.1 [78] for HmuS, BAB37802.1 [258] for ChuS and AAC27810.1 [259] for ShuS), with any differences also highlighted in red.

Codon        Lit. → Thesis   Δ to Residue
241-243      tac → tat       TYR → TYR
484-486      att → aat       ILE → ASN
493-495      ttg → tta       LEU → LEU
694-696      ttg → tta       LEU → LEU
997-999      gac → gag       ASP → GLU
1000-1002    gaa → caa       GLU → GLN

Table B.1: Codon differences between apo-HemS as quoted in the literature versus those used in this study.
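The entries in Table B.1 can be cross-checked with a short script. The sketch below, which is illustrative only and not part of the workflow used in this thesis, translates each pair of codons with the standard genetic code and converts the position of the first base of each codon into a residue index. The variable and function names are assumptions introduced for this example; the codon data are copied from Table B.1.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order
# (third base varying fastest).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# (first base of codon, literature codon, thesis codon), from Table B.1.
CODON_DIFFERENCES = [
    (241, "tac", "tat"),
    (484, "att", "aat"),
    (493, "ttg", "tta"),
    (694, "ttg", "tta"),
    (997, "gac", "gag"),
    (1000, "gaa", "caa"),
]


def residue_index(first_base: int) -> int:
    """Convert the 1-based position of a codon's first base to a residue index."""
    return (first_base - 1) // 3 + 1


def classify(differences):
    """Return (residue index, literature residue, thesis residue, is_silent)."""
    rows = []
    for first_base, lit_codon, thesis_codon in differences:
        lit_aa = CODON_TABLE[lit_codon]
        thesis_aa = CODON_TABLE[thesis_codon]
        rows.append((residue_index(first_base), lit_aa, thesis_aa,
                     lit_aa == thesis_aa))
    return rows


rows = classify(CODON_DIFFERENCES)
for residue, lit_aa, thesis_aa, silent in rows:
    label = "silent" if silent else "residue change"
    print(f"residue {residue}: {lit_aa} -> {thesis_aa} ({label})")
```

Under these assumptions, three of the six codon differences are silent and three change the encoded residue, at positions 162, 333 and 334.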
WT HemS – DNA Sequence

1 atgagcaaat caatatacga gcagtatcta caagctaaag cagataatcc
51 gggcaaatat gcgcgcgatt tggccacgct gatggggatt tcagaagcgg
101 aactgaccca tagccgcgtt agtcatgatg ccaaacgtct gaaaggtgat
151 gcccgcgcac tactggccgc attggaagct gtcggtgagg tcaaagctat
201 cacccgcaac acctatgccg tacatgagca aatgggccgt tatgaaaatc
251 aacatctgaa tggccatgct ggtttgatcc tcaatccacg caatttagat
301 ttacgcctgt tcctcaacca gtgggccagc gcattcacgc tgacagaaga
351 aactcgccac ggtgtacgcc atagcatcca gtttttcgac catcaaggcg
401 atgctctgca taaagtgtat gtcactgaac aaactgacat gccagcctgg
451 gaagcgctac tggcgcagtt tatcaccaca gaaaatccag agttacagct
501 agagccactg agcgcacctg aagtcactga accgacagcc accgatgaag
551 ctgtcgatgc tgaatggcgt gctatgactg acgtgcatca gttcttccag
601 ttgctcaaac gcaataattt gacccgtcag caagccttcc gtgccgtggg
651 taatgatctg gcttatcagg ttgataacag ttctctgacc cagttactga
701 acattgctca gcaagaacag aatgaaatca tgatttttgt gggtaaccgt
751 ggctgtgtac aaatattcac cggcatgatt gaaaaggtta caccacatca
801 agattggatt aatgttttca accagcgctt cacgctgcat ctgattgaaa
851 caacgattgc tgaaagctgg attacccgca agccaacaaa agacggtttc
901 gtgaccagtt tggaactgtt tgctgctgat ggcacccaaa ttgcacaact
951 ttacggtcag cgcaccgaag gccagccaga acaaacgcaa tggcgtgagc
1001 aaattgctcg cctcaataat aaggatatcg ccgcatga

HmuS – DNA Sequence

1 ---aacgcat cattatacca acaatatgta caggctaaag cagagcaccc
51 tggcaaatat gcccgtgatt tagccaccct gatggggatt tcagaagcag
101 agctgaccca tagccgcgtc gggcatgatg caaaacgttt acaaagtgat
151 gctcgtgcat tattggccgc attggaatcc gtcggcgaag tcaaagccat
201 tacccgcaac acctatgcag ttcatgagca agtgggccgc tatgagaacc
251 aacacttaaa tggtcatgca gggttaatcc tcaatccacg cgccttggac
301 ctccggttat tcctgaatca gtgggcaagc gcctttacac tgaccgaaga
351 gacccgccac ggcgtgcgcc atagcatcca atttttcgac catcagggcg
401 atgcattaca caaagtgtat gtgacagaac agacagatat gtctgcctgg
451 gaagccttgc tggcacaatt tatcatcccg gaaaacccgg cattgcagtt
501 agaacctttg agcaccccag aagcggtaga acctacagcc gatgatgcaa
551 ccgtggatag cgaatggcgt gccatgaccg atgtacacca gttcttccaa
601 ctgcttaaac gcaataatct gacccgtcag caggcgttcc gcgctgttgg
651 tgatgatctg gcttaccagg tcgataacaa ctcactgact cagctgttgc
701 acatcgccca gcaagatcag aacgagatca tgatttttgt cggcaaccgc
751 ggctgtgtac aaattttcac cggcctgatt gaaaaagtca caccacacaa
801 cgaatggatt aatgtcttca atcagcgctt tacactgcat ctgatcgaaa
851 cggccattgc cgaaagctgg atcacccgca aaccaacaaa agacggtttt
901 gtcaccagcc tagaactgtt tgctgctgat ggtactcaac ttgcccaact
951 ctacggccag cgcaccgaag ggcagccaga acaaaaccaa tggcgtgaac
1001 agattgcccg cctaatcaac aaggatatcg ccgcatga

ChuS – DNA Sequence

1 atgaaccact acacacgctg gcttgagtta aaagaacaaa atcccggaaa
51 gtacgcgcgt gacatcgcag ggttaatgaa tattagagaa gcagaactgg
101 catttgcacg cgtcacgcac gatgcgtggc ggatgcacgg cgatatccgt
151 gaaattctgg cggcgctcga aagtgttggc gaaaccaaat gtatttgtcg
201 taatgaatat gcagtccatg agcaagttgg tacgttcaca aaccagcatt
251 tgaacggaca tgccggattg atcctcaatc cgcgcgcgct ggatttacgt
301 ctgtttctca atcaatgggc cagtgttttc cacatcaaag aaaacacggc
351 tcgtggcgaa cgccagagta ttcagttctt tgatcatcag ggcgatgcat
401 tactaaaagt ttatgccacc gacaataccg atatggcggc atggagtgag
451 cttctggcac ggtttatcac cgatgagaat acgccgcttg agttaaaagc
501 cgttgatgcg ccagttgttc aaacgcgagc cgatgccact gtggtcgagc
551 aagagtggcg ggcgatgacc gacgttcatc agttttttac gttgctcaag
601 cgccacaacc tgacgcgcca acaggcgttc aatctggtgg cagacgattt
651 ggcctgcaaa gtatccaaca gtgcgttggc gcaaattctt gaatctgcac
701 agcaggatgg taatgaaatc atggtgtttg ttggcaaccg tggctgcgta
751 cagattttca ccggtgtggt agaaaaagtg gtgccaatga aaggttggct
801 gaatattttc aacccgacgt ttactcttca tctattagaa gagagcattg
851 ctgaagcctg ggttacccgt aaaccgacca gcgatggcta cgtaaccagt
901 ctggaattgt ttgcccatga tggtacgcag atagcgcaac tttatggtca
951 acgtacagaa ggcgaacagg agcaagcgca atggcgtaag caaattgctt
1001 cgctgatacc ggaaggcgtt gctgcataa

ShuS – DNA Sequence

1 ---aaccact acacacgctg gcttgagtta aaagaacaaa atcccggaaa
51 gtacgtgcgt gacatcgcag ggttaatgaa tattagagaa gcagaactgg
101 catttgcacg agtcacgcac gatgcgtggc ggatgcgcgg cgatatccgt
151 gaaattctgg cggcgctcga aagtgttggc gagaccaaat gtatttgccg
201 taatgaatat gcagtccatg agcaagttgg tgcgttcaca aaccagcatt
251 tgaatggaca tgccggattg atcctcaatc cacgcgcgct ggatttacgt
301 ctgtttctca atcaatgggc cagtgttttc cacatcaaag aaaacacggc
351 tcgtggcgaa cgccagagga ttcagttctt tgatcatcag ggcgatgcat
401 tactaaaagt ttatgccacc gacaataccg atatggcggc atggagtgag
451 cttctggcac ggtttatcac cgatgagaat atgccgcttg agttaaaagc
501 cgttgatgcg ccagttgttc aaacgcgagc cgatgccact gtggtcgagc
551 aagagtggcg agcgatgacc gacgttcatc agttttttac gttgctcaag
601 cgccacaacc tgacgcgcca acaggcgttc aatctggtgg cagacgattt
651 ggcctgcaaa gtatccaaca gtgcgttggc gcaaattctt gaatctgcac
701 agcaggatgg taatgaaatc atggtgtttg ttggcaaccg tggctgcgta
751 cagattttca ccggtgtggt agaaaaagtg gtgccaatga aaggttggct
801 gaatattttc aacccgacgt ttactcttca tctattagaa gagagcattg
851 ctgaagcctg ggttacccgt aaaccgacca gcgatggcta cgtaaccagt
901 ctggaattgt ttgcccatga tggtacgcag atagcgcaac tttatggtca
951 acgtacagaa ggcgaacagg agcaagcgca atggcgtaaa caaattgctt
1001 cgctgatacc ggaaggcgtt gctgcataa

Protein Sequences (Experimental)

For HemS, of the six different codons between the literature and the sample used in the present study, three lead to different residues being coded for. All three of these residues are situated on solvent-exposed loops, which are nowhere near the main binding pocket. It was therefore concluded that these mutations would not affect functionality, and so were not corrected. In any case, the crystal structure for HemS (2J0R) [95] corresponds with the present study rather than the ENA sequence at positions 162 (cf. codon 484-486) and 334 (cf. codon 1000-1002), respectively. In other words, there is only one deviation in residue type from the published crystal structure for apo-HemS and the WT structure used in the present study (a conversion from ASP to GLU at position 333).
In practice, the expression vector also encodes a His6-tag along with a thrombin cleavage site. Once cleaved, a glycine and a serine residue are retained at the N-terminus, attached to the first methionine residue. As they are situated at a solvent-exposed region far from the main cavity, these two residues are also thought not to significantly affect the functionality of HemS. They are not included in the sequences below in order to keep the residue indices consistent with those quoted throughout the thesis.

For each experimental protein sequence, residues highlighted in red denote differences between those used in this work and those quoted in the literature. For HemS, the published crystal structure of apo-HemS (2J0R) is used as the literature reference, whereas for HmuS, ChuS and ShuS the reference sequences are from ENA codes AAC64867.1 [78], BAB37802.1 [258] and AAC27810.1 [259], respectively. Residues highlighted in blue and detailed in Table B.2 denote differences between the experimental and computational apo-HemS sequences. Where a residue differs both from the literature and the computational sequence, it is highlighted in maroon. These are also detailed in Table B.2.

Residue Location    Experimental    Computational
1-3                 MSK             ---
197                 Q               E
333                 E               D
342-346             DIAA*           -----

Table B.2: Residue differences between the experimental and computational sequences used for apo-HemS.
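The kind of comparison summarised in Table B.2 can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the thesis: the function name `difference_table` and its `first_index` argument are hypothetical, and the two sequences are assumed to be pre-aligned to equal length, with a dash marking a missing residue. The two test fragments are taken from the experimental and computational WT HemS sequences in this appendix.

```python
def difference_table(experimental, computational, first_index=1):
    """List runs of differing positions between two pre-aligned sequences.

    Returns (location, experimental_residues, computational_residues) tuples,
    where location is a 1-based index or an inclusive range such as '1-3'.
    """
    if len(experimental) != len(computational):
        raise ValueError("sequences must be pre-aligned to equal length")
    rows, run_start = [], None
    # A trailing identical pair flushes a run that reaches the end.
    pairs = list(zip(experimental, computational)) + [("", "")]
    for i, (e, c) in enumerate(pairs):
        if e != c and run_start is None:
            run_start = i                      # a new run of differences begins
        elif e == c and run_start is not None:
            first = first_index + run_start
            last = first_index + i - 1
            label = str(first) if first == last else f"{first}-{last}"
            rows.append((label, experimental[run_start:i], computational[run_start:i]))
            run_start = None
    return rows


# N-terminal fragment: residues 1-6 (experimental) against the truncated
# computational sequence, padded with dashes.
n_terminus = difference_table("MSKSIY", "---SIY")

# Residues 191-200: the Q197/E197 difference listed in Table B.2.
around_197 = difference_table("AMTDVHQFFQ", "AMTDVHEFFQ", first_index=191)
```

Grouping consecutive differences into one row mirrors how the terminal truncations are reported as ranges in Table B.2, while isolated substitutions remain single rows.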
218 Gene / Protein Sequences WT HemS – Protein Sequence (Experimental) 1 MSKSIYEQYL QAKADNPGKY ARDLATLMGI SEAELTHSRV SHDAKRLKGD 51 ARALLAALEA VGEVKAITRN TYAVHEQMGR YENQHLNGHA GLILNPRNLD 101 LRLFLNQWAS AFTLTEETRH GVRHSIQFFD HQGDALHKVY VTEQTDMPAW 151 EALLAQFITT ENPELQLEPL SAPEVTEPTA TDEAVDAEWR AMTDVHQFFQ 201 LLKRNNLTRQ QAFRAVGNDL AYQVDNSSLT QLLNIAQQEQ NEIMIFVGNR 251 GCVQIFTGMI EKVTPHQDWI NVFNQRFTLH LIETTIAESW ITRKPTKDGF 301 VTSLELFAAD GTQIAQLYGQ RTEGQPEQTQ WREQIARLNN KDIAA* HmuS – Protein Sequence (Experimental) 1 -NASLYQQYV QAKAEHPGKY ARDLATLMGI SEAELTHSRV GHDAKRLQSD 51 ARALLAALES VGEVKAITRN TYAVHEQVGR YENQHLNGHA GLILNPRALD 101 LRLFLNQWAS AFTLTEETRH GVRHSIQFFD HQGDALHKVY VTEQTDMSAW 151 EALLAQFIIP ENPALQLEPL STPEAVEPTA DDATVDSEWR AMTDVHQFFQ 201 LLKRNNLTRQ QAFRAVGDDL AYQVDNNSLT QLLHIAQQDQ NEIMIFVGNR 251 GCVQIFTGLI EKVTPHNEWI NVFNQRFTLH LIETAIAESW ITRKPTKDGF 301 VTSLELFAAD GTQLAQLYGQ RTEGQPEQNQ WREQIARLIN KDIAA* ChuS – Protein Sequence (Experimental) 1 MNHYTRWLEL KEQNPGKYAR DIAGLMNIRE AELAFARVTH DAWRMHGDIR 51 EILAALESVG ETKCICRNEY AVHEQVGTFT NQHLNGHAGL ILNPRALDLR 101 LFLNQWASVF HIKENTARGE RQSIQFFDHQ GDALLKVYAT DNTDMAAWSE 151 LLARFITDEN TPLELKAVDA PVVQTRADAT VVEQEWRAMT DVHQFFTLLK 201 RHNLTRQQAF NLVADDLACK VSNSALAQIL ESAQQDGNEI MVFVGNRGCV 251 QIFTGVVEKV VPMKGWLNIF NPTFTLHLLE ESIAEAWVTR KPTSDGYVTS 301 LELFAHDGTQ IAQLYGQRTE GEQEQAQWRK QIASLIPEGV AA* ShuS – Protein Sequence (Experimental) 1 -NHYTRWLEL KEQNPGKYVR DIAGLMNIRE AELAFARVTH DAWRMRGDIR 51 EILAALESVG ETKCICRNEY AVHEQVGAFT NQHLNGHAGL ILNPRALDLR 101 LFLNQWASVF HIKENTARGE RQRIQFFDHQ GDALLKVYAT DNTDMAAWSE 151 LLARFITDEN MPLELKAVDA PVVQTRADAT VVEQEWRAMT DVHQFFTLLK 201 RHNLTRQQAF NLVADDLACK VSNSALAQIL ESAQQDGNEI MVFVGNRGCV 251 QIFTGVVEKV VPMKGWLNIF NPTFTLHLLE ESIAEAWVTR KPTSDGYVTS 301 LELFAHDGTQ IAQLYGQRTE GEQEQAQWRK QIASLIPEGV AA* 219 Gene / Protein Sequences Experimental/Computational Sequence Differences HemS The residues removed at the N-terminal (MSK) and C-terminal (DIAA*) regions should not unduly affect protein 
functionality as they are far from the main cavity. Furthermore, these residues do not appear in any crystal structures anyway, whether in the literature or in the work described in this thesis. Conversion of residue 333 from glutamic acid to aspartic acid, being in a solvent-exposed loop, should not have much effect. Indeed, this change is simply a reversal of the mutation at residue 333, making the computational sequence at this region more faithful to the literature 2J0P structure than the experimental sequence. Another difference is at residue 197, where the glutamine from the experimental sequence has been changed to a glutamic acid. This seems to have been a mistake inherited from Choy [21], and was not noticed until well into the project. Though this residue is situated very close to the main cavity, it is hoped that it will not significantly affect the protein functionality, because it points away from the cavity. Although glutamine is a polar, neutral residue and glutamic acid is polar and acidic, it is also hoped that their similar steric bulk will help to preserve the protein function.

HmuS
Because the computational sequence for HmuS was derived from that for HemS, some of the ‘errors’ from HemS were carried over during the checkspmutate routine. Namely, the computational sequence for HmuS has some residues removed from the N-terminal region (NA) and the C-terminal region (DIAA*) with respect to the experimental sequence. Being part of outer loops, these differences should have little bearing on protein functionality. Also, E197 replaces Q197, as in HemS.

ChuS
The computational sequence for ChuS has some residues removed from the N-terminal region (M) and the C-terminal region (EGVAA*) with respect to the experimental sequence. Being part of outer loops, these differences should have little bearing on protein functionality.
ShuS
The computational sequence for ShuS has some residues removed from the C-terminal region (GVAA*) with respect to the experimental sequence. Being part of an outer loop, this difference should have little bearing on protein functionality. Furthermore, there is a difference at residue 327, where a glutamic acid replaces a glutamine in the computational sequence, most likely because of an error carried over from HemS during the checkspmutate routine. Being situated on an outer loop, this difference should also be of little consequence.

Sequence Transformations by checkspmutate

checkspmutate used the [WT HemS + Haem + NADH] sequence as a template. Those residues which were mutated to give one of the HemS mutants studied in this work (F104A, F104AF199A, F104I, F199A, R209A, R209K and Q210A) are highlighted in brown in the WT sequence.

For the HmuS, ChuS, ShuS and PhuS sequences, the residues which had to be mutated from the HemS template are highlighted in red. When a residue was removed, this is highlighted in blue, with a dash used as a placeholder for the missing residue. When a residue was inserted, this is highlighted in green; a dash was also used as a placeholder where these residues were missing from the other homologues.
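The mutant labels quoted above (for example F104A or the double mutant F104AF199A) encode the wild-type residue, its index and the replacement residue. The sketch below shows how such labels could be parsed and applied to a one-letter sequence. It covers sequence-level bookkeeping only and is not the checkspmutate routine, which mutates and reoptimises three-dimensional stationary points; the function names and the `first_index` argument are hypothetical. The fragment used is residues 101-110 of the experimental WT HemS sequence given earlier in this appendix.

```python
import re

# One point mutation: wild-type letter, residue index, new letter (e.g. F104A).
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutations(label):
    """Split a label such as 'F104AF199A' into (wild_type, position, new) tuples."""
    matches = list(MUTATION.finditer(label))
    # Reject labels with leftover characters between or around the matches.
    if not matches or "".join(m.group(0) for m in matches) != label:
        raise ValueError(f"cannot parse mutation label: {label!r}")
    return [(m.group(1), int(m.group(2)), m.group(3)) for m in matches]


def apply_mutations(sequence, label, first_index=1):
    """Return a copy of `sequence` with every mutation in `label` applied.

    `first_index` is the residue number of sequence[0], so that fragments can
    be used with the numbering quoted in the text.
    """
    residues = list(sequence)
    for wild_type, position, new in parse_mutations(label):
        offset = position - first_index
        if not 0 <= offset < len(residues):
            raise IndexError(f"residue {position} lies outside the sequence")
        if residues[offset] != wild_type:
            raise ValueError(
                f"expected {wild_type} at {position}, found {residues[offset]}"
            )
        residues[offset] = new
    return "".join(residues)


fragment = "LRLFLNQWAS"  # residues 101-110 of experimental WT HemS
f104a = apply_mutations(fragment, "F104A", first_index=101)
double_mutant = parse_mutations("F104AF199A")
```

Checking the wild-type letter before substituting guards against an index offset between two numbering schemes, which is the kind of discrepancy discussed for the experimental and computational sequences above.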
WT HemS Sequence (Computational) 1 SIYEQYLQAK ADNPGKYARD LATLMGISEA ELTHSRVSHD AKRLKGDARA 51 LLAALEAVGE VKAITRNTYA VHEQMGRYEN QHL--NGHAG LILNPRNLDL 101 RLFLNQWASA FTLTEETRHG VRHSIQFFDH QGDALHKVYV TEQTDMPAWE 151 ALLAQFITTE NPELQ-LEP- -LSAPEVTEP TATDEAVDAE WRAMTDVHEF 201 FQLLKRNNLT RQQAFRAVGN DLAYQVDNSS LTQLLNIAQQ EQNEIMIFVG 251 NRGCVQIFTG MIEKVTPHQD WINVFNQRFT LHLIETTIAE SWITRKPTKD 301 GFVTSLELFA ADGTQIAQLY GQRTEGQPEQ TQWRDQIARL NNK HmuS Sequence (Computational) 1 SLYQQYVQAK AEHPGKYARD LATLMGISEA ELTHSRVGHD AKRLQSDARA 51 LLAALESVGE VKAITRNTYA VHEQVGRYEN QHL--NGHAG LILNPRALDL 101 RLFLNQWASA FTLTEETRHG VRHSIQFFDH QGDALHKVYV TEQTDMSAWE 151 ALLAQFIIPE NPALQ-LEP- -LSTPEAVEP TADDATVDSE WRAMTDVHEF 201 FQLLKRNNLT RQQAFRAVGD DLAYQVDNNS LTQLLHIAQQ DQNEIMIFVG 251 NRGCVQIFTG LIEKVTPHNE WINVFNQRFT LHLIETAIAE SWITRKPTKD 301 GFVTSLELFA ADGTQLAQLY GQRTEGQPEQ NQWREQIARL INK 221 Gene / Protein Sequences ChuS Sequence (Computational) 1 NHYTRWLELK EQNPGKYARD IAGLMNIREA ELAFARVTHD AWRMHGDIRE 51 ILAALESVGE TKCICRNEYA VHEQVGTFTN QHL--NGHAG LILNPRALDL 101 RLFLNQWASV FHIKENTARG ERQSIQFFDH QGDALLKVYA TDNTDMAAWS 151 ELLARFITDE NTPLE-LKA- -VDAPVVQT- RADATVVEQE WRAMTDVHQF 201 FTLLKRHNLT RQQAFNLVAD DLACKVSNSA LAQILESAQQ DGNEIMVFVG 251 NRGCVQIFTG VVEKVVPMKG WLNIFNPTFT LHLLEESIAE AWVTRKPTSD 301 GYVTSLELFA HDGTQIAQLY GQRTEGEQEQ AQWRKQIASL IP- ShuS Sequence (Computational) 1 NHYTRWLELK EQNPGKYVRD IAGLMNIREA ELAFARVTHD AWRMRGDIRE 51 ILAALESVGE TKCICRNEYA VHEQVGAFTN QHL--NGHAG LILNPRALDL 101 RLFLNQWASV FHIKENTARG ERQRIQFFDH QGDALLKVYA TDNTDMAAWS 151 ELLARFITDE NMPLE-LKA- -VDAPVVQT- RADATVVEQE WRAMTDVHQF 201 FTLLKRHNLT RQQAFNLVAD DLACKVSNSA LAQILESAQQ DGNEIMVFVG 251 NRGCVQIFTG VVEKVVPMKG WLNIFNPTFT LHLLEESIAE AWVTRKPTSD 301 GYVTSLELFA HDGTQIAQLY GQRTEGEQEQ AEWRKQIASL IPE PhuS Sequence (Computational) 1 ELYRAWQDLR AERPQLRARD AAALLQVSEG ELVASRVGID AVRLRPDWAA 51 LLPALGELGP IMALTRNEHC VHERKGPYRE VTVSANGQMG LVVSP-DIDL 101 RLFLGGWNAV FAIAEETARG TQRSIQVFDQ QGVAVHKVFL AEASDVRAWE 151 PLVERLRAAE QDAVLALHEP 
RAPAAALVDA QIDAAALREG WAALKDTHHF 201 HALLKKHGAQ RTQALRLAGG EWAERLDNGD LAKLFEAAAE SGLPIMVFVG 251 NAHCIQIHTG PVCNLKWLDD WFNVLDPEFN LHLKTTGIAE LWRVRKPSTD 301 GIVTSWEAFD PDGELIVQLF GARKPGEPER DDWRELAESF KAL

HemS Mutants

The primers used for point-mutations were as follows. Sequences are all shown from 5′ to 3′, and mutagenic bases are capitalised.

F104A
Forward Sequence: agatttacgcctgGCcctcaaccagtggg
Reverse Complement: cccactggttgaggGCcaggcgtaaatct

F104I
Forward Sequence: agatttacgcctgAtcctcaaccagtgggc
Reverse Complement: gcccactggttgaggaTcaggcgtaaatct

F199A
Forward Sequence: gactgacgtgcatcagttcGCccagttgctcaaacgc
Reverse Complement: gcgtttgagcaactggGCgaactgatgcacgtcagtc

R209A
Forward Sequence: cgcaataatttgaccGCtcagcaagccttccg
Reverse Complement: cggaaggcttgctgaGCggtcaaattattgcg

R209K
Forward Sequence: cgcaataatttgaccAAtcagcaagccttccg
Reverse Complement: cggaaggcttgctgaTTggtcaaattattgcg

Q210A
Forward Sequence: cgcaataatttgacccgtGCgcaagccttccg
Reverse Complement: cggaaggcttgcGCacgggtcaaattattgcg

Appendix C
Computing Free Energies

As discussed in Section 1.6.3, free energies can be derived from the underlying PES. This analysis is achieved by dividing the PES into catchment basins and applying the superposition approach [260], which defines the canonical partition function, $Z(T)$, and density of states, $\Omega(E)$, as the summation of individual contributions from each minimum, $i$, thus giving [133, 134, 261, 262],

$$Z(T) = \sum_i Z_i(T) \quad \text{and} \quad \Omega(E) = \sum_i \Omega_i(E). \tag{C.1}$$

Meanwhile, the vibrational partition function of each local minimum can be estimated using the harmonic approximation (which should be valid as protein folding and protein-ligand problems do not typically involve covalent bond breaking/making events).
Therefore, the partition function for each local minimum can be expressed as,

$$Z_i(T) = \frac{n_i e^{-\beta V_i}}{(\beta h \nu_i)^{\chi}}, \tag{C.2}$$

where $n_i$ is the number of distinct permutational isomers in minimum $i$, $V_i$ is the potential energy for minimum $i$, $\nu_i$ its geometric mean vibrational frequency and $\chi$ the number of vibrational degrees of freedom. $\beta$ is simply $1/k_B T$, where $k_B$ is the Boltzmann constant and $T$ a user-defined temperature in K. The free energy of each minimum $i$, $\mathrm{FE}_i(T)$, can then be calculated using,

$$\mathrm{FE}_i(T) = -k_B T \ln Z_i(T). \tag{C.3}$$

Eq. (C.1) can then be used to determine the total canonical partition function. Overall, therefore, this formulation encapsulates the equilibrium occupation probability for each minimum as a function of temperature, and is explicitly ergodic [133, 263].

From this framework, many other interesting properties can be derived. For example, heat capacities can be generated by differentiating the total partition function as follows,

$$C_H(T) = \frac{1}{k_B T^2} \frac{\partial^2 \ln Z(T)}{\partial \beta^2}, \tag{C.4}$$

giving,

$$C_H(T) = \chi k_B + k_B T^2 \sum_i g_i(T) \left( \frac{\partial \ln P_i(T)}{\partial T} \right), \tag{C.5}$$

where $P_i(T)$ is the occupation probability of minimum $i$ at temperature $T$, expressed as,

$$P_i(T) = \frac{n_i e^{-\beta V_i} / \nu_i^{\chi}}{\sum_{\gamma} n_{\gamma} e^{-\beta V_{\gamma}} / \nu_{\gamma}^{\chi}}, \tag{C.6}$$

and $g_i(T)$ is the change in occupation probability given by,

$$g_i(T) = \frac{\partial P_i(T)}{\partial T}. \tag{C.7}$$

A maximum in the sum in Eq. (C.5) corresponds to a maximum in the heat capacity. $g_i(T)$ and $\partial \ln P_i(T) / \partial T$ both necessarily have the same sign and so such peaks can be interpreted in terms of contributions from local minima with positive and negative temperature derivatives. Therefore, for a decomposition according to the normal mode approximation, this analysis can reveal which local minima are responsible for any given heat capacity feature [263].

As well as heat capacity curves, it is possible to use vibrational densities of states to investigate kinetic properties.
This approach is based on the principle that the coupling between all possible individual transitions will yield a description of the overall transition rates between particular states in a system [133, 134]. Therefore, overall kinetic properties can be derived from a combination of unimolecular rate theory and master equation dynamics.

Transition state theory (TST) [264–266] is the unimolecular rate theory implemented in the Wales group program, pathsample [231]. From this theory, the unimolecular rate constant through a transition state $\dagger$ from minimum $i$ at temperature $T$ is given by,

$$k_i^{\dagger}(T) = \frac{k_B T}{h} \frac{Z^{\dagger}}{Z_i} e^{-\beta \Delta V_i^{\dagger}}, \tag{C.8}$$

where $\Delta V_i^{\dagger}$ is the difference in energy between the minimum $i$ and TS $\dagger$, and $Z^{\dagger}$ is the partition function of the TS without the degree of freedom associated with the negative eigenvalue. This formulation shows that an equilibrium between the start and end minima via the TS is not required [267, 268]. Also, Rice-Ramsperger-Kassel-Marcus (RRKM) theory [269–272] yields the same expression but from the viewpoint of reactive flux flowing through a dividing transition surface [134]. Therefore, the dynamics of the overall system can be described using a master equation if they are Markovian [273].

The master equation approach assumes that the system occupies individual minima $i$ for periods long enough so that its motion to $i+1$ can be effectively decorrelated from the previous motion describing $i-1 \rightarrow i$ [134]. Such a system is colloquially described as having no ‘memory’ between transitions, and is the signature of a Markovian system. If $\mathbf{P}(t)$ is a vector describing the occupation probability for all states at time $t$, the change in the occupation probabilities can be expressed by a master equation,

$$\frac{dP_a(t)}{dt} = \sum_{b \neq a} \left[ k_{ab} P_b(t) - k_{ba} P_a(t) \right]. \tag{C.9}$$

Here, $k_{ab}$ is the rate constant for the transition to minimum $a$ from minimum $b$ as defined from TST, and $P_a$ is the occupation probability of minimum $a$.
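Eqs. (C.6), (C.8) and (C.9) can be tied together in a small numerical sketch. The Python below is an illustration under stated assumptions, not the pathsample implementation: the three minima, two transition states and all numerical values are hypothetical, reduced units are used ($k_B T = 1$, dimensionless frequencies), every permutational factor $n_i$ is set to 1, and the master equation is propagated with a simple explicit Euler step. With harmonic partition functions, Eq. (C.8) reduces to $k = (\nu_i^{\chi} / \nu_{\dagger}^{\chi - 1}) e^{-\beta \Delta V_i^{\dagger}}$, which is the form used here.

```python
import math


def occupation_probabilities(V, nu, chi, kT):
    """Harmonic superposition probabilities, Eq. (C.6) with every n_i = 1."""
    log_w = [-Vi / kT - chi * math.log(nui) for Vi, nui in zip(V, nu)]
    shift = max(log_w)  # subtract the largest term to avoid overflow
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def harmonic_tst_rate(V_min, nu_min, V_ts, nu_ts, chi, kT):
    """Eq. (C.8) with harmonic partition functions; the TS has chi - 1 modes."""
    return nu_min**chi / nu_ts ** (chi - 1) * math.exp(-(V_ts - V_min) / kT)


def master_equation_rhs(k, p):
    """Right-hand side of Eq. (C.9); k[a][b] is the rate to a from b."""
    n = len(p)
    return [
        sum(k[a][b] * p[b] - k[b][a] * p[a] for b in range(n) if b != a)
        for a in range(n)
    ]


def propagate(k, p0, dt, steps):
    """Explicit Euler integration of the master equation (assumed scheme)."""
    p = list(p0)
    for _ in range(steps):
        dp = master_equation_rhs(k, p)
        p = [pa + dt * dpa for pa, dpa in zip(p, dp)]
    return p


# Hypothetical landscape: three minima in a chain, 0 <-> 1 <-> 2.
kT, chi = 1.0, 3
V = [0.0, 1.0, 2.0]    # potential energies of the minima
nu = [1.0, 0.8, 1.2]   # geometric mean frequencies of the minima
transition_states = {(0, 1): (3.0, 1.0), (1, 2): (3.5, 0.9)}  # (V_ts, nu_ts)

k = [[0.0] * 3 for _ in range(3)]
for (i, j), (V_ts, nu_ts) in transition_states.items():
    k[j][i] = harmonic_tst_rate(V[i], nu[i], V_ts, nu_ts, chi, kT)  # to j from i
    k[i][j] = harmonic_tst_rate(V[j], nu[j], V_ts, nu_ts, chi, kT)  # to i from j

p_eq = occupation_probabilities(V, nu, chi, kT)
p_long_time = propagate(k, [1.0, 0.0, 0.0], dt=0.05, steps=8000)
```

Starting with all of the population in minimum 0, the propagated probabilities relax towards the superposition values in `p_eq`, because the forward and reverse harmonic TST rates through each transition state share the same transition state energy and frequency. For a real landscape $\chi$ is very large, which is why the probabilities are computed from logarithms.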
To solve the master equation analytically, the transition matrix, $\mathbf{W}$, is introduced, and Eq. (C.9) can be rewritten as,

$$\frac{d\mathbf{P}(t)}{dt} = \mathbf{W}\mathbf{P}(t). \tag{C.10}$$

In a collection of minima that are all connected, a zero eigenvalue arises in $\mathbf{W}$, which corresponds to the equilibrium occupation probability vector, $\mathbf{P}^{eq}$. Detailed balance is therefore obeyed, i.e.,

$$\sum_{b \neq a} \left[ k_{ab} P_b^{eq} - k_{ba} P_a^{eq} \right] = 0. \tag{C.11}$$

Combining this result with the discrete path sampling approach (where transitions between two states, A and B, are considered) requires all of the local minima to be sorted into the A, B or intervening I sets. Considering just the A and B sets, this approach gives rise to phenomenological rate constants, which are weighted sums of the contributions from all transition states from each discrete path joining the A and B regions. These phenomenological rate constants, $k_{AB}$ and $k_{BA}$, can therefore be expressed as,

$$k_{AB} = \frac{1}{P_B^{eq}} \sum_{b \in B} \sum_{a \in A} k_{ab} P_b^{eq} \quad \text{and} \quad k_{BA} = \frac{1}{P_A^{eq}} \sum_{a \in A} \sum_{b \in B} k_{ba} P_a^{eq}. \tag{C.12}$$

For the complicated protein-ligand interactions being investigated in this thesis, however, it would be unusual for the A and B regions to be directly connected. Therefore, the I region must be included too. A steady state approximation is applied to these I minima, suggesting that the rate of change in the probability of occupation for these minima tends to zero. This result can then be applied to the master equation. However, it is also possible to relax this criterion and find non-steady state rates, using committor probabilities, $C_b^A$ and $C_a^B$, giving [134, 137, 274, 275]

$$k_{AB}^{NSS} = \frac{1}{P_B^{eq}} \sum_{a \leftarrow b} \frac{C_b^A P_b^{eq}}{t_b} \quad \text{and} \quad k_{BA}^{NSS} = \frac{1}{P_A^{eq}} \sum_{b \leftarrow a} \frac{C_a^B P_a^{eq}}{t_a}. \tag{C.13}$$

Here, $C_b^A$ denotes the probability of encountering a minimum in A before encountering one from B when following a random walk starting from minimum $b$ in B. Moreover, $t_b$ denotes the mean waiting time for a transition to any minimum in A from minimum $b$.
Whereas such waiting times are automatically zero in the steady state approximation, an estimate for them must be made in the non-steady state condition. The new graph transformation (NGT) method, as it is applied in pathsample, is one such method that can be used [138].

Should the above analysis prove too complicated for the system under study, it is also possible to apply regrouping schemes to simplify the data [147]. This regrouping is achieved by lumping minima separated by barriers beneath a user-defined threshold.

Appendix D
Specifics of Method Development

Glossary of Terms for Method Development

checkspmutate. Subroutine that mutates and reoptimises any user-specified residues for all of the stationary points from a template system.

checkspmutate Strategy. Overall strategy, which uses checkspmutate to mutate and reoptimise all of the stationary points of a template system, and then fills in the gaps using a combination of steepest-descent algorithms, DPS and, if required, connectunc lowesttest.

connectunc lowesttest. Subset of the larger connectunc subroutine that provides an efficient way to connect sub-databases, both by selecting which sub-databases to connect, and by selecting appropriate minima for connection attempts between these sub-databases.

connectunc lowesttest Strategy. Overall strategy, which uses connectunc lowesttest to connect sub-databases efficiently. It is intended for databases that contain many sub-databases far apart in conformational space.

Converge. The act of optimising/reoptimising a given stationary point to within a given RMS force threshold value. Achievement of this value indicates a stable stationary point.

Database. The overall collection of stationary points for a particular system.

Fully connected pathway. A chain of minima and TSs between two minima of interest with no intervening gaps.

Gap. Any break in a pathway between two selected minima. This could be due to missing TSs and/or minima.

Mutation. A change of the atomic makeup of a selected residue, resulting in a list of new properties and of new coordinates.

Region. A particular part of the landscape, usually used to describe the stationary points surrounding a site.

Reoptimisation. Attempt to converge a mutated system to a stable stationary point.

Site. One of the original minima identified to be of interest for a particular system. In the case of the [WT HemS + Haem + NADH] system, there were five such sites, based on the work by Choy, Shang and the author.

Sub-database (SD). A subset of a database where all of the minima and TSs are interconnected. In this work, sub-databases typically arose by sampling around a site, which provided a network of stationary points in the landscape surrounding that site. These stationary points tended to be close to one another in conformational space. As long as all of these stationary points are connected to one another, they constitute a sub-database.

System. The collection of atoms being studied, which could comprise just one molecule (e.g. a protein) or many molecules (e.g. a protein with ligands). If any of the residues in the protein are mutated, or any of the ligands changed, the resulting collection of atoms should be considered as a new system. Please note that, for easier readability, the [WT HemS + Haem + NADH] system is typically referred to in the text simply as the WT HemS system, with the inclusion of haem and NADH being implied. This shorthand is also true for the mutants and homologues of HemS.

Template. The original system studied. Its database is usually derived from more traditional sampling methods. To study a mutated system, the stationary points from this original system can be used as a basis for determining the likely coordinates of the respective stationary points of the new system.

Template-based Strategy. A strategy that uses a template system to derive others.
It is synonymous with the checkspmutate strategy.

Figures Describing the Methods Developed

Fig. D.1 illustrates the operation of the overall strategy involving connectunc lowesttest to find connections between widely separated sub-databases. Fig. D.2 illustrates the operation of the overall checkspmutate strategy to find a fully connected pathway of a mutated system, using the fully connected pathway from a similar system as a template.

Figure D.1: Cartoon representing the overall connectunc lowesttest strategy. Each point represents a minimum in an energy landscape, colour-coded according to which of the five original sub-databases (SD) they belong to. x and y are arbitrary parameters to give a sense of how close these minima are in conformational space. (A) None of the five sub-databases are connected to one another. (B) connectunc lowesttest is used to identify which two minima from two separate sub-databases are closest in conformational space, and select them for a connection attempt. Alternative pairs of minima, such as that represented by the red line with the cross through it, are not considered for connection attempts unless the closer pairs fail. Once a connection has successfully been established between a pair of minima in two separate sub-databases, all of the minima comprising these sub-databases become connected. This situation is indicated by the colour change of SD 3 from yellow to blue. (C) connectunc lowesttest also identifies which sub-databases to try to connect. The figure shows two minima between SD 2 and SD 3 in closer proximity to one another than any minima between SD 1 and SD 2. There is therefore no need to make an attempted connection between SD 1 and SD 2 directly; provided the connection attempt between SD 2 and SD 3 is successful, SD 1 and SD 2 are also automatically connected to one another.
connectunc lowesttest therefore considers the smallest number of connection attempts required to connect all of the sub-databases within a database, and the closest minima within these respective sub-databases to select for these connection attempts.

Figure D.2: Cartoon representing the overall checkspmutate strategy. (A) Fully connected pathway for an original, template system. Blue circles represent minima and smaller black circles represent TSs. Black dashed lines indicate direct connections between the minima and TSs. (B) Following mutations and reoptimisations to the stationary points from the template system, a new set of stationary points describing the mutated system emerges. These structures should closely resemble the original stationary points they were based on from the template, and so their relative positions are generally retained (in the figure, they are kept in the exact same positions for conceptual convenience, plus their colour-coding is kept consistent). However, it is unlikely that all stationary points will have converged successfully upon reoptimisation, and so gaps arise in the new pathway. Furthermore, information on connectivities between the minima and TSs is lost. (C) Steepest-descent pathways from the successfully reoptimised TSs are calculated to determine the minima each TS directly connects. Where these minima coincide with successfully reoptimised minima already present in the database, this newly-found connection is indicated by a dashed line from the TS to that (blue) minimum. Where the steepest-descent pathway culminates in a minimum not already found in the database, this new minimum is added, indicated by an orange circle, and its connection to the TS is also represented by a dashed line. (D) Attempts are made to connect any gaps still present in the pathway.
Minima to connect are selected according to the relative positions their equivalents had in the original pathway. New minima identified are represented by purple circles and new TSs by smaller red circles. (E) The process in D typically bridges most of the gaps in the pathway. However, there may be some longer gaps that prove difficult to connect. connectunc lowesttest is therefore used to try to connect the remaining parts of the pathway (which can be considered as sub-databases) based on an efficient selection of which minima from each part to consider. The fully connected pathway that is equivalent to the one seen in the original system is indicated by a series of green dashed lines.

Appendix E
Full Phylogenetic Tree

Figure E.1: Maximum likelihood phylogenetic tree for HemS and its homologues. Accession numbers and bootstrap values are given. Figure reproduced from Xie [81], with small adaptations.

Appendix F
Sequence Conservation

The following sequence is of WT HemS according to the experimental sequence used throughout this thesis (see Appendix B). Residues are highlighted in bold if they showed >90% conservation with respect to the other 218 homologues in the bioinformatic study. A background cyan colour indicates that the residue forms part of the large cavity, and a salmon colour indicates it forms part of the small cavity. The assignment of residues to these cavities was achieved using the default values in MetaPocket [244].

1 MSKSIYEQYL QAKADNPGKY ARDLATLMGI SEAELTHSRV SHDAKRLKGD
51 ARALLAALEA VGEVKAITRN TYAVHEQMGR YENQHLNGHA GLILNPRNLD
101 LRLFLNQWAS AFTLTEETRH GVRHSIQFFD HQGDALHKVY VTEQTDMPAW
151 EALLAQFITT ENPELQLEPL SAPEVTEPTA TDEAVDAEWR AMTDVHQFFQ
201 LLKRNNLTRQ QAFRAVGNDL AYQVDNSSLT QLLNIAQQEQ NEIMIFVGNR
251 GCVQIFTGMI EKVTPHQDWI NVFNQRFTLH LIETTIAESW ITRKPTKDGF
301 VTSLELFAAD GTQIAQLYGQ RTEGQPEQTQ WREQIARLNN KDIAA*

Appendix G
Stopped-Flow Curve Fitting

Please see the following page.
[Figure G.1 image: eight panels titled 50, 75, 100, 150, 250, 400, 750 and 1000]
Figure G.1: Overlay of curves fitted to Wild Type HemS stopped-flow data. Graph titles correspond to the NADH concentration.

[Figure G.2 image: eight panels titled 50, 75, 100, 150, 250, 400, 750 and 1000]
Figure G.2: Overlay of curves fitted to F104AF199A HemS stopped-flow data. Graph titles correspond to the NADH concentration. The 50 µM NADH case could not be accurately fitted to the data, and so has been left blank.

References

[1] Venter, C.; Cohen, D. The 21st century: the century of Biology. New Perspect. Q. 1997, 14, 26–31.
[2] Herman, J. G.; Graff, J. R.; Myohanen, S.; Nelkin, B. D.; Baylin, S. B. Methylation-specific PCR: A novel PCR assay for methylation status of CpG islands. Proc. Natl. Acad. Sci. USA 1996, 93, 9821–9826.
[3] Barski, A.; Cuddapah, S.; Cui, K.; Roh, T. Y.; Schones, D. E.; Wang, Z.; Wei, G.; Chepelev, I.; Zhao, K. High-resolution profiling of histone methylations in the human genome. Cell 2007, 129, 823–837.
[4] Booth, M. J.; Branco, M. R.; Ficz, G.; Oxley, D.; Krueger, F.; Reik, W.; Balasubramanian, S. Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science 2012, 336, 934–938.
[5] Arnold, F. H. Design by Directed Evolution. Acc. Chem. Res. 1998, 31, 125–131.
[6] Romero, P. A.; Arnold, F. H. Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 2009, 10, 866–876.
[7] Woolf, T. M. Therapeutic repair of mutated nucleic acid sequences. Nat. Biotechnol. 1998, 16, 341–344.
[8] Cermak, T.; Doyle, E. L.; Christian, M.; Wang, L.; Zhang, Y.; Schmidt, C.; Baller, J. A.; Somia, N. V.; Bogdanove, A. J.; Voytas, D. F. Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res. 2011, 39, 1–11.
[9] Jinek, M.; Chylinski, K.; Fonfara, I.; Hauer, M.; Doudna, J. A.; Charpentier, E. A programmable dual-RNA–guided DNA endonuclease in adaptive bacterial immunity.
Science 2012, 337, 816–822.
[10] Dubochet, J.; McDowall, A. W. Vitrification of pure water for electron microscopy. J. Microsc. 1981, 124, 3–4.
[11] Frank, J.; Radermacher, M.; Penczek, P.; Zhu, J.; Li, Y.; Ladjadj, M.; Leith, A. SPIDER and WEB: Processing and visualization of images in 3D electron microscopy and related fields. J. Struct. Biol. 1996, 116, 190–199.
[12] Henderson, R.; Baldwin, J.; Ceska, T.; Zemlin, F.; Beckmann, E.; Downing, K. Model for the structure of bacteriorhodopsin based on high-resolution electron cryo-microscopy. J. Mol. Biol. 1990, 213, 899–929.
[13] Wu, X. et al. Rational Design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1. Science 2010, 329, 856–861.
[14] Gatenby, R. A.; Smallbone, K.; Maini, P. K.; Rose, F.; Averill, J.; Nagle, R. B.; Worrall, L.; Gillies, R. J. Cellular adaptations to hypoxia and acidosis during somatic evolution of breast cancer. Br. J. Cancer 2007, 97, 646–653.
[15] Smallbone, K.; Gatenby, R. A.; Gillies, R. J.; Maini, P. K.; Gavaghan, D. J. Metabolic changes during carcinogenesis: potential impact on invasiveness. J. Theor. Biol. 2007, 244, 703–713.
[16] Metzcar, J.; Wang, Y.; Heiland, R.; Macklin, P. A review of cell-based computational modeling in Cancer Biology. JCO Clin. Cancer Inform. 2019, 3, 1–13.
[17] Ramaprasad, A.; Pain, A.; Ravasi, T. Defining the protein interaction network of human malaria parasite Plasmodium falciparum. Genomics 2012, 99, 69–75.
[18] De Laeter, J. R.; Böhlke, J. K.; De Bièvre, P.; Hidaka, H.; Peiser, H. S.; Rosman, K. J.; Taylor, P. D. Atomic weights of the elements: Review 2000 (IUPAC Technical Report). Pure Appl. Chem. 2003, 75, 683–800.
[19] Haynes, W. M.; Lide, D. R.; Bruno, T. J. Abundance of Elements in the Earth’s Crust and in the Sea in CRC Handbook of Chemistry and Physics, 94th ed., CRC Press: Boca Raton, FL, 2016, 14–17.
[20] Andrews, S. C.; Robinson, A. K.; Rodríguez-Quiñones, F. Bacterial iron homeostasis.
FEMS Microbiol. Rev. 2003, 27, 215–237.
[21] Choy, D. C. Y. Haemoproteins and the study of protein-ligand interactions. PhD Thesis, University of Cambridge, 2015.
[22] Fenton, H. J. H. Oxidation of tartaric acid in presence of iron. J. Chem. Soc. Trans. 1894, 65, 899–910.
[23] Theil, E. C. Ferritin: structure, gene regulation, and cellular function in animals, plants, and microorganisms. Ann. Rev. Biochem. 1987, 56, 289–315.
[24] Pistorius, E. K.; Axelrod, B. Iron, an essential component of lipoxygenase. J. Biol. Chem. 1974, 249, 3183–3186.
[25] Hayashi, T.; Matsuo, T.; Hitomi, Y.; Okawa, K.; Suzuki, A.; Shiro, Y.; Iizuka, T.; Hisaeda, Y.; Ogoshi, H. Contribution of heme-propionate side chains to structure and function of myoglobin: chemical approach by artificially created prosthetic groups. J. Inorg. Biochem. 2002, 91, 94–100.
[26] Barker, P. D.; Ferguson, S. J. Still a puzzle: why is haem covalently attached in c-type cytochromes? Structure 1999, 7, 281–290.
[27] Kumar, S.; Bandyopadhyay, U. Free heme toxicity and its detoxification systems in human. Toxicol. Lett. 2005, 157, 175–188.
[28] Chiabrando, D.; Vinchi, F.; Fiorito, V.; Mercurio, S.; Tolosano, E. Heme in pathophysiology: a matter of scavenging, metabolism and trafficking across cell membranes. Front. Pharmacol. 2014, 5, 1–24.
[29] Neidhardt, F. C.; Umbarger, H. E. Chemical composition of Escherichia coli in Escherichia coli and Salmonella: Cellular and Molecular Biology, Am. Soc. Microbiol. (ASM) Press: Washington DC, 1996, Chapter 3.
[30] Alberts, B.; Johnson, A.; Lewis, J. The shape and structure of proteins in Molecular Biology of the Cell, 4th ed., Garland Science: New York, 2002, Chapter 3.
[31] Lodish, H.; Berk, A.; Zipursky, S. L. Molecular Cell Biology, 4th ed., W. H. Freeman: New York, 2000.
[32] Hutchins, D. A.; Rueter, J. G.; Fish, W. Siderophore production and nitrogen fixation are mutually exclusive strategies in Anabaena 7120. Limnol. and Oceanogr. 1991, 36, 1–12.
[33] Wilks, A.; Burkhard, K. A. Heme and virulence: how bacterial pathogens regulate, transport and utilize heme. Nat. Prod. Rep. 2007, 24, 511–522. [34] Runyen-Janecky, L. J. Role and regulation of heme iron acquisition in gram- negative pathogens. Front. Cell. Infect. Microbiol. 2013, 3, 1–11. [35] Ahn, S. H.; Han, J. H.; Lee, J. H.; Park, K. J.; Kong, I. S. Identification of an iron-regulated hemin-binding outer membrane protein, HupO, in Vibrio fluvialis : effects on hemolytic activity and the oxidative stress response. Infect. Immun. 2005, 73, 722–729. [36] Cobessi, D.; Meksem, A.; Brillet, K. Structure of the heme/hemoglobin outer membrane receptor ShuA from Shigella dysenteriae: heme binding by an in- duced fit mechanism. Proteins 2010, 78, 286–294. [37] Stojiljkovic, I.; Hantke, K. Transport of haemin across the cytoplasmic mem- brane through a haemin-specific periplasmic binding-protein dependent trans- port system in Yersinia enterocolitica. Mol. Microbiol. 1994, 13, 719–732. [38] Allen, W. J.; Phan, G.; Waksman, G. Structural biology of periplasmic chap- erones in Advances in Protein Chemistry and Structural Biology, 1st ed., El- sevier, 2009, 51–97. [39] Mattle, D.; Zeltina, A.; Woo, J. S.; Goetz, B. A.; Locher, K. P. Two stacked heme molecules in the binding pocket of the periplasmic heme-binding protein HmuT from Yersinia pestis . J. Mol. Biol. 2010, 404, 220–231. 241 REFERENCES [40] Burkhard, K. A.; Wilks, A. Functional characterization of the Shigella dysen- teriae heme ABC transporter. Biochemistry 2008, 47, 7977–7979. [41] Wegiel, B.; Otterbein, L. E. Go green: the anti-inflammatory effects of biliverdin reductase. Front. Pharmacol. 2012, 3, 1–8. [42] Sawyer, E. B. Biophysical analysis of haem-protein interactions in bacterial haem transfer systems. PhD Thesis, University of Cambridge, 2009. [43] Maines, M. D.; Kappas, A. Cobalt induction of hepatic heme oxygenase; with evidence that cytochrome P-450 is not essential for this enzyme activity. 
Proc. Natl. Acad. Sci. USA 1974, 71, 4293–4297. [44] Unno, M.; Matsui, T.; Ikeda-Saito, M. Structure and catalytic mechanism of heme oxygenase. Nat. Prod. Rep. 2007, 24, 553–570. [45] Schuller, D. J.; Wilks, A.; Ortiz De Montellano, P. R.; Poulos, T. L. Crystal structure of human heme oxygenase-1. Nat. Struct. Biol. 1999, 6, 860–867. [46] Wilks, A. Heme oxygenase: evolution, structure and mechanism. Antioxid. Redox Signal. 2002, 4, 603–614. [47] Maharshak, N.; Ryu, H. S.; Fan, T.-J.; Onyiah, J. C.; Otterbein, S. L.; Wong, R.; Hansen, J.; Otterbein, L. E.; Plevy, S. E. Escherichia coli heme oxy- genase modulates host innate immune reponses. Microbiol. Immunol. 2015, 59, 452–465. [48] Lehmann, E.; El-Tantawy, W. H.; Ocker, M.; Bartenschlager, R.; Lohmann, V.; Hashemolhosseini, S.; Tiegs, G.; Sass, G. The heme oxygenase 1 product biliverdin interferes with hepatitis C virus replication by increasing antiviral interferon response. Hepatology 2010, 51, 398–404. [49] Zhu, Z.; Wilson, A. T.; Luxon, B. A.; Brown, K. E.; Mathahs, M. M.; Bandy- opadhyay, S.; McCaffrey, A. P.; Schmidt, W. N. Biliverdin inhibits hepatitis C virus nonstructural 3/4A protease activity: mechanism for the antiviral effects of heme oxygenase? Hepatology 2010, 52, 1897–1905. [50] Stocker, R.; Glazer, A. N.; Ames, B. N. Antioxidant activity of albumin-bound bilirubin. Proc. Natl. Acad. Sci. USA 1987, 84, 5918–5922. [51] Stocker, R.; Yamamoto, Y.; McDonagh, A. F.; Glazer, A. N.; Ames, B. N. Bilirubin is an antioxidant of possible physiological importance. Science 1987, 235, 1043–1046. [52] Ohrui, T.; Yasuda, H.; Yamaya, M.; Matsui, T.; Sasaki, H. Transient relief of asthma symptoms during jaundice: a possible beneficial role of bilirubin. J. Exp. Med. 2003, 199, 193–196. [53] Baran˜ano, D. E.; Rao, M.; Ferris, C. D.; Snyder, S. H. Biliverdin reductase: a major physiologic cytoprotectant. Proc. Natl. Acad. Sci. USA 2002, 99, 16093–16098. 242 REFERENCES [54] Rivera, M.; Rodr´ıguez, J. C. 
The dual role of heme as cofactor and substrate in the biosynthesis of carbon monoxide. Met. Ions Life Sci. 2009, 6, 241–293. [55] Saebø, A.; Lassen, J. Acute and chronic gastrointestinal manifestations asso- ciated with Yersinia enterocolitica infection: a Norwegian 10-year follow-up study on 458 hospitalized patients. Ann. Surg. 1992, 215, 250–255. [56] Karachalios, G.; Bablekos, G.; Karachaliou, G.; Charalabopoulos, A. K.; Charalabopoulos, G. Infectious endocarditis due to Yersinia enterocolitica. Chemotherapy 2002, 48, 158–159. [57] Zin´czuk, J.; Wojskowicz, P.; Ki´sluk, J.; Fil, D.; Kemona, A.; Dadan, J. Mesenteric lymphadenitis caused by Yersinia enterocolitica. Prz. Gastroen- terol. 2015, 10, 118–121. [58] Reinicke, V.; Korner, B. Fulminant septicemia caused by Yersinia enterocol- itica. Scand. J. Infect. Dis. 1977, 9, 249–251. [59] Centers for Disease Control and Prevention (CDC) Yellow Book 2020: Health Information for International Travel, Oxford University Press: New York, 2020. [60] Stojiljkovic, I.; Hantke, K. Hemin uptake system of Yersinia enterocolitica: similarities with other TonB-dependent systems in Gram-negative bacteria. EMBO J. 1992, 11, 4359–4367. [61] Amarelle, V.; Koziol, U.; Rosconi, F.; Noya, F.; O’Brian, M. R.; Fabiano, E. A new small regulatory protein, HmuP, modulates haemin acquisition in Sinorhi- zobium meliloti . Microbiol. 2010, 156, 1873–1882. [62] Escamilla-Hernandez, R.; O’Brian, M. R. HmuP is a coactivator of Irr- dependent expression of heme utilization genes in Bradyrhizobium japonicum. J. Bacteriol. 2012, 194, 3137–3143. [63] Sato, T.; Nonoyama, S.; Kimura, A.; Nagata, Y.; Ohtsubo, Y.; Tsuda, M. The small protein HemP is a transcriptional activator for the hemin uptake operon in Burkholderia multivorans ATCC 17616. Appl. Environ. Microbiol. 2017, 83, 1–14. [64] Troxell, B.; Hassan, H. M. Transcriptional regulation by Ferric Uptake Reg- ulator (Fur) in pathogenic bacteria. Front. Cell. Infect. Microbiol. 2013, 3, 1–13. 
[65] Jacobi, C. A.; Gregor, S.; Rakin, A.; Heesemann, J. Expression analysis of the yersiniabactin receptor gene fyuA and the heme receptor hemR of Yersinia enterocolitica in vitro and in vivo using the reporter genes for green fluorescent protein and luciferase. Infect. Immun. 2001, 69, 7772–7782. [66] LaCross, N. C.; Marrs, C. F.; Gilsdorf, J. R. Otitis media associated polymor- phisms in the hemin receptor HemR of nontypeable Haemophilus influenzae. Infect. Genet. Evol. 2014, 26, 47–57. 243 REFERENCES [67] Roy, A.; Kucukural, A.; Zhang, Y. I-TASSER: a unified platform for auto- mated protein structure and function prediction. Nat. Protoc. 2010, 5, 725– 738. [68] Zhang, Y. I-TASSER server for protein 3D structure prediction. BMC Bioin- formatics 2008, 9, 1–8. [69] Zhang, Y. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins 2007, 69, 108–117. [70] Braun, V.; Hantke, K. Genetics of bacterial iron transport in Handbook of Microbial Iron Chelates, CRC Press: Boca Raton, FL, 1991, 107–138. [71] Shine, J.; Dalgarno, L. The 3′ terminal sequence of Escherichia coli 16S ribo- somal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 1974, 71, 1342–1346. [72] Woo, J. S.; Zeltina, A.; Goetz, B. A.; Locher, K. P. X-ray structure of the Yersinia pestis heme transporter HmuUV. Nat. Struct. Mol. Biol. 2012, 19, 1310–1315. [73] Walker, J. E.; Saraste, M.; Runswick, M. J.; Gay, N. J. Distantly related sequences in the α- and β-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold. EMBO J. 1982, 1, 945–951. [74] Hanson, P. I.; Whiteheart, S. W. AAA+ proteins: have engine, will work. Nat. Rev. Mol. Cell Biol. 2005, 6, 519–529. [75] Davis, K. M. All Yersinia are not created equal: phenotypic adaptation to distinct niches within mammalian tissues. Front. Cell. Infect. Microbiol. 2018, 8, 1–8. [76] Bateman, A. et al. 
UniProt: The universal protein knowledgebase in 2021. Nucleic Acids Res. 2021, 49, 480–489. [77] Hornung, J. M.; Jones, H. A.; Perry, R. D. The hmu locus of Yersinia pestis is essential for utilization of free haemin and haem-protein complexes as iron sources. Mol. Microbiol. 1996, 20, 725–739. [78] Thompson, J. M.; Jones, H. A.; Perry, R. D. Molecular characterization of the hemin uptake locus (hmu) from Yersinia pestis and analysis of hmu mutants for hemin and hemoprotein utilization. Infect. Immun. 1999, 67, 3879–3892. [79] Torres, A. G.; Payne, S. M. Haem iron-transport system in enterohaemorrhagic Escherichia coli O157:H7. Mol. Microbiol. 1997, 23, 825–833. [80] Suits, M. D.; Pal, G. P.; Nakatsu, K.; Matte, A.; Cygler, M.; Jia, Z. Identifica- tion of an Escherichia coli O157:H7 heme oxygenase with tandem functional repeats. Proc. Natl. Acad. Sci. USA 2005, 102, 16955–16960. 244 REFERENCES [81] Xie, Y. Predicting function from structure: haem degradation and DNA bind- ing in the HemS family. Master’s Thesis, University of Cambridge, 2021. [82] Suits, M. D.; Lang, J.; Pal, G. P.; Couture, M.; Jia, Z. Structure and heme binding properties of Escherichia coli O157:H7 ChuX. Protein Sci. 2009, 18, 825–838. [83] LaMattina, J. W.; Nix, D. B.; Lanzilotta, W. N. Radical new paradigm for heme degradation in Escherichia coli O157:H7. Proc. Natl. Acad. Sci. USA 2016, 113, 12138–12143. [84] Zhang, Q.; van der Donk, W. A.; Liu, W. Radical-mediated enzymatic methy- lation: a tale of two SAMS. Acc. Chem. Res. 2012, 45, 555–564. [85] Huang, W.; Xu, H.; Li, Y.; Zhang, F.; Chen, X. Y.; He, Q.-L.; Igarashi, Y.; Tang, G.-L. Characterization of yatakemycin gene cluster revealing a radical S-adenosylmethionine dependent methyltransferase and highlighting spirocy- clopropane biosynthesis. J. Am. Chem. Soc. 2012, 134, 8831–8840. [86] Lan, R.; Reeves, P. R. Escherichia coli in disguise: molecular origins of Shigella. Microb. Infect. 2002, 4, 1125–1132. [87] Jin, Q. et al. 
Genome sequence of Shigella flexneri 2a: insights into pathogenic- ity through comparison with genomes of Escherichia coli K12 and O157. Nu- cleic Acids Res. 2002, 30, 4432–4441. [88] World Health Organisation (WHO): State of the art of vaccine research and development. 2005, 1–99. [89] Wyckoff, E. E.; Duncan, D.; Torres, A. G.; Mills, M.; Maase, K.; Payne, S. M. Structure of the Shigella dysenteriae haem transport locus and its phylogenetic distribution in enteric bacteria. Mol. Microbiol. 1998, 28, 1139–1152. [90] Griffin, A. S.; West, S. A.; Buckling, A. Cooperation and competition in pathogen bacteria. Nature 2004, 430, 1024–1027. [91] Harrison, F.; Browning, L. E.; Vos, M.; Buckling, A. Cooperation and virulence in acute Pseudomonas aeruginosa infections. BMC Biol. 2006, 4, 1–5. [92] Ochsner, U. A.; Johnson, Z.; Vasil, M. L. Genetics and regulation of two distinct haem-uptake systems, phu and has, in Pseudomonas aeruginosa. Mi- crobiology 2000, 146, 185–198. [93] Chan, A. C.; Lelj-Garolla, B.; Rosell, F. I.; Pedersen, K. A.; Mauk, A. G.; Murphy, M. E. Cofacial Heme Binding is Linked to Dimerization by a Bacterial Heme Transport Protein. J. Mol. Biol. 2006, 362, 1108–1119. [94] van Vliet, A. H.; Ketley, J. M.; Park, S. F.; Penn, C. W. The role of iron in Campylobacter gene regulation, metabolism and oxidative stress defense. FEMS Microbiol. Rev. 2002, 26, 173–186. 245 REFERENCES [95] Schneider, S.; Sharp, K. H.; Barker, P. D.; Paoli, M. An induced fit con- formational change underlies the binding mechanism of the heme transport proteobacteria-protein HemS. J. Biol. Chem. 2006, 281, 32606–32610. [96] Schneider, S.; Paoli, M. Haem-binding properties and crystallisation of the bacterial protein HemS. Acta Cryst. A 2005, 61, 343. [97] Song, Y.; Mao, J.; Gunner, M. R. Electrostatic environment of hemes in pro- teins: pK as of hydroxyl ligands. Biochemistry 2006, 45, 7949–7958. [98] Suits, M. D.; Jaffer, N.; Jia, Z. 
Structure of the Escherichia coli O157:H7 heme oxygenase ChuS in complex with heme and enzymatic inactivation by mutation of the heme coordinating residue His-193. J. Biol. Chem. 2006, 281, 36776–36782. [99] Ouellet, Y. H.; Ndiaye, C. T.; Gagne´, S. M.; Sebilo, A.; Suits, M. D.; Ju- binville, E´.; Jia, Z.; Ivancich, A.; Couture, M. An alternative reaction for heme degradation catalyzed by the Escherichia coli O157:H7 ChuS protein: Release of hematinic acid, tripyrrole and Fe(III). J. Inorg. Biochem. 2016, 154, 103–113. [100] Sakamoto, H.; Omata, Y.; Adachi, Y.; Palmer, G.; Noguchi, M. Separation and identification of the regioisomers of verdoheme by reversed-phase ion- pair high-performance liquid chromatography, and characterization of their complexes with heme oxygenase. J. Inorg. Biochem. 2000, 82, 113–121. [101] Mathew, L. G.; Beattie, N. R.; Pritchett, C.; Lanzilotta, W. N. New insight into the mechanism of anaerobic heme degradation. Biochemistry 2019, 58, 4641–4654. [102] Wilks, A. The ShuS protein of Shigella dysenteriae is a heme-sequestering protein that also binds DNA. Arch. Biochem. Biophys. 2001, 387, 137–142. [103] Kaur, A. P.; Wilks, A. Heme inhibits the DNA binding properties of the cytoplasmic heme binding protein of Shigella dysenteriae (ShuS). Biochemistry 2007, 46, 2994–3000. [104] Lansky, I. B.; Lukat-Rodgers, G. S.; Block, D.; Rodgers, K. R.; Ratliff, M.; Wilks, A. The cytoplasmic heme-binding protein (PhuS) from the heme uptake system of Pseudomonas aeruginosa is an intracellular heme-trafficking protein to the δ-regioselective heme oxygenase. J. Biol. Chem. 2006, 281, 13652– 13662. [105] Warburg, O.; Negelein, E. Grunes haemin aus blast-haemin. Chem. Ber. 1930, 63, 1816–1819. [106] Lemberg, R. Transformation of haemins into bile pigments. Biochem. J. 1935, 29, 1322–1336. [107] Wilks, A.; Ikeda-Saito, M. Heme utilization by pathogenic bacteria: not all pathways lead to biliverdin. Acc. Chem. Res. 2014, 47, 2291–2298. 
246 REFERENCES [108] Avila, L.; Huang, H.-w.; Damaso, C. O.; Lu, S.; Moe¨nne-Loccoz, P.; Rivera, M. Coupled oxidation vs heme oxygenation: insights from axial ligand mutants of mitochondrial cytochrome b5. J. Am. Chem. Soc. 2003, 125, 4103–4110. [109] O’Neill, M. J.; Bhakta, M. N.; Fleming, K. G.; Wilks, A. Induced fit on heme binding to the Pseudomonas aeruginosa cytoplasmic protein (PhuS) drives interaction with heme oxygenase (HemO). Proc. Natl. Acad. Sci. USA 2012, 109, 5639–5644. [110] Tripathi, S.; O’Neill, M. J.; Wilks, A.; Poulos, T. L. Crystal Structure of the Pseudomonas aeruginosa cytoplasmic heme binding protein, apo-PhuS. J. Inorg. Biochem. 2013, 128, 131–136. [111] Lee, M. J.; Schep, D.; McLaughlin, B.; Kaufmann, M.; Jia, Z. Structural analy- sis and identification of PhuS as a heme-degrading enzyme from Pseudomonas aeruginosa. J. Mol. Biol. 2014, 426, 1936–1946. [112] Wilson, T.; Mourin˜o, S.; Wilks, A. The heme binding protein PhuS transcrip- tionally regulates the Pseudomonas aeruginosa tandem sRNA prrF1,F2 locus. J. Biol. Chem. 2021, 296, 100275–100285. [113] Wilderman, P. J.; Sowa, N. A.; FitzGerald, D. J.; FitzGerald, P. C.; Gottes- man, S.; Ochsner, U. A.; Vasil, M. L. Identification of tandem duplicate regu- latory small RNAs in Pseudomonas aeruginosa involved in iron homeostasis. Proc. Natl. Acad. Sci. USA 2004, 101, 9792–9797. [114] Oglesby, A. G.; Farrow, J. M.; Lee, J. H.; Tomaras, A. P.; Greenberg, E. P.; Pesci, E. C.; Vasil, M. L. The influence of iron on Pseudomonas aeruginosa physiology: a regulatory link between iron and quorum sensing. J. Biol. Chem. 2008, 283, 15558–15567. [115] Reinhart, A. A.; Powell, D. A.; Nguyen, A. T.; O’Neill, M.; Djapgne, L.; Wilks, A.; Ernst, R. K.; Oglesby-Sherrouse, A. G. The prrF -encoded small regulatory RNAs are required for iron homeostasis and virulence of Pseu- domonas aeruginosa. Infect. Immun. 2015, 83, 863–875. [116] Weber, G. 
Intramolecular transfer of electronic energy in dihydro diphospho- pyridine nucleotide. Nature 1957, 180, 1409. [117] Patel, D. J. 220 MHz proton nuclear magnetic resonance spectra of retinals. Nature 1969, 221, 825–828. [118] Unden, G.; Bongaerts, J. Alternative respiratory pathways of Escherichia coli : energetics and transcriptional regulation in response to electron acceptors. BBA Bioenerg. 1997, 1320, 217–234. [119] Gregory, J. Investigating the NADH dependent reaction of the Yersinia ente- rocolitica haem chaperone HemS. Master’s Thesis, University of Cambridge, 2012. 247 REFERENCES [120] Ikram, A.; Su, Q.; Fiaz, M.; Khadim, S. Big data in enterprise management: transformation of traditional recruitment strategy. IEEE ICBDA 2017, 414– 419. [121] Lee, C.-Y.; Chien, C.-F. Pitfalls and protocols of data science in manufacturing practice. J. Intell. Manuf. 2020, 1–19. [122] Dash, S.; Shakyawar, S. K.; Sharma, M.; Kaushik, S. Big data in healthcare: management, analysis and future prospects. J. Big Data 2019, 6, 1–25. [123] Esfahani, H. J.; Tavasoli, K.; Jabbarzadeh, A. Big data and social media: a scientometrics analysis. Int. J. Data Netw. Sci. 2019, 3, 145–164. [124] Li, K.; Du, Y.; Li, L.; Wei, D.-Q. Bioinformatics Approaches for anti-cancer drug discovery. Curr. Drug Targets 2020, 21, 3–17. [125] Collins, F. S. et al. Finishing the euchromatic sequence of the human genome. Nature 2004, 431, 931–945. [126] Feng, J.-J.; Chen, J.-N.; Kang, W.; Wu, Y.-D. Accurate structure prediction for protein loops based on molecular dynamics simulations with RSFF2C. J. Chem. Theory Comput. 2021, 17, 4614–4628. [127] Cho, S. S.; Weinkam, P.; Wolynes, P. G. Origins of barriers and barrierless folding in BBL. Proc. Natl. Acad. Sci. USA 2008, 105, 118–123. [128] Salsbury, F. R.; Crowley, M. F.; Brooks III, C. L. Modeling of the metallo-β- lactamase from B. Fragilis : structural and dynamic effects of inhibitor binding. Proteins 2001, 44, 448–459. [129] Verlet, L. 
Computer “Experiments” on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys. Rev. 1967, 159, 98–103. [130] Elber, R. Perspective: Computer simulations of long time dynamics. J. Chem. Phys. 2016, 144, 1–12. [131] Eaton, W. A.; Muno˜z, V.; Hagen, S. J.; Jas, G. S.; Lapidus, L. J.; Henry, E. R.; Hofrichter, J. Fast kinetic and mechanisms in protein folding. Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 327–359. [132] Prigozhin, M. B.; Gruebele, M. Microsecond folding experiments and simula- tions: a match is made. Phys. Chem. Chem. Phys. 2013, 15, 3372–3388. [133] Wales, D. J. Energy Landscapes, Cambridge University Press: Cambridge, 2003. [134] Ro¨der, K. Energy landscaping - on the relationship between functionality and sequence mutations for multifunctional biomolecules. PhD Thesis, University of Cambridge, 2018. [135] Dijkstra, E. A note on two problems in connexion with graphs. Numer. Math. 1959, 1, 269–271. 248 REFERENCES [136] Baldwin, R. L. Matching speed and stability. Nature 1994, 369, 183–184. [137] Wales, D. J. Calculating rate constants and committor probabilities for tran- sition networks by graph transformation. J. Chem. Phys. 2009, 130, 204111– 204118. [138] Stevenson, J. D.; Wales, D. J. Communication: Analysing kinetic transition networks for rare events. J. Chem. Phys. 2014, 141, 041104–041108. [139] Trygubenko, S. A.; Wales, D. J. Graph transformation method for calculating waiting times in Markov chains. J. Chem. Phys. 2006, 124, 234110–234126. [140] Sharpe, D. J.; Wales, D. J. Identifying mechanistically distinct pathways in kinetic transition networks. J. Chem. Phys. 2019, 151, 124101–124114. [141] Burke, D. F.; Mantell, R. G.; Pitt, C. E.; Wales, D. J. Energy landscape for the membrane fusion pathway in influenza A hemagglutinin from discrete path sampling. Front. Chem. 2020, 8, 1–11. [142] Levinthal, C. How to fold graciously. 
Mossbauer spectroscopy in biological systems: proceedings of a meeting held at Allerton House, Monticello, Illinois. 1969, 22–24. [143] Anfinsen, C. B. Principles that govern the folding of protein chains. Science 1973, 181, 223. [144] Dill, K. A. The stabilities of globular proteins in Protein Engineering, Alan R. Liss, Inc.: New York, 1987, 187–192. [145] Leopold, P. E.; Montal, M.; Onuchic, J. N. Protein folding funnels: a kinetic approach to the sequence-structure relationship. Proc. Natl. Acad. Sci. USA 1992, 89, 8721–8725. [146] Bryngelson, J. D.; Wolynes, P. G. Spin glasses and the statistical mechanics of protein folding. Proc. Natl. Acad. Sci. USA 1987, 84, 7524–7528. [147] Carr, J. M.; Wales, D. J. Folding pathways and rates for the three-stranded β-sheet peptide Beta3s using discrete path sampling. J. Phys. Chem. B 2008, 112, 8760–8769. [148] Prentiss, M. C.; Wales, D. J.; Wolynes, P. G. The energy landscape, folding pathways and the kinetics of a knotted protein. PLoS Comput. Biol. 2010, 6, 1–12. [149] Joseph, J. A.; Chakraborty, D.; Wales, D. J. Energy Landscape for fold- switching in regulatory protein RfaH. J. Chem. Theory Comput. 2019, 15, 731–742. [150] Neelamraju, S.; Gosavi, S.; Wales, D. J. Energy landscape of the designed protein Top7. J. Phys. Chem. B 2018, 122, 12282–12291. 249 REFERENCES [151] Chakraborty, D.; Wales, D. J. Energy landscape and pathways for transitions between Watson-Crick and Hoogsteen base pairing in DNA. J. Phys. Chem. Lett. 2018, 9, 229–241. [152] Fejer, S. N.; James, T. R.; Herna´ndez-Rojas, J.; Wales, D. J. Energy land- scapes for shells assembled from pentagonal and hexagonal pyramids. Phys. Chem. Chem. Phys. 2009, 11, 2098–2104. [153] Hendlich, M. Databases for protein-ligand complexes. Acta Cryst. D 1998, 54, 1178–1182. [154] Schmitt, S.; Hendlich, M.; Klebe, G. From structure to function: a new ap- proach to detect functional similarity among proteins independent from se- quence and fold homology. Angew. Chem. Int. 
Ed. 2001, 40, 3141–3144. [155] Schmitt, S.; Kuhn, D.; Klebe, G. A new method to detect related function among proteins independent of sequence and fold homology. J. Mol. Biol. 2002, 323, 387–406. [156] Hendlich, M.; Rippmann, F.; Barnickel, G. LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J. Mol. Graph. Model. 1997, 15, 359–363. [157] Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weis- sig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242. [158] Wales, D. J. gmin: A program for basin-hopping global optimisation. 2021, http://www-wales.ch.cam.ac.uk/GMIN. [159] Shang, C.; Choy, D.; Barker, P. D.; Wales, D. J. Energy landscape for an automated enzymatic double gate: analysis of phenylalanine flipping in HemS [Unpublished]. 2015, 1–12. [160] Gasteiger, E.; Hoogland, C.; Gattiker, A.; Duvaud, S.; Wilkins, M. R.; Ap- pel, R. D.; Bairoch, A. Protein identification and analysis tools on the ExPASy server in The Proteomics Protocols Handbook, Humana Press: Totowa, New Jersey, 2005, 571–607. [161] Cole, J. Experimental investigations into the mechanism of the HemS- catalysed reaction between NADH and haem from computational studies. Master’s Thesis, University of Cambridge, 2020. [162] Pera¨nen, J.; Rikkonen, M.; Hyvo¨nen, M.; Ka¨a¨ria¨inen, L. T7 vectors with a modified T7 lac promoter for expression of proteins in Escherichia coli . Anal. Biochem. 1996, 236, 371–373. [163] Schneider, S.; Paoli, M. Crystallization and preliminary X-ray diffraction anal- ysis of the haem-binding protein HemS from Yersinia enterocolitica. Acta Cryst. F 2005, 61, 802–805. 250 REFERENCES [164] SnapGene software (from Insightful Science). 2021, https://www. snapgene.com/. [165] NEB Protocol for Restriction Endonuclease Reaction. 2012, https://international.neb.com/protocols/2012/12/07/ optimizing-restriction-endonuclease-reactions. [166] Johnson, K. A.; Simpson, Z. 
B.; Blom, T. Global Kinetic Explorer: a new computer program for dynamic simulation and fitting of kinetic data. Anal. Biochem. 2009, 387, 20–29. [167] Merow, C.; Smith, M. J.; Silander, J. A. A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. Ecography 2013, 36, 1058–1069. [168] Vonrhein, C.; Flensburg, C.; Keller, P.; Sharff, A.; Smart, O.; Paciorek, W.; Womack, T.; Bricogne, G. Data processing and analysis with the autoPROC toolbox. Acta Cryst. D 2011, 67, 293–302. [169] Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Cryst. D 2011, 67, 235–242. [170] Murshudov, G. N.; Vagin, A. A.; Dodson, E. J. Refinement of macromolecular structures by the maximum-likelihood method. Acta Cryst. D 1997, 53, 240– 255. [171] Emsley, P.; Lohkamp, B.; Scott, W. G.; Cowtan, K. Features and development of Coot. Acta Cryst. D 2010, 66, 486–501. [172] Northrop, D. B.; Duggleby, R. G. Preparation and storage of isotopically la- beled reduced nicotinamide adenine dinucleotide. Anal. Biochem. 1987, 165, 362–364. [173] Basran, J.; Harris, R. J.; Sutcliffe, M. J.; Scrutton, N. S. H-tunneling in the multiple H-transfers of the catalytic cycle of morphinone reductase and in the reductive half-reaction of the homologous pentaerythritol tetranitrate reduc- tase. J. Biol. Chem. 2003, 278, 43973–43982. [174] Pudney, C. R.; Hay, S.; Sutcliffe, M. J.; Scrutton, N. S. α-Secondary isotope effects as probes of “tunneling-ready” configurations in enzymatic H-tunneling: insight from environmentally coupled tunneling models. J. Am. Chem. Soc. 2006, 128, 14053–14058. [175] Pearlman, D. A.; Case, D. A.; Caldwell, J. W.; Ross, W. S.; Cheatham III, T. E.; DeBolt, S.; Ferguson, D.; Seibel, G.; Kollman, P. AMBER, a package of computer programs for applying molecular mechanics, normal mode analy- sis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. 
Comput. Phys. Commun. 1995, 91, 1– 41. 251 REFERENCES [176] Go¨tz, A. W.; Williamson, M. J.; Xu, D.; Poole, D.; Le Grand, S.; Walker, R. C. Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized Born. J. Chem. Theory Comput. 2012, 8, 1542–1555. [177] Salomon-Ferrer, R.; Go¨tz, A. W.; Poole, D.; Le Grand, S.; Walker, R. C. Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald. J. Chem. Theory and Comput. 2013, 9, 3878–3888. [178] Case, D. A. et al. amber12, University of California, San Francisco. 2012. [179] Case, D. A. et al. amber16, University of California, San Francisco. 2016. [180] Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of multiple AMBER force fields and development of improved protein backbone parameters. Proteins 2006, 65, 712–725. [181] Wang, J.; Cieplak, P.; Kollman, P. A. How well does a Restrained Electrostatic Potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J. Comput. Chem. 2000, 21, 1049–1074. [182] Okur, A.; Wickstrom, L.; Layten, M.; Geney, R.; Song, K.; Hornak, V.; Sim- merling, C. Improved efficiency of replica exchange simulations through use of a hybrid explicit/implicit solvation model. J. Chem. Theory Comput. 2006, 2, 420–433. [183] Roe, D. R.; Okur, A.; Wickstrom, L.; Hornak, V.; Simmerling, C. Secondary structure bias in Generalized Born solvent models: comparison of conforma- tional ensembles and free energy of solvent polarization from explicit and im- plicit solvation. J. Phys. Chem. B 2007, 111, 1846–1857. [184] Rieloff, E.; Skepo¨, M. Phosphorylation of a disordered peptide – structural effects and force field inconsistencies. J. Chem. Theory Comput. 2020, 16, 1924–1935. [185] Gopal, S. M.; Wingbermu¨hle, S.; Schnatwinkel, J.; Juber, S.; Herrmann, C.; Scha¨fer, L. V. 
Conformational preferences of an intrinsically disordered protein domain: a case study for modern force fields. J. Phys. Chem. B 2021, 125, 24–35. [186] Mustafa, G.; Nandekar, P. P.; Mukherjee, G.; Bruce, N. J.; Wade, R. C. The effect of force-field parameters on cytochrome P450-membrane interactions: structure and dynamics. Sci. Rep. 2020, 10, 1–11. [187] Wang, J.; Wang, W.; Kollman, P. A.; Case, D. A. Automatic atom type and bond type perception in molecular mechanical calculations. J. Mol. Graph. Model. 2006, 25, 247–260. [188] Bryce, R. Bryce Group - AMBER Parameter Database, University of Manch- ester. 2021, http://amber.manchester.ac.uk/. 252 REFERENCES [189] Giammona, D. A. An examination of conformational flexibility in porphyrins and bulky-ligand binding in myoglobin. PhD Thesis, University of California, Davis, 1984. [190] Pavelites, J. J.; Gao, J.; Bash, P. A. A molecular mechanics force field for NAD+, NADH and the pyrophosphate groups of nucleotides. J. Comput. Chem. 1997, 18, 221–239. [191] Walker, R. C.; De Souza, M. M.; Mercer, I. P.; Gould, I. R.; Klug, D. R. Large and fast relaxations inside a protein: calculation and measurement of reorganization energies in alcohol dehydrogenase. J. Phys. Chem. B 2002, 106, 11658–11665. [192] Holmberg, N.; Ryde, U.; Bu¨low, L. Redesign of the coenzyme specificity in l-Lactate dehydrogenase from Bacillus stearothermophilus using site-directed mutagenesis and media engineering. Protein Eng. 1999, 12, 851–856. [193] Bashford, D.; Case, D. A. Generalized Born models of macromolecular solva- tion effects. Annu. Rev. Phys. Chem. 2000, 51, 129–152. [194] Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. Semianalytical treatment of solvation for molecular mechanics and dynamics. J. Am. Chem. Soc. 1990, 112, 6127–6129. [195] Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Parametrized models of aqueous free energies of solvation based on pairwise descreening of solute atomic charges from a dielectric medium. J. 
Phys. Chem. 1996, 100, 19824–19839. [196] Onufriev, A.; Bashford, D.; Case, D. A. Exploring protein native states and large-scale conformational changes with a modified Generalized Born model. Proteins 2004, 55, 383–394. [197] Onufriev, A.; Bashford, D.; Case, D. A. Modification of the Generalized Born model suitable for macromolecules. J. Phys. Chem. B 2000, 104, 3712–3720. [198] Guex, N.; Peitsch, M. C. SWISS-MODEL and the Swiss-PdbViewer: an en- vironment for comparative protein modeling. Electrophoresis 1997, 18, 2714– 2723. [199] Li, Z.; Scheraga, H. A. Monte Carlo-minimization approach to the multiple- minima problem in protein folding. Proc. Natl. Acad. Sci. USA 1987, 84, 6611–6615. [200] Li, Z.; Scheraga, H. A. Structure and free energy of complex thermodynamic systems. J. Mol. Struct. 1988, 179, 333–352. [201] Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953, 21, 1087–1092. 253 REFERENCES [202] Hastings, W. K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970, 57, 97–109. [203] Hoffmann, K. H.; Franz, A.; Salamon, P. The structure of best possible strate- gies for finding ground states. Phys. Rev. E. 2002, 66, 046706–046714. [204] Wales, D. J.; Doye, J. P. Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. J. Phys. Chem. A 1997, 101, 5111–5116. [205] Broyden, C. G. The convergence of a class of double-rank minimization algo- rithms. IMA J. Appl. Maths. 1970, 6, 76–90. [206] Fletcher, R. A new approach to variable metric algorithms. Comput. J. 1970, 13, 317–322. [207] Goldfarb, D. A family of variable-metric methods derived by variational means. Math. Comput. 1970, 24, 23–26. [208] Shanno, D. F. Conditioning of quasi-Newton methods for function minimiza- tion. Math. Comput. 1970, 24, 647–656. [209] Nocedal, J. 
Updating quasi-Newton matrices with limited storage. Math. Com- put. 1980, 35, 773–782. [210] Liu, D. C.; Nocedal, J. On the limited memory BFGS method for large scale optimization. Math. Prog. 1989, 45, 503–528. [211] Mantell, R. G.; Pitt, C. E.; Wales, D. J. GPU-Accelerated exploration of biomolecular energy landscapes. J. Chem. Theory Comput. 2016, 12, 6182– 6191. [212] Murrell, J.; Laidler, K. Symmetries of activated complexes. Trans. Faraday Soc. 1968, 64, 371–377. [213] Wales, D. J. optim: A program for optimising geometries and calculating pathways. 2021, http://www-wales.ch.cam.ac.uk/OPTIM. [214] Trygubenko, S. A.; Wales, D. J. A doubly nudged elastic band method for finding transition states. J. Chem. Phys. 2004, 120, 2082–2094. [215] Trygubenko, S. A.; Wales, D. J. Erratum: A doubly nudged elastic band method for finding transition states (Journal of Chemical Physics (2004) 120 (2082)). J. Chem. Phys. 2004, 120, 7820. [216] Cerjan, C. J.; Miller, W. H. On finding transition states. J. Chem. Phys. 1981, 75, 2800–2806. [217] Panc´ıˇr, J. Calculation of the least energy path on the energy hypersurface. Coll. Czech Chem. Commun. 1975, 40, 1112–1118. 254 REFERENCES [218] Munro, L. J.; Wales, D. J. Defect migration in crystalline silicon. Phys. Rev. B 1999, 59, 3969–3980. [219] Jo´nsson, H.; Mills, G.; Jacobsen, K. W. Nudged elastic band method for finding minimum energy paths of transitions in Classical and Quantum Dynamics in Condensed Phase Simulations, World Scientific, 1998, 385–404. [220] Henkelman, G.; Jo´nsson, H. A dimer method for finding saddle points on high dimensional potential surfaces using only first derivatives. J. Chem. Phys. 1999, 111, 7010–7022. [221] Henkelman, G.; Uberuaga, B. P.; Jo´nsson, H. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. J. Chem. Phys. 2000, 113, 9901–9904. [222] Henkelman, G.; Jo´nsson, H. 
Improved tangent estimate in the nudged elastic band method for finding minimum energy paths and saddle points. J. Chem. Phys. 2000, 113, 9978–9985.
[223] Hildebrand, F. Methods of Applied Mathematics, Dover Publications: New York, 1992.
[224] Kumeda, Y.; Wales, D. J.; Munro, L. J. Transition states and rearrangement mechanisms from hybrid eigenvector-following and density functional theory. Application to C10H10 and defect migration in crystalline silicon. Chem. Phys. Lett. 2001, 341, 185–194.
[225] Pechukas, P. On simple saddle points of a potential surface, the conservation of nuclear symmetry along paths of steepest descent, and the symmetry of transition states. J. Chem. Phys. 1976, 64, 1516.
[226] Carr, J. M.; Trygubenko, S. A.; Wales, D. J. Finding pathways between distant local minima. J. Chem. Phys. 2005, 122, 234903–234909.
[227] Rhee, Y. M. Construction of an accurate potential energy surface by interpolation with Cartesian weighting coordinates. J. Chem. Phys. 2000, 113, 6021–6024.
[228] Wales, D. J.; Carr, J. M.; Khalili, M.; de Souza, V. K.; Strodel, B.; Whittleston, C. S. Pathways and rates for structural transformations of peptides and proteins in Proteins: Energy, Heat and Signal Flow, Taylor and Francis/CRC Press: Oxford, 2009, 315–340.
[229] Whittleston, C. S. Energy landscapes of biological systems. PhD Thesis, University of Cambridge, 2011.
[230] Strodel, B.; Whittleston, C. S.; Wales, D. J. Thermodynamics and kinetics of aggregation for the GNNQQNY peptide. J. Am. Chem. Soc. 2007, 129, 16005–16014.
[231] Wales, D. J. pathsample: A driver for optim to create stationary point databases using discrete path sampling and perform kinetic analysis. 2021, http://www-wales.ch.cam.ac.uk/PATHSAMPLE.
[232] Röder, K.; Wales, D. J. Energy landscapes for the aggregation of Aβ17−42. J. Am. Chem. Soc. 2018, 140, 4018–4027.
[233] Stillinger, F. H.; Weber, T. A. Hidden structure in liquids. Phys. Rev. A 1982, 25, 978–989.
[234] Becker, O.
M.; Karplus, M. The topology of multidimensional potential energy surfaces: theory and application to peptide structure and kinetics. J. Chem. Phys. 1997, 106, 1495–1517.
[235] Wales, D. J.; Miller, M. A.; Walsh, T. R. Archetypal energy landscapes. Nature 1998, 394, 758–760.
[236] Liu, Y.; Ortiz De Montellano, P. R. Reaction intermediates and single turnover rate constants for the oxidation of heme by human heme oxygenase-1. J. Biol. Chem. 2000, 275, 5297–5307.
[237] Teichmann, L. K. Über die krystallisation der organischen bestandteile des bluts. Zeitschrift für Rationelle Medicin 1853, 3, 375–388.
[238] Le Baut, G.; O’Brien, C.; Pavli, P.; Roy, M.; Seksik, P.; Tréton, X.; Nancey, S.; Barnich, N.; Bezault, M.; Auzolle, C.; Cazals-Hatem, D.; Viala, J.; Allez, M.; Hugot, J. P.; Dumay, A. Prevalence of Yersinia species in the ileum of Crohn’s disease patients and controls. Front. Cell. Infect. Microbiol. 2018, 8, 1–9.
[239] Friedman, E. S. et al. Microbes vs. chemistry in the origin of the anaerobic gut lumen. Proc. Natl. Acad. Sci. USA 2018, 115, 4170–4175.
[240] Schrödinger, L. L. C.; DeLano, W. pymol. 2020, http://www.pymol.org/pymol.
[241] Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
[242] Kumar, S.; Stecher, G.; Li, M.; Knyaz, C.; Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 2018, 35, 1547–1549.
[243] Letunic, I.; Bork, P. Interactive Tree of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019, 47, 256–259.
[244] Huang, B. Metapocket: a meta approach to improve protein ligand binding site prediction. OMICS J. Integr. Biol. 2009, 13, 325–330.
[245] Waterhouse, A.; Bertoni, M.; Bienert, S.; Studer, G.; Tauriello, G.; Gumienny, R.; Heer, F. T.; De Beer, T. A.; Rempfer, C.; Bordoli, L.; Lepore, R.; Schwede, T. SWISS-MODEL: homology modelling of protein structures and complexes.
Nucleic Acids Res. 2018, 46, 296–303.
[246] Liu, R.; Hu, J. DNABind: a hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches. Proteins 2013, 81, 1885–1899.
[247] Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera – A visualization system for exploratory research and analysis. J. Comput. Chem. 2004, 25, 1605–1612.
[248] Law, J.; Cole, L. Applied PhotoPhysics SX Series Application Note: LED Light Sources for Stopped-Flow Spectroscopy. 2011.
[249] Alavi, F. S.; Zahedi, M.; Safari, N.; Ryde, U. QM/MM study of the conversion of oxophlorin into verdoheme by heme oxygenase. J. Phys. Chem. B 2017, 121, 11427–11436.
[250] Gheidi, M.; Safari, N.; Zahedi, M. Density functional theory studies on the conversion of hydroxyheme to iron-verdoheme in the presence of dioxygen. Dalton Trans. 2017, 46, 2146–2158.
[251] Alavi, F. S.; Gheidi, M.; Zahedi, M.; Safari, N.; Ryde, U. A novel mechanism of heme degradation to biliverdin studied by QM/MM and QM calculations. Dalton Trans. 2018, 47, 8283–8291.
[252] Alavi, F. S.; Zahedi, M.; Safari, N.; Ryde, U. QM/MM study of the conversion of biliverdin into verdoheme by heme oxygenase. Theor. Chem. Acc. 2019, 138, 1–8.
[253] Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
[254] Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 2021, 596, 590–596.
[255] Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876.
[256] SWISS-MODEL. Homo sapiens (human). 2021, https://swissmodel.expasy.org/repository/species/9606.
[257] Pedersen, B. The dnaseq package. 2002.
[258] Yokoyama, K. et al.
Complete nucleotide sequence of the prophage VT1-Sakai carrying the Shiga toxin 1 genes of the enterohemorrhagic Escherichia coli O157:H7 strain derived from the Sakai outbreak. Gene 2000, 258, 127–139.
[259] Mills, M.; Payne, S. M. Identification of shuA, the gene encoding the heme receptor of Shigella dysenteriae, and analysis of invasion and intracellular multiplication of a shuA mutant. Infect. Immun. 1997, 65, 5358–5363.
[260] Strodel, B.; Wales, D. J. Free energy surfaces from an extended harmonic superposition approach and kinetics for alanine dipeptide. Chem. Phys. Lett. 2008, 466, 105–115.
[261] Hoare, M. R.; McInnes, J. A. Morphology and statistical statics of simple microclusters. Adv. Phys. 1983, 32, 791–821.
[262] Joseph, J. A.; Röder, K.; Chakraborty, D.; Mantell, R. G.; Wales, D. J. Exploring biomolecular energy landscapes. Chem. Commun. 2017, 53, 6974–6988.
[263] Wales, D. J. Decoding heat capacity features from the energy landscape. Phys. Rev. E 2017, 95, 1–6.
[264] Marcelin, R. Expression of speeds of transformations of physico-chemical systems in an affinity function. Cr. Hebd. Acad. Sci. 1913, 157, 1419–1422.
[265] Eyring, H. The activated complex in chemical reactions. J. Chem. Phys. 1935, 3, 107–115.
[266] Evans, M. G.; Polanyi, M. Some applications of the transition state method to the calculation of reaction velocities, especially in solution. Trans. Faraday Soc. 1935, 31, 875–893.
[267] Miller, W. H. Importance of nonseparability in quantum mechanical transition-state theory. Acc. Chem. Res. 1976, 9, 306–312.
[268] Miller, W. H. Unified statistical model for “complex” and “direct” reaction mechanisms. J. Chem. Phys. 1976, 65, 2216–2223.
[269] Rice, O. K.; Ramsperger, H. C. Theories of unimolecular gas reactions at low pressures. I. J. Am. Chem. Soc. 1927, 49, 1617–1629.
[270] Rice, O. K.; Ramsperger, H. C. Theories of unimolecular gas reactions at low pressures. II. J. Am. Chem. Soc. 1928, 50, 617–620.
[271] Kassel, L. S. Studies in homogeneous gas reactions. J. Phys. Chem. 1928, 32, 225–242.
[272] Marcus, R. A. Unimolecular dissociations and free radical recombination reactions. J. Chem. Phys. 1952, 20, 359–364.
[273] Fain, B. Theory of rate constants: master equation approach. J. Stat. Phys. 1981, 25, 475–489.
[274] Onsager, L. Initial recombination of ions. Phys. Rev. 1938, 54, 554–557.
[275] Du, R.; Pande, V. S.; Grosberg, A. Y.; Tanaka, T.; Shakhnovich, E. S. On the transition coordinate for protein folding. J. Chem. Phys. 1998, 108, 334–350.