On the epigenetic ageing clock in humans

Daniel Elías Martín Herranz

European Molecular Biology Laboratory, European Bioinformatics Institute
University of Cambridge

This dissertation is submitted for the degree of Doctor of Philosophy

Churchill College
April 2019

To my palindromic family, Andrés, Pilar and Andrés.
Because these pages of science are a reflection of their art.

Declaration

This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration with others, except where specified in the declarations at the beginning of the chapters. I further specify this by using the pronoun ‘we’ when others were substantially involved in the work and ‘I’ for those parts that are purely my own work.

It is not substantially the same as any that I have submitted, or am concurrently submitting, for a degree or diploma or other qualification at the University of Cambridge or any other University or similar institution. I further state that no substantial part of my dissertation has already been submitted, or is being concurrently submitted, for any such degree, diploma or other qualification at the University of Cambridge or any other University or similar institution.

This dissertation contains fewer than 60,000 words exclusive of tables, footnotes, bibliography, and appendices and has fewer than 150 figures.

Daniel Elías Martín Herranz
April 2019

Acknowledgements

This thesis has made use of a great amount of chronological time (hopefully not too much biological time) of a lot of people. This is also their work. I am deeply thankful ...
... to Janet Thornton, for opening the doors of the EBI to me, showing me the true nature of critical thinking, science and proper discussion, and for supporting my (sometimes) wild ideas and plans;
... to Wolf Reik, who is responsible for my scientific crush on epigenetics, for accepting me as an unofficial student, always providing stimulating ideas and inviting me to his garden parties;
... to Tom Stubbs, for his scientific creativity, friendship and burrito evenings;
... to the rest of my collaborators, especially Marc Jan Bonder, Antonio Ribeiro and Erfan Aref-Eshghi, for their contributions;
... to the rest of my TAC members, Oliver Stegle, Judith Zaugg and Gos Micklem, for their guidance;
... to Nils Eling, Hannah Meyer, Jack Monahan and Max Stammnitz, for taking the time to read through these pages and send me their thoughts and comments;
... to the incredible people in the Thornton and Reik labs, for their input and many shared lunches (and some beers);
... to my family, for their always unconditional love and support (and for feeding me so well);
... to Parvathi ‘Ale’ Subbiah, because her ‘effect’ has given me strength every day since I met her (and for helping me with the design of the figures);
... to the EMBL-EBI crowd, especially Jack, Nils, Lara, Omar, Hannah and Julia, for many good times at the Blue Moon and the Wiggle Mansion;
... to the rest of the Cambridge crowd, including members of Los del Cam (Max, Vlad, Gogi, Ale), Churchill College (Barbora, basketball team), the CompBio MPhil (Daniel, Elias, Dalia, Andy) and the La Caixa fellows, for keeping me sane in this bubble;
... to my friends from Salamanca (Salón del Té) and Soria (Club de Bebedores Mercadona), for their eternal friendship;
... to La Caixa and EMBL, for funding me and giving me the opportunity to write these words;
... to all those people that I forgot to include because of my procrastination; they know who they are.

Abstract

Epigenetic clocks are mathematical models that predict the biological age of an organism using DNA methylation data, and which have emerged in the last few years as the most accurate biomarkers of the ageing process. However, little is known about the molecular mechanisms that control the rate of such clocks. In this thesis I focus on the study of the epigenetic ageing clock in humans.
First, I review and benchmark statistical and computational tools required for the analysis of DNA methylation data in the context of human ageing. Next, I validate the performance of the Horvath epigenetic clock, the most widely used multi-tissue epigenetic clock in humans, in a control blood dataset and test its behaviour in patients with a variety of developmental disorders, which harbour mutations in proteins of the epigenetic machinery. I demonstrate that loss-of-function mutations in the H3K36 methyltransferase NSD1, which cause Sotos syndrome, substantially accelerate epigenetic ageing. Furthermore, I show that the normal ageing process and Sotos syndrome share methylation changes and the genomic context in which they happen. These results suggest that the H3K36 methylation machinery is a key component of the epigenetic maintenance system in humans, which controls the rate of epigenetic ageing, and this role seems to be conserved in model organisms. Finally, I provide a technological strategy to make epigenetic clocks (or any DNA methylation-based mathematical models) more cost-effective by exploiting the ability of restriction enzymes to perform genomic enrichment. This thesis provides novel insights (statistical, biological, technological) into the epigenetic ageing clock in humans, which will help to shed light on the different processes that erode the human epigenetic landscape during ageing.

Table of contents
List of figures
List of tables
Abbreviations and acronyms
1 Introduction
1.1 The biology of ageing
1.1.1 A brief introduction to ageing theory
1.1.2 The genetic basis of ageing
1.1.3 Hallmarks of mammalian ageing
1.1.4 Studying the ageing process in humans
1.2 Epigenetics of ageing
1.2.1 A brief introduction to epigenetics
1.2.2 Fundamentals of DNA methylation in mammals
1.2.3 Links between the epigenetic machinery and ageing
1.3 The epigenetic ageing clock
1.3.1 Measuring the ageing process
1.3.2 The landscape of epigenetic clocks
1.3.3 Molecular mechanisms of the epigenetic ageing clock
2 Statistical aspects
2.1 Analysing the blood methylome to study human ageing
2.1.1 Building a DNA methylation dataset from public data
2.1.2 Main DNA methylation data pre-processing pipeline
2.1.3 Accounting for blood cell composition changes during ageing
2.1.4 Identifying differentially methylated positions during ageing
2.1.5 Shannon methylation entropy
2.2 Behaviour of Horvath’s epigenetic clock during ageing
2.2.1 Calculating epigenetic age using Horvath’s epigenetic clock
2.2.2 Horvath’s epigenetic clock measures physiological ageing
2.2.3 Correcting for batch effects in the context of the epigenetic clock
2.3 Behaviour of other epigenetic clocks during ageing
2.3.1 Hannum’s epigenetic clock
2.3.2 Epigenetic mitotic clock: epiTOC
2.4 Additional methods
3 Biological aspects
3.1 Background
3.2 Screening for genes that accelerate the epigenetic ageing clock
3.3 Sotos syndrome accelerates epigenetic ageing
3.4 Comparing Sotos syndrome and physiological ageing
3.5 Methylation Shannon entropy and the epigenetic clock
3.6 Discussion
3.7 Additional methods
4 Technological aspects
4.1 Background
4.2 Restriction enzyme digestion as a tool for genomic enrichment
4.3 cuRRBS: customised Reduced Representation Bisulfite Sequencing
4.4 Running cuRRBS in different biological systems
4.5 Experimental validation of cuRRBS
4.6 Conclusions and future directions
4.7 Additional methods
5 Final remarks
5.1 Statistical aspects
5.2 Biological aspects
5.3 Technological aspects
Appendix
S.1 Supplementary for chapter 2
S.2 Supplementary for chapter 3
S.3 Supplementary for chapter 4
References

List of figures
1.1 Theoretical framework to conceptualise the ageing process
1.2 Main signalling pathways that affect the ageing process
1.3 Establishment and maintenance of 5-methylcytosine in mammalian genomes
1.4 Oxidation of 5-methylcytosine and the cycle of demethylation
2.1 Chronological age distribution in the healthy individuals
2.2 Main DNA methylation data pre-processing pipeline
2.3 Effect of BMIQ normalisation on the β-value distribution
2.4 Benchmarking of the cell-type deconvolution strategies in blood: RMSE and MAE
2.5 Predictions obtained for each blood cell type using the optimal deconvolution strategy
2.6 Changes in blood cell composition during human ageing
2.7 Changes in the blood methylome during human ageing
2.8 Changes in the β-values of four different aDMPs
2.9 Relationship between the β-value and the Shannon entropy at a given CpG site
2.10 Genome-wide methylation Shannon entropy during physiological ageing
2.11 Transforming chronological age in Horvath’s model
2.12 Horvath’s epigenetic clock measures physiological ageing
2.13 Correcting for batch effects in the context of the epigenetic clock
2.14 Causes of deviation from the expected EAA distribution in the control model
2.15 Behaviour of Hannum’s epigenetic clock in the healthy individuals
2.16 Behaviour of the epigenetic mitotic clock (epiTOC) in the healthy individuals
3.1 Chronological age distribution in the individuals with developmental disorders
3.2 Overview of the analyses performed in Chapter 3
3.3 Screening for epigenetic age acceleration (EAA) in developmental disorders
3.4 Sotos syndrome accelerates epigenetic ageing
3.5 Comparing DNA methylation changes in Sotos syndrome and physiological ageing
3.6 Landscape of Horvath’s epigenetic clock CpGs in Sotos syndrome
3.7 Methylation Shannon entropy during physiological ageing and in Sotos syndrome
3.8 Proposed model that highlights the role of H3K36 methylation maintenance in epigenetic ageing
4.1 The landscape of restriction enzyme motifs
4.2 Restriction enzyme digestion as a tool for genomic enrichment
4.3 cuRRBS overview
4.4 Running cuRRBS in different biological systems
4.5 Experimental validation of cuRRBS
S1.1 Effects of noob background correction on the array fluorescence intensities
S1.2 Quality control (QC) strategy to identify outlier samples
S1.3 M-value distributions in the GSE41273 batch
S1.4 Cell-type deconvolution strategies that were benchmarked
S1.5 Benchmarking of the cell-type deconvolution strategies in blood: R²
S1.6 Table showing the top 100 aDMPs
S1.7 Impact of the absence of background correction on the predictions from the epigenetic clock
S1.8 Correcting for batch effects: control model without cell composition correction
S1.9 PCA on the array control probes captures batch effects: cases
S1.10 Variance explained by the different principal components during batch effect correction
S2.1 Table showing information for the individuals with developmental disorders
S2.2 Effect of changing the median age of the controls when performing the screening
S2.3 Screening for epigenetic age acceleration (EAA) in developmental disorders: additional scatterplots
S2.4 Enrichment for the categorical (epi)genomic features in Sotos and ageing: genome-wide
S2.5 Distributions of scores for the continuous (epi)genomic features in Sotos and ageing: genome-wide
S2.6 Scores for the continuous (epi)genomic features in Horvath’s epigenetic clock CpGs
S2.7 Enrichment for the categorical (epi)genomic features in Sotos and ageing: Horvath’s epigenetic clock
S2.8 Distributions of scores for the continuous (epi)genomic features in Sotos and ageing: Horvath’s epigenetic clock
S2.9 Methylation Shannon entropy acceleration
S2.10 Batch effects in the methylation Shannon entropy for the epigenetic clock sites
S2.11 Information for the continuous (epi)genomic features
S3.1 Scatterplot of fragment length distributions for the isoschizomer families
S3.2 Genomic features that overlap with restriction enzyme cleavage sites
S3.3 Comparison of studies using restriction enzymes for genomic enrichment
S3.4 Additional insights into cuRRBS
S3.5 Additional results of running cuRRBS in different biological systems
S3.6 Effect of experimental errors during size selection in cuRRBS predictions
S3.7 cuRRBS computational efficiency

List of tables
1.1 Comparison of epigenetic clocks in different species
2.1 Overview of the blood DNA methylation dataset from healthy individuals
3.1 Overview of the developmental disorders that were included in the screening
4.1 Flexible user-defined cuRRBS parameters
S2.1 Additional information for the developmental disorders dataset

Abbreviations and acronyms
27K: Illumina Infinium HumanMethylation27 array
450K: Illumina Infinium HumanMethylation450 array
5caC: 5-carboxylcytosine
5fC: 5-formylcytosine
5hmC: 5-hydroxymethylcytosine
5mC: 5-methylcytosine
a.k.a.: Also known as
aDMPs: Differentially methylated positions during ageing
AMP: Adenosine monophosphate
AMPK: Adenosine monophosphate-activated protein kinase
ASD: Autism spectrum disorder
ATP: Adenosine triphosphate
ATR-X: Alpha thalassemia/mental retardation X-linked syndrome
aVMPs: Variably methylated positions during ageing
B: CD19+ B cells
BER: Base excision repair
BMIQ: Beta-mixture quantile normalisation
bp: Base pairs
CCC: Cell composition correction
CD4T: CD4+ T cells
CD8T: CD8+ T cells
CG: 5′-cytosine-phosphate-guanine-3′
CGI: CpG island
CHG: 5′-cytosine-phosphate-H-phosphate-guanine-3′, where H corresponds to adenine, thymine or cytosine
CHH: 5′-cytosine-phosphate-H-phosphate-H-3′, where H corresponds to adenine, thymine or cytosine
ChIP-seq: Chromatin immunoprecipitation and sequencing
CP/QP: Constrained projection/quadratic programming
CpG: 5′-cytosine-phosphate-guanine-3′
CPU: Central processing unit
CRF: Cost Reduction Factor in cuRRBS
cSEA: Shannon entropy acceleration for the Horvath’s epigenetic clock sites
CTCF: CCCTC-binding factor
cuRRBS: customised Reduced Representation Bisulfite Sequencing
DHS: DNase Hypersensitive Sites
DHS-DMCs: In cell-type deconvolution strategies, reference probes identified using information from differential methylation and chromatin accessibility
DMCs: Differentially methylated cytosines
DMCTs: Differentially methylated cytosines in individual cell types
DMPs: Differentially methylated positions
DMRs: Differentially methylated regions
DMV: DNA methylation valley
DNA: Deoxyribonucleic acid
DNAmAge: DNA methylation age, i.e. epigenetic age calculated with Horvath’s epigenetic clock
EAA: Epigenetic age acceleration
EPIC: Illumina Infinium MethylationEPIC array
epiTOC: epigenetic Timer of Cancer (i.e. the epigenetic mitotic clock)
ESCs: Embryonic stem cells
etc.: Et cetera
EV: Enrichment Value in cuRRBS
EWAS: Epigenome-wide association studies
FDR: False discovery rate
FN: False negatives
FP: False positives
FXS: Fragile X syndrome
GB: Gigabytes
Gbp: Giga base pairs
GC content: Guanine + cytosine content
GEO: Gene Expression Omnibus repository
Gran: Granulocytes
gSEA: Genome-wide Shannon entropy acceleration
GWAS: Genome-wide association studies
H3K27me3: Histone H3 lysine 27 trimethylation
H3K36: Histone H3 lysine 36
H3K36me3: Histone H3 lysine 36 trimethylation
H3K4me3: Histone H3 lysine 4 trimethylation
hg19: Reference human genome assembly 19
hg38: Reference human genome assembly 38
hQTLs: Histone quantitative trait loci
HSCs: Haematopoietic stem cells
i.e.: Id est
IDOL: IDentifying Optimal DNA methylation Libraries, a strategy to build cell-type deconvolution references
IEAA: Intrinsic epigenetic age acceleration
IGF-1: Insulin-like growth factor 1
IHEC: International Human Epigenome Consortium
iPSCs: Induced pluripotent stem cells
kb: Kilo base pairs
KNN: k-nearest neighbours
m6A: N6-methyladenosine
MAE: Mean absolute error (in the context of cell-type deconvolution benchmarking) or median absolute error (in the context of Horvath’s epigenetic clock)
MBD: Methyl-CpG-binding domain
MEFs: Mouse embryonic fibroblasts
meQTLs: Methylation quantitative trait loci
Mono: CD14+ monocytes
mRNA: Messenger RNA
NAD+: Nicotinamide adenine dinucleotide
NF: Theoretical number of fragments sequenced in cuRRBS
NFC: Normalised fold change
NK: CD56+ natural killer cells
NRC: Normalised read counts
NRE: Normalised RNA expression
NRF1: Nuclear respiratory factor 1
OOB: Out-of-band fluorescence intensities in the Infinium I probes of Illumina arrays
OR: Odds ratio
PAHs: Polycyclic aromatic hydrocarbons
PBMC: Peripheral blood mononuclear cells
PC: Principal component
PCA: Principal component analysis
PCC: Pearson’s correlation coefficient
pcgtAge: Mitotic age according to the epigenetic mitotic clock (epiTOC)
PCR: Polymerase chain reaction
PGCs: Primordial germ cells
PRC2: Polycomb Repressive Complex 2
QC: Quality control
R: Either the robustness variable in cuRRBS or the R programming language
R²: Coefficient of determination
RAM: Random-access memory
Repli-seq: Genome-wide analysis of replication timing by sequencing
RMSE: Root mean squared error
RNA: Ribonucleic acid
RNA-seq: RNA sequencing
ROS: Reactive oxygen species
RPC: Robust partial correlations
RRBS: Reduced Representation Bisulfite Sequencing
rRNA: Ribosomal RNA
SASP: Senescence-associated secretory phenotype
SCC: Spearman’s correlation coefficient
SD: Standard deviation
Sexp: Sex predicted for a sample using DNA methylation data
SNP: Single-nucleotide polymorphism
SQN: Stratified quantile normalisation
sur: Signal of unique reads
TDG: Thymine DNA glycosylase
TKO: Triple knockout
TN: True negatives
TOR: Target of rapamycin
TP: True positives
TSS: Transcription start site
UTR: Untranslated region
WGBS: Whole Genome Bisulfite Sequencing
WTS: Wavelet-transformed signals

Chapter 1
Introduction

‘[...] there are as many theories of ag[e]ing as there are biogerontologists.’
L. Hayflick [2007a]

1.1 The biology of ageing

1.1.1 A brief introduction to ageing theory

The ageing process is one of the most mysterious, complex and fascinating biological problems that remain to be solved. Ageing and immortality have probably fascinated humankind ever since we developed a conception of time and death [Renfrew et al., 2016]. Biological ageing (a.k.a. the ageing process) can be broadly defined as the time-dependent functional decline that increases vulnerability to death in most organisms [Lopez-Otin et al., 2013]. The revolution in genetics and molecular biology during the 20th century gave rise to more than 300 theories that attempt to explain the mechanisms behind biological ageing [Medvedev, 1990]. Any valid modern theory of ageing would need to explain at least two things [Medvedev, 1990]:

• The molecular basis for the increase in mortality rate (a.k.a. death rate) over time in the population of a given species. Mortality rate can be broadly defined as the number of deaths in a population per unit of time, scaled by the size of the population. More formally, by quantifying the deaths of individuals in a population over time (and assuming that the population does not grow through reproduction, migration, etc.), the survival fraction at a given time $t$, $S(t)$, is [Witten, 1986]:

$$S(t) = \frac{N(t)}{N_0} \tag{1.1}$$

where $N(t)$ is the number of individuals alive at time $t$ and $N_0$ is the initial number of individuals in the population.
It can be shown that the mortality rate, $\lambda(t)$, can be expressed as [Witten, 1986]:

$$\lambda(t) = -\frac{1}{S(t)} \cdot \frac{dS(t)}{dt} \tag{1.2}$$

• The evolutionary variations in lifespan between different species [Jones et al., 2013], where lifespan is defined as the time elapsed between the birth and death of an organism. For example, according to the AnAge database [De Magalhães and Costa, 2009], the maximum lifespan is 0.16 years (58.4 days, in captivity) for the roundworm (Caenorhabditis elegans), 0.3 years (109.5 days, in captivity) for the fruit fly (Drosophila melanogaster), 4 years (in captivity) for the house mouse (Mus musculus), 122.5 years for humans (Homo sapiens) and 211 years (in the wild) for the bowhead whale (Balaena mysticetus). Furthermore, some species (such as certain turtles, certain species of rockfish or the bristlecone pine) seem to show negligible senescence, i.e. negligible changes in adult mortality rates over extended periods of time at advanced adult ages [Finch, 2009].

Nowadays, there are at least two main paradigms, complementary to each other, that attempt to conceptualise the problem and that are the subject of intense discussion among biogerontologists:

• Ageing as a consequence of molecular infidelity. In this case, stochastic chemical modifications of biomolecules, such as DNA or proteins, exceed the capacity of the organism's repair and turnover systems and accumulate over time, which increases the entropy of the system. This leads to changes in molecular structure and, finally, changes in function, which increase vulnerability to age-related diseases [Hayflick, 2007a,b]. From an evolutionary point of view, this fits into the disposable soma theory, originally proposed by Thomas Kirkwood in 1977.
This theory suggests that organisms have evolved to optimise the amount of energy dedicated to repairing errors in somatic cells in order to maximise reproductive success (at the expense of indefinite survival) [Kirkwood and Rose, 1991; Kirkwood, 1977].

• Ageing as a consequence of hyperfunction. In this case, the primary cause of ageing is the excessive activity of certain growth- or development-related genes and pathways in later life [Blagosklonny, 2006, 2010; de Magalhães, 2012; Gems, 2015]. In other words, ageing originates from developmental programmes that have not been switched off [Blagosklonny, 2006]. This idea is rooted in the concept of antagonistic pleiotropy, an important pillar of the evolutionary theory of ageing originally proposed by George C. Williams in 1957 [Williams, 1957]. Antagonistic pleiotropy implies that certain genes have opposite effects on fitness at different ages: genes that are beneficial early in life can persist even if they are harmful later, because the force of natural selection declines after reproductive age. A strong candidate is the TOR (target of rapamycin) pathway (see section 1.1.2), which promotes development in early life but also the progression of several late-life pathologies [Blagosklonny, 2010].

It has become clear that no single molecular mechanism will be able to explain ageing across all kingdoms of life. Different species have different life histories that are subject to evolutionary trade-offs (e.g. regarding reproduction strategies, developmental schedules, etc.), which can affect the rate of ageing [Jones et al., 2013; Ricklefs, 2010]. Nevertheless, it is possible to integrate all the ideas presented so far into a theoretical framework that can help to unify definitions across studies and lay the foundations for mechanistic advances in the biology of ageing (Fig. 1.1, inspired by ideas from [Freund, 2019; Gems, 2015; Hayflick, 2007a; Peto and Doll, 1997; Stroustrup et al., 2016]).
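Equations (1.1) and (1.2) can be evaluated numerically from census counts of a cohort. The sketch below is illustrative only and is not part of the analyses in this thesis: it assumes a closed cohort (no births or migration) observed at discrete times, approximates the derivative with NumPy's `np.gradient`, and uses the chain-rule identity $-\frac{1}{S}\frac{dS}{dt} = -\frac{d \ln S}{dt}$. The function names and the constant-hazard cohort are hypothetical; `interval_mortality` is a discrete version of the verbal definition (deaths per unit time, scaled by population size).

```python
import numpy as np


def survival_fraction(alive):
    """Eq. (1.1): S(t) = N(t) / N0, taking N0 as the first census count."""
    alive = np.asarray(alive, dtype=float)
    return alive / alive[0]


def mortality_rate(alive, times):
    """Eq. (1.2): lambda(t) = -(1 / S(t)) * dS/dt, computed as -d ln S(t) / dt.

    The derivative is approximated with finite differences (np.gradient),
    so the estimate is only defined while N(t) > 0.
    """
    log_s = np.log(survival_fraction(alive))
    return -np.gradient(log_s, np.asarray(times, dtype=float))


def interval_mortality(alive, times):
    """Verbal definition: deaths per unit time in each interval,
    scaled by the number of individuals alive at the start of that interval."""
    alive = np.asarray(alive, dtype=float)
    dt = np.diff(np.asarray(times, dtype=float))
    deaths = alive[:-1] - alive[1:]
    return deaths / (alive[:-1] * dt)


if __name__ == "__main__":
    # Hypothetical closed cohort with a constant hazard of 0.1 per unit time.
    t = np.arange(10)
    n = 1000 * np.exp(-0.1 * t)
    print(mortality_rate(n, t))
    print(interval_mortality(n, t))
```

For this hypothetical cohort, $\lambda(t)$ is constant at 0.1, whereas the interval estimate is $1 - e^{-0.1} \approx 0.095$: it measures the fraction dying within each whole time step rather than the instantaneous rate, so the two agree only when the time step is small.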
Under the theoretical framework shown in Fig. 1.1:

• The ageing process is composed of different molecular mechanisms (subprocesses) that operate at different stages of life and contribute, in variable proportions, to the appearance of different age-related diseases, i.e. the risk of developing an age-related disease is the ‘integral of its ageing subprocesses operating over time’. Furthermore, the development of different diseases affects the mortality rate and, thus, the probability of dying. The different ageing subprocesses can also be understood as the sources of ageing-associated molecular damage [Lopez-Otin et al., 2013].
• If the ageing subprocesses can be altered through genetic, lifestyle or pharmacological interventions, it is possible to reduce the likelihood of several age-related diseases at the same time. This makes ageing research highly relevant to the biomedical sciences, since it shifts the current paradigm away from developing interventions for a specific, already existing disease towards preventing several diseases simultaneously.
• Differences in the average lifespan between species should be explained by different combinations of ageing subprocesses and their rates.

[Figure 1.1 panels. a: ageing mechanisms A, B, … linked through weights W to diseases D1 to D4, which feed into the mortality rate. b, c: rates of mechanisms A and B over time for organisms 1 and 2, with reproductive age, disease diagnoses and death marked.]

Fig. 1.1 Theoretical framework to conceptualise the ageing process. a. The ageing process is composed of different molecular mechanisms (subprocesses) that operate at different stages of life and contribute, in variable proportions (specified by the weights), to the appearance of different age-related diseases. Furthermore, the development of different diseases affects the mortality rate and, thus, the probability of dying. b. and c. Examples of the life histories of two organisms.
In these examples, two ageing mechanisms operate: A (which changes its rate after reproductive age, e.g. activated growth-related pathways) and B (with a constant rate over time, e.g. some type of (epi)mutational process). Differences in the mechanisms’ profiles lead to differences in the age-related diseases that manifest over the lifespan of the organisms, even though the molecular mechanisms are the same. This affects the mortality rate and, ultimately, the time to death. This figure is inspired by ideas from [Freund, 2019; Gems, 2015; Hayflick, 2007a; Peto and Doll, 1997; Stroustrup et al., 2016].

Consequently, systems biology approaches are fundamental to understanding the ageing process [Freund, 2019]. In the next sections, I will provide an overview of the ageing mechanisms that may operate in different species, with a special focus on mammals.

1.1.2 The genetic basis of ageing

Given the large variability in lifespan between species [Jones et al., 2013], it is now clear that the ageing process must have a genetic basis. However, for a long time, the ageing process was thought to be a ‘haphazard process driven solely by entropy’ [Kenyon, 2005]. Moreover, in 1935 Clive Maine McCay showed that caloric restriction (a reduction in calorie intake without malnutrition) could extend mean and maximal lifespan in rats [McCay et al., 1935; McDonald and Ramsey, 2010], which probably shifted the focus towards environmental or external causes as the main driving forces of the ageing process. Since then, dietary restriction (which includes different types of dietary interventions that reduce food intake without malnutrition) has been established as the most successful non-genetic intervention to slow down the ageing process across species [Fontana and Partridge, 2015].
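Returning briefly to the framework of Fig. 1.1 (section 1.1.1), the idea that disease risk is the ‘integral of its ageing subprocesses operating over time’ can be written as $R_D(t) = \sum_i w_{iD} \int_0^t r_i(s)\,ds$, where $r_i$ is the rate of ageing subprocess $i$ and $w_{iD}$ its weight for disease $D$. This formalisation, the rate profiles and the weights in the sketch below are hypothetical illustrations, not quantities estimated in this thesis; the integral is approximated with the trapezoidal rule, and $R_D$ is a relative risk score rather than a probability.

```python
import numpy as np


def cumulative_integral(rate, times):
    """Cumulative trapezoidal integral of a rate curve, starting at 0 at times[0]."""
    rate = np.asarray(rate, dtype=float)
    times = np.asarray(times, dtype=float)
    steps = 0.5 * (rate[1:] + rate[:-1]) * np.diff(times)
    return np.concatenate(([0.0], np.cumsum(steps)))


def disease_risk(rates, weights, times):
    """Hypothetical risk score R_D(t) = sum_i w_iD * integral_0^t r_i(s) ds.

    rates:   {mechanism: rate array evaluated at `times`}
    weights: {disease: {mechanism: weight}}
    """
    exposure = {m: cumulative_integral(r, times) for m, r in rates.items()}
    return {
        disease: sum(w * exposure[m] for m, w in mech_weights.items())
        for disease, mech_weights in weights.items()
    }


# Hypothetical organism mirroring Fig. 1.1b: mechanism A speeds up after
# reproductive age, mechanism B runs at a constant rate.
times = np.linspace(0.0, 80.0, 801)
reproductive_age = 20.0
rates = {
    "A": np.where(times < reproductive_age, 0.2, 1.0),
    "B": np.full_like(times, 0.5),
}
weights = {"D1": {"A": 0.9, "B": 0.1}, "D2": {"A": 0.1, "B": 0.9}}
risk = disease_risk(rates, weights, times)

if __name__ == "__main__":
    for disease, curve in risk.items():
        print(disease, round(float(curve[-1]), 2))
```

With these made-up numbers, D2 (weighted towards the constant mechanism B) dominates early in life, while D1 (weighted towards A) overtakes it once A accelerates after reproductive age. Changing only the rate profiles, with the same mechanisms and weights, changes which disease appears first, which is the point made by Fig. 1.1b and c.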
The establishment of the nematode Caenorhabditis elegans as a model organism in the 1970s triggered its adoption in the ageing field [Klass and Hirsh, 1976], since it allowed well-controlled experiments in a much shorter period of time than with rodents [Johnson, 2013]. This led to the discovery of the first mutants that dramatically extended lifespan, which mapped to genes in the insulin/IGF-1 signalling pathway [Kenyon et al., 1993; Morris et al., 1996]. Since then, many genes have been found to significantly affect lifespan in other model organisms as well, such as budding yeast (Saccharomyces cerevisiae), fruit flies (Drosophila melanogaster) and mice (Mus musculus) [Kenyon, 2005, 2010; Singh et al., 2019]. Interestingly, the effects of many of these mutations and their pathways are shared by distantly related species. This suggests that at least part of the molecular mechanisms that drive the ageing process could be evolutionarily conserved. Major signalling pathways that have been associated with ageing include the following (Fig. 1.2) [Greer and Brunet, 2008; Kenyon, 2005, 2010; Singh et al., 2019]:

• Insulin/IGF-1 pathway. This pathway underscores the central role of the endocrine system in the biology of ageing. Mutations that lower the level of daf-2, which encodes an insulin/IGF-1 receptor, were originally found to double the lifespan of C. elegans [Guarente and Kenyon, 2000; Kenyon et al., 1993]. Activation of the insulin/IGF-1 pathway leads to the phosphorylation of a transcription factor of the FOXO family, encoded by daf-16 in C. elegans, which prevents it from entering the nucleus [Lin et al., 2001]. FOXO transcription factors, of which there are several in mammals, activate the expression of longevity-promoting genes involved in processes such as autophagy (which clears protein aggregates and damaged organelles from the cell) [Singh et al., 2019], resistance to oxidative stress and stem cell maintenance [Martins et al., 2016]. This partly explains why inhibition of the insulin/IGF-1 pathway can increase organismal lifespan. However, other downstream targets that regulate gene expression have also been identified in C. elegans, such as hsf-1 (a transcription factor that regulates the heat-shock response) [Hsu et al., 2003] and skn-1 (a transcription factor that coordinates a response to oxidative stress) [Tullet et al., 2008].

[Figure 1.2: signalling network linking dietary restriction to IGF1R/INSR (DAF-2), mTOR (TOR), AMPK (AAK), FOXO1/3/4 (DAF-16), NRF (SKN-1), HSF1 (HSF-1), S6K1 (RSKS-1) and 4EBP-1 (no C. elegans ortholog shown), with outputs expression of longevity-promoting genes (e.g. stress resistance), translation and autophagy, which set the ageing rate and lifespan.]

Fig. 1.2 Main signalling pathways that affect the ageing process. These pathways sense nutrient and stress inputs (such as a dietary restriction regime) and ultimately impact the rate of ageing. The grey lines represent inhibition (negative regulation) while the yellow arrows represent activation (positive regulation). As such, dietary restriction inhibits the insulin/IGF-1 pathway (in red), inhibits the TOR pathway (in green) and activates AMPK signalling (in purple), ultimately extending lifespan. For simplicity, I have only included the main proteins that transduce the signal (e.g. there are more intermediate kinases in the insulin/IGF-1 pathway). Protein names are given for the mammalian proteins (top) and, where available, their C. elegans orthologs (bottom, in parentheses).

• TOR pathway. TOR (target of rapamycin) is a kinase that acts as a major amino-acid and nutrient sensor; it stimulates growth (including protein translation) and blocks autophagy [Kenyon, 2010]. The effects of TOR are partly mediated by activating ribosomal protein S6 kinase (which promotes protein translation) and by inhibiting 4EBP (a translation inhibitor) [Kenyon, 2010; Um et al., 2006].
Reductions in TOR activity (via genetic or pharmacological means) increase lifespan across many species [Kenyon, 2010]. Importantly, rapamycin, a drug that inhibits TOR, can increase the mean lifespan of mice even when fed late in life, showing for the first time that pharmacological interventions targeting mammalian ageing are possible [Harrison et al., 2009]. Interestingly, the increase in lifespan differed between males (9%) and females (13%) [Harrison et al., 2009], highlighting the sex-specific effects of some ageing mechanisms.

• AMPK pathway. AMP-activated protein kinase (AMPK) controls the balance between catabolic and anabolic processes depending on the cellular AMP/ATP ratio (i.e. when ATP levels decrease, AMPK is activated to promote catabolic pathways) [Kenyon, 2010; Mihaylova and Shaw, 2011]. Furthermore, AMPK activation promotes autophagy, partially by inhibiting TOR [Mihaylova and Shaw, 2011]. The anti-diabetic drug metformin, which activates AMPK among other targets, has been shown to extend lifespan in mice [Anisimov et al., 2008; Martin-Montalvo et al., 2013] and has been selected as the first drug to be tested in a clinical trial that targets the human ageing process [Barzilai et al., 2016].

• Sirtuins. Sirtuins are a family of nicotinamide adenine dinucleotide (NAD+)-dependent deacetylases, i.e. they generally catalyse the removal of an acetyl group from lysine residues using NAD+ as a cofactor [Bonkowski and Sinclair, 2016]. Sirtuins have been shown to play complex roles in the biology of ageing and age-related diseases, generally promoting longevity through cross-talk with other nutrient-sensing pathways [Bonkowski and Sinclair, 2016; Kenyon, 2010]. Several authors have shown that increasing NAD+ levels enhances the activity of sirtuins, which could constitute an additional anti-ageing pharmacological avenue in mammals.
Additionally, intensive research is being carried out to identify other molecules that activate sirtuins [Bonkowski and Sinclair, 2016].

• Other pathways. Mitochondrial respiration (and its production of reactive oxygen species, or ROS), genome surveillance pathways (such as those involved in DNA repair or telomere maintenance), signals from the reproductive system and Wnt signalling have also been implicated in different ways in the ageing process [Greer and Brunet, 2008; Kenyon, 2010; Lezzerini and Budovskaya, 2014].

These pathways seem to play a dual role depending on the environmental context that the organism faces, behaving as nutrient and stress sensors. Under abundant nutrient availability and low stress (oxidative, thermal), they tend to promote growth and reproduction. In contrast, under harsh conditions (such as those posed by dietary restriction), they favour cell protection and maintenance [Kenyon, 2005, 2010]. It is worth mentioning that the responses of the different pathways to dietary restriction depend strongly on the characteristics of the diet and its timing [Kenyon, 2010]. This model also relates to the disposable soma theory, in which resources are allocated either to reproduction or to somatic maintenance depending on the context [Kirkwood and Rose, 1991; Kirkwood, 1977]. This is further mechanistically supported by experiments showing that decreased insulin/IGF-1 signalling (e.g. via daf-2 mutation) leads to the acquisition of germline characteristics (e.g. higher genomic stability) in C. elegans somatic cells [Curran et al., 2009]. Even though this model is a clear oversimplification, it is useful when thinking about how the ageing process might have evolved and how the same biological pathways can be repurposed to activate genetic programs with completely different goals. There are many more complexities associated with these pathways, which would require a thesis of their own.
For example, the insulin/IGF-1 signalling pathway can act in a cell non-autonomous manner (i.e. the activity of the pathway in one tissue can affect lifespan by influencing cells in a different tissue), which could help to coordinate ageing rates across the organism, and its effects are often tissue-specific [Kenyon, 2005, 2010]. Additionally, the pathways can have different effects depending on the life stage of the animal (e.g. development or adulthood) [Dillin et al., 2002]. Furthermore, cross-talk between the pathways has been reported [Bonkowski and Sinclair, 2016; Greer et al., 2007]. Therefore, the inner workings of these signalling pathways are still an area of intense research.

The discovery of signalling pathways whose modulation can dramatically extend the lifespan of model organisms has demonstrated that the ageing process has a genetic basis and that its rate can be altered. More importantly, the onset of age-related disease seems to be delayed in many of these long-lived organisms [Arantes-Oliveira et al., 2003; Kenyon, 2010], suggesting that these interventions indeed reduce the rate of some of the operating ageing mechanisms (Fig. 1.1).

1.1.3 Hallmarks of mammalian ageing

Most studies on the biology of mammalian ageing have been conducted in mice. Many genetic mutations in conserved pathways (mainly nutrient-sensing pathways) have been shown to significantly extend the lifespan of mice. Among them, those that affect growth hormone signalling (which in mammals in turn controls the secretion of IGF-1 by the liver and therefore the insulin/IGF-1 signalling pathway) produce the largest lifespan extensions (in the order of 40-60%) [Singh et al., 2019]. Even though this is a remarkable result, it falls far short of the lifespan extensions achieved in ‘simpler’ model organisms such as C.
elegans (where an extension of almost 1000% has been achieved with a mutation in a single gene of the insulin/IGF-1 pathway; the equivalent of a human living to ≈ 1200 years!) [Ayyadevara et al., 2008]. This illustrates a general trend: lifespan interventions discovered in worms and flies generally yield less spectacular results in mice, and potentially in humans.

Evolution has been experimenting with lifespan extension for a long time. Consequently, some mammalian species, such as the naked mole rat (Heterocephalus glaber) or some species of bats (Chiroptera), are exceptionally long-lived for their body size. Recent reports point towards the possibility that these species do not increase their mortality rate with age (i.e. they may show negligible senescence) [Fleischer et al., 2017; Ruby et al., 2018a], which makes them particularly interesting systems for studying the biology of ageing in mammals.

In 2013, López-Otín et al. reviewed the main common denominators of the ageing process across organisms [Lopez-Otin et al., 2013]. They defined nine hallmarks of ageing, which can be understood as the measurable consequences of the ageing mechanisms proposed in Fig. 1.1. I will briefly discuss some of them, with a special focus on those that directly affect the genome during mammalian ageing [Lopez-Otin et al., 2013; Singh et al., 2019]:

• Genomic instability. Somatic DNA mutations (single nucleotide variants, copy number changes, structural rearrangements, etc.) accumulate over time in mammalian cells, both in the nuclear and in the mitochondrial genome [Larsson, 2010; Martincorena et al., 2018]. Different mutational processes (good candidates for ageing mechanisms) leave specific patterns of mutations in the genome (known as mutational signatures), which have been widely studied in the context of human cancer [Alexandrov and Stratton, 2014]. Many of these processes can be attributed to specific endogenous (e.g. DNA replication errors) or exogenous factors (e.g. tobacco smoke exposure). In the context of ageing, deamination of 5-methylcytosine (5mC) in a CpG context leads to C>T (cytosine to thymine) mutations, which accumulate in a clock-like manner at a rate that correlates with the proliferative activity of the tissue [Alexandrov et al., 2015]. Furthermore, nuclear architecture and the three-dimensional organisation of the genome both seem to change with age, which can disrupt nuclear homeostasis. Interestingly, several human diseases that are considered to display premature ageing, such as Werner syndrome or Hutchinson–Gilford progeria, are caused by mutations that lead to genomic instability [Oberdoerffer and Sinclair, 2007]. Finally, it is possible that an increase in the mobilisation of transposable elements with age further destabilises the genome [Orr, 2016].

• Telomere attrition. The repetitive DNA sequences at the ends of linear mammalian chromosomes are bound by the protein complex shelterin to form structures known as telomeres. Because the standard DNA replication machinery cannot fully copy the ends of linear DNA, the chromosome ends of somatic cells are eroded after each cell division (a net loss of 100-200 bp of telomeric sequence per division). After a certain number of divisions (and hence of telomere shortening), cells stop dividing and enter cellular senescence (see below) or undergo cell death (apoptosis) [O’Sullivan and Karlseder, 2010]. For many years, this replicative limit, known as the Hayflick limit, was understood as the manifestation of the ageing process at the cellular level [Hayflick, 1998; Hayflick and Moorhead, 1961]. Telomere shortening has indeed been shown to occur with age in most human tissues [Blasco, 2007]. Importantly, stem cells and germ cells express telomerase, an enzymatic complex that synthesises new telomeric repeats, avoiding telomere shortening.
In this way, organisms can regenerate their tissues when needed, which makes it unlikely that telomere attrition is the only mechanism driving ageing. In addition to telomere shortening, other mechanisms may contribute to replicative senescence in mammals [O’Sullivan and Karlseder, 2010]. Nevertheless, telomere biology plays a critical role in many fundamental processes, such as DNA repair and genomic stability, and non-telomeric functions have also been suggested for telomerase (such as global chromatin regulation and transcription of developmentally regulated genes) [O’Sullivan and Karlseder, 2010]. As such, telomeres have been implicated in age-related diseases, such as cancer and cardiovascular disease [Blasco, 2007; O’Sullivan and Karlseder, 2010]. Interestingly, ectopic expression of the catalytic subunit of telomerase (TERT) extends the lifespan of cancer-resistant mice [Tomás-Loba et al., 2008].

• Cellular senescence. Cellular senescence is a cellular state characterised by a stable cell cycle arrest. Different types of senescence are induced by different stress stimuli, including telomere shortening (replicative senescence, mentioned above), sustained DNA damage (e.g. via irradiation) or derepression of the INK4/ARF locus (which encodes three tumour suppressor genes) [Herranz and Gil, 2018; Lopez-Otin et al., 2013]. Under normal circumstances, cellular senescence carries out physiological functions such as preventing pre-malignant cells from dividing and participating in wound healing and tissue remodelling. Furthermore, senescent cells secrete a cocktail of factors (termed the senescence-associated secretory phenotype, or SASP) with pleiotropic effects (e.g. pro-inflammatory, matrix remodelling and growth-inducing) [Herranz and Gil, 2018]. Senescent cells accumulate in mammalian tissues during ageing. If this accumulation is excessive, the SASP can perturb the homeostasis of the tissue.
Consequently, the removal of senescent cells in mice increases lifespan and reduces the appearance of age-related phenotypes [Baker et al., 2016, 2011; Xu et al., 2018]. Drugs that selectively induce apoptosis in senescent cells (known as senolytics) [Kirkland et al., 2017] are currently undergoing clinical trials in humans.

• Epigenetic alterations. This hallmark is reviewed in further detail in section 1.2.3, since it is the main focus of this thesis.

• Other hallmarks of ageing. These include loss of proteostasis (appropriate quality control of the proteome, which is mechanistically connected with autophagy pathways and strongly implicated in neurodegenerative diseases); deregulated nutrient sensing (mediated by the pathways discussed in section 1.1.2); mitochondrial dysfunction (including a reduction in the efficiency of the respiratory chain with age); stem cell exhaustion (which is thought to contribute to the decline in the regenerative potential of tissues with ageing, as in the haematopoietic system); and altered intercellular communication (including an increase in inflammation, known as inflammageing, or alterations in the neuroendocrine system) [Lopez-Otin et al., 2013].

Importantly, complex interactions and interdependencies emerge between the different hallmarks of ageing. For example, in senescent mouse cells, a type of transposable element (LINE-1) becomes derepressed and activates a type-I interferon response, which in turn contributes to inflammageing [De Cecco et al., 2019]. Furthermore, understanding the role of the environment in modulating the ageing process and the different hallmarks in mammals is becoming increasingly important.
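The telomere attrition hallmark described above rests on simple arithmetic: with a roughly constant net loss per division, the number of divisions before telomeres reach a critical length is bounded. The Python sketch below assumes exactly that. The function name, the starting length of 10,000 bp and the critical length of 5,000 bp are hypothetical placeholders rather than measured values, while the 100-200 bp loss per division is the range quoted in the text. Real telomere dynamics are stochastic and differ between chromosomes and cell types.

```python
import math


def max_divisions(initial_bp: float, critical_bp: float,
                  loss_bp: float, telomerase_gain_bp: float = 0.0) -> float:
    """Largest number of divisions after which telomere length is still >= critical_bp.

    Assumes a constant loss of `loss_bp` per division, partly offset by a constant
    number of bases added by telomerase (0 for most somatic cells). Returns math.inf
    when telomerase fully compensates the loss.
    """
    if initial_bp < critical_bp:
        return 0
    net_loss = loss_bp - telomerase_gain_bp
    if net_loss <= 0:
        return math.inf
    return math.floor((initial_bp - critical_bp) / net_loss)


# Hypothetical starting and critical lengths; 100-200 bp loss as quoted in the text.
for loss in (100, 200):
    print(loss, max_divisions(10_000, 5_000, loss))
# 100 50
# 200 25

# Telomerase adding back as much as is lost (as in stem or germ cells): no limit.
print(max_divisions(10_000, 5_000, 150, telomerase_gain_bp=150))  # inf
```

Doubling the loss per division halves the replicative limit, and setting the telomerase contribution equal to the loss mimics the situation in stem and germ cells, where no such limit is reached.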
Assuming that molecular damage is the main cause of biological ageing, the mechanisms that lead to genomic instability, telomere attrition, epigenetic alterations and loss of proteostasis are likely to be the main drivers of the ageing process, with the rest of the hallmarks being a consequence of them [Lopez-Otin et al., 2013]. Nevertheless, interventions targeting some of the more ‘integrative hallmarks’ (such as senescent cell removal, optimised dietary restriction or stem cell therapies) will probably reach the clinic earlier.

1.1.4 Studying the ageing process in humans

Average human life expectancy has nearly doubled in most developed countries during the last 200 years. This has been the consequence of external factors, such as improvements in water quality, nutrition, hygiene, housing and lifestyle, immunisation against infectious disease, antibiotics and medical care [Partridge et al., 2018]. One of the most debated questions in the human ageing field is whether there is a limit to maximal human lifespan [Dong et al., 2016]. Since Benjamin Gompertz’s pioneering work in 1825, it has been known that the mortality rate in humans increases exponentially with age [Gompertz, 1825]. However, a recent study of Italian centenarians suggests that the mortality rate, which increases exponentially up to about age 80, decelerates thereafter and reaches or closely approaches a plateau after age 105 [Barbi et al., 2018]. This suggests that human lifespan may continue to increase in the coming decades and that we have probably not yet reached our evolutionary lifespan limit as a species [Barbi et al., 2018; Kontis et al., 2017]. Thus, in order to avoid a massive socioeconomic burden on our societies [Fine, 2014], biomedical research should focus on extending human healthspan (i.e. the amount of time that we live free of disease) and not only lifespan.
This goal, known as the ‘compression of morbidity’ [Partridge et al., 2018], is theoretically achievable if we target the core mechanisms that drive the ageing process (Fig. 1.1), since ageing is assumed to be the biggest contributor to the development of most age-related diseases, such as cancer, diabetes, cardiovascular disorders and neurodegenerative diseases [Lopez-Otin et al., 2013]. Indeed, genetic and pharmacological interventions that increase lifespan in model organisms also seem to extend healthspan [Newell Stamper et al., 2018], and the compression of morbidity is a characteristic of human centenarians [Feldman et al., 2012].

Most of our understanding of human ageing comes from studies carried out in population cohorts. Furthermore, in recent years different types of high-throughput molecular data (broadly known as ‘omics’) have been generated for many of these cohorts, including genetic, epigenetic and metabolomic data, as well as imaging data or even microbiome data. These data (sometimes referred to as ‘deep phenotypes’) complement the more traditional phenotypic measurements and health records and, for the first time, allow the human ageing process to be characterised with unprecedented resolution and scale. An example of such a cohort is the UK Biobank, which has enrolled > 500,000 participants [Bahcall, 2018]. Importantly, there is a trend in many of these cohorts towards collecting more longitudinal data (i.e. data over time for the same individual), which will likely increase the power to discover causal ageing mechanisms (as opposed to cross-sectional data, in which data from different individuals at different ages are used) [Rahmadi et al., 2017]. As Nobel laureate Sydney Brenner (known for establishing C. elegans as a model organism) remarked 10 years ago: "We don’t have to search for a model organism anymore.
Because we are the model organisms" [FitzGerald et al., 2018] (as a disclosure, I still believe we need model organisms to gain definitive mechanistic insights, and probably so did the late Prof. Brenner).

The ageing process is an extremely polygenic trait, probably one of the most complex phenotypes to study (as one would expect if it is composed of many different molecular processes). Candidate gene studies (hypothesis-driven) and genome-wide association studies (GWAS, unbiased) have found genetic variants that may affect the rate of ageing in humans. Many of them are associated with genes that are part of nutrient-sensing pathways (such as FOXO3 or IGF1R), that increase the risk of Alzheimer’s disease (such as APOE), that are involved in cellular senescence (such as CDKN2A) or that are related to the immune system and inflammation (such as HLA-DQA1, HLA-DRB1 or IL6) [Partridge et al., 2018; Singh et al., 2019]. Additionally, biological sex has a major impact on the ageing process and the incidence of age-related diseases. In humans, females consistently live longer than males (females make up around 90% of supercentenarians, i.e. individuals who live to 110 years or more). However, they also seem to suffer greater morbidity in later life, which is known as the ‘mortality-morbidity paradox’ [Austad and Fischer, 2016].

It is clear that human longevity has a genetic component. However, the latest estimates of its heritability are quite low (10-15%) [Kaplanis et al., 2018; Ruby et al., 2018b]. Furthermore, GWAS have yielded relatively few genetic variants compared with other complex phenotypes [Singh et al., 2019]. This could be due to the sample sizes required or to methodological limitations (such as the way that the ageing phenotype is defined).
Nevertheless, it is more likely that the environment (and its interaction with the genetic background) accounts for most of the variation in the ageing phenotype across human populations [Partridge et al., 2018; Singh et al., 2019]. As such, there is evidence that diet (not only its content but also its timing) and exercise can act through nutrient-sensing pathways to regulate human healthspan and potentially lifespan [Most et al., 2017; Partridge et al., 2018; Redman et al., 2018; Richter and Ruderman, 2009; Singh et al., 2019; Wei et al., 2017]. Interestingly, social relationships are hypothesised to have a causal role in mortality rate, with lower levels of social integration associated with higher levels of inflammation, blood pressure or waist circumference across the human lifespan [Yang et al., 2016b].

A fascinating example of the impact of environmental and lifestyle factors on human lifespan is provided by the so-called ‘blue zones’. These are geographical areas (such as Ogliastra in Sardinia, Okinawa in Japan, the Nicoya peninsula in Costa Rica and the island of Ikaria in Greece) that have unusually high proportions of long-lived individuals. However, the genetics of these populations are similar to those of their neighbours, and therefore differences in the rate of ageing are thought to be largely attributable to environmental and lifestyle factors [Partridge et al., 2018; Poulain et al., 2013]. As such, targeted lifestyle interventions will likely complement pharmacological interventions (some of which are mentioned in section 1.1.2) in order to slow down the human ageing process. Finally, epigenetic mechanisms constitute an interesting layer of biological information that could mediate the interactions between genetics and environment that affect the ageing process; they are the topic of the next sections.

1.2 Epigenetics of ageing

1.2.1 A brief introduction to epigenetics

The coining of the term epigenetics is normally attributed to Conrad H.
Waddington, who in 1942 defined it as the study of the causal mechanisms behind embryonic development (i.e. the processes by which the genotype of a single cell brings about the phenotype of an organism) [Waddington, 1942]. This led to the unification of two apparently distinct fields (genetics and embryology), today known as developmental genetics [Gilbert, 2011]. Waddington is also known for introducing the concept of the epigenetic landscape, which depicted developmental trajectories and the theory behind them in an incredibly compelling way [Waddington, 1957]. Later work by Nanney, Riggs, Holliday and others shifted the definition of epigenetics towards the concept of cellular memory, which was proposed to be implemented at the molecular level by DNA methylation (since it could affect transcription and be inherited after each cell division) [Lappalainen and Greally, 2017]. The following decades were characterised by the discovery of a great variety of ‘molecular routes’ to affect gene expression (such as chromatin modifications or non-coding RNAs), which in humans culminated in consortia such as ENCODE [Consortium et al., 2012], Roadmap Epigenomics [Consortium et al., 2015], BLUEPRINT or IHEC [Stunnenberg et al., 2016], and gave rise to a broader concept of epigenetics [Greally, 2018; Lappalainen and Greally, 2017]. There is currently an ongoing debate in the scientific community about the appropriate definition of epigenetics [Bird, 2007; Greally, 2018]. For the purpose of this thesis, I will define epigenetics as the study of molecular variation that goes beyond changes in the DNA sequence, that is inherited after cell division and that regulates gene expression (in line with the definition by Wu and Morris [Wu and Morris, 2001]). However, it is important to mention that, in the context of the epigenetic clock, we are still not sure whether these molecular changes have direct functional consequences (e.g.
by affecting RNA expression) and/or whether they help to define a new metastable state in the cells in which they occur (see section 1.3.3).

There are different types of molecular mechanisms that are normally considered ‘epigenetic’. These include:

• DNA methylation. This will be discussed in detail in section 1.2.2.

• Histone modifications. The basic unit of chromatin is the nucleosome. It is composed of ∼147 bp of DNA wrapped around an octamer of histones (generally two copies of each of the four core histones: H2A, H2B, H3 and H4; although histone variants such as H3.3 or H2A.Z have also been characterised). To fit ∼2 meters of DNA into the nucleus of a human cell, chromatin needs to be further compacted with the help of scaffold proteins (with the highest level of compaction achieved in the mitotic chromosome) [Ou et al., 2017]. Histones possess positively charged N-terminal regions (known as histone tails) that project outwards from the nucleosome. By default, this helps to compact the chromatin through interactions with the negatively charged DNA. However, many different types of post-translational modifications (acetylation, methylation, phosphorylation, ubiquitinylation, sumoylation, etc.) of residues in the histone tails have been identified across the eukaryotic tree of life (and modifications have also been found in the globular domains) [Lawrence et al., 2016]. These histone modifications can affect the chemical properties of chromatin and its degree of compaction, and ultimately contribute to the regulation of transcription (e.g. through the recruitment of downstream effector proteins). The combinations of these modifications that modulate chromatin activity were termed the histone code [Strahl and Allis, 2000], and their complexity is slowly being characterised thanks to technologies such as ChIP-seq [Consortium et al., 2015, 2012].
Finally, it is worth explaining the nomenclature used to refer to histone modifications. For instance, the name ‘H3K36me3’ specifies the histone (‘H3’), the residue where the modification occurs (‘K36’ is lysine 36) and the type and number of modifications (‘me3’ refers to three methyl groups).

• Other ‘epigenetic’ players. Non-coding RNAs (such as long non-coding RNAs, PIWI-interacting RNAs or short interfering RNAs) have been shown to affect the epigenetic landscape through different mechanisms. Additionally, many RNA modifications (collectively known as the epitranscriptome) are currently being elucidated. However, whether they should be considered truly ‘epigenetic’ is debatable [Mattick et al., 2009; Morris and Mattick, 2014]. Furthermore, prions (misfolded proteins that accumulate in cells and act as templates to misfold further protein molecules) have been proposed as an epigenetic mechanism that is not based on heritable changes in nucleic acids [Halfmann and Lindquist, 2010].

The different epigenetic marks show complex patterns of correlation and cross-talk, which are mechanistically linked to the way in which their addition and removal are regulated. This helps to define chromatin states (i.e. combinations of different epigenetic marks) that affect gene regulation in different ways. Historically, chromatin has been broadly classified into two categories [Allis and Jenuwein, 2016; Reinberg and Vales, 2018; Trojer and Reinberg, 2007]:

• Euchromatin. It is associated with active transcription and is more accessible to the transcription machinery. It is generally characterised by histone modifications such as H4K16ac, H3K4me3 or H3K36me3.

• Heterochromatin.
It is normally subdivided into constitutive heterochromatin (highly condensed and transcriptionally repressed; mostly found in pericentromeric regions, telomeres and other regions that contain repetitive elements; generally marked by H3K9me3 and high levels of 5mC) and facultative heterochromatin (normally transcriptionally silent but with the potential to adopt open conformations depending on the temporal and spatial context; generally marked by H3K27me3).

Consortia that have mapped many epigenetic marks (collectively known as the epigenome) in humans [Consortium et al., 2015, 2012] and advances in chromatin segmentation algorithms [Ernst and Kellis, 2010] have led to a more fine-grained definition of chromatin states. This has helped to identify functional elements in the genome in a high-throughput way, such as active transcription start sites (TSS, enriched in H3K4me3), enhancers (enriched in H3K4me1) or bivalent chromatin (enriched in H3K4me3 and H3K27me3) [Consortium et al., 2015, 2012].

Epigenetic marks contribute to defining (or, in Waddingtonian terms, ‘canalising’) different cell types and cellular states from the same genomic sequence. Cellular identity is normally established by master regulators (initiators), generally transcription factors that activate the expression of a genetic program (i.e. coordinated gene expression) [Reinberg and Vales, 2018]. However, for this cellular state to persist once the initiator is no longer present, the patterns of epigenetic marks need to be inherited after cell division. This is clearly the case for 5-methylcytosine (5mC, see section 1.2.2). In the case of histone modifications, there is evidence for the propagation of some repressive histone modifications (such as H3K9me3 and H3K27me3). This is possible because the machinery in charge of catalysing the addition of these chemical modifications (i.e.
the writers, SUV39H1 and Polycomb Repressive Complex 2) is also able to recognise them (i.e. the writers are also readers), thereby creating a positive feedback loop. However, it is not clear whether many other histone modifications are copied onto newly assembled chromatin after DNA replication and, therefore, whether they are truly epigenetic [Reinberg and Vales, 2018]. Additionally, it is important to mention that enzymes that reverse most (if not all) epigenetic marks (i.e. erasers) have been identified [Allis and Jenuwein, 2016].

Besides regulating transcription and/or defining cellular states, epigenetic mechanisms play a fundamental role in other important biological processes. These include genomic imprinting (monoallelic expression according to parental origin) [Peters, 2014], X-chromosome inactivation (silencing of one of the two X chromosomes in female therian mammals) [Wutz, 2011] and caste differentiation in eusocial insects (such as queen and worker differentiation in honeybees, whose lifespans differ by about 10-fold) [Patalano et al., 2012; Remolina and Hughes, 2008].

One of the big questions in the field of epigenetics is to what extent epigenetic patterns are genetically programmed and to what extent they change in response to environmental or stochastic influences. In human populations, genetic variants that affect the levels of DNA methylation (meQTLs) and histone modifications (hQTLs) at specific loci have been identified [Taudt et al., 2016]. Interestingly, it is possible to predict different epigenetic marks from the raw DNA sequence, mainly by identifying transcription factor binding sites that guide different parts of the epigenetic machinery [Whitaker et al., 2014].
Furthermore, monozygotic twins allow researchers to control for the genetic background and study the epigenetic variation derived from environmental and stochastic factors, which is particularly interesting in the context of complex diseases [Castillo-Fernandez et al., 2014]. Nevertheless, the debate is far from settled.

1.2.2 Fundamentals of DNA methylation in mammals

Different types of DNA modifications have been described across the tree of life. DNA methylation enzymes evolved in bacteria to protect them from bacteriophage infection, although roles in bacterial transcriptional regulation have also been described [Sánchez-Romero et al., 2015]. In mammals, the most common DNA modification is the addition of a methyl group to the carbon at position 5 of cytosine (5mC), which has been called the fifth base of DNA. The traditional functions assigned to 5mC include mediating genomic imprinting and X-chromosome inactivation, repressing transposable elements and regulating transcription [Wu and Zhang, 2017]. In the latter case, 5mC has commonly been associated with transcriptional repression (e.g. by altering the ability of transcription factors to bind or by attracting methyl-CpG binding domain proteins) [Li and Zhang, 2014]. However, it is becoming increasingly clear that the picture is more complex. For example, the gene bodies of highly expressed genes are methylated, which helps to prevent cryptic transcription [Neri et al., 2017]. 5mC generally occurs when the cytosine is followed by a guanine in the DNA strand (commonly known as a CG dinucleotide or CpG site) [Li and Zhang, 2014; Smith and Meissner, 2013]. In the human genome there are around 28 million CpG sites, of which approximately 60-80% are normally methylated [Smith and Meissner, 2013]. The density of CpG sites in the genome is variable.
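As a small illustration of the CpG context, the following Python sketch locates CG dinucleotides in a DNA sequence and reports their density in fixed-size windows. The helper names and the toy sequence are hypothetical and are not part of any analysis in this thesis. Because the reverse complement of CG is CG itself, every CpG on one strand faces a CpG on the other strand, which is the symmetry discussed below for maintenance methylation.

```python
# Complement table for the four bases (plus N for unknown positions).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def cpg_positions(seq):
    """0-based positions of the C in every CG dinucleotide on the given strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cpg_density(seq, window):
    """CpG sites per base in consecutive, non-overlapping windows.

    A CpG is counted only if both of its bases fall inside the window;
    a trailing partial window is ignored.
    """
    positions = cpg_positions(seq)
    densities = []
    for start in range(0, len(seq) - window + 1, window):
        end = start + window
        n_cpg = sum(1 for p in positions if start <= p and p + 1 < end)
        densities.append(n_cpg / window)
    return densities


seq = "ACGTTCGAAAAAAAAAAAAA"
print(cpg_positions(seq))                           # [1, 5]
print(cpg_density(seq, 10))                         # [0.2, 0.0]
print(len(cpg_positions(reverse_complement(seq))))  # 2: same CpGs seen from the other strand
```

Scanning real genomes would of course rely on indexed reference files rather than Python strings; the point here is only the CG-context rule and the strand symmetry.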
CpG islands (CGIs) are CpG-enriched genomic regions (200-2000 bp long; there are ∼30,000 CGIs in the human genome, which account for ∼10% of CpG sites) and are frequently associated with promoters (although ∼9,000 of them are found inside gene bodies) [Jeziorska et al., 2017; Smith and Meissner, 2013; Zeng et al., 2014]. Promoter-associated CGIs are normally unmethylated across cell types, which contrasts with the high methylation levels in the rest of the genome. The mechanism by which these CGIs remain resistant to DNA methylation is starting to be elucidated. Recent reports suggest that active transcription, together with the binding of proteins that block methylation, is required for this resistance. Among these proteins (which bind non-methylated CpG sites in the CGI via their zinc-finger CXXC domain), it is worth mentioning CFP1 (which recruits an H3K4 methyltransferase that increases H3K4me3 levels, which in turn inhibits de novo methylation) and TET1 (see below) [Takahashi et al., 2017]. Conversely, if a promoter-associated CGI is methylated, this commonly leads to transcriptional repression of the corresponding gene, something that is observed in the promoters of certain tumour suppressor genes in cancer [Flavahan et al., 2017]. Different enzymes contribute to the establishment, maintenance and removal of DNA modifications in mammals. The de novo methyltransferases DNMT3A and DNMT3B can catalyse the addition of 5mC at CpG sites that are originally unmethylated on both DNA strands. The maintenance methyltransferase DNMT1 adds 5mC to hemimethylated DNA (i.e. when only one of the strands in the CpG site carries 5mC) thanks to the symmetry of CpG sites (and its recruitment via UHRF1). This provides a mechanism for the inheritance of DNA methylation patterns after cell division, therefore making it a true epigenetic mark capable of generating cellular memory (Fig.
1.3) [Li and Zhang, 2014; Smith and Meissner, 2013], as originally hypothesised in 1975 by Holliday, Pugh and Riggs [Holliday and Pugh, 1975; Riggs, 1975]. It is worth mentioning that 5mC in a non-CpG context (i.e. in CHG or CHH, where H corresponds to adenine, thymine or cytosine) has also been detected in human tissues [Schultz et al., 2015]. However, its abundance is generally very low, with the exception of embryonic stem cells (ESCs), induced pluripotent stem cells (iPSCs) and some brain cells, probably due to the levels of DNMT3A and/or DNMT3B in these cell types [He and Ecker, 2015; Ziller et al., 2011]. A third, catalytically inactive, member of the de novo DNA methyltransferase family, DNMT3L, has also been identified in mammals. DNMT3L is a DNMT3 variant that lacks the N-terminal part of the regulatory domain and the C-terminal part of the catalytic domain [Lyko, 2017]. DNMT3L cooperates mainly with DNMT3A to establish 5mC at maternal genomic imprints during gametogenesis [Bourc’his et al., 2001; Tomida et al., 2018]. Additionally, DNMT3C was recently discovered as a de novo DNA methyltransferase in rodents, where it is responsible for methylating and silencing young retrotransposons in the male germline, which is required for mouse fertility [Barau et al., 2016]. It was a long-standing question whether the loss of 5mC (a.k.a. demethylation) could only occur through replication-coupled passive loss (i.e. preventing DNMT1 maintenance activity and diluting 5mC content through cell division), methyltransferase errors or DNA repair after DNA damage [Iurlaro et al., 2017]. In 2009, two groups conclusively identified the presence of a different type of DNA modification in mouse and human DNA, 5-hydroxymethylcytosine (5hmC) [Kriaucionis and Heintz, 2009; Tahiliani et al., 2009] (although, surprisingly, its presence in rat tissue had been detected almost 40 years earlier [Penn et al., 1972]).
Furthermore, one of them demonstrated that the enzyme TET1 is capable of oxidising 5mC to 5hmC [Tahiliani et al., 2009]. Since then, other enzymes from the TET family (TET2, TET3) have also been shown to catalyse this reaction [Ito et al., 2010]. Further oxidation products, 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), can also be generated by TET enzymes, although their abundance in the genome is extremely low. Replication-dependent dilution of oxidised products, or thymine DNA glycosylase (TDG)-mediated excision of 5fC and 5caC coupled with base excision repair (BER), have been shown to complete the demethylation process (Fig. 1.4). Altogether, this shows that active enzymatic DNA demethylation is a feature of mammalian epigenomes [Wu and Zhang, 2017]. Finally, it is worth mentioning that another type of DNA modification, N6-methyladenine, has recently been identified in both mouse and human cells, thus further expanding the DNA alphabet [Wu et al., 2016; Xiao et al., 2018].

Fig. 1.3 Establishment and maintenance of 5-methylcytosine (5mC) in mammalian genomes. Unmethylated cytosines in symmetric CpG sites are originally methylated de novo by DNA methyltransferases DNMT3A and DNMT3B to form 5mC. After cell division, the newly synthesised DNA strands lack the methylation mark. Maintenance DNA methyltransferase DNMT1 recognises this hemimethylated DNA and adds the missing methyl groups, therefore ensuring the inheritance of DNA methylation patterns and cellular memory.

Fig. 1.4 Oxidation of 5-methylcytosine (5mC) and the cycle of demethylation.
5mC can be oxidised to different DNA modifications (5hmC, 5fC, 5caC) by TET enzymes. The maintenance DNA methyltransferase DNMT1 can only recognise 5mC. As a consequence, after DNA replication, the other modifications are eventually lost (active modification–passive dilution or AM-PD). Alternatively, thymine DNA glycosylase (TDG)-mediated excision of 5fC and 5caC coupled with base excision repair (BER) can lead to the same outcome (active modification–active removal or AM-AR). This figure was adapted from [Wu and Zhang, 2017].

DNA methylation patterns change drastically during mammalian embryonic development. After fertilisation, mouse and human zygotes undergo epigenetic reprogramming in order to reset naive pluripotency. This is mainly characterised by a global loss of 5mC (i.e. DNA hypomethylation), with different demethylation processes affecting the paternal and maternal genomes. Nevertheless, some genomic regions, such as imprints, survive epigenetic reprogramming. De novo DNA methylation occurs after implantation of the blastocyst, restoring DNA methylation levels in most somatic cells and eventually generating cell-type-specific DNA methylation patterns [Atlasi and Stunnenberg, 2017; Iurlaro et al., 2017; Tang et al., 2016]. In the cells that will give rise to the germline (primordial germ cells or PGCs), further genome-wide DNA demethylation occurs, which makes PGCs the most hypomethylated cells found during mammalian development thus far (global methylation levels are ∼4%). This ensures imprint erasure and the removal of parental epigenetic memories, thereby posing a barrier to transgenerational epigenetic inheritance. Nevertheless, some regions that escape epigenetic reprogramming in mice and humans have been described (mainly evolutionarily young and potentially hazardous retrotransposons). Methylation patterns of the germline are then re-established in a sex-specific manner [Tang et al., 2016].
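The logic of Figs. 1.3 and 1.4 can be condensed into a toy model. The Python sketch below is a deliberately simplified illustration under stated assumptions: the state labels and helper names are hypothetical, DNMT1 is assumed to act with perfect fidelity, and TDG/BER removal is not modelled. Each CpG is represented as a pair of strand states; replication is semi-conservative; a DNMT1-like step copies only 5mC onto the new strand; and a bisulfite-style readout (the bisulfite procedure is described in the next paragraph) reports 5mC and 5hmC alike as C.

```python
# Hypothetical labels for the cytosine of one CpG on one DNA strand.
C, MC, HMC = "C", "5mC", "5hmC"


def replicate(dyad, maintenance=True):
    """Semi-conservative replication of one CpG dyad (Watson, Crick).

    Each daughter keeps one parental strand and receives a new, unmodified
    strand. With maintenance on, a DNMT1-like step copies 5mC (and only 5mC)
    onto the new strand, so 5hmC is not propagated (passive dilution).
    """
    watson, crick = dyad

    def new_strand(parental):
        return MC if (maintenance and parental == MC) else C

    return [(watson, new_strand(watson)), (new_strand(crick), crick)]


def divide(population, n_divisions, maintenance=True):
    """Replicate every dyad in the population n_divisions times."""
    for _ in range(n_divisions):
        population = [d for dyad in population for d in replicate(dyad, maintenance)]
    return population


def fraction(population, mark):
    """Fraction of strands carrying the given mark."""
    strands = [s for dyad in population for s in dyad]
    return sum(s == mark for s in strands) / len(strands)


def apparent_methylation(population):
    """Bisulfite-style readout: 5mC and 5hmC are read as C, unmodified C as T."""
    reads = ["C" if s in (MC, HMC) else "T" for dyad in population for s in dyad]
    return reads.count("C") / len(reads)


print(fraction(divide([(MC, MC)], 3), MC))                     # 1.0: maintained
print(fraction(divide([(MC, MC)], 3, maintenance=False), MC))  # 0.125: diluted
print(fraction(divide([(HMC, HMC)], 3), HMC))                  # 0.125: AM-PD
print(apparent_methylation([(MC, MC), (HMC, HMC)]))            # 1.0: 5hmC looks like 5mC
```

Without maintenance (or after oxidation to 5hmC), the modified fraction halves at every division, i.e. it equals 1/2^n after n divisions, which is the replication-coupled passive dilution described above.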
Over the years, many technologies have been developed to measure 5mC and its oxidation products (see section 4.1 for an overview). Many assays rely on a chemical procedure called bisulfite conversion [Frommer et al., 1992]. Genomic DNA is denatured and incubated with sodium bisulfite. This leaves 5mC residues intact, whereas unmethylated cytosines are deaminated and converted to uracil. Therefore, after PCR amplification, 5mCs are read as cytosines while unmethylated cytosines become thymines. This information can then be read at base-pair resolution through DNA sequencing or by hybridisation to a methylation array (such as the Illumina Infinium BeadChips, the platform used to generate the data analysed in this thesis) [Plongthongkum et al., 2014]. It is important to keep in mind that standard bisulfite conversion cannot distinguish 5hmC from 5mC, since both are read as cytosine [Wu and Zhang, 2017]. Furthermore, C>T mutations (a common mutation during ageing, see section 1.1.3) can be confounded with hypomethylation events. Another caveat of bisulfite treatment is that it substantially degrades DNA and generates sequencing libraries of low complexity, which leads to reduced mapping rates and higher costs. Recently, Liu et al. published a bisulfite-free protocol for 5mC sequencing at base-pair resolution, which could potentially solve some of these issues and start a new generation of bisulfite-free methods [Liu et al., 2019].

Exposure to certain environmental factors is associated with changes in the methylome that can potentially modulate disease risk. In mice, in utero undernourishment leads to weight and metabolic defects in the F1 offspring. Furthermore, this metabolic phenotype is inherited by the F2 offspring through the paternal line.
Interestingly, this could be because genomic regions that become hypomethylated during paternal germline specification survive epigenetic reprogramming in the F2 zygote and lead to further chromatin alterations in adult tissues [Radford et al., 2014]. In a different example, smoking exposure in humans changes the DNA methylation patterns of blood [Roby et al., 2016] or buccal cells [Teschendorff et al., 2015] in a consistent and reproducible manner. However, the mechanisms behind these changes, and whether they are functional or mere passenger epimutations, remain obscure. Moreover, many complex diseases, such as rheumatoid arthritis and many cancer types, are characterised by altered DNA methylation patterns (although these changes are more robust and more abundant in most cancers than in rheumatoid arthritis). This suggests that epigenetic mechanisms integrate genetic and environmental aetiologies of disease [Liu et al., 2013; Widschwendter et al., 2018]. Nevertheless, many environmental factors affect the biology of the organism through non-epigenetic mechanisms. For example, exposure to certain environmental factors, such as UV light or polycyclic aromatic hydrocarbons (PAHs), can lead to specific DNA mutational signatures and increase the risk of developing certain types of human cancer [Kucab et al., 2019]. Furthermore, many environmental adaptations, such as those that regulate responses to temperature and light [Narasimamurthy and Virshup, 2017] or nutrient availability (see section 1.1.2), rely on non-epigenetic molecular mechanisms.

1.2.3 Links between the epigenetic machinery and ageing

As previously mentioned in section 1.1.3, epigenetic alterations are one of the hallmarks of mammalian ageing [Lopez-Otin et al., 2013].
Given the role of genetic pathways and environmental factors in regulating organismal lifespan, the epigenetic layer of biological information has attracted a lot of interest in the ageing field (to the point that some authors have suggested that it is the hub that connects all the hallmarks of ageing) [Booth and Brunet, 2016]. Indeed, many life-extending interventions (such as dietary restriction, exercise or a robust circadian rhythm) modulate the epigenetic machinery and induce chromatin changes [Benayoun et al., 2015]. Furthermore, since many epigenetic marks are stable over time, they could act as cellular memories that store past environmental exposures. In view of the vast literature available on this topic, in this section I will try to extract the pieces of information that are most relevant to this work.

Several authors have reviewed the wide variety of chromatin changes that occur during ageing in different organisms [Benayoun et al., 2015; Booth and Brunet, 2016; Pal and Tyler, 2016; Sen et al., 2016]. These include changes in histone numbers, histone variants, histone modifications, DNA modifications, non-coding RNAs and nucleosome positioning, which eventually lead to transcriptional deregulation. Certain mutations in proteins of the epigenetic machinery affect the lifespan of organisms from yeast to mouse, thus demonstrating a causal role for some of these changes in the ageing process. I highlight a few interesting insights from these studies below:
• Global heterochromatin loss and redistribution has been suggested as one of the mechanisms behind the ageing process [Tsurumi and Li, 2012; Villeponteau, 1997]. Indeed, mutations in proteins that cause premature ageing in humans (such as nuclear lamins or the WRN helicase) have a major impact on heterochromatin structure and on the genomic distribution of its characteristic repressive chromatin marks (such as H3K9me3 or H3K27me3) [Zhang et al., 2015a].
Cellular senescence is also associated with the remodelling of heterochromatin [Zhang et al., 2007]. Furthermore, heterochromatin deregulation can lead to the activation and mobilisation of transposable elements during mammalian ageing [De Cecco et al., 2013].
• Mutations that alter the levels of H3K27me3 and H3K4me3 can have contradictory effects on the lifespan of model organisms, probably depending on the loci and cell types that they affect. However, mutations that increase the levels of H3K36me3 appear to consistently increase lifespan (at least in yeast, worm and fly) [Benayoun et al., 2015; Booth and Brunet, 2016; Pal and Tyler, 2016; Sen et al., 2016]. This will be of interest in Chapter 3.
• Increased levels of SIRT6, an H3K9ac and H3K56ac histone deacetylase from the sirtuin family, can extend the lifespan of male mice [Kanfi et al., 2012]. Conversely, SIRT6-deficient mice die at about 4 weeks of age, have a progeroid phenotype and show increased genomic instability (due to defects in the base excision repair pathway) [Mostoslavsky et al., 2006]. The role of SIRT6 in human ageing is still not clear.
• The histone chaperone ASF1, which promotes histone deposition and stability, is required for normal replicative lifespan in yeast [Feser et al., 2010]. Intriguingly, the mouse ortholog ASF1A is important for resolving bivalent chromatin upon differentiation of embryonic stem cells [Gao et al., 2018] (see the discussion regarding the importance of bivalent domains during mammalian ageing in the ‘Hypermethylated regions during ageing’ item below).
• The naked mole rat, an exceptionally long-lived rodent with very low cancer incidence, presents a stable epigenome that is resistant to in vitro reprogramming. Furthermore, higher levels of repressive chromatin marks (such as H3K27me3) are observed relative to the mouse [Tan et al., 2017].

Importantly, the DNA methylation landscape also seems to be affected during ageing in mammals.
Certain CpG sites or genomic regions gain methylation with age (i.e. they become hypermethylated), while other sites lose methylation (i.e. they become hypomethylated). Furthermore, some of these age-associated methylation changes are shared across tissues, while others are tissue-specific. Notably, even though these changes have a stochastic component, the genomic context in which they occur seems to be conserved in mice [Avrahami et al., 2015; Cole et al., 2017b; Maegawa et al., 2010; Sziráki et al., 2018; Wang et al., 2017] and humans [Day et al., 2013; Dozmorov, 2015; Fernández et al., 2015; Heyn et al., 2012; Horvath et al., 2012; Raddatz et al., 2013; Rakyan et al., 2010; Slieker et al., 2018, 2016; Teschendorff et al., 2010; Weidner et al., 2014; Yuan et al., 2015; Zhu et al., 2018]:
• Hypermethylated regions during ageing. They are generally enriched for bivalent chromatin, regions repressed by PRC2 (Polycomb Repressive Complex 2) and CpG islands (CGIs, many of which overlap with bivalent promoters). Bivalent domains are populated with numerous transcription factor binding sites and are marked simultaneously by the histone marks H3K27me3 (established by EZH2, which is part of the PRC2 complex; associated with transcriptional repression) and H3K4me3 (established by Trithorax-group proteins; associated with transcriptional activation). In a majority of bivalent domains, the two histone marks seem to co-occur at the same loci in the same cell (as opposed to a heterogeneous population of cells carrying different histone marks), sometimes even on different histone copies of the same nucleosome [Voigt et al., 2013]. This opposing duality is thought to silence developmental genes in embryonic stem cells (and pluripotent stem cells in the embryo) while keeping them poised for activation (by developmental and/or environmental cues) [Voigt et al., 2013].
Developmental genes (many of them lowly expressed transcription factors) are indeed highly enriched in these regions, and this seems to be a feature of most gene ontology analyses performed on CpGs that become hypermethylated during ageing. Many bivalent domains disappear after differentiation, leaving only one of the two marks [Bernstein et al., 2006], but specific non-pluripotent bivalent domains can also be generated after differentiation [Voigt et al., 2013]. Besides differentiation, the physiological ageing process also seems to change the landscape of bivalent domains, as observed in mouse haematopoietic stem cells (HSCs), where around 335 bivalent domains disappear in old HSCs whereas 1,245 emerge [Sun et al., 2014a]. This process is apparently linked to the proliferative history of HSCs [Beerman et al., 2013] and could contribute to the myeloid skewing observed during ageing [Beerman et al., 2013; Sun et al., 2014a]. Interestingly, bivalent domain losses also occur in cancer cells, which seems to correlate with the hypermethylation of these regions [Bernhart et al., 2016]. It is possible that ageing- or cancer-related hypermethylation destroys the ability to create a bivalent equilibrium in these regions. If this happens in stem cells, it could impair adequate differentiation and propagate the methylation change across the tissue. Overall, this provides an interesting mechanistic link between embryonic development, lineage-specific cellular identity and the ageing process that should be further explored.
• Hypomethylated regions during ageing. They are generally enriched for tissue-specific enhancers (generally marked with H3K4me1) and depleted for CGIs (as expected, given the low methylation levels of CGIs). For example, in wild-type mouse liver, 8,230 liver-specific enhancers are hypomethylated during ageing.
By contrast, only 4,702 of those enhancers suffer the same fate in Ames dwarf mice (which have decreased insulin/IGF-1 signalling and a longer lifespan), suggesting that the epigenome of Ames dwarf mice is more stable [Cole et al., 2017b]. DNA methylation patterns in enhancers are likely regulated by the balance between de novo DNA methyltransferases and TET enzymes. For example, in human epidermal stem cells, DNMT3A and DNMT3B associate with the most active enhancers in an H3K36me3-dependent way and, together with TET2, regulate enhancer DNA methylation levels and function [Rinaldi et al., 2016]. Furthermore, loss of DNMT3A drives enhancer hypomethylation in both mouse and human leukaemia models [Yang et al., 2016a]. Conversely, deletion of TET2 causes extensive loss of 5hmC at enhancers in mouse ESCs, accompanied by enhancer hypermethylation [Hon et al., 2014]. Therefore, the enhancer-specific hypomethylation observed during ageing could be a consequence of age-related changes in the expression or activity of DNA methyltransferases and TET enzymes, which have been reported in both mice and humans [Armstrong et al., 2013; Ciccarone et al., 2016; Gontier et al., 2018; Truong et al., 2015].

Importantly, some of these ageing-associated DNA methylation changes also seem to happen in dogs and wolves [Thompson et al., 2017]. Furthermore, the rate of change of many of these age-associated regions is negatively correlated with lifespan across six different mammals [Lowe et al., 2018]. Altogether, this suggests that conserved epigenetic mechanisms may operate during ageing to shape the mammalian methylome.

Hence, it is clear that the epigenome is eroded over time. In humans, this inter-individual divergence of DNA methylation patterns with age has been termed ‘epigenetic drift’ [West et al., 2013].
Interestingly, this phenomenon is found even in monozygotic twins [Fraga et al., 2005; Talens et al., 2012], again highlighting the role of environmental and stochastic factors. It is also observed at the single-cell level, where cells from old organisms become more heterogeneous at the epigenomic and transcriptomic levels [Hernando-Herraez et al., 2018; Martinez-Jimenez et al., 2017].

1.3 The epigenetic ageing clock

1.3.1 Measuring the ageing process

In order to study any phenomenon, one needs to be able to measure it. Using survival curves (a.k.a. lifespan curves, i.e. plotting the surviving fraction over time, see equation 1.1), we have been able to quantify the ageing process at the population level (where the assumption is that life extension in a significant proportion of the population is a surrogate marker of slowed ageing) [Johnson, 2013]. The adoption of this methodology in model organisms (which we can manipulate genetically and/or pharmacologically) led to the discovery of the first genes impacting upon the ageing process (i.e. the mutants showed ‘shifts’ of the survival curve when compared with a control). Since then, this has been the main paradigm in ageing research, with efforts being made to automate the process and increase its throughput [Stroustrup et al., 2013]. Nevertheless, measuring the ageing process at the level of the individual organism has proven more difficult. Due to environmental and stochastic factors, there are significant differences in lifespan even among isogenic organisms. Therefore, there is a real need to develop accurate biomarkers of ageing, i.e. measurements of ‘age-related change(s) in body function(s) or composition that can predict the future onset of age-related disease(s) and/or the residual lifetime left (i.e. predict the rate of ageing) more accurately than chronological age’ [Bürkle et al., 2015].
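The survival-curve comparison described at the start of this subsection can be sketched in a few lines of Python. This is a minimal illustration that assumes every death in the cohort is observed (no censoring), so the surviving fraction just after time t is the number of individuals still alive divided by the initial cohort size; it is not necessarily the exact form of equation 1.1, and the lifespans and helper names are hypothetical.

```python
def survival_curve(lifespans):
    """Surviving fraction S(t) = N(t) / N(0) just after each observed death time t.

    Assumes every death is observed (no censored individuals).
    """
    n0 = len(lifespans)
    return [(t, sum(1 for x in lifespans if x > t) / n0) for t in sorted(set(lifespans))]


def median_lifespan(lifespans):
    """Earliest death time at which the surviving fraction falls to 0.5 or below."""
    for t, surviving in survival_curve(lifespans):
        if surviving <= 0.5:
            return t


# Hypothetical lifespans (e.g. in days) for a control and a long-lived mutant strain.
control = [10, 12, 14, 16]
mutant = [14, 16, 18, 20]
print(survival_curve(control))                              # [(10, 0.75), (12, 0.5), (14, 0.25), (16, 0.0)]
print(median_lifespan(mutant) - median_lifespan(control))   # 4: the mutant curve is shifted right
```

Real lifespan experiments usually include censored individuals (e.g. animals lost during the assay), in which case an estimator such as Kaplan-Meier replaces the simple ratio used here.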
Furthermore, according to the American Federation for Aging Research, any valid biomarker of ageing must also monitor a basic (sub)process underlying ageing, must be testable repeatedly without harming the organism (i.e. it has the potential to become a longitudinal biomarker) and must be reproducible in both humans and laboratory animals (such as mice) [Bürkle et al., 2015]. The derivation of a biomarker of ageing leads to the definition of two types of age:
• Chronological age. The time elapsed since the birth of an individual.
• Biological age. The value derived from a specific biomarker. Each biomarker is trained using a set of biological parameters (independent variables) to predict a dependent variable (e.g. chronological age) that captures the probability of dying at a given time. The training takes place using several individuals, ideally from multiple populations. Afterwards, given a new individual and their biological parameters, the biological age can be predicted (and it should capture the risk of death more accurately than chronological age). Younger biological ages should be linked to high fitness and health, whereas older biological ages should correlate with age-related disease onset and morbidity [Benayoun et al., 2015].

For example, if chronological age is used as the dependent variable (which is the case for most biomarkers), the biological age of an individual represents the chronological age of the average population that is most similar to the individual (according to the set of biological parameters). In this case, if the biological age of an individual is smaller than their chronological age, this could be interpreted as their probability of death being smaller than that of the average population (i.e. the individual's rate of ageing is potentially slower than average).
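The definition of biological age above can be illustrated with the simplest possible biomarker: a linear model, fitted by ordinary least squares, that maps a single biological parameter to chronological age in a training cohort. This sketch assumes NumPy, a noise-free toy cohort and a hypothetical marker; real biomarkers use many parameters and regularised models (see section 1.3.2).

```python
import numpy as np


def fit_age_model(params, chron_age):
    """Ordinary least squares: chronological age ~ intercept + biological parameters."""
    design = np.column_stack([np.ones(len(params)), params])
    coef, *_ = np.linalg.lstsq(design, chron_age, rcond=None)
    return coef


def predict_biological_age(coef, params):
    """Apply the fitted model to new individuals."""
    design = np.column_stack([np.ones(len(params)), params])
    return design @ coef


# Hypothetical training cohort: one marker that rises linearly with age.
train_marker = np.array([0.2, 0.4, 0.6, 0.8])
train_age = np.array([20.0, 40.0, 60.0, 80.0])
coef = fit_age_model(train_marker, train_age)

# Two new 50-year-olds whose marker values differ.
new_marker = np.array([0.45, 0.55])
new_chron_age = np.array([50.0, 50.0])
bio_age = predict_biological_age(coef, new_marker)
deviation = bio_age - new_chron_age   # negative: 'younger' than the average 50-year-old
print(bio_age, deviation)             # approximately [45, 55] and [-5, 5]
```

The first individual resembles the average 45-year-old of the training cohort, so under the interpretation given above their rate of ageing would potentially be slower than average; the second shows the opposite pattern.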
In the case of humans, the initial ageing biomarkers included traditional biological parameters such as body mass index, waist and hip circumference, blood pressure or heart rate. Over the years, biomarkers that use molecular parameters have also been developed; these include clinical chemistry parameters (such as cholesterol, immunoglobulins or fasting glucose), telomere length and ‘omics’-based measurements [Bürkle et al., 2015; Jylhävä et al., 2017]. In the latter category, almost every layer of biological information can be used to derive a biomarker, including epigenomics (see next section), transcriptomics [Peters et al., 2015], proteomics [Tanaka et al., 2018], metabolomics [Hertel et al., 2016], the microbiome [Galkin et al., 2018] and even brain neuroimaging data [Cole et al., 2017a]. Furthermore, composite biomarkers (which combine biological parameters from molecular layers with measurements of physiological function) [Khan et al., 2017] and algorithmic innovations (such as deep neural networks) [Putin et al., 2016] will likely improve the predictions. Biomarkers of the human ageing process will serve as personalised risk indicators and will allow the response to interventions to be monitored, thereby creating endpoints for clinical trials that target the ageing process.

1.3.2 The landscape of epigenetic clocks

Epigenetic clocks are mathematical models that predict the biological age of an organism using DNA methylation data. These models exploit the fact that DNA methylation patterns change robustly with age in different tissues and species, as summarised in section 1.2.3. Epigenetic clocks have emerged in the last few years as the most accurate molecular biomarkers of the ageing process in humans, and they can track it across the entire lifespan.
As a quick comparison, telomere length (another popular ageing biomarker) achieves a Pearson correlation coefficient with chronological age of ∼-0.5 in blood leukocytes in the best-case scenarios (with many studies reporting much lower values and contradictory results) [Newman and Sanders, 2013]. In contrast, the coefficients for Hannum’s and Horvath’s epigenetic clocks (discussed later) are generally above ∼0.8 (in virtually all studies assessed) [Chen et al., 2016a].

The idea that DNA methylation patterns behave in a clock-like manner during cellular ageing had already been proposed in 1975 [Holliday and Pugh, 1975]. With the advent of high-throughput DNA methylation technologies, some authors started to test the ability of DNA methylation patterns to predict chronological age in humans. In 2010, Bork et al. showed that DNA methylation values change at specific CpG sites upon long-term culture and between young and old individuals in mesenchymal stromal cells [Bork et al., 2010]. Later that same year, studies by Teschendorff et al. [2010], Rakyan et al. [2010], Grönniger et al. [2010] and others identified sets of CpG sites (signatures) that consistently altered their methylation states with age in different tissues and cell types (and, interestingly, some of them seemed to occur in the same genomic context). In 2011, Bocklandt et al. demonstrated that it was possible to predict chronological age in saliva with an average error of 5.2 years using the DNA methylation values of only two CpG sites [Bocklandt et al., 2011]. Shortly afterwards, Koch and Wagner built what was probably the first multi-tissue predictor of chronological age in humans (which used the same 5 CpG sites across different cell types) [Koch and Wagner, 2011]. The potential of epigenetic clocks as biomarkers of human ageing was probably realised after the publication, in 2013, of the models by Hannum et al. [2013] and Horvath [2013a] (Table 1.1).
Since then, these epigenetic clocks have been validated in a large number of independent cohorts and have become, de facto, the default human epigenetic clocks for blood and multi-tissue predictions, respectively. Importantly, this inspired other groups to build epigenetic clocks for the mouse [Meer et al., 2018; Petkovich et al., 2017; Stubbs et al., 2017; Thompson et al., 2018; Wang et al., 2017], dogs and wolves [Thompson et al., 2017] and even humpback whales [Polanowski et al., 2014], which will be instrumental in broadening our understanding of the biology of ageing in mammals [Stubbs et al., 2017]. A comparison of some of these epigenetic clocks can be found in Table 1.1. The accuracy that they achieve with a relatively small number of CpG sites as covariates is remarkable. The predictions from epigenetic clocks are normally referred to as epigenetic age (which is equivalent to the concept of biological age explained above). Interestingly, deviations of epigenetic age from chronological age (a.k.a. epigenetic age acceleration or EAA) have been associated with many conditions in humans, including time-to-death [Chen et al., 2016a; Marioni et al., 2015], HIV infection [Horvath and Levine, 2015], Down syndrome [Horvath et al., 2015a], obesity [Horvath et al., 2014], menopause [Levine et al., 2016], breast-cancer risk in women [Kresovich et al., 2019], Werner syndrome [Maierhofer et al., 2017] and Huntington’s disease [Horvath et al., 2016b], among others (reviewed in Horvath and Raj [2018]). Moreover, females and people of Hispanic ethnicity have lower EAA (after correcting for blood cell composition effects) than males and people of Caucasian origin, respectively, highlighting a role for biological sex and genetic background in the rate of the epigenetic ageing clock [Horvath et al., 2016a].
In mice, the epigenetic clock is slowed down by dwarfism and calorie restriction [Cole et al., 2017b; Meer et al., 2018; Petkovich et al., 2017; Thompson et al., 2018; Wang et al., 2017] and is accelerated by ovariectomy and high-fat diet [Petkovich et al., 2017; Stubbs et al., 2017; Thompson et al., 2018; Wang et al., 2017]. Recently, other epigenetic clocks have been created for slightly different purposes. For example, Yang and colleagues developed an epigenetic clock that can track the rate of (stem) cell divisions in normal and cancerous tissue (see section 2.3.2) [Yang et al., 2016c]. Furthermore, an epigenetic clock that performs well in skin and other cell types (such as fibroblasts, buccal cells and endothelial cells), known as the skin-blood clock, was developed to improve ex vivo studies and forensic applications [Horvath et al., 2018]. Moreover, this epigenetic clock enables the detection of EAA in Hutchinson-Gilford progeria, which is not possible with Horvath's clock [Horvath et al., 2018]. Additionally, other epigenetic clocks have been trained to predict more complex dependent variables than chronological age. Levine et al. built a model, known as PhenoAge, that predicts a combination of chronological age and clinically relevant variables (such as red blood cell distribution width or serum glucose) [Levine et al., 2018]. Lu et al. built a model, known as GrimAge, that predicts a composite variable mixing information from smoking pack-years and plasma proteins (adrenomedullin, C-reactive protein, plasminogen activator inhibitor 1 and growth differentiation factor 15) [Lu et al., 2019]. These models perform better than previous epigenetic clocks in predicting the onset of several age-related diseases, and therefore they will likely be useful in a clinical context.

Table 1.1 Comparison of some of the epigenetic clocks available for different species. RRBS: reduced representation bisulfite sequencing (see Chapter 4).

| Species | Human | Human | Mouse | Dog and wolf |
| --- | --- | --- | --- | --- |
| Main reference | Hannum et al. [2013] | Horvath [2013a] | Thompson et al. [2018] | Thompson et al. [2017] |
| DNA methylation technology | Illumina methylation array (450K) | Illumina methylation array (27K and 450K) | RRBS | RRBS |
| N samples (in training set) | 482 | 3931 | 893 | 108 |
| Tissues (in training set) | Blood | Multi-tissue (18) | Multi-tissue (10) | Blood |
| Age range (in training set) | 19-101 years | 0-100 years | 0.2-32.2 months | 0.5-8 years |
| Number of CpGs in the final model | 71 | 353 | 529 | 115 |
| Median absolute error (MAE) | 4.9 years | 3.6 years | 2.5 months | 0.8 years |
| MAE / max. age in model × 100 | 4.85% | 3.6% | 7.76% | 10.0% |

From a statistical point of view, most epigenetic clocks have been built using linear regression (see section 2.4). A model is trained to predict the dependent variable (normally chronological age) using the methylation values of different cytosines (generally in a CpG context) as covariates. The number of covariates is normally several orders of magnitude larger than the number of samples available for training. Regularisation therefore needs to be performed: the linear regression coefficients are 'shrunk', and many of them become exactly zero. More specifically, elastic net regression (a combination of lasso and ridge regularisation) has been successfully applied [Friedman et al., 2010]. Many epigenetic clocks with similar performance can be built from different sets of CpG sites (i.e. the construction of epigenetic clocks is highly statistically degenerate) [Thompson et al., 2018]. Therefore, it is important to understand that the CpG sites that constitute an epigenetic clock are not necessarily the most biologically important ones. Rather, they probably provide a lower-dimensional representation of the main processes that shape the epigenome with age.
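The regularisation step can be made concrete with a small elastic net fitted by cyclic coordinate descent. The objective is the glmnet-style one, $\frac{1}{2n}\lVert y - b_0 - X b\rVert_2^2 + \lambda\left[\alpha\lVert b\rVert_1 + \tfrac{1-\alpha}{2}\lVert b\rVert_2^2\right]$, applied to standardised probes. This is a minimal sketch on simulated data, not the glmnet implementation [Friedman et al., 2010] behind the published clocks. The function names, the penalty values (`lam`, `alpha`) and the simulated 'probes' are all hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def fit_elastic_net(X, y, lam, alpha=0.5, max_iter=1000, tol=1e-6):
    """Elastic net by cyclic coordinate descent on standardised columns of X.

    Minimises (1/2n)||y - b0 - Xs b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)
    and returns (intercept, coefficients) on the original beta-value scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, x_sd = X.mean(axis=0), X.std(axis=0)
    x_sd[x_sd == 0] = 1.0                    # constant probes: avoid dividing by zero
    Xs = (X - x_mean) / x_sd                 # each probe now has mean 0, variance 1
    yc = y - y.mean()                        # centring removes the (unpenalised) intercept
    col_sq = (Xs ** 2).sum(axis=0) / n       # 1 for varying probes, 0 for constant ones
    b = np.zeros(p)
    resid = yc.copy()                        # invariant: resid == yc - Xs @ b
    for _ in range(max_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                     # a constant probe keeps a zero coefficient
            old = b[j]
            # inner product of probe j with the partial residual (probe j added back)
            z = Xs[:, j] @ resid / n + col_sq[j] * old
            b[j] = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = b[j] - old
            if delta != 0.0:
                resid -= Xs[:, j] * delta
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    coef = b / x_sd                          # undo the standardisation
    intercept = y.mean() - x_mean @ coef
    return intercept, coef

# Simulated data (hypothetical): 100 samples x 200 probes, where only the first
# five probes gain 0.005 beta-value units per year of age; the rest are noise.
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=100)
X = rng.normal(0.5, 0.01, size=(100, 200))
X[:, :5] += 0.005 * (age[:, None] - 50)
b0, coef = fit_elastic_net(X, age, lam=1.0, alpha=0.5)
pred = b0 + X @ coef
print(np.count_nonzero(coef), np.median(np.abs(pred - age)))
```

With these settings, the fit should keep all five age-associated probes and set most noise probes to exactly zero. The five informative probes are nearly collinear, so the ridge part of the penalty spreads the weight across them rather than picking one arbitrarily. This hints at why many different CpG sets can yield clocks of similar accuracy.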
1.3.3 Molecular mechanisms of the epigenetic ageing clock

At this point it is probably useful to clarify a few concepts that I will refer to throughout this work. I define the epigenetic ageing clock as the biological mechanisms that give rise to the genome-wide epigenetic changes that occur during ageing (in a given species). This definition is in line with the one proposed by Horvath and Raj [2018]. These changes have been widely studied in the context of DNA methylation and can be used to train predictors of chronological age (or of other, more complex variables). These predictors constitute different types of epigenetic clocks, and I will try to refer to each of them by the specific model being mentioned (e.g. Horvath's epigenetic clock, Hannum's epigenetic clock, etc.). As such, specific epigenetic clocks capture the changes associated with the underlying epigenetic ageing clock.

The molecular mechanisms that control the rate of the epigenetic ageing clock remain largely unknown [Field et al., 2018; Horvath and Raj, 2018]. Steve Horvath proposed that his multi-tissue epigenetic clock captures the workings of an epigenetic maintenance system, although the molecular nature of this hypothetical system remains unknown to this date [Horvath, 2013a]. Furthermore, we still do not know whether these changes are functional at all or whether they are just downstream consequences of other molecular processes that drive ageing.

As mentioned in section 1.2.3, many studies have characterised changes in DNA methylation patterns during mammalian ageing, some of which seem to be evolutionarily conserved [Horvath, 2013a; Lowe et al., 2018]. Interestingly, changes that involve a gain in methylation during ageing seem to be more conserved across tissues, whilst changes involving hypomethylation are generally more tissue-specific [Horvath, 2013a; Yang et al., 2016c].
Furthermore, many of these changes occur in regions normally occupied by Polycomb Repressive Complex 2 (PRC2), which are marked by the repressive histone modification H3K27me3. Therefore, it is likely that disruptions of H3K27me3 domains (which are generally inherited after cell division) play a role in epigenetic ageing. Bivalent promoters (which are marked by both H3K27me3 and H3K4me3) are a specific instance: they tend to gain methylation with age (see section 1.2.3). These signals are captured by most epigenetic clocks trained to predict chronological age [Horvath and Raj, 2018].

The mere existence of multi-tissue epigenetic clocks supports the idea that some of the mechanisms behind the epigenetic ageing clock are shared across tissues. Furthermore, Hannum's epigenetic clock (trained exclusively in blood) explains 72% of the variation in chronological age across other tissues (such as breast, kidney, lung and skin), although there is generally a tissue-specific offset [Hannum et al., 2013]. Interestingly, Horvath's epigenetic clock (which is multi-tissue) reports positive epigenetic age acceleration in breast tissue [Sehl et al., 2017]. In contrast, the cerebellum looks younger than expected [Horvath et al., 2015b], and some tissues are poorly calibrated (uterine endometrium, dermal fibroblasts, skeletal muscle and heart) [Horvath, 2013a]. Moreover, Horvath's epigenetic clock dramatically underestimates epigenetic age in sperm [Horvath, 2013a], which highlights differences between somatic cells and the germline. Altogether, this raises the possibility that some of the mechanisms behind the epigenetic ageing clock are shared across tissues but operate at different rates (e.g. because of different exposure to hormones, differences in proliferation rate, etc.). Horvath's epigenetic clock works in primary tissues and cell types, and also in vitro (both in cell culture and in organoids) [Horvath, 2013a; Hoshino et al., 2019].
Furthermore, recipients of allogeneic hematopoietic stem cell transplantation show an epigenetic age in their blood that corresponds to the age of the donor, even 17 years after the transplantation took place [Søraas et al., 2019]. This suggests that the epigenetic ageing clock is a stable cell-intrinsic property, as opposed to one that is highly influenced by the systemic environment (such as the effects observed in heterochronic parabiosis experiments) [Conboy et al., 2005]. This stability is further demonstrated by experiments showing that human fibroblasts that have been reprogrammed into neurons maintain their original epigenetic age [Huh et al., 2016]. Moreover, aneuploid mice carrying a complete copy of human chromosome 21 accumulate DNA methylation changes during ageing far more rapidly than seen in human tissues. This suggests that the epigenetic ageing clock is a molecular readout of the ageing cellular milieu [Lowe et al., 2018]. Epigenetic age acceleration (EAA) has been proposed as a way to capture the ageing phenotype in genome-wide association studies (GWAS). Genetic variants associated with EAA have been found in TERT, which encodes the catalytic subunit of telomerase [Lu et al., 2018]. Epigenetic age increases in vitro with cell passage, but TERT expression is required for it to keep increasing linearly after a certain number of passages [Lu et al., 2018]. This suggests that bypassing replicative senescence is required for the epigenetic ageing clock to keep ticking, at least in vitro. Interestingly, inducing senescence in TERT-immortalised cells via an oncogene makes the cells age faster in culture, but inducing senescence via DNA damage does not increase epigenetic age [Lowe et al., 2016]. Overall, this could imply that the epigenome of senescent cells does not contribute substantially to the changes captured by the epigenetic ageing clock.
Furthermore, it has been proposed that epigenetic ageing could serve a role complementary to that of senescence, by suppressing potential cancer development (e.g. by protecting against dedifferentiation signals) [Horvath and Raj, 2018]. The molecular connections between cell division, alternative non-telomeric functions of TERT and the epigenetic ageing clock need to be further studied. Moreover, these experiments do not rule out an indirect effect of senescent cells on the epigenetic ageing clock in vivo (e.g. via the SASP, by inducing changes in the epigenomes of other cells in the tissue). The rate of the epigenetic ageing clock is substantially faster during post-natal organismal growth (something that Horvath's model accounts for) [Horvath, 2013a], which could be related to the high levels of TERT expression during this period [Lu et al., 2018]. Interestingly, epigenetic ageing according to Horvath's epigenetic clock seems to start a few weeks post-conception in fetal tissues, which is not the case for other epigenetic clocks such as Hannum's clock, the skin-blood clock, PhenoAge or GrimAge [Hoshino et al., 2019]. This could imply that the molecular processes responsible for mammalian epigenetic ageing are already operative during pre-natal development, potentially with different consequences. This molecular continuum between development and ageing is further reinforced by the fact that embryonic stem cells have an epigenetic age of around zero [Horvath, 2013a]. Notably, in vitro reprogramming of somatic cells into induced pluripotent stem cells (iPSCs) also reduces epigenetic age to values close to zero (or even negative), both in humans [Horvath, 2013a] and in mice [Meer et al., 2018; Petkovich et al., 2017]. Moreover, the induction of in vivo partial reprogramming (short and cyclic exposure to reprogramming factors) in progeroid mice ameliorates several ageing phenotypes and extends lifespan [Ocampo et al., 2016].
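Horvath's model accounts for the faster ticking during post-natal growth by regressing methylation on a transformed age rather than on age itself. The sketch below implements the log-linear transformation as I understand it from Horvath [2013a], with an assumed adult age of 20 years. Both the function names and the constant should be treated as assumptions to check against that paper.

```python
import math

ADULT_AGE = 20.0  # assumed switch point (years) between the log and linear regimes

def transform_age(age, adult_age=ADULT_AGE):
    """Log-linear age transformation: logarithmic before adulthood, linear after."""
    if age <= adult_age:
        return math.log(age + 1) - math.log(adult_age + 1)
    return (age - adult_age) / (adult_age + 1)

def inverse_transform_age(f, adult_age=ADULT_AGE):
    """Map a prediction on the transformed scale back to an age in years."""
    if f <= 0:
        return (adult_age + 1) * math.exp(f) - 1
    return (adult_age + 1) * f + adult_age

# One year of early childhood spans many more transformed units than one adult
# year, so a clock trained on this scale can "tick" faster during growth.
early = transform_age(1) - transform_age(0)    # log(2), about 0.693
adult = transform_age(41) - transform_age(40)  # 1/21, about 0.048
print(early / adult)  # about 14.6
```

Both pieces meet at the adult age with the same value (0) and the same slope (1/21), so the transformation is smooth. A penalised regression model would be fitted to `transform_age(age)`, and its predictions mapped back to years with `inverse_transform_age`.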
We are currently testing whether a similar partial reprogramming protocol applied to physiologically aged mice can reduce epigenetic age. The reduction of epigenetic age upon reprogramming is of extreme importance, since it shows that the epigenetic changes associated with the epigenetic ageing clock are reversible. This opens the door to further mechanistic studies and to the development of rejuvenation therapies [Mahmoudi et al., 2019; Olova et al., 2019; Rando and Chang, 2012; Sarkar et al., 2019].

The goal of this thesis is to improve our understanding of the epigenetic ageing clock in humans. For this purpose, I will first review statistical methods to quantify epigenetic ageing in human blood (Chapter 2). Then, I will study how different proteins of the epigenetic machinery affect the rate of the epigenetic ageing clock (Chapter 3). Next, I will discuss a technological improvement with the potential to make future epigenetic clocks more cost-effective (Chapter 4). Finally, I will propose future avenues that should be explored in order to unravel the molecular mechanisms of the epigenetic ageing clock (Chapter 5).

Chapter 2
Statistical aspects

'I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.'
W. Thomson [1889]

2.1 Analysing the blood methylome to study human ageing

2.1.1 Building a DNA methylation dataset from public data

In recent years, large amounts of DNA methylation data have been generated to study complex diseases and ageing [Flanagan, 2015; Rakyan et al., 2011]. Many of these datasets can be obtained from public repositories, such as the NCBI-hosted Gene Expression Omnibus (GEO) [Edgar et al., 2002].
Given its clinical accessibility and ease of collection, blood is one of the most commonly profiled tissues in human DNA methylation studies [Flanagan, 2015], including published studies on developmental disorders [Aref-Eshghi et al., 2018b] (see Chapter 3). Therefore, I decided to use blood as my surrogate tissue to broaden our understanding of the human epigenetic ageing clock.

Furthermore, most of these human datasets have been generated using different versions of the Illumina Infinium array technology, with the Illumina Infinium HumanMethylation450 array (450K) being the most frequently used platform [Flanagan, 2015]. Given that the different array versions have different chemistries, biases and numbers of probes [Bibikova et al., 2011, 2009; Pidsley et al., 2016], I decided to focus on 450K data for my analyses.

Using the GEOquery R package [Davis and Meltzer, 2007], I programmatically downloaded from GEO all the DNA methylation data from human blood that I could find, including samples from both whole blood and peripheral blood mononuclear cells (PBMC). Furthermore, the data also had to satisfy the following criteria:
• Raw DNA methylation data were available (i.e. IDAT files). This was required so that the pre-processing pipeline and the batch effect correction (which requires access to control probe intensities, see section 2.2.3) could be consistently applied across all the samples in the study.
• Metadata for the samples were available, with the chronological age as a minimum requirement.
• In order to study physiological ageing, the blood samples were collected from individuals without prior disease diagnoses. However, it is important to mention that I could never be completely certain of this, since a disease could have been undiagnosed and/or not reported in the metadata.
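The inclusion criteria above amount to a simple per-sample filter over the sample metadata. The sketch below expresses them in Python. The `SampleRecord` fields, the tissue labels and the platform string are hypothetical. GEO metadata fields differ between series, so real code would first need to map each series onto a structure like this one.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SampleRecord:
    """Hypothetical, already harmonised metadata for one GEO sample."""
    gsm_id: str
    platform: str               # e.g. "450K"
    tissue: str                 # e.g. "whole blood" or "PBMC"
    has_idat: bool              # raw IDAT files deposited
    age_years: Optional[float]  # chronological age, if reported
    diagnosis: Optional[str]    # None when no disease is reported

BLOOD_TISSUES = {"whole blood", "pbmc"}

def passes_inclusion(s: SampleRecord) -> bool:
    return (
        s.platform == "450K"                   # single array version
        and s.tissue.lower() in BLOOD_TISSUES  # whole blood or PBMC
        and s.has_idat                         # needed to re-run pre-processing
        and s.age_years is not None            # age is the minimum metadata
        and s.diagnosis is None                # no reported disease
    )

def select_samples(records: List[SampleRecord]) -> List[str]:
    return [r.gsm_id for r in records if passes_inclusion(r)]

records = [
    SampleRecord("GSM_A", "450K", "Whole blood", True, 54.0, None),
    SampleRecord("GSM_B", "450K", "PBMC", False, 31.0, None),   # no IDAT files
    SampleRecord("GSM_C", "450K", "PBMC", True, None, None),    # age missing
    SampleRecord("GSM_D", "450K", "Whole blood", True, 67.0, "T2D"),
    SampleRecord("GSM_E", "27K", "Whole blood", True, 45.0, None),
]
print(select_samples(records))  # ['GSM_A']
```

As the last criterion in the text stresses, `diagnosis is None` only encodes the absence of a reported diagnosis, not confirmed health.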
Applying these criteria allowed me to assemble a human blood DNA methylation dataset for healthy individuals (after QC, total N = 2218) with the characteristics shown in Table 2.1, which spans the entire human lifespan (0.5 to 101 years). Fig. 2.1 shows that the chronological age distribution is bimodal, with peaks around 10.69 and 58.81 years. This reflects a sampling bias in human population studies, with more data being generated during postnatal development and around the onset of age-related diseases. However, to understand the development of complex diseases as a consequence of the ageing process, efforts should also be made to sample people in middle age, before these diseases are normally diagnosed.

Table 2.1 Overview of the blood DNA methylation dataset from healthy individuals (control). All the batches were downloaded from GEO [Edgar et al., 2002], with the exception of 'Europe' and 'Feb_2016', which were generated in-house by my collaborators in Canada (see Chapter 3). N♀: number of samples from females. N♂: number of samples from males. N: total number of samples. These numbers correspond to the samples left after applying quality control (QC, see section 2.1.2).

| Batch name | N♀ | N♂ | N | Median age (years) | Other comments |
| --- | --- | --- | --- | --- | --- |
| Europe | 0 | 121 | 121 | 10.96 | - |
| Feb_2016 | 0 | 1 | 1 | 0.50 | - |
| GSE104812 | 19 | 29 | 48 | 9.00 | - |
| GSE111629 | 111 | 124 | 235 | 71.00 | - |
| GSE40279 | 336 | 314 | 650 | 65.00 | - |
| GSE41273 | 0 | 51 | 51 | 10.25 | - |
| GSE42861 | 239 | 96 | 335 | 55.00 | - |
| GSE51032 | 253 | 78 | 331 | 54.57 | Only people who remained cancer-free in the follow-up after sample collection were included |
| GSE55491 | 1 | 5 | 6 | 29.50 | - |
| GSE59065 | 49 | 46 | 95 | 34.00 | - |
| GSE61496 | 72 | 78 | 150 | 57.00 | Only one member of each twin pair was included |
| GSE74432 | 29 | 22 | 51 | 12.00 | - |
| GSE81961 | 25 | 0 | 25 | 30.05 | - |
| GSE97362 | 39 | 80 | 119 | 13.00 | - |
| Total | 1173 | 1045 | 2218 | 55.00 | - |

Fig. 2.1 Histogram showing the chronological age distribution (x-axis: chronological age in years; y-axis: density) for all the healthy individuals included in the DNA methylation dataset (control: N = 2218). The blue line represents the 1D kernel density estimate, as calculated by the stat_density function in R with default parameters.

2.1.2 Main DNA methylation data pre-processing pipeline

The analysis of DNA methylation data generated with Illumina arrays has been a topic of extensive discussion and statistical innovation in the epigenetics community. Many reviews in the literature discuss the different steps that should be involved in the pre-processing of this data type [Liu and Siegmund, 2016; Morris and Beck, 2015; Wilhelm-Benartzi et al., 2013]. More specifically, a recent study by Je Liu and Kimberly D. Siegmund systematically benchmarked the pre-processing methods available for the 450K array in terms of reducing variation among technical replicates and improving the detection of biological differences [Liu and Siegmund, 2016]. Inspired by their results, I implemented a pre-processing pipeline for the 450K data using the minfi R package [Aryee et al., 2014], consisting of the following steps (Fig. 2.2):

1. Background correction. I used the noob method [Triche Jr et al., 2013], as implemented in the preprocessNoob function from the minfi R package [Aryee et al., 2014]. noob accounts for technical variation in the background (i.e. non-specific) fluorescence signal, which can lead to a reduced dynamic range for the methylation values (β-values) obtained (Fig. 2.2b, Fig. S1.1) [Triche Jr et al., 2013].
Briefly, when measuring fluorescence intensities in the Illumina array platforms, the observed intensity (also known as foreground, $X_f$) is composed of:

$$X_f = X_s + X_b \tag{2.1}$$

where $X_s$ is the true signal and $X_b$ is the background signal. noob is capable of estimating $X_s$ given $X_f$. It does so by making use of a normal-exponential convolution (which assumes $X_s \sim \mathrm{Exp}(\gamma)$ and $X_b \sim \mathcal{N}(\mu, \sigma^2)$) and of the 'out-of-band' (OOB) intensities (fluorescence signals in the opposite colour channel in Infinium I probes) to model $X_b$. Furthermore, I also applied the default dye-bias correction strategy, which controls for the different average intensities in the two colour channels [Triche Jr et al., 2013].

2. Quality control (QC). Following guidelines from the minfi R package [Aryee et al., 2014], I kept only those samples that satisfied the following criteria:

(a) The sex predicted from the DNA methylation data ($\mathrm{Sex}_p$) was the same as the reported sex in the metadata. The sex was predicted using the getSex function from the minfi R package [Aryee et al., 2014], which employs intensity information from the sex chromosomes, such that:

$$
\mathrm{Sex}_p =
\begin{cases}
\text{female}, & \text{if } \operatorname{median}\{\log_2(M_y + U_y)\} - \operatorname{median}\{\log_2(M_x + U_x)\} < c \\
\text{male}, & \text{if } \operatorname{median}\{\log_2(M_y + U_y)\} - \operatorname{median}\{\log_2(M_x + U_x)\} \geq c
\end{cases}
\tag{2.2}
$$

where $M_y$ and $U_y$ represent the methylated and unmethylated intensity measurements for the array probes in the Y chromosome. $M_x$ and $U_x$ represent the methylated and unmethylated intensity measurements for the array probes in the X chromosome, and $c$ is a predefined cutoff (default in minfi: $c = -2$). A total of 13 samples (0.56%) did not satisfy this criterion.

(b) They were not outliers according to their global intensity values after background correction, such that:

$$\frac{\operatorname{median}\{\log_2(M_i)\} + \operatorname{median}\{\log_2(U_i)\}}{2} \geq 10.5 \tag{2.3}$$

where $M_i$ and $U_i$ represent the background-corrected methylated and unmethylated intensity measurements for all the 450K array probes (Fig. S1.2).
A total of 95 samples (4.09%) did not satisfy this criterion.

3. Probe filtering. I filtered out the following types of probes:
• Probes that contain SNPs at the single base extension site (position 0) or at the proximal CpG on the probe (positions 1-2), using the dropLociWithSnps function in the minfi package [Aryee et al., 2014].
• Cross-reactive probes, as defined by Chen et al. [2013]. These are probes that can co-hybridise to alternative genomic sequences that are highly homologous to the target sequences [Chen et al., 2013].
• Probes that map to the sex chromosomes (X and Y).
It is important to mention that other authors have also filtered out probes with a high detection p-value or a low bead count across samples [Morris and Beck, 2015; Wilhelm-Benartzi et al., 2013]. However, I did not include these filters, since they are not part of the minfi guidelines [Aryee et al., 2014; Fortin and Hansen, 2015]. They could also complicate further downstream analyses (e.g. different sets of probes missing across different batches).

4. β-value calculation. The methylation status of a given cytosine (normally found in a CpG site) in one of the array probes can be quantified using the β-value statistic, which is calculated as [Du et al., 2010; Wilhelm-Benartzi et al., 2013]:

$$\beta_i = \frac{\max(M_i, 0)}{\max(M_i, 0) + \max(U_i, 0) + \alpha} \tag{2.4}$$

where $M_i$ and $U_i$ represent the methylated and unmethylated intensity measurements for the $i$-th probe, and $\alpha$ is a constant offset (in this work $\alpha = 100$, as recommended by Illumina) [Du et al., 2010]. In a DNA molecule from a single cell, a specific cytosine is either unmethylated or methylated (a categorical, binary variable). However, a bulk DNA sample from a tissue is composed of thousands of cells (which can include different cell types with different methylation patterns). As a result, β-values are a continuous variable between 0 and 1.
A value of 0 means that all the measured DNA molecules are unmethylated (0%) at that cytosine, and a value of 1 means that all the measured DNA molecules are methylated (100%). This is roughly equivalent to saying that 100% of the cells in the sampled tissue are unmethylated or methylated, respectively, at that cytosine. The β-values for a given sample (i.e. considering all the cytosines measured) usually follow a bimodal distribution, with the two peaks centred around 0 and 1 (Fig. 2.2d).

Fig. 2.2 Main DNA methylation data pre-processing pipeline. a. Flowchart showing the main steps implemented to pre-process the DNA methylation data from the 450K methylation arrays. The steps are: DNA methylation data (IDAT files), background correction (noob), quality control, probe filtering, β-value calculation and BMIQ normalisation. The number of samples (Nsamples) and the number of array probes (Nprobes) left after each step are also specified for the samples from the healthy individuals: Nsamples = 2325 and Nprobes = 485512 after background correction; Nsamples = 2218 and Nprobes = 485512 after quality control; Nsamples = 2218 and Nprobes = 428266 after BMIQ normalisation. b. β-value distributions (density against β-value), calculated using the raw fluorescence intensities (i.e. before any pre-processing), for the samples in the GSE41273 batch. Each curve represents a different sample. In grey: 51 samples that passed quality control (QC). In red: 2 samples that failed QC. c. As in b., but calculating the β-values after background correction. d. As in b., but calculating the β-values after background correction, QC, probe filtering and BMIQ normalisation (i.e. the final β-values that I used for downstream analyses). Note that the samples that failed QC have been removed.
Other authors have used M-values to quantify methylation levels in arrays (Fig. S1.3), which can be calculated as:

$$\text{M-value}_i = \log_2\left(\frac{\max(M_i, 0) + \alpha}{\max(U_i, 0) + \alpha}\right) \tag{2.5}$$

with a default offset value of $\alpha = 1$. Du et al. reported that β-values suffer from severe heteroscedasticity (i.e. non-constant variance) for highly methylated or unmethylated CpG sites, and that M-values therefore have more desirable statistical properties [Du et al., 2010]. However, Zhuang et al. later showed that this only becomes a problem in studies with small sample sizes [Zhuang et al., 2012] (which is not the case for my analyses). Furthermore, β-values are easier to interpret biologically and can be readily used in the context of BMIQ normalisation (see below). For these reasons, I chose β-values as the main methylation variable for this work.

5. Beta-mixture quantile normalisation (BMIQ). In the 450K arrays, two types of probe chemistry coexist on the same platform. Infinium I and Infinium II probes have different β-value distributions (a.k.a. Infinium II probe bias). BMIQ is an intra-array normalisation strategy that corrects for this bias and has been shown to outperform other methods used in this context [Dedeurwaerder et al., 2011; Maksimovic et al., 2012; Teschendorff et al., 2012; Touleimat and Tost, 2012]. BMIQ fits a three-state beta-mixture model to Infinium I and Infinium II probes separately. It then maps the distribution of the Infinium II probes onto that of the Infinium I probes (Fig. 2.3). For unmethylated (β-values close to 0) and methylated (β-values close to 1) probes, this is done by transforming probabilities into quantiles. For 'hemimethylated' probes (intermediate β-values), a dilation transformation is applied to preserve the monotonicity and continuity of the data [Teschendorff et al., 2012]. I applied BMIQ to my samples and discarded those that failed the normalisation step.
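The formulas in equations 2.1 to 2.5 can be illustrated with a few small functions. The sketch below re-implements them for illustration only; it is not the minfi code (preprocessNoob, getSex) used in this work. The function names are hypothetical. `normexp_signal` assumes that the background parameters `mu` and `sigma` and the exponential rate `gamma` are already known, whereas noob has to estimate them (using the out-of-band intensities for the background). No dye-bias correction or BMIQ step is included.

```python
import math
import numpy as np

def normexp_signal(x_f, mu, sigma, gamma):
    """E[X_s | X_f = x_f] for X_f = X_s + X_b, X_s ~ Exp(rate gamma), X_b ~ N(mu, sigma^2).

    Given x_f, the signal follows a normal distribution with mean
    mu_sb = x_f - mu - gamma * sigma^2, truncated to X_s >= 0; this returns its mean.
    Not numerically robust for very low intensities (the cdf can underflow).
    """
    mu_sb = x_f - mu - gamma * sigma ** 2
    a = mu_sb / sigma
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    return mu_sb + sigma * pdf / cdf

def predict_sex(m_x, u_x, m_y, u_y, cutoff=-2.0):
    """Equation 2.2: compare median log2 total intensity on chrY versus chrX."""
    total_x = np.asarray(m_x, dtype=float) + np.asarray(u_x, dtype=float)
    total_y = np.asarray(m_y, dtype=float) + np.asarray(u_y, dtype=float)
    diff = np.median(np.log2(total_y)) - np.median(np.log2(total_x))
    return "female" if diff < cutoff else "male"

def passes_intensity_qc(m, u, threshold=10.5):
    """Equation 2.3: mean of the median log2 methylated and unmethylated intensities."""
    return (np.median(np.log2(m)) + np.median(np.log2(u))) / 2 >= threshold

def beta_values(m, u, alpha=100.0):
    """Equation 2.4, with the Illumina offset alpha = 100."""
    m, u = np.maximum(m, 0.0), np.maximum(u, 0.0)
    return m / (m + u + alpha)

def m_values(m, u, alpha=1.0):
    """Equation 2.5, with the default offset alpha = 1."""
    return np.log2((np.maximum(m, 0.0) + alpha) / (np.maximum(u, 0.0) + alpha))

# Hypothetical intensities for three probes of one sample
m = np.array([900.0, 50.0, 4000.0])
u = np.array([0.0, 3000.0, 4000.0])
print(beta_values(m, u))  # first probe: 900 / (900 + 0 + 100) = 0.9
print(normexp_signal(100.0, mu=200.0, sigma=50.0, gamma=1 / 500))  # positive, unlike 100 - 200
```

The last line shows why a model-based background correction is attractive. A naive subtraction of the mean background would give a negative signal for a dim probe, whereas the normal-exponential estimate is always positive.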
Fig. 2.3 Effect of BMIQ normalisation on the β-value distribution (density against β-value) of different subsets of array probes with different chemistries (Infinium I; Infinium II with BMIQ; Infinium II without BMIQ). These results correspond to a DNA methylation sample from the GSE41273 batch. BMIQ transforms the distribution of the Infinium II probes into a distribution more similar to that of the Infinium I probes.

2.1.3 Accounting for blood cell composition changes during ageing

Whole blood is composed of several nucleated cell types, including neutrophils, eosinophils, basophils, CD14+ monocytes, CD4+ T cells, CD8+ T cells, CD19+ B cells and CD56+ natural killer (NK) cells [Teschendorff and Zheng, 2017a]. These cell types have different epigenetic profiles. As a consequence, changes in their proportions (i.e. changes in blood cell composition) can affect bulk DNA methylation measurements [Reinius et al., 2012]. Accounting for this cellular heterogeneity is crucial in epigenome-wide association studies (EWAS) [Jaffe and Irizarry, 2014; Liu et al., 2013; McGregor et al., 2016]. Furthermore, previous research has highlighted changes in blood cell composition with age, which could be one of the causes behind immunosenescence [Chen et al., 2016b; Czesnikiewicz-Guzik et al., 2008; Kuranda et al., 2011; Manser and Uhrberg, 2016; Seidler et al., 2010]. Therefore, considering blood cell composition in ageing-related studies and in the context of the epigenetic clock is fundamental. It ensures that the observed age-related changes in the methylome are not a direct consequence of the changes in blood cell composition during ageing [Chen et al., 2016a; Horvath et al., 2016a; Jaffe and Irizarry, 2014].
Several methods have been developed to estimate the cell composition of a blood sample given a bulk DNA methylation measurement (a.k.a. cell-type deconvolution) [Teschendorff et al., 2017; Teschendorff and Relton, 2018; Teschendorff and Zheng, 2017a; Titus et al., 2017]. These methods can be broadly split into two categories:
• Reference-based approaches. They use a pre-defined set of DNA methylation reference profiles for the cell types that are assumed to be present in the tissue. In the case of methylation arrays, these reference profiles can consist of the β-values for a subset of array probes that are highly discriminative of the underlying cell types. The blood sample is assumed to be a weighted linear sum of the $C$ reference profiles, and the objective of the method is to find these weights ($w_c$). The weights should be equivalent to the actual cell type proportions (given the assumption $\sum_{c=1}^{C} w_c \leq 1$) [Teschendorff and Zheng, 2017a]. In mathematical terms:

$$y = \sum_{c=1}^{C} w_c b_c + \varepsilon \tag{2.6}$$

where $y$ is the DNA methylation profile of the sample being considered, $C$ is the number of underlying cell types, $b_c$ is the DNA methylation profile of the $c$-th cell type and $\varepsilon$ is the error [Teschendorff et al., 2017]. Different algorithms have been applied to estimate the values of $w_c$, with the approach by Houseman et al. [2012] (which uses a linear constrained projection) being the most widely used.
• Reference-free approaches. Instead of making use of reference profiles for the cell types of interest, these methods generally calculate latent variables that capture variation driven by cell type composition. The strategy and assumptions used to derive these latent variables from the DNA methylation data are highly method-specific [Teschendorff and Zheng, 2017a]. These methods become particularly useful when no references are available for the cell types that constitute the tissue [Teschendorff and Zheng, 2017a].
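The reference-based model in equation 2.6 is a constrained least-squares problem. The sketch below solves it by projected gradient descent under the constraints $w_c \geq 0$ and $\sum_{c=1}^{C} w_c \leq 1$. It is a stand-in chosen for brevity, not Houseman's quadratic-programming implementation or the EpiDISH code, and the reference matrix and mixture are simulated. The function names are hypothetical.

```python
import numpy as np

def project_to_capped_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) <= 1}."""
    w = np.maximum(v, 0.0)
    if w.sum() <= 1.0:
        return w
    # Otherwise the sum constraint is active: project onto the probability
    # simplex with the standard sort-based algorithm.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def deconvolve(y, B, n_iter=5000):
    """Estimate w in y = B @ w + error subject to w_c >= 0 and sum_c w_c <= 1.

    y: beta-values of one bulk sample at the reference probes (length n_probes).
    B: reference matrix, one column b_c per cell type (n_probes x C).
    Uses projected gradient descent on 0.5 * ||y - B w||^2.
    """
    y = np.asarray(y, dtype=float)
    B = np.asarray(B, dtype=float)
    step = 1.0 / np.linalg.norm(B, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.full(B.shape[1], 1.0 / B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ w - y)
        w = project_to_capped_simplex(w - step * grad)
    return w

# Hypothetical reference: 60 probes x 6 cell types, and a bulk sample mixed
# from known proportions plus a little measurement noise.
rng = np.random.default_rng(1)
B = rng.uniform(0.0, 1.0, size=(60, 6))
w_true = np.array([0.60, 0.15, 0.10, 0.05, 0.05, 0.05])
y = B @ w_true + rng.normal(0.0, 0.01, size=60)
w_hat = deconvolve(y, B)
print(np.round(w_hat, 3))
```

Allowing the weights to sum to less than one, rather than forcing them to sum to exactly one, leaves room for cell types that are missing from the reference.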
However, reference-free approaches rarely provide estimates for the specific cell types in a given sample [Teschendorff and Zheng, 2017a], and these estimates are needed in the current modelling framework of the epigenetic clock. Reference-free approaches also often rely on the assumption that the top components of variation correlate with cell composition [Teschendorff et al., 2017]. This is not always true (especially in the case of developmental disorders, see Chapter 3). Thus, I decided to benchmark different reference-based cell-type deconvolution strategies in blood. In this context, I tested (Fig. S1.4):
• Different blood references. As pointed out before, the quality of the reference, containing the DNA methylation profiles of the cell types to be inferred, is crucial [Koestler et al., 2016; Teschendorff et al., 2017]. The reference must be composed of those CpG sites (in this case, array probes) that best discriminate between the different cell types. In my case, I considered six major blood 'cell types' for the inference: granulocytes ('Gran'), CD4+ T cells ('CD4T'), CD8+ T cells ('CD8T'), CD19+ B cells ('B'), CD14+ monocytes ('Mono') and CD56+ natural killer cells ('NK'). It is important to point out that granulocytes are not themselves a 'biological cell type' (since they comprise neutrophils, eosinophils and basophils). However, they will be considered as a single 'computational cell type', as previously done [Chen et al., 2016a; Horvath et al., 2016a]. I tested three different blood references whose constitutive probes were selected using different strategies:
1. The reference implemented in the estimateCellCounts function from the minfi R package [Aryee et al., 2014], which is widely used in the epigenetic literature. The reference probes were selected using t-statistics, by finding those probes that were differentially methylated in each cell type when compared with the rest of the cell types.
Among those probes that showed differences at p-value < 10^−8, the 100 most differentially methylated probes by effect size (50 hypermethylated and 50 hypomethylated) were chosen for each cell type (making a total of 600 probes for the reference) [Jaffe and Irizarry, 2014].
2. The reference implemented in the EpiDISH R package (centDHSbloodDMC.m) [Teschendorff and Zheng, 2017b]. The reference probes (DHS-DMCs, 333 in total) were selected by leveraging information on both differentially methylated cytosines (DMCs, using moderated t-statistics) and chromatin accessibility (DNase hypersensitive sites or DHSs) for each cell type [Teschendorff et al., 2017].
3. The reference implemented as part of the IDOL strategy (IDentifying Optimal DNA methylation Libraries) [Koestler et al., 2016]. In this case, the reference probes (300 in total) were initially selected based on differential methylation criteria and then updated iteratively, with the probability of a probe being selected depending on its contribution to prediction accuracy [Koestler et al., 2016].
The three references were built using the dataset from Reinius et al. [2012] (GSE35069), which I obtained directly from the FlowSorted.Blood.450k R package [Jaffe, 2018]. This dataset contains DNA methylation data generated with the 450K array for the six cell types considered, all of which were isolated using flow cytometry [Reinius et al., 2012]. The β-values for the selected probes were averaged across the biological replicates for each cell type.

• Different DNA methylation pre-processing pipelines. I tested different configurations for the pre-processing of both the gold-standard (see below) and the reference data.
For example, I tested whether probe filtering according to the criteria outlined in the previous section (section 2.1.2) is desirable, since it leads to the removal of some of the probes originally selected for the reference in the original publications [Koestler et al., 2016; Teschendorff et al., 2017] (Fig. S1.4). Furthermore, I also tested whether the prediction benefits from applying the same pre-processing to both the gold-standard (or the dataset where the prediction will be made) and the reference.

• Different deconvolution algorithms. I tested the performance of the following algorithms: CP/QP (constrained projection/quadratic programming, originally implemented by Houseman et al. [2012]), RPC (robust partial correlations) [Teschendorff et al., 2017] and CIBERSORT (which was originally developed for cell-type deconvolution using RNA expression data) [Newman et al., 2015; Teschendorff et al., 2017]. One of the key differences between the algorithms is how the normalisation constraint ($\sum_{c=1}^{C} w_c \le 1$) is implemented [Teschendorff et al., 2017]. All the algorithms were run using the implementations in the epidish function from the EpiDISH R package [Teschendorff and Zheng, 2017b], with the exception of the run with the minfi reference, for which I used the estimateCellCounts function with default parameters for the 450K array [Aryee et al., 2014].

In order to compare the predictions against real cell composition values, I used a gold-standard dataset (GSE77797) containing 12 samples in which known proportions of DNA isolated from the different blood cell types were mixed [Koestler et al., 2016].
I assessed the accuracy of the predictions using three different metrics:

• Root mean squared error (RMSE), which is calculated as (for a given cell type c):

$$\mathrm{RMSE}_c = \sqrt{\frac{\sum_{n=1}^{N} (\hat{y}_{cn} - y_{cn})^2}{N}} \qquad (2.7)$$

where $\hat{y}_{cn}$ is the predicted proportion of the $c$th cell type in the $n$th sample, $y_{cn}$ is the real proportion of the $c$th cell type in the $n$th sample and $N$ is the total number of samples in the gold-standard dataset ($N = 12$). A perfect prediction for a cell type would minimise the value of $\mathrm{RMSE}_c$ (i.e. $\mathrm{RMSE}_c = 0$).

• Mean absolute error (MAE), which is calculated as (for a given cell type c):

$$\mathrm{MAE}_c = \frac{\sum_{n=1}^{N} |\hat{y}_{cn} - y_{cn}|}{N} \qquad (2.8)$$

A perfect prediction for a cell type would minimise the value of $\mathrm{MAE}_c$ (i.e. $\mathrm{MAE}_c = 0$).

• Coefficient of determination ($R^2$), which is calculated as (for a given cell type c):

$$R^2_c = \frac{\sum_{n=1}^{N} (\hat{y}_{cn} - \bar{y}_c)^2}{\sum_{n=1}^{N} (y_{cn} - \bar{y}_c)^2} \qquad (2.9)$$

where $\bar{y}_c = \frac{1}{N} \sum_{n=1}^{N} y_{cn}$. A perfect prediction would give $R^2_c = 1$.

The most accurate strategy, according to the RMSE (mean across cell types: 1.9270) and the MAE (mean across cell types: 1.5498), is ‘idol_NFB_houseman’ (Fig. 2.4, Fig. S1.5), i.e. the strategy that uses the IDOL reference, applies all the pre-processing steps from my main pipeline to both the reference and the gold-standard (noob background correction, probe filtering and BMIQ normalisation) and employs Houseman’s CP/QP algorithm (Fig. S1.4). This strategy performed well for all the cell types (Fig. 2.5) and I selected it for my cell-type deconvolution analyses. It is important to mention that the gold-standard dataset was generated as part of the same study in which the IDOL reference was derived [Koestler et al., 2016]. However, the gold-standard samples were used as an independent validation of the IDOL reference and should not influence the conclusions of the benchmarking that I performed. In the future, it will be interesting to validate these conclusions using new gold-standard datasets generated from whole blood.
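Equations (2.7) to (2.9) translate directly into code. The sketch below, in plain Python with hypothetical function names and invented toy values, computes the three metrics for one cell type from lists of predicted and real proportions.

```python
import math


def rmse(pred, real):
    """Root mean squared error for one cell type (eq. 2.7)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, real)) / len(real))


def mae(pred, real):
    """Mean absolute error for one cell type (eq. 2.8)."""
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(real)


def r2_explained(pred, real):
    """R^2 as written in eq. 2.9: spread of the predictions around the mean
    of the real values, relative to the spread of the real values."""
    mean_real = sum(real) / len(real)
    num = sum((p - mean_real) ** 2 for p in pred)
    den = sum((r - mean_real) ** 2 for r in real)
    return num / den


# Hypothetical proportions (%) for one cell type in three mixtures
real = [10.0, 20.0, 30.0]
pred = [12.0, 18.0, 30.0]
print(rmse(pred, real), mae(pred, real), r2_explained(pred, real))
```

Note that the ratio in equation (2.9) equals 1 for a perfect prediction but, unlike the 1 − RSS/TSS form of $R^2$, it is not bounded above by 1 when the predictions are more dispersed than the real values.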
Next, I applied the optimal blood cell-type deconvolution strategy to the DNA methylation dataset that I built from healthy individuals (Table 2.1). The main goal of this analysis was to provide blood cell type proportions that can be used as covariates in the epigenetic clock modelling (see section 2.2.2). However, this also allowed me to broadly quantify the changes in blood composition that occur during human ageing (Fig. 2.6).

The mammalian immune system undergoes dramatic changes during ageing. These changes are normally referred to as immunosenescence and can be broadly defined as a decline in immune system functionality and in its ability to fight infections, which results in an increase in morbidity and mortality with age [Nikolich-Žugich, 2018]. Furthermore, human ageing is also characterised by an increase in chronic, low-grade inflammation, referred to as inflammageing, which is thought to contribute to the development of age-related diseases (such as atherosclerosis, type 2 diabetes, Alzheimer’s disease and osteoporosis) [Franceschi, 2007]. In my dataset, I observed the following (Fig. 2.6):

• A relative decrease in cell types from the adaptive immune system (CD4+ T cells, CD8+ T cells and CD19+ B cells). Interestingly, the decline in CD8+ T cells was more pronounced (i.e. higher absolute value of the slope) than that in CD4+ T cells, as previously reported [Czesnikiewicz-Guzik et al., 2008].
• A relative increase in cell types from the innate immune system (granulocytes, CD14+ monocytes and CD56+ natural killer cells).
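Each panel of Fig. 2.6 combines a simple linear fit of cell-type proportion on chronological age with a Spearman correlation. A minimal pure-Python sketch of both summaries is shown below; the function names and the age and proportion values are invented for illustration, and the p-value computation is omitted.

```python
import math


def ols_fit(x, y):
    """Intercept and slope of the simple linear model y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's correlation = Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ages (years) and CD8+ T cell proportions (%)
ages = [20, 35, 50, 65, 80]
cd8 = [12.0, 10.5, 8.0, 7.5, 5.0]
print(ols_fit(ages, cd8), spearman(ages, cd8))
```

The linear fit gives the intercept and slope shown in each panel title, while the rank-based correlation is insensitive to non-linear but monotonic trends.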
[Figure 2.4: a. RMSE and b. MAE (y-axis) for each cell-type deconvolution strategy (x-axis: minfi; dhs_{dif1, NB, dif2, NFB}_{houseman, cibersort, rpc}; idol_{NB, NFB}_{houseman, cibersort, rpc}), with points coloured by cell type (B, CD4T, CD8T, Gran, Mono, NK).]

Fig. 2.4 Benchmarking of the cell-type deconvolution strategies in blood. The x-axis shows the different strategies that were tested (for a detailed description see Fig. S1.4). The y-axis shows the results for a. the root mean squared error (RMSE) and b. the mean absolute error (MAE) when comparing the predictions with the real proportions of cells in a gold-standard dataset (GSE77797) [Koestler et al., 2016]. The grey horizontal solid lines represent the mean of the RMSE or the MAE across cell types and the grey dashed line represents the minimum of these values.

[Figure 2.5: six panels (Gran, CD4T, CD8T, B, Mono, NK) plotting predicted cellular fractions (%) against real cellular fractions (%) for the strategy idol_NFB_houseman.]

Fig. 2.5 Comparison of the predictions for the different cell types using the optimal deconvolution strategy (‘idol_NFB_houseman’) with the real cell type fractions in the gold-standard dataset (GSE77797) [Koestler et al., 2016]. Each point corresponds to a different sample in the gold-standard. The black dashed line represents the diagonal to aid visual interpretation.

These results are highly consistent with the literature [Chen et al., 2016b; Czesnikiewicz-Guzik et al., 2008; Jaffe and Irizarry, 2014; Kuranda et al., 2011; Manser and Uhrberg, 2016; Seidler et al., 2010], which validates the methodology for cell-type deconvolution that I have used.
These variations in blood cell composition may be caused by the age-related changes that occur in the two primary lymphoid organs: the bone marrow (whose hematopoietic stem cells exhibit reduced self-renewal potential and increased skewing towards myelopoiesis) and the thymus (which undergoes involution) [Chinn et al., 2012]. This analysis provides a preliminary overview of the blood cell composition landscape during human ageing. However, only relative changes in blood composition were quantified and the analysis is limited by the ‘cell types’ that I have deconvoluted (e.g. granulocytes include different cell types, different subsets of monocytes exist, etc.), which means that these conclusions should be interpreted with caution [Nikolich-Žugich, 2018]. Furthermore, the sex of the individual can influence the proportions of blood leukocytes [Chen et al., 2016b] and should be taken into account in future analyses.

2.1.4 Identifying differentially methylated positions during ageing

Differential methylation analysis is one of the most common types of downstream analysis of DNA methylation data [Morris and Beck, 2015; Teschendorff and Relton, 2018; Wilhelm-Benartzi et al., 2013]. It involves finding associations between the DNA methylation levels at specific CpG sites in the genome (a.k.a. differentially methylated positions or DMPs) and a given phenotypic variable of interest (e.g. a specific disease, when compared with healthy samples). DMPs are also called differentially methylated cytosines (DMCs) in the literature [Teschendorff and Relton, 2018]. In order to study the changes that the methylome undergoes during physiological ageing, it is useful to identify differentially methylated positions during ageing (aDMPs), i.e. individual cytosines (normally found in a CpG context) that change their methylation status as a function of chronological age.
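The per-probe age association described in the following paragraphs can be illustrated, in a deliberately simplified form, as an ordinary least-squares fit of β-value on age followed by a t-statistic and a Bonferroni threshold. This sketch assumes a single predictor (no sex, cell-type or PC covariates, unlike equations 2.10 and 2.11) and approximates the t distribution with a normal distribution, which is only reasonable for large N; it is not the customised dmpFinder/limma pipeline used in this thesis, and all names and data are hypothetical.

```python
import math


def age_association(age, beta):
    """Simplified per-probe test of Beta ~ Age (no other covariates).

    Returns the age coefficient, its t-statistic and a two-sided p-value.
    The p-value uses a normal approximation to the t distribution with
    N - 2 degrees of freedom, which is adequate only for large N.
    """
    n = len(age)
    ma, mb = sum(age) / n, sum(beta) / n
    saa = sum((a - ma) ** 2 for a in age)
    slope = sum((a - ma) * (b - mb) for a, b in zip(age, beta)) / saa
    intercept = mb - slope * ma
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(age, beta))
    se = math.sqrt(rss / (n - 2) / saa)  # standard error of the slope
    t = slope / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approximation
    return slope, t, p


def is_admp(p, n_tests, alpha=0.01):
    """Bonferroni: call an aDMP if p < alpha / number of probes tested."""
    return p < alpha / n_tests


# Hypothetical probes: one gaining 0.004 beta units per year, one flat
ages = list(range(100))
trend = [0.3 + 0.004 * a + 0.01 * (-1) ** a for a in ages]
flat = [0.5 + 0.01 * (-1) ** a for a in ages]
for probe in (trend, flat):
    slope, t, p = age_association(ages, probe)
    print(round(slope, 4), is_admp(p, n_tests=428266))
```

With 428266 probes tested, the Bonferroni threshold is 0.01 / 428266 ≈ 2.3 × 10^−8, which illustrates why this correction is conservative.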
[Figure 2.6: six scatterplots of inferred cell-type proportion (%) against chronological age (years); each panel title gives the Spearman’s correlation coefficient (SCC), its p-value and the fitted linear model:
Gran: SCC = 0.3071, p-value < 2.2e−16, y = 49.1307 + 0.1674x
CD4T: SCC = −0.1998, p-value < 2.2e−16, y = 19.9565 − 0.0522x
CD8T: SCC = −0.3754, p-value < 2.2e−16, y = 12.4339 − 0.0835x
B: SCC = −0.405, p-value < 2.2e−16, y = 8.9931 − 0.0695x
Mono: SCC = 0.1094, p-value = 2.43e−07, y = 5.2971 + 0.0066x
NK: SCC = 0.2064, p-value < 2.2e−16, y = 3.9194 + 0.0313x]

Fig. 2.6 Changes in blood cell composition during human ageing. Scatterplots showing the changes in the proportions of the six cell types considered (inferred using the cell-type deconvolution strategy) as a function of chronological age. Each point represents a different DNA methylation human sample from Table 2.1. The black line displays the linear model %cell_type ∼ Age (see section 2.4 for more details on linear modelling), with the slope and intercept shown in the titles. The Spearman’s correlation coefficient (SCC) and the p-value associated with it are also displayed.

Linear models, widely used in the context of differential RNA expression analysis [Ritchie et al., 2015], can also be adapted to find aDMPs [Teschendorff and Relton, 2018; Zhuang et al., 2012]. In the case of a continuous variable (such as chronological age), the association is tested within a linear regression framework [Zhuang et al., 2012] (see section 2.4 for a short description of linear regression and the nomenclature used throughout this thesis). Briefly, for each probe in the methylation array, I fitted the following linear regression models to the data from the healthy individuals:
• A model with cell composition correction (CCC). As I have shown previously, the different blood cell types change their abundance with age. Therefore, in order to maximise the chances of finding aDMPs that are conserved across different cell types, it is important to include the estimated cell proportions as covariates in the model:

$$\text{Beta} \sim \text{Age} + \text{Sex} + \text{Gran} + \text{CD4T} + \text{CD8T} + \text{B} + \text{Mono} + \text{NK} + \text{PC}_1 + \ldots + \text{PC}_{17} \qquad (2.10)$$

where Beta is the β-value for the array probe being evaluated; Age is the chronological age (in years) of the samples; Sex encodes the sex of the samples (0/1); Gran, CD4T, CD8T, B, Mono and NK are the cell type proportions of the samples as calculated with my cell-type deconvolution strategy; and $\text{PC}_N$ is the $N$th principal component that captures technical variance and accounts for potential batch effects (see section 2.2.3 for more details).

• A model without CCC, which can be expressed as:

$$\text{Beta} \sim \text{Age} + \text{Sex} + \text{PC}_1 + \ldots + \text{PC}_{17} \qquad (2.11)$$

This leads to the identification of aDMPs that are more confounded by the proportions of the different cell types (i.e. the change in β-value with age could be entirely driven by a change in the abundance of a specific cell type that is differentially methylated at that particular probe).

Furthermore, for each probe, I calculated a p-value, based on t-statistics [Teschendorff and Relton, 2018], to assess whether the putative linear association between methylation status and chronological age was significant (at a significance level of α = 0.01 after applying Bonferroni correction to account for multiple testing, see section 2.4 for more details). I used a customised version of the dmpFinder function in the minfi R package [Aryee et al., 2014] to identify the aDMPs, which internally uses the limma framework [Ritchie et al., 2015]. Given the large sample size (N = 2218 ≫ 10), I did not use variance shrinkage (i.e.
empirical Bayes moderated t-statistics) as part of the statistic calculations [Ritchie et al., 2015].

An overview of the different aDMPs (with and without CCC) identified in the healthy individuals can be found in Figure 2.7. Around 30% of the blood methylome (at least according to the 450K array) is affected by the ageing process during the human lifespan. However, it is worth mentioning that Bonferroni correction provides a very conservative picture of the methylomic changes (when compared with other multiple-testing corrections, such as those controlling the false discovery rate or FDR), and it is likely that an even greater proportion of the methylome is altered with age [Zhu et al., 2018]. CpG sites can become either hypomethylated (i.e. lose methylation with age) or hypermethylated (i.e. gain methylation with age). Importantly, the effect sizes of the age coefficient (i.e. the observed changes in the β-values per year) are generally small. More specifically, in the model with CCC, the median age coefficient is −0.000426 for the hypomethylated aDMPs (equivalent to a −4.26% methylation change over 100 years of human life) and 0.000437 for the hypermethylated aDMPs (equivalent to a +4.37% methylation change over 100 years of human life). Such small, gradual changes are consistent with the progressive functional decline observed during ageing [Lopez-Otin et al., 2013]. It is also worth mentioning that around 50% of the CpG sites that constitute the Horvath epigenetic clock are blood aDMPs according to my analysis (Fig. 2.7c,d). Overall, these results are consistent with previous studies [Slieker et al., 2018, 2016; van Dongen et al., 2016; Zhu et al., 2018].

Next, I looked at the top 100 aDMPs that were identified (according to their p-value and t-statistic, Fig. S1.6 and Fig. 2.8). The first aDMP in the list was cg16867657, a probe that consistently gains methylation with age (Fig.
2.8a) and has been previously identified as the strongest aDMP across tissues and human populations in several studies [Bacalini et al., 2017; Garagnani et al., 2012; Gopalan et al., 2017; Hannum et al., 2013; Slieker et al., 2018; Zbieć-Piekarska et al., 2015]. cg16867657 is associated with the CpG island in the promoter of the ELOVL2 gene, which encodes an enzyme that catalyses one of the reactions in the elongation of polyunsaturated fatty acids [Gopalan et al., 2017]. Furthermore, other aDMPs among my top hits have also been reported previously (such as cg06639320 in the FHL2 gene, which is the second aDMP, Fig. 2.8b) [Garagnani et al., 2012]. These results validate the statistical methods used so far to process the DNA methylation data and to identify aDMPs.

It is important to mention that not all CpG sites change their DNA methylation levels with age in a perfectly linear manner. For instance, the two top hypomethylated aDMPs (Fig. 2.8c,d) change their rate at ages 20-25 years. This was already recognised by Horvath [Horvath, 2013a], which is why he transformed age to a logarithmic scale before the age of 20 years in order to improve the model fit (see section 2.2.1). Furthermore, genetic background can have a significant effect on DNA methylation patterns and interact with the ageing process to shape the epigenome [Hannum et al., 2013; van Dongen et al., 2016]. Unfortunately, I did not have genetic data for the healthy individuals, but such data could help to refine the identification of aDMPs in the future. Additionally, it would be interesting to apply methods that control for bias and inflation in the test statistics by estimating the empirical null distribution of the observed set of test statistics [van Iterson et al., 2017]. Finally, other types of epigenetic features can be derived to understand the effects of ageing on the epigenome, such as variably methylated positions during ageing (aVMPs) [Slieker et al., 2016], differentially methylated regions (DMRs, which consider several correlated CpGs at the same time) [Teschendorff and Relton, 2018] or differentially methylated cytosines in individual cell types (DMCTs, which consider interactions between the phenotypic variable and the proportions of cell types) [Zheng et al., 2018].

[Figure 2.7: a., b. barplots of the number of aDMPs per methylation change category (hypermethylated, hypomethylated, no change), with bar labels of 70.61%, 13.24% and 16.14% (with CCC) and 70.39%, 10.53% and 19.08% (without CCC); c., d. volcano plots with marginal density plots of the age coefficient.]

Fig. 2.7 The blood methylome changes during physiological human ageing. a. Barplot showing the total number of differentially methylated positions during ageing (aDMPs) that were identified (in grey: probes that did not reach statistical significance). In this case, the model with cell composition correction (CCC) was applied. b. As in a., but using the model without CCC. c. Volcano plot showing the relationship between the p-value (y-axis) and the effect size (x-axis) of the age coefficient for each one of the array probes (each point represents a probe). Those probes above the dashed green line (α = 0.01 after Bonferroni correction) are the identified aDMPs. Above the volcano plot, a density plot captures the distributions of the age coefficient for the hypermethylated aDMPs (in red) and the hypomethylated aDMPs (in blue). In this case, the model with CCC was applied. The black points are the 353 CpG probes that constitute the Horvath epigenetic clock model [Horvath, 2013a]. d. As in c., but using the model without CCC.

[Figure 2.8: β-value (y-axis) against chronological age (x-axis, years) for cg16867657, cg06639320, cg19283806 and cg10501210.]

Fig. 2.8 Changes in the β-values of four differentially methylated positions during ageing (aDMPs) in the blood of the healthy individuals. cg16867657 and cg06639320 are the top aDMPs that gain methylation with age (i.e. become hypermethylated) according to the model with cell composition correction (CCC). cg19283806 and cg10501210 are the top aDMPs that lose methylation with age (i.e. become hypomethylated) according to the model with CCC. To aid visualisation, the black line displays the linear model β-value ∼ Age.

2.1.5 Shannon methylation entropy

Shannon entropy (H) can be used in the context of DNA methylation analysis to estimate the information content stored in a given set of CpG sites [Hannum et al., 2013; Jenkinson et al., 2017; Slieker et al., 2016; Wang et al., 2017; Xie et al., 2011]. I calculated it using the same approach as in Hannum et al. [2013]:

$$H = -\frac{1}{N} \sum_{i=1}^{N} \left[ \beta_i \log_2(\beta_i) + (1 - \beta_i) \log_2(1 - \beta_i) \right] \qquad (2.12)$$

where $\beta_i$ represents the methylation β-value for the $i$th array probe (or CpG site) and $N = 428266$ if all the array probes that passed the pre-processing pipeline are considered (i.e. genome-wide, or at least array-wide). Shannon entropy is minimised when the methylation levels of all the CpGs are either 0% or 100%, and maximised when all of them are 50% (Fig. 2.9).

Next, I calculated the genome-wide Shannon entropy for the blood samples from the healthy individuals. Consistent with previous reports [Hannum et al., 2013; Jenkinson et al., 2017; Slieker et al., 2016; Wang et al., 2017], the genome-wide Shannon entropy of the methylome increases during ageing (Fig.
2.10a; Spearman’s correlation coefficient = 0.1985; p-value = 3.8281 × 10^−21), which implies that the epigenome loses information content. Finally, it is worth mentioning that I observed a marked batch effect in the Shannon entropy calculations, which can generate high entropy variability for a given age (Fig. 2.10b). However, after removing potential outlier batches (such as GSE41273, GSE59065 or GSE97362), the increase in Shannon methylation entropy during ageing was still observed. Thus, accounting for technical variation (see section 2.2.3) is crucial when assessing this type of data, even after careful pre-processing.

[Figure 2.9: Shannon entropy of a CpG site (y-axis, 0 to 1) against its β-value (x-axis, 0 to 1).]

Fig. 2.9 Plot showing the relationship between the β-value and the methylation Shannon entropy at a given CpG site (in my case, at a given array probe).

[Figure 2.10: genome-wide Shannon entropy (y-axis) against chronological age (x-axis, years); in b., samples are coloured by batch: Europe, Feb_2016, GSE104812, GSE111629, GSE40279, GSE41273, GSE42861, GSE51032, GSE55491, GSE59065, GSE61496, GSE74432, GSE81961, GSE97362.]

Fig. 2.10 a. Scatterplot showing the changes in genome-wide methylation Shannon entropy during ageing in the healthy individuals. Each sample is represented by one point. The black line displays the linear model Entropy ∼ Age. b. Same as in a., but colouring the samples according to the batch they came from.

2.2 Behaviour of Horvath’s epigenetic clock during ageing

2.2.1 Calculating epigenetic age using Horvath’s epigenetic clock

Steve Horvath’s model, originally published in 2013 [Horvath, 2013a], is arguably the most widely used epigenetic clock in the literature.
Given that it works across tissues with high accuracy and that it has been validated in many human cohorts, I have used it as the main tool to quantify epigenetic ageing in this work. Horvath’s model measures epigenetic age (a.k.a. DNAmAge) using the DNA methylation levels at 353 CpG sites, as quantified with the Illumina methylation arrays (27K or 450K). Previous studies have generally employed a ready-to-use online calculator for DNAmAge provided by Steve Horvath [Horvath, 2013b]. This has clearly simplified the computational process and helped many research groups to test the behaviour of the epigenetic clock in their system of interest. However, it has also led to the epigenetic clock being treated as a ‘black box’, without critical assessment of the statistical methodology behind it. Therefore, I decided to replicate the original code and make it available to the scientific community in a GitHub repository [Martin-Herranz, 2019]. Furthermore, I tested the impact of different steps involved in the estimation of epigenetic age acceleration (EAA), including the presence/absence of background correction, the removal of technical variation from batch effects and the importance of the age distribution when fitting the control models, which I discuss in the following sections.

The main pipeline to calculate the epigenetic age (DNAmAge) of a sample has the following steps (some of them are shared with the pipeline for DNA methylation pre-processing described in section 2.1.2):

1. Background correction. I implemented a pipeline that starts with the raw DNA methylation data (IDAT files) for a sample. First, I tested the effect of applying noob background correction, before calculating the β-values, on the median absolute error (MAE) of the predictions (see section 2.2.2). Background correction did not have a major impact on the final predictions as long as I also corrected for batch effects (Fig. S1.7, Fig.
2.13c, see section 2.2.3). Therefore, I decided to keep the noob background correction for consistency with the other pre-processing pipeline.

2. Quality control. I applied the same criteria as previously described in section 2.1.2.

3. Probe filtering. Horvath’s model was originally trained starting with 21368 array probes that had the following characteristics [Horvath, 2013a]:
• They were shared between the 27K and 450K methylation arrays.
• They had ≤ 10 missing values across all the training data.
Therefore, these were the probes selected for downstream analysis.

4. β-value calculation. β-values were calculated as previously described in section 2.1.2. It is worth mentioning that Horvath’s original code includes two alternatives for the imputation of missing β-values:
• Slow imputation (applied when the number of missing β-values is < 3000). In this case, k-nearest neighbours (KNN) imputation is used. KNN imputation borrows information from the DNA methylation profiles of the most similar probes (the neighbours) according to a distance metric (normally the Euclidean distance). The impute.knn function from the impute R package can be used for this purpose [Troyanskaya et al., 2001].
• Fast imputation (applied when the number of missing β-values is ≥ 3000). In this case, the values from the blood gold-standard (see below) can be used as the imputed values.
In the case of my dataset, no missing values were present for the 21368 probes, so there was no need to perform imputation.

5. Gold-standard normalisation. A modified version of BMIQ normalisation is used [Teschendorff et al., 2012]. In this case, instead of mapping the distribution of the Infinium II probes to the distribution of the Infinium I probes, the mapping is done from the distribution of the 21368 probes in the sample to the distribution of a previously derived gold-standard for the same set of probes.
This gold-standard was created by taking the average β-values for the 21368 probes across all the whole blood samples from [Horvath et al., 2012].

6. Calculating epigenetic age (DNAmAge). As previously observed for some of the aDMPs, the rate of β-value change can differ before and after adult age (Fig. 2.8). For this reason, Horvath transformed chronological age before training the model:

$$f(c) = c_t = \begin{cases} \ln\left(\dfrac{c+1}{a+1}\right) & \text{if } c \le a \\[1ex] \dfrac{c-a}{a+1} & \text{if } c > a \end{cases} \qquad (2.13)$$

where $c_t$ is the transformed chronological age that was used as the dependent variable during training, $c$ is the chronological age (in years) and $a$ is the adult age (for humans, 20 years). This transformation captures a relationship between chronological age and methylation changes that is logarithmic until adult age and linear afterwards (Fig. 2.11).

[Figure 2.11: transformed chronological age (y-axis) against chronological age (x-axis, years).]

Fig. 2.11 Plot showing the relationship between the chronological age in years ($c$) and the transformed chronological age ($c_t$) in Horvath’s model. This transformation accounts for different rates of β-value change before and after adult age (20 years in humans, as indicated by the dashed black line).

Given a sample to predict, the epigenetic age can then be calculated as:

$$\text{DNAmAge} = g(\hat{c}_t) = g\left(\hat{\beta}_0 + \sum_{i=1}^{353} \hat{\beta}_i \cdot x_i\right) \qquad (2.14)$$

where $\hat{c}_t$ is the predicted transformed age according to Horvath’s model, $\hat{\beta}_0$ is the intercept of Horvath’s model, $\hat{\beta}_i$ is the coefficient (weight) for the $i$th probe (only 353 probes are finally used), $x_i$ is the β-value for the $i$th probe after gold-standard normalisation and $g(\cdot)$ is the inverse of $f(\cdot)$, such that:

$$g(\hat{c}_t) = f^{-1}(\hat{c}_t) = \hat{c} = \begin{cases} e^{\hat{c}_t} \cdot (a+1) - 1 & \text{if } \hat{c}_t \le 0 \\[1ex] \hat{c}_t \cdot (a+1) + a & \text{if } \hat{c}_t > 0 \end{cases} \qquad (2.15)$$

where $\hat{c}$ is the predicted age according to Horvath’s model (i.e. DNAmAge).
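The age transformation and its inverse (equations 2.13 to 2.15) are easy to get wrong at the boundary, so a small Python sketch may help. The function names are hypothetical, the coefficient values in the usage example are placeholders rather than Horvath’s published 353-probe weights, and the sketch assumes the β-values have already been gold-standard normalised.

```python
import math

ADULT_AGE = 20  # a in eqs. 2.13 and 2.15 (humans)


def transform_age(c, a=ADULT_AGE):
    """f(c), eq. 2.13: logarithmic up to adult age, linear afterwards."""
    if c <= a:
        return math.log((c + 1) / (a + 1))
    return (c - a) / (a + 1)


def inverse_transform_age(ct, a=ADULT_AGE):
    """g(ct) = f^-1(ct), eq. 2.15."""
    if ct <= 0:
        return math.exp(ct) * (a + 1) - 1
    return ct * (a + 1) + a


def dnam_age(intercept, coefs, betas):
    """Eq. 2.14: DNAmAge = g(intercept + sum_i coef_i * x_i).

    `betas` must hold the normalised beta-values for the same probes,
    in the same order, as `coefs`.
    """
    ct_hat = intercept + sum(w * x for w, x in zip(coefs, betas))
    return inverse_transform_age(ct_hat)


# Round trip on a few ages, then a toy two-probe "clock" (placeholder values)
print([round(inverse_transform_age(transform_age(c)), 6) for c in (0, 5, 20, 45, 90)])
print(dnam_age(0.25, [1.0, 0.5], [0.5, 0.5]))  # → 41.0
```

Both branches of $f(c)$ meet at $c = a$ (where $c_t = 0$), which is why the inverse can switch branches on the sign of $\hat{c}_t$.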
2.2.2 Horvath’s epigenetic clock measures physiological ageing

Using the methodology from the previous section, I calculated the epigenetic age (DNAmAge) in the blood of the healthy individuals. Given that these individuals are supposed to be disease-free, Horvath’s epigenetic clock should predict epigenetic ages that are similar to the chronological ages of the samples, and this was indeed the case (Fig. 2.12a; Pearson’s correlation coefficient (PCC) = 0.9671, p-value ≈ 0). This confirms that Horvath’s epigenetic clock does indeed measure the ageing process (at least in a cross-sectional population) and lays the foundation for the rest of the analyses presented in this thesis.

As mentioned in Chapter 1, the difference between epigenetic age and chronological age is known as epigenetic age acceleration (EAA), with a positive EAA (i.e. DNAmAge > Age) being associated with several age-related health problems. In order to calculate the EAA for the healthy individuals, I fitted the following linear regression models (hereinafter referred to as the control models):

• With cell composition correction (CCC):

$$\text{DNAmAge} \sim \text{Age} + \text{Sex} + \text{Gran} + \text{CD4T} + \text{CD8T} + \text{B} + \text{Mono} + \text{NK} + \text{PC}_1 + \ldots + \text{PC}_{17} \qquad (2.16)$$

where DNAmAge is the epigenetic age calculated with Horvath’s epigenetic clock; Age is the chronological age (in years) of the samples; Sex encodes the sex of the samples (0/1); Gran, CD4T, CD8T, B, Mono and NK are the cell type proportions of the samples as calculated with my cell-type deconvolution strategy; and $\text{PC}_N$ is the $N$th principal component that captures technical variance and accounts for potential batch effects (see section 2.2.3 for more details). Horvath’s epigenetic clock was trained using multiple tissues and its predictions should be robust to changes in blood cell composition. However, previous studies have highlighted that adding this correction can improve the ability to detect ‘pure’ ageing effects [Chen et al., 2016a; Horvath et al., 2016a] (i.e.
epigenetic age acceleration mainly caused by DNA methylation changes that happen in the nucleus of all cell types). For a given sample, the $\mathrm{EAA}_{\text{with CCC}}$ is the residual from the model, i.e. the difference between the actual DNAmAge and the prediction from the control model (which is conceptually similar to the difference between DNAmAge and chronological age, but also accounts for the rest of the covariates). The $\mathrm{EAA}_{\text{with CCC}}$ that I have defined is very similar to the previously reported measure of ‘intrinsic EAA’ (IEAA) [Chen et al., 2016a; Horvath et al., 2016a].

• Without CCC:

$$ \text{DNAmAge} \sim \text{Age} + \text{Sex} + \text{PC1} + \dots + \text{PC17} \qquad (2.17) $$

In this case, the residuals of the model are referred to as the $\mathrm{EAA}_{\text{without CCC}}$ for the different samples.

The overall accuracy of the predictions can be quantified using the median absolute error (MAE), calculated as:

$$ \mathrm{MAE} = \operatorname{median}\{|\mathrm{EAA}_i|\} \qquad (2.18) $$

where $\mathrm{EAA}_i$ is the epigenetic age acceleration for the $i$th sample calculated with one of the models (with CCC or without CCC). The MAE for all the healthy individuals (full lifespan) in the control models should approach zero, and this was indeed what I observed ($\mathrm{MAE}_{\text{with CCC}}$ = 2.7117 years, $\mathrm{MAE}_{\text{without CCC}}$ = 2.8211 years). These values are below the original MAE reported by Horvath in his test set (3.6 years) [Horvath, 2013a]. However, some of the samples from my healthy individuals (such as samples from batches GSE40279 and GSE42861) could have been used by Horvath as part of his training set [Horvath, 2013a], and therefore these results must be interpreted carefully.

Even though Horvath’s model seems to predict epigenetic age accurately, some samples clearly deviate substantially from the expected prediction. This is especially obvious for the older samples (> 55 years), which have a systematically younger epigenetic age than expected (see deviations from the diagonal in Fig. 2.12a).
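The control-model residuals (equations 2.16 and 2.17) and the MAE (equation 2.18) can be sketched with ordinary least squares in numpy. This is a minimal illustration, not the code used in this thesis: the function names are hypothetical, covariates are passed as a plain numeric matrix, and the optional `train_mask` (an assumption of this sketch) lets the control model be fitted on a subset of samples, such as an age-restricted control, while still returning an EAA for every sample.

```python
import numpy as np


def fit_control_eaa(dnam_age, covariates, train_mask=None):
    """Epigenetic age acceleration as residuals of a control model (eqs 2.16-2.17).

    dnam_age:   (n,) epigenetic ages (e.g. DNAmAge)
    covariates: (n,) or (n, p) covariates, e.g. Age, Sex, cell proportions, PCs
    train_mask: optional boolean (n,) array selecting the samples used to fit
                the control model; an EAA is still returned for every sample
    """
    y = np.asarray(dnam_age, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    if train_mask is None:
        train_mask = np.ones(len(y), dtype=bool)
    train_mask = np.asarray(train_mask, dtype=bool)
    coef, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
    return y - X @ coef  # observed minus predicted = EAA


def median_absolute_error(eaa):
    """Equation 2.18: MAE = median(|EAA_i|)."""
    return float(np.median(np.abs(eaa)))


if __name__ == "__main__":
    # Simulated data for illustration only (not the thesis dataset).
    rng = np.random.default_rng(0)
    age = rng.uniform(0, 100, size=1000)
    sex = rng.integers(0, 2, size=1000)
    dnam = age + rng.normal(0, 3, size=1000)
    covs = np.column_stack([age, sex])
    eaa = fit_control_eaa(dnam, covs, train_mask=age <= 55)
    print("MAE:", median_absolute_error(eaa))
```

Because the design matrix includes an intercept, the residuals of the training samples average to zero; samples outside `train_mask` are simply scored as observed minus predicted.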
If a control model is fitted to the full lifespan dataset (in which around 50% of the samples are > 55 years old), the resulting model has a smaller than expected age coefficient (slope), which introduces a bias when estimating epigenetic age acceleration for different age groups (Fig. 2.12b). Although many studies do not take this problem into account, this phenomenon has been previously reported in humans [El Khoury et al., 2018; Marioni et al., 2018] and mice [Stubbs et al., 2017]. However, to date, it is unclear whether it represents a technical artefact or has a biological explanation (e.g. survivor bias of the older individuals, a slowdown with age of the molecular processes that drive ageing, etc.). This highlights the importance of having a properly age-matched control when performing analyses with Horvath’s epigenetic clock. As expected, removing the older samples (> 55 years) from the control models corrected this bias (Fig. 2.12c,d) and reduced the MAE ($\mathrm{MAE}_{\text{with CCC}}$ = 2.2742 years, $\mathrm{MAE}_{\text{without CCC}}$ = 2.3237 years). This is the strategy that I used when screening for epigenetic age acceleration in the context of developmental disorders (see Chapter 3).

[Fig. 2.12, panels a-d: DNAmAge (years) vs chronological age (years) (a, c) and EAA (years) by age group for the EAA models with and without CCC (b, d); full lifespan control, N = 2218 (a, b); 0-55 years control, N = 1128 (c, d)]

Fig. 2.12 Horvath’s epigenetic clock measures physiological ageing. a. Scatterplot showing the relationship between epigenetic age (DNAmAge) according to Horvath’s model [Horvath, 2013a] and chronological age for the healthy individuals. Each sample is represented by one point. The black dashed line represents the diagonal to aid visualisation. The solid brown line represents the linear model DNAmAge ∼ Age, which deviates from the diagonal if the full lifespan samples are used. b. Boxplots displaying the epigenetic age acceleration (EAA) distributions for different age ranges (young age: ≤ 20 years; middle age: 20 < Age ≤ 55 years; old age: > 55 years) after fitting the control models to the full lifespan samples. The dashed black line represents EAA = 0, around which the distributions should be centred. This is not the case for the samples in the young age and middle age groups. In red: EAA model with cell composition correction (CCC). In blue: EAA model without CCC. c. As in a., but removing the samples in the old age group (> 55 years). The solid green line represents the linear model DNAmAge ∼ Age, which is much closer to the diagonal if only young and middle age samples are considered. d. As in b., but fitting the control models to the samples in the young and middle age groups (0-55 years). The bias in the EAA is corrected in this case (the distributions are centred around zero for the different age groups).

2.2.3 Correcting for batch effects in the context of the epigenetic clock

As mentioned in the previous section, after fitting the control models the EAA distributions of the samples from the healthy individuals are expected to be centred around zero. However, when the principal components (PCs) that capture technical variation were not included in the control models (see equations 2.16 and 2.17), this was not the case for several batches (Fig. 2.13a, Fig. S1.8a).
Therefore, I hypothesised that technical variation can affect the predictions from Horvath’s epigenetic clock and that batch effects need to be explicitly accounted for in this context, even after applying the internal normalisation step against the blood gold-standard [Horvath, 2013a]. This section explains how I implemented this batch effect correction (i.e. how I derived the principal components that capture technical variance across batches).

A batch effect is a systematic technical source of variation that is unrelated to the biological or scientific variables in a study [Leek et al., 2010]. Batch effects affect both low- and high-throughput measurements and can be caused by a wide variety of situations: different technicians performing the experiments, different laboratories generating the data, different lots of reagents or arrays being used, etc. [Leek et al., 2010]. Correcting for batch effects is crucial, especially when integrating data from different studies and sources [Maksimovic et al., 2015], as is the case in the analyses presented in this thesis. Data generated by DNA methylation arrays are also affected by batch effects, and several methods have been described in the literature to correct for them, normally at the level of probe intensities [Fortin et al., 2014] or M-values [Maksimovic et al., 2015; Price and Robinson, 2018]. In the context of the epigenetic clock, previous attempts to account for technical variation have used the first five PCs estimated directly from the DNA methylation data (presumably the β-values) [Horvath et al., 2016b]. However, this approach potentially removes meaningful biological variation, especially in studies with global changes in DNA methylation, such as cancer [Fortin et al., 2014] or developmental disorders (see Chapter 3).
Furthermore, given that Horvath’s epigenetic clock was trained with data pre-processed using different strategies, it is unclear how applying an additional batch effect correction step to the intensities or β-values would impact the predictions [Horvath, 2013c]. Thus, I decided to correct for potential batch effects when fitting the control models (see equations 2.16 and 2.17). I made use of the control probes present on the 450K array, which have been shown to carry information about unwanted variation from technical sources (i.e. technical variance) [Fortin et al., 2014; Gagnon-Bartsch and Speed, 2012; Maksimovic et al., 2015]. These probes are designed to capture technical variance in negative controls, measure between-array differences and quantify the performance of different steps of the array protocol, such as bisulfite conversion, staining or hybridisation [Fortin et al., 2014; Illumina, 2010]. I performed principal component analysis (PCA, with centring but without scaling, using the prcomp function in R) on the raw intensities of the control probes (847 probes × 2 channels = 1694 intensity values) for all the healthy individuals (N = 2218) and the samples with developmental disorders (cases, N = 666, see Chapter 3). The first two PCs captured the batch structure in both the healthy individuals (Fig. 2.13b) and the cases (Fig. S1.9). Including the first 17 PCs in the epigenetic age acceleration (EAA) modelling (see equations 2.16 and 2.17), which together accounted for 98.06% of the technical variance in all the samples (Fig. S1.10), substantially reduced the median absolute error (MAE) of the predictions in the healthy individuals (with CCC: from 3.0881 years without batch effect correction to 2.7117 years with it, Fig. 2.13a,d; $\mathrm{MAE}_{\text{without CCC}}$ = 2.8211 years; mean MAE = 2.7664 years, Fig. 2.13c).
Notably, the reduction in the MAE provided by the batch effect correction was larger than the improvement provided by cell composition correction, a common practice in the epigenetic clock field [Chen et al., 2016a; Horvath et al., 2016a]. The optimal number of PCs was found using the findElbow function from [Akalin, 2014]. Finally, the deviations of the median EAA from zero that remain in some of the batches after batch effect correction (Fig. 2.13d, Fig. S1.8b) could be explained by other variables, such as a small batch size or an overrepresentation of young samples (Fig. 2.14). The latter is a consequence of Horvath’s model underestimating the epigenetic ages of older samples, as discussed in the previous section. Thus, I have shown that correcting for batch effects in the context of the epigenetic clock is important, especially when combining datasets from different sources for meta-analysis purposes. Batch effect correction is essential to remove technical variance that could affect the epigenetic age of the samples and confound biological interpretation.
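A minimal sketch of this batch-correction strategy, assuming numpy: PCA with centring but no scaling (mirroring the defaults of R’s prcomp, `center = TRUE, scale. = FALSE`) is computed through a singular value decomposition, and the leading PC scores are then added as covariates to an ordinary least-squares control model whose residuals are the EAA. The function names are hypothetical, and the choice of the number of PCs (17 in this thesis, selected with findElbow) is not reproduced here.

```python
import numpy as np


def control_probe_pcs(intensities, n_pcs):
    """PCA of control-probe intensities, centred but not scaled.

    intensities: (n_samples, n_features) raw control-probe intensities
                 (for the 450K array: 847 probes x 2 channels = 1694 columns)
    n_pcs:       number of leading components to return
    Returns (scores, var_explained): the (n_samples, n_pcs) PC scores and the
    fraction of total variance explained by every component.
    """
    Z = np.asarray(intensities, dtype=float)
    Z = Z - Z.mean(axis=0)                   # centre every feature, no scaling
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                           # same as Z @ Vt.T
    var_explained = s**2 / np.sum(s**2)
    return scores[:, :n_pcs], var_explained


def eaa_with_batch_pcs(dnam_age, covariates, pcs):
    """Residuals of DNAmAge ~ covariates + PCs (with intercept), fitted by OLS."""
    y = np.asarray(dnam_age, dtype=float)
    X = np.column_stack([np.ones(len(y)), covariates, pcs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef
```

In the setting described above, the PCA would be run once on all samples (healthy individuals and cases together), and only the rows belonging to the healthy individuals would be used to fit the control model.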
Furthermore, given the flexibility of this modelling approach, I have applied batch effect correction across other types of analyses in the thesis, such as the identification of DMPs (see equation 2.10).

[Fig. 2.13, panels a-d: (a, d) EAA with CCC (years) per batch of healthy individuals (Europe, Feb_2016, GSE104812, GSE111629, GSE40279, GSE41273, GSE42861, GSE51032, GSE55491, GSE59065, GSE61496, GSE74432, GSE81961, GSE97362), without batch effect correction (a; MAE: 3.0881) and with it (d; MAE: 2.7117); (b) PC1 (68.99%) vs PC2 (10.33%) of the control-probe PCA for the healthy individuals, coloured by batch; (c) MAE in the control vs number of PCs for the four combinations of CCC (yes/no) and batch correction (yes/no); optimal number of PCs: 17; optimal mean MAE: 2.7664; background correction: noob]

Fig. 2.13 Correcting for batch effects in the context of the epigenetic clock. a. Distribution of the epigenetic age acceleration (EAA) for the different batches of healthy individual samples, using the control model with cell composition correction (CCC) and before applying batch effect correction. The dashed black line represents EAA = 0, around which the distributions should be centred. b. Scatterplot showing the values of the first two principal components (PCs) for the healthy individual samples after performing PCA on the control probes of the 450K arrays. Each point corresponds to a different sample and the colours represent the different batches. Samples from the same batch cluster together in the PCA space, showing that the control probes indeed capture technical variation. Please note that all the PCA calculations were done using samples from both healthy individuals (full lifespan, N = 2218) and cases from developmental disorders (N = 666, see Chapter 3). c. Plot showing how the median absolute error (MAE) of the predictions in the healthy individual samples, which should tend to zero, is reduced when the PCs capturing the technical variation are included as part of the modelling strategy (see equations 2.16 and 2.17). The dashed line represents the optimal number of PCs (17) that was finally used. The optimal mean MAE is calculated as the average MAE between the green and purple lines. d. As in a., but after applying batch effect correction (i.e. equivalent to equation 2.16).

[Fig. 2.14: median EAA with CCC (years) vs median age (years) for each batch of healthy individuals; point size scale 200-600, colour indicates batch]

Fig. 2.14 After applying batch effect correction in the samples from the healthy individuals, deviations from a median epigenetic age acceleration (EAA) of zero (dotted black line) in some of the batches can be explained by other causes. The grey line separates, in the lower left corner, the atypical batches (Feb_2016, GSE104812, GSE41273, GSE55491), which have a small sample size and/or a low median age.

2.3 Behaviour of other epigenetic clocks during ageing

2.3.1 Hannum’s epigenetic clock

Besides Horvath’s epigenetic clock, other models have been proposed in the literature to measure the ageing process using DNA methylation. Among them, Hannum’s epigenetic clock has also been shown to accurately predict epigenetic age in several cohorts [Chen et al., 2016a; Horvath et al., 2016a; Irvin et al., 2018; Marioni et al., 2018, 2015; Perna et al., 2016]. Hannum’s model was originally trained in whole blood and uses a linear combination of β-values from 71 probes in the 450K array.
I calculated the epigenetic ages according to Hannum’s model (HannumAge), although I only used 68 out of the 71 probes (the other 3 were filtered out during my pre-processing). Hannum’s epigenetic clock performed quite accurately in the dataset of healthy individuals, although with a slight overestimation of the epigenetic ages (Fig. 2.15a), which has also been previously observed [Marioni et al., 2015]. Furthermore, the non-linear behaviour of Hannum’s clock is visible for young ages (≤ 20 years), for which the authors did not correct in their original publication [Hannum et al., 2013]. Horvath’s and Hannum’s epigenetic clocks are correlated (Fig. 2.15b). The magnitude of this correlation (HannumAge vs DNAmAge: PCC = 0.9778) was slightly stronger than the correlation between HannumAge and chronological age (PCC = 0.9756), which is consistent with both models measuring epigenetic age. Next, I estimated the epigenetic age acceleration (EAA) according to Hannum’s epigenetic clock, using models similar to the ones previously described (although in this case the dependent variable was HannumAge, see equations 2.16 and 2.17). The median absolute errors for Hannum’s model ($\mathrm{MAE}_{\text{with CCC}}$ = 2.8422 years, $\mathrm{MAE}_{\text{without CCC}}$ = 2.9484 years) were slightly higher than the ones obtained for Horvath’s clock ($\mathrm{MAE}_{\text{with CCC}}$ = 2.7117 years, $\mathrm{MAE}_{\text{without CCC}}$ = 2.8211 years), which could also be influenced by the fact that three of the model probes were not available. The EAAs estimated by Hannum’s and Horvath’s clocks showed a moderate correlation (Fig. 2.15c,d; PCC = 0.5952 with CCC and 0.5781 without CCC), consistent with previous estimates [Irvin et al., 2018]. Including cell composition correction slightly improved the correlation between the EAAs from both clocks, suggesting that Hannum’s clock is partly confounded by the changes in blood cell composition with age [Irvin et al., 2018; Marioni et al., 2015].
Overall, Hannum’s epigenetic clock performed well in my dataset. However, given that it produces slightly worse predictions than Horvath’s clock and could be partially tracking blood immunosenescence instead of multi-tissue ageing effects, I used the latter as my main proxy to measure the ageing process in this thesis. Finally, the data that were used to train Hannum’s model (GSE40279) are also part of the dataset of healthy individuals that I assembled and, therefore, this analysis does not constitute a completely independent assessment of the behaviour of Hannum’s epigenetic clock.

[Fig. 2.15, panels a-d: HannumAge (years) vs chronological age (years) (a) and vs DNAmAge (years) (b); Hannum EAA vs Horvath EAA (years) with CCC (c; PCC: 0.5952; p-value < 2.2e-16) and without CCC (d; PCC: 0.5781; p-value < 2.2e-16); full lifespan control, N = 2218]

Fig. 2.15 Behaviour of Hannum’s epigenetic clock in the healthy individuals. a. Scatterplot showing the relationship between the epigenetic age predicted with Hannum’s model (HannumAge) [Hannum et al., 2013] and chronological age for the healthy individuals. Each sample is represented by one point. The black dashed line represents the diagonal to aid visualisation. The solid brown line represents the linear model HannumAge ∼ Age. b. Relationship between the Hannum and Horvath epigenetic ages estimated for the same sample. The solid brown line represents the linear model HannumAge ∼ DNAmAge. c. Relationship between the epigenetic age acceleration (EAA) calculated with Hannum’s and Horvath’s epigenetic clocks. In this case the models include cell composition correction (CCC). The solid brown line represents the linear model Hannum_EAA$_{\text{with CCC}}$ ∼ Horvath_EAA$_{\text{with CCC}}$. d. As in c., but in this case the models do not include CCC.

2.3.2 Epigenetic mitotic clock: epiTOC

In 2016, Yang and colleagues conceived a novel type of epigenetic clock called epiTOC (epigenetic Timer Of Cancer), which measures the rate of (stem) cell division in both normal and cancerous tissues and is associated with cancer risk [Yang et al., 2016c]. This epigenetic mitotic clock tracks the gain in methylation levels that happens at 385 CpG sites located in the promoters of genes that are targeted by Polycomb Repressive Complex 2 (PRC2). Importantly, these CpG sites are unmethylated across fetal tissues, which provides a ground state against which to measure these changes over the human lifespan. I calculated the mitotic age (pcgtAge) of the healthy individuals in my dataset, although I only used 378 out of the 385 probes (the other 7 were filtered out during my pre-processing). The mitotic age of the individuals correlated with both chronological age (PCC = 0.5131, Fig. 2.16a) and DNAmAge (PCC = 0.5602, Fig. 2.16b), which is expected given that haematopoietic stem cells accumulate divisions over time [Beerman et al., 2013]. Furthermore, I estimated the epigenetic age acceleration (EAA) according to the epigenetic mitotic clock, using models similar to the ones previously described (although in this case the dependent variable was pcgtAge, see equations 2.16 and 2.17). Interestingly, the EAAs for pcgtAge and DNAmAge showed a small but highly statistically significant correlation (Fig. 2.16c,d; PCC = 0.279 with CCC and 0.2778 without CCC, p-value < 2.2e-16 in both cases). Moreover, I also carried out some preliminary work in which I calculated the DNAmAge of different healthy tissues (obtained from cancer patients).
I observed that tissues with a high turnover (such as breast) [Horvath, 2013a; Sehl et al., 2017] had a higher DNAmAge than tissues with a low turnover (data not shown). This was quite surprising given that Horvath’s epigenetic clock predicts age accurately across tissues with different turnover rates [Yang et al., 2016c]. Additionally, it has recently been demonstrated that DNAmAge increases linearly with cell passage in vitro if TERT (the catalytic subunit of telomerase) is expressed (although whether this also applies in vivo is unknown) [Lu et al., 2018]. All of this, together with the fact that pcgtAge correlates more strongly with DNAmAge than with chronological age (at least in this blood dataset), could suggest that Horvath’s epigenetic clock tracks cell division to a certain extent (although Horvath’s clock is clearly not primarily a mitotic clock). Furthermore, it is important to point out that the observed effect sizes are small, that some of these results could be confounded by variables that are difficult to account for (e.g. the higher DNAmAge in breast tissue could be due to hormonal factors) and that DNAmAge is not universally accelerated in cancer [Horvath, 2015]. Therefore, future studies should test these ideas further, which will hopefully improve our understanding of the contribution of cell division to Horvath’s epigenetic clock and its relation to the hypermethylation in PRC2-bound regions as measured by the epigenetic mitotic clock.
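The mitotic age used above can be sketched as follows when some clock probes are missing. This sketch assumes, as in the epiTOC publication [Yang et al., 2016c], that pcgtAge is the average β-value across the clock CpGs; here the average is simply taken over the probes that survive pre-processing (378 of 385 in this dataset). The function name and the probe identifiers in the test are hypothetical.

```python
import numpy as np


def pcgt_age(beta, probe_ids, clock_probes):
    """Mitotic age score: mean beta-value over the clock CpGs present in the data.

    beta:         (n_samples, n_probes) matrix of beta-values
    probe_ids:    column identifiers of `beta`
    clock_probes: identifiers of the epiTOC CpGs (385 in the original model)
    Returns (scores, n_used): one score per sample and the number of clock
    probes that were actually available.
    """
    index = {p: j for j, p in enumerate(probe_ids)}
    cols = [index[p] for p in clock_probes if p in index]  # skip filtered probes
    if not cols:
        raise ValueError("none of the clock probes are present")
    scores = np.asarray(beta, dtype=float)[:, cols].mean(axis=1)
    return scores, len(cols)
```

Returning `n_used` alongside the scores makes it easy to report how many clock probes contributed, as done in the text for both Hannum’s clock and epiTOC.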
[Fig. 2.16, panels a-d: pcgtAge vs chronological age (years) (a) and vs DNAmAge (years) (b); pcgtAge EAA vs Horvath EAA (years) with CCC (c; PCC: 0.279; p-value < 2.2e-16) and without CCC (d; PCC: 0.2778; p-value < 2.2e-16); full lifespan control, N = 2218]

Fig. 2.16 Behaviour of the epigenetic mitotic clock (epiTOC) in the healthy individuals. a. Scatterplot showing the relationship between mitotic age (pcgtAge) [Yang et al., 2016c] and chronological age for the healthy individuals. Each sample is represented by one point. The solid brown line represents the linear model pcgtAge ∼ Age. b. Relationship between pcgtAge and DNAmAge estimated for the same sample. The solid brown line represents the linear model pcgtAge ∼ DNAmAge. c. Relationship between the epigenetic age acceleration (EAA) calculated with the mitotic and Horvath’s epigenetic clocks. In this case the models include cell composition correction (CCC). The solid brown line represents the linear model pcgtAge_EAA$_{\text{with CCC}}$ ∼ Horvath_EAA$_{\text{with CCC}}$. d. As in c., but in this case the models do not include CCC.

2.4 Additional methods

A short introduction to the linear regression framework

Linear models are a broad class of statistical analyses that are at the core of many bioinformatic methods, including differential RNA expression analyses [Ritchie et al., 2015] and genome-wide association studies (GWAS) [Visscher et al., 2017]. One instance of such models is linear regression [Eaton, 2007], a statistical approach that allows modelling of the relationship between:

• A dependent variable $Y$, with observations $y_i \in \mathbb{R}$ and $i \in \{1, \dots, n\}$, where $n$ is the total number of observations (i.e. samples).
• One or more independent variables $X_j$, with observations $x_{ij} \in \mathbb{R}$ and $j \in \{1, \dots, k\}$, where $k$ is the total number of independent variables (a.k.a. covariates).

These variables can indicate, for example, whether a specific condition or phenotype is present in a given sample, quantify the effects of a continuous variable (such as chronological age) or adjust for batch effects, which gives this statistical framework great analytical flexibility [Ritchie et al., 2015]. We can describe the dependent variable $Y$ as a function of the independent variables $X_j$:

$$ y_i = \sum_{j=1}^{k} x_{ij}\,\beta_j + \varepsilon_i \qquad (2.19) $$

where $\beta_j$ are unknown parameters that need to be estimated from the data and $\varepsilon_i$ is the random error. In matrix form:

$$ \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \qquad (2.20) $$

where $\mathbf{y} \in \mathbb{R}^n$ is the vector $(y_1, \dots, y_n)$, $X \in \mathbb{R}^{n \times k}$ is the $n \times k$ matrix of $x_{ij}$’s, $\boldsymbol{\beta} \in \mathbb{R}^k$ is the vector $(\beta_1, \dots, \beta_k)$ and $\boldsymbol{\varepsilon} \in \mathbb{R}^n$ is the vector $(\varepsilon_1, \dots, \varepsilon_n)$.

Assuming that $E(\boldsymbol{\varepsilon}) = 0$, $\operatorname{Var}(\varepsilon_i) = \sigma^2 > 0$ and $\operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 I_n$ (where $I_n$ is the $n \times n$ identity matrix), the Gauss-Markov theorem [Eaton, 2007] shows that the best linear unbiased estimator of $\boldsymbol{\beta}$ is (provided that $X$ has full column rank):

$$ \hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{y} \qquad (2.21) $$

where $X'$ is the transpose of $X$ and $\hat{\boldsymbol{\beta}}$ is the least-squares estimator of $\boldsymbol{\beta}$, since it minimises:

$$ \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{k} x_{ij}\,\hat{\beta}_j\right)^2 \qquad (2.22) $$

It is possible to test whether there is a statistically significant linear association between the dependent variable ($Y$) and one of the independent variables ($X_j$), i.e. to test:

$$ H_0: \beta_j = 0 \quad \text{against} \quad H_A: \beta_j \neq 0 \qquad (2.23) $$

where $H_0$ is the null hypothesis and $H_A$ is the alternative hypothesis. A t-statistic ($T$) can be derived after fitting the linear regression model [Sheather, 2009]:

$$ T = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)} \qquad (2.24) $$

where $\operatorname{se}(\hat{\beta}_j)$ is the standard error of $\hat{\beta}_j$. If, in addition, the errors are normally distributed, then when $H_0$ is true the statistic $T$ follows a Student’s t distribution with $n - k$ degrees of freedom, i.e. $T \sim t_{n-k}$. This allows the p-value for the linear association of $Y$ with a given $X_j$ to be estimated.
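Equations 2.21-2.24 translate directly into a few lines of numpy. This is a sketch for illustration (the function name is hypothetical); it forms $(X'X)^{-1}$ explicitly to mirror equation 2.21, whereas a QR- or SVD-based solver such as `np.linalg.lstsq` is numerically preferable in practice.

```python
import numpy as np


def ols_t_tests(X, y):
    """Least-squares fit with per-coefficient t-statistics (equations 2.21-2.24).

    X: (n, k) design matrix (include a column of ones for an intercept)
    y: (n,) response
    Returns beta_hat, standard errors, t-statistics and the residual
    degrees of freedom n - k.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)       # requires X to have full column rank
    beta_hat = XtX_inv @ X.T @ y           # (X'X)^{-1} X'y, equation 2.21
    resid = y - X @ beta_hat
    df = n - k
    sigma2_hat = resid @ resid / df        # unbiased estimate of sigma^2
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    t = beta_hat / se                      # equation 2.24
    return beta_hat, se, t, df
```

A two-sided p-value for each coefficient can then be obtained from the $t_{n-k}$ distribution, for example with `2 * scipy.stats.t.sf(np.abs(t), df)` if SciPy is available.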
Finally, it is worth explaining the nomenclature that I use for the linear regression models throughout this thesis. For example, the following model fits a linear association between the dependent variable (e.g. the β-value at a specific CpG probe in the array) and three covariates (e.g. age, sex and disease status), including an intercept:

$$ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & x_{12} & x_{13} \\ 1 & x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n3} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} \qquad (2.25) $$

where $y_i$ is the β-value at a certain CpG probe for the $i$th sample, $x_{i1}$ is the age of the $i$th sample, $x_{i2}$ is the sex (e.g. 0 for male and 1 for female) of the $i$th sample, $x_{i3}$ is the disease status (e.g. 0 for a healthy individual and 1 for an individual with a disease) of the $i$th sample, $\beta_0$ is the intercept coefficient, $\beta_j$ are the covariate coefficients ($j = 1$ for age, $j = 2$ for sex, $j = 3$ for disease status) and $\varepsilon_i$ is the error for the $i$th sample. Throughout this thesis, I use the following nomenclature to describe the model above (‘R-style’ nomenclature):

$$ \text{Beta} \sim \text{Age} + \text{Sex} + \text{Disease\_status} \qquad (2.26) $$

Chapter 3 Biological aspects

‘At a fundamental level evolutionary survival is the preservation of a dynamic balance between information, or order, and entropy, or disorder.’
T. B. L. Kirkwood [1977]

Declaration

This chapter is mainly the product of my own work. Additionally, I would like to recognise the contributions of Janet M. Thornton, Wolf Reik and Thomas M. Stubbs (who helped design the study and interpret the data), Erfan Aref-Eshghi (who ran some of the analyses using my code and provided part of the samples in the dataset), Marc Jan Bonder and Oliver Stegle (who provided statistical input) and Bekim Sadikovic (who provided part of the samples in the dataset). All of them also helped in the revision of the final text. This work has been published in the journal Genome Biology [Martin-Herranz et al., 2019].
3.1 Background

Epigenetic clocks can be understood as a proxy to quantify the changes of the epigenome with age. However, little is known about the molecular mechanisms that determine the rate of the underlying epigenetic ageing clock (see section 1.3.3). Steve Horvath proposed that the multi-tissue epigenetic clock captures the workings of an epigenetic maintenance system [Horvath, 2013a]. Recent GWAS have found several genetic variants associated with epigenetic age acceleration in genes such as TERT (the catalytic subunit of telomerase) [Lu et al., 2018], DHX57 (an ATP-dependent RNA helicase) [Lu et al., 2016] or MLST8 (a subunit of both the mTORC1 and mTORC2 complexes) [Lu et al., 2016]. Nevertheless, to my knowledge, no genetic variants in epigenetic modifiers have been found, and the molecular nature of this hypothetical system remains unknown.

I decided to take a reverse genetics approach and look at the behaviour of the epigenetic clock in patients with developmental disorders, many of which are caused by mutations in genes encoding components of the epigenetic machinery [Aref-Eshghi et al., 2018b; Bjornsson, 2015]. I performed an unbiased screen for epigenetic age acceleration and found that Sotos syndrome accelerates epigenetic ageing, potentially revealing a role of H3K36 methylation maintenance in the regulation of the rate of the epigenetic clock.

3.2 Screening for genes that accelerate the epigenetic ageing clock

The main goal of this analysis is to identify genes, mainly components of the epigenetic machinery, that can affect the rate of epigenetic ageing in humans (as measured by Horvath’s epigenetic clock) [Horvath, 2013a]. For this purpose, I assembled a dataset with all the DNA methylation data from patients with different developmental disorders that I could find, in order to perform an unbiased screen.
This dataset combines samples publicly available in GEO [Edgar et al., 2002] with in-house data generated by my collaborators at the London Health Sciences Centre, Canada (Table S2.1, Fig. S2.1). All these data were generated from blood using the Illumina 450K methylation array, as in the case of the healthy individuals described in Chapter 2. Many of these developmental syndromes have overlapping clinical features [Aref-Eshghi et al., 2018b; Bjornsson, 2015]. Furthermore, in some cases with a clinical diagnosis, the genetic cause remains unknown, probably due to locus heterogeneity or the difficulty of assessing the clinical significance of some genetic variants [Aref-Eshghi et al., 2017]. Therefore, several studies have explored the ability of DNA methylation signatures to aid the differential diagnosis of these syndromes [Aldinger et al., 2013; Alisch et al., 2013; Aref-Eshghi et al., 2018a,b, 2017; Butcher et al., 2017; Choufani et al., 2015; Grafodatskaya et al., 2013; Hood et al., 2016; Kernohan et al., 2016; Schenkel et al., 2017, 2016]. Given that most developmental disorders are diagnosed early in life, this dataset is biased towards younger ages (Fig. 3.1). In order to maximise the ability to detect ageing-associated effects, I kept only those developmental disorders with at least 5 samples, of which at least 2 had a chronological age ≥ 20 years (the age from which Horvath's model considers humans to be adults) [Horvath, 2013a]. This filtering resulted in a dataset for the main screen with N = 367 samples from cases, with ages between 0 and 55 years (Fig. 3.2, Table 3.1).

[Fig. 3.1: density (y-axis) against chronological age in years (x-axis); Cases: N = 367.]

Fig. 3.1 Histogram showing the chronological age distribution for all the individuals with developmental disorders (cases) included in the final dataset (i.e. after QC and filtering).
The blue line represents the 1D kernel density estimate, as calculated by the stat_density function of the ggplot2 R package with default parameters.

The purpose of the screen is to test whether the epigenetic ages of the samples from a given developmental disorder (cases) deviate from their chronological ages, i.e. to identify those developmental disorders that present epigenetic age acceleration (EAA). For a given sample, a positive EAA indicates that its epigenetic (biological) age is higher than expected for someone of that chronological age; in other words, the epigenome of that person resembles the epigenome of an older individual. Conversely, a negative EAA indicates that the epigenome looks younger than expected. I calculated the epigenetic ages (DNAmAge) of all the samples according to Horvath's epigenetic clock (see section 2.2.1) and fitted the control models to the samples from the healthy individuals, including models with and without blood cell composition correction (CCC) and always accounting for potential batch effects (see equations 2.16 and 2.17).
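As a minimal sketch of the EAA definition, assume a simplified control model DNAmAge ∼ Age (the cell composition and batch covariates of equations 2.16 and 2.17 are deliberately omitted here). The EAA of a control sample is then the residual of the model fitted to the healthy samples, and the EAA of a case sample is the difference between its DNAmAge and the control model's prediction at its chronological age. The function names and toy ages below are hypothetical and only illustrate the arithmetic.

```python
import numpy as np


def fit_control_model(age, dnam_age):
    """Fit the simplified control model DNAmAge ~ Age by ordinary least squares.

    Returns (intercept, slope). Cell composition and batch covariates are
    omitted in this sketch.
    """
    X = np.column_stack([np.ones_like(age), age])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(X, dnam_age, rcond=None)
    return coef[0], coef[1]


def epigenetic_age_acceleration(age, dnam_age, intercept, slope):
    """EAA = observed DNAmAge minus the DNAmAge predicted by the control model."""
    predicted = intercept + slope * age
    return dnam_age - predicted


# Hypothetical healthy controls (chronological age and DNAmAge, in years).
control_age = np.array([5.0, 15.0, 25.0, 35.0, 45.0])
control_dnam = np.array([6.0, 15.0, 24.0, 35.0, 45.0])
b0, b1 = fit_control_model(control_age, control_dnam)

# Control EAA: residuals of the control model.
control_eaa = epigenetic_age_acceleration(control_age, control_dnam, b0, b1)

# Case EAA: deviation from the same control model (hypothetical cases).
case_age = np.array([10.0, 30.0])
case_dnam = np.array([17.0, 38.0])
case_eaa = epigenetic_age_acceleration(case_age, case_dnam, b0, b1)
print(np.round(case_eaa, 2))
```

Because the control model is fitted by ordinary least squares with an intercept, the control EAA values average to zero, so a positive case EAA is read relative to the healthy population rather than relative to the diagonal DNAmAge = Age.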
As previously discussed (see section 2.2.2), because Horvath's model underestimates the epigenetic age of old samples, the age distribution of the control samples can have an impact on the results of the screen. Therefore, I filtered the healthy individual samples to match the age range of the developmental disorders (0-55 years, N = 1128, see Fig. 3.2). The EAA for the control samples corresponds to the residuals from the control models (see section 2.2.2). On the other hand, the EAA for a case sample is calculated as the difference between its epigenetic age (DNAmAge) and the value predicted by the corresponding control model (with or without cell composition correction). Finally, I compared the EAA distribution of each developmental disorder against the EAA distribution of the healthy controls using the non-parametric two-sided Wilcoxon's test. P-values were adjusted for multiple testing using Bonferroni correction and a significance level of α = 0.01 was applied. It is worth mentioning that some of the developmental disorders included in the screen (such as autism spectrum disorder or Coffin-Lowry syndrome) are not necessarily caused by alterations in the epigenetic machinery, but were still included to maintain the unbiased nature of the screen.

| Developmental disorder | Gene(s) involved | Gene(s) function | Molecular cause | N | Age range (years) |
|---|---|---|---|---|---|
| Angelman | UBE3A | Ubiquitin protein ligase E3A | Imprinting, mutation | 14 | 1 to 55 |
| Autism spectrum disorder (ASD) | - | - | - | 119 | 1.83 to 35.16 |
| Alpha thalassemia/mental retardation X-linked syndrome (ATR-X) | ATRX | Chromatin remodelling | Mutation | 15 | 0.7 to 27 |
| Claes-Jensen | KDM5C | H3K4 demethylase | Mutation | 10 | 2 to 42 |
| Coffin-Lowry | RPS6KA3 | Serine/threonine kinase | Mutation | 10 | 1.3 to 22.8 |
| Floating-Harbour | SRCAP | Chromatin remodelling | Mutation | 17 | 4 to 42 |
| Fragile X syndrome (FXS) | FMR1 | Translational control | Mutation (CGG expansion) | 32 | 0.08 to 48 |
| Kabuki | KMT2D | H3K4 methyltransferase | Mutation | 46 | 0 to 24.1 |
| Noonan | PTPN11, RAF1, SOS1 | RAS/MAPK signalling | Mutation | 15, 11, 14 | 0.2 to 49 |
| Rett | MECP2 | Transcriptional repression | Mutation | 15 | 1 to 34 |
| Saethre-Chotzen | TWIST1 | Transcription factor | Mutation | 22 | 0 to 38 |
| Sotos | NSD1 | H3K36 methyltransferase | Mutation | 20 | 1.6 to 41 |
| Weaver | EZH2 | H3K27 methyltransferase | Mutation | 7 | 2.58 to 43 |
| Total | - | - | - | 367 | 0 to 55 |

Table 3.1 Overview of the developmental disorders that were included in the screening after quality control and filtering (total N = 367).

[Fig. 3.2 flow diagram: DNA methylation data (IDAT files) for controls (healthy samples, N = 2218) and cases (23 developmental disorders, N = 666) → QC and filtering → controls (N = 1128) and cases (13 developmental disorders, N = 367) → benchmarking of pre-processing strategies and correcting for batch effects; calculating cell composition in blood; screening for epigenetic age acceleration (EAA) using Horvath's epigenetic clock; calculating pcgtAge using the epigenetic mitotic clock; enrichment of (epi)genomic features in DMPs; calculating Shannon entropy.]

Fig. 3.2 Flow diagram that portrays an overview of the different analyses carried out on the raw DNA methylation data (IDAT files) from human blood for cases (developmental disorder samples) and controls (healthy samples). The control samples are filtered to match the age range of the cases (0-55 years). The cases are filtered based on the number of 'adult' samples available (for each disorder, at least 5 samples, with at least 2 of them aged ≥ 20 years). QC: quality control. DMPs: differentially methylated positions.

3.3 Sotos syndrome accelerates epigenetic ageing

The results from the screen are portrayed in Fig. 3.3.
Most syndromes do not show evidence of altered epigenetic ageing, but Sotos syndrome presents a clear positive EAA (median EAA of +7.64 years with CCC and +7.16 years without CCC), with p-values considerably below the significance level of 0.01 after Bonferroni correction (corrected p-values of 3.40 · 10⁻⁹ with CCC and 2.61 · 10⁻⁷ without CCC). Additionally, Kabuki syndrome reaches significance with a negative EAA (median EAA of −1.78 years with CCC and −2.25 years without CCC; corrected p-values of 0.0011 with CCC and 0.0035 without CCC), and Rett syndrome with a positive EAA (median EAA of +2.68 years with CCC and +2.46 years without CCC; corrected p-values of 0.0069 with CCC and 0.0251 without CCC), although for Rett syndrome only the model with CCC falls below the 0.01 threshold. Finally, fragile X syndrome (FXS) shows a positive EAA trend (median EAA of +2.44 years with CCC and +2.88 years without CCC) that does not reach significance in the screen (corrected p-values of 0.0680 with CCC and 0.0693 without CCC).

Next, I tested how changing the median age used to build the healthy control model (i.e. the median age of the controls) affects the screening results (Fig. S2.2). Sotos syndrome is robust to these changes, whilst Rett, Kabuki and FXS are much more sensitive to the control model used. This again highlights the importance of choosing an appropriate age-matched control when testing for epigenetic age acceleration, given that Horvath's epigenetic clock underestimates epigenetic age at advanced chronological ages [El Khoury et al., 2018; Marioni et al., 2018]. Moreover, all but one of the Sotos syndrome patients (19/20 = 95%) show an EAA (with CCC) that deviates in the same direction (Fig. 3.4a,b), which is not the case for the rest of the disorders, with the exception of Rett syndrome (Fig. S2.3).
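The comparison step of the screen can be sketched as follows, assuming the EAA values are already available for each disorder and for the controls. The sketch uses scipy.stats.mannwhitneyu for the two-sided test (the Mann-Whitney U test is equivalent to the Wilcoxon rank-sum test; p-values may differ slightly from R's wilcox.test in how ties and exact p-values are handled) and applies Bonferroni correction by multiplying each p-value by the number of disorders tested, capped at 1, before comparing it with α = 0.01. The disorder labels and simulated EAA values are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def screen_eaa(case_eaa_by_disorder, control_eaa, alpha=0.01):
    """Two-sided rank-sum test of each disorder's EAA against the controls,
    with Bonferroni correction across all disorders tested."""
    n_tests = len(case_eaa_by_disorder)
    results = {}
    for disorder, case_eaa in case_eaa_by_disorder.items():
        _, p_raw = mannwhitneyu(case_eaa, control_eaa, alternative="two-sided")
        p_corrected = float(min(1.0, p_raw * n_tests))  # Bonferroni adjustment
        results[disorder] = {
            "median_eaa": float(np.median(case_eaa)),
            "p_corrected": p_corrected,
            "significant": bool(p_corrected < alpha),
        }
    return results


rng = np.random.default_rng(0)
control = rng.normal(0.0, 4.0, size=500)  # hypothetical control EAA (years)
cases = {
    "disorder_A": rng.normal(7.0, 4.0, size=20),  # simulated upward shift
    "disorder_B": rng.normal(0.0, 4.0, size=20),  # simulated, no shift
}
for name, res in screen_eaa(cases, control).items():
    print(name, round(res["median_eaa"], 2), res["p_corrected"], res["significant"])
```

Reporting the median EAA alongside the corrected p-value mirrors the two panels of the screen: the test says whether a disorder deviates, and the median says in which direction.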
Even though these data suggest that some methylomic changes are already present at birth, the EAA seems to increase with age in Sotos patients (Fig. 3.4b; p-values for the slope coefficient of the EAA ∼ Age linear regression: 0.00569 with CCC and 0.00514 without CCC). This could imply that at least some of the changes that normally affect the epigenome with age happen at a faster rate in Sotos syndrome patients throughout their lifespan (as opposed to the idea that the Sotos epigenetic changes are only acquired during prenatal development and remain constant afterwards).

Finally, I investigated whether Sotos syndrome leads to a higher rate of (stem) cell division in blood when compared with the healthy population. I employed the epigenetic mitotic clock, which makes use of the fact that some CpGs in promoters bound by Polycomb group proteins become hypermethylated with age (captured by a metric called pcgtAge; see section 2.3.2). This hypermethylation correlates with the number of cell divisions in the tissue and is also associated with an increased cancer risk [Yang et al., 2016c]. I calculated pcgtAge for the Sotos samples and compared them with the healthy controls (using a model similar to the one in equation 2.16, but with pcgtAge as the dependent variable; see section 2.3.2). I found a trend suggesting that the epigenetic mitotic clock might be accelerated in Sotos patients (p-value = 0.0112, Fig. 3.4c,d), which could explain the higher cancer predisposition (e.g. to acute leukemia, sacrococcygeal teratoma or neuroblastoma) reported in these patients and might relate to their overgrowth [Leventopoulos et al., 2009].

[Fig. 3.3 panels: upper, −log10(p-value) per developmental disorder (Angelman, ASD, ATR-X, Claes-Jensen, Coffin-Lowry, Floating-Harbour, FXS, Kabuki, Noonan_PTPN11, Noonan_RAF1, Noonan_SOS1, Rett, Saethre-Chotzen, Sotos, Weaver) for the EAA models with and without CCC; lower, epigenetic age acceleration (years). Control: age range 0-55 years, median age 34 years, N = 1128.]

Fig. 3.3 Screening for epigenetic age acceleration (EAA) in developmental disorders. The upper panel shows the p-values derived from comparing the EAA distributions for the samples in a given developmental disorder and the control (two-sided Wilcoxon's test). The dashed green line displays the significance level of α = 0.01 after Bonferroni correction; the bars above the green line reach statistical significance. The lower panel displays the actual EAA distributions, which allows assessing the direction of the EAA (positive or negative). In red: EAA model with cell composition correction (CCC). In blue: EAA model without CCC. ASD: autism spectrum disorder. ATR-X: alpha thalassemia/mental retardation X-linked syndrome. FXS: fragile X syndrome.

[Fig. 3.4 panels (Control: N = 1128; Sotos: N = 20), each plotted against chronological age (years): a. DNAmAge (years); b. EAA with CCC (years); c. pcgtAge; d. pcgtAge acceleration.]

Fig. 3.4 Sotos syndrome accelerates epigenetic ageing. a. Scatterplot showing the relationship between epigenetic age (DNAmAge) according to Horvath's model [Horvath, 2013a] and chronological age of the samples for Sotos (orange) and control (grey). Each sample is represented by one point. The black dashed line represents the diagonal to aid visualisation. b. Scatterplot showing the relationship between the epigenetic age acceleration (EAA) and chronological age of the samples for Sotos (orange) and control (grey). Each sample is represented by one point. The yellow line represents the linear model EAA ∼ Age, with the standard error shown in the light yellow shade. c. Scatterplot showing the relationship between the score for the epigenetic mitotic clock (pcgtAge) [Yang et al., 2016c] and chronological age of the samples for Sotos (orange) and control (grey). Each sample is represented by one point. A higher value of pcgtAge is associated with a higher number of cell divisions in the tissue. d. Scatterplot showing the relationship between the epigenetic mitotic clock (pcgtAge) acceleration (with CCC) and chronological age of the samples for Sotos (orange) and control (grey). Each sample is represented by one point. The yellow line represents the linear model pcgtAge_EAA_withCCC ∼ Age, with the standard error shown in the light yellow shade.

Consequently, I report that individuals with Sotos syndrome present an accelerated epigenetic age, which makes their epigenome look, on average, more than 7 years older than expected.
These changes seem to be the consequence of a higher ticking rate of the epigenetic ageing clock (or at least part of its machinery), with epigenetic age acceleration increasing during lifespan: the youngest Sotos patient (1.6 years) has an EAA (with CCC) of 5.43 years, whereas the oldest (41 years) has an EAA (with CCC) of 24.53 years. Additionally, Rett syndrome, Kabuki syndrome and fragile X syndrome could also have altered epigenetic ages, but more evidence is required to confirm this.

3.4 Comparing Sotos syndrome and physiological ageing

Sotos syndrome is caused by heterozygous loss-of-function mutations in the NSD1 gene, which encodes a histone H3K36 methyltransferase [Choufani et al., 2015; Kurotaki et al., 2002]. These mutations lead to a specific DNA methylation signature in Sotos patients, potentially due to the crosstalk between the histone and DNA methylation machinery [Choufani et al., 2015]. In order to gain a more detailed picture of the reported epigenetic age acceleration, I decided to compare the genome-wide (or at least array-wide) changes observed in the methylome during ageing with those observed in Sotos syndrome. For this purpose, I identified differentially methylated positions (DMPs) for both conditions, using the models that account for cell composition (see equations 2.10 and 3.1). In this case, ageing DMPs (aDMPs) were calculated using the healthy samples in the age range 0-55 years. aDMPs were composed almost equally of CpG sites that gain methylation with age (i.e. become hypermethylated, 51.69%) and CpG sites that lose methylation with age (i.e. become hypomethylated, 48.31%, barplot in Fig. 3.5a), a picture that resembles previous studies [Zhu et al., 2018].
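The hyper/hypo split reported above can be sketched as follows, assuming that a per-CpG coefficient for the variable of interest (age for aDMPs, disease status for Sotos DMPs) and an adjusted p-value are already available from the models in equations 2.10 and 3.1. The 0.01 threshold, the function name and the toy values are assumptions for illustration, not the settings used in this chapter.

```python
import numpy as np


def summarise_dmp_directions(coefficients, p_adjusted, threshold=0.01):
    """Split significant CpGs into hyper- and hypomethylated DMPs by the sign of
    the coefficient of interest and return the counts as percentages.

    The threshold is a hypothetical choice for this sketch.
    """
    coefficients = np.asarray(coefficients, dtype=float)
    p_adjusted = np.asarray(p_adjusted, dtype=float)
    is_dmp = p_adjusted < threshold
    n_hyper = int(np.sum(is_dmp & (coefficients > 0)))  # gain methylation
    n_hypo = int(np.sum(is_dmp & (coefficients < 0)))   # lose methylation
    n_dmps = n_hyper + n_hypo
    return {
        "n_dmps": n_dmps,
        "pct_hyper": 100.0 * n_hyper / n_dmps if n_dmps else float("nan"),
        "pct_hypo": 100.0 * n_hypo / n_dmps if n_dmps else float("nan"),
    }


# Hypothetical age coefficients (change in beta-value per year) and adjusted p-values.
age_coef = [0.002, -0.001, 0.0005, -0.003, 0.0001]
p_adj = [1e-5, 1e-4, 0.2, 1e-6, 0.5]
print(summarise_dmp_directions(age_coef, p_adj))
```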
It is worth mentioning that fewer aDMPs were identified here than in the full-lifespan analysis presented in section 2.1.4, in which, moreover, hypomethylated aDMPs were slightly more frequent than hypermethylated ones. This highlights the influence of the age range and/or the sample size when calculating aDMPs. In contrast, DMPs in Sotos were clearly dominated by CpGs with lower methylation levels in individuals with the syndrome (i.e. hypomethylated, 99.27%, barplot in Fig. 3.5a). This is highly consistent with the results of a previous report, where 99.3% of the Sotos DMPs identified (in that case applying a filter of >20% difference in average DNA methylation levels) were hypomethylated in Sotos patients [Choufani et al., 2015]. It is important to point out that Sotos syndrome patients and healthy control samples were matched for age and sex in both differential analyses. Furthermore, in my analysis I included age and sex as covariates in the linear model (see equation 3.1), which minimises the chance that the Sotos DMPs are driven by age (i.e. that they simply constitute ageing DMPs).

Then, I compared the intersections between the hypermethylated and hypomethylated DMPs in ageing and Sotos. Most of the DMPs were specific to ageing or to Sotos (i.e. they did not overlap), but a subset of them were shared (table in Fig. 3.5a). Interestingly, there were 1728 DMPs that became hypomethylated both during ageing and in Sotos ('Hypo-Hypo DMPs'). This subset is of special interest because it could be used to understand in more depth some of the mechanisms that drive hypomethylation during physiological ageing. Thus, I tested whether the different subsets of DMPs are enriched or depleted in specific genomic contexts (Fig. S2.4, Fig. S2.5).
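The intersection and enrichment steps can be sketched with a 2×2 contingency table per feature (CpGs in a DMP subset versus a control set, annotated or not with the feature), tested with Fisher's exact test as reported in Fig. 3.5b. This sketch assumes that DMP subsets, the control set and the feature annotation are available as sets of CpG probe IDs; the IDs, the background set and the 'enhancer' annotation below are hypothetical, and the control sets actually used are defined in section 3.7.

```python
from scipy.stats import fisher_exact


def feature_enrichment(subset, control, feature_cpgs):
    """Fisher's exact test for enrichment of a genomic feature in a DMP subset.

    subset, control: disjoint sets of CpG IDs (DMPs vs. background CpGs).
    feature_cpgs: set of CpG IDs annotated with the feature.
    Returns (odds ratio, two-sided p-value); OR > 1 means enrichment.
    """
    a = len(subset & feature_cpgs)   # in subset, with feature
    b = len(subset - feature_cpgs)   # in subset, without feature
    c = len(control & feature_cpgs)  # in control, with feature
    d = len(control - feature_cpgs)  # in control, without feature
    odds_ratio, p_value = fisher_exact([[a, b], [c, d]], alternative="two-sided")
    return odds_ratio, p_value


# 'Hypo-Hypo'-style subset: intersection of two hypomethylated DMP sets (hypothetical IDs).
hypo_admps = {"cg01", "cg02", "cg03", "cg04"}
hypo_sotos = {"cg02", "cg03", "cg04", "cg05"}
hypo_hypo = hypo_admps & hypo_sotos

background = {f"cg{i:02d}" for i in range(6, 20)}  # hypothetical non-DMP CpGs
enhancer = {"cg02", "cg03", "cg06", "cg07"}        # hypothetical feature annotation
print(feature_enrichment(hypo_hypo, background, enhancer))
```

In this toy example 2 of 3 subset CpGs but only 2 of 14 background CpGs carry the feature, giving an odds ratio of (2 × 12) / (1 × 2) = 12.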
DMPs that are hypomethylated during ageing and in Sotos were both enriched (odds ratio > 1) in enhancer categories (such as 'active enhancer 1' or 'weak enhancer 1'; see section 3.7 for the chromatin state model used, derived from the K562 cell line) and depleted (odds ratio < 1) in active transcription categories (such as 'active TSS' or 'strong transcription'), a pattern that was also observed in the 'Hypo-Hypo DMPs' subset (Fig. 3.5b). Notably, age-related hypomethylation in enhancers seems to be a characteristic of both humans [Slieker et al., 2018, 2016] and mice [Cole et al., 2017b]. Furthermore, both de novo DNA methyltransferases (DNMT3A and DNMT3B) have been shown to bind to active enhancers in an H3K36me3-dependent manner [Rinaldi et al., 2016], consistent with these results. When looking at the levels of total RNA expression (rRNA-depleted) in blood, I confirmed a significant reduction in the RNA levels around these hypomethylated DMPs when compared with the control sets (Fig. 3.5c; see section 3.7 for details on how the control sets were defined). Interestingly, hypomethylated DMPs in both ageing and Sotos were depleted in gene bodies (Fig. 3.5b) and were located in areas with lower levels of H3K36me3 when compared with the control sets (Fig. 3.5d, Fig. S2.5).
Moreover, hypomethylated aDMPs and hypomethylated Sotos DMPs were both generally enriched or depleted for the same histone marks in blood (Fig. S2.5), which adds weight to the hypothesis that they share the same genomic context and could become hypomethylated through similar molecular mechanisms.

[Fig. 3.5 panels: a. barplot of the number of DMPs (Ageing, Sotos) by methylation change (hypermethylated, hypomethylated), with intersection table (Hyper aDMPs: 29 shared with Hyper Sotos DMPs, 2550 with Hypo Sotos DMPs; Hypo aDMPs: 7 shared with Hyper Sotos DMPs, 1728 with Hypo Sotos DMPs); b. odds ratio (log scale) per (epi)genomic feature for Hypo aDMPs, Hypo Sotos DMPs and Hypo-Hypo DMPs, coloured by −log10(p-value); c. NRE (feature: RNA) and d. NFC (feature: H3K36me3) for control vs in-subset CpGs.]

Fig. 3.5 Comparison between the DNA methylation changes during physiological ageing and in Sotos. a. On the left: barplot showing the total number of differentially methylated positions (DMPs) found during physiological ageing and in Sotos syndrome. CpG sites that increase their methylation levels with age in the healthy population, or those that are elevated in Sotos patients (when compared with a control), are displayed in red. Conversely, CpG sites that decrease their methylation levels are displayed in blue. On the right: table that represents the intersection between the ageing (aDMPs) and the Sotos DMPs. The subset resulting from the intersection between the hypomethylated DMPs in ageing and Sotos is called the 'Hypo-Hypo DMPs' subset (N = 1728). b. Enrichment for the categorical (epi)genomic features considered when comparing the different genome-wide subsets of DMPs in ageing and Sotos against a control (see section 3.7). The y-axis represents the odds ratio (OR), the error bars show the 95% confidence interval for the OR estimate and the colour of the points codes for −log10(p-value) obtained after testing for enrichment using Fisher's exact test. An OR > 1 shows that the given feature is enriched in the subset of DMPs considered, whilst an OR < 1 shows that it is found less often than expected. In grey: features that did not reach significance at α = 0.01 after Bonferroni correction. c. Boxplots showing the distributions of the 'normalised RNA expression' (NRE) when comparing the different genome-wide subsets of DMPs in ageing and Sotos against a control (see section 3.7). NRE represents the normalised mean transcript abundance in a window of ± 200 bp around the CpG site (DMP) being considered. The p-values (two-sided Wilcoxon's test, before multiple testing correction) are shown above the boxplots. The number of DMPs belonging to each subset (in green) and the median value of the feature score (in dark red) are shown below the boxplots. d. As in c., but showing the 'normalised fold change' (NFC) for the H3K36me3 histone modification (representing the normalised mean ChIP-seq fold change for H3K36me3 in a window of ± 200 bp around the DMP being considered).

Intriguingly, I also identified a subset of DMPs (2550) that were hypermethylated during ageing and hypomethylated in Sotos (Fig. 3.5a). These 'Hyper-Hypo DMPs' seem to be enriched for categories such as 'bivalent promoter' and 'repressed polycomb' (Fig. S2.4), which are normally associated with developmental genes [Bernhart et al., 2016; Bernstein et al., 2006]. These categories are also a defining characteristic of the hypermethylated aDMPs, highlighting that even though the direction of the DNA methylation changes differs in some ageing and Sotos DMPs, the genomic context in which they happen is shared.

Finally, I looked at the DNA methylation patterns of the 353 CpG sites in Horvath's epigenetic clock for the Sotos samples. For each clock CpG site, I modelled the changes in DNA methylation with age in the healthy control individuals (0-55 years) and then calculated the deviations from these patterns for the Sotos samples (Fig. 3.6, see equation 3.3). As expected, the landscape of clock CpG sites is dominated by hypomethylation in the Sotos samples, although only a small fraction of the clock CpG sites seems to be significantly affected (Fig. 3.6c). Overall, this analysis confirmed the trends reported for the genome-wide analysis (Fig. S2.6, Fig. S2.7, Fig. S2.8). However, given the much smaller number of CpG sites considered, very few comparisons reached significance.
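The per-CpG deviation idea can be sketched with a simple linear model β ∼ Age fitted to the controls for one clock CpG, followed by the deviation (observed minus predicted β-value) of each Sotos sample. Equation 3.3 is not reproduced in this section, so the linear form is an assumption and this should be read as an illustration rather than the exact model; in practice the function would be applied to each of the 353 clock CpGs in turn. The function name and data are hypothetical.

```python
import numpy as np


def clock_cpg_deviations(control_age, control_beta, case_age, case_beta):
    """For one CpG: fit beta ~ age in the controls (least squares, degree 1),
    then return each case sample's deviation (observed minus predicted beta)."""
    slope, intercept = np.polyfit(control_age, control_beta, deg=1)
    predicted = intercept + slope * np.asarray(case_age, dtype=float)
    return np.asarray(case_beta, dtype=float) - predicted


# Hypothetical data for a single clock CpG whose methylation rises with age.
control_age = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
control_beta = 0.30 + 0.004 * control_age  # perfectly linear toy controls
case_age = np.array([5.0, 25.0, 40.0])
case_beta = np.array([0.25, 0.33, 0.40])

dev = clock_cpg_deviations(control_age, control_beta, case_age, case_beta)
# Negative deviations mean lower methylation than expected for that age.
print(np.round(dev, 3))
```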
I have shown that the ageing process and Sotos syndrome share a subset of hypomethylated CpG sites that is characterised by an enrichment in enhancer features and a depletion of active transcription features. This highlights the usefulness of developmental disorders as a model to study the mechanisms that may drive the changes in the methylome with age, since they permit stratification of the ageing DMPs into different functional categories that are associated with alterations in the function of specific genes, and hence with specific molecular components of the epigenetic ageing clock.

3.5 Methylation Shannon entropy and the epigenetic clock

In section 2.1.5 I discussed how Shannon entropy can be applied to DNA methylation data in order to measure the genome-wide loss of epigenetic information that happens during ageing. It is possible to apply a methodology similar to the one described in section 2.2.2 to compare the methylation Shannon entropy in healthy controls (0-55 years)

[Fig. 3.6 panels: β-value against chronological age (years) for the clock CpGs cg02071305 and cg18328933 in control and Sotos samples; heatmap of Horvath's clock CpGs (columns) across Sotos samples (rows).]
06 99 34 13 cg 04 26 84 05 cg 18 98 36 72 cg 19 16 76 73 cg 08 12 47 22 cg 10 37 72 74 cg 21 39 57 82 cg 12 83 06 94 cg 06 46 22 91 cg 19 42 09 68 cg 20 76 13 22 cg 05 90 36 09 cg 10 52 30 19 cg 17 72 96 67 cg 04 83 60 38 cg 25 15 96 10 cg 24 45 03 12 cg 05 67 53 73 cg 09 13 30 26 cg 01 48 56 45 cg 24 08 18 19 cg 12 41 35 66 cg 21 37 82 06 cg 27 01 59 31 cg 02 21 71 59 cg 22 67 91 20 cg 07 59 59 43 cg 14 32 91 57 cg 23 94 15 99 cg 15 66 14 09 cg 01 65 62 16 cg 24 11 68 86 cg 27 37 74 50 cg 02 38 81 50 cg 09 72 25 55 cg 20 79 58 63 cg 05 96 00 24 cg 10 37 67 63 cg 02 58 06 06 cg 24 58 00 01 cg 13 03 85 60 cg 09 86 98 58 cg 18 18 07 83 cg 13 31 91 75 cg 25 65 78 34 cg 25 41 17 25 cg 17 65 56 14 cg 05 92 16 99 cg 19 56 96 84 cg 12 37 37 71 cg 22 73 63 54 cg 23 51 76 05 cg 06 49 39 94 cg 08 37 09 96 cg 07 38 84 93 cg 00 43 15 49 cg 07 73 03 01 cg 07 15 83 39 cg 12 76 86 05 cg 17 09 95 69 cg 16 15 04 35 cg 03 01 90 00 cg 01 57 08 85 cg 13 46 04 09 cg 22 19 01 14 cg 03 58 83 57 cg 16 54 75 29 cg 01 58 44 73 cg 00 37 47 17 cg 13 97 53 69 cg 13 12 90 46 cg 06 81 06 47 cg 04 12 68 66 cg 03 68 28 23 cg 10 48 69 98 cg 14 59 79 08 cg 10 04 58 81 cg 23 12 44 51 cg 24 47 18 94 cg 16 35 88 26 cg 11 02 57 93 cg 21 37 01 43 cg 07 28 52 76 cg 19 94 58 40 cg 09 50 96 73 cg 08 09 07 72 cg 14 06 08 28 cg 01 40 77 97 cg 06 36 11 08 cg 26 29 76 88 cg 27 49 43 83 cg 16 40 83 94 cg 12 61 62 77 cg 21 21 17 48 cg 27 09 20 35 cg 20 29 56 71 cg 20 99 98 13 cg 18 03 10 08 cg 20 10 03 81 cg 17 85 35 87 cg 15 97 40 53 cg 17 96 05 16 cg 14 50 12 53 cg 18 95 60 95 cg 19 51 49 28 cg 19 04 46 74 cg 23 78 65 76 cg 13 21 60 57 cg 09 78 51 72 cg 04 00 50 32 cg 24 83 47 40 cg 09 44 11 52 cg 26 84 20 24 cg 08 25 10 36 cg 26 04 54 34 cg 04 08 41 57 cg 26 82 40 91 cg 04 52 88 19 cg 01 23 40 63 cg 22 00 63 86 cg 22 63 75 07 cg 00 86 48 67 cg 27 31 98 98 cg 25 78 11 23 cg 07 84 99 04 cg 15 26 29 28 cg 15 54 75 34 cg 07 29 15 63 cg 19 00 88 09 cg 17 28 53 25 cg 02 97 25 51 cg 06 51 30 75 cg 14 72 79 52 cg 09 
88 59 51 cg 08 41 34 69 cg 20 52 42 16 cg 10 92 09 57 cg 15 70 35 12 cg 15 38 17 69 cg 04 09 41 60 cg 01 87 36 45 cg 14 65 83 62 cg 03 94 73 62 cg 25 50 56 10 cg 08 77 17 31 cg 26 16 26 95 cg 11 65 32 66 cg 05 84 77 78 cg 23 18 03 65 cg 20 30 56 10 cg 25 68 30 12 cg 14 40 89 69 cg 17 40 86 47 cg 22 92 08 73 cg 02 33 15 61 cg 11 93 25 64 cg 15 18 52 86 cg 10 94 00 99 cg 09 41 82 83 cg 21 30 52 65 cg 16 24 17 14 cg 02 04 75 77 cg 12 98 54 18 cg 07 66 37 89 cg 22 43 22 69 cg 06 12 14 69 cg 14 42 45 79 cg 23 09 20 72 cg 06 92 67 35 cg 08 18 61 24 cg 26 45 69 57 cg 23 66 26 75 cg 07 49 84 21 cg 09 72 23 97 cg 21 46 00 81 cg 09 19 13 27 cg 06 68 88 48 cg 13 83 66 27 cg 06 55 73 58 cg 26 00 38 13 cg 09 01 99 38 cg 19 47 87 43 cg 01 02 77 39 cg 02 47 95 75 cg 26 00 50 82 cg 24 25 41 20 cg 21 80 13 78 cg 20 94 77 75 cg 26 04 33 91 cg 27 41 35 43 cg 25 07 06 37 cg 15 34 13 40 cg 19 27 31 82 cg 01 64 48 50 cg 12 94 62 25 cg 14 30 84 52 cg 22 56 85 40 cg 19 34 61 93 cg 14 40 99 58 cg 01 02 78 05 cg 25 92 85 79 cg 08 43 42 34 cg 03 28 67 83 cg 01 96 81 78 cg 18 57 33 83 cg 10 28 10 02 cg 03 16 72 75 cg 20 91 45 08 cg 13 89 91 08 cg 20 69 25 69 cg 18 05 50 07 cg 01 45 94 53 cg 06 83 67 72 cg 13 54 72 37 cg 25 14 85 89 cg 02 15 40 74 cg 17 58 93 41 cg 13 30 21 54 cg 03 56 53 23 cg 24 88 80 49 cg 06 04 48 99 cg 18 13 97 69 cg 01 56 08 71 cg 04 99 96 91 cg 07 77 02 22 cg 16 74 47 41 cg 11 31 46 84 cg 14 42 37 78 β-value difference −0.2 −0.1 0 0.1 0.2 Sex Male Female EAA with CCC (years) −10 0 10 20 30 Chronological age (years) 0 20 40 60 Sotos DMPs Hypomethylated aDMPs Hypermethylated Hypomethylated Weight in model −1 −0.5 0 0.5 1 ChrHMM state (in K562) Active TSS Promoter Transcribed Weakly transcribed Transcribed/regulatory Active enhancer Weak enhancer DNase Heterochromatin Poised promoter Bivalent promoter Repressed polycomb Quiescent/low RNA (in PBMC) −2 −1 0 1 2 H3K36me3 (in PBMC) −2 −1 0 1 2 In gene body Yes No a b c Fig. 
3.6 The landscape of Horvath’s epigenetic clock CpG sites in Sotos syndrome. a. and b. DNA methylation (β -value) profiles for two of the clock CpG sites (cg02071305 and cg18328933). A linear model (displayed in dark grey, see equation 3.3) can be fixed to each CpG site to model the changes in β -value with chronological age in the controls (grey). Afterwards, the difference of the Sotos samples β -values (orange) with the controls can be estimated. c. Heatmap displaying the differential methylation patterns for Sotos samples (rows) when compared with controls in each one of the 353 epigenetic clock CpGs (columns). Hierarchical clustering was performed in both rows and columns. RNA refers to the ‘normalised RNA expression’ (NRE). H3K36me3 refers to the H3K36me3 histone modification ‘normalised fold change’ (NFC). aDMPs: differentially methylated positions during ageing. EAA: epigenetic age acceleration. CCC: cell composition correction. PBMC: peripheral blood mononuclear cells. 92 Biological aspects and Sotos patients (i.e. using a linear model similar to equation 2.16, although in this case the dependent variable is the entropy value). This allows testing whether Sotos syndrome patients present genome-wide Shannon entropy acceleration i.e. deviations from the expected genome-wide Shannon entropy for their age. Despite detailed analysis, I did not find evidence that this was the case when looking genome-wide (p-value = 0.71, Fig. 3.7a,b, Fig. S2.9a). When I considered only the 353 Horvath’s epigenetic clock CpG sites for the entropy calculations, the picture was different. Shannon entropy for the 353 clock sites slightly decreased with age in the controls when I included all the batches, showing the opposite direction when compared with the genome-wide entropy (SCC =−0.1223, p-value = 3.8166 ·10−5, Fig. 3.7c). However, when I removed the ‘Europe’ batch (which was an outlier even after pre-processing, Fig. 
S2.10), this trend was reversed and I observed a weak increase of clock Shannon entropy with age (SCC = 0.1048, p-value = 8.6245 · 10⁻⁵). This shows that Shannon entropy calculations are very sensitive to batch effects, especially when only a small number of CpG sites is considered, and the results must be interpreted carefully, as already discussed in section 2.1.5. Interestingly, the mean Shannon entropy across all the control samples was higher for the epigenetic clock sites (mean = 0.4726, Fig. 3.7c) than genome-wide (mean = 0.3913, Fig. 3.7a). Sotos syndrome patients displayed a lower clock Shannon entropy than the controls (p-value = 5.0449 · 10⁻¹², Fig. 3.7d, Fig. S2.9b), which is probably driven by the hypomethylation of the clock CpG sites. Furthermore, these results suggest that Horvath's epigenetic clock sites could differ slightly from the genome as a whole in terms of their associated methylation entropy, something that to my knowledge has not been reported before.

3.6 Discussion

The epigenetic ageing clock has emerged as the most accurate biomarker of the ageing process and seems to be a conserved property of mammalian genomes [Field et al., 2018; Horvath and Raj, 2018]. However, it is still unknown whether the age-related DNA methylation changes it measures are functional at all, or whether they are related to some fundamental process in the biology of ageing. Developmental disorders in humans represent an interesting framework to study the biological effects of mutations in genes that are fundamental for the integrity of the epigenetic landscape and for other core processes, such as growth or neurodevelopment [Aref-Eshghi et al., 2018b; Bjornsson, 2015].
Therefore, using a reverse genetics approach, I aimed to identify genes that disrupt aspects of the behaviour of the epigenetic ageing clock in humans. Most studies have examined the epigenetic ageing clock using Horvath's epigenetic clock [Horvath, 2013a], and I decided to employ it as a tool to measure the epigenetic age of my samples. The results from the screen strongly suggest that Sotos syndrome accelerates epigenetic ageing. Sotos syndrome is caused by loss-of-function mutations in the NSD1 gene [Choufani et al., 2015; Kurotaki et al., 2002], which encodes a histone H3 lysine 36 (H3K36) methyltransferase. This leads to a phenotype that can include pre-natal and post-natal overgrowth, a characteristic facial gestalt, advanced bone age, developmental delay, higher cancer predisposition and, in some cases, heart defects [Leventopoulos et al., 2009]. Remarkably, many of these characteristics could be interpreted as ageing-like, identifying Sotos syndrome as a potential human model of accelerated physiological ageing.

[Figure 3.7: a and c, genome-wide Shannon entropy and Shannon entropy for the clock sites against chronological age (years), by disease status (Control: N=1128; Sotos: N=20); b and d, boxplots of the corresponding Shannon entropy acceleration for Control and Sotos samples, with p-values of 0.71 (b) and 5e−12 (d).]
Fig. 3.7 Analysis of methylation Shannon entropy during physiological ageing and in Sotos syndrome. a. Scatterplot showing the relationship between genome-wide Shannon entropy (i.e. calculated using the methylation levels of all the CpG sites in the array) and chronological age for Sotos (orange) and healthy control (grey) samples. Each sample is represented by one point. b. Boxplots showing the distributions of genome-wide Shannon entropy acceleration (i.e. deviations from the expected genome-wide Shannon entropy for their age) for the control and Sotos samples. The p-value displayed above the boxplots was derived from a two-sided Wilcoxon test. c. As in a., but using the Shannon entropy calculated only for the 353 CpG sites in Horvath's epigenetic clock. d. As in b., but using the Shannon entropy calculated only for the 353 CpG sites in Horvath's epigenetic clock.
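The entropy analysis of section 3.5 (Fig. 3.7) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in this thesis: it assumes the normalised methylation Shannon entropy popularised by Hannum and colleagues (0 when every site is fully unmethylated or methylated, 1 when every site is at β = 0.5), which may differ in detail from the definition given in section 2.1.5, and it replaces the model of equation 2.16 with a plain linear fit of entropy on age in the controls, taking each sample's residual as its entropy acceleration. The function names and toy values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def methylation_entropy(betas: Sequence[float]) -> float:
    """Normalised Shannon entropy of one sample's beta-values.

    Assumed form: 0 when all sites are at 0 or 1, 1 when all sites are at 0.5.
    """
    total = 0.0
    for b in betas:
        for p in (b, 1.0 - b):
            if p > 0.0:  # convention: 0 * log(0) = 0
                total += p * math.log(p)
    return total / (len(betas) * math.log(0.5))


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def entropy_acceleration(
    control_ages: Sequence[float],
    control_entropy: Sequence[float],
    ages: Sequence[float],
    entropy: Sequence[float],
) -> List[float]:
    """Residual of each sample's entropy from the age trend fitted on controls."""
    intercept, slope = fit_line(control_ages, control_entropy)
    return [e - (intercept + slope * a) for a, e in zip(ages, entropy)]


# Toy usage (made-up values, not thesis data). Restricting each beta vector
# to the 353 clock CpGs instead of all array probes gives the clock-site version.
controls = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.5], [0.3, 0.7, 0.5]]
control_ages = [5.0, 20.0, 40.0]
control_h = [methylation_entropy(s) for s in controls]
patient_h = [methylation_entropy([0.05, 0.95, 0.5])]
print(entropy_acceleration(control_ages, control_h, [10.0], patient_h))
```

A two-sided comparison of the resulting accelerations between groups, as in Fig. 3.7b,d, could then use a rank-sum test such as `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")`.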
NSD1 catalyses the addition of either monomethyl (H3K36me) or dimethyl groups (H3K36me2) and indirectly regulates the levels of trimethylation (H3K36me3) by altering the availability of the monomethyl and dimethyl substrates for the trimethylation enzymes (SETD2 in humans, whose mutations cause a ‘Sotos-like’ overgrowth syndrome) [Luscan et al., 2014; Wagner and Carpenter, 2012]. H3K36 methylation has a complex role in the regulation of transcription [Wagner and Carpenter, 2012] and has been shown to regulate the nutrient stress response in yeast [McDaniel et al., 2017]. Moreover, experiments in model organisms (yeast and worm) have demonstrated that mutations in H3K36 methyltransferases decrease lifespan and, remarkably, that mutations in H3K36 demethylases increase it [Ni et al., 2012; Pu et al., 2015; Sen et al., 2015]. In humans, DNA methylation patterns are established and maintained by three conserved enzymes: the maintenance DNA methyltransferase DNMT1 and the de novo DNA methyltransferases DNMT3A and DNMT3B [Schübeler, 2015]. Both DNMT3A and DNMT3B contain PWWP domains that can read the H3K36me3 histone mark [Baubec et al., 2015; Dhayalan et al., 2010]. Therefore, the H3K36 methylation landscape can influence DNA methylation levels in specific genomic regions through the recruitment of the de novo DNA methyltransferases. Mutations in the PWWP domain of DNMT3A impair its binding to H3K36me2 and H3K36me3 and cause an undergrowth disorder in humans (microcephalic dwarfism) [Heyn et al., 2019]. This redirects DNMT3A, which is normally targeted to H3K36me2 and H3K36me3 throughout the genome, to DNA methylation valleys (DMVs, a.k.a. DNA methylation canyons), which become hypermethylated [Heyn et al., 2019], a phenomenon that also seems to occur during physiological ageing in humans [Rakyan et al., 2010; Slieker et al., 2016; Teschendorff et al., 2010] and mice [Cole et al., 2017b].
DMVs are hypomethylated domains conserved across cell types and species, often associated with Polycomb-regulated developmental genes and marked by bivalent chromatin (with H3K27me3 and H3K4me3) [Jeong et al., 2013; Li et al., 2018; Long et al., 2013; Xie et al., 2013]. Therefore, I suggest a model (Fig. 3.8) in which a reduction in the levels of H3K36me2 and/or H3K36me3, caused by a proposed decrease in H3K36 methylation maintenance during ageing or by the loss of NSD1 function in Sotos syndrome, could lead to hypomethylation in many genomic regions (because DNMT3A is recruited less efficiently) and to hypermethylation in DMVs (because of the higher availability of DNMT3A). Indeed, I observe enrichment for categories such as ‘bivalent promoter’ or ‘repressed polycomb’ in the hypermethylated DMPs in Sotos syndrome and ageing (Fig. S2.4), which is also supported by higher levels of Polycomb Repressive Complex 2 (PRC2, represented by EZH2) and of H3K27me3, the mark deposited by PRC2 (Fig. S2.5). This is also consistent with the results obtained for the epigenetic mitotic clock [Yang et al., 2016c], where I observe a trend towards increased hypermethylation of Polycomb-bound regions in Sotos patients. Furthermore, a mechanistic link between PRC2 recruitment and H3K36me3 has also been uncovered via the Tudor domains of some Polycomb-like proteins [Cai et al., 2013; Li et al., 2017].

A recent preprint has shown that loss-of-function mutations in DNMT3A, which cause Tatton-Brown-Rahman overgrowth syndrome, also lead to a higher ticking rate of the epigenetic ageing clock [Jeffries et al., 2018]. The authors also report positive epigenetic age acceleration in Sotos syndrome and negative acceleration in Kabuki syndrome, consistent with my results.
Furthermore, they observe a DNA methylation signature in the DNMT3A mutants characterised by widespread hypomethylation, with a modest enrichment of DMPs in regions upstream of the transcription start site, shores and enhancers [Jeffries et al., 2018], which I also detect in the ‘Hypo-Hypo DMPs’ (those that become hypomethylated both during physiological ageing and in Sotos syndrome). Therefore, the hypomethylation observed in the ‘Hypo-Hypo DMPs’ is consistent with a reduced methylation activity of DNMT3A, which in my model could be a consequence of the decreased recruitment of DNMT3A to genomic regions that have lost H3K36 methylation (Fig. 3.8).

[Figure 3.8: schematic of a DMV and its surrounding regions during ageing (reduced H3K36 methylation maintenance) and in Sotos syndrome (NSD1), showing H3K36me2/3, methylated (5-mC) and unmethylated (C) cytosines, and DNMT3A with its PWWP domain.]
Fig. 3.8 Proposed model highlighting the role of H3K36 methylation maintenance in epigenetic ageing. The H3K36me2/3 mark recruits the de novo DNA methyltransferases DNMT3A (in green) and DNMT3B (not shown) through their PWWP domain (in blue) to different genomic regions (such as gene bodies or pericentric heterochromatin) [Baubec et al., 2015; Chantalat et al., 2011; Chen et al., 2004], which leads to the methylation of the cytosines in these regions (5mC, black lollipops). By contrast, DNA methylation valleys (DMVs) are conserved genomic regions that are normally hypomethylated and associated with Polycomb-regulated developmental genes [Jeong et al., 2013; Li et al., 2018; Long et al., 2013; Xie et al., 2013]. During ageing, the H3K36 methylation machinery could become less efficient at maintaining the H3K36me2/3 landscape. This would lead to a relocation of de novo DNA methyltransferases from their original genomic reservoirs (which would become hypomethylated) to other non-specific regions such as DMVs (which would become hypermethylated and potentially lose their normal boundaries), with functional consequences for the tissues. This is also partially observed in patients with Sotos syndrome, where mutations in NSD1 potentially affect H3K36me2/3 patterns and accelerate the epigenetic ageing clock as measured with Horvath's model [Horvath, 2013a]. Given that DNMT3B is enriched in the gene bodies of highly transcribed genes [Baubec et al., 2015] and that I found these regions to be depleted in the differential methylation analysis, I hypothesise that the hypermethylation of DMVs could be mainly driven by DNMT3A instead. However, my analysis does not rule out a role for DNMT3B during epigenetic ageing.

Interestingly, H3K36me3 is required for the selective binding of the de novo DNA methyltransferase DNMT3B to the bodies of highly transcribed genes [Baubec et al., 2015]. Furthermore, DNMT3B loss reduces gene-body methylation, which leads to intragenic spurious transcription (a.k.a. cryptic transcription) [Neri et al., 2017]. An increase in this so-called cryptic transcription seems to be a conserved feature of the ageing process [Sen et al., 2015]. Therefore, the changes observed in the ‘Hypo-Hypo DMPs’ could theoretically be a consequence of the loss of H3K36me3 and the concomitant inability of DNMT3B to be recruited to gene bodies. However, the ‘Hypo-Hypo DMPs’ were depleted for H3K36me3, active transcription and gene bodies when compared with the rest of the probes in the array (Fig. 3.5b-d), which leads me to suggest that the observed DNA methylation changes are more likely mediated by DNMT3A (Fig. 3.8). Nevertheless, the different biological replicates of the blood H3K36me3 ChIP-seq datasets were quite heterogeneous, and the absolute difference in the case of the hypomethylated Sotos DMPs, although significant owing to the large sample sizes, is quite small.
Thus, I cannot exclude that this mechanism operates during human ageing, and an exhaustive study of the prevalence of cryptic transcription in humans and its relation to the ageing methylome should be carried out. H3K36me3 has also been shown to guide the deposition of the N6-methyladenosine mRNA modification (m6A), an important post-transcriptional mechanism of gene regulation [Huang et al., 2019]. Interestingly, a decrease in overall m6A during human ageing has previously been reported in PBMCs [Min et al., 2018], suggesting another biological route through which an alteration of the H3K36 methylation landscape could have functional consequences for the organism.

Because of the way Horvath's epigenetic clock was trained [Horvath, 2013a], it is likely that its 353 constituent CpG sites are a low-dimensional representation of the different genome-wide processes that erode the epigenome with age. My analysis has shown that these 353 CpG sites have a higher Shannon entropy than the genome as a whole, and that this entropy is significantly lower in Sotos patients. This could be related to the fact that Horvath's clock CpGs are enriched in regions of bivalent chromatin (marked by H3K27me3 and H3K4me3), which confers a more dynamic or plastic regulatory state, with DNA methylation levels that deviate from the collapsed states of 0 or 1. Interestingly, EZH2 (part of Polycomb Repressive Complex 2, responsible for H3K27 methylation) is an interacting partner of DNMT3A and NSD1, and mutations in NSD1 affect the genome-wide levels of H3K27me3 [Streubel et al., 2018]. Furthermore, Kabuki syndrome was weakly identified in my screen as having an epigenome younger than expected, which could be related to the post-natal dwarfism shown by these patients [Aref-Eshghi et al., 2017; Butcher et al., 2017].
Kabuki syndrome is caused by loss-of-function mutations in KMT2D [Aref-Eshghi et al., 2017; Butcher et al., 2017], a major mammalian H3K4 mono-methyltransferase [Froimchuk et al., 2017]. Additionally, H3K27me3 and H3K4me3 levels can affect lifespan in model organisms [Sen et al., 2016]. It will be interesting to test whether bivalent chromatin is a general feature of multi-tissue epigenetic ageing clocks.

Thus, DNMT3A, NSD1 and the machinery controlling bivalent chromatin (such as EZH2 and KMT2D) contribute to an emerging picture of how the mammalian epigenome is regulated during ageing, which could open new avenues for anti-ageing drug development. Mutations in these proteins lead to different developmental disorders with growth defects [Bjornsson, 2015], with DNMT3A, NSD1 and potentially KMT2D also affecting epigenetic ageing. Interestingly, EZH2 mutations (which cause Weaver syndrome, Table 3.1) do not seem to affect the epigenetic clock in my screen. However, this syndrome has the smallest number of samples (N = 7), which could limit the power to detect any changes.

My screen has also revealed that Rett syndrome and fragile X syndrome (FXS) could potentially have an accelerated epigenetic age. It is worth noting that FXS is caused by an expansion of the CGG trinucleotide repeat located in the 5’ UTR of the FMR1 gene [Schenkel et al., 2016]. Interestingly, Huntington’s disease, caused by a CAG trinucleotide repeat expansion, has also been shown to accelerate epigenetic ageing of the human brain [Horvath et al., 2016b], pointing towards trinucleotide repeat instability as an interesting molecular mechanism to study from an ageing perspective. It is important to note that the conclusions for Rett syndrome, FXS and Kabuki syndrome were highly dependent on the age range used for the healthy controls (Fig. S2.2), and these results must therefore be treated with caution.
This study has several limitations, which I tried to address as well as possible. First, given that DNA methylation data for patients with developmental disorders are relatively scarce, some of the sample sizes were quite small. It is thus possible that some of the other developmental disorders assessed are epigenetically accelerated but that I lacked the power to detect this. Furthermore, patients with these disorders tend to be sampled when they are young, i.e. before reproductive age. Horvath's clock adjusts for the different rates of change in the DNA methylation levels of the clock CpGs before and after adult/reproductive age (20 years in humans) [Horvath, 2013a], but this could still affect the predictions, especially if the controls are not properly age-matched. My solution was to discard developmental disorders with fewer than 5 samples and to require at least 2 samples aged ≥ 20 years, which reduced the final list of disorders to those shown in Table 3.1. Future studies should increase the sample size and follow the patients throughout their entire lifespan in order to confirm these findings. Furthermore, it would be interesting to identify mutations that affect not only the mean but also the variance of epigenetic age acceleration, since changes in methylation variability at single CpG sites with age have been associated with fundamental ageing mechanisms [Slieker et al., 2016]. Finally, testing the influence of H3K36 methylation on the epigenetic clock and lifespan in mice would provide deeper mechanistic insights.

3.7 Additional methods

Sample generation and annotation

I collected DNA methylation data generated with the Illumina Infinium HumanMethylation450 BeadChip (450K array) from human blood. For the developmental disorder samples, I combined public data with data generated in-house by my collaborators in Canada (Table S2.1, Fig. S2.1).
The wet-lab protocols used in the public datasets can be found in their respective GEO repositories. DNA methylation data from my Canadian collaborators were generated according to the manufacturer’s protocol [Illumina, 2015; Research, 2019]. Basic metadata (including chronological age) were also stored. All the mutations in the developmental disorder samples were manually curated using the Variant Effect Predictor [McLaren et al., 2016] on the GRCh37 (hg19) human genome assembly. Samples with a variant of unknown significance that displayed the characteristic DNA methylation signature of the disease were also included (they are labelled as ‘YES_predicted’ in Fig. S2.1). For fragile X syndrome (FXS), only male samples with the full mutation (>200 repeats) [Schenkel et al., 2016] were included in the final screen. As a result, only samples with a clear molecular and clinical diagnosis were kept for the final screen.

Identifying differentially methylated positions in Sotos syndrome

Following a strategy similar to the one outlined in section 2.1.4, I identified the array probes that were differentially methylated in patients with Sotos syndrome.
I compared the Sotos samples (N=20) against the internal control samples (N=51) from the same dataset (GSE74432) [Choufani et al., 2015], fitting the following linear model to each array probe:

$$ \mathrm{Beta} \sim \mathrm{Disease\_status} + \mathrm{Age} + \mathrm{Sex} + \mathrm{Gran} + \mathrm{CD4T} + \mathrm{CD8T} + \mathrm{B} + \mathrm{Mono} + \mathrm{NK} + \mathrm{PC}_1 + \dots + \mathrm{PC}_{17} \tag{3.1} $$

where Beta is the β-value for the array probe being evaluated; Disease_status indicates whether a sample comes from a healthy individual (0) or a Sotos syndrome patient (1); Age is the chronological age (in years) of the sample; Sex encodes the sex of the sample (0/1); Gran, CD4T, CD8T, B, Mono and NK are the cell type proportions of the sample, as estimated with my cell-type deconvolution strategy; and PC_N is the Nth principal component capturing technical variance, which accounts for potential batch effects (see section 2.2.3 for more details). P-values and regression coefficients were extracted for the Disease_status covariate. I selected as Sotos DMPs those CpG probes that remained significant after Bonferroni correction for multiple testing, with a significance level of α = 0.01.

(Epi)genomic annotation of the CpG sites

Different (epi)genomic features were extracted for the CpG sites of interest. All the data were mapped to the hg19 assembly of the human genome. The continuous features were calculated by extracting the mean value in a window of ± 200 bp around the CpG site coordinate using the pyBigWig package [Richter et al., 2019]. I chose this window size based on the methylation correlation observed between neighbouring CpG sites in previous studies [Zhang et al., 2015b]. The continuous features included (Fig. S2.11):
• ChIP-seq data from ENCODE (histone modifications from peripheral blood mononuclear cells or PBMC; EZH2, as a marker of Polycomb Repressive Complex 2 binding, from B cells; RNF2, as a marker of Polycomb Repressive Complex 1 binding, from the K562 cell line).
I obtained Z-scores (using the scale function in R) for the values of ‘fold change over control’ as calculated in ENCODE [Consortium et al., 2012]. When needed, biological replicates of the same feature were aggregated by taking the mean of the Z-scores in order to obtain the ‘normalised fold change’ (NFC).
• ChIP-seq data for LaminB1 (GSM1289416, quantified as ‘normalised read counts’ or NRC) and Repli-seq data for replication timing (GSM923447, quantified as ‘wavelet-transformed signals’ or WTS). I used the same data from the IMR90 cell line as in [Zhou et al., 2018].
• Total RNA-seq data (rRNA-depleted, from PBMC) from ENCODE. I calculated Z-scores after aggregating the ‘signal of unique reads’ (sur) for both strands (+ and −) in the following manner:

$$\text{RNA}_i = \log_2\left(1 + \text{sur}_i^{+} + \text{sur}_i^{-}\right) \quad (3.2)$$

where RNA_i represents the RNA signal for the ith CpG site (which then needs to be scaled to obtain the ‘normalised RNA expression’ or NRE).

The categorical features were obtained by looking at the overlap (using the pybedtools package [Dale et al., 2011]) of the CpG sites with the following:
• Gene bodies, from protein-coding genes as defined in the basic gene annotation of GENCODE release 29 [Frankish et al., 2018].
• CpG islands (CGIs), obtained from the UCSC Genome Browser [Bock et al., 2007]. Shores were defined as regions 0 to 2 kb away from CGIs in both directions and shelves as regions 2 to 4 kb away from CGIs in both directions, as previously described [Martin-Herranz et al., 2017b; Zhang et al., 2015b].
• Chromatin states, obtained from the K562 cell line in the Roadmap Epigenomics Project (based on imputed data, 25 states, 12 marks) [Consortium, 2014]. A visualisation of the association between chromatin marks and chromatin states can be found in Consortium [2013]. When needed for visualisation purposes, the 25 states were manually collapsed into fewer states.
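Equation 3.2, the subsequent Z-scoring and the shore/shelf definitions can be illustrated with two small helpers. This is a sketch under stated assumptions: the Z-score mimics R’s `scale()` (mean-centring and division by the sample standard deviation), coordinates are assumed to be 0-based half-open intervals on a single chromosome, and the helper names are hypothetical. The actual overlaps in this analysis were computed with pybedtools.

```python
import numpy as np


def normalised_rna_expression(sur_plus, sur_minus):
    """Equation 3.2 per CpG site, then Z-scoring to obtain the NRE."""
    rna = np.log2(1 + np.asarray(sur_plus, float) + np.asarray(sur_minus, float))
    # Like R's scale(): centre on the mean, divide by the sample SD (ddof=1).
    return (rna - rna.mean()) / rna.std(ddof=1)


def cgi_context(pos, cgi_start, cgi_end):
    """Classify a CpG position relative to one CGI spanning [cgi_start, cgi_end).

    Shores lie up to 2 kb and shelves 2-4 kb from either edge of the island.
    """
    if cgi_start <= pos < cgi_end:
        return "island"
    # Distance in bp to the nearest CGI edge, upstream or downstream.
    dist = cgi_start - pos if pos < cgi_start else pos - cgi_end + 1
    if dist <= 2000:
        return "shore"
    if dist <= 4000:
        return "shelf"
    return "other"
```

In a genome-wide setting, each CpG would be compared against its nearest island rather than a single interval, which is what an interval-overlap tool handles efficiently.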
I compared the different genomic features for each one of the subsets of CpG sites (hypomethylated aDMPs, hypomethylated Sotos DMPs, etc.) against a control set. This control set was composed of all the probes from the background set after removing the subset being tested. For the comparisons involving the 353 Horvath clock CpG sites, the background set was the 21368 (21K) CpG probes used to train the original Horvath model [Horvath, 2013a]. For the genome-wide comparisons for ageing and Sotos syndrome, the background set contained all 428266 probes that passed my pre-processing pipeline (see section 2.1.2). For each continuous feature, the feature score distributions for a given subset of CpG sites and the control set were compared using the non-parametric two-sided Wilcoxon rank-sum test. For each categorical feature, I first created a 2x2 contingency table, with the two variables indicating whether a given CpG site overlaps with the categorical feature under consideration (Yes/No) and whether the CpG site is in the subset being considered, e.g. hypomethylated aDMPs (Yes/No). Using Fisher’s exact test (as implemented in the fisher.test function in R), I calculated the p-value and the odds ratio (OR), which indicates whether the categorical feature under consideration is enriched in the CpG subset.

Differences in the epigenetic clock CpGs β-values for Sotos syndrome

To compare the β-values of the Horvath clock CpG sites between the healthy samples and Sotos samples, I fitted the following linear model to each array probe from Horvath’s epigenetic clock (353 in total) using the samples from healthy individuals (Fig.
3.6a,b):

$$\text{Beta} \sim \text{Age} + \text{Age}^2 + \text{Sex} + \text{Gran} + \text{CD4T} + \text{CD8T} + \text{B} + \text{Mono} + \text{NK} + \text{PC}_1 + \dots + \text{PC}_{17} \quad (3.3)$$

where Beta is the β-value for the clock array probe being evaluated; Age is the chronological age (in years) of the samples; Sex encodes the sex of the samples (0/1); Gran, CD4T, CD8T, B, Mono and NK are the cell type proportions of the samples as calculated with my cell-type deconvolution strategy; and PC_N is the Nth principal component that captures technical variance and accounts for potential batch effects (see section 2.2.3 for more details). The Age² covariate accounts for non-linear relationships between chronological age and the β-values. Finally, I calculated the difference between the β-values in Sotos samples and the predictions from the models in equation 3.3 and displayed these differences in an annotated heatmap (Fig. 3.6c).

Chapter 4 Technological aspects

‘It is perfectly true, as the philosophers say, that life must be understood backwards. But they forget the other proposition, that it must be lived forwards.’ Søren Kierkegaard [1843]

Declaration

The content of this chapter was joint work with Tom Stubbs, with whom I designed and developed cuRRBS. Nevertheless, almost all the text, code and plots presented here were produced by myself. Additionally, I would like to recognise the contributions of Janet M. Thornton and Wolf Reik (who helped design the study), Antonio J. M. Ribeiro (who implemented the final version of cuRRBS to make it more computationally efficient) and Felix Krueger (who processed the RRBS datasets). All of them also helped in the revision of the final text. This work has been published in the journal Nucleic Acids Research [Martin-Herranz et al., 2017b].

4.1 Background

With the advent of next-generation sequencing, scientists are studying the biology of life at unprecedented resolution [Shendure and Ji, 2008].
Unfortunately, owing to the large size of many commonly studied genomes (human, mouse and tobacco plant, for example, are all > 2.5 Gbp in size) [Consortium et al., 2001, 2002; Sierro et al., 2014], it is often still prohibitively expensive to conduct whole genome sequencing at high coverage. This creates a trade-off that limits the number of replicates that can be included and, therefore, challenges the statistical power and reproducibility of the studies [Fumagalli, 2013; Wu et al., 2015]. This is particularly true for DNA methylation, where differentially methylated regions (DMRs) are typically called by identifying changes as small as 10%, and where 70−80% of the reads from Whole Genome Bisulfite Sequencing (WGBS) contain little to no relevant information on the DNA methylation status [Ziller et al., 2013]. To address these cost inefficiencies, many methods have been developed to reduce the number of genomic fragments that need to be sequenced for a given biological system [Kacmarczyk et al., 2018; Kurdyukov and Bullock, 2016; Plongthongkum et al., 2014; Suzuki and Greally, 2013; Yong et al., 2016]. These methods can be broadly split into those that positively select for genomic fragments of interest and those that deplete fragments that are not of interest. Positive selection-based methods enrich the sites of interest from the background. This usually occurs through pull-down of these sites via an antibody (e.g. anti-5mC antibody) [Taiwo et al., 2012], a recombinant binding protein (e.g. methyl-CpG-binding domains or MBD) [Brinkman et al., 2010], covalent biotin tagging [Kriukienė et al., 2013], capture probes/baits for the sites of interest [Allum et al., 2015; Cheung et al., 2017; Ivanov et al., 2013], array-based approaches (e.g.
27K, 450K and EPIC arrays in humans) [Bibikova et al., 2011, 2009; Hodges et al., 2009; Pidsley et al., 2016] or PCR-based approaches [Bernstein et al., 2015; Deng et al., 2009; Diep et al., 2012; Komori et al., 2011; Paul et al., 2014; Yang et al., 2015]. These methods have many limitations, including enrichment biases, complex protocols and difficulties in quantification [Suzuki and Greally, 2013; Yong et al., 2016]. Current evidence shows that depletion-based methods do not have enrichment biases, tend to be simpler and are more readily quantifiable [Kurdyukov and Bullock, 2016; Suzuki and Greally, 2013]. The most common depletion-based approaches use restriction enzymes to exploit the fact that the nucleotide composition of a given genome is non-random, so that the fragment lengths produced by a given digestion reflect this composition [Bystrykh, 2013; Cedar et al., 1979; Cohen-Karni et al., 2011; Martinez-Arguelles et al., 2014; Yu et al., 2004]. In the case of 5-methylcytosine (5mC), the most common depletion-based method is Reduced Representation Bisulfite Sequencing (RRBS) using the methylation-insensitive restriction enzyme MspI (recognition sequence C|CGG) [Boyle et al., 2012; Meissner et al., 2008], although enzymes such as BglII [Meissner et al., 2005], XmaI [Tanas et al., 2017], TaqαI [Lee et al., 2014; Lim et al., 2016], MspJI [Huang et al., 2013], ApeKI [Wang et al., 2013], HpyCH4IV or HpaII [Kirschner et al., 2016] have also been used. RRBS has proven extremely useful for cost-effective, global studies of DNA methylation [Gu et al., 2010; Lee et al., 2014; Meissner et al., 2008; Stubbs et al., 2017], capturing around 10% of CpG sites within mammalian genomes but with up to a 30-fold reduction in the number of fragments sequenced in comparison to WGBS [Smith et al., 2009].
In the context of epigenetic clocks, most studies have used methylation arrays in humans [Hannum et al., 2013; Horvath, 2013a; Koch and Wagner, 2011] and MspI-based RRBS in mice, dogs and wolves [Meer et al., 2018; Petkovich et al., 2017; Stubbs et al., 2017; Thompson et al., 2018, 2017]. However, MspI-based RRBS only captures a specific subset of CpG sites in the genome, mainly found within CpG islands and promoters [Meissner et al., 2008]. Since many age-related changes in the methylome are known to occur in other genomic regions (such as enhancers) [Cole et al., 2017b; Martin-Herranz et al., 2019; Slieker et al., 2018, 2016], current technologies could be biasing our discoveries. Furthermore, epigenetic clocks could be used in the near future to perform high-throughput screenings of anti-ageing drugs or employed as ageing biomarkers in clinical trials [Horvath et al., 2018], but current assay costs could preclude their use in this context. Given that restriction enzyme-based approaches are versatile and simple, we developed a new computational method called customised Reduced Representation Bisulfite Sequencing (cuRRBS), which allows researchers to optimise the RRBS protocol for a specific experiment. cuRRBS generalises the problem of genomic enrichment with restriction enzymes by allowing the user to define both the genome and the particular sites of interest, and then outputs the optimal enzyme combinations and size ranges to target these sites. In addition, cuRRBS provides the user with a variety of metrics to compare the suggested protocols, including an estimate of the fold-reduction in sequencing costs compared to WGBS and a robustness value to assess the impact of experimental error in the size selection step.
Here, we have tested the enrichment ability of cuRRBS in several biological systems (including the Horvath epigenetic clock), with sites in both CpG and CHG contexts and in multiple species, to showcase the generalisability and utility of the software [Domcke et al., 2015; Hanna et al., 2016; Horvath, 2013a; Kawakatsu et al., 2016; Lev Maor et al., 2015; Maurano et al., 2015; Milagre et al., 2017]. In addition, we take advantage of two recently published independent RRBS datasets to demonstrate the accuracy of the software predictions in both single and double enzyme experimental settings [Lim et al., 2016; Tanas et al., 2017]. We hope that cuRRBS will be useful for designing cost-effective, genome-wide studies in the future, for developing new epigenetic-based predictors and for validating previous results from whole genome approaches in a simple, cheap and timely fashion.

4.2 Restriction enzyme digestion as a tool for genomic enrichment

Restriction enzymes are a highly effective tool for enriching specific sites of interest in a genome. This is possible due to the wide variety of motifs that commercially available restriction enzymes can recognise (Fig. 4.1), combined with the non-random nature of the genome composition itself. Fig. 4.1 highlights that this motif diversity is driven both by the sequence composition (GC content) and by the length of the recognition sequence. Thus, different restriction enzymes will generate different fragment length distributions, depending on how frequently their recognition site is present in a given genome (Fig. 4.2a, Fig. S3.1).
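The link between recognition-site frequency and fragment lengths can be illustrated with a toy in silico digestion. This is a minimal sketch, not cuRRBS itself: it scans only the top strand for an exact (non-degenerate) recognition sequence, cuts at a fixed offset within the site (1 for MspI, C|CGG) and ignores overhangs. The helper names are hypothetical.

```python
import re


def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting `seq` at every occurrence of `site`.

    site:       exact recognition sequence in A/C/G/T (IUPAC codes not handled).
    cut_offset: cut position within the site on the top strand (1 for C|CGG).
    """
    seq = seq.upper()
    # The lookahead finds overlapping occurrences of the site as well.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    # Consecutive cut positions delimit the fragments; skip empty ones.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def in_size_range(lengths, low, high):
    """Keep the fragments that would survive size selection (inclusive bounds)."""
    return [n for n in lengths if low <= n <= high]


# Example: two MspI sites split a 16 bp sequence into three fragments.
print(digest("AACCGGTTTTCCGGAA", "CCGG", 1))
```

Applied to a whole genome, binning these lengths (for example in 200 bp bins) gives distributions of the kind summarised in the heatmap of Fig. 4.2a.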
[Figure 4.1 panels: a. neighbour-joining tree of the recognition motifs of methylation-insensitive restriction enzymes, coloured by motif GC content (0−25%, 25−50%, 50−75%, 75−100%); b. PCA of the motifs, PC1 (30.14%) vs PC2 (14.85%).]

Fig. 4.1 The landscape of restriction enzyme motifs. a. Phylogenetic analysis of the motifs that are recognised by the different commercially available restriction enzymes which are insensitive to CpG methylation. Each sequence represents a different isoschizomer family considered in this study. A neighbour-joining method was used to construct the tree. Motifs with different GC content are shown in different colours. b. Principal component analysis (PCA) performed on the matrix of pairwise distances from the aligned motifs. Each circle represents a different motif. The coordinates of the different motifs on the first two principal components are plotted on the x- and y-axes. Motifs with different GC content are shown in different colours (same as in a.) and the motif length is represented by the diameter of the circle.

[Figure 4.2 panels: a. heatmap of fragment length distributions per isoschizomer family, with size ranges from 1−200 bp to 8001−Inf bp and row annotations for the proportion of fragments, total number of fragments, median fragment length (bp) and motif GC content (%); b. % of sites in CpG islands vs % of sites in promoters (MspI and BssAI highlighted); c. % of sites in intergenic regions vs % of sites in non-coding RNA genes (MspI, MfeI and BsmI highlighted); circle size: total number of sites.]

Fig. 4.2 Restriction enzyme digestion as a tool for genomic enrichment. a. Heatmap showing the fragment length distributions generated by different restriction enzymes in the human genome (hg38). Each column represents the distribution for an isoschizomer family of restriction enzymes that contains at least one member which is methylation-insensitive in a CpG context. The distributions are binned in size ranges of 200 bp, ordered as they would appear in an electrophoretic gel. Additional row annotations on top of the heatmap contain information regarding the total number of fragments (in red) and the median fragment length (in blue) produced by each in silico digestion, together with the GC content of the recognition motif in the isoschizomer family (in green). Legend is displayed on the right hand side. b. Scatterplot showing the percentage of cleavage sites from different restriction enzymes that overlap with CpG islands (x-axis) and promoters (y-axis) in the human genome (hg38). The size of the circles represents the total number of cleavage sites generated by each enzyme. The enzymes MspI and BssAI are highlighted in red and blue, respectively. Legend is displayed on the right hand side. c. Scatterplot showing the percentage of cleavage sites from different restriction enzymes that overlap with intergenic regions (x-axis) and non-coding RNA genes (y-axis) in the human genome (hg38). The size of the circles represents the total number of cleavage sites generated by each enzyme. The enzyme MspI is highlighted in red. The enzymes BsmI and MfeI are both highlighted in blue. Legend is displayed on the right hand side.

In DNA methylation studies, the most common application is the use of MspI (cutting at C|CGG) in RRBS (Reduced Representation Bisulfite Sequencing), which enriches for CG dinucleotides (CpGs) contained in promoters and CpG islands [Meissner et al., 2008] (Fig. 4.2b). However, in many cases, MspI is by no means the most effective restriction enzyme that could be used. For instance, MspI would be a poor choice for the enrichment of CpGs found in intergenic regions or non-coding RNA genes in the human genome, which would be far better enriched using BsmI or MfeI, respectively (Fig. 4.2c). In fact, across many genomic features MspI is rarely the optimal methylation-insensitive restriction enzyme (Fig. S3.2).

Previous studies have tested the potential of other restriction enzymes and enzyme combinations to expand the range of CpG sites that can be targeted in a genome [Bystrykh, 2013; Cedar et al., 1979; Kirschner et al., 2016; Lee et al., 2014; Martinez-Arguelles et al., 2014; Tanas et al., 2017; Wang et al., 2013; Yu et al., 2004]. However, to our knowledge, there is currently no computational method that systematically explores the capacity of all commercially available restriction enzymes to generate ‘personalised’ reduced representations of the genome whilst minimising the experimental cost (Fig. S3.3).

4.3 cuRRBS: customised Reduced Representation Bisulfite Sequencing

We have developed a novel computational method (cuRRBS) that determines the optimal combination of restriction enzymes and size range to enrich for any given set of sites of interest in any genome. In other words, by modifying two of the steps in the original RRBS protocol (Fig.
4.3a), cuRRBS generalises RRBS. The software takes as input the genomic coordinates that the user wants to target (Fig. 4.3b, Fig. S3.4a). cuRRBS then assesses in silico the potential of all single enzymes and double-enzyme combinations to enrich for the sites of interest using the following variables:
• NF, which reflects the theoretical number of genomic fragments that will be sequenced after the size selection step (i.e. those whose lengths after the in silico digestion are within the size range). Assuming that the sequencing cost is proportional to NF, cuRRBS attempts to minimise this value.
• Score, which reflects the theoretical number of sites of interest that will be sequenced after the size selection step. cuRRBS attempts to maximise this value, which can be calculated as:

$$\text{Score} = \sum_{i=1}^{n} w_i \cdot \gamma_i \quad (4.1)$$

where n is the total number of sites of interest, w_i is the weight of the ith site of interest and γ_i is 1 if the ith site would theoretically be sequenced (i.e. present in a size-selected fragment and ≤ read length base pairs away from one of the ends of the fragment) and 0 otherwise.
• Enrichment Value (EV), which combines both NF and Score into a single number. The objective of cuRRBS is to minimise EV, which can be calculated as:

$$EV = -\log_{10}\left(\frac{\text{Score}}{NF} \cdot \frac{n}{\text{max\_Score}}\right) \quad (4.2)$$

where max_Score is the Score obtained if all the sites of interest were sequenced.

NF and Score are positively correlated with one another: the more genomic fragments sequenced, the more sites of interest are likely to be contained within the reduced representation (Fig. 4.3c, Fig. S3.4b). However, this relationship disappears at higher NF values, where Score saturates, such that any additional fragments sequenced reduce the overall enrichment of the sites of interest.
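Equations 4.1 and 4.2 translate directly into code. The sketch below assumes 0-based coordinates, fragments as half-open intervals, and that a site counts as sequenced (γ_i = 1) when it lies within the first or last `read_length` bases of a size-selected fragment; that boundary convention and all function names are our assumptions, not taken from the cuRRBS source.

```python
import math


def is_sequenced(site, frag_start, frag_end, read_length):
    """gamma_i: 1 if the site lies in [frag_start, frag_end) and within
    read_length bases of either end of the fragment, else 0."""
    if not frag_start <= site < frag_end:
        return 0
    from_left = site - frag_start        # 0 for the first base of the fragment
    from_right = frag_end - 1 - site     # 0 for the last base of the fragment
    return int(from_left < read_length or from_right < read_length)


def score(weights, gammas):
    """Equation 4.1: Score = sum_i w_i * gamma_i."""
    return sum(w * g for w, g in zip(weights, gammas))


def enrichment_value(score_value, nf, n_sites, max_score):
    """Equation 4.2: EV = -log10((Score / NF) * (n / max_Score)); lower is better."""
    return -math.log10((score_value / nf) * (n_sites / max_score))
```

With all weights equal to 1, max_Score equals n, so EV reduces to -log10(Score/NF): EV falls as more sites of interest are captured per sequenced fragment.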
The saturation of Score at high NF is mainly due to additional sites of interest being buried within long fragments that will not be sequenced because of the read length limit (cuRRBS parameter –r, see Table 4.1). For a given enzyme or enzyme combination, NF and Score depend on the size range chosen, since only the genomic fragments within the size range will be present in the reduced representation of the genome. cuRRBS requires the user to set thresholds for the maximum NF (i.e. minimum CRF, see below) and the minimum Score that would be acceptable for a given application (Fig. 4.3b, Fig. S3.4a). These thresholds allow cuRRBS to search through all possible size ranges for a given enzyme or enzyme combination and to find the one that minimises the Enrichment Value (EV). cuRRBS repeats this procedure for every single enzyme and enzyme combination and reports the best hits (i.e. those with the lowest EVs) (Fig. S3.4a). The output file contains the best-scoring enzymes with their corresponding size ranges and other useful variables for each hit, such as:
• Cost Reduction Factor (CRF), which estimates the theoretical fold-reduction in sequencing costs for the cuRRBS protocol compared with Whole Genome Bisulfite Sequencing (WGBS). The CRF for a given cuRRBS protocol can be calculated as:

$$CRF = \frac{NF_{ref}}{NF} = \frac{g/r}{NF} \quad (4.3)$$

where NF_ref is the estimated number of fragments that would be sequenced in a WGBS experiment, which can be roughly calculated as the genome size (g) divided by the read length (r).
• Robustness (R). This assesses how much the cuRRBS prediction varies if a slightly different size range is used (Fig. 4.3d). The results for robust enzymes will not be greatly affected by experimental error during the size selection step.
This will help the user to make an informed decision on which enzyme combination to choose for the system of interest (Fig. S3.4c). The robustness of a given enzyme (combination) is calculated as:

$$R = e^{-\theta} \quad (4.4)$$

with

$$\theta = \sum_{x \in \{a-\delta,\, a,\, a+\delta\}} \sum_{y \in \{b-\delta,\, b,\, b+\delta\}} \frac{\lvert EV_{x,y} - EV_{a,b} \rvert}{EV_{a,b}} \quad (4.5)$$

where EV_{a,b} is the EV for the optimal size range (a: lower limit of the size range, b: breadth) and δ is the experimental error (in bp) assumed during the size selection step. The robustness takes values in the interval (0,1], with higher values identifying robust cuRRBS protocols.

[Figure 4.3 panels: a. RRBS workflow (gDNA, cuRRBS-defined enzyme combination, standard library preparation, bisulfite conversion, PCR amplification, cuRRBS-defined size selection, sequencing); b. cuRRBS inputs (genome of interest, sites of interest), user thresholds (maximum no. of fragments to sequence, minimum proportion of sites of interest) and ranked output table (enzyme combination, size range, CRF, robustness); c. % of maximum Score vs NF/1000 for BsaWI & BssAI (Pearson's correlation coefficient 0.9583; optimal size range 60-540 bp); d. EV landscape for BsaWI & BssAI, lower limit of size range (bp) vs breadth (bp) (optimal size range 60-540 bp; robustness (δ = 20 bp): 0.934).]

Fig. 4.3 cuRRBS overview. a. Outline of an RRBS protocol. Highlighted are the two steps that would be modified according to the output produced by cuRRBS (i.e. the restriction enzymes used for the genomic digestion and the size selection). Legend is displayed on the bottom left. b. Schematic of cuRRBS. Highlighted are the two main inputs required by the software and the two thresholds that the user has to define (red and purple tags). The default output of cuRRBS is a table containing the top hits (restriction enzyme combination and size range) along with additional information that might be useful to the user (such as Cost Reduction Factor and robustness). c. Scatterplot showing the trade-off between the number of fragments (NF) and the Score for the best enzyme combination (BsaWI & BssAI) that targets the CpGs present in the human placental-specific imprinted regions [Hanna et al., 2016]. NF is divided by 1000 for visualisation purposes. Each point represents a different size range. Shown in dark blue and grey are the size ranges that would and would not pass filtering, respectively. Shown in orange is the optimal size range in the filtered search space. The dotted lines depict the thresholds that need to be specified by the user (red: maximum NF; purple: minimum percentage of the maximum Score). In this mock example we specified an NF threshold of 150000 fragments and a Score threshold of 25% of the maximum Score. Legend is displayed below the plot title. d. Contour plot that depicts how the robustness (R) variable is calculated for the optimal enzyme combination (BsaWI & BssAI; size range: 60-540 bp) that targets the CpGs present in the human placental-specific imprinted regions [Hanna et al., 2016]. Enrichment values (EVs) are calculated for all possible size ranges in order to create an EV ‘landscape’. In this landscape, cuRRBS finds the size range with the lowest EV that still satisfies the thresholds (asterisk in green). Afterwards, cuRRBS samples EVs around the optimum (asterisks in black). The points that are sampled depend on the experimental error (in this case, δ = 20 bp). A high robustness value means that the sampled EVs do not change much compared with the optimum, which implies that the cuRRBS prediction will not be greatly affected by experimental errors during the size selection step.

4.4 Running cuRRBS in different biological systems

cuRRBS provides a way to effectively interrogate DNA methylation in any biological system (including the CpG sites that constitute different epigenetic clocks) for which the reference genome is available. Besides reducing the cost for organisms currently under intensive study (e.g. human, mouse), cuRRBS opens the door to the cost-effective study of DNA methylation in species with large genomes or in which DNA methylation in non-CpG contexts is common, such as plants [Stroud et al., 2013], which currently lack an MspI-based RRBS protocol owing to the enzyme’s sensitivity to CHG methylation [Sun et al., 2014b]. We decided to test the ability of cuRRBS to enrich for genomic sites that have important functional roles in different systems. Some of the systems that we tested in silico include genomic regions whose methylation status is important during cellular reprogramming [Milagre et al., 2017], Horvath’s epigenetic clock [Horvath, 2013a], transcription factor binding sites that are affected by DNA methylation [Domcke et al., 2015; Maurano et al., 2015], imprinted loci [Hanna et al., 2016], CpGs found at exon-intron boundaries [Lev Maor et al., 2015] and CHG sites that are differentially methylated between Arabidopsis accessions [Kawakatsu et al., 2016] (Fig. S3.5). For these in silico systems, we chose to run the software with the Score threshold set to 25% of the maximum Score.
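The size-range search, the robustness value (equations 4.4 and 4.5) and the CRF (equation 4.3) described in section 4.3 can be combined into a small grid search. This is a hedged sketch, not the cuRRBS implementation: each fragment is summarised by its length and the summed weight of the sites it would contribute to the Score, size ranges are taken as [a, a + b] with inclusive bounds, and all function names are hypothetical.

```python
import math


def range_stats(frag_lengths, frag_scores, a, b, n_sites, max_score):
    """NF, Score and EV (eq. 4.2) for the size range [a, a + b].

    a is the lower limit and b the breadth of the size range (bp);
    frag_scores[k] is the summed weight of sequenced sites in fragment k.
    """
    picked = [s for length, s in zip(frag_lengths, frag_scores)
              if a <= length <= a + b]
    nf, sc = len(picked), sum(picked)
    if nf == 0 or sc == 0:
        return nf, sc, math.inf          # nothing of interest would be sequenced
    return nf, sc, -math.log10((sc / nf) * (n_sites / max_score))


def best_size_range(frag_lengths, frag_scores, n_sites, max_score,
                    max_nf, min_score_frac, lows, breadths):
    """Grid search for the size range with the lowest EV that passes both thresholds."""
    best = None
    for a in lows:
        for b in breadths:
            nf, sc, ev = range_stats(frag_lengths, frag_scores, a, b,
                                     n_sites, max_score)
            if nf > max_nf or sc < min_score_frac * max_score:
                continue                 # fails a user-defined threshold
            if best is None or ev < best[2]:
                best = (a, b, ev, nf)
    return best


def robustness(ev_at, a, b, delta):
    """Eqs. 4.4-4.5: R = exp(-theta) around the optimum (a, b).

    ev_at(x, y) returns the EV for lower limit x and breadth y;
    the EV at the optimum is assumed to be strictly positive.
    """
    ev_opt = ev_at(a, b)
    theta = sum(abs(ev_at(x, y) - ev_opt) / ev_opt
                for x in (a - delta, a, a + delta)
                for y in (b - delta, b, b + delta))
    return math.exp(-theta)


def cost_reduction_factor(genome_size, read_length, nf):
    """Eq. 4.3: CRF = (g / r) / NF."""
    return (genome_size / read_length) / nf
```

In this sketch an empty size range receives an infinite EV, so if any neighbour of the optimum contains no useful fragments, θ becomes infinite and R drops to 0.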
In all the in silico systems tested, cuRRBS dramatically reduces the sequencing cost compared to WGBS (up to approximately 400-fold), as assessed using the Cost Reduction Factor (CRF) (Fig. 4.4). In addition, for the cases where a comparison to MspI-based RRBS could be made, cuRRBS also improves on the CRF of MspI-based RRBS, by up to an order of magnitude. For example, for the placental-specific imprints, the sequencing costs are reduced approximately 400-fold relative to WGBS and 12.5-fold relative to traditional MspI-based RRBS. Furthermore, many of the top hits reported by cuRRBS are double digestions with two restriction enzymes (Fig. S3.5), highlighting the combinatorial power of restriction enzymes to produce optimal reduced representations of the genome [Bystrykh, 2013]. Excitingly, these results suggest that cuRRBS makes it possible to assay a far larger number of target sites, with a far simpler experimental design, than would normally be achieved using amplicon-based bisulfite sequencing.

4.5 Experimental validation of cuRRBS

To assess in an unbiased manner how well cuRRBS predictions perform in an experimental setting, we employed two independent non-canonical RRBS datasets: one generated with a single enzyme (XmaI) [Tanas et al., 2017] and the other with a combination of two restriction enzymes (MspI and TaqαI) [Lim et al., 2016]. By evaluating the predictive power of cuRRBS in these two datasets, we were able to assess its performance in both single and double enzyme contexts and across different genomes.

[Fig. 4.4: barplot; x-axis: Arabidopsis CHG sites, mouse iPSCs demethylated, mouse iPSCs maintained, mouse NRF1 sites, human exon-intron boundaries, human epigenetic clock, human imprinted loci, human CTCF sites, human placental imprinted loci; y-axis: Cost Reduction Factor (CRF)]
Fig. 4.4 Running cuRRBS in different biological systems. Barplot showing the values of the Cost Reduction Factor (CRF) in the different biological systems that were tested (see Fig. S3.5) [Domcke et al., 2015; Hanna et al., 2016; Horvath, 2013a; Kawakatsu et al., 2016; Lev Maor et al., 2015; Maurano et al., 2015; Milagre et al., 2017]. The colours of the bars represent the different species interrogated (green: Arabidopsis thaliana, blue: Mus musculus, red: Homo sapiens). The CRF for the traditional RRBS protocol (MspI in the human genome, using a bead size selection step of 20-800 bp, CRF = 30.65) is displayed as a grey area; it is not compared with the A. thaliana system, since MspI is sensitive to CHG methylation.

To test the accuracy of cuRRBS predictions in the context of a single enzyme digestion, we used the non-canonical RRBS dataset generated from human DNA using the restriction enzyme XmaI [Tanas et al., 2017]. This dataset was previously used to show that XmaI enriches for CpG islands (CGIs) while reducing the overall sequencing cost relative to MspI, making the protocol more cost-effective. To validate cuRRBS using this system, we therefore chose to enrich for all CpG sites that overlap with a CGI (CGI-CpGs) in the human genome, using a predetermined theoretical size range equivalent to the ‘reproducible library fragment lengths’ reported in Tanas et al. [2017] (i.e. 90-185 bp). cuRRBS predicted with high accuracy the CpG sites that were observed in the experimental XmaI-RRBS dataset (Fig. 4.5a). In particular, cuRRBS predicted that only a small proportion of all CGI-CpGs would be sequenced (102253 out of 2164614, i.e. 4.72%), and this was indeed the case (Fig. 4.5a). Furthermore, after filtering out sites with a low depth of coverage, which commonly represent noise in RRBS datasets, the sensitivity increased to approximately 80%. Importantly, the specificity remained at almost 100% regardless of the depth of coverage threshold (Fig. 4.5b).
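The counts in Fig. 4.5a and the percentages in Fig. 4.5b follow from two sets of site IDs and the formulas given in section 4.7 (equations 4.6 and 4.7). The Python sketch below shows that bookkeeping for one depth of coverage threshold. The inputs (the set of all CGI-CpG IDs, the set predicted by cuRRBS, and a hypothetical dictionary mapping each observed site to its read depth) are assumptions for illustration; this is not the analysis code used for the thesis.

```python
def confusion_counts(all_sites, predicted, coverage, min_depth=1):
    """Count TP, TN, FP and FN for one depth of coverage threshold.

    all_sites: set of IDs of every site of interest (e.g. all CGI-CpGs).
    predicted: IDs that cuRRBS predicts will be sequenced.
    coverage:  observed site ID -> number of reads covering it.
    """
    observed = {s for s, depth in coverage.items()
                if depth >= min_depth and s in all_sites}
    predicted = set(predicted) & all_sites
    tp = len(predicted & observed)   # predicted and sequenced
    fp = len(predicted - observed)   # predicted but not sequenced
    fn = len(observed - predicted)   # sequenced but not predicted
    tn = len(all_sites) - tp - fp - fn  # neither predicted nor sequenced
    return tp, tn, fp, fn


def sensitivity_specificity(tp, tn, fp, fn):
    """Sensitivity and specificity as percentages (equations 4.6 and 4.7)."""
    sensitivity = 100 * tp / (tp + fn) if tp + fn else float("nan")
    specificity = 100 * tn / (fp + tn) if fp + tn else float("nan")
    return sensitivity, specificity


sites = set(range(10))
depths = {0: 12, 1: 3, 2: 8, 5: 1}  # site 5 is covered by a single read
for threshold in (1, 5):
    counts = confusion_counts(sites, {0, 1, 2}, depths, min_depth=threshold)
    print(threshold, counts, sensitivity_specificity(*counts))
```

In this toy example, raising the threshold discards the poorly covered site 5 and sensitivity goes from 75% to 100%, mirroring the trend in Fig. 4.5b. The same arithmetic gives the expected fraction of CGI-CpGs in the XmaI example: 100 × 102253 / 2164614 ≈ 4.72%.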
Overall, cuRRBS produces a relatively conservative prediction, as highlighted by the low number of false positives (Fig. 4.5a), at the expense of a small decrease in sensitivity. Interestingly, the theoretical size range that the original study aimed for (110-200 bp) was slightly different from the one achieved in the actual experiments (90-185 bp) [Tanas et al., 2017]. When we ran cuRRBS using the original target size range, we obtained slightly worse sensitivity but unchanged specificity (Fig. S3.6). This shows that correct execution of the size selection step during the experimental protocol is key to recovering the sites predicted by cuRRBS, and highlights the importance of the robustness variable in the cuRRBS output for judging the consequences of such experimental errors.

To test the accuracy of cuRRBS predictions in the context of a double enzyme digestion, we used the non-canonical RRBS dataset generated from mouse DNA using the restriction enzymes MspI and TaqαI [Lim et al., 2016]. To compare the accuracy of cuRRBS in this double enzyme system with that in the XmaI-RRBS system, we again ran cuRRBS for CGI-CpGs, this time in the mouse genome with a theoretical size range of 80-160 bp [Lim et al., 2016]. cuRRBS predicted with high accuracy the CpG sites that were observed in this double enzyme experiment (Fig. 4.5c), and the sensitivity and specificity were very similar to those obtained for the XmaI-RRBS dataset (Fig. 4.5d). These results indicate that cuRRBS produces robust predictions of the sites of interest that will be sequenced in RRBS protocols, for both single and double enzyme combinations and in both the human and the mouse genome.

Lastly, the number of fragments that were theoretically recoverable in each of our experimental systems ranged from NF = 12780 (for XmaI) to NF = 331058 (for MspI and TaqαI).
This represents an approximately 26-fold difference in the number of recoverable fragments and shows that cuRRBS predictions are experimentally feasible even for low NF values. Importantly, in the nine theoretical examples that we report (Fig. S3.5), the number of fragments required by each cuRRBS protocol ranges from 107248 to 974050. Thus, the number of fragments required to achieve the stated CRF comfortably exceeds the minimum experimentally validated NF value (by more than 8-fold).

4.6 Conclusions and future directions

cuRRBS provides a new framework that allows the user to optimise RRBS for the biological system of interest by using novel combinations of restriction enzymes. Therefore, cuRRBS makes the study of DNA methylation more affordable across all species for which genomic sequences are available. Furthermore, it could open the door to the design of future studies in a clinical context [Lee et al., 2014], which require cost-effective and robust protocols.

[Fig. 4.5: panels a and c, number of sites (FN, FP, TN, TP) against the depth of coverage threshold; panels b and d, percentage (sensitivity, specificity) against the depth of coverage threshold]
Fig. 4.5 Experimental validation of cuRRBS. a. Barplots showing the number of true positives (TP, in green), true negatives (TN, in blue), false positives (FP, in red) and false negatives (FN, in orange) when comparing the cuRRBS theoretical prediction with the actual XmaI-RRBS experimental data [Tanas et al., 2017] (see section 4.7 for more details). The number of sites in each category is calculated for different depth of coverage thresholds (number of reads covering a CpG site, as reported by Bismark). The cuRRBS prediction for the CpG sites in human CpG islands was obtained by enforcing a theoretical size range of 90-185 bp and running the software for XmaI with all the default parameters (with a read length of 200 bp). The legend is displayed on the right-hand side. b. Plot showing the cuRRBS sensitivity (in light green) and specificity (in cyan) as a function of the depth of coverage threshold used to filter the experimental data [Tanas et al., 2017], with $\text{Sensitivity} = \frac{TP}{TP + FN} \cdot 100$ and $\text{Specificity} = \frac{TN}{FP + TN} \cdot 100$. The numbers of TP, TN, FP and FN are the same as in a. The legend is displayed below the plot curves. c. Same as a, but for the MspI&TaqαI-RRBS experimental data [Lim et al., 2016]. The cuRRBS prediction for the CpG sites in mouse CpG islands was obtained by enforcing a theoretical size range of 80-160 bp and running the software for MspI&TaqαI with all the default parameters (with a read length of 75 bp). d. Same as b, but for the MspI&TaqαI-RRBS experimental data [Lim et al., 2016].

Currently, cuRRBS only considers combinations of up to two restriction enzymes. However, in the future it would be possible to adapt the software to explore combinations containing more enzymes, which could theoretically target the sites of interest even more efficiently [Bystrykh, 2013]. Moreover, several methods can impute DNA methylation levels at sites that are not covered experimentally [Angermueller et al., 2017; Zhang et al., 2015b]. These methods could expand the set of sites of interest that are ultimately measured, by making use of the additional DNA methylation information retrieved in a cuRRBS experiment.
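Regarding the current limit of two enzymes per combination, a quick count shows why an exhaustive search becomes expensive beyond it. The sketch below assumes 128 isoschizomer families (the number used for the human and mouse runs in section 4.7) and counts only unordered combinations of distinct families; it ignores the additional size ranges that cuRRBS evaluates for each digestion, so it understates the real amount of work.

```python
from math import comb


def n_digestions(n_families: int, max_enzymes: int) -> int:
    """Number of unordered combinations of 1..max_enzymes distinct isoschizomer families."""
    return sum(comb(n_families, k) for k in range(1, max_enzymes + 1))


print(n_digestions(128, 2))  # 8256 (128 single + 8128 double digestions)
print(n_digestions(128, 3))  # 349632
```

Allowing three-enzyme combinations therefore multiplies the number of candidate digestions by roughly 42, which is one reason why a seed-based heuristic mode, as discussed in section 4.7, becomes attractive.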
Finally, the potential of restriction enzymes to target specific genomic coordinates is not limited to DNA methylation. cuRRBS could conceivably be adapted to enrich for SNPs of interest [Davey and Blaxter, 2011; Davey et al., 2011] or to optimise chromosome conformation capture techniques [Dekker et al., 2013; Naumova et al., 2012]. By reducing the cost associated with sequencing, we believe that cuRRBS will help to democratise high-throughput genomic studies.

4.7 Additional methods

Restriction enzymes annotation
All the information regarding the commercially available restriction enzymes used by cuRRBS was extracted from REBASE [Roberts et al., 2005, 2015]. Restriction enzymes were grouped into isoschizomer families (i.e. enzymes that recognise the same sequence and generate identical fragment length distributions), and each enzyme was manually annotated for its sensitivity to different types of methylation (CpG, CHG, CHH). Only isoschizomer families containing at least one methylation-insensitive enzyme were considered for the examples described here.

Genome assemblies and genomic annotation
All the analyses presented here were performed on the following genome assemblies: Homo sapiens (hg38), Mus musculus (mm10) and Arabidopsis thaliana (TAIR10). Scaffolds not assembled into the main chromosomes were discarded. Genomic annotation for the human genome (hg38) was obtained from GENCODE (v25, basic gene annotation) [Harrow et al., 2012], with the exception of CpG islands (CGIs), which were extracted from the UCSC Genome Browser [Bock et al., 2007]. GC content and CpG content were calculated around each restriction enzyme cleavage site, using windows of ± 25 bp and ± 500 bp, respectively. For each enzyme, the values were averaged over all cleavage sites to obtain the mean GC content and the mean CpG content.
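The windowed GC and CpG content described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this thesis: cleavage sites are given as 0-based positions, windows are clipped at the sequence ends, and ‘CpG content’ is taken here to mean CpG dinucleotides per base pair in the window (the exact normalisation is not specified in the text).

```python
def window(seq: str, centre: int, half_width: int) -> str:
    """seq[centre - half_width : centre + half_width], clipped at the sequence ends."""
    return seq[max(0, centre - half_width): centre + half_width]


def gc_fraction(s: str) -> float:
    """Fraction of G or C bases in `s`."""
    return sum(base in "GC" for base in s.upper()) / len(s) if s else 0.0


def cpg_density(s: str) -> float:
    """CpG dinucleotides per base pair (assumed definition of 'CpG content')."""
    s = s.upper()
    n_cpg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    return n_cpg / len(s) if s else 0.0


def mean_site_content(seq, cut_sites, gc_half=25, cpg_half=500):
    """Mean GC content (± gc_half bp) and mean CpG content (± cpg_half bp)
    over all cleavage sites of one enzyme."""
    gc = [gc_fraction(window(seq, c, gc_half)) for c in cut_sites]
    cpg = [cpg_density(window(seq, c, cpg_half)) for c in cut_sites]
    return sum(gc) / len(gc), sum(cpg) / len(cpg)


# Toy sequence: an AT-rich half followed by a GC-rich half, cut at the junction.
toy = "AT" * 30 + "GC" * 30
print(mean_site_content(toy, [60]))
```

Using a narrow window for GC content and a wide one for CpG content, as in the text, captures the base composition right at the cut site and the regional CpG density (e.g. whether the site falls in a CGI) respectively.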
Intragenic regions were defined as those within ± 2.5 kb of a protein-coding gene, whilst the rest of the genome was considered intergenic. CpG shores were defined as the regions 0 to 2 kb away from CGIs in both directions, and CpG shelves as the regions 2 to 4 kb away from CGIs in both directions [Zhang et al., 2015b]. Promoters were defined as 3 kb regions (2.5 kb upstream and 0.5 kb downstream of the TSS) around the TSS of all protein-coding transcripts in GENCODE, similar to the strategy used in Taher et al. [2013]. Genomic annotation for the CGIs in the mouse genome (mm10) was also obtained from the UCSC Genome Browser [Bock et al., 2007]. All annotations were handled using the pybedtools library [Dale et al., 2011; Quinlan and Hall, 2010].

Performing in silico digestions of a given genome
We used the Restriction package from Biopython v1.68 to digest the different genomes in silico with the appropriate restriction enzymes [Cock et al., 2009]. Only the first member of each isoschizomer family (containing at least one methylation-insensitive enzyme) was processed, to avoid redundant computations. The output of the in silico digestions was stored in pre-computed files, which are subsequently read by cuRRBS when needed to reduce the computational time (see ‘cuRRBS heuristics and computational efficiency’). When assessing enzyme combinations, the software combines the information from the appropriate individual pre-computed files (i.e. the genomic coordinates where each enzyme theoretically cuts) to compute all the necessary variables.

cuRRBS’ enzyme flexibility
To ensure that the user has full control over the enzymes that cuRRBS uses to derive the desired enrichments, one of the inputs given to cuRRBS is an enzyme annotation file. This file contains the isoschizomer families that the user wishes cuRRBS to test.
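The grouping into isoschizomer families and the choice of one representative per family can be sketched as below. The enzyme records are hypothetical (made-up names and sensitivities held in memory) and do not reproduce the format of the cuRRBS enzyme annotation files. Following the text, enzymes are grouped by recognition sequence, a family is kept only if at least one member is insensitive to the methylation contexts of interest, and the first member of each kept family is used for the in silico digestion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str             # recognition sequence
    sensitive: frozenset  # methylation contexts that block cutting, e.g. {"CG"}


def representative_families(enzymes, contexts):
    """Map each usable recognition site to the name of its representative enzyme.

    A family is usable if at least one member is insensitive to every context in
    `contexts` (e.g. {"CG"} or {"CG", "CHG", "CHH"}). The in silico digestion is the
    same for all members, so the first listed member is used; experimentally, an
    insensitive member would have to be chosen.
    """
    families = {}
    for enz in enzymes:
        families.setdefault(enz.site, []).append(enz)
    return {
        site: members[0].name
        for site, members in families.items()
        if any(not (m.sensitive & contexts) for m in members)
    }


enzymes = [
    Enzyme("EnzA1", "CCGG", frozenset({"CG"})),   # hypothetical records
    Enzyme("EnzA2", "CCGG", frozenset({"CHG"})),
    Enzyme("EnzB1", "GATC", frozenset({"CG", "CHG"})),
    Enzyme("EnzC1", "TCGA", frozenset()),
]
print(representative_families(enzymes, {"CG"}))
print(representative_families(enzymes, {"CG", "CHG", "CHH"}))
```

Processing one representative per family is what makes the pre-computed digestion files compact, since isoschizomers would otherwise produce duplicate files.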
In my GitHub repository, enzyme annotation files are already defined for enzymes that are methylation-insensitive in a CG context and for enzymes that are methylation-insensitive in CG, CHG and CHH contexts [Martin-Herranz et al., 2017a]. However, the user can also define a personalised set of enzymes by providing a self-generated annotation file. This can be useful, for instance, to reduce the chance of star activity in the reported cuRRBS protocols.

Table 4.1 Flexible user-defined cuRRBS parameters. This table details the flexible user-defined parameters that cuRRBS accepts as arguments. The full name and command-line abbreviation (in brackets) of each parameter are provided alongside a simplified description of its significance to the user. Where applicable, the defaults and ranges of these arguments are also detailed.

cuRRBS parameter (abbrev.) | Significance | Default | Range
Enzymes to check (-e) | Defines the enzymes (isoschizomer families) that cuRRBS will look at | - | -
Annotation for the sites of interest (-a) | Allows identification and weighting of the sites of interest | - | -
Read length (-r) | Defines the positions in the theoretical fragments that can be ‘seen’ after sequencing | - | 30-300 bp
Adapters size (-s) | Ensures correct experimental size selection | - | -
C_Score constant (-c) | Sets the minimum acceptable Score | - | 0-1
Genome size (-g) | Needed to calculate the CRF | - | -
C_NF/1000 constant (-k) | Sets the minimum acceptable CRF | 0.2 | 0-1
Experimental error (-d) | Sets the assumed experimental error (δ) | 20 bp | 5-500 bp
Size range breadth (-b) | Constrains the breadth of the size range | 980 bp | -
Output size (-t) | Defines the number of cuRRBS protocols the user can compare | 30 | -
Site IDs (-i) | Enables the identification of the recovered sites of interest | No | -

In addition, the output file from cuRRBS contains, by default, 30 cuRRBS protocols that would enrich for the user’s sites of interest.
Therefore, the user can determine which enzyme combination and size range would be the simplest and most appropriate for the given application. This gives the user the opportunity to consider experimental factors that may complicate the protocol, such as buffer compatibility and whether consecutive digestions would be required.

Flexible user-defined cuRRBS parameters
cuRRBS accepts a number of user-defined parameters to ensure the greatest possible flexibility and ease of use. These parameters, and the reasons why such flexibility is useful, are summarised in Table 4.1.

cuRRBS heuristics and computational efficiency
cuRRBS employs several strategies to reduce the computational time needed in each run:
• Restriction enzymes are grouped in isoschizomer families. Since isoschizomers generate the same genomic digestions, only one member of each family needs to be processed.
• In silico digestions are read from pre-computed files. Digesting the genomes on the fly would be a limiting step in the cuRRBS pipeline. The user can download the pre-computed files [Martin-Herranz et al., 2017a], whose information is read every time an enzyme needs to be assessed.
• The number of size ranges that are sampled is minimised. Since the experimental size selection step is generally imperfect, size ranges are sampled with a sliding window whose ‘resolution’ is equivalent to the experimental error specified by the user.
• Parallelisation. cuRRBS can use several cores to decrease the wall-clock run time.
Moreover, we have observed that, in many enzyme combinations, one of the enzymes provides most of the enrichment for the sites of interest, while the second one complements the targeting.
Therefore, it would be possible to implement a ‘heuristic’ mode in which only the enzymes that perform well individually are used as ‘seeds’ to construct combinations (as opposed to the current implementation, where all enzyme combinations are checked exhaustively). This could further reduce the computational time, especially if combinations of more than two enzymes were evaluated. The CPU time required by cuRRBS depends on several parameters, including the number of enzymes checked, the experimental error, the number of sites of interest and the genome size (Fig. S3.7). The RAM used is approximately equal to the size of the pre-computed files read by the software. A standard cuRRBS run (e.g. for a few thousand sites of interest in the human genome, checking 128 CpG methylation-insensitive isoschizomer families) takes around 30-60 minutes and uses around 4 GB of RAM, so it can easily be run on a dual-core laptop or desktop computer.

Obtaining the sites of interest for different biological systems
We tested in silico the ability of cuRRBS to enrich for the sites of interest in a selection of biological systems where DNA methylation has an important functional role. For some of these systems, described below, additional processing was required to obtain the genomic coordinates of the sites:
• Exon-intron boundaries in human. Exons and introns were obtained from protein-coding genes using GENCODE annotation data. CpG sites found within ± 5 bp of a canonical splice site (5’-GT, 3’-AG) were selected.
• Epigenetic clock in human. These sites were obtained from the Horvath epigenetic clock [Horvath, 2013a] and were lifted over to hg38 [Kuhn et al., 2012] before running cuRRBS.
• Canonical and placental imprints in human. These loci were obtained from Hanna et al. [2016].
The sites were lifted over to hg38 [Kuhn et al., 2012] and the CpG sites were then extracted for the analysis.
• CTCF binding sites in human. We obtained the CpG sites that overlap with in vivo CTCF binding sites. Peaks from sites that appear to be affected by methylation (upregulated, reactivated) were kindly provided by Dr. M. T. Maurano [Maurano et al., 2015]. We scanned the peaks for high-scoring motifs according to the CTCF JASPAR model [Mathelier et al., 2015]. Finally, we extracted the CpGs found at positions 5 and 15 of the motif, whose methylation status is thought to influence the binding of the transcription factor [Maurano et al., 2015].
• Induced pluripotent stem cell (iPSC) demethylated and maintained sites in mouse. These were obtained by comparing mouse embryonic fibroblasts (MEFs) to iPSCs as described previously [Milagre et al., 2017], with an additional filter for the magnitude of methylation change (>50% methylation change).
• NRF1 binding sites in mouse. We obtained the CpG sites that overlap with in vivo NRF1 binding sites in mouse. ChIP-seq data were processed as described in the original publication [Domcke et al., 2015], where peaks were called using Peakzilla [Bardet et al., 2013]. We took the overlap between the two TKO replicates as our final set of peaks. Next, we scanned the peaks for high-scoring motifs according to the NRF1 JASPAR model [Mathelier et al., 2015]. Finally, we extracted the CpGs found at positions 2 and 8 of the motif, whose methylation status is thought to influence the binding of the transcription factor [Mathelier et al., 2015].
• CHG sites in Arabidopsis thaliana. Non-CpG DMRs arising from the epigenomic diversity between Arabidopsis thaliana accessions were obtained from Kawakatsu et al. [2016]. The coordinates of C sites in a non-CpG context were extracted.
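Two of the selections in the list above are purely coordinate-based: CpGs within ± 5 bp of a splice site, and CpGs at fixed positions of a motif hit (5 and 15 for CTCF, 2 and 8 for NRF1). The Python sketch below illustrates both under assumed conventions that the text does not specify: CpGs are identified by the 0-based forward-strand position of their C, introns are half-open [start, end) intervals, motif positions are counted from 1 along the motif, minus-strand hits are counted from the motif end, and a motif position ‘hits’ a CpG if it falls on either of its two bases. All coordinates in the example are made up.

```python
def splice_boundary_cpgs(cpg_positions, introns, flank=5):
    """CpGs within `flank` bp of either end of any intron.

    introns: half-open (start, end) intervals; the donor (GT) starts at `start`
    and the acceptor (AG) ends at `end - 1`.
    """
    selected = set()
    for start, end in introns:
        donor, acceptor = start, end - 1
        for pos in cpg_positions:
            if abs(pos - donor) <= flank or abs(pos - acceptor) <= flank:
                selected.add(pos)
    return sorted(selected)


def motif_position_to_genome(hit_start, motif_length, strand, motif_pos):
    """Map a 1-based position within a motif hit to a 0-based forward-strand coordinate."""
    if strand == "+":
        return hit_start + motif_pos - 1
    return hit_start + motif_length - motif_pos


def cpgs_at_motif_positions(cpg_positions, hits, motif_length, positions):
    """CpGs overlapping the given motif positions in any (hit_start, strand) hit."""
    cpgs = set(cpg_positions)
    found = set()
    for hit_start, strand in hits:
        for p in positions:
            g = motif_position_to_genome(hit_start, motif_length, strand, p)
            for c in (g, g - 1):  # base g is the C or the G of a CpG starting at c
                if c in cpgs:
                    found.add(c)
    return sorted(found)


print(splice_boundary_cpgs([94, 96, 150, 203, 205], [(100, 200)]))
print(cpgs_at_motif_positions([1004, 1013, 1500, 2003],
                              [(1000, "+"), (2000, "-")], 19, (5, 15)))
```

Strand handling is the part most easily got wrong here: on a minus-strand hit, motif position 1 corresponds to the last forward-strand base of the hit.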
In all cases the sites were equally weighted (w_i = 1), with the exception of the human epigenetic clock system, where each site was assigned the absolute value of its weight in the linear model [Horvath, 2013a]. All the site annotation files can be found in my GitHub repository [Martin-Herranz et al., 2017a].

Running cuRRBS for the different biological systems
cuRRBS was run in the different systems described above using the default parameters (k = 0.2, d = 20, b = 980, t = 30), a read length (r) of 75 bp and a Score threshold (c) of 0.25. In the mouse and human examples we considered 128 isoschizomer families containing enzymes that are not sensitive to CpG methylation. In the case of Arabidopsis thaliana we used 28 isoschizomer families containing enzymes that are not sensitive to 5mC in any context (CG, CHG, CHH).

Mapping of RRBS samples
XmaI-RRBS data generated on the Ion Torrent platform [Tanas et al., 2017] and MspI&TaqαI-RRBS data generated on the Illumina HiSeq platform [Lim et al., 2016] were quality-trimmed using Trim Galore (www.bioinformatics.babraham.ac.uk/projects/trim_galore/), and base pairs were removed from the 3’ end to avoid including filled-in nucleotides with artificial methylation states (the filled-in XmaI, MspI and TaqαI cut sites include the nucleotide sequences CCGG, CG and CG, respectively). The data were then mapped to the human genome (XmaI data, parameter: --non_directional) or the mouse genome (MspI&TaqαI data, parameter: --directional) using Bismark (0.18.0) [Krueger and Andrews, 2011]. In both cases, data from different experiments or replicates were merged into the same FASTQ file prior to quality trimming.

Estimating cuRRBS’ sensitivity and specificity
We assessed the performance of cuRRBS predictions in two independent experimental datasets [Lim et al., 2016; Tanas et al., 2017] (see section 4.5).
We ran cuRRBS fixing the theoretical size ranges to those reported in the publications [Lim et al., 2016; Tanas et al., 2017], and we used as our sites of interest the CpGs overlapping with CpG islands (CGI-CpGs) in the human [Tanas et al., 2017] and mouse [Lim et al., 2016] genomes, respectively. From the cuRRBS output files we recovered the IDs of the sites that should theoretically be sequenced. In parallel, using the experimental RRBS data [Lim et al., 2016; Tanas et al., 2017], we obtained the IDs of the sites that were actually sequenced (filtered by a given depth of coverage threshold). We then calculated the following variables for each dataset:
• True positives (TP): number of CGI-CpGs that cuRRBS predicted to be sequenced and that were indeed found in the RRBS data.
• True negatives (TN): number of CGI-CpGs that cuRRBS predicted to be absent and that were not found in the RRBS data.
• False positives (FP): number of CGI-CpGs that cuRRBS predicted to be sequenced but that were not found in the RRBS data.
• False negatives (FN): number of CGI-CpGs that cuRRBS predicted to be absent but that were found in the RRBS data.
Finally, we estimated the sensitivity and specificity for a given dataset as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \cdot 100 \qquad (4.6)$$

$$\text{Specificity} = \frac{TN}{FP + TN} \cdot 100 \qquad (4.7)$$

Software availability
cuRRBS and its documentation are freely distributed under the GNU General Public License v3.0 and can be accessed in my GitHub repository [Martin-Herranz et al., 2017a].

Chapter 5 Final remarks

‘Caminante, son tus huellas el camino, y nada más; caminante, no hay camino: se hace camino al andar.’ [Traveller, your footprints are the path, and nothing more; traveller, there is no path: the path is made by walking.] Antonio Machado [1912]

The purpose of this thesis was to advance our understanding of the epigenetic ageing clock in humans. I now review the main conclusions from this work and propose future directions that could be of interest.
5.1 Statistical aspects

In Chapter 2, I assessed different statistical methods that allowed me to characterise the epigenetic landscape during human physiological ageing. To date, DNA methylation data from blood, generated with the Illumina 450K methylation array platform, is the most abundant epigenetic data type available to study human ageing. I built a dataset of this data type for healthy individuals, pre-processed it and benchmarked different methods to correct for changes in blood cell composition. I reproduced previous findings showing that a large proportion of the epigenome is affected by the ageing process (in my case around 30%, using a conservative threshold to correct for multiple testing). This highlights that the epigenetic ageing clock is a genome-wide phenomenon that extends far beyond the cytosines included in most epigenetic clock models. Furthermore, the small effect sizes suggest that most age-related DNA methylation changes occur in only a small proportion of cells (DNA molecules) in the tissue (around 4% on average over the entire human lifespan). Finally, I tested the behaviour of different epigenetic clocks (Horvath, Hannum, epiTOC) and developed a strategy to correct for potential batch effects in this context.

Current epigenetic clocks use a linear modelling framework. Nevertheless, many changes in methylation values during ageing are non-linear (for example, during organismal growth). Horvath’s clock accounts for this by transforming chronological age, but it would be interesting to model the changes of individual CpG sites before including them in the training. This could also help to identify modules of CpG sites that behave similarly during ageing, and to deconvolve the different processes that shape the epigenetic landscape and may operate at different life stages.
Additional improvements in epigenetic clocks will likely include integrating longitudinal information (which could help to identify different ageing trajectories) [Jensen et al., 2014] and separating the contributions of mutations and epimutations to the methylation signal. Furthermore, it would be interesting to map the shapes of the DNA methylation changes onto the changes in mortality rate at the human population level, thereby creating a link between molecular changes and epidemiological observations. This could be further validated in species with extremely different mortality rate profiles (e.g. the naked mole rat).

Current multi-tissue epigenetic clocks have been trained on all available tissues. Nevertheless, it is reasonable to assume that the strategies used to maintain stable DNA methylation landscapes over time differ significantly between highly proliferative tissues (such as blood) and tissues in which cell division is rare (such as brain). Thus, building separate epigenetic clocks for these two categories of tissues and analysing their genome-wide changes in DNA methylation over time could improve the accuracy of the models and provide further insights into the role of cell division in the epigenetic ageing clock.

There are methods that impute DNA methylation patterns from different genomic features for a ‘static’ epigenome, both at the bulk [Zhang et al., 2015b] and single-cell levels [Angermueller et al., 2017; Kapourani and Sanguinetti, 2019]. Given that the regions that change their DNA methylation during ageing seem to share a common genomic context, it would be interesting to design an imputation algorithm for a ‘dynamic’ epigenome (e.g. given an epigenome at time t, predict what that epigenome would look like at time t + Δt). Furthermore, it would be fascinating to train machine learning models (e.g.
deep neural networks) to predict from the DNA sequence whether a given CpG site or region will change its methylation status with age (and the direction and magnitude of the change). This could give us additional insights into how much of the ageing-related change is hard-coded in the genome and how much the environment and lifestyle contribute to modifying it. Moreover, some of these predictions could be tested by introducing exogenous pieces of DNA into ageing mice.

Developmental disorders are useful biological systems in which to study the effects of altered functions in specific parts of the epigenetic machinery. As such, the analysis presented in Chapter 3 could be expanded into a statistical framework that quantifies how much certain epigenetic functions contribute to the methylation status of specific regions. In other words, one could define epimutational signatures (e.g. epimutational signature 1 would be the consequence of reduced H3K4 methylation in enhancers) that would allow the epigenetic processes behind a specific DNA methylation pattern (e.g. the one caused by ageing or smoking exposure) to be deconvolved.

5.2 Biological aspects

The goal of Chapter 3 was to study how different parts of the epigenetic machinery affect the rate of the epigenetic ageing clock, thus providing the first identified components of the hypothetical epigenetic maintenance system [Horvath, 2013a]. For that purpose, I studied the epigenetic age acceleration observed in patients with developmental disorders, many of whom harbour mutations in proteins of the aforementioned epigenetic machinery. This analysis revealed that mutations in NSD1, an H3K36 methyltransferase, dramatically accelerate epigenetic ageing. The effect sizes observed (on average > 7 years) are larger than those of many of the conditions reported to accelerate the epigenetic ageing clock [Horvath and Raj, 2018].
Importantly, the genomic context in which these changes happen is partially shared with the ageing process. Regions marked by H3K27me3, deposited by Polycomb Repressive Complex 2 (PRC2), were highly enriched for these changes both in ageing and in Sotos syndrome, consistent with previous reports. Interestingly, global DNA hypomethylation (a characteristic of Sotos patients) causes a redistribution of PRC2 and H3K27me3 from their normal targets (many of them developmental genes marked by bivalent chromatin) to other genomic regions, which leads to the aberrant expression of some of these genes [Reddington et al., 2013]. Notably, there is a mechanistic link between PRC2 recruitment and H3K36me3 via the Tudor domains of some Polycomb-like proteins [Cai et al., 2013; Li et al., 2017]. As such, perturbations in the H3K36 methylation landscape would be expected to affect PRC2 activity. Furthermore, methylation of CpG sites in normally unmethylated CpG islands could also lead to a loss of PRC2 binding [Li et al., 2017]. This could be happening in bivalent regions / DNA methylation valleys (DMVs) during ageing and could affect the differentiation of progenitor stem cells in adult tissues. Indeed, this seems to be the case for aged haematopoietic stem cells [Beerman et al., 2013; Sun et al., 2014a], but whether this applies to other tissues remains to be elucidated. DNA methylation changes affecting progenitor stem cells could be propagated in the tissue, thereby contributing substantially to the signal captured by epigenetic clocks. Hence, during ageing, there could be a redistribution of PRC2 from bivalent regions / DMVs to other regions that have become hypomethylated, while de novo DNMT3A/B are relocated in the opposite direction (as shown in Fig. 3.8), leading to a deregulation in the expression of developmental genes.
This model extends, and is overall compatible with, the one proposed by Zheng, Widschwendter and Teschendorff to explain the increase in cancer risk with age [Zheng et al., 2016]. While this could be induced by the rewiring of the H3K36 methylation landscape, direct evidence is needed to ascertain that this is indeed the case during human physiological ageing. As such, it would be interesting to profile H3K36me3 during ageing in different tissues. Differential expression during ageing of genes coding for the H3K36 methylation machinery (both methyltransferases and demethylases) would also be expected (e.g. through hypermethylation of the NSD1 promoter, as observed in human neuroblastoma and glioma cells [Berdasco et al., 2009]). Moreover, a study testing whether cryptic transcription increases during human ageing (as seems to happen in model organisms) could contribute to our understanding of the global functional consequences of these epigenetic changes. Finally, genes with lower levels of H3K36me3 should be more prone to cryptic transcription during ageing [Pu et al., 2015] and could display higher transcriptional heterogeneity between cells.

There is conflicting evidence in the literature on whether NSD1 can also catalyse the methylation of H4K20 in vivo [Berdasco et al., 2009; Kudithipudi et al., 2014]. H4K20me1 is a histone mark highly enriched in telomeres [Enguix et al., 2018] and depletion of H4K20 methylation leads to genomic instability [Sørensen et al., 2013]. This creates another interesting link between telomere biology and the epigenetic ageing clock (as discussed in Chapter 1, TERT genetic variants are associated with epigenetic age acceleration, and TERT expression is required for epigenetic ageing in vitro) [Lu et al., 2018]. It would be worth testing how the epigenetic ageing clock behaves in cancer-resistant mice that constitutively express TERT (which have an extended lifespan) [Tomás-Loba et al., 2008].
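The entropy argument in the next paragraph can be made concrete with the binary Shannon entropy of a methylation level β, H(β) = −β log2 β − (1 − β) log2(1 − β), which is 0 for fully unmethylated or fully methylated sites and reaches its maximum of 1 bit at β = 0.5. This standard per-site definition is used here as an assumption for illustration; the entropy measure actually used in this thesis is the one described in section 3.5.

```python
from math import log2


def methylation_entropy(beta: float) -> float:
    """Binary Shannon entropy (bits) of a methylation level beta in [0, 1]."""
    if beta in (0.0, 1.0):
        return 0.0  # a fully (un)methylated site carries no uncertainty
    return -(beta * log2(beta) + (1 - beta) * log2(1 - beta))


def mean_entropy(betas):
    """Average per-site entropy across a set of CpG sites."""
    return sum(methylation_entropy(b) for b in betas) / len(betas)


young = [0.05, 0.95, 0.10, 0.90]  # made-up, well-defined methylation states
old = [0.25, 0.75, 0.30, 0.70]    # the same sites drifting towards 0.5
print(mean_entropy(young) < mean_entropy(old))
```

Sites drifting towards 0.5 from either side increase the entropy, whether they are gaining or losing methylation, which is why entropy summarises both hypo- and hypermethylation changes in a single number.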
Ageing-related DNA methylation changes generally increase the informational entropy of the system (i.e. the methylation values tend towards 0.5, see section 3.5). It is tempting to speculate that, from a biological point of view, this reflects a dilution of the epigenetic marks that define stable cell types and transcriptional programs, together with an increase in cell-to-cell epigenetic heterogeneity (in other words, an erosion of the Waddingtonian epigenetic landscape). Some authors have suggested that epigenetic information is carried by a population of cells as a whole [Jenkinson et al., 2017; Shipony et al., 2014]. Furthermore, even populations of a single cell type (such as primed ESCs) show oscillations in the methylation values of specific regions, with a particularly high amplitude in enhancers [Rulands et al., 2018] (one of the hotspots of hypomethylation changes during ageing). If such a population were analysed with a bulk DNA methylation method, it would likely display a high methylation entropy in enhancers. Moreover, the higher methylation entropy of the Horvath clock sites could indicate that cytosines displaying this type of metastable state make good predictors. Thus, alterations in the oscillatory behaviour of DNA methylation could be a feature of the epigenetic ageing clock. Such alterations could be caused by changes in the activity or binding of DNMT3s and TETs, which could happen if the H3K36 methylation landscape is altered. In addition, cytosines whose methylation status changes following circadian rhythms have been identified both in mice [Oh et al., 2018] and in humans [Oh et al., 2019]. These cytosines seem to overlap significantly with ageing DMPs, and the amplitude of the circadian oscillations correlates with the magnitude of epigenetic ageing effects.
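The link between cell-to-cell heterogeneity and methylation entropy discussed above can be illustrated with a short sketch. It computes a per-site binary Shannon entropy of β-values, averaged over sites, and applies it to simulated bulk β-values obtained by averaging binary single-cell methylation calls. The entropy definition is a common normalised form for β-values; it is an assumption, not necessarily the exact definition used in section 3.5. The function names and simulation settings are hypothetical.

```python
import numpy as np


def methylation_entropy(beta, eps=1e-12):
    """Average binary Shannon entropy (in bits) over a set of CpG sites.

    Each site with beta-value b contributes -[b*log2(b) + (1-b)*log2(1-b)]:
    0 for fully unmethylated or fully methylated sites, 1 for b = 0.5.
    Values are clipped away from 0 and 1 to avoid log2(0).
    """
    b = np.clip(np.asarray(beta, dtype=float), eps, 1 - eps)
    per_site = -(b * np.log2(b) + (1 - b) * np.log2(1 - b))
    return float(per_site.mean())


def bulk_beta(single_cell_calls):
    """Bulk beta-value per site: the fraction of cells methylated at that site.

    single_cell_calls: (n_cells, n_sites) array of 0/1 methylation calls.
    """
    return np.asarray(single_cell_calls, dtype=float).mean(axis=0)


rng = np.random.default_rng(0)
n_cells, n_sites = 1000, 50

# Hypothetical homogeneous population: every cell methylated at every site.
homogeneous = np.ones((n_cells, n_sites), dtype=int)
# Hypothetical heterogeneous population: each cell is methylated with
# probability 0.5 at each site (e.g. cells caught at different oscillation phases).
heterogeneous = rng.binomial(1, 0.5, size=(n_cells, n_sites))

h_homogeneous = methylation_entropy(bulk_beta(homogeneous))      # close to 0
h_heterogeneous = methylation_entropy(bulk_beta(heterogeneous))  # close to 1
print(h_homogeneous, h_heterogeneous)
```

In this toy setting, every individual cell is fully methylated or fully unmethylated. The heterogeneity of the population alone is enough to push the bulk β-values towards 0.5 and the entropy towards its maximum.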
Intriguingly, oscillatory cytosines were highly enriched in neutrophil-specific enhancers [Oh et al., 2019]. Further studies should therefore explore how the disruption of circadian rhythms during ageing relates to the methylome and to the epigenetic ageing clock. Mechanistic advances will require testing these ideas in the mouse. First, it would be interesting to confirm whether the effects of heterozygous loss-of-function mutations in NSD1 are evolutionarily conserved, using the mouse multi-tissue epigenetic clock, and to test whether they affect the lifespan of these mice. Moreover, one of the remaining questions is whether the DNA methylation changes associated with the epigenetic ageing clock are functional at all. Epigenome editing technologies [Liu et al., 2016] could help to answer this question. Additionally, it would be of interest to test how conserved these mechanisms are beyond mammals (e.g. in the African turquoise killifish) and whether they behave differently in species with remarkable longevity (such as the naked mole rat).
5.3 Technological aspects
In Chapter 4, we created a computational method (cuRRBS) to optimise the enrichment of specific sets of genomic sites through the combinatorial use of restriction enzymes. This could potentially be applied to make future epigenetic clocks more cost-effective, especially if they are composed of several hundreds or thousands of sites. Furthermore, epigenetic clocks are statistically degenerate: many different combinations of sites can yield similarly accurate age predictions. New models could therefore be trained taking into account the most cost-effective combinations of sites. Reductions in assay cost could lead to the wide adoption of DNA methylation-based biomarkers for high-throughput drug screening. From a research point of view, it is fundamental that we expand our analysis beyond the biased regions covered by the Illumina methylation array.
Therefore, whole-genome bisulfite sequencing of ageing samples should become more common, allowing us to characterise the changes in the epigenetic landscape at higher resolution. This will likely become a reality thanks to the rapid drop in sequencing costs and the development of bisulfite-free methods that improve mapping rates [Liu et al., 2019]. Furthermore, it remains to be seen whether the DNA methylation changes observed during ageing occur in all cell types in the tissue, or whether changes in the proportions of specific cell types (e.g. progenitor stem cells) or clones are responsible for them. In this sense, single-cell technologies (especially those that profile the transcriptome and the epigenome simultaneously) and lineage tracing will become instrumental for future mechanistic advances on the epigenetic ageing clock [Kelsey et al., 2017].
Appendix
Supplementary figures
S.1 Supplementary for chapter 2
[Figure S1.1: four density plots (panels a to d); x-axes: raw Mi, raw Ui, background-corrected Mi and background-corrected Ui intensities (0 to 30,000); y-axis: density; legend: Failed QC, Passed QC]
Fig. S1.1 Effects of noob background correction on the array fluorescence intensities. Distributions of the array fluorescence intensities for the a. methylated signals (Mi) before background correction; b. unmethylated signals (Ui) before background correction; c. methylated signals (Mi) after background correction and d. unmethylated signals (Ui) after background correction. Each curve represents a DNA methylation sample from the GSE41273 batch. In grey: 51 samples that passed quality control (QC). In red: 2 samples that failed QC.
[Figure S1.2: scatterplot; x-axis: median{log2 Mi}; y-axis: median{log2 Ui} (both 9 to 13); legend: Failed QC, Passed QC]
Fig. S1.2 Quality control (QC) strategy to identify outlier samples in the GSE41273 batch, according to their global intensity values. Samples with low median intensity values (see criteria in section 2.1.2) were discarded from downstream analyses (2/53, in red). Each sample is represented by one point. The dashed line represents the intensity threshold. Mi and Ui represent the background-corrected methylated and unmethylated intensity measurements for the different 450K array probes in a given sample.
[Figure S1.3: density plot; x-axis: M-value (-5 to 5); y-axis: density; legend: Passed QC]
Fig. S1.3 M-value distributions in the samples of the GSE41273 batch, after all the pre-processing steps have been carried out (background correction, quality control, probe filtering and BMIQ normalisation). M-values were calculated by applying the logit transformation, M = log2(β/(1 - β)), to the β-values, as described in Du et al. [2010]. Each curve represents a different sample.
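The QC criterion of Fig. S1.2 and the β-to-M conversion of Fig. S1.3 can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in this thesis. Each sample is summarised by the mean of its median log2 methylated and unmethylated intensities and flagged if it falls below a cutoff. The cutoff value is a hypothetical placeholder for the criterion in section 2.1.2. β-values are clipped away from 0 and 1 before the logit transformation; this clipping is also an assumption, since Du et al. [2010] instead add an offset to the intensities.

```python
import numpy as np


def intensity_qc(meth, unmeth, cutoff):
    """Return True for samples that pass a median-intensity QC filter.

    meth, unmeth: (n_probes, n_samples) background-corrected methylated (Mi)
    and unmethylated (Ui) intensities. A sample passes if the mean of its
    median log2(Mi) and median log2(Ui) is at least `cutoff`
    (a hypothetical threshold; section 2.1.2 defines the actual criterion).
    """
    med_log2_m = np.median(np.log2(meth), axis=0)
    med_log2_u = np.median(np.log2(unmeth), axis=0)
    return (med_log2_m + med_log2_u) / 2 >= cutoff


def beta_to_m(beta, eps=1e-6):
    """Logit transformation of beta-values: M = log2(beta / (1 - beta)).

    Beta-values are clipped to [eps, 1 - eps] so that 0 and 1 map to
    finite M-values (an assumption of this sketch).
    """
    b = np.clip(np.asarray(beta, dtype=float), eps, 1 - eps)
    return np.log2(b / (1 - b))


# Toy intensities for two samples (3 probes each); sample 1 is dim overall.
meth = np.array([[2000.0, 100.0], [3000.0, 200.0], [2500.0, 150.0]])
unmeth = np.array([[1500.0, 120.0], [2500.0, 80.0], [2200.0, 100.0]])
passed = intensity_qc(meth, unmeth, cutoff=10.5)  # sample 0 passes, sample 1 fails
m_values = beta_to_m([0.2, 0.5, 0.8])            # approximately [-2, 0, 2]
```

The M-value is symmetric around β = 0.5 (M = 0) and stretches the extremes of the β scale. This is why its distributions in Fig. S1.3 span roughly -5 to 5.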
Strategy name | Reference | Gold-standard preprocessing | Reference preprocessing | Probes in reference | Algorithm | Mean RMSE | Mean MAE | Mean R²
minfi | minfi | SQN* | SQN* | 600 | Houseman CP/QP | 2.3246 | 2.0137 | 0.9473
dhs_dif1_houseman | DHS-DMCs | Noob+BMIQ | Default | 333 | Houseman CP/QP | 4.8039 | 3.843 | 0.7783
dhs_NB_houseman | DHS-DMCs | Noob+BMIQ | Noob+BMIQ | 333 | Houseman CP/QP | 4.9398 | 4.1559 | 0.8062
dhs_dif2_houseman | DHS-DMCs | Noob+Filtering+BMIQ | Default | 316 | Houseman CP/QP | 6.1731 | 5.2469 | 0.7779
dhs_NFB_houseman | DHS-DMCs | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 316 | Houseman CP/QP | 6.1194 | 5.3185 | 0.7816
dhs_dif1_cibersort | DHS-DMCs | Noob+BMIQ | Default | 333 | CIBERSORT | 2.3914 | 1.9502 | 0.8702
dhs_NB_cibersort | DHS-DMCs | Noob+BMIQ | Noob+BMIQ | 333 | CIBERSORT | 2.8578 | 2.3833 | 0.8453
dhs_dif2_cibersort | DHS-DMCs | Noob+Filtering+BMIQ | Default | 316 | CIBERSORT | 2.9751 | 2.4714 | 0.8552
dhs_NFB_cibersort | DHS-DMCs | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 316 | CIBERSORT | 3.0684 | 2.5403 | 0.8571
dhs_dif1_rpc | DHS-DMCs | Noob+BMIQ | Default | 333 | RPC | 2.0421 | 1.7032 | 0.8873
dhs_NB_rpc | DHS-DMCs | Noob+BMIQ | Noob+BMIQ | 333 | RPC | 2.5289 | 2.1689 | 0.8705
dhs_dif2_rpc | DHS-DMCs | Noob+Filtering+BMIQ | Default | 316 | RPC | 2.9653 | 2.3887 | 0.8722
dhs_NFB_rpc | DHS-DMCs | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 316 | RPC | 3.0755 | 2.5266 | 0.8611
idol_NB_houseman | IDOL | Noob+BMIQ | Noob+BMIQ | 300 | Houseman CP/QP | 2.0347 | 1.6778 | 0.9632
idol_NFB_houseman | IDOL | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 281 | Houseman CP/QP | 1.927 | 1.5498 | 0.9672
idol_NB_cibersort | IDOL | Noob+BMIQ | Noob+BMIQ | 300 | CIBERSORT | 2.1997 | 1.7958 | 0.9626
idol_NFB_cibersort | IDOL | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 281 | CIBERSORT | 1.9818 | 1.6216 | 0.9704
idol_NB_rpc | IDOL | Noob+BMIQ | Noob+BMIQ | 300 | RPC | 2.26 | 1.8812 | 0.9679
idol_NFB_rpc | IDOL | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 281 | RPC | 2.0122 | 1.6288 | 0.9692
Fig. S1.4 Table showing the different cell-type deconvolution strategies that were benchmarked. BMIQ: beta-mixture quantile normalisation. CP/QP: constrained projection/quadratic programming.
MAE: mean absolute error. Noob: noob background correction. R²: coefficient of determination. RMSE: root mean squared error. RPC: robust partial correlations. SQN: stratified quantile normalisation. 'Default' refers to the pre-processing strategy employed in the original DHS-DMCs publication, as implemented in the EpiDISH R package (centDHSbloodDMC.m) [Teschendorff et al., 2017; Teschendorff and Zheng, 2017b]. See section 2.1.3 in the main text for more details on the different references.
[Figure S1.5: R² (y-axis, 0.25 to 1.00) for each cell-type deconvolution strategy listed in Fig. S1.4 (x-axis), coloured by cell type: B, CD4T, CD8T, Gran, Mono, NK]
Fig. S1.5 Benchmarking of the cell-type deconvolution strategies in blood. The x-axis shows the different strategies that were tested (for a detailed description see Fig. S1.4). The y-axis shows the coefficient of determination (R²) when comparing the predictions with the real proportions of cells in a gold-standard dataset (GSE77797) [Koestler et al., 2016]. The grey horizontal solid lines represent the mean R² across cell types and the grey dashed line the maximum of these values.
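Houseman CP/QP, CIBERSORT and RPC all regress a sample's β-values on a reference matrix of cell-type-specific methylation profiles. They differ in how they constrain or robustify that regression. The sketch below is a simplified stand-in: it is not any of the three benchmarked algorithms and not the code used here. It estimates proportions by non-negative least squares with `scipy.optimize.nnls`, rescales them to sum to one, and computes per-cell-type RMSE, MAE and R² against known proportions, as in Figs. S1.4 and S1.5. The toy reference matrix and the squared-Pearson definition of R² are assumptions. If the errors in Fig. S1.4 are expressed in percentage points (also an assumption), the proportions would need to be multiplied by 100 first.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(beta, reference):
    """Estimate cell-type proportions by non-negative least squares.

    beta: (n_probes, n_samples) beta-values of bulk samples.
    reference: (n_probes, n_cell_types) mean beta-value of each purified
    cell type at the same probes.
    Returns (n_samples, n_cell_types) proportions rescaled to sum to 1.
    """
    proportions = []
    for j in range(beta.shape[1]):
        coef, _residual_norm = nnls(reference, beta[:, j])
        total = coef.sum()
        proportions.append(coef / total if total > 0 else coef)
    return np.array(proportions)


def benchmark(pred, truth):
    """Per-cell-type RMSE, MAE and R^2 against known proportions.

    R^2 is computed here as the squared Pearson correlation between
    predicted and true proportions across samples (an assumption).
    """
    err = pred - truth
    rmse = np.sqrt(np.mean(err ** 2, axis=0))
    mae = np.mean(np.abs(err), axis=0)
    r2 = np.array([np.corrcoef(pred[:, k], truth[:, k])[0, 1] ** 2
                   for k in range(truth.shape[1])])
    return rmse, mae, r2


# Toy example with three hypothetical cell types and 30 probes.
rng = np.random.default_rng(1)
reference = rng.uniform(0, 1, size=(30, 3))
truth = rng.dirichlet(np.ones(3), size=20)  # 20 samples, each row sums to 1
beta = reference @ truth.T + rng.normal(0, 0.01, size=(30, 20))
pred = deconvolve(beta, reference)
rmse, mae, r2 = benchmark(pred, truth)
```

The final rescaling enforces the sum-to-one constraint after the fit rather than inside it. The constrained quadratic programming of the Houseman method builds this kind of constraint into the optimisation directly.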
ProbeID | Chromosome | Coordinate | Intercept | Slope | T statistic | p-value | Methylation change | In Horvath model | Gene(s)
cg16867657 | chr6 | 11044877 | 0.5458189 | 0.0053562 | 96.7079 | 0 | Hypermethylated | No | ELOVL2
cg06639320 | chr2 | 106015739 | -0.18099 | 0.0040751 | 68.4826 | 0 | Hypermethylated | No | FHL2
cg21572722 | chr6 | 11044894 | 0.4485118 | 0.0029979 | 67.7891 | 0 | Hypermethylated | No | ELOVL2
cg22454769 | chr2 | 106015767 | -0.37256 | 0.0054721 | 65.4459 | 0 | Hypermethylated | No | FHL2
cg07547549 | chr20 | 44658225 | -0.109895 | 0.0039332 | 60.4444 | 0 | Hypermethylated | No | SLC12A5
cg24724428 | chr6 | 11044888 | 0.1715795 | 0.003787 | 60.3559 | 0 | Hypermethylated | No | ELOVL2
cg17110586 | chr19 | 36454623 | -0.076933 | 0.0027991 | 59.6101 | 0 | Hypermethylated | No |
cg19283806 | chr18 | 66389420 | 1.1244081 | -0.0052494 | -55.5368 | 0 | Hypomethylated | No | CCDC102B
cg10501210 | chr1 | 207997020 | -0.767615 | -0.0071941 | -54.848 | 0 | Hypomethylated | No |
cg24079702 | chr2 | 106015771 | -0.239806 | 0.0037027 | 54.5055 | 0 | Hypermethylated | No | FHL2
cg22796704 | chr10 | 49673534 | 0.5923358 | -0.0038938 | -54.2818 | 0 | Hypomethylated | No | ARHGAP22
cg04875128 | chr15 | 31775895 | -0.29584 | 0.0048949 | 53.8691 | 0 | Hypermethylated | No | OTUD7A
cg23606718 | chr2 | 131513927 | -0.192302 | 0.0024361 | 53.8427 | 0 | Hypermethylated | No | FAM123C
cg00059225 | chr5 | 151304357 | 0.2564821 | 0.0023987 | 52.8361 | 0 | Hypermethylated | No | GLRA1
cg23500537 | chr5 | 140419819 | 0.2019473 | 0.0029768 | 52.4657 | 0 | Hypermethylated | No |
cg07553761 | chr3 | 160167977 | -0.085898 | 0.0030009 | 52.1708 | 0 | Hypermethylated | No | TRIM59
cg14674720 | chr2 | 219827930 | -0.15175 | 0.0022723 | 52.1475 | 0 | Hypermethylated | No |
cg16419235 | chr8 | 57360613 | -0.110675 | 0.0021004 | 52.087 | 0 | Hypermethylated | No | PENK
cg07082267 | chr16 | 85429035 | -0.234831 | -0.0024153 | -51.9394 | 0 | Hypomethylated | No |
cg11970349 | chr4 | 8582287 | 0.4395301 | 0.0024517 | 51.7603 | 0 | Hypermethylated | No | GPR78
cg14556683 | chr19 | 15342982 | -0.354214 | 0.0030292 | 51.4444 | 0 | Hypermethylated | No | EPHX3
cg06493994 | chr6 | 25652602 | -0.281467 | 0.0018639 | 51.2747 | 0 | Hypermethylated | Yes | SCGN
cg19560758 | chr1 | 8086721 | 0.123634 | 0.0017654 | 51.0739 | 0 | Hypermethylated | No | ERRFI1
cg22736354 | chr6 | 18122719 | -0.328228 | 0.0023877 | 50.7215 | 0 | Hypermethylated | Yes | NHLRC1
cg17885226 | chr6 | 105388731 | -0.011797 | 0.0030608 | 50.2096 | 0 | Hypermethylated | No |
cg08262002 | chr4 | 16575323 | 0.448234 | -0.0036267 | -50.1807 | 0 | Hypomethylated | No | LDB2
cg18933331 | chr1 | 110186418 | 0.1394501 | -0.0026901 | -49.3592 | 0 | Hypomethylated | No |
cg00329615 | chr3 | 118706648 | 0.3767479 | -0.0049889 | -49.1687 | 0 | Hypomethylated | No | IGSF11
cg08097417 | chr7 | 130419133 | -0.212277 | 0.0018305 | 48.9874 | 0 | Hypermethylated | No | KLF14
cg00748589 | chr12 | 11653486 | 0.1822405 | 0.0024207 | 48.2695 | 0 | Hypermethylated | No |
cg11084334 | chr3 | 9594264 | -0.022951 | 0.0027848 | 47.6682 | 0 | Hypermethylated | No | LHFPL4
cg11071401 | chr17 | 48637194 | 0.3081191 | 0.0023875 | 47.6374 | 0 | Hypermethylated | No | CACNA1G
cg06784991 | chr1 | 53308768 | 0.0728526 | 0.0021442 | 47.4979 | 0 | Hypermethylated | No | ZYG11A
cg00439658 | chr17 | 72848669 | -0.187047 | 0.0019148 | 47.3396 | 0 | Hypermethylated | No | GRIN2C
cg16054275 | chr1 | 169556022 | -0.308762 | -0.0031404 | -47.2773 | 0 | Hypomethylated | No | F5
cg14692377 | chr17 | 28562685 | -0.319816 | 0.0019735 | 47.2725 | 0 | Hypermethylated | No | SLC6A4
cg13649056 | chr9 | 136474626 | 0.0939199 | 0.0018608 | 47.0121 | 0 | Hypermethylated | No |
cg11693709 | chr15 | 40542019 | 0.4398948 | -0.0041179 | -46.6849 | 0 | Hypomethylated | No | PAK6
cg07080372 | chr11 | 796607 | -0.044385 | -0.0020517 | -46.5748 | 0 | Hypomethylated | No | SLC25A22
cg19671120 | chr2 | 98962974 | 0.2917162 | 0.0019275 | 46.5463 | 0 | Hypermethylated | No | CNGA3
cg16219603 | chr8 | 57360586 | -0.243393 | 0.001599 | 46.4953 | 0 | Hypermethylated | No | PENK
cg11705975 | chr10 | 120354248 | 0.1345631 | 0.0025062 | 46.1335 | 0 | Hypermethylated | No | PRLHR
cg15480367 | chr14 | 93389485 | 0.1737257 | 0.0020641 | 46.1196 | 0 | Hypermethylated | No | CHGA
cg24466241 | chr1 | 53308908 | -0.192473 | 0.0028258 | 45.9054 | 5.9288E-323 | Hypermethylated | No | ZYG11A
cg02650266 | chr4 | 147558239 | -0.028284 | 0.0018604 | 45.5452 | 2.5444E-319 | Hypermethylated | No |
cg03738025 | chr6 | 105388694 | 0.1325219 | 0.0037303 | 45.5435 | 2.6480E-319 | Hypermethylated | No |
cg08160331 | chr11 | 75140865 | 0.1225186 | 0.0024513 | 45.5115 | 5.5982E-319 | Hypermethylated | No | KLHL35
cg14361627 | chr7 | 130419116 | -0.029613 | 0.0024426 | 45.4145 | 5.4238E-318 | Hypermethylated | No | KLF14
cg08128734 | chr1 | 206685423 | 0.5891423 | -0.0054386 | -45.0487 | 2.8384E-314 | Hypomethylated | No | RASSF5
cg26290632 | chr8 | 91094847 | 0.2029635 | 0.0020152 | 45.0401 | 3.4695E-314 | Hypermethylated | No | CALB1
cg01974375 | chr1 | 151298954 | 0.0385361 | -0.0019059 | -45.0297 | 4.4226E-314 | Hypomethylated | No | PI4KB
cg23479922 | chr5 | 16179633 | -0.5691 | 0.0045894 | 44.9595 | 2.2879E-313 | Hypermethylated | No | MARCH11
cg09809672 | chr1 | 236557682 | 0.175291 | -0.0040059 | -44.8504 | 2.9374E-312 | Hypomethylated | Yes | EDARADD
cg00481951 | chr3 | 187387650 | 0.1841224 | 0.0023342 | 44.6878 | 1.3200E-310 | Hypermethylated | No | SST
cg03545227 | chr2 | 220173100 | 0.0832971 | 0.0013552 | 44.5825 | 1.5491E-309 | Hypermethylated | No | PTPRN
cg18618815 | chr17 | 48275324 | -0.292108 | -0.0031805 | -44.5025 | 1.0061E-308 | Hypomethylated | No | COL1A1
cg11649376 | chr12 | 81473234 | 0.1177648 | -0.0025894 | -44.4751 | 1.9099E-308 | Hypomethylated | No | ACSS3
cg11436113 | chr20 | 19191145 | -0.245529 | -0.0028774 | -44.446 | 3.7798E-308 | Hypomethylated | No |
cg20591472 | chr1 | 110008990 | 0.2290873 | 0.0029438 | 44.3726 | 2.1018E-307 | Hypermethylated | No | SYPL2
cg12757011 | chr2 | 162281111 | -0.036861 | 0.0022385 | 44.3402 | 4.4864E-307 | Hypermethylated | No | TBR1
cg06570224 | chr3 | 157812475 | -0.255113 | 0.0021525 | 44.3003 | 1.1387E-306 | Hypermethylated | No |
cg12878812 | chr12 | 119419696 | -0.152434 | 0.0017975 | 44.1946 | 1.3495E-305 | Hypermethylated | No | SRRM4
cg07931844 | chr15 | 72102213 | -0.347225 | -0.0020941 | -44.1556 | 3.363E-305 | Hypomethylated | No | NR2E3
cg15341124 | chr14 | 102027734 | 0.1822515 | 0.0021014 | 43.8202 | 8.5279E-302 | Hypermethylated | No | DIO3; MIR1247
cg12534424 | chr7 | 127992316 | -0.038607 | 0.0019362 | 43.5602 | 3.7086E-299 | Hypermethylated | No | PRRT4
cg25410668 | chr1 | 28241577 | 0.5378571 | 0.0033963 | 43.5204 | 9.4093E-299 | Hypermethylated | No | RPA2
cg19392831 | chr10 | 120355756 | 0.1002692 | 0.0017162 | 43.3469 | 5.4065E-297 | Hypermethylated | No | PRLHR
cg16008966 | chr1 | 114761794 | 0.2872323 | -0.0024427 | -43.054 | 5.0499E-294 | Hypomethylated | No |
cg05308819 | chr1 | 155959156 | -0.383566 | -0.0018965 | -43.0379 | 7.3568E-294 | Hypomethylated | No |
cg08468401 | chr3 | 14303131 | -0.481126 | -0.0045074 | -43.0226 | 1.0497E-293 | Hypomethylated | No |
cg19855470 | chr22 | 40060836 | -0.111118 | 0.0015512 | 42.913 | 1.3565E-292 | Hypermethylated | No | CACNA1I
cg11220950 | chr16 | 2042693 | 0.0102849 | 0.0019377 | 42.8543 | 5.3374E-292 | Hypermethylated | No | SYNGR3
cg16717122 | chr15 | 51973920 | 0.3252301 | 0.00151 | 42.8415 | 7.1833E-292 | Hypermethylated | No | SCG3
cg22156456 | chr17 | 39844239 | -0.229764 | -0.0018499 | -42.8279 | 9.8668E-292 | Hypomethylated | No | EIF1
cg06335143 | chr1 | 53308654 | -0.088651 | 0.0022272 | 42.8111 | 1.4619E-291 | Hypermethylated | No | ZYG11A
cg23746497 | chr6 | 105388668 | 0.072451 | 0.0034686 | 42.7311 | 9.4375E-291 | Hypermethylated | No |
cg08234504 | chr5 | 139013317 | -0.235634 | -0.0015863 | -42.72 | 1.2233E-290 | Hypomethylated | No |
cg24436906 | chr2 | 242498081 | 0.4803492 | 0.0019615 | 42.6333 | 9.2401E-290 | Hypermethylated | No | BOK
cg13848598 | chr10 | 115804578 | -0.111233 | 0.0024786 | 42.4955 | 2.2983E-288 | Hypermethylated | No | ADRB1
cg10804656 | chr10 | 22623460 | -0.950746 | 0.0028943 | 42.4594 | 5.3272E-288 | Hypermethylated | No |
cg13135455 | chr2 | 241860318 | 0.0059196 | -0.0022231 | -42.4071 | 1.8043E-287 | Hypomethylated | No |
cg23078123 | chr1 | 68577796 | 0.759047 | -0.0026555 | -42.3732 | 3.9744E-287 | Hypomethylated | No | GPR177
cg13327545 | chr10 | 22623548 | -0.358846 | 0.0022651 | 42.3019 | 2.0954E-286 | Hypermethylated | No |
cg03431918 | chr17 | 77716367 | 0.1575907 | -0.0017119 | -42.2827 | 3.2734E-286 | Hypomethylated | No |
cg01820374 | chr12 | 6882083 | -0.47997 | -0.0022168 | -42.2819 | 3.3323E-286 | Hypomethylated | Yes | LAG3
cg20747538 | chr3 | 137838021 | -0.227794 | -0.0019417 | -42.2727 | 4.1287E-286 | Hypomethylated | No |
cg27320127 | chr2 | 47798396 | 0.3532211 | 0.0019054 | 42.2074 | 1.8912E-285 | Hypermethylated | No | KCNK12
cg20273670 | chr17 | 21356245 | -0.202763 | 0.0032538 | 42.1546 | 6.4709E-285 | Hypermethylated | No |
cg19702785 | chr20 | 43727089 | -0.307403 | 0.0016088 | 42.1542 | 6.5405E-285 | Hypermethylated | No | KCNS1
cg14583999 | chr3 | 10019040 | 0.051048 | -0.0038329 | -42.1149 | 1.6328E-284 | Hypomethylated | No | TMEM111
cg01844642 | chr3 | 51989764 | -0.160677 | 0.0021369 | 42.1066 | 1.9788E-284 | Hypermethylated | No | GPR62
cg00602811 | chr2 | 145278564 | -0.192604 | -0.0038479 | -42.1046 | 2.0743E-284 | Hypomethylated | No | ZEB2
cg01770755 | chr15 | 41914122 | -0.106172 | 0.0017079 | 42.0334 | 1.089E-283 | Hypermethylated | No |
cg00484358 | chr1 | 110610995 | 0.2396367 | 0.0016647 | 42.0065 | 2.0361E-283 | Hypermethylated | No | ALX3
cg18064714 | chr7 | 20824556 | -0.082174 | 0.00167 | 41.9065 | 2.0891E-282 | Hypermethylated | No | SP8
cg16512661 | chr5 | 2743620 | 0.2799574 | 0.0020114 | 41.717 | 1.7193E-280 | Hypermethylated | No |
cg11741201 | chr11 | 35638398 | -0.069447 | -0.0023228 | -41.523 | 1.5688E-278 | Hypomethylated | No | FJX1
cg22016779 | chr2 | 230452311 | -0.370728 | -0.0023361 | -41.4895 | 3.4156E-278 | Hypomethylated | No | DNER
cg18473521 | chr12 | 54448265 | 0.1111276 | 0.0041993 | 41.3931 | 3.2188E-277 | Hypermethylated | No | HOXC4
cg01528542 | chr12 | 81468232 | -0.352352 | -0.0036075 | -41.3691 | 5.6171E-277 | Hypomethylated | No |
Fig. S1.6 Table showing the characteristics of the top 100 differentially methylated positions during ageing (aDMPs) in the blood of healthy individuals, ordered by p-value and the absolute value of the T statistic. The chromosome and coordinate refer to the hg19 human genome assembly. The reported genes are the closest genes associated with the array probe, as specified by the 450K array annotation. In this case, cell composition correction (CCC) was applied during modelling (see section 2.1.4).
[Figure S1.7: MAE in control (y-axis) against number of PCs (x-axis, 0 to 100) for four correction settings (CCC: No, Batch: No; CCC: No, Batch: Yes; CCC: Yes, Batch: No; CCC: Yes, Batch: Yes); annotations: optimal number of PCs: 11; optimal mean MAE: 2.7485; background correction: none]
Fig. S1.7 Plot showing how the median absolute error (MAE) of the prediction in the healthy individual samples, which should tend to zero, decreases when the PCs capturing the technical variation are included as part of the modelling strategy (see equations 2.16 and 2.17). The dashed line represents the optimal number of PCs (11) that was finally used. The optimal mean MAE is calculated as the average MAE between the green and purple lines.
In this case, no background correction was applied to the methylation data before calculating the epigenetic ages according to Horvath's epigenetic clock [Horvath, 2013a].
[Figure S1.8: EAA without CCC (years, y-axis) for each batch (x-axis: Europe, Feb_2016, GSE104812, GSE111629, GSE40279, GSE41273, GSE42861, GSE51032, GSE55491, GSE59065, GSE61496, GSE74432, GSE81961, GSE97362); a. batch effect correction: FALSE, MAE: 3.273; b. batch effect correction: TRUE, MAE: 2.8211]
Fig. S1.8 Correcting for batch effects in the context of the epigenetic clock. a. Distribution of the epigenetic age acceleration (EAA) for the different batches of healthy individual samples, using the control model without cell composition correction (CCC) and before applying batch effect correction. The dashed black line marks EAA = 0, around which the distributions should be centred. b. As in a., but after applying batch effect correction (i.e. equivalent to equation 2.17).
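The intercept, slope, T statistic and p-value reported for each CpG in Fig. S1.6 are the standard outputs of a linear regression of methylation on age. Below is a minimal sketch of such a per-CpG model. It uses ordinary least squares with optional covariates, which stand in for the cell composition correction. The exact model of section 2.1.4 is not reproduced here (covariates, batch terms and any transformation of the β-values). The function name and toy data are hypothetical.

```python
import numpy as np
from scipy import stats


def age_association(beta, age, covariates=None):
    """Per-CpG ordinary least squares of beta-value on age.

    beta: (n_samples, n_cpgs) beta-values; age: (n_samples,) ages in years;
    covariates: optional (n_samples, n_covariates) array, e.g. estimated
    cell-type proportions.
    Returns per-CpG intercept, age slope, t-statistic and two-sided p-value.
    """
    age = np.asarray(age, dtype=float)
    columns = [np.ones_like(age), age]
    if covariates is not None:
        columns.extend(np.asarray(covariates, dtype=float).T)
    X = np.column_stack(columns)
    coef, _, rank, _ = np.linalg.lstsq(X, beta, rcond=None)
    residuals = beta - X @ coef
    dof = X.shape[0] - rank
    sigma2 = np.sum(residuals ** 2, axis=0) / dof
    # Standard error of the age coefficient (column 1 of the design matrix).
    se_slope = np.sqrt(sigma2 * np.linalg.pinv(X.T @ X)[1, 1])
    t_stat = coef[1] / se_slope
    p_value = 2 * stats.t.sf(np.abs(t_stat), dof)
    return coef[0], coef[1], t_stat, p_value


# Toy data: site 0 gains methylation with age, site 1 loses it,
# and site 2 does not change with age.
rng = np.random.default_rng(2)
age = rng.uniform(20, 80, size=200)
beta = np.column_stack([
    0.3 + 0.004 * age,
    0.8 - 0.003 * age,
    np.full(200, 0.5),
]) + rng.normal(0, 0.02, size=(200, 3))
intercept, slope, t_stat, p_value = age_association(beta, age)
```

For the strongest associations, the two-sided p-value can fall below the smallest positive double-precision number and underflow to exactly 0. This is a likely reason why several entries in Fig. S1.6 are reported as 0.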
[Figure S1.9: scatterplot of cases; x-axis: PC1 (68.99%); y-axis: PC2 (10.33%); colour: batch (Europe, Feb_2016, GSE116300, GSE41273, GSE55491, GSE74432, GSE97362, Jun_2015, Mar_2014, May_2015, May_2016, Nov_2015, Oct_2014)]
Fig. S1.9 Scatterplot showing the values of the first two principal components (PCs) for the samples with developmental disorders (cases, see Chapter 3) after performing PCA on the control probes of the 450K arrays. Each point corresponds to a different sample and the colours represent the different batches. Samples from the same batch cluster together in the PCA space, showing that the control probes indeed capture technical variation.
Note that all the PCA calculations were done using samples from both healthy individuals (full lifespan, N = 2218) and cases from developmental disorders (N = 666).
[Figure S1.10: % variance (y-axis) against principal components (PCs, x-axis); series: cumulative % variance, % variance for PC]
Fig. S1.10 Plot showing the percentages of technical variance explained by the different PCs from the control probes. The dashed line represents the optimal number of PCs (17) that was finally used.
S.2 Supplementary for chapter 3
Batch name | N♀ | N♂ | N | Median age (years) | Other comments
Europe | 0 | 119 | 119 | 7.73 |
Feb_2016 | 20 | 20 | 40 | 6 |
GSE116300 | 4 | 5 | 9 | 3 |
GSE41273 | 0 | 9 | 9 | 7.75 |
GSE74432 | 11 | 16 | 27 | 10 |
GSE97362 | 4 | 9 | 13 | 15 | Samples from the 'validation cohort' were not included in the analysis, since on close examination they all appeared to be outliers
Jun_2015 | 1 | 1 | 2 | 3.5015 |
Mar_2014 | 11 | 6 | 17 | 8 |
May_2015 | 17 | 49 | 66 | 14 |
Nov_2015 | 35 | 30 | 65 | 6.7 |
Total | 103 | 264 | 367 | 8 | -
Table S2.1 Overview of the blood DNA methylation dataset from individuals with developmental disorders. The batches 'Europe', 'Feb_2016', 'Jun_2015', 'Mar_2014', 'May_2015' and 'Nov_2015' were generated in-house by our collaborators in Canada (see Chapter 3). The rest of the batches were downloaded from GEO [Edgar et al., 2002]. N♀: number of samples from females. N♂: number of samples from males. N: total number of samples. These numbers correspond to the samples left after applying quality control and filtering (see section 3.2).
Batch name | Developmental disorder | Gene | Mutation (DNA) | Mutation (protein) | Mutation effect | Pathogenic | Sex | Age (years) | DNAmAge
Europe | ASD | NA | NA | NA | NA | NA | Male | 23.25 | 29.94120469
Europe | ASD | NA | NA | NA | NA | NA | Male | 25.75 | 23.66579727
Europe | ASD | NA | NA | NA | NA | NA | Male | 23.75 | 22.89490773
Europe | ASD | NA | NA | NA | NA | NA | Male | 26.58 | 31.33521081
Europe | ASD | NA | NA | NA | NA | NA | Male | 11.83 | 13.55540994
Europe | ASD | NA | NA | NA | NA | NA | Male | 12.33 | 12.62567804
Europe | ASD | NA | NA | NA | NA | NA | Male | 11.67 | 11.91444556
Europe | ASD | NA | NA | NA | NA | NA | Male | 12.67 | 15.1433583
Europe | ASD | NA | NA | NA | NA | NA | Male | 15.92 | 20.69231419
Europe | ASD | NA | NA | NA | NA | NA | Male | 16.92 | 18.37736076
Europe | ASD | NA | NA | NA | NA | NA | Male | 15.92 | 14.74270021
Europe | ASD | NA | NA | NA | NA | NA | Male | 19 | 28.69942806
Europe | ASD | NA | NA | NA | NA | NA | Male | 16.75 | 20.84761017
Europe | ASD | NA | NA | NA | NA | NA | Male | 20.16 | 17.69509361
Europe | ASD | NA | NA | NA | NA | NA | Male | 12.92 | 18.28693655
Europe | ASD | NA | NA | NA | NA | NA | Male | 13.25 | 12.24924728
Europe | ASD | NA | NA | NA | NA | NA | Male | 13 | 15.27709141
Europe | ASD | NA | NA | NA | NA | NA | Male | 13.25 | 15.93247357
Europe | ASD | NA | NA | NA | NA | NA | Male | 13.16 | 17.97126245
Europe | ASD | NA | NA | NA | NA | NA | Male | 13.67 | 18.5985271
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.67 | 9.834525429
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.92 | 8.819610809
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.73 | 10.53639331
Europe | ASD | NA | NA | NA | NA | NA | Male | 8 | 8.782413174
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.83 | 8.331080792
Europe | ASD | NA | NA | NA | NA | NA | Male | 8 | 8.412508081
Europe | ASD | NA | NA | NA | NA | NA | Male | 10.83 | 12.94110542
Europe | ASD | NA | NA | NA | NA | NA | Male | 11.5 | 16.52427744
Europe | ASD | NA | NA | NA | NA | NA | Male | 10.83 | 9.546814402
Europe | ASD | NA | NA | NA | NA | NA | Male | 11.5 | 10.75219435
Europe | ASD | NA | NA | NA | NA | NA | Male | 10.83 | 11.7226536
Europe | ASD | NA | NA | NA | NA | NA | Male | 6 | 8.750320884
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.75 | 8.069349936
Europe | ASD | NA | NA | NA | NA | NA | Male | 6 | 8.205893972
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.83 | 8.765912407
Europe | ASD | NA | NA | NA | NA | NA | Male | 6.33 | 6.903468104
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.25 | 5.648518225
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.67 | 5.896253109
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.42 | 6.160793858
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.75 | 8.719005258
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.42 | 6.49657694
Europe | ASD | NA | NA | NA | NA | NA | Male | 3.92 | 4.884904225
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.08 | 4.766905985
Europe | ASD | NA | NA | NA | NA | NA | Male | 4 | 5.462162993
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.08 | 4.557194499
Europe | ASD | NA | NA | NA | NA | NA | Male | 4 | 4.383741212
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.25 | 5.321367013
Europe | ASD | NA | NA | NA | NA | NA | Male | 3.25 | 2.797437125
Europe | ASD | NA | NA | NA | NA | NA | Male | 3.42 | 3.906912403
Europe | ASD | NA | NA | NA | NA | NA | Male | 3.33 | 4.703272329
Europe | ASD | NA | NA | NA | NA | NA | Male | 3.5 | 3.223456196
Europe | ASD | NA | NA | NA | NA | NA | Male | 3.42 | 4.024449964
Europe | ASD | NA | NA | NA | NA | NA | Male | 3.58 | 4.662665584
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.16 | 7.931806871
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.16 | 6.144088681
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.16 | 5.423886319
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.25 | 6.873520458
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.16 | 6.828746343
Europe | ASD | NA | NA | NA | NA | NA | Male | 5.25 | 6.287392617
Europe | ASD | NA | NA | NA | NA | NA | Male | 6.5 | 7.549817595
Europe | ASD | NA | NA | NA | NA | NA | Male | 6.83 | 5.310188113
Europe | ASD | NA | NA | NA | NA | NA | Male | 6.67 | 8.807848811
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.16 | 7.314048584
Europe | ASD | NA | NA | NA | NA | NA | Male | 6.83 | 7.143809294
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.25 | 4.888587648
Europe | ASD | NA | NA | NA | NA | NA | Male | 10.08 | 11.01168613
Europe | ASD | NA | NA | NA | NA | NA | Male | 10.08 | 9.091817984
Europe | ASD | NA | NA | NA | NA | NA | Male | 10.08 | 12.00962928
Europe | ASD | NA | NA | NA | NA | NA | Male | 10.5 | 11.89814401
Europe | ASD | NA | NA | NA | NA | NA | Male | 10.08 | 10.85200361
Europe | ASD | NA | NA | NA | NA | NA | Male | 10.58 | 15.97655481
Europe | ASD | NA | NA | NA | NA | NA | Male | 14.67 | 19.40830372
Europe | ASD | NA | NA | NA | NA | NA | Male | 15.25 | 17.28948864
Europe | ASD | NA | NA | NA | NA | NA | Male | 14.83 | 18.99313794
Europe | ASD | NA | NA | NA | NA | NA | Male | 15.25 | 17.40182035
Europe | ASD | NA | NA | NA | NA | NA | Male | 15.08 | 20.74719227
Europe | ASD | NA | NA | NA | NA | NA | Male | 15.83 | 17.66494621
Europe | ASD | NA | NA | NA | NA | NA | Male | 1.83 | 2.332369997
Europe | ASD | NA | NA | NA | NA | NA | Male | 2.33 | 2.079645877
Europe | ASD | NA | NA | NA | NA | NA | Male | 2.08 | 3.093728905
Europe | ASD | NA | NA | NA | NA | NA | Male | 2.5 | 3.327332717
Europe | ASD | NA | NA | NA | NA | NA | Male | 2.08 | 3.081702301
Europe | ASD | NA | NA | NA | NA | NA | Male | 2.5 | 3.640188937
Europe | ASD | NA | NA | NA | NA | NA | Male | 27.67 | 5.315328746
Europe | ASD | NA | NA | NA | NA | NA | Male | 32.92 | 35.79080593
Europe | ASD | NA | NA | NA | NA | NA | Male | 31.83 | 35.12415194
Europe | ASD | NA | NA | NA | NA | NA | Male | 35.16 | 34.8152863
Europe | ASD | NA | NA | NA | NA | NA | Male | 32.33 | 33.47894995
Europe | ASD | NA | NA | NA | NA | NA | Male | 11.58 | 14.81256772
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.5 | 3.982793413
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.75 | 6.632731853
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.5 | 5.453577973
Europe | ASD | NA | NA | NA | NA | NA | Male | 5 | 6.0536493
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.67 | 4.665684936
Europe | ASD | NA | NA | NA | NA | NA | Male | 5 | 5.538833496
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.33 | 6.826640979
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.42 | 5.074848057
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.33 | 4.069969605
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.5 | 2.914915908
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.33 | 4.177855824
Europe | ASD | NA | NA | NA | NA | NA | Male | 4.5 | 5.359046992
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.33 | 4.981096393
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.5 | 7.521560211
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.33 | 5.632014057
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.58 | 5.381195679
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.42 | 7.07596058
Europe | ASD | NA | NA | NA | NA | NA | Male | 7.58 | 6.118788705
Europe | ASD | NA | NA | NA | NA | NA | Male | 8.83 | 8.225301829
Europe | ASD | NA | NA | NA | NA | NA | Male | 9.08 | 9.139517533
Europe | ASD | NA | NA | NA | NA | NA | Male | 8.83 | 7.154970232
Europe | ASD | NA | NA | NA | NA | NA | Male | 9.67 | 9.966260719
Europe | ASD | NA | NA | NA | NA | NA | Male | 8.92 | 8.69481855
Europe | ASD | NA | NA | NA | NA | NA | Male | 9.67 | 12.84219838
Europe | ASD | NA | NA | NA | NA | NA | Male | 8.08 | 10.35219735
Europe | ASD | NA | NA | NA | NA | NA | Male | 8.25 | 8.849774575
Europe | ASD | NA | NA | NA | NA | NA | Male | 8.16 | 9.464032218
Europe | ASD | NA | NA | NA | NA | NA | Male | 8.33 | 10.51799454
Europe | ASD | NA | NA | NA | NA | NA | Male | 8.16 | 9.41622481
Europe | ASD | NA | NA | NA | NA | NA | Male | 8.75 | 13.39598874
May_2015 | Angelman | UBE3A | NA | NA | NA | YES | Female | 7 | 5.473183736
May_2015 | Angelman | UBE3A | NA | NA | NA | YES | Male | 13 | 15.48878288
May_2015 | Angelman | UBE3A | NA | NA | NA | YES | Male | 55 | 59.49787491
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Male | 1 | 2.790549766
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Female | 4 | 3.956276247
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Female | 15 | 17.87817565
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Male | 1 | 2.320603044
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Male | 4 | 4.348249902
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Male | 1 | 0.959598999
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Female | 1 | 1.994091886
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Female | 10 | 8.697172131
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Female | 14 | 15.7410421
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Female | 6 | 5.13374965
Nov_2015 | Angelman | UBE3A | NA | NA | NA | YES | Male | 25 | 32.45470863
May_2015 | ATR-X | ATRX | c.6254G>A | p.Arg2085His | Missense | YES | Male | 6.3 | 6.19432086
May_2015 | ATR-X | ATRX | c.736C>T | p.Arg246Cys | Missense | YES | Male | 18 | 13.11825849
May_2015 | ATR-X | ATRX | c.6593A>G | p.His2198Arg | Missense | YES | Male | 1.4 | 2.604328944
May_2015 | ATR-X | ATRX | c.758T>C | p.Leu253Ser | Missense | YES | Male | 18.5 | 6.108170831
May_2015 | ATR-X | ATRX | c.4817G>A | p.Ser1606Asn | Missense | YES | Male | 21 | 24.74309568
May_2015 | ATR-X | ATRX | c.5786A>G | p.Lys1929Arg | Missense | YES | Male | 0.7 | -0.14552632
May_2015 | ATR-X | ATRX | c.730A>C | p.Ile244Leu | Missense | YES | Male | 14 | 11.30064691
May_2015 | ATR-X | ATRX | c.7156C>T | p.Arg2386* | Nonsense | YES | Male | 4.6 | 6.236506951
May_2015 | ATR-X | ATRX | c.536A>G | p.Asn179Ser | Missense | YES | Male | 4.6 | 33.54375298
May_2015 | ATR-X | ATRX | Exon 207 deletion | NA | Exonic deletion | YES | Male | 4.4 | 4.821921423
May_2015 | ATR-X | ATRX | c.7366_7367insA | p.Met2456Asnfs*42 | Frameshift | YES | Male | 27 | 39.19917395
May_2015 | ATR-X | ATRX | c.109C>T | p.Arg37* | Nonsense | YES | Male | 14.5 | 5.274937882
May_2015 | ATR-X | ATRX | c.736C>T | p.Arg246Cys | Missense | YES | Male | 2.5 | 1.113449871
May_2015 | ATR-X | ATRX | c.109C>T | p.Arg37* | Nonsense | YES | Male | 17.5 | 22.71435784
May_2015 | ATR-X | ATRX | c.109C>T | p.Arg37* | Nonsense | YES | Male | 14 | 11.21597332
Nov_2015 | Claes_Jensen | KDM5C | c.1510G>A | p.Val504Met | Missense | YES | Male | 30 | 42.69659356
Nov_2015 | Claes_Jensen | KDM5C | c.1439C>T | p.Pro480Leu | Missense | YES_predicted | Male | 6 | 8.103173952
Nov_2015 | Claes_Jensen | KDM5C | c.4439_4440delAG | p.Arg1481Glyfs* | Frameshift | YES | Male | 26 | 28.25654272
Nov_2015 | Claes_Jensen | KDM5C | Intron 11:+5G>A | NA | Splice site mutation | YES | Male | 42 | 54.3236723
Nov_2015 | Claes_Jensen | KDM5C | c.1510G>A | p.Val504Met | Missense | YES | Male | 8 | 10.07007313
Nov_2015 | Claes_Jensen | KDM5C | c.1439C>T | p.Pro480Leu | Missense | YES | Male | 2 | 3.619189097
Nov_2015 | Claes_Jensen | KDM5C | c.229G>A | p.Ala77Thr | Missense | YES | Male | 37 | 48.42002598
Nov_2015 | Claes_Jensen | KDM5C | c.4439_4440delAG | p.Arg1481Glyfs* | Frameshift | YES | Male | 28 | 31.61445991
Nov_2015 | Claes_Jensen | KDM5C | c.229G>A | p.Ala77Thr | Missense | YES | Male | 13 | 16.50827759
Nov_2015 | Claes_Jensen | KDM5C | c.1510G>A | p.Val504Met | Missense | YES | Male | 26 | 38.69008936
May_2015 | Coffin_Lowry | RPS6KA3 | c.1520insA | p.Arg507fs | Frameshift | YES | Female | 6 | 4.093225848
May_2015 | Coffin_Lowry | RPS6KA3 | c.2065C>T | p.Gln689* | Nonsense | YES | Male | 11.5 | 10.63296406
May_2015 | Coffin_Lowry | RPS6KA3 | c.2186G>A | p.Arg729Gln | Missense | YES_predicted | Male | 4 | 4.62981308
May_2015 | Coffin_Lowry | RPS6KA3 | c.631_772del142 and c.774+5G>A | NA | Frameshift and intronic mutation | YES_predicted | Male | 7 | 5.068637974
May_2015 | Coffin_Lowry | RPS6KA3 | c.340C>T | p.Arg114Trp | Missense | YES_predicted | Male | 1.3 | 8.170755226
May_2015 | Coffin_Lowry | RPS6KA3 | c.727C>T | p.Arg243* | Nonsense | YES | Male | 13 | 14.17141748
May_2015 | Coffin_Lowry | RPS6KA3 | Intron 14:+1G>A | NA | Splice site mutation | YES | Male | 22.8 | 25.56720654
May_2015 | Coffin_Lowry | RPS6KA3 | NA | NA | Exonic and intronic deletion | YES | Male | 12 | 10.17620766
May_2015 | Coffin_Lowry | RPS6KA3 | c.386_387insCTTT | p.Phe130Phefs*141 | Frameshift | YES | Male | 2 | 1.808104516
May_2015 | Coffin_Lowry | RPS6KA3 | c.1155delT | p.Phe385fs*40 | Frameshift | YES | Male | 8 | 7.406603271
Mar_2014 | Floating_Harbour | SRCAP | c.7303C>T | p.Arg2435* | Nonsense | YES | Female | 8 | 11.29885487
Mar_2014 | Floating_Harbour | SRCAP | c.7330C>T | p.Arg2444* | Nonsense | YES | Female | 15 | 16.23135534
Mar_2014 | Floating_Harbour | SRCAP | c.7282dupC | p.Arg2428Profs*15 | Frameshift | YES | Female | 6 | 5.620915174
Mar_2014 | Floating_Harbour | SRCAP | c.7330C>T
p.Arg2444* Nonsense YES Female 10 42.55562244 Mar_2014 Floating_Harbour SRCAP c.8117C>G p.Ser2706* Nonsense YES Male 4 2.815335426 Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 5 4.112348915 Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 42 43.43022309 Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Male 12 12.37257473 S.2 Supplementary for chapter 3 145 Mar_2014 Floating_Harbour SRCAP c.7316dupC p.Ala2440Serfs*3 Frameshift YES Male 10 4.424381743 Mar_2014 Floating_Harbour SRCAP c.7165G>T p.Glu2389* Nonsense YES Female 8 1.524333568 Mar_2014 Floating_Harbour SRCAP c.7218_7219del TC p.Gln2407Argfs*3 5 Frameshift YES Male 12 19.26251425 Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Male 5 4.902256866 Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 35 38.47378886 Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 15 14.81418145 Mar_2014 Floating_Harbour SRCAP c.7549delC p.Gln2517Lysfs*5 Frameshift YES Male 4 3.645524918 Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 6 7.201471688 Mar_2014 Floating_Harbour SRCAP c.7219C>T p.Gln2407* Nonsense YES Female 6 6.552720685 GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 5 -0.26537653 GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 10.41667 4.620596743 GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 7.75 9.380603836 GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 4.333333 7.378290152 GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 0.083333 7.256745087 GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 4.166667 6.582911793 GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 21 32.38418863 GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 34.58333 46.41126929 GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 48 58.89975733 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 27 32.354974 May_2015 FXS FMR1 NA NA CGG repeat 
expansion YES Male 12 11.03917455 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 42 40.85689027 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 28 31.89965321 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 15 15.3286979 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 17 13.98190146 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 21 21.42017869 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 30 35.16564816 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 28 27.14880628 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 21 24.03936596 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 33 37.84060062 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 29 35.17133434 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 25 25.67600147 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 17 14.45573451 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 33 36.37082822 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 29 34.45261333 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 20 24.86340454 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 41 46.76222649 146 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 31 34.61968346 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 27 29.78714348 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 17 19.72629863 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 15 11.78896917 May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 14 12.80759084 GSE116300 Kabuki KMT2D NA p.Pro443fs Frameshift YES Female 1 0.790826048 GSE116300 Kabuki KMT2D NA p.Tyr2199fs Frameshift YES Female 3 4.448848163 GSE116300 Kabuki KMT2D NA p.Ser5307fs Frameshift YES Male 5 11.49079359 GSE116300 Kabuki KMT2D NA p.Asn4403fs Frameshift YES Male 4.33 6.325934863 GSE116300 Kabuki KMT2D NA p.Gln4102* Nonsense YES Male 2 5.566745677 GSE116300 Kabuki KMT2D NA p.Gln3934* Nonsense YES Male 3.75 4.443224079 GSE116300 Kabuki KMT2D c.14515+1G>T NA Splice site mutation YES Male 2.5 
16.55101592 GSE116300 Kabuki KMT2D NA p.Gln4090* Nonsense YES Female 1.42 3.379081974 GSE116300 Kabuki KMT2D NA p.Thr1708fs Frameshift YES Female 11.5 10.71344707 GSE97362 Kabuki KMT2D c.15061C>T p.Arg5021* Nonsense YES Female 14 8.946680052 GSE97362 Kabuki KMT2D c.16318delG p.Glu5440Argfs*1 6 Frameshift YES Male 1 0.664960442 GSE97362 Kabuki KMT2D c.15030dup p.Glu5011Argfs*1 3 Frameshift YES Male 18 24.00757516 GSE97362 Kabuki KMT2D c.8172_8173del p.Pro2724Glnfs*5 Frameshift YES Female 16 4.540501556 GSE97362 Kabuki KMT2D c.6595delT p.Tyr2199Ilefs*65 Frameshift YES Male 15 6.279894046 GSE97362 Kabuki KMT2D c.14055_14056 delCA p.His4685Glnfs*4 Frameshift YES Male 11 9.2260079 GSE97362 Kabuki KMT2D c.6295C>T p.Arg2099* Nonsense YES Male 14 6.594838599 GSE97362 Kabuki KMT2D c.4135delA p.Met1379Valfs*5 2 Frameshift YES Male 20 10.04269734 GSE97362 Kabuki KMT2D c.12592C>T p.Arg4198* Nonsense YES Male 18 9.095825776 GSE97362 Kabuki KMT2D c.4135delA p.Met1379Valfs*5 2 Frameshift YES Male 6 8.462691919 GSE97362 Kabuki KMT2D c.11710C>T p.Gln3904* Nonsense YES Male 16 12.68670209 GSE97362 Kabuki KMT2D c.15143G>A p.Arg5048His Missense YES_predicted Female 7 0.627461504 GSE97362 Kabuki KMT2D c.16522- 5_16522-4delTT NA Splice site mutation YES_predicted Female 15 12.75508563 Jun_2015 Kabuki KMT2D c.1801_1822du p22 NA Frameshift YES Male 7 6.044371299 Nov_2015 Kabuki KMT2D c.13059delG p.Pro4353fs Frameshift YES Female 6.7 5.526369466 Nov_2015 Kabuki KMT2D c.839+1delG NA Splice site mutation YES Male 1.9 2.51325414 Nov_2015 Kabuki KMT2D c.15844C>T p.Arg5282* Nonsense YES Female 3.9 3.752004426 Nov_2015 Kabuki KMT2D c.16294C>T p.Arg5432Trp Missense YES_predicted Male 21.6 30.3375233 Nov_2015 Kabuki KMT2D c.8488C>T p.Arg2830* Nonsense YES Female 0 -0.1055224 Nov_2015 Kabuki KMT2D c.4168dupG p.Ala1390fs Frameshift YES Female 3.8 4.177253095 Nov_2015 Kabuki KMT2D c.15289C>T p.Arg5097* Nonsense YES Male 4.3 6.455955113 Nov_2015 Kabuki KMT2D c.4419-2A>G NA Splice site mutation YES 
Male 2.6 3.387623395 Nov_2015 Kabuki KMT2D c.16048A>T p.Lys5350* Nonsense YES Female 19.1 19.2926115 Nov_2015 Kabuki KMT2D c.10201C>T p.Gln3401* Nonsense YES Male 7.1 8.838432826 Nov_2015 Kabuki KMT2D c.16360C>T p.Arg5454* Nonsense YES Male 3.4 5.199197126 Nov_2015 Kabuki KMT2D c.8692C>T p.Gln2898* Nonsense YES Male 3.1 3.423420462 S.2 Supplementary for chapter 3 147 Nov_2015 Kabuki KMT2D c.14878C>T p.Arg4960* Nonsense YES Female 4.1 4.752807097 Nov_2015 Kabuki KMT2D c.6265A>T p.Lys2089* Nonsense YES Female 23.1 25.95907184 Nov_2015 Kabuki KMT2D c.10740+1G>A NA Splice site mutation YES Female 6.9 6.253113479 Nov_2015 Kabuki KMT2D c.13652T>A p.Leu4551* Nonsense YES Male 2.2 3.757460909 Nov_2015 Kabuki KMT2D c.11596C>T p.Gln3866* Nonsense YES Female 1 1.193509229 Nov_2015 Kabuki KMT2D c.548delC p.Pro183fs Frameshift YES Female 16.6 8.413539447 Nov_2015 Kabuki KMT2D c.7411C>T p.Arg2471* Nonsense YES Female 3.3 3.541604601 Nov_2015 Kabuki KMT2D c.1966dupC pLeu656fs Frameshift YES Female 24.1 28.78927404 Nov_2015 Kabuki KMT2D c.6200delA p.Asn2067fs Frameshift YES Female 9.5 6.485224166 Nov_2015 Kabuki KMT2D c.7933C>T p.Arg2645* Nonsense YES Female 9.3 8.701999271 Nov_2015 Kabuki KMT2D c.13450C>T p.Arg4484* Nonsense YES Female 5.8 5.430619578 Feb_2016 Noonan PTPN11 c.1403C>T p.Thr468Met Missense YES Male 9 10.53231848 Feb_2016 Noonan PTPN11 c.1391G>C p.Gly464Ala Missense YES Female 28 25.06455423 Feb_2016 Noonan PTPN11 c.1493G>T p.Arg498Leu Missense YES Male 0.4 1.069462128 Feb_2016 Noonan PTPN11 c.836A>G p.Tyr279Cys Missense YES Male 0.2 0.145725107 Feb_2016 Noonan PTPN11 c.1493G>T p.Arg498Leu Missense YES Male 7 7.125930003 Feb_2016 Noonan PTPN11 c.1528C>G p.Gln510Glu Missense YES Female 2 4.906928458 Feb_2016 Noonan PTPN11 c.228G>C p.Glu76Asp Missense YES Male 17 17.52765019 Feb_2016 Noonan PTPN11 c.215C>G p.Ala72Gly Missense YES Female 13 9.011977393 Feb_2016 Noonan PTPN11 c.1391G>C p.Gly464Ala Missense YES Female 0.7 1.172244358 Feb_2016 Noonan PTPN11 c.922A>G 
p.Asn308Asp Missense YES Male 15 14.68576639 Feb_2016 Noonan PTPN11 c.836A>G p.Tyr279Cys Missense YES Male 0.3 0.576697185 Feb_2016 Noonan PTPN11 c.214G>T p.Ala72Ser Missense YES Male 0.9 1.080594238 Feb_2016 Noonan PTPN11 c.178G>A p.Gly60Ser Missense YES Male 2 3.079510066 Feb_2016 Noonan PTPN11 c.172A>G p.Asn58Asp Missense YES Male 37 42.63784241 Feb_2016 Noonan PTPN11 c.174C>A p.Asn58Lys Missense YES Female 27 32.19911243 Feb_2016 Noonan RAF1 c.781C>T p.Pro261Ser Missense YES Male 9 11.76954478 Feb_2016 Noonan RAF1 c.770C>T p.Ser257Leu Missense YES Female 4 6.836828788 Feb_2016 Noonan RAF1 c.788T>G p.Val263Gly Missense YES Male 8 10.54386119 Feb_2016 Noonan RAF1 c.782C>T p.Pro261Leu Missense YES Male 3 5.956377653 Feb_2016 Noonan RAF1 c.786T>A p.Asn262Lys Missense YES Female 3 3.603073783 Feb_2016 Noonan RAF1 c.768G>T p.Arg256Ser Missense YES Male 20 21.09275241 Feb_2016 Noonan RAF1 c.524A>G p.His175Arg Missense YES Female 0.7 0.815080545 Feb_2016 Noonan RAF1 c.1837C>G p.Leu613Val Missense YES Female 10 7.425274033 Feb_2016 Noonan RAF1 c.775T>A p.Ser259Thr Missense YES Female 8 8.883918263 Feb_2016 Noonan RAF1 c.1472C>T p.Thr491Ile Missense YES Female 26 29.82312626 Feb_2016 Noonan RAF1 c.781C>A p.Pro261Thr Missense YES Female 11 12.25565712 Feb_2016 Noonan SOS1 c.2536G>A p.Glu846Lys Missense YES Female 3 2.62618922 Feb_2016 Noonan SOS1 c.1654A>G p.Arg552Gly Missense YES Male 16 12.47288243 Feb_2016 Noonan SOS1 c.1310T>C p.Ile437Thr Missense YES Female 7 7.309199493 Feb_2016 Noonan SOS1 c.806T>C p.Met269Thr Missense YES Female 35 25.04627009 Feb_2016 Noonan SOS1 c.1642A>C p.Ser548Arg Missense YES Female 3 4.372134286 Feb_2016 Noonan SOS1 c.925G>T p.Asp309Tyr Missense YES Female 49 45.20434465 Feb_2016 Noonan SOS1 c.1655G>C p.Arg552Thr Missense YES Male 1 2.41372048 Feb_2016 Noonan SOS1 c.508A>G p.Lys170Glu Missense YES Male 0.3 0.944100935 Feb_2016 Noonan SOS1 c.1294T>C p.Trp432Arg Missense YES Female 14 17.03491762 148 Feb_2016 Noonan SOS1 c.1322G>A p.Cys441Tyr 
Missense YES Female 0.6 0.555111083 Feb_2016 Noonan SOS1 c.806T>G p.Met269Arg Missense YES Female 0.4 0.844087032 Feb_2016 Noonan SOS1 c.797C>A p.Thr266Lys Missense YES Male 1 2.133506512 Feb_2016 Noonan SOS1 c.1297G>A p.Glu433Lys Missense YES Male 1 1.481217449 Feb_2016 Noonan SOS1 c.1300G>A p.Gly434Arg Missense YES Male 5 8.558246566 May_2015 Rett MECP2 NA p.Arg106Trp Missense YES Female 1 1.835127123 May_2015 Rett MECP2 NA p.Arg168* Nonsense YES Female 25 29.34649481 May_2015 Rett MECP2 NA p.Pro302Arg Missense YES Female 34 35.17904908 May_2015 Rett MECP2 NA NA Exonic deletion YES Female 2 2.581071992 May_2015 Rett MECP2 NA p.Thr158Met Missense YES Female 1 2.210005617 May_2015 Rett MECP2 Deletion in exon 4 NA Exonic deletion YES Female 3 5.225511336 May_2015 Rett MECP2 NA p.Thr158Met Missense YES Female 1 2.510753024 May_2015 Rett MECP2 NA p.Pro225Arg Missense YES Female 4 6.160921221 May_2015 Rett MECP2 c.1157_1197del 41 p.Glu374fs Frameshift YES Female 6 6.2636907 May_2015 Rett MECP2 NA p.Arg255* Nonsense YES Female 1.5 1.084382282 May_2015 Rett MECP2 Deletion in exons 3 and 4 NA Exonic deletion YES Female 6 6.883663479 May_2015 Rett MECP2 NA p.Arg106Trp Missense YES Female 29 38.83647398 May_2015 Rett MECP2 NA p.Thr158Met Missense YES Female 3 4.77442952 May_2015 Rett MECP2 NA p.Arg255* Nonsense YES Female 11 11.74653291 May_2015 Rett MECP2 Partial deletion of exon 4 NA Exonic deletion YES Female 4 3.072948979 Jun_2015 Saethre_Chotzen TWIST1 c.385_405dup2 1 NA In-frame insertion YES Female 0.003 -0.35722332 Nov_2015 Saethre_Chotzen TWIST1 c.149delC p.Ala50fs Frameshift YES Male 0.02 0.16785508 Nov_2015 Saethre_Chotzen TWIST1 c.149delC p.Ala50fs Frameshift YES Female 0.1 13.96937513 Nov_2015 Saethre_Chotzen TWIST1 c.376G>T p.Glu126* Nonsense YES Male 38 41.56611411 Nov_2015 Saethre_Chotzen TWIST1 c.406_407ins21 NA In-frame insertion YES Male 30 29.61790422 Nov_2015 Saethre_Chotzen TWIST1 c.156delC p.Pro52fs Frameshift YES Female 33.5 27.76671901 Nov_2015 
Saethre_Chotzen TWIST1 c.418_419ins21 NA In-frame insertion YES Male 17.7 15.97052177 Nov_2015 Saethre_Chotzen TWIST1 c.211C>T p.Gln71* Nonsense YES Female 20.7 18.347741 Nov_2015 Saethre_Chotzen TWIST1 c.325C>T p.Gln109* Nonsense YES_predicted Male 0.7 0.45749609 Nov_2015 Saethre_Chotzen TWIST1 c.396_416dup2 1 NA In-frame insertion YES Male 0.1 0.386967314 Nov_2015 Saethre_Chotzen TWIST1 c.193G>T p.Glu65* Nonsense YES Female 0.01 0.049927484 Nov_2015 Saethre_Chotzen TWIST1 c.472T>C p.Phe158Leu Missense YES Female 23.3 0.174364646 Nov_2015 Saethre_Chotzen TWIST1 NA NA Full gene deletion YES Female 0.35 0.404844597 Nov_2015 Saethre_Chotzen TWIST1 NA NA Full gene deletion YES Female 0.003 7.069271322 Nov_2015 Saethre_Chotzen TWIST1 c.160G>T p.Gly54* Nonsense YES Female 0.7 0.830512167 Nov_2015 Saethre_Chotzen TWIST1 c.397_417dup2 1 NA In-frame insertion YES_predicted Female 20.5 25.83177177 Nov_2015 Saethre_Chotzen TWIST1 c.120_145del26 NA Frameshift YES Male 0.6 0.491449014 Nov_2015 Saethre_Chotzen TWIST1 c.149delC p.Ala50fs Frameshift YES Female 23.5 18.94806941 Nov_2015 Saethre_Chotzen TWIST1 c.394_414del21 NA In-frame deletion YES Female 12.3 10.10722932 Nov_2015 Saethre_Chotzen TWIST1 c.352C>G p.Arg118Gly Missense YES_predicted Female 21.5 23.41800184 S.2 Supplementary for chapter 3 149 Nov_2015 Saethre_Chotzen TWIST1 c.376G>T p.Glu126* Nonsense YES Female 0.8 0.92117994 Nov_2015 Saethre_Chotzen TWIST1 c.490C>T p.Gln164* Nonsense YES Female 28.7 28.56296158 GSE74432 Sotos NSD1 chr5:175,366,0 08- 177,470,488 NA Long deletion YES Female 9 8.442111023 GSE74432 Sotos NSD1 chr5:175,764,2 62- 177,059,256 NA Long deletion YES Female 7 16.4840396 GSE74432 Sotos NSD1 Exons 15-19 deletion NA Exonic deletion YES Male 10 26.70242296 GSE74432 Sotos NSD1 c.1716delC p.Cys573Valfs*26 Frameshift YES Female 10 14.59121875 GSE74432 Sotos NSD1 c.6454C>T p.Arg2152* Nonsense YES Female 3.5 9.371834336 GSE74432 Sotos NSD1 c.5445C>G p.Tyr1815* Nonsense YES Female 13.2 22.67264348 
GSE74432 Sotos NSD1 c.4843delT p.Tyr1615Thrfs*2 7 Frameshift YES Male 3 7.039068162 GSE74432 Sotos NSD1 NA NA Microdeletion YES Male 2.2 15.1797238 GSE74432 Sotos NSD1 c.6349C>T p.Arg2117* Nonsense YES Female 12 26.9093016 GSE74432 Sotos NSD1 c.1492C>T p.Arg498* Nonsense YES Male 2.2 8.399587071 GSE74432 Sotos NSD1 c.6454C>T p.Arg2152* Nonsense YES Male 18 32.23853498 GSE74432 Sotos NSD1 c.1583delA p.Lys528Argfs*8 Frameshift YES Male 19.7 27.25531484 GSE74432 Sotos NSD1 c.2014_2018del ACAGA p.Thr672Glufs*9 Frameshift YES Male 8 26.46585423 GSE74432 Sotos NSD1 c.2014_2018del ACAGA p.Thr672Glufs*9 Frameshift YES Male 41 67.36442178 GSE74432 Sotos NSD1 c.2014_2018del ACAGA p.Thr672Glufs*9 Frameshift YES Female 2 11.34495985 GSE74432 Sotos NSD1 c.1810C>T p.Arg604* Nonsense YES Female 1.6 6.2471485 GSE74432 Sotos NSD1 c.1801A>T p.Lys601* Nonsense YES Male 10.6 30.82670587 GSE74432 Sotos NSD1 c.4977_4978ins G p.Arg1660Alafs*1 3 Frameshift YES Male 20 41.38296452 GSE74432 Sotos NSD1 c.6437G>C p.Cys2146Ser Missense YES_predicted Male 2 9.83036953 GSE74432 Sotos NSD1 c.6412T>C p.Cys2138Arg Missense YES_predicted Male 7 29.0788673 GSE74432 Weaver EZH2 c.457_459delTA T p.Tyr153del In-frame deletion YES Male 30 40.6786865 GSE74432 Weaver EZH2 c.2080C>T p.His694Tyr Missense YES Female 10.9167 17.28626931 GSE74432 Weaver EZH2 c.2050C>T p.Arg684Cys Missense YES Male 2.5833 2.611103643 GSE74432 Weaver EZH2 c.398A>G p.Tyr133Cys Missense YES Female 17 7.870608634 GSE74432 Weaver EZH2 c.553G>C p.Asp185His Missense YES Male 15.4167 18.04003584 GSE74432 Weaver EZH2 c.394C>T p.Pro132Ser Missense YES Female 19.75 21.09459251 GSE74432 Weaver EZH2 c.1876G>A p.Val626Met Missense YES Male 43 42.37721085 Fig. S2.1 Table showing information for the samples from individuals with developmental disorders (total N = 367). Mutation information was annotated for the human genome assembly hg19. 
ASD: autism spectrum disorder; ATR-X: alpha thalassemia/mental retardation X-linked syndrome; FXS: fragile X syndrome.

[Figure panels, one per developmental disorder: Angelman (N=14), ASD (N=119), ATR-X (N=15), Claes_Jensen (N=10), Coffin_Lowry (N=10), Floating_Harbour (N=17), FXS (N=32), Kabuki (N=46), Noonan_PTPN11 (N=15), Noonan_RAF1 (N=11), Noonan_SOS1 (N=14), Rett (N=15), Saethre_Chotzen (N=22), Sotos (N=20), Weaver (N=7). x-axis: median age in control (years); y-axis: −log10(P-value); legend: EAA model, with CCC / without CCC.]

Fig. S2.2 Effect of changing the median age of the controls when performing the screening for epigenetic age acceleration (EAA) in the different developmental disorders. The dashed green line marks the significance level α = 0.01 after Bonferroni correction. The dashed orange line marks the median age of the samples in the developmental disorder considered. Blue: EAA model without cell composition correction (CCC). Red: EAA model with CCC. ASD: autism spectrum disorder; ATR-X: alpha thalassemia/mental retardation X-linked syndrome; FXS: fragile X syndrome.
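The screening in Fig. S2.2 compares two EAA models, with and without cell composition correction (CCC), against a Bonferroni-adjusted significance line. The sketch below illustrates one common way to set this up, under assumptions that are not stated in this supplementary material: EAA is taken as the residual of an ordinary least-squares fit of DNAmAge on chronological age in the control samples only (with estimated cell-type proportions added as extra covariates for the CCC variant), and the number of tests behind the Bonferroni correction is supplied by the caller. The helper names (fit_ols, eaa, bonferroni_line) are hypothetical and do not come from the thesis code; the model actually used is the one defined in the main text.

```python
from __future__ import annotations

import math
from typing import Sequence

# Hypothetical helpers for illustration only; not the thesis code.
# Assumption: EAA = observed DNAmAge minus the DNAmAge predicted by a linear
# model fitted on control samples only (age alone, or age plus cell-type
# proportions for the "with CCC" variant).


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(covariates: Sequence[Sequence[float]], y: Sequence[float]) -> list[float]:
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0, *row] for row in covariates]  # prepend an intercept column
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return _solve(xtx, xty)


def eaa(coef: Sequence[float], covariates: Sequence[Sequence[float]],
        dnam_age: Sequence[float]) -> list[float]:
    """Residual DNAmAge (years) relative to the control-fitted model."""
    return [d - (coef[0] + sum(b * v for b, v in zip(coef[1:], row)))
            for row, d in zip(covariates, dnam_age)]


def bonferroni_line(alpha: float, n_tests: int) -> float:
    """Height of the Bonferroni significance line on a -log10(P) axis."""
    return -math.log10(alpha / n_tests)


if __name__ == "__main__":
    # Toy, noiseless controls: DNAmAge = 2 + 0.9 * age (made-up numbers).
    ctrl_age = [1.0, 5.0, 10.0, 20.0, 40.0]
    ctrl_dnam = [2.0 + 0.9 * a for a in ctrl_age]
    coef = fit_ols([[a] for a in ctrl_age], ctrl_dnam)  # "without CCC" variant
    print(eaa(coef, [[8.0]], [2.0 + 0.9 * 8.0 + 5.0]))  # approx. [5.0]
    print(bonferroni_line(0.01, 15))  # approx. 3.18, if 15 tests were corrected for
```

In the toy run, a case whose DNAmAge sits 5 years above the control line gets an EAA of about 5 years. If, as an assumption, one test per disorder panel were corrected for (15 tests), the dashed line would sit at −log10(0.01/15) ≈ 3.18. For the CCC variant, one cell type (or the intercept) should be dropped, because proportions that sum to one are collinear with the intercept and would make the normal equations singular.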
[Figure panels, three per developmental disorder, each comparing the disorder samples with controls (N=1128) against chronological age (years): DNAmAge (years), EAA without CCC (years) and EAA with CCC (years), with points coloured by disease status. Disorders shown in these panels: Angelman (N=14), ASD (N=119), ATR-X (N=15), Claes_Jensen (N=10), Coffin_Lowry (N=10); the figure continues.]
●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th ou t C CC (y ea rs ) Control: N=1128 Coffin_Lowry: N=10 ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th C CC (y ea rs ) Control: N=1128 Coffin_Lowry: N=10 152 ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●●● ●● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 
● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● 0 20 40 60 80 0 20 40 Chronological age (years) DN Am Ag e (y ea rs ) Disease status ● ● Floating_Harbour Control Control: N=1128 Floating_Harbour: N=17 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th ou t C CC (y ea rs ) Control: N=1128 Floating_Harbour: N=17 ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● 
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th C CC (y ea rs ) Control: N=1128 Floating_Harbour: N=17 ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●●● ●● ● ● ●●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● 0 20 40 60 80 0 20 40 Chronological age (years) DN Am Ag e (y ea rs ) Disease status ● ● FXS Control Control: N=1128 FXS: N=32 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● 
● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th ou t C CC (y ea rs ) Control: N=1128 FXS: N=32 ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th C CC (y ea rs ) Control: N=1128 FXS: N=32 ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●●● ●● ● ● ●●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● 0 20 40 60 80 0 20 40 Chronological age (years) DN Am Ag e (y ea rs ) Disease status ● ● Kabuki Control Control: N=1128 Kabuki: N=46 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th ou t C CC (y ea rs ) Control: 
N=1128 Kabuki: N=46 ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th C CC (y ea rs ) Control: N=1128 Kabuki: N=46 ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●●● ●● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● 0 20 40 
60 80 0 20 40 Chronological age (years) DN Am Ag e (y ea rs ) Disease status ● ● Noonan_PTPN11 Control Control: N=1128 Noonan_PTPN11: N=15 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th ou t C CC (y ea rs ) Control: N=1128 Noonan_PTPN11: N=15 ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 
●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th C CC (y ea rs ) Control: N=1128 Noonan_PTPN11: N=15 ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●●● ●● ● ● ●●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● 0 20 40 60 80 0 20 40 Chronological age (years) DN Am Ag e (y ea rs ) Disease status ● ● Noonan_RAF1 Control Control: N=1128 Noonan_RAF1: N=11 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● 
● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th ou t C CC (y ea rs ) Control: N=1128 Noonan_RAF1: N=11 ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th C CC (y ea rs ) Control: N=1128 Noonan_RAF1: N=11 S.2 Supplementary for chapter 3 153 ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●●● ●● ● ● ●●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● 
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● 0 20 40 60 80 0 20 40 Chronological age (years) DN Am Ag e (y ea rs ) Disease status ● ● Noonan_SOS1 Control Control: N=1128 Noonan_SOS1: N=14 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th ou t C CC (y ea rs ) Control: N=1128 Noonan_SOS1: N=14 ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ● 
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th C CC (y ea rs ) Control: N=1128 Noonan_SOS1: N=14 ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●●● ●● ● ● ●●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● 0 20 40 60 80 0 20 40 Chronological age (years) DN Am Ag e (y ea rs ) Disease status ● ● Rett Control Control: N=1128 Rett: N=15 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 
● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th ou t C CC (y ea rs ) Control: N=1128 Rett: N=15 ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● 
●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th C CC (y ea rs ) Control: N=1128 Rett: N=15 ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●●● ●● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● 0 20 40 60 80 0 20 40 Chronological age (years) DN Am Ag e (y ea rs ) Disease status ● ● Saethre_Chotzen Control Control: N=1128 Saethre_Chotzen: N=22 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● 
● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th ou t C CC (y ea rs ) Control: N=1128 Saethre_Chotzen: N=22 ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● −20 0 20 40 0 20 40 Chronological age (years) EA A wi th C CC (y ea rs ) Control: N=1128 Saethre_Chotzen: N=22 ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●●● ●● ● ● ●●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● 
[Figure S2.3 panels, for Sotos syndrome (Control: N=1128; Sotos: N=20) and Weaver syndrome (Control: N=1128; Weaver: N=7): DNAmAge (years), EAA without CCC (years) and EAA with CCC (years), each plotted against chronological age (years); points coloured by disease status.]

Fig. S2.3 Screening for epigenetic age acceleration (EAA) in developmental disorders. Left panel: scatterplot showing the relation between epigenetic age (DNAmAge), according to Horvath's model, and chronological age of the samples for a given developmental disorder (orange) and controls (grey). Each sample is represented by one point. The black dashed line represents the diagonal to aid visualisation. Middle and right panels: scatterplots showing the relation between EAA (without and with CCC, respectively) and chronological age of the samples for a given developmental disorder (orange) and controls (grey). Each sample is represented by one point. The yellow line represents the linear model EAA ∼ Age, with the standard error shown in the light yellow shade.

[Figure S2.4 panels: odds ratio (log scale, 0.01 to 10) with 95% confidence intervals for each categorical feature, for the subsets Hypo Sotos DMPs, Hypo-Hypo DMPs, Hyper-Hypo DMPs, Hypo aDMPs, Hyper aDMPs and Hyper Sotos DMPs; points coloured by −log10(p-value) (scale 25 to 100). Features: Active Enhancer 1, Active Enhancer 2, Active Enhancer Flank, Active TSS, Bivalent promoter, CGI, Gene_body, Heterochromatin, Poised promoter, Primary DNase, Primary H3K27ac possible Enhancer, Promoter Downstream TSS 1, Promoter Downstream TSS 2, Promoter Upstream TSS, Quiescent/low, Repressed polycomb, Shelf, Shore, Strong transcription, Transcribed - 3' preferential, Transcribed - 5' preferential, Transcribed & regulatory (Prom/Enh), Transcribed 3' preferential and Enh, Transcribed 5' preferential and Enh, Transcribed and Weak Enhancer, Weak Enhancer 1, Weak Enhancer 2, Weak transcription, ZNF genes & repeats.]

Fig. S2.4 Enrichment for the categorical (epi)genomic features considered when comparing the different genome-wide subsets of differentially methylated positions (DMPs) in ageing and Sotos against a control (see section 3.7). The y-axis represents the odds ratio (OR), the error bars show the 95% confidence interval for the OR estimate, and the colour of the points encodes the −log10(p-value) obtained when testing for enrichment with Fisher's exact test. An OR > 1 indicates that the given feature is enriched in the subset of DMPs considered, whilst an OR < 1 indicates that it is found less often than expected. The 'Hyper-Hypo DMPs' subset results from the intersection between the hypermethylated DMPs in ageing and the hypomethylated DMPs in Sotos. The 'Hypo-Hypo DMPs' subset results from the intersection between the hypomethylated DMPs in ageing and in Sotos. In grey: features that did not reach significance at α = 0.01 after Bonferroni correction.
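The quantities plotted in Fig. S2.3 can be illustrated with the minimal sketch below. It assumes the published form of Horvath's model, in which an intercept plus a weighted sum of CpG β-values predicts a transformed age that is logarithmic up to an adult age of 20 years and linear thereafter, and it assumes that EAA (without CCC) is the residual from a linear regression of DNAmAge on chronological age fitted on the control samples only. All function names are hypothetical, the toy values are not thesis data, and cell composition correction (CCC) is not covered.

```python
import math
from statistics import fmean

ADULT_AGE = 20  # age (years) at which Horvath's age transformation switches from log to linear

def inverse_age_transform(y):
    """Map the linear predictor of Horvath's model back to age in years."""
    if y <= 0:
        return (ADULT_AGE + 1) * math.exp(y) - 1  # logarithmic branch (childhood)
    return (ADULT_AGE + 1) * y + ADULT_AGE        # linear branch (adulthood)

def dnam_age(betas, weights, intercept):
    """Weighted sum of CpG beta-values followed by the inverse age transformation."""
    y = intercept + sum(w * b for w, b in zip(weights, betas))
    return inverse_age_transform(y)

def fit_line(x, y):
    """Ordinary least squares fit y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def age_acceleration(dnam_ages, ages, is_control):
    """EAA as the residual from DNAmAge ~ Age fitted on controls only (assumption)."""
    ctrl_x = [a for a, c in zip(ages, is_control) if c]
    ctrl_y = [d for d, c in zip(dnam_ages, is_control) if c]
    a, b = fit_line(ctrl_x, ctrl_y)
    return [d - (a + b * age) for d, age in zip(dnam_ages, ages)]

# Toy usage: four controls lying on DNAmAge = Age + 1 and two "cases"
ages = [5, 10, 30, 50, 20, 40]
dnam = [6, 11, 31, 51, 30, 55]
control = [True, True, True, True, False, False]
eaa = age_acceleration(dnam, ages, control)
```

In this toy example the controls have an EAA of zero, whereas the two non-control samples receive positive accelerations of 9 and 14 years, mirroring the upward shift that the middle panels of Fig. S2.3 are designed to reveal.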
S.2 Supplementary for chapter 3

[Figure S2.5 panels: one boxplot panel per continuous feature (H3K27ac, H3K4me3, H3K36me3, H3K27me3, H3K9ac, H3K4me1, H3K9me3, RNF2, EZH2, RNA, Replication_timing, LaminB1), each comparing control CpGs with CpGs in the subset, for Hyper aDMPs (n = 40,071), Hyper Sotos DMPs (n = 111), Hyper-Hypo DMPs (n = 2,550), Hypo aDMPs (n = 37,451), Hypo Sotos DMPs (n = 15,062) and Hypo-Hypo DMPs (n = 1,728), out of 428,266 CpG sites in total. y-axis: NFC for the histone marks, RNF2 and EZH2; NRE for RNA; WTS for replication timing; NRC for LaminB1.]

Fig. S2.5 Boxplots showing the distributions of scores for the continuous (epi)genomic features considered when comparing the different genome-wide subsets of differentially methylated positions (DMPs) in ageing and Sotos against a control (see section 3.7).
The p-values (two-sided Wilcoxon's test, before multiple testing correction) are shown above the boxplots. The number of DMPs belonging to each subset (in green) and the median value of the feature score (in dark red) are shown below the boxplots. NFC: 'normalised fold change'; NRE: 'normalised RNA expression'; WTS: 'wavelet-transformed signals'; NRC: 'normalised read counts'.

[Figure S2.6 panels: heatmap with 23 continuous feature tracks as rows (H3K9me3, RNF2, H3K27me3, EZH2, H3K4me1, H3K36me3, H3K4me3, H3K9ac and H3K27ac, each labelled with its ENCODE file ID) and the 353 Horvath's clock CpGs (cg IDs) as columns; colour scale: Z-score (in PBMC), −4 to 4. Annotations: cell type (B cell, K562, PBMC); Sotos DMPs (hypomethylated); aDMPs (hypermethylated, hypomethylated); weight in model (−1 to 1); ChrHMM state in K562 (Active TSS, Promoter, Transcribed, Weakly transcribed, Transcribed/regulatory, Active enhancer, Weak enhancer, DNase, Heterochromatin, Poised promoter, Bivalent promoter, Repressed polycomb, Quiescent/low); RNA in PBMC (−2 to 2); in gene body (yes, no).]

Fig. S2.6 Heatmap displaying the scores for the different continuous (epi)genomic features (rows) at each of the 353 CpG sites in Horvath's epigenetic clock (columns). The names of the features include the ENCODE ID (see Fig. S2.11). Hierarchical clustering was performed on both rows and columns. RNA refers to the 'normalised RNA expression' (NRE). aDMPs: differentially methylated positions during ageing. PBMC: peripheral blood mononuclear cells.
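The two operations behind a clustered heatmap such as Fig. S2.6, per-feature standardisation and hierarchical clustering, can be sketched as follows. This is an illustration under stated assumptions, not the code used for the figure: it assumes each feature (row) is z-scored across CpG sites using the sample standard deviation, as R's `scale()` does, and that clustering uses Euclidean distance with complete linkage, the defaults of R's `dist()` and `hclust()`. The helper names are hypothetical, and the naive loop is only practical for small matrices such as the 23 feature rows.

```python
import math
from statistics import fmean, stdev

def zscore_rows(matrix):
    """Standardise each row (feature) across columns (CpG sites):
    subtract the row mean and divide by the sample SD, as R's scale() does."""
    out = []
    for row in matrix:
        mu, sd = fmean(row), stdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out

def transpose(matrix):
    """Swap rows and columns so that columns can be clustered too."""
    return [list(col) for col in zip(*matrix)]

def complete_linkage_order(vectors):
    """Naive agglomerative clustering (Euclidean distance, complete linkage).

    Repeatedly merges the two closest clusters and returns the resulting leaf
    order. The order may differ from R's hclust(), which handles ties and the
    left/right placement of merged branches differently.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # complete linkage: largest pairwise distance between cluster members
        return max(math.dist(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
        clusters[x] = merged
    return clusters[0]

# Toy usage: rows = features, columns = CpG sites (made-up values)
scores = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
z = zscore_rows(scores)
row_order = complete_linkage_order(z)
col_order = complete_linkage_order(transpose(z))
```

After z-scoring, the first two toy features become identical profiles and therefore cluster together, which illustrates why standardisation lets features measured on different scales be compared in one heatmap.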
[Figure S2.7 panels: odds ratio (log scale, 0.01 to 10) for the same categorical features as in Fig. S2.4, for the subsets Hypo aDMPs, Hypo Sotos DMPs, All Horvath and Hyper aDMPs; points coloured by −log10(p-value) (scale 8.0 to 8.4).]

Fig. S2.7 As in Fig. S2.4, but focused on the 353 CpG sites in Horvath's epigenetic clock.
[Figure S2.8 panels: one boxplot panel per continuous feature (as in Fig. S2.5), comparing control CpGs with CpGs in each subset: All Horvath (n = 353), Hyper aDMPs (n = 83), Hypo aDMPs (n = 62) and Hypo Sotos DMPs (n = 29), out of 21,368 CpG sites in total.]

Fig. S2.8 As in Fig. S2.5, but focused on the 353 CpG sites in Horvath's epigenetic clock.
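The p-values in Figs. S2.5 and S2.8 come from two-sided Wilcoxon rank-sum tests comparing the feature scores of a subset with those of the control CpGs. For samples of this size, R's `wilcox.test()` uses a normal approximation with a tie-corrected variance and a continuity correction. The stdlib sketch below follows that approach. It is an illustration rather than the code behind the figures, it assumes the raw feature scores as input, and the function names are hypothetical.

```python
import math
from collections import Counter

def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> 1-based mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Mirrors the large-sample branch of R's wilcox.test(): W statistic,
    tie-corrected variance and a continuity correction of 0.5.
    Returns (W, p_value). Assumes not all pooled values are tied.
    """
    pooled = list(x) + list(y)
    nx, ny, n = len(x), len(y), len(pooled)
    ranks = average_ranks(pooled)
    w = sum(ranks[:nx]) - nx * (nx + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    sigma = math.sqrt(nx * ny / 12 * ((n + 1) - tie_term))
    diff = w - nx * ny / 2
    correction = math.copysign(0.5, diff) if diff != 0 else 0.0
    z = (diff - correction) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)
    return w, min(1.0, p_value)

# Toy usage (made-up scores: subset CpGs vs control CpGs)
w, p = wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])
```

Because the test is rank-based, it compares the location of the two score distributions without assuming normality. This suits fold-change and read-count tracks, whose distributions are typically skewed.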
[Figure S2.9 panels (Control: N=1128; Sotos: N=20): genome-wide Shannon entropy acceleration, Shannon entropy acceleration for the clock sites, and Shannon entropy for the clock sites, each plotted against chronological age (years); points coloured by disease status (Control, Sotos).]

Fig. S2.9 Methylation Shannon entropy acceleration. a. Scatterplot showing the relationship between the genome-wide Shannon entropy acceleration (gSEA) and chronological age of the samples for Sotos (orange) and healthy controls (grey). Each sample is represented by one point. The yellow line represents the linear model gSEA ∼ Age, with the standard error shown in the light yellow shade. b. As in a., but using the Shannon entropy acceleration calculated only for the 353 CpG sites in Horvath's epigenetic clock (cSEA).
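Fig. S2.9 is based on the methylation Shannon entropy of each sample. The sketch below assumes the normalised definition of Hannum et al. (2013), in which the entropy of N β-values is rescaled to lie between 0 (every site fully methylated or unmethylated) and 1 (every site at β = 0.5). It also assumes that gSEA and cSEA are obtained as residuals of the entropy on chronological age, as for EAA; only the entropy step is shown. The function names are hypothetical.

```python
import math

def methylation_entropy(betas):
    """Normalised Shannon entropy of a sample's methylation beta-values.

    0 -> every site fully methylated or unmethylated; 1 -> every site at beta = 0.5.
    """
    total = 0.0
    for b in betas:
        for p in (b, 1.0 - b):
            if p > 0:  # convention: 0 * log(0) = 0
                total += p * math.log(p)
    return total / (len(betas) * math.log(0.5))

def subset_entropy(betas_by_cpg, cpg_ids):
    """Entropy restricted to a set of CpG sites (e.g. the 353 clock CpGs)."""
    return methylation_entropy([betas_by_cpg[c] for c in cpg_ids])

# Toy usage (made-up beta-values keyed by hypothetical CpG IDs)
sample = {"cg_a": 0.5, "cg_b": 0.0, "cg_c": 0.2}
genome_wide = methylation_entropy(list(sample.values()))
clock_only = subset_entropy(sample, ["cg_a", "cg_c"])
```

Restricting the calculation to the clock CpGs, as in panel b, requires only a different list of sites. The normalisation by N keeps genome-wide and clock-site entropies on the same 0 to 1 scale.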
[Figure S2.10: Shannon entropy for the clock sites plotted against chronological age (years) (Control: N=1128; Sotos: N=20); points coloured by batch (Europe, Feb_2016, GSE104812, GSE111629, GSE40279, GSE41273, GSE42861, GSE51032, GSE55491, GSE59065, GSE61496, GSE74432, GSE81961, GSE97362).]

Fig. S2.10 Scatterplot showing the effects of the different batches on the methylation Shannon entropy calculations for the 353 CpG sites in Horvath's epigenetic clock. Each sample is represented by one point and coloured according to its batch.
File ID | Feature type | Data type | Tissue | Age (years) | Sex | Source
ENCFF516PTT | EZH2 | fold change over control | B cell | 27 | Female | ENCODE
ENCFF071CIY | RNF2 | fold change over control | K562 | NA | NA | ENCODE
ENCFF857HEZ | RNF2 | fold change over control | K562 | NA | NA | ENCODE
ENCFF320VKN | RNF2 | fold change over control | K562 | NA | NA | ENCODE
ENCFF847TGB | RNF2 | fold change over control | K562 | NA | NA | ENCODE
ENCFF737OJY | H3K27ac | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF303YKC | H3K4me3 | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF643USH | H3K36me3 | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF249WVX | H3K36me3 | fold change over control | PBMC | 28 | Male | ENCODE
ENCFF759GIZ | H3K27ac | fold change over control | PBMC | 28 | Female | ENCODE
ENCFF412KUE | H3K27me3 | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF455IGC | H3K9ac | fold change over control | PBMC | 28 | Male | ENCODE
ENCFF457WMB | H3K4me1 | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF211ORP | H3K9ac | fold change over control | PBMC | 27 | Male | ENCODE
ENCFF171WZC | H3K9me3 | fold change over control | PBMC | 27 | Male | ENCODE
ENCFF573QMJ | H3K4me3 | fold change over control | PBMC | 27 | Male | ENCODE
ENCFF150RIG | H3K27me3 | fold change over control | PBMC | 28 | Female | ENCODE
ENCFF033IPJ | H3K9me3 | fold change over control | PBMC | 28 | Female | ENCODE
ENCFF796FFT | H3K4me3 | fold change over control | PBMC | 28 | Female | ENCODE
ENCFF100NYH | H3K4me1 | fold change over control | PBMC | 27 | Male | ENCODE
ENCFF713QZB | H3K9me3 | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF265VZG | H3K27me3 | fold change over control | PBMC | 28 | Male | ENCODE
ENCFF319EBK | H3K9me3 | fold change over control | PBMC | 28 | Male | ENCODE
ENCFF754LBN | RNA-seq | minus strand signal of unique reads | PBMC | 52 | Female | ENCODE
ENCFF398HDS | RNA-seq | plus strand signal of unique reads | PBMC | 52 | Female | ENCODE
GSM923447 | Replication timing | Wavelet-transformed signals | IMR90 | NA | Female | GEO
GSM1289416 | LaminB1 | Normalised read counts | IMR90 | NA | NA | GEO
Fig. S2.11 Information (including the source) about the continuous (epi)genomic features (ChIP-seq and RNA-seq data) that were included in my analysis to annotate the different sets of CpG sites. All the data were mapped to the hg19 assembly of the human genome. PBMC: peripheral blood mononuclear cells.
S.3 Supplementary for chapter 4
[Figure S3.1: scatterplot of median fragment length (in bp) against total number of fragments on logarithmic axes, with each point labelled by an isoschizomer family name (e.g. AanI, BamHI, HindIII, XbaI).]
Fig. S3.1 Scatterplot summarising the fragment length distributions for the same isoschizomer families portrayed in Fig. 4.2a. The red dots represent the actual values of median fragment length and total number of fragments for each family. The black lines link each name label to the corresponding red point for visualization purposes.
[Figure S3.2: scatterplot matrix with the following genomic features on the diagonal: mean GC content (%), mean CpG content (%), % of sites in protein-coding genes, % of sites in exons, % of sites in introns, % of sites in non-coding RNA genes, % of intragenic sites, % of intergenic sites, % of sites in CGI, % of sites in shores, % of sites in shelves, % of sites in CGI-containing promoters and % of sites in non-CGI-containing promoters.]
Fig. S3.2 Matrix of scatterplots showing the percentages of cleavage sites from different restriction enzymes that overlap with several genomic features (listed on the diagonal) in the human genome (hg38). The red dot in each scatterplot represents the values for MspI. The numbers above the diagonal are the Pearson correlation coefficients between all possible pairs of genomic features.
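The fragment length summaries in Fig. S3.1 come from in silico digestions of the genome. As a minimal illustration, the Python sketch below digests a toy sequence at one recognition site (MspI recognises CCGG and cuts between the two Cs, C^CGG), reports the median fragment length and the number of fragments, keeps the fragments within a size range, and computes the fraction of sites of interest that fall inside the selected fragments. It is a simplified, hypothetical stand-in for the size-selection logic behind the cuRRBS Score: it considers only the forward strand and ignores overhangs and read length, and the function names are illustrative rather than part of the cuRRBS code.

```python
import re
from bisect import bisect_right
from statistics import median


def digest(sequence: str, site: str, cut_offset: int) -> list[tuple[int, int]]:
    """Cut `sequence` at every occurrence of `site` (forward strand only).

    `cut_offset` is the position of the cut inside the recognition site,
    e.g. 1 for MspI (C^CGG). Returns sorted, non-overlapping half-open
    (start, end) fragment coordinates.
    """
    seq = sequence.upper()
    # A lookahead lets re.finditer report overlapping occurrences too.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site.upper())})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


def summarise(fragments: list[tuple[int, int]]) -> tuple[float, int]:
    """Median fragment length and total number of fragments."""
    lengths = [end - start for start, end in fragments]
    return median(lengths), len(lengths)


def select_size_range(fragments, min_len: int, max_len: int):
    """Keep fragments whose length lies in [min_len, max_len]."""
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]


def fraction_captured(fragments, positions: list[int]) -> float:
    """Fraction of 0-based `positions` lying inside any selected fragment.

    `fragments` must be sorted and non-overlapping, as returned by
    `digest` (optionally filtered by `select_size_range`).
    """
    if not positions:
        return 0.0
    starts = [s for s, _ in fragments]
    captured = 0
    for pos in positions:
        i = bisect_right(starts, pos) - 1  # last fragment starting <= pos
        if i >= 0 and pos < fragments[i][1]:
            captured += 1
    return captured / len(positions)
```

For example, `digest("AACCGGTTTTCCGGAA", "CCGG", 1)` yields the fragments (0, 3), (3, 11) and (11, 16); keeping only those of 4 to 10 bp captures two of the three positions 1, 5 and 12. On a real genome the same steps would be repeated for every enzyme (or enzyme combination) and every candidate size range.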
First author(s) | Title | Date | Single enzymes checked | Double enzymes checked | Size ranges interrogated | Genomic regions targeted | Organism(s) | Read lengths tested | For sequencing | Code available
Cedar H | Direct detection of methylated cytosine in DNA by use of the restriction enzyme MspI | 1979 | YES | NO | NA | NA | Neurospora crassa, herpes virus, fly, bovine | NA | N | N
Yu L | A NotI–EcoRV promoter library for studies of genetic and epigenetic alterations in mouse models of human malignancies | 2004 | YES | YES | NA | CpG islands, protein-coding genes | Human (hg16), mouse (mm4) | NA | Y | N
Wang J and Xia Y | Double restriction-enzyme digestion improves the coverage and accuracy of genome-wide CpG methylation profiling by reduced representation bisulfite sequencing | 2013 | YES | YES | 2 | Increase CpG coverage genome-wide | Human (hg18), mouse (mm9) | 50 bp PE, 90 bp PE | Y | N
Bystrykh L | A combinatorial approach to the restriction of a mouse genome | 2013 | YES | YES | NA | NA | Mouse (mm10) | NA | N | N
Martinez-Arguelles DB | In silico analysis identifies novel restriction enzyme combinations that expand reduced representation bisulfite sequencing CpG coverage | 2014 | YES | YES | 1 | Increase CpG coverage genome-wide | Human (hg38), mouse (mm10), rat (NCBI build 4.2) | 50 bp PE | Y | N
Lee YK and Jin S | Improved reduced representation bisulfite sequencing for epigenomic profiling of clinical samples | 2014 | YES | YES | 1 | Increase CpG coverage genome-wide | Human (hg19) | 36 bp PE | Y | N
Kirschner SA | Focussing reduced representation CpG sequencing through judicious restriction enzyme choice | 2016 | YES | YES | 2 | Increase CpG coverage genome-wide | Mouse (mm10) | NA | Y | N
Tanas AS | Rapid and affordable genome-wide bisulfite DNA sequencing by XmaI-reduced representation bisulfite sequencing | 2017 | YES | NO | 1 | CpG islands | Human (hg19) | NA | Y | N
Martin-Herranz DE and Stubbs TM | cuRRBS | 2017 | YES | YES | Defined by the user | Defined by the user | Defined by the user | Defined by the user | Y | Y
Fig. S3.3 Table comparing different studies that have attempted to use restriction enzymes to target different regions of the genome.
[Figure S3.4a, flowchart of cuRRBS:
INPUT: annotation for sites of interest; pre-computed in silico digestions; annotation for restriction enzymes; restriction enzymes to check.
For each enzyme or enzyme combination: obtain the fragment size distribution and the location of the sites of interest in the digested fragments; calculate the Score, NF/1000 and EV variables for different size ranges; find the optimal size range, which minimizes the EV; filter with Score > C_Score · max_Score and NF/1000 ≤ C_NF/1000 · ref_NF/1000.
Rank the enzymes or enzyme combinations by EV and calculate their CRF and robustness.
OUTPUT: CSV file containing information about the optimal enzymes and size ranges to use in the new cuRRBS protocol.
Panel b: violin plots of Pearson's correlation coefficients (trade-off between NF/1000 and Score) for individual enzymes, 2-enzyme combinations and all. Panel c: density of robustness (R), split into R < Q1 = 0.9580, Q1 ≤ R ≤ Q3 and R > Q3 = 0.9834.]
Fig. S3.4 Additional insights into cuRRBS. a. Detailed flowchart showing the input, the main steps in cuRRBS and the output of the software. b. Violin plots showing the distribution of Pearson’s correlation coefficients between the number of fragments (NF) and the Score for all the different enzymes tested with cuRRBS (single enzymes, two-enzyme combinations and all). In this example we used the Horvath epigenetic clock system [Horvath, 2013a], checking all the size ranges between 20 and 1000 bp, with an experimental error of 10 bp and a read length of 75 bp. Each yellow point represents the median of the Pearson’s correlation coefficients under consideration. c. Density plot showing the distribution of the robustness (R) values when assuming an experimental error (δ) of 20 bp. cuRRBS was run for all the biological systems under study (Fig. S3.5) [Domcke et al., 2015; Hanna et al., 2016; Horvath, 2013a; Kawakatsu et al., 2016; Lev Maor et al., 2015; Maurano et al., 2015; Milagre et al., 2017] with the same parameters as described in ‘Running cuRRBS for different in silico systems’ in section 4.7 (all the hits that satisfied the thresholds were reported in this case). The dashed blue line represents the median (0.9734). The colours indicate how to judge the robustness values: bad (red, R < Q1 = 0.9580), medium (orange, Q1 ≤ R ≤ Q3 = 0.9834) and good (green, R > Q3), where Q1 and Q3 are the first and third quartiles, respectively.
Species | System | PMID (where applicable) | Additional information about the system | Total number of sites targeted | Optimal restriction enzyme combination | Optimal theoretical size range (in bp) | % max Score | NF/1000 | Enrichment Value (EV) | Cost Reduction Factor (CRF) | Robustness (R)
Homo sapiens | Exon-intron boundaries | | DNA methylation has been shown to affect alternative splicing. Therefore, we focused on targeting CpGs close to canonical splice sites. | 26211 | (BsiSI OR MspI) AND (SbfI OR SdaI OR Sse8387I) | 80-500 | 25.4 | 772.23 | 2.06446811 | 53.32 | 0.94704403
Homo sapiens | Horvath epigenetic clock | 24138928 | The Horvath epigenetic clock is the best predictor of biological age available in humans. We attempted to target the 353 CpG sites used in the model in order to reduce the cost of the assay. | 353 | (BsiSI OR MspI) AND (BspQI OR LguI OR SapI) | 60-160 | 27.57 | 442.456 | 3.65771916 | 93.06 | 0.91305072
Homo sapiens | Imprinted loci | 26769960 | Genomic imprinting is an epigenetic phenomenon that results in gene expression occurring in a parent-of-origin-specific fashion. We attempted to target Cs in CpG context found within the canonical human imprints. | 2810 | (BmeT110I OR BsoBI) AND (BsaWI) | 60-540 | 25.12 | 336.88 | 2.67867053 | 122.23 | 0.98085689
Homo sapiens | Placental imprinted loci | 26769960 | Genomic imprinting is an epigenetic phenomenon that results in gene expression occurring in a parent-of-origin-specific fashion. However, until recently many extraembryonic imprints were still unknown. We targeted Cs in CpG context found within these novel human placental imprints. | 7591 | (BsaWI) AND (BssAI) | 60-540 | 26.41 | 107.248 | 1.72827483 | 383.94 | 0.93382453
Homo sapiens | CTCF sites | 26257180 | CTCF is an important architectural protein that helps to organise chromatin domains. Since its binding has been shown to depend on DNA methylation at some of its recognition sequences, we targeted the CpG sites within these regions of the genome. | 2000 | (BmeT110I OR BsoBI) AND (BssAI) | 40-360 | 25.5 | 314.079 | 2.78946872 | 131.1 | 0.88798165
Mus musculus | iPSCs demethylated | 28147265 | iPSC reprogramming in mouse is characterised by global changes in DNA methylation. Sites that undergo demethylation faster than the genome average tend to lie within ESC super-enhancers. We targeted the Cs in CpG context in these regions, as they are of interest to the reprogramming field. | 1449 | (BmeT110I OR BsoBI) AND (BsiSI OR MspI) | 80-980 | 25.19 | 974.05 | 3.42628839 | 37.31 | 0.96792238
Mus musculus | iPSCs maintained | 28147265 | iPSC reprogramming in mouse is characterised by global changes in DNA methylation. Sites that resist the genome-wide demethylation tend to lie within intracisternal A-particle (IAP)-containing regions. We targeted the Cs in CpG context in these regions, as they are of interest to the reprogramming field. | 3896 | (BmeT110I OR BsoBI) AND (BsiSI OR MspI) | 80-560 | 25.85 | 690.088 | 2.835875 | 52.66 | 0.94227711
Mus musculus | NRF1 sites | 26675734 | NRF1 is a transcription factor whose binding to DNA depends on the methylation status of its recognition sequences. We tried to enrich for the CpG sites that overlap with in vivo NRF1 binding sites. | 17018 | (BmeT110I OR BsoBI) AND (PaeI OR SphI) | 20-760 | 25.04 | 445.36 | 2.01909776 | 81.6 | 0.99634045
Arabidopsis thaliana | CHG sites | 27419873 | Non-CpG methylation is an important epigenetic modification in plants. In this study, a large number of regions containing non-CpG methylation were found to vary between different Arabidopsis accessions in the 1001 Epigenomes Project. We targeted Cs in non-CpG context within these non-CpG DMRs. | 21801 | (AanI OR PsiI) AND (Csp6I OR CviQI) | 100-520 | 25.05 | 165.313 | 1.48095531 | 9.65 | 0.94999336
Fig. S3.5 Table showing information about the different biological systems [Domcke et al., 2015; Hanna et al., 2016; Horvath, 2013a; Kawakatsu et al., 2016; Lev Maor et al., 2015; Maurano et al., 2015; Milagre et al., 2017] for which cuRRBS was run in silico. Some variables from the top hits in the cuRRBS output are also reported.
[Figure S3.6: a. barplots of the difference in the number of sites (%) against the depth of coverage threshold (0 to 20) for FN, FP, TN and TP; b. percentage (%) against the depth of coverage threshold for sensitivity and specificity, for the size ranges 90-185 bp and 110-200 bp.]
Fig. S3.6 Effect of experimental errors during size selection on cuRRBS predictions. a. Barplots showing the difference in the number of true positives (TP, green), true negatives (TN, blue), false positives (FP, red) and false negatives (FN, yellow) derived from cuRRBS theoretical predictions for the XmaI-RRBS data [Tanas et al., 2017] using two different size ranges: 110-200 bp (aimed size range) and 90-185 bp (real size range).
The difference observed between the two size ranges (aimed minus real) is expressed as a percentage of the total number of sites considered (i.e. all CpG sites in CGIs). The number of sites in each category is calculated for different depth-of-coverage thresholds (number of reads covering a CpG site, as reported by Bismark). cuRRBS was run for XmaI with all the default parameters (with a read length of 200 bp). The legend is displayed on the right-hand side. b. Plot showing cuRRBS sensitivity and specificity as a function of the depth-of-coverage threshold used to filter the experimental data [Tanas et al., 2017]. The two size ranges considered in a. (aimed: 110-200 bp; real: 90-185 bp) are used for the calculations. The legend is displayed below the curves.
[Figure S3.7: four panels of mean time (s) against a. number of enzymes, b. experimental error (bp), c. number of sites of interest and d. genome size (GB of pre-computed files).]
Fig. S3.7 cuRRBS computational efficiency. a. Plot showing the dependency between the number of enzymes checked and the computational (real) time required by the software (mean of 3 independent runs). cuRRBS was run for the Horvath epigenetic clock system [Horvath, 2013a] with a read length of 75 bp, a Score threshold of 25% and an experimental error of 10 bp. A laptop with an Intel® Core™ i7-6600U CPU was used, which allowed cuRRBS to employ 4 parallel threads. The red error bars display the mean ± SD for the 3 independent runs. b. Plot showing the dependency between the experimental error (which determines how many size ranges are sampled) and the computational (real) time required by the software (mean of 3 independent runs).
cuRRBS was run for the Horvath epigenetic clock system [Horvath, 2013a] with a read length of 75 bp, a Score threshold of 25% and a list of 40 enzymes. A laptop with an Intel® Core™ i7-6600U CPU was used, which allowed cuRRBS to employ 4 parallel threads. The red error bars display the mean ± SD for the 3 independent runs. c. Plot showing the dependency between the number of sites of interest and the computational (real) time required by the software (mean of 3 independent runs). cuRRBS was run with a read length of 75 bp, a Score threshold of 25%, an experimental error of 10 bp and a list of 40 enzymes. A laptop with an Intel® Core™ i7-6600U CPU was used, which allowed cuRRBS to employ 4 parallel threads. The red error bars display the mean ± SD for the 3 independent runs. d. Plot showing the dependency between genome size (measured as the size in GB of all the pre-computed files) and the computational (real) time required by the software (mean of 3 independent runs). cuRRBS was run with a read length of 75 bp, a Score threshold of 25%, an experimental error of 10 bp and a list of 40 enzymes. A laptop with an Intel® Core™ i7-6600U CPU was used, which allowed cuRRBS to employ 4 parallel threads. The red error bars display the mean ± SD for the 3 independent runs.
References
Akalin, A. (2014). AmpliconBiSeq GitHub repository: findElbow function.
Aldinger, K. A., Plummer, J. T., and Levitt, P. (2013). Comparative DNA methylation among females with neurodevelopmental disorders and seizures identifies TAC1 as a MeCP2 target gene. Journal of Neurodevelopmental Disorders, 5(1):15.
Alexandrov, L. B., Jones, P. H., Wedge, D. C., Sale, J. E., Campbell, P. J., Nik-Zainal, S., and Stratton, M. R. (2015). Clock-like mutational processes in human somatic cells. Nature Genetics, 47(12):1402–1407.
Alexandrov, L. B. and Stratton, M. R. (2014). Mutational signatures: the patterns of somatic mutations hidden in cancer genomes.
Current Opinion in Genetics & Development, 24:52–60.
Alisch, R. S., Wang, T., Chopra, P., Visootsak, J., Conneely, K. N., and Warren, S. T. (2013). Genome-wide analysis validates aberrant methylation in fragile X syndrome is specific to the FMR1 locus. BMC Medical Genetics, 14(1):18.
Allis, C. D. and Jenuwein, T. (2016). The molecular hallmarks of epigenetic control. Nature Reviews Genetics, 17:487–500.
Allum, F., Shao, X., Guénard, F., Simon, M.-M., Busche, S., Caron, M., Lambourne, J., Lessard, J., Tandre, K., Hedman, Å. K., Kwan, T., Ge, B., The MuTHER Consortium, Ahmadi, K. R., Ainali, C., Barrett, A., Bataille, V., Bell, J. T., Buil, A., Dermitzakis, E. T., Dimas, A. S., Durbin, R., Glass, D., Hassanali, N., Ingle, C., Knowles, D., Krestyaninova, M., Lindgren, C. M., Lowe, C. E., Meduri, E., di Meglio, P., Min, J. L., Montgomery, S. B., Nestle, F. O., Nica, A. C., Nisbet, J., O’Rahilly, S., Parts, L., Potter, S., Sandling, J., Sekowska, M., Shin, S.-Y., Small, K. S., Soranzo, N., Surdulescu, G., Travers, M. E., Tsaprouni, L., Tsoka, S., Wilk, A., Yang, T.-P., Zondervan, K. T., Rönnblom, L., McCarthy, M. I., Deloukas, P., Richmond, T., Burgess, D., Spector, T. D., Tchernof, A., Marceau, S., Lathrop, M., Vohl, M.-C., Pastinen, T., and Grundberg, E. (2015). Characterization of functional methylomes by next-generation capture sequencing identifies novel disease-associated variants. Nature Communications, 6:7211.
Angermueller, C., Lee, H. J., Reik, W., and Stegle, O. (2017). DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biology, 18(1):67.
Anisimov, V. N., Berstein, L. M., Egormin, P. A., Piskunova, T. S., Popovich, I. G., Zabezhinski, M. A., Tyndyk, M. L., Yurova, M. V., Kovalenko, I. G., Poroshina, T. E., and Semenchenko, A. V. (2008). Metformin slows down aging and extends life span of female SHR mice. Cell Cycle, 7(17):2769–2773.
Arantes-Oliveira, N., Berman, J. R., and Kenyon, C. (2003).
Healthy Animals with Extreme Longevity. Science, 302(5645):611.
Aref-Eshghi, E., Bend, E. G., Hood, R. L., Schenkel, L. C., Carere, D. A., Chakrabarti, R., Nagamani, S. C. S., Cheung, S. W., Campeau, P. M., Prasad, C., Siu, V. M., Brady, L., Tarnopolsky, M. A., Callen, D. J., Innes, A. M., White, S. M., Meschino, W. S., Shuen, A. Y., Paré, G., Bulman, D. E., Ainsworth, P. J., Lin, H., Rodenhiser, D. I., Hennekam, R. C., Boycott, K. M., Schwartz, C. E., and Sadikovic, B. (2018a). BAFopathies’ DNA methylation epi-signatures demonstrate diagnostic utility and functional continuum of Coffin–Siris and Nicolaides–Baraitser syndromes. Nature Communications, 9(1):4885.
Aref-Eshghi, E., Rodenhiser, D. I., Schenkel, L. C., Lin, H., Skinner, C., Ainsworth, P., Paré, G., Hood, R. L., Bulman, D. E., Kernohan, K. D., Boycott, K. M., Campeau, P. M., Schwartz, C., and Sadikovic, B. (2018b). Genomic DNA Methylation Signatures Enable Concurrent Diagnosis and Clinical Genetic Variant Classification in Neurodevelopmental Syndromes. American Journal of Human Genetics, 102(1):156–174.
Aref-Eshghi, E., Schenkel, L. C., Lin, H., Skinner, C., Ainsworth, P., Paré, G., Rodenhiser, D., Schwartz, C., and Sadikovic, B. (2017). The defining DNA methylation signature of Kabuki syndrome enables functional assessment of genetic variants of unknown clinical significance. Epigenetics, 12(11):923–933.
Armstrong, V. L., Rakoczy, S., Rojanathammanee, L., and Brown-Borg, H. M. (2013). Expression of DNA Methyltransferases Is Influenced by Growth Hormone in the Long-Living Ames Dwarf Mouse In Vivo and In Vitro. The Journals of Gerontology: Series A, 69(8):923–933.
Aryee, M. J., Jaffe, A. E., Corrada-Bravo, H., Ladd-Acosta, C., Feinberg, A. P., Hansen, K. D., and Irizarry, R. A. (2014). Minfi: A flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics, 30(10):1363–1369.
Atlasi, Y. and Stunnenberg, H. G. (2017).
The interplay of epigenetic marks during stem cell differentiation and development. Nature Reviews Genetics, 18:643–658.
Austad, S. N. and Fischer, K. E. (2016). Sex Differences in Lifespan. Cell Metabolism, 23(6):1022–1033.
Avrahami, D., Li, C., Zhang, J., Schug, J., Avrahami, R., Rao, S., Stadler, M. B., Burger, L., Schübeler, D., Glaser, B., and Kaestner, K. H. (2015). Aging-Dependent Demethylation of Regulatory Elements Correlates with Chromatin State and Improved Cell Function. Cell Metabolism, 22(4):619–632.
Ayyadevara, S., Alla, R., Thaden, J. J., and Shmookler Reis, R. J. (2008). Remarkable longevity and stress resistance of nematode PI3K-null mutants. Aging Cell, 7(1):13–22.
Bacalini, M. G., Deelen, J., Pirazzini, C., De Cecco, M., Giuliani, C., Lanzarini, C., Ravaioli, F., Marasco, E., Van Heemst, D., Suchiman, H. E. D., Slieker, R., Giampieri, E., Recchioni, R., Marcheselli, F., Salvioli, S., Vitale, G., Olivieri, F., Spijkerman, A. M., Dollé, M. E., Sedivy, J. M., Castellani, G., Franceschi, C., Slagboom, P. E., and Garagnani, P. (2017). Systemic Age-Associated DNA Hypermethylation of ELOVL2 Gene: In Vivo and in Vitro Evidences of a Cell Replication Process. Journals of Gerontology - Series A Biological Sciences and Medical Sciences, 72(8):1015–1023.
Bahcall, O. G. (2018). UK Biobank — a new era in genomic medicine. Nature Reviews Genetics, 19(12):737.
Baker, D. J., Childs, B. G., Durik, M., Wijers, M. E., Sieben, C. J., Zhong, J., Saltness, R. A., Jeganathan, K. B., Verzosa, G. C., Pezeshki, A., Khazaie, K., Miller, J. D., and van Deursen, J. M. (2016). Naturally occurring p16Ink4a-positive cells shorten healthy lifespan. Nature, 530:184–189.
Baker, D. J., Wijshake, T., Tchkonia, T., LeBrasseur, N. K., Childs, B. G., van de Sluis, B., Kirkland, J. L., and van Deursen, J. M. (2011). Clearance of p16Ink4a-positive senescent cells delays ageing-associated disorders. Nature, 479:232–236.
Barau, J., Teissandier, A., Zamudio, N., Roy, S., Nalesso, V., Hérault, Y., Guillou, F., and Bourc’his, D. (2016). The DNA methyltransferase DNMT3C protects male germ cells from transposon activity. Science, 354(6314):909–912.
Barbi, E., Lagona, F., Marsili, M., Vaupel, J. W., and Wachter, K. W. (2018). The plateau of human mortality: Demography of longevity pioneers. Science, 360(6396):1459–1461.
Bardet, A. F., Steinmann, J., Bafna, S., Knoblich, J. A., Zeitlinger, J., and Stark, A. (2013). Identification of transcription factor binding sites from ChIP-seq data at high resolution. Bioinformatics, 29(21):2705–2713.
Barzilai, N., Crandall, J. P., Kritchevsky, S. B., and Espeland, M. A. (2016). Metformin as a Tool to Target Aging. Cell Metabolism, 23(6):1060–1065.
Baubec, T., Colombo, D. F., Wirbelauer, C., Schmidt, J., Burger, L., Krebs, A. R., Akalin, A., and Schübeler, D. (2015). Genomic profiling of DNA methyltransferases reveals a role for DNMT3B in genic methylation. Nature, 520(7546):243–247.
Beerman, I., Bock, C., Garrison, B. S., Smith, Z. D., Gu, H., Meissner, A., and Rossi, D. J. (2013). Proliferation-dependent alterations of the DNA methylation landscape underlie hematopoietic stem cell aging. Cell Stem Cell, 12(4):413–425.
Benayoun, B. A., Pollina, E. A., and Brunet, A. (2015). Epigenetic regulation of ageing: linking environmental inputs to genomic stability. Nature Reviews Molecular Cell Biology, 16:593–610.
Berdasco, M., Ropero, S., Setien, F., Fraga, M. F., Lapunzina, P., Losson, R., Alaminos, M., Cheung, N.-K., Rahman, N., and Esteller, M. (2009). Epigenetic inactivation of the Sotos overgrowth syndrome gene histone methyltransferase NSD1 in human neuroblastoma and glioma. Proceedings of the National Academy of Sciences, 106(51):21830–21835.
Bernhart, S. H., Kretzmer, H., Holdt, L. M., Jühling, F., Ammerpohl, O., Bergmann, A. K., Northoff, B. H., Doose, G., Siebert, R., Stadler, P. F., and Hoffmann, S. (2016).
Changes of bivalent chromatin coincide with increased expression of developmental genes in cancer. Scientific Reports, 6:37393.
Bernstein, B. E., Mikkelsen, T. S., Xie, X., Kamal, M., Huebert, D. J., and Cuff, J. (2006). A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell, 125(2):315–326.
Bernstein, D. L., Kameswaran, V., Le Lay, J. E., Sheaffer, K. L., and Kaestner, K. H. (2015). The BisPCR2 method for targeted bisulfite sequencing. Epigenetics & Chromatin, 8:27.
Bibikova, M., Barnes, B., Tsan, C., Ho, V., Klotzle, B., Le, J. M., Delano, D., Zhang, L., Schroth, G. P., Gunderson, K. L., Fan, J. B., and Shen, R. (2011). High density DNA methylation array with single CpG site resolution. Genomics, 98(4):288–295.
Bibikova, M., Le, J., Barnes, B., Saedinia-Melnyk, S., Zhou, L., Shen, R., and Gunderson, K. L. (2009). Genome-wide DNA methylation profiling using Infinium® assay. Epigenomics, 1(1):177–200.
Bird, A. (2007). Perceptions of epigenetics. Nature, 447:396–398.
Bjornsson, H. T. (2015). The Mendelian disorders of the epigenetic machinery. Genome Research, 25(10):1473–1481.
Blagosklonny, M. V. (2006). Aging and immortality: Quasi-programmed senescence and its pharmacologic inhibition.
Blagosklonny, M. V. (2010). Revisiting the antagonistic pleiotropy theory of aging: TOR-driven program and quasi-program. Cell Cycle, 9(16):3171–3176.
Blasco, M. A. (2007). Telomere length, stem cells and aging. Nature Chemical Biology, 3:640.
Bock, C., Walter, J., Paulsen, M., and Lengauer, T. (2007). CpG island mapping by epigenome prediction. PLoS Computational Biology, 3(6):e110.
Bocklandt, S., Lin, W., Sehl, M. E., Sánchez, F. J., Sinsheimer, J. S., Horvath, S., and Vilain, E. (2011). Epigenetic predictor of age. PLoS One, 6(6):e14821.
Bonkowski, M. S. and Sinclair, D. A. (2016). Slowing ageing by design: the rise of NAD+ and sirtuin-activating compounds. Nature Reviews Molecular Cell Biology, 17:679–690.
Booth, L. N.
and Brunet, A. (2016). The Aging Epigenome. Molecular Cell, 62(5):728–744.
Bork, S., Pfister, S., Witt, H., Horn, P., Korn, B., Ho, A. D., and Wagner, W. (2010). DNA methylation pattern changes upon long-term culture and aging of human mesenchymal stromal cells. Aging Cell, 9(1):54–63.
Bourc’his, D., Xu, G.-L., Lin, C.-S., Bollman, B., and Bestor, T. H. (2001). Dnmt3L and the Establishment of Maternal Genomic Imprints. Science, 294(5551):2536–2539.
Boyle, P., Clement, K., Gu, H., Smith, Z. D., Ziller, M., Fostel, J. L., Holmes, L., Meldrim, J., Kelley, F., Gnirke, A., and Meissner, A. (2012). Gel-free multiplexed reduced representation bisulfite sequencing for large-scale DNA methylation profiling. Genome Biology, 13(10):R92.
Brinkman, A. B., Simmer, F., Ma, K., Kaan, A., Zhu, J., and Stunnenberg, H. G. (2010). Whole-genome DNA methylation profiling using MethylCap-seq. Methods, 52(3):232–236.
Bürkle, A., Moreno-Villanueva, M., Bernhard, J., Blasco, M., Zondag, G., Hoeijmakers, J. H. J., Toussaint, O., Grubeck-Loebenstein, B., Mocchegiani, E., Collino, S., Gonos, E. S., Sikora, E., Gradinaru, D., Dollé, M., Salmon, M., Kristensen, P., Griffiths, H. R., Libert, C., Grune, T., Breusing, N., Simm, A., Franceschi, C., Capri, M., Talbot, D., Caiafa, P., Friguet, B., Slagboom, P. E., Hervonen, A., Hurme, M., and Aspinall, R. (2015). MARK-AGE biomarkers of ageing. Mechanisms of Ageing and Development, 151:2–12.
Butcher, D. T., Cytrynbaum, C., Turinsky, A. L., Siu, M. T., Inbar-Feigenberg, M., Mendoza-Londono, R., Chitayat, D., Walker, S., Machado, J., Caluseriu, O., Dupuis, L., Grafodatskaya, D., Reardon, W., Gilbert-Dussardier, B., Verloes, A., Bilan, F., Milunsky, J. M., Basran, R., Papsin, B., Stockley, T. L., Scherer, S. W., Choufani, S., Brudno, M., and Weksberg, R. (2017). CHARGE and Kabuki Syndromes: Gene-Specific DNA Methylation Signatures Identify Epigenetic Mechanisms Linking These Clinically Overlapping Conditions.
The American Journal of Human Genetics, 100(5):773–788.
Bystrykh, L. V. (2013). A combinatorial approach to the restriction of a mouse genome. BMC Research Notes, 6(1):284.
Cai, L., Rothbart, S. B., Lu, R., Xu, B., Chen, W.-Y., Tripathy, A., Rockowitz, S., Zheng, D., Patel, D. J., Allis, C. D., Strahl, B. D., Song, J., and Wang, G. G. (2013). An H3K36 Methylation-Engaging Tudor Motif of Polycomb-like Proteins Mediates PRC2 Complex Targeting. Molecular Cell, 49(3):571–582.
Castillo-Fernandez, J. E., Spector, T. D., and Bell, J. T. (2014). Epigenetics of discordant monozygotic twins: implications for disease. Genome Medicine, 6(7):60.
Cedar, H., Solage, A., Glaser, G., and Razin, A. (1979). Direct detection of methylated cytosine in DNA by use of the restriction enzyme MspI. Nucleic Acids Research, 6(6):2125–2132.
Chantalat, S., Depaux, A., Héry, P., Barral, S., Thuret, J. Y., Dimitrov, S., and Gérard, M. (2011). Histone H3 trimethylation at lysine 36 is associated with constitutive and facultative heterochromatin. Genome Research, 21:1426–1437.
Chen, B. H., Marioni, R. E., Colicino, E., Peters, M. J., Ward-Caviness, C. K., Tsai, P. C., Roetker, N. S., Just, A. C., Demerath, E. W., Guan, W., Bressler, J., Fornage, M., Studenski, S., Vandiver, A. R., Moore, A. Z., Tanaka, T., Kiel, D. P., Liang, L., Vokonas, P., Schwartz, J., Lunetta, K. L., Murabito, J. M., Bandinelli, S., Hernandez, D. G., Melzer, D., Nalls, M., Pilling, L. C., Price, T. R., Singleton, A. B., Gieger, C., Holle, R., Kretschmer, A., Kronenberg, F., Kunze, S., Linseisen, J., Meisinger, C., Rathmann, W., Waldenberger, M., Visscher, P. M., Shah, S., Wray, N. R., McRae, A. F., Franco, O. H., Hofman, A., Uitterlinden, A. G., Absher, D., Assimes, T., Levine, M. E., Lu, A. T., Tsao, P. S., Hou, L., Manson, J. A. E., Carty, C. L., LaCroix, A. Z., Reiner, A. P., Spector, T. D., Feinberg, A. P., Levy, D., Baccarelli, A., van Meurs, J., Bell, J. T., Peters, A., Deary, I. J., Pankow, J.
S., Ferrucci, L., and Horvath, S. (2016a). DNA methylation-based measures of biological age: Meta-analysis predicting time to death. Aging, 8(9):1844–1865.
Chen, T., Tsujimoto, N., and Li, E. (2004). The PWWP Domain of Dnmt3a and Dnmt3b Is Required for Directing DNA Methylation to the Major Satellite Repeats at Pericentric Heterochromatin. Molecular and Cellular Biology, 24(20):9048–9058.
Chen, Y., Zhang, Y., Zhao, G., Chen, C., Yang, P., Ye, S., and Tan, X. (2016b). Difference in Leukocyte Composition between Women before and after Menopausal Age, and Distinct Sexual Dimorphism. PLOS ONE, 11(9):e0162953.
Chen, Y.-a., Lemire, M., Choufani, S., Butcher, D. T., Grafodatskaya, D., Zanke, B. W., Gallinger, S., Hudson, T. J., and Weksberg, R. (2013). Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics, 8(2):203–209.
Cheung, W. A., Shao, X., Morin, A., Siroux, V., Kwan, T., Ge, B., Aïssi, D., Chen, L., Vasquez, L., Allum, F., Guénard, F., Bouzigon, E., Simon, M.-M., Boulier, E., Redensek, A., Watt, S., Datta, A., Clarke, L., Flicek, P., Mead, D., Paul, D. S., Beck, S., Bourque, G., Lathrop, M., Tchernof, A., Vohl, M.-C., Demenais, F., Pin, I., Downes, K., Stunnenberg, H. G., Soranzo, N., Pastinen, T., and Grundberg, E. (2017). Functional variation in allelic methylomes underscores a strong genetic contribution and reveals novel epigenetic alterations in the human epigenome. Genome Biology, 18(1):50.
Chinn, I. K., Blackburn, C. C., Manley, N. R., and Sempowski, G. D. (2012). Changes in primary lymphoid organs with aging. Seminars in Immunology, 24(5):309–320.
Choufani, S., Cytrynbaum, C., Chung, B. H. Y., Turinsky, A. L., Grafodatskaya, D., Chen, Y. A., Cohen, A. S. A., Dupuis, L., Butcher, D. T., Siu, M. T., Luk, H. M., Lo, I. F. M., Lam, S. T. S., Caluseriu, O., Stavropoulos, D. J., Reardon, W., Mendoza-Londono, R., Brudno, M., Gibson, W. T., Chitayat, D., and Weksberg, R. (2015).
NSD1 mutations generate a genome-wide DNA methylation signature. Nature Communications, 6:10207.
Ciccarone, F., Malavolta, M., Calabrese, R., Guastafierro, T., Bacalini, M. G., Reale, A., Franceschi, C., Capri, M., Hervonen, A., Hurme, M., Grubeck-Loebenstein, B., Koller, B., Bernhardt, J., Schon, C., Slagboom, P. E., Toussaint, O., Sikora, E., Gonos, E. S., Breusing, N., Grune, T., Jansen, E., Dollé, M., Moreno-Villanueva, M., Sindlinger, T., Bürkle, A., Zampieri, M., and Caiafa, P. (2016). Age-dependent expression of DNMT1 and DNMT3B in PBMCs from a large European population enrolled in the MARK-AGE study. Aging Cell, 15(4):755–765.
Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., and De Hoon, M. J. L. (2009). Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11):1422–1423.
Cohen-Karni, D., Xu, D., Apone, L., Fomenkov, A., Sun, Z., Davis, P. J., Morey Kinney, S. R., Yamada-Mabuchi, M., Xu, S.-y., Davis, T., Pradhan, S., Roberts, R. J., and Zheng, Y. (2011). The MspJI family of modification-dependent restriction endonucleases for epigenetic studies. Proceedings of the National Academy of Sciences, 108(27):11040–11045.
Cole, J. H., Ritchie, S. J., Bastin, M. E., Valdés Hernández, M. C., Muñoz Maniega, S., Royle, N., Corley, J., Pattie, A., Harris, S. E., Zhang, Q., Wray, N. R., Redmond, P., Marioni, R. E., Starr, J. M., Cox, S. R., Wardlaw, J. M., Sharp, D. J., and Deary, I. J. (2017a). Brain age predicts mortality. Molecular Psychiatry, 23:1385–1392.
Cole, J. J., Robertson, N. A., Rather, M. I., Thomson, J. P., McBryan, T., Sproul, D., Wang, T., Brock, C., Clark, W., Ideker, T., Meehan, R. R., Miller, R. A., Brown-Borg, H. M., and Adams, P. D. (2017b). Diverse interventions that extend mouse lifespan suppress shared age-associated epigenetic changes at critical gene regulatory regions.
Genome Biology, 18(1):58.
Conboy, I. M., Conboy, M. J., Wagers, A. J., Girma, E. R., Weissman, I. L., and Rando, T. A. (2005). Rejuvenation of aged progenitor cells by exposure to a young systemic environment. Nature, 433(7027):760–764.
International Human Genome Sequencing Consortium, Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J. P., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A., Sougnez, C., Stange-Thomann, N., Stojanovic, N., Subramanian, A., Wyman, D., Rogers, J., Sulston, J., Ainscough, R., Beck, S., Bentley, D., Burton, J., Clee, C., Carter, N., Coulson, A., Deadman, R., Deloukas, P., Dunham, A., Dunham, I., Durbin, R., French, L., Grafham, D., Gregory, S., Hubbard, T., Humphray, S., Hunt, A., Jones, M., Lloyd, C., McMurray, A., Matthews, L., Mercer, S., Milne, S., Mullikin, J. C., Mungall, A., Plumb, R., Ross, M., Shownkeen, R., Sims, S., Waterston, R. H., Wilson, R. K., Hillier, L. W., McPherson, J. D., Marra, M. A., Mardis, E. R., Fulton, L. A., Chinwalla, A. T., Pepin, K. H., Gish, W. R., Chissoe, S. L., Wendl, M. C., Delehaunty, K. D., Miner, T. L., Delehaunty, A., Kramer, J. B., Cook, L. L., Fulton, R. S., Johnson, D. L., Minx, P. J., Clifton, S. W., Hawkins, T., Branscomb, E., Predki, P., Richardson, P., Wenning, S., Slezak, T., Doggett, N., Cheng, J.-F., Olsen, A., Lucas, S., Elkin, C., Uberbacher, E., Frazier, M., Gibbs, R. A., Muzny, D. M., Scherer, S. E., Bouck, J. B., Sodergren, E. J., Worley, K. C., Rives, C. M., Gorrell, J. H., Metzker, M. L., Naylor, S. L., Kucherlapati, R. S., Nelson, D. L., Weinstock, G.
M., Sakaki, Y., Fujiyama, A., Hattori, M., Yada, T., Toyoda, A., Itoh, T., Kawagoe, C., Watanabe, H., Totoki, Y., Taylor, T., Weissenbach, J., Heilig, R., Saurin, W., Artiguenave, F., Brottier, P., Bruls, T., Pelletier, E., Robert, C., Wincker, P., Rosenthal, A., Platzer, M., Nyakatura, G., Taudien, S., Rump, A., Smith, D. R., Doucette-Stamm, L., Rubenfield, M., Weinstock, K., Lee, H. M., Dubois, J., Yang, H., Yu, J., Wang, J., Huang, G., Gu, J., Hood, L., Rowen, L., Madan, A., Qin, S., Davis, R. W., Federspiel, N. A., Abola, A. P., Proctor, M. J., Roe, B. A., Chen, F., Pan, H., Ramser, J., Lehrach, H., Reinhardt, R., McCombie, W. R., de la Bastide, M., Dedhia, N., Blöcker, H., Hornischer, K., Nordsiek, G., Agarwala, R., Aravind, L., Bailey, J. A., Bateman, A., Batzoglou, S., Birney, E., Bork, P., Brown, D. G., Burge, C. B., Cerutti, L., Chen, H.-C., Church, D., Clamp, M., Copley, R. R., Doerks, T., Eddy, S. R., Eichler, E. E., Furey, T. S., Galagan, J., Gilbert, J. G. R., Harmon, C., Hayashizaki, Y., Haussler, D., Hermjakob, H., Hokamp, K., Jang, W., Johnson, L. S., Jones, T. A., Kasif, S., Kaspryzk, A., Kennedy, S., Kent, W. J., Kitts, P., Koonin, E. V., Korf, I., Kulp, D., Lancet, D., Lowe, T. M., McLysaght, A., Mikkelsen, T., Moran, J. V., Mulder, N., Pollara, V. J., Ponting, C. P., Schuler, G., Schultz, J., Slater, G., Smit, A. F. A., Stupka, E., Szustakowki, J., Thierry-Mieg, D., Thierry-Mieg, J., Wagner, L., Wallis, J., Wheeler, R., Williams, A., Wolf, Y. I., Wolfe, K. H., Yang, S.-P., Yeh, R.-F., Collins, F., Guyer, M. S., Peterson, J., Felsenfeld, A., Wetterstrand, K. A., Myers, R. M., Schmutz, J., Dickson, M., Grimwood, J., Cox, D. R., Olson, M. V., Kaul, R., Raymond, C., Shimizu, N., Kawasaki, K., Minoshima, S., Evans, G. A., Athanasiou, M., Schultz, R., Patrinos, A., and Morgan, M. J. (2001). Initial sequencing and analysis of the human genome. Nature, 409:860–921.
Mouse Genome Sequencing Consortium, Chinwalla, A. T., Cook, L. L., Delehaunty, K.
D., Fewell, G. A., Fulton, L. A., Fulton, R. S., Graves, T. A., Hillier, L. W., Mardis, E. R., McPherson, J. D., Miner, T. L., Nash, W. E., Nelson, J. O., Nhan, M. N., Pepin, K. H., Pohl, C. S., Ponce, T. C., Schultz, B., Thompson, J., Trevaskis, E., Waterston, R. H., Wendl, M. C., Wilson, R. K., Yang, S.-P., An, P., Berry, E., Birren, B., Bloom, T., Brown, D. G., Butler, J., Daly, M., David, R., Deri, J., Dodge, S., Foley, K., Gage, D., Gnerre, S., Holzer, T., Jaffe, D. B., Kamal, M., Karlsson, E. K., Kells, C., Kirby, A., Kulbokas III, E. J., Lander, E. S., Landers, T., Leger, J. P., Levine, R., Lindblad-Toh, K., Mauceli, E., Mayer, J. H., McCarthy, M., Meldrim, J., Meldrim, J., Mesirov, J. P., Nicol, R., Nusbaum, C., Seaman, S., Sharpe, T., Sheridan, A., Singer, J. B., Santos, R., Spencer, B., Stange-Thomann, N., Vinson, J. P., Wade, C. M., Wierzbowski, J., Wyman, D., Zody, M. C., Birney, E., Goldman, N., Kasprzyk, A., Mongin, E., Rust, A. G., Slater, G., Stabenau, A., Ureta-Vidal, A., Whelan, S., Ainscough, R., Attwood, J., Bailey, J., Barlow, K., Beck, S., Burton, J., Clamp, M., Clee, C., Coulson, A., Cuff, J., Curwen, V., Cutts, T., Davies, J., Eyras, E., Grafham, D., Gregory, S., Hubbard, T., Hunt, A., Jones, M., Joy, A., Leonard, S., Lloyd, C., Matthews, L., McLaren, S., McLay, K., Meredith, B., Mullikin, J. C., Ning, Z., Oliver, K., Overton-Larty, E., Plumb, R., Potter, S., Quail, M., Rogers, J., Scott, C., Searle, S., Shownkeen, R., Sims, S., Wall, M., West, A. P., Willey, D., Williams, S., Abril, J. F., Guigó, R., Parra, G., Agarwal, P., Agarwala, R., Church, D. M., Hlavina, W., Maglott, D. R., Sapojnikov, V., Alexandersson, M., Pachter, L., Antonarakis, S. E., Dermitzakis, E. T., Reymond, A., Ucla, C., Baertsch, R., Diekhans, M., Furey, T. S., Hinrichs, A., Hsu, F., Karolchik, D., Kent, W. J., Roskin, K. M., Schwartz, M. S., Sugnet, C., Weber, R. J., Bork, P., Letunic, I., Suyama, M., Torrents, D., Zdobnov, E. M., Botcherby, M., Brown, S.
D., Campbell, R. D., Jackson, I., Bray, N., Couronne, O., Dubchak, I., Poliakov, A., Rubin, E. M., Brent, M. R., Flicek, P., Keibler, E., Korf, I., Batalov, S., Bult, C., Frankel, W. N., Carninci, P., Hayashizaki, Y., Kawai, J., Okazaki, Y., Cawley, S., Kulp, D., Wheeler, R., Chiaromonte, F., Collins, F. S., Felsenfeld, A., Guyer, M., Peterson, J., Wetterstrand, K., Copley, R. R., Mott, R., Dewey, C., Dickens, N. J., Emes, R. D., Goodstadt, L., Ponting, C. P., Winter, E., Dunn, D. M., von Niederhausern, A. C., Weiss, R. B., Eddy, S. R., Johnson, L. S., Jones, T. A., Elnitski, L., Kolbe, D. L., Eswara, P., Miller, W., O’Connor, M. J., Schwartz, S., Gibbs, R. A., Muzny, D. M., Glusman, G., Smit, A., Green, E. D., Hardison, R. C., Yang, S., Haussler, D., Hua, A., Roe, B. A., Kucherlapati, R. S., Montgomery, K. T., Li, J., Li, M., Lucas, S., Ma, B., McCombie, W. R., Morgan, M., Pevzner, P., Tesler, G., Schultz, J., Smith, D. R., Tromp, J., Worley, K. C., Lander, E. S., Abril, J. F., Agarwal, P., Alexandersson, M., Antonarakis, S. E., Baertsch, R., Berry, E., Birney, E., Bork, P., Bray, N., Brent, M. R., Brown, D. G., Butler, J., Bult, C., Chiaromonte, F., Chinwalla, A. T., Church, D. M., Clamp, M., Collins, F. S., Copley, R. R., Couronne, O., Cawley, S., Cuff, J., Curwen, V., Cutts, T., Daly, M., Dermitzakis, E. T., Dewey, C., Dickens, N. J., Diekhans, M., Dubchak, I., Eddy, S. R., Elnitski, L., Emes, R. D., Eswara, P., Eyras, E., Felsenfeld, A., Flicek, P., Frankel, W. N., Fulton, L. A., Furey, T. S., Gnerre, S., Glusman, G., Goldman, N., Goodstadt, L., Green, E. D., Gregory, S., Guigó, R., Hardison, R. C., Haussler, D., Hillier, L. W., Hinrichs, A., Hlavina, W., Hsu, F., Hubbard, T., Jaffe, D. B., Kamal, M., Karolchik, D., Karlsson, E. K., Kasprzyk, A., Keibler, E., Kent, W. J., Kirby, A., Kolbe, D. L., Korf, I., Kulbokas III, E. J., Kulp, D., Lander, E. S., Letunic, I., Li, M., Lindblad-Toh, K., Ma, B., Maglott, D. R., Mauceli, E., Mesirov, J. 
P., Miller, W., Mott, R., Mullikin, J. C., Ning, Z., Pachter, L., Parra, G., Pevzner, P., Poliakov, A., Ponting, C. P., Potter, S., Reymond, A., Roskin, K. M., Sapojnikov, V., Schultz, J., Schwartz, M. S., Schwartz, S., Searle, S., Singer, J. B., Slater, G., Smit, A., Stabenau, A., Sugnet, C., Suyama, M., Tesler, G., Torrents, D., Tromp, J., Ucla, C., Vinson, J. P., Wade, C. M., Weber, R. J., Wheeler, R., Winter, E., Yang, S.-P., Zdobnov, E. M., Waterston, R. H., Whelan, S., Worley, K. C., and Zody, M. C. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature, 420:520–562.
NIH Roadmap Epigenomics Mapping Consortium (2013). Roadmap Epigenomics Chromatin State Model: emission parameters. https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/imputed12marks/jointModel/final/emissions_25_imputed12marks.png.
NIH Roadmap Epigenomics Mapping Consortium (2014). Roadmap Epigenomics Chromatin State Model: raw data. https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/imputed12marks/jointModel/final/catMat/hg19_chromHMM_imputed25.gz.
Roadmap Epigenomics Consortium, Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M. J., Amin, V., Whitaker, J. W., Schultz, M. D., Ward, L. D., Sarkar, A., Quon, G., Sandstrom, R. S., Eaton, M. L., Wu, Y.-C., Pfenning, A. R., Wang, X., Claussnitzer, M., Liu, Y., Coarfa, C., Harris, R. A., Shoresh, N., Epstein, C. B., Gjoneska, E., Leung, D., Xie, W., Hawkins, R. D., Lister, R., Hong, C., Gascard, P., Mungall, A. J., Moore, R., Chuah, E., Tam, A., Canfield, T. K., Hansen, R. S., Kaul, R., Sabo, P. J., Bansal, M. S., Carles, A., Dixon, J. R., Farh, K.-H., Feizi, S., Karlic, R., Kim, A.-R., Kulkarni, A., Li, D., Lowdon, R., Elliott, G., Mercer, T. R., Neph, S. J., Onuchic, V., Polak, P., Rajagopal, N., Ray, P., Sallari, R. C., Siebenthall, K. T., Sinnott-Armstrong, N. A., Stevens, M., Thurman, R.
E., Wu, J., Zhang, B., Zhou, X., Abdennur, N., Adli, M., Akerman, M., Barrera, L., Antosiewicz-Bourget, J., Ballinger, T., Barnes, M. J., Bates, D., Bell, R. J. A., Bennett, D. A., Bianco, K., Bock, C., Boyle, P., Brinchmann, J., Caballero-Campo, P., Camahort, R., Carrasco-Alfonso, M. J., Charnecki, T., Chen, H., Chen, Z., Cheng, J. B., Cho, S., Chu, A., Chung, W.-Y., Cowan, C., Athena Deng, Q., Deshpande, V., Diegel, M., Ding, B., Durham, T., Echipare, L., Edsall, L., Flowers, D., Genbacev-Krtolica, O., Gifford, C., Gillespie, S., Giste, E., Glass, I. A., Gnirke, A., Gormley, M., Gu, H., Gu, J., Hafler, D. A., Hangauer, M. J., Hariharan, M., Hatan, M., Haugen, E., He, Y., Heimfeld, S., Herlofsen, S., Hou, Z., Humbert, R., Issner, R., Jackson, A. R., Jia, H., Jiang, P., Johnson, A. K., Kadlecek, T., Kamoh, B., Kapidzic, M., Kent, J., Kim, A., Kleinewietfeld, M., Klugman, S., Krishnan, J., Kuan, S., Kutyavin, T., Lee, A.-Y., Lee, K., Li, J., Li, N., Li, Y., Ligon, K. L., Lin, S., Lin, Y., Liu, J., Liu, Y., Luckey, C. J., Ma, Y. P., Maire, C., Marson, A., Mattick, J. S., Mayo, M., McMaster, M., Metsky, H., Mikkelsen, T., Miller, D., Miri, M., Mukame, E., Nagarajan, R. P., Neri, F., Nery, J., Nguyen, T., O’Geen, H., Paithankar, S., Papayannopoulou, T., Pelizzola, M., Plettner, P., Propson, N. E., Raghuraman, S., Raney, B. J., Raubitschek, A., Reynolds, A. P., Richards, H., Riehle, K., Rinaudo, P., Robinson, J. F., Rockweiler, N. B., Rosen, E., Rynes, E., Schein, J., Sears, R., Sejnowski, T., Shafer, A., Shen, L., Shoemaker, R., Sigaroudinia, M., Slukvin, I., Stehling-Sun, S., Stewart, R., Subramanian, S. 
L., Suknuntha, K., Swanson, S., Tian, S., Tilden, H., Tsai, L., Urich, M., Vaughn, I., Vierstra, J., Vong, S., Wagner, U., Wang, H., Wang, T., Wang, Y., Weiss, A., Whitton, H., Wildberg, A., Witt, H., Won, K.-J., Xie, M., Xing, X., Xu, I., Xuan, Z., Ye, Z., Yen, C.-a., Yu, P., Zhang, X., Zhang, X., Zhao, J., Zhou, Y., Zhu, J., Zhu, Y., Ziegler, S., Beaudet, A. E., Boyer, L. A., De Jager, P. L., Farnham, P. J., Fisher, S. J., Haussler, D., Jones, S. J. M., Li, W., Marra, M. A., McManus, M. T., Sunyaev, S., Thomson, J. A., Tlsty, T. D., Tsai, L.-H., Wang, W., Waterland, R. A., Zhang, M. Q., Chadwick, L. H., Bernstein, B. E., Costello, J. F., Ecker, J. R., Hirst, M., Meissner, A., Milosavljevic, A., Ren, B., Stamatoyannopoulos, J. A., Wang, T., Kellis, M., Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M. J., Amin, V., Whitaker, J. W., Schultz, M. D., Ward, L. D., Sarkar, A., Quon, G., Sandstrom, R. S., Eaton, M. L., Wu, Y.-C., Pfenning, A. R., Wang, X., Claussnitzer, M., Liu, Y., Coarfa, C., Harris, R. A., Shoresh, N., Epstein, C. B., Gjoneska, E., Leung, D., Xie, W., Hawkins, R. D., Lister, R., Hong, C., Gascard, P., Mungall, A. J., Moore, R., Chuah, E., Tam, A., Canfield, T. K., Hansen, R. S., Kaul, R., Sabo, P. J., Bansal, M. S., Carles, A., Dixon, J. R., Farh, K.-H., Feizi, S., Karlic, R., Kim, A.-R., Kulkarni, A., Li, D., Lowdon, R., Elliott, G., Mercer, T. R., Neph, S. J., Onuchic, V., Polak, P., Rajagopal, N., Ray, P., Sallari, R. C., Siebenthall, K. T., Sinnott-Armstrong, N. A., Stevens, M., Thurman, R. E., Wu, J., Zhang, B., Zhou, X., Beaudet, A. E., Boyer, L. A., De Jager, P. L., Farnham, P. J., Fisher, S. J., Haussler, D., Jones, S. J. M., Li, W., Marra, M. A., McManus, M. T., Sunyaev, S., Thomson, J. A., Tlsty, T. D., Tsai, L.-H., Wang, W., Waterland, R. A., Zhang, M. Q., Chadwick, L. H., Bernstein, B. E., Costello, J. F., Ecker, J.
R., Hirst, M., Meissner, A., Milosavljevic, A., Ren, B., Stamatoyannopoulos, J. A., Wang, T., and Kellis, M. (2015). Integrative analysis of 111 reference human epigenomes. Nature, 518:317–330.
The ENCODE Project Consortium, Dunham, I., Kundaje, A., Aldred, S. F., Collins, P. J., Davis, C. A., Doyle, F., Epstein, C. B., Frietze, S., Harrow, J., Kaul, R., Khatun, J., Lajoie, B. R., Landt, S. G., Lee, B.-K., Pauli, F., Rosenbloom, K. R., Sabo, P., Safi, A., Sanyal, A., Shoresh, N., Simon, J. M., Song, L., Trinklein, N. D., Altshuler, R. C., Birney, E., Brown, J. B., Cheng, C., Djebali, S., Dong, X., Dunham, I., Ernst, J., Furey, T. S., Gerstein, M., Giardine, B., Greven, M., Hardison, R. C., Harris, R. S., Herrero, J., Hoffman, M. M., Iyer, S., Kellis, M., Khatun, J., Kheradpour, P., Kundaje, A., Lassmann, T., Li, Q., Lin, X., Marinov, G. K., Merkel, A., Mortazavi, A., Parker, S. C. J., Reddy, T. E., Rozowsky, J., Schlesinger, F., Thurman, R. E., Wang, J., Ward, L. D., Whitfield, T. W., Wilder, S. P., Wu, W., Xi, H. S., Yip, K. Y., Zhuang, J., Bernstein, B. E., Birney, E., Dunham, I., Green, E. D., Gunter, C., Snyder, M., Pazin, M. J., Lowdon, R. F., Dillon, L. A. L., Adams, L. B., Kelly, C. J., Zhang, J., Wexler, J. R., Green, E. D., Good, P. J., Feingold, E. A., Bernstein, B. E., Birney, E., Crawford, G. E., Dekker, J., Elnitski, L., Farnham, P. J., Gerstein, M., Giddings, M. C., Gingeras, T. R., Green, E. D., Guigó, R., Hardison, R. C., Hubbard, T. J., Kellis, M., Kent, W. J., Lieb, J. D., Margulies, E. H., Myers, R. M., Snyder, M., Stamatoyannopoulos, J. A., Tenenbaum, S. A., Weng, Z., White, K. P., Wold, B., Khatun, J., Yu, Y., Wrobel, J., Risk, B. A., Gunawardena, H. P., Kuiper, H. C., Maier, C. W., Xie, L., Chen, X., Giddings, M. C., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Kheradpour, P., Mikkelsen, T. S., Gillespie, S., Goren, A., Ram, O., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Truong, T., Ward, L. D., Altshuler, R.
C., Eaton, M. L., Kellis, M., Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., Xue, C., Marinov, G. K., Khatun, J., Williams, B. A., Zaleski, C., Rozowsky, J., Röder, M., Kokocinski, F., Abdelhamid, R. F., Alioto, T., Antoshechkin, I., Baer, M. T., Batut, P., Bell, I., Bell, K., Chakrabortty, S., Chen, X., Chrast, J., Curado, J., Derrien, T., Drenkow, J., Dumais, E., Dumais, J., Duttagupta, R., Fastuca, M., Fejes-Toth, K., Ferreira, P., Foissac, S., Fullwood, M. J., Gao, H., Gonzalez, D., Gordon, A., Gunawardena, H. P., Howald, C., Jha, S., Johnson, R., Kapranov, P., King, B., Kingswood, C., Li, G., Luo, O. J., Park, E., Preall, J. B., Presaud, K., Ribeca, P., Risk, B. A., Robyr, D., Ruan, X., Sammeth, M., Sandhu, K. S., Schaeffer, L., See, L.-H., Shahab, A., Skancke, J., Suzuki, A. M., Takahashi, H., Tilgner, H., Trout, D., Walters, N., Wang, H., Wrobel, J., Yu, Y., Hayashizaki, Y., Harrow, J., Gerstein, M., Hubbard, T. J., Reymond, A., Antonarakis, S. E., Hannon, G. J., Giddings, M. C., Ruan, Y., Wold, B., Carninci, P., Guigó, R., Gingeras, T. R., Rosenbloom, K. R., Sloan, C. A., Learned, K., Malladi, V. S., Wong, M. C., Barber, G. P., Cline, M. S., Dreszer, T. R., Heitner, S. G., Karolchik, D., Kent, W. J., Kirkup, V. M., Meyer, L. R., Long, J. C., Maddren, M., Raney, B. J., Furey, T. S., Song, L., Grasfeder, L. L., Giresi, P. G., Lee, B.-K., Battenhouse, A., Sheffield, N. C., Simon, J. M., Showers, K. A., Safi, A., London, D., Bhinge, A. A., Shestak, C., Schaner, M. R., Ki Kim, S., Zhang, Z. Z., Mieczkowski, P. A., Mieczkowska, J. O., Liu, Z., McDaniell, R. M., Ni, Y., Rashid, N. U., Kim, M. J., Adar, S., Zhang, Z., Wang, T., Winter, D., Keefe, D., Birney, E., Iyer, V. R., Lieb, J. D., Crawford, G. E., Li, G., Sandhu, K. S., Zheng, M., Wang, P., Luo, O. J., Shahab, A., Fullwood, M. J., Ruan, X., Ruan, Y., Myers, R. M., Pauli, F., Williams, B.
A., Gertz, J., Marinov, G. K., Reddy, T. E., Vielmetter, J., Partridge, E., Trout, D., Varley, K. E., Gasper, C., Bansal, A., Pepke, S., Jain, P., Amrhein, H., Bowling, K. M., Anaya, M., Cross, M. K., King, B., Muratet, M. A., Antoshechkin, I., Newberry, K. M., McCue, K., Nesmith, A. S., Fisher-Aylor, K. I., Pusey, B., DeSalvo, G., Parker, S. L., Balasubramanian, S., Davis, N. S., Meadows, S. K., Eggleston, T., Gunter, C., Newberry, J. S., Levy, S. E., Absher, D. M., Mortazavi, A., Wong, W. H., Wold, B., Blow, M. J., Visel, A., Pennacchio, L. A., Elnitski, L., Margulies, E. H., Parker, S. C. J., Petrykowska, H. M., Abyzov, A., Aken, B., Barrell, D., Barson, G., Berry, A., Bignell, A., Boychenko, V., Bussotti, G., Chrast, J., Davidson, C., Derrien, T., Despacio-Reyes, G., Diekhans, M., Ezkurdia, I., Frankish, A., Gilbert, J., Gonzalez, J. M., Griffiths, E., Harte, R., Hendrix, D. A., Howald, C., Hunt, T., Jungreis, I., Kay, M., Khurana, E., Kokocinski, F., Leng, J., Lin, M. F., Loveland, J., Lu, Z., Manthravadi, D., Mariotti, M., Mudge, J., Mukherjee, G., Notredame, C., Pei, B., Rodriguez, J. M., Saunders, G., Sboner, A., Searle, S., Sisu, C., Snow, C., Steward, C., Tanzer, A., Tapanari, E., Tress, M. L., van Baren, M. J., Walters, N., Washietl, S., Wilming, L., Zadissa, A., Zhang, Z., Brent, M., Haussler, D., Kellis, M., Valencia, A., Gerstein, M., Reymond, A., Guigó, R., Harrow, J., Hubbard, T. J., Landt, S. G., Frietze, S., Abyzov, A., Addleman, N., Alexander, R. P., Auerbach, R. K., Balasubramanian, S., Bettinger, K., Bhardwaj, N., Boyle, A. P., Cao, A. R., Cayting, P., Charos, A., Cheng, Y., Cheng, C., Eastman, C., Euskirchen, G., Fleming, J. D., Grubert, F., Habegger, L., Hariharan, M., Harmanci, A., Iyengar, S., Jin, V. X., Karczewski, K. J., Kasowski, M., Lacroute, P., Lam, H., Lamarre-Vincent, N., Leng, J., Lian, J., Lindahl-Allen, M., Min, R., Miotto, B., Monahan, H., Moqtaderi, Z., Mu, X.
J., O’Geen, H., Ouyang, Z., Patacsil, D., Pei, B., Raha, D., Ramirez, L., Reed, B., Rozowsky, J., Sboner, A., Shi, M., Sisu, C., Slifer, T., Witt, H., Wu, L., Xu, X., Yan, K.-K., Yang, X., Yip, K. Y., Zhang, Z., Struhl, K., Weissman, S. M., Gerstein, M., Farnham, P. J., Snyder, M., Tenenbaum, S. A., Penalva, L. O., Doyle, F., Karmakar, S., Landt, S. G., Bhanvadia, R. R., Choudhury, A., Domanus, M., Ma, L., Moran, J., Patacsil, D., Slifer, T., Victorsen, A., Yang, X., Snyder, M., White, K. P., Auer, T., Centanin, L., Eichenlaub, M., Gruhl, F., Heermann, S., Hoeckendorf, B., Inoue, D., Kellner, T., Kirchmaier, S., Mueller, C., Reinhardt, R., Schertel, L., Schneider, S., Sinn, R., Wittbrodt, B., Wittbrodt, J., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Aldred, S. F., Trinklein, N. D., Partridge, E. C., Myers, R. M., Dekker, J., Jain, G., Lajoie, B. R., Sanyal, A., Balasundaram, G., Bates, D. L., Byron, R., Canfield, T. K., Diegel, M. J., Dunn, D., Ebersol, A. K., Frum, T., Garg, K., Gist, E., Hansen, R. S., Boatman, L., Haugen, E., Humbert, R., Jain, G., Johnson, A. K., Johnson, E. M., Kutyavin, T. V., Lajoie, B. R., Lee, K., Lotakis, D., Maurano, M. T., Neph, S. J., Neri, F. V., Nguyen, E. D., Qu, H., Reynolds, A. P., Roach, V., Rynes, E., Sabo, P., Sanchez, M. E., Sandstrom, R. S., Sanyal, A., Shafer, A. O., Stergachis, A. B., Thomas, S., Thurman, R. E., Vernot, B., Vierstra, J., Vong, S., Wang, H., Weaver, M. A., Yan, Y., Zhang, M., Akey, J. M., Bender, M., Dorschner, M. O., Groudine, M., MacCoss, M. J., Navas, P., Stamatoyannopoulos, G., Kaul, R., Dekker, J., Stamatoyannopoulos, J. A., Dunham, I., Beal, K., Brazma, A., Flicek, P., Herrero, J., Johnson, N., Keefe, D., Lukk, M., Luscombe, N. M., Sobral, D., Vaquerizas, J. M., Wilder, S. P., Batzoglou, S., Sidow, A., Hussami, N., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M. W., Schaub, M. A., Kundaje, A., Hardison, R. C., Miller, W., Giardine, B., Harris, R. S., Wu, W., Bickel, P.
J., Banfai, B., Boley, N. P., Brown, J. B., Huang, H., Li, Q., Li, J. J., Noble, W. S., Bilmes, J. A., Buske, O. J., Hoffman, M. M., Sahu, A. D., Kharchenko, P. V., Park, P. J., Baker, D., Taylor, J., Weng, Z., Iyer, S., Dong, X., Greven, M., Lin, X., Wang, J., Xi, H. S., Zhuang, J., Gerstein, M., Alexander, R. P., Balasubramanian, S., Cheng, C., Harmanci, A., Lochovsky, L., Min, R., Mu, X. J., Rozowsky, J., Yan, K.-K., Yip, K. Y., and Birney, E. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57–74. Curran, S. P., Wu, X., Riedel, C. G., and Ruvkun, G. (2009). A soma-to-germline transformation in long-lived Caenorhabditis elegans mutants. Nature, 459:1079–1084. Czesnikiewicz-Guzik, M., Lee, W.-W., Cui, D., Hiruma, Y., Lamar, D. L., Yang, Z.-Z., Ouslander, J. G., Weyand, C. M., and Goronzy, J. J. (2008). T cell subset-specific susceptibility to aging. Clinical Immunology, 127(1):107–118. Dale, R. K., Pedersen, B. S., and Quinlan, A. R. (2011). Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics, 27(24):3423–3424. Davey, J. W. and Blaxter, M. L. (2011). RADSeq: next-generation population genetics. Briefings in Functional Genomics, 9(5–6):416–423. Davey, J. W., Hohenlohe, P. A., Etter, P. D., Boone, J. Q., Catchen, J. M., and Blaxter, M. L. (2011). Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics, 12:499–510. Davis, S. and Meltzer, P. S. (2007). GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics, 23(14):1846–1847. Day, K., Waite, L. L., Thalacker-Mercer, A., West, A., Bamman, M. M., and Brooks, J. D. (2013). Differential DNA methylation with age displays both common and dynamic features across human tissues that are influenced by CpG landscape. Genome Biology, 14:R102. De Cecco, M., Criscione, S. W., Peterson, A. L., Neretti, N., Sedivy, J. M., and Kreiling, J. A. (2013).
Transposable elements become active and mobile in the genomes of aging mammalian somatic tissues. Aging, 5(12):867–883. De Cecco, M., Ito, T., Petrashen, A. P., Elias, A. E., Skvir, N. J., Criscione, S. W., Caligiana, A., Brocculi, G., Adney, E. M., Boeke, J. D., Le, O., Beauséjour, C., Ambati, J., Ambati, K., Simon, M., Seluanov, A., Gorbunova, V., Slagboom, P. E., Helfand, S. L., Neretti, N., and Sedivy, J. M. (2019). L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature, 566(7742):73–78. De Magalhães, J. P. and Costa, J. (2009). A database of vertebrate longevity records and their relation to other life-history traits. Journal of Evolutionary Biology, 22(8):1770–1774. de Magalhães, J. P. (2012). Programmatic features of aging originating in development: aging mechanisms beyond molecular damage? The FASEB Journal, 26(12):4821–4826. Dedeurwaerder, S., Defrance, M., Calonne, E., Denis, H., Sotiriou, C., and Fuks, F. (2011). Evaluation of the Infinium Methylation 450K technology. Epigenomics, 3(6):771–784. Dekker, J., Marti-Renom, M. A., and Mirny, L. A. (2013). Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nature Reviews Genetics, 14:390–403. Deng, J., Shoemaker, R., Xie, B., Gore, A., LeProust, E. M., Antosiewicz-Bourget, J., Egli, D., Maherali, N., Park, I.-H., Yu, J., Daley, G. Q., Eggan, K., Hochedlinger, K., Thomson, J., Wang, W., Gao, Y., and Zhang, K. (2009). Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming. Nature Biotechnology, 27:353–360. Dhayalan, A., Rajavelu, A., Rathert, P., Tamas, R., Jurkowska, R. Z., Ragozin, S., and Jeltsch, A. (2010). The Dnmt3a PWWP domain reads histone 3 lysine 36 trimethylation and guides DNA methylation. Journal of Biological Chemistry, 285:26114–26120. Diep, D., Plongthongkum, N., Gore, A., Fung, H.-L., Shoemaker, R., and Zhang, K. (2012).
Library-free methylation sequencing with bisulfite padlock probes. Nature Methods, 9:270–272. Dillin, A., Crawford, D. K., and Kenyon, C. (2002). Timing Requirements for Insulin/IGF-1 Signaling in C. elegans. Science, 298(5594):830–834. Domcke, S., Bardet, A. F., Adrian Ginno, P., Hartl, D., Burger, L., and Schübeler, D. (2015). Competition between DNA methylation and transcription factors determines binding of NRF1. Nature, 528(7583):575–579. Dong, X., Milholland, B., and Vijg, J. (2016). Evidence for a limit to human lifespan. Nature, 538:257–259. Dozmorov, M. G. (2015). Polycomb repressive complex 2 epigenomic signature defines age-associated hypermethylation and gene expression changes. Epigenetics, 10(6):484–495. Du, P., Zhang, X., Huang, C. C., Jafari, N., Kibbe, W. A., Hou, L., and Lin, S. M. (2010). Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics, 11:587. Eaton, M. L. (2007). Linear Statistical Models. In Multivariate Statistics: A Vector Space Approach, pages 132–158. Edgar, R., Domrachev, M., and Lash, A. (2002). Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research, 30(1):207–210. El Khoury, L. Y., Gorrie-Stone, T., Smart, M., Hughes, A., Bao, Y., Andrayas, A., Burrage, J., Hannon, E., Kumari, M., Mill, J., and Schalkwyk, L. C. (2018). Properties of the epigenetic clock and age acceleration. bioRxiv, page 363143. Enguix, A., Cubiles, M. D., Barroso, S., Aguilera, A., Vaquero-Sedas, M. I., and Vega-Palas, M. A. (2018). Epigenetic features of human telomeres. Nucleic Acids Research, 46(5):2347–2355. Ernst, J. and Kellis, M. (2010). Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature Biotechnology, 28:817–825. Feldman, L., Andersen, S. L., Perls, T. T., Dworkis, D. A., and Sebastiani, P. (2012).
Health Span Approximates Life Span Among Many Supercentenarians: Compression of Morbidity at the Approximate Limit of Life Span. The Journals of Gerontology: Series A, 67A(4):395–405. Fernández, A. F., Bayón, G. F., Urdinguio, R. G., Toraño, E. G., García, M. G., Carella, A., Petrus-Reurer, S., Ferrero, C., Martinez-Camblor, P., Cubillo, I., García-Castro, J., Delgado-Calle, J. U., Pérez-Campo, F. M., Riancho, J. A., Bueno, C., Menéndez, P., Mentink, A., Mareschi, K., Claire, F., Fagnani, C., Medda, E., Toccaceli, V., Brescianini, S., Moran, S., Esteller, M., Stolzing, A., De Boer, J., Nistico, L., Stazi, M. A., and Fraga, M. F. (2015). H3K4me1 marks DNA regions hypomethylated during aging in human stem and differentiated cells. Genome Research, 25:27–40. Feser, J., Truong, D., Das, C., Carson, J. J., Kieft, J., Harkness, T., and Tyler, J. K. (2010). Elevated Histone Expression Promotes Life Span Extension. Molecular Cell, 39(5):724– 735. Field, A. E., Robertson, N. A., Wang, T., Havas, A., Ideker, T., and Adams, P. D. (2018). DNA Methylation Clocks in Aging: Categories, Causes, and Consequences. Molecular Cell, 71(6):882–895. Finch, C. E. (2009). Update on Slow Aging and Negligible Senescence – A Mini-Review. Gerontology, 55(3):307–313. Fine, M. (2014). Intergenerational perspectives on ageing, economics and globalisation. Australasian Journal on Ageing, 33(4):220–225. FitzGerald, G., Botstein, D., Califf, R., Collins, R., Peters, K., Van Bruggen, N., and Rader, D. (2018). The future of humans as model organisms. Science, 361(6402):552–553. Flanagan, J. M. (2015). Epigenome-Wide Association Studies (EWAS): Past, Present, and Future. In Verma, M., editor, Cancer Epigenetics: Risk Assessment, Diagnosis, Treatment and Prognosis, pages 51–63. Springer New York, New York, NY. Flavahan, W. A., Gaskell, E., and Bernstein, B. E. (2017). Epigenetic plasticity and the hallmarks of cancer. Science, 357(6348):eaal2380. Fleischer, T., Gampe, J., Scheuerlein, A., and Kerth, G. 
(2017). Rare catastrophic events drive population dynamics in a bat species with negligible senescence. Scientific Reports, 7(1):7370. Fontana, L. and Partridge, L. (2015). Promoting health and longevity through diet: From model organisms to humans. Cell, 161(1):106–118. Fortin, J.-P. and Hansen, K. D. (2015). minfi guidelines: analysis of 450K data using minfi. Fortin, J.-P., Labbe, A., Lemire, M., Zanke, B. W., Hudson, T. J., Fertig, E. J., Greenwood, C. M. T., and Hansen, K. D. (2014). Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biology, 15(11):503. Fraga, M. F., Ballestar, E., Paz, M. F., Ropero, S., Setien, F., Ballestar, M. L., Heine-Suñer, D., Cigudosa, J. C., Urioste, M., Benitez, J., Boix-Chornet, M., Sanchez-Aguilera, A., Ling, C., Carlsson, E., Poulsen, P., Vaag, A., Stephan, Z., Spector, T. D., Wu, Y.-Z., Plass, C., and Esteller, M. (2005). Epigenetic differences arise during the lifetime of monozygotic twins. Proceedings of the National Academy of Sciences of the United States of America, 102(30):10604–10609. Franceschi, C. (2007). Inflammaging as a Major Characteristic of Old People: Can It Be Prevented or Cured? Nutrition Reviews, 65(s3):S173–S176. Frankish, A., Bignell, A., Berry, A., Yates, A., Parker, A., Schmitt, B. M., Aken, B., García Girón, C., Zerbino, D., Stapleton, E., Martin, F. J., Cunningham, F., Barnes, I., Sycheva, I., Loveland, J., Mudge, J. M., Gonzalez, J. M., Ruffier, M., Suner, M.-M., Hardy, M., Izuogu, O. G., Donaldson, S., Mohanan, S., Hourlier, T., Grego, T., Hunt, T., Flicek, P., Wright, J., Choudhary, J. S., Lagarde, J., Carbonell Sala, S., Guigó, R., Pozo, F., Martínez, L., Tress, M. L., Di Domenico, T., Muir, P., Uszczynska-Ratajczak, B., Paten, B., Fiddes, I. T., Armstrong, J., Diekhans, M., Hubbard, T. J. P., Reymond, A., Ferreira, A.-M., Chrast, J., Johnson, R., Jungreis, I., Kellis, M., Pei, B., Navarro, F. C.
P., Xu, J., Zhang, Y., Gerstein, M., and Sisu, C. (2018). GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Research, 47(D1):D766–D773. Freund, A. (2019). Untangling Aging Using Dynamic, Organism-Level Phenotypic Networks. Cell Systems, 8(3):172–181. Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1):1–22. Froimchuk, E., Jang, Y., and Ge, K. (2017). Histone H3 lysine 4 methyltransferase KMT2D. Gene, 627:337–342. Frommer, M., McDonald, L. E., Millar, D. S., Collis, C. M., Watt, F., Grigg, G. W., Molloy, P. L., and Paul, C. L. (1992). A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proceedings of the National Academy of Sciences, 89(5):1827–1831. Fumagalli, M. (2013). Assessing the Effect of Sequencing Depth and Sample Size in Population Genetics Inferences. PLOS ONE, 8(11):e79667. Gagnon-Bartsch, J. A. and Speed, T. P. (2012). Using control genes to correct for unwanted variation in microarray data. Biostatistics, 13(3):539–552. Galkin, F., Aliper, A., Putin, E., Kuznetsov, I., Gladyshev, V. N., and Zhavoronkov, A. (2018). Human microbiome aging clocks based on deep learning and tandem of permutation feature importance and accumulated local effects. bioRxiv, page 507780. Gao, Y., Gan, H., Lou, Z., and Zhang, Z. (2018). Asf1a resolves bivalent chromatin domains for the induction of lineage-specific genes during mouse embryonic stem cell differentiation. Proceedings of the National Academy of Sciences, 115(27):E6162–E6171. Garagnani, P., Bacalini, M. G., Pirazzini, C., Gori, D., Giuliani, C., Mari, D., Di Blasio, A. M., Gentilini, D., Vitale, G., Collino, S., Rezzi, S., Castellani, G., Capri, M., Salvioli, S., and Franceschi, C. (2012). Methylation of ELOVL2 gene as a new epigenetic marker of age. Aging Cell, 11(6):1132–1134. Gems, D. (2015).
The aging-disease false dichotomy: understanding senescence as pathology. Frontiers in Genetics, 6(June):212. Gilbert, S. F. (2011). Commentary: ‘The Epigenotype’ by C.H. Waddington. International Journal of Epidemiology, 41(1):20–23. Gompertz, B. (1825). On the Nature of the Function Expressive of the Law of Human Mortality, and on a New Mode of Determining the Value of Life Contingencies. Philosophical Transactions of the Royal Society of London, 115:513–583. Gontier, G., Iyer, M., Shea, J. M., Bieri, G., Wheatley, E. G., Ramalho-Santos, M., and Villeda, S. A. (2018). Tet2 Rescues Age-Related Regenerative Decline and Enhances Cognitive Function in the Adult Mouse Brain. Cell Reports, 22(8):1974–1981. Gopalan, S., Carja, O., Fagny, M., Patin, E., Myrick, J. W., McEwen, L. M., Mah, S. M., Kobor, M. S., Froment, A., Feldman, M. W., Quintana-Murci, L., and Henn, B. M. (2017). Trends in DNA Methylation with Age Replicate Across Diverse Human Populations. Genetics, 206(3):1659–1674. Grafodatskaya, D., Chung, B. H. Y., Butcher, D. T., Turinsky, A. L., Goodman, S. J., Choufani, S., Chen, Y.-A., Lou, Y., Zhao, C., Rajendram, R., Abidi, F. E., Skinner, C., Stavropoulos, J., Bondy, C. A., Hamilton, J., Wodak, S., Scherer, S. W., Schwartz, C. E., and Weksberg, R. (2013). Multilocus loss of DNA methylation in individuals with mutations in the histone H3 Lysine 4 Demethylase KDM5C. BMC Medical Genomics, 6(1):1. Greally, J. M. (2018). A user’s guide to the ambiguous word ‘epigenetics’. Nature Reviews Molecular Cell Biology, 19:207–208. Greer, E. L. and Brunet, A. (2008). Signaling networks in aging. Journal of Cell Science, 121:407–412. Greer, E. L., Oskoui, P. R., Banko, M. R., Maniar, J. M., Gygi, M. P., Gygi, S. P., and Brunet, A. (2007). The energy sensor AMP-activated protein kinase directly regulates the mammalian FOXO3 transcription factor. Journal of Biological Chemistry, 282:30107–30119.
Grönniger, E., Weber, B., Heil, O., Peters, N., Stäb, F., Wenck, H., Korn, B., Winnefeld, M., and Lyko, F. (2010). Aging and Chronic Sun Exposure Cause Distinct Epigenetic Changes in Human Skin. PLOS Genetics, 6(5):e1000971. Gu, H., Bock, C., Mikkelsen, T. S., Jäger, N., Smith, Z. D., Tomazou, E., Gnirke, A., Lander, E. S., and Meissner, A. (2010). Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nature Methods, 7(2):133–136. Guarente, L. and Kenyon, C. (2000). Genetic pathways that regulate ageing in model organisms. Nature, 408(6809):255–262. Halfmann, R. and Lindquist, S. (2010). Epigenetics in the Extreme: Prions and the Inheritance of Environmentally Acquired Traits. Science, 330(6004):629–632. Hanna, C. W., Peñaherrera, M. S., Saadeh, H., Andrews, S., McFadden, D. E., Kelsey, G., and Robinson, W. P. (2016). Pervasive polymorphic imprinted methylation in the human placenta. Genome Research, 26:756–767. Hannum, G., Guinney, J., Zhao, L., Zhang, L., Hughes, G., and Sadda, S. (2013). Genome-wide methylation profiles reveal quantitative views of human aging rates. Molecular Cell, 49(2):359–367. Harrison, D. E., Strong, R., Sharp, Z. D., Nelson, J. F., Astle, C. M., Flurkey, K., Nadon, N. L., Wilkinson, J. E., Frenkel, K., Carter, C. S., Pahor, M., Javors, M. A., Fernandez, E., and Miller, R. A. (2009). Rapamycin fed late in life extends lifespan in genetically heterogeneous mice. Nature, 460(7253):392–395. Harrow, J., Frankish, A., Gonzalez, J. M., Tapanari, E., Diekhans, M., Kokocinski, F., Aken, B. L., Barrell, D., Zadissa, A., Searle, S., Barnes, I., Bignell, A., Boychenko, V., Hunt, T., Kay, M., Mukherjee, G., Rajan, J., Despacio-Reyes, G., Saunders, G., Steward, C., Harte, R., Lin, M., Howald, C., Tanzer, A., Derrien, T., Chrast, J., Walters, N., Balasubramanian, S., Pei, B., Tress, M., Rodriguez, J.
M., Ezkurdia, I., Van Baren, J., Brent, M., Haussler, D., Kellis, M., Valencia, A., Reymond, A., Gerstein, M., Guigó, R., and Hubbard, T. J. (2012). GENCODE: The reference human genome annotation for the ENCODE project. Genome Research, 22:1760–1774. Hayflick, L. (1998). A Brief History of the Mortality and Immortality of Cultured Cells. The Keio Journal of Medicine, 47(3):174–182. Hayflick, L. (2007a). Biological aging is no longer an unsolved problem. In Annals of the New York Academy of Sciences, volume 1100, pages 1–13. Hayflick, L. (2007b). Entropy Explains Aging, Genetic Determinism Explains Longevity, and Undefined Terminology Explains Misunderstanding Both. PLOS Genetics, 3(12):e220. Hayflick, L. and Moorhead, P. S. (1961). The serial cultivation of human diploid cell strains. Experimental Cell Research, 25(3):585–621. He, Y. and Ecker, J. R. (2015). Non-CG Methylation in the Human Genome. Annual Review of Genomics and Human Genetics, 16(1):55–77. Hernando-Herraez, I., Evano, B., Stubbs, T., Commere, P.-H., Clark, S., Andrews, S., Tajbakhsh, S., and Reik, W. (2018). Ageing affects DNA methylation drift and transcriptional cell-to-cell variability in muscle stem cells. bioRxiv, page 500900. Herranz, N. and Gil, J. (2018). Mechanisms and functions of cellular senescence. The Journal of Clinical Investigation, 128(4):1238–1246. Hertel, J., Friedrich, N., Wittfeld, K., Pietzner, M., Budde, K., Van der Auwera, S., Lohmann, T., Teumer, A., Völzke, H., Nauck, M., and Grabe, H. J. (2016). Measuring Biological Age via Metabonomics: The Metabolic Age Score. Journal of Proteome Research, 15(2):400–410. Heyn, H., Li, N., Ferreira, H. J., Moran, S., Pisano, D. G., and Gomez, A. (2012). Distinct DNA methylomes of newborns and centenarians. Proceedings of the National Academy of Sciences, 109(26):10522–10527. Heyn, P., Logan, C. V., Fluteau, A., Challis, R. C., Auchynnikava, T., Martin, C.-A., Marsh, J. A., Taglini, F., Kilanowski, F., Parry, D.
A., Cormier-Daire, V., Fong, C.-T., Gibson, K., Hwa, V., Ibáñez, L., Robertson, S. P., Sebastiani, G., Rappsilber, J., Allshire, R. C., Reijns, M. A. M., Dauber, A., Sproul, D., and Jackson, A. P. (2019). Gain-of-function DNMT3A mutations cause microcephalic dwarfism and hypermethylation of Polycomb-regulated regions. Nature Genetics, 51(1):96–105. Hodges, E., Smith, A. D., Kendall, J., Xuan, Z., Ravi, K., Rooks, M., Zhang, M. Q., Ye, K., Bhattacharjee, A., Brizuela, L., McCombie, W. R., Wigler, M., Hannon, G. J., and Hicks, J. B. (2009). High definition profiling of mammalian DNA methylation by array capture and single molecule bisulfite sequencing. Genome Research, 19:1593–1605. Holliday, R. and Pugh, J. E. (1975). DNA modification mechanisms and gene activity during development. Science, 187(4173):226–232. Hon, G., Song, C.-X., Du, T., Jin, F., Selvaraj, S., Lee, A., Yen, C.-a., Ye, Z., Mao, S.-Q., Wang, B.-A., Kuan, S., Edsall, L., Zhao, B., Xu, G.-L., He, C., and Ren, B. (2014). 5mC Oxidation by Tet2 Modulates Enhancer Activity and Timing of Transcriptome Reprogramming during Differentiation. Molecular Cell, 56(2):286–297. Hood, R. L., Schenkel, L. C., Nikkel, S. M., Ainsworth, P. J., Pare, G., Boycott, K. M., Bulman, D. E., and Sadikovic, B. (2016). The defining DNA methylation signature of Floating-Harbor Syndrome. Scientific Reports, 6:38803. Horvath, S. (2013a). DNA methylation age of human tissues and cell types. Genome Biology, 14(10):3156. Horvath, S. (2013b). DNAmAge online calculator: https://dnamage.genetics.ucla.edu/home. Horvath, S. (2013c). FAQs DNAmAge online calculator: https://horvath.genetics.ucla.edu/html/dnamage/faq.htm#_Toc385147421. Horvath, S. (2015). Erratum to: DNA methylation age of human tissues and cell types. Genome Biology, 16(1):96. Horvath, S., Erhart, W., Brosch, M., Ammerpohl, O., von Schönfels, W., Ahrens, M., Heits, N., Bell, J. T., Tsai, P.-C., Spector, T. 
D., Deloukas, P., Siebert, R., Sipos, B., Becker, T., Röcken, C., Schafmayer, C., and Hampe, J. (2014). Obesity accelerates epigenetic aging of human liver. Proceedings of the National Academy of Sciences, page 201412759. Horvath, S., Garagnani, P., Bacalini, M. G., Pirazzini, C., Salvioli, S., Gentilini, D., Di Blasio, A. M., Giuliani, C., Tung, S., Vinters, H. V., and Franceschi, C. (2015a). Accelerated epigenetic aging in Down syndrome. Aging Cell, 14(3):491–495. Horvath, S., Gurven, M., Levine, M. E., Trumble, B. C., Kaplan, H., Allayee, H., Ritz, B. R., Chen, B., Lu, A. T., Rickabaugh, T. M., Jamieson, B. D., Sun, D., Li, S., Chen, W., Quintana-Murci, L., Fagny, M., Kobor, M. S., Tsao, P. S., Reiner, A. P., Edlefsen, K. L., Absher, D., and Assimes, T. L. (2016a). An epigenetic clock analysis of race/ethnicity, sex, and coronary heart disease. Genome Biology, 17(1):171. Horvath, S., Langfelder, P., Kwak, S., Aaronson, J., Rosinski, J., Vogt, T. F., Eszes, M., Faull, R. L., Curtis, M. A., Waldvogel, H. J., Choi, O. W., Tung, S., Vinters, H. V., Coppola, G., and Yang, X. W. (2016b). Huntington’s disease accelerates epigenetic aging of human brain and disrupts DNA methylation levels. Aging, 8(7):1485–1512. Horvath, S. and Levine, A. J. (2015). HIV-1 Infection Accelerates Age According to the Epigenetic Clock. The Journal of Infectious Diseases, 212(10):1563–1573. Horvath, S., Mah, V., Lu, A. T., Woo, J. S., Choi, O.-W., Jasinska, A. J., Riancho, J. A., Tung, S., Coles, N. S., Braun, J., Vinters, H. V., and Coles, L. S. (2015b). The cerebellum ages slowly according to the epigenetic clock. Aging, 7(5):294–306. Horvath, S., Oshima, J., Martin, G. M., Lu, A. T., Quach, A., Cohen, H., Felton, S., Matsuyama, M., Lowe, D., Kabacik, S., Wilson, J. G., Reiner, A. P., Maierhofer, A., Flunkert, J., Aviv, A., Hou, L., Baccarelli, A. A., Li, Y., Stewart, J. D., Whitsel, E. A., Ferrucci, L., Matsuyama, S., and Raj, K. (2018).
Epigenetic clock for skin and blood cells applied to Hutchinson Gilford Progeria Syndrome and ex vivo studies. Aging, 10(7):1758–1775. Horvath, S. and Raj, K. (2018). DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nature Reviews Genetics, 19(6):371–384. Horvath, S., Zhang, Y., Langfelder, P., Kahn, R. S., Boks, M. P., and Van Eijk, K. (2012). Aging effects on DNA methylation modules in human brain and blood tissue. Genome Biology, 13:R97. Hoshino, A., Horvath, S., Sridhar, A., Chitsazan, A., and Reh, T. A. (2019). Synchrony and asynchrony between an epigenetic clock and developmental timing. Scientific Reports, 9(1):3770. Houseman, E. A., Accomando, W. P., Koestler, D. C., Christensen, B. C., Marsit, C. J., Nelson, H. H., Wiencke, J. K., and Kelsey, K. T. (2012). DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics, 13:86. Hsu, A.-L., Murphy, C. T., and Kenyon, C. (2003). Regulation of Aging and Age-Related Disease by DAF-16 and Heat-Shock Factor. Science, 300(5622):1142–1145. Huang, H., Weng, H., Zhou, K., Wu, T., Zhao, B. S., Sun, M., Chen, Z., Deng, X., Xiao, G., Auer, F., Klemm, L., Wu, H., Zuo, Z., Qin, X., Dong, Y., Zhou, Y., Qin, H., Tao, S., Du, J., Liu, J., Lu, Z., Yin, H., Mesquita, A., Yuan, C. L., Hu, Y.-C., Sun, W., Su, R., Dong, L., Shen, C., Li, C., Qing, Y., Jiang, X., Wu, X., Sun, M., Guan, J.-L., Qu, L., Wei, M., Müschen, M., Huang, G., He, C., Yang, J., and Chen, J. (2019). Histone H3 trimethylation at lysine 36 guides m6A RNA modification co-transcriptionally. Nature, 567(7748):414–419. Huang, X., Lu, H., Wang, J.-W., Xu, L., Liu, S., Sun, J., and Gao, F. (2013). High-throughput sequencing of methylated cytosine enriched by modification-dependent restriction endonuclease MspJI. BMC Genetics, 14(1):56. Huh, C. J., Zhang, B., Victor, M. B., Dahiya, S., Batista, L. F. Z., Horvath, S., and Yoo, A. S. (2016).
Maintenance of age in human neurons generated by microRNA-based neuronal conversion of fibroblasts. eLife, 5:e18648. Illumina (2010). GenomeStudio® Methylation Module v1.8 User Guide. Technical report. Illumina (2015). Infinium® HD Assay Methylation Protocol Guide. Technical report. Irvin, M. R., Aslibekyan, S., Do, A., Zhi, D., Hidalgo, B., Claas, S. A., Srinivasasainagendra, V., Horvath, S., Tiwari, H. K., Absher, D. M., and Arnett, D. K. (2018). Metabolic and inflammatory biomarkers are associated with epigenetic aging acceleration estimates in the GOLDN study. Clinical Epigenetics, 10(1):56. Ito, S., D’Alessio, A. C., Taranova, O. V., Hong, K., Sowers, L. C., and Zhang, Y. (2010). Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature, 466:1129–1133. Iurlaro, M., von Meyenn, F., and Reik, W. (2017). DNA methylation homeostasis in human and mouse development. Current Opinion in Genetics & Development, 43:101–109. Ivanov, M., Kals, M., Kacevska, M., Metspalu, A., Ingelman-Sundberg, M., and Milani, L. (2013). In-solution hybrid capture of bisulfite-converted DNA for targeted bisulfite sequencing of 174 ADME genes. Nucleic Acids Research, 41(6):e72. Jaffe, A. E. (2018). FlowSorted.Blood.450k Bioconductor Package. Jaffe, A. E. and Irizarry, R. A. (2014). Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biology, 15(2):R31. Jeffries, A. R., Maroofian, R., Salter, C. G., Chioza, B. A., Cross, H. E., Patton, M. A., Temple, I. K., Mackay, D., Rezwan, F. I., Aksglaede, L., Baralle, D., Dabir, T., Hunter, M. F., Kamath, A., Kumar, A., Newbury-Ecob, R., Selicorni, A., Springer, A., van Maldergem, L., Varghese, V., Yachelevich, N., Tatton-Brown, K., Mill, J., Crosby, A. H., and Baple, E. (2018). Growth disrupting mutations in epigenetic regulatory molecules are associated with abnormalities of epigenetic aging. bioRxiv, page 477356. 
Jenkinson, G., Pujadas, E., Goutsias, J., and Feinberg, A. P. (2017). Potential energy landscapes identify the information-theoretic nature of the epigenome. Nature Genetics, 49:719–729. Jensen, A. B., Moseley, P. L., Oprea, T. I., Ellesøe, S. G., Eriksson, R., Schmock, H., Jensen, P. B., Jensen, L. J., and Brunak, S. (2014). Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nature Communications, 5:4022. Jeong, M., Sun, D., Luo, M., Huang, Y., Challen, G. A., Rodriguez, B., Zhang, X., Chavez, L., Wang, H., Hannah, R., Kim, S.-B., Yang, L., Ko, M., Chen, R., Göttgens, B., Lee, J.-S., Gunaratne, P., Godley, L. A., Darlington, G. J., Rao, A., Li, W., and Goodell, M. A. (2013). Large conserved domains of low DNA methylation maintained by Dnmt3a. Nature Genetics, 46:17–23. Jeziorska, D. M., Murray, R. J. S., De Gobbi, M., Gaentzsch, R., Garrick, D., Ayyub, H., Chen, T., Li, E., Telenius, J., Lynch, M., Graham, B., Smith, A. J. H., Lund, J. N., Hughes, J. R., Higgs, D. R., and Tufarelli, C. (2017). DNA methylation of intragenic CpG islands depends on their transcriptional activity during differentiation and disease. Proceedings of the National Academy of Sciences, 114(36):E7526–E7535. Johnson, T. E. (2013). 25 years after age-1: Genes, interventions and the revolution in aging research. Experimental Gerontology, 48(7):640–643. Jones, O. R., Scheuerlein, A., Salguero-Gómez, R., Camarda, C. G., Schaible, R., Casper, B. B., Dahlgren, J. P., Ehrlén, J., García, M. B., Menges, E. S., Quintana-Ascencio, P. F., Caswell, H., Baudisch, A., and Vaupel, J. W. (2013). Diversity of ageing across the tree of life. Nature, 505:169. Jylhävä, J., Pedersen, N. L., and Hägg, S. (2017). Biological Age Predictors. EBioMedicine, 21:29–36. Kacmarczyk, T. J., Fall, M. P., Zhang, X., Xin, Y., Li, Y., Alonso, A., and Betel, D. (2018). “Same difference”: comprehensive evaluation of four DNA methylation measurement platforms.
Epigenetics & Chromatin, 11(1):21. Kanfi, Y., Naiman, S., Amir, G., Peshti, V., Zinman, G., Nahum, L., Bar-Joseph, Z., and Cohen, H. Y. (2012). The sirtuin SIRT6 regulates lifespan in male mice. Nature, 483:218–221. Kaplanis, J., Gordon, A., Shor, T., Weissbrod, O., Geiger, D., Wahl, M., Gershovits, M., Markus, B., Sheikh, M., Gymrek, M., Bhatia, G., MacArthur, D. G., Price, A. L., and Erlich, Y. (2018). Quantitative analysis of population-scale family trees with millions of relatives. Science, 360(6385):171–175. Kapourani, C.-A. and Sanguinetti, G. (2019). Melissa: Bayesian clustering and imputation of single-cell methylomes. Genome Biology, 20(1):61. Kawakatsu, T., Huang, S.-s. C., Jupe, F., Sasaki, E., Schmitz, R. J., Urich, M. A., Castanon, R., Nery, J. R., Barragan, C., He, Y., Chen, H., Dubin, M., Lee, C.-R., Wang, C., Bemm, F., Becker, C., O’Neil, R., O’Malley, R. C., Quarless, D. X., Alonso-Blanco, C., Andrade, J., Becker, C., Bemm, F., Bergelson, J., Borgwardt, K., Chae, E., Dezwaan, T., Ding, W., Ecker, J. R., Expósito-Alonso, M., Farlow, A., Fitz, J., Gan, X., Grimm, D. G., Hancock, A., Henz, S. R., Holm, S., Horton, M., Jarsulic, M., Kerstetter, R. A., Korte, A., Korte, P., Lanz, C., Lee, C.-R., Meng, D., Michael, T. P., Mott, R., Muliyati, N. W., Nägele, T., Nagler, M., Nizhynska, V., Nordborg, M., Novikova, P., Picó, F. X., Platzer, A., Rabanal, F. A., Rodriguez, A., Rowan, B. A., Salomé, P. A., Schmid, K., Schmitz, R. J., Seren, Ü., Sperone, F. G., Sudkamp, M., Svardal, H., Tanzer, M. M., Todd, D., Volchenboum, S. L., Wang, C., Wang, G., Wang, X., Weckwerth, W., Weigel, D., Zhou, X., Schork, N. J., Weigel, D., Nordborg, M., and Ecker, J. R. (2016). Epigenomic Diversity in a Global Collection of Arabidopsis thaliana Accessions. Cell, 166(2):492–505. Kelsey, G., Stegle, O., and Reik, W. (2017). Single-cell epigenomics: Recording the past and predicting the future. Science, 358(6359):69–75. Kenyon, C. (2005).
The plasticity of aging: Insights from long-lived mutants. Cell, 120(4):449–460. Kenyon, C., Chang, J., Gensch, E., Rudner, A., and Tabtiang, R. (1993). A C. elegans mutant that lives twice as long as wild type. Nature, 366(6454):461–464. Kenyon, C. J. (2010). The genetics of ageing. Nature, 464(7288):504–512. Kernohan, K. D., Cigana Schenkel, L., Huang, L., Smith, A., Pare, G., Ainsworth, P., Boycott, K. M., Warman-Chardon, J., Sadikovic, B., and Consortium, C. C. (2016). Identification of a methylation profile for DNMT1-associated autosomal dominant cerebellar ataxia, deafness, and narcolepsy. Clinical Epigenetics, 8(1):91. Khan, S. S., Singer, B. D., and Vaughan, D. E. (2017). Molecular and physiological manifestations and measurement of aging in humans. Aging Cell, 16(4):624–633. Kierkegaard, S. (1843). Journals. Kirkland, J. L., Tchkonia, T., Zhu, Y., Niedernhofer, L. J., and Robbins, P. D. (2017). The Clinical Potential of Senolytic Drugs. Journal of the American Geriatrics Society, 65(10):2297–2301. Kirkwood, T. B. and Rose, M. R. (1991). Evolution of senescence: late survival sacrificed for reproduction. Philosophical Transactions - Royal Society of London, B, 332(1262):15–24. Kirkwood, T. B. L. (1977). Evolution of ageing. Nature, 270(5635):301–304. Kirschner, S. A., Hunewald, O., Mériaux, S. B., Brunnhoefer, R., Muller, C. P., and Turner, J. D. (2016). Focussing reduced representation CpG sequencing through judicious restriction enzyme choice. Genomics, 107(4):109–119. Klass, M. and Hirsh, D. (1976). Non-ageing developmental variant of Caenorhabditis elegans. Nature, 260(5551):523–525. Koch, C. M. and Wagner, W. (2011). Epigenetic-aging-signature to determine age in different tissues. Aging, 3(10):1018–1027. Koestler, D. C., Jones, M. J., Usset, J., Christensen, B. C., Butler, R. A., Kobor, M. S., Wiencke, J. K., and Kelsey, K. T. (2016). Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL).
BMC Bioinformatics, 17:120. Komori, H. K., LaMere, S. A., Torkamani, A., Hart, G. T., Kotsopoulos, S., Warner, J., Samuels, M. L., Olson, J., Head, S. R., Ordoukhanian, P., Lee, P. L., Link, D. R., and Salomon, D. R. (2011). Application of microdroplet PCR for large-scale targeted bisulfite sequencing. Genome Research, 21(10):1738–1745. Kontis, V., Bennett, J. E., Mathers, C. D., Li, G., Foreman, K., and Ezzati, M. (2017). Future life expectancy in 35 industrialised countries: projections with a Bayesian model ensemble. The Lancet, 389(10076):1323–1335. Kresovich, J. K., Xu, Z., O’Brien, K. M., Weinberg, C. R., Sandler, D. P., and Taylor, J. A. (2019). Methylation-Based Biological Age and Breast Cancer Risk. JNCI: Journal of the National Cancer Institute, page djz020. Kriaucionis, S. and Heintz, N. (2009). The Nuclear DNA Base 5-Hydroxymethylcytosine Is Present in Purkinje Neurons and the Brain. Science, 324(5929):929–930. Kriukienė, E., Labrie, V., Khare, T., Urbanavičiūtė, G., Lapinaitė, A., Koncevičius, K., Li, D., Wang, T., Pai, S., Ptak, C., Gordevičius, J., Wang, S.-C., Petronis, A., and Klimašauskas, S. (2013). DNA unmethylome profiling by covalent capture of CpG sites. Nature Communications, 4:2190. Krueger, F. and Andrews, S. R. (2011). Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics, 27(11):1571–1572. Kucab, J. E., Zou, X., Morganella, S., Joel, M., Nanda, A. S., Nagy, E., Gomez, C., Degasperi, A., Harris, R., Jackson, S. P., Arlt, V. M., Phillips, D. H., and Nik-Zainal, S. (2019). A Compendium of Mutational Signatures of Environmental Agents. Cell, 177(4):821–836.e16. Kudithipudi, S., Lungu, C., Rathert, P., Happel, N., and Jeltsch, A. (2014). Substrate Specificity Analysis and Novel Substrates of the Protein Lysine Methyltransferase NSD1. Chemistry & Biology, 21(2):226–237. Kuhn, R. M., Haussler, D., and Kent, W. J. (2012). The UCSC genome browser and associated tools.
Briefings in Bioinformatics, 14(2):144–161. Kuranda, K., Vargaftig, J., de la Rochere, P., Dosquet, C., Charron, D., Bardin, F., Tonnelle, C., Bonnet, D., and Goodhardt, M. (2011). Age-related changes in human hematopoietic stem/progenitor cells. Aging Cell, 10(3):542–546. Kurdyukov, S. and Bullock, M. (2016). DNA Methylation Analysis: Choosing the Right Method. Biology, 5(1):3. Kurotaki, N., Imaizumi, K., Harada, N., Masuno, M., Kondoh, T., Nagai, T., Ohashi, H., Naritomi, K., Tsukahara, M., Makita, Y., Sugimoto, T., Sonoda, T., Hasegawa, T., Chinen, Y., Tomita, H.-a., Kinoshita, A., Mizuguchi, T., Yoshiura, K.-i., Ohta, T., Kishino, T., Fukushima, Y., Niikawa, N., and Matsumoto, N. (2002). Haploinsufficiency of NSD1 causes Sotos syndrome. Nature Genetics, 30:365–366. Lappalainen, T. and Greally, J. M. (2017). Associating cellular epigenetic models with human phenotypes. Nature Reviews Genetics, 18:441–451. Larsson, N.-G. (2010). Somatic Mitochondrial DNA Mutations in Mammalian Aging. Annual Review of Biochemistry, 79(1):683–706. Lawrence, M., Daujat, S., and Schneider, R. (2016). Lateral Thinking: How Histone Modifications Regulate Gene Expression. Trends in Genetics, 32(1):42–56. Lee, Y. K., Jin, S., Duan, S., Lim, Y. C., Ng, D. P. Y., Lin, X. M., Yeo, G. S. H., and Ding, C. (2014). Improved reduced representation bisulfite sequencing for epigenomic profiling of clinical samples. Biological Procedures Online, 16(1):1. Leek, J. T., Scharpf, R. B., Bravo, H. C., Simcha, D., Langmead, B., Johnson, W. E., Geman, D., Baggerly, K., and Irizarry, R. A. (2010). Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics, 11:733–739. Lev Maor, G., Yearim, A., and Ast, G. (2015). The alternative role of DNA methylation in splicing regulation. Trends in Genetics, 31(5):274–280. Leventopoulos, G., Kitsiou-Tzeli, S., Kritikos, K., Psoni, S., Mavrou, A., Kanavakis, E., and Fryssira, H. (2009).
A Clinical Study of Sotos Syndrome Patients With Review of the Literature. Pediatric Neurology, 40(5):357–364. Levine, M. E., Lu, A. T., Chen, B. H., Hernandez, D. G., Singleton, A. B., Ferrucci, L., Bandinelli, S., Salfati, E., Manson, J. E., Quach, A., Kusters, C. D. J., Kuh, D., Wong, A., Teschendorff, A. E., Widschwendter, M., Ritz, B. R., Absher, D., Assimes, T. L., and Horvath, S. (2016). Menopause accelerates biological aging. Proceedings of the National Academy of Sciences, 113(33):9327–9332. Levine, M. E., Lu, A. T., Quach, A., Chen, B. H., Assimes, T. L., Bandinelli, S., Hou, L., Baccarelli, A. A., Stewart, J. D., Li, Y., Whitsel, E. A., Wilson, J. G., Reiner, A. P., Aviv, A., Lohman, K., Liu, Y., Ferrucci, L., and Horvath, S. (2018). An epigenetic biomarker of aging for lifespan and healthspan. Aging, 10(4):573–591. Lezzerini, M. and Budovskaya, Y. (2014). A dual role of the Wnt signaling pathway during aging in Caenorhabditis elegans. Aging Cell, 13(1):8–18. Li, E. and Zhang, Y. (2014). DNA methylation in mammals. Cold Spring Harbor Perspectives in Biology, 6(5):a019133. Li, H., Liefke, R., Jiang, J., Kurland, J. V., Tian, W., Deng, P., Zhang, W., He, Q., Patel, D. J., Bulyk, M. L., Shi, Y., and Wang, Z. (2017). Polycomb-like proteins link the PRC2 complex to CpG islands. Nature, 549(7671):287–291. Li, Y., Zheng, H., Wang, Q., Zhou, C., Wei, L., Liu, X., Zhang, W., Zhang, Y., Du, Z., Wang, X., and Xie, W. (2018). Genome-wide analyses reveal a role of Polycomb in promoting hypomethylation of DNA methylation valleys. Genome Biology, 19(1):18. Lim, Y. C., Chia, S. Y., Jin, S., Han, W., Ding, C., and Sun, L. (2016). Dynamic DNA methylation landscape defines brown and white cell specificity during adipogenesis. Molecular Metabolism, 5(10):1033–1041. Lin, K., Hsin, H., Libina, N., and Kenyon, C. (2001). Regulation of the Caenorhabditis elegans longevity protein DAF-16 by insulin/IGF-1 and germline signaling. Nature Genetics, 28:139–145. Liu, J.
and Siegmund, K. D. (2016). An evaluation of processing methods for HumanMethylation450 BeadChip data. BMC Genomics, 17(1):469. Liu, X. S., Wu, H., Ji, X., Stelzer, Y., Wu, X., Czauderna, S., Shu, J., Dadon, D., Young, R. A., and Jaenisch, R. (2016). Editing DNA Methylation in the Mammalian Genome. Cell, 167(1):233–247.e17. Liu, Y., Aryee, M. J., Padyukov, L., Fallin, M. D., Hesselberg, E., Runarsson, A., Reinius, L., Acevedo, N., Taub, M., Ronninger, M., Shchetynsky, K., Scheynius, A., Kere, J., Alfredsson, L., Klareskog, L., Ekström, T. J., and Feinberg, A. P. (2013). Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nature Biotechnology, 31:142–147. Liu, Y., Siejka-Zielińska, P., Velikova, G., Bi, Y., Yuan, F., Tomkova, M., Bai, C., Chen, L., Schuster-Böckler, B., and Song, C.-X. (2019). Bisulfite-free direct detection of 5-methylcytosine and 5-hydroxymethylcytosine at base resolution. Nature Biotechnology, 37(4):424–429. Long, H. K., Sims, D., Heger, A., Blackledge, N. P., Kutter, C., Wright, M. L., Grützner, F., Odom, D. T., Patient, R., Ponting, C. P., and Klose, R. J. (2013). Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates. eLife, 2:e00348. Lopez-Otin, C., Blasco, M. A., Partridge, L., Serrano, M., and Kroemer, G. (2013). The hallmarks of aging. Cell, 153(6):1194–1217. Lowe, D., Horvath, S., and Raj, K. (2016). Epigenetic clock analyses of cellular senescence and ageing. Oncotarget, 7(8):8524–8531. Lowe, R., Barton, C., Jenkins, C. A., Ernst, C., Forman, O., Fernandez-Twinn, D. S., Bock, C., Rossiter, S. J., Faulkes, C. G., Ozanne, S. E., Walter, L., Odom, D. T., Mellersh, C., and Rakyan, V. K. (2018). Ageing-associated DNA methylation dynamics are a molecular readout of lifespan variation among mammalian species. Genome Biology, 19(1):22. Lu, A. T., Hannon, E., Levine, M. E., Hao, K., Crimmins, E.
M., Lunnon, K., Kozlenkov, A., Mill, J., Dracheva, S., and Horvath, S. (2016). Genetic variants near MLST8 and DHX57 affect the epigenetic age of the cerebellum. Nature Communications, 7:10561. Lu, A. T., Quach, A., Wilson, J. G., Reiner, A. P., Aviv, A., Raj, K., Hou, L., Baccarelli, A. A., Li, Y., Stewart, J. D., Whitsel, E. A., Assimes, T. L., Ferrucci, L., and Horvath, S. (2019). DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging, 11(2):303–327. Lu, A. T., Xue, L., Salfati, E. L., Chen, B. H., Ferrucci, L., Levy, D., Joehanes, R., Murabito, J. M., Kiel, D. P., Tsai, P.-C., Yet, I., Bell, J. T., Mangino, M., Tanaka, T., McRae, A. F., Marioni, R. E., Visscher, P. M., Wray, N. R., Deary, I. J., Levine, M. E., Quach, A., Assimes, T., Tsao, P. S., Absher, D., Stewart, J. D., Li, Y., Reiner, A. P., Hou, L., Baccarelli, A. A., Whitsel, E. A., Aviv, A., Cardona, A., Day, F. R., Wareham, N. J., Perry, J. R. B., Ong, K. K., Raj, K., Lunetta, K. L., and Horvath, S. (2018). GWAS of epigenetic aging rates in blood reveals a critical role for TERT. Nature Communications, 9(1):387. Luscan, A., Laurendeau, I., Malan, V., Francannet, C., Odent, S., Giuliano, F., Lacombe, D., Touraine, R., Vidaud, M., Pasmant, E., and Cormier-Daire, V. (2014). Mutations in SETD2 cause a novel overgrowth condition. Journal of Medical Genetics, 51(8):512–517. Lyko, F. (2017). The DNA methyltransferase family: a versatile toolkit for epigenetic regulation. Nature Reviews Genetics, 19:81–92. Machado, A. (1912). Proverbios y cantares XXIX. In Campos de Castilla. Maegawa, S., Hinkal, G., Kim, H. S., Shen, L., Zhang, L., and Zhang, J. (2010). Widespread and tissue specific age-related DNA methylation changes in mice. Genome Research, 20:332–340. Mahmoudi, S., Xu, L., and Brunet, A. (2019). Turning back time with emerging rejuvenation strategies. Nature Cell Biology, 21(1):32–43. Maierhofer, A., Flunkert, J., Oshima, J., Martin, G. M., Haaf, T., and Horvath, S. (2017).
Accelerated epigenetic aging in Werner syndrome. Aging, 9(4):1143–1152. Maksimovic, J., Gordon, L., and Oshlack, A. (2012). SWAN: Subset-quantile Within Array Normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biology, 13(6):1–12. Maksimovic, J., Oshlack, A., Gagnon-Bartsch, J. A., and Speed, T. P. (2015). Removing unwanted variation in a differential methylation analysis of Illumina HumanMethylation450 array data. Nucleic Acids Research, 43(16):e106–e106. Manser, A. R. and Uhrberg, M. (2016). Age-related changes in natural killer cell repertoires: impact on NK cell function and immune surveillance. Cancer Immunology, Immunotherapy, 65(4):417–426. Marioni, R. E., Deary, I. J., Relton, C. L., Suderman, M., Ferrucci, L., Chen, B. H., Horvath, S., Bandinelli, S., Beck, S., Morris, T., Pedersen, N. L., and Hägg, S. (2018). Tracking the Epigenetic Clock Across the Human Life Course: A Meta-analysis of Longitudinal Cohort Data. The Journals of Gerontology: Series A, 74(1):57–61. Marioni, R. E., Shah, S., McRae, A. F., Chen, B. H., Colicino, E., Harris, S. E., Gibson, J., Henders, A. K., Redmond, P., Cox, S. R., Pattie, A., Corley, J., Murphy, L., Martin, N. G., Montgomery, G. W., Feinberg, A. P., Fallin, M. D., Multhaup, M. L., Jaffe, A. E., Joehanes, R., Schwartz, J., Just, A. C., Lunetta, K. L., Murabito, J. M., Starr, J. M., Horvath, S., Baccarelli, A. A., Levy, D., Visscher, P. M., Wray, N. R., and Deary, I. J. (2015). DNA methylation age of blood predicts all-cause mortality in later life. Genome Biology, 16(1):25. Martin-Herranz, D. E. (2019). demh/epigenetic_ageing_clock: Epigenetic ageing clock v1.1.0. GitHub repository: https://github.com/demh/epigenetic_ageing_clock/. Martin-Herranz, D. E., Aref-Eshghi, E., Bonder, M. J., Stubbs, T. M., Choufani, S., Weksberg, R., Stegle, O., Sadikovic, B., Reik, W., and Thornton, J. M. (2019).
Screening for genes that accelerate the epigenetic aging clock in humans reveals a role for the H3K36 methyltransferase NSD1. Genome Biology, 20(1):146. Martin-Herranz, D. E., Ribeiro, A. J., and Stubbs, T. M. (2017a). demh/cuRRBS: cuRRBS V1.0.4. Martin-Herranz, D. E., Ribeiro, A. J. M., Krueger, F., Thornton, J. M., Reik, W., and Stubbs, T. M. (2017b). cuRRBS: simple and robust evaluation of enzyme combinations for reduced representation approaches. Nucleic Acids Research, 45(20):11559–11569. Martin-Montalvo, A., Mercken, E. M., Mitchell, S. J., Palacios, H. H., Mote, P. L., Scheibye-Knudsen, M., Gomes, A. P., Ward, T. M., Minor, R. K., Blouin, M.-J., Schwab, M., Pollak, M., Zhang, Y., Yu, Y., Becker, K. G., Bohr, V. A., Ingram, D. K., Sinclair, D. A., Wolf, N. S., Spindler, S. R., Bernier, M., and de Cabo, R. (2013). Metformin improves healthspan and lifespan in mice. Nature Communications, 4:2192. Martincorena, I., Fowler, J. C., Wabik, A., Lawson, A. R. J., Abascal, F., Hall, M. W. J., Cagan, A., Murai, K., Mahbubani, K., Stratton, M. R., Fitzgerald, R. C., Handford, P. A., Campbell, P. J., Saeb-Parsy, K., and Jones, P. H. (2018). Somatic mutant clones colonize the human esophagus with age. Science, 362(6417):911–917. Martinez-Arguelles, D. B., Lee, S., and Papadopoulos, V. (2014). In silico analysis identifies novel restriction enzyme combinations that expand reduced representation bisulfite sequencing CpG coverage. BMC Research Notes, 7(1):534. Martinez-Jimenez, C. P., Eling, N., Chen, H.-C., Vallejos, C. A., Kolodziejczyk, A. A., Connor, F., Stojic, L., Rayner, T. F., Stubbington, M. J. T., Teichmann, S. A., de la Roche, M., Marioni, J. C., and Odom, D. T. (2017). Aging increases cell-to-cell transcriptional variability upon immune stimulation. Science, 355(6332):1433–1436. Martins, R., Lithgow, G. J., and Link, W. (2016). Long live FOXO: unraveling the role of FOXO proteins in aging and longevity. Aging Cell, 15(2):196–207.
Mathelier, A., Fornes, O., Arenillas, D. J., Chen, C.-y., Denay, G., Lee, J., Shi, W., Shyr, C., Tan, G., Worsley-Hunt, R., Zhang, A. W., Parcy, F., Lenhard, B., Sandelin, A., and Wasserman, W. W. (2015). JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Research, 44(D1):D110–D115. Mattick, J. S., Amaral, P. P., Dinger, M. E., Mercer, T. R., and Mehler, M. F. (2009). RNA regulation of epigenetic processes. BioEssays, 31(1):51–59. Maurano, M. T., Wang, H., John, S., Shafer, A., Canfield, T., Lee, K., and Stamatoyannopoulos, J. A. (2015). Role of DNA Methylation in Modulating Transcription Factor Occupancy. Cell Reports, 12(7):1184–1195. McCay, C. M., Maynard, L. A., and Crowell, M. F. (1935). The Effect of Retarded Growth Upon the Length of Life Span and Upon the Ultimate Body Size: One Figure. The Journal of Nutrition, 10(1):63–79. McDaniel, S. L., Hepperla, A. J., Huang, J., Dronamraju, R., Adams, A. T., Kulkarni, V. G., Davis, I. J., and Strahl, B. D. (2017). H3K36 Methylation Regulates Nutrient Stress Response in Saccharomyces cerevisiae by Enforcing Transcriptional Fidelity. Cell Reports, 19(11):2371–2382. McDonald, R. B. and Ramsey, J. J. (2010). Honoring Clive McCay and 75 Years of Calorie Restriction Research. The Journal of Nutrition, 140(7):1205–1210. McGregor, K., Bernatsky, S., Colmegna, I., Hudson, M., Pastinen, T., Labbe, A., and Greenwood, C. M. T. (2016). An evaluation of methods correcting for cell-type heterogeneity in DNA methylation studies. Genome Biology, 17(1):84. McLaren, W., Gil, L., Hunt, S. E., Riat, H. S., Ritchie, G. R. S., Thormann, A., Flicek, P., and Cunningham, F. (2016). The Ensembl Variant Effect Predictor. Genome Biology, 17(1):122. Medvedev, Z. A. (1990). An attempt at a rational classification of theories of ageing. Biological Reviews, 65:375–398. Meer, M. V., Podolskiy, D. I., Tyshkovskiy, A., and Gladyshev, V. N. (2018).
A whole lifespan mouse multi-tissue DNA methylation clock. eLife, 7:e40675. Meissner, A., Gnirke, A., Bell, G. W., Ramsahoye, B., Lander, E. S., and Jaenisch, R. (2005). Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Research, 33(18):5868–5877. Meissner, A., Mikkelsen, T. S., Gu, H., Wernig, M., Hanna, J., Sivachenko, A., Zhang, X., Bernstein, B. E., Nusbaum, C., Jaffe, D. B., Gnirke, A., Jaenisch, R., and Lander, E. S. (2008). Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature, 454(7205):766–770. Mihaylova, M. M. and Shaw, R. J. (2011). The AMPK signalling pathway coordinates cell growth, autophagy and metabolism. Nature Cell Biology, 13:1016–1023. Milagre, I., Stubbs, T. M., King, M. R., Spindel, J., Santos, F., Krueger, F., Bachman, M., Segonds-Pichon, A., Balasubramanian, S., Andrews, S. R., Dean, W., and Reik, W. (2017). Gender Differences in Global but Not Targeted Demethylation in iPSC Reprogramming. Cell Reports, 18(5):1079–1089. Min, K.-W., Zealy, R. W., Davila, S., Fomin, M., Cummings, J. C., Makowsky, D., Mcdowell, C. H., Thigpen, H., Hafner, M., Kwon, S.-H., Georgescu, C., Wren, J. D., and Yoon, J.-H. (2018). Profiling of m6A RNA modifications identified an age-associated regulation of AGO2 mRNA stability. Aging Cell, 17(3):e12753. Morris, J. Z., Tissenbaum, H. A., and Ruvkun, G. (1996). A phosphatidylinositol-3-OH kinase family member regulating longevity and diapause in Caenorhabditis elegans. Nature, 382(6591):536–539. Morris, K. V. and Mattick, J. S. (2014). The rise of regulatory RNA. Nature Reviews Genetics, 15:423–437. Morris, T. J. and Beck, S. (2015). Analysis pipelines and packages for Infinium HumanMethylation450 BeadChip (450k) data. Methods, 72:3–8. Most, J., Tosti, V., Redman, L. M., and Fontana, L. (2017). Calorie restriction in humans: An update. Ageing Research Reviews, 39:36–45. Mostoslavsky, R., Chua, K. F., Lombard, D. B., Pang, W.
W., Fischer, M. R., Gellon, L., Liu, P., Mostoslavsky, G., Franco, S., Murphy, M. M., Mills, K. D., Patel, P., Hsu, J. T., Hong, A. L., Ford, E., Cheng, H.-L., Kennedy, C., Nunez, N., Bronson, R., Frendewey, D., Auerbach, W., Valenzuela, D., Karow, M., Hottiger, M. O., Hursting, S., Barrett, J. C., Guarente, L., Mulligan, R., Demple, B., Yancopoulos, G. D., and Alt, F. W. (2006). Genomic Instability and Aging-like Phenotype in the Absence of Mammalian SIRT6. Cell, 124(2):315–329. Narasimamurthy, R. and Virshup, D. M. (2017). Molecular Mechanisms Regulating Temperature Compensation of the Circadian Clock. Naumova, N., Smith, E. M., Zhan, Y., and Dekker, J. (2012). Analysis of long-range chromatin interactions using Chromosome Conformation Capture. Methods, 58(3):192–203. Neri, F., Rapelli, S., Krepelova, A., Incarnato, D., Parlato, C., Basile, G., Maldotti, M., Anselmi, F., and Oliviero, S. (2017). Intragenic DNA methylation prevents spurious transcription initiation. Nature, 543(7643):72–77. Newell Stamper, B. L., Cypser, J. R., Kechris, K., Kitzenberg, D. A., Tedesco, P. M., and Johnson, T. E. (2018). Movement decline across lifespan of Caenorhabditis elegans mutants in the insulin/insulin-like signaling pathway. Aging Cell, 17(1):e12704. Newman, A. B. and Sanders, J. L. (2013). Telomere Length in Epidemiology: A Biomarker of Aging, Age-Related Disease, Both, or Neither? Epidemiologic Reviews, 35(1):112–131. Newman, A. M., Liu, C. L., Green, M. R., Gentles, A. J., Feng, W., Xu, Y., Hoang, C. D., Diehn, M., and Alizadeh, A. A. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nature Methods, 12:453–457. Ni, Z., Ebata, A., Alipanahiramandi, E., and Lee, S. S. (2012). Two SET domain containing genes link epigenetic changes and aging in Caenorhabditis elegans. Aging Cell, 11(2):315–325. Nikolich-Žugich, J. (2018). The twilight of immunity: emerging concepts in aging of the immune system. Nature Immunology, 19(1):10–19.
Oberdoerffer, P. and Sinclair, D. A. (2007). The role of nuclear architecture in genomic instability and ageing. Nature Reviews Molecular Cell Biology, 8:692–702. Ocampo, A., Reddy, P., Martinez-Redondo, P., Platero-Luengo, A., Hatanaka, F., Hishida, T., Li, M., Lam, D., Kurita, M., Beyret, E., Araoka, T., Vazquez-Ferrer, E., Donoso, D., Roman, J. L., Xu, J., Rodriguez Esteban, C., Nuñez, G., Nuñez Delicado, E., Campistol, J. M., Guillen, I., Guillen, P., and Izpisua Belmonte, J. C. (2016). In Vivo Amelioration of Age-Associated Hallmarks by Partial Reprogramming. Cell, 167(7):1719–1733.e12. Oh, G., Ebrahimi, S., Carlucci, M., Zhang, A., Nair, A., Groot, D. E., Labrie, V., Jia, P., Oh, E. S., Jeremian, R. H., Susic, M., Shrestha, T. C., Ralph, M. R., Gordevičius, J., Koncevičius, K., and Petronis, A. (2018). Cytosine modifications exhibit circadian oscillations that are involved in epigenetic diversity and aging. Nature Communications, 9(1):644. Oh, G., Koncevičius, K., Ebrahimi, S., Carlucci, M., Groot, D. E., Nair, A., Zhang, A., Kriščiūnas, A., Oh, E. S., Labrie, V., Wong, A. H. C., Gordevičius, J., Jia, P., Susic, M., and Petronis, A. (2019). Circadian oscillations of cytosine modification in humans contribute to epigenetic variability, aging, and complex disease. Genome Biology, 20(1):2. Olova, N., Simpson, D. J., Marioni, R. E., and Chandra, T. (2019). Partial reprogramming induces a steady decline in epigenetic age before loss of somatic identity. Aging Cell, 18(1):e12877. Orr, W. C. (2016). Tightening the connection between transposable element mobilization and aging. Proceedings of the National Academy of Sciences, 113(40):11069–11070. O’Sullivan, R. J. and Karlseder, J. (2010). Telomeres: protecting chromosomes against genome instability. Nature Reviews Molecular Cell Biology, 11:171–181. Ou, H. D., Phan, S., Deerinck, T. J., Thor, A., Ellisman, M. H., and O’Shea, C. C. (2017).
ChromEMT: Visualizing 3D chromatin structure and compaction in interphase and mitotic cells. Science, 357(6349):eaag0025. Pal, S. and Tyler, J. K. (2016). Epigenetics and aging. Science Advances, 2(7):e1600584. Partridge, L., Deelen, J., and Slagboom, P. E. (2018). Facing up to the global challenges of ageing. Nature, 561(7721):45–56. Patalano, S., Hore, T. A., Reik, W., and Sumner, S. (2012). Shifting behaviour: epigenetic reprogramming in eusocial insects. Current Opinion in Cell Biology, 24(3):367–373. Paul, D. S., Guilhamon, P., Karpathakis, A., Butcher, L. M., Thirlwell, C., Feber, A., and Beck, S. (2014). Assessment of raindrop BS-seq as a method for large-scale, targeted bisulfite sequencing. Epigenetics, 9(5):678–684. Penn, N. W., Suwalski, R., O’Riley, C., Bojanowski, K., and Yura, R. (1972). The presence of 5-hydroxymethylcytosine in animal deoxyribonucleic acid. Biochemical Journal, 126(4):781–790. Perna, L., Zhang, Y., Mons, U., Holleczek, B., Saum, K.-U., and Brenner, H. (2016). Epigenetic age acceleration predicts cancer, cardiovascular, and all-cause mortality in a German case cohort. Clinical Epigenetics, 8(1):64. Peters, J. (2014). The role of genomic imprinting in biology and disease: an expanding view. Nature Reviews Genetics, 15:517–530. Peters, M. J., Joehanes, R., Pilling, L. C., Schurmann, C., Conneely, K. N., Powell, J., Reinmaa, E., Sutphin, G. L., Zhernakova, A., Schramm, K., Wilson, Y. A., Kobes, S., Tukiainen, T., NABEC/UKBEC Consortium, Ramos, Y. F., Göring, H. H. H., Fornage, M., Liu, Y., Gharib, S. A., Stranger, B. E., De Jager, P. L., Aviv, A., Levy, D., Murabito, J. M., Munson, P. J., Huan, T., Hofman, A., Uitterlinden, A. G., Rivadeneira, F., van Rooij, J., Stolk, L., Broer, L., Verbiest, M. M. P. J., Jhamai, M., Arp, P., Metspalu, A., Tserel, L., Milani, L., Samani, N. J., Peterson, P., Kasela, S., Codd, V., Peters, A., Ward-Caviness, C. 
K., Herder, C., Waldenberger, M., Roden, M., Singmann, P., Zeilinger, S., Illig, T., Homuth, G., Grabe, H.-J., Völzke, H., Steil, L., Kocher, T., Murray, A., Melzer, D., Yaghootkar, H., Bandinelli, S., Moses, E. K., Kent, J. W., Curran, J. E., Johnson, M. P., Williams-Blangero, S., Westra, H.-J., McRae, A. F., Smith, J. A., Kardia, S. L. R., Hovatta, I., Perola, M., Ripatti, S., Salomaa, V., Henders, A. K., Martin, N. G., Smith, A. K., Mehta, D., Binder, E. B., Nylocks, K. M., Kennedy, E. M., Klengel, T., Ding, J., Suchy-Dicey, A. M., Enquobahrie, D. A., Brody, J., Rotter, J. I., Chen, Y.-D. I., Houwing-Duistermaat, J., Kloppenburg, M., Slagboom, P. E., Helmer, Q., den Hollander, W., Bean, S., Raj, T., Bakhshi, N., Wang, Q. P., Oyston, L. J., Psaty, B. M., Tracy, R. P., Montgomery, G. W., Turner, S. T., Blangero, J., Meulenbelt, I., Ressler, K. J., Yang, J., Franke, L., Kettunen, J., Visscher, P. M., Neely, G. G., Korstanje, R., Hanson, R. L., Prokisch, H., Ferrucci, L., Esko, T., Teumer, A., van Meurs, J. B. J., and Johnson, A. D. (2015). The transcriptional landscape of age in human peripheral blood. Nature Communications, 6:8570. Petkovich, D. A., Podolskiy, D. I., Lobanov, A. V., Lee, S.-G., Miller, R. A., and Gladyshev, V. N. (2017). Using DNA Methylation Profiling to Evaluate Biological Age and Longevity Interventions. Cell Metabolism, 25(4):954–960.e6. Peto, R. and Doll, R. (1997). There is no such thing as aging. BMJ, 315(7115):1030. Pidsley, R., Zotenko, E., Peters, T. J., Lawrence, M. G., Risbridger, G. P., Molloy, P., Van Djik, S., Muhlhausler, B., Stirzaker, C., and Clark, S. J. (2016). Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biology, 17(1):208. Plongthongkum, N., Diep, D. H., and Zhang, K. (2014). Advances in the profiling of DNA modifications: cytosine methylation and beyond. Nature Reviews Genetics, 15(10):647–661. Polanowski, A.
M., Robbins, J., Chandler, D., and Jarman, S. N. (2014). Epigenetic estimation of age in humpback whales. Molecular Ecology Resources, 14(5):976–987. Poulain, M., Herm, A., and Pes, G. (2013). The Blue Zones: areas of exceptional longevity around the world. Vienna Yearbook of Population Research, 11:87–108. Price, E. M. and Robinson, W. P. (2018). Adjusting for Batch Effects in DNA Methylation Microarray Data, a Lesson Learned. Frontiers in Genetics, 9:83. Pu, M., Ni, Z., Wang, M., Wang, X., Wood, J. G., Helfand, S. L., Yu, H., and Lee, S. S. (2015). Trimethylation of Lys36 on H3 restricts gene expression change during aging and impacts life span. Genes and Development, 29(7):718–731. Putin, E., Mamoshina, P., Aliper, A., Korzinkin, M., Moskalev, A., Kolosov, A., Ostrovskiy, A., Cantor, C., Vijg, J., and Zhavoronkov, A. (2016). Deep biomarkers of human aging: Application of deep neural networks to biomarker development. Aging, 8(5):1021–1033. Quinlan, A. R. and Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6):841–842. Raddatz, G., Hagemann, S., Aran, D., Söhle, J., Kulkarni, P. P., Kaderali, L., Hellman, A., Winnefeld, M., and Lyko, F. (2013). Aging is associated with highly defined epigenetic changes in the human epidermis. Epigenetics & Chromatin, 6(1):36. Radford, E. J., Ito, M., Shi, H., Corish, J. A., Yamazawa, K., Isganaitis, E., Seisenberger, S., Hore, T. A., Reik, W., Erkek, S., Peters, A. H. F. M., Patti, M.-E., and Ferguson-Smith, A. C. (2014). In utero undernourishment perturbs the adult sperm methylome and intergenerational metabolism. Science, 345(6198):1255903. Rahmadi, R., Groot, P., van Rijn, M. H. C., van den Brand, J. A. J. G., Heins, M., Knoop, H., and Heskes, T. (2017). Causality on longitudinal data: Stable specification search in constrained structural equation modeling. Statistical Methods in Medical Research, 27(12):3814–3834. Rakyan, V. K., Down, T.
A., Balding, D. J., and Beck, S. (2011). Epigenome-wide association studies for common human diseases. Nature Reviews Genetics, 12:529–541. Rakyan, V. K., Down, T. A., Maslau, S., Andrew, T., Yang, T. P., Beyan, H., Whittaker, P., McCann, O. T., Finer, S., Valdes, A. M., Leslie, R. D., Deloukas, P., and Spector, T. D. (2010). Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains. Genome Research, 20:434–439. Rando, T. A. and Chang, H. Y. (2012). Aging, Rejuvenation, and Epigenetic Reprogramming: Resetting the Aging Clock. Cell, 148(1):46–57. Reddington, J. P., Perricone, S. M., Nestor, C. E., Reichmann, J., Youngson, N. A., and Suzuki, M. (2013). Redistribution of H3K27me3 upon DNA hypomethylation results in de-repression of Polycomb target genes. Genome Biology, 14:R25. Redman, L. M., Smith, S. R., Burton, J. H., Martin, C. K., Il’yasova, D., and Ravussin, E. (2018). Metabolic Slowing and Reduced Oxidative Damage with Sustained Caloric Restriction Support the Rate of Living and Oxidative Damage Theories of Aging. Cell Metabolism, 27(4):805–815.e4. Reinberg, D. and Vales, L. D. (2018). Chromatin domains rich in inheritance. Science, 361(6397):33–34. Reinius, L. E., Acevedo, N., Joerink, M., Pershagen, G., Dahlén, S.-E., Greco, D., Söderhäll, C., Scheynius, A., and Kere, J. (2012). Differential DNA Methylation in Purified Human Blood Cells: Implications for Cell Lineage and Studies on Disease Susceptibility. PLOS ONE, 7(7):e41361. Remolina, S. C. and Hughes, K. A. (2008). Evolution and mechanisms of long life and high fertility in queen honey bees. Age, 30(2–3):177–185. Renfrew, C., Boyd, M. J., and Morley, I. (2016). Death Rituals, Social Order and the Archeology of Immortality in the Ancient World. Zymo Research (2019). EZ DNA Methylation-Direct™ Kit. Technical report. Richter, A. S., Ryan, D. P., Kilpert, F., Ramírez, F., Heyne, S., and Manke, T. (2019). pyBigWig GitHub Repository. Richter, E. A. and Ruderman, N. B. (2009).
AMPK and the biochemistry of exercise: implications for human health and disease. Biochemical Journal, 418(2):261–275. Ricklefs, R. E. (2010). Life-history connections to rates of aging in terrestrial vertebrates. Proceedings of the National Academy of Sciences, 107(22):10314–10319. Riggs, A. D. (1975). X inactivation, differentiation, and DNA methylation. Cytogenetic and Genome Research, 14(1):9–25. Rinaldi, L., Datta, D., Serrat, J., Morey, L., Solanas, G., Avgustinova, A., Blanco, E., Pons, J. I., Matallanas, D., Von Kriegsheim, A., Di Croce, L., and Benitah, S. A. (2016). Dnmt3a and Dnmt3b Associate with Enhancers to Regulate Human Epidermal Stem Cell Homeostasis. Cell Stem Cell, 19(4):491–501. Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., and Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7):e47. Roberts, R. J., Vincze, T., Posfai, J., and Macelis, D. (2005). REBASE—restriction enzymes and DNA methyltransferases. Nucleic Acids Research, 33(suppl_1):D230–D232. Roberts, R. J., Vincze, T., Posfai, J., and Macelis, D. (2015). REBASE—a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Research, 43(D1):D298–D299. Joehanes, R., Just, A. C., Marioni, R. E., Pilling, L. C., Reynolds, L. M., Mandaviya, P. R., Guan, W., Xu, T., Elks, C. E., Aslibekyan, S., Moreno-Macias, H., Smith, J. A., Brody, J. A., Dhingra, R., Yousefi, P., Pankow, J. S., Kunze, S., Shah, S. H., McRae, A. F., Lohman, K., Sha, J., Absher, D. M., Ferrucci, L., Zhao, W., Demerath, E. W., Bressler, J., Grove, M. L., Huan, T., Liu, C., Mendelson, M. M., Yao, C., Kiel, D. P., Peters, A., Wang-Sattler, R., Visscher, P. M., Wray, N. R., Starr, J. M., Ding, J., Rodriguez, C. J., Wareham, N. J., Irvin, M. R., Zhi, D., Barrdahl, M., Vineis, P., Ambatipudi, S., Uitterlinden, A. G., Hofman, A., Schwartz, J., Colicino, E., Hou, L., Vokonas, P. S., Hernandez, D. G., Singleton, A. B., Bandinelli, S., Turner, S. T., Ware, E. B., Smith, A. K., Klengel, T., Binder, E. B., Psaty, B. M., Taylor, K. D., Gharib, S. A., Swenson, B. R., Liang, L., DeMeo, D. L., O'Connor, G. T., Herceg, Z., Ressler, K. J., Conneely, K. N., Sotoodehnia, N., Kardia, S. L. R., Melzer, D., Baccarelli, A. A., van Meurs, J. B. J., Romieu, I., Arnett, D. K., Ong, K. K., Liu, Y., Waldenberger, M., Deary, I. J., Fornage, M., Levy, D., and London, S. J. (2016). Epigenetic Signatures of Cigarette Smoking. Circulation: Cardiovascular Genetics, 9(5):436–447. Ruby, J. G., Smith, M., and Buffenstein, R. (2018a). Naked mole-rat mortality rates defy Gompertzian laws by not increasing with age. eLife, 7:e31157. Ruby, J. G., Wright, K. M., Rand, K. A., Kermany, A., Noto, K., Curtis, D., Varner, N., Garrigan, D., Slinkov, D., Dorfman, I., Granka, J. M., Byrnes, J., Myres, N., and Ball, C. (2018b). Estimates of the Heritability of Human Longevity Are Substantially Inflated due to Assortative Mating. Genetics, 210(3):1109–1124. Rulands, S., Lee, H. J., Clark, S. J., Angermueller, C., Smallwood, S. A., Krueger, F., Mohammed, H., Dean, W., Nichols, J., Rugg-Gunn, P., Kelsey, G., Stegle, O., Simons, B. D., and Reik, W. (2018). Genome-Scale Oscillations in DNA Methylation during Exit from Pluripotency. Cell Systems, 7(1):63–76.e12. Sánchez-Romero, M. A., Cota, I., and Casadesús, J. (2015). DNA methylation in bacteria: from the methyl group to the methylome. Current Opinion in Microbiology, 25:9–16. Sarkar, T. J., Quarta, M., Mukherjee, S., Colville, A., Paine, P., Doan, L., Tran, C. M., Chu, C. R., Horvath, S., Bhutani, N., Rando, T. A., and Sebastiano, V. (2019). Transient non-integrative nuclear reprogramming promotes multifaceted reversal of aging in human cells. bioRxiv, page 573386. Schenkel, L. C., Kernohan, K. D., McBride, A., Reina, D., Hodge, A., Ainsworth, P. J., Rodenhiser, D. I., Pare, G., Bérubé, N. G., Skinner, C., Boycott, K. M., Schwartz, C., and Sadikovic, B. (2017). Identification of epigenetic signature associated with alpha thalassemia/mental retardation X-linked syndrome. Epigenetics & Chromatin, 10(1):10. Schenkel, L.
C., Schwartz, C., Skinner, C., Rodenhiser, D. I., Ainsworth, P. J., Pare, G., and Sadikovic, B. (2016). Clinical Validation of Fragile X Syndrome Screening by DNA Methylation Array. The Journal of Molecular Diagnostics, 18(6):834–841. Schübeler, D. (2015). Function and information content of DNA methylation. Nature, 517(7534):321–326. Schultz, M. D., He, Y., Whitaker, J. W., Hariharan, M., Mukamel, E. A., Leung, D., Rajagopal, N., Nery, J. R., Urich, M. A., Chen, H., Lin, S., Lin, Y., Jung, I., Schmitt, A. D., Selvaraj, S., Ren, B., Sejnowski, T. J., Wang, W., and Ecker, J. R. (2015). Human body epigenome maps reveal noncanonical DNA methylation variation. Nature, 523:212–216. Sehl, M. E., Henry, J. E., Storniolo, A. M., Ganz, P. A., and Horvath, S. (2017). DNA methylation age is elevated in breast tissue of healthy women. Breast Cancer Research and Treatment, 164(1):209–219. Seidler, S., Zimmermann, H. W., Bartneck, M., Trautwein, C., and Tacke, F. (2010). Age-dependent alterations of monocyte subsets and monocyte-related chemokine pathways in healthy adults. BMC Immunology, 11(1):30. Sen, P., Dang, W., Donahue, G., Dai, J., Dorsey, J., Cao, X., Liu, W., Cao, K., Perry, R., Lee, J. Y., Wasko, B. M., Carr, D. T., He, C., Robison, B., Wagner, J., Gregory, B. D., Kaeberlein, M., Kennedy, B. K., Boeke, J. D., and Berger, S. L. (2015). H3K36 methylation promotes longevity by enhancing transcriptional fidelity. Genes and Development, 29(13):1362–1376. Sen, P., Shah, P. P., Nativio, R., and Berger, S. L. (2016). Epigenetic Mechanisms of Longevity and Aging. Cell, 166(4):822–839. Sheather, S. J. (2009). A Modern Approach to Regression with R. Shendure, J. and Ji, H. (2008). Next-generation DNA sequencing. Nature Biotechnology, 26:1135. Shipony, Z., Mukamel, Z., Cohen, N. M., Landan, G., Chomsky, E., Zeliger, S. R., Fried, Y. C., Ainbinder, E., Friedman, N., and Tanay, A. (2014). Dynamic and static maintenance of epigenetic memory in pluripotent and somatic cells.
Nature, 513(7516):115–119. Sierro, N., Battey, J. N. D., Ouadi, S., Bakaher, N., Bovet, L., Willig, A., Goepfert, S., Peitsch, M. C., and Ivanov, N. V. (2014). The tobacco genome sequence and its comparison with those of tomato and potato. Nature Communications, 5:3833. Singh, P. P., Demmitt, B. A., Nath, R. D., and Brunet, A. (2019). The Genetics of Aging: A Vertebrate Perspective. Cell, 177(1):200–220. Slieker, R. C., Relton, C. L., Gaunt, T. R., Slagboom, P. E., and Heijmans, B. T. (2018). Age-related DNA methylation changes are tissue-specific with ELOVL2 promoter methylation as exception. Epigenetics & Chromatin, 11(1):25. Slieker, R. C., van Iterson, M., Luijk, R., Beekman, M., Zhernakova, D. V., Moed, M. H., Mei, H., van Galen, M., Deelen, P., Bonder, M. J., Zhernakova, A., Uitterlinden, A. G., Tigchelaar, E. F., Stehouwer, C. D. A., Schalkwijk, C. G., van der Kallen, C. J. H., Hofman, A., van Heemst, D., de Geus, E. J., van Dongen, J., Deelen, J., van den Berg, L. H., van Meurs, J., Jansen, R., ‘t Hoen, P. A. C., Franke, L., Wijmenga, C., Veldink, J. H., Swertz, M. A., van Greevenbroek, M. M. J., van Duijn, C. M., Boomsma, D. I., Slagboom, P. E., Heijmans, B. T., and the BIOS Consortium (2016). Age-related accrual of methylomic variability is linked to fundamental ageing mechanisms. Genome Biology, 17(1):191. Smith, Z. D., Gu, H., Bock, C., Gnirke, A., and Meissner, A. (2009). High-throughput bisulfite sequencing in mammalian genomes. Methods, 48(3):226–232. Smith, Z. D. and Meissner, A. (2013). DNA methylation: roles in mammalian development. Nature Reviews Genetics, 14:204–220. Søraas, A., Matsuyama, M., de Lima, M., Wald, D., Buechner, J., Gedde-Dahl, T., Søraas, C. L., Chen, B., Ferrucci, L., Dahl, J. A., Horvath, S., and Matsuyama, S. (2019). Epigenetic age is a cell-intrinsic property in transplanted human hematopoietic cells. Aging Cell, 18(2):e12897. Sørensen, C. S., Schotta, G., and Jørgensen, S. (2013).
Histone H4 Lysine 20 methylation: key player in epigenetic regulation of genomic integrity. Nucleic Acids Research, 41(5):2797–2806. Strahl, B. D. and Allis, C. D. (2000). The language of covalent histone modifications. Nature, 403(6765):41–45. Streubel, G., Watson, A., Jammula, S. G., Scelfo, A., Fitzpatrick, D. J., Oliviero, G., McCole, R., Conway, E., Glancy, E., Negri, G. L., Dillon, E., Wynne, K., Pasini, D., Krogan, N. J., Bracken, A. P., and Cagney, G. (2018). The H3K36me2 Methyltransferase Nsd1 Demarcates PRC2-Mediated H3K27me2 and H3K27me3 Domains in Embryonic Stem Cells. Molecular Cell, 70(2):371–379.e5. Stroud, H., Do, T., Du, J., Zhong, X., Feng, S., Johnson, L., Patel, D. J., and Jacobsen, S. E. (2013). Non-CG methylation patterns shape the epigenetic landscape in Arabidopsis. Nature Structural & Molecular Biology, 21:64–72. Stroustrup, N., Anthony, W. E., Nash, Z. M., Gowda, V., Gomez, A., López-Moyado, I. F., Apfeld, J., and Fontana, W. (2016). The temporal scaling of Caenorhabditis elegans ageing. Nature, 530:103–107. Stroustrup, N., Ulmschneider, B. E., Nash, Z. M., López-Moyado, I. F., Apfeld, J., and Fontana, W. (2013). The Caenorhabditis elegans Lifespan Machine. Nature Methods, 10(7):665–70. Stubbs, T. M., Bonder, M. J., Stark, A.-K., Krueger, F., von Meyenn, F., Stegle, O., and Reik, W. (2017). Multi-tissue DNA methylation age predictor in mouse. Genome Biology, 18(1):68. Stunnenberg, H. G., Abrignani, S., Adams, D., de Almeida, M., Altucci, L., Amin, V., Amit, I., Antonarakis, S. E., Aparicio, S., Arima, T., Arrigoni, L., Arts, R., Asnafi, V., Esteller, M., Bae, J.-B., Bassler, K., Beck, S., Berkman, B., Bernstein, B. E., Bilenky, M., Bird, A., Bock, C., Boehm, B., Bourque, G., Breeze, C. E., Brors, B., Bujold, D., Burren, O., Bussemakers, M. J., Butterworth, A., Campo, E., Carrillo-de Santa-Pau, E., Chadwick, L., Chan, K. M., Chen, W., Cheung, T. H., Chiapperino, L., Choi, N. H., Chung, H.-R., Clarke, L., Connors, J.
M., Cronet, P., Danesh, J., Dermitzakis, M., Drewes, G., Durek, P., Dyke, S., Dylag, T., Eaves, C. J., Ebert, P., Eils, R., Eils, J., Ennis, C. A., Enver, T., Feingold, E. A., Felder, B., Ferguson-Smith, A., Fitzgibbon, J., Flicek, P., Foo, R. S.-Y., Fraser, P., Frontini, M., Furlong, E., Gakkhar, S., Gasparoni, N., Gasparoni, G., Geschwind, D. H., Glažar, P., Graf, T., Grosveld, F., Guan, X.-Y., Guigo, R., Gut, I. G., Hamann, A., Han, B.-G., Harris, R. A., Heath, S., Helin, K., Hengstler, J. G., Heravi-Moussavi, A., Herrup, K., Hill, S., Hilton, J. A., Hitz, B. C., Horsthemke, B., Hu, M., Hwang, J.-Y., Ip, N. Y., Ito, T., Javierre, B.-M., Jenko, S., Jenuwein, T., Joly, Y., Jones, S. J. M., Kanai, Y., Kang, H. G., Karsan, A., Kiemer, A. K., Kim, S. C., Kim, B.-J., Kim, H.-H., Kimura, H., Kinkley, S., Klironomos, F., Koh, I.-U., Kostadima, M., Kressler, C., Kreuzhuber, R., Kundaje, A., Küppers, R., Larabell, C., Lasko, P., Lathrop, M., Lee, D. H. S., Lee, S., Lehrach, H., Leitão, E., Lengauer, T., Lernmark, Å., Leslie, R. D., Leung, G. K. K., Leung, D., Loeffler, M., Ma, Y., Mai, A., Manke, T., Marcotte, E. R., Marra, M. A., Martens, J. H. A., Martin-Subero, J. I., Maschke, K., Merten, C., Milosavljevic, A., Minucci, S., Mitsuyama, T., Moore, R. A., Müller, F., Mungall, A. J., Netea, M. G., Nordström, K., Norstedt, I., Okae, H., Onuchic, V., Ouellette, F., Ouwehand, W., Pagani, M., Pancaldi, V., Pap, T., Pastinen, T., Patel, R., Paul, D. S., Pazin, M. J., Pelicci, P. G., Phillips, A. G., Polansky, J., Porse, B., Pospisilik, J. A., Prabhakar, S., Procaccini, D. C., Radbruch, A., Rajewsky, N., Rakyan, V., Reik, W., Ren, B., Richardson, D., Richter, A., Rico, D., Roberts, D. J., Rosenstiel, P., Rothstein, M., Salhab, A., Sasaki, H., Satterlee, J. S., Sauer, S., Schacht, C., Schmidt, F., Schmitz, G., Schreiber, S., Schröder, C., Schübeler, D., Schultze, J. L., Schulyer, R. 
P., Schulz, M., Seifert, M., Shirahige, K., Siebert, R., Sierocinski, T., Siminoff, L., Sinha, A., Soranzo, N., Spicuglia, S., Spivakov, M., Steidl, C., Strattan, J. S., Stratton, M., Südbeck, P., Sun, H., Suzuki, N., Suzuki, Y., Tanay, A., Torrents, D., Tyson, F. L., Ulas, T., Ullrich, S., Ushijima, T., Valencia, A., Vellenga, E., Vingron, M., Wallace, C., Wallner, S., Walter, J., Wang, H., Weber, S., Weiler, N., Weller, A., Weng, A., Wilder, S., Wiseman, S. M., Wu, A. R., Wu, Z., Xiong, J., Yamashita, Y., Yang, X., Yap, D. Y., Yip, K. Y., Yip, S., Yoo, J.-I., Zerbino, D., Zipprich, G., and Hirst, M. (2016). The International Human Epigenome Consortium: A Blueprint for Scientific Collaboration and Discovery. Cell, 167(5):1145–1149. Sun, D., Luo, M., Jeong, M., Rodriguez, B., Xia, Z., Hannah, R., Wang, H., Le, T., Faull, K. F., Chen, R., Gu, H., Bock, C., Meissner, A., Göttgens, B., Darlington, G. J., Li, W., and Goodell, M. A. (2014a). Epigenomic profiling of young and aged HSCs reveals concerted changes during aging that reinforce self-renewal. Cell Stem Cell, 14(5):673–688. Sun, Y., Hou, R., Fu, X., Sun, C., Wang, S., Wang, C., Li, N., Zhang, L., and Bao, Z. (2014b). Genome-Wide Analysis of DNA Methylation in Five Tissues of Zhikong Scallop, Chlamys farreri. PLOS ONE, 9(1):e86232. Suzuki, M. and Greally, J. M. (2013). Genome-wide DNA Methylation Analysis Using Massively Parallel Sequencing Technologies. Seminars in Hematology, 50(1):70–77. Sziráki, A., Tyshkovskiy, A., and Gladyshev, V. N. (2018). Global remodeling of the mouse DNA methylome during aging and in response to calorie restriction. Aging Cell, 17(3):e12738. Taher, L., Smith, R. P., Kim, M. J., Ahituv, N., and Ovcharenko, I. (2013). Sequence signatures extracted from proximal promoters can be used to predict distal enhancers. Genome Biology, 14(10):R117. Tahiliani, M., Koh, K. P., Shen, Y., Pastor, W. A., Bandukwala, H., and Brudno, Y. (2009).
Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science, 324(5929):930–935. Taiwo, O., Wilson, G. A., Morris, T., Seisenberger, S., Reik, W., Pearce, D., Beck, S., and Butcher, L. M. (2012). Methylome analysis using MeDIP-seq with low DNA concentrations. Nature Protocols, 7:617–636. Takahashi, Y., Wu, J., Suzuki, K., Martinez-Redondo, P., Li, M., Liao, H.-K., Wu, M.-Z., Hernández-Benítez, R., Hishida, T., Shokhirev, M. N., Esteban, C. R., Sancho-Martinez, I., and Belmonte, J. C. I. (2017). Integration of CpG-free DNA induces de novo methylation of CpG islands in pluripotent stem cells. Science, 356(6337):503–508. Talens, R. P., Christensen, K., Putter, H., Willemsen, G., Christiansen, L., Kremer, D., Suchiman, H. E. D., Slagboom, P. E., Boomsma, D. I., and Heijmans, B. T. (2012). Epigenetic variation during the adult lifespan: cross-sectional and longitudinal data on monozygotic twin pairs. Aging Cell, 11(4):694–703. Tan, L., Ke, Z., Tombline, G., Macoretta, N., Hayes, K., Tian, X., Lv, R., Ablaeva, J., Gilbert, M., Bhanu, N. V., Yuan, Z.-F., Garcia, B. A., Shi, Y. G., Shi, Y., Seluanov, A., and Gorbunova, V. (2017). Naked Mole Rat Cells Have a Stable Epigenome that Resists iPSC Reprogramming. Stem Cell Reports, 9(5):1721–1734. Tanaka, T., Biancotto, A., Moaddel, R., Moore, A. Z., Gonzalez-Freire, M., Aon, M. A., Candia, J., Zhang, P., Cheung, F., Fantoni, G., CHI Consortium, Semba, R. D., and Ferrucci, L. (2018). Plasma proteomic signature of age in healthy humans. Aging Cell, 17(5):e12799. Tanas, A. S., Borisova, M. E., Kuznetsova, E. B., Rudenko, V. V., Karandasheva, K. O., Nemtsova, M. V., Izhevskaya, V. L., Simonova, O. A., Larin, S. S., Zaletaev, D. V., and Strelnikov, V. V. (2017). Rapid and affordable genome-wide bisulfite DNA sequencing by XmaI-reduced representation bisulfite sequencing. Epigenomics, 9(6):833–847. Tang, W. W. C., Kobayashi, T., Irie, N., Dietmann, S., and Surani, M. A. (2016).
Specification and epigenetic programming of the human germ line. Nature Reviews Genetics, 17:585–600. Taudt, A., Colomé-Tatché, M., and Johannes, F. (2016). Genetic sources of population epigenomic variation. Nature Reviews Genetics, 17:319–332. Teschendorff, A. E., Breeze, C. E., Zheng, S. C., and Beck, S. (2017). A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies. BMC Bioinformatics, 18(1):105. Teschendorff, A. E., Marabita, F., Lechner, M., Bartlett, T., Tegner, J., Gomez-Cabrero, D., and Beck, S. (2012). A Beta-Mixture Quantile Normalisation method for correcting probe design bias in Illumina Infinium 450k DNA methylation data. Bioinformatics, 29(2):189–196. Teschendorff, A. E., Menon, U., Gentry-Maharaj, A., Ramus, S. J., Weisenberger, D. J., Shen, H., Campan, M., Noushmehr, H., Bell, C. G., Maxwell, A. P., Savage, D. A., Mueller-Holzner, E., Marth, C., Kocjan, G., Gayther, S. A., Jones, A., Beck, S., Wagner, W., Laird, P. W., Jacobs, I. J., and Widschwendter, M. (2010). Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome Research, 20(4):440–446. Teschendorff, A. E. and Relton, C. L. (2018). Statistical and integrative system-level analysis of DNA methylation data. Nature Reviews Genetics, 19:129–147. Teschendorff, A. E., Yang, Z., Wong, A., Pipinikas, C. P., Jiao, Y., Jones, A., Anjum, S., Hardy, R., Salvesen, H. B., Thirlwell, C., Janes, S. M., Kuh, D., and Widschwendter, M. (2015). Correlation of Smoking-Associated DNA Methylation Changes in Buccal Cells With DNA Methylation Changes in Epithelial Cancer. JAMA Oncology, 1(4):476–485. Teschendorff, A. E. and Zheng, S. C. (2017a). Cell-type deconvolution in epigenome-wide association studies: a review and recommendations.
Epigenomics, 9(5):757–768. Teschendorff, A. E. and Zheng, S. C. (2017b). EpiDISH Bioconductor Package. Thompson, M. J., Chwiałkowska, K., Rubbi, L., Lusis, A. J., Davis, R. C., Srivastava, A., Korstanje, R., Churchill, G. A., Horvath, S., and Pellegrini, M. (2018). A multi-tissue full lifespan epigenetic clock for mice. Aging, 10(10):2832–2854. Thompson, M. J., von Holdt, B., Horvath, S., and Pellegrini, M. (2017). An epigenetic aging clock for dogs and wolves. Aging, 9(3):1055–1068. Thomson, W. (1889). Popular lectures and addresses. London: Macmillan. Titus, A. J., Gallimore, R. M., Salas, L. A., and Christensen, B. C. (2017). Cell-type deconvolution from DNA methylation: a review of recent applications. Human Molecular Genetics, 26(R2):R216–R224. Tomás-Loba, A., Flores, I., Fernández-Marcos, P. J., Cayuela, M. L., Maraver, A., Tejera, A., Borrás, C., Matheu, A., Klatt, P., Flores, J. M., Viña, J., Serrano, M., and Blasco, M. A. (2008). Telomerase Reverse Transcriptase Delays Aging in Cancer-Resistant Mice. Cell, 135(4):609–622. Tomida, M. W., Gaddis, S., Takata, Y., Liu, B., Lin, K., Estecio, M. R., Hardikar, S., Lu, Y., Veland, N., Zeng, Y., Chen, T., Shen, J., Saha, D., Gowher, H., and Zhao, H. (2018). DNMT3L facilitates DNA methylation partly by maintaining DNMT3A stability in mouse embryonic stem cells. Nucleic Acids Research, 47(1):152–167. Touleimat, N. and Tost, J. (2012). Complete pipeline for Infinium® Human Methylation 450K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Epigenomics, 4(3):325–341. Triche Jr, T. J., Weisenberger, D. J., Van Den Berg, D., Laird, P. W., and Siegmund, K. D. (2013). Low-level processing of Illumina Infinium DNA Methylation BeadArrays. Nucleic Acids Research, 41(7):e90. Trojer, P. and Reinberg, D. (2007). Facultative Heterochromatin: Is There a Distinctive Molecular Signature? Molecular Cell, 28(1):1–13.
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6):520–525. Truong, T. P., Sakata-Yanagimoto, M., Yamada, M., Nagae, G., Enami, T., Nakamoto-Matsubara, R., Aburatani, H., and Chiba, S. (2015). Age-Dependent Decrease of DNA Hydroxymethylation in Human T Cells. Journal of Clinical and Experimental Hematopathology, 55(1):1–6. Tsurumi, A. and Li, W. (2012). Global heterochromatin loss: A unifying theory of aging? Epigenetics, 7(7):680–688. Tullet, J. M. A., Hertweck, M., An, J. H., Baker, J., Hwang, J. Y., Liu, S., Oliveira, R. P., Baumeister, R., and Blackwell, T. K. (2008). Direct Inhibition of the Longevity-Promoting Factor SKN-1 by Insulin-like Signaling in C. elegans. Cell, 132(6):1025–1038. Um, S. H., D’Alessio, D., and Thomas, G. (2006). Nutrient overload, insulin resistance, and ribosomal protein S6 kinase 1, S6K1. Cell Metabolism, 3(6):393–402. van Dongen, J., Nivard, M. G., Willemsen, G., Hottenga, J.-J., Helmer, Q., Dolan, C. V., Ehli, E. A., Davies, G. E., van Iterson, M., Breeze, C. E., Beck, S., BIOS Consortium, Hoen, P. A., Pool, R., van Greevenbroek, M. M. J., Stehouwer, C. D. A., van der Kallen, C. J. H., Schalkwijk, C. G., Wijmenga, C., Zhernakova, S., Tigchelaar, E. F., Beekman, M., Deelen, J., van Heemst, D., Veldink, J. H., van den Berg, L. H., van Duijn, C. M., Hofman, B. A., Uitterlinden, A. G., Jhamai, P. M., Verbiest, M., Verkerk, M., van der Breggen, R., van Rooij, J., Lakenberg, N., Mei, H., Bot, J., Zhernakova, D. V., van’t Hof, P., Deelen, P., Nooren, I., Moed, M., Vermaat, M., Luijk, R., Bonder, M. J., van Dijk, F., van Galen, M., Arindrarto, W., Kielbasa, S. M., Swertz, M. A., van Zwet, E. W., Isaacs, A., Franke, L., Suchiman, H. E., Jansen, R., van Meurs, J. B., Heijmans, B. T., Slagboom, P. E., and Boomsma, D. I. (2016).
Genetic and environmental influences interact with age and sex in shaping the human methylome. Nature Communications, 7:11115. van Iterson, M., van Zwet, E. W., Heijmans, B. T., and the BIOS Consortium (2017). Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution. Genome Biology, 18(1):19. Villeponteau, B. (1997). The heterochromatin loss model of aging. Experimental Gerontology, 32(4):383–394. Visscher, P. M., Wray, N. R., Zhang, Q., Sklar, P., McCarthy, M. I., Brown, M. A., and Yang, J. (2017). 10 Years of GWAS Discovery: Biology, Function, and Translation. Voigt, P., Tee, W. W., and Reinberg, D. (2013). A double take on bivalent promoters. Genes and Development, 27:1318–1338. Waddington, C. H. (1942). The Epigenotype. Endeavour, 1:18–20. Waddington, C. H. (1957). The cybernetics of development. In The strategy of the genes, pages 27–38. Wagner, E. J. and Carpenter, P. B. (2012). Understanding the language of Lys36 methylation at histone H3. Nature Reviews Molecular Cell Biology, 13:115–126. Wang, J., Xia, Y., Li, L., Gong, D., Yao, Y., Luo, H., Lu, H., Yi, N., Wu, H., Zhang, X., Tao, Q., and Gao, F. (2013). Double restriction-enzyme digestion improves the coverage and accuracy of genome-wide CpG methylation profiling by reduced representation bisulfite sequencing. BMC Genomics, 14:11. Wang, T., Tsui, B., Kreisberg, J. F., Robertson, N. A., Gross, A. M., Yu, M. K., Carter, H., Brown-Borg, H. M., Adams, P. D., and Ideker, T. (2017). Epigenetic aging signatures in mice livers are slowed by dwarfism, calorie restriction and rapamycin treatment. Genome Biology, 18(1):57. Wei, M., Brandhorst, S., Shelehchi, M., Mirzaei, H., Cheng, C. W., Budniak, J., Groshen, S., Mack, W. J., Guen, E., Di Biase, S., Cohen, P., Morgan, T. E., Dorff, T., Hong, K., Michalsen, A., Laviano, A., and Longo, V. D. (2017).
Fasting-mimicking diet and markers/risk factors for aging, diabetes, cancer, and cardiovascular disease. Science Translational Medicine, 9(377):eaai8700. Weidner, C. I., Lin, Q., Koch, C. M., Eisele, L., Beier, F., and Ziegler, P. (2014). Aging of blood can be tracked by DNA methylation changes at just three CpG sites. Genome Biology, 15:R24. West, J., Teschendorff, A. E., and Beck, S. (2013). Age-associated epigenetic drift: implications, and a case of epigenetic thrift? Human Molecular Genetics, 22(R1):R7–R15. Whitaker, J. W., Chen, Z., and Wang, W. (2014). Predicting the human epigenome from DNA motifs. Nature Methods, 12:265–272. Widschwendter, M., Jones, A., Evans, I., Reisel, D., Dillner, J., Sundström, K., Steyerberg, E. W., Vergouwe, Y., Wegwarth, O., Rebitschek, F. G., Siebert, U., Sroczynski, G., de Beaufort, I. D., Bolt, I., Cibula, D., Zikan, M., Bjørge, L., Colombo, N., Harbeck, N., Dudbridge, F., Tasse, A.-M., Knoppers, B. M., Joly, Y., Teschendorff, A. E., and Pashayan, N. (2018). Epigenome-based cancer risk prediction: rationale, opportunities and challenges. Nature Reviews Clinical Oncology, 15:292–309. Wilhelm-Benartzi, C. S., Koestler, D. C., Karagas, M. R., Flanagan, J. M., Christensen, B. C., Kelsey, K. T., Marsit, C. J., Houseman, E. A., and Brown, R. (2013). Review of processing and analysis methods for DNA methylation array data. British Journal of Cancer, 109(6):1394–1402. Williams, G. C. (1957). Pleiotropy, Natural Selection, and the Evolution of Senescence. Evolution, 11(4):398–411. Witten, M. (1986). Information content of biological survival curves arising in aging experiments: some further thoughts. In Evolution of longevity in animals: a comparative approach, pages 295–317. Wu, C.-t. and Morris, J. R. (2001). Genes, Genetics, and Epigenetics: A Correspondence. Science, 293(5532):1103–1105. Wu, H., Xu, T., Feng, H., Chen, L., Li, B., Yao, B., Qin, Z., Jin, P., and Conneely, K. N. (2015).
Detection of differentially methylated regions from whole-genome bisulfite sequencing data without replicates. Nucleic Acids Research, 43(21):e141. Wu, T. P., Wang, T., Seetin, M. G., Lai, Y., Zhu, S., Lin, K., Liu, Y., Byrum, S. D., Mackintosh, S. G., Zhong, M., Tackett, A., Wang, G., Hon, L. S., Fang, G., Swenberg, J. A., and Xiao, A. Z. (2016). DNA methylation on N6-adenine in mammalian embryonic stem cells. Nature, 532:1–18. Wu, X. and Zhang, Y. (2017). TET-mediated active DNA demethylation: mechanism, function and beyond. Nature Reviews Genetics, 18:517–534. Wutz, A. (2011). Gene silencing in X-chromosome inactivation: advances in understanding facultative heterochromatin formation. Nature Reviews Genetics, 12:542–553. Xiao, C.-L., Zhu, S., He, M., Chen, D., Zhang, Q., Chen, Y., Yu, G., Liu, J., Xie, S.-Q., Luo, F., Liang, Z., Wang, D.-P., Bo, X.-C., Gu, X.-F., Wang, K., and Yan, G.-R. (2018). N6-Methyladenine DNA Modification in the Human Genome. Molecular Cell, 71(2):306–318.e7. Xie, H., Wang, M., De Andrade, A., Bonaldo, M. D. F., Galat, V., Arndt, K., Rajaram, V., Goldman, S., Tomita, T., and Soares, M. B. (2011). Genome-wide quantitative assessment of variation in DNA methylation patterns. Nucleic Acids Research, 39(10):4099–4108. Xie, W., Schultz, M. D., Lister, R., Hou, Z., Rajagopal, N., Ray, P., Whitaker, J. W., Tian, S., Hawkins, R. D., Leung, D., Yang, H., Wang, T., Lee, A. Y., Swanson, S. A., Zhang, J., Zhu, Y., Kim, A., Nery, J. R., Urich, M. A., Kuan, S., Yen, C.-A., Klugman, S., Yu, P., Suknuntha, K., Propson, N. E., Chen, H., Edsall, L. E., Wagner, U., Li, Y., Ye, Z., Kulkarni, A., Xuan, Z., Chung, W.-Y., Chi, N. C., Antosiewicz-Bourget, J. E., Slukvin, I., Stewart, R., Zhang, M. Q., Wang, W., Thomson, J. A., Ecker, J. R., and Ren, B. (2013). Epigenomic Analysis of Multilineage Differentiation of Human Embryonic Stem Cells. Cell, 153(5):1134–1148. Xu, M., Pirtskhalava, T., Farr, J. N., Weigand, B. M., Palmer, A. K., Weivoda, M.
M., Inman, C. L., Ogrodnik, M. B., Hachfeld, C. M., Fraser, D. G., Onken, J. L., Johnson, K. O., Verzosa, G. C., Langhi, L. G. P., Weigl, M., Giorgadze, N., LeBrasseur, N. K., Miller, J. D., Jurk, D., Singh, R. J., Allison, D. B., Ejima, K., Hubbard, G. B., Ikeno, Y., Cubro, H., Garovic, V. D., Hou, X., Weroha, S. J., Robbins, P. D., Niedernhofer, L. J., Khosla, S., Tchkonia, T., and Kirkland, J. L. (2018). Senolytics improve physical function and increase lifespan in old age. Nature Medicine, 24(8):1246–1256. Yang, L., Rodriguez, B., Mayle, A., Park, H. J., Lin, X., Luo, M., Jeong, M., Curry, C. V., Kim, S.-B., Ruau, D., Zhang, X., Zhou, T., Zhou, M., Rebel, V. I., Challen, G. A., Göttgens, B., Lee, J.-S., Rau, R., Li, W., and Goodell, M. A. (2016a). DNMT3A Loss Drives Enhancer Hypomethylation in FLT3-ITD-Associated Leukemias. Cancer Cell, 30(2):363–365. Yang, Y., Sebra, R., Pullman, B. S., Qiao, W., Peter, I., Desnick, R. J., Geyer, C. R., DeCoteau, J. F., and Scott, S. A. (2015). Quantitative and multiplexed DNA methylation analysis using long-read single-molecule real-time bisulfite sequencing (SMRT-BS). BMC Genomics, 16(1):350. Yang, Y. C., Boen, C., Gerken, K., Li, T., Schorpp, K., and Harris, K. M. (2016b). Social relationships and physiological determinants of longevity across the human life span. Proceedings of the National Academy of Sciences, 113(3):578–583. Yang, Z., Wong, A., Kuh, D., Paul, D. S., Rakyan, V. K., Leslie, R. D., Zheng, S. C., Widschwendter, M., Beck, S., and Teschendorff, A. E. (2016c). Correlation of an epigenetic mitotic clock with cancer risk. Genome Biology, 17(1):205. Yong, W.-S., Hsu, F.-M., and Chen, P.-Y. (2016). Profiling genome-wide DNA methylation. Epigenetics & Chromatin, 9(1):26. Yu, L., Liu, C., Bennett, K., Wu, Y.-Z., Dai, Z., Vandeusen, J., Opavsky, R., Raval, A., Trikha, P., Rodriguez, B., Becknell, B., Mao, C., Lee, S., Davuluri, R. V., Leone, G., Van den Veyver, I. B., Caligiuri, M. A., and Plass, C.
(2004). A NotI–EcoRV promoter library for studies of genetic and epigenetic alterations in mouse models of human malignancies. Genomics, 84(4):647–660. Yuan, T., Jiao, Y., de Jong, S., Ophoff, R. A., Beck, S., and Teschendorff, A. E. (2015). An Integrative Multi-scale Analysis of the Dynamic DNA Methylation Landscape in Aging. PLOS Genetics, 11(2):e1004996. Zbieć-Piekarska, R., Spólnicka, M., Kupiec, T., Makowska, Ż., Spas, A., Parys-Proszek, A., Kucharczyk, K., Płoski, R., and Branicki, W. (2015). Examination of DNA methylation status of the ELOVL2 marker may be useful for human age prediction in forensic science. Forensic Science International: Genetics, 14:161–167. Zeng, J., Nagrajan, H. K., and Yi, S. V. (2014). Fundamental diversity of human CpG islands at multiple biological levels. Epigenetics, 9(4):483–491. Zhang, R., Chen, W., and Adams, P. D. (2007). Molecular Dissection of Formation of Senescence-Associated Heterochromatin Foci. Molecular and Cellular Biology, 27(6):2343–2358. Zhang, W., Li, J., Suzuki, K., Qu, J., Wang, P., Zhou, J., Liu, X., Ren, R., Xu, X., Ocampo, A., Yuan, T., Yang, J., Li, Y., Shi, L., Guan, D., Pan, H., Duan, S., Ding, Z., Li, M., Yi, F., Bai, R., Wang, Y., Chen, C., Yang, F., Li, X., Wang, Z., Aizawa, E., Goebl, A., Soligalla, R. D., Reddy, P., Esteban, C. R., Tang, F., Liu, G.-H., and Belmonte, J. C. I. (2015a). A Werner syndrome stem cell model unveils heterochromatin alterations as a driver of human aging. Science, 348(6239):1160–1163. Zhang, W., Spector, T. D., Deloukas, P., Bell, J. T., and Engelhardt, B. E. (2015b). Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements. Genome Biology, 16(1):14. Zheng, S. C., Breeze, C. E., Beck, S., and Teschendorff, A. E. (2018). Identification of differentially methylated cell types in epigenome-wide association studies. Nature Methods, 15(12):1059–1066. Zheng, S. C., Widschwendter, M., and Teschendorff, A. E.
(2016). Epigenetic drift, epigenetic clocks and cancer risk. Epigenomics, 8(5):705–719. Zhou, W., Dinh, H. Q., Ramjan, Z., Weisenberger, D. J., Nicolet, C. M., Shen, H., Laird, P. W., and Berman, B. P. (2018). DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nature Genetics, 50(4):591–602. Zhu, T., Zheng, S. C., Paul, D. S., Horvath, S., and Teschendorff, A. E. (2018). Cell and tissue type independent age-associated DNA methylation changes are not rare but common. Aging, 10(11):3541–3557. Zhuang, J., Widschwendter, M., and Teschendorff, A. E. (2012). A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform. BMC Bioinformatics, 13(1):59. Ziller, M. J., Gu, H., Mueller, F., Donaghey, J., Tsai, L. T., and Kohlbacher, O. (2013). Charting a dynamic DNA methylation landscape of the human genome. Nature, 500:477–481. Ziller, M. J., Müller, F., Liao, J., Zhang, Y., Gu, H., Bock, C., Boyle, P., Epstein, C. B., Bernstein, B. E., Lengauer, T., Gnirke, A., and Meissner, A. (2011). Genomic Distribution and Inter-Sample Variation of Non-CpG Methylation across Human Cell Types. PLOS Genetics, 7(12):e1002389.