On the epigenetic ageing clock in humans
Daniel Elías Martín Herranz
European Molecular Biology Laboratory,
European Bioinformatics Institute
University of Cambridge
This dissertation is submitted for the degree of
Doctor of Philosophy
Churchill College April 2019

To my palindromic family, Andrés, Pilar and Andrés.
Because these pages of science are a reflection of their art.

Declaration
This dissertation is the result of my own work and includes nothing which is the outcome
of work done in collaboration with others, except when specified in the declarations at the
beginning of the chapters. I further specify this by using the pronoun ‘we’ when others were
substantially involved in the work and ‘I’ for those parts that are purely my own work.
It is not substantially the same as any that I have submitted, or, is being concurrently
submitted for a degree or diploma or other qualification at the University of Cambridge
or any other University or similar institution. I further state that no substantial part of my
dissertation has already been submitted, or, is being concurrently submitted for any such
degree, diploma or other qualification at the University of Cambridge or any other University
or similar institution. This dissertation contains fewer than 60,000 words exclusive of tables,
footnotes, bibliography, and appendices and has fewer than 150 figures.
Daniel Elías Martín Herranz
April 2019

Acknowledgements
This thesis has made use of a great amount of chronological time (hopefully not too much biological
time) of a lot of people. This is also their work. I am deeply thankful ...
... to Janet Thornton, for opening the doors of the EBI to me, showing me the true nature of
critical thinking, science and proper discussion, and for supporting my (sometimes) wild ideas and
plans;
... to Wolf Reik, who is responsible for my scientific crush on epigenetics, for accepting me as an
unofficial student, providing always stimulating ideas and inviting me to his garden parties;
... to Tom Stubbs, for his scientific creativity, friendship and burrito evenings;
... to the rest of my collaborators, especially Marc Jan Bonder, Antonio Ribeiro and Erfan
Aref-Eshghi, for their contributions;
... to the rest of my TAC members, Oliver Stegle, Judith Zaugg and Gos Micklem, for their
guidance;
... to Nils Eling, Hannah Meyer, Jack Monahan and Max Stammnitz, for taking the time to read
through these pages and send me their thoughts and comments;
... to the incredible people in the Thornton and Reik labs, for their input and many shared lunches
(and some beers);
... to my family, for their always unconditional love and support (and for feeding me so well);
... to Parvathi ‘Ale’ Subbiah, because her ‘effect’ has given me strength every day since I met
her (and for helping me with the design of the figures);
... to the EMBL-EBI crowd, especially to Jack, Nils, Lara, Omar, Hannah and Julia, for many
good times at the Blue Moon and the Wiggle Mansion;
... to the rest of the Cambridge crowd, including members of Los del Cam (Max, Vlad, Gogi,
Ale), Churchill College (Barbora, basketball team), the CompBio MPhil (Daniel, Elias, Dalia, Andy)
and the La Caixa fellows, for keeping me sane in this bubble;
... to my friends from Salamanca (Salón del Té) and Soria (Club de Bebedores Mercadona), for their
eternal friendship;
... to La Caixa and EMBL, for funding me and giving me the opportunity to be writing these
words;
... to all those people that I forgot to include because of my procrastination, they know who they
are.

Abstract
Epigenetic clocks are mathematical models that predict the biological age of an organism using
DNA methylation data, and which have emerged in the last few years as the most accurate
biomarkers of the ageing process. However, little is known about the molecular mechanisms
that control the rate of such clocks. In this thesis I focus on the study of the epigenetic ageing
clock in humans. First, I review and benchmark statistical and computational tools required
for the analysis of DNA methylation data in the context of human ageing. Next, I validate the
performance of the Horvath epigenetic clock, the most widely used multi-tissue epigenetic
clock in humans, in a control blood dataset and test its behaviour in patients with a variety of
developmental disorders, which harbour mutations in proteins of the epigenetic machinery. I
demonstrate that loss-of-function mutations in the H3K36 methyltransferase NSD1, which
cause Sotos syndrome, substantially accelerate epigenetic ageing. Furthermore, I show that
the normal ageing process and Sotos syndrome share methylation changes and the genomic
context in which they happen. These results suggest that the H3K36 methylation machinery
is a key component of the epigenetic maintenance system in humans, which controls the
rate of epigenetic ageing, and this role seems to be conserved in model organisms. Finally, I
provide a technological strategy to make epigenetic clocks (or any DNA methylation-based
mathematical models) more cost-effective by exploiting the ability of restriction enzymes
to perform genomic enrichment. This thesis provides novel insights (statistical, biological,
technological) into the epigenetic ageing clock in humans, which will help to shed light on
the different processes that erode the human epigenetic landscape during ageing.

Table of contents
List of figures
List of tables
Abbreviations and acronyms
1 Introduction
1.1 The biology of ageing
1.1.1 A brief introduction to ageing theory
1.1.2 The genetic basis of ageing
1.1.3 Hallmarks of mammalian ageing
1.1.4 Studying the ageing process in humans
1.2 Epigenetics of ageing
1.2.1 A brief introduction to epigenetics
1.2.2 Fundamentals of DNA methylation in mammals
1.2.3 Links between the epigenetic machinery and ageing
1.3 The epigenetic ageing clock
1.3.1 Measuring the ageing process
1.3.2 The landscape of epigenetic clocks
1.3.3 Molecular mechanisms of the epigenetic ageing clock
2 Statistical aspects
2.1 Analysing the blood methylome to study human ageing
2.1.1 Building a DNA methylation dataset from public data
2.1.2 Main DNA methylation data pre-processing pipeline
2.1.3 Accounting for blood cell composition changes during ageing
2.1.4 Identifying differentially methylated positions during ageing
2.1.5 Shannon methylation entropy
2.2 Behaviour of Horvath’s epigenetic clock during ageing
2.2.1 Calculating epigenetic age using Horvath’s epigenetic clock
2.2.2 Horvath’s epigenetic clock measures physiological ageing
2.2.3 Correcting for batch effects in the context of the epigenetic clock
2.3 Behaviour of other epigenetic clocks during ageing
2.3.1 Hannum’s epigenetic clock
2.3.2 Epigenetic mitotic clock: epiTOC
2.4 Additional methods
3 Biological aspects
3.1 Background
3.2 Screening for genes that accelerate the epigenetic ageing clock
3.3 Sotos syndrome accelerates epigenetic ageing
3.4 Comparing Sotos syndrome and physiological ageing
3.5 Methylation Shannon entropy and the epigenetic clock
3.6 Discussion
3.7 Additional methods
4 Technological aspects
4.1 Background
4.2 Restriction enzyme digestion as a tool for genomic enrichment
4.3 cuRRBS: customised Reduced Representation Bisulfite Sequencing
4.4 Running cuRRBS in different biological systems
4.5 Experimental validation of cuRRBS
4.6 Conclusions and future directions
4.7 Additional methods
5 Final remarks
5.1 Statistical aspects
5.2 Biological aspects
5.3 Technological aspects
Appendix
S.1 Supplementary for chapter 2
S.2 Supplementary for chapter 3
S.3 Supplementary for chapter 4
References
List of figures
1.1 Theoretical framework to conceptualise the ageing process
1.2 Main signalling pathways that affect the ageing process
1.3 Establishment and maintenance of 5-methylcytosine in mammalian genomes
1.4 Oxidation of 5-methylcytosine and the cycle of demethylation
2.1 Chronological age distribution in the healthy individuals
2.2 Main DNA methylation data pre-processing pipeline
2.3 Effect of BMIQ normalisation on the β-value distribution
2.4 Benchmarking of the cell-type deconvolution strategies in blood: RMSE and MAE
2.5 Predictions obtained for each blood cell type using the optimal deconvolution strategy
2.6 Changes in blood cell composition during human ageing
2.7 Changes in the blood methylome during human ageing
2.8 Changes in the β-values of four different aDMPs
2.9 Relationship between the β-value and the Shannon entropy at a given CpG site
2.10 Genome-wide methylation Shannon entropy during physiological ageing
2.11 Transforming chronological age in Horvath’s model
2.12 Horvath’s epigenetic clock measures physiological ageing
2.13 Correcting for batch effects in the context of the epigenetic clock
2.14 Causes of deviation from the expected EAA distribution in the control model
2.15 Behaviour of Hannum’s epigenetic clock in the healthy individuals
2.16 Behaviour of the epigenetic mitotic clock (epiTOC) in the healthy individuals
3.1 Chronological age distribution in the individuals with developmental disorders
3.2 Overview of the analyses performed in Chapter 3
3.3 Screening for epigenetic age acceleration (EAA) in developmental disorders
3.4 Sotos syndrome accelerates epigenetic ageing
3.5 Comparing DNA methylation changes in Sotos syndrome and physiological ageing
3.6 Landscape of Horvath’s epigenetic clock CpGs in Sotos syndrome
3.7 Methylation Shannon entropy during physiological ageing and in Sotos syndrome
3.8 Proposed model that highlights the role of H3K36 methylation maintenance on epigenetic ageing
4.1 The landscape of restriction enzyme motifs
4.2 Restriction enzyme digestion as a tool for genomic enrichment
4.3 cuRRBS overview
4.4 Running cuRRBS in different biological systems
4.5 Experimental validation of cuRRBS
S1.1 Effects of noob background correction on the array fluorescence intensities
S1.2 Quality control (QC) strategy to identify outlier samples
S1.3 M-value distributions in the GSE41273 batch
S1.4 Cell-type deconvolution strategies that were benchmarked
S1.5 Benchmarking of the cell-type deconvolution strategies in blood: R²
S1.6 Table showing the top 100 aDMPs
S1.7 Impact of the absence of background correction on the predictions from the epigenetic clock
S1.8 Correcting for batch effects: control model without cell composition correction
S1.9 PCA on the array control probes captures batch effects: cases
S1.10 Variance explained by the different principal components during batch effect correction
S2.1 Table showing information for the individuals with developmental disorders
S2.2 Effect of changing the median age of the controls when performing the screening
S2.3 Screening for epigenetic age acceleration (EAA) in developmental disorders: additional scatterplots
S2.4 Enrichment for the categorical (epi)genomic features in Sotos and ageing: genome-wide
S2.5 Distributions of scores for the continuous (epi)genomic features in Sotos and ageing: genome-wide
S2.6 Scores for the continuous (epi)genomic features in the Horvath’s epigenetic clock CpGs
S2.7 Enrichment for the categorical (epi)genomic features in Sotos and ageing: Horvath’s epigenetic clock
S2.8 Distributions of scores for the continuous (epi)genomic features in Sotos and ageing: Horvath’s epigenetic clock
S2.9 Methylation Shannon entropy acceleration
S2.10 Batch effects in the methylation Shannon entropy for the epigenetic clock sites
S2.11 Information for the continuous (epi)genomic features
S3.1 Scatterplot of fragment length distributions for the isoschizomer families
S3.2 Genomic features that overlap with restriction enzyme cleavage sites
S3.3 Comparison of studies using restriction enzymes for genomic enrichment
S3.4 Additional insights into cuRRBS
S3.5 Additional results of running cuRRBS in different biological systems
S3.6 Effect of experimental errors during size selection in cuRRBS predictions
S3.7 cuRRBS computational efficiency

List of tables
1.1 Comparison of epigenetic clocks in different species
2.1 Overview of the blood DNA methylation dataset from healthy individuals
3.1 Overview of the developmental disorders that were included in the screening
4.1 Flexible user-defined cuRRBS parameters
S2.1 Additional information for the developmental disorders dataset

Abbreviations and acronyms
27K Illumina Infinium HumanMethylation27 array
450K Illumina Infinium HumanMethylation450 array
5caC 5-carboxylcytosine
5fC 5-formylcytosine
5hmC 5-hydroxymethylcytosine
5mC 5-methylcytosine
a.k.a. Also known as
aDMPs Differentially methylated positions during ageing
AMP Adenosine monophosphate
AMPK Adenosine monophosphate-activated kinase
ASD Autism spectrum disorder
ATP Adenosine triphosphate
ATR-X Alpha thalassemia/mental retardation X-linked syndrome
aVMPs Variably methylated positions during ageing
B CD19+ B cells
BER Base excision repair
BMIQ Beta-mixture quantile normalisation
bp Base pairs
CCC Cell composition correction
CD4T CD4+ T cells
CD8T CD8+ T cells
CG 5′-cytosine-phosphate-guanine-3′
CGI CpG island
CHG 5′-cytosine-phosphate-H-phosphate-guanine-3′, where H corresponds to adenine,
thymine or cytosine
CHH 5′-cytosine-phosphate-H-phosphate-H-3′, where H corresponds to adenine,
thymine or cytosine
ChIP-seq Chromatin immunoprecipitation and sequencing
CP/QP Constrained projection/quadratic programming
CpG 5′-cytosine-phosphate-guanine-3′
CPU Central processing unit
CRF Cost Reduction Factor in cuRRBS
cSEA Shannon entropy acceleration for the Horvath’s epigenetic clock sites
CTCF CCCTC-binding factor
cuRRBS customised Reduced Representation Bisulfite Sequencing
DHS DNase Hypersensitive Sites
DHS-DMCs In cell-type deconvolution strategies, reference probes identified using
information from differential methylation and chromatin accessibility
DMCs Differentially methylated cytosines
DMCTs Differentially methylated cytosines in individual cell types
DMPs Differentially methylated positions
DMRs Differentially methylated regions
DMV DNA methylation valley
DNA Deoxyribonucleic acid
DNAmAge DNA methylation age i.e. epigenetic age calculated with Horvath’s epigenetic
clock
EAA Epigenetic age acceleration
EPIC Illumina Infinium MethylationEPIC array
epiTOC epigenetic Timer of Cancer (i.e. the epigenetic mitotic clock)
ESCs Embryonic stem cells
etc. Et cetera
EV Enrichment Value in cuRRBS
EWAS Epigenome-wide association studies
FDR False discovery rate
FN False negatives
FP False positives
FXS Fragile X syndrome
GB Gigabytes
Gbp Giga base pairs
GC content Guanine + cytosine content
GEO Gene Expression Omnibus repository
Gran Granulocytes
gSEA Genome-wide Shannon entropy acceleration
GWAS Genome-wide association studies
H3K27me3 Histone H3 lysine 27 trimethylation
H3K36 Histone H3 lysine 36
H3K36me3 Histone H3 lysine 36 trimethylation
H3K4me3 Histone H3 lysine 4 trimethylation
hg19 Reference human genome assembly 19
hg38 Reference human genome assembly 38
hQTLs Histone quantitative trait loci
HSCs Haematopoietic stem cells
i.e. Id est
IDOL IDentifying Optimal DNA methylation Libraries, a strategy to build cell-type
deconvolution references
IEAA Intrinsic epigenetic age acceleration
IGF-1 Insulin-like growth factor 1
IHEC International Human Epigenome Consortium
iPSCs Induced pluripotent stem cells
kb Kilo base pairs
KNN k-nearest neighbours
m6A N6-methyladenosine
MAE Mean absolute error (in the context of cell-type deconvolution benchmarking)
or median absolute error (in the context of Horvath’s epigenetic clock)
MBD Methyl-CpG-binding domain
MEFs Mouse embryonic fibroblasts
meQTLs Methylation quantitative trait loci
Mono CD14+ monocytes
mRNA Messenger RNA
NAD+ Nicotinamide adenine dinucleotide
NF Theoretical number of fragments sequenced in cuRRBS
NFC Normalised fold change
NK CD56+ natural killer cells
NRC Normalised read counts
NRE Normalised RNA expression
NRF1 Nuclear respiratory factor 1
OOB Out-of-band fluorescence intensities in the Infinium I probes of Illumina
arrays
OR Odds ratio
PAHs Polycyclic aromatic hydrocarbons
PBMC Peripheral blood mononuclear cells
PC Principal component
PCA Principal component analysis
PCC Pearson’s correlation coefficient
pcgtAge Mitotic age according to the epigenetic mitotic clock (epiTOC)
PCR Polymerase chain reaction
PGCs Primordial germ cells
PRC2 Polycomb Repressive Complex 2
QC Quality control
R Either the robustness variable in cuRRBS or the R programming language,
depending on the context
R² Coefficient of determination
RAM Random-access memory
Repli-seq Genome-wide analysis of replication timing by sequencing
RMSE Root mean squared error
RNA Ribonucleic acid
RNA-seq RNA sequencing
ROS Reactive oxygen species
RPC Robust partial correlations
RRBS Reduced Representation Bisulfite Sequencing
rRNA Ribosomal RNA
SASP Senescence-associated secretory phenotype
SCC Spearman’s correlation coefficient
SD Standard deviation
Sexp Sex predicted for a sample using DNA methylation data
SNP Single-nucleotide polymorphism
SQN Stratified quantile normalisation
sur Signal of unique reads
TDG Thymine DNA glycosylase
TKO Triple knockout
TN True negatives
TOR Target of rapamycin
TP True positives
TSS Transcription start site
UTR Untranslated region
WGBS Whole Genome Bisulfite Sequencing
WTS Wavelet-transformed signals
Chapter 1
Introduction
‘[...] there are as many theories of
ag[e]ing as there are biogerontologists.’
L. Hayflick [2007a]
1.1 The biology of ageing
1.1.1 A brief introduction to ageing theory
The ageing process is one of the most mysterious, complex and fascinating biological
problems that remain to be solved. Ageing and immortality have probably fascinated
mankind ever since we acquired a conception of time and death [Renfrew et al., 2016].
Biological ageing (a.k.a. the ageing process) can be broadly defined as the time-dependent
functional decline which increases vulnerability to death in most organisms
[Lopez-Otin et al., 2013]. The revolution that took place in genetics and molecular biology
during the 20th century gave rise to more than 300 theories that attempt to explain the
mechanisms behind biological ageing [Medvedev, 1990]. Any valid modern theory of ageing
would need to explain at least two things [Medvedev, 1990]:
• The molecular basis for the increase in mortality rate (a.k.a. death rate) over time
in the population of a given species. Mortality rate can be broadly defined as the
number of deaths in a population per unit of time, scaled by the size of the population.
More formally, by quantifying the deaths of individuals in a population over time (and
assuming that there are no increases in the population number due to reproduction,
migration, etc.), the survival fraction at a given time t, S(t), is [Witten, 1986]:
\[ S(t) = \frac{N(t)}{N_0} \tag{1.1} \]
where N(t) is the number of individuals alive at a given time t and N0 is the initial
number of individuals in the population. It can be demonstrated that the mortality rate,
λ (t), can be expressed as [Witten, 1986]:
\[ \lambda(t) = -\frac{1}{S(t)} \cdot \frac{dS(t)}{dt} \tag{1.2} \]
• The evolutionary variations in lifespan between different species [Jones et al.,
2013], where lifespan is defined as the time elapsed between the birth and the death of an
organism. For example, according to the AnAge database [De Magalhães and Costa, 2009],
the maximum lifespan is 0.16 years (58.4 days, in captivity) for the roundworm
(Caenorhabditis elegans), 0.3 years (109.5 days, in captivity) for the fruit fly
(Drosophila melanogaster), 4 years (in captivity) for the house mouse (Mus musculus),
122.5 years for humans (Homo sapiens) and 211 years (in the wild) for the bowhead
whale (Balaena mysticetus). Furthermore, some species (such as certain turtles,
certain species of rockfish or the bristlecone pine) seem to have negligible senescence,
i.e. negligible changes in adult mortality rates over extended periods of time at
advanced adult ages [Finch, 2009].
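As an illustration of equations 1.1 and 1.2, the following minimal sketch estimates the survival fraction and the mortality rate from a series of counts of living individuals. The function names, the finite-difference approximation of dS(t)/dt and the example cohort (which has a constant mortality rate by construction) are assumptions made for this illustration and are not taken from [Witten, 1986].

```python
import math


def survival_fraction(alive_counts):
    """Eq. 1.1: S(t) = N(t) / N0, with N0 taken as the first count."""
    n0 = alive_counts[0]
    return [n / n0 for n in alive_counts]


def mortality_rate(times, alive_counts):
    """Eq. 1.2: lambda(t) = -(1 / S(t)) * dS(t)/dt.

    dS/dt is approximated with central finite differences (an assumption
    of this sketch), so one value is returned per interior time point and
    the two end points are skipped.
    """
    s = survival_fraction(alive_counts)
    rates = []
    for i in range(1, len(s) - 1):
        ds_dt = (s[i + 1] - s[i - 1]) / (times[i + 1] - times[i - 1])
        rates.append(-ds_dt / s[i])
    return rates


# Hypothetical, idealised cohort: 1000 individuals at t = 0 that decline
# exponentially, i.e. with a constant mortality rate of 0.1 per unit time.
times = [0.01 * i for i in range(1001)]
alive = [1000.0 * math.exp(-0.1 * t) for t in times]
rates = mortality_rate(times, alive)
```

For this idealised cohort the estimated rate stays close to 0.1 at every time point. In a population that ages, the same calculation would instead return a rate that increases with time, which is the pattern that a theory of ageing needs to explain.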
Nowadays, there are at least two main paradigms, complementary to each other, that
try to conceptualise the problem and that are topics of intense discussion among
biogerontologists:
• Ageing as a consequence of molecular infidelity. In this case, stochastic chemical
modifications of biomolecules, such as DNA or proteins, exceed the capacity of the
repair and turnover systems of the organism and accumulate over time, which increases
the entropy of the system. This leads to changes in molecular structure and, finally,
changes in function, which increase vulnerability to age-related diseases [Hayflick,
2007a,b]. From an evolutionary point of view, this fits into the disposable soma theory,
originally proposed by Thomas Kirkwood in 1977. This theory suggests that organisms
have evolved to optimise the amount of energy dedicated to repairing errors in somatic
cells in order to maximise reproductive success (at the expense of indefinite survival)
[Kirkwood and Rose, 1991; Kirkwood, 1977].
• Ageing as a consequence of hyperfunction. In this case, the primary cause of ageing
is an excessive activity of certain growth or development-related genes and pathways
in later life [Blagosklonny, 2006, 2010; de Magalhães, 2012; Gems, 2015]. In other
words, ageing originates from developmental programmes that have not been turned
off [Blagosklonny, 2006]. This idea is rooted in the concept of antagonistic pleiotropy,
an important pillar of the evolutionary theory of ageing originally proposed by George
C. Williams in 1957 [Williams, 1957]. It implies that certain genes have opposite
effects on fitness at different ages, which is a consequence of the decrease in selection
forces after reproductive age. A strong candidate is the TOR (target of rapamycin)
pathway (see section 1.1.2), which promotes development in early life but also the
advancement of several late-life pathologies [Blagosklonny, 2010].
It has become clear that no single molecular mechanism will be able to explain ageing
across all kingdoms of life. Different species have different life histories that are subjected to
evolutionary trade-offs (e.g. regarding reproduction strategies, developmental schedules, etc.)
and that can affect the rate of ageing [Jones et al., 2013; Ricklefs, 2010]. Nevertheless, it is
possible to integrate all the ideas presented so far into a theoretical framework that can help
to unify definitions across studies and set the foundations for mechanistic advances in
the biology of ageing (Fig. 1.1, inspired by ideas from [Freund, 2019; Gems, 2015; Hayflick,
2007a; Peto and Doll, 1997; Stroustrup et al., 2016]). Under this theoretical framework:
• The ageing process is composed of different molecular mechanisms (subprocesses)
that operate at different stages of life and contribute, in variable proportions, to the
appearance of different age-related diseases i.e. the risk of developing an age-related
disease is the ‘integral of its ageing subprocesses operating over time’. Furthermore,
the development of different diseases affects the mortality rate and, thus, the probability
of dying. The different ageing subprocesses can also be understood as the sources of
ageing-associated molecular damage [Lopez-Otin et al., 2013].
• If the ageing subprocesses can be altered through different genetic, lifestyle or pharma-
cological interventions, it is possible to reduce the likelihood of several age-related
diseases at the same time. This makes ageing research incredibly relevant to the
biomedical sciences, since it changes the current paradigm away from developing
interventions for a specific already-existing disease towards the prevention of several
diseases simultaneously.
• Differences in the average lifespan between different species should be explained by
different combinations of ageing subprocesses and their rates.
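One possible way to make the phrase ‘integral of its ageing subprocesses operating over time’ concrete is sketched below. This is only an illustrative reading of the framework, not a model proposed in this thesis: the linear weighting, the trapezoidal integration, the function names and all the numerical values (rates, weights, time grid) are hypothetical.

```python
def trapezoid(times, values):
    """Integrate values over times with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return total


def disease_burden(times, mechanism_rates, weights):
    """Weighted sum of the time-integrated rate of each ageing subprocess.

    mechanism_rates: dict mapping a mechanism name to its rate at each time.
    weights: dict mapping (mechanism, disease) to a hypothetical weight.
    """
    cumulative = {m: trapezoid(times, r) for m, r in mechanism_rates.items()}
    burden = {}
    for (mechanism, disease), weight in weights.items():
        burden[disease] = burden.get(disease, 0.0) + weight * cumulative[mechanism]
    return burden


# Hypothetical life history: mechanism A changes its rate after
# reproductive age (here t = 5), mechanism B has a constant rate.
times = [float(t) for t in range(11)]
mechanism_rates = {
    "A": [1.0 if t < 5 else 2.0 for t in times],
    "B": [0.5 for _ in times],
}
# Hypothetical weights linking mechanisms to diseases.
weights = {
    ("A", "D1"): 0.2,
    ("A", "D2"): 0.1,
    ("B", "D1"): 0.4,
    ("B", "D4"): 0.3,
}
burden = disease_burden(times, mechanism_rates, weights)
```

Under this reading, an intervention that lowers the rate of a single subprocess reduces the accumulated burden of every disease to which that subprocess contributes, which is the argument made in the second point above.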
[Figure 1.1: diagram. Panel a: ageing mechanisms (A, B, …, ?), diseases (D1 to D4) and the mortality rate, linked by weights (WA1, WA2, WA3, WB1, WB4; WD1 to WD4). Panels b and c: rate of mechanisms A and B over time for Organism 1 (diagnosis of D2 and D3) and Organism 2 (diagnosis of D1 and D4), with reproductive age and death marked.]
Fig. 1.1 Theoretical framework to conceptualise the ageing process. a. The ageing process is composed
of different molecular mechanisms (subprocesses) that operate at different stages of life and contribute, in
variable proportions (specified by the weights), to the appearance of different age-related diseases. Furthermore,
the development of different diseases affects the mortality rate and, thus, the probability of dying. b. and c.
Examples of the life histories of two organisms. In these examples, two ageing mechanisms operate: A (which
changes its rate after reproductive age e.g. activated growth-related pathways) and B (with a constant rate over
time e.g. some type of (epi)mutational process). Differences in the mechanisms’ profiles lead to differences in
the age-related diseases that manifest over the lifespan of the organisms, even though the molecular mechanisms
are the same. This affects the mortality rate and, ultimately, the time-to-death. This figure is inspired by ideas
from [Freund, 2019; Gems, 2015; Hayflick, 2007a; Peto and Doll, 1997; Stroustrup et al., 2016].
Consequently, systems biology approaches become fundamental to understand the ageing
process [Freund, 2019]. In the next sections, I will provide an overview of the ageing
mechanisms that may operate in different species, with a special focus on mammalian
species.
1.1.2 The genetic basis of ageing
Given the large variability in lifespan between species [Jones et al., 2013], it is nowadays
clear that the ageing process must have a genetic basis. However, for a long time, the ageing
process was thought to be a ‘haphazard process driven solely by entropy’ [Kenyon, 2005].
Furthermore, in 1935 Clive Maine McCay had shown that caloric restriction (a reduction
in calorie intake without malnutrition) could extend mean and maximal lifespan in rats
[McCay et al., 1935; McDonald and Ramsey, 2010], which probably shifted the focus towards
environmental or external causes as the main driving forces of the ageing process. Since
then, dietary restriction (which includes different types of dietary interventions that reduce
food intake without malnutrition) has been established as the most successful non-genetic
intervention to slow down the ageing process across species [Fontana and Partridge, 2015].
The establishment of the nematode Caenorhabditis elegans as a model organism in
the 1970s triggered its adoption in the ageing field [Klass and Hirsh, 1976], since it allowed
well-controlled experiments in a much shorter period of time than rodents [Johnson, 2013].
This led to the discovery of the first mutants that dramatically extended lifespan, which
mapped to genes in the insulin/IGF-1 signalling pathway [Kenyon et al., 1993; Morris et al.,
1996]. Since then, many genes have been found to significantly affect the lifespan of other
model organisms as well, such as in budding yeast (Saccharomyces cerevisiae), in fruit flies
(Drosophila melanogaster) and in mice (Mus musculus) [Kenyon, 2005, 2010; Singh et al.,
2019].
Interestingly, the effects of many of these genetic mutations and their pathways are shared
by distantly-related species. This suggests that at least part of the molecular mechanisms that
drive the ageing process could be evolutionarily conserved. Major signalling pathways that
have been associated with ageing include the following (Fig. 1.2) [Greer and Brunet, 2008;
Kenyon, 2005, 2010; Singh et al., 2019]:
• Insulin/IGF-1 pathway. This underscores the central role of the endocrine system on
the biology of ageing. Mutations that lower the level of daf-2, encoding an insulin/IGF-
1 receptor, were originally found to double the lifespan of C. elegans [Guarente and
6 Introduction
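[Figure 1.2: signalling pathway diagram. Node labels, given as mammalian name with the C. elegans
ortholog in parentheses: IGF1R / INSR (DAF-2), mTOR (TOR), AMPK (AAK), FOXO1/3/4 (DAF-16),
HSF1 (HSF-1), NRF (SKN-1), S6K1 (RSKS-1), 4EBP-1 (-). Inputs: dietary restriction, nutrients.
Outputs: expression of longevity-promoting genes (e.g. stress resistance), translation, autophagy,
ageing rate, lifespan.]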
Fig. 1.2 Main signalling pathways that affect the ageing process. These pathways sense nutrient and stress
inputs (such as a dietary restriction regime) to ultimately impact the rate of ageing. The grey lines represent
inhibition (negative regulation) while the yellow arrows represent activation (positive regulation). As such,
dietary restriction inhibits the insulin/IGF-1 pathway (in red), inhibits the TOR pathway (in green) and activates
AMPK signalling (in purple), ultimately extending lifespan. For simplification, I have only included the main
proteins that transduce the signal (e.g. there are more intermediate kinases in the insulin/IGF-1 pathway).
Protein names are provided for both the mammalian (top) and the C. elegans orthologs if available (bottom, in
parentheses).
Kenyon, 2000; Kenyon et al., 1993]. Activation of the insulin/IGF-1 pathway leads
to the phosphorylation of a transcription factor of the FOXO family, encoded by daf-16
in C. elegans, which prevents it from reaching the nucleus [Lin et al., 2001]. FOXO
transcription factors, of which there are several members in mammals, activate the
expression of longevity-promoting genes involved in processes such as autophagy
(which clears protein aggregates and damaged organelles in the cell) [Singh et al.,
2019], resistance to oxidative stress or stem cell maintenance [Martins et al., 2016].
This partially explains why the inhibition of the insulin/IGF-1 pathway can increase
organismal lifespan. However, other downstream targets that regulate gene expression
have also been identified, such as hsf-1 (a transcription factor that regulates heat-shock
response) [Hsu et al., 2003] or skn-1 (a transcription factor that coordinates a response
to oxidative stress) [Tullet et al., 2008] in C. elegans.
• TOR pathway. TOR (target of rapamycin) is a kinase that acts as a major amino-acid
and nutrient sensor by stimulating growth (including protein translation) and blocking
autophagy [Kenyon, 2010]. The effects of TOR are partly mediated by activating the
ribosomal protein S6 kinase (which promotes protein translation) and by inhibiting
4EBP (a translation inhibitor) [Kenyon, 2010; Um et al., 2006]. Reductions in TOR
activity (via genetic or pharmacological mechanisms) increase lifespan across many
species [Kenyon, 2010]. Importantly, rapamycin, a drug that inhibits TOR, can increase
the mean lifespan of mice when fed late in life, which showed for the first time that
pharmacological interventions targeting mammalian ageing are possible [Harrison
et al., 2009]. Interestingly, the increase in lifespan differed in males (9%) and females
(13%) [Harrison et al., 2009], highlighting the sex-specific effects of some ageing
mechanisms.
• AMPK pathway. The AMP-activated kinase (AMPK) controls the balance between
catabolic and anabolic processes depending on the cellular levels of AMP/ATP (i.e.
when ATP levels decrease, AMPK is activated to promote catabolic pathways) [Kenyon,
2010; Mihaylova and Shaw, 2011]. Furthermore, AMPK activation promotes autophagy,
partially by inhibiting TOR [Mihaylova and Shaw, 2011]. The anti-diabetic
drug metformin, which activates AMPK among other targets, has been shown to extend
lifespan in mice [Anisimov et al., 2008; Martin-Montalvo et al., 2013] and has been
included as the first drug to target the human ageing process in a clinical trial [Barzilai
et al., 2016].
• Sirtuins. Sirtuins are a family of nicotinamide adenine dinucleotide (NAD+)-dependent
deacetylases, i.e. they generally catalyse the removal of an acetyl group from lysine
residues using NAD+ as a cofactor [Bonkowski and Sinclair, 2016]. Sirtuins have
been shown to play complex roles in the biology of ageing and age-related diseases, in
general by cross-talking with other nutrient-sensing pathways and promoting longevity
[Bonkowski and Sinclair, 2016; Kenyon, 2010]. Several authors have shown that
increasing NAD+ levels enhances the activity of sirtuins, which could constitute an
additional anti-ageing pharmacological avenue in mammals. Additionally, intensive research
is being carried out to identify other molecules that activate sirtuins [Bonkowski
and Sinclair, 2016].
• Other pathways. Mitochondrial respiration (and its production of reactive oxygen
species or ROS), genome surveillance pathways (such as those involved in DNA repair
or telomere maintenance), signals from the reproductive system or Wnt signalling have
also been implicated in different ways in the ageing process [Greer and Brunet, 2008;
Kenyon, 2010; Lezzerini and Budovskaya, 2014].
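The regulatory logic described above for the insulin/IGF-1, TOR and AMPK pathways can be sketched as a small signed graph, in which the net effect along a path is the product of the signs of its edges. The following Python snippet is only an illustrative sketch: the node names and the `path_sign` helper are hypothetical labels chosen here, and only the interactions stated in the text and in the caption of Fig. 1.2 are encoded (the real pathways contain many more intermediates).

```python
# Signed edges taken from the text and the Fig. 1.2 caption:
# +1 = activation, -1 = inhibition. Node names are illustrative labels only.
EDGES = {
    ("dietary_restriction", "insulin_IGF1"): -1,
    ("dietary_restriction", "TOR"): -1,
    ("dietary_restriction", "AMPK"): +1,
    ("insulin_IGF1", "FOXO"): -1,  # phosphorylation keeps FOXO out of the nucleus
    ("FOXO", "longevity_genes"): +1,
    ("TOR", "S6K"): +1,
    ("TOR", "4EBP"): -1,
    ("S6K", "translation"): +1,
    ("4EBP", "translation"): -1,   # 4EBP is a translation inhibitor
    ("TOR", "autophagy"): -1,
    ("AMPK", "TOR"): -1,
    ("AMPK", "autophagy"): +1,
}


def path_sign(path):
    """Return the net effect along a path as the product of its edge signs."""
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= EDGES[(source, target)]
    return sign


if __name__ == "__main__":
    examples = [
        ["dietary_restriction", "insulin_IGF1", "FOXO", "longevity_genes"],
        ["dietary_restriction", "TOR", "autophagy"],
        ["dietary_restriction", "TOR", "S6K", "translation"],
    ]
    for path in examples:
        effect = "activates" if path_sign(path) > 0 else "inhibits"
        print(f"{path[0]} {effect} {path[-1]} via {' -> '.join(path[1:-1])}")
```

In this toy representation two consecutive inhibitions yield a net activation, which is why dietary restriction ends up promoting both the expression of longevity-promoting genes and autophagy while reducing translation.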
These pathways seem to have a dual role depending on the environmental context
that the organism is facing, behaving as nutrient and stress sensors. Under abundant
nutrient availability and low stress (oxidative, temperature), they tend to promote growth
and reproduction. In contrast, under harsh conditions (such as those posed by dietary
restriction), they favour cell protection and maintenance [Kenyon, 2005, 2010]. It is worth
mentioning that the responses of the different pathways to dietary restriction depend strongly
on the characteristics of the diet and its timing [Kenyon, 2010]. This model also relates to the
disposable soma theory, where more resources are allocated either to reproduction or somatic
maintenance depending on the context [Kirkwood and Rose, 1991; Kirkwood, 1977]. This
is further mechanistically supported by experiments showing that decreased insulin/IGF-1
signalling (e.g. via daf-2 mutation) leads to the acquisition of germline characteristics (e.g.
higher genomic stability) in C. elegans somatic cells [Curran et al., 2009]. Even though
this model is a clear oversimplification, it becomes useful when thinking about the way in
which the ageing process might have evolved and how the same biological pathways can be
repurposed to activate genetic programs with completely different goals.
These pathways involve many more complexities, which would require
an entire thesis of their own. For example, the insulin/IGF-1 signalling pathway can work
in a cell non-autonomous manner (i.e. the activity of the pathway in one tissue can affect
lifespan by influencing cells in a different tissue), which could help to coordinate ageing
rates in the organism, and the effects are often tissue-specific [Kenyon, 2005, 2010].
Additionally, the pathways can have different effects depending on the life stage of the animal
(e.g. development, adulthood, etc.) [Dillin et al., 2002]. Furthermore, cross-talk between the
pathways has previously been reported [Bonkowski and Sinclair, 2016; Greer et al., 2007].
Therefore, the inner workings of these signalling pathways are still an area of intense research.
The discovery of signalling pathways that can dramatically extend the lifespan of model
organisms has demonstrated that the ageing process has a genetic basis and that it is possible
to alter its rate. More importantly, the appearance of age-related disease seems to be
delayed in many of these long-lived organisms [Arantes-Oliveira et al., 2003; Kenyon, 2010],
suggesting that these interventions indeed reduce the rate of some of the operating ageing
mechanisms (Fig. 1.1).
1.1.3 Hallmarks of mammalian ageing
Most studies on the biology of mammalian ageing have been conducted in mice. Many
genetic mutations in conserved pathways (mainly nutrient-sensing pathways) have been
shown to significantly extend the lifespan of mice. Among them, those that affect growth
hormone signalling (which in mammals in turn controls the secretion of IGF-1 by the liver and
therefore the insulin/IGF-1 signalling pathway) produce the longest lifespan improvements
(in the order of 40-60%) [Singh et al., 2019]. Even though this is a remarkable result, it is
far off the lifespan extensions achieved with ‘simpler’ model organisms such as C. elegans
(where extensions of almost 1000% have been achieved with a mutation in a single gene of the
insulin/IGF-1 pathway; the equivalent of a human living up to ≈ 1200 years!) [Ayyadevara
et al., 2008]. This highlights a trend where translating lifespan interventions discovered in
worms and flies yields generally less spectacular results in mice and potentially in humans.
Evolution has been experimenting with lifespan extension for a long time. Consequently,
some species of mammals, such as the naked mole rat (Heterocephalus glaber) or some
species of bats (Chiroptera), are exceptionally long-lived for their body size. Recent reports
point towards the possibility that these species do not increase their mortality rate with age
(i.e. they may have negligible senescence) [Fleischer et al., 2017; Ruby et al., 2018a], which
makes them incredibly interesting systems to study the biology of ageing in mammals.
In 2013, López-Otín et al. reviewed the main common denominators of the ageing
process across organisms [Lopez-Otin et al., 2013]. They defined several hallmarks of
ageing, which can be understood as the measurable consequences of the ageing mechanisms
that I proposed in Fig. 1.1. I will briefly discuss some of them, with a special focus on those
that directly affect the genome during mammalian ageing [Lopez-Otin et al., 2013; Singh
et al., 2019]:
• Genomic instability. Somatic DNA mutations (single nucleotide variants, copy number
changes, structural rearrangements, etc.) accumulate over time in mammalian
cells (both in the nuclear genome and the mitochondrial genome) [Larsson, 2010;
Martincorena et al., 2018]. Different mutational processes (good candidates for ageing
mechanisms) create specific patterns of mutations (a.k.a. mutational signatures) in the
genome, which have been widely studied in the context of human cancer [Alexandrov
and Stratton, 2014]. It is possible to assign specific endogenous (e.g. DNA replication
errors) and exogenous factors (e.g. smoke exposure) that contribute to the different
processes. In the context of ageing, deamination of 5-methylcytosine (5mC) in a
CpG context leads to C>T (cytosine to thymine) mutations, which accumulate in a
clock-like manner with a rate that correlates with the proliferative activity of the tissue
[Alexandrov et al., 2015]. Furthermore, nuclear architecture and the 3-dimensional
organisation of the genome both seem to change with age, which can distort nuclear
homeostasis. Interestingly, several human diseases that are considered to display
premature ageing, such as Werner syndrome or Hutchinson–Gilford progeria, have
mutations in proteins that lead to genomic instability [Oberdoerffer and Sinclair, 2007].
Finally, it is possible that an increase in the mobilisation of transposable elements with
age further contributes to destabilising the genome [Orr, 2016].
• Telomere attrition. The repetitive DNA sequences at the linear ends of mammalian
chromosomes are capped with the protein complex shelterin to form structures known
as telomeres. Due to the nature of the standard DNA replication machinery, the
chromosomal DNA ends of somatic cells are eroded after each cell division (net
loss of 100-200 bp of telomeric sequence per cell division). After a certain number
of doublings (and therefore telomere shortening), cells stop dividing and undergo
cellular senescence (see below) or cell death (apoptosis) [O’Sullivan and Karlseder,
2010]. For many years, this replicative limit, known as the Hayflick limit, was
understood as the manifestation of the ageing process at the cellular level [Hayflick,
1998; Hayflick and Moorhead, 1961]. Telomere shortening has indeed been shown
to occur with age in most human tissues [Blasco, 2007]. Importantly, stem cells and
germ cells express telomerase, an enzymatic complex that synthesises new telomeric
repeats, avoiding telomere shortening. This way organisms can regenerate their tissues
if needed, which makes it unlikely that telomere attrition is the only mechanism driving
ageing. In addition to telomere length shortening, other mechanisms may contribute to
replicative senescence in mammals [O’Sullivan and Karlseder, 2010]. Nevertheless,
telomere biology plays a critical role in many fundamental processes, such as DNA
repair and genomic stability, and non-telomeric functions for telomerase have also been
suggested (such as global chromatin regulation and transcription of developmentally-
regulated genes) [O’Sullivan and Karlseder, 2010]. As such, telomeres have been
implicated in age-related diseases, such as cancer and cardiovascular disease [Blasco,
2007; O’Sullivan and Karlseder, 2010]. Interestingly, ectopic expression of the catalytic
subunit of telomerase (TERT) extends the lifespan of mice that are cancer-resistant
[Tomás-Loba et al., 2008].
• Cellular senescence. Cellular senescence is a cellular state characterised by a stable
cell cycle arrest. There are different types of senescence induced by different stress
stimuli, including telomere shortening (replicative senescence, previously mentioned),
sustained DNA damage (e.g. via irradiation) or derepression of the INK4/ARF locus
(which encodes three tumour suppressor genes) [Herranz and Gil, 2018; Lopez-Otin
et al., 2013]. Under normal circumstances, cellular senescence carries out physiological
functions such as preventing pre-malignant cells from dividing, participating in wound
healing and tissue remodelling. Furthermore, senescent cells also secrete a cocktail
of factors (termed the senescence-associated secretory phenotype, or SASP) with
pleiotropic effects (e.g. pro-inflammatory, matrix remodelling, inducing growth, etc.)
[Herranz and Gil, 2018]. Senescent cells accumulate in mammalian tissues during
ageing. If this happens in excess, the SASP can perturb the homeostasis of the tissue.
Consistent with this, the removal of senescent cells in mice increases lifespan and reduces the
appearance of age-related phenotypes [Baker et al., 2016, 2011; Xu et al., 2018]. Drugs
that selectively induce apoptosis in senescent cells (known as senolytics) [Kirkland
et al., 2017] are currently undergoing clinical trials in humans.
• Epigenetic alterations. This hallmark is reviewed in further detail in section 1.2.3,
since it is the main focus of this thesis.
• Other hallmarks of ageing. These include loss of proteostasis (appropriate quality
control of the proteome, which is mechanistically connected with autophagy pathways;
strongly implicated in neurodegenerative diseases); deregulated nutrient sensing (mediated
by the pathways discussed in section 1.1.2); mitochondrial dysfunction (including
a reduction in the efficacy of the respiratory chain with age); stem cell exhaustion
(which is thought to contribute to the decline of regenerative potential of the tissues
with ageing, such as in the case of the haematopoietic system) and altered intercellular
communication (including an increase in inflammation, known as inflammageing, or
alterations in the neuroendocrine system) [Lopez-Otin et al., 2013].
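The replicative limit described under telomere attrition can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a constant loss per division (the 100-200 bp range comes from the text) and no telomerase activity. The starting length of 10,000 bp and the critical length of 5,000 bp are hypothetical values chosen purely for illustration, and `divisions_until_arrest` is a helper name introduced here.

```python
def divisions_until_arrest(initial_bp, critical_bp, loss_per_division_bp):
    """Count how many divisions fit before the telomere would fall below critical_bp.

    Assumes a constant loss per division and no telomerase activity.
    """
    if loss_per_division_bp <= 0:
        raise ValueError("loss per division must be positive")
    length = initial_bp
    divisions = 0
    # Keep dividing only while the next division leaves the telomere at or above the threshold.
    while length - loss_per_division_bp >= critical_bp:
        length -= loss_per_division_bp
        divisions += 1
    return divisions


if __name__ == "__main__":
    # Hypothetical starting and critical lengths; the loss range is taken from the text.
    for loss in (100, 200):
        n = divisions_until_arrest(initial_bp=10_000, critical_bp=5_000, loss_per_division_bp=loss)
        print(f"loss of {loss} bp per division: arrest after {n} divisions")
```

Under these assumed lengths, doubling the loss per division halves the number of divisions available before arrest, which shows how sensitive the replicative limit is to the erosion rate.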
Importantly, complex interactions and interdependencies emerge between the different
hallmarks of ageing. For example, in senescent cells in the mouse, a type of transposable
element (LINE-1) becomes derepressed and activates a type-I interferon response, which in
turn causes inflammageing [De Cecco et al., 2019]. Furthermore, understanding the role of
the environment in modulating the ageing process and the different hallmarks in mammals is
becoming increasingly important.
Assuming that molecular damage is the main cause of biological ageing, the mechanisms
that lead to genomic instability, telomere attrition, epigenetic alterations and loss of proteostasis
are very likely the main drivers of the ageing process, with the rest of the hallmarks being
a consequence of them [Lopez-Otin et al., 2013]. Nevertheless, interventions targeting some
of the more ‘integrative hallmarks’ (such as removing senescent cells, optimising dietary
restriction or stem cell therapies) will probably arrive earlier in the clinic.
1.1.4 Studying the ageing process in humans
Average human lifespan has nearly doubled in most developed countries during the last 200
years. This has been the consequence of external factors, such as improvements in quality
of water, nutrition, hygiene, housing and lifestyle, immunisation against infectious disease,
antibiotics and medical care [Partridge et al., 2018]. One of the most debated questions in
the human ageing field is whether there is a limit to maximal human lifespan [Dong et al.,
2016]. Since Benjamin Gompertz’s pioneering work in 1825, it has been known that the mortality
rate in humans increases exponentially with age [Gompertz, 1825]. However, a recent study
on Italian centenarians suggests that mortality rate, which increases exponentially up to
about age 80, decelerates thereafter and reaches or closely approaches a plateau after age
105 [Barbi et al., 2018]. This implies that human lifespan may continue to increase in the
next decades and that we have probably not reached our evolutionary lifespan limit as a
species yet [Barbi et al., 2018; Kontis et al., 2017].
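The exponential increase in mortality rate with age, and the late-life plateau discussed above, can be sketched numerically. In the snippet below the Gompertz hazard is written as a * exp(b * age). The parameter values a = 1e-4 and b = 0.085 per year are illustrative assumptions rather than fitted estimates, and `capped_hazard` is a deliberately crude stand-in for a plateau (a hard cap at age 105, ignoring the gradual deceleration reported after age 80).

```python
import math


def gompertz_hazard(age, a=1e-4, b=0.085):
    """Gompertz mortality rate a * exp(b * age); a and b are illustrative values."""
    return a * math.exp(b * age)


def doubling_time(b=0.085):
    """Years needed for an exponentially increasing mortality rate to double."""
    return math.log(2) / b


def capped_hazard(age, a=1e-4, b=0.085, plateau_age=105):
    """Gompertz hazard held constant beyond plateau_age (a crude plateau model)."""
    return gompertz_hazard(min(age, plateau_age), a, b)


if __name__ == "__main__":
    print(f"mortality rate doubling time: {doubling_time():.1f} years")
    for age in (60, 80, 105, 110):
        print(age, f"{gompertz_hazard(age):.4f}", f"{capped_hazard(age):.4f}")
```

Comparing the two columns for ages above 105 shows the difference between an unbounded exponential increase and a plateau in the mortality rate.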
Thus, in order to avoid a massive socioeconomic burden on our societies [Fine, 2014],
biomedical research should focus on extending human healthspan (i.e. the amount of time
that we live free of disease) and not only lifespan. This goal, known as the ‘compression of
morbidity’ [Partridge et al., 2018], is theoretically possible if we target the core mechanisms
that drive the ageing process (Fig. 1.1), which is assumed to be the biggest contributor
to the development of most age-related diseases, such as cancer, diabetes, cardiovascular
disorders and neurodegenerative diseases [Lopez-Otin et al., 2013]. Indeed, genetic and
pharmacological interventions that increase lifespan in model organisms also seem to extend
healthspan [Newell Stamper et al., 2018] and the compression of morbidity is a characteristic
of human centenarians [Feldman et al., 2012].
Most of our understanding of human ageing comes from studies carried out in population
cohorts. Furthermore, in recent years different datasets of high-throughput molecular
data (broadly known as ‘omics’) have been generated for many of these cohorts, including
genetic data, epigenetic data, metabolomic data, imaging data or even the microbiome. These
data (sometimes referred to as ‘deep phenotypes’) complement the more traditional phenotypic
measurements and health records and make it possible, for the first time, to characterise the
human ageing process with unprecedented resolution and scale. An example of such a cohort is the
UK Biobank, which has enrolled > 500,000 participants [Bahcall, 2018]. Importantly, there is
a trend in many of these cohorts to collect more longitudinal data (i.e. data over time for the
same individual), which will likely increase the power to discover causal ageing mechanisms
(as opposed to cross-sectional data, where data from different individuals at different ages are
used) [Rahmadi et al., 2017]. As Nobel laureate Sydney Brenner (known for establishing C.
elegans as a model organism) remarked 10 years ago: "We don’t have to search for a model
organism anymore. Because we are the model organisms" [FitzGerald et al., 2018] (as a
disclosure, I still believe we need model organisms to gain definitive mechanistic insights,
and probably so did the late Prof. Brenner).
The ageing process is an extremely polygenic trait, probably one of the most complex
phenotypes to be studied (as one would expect if it is composed of many different molecular
processes). Candidate gene studies (with biased hypotheses) and genome-wide association
studies (GWAS, unbiased) have found genetic variants that may affect the rate of ageing in
humans. Many of them are associated with the function of genes that are part of nutrient-sensing
pathways (such as FOXO3 or IGF1R), that increase the risk of Alzheimer’s disease (such
as APOE), that are involved in cellular senescence (such as CDKN2A) or that are related to
the immune system and inflammation (such as HLA-DQA1, HLA-DRB1 or IL6) [Partridge
et al., 2018; Singh et al., 2019]. Additionally, biological sex has a major impact on the ageing
process and the incidence of age-related diseases. In the case of humans, females consistently
live longer than males (females make up around 90% of the supercentenarians, i.e. individuals
that live 110 years or more). However, they also seem to suffer greater morbidity in later life,
which is known as the ‘mortality-morbidity paradox’ [Austad and Fischer, 2016].
It is clear that human longevity has a genetic component. However, the latest estimates
of heritability are quite low (ranging between 10-15%) [Kaplanis et al., 2018; Ruby et al.,
2018b]. Furthermore, GWAS have yielded relatively few genetic variants compared with
other complex phenotypes [Singh et al., 2019]. This could be due to the sample sizes required
or methodological limitations (such as the way that the ageing phenotype is defined).
Nevertheless, it is more likely that the environment (and its interaction with the genetic
background) accounts for most of the phenotypic variation in human ageing populations
[Partridge et al., 2018; Singh et al., 2019]. As such, there is evidence that diet (not
only the content but also the timing) and exercise can act through nutrient-sensing pathways
to regulate human healthspan and potentially lifespan [Most et al., 2017; Partridge et al.,
2018; Redman et al., 2018; Richter and Ruderman, 2009; Singh et al., 2019; Wei et al., 2017].
Interestingly, social relationships are hypothesised to have a causal role in mortality rate,
with lower levels of social integration associated with higher levels of inflammation, blood
pressure or waist circumference across the human lifespan [Yang et al., 2016b]. A fascinating
example of the impact of environmental and lifestyle factors on human lifespan is provided by the
so-called ‘blue zones’. These are geographical areas (such as Ogliastra in Sardinia, Okinawa
in Japan, the Nicoya peninsula in Costa Rica and the island of Ikaria in Greece) that have
unusually high proportions of long-lived individuals. However, the genetics of these populations are
similar to their neighbours and therefore differences in the rate of ageing must be attributed
to environmental and lifestyle factors [Partridge et al., 2018; Poulain et al., 2013]. As such,
targeted lifestyle interventions will likely complement pharmacological interventions (some
of them mentioned in section 1.1.2) in order to slow down the human ageing process. Finally,
epigenetic mechanisms constitute an interesting layer of biological information that could
mediate the interactions between genetics and environment to affect the ageing process and
they will be the topic of discussion in the next sections.
1.2 Epigenetics of ageing
1.2.1 A brief introduction to epigenetics
The coining of the term epigenetics is normally attributed to Conrad H. Waddington, when,
in 1942, he defined it as the studies that deal with the causal mechanisms behind embryonic
development (i.e. the processes by which the genotype of a single cell brings about the
phenotype of an organism) [Waddington, 1942]. This led to the unification of two apparently
distinct fields (genetics and embryology), today known as the field of developmental genetics
[Gilbert, 2011]. Furthermore, Waddington is also known for introducing the concept of
the epigenetic landscape, which depicted developmental trajectories and the theory behind
them in an incredibly compelling way [Waddington, 1957]. Later work by Nanney, Riggs,
Holliday and others evolved the definition of epigenetics towards the concept of cellular
memory, which was materialised at the molecular level through DNA methylation (since it
could affect transcription and be inherited after each cell division) [Lappalainen and Greally,
2017]. The next decades were characterised by the discovery of a great variety of ‘molecular
routes’ to affect gene expression (such as chromatin modifications or non-coding RNAs),
which in humans culminated in consortia such as ENCODE [Consortium et al., 2012],
Roadmap Epigenomics [Consortium et al., 2015], BLUEPRINT or IHEC [Stunnenberg et al.,
2016] and created a broader concept of epigenetics [Greally, 2018; Lappalainen and Greally,
2017].
Nowadays there is a debate in the scientific community about the appropriate definition of
epigenetics [Bird, 2007; Greally, 2018]. For the purpose of this thesis I will define epigenetics
as the study of molecular variation that is beyond changes in the DNA sequence, that is
inherited after cell division and that regulates gene expression (in line with the definition
by Wu and Morris) [Wu and Morris, 2001]. However, it is important to mention that, in the
context of the epigenetic clock, we are still not sure whether these molecular changes have
direct functional consequences (e.g. by affecting RNA expression) and/or whether they help
to define a new metastable cellular state in the cells in which they occur (see section 1.3.3).
There are different types of molecular mechanisms that are normally considered ‘epigenetic’.
These include:
• DNA methylation. This will be discussed in detail in section 1.2.2.
• Histone modifications. The basic unit of chromatin is the nucleosome. It is composed
of ∼147 bp of DNA wrapped around an octamer of histones (generally two copies of
each one of the four core histones: H2A, H2B, H3 and H4; although histone variants
such as H3.3 or H2A.Z have also been characterised). In order to fit ∼2 metres of DNA
into the nucleus of a human cell, chromatin needs to be further compacted with the
help of scaffold proteins (with the furthest level of compaction achieved in the mitotic
chromosome) [Ou et al., 2017]. Histones possess N-terminal regions (a.k.a. histone
tails) that project towards the outside of the nucleosome and are positively charged.
By default, this helps to compact the chromatin by interacting with the negative
charges of the DNA. However, many different types of post-translational modifications
(acetylation, methylation, phosphorylation, ubiquitinylation, sumoylation, etc.) in
the residues of the histone tails have been identified across the eukaryotic tree of life
(although modifications have also been found in the globular domains) [Lawrence et al.,
2016]. These histone modifications can affect the chemical properties of chromatin,
its degree of compaction and ultimately contribute to the regulation of transcription
(e.g. through the recruitment of downstream effector proteins). The sequence and
combinations of these modifications that modulate chromatin activity were named the
histone code [Strahl and Allis, 2000] and its complexity is slowly being characterised
thanks to technologies such as ChIP-seq [Consortium et al., 2015, 2012]. Finally, it is
worth mentioning the nomenclature that is used to refer to histone modifications. For
instance, for the histone modification ‘H3K36me3’, the information about the histone
(‘H3’), the residue where the modification happens (‘K36’ is lysine 36) and the type
and number of modification(s) (‘me3’ refers to three methyl groups) is provided.
• Other ‘epigenetic’ players. Non-coding RNAs (such as long non-coding RNAs,
PIWI-associated RNAs or short-interfering RNAs) have been shown to affect the epigenetic
landscape through different mechanisms. Additionally, many RNA modifications
(known as the epitranscriptome) are currently being elucidated. However, whether
they are considered truly ‘epigenetic’ is debatable [Mattick et al., 2009; Morris and
Mattick, 2014]. Furthermore, prions (misfolded proteins that accumulate in cells and
act as templates to further misfold more protein molecules) have been proposed as an
epigenetic mechanism that is not based on heritable changes in nucleic acid [Halfmann
and Lindquist, 2010].
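The nomenclature for histone modifications described above (histone, residue, position, then type and number of modifications) is regular enough to be parsed automatically. The following sketch is a minimal, assumption-laden parser: the `HistoneMark` type and the `parse_mark` function are names introduced here for illustration, and the pattern only covers the four core histones and the two modification types that appear as examples in this chapter (methylation, `me1` to `me3`, and acetylation, `ac`). Histone variants and other modifications are deliberately left out.

```python
import re
from typing import NamedTuple


class HistoneMark(NamedTuple):
    histone: str       # e.g. "H3"
    residue: str       # one-letter amino acid code, e.g. "K" for lysine
    position: int      # position of the residue in the histone, e.g. 36
    modification: str  # "me" (methylation) or "ac" (acetylation)
    count: int         # number of groups added, e.g. 3 for "me3"


# Core histone, residue letter, residue position, then modification with optional count.
_MARK_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)(me[1-3]|ac)$")


def parse_mark(name):
    """Split a name such as 'H3K36me3' into its components."""
    match = _MARK_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unsupported histone mark name: {name!r}")
    histone, residue, position, modification = match.groups()
    kind = modification[:2]
    count = int(modification[2:]) if len(modification) > 2 else 1
    return HistoneMark(histone, residue, int(position), kind, count)


if __name__ == "__main__":
    for name in ("H3K36me3", "H4K16ac", "H3K27me3"):
        print(name, "->", parse_mark(name))
```

Returning a named tuple keeps the components accessible by name (for instance `parse_mark("H3K36me3").position`), which is convenient when grouping marks by histone or by residue.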
The different epigenetic marks present complex patterns of correlation and cross-talk,
which are mechanistically linked to the way that their addition and removal are regulated. This
helps to define chromatin states (i.e. combinations of different epigenetic marks) that affect
gene regulation in different ways. Historically, chromatin has been broadly classified into two
categories [Allis and Jenuwein, 2016; Reinberg and Vales, 2018; Trojer and Reinberg, 2007]:
• Euchromatin. It is transcriptionally active and more accessible to the transcription
machinery. It is generally characterised by histone modifications such as
H4K16ac, H3K4me3 or H3K36me3.
• Heterochromatin. It is normally subdivided into constitutive (highly condensed and
transcriptionally repressed; mostly found in pericentromeric regions, telomeres and
other regions that contain repetitive elements; it is generally marked by H3K9me3 and
high levels of 5mC) and facultative (normally transcriptionally silent but it has the
potential to adopt open conformations depending on the temporal and spatial context;
it is generally marked by H3K27me3).
Consortia that have mapped many epigenetic marks (collectively known as the epigenome)
in humans [Consortium et al., 2015, 2012] and advances in chromatin segmentation algorithms
[Ernst and Kellis, 2010] have led to a more fine-grained definition of chromatin states.
This has helped to identify functional elements in the genome in a high-throughput way,
such as active transcription start sites (TSS, enriched in H3K4me3), enhancers (enriched
in H3K4me1) or bivalent chromatin (enriched in H3K4me3 and H3K27me3) [Consortium
et al., 2015, 2012].
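The associations between histone marks and chromatin states mentioned in this section can be summarised as a simple rule table. The sketch below is a hand-written toy classifier and not the chromatin segmentation approach cited above, which learns states from genome-wide data. The function name `label_region` and the order in which the rules are checked are assumptions made here for illustration.

```python
def label_region(marks):
    """Assign a coarse chromatin label to a region from the set of marks observed in it.

    The rules only encode the associations stated in the text and are checked in a
    fixed order, with mark combinations tested before single marks.
    """
    marks = set(marks)
    if {"H3K4me3", "H3K27me3"} <= marks:
        return "bivalent chromatin"
    if "H3K4me3" in marks:
        return "active transcription start site"
    if "H3K4me1" in marks:
        return "enhancer"
    if "H3K9me3" in marks:
        return "constitutive heterochromatin"
    if "H3K27me3" in marks:
        return "facultative heterochromatin"
    return "unassigned"


if __name__ == "__main__":
    regions = {
        "region_1": ["H3K4me3"],
        "region_2": ["H3K4me3", "H3K27me3"],
        "region_3": ["H3K9me3"],
    }
    for name, observed in regions.items():
        print(name, "->", label_region(observed))
```

Checking the bivalent combination first matters, because a region carrying both H3K4me3 and H3K27me3 would otherwise be labelled by whichever single-mark rule happened to come first.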
Epigenetic marks contribute to define (or in Waddingtonian terms, ‘canalise’) different
cell types and cellular states from the same genomic sequence. Cellular identity is normally
established by master regulators (initiators), generally transcription factors that activate the
expression of a genetic program (i.e. coordinated gene expression) [Reinberg and Vales,
2018]. However, in order for this cellular state to survive once the initiator is no longer
present, the patterns of epigenetic marks need to be inherited after cell division. This is clearly
the case for 5-methylcytosine (5mC, see section 1.2.2). In the case of histone modifications,
there is evidence for the propagation of some of the repressing histone modifications (such as
H3K9me3 and H3K27me3). This is possible because the machinery in charge of catalysing
the addition of these chemical modifications (i.e. the writers, SUV39H1 and Polycomb
Repressive Complex 2) also has the ability to recognise them (i.e. they are also readers),
therefore creating a positive feedback loop. However, it is not clear whether many other histone
modifications are copied after DNA replication in the newly synthesised DNA strand and
therefore whether they are truly epigenetic [Reinberg and Vales, 2018]. Additionally, it is
important to mention that enzymatic activities to reverse most (if not all) epigenetic marks
(i.e. erasers) have been identified [Allis and Jenuwein, 2016].
Besides regulating transcription and/or defining cellular states, epigenetic mechanisms
play a fundamental role in other important biological processes. These include genomic
imprinting (monoallelic expression according to parental origin) [Peters, 2014],
X-chromosome inactivation (silencing of one of the two X chromosomes in female therian
mammals) [Wutz, 2011] or caste differentiation in eusocial insects (such as queen and worker
differentiation in honeybees, where there is a 10-fold difference in lifespan) [Patalano et al.,
2012; Remolina and Hughes, 2008].
One of the big questions in the field of epigenetics is to what extent epigenetic patterns
are genetically programmed and to what extent they change in response to environmental/stochastic
influences. In the case of human populations, genetic variants that affect the
levels of DNA methylation (meQTLs) and histone modifications (hQTLs) at specific loci
have been identified [Taudt et al., 2016]. Interestingly, it is possible to predict different
epigenetic marks from the raw DNA sequence, mainly by identifying transcription factor
binding sites that guide different parts of the epigenetic machinery [Whitaker et al., 2014].
Furthermore, monozygotic twins make it possible to control for the genetic background and study the
epigenetic variation derived from environmental and stochastic factors, which is particularly
interesting in the context of complex diseases [Castillo-Fernandez et al., 2014]. Nevertheless,
the debate is far from settled.
1.2.2 Fundamentals of DNA methylation in mammals
Different types of DNA modifications have been described across the tree of life. DNA
methylation enzymes evolved in bacterial species to protect them from the infection of
bacteriophages, although roles in bacterial transcriptional regulation have also been described
[Sánchez-Romero et al., 2015]. In mammals, the most common DNA modification is the
addition of a methyl group in the carbon at the 5th position of cytosines (5mC), which
has been called the 5th base of DNA. The traditional functions assigned to 5mC include the
mediation of genomic imprinting and X-chromosome inactivation, repressing transposable
elements and regulating transcription [Wu and Zhang, 2017]. In the latter case, 5mC has
been commonly associated with the repression of transcription (e.g. by altering the ability of
transcription factors to bind or by attracting methyl-CpG binding domain proteins) [Li and
Zhang, 2014]. However, it is becoming clearer over time that the picture is more complex.
For example, gene bodies of highly expressed genes are methylated in order to avoid cryptic
transcription [Neri et al., 2017].
5mC generally occurs when the cytosine is followed by a guanine in the DNA strand
(commonly known as a CG dinucleotide or CpG site) [Li and Zhang, 2014; Smith and
Meissner, 2013]. In the human genome there are around 28 million CpG sites, of which
approximately 60-80% are normally methylated [Smith and Meissner, 2013]. The density
of CpG sites in the genome is variable. CpG islands (CGIs) are CpG-enriched genomic
regions (200-2000 bp long, ∼30,000 CGIs in the human genome which account for ∼10%
CpG sites) and are frequently associated with promoters (although ∼9,000 of them are found
inside gene bodies) [Jeziorska et al., 2017; Smith and Meissner, 2013; Zeng et al., 2014].
Promoter-associated CGIs are normally unmethylated across cell types, which contrasts with
the high methylation levels in the rest of the genome. The mechanism by which these CGIs
remain resistant to DNA methylation is starting to be elucidated. Recent reports suggest that
active transcription together with the binding of proteins that block methylation are required
for the resistance. Among these proteins (which bind non-methylated CpG sites in the CGI
via their zinc-finger CXXC domain) it is worth mentioning CFP1 (which recruits an H3K4
methyltransferase that increases H3K4me3 levels, which in turn inhibits de novo methylation)
and TET1 (see below) [Takahashi et al., 2017]. On the contrary, if a promoter-associated
CGI is methylated, this commonly leads to transcriptional repression of the corresponding
gene; something that is observed in the promoters of certain tumour suppressor genes in
cancer [Flavahan et al., 2017].
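To make the notions of CpG site and CpG density concrete, the following minimal Python sketch
counts CG dinucleotides on one strand and computes an observed/expected CpG ratio. The ratio
is a commonly used heuristic for CpG enrichment and is an assumption here (it is not taken from
the references above); the function names and the two example sequences are invented for
illustration only.

```python
def count_cpg_sites(seq):
    """Count CG dinucleotides read 5'->3' on the given strand."""
    # "CG" cannot overlap with itself, so str.count finds every occurrence.
    return seq.upper().count("CG")


def cpg_observed_expected(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Assumption: this heuristic is one common way to quantify CpG enrichment;
    it is not a definition taken from the thesis.
    """
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return count_cpg_sites(seq) * len(seq) / (n_c * n_g)


# Hypothetical toy sequences (far shorter than a real 200-2000 bp CpG island).
cgi_like = "CGCGGCGCTACGCGCGATCGCGGC"
cpg_poor = "ATTCAGTTGACATTGCAATGTCAA"

for name, seq in (("CGI-like", cgi_like), ("CpG-poor", cpg_poor)):
    print(name, count_cpg_sites(seq), round(cpg_observed_expected(seq), 2))
```

A real CGI annotation would scan the genome in windows and apply length and GC-content
thresholds as well; this sketch only illustrates the counting step.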
Different enzymes contribute to the establishment, maintenance and removal of
DNA modifications in mammals. De novo methyltransferases DNMT3A and DNMT3B
are capable of catalysing the addition of 5mC in those CpG sites that originally lack the
modification on both DNA strands. The maintenance methyltransferase DNMT1 is able
to add 5mC to hemimethylated DNA (i.e. when only one of the strands in the CpG site has
5mC) thanks to the symmetry of CpG sites (and its recruitment via UHRF1). This provides
a mechanism for the inheritance of DNA methylation patterns after cell division, therefore
making it a true epigenetic mark capable of generating cellular memory (Fig. 1.3) [Li and
Zhang, 2014; Smith and Meissner, 2013]; as originally hypothesised in 1975 by Holliday,
Pugh and Riggs [Holliday and Pugh, 1975; Riggs, 1975]. It is worth mentioning that 5mC
in a non-CpG context (i.e. in CHG or CHH, where H corresponds to adenine, thymine
or cytosine) has also been detected in human tissues [Schultz et al., 2015]. However, its
abundance is generally very low with the exception of embryonic stem cells (ESCs), induced
pluripotent stem cells (iPSCs) and some brain cells; probably due to the levels of DNMT3A
and/or DNMT3B in these cell types [He and Ecker, 2015; Ziller et al., 2011]. A third de
novo DNA methyltransferase that is catalytically inactive, DNMT3L, has also been identified
in mammals. DNMT3L is a DNMT3 variant that lacks the N-terminal part of the regulatory
domain and the C-terminal part of the catalytic domain [Lyko, 2017]. DNMT3L cooperates
mainly with DNMT3A to add 5mC in maternal genomic imprints during gametogenesis
[Bourc’his et al., 2001; Tomida et al., 2018]. Additionally, DNMT3C was recently discovered
as a de novo DNA methyltransferase in rodents, where it is responsible for methylating and
silencing young retrotransposons in the male germline, which is required for mouse fertility
[Barau et al., 2016].
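The inheritance mechanism described above (semi-conservative replication followed by DNMT1
acting on hemimethylated CpG sites) can be sketched as a small simulation. This is a toy model
under stated assumptions, not a description of real enzyme kinetics: each cell is reduced to a
single CpG site, and the `efficiency` parameter (the probability that a hemimethylated site is
restored) is hypothetical.

```python
import random


def replicate(site):
    """Semi-conservative replication of one CpG site.

    site = (watson_methylated, crick_methylated). Each daughter duplex keeps
    one parental strand and receives a newly synthesised, unmethylated strand.
    """
    watson_me, crick_me = site
    return [(watson_me, False), (False, crick_me)]


def maintain(site, efficiency, rng):
    """DNMT1-like step: only hemimethylated sites can be restored."""
    watson_me, crick_me = site
    if watson_me != crick_me and rng.random() < efficiency:
        return (True, True)
    return site


def simulate(generations, efficiency, seed=0):
    """Return the fraction of methylated strands after some cell divisions."""
    rng = random.Random(seed)
    cells = [(True, True)]  # start from one fully methylated CpG site
    for _ in range(generations):
        cells = [maintain(d, efficiency, rng) for s in cells for d in replicate(s)]
    strands = [m for site in cells for m in site]
    return sum(strands) / len(strands)


print(simulate(5, efficiency=1.0))  # perfect maintenance keeps the mark
print(simulate(3, efficiency=0.0))  # no maintenance: the mark is diluted
```

With perfect maintenance the methylated fraction stays constant, whereas without maintenance
it halves at every division, because unmethylated sites are never recognised by the
maintenance step in this model.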
It was a long-standing question whether the loss of 5mC (a.k.a. demethylation) can only
occur by replication-coupled passive loss (i.e. preventing DNMT1 maintenance activity
and diluting 5mC content by cell division), due to methyltransferase errors or as a result
of DNA repair after DNA damage [Iurlaro et al., 2017]. In 2009, two groups conclusively
identified the presence of a different type of DNA modification in mouse and human DNA,
5-hydroxymethylcytosine (5hmC) [Kriaucionis and Heintz, 2009; Tahiliani et al., 2009]
(although, surprisingly, its presence in rat tissue had been detected almost 40 years before)
[Penn et al., 1972]. Furthermore, one of them demonstrated that the enzyme TET1 is
capable of oxidising 5mC to 5hmC [Tahiliani et al., 2009]. Since then, other enzymes from
the TET family (TET2, TET3) have also been shown to catalyse this reaction [Ito et al.,
2010]. Further products of oxidation, 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC),
can also be generated by TET enzymes, although their abundance is incredibly low in the
genome. Replication-dependent dilution of oxidised products or thymine DNA glycosylase
(TDG)-mediated excision of 5fC and 5caC coupled with BER have been shown to complete
the demethylation process (Fig. 1.4). Altogether, this shows that active enzymatic DNA
demethylation is a feature of mammalian epigenomes [Wu and Zhang, 2017]. Finally, it
is worth mentioning that another type of DNA modification, N6-methyladenine, has recently
been identified in both mouse and human cells, thus further expanding the DNA alphabet
[Wu et al., 2016; Xiao et al., 2018].
[Figure 1.3: schematic of a symmetric CpG site on the Watson and Crick DNA strands, showing
de novo methylation by DNMT3A and DNMT3B, DNA replication and maintenance methylation by DNMT1.]
Fig. 1.3 Establishment and maintenance of 5-methylcytosine (5mC) in mammalian genomes. Unmethylated
cytosines in symmetric CpG sites are originally methylated de novo by DNA methyltransferases DNMT3A and
DNMT3B to form 5mC. After cell division, the newly synthesised DNA strands lack the methylation mark.
Maintenance DNA methyltransferase DNMT1 recognises this hemimethylated DNA and adds the missing
methyl groups, therefore ensuring the inheritance of DNA methylation patterns and cellular memory.
[Figure 1.4: cycle diagram connecting C, 5mC, 5hmC, 5fC and 5caC (with their chemical structures),
with steps labelled DNMT, TET, TDG + BER and DNA replication, and the AM-PD and AM-AR routes.]
Fig. 1.4 Oxidation of 5-methylcytosine (5mC) and the cycle of demethylation. 5mC can be oxidised to different
DNA modifications (5hmC, 5fC, 5caC) by TET enzymes. The maintenance DNA methyltransferase DNMT1
can only recognise 5mC. As a consequence, after DNA replication, the rest of the modifications would be
eventually lost (active modification–passive dilution or AM-PD). Alternatively, thymine DNA glycosylase
(TDG)-mediated excision of 5fC and 5caC coupled with base excision repair (BER) can lead to the same
outcome (active modification–active removal or AM–AR). This figure was adapted from [Wu and Zhang, 2017].
DNA methylation patterns change drastically during mammalian embryonic devel-
opment. After fertilisation, mouse and human zygotes undergo epigenetic reprogramming
in order to reset naive pluripotency. This is mainly characterised by a global loss of 5mC (i.e.
DNA hypomethylation) with different demethylation processes affecting the paternal and
maternal genome. Nevertheless, some genomic regions, such as imprints, survive epigenetic
reprogramming. De novo DNA methylation occurs after implantation of the blastocyst, which
will restore DNA methylation levels for most somatic cells and eventually generate cell-type
specific DNA methylation patterns [Atlasi and Stunnenberg, 2017; Iurlaro et al., 2017; Tang
et al., 2016].
In the case of the cells that will give rise to the germline (primordial germ cells or
PGCs), further genome-wide DNA demethylation occurs, which makes PGCs the most
hypomethylated cells found during mammalian development thus far (global methylation
levels are ∼4%). This ensures imprint erasure and that parental epigenetic memories are
removed, therefore posing a barrier for transgenerational epigenetic inheritance. Nevertheless,
some regions that escape epigenetic reprogramming in mice and humans have been described
(mainly evolutionarily young and potentially hazardous retrotransposons). Methylation
patterns of the germline will then be re-established in a sex-specific manner [Tang et al.,
2016].
Over the years, many technologies have been developed to measure 5mC and its
oxidative products (see section 4.1 for an overview). Many assays rely on a chemical
procedure called bisulfite conversion [Frommer et al., 1992]. Genomic DNA is denatured and
incubated with sodium bisulfite. This leaves 5mC residues intact, but unmethylated cytosines
are deaminated and converted to uracil. Therefore, after PCR amplification, 5mCs are
substituted by cytosines while unmethylated cytosines become thymines. This information
can then be read at base-pair resolution through DNA sequencing or by hybridisation to a
methylation array (such as the Illumina Infinium BeadChips, which are the platform used to
generate the data analysed in this thesis) [Plongthongkum et al., 2014]. It is important to keep
in mind that 5hmC is confounded with the 5mC signal [Wu and Zhang, 2017]. Furthermore,
C>T mutations (a common mutation during ageing, see section 1.1.3) can be confounded
with hypomethylation events. Another caveat of bisulfite treatment is that it degrades DNA to
a great degree and generates sequencing libraries of low complexity, which leads to reduced
mapping rates and higher costs. A few months ago, Liu et al. published a bisulfite-free
protocol for 5mC sequencing at base-pair resolution, which could potentially solve some of
these issues and start a new generation of bisulfite-free methods [Liu et al., 2019].
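The logic of bisulfite conversion described above can be illustrated with a short Python
sketch. It is a simplified model under stated assumptions: a single strand is considered,
conversion is assumed to be complete, and reads are assumed to be perfectly aligned to the
reference. The function names and the toy sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Sequence read after bisulfite treatment and PCR amplification.

    Unmethylated cytosines are read as T; methylated cytosines stay C.
    (In a real experiment 5hmC would also be read as C.)
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, reads):
    """Fraction of reads showing C at each cytosine of the reference.

    A genuine C>T mutation would be indistinguishable from an
    unmethylated cytosine in this calculation.
    """
    levels = {}
    for i, base in enumerate(reference.upper()):
        if base != "C":
            continue
        calls = [read[i] for read in reads if read[i] in "CT"]
        if calls:
            levels[i] = calls.count("C") / len(calls)
    return levels


reference = "ACGTCGAC"  # cytosines at positions 1, 4 and 7
# Three molecules carry 5mC at position 1; one molecule is fully unmethylated.
reads = [bisulfite_convert(reference, {1}) for _ in range(3)]
reads.append(bisulfite_convert(reference, set()))
print(reads[0], reads[-1])
print(call_methylation(reference, reads))
```

The methylation level at each cytosine is therefore estimated as the proportion of reads
that retain a C at that position, which is the kind of per-site value that downstream
analyses work with.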
Exposure to certain environmental factors is associated with changes in the methy-
lome that can potentially modulate disease risk. In mice, in utero undernourishment leads
to weight and metabolic defects in the F1 offspring. Furthermore, this metabolic phenotype
is inherited in the F2 offspring through the paternal line. Interestingly, this could be
because genomic regions that become hypomethylated during paternal germline specification
survive epigenetic reprogramming in the F2 zygote and lead to further chromatin alterations
in adult tissues [Radford et al., 2014]. In a different example, smoking exposure in humans
changes DNA methylation patterns of blood [Roby et al., 2016] or buccal cells [Teschendorff
et al., 2015] in a consistent and reproducible manner. However, the mechanisms behind
these changes and whether they are functional or mere passenger epimutations remain ob-
scure. Moreover, many complex diseases, such as rheumatoid arthritis or many cancer types,
are characterised by altered DNA methylation patterns (although these DNA methylation
changes are more robust and more abundant in most cancers when compared with rheumatoid
arthritis). This suggests that epigenetic mechanisms integrate genetic and environmental
aetiologies of disease [Liu et al., 2013; Widschwendter et al., 2018]. Nevertheless, it is
important to mention that many environmental factors have an impact on the biology of the
organism through non-epigenetic mechanisms. For example, exposure to certain environ-
mental factors, such as UV light or polycyclic aromatic hydrocarbons (PAHs), can lead
to specific DNA mutational signatures and increase the risk of developing certain types of
human cancer [Kucab et al., 2019]. Furthermore, many environmental adaptations, such as
those that regulate responses to temperature and light [Narasimamurthy and Virshup, 2017]
or nutrient availability (see section 1.1.2), rely on non-epigenetic molecular mechanisms.
1.2.3 Links between the epigenetic machinery and ageing
As previously mentioned in section 1.1.3, epigenetic alterations are one of the hallmarks
of mammalian ageing [Lopez-Otin et al., 2013]. Given the role of genetic pathways
and environmental factors in the regulation of organismal lifespan, the epigenetic layer of
biological information has attracted a lot of interest in the ageing field (to the point that some
authors have suggested that it is the hub that connects all the hallmarks of ageing) [Booth and
Brunet, 2016]. Indeed, many life-extending interventions (such as dietary restriction, exercise
or a robust circadian rhythm) modulate the epigenetic machinery and induce chromatin
changes [Benayoun et al., 2015]. Furthermore, since many epigenetic marks are stable over
time, they could behave as cellular memories that store past environmental exposures. Taking
into account the vast literature available on this topic, in this section I will try to extract the
pieces of information that are more relevant for this work.
Several authors have reviewed the wide variety of chromatin changes that occur during
ageing in different organisms [Benayoun et al., 2015; Booth and Brunet, 2016; Pal and
Tyler, 2016; Sen et al., 2016]. These include changes in histone numbers, histone variants,
histone modifications, DNA modifications, non-coding RNAs or nucleosome positioning;
which eventually lead to transcriptional deregulation. Certain mutations in proteins of the
epigenetic machinery affect the lifespan of organisms from yeast to mouse, thus proving
a causal role for some of these changes in the ageing process. Furthermore, I highlight a few
interesting insights from these studies:
• Global heterochromatin loss and redistribution has been suggested as one of the mech-
anisms behind the ageing process [Tsurumi and Li, 2012; Villeponteau, 1997]. Indeed,
mutations in proteins that cause premature ageing in humans (such as nuclear lamins or
the WRN helicase) have a major impact on heterochromatin structure and the genomic
distribution of its characteristic repressive chromatin marks (such as H3K9me3 or
H3K27me3) [Zhang et al., 2015a]. Cellular senescence is also associated with the
remodelling of heterochromatin [Zhang et al., 2007]. Furthermore, heterochromatin
deregulation can lead to the activation and mobilisation of transposable elements during
mammalian ageing [De Cecco et al., 2013].
• Mutations that alter the levels of H3K27me3 and H3K4me3 can have contradictory
effects on the lifespan of model organisms, probably depending on the loci and cell
types that they affect. However, it appears that mutations that increase the levels of
H3K36me3 consistently increase lifespan (at least in yeast, worm and fly) [Benayoun
et al., 2015; Booth and Brunet, 2016; Pal and Tyler, 2016; Sen et al., 2016]. This will
be of interest for Chapter 3.
• Increased levels of SIRT6, an H3K9ac and H3K56ac histone deacetylase from the
sirtuin family, can extend the lifespan of male mice [Kanfi et al., 2012]. In contrast,
SIRT6-deficient mice die at about 4 weeks, have a progeroid phenotype and have
increased genomic instability (due to problems in the base excision repair pathway)
[Mostoslavsky et al., 2006]. The role of SIRT6 in human ageing is still not clear.
• Histone chaperone ASF1, which promotes histone deposition and stability, is required
for normal replicative lifespan in yeast [Feser et al., 2010]. Intriguingly, the mouse
ortholog ASF1A is important to resolve bivalent chromatin upon differentiation of
embryonic stem cells [Gao et al., 2018] (see discussion regarding the importance of
bivalent domains during mammalian ageing in the ‘Hypermethylated regions during
ageing’ section below).
• The naked mole rat, an incredibly long-lived rodent with very low cancer incidence,
presents a stable epigenome that is resistant to in vitro reprogramming. Furthermore,
higher levels of repressive chromatin marks (such as H3K27me3) are observed relative
to the mouse [Tan et al., 2017].
Importantly, the DNA methylation landscape also seems to be affected during ageing in
mammals. Certain CpG sites or genomic regions gain methylation with age (i.e. they become
hypermethylated) while other sites lose methylation (i.e. they become hypomethylated).
Furthermore, some of these age-associated methylation changes are shared across tissues,
while others are tissue-specific. Notably, even though they have a stochastic component,
the genomic context where these changes occur seems to be conserved in mice [Avrahami
et al., 2015; Cole et al., 2017b; Maegawa et al., 2010; Sziráki et al., 2018; Wang et al., 2017]
and humans [Day et al., 2013; Dozmorov, 2015; Fernández et al., 2015; Heyn et al., 2012;
Horvath et al., 2012; Raddatz et al., 2013; Rakyan et al., 2010; Slieker et al., 2018, 2016;
Teschendorff et al., 2010; Weidner et al., 2014; Yuan et al., 2015; Zhu et al., 2018]:
• Hypermethylated regions during ageing. They are generally enriched for bivalent
chromatin, regions repressed by PRC2 (Polycomb Repressive Complex 2) and CpG
islands (CGIs, many of which overlap with bivalent promoters). Bivalent domains are
populated with numerous transcription factor binding sites and are marked simultane-
ously by histone marks H3K27me3 (established by EZH2, which is part of the PRC2
complex; associated with transcriptional repression) and H3K4me3 (established by
Trithorax-group proteins; associated with transcriptional activation). The two histone
marks seem to co-occur on the same loci of the same cell in a majority of the bivalent
domains (as opposed to a heterogeneous population of cells with different histone
marks), and sometimes even on the different histone copies of the same nucleosome
[Voigt et al., 2013]. This opposing duality is thought to silence developmental genes in
embryonic stem cells (and pluripotent stem cells in the embryo) while keeping them
poised for activation (by developmental and/or environmental cues) [Voigt et al., 2013].
Developmental genes (many of them lowly expressed transcription factors) are indeed
highly enriched in these regions and this seems to be a feature of most gene ontology
analyses performed on hypermethylated CpGs during ageing. Many of the bivalent
domains disappear after differentiation, leaving only one of the two marks [Bernstein
et al., 2006], but specific nonpluripotent bivalent domains can also be generated after
differentiation [Voigt et al., 2013]. Besides differentiation, the physiological ageing
process also seems to change the landscape of bivalent domains, as observed in aged
haematopoietic stem cells or HSCs (where around 335 bivalent domains disappear in
old mouse HSCs, whereas 1,245 emerge) [Sun et al., 2014a]. This process is apparently
linked to the proliferative history of HSCs [Beerman et al., 2013] and could contribute
to the myeloid skewing observed during ageing [Beerman et al., 2013; Sun et al.,
2014a]. Interestingly, bivalent domain losses occur in cancer cells as well, which
seems to correlate with the hypermethylation of the regions [Bernhart et al., 2016]. It
is possible that the ageing- or cancer-related hypermethylation destroys the ability to
create a bivalent equilibrium in these regions. If this happens in the stem cells, it could
impair adequate differentiation and propagate the methylation change in the tissue.
Overall, this provides an interesting mechanistic link between embryonic development,
lineage-specific cellular identity and the ageing process that should be further explored.
• Hypomethylated regions during ageing. They are generally enriched for tissue-
specific enhancers (generally marked with H3K4me1) and depleted for CGIs (which
makes sense, given the low methylation levels of CGIs). For example, in wild-type
mouse liver, 8,230 liver-specific enhancers are hypomethylated during ageing. In
contrast, only 4,702 of those enhancers suffer the same fate in Ames dwarf mice
(which have decreased insulin/IGF-1 signalling and a longer lifespan), which highlights
that the epigenome from Ames dwarf mice appears more stable [Cole et al., 2017b].
DNA methylation patterns in enhancers are likely regulated by the balance between de
novo DNA methyltransferases and TET enzymes. For example, in human epidermal
stem cells, DNMT3A and DNMT3B associate with the most active enhancers in a
H3K36me3-dependent way and, together with TET2, regulate enhancer DNA methyla-
tion levels and function [Rinaldi et al., 2016]. Furthermore, loss of DNMT3A drives
enhancer hypomethylation in both mouse and human leukaemia models [Yang et al.,
2016a]. Conversely, deletion of TET2 causes extensive loss of 5hmC at enhancers in
mouse ESCs, which is accompanied by enhancer hypermethylation [Hon et al., 2014].
Therefore, the enhancer-specific hypomethylation observed during ageing could be a
consequence of changes in the expression or activity levels of DNA methyltransferases
and TET enzymes with age, which have been reported both in mice and humans
[Armstrong et al., 2013; Ciccarone et al., 2016; Gontier et al., 2018; Truong et al.,
2015].
Importantly, some of these ageing-associated DNA methylation changes also seem to
happen in dogs and wolves [Thompson et al., 2017]. Furthermore, the rate of change of many
of these age-associated regions is negatively correlated with lifespan in six different mammals
[Lowe et al., 2018]. Altogether, this suggests that conserved epigenetic mechanisms may
operate during ageing to shape the mammalian methylome.
Hence, it is clear that the epigenome is eroded over time. In humans, this inter-individual
divergence of DNA methylation patterns created upon ageing has been termed ‘epigenetic
drift’ [West et al., 2013]. Interestingly, this phenomenon is found even in monozygotic twins
[Fraga et al., 2005; Talens et al., 2012], again highlighting the role of environmental and
stochastic factors. This is also observed at the single cell level, where cells from old organisms
become more heterogeneous at the epigenomic and transcriptomic level [Hernando-Herraez
et al., 2018; Martinez-Jimenez et al., 2017].
1.3 The epigenetic ageing clock
1.3.1 Measuring the ageing process
In order to study any phenomenon one needs to be able to measure it. Using survival curves
(a.k.a. lifespan curves, i.e. plotting the survival fraction over time, see equation 1.1) we
have been able to quantify the ageing process at a population level (where the assumption
is that life extension in a significant proportion of the population is a surrogate marker of
slowed ageing) [Johnson, 2013]. The adoption of this methodology in model organisms
(that we can manipulate genetically and/or pharmacologically) triggered the discovery of
the first genes impacting upon the ageing process (i.e. the mutants showed ‘shifts’ of the
survival curve when compared with a control). Since then, this has been the main paradigm in
ageing research, with efforts being made to automate the process and increase its throughput
[Stroustrup et al., 2013].
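As an illustration of what a 'shift' of the survival curve means, the sketch below computes
the survival fraction over time for two small cohorts. It is a minimal example that assumes
every lifespan is fully observed (no censoring) and may not match the exact form of
equation 1.1; the lifespans and cohort names are invented.

```python
def survival_fraction(lifespans, times):
    """Fraction of the cohort still alive at each time point.

    Assumption: all deaths are observed, so no censoring correction
    (e.g. a Kaplan-Meier estimator) is needed.
    """
    n = len(lifespans)
    return [sum(1 for age_at_death in lifespans if age_at_death > t) / n
            for t in times]


# Hypothetical lifespans (arbitrary time units) for a control and a mutant strain.
control = [10, 12, 14, 16, 18]
mutant = [14, 16, 18, 20, 22]
times = [0, 5, 10, 15, 20]

for t, s_control, s_mutant in zip(times,
                                  survival_fraction(control, times),
                                  survival_fraction(mutant, times)):
    print(t, s_control, s_mutant)
```

In this toy data set the mutant curve lies to the right of the control curve at every time
point where they differ, which is the population-level signature of life extension.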
Nevertheless, measuring the ageing process at the organismal level has proven more
difficult. Due to environmental and stochastic factors, there are significant differences in
the lifespan of even isogenic organisms. Therefore, there is a real need to develop accurate
biomarkers of ageing, i.e. measurements of ‘age-related change(s) in body function(s) or
composition that can predict the future onset of age-related disease(s) and/or the residual
lifetime left (i.e. predict the rate of ageing) more accurately than chronological age’ [Bürkle
et al., 2015]. Furthermore, according to the American Federation for Aging Research, any
valid biomarker of ageing must also monitor a basic (sub)process underlying ageing, it must
be able to be tested repeatedly without harming the organism (i.e. it has the potential to
become a longitudinal biomarker) and be reproducible in both humans and laboratory animals
(such as mice) [Bürkle et al., 2015].
The derivation of a biomarker of ageing leads to the definition of two types of age:
• Chronological age. It is the time elapsed since the birth of an individual.
• Biological age. It is the result derived from a specific biomarker. Each biomarker
is trained using a set of biological parameters (independent variables) to predict a
dependent variable (e.g. chronological age) that captures the probability of dying at a
given time. The training takes place using several individuals, ideally from multiple
populations. Afterwards, given a new individual and the biological parameters, the
biological age can be predicted (and it should capture the risk of death more accurately
than chronological age). Younger biological ages should be linked to high fitness and
health whereas older biological ages should correlate with age-related disease onset
and morbidity [Benayoun et al., 2015]. For example, if chronological age is used as
the dependent variable (which is the case for most biomarkers), the biological age of
an individual represents the chronological age of the average population that is most
similar to the individual (according to the set of biological parameters). In this case, if
the biological age of an individual is smaller than his chronological age, this could be
interpreted as his probability of death being smaller than the probability of death for
the average population (i.e. potentially the rate of ageing of the individual is slower
than the average).
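The training and prediction steps described for biological age can be sketched with the
simplest possible model: an ordinary least-squares line that uses a single biological
parameter to predict chronological age. This is only an illustration under strong
assumptions (one parameter, a perfectly linear toy data set); real biomarkers combine many
parameters, and the values below are invented.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Training data: one hypothetical biological parameter per individual
# (independent variable) and chronological age in years (dependent variable).
parameter = [0.2, 0.3, 0.4, 0.5, 0.6]
chronological_age = [20, 30, 40, 50, 60]
slope, intercept = fit_line(parameter, chronological_age)

# New individual: 50 years old, but with the parameter value that is
# typical of a younger individual in the training population.
new_parameter, new_chronological_age = 0.45, 50
biological_age = slope * new_parameter + intercept
print(round(biological_age, 1), new_chronological_age)
```

Here the predicted biological age is lower than the chronological age, which under the
interpretation given above would suggest a slower-than-average rate of ageing.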
In the case of humans, the initial ageing biomarkers included traditional biological
parameters such as body mass index, waist and hip circumference, blood pressure or heart
rate. Over the years, biomarkers that use molecular parameters have also been developed;
these include clinical chemistry parameters (such as cholesterol, immunoglobulins or fasting
glucose), telomere length or ‘omics’-based measurements [Bürkle et al., 2015; Jylhävä et al.,
2017]. In the latter category, almost every layer of biological information can be used to
derive a biomarker, including epigenomics (see next section), transcriptomics [Peters et al.,
2015], proteomics [Tanaka et al., 2018], metabolomics [Hertel et al., 2016], microbiome
[Galkin et al., 2018] or even brain neuroimaging data [Cole et al., 2017a]. Furthermore,
composite biomarkers (that combine the biological parameters from molecular layers with
measurements of physiological function) [Khan et al., 2017] and algorithmic innovations
(such as deep neural networks) [Putin et al., 2016] will likely improve the predictions. The
biomarkers of the human ageing process will serve as personalised risk indicators and
will allow monitoring the response to interventions, therefore creating endpoints in clinical
trials that target the ageing process.
1.3.2 The landscape of epigenetic clocks
Epigenetic clocks are mathematical models that predict the biological age of an organism
using DNA methylation data. These models exploit the fact that DNA methylation patterns
change robustly with age in different tissues and species, as summarised in section 1.2.3.
Epigenetic clocks have emerged in the last few years as the most accurate molecular biomark-
ers of the ageing process in humans, which they can track across the entire lifespan. As a
quick comparison, telomere length (one of the other popular ageing biomarkers) achieves
a Pearson’s correlation coefficient with chronological age of ∼ -0.5 in blood leukocytes in
the best case scenarios (with many studies reporting much lower values and contradictory
results) [Newman and Sanders, 2013]. On the other hand, the coefficients for Hannum’s
or Horvath’s epigenetic clocks (discussed later) are generally above ∼ 0.8 (in virtually all
studies assessed) [Chen et al., 2016a].
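For reference, the Pearson's correlation coefficient used in this comparison can be computed
as in the sketch below. The numbers are invented toy values, chosen only to show a positive
correlation for clock predictions and a negative one for telomere length; they are not data
from any of the cited studies.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements for six individuals.
chronological_age = [25, 35, 45, 55, 65, 75]
clock_prediction = [27, 33, 47, 52, 68, 73]        # years
telomere_length = [8.1, 7.2, 7.9, 6.8, 7.0, 6.1]   # kilobases

print(pearson_r(chronological_age, clock_prediction))
print(pearson_r(chronological_age, telomere_length))
```

A coefficient close to 1 or -1 indicates a tight linear relationship with chronological age,
whereas values closer to 0 indicate a noisier biomarker.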
The idea that DNA methylation patterns behave in a clock-like manner during cellular
ageing was already proposed in 1975 [Holliday and Pugh, 1975]. With the advent of
high-throughput DNA methylation technologies, some authors started to test the ability of
DNA methylation patterns to predict chronological age in humans. In 2010, Bork et al.
showed that DNA methylation values change at specific CpG sites upon long-term culture
and between young and old individuals in mesenchymal stromal cells [Bork et al., 2010].
Later that same year, studies by Teschendorff et al. [2010], Rakyan et al. [2010], Grönniger
et al. [2010] and others identified sets of CpG sites (signatures) that consistently altered their
methylation states with age in different tissues and cell types (and interestingly some of them
seemed to occur in the same genomic context). In 2011, Bocklandt et al. demonstrated that it
was possible to predict chronological age in saliva with an average error of 5.2 years using the
DNA methylation values of only two CpG sites [Bocklandt et al., 2011]. Shortly afterwards,
Koch and Wagner built what was probably the first multi-tissue predictor of chronological age in
humans (which worked using the same 5 CpG sites across different cell types) [Koch and
Wagner, 2011].
The potential role of epigenetic clocks as biomarkers of human ageing was probably
realised after the publications, in 2013, of the models by Hannum et al. [2013] and Horvath
[2013a] (Table 1.1). Since then, these epigenetic clocks have been validated in a large
number of independent cohorts and have become, de facto, the default human epigenetic
clocks for blood and multi-tissue predictions respectively. Importantly, this inspired other
groups to build epigenetic clocks in the mouse [Meer et al., 2018; Petkovich et al., 2017;
Stubbs et al., 2017; Thompson et al., 2018; Wang et al., 2017], dogs and wolves [Thompson
et al., 2017] or even humpback whales [Polanowski et al., 2014]; which will be instrumental
to broaden our understanding of the biology of ageing in mammals [Stubbs et al., 2017]. A
comparison of some of these epigenetic clocks can be found in Table 1.1. The accuracy that
they can achieve with a relatively small number of CpG sites as covariates is remarkable.
The predictions from epigenetic clocks are normally referred to as epigenetic age (which is
equivalent to the concept of biological age previously explained). Interestingly, deviations
of epigenetic age from chronological age (a.k.a. epigenetic age acceleration or EAA) have
been associated with many conditions in humans, including time-to-death [Chen et al.,
2016a; Marioni et al., 2015], HIV infection [Horvath and Levine, 2015], Down syndrome
[Horvath et al., 2015a], obesity [Horvath et al., 2014], menopause [Levine et al., 2016] and
breast-cancer risk in women [Kresovich et al., 2019], Werner syndrome [Maierhofer et al.,
2017] or Huntington’s disease [Horvath et al., 2016b], among others (reviewed in Horvath
and Raj [2018]). Interestingly, females and people of Hispanic ethnicity have lower EAA
(after correcting for blood cell composition effects) when compared with males and those of
Caucasian origin respectively, highlighting a role for biological sex and genetic background
in the rate of the epigenetic ageing clock [Horvath et al., 2016a]. In mice, the epigenetic
clock is slowed down by dwarfism and calorie restriction [Cole et al., 2017b; Meer et al.,
2018; Petkovich et al., 2017; Thompson et al., 2018; Wang et al., 2017] and is accelerated by
ovariectomy and high fat diet [Petkovich et al., 2017; Stubbs et al., 2017; Thompson et al.,
2018; Wang et al., 2017].
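As an illustration of how EAA can be quantified, the following minimal Python sketch computes it in two ways: as the raw difference between epigenetic and chronological age, and as the residual of a linear regression of epigenetic age on chronological age. The residual-based definition is an assumption of this sketch (it is a common choice, but it is not specified in the text above), and all the ages shown are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def eaa_difference(epigenetic_age, chronological_age):
    """EAA as the raw deviation: epigenetic age minus chronological age."""
    return [e - c for e, c in zip(epigenetic_age, chronological_age)]


def eaa_residual(epigenetic_age, chronological_age):
    """EAA as the residual after regressing epigenetic on chronological age."""
    a, b = fit_line(chronological_age, epigenetic_age)
    return [e - (a + b * c) for e, c in zip(epigenetic_age, chronological_age)]


# Hypothetical ages (in years) for five individuals.
chronological = [20.0, 35.0, 50.0, 65.0, 80.0]
epigenetic = [24.0, 33.0, 55.0, 63.0, 86.0]

diff = eaa_difference(epigenetic, chronological)
resid = eaa_residual(epigenetic, chronological)
print(diff)
print([round(r, 2) for r in resid])
```

A positive value indicates an individual whose epigenetic age is higher than expected. Unlike the raw difference, the residuals are uncorrelated with chronological age by construction.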
Recently, other epigenetic clocks have been created for slightly different purposes. For
example, Yang and colleagues developed an epigenetic clock that can track the rate of
(stem) cell divisions in normal and cancerous tissue (see section 2.3.2) [Yang et al., 2016c].
Furthermore, an epigenetic clock that performs well in skin cells (such as fibroblasts, buccal
cells and endothelial cells; known as the skin-blood clock) was developed in order to improve
ex vivo studies or forensic applications [Horvath et al., 2018]. Moreover, this epigenetic
clock enables the detection of EAA in Hutchinson-Gilford progeria, which is not possible
with Horvath’s clock [Horvath et al., 2018]. Additionally, other epigenetic clocks have been
trained to predict more complex dependent variables than chronological age. Levine et
al. built a model that predicts a combination of chronological age with clinically-relevant
variables (such as erythrocyte distribution width or serum glucose), known as PhenoAge
[Levine et al., 2018]; while Lu et al. built a model that predicts a composite variable mixing
information from smoking pack-years and plasma proteins (adrenomedullin, C-reactive
protein, plasminogen activator inhibitor 1 and growth differentiation factor 15), known as
GrimAge [Lu et al., 2019]. These models perform better than previous epigenetic clocks in
predicting the onset of several age-related diseases and therefore they will likely be useful in
a clinical context.

Species | Human | Human | Mouse | Dog and wolf
Main reference | Hannum et al. [2013] | Horvath [2013a] | Thompson et al. [2018] | Thompson et al. [2017]
DNA methylation technology | Illumina methylation array (450K) | Illumina methylation array (27K and 450K) | RRBS | RRBS
N samples (in training set) | N = 482 | N = 3931 | N = 893 | N = 108
Tissues (in training set) | Blood | Multi-tissue (18) | Multi-tissue (10) | Blood
Age range (in training set) | 19-101 years | 0-100 years | 0.2-32.2 months | 0.5-8 years
Number of CpGs in the final model | 71 | 353 | 529 | 115
Median absolute error (MAE) | 4.9 years | 3.6 years | 2.5 months | 0.8 years
(MAE / max. age in model) · 100 | 4.85% | 3.6% | 7.76% | 10.0%
Table 1.1 Comparison of some of the epigenetic clocks available for different species. RRBS: reduced
representation bisulfite sequencing (see Chapter 4).
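The last row of Table 1.1 expresses the median absolute error of each clock relative to the maximum age in its training set, which makes clocks from species with very different lifespans comparable. The following short Python sketch reproduces that arithmetic from the values reported in the table; the variable and function names are only illustrative.

```python
# (MAE, maximum age in the training set), in the units used in Table 1.1
# (years for human and dog/wolf, months for mouse).
clocks = {
    "Hannum et al. [2013]": (4.9, 101.0),
    "Horvath [2013a]": (3.6, 100.0),
    "Thompson et al. [2018]": (2.5, 32.2),
    "Thompson et al. [2017]": (0.8, 8.0),
}


def relative_mae(mae, max_age):
    """Median absolute error as a percentage of the maximum age in the model."""
    return 100.0 * mae / max_age


relative = {name: round(relative_mae(mae, max_age), 2)
            for name, (mae, max_age) in clocks.items()}

for name, value in relative.items():
    print(f"{name}: {value}%")
```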
From a statistical point of view, most of the epigenetic clocks have been built using linear
regression (see section 2.4). A model needs to be trained to predict the dependent variable
(normally chronological age) using the methylation values of different cytosines (generally
in CpG context) as covariates. Given that the number of covariates is normally several orders
of magnitude bigger than the number of samples available for training, regularisation (i.e.
‘shrinking’ of the linear regression coefficients, many of which become zero) needs to be
performed. More specifically, elastic net (a combination of lasso and ridge regularisation)
has been successfully applied [Friedman et al., 2010]. Many epigenetic clocks with sim-
ilar performance can be built from different sets of CpG sites (i.e. the construction of
epigenetic clocks is highly statistically degenerate) [Thompson et al., 2018]. Therefore,
it is important to understand that the CpG sites that constitute an epigenetic clock are not
necessarily the most important biologically, but rather they are probably a lower-dimensional
representation of the main processes that shape the epigenome with age.
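To make the role of regularisation more concrete, the following Python sketch fits an elastic net by cyclic coordinate descent, in the spirit of Friedman et al. [2010]. It is a toy implementation under stated assumptions and not the code used to train any published clock: the penalty is parameterised as λ[α·|β|₁ + (1−α)/2·|β|₂²], the covariates are centred but not standardised, and the data are hypothetical (two 'CpG-like' covariates, of which only the first is related to the response).

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator that produces exact zeros (lasso part)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_sweeps=500):
    """Minimise (1/2n)*RSS + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)."""
    n, p = len(y), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred covariates
    resid = [yi - y_mean for yi in y]                           # residuals for beta = 0
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of covariate j with the partial residual (own effect added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            denom = col_ss[j] + lam * (1.0 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


# Hypothetical data: the response depends only on the first covariate (y = 1 + 2 * x1).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]

ols_intercept, ols_beta = elastic_net(X, y, lam=0.0, alpha=0.5)      # no penalty
lasso_intercept, lasso_beta = elastic_net(X, y, lam=1.0, alpha=1.0)  # pure lasso
null_intercept, null_beta = elastic_net(X, y, lam=10.0, alpha=1.0)   # heavy penalty
print(ols_beta, lasso_beta, null_beta)
```

Without a penalty the fit recovers the least-squares solution, a moderate lasso penalty shrinks the first coefficient and sets the uninformative one exactly to zero, and a heavy penalty removes both covariates. This is the behaviour that allows an epigenetic clock to retain only a few hundred CpG sites out of the several hundred thousand measured.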
1.3.3 Molecular mechanisms of the epigenetic ageing clock
At this point it is probably useful to clarify a few concepts that I will refer to throughout
this work. I define the epigenetic ageing clock as the biological mechanisms that give rise
to the genome-wide epigenetic changes that occur during ageing (in a given species); a
definition in line with the one reported by Horvath and Raj [2018]. These changes have been
widely studied in the context of DNA methylation and can be utilised to train predictors of
chronological age (or other more complex variables). These predictors constitute different
types of epigenetic clocks, and I will try to refer to them by the specific model being
mentioned (e.g. Horvath’s epigenetic clock, Hannum’s epigenetic clock, etc.). As such,
specific epigenetic clocks capture the changes associated with the underlying epigenetic
ageing clock.
The molecular mechanisms that control the rate of the epigenetic ageing clock are still
mysterious [Field et al., 2018; Horvath and Raj, 2018]. Steve Horvath proposed that his
multi-tissue epigenetic clock captures the workings of an epigenetic maintenance system,
although the molecular nature of this hypothetical system is unknown to this date [Horvath,
2013a]. Furthermore, we still do not know whether these changes are functional at all or
whether they are just downstream consequences of other molecular processes that drive
ageing.
As mentioned in section 1.2.3, many studies have characterised changes in DNA methyla-
tion patterns during mammalian ageing, some of which seemed to be evolutionarily conserved
[Horvath, 2013a; Lowe et al., 2018]. Interestingly, changes that involve a gain in methy-
lation during ageing seem to be more conserved across tissues, whilst changes involving
hypomethylation are generally more tissue-specific [Horvath, 2013a; Yang et al., 2016c].
Furthermore, many of these changes occur in regions normally occupied by Polycomb
Repressive Complex 2, which are marked by the repressive histone mark H3K27me3.
Therefore, it is likely that disruptions of H3K27me3 domains (which are generally inherited
after cell division) play a role in epigenetic ageing. A specific instance would be bivalent
promoters (which are marked by both H3K27me3 and H3K4me3); these tend to gain methy-
lation with age (see section 1.2.3). These signals are captured by most epigenetic clocks
trained to predict chronological age [Horvath and Raj, 2018].
The mere existence of multi-tissue epigenetic clocks supports the idea that some of the
mechanisms behind the epigenetic ageing clock are shared across tissues. Furthermore,
Hannum’s epigenetic clock (trained exclusively in blood) explains 72% of variation in
chronological age across other tissues (such as breast, kidney, lung and skin), although there
is generally a tissue-specific offset [Hannum et al., 2013]. Interestingly, Horvath’s epigenetic
clock (which is multi-tissue) presents positive epigenetic age acceleration in breast tissue
[Sehl et al., 2017], whilst the cerebellum looks younger than expected [Horvath et al., 2015b]
and some tissues are poorly calibrated (uterine endometrium, dermal fibroblasts, skeletal
muscle and heart) [Horvath, 2013a]. Moreover, Horvath’s epigenetic clock dramatically
underestimates epigenetic age in sperm [Horvath, 2013a], which highlights differences
between somatic cells and the germline. Altogether, this raises the possibility that some
of the mechanisms behind the epigenetic ageing clock may be shared across tissues but
that they may operate at different rates (e.g. because of different exposure to hormones,
differences in proliferation rate, etc.).
Horvath’s epigenetic clock works in primary tissues and cell types, and also in vitro
(both in cell culture and organoids) [Horvath, 2013a; Hoshino et al., 2019]. Furthermore,
recipients of allogeneic hematopoietic stem cell transplantations show an epigenetic age in
their blood that corresponds to the age of the donor, even 17 years after the transplantation
took place [Søraas et al., 2019]. This suggests that the epigenetic ageing clock is a stable
cell-intrinsic property, as opposed to the idea that it is highly influenced by the systemic
environment (such as the effects observed in heterochronic parabiotic experiments) [Conboy
et al., 2005]. The stability is further demonstrated by experiments showing that human
fibroblasts that have been reprogrammed into neurons maintain their original epigenetic
age [Huh et al., 2016]. Moreover, aneuploid mice carrying a complete copy of human
chromosome 21 accumulate DNA methylation changes during ageing far more rapidly than
seen in human tissues, which suggests that the epigenetic ageing clock is a molecular readout
of the ageing cellular milieu [Lowe et al., 2018].
Epigenetic age acceleration (EAA) has been proposed as a way to capture the ageing
phenotype in GWAS analysis. Genetic variants associated with EAA have been found in
TERT, the catalytic subunit of telomerase [Lu et al., 2018]. Epigenetic age increases in vitro
with cell passage, but it requires the expression of TERT to keep linearly increasing after
a certain number of passages [Lu et al., 2018]. This suggests that bypassing replicative
senescence is required for the epigenetic ageing clock to keep ticking, at least in vitro.
Interestingly, inducing senescence in TERT-immortalised cells via an oncogene makes the
cells age faster in culture, but induction of senescence via DNA damage does not increase
epigenetic age [Lowe et al., 2016]. Overall, this could imply that the epigenome of senescent
cells does not contribute substantially to the changes captured by the epigenetic ageing
clock. Furthermore, it has been proposed that epigenetic ageing could serve a complementary
role to that of senescence, by suppressing potential cancer development (e.g. by protecting
against dedifferentiation signals) [Horvath and Raj, 2018]. The molecular connections
between cell division, alternative non-telomeric functions of TERT and the epigenetic ageing
clock need to be further studied. Moreover, these experiments do not rule out an indirect
effect of senescent cells on the epigenetic ageing clock (i.e. via the SASP by inducing
changes in the epigenomes of other cells in the tissue) that could occur in vivo.
The rate of the epigenetic ageing clock is substantially faster during post-natal organismal
growth (something that Horvath’s model accounts for) [Horvath, 2013a], which could be
related to the high levels of TERT expression during this period [Lu et al., 2018]. Interest-
ingly, epigenetic ageing according to Horvath’s epigenetic clock (but not according to other
epigenetic clocks, such as Hannum’s clock, the skin-blood clock, PhenoAge or GrimAge)
seems to start a few weeks post-conception in fetal tissues [Hoshino et al., 2019]. This
could imply that the molecular processes responsible for mammalian epigenetic ageing are
operative even during pre-natal development, potentially with different consequences.
This molecular continuum between development and ageing is further reinforced by
the fact that embryonic stem cells have an epigenetic age around zero [Horvath, 2013a].
Notably, in vitro reprogramming of somatic cells into induced pluripotent stem cells (iPSCs)
also reduces epigenetic age to values close to zero (or even negative) both in humans [Horvath,
2013a] and mice [Meer et al., 2018; Petkovich et al., 2017]. Moreover, the induction of in
vivo partial reprogramming (short and cyclic exposure to reprogramming factors) in progeroid
mice ameliorates several ageing phenotypes and extends lifespan [Ocampo et al., 2016].
We are currently testing whether a similar protocol applied to physiologically aged mice
can reduce epigenetic age. The resetting of epigenetic age by reprogramming is of extreme
importance since it shows that the epigenetic changes associated with the epigenetic ageing
clock are reversible, which opens the door to further mechanistic studies and to the
development of rejuvenation therapies [Mahmoudi
et al., 2019; Olova et al., 2019; Rando and Chang, 2012; Sarkar et al., 2019].
The goal of this thesis is to improve our understanding of the epigenetic ageing clock in
humans. For this purpose, I will first review statistical methods to quantify epigenetic ageing
in human blood (Chapter 2). Then, I will study how different proteins of the epigenetic
machinery affect the rate of the epigenetic ageing clock (Chapter 3). Next, I will discuss a
technological improvement with the potential to make future epigenetic clocks more cost-
effective (Chapter 4). Finally, I will provide interesting future avenues that should be explored
in order to unravel the molecular mechanisms of the epigenetic ageing clock (Chapter 5).

Chapter 2
Statistical aspects
‘I often say that when you can measure
what you are speaking about, and
express it in numbers, you know
something about it; but when you cannot
measure it, when you cannot express it
in numbers, your knowledge is of a
meagre and unsatisfactory kind; it may
be the beginning of knowledge, but you
have scarcely, in your thoughts,
advanced to the stage of science,
whatever the matter may be.’
W. Thomson [1889]
2.1 Analysing the blood methylome to study human ageing
2.1.1 Building a DNA methylation dataset from public data
In recent years, large amounts of DNA methylation data have been generated to study
complex diseases and ageing [Flanagan, 2015; Rakyan et al., 2011]. Many of these datasets
can be obtained from public repositories, such as the NCBI-hosted Gene Expression Omnibus
(GEO) [Edgar et al., 2002]. Given its clinical accessibility and ease of collection, blood is
one of the most commonly profiled tissues in human DNA methylation studies [Flanagan,
2015], including published studies on developmental disorders [Aref-Eshghi et al., 2018b]
(see Chapter 3). Therefore, I decided to use blood as my surrogate tissue to broaden our
understanding of the human epigenetic ageing clock.
Furthermore, most of these human datasets have been generated using different versions
of the Illumina Infinium array technology, with the Illumina Infinium HumanMethylation450
array (450K) being the most frequently used platform [Flanagan, 2015]. Additionally, given
that the different array versions have different chemistries, biases and number of probes
[Bibikova et al., 2011, 2009; Pidsley et al., 2016], I decided to focus on 450K data for my
analyses. Using the GEOquery R package [Davis and Meltzer, 2007], I programmatically
downloaded from GEO all the DNA methylation data from human blood that I could find,
including samples from both whole blood and peripheral blood mononuclear cells (PBMC).
Furthermore, the data also had to satisfy the following criteria:
• Raw DNA methylation data was available (i.e. IDAT files). This was required so that
the pre-processing pipeline and the batch effect correction (which requires access to
control probe intensities, see section 2.2.3) could be consistently applied across all
the samples in the study.
• Metadata for the samples was available, with the chronological age as a minimum
requirement.
• In order to study physiological ageing, the blood samples were collected from individ-
uals without prior disease diagnoses. However, it is important to mention that I could
never be completely certain of this, since there could be a lack of diagnosis and/or lack
of reporting of the disease in the metadata.
This allowed me to assemble a human blood DNA methylation dataset for healthy
individuals (after QC, total N = 2218) with the characteristics shown in Table 2.1, which
spans the entire human lifespan (0.5 to 101 years). Fig. 2.1 shows that the chronological age
distribution is bimodal, with peaks around 10.69 and 58.81 years respectively. This reflects a
sampling bias in human population studies, with more data being generated for the periods of
postnatal development and during the appearance of age-related disease. However, in order
to understand the development of complex diseases as a consequence of the ageing process,
efforts should be made to also sample people in middle age, before the diseases are
normally diagnosed.
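The bimodality described above can be located with a kernel density estimate such as the one drawn in Fig. 2.1. The following Python sketch is a minimal illustration with hypothetical ages. It assumes a Gaussian kernel and Silverman's rule-of-thumb bandwidth, which to my knowledge correspond to the defaults of stat_density (bw = "nrd0"), although this should be checked against the ggplot2 documentation.

```python
import math
from statistics import stdev


def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def rule_of_thumb_bandwidth(values):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    s = sorted(values)
    iqr = quantile(s, 0.75) - quantile(s, 0.25)
    return 0.9 * min(stdev(values), iqr / 1.34) * len(values) ** -0.2


def gaussian_kde(values, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid point."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values) / norm
            for g in grid]


def local_maxima(grid, density):
    """Grid positions where the density is higher than at both neighbours."""
    return [grid[i] for i in range(1, len(grid) - 1)
            if density[i] > density[i - 1] and density[i] > density[i + 1]]


# Hypothetical ages (years): one group of children and one group of older adults.
ages = [8, 9, 10, 11, 12, 55, 57, 59, 61, 63]
grid = [0.5 * i for i in range(201)]  # 0 to 100 years in steps of 0.5
bandwidth = rule_of_thumb_bandwidth(ages)
density = gaussian_kde(ages, grid, bandwidth)
modes = local_maxima(grid, density)
print(round(bandwidth, 2), modes)
```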
Batch name | N♀ | N♂ | N | Median age (years) | Other comments
Europe | 0 | 121 | 121 | 10.96 | -
Feb_2016 | 0 | 1 | 1 | 0.50 | -
GSE104812 | 19 | 29 | 48 | 9.00 | -
GSE111629 | 111 | 124 | 235 | 71.00 | -
GSE40279 | 336 | 314 | 650 | 65.00 | -
GSE41273 | 0 | 51 | 51 | 10.25 | -
GSE42861 | 239 | 96 | 335 | 55.00 | -
GSE51032 | 253 | 78 | 331 | 54.57 | Only people that remained cancer-free in the follow-up after sample collection were included
GSE55491 | 1 | 5 | 6 | 29.50 | -
GSE59065 | 49 | 46 | 95 | 34.00 | -
GSE61496 | 72 | 78 | 150 | 57.00 | Only one member of each twin pair was included
GSE74432 | 29 | 22 | 51 | 12.00 | -
GSE81961 | 25 | 0 | 25 | 30.05 | -
GSE97362 | 39 | 80 | 119 | 13.00 | -
Total | 1173 | 1045 | 2218 | 55.00 | -
Table 2.1 Overview of the blood DNA methylation dataset from healthy individuals (control). All the batches
were downloaded from GEO [Edgar et al., 2002], with the exception of ‘Europe’ and ‘Feb_2016’, which were
generated in-house by my collaborators in Canada (see Chapter 3). N♀: number of samples from females. N♂:
number of samples from males. N: total number of samples. These numbers correspond to the samples left
after applying quality control (QC, see section 2.1.2).

[Figure 2.1: histogram with density curve; x-axis: chronological age (years), 0 to 100; y-axis: density; Control: N = 2218.]
Fig. 2.1 Histogram showing the chronological age distribution for all the healthy individuals included in
the DNA methylation dataset. The blue line represents the 1D kernel density estimate, as calculated by the
stat_density function in R with default parameters.

2.1.2 Main DNA methylation data pre-processing pipeline
The analysis of DNA methylation data generated with Illumina arrays has been a topic of extensive
discussion and statistical innovation in the epigenetic community. There are plenty of reviews
in the literature that discuss the different steps that should be involved in the pre-processing
of this data type [Liu and Siegmund, 2016; Morris and Beck, 2015; Wilhelm-Benartzi et al.,
2013]. More specifically, a recent study by Jie Liu and Kimberly D. Siegmund systematically
benchmarked the pre-processing methods available for the 450K array in order to reduce
variation among technical replicates and improve the detection of biological differences [Liu
and Siegmund, 2016]. Inspired by their results, I implemented a pre-processing pipeline for
the 450K data using the minfi R package [Aryee et al., 2014], which consists of the following
steps (Fig. 2.2):
1. Background correction. I used the noob method [Triche Jr et al., 2013], as implemented
in the preprocessNoob function from the minfi R package [Aryee et al., 2014].
noob accounts for technical variation in the background (i.e. non-specific)
fluorescence signal, which can lead to a reduced dynamic range for the methylation
values (β-values) obtained (Fig. 2.2b, Fig. S1.1) [Triche Jr et al., 2013]. Briefly,
when measuring fluorescence intensities in the Illumina array platforms, the observed
intensity (also known as foreground, $X_f$) is composed of:

$$X_f = X_s + X_b \tag{2.1}$$

where $X_s$ is the true signal and $X_b$ is the background signal. Making use of a normal-
exponential convolution (which assumes $X_s \sim \mathrm{Exp}(\gamma)$ and $X_b \sim N(\mu, \sigma^2)$) and the
‘out-of-band’ (OOB) intensities (fluorescence signals in the opposite colour channel in
Infinium I probes) to model $X_b$, noob is capable of estimating $X_s$ given $X_f$. Furthermore,
I also applied the default dye-bias correction strategy, which controls for the different
average intensities in the two colour channels [Triche Jr et al., 2013].
2. Quality control (QC). Following guidelines from the minfi R package [Aryee et al.,
2014], I kept only those samples that satisfied the following criteria:
(a) The sex predicted from the DNA methylation data ($\mathrm{Sex}_p$) was the same as the
reported sex in the metadata. The sex was predicted using the getSex function from
the minfi R package [Aryee et al., 2014], which employs intensity information
from the sex chromosomes, such that:

$$
\mathrm{Sex}_p =
\begin{cases}
\text{female}, & \text{if } \operatorname{median}\{\log_2(M_y + U_y)\} - \operatorname{median}\{\log_2(M_x + U_x)\} < c \\
\text{male}, & \text{if } \operatorname{median}\{\log_2(M_y + U_y)\} - \operatorname{median}\{\log_2(M_x + U_x)\} \geq c
\end{cases}
\tag{2.2}
$$

where $M_y$ and $U_y$ represent the methylated and unmethylated intensity measurements
for the array probes in the Y chromosome, $M_x$ and $U_x$ represent the
methylated and unmethylated intensity measurements for the array probes in the
X chromosome and $c$ is a predefined cutoff (default in minfi: $c = -2$). A total of
13 samples (0.56%) did not satisfy this criterion.
(b) They were not outliers according to their global intensity values after background
correction, such that:

$$\frac{\operatorname{median}\{\log_2(M_i)\} + \operatorname{median}\{\log_2(U_i)\}}{2} \geq 10.5 \tag{2.3}$$

where $M_i$ and $U_i$ represent the background-corrected methylated and unmethylated
intensity measurements for all the 450K array probes (Fig. S1.2). A total of
95 samples (4.09%) did not satisfy this criterion.
3. Probe filtering. I filtered out the following types of probes:
• Probes that contain SNPs at the single base extension site (position 0) or at the
proximal CpG on the probe (positions 1-2), using the dropLociWithSnps function
in the minfi package [Aryee et al., 2014].
• Cross-reactive probes, as defined by Chen et al. [2013]. These are probes that
can co-hybridise to alternative genomic sequences that are highly homologous to
the target sequences [Chen et al., 2013].
• Probes that map to the sex chromosomes (X and Y).
It is important to mention that other authors have also filtered out probes with high
detection p-values or low bead counts across samples [Morris and Beck, 2015; Wilhelm-
Benartzi et al., 2013]. However, I did not include these filters since they are not mentioned
in the minfi guidelines [Aryee et al., 2014; Fortin and Hansen, 2015] and they could
complicate further downstream analyses (e.g. different sets of probes missing across
different batches).
4. β-value calculation. The methylation status of a given cytosine (normally found in
a CpG site) in one of the array probes can be quantified using the β-value statistic,
which is calculated as [Du et al., 2010; Wilhelm-Benartzi et al., 2013]:

$$\beta_i = \frac{\max(M_i, 0)}{\max(M_i, 0) + \max(U_i, 0) + \alpha} \tag{2.4}$$

where $M_i$ and $U_i$ represent the methylated and unmethylated intensity measurements
for the ith probe and $\alpha$ is a constant offset (in this work $\alpha = 100$, as recommended by
Illumina) [Du et al., 2010].
In a DNA molecule of a single cell, a specific cytosine is either unmethylated or
methylated (categorical / binary variable). However, given that a bulk DNA sample
from a tissue is composed of thousands of cells (which can include different cell types
with different methylation patterns), β-values result in a continuous variable between
0 and 1. A value of 0 means that all the measured DNA molecules are unmethylated
(0%) and a value of 1 means that all the measured DNA molecules are methylated
(100%) in that cytosine, which is roughly equivalent to saying that 100% of the cells
are either unmethylated or methylated respectively in that cytosine for the sampled
tissue. The β-values for a given sample (i.e. considering all the cytosines measured)
usually follow a bimodal distribution, where the two peaks are centred around 0 and 1
(Fig. 2.2d).
Other authors have used M-values to quantify methylation levels in arrays (Fig. S1.3),
which can be calculated as:

$$\text{M-value}_i = \log_2\left(\frac{\max(M_i, 0) + \alpha}{\max(U_i, 0) + \alpha}\right) \tag{2.5}$$

with a default offset value of $\alpha = 1$. Du et al. reported that β-values suffer from
severe heteroscedasticity (i.e. differences in the variance) for highly methylated or
unmethylated CpG sites and therefore the M-values have more desirable statistical
properties [Du et al., 2010]. However, Zhuang et al. later showed that this only
becomes a problem in studies with small sample sizes [Zhuang et al., 2012] (which is
not the case for my analyses). Furthermore, β-values are easier to interpret biologically
and can be readily used in the context of BMIQ normalisation (see below). For these
reasons, I chose β-values as the main methylation variable for this work.

[Figure 2.2: panel a, flowchart: DNA methylation data (IDAT files) → background correction (noob) (healthy individuals: Nsamples = 2325, Nprobes = 485512) → quality control (healthy individuals: Nsamples = 2218, Nprobes = 485512) → probe filtering → β-value calculation → BMIQ normalisation (healthy individuals: Nsamples = 2218, Nprobes = 428266); panels b, c and d, density plots with x-axis: β-value (0 to 1), y-axis: density, and curves labelled ‘Failed QC’ and ‘Passed QC’ (panel d: ‘Passed QC’ only).]
Fig. 2.2 Main DNA methylation data pre-processing pipeline. a. Flowchart showing the main steps implemented
to pre-process the DNA methylation data from the 450K methylation arrays. The number of samples (Nsamples)
and the number of array probes (Nprobes) left after each step are also specified for the samples from the healthy
individuals. b. β-value distributions, calculated using the raw fluorescence intensities (i.e. before any pre-
processing), for the samples in the GSE41273 batch. Each curve represents a different sample. In grey: 51
samples that passed quality control (QC). In red: 2 samples that failed QC. c. As in b., but calculating the
β-values after background correction. d. As in b., but calculating the β-values after background correction,
QC, probe filtering and BMIQ normalisation (i.e. the final β-values that I used for downstream analyses). Note
that the samples that failed QC have been removed.
5. Beta-mixture quantile normalisation (BMIQ). In the case of the 450K arrays, two
types of probes / chemistries coexist in the same platform. Infinium I probes and Infinium
II probes have different β-value distributions (a.k.a. Infinium II probe bias). BMIQ is
an intra-array normalisation strategy that corrects for this bias and has been
shown to outperform other methods used in this context [Dedeurwaerder et al., 2011;
Maksimovic et al., 2012; Teschendorff et al., 2012; Touleimat and Tost, 2012]. BMIQ
fits a three-state beta-mixture model to Infinium I and Infinium II probes separately
and then maps the Infinium II probe distribution onto the Infinium I probe distribution
(Fig. 2.3). In the case of unmethylated (β-values close to 0) and methylated (β-values
close to 1) probes, this is done by transforming probabilities into quantiles. In the
case of ‘hemimethylated’ probes (intermediate β-values), a dilation transformation is
applied to preserve the monotonicity and continuity of the data [Teschendorff et al.,
2012]. I applied BMIQ to my samples and discarded those that failed the normalisation
step.
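Returning to the background correction in step 1, the normal-exponential convolution has a closed-form solution for the expected true signal given the observed foreground: the conditional distribution of the signal is a normal distribution truncated at zero. The following Python sketch evaluates that expectation under stated assumptions: γ is interpreted as the rate of the exponential distribution, and μ, σ and γ are supplied as hypothetical numbers (their estimation from the out-of-band intensities is not shown). It illustrates the model and is not the minfi implementation.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_signal(x_f, mu, sigma, rate):
    """E[X_s | X_f = x_f] when X_s ~ Exp(rate) and X_b ~ N(mu, sigma^2).

    Given x_f, the signal follows a normal distribution with mean
    a = x_f - mu - rate * sigma^2 and standard deviation sigma, truncated
    at zero. Its mean is a + sigma * pdf(a / sigma) / cdf(a / sigma).
    (For foregrounds far below mu the cdf underflows; not handled here.)
    """
    a = x_f - mu - rate * sigma ** 2
    z = a / sigma
    return a + sigma * normal_pdf(z) / normal_cdf(z)


# Hypothetical background parameters and foreground intensities.
mu, sigma, rate = 100.0, 10.0, 0.01
low = expected_signal(100.0, mu, sigma, rate)    # foreground at background level
high = expected_signal(1000.0, mu, sigma, rate)  # foreground far above background
print(round(low, 2), round(high, 2))
```

For bright probes the correction is close to a simple subtraction of the background mean, whereas for dim probes the estimate remains positive, which is what keeps the corrected intensities usable for the β-value calculation.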
[Figure 2.3: density plot; x-axis: β-value (0 to 1); y-axis: density; curves: Infinium I, Infinium II with BMIQ, Infinium II without BMIQ.]
Fig. 2.3 Effect of BMIQ normalisation on the β-value distribution of different subsets of array probes with
different chemistries (Infinium I, Infinium II). These results correspond to a DNA methylation sample from the
GSE41273 batch. It can be seen how BMIQ transforms the distribution of the Infinium II probes into a
distribution more similar to that of the Infinium I probes.
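The arithmetic behind the quality control criteria (equations 2.2 and 2.3) and the methylation statistics (equations 2.4 and 2.5) can be sketched in a few lines of Python. The function names and the toy intensities below are hypothetical, the intensities are assumed to be strictly positive, and the code is only meant to mirror the formulas, not the minfi implementation.

```python
import math
from statistics import median


def predict_sex(meth_y, unmeth_y, meth_x, unmeth_x, cutoff=-2.0):
    """Equation 2.2: compare the median log2 total intensity on Y and X probes."""
    y_level = median(math.log2(m + u) for m, u in zip(meth_y, unmeth_y))
    x_level = median(math.log2(m + u) for m, u in zip(meth_x, unmeth_x))
    return "female" if (y_level - x_level) < cutoff else "male"


def passes_intensity_qc(meth, unmeth, threshold=10.5):
    """Equation 2.3: mean of the median log2 methylated/unmethylated intensities."""
    m_med = median(math.log2(m) for m in meth)
    u_med = median(math.log2(u) for u in unmeth)
    return (m_med + u_med) / 2.0 >= threshold


def beta_value(m, u, offset=100.0):
    """Equation 2.4, with the offset alpha = 100 used in this work."""
    m, u = max(m, 0.0), max(u, 0.0)
    return m / (m + u + offset)


def m_value(m, u, offset=1.0):
    """Equation 2.5, with the default offset alpha = 1."""
    return math.log2((max(m, 0.0) + offset) / (max(u, 0.0) + offset))


# Toy intensities: weak Y-chromosome signal relative to X suggests a female sample.
sex_a = predict_sex([50, 60, 40], [50, 40, 60], [4000, 4100, 3900], [4000, 3900, 4100])
sex_b = predict_sex([2000, 2100, 1900], [2000, 1900, 2100],
                    [2000, 2100, 1900], [2000, 1900, 2100])
good_sample = passes_intensity_qc([2048, 4096, 8192], [1024, 2048, 4096])
poor_sample = passes_intensity_qc([512, 1024, 2048], [256, 512, 1024])
print(sex_a, sex_b, good_sample, poor_sample)
print(beta_value(900.0, 0.0), m_value(7.0, 1.0))
```

Note that, because of the offset, a fully methylated probe with a total intensity of 900 obtains a β-value of 0.9 rather than 1, which illustrates why low-intensity samples have a compressed dynamic range.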
2.1.3 Accounting for blood cell composition changes during ageing
Whole blood is composed of several cell types that contain a nucleus, including neutrophils,
eosinophils, basophils, CD14+ monocytes, CD4+ T cells, CD8+ T cells, CD19+ B cells and
CD56+ natural killer (NK) cells [Teschendorff and Zheng, 2017a]. These cell types have
different epigenetic profiles and, as a consequence, changes in their proportions (i.e. changes
in blood cell composition) can affect bulk DNA methylation measurements [Reinius et al.,
2012].
Accounting for this cellular heterogeneity is really important in epigenome-wide asso-
ciation studies (EWAS) [Jaffe and Irizarry, 2014; Liu et al., 2013; McGregor et al., 2016].
Furthermore, previous research has highlighted changes in blood cell composition with
age, which could be one of the causes behind immunosenescence [Chen et al., 2016b;
Czesnikiewicz-Guzik et al., 2008; Kuranda et al., 2011; Manser and Uhrberg, 2016; Seidler
et al., 2010]. Therefore, considering blood cell composition in the context of ageing-related
studies and the epigenetic clock is fundamental in order to make sure that the observed
age-related changes in the methylome are not a direct consequence of the changes in blood
cell composition during ageing [Chen et al., 2016a; Horvath et al., 2016a; Jaffe and Irizarry,
2014].
Several methods have been developed to estimate the cell composition of a blood sample
given a bulk DNA methylation measurement (a.k.a. cell-type deconvolution) [Teschendorff
et al., 2017; Teschendorff and Relton, 2018; Teschendorff and Zheng, 2017a; Titus et al.,
2017]. These methods can be broadly split into two categories:
• Reference-based approaches. They use a pre-defined set of DNA methylation reference
profiles for the cell types that are supposed to be present in the tissue. In the
case of methylation arrays, these reference profiles can be constituted by the β-values
for a subset of array probes that are highly discriminative of the underlying cell types.
Assuming that the blood sample is a weighted linear sum of the C reference profiles,
the objective of the method is to find these weights ($w_c$), which should be equivalent to
the actual cell type proportions (given the assumption $\sum_{c=1}^{C} w_c \leq 1$) [Teschendorff and
Zheng, 2017a]. In mathematical terms:

$$y = \sum_{c=1}^{C} w_c b_c + \varepsilon \tag{2.6}$$

where $y$ is the DNA methylation profile of the sample being considered, $C$ is the
number of underlying cell types, $b_c$ is the DNA methylation profile for the cth cell type
and $\varepsilon$ is the error [Teschendorff et al., 2017]. Different algorithms have been applied
to estimate the values of $w_c$, with the approach by Houseman et al. [2012] (which uses
a linear constrained projection) being the most widely used.
• Reference-free approaches. Instead of making use of reference profiles for the
cell types of interest, these methods generally calculate latent variables that capture
variation driven by cell type composition, although the strategy and assumptions to
derive these latent variables from the DNA methylation data are highly method-specific
[Teschendorff and Zheng, 2017a]. These methods become particularly useful when no
references are available for the cell types that constitute the tissue [Teschendorff and
Zheng, 2017a].
However, reference-free approaches rarely provide estimates for the specific cell types in
a given sample [Teschendorff and Zheng, 2017a] (which are needed in the current modelling
framework of the epigenetic clock) and they often rely on the assumption that the top
components of variation correlate with cell composition [Teschendorff et al., 2017], something that
is not always true (especially in the case of developmental disorders, see Chapter 3). Thus, I
decided to benchmark different reference-based cell-type deconvolution strategies in blood.
In this context I tested (Fig. S1.4):
• Different blood references. As pointed out before, the quality of the reference,
containing the DNA methylation profiles of the cell types to be inferred, is crucial
[Koestler et al., 2016; Teschendorff et al., 2017]. The reference must be composed of
those CpG sites (in this case, array probes) that are able to better discriminate between
the different cell types. In my case I considered six major blood ‘cell types’ for the
inference: granulocytes (‘Gran’), CD4+ T cells (‘CD4T’), CD8+ T cells (‘CD8T’),
CD19+ B cells (‘B’), CD14+ monocytes (‘Mono’) and CD56+ natural killer cells
(‘NK’). It is important to point out that granulocytes are not themselves a ‘biological
cell type’ (since they are composed of neutrophils, eosinophils and basophils), but
will be considered as a single ‘computational cell type’ as previously done [Chen
et al., 2016a; Horvath et al., 2016a]. I tested three different blood references whose
constitutive probes were selected using different strategies:
1. The reference implemented in the estimateCellCounts function from the minfi R
package [Aryee et al., 2014], which is widely used in the epigenetic literature.
The reference probes were selected using t-statistics, by finding those probes that
were differentially methylated in each cell type when compared with the rest of
the cell types. Among those probes that showed differences at p-value < $10^{-8}$,
the 100 most differentially methylated probes by effect size (50 hypermethylated
and 50 hypomethylated) were chosen for each cell type (making a total of 600
probes for the reference) [Jaffe and Irizarry, 2014].
2. The reference implemented in the EpiDISH R package (centDHSbloodDMC.m)
[Teschendorff and Zheng, 2017b]. The reference probes (DHS-DMCs, 333
in total) were selected by leveraging information of both differentially methy-
lated cytosines (DMCs, using moderated t-statistics) and chromatin accessibility
(DNase Hypersensitive Sites or DHS) for each cell type [Teschendorff et al.,
2017].
3. The reference implemented as part of the IDOL strategy (IDentifying Optimal
DNA methylation Libraries) [Koestler et al., 2016]. In this case, the reference
probes (300 in total) were originally selected based on differential methylation
criteria and are updated in an iterative manner, with the probability of being
selected based on their contribution to prediction accuracy [Koestler et al., 2016].
The three references were built using the dataset from Reinius et al. [2012] (GSE35069),
which I obtained directly from the FlowSorted.Blood.450k R package [Jaffe, 2018].
This dataset contains DNA methylation data generated in the 450K array for the six cell
types considered, all of which were isolated using flow cytometry [Reinius et al., 2012].
The β -values for the selected probes were averaged across the biological replicates for
each cell type.
• Different DNA methylation pre-processing pipelines. I tested different configura-
tions for the pre-processing of both the gold-standard (see below) and the reference
data. For example, I tested whether probe filtering according to the criteria outlined in
the previous section (section 2.1.2) is desirable, since this leads to the removal of some
of the probes originally selected for the reference in the original publications [Koestler
et al., 2016; Teschendorff et al., 2017] (Fig. S1.4). Furthermore, I also tested whether
the prediction benefits from a similar pre-processing of both the gold-standard (or the
dataset where the prediction will be made) and the reference.
• Different deconvolution algorithms. I tested the performance of the following algo-
rithms: CP/QP (constrained projection/quadratic programming, originally implemented
by Houseman et al. [2012]), RPC (robust partial correlations) [Teschendorff et al.,
2017] and CIBERSORT (which was originally developed for cell-type deconvolution
using RNA expression data) [Newman et al., 2015; Teschendorff et al., 2017]. One
of the key differences between the algorithms is how the normalisation constraint
($\sum_{c=1}^{C} w_c \le 1$) is implemented [Teschendorff et al., 2017]. All the algorithms were
run using the implementations in the epidish function from the EpiDISH R package
[Teschendorff and Zheng, 2017b], with the exception of the run in the minfi reference,
for which I used the estimateCellCounts function with default parameters for the 450K
array [Aryee et al., 2014].
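As an illustration of the reference-building step described above (averaging the β-values of the selected probes across the biological replicates of each cell type), a minimal sketch follows. The function name, probe identifiers and β-values are hypothetical and are not taken from the FlowSorted.Blood.450k data.

```python
from statistics import mean


def build_reference(replicates, probes):
    """Average beta-values across replicates for each probe and cell type.

    replicates: {cell_type: [{probe_id: beta, ...}, ...]} (one dict per replicate)
    returns:    {probe_id: {cell_type: mean beta}}
    """
    return {
        probe: {cell: mean(rep[probe] for rep in reps)
                for cell, reps in replicates.items()}
        for probe in probes
    }


# Hypothetical probe identifiers and beta-values, for illustration only.
replicates = {
    "Gran": [{"probe_1": 0.90, "probe_2": 0.10}, {"probe_1": 0.80, "probe_2": 0.20}],
    "CD4T": [{"probe_1": 0.20, "probe_2": 0.70}, {"probe_1": 0.30, "probe_2": 0.90}],
}
reference = build_reference(replicates, ["probe_1", "probe_2"])
```

Each entry of the resulting structure corresponds to one element of a reference profile $b_c$ in Eq. 2.6.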
In order to compare the results from the predictions against real cell composition values, I
used a gold-standard dataset (GSE77797) containing 12 samples where known proportions
of DNA isolated from the different blood cell types were mixed [Koestler et al., 2016]. I
assessed the accuracy of the predictions using three different metrics:
• Root mean squared error (RMSE), which is calculated as (for a given cell type c):
\[ \mathrm{RMSE}_c = \sqrt{\frac{\sum_{n=1}^{N} (\hat{y}_{cn} - y_{cn})^2}{N}} \tag{2.7} \]
where $\hat{y}_{cn}$ is the predicted proportion of the c-th cell type in the n-th sample, $y_{cn}$ is the
real proportion of the c-th cell type in the n-th sample and $N$ is the total number of
samples in the gold-standard dataset ($N = 12$). A perfect prediction for a cell type
would minimise the value of $\mathrm{RMSE}_c$ (i.e. $\mathrm{RMSE}_c = 0$).
• Mean absolute error (MAE), which is calculated as (for a given cell type c):
\[ \mathrm{MAE}_c = \frac{\sum_{n=1}^{N} |\hat{y}_{cn} - y_{cn}|}{N} \tag{2.8} \]
A perfect prediction for a cell type would minimise the value of $\mathrm{MAE}_c$ (i.e. $\mathrm{MAE}_c = 0$).
• Coefficient of determination (R2), which is calculated as (for a given cell type c):
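\[ R^2_c = \frac{\sum_{n=1}^{N} (\hat{y}_{cn} - \bar{y}_c)^2}{\sum_{n=1}^{N} (y_{cn} - \bar{y}_c)^2} \tag{2.9} \]
where $\bar{y}_c = \frac{1}{N} \sum_{n=1}^{N} y_{cn}$. A perfect prediction would maximise the value of $R^2_c$ (i.e. $R^2_c = 1$).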
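The three metrics in Eqs. 2.7 to 2.9 can be sketched for a single cell type as follows. This is a minimal illustration with invented proportions, not the code used for the benchmarking. Note that $R^2_c$ is implemented exactly as written in Eq. 2.9 (explained over total sum of squares), which matches the more familiar $1 - SS_{res}/SS_{tot}$ only for least-squares fitted values and can therefore exceed 1 for arbitrary predictions.

```python
import math


def rmse(pred, real):
    """Root mean squared error (Eq. 2.7)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, real)) / len(real))


def mae(pred, real):
    """Mean absolute error (Eq. 2.8)."""
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(real)


def r_squared(pred, real):
    """Coefficient of determination as defined in Eq. 2.9."""
    mean_real = sum(real) / len(real)
    explained = sum((p - mean_real) ** 2 for p in pred)
    total = sum((r - mean_real) ** 2 for r in real)
    return explained / total


# Invented real and predicted proportions (%) for one cell type.
real = [10.0, 20.0, 30.0]
pred = [12.0, 18.0, 33.0]
scores = (rmse(pred, real), mae(pred, real), r_squared(pred, real))
```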
The most accurate strategy, according to the RMSE (mean across cell types: 1.9270) and
MAE (mean across cell types: 1.5498), is ‘idol_NFB_houseman’ (Fig. 2.4, Fig. S1.5) i.e.
the strategy that uses the IDOL reference, with all the pre-processing steps from my main
pipeline for both reference and gold-standard (noob background correction, probe filtering
and BMIQ normalisation) and employs Houseman’s CP/QP algorithm (Fig. S1.4). This
strategy performed well in all the cell types (Fig. 2.5) and I selected it for my cell-type
deconvolution analyses.
It is important to mention that the gold-standard dataset was generated as part of the
same study where the IDOL reference was also derived [Koestler et al., 2016]. However, the
gold-standard samples were used as an independent validation of the IDOL reference and
should not influence the conclusions of the benchmarking that I performed. In the future, it
will be interesting to validate these conclusions using new gold-standard datasets generated
from whole blood.
Next, I ran the optimal blood cell-type deconvolution strategy in the DNA methylation
dataset that I built from healthy individuals (Table 2.1). The main goal of this analysis
was to provide blood cell type proportions that can be used as covariates as part of the
epigenetic clock modelling (see section 2.2.2). However, this also allowed me to broadly
quantify the changes in blood composition that occur during human ageing (Fig. 2.6).
The mammalian immune system undergoes dramatic changes during ageing. These changes
are normally referred to as immunosenescence and can be broadly defined as a decline in
immune system functionality and its ability to fight infections, which results in an increase
in morbidity and mortality with age [Nikolich-Žugich, 2018]. Furthermore, human ageing
is also characterised by an increase in chronic, low-grade inflammation referred to as
inflammageing, which is thought to contribute to the development of age-related diseases (such as
atherosclerosis, type 2 diabetes, Alzheimer’s disease and osteoporosis) [Franceschi, 2007].
In my dataset, I observe the following (Fig. 2.6):
• A relative decrease in cell types from the adaptive immune system (CD4+ T cells,
CD8+ T cells and CD19+ B cells). Interestingly, the decline in CD8+ T cells was
more pronounced (i.e. higher absolute value of the slope) than in the case of CD4+ T
cells, which has been previously reported [Czesnikiewicz-Guzik et al., 2008].
• A relative increase in cell types from the innate immune system (granulocytes, CD14+
monocytes and CD56+ natural killer cells).
[Figure 2.4, panels a and b. x-axis: cell-type deconvolution strategy (minfi; dhs_dif1, dhs_NB, dhs_dif2 and dhs_NFB, each combined with houseman, cibersort and rpc; idol_NB and idol_NFB, each combined with houseman, cibersort and rpc). y-axis: RMSE (a) and MAE (b). Legend (Cell): B, CD4T, CD8T, Gran, Mono, NK.]
Fig. 2.4 Benchmarking of the cell-type deconvolution strategies in blood. The x-axis shows the different
strategies that were tested (for a detailed description see Fig. S1.4). The y-axis shows the results for a. the root
mean squared error (RMSE) and b. the mean absolute error (MAE) when comparing the predictions with the
real proportions of cells in a gold-standard dataset (GSE77797) [Koestler et al., 2016]. The grey horizontal solid
lines represent the mean for the RMSE or the MAE across cell types and the grey dashed line the minimum of
these values.
[Figure 2.5: six scatterplot panels for the strategy idol_NFB_houseman, one per cell type (Gran, CD4T, CD8T, B, Mono, NK). x-axis: real cellular fractions (%); y-axis: predicted cellular fractions (%).]
Fig. 2.5 Comparison of the predictions for the different cell types using the optimal deconvolution strategy
(‘idol_NFB_houseman’) with the real cell type fractions in the gold-standard dataset (GSE77797) [Koestler
et al., 2016]. Each point corresponds to a different sample in the gold-standard. The black dashed line represents
the diagonal to aid visual interpretation.
These results are highly consistent with the literature [Chen et al., 2016b; Czesnikiewicz-
Guzik et al., 2008; Jaffe and Irizarry, 2014; Kuranda et al., 2011; Manser and Uhrberg, 2016;
Seidler et al., 2010], which validates the methodology for cell-type deconvolution that I have
used. These variations in blood cell composition may be caused by the age-related changes
that happen in the two primary lymphoid organs: the bone marrow (whose hematopoietic stem
cells exhibit reduced self-renewal potential and increased skewing towards myelopoiesis)
and the thymus (which undergoes tissue involution) [Chinn et al., 2012].
This analysis provides a preliminary overview of the blood cell composition landscape
during human ageing. However, only relative changes in blood composition were quantified
and the analysis is limited by the ‘cell types’ that I have deconvoluted (e.g. granulocytes
include different cell types, different subsets of monocytes exist, etc.), which means that
these conclusions must be taken with care [Nikolich-Žugich, 2018]. Furthermore, the sex of
the individual can influence the proportions of blood leukocytes [Chen et al., 2016b] and it
should be taken into account in future analyses.
2.1.4 Identifying differentially methylated positions during ageing
Differential methylation analysis is one of the most common types of downstream analyses
in the context of DNA methylation data [Morris and Beck, 2015; Teschendorff and Relton,
2018; Wilhelm-Benartzi et al., 2013]. It involves finding associations between the DNA
methylation levels at specific CpG sites in the genome (a.k.a. differentially methylated
positions or DMPs) and a given phenotypic variable of interest (e.g. a specific disease,
when compared with a healthy sample). It is worth mentioning that DMPs are also called
differentially methylated cytosines (DMCs) in the literature [Teschendorff and Relton, 2018].
In order to study the changes that the methylome undergoes during physiological ageing, it
is useful to identify differentially methylated positions during ageing (aDMPs) i.e. individual
cytosines (normally found in a CpG context) that change their methylation status as a
function of chronological age. Linear models, widely used in the context of differential
RNA expression analysis [Ritchie et al., 2015], can also be adapted to find aDMPs
[Teschendorff and Relton, 2018; Zhuang et al., 2012]. In the case of a continuous variable
(such as chronological age) the association is performed using a linear regression modelling
framework [Zhuang et al., 2012] (see section 2.4 for a short description of linear regression
and the nomenclature used throughout this thesis). Briefly, for each probe in the methylation
array, I fitted the following linear regression models to the data from the healthy individuals:
[Figure 2.6: six scatterplot panels. x-axis: chronological age (years); y-axis: cell type proportion (%). Panel annotations:
Gran: SCC = 0.3071; p-value < 2.2e-16; model: y = 49.1307 + 0.1674*x
CD4T: SCC = -0.1998; p-value < 2.2e-16; model: y = 19.9565 - 0.0522*x
CD8T: SCC = -0.3754; p-value < 2.2e-16; model: y = 12.4339 - 0.0835*x
B: SCC = -0.405; p-value < 2.2e-16; model: y = 8.9931 - 0.0695*x
Mono: SCC = 0.1094; p-value = 2.43e-07; model: y = 5.2971 + 0.0066*x
NK: SCC = 0.2064; p-value < 2.2e-16; model: y = 3.9194 + 0.0313*x]
Fig. 2.6 Changes in blood cell composition during human ageing. Scatterplots showing the changes in the
proportions of the six cell types considered (inferred using the cell-type deconvolution strategy) as a function
of chronological age. Each point represents a different DNA methylation human sample from Table 2.1. The
black line displays the linear model %cell_type ∼ Age (see section 2.4 for more details on linear modelling),
with the slope and intercept shown in the titles. The Spearman’s correlation coefficient (SCC) and the p-value
associated with it are also displayed.
• A model with cell composition correction (CCC). As I have shown previously, the
different blood cell types change their abundance with age. Therefore, in order to
maximise the chances of finding aDMPs that are conserved across different cell types,
it is important to include the estimated cell proportions as covariates in the model:
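\[ \mathrm{Beta} \sim \mathrm{Age} + \mathrm{Sex} + \mathrm{Gran} + \mathrm{CD4T} + \mathrm{CD8T} + \mathrm{B} + \mathrm{Mono} + \mathrm{NK} + \mathrm{PC}_1 + \dots + \mathrm{PC}_{17} \tag{2.10} \]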
where Beta is the β-value for the array probe being evaluated; Age is the chronological
age (in years) of the samples; Sex encodes the sex of the samples (0/1); Gran,
CD4T, CD8T, B, Mono and NK are the cell type proportions from the samples as
calculated with my cell-type deconvolution strategy and $\mathrm{PC}_N$ is the N-th principal
component that captures technical variance and accounts for potential batch effects
(see section 2.2.3 for more details).
• A model without CCC, which can be expressed as:
\[ \mathrm{Beta} \sim \mathrm{Age} + \mathrm{Sex} + \mathrm{PC}_1 + \dots + \mathrm{PC}_{17} \tag{2.11} \]
This leads to the identification of aDMPs which will be more confounded with the
proportions of the different cell types (i.e. the change in β -value with age could be
entirely driven by a change in a specific cell type that is differentially methylated at
that particular probe).
Furthermore, for each probe, I calculated a p-value, based on t-statistics [Teschendorff
and Relton, 2018], to assess whether the putative linear association between the methylation
status and chronological age was significant or not (at a significance level of α = 0.01 after
applying Bonferroni correction to account for multiple testing, see section 2.4 for more
details). I used a customised version of the dmpFinder function in the minfi R package
[Aryee et al., 2014] to identify the aDMPs, which internally uses the limma framework
[Ritchie et al., 2015]. Given the large sample size (N = 2218 ≫ 10), I did not use variance
shrinkage (i.e. empirical Bayes moderated t-statistics) as part of the statistic calculations
[Ritchie et al., 2015].
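The per-probe test can be sketched as an ordinary least-squares fit followed by a Bonferroni-corrected threshold. This is a simplified illustration and not the customised dmpFinder code: it uses only Age and Sex as covariates, the data are synthetic, the p-value uses a normal approximation to the t distribution (an assumption that is only reasonable for large N), and the number of tests is assumed to be the 428266 probes that passed pre-processing.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def age_association(beta, covariates):
    """OLS fit of beta on [1, Age, ...]; Age must be the first covariate.

    Returns (age coefficient, t-statistic, two-sided p-value).
    """
    X = [[1.0] + list(row) for row in covariates]
    n, k = len(X), len(X[0])
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(k)] for a in range(k)]
    Xty = [sum(x[a] * y for x, y in zip(X, beta)) for a in range(k)]
    coef = solve(XtX, Xty)
    residuals = [y - sum(c * v for c, v in zip(coef, x)) for x, y in zip(X, beta)]
    sigma2 = sum(r * r for r in residuals) / (n - k)
    # Variance of the age coefficient: sigma^2 * [(X^T X)^-1]_{age,age}.
    age_unit = [1.0 if j == 1 else 0.0 for j in range(k)]
    var_age = sigma2 * solve(XtX, age_unit)[1]
    t_stat = coef[1] / math.sqrt(var_age)
    # Normal approximation to the t distribution (assumption).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return coef[1], t_stat, p_value


# Synthetic probe: beta gains 0.002 per year, plus deterministic "noise".
ages = [20 + 2 * i for i in range(30)]
sexes = [i % 2 for i in range(30)]
noise = [0.005 * ((7 * i) % 5 - 2) for i in range(30)]
betas = [0.3 + 0.002 * a + 0.01 * s + e for a, s, e in zip(ages, sexes, noise)]

slope, t_stat, p_value = age_association(betas, list(zip(ages, sexes)))

N_PROBES = 428266             # assumed number of tests (probes after pre-processing)
threshold = 0.01 / N_PROBES   # Bonferroni-corrected alpha, roughly 2.3e-8
is_admp = p_value < threshold
```

Under these assumptions a probe is called an aDMP only if its p-value falls below roughly 2.3e-8, which illustrates how conservative the Bonferroni correction is at array scale.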
An overview of the different aDMPs (with and without CCC) identified in the healthy
individuals can be found in Figure 2.7. Around 30% of the blood methylome (at least
according to the 450K array) is affected by the ageing process during human lifespan.
However, it is worth mentioning that Bonferroni correction provides a very conservative
picture of the methylomic changes (when compared with other methods to control for type-I
error, like FDR) and it is likely that an even greater proportion of the methylome is indeed
altered with age [Zhu et al., 2018]. CpG sites can become either hypomethylated (i.e. lose
methylation with age) or hypermethylated (i.e. gain methylation with age). Importantly,
the effect sizes of the age coefficient (i.e. the observed changes in the β -values per year)
are generally small. More specifically, in the model with CCC, the median age coefficient
for the hypomethylated aDMPs is -0.000426 (equivalent to a -4.26% methylation change
over 100 years of human life) and for hypermethylated aDMPs is 0.000437 (equivalent to
a +4.37% methylation change over 100 years of human life). This is consistent with the
progressive functional decline observed during ageing [Lopez-Otin et al., 2013]. It is worth
mentioning that around 50% of the CpG sites that constitute the Horvath epigenetic clock
are blood aDMPs according to my analysis (Fig. 2.7c,d). Overall, these results are consistent
with previous studies [Slieker et al., 2018, 2016; van Dongen et al., 2016; Zhu et al., 2018].
Next, I looked at the top 100 aDMPs that were identified (according to their p-value
and t-statistic, Fig. S1.6 and Fig. 2.8). The first aDMP in the list was cg16867657, a probe
that consistently gains methylation with age (Fig. 2.8a) and has been previously identified
as the strongest aDMP across tissues and human populations in several studies [Bacalini
et al., 2017; Garagnani et al., 2012; Gopalan et al., 2017; Hannum et al., 2013; Slieker et al.,
2018; Zbiec´-Piekarska et al., 2015]. cg16867657 is associated with the CpG island in the
promoter of the ELOVL2 gene, which encodes an enzyme that catalyses one of the reactions
in the elongation of polyunsaturated fatty acids [Gopalan et al., 2017]. Furthermore, other
aDMPs that were located among my top hits have previously been reported as well (such as
cg06639320 in the FHL2 gene, which is the second aDMP, Fig. 2.8b) [Garagnani et al., 2012].
These results validate the statistical methods used so far to process the DNA methylation
data and to identify aDMPs.
It is important to mention that not all the CpG sites change their DNA methylation levels
with age in a perfectly linear manner. For instance, the two top hypomethylated aDMPs
(Fig. 2.8c,d) modify their rate at ages 20-25 years. This was already recognised by Horvath
[Horvath, 2013a] and that is why he transformed the age into a logarithmic scale before the
age of 20 years in order to improve the model fit (see section 2.2.1). Furthermore, genetic
background can have a significant effect on the DNA methylation patterns and interact with
the ageing process to shape the epigenome [Hannum et al., 2013; van Dongen et al., 2016].
Unfortunately, I did not have genetic data for the healthy individuals but this could help
[Figure 2.7, panels a to d. Barplots (a: aDMPs with CCC; b: aDMPs without CCC): y-axis: number of aDMPs; legend (Methylation change): Hypermethylated, Hypomethylated, No change; bar labels in extraction order: 70.61%, 13.24%, 16.14% (a) and 70.39%, 10.53%, 19.08% (b). Volcano plots (c, d): not recoverable from the extracted text.]
Fig. 2.7 The blood methylome changes during physiological human ageing. a. Barplot showing the total
number of differentially methylated positions during ageing (aDMPs) that were identified (in grey: probes
that did not reach statistical significance). In this case, the model with cell composition correction (CCC) was
applied. b. As in a., but using the model without CCC. c. Volcano plot showing the relationship between the
p-value (y-axis) and the effect size (x-axis) of the age coefficient for each one of the array probes (each point
represents a probe). Those probes above the dashed green line (α = 0.01 after Bonferroni correction) are the
identified aDMPs. Above the volcano plot, a density plot captures the distributions of the age coefficient for
the hypermethylated aDMPs (in red) and the hypomethylated aDMPs (in blue). In this case, the model with
CCC was applied. The black points are the 353 CpG probes that constitute the Horvath epigenetic clock model
[Horvath, 2013a]. d. As in c., but using the model without CCC.
[Figure 2.8: four scatterplot panels (cg16867657, cg06639320, cg19283806, cg10501210). x-axis: chronological age (years); y-axis: β-value.]
Fig. 2.8 Changes in the β -values of four differentially methylated positions during ageing (aDMPs) in the blood
of the healthy individuals. cg16867657 and cg06639320 are the top aDMPs that gain methylation with age
(i.e. become hypermethylated) according to the model that accounts for cell composition correction (CCC).
cg19283806 and cg10501210 are the top aDMPs that lose methylation with age (i.e. become hypomethylated)
according to the model that accounts for CCC. In order to aid visualisation, the black line displays the linear
model β -value ∼ Age.
to refine the identification of aDMPs in the future. Additionally, it would be interesting to
apply methods to control for bias and inflation in the test statistic, by estimating the empirical
null distribution of the observed set of test statistics [van Iterson et al., 2017]. Finally,
other types of epigenetic features can be derived to understand the effects of ageing in the
epigenome, such as variably methylated positions during ageing (aVMPs) [Slieker et al.,
2016], differentially methylated regions (DMRs, which consider several correlated CpGs
at the same time) [Teschendorff and Relton, 2018] or differentially methylated cytosines in
individual cell types (DMCTs, which consider interactions between the phenotypic variable
and the proportions of cell types) [Zheng et al., 2018].
2.1.5 Shannon methylation entropy
Shannon entropy (H) can be used in the context of DNA methylation analysis to estimate the
information content stored in a given set of CpG sites [Hannum et al., 2013; Jenkinson
et al., 2017; Slieker et al., 2016; Wang et al., 2017; Xie et al., 2011]. I calculated it using the
same approach as in Hannum et al. [2013]:
\[ H = -\frac{1}{N} \cdot \sum_{i=1}^{N} \left[ \beta_i \cdot \log_2(\beta_i) + (1 - \beta_i) \cdot \log_2(1 - \beta_i) \right] \tag{2.12} \]
where $\beta_i$ represents the methylation β-value for the i-th array probe (or CpG site) and
$N = 428266$ if all the array probes that passed the pre-processing pipeline are considered (i.e.
genome-wide, or at least array-wide). Shannon entropy is minimised when the methylation
levels of all the CpGs are either 0% or 100%, and maximised when all of them are 50%
(Fig. 2.9).
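Eq. 2.12 can be sketched directly. The snippet below is a minimal illustration, assuming the usual convention that $0 \cdot \log_2(0) = 0$ so that fully methylated or unmethylated sites contribute no entropy; the input β-values are invented.

```python
import math


def shannon_entropy(betas):
    """Mean per-site Shannon methylation entropy (Eq. 2.12)."""
    def site_term(b):
        # Convention (assumption): 0 * log2(0) is treated as 0.
        return sum(x * math.log2(x) for x in (b, 1.0 - b) if x > 0.0)
    return -sum(site_term(b) for b in betas) / len(betas)


h_max = shannon_entropy([0.5, 0.5, 0.5])   # every site at 50% methylation
h_min = shannon_entropy([0.0, 1.0, 1.0])   # every site at 0% or 100%
h_mid = shannon_entropy([0.25])            # a single intermediate site
```

The two extreme inputs reproduce the behaviour described in the text: entropy is 1 when all sites are at 50% methylation and 0 when all sites are fully methylated or unmethylated.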
Next, I calculated the genome-wide Shannon entropy for the blood samples in the healthy
individuals. Consistent with previous reports [Hannum et al., 2013; Jenkinson et al., 2017;
Slieker et al., 2016; Wang et al., 2017], the genome-wide Shannon entropy associated
with the methylome increases during ageing (Fig. 2.10a; Spearman correlation coefficient
= 0.1985; p-value = $3.8281 \times 10^{-21}$), which implies that the epigenome loses information
content. Finally, it is worth mentioning that I observed a remarkable batch effect on the
Shannon entropy calculations, which can generate high entropy variability for a given
age (Fig. 2.10b). However, after removing potential outlier batches (such as GSE41273,
GSE59065 or GSE97362) the increase of Shannon methylation entropy during ageing was
still consistent. Thus, accounting for technical variation (see section 2.2.3) becomes crucial
when assessing this type of data, even after careful pre-processing.
[Figure 2.9. x-axis: β-value (of a CpG site); y-axis: Shannon entropy (of a CpG site).]
Fig. 2.9 Plot showing the relationship between the β -value and the methylation Shannon entropy at a given
CpG site (in my case, at a given array probe).
[Figure 2.10, panels a and b. x-axis: chronological age (years); y-axis: genome-wide Shannon entropy. Panel b legend (Batch): Europe, Feb_2016, GSE104812, GSE111629, GSE40279, GSE41273, GSE42861, GSE51032, GSE55491, GSE59065, GSE61496, GSE74432, GSE81961, GSE97362.]
Fig. 2.10 a. Scatterplot showing the changes in genome-wide methylation Shannon entropy during ageing in
the healthy individuals. Each sample is represented by one point. The black line displays the linear model
Entropy ∼ Age. b. Same as in a., but colouring the samples according to the batch where they came from.
2.2 Behaviour of Horvath’s epigenetic clock during ageing
2.2.1 Calculating epigenetic age using Horvath’s epigenetic clock
Steve Horvath’s model, originally published in 2013 [Horvath, 2013a], is without any doubt
the most widely used epigenetic clock in the literature. Given that it works across tissues
with high accuracy and that it has been validated in many human cohorts, I have used it as
the main tool to quantify epigenetic ageing in this work.
Horvath’s model measures epigenetic age (a.k.a. DNAmAge) by making use of the DNA
methylation levels at 353 CpG sites, as quantified with the Illumina methylation arrays
(27K or 450K). Previous studies have generally employed a ready-to-use online calculator
for DNAmAge provided by Steve Horvath [Horvath, 2013b]. This has clearly simplified
the computational process and helped a lot of research groups to test the behaviour of the
epigenetic clock in their system of interest. However, this has also led to the treatment of the
epigenetic clock as a ‘black-box’, without critical assessment of the statistical methodology
behind it. Therefore, I decided to replicate the original code and to make it available
in a GitHub repository for the scientific community to use [Martin-Herranz, 2019].
Furthermore, I tested the impact of different steps involved in the estimation of epigenetic
age acceleration (EAA), including the presence/absence of background correction, removal
of technical variation from batch effects and the importance of the age distribution when
fitting the control models, which I discuss in the following sections.
The main pipeline to calculate the epigenetic age (DNAmAge) from a sample has the
following steps (some of them are shared with the previously described pipeline for DNA
methylation pre-processing in section 2.1.2):
1. Background correction. I implemented a pipeline that starts with the raw DNA
methylation data (IDAT files) for a sample. First, I tested the effect of applying noob
background correction, before calculating the β -values, on the median absolute error
(MAE) of the predictions (see section 2.2.2). Background correction did not have
a major impact in the final predictions as long as I also corrected for batch effects
(Fig. S1.7, Fig. 2.13c, see section 2.2.3). Therefore, I decided to keep the noob
background correction for consistency with the other pre-processing pipeline.
2. Quality control. I applied the same criteria as previously described in section 2.1.2.
3. Probe filtering. Horvath’s model was originally trained starting with 21368 array
probes that had the following characteristics [Horvath, 2013a]:
• They were shared between the 27K and 450K methylation arrays.
• They had ≤ 10 missing values across all the training data.
Therefore, these were the probes selected for downstream analysis.
4. β -value calculation. β -values were calculated as previously described in section 2.1.2.
It is worth mentioning that Horvath’s original code includes two alternatives for the
imputation of missing β -values:
• Slow imputation (applied when the number of missing β -values is < 3000). In this
case, k-nearest neighbours (KNN) is used. KNN imputation borrows information
from the DNA methylation profiles of the most similar probes (the neighbours)
according to a metric (normally the Euclidean distance). The impute.knn function
from the impute R package can be used for these purposes [Troyanskaya et al.,
2001].
• Fast imputation (applied when the number of missing β -values is ≥ 3000). In
this case, the values from the blood gold-standard (see below) can be used as the
imputed values.
In the case of my dataset, no missing values were present for the 21368 probes so there
was no need to perform imputation.
5. Gold-standard normalisation. A modified version of BMIQ normalisation is used
[Teschendorff et al., 2012]. In this case, instead of mapping the distribution of the
Infinium II probes to the distribution of Infinium I probes, the mapping is done from
the distribution of the 21368 probes in the sample to the distribution of a previously
derived gold-standard for the same set of probes. This gold-standard was created by
taking the average β -values for the 21368 probes across all the whole blood samples
from Horvath et al. [2012].
6. Calculating epigenetic age (DNAmAge). As previously observed for some of the
aDMPs, the rate of β -value change can be different before and after adult age (Fig. 2.8).
For this reason, Horvath performed a transformation of the chronological age before
training the model:
\[ f(c) = c_t = \begin{cases} \ln\left(\dfrac{c+1}{a+1}\right) & \text{if } c \le a \\[2ex] \dfrac{c-a}{a+1} & \text{if } c > a \end{cases} \tag{2.13} \]
where $c_t$ is the transformed chronological age that was used as the dependent variable
during training, c is the chronological age (in years) and a is the adult age (for
humans, 20 years). This transformation allows accounting for a relationship between
chronological age and methylation changes that is logarithmic until adult age and linear
afterwards (Fig. 2.11).
[Fig. 2.11 plot: transformed chronological age (y-axis, from −2 to 4) against chronological age in years (x-axis, from 0 to 100).]
Fig. 2.11 Plot showing the relationship between the chronological age in years (c) and the transformed
chronological age (ct ) in Horvath’s model. This transformation allows accounting for different rates of β -value
change before and after adult age (20 years in humans, as pointed out by the dashed black line).
Given a sample to predict, the epigenetic age can then be calculated as:
$$
\mathrm{DNAmAge} = g(\hat{c}_t) = g\left(\hat{\beta}_0 + \sum_{i=1}^{353} \hat{\beta}_i \cdot x_i\right)
\tag{2.14}
$$
where ĉ_t is the predicted transformed age according to Horvath’s model, β̂_0 is the
intercept of Horvath’s model, β̂_i is the coefficient (weight) for the ith probe (only
353 probes are finally used), x_i is the β-value for the ith probe after gold-standard
normalisation and g(·) is the inverse of f(·), such that:
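$$
g(\hat{c}_t) = f^{-1}(\hat{c}_t) = \hat{c} =
\begin{cases}
e^{\hat{c}_t} \cdot (a+1) - 1 & \text{if } \hat{c}_t \le 0 \\
\hat{c}_t \cdot (a+1) + a & \text{if } \hat{c}_t > 0
\end{cases}
\tag{2.15}
$$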
where cˆ is the predicted age according to Horvath’s model (i.e. DNAmAge).
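The age transformation and its inverse (equations 2.13 to 2.15) can be sketched in a few lines of Python. This is only an illustration and not Horvath’s original code: the function names are hypothetical, and the intercept, weights and β-values passed to `dnam_age` are placeholders that would have to come from the published model and from the normalised data.

```python
import math

ADULT_AGE = 20.0  # adult age a for humans, as stated in the text


def transform_age(c, a=ADULT_AGE):
    """Equation 2.13: logarithmic up to adult age, linear afterwards."""
    if c <= a:
        return math.log((c + 1.0) / (a + 1.0))
    return (c - a) / (a + 1.0)


def inverse_transform_age(ct, a=ADULT_AGE):
    """Equation 2.15: map a transformed age back to years."""
    if ct <= 0:
        return math.exp(ct) * (a + 1.0) - 1.0
    return ct * (a + 1.0) + a


def dnam_age(intercept, weights, betas, a=ADULT_AGE):
    """Equation 2.14: linear predictor on beta-values, then the inverse transform."""
    linear_predictor = intercept + sum(w * x for w, x in zip(weights, betas))
    return inverse_transform_age(linear_predictor, a)
```

Both branches meet at the adult age (f(a) = 0 and g(0) = a), so the transformation is continuous at 20 years.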
2.2.2 Horvath’s epigenetic clock measures physiological ageing
Using the methodology from the previous section, I calculated the epigenetic age (DNAmAge)
in the blood of the healthy individuals. Given that these individuals are supposed to be
disease-free, Horvath’s epigenetic clock should predict epigenetic ages that are similar to
the chronological age of the samples, and this was indeed the case (Fig. 2.12a, Pearson’s
correlation coefficient (PCC)= 0.9671, p-value≈ 0). This validates that Horvath’s epigenetic
clock does indeed measure the ageing process (at least in a cross-sectional population) and
sets a foundation for the rest of the analyses presented in this thesis.
As mentioned in Chapter 1, the difference between epigenetic age and chronological age
is known as epigenetic age acceleration (EAA), with a positive EAA (i.e. DNAmAge>Age)
associated with several age-related health problems. In order to calculate the EAA for the
healthy individuals, I fitted the following linear regression models (hereinafter referred to as the
control models):
• With cell composition correction (CCC):
DNAmAge ∼ Age + Sex + Gran + CD4T + CD8T + B + Mono + NK + PC1 + ... + PC17 (2.16)
where DNAmAge is the epigenetic age calculated with Horvath’s epigenetic clock;
Age is the chronological age (in years) of the samples; Sex encodes for the sex of the
samples (0/1); Gran, CD4T , CD8T , B, Mono and NK are the cell type proportions
from the samples as calculated with my cell-type deconvolution strategy and PCN is
the Nth principal component that captures technical variance and accounts for potential
batch effects (see section 2.2.3 for more details).
Horvath’s epigenetic clock was trained using multiple tissues and its predictions should
be robust to changes in blood cell composition. However, previous studies have
highlighted that adding this correction can improve the ability to detect ‘pure’ ageing
effects [Chen et al., 2016a; Horvath et al., 2016a] (i.e. epigenetic age acceleration
mainly caused by DNA methylation changes that happen in the nucleus of all cell
types). For a given sample, the EAAwith CCC is the residual from the model i.e. the
difference between the actual DNAmAge and the prediction from the control model
(which is conceptually similar to the difference between DNAmAge and chronological
age, but accounting for the rest of covariates as well). The EAAwith CCC that I have
defined is very similar to the previously reported measure of ‘intrinsic EAA’ (IEAA)
[Chen et al., 2016a; Horvath et al., 2016a].
• Without CCC:
DNAmAge ∼ Age + Sex + PC1 + ... + PC17 (2.17)
In this case the residuals of the model are referred to as the EAAwithout CCC for the
different samples.
It is possible to calculate the overall accuracy of the predictions using the median absolute
error (MAE), which is calculated as:
MAE = median{|EAAi|} (2.18)
where EAAi is the epigenetic age acceleration for the ith sample calculated with one
of the models (with CCC or without CCC). The MAE for all the healthy individuals (full
lifespan) in the control models should approach zero, and this was indeed what I observed
(MAEwith CCC = 2.7117 years, MAEwithout CCC = 2.8211 years). These results are below the
original MAE reported by Horvath in his test set (3.6 years) [Horvath, 2013a]. However, it is
worth mentioning that some of the samples from my healthy individuals (such as samples
from batches GSE40279 and GSE42861) could have been used by Horvath as part of his
training set [Horvath, 2013a], and therefore these results must be interpreted carefully.
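As an illustration of how the EAA and the MAE relate to each other, the following Python sketch fits a deliberately simplified control model (DNAmAge ∼ Age only, without the sex, cell composition or PC covariates of equations 2.16 and 2.17), takes the residuals as the EAA and computes their median absolute value (equation 2.18). The function names are hypothetical.

```python
from statistics import mean, median


def fit_simple_regression(x, y):
    """Least-squares intercept and slope for the model y ~ x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def epigenetic_age_acceleration(age, dnam_age):
    """EAA = residuals of the simplified control model DNAmAge ~ Age."""
    intercept, slope = fit_simple_regression(age, dnam_age)
    return [d - (intercept + slope * a) for a, d in zip(age, dnam_age)]


def median_absolute_error(eaa):
    """Equation 2.18: median of the absolute EAA values."""
    return median(abs(e) for e in eaa)
```

Because the model contains an intercept, the residuals sum to zero by construction. The MAE therefore summarises the spread of the EAA around the control model rather than any overall offset.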
Even though Horvath’s model seems to predict epigenetic age accurately, it is also clear
that some samples deviate substantially from the expected prediction. This is especially
obvious for the older samples (> 55 years), which have a systematically younger epigenetic age
than expected (see deviations from the diagonal in Fig. 2.12a). If a control model is fitted to the
full lifespan dataset (in which around 50% of the samples are > 55 years), this leads to
a model with a smaller than expected age coefficient (slope), which introduces a bias when
estimating epigenetic age acceleration for different age groups (Fig. 2.12b). Although many
studies do not take this problem into account, this phenomenon has been previously reported
in the context of humans [El Khoury et al., 2018; Marioni et al., 2018] and mice [Stubbs
[Fig. 2.12 plots, four panels (a-d). a: DNAmAge (years) against chronological age (years), full lifespan control (N = 2218). b: epigenetic age acceleration (years) by age group (young, middle and old age) for the EAA models with and without CCC, full lifespan control (N = 2218). c: DNAmAge (years) against chronological age (years), 0−55 years control (N = 1128). d: epigenetic age acceleration (years) by age group (young and middle age) for the EAA models with and without CCC, 0−55 years control (N = 1128).]
Fig. 2.12 Horvath’s epigenetic clock measures physiological ageing. a. Scatterplot showing the relationship
between epigenetic age (DNAmAge) according to Horvath’s model [Horvath, 2013a] and chronological age
of the samples for the healthy individuals. Each sample is represented by one point. The black dashed line
represents the diagonal to aid visualisation. The solid brown line represents the linear model DNAmAge∼ Age,
which deviates from the diagonal if the full lifespan samples are used. b. Boxplots displaying the epigenetic age
acceleration (EAA) distributions for different age ranges (young age: ≤ 20 years; middle age: 20 < Age≤ 55
years; old age: > 55 years) after fitting the control models to the full lifespan samples. The dashed black line
represents EAA = 0, around which the distributions should be centred. This is not the case for the samples in
the young age and middle age groups. In red: EAA model with cell composition correction (CCC). In blue:
EAA model without CCC. c. As in a., but removing the samples in the old age group (> 55 years). The solid
green line represents the linear model DNAmAge ∼ Age, which is much more similar to the diagonal if only
young and middle age samples are considered. d. As in b., but fitting the control models to the samples in the
young and middle age groups (0-55 years). The bias in the EAA is corrected in this case (the distributions are
centred around zero for the different age groups).
et al., 2017]. However, to date, it is unclear whether it represents a technical artefact
or has a biological explanation (e.g. survivor bias in the older individuals or a slowdown with
age of the molecular processes that drive ageing).
This highlights the importance of having a properly age-matched control when performing
analyses with Horvath’s epigenetic clock. As expected, removing the older
samples (> 55 years) from the control models corrected for this bias (Fig. 2.12c,d) and
reduced the MAE (MAEwith CCC = 2.2742 years, MAEwithout CCC = 2.3237 years). This is
the strategy that I used when screening for epigenetic age acceleration in the context of
developmental disorders (see Chapter 3).
2.2.3 Correcting for batch effects in the context of the epigenetic clock
As mentioned in the previous section, it is expected that, after fitting the control models, the
EAA distributions of the samples from the healthy individuals should be centred around zero.
However, when the principal components (PCs) that capture technical variation were not
included in the control models (see equations 2.16 and 2.17), this was not the case for several
batches (Fig. 2.13a, Fig. S1.8a). Therefore, I hypothesised that technical variation can affect
the predictions from Horvath’s epigenetic clock and that batch effects need to be explicitly
accounted for in this context, even after applying the internal normalisation step against the
blood gold-standard [Horvath, 2013a]. This section explains how I implemented this batch
effect correction (i.e. how I derived the principal components that capture technical variance
across batches).
A batch effect is a systematic technical source of variation that is unrelated to the
biological or scientific variables in a study [Leek et al., 2010]. Batch effects affect low- and
high-throughput measurements and can be caused by a wide variety of situations: different
technicians performing the experiments, different laboratories generating the data, different
lots of reagents or arrays used, etc. [Leek et al., 2010]. Correcting for batch effects is
crucial, especially when integrating data from different studies and sources [Maksimovic
et al., 2015], as is the case in the analyses presented in this thesis. Data generated by DNA
methylation arrays is also affected by batch effects and several methods have been described
in the literature to correct for them, normally at the level of probe intensities [Fortin et al.,
2014] or M-values [Maksimovic et al., 2015; Price and Robinson, 2018]. In the context
of the epigenetic clock, previous attempts to account for technical variation have used the
first five PCs estimated directly from the DNA methylation data (presumably the β -values)
[Horvath et al., 2016b]. However, this approach potentially removes meaningful biological
variation, especially in studies with global changes in DNA methylation, such as cancer
[Fortin et al., 2014] or developmental disorders (see Chapter 3). Furthermore, given that
Horvath’s epigenetic clock was trained with data pre-processed using different strategies, it is
unclear how applying an additional batch effect correction step to the intensities or β -values
would impact the predictions [Horvath, 2013c].
Thus, I decided to correct for the potential batch effects when fitting the control models
(see equations 2.16 and 2.17). I made use of the control probes present on the 450K array,
which have been shown to carry information about unwanted variation from a technical source
(i.e. technical variance) [Fortin et al., 2014; Gagnon-Bartsch and Speed, 2012; Maksimovic
et al., 2015]. These probes are designed to capture technical variance in negative controls,
measure between-array differences and quantify the performance of different steps of the
array protocol, such as bisulfite conversion, staining or hybridisation [Fortin et al., 2014;
Illumina, 2010]. I performed principal component analysis (PCA, with centering but not
scaling using the prcomp function in R) on the raw intensities of the control probes (847
probes · 2 channels = 1694 intensity values) for all the healthy individuals (N = 2218) and
the samples with developmental disorders (cases, N = 666, see Chapter 3). This showed
that the first two PCs capture the batch structure in both healthy individuals (Fig. 2.13b)
and cases (Fig. S1.9). Including the first 17 PCs as part of the epigenetic age acceleration
(EAA) modelling (see equations 2.16 and 2.17), which together accounted for 98.06% of the
technical variance in all the samples (Fig. S1.10), significantly reduced the median absolute
error (MAE) of the predictions in the healthy individuals (MAEwith CCC = 2.7117 years,
MAEwithout CCC = 2.8211 years, mean MAE = 2.7664, Fig. 2.13c). Notably, the reduction in
the MAE provided by the batch effect correction was higher than the improvement provided
by cell composition correction, a common practice in the epigenetic clock field [Chen et al.,
2016a; Horvath et al., 2016a]. The optimal number of PCs was found by making use of the
findElbow function from [Akalin, 2014].
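To make the idea of ‘PCA with centering but not scaling’ concrete, the sketch below extracts only the first principal component of a small intensity matrix (samples in rows, control-probe intensities in columns) and reports the fraction of variance it explains. It uses power iteration on the covariance matrix, which is a swapped-in technique chosen to keep the example dependency-free; it is not how prcomp computes the components, and the function names are hypothetical. It assumes that the start vector is not orthogonal to the leading component.

```python
import math


def centre_columns(rows):
    """Subtract each column mean (centering without scaling)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance_matrix(centred):
    """Sample covariance matrix of already centred data."""
    n, p = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_principal_component(rows, n_iter=500):
    """Return (loadings, fraction of variance explained) for PC1."""
    cov = covariance_matrix(centre_columns(rows))
    p = len(cov)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Cv gives the variance along the unit vector v
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / total_variance
```

Without scaling, columns with large raw intensities dominate the leading components, which is the intended behaviour when the aim is to capture technical variation in the intensities themselves.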
Finally, deviations from a median EAA close to zero in some of the batches after batch
effect correction (Fig. 2.13d, Fig. S1.8b) could be explained by other variables, such as a small
batch size or an overrepresentation of young samples (Fig. 2.14). The latter is a consequence
of the fact that Horvath’s model underestimates the epigenetic ages of older samples, which I
have discussed in the previous section. Thus, I have shown that correcting for batch effects
in the context of the epigenetic clock is important, especially when combining datasets from
different sources for meta-analysis purposes. Batch effect correction is essential to remove
technical variance that could affect the epigenetic age of the samples and confound
biological interpretation. Furthermore, given the flexibility of this modelling approach, I have
[Fig. 2.13 plots, four panels (a-d). a: EAA with CCC (years) per batch, batch effect correction: FALSE, MAE: 3.0881. b: PC1 (68.99%) against PC2 (10.33%) for the healthy individuals, coloured by batch. c: MAE in control against number of PCs for the four combinations of corrections (CCC: No/Yes | Batch: No/Yes); optimal number of PCs: 17; optimal mean MAE: 2.7664; background correction: noob. d: EAA with CCC (years) per batch, batch effect correction: TRUE, MAE: 2.7117. Batches: Europe, Feb_2016, GSE104812, GSE111629, GSE40279, GSE41273, GSE42861, GSE51032, GSE55491, GSE59065, GSE61496, GSE74432, GSE81961, GSE97362.]
Fig. 2.13 Correcting for batch effects in the context of the epigenetic clock. a. Distribution of the epigenetic age
acceleration (EAA) for the different batches of healthy individual samples, using the control model with cell
composition correction (CCC) and before applying batch effect correction. The dashed black line represents
EAA = 0, around which the distributions should be centred. b. Scatterplot showing the values of the first two
principal components (PCs) for the healthy individual samples after performing PCA on the control probes of
the 450K arrays. Each point corresponds to a different sample and the colours represent the different batches.
The different batches cluster together in the PCA space, showing that the control probes indeed capture technical
variation. Please note that all the PCA calculations were done using samples from both healthy individuals (full
lifespan, N = 2218) and cases from developmental disorders (N = 666, see Chapter 3). c. Plot showing how
the median absolute error (MAE) of the prediction in the healthy individual samples, that should tend to zero,
is reduced when the PCs capturing the technical variation are included as part of the modelling strategy (see
equations 2.16 and 2.17). The dashed line represents the optimal number of PCs (17) that was finally used. The
optimal mean MAE is calculated as the average MAE between the green and purple lines. d. As in a., but after
applying batch effect correction (i.e. equivalent to equation 2.16).
applied batch effect correction across other types of analyses in the thesis, such as DMP
identification (see equation 2.10).
[Fig. 2.14 plot: median EAA with CCC (years) against median age (years), one point per batch, with a point-size legend (200, 400, 600) and a colour legend for the batches (Europe, Feb_2016, GSE104812, GSE111629, GSE40279, GSE41273, GSE42861, GSE51032, GSE55491, GSE59065, GSE61496, GSE74432, GSE81961, GSE97362).]
Fig. 2.14 After applying batch effect correction in the samples from the healthy individuals, deviations from a
median epigenetic age acceleration (EAA) of zero (dotted black line) in some of the batches can be explained
by other causes. The grey line separates, in the lower left corner, the outlying batches (Feb_2016, GSE104812,
GSE41273, GSE55491), which have a small sample size and/or a low median age.
2.3 Behaviour of other epigenetic clocks during ageing
2.3.1 Hannum’s epigenetic clock
Besides Horvath’s epigenetic clock, other models have been proposed in the literature to
measure the ageing process using DNA methylation. Among them, Hannum’s epigenetic
clock has also been shown to accurately predict epigenetic age in several cohorts [Chen
et al., 2016a; Horvath et al., 2016a; Irvin et al., 2018; Marioni et al., 2018, 2015; Perna et al.,
2016]. Hannum’s model was originally trained in whole blood and it makes use of a linear
combination of β -values from 71 probes in the 450K array.
I calculated the epigenetic ages according to Hannum’s model (HannumAge), although I
only used 68 out of the 71 probes (the other 3 were filtered out during my pre-processing).
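A linear clock of this kind can be sketched as a weighted sum over the probes that survive pre-processing. The Python snippet below is a minimal illustration, not the procedure used in this thesis: the function name, the probe identifiers and the coefficient values used to call it are hypothetical, and simply skipping the missing probes is one possible choice, equivalent to treating their contribution as zero.

```python
def linear_clock_age(intercept, coefficients, beta_values):
    """Weighted sum over the probes present in both dictionaries.

    coefficients: probe ID -> model weight (hypothetical values)
    beta_values:  probe ID -> beta-value for one sample
    Returns the predicted age and the list of model probes that were missing.
    """
    used = [p for p in coefficients if p in beta_values]
    missing = [p for p in coefficients if p not in beta_values]
    age = intercept + sum(coefficients[p] * beta_values[p] for p in used)
    return age, missing
```

Returning the list of missing probes makes it explicit how many terms of the original model were left out of a prediction.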
Hannum’s epigenetic clock performed quite accurately in the dataset of healthy individuals,
although with a slight overestimation of the epigenetic ages (Fig. 2.15a), which has
also been previously observed [Marioni et al., 2015]. Furthermore, it is possible to observe
the non-linear behaviour of Hannum’s clock for young ages (≤ 20 years), for which the
authors did not correct in their original publication [Hannum et al., 2013]. Horvath’s and
Hannum’s epigenetic clocks are correlated (Fig. 2.15b). The magnitude of this correlation
(HannumAge vs DNAmAge: PCC = 0.9778) was slightly stronger than the correlation be-
tween HannumAge and chronological age (PCC = 0.9756), which could highlight the fact
that both models indeed measure epigenetic age.
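For reference, the Pearson’s correlation coefficient used in these comparisons can be written in a few lines of Python. This is a generic sketch with a hypothetical function name and does not reproduce the code used to obtain the values above.

```python
import math


def pearson_correlation(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```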
Next, I estimated the epigenetic age acceleration (EAA) according to Hannum’s epige-
netic clock, using similar models to the ones previously described (although in this case the
dependent variable was HannumAge, see equations 2.16 and 2.17). The median absolute
errors for Hannum’s model (MAEwith CCC = 2.8422 years, MAEwithout CCC = 2.9484 years)
were slightly higher than the ones obtained for Horvath’s clock (MAEwith CCC = 2.7117 years,
MAEwithout CCC = 2.8211 years), which could also be influenced by the fact that three of the
model probes were not available. The EAAs estimated by Hannum’s and Horvath’s clocks
showed a moderate correlation (Fig. 2.15c,d), consistent with previous estimates [Irvin et al.,
2018]. Including cell composition correction improved the correlation between the EAAs
from both clocks, highlighting the fact that Hannum’s clock seems to be confounded with
the changes in blood cell composition with age [Irvin et al., 2018; Marioni et al., 2015].
Overall, Hannum’s epigenetic clock performed well in my dataset. However, given that
it produces slightly worse predictions than Horvath’s and could be partially tracking blood
immunosenescence instead of multi-tissue ageing effects, I used the latter as my main proxy
to measure the ageing process in this thesis. Finally, it is also worth mentioning that the data
that was used to train Hannum’s model (GSE40279) is also part of the dataset of healthy
individuals that I assembled and, therefore, this analysis does not constitute a completely
independent assessment of the behaviour of Hannum’s epigenetic clock.
2.3.2 Epigenetic mitotic clock: epiTOC
In 2016, Yang and colleagues conceived a novel type of epigenetic clock called epiTOC
(epigenetic Timer Of Cancer), which measures the rate of (stem) cell division in both normal
and cancerous tissues and is associated with cancer risk [Yang et al., 2016c]. This epigenetic
[Fig. 2.15 plots, four panels (a-d), full lifespan control (N = 2218). a: HannumAge (years) against chronological age (years). b: HannumAge (years) against DNAmAge (years). c: Hannum EAA with CCC (years) against Horvath EAA with CCC (years); PCC: 0.5952; p-value < 2.2e−16. d: Hannum EAA without CCC (years) against Horvath EAA without CCC (years); PCC: 0.5781; p-value < 2.2e−16.]
Fig. 2.15 Behaviour of Hannum’s epigenetic clock in the healthy individuals. a. Scatterplot showing the
relationship between the epigenetic age predicted with Hannum’s model (HannumAge) [Hannum et al., 2013]
and chronological age of the samples for the healthy individuals. Each sample is represented by one point. The
black dashed line represents the diagonal to aid visualisation. The solid brown line represents the linear model
HannumAge ∼ Age. b. Relationship between the Hannum and Horvath epigenetic ages estimated for the same
sample. The solid brown line represents the linear model HannumAge ∼ DNAmAge. c. Relationship between
the epigenetic age acceleration (EAA) calculated with the Hannum and the Horvath’s epigenetic clocks. In this
case the models include cell composition correction (CCC). The solid brown line represents the linear model
Hannum_EAAwith CCC ∼ Horvath_EAAwith CCC. d. As in c., but in this case the models do not include CCC.
mitotic clock tracks the gain in methylation levels that happens in 385 CpG sites, which
localise in the promoters of genes that are targeted by Polycomb Repressive Complex 2
(PRC2). Importantly, these CpG sites are unmethylated across fetal tissues and therefore this
provides a ground state to measure these changes during human lifespan.
I calculated the mitotic age (pcgtAge) of the healthy individuals in my dataset, although I
only used 378 out of the 385 probes (the other 7 were filtered out during my pre-processing).
The mitotic age of the individuals correlated with both chronological age (PCC = 0.5131,
Fig. 2.16a) and DNAmAge (PCC = 0.5602, Fig. 2.16b), which is expected given the cumula-
tive number of divisions of the hematopoietic stem cells [Beerman et al., 2013]. Furthermore,
I estimated the epigenetic age acceleration (EAA) according to the epigenetic mitotic clock,
using similar models to the ones previously described (although in this case the dependent
variable was pcgtAge, see equations 2.16 and 2.17). Interestingly, the EAAs for pcgtAge
and DNAmAge showed a small but highly statistically significant correlation (Fig. 2.16c,d).
Moreover, I also did some preliminary work where I calculated the DNAmAge of different
healthy tissues (that came from cancer patients). I observed that tissues with a high turnover
(such as breast) [Horvath, 2013a; Sehl et al., 2017] had a higher DNAmAge when compared
with tissues with a low turnover (data not shown). This was quite surprising given that
Horvath’s epigenetic clock predicts across tissues with different turnover rates [Yang et al.,
2016c]. Additionally, it has been recently demonstrated that DNAmAge increases linearly
with cell passage in vitro if TERT (the catalytic subunit of telomerase) is expressed (although
whether this also applies in vivo is unknown) [Lu et al., 2018].
All of this, together with the fact that DNAmAge has a stronger correlation with pcgtAge
than chronological age (at least in this blood dataset), could suggest that Horvath’s epi-
genetic clock might track cell division to a certain extent (although it is also clear that
Horvath’s clock is mostly not a mitotic clock). Furthermore, it is important to point out
that the observed effect sizes are small, that some of these results could be confounded by
variables that are difficult to account for (e.g. higher DNAmAge in breast tissue could be due
to hormonal factors) and that DNAmAge is not universally accelerated in cancer [Horvath,
2015]. Therefore, further testing of these ideas is required by future studies, which hopefully
will improve our understanding of the contribution of cell division to Horvath’s epigenetic
clock and its relation to the hypermethylation in PRC2-bound regions as measured by the
epigenetic mitotic clock.
74 Statistical aspects
0.05
0.10
0.15
0.20
0 25 50 75 100
Chronological age (years)
pc
gt
Ag
e
Full lifespan control: N = 2218
0.05
0.10
0.15
0.20
0 25 50 75 100
DNAmAge (years)
pc
gt
Ag
e
Full lifespan control: N = 2218
−0.05
0.00
0.05
−20 0 20
Horvath EAA with CCC (years)
pc
gt
Ag
e 
EA
A 
wi
th
 C
CC
PCC: 0.279; p−value < 2.2e−16
Full lifespan control: N = 2218
0.00
0.05
0.10
0.15
−20 0 20
Horvath EAA without CCC (years)
pc
gt
Ag
e 
EA
A 
wi
th
ou
t C
CC
PCC: 0.2778; p−value < 2.2e−16
Full lifespan control: N = 2218
a b
c d
Fig. 2.16 Behaviour of the epigenetic mitotic clock (epiTOC) in the healthy individuals. a. Scatterplot
showing the relationship between mitotic age (pcgtAge) [Yang et al., 2016c] and chronological age of the
samples for the healthy individuals. Each sample is represented by one point. The solid brown line represents
the linear model pcgtAge ∼ Age. b. Relationship between pcgtAge and DNAmAge estimated for the same
sample. The solid brown line represents the linear model pcgtAge ∼ DNAmAge. c. Relationship between
the epigenetic age acceleration (EAA) calculated with the mitotic and the Horvath’s epigenetic clocks. In this
case the models include cell composition correction (CCC). The solid brown line represents the linear model
pcgtAge_EAAwith CCC ∼ Horvath_EAAwith CCC. d. As in c., but in this case the models do not include CCC.
2.4 Additional methods
A short introduction to the linear regression framework
Linear models are a broad class of statistical analyses that are at the core of many bioin-
formatic methods, including differential RNA expression analyses [Ritchie et al., 2015]
or genome-wide association studies (GWAS) [Visscher et al., 2017]. An instance of such
models is linear regression [Eaton, 2007], a statistical approach that allows modelling of the
relationship between:
• A dependent variable Y, with observations yi ∈ R and i ∈ {1, ...,n}, where n is the total
number of observations (i.e. samples).
• One or more independent variables X_j, with observations x_ij ∈ R and j ∈ {1, ...,k},
where k is the total number of independent variables (a.k.a. covariates). These variables
can indicate, for example, whether a specific condition or phenotype is present in a
given sample, quantify the effects of a continuous variable (such as chronological age)
or adjust for batch effects, which gives this statistical framework great
analytical flexibility [Ritchie et al., 2015].
We can describe the dependent variable Y as a function of the independent variables X j:
$$
y_i = \sum_{j=1}^{k} x_{ij}\,\beta_j + \varepsilon_i
\tag{2.19}
$$
where β_j are unknown parameters that need to be estimated from the data and ε_i is the
random error. In matrix form:
y = Xβ + ε (2.20)
where y ∈ R^n is the vector {y_1, ...,y_n}, X ∈ R^(n×k) is the n×k matrix of x_ij’s, β ∈ R^k is the
vector {β_1, ...,β_k} and ε ∈ R^n is the vector {ε_1, ...,ε_n}.
Assuming that E(ε) = 0 and Cov(ε) = σ^2 I_n with σ^2 > 0 (where I_n is the n×n identity
matrix) and applying the Gauss-Markov theorem [Eaton, 2007], it can be demonstrated
that:
$$
\hat{\beta} = (X'X)^{-1}X'y
\tag{2.21}
$$
where X′ is the transpose of X and β̂ is the least-squares estimator of β, since it minimises:
\sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{k} x_{ij} \hat{\beta}_j \right)^2 \qquad (2.22)
It is possible to test whether there is a statistically significant linear association between
the dependent variable (Y) and one of the independent variables (X_j), i.e. to test:
H_0 : \beta_j = 0 \quad \text{against} \quad H_A : \beta_j \neq 0 \qquad (2.23)
where H_0 is the null hypothesis and H_A is the alternative hypothesis. A t-statistic (T) can
be derived after fitting the linear regression model [Sheather, 2009]:
T = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)} \qquad (2.24)
where se(β̂_j) is the standard error of β̂_j. When H_0 is true, the statistic T follows a
Student's t distribution with n − k degrees of freedom, i.e. T ∼ t_(n−k). This allows the
p-value for the linear association of Y with a given X_j to be estimated.
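As an illustration of equations 2.21 and 2.24, the following is a minimal Python sketch (standard library only) that computes the least-squares estimate through the normal equations and the corresponding t-statistics. It is not the code used in this thesis: the function names and the toy data are assumptions made for the example, and the p-value itself is not computed because the Student's t distribution is not available in the standard library.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the columns of X are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """Return (beta_hat, standard errors, t-statistics) for y = X beta + eps."""
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    XtX_inv = invert(matmul(Xt, X))
    # beta_hat = (X'X)^-1 X'y  (equation 2.21)
    beta = [row[0] for row in matmul(matmul(XtX_inv, Xt), [[v] for v in y])]
    fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # minimised sum (2.22)
    sigma2 = rss / (n - k)  # residual variance on n - k degrees of freedom
    se = [math.sqrt(sigma2 * XtX_inv[j][j]) for j in range(k)]
    t = [b / s for b, s in zip(beta, se)]  # equation 2.24
    return beta, se, t

# Toy data (made up): intercept column plus one covariate.
X_example = [[1, 0], [1, 1], [1, 2], [1, 3]]
y_example = [1.1, 2.9, 5.2, 6.8]
beta_hat, std_err, t_stat = ols(X_example, y_example)
```

Each entry of `t_stat` would then be compared against a t distribution with n − k degrees of freedom (here 4 − 2 = 2) to obtain the p-value.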
Finally, it is worth mentioning the nomenclature that I use for the linear regression
models throughout this thesis. For example, the following model fits a linear association between
the dependent variable (e.g. β-value at a specific CpG probe in the array) and 3 covariates
(e.g. age, sex and disease status), with an intercept:

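\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{11} & x_{12} & x_{13} \\
1 & x_{21} & x_{22} & x_{23} \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_{n1} & x_{n2} & x_{n3}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\qquad (2.25)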
where y_i is the β-value at a certain CpG probe for the i-th sample, x_i1 is the age for the
i-th sample, x_i2 is the sex (e.g. 0 for male and 1 for female) for the i-th sample, x_i3 is the
disease status (e.g. 0 for a healthy individual and 1 for an individual with a disease) for the
i-th sample, β_0 is the intercept coefficient, β_j are the covariate coefficients (j = 1 for age,
j = 2 for sex, j = 3 for disease status) and ε_i is the error for the i-th sample.
Throughout this thesis, I use the following nomenclature to describe the model above
(‘R-style’ nomenclature):
Beta ∼ Age + Sex + Disease_status (2.26)
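To make the correspondence between the 'R-style' formula and the matrix form in equation 2.25 concrete, here is a small Python sketch that builds the design matrix for a handful of samples. The helper name `design_matrix` and the sample values are hypothetical and serve only as an illustration; in R, the intercept column is added implicitly by the formula.

```python
def design_matrix(samples, covariates):
    """Build rows [1, x_i1, ..., x_ik]; the leading 1 is the intercept column."""
    return [[1.0] + [float(s[name]) for name in covariates] for s in samples]

# Made-up samples: Sex coded 0 (male) / 1 (female), Disease_status 0 (healthy) / 1 (disease).
samples = [
    {"Beta": 0.62, "Age": 34, "Sex": 0, "Disease_status": 0},
    {"Beta": 0.58, "Age": 12, "Sex": 1, "Disease_status": 1},
    {"Beta": 0.71, "Age": 47, "Sex": 1, "Disease_status": 0},
]

# Beta ~ Age + Sex + Disease_status
X = design_matrix(samples, ["Age", "Sex", "Disease_status"])
y = [s["Beta"] for s in samples]
```

Each row of `X` has four entries, matching the coefficients β_0 (intercept), β_1 (age), β_2 (sex) and β_3 (disease status).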

Chapter 3
Biological aspects
‘At a fundamental level evolutionary
survival is the preservation of a dynamic
balance between information, or order,
and entropy, or disorder.’
T. B. L. Kirkwood [1977]
Declaration
This chapter is mainly the product of my own work. Additionally, I would like to recognise the contributions of
Janet M. Thornton, Wolf Reik and Thomas M. Stubbs (who helped to design the study and interpret the
data), Erfan Aref-Eshghi (who ran some of the analyses using my code and provided part of the samples in
the dataset), Marc Jan Bonder and Oliver Stegle (who provided statistical input) and Bekim Sadikovic (who
provided part of the samples in the dataset). All of them also helped in the revision of the final text. This work
has been published in the journal Genome Biology [Martin-Herranz et al., 2019].
3.1 Background
Epigenetic clocks can be understood as a proxy to quantify the changes of the epigenome
with age. However, little is known about the molecular mechanisms that determine the rate of
the underlying epigenetic ageing clock (see section 1.3.3). Steve Horvath proposed that the
multi-tissue epigenetic clock captures the workings of an epigenetic maintenance system
[Horvath, 2013a]. Recent GWAS studies have found several genetic variants associated
with epigenetic age acceleration in genes such as TERT (the catalytic subunit of telomerase)
[Lu et al., 2018], DHX57 (an ATP-dependent RNA helicase) [Lu et al., 2016] or MLST8 (a
subunit of both mTORC1 and mTORC2 complexes) [Lu et al., 2016]. Nevertheless, to my
knowledge no genetic variants in epigenetic modifiers have been found and the molecular
nature of this hypothetical system is unknown to this date.
I decided to take a reverse genetics approach and look at the behaviour of the epigenetic
clock in patients with developmental disorders, many of which harbour mutations in pro-
teins of the epigenetic machinery [Aref-Eshghi et al., 2018b; Bjornsson, 2015]. I performed
an unbiased screen for epigenetic age acceleration and found that Sotos syndrome accelerates
epigenetic ageing, potentially revealing a role of H3K36 methylation maintenance in the
regulation of the rate of the epigenetic clock.
3.2 Screening for genes that accelerate the epigenetic ageing clock
The main goal of this analysis is to identify genes, mainly components of the epigenetic ma-
chinery, that can affect the rate of epigenetic ageing in humans (as measured by Horvath’s
epigenetic clock) [Horvath, 2013a]. For this purpose, I assembled a dataset with all the DNA
methylation data from patients with different developmental disorders that I could find, in
order to perform an unbiased screen. This dataset combines samples publicly available in
GEO [Edgar et al., 2002] with in-house data generated by my collaborators at the London
Health Sciences Centre, Canada (Table S2.1, Fig. S2.1). All these data were generated from
blood using the Illumina 450K methylation array, as in the case of the healthy individuals
described in Chapter 2.
Many of these developmental syndromes have overlapping clinical features [Aref-Eshghi
et al., 2018b; Bjornsson, 2015]. Furthermore, in some cases with a clinical diagnosis, the
genetic cause remains unknown, probably due to locus heterogeneity or the difficulty of assessing
the clinical significance of some genetic variants [Aref-Eshghi et al., 2017]. Therefore,
several studies have explored the ability of DNA methylation signatures to aid differential
diagnoses of these syndromes [Aldinger et al., 2013; Alisch et al., 2013; Aref-Eshghi et al.,
2018a,b, 2017; Butcher et al., 2017; Choufani et al., 2015; Grafodatskaya et al., 2013; Hood
et al., 2016; Kernohan et al., 2016; Schenkel et al., 2017, 2016]. Given that most of the
diagnoses for developmental disorders are carried out early in life, this dataset has a bias
towards younger ages (Fig. 3.1). In order to maximise the ability to detect ageing-associated
effects, I kept only those developmental disorders with at least 5 samples, of which at least 2
had a chronological age ≥ 20 years (which, according to Horvath’s model, is the adult age
for humans) [Horvath, 2013a]. This filtering resulted in a dataset for the main screen with
N = 367 samples from cases, which had ages between 0 and 55 years (Fig. 3.2, Table 3.1).
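The filtering rule described above (at least 5 samples per disorder, of which at least 2 with a chronological age ≥ 20 years) can be sketched in Python as follows. This is only an illustration of the rule, not the code used for the screen; the function name, the disorder labels and the ages are made up.

```python
from collections import defaultdict

ADULT_AGE = 20    # years; adult threshold mentioned in the text
MIN_SAMPLES = 5   # minimum number of samples per disorder
MIN_ADULTS = 2    # minimum number of samples with age >= ADULT_AGE

def filter_disorders(samples):
    """samples: iterable of (disorder, age_in_years). Returns {disorder: ages} for kept disorders."""
    by_disorder = defaultdict(list)
    for disorder, age in samples:
        by_disorder[disorder].append(age)
    return {
        disorder: ages
        for disorder, ages in by_disorder.items()
        if len(ages) >= MIN_SAMPLES and sum(a >= ADULT_AGE for a in ages) >= MIN_ADULTS
    }

# Hypothetical input: disorder A passes, B has only one adult, C has too few samples.
toy_samples = (
    [("A", a) for a in (1, 3, 8, 21, 40)]
    + [("B", a) for a in (2, 4, 6, 8, 25)]
    + [("C", a) for a in (22, 30, 41)]
)
kept = filter_disorders(toy_samples)
```

In this toy input only disorder A is retained, since B contains a single adult sample and C has fewer than 5 samples.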
[Figure 3.1: histogram with kernel density estimate. x-axis: chronological age (years); y-axis: density. Cases: N = 367.]
Fig. 3.1 Histogram showing the chronological age distribution for all the individuals with developmental
disorders (cases) included in the final dataset (i.e. after QC and filtering). The blue line represents the 1D kernel
density estimate, as calculated by the stat_density function in R with default parameters.
The purpose of the screen is to test whether the epigenetic ages of the samples from a
given developmental disorder (cases) deviate from their chronological age i.e. identify
those developmental disorders that present epigenetic age acceleration (EAA). For a given
sample, a positive EAA indicates that the epigenetic (biological) age of the sample is higher
than the one expected for someone with that chronological age. In other words, it means
that the epigenome of that person resembles the epigenome of an older individual. The
opposite is true when a negative EAA is found (i.e. the epigenome looks younger than
expected). I calculated the epigenetic ages (DNAmAge) of all the samples according to
Horvath’s epigenetic clock (see section 2.2.1) and I fitted the control models to the samples
from the healthy individuals, including models with and without blood cell composition
correction (CCC) and always accounting for potential batch effects (see equations 2.16 and
2.17). As previously discussed (see section 2.2.2), due to the fact that Horvath’s model
underestimates the epigenetic age of old samples, the age distribution of the control samples
Developmental disorder | Gene(s) involved | Gene(s) function | Molecular cause | N | Age range (years)
Angelman | UBE3A | Ubiquitin protein ligase E3A | Imprinting, mutation | 14 | 1 to 55
Autism spectrum disorder (ASD) | - | - | - | 119 | 1.83 to 35.16
Alpha thalassemia/mental retardation X-linked syndrome (ATR-X) | ATRX | Chromatin remodelling | Mutation | 15 | 0.7 to 27
Claes-Jensen | KDM5C | H3K4 demethylase | Mutation | 10 | 2 to 42
Coffin-Lowry | RPS6KA3 | Serine/threonine kinase | Mutation | 10 | 1.3 to 22.8
Floating-Harbour | SRCAP | Chromatin remodelling | Mutation | 17 | 4 to 42
Fragile X syndrome (FXS) | FMR1 | Translational control | Mutation (CGG expansion) | 32 | 0.08 to 48
Kabuki | KMT2D | H3K4 methyltransferase | Mutation | 46 | 0 to 24.1
Noonan | PTPN11, RAF1, SOS1 | RAS/MAPK signalling | Mutation | 15, 11, 14 | 0.2 to 49
Rett | MECP2 | Transcriptional repression | Mutation | 15 | 1 to 34
Saethre-Chotzen | TWIST1 | Transcription factor | Mutation | 22 | 0 to 38
Sotos | NSD1 | H3K36 methyltransferase | Mutation | 20 | 1.6 to 41
Weaver | EZH2 | H3K27 methyltransferase | Mutation | 7 | 2.58 to 43
Total | - | - | - | 367 | 0 to 55
Table 3.1 Overview of the developmental disorders that were included in the screening after quality control and
filtering (total N = 367).
[Figure 3.2: flow diagram. Input: DNA methylation data (IDAT files) for controls (healthy samples, N = 2218) and cases (23 developmental disorders, N = 666). Steps: QC; benchmarking of pre-processing strategies and correcting for batch effects; calculating cell composition in blood; filtering. After filtering: controls (healthy samples, N = 1128) and cases (13 developmental disorders, N = 367). Downstream analyses: screening for epigenetic age acceleration (EAA) using Horvath's epigenetic clock; calculating pcgtAge using the epigenetic mitotic clock; enrichment of (epi)genomic features in DMPs; calculating Shannon entropy.]
Fig. 3.2 Flow diagram that gives an overview of the different analyses that are carried out on the raw DNA
methylation data (IDAT files) from human blood for cases (developmental disorders samples) and controls
(healthy samples). The control samples are filtered to match the age range of the cases (0-55 years). The cases
are filtered based on the number of 'adult' samples available (for each disorder, at least 5 samples, with at least 2 of
them with an age ≥ 20 years). QC: quality control. DMPs: differentially methylated positions.
can have an impact on the results of the screen. Therefore, I filtered the ages of the healthy
individual samples to make them match the age range of the developmental disorders (0-55
years, N = 1128, see Fig. 3.2).
The EAA for the control samples corresponds to the residuals from the control models
(see section 2.2.2). On the other hand, the EAA for a case sample is calculated by taking
the difference between the epigenetic age (DNAmAge) and the predicted value from the
corresponding control model (with or without cell composition). Finally, I compared the dis-
tributions of the EAA for the different developmental disorders against the EAA distributions
for the healthy controls using the non-parametric two-sided Wilcoxon’s test. P-values were
adjusted for multiple testing using Bonferroni correction and a significance level of α = 0.01
was applied. It is worth mentioning that some of the developmental disorders included in
the screen (such as autism spectrum disorder or Coffin-Lowry syndrome) are not necessarily
caused by alterations in the epigenetic machinery, but were still included to maintain the
unbiased nature of the screen.
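The screening procedure described above can be sketched in Python under some loudly stated simplifications. The control model is reduced here to DNAmAge ∼ Age (the models in equations 2.16 and 2.17 also account for batch effects and, optionally, cell composition), the two-sided Wilcoxon rank-sum p-value uses the normal approximation without tie correction of the variance (statistical packages may use an exact method for small samples), and all function names and numbers are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (simplified control model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def eaa(ages, dnam_ages, intercept, slope):
    """EAA = observed DNAmAge minus the value predicted by the control model."""
    return [d - (intercept + slope * a) for a, d in zip(ages, dnam_ages)]

def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, average ranks for ties)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    n1, n2 = len(a), len(b)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up controls and one made-up disorder.
ctrl_age, ctrl_dnam = [0, 10, 20, 30, 40], [2, 11, 20, 29, 38]
b0, b1 = fit_line(ctrl_age, ctrl_dnam)
ctrl_eaa = eaa(ctrl_age, ctrl_dnam, b0, b1)            # residuals of the control model
case_eaa = eaa([5, 20, 35], [14.0, 27.5, 42.0], b0, b1)
p_corrected = bonferroni([rank_sum_p(case_eaa, ctrl_eaa)])
```

With one p-value per developmental disorder and per model, `bonferroni` would receive the full list of tests, and the corrected values would be compared against α = 0.01.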
3.3 Sotos syndrome accelerates epigenetic ageing
The results from the screen are portrayed in Fig. 3.3. Most syndromes do not show evidence of
accelerated epigenetic ageing, but Sotos syndrome presents a clear positive EAA (median
EAA with CCC = +7.64 years, median EAA without CCC = +7.16 years), with p-values considerably
below the significance level of 0.01 after Bonferroni correction (corrected p-value with CCC
= 3.40 × 10^−9, corrected p-value without CCC = 2.61 × 10^−7). Additionally, Rett syndrome (median
EAA with CCC = +2.68 years, median EAA without CCC = +2.46 years, corrected p-value with CCC
= 0.0069, corrected p-value without CCC = 0.0251) and Kabuki syndrome (median EAA with CCC
= −1.78 years, median EAA without CCC = −2.25 years, corrected p-value with CCC = 0.0011,
corrected p-value without CCC = 0.0035) reach significance, with a positive and a negative EAA
respectively, although for Rett syndrome only the model with CCC falls below the 0.01 threshold.
Finally, fragile X syndrome (FXS) shows a positive EAA trend (median
EAA with CCC = +2.44 years, median EAA without CCC = +2.88 years) that does not reach
significance in the screen (corrected p-value with CCC = 0.0680, corrected p-value without CCC =
0.0693).
Next, I tested the effect of changing the median age used to build the healthy control
model (i.e. the median age of the controls) on the screening results (Fig. S2.2). Sotos
syndrome is robust to these changes, whilst Rett, Kabuki and FXS are much more sensitive to
the control model used. This again highlights the importance of choosing an appropriate age-
matched control when testing for epigenetic age acceleration, given that Horvath’s epigenetic
clock underestimates epigenetic age for advanced chronological ages [El Khoury et al., 2018;
Marioni et al., 2018].
Moreover, all but one of the Sotos syndrome patients (19/20 = 95%) show a consistent
deviation in EAA (with CCC) in the same direction (Fig. 3.4a,b), which is not the case for the
rest of the disorders, with the exception of Rett syndrome (Fig. S2.3). Even though these data
suggest that there are already some methylomic changes at birth, the EAA seems to increase
with age in the case of Sotos patients (Fig. 3.4b; p-values for the slope coefficient of the
EAA ∼ Age linear regression: p-value with CCC = 0.00569, p-value without CCC = 0.00514). This
could imply that at least some of the changes that normally affect the epigenome with age
are happening at a faster rate in Sotos syndrome patients during their lifespan (as opposed to
the idea that the Sotos epigenetic changes are only acquired during prenatal development and
remain constant afterwards).
Finally, I investigated whether Sotos syndrome leads to a higher rate of (stem) cell division
in blood when compared with the healthy population. I employed the epigenetic mitotic
clock, which makes use of the fact that some CpGs in promoters that are bound by Polycomb
[Figure 3.3: two panels. Upper panel: −log10(p-value) per developmental disorder (Angelman, ASD, ATR-X, Claes-Jensen, Coffin-Lowry, Floating-Harbour, FXS, Kabuki, Noonan PTPN11, Noonan RAF1, Noonan SOS1, Rett, Saethre-Chotzen, Sotos, Weaver). Lower panel: epigenetic age acceleration (years) per disorder. Legend: EAA model with CCC or without CCC. Control: age range 0-55 years, median age 34 years, N = 1128.]
Fig. 3.3 Screening for epigenetic age acceleration (EAA) in developmental disorders. The upper panel shows
the p-values derived from comparing the EAA distributions for the samples in a given developmental disorder
and the control (two-sided Wilcoxon’s test). The dashed green line displays the significance level of α = 0.01
after Bonferroni correction. The bars above the green line reach statistical significance. The lower panel
displays the actual EAA distributions, which allows assessing the direction of the EAA (positive or negative).
In red: EAA model with cell composition correction (CCC). In blue: EAA model without CCC. ASD: autism
spectrum disorder. ATR-X: alpha thalassemia/mental retardation X-linked syndrome. FXS: fragile X syndrome.
[Figure 3.4: four scatterplots (a-d) against chronological age (years). y-axes: a. DNAmAge (years); b. EAA with CCC (years); c. pcgtAge; d. pcgtAge acceleration. Legend (disease status): Sotos, Control. Control: N = 1128; Sotos: N = 20.]
Fig. 3.4 Sotos syndrome accelerates epigenetic ageing. a. Scatterplot showing the relationship between
epigenetic age (DNAmAge) according to Horvath’s model [Horvath, 2013a] and chronological age of the
samples for Sotos (orange) and control (grey). Each sample is represented by one point. The black dashed line
represents the diagonal to aid visualisation. b. Scatterplot showing the relationship between the epigenetic age
acceleration (EAA) and chronological age of the samples for Sotos (orange) and control (grey). Each sample is
represented by one point. The yellow line represents the linear model EAA ∼ Age, with the standard error
shown in the light yellow shade. c. Scatterplot showing the relationship between the score for the epigenetic
mitotic clock (pcgtAge) [Yang et al., 2016c] and chronological age of the samples for Sotos (orange) and control
(grey). Each sample is represented by one point. A higher value of pcgtAge is associated with a higher number
of cell divisions in the tissue. d. Scatterplot showing the relationship between the epigenetic mitotic clock
(pcgtAge) acceleration (with CCC) and chronological age of the samples for Sotos (orange) and control (grey).
Each sample is represented by one point. The yellow line represents the linear model pcgtAge_EAAwith CCC ∼
Age, with the standard error shown in the light yellow shade.
group proteins become hypermethylated with age (captured by a metric called pcgtAge; see
section 2.3.2). This hypermethylation correlates with the number of cell divisions in the
tissue and is also associated with an increase in cancer risk [Yang et al., 2016c]. I calculated
pcgtAge for the Sotos samples and compared them against the healthy controls (using a model
similar to the one in equation 2.16, although in this case the dependent variable was pcgtAge;
see section 2.3.2). I found a trend suggesting that the epigenetic mitotic clock might be
accelerated in Sotos patients (p-value = 0.0112, Fig. 3.4c,d), which could explain the higher
cancer predisposition (e.g. to acute leukemia, sacrococcygeal teratoma, neuroblastoma, ...)
reported in these patients and might relate to their overgrowth [Leventopoulos et al., 2009].
Consequently, I report that individuals with Sotos syndrome present an accelerated
epigenetic age, which makes their epigenome look, on average, more than 7 years older
than expected. These changes seem to be the consequence of a higher ticking rate of the
epigenetic ageing clock (or at least part of its machinery), with epigenetic age acceleration
increasing during lifespan: the youngest Sotos patient (1.6 years) has an EAAwith CCC =
5.43 years and the oldest (41 years) has an EAAwith CCC = 24.53 years. Additionally, Rett
syndrome, Kabuki syndrome and fragile X syndrome could also have their epigenetic ages
affected, but more evidence is required to be certain about this conclusion.
3.4 Comparing Sotos syndrome and physiological ageing
Sotos syndrome is caused by loss-of-function heterozygous mutations in the NSD1 gene,
which encodes a histone H3K36 methyltransferase [Choufani et al., 2015; Kurotaki et al., 2002]. These
mutations lead to a specific DNA methylation signature in Sotos patients, potentially due to
the crosstalk between the histone and DNA methylation machinery [Choufani et al., 2015]. In
order to gain a more detailed picture of the reported epigenetic age acceleration, I decided to
compare the genome-wide (or at least array-wide) changes observed in the methylome during
ageing with those observed in Sotos syndrome. For this purpose, I identified differentially
methylated positions (DMPs) for both conditions, using the models that account for cell
composition correction (see equations 2.10 and 3.1). Ageing DMPs (aDMPs) were calculated
in this case using the healthy samples in the age range 0-55 years. aDMPs were composed
almost equally of CpG sites that gain methylation with age (i.e. become hypermethylated,
51.69%) and CpG sites that lose methylation with age (i.e. become hypomethylated, 48.31%,
barplot in Fig. 3.5a), a picture that resembles previous studies [Zhu et al., 2018]. It is worth
mentioning that in this case fewer aDMPs were identified when compared with the full
lifespan analysis presented in section 2.1.4, where the hypomethylated aDMPs were also
slightly more frequent when compared with the hypermethylated ones. This highlights
the importance of the age range and/or the sample size when calculating aDMPs. On the
contrary, DMPs in Sotos were clearly dominated by CpGs that have lower methylation levels
in individuals with the syndrome (i.e. hypomethylated, 99.27%, barplot in Fig. 3.5a). This is
highly consistent with the results from a previous report, where 99.3% of the Sotos DMPs
identified (in this case applying a filter of >20% difference in average DNA methylation
levels) were hypomethylated in Sotos patients [Choufani et al., 2015]. It is important to
point out that Sotos syndrome patients and healthy control samples were matched for age
and sex in both differential analyses. Furthermore, in my analysis I included age and sex as
covariates in the linear model (see equation 3.1), which minimises the chances that Sotos
DMPs could also constitute ageing DMPs.
Then, I compared the intersections between the hypermethylated and hypomethylated
DMPs in ageing and Sotos. Most of the DMPs were specific for ageing or Sotos (i.e. they did
not overlap), but a subset of them were shared (table in Fig. 3.5a). Interestingly, there were
1728 DMPs that became hypomethylated both during ageing and in Sotos (‘Hypo-Hypo
DMPs’). This subset of DMPs is of special interest because it could be used to understand in
more depth some of the mechanisms that drive hypomethylation during physiological ageing.
Thus, I tested whether the different subsets of DMPs are found in specific genomic contexts
(Fig. S2.4, Fig. S2.5). DMPs that are hypomethylated during ageing and in Sotos were
both enriched (odds ratio >1) in enhancer categories (such as ‘active enhancer 1’ or ‘weak
enhancer 1’, see the chromatin state model used, from the K562 cell line, in section 3.7)
and depleted (odds ratio <1) for active transcription categories (such as ‘active TSS’ or
‘strong transcription’), which was also observed in the ‘Hypo-Hypo DMPs’ subset (Fig. 3.5b).
Interestingly, age-related hypomethylation in enhancers seems to be a characteristic of both
humans [Slieker et al., 2018, 2016] and mice [Cole et al., 2017b]. Furthermore, both de
novo DNA methyltransferases (DNMT3A and DNMT3B) have been shown to bind in an
H3K36me3-dependent manner to active enhancers [Rinaldi et al., 2016], consistent with
these results.
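The two operations used in this comparison, intersecting DMP subsets and testing a subset for enrichment in a genomic feature, can be illustrated with a short Python sketch. The probe identifiers and the 2×2 counts below are invented for the example, and the function name is hypothetical. The odds ratio returned is the simple sample odds ratio (a·d)/(b·c), which can differ slightly from the conditional estimate reported by some statistical packages.

```python
from math import comb

def fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    a: DMPs inside the feature      b: DMPs outside the feature
    c: controls inside the feature  d: controls outside the feature
    Returns (sample odds ratio, p-value).
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x DMPs inside the feature.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p)

# Hypothetical probe IDs for the two hypomethylated subsets.
hypo_ageing = {"cg0001", "cg0002", "cg0003", "cg0004"}
hypo_sotos = {"cg0003", "cg0004", "cg0005"}
hypo_hypo = hypo_ageing & hypo_sotos  # shared 'Hypo-Hypo' probes

# Hypothetical counts: 3 of 4 DMPs and 1 of 4 control probes fall in an enhancer state.
odds_ratio, p_value = fisher_exact(3, 1, 1, 3)
```

An odds ratio above 1 would indicate enrichment of the feature in the DMP subset and a value below 1 depletion; with counts in the hundreds of thousands, as in the real analysis, a log-space implementation would be preferable to exact binomial coefficients.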
When looking at the levels of total RNA expression (depleted for rRNA) in blood, I
confirmed a significant reduction in the RNA levels around these hypomethylated DMPs
when compared with the controls sets (Fig. 3.5c, see section 3.7 for more details on how
the control sets were defined). Interestingly, hypomethylated DMPs in both ageing and
Sotos were depleted from gene bodies (Fig. 3.5b) and were located in areas with lower
levels of H3K36me3 when compared with the control sets (Fig. 3.5d, Fig. S2.5). Moreover,
hypomethylated aDMPs and hypomethylated Sotos DMPs were both generally enriched or
[Figure 3.5: four panels (a-d). a. Barplot of the number of DMPs in ageing and in Sotos, coloured by methylation change (hypermethylated, hypomethylated), and intersection table: Hyper aDMPs ∩ Hyper Sotos DMPs = 29; Hyper aDMPs ∩ Hypo Sotos DMPs = 2550; Hypo aDMPs ∩ Hyper Sotos DMPs = 7; Hypo aDMPs ∩ Hypo Sotos DMPs = 1728. b. Odds ratio (log scale) per (epi)genomic feature for Hypo aDMPs, Hypo Sotos DMPs and Hypo-Hypo DMPs, coloured by −log10(p-value). c. Feature: RNA; y-axis: NRE; control against in-subset boxplots for Hypo aDMPs (N = 390815 and 37451; medians −0.319 and −0.351), Hypo Sotos DMPs (N = 413204 and 15062; medians −0.319 and −0.375) and Hypo-Hypo DMPs (N = 426538 and 1728; medians −0.323 and −0.375); all p-values < 2.2e−16. d. Feature: H3K36me3; y-axis: NFC; same subsets and sample sizes (medians −0.297 and −0.335; −0.301 and −0.308; −0.301 and −0.34); all p-values < 2.2e−16.]
Fig. 3.5 Comparison between the DNA methylation changes during physiological ageing and in Sotos. a.
On the left: barplot showing the total number of differentially methylated positions (DMPs) found during
physiological ageing and in Sotos syndrome. CpG sites that increase their methylation levels with age in the
healthy population or those that are elevated in Sotos patients (when compared with a control) are displayed
in red. Conversely, those CpG sites that decrease their methylation levels are displayed in blue. On the right:
table that represents the intersection between the ageing (aDMPs) and the Sotos DMPs. The subset resulting
from the intersection between the hypomethylated DMPs in ageing and Sotos is called the ‘Hypo-Hypo DMPs’
subset (N=1728). b. Enrichment for the categorical (epi)genomic features considered when comparing the
different genome-wide subsets of differentially methylated positions (DMPs) in ageing and Sotos against a
control (see section 3.7). The y-axis represents the odds ratio (OR), the error bars show the 95% confidence
interval for the OR estimate and the colour of the points codes for − log10(p-value) obtained after testing for
enrichment using Fisher’s exact test. An OR > 1 shows that the given feature is enriched in the subset of
DMPs considered, whilst an OR < 1 shows that it is found less than expected. In grey: features that did not
reach significance using a significance level of α = 0.01 after Bonferroni correction. c. Boxplots showing the
distributions of the ‘normalised RNA expression’ (NRE) when comparing the different genome-wide subsets
of differentially methylated positions (DMPs) in ageing and Sotos against a control (see section 3.7). NRE
represents normalised mean transcript abundance in a window of ± 200 bp from the CpG site coordinate (DMP)
being considered. The p-values (two-sided Wilcoxon’s test, before multiple testing correction) are shown above
the boxplots. The number of DMPs belonging to each subset (in green) and the median value of the feature
score (in dark red) are shown below the boxplots. d. As in c., but showing the ‘normalised fold change’ (NFC)
for the H3K36me3 histone modification (representing normalised mean ChIP-seq fold change for H3K36me3
in a window of ± 200 bp from the DMP being considered).
depleted for the same histone marks in blood (Fig. S2.5), which adds weight to the hypothesis
that they share the same genomic context and could become hypomethylated through similar
molecular mechanisms.
Intriguingly, I also identified a subset of DMPs (2550) that were hypermethylated during
ageing and hypomethylated in Sotos (Fig. 3.5a). These ‘Hyper-Hypo DMPs’ seem to be
enriched for categories such as ‘bivalent promoter’ and ‘repressed polycomb’ (Fig. S2.4),
which are normally associated with developmental genes [Bernhart et al., 2016; Bernstein
et al., 2006]. These categories are also a defining characteristic of the hypermethylated
aDMPs, highlighting that even though the direction of the DNA methylation changes is
different in some ageing and Sotos DMPs, the genomic context in which they happen is
shared.
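The enrichment statements above (Fig. 3.5b, Fig. S2.4) rest on comparing, for each categorical feature, how often the CpG sites of a DMP subset overlap the feature relative to a control set, using an odds ratio and Fisher’s exact test. The following Python sketch illustrates that calculation for a single feature. It is not the code used for the thesis: the function names, the example counts and the number of tested features are hypothetical, the test is assumed to be two-sided, and the 95% confidence interval of the odds ratio is not computed.

```python
import math


def log_hypergeom_pmf(k, row1, row2, col1):
    """log P(X = k) for a 2x2 table with fixed margins (hypergeometric)."""
    def log_comb(n, r):
        return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return (log_comb(row1, k) + log_comb(row2, col1 - k)
            - log_comb(row1 + row2, col1))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a: subset CpGs overlapping the feature,  b: subset CpGs not overlapping
    c: control CpGs overlapping the feature, d: control CpGs not overlapping
    Returns (sample odds ratio, p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    odds_ratio = (a * d) / (b * c) if b * c > 0 else math.inf
    log_p_obs = log_hypergeom_pmf(a, row1, row2, col1)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    p_value = 0.0
    # Sum the probabilities of all tables at least as extreme as the observed one.
    for k in range(k_min, k_max + 1):
        log_p = log_hypergeom_pmf(k, row1, row2, col1)
        if log_p <= log_p_obs + 1e-7:  # small tolerance for rounding
            p_value += math.exp(log_p)
    return odds_ratio, min(p_value, 1.0)


def passes_bonferroni(p_value, n_tests, alpha=0.01):
    """Bonferroni-corrected significance call at level alpha."""
    return p_value < alpha / n_tests


# Hypothetical counts for one feature (not taken from the thesis data).
odds_ratio, p_value = fisher_exact_two_sided(120, 80, 300, 500)
enriched = odds_ratio > 1 and passes_bonferroni(p_value, n_tests=30)
```

An odds ratio above 1 together with a corrected p-value below the threshold would be read as enrichment of the feature in the subset, and an odds ratio below 1 as depletion.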
Finally, I looked at the DNA methylation patterns of the Sotos samples at the 353 CpG sites
of Horvath’s epigenetic clock. For each clock CpG site, I modelled the changes of DNA
methylation with age in the healthy control individuals (0-55 years) and then calculated
the deviations from these patterns for the Sotos samples (Fig. 3.6, see equation 3.3). As
expected, the landscape of clock CpG sites is dominated by hypomethylation in the Sotos
samples, although only a small fraction of the clock CpG sites seems to be significantly
affected (Fig. 3.6c). Overall, I confirmed the trends reported for the genome-wide analysis
(Fig. S2.6, Fig. S2.7, Fig. S2.8). However, given the much smaller number of CpG sites to
consider in this analysis, very few comparisons reached significance.
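The per-site comparison described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, assuming that the model for each CpG site is a simple least-squares regression of β-value on chronological age fitted on the controls only; equation 3.3 may include further covariates that are ignored here. The function names and the toy values are hypothetical.

```python
def fit_line(ages, betas):
    """Ordinary least-squares fit of beta = intercept + slope * age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_beta = sum(betas) / n
    sxx = sum((x - mean_age) ** 2 for x in ages)
    sxy = sum((x - mean_age) * (y - mean_beta) for x, y in zip(ages, betas))
    slope = sxy / sxx
    intercept = mean_beta - slope * mean_age
    return intercept, slope


def beta_deviations(control_ages, control_betas, case_ages, case_betas):
    """Observed minus expected beta-value for each case sample.

    The expectation comes from the line fitted on the controls, so a
    negative value means the case is hypomethylated for its age.
    """
    intercept, slope = fit_line(control_ages, control_betas)
    return [beta - (intercept + slope * age)
            for age, beta in zip(case_ages, case_betas)]


# Toy data for one CpG site (hypothetical values).
control_ages = [0, 10, 20, 30]
control_betas = [0.80, 0.78, 0.76, 0.74]
deviations = beta_deviations(control_ages, control_betas, [15], [0.67])
```

Repeating this for every clock CpG site and every Sotos sample would give a samples-by-sites matrix of β-value differences of the kind summarised in a heatmap.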
I have shown that the ageing process and Sotos syndrome share a subset of hypomethylated
CpG sites that is characterised by an enrichment in enhancer features and a depletion of
active transcription. This highlights the usefulness of developmental disorders as a model
to study the mechanisms that may drive the changes in the methylome with age, since they
permit stratification of the ageing DMPs into different functional categories that are
associated with alterations in the function of specific genes and hence with specific
molecular components of the epigenetic ageing clock.
3.5 Methylation Shannon entropy and the epigenetic clock
In section 2.1.5 I have discussed how Shannon entropy can be applied in the context of
DNA methylation data in order to measure the genome-wide epigenetic information loss that
happens during ageing. It is possible to apply a methodology similar to the one described in
section 2.2.2 to compare the methylation Shannon entropy in healthy controls (0-55 years)
[Figure 3.6, panels a-c; graphic not reproduced in this text version. a, b: scatterplots of β-value against chronological age (years) for cg02071305 and cg18328933, coloured by disease status (Control, Sotos). c: heatmap of β-value difference (scale −0.2 to 0.2) for Sotos samples (rows) across Horvath’s clock CpGs (columns), annotated with sex, EAA with CCC (years), chronological age (years), Sotos DMPs, aDMPs, weight in model, ChrHMM state (in K562), RNA (in PBMC), H3K36me3 (in PBMC) and whether the CpG lies in a gene body.]
Fig. 3.6 The landscape of Horvath’s epigenetic clock CpG sites in Sotos syndrome. a. and b. DNA methylation
(β-value) profiles for two of the clock CpG sites (cg02071305 and cg18328933). A linear model (displayed in
dark grey, see equation 3.3) can be fitted to each CpG site to model the changes in β-value with chronological age
in the controls (grey). Afterwards, the difference between the β-values of the Sotos samples (orange) and the controls can be
estimated. c. Heatmap displaying the differential methylation patterns for Sotos samples (rows) when compared
with controls in each one of the 353 epigenetic clock CpGs (columns). Hierarchical clustering was performed
in both rows and columns. RNA refers to the ‘normalised RNA expression’ (NRE). H3K36me3 refers to the
H3K36me3 histone modification ‘normalised fold change’ (NFC). aDMPs: differentially methylated positions
during ageing. EAA: epigenetic age acceleration. CCC: cell composition correction. PBMC: peripheral blood
mononuclear cells.
and Sotos patients (i.e. using a linear model similar to equation 2.16, although in this case
the dependent variable is the entropy value). This allows testing whether Sotos syndrome
patients present genome-wide Shannon entropy acceleration i.e. deviations from the expected
genome-wide Shannon entropy for their age. Despite detailed analysis, I did not find evidence
that this was the case when looking genome-wide (p-value = 0.71, Fig. 3.7a,b, Fig. S2.9a).
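For readers who want to reproduce the entropy values, the sketch below shows one way to compute a normalised methylation Shannon entropy per sample in Python. It assumes the formulation that is commonly used for methylation arrays (the mean binary entropy of the β-values, scaled so that a sample with all sites at 0.5 scores 1), which may differ in detail from the definition in section 2.1.5. The function name and the clipping constant `eps` are hypothetical choices made here to keep the logarithm defined at β-values of exactly 0 or 1.

```python
import math


def methylation_shannon_entropy(betas, eps=1e-15):
    """Normalised methylation Shannon entropy of one sample.

    betas: iterable of beta-values (0..1), one per CpG site.
    eps:   assumed clipping constant that keeps log2 defined at 0 and 1.
    Returns a value between 0 (all sites at 0 or 1) and 1 (all sites at 0.5).
    """
    betas = list(betas)
    if not betas:
        raise ValueError("at least one beta-value is required")
    total = 0.0
    for beta in betas:
        b = min(max(beta, eps), 1.0 - eps)
        total += b * math.log2(b) + (1.0 - b) * math.log2(1.0 - b)
    # log2(1/2) = -1, so the division flips the sign and averages over sites.
    return total / (len(betas) * math.log2(0.5))


# Hypothetical toy samples: a fully "collapsed" one and a maximally uncertain one.
collapsed = methylation_shannon_entropy([0.0, 1.0, 0.0, 1.0])
uncertain = methylation_shannon_entropy([0.5, 0.5, 0.5, 0.5])
```

Under this definition, the entropy acceleration of a sample would be its residual from a model of entropy on chronological age fitted on the controls; that step is not shown here.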
When I considered only the 353 CpG sites of Horvath’s epigenetic clock for the entropy
calculations, the picture was different. Shannon entropy for the 353 clock sites slightly
decreased with age in the controls when I included all the batches, the opposite direction
to the genome-wide entropy (SCC = −0.1223, p-value = 3.8166 × 10⁻⁵, Fig. 3.7c).
However, when I removed the ‘Europe’ batch (which was an outlier even
after pre-processing, Fig. S2.10), this trend was reversed and I observed a weak increase of
clock Shannon entropy with age (SCC = 0.1048, p-value = 8.6245 × 10⁻⁵). This shows that
Shannon entropy calculations are very sensitive to batch effects, especially when considering
a small number of CpG sites, and the results must be interpreted carefully, as already
discussed in section 2.1.5.
Interestingly, the mean Shannon entropy across all the control samples was higher in the
epigenetic clock sites (mean = 0.4726, Fig. 3.7c) than genome-wide
(mean = 0.3913, Fig. 3.7a). Sotos syndrome patients displayed a lower clock Shannon entropy
when compared with the control (p-value = 5.0449 × 10⁻¹², Fig. 3.7d, Fig. S2.9b), which is
probably driven by the hypomethylation of the clock CpG sites. Furthermore, this highlights
that Horvath’s epigenetic clock sites could have slightly different characteristics in
terms of the methylation entropy associated with them when compared with the genome
as a whole, something that to my knowledge has not been reported before.
3.6 Discussion
The epigenetic ageing clock has emerged as the most accurate biomarker of the ageing process
and it seems to be a conserved property in mammalian genomes [Field et al., 2018; Horvath
and Raj, 2018]. However, it is still unknown whether the age-related DNA methylation
changes measured are functional at all or whether they are related to some fundamental
process of the biology of ageing. Developmental disorders in humans represent an interesting
framework to look at the biological effects of mutations in genes that are fundamental
for the integrity of the epigenetic landscape and other core processes, such as growth or
neurodevelopment [Aref-Eshghi et al., 2018b; Bjornsson, 2015]. Therefore, using a reverse
[Figure 3.7, panels a-d; graphic not reproduced in this text version. a: scatterplot of genome-wide Shannon entropy against chronological age (years), coloured by disease status (Control: N=1128; Sotos: N=20). b: boxplots of genome-wide Shannon entropy acceleration for Control and Sotos (p-value = 0.71). c: as a, for the Shannon entropy of the clock sites. d: as b, for the Shannon entropy acceleration of the clock sites (p-value = 5e−12).]
Fig. 3.7 Analysis of methylation Shannon entropy during physiological ageing and in Sotos syndrome. a.
Scatterplot showing the relation between genome-wide Shannon entropy (i.e. calculated using the methylation
levels of all the CpG sites in the array) and chronological age of the samples for Sotos (orange) and healthy
controls (grey). Each sample is represented by one point. b. Boxplots showing the distributions of genome-wide
Shannon entropy acceleration (i.e. deviations from the expected genome-wide Shannon entropy for their age)
for the control and Sotos samples. The p-value displayed on top of the boxplots was derived from a two-sided
Wilcoxon’s test. c. As in a., but using the Shannon entropy calculated only for the 353 CpG sites in
Horvath’s epigenetic clock. d. As in b., but using the Shannon entropy calculated only for the 353 CpG sites in
Horvath’s epigenetic clock.
genetics approach, I aimed to identify genes that disrupt aspects of the behaviour of the
epigenetic ageing clock in humans.
Most studies of epigenetic ageing have relied on Horvath’s epigenetic
clock [Horvath, 2013a], and I decided to employ it as a tool to measure the epigenetic age of
my samples. The results from the screen strongly suggest that Sotos syndrome accelerates
epigenetic ageing. Sotos syndrome is caused by loss-of-function mutations in the NSD1
gene [Choufani et al., 2015; Kurotaki et al., 2002], which encodes a histone H3 lysine 36
(H3K36) methyltransferase. This leads to a phenotype which can include pre-natal and
post-natal overgrowth, facial gestalt, advanced bone age, developmental delay, higher cancer
predisposition and, in some cases, heart defects [Leventopoulos et al., 2009]. Remarkably,
many of these characteristics could be interpreted as ageing-like, identifying Sotos syndrome
as a potential human model of accelerated physiological ageing.
NSD1 catalyses the addition of either monomethyl (H3K36me) or dimethyl groups
(H3K36me2) and indirectly regulates the levels of trimethylation (H3K36me3) by altering
the availability of the monomethyl and dimethyl substrates for the trimethylation enzymes
(SETD2 in humans, whose mutations cause a ‘Sotos-like’ overgrowth syndrome) [Luscan
et al., 2014; Wagner and Carpenter, 2012]. H3K36 methylation has a complex role in the
regulation of transcription [Wagner and Carpenter, 2012] and has been shown to regulate
nutrient stress response in yeast [McDaniel et al., 2017]. Moreover, experiments in model
organisms (yeast and worm) have demonstrated that mutations in H3K36 methyltransferases
decrease lifespan and, remarkably, mutations in H3K36 demethylases increase it [Ni
et al., 2012; Pu et al., 2015; Sen et al., 2015].
In humans, DNA methylation patterns are established and maintained by three conserved
enzymes: the maintenance DNA methyltransferase DNMT1 and the de novo DNA methyl-
transferases DNMT3A and DNMT3B [Schübeler, 2015]. Both DNMT3A and DNMT3B
contain PWWP domains that can read the H3K36me3 histone mark [Baubec et al., 2015;
Dhayalan et al., 2010]. Therefore, the H3K36 methylation landscape can influence DNA
methylation levels in specific genomic regions through the recruitment of the de novo DNA
methyltransferases. Mutations in the PWWP domain of DNMT3A impair its binding to
H3K36me2 and H3K36me3 and cause an undergrowth disorder in humans (microcephalic
dwarfism) [Heyn et al., 2019]. These mutations redirect DNMT3A, which is normally targeted to
H3K36me2 and H3K36me3 throughout the genome, to DNA methylation valleys (DMVs,
a.k.a. DNA methylation canyons), which become hypermethylated [Heyn et al., 2019]. This
phenomenon also seems to happen during physiological ageing in humans [Rakyan
et al., 2010; Slieker et al., 2016; Teschendorff et al., 2010] and mice [Cole et al., 2017b].
DMVs are hypomethylated domains conserved across cell types and species, often asso-
ciated with Polycomb-regulated developmental genes and marked by bivalent chromatin
(with H3K27me3 and H3K4me3) [Jeong et al., 2013; Li et al., 2018; Long et al., 2013;
Xie et al., 2013]. Therefore, I suggest a model (Fig. 3.8) in which a reduction in the levels
of H3K36me2 and/or H3K36me3, caused either by a proposed decrease in H3K36 methylation
maintenance during ageing or by reduced NSD1 function in Sotos syndrome, could lead to
hypomethylation in many genomic regions (because DNMT3A is recruited less efficiently)
and hypermethylation in DMVs (because of the higher availability of DNMT3A). Indeed,
I observe enrichment for categories such as ‘bivalent promoter’ or ‘repressed polycomb’
in the hypermethylated DMPs in Sotos and ageing (Fig. S2.4), which is also supported
by higher levels of Polycomb Repressive Complex 2 (PRC2, represented by EZH2) and
H3K27me3, the mark deposited by PRC2 (Fig. S2.5). This is also consistent with the results
obtained for the epigenetic mitotic clock [Yang et al., 2016c], where I observe a trend towards
increased hypermethylation of Polycomb-bound regions in Sotos patients. Furthermore, it is
worth mentioning that a mechanistic link between PRC2 recruitment and H3K36me3 has
also been unravelled via the Tudor domains of some polycomb-like proteins [Cai et al., 2013;
Li et al., 2017].
A recent preprint has shown that loss-of-function mutations in DNMT3A, which cause
Tatton-Brown-Rahman overgrowth syndrome, also lead to a higher ticking rate of the epige-
netic ageing clock [Jeffries et al., 2018]. They also report positive epigenetic age acceleration
in Sotos syndrome and negative acceleration in Kabuki syndrome, consistent with my re-
sults. Furthermore, they observe a DNA methylation signature in the DNMT3A mutants
characterised by widespread hypomethylation, with a modest enrichment of DMPs in re-
gions upstream of the transcription start site, shores and enhancers [Jeffries et al., 2018],
which I also detect in the ‘Hypo-Hypo DMPs’ (those that become hypomethylated both
during physiological ageing and in Sotos). Therefore, the hypomethylation observed in
the ‘Hypo-Hypo DMPs’ is consistent with a reduced methylation activity of DNMT3A,
which in my analysis could be a consequence of the decreased recruitment of DNMT3A to
genomic regions that have lost H3K36 methylation (Fig. 3.8).
Interestingly, H3K36me3 is required for the selective binding of the de novo DNA
methyltransferase DNMT3B to the bodies of highly transcribed genes [Baubec et al., 2015].
Furthermore, DNMT3B loss reduces gene-body methylation, which leads to intragenic
spurious transcription (a.k.a. cryptic transcription) [Neri et al., 2017]. An increase in this
so-called cryptic transcription seems to be a conserved feature of the ageing process [Sen
et al., 2015]. Therefore, the changes observed in the ‘Hypo-Hypo DMPs’ could theoretically
[Figure 3.8; schematic not reproduced in this text version. Labels: Ageing (H3K36 methylation maintenance), Sotos syndrome (NSD1), DMV, H3K36me2/3, 5-mC, C, DNMT3A, PWWP domain.]
Fig. 3.8 Proposed model that highlights the role of H3K36 methylation maintenance in epigenetic ageing. The
H3K36me2/3 mark recruits the de novo DNA methyltransferases DNMT3A (in green) and DNMT3B (not
shown) through their PWWP domain (in blue) to different genomic regions (such as gene bodies or pericentric
heterochromatin) [Baubec et al., 2015; Chantalat et al., 2011; Chen et al., 2004], which leads to the methylation
of the cytosines in the DNA of these regions (5mC, black lollipops). On the contrary, DNA methylation
valleys (DMVs) are conserved genomic regions that are normally found hypomethylated and associated with
Polycomb-regulated developmental genes [Jeong et al., 2013; Li et al., 2018; Long et al., 2013; Xie et al., 2013].
During ageing, the H3K36 methylation machinery could become less efficient at maintaining the H3K36me2/3
landscape. This would lead to a relocation of de novo DNA methyltransferases from their original genomic
reservoirs (which would become hypomethylated) to other non-specific regions such as DMVs (which would
become hypermethylated and potentially lose their normal boundaries), with functional consequences for the
tissues. This is also partially observed in patients with Sotos syndrome, where mutations in NSD1 potentially
affect H3K36me2/3 patterns and accelerate the epigenetic ageing clock as measured with the Horvath’s model
[Horvath, 2013a]. Given that DNMT3B is enriched in the gene bodies of highly transcribed genes [Baubec
et al., 2015] and that I found these regions depleted in the differential methylation analysis, I hypothesise that
the hypermethylation of DMVs could be mainly driven by DNMT3A instead. However, it is important to
mention that my analysis does not rule out a role for DNMT3B during epigenetic ageing.
be a consequence of the loss of H3K36me3 and the concomitant inability of DNMT3B to be
recruited to gene bodies. However, the ‘Hypo-Hypo DMPs’ were depleted for H3K36me3,
active transcription and gene bodies when compared with the rest of the probes in the array
(Fig. 3.5b-d), prompting me to suggest that the DNA methylation changes observed are
likely mediated by DNMT3A instead (Fig. 3.8). Nevertheless, it is worth mentioning that
the different biological replicates for the blood H3K36me3 ChIP-seq datasets were quite
heterogeneous and that the absolute difference in the case of the hypomethylated Sotos DMPs,
although significant owing to the large sample sizes, is quite small. Thus, I cannot exclude the
existence of this mechanism during human ageing, and an exhaustive study of the prevalence
of cryptic transcription in humans and its relation to the ageing methylome should be carried
out.
H3K36me3 has also been shown to guide deposition of the N6-methyladenosine mRNA
modification (m6A), an important post-transcriptional mechanism of gene regulation [Huang
et al., 2019]. Interestingly, a decrease in overall m6A during human ageing has been
previously reported in PBMCs [Min et al., 2018], suggesting another biological route through
which an alteration of the H3K36 methylation landscape could have functional consequences
for the organism.
Because of the way that the Horvath epigenetic clock was trained [Horvath, 2013a], it
is likely that its constituent 353 CpG sites are a low-dimensional representation of the
different genome-wide processes that are eroding the epigenome with age. My analysis
has shown that these 353 CpG sites are characterised by a higher Shannon entropy when
compared with the rest of the genome, which is dramatically decreased in the case of Sotos
patients. This could be related to the fact that Horvath’s clock CpGs are enriched in regions
of bivalent chromatin (marked by H3K27me3 and H3K4me3), conferring a more dynamic or
plastic regulatory state with levels of DNA methylation that deviate from the collapsed states
of 0 or 1. Interestingly, EZH2 (part of Polycomb Repressive Complex 2, responsible for
H3K27 methylation) is an interacting partner of DNMT3A and NSD1, with mutations in
NSD1 affecting the genome-wide levels of H3K27me3 [Streubel et al., 2018]. Furthermore,
Kabuki syndrome was weakly identified in my screen as having an epigenome younger than
expected, which could be related to the fact that these patients show post-natal dwarfism [Aref-Eshghi
et al., 2017; Butcher et al., 2017]. Kabuki syndrome is caused by loss-of-function mutations
in KMT2D [Aref-Eshghi et al., 2017; Butcher et al., 2017], a major mammalian H3K4
mono-methyltransferase [Froimchuk et al., 2017]. Additionally, H3K27me3 and H3K4me3
levels can affect lifespan in model organisms [Sen et al., 2016]. It will be interesting to test
whether bivalent chromatin is a general feature of multi-tissue epigenetic ageing clocks.
Thus, DNMT3A, NSD1 and the machinery in control of bivalent chromatin (such
as EZH2 and KMT2D) contribute to an emerging picture of how the mammalian
epigenome is regulated during ageing, which could open new avenues for anti-ageing
drug development. Mutations in the genes encoding these proteins lead to different developmental disorders
with growth defects [Bjornsson, 2015], with DNMT3A, NSD1 and potentially
KMT2D also affecting epigenetic ageing. Interestingly, EZH2 mutations (which cause
Weaver syndrome, Table 3.1) do not seem to affect the epigenetic clock in my screen.
However, this syndrome has the smallest number of samples (N = 7) and this could limit the
power to detect any changes.
My screen has also revealed that Rett syndrome and fragile X syndrome (FXS) could
potentially have an accelerated epigenetic age. It is worth noting that FXS is caused by
an expansion of the CGG trinucleotide repeat located in the 5’ UTR of the FMR1 gene
[Schenkel et al., 2016]. Interestingly, Huntington’s disease, caused by a trinucleotide repeat
expansion of CAG, has also been shown to accelerate epigenetic ageing of human brain
[Horvath et al., 2016b], pointing towards trinucleotide repeat instability as an interesting
molecular mechanism to look at from an ageing perspective. It is important to note that the
conclusions for Rett syndrome, FXS and Kabuki syndrome were very dependent on the age
range used in the healthy control (Fig. S2.2) and these results must therefore be treated with
caution.
This study has several limitations that I tried to address in the best possible way.
First of all, given that DNA methylation data for patients with developmental disorders is
relatively rare, some of the sample sizes were quite small. It is thus possible that some of the
other developmental disorders assessed are epigenetically accelerated but I lack the power to
detect this. Furthermore, people with the disorders tend to get sampled when they are young
i.e. before reproductive age. Horvath’s clock adjusts for the different rates of change in the
DNA methylation levels of the clock CpGs before and after adult/reproductive age (20 years
in humans) [Horvath, 2013a], but this could still have an effect on the predictions, especially
if the control is not properly age-matched. My solution was to discard those developmental
disorders with fewer than 5 samples and to require the remaining ones to have at least 2 samples
with an age ≥ 20 years, which reduced the list of final disorders included to the ones listed in Table 3.1.
Future studies should increase the sample size and follow the patients during their entire
lifespan in order to confirm these findings. Furthermore, it would be interesting to identify
mutations that affect, besides the mean, the variance of epigenetic age acceleration, since
changes in methylation variability at single CpG sites with age have been associated with
fundamental ageing mechanisms [Slieker et al., 2016]. Finally, testing the influence of H3K36
methylation on the epigenetic clock and lifespan in mice will provide deeper mechanistic
insights.
3.7 Additional methods
Sample generation and annotation
I collected DNA methylation data generated with the Illumina Infinium HumanMethylation450
BeadChip (450K array) from human blood. In the case of the developmental disorder
samples, I combined public data with data generated in-house by my collaborators in Canada
(Table S2.1, Fig. S2.1). The wet-lab protocols used in the public datasets can be found in
their respective GEO repositories. DNA methylation data from my Canadian collaborators
was generated according to the manufacturer’s protocol [Illumina, 2015; Research, 2019].
Basic metadata (including the chronological age) was also stored. All the mutations in
the developmental disorder samples were manually curated using Variant Effect Predictor
[McLaren et al., 2016] in the GRCh37 (hg19) human genome assembly. Those samples with
a variant of unknown significance that had the characteristic DNA methylation signature
of the disease were also included (they are labelled as ‘YES_predicted’ in Fig. S2.1). In
the case of fragile X syndrome (FXS), only male samples with full mutation (>200 repeats)
[Schenkel et al., 2016] were included in the final screen. As a consequence, only samples
with a clear molecular and clinical diagnosis were kept for the final screen.
Identifying differentially methylated positions in Sotos syndrome
Following a strategy similar to the one outlined in section 2.1.4, I identified those array
probes that were differentially methylated in patients with Sotos syndrome. I compared the
Sotos samples (N=20) against the internal control samples (N=51) from the same dataset
(GSE74432) [Choufani et al., 2015], fitting the following linear model to each one of the
array probes:
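\[ \mathrm{Beta} \sim \mathrm{Disease\_status} + \mathrm{Age} + \mathrm{Sex} + \mathrm{Gran} + \mathrm{CD4T} + \mathrm{CD8T} + \mathrm{B} + \mathrm{Mono} + \mathrm{NK} + \mathrm{PC}_{1} + \ldots + \mathrm{PC}_{17} \tag{3.1} \]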
where Beta is the β -value for the array probe being evaluated; Disease_status indicates
whether a sample comes from a healthy individual (0) or a Sotos syndrome patient (1); Age is
the chronological age (in years) of the samples; Sex encodes for the sex of the samples (0/1);
Gran, CD4T , CD8T , B, Mono and NK are the cell type proportions from the samples as
calculated with my cell-type deconvolution strategy and PCN is the Nth principal component
that captures technical variance and accounts for potential batch effects (see section 2.2.3 for
more details).
P-values and regression coefficients were extracted for the Disease_status covariate. I
selected as my final Sotos DMPs those CpG probes that survived the analysis after Bonferroni
multiple testing correction with a significance level of α = 0.01.
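The Bonferroni selection step can be illustrated with a minimal Python sketch. The function name and the example p-values are hypothetical and are not the code used in this thesis; the sketch only shows that each probe is compared against the threshold α / m, where m is the number of probes tested.

```python
def bonferroni_select(p_values, alpha=0.01):
    """Return the indices of the tests that remain significant after
    Bonferroni correction, i.e. those with p <= alpha / number_of_tests."""
    m = len(p_values)
    threshold = alpha / m  # per-test significance threshold
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values for the Disease_status covariate of four probes.
example_p_values = [1e-9, 0.004, 2e-4, 0.5]
print(bonferroni_select(example_p_values))  # -> [0, 2]
```

With four tests and α = 0.01 the per-test threshold is 0.0025, so only the first and third probes would be kept in this toy example.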
(Epi)genomic annotation of the CpG sites
Different (epi)genomic features were extracted for the CpG sites of interest. All the data
were mapped to the hg19 assembly of the human genome. The continuous features were
calculated by extracting the mean value in a window of ±200 bp from the CpG site coordinate
using the pyBigWig package [Richter et al., 2019]. I chose this window value based on the
methylation correlation observed between neighbouring CpG sites in previous studies [Zhang
et al., 2015b]. The continuous features included (Fig. S2.11):
• ChIP-seq data from ENCODE (histone modifications from peripheral blood mononuclear
cells or PBMC; EZH2, as a marker of Polycomb Repressive Complex 2 binding,
from B cells; RNF2, as a marker of Polycomb Repressive Complex 1 binding, from
the K562 cell line). I obtained Z-scores (using the scale function in R) for the values of
‘fold change over control’ as calculated in ENCODE [Consortium et al., 2012]. When
needed, biological replicates of the same feature were aggregated by taking the mean
of the Z-scores in order to obtain the ‘normalised fold change’ (NFC).
• ChIP-seq data for LaminB1 (GSM1289416, quantified as ‘normalised read counts’ or
NRC) and Repli-seq data for replication timing (GSM923447, quantified as ‘wavelet-
transformed signals’ or WTS). I used the same data from the IMR90 cell line as in
[Zhou et al., 2018].
• Total RNA-seq data (rRNA depleted, from PBMC) from ENCODE. I calculated Z-
scores after aggregating the ‘signal of unique reads’ (sur) for both strands (+ and - ) in
the following manner:
\[ \mathrm{RNA}_i = \log_2\left(1 + \mathrm{sur}_i^{+} + \mathrm{sur}_i^{-}\right) \tag{3.2} \]
where RNAi represents the RNA signal (that then needs to be scaled to obtain the
‘normalised RNA expression’ or NRE) for the ith CpG site.
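The transformations applied to the continuous features (equation 3.2 followed by scaling to Z-scores) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the toy values are hypothetical, and the sample standard deviation (n − 1 denominator) is used to mimic the behaviour of the scale function in R.

```python
import math
import statistics


def rna_signal(sur_plus, sur_minus):
    """RNA signal for one CpG site: log2(1 + signal on + strand + signal on - strand)."""
    return math.log2(1 + sur_plus + sur_minus)


def z_scores(values):
    """Centre the values and divide by the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical 'signal of unique reads' (+ strand, - strand) for three CpG sites.
toy_sur = [(0.0, 0.0), (1.0, 2.0), (7.0, 8.0)]
signals = [rna_signal(plus, minus) for plus, minus in toy_sur]
print(signals)            # RNA_i values before scaling
print(z_scores(signals))  # scaled values, analogous to the NRE
```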
The categorical features were obtained by looking at the overlap (using the pybedtools
package) [Dale et al., 2011] of the CpG sites with the following:
• Gene bodies, from protein-coding genes as defined in the basic gene annotation of
GENCODE release 29 [Frankish et al., 2018].
• CpG islands (CGIs) were obtained from the UCSC Genome Browser [Bock et al.,
2007]. Shores were defined as regions 0 to 2 kb away from CGIs in both directions and
shelves as regions 2 to 4 kb away from CGIs in both directions as previously described
[Martin-Herranz et al., 2017b; Zhang et al., 2015b].
• Chromatin states were obtained from the K562 cell line in the Roadmap Epigenomics
Project (based on imputed data, 25 states, 12 marks) [Consortium, 2014]. A visualisation
for the association between chromatin marks and chromatin states can be found
in Consortium [2013]. When needed for visualisation purposes, the 25 states were
manually collapsed to a lower number of them.
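The definition of shores and shelves given above can be made concrete with a small Python sketch that classifies one position relative to a single CpG island. The function name, the 0-based half-open coordinate convention, the treatment of the exact 2 kb and 4 kb boundaries and the label 'open sea' for everything further away are assumptions made for illustration; the actual overlaps in this thesis were computed with pybedtools.

```python
def cgi_context(pos, cgi_start, cgi_end):
    """Classify a position relative to one CpG island [cgi_start, cgi_end).

    Coordinates are assumed to be 0-based and half-open.
    Shores: up to 2 kb away from the island (either side).
    Shelves: more than 2 kb and up to 4 kb away (either side).
    """
    if cgi_start <= pos < cgi_end:
        return "island"
    # Distance (in bp) to the closest base that belongs to the island.
    if pos < cgi_start:
        distance = cgi_start - pos
    else:
        distance = pos - (cgi_end - 1)
    if distance <= 2000:
        return "shore"
    if distance <= 4000:
        return "shelf"
    return "open sea"  # assumed label, not used in the text above


print(cgi_context(1500, 1000, 2000))   # island
print(cgi_context(500, 1000, 2000))    # shore
print(cgi_context(4500, 1000, 2000))   # shelf
print(cgi_context(10000, 1000, 2000))  # open sea
```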
I compared the different genomic features for each one of the subsets of CpG sites
(hypomethylated aDMPs, hypomethylated Sotos DMPs, etc.) against a control set. This
control set was composed of all the probes from the background set from which I removed
the subset that I was testing. In the case of the comparisons against the 353 Horvath clock
CpG sites, a background set of the 21368 (21K) CpG probes used to train the original
Horvath model [Horvath, 2013a] was used. In the case of the genome-wide comparisons for
ageing and Sotos syndrome, a background set containing all 428266 probes that passed my
pre-processing pipeline was used (see section 2.1.2).
For each continuous feature, the feature score distributions for a given subset of CpG
sites and the control set were compared using the non-parametric two-sided Wilcoxon’s test.
For each categorical feature, I first created a 2x2 contingency table, with the two variables
indicating whether a given CpG site overlaps with the categorical feature under consideration
(Yes/No) and whether the CpG site is in the subset (e.g. hypomethylated aDMPs) being
considered (Yes/No). Using Fisher’s exact test (as implemented in the fisher.test function in
R) I calculated the p-value and the odds ratio (OR), which allows determining whether the
categorical feature under consideration is enriched in the CpGs subset.
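For readers who want to see what the test on the 2x2 contingency table computes, the following is a minimal Python sketch of a two-sided Fisher's exact test. It is not the code used in this thesis (which relied on fisher.test in R), and the function name and example counts are hypothetical. Note also that the sketch returns the sample odds ratio (a·d)/(b·c), whereas fisher.test reports a conditional maximum likelihood estimate, so the two odds ratios will generally differ slightly.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: CpG in subset (yes/no); columns: overlaps feature (yes/no).
    Returns (sample odds ratio, two-sided p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts: 3 subset CpGs overlap the feature, 1 does not;
# 1 control CpG overlaps the feature, 3 do not.
print(fisher_exact_2x2(3, 1, 1, 3))
```

An odds ratio above 1 would indicate enrichment of the feature in the subset, and a value below 1 depletion.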
Differences in the epigenetic clock CpGs β -values for Sotos syndrome
To compare the β -values of the Horvath clock CpG sites between the healthy samples and
Sotos samples, I fitted the following linear model to each array probe from Horvath’s
epigenetic clock (353 in total) using the samples from healthy individuals (Fig. 3.6a,b):
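\[ \mathrm{Beta} \sim \mathrm{Age} + \mathrm{Age}^{2} + \mathrm{Sex} + \mathrm{Gran} + \mathrm{CD4T} + \mathrm{CD8T} + \mathrm{B} + \mathrm{Mono} + \mathrm{NK} + \mathrm{PC}_{1} + \ldots + \mathrm{PC}_{17} \tag{3.3} \]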
where Beta is the β -value for the clock array probe being evaluated; Age is the chronolog-
ical age (in years) of the samples; Sex encodes for the sex of the samples (0/1); Gran, CD4T ,
CD8T , B, Mono and NK are the cell type proportions from the samples as calculated with
my cell-type deconvolution strategy and PCN is the Nth principal component that captures
technical variance and accounts for potential batch effects (see section 2.2.3 for more details).
The Age² covariate allows accounting for non-linear relationships between chronological age
and the β -values.
Finally, I calculated the difference between the β -values in Sotos samples and the
predictions from the models in equation 3.3 and displayed these differences in an annotated
heatmap (Fig. 3.6c).
Chapter 4
Technological aspects
‘It is perfectly true, as the philosophers
say, that life must be understood
backwards. But they forget the other
proposition, that it must be lived
forwards.’
Søren Kierkegaard [1843]
Declaration
The content of this chapter was joint work with Tom Stubbs, with whom I designed and developed cuRRBS.
Nevertheless, almost all the text, code and plots presented here were produced by myself. Additionally, I would
like to recognise the contributions of Janet M. Thornton and Wolf Reik (who helped design the study),
Antonio J. M. Ribeiro (who implemented the last version of cuRRBS to make it more computationally efficient)
and Felix Krueger (who processed the RRBS datasets). All of them also helped in the revision of the final text.
This work has been published in the journal Nucleic Acids Research [Martin-Herranz et al., 2017b].
4.1 Background
With the advent of next-generation sequencing, scientists are studying the biology of life
at unprecedented resolution [Shendure and Ji, 2008]. Unfortunately, owing to the large
size of many commonly studied genomes (human, mouse and tobacco plant for example
are all > 2.5 Gbp in size) [Consortium et al., 2001, 2002; Sierro et al., 2014], it is often
still prohibitively expensive to conduct whole genome sequencing at high coverage. This
creates a trade-off that negatively impacts the number of replicates that can be included and,
therefore, it challenges the statistical power and the reproducibility of the studies [Fumagalli,
2013; Wu et al., 2015]. This is true in particular for DNA methylation, where differentially
methylated regions (DMRs) are typically called by identifying changes as small as 10%
and where 70−80% of the reads of Whole Genome Bisulfite Sequencing (WGBS) methods
contain little to no relevant information on the DNA methylation status [Ziller et al., 2013].
To address these cost inefficiencies, many methods have been developed to reduce the
number of genomic fragments that need to be sequenced for a given biological system
[Kacmarczyk et al., 2018; Kurdyukov and Bullock, 2016; Plongthongkum et al., 2014; Suzuki
and Greally, 2013; Yong et al., 2016]. These methods can be broadly split into those that
positively select for genomic fragments of interest and those that deplete for fragments
that are not of interest. Positive selection-based methods involve the sites of interest being
enriched from the background. This usually occurs through pull-down of these sites via an
antibody (e.g. anti-5mC antibody) [Taiwo et al., 2012], a recombinant binding protein (e.g.
methyl-CpG-binding domains or MBD) [Brinkman et al., 2010], covalent biotin tagging
[Kriukienė et al., 2013], capture probes/baits for the sites of interest [Allum et al., 2015;
Cheung et al., 2017; Ivanov et al., 2013], array-based approaches (e.g. 27K, 450K and EPIC
arrays in human) [Bibikova et al., 2011, 2009; Hodges et al., 2009; Pidsley et al., 2016] or
PCR-based approaches [Bernstein et al., 2015; Deng et al., 2009; Diep et al., 2012; Komori
et al., 2011; Paul et al., 2014; Yang et al., 2015]. These methods have many limitations,
including enrichment biases, complex protocols and difficulties in quantification [Suzuki and
Greally, 2013; Yong et al., 2016].
Current evidence shows that depletion-based methods do not have enrichment biases,
tend to be simpler and are more readily quantifiable [Kurdyukov and Bullock, 2016; Suzuki
and Greally, 2013]. The most common depletion-based approaches use restriction enzymes
to exploit the fact that the nucleotide composition in a given genome is non-random and that
the fragment lengths produced from a given digestion will thus reflect this [Bystrykh, 2013;
Cedar et al., 1979; Cohen-Karni et al., 2011; Martinez-Arguelles et al., 2014; Yu et al., 2004].
In the case of 5-methylcytosine (5mC), the most common depletion-based method is Reduced
Representation Bisulfite Sequencing (RRBS) using the methylation-insensitive restriction
enzyme MspI (with the recognition sequence C|CGG) [Boyle et al., 2012; Meissner et al.,
2008], although enzymes such as BglII [Meissner et al., 2005], XmaI [Tanas et al., 2017],
TaqαI [Lee et al., 2014; Lim et al., 2016], MspJI [Huang et al., 2013], ApeKI [Wang et al.,
2013], HpyCH4IV or HpaII [Kirschner et al., 2016] have also been used. RRBS has proven
extremely useful for cost-effective, global studies of DNA methylation [Gu et al., 2010; Lee
et al., 2014; Meissner et al., 2008; Stubbs et al., 2017], capturing around 10% of CpG sites
within mammalian genomes but with up to a 30-fold reduction in the number of fragments
sequenced in comparison to WGBS [Smith et al., 2009].
In the context of epigenetic clocks, most studies have used methylation arrays in humans
[Hannum et al., 2013; Horvath, 2013a; Koch and Wagner, 2011] and MspI-based RRBS
in mice, dogs and wolves [Meer et al., 2018; Petkovich et al., 2017; Stubbs et al., 2017;
Thompson et al., 2018, 2017]. The utility of the MspI-based RRBS approach is limited
to a specific subset of CpG sites in the genome, mainly found within CpG islands and
promoters [Meissner et al., 2008]. Nevertheless, it is known that many age-related changes
in the methylome occur in other genomic regions (such as enhancers) [Cole et al., 2017b;
Martin-Herranz et al., 2019; Slieker et al., 2018, 2016], and current technologies could be
biasing our discoveries. Furthermore, epigenetic clocks could be used in the near future to
perform high-throughput screenings of anti-ageing drugs or employed as ageing biomarkers
in clinical trials [Horvath et al., 2018]. However, the current assay costs could preclude the
use of epigenetic clocks in this context.
Given that restriction enzyme-based approaches are versatile and simple, we developed
a new computational method called customised Reduced Representation Bisulfite
Sequencing (cuRRBS), which allows researchers to optimise the RRBS protocol for a specific
experiment. cuRRBS generalises the problem of genomic enrichment with restriction
enzymes by allowing the user to define both the genome and the particular sites of interest,
before outputting the optimal enzyme combinations and size ranges to target these sites. In
addition, cuRRBS provides the user with a variety of metrics to compare the various suggested
protocols, including an estimate of the fold-reduction in sequencing costs compared to
WGBS and a robustness value to assess the impact of experimental error in the size selection
step.
Here, we have tested the enrichment ability of cuRRBS in several biological systems
(including the Horvath epigenetic clock), with sites in both CpG and CHG contexts and
multiple species, to showcase the generalisability and utility of the software [Domcke et al.,
2015; Hanna et al., 2016; Horvath, 2013a; Kawakatsu et al., 2016; Lev Maor et al., 2015;
Maurano et al., 2015; Milagre et al., 2017]. In addition, we take advantage of two recently
published independent RRBS datasets to demonstrate the accuracy of the software predictions
in both single and double enzyme experimental settings [Lim et al., 2016; Tanas et al., 2017].
We hope that cuRRBS will be useful as a tool for designing cost-effective, genome-wide
studies in the future, to help in the development of new epigenetic-based predictors and
to validate previous results from whole genome approaches in a simple, cheap and timely
fashion.
4.2 Restriction enzyme digestion as a tool for genomic enrichment
Restriction enzymes represent an incredibly effective tool for the enrichment of certain sites
of interest in a genome. This is possible due to the wide variety of motifs that commercially-
available restriction enzymes can recognise (Fig. 4.1) combined with the non-random nature
of the genome composition itself. Fig. 4.1 highlights that this motif diversity is driven both
by the sequence composition (GC content) and the length of the recognition sequence. Thus,
different restriction enzymes will generate different fragment length distributions, dependent
upon how frequently their recognition site is present in a given genome (Fig. 4.2a,
Fig. S3.1).
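The idea that each recognition motif yields its own fragment length distribution can be illustrated with a toy in silico digestion in Python. This is a minimal sketch and not the cuRRBS implementation: the function names and the toy sequence are hypothetical, only motifs made of plain A/C/G/T letters are handled (no degenerate IUPAC codes), and a single strand is scanned, which is sufficient for palindromic motifs such as the MspI site C|CGG.

```python
import re


def digest(sequence, motif="CCGG", cut_offset=1):
    """Return the fragment lengths obtained by cutting at every motif occurrence.

    cut_offset is the position of the cut inside the motif
    (1 for C|CGG, i.e. the cut falls after the first base).
    """
    # A lookahead is used so that overlapping motif occurrences are also found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={motif})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def count_in_size_range(lengths, low, high):
    """Number of fragments whose length falls within [low, high] bp."""
    return sum(low <= length <= high for length in lengths)


toy_sequence = "AAACCGGTTTTTCCGGAACCGGA"  # hypothetical 23 bp sequence
lengths = digest(toy_sequence)
print(lengths)                              # -> [4, 9, 6, 4]
print(count_in_size_range(lengths, 5, 10))  # -> 2
```

Changing the motif (for example to a longer or more AT-rich recognition sequence) changes how often the sequence is cut and therefore the resulting length distribution.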
[Fig. 4.1, graphical content not recoverable from the text extraction. Panel a: neighbour-joining tree of the restriction enzyme recognition motifs. Panel b: PCA of the motifs, with PC1 (30.14%) on the x-axis and PC2 (14.85%) on the y-axis. Colour legend: 0−25%, 25−50%, 50−75% and 75−100% GC content.]
Fig. 4.1 The landscape of restriction enzyme motifs. a. Phylogenetic analysis of the motifs that are recognised
by the different commercially-available restriction enzymes which are insensitive to CpG methylation. Each
sequence represents a different isoschizomer family considered in this study. A neighbour-joining method was
used to construct the tree. Motifs with different GC content are shown with different colours. b. Principal
component analysis (PCA) performed on the matrix of pairwise distances from the aligned motifs. Each circle
represents a different motif. The coordinates of the different motifs on the first two principal components are
plotted on the x- and y-axes. Motifs with different GC content are shown with different colours (same as in a.)
and the motif length is represented by the diameter of the circle.
In DNA methylation studies the most common application is the use of MspI (cutting at
C|CGG) in RRBS (Reduced Representation Bisulfite Sequencing), which is used to enrich
for CG dinucleotides (CpGs) contained in promoters and CpG islands [Meissner et al., 2008]
[Fig. 4.2, graphical content not recoverable from the text extraction. Panel a: heatmap of the proportion of fragments per size range (bp, bins of 200 bp from 1−200 to 8001−Inf) for each restriction enzyme (isoschizomer family), annotated with the total number of fragments, the median fragment length (bp) and the GC content (%). Panel b: % of sites in CpG islands (x-axis) against % of sites in promoters (y-axis), with MspI and BssAI highlighted. Panel c: % of sites in intergenic regions (x-axis) against % of sites in non-coding RNA genes (y-axis), with MspI, BsmI and MfeI highlighted. In panels b and c the circle size encodes the total number of sites.]
Fig. 4.2 Restriction enzyme digestion as a tool for genomic enrichment. a. Heatmap showing the fragment
length distributions generated by different restriction enzymes in the human genome (hg38). Each column
represents the distribution for an isoschizomer family of restriction enzymes that contains at least one member
which is methylation-insensitive in a CpG context. The distributions are binned in size ranges of 200 bp,
ordered as they would appear in an electrophoretic gel. Additional row annotations on top of the heatmap
contain information regarding the total number of fragments (in red) and the median fragment length (in blue)
produced by each in silico digestion, together with the GC content of the recognition motif in the isoschizomer
family (in green). Legend is displayed on the right hand side. b. Scatterplot showing the percentage of cleavage
sites from different restriction enzymes that overlaps with CpG islands (x-axis) and promoters (y-axis) in the
human genome (hg38). The size of the circles represents the total number of cleavage sites generated by each
enzyme. The enzymes MspI and BssAI are highlighted in red and blue respectively. Legend is displayed on the
right hand side. c. Scatterplot showing the percentage of cleavage sites from different restriction enzymes that
overlaps with intergenic regions (x-axis) and non-coding RNA genes (y-axis) in the human genome (hg38).
The size of the circles represents the total number of cleavage sites generated by each enzyme. The enzyme
MspI is highlighted in red. The enzymes BsmI and MfeI are both highlighted in blue. Legend is displayed on
the right hand side.
(Fig. 4.2b). However, in many cases, MspI is by no means the most effective restriction
enzyme that could be used. For instance, MspI would be a poor restriction enzyme to choose
for the enrichment of CpGs found in intergenic regions or non-coding RNA genes in the
human genome, which would be far better enriched for using BsmI or MfeI respectively
(Fig. 4.2c). In fact, across many genomic features MspI is rarely the
optimal methylation-insensitive restriction enzyme (Fig. S3.2).
Previous studies have tested the potential of other restriction enzymes and enzyme combinations
to expand the range of CpG sites that can be targeted in a genome [Bystrykh,
2013; Cedar et al., 1979; Kirschner et al., 2016; Lee et al., 2014; Martinez-Arguelles et al.,
2014; Tanas et al., 2017; Wang et al., 2013; Yu et al., 2004]. However, to our knowledge,
there is currently no computational method that systematically explores the capacity
of all commercially-available restriction enzymes to generate ‘personalised’ reduced
representations of the genome whilst minimising the experimental cost (Fig. S3.3).
4.3 cuRRBS: customised Reduced Representation Bisulfite
Sequencing
We have developed a novel computational method (cuRRBS) that determines the optimal
combination of restriction enzymes and size range to enrich for any given set of sites
of interest in any genome. In other words, by modifying two of the steps in the original
RRBS protocol (Fig. 4.3a), cuRRBS generalises RRBS.
The software takes as input the genomic coordinates that the user wants to target (Fig. 4.3b,
Fig. S3.4a). Afterwards, cuRRBS assesses in silico the potential of all single enzymes and
double-enzyme combinations to enrich for the sites of interest using the following variables:
• NF, which reflects the theoretical number of genomic fragments that will be sequenced
after the size selection step (i.e. those whose lengths after the in silico digestion
are within the size range). Assuming that the sequencing cost is proportional to NF,
cuRRBS attempts to minimise this value.
• Score, which reflects the theoretical number of sites of interest that will be sequenced
after the size selection step. cuRRBS attempts to maximise this value, which can be
calculated as:
\[ \mathrm{Score} = \sum_{i=1}^{n} w_i \cdot \gamma_i \tag{4.1} \]
where n is the total number of sites of interest, wi is the weight of the ith site of interest
and γi is 1 if the ith site would be theoretically sequenced (i.e. present in a size selected
fragment and ≤ read length base pairs away from one of the ends of the fragment) and
0 otherwise.
• Enrichment Value (EV), which combines both NF and Score into a single number. The
objective of cuRRBS is to minimise EV, which can be calculated as:
\[ EV = -\log_{10}\left( \frac{\mathrm{Score}}{NF} \cdot \frac{n}{\mathrm{max\_Score}} \right) \tag{4.2} \]
where max_Score is the Score obtained if all the sites of interest were sequenced.
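As a worked illustration of equations 4.1 and 4.2, the following Python sketch computes the Score and the Enrichment Value for a toy example. The function names and the numbers are hypothetical and this is not the cuRRBS source code; it only restates the two formulas.

```python
import math


def score(weights, sequenced):
    """Equation 4.1: sum of the weights of the sites that would be sequenced.

    weights[i] is w_i and sequenced[i] plays the role of gamma_i (True/False).
    """
    return sum(w for w, gamma in zip(weights, sequenced) if gamma)


def enrichment_value(score_value, nf, n, max_score):
    """Equation 4.2: EV = -log10((Score / NF) * (n / max_Score))."""
    return -math.log10((score_value / nf) * (n / max_score))


# Hypothetical example: 4 equally weighted sites, 3 of them recovered
# in a reduced representation of 3000 fragments.
weights = [1.0, 1.0, 1.0, 1.0]
sequenced = [True, True, True, False]
toy_score = score(weights, sequenced)
toy_ev = enrichment_value(toy_score, nf=3000, n=len(weights), max_score=sum(weights))
print(toy_score, toy_ev)  # Score = 3.0 and EV close to 3
```

In this example one site of interest is recovered per 1000 fragments, which corresponds to an EV of approximately 3; recovering the same sites with fewer fragments would lower the EV.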
The NF and Score variables are positively correlated with one another, such that the more
genomic fragments sequenced, the more sites of interest are likely to be contained within the
reduced representation (Fig. 4.3c, Fig. S3.4b). However, this relationship disappears at higher
NF values, where the Score variable becomes saturated such that any additional fragments
sequenced will result in a reduction in the overall enrichment of the sites of interest. This
Score saturation at high NF is mainly due to additional sites of interest being buried within
long fragments that will not be sequenced due to limitations in the read length (cuRRBS
parameter -r, see Table 4.1). For a given enzyme or enzyme combination, the NF and the
Score variables depend on the size range chosen, since only the genomic fragments within
the size range will be present in the reduced representation of the genome.
cuRRBS requires that the user sets thresholds for the maximum NF (i.e. minimum
CRF, see below) and minimum Score that would be acceptable for a given application
(Fig. 4.3b, Fig. S3.4a). These thresholds allow cuRRBS to search through all possible size
ranges for a given enzyme or enzyme combination and to find the one that minimises the
Enrichment Value (EV ). cuRRBS repeats this procedure for every single enzyme and enzyme
combination and reports those with the best hits (i.e. those with the lowest EV s) (Fig. S3.4a).
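The search over size ranges described above can be sketched as a brute-force loop in Python. This is only an illustration of the logic (filter by the NF and Score thresholds, then keep the size range with the lowest EV) and should not be read as the actual cuRRBS algorithm, which is implemented more efficiently. All names and values are hypothetical, and unit weights are assumed so that n / max_Score = 1 in equation 4.2.

```python
import math


def best_size_range(fragments, max_nf, min_score, lowers, breadths):
    """Return (EV, lower, upper) for the size range with the lowest EV, or None.

    fragments: list of (length_bp, number_of_sites_of_interest_sequenced)
    tuples for one enzyme (combination), assuming unit weights.
    """
    best = None
    for low in lowers:
        for breadth in breadths:
            high = low + breadth
            selected = [(length, sites) for length, sites in fragments
                        if low <= length <= high]
            nf = len(selected)
            sc = sum(sites for _, sites in selected)
            # Discard size ranges that violate the user-defined thresholds.
            if nf == 0 or nf > max_nf or sc == 0 or sc < min_score:
                continue
            ev = -math.log10(sc / nf)
            if best is None or ev < best[0]:
                best = (ev, low, high)
    return best


toy_fragments = [(50, 0), (120, 1), (150, 1), (300, 0), (320, 1), (900, 0)]
result = best_size_range(toy_fragments, max_nf=3, min_score=2,
                         lowers=[0, 100, 300], breadths=[100, 200])
print(result[1], result[2])  # -> 100 200
```

In the toy example the 100-200 bp range recovers two sites of interest with only two fragments, so it is preferred over the wider ranges that recover the same sites with three fragments.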
The output file contains the best scoring enzymes with their corresponding size ranges
and some other useful variables for each one of the hits, such as:
• Cost Reduction Factor (CRF), which estimates the theoretical fold-reduction in se-
quencing costs for the cuRRBS protocol when compared to Whole Genome Bisulfite
Sequencing (WGBS). The CRF for a given cuRRBS protocol can be calculated as:
\[ \mathrm{CRF} = \frac{NF_{\mathrm{ref}}}{NF} = \frac{g/r}{NF} \tag{4.3} \]
where NFref is the estimated number of fragments that would be sequenced in a WGBS
experiment, that can be roughly calculated as the genome size (g) divided by the read
length (r).
• Robustness (R). This assesses how much the cuRRBS prediction varies if a slightly
different size range is used (Fig. 4.3d). The results for robust enzymes will not be
greatly affected as a consequence of experimental error during the size selection step.
This will help the user to make an informed decision on which enzyme combination
to choose for the system of interest (Fig. S3.4c). The robustness of a given enzyme
(combination) is calculated as:
\[ R = e^{-\theta} \tag{4.4} \]
with
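\[ \theta = \frac{\sum_{x \in \{a-\delta,\, a,\, a+\delta\}} \sum_{y \in \{b-\delta,\, b,\, b+\delta\}} \left| EV_{x,y} - EV_{a,b} \right|}{EV_{a,b}} \tag{4.5} \]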
where EVa,b is the EV for the optimal size range (a: lower limit in size range, b:
breadth) and δ is the experimental error (in bp) that is assumed during the size
selection step. The robustness will take values in the interval (0,1], with higher values
identifying robust cuRRBS protocols.
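Equations 4.3 to 4.5 can be illustrated with the following Python sketch. The function names, the toy EV landscape and the numerical values are hypothetical and are not taken from the cuRRBS source code; the sketch only evaluates the formulas as written above, with the EV landscape supplied as a function of the lower limit and the breadth of the size range.

```python
import math


def cost_reduction_factor(genome_size, read_length, nf):
    """Equation 4.3: CRF = (g / r) / NF."""
    return (genome_size / read_length) / nf


def robustness(ev, a, b, delta):
    """Equations 4.4 and 4.5: R = exp(-theta).

    ev: function (lower_limit, breadth) -> EV
    a, b: lower limit and breadth of the optimal size range (bp)
    delta: assumed experimental error in the size selection step (bp)
    """
    ev_opt = ev(a, b)
    theta = sum(
        abs(ev(x, y) - ev_opt)
        for x in (a - delta, a, a + delta)
        for y in (b - delta, b, b + delta)
    ) / ev_opt
    return math.exp(-theta)


def toy_ev(lower, breadth):
    """Hypothetical EV landscape with its minimum at lower=60 bp, breadth=480 bp."""
    return 2.0 + 0.001 * abs(lower - 60) + 0.0005 * abs(breadth - 480)


print(cost_reduction_factor(genome_size=3e9, read_length=75, nf=200_000))
print(robustness(toy_ev, a=60, b=480, delta=20))
```

A flat EV landscape around the optimum gives θ close to 0 and therefore R close to 1, whereas a steep landscape gives a lower robustness.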
4.4 Running cuRRBS in different biological systems
cuRRBS provides a way to effectively interrogate DNA methylation in any biological system
(including the CpG sites that constitute different epigenetic clocks) for which the reference
[Fig. 4.3, graphical content not recoverable from the text extraction. Panel a: RRBS workflow (gDNA, cuRRBS-defined enzyme combination, standard library preparation, bisulfite conversion, PCR amplification, cuRRBS-defined size selection, sequencing). Panel b: cuRRBS inputs (genome of interest, sites of interest, maximum no. of fragments to sequence, minimum proportion of sites of interest) and output table (order, combination to use, size range, CRF, robustness). Panel c: % of maximum Score against NF/1000 for BsaWI & BssAI (optimal size range 60-540 bp; Pearson's correlation coefficient 0.9583). Panel d: EV landscape for BsaWI & BssAI, lower limit of the size range (bp) against breadth (bp); optimal size range 60-540 bp; robustness (δ = 20 bp): 0.934. Legend: did not pass filtering, passed filtering, optimal size range.]
Fig. 4.3 cuRRBS overview. a. Outline of an RRBS protocol. Highlighted are the two steps that would be
modified according to the output produced by cuRRBS (i.e. the restriction enzymes used for the genomic
digestion and the size selection). Legend is displayed on the bottom left. b. Schematic of cuRRBS. Highlighted
are the two main inputs required for the software and the two thresholds that the user has to define (red and
purple tags). The default output for cuRRBS is a table containing the top hits (restriction enzyme combination
and size range) along with additional information that might be useful to the user (such as Cost Reduction
Factor and robustness). c. Scatterplot showing the trade-off between the number of fragments (NF) and
the Score for the best enzyme combination (BsaWI & BssAI) that targets the CpGs present in the human
placental-specific imprinted regions [Hanna et al., 2016]. NF is divided by 1000 for visualization purposes.
Each point represents a different size range. Shown in dark blue and grey are the size ranges that would and
would not pass filtering respectively. Shown in orange is the optimal size range in the filtered search space. The
dotted lines depict the thresholds that need to be specified by the user (red: maximum NF; purple: minimum
percentage of the maximum Score). In this mock example we specified an NF threshold of 150000 fragments
and a Score threshold of 25% of the maximum Score. Legend is displayed below the plot title. d. Contour
plot that depicts how the robustness (R) variable is calculated for the optimal enzyme combination (BsaWI &
BssAI; size range: 60-540 bp) that targets the CpGs present in the human placental-specific imprinted regions
[Hanna et al., 2016]. Enrichment values (EVs) are calculated for all possible size ranges in order to create an EV
‘landscape’. In this landscape, cuRRBS finds the size range with the lowest EV that still satisfies the thresholds
(asterisk in green). Afterwards, cuRRBS samples EVs around the optimum (asterisks in black). The points that
are sampled depend on the experimental error (in this case, δ = 20 bp). A high robustness value means that the
sampled EVs do not change a lot when compared to the optimum, which implies that cuRRBS prediction will
not be greatly affected by experimental errors during the size selection step.
4.5 Experimental validation of cuRRBS 113
genome is available. Besides reducing the cost for organisms currently under intensive study
(e.g. human, mouse), cuRRBS opens the door to the cost-effective study of DNA methylation
in species with large genomes or where DNA methylation in non-CpG contexts is common,
such as plants [Stroud et al., 2013], which currently lack an MspI-based RRBS protocol,
owing to the enzyme’s CHG methylation sensitivity [Sun et al., 2014b].
We decided to test the ability of cuRRBS to enrich for genomic sites that have important
functional roles in different systems. Some of the systems that we tested in silico include
genomic regions whose methylation status is important during cellular reprogramming
[Milagre et al., 2017], Horvath’s epigenetic clock [Horvath, 2013a], transcription factor
binding sites that are affected by DNA methylation [Domcke et al., 2015; Maurano et al.,
2015], imprinted loci [Hanna et al., 2016], CpGs found in the exon-intron boundaries
[Lev Maor et al., 2015] and CHG sites that are differentially methylated between different
Arabidopsis accessions [Kawakatsu et al., 2016] (Fig. S3.5). For these in silico systems we
chose to run the software with the Score threshold set to 25% of the maximum Score.
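The role of the two user-defined thresholds (maximum NF and minimum percentage of the maximum Score, see Fig. 4.3c) can be illustrated with a short sketch. It assumes, as described for the EV landscape, that the optimal size range is the one with the lowest EV among those that pass both thresholds. The `SizeRange` container, the function name and all the numbers below are hypothetical, and `max_score` is passed in explicitly because its exact definition is not restated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeRange:
    lower: int    # lower limit of the size range (bp)
    upper: int    # upper limit of the size range (bp)
    score: float  # Score obtained with this size range
    nf: int       # number of fragments (NF) in this size range
    ev: float     # enrichment value (EV); lower is better


def pick_optimal(ranges, max_nf, c_score, max_score):
    """Keep size ranges with NF <= max_nf and Score >= c_score * max_score,
    then return the one with the lowest EV (None if nothing passes)."""
    passed = [r for r in ranges
              if r.nf <= max_nf and r.score >= c_score * max_score]
    return min(passed, key=lambda r: r.ev) if passed else None


# Made-up candidates for a single enzyme combination.
candidates = [
    SizeRange(60, 540, score=40, nf=120_000, ev=1.2),
    SizeRange(40, 200, score=20, nf=50_000, ev=1.0),    # fails the Score threshold
    SizeRange(100, 900, score=80, nf=400_000, ev=2.5),  # fails the NF threshold
    SizeRange(80, 500, score=35, nf=110_000, ev=1.4),
]

best = pick_optimal(candidates, max_nf=150_000, c_score=0.25, max_score=100)
```

In this toy example the size range with the lowest EV overall (40-200 bp) is discarded because it recovers too little of the maximum Score, so the 60-540 bp range is selected instead.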
In all cases, cuRRBS substantially reduces the sequencing cost compared to WGBS, by up to
several hundred-fold as assessed using the Cost Reduction Factor (CRF) (Fig. 4.4). In addition,
for cases where a comparison to MspI-based RRBS could be made, cuRRBS improves the CRF
further, in some cases by more than an order of magnitude. As an example, for the
placental-specific imprints, the sequencing costs
are reduced by approximately 400-fold when compared to WGBS and by 12.5-fold when
compared to the traditional MspI-based RRBS.
Furthermore, we have also observed that many of the top hits reported by cuRRBS are
digestions of two restriction enzymes (Fig. S3.5), highlighting the combinatorial power of
restriction enzymes to produce optimal reduced representations of the genome [Bystrykh,
2013]. Excitingly, we are able to show that using cuRRBS it is possible to assay a far larger
number of target sites, in a far simpler experimental design than would normally be achieved
using amplicon-based bisulfite sequencing.
[Figure 4.4: barplot. y-axis: Cost Reduction Factor (CRF), axis ticks from 0 to 400. x-axis (biological systems):
Arabidopsis CHG sites; Mouse iPSCs demethylated; Mouse iPSCs maintained; Mouse NRF1 sites; Human
exon-intron boundaries; Human epigenetic clock; Human imprinted loci; Human CTCF sites; Human placental
imprinted loci.]
Fig. 4.4 Running cuRRBS in different biological systems. Barplot showing the values for the Cost Reduction
Factor (CRF) in the different biological systems that were tested (see Fig. S3.5) [Domcke et al., 2015; Hanna
et al., 2016; Horvath, 2013a; Kawakatsu et al., 2016; Lev Maor et al., 2015; Maurano et al., 2015; Milagre et al.,
2017]. The colours in the bars represent the different species interrogated (green: Arabidopsis thaliana, blue:
Mus musculus, red: Homo sapiens). The CRF for the traditional RRBS protocol (MspI in the human genome,
using a bead size selection step of 20-800 bp, CRF = 30.65) is displayed as a grey area, which is not compared
with the A. thaliana system (since MspI is sensitive to CHG methylation).
4.5 Experimental validation of cuRRBS
To assess in an unbiased manner how well predictions from cuRRBS perform in an experimental
setting, we employed two independent non-canonical RRBS datasets: one generated
from a single enzyme (XmaI) and the other from a combination of two restriction enzymes
(MspI and Taqα I) [Lim et al., 2016; Tanas et al., 2017]. By evaluating the predictive power
of cuRRBS in these two datasets, we were able to observe cuRRBS’ performance in both
single and double enzyme contexts and across different genomes.
To test the accuracy of cuRRBS predictions in the context of a single enzyme digestion,
we utilised the non-canonical RRBS dataset generated from human DNA using the restriction
enzyme XmaI [Tanas et al., 2017]. This dataset was previously used to show that XmaI
could enrich for CpG islands (CGIs), while reducing the overall sequencing cost relative to
MspI, making the protocol more cost-effective. To validate cuRRBS using this system, we
therefore chose to enrich for all CpG sites that overlapped with a CGI (CGI-CpGs) in the
human genome using a predetermined theoretical size range equivalent to the ‘reproducible
library fragment lengths’ reported in Tanas et al. [2017] (i.e. 90-185 bp). cuRRBS predicted
with high accuracy the CpG sites that were observed in the experimental XmaI-RRBS dataset
(Fig. 4.5a). In particular, only a small proportion of the total number of CGI-CpGs should
be theoretically sequenced (102253 out of 2164614 i.e. 4.72%), and this was indeed the
case (Fig. 4.5a). Furthermore, upon filtering out sites with low depth of coverage, which
commonly represent noise in RRBS datasets, the sensitivity increased up to approximately
80%. Importantly, the specificity remained constant at almost 100% independent of the
threshold set for depth of coverage (Fig. 4.5b). Thus, cuRRBS produces a prediction that is
relatively conservative, as highlighted by the low numbers of false positives (Fig. 4.5a), at
the expense of a small decrease in sensitivity.
Interestingly, the original theoretical size range that the study was aiming for (110-200
bp) was slightly different to the one achieved in the actual experiments (90-185 bp) [Tanas
et al., 2017]. We ran cuRRBS using the original size range target and obtained slightly
worse results for the sensitivity but not the specificity of the prediction (Fig. S3.6). This
demonstrates that the correct execution of the size selection step during the experimental
protocol is key for obtaining the sites predicted by cuRRBS and highlights the importance of
the robustness variable as part of the cuRRBS output in order to judge the consequences of
these experimental errors.
To test the accuracy of cuRRBS predictions in the context of a double enzyme digestion,
we utilised the non-canonical RRBS dataset generated from mouse DNA using the restriction
enzymes MspI and Taqα I [Lim et al., 2016]. To compare the accuracy of cuRRBS prediction
in this double enzyme system to that of the XmaI-RRBS system, we again ran cuRRBS for
CGI-CpGs, this time in the mouse genome with a theoretical size range of 80-160 bp [Lim
et al., 2016]. cuRRBS predicted with high accuracy the CpG sites that were observed in this
double enzyme experiment (Fig. 4.5c). In addition, the results for sensitivity and specificity
were very similar to the ones reported for the XmaI-RRBS dataset (Fig. 4.5d). Therefore,
we conclude that cuRRBS produces robust predictions for the sites of interest that will be
sequenced in RRBS protocols both for single and double enzyme combinations independent
of the genome under study.
Lastly, the number of fragments that were theoretically recoverable in each of our experimental
systems ranged from NF = 12780 (for XmaI) to NF = 331058 (for MspI and Taqα I).
This represents approximately a 26-fold difference in the number of recoverable fragments
and demonstrates that cuRRBS predictions, even for low NF values, are experimentally
feasible. Importantly, in the nine theoretical examples that we report (Fig. S3.5), the number
of fragments required by each cuRRBS protocol ranges from 107248 to 974050. Thus, the
number of fragments required to achieve the stated CRF comfortably exceeds the minimum
experimentally validated NF value (>8-fold).
[Figure 4.5, panels a-d. a and c: stacked barplots; x-axis: depth of coverage threshold (0 to 20); y-axis: number
of sites; legend: FN, FP, TN, TP. b and d: x-axis: depth of coverage threshold (up to 20); y-axis: percentage (%);
legend: Sensitivity = TP / (TP + FN) · 100 and Specificity = TN / (FP + TN) · 100.]
Fig. 4.5 Experimental validation of cuRRBS. a. Barplots showing the number of true positives (TP, in
green), true negatives (TN, in blue), false positives (FP, in red) and false negatives (FN, in orange) when
comparing cuRRBS theoretical prediction with the actual XmaI-RRBS experimental data [Tanas et al., 2017]
(see section 4.7 for more details). The number of sites in each category is calculated for different thresholds in
the depth of coverage (number of reads covering a CpG site as reported by Bismark). cuRRBS prediction for
the CpG sites in human CpG islands was obtained enforcing a theoretical size range of 90-185 bp and running
the software for XmaI with all the default parameters (with a read length of 200 bp). Legend is displayed on
the right hand side. b. Plot showing values of cuRRBS sensitivity (in light green) and specificity (in cyan) as a
function of the depth of coverage threshold employed to filter the experimental data [Tanas et al., 2017]. The
number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) are the same as
in a. Legend is displayed below the plot curves. c. Same as in a. but for the MspI&Taqα I-RRBS experimental
data [Lim et al., 2016]. cuRRBS prediction for the CpG sites in mouse CpG islands was obtained enforcing a
theoretical size range of 80-160 bp and running the software for MspI&Taqα I with all the default parameters
(with a read length of 75 bp). d. Same as in b. but for the MspI&Taqα I-RRBS experimental data [Lim et al.,
2016].
4.6 Conclusions and future directions
cuRRBS provides a new framework that allows the user to optimise RRBS for the biological
system of interest by using novel combinations of restriction enzymes. Therefore, cuRRBS
makes the study of DNA methylation more affordable across all species for which genomic
sequences are available. Furthermore, it can open the door to the design of future studies in a
clinical context [Lee et al., 2014], which require cost-effective and robust protocols.
Currently, cuRRBS only considers combinations of up to two restriction enzymes. However,
in the future, it would be possible to adapt the software to explore combinations that
contain higher numbers of enzymes, which could theoretically allow targeting the sites of
interest even more efficiently [Bystrykh, 2013]. Moreover, there are several methods that
are able to impute DNA methylation levels in sites that are not covered experimentally
[Angermueller et al., 2017; Zhang et al., 2015b]. These methods could expand the set of
sites of interest that are finally measured by making use of the additional DNA methylation
information that is retrieved in a cuRRBS experiment.
Finally, the potential of restriction enzymes to target different genomic coordinates is not
limited to DNA methylation. As such, it would be conceivable for cuRRBS to be adapted
to enrich for SNPs of interest [Davey and Blaxter, 2011; Davey et al., 2011] or to optimise
chromosome conformation capture techniques [Dekker et al., 2013; Naumova et al., 2012].
By reducing the cost associated with sequencing, we believe that cuRRBS will help to
democratise high-throughput genomic studies.
4.7 Additional methods
Restriction enzymes annotation
All the information regarding the commercially-available restriction enzymes that are used
by cuRRBS was extracted from REBASE [Roberts et al., 2005, 2015]. Restriction enzymes
were grouped in isoschizomer families (i.e. enzymes that recognise the same sequence and
generate identical fragment length distributions) and each enzyme was manually annotated
for different types of methylation-sensitivity (CpG, CHG, CHH). Only isoschizomer families
that contained at least one methylation-insensitive enzyme were considered for the examples
described here.
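The grouping and filtering described above can be sketched as follows. This is only an illustration under stated assumptions: families are keyed by recognition site and cut offset, the annotations are hand-written for this example rather than taken from REBASE, and `EnzA` and `EnzB` are hypothetical enzymes.

```python
from collections import defaultdict

# (name, recognition site, cut offset within the site, blocked by CpG methylation?)
ENZYMES = [
    ("MspI", "CCGG", 1, False),
    ("HpaII", "CCGG", 1, True),
    ("TaqI", "TCGA", 1, False),
    ("EnzA", "GCGC", 3, True),  # hypothetical enzyme
    ("EnzB", "GCGC", 3, True),  # hypothetical enzyme
]


def usable_families(enzymes):
    """Group enzymes that share site and cut position, then keep only families
    with at least one methylation-insensitive member."""
    families = defaultdict(list)
    for name, site, cut, sensitive in enzymes:
        families[(site, cut)].append((name, sensitive))
    usable = {}
    for key, members in families.items():
        insensitive = [name for name, sensitive in members if not sensitive]
        if insensitive:
            usable[key] = insensitive
    return usable


FAMILIES = usable_families(ENZYMES)
```

In this toy annotation the CCGG family is kept because MspI is not blocked by CpG methylation, while the GCGC family is dropped because none of its members is insensitive.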
Genome assemblies and genomic annotation
All the analyses presented here were performed in the following genome assemblies: Homo
sapiens (hg38), Mus musculus (mm10) and Arabidopsis thaliana (TAIR10). Scaffolds not
assembled into the main chromosomes were discarded. Genomic annotation for the human
genome (hg38) was obtained from GENCODE (v25, basic gene annotation) [Harrow et al.,
2012], with the exception of CpG islands (CGIs), which were extracted from the UCSC
Genome Browser [Bock et al., 2007]. GC content and CpG content were calculated, around
each restriction enzyme cleavage site, taking windows of ± 25 bp and ± 500 bp respectively.
For each enzyme, the mean of all cleavage sites was calculated to obtain the mean GC content
and the mean CpG content. Intragenic regions were defined as those regions within ± 2.5 kb
of a protein-coding gene, whilst the rest of the genome was considered to be intergenic. CpG
shores were defined as regions 0 to 2 kb away from CGIs in both directions and CpG shelves
as regions 2 to 4 kb away from CGIs in both directions [Zhang et al., 2015b]. Promoters were
defined as encompassing a 3 kb region (2.5 kb upstream and 0.5 kb downstream of the TSS)
relative to the TSS of all protein-coding transcripts in GENCODE, similar to the strategy
used in Taher et al. [2013]. Genomic annotation for the CGIs in the mouse genome (mm10)
was also obtained from the UCSC Genome Browser [Bock et al., 2007]. All annotations
were handled using the pybedtools library [Dale et al., 2011; Quinlan and Hall, 2010].
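Two of the definitions above (the promoter window relative to the TSS and the GC content around a cleavage site) can be written down compactly. The following is a plain-Python sketch and does not use pybedtools; the function names and the coordinate convention (0-based, half-open intervals) are assumptions made for this example.

```python
def promoter_interval(tss, strand, upstream=2500, downstream=500):
    """Promoter as a 3 kb window: 2.5 kb upstream and 0.5 kb downstream of the TSS.

    Returns (start, end), 0-based and half-open; upstream depends on the strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, start), end


def gc_content(seq, pos, flank=25):
    """Fraction of G/C bases in a window of +/- flank bp around position pos."""
    window = seq[max(0, pos - flank):pos + flank].upper()
    if not window:
        return 0.0
    return (window.count("G") + window.count("C")) / len(window)


def mean_gc(seq, cut_sites, flank=25):
    """Mean GC content across all cleavage sites of one enzyme."""
    values = [gc_content(seq, pos, flank) for pos in cut_sites]
    return sum(values) / len(values) if values else 0.0
```

The same windowing approach would apply to the CpG content with a flank of 500 bp, counting CG dinucleotides instead of single bases.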
Performing in silico digestions of a given genome
We used the Restriction package from Biopython v1.68 to digest the different genomes with
the appropriate restriction enzymes in silico [Cock et al., 2009]. Only the first member of a
given isoschizomer family (which contained at least one methylation-insensitive enzyme)
was processed to avoid redundant computations. The output of the in silico digestions was
stored (pre-computed files) and subsequently read by cuRRBS when needed to reduce the
computational time (see ‘cuRRBS heuristics and computational efficiency’). When assessing
enzyme combinations, the information from the appropriate individual pre-computed files
(i.e. the genomic coordinates where each enzyme theoretically cuts) was combined by the
software to compute all the necessary variables.
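The idea of storing cut coordinates per enzyme and merging them when a combination is assessed can be illustrated with a toy example. This sketch deliberately uses plain regular expressions instead of the Biopython Restriction package, handles only unambiguous recognition sites on one strand, and uses a made-up 25 bp sequence; the function names are hypothetical.

```python
import re


def cut_positions(seq, site, cut_offset):
    """0-based cut coordinates for every occurrence of `site` in `seq`.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    return [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]


def fragments(seq_len, *cut_lists):
    """Merge the cut coordinates of one or more enzymes into (start, end) fragments."""
    cuts = sorted(set().union(*cut_lists))
    bounds = [0] + [c for c in cuts if 0 < c < seq_len] + [seq_len]
    return list(zip(bounds[:-1], bounds[1:]))


def select_by_size(frags, lower, upper):
    """Keep fragments whose length falls within the size range [lower, upper]."""
    return [(s, e) for s, e in frags if lower <= e - s <= upper]


# Toy sequence with two CCGG sites (MspI, C^CGG) and one TCGA site (TaqI, T^CGA).
SEQ = "AAA" + "CCGG" + "TTTT" + "TCGA" + "AAAA" + "CCGG" + "TT"

mspi_cuts = cut_positions(SEQ, "CCGG", 1)  # computed once per enzyme
taqi_cuts = cut_positions(SEQ, "TCGA", 1)

single = fragments(len(SEQ), mspi_cuts)
double = fragments(len(SEQ), mspi_cuts, taqi_cuts)
selected = select_by_size(double, 5, 8)
```

Because the per-enzyme coordinates are computed only once, evaluating a combination reduces to merging sorted lists, which is the rationale for reading pre-computed files instead of digesting the genome again.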
cuRRBS’ enzyme flexibility
To ensure the user has full control over the enzymes that cuRRBS will use to derive the
desired enrichments, one of the inputs given to cuRRBS is an enzyme annotation file. This
file contains the desired isoschizomer families that the user wishes to be tested by cuRRBS. In
my GitHub repository we have already defined enzyme annotation files for enzymes that are
methylation-insensitive in a CG context and in CG, CHG and CHH contexts [Martin-Herranz
et al., 2017a]. However, it is also possible for the user to define a personalised set of enzymes
by providing a self-generated annotation file. This can be useful, for instance, to reduce the
chance of any star activity in the reported cuRRBS protocols.
| cuRRBS parameter (abbrev.) | Significance | Default | Range |
|---|---|---|---|
| Enzymes to check (-e) | Defines the enzymes (isoschizomer families) that cuRRBS will look at | - | - |
| Annotation for the sites of interest (-a) | Allows identification and weighting of the sites of interest | - | - |
| Read length (-r) | Defines the positions in the theoretical fragments that can be ‘seen’ after sequencing | - | 30-300 |
| Adapters size (-s) | Ensures correct experimental size selection | - | - |
| C_Score constant (-c) | Sets the minimum acceptable Score | - | 0-1 |
| Genome size (-g) | Needed to calculate the CRF | - | - |
| C_NF/1000 constant (-k) | Sets the minimum acceptable CRF | 0.2 | 0-1 |
| Experimental error (-d) | Sets the assumed experimental error (δ) | 20 | 5-500 |
| Size range breadth (-b) | Constrains the breadth of the size range | 980 | - |
| Output size (-t) | Defines the number of cuRRBS protocols the user can compare | 30 | - |
| Site IDs (-i) | Enables the identification of the recovered sites of interest | No | - |
Table 4.1 Flexible user-defined cuRRBS parameters. This table details the flexible user-defined parameters
that cuRRBS will accept as arguments. The cuRRBS parameter full name and command line abbreviation (in
brackets) are provided alongside a simplified description of the significance of these arguments to the user.
Where applicable, the defaults and ranges of these arguments are also detailed.
In addition, the output file from cuRRBS contains, by default, 30 cuRRBS protocols
that would enrich for the user’s sites of interest. Therefore, the user can determine which
enzyme combination and size range would be the simplest and most appropriate for the given
application. This provides the user with the opportunity to consider experimental factors that
may complicate the protocol, such as buffer compatibility and whether consecutive digestions
would be required.
Flexible user-defined cuRRBS parameters
cuRRBS contains a number of user-defined parameters to ensure the greatest possible
flexibility and ease of use. A table of these parameters is provided to highlight the versatility
that the user has and why such versatility is useful (Table 4.1).
cuRRBS heuristics and computational efficiency
cuRRBS employs several strategies to reduce the computational time needed in each run:
• Restriction enzymes are grouped in isoschizomer families. Since isoschizomers generate
the same genomic digestions, only one member of each family needs to be
processed.
• In silico digestions are read from pre-computed files. Digesting the genomes would be
a limiting factor in the cuRRBS pipeline. The user can download the pre-computed
files [Martin-Herranz et al., 2017a] and the information that they contain is read every
time that an enzyme needs to be assessed.
• The number of size ranges that are sampled is minimised. Since the experimental size
selection step is generally imperfect, size ranges are sampled with a sliding window
whose ‘resolution’ is equivalent to the experimental error specified by the user.
• Parallelization. cuRRBS can use several cores to decrease the CPU time.
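The third strategy in the list above (sampling size ranges on a grid whose resolution equals the experimental error) can be sketched as follows. The generator and its bounds are hypothetical and only illustrate how a coarser grid shrinks the search space; the actual limits used by cuRRBS are not restated here.

```python
def size_ranges(min_size, max_size, delta, max_breadth):
    """Yield (lower, upper) size ranges on a grid with step `delta` (all in bp)."""
    for lower in range(min_size, max_size, delta):
        for breadth in range(delta, max_breadth + 1, delta):
            upper = lower + breadth
            if upper > max_size:
                break  # wider ranges would exceed the largest fragment size considered
            yield lower, upper


coarse = list(size_ranges(20, 100, delta=20, max_breadth=60))
fine = list(size_ranges(20, 100, delta=10, max_breadth=60))
```

In this toy setting, halving the step from 20 bp to 10 bp increases the number of candidate size ranges from 9 to 33, which shows why tying the resolution to the experimental error keeps the search tractable.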
Moreover, we have observed that, in many enzyme combinations, one of the enzymes is
providing most of the enrichment for the sites of interest, while the second one complements
the targeting. Therefore, it would be possible to implement a ‘heuristic’ mode, where only
those enzymes that perform well individually are used as ‘seeds’ to construct combinations
(as opposed to the current implementation, where all the enzyme combinations are checked
exhaustively). This could further reduce the computational time, especially if combinations
of more than two enzymes were being evaluated.
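A rough sketch of how such a ‘heuristic’ mode might reduce the search space is given below. It is not part of cuRRBS: the ranking function, the number of seeds and the family names are all assumptions made for illustration, and only the figure of 128 isoschizomer families comes from the examples in this chapter.

```python
from itertools import combinations


def exhaustive(families):
    """All single enzymes plus all unordered pairs (current behaviour, as described)."""
    return [(f,) for f in families] + list(combinations(families, 2))


def seeded(families, individual_score, n_seeds):
    """Single enzymes plus only those pairs that contain at least one 'seed',
    i.e. one of the n_seeds families that perform best individually."""
    seeds = sorted(families, key=individual_score, reverse=True)[:n_seeds]
    protocols = {(f,) for f in families}
    for seed in seeds:
        for other in families:
            if other != seed:
                protocols.add(tuple(sorted((seed, other))))
    return sorted(protocols)


families = [f"fam{i:03d}" for i in range(128)]  # placeholder family names


def toy_score(fam):
    """Made-up individual performance, used only to rank the placeholder names."""
    return int(fam[3:])


n_exhaustive = len(exhaustive(families))
n_seeded = len(seeded(families, toy_score, n_seeds=10))
```

With 128 families, the exhaustive search evaluates 8256 single and double digestions, whereas seeding with the ten best individual families would evaluate 1353, at the price of possibly missing combinations in which neither enzyme performs well on its own.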
The CPU time required by cuRRBS depends on several parameters, including the number
of enzymes checked, the experimental error, the number of sites of interest and the genome
size (Fig. S3.7). The RAM used will be approximately equal to the size of the pre-computed
files that are read by the software. A standard cuRRBS run (e.g. for a few thousand sites
of interest in the human genome, checking 128 CpG methylation-insensitive isoschizomer
families) takes around 0.5-1 hours and uses around 4 GB RAM, which allows the user to
easily run it on a dual-core laptop or desktop computer.
Obtaining the sites of interest for different biological systems
We have tested in silico the ability of cuRRBS to enrich for the sites of interest in a selection
of different biological systems where DNA methylation has an important functional role. In
some of these systems, described below, previous analysis was performed in order to obtain
the genomic coordinates for the sites:
• Exon-intron boundaries in human. Exons and introns were obtained from protein-
coding genes using GENCODE annotation data. Those CpG sites that were found
within ± 5 bp of a canonical splice site (5’-GT, 3’-AG) were selected.
• Epigenetic clock in human. These sites were obtained from the Horvath epigenetic
clock [Horvath, 2013a] and were lifted over to hg38 [Kuhn et al., 2012] before running
cuRRBS.
• Canonical and placental imprints in human. These loci were obtained from Hanna et al.
[2016]. The sites were lifted over to hg38 [Kuhn et al., 2012] and the CpG sites were
then extracted for the analysis.
• CTCF binding sites in human. We obtained the CpG sites that overlap with in vivo
CTCF binding sites. Peaks from sites that seem to be affected by methylation (upregulated,
reactivated) were kindly provided by Dr. M. T. Maurano [Maurano et al., 2015].
We scanned the peaks for high-scoring motifs according to the CTCF JASPAR model
[Mathelier et al., 2015]. Finally, we extracted those CpGs that were found in positions
5 and 15 of the motif, whose methylation status is thought to influence the binding of
the transcription factor [Maurano et al., 2015].
• Induced pluripotent stem cells (iPSCs) demethylated and maintained sites in mouse.
These were obtained by comparing mouse embryonic fibroblasts (MEFs) to iPSCs as
described previously [Milagre et al., 2017], with an additional filter for magnitude of
methylation change (>50% methylation change).
• NRF1 binding sites in mouse. We obtained the CpG sites that overlap with in vivo
NRF1 binding sites in mouse. ChIP-seq data was processed as described in the original
publication [Domcke et al., 2015], where peaks were called using Peakzilla [Bardet
et al., 2013]. We took as our final set of peaks the overlap between the two TKO
replicates. Next, we scanned the peaks for high-scoring motifs according to the NRF1
JASPAR model [Mathelier et al., 2015]. Finally, we extracted those CpGs that were
found in positions 2 and 8 of the motif, whose methylation status is thought to
influence the binding of the transcription factor [Mathelier et al., 2015].
• CHG sites in Arabidopsis thaliana. Non-CpG DMRs arising from the epigenomic
diversity between Arabidopsis thaliana accessions were obtained from Kawakatsu
et al. [2016]. The coordinates for C sites in non-CpG context were extracted.
In all cases the sites were equally weighted (wi = 1), with the exception of the human
epigenetic clock system, where the sites were assigned the absolute value of the weights in
the linear model [Horvath, 2013a]. All the site annotation files can be found in my GitHub
repository [Martin-Herranz et al., 2017a].
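The weighting scheme described above amounts to the following small sketch. The function name, the site identifiers and the coefficient values are made up for illustration and are not taken from the Horvath model or from the cuRRBS annotation files.

```python
def site_weights(site_ids, model_coefficients=None):
    """Return {site_id: weight}.

    Without model coefficients every site gets weight 1; otherwise each site
    gets the absolute value of its coefficient in the linear model.
    """
    if model_coefficients is None:
        return {site: 1.0 for site in site_ids}
    return {site: abs(model_coefficients[site]) for site in site_ids}


# Equal weighting, as used for most of the systems.
equal = site_weights(["site_1", "site_2", "site_3"])

# Clock-style weighting with made-up coefficients.
toy_coefficients = {"site_1": 0.8, "site_2": -1.5, "site_3": 0.05}
weighted = site_weights(toy_coefficients.keys(), toy_coefficients)
```

Taking the absolute value means that a site with a large negative coefficient is prioritised as much as one with a large positive coefficient.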
Running cuRRBS for the different biological systems
cuRRBS was run in the different systems described above using the default parameters
(k = 0.2, d = 20, b = 980, t = 30), for a read length (r) of 75 bp and a Score threshold (c)
of 0.25. In the mouse and human examples we considered 128 isoschizomer families that
contained enzymes that were not sensitive to CpG methylation. In the case of Arabidopsis
thaliana we used 28 isoschizomer families that contained enzymes that were not sensitive to
5mC in any context (CG, CHG, CHH).
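To make the parameter settings above concrete, the sketch below mirrors the flags of Table 4.1 with Python's argparse. It is not the cuRRBS command-line interface itself: which arguments are required, the behaviour of -i as an on/off switch, the program name and the file names and sizes in the example call are all assumptions, and the real software may accept further arguments.

```python
import argparse


def build_parser():
    """Argument parser mirroring the flags listed in Table 4.1 (illustrative only)."""
    parser = argparse.ArgumentParser(prog="currbs_sketch")
    # Arguments without a default in Table 4.1 are treated as required here (assumption).
    parser.add_argument("-e", dest="enzymes", required=True, help="enzyme annotation file")
    parser.add_argument("-a", dest="sites", required=True, help="annotation for the sites of interest")
    parser.add_argument("-r", dest="read_length", type=int, required=True, help="read length (30-300)")
    parser.add_argument("-s", dest="adapters_size", type=int, required=True, help="adapters size")
    parser.add_argument("-c", dest="c_score", type=float, required=True, help="C_Score constant (0-1)")
    parser.add_argument("-g", dest="genome_size", type=int, required=True, help="genome size")
    # Defaults taken from Table 4.1.
    parser.add_argument("-k", dest="c_nf", type=float, default=0.2, help="C_NF/1000 constant (0-1)")
    parser.add_argument("-d", dest="exp_error", type=int, default=20, help="experimental error (5-500)")
    parser.add_argument("-b", dest="breadth", type=int, default=980, help="size range breadth")
    parser.add_argument("-t", dest="output_size", type=int, default=30, help="number of protocols reported")
    # Modelled as an on/off switch here; the real option may take a value.
    parser.add_argument("-i", dest="site_ids", action="store_true", help="report site IDs")
    return parser


def validate(args):
    """Check the ranges given in Table 4.1; raise ValueError if one is violated."""
    if not 30 <= args.read_length <= 300:
        raise ValueError("read length must be between 30 and 300")
    if not 0 <= args.c_score <= 1:
        raise ValueError("C_Score constant must be between 0 and 1")
    if not 0 <= args.c_nf <= 1:
        raise ValueError("C_NF/1000 constant must be between 0 and 1")
    if not 5 <= args.exp_error <= 500:
        raise ValueError("experimental error must be between 5 and 500")
    return args


# Hypothetical file names, adapter size and genome size, used only to exercise the parser.
example = validate(build_parser().parse_args([
    "-e", "enzymes.csv", "-a", "sites.csv",
    "-r", "75", "-s", "120", "-c", "0.25", "-g", "3100000000",
]))
```

With only the six arguments that have no default supplied, the remaining values fall back to k = 0.2, d = 20, b = 980 and t = 30, matching the settings used for the in silico systems.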
Mapping of RRBS samples
XmaI-RRBS data generated on the Ion Torrent platform [Tanas et al., 2017] and MspI&Taqα I-RRBS
data generated on the Illumina HiSeq platform [Lim et al., 2016] were quality
trimmed using Trim Galore (www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and
had base pairs removed from the 3’ end to avoid including filled-in nucleotides with artificial
methylation states (the filled-in XmaI, MspI and Taqα I cut sites include the nucleotide
sequences CCGG, CG and CG respectively). The data was then mapped to the human genome
(for XmaI data, parameters: --non_directional) or the mouse genome (for MspI&Taqα I data,
parameters: --directional) using Bismark (0.18.0) [Krueger and Andrews, 2011]. In each of
the two cases data from different experiments or replicates was merged into the same FASTQ
file prior to quality trimming.
Estimating cuRRBS’ sensitivity and specificity
We assessed the performance of cuRRBS predictions in two independent experimental
datasets [Lim et al., 2016; Tanas et al., 2017] (see section 4.5). We ran cuRRBS fixing the
theoretical size ranges tested to the ones reported in the publications [Lim et al., 2016; Tanas
et al., 2017] and we used as our sites of interest the CpGs that overlapped with CpG islands
(CGI-CpGs) in the human [Tanas et al., 2017] and the mouse genomes [Lim et al., 2016]
respectively. From the cuRRBS output files we recovered the IDs of the sites that should
be theoretically sequenced. Moreover, using the experimental RRBS data [Lim et al., 2016;
Tanas et al., 2017], we could obtain the IDs of the sites that were actually sequenced (filtered
by a given depth of coverage threshold). Afterwards, we calculated the following variables
for each one of the datasets:
• True positives (TP): number of CGI-CpGs that cuRRBS predicted to be sequenced and
were indeed found in the RRBS data.
• True negatives (TN): number of CGI-CpGs that cuRRBS predicted to be absent and
were not found in the RRBS data.
• False positives (FP): number of CGI-CpGs that cuRRBS predicted to be sequenced but
were not found in the RRBS data.
• False negatives (FN): number of CGI-CpGs that cuRRBS predicted to be absent but
were found in the RRBS data.
Finally, we estimated the sensitivity and specificity, for a given dataset, as follows:
$$\text{Sensitivity} = \frac{TP}{TP + FN} \cdot 100 \qquad (4.6)$$
$$\text{Specificity} = \frac{TN}{FP + TN} \cdot 100 \qquad (4.7)$$
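The comparison between predicted and observed sites can be sketched with set operations. The function names and the toy site IDs and read depths below are hypothetical; the definitions of TP, TN, FP and FN and the two formulas follow the text above.

```python
def observed_sites(coverage, min_depth):
    """IDs of sites whose depth of coverage reaches the chosen threshold."""
    return {site for site, depth in coverage.items() if depth >= min_depth}


def evaluate(all_sites, predicted, observed):
    """Confusion counts plus sensitivity and specificity (in %) for one dataset."""
    universe = set(all_sites)
    predicted = set(predicted) & universe
    observed = set(observed) & universe
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = len(universe) - tp - fp - fn
    sensitivity = 100 * tp / (tp + fn) if tp + fn else float("nan")
    specificity = 100 * tn / (fp + tn) if fp + tn else float("nan")
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "sensitivity": sensitivity, "specificity": specificity}


# Toy data: ten sites of interest, three of them predicted to be sequenced.
ALL_SITES = [f"s{i}" for i in range(10)]
PREDICTED = {"s0", "s1", "s2"}
COVERAGE = {"s0": 12, "s1": 8, "s3": 1, "s4": 2}  # made-up read depths

no_filter = evaluate(ALL_SITES, PREDICTED, observed_sites(COVERAGE, 1))
filtered = evaluate(ALL_SITES, PREDICTED, observed_sites(COVERAGE, 5))
```

In this toy example, raising the depth threshold from 1 to 5 removes the two low-coverage sites that were not predicted, so the sensitivity rises from 50% to 100% while the specificity changes only slightly.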
Software availability
cuRRBS and its documentation are freely distributed under GNU General Public License
v3.0 and can be accessed in my GitHub repository [Martin-Herranz et al., 2017a].

Chapter 5
Final remarks
‘Traveller, your footprints are
the path, and nothing more;
traveller, there is no path:
the path is made by walking.’
Antonio Machado [1912]
The purpose of this thesis was to advance our understanding of the epigenetic ageing
clock in humans. I now review the main conclusions from this work and propose future
directions that could be of interest.
5.1 Statistical aspects
In Chapter 2, I have assessed different statistical methods that allowed me to characterise
the epigenetic landscape during human physiological ageing. To date, DNA methylation
data from blood, generated in the Illumina 450K methylation array platform, is the most
abundant epigenetic data type available to study human ageing. I built a dataset of this
data type for healthy individuals, pre-processed it and benchmarked different methods to
correct for blood cell composition changes. I reproduce previous findings showing that a
great proportion of the epigenome is affected by the ageing process (in my case around
30%, using a conservative threshold to correct for multiple testing). This highlights that
the epigenetic ageing clock is a genome-wide phenomenon that extends well beyond the
cytosines included in most epigenetic clock models. Furthermore, the small effect sizes
suggest that most age-related DNA methylation changes occur only in a small proportion of
cells (DNA molecules) in the tissue (around 4% on average for the entire human lifespan).
Finally, I tested the behaviour of different epigenetic clocks (Horvath, Hannum, epiTOC)
and developed a strategy to correct for potential batch effects in this context.
Current epigenetic clocks use a linear modelling framework. Nevertheless, many changes
in methylation values during ageing are non-linear (for example during organismal growth).
Horvath’s clock corrects for this by transforming chronological age, but it would be interesting
to try to model the changes of individual CpG sites before including them as part
of the training. This could also help to identify modules of CpG sites that behave in the
same way during ageing and allow deconvoluting the different processes that shape the
epigenetic landscape and may be operative at different life stages. Additional improvements
in epigenetic clocks will likely include integrating longitudinal information (which could
help to identify different ageing trajectories) [Jensen et al., 2014] and separating the con-
tributions of mutations and epimutations to the methylation signal. Furthermore, it would
be interesting to try to map the shapes of the DNA methylation changes to the changes
in mortality rate at a human population level, therefore creating a link between molecular
changes and epidemiological observations. This could be further validated in species with
extremely different profiles of mortality rate (e.g. naked mole rat).
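As an illustration of the age transformation mentioned above, the following is a minimal sketch, assuming the log-linear form described in Horvath [2013a] (logarithmic up to an 'adult age' of 20 years, linear afterwards). The function and constant names are illustrative and do not come from the original implementation.

```python
import math

ADULT_AGE = 20.0  # years; assumed value, as reported for Horvath's 2013 clock


def transform_age(age, adult_age=ADULT_AGE):
    """Map chronological age to the scale on which the clock is fitted.

    Logarithmic below adult_age (fast changes during growth),
    linear above it (slower, roughly constant rate of change).
    """
    if age <= adult_age:
        return math.log(age + 1.0) - math.log(adult_age + 1.0)
    return (age - adult_age) / (adult_age + 1.0)


def inverse_transform_age(value, adult_age=ADULT_AGE):
    """Map a value on the transformed scale back to age in years."""
    if value < 0.0:
        return (1.0 + adult_age) * math.exp(value) - 1.0
    return (1.0 + adult_age) * value + adult_age
```

The two branches meet at the adult age, where the transformed value is zero, so the mapping is continuous and invertible. Modelling each CpG site separately, as proposed above, would replace this single global transformation with site-specific curves.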
Current multi-tissue epigenetic clocks have been trained on all tissues available. Nev-
ertheless, it is reasonable to assume that the strategies to maintain stable DNA methylation
landscapes over time would differ significantly between highly proliferative tissues (such as
blood) and those where cell division is a rare event (such as brain). Thus, building different
epigenetic clocks for these two categories of tissues and analysing their genome-wide changes
in DNA methylation over time could improve the accuracy of the models and provide further
insights into the role of cell division on the epigenetic ageing clock.
There are methods that allow imputing DNA methylation patterns based on different
genomic features for a ‘static’ epigenome, both at the bulk [Zhang et al., 2015b] and single-
cell levels [Angermueller et al., 2017; Kapourani and Sanguinetti, 2019]. Given that the
regions that change their DNA methylation during ageing seem to share a common genomic context,
it would be interesting to design an imputation algorithm for a ‘dynamic’ epigenome
(e.g. given an epigenome at time t, predict what that epigenome would look like at time
t +∆t). Furthermore, it would be fascinating to attempt training machine learning models
(e.g. deep neural networks) to predict whether a given CpG site or region will change its
methylation status with age (and the direction and magnitude of the change) from the DNA
sequence. This could give us additional insights into how much the ageing-related changes
are hard-coded in the genome and how much the environment and lifestyle contribute to
modifying them. Moreover, some of these predictions could be tested by introducing exogenous
pieces of DNA in ageing mice.
Developmental disorders are useful biological systems to study the effects of altered
functions in specific parts of the epigenetic machinery. As such, the analysis presented in
Chapter 3 could be further expanded into a statistical framework that allows quantifying how
much certain epigenetic functions contribute to the methylation status of specific regions. In
other words, one could define epimutational signatures (e.g. epimutational signature 1 is
the consequence of reduced H3K4 methylation in enhancers) that would allow us to deconvolute
the epigenetic processes behind a specific DNA methylation pattern (e.g. the one caused by
ageing or smoking exposure).
5.2 Biological aspects
The goal of Chapter 3 was to study how different parts of the epigenetic machinery affect
the rate of the epigenetic ageing clock, thus providing the first identified components of
the hypothetical epigenetic maintenance system [Horvath, 2013a]. For that purpose, I studied
the epigenetic age acceleration observed in patients with developmental disorders, many of
which harbour mutations in proteins of the aforementioned epigenetic machinery.
This analysis revealed that mutations in NSD1, an H3K36 methyltransferase, dramatically
accelerate epigenetic ageing. The effect sizes observed (on average > 7 years)
are bigger than those of many of the conditions reported to accelerate the epigenetic ageing
clock [Horvath and Raj, 2018]. Importantly, the genomic context where these changes happen
is partially shared with the ageing process. Regions marked by H3K27me3, deposited by
Polycomb Repressive Complex 2 (PRC2), were highly enriched for these changes both in
ageing and in Sotos syndrome (the disorder caused by NSD1 mutations), consistent with
previous reports. Interestingly, global DNA hypomethylation
(a characteristic of Sotos patients) causes a redistribution of PRC2 and H3K27me3 from
their normal targets (many of them developmental genes marked with bivalent chromatin)
to other genomic regions, which leads to the aberrant expression of some of these genes
[Reddington et al., 2013]. Importantly, there is a mechanistic link between PRC2 recruitment
and H3K36me3 via the Tudor domains of some polycomb-like proteins [Cai et al., 2013; Li
et al., 2017]. As such, it would be expected that perturbations in the H3K36 methylation
landscape would affect PRC2 activity. Furthermore, methylation of CpG sites in normally
unmethylated CpG islands could also lead to a loss of PRC2 binding [Li et al., 2017]. This
could be happening in bivalent regions / DNA methylation valleys (DMVs) during ageing
and affect the differentiation process of progenitor stem cells in adult tissues. Indeed, this
seems to be the case for aged haematopoietic stem cells [Beerman et al., 2013; Sun et al.,
2014a], but whether this applies to other tissues still needs to be elucidated. Importantly,
DNA methylation changes affecting progenitor stem cells could be propagated in the tissue,
therefore contributing substantially to the signal captured by epigenetic clocks.
Hence, during ageing, there could be a redistribution of PRC2 from bivalent regions
/ DMVs to other regions that have become hypomethylated, while the de novo
DNA methyltransferases DNMT3A/B are relocated in the opposite direction (as shown in Fig. 3.8), leading to
a deregulation in the expression of developmental genes. This model expands on, and is overall
compatible with, the one proposed by Zheng, Widschwendter and Teschendorff to explain
the increase in cancer risk with age [Zheng et al., 2016]. While this could be induced by
the rewiring of the H3K36 methylation landscape, direct evidence needs to be provided to
ascertain that this is indeed the case during human physiological ageing. As such, it would be
interesting to profile H3K36me3 during ageing in different tissues. Furthermore, differential
expression of genes coding for the H3K36 methylation machinery (both methyltransferases
and demethylases) during ageing would also be expected (e.g. through hypermethylation of the
promoter of NSD1, as observed in human neuroblastoma and glioma cells) [Berdasco et al.,
2009]. Moreover, a study testing whether cryptic transcription increases during human ageing
(something that seems to happen in model organisms) could contribute to our understanding
of the global functional consequences of these epigenetic changes. Finally, genes with lower
levels of H3K36me3 should be more prone to cryptic transcription during ageing [Pu et al.,
2015] and potentially display higher transcriptional heterogeneity between cells.
There is conflicting evidence in the literature on whether NSD1 can also catalyse the
methylation of H4K20 in vivo [Berdasco et al., 2009; Kudithipudi et al., 2014]. H4K20me1
is a histone mark highly enriched in telomeres [Enguix et al., 2018] and depletion of H4K20
methylation leads to genomic instability [Sørensen et al., 2013]. This creates another
interesting link between telomere biology and the epigenetic ageing clock (as discussed
in Chapter 1, TERT genetic variants are associated with epigenetic age acceleration and TERT
expression is required in vitro for epigenetic ageing to proceed) [Lu et al., 2018]. It would be worth
testing how the epigenetic ageing clock behaves in cancer-resistant mice that constitutively
express TERT (which have an extended lifespan) [Tomás-Loba et al., 2008].
Ageing-related DNA methylation changes generally increase the informational entropy
of the system (i.e. the methylation values tend to 0.5, see section 3.5). It is tempting to
speculate that, from a biological point of view, this can be interpreted as a dilution of the
epigenetic marks that define stable cell types and transcriptional programs and an increase
in cell-to-cell epigenetic heterogeneity (in other words, as an erosion of the Waddingtonian
epigenetic landscape). Some authors have suggested that epigenetic information is carried by
a population of cells as a whole [Jenkinson et al., 2017; Shipony et al., 2014]. Furthermore,
even populations of a specific cell type (such as primed ESCs) show oscillations in the
methylation values of specific regions, which seem to have a particularly high amplitude in
enhancers [Rulands et al., 2018] (one of the hotspots of hypomethylation changes during
ageing). If such a population were to be analysed with a bulk DNA methylation method, it
would likely display a high methylation entropy in enhancers. Furthermore, the fact that
methylation entropy is higher in the sites of the Horvath clock could indicate that cytosines
that display this type of metastable state make good predictors. Thus, it is possible that
alterations in the DNA methylation oscillatory behaviour, caused by changes in the activities
or the binding of DNMT3s and TETs (which could happen if the H3K36 methylation
landscape is altered), are a feature of the epigenetic ageing clock. Moreover, cytosines that
change their methylation status following circadian rhythms have been identified both in
mice [Oh et al., 2018] and humans [Oh et al., 2019]. Importantly, these cytosines seem to
significantly overlap with ageing DMPs, with the amplitude of the circadian oscillations
correlating with the magnitude of epigenetic ageing effects. Intriguingly, oscillatory cytosines
were highly enriched in neutrophil-specific enhancers [Oh et al., 2019]. Therefore, further
studies should explore how the disruption of circadian rhythms during
ageing relates to changes in the methylome and to the epigenetic ageing clock.
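The statement at the start of this paragraph, that methylation values drifting towards 0.5 increase the informational entropy, can be made concrete with a short calculation. This is a minimal sketch, assuming a per-site binary Shannon entropy averaged over sites; I have not checked that it matches exactly the normalisation used in section 3.5, and the β-values below are invented for illustration only.

```python
import math


def methylation_entropy(betas, eps=1e-12):
    """Mean binary Shannon entropy (bits per site) of beta-values in [0, 1].

    A site at beta = 0 or 1 contributes 0 bits; a site at beta = 0.5
    contributes the maximum of 1 bit.
    """
    total = 0.0
    for beta in betas:
        b = min(max(beta, eps), 1.0 - eps)  # clip to avoid log2(0)
        total += -(b * math.log2(b) + (1.0 - b) * math.log2(1.0 - b))
    return total / len(betas)


# Invented example: the same three sites before and after drifting towards 0.5
young_profile = [0.05, 0.95, 0.10]
old_profile = [0.20, 0.80, 0.30]
```

In this toy example the 'old' profile has the higher entropy, since every site has moved closer to 0.5. In bulk data, a β-value near 0.5 is compatible both with intermediate methylation in every cell and with a mixture of fully methylated and unmethylated cells, which is why the entropy increase can be read as a rise in cell-to-cell heterogeneity.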
Mechanistic advances will require testing these ideas in the mouse. First, it would be
interesting to confirm whether the effects of heterozygous loss-of-function mutations in
NSD1 are evolutionarily conserved, using the mouse multi-tissue epigenetic clock, and test
if they affect the lifespan of these mice. Moreover, one of the remaining questions is whether
the DNA methylation changes associated with the epigenetic ageing clock are functional at
all. Epigenomic editing technologies [Liu et al., 2016] could help to answer this question.
Additionally, testing how conserved these mechanisms are beyond mammals (e.g. in the
African turquoise killifish) or whether they behave differently in species with remarkable
longevity (such as the naked mole rat) would be of interest.
5.3 Technological aspects
In Chapter 4, we created a computational method (cuRRBS) to optimise the enrichment
of specific sets of genomic sites through the combinatorial use of restriction enzymes.
This could potentially be applied to make future epigenetic clocks more cost-effective
(especially if they are composed of several hundreds or thousands of sites). Furthermore,
given how statistically degenerate epigenetic clocks are, new models could be trained taking
into account the most cost-effective combinations of sites. Reductions in assay cost could
lead to the wide adoption of DNA methylation-based biomarkers for high-throughput drug
screening.
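To illustrate the idea of choosing restriction enzymes combinatorially, the following toy sketch digests a sequence in silico, applies a fragment size selection and counts how many target sites fall inside the retained fragments. It is not the cuRRBS algorithm: the scoring function, the two-enzyme panel, the size window and the sequence are all simplifying assumptions made for this example. Only the recognition sites of MspI (C^CGG) and TaqI (T^CGA) are taken as known facts.

```python
from itertools import combinations

# Recognition sequence and cut offset (bases from the start of the site)
ENZYMES = {"MspI": ("CCGG", 1), "TaqI": ("TCGA", 1)}


def cut_positions(sequence, enzyme_names):
    """Sorted cut coordinates after digesting `sequence` with the given enzymes."""
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        start = sequence.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = sequence.find(site, start + 1)
    return sorted(cuts)


def covered_targets(sequence, enzyme_names, targets, min_size, max_size):
    """Target positions lying inside fragments that survive size selection."""
    bounds = [0] + cut_positions(sequence, enzyme_names) + [len(sequence)]
    kept = [(a, b) for a, b in zip(bounds, bounds[1:]) if min_size <= b - a <= max_size]
    return {t for t in targets if any(a <= t < b for a, b in kept)}


def best_combination(sequence, targets, min_size, max_size, max_enzymes=2):
    """Enzyme combination covering the most targets (toy score, ties: first found)."""
    candidates = [
        combo
        for k in range(1, max_enzymes + 1)
        for combo in combinations(sorted(ENZYMES), k)
    ]
    return max(
        candidates,
        key=lambda combo: len(covered_targets(sequence, combo, targets, min_size, max_size)),
    )


# Invented toy sequence with two MspI sites and one TaqI site
TOY_SEQUENCE = "AAAA" + "CCGG" + "ATCGTTTT" + "TCGA" + "GGGGGGGGGG" + "CCGG" + "AA"
TOY_TARGETS = [10, 17]  # 0-based positions of two CpG cytosines of interest
```

With a size window of 10 to 15 bases, neither enzyme alone yields a retained fragment in the toy sequence, whereas the double digest retains both fragments that contain the target sites. A realistic score would also need to penalise the sequencing spent on non-target fragments, which is where the cost-effectiveness argument above comes in.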
From a research point of view, it is fundamental that we expand our analysis beyond the
biased set of regions covered by the Illumina methylation array. Therefore, whole-genome bisulfite
sequencing during ageing should become more common, allowing us to characterise the
changes in the epigenetic landscape at higher resolution. This will likely become a reality
thanks to the fast drop in sequencing costs and to the development of bisulfite-free methods
that improve mapping rates [Liu et al., 2019].
Furthermore, it remains to be seen whether the DNA methylation changes observed
during ageing occur in all cell types in the tissue or whether changes in the concentration
of specific cell types (e.g. progenitor stem cells) or clones are responsible for them. In this
sense, single-cell technologies (especially those that profile transcriptome and epigenome
simultaneously) and lineage tracing will become instrumental for future mechanistic advances
on the epigenetic ageing clock [Kelsey et al., 2017].
Appendix
Supplementary figures
S.1 Supplementary for chapter 2
[Fig. S1.1 plot: four density panels (a-d). x-axes: a. Raw Mi; b. Raw Ui; c. Background-corrected Mi; d. Background-corrected Ui (0 to 30000). y-axis: Density. Legend: Failed QC, Passed QC.]
Fig. S1.1 Effects of noob background correction on the array fluorescence intensities. Distributions of the array
fluorescence intensities for the a. methylated signals (Mi) before background correction; b. unmethylated
signals (Ui) before background correction; c. methylated signals (Mi) after background correction and d.
unmethylated signals (Ui) after background correction. Each curve represents a DNA methylation sample from
the GSE41273 batch. In grey: 51 samples that passed quality control (QC). In red: 2 samples that failed QC.
[Fig. S1.2 plot: scatterplot. x-axis: median{log2 Mi} (9 to 13). y-axis: median{log2 Ui} (9 to 13). Legend: Failed QC, Passed QC.]
Fig. S1.2 Quality control (QC) strategy to identify outlier samples, according to their global intensity values,
in the GSE41273 batch. Those samples with low median intensity values (see criteria in section 2.1.2) were
discarded from downstream analyses (2/53, in red). Each sample is represented by one point. The dashed line
represents the intensity threshold. Mi and Ui represent the background-corrected methylated and unmethylated
intensity measurements for the different 450K array probes in a given sample.
[Fig. S1.3 plot: density curves. x-axis: M-value (−5 to 5). y-axis: Density (0.00 to 0.15). Legend: Passed QC.]
Fig. S1.3 M-value distributions in the samples of the GSE41273 batch, after all the pre-processing steps have
been carried out (background correction, quality control, probe filtering and BMIQ normalisation). M-values
were calculated by applying the logit transformation to the β-values, as described in Du et al. [2010]. Each
curve represents a different sample.
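For reference, the conversion between β-values and M-values mentioned in this caption can be sketched as follows, assuming the base-2 logit relationship M = log2(β / (1 − β)) described by Du et al. [2010]. The clipping constant `eps` is my own addition to keep the logarithm finite at β = 0 or 1.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta-value in [0, 1] to an M-value (base-2 logit)."""
    b = min(max(beta, eps), 1.0 - eps)  # clip to avoid division by zero / log2(0)
    return math.log2(b / (1.0 - b))


def m_to_beta(m_value):
    """Convert an M-value back to a beta-value."""
    return 2.0 ** m_value / (2.0 ** m_value + 1.0)
```

A β-value of 0.5 maps to M = 0, and the M-value scale is symmetric around that point, which is consistent with the range of roughly −5 to 5 on the x-axis of Fig. S1.3.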
Strategy name | Reference | Gold-standard preprocessing | Reference preprocessing | Probes in reference | Algorithm | Mean RMSE | Mean MAE | Mean R^2
--- | --- | --- | --- | --- | --- | --- | --- | ---
minfi | minfi | SQN* | SQN* | 600 | Houseman CP/QP | 2.3246 | 2.0137 | 0.9473
dhs_dif1_houseman | DHS-DMCs | Noob+BMIQ | Default | 333 | Houseman CP/QP | 4.8039 | 3.843 | 0.7783
dhs_NB_houseman | DHS-DMCs | Noob+BMIQ | Noob+BMIQ | 333 | Houseman CP/QP | 4.9398 | 4.1559 | 0.8062
dhs_dif2_houseman | DHS-DMCs | Noob+Filtering+BMIQ | Default | 316 | Houseman CP/QP | 6.1731 | 5.2469 | 0.7779
dhs_NFB_houseman | DHS-DMCs | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 316 | Houseman CP/QP | 6.1194 | 5.3185 | 0.7816
dhs_dif1_cibersort | DHS-DMCs | Noob+BMIQ | Default | 333 | CIBERSORT | 2.3914 | 1.9502 | 0.8702
dhs_NB_cibersort | DHS-DMCs | Noob+BMIQ | Noob+BMIQ | 333 | CIBERSORT | 2.8578 | 2.3833 | 0.8453
dhs_dif2_cibersort | DHS-DMCs | Noob+Filtering+BMIQ | Default | 316 | CIBERSORT | 2.9751 | 2.4714 | 0.8552
dhs_NFB_cibersort | DHS-DMCs | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 316 | CIBERSORT | 3.0684 | 2.5403 | 0.8571
dhs_dif1_rpc | DHS-DMCs | Noob+BMIQ | Default | 333 | RPC | 2.0421 | 1.7032 | 0.8873
dhs_NB_rpc | DHS-DMCs | Noob+BMIQ | Noob+BMIQ | 333 | RPC | 2.5289 | 2.1689 | 0.8705
dhs_dif2_rpc | DHS-DMCs | Noob+Filtering+BMIQ | Default | 316 | RPC | 2.9653 | 2.3887 | 0.8722
dhs_NFB_rpc | DHS-DMCs | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 316 | RPC | 3.0755 | 2.5266 | 0.8611
idol_NB_houseman | IDOL | Noob+BMIQ | Noob+BMIQ | 300 | Houseman CP/QP | 2.0347 | 1.6778 | 0.9632
idol_NFB_houseman | IDOL | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 281 | Houseman CP/QP | 1.927 | 1.5498 | 0.9672
idol_NB_cibersort | IDOL | Noob+BMIQ | Noob+BMIQ | 300 | CIBERSORT | 2.1997 | 1.7958 | 0.9626
idol_NFB_cibersort | IDOL | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 281 | CIBERSORT | 1.9818 | 1.6216 | 0.9704
idol_NB_rpc | IDOL | Noob+BMIQ | Noob+BMIQ | 300 | RPC | 2.26 | 1.8812 | 0.9679
idol_NFB_rpc | IDOL | Noob+Filtering+BMIQ | Noob+Filtering+BMIQ | 281 | RPC | 2.0122 | 1.6288 | 0.9692
Fig. S1.4 Table showing the different cell-type deconvolution strategies that were benchmarked. BMIQ:
beta-mixture quantile normalisation. CP/QP: constrained projection/quadratic programming. MAE: mean
absolute error. Noob: noob background correction. R2: coefficient of determination. RMSE: root mean
squared error. RPC: robust partial correlations. SQN: stratified quantile normalisation. ‘Default’ refers to the
pre-processing strategy employed in the original DHS-DMCs publication, as implemented in the EpiDISH R
package (centDHSbloodDMC.m) [Teschendorff et al., 2017; Teschendorff and Zheng, 2017b]. See section 2.1.3
in the main text for more details on what the different references refer to.
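The three metrics reported in this table can be computed for one cell type as in the sketch below. It assumes that R2 is the coefficient of determination in the 1 − SS_res/SS_tot sense (rather than a squared correlation), which is an assumption on my part. The function names are illustrative, and the table values would additionally require averaging across cell types.

```python
import math


def rmse(true_values, predicted_values):
    """Root mean squared error between measured and predicted proportions."""
    n = len(true_values)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_values, predicted_values)) / n)


def mae(true_values, predicted_values):
    """Mean absolute error between measured and predicted proportions."""
    n = len(true_values)
    return sum(abs(t - p) for t, p in zip(true_values, predicted_values)) / n


def r_squared(true_values, predicted_values):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(true_values) / len(true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    return 1.0 - ss_res / ss_tot
```

RMSE penalises large individual errors more strongly than MAE, so a strategy can rank differently on the two metrics, as happens for some of the rows above.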
[Fig. S1.5 plot: dot plot. x-axis: Cell-type deconvolution strategy (minfi, dhs_dif1_houseman, dhs_NB_houseman, dhs_dif2_houseman, dhs_NFB_houseman, dhs_dif1_cibersort, dhs_NB_cibersort, dhs_dif2_cibersort, dhs_NFB_cibersort, dhs_dif1_rpc, dhs_NB_rpc, dhs_dif2_rpc, dhs_NFB_rpc, idol_NB_houseman, idol_NFB_houseman, idol_NB_cibersort, idol_NFB_cibersort, idol_NB_rpc, idol_NFB_rpc). y-axis: R2 (0.25 to 1.00). Legend (Cell): B, CD4T, CD8T, Gran, Mono, NK.]
Fig. S1.5 Benchmarking of the cell-type deconvolution strategies in blood. The x-axis shows the different
strategies that were tested (for a detailed description see Fig. S1.4). The y-axis shows the results for the
coefficient of determination (R2) when comparing the predictions with the real proportions of cells in a gold-
standard dataset (GSE77797) [Koestler et al., 2016]. The grey horizontal solid lines represent the mean for the
R2 across cell types and the grey dashed line the maximum of these values.
ProbeID  Chromosome  Coordinate  Intercept  Slope  T statistic  p-value  Methylation change  In Horvath model  Gene(s)
cg16867657 chr6 11044877 0.5458189 0.0053562 96.7079 0 Hypermethylated No ELOVL2
cg06639320 chr2 106015739 -0.18099 0.0040751 68.4826 0 Hypermethylated No FHL2
cg21572722 chr6 11044894 0.4485118 0.0029979 67.7891 0 Hypermethylated No ELOVL2
cg22454769 chr2 106015767 -0.37256 0.0054721 65.4459 0 Hypermethylated No FHL2
cg07547549 chr20 44658225 -0.109895 0.0039332 60.4444 0 Hypermethylated No SLC12A5
cg24724428 chr6 11044888 0.1715795 0.003787 60.3559 0 Hypermethylated No ELOVL2
cg17110586 chr19 36454623 -0.076933 0.0027991 59.6101 0 Hypermethylated No
cg19283806 chr18 66389420 1.1244081 -0.0052494 -55.5368 0 Hypomethylated No CCDC102B
cg10501210 chr1 207997020 -0.767615 -0.0071941 -54.848 0 Hypomethylated No
cg24079702 chr2 106015771 -0.239806 0.0037027 54.5055 0 Hypermethylated No FHL2
cg22796704 chr10 49673534 0.5923358 -0.0038938 -54.2818 0 Hypomethylated No ARHGAP22
cg04875128 chr15 31775895 -0.29584 0.0048949 53.8691 0 Hypermethylated No OTUD7A
cg23606718 chr2 131513927 -0.192302 0.0024361 53.8427 0 Hypermethylated No FAM123C
cg00059225 chr5 151304357 0.2564821 0.0023987 52.8361 0 Hypermethylated No GLRA1
cg23500537 chr5 140419819 0.2019473 0.0029768 52.4657 0 Hypermethylated No
cg07553761 chr3 160167977 -0.085898 0.0030009 52.1708 0 Hypermethylated No TRIM59
cg14674720 chr2 219827930 -0.15175 0.0022723 52.1475 0 Hypermethylated No
cg16419235 chr8 57360613 -0.110675 0.0021004 52.087 0 Hypermethylated No PENK
cg07082267 chr16 85429035 -0.234831 -0.0024153 -51.9394 0 Hypomethylated No
cg11970349 chr4 8582287 0.4395301 0.0024517 51.7603 0 Hypermethylated No GPR78
cg14556683 chr19 15342982 -0.354214 0.0030292 51.4444 0 Hypermethylated No EPHX3
cg06493994 chr6 25652602 -0.281467 0.0018639 51.2747 0 Hypermethylated Yes SCGN
cg19560758 chr1 8086721 0.123634 0.0017654 51.0739 0 Hypermethylated No ERRFI1
cg22736354 chr6 18122719 -0.328228 0.0023877 50.7215 0 Hypermethylated Yes NHLRC1
cg17885226 chr6 105388731 -0.011797 0.0030608 50.2096 0 Hypermethylated No
cg08262002 chr4 16575323 0.448234 -0.0036267 -50.1807 0 Hypomethylated No LDB2
cg18933331 chr1 110186418 0.1394501 -0.0026901 -49.3592 0 Hypomethylated No
cg00329615 chr3 118706648 0.3767479 -0.0049889 -49.1687 0 Hypomethylated No IGSF11
cg08097417 chr7 130419133 -0.212277 0.0018305 48.9874 0 Hypermethylated No KLF14
cg00748589 chr12 11653486 0.1822405 0.0024207 48.2695 0 Hypermethylated No
cg11084334 chr3 9594264 -0.022951 0.0027848 47.6682 0 Hypermethylated No LHFPL4
cg11071401 chr17 48637194 0.3081191 0.0023875 47.6374 0 Hypermethylated No CACNA1G
cg06784991 chr1 53308768 0.0728526 0.0021442 47.4979 0 Hypermethylated No ZYG11A
cg00439658 chr17 72848669 -0.187047 0.0019148 47.3396 0 Hypermethylated No GRIN2C
cg16054275 chr1 169556022 -0.308762 -0.0031404 -47.2773 0 Hypomethylated No F5
cg14692377 chr17 28562685 -0.319816 0.0019735 47.2725 0 Hypermethylated No SLC6A4
cg13649056 chr9 136474626 0.0939199 0.0018608 47.0121 0 Hypermethylated No
cg11693709 chr15 40542019 0.4398948 -0.0041179 -46.6849 0 Hypomethylated No PAK6
cg07080372 chr11 796607 -0.044385 -0.0020517 -46.5748 0 Hypomethylated No SLC25A22
cg19671120 chr2 98962974 0.2917162 0.0019275 46.5463 0 Hypermethylated No CNGA3
cg16219603 chr8 57360586 -0.243393 0.001599 46.4953 0 Hypermethylated No PENK
cg11705975 chr10 120354248 0.1345631 0.0025062 46.1335 0 Hypermethylated No PRLHR
cg15480367 chr14 93389485 0.1737257 0.0020641 46.1196 0 Hypermethylated No CHGA
cg24466241 chr1 53308908 -0.192473 0.0028258 45.9054 5.9288E-323 Hypermethylated No ZYG11A
cg02650266 chr4 147558239 -0.028284 0.0018604 45.5452 2.5444E-319 Hypermethylated No
cg03738025 chr6 105388694 0.1325219 0.0037303 45.5435 2.6480E-319 Hypermethylated No
cg08160331 chr11 75140865 0.1225186 0.0024513 45.5115 5.5982E-319 Hypermethylated No KLHL35
cg14361627 chr7 130419116 -0.029613 0.0024426 45.4145 5.4238E-318 Hypermethylated No KLF14
cg08128734 chr1 206685423 0.5891423 -0.0054386 -45.0487 2.8384E-314 Hypomethylated No RASSF5
cg26290632 chr8 91094847 0.2029635 0.0020152 45.0401 3.4695E-314 Hypermethylated No CALB1
cg01974375 chr1 151298954 0.0385361 -0.0019059 -45.0297 4.4226E-314 Hypomethylated No PI4KB
cg23479922 chr5 16179633 -0.5691 0.0045894 44.9595 2.2879E-313 Hypermethylated No MARCH11
cg09809672 chr1 236557682 0.175291 -0.0040059 -44.8504 2.9374E-312 Hypomethylated Yes EDARADD
cg00481951 chr3 187387650 0.1841224 0.0023342 44.6878 1.3200E-310 Hypermethylated No SST
cg03545227 chr2 220173100 0.0832971 0.0013552 44.5825 1.5491E-309 Hypermethylated No PTPRN
cg18618815 chr17 48275324 -0.292108 -0.0031805 -44.5025 1.0061E-308 Hypomethylated No COL1A1
cg11649376 chr12 81473234 0.1177648 -0.0025894 -44.4751 1.9099E-308 Hypomethylated No ACSS3
cg11436113 chr20 19191145 -0.245529 -0.0028774 -44.446 3.7798E-308 Hypomethylated No
cg20591472 chr1 110008990 0.2290873 0.0029438 44.3726 2.1018E-307 Hypermethylated No SYPL2
cg12757011 chr2 162281111 -0.036861 0.0022385 44.3402 4.4864E-307 Hypermethylated No TBR1
cg06570224 chr3 157812475 -0.255113 0.0021525 44.3003 1.1387E-306 Hypermethylated No
cg12878812 chr12 119419696 -0.152434 0.0017975 44.1946 1.3495E-305 Hypermethylated No SRRM4
cg07931844 chr15 72102213 -0.347225 -0.0020941 -44.1556 3.363E-305 Hypomethylated No NR2E3
cg15341124 chr14 102027734 0.1822515 0.0021014 43.8202 8.5279E-302 Hypermethylated No DIO3;  MIR1247
cg12534424 chr7 127992316 -0.038607 0.0019362 43.5602 3.7086E-299 Hypermethylated No PRRT4
cg25410668 chr1 28241577 0.5378571 0.0033963 43.5204 9.4093E-299 Hypermethylated No RPA2
cg19392831 chr10 120355756 0.1002692 0.0017162 43.3469 5.4065E-297 Hypermethylated No PRLHR
cg16008966 chr1 114761794 0.2872323 -0.0024427 -43.054 5.0499E-294 Hypomethylated No
cg05308819 chr1 155959156 -0.383566 -0.0018965 -43.0379 7.3568E-294 Hypomethylated No
cg08468401 chr3 14303131 -0.481126 -0.0045074 -43.0226 1.0497E-293 Hypomethylated No
cg19855470 chr22 40060836 -0.111118 0.0015512 42.913 1.3565E-292 Hypermethylated No CACNA1I
cg11220950 chr16 2042693 0.0102849 0.0019377 42.8543 5.3374E-292 Hypermethylated No SYNGR3
cg16717122 chr15 51973920 0.3252301 0.00151 42.8415 7.1833E-292 Hypermethylated No SCG3
cg22156456 chr17 39844239 -0.229764 -0.0018499 -42.8279 9.8668E-292 Hypomethylated No EIF1
cg06335143 chr1 53308654 -0.088651 0.0022272 42.8111 1.4619E-291 Hypermethylated No ZYG11A
cg23746497 chr6 105388668 0.072451 0.0034686 42.7311 9.4375E-291 Hypermethylated No
cg08234504 chr5 139013317 -0.235634 -0.0015863 -42.72 1.2233E-290 Hypomethylated No
cg24436906 chr2 242498081 0.4803492 0.0019615 42.6333 9.2401E-290 Hypermethylated No BOK
cg13848598 chr10 115804578 -0.111233 0.0024786 42.4955 2.2983E-288 Hypermethylated No ADRB1
cg10804656 chr10 22623460 -0.950746 0.0028943 42.4594 5.3272E-288 Hypermethylated No
cg13135455 chr2 241860318 0.0059196 -0.0022231 -42.4071 1.8043E-287 Hypomethylated No
cg23078123 chr1 68577796 0.759047 -0.0026555 -42.3732 3.9744E-287 Hypomethylated No GPR177
cg13327545 chr10 22623548 -0.358846 0.0022651 42.3019 2.0954E-286 Hypermethylated No
cg03431918 chr17 77716367 0.1575907 -0.0017119 -42.2827 3.2734E-286 Hypomethylated No
cg01820374 chr12 6882083 -0.47997 -0.0022168 -42.2819 3.3323E-286 Hypomethylated Yes LAG3
cg20747538 chr3 137838021 -0.227794 -0.0019417 -42.2727 4.1287E-286 Hypomethylated No
cg27320127 chr2 47798396 0.3532211 0.0019054 42.2074 1.8912E-285 Hypermethylated No KCNK12
cg20273670 chr17 21356245 -0.202763 0.0032538 42.1546 6.4709E-285 Hypermethylated No
cg19702785 chr20 43727089 -0.307403 0.0016088 42.1542 6.5405E-285 Hypermethylated No KCNS1
cg14583999 chr3 10019040 0.051048 -0.0038329 -42.1149 1.6328E-284 Hypomethylated No TMEM111
cg01844642 chr3 51989764 -0.160677 0.0021369 42.1066 1.9788E-284 Hypermethylated No GPR62
cg00602811 chr2 145278564 -0.192604 -0.0038479 -42.1046 2.0743E-284 Hypomethylated No ZEB2
cg01770755 chr15 41914122 -0.106172 0.0017079 42.0334 1.089E-283 Hypermethylated No
cg00484358 chr1 110610995 0.2396367 0.0016647 42.0065 2.0361E-283 Hypermethylated No ALX3
cg18064714 chr7 20824556 -0.082174 0.00167 41.9065 2.0891E-282 Hypermethylated No SP8
cg16512661 chr5 2743620 0.2799574 0.0020114 41.717 1.7193E-280 Hypermethylated No
cg11741201 chr11 35638398 -0.069447 -0.0023228 -41.523 1.5688E-278 Hypomethylated No FJX1
cg22016779 chr2 230452311 -0.370728 -0.0023361 -41.4895 3.4156E-278 Hypomethylated No DNER
cg18473521 chr12 54448265 0.1111276 0.0041993 41.3931 3.2188E-277 Hypermethylated No HOXC4
cg01528542 chr12 81468232 -0.352352 -0.0036075 -41.3691 5.6171E-277 Hypomethylated No
Fig. S1.6 Table showing the characteristics of the top 100 differentially methylated positions during ageing
(aDMPs) in the blood of the healthy individuals, ordered by p-value and the absolute value of the T statistic.
The chromosome and coordinate refer to the hg19 human genome assembly. The reported genes are the closest
genes associated with the array probe, as specified by the 450K array annotation. In this case, cell composition
correction (CCC) was applied during modelling (see section 2.1.4).
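The intercept, slope and T statistic columns of this table are the kind of quantities returned by a per-probe linear regression of methylation on age. The sketch below shows the simplest version of such a fit. It is a simplification and not the model of section 2.1.4: it omits the cell composition and any other covariates, and the input values in the usage example are invented.

```python
import math


def fit_probe(ages, methylation_values):
    """Ordinary least squares fit of methylation ~ intercept + slope * age.

    Returns (intercept, slope, t_statistic, direction) for a single probe.
    Requires at least three samples and a non-perfect fit.
    """
    n = len(ages)
    mean_age = sum(ages) / n
    mean_meth = sum(methylation_values) / n
    sxx = sum((x - mean_age) ** 2 for x in ages)
    sxy = sum((x - mean_age) * (y - mean_meth) for x, y in zip(ages, methylation_values))
    slope = sxy / sxx
    intercept = mean_meth - slope * mean_age
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(ages, methylation_values)
    )
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)  # standard error of the slope
    t_statistic = slope / slope_se
    direction = "Hypermethylated" if slope > 0 else "Hypomethylated"
    return intercept, slope, t_statistic, direction
```

For example, `fit_probe([10, 20, 30, 40, 50], [0.52, 0.55, 0.61, 0.66, 0.70])` gives a positive slope per year of age and is therefore labelled as hypermethylated. The p-value would then follow from the t distribution with n − 2 degrees of freedom, before correcting for multiple testing across all probes.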
[Fig. S1.7 plot: line plot. x-axis: Number of PCs (0 to 100). y-axis: MAE in control (2.5 to 4.0). Legend (Corrections): CCC: No | Batch: No; CCC: No | Batch: Yes; CCC: Yes | Batch: No; CCC: Yes | Batch: Yes. Annotations: Optimal number of PCs: 11; Optimal mean MAE: 2.7485; Background correction: None.]
Fig. S1.7 Plot showing how the median absolute error (MAE) of the prediction in the healthy individual samples,
that should tend to zero, is reduced when the PCs capturing the technical variation are included as part of
the modelling strategy (see equations 2.16 and 2.17). The dashed line represents the optimal number of PCs
(11) that was finally used. The optimal mean MAE is calculated as the average MAE between the green and
purple lines. In this case, no background correction was applied to the methylation data before calculating the
epigenetic ages according to Horvath’s epigenetic clock [Horvath, 2013a].
[Fig. S1.8 plot: two panels (a, b) showing EAA distributions per batch. x-axis: Batch (Europe, Feb_2016, GSE104812, GSE111629, GSE40279, GSE41273, GSE42861, GSE51032, GSE55491, GSE59065, GSE61496, GSE74432, GSE81961, GSE97362). y-axis: EAA without CCC (years) (−40 to 40). Panel a annotations: Batch effect correction: FALSE; MAE: 3.273. Panel b annotations: Batch effect correction: TRUE; MAE: 2.8211.]
Fig. S1.8 Correcting for batch effects in the context of the epigenetic clock. a. Distribution of the epigenetic age
acceleration (EAA) for the different batches of healthy individual samples, using the control model without cell
composition correction (CCC) and before applying batch effect correction. The dashed black line represents
EAA = 0, around which the distributions should be centred. b. As in a., but after applying batch effect correction
(i.e. equivalent to equation 2.17).
[Fig. S1.9 plot: scatterplot, panel title: Cases. x-axis: PC1 (68.99%). y-axis: PC2 (10.33%). Legend (Batch): Europe, Feb_2016, GSE116300, GSE41273, GSE55491, GSE74432, GSE97362, Jun_2015, Mar_2014, May_2015, May_2016, Nov_2015, Oct_2014.]
Fig. S1.9 Scatterplot showing the values of the first two principal components (PCs) for the samples with
developmental disorders (cases, see Chapter 3) after performing PCA on the control probes of the 450K arrays.
Each point corresponds to a different sample and the colours represent the different batches. The different
batches cluster together in the PCA space, showing that the control probes indeed capture technical variation.
Please note that all the PCA calculations were done using samples from both healthy individuals (full lifespan,
N = 2218) and cases from developmental disorders (N = 666).
[Fig. S1.10 plot: x-axis: Principal components (PCs) (0 to 25). y-axis: % variance (0 to 100). Legend: Cumulative % variance, % variance for PC.]
Fig. S1.10 Plot showing the percentages of technical variance explained by the different PCs from the control
probes. The dashed line represents the optimal number of PCs (17) that was finally used.
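The quantities plotted in this figure (per-PC and cumulative percentage of variance) can be illustrated with a minimal sketch. It runs on synthetic data rather than the control-probe intensities, the function name `variance_explained` is hypothetical, and it does not reproduce the criterion by which 17 PCs were selected.

```python
import numpy as np

def variance_explained(X):
    """Return (% variance per PC, cumulative % variance).

    X has samples in rows and features in columns.
    """
    Xc = X - X.mean(axis=0)                  # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, sorted descending
    var = s ** 2                             # proportional to the variance of each PC
    pct = 100.0 * var / var.sum()
    return pct, np.cumsum(pct)

# Synthetic stand-in for a samples x control-probes matrix (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 30))
pct, cum = variance_explained(X)
```

Plotting `pct` and `cum` against the PC index would give curves of the same kind as the two series in the figure.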
S.2 Supplementary for chapter 3
Batch name   N♀    N♂    N     Median age (years)   Other comments
Europe       0     119   119   7.73
Feb_2016     20    20    40    6
GSE116300    4     5     9     3
GSE41273     0     9     9     7.75
GSE74432     11    16    27    10
GSE97362     4     9     13    15                   Samples from the ‘validation cohort’ were not included in the analysis, since they all seemed outliers on close examination
Jun_2015     1     1     2     3.5015
Mar_2014     11    6     17    8
May_2015     17    49    66    14
Nov_2015     35    30    65    6.7
Total        103   264   367   8                    -
Table S2.1 Overview of the blood DNA methylation dataset from individuals with developmental disorders.
The batches ‘Europe’, ‘Feb_2016’, ‘Jun_2015’, ‘Mar_2014’, ‘May_2015’ and ‘Nov_2015’ were generated
in-house by our collaborators in Canada (see Chapter 3). The rest of the batches were downloaded from GEO
[Edgar et al., 2002]. N♀: number of samples from females. N♂: number of samples from males. N: total
number of samples. These numbers correspond to the samples left after applying quality control and filtering
(see section 3.2).
Columns: Batch name; Developmental disorder; Gene; Mutation (DNA); Mutation (protein); Mutation effect; Pathogenic; Sex; Age (years); DNAmAge
Europe ASD NA NA NA NA NA Male 23.25 29.94120469
Europe ASD NA NA NA NA NA Male 25.75 23.66579727
Europe ASD NA NA NA NA NA Male 23.75 22.89490773
Europe ASD NA NA NA NA NA Male 26.58 31.33521081
Europe ASD NA NA NA NA NA Male 11.83 13.55540994
Europe ASD NA NA NA NA NA Male 12.33 12.62567804
Europe ASD NA NA NA NA NA Male 11.67 11.91444556
Europe ASD NA NA NA NA NA Male 12.67 15.1433583
Europe ASD NA NA NA NA NA Male 15.92 20.69231419
Europe ASD NA NA NA NA NA Male 16.92 18.37736076
Europe ASD NA NA NA NA NA Male 15.92 14.74270021
Europe ASD NA NA NA NA NA Male 19 28.69942806
Europe ASD NA NA NA NA NA Male 16.75 20.84761017
Europe ASD NA NA NA NA NA Male 20.16 17.69509361
Europe ASD NA NA NA NA NA Male 12.92 18.28693655
Europe ASD NA NA NA NA NA Male 13.25 12.24924728
Europe ASD NA NA NA NA NA Male 13 15.27709141
Europe ASD NA NA NA NA NA Male 13.25 15.93247357
Europe ASD NA NA NA NA NA Male 13.16 17.97126245
Europe ASD NA NA NA NA NA Male 13.67 18.5985271
Europe ASD NA NA NA NA NA Male 7.67 9.834525429
Europe ASD NA NA NA NA NA Male 7.92 8.819610809
Europe ASD NA NA NA NA NA Male 7.73 10.53639331
Europe ASD NA NA NA NA NA Male 8 8.782413174
Europe ASD NA NA NA NA NA Male 7.83 8.331080792
Europe ASD NA NA NA NA NA Male 8 8.412508081
Europe ASD NA NA NA NA NA Male 10.83 12.94110542
Europe ASD NA NA NA NA NA Male 11.5 16.52427744
Europe ASD NA NA NA NA NA Male 10.83 9.546814402
Europe ASD NA NA NA NA NA Male 11.5 10.75219435
Europe ASD NA NA NA NA NA Male 10.83 11.7226536
Europe ASD NA NA NA NA NA Male 6 8.750320884
Europe ASD NA NA NA NA NA Male 5.75 8.069349936
Europe ASD NA NA NA NA NA Male 6 8.205893972
Europe ASD NA NA NA NA NA Male 5.83 8.765912407
Europe ASD NA NA NA NA NA Male 6.33 6.903468104
Europe ASD NA NA NA NA NA Male 5.25 5.648518225
Europe ASD NA NA NA NA NA Male 5.67 5.896253109
Europe ASD NA NA NA NA NA Male 5.42 6.160793858
Europe ASD NA NA NA NA NA Male 5.75 8.719005258
Europe ASD NA NA NA NA NA Male 5.42 6.49657694
Europe ASD NA NA NA NA NA Male 3.92 4.884904225
Europe ASD NA NA NA NA NA Male 4.08 4.766905985
Europe ASD NA NA NA NA NA Male 4 5.462162993
Europe ASD NA NA NA NA NA Male 4.08 4.557194499
Europe ASD NA NA NA NA NA Male 4 4.383741212
Europe ASD NA NA NA NA NA Male 4.25 5.321367013
Europe ASD NA NA NA NA NA Male 3.25 2.797437125
Europe ASD NA NA NA NA NA Male 3.42 3.906912403
Europe ASD NA NA NA NA NA Male 3.33 4.703272329
Europe ASD NA NA NA NA NA Male 3.5 3.223456196
Europe ASD NA NA NA NA NA Male 3.42 4.024449964
Europe ASD NA NA NA NA NA Male 3.58 4.662665584
Europe ASD NA NA NA NA NA Male 5.16 7.931806871
Europe ASD NA NA NA NA NA Male 5.16 6.144088681
Europe ASD NA NA NA NA NA Male 5.16 5.423886319
Europe ASD NA NA NA NA NA Male 5.25 6.873520458
Europe ASD NA NA NA NA NA Male 5.16 6.828746343
Europe ASD NA NA NA NA NA Male 5.25 6.287392617
Europe ASD NA NA NA NA NA Male 6.5 7.549817595
Europe ASD NA NA NA NA NA Male 6.83 5.310188113
Europe ASD NA NA NA NA NA Male 6.67 8.807848811
Europe ASD NA NA NA NA NA Male 7.16 7.314048584
Europe ASD NA NA NA NA NA Male 6.83 7.143809294
Europe ASD NA NA NA NA NA Male 7.25 4.888587648
Europe ASD NA NA NA NA NA Male 10.08 11.01168613
Europe ASD NA NA NA NA NA Male 10.08 9.091817984
Europe ASD NA NA NA NA NA Male 10.08 12.00962928
Europe ASD NA NA NA NA NA Male 10.5 11.89814401
Europe ASD NA NA NA NA NA Male 10.08 10.85200361
Europe ASD NA NA NA NA NA Male 10.58 15.97655481
Europe ASD NA NA NA NA NA Male 14.67 19.40830372
Europe ASD NA NA NA NA NA Male 15.25 17.28948864
Europe ASD NA NA NA NA NA Male 14.83 18.99313794
Europe ASD NA NA NA NA NA Male 15.25 17.40182035
Europe ASD NA NA NA NA NA Male 15.08 20.74719227
Europe ASD NA NA NA NA NA Male 15.83 17.66494621
Europe ASD NA NA NA NA NA Male 1.83 2.332369997
Europe ASD NA NA NA NA NA Male 2.33 2.079645877
Europe ASD NA NA NA NA NA Male 2.08 3.093728905
Europe ASD NA NA NA NA NA Male 2.5 3.327332717
Europe ASD NA NA NA NA NA Male 2.08 3.081702301
Europe ASD NA NA NA NA NA Male 2.5 3.640188937
Europe ASD NA NA NA NA NA Male 27.67 5.315328746
Europe ASD NA NA NA NA NA Male 32.92 35.79080593
Europe ASD NA NA NA NA NA Male 31.83 35.12415194
Europe ASD NA NA NA NA NA Male 35.16 34.8152863
Europe ASD NA NA NA NA NA Male 32.33 33.47894995
Europe ASD NA NA NA NA NA Male 11.58 14.81256772
Europe ASD NA NA NA NA NA Male 4.5 3.982793413
Europe ASD NA NA NA NA NA Male 4.75 6.632731853
Europe ASD NA NA NA NA NA Male 4.5 5.453577973
Europe ASD NA NA NA NA NA Male 5 6.0536493
Europe ASD NA NA NA NA NA Male 4.67 4.665684936
Europe ASD NA NA NA NA NA Male 5 5.538833496
Europe ASD NA NA NA NA NA Male 4.33 6.826640979
Europe ASD NA NA NA NA NA Male 4.42 5.074848057
Europe ASD NA NA NA NA NA Male 4.33 4.069969605
Europe ASD NA NA NA NA NA Male 4.5 2.914915908
Europe ASD NA NA NA NA NA Male 4.33 4.177855824
Europe ASD NA NA NA NA NA Male 4.5 5.359046992
Europe ASD NA NA NA NA NA Male 7.33 4.981096393
Europe ASD NA NA NA NA NA Male 7.5 7.521560211
Europe ASD NA NA NA NA NA Male 7.33 5.632014057
Europe ASD NA NA NA NA NA Male 7.58 5.381195679
Europe ASD NA NA NA NA NA Male 7.42 7.07596058
Europe ASD NA NA NA NA NA Male 7.58 6.118788705
Europe ASD NA NA NA NA NA Male 8.83 8.225301829
Europe ASD NA NA NA NA NA Male 9.08 9.139517533
Europe ASD NA NA NA NA NA Male 8.83 7.154970232
Europe ASD NA NA NA NA NA Male 9.67 9.966260719
Europe ASD NA NA NA NA NA Male 8.92 8.69481855
Europe ASD NA NA NA NA NA Male 9.67 12.84219838
Europe ASD NA NA NA NA NA Male 8.08 10.35219735
Europe ASD NA NA NA NA NA Male 8.25 8.849774575
Europe ASD NA NA NA NA NA Male 8.16 9.464032218
Europe ASD NA NA NA NA NA Male 8.33 10.51799454
Europe ASD NA NA NA NA NA Male 8.16 9.41622481
Europe ASD NA NA NA NA NA Male 8.75 13.39598874
May_2015 Angelman UBE3A NA NA NA YES Female 7 5.473183736
May_2015 Angelman UBE3A NA NA NA YES Male 13 15.48878288
May_2015 Angelman UBE3A NA NA NA YES Male 55 59.49787491
Nov_2015 Angelman UBE3A NA NA NA YES Male 1 2.790549766
Nov_2015 Angelman UBE3A NA NA NA YES Female 4 3.956276247
Nov_2015 Angelman UBE3A NA NA NA YES Female 15 17.87817565
Nov_2015 Angelman UBE3A NA NA NA YES Male 1 2.320603044
Nov_2015 Angelman UBE3A NA NA NA YES Male 4 4.348249902
Nov_2015 Angelman UBE3A NA NA NA YES Male 1 0.959598999
Nov_2015 Angelman UBE3A NA NA NA YES Female 1 1.994091886
Nov_2015 Angelman UBE3A NA NA NA YES Female 10 8.697172131
Nov_2015 Angelman UBE3A NA NA NA YES Female 14 15.7410421
Nov_2015 Angelman UBE3A NA NA NA YES Female 6 5.13374965
Nov_2015 Angelman UBE3A NA NA NA YES Male 25 32.45470863
May_2015 ATR-X ATRX c.6254G>A p.Arg2085His Missense YES Male 6.3 6.19432086
May_2015 ATR-X ATRX c.736C>T p.Arg246Cys Missense YES Male 18 13.11825849
May_2015 ATR-X ATRX c.6593A>G p.His2198Arg Missense YES Male 1.4 2.604328944
May_2015 ATR-X ATRX c.758T>C p.Leu253Ser Missense YES Male 18.5 6.108170831
May_2015 ATR-X ATRX c.4817G>A p.Ser1606Asn Missense YES Male 21 24.74309568
May_2015 ATR-X ATRX c.5786A>G p.Lys1929Arg Missense YES Male 0.7 -0.14552632
May_2015 ATR-X ATRX c.730A>C p.Ile244Leu Missense YES Male 14 11.30064691
May_2015 ATR-X ATRX c.7156C>T p.Arg2386* Nonsense YES Male 4.6 6.236506951
May_2015 ATR-X ATRX c.536A>G p.Asn179Ser Missense YES Male 4.6 33.54375298
May_2015 ATR-X ATRX Exon 207 deletion NA Exonic deletion YES Male 4.4 4.821921423
May_2015 ATR-X ATRX c.7366_7367insA p.Met2456Asnfs*42 Frameshift YES Male 27 39.19917395
May_2015 ATR-X ATRX c.109C>T p.Arg37* Nonsense YES Male 14.5 5.274937882
May_2015 ATR-X ATRX c.736C>T p.Arg246Cys Missense YES Male 2.5 1.113449871
May_2015 ATR-X ATRX c.109C>T p.Arg37* Nonsense YES Male 17.5 22.71435784
May_2015 ATR-X ATRX c.109C>T p.Arg37* Nonsense YES Male 14 11.21597332
Nov_2015 Claes_Jensen KDM5C c.1510G>A p.Val504Met Missense YES Male 30 42.69659356
Nov_2015 Claes_Jensen KDM5C c.1439C>T p.Pro480Leu Missense YES_predicted Male 6 8.103173952
Nov_2015 Claes_Jensen KDM5C c.4439_4440delAG p.Arg1481Glyfs* Frameshift YES Male 26 28.25654272
Nov_2015 Claes_Jensen KDM5C Intron 11:+5G>A NA Splice site mutation YES Male 42 54.3236723
Nov_2015 Claes_Jensen KDM5C c.1510G>A p.Val504Met Missense YES Male 8 10.07007313
Nov_2015 Claes_Jensen KDM5C c.1439C>T p.Pro480Leu Missense YES Male 2 3.619189097
Nov_2015 Claes_Jensen KDM5C c.229G>A p.Ala77Thr Missense YES Male 37 48.42002598
Nov_2015 Claes_Jensen KDM5C c.4439_4440delAG p.Arg1481Glyfs* Frameshift YES Male 28 31.61445991
Nov_2015 Claes_Jensen KDM5C c.229G>A p.Ala77Thr Missense YES Male 13 16.50827759
Nov_2015 Claes_Jensen KDM5C c.1510G>A p.Val504Met Missense YES Male 26 38.69008936
May_2015 Coffin_Lowry RPS6KA3 c.1520insA p.Arg507fs Frameshift YES Female 6 4.093225848
May_2015 Coffin_Lowry RPS6KA3 c.2065C>T p.Gln689* Nonsense YES Male 11.5 10.63296406
May_2015 Coffin_Lowry RPS6KA3 c.2186G>A p.Arg729Gln Missense YES_predicted Male 4 4.62981308
May_2015 Coffin_Lowry RPS6KA3 c.631_772del142 and c.774+5G>A NA Frameshift and intronic mutation YES_predicted Male 7 5.068637974
May_2015 Coffin_Lowry RPS6KA3 c.340C>T p.Arg114Trp Missense YES_predicted Male 1.3 8.170755226
May_2015 Coffin_Lowry RPS6KA3 c.727C>T p.Arg243* Nonsense YES Male 13 14.17141748
May_2015 Coffin_Lowry RPS6KA3 Intron 14:+1G>A NA Splice site mutation YES Male 22.8 25.56720654
May_2015 Coffin_Lowry RPS6KA3 NA NA Exonic and intronic deletion YES Male 12 10.17620766
May_2015 Coffin_Lowry RPS6KA3 c.386_387insCTTT p.Phe130Phefs*141 Frameshift YES Male 2 1.808104516
May_2015 Coffin_Lowry RPS6KA3 c.1155delT p.Phe385fs*40 Frameshift YES Male 8 7.406603271
Mar_2014 Floating_Harbour SRCAP c.7303C>T p.Arg2435* Nonsense YES Female 8 11.29885487
Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 15 16.23135534
Mar_2014 Floating_Harbour SRCAP c.7282dupC p.Arg2428Profs*15 Frameshift YES Female 6 5.620915174
Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 10 42.55562244
Mar_2014 Floating_Harbour SRCAP c.8117C>G p.Ser2706* Nonsense YES Male 4 2.815335426
Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 5 4.112348915
Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 42 43.43022309
Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Male 12 12.37257473
Mar_2014 Floating_Harbour SRCAP c.7316dupC p.Ala2440Serfs*3 Frameshift YES Male 10 4.424381743
Mar_2014 Floating_Harbour SRCAP c.7165G>T p.Glu2389* Nonsense YES Female 8 1.524333568
Mar_2014 Floating_Harbour SRCAP c.7218_7219delTC p.Gln2407Argfs*35 Frameshift YES Male 12 19.26251425
Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Male 5 4.902256866
Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 35 38.47378886
Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 15 14.81418145
Mar_2014 Floating_Harbour SRCAP c.7549delC p.Gln2517Lysfs*5 Frameshift YES Male 4 3.645524918
Mar_2014 Floating_Harbour SRCAP c.7330C>T p.Arg2444* Nonsense YES Female 6 7.201471688
Mar_2014 Floating_Harbour SRCAP c.7219C>T p.Gln2407* Nonsense YES Female 6 6.552720685
GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 5 -0.26537653
GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 10.41667 4.620596743
GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 7.75 9.380603836
GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 4.333333 7.378290152
GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 0.083333 7.256745087
GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 4.166667 6.582911793
GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 21 32.38418863
GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 34.58333 46.41126929
GSE41273 FXS FMR1 NA NA CGG repeat expansion YES Male 48 58.89975733
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 27 32.354974
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 12 11.03917455
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 42 40.85689027
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 28 31.89965321
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 15 15.3286979
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 17 13.98190146
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 21 21.42017869
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 30 35.16564816
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 28 27.14880628
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 21 24.03936596
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 33 37.84060062
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 29 35.17133434
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 25 25.67600147
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 17 14.45573451
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 33 36.37082822
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 29 34.45261333
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 20 24.86340454
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 41 46.76222649
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 31 34.61968346
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 27 29.78714348
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 17 19.72629863
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 15 11.78896917
May_2015 FXS FMR1 NA NA CGG repeat expansion YES Male 14 12.80759084
GSE116300 Kabuki KMT2D NA p.Pro443fs Frameshift YES Female 1 0.790826048
GSE116300 Kabuki KMT2D NA p.Tyr2199fs Frameshift YES Female 3 4.448848163
GSE116300 Kabuki KMT2D NA p.Ser5307fs Frameshift YES Male 5 11.49079359
GSE116300 Kabuki KMT2D NA p.Asn4403fs Frameshift YES Male 4.33 6.325934863
GSE116300 Kabuki KMT2D NA p.Gln4102* Nonsense YES Male 2 5.566745677
GSE116300 Kabuki KMT2D NA p.Gln3934* Nonsense YES Male 3.75 4.443224079
GSE116300 Kabuki KMT2D c.14515+1G>T NA Splice site mutation YES Male 2.5 16.55101592
GSE116300 Kabuki KMT2D NA p.Gln4090* Nonsense YES Female 1.42 3.379081974
GSE116300 Kabuki KMT2D NA p.Thr1708fs Frameshift YES Female 11.5 10.71344707
GSE97362 Kabuki KMT2D c.15061C>T p.Arg5021* Nonsense YES Female 14 8.946680052
GSE97362 Kabuki KMT2D c.16318delG p.Glu5440Argfs*16 Frameshift YES Male 1 0.664960442
GSE97362 Kabuki KMT2D c.15030dup p.Glu5011Argfs*13 Frameshift YES Male 18 24.00757516
GSE97362 Kabuki KMT2D c.8172_8173del p.Pro2724Glnfs*5 Frameshift YES Female 16 4.540501556
GSE97362 Kabuki KMT2D c.6595delT p.Tyr2199Ilefs*65 Frameshift YES Male 15 6.279894046
GSE97362 Kabuki KMT2D c.14055_14056delCA p.His4685Glnfs*4 Frameshift YES Male 11 9.2260079
GSE97362 Kabuki KMT2D c.6295C>T p.Arg2099* Nonsense YES Male 14 6.594838599
GSE97362 Kabuki KMT2D c.4135delA p.Met1379Valfs*52 Frameshift YES Male 20 10.04269734
GSE97362 Kabuki KMT2D c.12592C>T p.Arg4198* Nonsense YES Male 18 9.095825776
GSE97362 Kabuki KMT2D c.4135delA p.Met1379Valfs*52 Frameshift YES Male 6 8.462691919
GSE97362 Kabuki KMT2D c.11710C>T p.Gln3904* Nonsense YES Male 16 12.68670209
GSE97362 Kabuki KMT2D c.15143G>A p.Arg5048His Missense YES_predicted Female 7 0.627461504
GSE97362 Kabuki KMT2D c.16522-5_16522-4delTT NA Splice site mutation YES_predicted Female 15 12.75508563
Jun_2015 Kabuki KMT2D c.1801_1822dup22 NA Frameshift YES Male 7 6.044371299
Nov_2015 Kabuki KMT2D c.13059delG p.Pro4353fs Frameshift YES Female 6.7 5.526369466
Nov_2015 Kabuki KMT2D c.839+1delG NA Splice site mutation YES Male 1.9 2.51325414
Nov_2015 Kabuki KMT2D c.15844C>T p.Arg5282* Nonsense YES Female 3.9 3.752004426
Nov_2015 Kabuki KMT2D c.16294C>T p.Arg5432Trp Missense YES_predicted Male 21.6 30.3375233
Nov_2015 Kabuki KMT2D c.8488C>T p.Arg2830* Nonsense YES Female 0 -0.1055224
Nov_2015 Kabuki KMT2D c.4168dupG p.Ala1390fs Frameshift YES Female 3.8 4.177253095
Nov_2015 Kabuki KMT2D c.15289C>T p.Arg5097* Nonsense YES Male 4.3 6.455955113
Nov_2015 Kabuki KMT2D c.4419-2A>G NA Splice site mutation YES Male 2.6 3.387623395
Nov_2015 Kabuki KMT2D c.16048A>T p.Lys5350* Nonsense YES Female 19.1 19.2926115
Nov_2015 Kabuki KMT2D c.10201C>T p.Gln3401* Nonsense YES Male 7.1 8.838432826
Nov_2015 Kabuki KMT2D c.16360C>T p.Arg5454* Nonsense YES Male 3.4 5.199197126
Nov_2015 Kabuki KMT2D c.8692C>T p.Gln2898* Nonsense YES Male 3.1 3.423420462
Nov_2015 Kabuki KMT2D c.14878C>T p.Arg4960* Nonsense YES Female 4.1 4.752807097
Nov_2015 Kabuki KMT2D c.6265A>T p.Lys2089* Nonsense YES Female 23.1 25.95907184
Nov_2015 Kabuki KMT2D c.10740+1G>A NA Splice site mutation YES Female 6.9 6.253113479
Nov_2015 Kabuki KMT2D c.13652T>A p.Leu4551* Nonsense YES Male 2.2 3.757460909
Nov_2015 Kabuki KMT2D c.11596C>T p.Gln3866* Nonsense YES Female 1 1.193509229
Nov_2015 Kabuki KMT2D c.548delC p.Pro183fs Frameshift YES Female 16.6 8.413539447
Nov_2015 Kabuki KMT2D c.7411C>T p.Arg2471* Nonsense YES Female 3.3 3.541604601
Nov_2015 Kabuki KMT2D c.1966dupC p.Leu656fs Frameshift YES Female 24.1 28.78927404
Nov_2015 Kabuki KMT2D c.6200delA p.Asn2067fs Frameshift YES Female 9.5 6.485224166
Nov_2015 Kabuki KMT2D c.7933C>T p.Arg2645* Nonsense YES Female 9.3 8.701999271
Nov_2015 Kabuki KMT2D c.13450C>T p.Arg4484* Nonsense YES Female 5.8 5.430619578
Feb_2016 Noonan PTPN11 c.1403C>T p.Thr468Met Missense YES Male 9 10.53231848
Feb_2016 Noonan PTPN11 c.1391G>C p.Gly464Ala Missense YES Female 28 25.06455423
Feb_2016 Noonan PTPN11 c.1493G>T p.Arg498Leu Missense YES Male 0.4 1.069462128
Feb_2016 Noonan PTPN11 c.836A>G p.Tyr279Cys Missense YES Male 0.2 0.145725107
Feb_2016 Noonan PTPN11 c.1493G>T p.Arg498Leu Missense YES Male 7 7.125930003
Feb_2016 Noonan PTPN11 c.1528C>G p.Gln510Glu Missense YES Female 2 4.906928458
Feb_2016 Noonan PTPN11 c.228G>C p.Glu76Asp Missense YES Male 17 17.52765019
Feb_2016 Noonan PTPN11 c.215C>G p.Ala72Gly Missense YES Female 13 9.011977393
Feb_2016 Noonan PTPN11 c.1391G>C p.Gly464Ala Missense YES Female 0.7 1.172244358
Feb_2016 Noonan PTPN11 c.922A>G p.Asn308Asp Missense YES Male 15 14.68576639
Feb_2016 Noonan PTPN11 c.836A>G p.Tyr279Cys Missense YES Male 0.3 0.576697185
Feb_2016 Noonan PTPN11 c.214G>T p.Ala72Ser Missense YES Male 0.9 1.080594238
Feb_2016 Noonan PTPN11 c.178G>A p.Gly60Ser Missense YES Male 2 3.079510066
Feb_2016 Noonan PTPN11 c.172A>G p.Asn58Asp Missense YES Male 37 42.63784241
Feb_2016 Noonan PTPN11 c.174C>A p.Asn58Lys Missense YES Female 27 32.19911243
Feb_2016 Noonan RAF1 c.781C>T p.Pro261Ser Missense YES Male 9 11.76954478
Feb_2016 Noonan RAF1 c.770C>T p.Ser257Leu Missense YES Female 4 6.836828788
Feb_2016 Noonan RAF1 c.788T>G p.Val263Gly Missense YES Male 8 10.54386119
Feb_2016 Noonan RAF1 c.782C>T p.Pro261Leu Missense YES Male 3 5.956377653
Feb_2016 Noonan RAF1 c.786T>A p.Asn262Lys Missense YES Female 3 3.603073783
Feb_2016 Noonan RAF1 c.768G>T p.Arg256Ser Missense YES Male 20 21.09275241
Feb_2016 Noonan RAF1 c.524A>G p.His175Arg Missense YES Female 0.7 0.815080545
Feb_2016 Noonan RAF1 c.1837C>G p.Leu613Val Missense YES Female 10 7.425274033
Feb_2016 Noonan RAF1 c.775T>A p.Ser259Thr Missense YES Female 8 8.883918263
Feb_2016 Noonan RAF1 c.1472C>T p.Thr491Ile Missense YES Female 26 29.82312626
Feb_2016 Noonan RAF1 c.781C>A p.Pro261Thr Missense YES Female 11 12.25565712
Feb_2016 Noonan SOS1 c.2536G>A p.Glu846Lys Missense YES Female 3 2.62618922
Feb_2016 Noonan SOS1 c.1654A>G p.Arg552Gly Missense YES Male 16 12.47288243
Feb_2016 Noonan SOS1 c.1310T>C p.Ile437Thr Missense YES Female 7 7.309199493
Feb_2016 Noonan SOS1 c.806T>C p.Met269Thr Missense YES Female 35 25.04627009
Feb_2016 Noonan SOS1 c.1642A>C p.Ser548Arg Missense YES Female 3 4.372134286
Feb_2016 Noonan SOS1 c.925G>T p.Asp309Tyr Missense YES Female 49 45.20434465
Feb_2016 Noonan SOS1 c.1655G>C p.Arg552Thr Missense YES Male 1 2.41372048
Feb_2016 Noonan SOS1 c.508A>G p.Lys170Glu Missense YES Male 0.3 0.944100935
Feb_2016 Noonan SOS1 c.1294T>C p.Trp432Arg Missense YES Female 14 17.03491762
Feb_2016 Noonan SOS1 c.1322G>A p.Cys441Tyr Missense YES Female 0.6 0.555111083
Feb_2016 Noonan SOS1 c.806T>G p.Met269Arg Missense YES Female 0.4 0.844087032
Feb_2016 Noonan SOS1 c.797C>A p.Thr266Lys Missense YES Male 1 2.133506512
Feb_2016 Noonan SOS1 c.1297G>A p.Glu433Lys Missense YES Male 1 1.481217449
Feb_2016 Noonan SOS1 c.1300G>A p.Gly434Arg Missense YES Male 5 8.558246566
May_2015 Rett MECP2 NA p.Arg106Trp Missense YES Female 1 1.835127123
May_2015 Rett MECP2 NA p.Arg168* Nonsense YES Female 25 29.34649481
May_2015 Rett MECP2 NA p.Pro302Arg Missense YES Female 34 35.17904908
May_2015 Rett MECP2 NA NA Exonic deletion YES Female 2 2.581071992
May_2015 Rett MECP2 NA p.Thr158Met Missense YES Female 1 2.210005617
May_2015 Rett MECP2 Deletion in exon 4 NA Exonic deletion YES Female 3 5.225511336
May_2015 Rett MECP2 NA p.Thr158Met Missense YES Female 1 2.510753024
May_2015 Rett MECP2 NA p.Pro225Arg Missense YES Female 4 6.160921221
May_2015 Rett MECP2 c.1157_1197del41 p.Glu374fs Frameshift YES Female 6 6.2636907
May_2015 Rett MECP2 NA p.Arg255* Nonsense YES Female 1.5 1.084382282
May_2015 Rett MECP2 Deletion in exons 3 and 4 NA Exonic deletion YES Female 6 6.883663479
May_2015 Rett MECP2 NA p.Arg106Trp Missense YES Female 29 38.83647398
May_2015 Rett MECP2 NA p.Thr158Met Missense YES Female 3 4.77442952
May_2015 Rett MECP2 NA p.Arg255* Nonsense YES Female 11 11.74653291
May_2015 Rett MECP2 Partial deletion of exon 4 NA Exonic deletion YES Female 4 3.072948979
Jun_2015 Saethre_Chotzen TWIST1 c.385_405dup21 NA In-frame insertion YES Female 0.003 -0.35722332
Nov_2015 Saethre_Chotzen TWIST1 c.149delC p.Ala50fs Frameshift YES Male 0.02 0.16785508
Nov_2015 Saethre_Chotzen TWIST1 c.149delC p.Ala50fs Frameshift YES Female 0.1 13.96937513
Nov_2015 Saethre_Chotzen TWIST1 c.376G>T p.Glu126* Nonsense YES Male 38 41.56611411
Nov_2015 Saethre_Chotzen TWIST1 c.406_407ins21 NA In-frame insertion YES Male 30 29.61790422
Nov_2015 Saethre_Chotzen TWIST1 c.156delC p.Pro52fs Frameshift YES Female 33.5 27.76671901
Nov_2015 Saethre_Chotzen TWIST1 c.418_419ins21 NA In-frame insertion YES Male 17.7 15.97052177
Nov_2015 Saethre_Chotzen TWIST1 c.211C>T p.Gln71* Nonsense YES Female 20.7 18.347741
Nov_2015 Saethre_Chotzen TWIST1 c.325C>T p.Gln109* Nonsense YES_predicted Male 0.7 0.45749609
Nov_2015 Saethre_Chotzen TWIST1 c.396_416dup21 NA In-frame insertion YES Male 0.1 0.386967314
Nov_2015 Saethre_Chotzen TWIST1 c.193G>T p.Glu65* Nonsense YES Female 0.01 0.049927484
Nov_2015 Saethre_Chotzen TWIST1 c.472T>C p.Phe158Leu Missense YES Female 23.3 0.174364646
Nov_2015 Saethre_Chotzen TWIST1 NA NA Full gene deletion YES Female 0.35 0.404844597
Nov_2015 Saethre_Chotzen TWIST1 NA NA Full gene deletion YES Female 0.003 7.069271322
Nov_2015 Saethre_Chotzen TWIST1 c.160G>T p.Gly54* Nonsense YES Female 0.7 0.830512167
Nov_2015 Saethre_Chotzen TWIST1 c.397_417dup21 NA In-frame insertion YES_predicted Female 20.5 25.83177177
Nov_2015 Saethre_Chotzen TWIST1 c.120_145del26 NA Frameshift YES Male 0.6 0.491449014
Nov_2015 Saethre_Chotzen TWIST1 c.149delC p.Ala50fs Frameshift YES Female 23.5 18.94806941
Nov_2015 Saethre_Chotzen TWIST1 c.394_414del21 NA In-frame deletion YES Female 12.3 10.10722932
Nov_2015 Saethre_Chotzen TWIST1 c.352C>G p.Arg118Gly Missense YES_predicted Female 21.5 23.41800184
Nov_2015 Saethre_Chotzen TWIST1 c.376G>T p.Glu126* Nonsense YES Female 0.8 0.92117994
Nov_2015 Saethre_Chotzen TWIST1 c.490C>T p.Gln164* Nonsense YES Female 28.7 28.56296158
GSE74432 Sotos NSD1 chr5:175,366,008-177,470,488 NA Long deletion YES Female 9 8.442111023
GSE74432 Sotos NSD1 chr5:175,764,262-177,059,256 NA Long deletion YES Female 7 16.4840396
GSE74432 Sotos NSD1 Exons 15-19 deletion NA Exonic deletion YES Male 10 26.70242296
GSE74432 Sotos NSD1 c.1716delC p.Cys573Valfs*26 Frameshift YES Female 10 14.59121875
GSE74432 Sotos NSD1 c.6454C>T p.Arg2152* Nonsense YES Female 3.5 9.371834336
GSE74432 Sotos NSD1 c.5445C>G p.Tyr1815* Nonsense YES Female 13.2 22.67264348
GSE74432 Sotos NSD1 c.4843delT p.Tyr1615Thrfs*27 Frameshift YES Male 3 7.039068162
GSE74432 Sotos NSD1 NA NA Microdeletion YES Male 2.2 15.1797238
GSE74432 Sotos NSD1 c.6349C>T p.Arg2117* Nonsense YES Female 12 26.9093016
GSE74432 Sotos NSD1 c.1492C>T p.Arg498* Nonsense YES Male 2.2 8.399587071
GSE74432 Sotos NSD1 c.6454C>T p.Arg2152* Nonsense YES Male 18 32.23853498
GSE74432 Sotos NSD1 c.1583delA p.Lys528Argfs*8 Frameshift YES Male 19.7 27.25531484
GSE74432 Sotos NSD1 c.2014_2018delACAGA p.Thr672Glufs*9 Frameshift YES Male 8 26.46585423
GSE74432 Sotos NSD1 c.2014_2018delACAGA p.Thr672Glufs*9 Frameshift YES Male 41 67.36442178
GSE74432 Sotos NSD1 c.2014_2018delACAGA p.Thr672Glufs*9 Frameshift YES Female 2 11.34495985
GSE74432 Sotos NSD1 c.1810C>T p.Arg604* Nonsense YES Female 1.6 6.2471485
GSE74432 Sotos NSD1 c.1801A>T p.Lys601* Nonsense YES Male 10.6 30.82670587
GSE74432 Sotos NSD1 c.4977_4978insG p.Arg1660Alafs*13 Frameshift YES Male 20 41.38296452
GSE74432 Sotos NSD1 c.6437G>C p.Cys2146Ser Missense YES_predicted Male 2 9.83036953
GSE74432 Sotos NSD1 c.6412T>C p.Cys2138Arg Missense YES_predicted Male 7 29.0788673
GSE74432 Weaver EZH2 c.457_459delTAT p.Tyr153del In-frame deletion YES Male 30 40.6786865
GSE74432 Weaver EZH2 c.2080C>T p.His694Tyr Missense YES Female 10.9167 17.28626931
GSE74432 Weaver EZH2 c.2050C>T p.Arg684Cys Missense YES Male 2.5833 2.611103643
GSE74432 Weaver EZH2 c.398A>G p.Tyr133Cys Missense YES Female 17 7.870608634
GSE74432 Weaver EZH2 c.553G>C p.Asp185His Missense YES Male 15.4167 18.04003584
GSE74432 Weaver EZH2 c.394C>T p.Pro132Ser Missense YES Female 19.75 21.09459251
GSE74432 Weaver EZH2 c.1876G>A p.Val626Met Missense YES Male 43 42.37721085
Fig. S2.1 Table showing information for the samples from individuals with developmental disorders (total
N = 367). Mutation information was annotated for the human genome assembly hg19. ASD: autism spectrum
disorder; ATR-X: alpha thalassemia/mental retardation X-linked syndrome; FXS: fragile X syndrome.
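Because the rows of this table are whitespace-separated and the three mutation columns can themselves contain spaces (for example ‘CGG repeat expansion’), a row is easier to read from both ends than from left to right. The following is a minimal sketch under that assumption; the function name `parse_row` and the dictionary keys are hypothetical, and the three mutation columns are deliberately left joined as one string since they cannot be split unambiguously.

```python
def parse_row(line):
    """Split one table row into its unambiguous leading and trailing fields."""
    tokens = line.split()
    if len(tokens) < 10:
        raise ValueError("row has fewer than the 10 expected fields")
    batch, disorder, gene = tokens[:3]                # first three columns
    pathogenic, sex, age, dnam_age = tokens[-4:]      # last four columns
    return {
        "batch": batch,
        "disorder": disorder,
        "gene": gene,
        # Mutation (DNA), Mutation (protein) and Mutation effect, kept together
        "mutation_fields": " ".join(tokens[3:-4]),
        "pathogenic": pathogenic,
        "sex": sex,
        "age_years": float(age),
        "dnam_age": float(dnam_age),
    }

row = parse_row(
    "May_2015 ATR-X ATRX c.6254G>A p.Arg2085His Missense YES Male 6.3 6.19432086"
)
```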
[Figure: 15 panels, one per developmental disorder, each plotting −log10(P-value) (y-axis, 0 to 10) against the median age in control (years) (x-axis, 10 to 50). Legend (EAA model): With CCC; Without CCC. Panels: Angelman (N=14), ASD (N=119), ATR-X (N=15), Claes_Jensen (N=10), Coffin_Lowry (N=10), Floating_Harbour (N=17), FXS (N=32), Kabuki (N=46), Noonan_PTPN11 (N=15), Noonan_RAF1 (N=11), Noonan_SOS1 (N=14), Rett (N=15), Saethre_Chotzen (N=22), Sotos (N=20), Weaver (N=7).]
Fig. S2.2 Effect of changing the median age of the controls when performing the screening for epigenetic age
acceleration (EAA) in the different developmental disorders. The dashed green line displays the significance
level of α = 0.01 after Bonferroni correction. The dashed orange line displays the median age for the samples
in the developmental disorder considered. In blue: EAA model without cell composition correction (CCC). In
red: EAA model with CCC. ASD: autism spectrum disorder; ATR-X: alpha thalassemia/mental retardation
X-linked syndrome; FXS: fragile X syndrome.
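As an illustration of where a Bonferroni-corrected significance line falls on a −log10(P-value) axis, the following sketch divides α by the number of tests and converts the result to the −log10 scale. The number of tests used here (15) is an assumption made only for the example and is not stated in the caption; the function name `bonferroni_threshold` is hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Return the per-test P-value cut-off and its -log10 value."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    cutoff = alpha / n_tests
    return cutoff, -math.log10(cutoff)

# alpha = 0.01 as in the caption; n_tests = 15 is an illustrative assumption.
cutoff, neg_log10_cutoff = bonferroni_threshold(0.01, 15)
```

Under this assumption, a test is significant when its −log10(P-value) lies above `neg_log10_cutoff`.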
[Figure panels: scatter plots for Angelman (N=14), ASD (N=119), ATR-X (N=15), Claes_Jensen (N=10) and Coffin_Lowry (N=10), each shown against controls (N=1128). For each disorder, three panels plot DNAmAge (years), EAA without CCC (years) and EAA with CCC (years) against chronological age (years). Legend: Disease status (disorder, Control).]
[Figure panels: scatter plots for Floating_Harbour (N=17), FXS (N=32), Kabuki (N=46), Noonan_PTPN11 (N=15) and Noonan_RAF1 (N=11), each shown against controls (N=1128). For each disorder, three panels plot DNAmAge (years), EAA without CCC (years) and EAA with CCC (years) against chronological age (years). Legend: Disease status (disorder, Control).]
[Fig. S2.3 panels for the remaining disorders. For each disorder, three scatterplots against chronological age (years): DNAmAge (years), EAA without CCC (years) and EAA with CCC (years). Legend: Disease status (disorder, Control). Sample sizes: Control N=1128 in every panel; Noonan_SOS1 N=14; Rett N=15; Saethre_Chotzen N=22; Sotos N=20; Weaver N=7.]
Fig. S2.3 Screening for epigenetic age acceleration (EAA) in developmental disorders. Left panel: scatterplot
showing the relation between epigenetic age (DNAmAge) according to Horvath’s model and chronological age
of the samples for a given developmental disorder (orange) and control (grey). Each sample is represented
by one point. The black dashed line represents the diagonal to aid visualisation. Middle and right panels:
scatterplots showing the relation between the epigenetic age acceleration (EAA) (without and with CCC
respectively) and chronological age of the samples for a given developmental disorder (orange) and control
(grey). Each sample is represented by one point. The yellow line represents the linear model EAA ∼ Age, with
the standard error shown in the light yellow shade.
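The linear model EAA ∼ Age drawn in the middle and right panels can be illustrated with a minimal sketch of an ordinary least-squares fit. This is not the code used to produce the figure: the function names and the toy data are hypothetical, and the standard error band is assumed here to be the usual standard error of the fitted mean.

```python
from math import sqrt


def fit_eaa_vs_age(ages, eaa):
    """Ordinary least-squares fit of EAA ~ Age; returns a dict of fit quantities."""
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(eaa) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, eaa))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(ages, eaa)]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    return {"slope": slope, "intercept": intercept, "sigma2": sigma2,
            "mean_x": mean_x, "sxx": sxx, "n": n}


def fitted_mean_and_se(fit, age):
    """Fitted EAA at a given age and the standard error of that fitted mean."""
    mean = fit["intercept"] + fit["slope"] * age
    se = sqrt(fit["sigma2"] * (1 / fit["n"] + (age - fit["mean_x"]) ** 2 / fit["sxx"]))
    return mean, se


# Toy data (hypothetical): chronological age and EAA, both in years.
toy_ages = [1, 5, 10, 20, 30, 40]
toy_eaa = [2.0, 3.5, 5.0, 9.0, 14.0, 17.5]
toy_fit = fit_eaa_vs_age(toy_ages, toy_eaa)
print(toy_fit["slope"], fitted_mean_and_se(toy_fit, 25))
```

A positive slope in such a fit would indicate that the deviation from the control clock grows with chronological age, whereas a flat line with a non-zero intercept would indicate a constant offset.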
[Fig. S2.4 plot. Panels: Hyper aDMPs, Hyper Sotos DMPs, Hyper−Hypo DMPs, Hypo aDMPs, Hypo Sotos DMPs, Hypo−Hypo DMPs. y-axis: Odds ratio (log scale, 0.01 to 10). Colour scale: − log10(P-value), 25 to 100. x-axis categories: Active Enhancer 1, Active Enhancer 2, Active Enhancer Flank, Active TSS, Bivalent promoter, CGI, Gene_body, Heterochromatin, Poised promoter, Primary DNase, Primary H3K27ac possible Enhancer, Promoter Downstream TSS 1, Promoter Downstream TSS 2, Promoter Upstream TSS, Quiescent/low, Repressed polycomb, Shelf, Shore, Strong transcription, Transcribed − 3' preferential, Transcribed − 5' preferential, Transcribed & regulatory (Prom/Enh), Transcribed 3' preferential and Enh, Transcribed 5' preferential and Enh, Transcribed and Weak Enhancer, Weak Enhancer 1, Weak Enhancer 2, Weak transcription, ZNF genes & repeats.]
Fig. S2.4 Enrichment for the categorical (epi)genomic features considered when comparing the different
genome-wide subsets of differentially methylated positions (DMPs) in ageing and Sotos against a control (see
section 3.7). The y-axis represents the odds ratio (OR), the error bars show the 95% confidence interval for
the OR estimate and the colour of the points codes for − log10(p-value) obtained after testing for enrichment
using Fisher’s exact test. An OR > 1 shows that the given feature is enriched in the subset of DMPs considered,
whilst an OR < 1 shows that it is found less than expected. The ‘Hyper-Hypo DMPs’ subset results from
the intersection between the hypermethylated DMPs in ageing and the hypomethylated DMPs in Sotos. The
‘Hypo-Hypo DMPs’ subset results from the intersection between the hypomethylated DMPs in ageing and
Sotos. In grey: features that did not reach significance using a significance level of α = 0.01 after Bonferroni
correction.
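The enrichment test described in this caption can be sketched for a single feature and a single DMP subset as follows. This is only an illustration under stated assumptions: the 2×2 counts and the number of tests are hypothetical, and the sample odds ratio with a Wald-type 95% confidence interval is used as a simplification, which may differ from the estimator and interval behind the figure.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Sample odds ratio with a Wald 95% CI (all four cells must be non-zero)."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            exp(log(odds_ratio) - z * se_log_or),
            exp(log(odds_ratio) + z * se_log_or))


# Hypothetical counts: rows = in subset / control, columns = has feature / lacks it.
in_subset_with, in_subset_without = 8, 2
control_with, control_without = 1, 5
n_tests = 29   # hypothetical number of features tested
alpha = 0.01   # significance level quoted in the caption

p_value = fisher_exact_two_sided(in_subset_with, in_subset_without,
                                 control_with, control_without)
estimate = odds_ratio_wald_ci(in_subset_with, in_subset_without,
                              control_with, control_without)
# Bonferroni correction: compare the p-value against alpha / n_tests.
is_significant = p_value < alpha / n_tests
print(estimate, p_value, is_significant)
```

Features for which the Bonferroni comparison fails would correspond to the points drawn in grey.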
[Fig. S2.5 boxplots. One panel per feature; within each panel, one facet per DMP subset comparing 'Control' with 'In subset'. The values annotated on the panels are listed below.]

Number of DMPs (control / in subset), identical for all features:
  Hyper aDMPs: 388195 / 40071
  Hyper Sotos DMPs: 428155 / 111
  Hyper−Hypo DMPs: 425716 / 2550
  Hypo aDMPs: 390815 / 37451
  Hypo Sotos DMPs: 413204 / 15062
  Hypo−Hypo DMPs: 426538 / 1728

For each feature: subset, p-value, median in control, median in subset.

Feature: H3K27ac (NFC)
  Hyper aDMPs: < 2.2e−16; −0.381; −0.411
  Hyper Sotos DMPs: 0.46; −0.384; −0.405
  Hyper−Hypo DMPs: 0.0041; −0.384; −0.372
  Hypo aDMPs: < 2.2e−16; −0.378; −0.428
  Hypo Sotos DMPs: < 2.2e−16; −0.382; −0.422
  Hypo−Hypo DMPs: 6.1e−09; −0.384; −0.416

Feature: H3K4me3 (NFC)
  Hyper aDMPs: 0.3; −0.299; −0.265
  Hyper Sotos DMPs: 0.076; −0.295; −0.329
  Hyper−Hypo DMPs: 0.00065; −0.295; −0.302
  Hypo aDMPs: < 2.2e−16; −0.279; −0.413
  Hypo Sotos DMPs: < 2.2e−16; −0.292; −0.339
  Hypo−Hypo DMPs: 5.9e−12; −0.295; −0.344

Feature: H3K36me3 (NFC)
  Hyper aDMPs: < 2.2e−16; −0.306; −0.258
  Hyper Sotos DMPs: 0.16; −0.301; −0.264
  Hyper−Hypo DMPs: 0.67; −0.301; −0.284
  Hypo aDMPs: < 2.2e−16; −0.297; −0.335
  Hypo Sotos DMPs: < 2.2e−16; −0.301; −0.308
  Hypo−Hypo DMPs: < 2.2e−16; −0.301; −0.34

Feature: H3K27me3 (NFC)
  Hyper aDMPs: < 2.2e−16; −0.395; 1.31
  Hyper Sotos DMPs: 4.1e−10; −0.361; 0.629
  Hyper−Hypo DMPs: < 2.2e−16; −0.363; 0.759
  Hypo aDMPs: < 2.2e−16; −0.369; −0.283
  Hypo Sotos DMPs: < 2.2e−16; −0.373; 0.194
  Hypo−Hypo DMPs: < 2.2e−16; −0.362; −0.046

Feature: H3K9ac (NFC)
  Hyper aDMPs: < 2.2e−16; −0.42; −0.414
  Hyper Sotos DMPs: 0.81; −0.42; −0.342
  Hyper−Hypo DMPs: 0.26; −0.42; −0.389
  Hypo aDMPs: < 2.2e−16; −0.413; −0.471
  Hypo Sotos DMPs: < 2.2e−16; −0.418; −0.448
  Hypo−Hypo DMPs: 0.004; −0.42; −0.43

Feature: H3K4me1 (NFC)
  Hyper aDMPs: < 2.2e−16; −0.292; 0.113
  Hyper Sotos DMPs: 8.8e−06; −0.248; 0.16
  Hyper−Hypo DMPs: < 2.2e−16; −0.25; 0.014
  Hypo aDMPs: < 2.2e−16; −0.219; −0.463
  Hypo Sotos DMPs: 0.0021; −0.246; −0.275
  Hypo−Hypo DMPs: 0.0092; −0.247; −0.365

Feature: H3K9me3 (NFC)
  Hyper aDMPs: < 2.2e−16; −0.166; 0.096
  Hyper Sotos DMPs: 0.00033; −0.147; 0.105
  Hyper−Hypo DMPs: < 2.2e−16; −0.148; −0.061
  Hypo aDMPs: 0.18; −0.147; −0.146
  Hypo Sotos DMPs: < 2.2e−16; −0.15; −0.077
  Hypo−Hypo DMPs: 1.4e−05; −0.147; −0.179

Feature: RNF2 (NFC)
  Hyper aDMPs: < 2.2e−16; −0.17; −0.283
  Hyper Sotos DMPs: 0.88; −0.18; −0.204
  Hyper−Hypo DMPs: 7e−06; −0.18; −0.231
  Hypo aDMPs: < 2.2e−16; −0.175; −0.219
  Hypo Sotos DMPs: 3.8e−11; −0.179; −0.202
  Hypo−Hypo DMPs: 0.75; −0.18; −0.179

Feature: EZH2 (NFC)
  Hyper aDMPs: < 2.2e−16; −0.276; 0.173
  Hyper Sotos DMPs: 6.9e−09; −0.258; 0.051
  Hyper−Hypo DMPs: < 2.2e−16; −0.259; −0.026
  Hypo aDMPs: < 2.2e−16; −0.254; −0.296
  Hypo Sotos DMPs: < 2.2e−16; −0.259; −0.214
  Hypo−Hypo DMPs: 2.4e−05; −0.258; −0.275

Feature: RNA (NRE)
  Hyper aDMPs: < 2.2e−16; −0.316; −0.363
  Hyper Sotos DMPs: 0.018; −0.324; −0.351
  Hyper−Hypo DMPs: < 2.2e−16; −0.323; −0.363
  Hypo aDMPs: < 2.2e−16; −0.319; −0.351
  Hypo Sotos DMPs: < 2.2e−16; −0.319; −0.375
  Hypo−Hypo DMPs: < 2.2e−16; −0.323; −0.375

Feature: Replication_timing (WTS)
  Hyper aDMPs: < 2.2e−16; 69.059; 62.084
  Hyper Sotos DMPs: 0.0024; 68.55; 57.755
  Hyper−Hypo DMPs: < 2.2e−16; 68.507; 72.956
  Hypo aDMPs: 0.00016; 68.522; 68.78
  Hypo Sotos DMPs: < 2.2e−16; 68.366; 72.137
  Hypo−Hypo DMPs: < 2.2e−16; 68.514; 74.315

Feature: LaminB1 (NRC)
  Hyper aDMPs: < 2.2e−16; 0.997; 1.492
  Hyper Sotos DMPs: 0.32; 1.052; 1.251
  Hyper−Hypo DMPs: < 2.2e−16; 1.048; 1.494
  Hypo aDMPs: < 2.2e−16; 1.072; 0.822
  Hypo Sotos DMPs: < 2.2e−16; 1.037; 1.365
  Hypo−Hypo DMPs: 4.5e−14; 1.051; 1.259
Fig. S2.5 Boxplots showing the distributions of scores for the continuous (epi)genomic features considered
when comparing the different genome-wide subsets of differentially methylated positions (DMPs) in ageing
and Sotos against a control (see section 3.7). The p-values (two-sided Wilcoxon’s test, before multiple testing
correction) are shown above the boxplots. The number of DMPs belonging to each subset (in green) and the
median value of the feature score (in dark red) are shown below the boxplots. NFC: ‘normalised fold change’;
NRE: ‘normalised RNA expression’; WTS: ‘wavelet-transformed signals’; NRC: ‘normalised read counts’.
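The comparison behind each boxplot pair, a two-sided Wilcoxon rank-sum test of the feature scores in a DMP subset against the control, can be sketched as below. This is a minimal illustration, not the code behind the figure: it assumes the normal approximation with tie and continuity corrections (reasonable for samples as large as those listed), and the function name and toy scores are hypothetical.

```python
from math import erfc, sqrt
from statistics import median


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (U, p)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n, n1, n2 = len(pooled), len(x), len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        size = j - i
        tie_term += size ** 3 - size
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    # Continuity-corrected z statistic, floored at zero.
    z = max((abs(u - mean_u) - 0.5) / sqrt(var_u), 0.0)
    return u, min(1.0, erfc(z / sqrt(2)))


# Toy feature scores (hypothetical), e.g. normalised fold changes.
in_subset = [0.9, 1.3, 0.6, 1.1, 0.8]
control = [-0.4, -0.3, -0.5, -0.2, -0.4, -0.1]
u_stat, p_value = wilcoxon_rank_sum(in_subset, control)
print(median(control), median(in_subset), u_stat, p_value)
```

The medians printed alongside the p-value mirror the two quantities annotated below and above each boxplot pair.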
[Fig. S2.6 heatmap. Columns: Horvath's clock CpGs (353 CpG sites). Rows (features): H3K9me3_ENCFF713QZB, H3K9me3_ENCFF319EBK, H3K9me3_ENCFF033IPJ, H3K9me3_ENCFF171WZC, RNF2_ENCFF071CIY, RNF2_ENCFF847TGB, RNF2_ENCFF320VKN, RNF2_ENCFF857HEZ, H3K27me3_ENCFF412KUE, H3K27me3_ENCFF150RIG, H3K27me3_ENCFF265VZG, EZH2_ENCFF516PTT, H3K4me1_ENCFF457WMB, H3K36me3_ENCFF643USH, H3K36me3_ENCFF249WVX, H3K4me3_ENCFF573QMJ, H3K9ac_ENCFF455IGC, H3K4me3_ENCFF796FFT, H3K4me3_ENCFF303YKC, H3K27ac_ENCFF759GIZ, H3K27ac_ENCFF737OJY, H3K4me1_ENCFF100NYH, H3K9ac_ENCFF211ORP. Colour scale: Z-score (in PBMC), −4 to 4. Annotations: Cell type (B cell, K562, PBMC); Sotos DMPs (Hypomethylated); aDMPs (Hypermethylated, Hypomethylated); Weight in model (−1 to 1); ChrHMM state (in K562): Active TSS, Promoter, Transcribed, Weakly transcribed, Transcribed/regulatory, Active enhancer, Weak enhancer, DNase, Heterochromatin, Poised promoter, Bivalent promoter, Repressed polycomb, Quiescent/low; RNA (in PBMC), −2 to 2; In gene body (Yes, No).]
Fig. S2.6 Heatmap displaying the scores for the different continuous (epi)genomic features (rows) for each of
the 353 CpGs in Horvath’s epigenetic clock (columns). The names of the features include the ENCODE ID (see
Fig. S2.11). Hierarchical clustering was performed in both rows and columns. RNA refers to the ‘normalised
RNA expression’ (NRE). aDMPs: differentially methylated positions during ageing. PBMC: peripheral blood
mononuclear cells.
[Figure S2.7: image not reproduced. Panels: All Horvath, Hyper aDMPs, Hypo aDMPs, Hypo Sotos DMPs. Y-axis: Odds ratio (log scale, 0.01 to 10). Colour scale: −log10(P-value). X-axis categories: Active Enhancer 1, Active Enhancer 2, Active Enhancer Flank, Active TSS, Bivalent promoter, CGI, Gene_body, Heterochromatin, Poised promoter, Primary DNase, Primary H3K27ac possible Enhancer, Promoter Downstream TSS 1, Promoter Downstream TSS 2, Promoter Upstream TSS, Quiescent/low, Repressed polycomb, Shelf, Shore, Strong transcription, Transcribed − 3' preferential, Transcribed − 5' preferential, Transcribed & regulatory (Prom/Enh), Transcribed 3' preferential and Enh, Transcribed 5' preferential and Enh, Transcribed and Weak Enhancer, Weak Enhancer 1, Weak Enhancer 2, Weak transcription, ZNF genes & repeats.]
Fig. S2.7 As in Fig. S2.4, but focused on the 353 CpG sites of Horvath’s epigenetic clock.
[Figure S2.8: image not reproduced. One panel per feature, each comparing 'Control' against 'In subset' CpG sites for four subsets. Y-axis units: NFC (ChIP-seq features), NRE (RNA), WTS (Replication_timing), NRC (LaminB1). Number of sites (Control, In subset): All Horvath 21015, 353; Hyper aDMPs 21285, 83; Hypo aDMPs 21306, 62; Hypo Sotos DMPs 21339, 29. The values annotated on each panel are tabulated below.]
Feature | Subset | P-value | Control | In subset
H3K27ac | All Horvath | 0.00048 | −0.356 | −0.474
H3K27ac | Hyper aDMPs | 4.8e−06 | −0.357 | −0.656
H3K27ac | Hypo aDMPs | 0.47 | −0.358 | −0.408
H3K27ac | Hypo Sotos DMPs | 0.42 | −0.358 | −0.401
H3K4me3 | All Horvath | 0.00024 | −0.218 | −0.359
H3K4me3 | Hyper aDMPs | 0.0017 | −0.219 | −0.392
H3K4me3 | Hypo aDMPs | 0.003 | −0.22 | −0.437
H3K4me3 | Hypo Sotos DMPs | 0.079 | −0.22 | −0.426
H3K36me3 | All Horvath | 0.059 | −0.17 | −0.131
H3K36me3 | Hyper aDMPs | 0.00058 | −0.17 | −0.039
H3K36me3 | Hypo aDMPs | 0.074 | −0.169 | −0.237
H3K36me3 | Hypo Sotos DMPs | 0.57 | −0.169 | −0.208
H3K27me3 | All Horvath | 0.0011 | −0.419 | −0.298
H3K27me3 | Hyper aDMPs | 3.8e−11 | −0.419 | 0.665
H3K27me3 | Hypo aDMPs | 0.15 | −0.417 | −0.42
H3K27me3 | Hypo Sotos DMPs | 0.43 | −0.418 | −0.298
H3K9ac | All Horvath | 1e−04 | −0.248 | −0.477
H3K9ac | Hyper aDMPs | 3.3e−07 | −0.25 | −0.66
H3K9ac | Hypo aDMPs | 0.21 | −0.251 | −0.395
H3K9ac | Hypo Sotos DMPs | 0.46 | −0.251 | −0.307
H3K4me1 | All Horvath | 0.64 | −0.035 | −0.035
H3K4me1 | Hyper aDMPs | 0.93 | −0.035 | −0.14
H3K4me1 | Hypo aDMPs | 0.28 | −0.035 | −0.02
H3K4me1 | Hypo Sotos DMPs | 0.032 | −0.035 | 0.264
H3K9me3 | All Horvath | 0.011 | −0.135 | −0.088
H3K9me3 | Hyper aDMPs | 2.4e−06 | −0.135 | 0.223
H3K9me3 | Hypo aDMPs | 0.09 | −0.134 | −0.257
H3K9me3 | Hypo Sotos DMPs | 0.41 | −0.134 | −0.196
RNF2 | All Horvath | 0.54 | −0.072 | −0.127
RNF2 | Hyper aDMPs | 0.54 | −0.073 | −0.129
RNF2 | Hypo aDMPs | 0.93 | −0.073 | −0.135
RNF2 | Hypo Sotos DMPs | 0.3 | −0.073 | 0.027
EZH2 | All Horvath | 0.00014 | −0.291 | −0.187
EZH2 | Hyper aDMPs | 2e−05 | −0.29 | −0.008
EZH2 | Hypo aDMPs | 0.52 | −0.29 | −0.321
EZH2 | Hypo Sotos DMPs | 0.16 | −0.29 | −0.197
RNA | All Horvath | 8.1e−05 | −0.358 | −0.385
RNA | Hyper aDMPs | 1.6e−06 | −0.358 | −0.42
RNA | Hypo aDMPs | 0.12 | −0.358 | −0.386
RNA | Hypo Sotos DMPs | 0.011 | −0.358 | −0.4
Replication_timing | All Horvath | 0.24 | 71.313 | 71.823
Replication_timing | Hyper aDMPs | 0.51 | 71.313 | 72.587
Replication_timing | Hypo aDMPs | 0.59 | 71.324 | 68.928
Replication_timing | Hypo Sotos DMPs | 0.36 | 71.313 | 74.865
LaminB1 | All Horvath | 0.41 | 1.245 | 1.314
LaminB1 | Hyper aDMPs | 0.054 | 1.245 | 1.432
LaminB1 | Hypo aDMPs | 0.7 | 1.246 | 1.448
LaminB1 | Hypo Sotos DMPs | 0.43 | 1.246 | 1.633
Fig. S2.8 As in Fig. S2.5, but focused on the 353 CpG sites of Horvath’s epigenetic clock.
[Figure S2.9: scatterplot images not reproduced. X-axis: Chronological age (years), 0 to 40. Y-axes recovered: Genome-wide Shannon entropy acceleration (panel a, −0.06 to 0.06); Shannon entropy acceleration for the clock sites (panel b, −0.06 to 0.06); Shannon entropy for the clock sites (0.3 to 0.6). Legend: Disease status (Control, Sotos). Control: N=1128; Sotos: N=20.]
Fig. S2.9 Methylation Shannon entropy acceleration. a. Scatterplot showing the relationship between the
genome-wide Shannon entropy acceleration (gSEA) and chronological age of the samples for Sotos (orange)
and healthy controls (grey). Each sample is represented by one point. The yellow line represents the linear
model gSEA ∼ Age, with the standard error shown in the light yellow shade. b. As in a., but using the Shannon
entropy acceleration calculated only for the 353 CpG sites in Horvath’s epigenetic clock (cSEA).
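As an illustration of the quantity plotted in Fig. S2.9, the following is a minimal sketch of a methylation Shannon entropy calculation. It assumes the commonly used normalised definition, Entropy = 1/(N·log2(1/2)) · Σ[β·log2(β) + (1−β)·log2(1−β)], over the N beta-values of a sample; the exact definition and the derivation of the 'acceleration' used in this thesis are those given in the methods, not this sketch. The function name and the clipping constant are hypothetical.

```python
import math


def methylation_shannon_entropy(betas):
    """Normalised Shannon entropy of one sample's beta-values (each in [0, 1]).

    Assumed formula (not taken from this appendix):
        1 / (N * log2(1/2)) * sum(b*log2(b) + (1-b)*log2(1-b))
    The result is 0 when every site is fully (un)methylated and 1 when
    every site is exactly 50% methylated.
    """
    eps = 1e-12  # hypothetical clipping constant, avoids log2(0)
    total = 0.0
    for b in betas:
        b = min(max(b, eps), 1.0 - eps)
        total += b * math.log2(b) + (1.0 - b) * math.log2(1.0 - b)
    # log2(1/2) = -1, so this division flips the sign and normalises by N
    return total / (len(betas) * math.log2(0.5))


# Toy beta-values (hypothetical, for illustration only)
print(methylation_shannon_entropy([0.5, 0.5]))  # 1.0
```

Passing only the beta-values of the 353 clock sites would give the clock-site entropy; passing all array probes would give the genome-wide version.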
[Figure S2.10: scatterplot image not reproduced. X-axis: Chronological age (years), 0 to 40. Y-axis: Shannon entropy for the clock sites, 0.3 to 0.6. Legend: Batch (Europe, Feb_2016, GSE104812, GSE111629, GSE40279, GSE41273, GSE42861, GSE51032, GSE55491, GSE59065, GSE61496, GSE74432, GSE81961, GSE97362). Control: N=1128; Sotos: N=20.]
Fig. S2.10 Scatterplot showing the effects of the different batches on the methylation Shannon entropy
calculations for the 353 sites of Horvath’s epigenetic clock. Each sample is represented by one point and
coloured according to the batch that it belongs to.
File ID | Feature type | Data type | Tissue | Age (years) | Sex | Source
ENCFF516PTT | EZH2 | fold change over control | B cell | 27 | Female | ENCODE
ENCFF071CIY | RNF2 | fold change over control | K562 | NA | NA | ENCODE
ENCFF857HEZ | RNF2 | fold change over control | K562 | NA | NA | ENCODE
ENCFF320VKN | RNF2 | fold change over control | K562 | NA | NA | ENCODE
ENCFF847TGB | RNF2 | fold change over control | K562 | NA | NA | ENCODE
ENCFF737OJY | H3K27ac | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF303YKC | H3K4me3 | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF643USH | H3K36me3 | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF249WVX | H3K36me3 | fold change over control | PBMC | 28 | Male | ENCODE
ENCFF759GIZ | H3K27ac | fold change over control | PBMC | 28 | Female | ENCODE
ENCFF412KUE | H3K27me3 | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF455IGC | H3K9ac | fold change over control | PBMC | 28 | Male | ENCODE
ENCFF457WMB | H3K4me1 | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF211ORP | H3K9ac | fold change over control | PBMC | 27 | Male | ENCODE
ENCFF171WZC | H3K9me3 | fold change over control | PBMC | 27 | Male | ENCODE
ENCFF573QMJ | H3K4me3 | fold change over control | PBMC | 27 | Male | ENCODE
ENCFF150RIG | H3K27me3 | fold change over control | PBMC | 28 | Female | ENCODE
ENCFF033IPJ | H3K9me3 | fold change over control | PBMC | 28 | Female | ENCODE
ENCFF796FFT | H3K4me3 | fold change over control | PBMC | 28 | Female | ENCODE
ENCFF100NYH | H3K4me1 | fold change over control | PBMC | 27 | Male | ENCODE
ENCFF713QZB | H3K9me3 | fold change over control | PBMC | 32 | Male | ENCODE
ENCFF265VZG | H3K27me3 | fold change over control | PBMC | 28 | Male | ENCODE
ENCFF319EBK | H3K9me3 | fold change over control | PBMC | 28 | Male | ENCODE
ENCFF754LBN | RNA-seq | minus strand signal of unique reads | PBMC | 52 | Female | ENCODE
ENCFF398HDS | RNA-seq | plus strand signal of unique reads | PBMC | 52 | Female | ENCODE
GSM923447 | Replication timing | Wavelet-transformed signals | IMR90 | NA | Female | GEO
GSM1289416 | LaminB1 | Normalised read counts | IMR90 | NA | NA | GEO
Fig. S2.11 Information (including the source) about the continuous (epi)genomic features (ChIP-seq and
RNA-seq data) that were included in my analysis to annotate the different sets of CpG sites. All the data were
mapped to the hg19 assembly of the human genome. PBMC: peripheral blood mononuclear cells.
S.3 Supplementary for chapter 4
[Figure S3.1: scatterplot image not reproduced. X-axis: Median fragment length (in bp), log scale, 10^2 to 10^4. Y-axis: Total number of fragments, log scale, 10^5 to 10^7. Point labels: AanI, Acc36I, AccB7I, AclWI, AcsI, AcuI, AflII, AflIII, AgeI, AgsI, AhdI, AhlI, AjnI, AjuI, AluBI, Alw21I, AlwNI, Ama87I, AseI, Asp700I, AspA2I, AsuHPI, AsuII, AxyI, BaeGI, BalI, BamHI, BanII, BauI, BbsI, BbvI, BccI, BciT130I, BciVI, BclI, BfaI, BfmI, BfuCI, BglII, BlpI, BmcAI, BmrI, BmtI, BplI, BpmI, Bpu10I, BpuEI, BsaJI, BsaWI, BsaXI, Bse118I, Bse1I, Bse3DI, BseGI, BseMII, BseRI, BsgI, BshFI, BsiSI, BsmI, Bsp1286I, Bsp1407I, Bsp19I, BspCNI, BspHI, BspMAI, BspQI, BssT1I, Bst6I, BstDEI, BstDSI, BstEII, BstENI, BstKTI, BstNSI, BstX2I, BstXI, BtsI, BtsIMutI, CsiI, Csp6I, CspCI, CviAII, DraI, Eco147I, Eco32I, EcoO109I, EcoT22I, FaeI, FalI, FatI, FauNDI, FokI, HindIII, Hpy188I, HpyCH4V, KpnI, MaeIII, MboII, MfeI, MluCI, MlyI, MnlI, MseI, MslI, MssI, NmeAIII, PacI, PaeI, PaeR7I, PasI, PciI, PflFI, PpuMI, Psp124BI, PvuII, SbfI, SmiI, SmlI, SspI, TaqI, TatI, Tru9I, TscAI, Tsp45I, TspDTI, XbaI, XcmI.]
Fig. S3.1 Scatterplot summarising the fragment length distributions for the same isoschizomer families
portrayed in Fig. 4.2a. The red dots represent the actual values of median fragment length and total number
of fragments for each family. The black lines link each name label to the corresponding red point for
visualisation purposes.
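The two summary statistics plotted in Fig. S3.1 can be illustrated with a toy in silico digestion. The sketch below is not the pipeline used in this thesis: it scans a single strand of a short hypothetical sequence for a recognition site, cuts at a fixed offset inside the site, and reports the number of fragments and their median length. The function names `digest` and `summarise` and the example sequence are assumptions made for illustration; the MspI site (C^CGG) is used as the example enzyme.

```python
from statistics import median


def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting `sequence` at every `site`.

    `cut_offset` is the number of bases into the site at which the cut
    falls (1 for MspI, which cuts C^CGG). Only the given strand is scanned,
    which is sufficient for palindromic sites.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)  # allows overlapping sites
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def summarise(fragments):
    """Total number of fragments and median fragment length."""
    return {"n_fragments": len(fragments), "median_length": median(fragments)}


# Toy sequence (hypothetical) with two MspI sites
fragments = digest("AACCGGTTTTCCGGA", "CCGG", 1)
print(fragments)             # [3, 8, 4]
print(summarise(fragments))  # {'n_fragments': 3, 'median_length': 4}
```

Applied chromosome by chromosome to a reference genome, the same two numbers per enzyme would give one point in a plot of this kind.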
[Figure S3.2: scatterplot matrix image not reproduced. Features on the diagonal, in order: (1) Mean GC content (%); (2) Mean CpG content (%); (3) % of sites in protein-coding genes; (4) % of sites in exons; (5) % of sites in introns; (6) % of sites in non-coding RNA genes; (7) % of intragenic sites; (8) % of intergenic sites; (9) % of sites in CGI; (10) % of sites in shores; (11) % of sites in shelves; (12) % of sites in CGI-containing promoters; (13) % of sites in non CGI-containing promoters. Pearson correlation coefficients printed above the diagonal, row by row:]
1 vs 2 to 13: 0.745, 0.754, 0.71, 0.533, −0.813, 0.793, −0.793, 0.594, 0.867, 0.928, 0.683, 0.908
2 vs 3 to 13: 0.743, 0.961, 0.356, −0.653, 0.789, −0.79, 0.943, 0.951, 0.853, 0.958, 0.846
3 vs 4 to 13: 0.676, 0.879, −0.66, 0.996, −0.996, 0.652, 0.822, 0.893, 0.723, 0.846
4 vs 5 to 13: 0.242, −0.59, 0.72, −0.721, 0.896, 0.904, 0.783, 0.907, 0.806
5 vs 6 to 13: −0.487, 0.844, −0.844, 0.278, 0.497, 0.668, 0.364, 0.591
6 vs 7 to 13: −0.706, 0.705, −0.581, −0.775, −0.841, −0.673, −0.748
7 vs 8 to 13: −1, 0.701, 0.865, 0.924, 0.774, 0.875
8 vs 9 to 13: −0.702, −0.865, −0.924, −0.774, −0.875
9 vs 10 to 13: 0.829, 0.721, 0.986, 0.657
10 vs 11 to 13: 0.951, 0.892, 0.952
11 vs 12 to 13: 0.801, 0.961
12 vs 13: 0.743
Fig. S3.2 Matrix of scatterplots showing the percentages of cleavage sites from different restriction enzymes
that overlap with several genomic features (listed on the diagonal) in the human genome (hg38). The red dot in
each scatterplot represents the values for MspI. The numbers above the diagonal are the Pearson correlation
coefficients between all the possible pairs of genomic features.
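The pairwise coefficients reported above the diagonal can be reproduced with a few lines of code. The following is a minimal sketch, assuming that each genomic feature is stored as a list of percentages with one value per restriction enzyme; the function names and the toy values are hypothetical and are not taken from the thesis data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def pairwise_pearson(features):
    """Map every pair of feature names to its Pearson coefficient.

    `features` is a dict: feature name -> list of percentages,
    one entry per restriction enzyme (same enzyme order in every list).
    """
    return {
        (a, b): pearson(features[a], features[b])
        for a, b in combinations(features, 2)
    }


# Toy percentages for four hypothetical enzymes (illustrative only).
toy_features = {
    "CGI-containing promoters": [30.0, 35.0, 40.0, 45.0],
    "non CGI-containing promoters": [2.0, 1.6, 1.2, 0.8],
}
toy_matrix = pairwise_pearson(toy_features)
```

Each key of the returned dictionary corresponds to one cell above the diagonal of the scatterplot matrix.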
First author(s) | Title | Date | Single enzymes checked | Double enzymes checked | Size ranges interrogated | Genomic regions targeted | Organism(s) | Read lengths tested | For sequencing | Code available
Cedar H | Direct detection of methylated cytosine in DNA by use of the restriction enzyme MspI | 1979 | YES | NO | NA | NA | Neurospora crassa, herpes virus, fly, bovine | NA | N | N
Yu L | A NotI–EcoRV promoter library for studies of genetic and epigenetic alterations in mouse models of human malignancies | 2004 | YES | YES | NA | CpG islands, protein-coding genes | Human (hg16), mouse (mm4) | NA | Y | N
Wang J and Xia Y | Double restriction-enzyme digestion improves the coverage and accuracy of genome-wide CpG methylation profiling by reduced representation bisulfite sequencing | 2013 | YES | YES | 2 | Increase CpG coverage genome-wide | Human (hg18), mouse (mm9) | 50 bp PE, 90 bp PE | Y | N
Bystrykh L | A combinatorial approach to the restriction of a mouse genome | 2013 | YES | YES | NA | NA | Mouse (mm10) | NA | N | N
Martinez-Arguelles DB | In silico analysis identifies novel restriction enzyme combinations that expand reduced representation bisulfite sequencing CpG coverage | 2014 | YES | YES | 1 | Increase CpG coverage genome-wide | Human (hg38), mouse (mm10), rat (NCBI build 4.2) | 50 bp PE | Y | N
Lee YK and Jin S | Improved reduced representation bisulfite sequencing for epigenomic profiling of clinical samples | 2014 | YES | YES | 1 | Increase CpG coverage genome-wide | Human (hg19) | 36 bp PE | Y | N
Kirschner SA | Focussing reduced representation CpG sequencing through judicious restriction enzyme choice | 2016 | YES | YES | 2 | Increase CpG coverage genome-wide | Mouse (mm10) | NA | Y | N
Tanas AS | Rapid and affordable genome-wide bisulfite DNA sequencing by XmaI-reduced representation bisulfite sequencing | 2017 | YES | NO | 1 | CpG islands | Human (hg19) | NA | Y | N
Martin-Herranz DE and Stubbs TM | cuRRBS | 2017 | YES | YES | Defined by the user | Defined by the user | Defined by the user | Defined by the user | Y | Y
Fig. S3.3 Table showing the comparison of different studies that have attempted to use restriction enzymes to
target different regions in the genome.
[Figure S3.4, panel a: flowchart of cuRRBS.]
INPUT: annotation for sites of interest; pre-computed in silico digestions; annotation for restriction enzymes; restriction enzymes to check.
For each enzyme or enzyme combination: obtain the fragment size distribution and the location of the sites of interest in the digested fragments; calculate the Score, NF/1000 and EV variables for different size ranges; find the optimal size range which minimizes the EV. Filtering: Score > C_Score ⋅ max_Score and NF/1000 ≤ C_NF/1000 ⋅ ref_NF/1000.
Rank the enzymes or enzyme combinations by EV and calculate their CRF and robustness.
OUTPUT: CSV file containing information about the optimal enzymes and size ranges to use in the new cuRRBS protocol.
[Figure S3.4, panel b: violin plots titled "Trade-off between NF/1000 and Score". x-axis: individual enzymes, 2-enzyme combinations, all. y-axis: Pearson's correlation coefficient (0.00 to 1.00).]
[Figure S3.4, panel c: density plot. x-axis: Robustness (R) (0.4 to 1.0). y-axis: Density (0 to 20). Legend: R > Q3 = 0.9834; Q1 <= R <= Q3; R < Q1 = 0.9580.]
Fig. S3.4 Additional insights into cuRRBS. a. Detailed flowchart showing the input, main steps in cuRRBS and
the output of the software. b. Violin plots showing the distribution of Pearson’s correlation coefficients between
the number of fragments (NF) and the Score for all the different enzymes tested with cuRRBS (single-enzyme,
double-enzyme, all). In this example we used the Horvath epigenetic clock system [Horvath, 2013a], checking
all the size ranges between 20 and 1000 bp, with an experimental error of 10 bp and a read length of 75
bp. Each yellow point represents the median for the Pearson’s correlation coefficients under consideration. c.
Density plot showing the distribution of the robustness (R) values when assuming an experimental error (δ) of
20 bp. cuRRBS was run for all the biological systems under study (Fig. S3.5) [Domcke et al., 2015; Hanna
et al., 2016; Horvath, 2013a; Kawakatsu et al., 2016; Lev Maor et al., 2015; Maurano et al., 2015; Milagre et al.,
2017] with the same parameters as described in ‘Running cuRRBS for different in silico systems’ in section 4.7
(all the hits that satisfied the thresholds were reported in this case). The dashed blue line represents the median
(0.9734). The different colours provide a way to judge the robustness values: bad (in red, R < Q1 = 0.9580),
medium (in orange, Q1 ≤ R ≤ Q3 = 0.9834) and good (in green, R > Q3); where Q1 and Q3 represent the first
and the third quartiles respectively.
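The filtering and ranking steps of the flowchart in panel a, together with the colour coding of the robustness values in panel c, can be sketched as follows. This is an illustrative sketch and not the cuRRBS source code: the `Candidate` record, the function names and the default value of `c_nf` are hypothetical, the Score, NF/1000 and EV values are assumed to have been computed beforehand, and the ascending sort assumes that lower EV values rank first (the flowchart states that the optimal size range minimizes the EV).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One enzyme (or enzyme combination) at its optimal size range."""
    enzymes: str
    size_range: str
    score: float
    nf_per_1000: float
    ev: float


def filter_and_rank(candidates, max_score, ref_nf_per_1000,
                    c_score=0.25, c_nf=0.2):
    """Apply the two flowchart filters and rank the survivors by EV.

    Filters: Score > c_score * max_score
             NF/1000 <= c_nf * ref_nf_per_1000
    c_score=0.25 mirrors the 25% Score threshold quoted in the captions;
    c_nf=0.2 is a placeholder value chosen for this example only.
    """
    kept = [
        c for c in candidates
        if c.score > c_score * max_score
        and c.nf_per_1000 <= c_nf * ref_nf_per_1000
    ]
    # Assumption: lower EV is better, so the best hit comes first.
    return sorted(kept, key=lambda c: c.ev)


def robustness_label(r, q1=0.9580, q3=0.9834):
    """Classify a robustness value with the quartiles reported in panel c."""
    if r < q1:
        return "bad"
    if r > q3:
        return "good"
    return "medium"


# Toy candidates (illustrative numbers, not cuRRBS output).
toy_candidates = [
    Candidate("A", "60_160", score=100.0, nf_per_1000=400.0, ev=3.6),
    Candidate("B", "80_500", score=20.0, nf_per_1000=300.0, ev=2.0),
    Candidate("C", "60_540", score=90.0, nf_per_1000=900.0, ev=1.5),
    Candidate("D", "40_360", score=80.0, nf_per_1000=300.0, ev=2.7),
]
toy_ranked = filter_and_rank(toy_candidates, max_score=200.0,
                             ref_nf_per_1000=4000.0)
```

In this toy run, candidate B fails the Score filter and candidate C fails the NF/1000 filter, so only D and A are ranked.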
Species | System | PMID where applicable | Additional information about the system | Total number of sites targeted | Optimal restriction enzyme combination | Optimal theoretical size range (in bp) | % max Score | NF/1000 | Enrichment Value (EV) | Cost Reduction Factor (CRF) | Robustness (R)
Homo sapiens | Exon-intron boundaries | | DNA methylation has been shown to affect alternative splicing. Therefore, we focused on targeting CpGs close to canonical splicing sites. | 26211 | (BsiSI OR MspI) AND (SbfI OR SdaI OR Sse8387I) | 80_500 | 25.4 | 772.23 | 2.06446811 | 53.32 | 0.94704403
Homo sapiens | Horvath epigenetic clock | 24138928 | The Horvath epigenetic clock is the best predictor of biological age available in humans. We have attempted to target the 353 CpG sites that are used in the model in order to reduce the cost associated with the assay. | 353 | (BsiSI OR MspI) AND (BspQI OR LguI OR SapI) | 60_160 | 27.57 | 442.456 | 3.65771916 | 93.06 | 0.91305072
Homo sapiens | Imprinted loci | 26769960 | Genomic imprinting is an epigenetic phenomenon that results in gene expression occurring in a parent-of-origin fashion. We have attempted to target Cs in CpG context that are found within the canonical human imprints. | 2810 | (BmeT110I OR BsoBI) AND (BsaWI) | 60_540 | 25.12 | 336.88 | 2.67867053 | 122.23 | 0.98085689
Homo sapiens | Placental imprinted loci | 26769960 | Genomic imprinting is an epigenetic phenomenon that results in gene expression occurring in a parent-of-origin fashion. However, until recently many extraembryonic imprints were still unknown. We have targeted Cs in CpG context that are found within these novel human placental imprints. | 7591 | (BsaWI) AND (BssAI) | 60_540 | 26.41 | 107.248 | 1.72827483 | 383.94 | 0.93382453
Homo sapiens | CTCF sites | 26257180 | CTCF is an important architectural protein that helps to organise chromatin domains. Since its binding has been shown to be dependent on DNA methylation in some of its recognition sequences, we have targeted the CpG sites within these regions of the genome. | 2000 | (BmeT110I OR BsoBI) AND (BssAI) | 40_360 | 25.5 | 314.079 | 2.78946872 | 131.1 | 0.88798165
Mus musculus | iPSCs demethylated | 28147265 | iPSC reprogramming in mouse is characterised by global changes in DNA methylation. Sites that tend to undergo demethylation faster than the genome average tend to be within ESC-Super Enhancers. We targeted the Cs in CpG context in these regions, as they are interesting for the reprogramming field. | 1449 | (BmeT110I OR BsoBI) AND (BsiSI OR MspI) | 80_980 | 25.19 | 974.05 | 3.42628839 | 37.31 | 0.96792238
Mus musculus | iPSCs maintained | 28147265 | iPSC reprogramming in mouse is characterised by global changes in DNA methylation. Sites that tend to be resistant to the genome-wide demethylation tend to be within Intracisternal A-particle containing regions. We targeted the Cs in CpG context in these regions, as they are interesting for the reprogramming field. | 3896 | (BmeT110I OR BsoBI) AND (BsiSI OR MspI) | 80_560 | 25.85 | 690.088 | 2.835875 | 52.66 | 0.94227711
Mus musculus | NRF1 sites | 26675734 | NRF1 is a transcription factor whose binding to the DNA is dependent on the methylation status of its recognition sequences. We have tried to enrich for those CpG sites that overlap with in vivo NRF1 binding sites. | 17018 | (BmeT110I OR BsoBI) AND (PaeI OR SphI) | 20_760 | 25.04 | 445.36 | 2.01909776 | 81.6 | 0.99634045
Arabidopsis thaliana | CHG sites | 27419873 | Non-CpG methylation is an important epigenetic modification in plants. In this study a huge number of regions containing non-CpG methylation were found to vary between different Arabidopsis accessions in the 1001 Epigenomes Project. We targeted Cs in non-CpG context within these non-CpG DMRs. | 21801 | (AanI OR PsiI) AND (Csp6I OR CviQI) | 100_520 | 25.05 | 165.313 | 1.48095531 | 9.65 | 0.94999336
Fig. S3.5 Table showing the information regarding the different biological systems [Domcke et al., 2015; Hanna
et al., 2016; Horvath, 2013a; Kawakatsu et al., 2016; Lev Maor et al., 2015; Maurano et al., 2015; Milagre
et al., 2017] for which cuRRBS was run in silico. Some variables from the top hits in cuRRBS output are also
reported.
[Figure S3.6, panel a: barplots. x-axis: Depth of coverage threshold (0 to 20). y-axis: Difference in the number of sites (%) (−0.6 to 0.6). Legend: FN, FP, TN, TP.]
[Figure S3.6, panel b: x-axis: Depth of coverage threshold (5 to 20). y-axis: Percentage (%) (0 to 100). Legend: Sensitivity and Specificity, each for the size ranges 90−185 bp and 110−200 bp.]
Fig. S3.6 Effect of experimental errors during size selection in cuRRBS predictions. a. Barplots showing the
difference in the number of true positives (TP, in green), true negatives (TN, in blue), false positives (FP, in
red) and false negatives (FN, in yellow) derived from cuRRBS theoretical predictions for the XmaI-RRBS data
[Tanas et al., 2017] using two different size ranges: 110-200 bp (aimed size range) and 90-185 bp (real size
range). The difference observed between the two size ranges (aimed - real) is expressed as the percentage of
the total number of sites considered (i.e. all CGI-CpGs). The number of sites in each category is calculated for
different thresholds in the depth of coverage (number of reads covering a CpG site as reported by Bismark).
cuRRBS was run for XmaI with all the default parameters (with a read length of 200 bp). Legend is displayed
on the right hand side. b. Plot showing values of cuRRBS sensitivity and specificity as a function of the depth of
coverage threshold employed to filter the experimental data [Tanas et al., 2017]. The two size ranges considered
in a. (aimed: 110-200 bp; real: 90-185 bp) are used for the calculations. Legend is displayed below the plot
curves.
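The quantities plotted in this figure follow from a standard confusion matrix between the sites predicted by cuRRBS and the sites observed experimentally at a given depth of coverage threshold. The sketch below is a minimal illustration with hypothetical function names and toy inputs; it assumes that a site counts as experimentally observed when its depth of coverage is greater than or equal to the threshold, which is a choice made for this example.

```python
def confusion_counts(all_sites, predicted, depth, threshold):
    """Count TP, FP, TN and FN for one depth of coverage threshold.

    all_sites : iterable with every site considered
    predicted : sites that the in silico prediction expects to be covered
    depth     : dict site -> number of reads covering it (missing = 0)
    threshold : minimum depth for a site to count as observed (assumed >=)
    """
    predicted = set(predicted)
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for site in all_sites:
        observed = depth.get(site, 0) >= threshold
        if site in predicted:
            counts["TP" if observed else "FP"] += 1
        else:
            counts["FN" if observed else "TN"] += 1
    return counts


def sensitivity_specificity(counts):
    """Return (sensitivity, specificity) as percentages."""
    positives = counts["TP"] + counts["FN"]
    negatives = counts["TN"] + counts["FP"]
    sensitivity = 100.0 * counts["TP"] / positives if positives else float("nan")
    specificity = 100.0 * counts["TN"] / negatives if negatives else float("nan")
    return sensitivity, specificity


# Toy example with six hypothetical sites.
toy_sites = ["s1", "s2", "s3", "s4", "s5", "s6"]
toy_predicted = {"s1", "s2", "s3"}
toy_depth = {"s1": 12, "s2": 3, "s4": 8, "s5": 1}
toy_counts = confusion_counts(toy_sites, toy_predicted, toy_depth, threshold=5)
toy_sens, toy_spec = sensitivity_specificity(toy_counts)
```

Repeating the calculation for each threshold and for each size range gives one curve per size range, and subtracting the counts of the two size ranges (expressed as a percentage of all sites) gives the differences shown in panel a.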
[Figure S3.7: four panels, each with y-axis "Mean time (s)". x-axes: a. Number of enzymes (0 to 125); b. Experimental error (bp) (10 to 50); c. Number of sites of interest (0 to 100000); d. Genome size (GB of pre-computed files) (0 to 4).]
Fig. S3.7 cuRRBS computational efficiency. a. Plot showing the dependency between the number of enzymes
checked and the computational (real) time required by the software (mean between 3 independent runs).
cuRRBS was run for the Horvath epigenetic clock system [Horvath, 2013a] with a read length of 75 bp, a Score
threshold of 25% and an experimental error of 10 bp. A laptop with an Intel® Core™ i7-6600U CPU was
used, which allowed cuRRBS to employ 4 parallel threads. The red error bars display the mean ± SD for the 3
independent runs. b. Plot showing the dependency between the experimental error (which determines how
many size ranges are sampled) and the computational (real) time required by the software (mean between 3
independent runs). cuRRBS was run for the Horvath epigenetic clock system [Horvath, 2013a] with a read
length of 75 bp, a Score threshold of 25% and a list with 40 enzymes. A laptop with an Intel® Core™ i7-6600U
CPU was used, which allowed cuRRBS to employ 4 parallel threads. The red error bars display the mean ± SD
for the 3 independent runs. c. Plot showing the dependency between the number of sites of interest and the
computational (real) time required by the software (mean between 3 independent runs). cuRRBS was run with
a read length of 75 bp, a Score threshold of 25%, an experimental error of 10 bp and a list with 40 enzymes. A
laptop with an Intel® Core™ i7-6600U CPU was used, which allowed cuRRBS to employ 4 parallel threads.
The red error bars display the mean ± SD for the 3 independent runs. d. Plot showing the dependency between
genome size (measured as the size in GB of all the pre-computed files) and the computational (real) time
required by the software (mean between 3 independent runs). cuRRBS was run with a read length of 75 bp, a
Score threshold of 25%, an experimental error of 10 bp and a list with 40 enzymes. A laptop with an Intel®
Core™ i7-6600U CPU was used, which allowed cuRRBS to employ 4 parallel threads. The red error bars
display the mean ± SD for the 3 independent runs.
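A benchmark of this kind (wall-clock time, mean ± SD over 3 independent runs) can be sketched as follows. The function names are hypothetical and the timed workload is a stand-in, not a call to cuRRBS; the sketch also assumes the sample standard deviation, since the caption does not state which estimator was used.

```python
import statistics
import time


def mean_sd(durations):
    """Return (mean, sample standard deviation) of a list of durations."""
    return statistics.mean(durations), statistics.stdev(durations)


def time_runs(func, n_runs=3):
    """Measure the wall-clock (real) time of `func` over independent runs."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; a real benchmark would launch the software instead.
toy_durations = time_runs(lambda: sum(range(100_000)), n_runs=3)
toy_mean, toy_sd = mean_sd(toy_durations)
```

The pair returned by `mean_sd` corresponds to one point and its error bar in each panel.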

References
Akalin, A. (2014). AmpliconBiSeq GitHub repository: findElbow function.
Aldinger, K. A., Plummer, J. T., and Levitt, P. (2013). Comparative DNA methylation among
females with neurodevelopmental disorders and seizures identifies TAC1 as a MeCP2
target gene. Journal of Neurodevelopmental Disorders, 5(1):15.
Alexandrov, L. B., Jones, P. H., Wedge, D. C., Sale, J. E., Campbell, P. J., Nik-Zainal, S., and
Stratton, M. R. (2015). Clock-like mutational processes in human somatic cells. Nature
Genetics, 47(12):1402–1407.
Alexandrov, L. B. and Stratton, M. R. (2014). Mutational signatures: the patterns of somatic
mutations hidden in cancer genomes. Current Opinion in Genetics & Development,
24:52–60.
Alisch, R. S., Wang, T., Chopra, P., Visootsak, J., Conneely, K. N., and Warren, S. T. (2013).
Genome-wide analysis validates aberrant methylation in fragile X syndrome is specific to
the FMR1 locus. BMC Medical Genetics, 14(1):18.
Allis, C. D. and Jenuwein, T. (2016). The molecular hallmarks of epigenetic control. Nature
Reviews Genetics, 17:487–500.
Allum, F., Shao, X., Guénard, F., Simon, M.-M., Busche, S., Caron, M., Lambourne, J.,
Lessard, J., Tandre, K., Hedman, Å. K., Kwan, T., Ge, B., The Multiple Tissue Human Expression Resource Consortium,
Ahmadi, K. R., Ainali, C., Barrett, A., Bataille, V., Bell, J. T., Buil, A., Dermitzakis, E. T.,
Dimas, A. S., Durbin, R., Glass, D., Hassanali, N., Ingle, C., Knowles, D., Krestyaninova,
M., Lindgren, C. M., Lowe, C. E., Meduri, E., di Meglio, P., Min, J. L., Montgomery,
S. B., Nestle, F. O., Nica, A. C., Nisbet, J., O’Rahilly, S., Parts, L., Potter, S., Sandling,
J., Sekowska, M., Shin, S.-Y., Small, K. S., Soranzo, N., Surdulescu, G., Travers, M. E.,
Tsaprouni, L., Tsoka, S., Wilk, A., Yang, T.-P., Zondervan, K. T., Rönnblom, L., McCarthy,
M. I., Deloukas, P., Richmond, T., Burgess, D., Spector, T. D., Tchernof, A., Marceau,
S., Lathrop, M., Vohl, M.-C., Pastinen, T., and Grundberg, E. (2015). Characterization
of functional methylomes by next-generation capture sequencing identifies novel disease-
associated variants. Nature Communications, 6:7211.
Angermueller, C., Lee, H. J., Reik, W., and Stegle, O. (2017). DeepCpG: accurate prediction
of single-cell DNA methylation states using deep learning. Genome Biology, 18(1):67.
Anisimov, V. N., Berstein, L. M., Egormin, P. A., Piskunova, T. S., Popovich, I. G.,
Zabezhinski, M. A., Tyndyk, M. L., Yurova, M. V., Kovalenko, I. G., Poroshina, T. E., and
Semenchenko, A. V. (2008). Metformin slows down aging and extends life span of female
SHR mice. Cell Cycle, 7(17):2769–2773.
Arantes-Oliveira, N., Berman, J. R., and Kenyon, C. (2003). Healthy Animals with Extreme
Longevity. Science, 302(5645):611.
Aref-Eshghi, E., Bend, E. G., Hood, R. L., Schenkel, L. C., Carere, D. A., Chakrabarti, R.,
Nagamani, S. C. S., Cheung, S. W., Campeau, P. M., Prasad, C., Siu, V. M., Brady, L.,
Tarnopolsky, M. A., Callen, D. J., Innes, A. M., White, S. M., Meschino, W. S., Shuen,
A. Y., Paré, G., Bulman, D. E., Ainsworth, P. J., Lin, H., Rodenhiser, D. I., Hennekam,
R. C., Boycott, K. M., Schwartz, C. E., and Sadikovic, B. (2018a). BAFopathies’ DNA
methylation epi-signatures demonstrate diagnostic utility and functional continuum of
Coffin–Siris and Nicolaides–Baraitser syndromes. Nature Communications, 9(1):4885.
Aref-Eshghi, E., Rodenhiser, D. I., Schenkel, L. C., Lin, H., Skinner, C., Ainsworth, P.,
Paré, G., Hood, R. L., Bulman, D. E., Kernohan, K. D., Boycott, K. M., Campeau, P. M.,
Schwartz, C., and Sadikovic, B. (2018b). Genomic DNA Methylation Signatures Enable
Concurrent Diagnosis and Clinical Genetic Variant Classification in Neurodevelopmental
Syndromes. American Journal of Human Genetics, 102(1):156–174.
Aref-Eshghi, E., Schenkel, L. C., Lin, H., Skinner, C., Ainsworth, P., Paré, G., Rodenhiser,
D., Schwartz, C., and Sadikovic, B. (2017). The defining DNA methylation signature of
Kabuki syndrome enables functional assessment of genetic variants of unknown clinical
significance. Epigenetics, 12(11):923–933.
Armstrong, V. L., Rakoczy, S., Rojanathammanee, L., and Brown-Borg, H. M. (2013).
Expression of DNA Methyltransferases Is Influenced by Growth Hormone in the Long-
Living Ames Dwarf Mouse In Vivo and In Vitro. The Journals of Gerontology: Series A,
69(8):923–933.
Aryee, M. J., Jaffe, A. E., Corrada-Bravo, H., Ladd-Acosta, C., Feinberg, A. P., Hansen,
K. D., and Irizarry, R. A. (2014). Minfi: A flexible and comprehensive Bioconductor
package for the analysis of Infinium DNA methylation microarrays. Bioinformatics,
30(10):1363–1369.
Atlasi, Y. and Stunnenberg, H. G. (2017). The interplay of epigenetic marks during stem cell
differentiation and development. Nature Reviews Genetics, 18:643–658.
Austad, S. N. and Fischer, K. E. (2016). Sex Differences in Lifespan. Cell Metabolism,
23(6):1022–1033.
Avrahami, D., Li, C., Zhang, J., Schug, J., Avrahami, R., Rao, S., Stadler, M. B., Burger, L.,
Schübeler, D., Glaser, B., and Kaestner, K. H. (2015). Aging-Dependent Demethylation of
Regulatory Elements Correlates with Chromatin State and Improved Cell Function. Cell
Metabolism, 22(4):619–632.
Ayyadevara, S., Alla, R., Thaden, J. J., and Shmookler Reis, R. J. (2008). Remarkable
longevity and stress resistance of nematode PI3K-null mutants. Aging Cell, 7(1):13–22.
Bacalini, M. G., Deelen, J., Pirazzini, C., De Cecco, M., Giuliani, C., Lanzarini, C., Ravaioli,
F., Marasco, E., Van Heemst, D., Suchiman, H. E. D., Slieker, R., Giampieri, E., Recchioni,
R., Marcheselli, F., Salvioli, S., Vitale, G., Olivieri, F., Spijkerman, A. M., Dollé,
M. E., Sedivy, J. M., Castellani, G., Franceschi, C., Slagboom, P. E., and Garagnani, P.
(2017). Systemic Age-Associated DNA Hypermethylation of ELOVL2 Gene: In Vivo
and in Vitro Evidences of a Cell Replication Process. Journals of Gerontology - Series A
Biological Sciences and Medical Sciences, 72(8):1015–1023.
Bahcall, O. G. (2018). UK Biobank — a new era in genomic medicine. Nature Reviews
Genetics, 19(12):737.
Baker, D. J., Childs, B. G., Durik, M., Wijers, M. E., Sieben, C. J., Zhong, J., Saltness, R. A.,
Jeganathan, K. B., Verzosa, G. C., Pezeshki, A., Khazaie, K., Miller, J. D., and van Deursen,
J. M. (2016). Naturally occurring p16Ink4a-positive cells shorten healthy lifespan. Nature,
530:184–189.
Baker, D. J., Wijshake, T., Tchkonia, T., LeBrasseur, N. K., Childs, B. G., van de Sluis, B.,
Kirkland, J. L., and van Deursen, J. M. (2011). Clearance of p16Ink4a-positive senescent
cells delays ageing-associated disorders. Nature, 479:232–236.
Barau, J., Teissandier, A., Zamudio, N., Roy, S., Nalesso, V., Hérault, Y., Guillou, F., and
Bourc’his, D. (2016). The DNA methyltransferase DNMT3C protects male germ cells
from transposon activity. Science, 354(6314):909–912.
Barbi, E., Lagona, F., Marsili, M., Vaupel, J. W., and Wachter, K. W. (2018). The plateau of
human mortality: Demography of longevity pioneers. Science, 360(6396):1459–1461.
Bardet, A. F., Steinmann, J., Bafna, S., Knoblich, J. A., Zeitlinger, J., and Stark, A. (2013).
Identification of transcription factor binding sites from ChIP-seq data at high resolution.
Bioinformatics, 29(21):2705–2713.
Barzilai, N., Crandall, J. P., Kritchevsky, S. B., and Espeland, M. A. (2016). Metformin as a
Tool to Target Aging. Cell Metabolism, 23(6):1060–1065.
Baubec, T., Colombo, D. F., Wirbelauer, C., Schmidt, J., Burger, L., Krebs, A. R., Akalin, A.,
and Schübeler, D. (2015). Genomic profiling of DNA methyltransferases reveals a role for
DNMT3B in genic methylation. Nature, 520(7546):243–247.
Beerman, I., Bock, C., Garrison, B. S., Smith, Z. D., Gu, H., Meissner, A., and Rossi, D. J.
(2013). Proliferation-dependent alterations of the DNA methylation landscape underlie
hematopoietic stem cell aging. Cell Stem Cell, 12(4):413–425.
Benayoun, B. A., Pollina, E. A., and Brunet, A. (2015). Epigenetic regulation of ageing:
linking environmental inputs to genomic stability. Nature Reviews Molecular Cell Biology,
16:593–610.
Berdasco, M., Ropero, S., Setien, F., Fraga, M. F., Lapunzina, P., Losson, R., Alaminos, M.,
Cheung, N.-K., Rahman, N., and Esteller, M. (2009). Epigenetic inactivation of the Sotos
overgrowth syndrome gene histone methyltransferase NSD1 in human neuroblastoma and
glioma. Proceedings of the National Academy of Sciences, 106(51):21830–21835.
Bernhart, S. H., Kretzmer, H., Holdt, L. M., Jühling, F., Ammerpohl, O., Bergmann, A. K.,
Northoff, B. H., Doose, G., Siebert, R., Stadler, P. F., and Hoffmann, S. (2016). Changes of
bivalent chromatin coincide with increased expression of developmental genes in cancer.
Scientific Reports, 6:37393.
Bernstein, B. E., Mikkelsen, T. S., Xie, X., Kamal, M., Huebert, D. J., and Cuff, J. (2006).
A bivalent chromatin structure marks key developmental genes in embryonic stem cells.
Cell, 125(2):315–326.
Bernstein, D. L., Kameswaran, V., Le Lay, J. E., Sheaffer, K. L., and Kaestner, K. H. (2015).
The BisPCR2 method for targeted bisulfite sequencing. Epigenetics & Chromatin, 8:27.
Bibikova, M., Barnes, B., Tsan, C., Ho, V., Klotzle, B., Le, J. M., Delano, D., Zhang, L.,
Schroth, G. P., Gunderson, K. L., Fan, J. B., and Shen, R. (2011). High density DNA
methylation array with single CpG site resolution. Genomics, 98(4):288–295.
Bibikova, M., Le, J., Barnes, B., Saedinia-Melnyk, S., Zhou, L., Shen, R., and
Gunderson, K. L. (2009). Genome-wide DNA methylation profiling using Infinium® assay.
Epigenomics, 1(1):177–200.
Bird, A. (2007). Perceptions of epigenetics. Nature, 447:396–398.
Bjornsson, H. T. (2015). The Mendelian disorders of the epigenetic machinery. Genome
Research, 25(10):1473–1481.
Blagosklonny, M. V. (2006). Aging and immortality: Quasi-programmed senescence and its
pharmacologic inhibition.
Blagosklonny, M. V. (2010). Revisiting the antagonistic pleiotropy theory of aging: TOR-
driven program and quasi-program. Cell Cycle, 9(16):3171–3176.
Blasco, M. A. (2007). Telomere length, stem cells and aging. Nature Chemical Biology,
3:640.
Bock, C., Walter, J., Paulsen, M., and Lengauer, T. (2007). CpG island mapping by epigenome
prediction. PLoS Comput Biol, 3(6):e110.
Bocklandt, S., Lin, W., Sehl, M. E., Sánchez, F. J., Sinsheimer, J. S., Horvath, S., and Vilain,
E. (2011). Epigenetic predictor of age. PLoS One, 6(6):e14821.
Bonkowski, M. S. and Sinclair, D. A. (2016). Slowing ageing by design: the rise of NAD+
and sirtuin-activating compounds. Nature Reviews Molecular Cell Biology, 17:679–690.
Booth, L. N. and Brunet, A. (2016). The Aging Epigenome. Molecular Cell, 62(5):728–744.
Bork, S., Pfister, S., Witt, H., Horn, P., Korn, B., Ho, A. D., and Wagner, W. (2010). DNA
methylation pattern changes upon long-term culture and aging of human mesenchymal
stromal cells. Aging Cell, 9(1):54–63.
Bourc’his, D., Xu, G.-L., Lin, C.-S., Bollman, B., and Bestor, T. H. (2001). Dnmt3L and the
Establishment of Maternal Genomic Imprints. Science, 294(5551):2536–2539.
Boyle, P., Clement, K., Gu, H., Smith, Z. D., Ziller, M., Fostel, J. L., Holmes, L., Meldrim,
J., Kelley, F., Gnirke, A., and Meissner, A. (2012). Gel-free multiplexed reduced
representation bisulfite sequencing for large-scale DNA methylation profiling. Genome Biology,
13(10):R92.
Brinkman, A. B., Simmer, F., Ma, K., Kaan, A., Zhu, J., and Stunnenberg, H. G. (2010).
Whole-genome DNA methylation profiling using MethylCap-seq. Methods, 52(3):232–236.
Bürkle, A., Moreno-Villanueva, M., Bernhard, J., Blasco, M., Zondag, G., Hoeijmakers,
J. H. J., Toussaint, O., Grubeck-Loebenstein, B., Mocchegiani, E., Collino, S., Gonos,
E. S., Sikora, E., Gradinaru, D., Dollé, M., Salmon, M., Kristensen, P., Griffiths, H. R.,
Libert, C., Grune, T., Breusing, N., Simm, A., Franceschi, C., Capri, M., Talbot, D.,
Caiafa, P., Friguet, B., Slagboom, P. E., Hervonen, A., Hurme, M., and Aspinall, R. (2015).
MARK-AGE biomarkers of ageing. Mechanisms of Ageing and Development, 151:2–12.
Butcher, D. T., Cytrynbaum, C., Turinsky, A. L., Siu, M. T., Inbar-Feigenberg, M.,
Mendoza-Londono, R., Chitayat, D., Walker, S., Machado, J., Caluseriu, O., Dupuis, L.,
Grafodatskaya, D., Reardon, W., Gilbert-Dussardier, B., Verloes, A., Bilan, F., Milunsky, J. M.,
Basran, R., Papsin, B., Stockley, T. L., Scherer, S. W., Choufani, S., Brudno, M., and
Weksberg, R. (2017). CHARGE and Kabuki Syndromes: Gene-Specific DNA Methylation
Signatures Identify Epigenetic Mechanisms Linking These Clinically Overlapping
Conditions. The American Journal of Human Genetics, 100(5):773–788.
Bystrykh, L. V. (2013). A combinatorial approach to the restriction of a mouse genome.
BMC Research Notes, 6(1):284.
Cai, L., Rothbart, S. B., Lu, R., Xu, B., Chen, W.-Y., Tripathy, A., Rockowitz, S., Zheng,
D., Patel, D. J., Allis, C. D., Strahl, B. D., Song, J., and Wang, G. G. (2013). An H3K36
Methylation-Engaging Tudor Motif of Polycomb-like Proteins Mediates PRC2 Complex
Targeting. Molecular Cell, 49(3):571–582.
Castillo-Fernandez, J. E., Spector, T. D., and Bell, J. T. (2014). Epigenetics of discordant
monozygotic twins: implications for disease. Genome Medicine, 6(7):60.
Cedar, H., Solage, A., Glaser, G., and Razin, A. (1979). Direct detection of methylated
cytosine in DNA by use of the restriction enzyme MspI. Nucleic Acids Research,
6(6):2125–2132.
Chantalat, S., Depaux, A., Héry, P., Barral, S., Thuret, J. Y., Dimitrov, S., and Gérard,
M. (2011). Histone H3 trimethylation at lysine 36 is associated with constitutive and
facultative heterochromatin. Genome Research, 21:1426–1437.
Chen, B. H., Marioni, R. E., Colicino, E., Peters, M. J., Ward-Caviness, C. K., Tsai, P. C.,
Roetker, N. S., Just, A. C., Demerath, E. W., Guan, W., Bressler, J., Fornage, M., Studenski,
S., Vandiver, A. R., Moore, A. Z., Tanaka, T., Kiel, D. P., Liang, L., Vokonas, P., Schwartz,
J., Lunetta, K. L., Murabito, J. M., Bandinelli, S., Hernandez, D. G., Melzer, D., Nalls,
M., Pilling, L. C., Price, T. R., Singleton, A. B., Gieger, C., Holle, R., Kretschmer, A.,
Kronenberg, F., Kunze, S., Linseisen, J., Meisinger, C., Rathmann, W., Waldenberger,
M., Visscher, P. M., Shah, S., Wray, N. R., McRae, A. F., Franco, O. H., Hofman, A.,
Uitterlinden, A. G., Absher, D., Assimes, T., Levine, M. E., Lu, A. T., Tsao, P. S., Hou, L.,
Manson, J. A. E., Carty, C. L., LaCroix, A. Z., Reiner, A. P., Spector, T. D., Feinberg, A. P.,
Levy, D., Baccarelli, A., van Meurs, J., Bell, J. T., Peters, A., Deary, I. J., Pankow, J. S.,
Ferrucci, L., and Horvath, S. (2016a). DNA methylation-based measures of biological age:
Meta-analysis predicting time to death. Aging, 8(9):1844–1865.
Chen, T., Tsujimoto, N., and Li, E. (2004). The PWWP Domain of Dnmt3a and Dnmt3b
Is Required for Directing DNA Methylation to the Major Satellite Repeats at Pericentric
Heterochromatin. Molecular and Cellular Biology, 24(20):9048–9058.
Chen, Y., Zhang, Y., Zhao, G., Chen, C., Yang, P., Ye, S., and Tan, X. (2016b). Difference in
Leukocyte Composition between Women before and after Menopausal Age, and Distinct
Sexual Dimorphism. PLOS ONE, 11(9):e0162953.
Chen, Y.-a., Lemire, M., Choufani, S., Butcher, D. T., Grafodatskaya, D., Zanke, B. W.,
Gallinger, S., Hudson, T. J., and Weksberg, R. (2013). Discovery of cross-reactive
probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray.
Epigenetics, 8(2):203–209.
Cheung, W. A., Shao, X., Morin, A., Siroux, V., Kwan, T., Ge, B., Aïssi, D., Chen, L.,
Vasquez, L., Allum, F., Guénard, F., Bouzigon, E., Simon, M.-M., Boulier, E., Redensek,
A., Watt, S., Datta, A., Clarke, L., Flicek, P., Mead, D., Paul, D. S., Beck, S., Bourque, G.,
Lathrop, M., Tchernof, A., Vohl, M.-C., Demenais, F., Pin, I., Downes, K., Stunnenberg,
H. G., Soranzo, N., Pastinen, T., and Grundberg, E. (2017). Functional variation in
allelic methylomes underscores a strong genetic contribution and reveals novel epigenetic
alterations in the human epigenome. Genome Biology, 18(1):50.
Chinn, I. K., Blackburn, C. C., Manley, N. R., and Sempowski, G. D. (2012). Changes in
primary lymphoid organs with aging. Seminars in Immunology, 24(5):309–320.
Choufani, S., Cytrynbaum, C., Chung, B. H. Y., Turinsky, A. L., Grafodatskaya, D., Chen,
Y. A., Cohen, A. S. A., Dupuis, L., Butcher, D. T., Siu, M. T., Luk, H. M., Lo, I. F. M.,
Lam, S. T. S., Caluseriu, O., Stavropoulos, D. J., Reardon, W., Mendoza-Londono, R.,
Brudno, M., Gibson, W. T., Chitayat, D., and Weksberg, R. (2015). NSD1 mutations
generate a genome-wide DNA methylation signature. Nature Communications, 6:10207.
Ciccarone, F., Malavolta, M., Calabrese, R., Guastafierro, T., Bacalini, M. G., Reale, A.,
Franceschi, C., Capri, M., Hervonen, A., Hurme, M., Grubeck-Loebenstein, B., Koller, B.,
Bernhardt, J., Schon, C., Slagboom, P. E., Toussaint, O., Sikora, E., Gonos, E. S., Breusing,
N., Grune, T., Jansen, E., Dollé, M., Moreno-Villanueva, M., Sindlinger, T., Bürkle, A.,
Zampieri, M., and Caiafa, P. (2016). Age-dependent expression of DNMT1 and DNMT3B
in PBMCs from a large European population enrolled in the MARK-AGE study. Aging
Cell, 15(4):755–765.
Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg,
I., Hamelryck, T., Kauff, F., Wilczynski, B., and De Hoon, M. J. L. (2009). Biopython:
Freely available Python tools for computational molecular biology and bioinformatics.
Bioinformatics, 25(11):1422–1423.
Cohen-Karni, D., Xu, D., Apone, L., Fomenkov, A., Sun, Z., Davis, P. J., Morey Kinney,
S. R., Yamada-Mabuchi, M., Xu, S.-y., Davis, T., Pradhan, S., Roberts, R. J., and Zheng,
Y. (2011). The MspJI family of modification-dependent restriction endonucleases for
epigenetic studies. Proceedings of the National Academy of Sciences,
108(27):11040–11045.
Cole, J. H., Ritchie, S. J., Bastin, M. E., Valdés Hernández, M. C., Muñoz Maniega, S., Royle,
N., Corley, J., Pattie, A., Harris, S. E., Zhang, Q., Wray, N. R., Redmond, P., Marioni,
R. E., Starr, J. M., Cox, S. R., Wardlaw, J. M., Sharp, D. J., and Deary, I. J. (2017a). Brain
age predicts mortality. Molecular Psychiatry, 23:1385–1392.
Cole, J. J., Robertson, N. A., Rather, M. I., Thomson, J. P., McBryan, T., Sproul, D., Wang,
T., Brock, C., Clark, W., Ideker, T., Meehan, R. R., Miller, R. A., Brown-Borg, H. M., and
Adams, P. D. (2017b). Diverse interventions that extend mouse lifespan suppress shared
age-associated epigenetic changes at critical gene regulatory regions. Genome Biology,
18(1):58.
Conboy, I. M., Conboy, M. J., Wagers, A. J., Girma, E. R., Weissman, I. L., and Rando,
T. A. (2005). Rejuvenation of aged progenitor cells by exposure to a young systemic
environment. Nature, 433(7027):760–764.
Consortium, I. H. G. S., Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody,
M. C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D.,
Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P.,
McKernan, K., Meldrim, J., Mesirov, J. P., Miranda, C., Morris, W., Naylor, J., Raymond,
C., Rosetti, M., Santos, R., Sheridan, A., Sougnez, C., Stange-Thomann, N., Stojanovic,
N., Subramanian, A., Wyman, D., Rogers, J., Sulston, J., Ainscough, R., Beck, S., Bentley,
D., Burton, J., Clee, C., Carter, N., Coulson, A., Deadman, R., Deloukas, P., Dunham, A.,
Dunham, I., Durbin, R., French, L., Grafham, D., Gregory, S., Hubbard, T., Humphray,
S., Hunt, A., Jones, M., Lloyd, C., McMurray, A., Matthews, L., Mercer, S., Milne, S.,
Mullikin, J. C., Mungall, A., Plumb, R., Ross, M., Shownkeen, R., Sims, S., Waterston,
R. H., Wilson, R. K., Hillier, L. W., McPherson, J. D., Marra, M. A., Mardis, E. R.,
Fulton, L. A., Chinwalla, A. T., Pepin, K. H., Gish, W. R., Chissoe, S. L., Wendl, M. C.,
Delehaunty, K. D., Miner, T. L., Delehaunty, A., Kramer, J. B., Cook, L. L., Fulton,
R. S., Johnson, D. L., Minx, P. J., Clifton, S. W., Hawkins, T., Branscomb, E., Predki,
P., Richardson, P., Wenning, S., Slezak, T., Doggett, N., Cheng, J.-F., Olsen, A., Lucas,
S., Elkin, C., Uberbacher, E., Frazier, M., Gibbs, R. A., Muzny, D. M., Scherer, S. E.,
Bouck, J. B., Sodergren, E. J., Worley, K. C., Rives, C. M., Gorrell, J. H., Metzker, M. L.,
Naylor, S. L., Kucherlapati, R. S., Nelson, D. L., Weinstock, G. M., Sakaki, Y., Fujiyama,
A., Hattori, M., Yada, T., Toyoda, A., Itoh, T., Kawagoe, C., Watanabe, H., Totoki, Y.,
Taylor, T., Weissenbach, J., Heilig, R., Saurin, W., Artiguenave, F., Brottier, P., Bruls, T.,
Pelletier, E., Robert, C., Wincker, P., Rosenthal, A., Platzer, M., Nyakatura, G., Taudien,
S., Rump, A., Smith, D. R., Doucette-Stamm, L., Rubenfield, M., Weinstock, K., Lee,
H. M., Dubois, J., Yang, H., Yu, J., Wang, J., Huang, G., Gu, J., Hood, L., Rowen, L.,
Madan, A., Qin, S., Davis, R. W., Federspiel, N. A., Abola, A. P., Proctor, M. J., Roe,
B. A., Chen, F., Pan, H., Ramser, J., Lehrach, H., Reinhardt, R., McCombie, W. R., de la
Bastide, M., Dedhia, N., Blöcker, H., Hornischer, K., Nordsiek, G., Agarwala, R., Aravind,
L., Bailey, J. A., Bateman, A., Batzoglou, S., Birney, E., Bork, P., Brown, D. G., Burge,
C. B., Cerutti, L., Chen, H.-C., Church, D., Clamp, M., Copley, R. R., Doerks, T., Eddy,
S. R., Eichler, E. E., Furey, T. S., Galagan, J., Gilbert, J. G. R., Harmon, C., Hayashizaki,
Y., Haussler, D., Hermjakob, H., Hokamp, K., Jang, W., Johnson, L. S., Jones, T. A.,
Kasif, S., Kaspryzk, A., Kennedy, S., Kent, W. J., Kitts, P., Koonin, E. V., Korf, I., Kulp,
D., Lancet, D., Lowe, T. M., McLysaght, A., Mikkelsen, T., Moran, J. V., Mulder, N.,
Pollara, V. J., Ponting, C. P., Schuler, G., Schultz, J., Slater, G., Smit, A. F. A., Stupka, E.,
Szustakowki, J., Thierry-Mieg, D., Thierry-Mieg, J., Wagner, L., Wallis, J., Wheeler, R.,
Williams, A., Wolf, Y. I., Wolfe, K. H., Yang, S.-P., Yeh, R.-F., Collins, F., Guyer, M. S.,
Peterson, J., Felsenfeld, A., Wetterstrand, K. A., Myers, R. M., Schmutz, J., Dickson, M.,
Grimwood, J., Cox, D. R., Olson, M. V., Kaul, R., Raymond, C., Shimizu, N., Kawasaki,
K., Minoshima, S., Evans, G. A., Athanasiou, M., Schultz, R., Patrinos, A., and Morgan,
M. J. (2001). Initial sequencing and analysis of the human genome. Nature, 409:860–921.
Consortium, M. G. S., Chinwalla, A. T., Cook, L. L., Delehaunty, K. D., Fewell, G. A.,
Fulton, L. A., Fulton, R. S., Graves, T. A., Hillier, L. W., Mardis, E. R., McPherson, J. D.,
Miner, T. L., Nash, W. E., Nelson, J. O., Nhan, M. N., Pepin, K. H., Pohl, C. S., Ponce,
T. C., Schultz, B., Thompson, J., Trevaskis, E., Waterston, R. H., Wendl, M. C., Wilson,
R. K., Yang, S.-P., An, P., Berry, E., Birren, B., Bloom, T., Brown, D. G., Butler, J., Daly,
M., David, R., Deri, J., Dodge, S., Foley, K., Gage, D., Gnerre, S., Holzer, T., Jaffe, D. B.,
Kamal, M., Karlsson, E. K., Kells, C., Kirby, A., Kulbokas III, E. J., Lander, E. S., Landers,
T., Leger, J. P., Levine, R., Lindblad-Toh, K., Mauceli, E., Mayer, J. H., McCarthy, M.,
Meldrim, J., Meldrim, J., Mesirov, J. P., Nicol, R., Nusbaum, C., Seaman, S., Sharpe,
T., Sheridan, A., Singer, J. B., Santos, R., Spencer, B., Stange-Thomann, N., Vinson,
J. P., Wade, C. M., Wierzbowski, J., Wyman, D., Zody, M. C., Birney, E., Goldman, N.,
Kasprzyk, A., Mongin, E., Rust, A. G., Slater, G., Stabenau, A., Ureta-Vidal, A., Whelan,
S., Ainscough, R., Attwood, J., Bailey, J., Barlow, K., Beck, S., Burton, J., Clamp, M.,
Clee, C., Coulson, A., Cuff, J., Curwen, V., Cutts, T., Davies, J., Eyras, E., Grafham, D.,
Gregory, S., Hubbard, T., Hunt, A., Jones, M., Joy, A., Leonard, S., Lloyd, C., Matthews,
L., McLaren, S., McLay, K., Meredith, B., Mullikin, J. C., Ning, Z., Oliver, K., Overton-
Larty, E., Plumb, R., Potter, S., Quail, M., Rogers, J., Scott, C., Searle, S., Shownkeen,
R., Sims, S., Wall, M., West, A. P., Willey, D., Williams, S., Abril, J. F., Guigó, R., Parra,
G., Agarwal, P., Agarwala, R., Church, D. M., Hlavina, W., Maglott, D. R., Sapojnikov,
V., Alexandersson, M., Pachter, L., Antonarakis, S. E., Dermitzakis, E. T., Reymond, A.,
Ucla, C., Baertsch, R., Diekhans, M., Furey, T. S., Hinrichs, A., Hsu, F., Karolchik, D.,
Kent, W. J., Roskin, K. M., Schwartz, M. S., Sugnet, C., Weber, R. J., Bork, P., Letunic,
I., Suyama, M., Torrents, D., Zdobnov, E. M., Botcherby, M., Brown, S. D., Campbell,
R. D., Jackson, I., Bray, N., Couronne, O., Dubchak, I., Poliakov, A., Rubin, E. M., Brent,
M. R., Flicek, P., Keibler, E., Korf, I., Batalov, S., Bult, C., Frankel, W. N., Carninci, P.,
Hayashizaki, Y., Kawai, J., Okazaki, Y., Cawley, S., Kulp, D., Wheeler, R., Chiaromonte,
F., Collins, F. S., Felsenfeld, A., Guyer, M., Peterson, J., Wetterstrand, K., Copley, R. R.,
Mott, R., Dewey, C., Dickens, N. J., Emes, R. D., Goodstadt, L., Ponting, C. P., Winter,
E., Dunn, D. M., von Niederhausern, A. C., Weiss, R. B., Eddy, S. R., Johnson, L. S.,
Jones, T. A., Elnitski, L., Kolbe, D. L., Eswara, P., Miller, W., O’Connor, M. J., Schwartz,
S., Gibbs, R. A., Muzny, D. M., Glusman, G., Smit, A., Green, E. D., Hardison, R. C.,
Yang, S., Haussler, D., Hua, A., Roe, B. A., Kucherlapati, R. S., Montgomery, K. T., Li,
J., Li, M., Lucas, S., Ma, B., McCombie, W. R., Morgan, M., Pevzner, P., Tesler, G.,
Schultz, J., Smith, D. R., Tromp, J., Worley, K. C., Lander, E. S., Abril, J. F., Agarwal,
P., Alexandersson, M., Antonarakis, S. E., Baertsch, R., Berry, E., Birney, E., Bork, P.,
Bray, N., Brent, M. R., Brown, D. G., Butler, J., Bult, C., Chiaromonte, F., Chinwalla,
A. T., Church, D. M., Clamp, M., Collins, F. S., Copley, R. R., Couronne, O., Cawley, S.,
Cuff, J., Curwen, V., Cutts, T., Daly, M., Dermitzakis, E. T., Dewey, C., Dickens, N. J.,
Diekhans, M., Dubchak, I., Eddy, S. R., Elnitski, L., Emes, R. D., Eswara, P., Eyras, E.,
Felsenfeld, A., Flicek, P., Frankel, W. N., Fulton, L. A., Furey, T. S., Gnerre, S., Glusman,
G., Goldman, N., Goodstadt, L., Green, E. D., Gregory, S., Guigó, R., Hardison, R. C.,
Haussler, D., Hillier, L. W., Hinrichs, A., Hlavina, W., Hsu, F., Hubbard, T., Jaffe, D. B.,
Kamal, M., Karolchik, D., Karlsson, E. K., Kasprzyk, A., Keibler, E., Kent, W. J., Kirby,
A., Kolbe, D. L., Korf, I., Kulbokas III, E. J., Kulp, D., Lander, E. S., Letunic, I., Li, M.,
Lindblad-Toh, K., Ma, B., Maglott, D. R., Mauceli, E., Mesirov, J. P., Miller, W., Mott,
R., Mullikin, J. C., Ning, Z., Pachter, L., Parra, G., Pevzner, P., Poliakov, A., Ponting,
C. P., Potter, S., Reymond, A., Roskin, K. M., Sapojnikov, V., Schultz, J., Schwartz, M. S.,
Schwartz, S., Searle, S., Singer, J. B., Slater, G., Smit, A., Stabenau, A., Sugnet, C.,
Suyama, M., Tesler, G., Torrents, D., Tromp, J., Ucla, C., Vinson, J. P., Wade, C. M.,
Weber, R. J., Wheeler, R., Winter, E., Yang, S.-P., Zdobnov, E. M., Waterston, R. H.,
Whelan, S., Worley, K. C., and Zody, M. C. (2002). Initial sequencing and comparative
analysis of the mouse genome. Nature, 420:520–562.
Consortium, N. R. E. M. (2013). Roadmap Epigenomics Chromatin State Model: emission parameters.
https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/imputed12marks/jointModel/final/emissions_25_imputed12marks.png.
Consortium, N. R. E. M. (2014). Roadmap Epigenomics Chromatin State Model: raw data.
https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/imputed12marks/jointModel/final/catMat/hg19_chromHMM_imputed25.gz.
Consortium, R. E., Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-
Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M. J., Amin, V., Whitaker,
J. W., Schultz, M. D., Ward, L. D., Sarkar, A., Quon, G., Sandstrom, R. S., Eaton, M. L.,
Wu, Y.-C., Pfenning, A., Wang, X., Claussnitzer, M., Liu, Y., Coarfa, C., Harris, R. A.,
Shoresh, N., Epstein, C. B., Gjoneska, E., Leung, D., Xie, W., Hawkins, R. D.,
Lister, R., Hong, C., Gascard, P., Mungall, A. J., Moore, R., Chuah, E., Tam, A., Canfield,
T. K., Hansen, R. S., Kaul, R., Sabo, P. J., Bansal, M. S., Carles, A., Dixon, J. R., Farh,
K.-H., Feizi, S., Karlic, R., Kim, A.-R., Kulkarni, A., Li, D., Lowdon, R., Elliott, G.,
Mercer, T. R., Neph, S. J., Onuchic, V., Polak, P., Rajagopal, N., Ray, P., Sallari, R. C.,
Siebenthall, K. T., Sinnott-Armstrong, N. A., Stevens, M., Thurman, R. E., Wu, J., Zhang,
B., Zhou, X., Abdennur, N., Adli, M., Akerman, M., Barrera, L., Antosiewicz-Bourget, J.,
Ballinger, T., Barnes, M. J., Bates, D., Bell, R. J. A., Bennett, D. A., Bianco, K., Bock, C.,
Boyle, P., Brinchmann, J., Caballero-Campo, P., Camahort, R., Carrasco-Alfonso, M. J.,
Charnecki, T., Chen, H., Chen, Z., Cheng, J. B., Cho, S., Chu, A., Chung, W.-Y., Cowan,
C., Athena Deng, Q., Deshpande, V., Diegel, M., Ding, B., Durham, T., Echipare, L.,
Edsall, L., Flowers, D., Genbacev-Krtolica, O., Gifford, C., Gillespie, S., Giste, E., Glass,
I. A., Gnirke, A., Gormley, M., Gu, H., Gu, J., Hafler, D. A., Hangauer, M. J., Hariharan,
M., Hatan, M., Haugen, E., He, Y., Heimfeld, S., Herlofsen, S., Hou, Z., Humbert, R.,
Issner, R., Jackson, A. R., Jia, H., Jiang, P., Johnson, A. K., Kadlecek, T., Kamoh, B.,
Kapidzic, M., Kent, J., Kim, A., Kleinewietfeld, M., Klugman, S., Krishnan, J., Kuan,
S., Kutyavin, T., Lee, A.-Y., Lee, K., Li, J., Li, N., Li, Y., Ligon, K. L., Lin, S., Lin, Y.,
Liu, J., Liu, Y., Luckey, C. J., Ma, Y. P., Maire, C., Marson, A., Mattick, J. S., Mayo, M.,
McMaster, M., Metsky, H., Mikkelsen, T., Miller, D., Miri, M., Mukame, E., Nagarajan,
R. P., Neri, F., Nery, J., Nguyen, T., O’Geen, H., Paithankar, S., Papayannopoulou, T.,
Pelizzola, M., Plettner, P., Propson, N. E., Raghuraman, S., Raney, B. J., Raubitschek,
A., Reynolds, A. P., Richards, H., Riehle, K., Rinaudo, P., Robinson, J. F., Rockweiler,
N. B., Rosen, E., Rynes, E., Schein, J., Sears, R., Sejnowski, T., Shafer, A., Shen, L.,
Shoemaker, R., Sigaroudinia, M., Slukvin, I., Stehling-Sun, S., Stewart, R., Subramanian,
S. L., Suknuntha, K., Swanson, S., Tian, S., Tilden, H., Tsai, L., Urich, M., Vaughn, I.,
Vierstra, J., Vong, S., Wagner, U., Wang, H., Wang, T., Wang, Y., Weiss, A., Whitton,
H., Wildberg, A., Witt, H., Won, K.-J., Xie, M., Xing, X., Xu, I., Xuan, Z., Ye, Z., Yen,
C.-a., Yu, P., Zhang, X., Zhang, X., Zhao, J., Zhou, Y., Zhu, J., Zhu, Y., Ziegler, S.,
Beaudet, A. E., Boyer, L. A., De Jager, P. L., Farnham, P. J., Fisher, S. J., Haussler, D.,
Jones, S. J. M., Li, W., Marra, M. A., McManus, M. T., Sunyaev, S., Thomson, J. A.,
Tlsty, T. D., Tsai, L.-H., Wang, W., Waterland, R. A., Zhang, M. Q., Chadwick, L. H.,
Bernstein, B. E., Costello, J. F., Ecker, J. R., Hirst, M., Meissner, A., Milosavljevic, A.,
Ren, B., Stamatoyannopoulos, J. A., Wang, T., Kellis, M., Kundaje, A., Meuleman, W.,
Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J.,
Ziller, M. J., Amin, V., Whitaker, J. W., Schultz, M. D., Ward, L. D., Sarkar, A., Quon,
G., Sandstrom, R. S., Eaton, M. L., Wu, Y.-C., Pfenning, A. R., Wang, X., Claussnitzer,
M., Liu, Y., Coarfa, C., Harris, R. A., Shoresh, N., Epstein, C. B., Gjoneska, E., Leung,
D., Xie, W., Hawkins, R. D., Lister, R., Hong, C., Gascard, P., Mungall, A. J., Moore, R.,
Chuah, E., Tam, A., Canfield, T. K., Hansen, R. S., Kaul, R., Sabo, P. J., Bansal, M. S.,
Carles, A., Dixon, J. R., Farh, K.-H., Feizi, S., Karlic, R., Kim, A.-R., Kulkarni, A., Li, D.,
Lowdon, R., Elliott, G., Mercer, T. R., Neph, S. J., Onuchic, V., Polak, P., Rajagopal, N.,
Ray, P., Sallari, R. C., Siebenthall, K. T., Sinnott-Armstrong, N. A., Stevens, M., Thurman,
R. E., Wu, J., Zhang, B., Zhou, X., Beaudet, A. E., Boyer, L. A., De Jager, P. L., Farnham,
P. J., Fisher, S. J., Haussler, D., Jones, S. J. M., Li, W., Marra, M. A., McManus, M. T.,
Sunyaev, S., Thomson, J. A., Tlsty, T. D., Tsai, L.-H., Wang, W., Waterland, R. A., Zhang,
M. Q., Chadwick, L. H., Bernstein, B. E., Costello, J. F., Ecker, J. R., Hirst, M., Meissner,
A., Milosavljevic, A., Ren, B., Stamatoyannopoulos, J. A., Wang, T., and Kellis, M. (2015).
Integrative analysis of 111 reference human epigenomes. Nature, 518:317–330.
Consortium, T. E. P., Dunham, I., Kundaje, A., Aldred, S. F., Collins, P. J., Davis, C. A.,
Doyle, F., Epstein, C. B., Frietze, S., Harrow, J., Kaul, R., Khatun, J., Lajoie, B. R., Landt,
S. G., Lee, B.-K., Pauli, F., Rosenbloom, K. R., Sabo, P., Safi, A., Sanyal, A., Shoresh, N.,
Simon, J. M., Song, L., Trinklein, N. D., Altshuler, R. C., Birney, E., Brown, J. B., Cheng,
C., Djebali, S., Dong, X., Dunham, I., Ernst, J., Furey, T. S., Gerstein, M., Giardine, B.,
Greven, M., Hardison, R. C., Harris, R. S., Herrero, J., Hoffman, M. M., Iyer, S., Kellis,
M., Khatun, J., Kheradpour, P., Kundaje, A., Lassmann, T., Li, Q., Lin, X., Marinov, G. K.,
Merkel, A., Mortazavi, A., Parker, S. C. J., Reddy, T. E., Rozowsky, J., Schlesinger, F.,
Thurman, R. E., Wang, J., Ward, L. D., Whitfield, T. W., Wilder, S. P., Wu, W., Xi, H. S.,
Yip, K. Y., Zhuang, J., Bernstein, B. E., Birney, E., Dunham, I., Green, E. D., Gunter,
C., Snyder, M., Pazin, M. J., Lowdon, R. F., Dillon, L. A. L., Adams, L. B., Kelly, C. J.,
Zhang, J., Wexler, J. R., Green, E. D., Good, P. J., Feingold, E. A., Bernstein, B. E., Birney,
E., Crawford, G. E., Dekker, J., Elnitski, L., Farnham, P. J., Gerstein, M., Giddings, M. C.,
Gingeras, T. R., Green, E. D., Guigó, R., Hardison, R. C., Hubbard, T. J., Kellis, M., Kent,
W. J., Lieb, J. D., Margulies, E. H., Myers, R. M., Snyder, M., Stamatoyannopoulos, J. A.,
Tenenbaum, S. A., Weng, Z., White, K. P., Wold, B., Khatun, J., Yu, Y., Wrobel, J., Risk,
B. A., Gunawardena, H. P., Kuiper, H. C., Maier, C. W., Xie, L., Chen, X., Giddings,
M. C., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Kheradpour, P., Mikkelsen,
T. S., Gillespie, S., Goren, A., Ram, O., Zhang, X., Wang, L., Issner, R., Coyne, M. J.,
Durham, T., Ku, M., Truong, T., Ward, L. D., Altshuler, R. C., Eaton, M. L., Kellis, M.,
Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer,
A., Lagarde, J., Lin, W., Schlesinger, F., Xue, C., Marinov, G. K., Khatun, J., Williams,
B. A., Zaleski, C., Rozowsky, J., Röder, M., Kokocinski, F., Abdelhamid, R. F., Alioto,
T., Antoshechkin, I., Baer, M. T., Batut, P., Bell, I., Bell, K., Chakrabortty, S., Chen, X.,
Chrast, J., Curado, J., Derrien, T., Drenkow, J., Dumais, E., Dumais, J., Duttagupta, R.,
Fastuca, M., Fejes-Toth, K., Ferreira, P., Foissac, S., Fullwood, M. J., Gao, H., Gonzalez,
D., Gordon, A., Gunawardena, H. P., Howald, C., Jha, S., Johnson, R., Kapranov, P., King,
B., Kingswood, C., Li, G., Luo, O. J., Park, E., Preall, J. B., Presaud, K., Ribeca, P., Risk,
B. A., Robyr, D., Ruan, X., Sammeth, M., Sandhu, K. S., Schaeffer, L., See, L.-H., Shahab,
A., Skancke, J., Suzuki, A. M., Takahashi, H., Tilgner, H., Trout, D., Walters, N., Wang,
H., Wrobel, J., Yu, Y., Hayashizaki, Y., Harrow, J., Gerstein, M., Hubbard, T. J., Reymond,
A., Antonarakis, S. E., Hannon, G. J., Giddings, M. C., Ruan, Y., Wold, B., Carninci, P.,
Guigó, R., Gingeras, T. R., Rosenbloom, K. R., Sloan, C. A., Learned, K., Malladi, V. S.,
Wong, M. C., Barber, G. P., Cline, M. S., Dreszer, T. R., Heitner, S. G., Karolchik, D.,
Kent, W. J., Kirkup, V. M., Meyer, L. R., Long, J. C., Maddren, M., Raney, B. J., Furey,
T. S., Song, L., Grasfeder, L. L., Giresi, P. G., Lee, B.-K., Battenhouse, A., Sheffield,
N. C., Simon, J. M., Showers, K. A., Safi, A., London, D., Bhinge, A. A., Shestak, C.,
Schaner, M. R., Ki Kim, S., Zhang, Z. Z., Mieczkowski, P. A., Mieczkowska, J. O., Liu, Z.,
McDaniell, R. M., Ni, Y., Rashid, N. U., Kim, M. J., Adar, S., Zhang, Z., Wang, T., Winter,
D., Keefe, D., Birney, E., Iyer, V. R., Lieb, J. D., Crawford, G. E., Li, G., Sandhu, K. S.,
Zheng, M., Wang, P., Luo, O. J., Shahab, A., Fullwood, M. J., Ruan, X., Ruan, Y., Myers,
R. M., Pauli, F., Williams, B. A., Gertz, J., Marinov, G. K., Reddy, T. E., Vielmetter, J.,
Partridge, E., Trout, D., Varley, K. E., Gasper, C., Bansal, A., Pepke, S., Jain, P., Amrhein,
H., Bowling, K. M., Anaya, M., Cross, M. K., King, B., Muratet, M. A., Antoshechkin, I.,
Newberry, K. M., McCue, K., Nesmith, A. S., Fisher-Aylor, K. I., Pusey, B., DeSalvo, G.,
Parker, S. L., Balasubramanian, S., Davis, N. S., Meadows, S. K., Eggleston, T., Gunter,
C., Newberry, J. S., Levy, S. E., Absher, D. M., Mortazavi, A., Wong, W. H., Wold, B.,
Blow, M. J., Visel, A., Pennachio, L. A., Elnitski, L., Margulies, E. H., Parker, S. C. J.,
Petrykowska, H. M., Abyzov, A., Aken, B., Barrell, D., Barson, G., Berry, A., Bignell, A.,
Boychenko, V., Bussotti, G., Chrast, J., Davidson, C., Derrien, T., Despacio-Reyes, G.,
Diekhans, M., Ezkurdia, I., Frankish, A., Gilbert, J., Gonzalez, J. M., Griffiths, E., Harte,
R., Hendrix, D. A., Howald, C., Hunt, T., Jungreis, I., Kay, M., Khurana, E., Kokocinski,
F., Leng, J., Lin, M. F., Loveland, J., Lu, Z., Manthravadi, D., Mariotti, M., Mudge,
J., Mukherjee, G., Notredame, C., Pei, B., Rodriguez, J. M., Saunders, G., Sboner, A.,
Searle, S., Sisu, C., Snow, C., Steward, C., Tanzer, A., Tapanari, E., Tress, M. L., van
Baren, M. J., Walters, N., Washietl, S., Wilming, L., Zadissa, A., Zhang, Z., Brent, M.,
Haussler, D., Kellis, M., Valencia, A., Gerstein, M., Reymond, A., Guigó, R., Harrow,
J., Hubbard, T. J., Landt, S. G., Frietze, S., Abyzov, A., Addleman, N., Alexander, R. P.,
Auerbach, R. K., Balasubramanian, S., Bettinger, K., Bhardwaj, N., Boyle, A. P., Cao,
A. R., Cayting, P., Charos, A., Cheng, Y., Cheng, C., Eastman, C., Euskirchen, G., Fleming,
J. D., Grubert, F., Habegger, L., Hariharan, M., Harmanci, A., Iyengar, S., Jin, V. X.,
Karczewski, K. J., Kasowski, M., Lacroute, P., Lam, H., Lamarre-Vincent, N., Leng, J.,
Lian, J., Lindahl-Allen, M., Min, R., Miotto, B., Monahan, H., Moqtaderi, Z., Mu, X. J.,
O’Geen, H., Ouyang, Z., Patacsil, D., Pei, B., Raha, D., Ramirez, L., Reed, B., Rozowsky,
J., Sboner, A., Shi, M., Sisu, C., Slifer, T., Witt, H., Wu, L., Xu, X., Yan, K.-K., Yang, X.,
Yip, K. Y., Zhang, Z., Struhl, K., Weissman, S. M., Gerstein, M., Farnham, P. J., Snyder,
M., Tenenbaum, S. A., Penalva, L. O., Doyle, F., Karmakar, S., Landt, S. G., Bhanvadia,
R. R., Choudhury, A., Domanus, M., Ma, L., Moran, J., Patacsil, D., Slifer, T., Victorsen,
A., Yang, X., Snyder, M., White, K. P., Auer, T., Centanin, L., Eichenlaub, M., Gruhl,
F., Heermann, S., Hoeckendorf, B., Inoue, D., Kellner, T., Kirchmaier, S., Mueller, C.,
Reinhardt, R., Schertel, L., Schneider, S., Sinn, R., Wittbrodt, B., Wittbrodt, J., Weng, Z.,
Whitfield, T. W., Wang, J., Collins, P. J., Aldred, S. F., Trinklein, N. D., Partridge, E. C.,
Myers, R. M., Dekker, J., Jain, G., Lajoie, B. R., Sanyal, A., Balasundaram, G., Bates,
D. L., Byron, R., Canfield, T. K., Diegel, M. J., Dunn, D., Ebersol, A. K., Frum, T., Garg,
K., Gist, E., Hansen, R. S., Boatman, L., Haugen, E., Humbert, R., Jain, G., Johnson,
A. K., Johnson, E. M., Kutyavin, T. V., Lajoie, B. R., Lee, K., Lotakis, D., Maurano, M. T.,
Neph, S. J., Neri, F. V., Nguyen, E. D., Qu, H., Reynolds, A. P., Roach, V., Rynes, E.,
Sabo, P., Sanchez, M. E., Sandstrom, R. S., Sanyal, A., Shafer, A. O., Stergachis, A. B.,
Thomas, S., Thurman, R. E., Vernot, B., Vierstra, J., Vong, S., Wang, H., Weaver, M. A.,
Yan, Y., Zhang, M., Akey, J. M., Bender, M., Dorschner, M. O., Groudine, M., MacCoss,
M. J., Navas, P., Stamatoyannopoulos, G., Kaul, R., Dekker, J., Stamatoyannopoulos, J. A.,
Dunham, I., Beal, K., Brazma, A., Flicek, P., Herrero, J., Johnson, N., Keefe, D., Lukk,
M., Luscombe, N. M., Sobral, D., Vaquerizas, J. M., Wilder, S. P., Batzoglou, S., Sidow,
A., Hussami, N., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M. W., Schaub, M. A.,
Kundaje, A., Hardison, R. C., Miller, W., Giardine, B., Harris, R. S., Wu, W., Bickel, P. J.,
Banfai, B., Boley, N. P., Brown, J. B., Huang, H., Li, Q., Li, J. J., Noble, W. S., Bilmes,
J. A., Buske, O. J., Hoffman, M. M., Sahu, A. D., Kharchenko, P. V., Park, P. J., Baker,
D., Taylor, J., Weng, Z., Iyer, S., Dong, X., Greven, M., Lin, X., Wang, J., Xi, H. S.,
Zhuang, J., Gerstein, M., Alexander, R. P., Balasubramanian, S., Cheng, C., Harmanci,
A., Lochovsky, L., Min, R., Mu, X. J., Rozowsky, J., Yan, K.-K., Yip, K. Y., and Birney,
E. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature,
489:57–74.
Curran, S. P., Wu, X., Riedel, C. G., and Ruvkun, G. (2009). A soma-to-germline transfor-
mation in long-lived Caenorhabditis elegans mutants. Nature, 459:1079–1084.
Czesnikiewicz-Guzik, M., Lee, W.-W., Cui, D., Hiruma, Y., Lamar, D. L., Yang, Z.-Z.,
Ouslander, J. G., Weyand, C. M., and Goronzy, J. J. (2008). T cell subset-specific
susceptibility to aging. Clinical Immunology, 127(1):107–118.
Dale, R. K., Pedersen, B. S., and Quinlan, A. R. (2011). Pybedtools: a flexible Python library
for manipulating genomic datasets and annotations. Bioinformatics, 27(24):3423–3424.
Davey, J. W. and Blaxter, M. L. (2011). RADSeq: next-generation population genetics.
Briefings in Functional Genomics, 9(5-6):416–423.
Davey, J. W., Hohenlohe, P. A., Etter, P. D., Boone, J. Q., Catchen, J. M., and Blaxter, M. L.
(2011). Genome-wide genetic marker discovery and genotyping using next-generation
sequencing. Nature Reviews Genetics, 12:499–510.
Davis, S. and Meltzer, P. S. (2007). GEOquery: a bridge between the Gene Expression
Omnibus (GEO) and BioConductor. Bioinformatics, 23(14):1846–1847.
Day, K., Waite, L. L., Thalacker-Mercer, A., West, A., Bamman, M. M., and Brooks, J. D.
(2013). Differential DNA methylation with age displays both common and dynamic
features across human tissues that are influenced by CpG landscape. Genome Biology,
14:R102.
De Cecco, M., Criscione, S. W., Peterson, A. L., Neretti, N., Sedivy, J. M., and Kreiling,
J. A. (2013). Transposable elements become active and mobile in the genomes of aging
mammalian somatic tissues. Aging, 5(12):867–883.
De Cecco, M., Ito, T., Petrashen, A. P., Elias, A. E., Skvir, N. J., Criscione, S. W., Caligiana,
A., Brocculi, G., Adney, E. M., Boeke, J. D., Le, O., Beauséjour, C., Ambati, J., Ambati,
K., Simon, M., Seluanov, A., Gorbunova, V., Slagboom, P. E., Helfand, S. L., Neretti, N.,
and Sedivy, J. M. (2019). L1 drives IFN in senescent cells and promotes age-associated
inflammation. Nature, 566(7742):73–78.
de Magalhães, J. P. and Costa, J. (2009). A database of vertebrate longevity records and their
relation to other life-history traits. Journal of Evolutionary Biology, 22(8):1770–1774.
de Magalhães, J. P. (2012). Programmatic features of aging originating in development:
aging mechanisms beyond molecular damage? The FASEB Journal, 26(12):4821–4826.
Dedeurwaerder, S., Defrance, M., Calonne, E., Denis, H., Sotiriou, C., and Fuks, F. (2011).
Evaluation of the Infinium Methylation 450K technology. Epigenomics, 3(6):771–784.
Dekker, J., Marti-Renom, M. A., and Mirny, L. A. (2013). Exploring the three-dimensional
organization of genomes: interpreting chromatin interaction data. Nature Reviews Genetics,
14:390–403.
Deng, J., Shoemaker, R., Xie, B., Gore, A., LeProust, E. M., Antosiewicz-Bourget, J., Egli,
D., Maherali, N., Park, I.-H., Yu, J., Daley, G. Q., Eggan, K., Hochedlinger, K., Thomson,
J., Wang, W., Gao, Y., and Zhang, K. (2009). Targeted bisulfite sequencing reveals changes
in DNA methylation associated with nuclear reprogramming. Nature Biotechnology,
27:353–360.
Dhayalan, A., Rajavelu, A., Rathert, P., Tamas, R., Jurkowska, R. Z., Ragozin, S., and Jeltsch,
A. (2010). The Dnmt3a PWWP domain reads histone 3 lysine 36 trimethylation and guides
DNA methylation. Journal of Biological Chemistry, 285:26114–26120.
Diep, D., Plongthongkum, N., Gore, A., Fung, H.-L., Shoemaker, R., and Zhang, K. (2012).
Library-free methylation sequencing with bisulfite padlock probes. Nature Methods,
9:270–272.
Dillin, A., Crawford, D. K., and Kenyon, C. (2002). Timing Requirements for Insulin/IGF-1
Signaling in C. elegans. Science, 298(5594):830–834.
Domcke, S., Bardet, A. F., Adrian Ginno, P., Hartl, D., Burger, L., and Schübeler, D. (2015).
Competition between DNA methylation and transcription factors determines binding of
NRF1. Nature, 528(7583):575–579.
Dong, X., Milholland, B., and Vijg, J. (2016). Evidence for a limit to human lifespan. Nature,
538:257–259.
Dozmorov, M. G. (2015). Polycomb repressive complex 2 epigenomic signature defines age-
associated hypermethylation and gene expression changes. Epigenetics, 10(6):484–495.
Du, P., Zhang, X., Huang, C.-C., Jafari, N., Kibbe, W. A., Hou, L., and Lin, S. M. (2010).
Comparison of Beta-value and M-value methods for quantifying methylation levels by
microarray analysis. BMC Bioinformatics, 11:587.
Eaton, M. L. (2007). Linear Statistical Models. In Multivariate Statistics: A Vector Space
Approach, pages 132–158.
Edgar, R., Domrachev, M., and Lash, A. (2002). Gene Expression Omnibus: NCBI gene
expression and hybridization array data repository. Nucleic Acids Research, 30(1):207–
210.
El Khoury, L. Y., Gorrie-Stone, T., Smart, M., Hughes, A., Bao, Y., Andrayas, A., Burrage,
J., Hannon, E., Kumari, M., Mill, J., and Schalkwyk, L. C. (2018). Properties of the
epigenetic clock and age acceleration. bioRxiv, page 363143.
Enguix, A., Cubiles, M. D., Barroso, S., Aguilera, A., Vaquero-Sedas, M. I., and Vega-
Palas, M. A. (2018). Epigenetic features of human telomeres. Nucleic Acids Research,
46(5):2347–2355.
Ernst, J. and Kellis, M. (2010). Discovery and characterization of chromatin states for
systematic annotation of the human genome. Nature Biotechnology, 28:817–825.
Feldman, L., Andersen, S. L., Perls, T. T., Dworkis, D. A., and Sebastiani, P. (2012).
Health Span Approximates Life Span Among Many Supercentenarians: Compression of
Morbidity at the Approximate Limit of Life Span. The Journals of Gerontology: Series A,
67A(4):395–405.
Fernández, A. F., Bayón, G. F., Urdinguio, R. G., Toraño, E. G., García, M. G., Carella,
A., Petrus-Reurer, S., Ferrero, C., Martinez-Camblor, P., Cubillo, I., García-Castro, J.,
Delgado-Calle, J. U., Pérez-Campo, F. M., Riancho, J. A., Bueno, C., Menéndez, P.,
Mentink, A., Mareschi, K., Claire, F., Fagnani, C., Medda, E., Toccaceli, V., Brescianini,
S., Moran, S., Esteller, M., Stolzing, A., De Boer, J., Nistico, L., Stazi, M. A., and Fraga,
M. F. (2015). H3K4me1 marks DNA regions hypomethylated during aging in human stem
and differentiated cells. Genome Research, 25:27–40.
Feser, J., Truong, D., Das, C., Carson, J. J., Kieft, J., Harkness, T., and Tyler, J. K. (2010).
Elevated Histone Expression Promotes Life Span Extension. Molecular Cell, 39(5):724–
735.
Field, A. E., Robertson, N. A., Wang, T., Havas, A., Ideker, T., and Adams, P. D. (2018).
DNA Methylation Clocks in Aging: Categories, Causes, and Consequences. Molecular
Cell, 71(6):882–895.
Finch, C. E. (2009). Update on Slow Aging and Negligible Senescence – A Mini-Review.
Gerontology, 55(3):307–313.
Fine, M. (2014). Intergenerational perspectives on ageing, economics and globalisation.
Australasian Journal on Ageing, 33(4):220–225.
FitzGerald, G., Botstein, D., Califf, R., Collins, R., Peters, K., Van Bruggen, N., and Rader,
D. (2018). The future of humans as model organisms. Science, 361(6402):552–553.
Flanagan, J. M. (2015). Epigenome-Wide Association Studies (EWAS): Past, Present, and
Future. In Verma, M., editor, Cancer Epigenetics: Risk Assessment, Diagnosis, Treatment
and Prognosis, pages 51–63. Springer New York, New York, NY.
Flavahan, W. A., Gaskell, E., and Bernstein, B. E. (2017). Epigenetic plasticity and the
hallmarks of cancer. Science, 357(6348):eaal2380.
Fleischer, T., Gampe, J., Scheuerlein, A., and Kerth, G. (2017). Rare catastrophic events
drive population dynamics in a bat species with negligible senescence. Scientific Reports,
7(1):7370.
Fontana, L. and Partridge, L. (2015). Promoting health and longevity through diet: From
model organisms to humans. Cell, 161(1):106–118.
Fortin, J.-P. and Hansen, K. D. (2015). minfi guidelines: analysis of 450K data using minfi.
Fortin, J.-P., Labbe, A., Lemire, M., Zanke, B. W., Hudson, T. J., Fertig, E. J., Greenwood, C.
M. T., and Hansen, K. D. (2014). Functional normalization of 450k methylation array data
improves replication in large cancer studies. Genome Biology, 15(11):503.
Fraga, M. F., Ballestar, E., Paz, M. F., Ropero, S., Setien, F., Ballestar, M. L., Heine-Suñer,
D., Cigudosa, J. C., Urioste, M., Benitez, J., Boix-Chornet, M., Sanchez-Aguilera, A.,
Ling, C., Carlsson, E., Poulsen, P., Vaag, A., Stephan, Z., Spector, T. D., Wu, Y.-Z., Plass,
C., and Esteller, M. (2005). Epigenetic differences arise during the lifetime of monozygotic
twins. Proceedings of the National Academy of Sciences of the United States of America,
102(30):10604–10609.
Franceschi, C. (2007). Inflammaging as a Major Characteristic of Old People: Can It Be
Prevented or Cured? Nutrition Reviews, 65(s3):S173–S176.
Frankish, A., Bignell, A., Berry, A., Yates, A., Parker, A., Schmitt, B. M., Aken, B., García
Girón, C., Zerbino, D., Stapleton, E., Martin, F. J., Cunningham, F., Barnes, I., Sycheva, I.,
Loveland, J., Mudge, J. M., Gonzalez, J. M., Ruffier, M., Suner, M.-M., Hardy, M., Izuogu,
O. G., Donaldson, S., Mohanan, S., Hourlier, T., Grego, T., Hunt, T., Flicek, P., Wright,
J., Choudhary, J. S., Lagarde, J., Carbonell Sala, S., Guigó, R., Pozo, F., Martínez, L.,
Tress, M. L., Di Domenico, T., Muir, P., Uszczynska-Ratajczak, B., Paten, B., Fiddes, I. T.,
Armstrong, J., Diekhans, M., Hubbard, T. J. P., Reymond, A., Ferreira, A.-M., Chrast, J.,
Johnson, R., Jungreis, I., Kellis, M., Pei, B., Navarro, F. C. P., Xu, J., Zhang, Y., Gerstein,
M., and Sisu, C. (2018). GENCODE reference annotation for the human and mouse
genomes. Nucleic Acids Research, 47(D1):D766–D773.
Freund, A. (2019). Untangling Aging Using Dynamic, Organism-Level Phenotypic Networks.
Cell Systems, 8(3):172–181.
Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization Paths for Generalized
Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1):1–22.
Froimchuk, E., Jang, Y., and Ge, K. (2017). Histone H3 lysine 4 methyltransferase KMT2D.
Gene, 627:337–342.
Frommer, M., McDonald, L. E., Millar, D. S., Collis, C. M., Watt, F., Grigg, G. W., Molloy,
P. L., and Paul, C. L. (1992). A genomic sequencing protocol that yields a positive display
of 5-methylcytosine residues in individual DNA strands. Proceedings of the National
Academy of Sciences, 89(5):1827–1831.
Fumagalli, M. (2013). Assessing the Effect of Sequencing Depth and Sample Size in
Population Genetics Inferences. PLOS ONE, 8(11):e79667.
Gagnon-Bartsch, J. A. and Speed, T. P. (2012). Using control genes to correct for unwanted
variation in microarray data. Biostatistics, 13(3):539–552.
Galkin, F., Aliper, A., Putin, E., Kuznetsov, I., Gladyshev, V. N., and Zhavoronkov, A. (2018).
Human microbiome aging clocks based on deep learning and tandem of permutation
feature importance and accumulated local effects. bioRxiv, page 507780.
Gao, Y., Gan, H., Lou, Z., and Zhang, Z. (2018). Asf1a resolves bivalent chromatin
domains for the induction of lineage-specific genes during mouse embryonic stem cell
differentiation. Proceedings of the National Academy of Sciences, 115(27):E6162–E6171.
Garagnani, P., Bacalini, M. G., Pirazzini, C., Gori, D., Giuliani, C., Mari, D., Di Blasio,
A. M., Gentilini, D., Vitale, G., Collino, S., Rezzi, S., Castellani, G., Capri, M., Salvioli,
S., and Franceschi, C. (2012). Methylation of ELOVL2 gene as a new epigenetic marker
of age. Aging Cell, 11(6):1132–1134.
Gems, D. (2015). The aging-disease false dichotomy: understanding senescence as pathology.
Frontiers in Genetics, 6:212.
Gilbert, S. F. (2011). Commentary: ‘The Epigenotype’ by C.H. Waddington. International
Journal of Epidemiology, 41(1):20–23.
Gompertz, B. (1825). On the Nature of the Function Expressive of the Law of Human
Mortality, and on a New Mode of Determining the Value of Life Contingencies. Philosophical
Transactions of the Royal Society of London, 115:513–583.
Gontier, G., Iyer, M., Shea, J. M., Bieri, G., Wheatley, E. G., Ramalho-Santos, M., and
Villeda, S. A. (2018). Tet2 Rescues Age-Related Regenerative Decline and Enhances
Cognitive Function in the Adult Mouse Brain. Cell Reports, 22(8):1974–1981.
Gopalan, S., Carja, O., Fagny, M., Patin, E., Myrick, J. W., McEwen, L. M., Mah, S. M.,
Kobor, M. S., Froment, A., Feldman, M. W., Quintana-Murci, L., and Henn, B. M. (2017).
Trends in DNA Methylation with Age Replicate Across Diverse Human Populations.
Genetics, 206(3):1659–1674.
Grafodatskaya, D., Chung, B. H. Y., Butcher, D. T., Turinsky, A. L., Goodman, S. J., Choufani,
S., Chen, Y.-A., Lou, Y., Zhao, C., Rajendram, R., Abidi, F. E., Skinner, C., Stavropoulos,
J., Bondy, C. A., Hamilton, J., Wodak, S., Scherer, S. W., Schwartz, C. E., and Weksberg,
R. (2013). Multilocus loss of DNA methylation in individuals with mutations in the histone
H3 Lysine 4 Demethylase KDM5C. BMC Medical Genomics, 6(1):1.
Greally, J. M. (2018). A user’s guide to the ambiguous word ‘epigenetics’. Nature Reviews
Molecular Cell Biology, 19:207–208.
Greer, E. L. and Brunet, A. (2008). Signaling networks in aging. Journal of Cell Science,
121:407–412.
Greer, E. L., Oskoui, P. R., Banko, M. R., Maniar, J. M., Gygi, M. P., Gygi, S. P., and
Brunet, A. (2007). The energy sensor AMP-activated protein kinase directly regulates the
mammalian FOXO3 transcription factor. Journal of Biological Chemistry, 282:30107–30119.
Grönniger, E., Weber, B., Heil, O., Peters, N., Stäb, F., Wenck, H., Korn, B., Winnefeld, M.,
and Lyko, F. (2010). Aging and Chronic Sun Exposure Cause Distinct Epigenetic Changes
in Human Skin. PLOS Genetics, 6(5):e1000971.
Gu, H., Bock, C., Mikkelsen, T. S., Jäger, N., Smith, Z. D., Tomazou, E., Gnirke, A., Lander,
E. S., and Meissner, A. (2010). Genome-scale DNA methylation mapping of clinical
samples at single-nucleotide resolution. Nature Methods, 7(2):133–136.
Guarente, L. and Kenyon, C. (2000). Genetic pathways that regulate ageing in model
organisms. Nature, 408(6809):255–262.
Halfmann, R. and Lindquist, S. (2010). Epigenetics in the Extreme: Prions and the Inheritance
of Environmentally Acquired Traits. Science, 330(6004):629–632.
Hanna, C. W., Peñaherrera, M. S., Saadeh, H., Andrews, S., McFadden, D. E., Kelsey, G.,
and Robinson, W. P. (2016). Pervasive polymorphic imprinted methylation in the human
placenta. Genome Research, 26:756–767.
Hannum, G., Guinney, J., Zhao, L., Zhang, L., Hughes, G., and Sadda, S. (2013).
Genome-wide methylation profiles reveal quantitative views of human aging rates. Molecular
Cell, 49(2):359–367.
Harrison, D. E., Strong, R., Sharp, Z. D., Nelson, J. F., Astle, C. M., Flurkey, K., Nadon,
N. L., Wilkinson, J. E., Frenkel, K., Carter, C. S., Pahor, M., Javors, M. A., Fernandez,
E., and Miller, R. A. (2009). Rapamycin fed late in life extends lifespan in genetically
heterogeneous mice. Nature, 460(7253):392–395.
Harrow, J., Frankish, A., Gonzalez, J. M., Tapanari, E., Diekhans, M., Kokocinski, F., Aken,
B. L., Barrell, D., Zadissa, A., Searle, S., Barnes, I., Bignell, A., Boychenko, V., Hunt, T.,
Kay, M., Mukherjee, G., Rajan, J., Despacio-Reyes, G., Saunders, G., Steward, C., Harte,
R., Lin, M., Howald, C., Tanzer, A., Derrien, T., Chrast, J., Walters, N., Balasubramanian,
S., Pei, B., Tress, M., Rodriguez, J. M., Ezkurdia, I., Van Baren, J., Brent, M., Haussler,
D., Kellis, M., Valencia, A., Reymond, A., Gerstein, M., Guigó, R., and Hubbard, T. J.
(2012). GENCODE: The reference human genome annotation for the ENCODE project.
Genome Research, 22:1760–1774.
Hayflick, L. (1998). A Brief History of the Mortality and Immortality of Cultured Cells. The
Keio Journal of Medicine, 47(3):174–182.
Hayflick, L. (2007a). Biological aging is no longer an unsolved problem. In Annals of the
New York Academy of Sciences, volume 1100, pages 1–13.
Hayflick, L. (2007b). Entropy Explains Aging, Genetic Determinism Explains Longevity, and
Undefined Terminology Explains Misunderstanding Both. PLOS Genetics, 3(12):e220.
Hayflick, L. and Moorhead, P. S. (1961). The serial cultivation of human diploid cell strains.
Experimental Cell Research, 25(3):585–621.
He, Y. and Ecker, J. R. (2015). Non-CG Methylation in the Human Genome. Annual Review
of Genomics and Human Genetics, 16(1):55–77.
Hernando-Herraez, I., Evano, B., Stubbs, T., Commere, P.-H., Clark, S., Andrews, S.,
Tajbakhsh, S., and Reik, W. (2018). Ageing affects DNA methylation drift and
transcriptional cell-to-cell variability in muscle stem cells. bioRxiv, page 500900.
Herranz, N. and Gil, J. (2018). Mechanisms and functions of cellular senescence. The
Journal of Clinical Investigation, 128(4):1238–1246.
Hertel, J., Friedrich, N., Wittfeld, K., Pietzner, M., Budde, K., Van der Auwera, S., Lohmann,
T., Teumer, A., Völzke, H., Nauck, M., and Grabe, H. J. (2016). Measuring Biological
Age via Metabonomics: The Metabolic Age Score. Journal of Proteome Research,
15(2):400–410.
Heyn, H., Li, N., Ferreira, H. J., Moran, S., Pisano, D. G., and Gomez, A. (2012). Distinct
DNA methylomes of newborns and centenarians. Proceedings of the National Academy of
Sciences, 109(26):10522–10527.
Heyn, P., Logan, C. V., Fluteau, A., Challis, R. C., Auchynnikava, T., Martin, C.-A., Marsh,
J. A., Taglini, F., Kilanowski, F., Parry, D. A., Cormier-Daire, V., Fong, C.-T., Gibson, K.,
Hwa, V., Ibáñez, L., Robertson, S. P., Sebastiani, G., Rappsilber, J., Allshire, R. C., Reijns,
M. A. M., Dauber, A., Sproul, D., and Jackson, A. P. (2019). Gain-of-function DNMT3A
mutations cause microcephalic dwarfism and hypermethylation of Polycomb-regulated
regions. Nature Genetics, 51(1):96–105.
Hodges, E., Smith, A. D., Kendall, J., Xuan, Z., Ravi, K., Rooks, M., Zhang, M. Q., Ye, K.,
Bhattacharjee, A., Brizuela, L., McCombie, W. R., Wigler, M., Hannon, G. J., and Hicks,
J. B. (2009). High definition profiling of mammalian DNA methylation by array capture
and single molecule bisulfite sequencing. Genome Research, 19:1593–1605.
Holliday, R. and Pugh, J. E. (1975). DNA modification mechanisms and gene activity during
development. Science, 187(4173):226–232.
Hon, G., Song, C.-X., Du, T., Jin, F., Selvaraj, S., Lee, A., Yen, C.-a., Ye, Z., Mao, S.-Q.,
Wang, B.-A., Kuan, S., Edsall, L., Zhao, B., Xu, G.-L., He, C., and Ren, B. (2014).
5mC Oxidation by Tet2 Modulates Enhancer Activity and Timing of Transcriptome
Reprogramming during Differentiation. Molecular Cell, 56(2):286–297.
Hood, R. L., Schenkel, L. C., Nikkel, S. M., Ainsworth, P. J., Pare, G., Boycott, K. M.,
Bulman, D. E., and Sadikovic, B. (2016). The defining DNA methylation signature of
Floating-Harbor Syndrome. Scientific Reports, 6:38803.
Horvath, S. (2013a). DNA methylation age of human tissues and cell types. Genome Biology,
14(10):3156.
Horvath, S. (2013b). DNAmAge online calculator: https://dnamage.genetics.ucla.edu/home.
Horvath, S. (2013c). FAQs DNAmAge online calculator:
https://horvath.genetics.ucla.edu/html/dnamage/faq.htm#_Toc385147421.
Horvath, S. (2015). Erratum to: DNA methylation age of human tissues and cell types.
Genome Biology, 16(1):96.
Horvath, S., Erhart, W., Brosch, M., Ammerpohl, O., von Schönfels, W., Ahrens, M., Heits,
N., Bell, J. T., Tsai, P.-C., Spector, T. D., Deloukas, P., Siebert, R., Sipos, B., Becker, T.,
Röcken, C., Schafmayer, C., and Hampe, J. (2014). Obesity accelerates epigenetic aging
of human liver. Proceedings of the National Academy of Sciences, page 201412759.
Horvath, S., Garagnani, P., Bacalini, M. G., Pirazzini, C., Salvioli, S., Gentilini, D., Di Blasio,
A. M., Giuliani, C., Tung, S., Vinters, H. V., and Franceschi, C. (2015a). Accelerated
epigenetic aging in Down syndrome. Aging Cell, 14(3):491–495.
Horvath, S., Gurven, M., Levine, M. E., Trumble, B. C., Kaplan, H., Allayee, H., Ritz,
B. R., Chen, B., Lu, A. T., Rickabaugh, T. M., Jamieson, B. D., Sun, D., Li, S., Chen, W.,
Quintana-Murci, L., Fagny, M., Kobor, M. S., Tsao, P. S., Reiner, A. P., Edlefsen, K. L.,
Absher, D., and Assimes, T. L. (2016a). An epigenetic clock analysis of race/ethnicity,
sex, and coronary heart disease. Genome Biology, 17(1):171.
Horvath, S., Langfelder, P., Kwak, S., Aaronson, J., Rosinski, J., Vogt, T. F., Eszes, M., Faull,
R. L., Curtis, M. A., Waldvogel, H. J., Choi, O. W., Tung, S., Vinters, H. V., Coppola, G.,
and Yang, X. W. (2016b). Huntington’s disease accelerates epigenetic aging of human
brain and disrupts DNA methylation levels. Aging, 8(7):1485–1512.
Horvath, S. and Levine, A. J. (2015). HIV-1 Infection Accelerates Age According to the
Epigenetic Clock. The Journal of Infectious Diseases, 212(10):1563–1573.
Horvath, S., Mah, V., Lu, A. T., Woo, J. S., Choi, O.-W., Jasinska, A. J., Riancho, J. A., Tung,
S., Coles, N. S., Braun, J., Vinters, H. V., and Coles, L. S. (2015b). The cerebellum ages
slowly according to the epigenetic clock. Aging, 7(5):294–306.
Horvath, S., Oshima, J., Martin, G. M., Lu, A. T., Quach, A., Cohen, H., Felton, S.,
Matsuyama, M., Lowe, D., Kabacik, S., Wilson, J. G., Reiner, A. P., Maierhofer, A.,
Flunkert, J., Aviv, A., Hou, L., Baccarelli, A. A., Li, Y., Stewart, J. D., Whitsel, E. A.,
Ferrucci, L., Matsuyama, S., and Raj, K. (2018). Epigenetic clock for skin and blood
cells applied to Hutchinson Gilford Progeria Syndrome and ex vivo studies. Aging,
10(7):1758–1775.
Horvath, S. and Raj, K. (2018). DNA methylation-based biomarkers and the epigenetic clock
theory of ageing. Nature Reviews Genetics, 19(6):371–384.
Horvath, S., Zhang, Y., Langfelder, P., Kahn, R. S., Boks, M. P., and Van Eijk, K. (2012).
Aging effects on DNA methylation modules in human brain and blood tissue. Genome
Biology, 13:R97.
Hoshino, A., Horvath, S., Sridhar, A., Chitsazan, A., and Reh, T. A. (2019). Synchrony and
asynchrony between an epigenetic clock and developmental timing. Scientific Reports,
9(1):3770.
Houseman, E. A., Accomando, W. P., Koestler, D. C., Christensen, B. C., Marsit, C. J.,
Nelson, H. H., Wiencke, J. K., and Kelsey, K. T. (2012). DNA methylation arrays as
surrogate measures of cell mixture distribution. BMC Bioinformatics, 13:86.
Hsu, A.-L., Murphy, C. T., and Kenyon, C. (2003). Regulation of Aging and Age-Related
Disease by DAF-16 and Heat-Shock Factor. Science, 300(5622):1142–1145.
Huang, H., Weng, H., Zhou, K., Wu, T., Zhao, B. S., Sun, M., Chen, Z., Deng, X., Xiao,
G., Auer, F., Klemm, L., Wu, H., Zuo, Z., Qin, X., Dong, Y., Zhou, Y., Qin, H., Tao, S.,
Du, J., Liu, J., Lu, Z., Yin, H., Mesquita, A., Yuan, C. L., Hu, Y.-C., Sun, W., Su, R.,
Dong, L., Shen, C., Li, C., Qing, Y., Jiang, X., Wu, X., Sun, M., Guan, J.-L., Qu, L.,
Wei, M., Müschen, M., Huang, G., He, C., Yang, J., and Chen, J. (2019). Histone H3
trimethylation at lysine 36 guides m6A RNA modification co-transcriptionally. Nature,
567(7748):414–419.
Huang, X., Lu, H., Wang, J.-W., Xu, L., Liu, S., Sun, J., and Gao, F. (2013). High-throughput
sequencing of methylated cytosine enriched by modification-dependent restriction
endonuclease MspJI. BMC Genetics, 14(1):56.
Huh, C. J., Zhang, B., Victor, M. B., Dahiya, S., Batista, L. F. Z., Horvath, S., and Yoo, A. S.
(2016). Maintenance of age in human neurons generated by microRNA-based neuronal
conversion of fibroblasts. eLife, 5:e18648.
Illumina (2010). GenomeStudio® Methylation Module v1.8 User Guide. Technical report.
Illumina (2015). Infinium® HD Assay Methylation Protocol Guide. Technical report.
Irvin, M. R., Aslibekyan, S., Do, A., Zhi, D., Hidalgo, B., Claas, S. A., Srinivasasainagendra,
V., Horvath, S., Tiwari, H. K., Absher, D. M., and Arnett, D. K. (2018). Metabolic and
inflammatory biomarkers are associated with epigenetic aging acceleration estimates in
the GOLDN study. Clinical Epigenetics, 10(1):56.
Ito, S., D’Alessio, A. C., Taranova, O. V., Hong, K., Sowers, L. C., and Zhang, Y. (2010).
Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass
specification. Nature, 466:1129–1133.
Iurlaro, M., von Meyenn, F., and Reik, W. (2017). DNA methylation homeostasis in human
and mouse development. Current Opinion in Genetics & Development, 43:101–109.
Ivanov, M., Kals, M., Kacevska, M., Metspalu, A., Ingelman-Sundberg, M., and Milani,
L. (2013). In-solution hybrid capture of bisulfite-converted DNA for targeted bisulfite
sequencing of 174 ADME genes. Nucleic Acids Research, 41(6):e72.
Jaffe, A. E. (2018). FlowSorted.Blood.450k Bioconductor Package.
Jaffe, A. E. and Irizarry, R. A. (2014). Accounting for cellular heterogeneity is critical in
epigenome-wide association studies. Genome Biology, 15(2):R31.
Jeffries, A. R., Maroofian, R., Salter, C. G., Chioza, B. A., Cross, H. E., Patton, M. A., Temple,
I. K., Mackay, D., Rezwan, F. I., Aksglaede, L., Baralle, D., Dabir, T., Hunter, M. F.,
Kamath, A., Kumar, A., Newbury-Ecob, R., Selicorni, A., Springer, A., van Maldergem,
L., Varghese, V., Yachelevich, N., Tatton-Brown, K., Mill, J., Crosby, A. H., and Baple, E.
(2018). Growth disrupting mutations in epigenetic regulatory molecules are associated
with abnormalities of epigenetic aging. bioRxiv, page 477356.
Jenkinson, G., Pujadas, E., Goutsias, J., and Feinberg, A. P. (2017). Potential energy
landscapes identify the information-theoretic nature of the epigenome. Nature Genetics,
49:719–729.
Jensen, A. B., Moseley, P. L., Oprea, T. I., Ellesøe, S. G., Eriksson, R., Schmock, H., Jensen,
P. B., Jensen, L. J., and Brunak, S. (2014). Temporal disease trajectories condensed from
population-wide registry data covering 6.2 million patients. Nature Communications,
5:4022.
Jeong, M., Sun, D., Luo, M., Huang, Y., Challen, G. A., Rodriguez, B., Zhang, X., Chavez,
L., Wang, H., Hannah, R., Kim, S.-B., Yang, L., Ko, M., Chen, R., Göttgens, B., Lee,
J.-S., Gunaratne, P., Godley, L. A., Darlington, G. J., Rao, A., Li, W., and Goodell, M. A.
(2013). Large conserved domains of low DNA methylation maintained by Dnmt3a. Nature
Genetics, 46:17–23.
Jeziorska, D. M., Murray, R. J. S., De Gobbi, M., Gaentzsch, R., Garrick, D., Ayyub, H.,
Chen, T., Li, E., Telenius, J., Lynch, M., Graham, B., Smith, A. J. H., Lund, J. N., Hughes,
J. R., Higgs, D. R., and Tufarelli, C. (2017). DNA methylation of intragenic CpG islands
depends on their transcriptional activity during differentiation and disease. Proceedings of
the National Academy of Sciences, 114(36):E7526–E7535.
Johnson, T. E. (2013). 25 years after age-1: Genes, interventions and the revolution in aging
research. Experimental Gerontology, 48(7):640–643.
Jones, O. R., Scheuerlein, A., Salguero-Gómez, R., Camarda, C. G., Schaible, R., Casper,
B. B., Dahlgren, J. P., Ehrlén, J., García, M. B., Menges, E. S., Quintana-Ascencio, P. F.,
Caswell, H., Baudisch, A., and Vaupel, J. W. (2013). Diversity of ageing across the tree of
life. Nature, 505:169.
Jylhävä, J., Pedersen, N. L., and Hägg, S. (2017). Biological Age Predictors. EBioMedicine,
21:29–36.
Kacmarczyk, T. J., Fall, M. P., Zhang, X., Xin, Y., Li, Y., Alonso, A., and Betel, D. (2018).
“Same difference”: comprehensive evaluation of four DNA methylation measurement
platforms. Epigenetics & Chromatin, 11(1):21.
Kanfi, Y., Naiman, S., Amir, G., Peshti, V., Zinman, G., Nahum, L., Bar-Joseph, Z., and
Cohen, H. Y. (2012). The sirtuin SIRT6 regulates lifespan in male mice. Nature, 483:218–221.
Kaplanis, J., Gordon, A., Shor, T., Weissbrod, O., Geiger, D., Wahl, M., Gershovits, M.,
Markus, B., Sheikh, M., Gymrek, M., Bhatia, G., MacArthur, D. G., Price, A. L., and
Erlich, Y. (2018). Quantitative analysis of population-scale family trees with millions of
relatives. Science, 360(6385):171–175.
Kapourani, C.-A. and Sanguinetti, G. (2019). Melissa: Bayesian clustering and imputation
of single-cell methylomes. Genome Biology, 20(1):61.
Kawakatsu, T., Huang, S.-s. C., Jupe, F., Sasaki, E., Schmitz, R. J., Urich, M. A., Castanon,
R., Nery, J. R., Barragan, C., He, Y., Chen, H., Dubin, M., Lee, C.-R., Wang, C., Bemm,
F., Becker, C., O’Neil, R., O’Malley, R. C., Quarless, D. X., Alonso-Blanco, C., Andrade,
J., Becker, C., Bemm, F., Bergelson, J., Borgwardt, K., Chae, E., Dezwaan, T., Ding, W.,
Ecker, J. R., Expósito-Alonso, M., Farlow, A., Fitz, J., Gan, X., Grimm, D. G., Hancock,
A., Henz, S. R., Holm, S., Horton, M., Jarsulic, M., Kerstetter, R. A., Korte, A., Korte,
P., Lanz, C., Lee, C.-R., Meng, D., Michael, T. P., Mott, R., Muliyati, N. W., Nägele, T.,
Nagler, M., Nizhynska, V., Nordborg, M., Novikova, P., Picó, F. X., Platzer, A., Rabanal,
F. A., Rodriguez, A., Rowan, B. A., Salomé, P. A., Schmid, K., Schmitz, R. J., Seren,
Ü., Sperone, F. G., Sudkamp, M., Svardal, H., Tanzer, M. M., Todd, D., Volchenboum,
S. L., Wang, C., Wang, G., Wang, X., Weckwerth, W., Weigel, D., Zhou, X., Schork, N. J.,
Weigel, D., Nordborg, M., and Ecker, J. R. (2016). Epigenomic Diversity in a Global
Collection of Arabidopsis thaliana Accessions. Cell, 166(2):492–505.
Kelsey, G., Stegle, O., and Reik, W. (2017). Single-cell epigenomics: Recording the past and
predicting the future. Science, 358(6359):69–75.
Kenyon, C. (2005). The plasticity of aging: Insights from long-lived mutants. Cell,
120(4):449–460.
Kenyon, C., Chang, J., Gensch, E., Rudner, A., and Tabtiang, R. (1993). A C. elegans mutant
that lives twice as long as wild type. Nature, 366(6454):461–464.
Kenyon, C. J. (2010). The genetics of ageing. Nature, 464(7288):504–512.
Kernohan, K. D., Cigana Schenkel, L., Huang, L., Smith, A., Pare, G., Ainsworth, P., Boycott,
K. M., Warman-Chardon, J., Sadikovic, B., and Care4Rare Canada Consortium (2016). Identification
of a methylation profile for DNMT1-associated autosomal dominant cerebellar ataxia,
deafness, and narcolepsy. Clinical Epigenetics, 8(1):91.
Khan, S. S., Singer, B. D., and Vaughan, D. E. (2017). Molecular and physiological
manifestations and measurement of aging in humans. Aging Cell, 16(4):624–633.
Kierkegaard, S. (1843). Journals.
Kirkland, J. L., Tchkonia, T., Zhu, Y., Niedernhofer, L. J., and Robbins, P. D. (2017).
The Clinical Potential of Senolytic Drugs. Journal of the American Geriatrics Society,
65(10):2297–2301.
Kirkwood, T. B. and Rose, M. R. (1991). Evolution of senescence: late survival sacrificed for
reproduction. Philosophical Transactions - Royal Society of London, B, 332(1262):15–24.
Kirkwood, T. B. L. (1977). Evolution of ageing. Nature, 270(5635):301–304.
Kirschner, S. A., Hunewald, O., Mériaux, S. B., Brunnhoefer, R., Muller, C. P., and Turner,
J. D. (2016). Focussing reduced representation CpG sequencing through judicious
restriction enzyme choice. Genomics, 107(4):109–119.
Klass, M. and Hirsh, D. (1976). Non-ageing developmental variant of Caenorhabditis elegans.
Nature, 260(5551):523–525.
Koch, C. M. and Wagner, W. (2011). Epigenetic-aging-signature to determine age in different
tissues. Aging, 3(10):1018–1027.
Koestler, D. C., Jones, M. J., Usset, J., Christensen, B. C., Butler, R. A., Kobor, M. S.,
Wiencke, J. K., and Kelsey, K. T. (2016). Improving cell mixture deconvolution by
identifying optimal DNA methylation libraries (IDOL). BMC Bioinformatics, 17:120.
Komori, H. K., LaMere, S. A., Torkamani, A., Hart, G. T., Kotsopoulos, S., Warner, J.,
Samuels, M. L., Olson, J., Head, S. R., Ordoukhanian, P., Lee, P. L., Link, D. R., and
Salomon, D. R. (2011). Application of microdroplet PCR for large-scale targeted bisulfite
sequencing. Genome Research, 21(10):1738–1745.
Kontis, V., Bennett, J. E., Mathers, C. D., Li, G., Foreman, K., and Ezzati, M. (2017). Future
life expectancy in 35 industrialised countries: projections with a Bayesian model ensemble.
The Lancet, 389(10076):1323–1335.
Kresovich, J. K., Xu, Z., O’Brien, K. M., Weinberg, C. R., Sandler, D. P., and Taylor, J. A.
(2019). Methylation-Based Biological Age and Breast Cancer Risk. JNCI: Journal of the
National Cancer Institute, page djz020.
Kriaucionis, S. and Heintz, N. (2009). The Nuclear DNA Base 5-Hydroxymethylcytosine Is
Present in Purkinje Neurons and the Brain. Science, 324(5929):929–930.
Kriukienė, E., Labrie, V., Khare, T., Urbanavičiūtė, G., Lapinaitė, A., Koncevičius, K.,
Li, D., Wang, T., Pai, S., Ptak, C., Gordevičius, J., Wang, S.-C., Petronis, A., and
Klimašauskas, S. (2013). DNA unmethylome profiling by covalent capture of CpG sites.
Nature Communications, 4:2190.
Krueger, F. and Andrews, S. R. (2011). Bismark: a flexible aligner and methylation caller for
Bisulfite-Seq applications. Bioinformatics, 27(11):1571–1572.
Kucab, J. E., Zou, X., Morganella, S., Joel, M., Nanda, A. S., Nagy, E., Gomez, C., Degasperi,
A., Harris, R., Jackson, S. P., Arlt, V. M., Phillips, D. H., and Nik-Zainal, S. (2019). A
Compendium of Mutational Signatures of Environmental Agents. Cell, 177(4):821–836.e16.
Kudithipudi, S., Lungu, C., Rathert, P., Happel, N., and Jeltsch, A. (2014). Substrate
Specificity Analysis and Novel Substrates of the Protein Lysine Methyltransferase NSD1.
Chemistry & Biology, 21(2):226–237.
Kuhn, R. M., Haussler, D., and Kent, W. J. (2012). The UCSC genome browser and associated
tools. Briefings in Bioinformatics, 14(2):144–161.
Kuranda, K., Vargaftig, J., de la Rochere, P., Dosquet, C., Charron, D., Bardin, F., Tonnelle,
C., Bonnet, D., and Goodhardt, M. (2011). Age-related changes in human hematopoietic
stem/progenitor cells. Aging Cell, 10(3):542–546.
Kurdyukov, S. and Bullock, M. (2016). DNA Methylation Analysis: Choosing the Right
Method. Biology, 5(1):3.
Kurotaki, N., Imaizumi, K., Harada, N., Masuno, M., Kondoh, T., Nagai, T., Ohashi, H.,
Naritomi, K., Tsukahara, M., Makita, Y., Sugimoto, T., Sonoda, T., Hasegawa, T., Chinen,
Y., Tomita, H.-a., Kinoshita, A., Mizuguchi, T., Yoshiura, K.-i., Ohta, T., Kishino, T.,
Fukushima, Y., Niikawa, N., and Matsumoto, N. (2002). Haploinsufficiency of NSD1
causes Sotos syndrome. Nature Genetics, 30:365–366.
Lappalainen, T. and Greally, J. M. (2017). Associating cellular epigenetic models with
human phenotypes. Nature Reviews Genetics, 18:441–451.
Larsson, N.-G. (2010). Somatic Mitochondrial DNA Mutations in Mammalian Aging. Annual
Review of Biochemistry, 79(1):683–706.
Lawrence, M., Daujat, S., and Schneider, R. (2016). Lateral Thinking: How Histone
Modifications Regulate Gene Expression. Trends in Genetics, 32(1):42–56.
Lee, Y. K., Jin, S., Duan, S., Lim, Y. C., Ng, D. P. Y., Lin, X. M., Yeo, G. S. H., and Ding, C.
(2014). Improved reduced representation bisulfite sequencing for epigenomic profiling of
clinical samples. Biological Procedures Online, 16(1):1.
Leek, J. T., Scharpf, R. B., Bravo, H. C., Simcha, D., Langmead, B., Johnson, W. E., Geman,
D., Baggerly, K., and Irizarry, R. A. (2010). Tackling the widespread and critical impact
of batch effects in high-throughput data. Nature Reviews Genetics, 11:733–739.
Lev Maor, G., Yearim, A., and Ast, G. (2015). The alternative role of DNA methylation in
splicing regulation. Trends in Genetics, 31(5):274–280.
Leventopoulos, G., Kitsiou-Tzeli, S., Kritikos, K., Psoni, S., Mavrou, A., Kanavakis, E., and
Fryssira, H. (2009). A Clinical Study of Sotos Syndrome Patients With Review of the
Literature. Pediatric Neurology, 40(5):357–364.
Levine, M. E., Lu, A. T., Chen, B. H., Hernandez, D. G., Singleton, A. B., Ferrucci, L.,
Bandinelli, S., Salfati, E., Manson, J. E., Quach, A., Kusters, C. D. J., Kuh, D., Wong,
A., Teschendorff, A. E., Widschwendter, M., Ritz, B. R., Absher, D., Assimes, T. L., and
Horvath, S. (2016). Menopause accelerates biological aging. Proceedings of the National
Academy of Sciences, 113(33):9327–9332.
Levine, M. E., Lu, A. T., Quach, A., Chen, B. H., Assimes, T. L., Bandinelli, S., Hou, L.,
Baccarelli, A. A., Stewart, J. D., Li, Y., Whitsel, E. A., Wilson, J. G., Reiner, A. P., Aviv,
A., Lohman, K., Liu, Y., Ferrucci, L., and Horvath, S. (2018). An epigenetic biomarker of
aging for lifespan and healthspan. Aging, 10(4):573–591.
Lezzerini, M. and Budovskaya, Y. (2014). A dual role of the Wnt signaling pathway during
aging in Caenorhabditis elegans. Aging Cell, 13(1):8–18.
Li, E. and Zhang, Y. (2014). DNA methylation in mammals. Cold Spring Harbor Perspectives
in Biology, 6(5):a019133.
Li, H., Liefke, R., Jiang, J., Kurland, J. V., Tian, W., Deng, P., Zhang, W., He, Q., Patel,
D. J., Bulyk, M. L., Shi, Y., and Wang, Z. (2017). Polycomb-like proteins link the PRC2
complex to CpG islands. Nature, 549(7671):287–291.
Li, Y., Zheng, H., Wang, Q., Zhou, C., Wei, L., Liu, X., Zhang, W., Zhang, Y., Du, Z., Wang,
X., and Xie, W. (2018). Genome-wide analyses reveal a role of Polycomb in promoting
hypomethylation of DNA methylation valleys. Genome Biology, 19(1):18.
Lim, Y. C., Chia, S. Y., Jin, S., Han, W., Ding, C., and Sun, L. (2016). Dynamic DNA
methylation landscape defines brown and white cell specificity during adipogenesis. Molecular
Metabolism, 5(10):1033–1041.
Lin, K., Hsin, H., Libina, N., and Kenyon, C. (2001). Regulation of the Caenorhabditis
elegans longevity protein DAF-16 by insulin/IGF-1 and germline signaling. Nature
Genetics, 28:139–145.
Liu, J. and Siegmund, K. D. (2016). An evaluation of processing methods for
HumanMethylation450 BeadChip data. BMC Genomics, 17(1):469.
Liu, X. S., Wu, H., Ji, X., Stelzer, Y., Wu, X., Czauderna, S., Shu, J., Dadon, D., Young,
R. A., and Jaenisch, R. (2016). Editing DNA Methylation in the Mammalian Genome.
Cell, 167(1):233–247.e17.
Liu, Y., Aryee, M. J., Padyukov, L., Fallin, M. D., Hesselberg, E., Runarsson, A., Reinius,
L., Acevedo, N., Taub, M., Ronninger, M., Shchetynsky, K., Scheynius, A., Kere, J.,
Alfredsson, L., Klareskog, L., Ekström, T. J., and Feinberg, A. P. (2013). Epigenome-
wide association data implicate DNA methylation as an intermediary of genetic risk in
rheumatoid arthritis. Nature Biotechnology, 31:142–147.
Liu, Y., Siejka-Zielińska, P., Velikova, G., Bi, Y., Yuan, F., Tomkova, M., Bai, C., Chen,
L., Schuster-Böckler, B., and Song, C.-X. (2019). Bisulfite-free direct detection of 5-
methylcytosine and 5-hydroxymethylcytosine at base resolution. Nature Biotechnology,
37(4):424–429.
Long, H. K., Sims, D., Heger, A., Blackledge, N. P., Kutter, C., Wright, M. L., Grützner, F.,
Odom, D. T., Patient, R., Ponting, C. P., and Klose, R. J. (2013). Epigenetic conservation at
gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates.
eLife, 2:e00348.
Lopez-Otin, C., Blasco, M. A., Partridge, L., Serrano, M., and Kroemer, G. (2013). The
hallmarks of aging. Cell, 153(6):1194–1217.
Lowe, D., Horvath, S., and Raj, K. (2016). Epigenetic clock analyses of cellular senescence
and ageing. Oncotarget, 7(8):8524–8531.
Lowe, R., Barton, C., Jenkins, C. A., Ernst, C., Forman, O., Fernandez-Twinn, D. S., Bock,
C., Rossiter, S. J., Faulkes, C. G., Ozanne, S. E., Walter, L., Odom, D. T., Mellersh, C.,
and Rakyan, V. K. (2018). Ageing-associated DNA methylation dynamics are a molecular
readout of lifespan variation among mammalian species. Genome Biology, 19(1):22.
Lu, A. T., Hannon, E., Levine, M. E., Hao, K., Crimmins, E. M., Lunnon, K., Kozlenkov, A.,
Mill, J., Dracheva, S., and Horvath, S. (2016). Genetic variants near MLST8 and DHX57
affect the epigenetic age of the cerebellum. Nature Communications, 7:10561.
Lu, A. T., Quach, A., Wilson, J. G., Reiner, A. P., Aviv, A., Raj, K., Hou, L., Baccarelli,
A. A., Li, Y., Stewart, J. D., Whitsel, E. A., Assimes, T. L., Ferrucci, L., and Horvath,
S. (2019). DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging,
11(2):303–327.
Lu, A. T., Xue, L., Salfati, E. L., Chen, B. H., Ferrucci, L., Levy, D., Joehanes, R., Murabito,
J. M., Kiel, D. P., Tsai, P.-C., Yet, I., Bell, J. T., Mangino, M., Tanaka, T., McRae,
A. F., Marioni, R. E., Visscher, P. M., Wray, N. R., Deary, I. J., Levine, M. E., Quach,
A., Assimes, T., Tsao, P. S., Absher, D., Stewart, J. D., Li, Y., Reiner, A. P., Hou, L.,
Baccarelli, A. A., Whitsel, E. A., Aviv, A., Cardona, A., Day, F. R., Wareham, N. J., Perry,
J. R. B., Ong, K. K., Raj, K., Lunetta, K. L., and Horvath, S. (2018). GWAS of epigenetic
aging rates in blood reveals a critical role for TERT. Nature Communications, 9(1):387.
Luscan, A., Laurendeau, I., Malan, V., Francannet, C., Odent, S., Giuliano, F., Lacombe,
D., Touraine, R., Vidaud, M., Pasmant, E., and Cormier-Daire, V. (2014). Mutations in
SETD2 cause a novel overgrowth condition. Journal of Medical Genetics, 51(8):512–517.
Lyko, F. (2017). The DNA methyltransferase family: a versatile toolkit for epigenetic
regulation. Nature Reviews Genetics, 19:81–92.
Machado, A. (1912). Proverbios y cantares XXIX. In Campos de Castilla.
Maegawa, S., Hinkal, G., Kim, H. S., Shen, L., Zhang, L., and Zhang, J. (2010). Widespread
and tissue specific age-related DNA methylation changes in mice. Genome Research,
20:332–340.
Mahmoudi, S., Xu, L., and Brunet, A. (2019). Turning back time with emerging rejuvenation
strategies. Nature Cell Biology, 21(1):32–43.
Maierhofer, A., Flunkert, J., Oshima, J., Martin, G. M., Haaf, T., and Horvath, S. (2017).
Accelerated epigenetic aging in Werner syndrome. Aging, 9(4):1143–1152.
Maksimovic, J., Gordon, L., and Oshlack, A. (2012). SWAN: Subset-quantile Within Array
Normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biology,
13(6):1–12.
Maksimovic, J., Oshlack, A., Gagnon-Bartsch, J. A., and Speed, T. P. (2015). Removing
unwanted variation in a differential methylation analysis of Illumina HumanMethylation450
array data. Nucleic Acids Research, 43(16):e106–e106.
Manser, A. R. and Uhrberg, M. (2016). Age-related changes in natural killer cell repertoires:
impact on NK cell function and immune surveillance. Cancer Immunology,
Immunotherapy, 65(4):417–426.
Marioni, R. E., Deary, I. J., Relton, C. L., Suderman, M., Ferrucci, L., Chen, B. H., Horvath,
S., Bandinelli, S., Beck, S., Morris, T., Pedersen, N. L., and Hägg, S. (2018). Tracking
the Epigenetic Clock Across the Human Life Course: A Meta-analysis of Longitudinal
Cohort Data. The Journals of Gerontology: Series A, 74(1):57–61.
Marioni, R. E., Shah, S., McRae, A. F., Chen, B. H., Colicino, E., Harris, S. E., Gibson,
J., Henders, A. K., Redmond, P., Cox, S. R., Pattie, A., Corley, J., Murphy, L., Martin,
N. G., Montgomery, G. W., Feinberg, A. P., Fallin, M. D., Multhaup, M. L., Jaffe, A. E.,
Joehanes, R., Schwartz, J., Just, A. C., Lunetta, K. L., Murabito, J. M., Starr, J. M.,
Horvath, S., Baccarelli, A. A., Levy, D., Visscher, P. M., Wray, N. R., and Deary, I. J.
(2015). DNA methylation age of blood predicts all-cause mortality in later life. Genome
Biology, 16(1):25.
Martin-Herranz, D. E. (2019). demh/epigenetic_ageing_clock: Epigenetic ageing clock
v1.1.0. GitHub repository: https://github.com/demh/epigenetic_ageing_clock/.
Martin-Herranz, D. E., Aref-Eshghi, E., Bonder, M. J., Stubbs, T. M., Choufani, S., Weksberg,
R., Stegle, O., Sadikovic, B., Reik, W., and Thornton, J. M. (2019). Screening for
genes that accelerate the epigenetic aging clock in humans reveals a role for the H3K36
methyltransferase NSD1. Genome Biology, 20(1):146.
Martin-Herranz, D. E., Ribeiro, A. J., and Stubbs, T. M. (2017a). demh/cuRRBS: cuRRBS
V1.0.4.
Martin-Herranz, D. E., Ribeiro, A. J. M., Krueger, F., Thornton, J. M., Reik, W., and Stubbs,
T. M. (2017b). cuRRBS: simple and robust evaluation of enzyme combinations for reduced
representation approaches. Nucleic Acids Research, 45(20):11559–11569.
Martin-Montalvo, A., Mercken, E. M., Mitchell, S. J., Palacios, H. H., Mote, P. L., Scheibye-
Knudsen, M., Gomes, A. P., Ward, T. M., Minor, R. K., Blouin, M.-J., Schwab, M., Pollak,
M., Zhang, Y., Yu, Y., Becker, K. G., Bohr, V. A., Ingram, D. K., Sinclair, D. A., Wolf,
N. S., Spindler, S. R., Bernier, M., and de Cabo, R. (2013). Metformin improves healthspan
and lifespan in mice. Nature Communications, 4:2192.
Martincorena, I., Fowler, J. C., Wabik, A., Lawson, A. R. J., Abascal, F., Hall, M. W. J.,
Cagan, A., Murai, K., Mahbubani, K., Stratton, M. R., Fitzgerald, R. C., Handford, P. A.,
Campbell, P. J., Saeb-Parsy, K., and Jones, P. H. (2018). Somatic mutant clones colonize
the human esophagus with age. Science, 362(6417):911–917.
Martinez-Arguelles, D. B., Lee, S., and Papadopoulos, V. (2014). In silico analysis identi-
fies novel restriction enzyme combinations that expand reduced representation bisulfite
sequencing CpG coverage. BMC Research Notes, 7(1):534.
Martinez-Jimenez, C. P., Eling, N., Chen, H.-C., Vallejos, C. A., Kolodziejczyk, A. A.,
Connor, F., Stojic, L., Rayner, T. F., Stubbington, M. J. T., Teichmann, S. A., de la Roche,
M., Marioni, J. C., and Odom, D. T. (2017). Aging increases cell-to-cell transcriptional
variability upon immune stimulation. Science, 355(6332):1433–1436.
Martins, R., Lithgow, G. J., and Link, W. (2016). Long live FOXO: unraveling the role of
FOXO proteins in aging and longevity. Aging Cell, 15(2):196–207.
Mathelier, A., Fornes, O., Arenillas, D. J., Chen, C.-y., Denay, G., Lee, J., Shi, W., Shyr,
C., Tan, G., Worsley-Hunt, R., Zhang, A. W., Parcy, F., Lenhard, B., Sandelin, A.,
and Wasserman, W. W. (2015). JASPAR 2016: a major expansion and update of the
open-access database of transcription factor binding profiles. Nucleic Acids Research,
44(D1):D110–D115.
Mattick, J. S., Amaral, P. P., Dinger, M. E., Mercer, T. R., and Mehler, M. F. (2009). RNA
regulation of epigenetic processes. BioEssays, 31(1):51–59.
Maurano, M. T., Wang, H., John, S., Shafer, A., Canfield, T., Lee, K., and Stamatoyannopou-
los, J. A. (2015). Role of DNA Methylation in Modulating Transcription Factor Occupancy.
Cell Reports, 12(7):1184–1195.
McCay, C. M., Maynard, L. A., and Crowell, M. F. (1935). The Effect of Retarded Growth
Upon the Length of Life Span and Upon the Ultimate Body Size: One Figure. The Journal
of Nutrition, 10(1):63–79.
McDaniel, S. L., Hepperla, A. J., Huang, J., Dronamraju, R., Adams, A. T., Kulkarni,
V. G., Davis, I. J., and Strahl, B. D. (2017). H3K36 Methylation Regulates Nutrient Stress
Response in Saccharomyces cerevisiae by Enforcing Transcriptional Fidelity. Cell Reports,
19(11):2371–2382.
McDonald, R. B. and Ramsey, J. J. (2010). Honoring Clive McCay and 75 Years of Calorie
Restriction Research. The Journal of Nutrition, 140(7):1205–1210.
McGregor, K., Bernatsky, S., Colmegna, I., Hudson, M., Pastinen, T., Labbe, A., and Green-
wood, C. M. T. (2016). An evaluation of methods correcting for cell-type heterogeneity in
DNA methylation studies. Genome Biology, 17(1):84.
McLaren, W., Gil, L., Hunt, S. E., Riat, H. S., Ritchie, G. R. S., Thormann, A., Flicek, P.,
and Cunningham, F. (2016). The Ensembl Variant Effect Predictor. Genome Biology,
17(1):122.
Medvedev, Z. A. (1990). An attempt at a rational classification of theories of ageing.
Biological Reviews, 65:375–398.
Meer, M. V., Podolskiy, D. I., Tyshkovskiy, A., and Gladyshev, V. N. (2018). A whole
lifespan mouse multi-tissue DNA methylation clock. eLife, 7:e40675.
Meissner, A., Gnirke, A., Bell, G. W., Ramsahoye, B., Lander, E. S., and Jaenisch, R.
(2005). Reduced representation bisulfite sequencing for comparative high-resolution DNA
methylation analysis. Nucleic Acids Research, 33(18):5868–5877.
Meissner, A., Mikkelsen, T. S., Gu, H., Wernig, M., Hanna, J., Sivachenko, A., Zhang, X.,
Bernstein, B. E., Nusbaum, C., Jaffe, D. B., Gnirke, A., Jaenisch, R., and Lander, E. S.
(2008). Genome-scale DNA methylation maps of pluripotent and differentiated cells.
Nature, 454(7205):766–770.
Mihaylova, M. M. and Shaw, R. J. (2011). The AMPK signalling pathway coordinates cell
growth, autophagy and metabolism. Nature Cell Biology, 13:1016–1023.
Milagre, I., Stubbs, T. M., King, M. R., Spindel, J., Santos, F., Krueger, F., Bachman, M.,
Segonds-Pichon, A., Balasubramanian, S., Andrews, S. R., Dean, W., and Reik, W. (2017).
Gender Differences in Global but Not Targeted Demethylation in iPSC Reprogramming.
Cell Reports, 18(5):1079–1089.
Min, K.-W., Zealy, R. W., Davila, S., Fomin, M., Cummings, J. C., Makowsky, D., Mcdowell,
C. H., Thigpen, H., Hafner, M., Kwon, S.-H., Georgescu, C., Wren, J. D., and Yoon, J.-H.
(2018). Profiling of m6A RNA modifications identified an age-associated regulation of
AGO2 mRNA stability. Aging Cell, 17(3):e12753.
Morris, J. Z., Tissenbaum, H. A., and Ruvkun, G. (1996). A phosphatidylinositol-3-OH
kinase family member regulating longevity and diapause in Caenorhabditis elegans. Nature,
382(6591):536–539.
Morris, K. V. and Mattick, J. S. (2014). The rise of regulatory RNA. Nature Reviews Genetics,
15:423–437.
Morris, T. J. and Beck, S. (2015). Analysis pipelines and packages for Infinium Human-
Methylation450 BeadChip (450k) data. Methods, 72:3–8.
Most, J., Tosti, V., Redman, L. M., and Fontana, L. (2017). Calorie restriction in humans:
An update. Ageing Research Reviews, 39:36–45.
Mostoslavsky, R., Chua, K. F., Lombard, D. B., Pang, W. W., Fischer, M. R., Gellon, L.,
Liu, P., Mostoslavsky, G., Franco, S., Murphy, M. M., Mills, K. D., Patel, P., Hsu, J. T.,
Hong, A. L., Ford, E., Cheng, H.-L., Kennedy, C., Nunez, N., Bronson, R., Frendewey,
D., Auerbach, W., Valenzuela, D., Karow, M., Hottiger, M. O., Hursting, S., Barrett,
J. C., Guarente, L., Mulligan, R., Demple, B., Yancopoulos, G. D., and Alt, F. W. (2006).
Genomic Instability and Aging-like Phenotype in the Absence of Mammalian SIRT6. Cell,
124(2):315–329.
Narasimamurthy, R. and Virshup, D. M. (2017). Molecular Mechanisms Regulating Temperature
Compensation of the Circadian Clock.
Naumova, N., Smith, E. M., Zhan, Y., and Dekker, J. (2012). Analysis of long-range
chromatin interactions using Chromosome Conformation Capture. Methods, 58(3):192–
203.
Neri, F., Rapelli, S., Krepelova, A., Incarnato, D., Parlato, C., Basile, G., Maldotti, M.,
Anselmi, F., and Oliviero, S. (2017). Intragenic DNA methylation prevents spurious
transcription initiation. Nature, 543(7643):72–77.
Newell Stamper, B. L., Cypser, J. R., Kechris, K., Kitzenberg, D. A., Tedesco, P. M., and
Johnson, T. E. (2018). Movement decline across lifespan of Caenorhabditis elegans
mutants in the insulin/insulin-like signaling pathway. Aging Cell, 17(1):e12704.
Newman, A. B. and Sanders, J. L. (2013). Telomere Length in Epidemiology: A Biomarker
of Aging, Age-Related Disease, Both, or Neither? Epidemiologic Reviews, 35(1):112–131.
Newman, A. M., Liu, C. L., Green, M. R., Gentles, A. J., Feng, W., Xu, Y., Hoang, C. D.,
Diehn, M., and Alizadeh, A. A. (2015). Robust enumeration of cell subsets from tissue
expression profiles. Nature Methods, 12:453–457.
Ni, Z., Ebata, A., Alipanahiramandi, E., and Lee, S. S. (2012). Two SET domain containing
genes link epigenetic changes and aging in Caenorhabditis elegans. Aging Cell, 11(2):315–
325.
Nikolich-Žugich, J. (2018). The twilight of immunity: emerging concepts in aging of the
immune system. Nature Immunology, 19(1):10–19.
Oberdoerffer, P. and Sinclair, D. A. (2007). The role of nuclear architecture in genomic
instability and ageing. Nature Reviews Molecular Cell Biology, 8:692–702.
Ocampo, A., Reddy, P., Martinez-Redondo, P., Platero-Luengo, A., Hatanaka, F., Hishida,
T., Li, M., Lam, D., Kurita, M., Beyret, E., Araoka, T., Vazquez-Ferrer, E., Donoso, D.,
Roman, J. L., Xu, J., Rodriguez Esteban, C., Nuñez, G., Nuñez Delicado, E., Campistol,
J. M., Guillen, I., Guillen, P., and Izpisua Belmonte, J. C. (2016). In Vivo Amelioration of
Age-Associated Hallmarks by Partial Reprogramming. Cell, 167(7):1719–1733.e12.
Oh, G., Ebrahimi, S., Carlucci, M., Zhang, A., Nair, A., Groot, D. E., Labrie, V., Jia,
P., Oh, E. S., Jeremian, R. H., Susic, M., Shrestha, T. C., Ralph, M. R., Gordevičius,
J., Koncevičius, K., and Petronis, A. (2018). Cytosine modifications exhibit circadian
oscillations that are involved in epigenetic diversity and aging. Nature Communications,
9(1):644.
Oh, G., Koncevičius, K., Ebrahimi, S., Carlucci, M., Groot, D. E., Nair, A., Zhang, A.,
Kriščiūnas, A., Oh, E. S., Labrie, V., Wong, A. H. C., Gordevičius, J., Jia, P., Susic,
M., and Petronis, A. (2019). Circadian oscillations of cytosine modification in humans
contribute to epigenetic variability, aging, and complex disease. Genome Biology, 20(1):2.
Olova, N., Simpson, D. J., Marioni, R. E., and Chandra, T. (2019). Partial reprogramming
induces a steady decline in epigenetic age before loss of somatic identity. Aging Cell,
18(1):e12877.
Orr, W. C. (2016). Tightening the connection between transposable element mobilization
and aging. Proceedings of the National Academy of Sciences, 113(40):11069–11070.
O’Sullivan, R. J. and Karlseder, J. (2010). Telomeres: protecting chromosomes against
genome instability. Nature Reviews Molecular Cell Biology, 11:171–181.
Ou, H. D., Phan, S., Deerinck, T. J., Thor, A., Ellisman, M. H., and O’Shea, C. C. (2017).
ChromEMT: Visualizing 3D chromatin structure and compaction in interphase and mitotic
cells. Science, 357(6349):eaag0025.
Pal, S. and Tyler, J. K. (2016). Epigenetics and aging. Science Advances, 2(7):e1600584.
Partridge, L., Deelen, J., and Slagboom, P. E. (2018). Facing up to the global challenges of
ageing. Nature, 561(7721):45–56.
Patalano, S., Hore, T. A., Reik, W., and Sumner, S. (2012). Shifting behaviour: epigenetic
reprogramming in eusocial insects. Current Opinion in Cell Biology, 24(3):367–373.
Paul, D. S., Guilhamon, P., Karpathakis, A., Butcher, L. M., Thirlwell, C., Feber, A., and
Beck, S. (2014). Assessment of raindrop BS-seq as a method for large-scale, targeted
bisulfite sequencing. Epigenetics, 9(5):678–684.
Penn, N. W., Suwalski, R., O’Riley, C., Bojanowski, K., and Yura, R. (1972). The presence
of 5-hydroxymethylcytosine in animal deoxyribonucleic acid. Biochemical Journal,
126(4):781–790.
Perna, L., Zhang, Y., Mons, U., Holleczek, B., Saum, K.-U., and Brenner, H. (2016).
Epigenetic age acceleration predicts cancer, cardiovascular, and all-cause mortality in a
German case cohort. Clinical Epigenetics, 8(1):64.
Peters, J. (2014). The role of genomic imprinting in biology and disease: an expanding view.
Nature Reviews Genetics, 15:517–530.
Peters, M. J., Joehanes, R., Pilling, L. C., Schurmann, C., Conneely, K. N., Powell, J.,
Reinmaa, E., Sutphin, G. L., Zhernakova, A., Schramm, K., Wilson, Y. A., Kobes, S.,
Tukiainen, T., NABEC/UKBEC Consortium, Ramos, Y. F., Göring, H. H. H., Fornage, M.,
Liu, Y., Gharib, S. A., Stranger, B. E., De Jager, P. L., Aviv, A., Levy, D., Murabito, J. M.,
Munson, P. J., Huan, T., Hofman, A., Uitterlinden, A. G., Rivadeneira, F., van Rooij, J.,
Stolk, L., Broer, L., Verbiest, M. M. P. J., Jhamai, M., Arp, P., Metspalu, A., Tserel, L.,
Milani, L., Samani, N. J., Peterson, P., Kasela, S., Codd, V., Peters, A., Ward-Caviness,
C. K., Herder, C., Waldenberger, M., Roden, M., Singmann, P., Zeilinger, S., Illig, T.,
Homuth, G., Grabe, H.-J., Völzke, H., Steil, L., Kocher, T., Murray, A., Melzer, D.,
Yaghootkar, H., Bandinelli, S., Moses, E. K., Kent, J. W., Curran, J. E., Johnson, M. P.,
Williams-Blangero, S., Westra, H.-J., McRae, A. F., Smith, J. A., Kardia, S. L. R., Hovatta,
I., Perola, M., Ripatti, S., Salomaa, V., Henders, A. K., Martin, N. G., Smith, A. K., Mehta,
D., Binder, E. B., Nylocks, K. M., Kennedy, E. M., Klengel, T., Ding, J., Suchy-Dicey,
A. M., Enquobahrie, D. A., Brody, J., Rotter, J. I., Chen, Y.-D. I., Houwing-Duistermaat,
J., Kloppenburg, M., Slagboom, P. E., Helmer, Q., den Hollander, W., Bean, S., Raj, T.,
Bakhshi, N., Wang, Q. P., Oyston, L. J., Psaty, B. M., Tracy, R. P., Montgomery, G. W.,
Turner, S. T., Blangero, J., Meulenbelt, I., Ressler, K. J., Yang, J., Franke, L., Kettunen,
J., Visscher, P. M., Neely, G. G., Korstanje, R., Hanson, R. L., Prokisch, H., Ferrucci, L.,
Esko, T., Teumer, A., van Meurs, J. B. J., and Johnson, A. D. (2015). The transcriptional
landscape of age in human peripheral blood. Nature Communications, 6:8570.
Petkovich, D. A., Podolskiy, D. I., Lobanov, A. V., Lee, S.-G., Miller, R. A., and Gladyshev,
V. N. (2017). Using DNA Methylation Profiling to Evaluate Biological Age and Longevity
Interventions. Cell Metabolism, 25(4):954–960.e6.
Peto, R. and Doll, R. (1997). There is no such thing as aging. BMJ, 315(7115):1030.
Pidsley, R., Zotenko, E., Peters, T. J., Lawrence, M. G., Risbridger, G. P., Molloy, P., Van
Djik, S., Muhlhausler, B., Stirzaker, C., and Clark, S. J. (2016). Critical evaluation of the
Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation
profiling. Genome Biology, 17(1):208.
Plongthongkum, N., Diep, D. H., and Zhang, K. (2014). Advances in the profiling of DNA
modifications: cytosine methylation and beyond. Nature Reviews Genetics, 15(10):647–661.
Polanowski, A. M., Robbins, J., Chandler, D., and Jarman, S. N. (2014). Epigenetic estimation
of age in humpback whales. Molecular Ecology Resources, 14(5):976–987.
Poulain, M., Herm, A., and Pes, G. (2013). The Blue Zones: areas of exceptional longevity
around the world. Vienna Yearbook of Population Research, 11:87–108.
Price, E. M. and Robinson, W. P. (2018). Adjusting for Batch Effects in DNA Methylation
Microarray Data, a Lesson Learned. Frontiers in Genetics, 9:83.
Pu, M., Ni, Z., Wang, M., Wang, X., Wood, J. G., Helfand, S. L., Yu, H., and Lee, S. S.
(2015). Trimethylation of Lys36 on H3 restricts gene expression change during aging and
impacts life span. Genes and Development, 29(7):718–731.
Putin, E., Mamoshina, P., Aliper, A., Korzinkin, M., Moskalev, A., Kolosov, A., Ostrovskiy,
A., Cantor, C., Vijg, J., and Zhavoronkov, A. (2016). Deep biomarkers of human aging:
Application of deep neural networks to biomarker development. Aging, 8(5):1021–1033.
Quinlan, A. R. and Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing
genomic features. Bioinformatics, 26(6):841–842.
Raddatz, G., Hagemann, S., Aran, D., Söhle, J., Kulkarni, P. P., Kaderali, L., Hellman, A.,
Winnefeld, M., and Lyko, F. (2013). Aging is associated with highly defined epigenetic
changes in the human epidermis. Epigenetics & Chromatin, 6(1):36.
Radford, E. J., Ito, M., Shi, H., Corish, J. A., Yamazawa, K., Isganaitis, E., Seisenberger,
S., Hore, T. A., Reik, W., Erkek, S., Peters, A. H. F. M., Patti, M.-E., and Ferguson-
Smith, A. C. (2014). In utero undernourishment perturbs the adult sperm methylome and
intergenerational metabolism. Science, 345(6198):1255903.
Rahmadi, R., Groot, P., van Rijn, M. H. C., van den Brand, J. A. J. G., Heins, M., Knoop,
H., and Heskes, T. (2017). Causality on longitudinal data: Stable specification search
in constrained structural equation modeling. Statistical Methods in Medical Research,
27(12):3814–3834.
Rakyan, V. K., Down, T. A., Balding, D. J., and Beck, S. (2011). Epigenome-wide association
studies for common human diseases. Nature Reviews Genetics, 12:529–541.
Rakyan, V. K., Down, T. A., Maslau, S., Andrew, T., Yang, T. P., Beyan, H., Whittaker, P.,
McCann, O. T., Finer, S., Valdes, A. M., Leslie, R. D., Deloukas, P., and Spector, T. D.
(2010). Human aging-associated DNA hypermethylation occurs preferentially at bivalent
chromatin domains. Genome Research, 20:434–439.
Rando, T. A. and Chang, H. Y. (2012). Aging, Rejuvenation, and Epigenetic Reprogramming:
Resetting the Aging Clock. Cell, 148(1):46–57.
Reddington, J. P., Perricone, S. M., Nestor, C. E., Reichmann, J., Youngson, N. A., and
Suzuki, M. (2013). Redistribution of H3K27me3 upon DNA hypomethylation results in
de-repression of Polycomb target genes. Genome Biology, 14:R25.
Redman, L. M., Smith, S. R., Burton, J. H., Martin, C. K., Il’yasova, D., and Ravussin,
E. (2018). Metabolic Slowing and Reduced Oxidative Damage with Sustained Caloric
Restriction Support the Rate of Living and Oxidative Damage Theories of Aging. Cell
Metabolism, 27(4):805–815.e4.
Reinberg, D. and Vales, L. D. (2018). Chromatin domains rich in inheritance. Science,
361(6397):33–34.
Reinius, L. E., Acevedo, N., Joerink, M., Pershagen, G., Dahlén, S.-E., Greco, D., Söderhäll,
C., Scheynius, A., and Kere, J. (2012). Differential DNA Methylation in Purified Human
Blood Cells: Implications for Cell Lineage and Studies on Disease Susceptibility. PLOS
ONE, 7(7):e41361.
Remolina, S. C. and Hughes, K. A. (2008). Evolution and mechanisms of long life and high
fertility in queen honey bees. Age, 30(2-3):177–185.
Renfrew, C., Boyd, M. J., and Morley, I. (2016). Death Rituals, Social Order and the
Archaeology of Immortality in the Ancient World.
Zymo Research (2019). EZ DNA Methylation-Direct™ Kit. Technical report.
Richter, A. S., Ryan, D. P., Kilpert, F., Ramírez, F., Heyne, S., and Manke, T. (2019).
pyBigWig GitHub Repository.
Richter, E. A. and Ruderman, N. B. (2009). AMPK and the biochemistry of exercise:
implications for human health and disease. Biochemical Journal, 418(2):261–275.
Ricklefs, R. E. (2010). Life-history connections to rates of aging in terrestrial vertebrates.
Proceedings of the National Academy of Sciences, 107(22):10314–10319.
Riggs, A. D. (1975). X inactivation, differentiation, and DNA methylation. Cytogenetic and
Genome Research, 14(1):9–25.
Rinaldi, L., Datta, D., Serrat, J., Morey, L., Solanas, G., Avgustinova, A., Blanco, E.,
Pons, J. I., Matallanas, D., Von Kriegsheim, A., Di Croce, L., and Benitah, S. A. (2016).
Dnmt3a and Dnmt3b Associate with Enhancers to Regulate Human Epidermal Stem Cell
Homeostasis. Cell Stem Cell, 19(4):491–501.
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., and Smyth, G. K. (2015).
limma powers differential expression analyses for RNA-sequencing and microarray studies.
Nucleic Acids Research, 43(7):e47.
Roberts, R. J., Vincze, T., Posfai, J., and Macelis, D. (2005). REBASE—restriction enzymes
and DNA methyltransferases. Nucleic Acids Research, 33(suppl_1):D230–D232.
Roberts, R. J., Vincze, T., Posfai, J., and Macelis, D. (2015). REBASE—a database for
DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Research,
43(D1):D298–D299.
Roby, J., C., J. A., E., M. R., C., P. L., M., R. L., R., M. P., Weihua, G., Tao, X., E., E. C.,
Stella, A., Hortensia, M.-M., A., S. J., A., B. J., Radhika, D., Paul, Y., S., P. J., Sonja, K.,
H., S. S., F., M. A., Kurt, L., Jin, S., M., A. D., Luigi, F., Wei, Z., W., D. E., Jan, B., L.,
G. M., Tianxiao, H., Chunyu, L., M., M. M., Chen, Y., P., K. D., Annette, P., Rui, W.-S.,
M., V. P., R., W. N., M., S. J., Jingzhong, D., J., R. C., J., W. N., R., I. M., Degui, Z.,
Myrto, B., Paolo, V., Srikant, A., G., U. A., Albert, H., Joel, S., Elena, C., Lifang, H., S.,
V. P., G., H. D., B., S. A., Stefania, B., T., T. S., B., W. E., K., S. A., Torsten, K., B., B. E.,
M., P. B., D., T. K., A., G. S., R., S. B., Liming, L., L., D. D., T., O. G., Zdenko, H., J.,
R. K., N., C. K., Nona, S., R., K. S. L., David, M., A., B. A., J., v. M. J. B., Isabelle, R.,
K., A. D., K., O. K., Yongmei, L., Melanie, W., J., D. I., Myriam, F., Daniel, L., and J.,
L. S. (2016). Epigenetic Signatures of Cigarette Smoking. Circulation: Cardiovascular
Genetics, 9(5):436–447.
Ruby, J. G., Smith, M., and Buffenstein, R. (2018a). Naked mole-rat mortality rates defy
Gompertzian laws by not increasing with age. eLife, 7:e31157.
Ruby, J. G., Wright, K. M., Rand, K. A., Kermany, A., Noto, K., Curtis, D., Varner, N.,
Garrigan, D., Slinkov, D., Dorfman, I., Granka, J. M., Byrnes, J., Myres, N., and Ball, C.
(2018b). Estimates of the Heritability of Human Longevity Are Substantially Inflated due
to Assortative Mating. Genetics, 210(3):1109–1124.
Rulands, S., Lee, H. J., Clark, S. J., Angermueller, C., Smallwood, S. A., Krueger, F.,
Mohammed, H., Dean, W., Nichols, J., Rugg-Gunn, P., Kelsey, G., Stegle, O., Simons,
B. D., and Reik, W. (2018). Genome-Scale Oscillations in DNA Methylation during Exit
from Pluripotency. Cell Systems, 7(1):63–76.e12.
Sánchez-Romero, M. A., Cota, I., and Casadesús, J. (2015). DNA methylation in bacteria:
from the methyl group to the methylome. Current Opinion in Microbiology, 25:9–16.
Sarkar, T. J., Quarta, M., Mukherjee, S., Colville, A., Paine, P., Doan, L., Tran, C. M.,
Chu, C. R., Horvath, S., Bhutani, N., Rando, T. A., and Sebastiano, V. (2019). Transient
non-integrative nuclear reprogramming promotes multifaceted reversal of aging in human
cells. bioRxiv, 573386.
Schenkel, L. C., Kernohan, K. D., McBride, A., Reina, D., Hodge, A., Ainsworth, P. J.,
Rodenhiser, D. I., Pare, G., Bérubé, N. G., Skinner, C., Boycott, K. M., Schwartz, C.,
and Sadikovic, B. (2017). Identification of epigenetic signature associated with alpha
thalassemia/mental retardation X-linked syndrome. Epigenetics & Chromatin, 10(1):10.
Schenkel, L. C., Schwartz, C., Skinner, C., Rodenhiser, D. I., Ainsworth, P. J., Pare, G.,
and Sadikovic, B. (2016). Clinical Validation of Fragile X Syndrome Screening by DNA
Methylation Array. The Journal of Molecular Diagnostics, 18(6):834–841.
Schübeler, D. (2015). Function and information content of DNA methylation. Nature,
517(7534):321–326.
Schultz, M. D., He, Y., Whitaker, J. W., Hariharan, M., Mukamel, E. A., Leung, D., Rajagopal,
N., Nery, J. R., Urich, M. A., Chen, H., Lin, S., Lin, Y., Jung, I., Schmitt, A. D., Selvaraj,
S., Ren, B., Sejnowski, T. J., Wang, W., and Ecker, J. R. (2015). Human body epigenome
maps reveal noncanonical DNA methylation variation. Nature, 523:212–216.
Sehl, M. E., Henry, J. E., Storniolo, A. M., Ganz, P. A., and Horvath, S. (2017). DNA
methylation age is elevated in breast tissue of healthy women. Breast Cancer Research
and Treatment, 164(1):209–219.
Seidler, S., Zimmermann, H. W., Bartneck, M., Trautwein, C., and Tacke, F. (2010). Age-
dependent alterations of monocyte subsets and monocyte-related chemokine pathways in
healthy adults. BMC Immunology, 11(1):30.
Sen, P., Dang, W., Donahue, G., Dai, J., Dorsey, J., Cao, X., Liu, W., Cao, K., Perry, R., Lee,
J. Y., Wasko, B. M., Carr, D. T., He, C., Robison, B., Wagner, J., Gregory, B. D., Kaeberlein,
M., Kennedy, B. K., Boeke, J. D., and Berger, S. L. (2015). H3K36 methylation promotes
longevity by enhancing transcriptional fidelity. Genes and Development, 29(13):1362–
1376.
Sen, P., Shah, P. P., Nativio, R., and Berger, S. L. (2016). Epigenetic Mechanisms of
Longevity and Aging. Cell, 166(4):822–839.
Sheather, S. J. (2009). A Modern Approach to Regression with R.
Shendure, J. and Ji, H. (2008). Next-generation DNA sequencing. Nature Biotechnology,
26:1135.
Shipony, Z., Mukamel, Z., Cohen, N. M., Landan, G., Chomsky, E., Zeliger, S. R., Fried,
Y. C., Ainbinder, E., Friedman, N., and Tanay, A. (2014). Dynamic and static maintenance
of epigenetic memory in pluripotent and somatic cells. Nature, 513(7516):115–119.
Sierro, N., Battey, J. N. D., Ouadi, S., Bakaher, N., Bovet, L., Willig, A., Goepfert, S., Peitsch,
M. C., and Ivanov, N. V. (2014). The tobacco genome sequence and its comparison with
those of tomato and potato. Nature Communications, 5:3833.
Singh, P. P., Demmitt, B. A., Nath, R. D., and Brunet, A. (2019). The Genetics of Aging: A
Vertebrate Perspective. Cell, 177(1):200–220.
Slieker, R. C., Relton, C. L., Gaunt, T. R., Slagboom, P. E., and Heijmans, B. T. (2018). Age-
related DNA methylation changes are tissue-specific with ELOVL2 promoter methylation
as exception. Epigenetics & Chromatin, 11(1):25.
Slieker, R. C., van Iterson, M., Luijk, R., Beekman, M., Zhernakova, D. V., Moed, M. H.,
Mei, H., van Galen, M., Deelen, P., Bonder, M. J., Zhernakova, A., Uitterlinden, A. G.,
Tigchelaar, E. F., Stehouwer, C. D. A., Schalkwijk, C. G., van der Kallen, C. J. H., Hofman,
A., van Heemst, D., de Geus, E. J., van Dongen, J., Deelen, J., van den Berg, L. H., van
Meurs, J., Jansen, R., ’t Hoen, P. A. C., Franke, L., Wijmenga, C., Veldink, J. H., Swertz,
M. A., van Greevenbroek, M. M. J., van Duijn, C. M., Boomsma, D. I., Slagboom, P. E.,
Heijmans, B. T., and BIOS Consortium (2016). Age-related accrual of methylomic variability
is linked to fundamental ageing mechanisms. Genome Biology, 17(1):191.
Smith, Z. D., Gu, H., Bock, C., Gnirke, A., and Meissner, A. (2009). High-throughput
bisulfite sequencing in mammalian genomes. Methods, 48(3):226–232.
Smith, Z. D. and Meissner, A. (2013). DNA methylation: roles in mammalian development.
Nature Reviews Genetics, 14:204–220.
Søraas, A., Matsuyama, M., de Lima, M., Wald, D., Buechner, J., Gedde-Dahl, T., Søraas,
C. L., Chen, B., Ferrucci, L., Dahl, J. A., Horvath, S., and Matsuyama, S. (2019). Epi-
genetic age is a cell-intrinsic property in transplanted human hematopoietic cells. Aging
Cell, 18(2):e12897.
Sørensen, C. S., Schotta, G., and Jørgensen, S. (2013). Histone H4 Lysine 20 methylation: key
player in epigenetic regulation of genomic integrity. Nucleic Acids Research, 41(5):2797–
2806.
Strahl, B. D. and Allis, C. D. (2000). The language of covalent histone modifications. Nature,
403(6765):41–45.
Streubel, G., Watson, A., Jammula, S. G., Scelfo, A., Fitzpatrick, D. J., Oliviero, G., McCole,
R., Conway, E., Glancy, E., Negri, G. L., Dillon, E., Wynne, K., Pasini, D., Krogan,
N. J., Bracken, A. P., and Cagney, G. (2018). The H3K36me2 Methyltransferase Nsd1
Demarcates PRC2-Mediated H3K27me2 and H3K27me3 Domains in Embryonic Stem
Cells. Molecular Cell, 70(2):371–379.e5.
Stroud, H., Do, T., Du, J., Zhong, X., Feng, S., Johnson, L., Patel, D. J., and Jacobsen,
S. E. (2013). Non-CG methylation patterns shape the epigenetic landscape in Arabidopsis.
Nature Structural & Molecular Biology, 21:64–72.
Stroustrup, N., Anthony, W. E., Nash, Z. M., Gowda, V., Gomez, A., López-Moyado, I. F.,
Apfeld, J., and Fontana, W. (2016). The temporal scaling of Caenorhabditis elegans ageing.
Nature, 530:103–107.
Stroustrup, N., Ulmschneider, B. E., Nash, Z. M., López-Moyado, I. F., Apfeld, J., and
Fontana, W. (2013). The Caenorhabditis elegans Lifespan Machine. Nature Methods,
10(7):665–670.
Stubbs, T. M., Bonder, M. J., Stark, A.-K., Krueger, F., von Meyenn, F., Stegle, O., and
Reik, W. (2017). Multi-tissue DNA methylation age predictor in mouse. Genome Biology,
18(1):68.
Stunnenberg, H. G., Abrignani, S., Adams, D., de Almeida, M., Altucci, L., Amin, V., Amit,
I., Antonarakis, S. E., Aparicio, S., Arima, T., Arrigoni, L., Arts, R., Asnafi, V., Esteller,
M., Bae, J.-B., Bassler, K., Beck, S., Berkman, B., Bernstein, B. E., Bilenky, M., Bird,
A., Bock, C., Boehm, B., Bourque, G., Breeze, C. E., Brors, B., Bujold, D., Burren, O.,
Bussemakers, M. J., Butterworth, A., Campo, E., Carrillo-de Santa-Pau, E., Chadwick, L.,
Chan, K. M., Chen, W., Cheung, T. H., Chiapperino, L., Choi, N. H., Chung, H.-R., Clarke,
L., Connors, J. M., Cronet, P., Danesh, J., Dermitzakis, M., Drewes, G., Durek, P., Dyke,
S., Dylag, T., Eaves, C. J., Ebert, P., Eils, R., Eils, J., Ennis, C. A., Enver, T., Feingold,
E. A., Felder, B., Ferguson-Smith, A., Fitzgibbon, J., Flicek, P., Foo, R. S.-Y., Fraser, P.,
Frontini, M., Furlong, E., Gakkhar, S., Gasparoni, N., Gasparoni, G., Geschwind, D. H.,
Glažar, P., Graf, T., Grosveld, F., Guan, X.-Y., Guigo, R., Gut, I. G., Hamann, A., Han,
B.-G., Harris, R. A., Heath, S., Helin, K., Hengstler, J. G., Heravi-Moussavi, A., Herrup,
K., Hill, S., Hilton, J. A., Hitz, B. C., Horsthemke, B., Hu, M., Hwang, J.-Y., Ip, N. Y.,
Ito, T., Javierre, B.-M., Jenko, S., Jenuwein, T., Joly, Y., Jones, S. J. M., Kanai, Y., Kang,
H. G., Karsan, A., Kiemer, A. K., Kim, S. C., Kim, B.-J., Kim, H.-H., Kimura, H., Kinkley,
S., Klironomos, F., Koh, I.-U., Kostadima, M., Kressler, C., Kreuzhuber, R., Kundaje,
A., Küppers, R., Larabell, C., Lasko, P., Lathrop, M., Lee, D. H. S., Lee, S., Lehrach,
H., Leitão, E., Lengauer, T., Lernmark, Å., Leslie, R. D., Leung, G. K. K., Leung, D.,
Loeffler, M., Ma, Y., Mai, A., Manke, T., Marcotte, E. R., Marra, M. A., Martens, J. H. A.,
Martin-Subero, J. I., Maschke, K., Merten, C., Milosavljevic, A., Minucci, S., Mitsuyama,
T., Moore, R. A., Müller, F., Mungall, A. J., Netea, M. G., Nordström, K., Norstedt, I.,
Okae, H., Onuchic, V., Ouellette, F., Ouwehand, W., Pagani, M., Pancaldi, V., Pap, T.,
Pastinen, T., Patel, R., Paul, D. S., Pazin, M. J., Pelicci, P. G., Phillips, A. G., Polansky,
J., Porse, B., Pospisilik, J. A., Prabhakar, S., Procaccini, D. C., Radbruch, A., Rajewsky,
N., Rakyan, V., Reik, W., Ren, B., Richardson, D., Richter, A., Rico, D., Roberts, D. J.,
Rosenstiel, P., Rothstein, M., Salhab, A., Sasaki, H., Satterlee, J. S., Sauer, S., Schacht,
C., Schmidt, F., Schmitz, G., Schreiber, S., Schröder, C., Schübeler, D., Schultze, J. L.,
Schulyer, R. P., Schulz, M., Seifert, M., Shirahige, K., Siebert, R., Sierocinski, T., Siminoff,
L., Sinha, A., Soranzo, N., Spicuglia, S., Spivakov, M., Steidl, C., Strattan, J. S., Stratton,
M., Südbeck, P., Sun, H., Suzuki, N., Suzuki, Y., Tanay, A., Torrents, D., Tyson, F. L.,
Ulas, T., Ullrich, S., Ushijima, T., Valencia, A., Vellenga, E., Vingron, M., Wallace, C.,
Wallner, S., Walter, J., Wang, H., Weber, S., Weiler, N., Weller, A., Weng, A., Wilder, S.,
Wiseman, S. M., Wu, A. R., Wu, Z., Xiong, J., Yamashita, Y., Yang, X., Yap, D. Y., Yip,
K. Y., Yip, S., Yoo, J.-I., Zerbino, D., Zipprich, G., and Hirst, M. (2016). The International
Human Epigenome Consortium: A Blueprint for Scientific Collaboration and Discovery.
Cell, 167(5):1145–1149.
Sun, D., Luo, M., Jeong, M., Rodriguez, B., Xia, Z., Hannah, R., Wang, H., Le, T., Faull,
K. F., Chen, R., Gu, H., Bock, C., Meissner, A., Göttgens, B., Darlington, G. J., Li, W., and
Goodell, M. A. (2014a). Epigenomic profiling of young and aged HSCs reveals concerted
changes during aging that reinforce self-renewal. Cell Stem Cell, 14(5):673–688.
Sun, Y., Hou, R., Fu, X., Sun, C., Wang, S., Wang, C., Li, N., Zhang, L., and Bao, Z. (2014b).
Genome-Wide Analysis of DNA Methylation in Five Tissues of Zhikong Scallop, Chlamys
farreri. PLOS ONE, 9(1):e86232.
Suzuki, M. and Greally, J. M. (2013). Genome-wide DNA Methylation Analysis Using
Massively Parallel Sequencing Technologies. Seminars in Hematology, 50(1):70–77.
Sziráki, A., Tyshkovskiy, A., and Gladyshev, V. N. (2018). Global remodeling of the
mouse DNA methylome during aging and in response to calorie restriction. Aging Cell,
17(3):e12738.
Taher, L., Smith, R. P., Kim, M. J., Ahituv, N., and Ovcharenko, I. (2013). Sequence
signatures extracted from proximal promoters can be used to predict distal enhancers.
Genome Biology, 14(10):R117.
Tahiliani, M., Koh, K. P., Shen, Y., Pastor, W. A., Bandukwala, H., and Brudno, Y. (2009).
Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL
partner TET1. Science, 324(5929):930–935.
Taiwo, O., Wilson, G. A., Morris, T., Seisenberger, S., Reik, W., Pearce, D., Beck, S., and
Butcher, L. M. (2012). Methylome analysis using MeDIP-seq with low DNA concentra-
tions. Nature Protocols, 7:617–636.
Takahashi, Y., Wu, J., Suzuki, K., Martinez-Redondo, P., Li, M., Liao, H.-K., Wu, M.-Z.,
Hernández-Benítez, R., Hishida, T., Shokhirev, M. N., Esteban, C. R., Sancho-Martinez, I.,
and Belmonte, J. C. I. (2017). Integration of CpG-free DNA induces de novo methylation
of CpG islands in pluripotent stem cells. Science, 356(6337):503–508.
Talens, R. P., Christensen, K., Putter, H., Willemsen, G., Christiansen, L., Kremer, D.,
Suchiman, H. E. D., Slagboom, P. E., Boomsma, D. I., and Heijmans, B. T. (2012).
Epigenetic variation during the adult lifespan: cross-sectional and longitudinal data on
monozygotic twin pairs. Aging Cell, 11(4):694–703.
Tan, L., Ke, Z., Tombline, G., Macoretta, N., Hayes, K., Tian, X., Lv, R., Ablaeva, J.,
Gilbert, M., Bhanu, N. V., Yuan, Z.-F., Garcia, B. A., Shi, Y. G., Shi, Y., Seluanov, A., and
Gorbunova, V. (2017). Naked Mole Rat Cells Have a Stable Epigenome that Resists iPSC
Reprogramming. Stem Cell Reports, 9(5):1721–1734.
Tanaka, T., Biancotto, A., Moaddel, R., Moore, A. Z., Gonzalez-Freire, M., Aon, M. A.,
Candia, J., Zhang, P., Cheung, F., Fantoni, G., CHI Consortium, Semba, R. D., and
Ferrucci, L. (2018). Plasma proteomic signature of age in healthy humans. Aging Cell,
17(5):e12799.
Tanas, A. S., Borisova, M. E., Kuznetsova, E. B., Rudenko, V. V., Karandasheva, K. O.,
Nemtsova, M. V., Izhevskaya, V. L., Simonova, O. A., Larin, S. S., Zaletaev, D. V., and
Strelnikov, V. V. (2017). Rapid and affordable genome-wide bisulfite DNA sequencing by
XmaI-reduced representation bisulfite sequencing. Epigenomics, 9(6):833–847.
Tang, W. W. C., Kobayashi, T., Irie, N., Dietmann, S., and Surani, M. A. (2016). Specification
and epigenetic programming of the human germ line. Nature Reviews Genetics, 17:585–
600.
Taudt, A., Colomé-Tatché, M., and Johannes, F. (2016). Genetic sources of population
epigenomic variation. Nature Reviews Genetics, 17:319–332.
Teschendorff, A. E., Breeze, C. E., Zheng, S. C., and Beck, S. (2017). A comparison of
reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide
Association Studies. BMC Bioinformatics, 18(1):105.
Teschendorff, A. E., Marabita, F., Lechner, M., Bartlett, T., Tegner, J., Gomez-Cabrero, D.,
and Beck, S. (2012). A Beta-Mixture Quantile Normalisation method for correcting probe
design bias in Illumina Infinium 450k DNA methylation data. Bioinformatics (Oxford,
England), 29(2):189–196.
Teschendorff, A. E., Menon, U., Gentry-Maharaj, A., Ramus, S. J., Weisenberger, D. J., Shen,
H., Campan, M., Noushmehr, H., Bell, C. G., Maxwell, A. P., Savage, D. A., Mueller-
Holzner, E., Marth, C., Kocjan, G., Gayther, S. A., Jones, A., Beck, S., Wagner, W., Laird,
P. W., Jacobs, I. J., and Widschwendter, M. (2010). Age-dependent DNA methylation
of genes that are suppressed in stem cells is a hallmark of cancer. Genome Research,
20(4):440–446.
Teschendorff, A. E. and Relton, C. L. (2018). Statistical and integrative system-level analysis
of DNA methylation data. Nature Reviews Genetics, 19:129–147.
Teschendorff, A. E., Yang, Z., Wong, A., Pipinikas, C. P., Jiao, Y., Jones, A., Anjum, S.,
Hardy, R., Salvesen, H. B., Thirlwell, C., Janes, S. M., Kuh, D., and Widschwendter, M.
(2015). Correlation of Smoking-Associated DNA Methylation Changes in Buccal Cells
With DNA Methylation Changes in Epithelial Cancer. JAMA Oncology, 1(4):476–485.
Teschendorff, A. E. and Zheng, S. C. (2017a). Cell-type deconvolution in epigenome-wide
association studies: a review and recommendations. Epigenomics, 9(5):757–768.
Teschendorff, A. E. and Zheng, S. C. (2017b). EpiDISH Bioconductor Package.
Thompson, M. J., Chwiałkowska, K., Rubbi, L., Lusis, A. J., Davis, R. C., Srivastava, A.,
Korstanje, R., Churchill, G. A., Horvath, S., and Pellegrini, M. (2018). A multi-tissue full
lifespan epigenetic clock for mice. Aging, 10(10):2832–2854.
Thompson, M. J., von Holdt, B., Horvath, S., and Pellegrini, M. (2017). An epigenetic aging
clock for dogs and wolves. Aging, 9(3):1055–1068.
Thomson, W. (1889). Popular lectures and addresses. London Macmillan.
Titus, A. J., Gallimore, R. M., Salas, L. A., and Christensen, B. C. (2017). Cell-type
deconvolution from DNA methylation: a review of recent applications. Human Molecular
Genetics, 26(R2):R216–R224.
Tomás-Loba, A., Flores, I., Fernández-Marcos, P. J., Cayuela, M. L., Maraver, A., Tejera, A.,
Borrás, C., Matheu, A., Klatt, P., Flores, J. M., Viña, J., Serrano, M., and Blasco, M. A.
(2008). Telomerase Reverse Transcriptase Delays Aging in Cancer-Resistant Mice. Cell,
135(4):609–622.
Tomida, M. W., Gaddis, S., Takata, Y., Liu, B., Lin, K., Estecio, M. R., Hardikar, S., Lu,
Y., Veland, N., Zeng, Y., Chen, T., Shen, J., Saha, D., Gowher, H., and Zhao, H. (2018).
DNMT3L facilitates DNA methylation partly by maintaining DNMT3A stability in mouse
embryonic stem cells. Nucleic Acids Research, 47(1):152–167.
Touleimat, N. and Tost, J. (2012). Complete pipeline for Infinium® Human Methylation
450K BeadChip data processing using subset quantile normalization for accurate DNA
methylation estimation. Epigenomics, 4(3):325–341.
Triche Jr, T. J., Weisenberger, D. J., Van Den Berg, D., Laird, P. W., and Siegmund, K. D.
(2013). Low-level processing of Illumina Infinium DNA Methylation BeadArrays. Nucleic
Acids Research, 41(7):e90.
Trojer, P. and Reinberg, D. (2007). Facultative Heterochromatin: Is There a Distinctive
Molecular Signature? Molecular Cell, 28(1):1–13.
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein,
D., and Altman, R. B. (2001). Missing value estimation methods for DNA microarrays.
Bioinformatics, 17(6):520–525.
Truong, T. P., Sakata-Yanagimoto, M., Yamada, M., Nagae, G., Enami, T., Nakamoto-
Matsubara, R., Aburatani, H., and Chiba, S. (2015). Age-Dependent Decrease of DNA Hy-
droxymethylation in Human T Cells. Journal of Clinical and Experimental Hematopathol-
ogy, 55(1):1–6.
Tsurumi, A. and Li, W. (2012). Global heterochromatin loss: A unifying theory of aging?
Epigenetics, 7(7):680–688.
Tullet, J. M. A., Hertweck, M., An, J. H., Baker, J., Hwang, J. Y., Liu, S., Oliveira, R. P.,
Baumeister, R., and Blackwell, T. K. (2008). Direct Inhibition of the Longevity-Promoting
Factor SKN-1 by Insulin-like Signaling in C. elegans. Cell, 132(6):1025–1038.
Um, S. H., D’Alessio, D., and Thomas, G. (2006). Nutrient overload, insulin resistance, and
ribosomal protein S6 kinase 1, S6K1. Cell Metabolism, 3(6):393–402.
van Dongen, J., Nivard, M. G., Willemsen, G., Hottenga, J.-J., Helmer, Q., Dolan, C. V.,
Ehli, E. A., Davies, G. E., van Iterson, M., Breeze, C. E., Beck, S., BIOS Consortium, Hoen,
P. A., Pool, R., van Greevenbroek, M. M. J., Stehouwer, C. D. A., van der Kallen, C. J. H.,
Schalkwijk, C. G., Wijmenga, C., Zhernakova, S., Tigchelaar, E. F., Beekman, M., Deelen,
J., van Heemst, D., Veldink, J. H., van den Berg, L. H., van Duijn, C. M., Hofman, B. A.,
Uitterlinden, A. G., Jhamai, P. M., Verbiest, M., Verkerk, M., van der Breggen, R., van
Rooij, J., Lakenberg, N., Mei, H., Bot, J., Zhernakova, D. V., van’t Hof, P., Deelen, P.,
Nooren, I., Moed, M., Vermaat, M., Luijk, R., Bonder, M. J., van Dijk, F., van Galen,
M., Arindrarto, W., Kielbasa, S. M., Swertz, M. A., van Zwet, E. W., Isaacs, A., Franke,
L., Suchiman, H. E., Jansen, R., van Meurs, J. B., Heijmans, B. T., Slagboom, P. E., and
Boomsma, D. I. (2016). Genetic and environmental influences interact with age and sex in
shaping the human methylome. Nature Communications, 7:11115.
van Iterson, M., van Zwet, E. W., Heijmans, B. T., and the BIOS Consortium (2017). Controlling
bias and inflation in epigenome- and transcriptome-wide association studies using the
empirical null distribution. Genome Biology, 18(1):19.
Villeponteau, B. (1997). The heterochromatin loss model of aging. Experimental Gerontol-
ogy, 32(4):383–394.
Visscher, P. M., Wray, N. R., Zhang, Q., Sklar, P., McCarthy, M. I., Brown, M. A., and Yang,
J. (2017). 10 Years of GWAS Discovery: Biology, Function, and Translation.
Voigt, P., Tee, W. W., and Reinberg, D. (2013). A double take on bivalent promoters. Genes
and Development, 27:1318–1338.
Waddington, C. H. (1942). The Epigenotype. Endeavour, 1:18–20.
Waddington, C. H. (1957). The cybernetics of development. In The strategy of the genes,
pages 27–38.
Wagner, E. J. and Carpenter, P. B. (2012). Understanding the language of Lys36 methylation
at histone H3. Nature Reviews Molecular Cell Biology, 13:115–126.
Wang, J., Xia, Y., Li, L., Gong, D., Yao, Y., Luo, H., Lu, H., Yi, N., Wu, H., Zhang, X., Tao,
Q., and Gao, F. (2013). Double restriction-enzyme digestion improves the coverage and
accuracy of genome-wide CpG methylation profiling by reduced representation bisulfite
sequencing. BMC Genomics, 14:11.
Wang, T., Tsui, B., Kreisberg, J. F., Robertson, N. A., Gross, A. M., Yu, M. K., Carter, H.,
Brown-Borg, H. M., Adams, P. D., and Ideker, T. (2017). Epigenetic aging signatures in
mice livers are slowed by dwarfism, calorie restriction and rapamycin treatment. Genome
Biology, 18(1):57.
Wei, M., Brandhorst, S., Shelehchi, M., Mirzaei, H., Cheng, C. W., Budniak, J., Groshen,
S., Mack, W. J., Guen, E., Di Biase, S., Cohen, P., Morgan, T. E., Dorff, T., Hong,
K., Michalsen, A., Laviano, A., and Longo, V. D. (2017). Fasting-mimicking diet and
markers/risk factors for aging, diabetes, cancer, and cardiovascular disease. Science
Translational Medicine, 9(377):eaai8700.
Weidner, C. I., Lin, Q., Koch, C. M., Eisele, L., Beier, F., and Ziegler, P. (2014). Aging of
blood can be tracked by DNA methylation changes at just three CpG sites. Genome Biology,
15:R24.
West, J., Teschendorff, A. E., and Beck, S. (2013). Age-associated epigenetic drift: implica-
tions, and a case of epigenetic thrift? Human Molecular Genetics, 22(R1):R7–R15.
Whitaker, J. W., Chen, Z., and Wang, W. (2014). Predicting the human epigenome from
DNA motifs. Nature Methods, 12:265–272.
Widschwendter, M., Jones, A., Evans, I., Reisel, D., Dillner, J., Sundström, K., Steyerberg,
E. W., Vergouwe, Y., Wegwarth, O., Rebitschek, F. G., Siebert, U., Sroczynski, G.,
de Beaufort, I. D., Bolt, I., Cibula, D., Zikan, M., Bjørge, L., Colombo, N., Harbeck,
N., Dudbridge, F., Tasse, A.-M., Knoppers, B. M., Joly, Y., Teschendorff, A. E., and
Pashayan, N. (2018). Epigenome-based cancer risk prediction: rationale, opportunities
and challenges. Nature Reviews Clinical Oncology, 15:292–309.
Wilhelm-Benartzi, C. S., Koestler, D. C., Karagas, M. R., Flanagan, J. M., Christensen, B. C.,
Kelsey, K. T., Marsit, C. J., Houseman, E. A., and Brown, R. (2013). Review of processing
and analysis methods for DNA methylation array data. British Journal of Cancer, 109(6):1394–1402.
Williams, G. C. (1957). Pleiotropy, Natural Selection, and the Evolution of Senescence.
Evolution, 11(4):398–411.
Witten, M. (1986). Information content of biological survival curves arising in aging
experiments: some further thoughts. In Evolution of longevity in animals: a comparative
approach., pages 295–317.
Wu, C.-t. and Morris, J. R. (2001). Genes, Genetics, and Epigenetics: A Correspondence.
Science, 293(5532):1103–1105.
Wu, H., Xu, T., Feng, H., Chen, L., Li, B., Yao, B., Qin, Z., Jin, P., and Conneely, K. N. (2015).
Detection of differentially methylated regions from whole-genome bisulfite sequencing
data without replicates. Nucleic Acids Research, 43(21):e141–e141.
Wu, T. P., Wang, T., Seetin, M. G., Lai, Y., Zhu, S., Lin, K., Liu, Y., Byrum, S. D., Mackintosh,
S. G., Zhong, M., Tackett, A., Wang, G., Hon, L. S., Fang, G., Swenberg, J. A., and Xiao,
A. Z. (2016). DNA methylation on N6-adenine in mammalian embryonic stem cells.
Nature, 532:1–18.
Wu, X. and Zhang, Y. (2017). TET-mediated active DNA demethylation: mechanism,
function and beyond. Nature Reviews Genetics, 18:517–534.
Wutz, A. (2011). Gene silencing in X-chromosome inactivation: advances in understanding
facultative heterochromatin formation. Nature Reviews Genetics, 12:542–553.
Xiao, C.-L., Zhu, S., He, M., Chen, D., Zhang, Q., Chen, Y., Yu, G., Liu, J., Xie, S.-
Q., Luo, F., Liang, Z., Wang, D.-P., Bo, X.-C., Gu, X.-F., Wang, K., and Yan, G.-R.
(2018). N6-Methyladenine DNA Modification in the Human Genome. Molecular Cell,
71(2):306–318.e7.
Xie, H., Wang, M., De Andrade, A., Bonaldo, M. D. F., Galat, V., Arndt, K., Rajaram, V.,
Goldman, S., Tomita, T., and Soares, M. B. (2011). Genome-wide quantitative assessment
of variation in DNA methylation patterns. Nucleic Acids Research, 39(10):4099–4108.
Xie, W., Schultz, M. D., Lister, R., Hou, Z., Rajagopal, N., Ray, P., Whitaker, J. W., Tian,
S., Hawkins, R. D., Leung, D., Yang, H., Wang, T., Lee, A. Y., Swanson, S. A., Zhang,
J., Zhu, Y., Kim, A., Nery, J. R., Urich, M. A., Kuan, S., Yen, C.-A., Klugman, S., Yu,
P., Suknuntha, K., Propson, N. E., Chen, H., Edsall, L. E., Wagner, U., Li, Y., Ye, Z.,
Kulkarni, A., Xuan, Z., Chung, W.-Y., Chi, N. C., Antosiewicz-Bourget, J. E., Slukvin, I.,
Stewart, R., Zhang, M. Q., Wang, W., Thomson, J. A., Ecker, J. R., and Ren, B. (2013).
Epigenomic Analysis of Multilineage Differentiation of Human Embryonic Stem Cells.
Cell, 153(5):1134–1148.
Xu, M., Pirtskhalava, T., Farr, J. N., Weigand, B. M., Palmer, A. K., Weivoda, M. M., Inman,
C. L., Ogrodnik, M. B., Hachfeld, C. M., Fraser, D. G., Onken, J. L., Johnson, K. O.,
Verzosa, G. C., Langhi, L. G. P., Weigl, M., Giorgadze, N., LeBrasseur, N. K., Miller,
J. D., Jurk, D., Singh, R. J., Allison, D. B., Ejima, K., Hubbard, G. B., Ikeno, Y., Cubro,
H., Garovic, V. D., Hou, X., Weroha, S. J., Robbins, P. D., Niedernhofer, L. J., Khosla,
S., Tchkonia, T., and Kirkland, J. L. (2018). Senolytics improve physical function and
increase lifespan in old age. Nature Medicine, 24(8):1246–1256.
Yang, L., Rodriguez, B., Mayle, A., Park, H. J., Lin, X., Luo, M., Jeong, M., Curry, C. V.,
Kim, S.-B., Ruau, D., Zhang, X., Zhou, T., Zhou, M., Rebel, V. I., Challen, G. A.,
Göttgens, B., Lee, J.-S., Rau, R., Li, W., and Goodell, M. A. (2016a). DNMT3A Loss
Drives Enhancer Hypomethylation in FLT3-ITD-Associated Leukemias. Cancer Cell,
30(2):363–365.
Yang, Y., Sebra, R., Pullman, B. S., Qiao, W., Peter, I., Desnick, R. J., Geyer, C. R., DeCoteau,
J. F., and Scott, S. A. (2015). Quantitative and multiplexed DNA methylation analysis using
long-read single-molecule real-time bisulfite sequencing (SMRT-BS). BMC Genomics,
16(1):350.
Yang, Y. C., Boen, C., Gerken, K., Li, T., Schorpp, K., and Harris, K. M. (2016b). Social
relationships and physiological determinants of longevity across the human life span.
Proceedings of the National Academy of Sciences, 113(3):578–583.
Yang, Z., Wong, A., Kuh, D., Paul, D. S., Rakyan, V. K., Leslie, R. D., Zheng, S. C.,
Widschwendter, M., Beck, S., and Teschendorff, A. E. (2016c). Correlation of an epigenetic
mitotic clock with cancer risk. Genome Biology, 17(1):205.
Yong, W.-S., Hsu, F.-M., and Chen, P.-Y. (2016). Profiling genome-wide DNA methylation.
Epigenetics & Chromatin, 9(1):26.
Yu, L., Liu, C., Bennett, K., Wu, Y.-Z., Dai, Z., Vandeusen, J., Opavsky, R., Raval, A., Trikha,
P., Rodriguez, B., Becknell, B., Mao, C., Lee, S., Davuluri, R. V., Leone, G., Van den
Veyver, I. B., Caligiuri, M. A., and Plass, C. (2004). A NotI–EcoRV promoter library
for studies of genetic and epigenetic alterations in mouse models of human malignancies.
Genomics, 84(4):647–660.
Yuan, T., Jiao, Y., de Jong, S., Ophoff, R. A., Beck, S., and Teschendorff, A. E. (2015). An
Integrative Multi-scale Analysis of the Dynamic DNA Methylation Landscape in Aging.
PLOS Genetics, 11(2):e1004996.
Zbieć-Piekarska, R., Spólnicka, M., Kupiec, T., Makowska, Ż., Spas, A., Parys-Proszek, A.,
Kucharczyk, K., Płoski, R., and Branicki, W. (2015). Examination of DNA methylation
status of the ELOVL2 marker may be useful for human age prediction in forensic science.
Forensic Science International: Genetics, 14:161–167.
Zeng, J., Nagrajan, H. K., and Yi, S. V. (2014). Fundamental diversity of human CpG islands
at multiple biological levels. Epigenetics, 9(4):483–491.
Zhang, R., Chen, W., and Adams, P. D. (2007). Molecular Dissection of Formation
of Senescence-Associated Heterochromatin Foci. Molecular and Cellular Biology,
27(6):2343–2358.
Zhang, W., Li, J., Suzuki, K., Qu, J., Wang, P., Zhou, J., Liu, X., Ren, R., Xu, X., Ocampo,
A., Yuan, T., Yang, J., Li, Y., Shi, L., Guan, D., Pan, H., Duan, S., Ding, Z., Li, M., Yi, F.,
Bai, R., Wang, Y., Chen, C., Yang, F., Li, X., Wang, Z., Aizawa, E., Goebl, A., Soligalla,
R. D., Reddy, P., Esteban, C. R., Tang, F., Liu, G.-H., and Belmonte, J. C. I. (2015a). A
Werner syndrome stem cell model unveils heterochromatin alterations as a driver of human
aging. Science, 348(6239):1160–1163.
Zhang, W., Spector, T. D., Deloukas, P., Bell, J. T., and Engelhardt, B. E. (2015b). Predicting
genome-wide DNA methylation using methylation marks, genomic position, and DNA
regulatory elements. Genome Biology, 16(1):14.
Zheng, S. C., Breeze, C. E., Beck, S., and Teschendorff, A. E. (2018). Identification
of differentially methylated cell types in epigenome-wide association studies. Nature
Methods, 15(12):1059–1066.
Zheng, S. C., Widschwendter, M., and Teschendorff, A. E. (2016). Epigenetic drift, epigenetic
clocks and cancer risk. Epigenomics, 8(5):705–719.
Zhou, W., Dinh, H. Q., Ramjan, Z., Weisenberger, D. J., Nicolet, C. M., Shen, H., Laird,
P. W., and Berman, B. P. (2018). DNA methylation loss in late-replicating domains is
linked to mitotic cell division. Nature Genetics, 50(4):591–602.
Zhu, T., Zheng, S. C., Paul, D. S., Horvath, S., and Teschendorff, A. E. (2018). Cell and
tissue type independent age-associated DNA methylation changes are not rare but common.
Aging, 10(11):3541–3557.
Zhuang, J., Widschwendter, M., and Teschendorff, A. E. (2012). A comparison of feature se-
lection and classification methods in DNA methylation studies using the Illumina Infinium
platform. BMC Bioinformatics, 13(1):59.
Ziller, M. J., Gu, H., Mueller, F., Donaghey, J., Tsai, L. T., and Kohlbacher, O. (2013).
Charting a dynamic DNA methylation landscape of the human genome. Nature, 500:477–
481.
Ziller, M. J., Müller, F., Liao, J., Zhang, Y., Gu, H., Bock, C., Boyle, P., Epstein, C. B.,
Bernstein, B. E., Lengauer, T., Gnirke, A., and Meissner, A. (2011). Genomic Distribution
and Inter-Sample Variation of Non-CpG Methylation across Human Cell Types. PLOS
Genetics, 7(12):e1002389.