DNA methylation: the “stable” epigenetic mark 
 
 
Amir Daniel Hay 

St John’s College 

 
This dissertation is submitted for the degree of  

Doctor of Philosophy 

 
Department of Genetics 

University of Cambridge 

 
September 2022 
 
 
Declaration 
 

This thesis is the result of my own work and includes nothing which is the outcome of work 

done in collaboration except as declared in the text. The use of “we” reflects my own work, 

unless specifically stated otherwise. It is not substantially the same as any work that has 

already been submitted for any degree to any university or institution. It does not exceed 

the prescribed word limit for the School of Biological Sciences Degree Committee.  

 
Amir Daniel Hay 

September 2022 

  
Acknowledgements 
 

I would like to thank my supervisor Anne Ferguson-Smith for taking me on as a student, 

supporting me throughout my time in Cambridge, and pushing me to become a better and 

more complete scientist. Under your guidance, the scope of what I have learned ranges 

from the very technical to the highly conceptual; with your trust, I have progressed to 

conducting independent research with the knowledge that I can always do better. The 

opportunity to do research in such an enriching environment has been an immense privilege 

for which I cannot thank you enough – a gratitude that I would also like to express to all 

the past and present members of the AFS lab.  

I am especially grateful to fellow PhD student Noah Kessler who has been a peer, 

mentor, and friend, by showing me the ropes of bioinformatics, challenging me to go 

beyond my perceived limits, and knowing when it is a good time to take a coffee break. 

Another special thank you goes to Jessica Elmer for her collaborative ethos that made 

working together both fun and interesting. Thank you to Tessa Bertozzi for welcoming me 

to the lab, introducing me to the pyrosequencer, teaching me how to work with mice, and 

making me feel at home in Cambridge.  

Thank you to Carol Edwards and Fran Dearden for help with 4C; to Mitsuteru Ito 

for teaching me how to make MEFs and providing the much-needed day-to-day equipment 

maintenance that allowed me to work in the lab relatively unimpeded; to Nozomi Takahashi 

for teaching me how to make mESCs, managing the Dnmt3a/3b mutant mice, and for 

always providing me with feedback on my project; to Geula Hanin, Shrina Patel, Stephanie 

Telerman, and Boshra Alsulaiti for support with western blotting; to Jessi Becker for taking 

on the CTCF project; to Hugo Tavares for supervising my RNA-seq analyses; to Chrysante 

Iliakis for her assistance at the bench; and to the students whom I had the opportunity to 

supervise, William Xie, Eve Ainscough, Gloria Jansen, and William Saunter, for all their 

hard work.  


Outside the AFS lab, the most influential figure on my work has been Felipe 

Teixeira. The concepts behind Chapter 3 of this thesis arose from our conversations in the 

department hallways, and simple offhand drawings on the white board that would 

eventually become serious experiments. These humble beginnings spurred a full-fledged 

collaboration regarding the direction and scope of the project for which I am extremely 

appreciative. I would also like to thank Daniel Gebert, postdoc in the Teixeira group, for 

working with me to elucidate the relationship between DNA methylation fidelity and 

transposable elements. Our other collaborators include Jamie Hackett, whom I would 

like to thank for sharing piRNA mutant mouse material with us. I am looking forward to 

our continued collaboration with Ben Simons, Steffen Rulands, and Matteo Ciarchi to 

model the inheritance of methylation – an exciting project from which I have learned about 

the intricacy of interdisciplinary science, as well as the rewards. 

I have received a fair amount of technical advice and guidance, as well as general 

goodwill, that has proven to be critical for the completion of this thesis. Thank you to my 

advisor Julie Ahringer and her research associate Alex Appert for giving me access to a 

sequencer to test my first ChIP-seq libraries; to Michael Imbeault for helping me optimise 

my ChIP protocol; to Ben Harvey from Agilent for supervising me the first time I 

performed target capture bisulphite sequencing; to Rahia Mashoodh for advice on statistical 

analyses; to the mouse facility for providing care to our mice; to Novogene for sequencing 

our 4C libraries and to the CRUK genomics facility for sequencing the rest (and majority) 

of our libraries; to Dan Holland and Ian Henderson for lending me a crucial component of 

the Covaris sonicator; and to Yoach Rais from the Weizmann Institute for sending me purified 

TAT-CRE recombinase to perform my induced knockout experiment. 

Thank you to the Cambridge Trust, the Department of Genetics, and St John’s 

College for supporting me financially.  

 
Lastly, I want to thank my parents for their endless support physically, mentally, and 

scientifically - I am lucky to have you.  

 
Summary 
 

DNA methylation is regarded as a stable epigenetic mark given its faithful maintenance 

across successive cell divisions. Methylation occurs at most CpG sites in mammalian 

genomes and is generally associated with transcriptional repression. An accepted 

evolutionary role for DNA methylation is to prevent the mobility of transposable elements 

(TEs). This thesis investigates the stability of DNA methylation in two separate contexts, 

particularly relating to intermediate levels of methylation. First, I characterise the 

properties of variably methylated TEs (VM-TEs) between individual mice. Second, I assess 

the fidelity of DNA methylation inheritance across cellular generations at VM-TEs, and 

more widely at the genome-scale, to ascertain the heritability, mechanism, and function of 

intermediate methylation states. 

My findings show that variable methylation extends beyond the boundaries of the 

TEs, and that all VM-TEs are enriched for binding of the transcription factor CTCF, which 

is inversely correlated with DNA methylation. I propose that molecular antagonism 

between CTCF and DNA methylation machinery influences the formation of variably 

methylated states in the early embryo. 

Within an individual mouse, VM-TEs are intermediately methylated between 10% 

and 90%, representing the cell population average of methylation states. The prevailing 

hypothesis is that methylation is established de novo by DNA 

methyltransferases DNMT3A/3B and then faithfully maintained by DNMT1. Hence, 

intermediate methylation levels likely represent stochastic de novo establishment 

(DNMT3A/3B) and clonal maintenance (DNMT1) within the cell population.  

To test this, I subcloned single cells from both mouse embryonic fibroblasts (MEFs) 

and embryonic stem cells (mESCs), growing them into multiple subclonal populations to 

assess methylation fidelity through cell divisions. This allowed me to address the degree to 

which a particular locus acquires intermediate methylation in the clonal population, as well 


as the properties and mechanism of that state. If methylation is indeed propagated 

faithfully, one would expect that the single-cell derived populations always exhibit one of 

the three symmetric methylation states: 0%, 50% or 100%. At VM-TEs, we find that the 

subcloned cell lines attain intermediate methylation levels that reflect the level of the parent 

population, which implies that the original single-cell methylation state is not faithfully 

maintained at these loci.  
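
The expectation of three discrete states can be illustrated with a minimal simulation sketch. It assumes a diploid locus with a single CpG per allele, a founder cell whose two alleles are each either methylated or unmethylated, and a hypothetical per-division parameter (here called fidelity) giving the probability that a daughter allele copies the parental state. The function names, parameters, and values are illustrative assumptions and are not taken from the experiments or analyses in this thesis.

```python
import random


def founder_states(parent_level, rng):
    """Draw the two allele states of one founder cell (1 = methylated, 0 = unmethylated).

    parent_level is the mean methylation of the parent population (0 to 1).
    """
    return [1 if rng.random() < parent_level else 0 for _ in range(2)]


def grow_clone(founder, divisions, fidelity, redraw_level, rng):
    """Expand one founder cell and return the mean methylation of the clone (0 to 1).

    fidelity is an assumed per-allele probability that a daughter copies the
    parental state at each division; otherwise the state is redrawn, becoming
    methylated with probability redraw_level (a hypothetical model of
    unfaithful inheritance).
    """
    cells = [list(founder)]
    for _ in range(divisions):
        daughters = []
        for cell in cells:
            for _ in range(2):  # every cell gives rise to two daughter cells
                daughters.append([
                    allele if rng.random() < fidelity
                    else int(rng.random() < redraw_level)
                    for allele in cell
                ])
        cells = daughters
    return sum(sum(cell) for cell in cells) / (2 * len(cells))


if __name__ == "__main__":
    rng = random.Random(1)
    parent_level = 0.5  # illustrative parent population methylation
    for fidelity in (1.0, 0.9):
        levels = [
            grow_clone(founder_states(parent_level, rng), 8, fidelity, parent_level, rng)
            for _ in range(5)
        ]
        print(fidelity, [round(100 * level) for level in levels])
```

Under these assumptions, a fidelity of 1.0 can only yield clones at 0%, 50%, or 100%, whereas lower fidelity generally yields intermediate clone levels, which is the pattern reported above for VM-TEs.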

Expanding the analysis genome-wide, I use a target capture bisulphite sequencing 

method to evaluate methylation fidelity in the subclonal cell lines more globally. I find that 

CpGs exhibiting intermediate methylation at the cell population level are generally 

unfaithfully inherited between cell divisions and attain methylation independently of 

neighbouring CpGs. While faithful hypo- and hypermethylation associate with 

transcriptional activity, unfaithful intermediate methylation associates with 

transcriptionally inactive genes or intergenic regions of the genome. Finally, in 

DNMT3A/3B mutants, methylation is not depleted consistently at any CpG, regardless of 

its methylation state in the control. Therefore, DNMT1 has two functions: 1) canonical 

maintenance of faithful methylation and 2) as shown here, the 

acquisition of intermediate methylation states that are unfaithfully inherited between cell 

divisions. 

 
Contents 
 

Chapter 1 Introduction
1.1 DNA methylation
1.1.1 Discovering DNA methylation: from prokaryotes to eukaryotes
1.1.2 Measuring DNA methylation
1.1.3 Principles and patterns of DNA methylation
1.1.4 The maintenance and establishment of DNA methylation
1.1.5 Crosstalk between DNA methylation and histone tail modifications
1.1.6 DNA methylation fidelity
1.2 Transposable elements
1.2.1 Retrotransposons
1.2.2 Intracisternal A-particles (IAPs)
1.2.3 Mechanisms for silencing transposable elements in mammals
1.2.4 Metastable epialleles
1.3 CTCF: transcription factor and regulator of genomic architecture
1.3.1 CTCF and DNA methylation
1.4 Research aims and thesis overview
1.4.1 Aims
1.4.2 Structure and overview

Chapter 2 Genomic properties of variably methylated retrotransposons in mouse
2.1 Introduction and objectives
2.2 Results
2.2.1 Characterising VM-IAPs by CpG density
2.2.2 Inter-individual methylation variability is not confined to the LTR boundaries of VM-IAPs
2.2.3 CTCF and its motif are enriched at VM-IAPs
2.2.4 CTCF binding and DNA methylation have an inverse relationship at VM-IAPs
2.2.5 Chromatin interactions with VM-IAPs
2.2.6 DNA methylation at VM-IAPs is regulated independently of somatic and maternally derived piRNAs
2.3 Discussion

Chapter 3 Stochastic and faithful inheritance define DNA methylation patterns through cell divisions
3.1 Introduction and objectives
3.2 Methodology
3.3 Results
3.3.1 Methylation fidelity of VM-IAPs
3.3.2 Evaluating methylation fidelity at the genome-scale
3.3.3 Methylation fidelity at transposable elements
3.3.4 Investigating stochastic methylation inheritance
3.4 Discussion
3.4.1 Variable methylation at VM-IAPs is both stochastically established and maintained
3.4.2 Genome-wide, intermediate methylation is unfaithfully inherited between cell divisions yet can be stochastically retained in a cell population
3.4.3 Methylation fidelity is associated with transcription
3.4.4 Methylation fidelity at transposable elements is determined by genomic location
3.4.5 Stochastic methylation associates with repressive histone marks
3.4.6 Both stochastic and non-stochastic methylation deposition is mediated by DNMT1
3.4.7 Future work on stochastic methylation regulation in mutant mice

Chapter 4 Discussion
4.1 Somatic DNA methylation: “accident” or design?
4.1.1 What is the function of DNA methylation in mammals?
4.1.2 Stochastic inheritance is an inherent characteristic of DNA methylation regulation in the genome
4.2 Modelling methylation inheritance
4.3 Implications for DNA methylation as a biomarker
4.4 Beyond DNA methylation: understanding intermediate levels of transcription
4.5 Concluding remarks

Chapter 5 Materials and methods
5.1 Mouse procedures
5.1.1 PiwiL2 knockouts
5.1.2 Dnmt3a/3b inducible double knockout
5.1.3 Tissue collection and DNA/RNA extraction
5.2 Cell line generation, maintenance, and subcloning
5.2.1 Mouse embryonic fibroblasts (MEFs)
5.2.2 Mouse embryonic stem cells (mESCs)
5.2.3 Dnmt3a/3b inducible double knockout MEFs
5.2.4 5-azacytidine treatment
5.3 Bisulphite pyrosequencing
5.4 Sequencing-based techniques and analyses
5.4.1 Chromatin immunoprecipitation (ChIP)
5.4.2 Circularised chromatin conformation capture sequencing (4C-seq)
5.4.3 Target capture bisulphite sequencing (tcBS-seq)
5.4.4 Total RNA sequencing
5.4.5 Publicly available sequencing datasets
5.5 Western blotting

References

Appendix A Complementary information and data
Appendix B Related publications


List of Figures 
 

Figure 1.1: 5-methylcytosine represents the addition of a methyl group to the fifth atom of the cytosine ring of DNA and in mammals is predominately found in a CpG dinucleotide context.
Figure 1.2: Bisulphite treatment followed by PCR and DNA sequencing allows for quantitative nucleotide-level measurements of methylation levels.
Figure 1.3: Genomic methylation patterns and prevalence differ between species.
Figure 1.4: CpG islands are present in vertebrate genomes and absent from genomes lacking DNA methylation.
Figure 1.5: DNA methylation is established by DNMT3A/3B and maintained by DNMT1.
Figure 1.6: Protein structures and catalytic mechanism of mammalian DNA methyltransferases.
Figure 1.7: Mechanisms of passive and active demethylation.
Figure 1.8: Dynamics of methylation throughout mouse development.
Figure 1.9: Histone tail modifications have distinct functions at specific regions of the genome.
Figure 1.10: Genomic transposable element content varies between species.
Figure 1.11: Genetic structures of LINEs, SINEs, and LTR retrotransposons.
Figure 1.12: Model of KRAB-ZFP-mediated heterochromatin formation.
Figure 1.13: Agouti viable yellow phenotypic range and IAP regulation.
Figure 1.14: Loop extrusion model for how CTCF and cohesin jointly mediate 3D chromatin interactions.
Figure 2.1: IAP elements of the LTR1_Mm–Ez-int (fully-structured) and the LTR2_Mm (solo LTR) types are over-represented in cVM-IAPs.
Figure 2.2: Identification of variable methylation at transposable elements.
Figure 2.3: Increased CpG density in IAPLTR2_Mm cVM-IAPs.
Figure 2.4: Inter-individual methylation variability is not confined to the LTRs of VM-IAPs.
Figure 2.5: CTCF preferentially binds at a subset of IAP LTR types.
Figure 2.6: CTCF binding is enriched at VM-IAPs relative to other IAPs in the mouse genome.
Figure 2.7: CTCF binding site motif at non-variable IAPs and VM-IAPs is similar.
Figure 2.8: Methylation at six out of seven tested VM-IAPs correlates inversely with CTCF binding assessed by ChIP-sequencing.
Figure 2.9: Confirming that DNA methylation and CTCF binding are inversely correlated at VM-IAPs by ChIP-qPCR.
Figure 2.10: VM-IAP methylation is regulated independently of somatic piRNAs.
Figure 3.1: Evaluating intermediate methylation inheritance in cell culture.
Figure 3.2: Intermediate methylation at VM-IAPs is not faithfully inherited between cellular generations.
Figure 3.3: Memory of intermediate methylation states at VM-IAPs is better retained in MEFs compared to mESCs.
Figure 3.4: Intermediate methylation levels at VM-IAPs do not recover consistently after recovery from methylation inhibition.
Figure 3.5: Filtering and thresholding MEF methylation data.
Figure 3.6: Validation of MEF target capture bisulphite sequencing.
Figure 3.7: Filtering and thresholding mESC methylation data.
Figure 3.8: Validation of mESC target capture bisulphite sequencing.
Figure 3.9: Classifying and evaluating methylation states in MEFs.
Figure 3.10: Classifying expression levels in MEFs.
Figure 3.11: CpGs that exhibit intermediate methylation associate with transcriptional inactivity in MEFs.
Figure 3.12: Methylation fidelity associates with transcription in MEFs.
Figure 3.13: Intermediate methylation is generally unfaithful in MEFs.
Figure 3.14: Classifying and evaluating methylation states in mESCs.
Figure 3.15: Classifying expression levels in mESCs using publicly available data.
Figure 3.16: Methylation in mESCs associates with transcriptional inactivity.
Figure 3.17: Methylation remains low across protein-coding genes in mESCs.
Figure 3.18: Methylation is generally low and unfaithful in mESCs.
Figure 3.19: Methylation fidelity at transposable elements.
Figure 3.20: Intermediately methylated CpGs are prone to stochastic inheritance between cell divisions in MEF-1 cell lines.
Figure 3.21: Stochastically methylated CpGs associate with repressive histone tail modifications H3K27me3 and H3K9me3.
Figure 3.22: Confirmation of induced Dnmt3a/3b DKO in primary MEFs.
Figure 3.23: Conditional loss of DNMT3A/3B shows no consistent changes in methylation.
Figure 3.24: Intermediately methylated CpGs are prone to continue losing methylation following a genome-wide depletion by 5-aza.
Figure 4.1: DNA methylation is by default stochastically maintained.
Figure 4.2: Local coordination of DNA methylation can largely explain inheritance dynamics between cell divisions.
Figure 4.3: Intermediate transcription levels reflect variable transcription between clonal cell lines.
Figure A.1: Methylation variability exists beyond the edges of VM-IAPs.
Figure A.2: VM-IAPs exist in a variety of genomic methylation contexts.
Figure A.3: Confirmation of methylation landscapes surrounding VM-IAPs.
Figure A.4: Genomic interactions of VM-IAPs.
Figure A.5: Methylation levels and methylation fidelity at protein-coding genes of varying expression in MEFs.
Figure A.6: Methylation levels and methylation fidelity at protein-coding genes of varying expression in mESCs.
Figure A.7: Intermediately methylated CpGs are prone to stochastic inheritance between cell divisions in MEF-2 cell lines.

 
List of Tables 
 

Table 2.1: Read counts, mapping efficiency, and peak counts for eight individual CTCF ChIP-seq and Input libraries.
Table 3.1: Coverage of genic regions and transposable elements by target capture bisulphite sequencing.
Table 3.2: Publicly available bisulfite-sequencing and RNA-sequencing datasets used in this chapter.
Table 3.3: Number of genes and genic regions represented in methylation data.
Table 3.4: Counts and ratios of non-stochastic and stochastic CpG overlap with various histone tail modification peaks.
Table 5.1: Summary of all generated sequencing datasets.
Table A.1: Bisulphite pyrosequencing primers for VM-IAPs, non-variable IAPs, and imprinting control regions.
Table A.2: Bisulphite pyrosequencing primers for regions surrounding VM-IAPs.
Table A.3: Primers used for 4C-sequencing.

 
Abbreviations 
 

2i Two inhibitors (PD0325901 and CHIR99021) 

4C Circularised chromatin conformation capture 

5-aza 5-azacytidine 

5hmC 5-hydroxymethylcytosine 

5mC 5-methylcytosine 

A Adenine 

Avy Agouti viable yellow 

AxinFu Axin fused 

BER Base excision repair 

bp Base pair 

C Cytosine 

CGI CpG island 

ChIP Chromatin immunoprecipitation 

CpG Cytosine-guanine dinucleotide 

CTCF CCCTC-binding factor 

cVM-IAP Constitutive VM-IAP 

DKO  Double knockout 

DNA Deoxyribonucleic acid 

DNMT DNA methyltransferase 

E Embryonic day 

ERV Endogenous retrovirus 

G Guanine 

Gb Gigabase 

H A, C, or T (nucleotides) 

H3K27me3 Histone 3 lysine 27 trimethylation 

H3K36me3 Histone 3 lysine 36 trimethylation 


H3K4me3 Histone 3 lysine 4 trimethylation 

H3K9me3 Histone 3 lysine 9 trimethylation 

HDAC Histone deacetylase 

HP1 Heterochromatin protein 1 

I Intermediately methylated 

IAP Intracisternal A-particle 

ICM  Inner cell mass 

ICR Imprinting control region 

KAP1 KRAB-associated protein 1 

kb Kilobase 

KO Knockout 

KRAB Krüppel-associated box 

KZFP KRAB zinc finger protein 

LIF Leukemia inhibitory factor 

LINE Long interspersed element 

LTR Long terminal repeat 

M Hypermethylated 

Mb Megabase 

ME Metastable epiallele 

MEF Mouse embryonic fibroblasts 

mESC Mouse embryonic stem cell 

MI Hypermethylated and intermediately methylated 

nt Nucleotide 

NuRD Nucleosome remodelling deacetylase complex  

ORF Open reading frame 

PBS Primer binding site 

PCR Polymerase chain reaction 

PGC Primordial germ cell 

piRNA Piwi-interacting RNA 

qPCR Quantitative PCR 

RNA Ribonucleic acid 

RRBS Reduced representation bisulphite sequencing 

rRNA Ribosomal RNA 


SAM S-Adenosyl methionine 

seq Sequencing 

SINE Short interspersed element 

siRNAs Short interfering RNAs 

T Thymine 

tcBS-seq Target capture bisulphite sequencing 

TDG Thymine DNA glycosylase 

TE Transposable elements 

TET Ten eleven translocation proteins 

tRFs tRNA fragments 

tRNA Transfer RNA 

tsVM-IAP Tissue-specific VM-IAP 

TTKO Triple TET knockout 

U Hypomethylated 

UCSC University of California, Santa Cruz 

UHRF1 Ubiquitin-like, with PHD and RING finger domains 1  

UI Hypomethylated and intermediately methylated 

VM-IAP Variably methylated IAP 

VM-TE Variably methylated TE 

WGBS Whole genome bisulphite sequencing 

  
Chapter 1  
 
Introduction 
 
 
In the 1940s, Conrad Waddington proposed the term epigenetics to describe a field of study 

that focuses on the causal relationship between genotype and phenotype during 

development (Waddington 1942). However, by the 1990s, the definition of epigenetics was 

adapted and narrowed to denote mitotically or meiotically heritable changes in gene function that 

cannot be explained by changes in DNA sequence (Russo et al. 1996). This shift in 

definition, to one that focuses on inheritance of non-DNA information, was inspired by 

findings on DNA methylation. As a modification present on – but distinct from – the DNA, 

DNA methylation is important for development and can be inherited between cellular 

divisions, and sometimes even organismal generations (Holliday 1990). More recently, the 

scope of epigenetics has broadened beyond requiring heritability as a defining 

characteristic to include RNA-mediated transcriptional repression and chromatin features, 

such as histone variants and tail modifications, as well as many other factors and pathways 

(Jablonka and Lamb 2002; Bird 2007). Yet, the heritability of non-DNA information is 

intriguing and continues to be an avenue of study in the now more expansive field of 

epigenetics. 

This thesis is divided into two relatively distinct projects that are both related to the 

stability of DNA methylation as an epigenetic mark between either organismal or cellular 

generations: one from the perspective of a subset of variably methylated transposable 

elements (VM-TEs) and the way in which their methylation status informs interactions with 

other genomic factors (Chapter 2), and the other that evaluates the extent to which 

methylation is mitotically inherited at VM-TEs and genome-wide (Chapter 3). Therefore, 

the first half of this introduction contains a thorough evaluation of the literature surrounding 

DNA methylation, from how it was discovered to recent findings that call into question 

long-lasting assumptions regarding its basic regulation. Following this extensive review of 


DNA methylation, we turn to the current understanding of 

TEs in the mouse genome, and more specifically VM-TEs termed “metastable epialleles”, 

which serve as a hallmark for transgenerational epigenetic studies in mice.  

1.1 DNA methylation 

Despite being the subject of intensive research for the past 50 years, the function of DNA 

methylation in eukaryotic systems is still debated. Since its discovery, a multitude of 

functions across many species have been proposed, from repressing transposable elements 

to regulating gene expression, as well as acting as an agent for transgenerational epigenetic 

inheritance. However, a unifying function of DNA methylation in eukaryotes does not seem 

to exist, as patterns throughout the genome can differ greatly between species. This is 

further complicated by the fact that DNA methylation is not conserved in all eukaryotes, 

notably being absent from three model organisms: Caenorhabditis elegans, Drosophila 

melanogaster, and Saccharomyces cerevisiae (Mattei et al. 2022). In contrast, another 

group of epigenetic marks called histone tail modifications is found in all eukaryotes[1]. 

Nevertheless, DNA methylation is essential for mammalian development and is unique in 

its status as a stable DNA modification that can be preserved between cell divisions, and in 

rare cases, organismal generations.  

Extensive research on DNA methylation has been conducted in plants using 

Arabidopsis as a model organism, which has provided a wealth of knowledge for the field. 

However, DNA methylation in plants differs from mammals in ways that hinder our ability 

to indiscriminately infer findings between them. For example, DNA methylation is found 

at CpG dinucleotides in both plants and mammals, and although mammals have non-CpG 

methylation, only in plants does DNA methylation occur frequently at CHG and CHH 

(where H represents A, C, or T) sites with known functional relevance (Henderson 

and Jacobsen 2007). In plants, the non-CpG methylation machinery is plant-specific and 

distinct from the CpG methylation machinery, which is largely conserved across eukaryotes 

(Chan et al. 2005; Stroud et al. 2014). Meanwhile, in mammals, the same enzymes are 

responsible for depositing both non-CpG and CpG methylation (Ziller et al. 2011). 

Methylation of mammalian genomes is extensive, with ~70-80% of CpG dinucleotides 

 
[1] However, not all histone tail modifications are found in all eukaryotes. 



being methylated. In contrast, <1%-7% of mammalian non-CpG dinucleotides are 

methylated, a phenomenon that occurs in relative abundance in pluripotent cells, as well as 

the brain where it accumulates after birth (Ziller et al. 2011; Lister et al. 2013; Mo et al. 

2015; de Mendoza et al. 2021). The functional role of non-CpG methylation in mammals 

is debated, but it is known to be recognised in the brain by the highly expressed 

transcription factor MeCP2, which is important for normal neurological development (Guy 

et al. 2011; Gabel et al. 2015).  

Mammals experience a genome-wide loss of methylation during both 

embryogenesis and gametogenesis, which does not easily allow for methylation states to 

be passed between generations (Morgan et al. 2005). This is not the case in plants or in 

non-mammalian vertebrates, such as Xenopus laevis and Danio rerio[2] (two more model 

organisms that have DNA methylation in their genomes), where methylation is not globally 

lost during embryogenesis (Veenstra and Wolffe 2001; Stancheva et al. 2002; Mhanni and 

McGowan 2004; Hsieh et al. 2009; Jullien et al. 2012; Potok et al. 2013; Bogdanovic et al. 

2016). Interestingly, the extra-embryonic tissue in plants, the endosperm, undergoes 

extensive demethylation (on the maternal alleles) – a process that is thought to have evolved 

to establish parental allele specific gene expression, also known as genomic imprinting 

(Gehring et al. 2009; Hsieh et al. 2009). In fact, mammals harbour genomic imprints in 

both the extra-embryonic tissue and the embryo, which both experience global loss of 

methylation, suggesting that this process also occurs to re-establish imprints (Reik and 

Walter 2001; Feil and Berger 2007). This is all to say that the dynamics of DNA 

methylation, and thereby possibly the function, in mammalian systems vary drastically 

between early development, gametogenesis, and adulthood. For this introduction, we will 

focus on the features of DNA methylation in mouse to extrapolate its functions, using 

relevant insights obtained from studies in other organisms.  

As an overview for this section, we will first address the discovery of DNA 

methylation in prokaryotes and then highlight how these findings led to early methods in 

measuring DNA methylation levels, as well as a brief description of the methods more 

commonly used today and in this thesis. Next, we outline the patterns and principles of 

 
[2] In Danio rerio, or zebrafish, the oocyte is hypomethylated compared to the sperm. The embryo gains 
methylation on the maternal allele by the 16-cell stage so that the DNA methylation profile is like that of 
the paternal allele - a state that is largely maintained throughout the development of the fish.  



DNA methylation as understood from comparative genomic analyses. This is followed by 

a detailed mechanistic review of how methylation is established and mitotically maintained 

in the genome, as well as the dynamics of methylation throughout development. We 

address the complex relationships between histone tail modifications and DNA 

methylation, and finally summarise findings that suggest DNA methylation is not always 

stably inherited between cell divisions.  

1.1.1 Discovering DNA methylation: from prokaryotes to eukaryotes 

The identification of DNA methylation took place before the confirmation that DNA, as 

opposed to protein, is the genetic material that is inherited between generations. 

Throughout this thesis, the terms “DNA methylation” and “methylation” refer to 

5-methylcytosine, in which a methyl group (-CH3) is added to the fifth carbon (C5) of the cytosine 

ring of DNA (Figure 1.1A). 5-methylcytosine was synthesised for the first time in 1904 

(Wheeler and Johnson 1904; Hitchings et al. 1949) and discovered in nature 20 years later 

(Johnson and Coghill 1925) in the nucleic acid of Mycobacterium tuberculosis. It took 

another 20 years for DNA methylation to be detected in mammalian DNA, although it was 

not identified as such initially. In an independent study similar to the one that led Erwin 

Chargaff to discover that there is a 1:1 stoichiometric ratio of purine and pyrimidine bases, 

Rollin Hotchkiss showed in 1948 that he could purify bases from calf thymus and 

identified what he called “epi-cytosine”, which we now know to be 5-methylcytosine 

(Hotchkiss 1948; Witkin 2005).  

The first “discovery” of 5-methylcytosine in non-bacterial nucleic acid (with the 

knowledge of it being so) was published in 1950 by Gerald R. Wyatt at the Molteno 

Institute, whose building is now part of the University of Cambridge’s Pathology 

Department (Wyatt 1950). Wyatt confirmed the presence and amount of 5-methylcytosine 

(5mC) in different tissues from various organisms[3]. In a follow-up paper, Wyatt wrote “in 

the present state of knowledge as to the structure and function of nucleic acids nothing can 

be said as to the possible function of 5-methylcytosine. The amounts in which it occurs, 

however, varying with the source but constant from a given source, suggest that it is an 

essential constituent of certain DNA’s, and no accident of enzyme action” (Wyatt 1951). 

Today many things can be said about the function of DNA methylation, but the idea that it 

 
[3] Curiously, he did not detect 5mC in M. tuberculosis, the organism in which 5mC was first observed.  



can exist due to an enzymatic “accident” is an unusual concept that we will return to in the 

Discussion section (4.1) of this thesis.  

 
Figure 1.1: 5-methylcytosine represents the addition of a methyl group to the fifth carbon of the cytosine 
ring of DNA and in mammals is predominantly found in a CpG dinucleotide context. (A) Molecular 
structures of an unmethylated and methylated cytosine (methyl group shown in orange). (B) Highlighting the 
difference between a CpG dinucleotide (in red), of which 70-80% are cytosine methylated in mammals, and a 
CG base pairing.  
 

Early investigations into the function of DNA methylation largely took place in E. 

coli bacteria, in which not only cytosine, but also adenine can be methylated, as 

N6-methyladenine. Adenine DNA methylation is a rare occurrence in animal genomes 

(Vanyushin et al. 1970; Wu et al. 2016) and methylation of the remaining two 

bases, guanine and thymine, has not been detected in vivo[4]. 5-methylcytosine 

and N6-methyladenine are differentially regulated in bacteria, yet they have overlapping 

functions (Borek and Srinivasan 1966). The major role of both 5-methylcytosine and 

N6-methyladenine in bacteria is as part of the “restriction modification system” that is used 

to defend the host bacterial cell against foreign DNA, such as that from a bacteriophage. 

The “restriction” part of this system refers to the restraint of the foreign DNA, which is 

accomplished by what are called restriction enzymes that cut DNA at specific sites and 

 
[4] Aside from O6-methylguanine, which is mutagenic and cytotoxic. Bignami M, O'Driscoll M, Aquilina G, 
Karran P. 2000. Unmasking a killer: DNA O(6)-methylguanine and the cytotoxicity of methylating agents. 
Mutat Res 462: 71-82. 



come in many varieties. The “modification” part of the system refers to DNA methylation 

– the bacterial DNA is actively methylated whereas the foreign DNA is not. The system 

relies on methylation-sensitive restriction enzymes, which cannot cut methylated DNA but 

can cut DNA lacking methylation (Meselson et al. 1972). The role of DNA methylation 

in distinguishing host DNA from non-host DNA destined for degradation in bacteria, in addition to the structural homology 

between prokaryotic and eukaryotic methylation deposition machinery (Cheng 1995), has 

led to the hypothesis that the last eukaryotic common ancestor used DNA methylation as a 

genome defence system (Chan et al. 2005). 

 
1.1.2 Measuring DNA methylation  

The key to understanding the function of DNA methylation lies in the ability to accurately 

measure its presence at distinct loci in the genome. Biologists used the tools provided by 

the bacterial restriction modification system to interrogate DNA methylation in other 

organisms. HpaII and MspI are restriction enzymes and isoschizomers, meaning that they 

both recognise and make cuts at the same sequence of DNA: CCGG (Mann and Smith 

1977; Waalwijk and Flavell 1978). However, HpaII cannot cleave the sequence when the 

internal cytosine residue is methylated, while MspI can. These enzymes can be used to 

digest DNA followed by gel electrophoresis and hybridisation on Southern blots – one can 

then compare the samples treated with HpaII versus MspI and determine the relative 

presence or absence of DNA methylation on the specific DNA fragment of interest. 

Although this method lacks the resolution of methylation levels at individual CpG sites, 

the context in which methylation is commonly found in mammals (Figure 1.1B), it was 

powerful enough to fuel many of the early discoveries regarding DNA methylation, like 

the identification of CpG islands, which will be introduced shortly (Bird 1980). 
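The logic of the HpaII/MspI comparison described above can be illustrated with a minimal sketch. The sequence, the methylated positions, and the function name `digest` are hypothetical and chosen only for illustration; the sketch assumes a single linear DNA strand, complete digestion, and that both enzymes cut between the two cytosines of CCGG.

```python
import re

SITE = "CCGG"  # recognition sequence shared by HpaII and MspI


def digest(seq, methylated_positions, enzyme):
    """Return fragment lengths after cutting seq at CCGG sites.

    methylated_positions: set of 0-based indices of methylated cytosines.
    HpaII is blocked when the internal cytosine (second C of CCGG) is
    methylated; MspI cuts regardless of methylation at that position.
    """
    cuts = []
    for match in re.finditer(SITE, seq):
        internal_c = match.start() + 1
        if enzyme == "HpaII" and internal_c in methylated_positions:
            continue  # methylation-sensitive enzyme cannot cleave here
        cuts.append(match.start() + 1)  # cut between the two cytosines
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 20-bp fragment with two CCGG sites (starting at indices 4 and 14).
example_seq = "AAAACCGGTTTTTTCCGGAA"
# Assume the internal cytosine of the second site (index 15) is methylated.
example_meth = {15}

mspi_fragments = digest(example_seq, example_meth, "MspI")
hpaii_fragments = digest(example_seq, example_meth, "HpaII")
```

In this toy case MspI cuts at both sites while HpaII cuts only at the unmethylated one, so the two digests give different fragment patterns. A difference of this kind is what would be read out on a Southern blot.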

In 1970, a group at the University of Tokyo (Hayatsu et al. 1970) and another at 

New York University (Shapiro et al. 1970) independently showed that bisulphite can 

deaminate cytosine residues – they additionally found that 5-methylcytosine is largely 

resistant to bisulphite-mediated deamination (Wang et al. 1980). It was not until the early 

1990s that this technique was used in conjunction with polymerase chain reaction (PCR) 

and DNA sequencing to deduce methylation levels by comparing the number of Ts (from deaminated, 

unmethylated cytosines) to Cs (from protected, methylated cytosines) at a particular CpG site (Frommer et al. 1992; Clark et al. 1994). This allowed for 

another surge in DNA methylation studies due to the newfound ability to measure DNA 

methylation levels at individual CpG sites (Figure 1.2). For site-specific quantitative 



analyses of DNA methylation in this thesis, we use bisulphite-conversion followed by PCR 

coupled with pyrosequencing. Pyrosequencing is a “sequencing-by-synthesis” method that 

uses the single-stranded DNA to enzymatically synthesise the complementary strand, 

detecting nucleotides as they are incorporated (Tost and Gut 2007). The technique is limited 

to sequencing DNA that is a few hundred base pairs long, but is remarkably accurate when 

compared to other locus-specific DNA methylation assays (BLUEPRINT 2016).  
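The principle behind bisulphite-based quantification can be sketched in a few lines. This is a simplified illustration and not the analysis pipeline used in this thesis: it assumes complete conversion of unmethylated cytosines, error-free sequencing, and reads from a single strand that are already aligned to the reference. The reference sequence, the molecules, and the function names are hypothetical.

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate bisulphite conversion followed by PCR on one strand.

    Unmethylated cytosines are read as T; methylated cytosines remain C.
    methylated_positions: set of 0-based indices of methylated cytosines.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def methylation_level(reads, position):
    """Percentage methylation at one cytosine: 100 * #C / (#C + #T)."""
    calls = [read[position] for read in reads]
    n_c, n_t = calls.count("C"), calls.count("T")
    if n_c + n_t == 0:
        raise ValueError("no informative reads at this position")
    return 100.0 * n_c / (n_c + n_t)


# Hypothetical reference with CpG cytosines at indices 1 and 5.
reference = "ACGTTCGA"
# Four hypothetical DNA molecules with different methylation states.
molecules = [{1, 5}, {1}, {1}, set()]

reads = [bisulphite_convert(reference, meth) for meth in molecules]
levels = {pos: methylation_level(reads, pos) for pos in (1, 5)}
```

Here the first CpG is methylated in three of four molecules and the second in one of four, and the C/T counts in the converted reads recover those proportions.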

 The advent of next generation sequencing technologies in the 2000s coupled with 

bisulphite conversion allowed for a deeper understanding of DNA methylation patterns on 

the genome-wide level, as well as how its function may vary between organisms. The 

power of whole genome bisulphite sequencing (WGBS) was immediately evident as it both 

confirmed and clarified many of the previous hypotheses established by locus-specific 

analyses of methylation. It also provided the scope to establish the fundamental rules and 

patterns of DNA methylation through comparative analyses between genomes of highly 

divergent species. The recent advancements in single molecule real-time (SMRT from 

PacBio) and nanopore (from Oxford Nanopore Technologies) sequencing allow for 

increasingly accurate measurements of DNA methylation without chemically modifying 

the DNA, at the same time as generating reads long enough to cover repetitive regions of 

the genome (Flusberg et al. 2010; Gigante et al. 2019; Amarasinghe et al. 2020). As these 

technologies continue to improve, they will almost certainly lead us into a new era of DNA 

methylation understanding due to their ability to natively measure methylation across 

multi-kilobase regions of the genome, as well as to interrogate loci like centromeres, which 

were previously unmappable (Naish et al. 2021; Gershman et al. 2022). 

  

 
Figure 1.2: Bisulphite treatment followed by PCR and DNA sequencing allows for quantitative 
nucleotide-level measurements of methylation levels. (A) Diagram of bisulphite treatment of DNA, which 
results in the conversion of unmethylated cytosines (C) to uracil (U) but does not affect methylated 
cytosines. Following PCR amplification, using either locus-specific or indexing primers (for a genome-wide 
approach), the uracil bases are converted to thymine (T). (B) Schematic example of how methylation levels 
are calculated at a base-level resolution using sequencing reads from bisulphite-converted and PCR 
amplified DNA derived from a sequence containing CpGs with unknown methylation levels. Methylation 
levels are calculated by measuring the number of Cs versus Ts sequenced at a particular CpG site. 

 
[Figure 1.2 panel text. (A) Bisulphite conversion, followed by PCR amplification, then either pyrosequencing for locus-specific methylation analyses or genome sequencing to assess global methylation levels. (B) DNA sequence with unknown methylation levels; sequencing reads of bisulphite-converted and PCR-amplified DNA; example methylation levels: 50%, 10%, 90%. Methylation levels (%) = # C / (# C + # T).]



1.1.3 Principles and patterns of DNA methylation  

The first WGBS experiments were published in 2008 for Arabidopsis thaliana and 

established the different sequence contexts and proportions in which cytosine methylation can 

exist in plants (Cokus et al. 2008; Lister et al. 2008). In plants, methylation is found at 6.7% 

of CHG and 1.7% of CHH contexts, while methylation more frequently occurs at CpG 

dinucleotides (24% are methylated) (Law and Jacobsen 2010). These studies found that methylation 

is enriched at the pericentromeric regions, and that CpG methylation is especially enriched 

at gene bodies and transposable elements compared to the other kinds of DNA in plants. 

Additionally, non-CpG methylation is found at transposable elements, but not within gene 

bodies.  

Shortly following the publication of the plant WGBS studies, came two genome-

wide methylation studies for pluripotent and differentiated mammalian cells (human and 

mouse). They affirmed two fundamental principles regarding DNA methylation in 

mammals: 1) the mammalian genome is generally depleted of non-CpG methylation and 

2) 70-80% of CpGs are methylated (Meissner et al. 2008; Lister et al. 2009). The prevalence 

of methylation in mammals supports the hypothesis that the genome is by default 

methylated and regulated to be unmethylated wherever necessary (Edwards et al. 2010). 

Two additional studies generated WGBS datasets for many more species (20 additional 

species including the first full mouse methylome) (Feng et al. 2010a; Zemach et al. 2010). 

These studies found that methylation patterns and prevalence vary greatly amongst 

divergent species (Figure 1.3), and that gene body methylation is a universally conserved 

feature of eukaryotic DNA methylation[5]. 

In the two following sections, we outline the distinctive patterns of DNA 

methylation throughout the mammalian genome. More specifically, hypomethylation of 

CpG islands and hypermethylation of gene bodies and transposable elements.  

 
[5] To clarify, this does not mean that all gene bodies of eukaryotic organisms are methylated, but that they 
are more likely to be methylated compared to gene promoters.  



 
Figure 1.3: Genomic methylation patterns and prevalence differ between species. Methylation from 
whole genome bisulphite sequencing (WGBS) data of eight different species plotted across (A) gene bodies 
and (B) repetitive regions, which represent both transposable elements and satellite DNA. Figure from Feng 
et al. 2010a. 

 

1.1.3.1 CpG islands 

Originally identified in the 1980s, CpG islands (CGIs) are features unique to vertebrate 

genomes and can be characterised by their high CpG dinucleotide density and lack of 

methylation (Bird 1980; Feng et al. 2010a). CGIs are generally around 200-1000 base pairs 

in length and associate with ~60-70% of annotated gene promoters in both human and 

mouse, especially those of housekeeping genes (Gardiner-Garden and Frommer 1987; 

Antequera 2003; Saxonov et al. 2006). It is important to note that only ~10% of CpGs in 

the mammalian genome are present in CGIs (Bird et al. 1985; Illingworth et al. 2008). 

Although they are typically unmethylated, there are some CGIs that can become methylated 

during normal development.  

Two well-studied examples of biological processes that involve CGI methylation 

are genomic imprinting and X chromosome inactivation. However, due to the distinctive 

nature of the mechanisms involved in these two specific cases, they may not necessarily be 

reflective of other instances of CGI methylation regulation throughout the genome. 

Genomic imprinting is a phenomenon by which genes are mono-allelically expressed in a 

parent-of-origin specific manner. There are ~100 genes regulated in this way, and many 

are in clusters that are under the control of a single imprinting control region (ICR), of 

which almost all can be classified as CGIs (Ferguson-Smith 2011; Suzuki et al. 2011). It is 

the establishment during gametogenesis, and subsequent maintenance throughout 

development, of differential methylation between parental alleles at ICRs, which allows for 

monoallelic expression of imprinted genes. Aberrant methylation levels at these ICRs can 

result in a range of detrimental phenotypes from developmental and growth defects to 

lethality. Thereby, the regulation of genomic imprinting highlights one of the most clearly 

defined and essential functions of DNA methylation in mammals.  

The inactivation of an X chromosome in females occurs to mediate gene dosage 

(due to the presence of two X chromosomes) and transcriptional repression is stably 

maintained at CGIs by DNA methylation, as well as other heterochromatin-associated 

factors (Chaligne and Heard 2014). In X inactivation, CGI methylation is not the primary 

silencing mechanism, but is necessary for long-term transcriptional repression (Lock et al. 

1987). In addition to genomic imprints and the X chromosome, there are other CGI 

promoters in the genome that get methylated during normal development (~10%), such as 

germline specific gene promoters (Weber…Schubeler 2007), as well as in specific 

biological contexts such as cancer or in vitro culture (Antequera et al. 1990; Weber et al. 



2005; Jones and Baylin 2007; Illingworth and Bird 2009; Maunakea et al. 2010). However, 

it is still unclear to what extent DNA methylation at CGIs is involved in initiating or 

maintaining transcriptional repression, or if it is just a molecular marker of silenced genes 

(Suzuki and Bird 2008). 

It has been proposed that selection acts to preserve CGIs due to their conserved lack 

of methylation across different species, combined with the potentially mutagenic properties 

of methylation (Bird 1980; Antequera 2003; Cohen et al. 2011). Methylation at a CpG 

dinucleotide is thought to be mutagenic because it can allow for spontaneous deamination 

from the 5-methylcytosine to a thymine nucleotide base in vitro, as well as in bacteria 

(Duncan and Miller 1980; Shen et al. 1992; Shen et al. 1994). If left uncorrected by the 

base excision repair (BER) pathway, this could result in the mutation of a cytosine to a 

thymine after cellular replication. The mutagenic properties of DNA methylation have been 

difficult to prove experimentally with mouse pluripotent stem cells (Spada et al. 2020). 

However, in species that have prevalent CpG DNA methylation, such as mouse and human, 

the CpG dinucleotide occurs at roughly one quarter of the frequency expected relative to other 

dinucleotides in the genome (Bird 1980). This reduced frequency of CpGs in the genome 

is much less pronounced in species that do not have DNA methylation, such as Drosophila 

melanogaster, which also lacks CGIs (Schorderet and Gartler 1992; Jabbari and Bernardi 

2004; Vinson and Chatterjee 2012) (Figure 1.4). 
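The depletion of CpGs described above is commonly expressed as an observed/expected (O/E) ratio. The sketch below uses one widely used formulation, (number of CpG × sequence length) / (number of C × number of G); this formula, the function name, and the toy sequences are assumptions made for illustration and are not taken from the analyses in this thesis.

```python
def cpg_observed_expected(seq):
    """Observed/expected CpG ratio for a DNA sequence.

    O/E = (#CpG * N) / (#C * #G), where N is the sequence length.
    A value near 1 means CpGs occur about as often as base composition
    alone predicts; a value well below 1 indicates CpG depletion.
    """
    seq = seq.upper()
    n = len(seq)
    n_c, n_g = seq.count("C"), seq.count("G")
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        return 0.0
    return n_cpg * n / (n_c * n_g)


# Toy sequences (hypothetical, far shorter than any real genomic window).
cpg_rich = "CGCGCGCG"          # every C is followed by G
cpg_neutral = "CATGCATGCACG"   # one CpG among four Cs and three Gs
cpg_depleted = "CCAGGTCAGGCT"  # Cs and Gs present, but no CpG

ratios = {
    "rich": cpg_observed_expected(cpg_rich),
    "neutral": cpg_observed_expected(cpg_neutral),
    "depleted": cpg_observed_expected(cpg_depleted),
}
```

Applied in sliding windows along a chromosome, a ratio of this kind would be expected to stay low across most of the mouse genome and rise sharply within CGIs.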

 
Figure 1.4: CpG islands are present in vertebrate genomes and absent from genomes lacking DNA 
methylation. Density of CpG dinucleotides at the chromosome scale across the mouse (left) 
and Drosophila (right) genomes. The mouse genome is punctuated by the presence of intermittent CpG 
dense regions, called CpG islands (CGIs), a distinct feature of vertebrate genomes. On the other hand, the 
Drosophila genome is devoid of methylation and has relatively consistent CpG density throughout, with no 
CGIs. CpG density is calculated for a 1000-bp window. Figure adapted from Vinson et al. 2012.  

 

1.1.3.2 Gene bodies and transposable elements 

Unlike CGI-associated promoters, gene bodies and transposable elements are generally 

hypermethylated. In the case of gene bodies, housekeeping genes are particularly enriched 

for methylation, with slight preference at exons compared to introns. As mentioned earlier, 

gene body methylation is a highly conserved feature that seems to be universal amongst 

almost all species with DNA methylation in their genomes (Feng et al. 2010a; Zemach et 

al. 2010), yet it does not have a known function. In mammals, gene body methylation has 

been hypothesised to prevent spurious intragenic transcriptional activation (Neri et al. 

2017) and to regulate alternative splicing of genes (Gelfman et al. 2013). However, these 

phenomena are quite rare and the findings that implied these functions have proven 

difficult to reproduce (Teissandier and Bourc'his 2017). In plants, the loss of gene body 

methylation, through mutations of essential DNA methylation machinery, does not result 

in major changes in transcription (Roudier et al. 2009), suggesting it is not necessary for 

gene expression. Further work needs to be done to understand whether gene body 

methylation is indeed functionally relevant.  

On the other hand, hypermethylation at transposable elements does seem to have a 

function in transcriptional repression. Transposable elements (TEs) are mobile genetic 

units that have the potential to “jump around” and integrate into the genome 

indiscriminately when transcribed or excised, thereby threatening genomic stability. There 

are many mechanisms that have evolved to suppress TE mobilisation through either 

transcriptional or post-transcriptional interference, and these can vary between species. In 

both plant and vertebrate genomes, TEs are hypermethylated, which has led to the 

hypothesis that DNA methylation originally evolved to silence TEs. This is supported by 

the fact that when the primary component for preserving CpG methylation profiles between 

cell divisions is removed in either plants (met1) or mouse (Dnmt1; discussed in the 

following section), TE expression is increased (Lippman et al. 2003). However, TEs are 

not homogeneous and are represented by various distinct families that are targeted by 

different silencing mechanisms. In the case of mice, DNA methylation appears to be 

especially important for the silencing of the mouse-specific and evolutionarily young TE 

family of intracisternal A-particles (IAPs) (Walsh et al. 1998). 

 With regards to the mutagenic properties of DNA methylation, the presence of 

methylation at both TEs and gene bodies raises an interesting paradox. Of all genomic 

features, housekeeping genes are the most conserved throughout evolution, so it 



is surprising that they are also inundated with the potential for mutation due to the 

enrichment of methylation. In contrast, hypermethylation at TEs could provide an 

evolutionary role for DNA methylation to inactivate TEs through the accumulation of 

mutations (Goll and Bestor 2005). There are two separate enzymes, both acting as part of 

the base excision repair (BER) pathway, that are thought to be specifically responsible for 

correcting the T-G mismatch that results from the spontaneous deamination of DNA 

methylation: methyl-CpG binding domain protein 4 (MBD4) and thymine DNA 

glycosylase (TDG) (Bellacosa and Drohat 2015). In fact, mice lacking MBD4 accumulate 

more mutations at CpG sites on a reporter sequence compared with WT mice (Millar et al. 

2002). Yet it is still unclear how and whether these proteins behave differentially at gene 

bodies versus TEs with regards to repairing mismatches that arise due to the spontaneous 

deamination of DNA methylation. A possible explanation is that the enzyme action of 

MBD4 and/or TDG is coupled with transcription, thereby more easily dealing with 

mismatches at transcriptionally active gene bodies compared with TEs, which are generally 

transcriptionally silent. 

In the following sections, we will take a step back and introduce the factors and co-

factors necessary for the placement and removal of DNA methylation throughout the 

genome, as well as the dynamics of these processes during development.  

1.1.4 The maintenance and establishment of DNA methylation 

In 1975, two seminal review articles about the potential roles of DNA methylation and its 

underlying mechanisms were published and subsequently shaped the field (Holliday and 

Pugh 1975; Riggs 1975). Both papers hypothesised models for how eukaryotic methylation 

is established and maintained throughout development. Namely, that there are enzymes 

responsible for depositing methylation at unmethylated sites, and enzymes that preserve 

methylation during replication. In the current literature, these enzymes are referred to as 

the de novo and maintenance DNA methyltransferases (DNMTs) respectively. Here, we 

will first introduce the various DNMTs in the mouse genome and their different functions 

and structures, as well as an essential co-factor called UHRF1. A property of DNA 

methylation that was not anticipated in the two 1975 review articles (Holliday and Pugh 1975; 

Riggs 1975) is that it can also be actively removed from the genome – a process that is 

especially important during embryonic development. Following the in-depth overview of 

DNMTs, we introduce TET enzymes that are involved in the active removal of methylation 



and describe the extensive dynamics of DNA methylation that occur during mammalian 

development.  

1.1.4.1 DNA methyltransferases (DNMTs)  

1.1.4.1.1 DNMT function and activity 

In the mouse genome, there are six annotated DNMTs: DNMT1, DNMT2, DNMT3A, 

DNMT3B, DNMT3C, and DNMT3L (Lyko 2018). DNMT1 is canonically referred to as 

the maintenance methyltransferase due to its preference for acting on hemimethylated DNA 

following the synthesis of new DNA strands during replication (Okano et al. 1998; Goyal 

et al. 2006). Meanwhile, DNMT3A and 3B are known as the de novo methyltransferases 

for establishing methylation patterns during early development, and do not show a 

preference for hemimethylated versus unmethylated DNA (Figure 1.5). DNMT3C is in the 

same family of methyltransferases as DNMT3A/3B but is mouse specific and exclusively 

expressed in the male germline (Barau et al. 2016).  

 
Figure 1.5: DNA methylation is established by DNMT3A/3B and maintained by DNMT1. The prevailing 
hypothesis posits that DNA methylation is established at unmethylated CpG sites by DNMT3A/3B, which 
are often referred to as the de novo methyltransferases. On the other hand, during replication, DNA 
methylation is placed on the newly replicated strand (in green) by DNMT1, which is known as the 
maintenance methyltransferase. Figure inspired by Jones and Liang 2009. 

 
Unlike the rest of the DNMTs, DNMT2 and DNMT3L are not directly involved in 

the catalytic process of adding a methyl group to the fifth carbon (C5) of a cytosine ring in 

the DNA. Originally discovered due to its highly conserved cytosine-5 methyltransferase catalytic 

domain (Okano et al. 1998), DNMT2 does not modify DNA, but is an RNA 

methyltransferase that specifically mediates methylation at tRNAs (Goll et al. 2006). On 

the other hand, DNMT3L is eutherian mammal-specific, has no catalytic function, but acts 


as an important co-factor of DNMT3A to facilitate methylation patterns at genomic 

imprints and TEs in the germline (Bourc'his et al. 2001; Bourc'his and Bestor 2004; 

Yokomine et al. 2006; Jia et al. 2007). 

Genetic studies of these different factors have elucidated the essential nature of 

DNA methylation in the mammalian genome. In mice, genetic knockouts of Dnmt1 or 

Dnmt3b are embryonic lethal (~E9.5), Dnmt3a knockouts are lethal postnatally (~4 weeks 

old), while knockouts of Dnmt3l or Dnmt3c result in male sterility (Li et al. 1992; Okano 

et al. 1999; Bourc'his et al. 2001; Barau et al. 2016). Although Dnmt3l-deficient females 

are fertile, their offspring are not viable – a phenotype recapitulated by a germline 

conditional knockout of Dnmt3a, which also results in male sterility (Bourc'his et al. 2001; 

Hata et al. 2002; Kaneda et al. 2004; Dura et al. 2022). Meanwhile, the germline-specific 

deletion of Dnmt3b does not have a discernible phenotype, further highlighting the 

differential developmental roles between DNMT3A and 3B (Kaneda et al. 2004). Genomes 

of mouse embryos lacking Dnmt1 are almost completely devoid of methylation 

(Grosswendt et al. 2020). Knockout embryos of either Dnmt3a or Dnmt3b show only partial 

loss of methylation (Auclair et al. 2014), but the double knockout (DKO) of both genes 

results in a similar amount of loss to Dnmt1-deficient embryos (Dahlet et al. 2020).  

In vitro studies have shown that DNMT1 functions in a processive manner, 

methylating long stretches of CpGs without dissociating from the single-stranded DNA 

(Hermann et al. 2004; Vilkaitis et al. 2005). It is thought that the processive nature of 

DNMT1 is, in part, what allows for the efficient and stable inheritance of DNA methylation 

during replication. In support of this, DNMT1 misses a CpG site as it moves along the 

DNA less than 0.3% of the time (Goyal et al. 2006). In contrast, the extent to which 

DNMT3A/3B methylate DNA in a processive manner is debated (Jeltsch and Jurkowska 

2016). In some studies, DNMT3A was observed 

to act with processivity (Holz-Schietinger and Reich 2010), while in others this 

phenomenon was not detected, and instead it was found that it binds in a cooperative 

manner (with additional DNMT3A proteins) across the DNA (Emperle et al. 2014). 

Recently, it was shown that DNMT3B does not behave in a cooperative manner like 

DNMT3A, and can act with processivity, but it is unclear how this compares to DNMT1’s 

ability to do the same (Norvil et al. 2018; Lin et al. 2020). The processivity of DNA 

methylation deposition by DNMT1, and perhaps DNMT3B, suggests that DNA 

methylation states are coordinated along the DNA. 
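
To give a sense of scale for the error rate quoted above, the following is a minimal sketch of how a per-site maintenance error would compound over successive divisions. It assumes, purely for illustration, that every methylated CpG is missed independently with the same probability at each division and that a missed site is never remethylated; neither assumption comes from Goyal et al. (2006), and the function name `retained_fraction` and the division counts are hypothetical.

```python
def retained_fraction(miss_rate: float, divisions: int) -> float:
    """Fraction of initially methylated CpG sites that are still methylated.

    Toy assumptions (not taken from the cited studies): each site is missed
    independently with probability `miss_rate` at every division, and a
    missed site is never remethylated afterwards.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return (1.0 - miss_rate) ** divisions


if __name__ == "__main__":
    # 0.003 is the upper bound quoted in the text; the division counts
    # below are arbitrary illustrative choices.
    for n in (1, 10, 30, 100):
        print(f"{n:>3} divisions: {retained_fraction(0.003, n):.3f} retained")
```

Under these assumptions, retention falls geometrically with the number of divisions, so even a small per-division error rate would gradually erode methylation unless it were counteracted.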



1.1.4.1.2 DNMT structure 

The basic structures of DNMTs are conserved throughout evolution. Dnmt1 and Dnmt3a 

were present in the common ancestor to metazoans, which suggests that species such as D. 

melanogaster and C. elegans lost these genes throughout evolution. On the other hand, 

Dnmt3b and Dnmt3l are thought to have resulted from a duplication event of Dnmt3a near 

to the origin of tetrapods and eutherian mammals respectively, while a duplication of 

Dnmt3b gave rise to Dnmt3c in mice (Molaro et al. 2020). 

The shared and unique protein structures of the different DNMTs inform their 

specific functions (Figure 1.6A). The catalytic domains of DNMTs involved in CpG 

methylation are conserved between mammals and bacteria (Bestor et al. 1988) and include 

two key motifs: PC and ENV. The mechanism by which DNMTs catalyse the addition of 

a methyl group is initiated by a nucleophilic attack of the cysteine in the PC motif on the 

sixth carbon (C6) of the cytosine ring (Figure 1.6B). This reaction is facilitated by a 

protonation of the third atom (N3) of the cytosine ring by the ENV motif, which in turn 

allows for the transfer of the methyl group from S-adenosylmethionine (SAM) to the C5 of 

the cytosine ring (Jeltsch 2002). DNMT1 and DNMT3A/3B both have fully intact 

methyltransferase domains (which include the PC and ENV motifs), while DNMT3L is 

truncated at the C-terminal portion and divergent in amino acid sequence at the PC and 

ENV motifs (Aapola et al. 2000), which is the presumed reason for its catalytic inactivity.  

Besides the catalytic domain at the C-terminal region of DNMTs, DNMT1 and 

DNMT3A/3B each harbour different domains that are necessary for their distinct 

functions. The N-terminal region of DNMT1 is composed of the following domains: a 

DNMT1-associated protein 1 (DMAP1) binding domain, a proliferating cell nuclear 

antigen (PCNA) binding domain, a replication foci-targeting sequence (RFTS) domain, a 

CXXC domain, and two bromo-adjacent homology (BAH) domains (Chen and Zhang 

2020). The RFTS, PCNA, and BAH domains are all necessary for targeting DNMT1 to 

replication foci to ensure the maintenance of the parent strand methylation state onto the 

newly synthesised daughter strand (Leonhardt et al. 1992; Chuang et al. 1997; 

Yarychkivska et al. 2018). The CXXC and BAH domains, in concert with RFTS, are 

involved in an autoinhibitory mechanism of DNMT1 that is thought to be resolved by a 

crucial co-factor, UHRF1 (Song et al. 2012; Bashtrykov et al. 2014; Berkyurek et al. 2014). 

The autoinhibitory function of DNMT1 has been hypothesised to reduce its ability to 

catalyse the addition of methylation independently of replication (Garvilles et al. 2015) or 



at sites that are completely unmethylated (Song et al. 2011). Finally, the DMAP1 binding 

domain has been proposed to interact with histone deacetylase 2 (HDAC2) and 

transcriptional repressor DMAP1 (Rountree et al. 2000).   

Dnmt3a and Dnmt3b are highly homologous, and both have Pro-Trp-Trp-Pro 

(PWWP) and ATRX-DNMT3-DNMT3L (ADD) domains, the latter of which is also 

present in Dnmt3l. The PWWP domain binds to H3K36me3 (Dhayalan et al. 2010), a 

histone tail modification normally found on the gene bodies of expressed genes, and may 

be involved in the recruitment of DNMT3A or 3B to these sites (Du et al. 2015). The ADD 

domain recognises unmodified histone 3 lysine 4 (H3K4) (Otani et al. 2009) and is 

inhibited by histone methylation at this site (H3K4me3) (Ooi et al. 2007), a mechanism 

that may be crucial for the maintenance of unmethylated states at CpG islands (Edwards et 

al. 2010). 

 

 
Figure 1.6: Protein structures and catalytic mechanism of mammalian DNA methyltransferases. (A) 
There are four families of DNMTs in mammalian genomes and the conserved domains are shown in 
different colours. The catalytic domain responsible for the deposition of methylation is shown in red. DNMT2 
is in fact an RNA methyltransferase, a functional difference indicated structurally by the black stripe within 
the red catalytic domain. As shown by the truncation of the catalytic domain, DNMT3L is not capable of 
actively depositing methylation, but is still essential for methylation establishment during early development 
through its interactions with other DNMT3s. (B) Schematic showing the catalytic mechanism of DNA 
methylation deposition. The ENV and PC motifs from the catalytic domain of a methyltransferase (in green) 
facilitate the addition of a methyl group (in red) onto a cytosine base. S-adenosylmethionine (SAM), the 
substrate that provides the methyl group, is highlighted in purple. Figure adapted from Lyko 2018. 

 
1.1.4.1.3 UHRF1 

UHRF1 is an indispensable co-factor of DNMT1. Besides its potential role, mentioned 

above, to release DNMT1 from its autoinhibited state, UHRF1 recognises hemimethylated 

DNA via its SET- and RING-associated (SRA) domain, which is critical for the recruitment 

of DNMT1 to replication foci (Arita et al. 2008; Avvakumov et al. 2008; Hashimoto et al. 

2008). Thereby, UHRF1 is essential for embryonic development and the maintenance of 

methylation, supported by findings that Uhrf1 knockout leads to embryonic lethality and 

genome-wide depletion of methylation like that of Dnmt1-deficient embryos (Bostick et al. 

2007; Sharif et al. 2007). UHRF1 is also thought to play a role in the crosstalk between 

histone modifications and DNA methylation. This is because the Tandem Tudor and PHD 

domains of UHRF1 specifically recognise the histone tail modification H3K9me3; however, it 


is unclear to what extent this crosstalk is important for DNA methylation genome-

wide (Du et al. 2015). 

1.1.4.2 TET proteins and active demethylation 

The loss of a methylated state can occur either through the passive loss of methylation due 

to errors in maintenance by DNMT1, or through the active oxidation of methylated cytosines 

by ten-eleven translocation (TET) proteins (Hill et al. 2014). So far, three mammalian TET 

proteins have been identified (TET1, TET2, and TET3), and each one can catalyse the 

oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine 

(5fC), or 5-carboxycytosine (5caC) (Figure 1.7A) (Tahiliani et al. 2009; He et al. 2011; Ito et al. 

2011). Despite all having the same functional capacity, the three TET enzymes have 

relatively distinct biological functions due to differential regulation throughout 

development and between cell-types (Melamed et al. 2018). Notably, the double knockout 

of Tet1 and Tet2 can produce viable mice (Dawlaty et al. 2013), but Tet3-deficient mice 

die at birth (Gu et al. 2011).  

TET-mediated demethylation has been proposed to occur by two distinct 

mechanisms (Figure 1.7B) whereby oxidation of 5mC either results in 1) passive dilution 

of the methylation state during replication because DNMT1 and UHRF1 cannot recognise 

oxidised forms of 5mC, or 2) targeting for excision by thymine DNA glycosylase (TDG) 

followed by replacement with an unmethylated cytosine via base excision repair (BER) 

(Kohli and Zhang 2013). In support of the passive dilution mechanism, in vitro biochemical 

assays have shown that the efficiency with which DNMT1 methylates a newly synthesised 

strand is greatly reduced when the target CpG lies opposite a hydroxymethylated CpG 

(5hmC) (Valinluck and Sowers 2007; Ji et al. 2014; Seiler et al. 2018). 

Despite this, it was recently shown that the TET-TDG-BER pathway, as opposed to the 

passive dilution of 5hmC, is the major contributor to demethylation events during induced 

pluripotent stem cell (iPSC) reprogramming from somatic cells (Caldwell et al. 2021).  
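
The passive dilution mechanism can be illustrated with a short calculation. The sketch below tracks the fraction of DNA strands carrying a mark at a given CpG across rounds of replication, under the simplifying assumption that parental strands keep their mark and each new strand is marked with a fixed probability `maintenance` when its template is marked. The parameter name and values are illustrative assumptions, not measurements from the studies cited above.

```python
def marked_strand_fractions(divisions: int,
                            start: float = 1.0,
                            maintenance: float = 0.0) -> list[float]:
    """Fraction of strands marked at one CpG after each round of replication.

    Toy model: every replication doubles the number of strands. Parental
    strands keep their mark; a new strand is marked with probability
    `maintenance` if its template strand is marked, and never otherwise.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    if not 0.0 <= start <= 1.0 or not 0.0 <= maintenance <= 1.0:
        raise ValueError("start and maintenance must be between 0 and 1")
    fraction = start
    history = [fraction]
    for _ in range(divisions):
        # Marked strands after replication: old (fraction) + new (fraction * maintenance),
        # out of twice as many strands in total.
        fraction = fraction * (1.0 + maintenance) / 2.0
        history.append(fraction)
    return history


if __name__ == "__main__":
    # No maintenance opposite the oxidised base: pure passive dilution.
    print(marked_strand_fractions(4, maintenance=0.0))
    # Perfect maintenance: the mark is stably inherited.
    print(marked_strand_fractions(4, maintenance=1.0))
```

With `maintenance=0.0` the marked fraction halves at every division, whereas `maintenance=1.0` keeps it constant; intermediate values give a slower decline.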

 

 
Figure 1.7: Mechanisms of passive and active demethylation. (A) Diagram of cytosine methylation 
(5mC; black) and iterative oxidation reactions mediated by TET enzymes to produce 5hmC (orange), 5fC 
(blue), and 5caC (purple). (B) Mechanistic differences between TET-mediated passive and active 
demethylation. Passive demethylation occurs when DNMT1/UHRF1 do not recognise the oxidised forms of 5mC 
for maintenance during DNA replication. Meanwhile, active demethylation is the targeting of 5hmC, 5fC, or 
5caC by thymine DNA glycosylase (TDG) for excision, followed by replacement with an unmethylated 
cytosine (C; white) via the base excision repair (BER) pathway. Figure adapted from Lio and Rao 2019.  

 
1.1.4.3 Dynamics of DNA methylation throughout mammalian development 

Patterns of DNA methylation are highly dynamic throughout mammalian development, 

during which the genome undergoes two distinct rounds of epigenetic reprogramming: 1) 

in the early embryo during preimplantation development, and 2) in the developing germline 

post-implantation (Figure 1.8) (Greenberg and Bourc'his 2019). Both reprogramming 

events result in almost complete erasure of DNA methylation, as well as a near complete 

reset of histone tail modifications (Feng et al. 2010b). In the zygote, immediately following 

fertilisation, methylation is both passively lost due to the exclusion of DNMT1 from the 

nucleus (Carlson et al. 1992; Mertineit et al. 1998) and actively removed via TET3-

mediated demethylation (Gu et al. 2011; Guo et al. 2014; Shen et al. 2014). By the 


blastocyst stage of development (E3.5 in mice), genome-wide methylation levels dip to a 

nadir of ~20% (Wang et al. 2014). Following implantation of the mouse embryo at E4.5, 

the genome is rapidly methylated by DNMT3A/3B (Santos et al. 2002; Dahlet et al. 2020) 

to roughly somatic levels of global methylation, ~70%, by E6.5 (Seisenberger et al. 2012).  

 
Figure 1.8: Dynamics of methylation throughout mouse development. DNA methylation goes through 
two distinct rounds of epigenetic reprogramming during early mammalian development. Immediately 
following fertilisation, methylation is passively and actively lost, due to the exclusion of DNMT1 from the 
nucleus and TET3-enzymatic activity, respectively. By the blastocyst stage (E3.5), global methylation levels 
drop to ~20%, but after embryo implantation (E4.5) the genome is rapidly methylated by DNMT3s until E6.5 
when methylation levels reach ~70%, a state that is globally maintained throughout the rest of somatic 
development. The second round of reprogramming occurs during germline development starting at E7.25, 
when a subset of stem cells passively and actively (by TET1/2) lose methylation to form primordial germ 
cells (PGCs) by E13.5 with global methylation levels nearing ~7%. The formation of female and male 
gametes differs in both methylation reacquisition (although both require DNMT3L and DNMT3A) and 
developmental timing. Male germ cells are remethylated before birth to global levels of ~80% and undergo 
additional methylation regulation at a subset of transposable elements by DNMT3C. Female germ cells are 
not completely remethylated until after birth, during ovulation, after which global methylation levels reach 
~50%. Figure from Greenberg and Bourc’his 2019. 

 
In the post-implantation epiblast (E7.25), a subset of stem cells undergoes germline 

specification to primordial germ cells (PGCs) (Ginsburg et al. 1990). These PGCs are 

epigenetically reprogrammed through both passive and TET1/2-mediated active DNA 

demethylation mechanisms (Yamaguchi et al. 2012; Hackett et al. 2013; Yamaguchi et al. 

2013a; Yamaguchi et al. 2013b). By E13.5, global methylation levels of PGCs are depleted 

to ~7% (Wang et al. 2014), after which there is a global gain in methylation largely 

mediated by DNMT3L and DNMT3A (Bourc'his and Bestor 2004; Kaneda et al. 2004; 

Kato et al. 2007; Smallwood et al. 2011). This process of remethylating the germline 

genome differs between male and female gametes. With additional methylation 

deposition provided by DNMT3C, male germ cells are remethylated before birth to ~80% 



global methylation levels (Barau et al. 2016). On the other hand, female gametes are not 

completely remethylated until after birth during ovulation (Sasaki and Matsui 2008), after 

which the levels of methylation reach ~50% genome-wide, which is notably lower than 

in sperm (Wang et al. 2014).  

Of interest is what remains methylated following the global depletion in the 

developing embryo and gametes, with global methylation levels of ~20% and ~7%, 

respectively. It has been proposed that, at these sites, DNA methylation can be inherited 

inter- or even trans-generationally (Skvortsova et al. 2018). In both the developing embryo 

and the gametes, methylation is partially retained at young TEs, such as intracisternal A-

particles (IAPs) (Hajkova et al. 2002; Lane et al. 2003; Lees-Murdock et al. 2003), though 

the precise loci that retain methylation have been difficult to ascertain. In the germline, 

DNMT3L-mediated methylation at these elements is thought to be regulated by the 

conserved PIWI-associated RNA (piRNA) pathway, which protects genomes against TE 

mobilisation (Aravin et al. 2008; Tam et al. 2008; Molaro et al. 2014).   

Besides IAPs, imprinting control regions (ICRs) are also resistant to the global 

erasure of methylation in the developing embryo, but not the germline (Ferguson-Smith 

2011). Genomic imprints are initially established in the PGCs, a process that requires both 

DNMT3A and DNMT3L. In the developing embryo, parent-of-origin-specific methylation 

states are retained through the action of two Krüppel-associated box (KRAB)-containing 

zinc finger (ZF) proteins (KZFPs), ZFP57 and ZFP445 (Li et al. 2008; 

Strogantsev et al. 2015; Takahashi et al. 2019). Moreover, it was shown in mouse 

embryonic stem cells (mESCs; a cell culture model of the pluripotent blastocyst) that 

KZFP-mediated targeting induces DNA methylation at ICRs, which is required for the 

deposition of the repressive histone mark H3K9me3 (Quenneville et al. 2011; Quenneville et al. 

2012). This supports a model by which KZFPs are involved in maintaining DNA 

methylation at a subset of sites during global demethylation in the developing embryo.  

1.1.5 Crosstalk between DNA methylation and histone tail modifications  

As epigenetic marks that are involved in transcriptional regulation, there are many 

connections and correlations between histone tail modifications and DNA methylation, but 

they are complex and difficult to untangle. Compared to DNA methylation, histone tail 

modifications are distinct with regard to their function and genomic location, so it is well 

understood how different marks regulate different regions of the genome. Here we dissect 

the relationships between DNA methylation and four different histone tail modifications: 



histone 3 lysine 9 trimethylation (H3K9me3), histone 3 lysine 27 trimethylation 

(H3K27me3), histone 3 lysine 4 trimethylation (H3K4me3), and histone 3 lysine 36 

trimethylation (H3K36me3).  

 
Figure 1.9: Histone tail modifications have distinct functions at specific regions of the genome. 
Histone tail modifications are either associated with transcriptional activation or repression. H3K4me3 (in 
green) is found at the promoters and around the transcription start sites of active genes, while H3K36me3 
(in yellow) is associated with active gene bodies. On the other hand, H3K9me3 and H3K27me3 are two 
repressive histone marks. H3K9me3 (in red) is generally targeted to gene-poor regions of the genome 
where it is involved in the transcriptional repression of transposable elements. H3K27me3 (in blue) is 
located in gene-rich regions of the genome to silence genes. Figure adapted from Stefano 2022.  

 
Genome organisation is facilitated by the wrapping of DNA around histones to form 

nucleosomes (Luger et al. 1997). Histone proteins have N-terminal amino acid tails that 

can be modified by a range of chemical modifications, including methylation and 

acetylation. Histone tail modifications are involved in transcriptional regulation of the 

genome with some associating with repression and others with activation (Figure 1.9). 

Like DNA methylation, repressive modifications such as H3K9me3 and H3K27me3 can 

be stably maintained between cell divisions by histone retention at replication foci 

(Margueron et al. 2009). Conversely, histone marks associated with transcriptional 

activation, such as H3K4me3, are re-established following replication (Escobar et al. 

2019). In this section, we dissect the relationships between DNA methylation and both 

transcriptionally repressive and active histone modifications.  



1.1.5.1 Transcriptionally repressive histone marks 

H3K9 methylation is enriched in heterochromatic gene-poor regions of the genome, such 

as TEs and pericentromeric repetitive satellite elements, where DNA methylation can also 

be found (Pauler et al. 2009). There are six methyltransferases that catalyse H3K9 

methylation in the mouse genome: SUV39H1, SUV39H2, G9A, GLP, SETDB1, and 

SETDB2 (Padeken et al. 2022). The three canonical DNMTs, DNMT1 and DNMT3A/3B, 

as well as UHRF1, have been shown to directly interact with this H3K9 methylation 

enzymatic machinery (Fuks et al. 2003; Esteve et al. 2006; Li et al. 2006; Chang et al. 

2011), but the extent to which these interactions influence global DNA methylation levels 

is limited and depends on developmental and genomic contexts. 

In Suv39h1-/-Suv39h2-/- mice, DNA methylation is lost at major satellite repeats in 

pericentromeric regions, while Dnmt1-/- or Dnmt3a-/-Dnmt3b-/- mouse embryos do not show 

any changes in H3K9 methylation at these sites (Lehnertz et al. 2003). Additionally, 

mESCs deficient for the H3K9 methyltransferase G9a show reduced levels of DNA 

methylation at retrotransposons, as well as ICRs (Dong et al. 2008; Zhang et al. 2016). 

However, this second point is controversial as it has been shown that at ICRs and ZFP57-

bound regions, H3K9 methylation is lost in Dnmt1-/-Dnmt3a-/-Dnmt3b-/- triple knockout 

mESCs, indicating that H3K9me3 is secondary to DNA methylation at these loci 

(Quenneville et al. 2011; Shi et al. 2019). Overall, these findings suggest a model in which 

H3K9 methylation can precede DNA methylation to silence most regions of the genome 

(Padeken et al. 2022), but the opposite is likely to be the case at genomic imprints where 

DNA methylation precedes H3K9me3. 

It has been proposed that the direct binding of UHRF1, via its Tandem Tudor 

domain (TTD), to H3K9 methylation is essential for DNA methylation maintenance 

mediated by DNMT1 (Rothbart et al. 2012; Rothbart et al. 2013). However, when this interaction 

is abrogated by mutating the TTD of UHRF1, global DNA methylation levels are only 

reduced by ~10% (Zhao et al. 2016). Additionally, when all six H3K9 methyltransferases 

are knocked out in mouse embryonic fibroblasts (MEFs), a somatic context in which DNA 

methylation is globally maintained by DNMT1, DNA methylation is only modestly 

reduced by ~20% genome-wide (Montavon et al. 2021). Therefore, H3K9 methylation may 

precede and promote de novo DNA methylation in the early embryo at most genomic loci, 

but H3K9me3 and DNA methylation at sites maintained by DNMT1 appear to be largely 

independently regulated.  



The histone tail modification H3K27me3 is enriched at inactive genes and 

maintains their transcriptional repression at promoters via the polycomb repressive 

complex 2 (PRC2) (Pauler et al. 2009). Loss of H3K27me3 results in little change to DNA 

methylation, while loss of DNA methylation results in acquisition of H3K27me3 genome-

wide, which suggests that DNA methylation and H3K27me3 are molecular antagonists 

(Brinkman et al. 2012; Hagarman et al. 2013).  

1.1.5.2 Transcription-associated histone marks 

Another histone modification that has a proposed antagonistic relationship with DNA 

methylation is H3K4me3, which is generally found at the promoters of transcribed genes. 

In fact, enrichment of H3K4me3 is mutually exclusive with DNA methylation and, whereas 

DNA methylation seems to block H3K27 methylation, H3K4 methylation seems to block 

DNA methylation, particularly at CGIs (Ooi et al. 2007; Weber and Schubeler 2007).  

As a final example of how DNA methylation can crosstalk with histone tail 

modifications, histone 3 lysine 36 trimethylation (H3K36me3) and DNA methylation are 

both enriched at exons and introns of actively transcribed genes (Rose and Klose 2014). 

The PWWP domains of DNMT3A/3B can recognise H3K36me3 (Rondelet et al. 2016), 

and when they are disrupted by mutations in mESCs, gene body DNA methylation is 

reduced (Baubec et al. 2015). Gene body DNA methylation is also reduced when the 

enzyme responsible for catalysing H3K36 methylation, SETD2, is depleted (Morselli et al. 

2015). However, there is still strong enrichment of DNA methylation at gene bodies in 

somatic contexts when DNMT3A/3B are absent, suggesting that there may be an as yet 

undiscovered relationship between H3K36me3 and DNMT1. 

1.1.6 DNA methylation fidelity  

So far, we have presented DNA methylation as an epigenetic mark that is established during 

early development by DNMT3s and then stably propagated through mitosis by DNMT1. 

However, there is accumulating evidence that DNA methylation is not always stably 

inherited between cellular divisions. In fact, two studies from the 1980s were among the 

first to observe this instability (Pollack et al. 1980; Wigler et al. 1981).  

After transfecting viral plasmids containing unmethylated or artificially methylated 

DNA into mouse fibroblasts, Pollack and colleagues used methylation-sensitive restriction 

enzymes to examine the presence or absence of DNA methylation at the transfected 

sequences in clonal cell lines after 25 to 30 divisions (Pollack et al. 1980). They found that 



methylation was preserved in only 2 out of the 8 clonal lines derived from methylated DNA 

transfections, and that the unmethylated state was maintained in 13 out of 14 clonal lines 

derived from the unmethylated DNA transfections – meaning that one line gained methylation. The 

findings were generally inconclusive about whether DNA methylation is replicated via a 

semiconservative process. Wigler and colleagues, using almost the same techniques, found 

that of 9 clonal lines derived from methylated DNA transfections, all appeared to retain 

DNA methylation, albeit at inconsistent levels, and of 10 clonal lines derived from 

unmethylated DNA transfections, none of the lines exhibited methylation (Wigler et al. 

1981). They concluded that somatic cells can mediate the faithful inheritance of 

methylation, but not that they always do.  

The two studies reviewed in detail above analysed methylation levels through cell 

divisions at sequences not originally present in the genome, raising the possibility that DNA 

methylation is inherited differently at transfected loci. However, other groups that used 

subcloning approaches to infer the methylation fidelity of endogenous sequences in the 

genome came to similar conclusions. These studies relied on largely qualitative approaches 

to assess methylation levels and were limited to specific loci of interest to early functional 

methylation studies, which led to varied and sometimes confounding 

interpretations and conclusions regarding methylation fidelity (Shmookler Reis and 

Goldstein 1982a; Shmookler Reis and Goldstein 1982b; Turker et al. 1989; Pfeifer et al. 

1990; Shmookler Reis et al. 1990). Nevertheless, across all the studies, DNA methylation 

levels at a particular CpG site, or group of sites, were not always consistent between 

subclonal cell lines – implicating imperfect mitotic inheritance of methylation at the tested 

genomic loci. More recent quantitative and genome-scale approaches have observed the 

existence of unfaithful methylation (also referred to as “stochastic” methylation) and have 

shown that it is more prevalent in pluripotent compared to somatic cells (Landan et al. 

2012; Shipony et al. 2014). 

Another way to approximate methylation fidelity is to measure the presence of 

hemimethylation. This is done by hairpin-bisulphite PCR (or sequencing), which allows 

strand-specific methylation patterns to be measured. The logic for using hemimethylation 

as a proxy for methylation fidelity stems from the widely accepted mechanism 

by which methylation patterns are inherited during cell replication. Through UHRF1, 

DNMT1 recognises hemimethylated CpG sites after replication and deposits methylation 

onto the newly synthesised strand (Hermann et al. 2004); therefore, the presence of 

homeostatic hemimethylation indicates that this process has not reached completion. Using 



hairpin-bisulphite PCR in a locus-specific manner revealed that hemimethylation, and by 

proxy methylation infidelity, can exist in varying amounts between different genomic loci 

(Laird et al. 2004; Arand et al. 2012). This concept promoted the idea that hemimethylation 

induced by imperfect DNMT1-mediated methylation maintenance will lead to loss of 

methylation unless accompanied by de novo methylation (Riggs and Xiong 2004). When 

assessed at the global scale by genomic sequencing, more than half of CpGs with 50-90% 

methylation levels exhibited hemimethylation (Zhao et al. 2014). Additionally, the 

prevalence of hemimethylation genome-wide was shown to decrease during mESC 

differentiation – confirming the finding mentioned above that unfaithful methylation 

inferred by subcloning is more prevalent in pluripotent cells compared to somatic ones.  
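
As a minimal sketch of how hemimethylation could be tallied from strand-resolved data, the example below classifies CpG dyads from paired top- and bottom-strand methylation calls. The input format (a list of boolean pairs) and the function names are hypothetical; real hairpin-bisulphite data would first require read processing and methylation calling, which are not shown.

```python
from collections import Counter

# The three possible states of a CpG dyad when both strands are observed.
STATES = ("fully methylated", "hemimethylated", "unmethylated")


def classify_dyad(top: bool, bottom: bool) -> str:
    """Classify one CpG dyad from its top- and bottom-strand calls."""
    if top and bottom:
        return "fully methylated"
    if top or bottom:
        return "hemimethylated"
    return "unmethylated"


def dyad_summary(dyads: list[tuple[bool, bool]]) -> dict[str, float]:
    """Return the fraction of dyads in each state."""
    if not dyads:
        raise ValueError("at least one dyad is required")
    counts = Counter(classify_dyad(top, bottom) for top, bottom in dyads)
    total = len(dyads)
    return {state: counts.get(state, 0) / total for state in STATES}


if __name__ == "__main__":
    # Made-up calls for illustration only: (top strand, bottom strand).
    example = [(True, True), (True, False), (False, True), (False, False)]
    for state, fraction in dyad_summary(example).items():
        print(f"{state}: {fraction:.2f}")
```

In this framing, the hemimethylated fraction is the quantity used as a proxy for methylation infidelity.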

From these studies, ideas have begun to emerge for how DNA methylation may be 

maintained through cellular divisions unfaithfully or stochastically (Riggs and Xiong 2004; 

Jones and Liang 2009; Jeltsch and Jurkowska 2014). However, the extent and principles 

dictating this instability have yet to be fully determined – in Chapter 3 of this thesis, we 

decipher more precisely how methylation is propagated between cell divisions and to what 

degree it is stochastically, rather than clonally, inherited.  
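
One simple way to formalise the interplay between imperfect maintenance and de novo methylation is a two-state model in which, at each division, a methylated site stays methylated with probability `maintenance` and an unmethylated site gains methylation with probability `de_novo`. This is a deliberately simplified sketch in the spirit of the models cited above, not a reproduction of any of them, and the parameter values are arbitrary.

```python
def next_level(level: float, maintenance: float, de_novo: float) -> float:
    """Expected methylation level at a site after one division."""
    return level * maintenance + (1.0 - level) * de_novo


def simulate(level: float, maintenance: float, de_novo: float,
             divisions: int) -> float:
    """Iterate the two-state model for a number of divisions."""
    for _ in range(divisions):
        level = next_level(level, maintenance, de_novo)
    return level


def steady_state(maintenance: float, de_novo: float) -> float:
    """Level at which losses and gains balance: d / (1 - m + d)."""
    denominator = 1.0 - maintenance + de_novo
    if denominator == 0.0:
        # Perfect maintenance with no de novo activity: every level is stable.
        raise ValueError("steady state is undefined when maintenance=1 and de_novo=0")
    return de_novo / denominator


if __name__ == "__main__":
    m, d = 0.95, 0.05  # arbitrary illustrative values
    print("closed form:", steady_state(m, d))
    print("from fully methylated:", simulate(1.0, m, d, 500))
    print("from unmethylated:", simulate(0.0, m, d, 500))
```

In this model the long-run level depends only on the two rates and not on the starting state, which is one way in which intermediate methylation levels could persist across a population of cells without being clonally inherited.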

 

1.2 Transposable elements 

Repressing the transcription of transposable elements (TEs) is an established example of a 

distinct biological role for DNA methylation in the genome. For the remaining sections of 

the introduction, we will describe TEs in greater depth and focus on a subset of variably 

methylated TEs called “metastable epialleles”. Lastly, we will introduce the transcription 

factor CTCF, which has many binding sites within young TEs, is enriched at metastable 

epialleles, and can exhibit methylation sensitivity with regard to DNA binding. 

TEs are DNA sequences that can change location within a genome (Bourque et al. 

2018). Originally identified in maize by Barbara McClintock in the 1950s (McClintock 

1950), it has since become clear that TEs, and their evolutionarily inactive remnants, often 

account for large proportions of many eukaryotic genomes. For example, 40% of the mouse 

genome is thought to be made up of TE genetic material (Mouse Genome Sequencing et 

al. 2002). This proportion is comparable to the TE content of other mammals (Figure 1.10) 

and very likely underestimates the true proportion due to the limitations of short-read 

Illumina sequencing (Platt et al. 2018), whose reads are too short to map identical (or 

similar) TE copies uniquely to the genome. The recent advent of long-read 

sequencing will allow for more accurate estimates of TE content in the future (Shahid and 

Slotkin 2020).  

TEs are classified by their mechanism of transposition, although most identified 

TEs in the mouse genome no longer have the ability to actively mobilise (Huang et al. 

2012). The overwhelming majority of TEs in the mouse genome (96%) are classified as 

retrotransposons, which can mobilise via an RNA intermediate prior to reintegration into 

the genome (Mouse Genome Sequencing et al. 2002; Nellaker et al. 2012). The remaining 

4% of TEs are classified as DNA transposons, which can be excised as double-stranded 

DNA and reintegrate into the genome without being transcribed as an RNA intermediate. 

Although DNA transposons are still actively mobile in many species, in the mouse genome 

there is no evidence for transposition of a DNA transposon in the last 40 million years 

(Feschotte and Pritham 2007) – some retrotransposons, on the other hand, are still active. 

 

 
Figure 1.10: Genomic transposable element content varies between species. Transposable elements 
(TEs) make up ~40% of the mouse genome, which is comparable to the human genome in terms of amount 
and composition, both of which can differ quite radically from other species. For example, more than 50% 
of the zebrafish (Danio rerio) genome is composed of TEs, most of which are DNA transposons (in purple), 
which are present in comparatively low amounts in both mouse and human. Helitrons (in orange), a specific 
family of DNA transposons, are found in many genomes throughout the tree of life, but the only mammalian 
genomes in which they have been documented are those of bats. Figure from Huang, Burns, and Boeke 
2012.  
 

1.2.1 Retrotransposons 

In mammals, there are three major classes of retrotransposons: long-interspersed nuclear 

elements (LINEs), short-interspersed nuclear elements (SINEs), and long-terminal repeat 

(LTR) elements (Figure 1.11) (Bourque et al. 2018). The DNA sequence of LINEs and 

LTR retrotransposons can encode proteins that allow for autonomous retrotransposition, 

whereas SINEs require LINE-derived proteins to mobilise (Dewannieux and Heidmann 

2005). A full-length LINE is around 7 kilobases (kb) long and contains two open reading 

frames that encode ORF1 and ORF2 (Boissinot and Sookdeo 2016). ORF1 is an RNA-

binding protein that functions as a chaperone to facilitate the reverse-transcriptase and 

endonuclease functions of ORF2. The process of reverse-transcription converts the RNA 

of LINEs or SINEs into DNA for integration into the genome, which is initiated by 



endonuclease activity. Most LINEs in the mouse genome are truncated and do not have the 

ability to mobilise – it has been estimated that there are ~3000 active LINE elements 

(Goodier et al. 2001). SINEs are much shorter than LINEs (< 700 base pairs (bp) in length) 

and contain an RNA polymerase III promoter – with A and B block regions – thought to 

be derived from either tRNAs or rRNAs, as well as an internal region homologous to LINE 

elements, which may facilitate retrotransposition via the LINE machinery 

(Ferrigno et al. 2001).  

 
Figure 1.11: Genetic structures of LINEs, SINEs, and LTR retrotransposons. The three major classes 
of retrotransposons in mammals (LINEs, SINEs, and LTR retrotransposons) differ fundamentally in 
genetic structure, which informs their different mechanisms of mobilisation. LINEs contain two open reading 
frames (ORF1/2) that encode proteins that facilitate reverse transcription and re-integration into the 
genome. SINEs utilise the LINE-derived proteins, as they do not encode their own. However, they do contain 
RNA polymerase III promoters (with A and B block regions) to independently induce transcription. The 
internal portions of LTR retrotransposons are flanked by LTRs and contain gag, pol, and env genes. Gag 
and pol encode a fusion protein necessary for retrotransposition, while env encodes an envelope protein 
that allows the retrotransposon to exit the cell and infect others. Although the LTRs of a full-length element 
are identical, the 5' LTR typically acts as a promoter, while the 3' LTR behaves as a transcription termination 
site. Figure adapted from Fueyo et al. 2022.  

 
LTR retrotransposons are characterised by two identical non-coding long-terminal 

repeats that flank a set of genes that can encode the proteins required for 

retrotransposition. LTRs are roughly 200-600 bp in length, while the internal portion of the 

elements can span between 5 and 7 kb. In mammals, all LTR retrotransposons are derived 


from a superfamily of endogenous retroviruses (ERVs) that likely arose from retroviruses 

inserting into the germline genome and then being inherited as proviruses through 

subsequent generations (Gifford et al. 2018). The structure of an ERV is therefore very 

similar to that of a retrovirus. Each LTR is subdivided into two unique regions (U3 and U5) 

that flank a repeated (R) region. Directly downstream of the U5 portion of the 5' LTR is the 

primer binding site (PBS), which is a highly conserved sequence that is essential for reverse 

transcription of the ERV (Havecker et al. 2004). Meanwhile, the internal portion of the 

element, between the two LTRs, contains gag, pol, and env genes that encode various 

polyproteins required for the propagation of the TE either intra- or extracellularly 

(Havecker et al. 2004). The gag gene encodes core structural proteins that form the virus-

like particle in which the retrotransposon RNA will undergo reverse transcription. The pol 

gene encodes: 1) a reverse transcriptase that transcribes the ERV RNA into double-stranded 

DNA (dsDNA); 2) an integrase that processes the 3' end of the ERV dsDNA to produce 3' 

hydroxyl groups, cleaves the host DNA, and facilitates the ligation between the processed 

ERV dsDNA and host DNA; and 3) a protease that processes the ERV polyproteins into 

functional units. Therefore, gag and pol harbour the machinery required for the ERV to 

retrotranspose intracellularly. Finally, the env gene encodes the viral envelope, which 

protects the ERV and allows it to exit its host cell to infect other cells.  

Most annotated ERVs in mammalian genomes exist as immobile solo LTRs that 

arise due to inter-LTR homologous recombination – for example, in the human genome, 

solo LTRs represent ~90% of all ERV insertions (Stoye 2001; Jern and Coffin 2008; Friedli 

and Trono 2015). Of the full-length elements, most have l