1 
DNA G-quadruplex structures mould the DNA methylome 1 
 2 
Shi-Qing Mao1, Avazeh T. Ghanbarian1, Jochen Spiegel1, Sergio Martínez Cuesta1, Dario 3 
Beraldi1,4, Marco Di Antonio2, Giovanni Marsico1, Robert Hänsel-Hertsch1, David Tannahill1 4 
and Shankar Balasubramanian*1,2,3 5 
 6 
1 Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Cambridge, UK 7 
2 Department of Chemistry, University of Cambridge, Cambridge, UK 8 
3 School of Clinical Medicine, University of Cambridge, Cambridge, UK 9 
4 Current address: Institute of Cancer Sciences, University of Glasgow, Glasgow, UK 10 
* Correspondence should be addressed to S.B. (sb10031@cam.ac.uk) 11 
 12 
 13 
 14 
Abstract 15 
Control of DNA methylation level is critical for gene regulation, and the factors that govern 16 
hypomethylation at CpG islands (CGIs) are still being uncovered. Here, we provide evidence 17 
that G-quadruplex (G4) DNA secondary structures are genomic features that influence 18 
methylation at CGIs. We show that the presence of G4 structure is tightly associated with CGI 19 
hypomethylation in the human genome. Surprisingly, we find that these G4 sites are 20 
enriched for DNA methyltransferase 1 (DNMT1) occupancy, which is consistent with our 21 
biophysical observations that DNMT1 exhibits higher binding affinity for G4s as compared to 22 
duplex, hemi-methylated or single-stranded DNA. The biochemical assays also show that the 23 
G4 structure itself, rather than sequence, inhibits DNMT1 enzymatic activity. Based on these 24 
data, we propose that G4 formation sequesters DNMT1 thereby protecting certain CGIs from 25 
methylation and inhibiting local methylation. 26 
 27 
Introduction 28 
Methylation of cytosine at C-5 is a key DNA modification in development and disease1,2. In 29 
mammals, cytosine methylation occurs predominantly at CpG dinucleotides and is installed 30 
and maintained by three DNA methyltransferase enzymes (DNMT1, 3A and 3B) that are 31 
essential for development3–5. CpGs occur less frequently than expected in the mammalian 32 
genome and show a bimodal distribution with respect to methylation6,7. Sparsely distributed 33 
CpGs (~90%), found in genic and intergenic regions, tend to be highly methylated, while CpGs 34 
 2 
found in dense GC-rich regions, so-called CpG Islands (CGIs), are largely depleted of 35 
methylation and are prevalent at the promoters of house-keeping and developmental 36 
genes8,9. Outside of embryonic development, gross methylation patterns are generally stable 37 
across different tissues10,11. Nonetheless during key cellular events, methylation can be 38 
dynamic at specific loci to modulate gene expression, such as de novo methylation of some 39 
promoter CGIs with intermediate CpG density during lineage commitment12. 40 
 41 
General rules on maintenance of the default methylation state are being uncovered and 42 
several studies suggest that CGI hypomethylation is sequence-dependent13–18. Furthermore, 43 
DNMTs are reported to be actively and continuously excluded from CpG-poor distal 44 
regulatory regions through competitive inhibition with DNA binding proteins, such as NRF1 45 
and CTCF/REST,  thus maintaining the hypomethylated state of regulatory regions19,20. Lowly 46 
methylated regions also co-localise with DNase I hypersensitivity sites marking accessible 47 
chromatin regions21. The presence of enhancer chromatin marks, such as  histone 48 
modifications, also play an important role in forming the unique chromatin structure of 49 
CGIs22,23. In mouse embryonic stem cells, the CXXC finger protein 1, Cfp1, is believed to 50 
promote CGI hypomethylation through binding unmethylated CpG and recruitment of H3K4 51 
methyltransferases to promote H3K4me324,25. However, Cfp1 binding and/or H3K4me3 are 52 
not required for the ‘protection’ of CGI from DNA methylation since Cfp1 knockout results in 53 
a dramatic loss of H3K4me3 at CGIs without increasing DNA methylation24. This suggests that 54 
Cfp1 binding and/or H3K4me3 are not required to prevent CGIs from DNA methylation, thus 55 
there may be other factors that are fundamental to impart the hypomethylated state.  56 
 57 
Alternative DNA secondary structures, known as G-quadruplexes (G4s) are found within 58 
certain G-rich sequences and arise through the self-association of guanine bases to form 59 
stacked tetrads (Fig. 1a)26. G4s are increasingly being recognised as important features in the 60 
genome and over 700,000 G4s have been biophysically mapped in purified human genomic 61 
DNA by high-throughput sequencing27. G4 structures have been observed in human using 62 
immunofluorescence with a G4-specific antibody (BG4)28, and linked with transcriptional 63 
regulation and are enriched in gene promoters including many oncogenes26,29. Recently, G4-64 
chromatin immunoprecipitation sequencing (G4-ChIP-seq) has been developed to map G4 65 
 3 
structures in human chromatin30,31. Corroborating a link with transcription, the majority of 66 
G4-ChIP-seq sites were found predominantly in regulatory, nucleosome-depleted chromatin, 67 
particularly in gene promoters31,32. As both G4s and hypomethylated CGIs are associated 68 
with actively transcribed genes9,31, this raises the question of whether there is an interplay 69 
between G4 formation and DNA methylation. 70 
 71 
Herein we present evidence that most G4 structures, as detected by G4-ChIP-seq, are formed 72 
in regions comprising unmethylated CGIs in the human genome. We also uncover a striking 73 
co-localisation of G4 structures and DNMT1 docked at CGIs, and we demonstrate that 74 
DNMT1 methylation activity is inhibited by DNA G4 structures. Our data suggest a 75 
mechanism for the ‘protection’ of CGIs from methylation by G-quadruplex structures that 76 
locally sequester and inhibit DNMT1. 77 
 78 
Results 79 
G4 structures in active chromatin are found within hypomethylated CGIs 80 
To explore any potential relationship between G4 structures in chromatin and methylation 81 
levels, we employed human K562 chronic myelogenous leukaemia cells in which methylation 82 
has been comprehensively characterised at single base resolution using whole genome 83 
bisulfite sequencing (WGBS) by the ENCODE project33. We generated a genome-wide dataset 84 
for G4 structures by G4 ChIP-Seq31 using the G4-specific antibody BG428 and compared the 85 
BG4 peak overlap with CGIs34. Strikingly, we found that the majority of BG4 peaks (79%, 86 
7111/8952) overlapped with a CGI (covering 23% of all CGIs) (Fig. 1b). The majority of CpG 87 
island regions span 200 to 1000 bp (median/mean, 569/775 bp), while BG4 peak regions 88 
span 100 to 400 bp (median/mean, 205/226 bp) (Fig. 1c). 83% (5935/7111) of these CGIs 89 
overlap with one BG4 peak (Fig. 1d). Furthermore, when the level of methylation at BG4 90 
peaks was considered, we noted that there was a dramatic absence of methylation at BG4 91 
peaks (mean 1%, median 0.5%), compared with average genome methylation (28.4%) (Fig. 92 
1e). To rule out any effect of the cytosine methylation state on the ability of the BG4 93 
antibody to recognise a G4 structure, an ELISA binding assay was used to show that BG4 can 94 
bind to G4 structured DNA with equal affinity irrespective of the presence of cytosine 95 
methylation (SI Fig. 1). DNase I hypersensitive sites (DHS), which mark open chromatin, are 96 
 4 
also mainly hypomethylated (mean 11%, median 2.5%, Fig. 1e). Confirming our previous 97 
observations31, the majority of BG4 peaks are found in open chromatin (DHSs, 97%, 98 
8655/8952), and it is notable that these sites have the lowest methylation levels (Fig. 1e). 99 
Overall CGI methylation (mean 27%, median 8%, Fig. 1e) shows a broader distribution than 100 
BG4 regions, since some CGIs are associated with active hypomethylated promoters while 101 
others with inactive genes or gene bodies and thus are more heavily methylated9,35. This 102 
prominent association between BG4 peaks, hypomethylation and particular CGIs is 103 
suggestive of a functional link between G4 secondary structures and the establishment 104 
and/or maintenance of low methylation status at these CGIs in active chromatin.  105 
 106 
It has recently been concluded from work in mouse embryonic stem cells, that both high 107 
CpG-density and high GC-richness are required to establish the hypomethylated state at 108 
CGIs15. It is therefore notable that BG4 peaks have a similar level of GC richness to CGIs (Fig. 109 
1f) with most (79%) being located in regions of CpG density comparable to that seen in CGIs 110 
(Fig. 1g). It has been suggested that CpG density alone is only a minor determinant of the 111 
unmethylated state, as dense CpG sequences embedded within an AT-rich context are 112 
invariably highly methylated when inserted into the mouse genome15,16. Indeed, when we 113 
compare the average methylation of CGIs (Fig. 1h) to that of BG4 peaks at different CpG 114 
dinucleotide densities, it is noteworthy that across a wide range of CpG densities, BG4 peak 115 
regions are always largely devoid of methylation (Fig. 1h). We also confirmed these 116 
observations using an alternative CGI definition set generated by CpGCluster algorithm36 (SI 117 
Fig. 2a, 2b). Furthermore, when methylation at CGIs is considered with respect to the 118 
presence or absence of a BG4 peak, it is noteworthy that there is an almost a total lack of 119 
methylation at CGIs with BG4 than CGIs without (SI Fig. 2c). This strongly suggests that CGIs 120 
associated with the physical presence of a G4 structure generally have particularly low 121 
methylation.  122 
 123 
To explore low methylation in different CGI contexts, we calculated methylation levels 124 
relative to BG4 presence in CGIs containing i) no promoter or DHS site, ii) a promoter alone, 125 
iii) a DHS site alone and iv) both a promoter and DHS site. It is apparent that CGIs containing 126 
a BG4 peak always have lower methylation in open (DHS +) or closed (DHS −) chromatin, or in 127 
the presence or absence of a promoter. (SI Fig. 2d). CGIs with a DHS site and promoter but 128 
 5 
without a BG4 peak (4500 sites) have higher methylation (mean 2%, median 2%) than those 129 
CGIs (5567 sites) with a BG4 peak plus promoter and DHSs (mean 1%, median 0.5%) (SI Fig. 130 
2d, right two panels). The lowest observed methylation states are found therefore at sites 131 
carrying a G4 structure, suggesting that the physical presence of a G4 structure within CGI is 132 
an important feature with respect to the hypomethylation state. This is illustrated in Fig. 1i 133 
which shows the co-incidence of BG4 peaks with hypomethylated promoter CGIs for a 134 
representative genome region.  135 
 136 
In earlier work, we found that treatment of human epidermal keratinocytes (HaCaT) cells 137 
with the HDAC inhibitor entinostat led to increased BG4 binding signal primarily located in 138 
open chromatin promoter regions37.  We therefore generated WGBS datasets to examine 139 
DNA methylation changes with respect to BG4 signal. Consistent with our observation in 140 
K562 cells, BG4 peaks in HaCaT cells have lower methylation compared with open chromatin 141 
and CGI regions (SI Fig. 2e). In open-chromatin promoter CGI regions, 307 had a significant 142 
increase in BG4 signal (BG4 increase, > 1.5-fold change in signal and FDR < 0.05, see Online 143 
Methods). No change in BG4 signal was seen in 3261 CGI promoter regions before and after 144 
treatment (BG4 constant), or for 1504 G-rich CGI promoter regions that do not have a BG4 145 
peak (BG4 negative) but have the potential to form a G4 in vitro27. Despite open-chromatin 146 
promoter CGI regions already being predisposed to low methylation, we see a statistically 147 
significant additional drop in methylation levels at CGIs where BG4 peak size increases after 148 
HDAC inhibition (SI Fig. 2f). Overall, these data support that formation of G-quadruplex 149 
structures in CGIs is linked to lower methylation. 150 
 151 
DNMT1 is sequestered at G4 structures associated with low methylation 152 
Given that regions where G4 structures marked by BG4 peaks are generally observed to be 153 
hypomethylated, we considered that the DNA methyltransferases might have some form of 154 
physical interaction with G4 structures in the chromatin context. We focused on DNMT1 155 
since DNMT1 knockout is lethal causing global DNA methylation loss in all dividing somatic 156 
cells and human embryonic stem cells (ESCs)3,5,38,39, whereas  DNMT3A/B knockouts mainly 157 
affect non-CpG methylation in human ESCs5. When we considered the distribution of DNMT1 158 
binding sites in K562 cells, downloaded from ENCODE33 (516,483 peaks in total across both 159 
biological replicates), we found that 52% (4611/8952, Monte Carlo simulation’s P-value 160 
 6 
1.25e-04) of the G4 structures mapped by G4-ChIP (BG4 peaks) overlapped with at least one 161 
DNMT1 binding site. Of the remaining 4341 BG4 peaks, 4003 were within 1 Kb of a DNMT1 162 
binding site. The proximity of BG4 peaks to DNMT1 recruitment sites is illustrated graphically 163 
in Fig. 2a for a representative genome region. Intriguingly, when the distribution of DNMT1 164 
binding is plotted relative to high, intermediate or low methylated CGIs, we observe a 165 
prominent enrichment of DNMT1 binding at lowly methylated CGIs which overlaps with 166 
those regions with the highest BG4 peak density as well as DHS sites (Fig. 2b). A similar 167 
profile is also seen using alternative CGI definition set generated by CpGCluster36 (SI Fig. 3). 168 
The observation that DNMT1 enrichment at G4 regions that lack methylation is, at first 169 
glance, somewhat unexpected and counter-intuitive, given that DNMT1 installs methylation. 170 
We therefore considered the possibility of a mechanism whereby DNMT1 protein is 171 
sequestered at these sites in active chromatin but prevented from methylating CpGs in that 172 
locality. 173 
 174 
DNMT1 selectively binds to and is inhibited by G4 structures  175 
To address whether DNMT1 binds G4 structures directly, we carried out biophysical 176 
measurements using an enzyme-linked immunosorbent assay (ELISA) to measure the binding 177 
of recombinant human FLAG-tagged full-length DNMT1 protein to immobilized target DNA 178 
structures (see Online Methods). Biotinylated single-stranded oligonucleotides of sequence 179 
based on the promoters of BCL2, KIT2 and MYC were chosen as these fold into well-180 
characterised G4 structures40–42. Mutated versions (BCL2-mut, KIT2-mut and MYC-mut) that 181 
are unable to form a G4 structure were also used as controls. The presence or absence of G4 182 
folded structure with G4 oligonucleotides and mutated controls was confirmed by circular 183 
dichroism (CD) spectroscopy and ultraviolet (UV) thermal melting spectroscopic analysis (SI 184 
Fig. 4a-f). We found that DNMT1 binds to all three G4 structures with low nanomolar affinity 185 
(Kd[BCL2] = 9.6 ± 0.3 nM; Kd [KIT2] = 15.2 ± 0.4 nM, Kd [MYC] = 25.3 ± 0.4 nM, n=3; Fig. 3a-c). 186 
DNMT1 showed a lower binding affinity for unmethylated duplex DNA (BCL2, 107±5 nM; Fig. 187 
3d) and there was no specific binding observed for the single stranded mutated 188 
oligonucleotide controls. Notably, DNMT1 generally showed a greater affinity for G4 189 
structures than known DNMT1 substrates such as a hemi-methylated duplex DNA (BCL2, 85 ± 190 
7 nM, Fig. 3e), or a synthetic poly(dI-dC)50 substrate (75 ± 2 nM, Fig. 3f). DNMT1 binding to 191 
G4 structures does not appear to depend on CpG dinucleotides, since the absence of CpG in 192 
 7 
the MYC G4 did not preclude DNMT1 binding (Fig. 3c). To begin to dissect the binding mode 193 
of DNMT1 to G4s, we used a competition ELISA assay. 50 nM immobilized BCL2 G4 (Kd for 194 
DNMT1 = 9.6 nM, Fig. 3a) was incubated with DNMT1 protein in the presence of increasing 195 
concentrations of competitors. Even with 100-fold excess (5 M) of DNA duplex (Kd for 196 
DNMT1 = 107 nM, Fig. 3d) or poly-dIdC (Kd for DNMT1 = 75 nM, Fig. 3f), there was no 197 
inhibition (Fig. 3g). This suggests that G4s and duplex DNA occupy different binding sites and 198 
that the catalytic domain is not involved in G4 binding. A similar, non-overlapping G4 and 199 
duplex binding has also been observed in other proteins such as TRF243 and Rap144. 200 
 201 
The relatively high binding affinity and selectivity of DNMT1 for DNA G4 structures is 202 
consistent with the observation that DNMT1 shows some localisation to G4 structures in 203 
K562 cells (Fig. 2a, b). To validate the association with low methylation in the locality G4 sites 204 
in the genome, we evaluated whether G4 DNA could actually inhibit DNA methylation on a 205 
standard assay using poly(dI-dC)n as substrate45 of DNMT1 using a fluorometric biochemical 206 
assay (Abcam, see Online Methods). Specifically, we evaluated different concentrations of 207 
folded G4-structured oligonucleotides or mutated non-G4 controls, where the presence or 208 
absence of G4 structure had been confirmed by CD spectroscopy (SI Fig. 4g-i). We indeed 209 
found that each of three G4 structures resulted in significant inhibition of DNMT1 210 
methyltransferase activity whereas the mutated control oligonucleotides did not (Fig 3h-j). 211 
Gratifyingly, the potency of inhibition by each G4 was related to the binding affinity for 212 
DNMT1, as determined by ELISA with BCL2 being the most potent inhibitor (50% inhibition at 213 
~25 nM), MYC being least potent (50% inhibition at ~1 M) and KIT2 being intermediate (50% 214 
inhibition at 90 nM). No inhibition of activity was seen with mutated controls ranging from 215 
400 nM-8 μM concentration. C-rich oligonucleotides complementary to the G4 sequences 216 
(BCL2-CCC, KIT2-CCC and MYC-CCC) or corresponding duplex DNA also had no effect on 217 
DNMT1 activity (SI Fig. 4j-l). We also tested G4 oligonucleotides that were able to fold into a 218 
G4 structure but carried a reduced number CpGs (BCL2, KIT) or had a number of artificially 219 
introduced CpGs (MYC). In all cases, changes in the number of CpG sites only had minor 220 
effects on DNMT1 inhibition (SI Fig. 4j-l). Taken together, these results indicate a novel and 221 
unexpected feature of G4 structures as potential genomic regions that promote the 222 
unmethylated state through recruitment and inhibition of DNMT1 activity.  223 
 224 
 8 
Recruitment of DNMT to G4 structures shapes the methylome  225 
The above data suggest that there is a striking lack of methylation (Fig. 1e, h, SI Fig. 2a-d) in 226 
chromatin regions where G4 structure formation is observed. To rigorously question whether 227 
this observation was related to the detectable formation of a G4 structure in chromatin (i. e. 228 
a BG4 peak), or merely the G-rich sequences per se with potential to form a G4 structure, the 229 
methylation profile for BG4 peak regions was compared to those of G-rich sequences that 230 
can physically form a G4 structure in an in vitro sequencing assay27 (here called Sequences 231 
with potential to form G4s, Fig. 4a). As the majority of BG4 peaks (8,210) are found in open 232 
chromatin, only sequences with potential to form G4s located in open chromatin (36,015) 233 
were considered. The mean and median length is 226/205 bp for BG4 peaks, and 383/285 bp 234 
for the latter. G-rich sequences with the potential to form a G4 are largely hypomethylated 235 
(12%), with methylation levels rising in the flanking regions (45%), whereas BG4 peaks have 236 
substantially lower methylation (down to 1%) and flanking regions being more methylated 237 
(60%). The contrast between lowest methylation at BG4 sites with highest methylation at 238 
distal flanking regions is also exemplified in the genome browser view in Fig. 1i. While G-239 
richness as defined by G4 sequence without structure is a feature that correlates with the 240 
lack of methylation, there is a further dramatic loss of methylation due to the physical 241 
presence of a G4 structure with these regions also being marked by a greater methylation 242 
flanking the G4 structure. Regions with a G4 structure also correspond to CGIs that mark 243 
particular active genes, and is in keeping with our previous data showing that G4s are 244 
associated with particular chromatin states to promote elevated transcription31. R-loops 245 
(three-stranded DNA-RNA hybrids) have also been linked to reduced methylation in 246 
transcribed CpG island promoters46,47. As R-loops form in a similar genomic context to G4s, 247 
we tested the correlation of R-loops, BG4 peaks and methylation. Using the K562 R-loop 248 
dataset46, we found that 5685 BG4 peaks overlap with a R-loop, while 3267 BG4 peaks do not 249 
and that BG4 peaks are depleted of methylation independent of R-loop presence (SI Fig. 5). 250 
This suggests that G4 structure is strongly linked to hypomethylation, irrespective of the 251 
presence of R-loop. 252 
 253 
Discussion 254 
Here we have provided evidence for a link between a DNA secondary structure formation 255 
and epigenetic status. We have uncovered a unique chromatin context whereby certain CGIs 256 
 9 
in active chromatin are depleted in methylation but carry a G4 structure and also the 257 
surrounding flanking regions display higher than average methylation. This suggests that G4s 258 
may impart a previously unknown and important function in the establishment of epigenome.  259 
We propose a model (Fig. 4b) in which G4 formation, together with transcription factor 260 
binding19,20, contributes to loss of methylation at key genomic loci by sequestering DNMT1, 261 
via G4 recognition, and locally inhibiting DNMT1 function at CpG islands. It is noteworthy 262 
that this mechanism resembles a recently proposed model for recruitment and inhibition of 263 
PRC2 complex by a RNA G-quadruplex present in the HOTAIR lncRNA48,49. This suggests there 264 
may be other mechanisms for epigenetic regulation that operate by the sequestration and 265 
inhibition of epigenetic modifiers mediated by high affinity interactions with nucleic acid 266 
secondary structures. 267 
 268 
Accession Codes 269 
K562 datasets for DHS (ENCSR000EPC), DNMT1 ChIP-seq (ENCSR987PBI) and whole genome 270 
bisulfate sequencing (ENCSR765JPC) were downloaded from ENCODE. G4-ChIP-seq data sets 271 
for K562 and WGBS datasets for entinostat-treated and untreated HaCaT cells are available 272 
at the NCBI GEO repository under accession number GSE107690. G4-ChIP-seq data in 273 
entinostat-treated and untreated HaCaT cells were taken from GSE76688. 274 
 275 
Acknowledgements  276 
This work is supported by a core CRUK award (C14303/A17197). S.B. is a Senior Investigator 277 
of the Wellcome Trust (grant no. 099232/z/12/z). JS is a Marie Curie Fellow of the European 278 
Union (747297-QAPs-H2020-MSCA-IF-2016). 279 
 280 
Author Contributions  281 
 282 
The project was conceived by SM and SB. SM designed and carried out all the experiments 283 
with discussions from DB, JS, RHH, DT & SB. SM designed the analysis strategies with 284 
discussions from AG, DT & SB. JS performed G4-ChIP-seq experiments. AG & SMC carried out 285 
all computational analysis with discussions from SM, DT, DB, RHH & GM. MD carried out the 286 
CD spectroscopy and UV melting experiments. All authors interpreted the results. SM, DT & 287 
SB wrote the paper with input from all authors. 288 
 289 
 10 
Competing interests  290 
SB is an advisor and shareholder of Cambridge Epigenetix limited.  291 
 11 
References 292 
1. Smith, Z. D. & Meissner, A. DNA methylation: roles in mammalian development. Nat. 293 
Rev. Genet. 14, 204–20 (2013). 294 
2. Feinberg, A. P. & Tycko, B. The history of cancer epigenetics. Nat. Rev. Cancer 4, (2004). 295 
3. Li, E., Bestor, T. H. & Jaenisch, R. Targeted mutation of the DNA methyltransferase 296 
gene results in embryonic lethality. Cell 69, 915–926 (1992). 297 
4. Okano, M., Bell, D. W., Haber, D. A. & Li, E. DNA methyltransferases Dnmt3a and 298 
Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 299 
247–257 (1999). 300 
5. Liao, J. et al. Targeted disruption of DNMT1, DNMT3A and DNMT3B in human 301 
embryonic stem cells. Nat. Genet. 47, 469–478 (2015). 302 
6. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21 303 
(2002). 304 
7. Lister, R. et al. Human DNA methylomes at base resolution show widespread 305 
epigenomic differences. Nature 462, 315–322 (2009). 306 
8. Illingworth, R. S. & Bird, A. P. CpG islands - ‘A rough guide’. FEBS Lett. 583, 1713–1720 307 
(2009). 308 
9. Deaton, A. & Bird, A. CpG islands and the regulation of transcription. Genes Dev. 25, 309 
1010–1022 (2011). 310 
10. Reik, W., Dean, W. & Walter, J. Epigenetic reprogramming in mammalian development. 311 
Science 293, 1089–93 (2001). 312 
11. Li, E. Chromatin modification and epigenetic reprogramming in mammalian 313 
development. Nat. Rev. Genet. 3, 662–673 (2002). 314 
12. Reik, W. Stability and flexibility of epigenetic gene regulation in mammalian 315 
development. Nature 447, 425–432 (2007). 316 
13. Long, H. K., King, H. W., Patient, R. K., Odom, D. T. & Klose, R. J. Protection of CpG 317 
islands from DNA methylation is DNA-encoded and evolutionarily conserved. Nucleic 318 
Acids Res. 44, 6693–6706 (2016). 319 
14. Lienert, F. et al. Identification of genetic elements that autonomously determine DNA 320 
methylation states. Nat. Genet. 43, 1091–1097 (2011). 321 
15. Wachter, E. et al. Synthetic CpG islands reveal DNA sequence determinants of 322 
chromatin structure. Elife 3, 1–16 (2014). 323 
16. Krebs, A. R., Dessus-Babus, S., Burger, L. & Schübeler, D. High-throughput engineering 324 
of a mammalian genome reveals building principles of methylation states at CG rich 325 
regions. Elife 3, e04094 (2014). 326 
17. Takahashi, Y. et al. Integration of CpG-free DNA induces de novo methylation of CpG 327 
islands in pluripotent stem cells. Science 356, 503–508 (2017). 328 
18. Quante, T. & Bird, A. Do short, frequent DNA sequence motifs mould the epigenome? 329 
Nat. Rev. Mol. Cell Biol. 17, 257–62 (2016). 330 
19. Domcke, S. et al. Competition between DNA methylation and transcription factors 331 
determines binding of NRF1. Nature 528, 575–579 (2015). 332 
20. Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal 333 
regulatory regions. Nature 480, 490–5 (2011). 334 
21. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. 335 
Nature 489, 75–82 (2012). 336 
22. Ooi, S. K. T. et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo 337 
methylation of DNA. Nature 448, 714–717 (2007). 338 
 12 
23. Du, J., Johnson, L. M., Jacobsen, S. E. & Patel, D. J. DNA methylation pathways and 339 
their crosstalk with histone methylation. Nat. Rev. Mol. Cell Biol. 16, 519–532 (2015). 340 
24. Clouaire, T. et al. Cfp1 integrates both CpG content and gene activity for accurate 341 
H3K4me3 deposition in embryonic stem cells. Genes Dev. 26, 1714–1728 (2012). 342 
25. Thomson, J. P. et al. CpG islands influence chromatin structure via the CpG-binding 343 
protein Cfp1. Nature 464, 1082–1086 (2010). 344 
26. Hänsel-Hertsch, R., Di Antonio, M. & Balasubramanian, S. DNA G-quadruplexes in the 345 
human genome: Detection, functions and therapeutic potential. Nat. Rev. Mol. Cell 346 
Biol. 18, 279–284 (2017). 347 
27. Chambers, V. S. et al. High-throughput sequencing of DNA G-quadruplex structures in 348 
the human genome. Nat. Biotechnol. 33, 1–7 (2015). 349 
28. Biffi, G., Tannahill, D., McCafferty, J. & Balasubramanian, S. Quantitative visualization 350 
of DNA G-quadruplex structures in human cells. Nat. Chem. 5, 182–6 (2013). 351 
29. Rhodes, D. & Lipps, H. J. G-quadruplexes and their regulatory roles in biology. Nucleic 352 
Acids Res. 43, 8627–8637 (2015). 353 
30. Hänsel-Hertsch, R., Spiegel, J., Marsico, G., Tannahill, D. & Balasubramanian, S. 354 
Genome-wide mapping of endogenous G-quadruplex DNA structures by chromatin 355 
immunoprecipitation and high-throughput sequencing. Nat. Protoc. 13, 551–564 356 
(2018). 357 
31. Hänsel-Hertsch, R. et al. G-quadruplex structures mark human regulatory chromatin. 358 
Nat. Genet. 48, 1267–1272 (2016). 359 
32. De, S. & Michor, F. DNA secondary structures and epigenetic determinants of cancer 360 
genome evolution. Nat. Struct. Mol. Biol. 18, 950–5 (2011). 361 
33. Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. 362 
Nature 489, 57–74 (2012). 363 
34. Gardiner-Garden, M. & Frommer, M. CpG Islands in vertebrate genomes. J. Mol. Biol. 364 
196, 261–282 (1987). 365 
35. Jones, P. A. Functions of DNA methylation: islands, start sites, gene bodies and beyond. 366 
Nat. Rev. Genet. 13, 484–92 (2012). 367 
36. Hackenberg, M. et al. CpGcluster: A distance-based algorithm for CpG-island detection. 368 
BMC Bioinformatics 7, 1–13 (2006). 369 
37. Chen, L. et al. R-ChIP Using Inactive RNase H Reveals Dynamic Coupling of R-loops with 370 
Transcriptional Pausing at Gene Promoters. Mol. Cell 68, 745–757.e5 (2017). 371 
38. Fan, G. et al. DNA hypomethylation perturbs the function and survival of CNS neurons 372 
in postnatal animals. J. Neurosci. 21, 788–797 (2001). 373 
39. Jackson-Grusby, L. et al. Loss of genomic methylation causes p53-dependent apoptosis 374 
and epigenetic deregulation. Nat. Genet. 27, 31–39 (2001). 375 
40. Dai, J. et al. An intramolecular G-quadruplex structure with mixed parallel/antiparallel 376 
G-strands formed in the human BCL-2 promoter region in solution. J. Am. Chem. Soc. 377 
128, 1096–1098 (2006). 378 
41. Kuryavyi, V., Phan, A. T. & Patel, D. J. Solution structures of all parallel-stranded 379 
monomeric and dimeric G-quadruplex scaffolds of the human c-kit2 promoter. Nucleic 380 
Acids Res. 38, 6757–6773 (2010). 381 
42. Ambrus, A., Chen, D., Dai, J., Jones, R. A. & Yang, D. Solution structure of the 382 
biologically relevant G-quadruplex element in the human c-MYC promoter. 383 
Implications for G-quadruplex stabilization. Biochemistry 44, 2048–2058 (2005). 384 
43. Biffi, G., Tannahill, D. & Balasubramanian, S. An intramolecular G-quadruplex structure 385 
is required for binding of telomeric repeat-containing RNA to the telomeric protein 386 
 13 
TRF2. J. Am. Chem. Soc. 134, 11974–11976 (2012). 387 
44. Giraldo, R. & Rhodes, D. The yeast telomere-binding protein RAP1 binds to and 388 
promotes the formation of DNA quadruplexes in telomeric DNA. EMBO J. 13, 2411–389 
2420 (1994). 390 
45. Bacolla, A., Pradhan, S., Roberts, R. J. & Wells, R. D. Recombinant Human DNA 391 
(Cytosine-5) Methyltransferase. J. Biol. Chem. 274, 33002–33010 (1999). 392 
46. Sanz, L. A. et al. Prevalent, Dynamic, and Conserved R-Loop Structures Associate with 393 
Specific Epigenomic Signatures in Mammals. Mol. Cell 63, 167–178 (2016). 394 
47. Ginno, P. A., Lott, P. L., Christensen, H. C., Korf, I. & Chédin, F. R-Loop Formation Is a 395 
Distinctive Characteristic of Unmethylated Human CpG Island Promoters. Mol. Cell 45, 396 
814–825 (2012). 397 
48. Wang, X. et al. Targeting of Polycomb Repressive Complex 2 to RNA by Short Repeats 398 
of Consecutive Guanines. Mol. Cell 65, 1056–1067.e5 (2017). 399 
49. Wang, X. et al. Molecular analysis of PRC2 recruitment to DNA in chromatin and its 400 
inhibition by RNA. Nat. Struct. Mol. Biol. (2017). doi:10.1038/nsmb.3487 401 
  402 


 14 
Figure 1. G4 formation is associated with hypomethylation at CGIs  403 
a)  A G-tetrad stabilized by Hoogsteen hydrogen bonding and a central monovalent cation 404 
(left). Schematic representations of a three-tetrad G4 structure (Right). b) Venn diagram 405 
illustrating the overlap of G4 structure formation (BG4 peaks) and CGIs. c) Violin plot 406 
showing size distribution of BG4 peaks and CGIs. d) Count of BG4 peaks overlapping a CGI. e) 407 
Box and whisker plot showing the average methylation for BG4 peaks (n = 8,210), DHSs (n = 408 
142,115) and CGIs (n = 27,073). Centre line represents the median value separating upper 409 
and lower quartiles in the box, whiskers indicate 1.5× interquartile range (IQR), points are 410 
actual values of outliers. Note that methylation level at CpG sites with less than 5x coverage 411 
is considered unreliable and discarded. f) Histogram showing the distribution of BG4 peaks 412 
and CGIs relative to percentage of GC. g) Histogram showing the distribution of BG4 peaks 413 
and CGIs relative to percentage of CpGs per 100 bp. h) Box and whisker plot showing the 414 
methylation levels for BG4 peaks and CGIs at different CpG densities. Note that by definition 415 
there are no CGIs at a CpG density < 5 CpGs/100bp and that at > 20 CpGs/100bp there are 416 
few CGIs (1) and BG4 (36) peaks to consider. The number of CGI regions and BG4 peaks in 417 
each category are presented on top of the plot. i) An IGV screen shot illustrating the co-418 
incidence of BG4 peaks (blue) with hypomethylated promoter CGIs (green) and DHSs (orange) 419 
for a representative genome region from Chr 7. Shown are normalised signal. Whole genome 420 
bisulfite sequence tracks are in black (top). RefGene tracks are in grey (bottom). 421 
 422 
Figure 2. DNMT1 is recruited to BG4 peaks associated with low methylation 423 
a) An IGV screen shot showing the co-incidence (blue-masked) of BG4 peaks (blue) with 424 
DNMT1 ChIP-seq peaks (red) and CGIs (green) at hypomethylated region from Chr 6. Orange-425 
masked regions are hypermethylated and enriched with DNMT1 presence, but not BG4 426 
signal. Whole genome bisulfite sequence tracks in black (top). b) Binding profile of DNMT1 427 
shown across CGIs with low (less than 20%, n = 16,523), intermediate (between 20% and 80%, 428 
n = 6,042) and high (more than 80%, n = 4,266) methylation. Y-axis shows the number of 429 
reads in the ChIP normalised to 1 of sequencing depth (also known as Reads Per Genomic 430 
Content (RPGC), more details in computational methods). Replicate 1 and 2 are indicated in 431 
red and blue respectively. Above each plot is a heat map showing the enrichment of BG4 432 
peaks and DHSs across the respective regions. The heat maps show RPGC of active chromatin 433 
marks (DHSs) and BG4 peaks on these three classes of CGIs.  434 
 15 
 435 
Figure 3. DNMT1 selectively binds and is inhibited by G4 structures  436 
a-f) ELISA assays testing the binding of recombinant DNMT1 to G4 structure and control 437 
oligonucleotides. Binding curves for: a) BCL2 G4 and non-G4-forming control (BCL2-mut); b) 438 
KIT2 G4 and non-G4-forming control (KIT2-mut); c) MYC G4 and non-G4-forming control 439 
(MYC-mut); d) BCL2 duplex DNA; e) BCL2 hemi-methylated duplex DNA; f) poly(dI-dC), 100 nt. 440 
Absorbance was measured at 450 nm. a.u., arbitrary unit. Sequences of oligonucleotides are 441 
given below the graphs. g) Binding curve of BCL2 G4 in presence of different concentration of 442 
BCL2 duplex or poly(dIdC)n. h-j) Relative methylation activity of recombinant DNMT1 in 443 
presence of G4 structure and control oligonucleotides: h) BCL2 G4 and BCL2-mut; i) KIT2 G4 444 
and KIT2-mut; j) MYC G4 and MYC-mut. Shown are mean ± s.d., n = 3 independent 445 
experiments in all plots but g (n = 2). 446 
 447 
Figure 4. Recruitment of DNMT1 by G4 structures shapes the methylome in G-rich regions 448 
a) Plot showing the average methylation profile centred around G4 forming regions (red and 449 
blue are replicates 1 and 2 respectively, n = 7,491) or G4 sequences without structure 450 
(orange and green are replicates 1 and 2 respectively, n = 36,015). The plot extends ± 5 Kb 451 
from the centre. The dotted line denotes the lowest methylation level of G4 sequence 452 
without structure. b) Proposed model for potential involvement of G4 structures and 453 
methylation control at CGIs: i) G4 structures sequester DNMT1 due to high affinity binding; ii) 454 
G4 structures inhibit the methylation activity of DNMT1. Together with the binding of 455 
transcription factors, G4 structures contribute to protection of CGIs from methylation.  456 
  457 
 16 
Online Methods 458 
 459 
Cell culture 460 
Mycoplasma-free human chronic myelogenous leukaemia K-562 cells (CCL-243) were 461 
purchased from ATCC and grown in RPMI1640 (Glutamine plus, Life Technologies) 462 
supplemented with 10% of fetal bovine serum and 100 U/ml penicillin-streptomycin (Life 463 
Technologies). All cell stocks were regularly tested for mycoplasma contamination. 464 
 465 
G-quadruplex ChIP-seq 466 
ChIP-seq for G-quadruplex structures (G4-ChIP-seq) was performed using the G4-specific 467 
antibody BG4 essentially as described previously31. 468 
 469 
Oligonucleotide annealing  470 
All oligonucleotides were PAGE purification quality (Sigma). For G4 formation, 10 µM DNA 471 
oligonucleotide was annealed in 10 mM Tris HCl, pH 7.4, 100 mM KCl by heating at 95 °C for 472 
5 min followed by gradually cooling to 21 °C. For double-stranded DNA, 10 µM forward and 473 
reverse strand oligonucleotides were mixed and annealed in 10 mM Tris HCl, pH 7.4, 100 mM 474 
NaCl in the same manner. 20 µM poly(dI-dC)50 was annealed as for double-stranded DNA. 475 
 476 
Enzyme-linked immunosorbent assay (ELISA) 477 
ELISAs for binding affinity and specificity were performed as described previously28 with 478 
minor modifications. Briefly, biotinylated oligonucleotides were bound to Pierce™ 479 
Streptavidin Coated High Capacity Plates (ThermoFisher) followed by blocking with 1.5% BSA 480 
and incubation with recombinant full-length human FLAG-tagged DNMT1 protein (Active 481 
Motif, Cat. No: 31404) in ELISA buffer (100 mM KCl, 50 mM KH2PO4, pH7.4). After three 482 
washes with ELISA buffer, detection was achieved with an anti-FLAG horseradish peroxidase 483 
(HRP)-conjugated antibody (ab1238, Abcam) and TMB (3,3′,5,5′-tetramethylbenzidine) ELISA 484 
Substrate (Fast Kinetic Rate, ab171524, Abcam). Signal intensity was measured at 450 nm on 485 
a PHERAstar microplate reader (BMG Labtech). Dissociation constants (Kd) were calculated 486 
from saturation binding curves assuming one-site binding using Prism (GraphPad Software 487 
Inc.). Standard error of mean (s.e.m.) values were calculated from three replicates.  488 
 489 
In vitro DNA methylation assay 490 
DNA methylation assays were performed using a DNMT Activity Assay Kit (Fluorometric, 491 
ab113468, Abcam) as per manufacturer's instructions. Briefly, 100ng recombinant DNMT1 492 
was incubated with substrate assay wells in presence of different concentrations of G4 or 493 
non-G4 oligonucleotides at 37 °C for 90 min. Methylation levels were quantified from the 494 
binding of an anti-5-methylcytosine antibody detected by fluorescent secondary antibody. 495 
Fluorescence signal was measured using a PHERAstar microplate reader (530 nm excitation, 496 
590 nm emission). DNMT enzyme activity is proportional to the fluorescence intensity (RFU, 497 
 17 
relative fluorescence unit) measured. Relative methylation activity is then calculated against 498 
mock control. 499 
 500 
Circular dichroism spectroscopy   501 
CD spectra were recorded on an Applied Photo-physics Chirascan circular dichroism 502 
spectropolarimeter using a 1 mm path length quartz cuvette. CD measurements were 503 
performed at 298 K over a range of 220-300 nm using a response time of 0.5 s, 1 nm pitch 504 
and 0.5 nm bandwidth. The recorded spectra represent a smoothed average of three scans, 505 
zero-corrected at 300 nm (Molar ellipticity θ is quoted in 105 deg cm2 dmol−1). The 506 
absorbance of the buffer was subtracted from the recorded spectra. Oligonucleotides were 507 
dissolved in lithium cacodylate buffer (100 mM, pH 7.2) containing 100 mM of KCl and 1 mM 508 
EDTA to the concentration of 10 μM. 200 L of the oligonucleotides were annealed prior 509 
measurement by warming up to 90 °C and slowly cooling down at room temperature. 510 
  511 
UV Melting 512 
For UV melting experiments, measurements were collected using a Varian Cary 100-Bio 513 
UV−visible spectrophotometer by following absorbance at 295 nm. Samples (200 μl) with 514 
final concentration of 2 M were measured in black, small window, 1 cm path-length quartz 515 
cuvettes, covered with a layer of mineral oil (50 μl). Samples were equilibrated at 5 °C for 10 516 
min, heated to 95 °C and cooled back to 5 °C at a rate of 0.5 °C/min. The samples were held 517 
for a further 10 min and then the 5 °C to 95 °C ramp was repeated. Data were recorded every 518 
1 °C during both the melting and cooling steps. 200 L of oligonucleotides were annealed 519 
prior measurement by warming up to 90 °C and slowly cooling down at room temperature. 520 
 521 
Bioinformatics Software and Scripts 522 
Bioinformatic data analyses and processing were performed using Perl, Bash, Python and R 523 
programming languages. The following tools were also used: cutadapt (1.15)50, BWA 524 
(0.7.15)51, Picard (2.8.3), (http://broadinstitute.github.io/picard), MACS (2.1.1)52, Bedtools 525 
(2.26.0), (http://bedtools.readthedocs.io/en/latest/content/overview.html), Deeptools 526 
(2.5.1)53 and Bismark (v0.19.0)54. 527 
All scripts and software developed are released in the following GitHub page: 528 
https://github.com/sblab-bioinformatics/dna-g4-methylation-dnmt1 529 
 530 
G4-ChIP-seq analysis 531 
Raw fastq reads from G4-ChIP-seq in K562 cells were trimmed with cutadapt50 to remove 532 
adapter sequences and low-quality reads (mapping quality < 10). Reads were aligned to the 533 
human genome (version hg19) with BWA51 and duplicates were removed using Picard. Peaks 534 
were called by MACS252 (p < 10-5) following our previous work31: https://github.com/sblab-535 
bioinformatics/dna-secondary-struct-chrom-lands/blob/master/Methods.md  536 
Peaks were merged from different replicates with bedtools multiIntersect. Only peaks 537 
overlapping in 3 out 5 replicates were considered high-confidence. K562 datasets for DHSs 538 
 18 
(ENCSR000EPC), DNMT1 ChIP-seq (ENCSR987PBI), whole genome bisulfite sequencing 539 
(ENCSR765JPC) were downloaded from ENCODE. Promoters were defined as 1 kb (+/−) from 540 
the transcription start sites of 31,239 hg19 transcripts. Methylation levels at CpG sites with 541 
less than 5x coverage were discarded. If not otherwise specified, CGI34 were downloaded 542 
using the UCSC’s table browser and then ported to human genome release hg38 using the 543 
batch coordinate conversion (liftover) tool of the UCSC. The alternative CGI sets were 544 
generated using CpGCluster36.  545 
 546 
Enrichment analysis 547 
ENCODE DHS and ChIP-seq data sets were normalised to sequencing depth of 1 (i.e. RPGC, 548 
Reads Per Genomic Content). Sequencing depth is defined as: (total number of mapped 549 
reads * fragment length) / effective genome size. The effective genome size was set to be 550 
3,209,286,105 and enrichment values for DHSs and BG4 peaks over CGIs and their flanking 551 
sequences were visualised in R using ggplot2 library. Enrichment values for DNMT1 over CGIs 552 
and their flanks were visualised with DeepTools53. 553 
 554 
Monte Carlo Simulation  555 
Monte Carlo simulation was used to calculate the significance of overlap between BG4 peaks 556 
and high confidence DNMT1 peaks, defined by Irreproducible Discovery Rate (IDR) in 557 
ENCODE’s ChIP pipeline. We first counted how many BG4 peaks overlapped with OQSs in 558 
open chromatin (defined as all OQSs seen potassium and/or PDS conditions (749,339 559 
sequences)27, which overlap at least one DHS region (43,506 sequences)). We then randomly 560 
selected the same number of OQSs from all OQSs in open chromatin and counted how many 561 
overlapped with at least one high confidence DNMT1 peak. The Monte Carlo P-value was 562 
then calculated as (N+1)/(M+1), where M is the number of iterations and N is the number of 563 
times the same or more overlaps were observed between randomised OQSs and high 564 
confidence DNMT1 peaks (compared to the number of overlaps observed between BG4 565 
peaks and high confidence DNMT1 peaks). Randomisation was repeated for 8000 times and 566 
on average the number of overlaps between the shuffled OQSs and DNMT1 were two-fold 567 
less than those observed between BG4 and DNMT1 peaks. 568 
 569 
Differential methylation and BG4 binding analysis of entinostat treated HaCaT cells 570 
HaCaT cells were treated with 10 M entinostat for 48 hours as we previously described31. 571 
Genomic DNA from untreated and treated cells were extracted with phenol/chloroform. 50 572 
ng DNA were used to generate whole genome bisulfite sequencing libraries using Pico 573 
Methyl-Seq Library Prep Kit from Zymo research. Libraries were sequenced using the pair-574 
end 150 bp high-output kit on Illumina Next-seq platform. Data from 4 runs were pooled 575 
together. After quality assessment using FastQC 576 
(https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), reads were processed to 577 
remove adaptors and low-quality bases using cutadapt3. Options –u 6 –u -1 –U 6 –U -1 were 578 
used to trim the initial six and last nucleotide bases. High-quality reads were aligned using 579 
 19 
Bismark in paired-end mode with options --non_directional, --unmapped and -N 0 to hg19 580 
reference genome. Reads were then de-duplicated and methylation was extracted. To 581 
increase mapping efficiency and following previous work55, unmapped reads resulting from 582 
the paired-end alignments were then re-aligned in single-end mode with options --583 
non_directional and -N 0, and then deduplicated. Methylation was extracted for paired-end 584 
and single-end alignments separately and then aggregated. Technical replicates for each 585 
condition (before and after entinostat treatment) were merged and methylation counts were 586 
aggregated by CpG site. A threshold was then applied to keep CpG sites with more than 5X 587 
bisulfite sequencing depth both before and after treatment. This resulted in 21,106,307 CpG 588 
sites, 75% of all CpG sites in hg19. 589 
Differential BG4 binding analysis was done as previously reported31. Analysis focused on 590 
open chromatin promoter regions (5351) which have at least one G-rich sequence27 and 591 
ATAC-seq peak unaltered in size (log2 fold change = −0.6 to 0.6, FDR < 0.05) between 592 
untreated and entinostat-treated HaCaT cells. Depending on differential BG4 signal, these 593 
regions were categorized into BG4 gain (> 1.5-fold change in signal and FDR < 0.05) and BG4 594 
constant and BG4 negative. 95% (5072/5351) of these regions are overlapping with CGIs. 595 
Difference of the percentage methylation of the overlapping CGIs in each of the categories 596 
were calculated. Statistic test was done with Mann–Whitney U test. Plotting of methylation 597 
data were performed in the R programming language. 598 
 599 
Data availability. K562 datasets for DHS (ENCSR000EPC), DNMT1 ChIP-seq (ENCSR987PBI) 600 
and whole genome bisulfate sequencing (ENCSR765JPC) were downloaded from ENCODE. 601 
G4-ChIP-seq data sets for K562 and WGBS datasets for entinostat-treated and untreated 602 
HaCaT cells are available at the NCBI GEO repository under accession number GSE107690. 603 
G4-ChIP-seq data in entinostat-treated and untreated HaCaT cells were taken from 604 
GSE76688. Source data for figure 1d, e, h and Figure 3 are available with the paper online. 605 
 606 
References 607 
50. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing 608 
reads. EMBnet.journal 17, 10 (2011). 609 
51. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 610 
Prepr. https//arxiv.org/abs/1303.3997 00, 1–3 (2013). 611 
52. Zhang, Y. et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008). 612 
53. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data 613 
analysis. Nucleic Acids Res. 44, W160–W165 (2016). 614 
54. Krueger, F. & Andrews, S. R. Bismark: A flexible aligner and methylation caller for 615 
Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011). 616 
55. Peat, J. R. et al. Genome-wide Bisulfite Sequencing in Zygotes Identifies Demethylation 617 
Targets and Maps the Contribution of TET3 Oxidation. Cell Rep. 9, 1990–2000 (2014). 618 
 
 
Supplementary Figure 1 
BG4 binds unmethylated and methylated G4 structures with the same affinity. 
ELISA assays testing binding of BG4 antibody to unmethylated and methylated G4 structures and control non-G4 forming 
oligonucleotides. CpG sites are highlighted in red. MG denotes methylated CpG. Shown are mean and SD from three measurements. 
 
 
Supplementary Figure 2 
Methylation is depleted in BG4 regions. 
a) Box and whisker plot showing the average methylation for BG4 peaks (n = 8,210), DHSs (n = 142,115) and CGIs (n = 22,673). 
Similar as in Fig 1e, apart from using CGI set generated by CpGCluster. b) Box and whisker plot showing the methylation levels for 
BG4 peaks and CGIs at different CpG densities. Similar as in Fig 1h, apart from using CGI set generated by CpGCluster. c) Box and 
whisker plot showing the average methylation levels for CGIs with or without a BG4 peak at different CpG densities. d) Box and whisker 
plot showing average methylation on CGIs with respect to BG4 peaks in presence (+) or absence (–) of a DHS or promoter. The 
number of CGI regions in each category are presented on top of the plot. e) Box and whisker plot showing the average methylation for 
BG4 peaks (n = 17,101), ATAC (ATAC-seq peaks denoting open chromatin, n = 23,217) and CGI regions (n = 26,580) in untreated 
HaCaT cells. f) Box and scatter plot showing differential percentage methylation in entinostat treated vs untreated cells for promoter 
CGIs in open chromatin regions containing sequences with potential to form G4s. i) BG4 negative CGIs without a G4 ChIP–seq peak 
but having potential to form a G4 structure (n = 1504); ii) BG4 constant CGIs with a least one high-confidence G4 ChIP–seq peak that 
does not significantly change before and after treatment (n = 3261), and iii) BG4 increases where a BG4 peak significantly increases in 
size after treatment (n = 307). Each grey dot represents one CGI region. p-values were calculated using a Mann–Whitney U test. 
 
 
Supplementary Figure 3 
DNMT1 is enriched at BG4 peaks associated with low methylation. 
Binding profile of DNMT1 in CGIs with low (less than 20%, n = 14,983), intermediate (between 20% and 80%, n = 4,864) and high 
(more than 80%, n = 2,826) methylation. Above each plot is a heat map showing the enrichment of BG4 peaks and DHSs across the 
respective regions. Similar as in Fig. 2b, apart from using CGI set generated by CpGCluster. 
 
 
Supplementary Figure 4 
Structure verification of oligonucleotides used in this study and inhibition of DNMT1 by G4 DNA. 
Circular dichroism spectra of a) BCL2 and BCL2-mut, b) KIT2 and BKIT2-mut, c) MYC and MYC-mut. Sequences are listed below the 
graph. UV melting profiles of the d) BCL2, e) KIT2, f) MYC. Mutated oligonucleotides lose the capacity to form G4s and therefore have 
no absorbance at 295 nm. Circular dichroism spectra of the g) BCL2-0CG/2CG/3CG and BCL2-CCC, h) KIT2-0CG/2CG/CGCG and 
KIT2-CCC, i) MYC-2CG/CTCA/4CG and MYC-CCC. Sequences are listed below the graph. Note that CD spectra of all G4 forming 
oligonucleotides show a positive peak at ~263nm and a negative peak at ~240nm, which is characteristic of a G4 structures. DNMT1 
activity in presence of: j) BCL2 (G4 structure), BCL2-CCC (C-rich, non-G4 forming with 5 CpGs), BCL2-mut (wild type BCL2 with 
mutations in G4 tetrad Gs, non-G4 forming), BCL2-2CG (G4-forming with 2 CpGs), BCL2-3CG (G4 forming with 3CpGs) and BCL2-
0CG (G4 forming without CpGs). k) KIT2 (G4 structure), KIT2-CCC (C-rich, non-G4 forming with 4 CpGs), KIT2-mut (wild type KIT2 
with mutations in G4 tetrad Gs, non-G4 forming), KIT2-0CG (G4 forming without CpGs), KIT2- 2CG (G4-forming with 2 CpGs), KIT-
CGCG (G4 forming with 2 adjacent CpGs). l) MYC (G4 structure), MYC-CCC (C-rich, non-G4 forming with no CpGs), MYC-mut (wild 
type MYC with mutations in G4 tetrad Gs, non-G4 forming), MYC-2CG (G4 forming with 2 CpGs), MYC-CTCA (G4-forming without 
CpGs), MYC-4CG (G4 forming with 4 CpGs). Sequences of oligonucleotides used are given below the graphs. Shown are mean ± s.d., 
n = 3 independent experiments. Statistical tests were done using two-way ANOVA. 
 
 
Supplementary Figure 5 
Depletion of DNA methylation is independent of R-loop formation. 
Plot showing the average methylation profile centered around BG4 peak regions overlapping with R-loop regions (BG4 + R-loop, red 
and blue are replicate 1 and 2 respectively, n = 5,464) or BG4 peak regions without overlapping R-loop (BG4 – R-loop, pink and green 
are replicate 1 and 2 respectively, n = 3,111). The plot extends ± 5 kb from the centre of BG4 peaks.