1 DNA G-quadruplex structures mould the DNA methylome 1 2 Shi-Qing Mao1, Avazeh T. Ghanbarian1, Jochen Spiegel1, Sergio Martínez Cuesta1, Dario 3 Beraldi1,4, Marco Di Antonio2, Giovanni Marsico1, Robert Hänsel-Hertsch1, David Tannahill1 4 and Shankar Balasubramanian*1,2,3 5 6 1 Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Cambridge, UK 7 2 Department of Chemistry, University of Cambridge, Cambridge, UK 8 3 School of Clinical Medicine, University of Cambridge, Cambridge, UK 9 4 Current address: Institute of Cancer Sciences, University of Glasgow, Glasgow, UK 10 * Correspondence should be addressed to S.B. (sb10031@cam.ac.uk) 11 12 13 14 Abstract 15 Control of DNA methylation level is critical for gene regulation, and the factors that govern 16 hypomethylation at CpG islands (CGIs) are still being uncovered. Here, we provide evidence 17 that G-quadruplex (G4) DNA secondary structures are genomic features that influence 18 methylation at CGIs. We show that the presence of G4 structure is tightly associated with CGI 19 hypomethylation in the human genome. Surprisingly, we find that these G4 sites are 20 enriched for DNA methyltransferase 1 (DNMT1) occupancy, which is consistent with our 21 biophysical observations that DNMT1 exhibits higher binding affinity for G4s as compared to 22 duplex, hemi-methylated or single-stranded DNA. The biochemical assays also show that the 23 G4 structure itself, rather than sequence, inhibits DNMT1 enzymatic activity. Based on these 24 data, we propose that G4 formation sequesters DNMT1 thereby protecting certain CGIs from 25 methylation and inhibiting local methylation. 26 27 Introduction 28 Methylation of cytosine at C-5 is a key DNA modification in development and disease1,2. In 29 mammals, cytosine methylation occurs predominantly at CpG dinucleotides and is installed 30 and maintained by three DNA methyltransferase enzymes (DNMT1, 3A and 3B) that are 31 essential for development3–5. CpGs occur less frequently than expected in the mammalian 32 genome and show a bimodal distribution with respect to methylation6,7. Sparsely distributed 33 CpGs (~90%), found in genic and intergenic regions, tend to be highly methylated, while CpGs 34 2 found in dense GC-rich regions, so-called CpG Islands (CGIs), are largely depleted of 35 methylation and are prevalent at the promoters of house-keeping and developmental 36 genes8,9. Outside of embryonic development, gross methylation patterns are generally stable 37 across different tissues10,11. Nonetheless during key cellular events, methylation can be 38 dynamic at specific loci to modulate gene expression, such as de novo methylation of some 39 promoter CGIs with intermediate CpG density during lineage commitment12. 40 41 General rules on maintenance of the default methylation state are being uncovered and 42 several studies suggest that CGI hypomethylation is sequence-dependent13–18. Furthermore, 43 DNMTs are reported to be actively and continuously excluded from CpG-poor distal 44 regulatory regions through competitive inhibition with DNA binding proteins, such as NRF1 45 and CTCF/REST, thus maintaining the hypomethylated state of regulatory regions19,20. Lowly 46 methylated regions also co-localise with DNase I hypersensitivity sites marking accessible 47 chromatin regions21. The presence of enhancer chromatin marks, such as histone 48 modifications, also play an important role in forming the unique chromatin structure of 49 CGIs22,23. In mouse embryonic stem cells, the CXXC finger protein 1, Cfp1, is believed to 50 promote CGI hypomethylation through binding unmethylated CpG and recruitment of H3K4 51 methyltransferases to promote H3K4me324,25. However, Cfp1 binding and/or H3K4me3 are 52 not required for the ‘protection’ of CGI from DNA methylation since Cfp1 knockout results in 53 a dramatic loss of H3K4me3 at CGIs without increasing DNA methylation24. This suggests that 54 Cfp1 binding and/or H3K4me3 are not required to prevent CGIs from DNA methylation, thus 55 there may be other factors that are fundamental to impart the hypomethylated state. 56 57 Alternative DNA secondary structures, known as G-quadruplexes (G4s) are found within 58 certain G-rich sequences and arise through the self-association of guanine bases to form 59 stacked tetrads (Fig. 1a)26. G4s are increasingly being recognised as important features in the 60 genome and over 700,000 G4s have been biophysically mapped in purified human genomic 61 DNA by high-throughput sequencing27. G4 structures have been observed in human using 62 immunofluorescence with a G4-specific antibody (BG4)28, and linked with transcriptional 63 regulation and are enriched in gene promoters including many oncogenes26,29. Recently, G4-64 chromatin immunoprecipitation sequencing (G4-ChIP-seq) has been developed to map G4 65 3 structures in human chromatin30,31. Corroborating a link with transcription, the majority of 66 G4-ChIP-seq sites were found predominantly in regulatory, nucleosome-depleted chromatin, 67 particularly in gene promoters31,32. As both G4s and hypomethylated CGIs are associated 68 with actively transcribed genes9,31, this raises the question of whether there is an interplay 69 between G4 formation and DNA methylation. 70 71 Herein we present evidence that most G4 structures, as detected by G4-ChIP-seq, are formed 72 in regions comprising unmethylated CGIs in the human genome. We also uncover a striking 73 co-localisation of G4 structures and DNMT1 docked at CGIs, and we demonstrate that 74 DNMT1 methylation activity is inhibited by DNA G4 structures. Our data suggest a 75 mechanism for the ‘protection’ of CGIs from methylation by G-quadruplex structures that 76 locally sequester and inhibit DNMT1. 77 78 Results 79 G4 structures in active chromatin are found within hypomethylated CGIs 80 To explore any potential relationship between G4 structures in chromatin and methylation 81 levels, we employed human K562 chronic myelogenous leukaemia cells in which methylation 82 has been comprehensively characterised at single base resolution using whole genome 83 bisulfite sequencing (WGBS) by the ENCODE project33. We generated a genome-wide dataset 84 for G4 structures by G4 ChIP-Seq31 using the G4-specific antibody BG428 and compared the 85 BG4 peak overlap with CGIs34. Strikingly, we found that the majority of BG4 peaks (79%, 86 7111/8952) overlapped with a CGI (covering 23% of all CGIs) (Fig. 1b). The majority of CpG 87 island regions span 200 to 1000 bp (median/mean, 569/775 bp), while BG4 peak regions 88 span 100 to 400 bp (median/mean, 205/226 bp) (Fig. 1c). 83% (5935/7111) of these CGIs 89 overlap with one BG4 peak (Fig. 1d). Furthermore, when the level of methylation at BG4 90 peaks was considered, we noted that there was a dramatic absence of methylation at BG4 91 peaks (mean 1%, median 0.5%), compared with average genome methylation (28.4%) (Fig. 92 1e). To rule out any effect of the cytosine methylation state on the ability of the BG4 93 antibody to recognise a G4 structure, an ELISA binding assay was used to show that BG4 can 94 bind to G4 structured DNA with equal affinity irrespective of the presence of cytosine 95 methylation (SI Fig. 1). DNase I hypersensitive sites (DHS), which mark open chromatin, are 96 4 also mainly hypomethylated (mean 11%, median 2.5%, Fig. 1e). Confirming our previous 97 observations31, the majority of BG4 peaks are found in open chromatin (DHSs, 97%, 98 8655/8952), and it is notable that these sites have the lowest methylation levels (Fig. 1e). 99 Overall CGI methylation (mean 27%, median 8%, Fig. 1e) shows a broader distribution than 100 BG4 regions, since some CGIs are associated with active hypomethylated promoters while 101 others with inactive genes or gene bodies and thus are more heavily methylated9,35. This 102 prominent association between BG4 peaks, hypomethylation and particular CGIs is 103 suggestive of a functional link between G4 secondary structures and the establishment 104 and/or maintenance of low methylation status at these CGIs in active chromatin. 105 106 It has recently been concluded from work in mouse embryonic stem cells, that both high 107 CpG-density and high GC-richness are required to establish the hypomethylated state at 108 CGIs15. It is therefore notable that BG4 peaks have a similar level of GC richness to CGIs (Fig. 109 1f) with most (79%) being located in regions of CpG density comparable to that seen in CGIs 110 (Fig. 1g). It has been suggested that CpG density alone is only a minor determinant of the 111 unmethylated state, as dense CpG sequences embedded within an AT-rich context are 112 invariably highly methylated when inserted into the mouse genome15,16. Indeed, when we 113 compare the average methylation of CGIs (Fig. 1h) to that of BG4 peaks at different CpG 114 dinucleotide densities, it is noteworthy that across a wide range of CpG densities, BG4 peak 115 regions are always largely devoid of methylation (Fig. 1h). We also confirmed these 116 observations using an alternative CGI definition set generated by CpGCluster algorithm36 (SI 117 Fig. 2a, 2b). Furthermore, when methylation at CGIs is considered with respect to the 118 presence or absence of a BG4 peak, it is noteworthy that there is an almost a total lack of 119 methylation at CGIs with BG4 than CGIs without (SI Fig. 2c). This strongly suggests that CGIs 120 associated with the physical presence of a G4 structure generally have particularly low 121 methylation. 122 123 To explore low methylation in different CGI contexts, we calculated methylation levels 124 relative to BG4 presence in CGIs containing i) no promoter or DHS site, ii) a promoter alone, 125 iii) a DHS site alone and iv) both a promoter and DHS site. It is apparent that CGIs containing 126 a BG4 peak always have lower methylation in open (DHS +) or closed (DHS −) chromatin, or in 127 the presence or absence of a promoter. (SI Fig. 2d). CGIs with a DHS site and promoter but 128 5 without a BG4 peak (4500 sites) have higher methylation (mean 2%, median 2%) than those 129 CGIs (5567 sites) with a BG4 peak plus promoter and DHSs (mean 1%, median 0.5%) (SI Fig. 130 2d, right two panels). The lowest observed methylation states are found therefore at sites 131 carrying a G4 structure, suggesting that the physical presence of a G4 structure within CGI is 132 an important feature with respect to the hypomethylation state. This is illustrated in Fig. 1i 133 which shows the co-incidence of BG4 peaks with hypomethylated promoter CGIs for a 134 representative genome region. 135 136 In earlier work, we found that treatment of human epidermal keratinocytes (HaCaT) cells 137 with the HDAC inhibitor entinostat led to increased BG4 binding signal primarily located in 138 open chromatin promoter regions37. We therefore generated WGBS datasets to examine 139 DNA methylation changes with respect to BG4 signal. Consistent with our observation in 140 K562 cells, BG4 peaks in HaCaT cells have lower methylation compared with open chromatin 141 and CGI regions (SI Fig. 2e). In open-chromatin promoter CGI regions, 307 had a significant 142 increase in BG4 signal (BG4 increase, > 1.5-fold change in signal and FDR < 0.05, see Online 143 Methods). No change in BG4 signal was seen in 3261 CGI promoter regions before and after 144 treatment (BG4 constant), or for 1504 G-rich CGI promoter regions that do not have a BG4 145 peak (BG4 negative) but have the potential to form a G4 in vitro27. Despite open-chromatin 146 promoter CGI regions already being predisposed to low methylation, we see a statistically 147 significant additional drop in methylation levels at CGIs where BG4 peak size increases after 148 HDAC inhibition (SI Fig. 2f). Overall, these data support that formation of G-quadruplex 149 structures in CGIs is linked to lower methylation. 150 151 DNMT1 is sequestered at G4 structures associated with low methylation 152 Given that regions where G4 structures marked by BG4 peaks are generally observed to be 153 hypomethylated, we considered that the DNA methyltransferases might have some form of 154 physical interaction with G4 structures in the chromatin context. We focused on DNMT1 155 since DNMT1 knockout is lethal causing global DNA methylation loss in all dividing somatic 156 cells and human embryonic stem cells (ESCs)3,5,38,39, whereas DNMT3A/B knockouts mainly 157 affect non-CpG methylation in human ESCs5. When we considered the distribution of DNMT1 158 binding sites in K562 cells, downloaded from ENCODE33 (516,483 peaks in total across both 159 biological replicates), we found that 52% (4611/8952, Monte Carlo simulation’s P-value 160 6 1.25e-04) of the G4 structures mapped by G4-ChIP (BG4 peaks) overlapped with at least one 161 DNMT1 binding site. Of the remaining 4341 BG4 peaks, 4003 were within 1 Kb of a DNMT1 162 binding site. The proximity of BG4 peaks to DNMT1 recruitment sites is illustrated graphically 163 in Fig. 2a for a representative genome region. Intriguingly, when the distribution of DNMT1 164 binding is plotted relative to high, intermediate or low methylated CGIs, we observe a 165 prominent enrichment of DNMT1 binding at lowly methylated CGIs which overlaps with 166 those regions with the highest BG4 peak density as well as DHS sites (Fig. 2b). A similar 167 profile is also seen using alternative CGI definition set generated by CpGCluster36 (SI Fig. 3). 168 The observation that DNMT1 enrichment at G4 regions that lack methylation is, at first 169 glance, somewhat unexpected and counter-intuitive, given that DNMT1 installs methylation. 170 We therefore considered the possibility of a mechanism whereby DNMT1 protein is 171 sequestered at these sites in active chromatin but prevented from methylating CpGs in that 172 locality. 173 174 DNMT1 selectively binds to and is inhibited by G4 structures 175 To address whether DNMT1 binds G4 structures directly, we carried out biophysical 176 measurements using an enzyme-linked immunosorbent assay (ELISA) to measure the binding 177 of recombinant human FLAG-tagged full-length DNMT1 protein to immobilized target DNA 178 structures (see Online Methods). Biotinylated single-stranded oligonucleotides of sequence 179 based on the promoters of BCL2, KIT2 and MYC were chosen as these fold into well-180 characterised G4 structures40–42. Mutated versions (BCL2-mut, KIT2-mut and MYC-mut) that 181 are unable to form a G4 structure were also used as controls. The presence or absence of G4 182 folded structure with G4 oligonucleotides and mutated controls was confirmed by circular 183 dichroism (CD) spectroscopy and ultraviolet (UV) thermal melting spectroscopic analysis (SI 184 Fig. 4a-f). We found that DNMT1 binds to all three G4 structures with low nanomolar affinity 185 (Kd[BCL2] = 9.6 ± 0.3 nM; Kd [KIT2] = 15.2 ± 0.4 nM, Kd [MYC] = 25.3 ± 0.4 nM, n=3; Fig. 3a-c). 186 DNMT1 showed a lower binding affinity for unmethylated duplex DNA (BCL2, 107±5 nM; Fig. 187 3d) and there was no specific binding observed for the single stranded mutated 188 oligonucleotide controls. Notably, DNMT1 generally showed a greater affinity for G4 189 structures than known DNMT1 substrates such as a hemi-methylated duplex DNA (BCL2, 85 ± 190 7 nM, Fig. 3e), or a synthetic poly(dI-dC)50 substrate (75 ± 2 nM, Fig. 3f). DNMT1 binding to 191 G4 structures does not appear to depend on CpG dinucleotides, since the absence of CpG in 192 7 the MYC G4 did not preclude DNMT1 binding (Fig. 3c). To begin to dissect the binding mode 193 of DNMT1 to G4s, we used a competition ELISA assay. 50 nM immobilized BCL2 G4 (Kd for 194 DNMT1 = 9.6 nM, Fig. 3a) was incubated with DNMT1 protein in the presence of increasing 195 concentrations of competitors. Even with 100-fold excess (5 M) of DNA duplex (Kd for 196 DNMT1 = 107 nM, Fig. 3d) or poly-dIdC (Kd for DNMT1 = 75 nM, Fig. 3f), there was no 197 inhibition (Fig. 3g). This suggests that G4s and duplex DNA occupy different binding sites and 198 that the catalytic domain is not involved in G4 binding. A similar, non-overlapping G4 and 199 duplex binding has also been observed in other proteins such as TRF243 and Rap144. 200 201 The relatively high binding affinity and selectivity of DNMT1 for DNA G4 structures is 202 consistent with the observation that DNMT1 shows some localisation to G4 structures in 203 K562 cells (Fig. 2a, b). To validate the association with low methylation in the locality G4 sites 204 in the genome, we evaluated whether G4 DNA could actually inhibit DNA methylation on a 205 standard assay using poly(dI-dC)n as substrate45 of DNMT1 using a fluorometric biochemical 206 assay (Abcam, see Online Methods). Specifically, we evaluated different concentrations of 207 folded G4-structured oligonucleotides or mutated non-G4 controls, where the presence or 208 absence of G4 structure had been confirmed by CD spectroscopy (SI Fig. 4g-i). We indeed 209 found that each of three G4 structures resulted in significant inhibition of DNMT1 210 methyltransferase activity whereas the mutated control oligonucleotides did not (Fig 3h-j). 211 Gratifyingly, the potency of inhibition by each G4 was related to the binding affinity for 212 DNMT1, as determined by ELISA with BCL2 being the most potent inhibitor (50% inhibition at 213 ~25 nM), MYC being least potent (50% inhibition at ~1 M) and KIT2 being intermediate (50% 214 inhibition at 90 nM). No inhibition of activity was seen with mutated controls ranging from 215 400 nM-8 μM concentration. C-rich oligonucleotides complementary to the G4 sequences 216 (BCL2-CCC, KIT2-CCC and MYC-CCC) or corresponding duplex DNA also had no effect on 217 DNMT1 activity (SI Fig. 4j-l). We also tested G4 oligonucleotides that were able to fold into a 218 G4 structure but carried a reduced number CpGs (BCL2, KIT) or had a number of artificially 219 introduced CpGs (MYC). In all cases, changes in the number of CpG sites only had minor 220 effects on DNMT1 inhibition (SI Fig. 4j-l). Taken together, these results indicate a novel and 221 unexpected feature of G4 structures as potential genomic regions that promote the 222 unmethylated state through recruitment and inhibition of DNMT1 activity. 223 224 8 Recruitment of DNMT to G4 structures shapes the methylome 225 The above data suggest that there is a striking lack of methylation (Fig. 1e, h, SI Fig. 2a-d) in 226 chromatin regions where G4 structure formation is observed. To rigorously question whether 227 this observation was related to the detectable formation of a G4 structure in chromatin (i. e. 228 a BG4 peak), or merely the G-rich sequences per se with potential to form a G4 structure, the 229 methylation profile for BG4 peak regions was compared to those of G-rich sequences that 230 can physically form a G4 structure in an in vitro sequencing assay27 (here called Sequences 231 with potential to form G4s, Fig. 4a). As the majority of BG4 peaks (8,210) are found in open 232 chromatin, only sequences with potential to form G4s located in open chromatin (36,015) 233 were considered. The mean and median length is 226/205 bp for BG4 peaks, and 383/285 bp 234 for the latter. G-rich sequences with the potential to form a G4 are largely hypomethylated 235 (12%), with methylation levels rising in the flanking regions (45%), whereas BG4 peaks have 236 substantially lower methylation (down to 1%) and flanking regions being more methylated 237 (60%). The contrast between lowest methylation at BG4 sites with highest methylation at 238 distal flanking regions is also exemplified in the genome browser view in Fig. 1i. While G-239 richness as defined by G4 sequence without structure is a feature that correlates with the 240 lack of methylation, there is a further dramatic loss of methylation due to the physical 241 presence of a G4 structure with these regions also being marked by a greater methylation 242 flanking the G4 structure. Regions with a G4 structure also correspond to CGIs that mark 243 particular active genes, and is in keeping with our previous data showing that G4s are 244 associated with particular chromatin states to promote elevated transcription31. R-loops 245 (three-stranded DNA-RNA hybrids) have also been linked to reduced methylation in 246 transcribed CpG island promoters46,47. As R-loops form in a similar genomic context to G4s, 247 we tested the correlation of R-loops, BG4 peaks and methylation. Using the K562 R-loop 248 dataset46, we found that 5685 BG4 peaks overlap with a R-loop, while 3267 BG4 peaks do not 249 and that BG4 peaks are depleted of methylation independent of R-loop presence (SI Fig. 5). 250 This suggests that G4 structure is strongly linked to hypomethylation, irrespective of the 251 presence of R-loop. 252 253 Discussion 254 Here we have provided evidence for a link between a DNA secondary structure formation 255 and epigenetic status. We have uncovered a unique chromatin context whereby certain CGIs 256 9 in active chromatin are depleted in methylation but carry a G4 structure and also the 257 surrounding flanking regions display higher than average methylation. This suggests that G4s 258 may impart a previously unknown and important function in the establishment of epigenome. 259 We propose a model (Fig. 4b) in which G4 formation, together with transcription factor 260 binding19,20, contributes to loss of methylation at key genomic loci by sequestering DNMT1, 261 via G4 recognition, and locally inhibiting DNMT1 function at CpG islands. It is noteworthy 262 that this mechanism resembles a recently proposed model for recruitment and inhibition of 263 PRC2 complex by a RNA G-quadruplex present in the HOTAIR lncRNA48,49. This suggests there 264 may be other mechanisms for epigenetic regulation that operate by the sequestration and 265 inhibition of epigenetic modifiers mediated by high affinity interactions with nucleic acid 266 secondary structures. 267 268 Accession Codes 269 K562 datasets for DHS (ENCSR000EPC), DNMT1 ChIP-seq (ENCSR987PBI) and whole genome 270 bisulfate sequencing (ENCSR765JPC) were downloaded from ENCODE. G4-ChIP-seq data sets 271 for K562 and WGBS datasets for entinostat-treated and untreated HaCaT cells are available 272 at the NCBI GEO repository under accession number GSE107690. G4-ChIP-seq data in 273 entinostat-treated and untreated HaCaT cells were taken from GSE76688. 274 275 Acknowledgements 276 This work is supported by a core CRUK award (C14303/A17197). S.B. is a Senior Investigator 277 of the Wellcome Trust (grant no. 099232/z/12/z). JS is a Marie Curie Fellow of the European 278 Union (747297-QAPs-H2020-MSCA-IF-2016). 279 280 Author Contributions 281 282 The project was conceived by SM and SB. SM designed and carried out all the experiments 283 with discussions from DB, JS, RHH, DT & SB. SM designed the analysis strategies with 284 discussions from AG, DT & SB. JS performed G4-ChIP-seq experiments. AG & SMC carried out 285 all computational analysis with discussions from SM, DT, DB, RHH & GM. MD carried out the 286 CD spectroscopy and UV melting experiments. All authors interpreted the results. SM, DT & 287 SB wrote the paper with input from all authors. 288 289 10 Competing interests 290 SB is an advisor and shareholder of Cambridge Epigenetix limited. 291 11 References 292 1. Smith, Z. D. & Meissner, A. DNA methylation: roles in mammalian development. Nat. 293 Rev. Genet. 14, 204–20 (2013). 294 2. Feinberg, A. P. & Tycko, B. The history of cancer epigenetics. Nat. Rev. Cancer 4, (2004). 295 3. Li, E., Bestor, T. H. & Jaenisch, R. Targeted mutation of the DNA methyltransferase 296 gene results in embryonic lethality. Cell 69, 915–926 (1992). 297 4. Okano, M., Bell, D. W., Haber, D. A. & Li, E. DNA methyltransferases Dnmt3a and 298 Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 299 247–257 (1999). 300 5. Liao, J. et al. Targeted disruption of DNMT1, DNMT3A and DNMT3B in human 301 embryonic stem cells. Nat. Genet. 47, 469–478 (2015). 302 6. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21 303 (2002). 304 7. Lister, R. et al. Human DNA methylomes at base resolution show widespread 305 epigenomic differences. Nature 462, 315–322 (2009). 306 8. Illingworth, R. S. & Bird, A. P. CpG islands - ‘A rough guide’. FEBS Lett. 583, 1713–1720 307 (2009). 308 9. Deaton, A. & Bird, A. CpG islands and the regulation of transcription. Genes Dev. 25, 309 1010–1022 (2011). 310 10. Reik, W., Dean, W. & Walter, J. Epigenetic reprogramming in mammalian development. 311 Science 293, 1089–93 (2001). 312 11. Li, E. Chromatin modification and epigenetic reprogramming in mammalian 313 development. Nat. Rev. Genet. 3, 662–673 (2002). 314 12. Reik, W. Stability and flexibility of epigenetic gene regulation in mammalian 315 development. Nature 447, 425–432 (2007). 316 13. Long, H. K., King, H. W., Patient, R. K., Odom, D. T. & Klose, R. J. Protection of CpG 317 islands from DNA methylation is DNA-encoded and evolutionarily conserved. Nucleic 318 Acids Res. 44, 6693–6706 (2016). 319 14. Lienert, F. et al. Identification of genetic elements that autonomously determine DNA 320 methylation states. Nat. Genet. 43, 1091–1097 (2011). 321 15. Wachter, E. et al. Synthetic CpG islands reveal DNA sequence determinants of 322 chromatin structure. Elife 3, 1–16 (2014). 323 16. Krebs, A. R., Dessus-Babus, S., Burger, L. & Schübeler, D. High-throughput engineering 324 of a mammalian genome reveals building principles of methylation states at CG rich 325 regions. Elife 3, e04094 (2014). 326 17. Takahashi, Y. et al. Integration of CpG-free DNA induces de novo methylation of CpG 327 islands in pluripotent stem cells. Science 356, 503–508 (2017). 328 18. Quante, T. & Bird, A. Do short, frequent DNA sequence motifs mould the epigenome? 329 Nat. Rev. Mol. Cell Biol. 17, 257–62 (2016). 330 19. Domcke, S. et al. Competition between DNA methylation and transcription factors 331 determines binding of NRF1. Nature 528, 575–579 (2015). 332 20. Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal 333 regulatory regions. Nature 480, 490–5 (2011). 334 21. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. 335 Nature 489, 75–82 (2012). 336 22. Ooi, S. K. T. et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo 337 methylation of DNA. Nature 448, 714–717 (2007). 338 12 23. Du, J., Johnson, L. M., Jacobsen, S. E. & Patel, D. J. DNA methylation pathways and 339 their crosstalk with histone methylation. Nat. Rev. Mol. Cell Biol. 16, 519–532 (2015). 340 24. Clouaire, T. et al. Cfp1 integrates both CpG content and gene activity for accurate 341 H3K4me3 deposition in embryonic stem cells. Genes Dev. 26, 1714–1728 (2012). 342 25. Thomson, J. P. et al. CpG islands influence chromatin structure via the CpG-binding 343 protein Cfp1. Nature 464, 1082–1086 (2010). 344 26. Hänsel-Hertsch, R., Di Antonio, M. & Balasubramanian, S. DNA G-quadruplexes in the 345 human genome: Detection, functions and therapeutic potential. Nat. Rev. Mol. Cell 346 Biol. 18, 279–284 (2017). 347 27. Chambers, V. S. et al. High-throughput sequencing of DNA G-quadruplex structures in 348 the human genome. Nat. Biotechnol. 33, 1–7 (2015). 349 28. Biffi, G., Tannahill, D., McCafferty, J. & Balasubramanian, S. Quantitative visualization 350 of DNA G-quadruplex structures in human cells. Nat. Chem. 5, 182–6 (2013). 351 29. Rhodes, D. & Lipps, H. J. G-quadruplexes and their regulatory roles in biology. Nucleic 352 Acids Res. 43, 8627–8637 (2015). 353 30. Hänsel-Hertsch, R., Spiegel, J., Marsico, G., Tannahill, D. & Balasubramanian, S. 354 Genome-wide mapping of endogenous G-quadruplex DNA structures by chromatin 355 immunoprecipitation and high-throughput sequencing. Nat. Protoc. 13, 551–564 356 (2018). 357 31. Hänsel-Hertsch, R. et al. G-quadruplex structures mark human regulatory chromatin. 358 Nat. Genet. 48, 1267–1272 (2016). 359 32. De, S. & Michor, F. DNA secondary structures and epigenetic determinants of cancer 360 genome evolution. Nat. Struct. Mol. Biol. 18, 950–5 (2011). 361 33. Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. 362 Nature 489, 57–74 (2012). 363 34. Gardiner-Garden, M. & Frommer, M. CpG Islands in vertebrate genomes. J. Mol. Biol. 364 196, 261–282 (1987). 365 35. Jones, P. A. Functions of DNA methylation: islands, start sites, gene bodies and beyond. 366 Nat. Rev. Genet. 13, 484–92 (2012). 367 36. Hackenberg, M. et al. CpGcluster: A distance-based algorithm for CpG-island detection. 368 BMC Bioinformatics 7, 1–13 (2006). 369 37. Chen, L. et al. R-ChIP Using Inactive RNase H Reveals Dynamic Coupling of R-loops with 370 Transcriptional Pausing at Gene Promoters. Mol. Cell 68, 745–757.e5 (2017). 371 38. Fan, G. et al. DNA hypomethylation perturbs the function and survival of CNS neurons 372 in postnatal animals. J. Neurosci. 21, 788–797 (2001). 373 39. Jackson-Grusby, L. et al. Loss of genomic methylation causes p53-dependent apoptosis 374 and epigenetic deregulation. Nat. Genet. 27, 31–39 (2001). 375 40. Dai, J. et al. An intramolecular G-quadruplex structure with mixed parallel/antiparallel 376 G-strands formed in the human BCL-2 promoter region in solution. J. Am. Chem. Soc. 377 128, 1096–1098 (2006). 378 41. Kuryavyi, V., Phan, A. T. & Patel, D. J. Solution structures of all parallel-stranded 379 monomeric and dimeric G-quadruplex scaffolds of the human c-kit2 promoter. Nucleic 380 Acids Res. 38, 6757–6773 (2010). 381 42. Ambrus, A., Chen, D., Dai, J., Jones, R. A. & Yang, D. Solution structure of the 382 biologically relevant G-quadruplex element in the human c-MYC promoter. 383 Implications for G-quadruplex stabilization. Biochemistry 44, 2048–2058 (2005). 384 43. Biffi, G., Tannahill, D. & Balasubramanian, S. An intramolecular G-quadruplex structure 385 is required for binding of telomeric repeat-containing RNA to the telomeric protein 386 13 TRF2. J. Am. Chem. Soc. 134, 11974–11976 (2012). 387 44. Giraldo, R. & Rhodes, D. The yeast telomere-binding protein RAP1 binds to and 388 promotes the formation of DNA quadruplexes in telomeric DNA. EMBO J. 13, 2411–389 2420 (1994). 390 45. Bacolla, A., Pradhan, S., Roberts, R. J. & Wells, R. D. Recombinant Human DNA 391 (Cytosine-5) Methyltransferase. J. Biol. Chem. 274, 33002–33010 (1999). 392 46. Sanz, L. A. et al. Prevalent, Dynamic, and Conserved R-Loop Structures Associate with 393 Specific Epigenomic Signatures in Mammals. Mol. Cell 63, 167–178 (2016). 394 47. Ginno, P. A., Lott, P. L., Christensen, H. C., Korf, I. & Chédin, F. R-Loop Formation Is a 395 Distinctive Characteristic of Unmethylated Human CpG Island Promoters. Mol. Cell 45, 396 814–825 (2012). 397 48. Wang, X. et al. Targeting of Polycomb Repressive Complex 2 to RNA by Short Repeats 398 of Consecutive Guanines. Mol. Cell 65, 1056–1067.e5 (2017). 399 49. Wang, X. et al. Molecular analysis of PRC2 recruitment to DNA in chromatin and its 400 inhibition by RNA. Nat. Struct. Mol. Biol. (2017). doi:10.1038/nsmb.3487 401 402 14 Figure 1. G4 formation is associated with hypomethylation at CGIs 403 a) A G-tetrad stabilized by Hoogsteen hydrogen bonding and a central monovalent cation 404 (left). Schematic representations of a three-tetrad G4 structure (Right). b) Venn diagram 405 illustrating the overlap of G4 structure formation (BG4 peaks) and CGIs. c) Violin plot 406 showing size distribution of BG4 peaks and CGIs. d) Count of BG4 peaks overlapping a CGI. e) 407 Box and whisker plot showing the average methylation for BG4 peaks (n = 8,210), DHSs (n = 408 142,115) and CGIs (n = 27,073). Centre line represents the median value separating upper 409 and lower quartiles in the box, whiskers indicate 1.5× interquartile range (IQR), points are 410 actual values of outliers. Note that methylation level at CpG sites with less than 5x coverage 411 is considered unreliable and discarded. f) Histogram showing the distribution of BG4 peaks 412 and CGIs relative to percentage of GC. g) Histogram showing the distribution of BG4 peaks 413 and CGIs relative to percentage of CpGs per 100 bp. h) Box and whisker plot showing the 414 methylation levels for BG4 peaks and CGIs at different CpG densities. Note that by definition 415 there are no CGIs at a CpG density < 5 CpGs/100bp and that at > 20 CpGs/100bp there are 416 few CGIs (1) and BG4 (36) peaks to consider. The number of CGI regions and BG4 peaks in 417 each category are presented on top of the plot. i) An IGV screen shot illustrating the co-418 incidence of BG4 peaks (blue) with hypomethylated promoter CGIs (green) and DHSs (orange) 419 for a representative genome region from Chr 7. Shown are normalised signal. Whole genome 420 bisulfite sequence tracks are in black (top). RefGene tracks are in grey (bottom). 421 422 Figure 2. DNMT1 is recruited to BG4 peaks associated with low methylation 423 a) An IGV screen shot showing the co-incidence (blue-masked) of BG4 peaks (blue) with 424 DNMT1 ChIP-seq peaks (red) and CGIs (green) at hypomethylated region from Chr 6. Orange-425 masked regions are hypermethylated and enriched with DNMT1 presence, but not BG4 426 signal. Whole genome bisulfite sequence tracks in black (top). b) Binding profile of DNMT1 427 shown across CGIs with low (less than 20%, n = 16,523), intermediate (between 20% and 80%, 428 n = 6,042) and high (more than 80%, n = 4,266) methylation. Y-axis shows the number of 429 reads in the ChIP normalised to 1 of sequencing depth (also known as Reads Per Genomic 430 Content (RPGC), more details in computational methods). Replicate 1 and 2 are indicated in 431 red and blue respectively. Above each plot is a heat map showing the enrichment of BG4 432 peaks and DHSs across the respective regions. The heat maps show RPGC of active chromatin 433 marks (DHSs) and BG4 peaks on these three classes of CGIs. 434 15 435 Figure 3. DNMT1 selectively binds and is inhibited by G4 structures 436 a-f) ELISA assays testing the binding of recombinant DNMT1 to G4 structure and control 437 oligonucleotides. Binding curves for: a) BCL2 G4 and non-G4-forming control (BCL2-mut); b) 438 KIT2 G4 and non-G4-forming control (KIT2-mut); c) MYC G4 and non-G4-forming control 439 (MYC-mut); d) BCL2 duplex DNA; e) BCL2 hemi-methylated duplex DNA; f) poly(dI-dC), 100 nt. 440 Absorbance was measured at 450 nm. a.u., arbitrary unit. Sequences of oligonucleotides are 441 given below the graphs. g) Binding curve of BCL2 G4 in presence of different concentration of 442 BCL2 duplex or poly(dIdC)n. h-j) Relative methylation activity of recombinant DNMT1 in 443 presence of G4 structure and control oligonucleotides: h) BCL2 G4 and BCL2-mut; i) KIT2 G4 444 and KIT2-mut; j) MYC G4 and MYC-mut. Shown are mean ± s.d., n = 3 independent 445 experiments in all plots but g (n = 2). 446 447 Figure 4. Recruitment of DNMT1 by G4 structures shapes the methylome in G-rich regions 448 a) Plot showing the average methylation profile centred around G4 forming regions (red and 449 blue are replicates 1 and 2 respectively, n = 7,491) or G4 sequences without structure 450 (orange and green are replicates 1 and 2 respectively, n = 36,015). The plot extends ± 5 Kb 451 from the centre. The dotted line denotes the lowest methylation level of G4 sequence 452 without structure. b) Proposed model for potential involvement of G4 structures and 453 methylation control at CGIs: i) G4 structures sequester DNMT1 due to high affinity binding; ii) 454 G4 structures inhibit the methylation activity of DNMT1. Together with the binding of 455 transcription factors, G4 structures contribute to protection of CGIs from methylation. 456 457 16 Online Methods 458 459 Cell culture 460 Mycoplasma-free human chronic myelogenous leukaemia K-562 cells (CCL-243) were 461 purchased from ATCC and grown in RPMI1640 (Glutamine plus, Life Technologies) 462 supplemented with 10% of fetal bovine serum and 100 U/ml penicillin-streptomycin (Life 463 Technologies). All cell stocks were regularly tested for mycoplasma contamination. 464 465 G-quadruplex ChIP-seq 466 ChIP-seq for G-quadruplex structures (G4-ChIP-seq) was performed using the G4-specific 467 antibody BG4 essentially as described previously31. 468 469 Oligonucleotide annealing 470 All oligonucleotides were PAGE purification quality (Sigma). For G4 formation, 10 µM DNA 471 oligonucleotide was annealed in 10 mM Tris HCl, pH 7.4, 100 mM KCl by heating at 95 °C for 472 5 min followed by gradually cooling to 21 °C. For double-stranded DNA, 10 µM forward and 473 reverse strand oligonucleotides were mixed and annealed in 10 mM Tris HCl, pH 7.4, 100 mM 474 NaCl in the same manner. 20 µM poly(dI-dC)50 was annealed as for double-stranded DNA. 475 476 Enzyme-linked immunosorbent assay (ELISA) 477 ELISAs for binding affinity and specificity were performed as described previously28 with 478 minor modifications. Briefly, biotinylated oligonucleotides were bound to Pierce™ 479 Streptavidin Coated High Capacity Plates (ThermoFisher) followed by blocking with 1.5% BSA 480 and incubation with recombinant full-length human FLAG-tagged DNMT1 protein (Active 481 Motif, Cat. No: 31404) in ELISA buffer (100 mM KCl, 50 mM KH2PO4, pH7.4). After three 482 washes with ELISA buffer, detection was achieved with an anti-FLAG horseradish peroxidase 483 (HRP)-conjugated antibody (ab1238, Abcam) and TMB (3,3′,5,5′-tetramethylbenzidine) ELISA 484 Substrate (Fast Kinetic Rate, ab171524, Abcam). Signal intensity was measured at 450 nm on 485 a PHERAstar microplate reader (BMG Labtech). Dissociation constants (Kd) were calculated 486 from saturation binding curves assuming one-site binding using Prism (GraphPad Software 487 Inc.). Standard error of mean (s.e.m.) values were calculated from three replicates. 488 489 In vitro DNA methylation assay 490 DNA methylation assays were performed using a DNMT Activity Assay Kit (Fluorometric, 491 ab113468, Abcam) as per manufacturer's instructions. Briefly, 100ng recombinant DNMT1 492 was incubated with substrate assay wells in presence of different concentrations of G4 or 493 non-G4 oligonucleotides at 37 °C for 90 min. Methylation levels were quantified from the 494 binding of an anti-5-methylcytosine antibody detected by fluorescent secondary antibody. 495 Fluorescence signal was measured using a PHERAstar microplate reader (530 nm excitation, 496 590 nm emission). DNMT enzyme activity is proportional to the fluorescence intensity (RFU, 497 17 relative fluorescence unit) measured. Relative methylation activity is then calculated against 498 mock control. 499 500 Circular dichroism spectroscopy 501 CD spectra were recorded on an Applied Photo-physics Chirascan circular dichroism 502 spectropolarimeter using a 1 mm path length quartz cuvette. CD measurements were 503 performed at 298 K over a range of 220-300 nm using a response time of 0.5 s, 1 nm pitch 504 and 0.5 nm bandwidth. The recorded spectra represent a smoothed average of three scans, 505 zero-corrected at 300 nm (Molar ellipticity θ is quoted in 105 deg cm2 dmol−1). The 506 absorbance of the buffer was subtracted from the recorded spectra. Oligonucleotides were 507 dissolved in lithium cacodylate buffer (100 mM, pH 7.2) containing 100 mM of KCl and 1 mM 508 EDTA to the concentration of 10 μM. 200 L of the oligonucleotides were annealed prior 509 measurement by warming up to 90 °C and slowly cooling down at room temperature. 510 511 UV Melting 512 For UV melting experiments, measurements were collected using a Varian Cary 100-Bio 513 UV−visible spectrophotometer by following absorbance at 295 nm. Samples (200 μl) with 514 final concentration of 2 M were measured in black, small window, 1 cm path-length quartz 515 cuvettes, covered with a layer of mineral oil (50 μl). Samples were equilibrated at 5 °C for 10 516 min, heated to 95 °C and cooled back to 5 °C at a rate of 0.5 °C/min. The samples were held 517 for a further 10 min and then the 5 °C to 95 °C ramp was repeated. Data were recorded every 518 1 °C during both the melting and cooling steps. 200 L of oligonucleotides were annealed 519 prior measurement by warming up to 90 °C and slowly cooling down at room temperature. 520 521 Bioinformatics Software and Scripts 522 Bioinformatic data analyses and processing were performed using Perl, Bash, Python and R 523 programming languages. The following tools were also used: cutadapt (1.15)50, BWA 524 (0.7.15)51, Picard (2.8.3), (http://broadinstitute.github.io/picard), MACS (2.1.1)52, Bedtools 525 (2.26.0), (http://bedtools.readthedocs.io/en/latest/content/overview.html), Deeptools 526 (2.5.1)53 and Bismark (v0.19.0)54. 527 All scripts and software developed are released in the following GitHub page: 528 https://github.com/sblab-bioinformatics/dna-g4-methylation-dnmt1 529 530 G4-ChIP-seq analysis 531 Raw fastq reads from G4-ChIP-seq in K562 cells were trimmed with cutadapt50 to remove 532 adapter sequences and low-quality reads (mapping quality < 10). Reads were aligned to the 533 human genome (version hg19) with BWA51 and duplicates were removed using Picard. Peaks 534 were called by MACS252 (p < 10-5) following our previous work31: https://github.com/sblab-535 bioinformatics/dna-secondary-struct-chrom-lands/blob/master/Methods.md 536 Peaks were merged from different replicates with bedtools multiIntersect. Only peaks 537 overlapping in 3 out 5 replicates were considered high-confidence. K562 datasets for DHSs 538 18 (ENCSR000EPC), DNMT1 ChIP-seq (ENCSR987PBI), whole genome bisulfite sequencing 539 (ENCSR765JPC) were downloaded from ENCODE. Promoters were defined as 1 kb (+/−) from 540 the transcription start sites of 31,239 hg19 transcripts. Methylation levels at CpG sites with 541 less than 5x coverage were discarded. If not otherwise specified, CGI34 were downloaded 542 using the UCSC’s table browser and then ported to human genome release hg38 using the 543 batch coordinate conversion (liftover) tool of the UCSC. The alternative CGI sets were 544 generated using CpGCluster36. 545 546 Enrichment analysis 547 ENCODE DHS and ChIP-seq data sets were normalised to sequencing depth of 1 (i.e. RPGC, 548 Reads Per Genomic Content). Sequencing depth is defined as: (total number of mapped 549 reads * fragment length) / effective genome size. The effective genome size was set to be 550 3,209,286,105 and enrichment values for DHSs and BG4 peaks over CGIs and their flanking 551 sequences were visualised in R using ggplot2 library. Enrichment values for DNMT1 over CGIs 552 and their flanks were visualised with DeepTools53. 553 554 Monte Carlo Simulation 555 Monte Carlo simulation was used to calculate the significance of overlap between BG4 peaks 556 and high confidence DNMT1 peaks, defined by Irreproducible Discovery Rate (IDR) in 557 ENCODE’s ChIP pipeline. We first counted how many BG4 peaks overlapped with OQSs in 558 open chromatin (defined as all OQSs seen potassium and/or PDS conditions (749,339 559 sequences)27, which overlap at least one DHS region (43,506 sequences)). We then randomly 560 selected the same number of OQSs from all OQSs in open chromatin and counted how many 561 overlapped with at least one high confidence DNMT1 peak. The Monte Carlo P-value was 562 then calculated as (N+1)/(M+1), where M is the number of iterations and N is the number of 563 times the same or more overlaps were observed between randomised OQSs and high 564 confidence DNMT1 peaks (compared to the number of overlaps observed between BG4 565 peaks and high confidence DNMT1 peaks). Randomisation was repeated for 8000 times and 566 on average the number of overlaps between the shuffled OQSs and DNMT1 were two-fold 567 less than those observed between BG4 and DNMT1 peaks. 568 569 Differential methylation and BG4 binding analysis of entinostat treated HaCaT cells 570 HaCaT cells were treated with 10 M entinostat for 48 hours as we previously described31. 571 Genomic DNA from untreated and treated cells were extracted with phenol/chloroform. 50 572 ng DNA were used to generate whole genome bisulfite sequencing libraries using Pico 573 Methyl-Seq Library Prep Kit from Zymo research. Libraries were sequenced using the pair-574 end 150 bp high-output kit on Illumina Next-seq platform. Data from 4 runs were pooled 575 together. After quality assessment using FastQC 576 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), reads were processed to 577 remove adaptors and low-quality bases using cutadapt3. Options –u 6 –u -1 –U 6 –U -1 were 578 used to trim the initial six and last nucleotide bases. High-quality reads were aligned using 579 19 Bismark in paired-end mode with options --non_directional, --unmapped and -N 0 to hg19 580 reference genome. Reads were then de-duplicated and methylation was extracted. To 581 increase mapping efficiency and following previous work55, unmapped reads resulting from 582 the paired-end alignments were then re-aligned in single-end mode with options --583 non_directional and -N 0, and then deduplicated. Methylation was extracted for paired-end 584 and single-end alignments separately and then aggregated. Technical replicates for each 585 condition (before and after entinostat treatment) were merged and methylation counts were 586 aggregated by CpG site. A threshold was then applied to keep CpG sites with more than 5X 587 bisulfite sequencing depth both before and after treatment. This resulted in 21,106,307 CpG 588 sites, 75% of all CpG sites in hg19. 589 Differential BG4 binding analysis was done as previously reported31. Analysis focused on 590 open chromatin promoter regions (5351) which have at least one G-rich sequence27 and 591 ATAC-seq peak unaltered in size (log2 fold change = −0.6 to 0.6, FDR < 0.05) between 592 untreated and entinostat-treated HaCaT cells. Depending on differential BG4 signal, these 593 regions were categorized into BG4 gain (> 1.5-fold change in signal and FDR < 0.05) and BG4 594 constant and BG4 negative. 95% (5072/5351) of these regions are overlapping with CGIs. 595 Difference of the percentage methylation of the overlapping CGIs in each of the categories 596 were calculated. Statistic test was done with Mann–Whitney U test. Plotting of methylation 597 data were performed in the R programming language. 598 599 Data availability. K562 datasets for DHS (ENCSR000EPC), DNMT1 ChIP-seq (ENCSR987PBI) 600 and whole genome bisulfate sequencing (ENCSR765JPC) were downloaded from ENCODE. 601 G4-ChIP-seq data sets for K562 and WGBS datasets for entinostat-treated and untreated 602 HaCaT cells are available at the NCBI GEO repository under accession number GSE107690. 603 G4-ChIP-seq data in entinostat-treated and untreated HaCaT cells were taken from 604 GSE76688. Source data for figure 1d, e, h and Figure 3 are available with the paper online. 605 606 References 607 50. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing 608 reads. EMBnet.journal 17, 10 (2011). 609 51. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 610 Prepr. https//arxiv.org/abs/1303.3997 00, 1–3 (2013). 611 52. Zhang, Y. et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008). 612 53. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data 613 analysis. Nucleic Acids Res. 44, W160–W165 (2016). 614 54. Krueger, F. & Andrews, S. R. Bismark: A flexible aligner and methylation caller for 615 Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011). 616 55. Peat, J. R. et al. Genome-wide Bisulfite Sequencing in Zygotes Identifies Demethylation 617 Targets and Maps the Contribution of TET3 Oxidation. Cell Rep. 9, 1990–2000 (2014). 618 Supplementary Figure 1 BG4 binds unmethylated and methylated G4 structures with the same affinity. ELISA assays testing binding of BG4 antibody to unmethylated and methylated G4 structures and control non-G4 forming oligonucleotides. CpG sites are highlighted in red. MG denotes methylated CpG. Shown are mean and SD from three measurements. Supplementary Figure 2 Methylation is depleted in BG4 regions. a) Box and whisker plot showing the average methylation for BG4 peaks (n = 8,210), DHSs (n = 142,115) and CGIs (n = 22,673). Similar as in Fig 1e, apart from using CGI set generated by CpGCluster. b) Box and whisker plot showing the methylation levels for BG4 peaks and CGIs at different CpG densities. Similar as in Fig 1h, apart from using CGI set generated by CpGCluster. c) Box and whisker plot showing the average methylation levels for CGIs with or without a BG4 peak at different CpG densities. d) Box and whisker plot showing average methylation on CGIs with respect to BG4 peaks in presence (+) or absence (–) of a DHS or promoter. The number of CGI regions in each category are presented on top of the plot. e) Box and whisker plot showing the average methylation for BG4 peaks (n = 17,101), ATAC (ATAC-seq peaks denoting open chromatin, n = 23,217) and CGI regions (n = 26,580) in untreated HaCaT cells. f) Box and scatter plot showing differential percentage methylation in entinostat treated vs untreated cells for promoter CGIs in open chromatin regions containing sequences with potential to form G4s. i) BG4 negative CGIs without a G4 ChIP–seq peak but having potential to form a G4 structure (n = 1504); ii) BG4 constant CGIs with a least one high-confidence G4 ChIP–seq peak that does not significantly change before and after treatment (n = 3261), and iii) BG4 increases where a BG4 peak significantly increases in size after treatment (n = 307). Each grey dot represents one CGI region. p-values were calculated using a Mann–Whitney U test. Supplementary Figure 3 DNMT1 is enriched at BG4 peaks associated with low methylation. Binding profile of DNMT1 in CGIs with low (less than 20%, n = 14,983), intermediate (between 20% and 80%, n = 4,864) and high (more than 80%, n = 2,826) methylation. Above each plot is a heat map showing the enrichment of BG4 peaks and DHSs across the respective regions. Similar as in Fig. 2b, apart from using CGI set generated by CpGCluster. Supplementary Figure 4 Structure verification of oligonucleotides used in this study and inhibition of DNMT1 by G4 DNA. Circular dichroism spectra of a) BCL2 and BCL2-mut, b) KIT2 and BKIT2-mut, c) MYC and MYC-mut. Sequences are listed below the graph. UV melting profiles of the d) BCL2, e) KIT2, f) MYC. Mutated oligonucleotides lose the capacity to form G4s and therefore have no absorbance at 295 nm. Circular dichroism spectra of the g) BCL2-0CG/2CG/3CG and BCL2-CCC, h) KIT2-0CG/2CG/CGCG and KIT2-CCC, i) MYC-2CG/CTCA/4CG and MYC-CCC. Sequences are listed below the graph. Note that CD spectra of all G4 forming oligonucleotides show a positive peak at ~263nm and a negative peak at ~240nm, which is characteristic of a G4 structures. DNMT1 activity in presence of: j) BCL2 (G4 structure), BCL2-CCC (C-rich, non-G4 forming with 5 CpGs), BCL2-mut (wild type BCL2 with mutations in G4 tetrad Gs, non-G4 forming), BCL2-2CG (G4-forming with 2 CpGs), BCL2-3CG (G4 forming with 3CpGs) and BCL2- 0CG (G4 forming without CpGs). k) KIT2 (G4 structure), KIT2-CCC (C-rich, non-G4 forming with 4 CpGs), KIT2-mut (wild type KIT2 with mutations in G4 tetrad Gs, non-G4 forming), KIT2-0CG (G4 forming without CpGs), KIT2- 2CG (G4-forming with 2 CpGs), KIT- CGCG (G4 forming with 2 adjacent CpGs). l) MYC (G4 structure), MYC-CCC (C-rich, non-G4 forming with no CpGs), MYC-mut (wild type MYC with mutations in G4 tetrad Gs, non-G4 forming), MYC-2CG (G4 forming with 2 CpGs), MYC-CTCA (G4-forming without CpGs), MYC-4CG (G4 forming with 4 CpGs). Sequences of oligonucleotides used are given below the graphs. Shown are mean ± s.d., n = 3 independent experiments. Statistical tests were done using two-way ANOVA. Supplementary Figure 5 Depletion of DNA methylation is independent of R-loop formation. Plot showing the average methylation profile centered around BG4 peak regions overlapping with R-loop regions (BG4 + R-loop, red and blue are replicate 1 and 2 respectively, n = 5,464) or BG4 peak regions without overlapping R-loop (BG4 – R-loop, pink and green are replicate 1 and 2 respectively, n = 3,111). The plot extends ± 5 kb from the centre of BG4 peaks.