Article https://doi.org/10.1038/s41467-023-39549-4 Single cell Hi-C identifies plastic chromosome conformations underlying the gastrulation enhancer landscape Nimrod Rappoport1,2,6, Elad Chomsky1,6, Takashi Nagano3,4, Charlie Seibert5, Yaniv Lubling 1, Yael Baran1, Aviezer Lifshitz 1, Wing Leung3,4, Zohar Mukamel 1, Ron Shamir2, Peter Fraser 4,5 & Amos Tanay 1 Embryonic development involves massive proliferation and differentiation of cell lineages. This must be supported by chromosome replication and epige- netic reprogramming, but how proliferation and cell fate acquisition are balanced in this process is not well understood. Here we use single cell Hi-C to map chromosomal conformations in post-gastrulation mouse embryo cells and study their distributions and correlations with matching embryonic transcriptional atlases. We find that embryonic chromosomes show a remarkably strong cell cycle signature. Despite that, replication timing, chro- mosome compartment structure, topological associated domains (TADs) and promoter-enhancer contacts are shown to be variable between distinct epi- genetic states. About 10% of the nuclei are identified as primitive erythrocytes, showing exceptionally compact and organized compartment structure. The remaining cells are broadly associated with ectoderm and mesoderm iden- tities, showing only mild differentiation of TADs and compartment structures, but more specific localized contacts in hundreds of ectoderm and mesoderm promoter-enhancer pairs. The data suggest that while fully committed embryonic lineages can rapidly acquire specific chromosomal conformations, most embryonic cells are showing plastic signatures driven by complex and intermixed enhancer landscapes. The organization of mammalian chromosomes1 must accommodate physical nuclear packaging constraints alongside three major sources of dynamics– transcription2, replication3 and differentiation4–6. Recent advances in microscopy7, and different conformation capture technologies8 have provided improved understanding of the way chromosomes fold in general, leading to models for organization at multiple scales; from chromosomal territories and interchromosomal spaces9, through active and inactive (also known as A and B) intra- chromosomal compartments, and cohesin/CTCF mediated loop structures10. These models explain observations on the distribution of chromosomal contacts and domain insulation that give rise to topo- logical associated domains (TADs)11–14. Moreover, parallel advances in mapping the dynamics of genome replication show a high degree of linkage between chromosomal compartments, TADs, and genome replication time control15,16, highlighting genome replication as a key driver of the linkage between chromosomal structures and cellular proliferation. Quantification of the mitosis and replication cycle in chromosomes using synchronized cells17,18 and single cell Hi-C19,20 was Received: 27 November 2022 Accepted: 19 June 2023 Check for updates 1Department of Computer Science and Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel. 2The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel. 3Laboratory for Nuclear Dynamics, Institute for Protein Research, Osaka University, Osaka, Japan. 4Nuclear Dynamics Programme, The Babraham Institute, Cambridge, UK. 5Department of Biological Science, Florida State University, Tallahassee, FL, USA. 6These authors contributed equally: Nimrod Rappoport, Elad Chomsky. e-mail: pfraser@bio.fsu.edu; amos.tanay@weizmann.ac.il Nature Communications | (2023) 14:3844 1 12 34 56 78 9 0 () :,; 12 34 56 78 9 0 () :,; http://orcid.org/0000-0003-3461-9621 http://orcid.org/0000-0003-3461-9621 http://orcid.org/0000-0003-3461-9621 http://orcid.org/0000-0003-3461-9621 http://orcid.org/0000-0003-3461-9621 http://orcid.org/0000-0002-8458-9507 http://orcid.org/0000-0002-8458-9507 http://orcid.org/0000-0002-8458-9507 http://orcid.org/0000-0002-8458-9507 http://orcid.org/0000-0002-8458-9507 http://orcid.org/0000-0002-1623-9564 http://orcid.org/0000-0002-1623-9564 http://orcid.org/0000-0002-1623-9564 http://orcid.org/0000-0002-1623-9564 http://orcid.org/0000-0002-1623-9564 http://orcid.org/0000-0002-0041-1227 http://orcid.org/0000-0002-0041-1227 http://orcid.org/0000-0002-0041-1227 http://orcid.org/0000-0002-0041-1227 http://orcid.org/0000-0002-0041-1227 http://orcid.org/0000-0001-9419-3824 http://orcid.org/0000-0001-9419-3824 http://orcid.org/0000-0001-9419-3824 http://orcid.org/0000-0001-9419-3824 http://orcid.org/0000-0001-9419-3824 http://crossmark.crossref.org/dialog/?doi=10.1038/s41467-023-39549-4&domain=pdf http://crossmark.crossref.org/dialog/?doi=10.1038/s41467-023-39549-4&domain=pdf http://crossmark.crossref.org/dialog/?doi=10.1038/s41467-023-39549-4&domain=pdf http://crossmark.crossref.org/dialog/?doi=10.1038/s41467-023-39549-4&domain=pdf mailto:pfraser@bio.fsu.edu mailto:amos.tanay@weizmann.ac.il used to combine the effects of mitotic compaction and genome replication into one model describing the effect of cellular prolifera- tion on chromosomal structure. Overall, current data indicate that chromosomes are continuously being remodeled in all phases of the cell cycle - during exit from the mitotic state (M-G1 phase), while replicating (S-phase), and when re-entering the mitotic state (G2- M phase). A cycling dynamics of chromosome structure is therefore una- voidable for proliferating cell populations. This dynamics can be challenging if cells should combine proliferationwith the acquisition of stable transcriptional and epigenetic identities. A classical model for a process that must balance remarkable proliferation with rapid differ- entiation is embryonic development. Recent advances in single cell RNA-seq have provided unbiased and detailed maps of the earliest stages of transcriptional sorting during embryo gastrulation21. These data confirmed and refined the classical observations on the emer- genceof anepiblast cell population and its rapiddiversification into the three germ-layers by embryonic day 7.5. It also showed that diversifi- cation within the germ-layer is rapid and almost immediate, including early expansion of embryonic blood and several distinct mesodermal lineages, the differentiation of basic ectodermal neuronal progenitors, and the emergence of endodermal precursors from primitive pre- cursors and convergent extra-embryonic endoderm lineages22. Since these dramatic transcriptional events are occurring while cells are dividing at maximal rates (at least every 8 h on average), the chromo- somal structure underlying them must simultaneously support repli- cation and cell-fate acquisition. But it is currently not understood if and when chromosome conformation/structure in embryonic lineages differentiates and stabilizes. It is unclear if cell-type specific chromo- somal structures thatwere observed in-vitro23,24 or inmature tissues25,26 emerge before cells establish transcriptional identities, during (and in direct correlation with) transcriptional sorting, or only several cell cycles after cells commit to their fate transcriptionally. Here we use single cell Hi-C to explore the chromosomal organi- zation of post-gastrulation embryonic cells. We developed algorithms that combine analysis of replication time traces with contact distribu- tions to enable de-novo clustering of single cells in the embryo while minimizing bias by cell cycle signatures. This leads to two main obser- vations on the timing and structure of the initial cell type specific chromosomal structures in the embryo. First, we discover that a highly distinct chromosomal conformation is characterizing primitive ery- throcytes, showing that in principle, conformation can be specified and stabilized rapidly in differentiated cell types in the embryo. In contrast to this effect, most of the embryo nuclei show much milder con- formational heterogeneity that is associated primarily with broad clus- tering into mesoderm and ectoderm architectures. We show that the overall conformations of single cell Hi-C clusters representing the mesoderm and ectoderm layers are remarkably similar at the level of compartments and TADs. Nevertheless, we show that promoter- enhancer contacts that link ecto- or mesoderm specific promoter activity with specific enhancer markup are enriched for differential long-range contacts. Further analysis suggests that enhancers that are specific to diverse gastrulation lineages are interleaved within one group of TADs, while enhancers that are more accessible in the plur- ipotent epiblast state are demarcated from these genomic domains in a secondgroupof TADs. Together the data suggest thatwhile committed embryonic lineages may acquire specific chromosomal conformations rapidly, the majority of the embryonic lineages in gastrulation share a common and possibly more plastic chromosomal structure. Results Cell cycle signatures dominate embryonic chromosome conformations Weapplied single-cell Hi-C to assay chromosomal conformation in three E9.5 C57BL/6 Jmouse embryos.Weprocessed 3456 embryonic cells, out of which 87.15% passed quality control (QC) (Supplementary Fig. 1A–J). We sequenced at a depth that allowed recovery of a median of 91 K contacts per nucleus, with an overall low rate of trans-chromosomal contacts (median 7.85%), demonstrating high library quality (Fig. 1A). Across all cells,wecaptured310Mcontacts,with8% trans-chromosomal contacts. We initially phased nuclei along the cell cycle using our pre- viously reported strategy19, observing high degree of similarity between the parameters of the cell cycle model originally inferred for mouse embryonic stem cells (mESCs) and the embryonic cells. For example, we observed that 6.2%of thenuclei are enriched (20%ormore) for contacts in genomic distances ranging between 2–12Mb (Fig. 1B), defining a canonical mitotic cycle as previously observed for mESCs. Ordering embryo nuclei based on their distribution of contact distances as in Nagano et al.19 (Supplementary Fig. 1K) recapitulated the cell cycle dynamics involving transition between a G1 conformation landscape defined by long range (>12Mb) contacts and the S-phase regime invol- ving gradual increase in short range (<2Mb) contacts. To allow robust comparisonof the replication time trendsbetweenESCandEmbryoswe identified genomic regions that are constitutively replicating early or late in S-phase according to both datasets (defined as “strict early” and “strict late”, Methods, Supplementary Fig. 2A, B). Analysis of the ratio between Hi-C coverage in these genomic regions in embryo cells showed partial consistency with the trend observed in ESC, where we observed an increase in the ratio through mid-S phase and a decrease toward G2 (Fig. 1C). Interestingly, in the embryo this trend was per- turbed by a population of nuclei with high early/late coverage ratios and an atypically low fraction of short-range contacts in cells that were initially annotated as G1. These data reinforced our earlier observations on the dominance of cell cycle signatures in scHi-C, but also suggested the canonical signature may be shadowing additional conformational heterogeneity within the embryonic nuclei pool. Clustering scHi-C profiles using S-phase cluster seeding andRNA atlas projection To enable de-novo clustering of scHi-C profiles with reduced cell-cycle bias, we developed a two-stage approach (denoted S-phase cluster seeding). We seeded scHi-C clusters using analysis of replication time trends inmid-S phase cells and expanded these seeds to clusters using A-compartment association scores (A-scores, Methods). We applied this approach to a combined data set of ESC and embryo cells (Sup- plementary Fig. 2C–H), deriving a model defined by three main clus- ters, one involving a distinct group of embryo nuclei with non- canonical cell cycle phasing (C3, Supplementary Fig. 2I), and the other two representing clustering of the remaining embryo (C2) and ES (C1) nuclei. As expected, M-phase nuclei were poorly separated into clus- ters, but otherwise G1-S cell cycle variation was captured as intra- cluster structure (Fig. 1D). To annotate nuclei clusters and explore their underlying gene regulatory programs,we acquired and sequenced single cell RNA from two E9.0 embryos and from ESCs using MARS-seq and created a map of transcriptional states using Metacell27 (Supplementary Fig. 3A, B). We identified differentially expressed genes and genomic bins encompassing them for each expression metacell and projected scHi- C clusters on the transcriptional maps by calculating relative A-scores on these genomic bins. Remarkably, this strategy associated unam- biguously C3 conformations with primitive erythrocyte (pEry) expression (Supplementary Fig. 3C), but showed that the remaining transcriptional landscape in the embryo could not be matched by strong conformation clusters within C2. This was further confirmed by re-analysis of a reference gastrulation scRNA-seq atlas (E6.5–8.25, Supplementary Fig. 3D, E). Overall, despite the rich transcriptional embryonic space, C2 nuclei were reflecting variation that was approximately similar in extent to the transcriptionally homogeneous ESC states represented in the C1 cluster and only primitive ery- throcytes stood out as a distinct conformation cluster. Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 2 45e6 % S ho rt -r an ge c on ta ct s % Mito�c contacts Log early / late ra�o% Trans contacts 1. 5 1. 0 0. 5 0. 0 4.4 5.0 5.6 #Contacts (log10) 0 10 0 5 10 15 De ns ity 10 20 30 40 50 1 2 3 40 60 80 B C ES C A- sc or e Embryo A-score 0. 2 0. 6 1. 0 0.2 0.6 1.0 D F 0.2 0 -0.2 A sc or e Expression (emb) G 1 0.6 0.2 (-1 9, -1 8] (-1 8, -1 7] (-1 7, -1 6] (-1 6, -1 5] (-1 5, -1 4] (-1 4, -1 3] (-1 3, -1 2] (-1 2, -1 1] (-1 1, -1 0] (-1 0, -9 ] ES C ea rly -s co re Embryo early-score -1 -1 0 I 0 1 0 -1 0.4 E Embryo expression (log2) -25 -20 -15 -10 -18 -14 -10 ES C ex pr es sio n (lo g2 ) Em br yo ES C L 0 0.4 43e6 44e6 46e6 132e6 133e6 134e6 135e6 A- sc or e J (-1 9, -1 8] (-1 8, -1 7] (-1 7, -1 6] (-1 6, -1 5] (-1 5, -1 4] (-1 4, -1 3] (-1 3, -1 2] (-1 2, -1 1] (-1 1, -1 0] (-1 0, -9 ] 1 0 -1 Expression (embryo) Ea rly -S co re ES C- Em br yo A sc or e Em br yo T SS ES C TS S ES C – Em br yo Ea rly -s co re Em br yo T SS ES C TS S Embryo ESC 100 50 0 -50 -100 Enrichment score H K 0 0.4 47e6 48e6 49e6 50e6 0 Chr8 Chr16 Chr3 Tet2Dppa2/4Zfp42 0.02-0.02 0 0.04 PC1 PC 2 0.02-0.02 0 0.04 PC1 PC 2 Embryo pEry ESC Post-M G1 Early S Late S-G2 Pre-M % S ho rt -r an ge c on ta ct s A Fig. 1 | Single cell Hi-C in mouse embryo cells. A Distribution of the number of unique contacts per cell (left) and fraction of trans-chromosomal contacts per cell (right) in the Embryo scHi-C dataset.B For each single cell shown are the fraction of contacts in the 2Mb-12Mb (“mitotic”) distance band vs. the fraction of contacts between elements less than 2Mb apart (“short-range”). Color coding is based on classification into cell cycle phases as in Nagano et al. 2017. C Comparing normal- ized ratio of scHi-C coverage on early and late replicating loci (X axis) to fraction of short-range contacts.DVisualizing clusters of single ES and embryo cells using PCA projection of A-scores from 11 genomic clusters. Cells are color coded according to cluster (left) or the initial annotation of cell cycle phase (right). E Plotting gene expression of 40kb bins in ESCs compared to embryo cells (mean across E9 metacells, excluding pEry). Upper and lower dashed lines indicate the threshold for defining transcriptional changes between embryo and ESC. F Comparison of A-scores for 40kbgenomic bins.G 40 kb genomic bins were stratified according to embryonic expression level (units are log2 of the expression frequency). The dis- tributions of A-scores in embryos (blue) and ESCs (green) are depicted using boxplots. The (−19, −18] box contains at least n = 48K genomic bins, (−11, −10] and (−10,−9] at leastn = 20, and the rest at leastn = 200.Box limits are thefirst and third quartile, center line is the median, whiskers are 1.5 times the interquartile range, and points are outliers.H Distributions of differential A-score (ESC minus Embryo) in genomic bins with TSSs showing differential gene expression in embryos com- pared to ESCs (n = 1289 ESC induced bins: green, n = 806 Embryo induced bins: blue). Box limits are as in (G). I–K Similar to (F–H) but showing data on the early- scores of genomic bins instead of A-scores. L Examples of conformation repro- gramming atpluripotency loci. For each locuswe showShamanenrichment plots in Embryos (top) and ESC (middle), and the respective A-score trends (bottom; blue – embryo, green – ESCs). Dashed circles represent focal points for differential conformation. Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 3 Differential contacts in pluripotent and embryonic nuclei Many genomic bins showed average transcriptional changes in non- pEry embryo cells compared to ESCs (806 and 1289 bins with over 4-fold decrease and increase respectively, Fig. 1E). Global comparison of A-scores in the ESC (C1) and embryo (C2) clusters (Fig. 1F) showed however conservation of the A/B compartment structure, with 85% of the genomic bins showing less than 0.1 change in A-score, and only 0.2% showing over 0.3 change. Analysis of A-score in loci stratified according to expression levels (Fig. 1G, excluding bins with differential expression) suggested a clear distinction between A-linked expressed and B-linked non-expressed loci. Further analysis also indicated that bins containing genes over expressed in the ESC or embryo will have a higher A-score in that sample (Fig. 1H, KS D =0.4, p << 0.01). This association was observed based on relatively small changes in A-score and despite the lack of loci showingmajor A–B compartment switches. We next compared embryo and ESC estimated replication time per genomic bin (defined as the early-score, Methods). This suggested a similar trend of expression linkage (Fig. 1I–K, KS =0.36, p <<0.01). Together these data show that despite the mild magnitude of com- partment and replication-time remodeling in embryos compared to ESCs, it still reflects transcriptional regulation in these cells. We next searched for localized differential chromatin contacts in ESCs and embryo cells by pooling contacts from single-cell clusters and performing Shaman28,29 normalization and enrichment analysis. Using a threshold of differential enrichment score of 50, we identified 3267pairs of loci losing contacts and 1914 pairs of loci gaining contacts in embryos compared to ESCs, suggesting many cases of local con- formation remodeling. We observed that genomic bins with higher A-score in ESCs are involved in significantly more differential contacts than loci with constitutively high A-score or loci gaining A-score in the embryo (Supplementary Fig. 4A, two-sided Kolmogorov–Smirnov test). A screen for differential contacts at ESC-regulated TSSs high- lighted cases of conformation changes with potential regulatory impact (Supplementary Data 1). For example, we observed specific contacts and insulation structure isolating the pluripotency genes Rex1/Zfp42, Tet2 and Dppa2/4 from surrounding B-compartment associated regions in ESC nuclei (Fig. 1L, see Supplementary Fig. 4B for conformation changes in loci conserving their A-association). These data suggested that specific contacts, with possible linkage to gene regulation and in particular to the repression of the pluripotency program, are observed in embryonic nuclei. This is occurring even when global structural features such as compartment, replication and insulation (Supplementary Fig. 4C) are changing only mildly. Primitive Erythrocyte chromosomes show compact and highly organized folding In contrast to the weak separation of scHi-C clusters C1 and C2, the pEry cluster C3 was defined by a well separated group of 264 single cells (reclassified using total A-scores per cell, Supplementary Fig. 4D). This separationwas supported by a large number of genomic binswith modifiedA-score in pEry compared to other embryo cells (Fig. 2A, 4.8% with A-score delta > 0.3). We estimated mean expression in pEry and non-pEry E9 embryo metacells (Fig. 2B) and noted that in pEry, genomic bins bearing expressed genes at any level show remarkable alignment to the A-compartment (Fig. 2C). Estimation of single cell early/late coverage ratios (Fig. 2D), showed that cells classified as pEry are enriched in S-phase, but are also represented in other phases. Genome replication landscapes (quantified by early-scores) weremore conserved than A-scores (Fig. 2E), but genomic bins containing expressed genes showed earlier replication, in concordance with their increased A-score (Fig. 2F). Beyond its unique compartment structure, the pEry single cell cluster was also characterized by uniformly high fractions (30–60%) of contacts over >2Mb (Fig. 2G). The data also showed high variance for pEry long range contact distances, with no distance bin representing over 6% of the contacts (Fig. 2H). This property distinguishes the long-range contacts in pEry maps from those observed in embryonic or ESC G1 cells during exit frommitosis. Despite the higher rate of long range intra-chromosomal contacts, pEry nuclei show low rates of trans-chromosomal contacts (Fig. 2I). Together these observations indicate pEry chromosomes form com- pact and highly organized territory structures,with A/B compartments that are strongly demarcated and reflective of transcriptional activity patterns. pEry funnel-like A-compartment structures are anchored at TSSs and cryptic loci To understand further the sharp increase in pEry A-compartment association specificity, we identified 357 loci with the highest increase in pEry specific A-scores. Clustering of A-scores profiles over 400 kb around such pEry A-specific loci showed that about 50% of the sites (Fig. 2J, clusters A1-A3) involved sharp A-linked pEry hotspots that reside in the B compartment in non-pEry embryo cells. The remaining sites typically represented increase in A-score for a larger domain bounded by the identified pEry A-linked peak (clusters A4-A8). Pro- jectionof differentialTSS expressionon the clusteredgenomic interval confirmed that the majority (92%) of pEry A-peaks were associated with an expressed TSS (Fig. 2K). Interestingly it also suggested many hotspots of A-association could not be explained by any known loca- lized transcriptional driver. We then computed the mean pEry and non-pEry contact enrichment patterns for the loci clusters (Fig. 2L). In pErys, this revealed an unexpected trend involving a funnel-like structure representing aligned contacts around the focal A-compartment contact hotspot. Contact enrichment maps around the same loci in non-pEry nuclei showed these sites are located within embryo insulators andbetween two loop structures (Fig. 2L - right).We visualized individual loci showing major funnel-like conformational remodeling around key genes (Fig. 2M) and multi-peak loci (Supple- mentary Fig. 4E), but also in hotspots that represented uncharacter- ized regulatory effects (Fig. 2M, bottom). This suggested that strong A-compartment alignment in pErys is not driven solely by transcrip- tion, and must therefore also involve some other trans-acting factors (e.g., we observed enrichment for erythrocyte TF binding, Supple- mentary Fig. 4F, G). For control, we clustered profiles of 272 loci with top non-pEry A-score increase, indicating lack of similar funnel-like effects in the embryo conformation cluster (Supplementary Fig. 4H). To validate that the highly specific conformations in C3 nuclei are indeed representing primitive erythrocytes in a non-biased fashion, we sorteddirectly 118 primitive erythrocytes cells fromE10.5 embryos and generated new single-cell Hi-C profiles from them (Supplementary Fig. 5A–D). The data confirmed that sorted pEry cells represent the same sharp A/B compartment structure as the one characterized in non-sorted cells and reconfirmed the presence of remarkable funnel- like structures in these cells (Supplementary Fig. 5E, F). Of note, Guo et al. recently reported a similar funnel structure in thymocytes and B cells, which they termed “chromatin jets”, suggesting its prevalence in hematopoietic cells30. Refined embryo clustering by model-based analysis of replica- tion dynamics Since embryo transcriptional states are highly heterogeneous at E9, we made several attempts to enhance resolution within cluster C2, searching for conformation variation that can be linked with differ- entiating cell types on the background of massive proliferation sig- natures. Direct clustering of single cell coverage profiles in S-phase cells and UMAP visualization of these cells (Fig. 3A) suggested cell-to- cell variation may be present in the data, but showed that it is super- imposed over strong cell-cycle gradients, even when restricting ana- lysis to replicating cells alone. We therefore developed a sensitive algorithm that considers both the replication cycle and the potential cell-type structure explicitly and quantitatively (Supplementary Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 4 Fig. 6A, B). The algorithm infers a probabilisticmixturemodel in which each cell is associated with a cluster and a latent replication timing variable defined as the s-score. Each cluster specifies the replication timing of each genomic bin, such that once a cell’s s-score is inferred, the algorithm can compare its observed bins read coverage to the values predicted by a linear replication process that is timed in a bin- specific way (Methods). The algorithm tries to fit the observed data by clustering cells de-novo while simultaneously inferring their s-scores and the cluster-specific replication timing parameters. We used cross- validation to tune model parameters and verify the algorithm robust- ness (Supplementary Fig. 6C–I). This resulted in good matching of observed andmodeled replication regimes (Supplementary Fig. 6J) for % Mito�c contacts Far �ghtness De ns ity 0 0. 4 0. 8 1 2 3 A 0 10 20 30 2 6 10 14 IG H 10 30 50 10 30 50 % > 2M b co nt ac ts 10 30 50 0.04 0.08 0.12 Log early / late ra�o % inter-chromosomal A- sc or e -18 -10-16 -12-14 Log2(RNA average) 0 1 -18 -10-16 -12-14 Log2(RNA Ery) -18 -10-16 -12-14 Log2(RNA Embryo) -18 -10-16 -12-14 Log2(RNA) -18 -10-16 -12-14 Log2(RNA Ery) -18 -10-16 -12-14 Log2(RNA Embryo) Ea rly -s co re -1 1 B C D E F 0 0. 4 0. 8 0 0.4 0.8 -2 0 -1 5 -1 0 -20 -15 -10 -1 0 1 -1 0 1 A-score Embryo A- sc or e pE ry Lo g2 (p Er y RN A) Log2(mean embryo RNA) 100500-50-100 Enrichment score00.51 Early–score non-pEry Ea rly –s co re p Er y De ns ity A-score pEry Non-pEry J K A1 A2 A3 A4 A5 A6 A7 A8 0 pEry -4e5 4e5 0 non-pEry -4e5 4e5 1e6 L -1e6 0 1e6 pEry -1e6 0 non-pEry A1 A2 A3 A4 A5 A6 A7 A8 A1 A2 A3 A4 A5 A6 A7 A8 0-4e5 4e5 % > 2M b co nt ac ts 40-4 Bin RNA Ery-Embryo 57e6 58e6 59e6 60e6Chr16 0.6 0.2 110e6 0.25 111e6 112e6 113e6 43e6 44e6 45e6 46e6 47e6 0.75 Chr7 0.2 0.6 Chr2 Cpox Hbb M N on -p Er y pE ry N on -p Er y pE ry N on -p Er y pE ry Chr2:45M Log2(RNA pEry/Embryo) > 2 Log2(RNA pEry/Embryo) < -2-2 < Log2(RNA pEry/Embryo) < 2 Ge no m ic b in Coordinate pEry Non- pEry A- sc or e A- sc or e A- sc or e pEryNon-pEry Log2(RNA pEry/Embryo) > 2 Log2(RNA pEry/Embryo) < -2-2 < Log2(RNA pEry/Embryo) < 2 Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 5 a model including 3 clusters denoted C2.1, C2.2 and C2.3. Importantly, the model’s inferred s-scores facilitate normalization of the coverage statistics for each cell. UMAP projection of such normalized profiles showaclear, cell-cycle independent cluster structure (Fig. 3B). Analysis of the observed cluster structure suggested C2.1 and C2.3 are dis- tributed homogeneously along the replication cycle (Fig. 3C). Cluster C2.2 showed skewed distribution enriched for late-S profiles and additional analysis indicated cells within the cluster are of lower cov- erage and potentially lower quality (Supplementary Fig. 7A). We note that we could not derive robust results using alternative methods for clustering scHi-C data, which are based on differential compartment structure and are lacking explicit cell cycle modelling31,32 (Supple- mentary Fig. 7B, C). Ectoderm and mesoderm/endoderm scHi-C clusters We estimated replication time per genomic bin (early-score) in the C2.1-3 clusters to facilitate their further annotation. These estimations were consistent for C2.1 and C2.3 (Fig. 3D), but showed C2.2 cells are skewed to high and low coverage values (as expected by their bias to mid to late-S phase, Fig. 3E).We identified groups of genomic binswith C2.1 or C2.3-specific early replication, showing that pooling coverage on these groups provided replication-time dependent separation of the clusters (Fig. 3F). Moreover, mean A-score over the same genomic bin groups showed matching separation of single cells (i.e. early replicating loci in C2.1 were also more A-associated in C2.1 and con- versely for C2.3, Fig. 3G). This allowed expansion of our clustering to additional S-phase cells (Fig. 3H). A similar approach was not applic- able to G1 cells (Supplementary Fig. 7D). Overall this strategy yielded a total of 431 C2.1 cells and 504 C2.3 cells for further analysis. We sear- ched for cluster-specific replication time in groups of loci representing correlated gene modules inferred from scRNA-seq data (Fig. 3I–K, Supplementary Fig. 7E–G). This unambiguously associated cells in cluster C2.1 with ectodermgene expression programs and cells in C2.3 with mesoderm or endoderm programs. Cells in C2.2 were not asso- ciated with any gene expression program. The annotation of clusters C2.1 and C2.3 was supported by comparing their compartments to data from Neural Progenitor Cells (NPCs) and Hematopoietic Stem Cells (HSCs) from E14.5 embryos (Supplementary Fig. 7H–J)24,33. Lineage-specific scHi-C conformation differences are weak To test the potential for detecting additional cell-type structure given the limited breadth and depth of our scHI-C sample, we performed simulations with downsampled data. These experiments show that the data and algorithms are sufficiently sensitive to allow detection of clusters similar to C2.1 and C2.3 even when these involved as little as 50–75 cells (~5–10% of the modeled cells, Supplementary Fig. 8A, B). This suggests that other possible chromosomal differences between cell types areweaker, or are present in scarcer cell populations. Further analysis suggested that even for themesodermand ectodermclusters, contact landscapes could be remarkably similar, even around loci that support dramatic transcriptional regulation (e.g., Igf2 or Crabp2, Fig. 3L). Quantitatively, only 1% of the genome (divided into 40 kbbins) show A-score different of 0.2 ormore between the clusters, compared to 3.6% in a comparison of the E14.5 NPC and HSC maps (Supple- mentary Fig. 8C). Consistent with their overall similarity in conformation land- scapes, further dissection of themesodermor ectoderm into cell types using our mixture model approach (Supplementary Fig. 8D, E) was deriving only cell-cycle dependent refinements of the C2.1 and C2.3 clusters. To improve on this, we used inferred replication time para- meters to normalize coverage profiles per cell in each cluster (Sup- plementary Fig. 9A, B). Hierarchical clustering of the resulted data did identify an intra-mesoderm lineage structure, including a small cluster strongly matching the endothelial transcriptional state (Supplemen- tary Fig. 9C, D). It can therefore be hypothesized that replication time and compartment structure of refined embryonic lineages may be detected using sensitive algorithms and deeper single cell sampling. But the data strongly suggest that the magnitude of conformational changes between such refined lineages will remain small, in particular compared to the highly distinct pEry state we described above. Three-way identification of regulated long-range interactions Pooling contacts in ectoderm andmesoderm scHi-C clusters provided us with a strategy for identifying germ-layer specific chromatin inter- actions. We first identified 256 and 236 loci with higher ectoderm or mesoderm/endoderm A-score respectively. Annotation of these sites (Supplementary Data 2) revealed several important regulatory genes, for example the mesoderm TF Twist1, and the epiblast/ectoderm TF Sox2 (Fig. 4A, B). Comparative analysis of contact maps in these loci showed again a very high degree of consistency between the global conformation of the two clusters. Nevertheless, we could identify refined alteration in contact distributions of the promoter of Twist1 with a putative regulatory element (shown by virtual 4C, Fig. 4B). To generalize this observation, we used published histone modification maps from ectoderm (hind-, mid- and forebrain) and mesoderm (heart, limb) tissues, and identified cell type specific putative enhan- cers (Methods, Supplementary Fig. 10A, B). We also screened for identified genes with germ-layer specific expression, and combined them with the epigenomic maps by mapping each enhancer to its closest promoter. Proximal pairs of enhancers and promoters with matching ecto- or mesoendo- specific activity could then be defined (Fig. 4C). To complete a three-way integrative screen on putatively Fig. 2 | Distinct, compact conformation for primitive erythrocytes. A Comparison of 40kb bins A-score in pEry cells vs. non-pEry embryo cells. Upper and lower dashed lines show differences of at least 0.3 in A-score.B Comparison of log2 mean expression (fraction of molecules per gene) for 40 kb genomic bins. C Distribution of genomic bins’ A-score as a function of expression levels. A-score and transcription were calculated for 40kb genomic bins. Plots show A-scores stratified by expression, for loci classified with conserved expression (left), Ery induced expression (middle) and Ery-repressed expression (right). In the left panel, the (−19, −18] box contains at least n = 48K genomic bins, (−11, −10] and (−10, −9] at least n = 20, and the rest at least n = 200. In the middle panel, all boxes contain at least n = 15 genomic bins, except for the (−11, −10] and (−10, −9] which contain at least n = 5. In the right panel, all boxes contain at least n = 90 genomic bins, except for (−12, −11], (−11, 10] and (−10, −9] which contain at least n = 35, 10, and 2 respectively. Box limits are the first and third quartile, center line is the median, whiskers are 1.5 times the interquartile range, andpoints are outliers.DDistribution of single cell early/late coverage ratio for pEry (red) and non-pEry (black) cells. E Comparing early-scores for 40kb genomic bins in pEry and non-pEry embryo cells. F Similar to (C), but showing distributions of 40kb genomic bins early-score instead of A-score. G Showing the distribution of contacts with distance >2Mb vs mitotic contacts (2–12Mb) inpEry (red) andnon-pEry cells (black).Note the general high degree of long range contacts in p-Erys.H Showing the fraction of contacts in the most frequent distance bin (defined as “Far tightness” in Nagano et al 2017) compared to the rate of long-range contacts. I Distributions of inter-chromosomal contact rates for pEry and non-pEry cell. J Shown are color coded A-scores com- puted for the pEry (left) and non-pEry (right) clusters around loci with pEry specific high A-score (400 kb upstream and downstream). Loci are grouped into 8 clusters using K-means clustering. K For each of the loci clustered in J we color coded bins with any level of transcription according to the relative expression in pEry and non- pEry cells (blue – higher in non-pEry, red - higher in pEry). L Lociwithin each cluster in J were pooled, and their average Shaman score is color coded for pEry and non- pErycells. ThepooledA-scoreprofile is shownat thebottom for every loci cluster in pEry and non-pEry. M Examples of loci showing distinct pEry conformation. For every locus, depicted are contact enrichment in non-pEry cells (top), pEry cells (middle) and profile of A-score in the two clusters (bottom). For Cpox and Hbb we mark contacts with the TSS locus by black diagonal lines. Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 6 interactingpairs,wenext computed the contact enrichment in theC2.1 and C2.3 contact maps for each of the matching promoter-enhancer pairs. We observed significant contact enrichment in the ectoderm scHi-C cluster for ectoderm specific promoter-enhancer pairs, and mesoderm contact enrichment in the mesoderm pairs (Fig. 4D). We also implemented a direct statistical test for contact frequency around putative ectoderm and mesoderm hotspots (Supplementary Fig. 10C, D), which supported a similar observation. We note that the two methods differ in their normalization strategy and power, and their identified hits are only partly overlapping. When using Shaman com- parison, we detected 173 and 338 promoter-enhancer pairing with 3-way support for ecto- andmesoderm regulatory activity respectively A M1: Hes3 M2: Hand1 M3: Crabp2 M4: Crabp1, Lhx, Six M5: Hbb/a M14: Sox11 M8: Neurog1, Pou3f1 M9: Epcam, Pdlim1 M6: Cdh5 M13: Vim M10: Igf2, Dlk1 M11: Ankrd1 M15:Twist1 M12:T, Meox1 M7: Sox9, Ets, Ftl1 Ec to Ery E. M es o CM Endot S. Ecto. Mean Early Score Mean A Score 2 1 1.5 1.75 1.25 2 0 -2 0 2-2 C2.1 Early-score C2.1, C2.3 mean Early-score 0 2-2 2 0 -2 C2 .3 E ar ly -s co re C2 .2 E ar ly -s co re Igf2 148e6 149e6 150e6 151e6 86e6 87e6 88e6 89e6 Crabp2 0.40-0.4 0 4 8 12 0-1 1 0 2 4 D E G -0.6 -0.2 -0 .4 0. 0 Normalized C2.1 A score H N or m al ize d C2 .3 A sc or e 1. 8 co ve ra ge (% ) 2. 0 2. 2 Cell s-score 1.2 1.4 1.6 1.8 1. 6 Cell s-score 1.2 1.4 1.6 1.8 2. 0 2. 4 2. 8 F Umap 1 U m ap 2 Inferred cell s- score -2 20 2 -2 B C2.1 C2.3 C2.2 Umap 1 U m ap 2 20-2 4 0 -4 Difference in cells’ mean Early-score De ns ity Difference in cells’ mean A-score I J K L Chr7 Chr3 7 12 15 11 2 10 13 6 4 9 5 1 14 3 8 C2 .1 C2 .2 C2 .3 0.04 0.02 0 -0.02 -0.04 7 12 15 11 2 10 13 6 4 9 5 1 14 3 8 C2 .1 C2 .2 C2 .3 0.02 0.01 0 -0.01 -0.02 C2 .1 C2 .3 C2 .1 C2 .3 0. 5 De ns ity 1. 5 2. 5 1.2 1.6 C Inferred cell s-score C2.1 C2.3 C2.2 Normalized C2.1 A score 0. 0 -0.4 0.0 N or m al ize d C2 .3 A sc or e -0 .4 100500-50-100 Enrichment score Ge ne m od ul e Gene-gene correla�on 10.5-0.5 01- C2.1 early bins C2.3 early bins Original clusters (n=699) Extended clusters (n=898) M es o Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 7 (Supplementary Data 3), including many examples linked with key regulators of cell type specific transcriptional programs (See examples in Fig. 4E). These putative interactions should be interpreted carefully. First, while we believe comparisons using Shaman scores are more sensitive, these cannot be fully controlled statistically. Second,wenote that only 1.5% of the highest intensity (Shaman score difference > 40) differential ectoderm-mesoderm contacts were annotated within one of our enhancer-promoter pairs, illustrating that the complex con- formational landscape in these clusters involvesmanyuncharacterized contacts despite showing only weak compartment and TAD differences. Polycombmarkup and ectoderm specific long-range contacts in the Tbx3-5 locus Our 3-way analysis of regulated promoter-enhancer pairs suggested contact enrichment is positively linked with lineage-specific gene activation in most cases. It is however possible that contact enrich- ment will be associated with gene repression, as postulated previously for Polycomb domains34–36. We therefore screened for ectoderm/ mesoderm differential H3K27me3 loci (using hind-, mid- and fore- brain/heart and limb) with proximal anti-correlated promoter expression pattern (Supplementary Fig. 10E–G). This screen yielded several candidate locus pairs showing high H3K27me3 occupancy in correlation with proximal gene repression and low contact intensity (Supplementary Data 4), where most of these cases were of lower specificity than the positive interactions observed for activated genes. A reciprocal effect was detected in the Tbx3-Tbx5 locus, where Poly- comb marks and gene repression were associated with increased rather than decreased contact intensity. This locus codes for two transcription factors with sophisticated transcriptional control, where Tbx3 is expressed in the epiblast and most mesodermal tissues, and Tbx5 is specific to pharyngeal mesoderm and cardiomyocytes (Sup- plementary Fig. 10H). In the mesoderm cluster, consistently with pre- vious reports37, we observed two TAD structures (contacts over L1 and L2, Fig. 4F, Supplementary Fig. 10I) physically separating the two TFs. In the ectoderm, however, the near-complete repression of both genes is correlated with the emergence of a new Tbx3-Tbx5 contact (L3), and severe attenuation of the L1 contact. The internal structure at Tbx5 (L2.1) is unperturbed. While we have not observed other repressive chromatin structures of similar intensity, this example suggests that de-novo establishment of chromatin interactions may be facilitated in the context of either the Polycomb or some other uncharacterized repressive machinery. Gastrulation cell-type specific accessibility hotspots are inter- twined within TADs We reasoned that the linkage between extensive transcriptional diversification in gastrulation and the rather rudimentary observed chromosomal conformation diversity must involve the chromosomal and genomic distribution of active regulatory elements and pro- moters. Using single-cell ATAC/RNA-seqmultiomics data38, we derived clusters of cell type specific chromosome accessibility peaks with specific distributions over the key gastrulating cell types (Fig. 5A, Supplementary Fig. 10J). We then tested the A-score distribution of the loci in each cluster of peaks. Comparing ESC and embryo A-scores (Fig. 5B) we discovered stronger A-linkage in ESC for cluster 27, 37 and 38, which are enriched for accessibility in extraembryonic tissues and early gastrulation state (e.g. Epiblast). Comparing embryo and pEry A-scores (Fig. 5C) showed strong pEry A-linkage in clusters 8, 9 and 5, which represent erythrocyte or combined hematoendothelial peak specificity. Importantly, the extent of A-association differential for pEry clusters was significantly higher than that observed between ESCs and embryo cells. Comparison of mesoderm and ectoderm A-scores (Fig. 5D) showed several clusters with compatible A-score and acces- sibility preferences including clusters 69, 75 and 76 for the ectoderm, and clusters 95, 98, 99 and 117 for the mesoderm. This analysis also highlightedmore complex combinatorics such as the oneobserved for cluster 73 (accessible in both ectoderm and endoderm). The compartment association analysis of the ATAC peak clusters confirmed that we can observe strikingly cell-type specific accessibility hotspots in loci with very mild compartment association differences. Since chromosomal organization is observed at scales of at least 10 s of kilobases and TADs are typically organizing hundreds of kilobases into looped units, we reasoned that this effect could be explained if accessible hotspots with differential cell type activity were intertwined within large chromosomal units rather than demarcated into cell type specific domains. To test this idea, we computed log enrichment ratios for genomic proximity between clustered ATAC peaks. These values are positive if ATAC peaks from one cluster are more likely than expected by chance to be localized within 200 kb of peaks from another cluster in the same TAD. Negative values represent under- representation of pairs from the same cluster at <200 kb distance and within the same TAD. As shown in Fig. 5E, this analysis showed that peak clusters with activity in embryonic cell types but not extra- embryonic types (P2), or peak clusters with strong embryonic cell type specific accessibility (P3) are overall demarcated from constitutively accessible sites (P1) or loci that are active specifically in the extra embryonic or early embryonic states (P4). While there are additional proximity relationships within the embryonic peak clusters, the pri- mary organizational principle seems to package the thousands of regulatory elements driving gastrulation in relative proximity, while isolating them from pluripotency or constitutive regulatory elements. Discussion In order to characterize how chromosome conformations are reorga- nized immediately following gastrulation, we generated single cell Hi- C maps from more than 3000 mouse E9.5 embryo cells. We modeled the derived maps along two major axes: first, we aimed to account for the conformation changes occurring during the replication and mitotic cycle; second, we searched for clusters of conformations that canbe associatedwith the rich transcriptional landscape in the embryo at this stage. Separating these two simultaneous dynamics in the Fig. 3 | Ectoderm and mesoderm/endoderm scHi-C clusters in the embryo. A S-phase cells from the non-pEry cluster were identified and projectedon2Dusing UMAP analysis of their coverage in 1103 loci. Cells are color-coded by their s-score as inferred by our probabilistic model. B UMAP projection of the same cells as in (A), using features normalized given inferred s-score for each cell.CDistribution of inferred s-scores for the three non-pEry embryo clusters. D Average normalized coverage (early-score) for genomic bins in clusters C2.1 and C2.3. E Similar to (D), but comparing average C2.1 and C2.3 behavior to C2.2 behavior. F Genomic bins thatwere inferred to be early replicating (Methods) inC2.1 (left) orC2.3 (right)were pooled, and for each cell we plotted total coverage as a function of the inferred s-score. Cells are colored by their cluster (C2.1 – green, C2.3 – orange). GDistribution of the differencebetweenC2.1 cells andC2.3 cells in early-score (left) and A-score (right) for genomic bins classified as specific to C2.1 (green) or C2.3 (orange). Grey – all bins. H Average normalized A-score for the group of genomic bins specific to C2.1 (X) and C2.3 (Y) are depicted for color-coded cells in the three clusters C2.1- 3 (left). A Similar plot is shown for 898 cells that were not included in the set of 699 mid S-phase cells used for clustering (right). Gray lines mark the thresholds used for classification of the expanded C2.1 and C2.3 clusters. I Correlation heatmaps for 2353 gene expression profiles over the E9.0 metacell model. Gene module numbers and representative genes are shown on the right. S. ecto Surface ectoderm, CM cardiomyocyte, Endot Endothelium, E Meso extra- embryonic mesoderm. J The color-coded matrix represents the difference in average early-score per single cell cluster (columns) for the TSS loci in each gene module from I (rows).K Similar to (J), but showing difference in average A-score in each cluster. L Depicting the contact structure (color-coded Shaman map) in C2.1 (top) and C2.3 (bottom) cells around the Crabp2 and Igf2 TSSs. Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 8 embryo (or other tissues) remains a major analytical challenge. Iden- tification of cell types in cells approaching mitosis or exiting it is not realistic at this stage. But algorithms we introduce here can use robust changes in genome replication time to cluster mid S-phase cells and then derive contact matrix-based (in particular differential A-compartment association) signatures from S-phase clusters. Based on these signatures, cells from nearly all parts of the cell cycle can be classified into balancedmodels of cell types.Once a cluster structure is inferred, we can pool contacts from single cells into conformation maps and explore cluster-specific differential compartments, long- range contacts and putative promoter-enhancer interactions at high resolution. Bnip2,more Foxf2,q1 De ns ity Sox2 Twist1A C L3 L1 L2 Tbx5 6e6.0226e4.0226e0226e8.9116e6.0226e4.0226e0226e8.911 Tbx5Tbx3Tbx3 0. 4 0. 8 A- sc or e 0. 0 -5 0 50 Sh am an 34e6 35e6 Twist1 Hdac9Atxn7l1,.. 34e6 35e6 Sox2Fxr1 F Foxd2 Gata6 Msx2 Foxc1 Foxb1 Otx2 Rfx4 Pou3f2Rbbp8More.. Skint11,5,. Tal1,more.. Drd1a, moreRor2, more (Gmds) Rora, more Fbx14, morePeli2Ktn1 Exoc5, more Polr3b,more Cry1, more 0. 0 0. 4 0. 8 A- sc or e B E Enhancer-promoter distance 32K 128K 512K 0.1 0.3 0.5 Promoter Enhancers Ecto Ecto Ecto Meso Meso Ecto Meso Meso C2 .1 C2 .3 De ns ity 115e6114e6 54e653e6 70e669e6 85e684e6 12e611e6 32e631e6 50e649e6 23e622e6 D Ecto (C2.1) Meso (C2.3) L3 L1 L2 L2.1L2.1 CoordinateCoordinate 100 50 0 -50 -100 Enrichment score C2 .1 C2 .3 C2.1 C2.3 -5 0 50 Sh am an -5 0 50 Sh am an -5 0 50 Sh am an 7 13 7 13 7 13 7 13 7 13 7 13 7 13 7 13 H3 K4 m e1 C2 .1 C2 .3 H3 K4 m e1 C2 .1 C2 .3 H3 K4 m e1 C2 .1 C2 .3 H3 K4 m e1 C2 .1 C2 .3 C2.1 – C2.3 shaman diff -20 0 20 40 0. 00 0. 01 0. 02 0. 03 Ectoderm pairs Mesoderm pairs Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 9 Our analysis of Hi-C maps in mouse post-gastrulation highlights several aspects of the relationship between chromosome conforma- tion and embryonic differentiation. First, while the genome organiza- tion of ES cells compared to the embryo reflects changes in regulation of key pluripotency genes, the organization within the embryo is lar- gely homogeneous. This suggests that differences in chromosomal conformation between ES cells and E9.5 cells are greater than those between different cell types immediately after gastrulation and at the onset of organogenesis. The exceptions to this homogeneitywithin the embryo are the distinctively folded primitive erythrocytes. Ery- throcytes are unexpected positive controls for the ability to precisely detect a cell type specific conformation when it exists. The unique, compact and highly organized structure of pEry chromosomes cannot be explained by gene expression alone. In contrast to other differ- entiating embryonic tissues that continuously respond to signals from neighboring cells and tissue contexts, erythrocytes are fully com- mitted to their functional fate, which may explain their highly distinct (and potentially less plastic) conformation. It is unclear if the ery- throcyte chromosome condensation and enucleation program39 is related to the conformation we observe at E9.5, since definitive ery- throcytes only appear several days after. Similar effect could be expected in other terminally differentiated cell types, such as cardio- myocytes or endothelium. But our analysis could not detect a cardi- omyocyte conformation cluster, and the small cluster that we linked with endothelial programs could not be associated with a highly dis- tinctive conformation, but was clustered as part of the mesoderm state. It is possible that the reason for this is that these cell states are differentiating much later than pEry cells. Within the embryo-proper we detected two clusters that match broad ectoderm and mesoderm/endoderm genome regulatory pro- grams. The considerable transcriptional diversity within the meso- derm (and to a lesser extent the ectodermand endodermat E9) at this stage was correlated very weakly with conformation sub-clusters within these two clusters. Our current scHi-C data is limited in its depth and number of cells, in particular compared to scRNA-seq or scATAC modern datasets and our analysis suggests that sampling more embryonic cells may lead to characterization of additional sta- tistically significant conformation clusters. But subsampling and in- depth analysis show that such potential additional conformation clusters are unlikely to represent high intensity differential con- formation features (as those we detected for pEry cells). The differ- ences between the clusters in terms of replication time regime and compartment structure were small and we had to use sensitive algo- rithms to deconvolve them from the more apparent cell-cycle sig- nature. Interestingly, on the background of such a homogeneous conformation landscape we detected hundreds of lineage specific promoter-enhancer contacts that showed matching expression and epigenetic markup in the respective tissues. This argues for an important role for localized embryonic contacts within an initially homogeneous TAD and compartment structure in the embryo. However, the epigenetic stability of such local contacts and the existence of factors regulating them (in addition to the known TFs binding the relevant enhancers) are still unclear. Furthermore, only a small fractionof differential contacts couldbe explained byenhancer- promoter interactions. It also remains to be seen how specific loca- lized contacts and their higher order structures29,40,41 contribute to later emergence of broader contact structures, as previously observed in the brain and other tissues. Conversely, since we showed in the case of erythrocytes that chromosomes can in principle be reprogrammed quickly, it will be interesting to understand how conformation in the embryo remains relatively homogeneous despite the activity of specific gene regulatory program, which epigenetic factors may facilitate the maintenance of such a flexible conforma- tion, and whether this is linked with the retained developmental plasticity of most embryonic cells at this stage. Methods Experimental methods Cell extraction, fixation and permeabilization. Pregnant C57BL/6 mice were sacrificed at day 9.5 post-coitum and three embryos were dissected under a microscope, in accordance with the Babraham Institute Animal Welfare and Ethical Review Body. The yolk sack was mechanically removed from each embryo, leaving the embryo proper only, and the embryos’ morphology was validated to match that of a wildtype E9.5 embryo. To create single-cells suspension, each embryo was moved to a 1.5ml tube containing 200 µl of trypsin-EDTA (0.05% trypsin, 0.02% EDTA) and incubated at 37 °C for 5min. 800 µl of cold MEF medium was then added to each tube to inactivate the trypsin. To fix the cells, the cell suspensions of all three embryos were combined andMEFmedium at room temperature was added to a final volume of 21ml. 3ml of 16% formaldehyde were added (2% for- maldehyde final concentration) and the mixture was incubated for 10min at room temperature, followed by quenching with 127mM glycine for 5min on ice and washing with cold PBS + 0.001% BSA. Cells were then permeabilized in 10mM Tris-Cl pH 8, 10mM NaCl, 0.2% IGEPAL CA-630 and cOmplete EDTA-free protease inhibitor cocktail (Roche) for 30min on ice with intermittent agitation, and spun to collect a nuclei pellet. Single-cell Hi-C library preparation. scHi-C libraries were prepared in a fashion similar to the one previously described19. Briefly, the nuclei were washed with 1.24x NEBuffer 3 (New England Biolabs) and sus- pended in 400 µl of that buffer. 6 µl of 20% SDS and then 40 µl of 20% Triton X-100 were added to the suspension, with an incubation of 60min at 37 °C with constant agitation following the addition of each of these detergents. Next 50 µl of 25 U/µl MboI (New England Biolabs) was added and the suspension incubated at 37 °C overnight with constant agitation. To label the digested DNA ends, dCTP, dGTP, dTTP and biotin-14- dATP (Thermo fisher) were added to the suspension (final concentra- tion of 28.36 µM per nucleoside triphosphate) along with DNA poly- merase I, large (Klenow) fragment (New England Biolabs, final concentration 0.095U/µl) and the sample incubated at 37 °C for 60minutes with occasionalmixing. The sample was then spun and the supernatant partially removed, leaving a volume of 50 µl, followed by the addition of 100 µl 10x T4 DNA ligase reaction buffer (New England Biolabs), 10 µl 100x BSA (New England Biolabs), 10 µl of 1 U/µl T4 DNA ligase (Thermo Fisher) and water to a final volume of 1ml, and incu- bated at 16 °C overnight. Finally, the nuclei were filtered through a 30 µmcell strainer and single nuclei were sorted into individual empty wells in 384 well plates using an BD Influx cell sorter. The plates were sealed and stored at −80 °C until further processing. Fig. 4 | Three-way support for specific regulatory contacts. A, B Comparing A-score (top), contact maps, virtual-4C using Shaman scores, and H3K4me1 ChIP- seq (bottom) around the Sox2 and Twist1 loci. The genes, and for Twist1 also a nearby enhancer, are marked by vertical grey lines. C Shown are distributions of genomic distances between a TSS and the nearest putative enhancer classified according to the ectoderm/mesoderm lineage specificity of the two loci as deter- mined by gene expression (for the promoter) and ENCODE ChIP-seq (for the putative enhancer).DThedistributionof differentialC2.1 andC2.3 Shamanscore (X axis) on TSS-enhancers pairs with coordinated mesoderm or ectoderm specific activity. Shaman differences is computed only for contacts with positive scores in both C2.1 and C2.3. E Examples of virtual 4 C plots (top) and H3K4me1 ChIP-seq (bottom, C2.1 followed by C2.3) around 4 ectoderm and 4 mesoderm genes. Gray vertical lines mark the TSS and putative enhancer. Gene-free regions around regulatedgenes are highlightedbyhorizontal gray bars.FContact structure around the Tbx3-Tbx5 locus in the C2.1 and C2.3 clusters. Contacts discussed in the text are marked by dashed circles. Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 10 To prepare single-cell Hi-C libraries from single nuclei in plate wells, 2.5 µl of PBS was added to eachwell and the plate was sealed and incubated at 65 °C overnight. DNA was then tagmented using the Nextera XT kit (Illumina) by adding 5 µl of TDand 2.5 µl of ATMperwell and incubating at 55 °C for 5minutes, followed by cooling to 10 °C and adding of 2.5 µl of NT per well. Hi-C ligation junctions were then captured by Dynabeads M-280 streptavidin beads (Thermo Fisher; 10 µl of original suspension per well). Beads were prepared by washing with 1x BW buffer (5mM Tris-Cl pH 7.5, 0.5mM EDTA, 1M NaCl), resuspended in 4x BW buffer (20mM Tris-Cl pH 7.5, 2mM EDTA, 4M NaCl; 4 µl per sample), and then mixed with the 12.5 µl per-well sample and incubated at room temperature overnight with gentle agitation. Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 11 The beads were then washed four times with 40 µl of 1x BW buffer, washed twice with 40 µl of 10mM Tris-Cl pH 7.5 and resuspended in 12.5 µl of 10mMTris-Cl pH 7.5. Single-cell Hi-C libraries were amplified from the beads by adding 7.5 µl of Nextera PCR Master Mix, 2.5 µl of Index 1 primer and 2.5 µl of Index 2 primer (a different combination of index 1 and index 2 perwell) followedby 12 PCR cycles. The beadswere then magnetically removed and the supernatant from all 384 wells combined. The combined supernatant was purified using AMPure XP beads (Beckman Coulter; 0.6 times volume of the supernatant) according to the manufacturer’s instructions and resuspended in 100 µl of 10mM Tris-Cl pH 7.5. Finally, the sample was purified again using AMPure XP beads (1.0 times volume of supernatant) and resus- pended in 11 µl of 10mM Tris-Cl pH 7.5. Embryo dissection and collection of primitive erythrocytes. Preg- nant females were anesthetized with isoflurane using the open-drop system, followed by decapitation in accordance with a protocol approved by the Florida State University Animal Care and Use Com- mittee (ACUC). Uterine horns were removed, rinsed in room tem- perature PBS and embryos were isolated and transferred to a droplet of DMEM-high glucose, 10% FBS, 2mM L-glutamine, 1X MEM-Eagle non-essential amino acids and 12 ug/mL heparin (Sigma #H31493). Placenta and extraembryonic tissues were removed, embryos were decapitated and circulating peripheral blood was allowed to flow into the droplet of the room temperature media from the severed vitelline and umbilical veins. Media was collected, pooled, and brought to a volumeof 21.875mLwith room temperaturemedia. Cells werefixedby adding a final concentration of 2% paraformaldehyde for ten minutes. Fixation was quenched by bringing the solution to a final concentra- tion of 0.127M glycine, then incubating on ice for 5min. Cells were pelleted, washed with PBS and pelleted. Cells were flash-frozen and kept at −80C. Cells were thawed and stained for CD71 and TER119. Cells were first blocked with 1mL of PBS-FT (5% FBS, 0.1% Tween-20) for 1 h, then stained with 1:200 anti-CD71-PE (Invitrogen, 12-0711-82) and 1:200 anti-TER119-APC (Invitrogen, 17-5921-82) for 2 h at room temperature. Cells were washed and resuspended in 500 uL PBS-F (2% FBS) and Hoechst (15 ug/mL) and subjected to FACS by Aria (BD Biosciences). Primitive erythrocytes (CD71+, TER119+) were collected and pooled into a 50mL falcon for scHi-C processing following the established protocol (Nagano et al., 2017). MARS-seq. MARS-seq on E9.0 embryos was performed as previously described42 sorting 15 plates from 2 129S4/SvJae embryos and sequencing a total of 5760 cells, out of which we retained for analysis 4781 cells with at least 1000 unique molecular identifiers (UMIs) each (median coverage 4574 UMIs). The experiment was performed in accordancewith the institutional animal care and use guidelines of the Weizmann Institute of Science. Sequencing and basic computational analysis scHi-C sequence processing, quality control and cell cycle phas- ing. We processed the scHi-C data as described previously19. Briefly, paired-end reads were demultiplexed to single cells using cell specific barcodes. Reads were broken to segments using matches to MboI recognition site (GATC), and segments were mapped to the genome using Bowtie2. Duplicate contacts were discarded. We next performed quality control (QC) on each single cell. Cells were filtered based on their coverage (total number of reads), fraction of non-digested contacts, maximal chromosomal coverage aberration, and the contact distance bin with highest number of contacts. To partition cells into different phases of the cell cycle, and order the cells within the phases, we calculated for each cell the fraction of “near” reads (with distance <2Mb), the fraction of mitotic reads (with distance 2–12Mb),mean contact distance for distances at least 4.5Mb, and the fraction of contacts from a predefined set of early replicating regions. These statistics were used to phase cells into post-mitotic, G1, early to mid-S, mid-S to G2 and pre-mitotic phases, and to order cells within each phase. We note that this approach to phasing was only used as a preliminary stage for the algorithms described below. Metacell analysis. We applied the Metacell algorithm27 to organize E9 single cell profiles in 77 metacells (excluding 69 outlier cells), that we summarized into quantitative expression profiles and visualized as previously described27. We also downloaded published single cell profiles from the mouse gastrulation atlas21 and generated 1306 atlas metacells on 110,291 QC-positive cells. Atlas metacells were annotated bymajority voting on the published annotations of their cells, defining for each metacell m, the function atlas.type (m). Each atlas metacell i defined a gene expression distribution eatlasgi over the set of the 2237 feature genes g used while constructing the metacell graph. For annotation of the E9 map, we identified for each E9 single cell profile the atlas metacell with maximal correlations ann= atlas:typeðargmaxi½corðlogðug + 1Þ, logðϵ+ eatlasgi Þ�Þ, where ug is the UMI vector for the E9 cell and ϵ= 10�5 is a regularization factor.We then annotated each E9 metacell with the atlas annotation atlas.type that was linked with most of its cells. Definitions and derivation of the strict early and strict late genomic subsets. We partitioned the genome into bins of size 200 kb (or 40 kb, depending on application) and counted scHi-C coverage per bin and cell in a matrix. We performed down-sampling of the scHi-C data such that each cell has 75k contacts and defined: DSN=dsni j as the number of contacts that map to genomic bin j in cell i after downsampling. We next identified strict-early and strict-late genomics bins. This was done by clustering the genomic bins j using the vectors dsni j into 4 groups using hierarchical clustering. The two clusters showing the highest and lowest coverage were shown to represent the previously observed19 A andBcompartment structures respectively. These clusters behaved consistently (e.g. showenrichment (for A) and anti-enrichment (for B) in S-phase cells) between the pool of embryo and ESC cells. We will denote that derived genomic bins subsets earlystrict and latestrict. We defined the early/late ratio of a cell as: eli = log2 P j2earlystrict dsn i jP j2latestrict dsn i j 0 @ 1 A Fig. 5 | Gastrulation accessibility hotspots are chromosomally intertwined. A Bottom panel shows the accessibility of peaks (rows) over metacells (columns) (log2 the number of normalized ATAC-seq reads). Shown are loci from select clusters highlighted in the text. Top panel depicts gene expression of correlated TFs over the same metacells, provided in order to link accessibility clusters with specific cell types. B For each cluster of ATAC peaks we computed the fraction of loci with A-compartment score difference larger than 0.1 when comparing ESC and Embryo pooled Hi-C. Clusters with over 0.08 of the loci showing A-score enrich- ment in ESCs are colored black. C Similar to (B), but comparing embryo and pEry pooled Hi-C maps. D Similar to (B), but comparing the embryonic clusters C2.1 (ectoderm) and C2.3 (mesoderm). E Left panel is showing mean normalized accessibility for ATAC peak clusters (row) and metacells (column). Right panel is showing for each pair of peak clusters the enrichment of intra-TAD proximity (number of pairs of peaks in the same TAD and within 200 kb of each other). Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 12 and classified mid S-phase cells as: KS = fi s:t: eli > 1:8g: Knon�S was defined as all other cells. A-score and early-score for genomic bins. For each genomic bin j and each cell i, we count the number of long range intra-chromosomal contacts (>1Mb) observed between fragment ends in the bins and fragment ends in the earlystrict and latestrict genome compartments, defining count vectors cAi j and cBi j . The A-score of a genomic bin is determined given a set C of scHi-C profiles (possibly all) as: scoreAC j = X i2Cf g cAi j= X i2Cf g cAi j + X i2Cf g cBi j " # : The early-replication score (shortened early-score) of a bin is computed given a groupCof cells (typically all or part of cells classified as S-phase) by comparing the relative coverage of the bin in C to its relative coverage in G1 cells: scoreECj = log ð∣G1∣=∣C∣Þ X i2Cf g dsni j= X i2G1f g dsni j 0 @ 1 A Mapping gene expression to genomic bins and scHi-C clusters. We used UCSC gene annotation to determine for each gene (as defined by the MARS-seq or 10X pipeline) a transcription start site (TSS) coordi- nate. Gene expression profiles were generated as the fraction of UMI per gene observed in scRNA-seq metacells or group of metacells27. Given anexpressionprofile eg wedefined aprofile over genomicbins ej by taking the maximal expression of all genes mapping to TSSs on the bin j. Tomatch expressionprofiles and scHi-C clusters, we used clusters of TSSs showing coordinated or enriched expression to compute meanðscoreAC fj2TSSbinsgÞ or meanðscoreECfj2TSSbinsgÞ for each scHi-C clus- ter C. TSSbins sets were generated in two ways. First, given a metacell model, we normalized expression (log transforming and subtracting the mean over all metacell profiles), and selected the top 50 enriched TSSs that had enrichment value larger than 0.5. These TSS sets were used to compute the matching between A and early scores and the erythrocyte scHi-C cluster in Fig. 1. Second, we clustered genes based on their metacell log2 UMI enrichment profiles (Fig. 3I), generating clusters that were curated manually and derived TSSbins sets from them for analysis of A-score and early score differences (Fig. 3J, K). Hi-C contact matrices analysis Shamananalysis. To calculate enrichment of genome contacts in aHi- C contact matrix, and to visualize chromosomal conformations, we used the Shaman algorithm24,28. We pooled all cells in each cell cluster, and down-sampled the contacts to the same number in each cell cluster pool. We then applied Shaman to the down-sampled contact pools. Briefly, Shaman shuffles contacts while maintaining the mar- ginal coverage distribution and the contact distance distribution, creating a random shuffled contact matrix. The enrichment of a con- tact is then scored using a KS statistic on the k-nearest neighbors of that contact in the original down-sampled contactmatrix and shuffled contact matrix. The Shaman results we report here were derived using an improved MCMC sampler that provide better convergence (in particular onmatrices with a smaller number of contacts). In short, the algorithm uses efficient data structures to compute precisely the MCMC update rule. This approach is replacing the previously used strategy of adaptive calibration of a correction term for the function assigning probability for contacting at any genomic distance. Insulation. We calculated insulation as described previously24,29. For a genomic locus, we counted the number of contacts where one contact is up to 200 kb upstream of the coordinate and the other up to 200 kb downstream. We next counted the number of contacts where both contacts are in distance up to 200 kb from the coordinate. The log ratio between these two numbers is the insulation score. We per- formed this calculation genome wide in 40 kb jumps. Virtual 4C. To calculate the 4C trace at a specific genome coordinate x, we looked at all contacts which satisfy either of the next conditions: a. One of the fragment ends is at distance <3e3 from x, and the distance between the fragment ends is <1e5. b. One of the fragment ends is at distance <1e4 from x, and the distance between the fragment ends is between 1e5 and 5e5. c. One of the fragment ends is at distance <3e4 from x, and the distance between the fragment ends is between 5e5 and 1e6. To screen for conformation differences for two scHi-C clusters in a set of target loci, we calculated the difference between their virtual 4 Cs. We partitioned the 4C trace to bins based on contact distance (2.5e4, 5e4, 1e5, 2e5, 3e5, 5e5, 7.5e5, 1e6, onboth 3’ and 5’).Weaveraged the Shaman scores within every bin, and defined the distance of the conformations for two clusters as the maximum difference (in abso- lute value) over all bins. Parameters and specific figure panel analysis ClusteringESC, Embryos anderythrocytes (Fig. 1).Weprocessed the scHi-C data, and performed QC and cell cycle phasing as described above. For generating clusters in Fig. 1, we used S-phase seeding (Supplementary Methods), with the following inputs: KS,K non�S , DSN, and the matrices cAi j and cBi j . To identify primitive erythrocytes, we identified a bin cluster (of the 11 A-score-based bin clusters, see Supplementary Methods) that had high C3-specific A-score, and similarly a bin cluster with low C3- specific A-score.We calculated the pooledA-score of eachof these two bin clusters in each single cell (denoted cell Ai m in the Supplementary Methods), and used a linear separator to classify cells based on these two scores as either C3 or non-C3 (Supplementary Fig. 4D). Embryo cells that were not classified as C3 were assigned to C2, and ESC cells to C1. We generated the genomic bin expression value, A-score and early-score in ESC and non-pEry embryo as described above, in 40 kb resolution. We defined genomic bins with at least 4-fold change in expression as ESC- and embryo-induced. Similarly, we defined embryo A-specific bins and ESC A-specific bins as having at least 0.2 difference in A-score. We screened for genes with different Shaman score in ESC and embryo using comparisons of virtual 4Cs as described above. We similarly calculated differences in Shaman scores for embryo and ESC A-specific bins (Supplementary Fig. 4A), but looking at the 4C profile of each bin only up to 500 kb upstream and downstream. Erythrocyte analysis (Fig. 2). As before, we generated the genomicbin expression value, A-score and early-score in pEry and non-pEry in 40 kb resolution. We defined genomic bins with at least 4-fold change in expression as pEry- and non-pEry-induced. We also identified bins that were not expressed in either of the clusters. To create Fig. 2J we identified 40 kb genomic bins with A-score that is at least 0.35 higher in Erys compared to embryo. We merged adjacent bins meeting this criterion, and for every set of merged bins found the bin with highest difference in A-score between Erys and embryo. For every such bin, we looked at the average A-score of its 3’ and 5’bins up to 400 kb.We reversedA-score traces (mirroring 3’/5’) to create a matrix in which for all rows, the upstream 5’ A-score is higher. Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 13 We concatenated the A-score trace in pErys and non-pErys, and clus- tered the concatenated traces using kmeans into 8 clusters. We per- formed a similar analysis when taking genomicbinswith A-score that is at least 0.35 higher in embryo compared to Erys (Supplemen- tary Fig. 4H). To create Fig. 2L, we partitioned the contact matrix into 20 kb × 20 kbbins, and created for eachof the loci in Fig. 2J amatrix of average Shaman scores in the 1Mb around it (on both sides).We then averaged the scores in such matrices for the loci in each of the Fig. 2J clusters. Gata1 and Tal1 ChIP-seq. We scored and normalized 20bp bins for theirGata1 and Tal1ChIP-seq score using data from ENCODE.We used ChIP-seq scores as previously described, computing ChIP coverage percentiles p for each bin, and defined the score as −log2(1 − p). We defined Gata1 and Tal1 binding sites as those having score > 8. For 40 kb genomic bins we computed a binding score as the maximum ChIP-seq score of all binding sites contained in it. Clustering the embryo proper (Fig. 3). We applied the replication trend mixture model (Supplementary Methods) on embryo scHi-C profiles classified as non-pEry, non-G1 and non-M as described above. We further selected cells with sufficient coverage (at least 8 contacts per 200 kbbinon average), andmid-Sphaseclassification as inNagano et al. 201719. We set nij as the number of contacts in cell i that map to genomic bin j and excluded the X chromosome, or any bin with mean coverage <8. To set pj, we calculated the fraction of contacts that mapped to each genomic bin across all G1 cells that are not erythrocytes. To initialize the model we clustered cells hierarchically using distances based on correlations between rows in a normalized nij matrix. Normalization provided initial heuristic correction to the cell cycle effect by ordering cells according to their scHi-C fraction of short range contacts and subtracting for each locus the runningmean (using a window of 20 cells). Given this clustering solution, we initialized E zik � � such that each cell belongs to its cluster with probability 0.5, and to all other clusters with equal probability. In case k =2, each cell belongs to its cluster with probability 2 3. To initialize si we ordered the cells by the fraction of short-range contacts theymake, and assigned them values between 1.2 and 1.8 according to their order, assuming that all parts of the repli- cation program are equally represented in the data. We performed cross validation on the hyperparameters, as described in the Supplementary Methods, and selected L= 12, R= 11, λ=40. To generate UMAP projections of mid-S phase cells, we normal- ized nij coverage by G1 mean coverage, selected bins with high var- iance to mean ratio, and calculated a cell-cell correlation matrix using these values. We then used the R package umap with default para- meters (and random seed = 42). We repeated this analysis using data normalized based on inferred s-score (see Supplementary Methods). Plotting replication trends for early replicating bins. To plot Fig. 3F, we identifiedbins that are in replication regime 2 (out of 12) in C2.1 (left plot; C2.3 for the right plot), and are in replication regime ≥4 in all other clusters, and for every cell calculated the total fraction of con- tacts from these bins. Executing other scHi-C clustering algorithms. We executed schicluster and scHi-C topic modeling. We ran schicluster and topic modeling with resolutions 1Mbp and 0.5Mbp respectively, as per- formed in the publications of these methods. Cluster annotation. To annotate the C2.1, C2.2 and C2.3 clusters, we used 15 TSS bin sets G1,::G15 derived from the E9 metacell model data as described above. To account for possible differences in the s-score distribution in eachcell cluster,we orderedgenomic bins j 2 S m Gm by theirmean early score across clusters, computed for each bin scoreECk j and subtracted from it the running mean using a window of 200 bins, defining scoreE0j Ck . We then computed meanðscoreE0 j2Gmf g Ck Þ, and normalized rows to create the matrix shown in Fig. 3J. A similar nor- malization strategy was used with A-scores to derive the matrix in 3 K. We repeated the same analysis for the gastrulation atlas metacell model, using 20 gene modules. We note that in order to test possible functional association of the C2.2 cluster, in this analysis we only used 42%of its cells showing a stronger correlation structure. Similar results were obtained using the entire C2.2 cluster. To compareC2.1 andC2.3 to E14.5 data24,33, we computedA-scores for genomic bins of length 40Kbp in four samples: C2.1 cells, C2.3 cells, E14.5 HSCs and E14.5 NPCs. To compute these scores, we used the strict-early and strict-late genomic bins that we used to calculate A-scores previously. Because of the large difference in depth between our data and the E14.5 data, we downsampled the contacts of each genomic bin such that the total number of strict-early and strict-late contacts a genomicbinmakes is the same in the four different samples. The downsampled contacts were used to calculate the A-scores. To estimate our assay’s sensitivity, we sampled 100, 75, 50 and 25 cells from cluster C2.1, and applied the replication mixture model to a dataset including this subset with all cells from C2.2 and C2.3. We performed a similar analysis for C2.3. To search for additional sub-structure in cluster C2.1 and C2.3 (Supplementary Fig. 9) we applied hierarchical clustering to cell-cell correlations derived using s-score normalized copy number profiles. We partitioned C2.1 into 3 subclusters, and C2.3 into 9 subclusters. To annotate the C2.1 subclusters, we correlated their A-score and cover- age fold changes with differential gene expression of ectodermal cell types. To calculate the gene expression profile of a cell type, we cal- culated the average log2 expression of each gene across all the cell type’s metacells in the E9 scRNA-cell data. We then subtracted from each gene its mean expression across ectodermal cell types. This gave each gene its differential expression across all ectodermal cell types. To calculate a genomic bin’s A-score fold change in a subcluster, we calculated the A-score by pooling all contacts from the subcluster’s cells, and the A-score by pooling all contacts from the other sub- clusters’ cells. The bin’s A-score fold change is then the log2of the ratio between these A-scores. To calculate the coverage fold change, we calculated the relative coverage in the pool of the subcluster’s cells as described above, and the relative coverage in the pool of other sub- clusters’ cells, and took the log2 of their ratio. We then only selected genes with at least 2-fold change in gene expression in some cell type, and correlated their relative expression with the A-score and coverage fold changes of the bins containing these genes. Both the A-score and coverage were calculated for genomic bins of size 200 kb. To annotate the C2.3 subclusters we did a similar analysis, but used only genes with at least 4-fold change in gene expression in some cell type. Screening for differential ecto/meso contacts (Fig. 4). We identified enhancers using Chip-seq ENCODE data from ectoderm (forebrain, midbrain, hindbrain) and mesoderm (heart and limb) tissues43. We calculated the ChIP-seq scores (log2(1-percentile)) in 20 bp resolution for each of the 5 tissues. We called enhancers as contiguous genomic intervals (or peaks) showing H3k4me1 scores > 7 (that is, the top 1/128 bins). We scored each peak H3k4me1 occupancy in mesoderm (max- imumbetween the values of the two tissues) and ectoderm (maximum among the values of the three tissues). To define mesoderm and ectoderm specific peaks we required a score of at least 9 in one set of tissues and a difference of at least 3 between the scores of the two tissue sets. Overall this approached generated 24059 and 9506 meso and ecto- specific enhancers respectively. To identify meso- and ecto- specifically expressed genes (and TSSs), we identified three metacells representing the mesoderm tran- scriptional state and three others representing the ectoderm state. We Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 14 computed the maximum expression per gene in each set. 826 genes were showing at least two-fold differencebetween the twoprofiles, and their TSSs were considered for enhancer associations below. To create a set of potential promoter-enhancer interactions, we linked each enhancer peak with its closest TSSs. We searched for such TSSs 50k-500k upstream and downstream of the enhancer locus (so up to two genes were linked with each peak). We did not use pairings spanning less than 50kb since scHi-C resolution at such distances is limited. We also note that our pairing heuristic is by no means exhaustive, and is meant only to generate a shortlist of putative pairs. Finally, we identified mesoderm and ectoderm specific enhancer- promoter candidate pairs as those involving a differential enhancer and a specifically expressed gene according to the definitions above. We defined the Shaman score of a putative enhancer-promoter inter- action as the score of the contact between coordinates closest to the enhancer and promoter (using Euclidean distances)), and computed it for both meso and ecto. We selected pairs with Shaman score at least 15 higher in the expected cell cluster, or with an absolute shaman score at least 15 if the other score is negative, as having three way support. To derive a p-value for the number of contacts between an enhancer and a promoter, we counted the number of contacts in the pooled ectodermandmesodermcells in a 50 kbwindow (25 kb to each direction) around the enhancer-promoter in the contact matrix. Denote these numbers for an enhancer-promoter pair ep by ectoep and mesoep. We similarly counted the number of contacts for all other enhancers and their associated TSSs. To test whether an enhancer- promoter pair had a high number of contacts in ectoderm, we calcu- lated ectoep � mesoep, and compared it to the empirical distribution of ectoe’p’ � mesoe’p’ for all background enhancer promoter pairs e’p’ that had the sameectoe’p’ + mesoe’p’ value as ep. To increasepower, for Supplementary Fig. 10C, D we only looked at enhancer-promoter pairs for which ectoep + mesoep ≥80. For the background distribution to be accurate, we only considered ectoep + mesoep values with more than 100 other enhancer-promoter pairs with similar ectoe’p’ + mesoe’p’ value. We performed a similar analysis for mesoderm. We performed a similar analysis to identify H3k27me3-gene pairs. Genes were selected similarly. H3k27me3 regions were selected as those with ChIP-seq score > 7. We designated ecto- or mesoderm specific regions as those having ChIP-seq score > 3 higher than the other tissue. The selected pairs are those where the gene is lowly expressed and the H3k27me3 signal is higher. To find all hotspots with support for different chromosomal conformation between ectoderm and mesoderm, we looked only at contacts in distance 1e4 to 1e6. We compared the Shaman score of every meso contact to the Shaman score of its nearest ecto contact. We detected regions with high Shaman difference iteratively. In each iteration we identified the contact with the maximal Shaman difference between meso and ecto, and removed all contacts where both their ends are in distance <5e4 from the maximal-difference contact. We continued with this process until no contact with Sha- man difference >40 remained. This resulted in 5200 hits that we used in order to estimate the fraction of differential contacts explicable by known three-way supported promoter-enhancer pairing in the text. Analysis of multiome-data and integration with pooled Hi-C clus- ters. We used scRNA-seq and scATAC-seq profiles from a recent paper by the Reik group38 to generate the analysis in Fig. 5, applying the following steps: 1. Using metacell-244 with default parameters and target metacell size of 320K UMIs to organize scRNA-seq profiles into 1404 metacells. 2. Using the RNA-based grouping of cells to collect single-cell ATAC reads and create a genomic track for each metacell. 3. Finding all genomic intervals with ATAC-coverage (total over all metacells) larger than 300 and identifying the maximal coverage 300bp within each such interval as a peak. Overall this provided us with 94,600 peaks. 4. Grouping RNA metacells into 300 clusters using hierarchical clustering of the RNA signatures. RNA clusters were associated with cell type by comparison to gastrulation manifolds and TF expression profiles. 5. We then pooled ATAC reads over the clusters and extracted the reads within identified peaks into an accessibility count matrix. We removed cell clusters supported by fewer than 82 K ATAC reads, retaining for analysis 285 clusters. 6. Normalizing peak ATAC coverage in each cluster by normalizing (dividing by total reads for the cluster) and transforming the frequencies p to log2(1e-5+p). 7. Running kmeans++ with a large number of clusters (K = 120) over the normalized accessibility profiles. Deriving mean peak cluster profile by averaging the log normalized ATAC values. 8. Filtering peak clusters with less than 100 peaks (only 1 case). Annotating peak clusters as variable whenever at least four metacell clusters showed mean ATAC value smaller than −16 and the difference between minimum and maximum value over the cluster was larger than 0.7. All other clusters were considered constitutive. 9. Computing the A-score of each peak in ESC, Embryo, pEry, C2.1 and C2.3. Analysis of the A-score distributions in each cluster is used to generate Fig. 5B–D. 10. Identifying all pairs of peaks within less than 200 kb genomic distance. Summarizing the number of such pairs between ele- ments of each pair of clusters into a matrix of observed “proxi- mities”. Multiplying each element in thematrix by the total matrix counts divided by the product of its row and column total counts. Log transforming the resulted enrichment ratio, followed by hierarchical clustering of the submatrices defined by the con- stitutive and variable peak clusters (separately) in order to gen- erate the heat map of Fig. 5E. Reporting summary Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. Data availability The data that support this study are available from the corresponding authors upon reasonable request. The scHi-C and scRNA-seq data generated in this study have been deposited in the GEO database under accession code GSE148793. The ESC scHi-C data used in this study are available in the GEO database under accession code GSE94489. The previously published embryo gastrulation scRNA-seq data used in this study are available in theArrayExpressdatabaseunder accession code E-MTAB-6967. The scRNA/scATACmultiomedata used in this study are available in the GEO database under accession code GSE205117. The neural progenitor cells’Hi-Cdata used in this study are available in the GEO database under accession code GSE96107. The hematopoietic Hi-C data used in this study are available in the GEO database under accession code GSE119201. Code availability All code supporting the analysis of this work is available in Github: https://github.com/tanaylab/scHiC_embryo. References 1. Bonev, B. &Cavalli, G.Organization and function of the 3Dgenome. Nat. Rev. Genet. 17, 661–678 (2016). 2. Kim, S. & Shendure, J. Mechanisms of interplay between tran- scription factors and the 3D genome.Mol. Cell 76, 306–319 (2019). Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 15 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE148793 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94489 https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-6967?query=E-MTAB-6967 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE205117 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96107 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE119201 https://github.com/tanaylab/scHiC_embryo 3. Marchal, C., Sima, J. & Gilbert, D. M. Control of DNA replication timing in the3Dgenome.Nat.Rev.Mol.Cell Biol.20, 721–737 (2019). 4. Despang, A. et al. Functional dissection of the Sox9-Kcnj2 locus identifies nonessential and instructive roles of TAD architecture. Nat. Genet. 51, 1263–1271 (2019). 5. Symmons,O. et al. The shh topological domain facilitates the action of remote enhancers by reducing the effects of genomic distances. Dev. Cell 39, 529–543 (2016). 6. Rodríguez-Carballo, E. et al. The HoxD cluster is a dynamic and resilient TAD boundary controlling the segregation of antagonistic regulatory landscapes. Genes Dev. 31, 2264–2281 (2017). 7. Bintu, B. et al. Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science 362, eaau1783 (2018). 8. Kempfer, R. & Pombo, A. Methods for mapping 3D chromosome architecture. Nat. Rev. Genet. https://doi.org/10.1038/s41576-019- 0195-2. (2019). 9. Monahan, K., Horta, A. & Lomvardas, S. LHX2- and LDB1-mediated trans interactions regulate olfactory receptor choice. Nature 565, 448–453 (2019). 10. Mirny, L. A., Imakaev, M. & Abdennur, N. Twomajor mechanisms of chromosome organization.Curr. Opin. Cell Biol. 58, 142–152 (2019). 11. Beagan, J. A. & Phillips-Cremins, J. E. On the existence and func- tionality of topologically associating domains. Nat. Genet. 52, 8–16 (2020). 12. Sexton, T. et al. Three-dimensional folding and functional organi- zation principles of the Drosophila genome. Cell 148, 458–472 (2012). 13. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012). 14. Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012). 15. Pope, B. D. et al. Topologically associating domains are stable units of replication-timing regulation. Nature 515, 402–405 (2014). 16. Dileep, V. & Gilbert, D. M. Single-cell replication profiling to mea- sure stochastic variation in mammalian replication timing. Nat. Commun. 9, 427 (2018). 17. Naumova, N. et al. Organization of the mitotic chromosome. Sci- ence 342, 948–953 (2013). 18. Zhang, H. et al. Chromatin structure dynamics during the mitosis- to-G1 phase transition. Nature 576, 158–162 (2019). 19. Nagano, T. et al. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature 547, 61–67 (2017). 20. Ramani, V. et al. Massively multiplex single-cell Hi-C. Nat. Methods 14, 263–266 (2017). 21. Pijuan-Sala, B. et al. A single-cell molecular map of mouse gas- trulation and early organogenesis. Nature 566, 490–495 (2019). 22. Nowotschin, S. et al. The emergent landscape of the mouse gut endoderm at single-cell resolution. Nature 569, 361–367 (2019). 23. Dixon, J. R. et al. Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331–336 (2015). 24. Bonev, B. et al.Multiscale 3Dgenome rewiringduringmouse neural development. Cell 171, 557–572.e24 (2017). 25. Schmitt, A. D. et al. A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Rep. 17, 2042–2059 (2016). 26. Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene pro- moters. Cell 167, 1369–1384.e19 (2016). 27. Baran, Y. et al. MetaCell: analysis of single-cell RNA-seq data using K-nn graph partitions. Genome Biol. 20, 206 (2019). 28. Cohen, N. M. et al. SHAMAN: bin-free randomization, normalization and screening of Hi-C matrices. bioRxiv 187203–187203. https:// doi.org/10.1101/187203 (2017). 29. Olivares-Chauvet, P. et al. Capturing pairwise and multi-way chro- mosomal conformations using chromosomal walks. Nature 540, 296–300 (2016). 30. Guo, Y. et al. Chromatin jets define the properties of cohesin-driven in vivo loop extrusion. Mol. Cell 82, 3769–3780.e5 (2022). 31. Zhou, J. et al. Robust single-cell Hi-C clustering by convolution- and random-walk–based imputation. Proc. Natl Acad. Sci. 116, 14011–14018 (2019). 32. Kim, H.-J. et al. Capturing cell type-specific chromatin compart- ment patterns by applying topic modeling to single-cell Hi-C data. PLoS Comput Biol. 16, e1008173 (2020). 33. Chen, C. et al. Spatial genome re-organization between fetal and adult hematopoietic stem cells. Cell Rep. 29, 4200–4211.e7 (2019). 34. Du, Z. et al. Polycomb group proteins regulate chromatin archi- tecture in mouse oocytes and early embryos. Mol. Cell https://doi. org/10.1016/j.molcel.2019.11.011.(2019). 35. Loubiere, V., Martinez, A.-M. & Cavalli, G. Cell fate and develop- mental regulation dynamics by polycomb proteins and 3D genome architecture. Bioessays 41, e1800222 (2019). 36. Schoenfelder, S. et al. Polycomb repressive complex PRC1 spatially constrains themouse embryonic stem cell genome.Nat. Genet.47, 1179–1186 (2015). 37. van Weerd, J. H. et al. A large permissive regulatory domain exclusively controls Tbx3 expression in the cardiac conduction system. Circ. Res. 115, 432–441 (2014). 38. Argelaguet, R. et al. Decodinggene regulation in themouseembryo using single-cell multi-omics. 2022.06.15.496239. Preprint at https://doi.org/10.1101/2022.06.15.496239 (2022). 39. Ji, P. New insights into the mechanisms of mammalian erythroid chromatin condensation and enucleation. Int Rev. Cell Mol. Biol. 316, 159–182 (2015). 40. Beagrie, R. A. et al. Complex multi-enhancer contacts captured by genome architecture mapping. Nature 543, 519–524 (2017). 41. Allahyar, A. et al. Enhancer hubs and loop collisions identified from single-allele topologies. Nat. Genet. 50, 1151–1160 (2018). 42. Keren-Shaul, H. et al. MARS-seq2.0: an experimental and analytical pipeline for indexed sorting combined with single-cell RNA sequencing. Nat. Protoc. 14, 1841–1862 (2019). 43. Gorkin, D. U. et al. Systematic mapping of chromatin state land- scapes during mouse development. https://doi.org/10.1101/ 166652 (2017). 44. Ben-Kiki, O., Bercovich, A., Lifshitz, A. & Tanay, A. Metacell-2: a divide-and-conquer metacell algorithm for scalable scRNA-seq analysis. Genome Biol. 23, 100 (2022). Acknowledgements This work is dedicated to the memory of Elad Chomsky. We thank members of the Tanay lab for discussion. Work in A.T. group was sup- ported by the NIH 4DN nucleomic tools program and by the European ResearchCouncil grant (scAssembly). Work in P.F. groupwas supported by the NIH 4DN Nucleomic tools program. Work in R.S. group was supported by the German-Israeli Project DFG RE 4193/1-1, ISF (grant No. 1339/18), Len Blavatnik and the Blavatnik Family foundation and ISF (grant No. 3165/19) within the Israel Precision Medicine Partnership program. N.R. was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics, Tel Aviv University, and by the Planning and Budgeting Committee (PBC) fellowship for excellent PhD students in Data Sciences. The contribution of N.R. is part of Ph.D. thesis research conducted at Tel Aviv University. T.N. was supported by MEXT/JSPS KAKENHI (JP18H02374 and JP19H05744) and a grant from the Takeda Science Foundation. Author contributions A.T., E.C. and P.T. conceived and designed the project. E.C., T.N., C.S., W.L. and Z.M. executed all wet lab-based experiments. N.R., E.C., Y.L., Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 16 https://doi.org/10.1038/s41576-019-0195-2 https://doi.org/10.1038/s41576-019-0195-2 https://doi.org/10.1101/187203 https://doi.org/10.1101/187203 https://doi.org/10.1016/j.molcel.2019.11.011 https://doi.org/10.1016/j.molcel.2019.11.011 https://doi.org/10.1101/2022.06.15.496239 https://doi.org/10.1101/166652 https://doi.org/10.1101/166652 Y.B., A.L. and A.T. conducted all computational analysis. R.S., A.T. and P.T. supervised the project. N.R. and A.T. wrote the manuscript with input from all authors. Competing interests The authors declare no competing interests. Additional information Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-023-39549-4. Correspondence and requests for materials should be addressed to Peter Fraser or Amos Tanay. Peer review information Nature Communications thanks the anon- ymous reviewers for their contribution to the peer review of this work. A peer review file is available. Reprints and permissions information is available at http://www.nature.com/reprints Publisher’s note Springer Nature remains neutral with regard to jur- isdictional claims in published maps and institutional affiliations. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/ licenses/by/4.0/. © The Author(s) 2023 Article https://doi.org/10.1038/s41467-023-39549-4 Nature Communications | (2023) 14:3844 17 https://doi.org/10.1038/s41467-023-39549-4 http://www.nature.com/reprints http://creativecommons.org/licenses/by/4.0/ http://creativecommons.org/licenses/by/4.0/ Single cell Hi-C identifies plastic chromosome�conformations underlying the�gastrulation enhancer landscape Results Cell cycle signatures dominate embryonic chromosome conformations Clustering scHi-C profiles using S-phase cluster seeding and RNA atlas projection Differential contacts in pluripotent and embryonic nuclei Primitive Erythrocyte chromosomes show compact and highly organized folding pEry funnel-like A-compartment structures are anchored at TSSs and cryptic loci Refined embryo clustering by model-based analysis of replication dynamics Ectoderm and mesoderm/endoderm scHi-C clusters Lineage-specific scHi-C conformation differences are weak Three-way identification of regulated long-range interactions Polycomb markup and ectoderm specific long-range contacts in the Tbx3-5 locus Gastrulation cell-type specific accessibility hotspots are intertwined within TADs Discussion Methods Experimental methods Cell extraction, fixation and permeabilization Single-cell Hi-C library preparation Embryo dissection and collection of primitive erythrocytes MARS-seq Sequencing and basic computational analysis scHi-C sequence processing, quality control and cell cycle phasing Metacell analysis Definitions and derivation of the strict early and strict late genomic subsets A-score and early-score for genomic bins Mapping gene expression to genomic bins and scHi-C clusters Hi-C contact matrices analysis Shaman analysis Insulation Virtual 4 C Parameters and specific figure panel analysis Clustering ESC, Embryos and erythrocytes (Fig. 1) Erythrocyte analysis (Fig. 2) Clustering the embryo proper (Fig. 3) Plotting replication trends for early replicating bins Executing other scHi-C clustering algorithms Cluster annotation Screening for differential ecto/meso contacts (Fig. 4) Analysis of multiome-data and integration with pooled Hi-C clusters Reporting summary Data availability Code availability References Acknowledgements Author contributions Competing interests Additional information