A Roadmap for the Human Gut Cell Atlas  Author(s): Matthias Zilbauer#‡1,2,3†, Kylie R James#‡4,5, Mandeep Kaur#6, Sebastian Pott‡#7, Zhixin Li#8, Albert Burger‡#9, Jay R Thiagarajah10, Joseph Burclaff11,12, Frode L Jahnsen13, Francesca Perrone1,2, Alexander D. Ross1,2,14, Gianluca Matteoli15, Nathalie Stakenborg‡15, Tomohisa Sujino16, Andreas Moor‡17, Raquel Bartolome-Casado13,18, Espen S Bækkevold13, Ran Zhou‡19, Bingqing Xie‡20, Ken S Lau‡21, Shahida Din‡22, Scott T Magness12,23, Qiuming Yao24, Semir Beyaz25, Mark Arends‡26, Alexandre Denadai-Souza‡27, Lori A. Coburn‡28,29, Jellert T Gaublomme‡30, Richard Baldock‡31, Irene Papatheodorou‡32, Jose Ordovas-Montanes10, Guy Boeckxstaens‡15, Anna Hupalowska33, Sarah A Teichmann‡18,34, Aviv Regev33,35, Ramnik J Xavier‡36, Alison Simmons37, Michael P Snyder38, Keith T. Wilson‡28,29, Gut Cell Atlas‡ & Human Cell Atlas Gut Biological Network #These authors jointly supervised this work. ‡Members of the Gut Cell Atlas, an Initiative Supported by the Helmsley Charitable Trust. Author affiliations: 1Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK, 2University Department of Paediatrics, University of Cambridge, Cambridge, UK 3Department of Paediatric Gastroenterology, Hepatology and Nutrition, Cambridge University Hospitals, Cambridge, UK 4Garvan Institute of Medical Research, NSW, Australia, 5School of Biomedical Sciences, University of New South Wales, NSW, Australia 6School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg, South Africa 7Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, USA 8Dana-Farber Cancer Institute, Boston, USA 9Department of Computer Science, Heriot-Watt University, Edinburgh, UK 10Division of Gastroenterology, Hepatology and Nutrition, Boston Children’s Hospital, Harvard Medical School, Boston, USA. 
11Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, North Carolina State University, Chapel Hill, North Carolina, USA 12Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA 13Department of Pathology, Oslo University Hospital and University of Oslo, Oslo, Norway 14University Department of Medical Genetics, University of Cambridge, Cambridge, UK 15Translational Research Center for Gastrointestinal Disorders (TARGID), Department of Chronic Diseases, Metabolism and Ageing, KU Leuven, Leuven, Belgium 16Center for the Diagnostic and Therapeutic Endoscopy, School of Medicine, Keio University, Tokyo, Japan 17Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland 18Wellcome Sanger Institute, Hinxton, Cambridge, UK 19Section of Genetic Medicine, University of Chicago, Chicago, USA 20Department of medicine, University of Chicago, Chicago, USA  21Epithelial Biology Center and Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville TN, USA 22Edinburgh IBD Unit, Western General Hospital, NHS Lothian 23Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill/North Carolina State University, Chapel Hill, North Carolina, USA  24Department of Computer Science and Engineering, University of Nebraska Lincoln, Lincoln, Nebraska, USA 25Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724 USA 26Edinburgh Pathology, University of Edinburgh, Institute of Genetics and Cancer, Edinburgh, UK 27Laboratory of Mucosal Biology, Department of Chronic Diseases, Metabolism and Ageing, KU Leuven, Belgium 28Vanderbilt University Medical Center, Nashville, USA 29Veterans Affairs Tennessee Valley Healthcare System, Nashville, USA 30Department of Biological Sciences, Columbia University, New York, USA 31University of Edinburgh, UK 32European Molecular Biology Laboratory, European 
Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK. 33Genentech, San Francisco, CA, USA 34Theory of Condensed Matter Group, Cavendish Laboratory/Department of Physics, University of Cambridge, Cambridge, UK 35Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA. 36Broad Institute and Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, USA 37MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK. 38Stanford University School of Medicine, Stanford, USA †Email: mz304@cam.ac.uk Abstract: The number of studies investigating the human gastrointestinal tract using various single-cell profiling methods has increased substantially in the past few years. Although this increase provides a unique opportunity for the generation of a first comprehensive Human Gut Cell Atlas (HGCA), there are still a range of major challenges ahead. Above all, the ultimate success will largely depend on a structured and coordinated approach that aligns global efforts undertaken by a large number of research groups. In this Roadmap, we discuss a comprehensive forward-thinking direction for the generation of the HGCA on behalf of the Human Cell Atlas (HCA) Gut Biological Network. Based on the consensus opinion of experts from across the globe, we outline the main requirements for a first complete HGCA by summarising existing datasets and highlighting anatomical regions and/or tissues with limited coverage. We provide recommendations for future studies and discuss key methodologies and the importance of integrating the healthy gut atlas with related diseases and gut organoids. Importantly, we critically overview the computational tools available and provide recommendations to overcome key challenges. 
[H1] Introduction The intestinal tract is one of the most complex organs in the human body and serves a wide range of functions, including the digestion and absorption of nutrients, and represents a major site of immune interactions. Different anatomical sections of the intestinal tract have specific roles in the digestive process requiring the presence and complex interaction of various cell types. Additionally, the interaction with trillions of nearby microbes has been shown to be of critical importance, adding further complexity to this finely-tuned symbiosis. A detailed knowledge of gut physiology and cellular function in health is a prerequisite for investigating related diseases such as bowel cancer and inflammatory conditions. The development of methodologies enabling genome-wide molecular profiling on a single-cell level has opened unprecedented opportunities to generate detailed anatomical maps of the human body. Although the number of studies investigating the human intestine using various single-cell profiling methods has increased substantially in the past few years, generating a first comprehensive Human Gut Cell Atlas (HGCA) is associated with major challenges. Above all, the ultimate success will largely depend on a structured and coordinated approach that aligns global efforts. The Human Cell Atlas (HCA) Gut Biological Network connects leading experts in a wide range of related areas, providing an ideal platform to lead the development of the first single-cell HGCA. Here we provide a detailed roadmap, including recommendations on how to combine existing data and requirements for future studies. We discuss key methodologies and the importance of integrating the healthy gut atlas with related diseases and gut organoids. Importantly, we critically reflect on available computational tools, highlight existing limitations and provide recommendations to overcome key challenges. 
Essentially, generating a first comprehensive HGCA will provide researchers, scientists and clinicians with greater scope and power to enable novel discoveries into intestinal biology and pathophysiology with the ultimate goal of improving human health. [H1] Mapping the human gastrointestinal tract The human digestive tract reaches from the oral cavity to the rectum, including the oesophagus, stomach, and small and large intestines. Generating a complete map requires the inclusion of all organs associated with the digestive process and detailed sampling of each gut segment to capture distinct anatomical changes along the cephalad-caudal axis (Figure 1). A unique feature of the human intestine is the presence of trillions of microbes that exist in a finely balanced symbiosis and are thought to be of critical importance to the host1. As such, profiling microbial communities will provide an opportunity to interrogate the crosstalk between microbes and host cells. However, this task is associated with the major challenges of profiling the human gut microbiome, including the vast inter-individual variability, differences in its composition along the intestinal tract and the dynamic changes of microbiota over time2,3. As a result, integrating the gut microbiome in the HGCA is likely to be part of future efforts and will require close interaction with expert researchers in this area. Another important aspect unique to the gastrointestinal tract is the exposure to and interaction with a wide range of food and nutritional components, as well as antigens or potential toxins present in the daily diet. Robust evidence in mice and humans suggests that gastrointestinal development, function, and predisposition to related disease are strongly correlated with dietary habits4,5. Therefore, documenting details on dietary habits will add critical information and further increase the value of molecular profiles generated. 
Importantly, epigenetic mechanisms can mediate the effect of exposure to environmental factors into stable cellular phenotypes6. The possibility of simultaneously profiling multiple molecular layers at single-cell resolution, including epigenetic, transcriptomic, and proteomic signatures, will support us in unravelling the interactions between intestinal host cells and our environment. [H1] Sampling strategies Acquiring human tissue in sufficient quantities and from a large number of donors is one of the major challenges associated with the generation of the HGCA. There are several considerations that apply to the gastrointestinal tract. In general, there are three major strategies for sampling the alimentary tract: mucosal biopsy samples, live surgical resections, and resections from deceased donors7. Each strategy and tissue source has unique advantages and disadvantages (Figure 2). Mucosal biopsy samples can be obtained during a routine endoscopy, but access is restricted to the upper and lower gastrointestinal tract, with the jejunum and proximal ileum rarely sampled8. Multiple biopsy samples can be taken from the same patient, enabling comparisons of the different anatomical regions that can be accessed during routine upper endoscopy and colonoscopy8. Contrary to mucosal biopsy samples, which can be obtained from healthy individuals, live resections are derived from patients with major gastrointestinal pathologies. Work published in 2019 identified transcriptional differences, for example in antimicrobial defense pathways and mucin biosynthesis, between healthy intestines and normal-appearing intestinal tissue proximal to chronic inflammatory regions in patients with ulcerative colitis9. Hence, resections from patients with gastrointestinal disease should be used with caution when mapping ‘healthy’ cells. Advanced endoscopic imaging technology, such as confocal laser endomicroscopy, can provide additional information and guide sampling strategies10. 
Tissue from deceased organ donors enables sampling of entire organs, providing the opportunity for widescale comparisons between regions. Another major advantage of gut resections (either surgical or from deceased donors) is the provision of all layers of the intestinal tube from luminal contents through the musculature and serosa. By contrast, forceps biopsy samples obtained during clinical endoscopy capture only the mucosa, including the epithelium and immediate subepithelial cells11. Size also varies drastically between biopsy samples and resections, affecting the number and type of downstream analyses that can be performed. Finally, tissue quality might differ between strategies, with the time from sampling to processing of tissue being of critical importance12. In summary, compared to other human organs, the gastrointestinal tract is frequently sampled during routine clinical procedures, providing unique opportunities to obtain tissue from healthy individuals as well as from patients with many related gut diseases. However, for complete coverage of all anatomical regions and layers of the intestinal tube, the use of surgical resection materials as well as deceased donor tissue is essential. [H1] Documenting metadata The generation of an HGCA requires combining a vast number of different datasets generated by research groups around the globe13,14. A fundamental aspect that determines the integration and comparability of datasets is the documentation of detailed metadata15. The HCA (The Human Cell Atlas – Metadata) provides extensive guidance on this topic16. In the following section, we briefly describe the basic information required to enable future studies to be included in the HGCA and provide a template metadata table (Table 1). In line with recommendations provided by the HCA, required metadata can be divided into the following main aspects: a) study design, b) donor information, c) sample information, d) sample processing and e) data generation16. 
In brief, a detailed description of the study design, including patient inclusion criteria and sampling strategy, is of major importance. Donor information should include baseline demographic data; details on any known medical conditions, particularly those affecting the intestine, and on medications can also be of major benefit. Similarly, the effect of dietary habits on gut physiology is well established, and information on dietary habits could be used to identify novel aspects of dietary factors involved in gut health and disease4,5. Baseline information on sample type includes the method by which it was obtained, sample area covered and anatomic location7,12,17. Depending on the sample type, the exact anatomical location from which the sample was obtained can be difficult to determine. For example, during routine endoscopy, the exact location of mucosal biopsies can often only be estimated relative to anatomical landmarks such as the terminal ileum17. However, the level of detail provided has a major effect on the comparability of studies and the interpretation of data generated. If available, representative haematoxylin & eosin and/or other staining of tissue sections can be directly linked to samples processed for single-cell studies17,18. Information on sampling procedures should include details on the length of time between sampling and processing, storage duration as well as storage type (e.g. -20 °C or -80 °C, liquid nitrogen). Furthermore, providing a detailed protocol for sample processing, including tissue dissociation, cell viability, possible enrichment of individual cell types, and equipment used, is important. The development of standardised protocols is subject to ongoing work within the Human Cell Atlas. Active engagement of the scientific community and the use of available resources, including recommended protocols, will increase the comparability of data generated by different groups. 
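To make the five metadata aspects above more concrete, the following is a minimal sketch of how a single sample record might be captured and checked before submission. It is not the HCA metadata schema or the Table 1 template: the class name `SampleRecord`, every field name, the `SAMPLE_TYPES` vocabulary (derived only from the three sampling strategies described earlier) and the `check_record` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical controlled vocabulary, based only on the three sampling
# strategies named in the text; not an official HCA term list.
SAMPLE_TYPES = {"mucosal_biopsy", "surgical_resection", "deceased_donor_resection"}


@dataclass
class SampleRecord:
    # a) study design
    study_id: str
    inclusion_criteria: str
    # b) donor information (pseudonymised identifier only)
    donor_id: str
    # c) sample information
    sample_type: str
    anatomical_site: str
    # e) data generation
    assay: str
    # b) optional donor details
    medical_conditions: Optional[str] = None
    medications: Optional[str] = None
    # d) sample processing
    minutes_sampling_to_processing: Optional[float] = None
    storage_condition: Optional[str] = None
    dissociation_protocol: Optional[str] = None


def check_record(rec: SampleRecord) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if rec.sample_type not in SAMPLE_TYPES:
        problems.append(f"unknown sample_type: {rec.sample_type!r}")
    # Processing details are optional in the dataclass but flagged when absent,
    # because they affect comparability between datasets.
    for name in ("minutes_sampling_to_processing", "storage_condition",
                 "dissociation_protocol"):
        if getattr(rec, name) is None:
            problems.append(f"missing {name}")
    delay = rec.minutes_sampling_to_processing
    if delay is not None and delay < 0:
        problems.append("minutes_sampling_to_processing must be non-negative")
    return problems


example = SampleRecord(
    study_id="S1",
    inclusion_criteria="adults undergoing routine colonoscopy",
    donor_id="D001",
    sample_type="mucosal_biopsy",
    anatomical_site="terminal ileum",
    assay="scRNA-seq",
    storage_condition="liquid nitrogen",
)
for problem in check_record(example):
    print(problem)
```

Reporting missing processing fields as warnings, rather than rejecting the record, is one possible design choice; a consortium could equally decide to make some of these fields mandatory.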
[H1] Protecting donor identity and engaging communities Documentation, storage and sharing of patient and donor information raises important ethical considerations19. Regulations vary considerably between different institutions and territories, further complicating the sharing of tissues and data. In the European Union, regulations concerning the protection of personal data and the responsible sharing of such data have been formulated in the General Data Protection Regulation (GDPR) 2016/67920. In the United States, several federal regulations must be considered when using human tissue and data in the context of research, including those of the Department of Health and Human Services (HHS) and the Food and Drug Administration (FDA)21. In addition to major differences in the rules and regulations between territories across the globe, exchanging tissue and data between regions poses further complexities and barriers. However, given the critical importance of metadata for the integration of different datasets, appropriate donor consent for sharing information, data and/or tissue is a prerequisite. In addition to explicit informed consent, pseudonymisation of data and donor privacy must be ensured. A critical aspect of achieving a broad utility for the HGCA is the inclusion and participation of ancestrally and geographically diverse populations. This requires close partnerships between researchers, funders, and potential patient and donor communities, particularly in the case of populations that have historically been neglected or mistreated in research22,23. Ideally, studies providing data for the HGCA should include as many community stakeholders as feasible, with the long-term goal of having research questions formulated by researchers and participating populations working in partnership24. 
[H1] Available methodologies and their application A wide range of single-cell and single-nuclei genomics tools have been developed, enabling the creation of high-resolution cellular atlases from human organs and tissues13,25,26. By and large, the methodologies that will be used to generate the HGCA are the same as those used to profile other organs or systems (Figure 3). In addition to single-cell transcriptional profiling, key methods include spatial transcriptomics and single-cell profiling of other molecular layers such as the genotype27 and epigenotype, including chromatin states (single-cell assay for transposase-accessible chromatin sequencing (ATAC-seq))28 and DNA methylation (whole genome bisulfite sequencing)29. Furthermore, several methods have been developed that enable simultaneous profiling of multiple molecular layers, such as the 10x Genomics Multiome kit, which combines single-nuclei ATAC-seq and single-nuclei RNA sequencing (RNAseq)30. Single-cell nucleosome occupancy and methylome sequencing can measure DNA methylation and chromatin accessibility within single cells31, whereas single-cell nucleosome, methylation and transcription sequencing captures transcript levels in addition to chromatin accessibility and DNA methylation32. Furthermore, the use of methods that enable enrichment for specific and/or rare cell types will be key to achieving complete coverage. For example, mining rare cells sequencing (MIRACL-seq) enables label-free enrichment of rare cell types33, whereas the Chromium Single Cell Immune Profiling assay provided by 10x Genomics enables detailed immune cell profiling including full-length, paired B cell or T cell receptor sequences, surface protein expression, antigen specificity, and 5’ gene expression34. Another key challenge for the HGCA is the timely processing of fresh tissue samples. 
Isolation of tightly connected epithelial cells is associated with damage12, highlighting the need for suitable dissociation protocols as well as computational tools to exclude dead or damaged cells12. Furthermore, dissociation of cells at 37 °C for a prolonged time (i.e. more than one hour) induces the expression of immediate-early genes, thereby disrupting nuanced cellular states. Recent advances have led to the development of single-nuclei RNAseq protocols, which can be applied to frozen tissue samples, thereby easing the burden of sample processing. High-quality nuclei can be readily isolated from frozen samples and subjected to single-nuclei RNAseq35. Although fewer transcripts are recovered, these data capture most cell populations and vastly increase the availability of tissue sources35. The decision on which tissue processing method is most appropriate will be largely determined by the individual setup, including tissue availability. However, a detailed description of experimental procedures is of critical importance for the value of generated data for the HGCA. Because these methods require tissue dissociation prior to single-cell profiling, information on the spatial arrangement of cells, which is crucial for their function, is lost entirely. Multiple spatial methods are used to capture cells in their anatomical context, including next-generation sequencing-based (for example, 10x Genomics Visium)36 and imaging-based (for example, multiplexed error-robust fluorescence in situ hybridization) assays37,38. Typically, frozen or formalin-fixed paraffin-embedded tissue sections are used as starting material and ideally are obtained from the same area from which tissue for single-cell assays was taken18,36. Integration of spatial approaches with single-cell genomics provides both the cellular resolution and the spatial organisation of cell combinations and states (functional tissue units) as an essential framework for a comprehensive atlas of the human gut. 
Several computational tools have been developed and will be discussed later. In summary, existing methodologies offer researchers a wide range of opportunities to address their research goals. Although integrating data derived from studies using different methodologies poses a challenge for the HGCA, the key factor determining the potential value is the quality of data generated combined with the documentation of detailed metadata. Additionally, we encourage researchers across the globe to engage with the HCA Gut Bionetwork prior to and during their single-cell studies to further maximise the use of generated data. [H1] Computational challenges and opportunities The HGCA aims to generate a resource for the scientific community that is reliable, easy to access and user-friendly. Given the vast number of diverse datasets generated by a wide range of research groups using different methodological approaches (as outlined above), the use of existing computational tools and the development of novel ones are of critical importance. The main challenges include adequate measures to identify studies of sufficient quality and the combining and mapping of diverse datasets. Furthermore, the development of a user-friendly interface enabling the data to be explored by the scientific community is a key requirement. Although many of these tasks also apply to most, if not all, tissue mapping studies39-41, there are several additional challenges and opportunities that are specific to the intestine. These include the integration of host cell molecular signatures with mucosa-associated microbial profiles, mapping of datasets to specific anatomical locations along the craniocaudal axis of the gut and alignment of healthy gut-derived datasets with disease state and intestinal organoid models. With the rapid development of single-cell RNAseq technology, numerous computational tools and pipelines have been developed to analyse single-cell data42-44. 
These include programmes and/or packages enabling pre-processing of data such as quality control checks, normalisation and batch correction, sequence alignment, as well as detection and removal of cell doublets45-47. A summary of relevant existing computational tools and/or packages is provided in Table 2. In the following section, we briefly discuss the main computational challenges and opportunities related to the development of the HGCA. [H2] Combining and integrating datasets across modalities Combining and integrating studies performed by different groups using either the same or different methodological approaches poses a major challenge. An important first step is the application of stringent selection criteria and quality control measures to select high-quality datasets. In addition to data quality, the provision of detailed metadata is a prerequisite for successful integration into the HGCA17. Another major challenge is the removal of potentially confounding batch effects, which are differences caused by technical variation rather than true biological differences48. Available approaches and packages include Seurat49, LIGER50, and Harmony51, which use a variety of different computational approaches52. The integration of different molecular profiles (e.g. transcriptomes and epigenomes) forms another key computational challenge, and analytical frameworks have been developed to integrate multiple data types in the same cells, including GLUE53, MOFA+54 and Cobolt55. Additionally, a statistical regression framework (MIRACL-seq) has been developed to integrate clinical metadata and genetic information in single-cell profiling studies33. [H2] Cell type annotation Abundant and well-characterised cell types can be reliably identified through unsupervised clustering algorithms followed by the comparison of key marker gene expression profiles56. 
Generating a list of known marker genes based on existing literature forms a key aspect in the process of performing reliable cell annotation and requires the contribution of experts in the field. Indeed, combining expertise and large datasets provides a unique opportunity to develop an extensive cell marker gene list using both automated and supervised cell identification approaches (for example, MACA57, singleR58, ScType59). For each confidently identified gut cell type, marker genes can then be inferred by distinguishing the known clusters (for example, COMET60, COSG61), ultimately leading to a fully automated annotation procedure. Applying automated annotation approaches to large datasets will also yield unknown and novel cell clusters, which must be subjected to additional validation studies and functional characterisations, for example by using human gut organoid models as will be discussed later. [H2] Mapping cellular location Mapping individual cell types to their specific anatomical location within the human intestine is of critical importance. Combining single-cell sequencing technologies with spatial transcriptomics enables accurate profiling of cell type topography, and several tools have been developed to address the computational aspects involved62, including ASAP63, novoSpaRc64, and Cell2location65. Multiomic spatial profiling efforts result in a gene-plus-protein-by-cell matrix, annotated with spatial coordinates for the centroid position of each cell66. This matrix can then be used as input to open-source spatial analysis pipelines that are currently being developed67, enabling an unbiased identification of spatial patterns based on gene expression, distinct cellular neighbourhoods and cell-to-cell interactions. Mapping the spatial network of cells in the human intestine will enable the identification of genes with coherent spatial expression patterns using methods such as BinSpect and SpatialDE68,69. 
Spatial domains with coherent gene expression patterns can be identified with hidden Markov random field models, which compare the gene expression patterns of each cell with those of its neighbourhood to search for coherent patterns67,70. The prevalence of specific cell-to-cell interactions in the human intestine could be evaluated by the frequency with which each pair of cell types is found in proximity to each other71. Finally, spatially interacting cell types can be analyzed to identify which known ligand-receptor pairs show increased or decreased co-expression, which could serve as a proxy for signalling activity, by creating a background distribution through spatially aware permutations67, which increases predictive power when compared to spatially unaware permutations. The inclusion of imaging data in the HGCA provides an additional approach towards mapping cellular location and will be discussed in more detail below. [H2] Cellular and microbial crosstalk Investigating cellular crosstalk through the analyses of single-cell transcriptomic datasets is subject to active research in the field72. Amongst the most compelling existing approaches is CellPhoneDB, which enables interrogation of context-specific crosstalk between different cell types based on an extensive database of known receptors and ligands73. Applying these algorithms to large and/or combined datasets is likely to yield novel insights into fundamental aspects of human intestinal physiology18. Extending such analyses to include the crosstalk between human host cells and the gut microbiome represents another substantial challenge. To this end, several computational strategies have been developed to identify cellular and microbial crosstalk in the gut utilizing single-cell data from human gut and microbial samples74-76. 
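As a rough illustration of how ligand-receptor co-expression between two cell types can be compared against a permutation background, the sketch below scores one ligand-receptor pair and shuffles cell type labels to build a null distribution. It is a generic, spatially unaware label-permutation test written for this example, not the implementation of CellPhoneDB or of any pipeline cited above; the function names, the gene names `LIG` and `REC`, the cell type labels and the toy expression values are all hypothetical.

```python
import random
from statistics import mean


def lr_score(expr, labels, ligand, receptor, sender, receiver):
    """Average of mean ligand expression in sender cells and
    mean receptor expression in receiver cells."""
    lig = mean(cell[ligand] for cell, lab in zip(expr, labels) if lab == sender)
    rec = mean(cell[receptor] for cell, lab in zip(expr, labels) if lab == receiver)
    return (lig + rec) / 2


def permutation_test(expr, labels, ligand, receptor, sender, receiver,
                     n_perm=1000, seed=0):
    """Return the observed score and an empirical one-sided p-value obtained
    by shuffling cell type labels (group sizes are preserved)."""
    rng = random.Random(seed)
    observed = lr_score(expr, labels, ligand, receptor, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lr_score(expr, shuffled, ligand, receptor, sender, receiver) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: 20 "fibroblast" cells expressing only the ligand and
# 20 "epithelial" cells expressing only the receptor.
expr = [{"LIG": 5, "REC": 0}] * 20 + [{"LIG": 0, "REC": 4}] * 20
labels = ["fibroblast"] * 20 + ["epithelial"] * 20

observed, p_value = permutation_test(
    expr, labels, "LIG", "REC", sender="fibroblast", receiver="epithelial"
)
print(observed, p_value)
```

A spatially aware variant, as described for spatial data, would restrict the shuffling so that the spatial structure of the tissue is respected; that restriction is deliberately left out here to keep the example short.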
Recent studies have provided evidence for the major value of combining in situ spatial-profiling technologies with single-cell sequencing to interrogate host-microbial interactions77. [H2] Regulatory network inference Successful integration of datasets and/or molecular profiles has the potential to unravel both known and novel cell type-specific molecular networks. Several tools are available to enable the prediction of gene regulatory networks that control fundamental biological functions such as cellular differentiation and cell state transitions (for example, SCENIC78, GRNBoost279, PIDC80). Identifying gene regulatory networks contributes towards our understanding of how coordinated expression of transcription factor networks drives the expression of their respective target genes and ultimately shapes and maintains gut cell identity81. Examples include the identification of transcriptional networks that are present in the human fetal intestinal epithelium and reactivated in patients with Crohn’s Disease18 and the coregulation of UC risk genes by a limited number of gene modules in disease-relevant cell types82. [H2] HGCA portal A user-friendly and interactive web portal is essential for making the HGCA accessible to a broad research community83. There are a number of existing portals of single-cell data that provide a wide range of tools to interrogate datasets, and a summary is provided later84-86. These portals provide an excellent starting point, and we envision complete integration of the HGCA into the larger portals, which also enables gut-specific tools to be developed. [H1] Development of a common coordinate framework for the HGCA A critical part of the HGCA is its ability to capture the location of cells in their anatomical and physiological context13. The concept of a common coordinate framework (CCF) has been introduced to facilitate this capability. 
Rood and colleagues provide the following definition87: “An underlying reference map of organs, tissues, or cells that allows new individual samples to be mapped to determine the relative location of structural regions between samples.” Like any other computational model, a CCF provides an abstract representation of a real-world item. CCFs have been developed for several human organs, including the lung88 and the brain89, and the vasculature has been used as a CCF for the entire human body90. A first step towards the development of an HGCA CCF requires the generation of one-, two- and three-dimensional models of the intestinal tract. The one-dimensional conceptual model is based on the clinical view and biological organisation of the digestive tract in which distance from anatomical landmarks provides the critical location metric17,91. ‘Anatomograms’ are two-dimensional graphical models that include a simplified view of the gut for representational purposes, thereby providing a basic framework for accurately capturing the anatomical location of the tissue component and cell types92. Three-dimensional models can be generated by integrating computerized tomography or magnetic resonance imaging data. Generating a framework also involves developing a system of associated coordinates that provide a detailed specification of anatomical location, tissue and cell type17,91. Hence, providing a detailed anatomical location of sampling sites is of critical importance. However, there are limitations to the accuracy of obtaining the exact anatomical location during surgery or intestinal endoscopy17. Stating proportional lengths and/or distances from reliable anatomical landmarks, such as the splenic flexure or ileo-caecal valve, could improve accuracy and help to facilitate the integration of generated datasets into the HGCA CCF. 
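The idea of reporting a sampling site as a proportional length along a one-dimensional model can be sketched as follows. This is only an illustration of the arithmetic, not the coordinate system of any published CCF: the function `linear_position`, the segment names and, in particular, the segment lengths are placeholder assumptions and are not anatomical reference values. In practice the lengths would come from the donor, the procedure or the chosen reference model.

```python
from typing import Dict, List, Tuple


def linear_position(segments: List[Tuple[str, float]],
                    segment: str,
                    distance_cm: float) -> Dict[str, float]:
    """Convert a distance measured from the proximal landmark of `segment`
    into proportional coordinates (0 to 1) within that segment and along
    the whole ordered list of segments."""
    total = sum(length for _, length in segments)
    offset = 0.0  # cumulative length of all segments proximal to the target
    for name, length in segments:
        if name == segment:
            if not 0 <= distance_cm <= length:
                raise ValueError(
                    f"distance {distance_cm} cm lies outside {name} ({length} cm)")
            return {
                "within_segment": distance_cm / length,
                "along_tract": (offset + distance_cm) / total,
            }
        offset += length
    raise KeyError(f"unknown segment: {segment}")


# Placeholder lengths in cm, ordered proximal to distal (illustrative only).
colon_model = [
    ("ascending_colon", 20.0),
    ("transverse_colon", 50.0),
    ("descending_colon", 30.0),
]

# A biopsy taken 25 cm distal to the start of the transverse colon.
position = linear_position(colon_model, "transverse_colon", 25.0)
print(position)
```

Expressing the location both within a segment and along the whole tract keeps the record usable even when two studies divide the gut into different numbers of segments.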
Another complementary strategy to localise cell types in two-dimensional space is the combined use of single-cell RNAseq with spatial transcriptomics, as discussed earlier. This strategy requires the development of computational tools and efforts to allow the integration of single-cell RNAseq and spatial data into a CCF, and work in this area is ongoing. Several CCFs have been developed and are accessible through existing data portals. For example, as part of the Human BioMolecular Atlas Program (HuBMAP), three-dimensional models of both the human large and small intestine have been developed91. The output is a series of surface models of the gut representing the outline within a three-dimensional context, enabling placement of tissue or cell types at the macro-level. These data can be accessed online through a CCF Registration User Interface (Figure 4A), and mapped data can be explored using the Exploration User Interface. Furthermore, as part of a collaborative project funded by the Leona M. and Harry B. Helmsley Charitable Trust (Gut Cell Atlas), researchers from The European Bioinformatics Institute (EBI Cambridge) and Edinburgh have developed a one-dimensional linear conceptual CCF model for the human large and small intestine that is linked to the two-dimensional anatomogram17,92. These mappings, coupled with the inverse transform from the three-dimensional and two-dimensional spaces back to the one-dimensional linear model, enable spatial interoperability between all representations and, therefore, the capability to compare and query data registered to any CCF using web-based visualisation platforms (Figure 4B, C and D)17. In addition to providing a framework for accurately mapping single datasets, CCFs are also capable of integrating other related datasets and can be managed across geographically dispersed data repositories87.
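The idea of a mapping between a two-dimensional representation and a one-dimensional linear model, together with its inverse, can be sketched in a few lines of Python. The example below assumes, purely for illustration, that the two-dimensional model is reduced to a centreline polyline; the function names and the toy centreline are hypothetical and do not reproduce the published gut models.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cumulative_lengths(centreline: Sequence[Point]) -> List[float]:
    """Arc length from the first vertex to each vertex of the polyline."""
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(centreline, centreline[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    return lengths


def to_linear(centreline: Sequence[Point], query: Point) -> float:
    """2D -> 1D: arc length of the centreline point nearest to `query`."""
    if len(centreline) < 2:
        raise ValueError("centreline needs at least two points")
    cum = cumulative_lengths(centreline)
    qx, qy = query
    best_dist, best_s = math.inf, 0.0
    for i, ((x0, y0), (x1, y1)) in enumerate(zip(centreline, centreline[1:])):
        dx, dy = x1 - x0, y1 - y0
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0.0:
            continue  # skip degenerate (zero-length) segments
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((qx - x0) * dx + (qy - y0) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
        dist = math.hypot(qx - (x0 + t * dx), qy - (y0 + t * dy))
        if dist < best_dist:
            best_dist, best_s = dist, cum[i] + t * math.sqrt(seg_len_sq)
    return best_s


def to_2d(centreline: Sequence[Point], s: float) -> Point:
    """1D -> 2D (inverse): point on the centreline at arc length `s`."""
    if len(centreline) < 2:
        raise ValueError("centreline needs at least two points")
    cum = cumulative_lengths(centreline)
    s = max(0.0, min(cum[-1], s))  # clamp to the modelled length
    for i in range(len(centreline) - 1):
        seg_len = cum[i + 1] - cum[i]
        if seg_len > 0.0 and s <= cum[i + 1]:
            t = (s - cum[i]) / seg_len
            (x0, y0), (x1, y1) = centreline[i], centreline[i + 1]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return centreline[-1]


# Usage with a toy L-shaped centreline (arbitrary units).
toy_centreline = [(0.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
s = to_linear(toy_centreline, (1.0, 2.0))   # nearest centreline point is (0, 2)
point = to_2d(toy_centreline, 5.0)          # one unit along the second segment
```

Because both directions go through the same arc-length coordinate, data registered in either representation can be compared on the shared one-dimensional axis. A three-dimensional model could be handled analogously with a 3D centreline, under the same simplifying assumption.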
Importantly, the value of CCFs is directly related to the level of standardisation achieved, as it determines the degree of potential interoperability across datasets91. One way to increase standardisation is the development of consensus guidelines on how anatomical location can be most accurately documented for the purpose of integrating single-cell studies into the HGCA CCF. A starting point can be the use of a standard metadata template (Table 1) and adherence to the minimum information standard for the description of cell and tissue sources in the gut. Ultimately, successful development of a CCF will greatly enhance the value and broad application of the HGCA.

[H1] Summary of relevant existing datasets, studies and portals

The number of published studies reporting on single-cell profiling of the human gut has increased substantially since the founding of the HCA in 2016. Table 3 provides a summary of some of the major studies that have profiled primary tissue obtained from individual gut segments, including the oral cavity93-95, esophagus96,97, stomach97-101, small18,25,102-109 and large18,25,33,82,105,106,108,110-114 intestine. Successful integration of these studies provides a strong foundation for the development of the HGCA. Indeed, several publicly accessible portals have already been established that contain intestinal single-cell datasets and enable access in a user-friendly way. Furthermore, a substantial number of studies have profiled intestinal tissues obtained from patients with gut diseases82,108,109,115-128 and/or human intestinal organoids18,129,130 (Figure 5, Table 3). In this section, we provide a summary of the main existing datasets and portals, highlight examples of novel findings derived from single-cell research, and provide recommendations for future work.

[H2] In utero development and healthy gut datasets

The high-throughput scalability intrinsic to most single-cell technologies has provided unprecedented advances to the field.
One area that benefited from this aspect is gastrointestinal organogenesis, as its progress has been hindered by the scarcity of human samples. Indeed, datasets derived from single-cell RNAseq spanning from 6 to 25 post-conception weeks have yielded several novel insights. For example, during early fetal development, the crypt-villus axis begins to emerge in the small intestine, concomitant with the appearance of FOXL1+ mesenchyme cells co-expressing PDGFRA and F318. In other single-cell studies, a population of mesenchymal cells displaying a similar transcriptional signature was found to co-express NRG1, which has been demonstrated to support the differentiation of LGR5+ stem cells into mature intestinal epithelial cells131,132. Additionally, distinct clusters of single-cell transcriptomes identified during the early stages of human intestinal development provided novel insight into the processes of regionalization during early intestinal development132,133. There are numerous examples of novel findings based on single-cell RNAseq datasets derived from healthy adult gut samples. For example, an assessment of epithelial cells from the ileum, colon, and rectum revealed a high degree of functional diversity between the small and large intestine, as reflected by different nutrient absorption preferences106. The proposed existence of Paneth-like cells in the large intestine, based on a cluster of colorectal epithelial cells co-expressing LYZ, CA4, CA7, and SPIB, was later attributed to a new BEST4+ absorptive epithelial cell type25,96,112. In 2022, BEST4+ epithelial cells were also reported to vary in abundance and transcriptional signature across different regions of the gut105. Together, these findings illustrate the major benefit of combining and comparing datasets from different studies, ultimately reaching reliable insight into healthy gut physiology and cellular function.
Besides overcoming the challenges inherent to scarcity of material, datasets generated by single-cell RNAseq have aided the characterization of the cellular diversity and transcriptional signatures of rare cell types. For instance, the characteristics of the human enteric nervous system remained elusive until a few years ago. In 2020, by employing MIRACL-seq, a novel method designed to enrich samples for rare cell types, 1,445 enteric neurons were recovered from the human colon and found to cluster into 14 subsets based on their transcriptional signatures33. Interstitial cells of Cajal (ICC) are rare entities critical for gastrointestinal peristalsis through both the generation of slow-wave pacemaker activity to smooth muscle cells and the mediation of neurotransmission from enteric neurons. By applying the same strategy, transcriptional signatures of 1,103 ICCs from the human colon were generated33. Another single-cell database, generated by immunophenotyping and fluorescence-activated cell sorting, enriched ICCs from gastric resections and provided a comprehensive characterization of pathways and channels participating in their pacemaker activity100. Chemosensory cells such as tuft cells and enteroendocrine cells (EEC) are rare intestinal epithelial cells operating as an interface for signal transduction between the intestinal lumen and the body, relaying diet and microbiota-derived signals through the release of numerous peptide hormones, neurotransmitters, and cytokines134. In addition, tuft cells are essential for mounting T helper 2 (TH2) immune responses against parasites135. Based on single-cell studies, tuft cells were demonstrated to interact with the innate and adaptive immune systems through previously unreported receptors105, including immunoglobulin G receptors25. EECs sense intestinal content and release hormones to regulate the gastrointestinal activity, systemic metabolism, and food intake134. 
By using an organoid-based platform wherein EEC differentiation was induced by transient expression of NEUROG3, and hormones were tagged with gene reporters, researchers generated a comprehensive dataset of EEC subtypes derived from the small intestine and colon129.

[H2] Intestinal diseases and gut organoids

Generating a complete HGCA in health provides unique opportunities to study the pathogenesis of related diseases. Furthermore, single-cell transcriptional profiles of primary tissue samples can be utilised as a valuable reference map allowing validation of existing and future organoid models. It is, therefore, of critical importance to take steps towards ensuring that datasets generated from related samples and patient cohorts can be integrated into the HGCA (Figure 5). Amongst the related gut diseases that have been investigated using single-cell profiling approaches are colorectal cancer (CRC)33,106,115,117-126, inflammatory bowel diseases (IBD)136, Crohn’s disease18,107,137-139 and ulcerative colitis82,113,116,140,141 as well as celiac disease102. Examples of major findings in IBD include the identification of distinct immune cell signatures in ulcerative colitis and Crohn’s disease142, a pathogenic cellular module associated with resistance to anti-TNF therapy138, the mapping of genetic risk genes to cell type-specific functions82 and the reactivation of fetal intestinal epithelial transcriptional profiles in childhood-onset Crohn’s disease18. Similarly, the application of single-cell molecular profiling methods to colonic tissue obtained from patients with CRC has led to major advances in our understanding of disease pathogenesis. Specifically, current single-cell insights into the stem and metaplastic origins of human pre-cancers118 have led to the reclassification of the consensus molecular subtypes of CRC by their intrinsic features127.
The transition of benign lesions into malignancy is accompanied by tumour cell acquisition of stem characteristics118,126, and reorganization of the microenvironment into suppressive immune-stromal hubs that can potentially be therapeutically targeted117,143. Although a comprehensive summary of all relevant available single-cell studies in IBD and CRC is beyond the scope of this manuscript, it is important to highlight that the ability to integrate and compare studies performed on disease tissues at different stages of progression is of major benefit. Hence, ensuring compatibility with the HGCA remains a key priority for every study. Although integration of common gut-related conditions for which extensive datasets are already available will be prioritised in the first phase of data integration, it is important to emphasise the major value of investigating and ultimately integrating rarer conditions, for example tufting enteropathy144, early-onset and monogenic forms of inflammatory bowel diseases145, as well as congenital diseases affecting the enteric nervous system, such as Hirschsprung's disease146. Indeed, combining datasets from less commonly profiled conditions represents another unique opportunity for the HGCA by increasing computational power and allowing validation of key findings. Furthermore, applying a variety of methodologies (for example, single-cell multiome profiling) to the same condition will further improve the value of the HGCA and the chances of gaining novel insight into disease pathogenesis. The development of human intestinal organoid culture models has transformed many aspects of gut-related research, providing researchers with unprecedented opportunities to study fundamental aspects of intestinal biology147. Performing transcriptional profiling of patient-derived intestinal organoids at the single-cell level is of great value, as reflected in the increasing number of studies reporting novel findings129.
The main benefits include the ability to validate the cellular composition of organoids to further improve culture conditions and to evaluate to what extent disease-associated cellular alterations are retained in vitro. An example is a study published by He and colleagues in 2022, who applied single-cell RNAseq to human small intestinal organoids130. Exposure of organoids to IL-22 resulted in increased expression of antimicrobial peptides, suggesting the presence of Paneth cells130. Furthermore, single-cell profiling of primary human fetal gut and organoids derived from the same tissue sample revealed in vitro maturation of fetal organoids, highlighting their value in investigating the early stages of human intestinal epithelial cell development18. Work by Ishikawa and colleagues combined single-cell RNAseq of human colonic epithelial cells with genetically modified human intestinal organoids, leading to the identification of quiescent LGR5+ stem cells in the human colon148. Similarly, combining cancer organoid assays and single-cell RNAseq of biopsy samples from patients with CRC identified a novel role of a β-hydroxybutyrate-triggered pathway in regulating intestinal tumorigenesis149. As for studies based on profiling primary tissue, the provision of metadata, including clinical donor details, sampling sites, biobanking and experimental procedures, is equally important for organoid-related studies, as it will determine their future compatibility and ensure successful integration into the HGCA.

[H2] Existing data portals

In an effort to consolidate the growing number of single-cell datasets that have been generated in the past few years, several web-based data repositories have emerged to provide access in a user-friendly format. Although most portals share common features of storing data, curating datasets by different levels of metadata, and offering a variety of analysis and visualization tools, distinct aspects render each portal valuable.
Here we summarize key features of the main existing data portals relevant to the HGCA.

[H3] Single-Cell Expression Atlas

The Single-Cell Expression Atlas (SCEA)92 is the single-cell component of EMBL-EBI’s Expression Atlas. This is an added-value resource that enables simple gene and metadata queries, allowing users to answer questions such as “where is my gene of interest expressed” or “how does its expression level change in a disease”. SCEA collects expression data from all species, annotates their metadata with appropriate ontology terms to enable nested searches across studies and, crucially, re-analyses all datasets using standardised analysis pipelines to enable comparison across studies. The SCEA works in close collaboration with the HGCA, releasing the datasets as they become publicly available and generating the first full gut 2D anatomogram, which will enable easy visual exploration of gene and marker gene expression across the cell types in the different anatomical sub-structures of the gut.

[H3] HuBMAP

HuBMAP is a National Institutes of Health Common Fund effort to integrate and map diverse biological data across the healthy human body. Three main features enable: 1) analysis of single-cell RNAseq experiments with Azimuth, a web application that uses reference datasets to automate annotation and interpretation of data; 2) spatial single-cell data visualization with Vitessce; and 3) navigation of healthy human cells with the CCF, allowing users to interact with the virtual human body and focus on anatomical structures, cell types, and biomarkers.

[H3] Tabula Sapiens

Like HuBMAP, the Tabula Sapiens focuses on healthy human subjects and has created a first draft HCA of 24 organs from 15 different human subjects. Tabula Sapiens is funded by the Chan Zuckerberg Initiative and is unique in that the single-cell data sets derived for each organ are from the same human subject, controlling for inter-individual factors.
This also enables the comparison of cell types that are shared between different organs. The web portal offers the ability to peruse all the single-cell data combined (currently at 500,000 cells) or curated cell subsets (that is, endothelial, epithelial, immune, and stromal) in an easy-to-navigate graphical user interface.

[H3] University of California at Santa Cruz Cell Browser

The University of California at Santa Cruz (UCSC) Cell Browser is a sub-project under the supervision of the UCSC Genome Browser project that was developed and is maintained by a cross-departmental team in the UCSC Genomics Institute. The Cell Browser is an interactive viewer where users can interrogate single-cell data sets from a wide variety of species, organs, and tissues from a menu list. A unique feature is the broad scientific focus of the datasets, which are generated from human, mouse, fly, and sponge and are curated as conventional atlases or analysed in the context of development and evolution. The datasets are converted for compatibility by the UCSC Cell Browser Group and placed in an open-source portal. The analysis and viewer package can also be downloaded and installed locally.

[H3] Broad Institute Single Cell Portal

The Single Cell Portal is hosted by the Broad Institute and was created with the idea of making single-cell data easy to share and access. The portal is a deposit site where investigators can create their own collection as a body of work or contribute to the growing list of curated datasets. The site is easy to use and has succinct overviews describing the study design and experimental conditions. Like other portals, the Single Cell Portal has conventional, interactive visualization tools in a simple graphical user interface that allows the user to filter the datasets by several parameters and to sub-sample the data.
[H3] Gut Cell Atlas, Wellcome Sanger Institute

Gut Cell Atlas provides a detailed gut cell survey by combining single-cell data generated from a range of human gut tissues. Datasets include studies that profiled various gut segments obtained from fetal, pediatric and adult donors as well as patients diagnosed with Crohn’s disease. In addition to direct access to raw datasets, the site provides an interactive viewer that enables basic analyses and interrogation of datasets, such as exploring single-cell expression profiles of individual cell lineages, gut regions, or comparisons between age groups.

[H1] Future opportunities, including incorporating datasets

In many respects, the current collection of high-quality published single-cell studies achieves good coverage of the human intestinal tract and therefore provides a solid foundation for the generation of the HGCA. Areas that have been explored using single-cell RNAseq technology include intestinal development18,133,150-152, profiling of cell types, states and tissue composition129, mapping regional differences across intestinal tissues and along the intestinal tube114,129, as well as comparisons between healthy individuals and people with IBD105,111, CRC123,124, or other related diseases97,126. However, there are several less well-explored areas that require careful consideration in future studies. For example, to achieve a comprehensive and truly global HGCA, greater effort must be made to include samples obtained from underrepresented and ethnically diverse communities to reflect potential cellular differences across human populations. Currently, only a few studies have sampled multiple intestinal regions within the same individual25,108. However, such cross-tissue studies are of major value as they enable the identification of common and/or distinct cell types across intestinal regions as well as the study of migratory patterns of specific immune cell populations25.
Furthermore, greater emphasis should be placed on profiling rare cell types, particularly in less frequently sampled gut regions such as the jejunum and proximal ileum. This objective can be achieved by applying methods designed to enrich for such cell types prior to single-cell RNAseq and/or the application of spatial transcriptomics. Additionally, analysing single nuclei rather than single cells is a useful strategy to profile cells that cannot be readily recovered using standard dissociation protocols33. However, the advantages and disadvantages of these methods must be carefully considered in the context of each particular research study153,154. Finally, there has been a major bias regarding the anatomical sampling sites, with a large proportion of studies focussing on profiling the colon and distal small bowel. This is likely to be partly caused by the ease of access during endoscopic procedures as well as the major relevance to related gastrointestinal diseases such as IBD and CRC. Future studies should also include less frequently sampled gut segments, such as the mid-small intestine, oesophagus, stomach and other organs involved in the digestive process, including the liver and pancreas. Last but not least, profiling the intestine at different developmental stages by obtaining tissue from donors of all age groups will provide further insight into the pathophysiology of related diseases, including those occurring in specific age groups.

[H1] Conclusions

Generating a complete map of the human intestine at the single-cell level will improve our understanding of gut health and disease. The inherent complexities and scale of this task require a coordinated effort led by experts in this rapidly evolving field. This manuscript forms a key part of our strategy by providing a detailed roadmap to the scientific community. Broad distribution and constructive discussion of our proposal are essential to achieving our goal, the generation of a HGCA.

1 Rooks, M. G.
& Garrett, W. S. Gut microbiota, metabolites and host immunity. Nature Reviews Immunology 16, 341-352, doi:10.1038/nri.2016.42 (2016). 2 Kolodziejczyk, A. A., Zheng, D. & Elinav, E. Diet-microbiota interactions and personalized nutrition. Nat Rev Microbiol 17, 742-753, doi:10.1038/s41579-019-0256-8 (2019). 3 Knight, R. et al. Best practices for analysing microbiomes. Nat Rev Microbiol 16, 410-422, doi:10.1038/s41579-018-0029-9 (2018). 4 Makki, K., Deehan, E. C., Walter, J. & Backhed, F. The Impact of Dietary Fiber on Gut Microbiota in Host Health and Disease. Cell Host Microbe 23, 705-715, doi:10.1016/j.chom.2018.05.012 (2018). 5 Khalili, H. et al. The role of diet in the aetiopathogenesis of inflammatory bowel disease. Nat Rev Gastroenterol Hepatol 15, 525-535, doi:10.1038/s41575-018-0022-9 (2018). 6 Safi-Stibler, S. & Gabory, A. Epigenetics and the Developmental Origins of Health and Disease: Parental environment signalling to the epigenome, critical time windows and sculpting the adult phenotype. Semin Cell Dev Biol 97, 172-180, doi:10.1016/j.semcdb.2019.09.008 (2020). 7 Perrone, F. & Zilbauer, M. Biobanking of human gut organoids for translational research. Exp Mol Med 53, 1451-1458, doi:10.1038/s12276-021-00606-x (2021). 8 Enns, R. A. et al. Clinical Practice Guidelines for the Use of Video Capsule Endoscopy. Gastroenterology 152, 497-514, doi:10.1053/j.gastro.2016.12.032 (2017). 9 Smillie, C. S. et al. Intra- and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis. Cell 178, 714-730.e722, doi:10.1016/j.cell.2019.06.029 (2019). 10 Pilonis, N. D., Januszewicz, W. & di Pietro, M. Confocal laser endomicroscopy in gastro-intestinal endoscopy: technical aspects and clinical applications. Transl Gastroenterol Hepatol 7, 7, doi:10.21037/tgh.2020.04.02 (2022). 11 Chen, W. C. & Wallace, M. B. Endoscopic management of mucosal lesions in the gastrointestinal tract. Expert Rev Gastroenterol Hepatol 10, 481-495, doi:10.1586/17474124.2016.1122520 (2016). 
12 Pensold, D. & Zimmer-Bensch, G. Methods for Single-Cell Isolation and Preparation. Adv Exp Med Biol 1255, 7-27, doi:10.1007/978-981-15-4494-1_2 (2020). 13 Regev, A. et al. The Human Cell Atlas. Elife 6, e27041, doi:10.7554/eLife.27041 (2017). 14 Rozenblatt-Rosen, O., Stubbington, M. J. T., Regev, A. & Teichmann, S. A. The Human Cell Atlas: from vision to reality. Nature 550, 451-453, doi:10.1038/550451a (2017). 15 Skinnider, M. A., Squair, J. W. & Courtine, G. Enabling reproducible re-analysis of single-cell data. Genome Biol 22, 215, doi:10.1186/s13059-021-02422-y (2021). 16 Füllgrabe, A. et al. Guidelines for reporting single-cell RNA-seq experiments. Nat Biotechnol 38, 1384-1386, doi:10.1038/s41587-020-00744-z (2020). 17 Burger, A. et al. Towards a Clinically-based Common Coordinate Framework for the Human Gut Cell Atlas - The Gut Models. bioRxiv, 2022.12.08.519665, doi:10.1101/2022.12.08.519665 (2022). 18 Elmentaite, R. et al. Single-Cell Sequencing of Developing Human Gut Reveals Transcriptional Links to Childhood Crohn's Disease. Developmental Cell 55, 771-+, doi:10.1016/j.devcel.2020.11.010 (2020). 19 Lee, S. S. J. The Ethics of Consent in a Shifting Genomic Ecosystem. Annual Review of Biomedical Data Science 4, 145-164, doi:10.1146/annurev-biodatasci-030221-125715 (2021). 20 The European Parliament and the Council of the European Union. General Data Protection Regulation (EU) 2016/679 (GDPR). EUR-Lex (2016). 21 Bledsoe, M. J. & Grizzle, W. E. Use of human specimens in research: the evolving United States regulatory, policy, and scientific landscape. Diagn Histopathol (Oxf) 19, 322-330, doi:10.1016/j.mpdhp.2013.06.015 (2013). 22 Shore, N. et al. Understanding community-based processes for research ethics review: a national study. Am J Public Health 101 Suppl 1, S359-364, doi:10.2105/ajph.2010.194340 (2011). 23 Tackling helicopter research. Nature Geoscience 15, 597, doi:10.1038/s41561-022-01010-4 (2022). 24 Mikesell, L., Bromley, E. & Khodyakov, D.
Ethical community-engaged research: a literature review. Am J Public Health 103, e7-e14, doi:10.2105/ajph.2013.301605 (2013). 25 Elmentaite, R. et al. Cells of the human intestinal tract mapped across space and time. Nature 597, 250-+, doi:10.1038/s41586-021-03852-1 (2021). 26 Camp, J. G., Platt, R. & Treutlein, B. Mapping human cell phenotypes to genotypes with single-cell genomics. Science 365, 1401-+, doi:10.1126/science.aax6648 (2019). 27 Macaulay, I. C. et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat Methods 12, 519-522, doi:10.1038/nmeth.3370 (2015). 28 Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat Biotechnol 37, 916-924, doi:10.1038/s41587-019-0147-6 (2019). 29 Smallwood, S. A. et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat Methods 11, 817-820, doi:10.1038/nmeth.3035 (2014). 30 Hao, Y. H. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573-+, doi:10.1016/j.cell.2021.04.048 (2021). 31 Pott, S. Simultaneous measurement of chromatin accessibility, DNA methylation, and nucleosome phasing in single cells. Elife 6, doi:10.7554/eLife.23203 (2017). 32 Clark, S. J. et al. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat Commun 9, 781, doi:10.1038/s41467-018-03149-4 (2018). 33 Drokhlyansky, E. et al. The Human and Mouse Enteric Nervous System at Single-Cell Resolution. Cell 182, 1606-+, doi:10.1016/j.cell.2020.08.003 (2020). 34 Setliff, I. et al. High-Throughput Mapping of B Cell Receptor Sequences to Antigen Specificity. Cell 179, 1636-+, doi:10.1016/j.cell.2019.11.003 (2019). 35 Slyper, M. et al. A single-cell and single-nucleus RNA-Seq toolbox for fresh and frozen human tumors. Nat Med 26, 792-802, doi:10.1038/s41591-020-0844-1 (2020). 36 Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers.
Nat Genet 53, 1334-1347, doi:10.1038/s41588-021-00911-1 (2021). 37 Xia, C., Babcock, H. P., Moffitt, J. R. & Zhuang, X. Multiplexed detection of RNA using MERFISH and branched DNA amplification. Scientific Reports 9, 7721, doi:10.1038/s41598-019-43943-8 (2019). 38 Black, S. et al. CODEX multiplexed tissue imaging with DNA-conjugated antibodies. Nat Protoc 16, 3802-3835, doi:10.1038/s41596-021-00556-8 (2021). 39 Haniffa, M. et al. A roadmap for the Human Developmental Cell Atlas. Nature 597, 196-205, doi:10.1038/s41586-021-03620-1 (2021). 40 Travaglini, K. J. et al. A molecular cell atlas of the human lung from single-cell RNA sequencing. Nature 587, 619-625, doi:10.1038/s41586-020-2922-4 (2020). 41 Zou, Z. et al. A Single-Cell Transcriptomic Atlas of Human Skin Aging. Dev Cell 56, 383-397.e388, doi:10.1016/j.devcel.2020.11.002 (2021). 42 Hwang, B., Lee, J. H. & Bang, D. Single-cell RNA sequencing technologies and bioinformatics pipelines. Experimental & Molecular Medicine 50, 1-14, doi:10.1038/s12276-018-0071-8 (2018). 43 Zappia, L., Phipson, B. & Oshlack, A. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLOS Computational Biology 14, e1006245, doi:10.1371/journal.pcbi.1006245 (2018). 44 Moreno, P. et al. User-friendly, scalable tools and workflows for single-cell RNA-seq analysis. Nat. Methods 18, 327-328, doi:10.1038/s41592-021-01102-w (2021). 45 Wilbrey-Clark, A., Roberts, K. & Teichmann, S. A. Cell Atlas technologies and insights into tissue architecture. Biochem J 477, 1427-1442, doi:10.1042/bcj20190341 (2020). 46 Ke, M., Elshenawy, B., Sheldon, H., Arora, A. & Buffa, F. M. Single cell RNA-sequencing: A powerful yet still challenging technology to study cellular heterogeneity. Bioessays 44, e2200084, doi:10.1002/bies.202200084 (2022). 47 Su, M. et al. Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications. Mil Med Res 9, 68, doi:10.1186/s40779-022-00434-8 (2022). 48 Tran, H. T. N.
et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biology 21, 12, doi:10.1186/s13059-019-1850-9 (2020). 49 Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573-3587.e3529, doi:10.1016/j.cell.2021.04.048 (2021). 50 Liu, J. et al. Jointly defining cell types from multiple single-cell datasets using LIGER. Nature protocols 15, 3632-3662, doi:10.1038/s41596-020-0391-8 (2020). 51 Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289-1296, doi:10.1038/s41592-019-0619-0 (2019). 52 Ryu, Y., Han, G. H., Jung, E. & Hwang, D. Integration of Single-Cell RNA-Seq Datasets: A Review of Computational Methods. Mol Cells 46, 106-119, doi:10.14348/molcells.2023.0009 (2023). 53 Cao, Z. J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat Biotechnol, doi:10.1038/s41587-022-01284-4 (2022). 54 Argelaguet, R. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biology 21, 111, doi:10.1186/s13059-020-02015-1 (2020). 55 Gong, B., Zhou, Y. & Purdom, E. Cobolt: integrative analysis of multimodal single-cell sequencing data. Genome Biology 22, 351, doi:10.1186/s13059-021-02556-z (2021). 56 Jones, R. C. et al. The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans. Science 376, 711-+, doi:10.1126/science.abl4896 (2022). 57 Xu, Y., Baumgart, S. J., Stegmann, C. M. & Hayat, S. MACA: Marker-based automatic cell-type annotation for single cell expression data. Bioinformatics, doi:10.1093/bioinformatics/btab840 (2021). 58 Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nature Immunology 20, 163-172, doi:10.1038/s41590-018-0276-y (2019). 59 Ianevski, A., Giri, A. K. & Aittokallio, T. 
Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nature Communications 13, 1246, doi:10.1038/s41467-022-28803-w (2022). 60 Delaney, C. et al. Combinatorial prediction of marker panels from single-cell transcriptomic data. Mol Syst Biol 15, e9005, doi:10.15252/msb.20199005 (2019). 61 Dai, M., Pei, X. & Wang, X.-J. Accurate and fast cell marker gene identification with COSG. Briefings in Bioinformatics 23, doi:10.1093/bib/bbab579 (2022). 62 Andersson, A. et al. Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography. Communications Biology 3, 565, doi:10.1038/s42003-020-01247-y (2020). 63 David, F. P. A., Litovchenko, M., Deplancke, B. & Gardeux, V. ASAP 2020 update: an open, scalable and interactive web-based portal for (single-cell) omics analyses. Nucleic Acids Research 48, W403-W414, doi:10.1093/nar/gkaa412 (2020). 64 Moriel, N. et al. NovoSpaRc: flexible spatial reconstruction of single-cell gene expression with optimal transport. Nature protocols 16, 4177-4200, doi:10.1038/s41596-021-00573-7 (2021). 65 Kleshchevnikov, V. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat. Biotechnol., doi:10.1038/s41587-021-01139-4 (2022). 66 Longo, S. K., Guo, M. G., Ji, A. L. & Khavari, P. A. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat Rev Genet 22, 627-644, doi:10.1038/s41576-021-00370-8 (2021). 67 Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol 22, 78, doi:10.1186/s13059-021-02286-2 (2021). 68 Svensson, V., Teichmann, S. A. & Stegle, O. SpatialDE: identification of spatially variable genes. Nat Methods 15, 343-346, doi:10.1038/nmeth.4636 (2018). 69 Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. 
Genome Biol 22, 78, doi:10.1186/s13059-021-02286-2 (2021). 70 Zhu, Q., Shah, S., Dries, R., Cai, L. & Yuan, G.-C. Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data. Nat Biotechnol 36, 1183-1190, doi:10.1038/nbt.4260 (2018). 71 Goltsev, Y. et al. Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging. Cell 174, 968-981 e915, doi:10.1016/j.cell.2018.07.010 (2018). 72 Armingol, E., Officer, A., Harismendy, O. & Lewis, N. E. Deciphering cell-cell interactions and communication from gene expression. Nature Reviews Genetics 22, 71-88, doi:10.1038/s41576-020-00292-x (2021). 73 Efremova, M., Vento-Tormo, M., Teichmann, S. A. & Vento-Tormo, R. CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nat Protoc 15, 1484-1506, doi:10.1038/s41596-020-0292-x (2020). 74 Kang, R., Park, B., Eady, M., Ouyang, Q. & Chen, K. J. Single-cell classification of foodborne pathogens using hyperspectral microscope imaging coupled with deep learning frameworks. Sensors and Actuators B-Chemical 309, 127789, doi:10.1016/j.snb.2020.127789 (2020). 75 Chattopadhyay, P. K., Roederer, M. & Bolton, D. L. A deadly dance: the choreography of host-pathogen interactions, as revealed by single-cell technologies. Nat Commun 9, 4638, doi:10.1038/s41467-018-06214-0 (2018). 76 Liao, C., Wang, T., Maslov, S. & Xavier, J. B. Modeling microbial cross-feeding at intermediate scale portrays community dynamics and species coexistence. Plos Comput Biol 16, e1008135, doi:10.1371/journal.pcbi.1008135 (2020). 77 Galeano Niño, J. L. et al. Effect of the intratumoral microbiota on spatial and cellular heterogeneity in cancer. Nature 611, 810-817, doi:10.1038/s41586-022-05435-0 (2022). 78 Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083-1086, doi:10.1038/nmeth.4463 (2017). 79 Moerman, T. et al. 
GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks. Bioinformatics 35, 2159-2161, doi:10.1093/bioinformatics/bty916 (2019). 80 Chan, T. E., Stumpf, M. P. H. & Babtie, A. C. Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures. Cell Syst 5, 251-267.e253, doi:10.1016/j.cels.2017.08.014 (2017). 81 Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A. & Murali, T. M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147-154, doi:10.1038/s41592-019-0690-6 (2020). 82 Smillie, C. S. et al. Intra- and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis. Cell 178, 714-+, doi:10.1016/j.cell.2019.06.029 (2019). 83 Cakir, B. et al. Comparison of visualization tools for single-cell RNAseq data. NAR Genomics and Bioinformatics 2, doi:10.1093/nargab/lqaa052 (2020). 84 Megill, C. et al. chanzuckerberg/cellxgene: Release 0.15.0. doi:10.5281/ZENODO.3710410 (2020). 85 Elmentaite, R. et al. Single-Cell Sequencing of Developing Human Gut Reveals Transcriptional Links to Childhood Crohn’s Disease. Developmental Cell 55, 771-783.e775, doi:10.1016/j.devcel.2020.11.010 (2020). 86 Moreno, P. et al. Expression Atlas update: gene and protein expression in multiple species. Nucleic Acids Research 50, D129-D140, doi:10.1093/nar/gkab1030 (2021). 87 Rood, J. E. et al. Toward a Common Coordinate Framework for the Human Body. Cell 179, 1455-1467, doi:10.1016/j.cell.2019.11.019 (2019). 88 Schiller, H. B. et al. The Human Lung Cell Atlas: A High-Resolution Reference Map of the Human Lung in Health and Disease. American Journal of Respiratory Cell and Molecular Biology 61, 31-41, doi:10.1165/rcmb.2018-0416TR (2019). 89 Eze, U. C., Bhaduri, A., Haeussler, M., Nowakowski, T. J. & Kriegstein, A. R. Single-cell atlas of early human brain development highlights heterogeneity of human neuroepithelial cells and early radial glia. 
Nature Neuroscience 24, 584-594, doi:10.1038/s41593-020-00794-1 (2021). 90 Weber, G. M., Ju, Y. N. & Borner, K. Considerations for Using the Vasculature as a Coordinate System to Map All the Cells in the Human Body. Frontiers in Cardiovascular Medicine 7, 29, doi:10.3389/fcvm.2020.00029 (2020). 91 Burger, A. et al. Towards a clinically-based common coordinate framework for the human gut cell atlas: the gut models. BMC Medical Informatics and Decision Making 23, 36, doi:10.1186/s12911-023-02111-9 (2023). 92 Moreno, P. et al. Expression Atlas update: gene and protein expression in multiple species. Nucleic Acids Res 50, D129-D140, doi:10.1093/nar/gkab1030 (2022). 93 Williams, D. W. et al. Human oral mucosa cell atlas reveals a stromal-neutrophil axis regulating tissue immunity. Cell 184, 4090-4104.e4015, doi:10.1016/j.cell.2021.05.013 (2021). 94 Zhao, R. W. et al. Function of normal oral mucosa revealed by single-cell RNA sequencing. Journal of Cellular Biochemistry 123, 1481-1494, doi:10.1002/jcb.30307 (2022). 95 Caetano, A. J. et al. Defining human mesenchymal and epithelial heterogeneity in response to oral inflammatory disease. Elife 10, e62810, doi:10.7554/eLife.62810 (2021). 96 Busslinger, G. A. et al. Human gastrointestinal epithelia of the esophagus, stomach, and duodenum resolved at single-cell resolution. Cell Reports 34, 108819, doi:10.1016/j.celrep.2021.108819 (2021). 97 Nowicki-Osuch, K. et al. Molecular phenotyping reveals the identity of Barrett's esophagus and its malignant transition. Science 373, 760-+, doi:10.1126/science.abd1449 (2021). 98 Zhang, M. et al. Dissecting transcriptional heterogeneity in primary gastric adenocarcinoma by single cell RNA sequencing. Gut 70, 464-475, doi:10.1136/gutjnl-2019-320368 (2021). 99 Sorini, C. et al. Metagenomic and single-cell RNA-Seq survey of the Helicobacter pylori-infected stomach in asymptomatic individuals. JCI Insight 8, doi:10.1172/jci.insight.161042 (2023). 100 Foong, D. et al. 
Single-cell RNA sequencing predicts motility networks in purified human gastric interstitial cells of Cajal. Neurogastroenterology and Motility 34, e14303, doi:10.1111/nmo.14303 (2022). 101 Kumar, V. et al. Single-Cell Atlas of Lineage States, Tumor Microenvironment, and Subtype-Specific Expression Programs in Gastric Cancer. Cancer Discovery 12, 670-691, doi:10.1158/2159-8290.Cd-21-0683 (2022). 102 Atlasy, N. et al. Single cell transcriptomic analysis of the immune cell compartment in the human small intestine and in Celiac disease. Nat Commun 13, 4920, doi:10.1038/s41467-022-32691-5 (2022). 103 Egozi, A. et al. Single cell atlas of the neonatal small intestine with necrotizing enterocolitis. bioRxiv, 2022.2003.2001.482508, doi:10.1101/2022.03.01.482508 (2022). 104 Domínguez Conde, C. et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376, eabl5197, doi:10.1126/science.abl5197 (2022). 105 Burclaff, J. et al. A Proximal-to-Distal Survey of Healthy Adult Human Small Intestine and Colon Epithelium by Single-Cell Transcriptomics. Cellular and Molecular Gastroenterology and Hepatology 13, 1554-1589, doi:10.1016/j.jcmgh.2022.02.007 (2022). 106 Wang, Y. L. et al. Single-cell transcriptome analysis reveals differential nutrient absorption functions in human intestine. Journal of Experimental Medicine 217, doi:10.1084/jem.20191130 (2020). 107 Jaeger, N. et al. Single-cell analyses of Crohn's disease tissues reveal intestinal intraepithelial T cells heterogeneity and altered subset distributions. Nat Commun 12, 1921, doi:10.1038/s41467-021-22164-6 (2021). 108 Kong, L. et al. The landscape of immune dysregulation in Crohn's disease revealed through single-cell transcriptomic profiling in the ileum and colon. Immunity 56, 444-458.e445, doi:10.1016/j.immuni.2023.01.002 (2023). 109 Kondo, A. et al. Highly Multiplexed Image Analysis of Intestinal Tissue Sections in Patients With Inflammatory Bowel Disease. 
Gastroenterology 161, 1940-1952, doi:10.1053/j.gastro.2021.08.055 (2021). 110 Huang, B. et al. Mucosal Profiling of Pediatric-Onset Colitis and IBD Reveals Common Pathogenics and Therapeutic Pathways. Cell 179, 1160-1176.e1124, doi:10.1016/j.cell.2019.10.027 (2019). 111 Kinchen, J. et al. Structural Remodeling of the Human Colonic Mesenchyme in Inflammatory Bowel Disease. Cell 175, 372-+, doi:10.1016/j.cell.2018.08.067 (2018). 112 Parikh, K. et al. Colonic epithelial cell diversity in health and inflammatory bowel disease. Nature 567, 49-+, doi:10.1038/s41586-019-0992-y (2019). 113 Corridoni, D. et al. Single-cell atlas of colonic CD8(+) T cells in ulcerative colitis. Nat Med 26, 1480-1490, doi:10.1038/s41591-020-1003-4 (2020). 114 James, K. R. et al. Distinct microbial and immune niches of the human colon. Nat Immunol 21, 343-+, doi:10.1038/s41590-020-0602-z (2020). 115 Lee, H. O. et al. Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nat Genet 52, 594-603, doi:10.1038/s41588-020-0636-z (2020). 116 Uzzan, M. et al. Ulcerative colitis is characterized by a plasmablast-skewed humoral response associated with disease activity. Nat Med 28, 766-779, doi:10.1038/s41591-022-01680-y (2022). 117 Pelka, K. et al. Spatially organized multicellular immune hubs in human colorectal cancer. Cell 184, 4734-4752.e4720, doi:10.1016/j.cell.2021.08.003 (2021). 118 Chen, B. et al. Differential pre-malignant programs and microenvironment chart distinct paths to malignancy in human colorectal polyps. Cell 184, 6262-6280.e6226, doi:10.1016/j.cell.2021.11.031 (2021). 119 Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat Genet 49, 708-718, doi:10.1038/ng.3818 (2017). 120 Qian, J. et al. A pan-cancer blueprint of the heterogeneous tumor microenvironment revealed by single-cell profiling. 
Cell Res 30, 745-762, doi:10.1038/s41422-020-0355-0 (2020). 121 Zhang, L. et al. Single-Cell Analyses Inform Mechanisms of Myeloid-Targeted Therapies in Colon Cancer. Cell 181, 442-459.e429, doi:10.1016/j.cell.2020.03.048 (2020). 122 Domanska, D. et al. Single-cell transcriptomic analysis of human colonic macrophages reveals niche-specific subsets. J Exp Med 219, doi:10.1084/jem.20211846 (2022). 123 Uhlitz, F. et al. Mitogen-activated protein kinase activity drives cell trajectories in colorectal cancer. Embo Molecular Medicine 13, e14123, doi:10.15252/emmm.202114123 (2021). 124 Zhang, L. et al. Lineage tracking reveals dynamic relationships of T cells in colorectal cancer. Nature 564, 268-+, doi:10.1038/s41586-018-0694-x (2018). 125 Che, L. H. et al. A single-cell atlas of liver metastases of colorectal cancer reveals reprogramming of the tumor microenvironment in response to preoperative chemotherapy. Cell Discov 7, 80, doi:10.1038/s41421-021-00312-y (2021). 126 Becker, W. R. et al. Single-cell analyses define a continuum of cell state and composition changes in the malignant transformation of polyps to colorectal cancer. Nature Genetics 54, 985-+, doi:10.1038/s41588-022-01088-x (2022). 127 Joanito, I. et al. Single-cell and bulk transcriptome sequencing identifies two epithelial tumor cell states and refines the consensus molecular classification of colorectal cancer. Nat Genet 54, 963-975, doi:10.1038/s41588-022-01100-4 (2022). 128 Luoma, A. M. et al. Molecular Pathways of Colon Inflammation Induced by Cancer Immunotherapy. Cell 182, 655-671.e622, doi:10.1016/j.cell.2020.06.001 (2020). 129 Beumer, J. et al. High-Resolution mRNA and Secretome Atlas of Human Enteroendocrine Cells. Cell 181, 1291-+, doi:10.1016/j.cell.2020.04.036 (2020). 130 He, G. W. et al. Optimized human intestinal organoid model reveals interleukin-22-dependency of paneth cell formation. Cell Stem Cell 29, 1333-1345 e1336, doi:10.1016/j.stem.2022.08.002 (2022). 
131 Holloway, E. M. et al. Mapping Development of the Human Intestinal Niche at Single-Cell Resolution. Cell Stem Cell 28, 568-+, doi:10.1016/j.stem.2020.11.008 (2021). 132 Yu, Q. H. et al. Charting human development using a multi-endodermal organ atlas and organoid models. Cell 184, 3281-+, doi:10.1016/j.cell.2021.04.028 (2021). 133 Gao, S. et al. Tracing the temporal-spatial transcriptome landscapes of the human fetal digestive tract using single-cell RNA-sequencing (vol 20, pg 721, 2018). Nature Cell Biology 20, 1227-1227, doi:10.1038/s41556-018-0165-5 (2018). 134 Sanchez, J. G., Enriquez, J. R. & Wells, J. M. Enteroendocrine cell differentiation and function in the intestine. Curr Opin Endocrinol Diabetes Obes 29, 169-176, doi:10.1097/med.0000000000000709 (2022). 135 Du, Y. et al. An update on the biological characteristics and functions of tuft cells in the gut. Front Cell Dev Biol 10, 1102978, doi:10.3389/fcell.2022.1102978 (2022). 136 Bolton, C. et al. An Integrated Taxonomy for Monogenic Inflammatory Bowel Disease. Gastroenterology 162, 859-876, doi:10.1053/j.gastro.2021.11.014 (2022). 137 Kanke, M. et al. Single-Cell Analysis Reveals Unexpected Cellular Changes and Transposon Expression Signatures in the Colonic Epithelium of Treatment-Naive Adult Crohn's Disease Patients. Cell Mol Gastroenterol Hepatol 13, 1717-1740, doi:10.1016/j.jcmgh.2022.02.005 (2022). 138 Martin, J. C. et al. Single-Cell Analysis of Crohn's Disease Lesions Identifies a Pathogenic Cellular Module Associated with Resistance to Anti-TNF Therapy. Cell 178, 1493-1508 e1420, doi:10.1016/j.cell.2019.08.008 (2019). 139 Uniken Venema, W. T. et al. Single-Cell RNA Sequencing of Blood and Ileal T Cells From Patients With Crohn's Disease Reveals Tissue-Specific Characteristics and Drug Targets. Gastroenterology 156, 812-815 e822, doi:10.1053/j.gastro.2018.10.046 (2019). 140 Chen, E. et al. 
Inflamed Ulcerative Colitis Regions Associated With MRGPRX2-Mediated Mast Cell Degranulation and Cell Activation Modules, Defining a New Therapeutic Target. Gastroenterology 160, 1709-1724, doi:10.1053/j.gastro.2020.12.076 (2021). 141 Devlin, J. C. et al. Single-Cell Transcriptional Survey of Ileal-Anal Pouch Immune Cells From Ulcerative Colitis Patients. Gastroenterology 160, 1679-1693, doi:10.1053/j.gastro.2020.12.030 (2021). 142 Mitsialis, V. et al. Single-Cell Analyses of Colon and Blood Reveal Distinct Immune Cell Signatures of Ulcerative Colitis and Crohn's Disease. Gastroenterology 159, 591-608 e510, doi:10.1053/j.gastro.2020.04.074 (2020). 143 Qi, J. J. et al. Single-cell and spatial analysis reveal interaction of FAP(+) fibroblasts and SPP1(+) macrophages in colorectal cancer. Nat Commun 13, 1742, doi:10.1038/s41467-022-29366-6 (2022). 144 Goulet, O., Pigneur, B. & Charbit-Henrion, F. Congenital enteropathies involving defects in enterocyte structure or differentiation. Best Pract Res Clin Gastroenterol 56-57, 101784, doi:10.1016/j.bpg.2021.101784 (2022). 145 Kelsen, J. R. & Baldassano, R. N. The role of monogenic disease in children with very early onset inflammatory bowel disease. Curr Opin Pediatr 29, 566-571, doi:10.1097/mop.0000000000000531 (2017). 146 Karim, A., Tang, C. S. & Tam, P. K. The Emerging Genetic Landscape of Hirschsprung Disease and Its Potential Clinical Applications. Front Pediatr 9, 638093, doi:10.3389/fped.2021.638093 (2021). 147 Gunther, C., Winner, B., Neurath, M. F. & Stappenbeck, T. S. Organoids in gastrointestinal diseases: from experimental models to clinical translation. Gut 71, 1892-1908, doi:10.1136/gutjnl-2021-326560 (2022). 148 Ishikawa, K. et al. Identification of Quiescent LGR5(+) Stem Cells in the Human Colon. Gastroenterology, doi:10.1053/j.gastro.2022.07.081 (2022). 149 Dmitrieva-Posocco, O. et al. beta-Hydroxybutyrate suppresses colorectal cancer. Nature 605, 160-165, doi:10.1038/s41586-022-04649-6 (2022). 
150 Fawkner-Corbett, D. et al. Spatiotemporal analysis of human intestinal development at single-cell resolution. Cell 184, 810-+, doi:10.1016/j.cell.2020.12.016 (2021). 151 Cao, J. et al. A human cell atlas of fetal gene expression. Science 370, eaba7721, doi:10.1126/science.aba7721 (2020). 152 Li, N. et al. Memory CD4+ T cells are generated in the human fetal intestine. Nat Immunol 20, 301-312, doi:10.1038/s41590-018-0294-9 (2019). 153 Nguyen, Q. H., Pervolarakis, N., Nee, K. & Kessenbrock, K. Experimental Considerations for Single-Cell RNA Sequencing Approaches. Frontiers in Cell and Developmental Biology 6, 108, doi:10.3389/fcell.2018.00108 (2018). 154 Haque, A., Engel, J., Teichmann, S. A. & Lonnberg, T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Medicine 9, 75, doi:10.1186/s13073-017-0467-4 (2017).

Acknowledgements
This publication is part of the Human Cell Atlas (HCA): www.humancellatlas.org/publications. The HCA initiative receives funding from The Wellcome Trust, the UK Research and Innovation Medical Research Council, EU Horizon 2020, INSERM (HuDeCA), and the Knut and Alice Wallenberg and Erling-Persson foundations. We thank the HCA Executive Office for their support. The Gut Cell Atlas is organised by The Leona M. and Harry B. Helmsley Charitable Trust, which provides funding for members in the form of project grants. MZ was supported by an MRC New Investigator research grant (MR/T001917/1) and a project grant from the Great Ormond Street Hospital Children’s Charity, Sparks (V4519). KSL was supported by NIDDK R01DK103831 and The Helmsley Charitable Trust (G-1903-03793). STM received funding from the National Institutes of Health, USA (R01DK115806 and P30DK034987). TS was supported by the Japan Science and Technology Agency (JST) FOREST programme and the Japan Society for the Promotion of Science (JSPS) (21K18272). LAC and KTW were supported by The Helmsley Charitable Trust (G-1903-03793).
KTW was also supported by NIDDK R01DK128200. LAC was supported by a Veterans Affairs Merit Award 1I01BX004366. MK was supported by the National Research Foundation, South Africa (grant no. 129356).

Author contributions
M.Z., K.R.J., M.K., S.P., Z.L., A.B., J.R.T., J.B., F.L.J., F.P., A.Ross, N.S., R.B.C., E.S.B., R.Z., B.X., K.L., S.D., S.T.M., Q.Y., S.B., M.J.A., A.D.S., L.C., J.G., R.B., I.P., J.O.M., S.A.T., M.P.S., K.T.W. researched data for the article. M.Z., K.R.J., M.K., S.P., Z.L., A.B., J.R.T., J.B., F.L.J., F.P., A.Ross, G.M., N.S., T.S., A.M., R.B.C., E.S.B., R.Z., B.X., K.L., S.D., S.T.M., Q.Y., S.B., M.J.A., A.D.S., L.C., J.G., R.B., I.P., J.O.M., G.E.B., A.H., S.A.T., A.Regev, R.J.X., M.P.S., K.T.W. contributed substantially to discussion of the content. M.Z., K.R.J., M.K., S.P., Z.L., A.B., J.R.T., J.B., F.L.J., F.P., A.Ross, N.S., T.S., R.B.C., E.S.B., R.Z., B.X., K.L., S.D., S.T.M., Q.Y., S.B., M.J.A., A.D.S., L.C., J.G., R.B., I.P., G.E.B., S.A.T., M.P.S., K.T.W. wrote the article. M.Z., K.R.J., M.K., S.P., Z.L., A.B., J.R.T., J.B., F.L.J., F.P., A.Ross, G.M., T.S., A.M., R.B.C., E.S.B., R.Z., B.X., K.L., S.D., S.T.M., Q.Y., S.B., M.J.A., A.D.S., L.C., J.G., R.B., I.P., J.O.M., G.E.B., A.H., S.A.T., A.Regev, R.J.X., A.S., M.P.S., K.T.W. reviewed and/or edited the manuscript before submission.

Competing interests
In the past three years, SAT has consulted or been a member of scientific advisory boards at Roche, Genentech, Biogen, GlaxoSmithKline, Qiagen and ForeSite Labs and is an equity holder of Transition Bio. GM has received grant funding from Boehringer Ingelheim. A.Regev is a co-founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas, and was an SAB member of ThermoFisher Scientific, Syros Pharmaceuticals, Neogene Therapeutics and Asimov until 31 July 2020. Since 1 August 2020, A.Regev has been an employee of Genentech and has equity in Roche. A.Regev is an inventor on patents and patent applications filed at the Broad related to single cell genomics.

Peer review information
[Au: Placeholder for referee accreditation, please do not delete] Nature Reviews Gastroenterology & Hepatology thanks [Referee#1 name], [Referee#2 name] and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Related links
The Human Cell Atlas – Metadata: https://data.humancellatlas.org/metadata
HCA Gut Bionetwork: https://www.humancellatlas.org/biological-networks/
Human BioMolecular Atlas Program: https://commonfund.nih.gov/hubmap
Helmsley Charitable Trust (Gut Cell Atlas): https://helmsleytrust.org/our-focus-areas/crohns-disease/crohns-disease-therapeutics/gut-cell-atlas/
Single Cell Expression Atlas: https://www.ebi.ac.uk/gxa/sc/home
Tabula Sapiens: https://tabula-sapiens-portal.ds.czbiohub.org
University of California at Santa Cruz (UCSC) Cell Browser: https://cells.ucsc.edu
Single Cell Portal: https://singlecell.broadinstitute.org/single_cell
Gut Cell Atlas: https://www.gutcellatlas.org

Supplementary information
Supplementary information is available for this paper at https://doi.org/10.1038/s415XX-XXX-XXXX-X

Author contributions
MZ, KTW, AS, MPS and RJX are coordinators of the HCA Gut Biological Network. MZ conceived the idea, co-ordinated the writing process, wrote parts of the paper and edited all sections. SP was section lead for methodologies. ZL was section lead for computational tools and challenges. MK was section lead for the integration of the HGCA with gut diseases and organoids. KJ was section lead for the summary of existing datasets and portals. AGB was section lead for the Common Coordinate Framework (CCF). AH helped to design and create the figures. All other authors wrote parts of the paper, contributed to critical discussions and provided feedback on the entire manuscript.
Key points:
· The number of studies applying single-cell sequencing methods to human intestinal tissue has been increasing rapidly, providing a unique opportunity to generate a complete map of the human intestine.
· The generation of a Human Gut Cell Atlas (HGCA) requires the coordinated efforts of groups across the globe and the integration of various datasets, followed by their computational analysis.
· This article provides a roadmap for the generation of the HGCA based on the expertise and recommendations of the Human Cell Atlas Gut Biological Network.
· The HGCA will provide a unique and highly valuable reference map, enhancing research in intestinal health and disease.

Figure 1: Profiling the human digestive tract: A complete map of the human intestinal tract requires the inclusion of the entire intestinal tube and associated organs (left panel). Profiling must also consider the effect of the developmental stage, the gut microbiome, and the potential effect of environmental factors such as diet, toxins and medication. In addition to transcriptional profiling, capturing the underlying genome sequence and the epigenetic programme will yield critical information (right panel).

Figure 2: Summary of main tissue types and sampling strategies available. Main advantages and limitations are illustrated. A. Anatomical outline of the human digestive tract including the intestinal tube and associated organs. B. Key aspects to consider when profiling the human gastrointestinal tract, including axial and cephalo-caudal differences, the developmental age of tissue donors and the effects of the microbiome and environmental factors on cellular function. Profiling epigenetic mechanisms as a molecular mediator between the cellular genome and the environment. scChIP-seq = single cell chromatin immunoprecipitation sequencing, scATAC-seq = single cell Assay for Transposase-Accessible Chromatin sequencing, scRRBS-seq = single cell Reduced Representation Bisulfite sequencing.
Figure 3: Data integration, processing and analysis strategies: 1) Generation of the HGCA requires successful integration of various datasets, datatypes, and associated patient and donor metadata. 2 and 3) Successful integration will allow a range of queries to be performed and outputs generated. Simultaneous profiling of epigenetic mechanisms on a single cell level will allow identification of regulatory cellular networks and help to define cellular identities.

Figure 4: Current applications for the Human Gut Cell Atlas Common Coordinate Framework: a: HuBMAP Registration User Interface (CCF-RUI) showing the gut visibility controls in the left panel, the interactive block registration interface in the centre panel and the block specification controls in the right panel. b: Gut Atlas CCF browser interface showing the one-dimensional conceptual model for the full large and small intestines, with zoom panel and ontology listings at the top and two-dimensional and three-dimensional interactive displays for all of the available models. The position of the current region of interest (ROI) is displayed in each CCF view, and the views are fully interoperable in the sense that a position selected in any view is updated immediately in all the other views. c: EBI SCEA anatomogram for the large and small intestines allowing selection of any structure identified in the ASCT tables. d: Expanded view of the anal canal to show relevant cell types and tissue organisation of that region.

Figure 5: HGCA in health and disease: Integration of datasets generated from intestinal tissues obtained from healthy individuals and patients with related diseases including IBD and colorectal cancer will enhance the future value of the HGCA. Single cell profiling of intestinal organoids followed by their integration in the HGCA will provide unique opportunities for translational research including regenerative medicine, drug testing and development.

Table 3.
Summary of existing studies that have applied single-cell profiling methods to human intestinal tissue samples obtained from healthy individuals, patients diagnosed with colorectal cancer or inflammatory bowel diseases, and patient-derived intestinal organoids.

Study | Donor and patients | Gut segment sampled | Tissue/sample type | Number of cells and main/key cell types
Williams et al., Cell 2021 (ref. 93) | Healthy individuals | Oral cavity | Biopsies of buccal and gingival mucosa | 88,000 cells; Epithelial, endothelial, fibroblast and immune lineages
Zhao et al., J Cell Biochem 2022 (ref. 94) | Healthy individuals | Oral cavity | Buccal mucosa | 26,398 cells; Fibroblasts, immune, endothelial, melanocytes, myofibroblasts, epithelial and neuron-like cells
Caetano et al., eLife 2021 (ref. 95) | Healthy individuals and patients with periodontitis | Oral cavity | Resections of gingival tissue from healthy individuals or patients with periodontitis | 12,411 cells; Epithelial, stromal, immune, endothelial, perivascular
Busslinger et al., Cell Reports 2021 (ref. 96) | Healthy individuals | Esophagus, stomach, duodenum | Mucosal biopsies | 4,581 cells; Esophageal squamous, gastric glandular, and duodenal crypt and villus epithelia
Nowicki-Osuch et al., Science 2021 (ref. 97) | Deceased organ donors and Barrett’s esophagus patients | Esophagus, stomach, duodenum | Mucosal biopsies and surgical resection material | 43,000 cells; Epithelial cells, including squamous basal and superficial cells, gastric foveolar, endocrine, parietal, and chief cells
Zhang et al., Gut 2021 (ref. 98) | Patients diagnosed with gastric adenocarcinoma and healthy controls | Stomach | Mucosal biopsies | 27,667 cells; Epithelial, stromal and immune cells
Sorini et al., JCI Insight 2023 (ref. 99) | Patients undergoing gastric resection for obesity (with and without Helicobacter pylori infection) | Stomach | Surgical resection | 22,033 cells; T cells, B cells, ILCs, myeloid cells
Foong et al., Neurogastroenterology Motility 2022 (ref. 100) | Patients undergoing gastric resection for obesity | Stomach | Surgical resection | 424 cells; Interstitial cells of Cajal and hematopoietic cells
Kumar et al., Cancer Discovery 2022 (ref. 101) | Patients diagnosed with gastric adenocarcinoma | Stomach | Mucosal biopsies and surgical resection | 152,423 cells; Epithelial, stromal and immune cells
Atlasy et al., Nat. Comm. 2022 (ref. 102) | Patients diagnosed with Coeliac disease and healthy controls | Small bowel (duodenum) | Mucosal biopsies | 6,248 cells; Immune cells including B cells, T cells, macrophages and dendritic cells
Egozi et al., bioRxiv 2022 (ref. 103) | Infants diagnosed with necrotizing enterocolitis and controls | Small bowel | Surgical resection | 11,308 cells; Myeloid, B cells, T/NK cells, lymphatic/blood endothelial cells, fibroblasts, and enterocyte populations
Domínguez Conde et al., Science 2022 (ref. 104) | Deceased donors | Small and large bowel (also lung, spleen, bone marrow and lymphoid tissue) | Resection | 360,000 cells; Mucosal immune cells
Burclaff et al., CMGH 2022 (ref. 105) | Transplant donors | Small bowel (duodenum, jejunum, ileum), large bowel (ascending, transverse and descending colon) | Surgical resection | 428,000 cells; Immune, epithelial, mesenchymal, endothelial, neural and red blood cells
Elmentaite et al., Nature 2021 (ref. 25) | Deceased donors, healthy individuals, patients diagnosed with inflammatory bowel diseases, human fetal gut | Small bowel (duodenum, jejunum, ileum), appendix, large bowel (caecum, ascending colon, transverse colon, descending colon, sigmoid colon, rectum) | Surgical resection and mucosal biopsies | 428,000 cells; Immune, epithelial, mesenchymal, endothelial, neural and red blood cells
Beumer et al., Cell 2020 (ref. 129) | Patients diagnosed with colorectal cancer, healthy individuals | Small bowel (duodenum, ileum), large bowel (ascending colon) | Intestinal organoids generated from surgical resection or mucosal biopsies | 4,281 cells; Enteroendocrine cells, enterocytes, stem, goblet and Paneth cells
Wang et al., J Exp Med 2020 (ref. 106) | Patients with intestinal tumours | Small bowel (ileum), large bowel (colon and rectum) | Mucosal resection (sampled 10 cm away from tumour) | 14,537 cells; Enterocytes, goblet cells, Paneth cells, enteroendocrine cells, progenitor cells, TA and stem cells
Huang et al., Cell 2019 (ref. 110) | Paediatric patients diagnosed with Crohn’s disease, ulcerative colitis and healthy controls | Appendix, large bowel (cecum, ascending colon, transverse colon, descending colon, sigmoid colon, rectum) | Mucosal biopsies | 73,165 cells; Epithelial cells, stromal cells, and immune cells including myeloid cells, B cell subsets, plasma cells, and T and NK cells
Jaeger et al., Nat Comm 2021 (ref. 107) | Patients diagnosed with Crohn’s disease and non-IBD controls (bowel cancer) | Small bowel (terminal ileum) | Surgical resection | 15,731 IELs & 29,247 LP cells; T cells
Elmentaite et al., Dev Cell 2020 (ref. 18) | Paediatric patients diagnosed with Crohn’s disease and healthy controls, human fetal gut | Small bowel (terminal ileum), colon (fetal colon) | Mucosal biopsies (paediatric patients), resection (fetal gut) | 62,854 (fetal) & 11,302 (paediatric) cells; Immune, epithelial, mesenchymal, endothelial, neural and red blood cells
Drokhlyansky et al., Cell 2020 (ref. 33) | Patients diagnosed with colorectal cancer | Large bowel (colon) | Surgical resection | 436,202 cells; Adipose, epithelial, glial, mesenchymal, endothelial, T cell, fibroblast, macrophage, myocyte, neuronal subsets from muscularis propria
Kinchen et al., Cell 2018 (ref. 111) | Patients diagnosed with ulcerative colitis and healthy controls | Large bowel (colon) | Mucosal biopsies | 4,378 cells; Stromal cell subsets in healthy human colon and UC colon
Parikh et al., Nature 2019 (ref. 112) | Patients diagnosed with ulcerative colitis and healthy controls | Large bowel (colon) | Mucosal biopsies | 11,175 cells; Intestinal epithelial cells, including progenitor cells, colonocytes and goblet cells
Corridoni et al., Nature Medicine 2020 (ref. 113) | Patients diagnosed with ulcerative colitis and healthy controls | Large bowel (colon) | Mucosal biopsies | 8,581 cells; CD8+ T cells
James et al., Nat. Immunol. 2020 (ref. 114) | Transplant donors | Large bowel (cecum, transverse colon, sigmoid colon and mesenteric lymph nodes) | Surgical resection | 40,108 cells; B cells, T cells, ILCs and myeloid cells. Includes microbiota data
Kong et al., Immunity 2023 (ref. 108) | Patients diagnosed with Crohn’s disease and healthy controls | Small bowel (terminal ileum), large bowel (sigmoid colon) | Mucosal biopsies | 720,633 cells; Intestinal epithelium, myofibroblasts, immune cells, stromal cells
Smillie et al., Cell 2019 (ref. 82) | Patients diagnosed with ulcerative colitis and healthy controls | Large bowel (colon) | Mucosal biopsies | 107,784 cells; Epithelial, stromal, and immune cells
Kondo et al., Gastroenterology 2021 (ref. 109) | Patients diagnosed with ulcerative colitis, Crohn’s disease and healthy controls | Small bowel (terminal ileum), large bowel (colon) | Mucosal biopsies | >300,000 cells; Epithelial, stromal, T, B, plasma, macrophage, myeloid cells from 31 protein markers
Lee et al., Nature Genetics 2020 (ref. 115) | Patients diagnosed with colorectal cancer | Large bowel (colon) | Surgical resection | 93,333 cells; Goblet cells, stem-like transit amplifying cells, colonocytes, fibroblasts, glial cells, endothelial cells, macrophages, dendritic cells, T and NK cells, plasma cells, B cells
Uzzan et al., Nat Med 2022 (ref. 116) | Patients diagnosed with ulcerative colitis and healthy controls | Large bowel (left colon) | Mucosal biopsies | 18,720 cells; T cells, B cells, macrophages, plasma cells, stromal cells
Pelka et al., Cell 2021 (ref. 117) | Patients diagnosed with colorectal cancer | Large bowel (colon) | Surgical resection | 371,223 cells; Immune cells, endothelial cells, fibroblasts, epithelial cells
Chen et al., Cell 2021 (ref. 118) | Patients diagnosed with colorectal cancer, intestinal polyps and healthy controls | Large bowel (colon) | Surgical resection, intestinal polyps | 142,065 cells; Intestinal epithelium, immune cells, fibroblasts
Li et al., Nat Genet 2017 (ref. 119) | Patients diagnosed with colorectal cancer | Large bowel (colon) | Surgical resection | 622 cells; Epithelial cells, fibroblasts, endothelial cells, mast cells, immune cells
Qian et al., Cell Research 2020 (ref. 120) | Patients diagnosed with colorectal cancer | Large bowel (colon) | Surgical resection | 44,685 cells; T cells, B cells, macrophages, dendritic cells, fibroblasts, endothelial cells, epithelial cells, enteric glia
Zhang et al., Cell 2020 (ref. 121) | Patients diagnosed with colorectal cancer | Large bowel (colon) | Surgical resection | 43,817 cells (10X Genomics), 10,4