Translational regulation in aggressive B-cell lymphomas
Joanna Alicja Krupka
Gonville and Caius College
This dissertation is submitted in Easter Term, 2021 for the degree of Doctor of Philosophy
Declaration
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except as declared in the Preface and specified in the text. It is not substantially the same as any that I have submitted, or am concurrently submitting, for a degree or diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text. I further state that no substantial part of my dissertation has already been submitted, or is being concurrently submitted, for any such degree, diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text. This dissertation does not exceed the prescribed limit of 60 000 words.
Joanna Alicja Krupka
September, 2021
Abstract
The Germinal Centre (GC) reaction is a dynamic process in which B-cells undergo recombination and somatic hypermutation of immunoglobulin genes in response to antigen stimulation. This essential component of the adaptive immune system involves cycles of intensive proliferation and selection, which carry a risk of malignant transformation. Aggressive lymphomas arising from the GC stage of B-cell development are the most common haematological malignancies, with heterogeneous molecular mechanisms and clinical presentation. Although the last decade has witnessed considerable advances in our understanding of the GC reaction and related tumours, these studies have focused predominantly on the network of transcription factors.
Advances in Next Generation Sequencing technologies have opened new possibilities to explore mechanisms of regulation beyond the level of transcription. Ribo-Seq is a technique combining ribosome footprinting with deep sequencing of the protected mRNA fragments, which makes it possible to map the positions of translating ribosomes with single-nucleotide precision. Here I investigate the mechanisms of translational regulation contributing to lymphoma development. Firstly, I introduce RiboStream, an automated bioinformatic pipeline designed to streamline the processing of Ribo-Seq datasets while maintaining the transparency and reproducibility of the computational workflow. Then, I provide an overview and benchmarking of current methods for identifying translationally regulated genes. Based on this benchmark, I select an analysis strategy and use it to show that overexpression of either of two B-cell oncogenes, BCL6 or MYC, is followed by preferential translation of selected transcripts. Next, I show that loss-of-function mutations in the RNA helicase DDX3X promote early development of MYC-driven lymphoma by buffering the effects of MYC on the translation of ribosomal proteins and on the rate of global protein synthesis. Finally, I explore the genome-wide distribution of translating ribosomes to study the scope of non-canonical translation in lymphoid cells. Taking advantage of a large dataset of 79 Ribo-Seq libraries, I reveal pervasive translation of ostensibly non-coding regions and design a knock-down CRISPR screen library to identify those important for B-cell survival.
Acknowledgements
First and foremost, I would like to express my gratitude to my supervisor, Dr. Daniel Hodson, who took the risk of having me in his lab and gave me the freedom to discuss problems I found interesting. His expertise, advice and sense of humour were invaluable. I would like to thank Dan for being a trusted mentor and friend on my way to becoming a scientist. I am also thankful to my second supervisor, Dr.
Shamith Samarajiwa, for his enthusiasm in guiding me through my first steps in bioinformatics. He shared all his computational resources with me, giving me an unrestrained opportunity to explore my ideas, for which I am immensely grateful. I am also indebted to Dr. Martin Turner for inspiring discussions and encouragement to look ahead into unexplored territories of science. I would also like to thank all members of the Hodson and Samarajiwa labs, who made the last four years a good-humoured and colourful time. Special thanks go to Jie Gao for her strength of spirit in preparing all the Ribo-Seq libraries and to Chun Gong, Mata Vorri and Hendrik Runge for fruitful collaboration. My time in Cambridge would not have been the same without Dora Bihary, Shoko Hirosue, Cassandra Kosmidou, David Shorthouse and Katie Young. Thank you for sharing your time with me and an entire series of pub events, which helped me get through challenging times. I would not be where I am today without my dearest friends, who stayed in Poland. My PhD adventure would not have been possible without Dr. Agnieszka Graczyk-Jarzynka, who one day in November 2015 welcomed me to the Department of Immunology (Medical University of Warsaw) and became my first mentor. I am also grateful to Julia, Sara, Kasia and Ania, who, despite being thousands of kilometres away, always kept my spirits up. All of this would not have been possible without the support of Cancer Research UK Cambridge Centre and Addenbrooke’s Charitable Trust, who funded my research. Finally, I would like to thank my family. I am grateful to my parents, brother and grandparents for believing in me and supporting even the most bizarre of my ideas, to Alek’s mom and grandma for our weekly chats on FaceTime, and lastly to Alek for his endless patience and fabulous lemon tarts.
Contents
1 Introduction
1.1 Molecular basics of RNA translation
1.1.1 Four stages of protein synthesis
1.1.2 Overview of ribosome biogenesis
1.2 Translational control
1.2.1 Regulation of the translation initiation
1.2.2 Regulation of translation elongation
1.2.3 Regulation of translation termination
1.3 Deregulation of translation in human cancers
1.3.1 Oncogenic and tumour suppressor pathways converge at controlling protein synthesis
1.3.2 Ribosome biogenesis and its oncogenic potential
1.3.3 Translation factors are frequently deregulated in cancer
1.3.4 The significance of translational response to stress in cancer
1.4 Elements of B-cell biology from the perspective of lymphoma development
1.5 Role of translation in B-cell development and malignancy
1.6 Toolkit to study heterogeneity of translation
1.7 Project aims
2 Materials and methods
2.1 Materials
2.1.1 Overview of genomic sequences and annotations used in this study
2.1.2 Overview of external datasets used in this study
2.1.3 Overview of computational software used in this study
2.2 Methods
2.2.1 Next Generation Sequencing library preparation and sequencing
2.2.1.1 RNA-Seq
2.2.1.2 Ribo-Seq
2.2.2 Processing and quality control of Next Generation Sequencing data
2.2.2.1 Adapter trimming and alignment to the reference genome
2.2.2.2 Read counting
2.2.3 Differential translation analysis of Ribo-Seq data
2.2.4 Differential expression and downstream analysis of RNA-Seq data
2.2.5 Metagene analysis of iCLIP and Ribo-Seq data
2.2.6 Differential expression analysis of RNA-Seq data
2.2.7 Downstream data analysis
2.2.7.1 Individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP)
2.2.8 Identification of DDX3X mutations from RNA-Seq data
2.2.9 DLBCL Cell-of-origin identification from RNA-Seq data
2.2.10 Chromosome Y expression identification from RNA-Seq data
2.2.11 Hierarchical de novo identification of translated regions from Ribo-Seq data
2.2.12 Reanalysis of published mass spectrometry datasets
2.2.13 Analysis of proteogenomic data downloaded from OpenProt and sORFdb
2.2.14 Evolutionary conservation of identified ORFs
2.2.15 CRISPR screen design
2.2.16 Figures preparation
2.2.17 R and Bioconductor
3 Genome wide quantification of translation in lymphoid malignancies
3.1 Background
3.1.1 Establishing a bioinformatic pipeline for processing of translatome and transcriptome data
3.1.2 Quality of translatome profiling in primary GC B-cells
3.1.3 Benchmarking statistical approaches for differential translation analysis
3.2 Translational regulation following BCL6 and MYC overexpression in primary GC B-cells
3.3 Discussion
4 Mutations in RNA helicase DDX3X facilitate MYC-driven lymphomagenesis
4.1 Background
4.2 Results
4.2.1 Examining the prevalence and distribution of DDX3X mutations
4.2.1.1 DDX3X is preferentially mutated in MYC driven lymphomas
4.2.1.2 Context dependent pattern of DDX3X mutation in different cancer types
4.2.1.3 DDX3X mutations in B-cell lymphomas cluster within C-terminal helicase domain
4.2.1.4 Males with Burkitt Lymphoma and DLBCL are more likely to have DDX3X mutation
4.2.2 DDX3X regulates ribosome biogenesis and global protein synthesis
4.2.2.1 DDX3X binds preferentially to mRNA encoding components of core translation machinery
4.2.2.2 DDX3X regulates translation of a subset of expressed transcripts
4.2.3 Deregulation of MYC in primary GC B-cells increases ribosome biogenesis and triggers ER stress
4.2.4 DDX3X mutation interferes with endoplasmic reticulum stress response
4.2.4.1 DDX3X R475C mutation is associated with suppression of unfolded protein response in U2932 cells
4.2.4.2 DDX3X mutation is associated with suppression of unfolded protein response in BL patients
4.2.5 Up-regulation of DDX3Y in established tumours rescues loss of DDX3 helicase activity
4.3 Discussion
5 Elucidating the role of translated micropeptides in Diffuse Large B-cell Lymphoma
5.1 Background
5.2 Results
5.2.1 A systematic approach for de novo identification of noncanonical translation products in lymphoid cells
5.2.1.1 An integrated ORF identification workflow
5.2.1.2 Pervasive translation of crude non-coding regions in lymphoid cells
5.2.2 Noncanonical ORFs account for about 10% of proteins detected in proteomics experiments
5.2.3 Characteristics of noncanonical ORFs producing MHC-bound peptides
5.2.4 Design of customised knockout CRISPR screen to identify noncanonical ORFs important for B-cells survival
5.3 Discussion
6 Perspectives
Bibliography
CHAPTER 1 Introduction
The ‘central dogma’ of molecular biology, proposed by Crick (1958), states that genetic information is expressed as protein through a sequence of molecular steps. First, a DNA sequence is rewritten (transcribed) into RNA, then decoded (translated) into an amino-acid chain, which eventually folds into a protein (Figure 1.1). For a long time, the translation process and its core components (the ribosomes and translation factors) were viewed as molecular machines passively processing all available transcripts.
This view was further reinforced by the ready availability of technologies for quantifying mRNA abundance, such as microarrays and RNA-Seq.
Figure 1.1: Francis Crick’s first draft of the central dogma of molecular biology from an unpublished note (1956)
However, the relationship between mRNA and protein levels is far from simple. Firstly, it is estimated that steady-state transcript levels in human cells can explain between 56% and 84% of the variance in protein abundance (Lundberg et al., 2010; Liu et al., 2016; Schwanhäusser et al., 2011; Jovanovic et al., 2015). The analysis performed by Jovanovic et al. (2015) showed that mRNA level, translation intensity and protein degradation rate together can explain up to 79% of the variance in total protein abundance, with 18-26% and 8-22% of this attributable to translation and protein degradation, respectively. Secondly, the relationship between an mRNA and its protein product may vary between cell types and conditions. Relative changes in protein abundance during dynamic cell transitions are explained predominantly by changes in mRNA abundance; however, the rates of synthesis and degradation vary substantially between individual proteins, which highlights the importance of regulation beyond transcription (Jovanovic et al., 2015; Mathieson et al., 2018). Therefore, there is a growing appreciation of the importance of regulation imposed at the point of translation. Protein synthesis consumes the majority of cellular energy resources, especially if a cell is biosynthetically active, rapidly growing, or differentiating (Rolfe and Brown, 1997; Buttgereit and Brand, 1995; Lynch and Marinov, 2015). Hence, precise regulation of translation appears energetically efficient, providing an opportunity to modulate cellular protein levels quickly. It is therefore not surprising that translational control is especially important for energy-demanding processes, such as cell proliferation, hormone release and stress response (Hershey et al., 2012).
Translation is also the primary mechanism of gene expression regulation in cells lacking active transcription (for example, oocytes or red blood cells) or during the early stages of viral infection, when host transcription is suppressed (Hershey et al., 2012; Mohr, 2016). Deregulated translation is a hallmark of cancer, but its exact role is complex and context dependent.
1.1 Molecular basics of RNA translation
Translation is an evolutionarily conserved process during which the nucleotide sequence of a messenger RNA (mRNA) is decoded into the amino-acid sequence of a protein according to a set of rules known as the genetic code. A CoDing Sequence (CDS) is a region of mRNA or genomic DNA whose nucleotide sequence determines the amino-acid sequence of a protein. It is usually flanked by two UnTranslated Regions (UTRs): the 5'UTR and the 3'UTR. Every three adjacent nucleotides of a CDS form a unit called a codon. There are 4 x 4 x 4 = 64 possible combinations of three nucleotides, three of which signal the end of a protein (stop codons); each of the remaining 61 corresponds to one of 20 amino acids. An Open Reading Frame (ORF) is a series of codons contained between a start and a stop codon. A given RNA sequence can be decoded in one of three possible reading frames, depending on the position of the first (start) codon. The codons of a protein-coding sequence are deciphered from the 5' end to the 3' end by transfer RNAs (tRNAs). tRNAs are small adaptor molecules that pair with a coding triplet through a complementary anticodon. The genetic code is redundant, which means that more than one codon can specify a single amino acid, and more than one tRNA molecule can match a single codon. Some tRNAs require a complete match between codon and anticodon, but for others a single mismatch (wobble) at the third position of the codon does not affect accurate amino-acid assignment. A tRNA carrying its matching amino acid is called an aminoacyl-tRNA.
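The codon arithmetic and the reading-frame concept above can be illustrated with a short Python sketch. It is a simplified, hypothetical helper (`find_orfs` is not part of any pipeline described in this thesis): it enumerates all 64 codons, separates the three stop codons from the 61 sense codons, and scans the three forward reading frames for AUG-initiated ORFs ending at the first in-frame stop codon. It deliberately ignores non-AUG start codons and the reverse strand, both of which matter for real ORF annotation.

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODON = "AUG"

# Every ordered triplet of the four RNA bases: 4 x 4 x 4 = 64 codons.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
# Removing the three stop codons leaves the 61 sense codons.
SENSE_CODONS = [codon for codon in ALL_CODONS if codon not in STOP_CODONS]


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) tuples for AUG...stop ORFs.

    Only the three forward reading frames are scanned. Coordinates are
    0-based, `end` is exclusive and includes the stop codon, and
    `min_codons` counts codons from the start codon up to (not including)
    the stop codon. DNA input (T) is converted to RNA (U).
    """
    rna = seq.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this frame only.
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # the first in-frame AUG opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # ORF closed; look for the next AUG
    return orfs


print(len(ALL_CODONS), len(SENSE_CODONS))  # 64 61
print(find_orfs("AUGAAAUAGCAUGCCCUGA"))  # [(0, 0, 9), (1, 10, 19)]
```

In this toy sequence one ORF is read in frame 0 (`AUG AAA UAG`) and a second in frame 1 (`AUG CCC UGA`), illustrating how the same stretch of RNA can encode different products depending on the reading frame.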
The majority of protein-coding mRNAs undergo two specific post-transcriptional modifications: polyadenylation of the 3' end and capping of the 5' end with a guanosine methylated at position 7 (m7G). The actual reactions of protein synthesis take place inside the ribosome, a large ribonucleoprotein complex (RNP) composed of multiple ribosomal proteins and ribosomal RNAs (rRNAs). A fully assembled eukaryotic ribosome (80S) consists of a small (40S) and a large (60S) subunit. When not involved in active translation, the ribosomal subunits are separated. During evolution, the ribosome became larger and more complex, but its core components are so highly conserved that the common ancestor of all living species must have synthesised its proteins in a similar way to organisms living today. The human ribosome contains 80 ribosomal proteins and four separate rRNA molecules, and rRNA accounts for more than 80% of the cellular RNA pool (Cech, 2000). Each ribosome has three tRNA binding sites: the A-site for aminoacyl-tRNA, the P-site for peptidyl-tRNA and the E-site for exit (Figure 1.2). Together, the three sites cover three adjacent codons, and their coordinated occupancy orchestrates the movement of the ribosome along the coding sequence. A ribosome decodes only one codon at a time, but several ribosomes can simultaneously translate a single mRNA molecule, forming a polysome.
Figure 1.2: Structural model of the eukaryotic 80S ribosome. Three sites for tRNA binding are highlighted: E-site, orange; P-site, violet; A-site, green. mRNA is shown in red. The estimated localisation of the ribosomal proteins is indicated by shaded areas: blue for the 60S and yellow for the 40S subunit. From Schuller and Green (2018).
1.1.1 Four stages of protein synthesis
The translation process consists of four phases: initiation, elongation, termination and recycling.
Translation initiation
Translation initiation is a multi-step process and is considered the principal point of translational control.
There are two main modes of translation initiation, 5' cap-dependent and 5' cap-independent, of which 5' cap-dependent initiation is the more common. The canonical pathway of 5' cap-dependent translation initiation can be divided into four stages. The ternary complex (TC), which consists of methionine-carrying tRNA (Met-tRNA or methionyl-tRNA), GTP and the eukaryotic Initiation Factor 2 (eIF2) complex, is loaded onto the small ribosomal subunit (40S), recycled from the previous round of translation with bound eIF1, eIF1A and eIF3 factors. The TC together with 40S, eIF1, eIF1A and eIF3 forms the 43S preinitiation complex (43S PIC) (Figure 1.3, steps 1-2) (Jackson et al., 2010). Recruitment of the 43S PIC onto mRNA requires the cooperation of the eIF4F complex, eIF4B and eIF4H. The eIF4F complex consists of the 5' cap-binding protein eIF4E, the RNA helicase eIF4A and the scaffold protein eIF4G (Figure 1.3, step 3). Apart from interacting with eIF4A and eIF4E, eIF4G also binds poly(A)-binding protein (PABP), which brings the mRNA into a circular shape with the 5' cap and the 3' poly(A) tail close together, creating the so-called closed-loop structure (Mangus et al., 2003; Taylor et al., 2020). The 43S PIC then starts base-by-base scanning of the 5'UTR, which consists of coordinated unwinding of 5'UTR secondary structures and movement of the 43S PIC towards the 3' end. Once the 43S PIC recognises a start codon (usually AUG, encoding methionine), scanning stops (Jackson et al., 2010). The optimal start codon is not necessarily the first AUG encountered. In most eukaryotic mRNAs, the start codon is framed by a Kozak consensus sequence, an RNA sequence context ensuring the fidelity of the translation initiation site (TIS) (Kozak, 1987). If the context of a potential start codon is weak (poorly resembling the Kozak sequence), the PIC may skip it and continue to search for the next start codon. This phenomenon is known as leaky scanning (Zhang et al., 2019). Selection of the optimal TIS is promoted by eIF1, which discriminates against non-AUG codons and AUG codons in a weak context.
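The leaky-scanning logic described above can be sketched in Python as a deliberately deterministic toy. The helper names are hypothetical, and the sketch makes two simplifying assumptions: context strength is scored only at the two positions most often cited as critical in the Kozak consensus (a purine at -3 and a G at +4, numbering the A of AUG as +1), and AUGs in a weak context are always skipped. In cells, leaky scanning is probabilistic and non-AUG codons can also serve as start sites.

```python
from typing import Optional


def kozak_context(mrna: str, pos: int) -> str:
    """Classify the AUG starting at 0-based `pos` as strong, adequate or weak.

    Simplification: only position -3 (A or G) and +4 (G) are scored,
    numbering the A of AUG as +1, so -3 is mrna[pos - 3] and +4 is mrna[pos + 3].
    """
    mrna = mrna.upper().replace("T", "U")
    minus3 = mrna[pos - 3] if pos >= 3 else ""  # absent if too close to the 5' end
    plus4 = mrna[pos + 3] if pos + 3 < len(mrna) else ""
    score = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


def first_used_start(mrna: str, skip=frozenset({"weak"})) -> Optional[int]:
    """Scan 5'->3' and return the position of the first AUG that is not skipped.

    AUGs whose context class is in `skip` are bypassed (toy leaky scanning).
    Returns None if no AUG is used.
    """
    mrna = mrna.upper().replace("T", "U")
    pos = mrna.find("AUG")
    while pos != -1:
        if kozak_context(mrna, pos) not in skip:
            return pos
        pos = mrna.find("AUG", pos + 1)  # keep scanning downstream
    return None
```

With the toy leader `CCUAUGCCCGCCAUGGCU`, the first AUG (position 3) has a C at both -3 and +4 and is skipped as weak, so scanning continues to the AUG at position 12, which sits in `GCCAUGG`, matching the core of the Kozak consensus. Passing `skip=frozenset()` disables leaky scanning and returns the first AUG instead.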
Commitment to a TIS is promoted by eIF5, a GTPase-activating protein that targets the eIF2 complex and leads to partial dissociation of eIF2 from the 40S subunit (Jackson et al., 2010). Assembly of the 80S ribosome at the TIS is mediated by eIF5B and followed by the release of eIF1, eIF1A, eIF3 and residual eIF2 (Figure 1.3, step 4). The ribosome is now ready to enter the elongation stage (Aitken and Lorsch, 2012; Hinnebusch, 2014).
Figure 1.3: Diagram showing a simplified process of 5' cap-dependent translation initiation in eukaryotic cells. The process of translation initiation is divided into four steps. Step 1. Ternary complex formation from GTP-bound eIF2 and methionyl-tRNA (Met-tRNA). Step 2. 43S preinitiation complex (43S PIC) assembly, which includes the ternary complex, the small ribosomal subunit 40S and translation initiation factors (eIF3, eIF1A and eIF1). Step 3. mRNA binding to the 43S PIC, promoted by the eIF4F complex and eIF4B, followed by 5'UTR scanning in the 5' to 3' direction. Selection of an optimal translation initiation site, typically an AUG codon framed by a Kozak sequence, is promoted by eIF1 and eIF5. Step 4. Recruitment of the large ribosomal subunit 60S leading to 80S ribosome assembly, and dissociation of the initiation factors mediated by eIF5B. Adapted from Protein Translation Cascade, by BioRender.com (2021). Retrieved from https://app.biorender.com/biorender-templates
A less common mechanism of translation initiation involves 5' cap-independent ribosome recruitment. Although cap-independent translation is not fully understood, the two proposed mechanisms involve Internal Ribosome Entry Sites (IRESs) and cap-independent translational enhancers (CITEs). Translation through an IRES is a common strategy employed by pathogenic viruses to escape the global halt of 5' cap-mediated translation in the host cell, but IRESs have also been found in eukaryotic genes, particularly those involved in stress or antiviral responses (Yang and Wang, 2019; Jackson et al., 2010).
IRESs can interact with canonical initiation factors and recruit the 40S ribosomal subunit through IRES trans-acting factors (ITAFs) (Komar and Hatzoglou, 2011; Yang et al., 2017; Meyer et al., 2015) or through short cis-elements that pair with 18S rRNA (Dresios et al., 2006; Yang and Wang, 2019). Because there is no conserved IRES sequence, the exact number of IRES-initiated open reading frames and of the factors regulating this mode of translation is unknown. An alternative mechanism of 5' cap-independent translation, also first described in viruses, involves CITEs, RNA structural elements within mRNAs that attract translation initiation factors (Shatsky et al., 2018). A proposed mechanism of CITE-mediated translation initiation involves reversible N6-methyladenosine (m6A) modification of mRNA within a GAC sequence context, which is recognised by YTHDF1 and followed by direct recruitment of the eIF3 complex (Meyer et al., 2015; Wang et al., 2015; Shatsky et al., 2018). Although this is still widely debated, translation initiated by both CITEs and IRESs may proceed without a full set of translation initiation factors (Terenin et al., 2013). It is important to highlight that 5' cap-independent translation is still largely unexplored territory, and controversies have arisen around the accuracy of IRES reporter assays and hence around the existence of IRESs in eukaryotic cells (Shatsky et al., 2018; Yang and Wang, 2019). Nevertheless, the concept of alternative translation initiation adds another layer of complexity to the cellular translatome, potentially broadening proteome diversity.
Translation elongation
Translation elongation takes place between the loading of the first aminoacyl-tRNA after the start codon and the encounter of the first in-frame stop codon. Translation proceeds in the 5'-to-3' direction, so the N-terminal end of a protein is synthesised first (Schuller and Green, 2018). The elongation stage consists of three major steps: tRNA binding, peptide bond formation and translocation.
After initiation is completed, the tRNA carrying the first N-terminal amino acid (usually Met-tRNA) is located in the middle slot of the ribosome (P-site). The next slot (A-site) exposes the next codon. Once a tRNA carrying the next amino acid binds to the complementary codon in the A-site, eukaryotic Elongation Factor 1 alpha 1 (eEF1A1) hydrolyses GTP to GDP, fixing the aminoacyl-tRNA within the A-site. The next stage is the formation of a peptide bond between the two adjacent amino acids. The reaction is catalysed by the peptidyl transferase centre of the large ribosomal subunit, after which the tRNA occupying the P-site releases its attached amino acid (or nascent peptide) to the aminoacyl-tRNA in the A-site. The ribosome is now in the pre-translocation state, with a peptidyl-tRNA (tRNA with the nascent peptide attached) in the A-site and a deacylated tRNA (tRNA without an attached amino acid) in the P-site (Schuller and Green, 2018). The last step involves a series of conformational changes that move the entire ribosome three nucleotides towards the 3' end, a reaction coupled to another GTP hydrolysis catalysed by eEF2. During translocation, the deacylated tRNA from the P-site is transferred to the E-site and released (Ratje et al., 2010). The A-site is now empty and ready to accept a new aminoacyl-tRNA. The elongation cycle then repeats until a termination codon appears in the A-site, with each cycle adding one new amino acid to the growing polypeptide chain (Figure 1.4). Efficient conformational changes of the ribosome during each elongation cycle are essential for maintaining the correct direction of translation. The energy for this process is provided by GTP hydrolysis performed by the two eukaryotic Elongation Factors 1 and 2 (eEF1 and eEF2). The accuracy of the ribosome in decoding mRNA into protein is estimated to reach almost 99.99% (1 misincorporation per 10^4 amino acids incorporated).
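To make an error rate of roughly one misincorporation per 10,000 residues more tangible, the sketch below estimates the probability that a full-length protein contains no misincorporated amino acid. It assumes, purely for illustration, that errors occur independently at a fixed rate of 10^-4 per residue, so the probability is (1 - 10^-4)^L for a protein of L residues; real error rates vary between codons and conditions, and the function name is hypothetical.

```python
def p_error_free(length_aa: int, error_rate: float = 1e-4) -> float:
    """Probability that a protein of `length_aa` residues carries no
    misincorporated amino acid, assuming independent errors at a constant
    per-residue `error_rate` (an illustrative simplification)."""
    if length_aa < 0 or not 0.0 <= error_rate <= 1.0:
        raise ValueError("length must be >= 0 and error_rate within [0, 1]")
    # Each residue is correct with probability (1 - error_rate).
    return (1.0 - error_rate) ** length_aa


if __name__ == "__main__":
    for length in (100, 400, 1000):
        print(f"{length:>5} aa: {p_error_free(length):.3f}")
```

Under these assumptions, roughly 99%, 96% and 90% of 100-, 400- and 1,000-residue proteins would be error-free, which illustrates why even a seemingly high per-residue accuracy leaves a measurable fraction of long proteins with at least one error.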
Because the release of a faulty protein product could have serious consequences for the cell, two proofreading mechanisms monitor each elongation cycle (Neelagandan et al., 2020). Correct codon-anticodon pairing is favoured because of its high affinity, which stabilises the interaction, affects rRNA folding around the tRNA-rRNA interaction and triggers GTP hydrolysis by eEF1. When the pairing is incorrect, elongation slows down, and the aminoacyl-tRNA cannot be fixed in the A-site and dissociates from the ribosome. An incorrect tRNA in the P-site increases the risk of further decoding errors, thus decreasing the chance of synthesising a full-length protein. Repeated amino-acid misincorporation may lead to premature termination of translation (Neelagandan et al., 2020).
Translation termination
The end of the coding sequence is marked by a stop codon (UAA, UAG or UGA). All three are recognised by eukaryotic release factor 1 (eRF1). Another release factor, eRF3, is a GTPase that promotes the hydrolysis of the peptidyl-tRNA bond in the P-site. This releases the C-terminal end of the newly synthesised amino-acid chain from the ribosome. The nascent polypeptide then completes folding in the cytoplasm. The post-termination complex (post-TC) disassembles and can be recycled to participate in another round of translation (Hellen, 2018).
Ribosome recycling
The ribosome recycling stage splits 80S ribosomes into separate subunits, preparing them for a new round of translation. This is initiated by the recruitment of ABCE1 to post-termination ribosomes with eRF1 in the A-site. After 80S disassembly, the last step is the release of deacylated tRNA and mRNA from the 40S subunit, which is mediated by the translation initiation factors eIF1, eIF1A and eIF3. The full process of translation termination and recycling was reviewed comprehensively by Hellen (2018).
Figure 1.4: Diagram showing the process of translation elongation in eukaryotic cells.
A) Schematic of the translation elongation cycle demonstrating how tRNA moves between the ribosomal sites. An aminoacyl-tRNA (green) recognises the codon in the ribosomal A-site. This is followed by peptide-bond formation and transfer of the nascent peptide chain to the aminoacyl-tRNA in the A-site, now a peptidyl-tRNA. The change in ribosome conformation after peptide-bond formation is referred to as a hybrid state, a transient conformation in which the anticodon loops of the tRNAs remain fixed in the P and A sites of the small ribosomal subunit, but the amino-acid ends of the tRNAs (the acceptor stems) are already in the E and P sites of the large ribosomal subunit (Schuller and Green, 2018). Translocation ends with the peptidyl-tRNA in the P-site and the A-site ready to accept the next aminoacyl-tRNA (Schuller and Green, 2018). B) Overview of the peptide-transfer reaction catalysed by the ribosome. Peptide-bond formation occurs through nucleophilic attack of the amino group of the new amino acid (bound to tRNA in the A-site) on the ester linkage of the peptidyl-tRNA (in the P-site) (Schuller and Green, 2018). This leaves a deacylated tRNA in the P-site and a peptidyl-tRNA in the A-site that is longer by one amino acid. From Schuller and Green (2018).
1.1.2 Overview of ribosome biogenesis
Ribosome biogenesis is a complex process involving the biosynthesis and assembly of the complete 80S human ribosome. Briefly, the 80S ribosome consists of two subunits, small and large, both composed of ribosomal RNA (rRNA) and ribosomal proteins (RPs). The small subunit (40S) is responsible for binding, scanning and unwinding the mRNA, while the function of the large subunit (60S) is to catalyse peptide bond formation and check the quality of the nascent peptide (Pelletier et al., 2018). The nucleolus, a nuclear substructure, is central to the process of ribosome biogenesis.
It is responsible for the transcription, processing and modification of rRNA and for the assembly of precursor ribosomal subunits (Lafontaine et al., 2021). rRNA sequences are organised in clusters of tandem repeats of ribosomal DNA (rDNA) in nucleolus organiser regions (NORs). NORs contain the 18S, 5.8S and 28S rRNA sequences (47S pre-rRNA) separated by spacers with regulatory sequences, and are distributed across the short arms of the five acrocentric chromosomes 13, 14, 15, 21 and 22 (Henderson et al., 1972; Pelletier et al., 2018). The 47S pre-rRNA is transcribed by RNA polymerase I (Pol I). Another ribosome component, the 5S rDNA cluster, is located on chromosome 1, outside the nucleolus, and is transcribed by RNA polymerase III (Pol III). In contrast, the mRNAs of RPs are transcribed by RNA polymerase II (Pol II) and exported to the cytoplasm for translation, and the resulting RPs are then re-imported into the nucleus to participate in ribosome assembly (Pelletier et al., 2018). In the nucleolus, the 47S pre-rRNA, 5S rRNA, numerous RPs and assembly factors co-transcriptionally form the 90S processome (pre-ribosome) (Figure 1.5). The next step is 90S maturation, which includes extensive base modifications and cleavage reactions, resulting in the separation of the pre-40S and pre-60S subunits (Pelletier et al., 2018; Lafontaine, 2015). This requires the activity of small nucleolar RNAs (snoRNAs) derived from introns of certain Pol II-transcribed genes. Finally, the subunits are exported to the cytoplasm for final maturation (incorporation of a few additional RPs and accessory factors), contributing to the assembly of the 80S ribosome for protein synthesis (Figure 1.5) (Tschochner and Hurt, 2003; Pelletier et al., 2018).
Figure 1.5: Overview of ribosome biogenesis. The components of eukaryotic ribosomes are the products of three RNA polymerases transcribing different parts of ribosomal RNA (rRNA) and the mRNA for ribosomal proteins (RPs).
Polymerase III (Pol III) is responsible for transcription of the 5S rRNA cluster in the nucleus, while polymerase I (Pol I) in the nucleolus produces 47S pre-rRNA containing 18S, 5.8S and 28S rRNA. As is typical for other protein-coding regions, the RP sequences are transcribed by Pol II in the nucleus. RP mRNA is then exported to the cytoplasm for translation, and the RPs are imported back into the nucleus. RPs, 47S pre-rRNA and 5S rRNA participate in the assembly of the 90S processome, which undergoes chemical modifications and cleavage resulting in the formation of two separate ribosomal subunits (pre-40S and pre-60S). Pre-40S and pre-60S are exported to the cytoplasm, where, after a final maturation step, they are ready to participate in protein synthesis. From Pelletier et al. (2018)

1.2 Translational control

Regulation of translation is a broad term covering mechanisms affecting different stages of protein synthesis. A given mechanism may affect translation globally or be specific to a single mRNA or a group of transcripts. A further distinction can be made between processes that modify the function of the core components of the translation machinery and those that affect RNA directly. From a molecular perspective, translational control includes a variety of mechanisms such as chemical modifications (e.g. phosphorylation or methylation), differential expression of translation factors, modulation of the 3D structure of mRNA, action of trans-acting RNA-binding proteins (RBPs), presence of cis-regulatory elements in mRNA or recruitment of microRNAs to 3'UTRs. For clarity, I will review the most important points of regulation, focusing on each stage of translation separately.

1.2.1 Regulation of translation initiation

Translation initiation is considered the main rate-limiting step of eukaryotic protein synthesis. Many regulatory mechanisms target the balance between 5' cap-dependent and 5' cap-independent translation by modulating the activity of core translation initiation factors.
The most extensively studied mechanism of translation initiation control involves the regulation of eIF4E activity through 1) transcription, 2) phosphorylation or 3) sequestration by binding to a family of translational repressors (Raught and Gingras, 1999). eIF4E is a direct transcriptional target of many important signalling pathways, including NF-κB and c-MYC (Hariri et al., 2013). Phosphorylation of eIF4E at Ser209 is mediated by the MAP kinase-interacting kinases MNK1/MNK2 and occurs in response to various mitogenic and stress factors, promoting 5' cap-dependent translation initiation (Sonenberg, 1996; Pyronnet et al., 1999). Another arm of regulation is formed by the family of eIF4E binding proteins (EIF4EBPs, 4E-BPs), which bind eIF4E and remove it from the pool available for translation. The binding of 4E-BPs and of eIF4G, a part of the eIF4F complex, to eIF4E is mutually exclusive (Yang et al., 2020). A well-characterised regulator of this process is mTOR signalling, which couples growth factor activity with nutrient availability and the intensity of anabolic processes (Saxton and Sabatini, 2017; Liu and Sabatini, 2020; Kim and Guan, 2019a). mTOR-mediated phosphorylation of EIF4EBPs disrupts the 4E-BP–eIF4E complex, allowing eIF4E to participate in the eIF4F initiation complex. eIF4E is required for translation of all 5'-capped mRNAs, but not all transcripts are equally susceptible to changes in eIF4E levels. Transcripts with long 5'UTRs and specific RNA regulatory motifs are particularly sensitive to eIF4E levels (Smith et al., 2021).

Scanning of the 5'UTR requires coordinated unwinding of mRNA structure, which is performed by RNA helicases, typically eIF4A, a component of the eIF4F complex. eIF4A activity is promoted by two cofactors, eIF4B or eIF4H, while its availability is limited by programmed cell death 4 (PDCD4) (Yang et al., 2003; Smith et al., 2021).
PDCD4 acts downstream of mTOR and is inactivated by phosphorylation by Ribosomal Protein S6 Kinase (S6K) (Silvera et al., 2010). Although the net effect of eIF4A inhibition or sequestration is a mild reduction in the global translation rate, the magnitude and even the direction of change vary between individual transcripts (Modelska et al., 2015; Smith et al., 2021). eIF4A-sensitive mRNAs are characterised by complex secondary structures in their 5'UTRs and the presence of specific RNA motifs (Steinhardt et al., 2014; Modelska et al., 2015; Wolfe et al., 2014a).

Another important mechanism of translation initiation control involves phosphorylation of eIF2α, the main regulatory subunit of the eIF2 complex, which is responsible for recruiting the initiator methionine-tRNA to the small ribosomal subunit (Smith et al., 2021). Phosphorylated eIF2α has an inhibitory effect on global translation, because it prevents the eIF2B-mediated exchange of GDP for GTP on eIF2γ, another eIF2 subunit, trapping eIF2 in its inactive form. This shifts the balance towards 5' cap-independent translation initiation. Translational control through eIF2α is central to the Integrated Stress Response (ISR), an adaptive pathway that aims to restore cellular homeostasis or commit the cell to apoptosis following exposure to unfavourable conditions (Pakos-Zebrucka et al., 2016), see section 1.3.4.

The efficiency of translation initiation can also be controlled by numerous RNA-binding proteins, of which RNA helicases deserve special attention. RNA helicases are multifunctional proteins involved in all aspects of RNA metabolism, such as mRNA processing, nuclear export, trafficking in the cytoplasm, translation, degradation or microRNA-mediated RNA silencing. The activity of RNA helicases regulates almost all stages of protein synthesis (Bourgeois et al., 2016; Linder and Jankowsky, 2011). The efficiency of 5' cap-dependent translation initiation relies on the ability to unwind structured 5'UTRs.
Helicases assisting in this process include DDX48, DDX3X and components of the eIF4F complex (eIF4A1 and eIF4A2). These are especially important for translating mRNAs with GC-rich 5'UTRs and 3'UTRs with microRNA binding sites (Bourgeois et al., 2016; Linder and Jankowsky, 2011; Sen et al., 2015; Wolfe et al., 2014b). Other roles of RNA helicases in translation control include mRNA positioning on the ribosomal 40S subunit (DHX29), regulation of 40S scanning and ribosome recycling (DHX9), promotion of 80S ribosome assembly (DHX33, DDX3X) and recognition of the stop codon (DDX19B). RNA helicases also control the translation of mRNA regulons; for example, DDX25 is essential for the translation of mRNAs associated with spermatogenesis, and DDX48 facilitates the translation of several neuronal mRNAs (Bourgeois et al., 2016; Linder and Jankowsky, 2011). Despite recent advances in the biology of RNA helicases, the full spectrum of their activity remains elusive.

tRNA-derived small RNAs (tsRNAs), a novel class of regulatory non-coding RNAs, form another interesting layer of translational control. tsRNAs are cleaved from several types of tRNA in response to stress. These tsRNAs contain a terminal oligo-G motif that assembles into G-quadruplex structures, which suppress translation initiation by displacing the eIF4F complex from the 5' cap (Ivanov et al., 2014). This mechanism of translation inhibition by stress-induced tsRNAs has been identified in several organisms and cell types, suggesting strong evolutionary conservation (Ivanov et al., 2011).

1.2.2 Regulation of translation elongation

For a long time, translation elongation was not considered an important rate-limiting step of protein production. However, a variety of factors can influence the elongation speed, which is especially important for protein folding and transport.
Regulatory mechanisms of translation elongation involve ORF codon composition, local sequence context, post-transcriptional modifications of tRNA and mRNA, and the expression levels of the core elongation factors (Knight et al., 2020; Schuller and Green, 2018). The genetic code is redundant (degenerate), so each amino acid can be decoded by two, three, four or six codons, with the exception of methionine and tryptophan, which are each encoded by a single codon. Interestingly, synonymous codons are not used uniformly across the transcriptome (codon bias), and different codons are translated at different rates (codon optimality). Hypotheses put forward to explain the evolutionary origin of this phenomenon include differences in elongation rate between codons, translation accuracy or selection for splicing enhancers (Richter and Coller, 2015). The speed of tRNA-codon pairing depends on the abundance of a specific tRNA in the cytoplasm (it takes more time for a less abundant tRNA to find its codon) and on the interaction strength (standard Watson-Crick pairing is quicker than wobble pairing). The relative abundance of different tRNA types in humans is not uniform and varies between tissues, promoting translation of cell-specific transcripts (Dittmar et al., 2006). This view has been expanded by findings in yeast that demonstrated coordinated changes in the abundance of specific tRNAs after exposure to various types of stress. Consequently, when the tRNA pool is restricted, transcripts with a larger number of rare codons tend to be translated less efficiently (Torrent et al., 2018; Hanson and Coller, 2018). Codon bias correlates with tRNA levels; therefore, to some extent, it is mirrored by codon optimality (Sabi and Tuller, 2014; Knight et al., 2020). Both translation efficiency and mRNA half-life correlate with codon composition, but this relationship changes with the concentration of specific tRNAs in the cytoplasm (Presnyak et al., 2015).
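As a minimal illustration of genetic code degeneracy and codon bias, the Python sketch below builds the standard genetic code, counts how many synonymous codons encode each amino acid, and computes relative synonymous codon usage (RSCU), one common descriptive measure of codon bias. RSCU is not taken from the studies cited above, the helper names and the example sequence are hypothetical, and the sketch ignores tRNA abundance, so it describes codon bias rather than codon optimality.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest in T, C, A, G order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons():
    """Group the 61 sense codons by the amino acid they encode."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            groups[aa].append(codon)
    return dict(groups)


def rscu(cds):
    """Relative synonymous codon usage of an in-frame coding sequence (hypothetical helper).

    RSCU = observed codon count / mean count over its synonymous codons,
    so 1.0 means no bias, > 1 an over-used and < 1 an under-used codon.
    """
    cds = cds.upper().replace("U", "T")
    # Read complete codons only, starting at the first nucleotide.
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for aa, codons in synonymous_codons().items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            # count / (total / n), rearranged to avoid an extra division
            values[c] = counts[c] * len(codons) / total
    return values


degeneracy = {aa: len(codons) for aa, codons in synonymous_codons().items()}
print(sorted(set(degeneracy.values())))  # [1, 2, 3, 4, 6]
print(degeneracy["M"], degeneracy["W"], degeneracy["I"], degeneracy["L"])  # 1 1 3 6
print(rscu("CTGCTGCTGTTA")["CTG"])  # 4.5
```

In the toy sequence, leucine is encoded three times by CTG and once by TTA, so CTG is over-represented 4.5-fold relative to uniform use of the six leucine codons.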
Although variations in codon selection and translation dynamics have been observed within and between all species, the majority of studies on this topic come from lower organisms (Hanson and Coller, 2018).

Certain codon patterns, for example poly-lysine tracts, or mRNA secondary structures can lead to programmed ribosome stalling and frameshifting (Schuller and Green, 2018). A local slowdown, or even a halt, in the elongation rate may facilitate protein folding or signal recognition particle binding, which is essential for secreted proteins (Richter et al., 2012). Ribosome transit can be regulated by several proteins interacting with the A-site, for example the elongation factor EF-G (eEF2 in eukaryotes). Programmed frameshifting involves either slipping back or skipping one nucleotide and occurs in all species (Ketteler, 2012). The scope of this phenomenon in humans is still debated. An example of a gene known to regulate its translation through frameshifting is ornithine decarboxylase antizyme 1 (OAZ1), a regulator of ornithine decarboxylase (ODC) (Bekaert et al., 2008).

1.2.3 Regulation of translation termination

Although termination of translation is usually not a rate-limiting step of protein synthesis, the fidelity of this process is important for maintaining proteome integrity. The biological activity of truncated or extended proteins, possible products of non-canonical termination of translation, may differ from that of the canonical product. Nonsense-mediated decay (NMD) is a translation-coupled process that eliminates mRNAs harbouring premature stop codons. This is an important surveillance mechanism preventing the expression of truncated proteins (Kurosaki and Maquat, 2016; Hellen, 2018). The role of NMD is not limited to the elimination of aberrant transcripts. Transcripts with upstream ORFs in their 5'UTRs, products of alternative splicing or intron retention, or transcripts involved in auto-regulatory loops may utilise programmed NMD to regulate the pool of mRNAs available for translation (Kurosaki and Maquat, 2016).
It is estimated that the expression of about 10% of eukaryotic mRNAs may be modulated by NMD (Kurosaki and Maquat, 2016). Finally, recognition of a stop codon by a near-cognate tRNA (readthrough) may lead to an extended product, but the full scope of this phenomenon in mammalian cells is still unclear. Functional readthrough has been shown for three human genes: VEGFA, LDHB and MDH1 (Schueren and Thoms, 2016). Interestingly, the extended isoforms show a different subcellular localisation (LDHB, MDH1) or have the opposite biological activity to the canonical product (antiangiogenic instead of proangiogenic in the case of VEGFA) (Eswarappa et al., 2014; Schueren and Thoms, 2016).

1.3 Deregulation of translation in human cancers

Enlarged nucleoli, the primary site of ribosome biogenesis, were one of the first recognised hallmarks of malignancy and were later widely applied in diagnostics (Pianese, 1896; Gani, 1976). Deregulation of the translation machinery is frequently observed in sporadic cancers as well as in hereditary cancer syndromes. While the latter usually involve point mutations affecting ribosome biogenesis, translational reprogramming in the majority of cancers encompasses a variety of mechanisms. Protein synthesis can be hijacked by malignant cells through aberrant expression, point mutations or post-translational modifications of translation factors or their regulators, through mRNA regulatory elements and RNA modifications, or through preferential codon usage. These mechanisms can include global changes in protein synthesis and/or an increase or decrease in the translation of a subset of transcripts.

1.3.1 Oncogenic and tumour suppressor pathways converge at controlling protein synthesis

At the global scale, an increased protein synthesis rate was shown to accompany high mitotic activity (Johnson et al., 1976).
Indeed, many key oncogenic and tumour suppressor pathways, such as MYC, PI3K, RAS, PTEN and TP53, converge on the regulation of cellular translation, synchronising the proliferation rate with anabolic and catabolic pathways, but this relationship is far from simple. Deregulation of MYC is observed in >50% of human cancers (Meyer and Penn, 2008). The MYC family of oncoproteins is encoded by three genes, MYC, MYCN and MYCL, which together have the capacity to regulate about 15% of human genes (Dang, 2012), coordinating a variety of cellular functions including proliferation, metabolism, differentiation and immunosurveillance (Chen et al., 2018). MYC regulates protein synthesis by controlling the transcription of ribosomal proteins (Boon et al., 2001), ribosomal RNA (rRNA) (Grandori et al., 2005) (Figure 1.6) and translation initiation factors (eIF4A, eIF4E and eIF4G) (Schmidt, 2004; Xu and Ruggero, 2020). Although the role of MYC far exceeds the regulation of translation, its oncogenic potential is highly dependent on the translation apparatus: on the cell's ability to phosphorylate eIF4E (Pourdehnad et al., 2013) and to augment protein synthesis (Barna et al., 2008), see section 1.5. Moreover, MYC protein abundance is itself regulated at the level of translation: firstly, through alternative, IRES-mediated translation initiation from a CUG start codon and, secondly, through preferential translation controlled by eIF4A and eIF4E (Schatz et al., 2011; Culjkovic-Kraljacic et al., 2016).

Figure 1.6: Multilevel regulation of ribosome biogenesis by MYC. MYC controls ribosome biogenesis by promoting transcription of ribosome biogenesis components. It does so in cooperation with other cofactors regulating the recruitment of RNA polymerases (RNA pol I, II and III) and through chromatin structure remodelling (Shiue et al., 2009; Van Riggelen et al., 2010). Adapted from Van Riggelen et al.
(2010)

Interestingly, some ribosomal proteins were found to inhibit MYC expression, creating a negative feedback loop. For example, RPL11 interferes with MYC binding to the promoters of 5S rRNA and tRNA genes (Dai et al., 2010) and, together with RPL5, binds MYC mRNA, directing it to the RNA-induced silencing complex (RISC) for degradation (Liao et al., 2014).

Translational control by PTEN/PI3K/AKT and RAS/MEK1/ERK signalling converges on regulating the activity of the mTORC1 complex, a key coordinator of cell growth, survival and metabolism (Figure 1.7) (Kim and Guan, 2019b). Activation of these pathways leads to inactivating phosphorylation of the TSC1/2 complex, a negative regulator of mTORC1. The mTOR pathway promotes 5' cap-dependent translation by controlling the inhibitory eIF4E binding proteins, which sequester eIF4E and prevent it from participating in translation initiation (Silvera et al., 2010). Additional control is imposed at the level of mTORC1 targets. MNK1 and MNK2, which act downstream of RAS/MEK1/ERK, phosphorylate eIF4E at a single residue, Ser209, independently of mTORC1, thus selectively increasing the translation of several protumorigenic transcripts in mice and human cell lines, including anti-apoptotic MCL1, cyclin D1 (CCND1), MYC, and proangiogenic VEGF and FGF2 (Furic et al., 2010; Sonenberg and Hinnebusch, 2009; Kevil et al., 1996). Biophysical studies demonstrated that phosphorylation at Ser209 promotes 5' cap-independent translation, but the significance of this for tumourigenesis is unclear (Zuberek et al., 2003). The oncogenic capacity of both RAS/MEK1/ERK and MYC signalling depends on the ability to phosphorylate eIF4E (in vitro and in vivo evidence) (Yang et al., 2020), suggesting that a specific, prooncogenic mode of protein synthesis is a common strategy during cell transformation. Prooncogenic translation is not limited to malignant cells, though. Robichaud et al.
(2018) showed that mice with a mutated phosphorylation site (eIF4E S209A) are resistant to the development of lung metastases due to decreased translation of MCL1 and BCL2 in prometastatic neutrophils.

Figure 1.7: Signalling pathways converging at regulating protein synthesis. Mitogenic stimulation (growth factors, hormones or cytokines) targets receptor tyrosine kinases (RTKs), which promote RAS/ERK and PI3K/PTEN/AKT signalling. ERK activates 5' cap-dependent translation either through MNK-mediated phosphorylation of eIF4E, through RSK-mediated phosphorylation of eIF4B or by inhibiting the TSC1/TSC2 complex, a negative regulator of mTOR signalling. AKT also regulates translation through mTOR by inhibiting another negative regulator, PRAS40. From Silvera et al. (2010)

Although the contribution of p53 to tumour formation is one of the best studied mechanisms in cancer, its role in controlling protein synthesis, and its regulation by the translation apparatus, are less well understood. Overall, activation of p53 leads to suppression of ribosome biogenesis, affecting translation initiation. Ribosome biogenesis is a complex process that coordinates the synthesis of rRNA, ribosomal proteins and other auxiliary factors, and p53 can control the activity of all of them. Firstly, p53 directly interferes with the assembly of the RNA Pol I complex, which is essential for rRNA transcription (Zhai and Comai, 2000). Secondly, lessons learnt from studying ribosomopathies show that there is a bidirectional feedback loop between p53 and the expression of ribosomal proteins. When ribosome biogenesis is impaired, for example due to a mutation in a ribosomal protein, damage to ribosomal DNA (rDNA) or insufficient transcription of rRNA, free ribosomal proteins are released into the nucleoplasm.
RPL5, RPL11, RPL23a, RPS7 and RPL26 are known to interact with and sequester MDM2, a key inhibitor of p53 (Kampen et al., 2020). This leads to cell cycle arrest, induction of cellular senescence or apoptosis, although the outcome can vary between cell types. For example, haploinsufficiency of RPS14 or RPS19 in erythroid progenitor cells is known to increase p53 and CDKN1A (p21) protein in vivo to levels similar to those after gamma irradiation (Dutt et al., 2011). p53 can also inhibit the transcription of ribosomal proteins and other associated factors indirectly, by inhibiting MYC (Ho et al., 2005). The dynamics of the p53-related nucleolar stress response and its links to cancer and neurodegenerative diseases are not fully understood; this topic was reviewed recently by Lindström et al. (2018) and Pfister (2019).

1.3.2 Ribosome biogenesis and its oncogenic potential

The involvement of ribosome biogenesis in oncogenesis is well understood in the context of MYC deregulation, see section 1.3.1, but it is not limited to it. Other ribosome biogenesis regulators with known oncogenic potential include netrin 1 (NTN1) and epithelial cell-transforming sequence 2 oncogene (ECT2). ECT2 and a truncated NTN1 isoform, expressed exclusively in cancer cells, promote rDNA transcription and pre-rRNA processing. rDNA is one of the most transcriptionally active regions of the genome, and rDNA rearrangements are observed in the majority of lung and colorectal cancers (Stults et al., 2009). During S phase, when rDNA transcription continues, RNA polymerase I collides with replication forks, which facilitates the formation of R-loops (rRNA-rDNA hybrids) (Pelletier et al., 2018). R-loops are known hot spots of DNA damage (Helmrich et al., 2011; Skourti-Stathaki and Proudfoot, 2014), and rDNA regions are common fragile sites (Pelletier et al., 2018).

Cancer-specific changes in ribosome composition are another interesting concept, sometimes referred to as the oncoribosome hypothesis.
Varying expression levels of ribosomal proteins, post-transcriptional rRNA modifications and recurrent mutations in ribosomal proteins are observed in many malignancies (Babaian et al., 2020; Bastide and David, 2018; Pelletier et al., 2018).

1.3.3 Translation factors are frequently deregulated in cancer

Regulation of translation initiation is essential for maintaining cell homeostasis during malignant transformation and under exposure to unfavourable conditions. High levels of eIF4E are common in cancer, and in vitro studies revealed that this can be sufficient to induce malignant transformation (Lazaris-Karatzas et al., 1990; Gingras et al., 1999). Oncogenic properties of eIF4E have also been shown in vivo. Transgenic mice overexpressing eIF4E have an increased risk of several types of cancer, including B-cell lymphomas (Ruggero et al., 2004), and show accelerated B-cell lymphomagenesis when c-Myc is also deregulated (Wendel et al., 2004). The oncogenic properties of high eIF4E levels are attributed to its ability to regulate the translation of specific transcripts rather than to control of global translation. Haploinsufficiency of eIF4E is compatible with normal translation levels and development, but still prevents HRAS-induced transformation in mice (Truitt et al., 2015). eIF4E-sensitive mRNAs (the eIF4E regulon) are enriched for the CERT motif and for complex RNA structures in their 5'UTRs. The latter involves eIF4E-mediated recruitment of the RNA helicase eIF4A (Smith et al., 2021). These include transcripts associated with cell proliferation, survival and oxidative stress, such as cyclins, ornithine decarboxylase (ODC), vascular endothelial growth factor (VEGF), MYC and phosphoribosyl-pyrophosphate synthetase 2 (PRPS2) (Bhat et al., 2015). Tumourigenesis following eIF4E overexpression develops relatively late, suggesting that other prooncogenic events might be required for full transformation (Ruggero et al., 2004).
Interestingly, the role of eIF4E is not limited to translation initiation. A substantial proportion of eIF4E resides within the nucleus, where it regulates the export of selected mRNAs to the cytoplasm through the nuclear pore. Known transcripts whose export is promoted by eIF4E include MYC, BCL6 and BCL2 (Culjkovic-Kraljacic et al., 2016).

Other translation factors important from the oncogenesis perspective are the eIF3 subunits. Prooncogenic properties have been reported for eIF3A–eIF3D, eIF3E, eIF3F, eIF3G, eIF3H and eIF3M, but the exact role of individual subunits is not fully understood, see Smith et al. (2021). Similarly to eIF4E, deregulation of certain eIF3 subunits leads to preferential translation of a specific prooncogenic regulon, which includes JUN and genes associated with epithelial–mesenchymal transition or inflammation (Smith et al., 2021; Lee et al., 2016; Desnoyers et al., 2015). Some functions are independent of the role of the eIF3 complex in translation initiation. For example, the roles of eIF3A and eIF3B encompass control of the enzymes responsible for the reversible N6-methyladenosine (m6A) RNA modification (Smith et al., 2021). A pool of eIF3H independent of the eIF3 complex acts as a deubiquitylating enzyme stabilising YAP1, which is important for tumour progression and metastasis (Smith et al., 2021). An interesting dual role has also been shown for eIF3G, which is cleaved during apoptosis and translocated to the nucleus, where it promotes caspase activation and DNA degradation (Smith et al., 2021). When 5' cap-dependent translation is inhibited following cellular stress, eIF2A and eIF5B promote the translation of selected transcripts through non-canonical, IRES-mediated translation initiation. These include several apoptosis inhibitors (XIAP, BIRC2 and BCL2L1), CDKN1A (p21) and proteins associated with NF-κB signalling (Smith et al., 2021).
The role of translation initiation factors in cancer is summarised in Figure 1.8.

1.3.4 The significance of translational response to stress in cancer

Uncontrolled cell growth and proliferation, combined with a limited nutrient supply, a hypoxic environment and deregulated cellular energetics, are among the hallmarks of cancer (Hanahan and Weinberg, 2011). Adaptation to unfavourable conditions is essential to initiate and maintain malignant transformation. Key to this process are evolutionarily conserved stress-signalling pathways that aim to rebalance cell homeostasis or, if this cannot be achieved, commit the cell to apoptosis. The stress response encompasses a variety of mechanisms, of which the Integrated Stress Response (ISR), the Unfolded Protein Response (UPR) and the heat shock response (HSR) are the most relevant to protein translation.

The ISR acts principally by controlling the availability of the ternary complex for translation, thus reprogramming the global translation landscape (Costa-Mattioli and Walter, 2020). The eIF2 complex consists of three subunits, eIF2α, eIF2β and eIF2γ, and together with GTP and the initiator methionine-tRNA it forms the ternary complex. Accumulation of misfolded proteins, amino acid deprivation and other stress signals activate the eIF2α kinases PERK, PKR, HRI and GCN2, which converge on the phosphorylation of eIF2α. The role of the ternary complex is AUG start codon recognition, which triggers GTP hydrolysis by eIF2 with the aid of eIF5. eIF2-GDP dissociates from the 40S complex and is recycled for another round of translation by eIF2B, which exchanges the eIF2-bound GDP for GTP (Pakos-Zebrucka et al., 2016). The α subunit of eIF2 is the main point of control. In response to various stress stimuli, it can be phosphorylated at Ser51 (P-eIF2α) and then acts as an inhibitor of eIF2B. GDP-bound eIF2 carrying P-eIF2α therefore limits ternary complex formation, which results in the reduction of global, 5' cap-dependent protein synthesis and enables 5' cap-independent translation of selected transcripts.
Figure 1.8: Overview of 5' cap-dependent translation initiation in cancer. Numerous components of the translation initiation machinery are deregulated in human cancers. To date, the link between oncogenesis and translation initiation has been well established for components of the eIF4F, eIF3 and eIF2 complexes and for ribosome biogenesis. Overexpression of eIF4F components, which include eIF4E, eIF4A and eIF4G, results in preferential translation of selected transcripts. A similar mechanism has been observed for individual eIF3 subunits. eIF2α is phosphorylated by different kinases (PKR, PERK or GCN2), which respond to various types of stress. The net effect of P-eIF2α is repression of 5' cap-dependent translation, promotion of non-canonical translation initiation and adaptation to unfavourable conditions. Ribosome biogenesis is another mechanism hijacked by cancer cells to drive a malignant phenotype. Many oncogenic signalling pathways converge on controlling the synthesis of ribosome components. Other factors with a possible role in tumourigenesis include eIF5A, eIF5B and eIF2A. From Silvera et al. (2010)

Of these, activating transcription factor 4 (ATF4), activating transcription factor 5 (ATF5) and DNA Damage Inducible Transcript 3 (DDIT3 or CHOP) promote transcription of genes responsible for the restoration of cell homeostasis (Hinnebusch, 2005; Vattem and Wek, 2004; Pakos-Zebrucka et al., 2016).

The presence of misfolded proteins in the endoplasmic reticulum (ER) or the cytosol can have serious consequences for cell function. The principal role of the UPR is to sense unfolded proteins in the ER, while the HSR responds to those in the cytosol; both increase protein folding and degradation capacity (Costa-Mattioli and Walter, 2020). The UPR consists of three main sensors, inositol-requiring protein 1 (IRE1α), protein kinase RNA-like endoplasmic reticulum kinase (PERK) and activating transcription factor 6 (ATF6), which control the folding capacity of the ER (Hetz, 2012).
The early UPR, mediated by PERK-eIF2α and converging with the ISR, aims to reduce translation, decreasing the protein load in the ER. This is accompanied by IRE1-dependent decay of mRNA and activation of the autophagy pathway (Hetz, 2012). Then a group of transcription factors, ATF4, the cytosolic fragment of transcription factor 6 (ATF6f) and spliced X-box-binding protein 1 (XBP1s), trigger transcription of adaptive genes aiming to alleviate stress and increase protein folding capacity. If the intensity and duration of stress exceed the cell's ability to restore homeostasis, the cell commits to apoptosis (Hetz, 2012).

Stress response signalling controls both pro-survival and apoptotic pathways, so the role of the stress response in cancer is complex and context-specific (Figure 1.9), as reviewed by Clarke et al. (2014) and Urra et al. (2016). Briefly, many oncogenic pathways drive hyperactivation of protein synthesis, making the cell susceptible to stress-induced apoptosis. However, high levels of UPR activation are associated with malignant transformation (Urra et al., 2016), are known to promote resistance to chemotherapy and radiotherapy in different cancer types (Rouschop et al., 2010; Ghaddar et al., 2021), correlate with poor prognosis in glioblastoma, breast cancer and pre-B acute lymphoblastic leukaemia, and promote metabolic reprogramming in prostate cancer and an immunosuppressive environment in ovarian cancer (Clarke et al., 2014; Urra et al., 2016).

Figure 1.9: Overview of the cell fate decisions associated with the ER stress response. The outcome of the ER stress response is either adaptation and restoration of the cell's protein folding capacity or, if the stress is prolonged or not restrained, triggering of apoptosis. Three signalling arms initiate the ER stress response, mediated by ATF6, IRE1α or PERK, of which the latter is one of the components of the Integrated Stress Response pathway.
The net effect of ER stress response activation is transcriptional upregulation of several ER chaperones to facilitate protein folding. The ER stress response also includes inhibition of 5' cap-dependent translation (through eIF2α), activation of the ER-associated degradation (ERAD) pathway and autophagy to remove misfolded proteins, and the regulated IRE1-dependent decay (RIDD) pathway, which selectively removes mRNAs encoding proteins located in the endoplasmic reticulum (Hetz, 2012). Above a certain threshold, the cell's ability to tolerate ER stress may be exceeded, and ER stress response signalling will trigger apoptosis. In such a scenario, cell death is controlled by BCL2 family proteins, such as BAX and BID, and by p53-mediated signalling. From Hetz (2012)

1.4 Elements of B-cell biology from the perspective of lymphoma development

The current World Health Organisation (WHO) classification of B-cell malignancies relies predominantly on the cell of origin of the tumour. B-cell neoplasms can be divided into malignancies originating from progenitor cells, mature B-cells or plasma cells (Swerdlow et al., 2016). Overall, the majority of Non-Hodgkin Lymphomas (NHL), which are the most common haematological malignancies in adults, are derived from the germinal centre (GC) stage of B-cell development (Mlynarczyk et al., 2019; Swerdlow et al., 2016). GC-derived lymphomas include Diffuse Large B-cell Lymphoma (DLBCL), Burkitt Lymphoma (BL) and Follicular Lymphoma (FL), which together account for about 60% of all B-cell NHL.

GCs are transient microstructures formed in secondary lymphoid organs upon activation of mature naive B-cells by T-cell-dependent antigens. The GC reaction is dedicated to selecting and expanding populations of B-cells that will eventually differentiate into memory B-cells and plasma cells. This is an essential part of a physiological adaptive immune response against exogenous pathogens (Klein and Dalla-Favera, 2008; De Silva and Klein, 2015).
The lack of GCs observed in patients with inherited hyper-IgM syndrome is associated with severe immunodeficiency, which highlights the importance of the GC reaction (Etzioni and Ochs, 2004). The GC consists of two functionally and histologically distinct zones: the dark zone and the light zone. The dark zone contains a dense population of large B-cells, centroblasts, that undergo rapid proliferation and somatic hypermutation (SHM). SHM introduces point mutations into the variable regions of the heavy and light chain immunoglobulin genes (IgV), generating a population of mutant subclones with a broad range of B-cell receptor (BCR) affinities for the antigen. This is achieved through expression of activation-induced cytidine deaminase (AICDA) (Muramatsu et al., 2000; De Silva and Klein, 2015). Centroblasts then migrate to the light zone, becoming centrocytes. Centrocytes compete for survival signals from CD4+ T follicular helper (TFH) cells and Follicular Dendritic Cells (FDC), and only those demonstrating high affinity towards the immunising antigen evade apoptosis (Nakagawa and Calado, 2021). Positive selection (survival) stimuli are provided by the BCR capturing intact FDC-bound antigen, co-stimulatory receptors such as CD40 or tumour-necrosis factor (TNF) receptors, adhesion molecules (ICAM-1 and VCAM-1), anti-apoptotic molecules (BAFF), and a mixture of cytokines secreted by FDC and TFH cells. The BCR-antigen complex is internalised and the antigen is then presented at the cell surface on the MHC class II complex, which interacts with CD4+ TFH cells. The higher the affinity, the more antigen can be acquired from FDC and the stronger the stimulation received from TFH cells (Nakagawa and Calado, 2021). Immunoglobulin Class Switch Recombination (CSR) is another important event in the life of GC B-cells.
The mechanism of CSR involves double-stranded DNA breaks and rejoining within the immunoglobulin heavy chain locus, leading to deletional recombination of the heavy chain constant region and thus determining the final class of antibody produced (IgG, IgE or IgA). GC B-cells undergo several rounds of proliferation and selection, recycling between the dark and light zones in a process directed by chemokine gradients. The first antibody-secreting plasma cells differentiate from selected GC B-cells about one week after activation (Basso and Dalla-Favera, 2015; Shaffer III et al., 2012). The affinity of the antibodies increases over time, a phenomenon known as "affinity maturation", which continues over the following weeks. An overview of the most important steps of B-cell development and maturation is shown in Figure 1.10. The family of GC-derived lymphomas is highly heterogeneous, comprising a broad spectrum of clinical presentations and complex molecular contexts. Each lymphoma subtype works like a distorting mirror: it resembles a certain stage of B-cell development, sometimes referred to as the cell of origin, but aggravates or attenuates specific physiological mechanisms to drive a malignant phenotype. Oncogenic events may start at stages earlier than the normal B-cell counterpart and accumulate during differentiation and maturation (Shaffer III et al., 2012). For example, although FL resembles the GC stage of development, the initial genetic lesion arises at the pro-B cell stage in the bone marrow (Shaffer III et al., 2012). In addition to genetic aberrations commonly found in tumours, such as point mutations, deletions or amplifications that activate potential oncogenes or inactivate tumour suppressor genes, two GC-specific processes, SHM and CSR, contribute substantially to the uniqueness of the lymphoma genome (Basso and Dalla-Favera, 2015; Klein and Dalla-Favera, 2008; Mlynarczyk et al., 2019).
SHM and CSR occur in a highly hazardous environment, in which the DNA damage response and cell cycle checkpoints are desensitised, allowing remodelling of the immunoglobulin loci and expansion of selected clones. An unwanted by-product of CSR is the risk of chromosomal translocations, which are common in GC-derived lymphomas. These usually involve translocation of a proto-oncogene into an immunoglobulin locus under the control of an active enhancer, which drives high-level and sustained transcription of the translocated oncogene (Küppers and Dalla-Favera, 2001). Thus, translocation of MYC into an immunoglobulin locus, most commonly t(8;14), is a hallmark of Burkitt Lymphoma (present in more than 95% of cases) (Basso and Dalla-Favera, 2015; Klein and Dalla-Favera, 2008; Mlynarczyk et al., 2019). Other genes with recurrent translocations include BCL2 (80% of FL patients) and BCL6 (30–40% of DLBCL).
Figure 1.10: Overview of B-cell development and maturation. B-cells develop in the bone marrow from haematopoietic stem cells (HSC). Recombination of the immunoglobulin loci is the first step in the generation of a mature immunoglobulin receptor. This process starts in the pre-pro-B cell and continues to the pro-B cell stage, leading to heavy chain protein formation. V(D)J segment recombination requires the activity of the recombination-activating gene (RAG) complex, which directs cleavage of the DNA. Mature B-cells (naïve B-cells) leave the bone marrow and migrate to secondary lymphoid organs (SLOs), where they encounter antigen. Once activated by a T-cell-dependent antigen, B-cells enter the germinal centre reaction (De Silva and Klein, 2015). In the dark zone (DZ), activated B-cells, now centroblasts, undergo proliferation and somatic hypermutation of their immunoglobulin loci.
Centroblasts migrate to the light zone (LZ), becoming centrocytes, and compete for survival signals from CD4+ T follicular helper (TFH) cells and Follicular Dendritic Cells (FDC). During the germinal centre (GC) reaction, B-cells switch their isotype through class switch recombination. Editing of the immunoglobulin loci during somatic hypermutation and class switch recombination is mediated by activation-induced cytidine deaminase (AID) (De Silva and Klein, 2015). GC B-cells undergo several rounds of proliferation and selection, recycling between the LZ and DZ. Finally, GC B-cells differentiate into memory B-cells or antibody-secreting plasma cells (Mlynarczyk et al., 2019; Runte et al., 2019).
A massive phenotypic shift from quiescent naïve B-cells to rapidly proliferating, hypermutated GC B-cells and, finally, to antibody-producing plasma cells is directed by a profound reorganisation of the gene expression programme. Transcription factors with well-established links to GC formation and lymphomagenesis include MYC and BCL6. In the physiological GC reaction, the proto-oncogene MYC has a bimodal pattern of expression: it is induced early during GC initiation and then transiently in LZ B-cells undergoing positive selection and DZ re-entry (Calado et al., 2012; Dominguez-Sola et al., 2012; Basso and Dalla-Favera, 2015). MYC expression primes B-cells for massive proliferation and clonal expansion. BCL6, known as the master regulator of the GC reaction, is key to GC formation and maintenance (Basso and Dalla-Favera, 2015). It acts as a transcriptional repressor, inhibiting the DNA damage response, apoptosis, premature B-cell activation and terminal differentiation into plasma cells (Basso and Dalla-Favera, 2015). BCL6 controls the expression of several genes relevant to tumour formation, such as cell cycle checkpoint genes (CDKN1A, CDKN1B), DNA damage response regulators (ATR, TP53) and differentiation factors (e.g.
IRF4 and PRDM1) (Ci et al., 2009; Basso and Dalla-Favera, 2015). BCL6-mediated desensitisation of the DNA damage response is necessary to sustain immunoglobulin loci remodelling and affinity maturation. During initiation of the GC reaction, BCL6 expression is promoted by IRF8, MEF2B, OCT2 and OCA-B, and it is then downregulated at GC exit (Song et al., 2021). Despite the tremendous progress made in our understanding of the processes governing the GC reaction and their links to lymphomagenesis, translating these findings into concise subtype classification systems, and then into clinical practice, has been challenging. Of all GC-derived lymphomas, DLBCL has been the most difficult to subdivide due to its considerable molecular heterogeneity. After years of largely unsuccessful attempts to classify DLBCL morphologically, the first modern classification, based on genome-wide gene expression profiling with DNA microarrays, was the Cell Of Origin system (Alizadeh et al., 2000). It distinguished two DLBCL subtypes: germinal centre B-cell-like DLBCL (GCB-DLBCL), resembling normal germinal centre B-cells, and activated B-cell-like DLBCL (ABC-DLBCL), resembling activated peripheral blood B-cells. In addition to gene expression profiles, certain genetic hallmarks of B-cell lymphoma were also recognised. The identification of MYC rearrangements co-occurring with BCL2 and/or BCL6 rearrangements led to the definition of double/triple-hit high-grade B-cell lymphoma. This was further complemented by the discovery of distinct transcriptomic signatures of MYC-driven GCB-DLBCL that overlap with double-hit lymphoma cases, referred to as Molecular High-Grade DLBCL and DHSig, respectively (Sha et al., 2019; Ennishi et al., 2019).
More recently, three genomic studies, from Harvard (Chapuy et al., 2018), the National Cancer Institute (NCI) (Schmitz et al., 2018) and the UK Haematological Malignancy Research Network (HMRN) (Lacy et al., 2020), identified converging genetic subclasses based on mutational profiles. The current status of DLBCL molecular profiling is summarised in Figure 1.11 and covered in detail by Cutmore, Krupka and Hodson (2022).
1.5 Role of translation in B-cell development and malignancy
Although the GC reaction and lymphomagenesis have been extensively studied at the level of transcription, much less is known about post-transcriptional regulation of B-cell development and maturation. Deregulation of mRNA translation is common in B-cell lymphoma and typically involves 1) changes in the expression of core components of the translation machinery or of pathways regulating its activity, 2) microRNA-driven regulation of protein synthesis, and 3) translational control of key lymphoma oncogenes or tumour suppressors (Horvilleur et al., 2010). Aberrant MYC or mTOR signalling are the most studied examples of pathways affecting protein synthesis in B-cell lymphomas. MYC translocations and mutations are hallmarks of Burkitt Lymphoma and of the Molecular High-Grade (and Double-Hit) DLBCL subtype (Schmitz et al., 2012; Sha et al., 2019), while mTOR activation is usually associated with activation of B-cell receptor (BCR) signalling, PTEN loss (about 14% of DLBCL) or mutations in the ERK/RAS pathway (Chapuy et al., 2018; Reddy et al., 2017; Lacy et al., 2020; Pfeifer et al., 2013). The oncogenic activity of MYC includes hyperactivation of protein synthesis, but it is also dependent on the translation machinery. Haploinsufficiency of Rpl24 was shown to abrogate MYC-induced transformation in vivo while maintaining normal B-cell development (Barna et al., 2008).
This effect was specific to MYC-driven tumours, as in Tp53−/− mice the frequency and latency of tumours were not affected by Rpl24+/− (Barna et al., 2008). Simultaneous deregulation of c-MYC and eIF4E expression accelerates the development of B-cell lymphoma in transgenic mice. The molecular mechanism of this cooperation results from eIF4E's ability to suppress MYC-induced apoptosis by promoting cellular senescence (Ruggero et al., 2004). This is interesting in light of another study, in which overexpression of eIF4E in lymphoma was associated with chemotherapy resistance (Wendel et al., 2004). Another important pathway converging on the regulation of protein synthesis is BCR signalling, which is vital to the survival of normal and malignant B-cells. Depending on whether it requires antigen engagement or not, BCR signalling in lymphoma is termed chronic active or tonic, respectively.
Figure 1.11: Topography of DLBCL molecular subtypes. A conceptual representation of the genetic classification of DLBCL. Coloured hills depict known genetic subtypes. The genes most commonly altered are indicated in white. Each subtype is labelled with its LymphGen/NCI cluster name (red) (Schmitz et al., 2018), HMRN (black) (Lacy et al., 2020) and Harvard (blue) equivalent (Chapuy et al., 2018). Briefly, the MCD/C5/MYD88 subtype covers DLBCL cases with poor prognosis driven by mutations in the B-cell receptor, Toll-like receptor (TLR) and NF-κB pathways, accompanied by immune-evasion mutations. The EZB/C3/BCL2 group distinguishes cases with recurrent BCL2, EZH2, CREBBP and KMT2D mutations, sharing the pattern with follicular lymphoma (FL). Interestingly, based on a gene expression signature or MYC amplification status, LymphGen and HMRN characterise an extra subtype, termed EZB-MYC/MHG. Next, the BN2/C1/NOTCH2 subtype includes mutations of NOTCH2, TNFAIP3, BCL10 and NFKBIZ. The ST2/C4 group is characterised by activation of the JAK/STAT/ERK pathway and mutations in SOCS1, DUSP2, STAT3 and BRAF.
The HMRN system additionally distinguishes two subgroups within ST2/C4, based on SGK1 and SOCS1 mutation status. Lastly, the N1/NOTCH1 subtype is defined by activating mutations of the NOTCH1 gene, and the A53/C2 subtype is characterised by enrichment for TP53 mutations and aneuploidy. Whilst patients positioned on top of the coloured hills will be reproducibly classified by each classification system, patients in the valleys may be unclassified or classified differently across classification systems. Uncoloured hills correspond to unknown DLBCL subtypes which may emerge in the future from currently unclassified cases. See Cutmore, Krupka and Hodson (2022).
Whereas the pro-survival effect of antigen-independent tonic BCR signalling depends on activation of the PI3K/AKT pathway, antigen-stimulated chronic active BCR signalling engages multiple pathways, including PI3K and NF-κB (Shaffer III et al., 2012). Tonic BCR activity is characteristic of BL and the GCB subtype of DLBCL, while chronic active BCR signalling is characteristic of ABC-DLBCL. BCR-mediated mTOR activation promotes both global 5′ cap-dependent translation initiation and preferential translation of selected transcripts through increased activity of translation factors (see sections 1.2.1 and 1.3.1). Although core components of the translation machinery are rarely mutated in B-cell lymphoma (usually in less than 10% of cases), their expression is frequently abnormal (Taylor et al., 2020). An analysis of the mRNA expression of 16 eIFs showed that 12 of them are overexpressed in DLBCL compared with normal tissue samples (Unterluggauer et al., 2018). Deregulated activity of translation factors may drive pro-oncogenic translation programmes. For example, overexpression of eIF4B promotes translation of proteins associated with tumour cell survival (DAXX, BCL2 and ERCC5) and correlates with poor clinical outcome (Horvilleur et al., 2014).
Another study has shown that translation of key lymphoma oncogenes, such as MYC, BCL6 and BCL2, is under the control of USP11, which is recruited to the translation initiation complex, where it deubiquitinates and stabilises eIF4B (Kapadia et al., 2018). In addition to the core components of the translation machinery, other RNA-binding proteins may promote lymphoma development and progression when mutated or deregulated. Of these, the DEAD-box helicase 3, X-linked (DDX3X) has recently attracted much attention, as it has been found recurrently mutated in Burkitt Lymphoma (Grande et al., 2019; Schmitz et al., 2012), Chronic Lymphocytic Leukaemia (CLL) (Ojha et al., 2015; Takahashi et al., 2018), medulloblastoma (Jones et al., 2012; Pugh et al., 2012; Robinson et al., 2012), head and neck squamous cell carcinoma (Stransky et al., 2011) and NK-T cell lymphoma (Jiang et al., 2015). The role of DDX3X has been extensively studied in medulloblastoma (Jones et al., 2012; Pugh et al., 2012; Robinson et al., 2012; Patmore et al., 2020; Samir et al., 2019), where it has been classified as a tumour suppressor gene regulating the activity of Wnt signalling. DDX3X function in other cancers, including lymphoma, is less well understood: it has been described as either a tumour suppressor or an oncogene depending on the cancer type, which is not surprising given the ubiquitous role of DDX3X in RNA biology. DDX3X roles include regulation of transcription, splicing, nuclear export, stress granule formation and resolution, microRNA biogenesis, mRNA translation and decay (Linder and Jankowsky, 2011; Mo et al., 2021). It is entirely unknown which of these functions may be relevant to lymphoma.
Table 1.1: Examples of translationally regulated genes relevant to mature B-cell lymphomas, from Taylor et al. (2020)
Gene(s) | Protein function | Association with B-cell neoplasm | Evidence for translational control | Ref
MYC | Cell growth | t(8;14) translocation in BL; recurrent mutations in MHG/Double-Hit DLBCL | Increased mRNA translation following BCR stimulation; inhibition of eIF4A reduces MYC expression in lymphoma cell lines; inhibition of eIF4E reduces MYC mRNA translation and nuclear export in DLBCL cell lines | (Yeomans et al., 2016) (Culjkovic-Kraljacic et al., 2016) (Steinhardt et al., 2014) (Schatz et al., 2011) (Wilmore et al., 2021)
MCL1 | Cell survival (anti-apoptotic) | Expression induced following BCR stimulation | Inhibition of eIF4A reduces MCL1 expression in lymphoma cell lines | (Schatz et al., 2011)
BCL2 | Cell survival (anti-apoptotic) | t(14;18) translocation in FL; recurrent translocations in GCB-DLBCL and double/triple-hit DLBCL | Inhibition of eIF4E reduces BCL2 mRNA translation and nuclear export in DLBCL cell lines | (Culjkovic-Kraljacic et al., 2016)
CCND1 | Cell cycle progression | t(11;14) translocation in mantle cell lymphoma | Inhibition of eIF4A reduces CCND1 expression in lymphoma cell lines | (Schatz et al., 2011)
BCL6 | Transcriptional repression | Rearrangements (e.g. t(3;14)) in double/triple-hit lymphoma | Inhibition of eIF4E reduces BCL6 mRNA translation and nuclear export in DLBCL cell lines | (Culjkovic-Kraljacic et al., 2016)
CARD11, BCL10, MALT1 | CBM complex components which control NF-κB activation | Recurrent mutations in a subset of ABC-DLBCL; mediates NF-κB activation downstream of the BCR | Inhibition of eIF4A reduces CBM complex expression in lymphoma cell lines | (Steinhardt et al., 2014)
Another interesting topic is the aberrant expression of microRNAs (miRNAs) in B-cell malignancies. MiRNAs are small (about 22 nucleotides) non-coding RNAs that bind target mRNAs through fully or partially complementary sequences, usually in the 3′UTR, regulating their translation and degradation; see Filipowicz et al. (2008) for a review.
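The complementarity rule can be illustrated with a small sketch of canonical seed matching. It assumes the commonly used definition of the seed as miRNA nucleotides 2-8 and looks only for perfect Watson-Crick complements in a 3′UTR (so-called 7mer-m8 sites). Partially complementary, wobble-containing and non-canonical sites are ignored. The helper names are hypothetical and the UTR fragment is invented for illustration.

```python
def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def seed_sites(mirna: str, utr: str) -> list:
    """Return 0-based start positions of 7mer-m8 seed matches in a 3'UTR.

    Assumes the seed is miRNA nucleotides 2-8 (1-based), i.e. mirna[1:8],
    and requires perfect Watson-Crick pairing (no G:U wobble).
    """
    site = reverse_complement_rna(mirna[1:8])
    utr = utr.upper().replace("T", "U")  # accept UTRs written in the DNA alphabet
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# let-7a miRNA sequence and an invented 3'UTR fragment carrying two seed sites.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
print(seed_sites(LET7A, "AAACUACCUCAAAGGGCUACCUCUU"))  # prints [3, 16]
```

Real target prediction tools additionally weigh site type, site context and conservation; the point here is only that seed complementarity, not full-length pairing, is what defines a canonical site.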
Specific miRNA expression signatures in DLBCL predict lymphoma subtype, event-free survival and the risk of transformation from indolent follicular lymphoma to DLBCL (Lawrie et al., 2009; Malumbres et al., 2009; Li et al., 2009a). Characteristic miRNA patterns have also been observed for Burkitt Lymphoma (Leucci et al., 2008), Hodgkin Lymphoma (Navarro et al., 2008) and Mantle Cell Lymphoma (Zhao et al., 2010). Interestingly, even for the same disease, little overlap in miRNA expression patterns is observed between individual studies. MiRNAs identified as important for one type of lymphoma do not necessarily have a similar effect in another, or their role might be ambiguous (Horvilleur et al., 2010). The roles of individual miRNAs in B-cell malignancies have been reviewed by Sole et al. (2016). Finally, the expression level of several genes relevant to B-cell malignancies can be regulated at the level of translation, as detailed in Table 1.1.
1.6 Toolkit to study heterogeneity of translation
"A small particulate component of the cytoplasm" was the title of the paper in which George E. Palade revealed the ribosome to the world (Palade, 1955). Although ribosomes have been known since 1955 and the biochemical and molecular nature of translation has been comprehensively studied over the last decades, analysing the dynamics of these "small particles", especially from a genome-wide perspective, is still challenging. The technical difficulties of studying translation arise from the need to study two chemically distinct types of molecules simultaneously: RNA and protein. Translation is the very short and dynamic moment when the two families of molecules physically interact and one (RNA) directs the synthesis of the other. Therefore, measuring total transcript or protein abundance does not reflect true translation intensity, as many factors not directly related to protein synthesis may affect the abundance of both.
A simple workaround is to use separation or labelling techniques to extract a subpopulation of mRNA or protein that is functionally related to translation. At the protein level, this is the pool of newly synthesised proteins, ideally nascent proteins that are still bound by the ribosome. At the transcript level, evidence of physical interaction with the translation machinery can be used as a proxy for active translation. Another challenge is translation heterogeneity, which involves not only differences in translation intensity between single cells and tissues, but also distinct translation programmes of single transcripts or groups of transcripts. This raises the problem of the scale and resolution of translation quantification. Available methods can be classified by the number of transcripts or proteins measured simultaneously (low-throughput or high-throughput), by the level of insight (e.g. bulk, single cell or cell compartment) or by resolution (single nucleotide, transcript-wide or global). The choice of experimental technique is dictated by the biological question and the level of throughput required to answer it. The net abundance of each protein is a function of both its synthesis and degradation rates, so only the pool of newly synthesised proteins can be safely assumed to mirror the translation rate (Iwasaki and Ingolia, 2017). Because of the relatively large pool of pre-existing proteins in the cell, special separation or labelling techniques are needed to quantify this subpopulation. Earlier methods involved radioisotope labelling of newly synthesised proteins, for example with 35S-methionine or 35S-cysteine. Nowadays, non-radioactive (e.g. fluorescent) labelling is more popular. The most common techniques use methionine analogues, such as azidohomoalanine (AHA) or alkyne-bearing homopropargylglycine (HPG) (Iwasaki and Ingolia, 2017), which are incorporated into nascent proteins in place of methionine.
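Why total abundance does not report synthesis can be made concrete with the simplest first-order turnover model, dP/dt = k_syn - k_deg·P. This model is an assumption for illustration; real protein turnover is often not a single exponential. At steady state P = k_syn/k_deg, so a protein that is made and degraded quickly can be as abundant as a stable, slowly made one. By contrast, the amount labelled during a short pulse, (k_syn/k_deg)(1 - e^(-k_deg·t)), is approximately k_syn·t and therefore tracks synthesis. The rate constants below are arbitrary, hypothetical values.

```python
import math


def steady_state(k_syn: float, k_deg: float) -> float:
    """Steady-state abundance under dP/dt = k_syn - k_deg * P."""
    return k_syn / k_deg


def labelled_after_pulse(k_syn: float, k_deg: float, t: float) -> float:
    """Protein made during a labelling pulse of length t.

    Solution of dL/dt = k_syn - k_deg * L with L(0) = 0.
    """
    return (k_syn / k_deg) * (1.0 - math.exp(-k_deg * t))


# Two hypothetical proteins: same steady-state abundance, 10-fold different synthesis.
fast_turnover = dict(k_syn=100.0, k_deg=1.0)  # made fast, degraded fast
slow_turnover = dict(k_syn=10.0, k_deg=0.1)   # made slowly, long-lived

for name, rates in [("fast", fast_turnover), ("slow", slow_turnover)]:
    total = steady_state(**rates)
    pulse = labelled_after_pulse(**rates, t=0.05)
    print(f"{name}: steady state = {total:.1f}, labelled in short pulse = {pulse:.2f}")
```

Both proteins reach the same steady-state level, yet the fast-turnover protein accumulates roughly ten times more label in the short pulse, mirroring its ten-fold higher synthesis rate.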
A disadvantage of amino acid analogues is the required methionine-depletion step, which can induce a stress response and disrupt delicate translation patterns. This does not apply to the alkyne-bearing puromycin analogue O-propargyl-puromycin (OPP), which is incorporated at the C-terminus of nascent chains, inducing premature translation termination and release of nascent proteins with OPP attached. Depending on the assay, labelled proteins can be detected by specific antibodies or secondarily labelled with a fluorophore by click chemistry (Aviner, 2020). The protocol can be combined with several well-established techniques, such as confocal microscopy, fluorescence-activated cell sorting (FACS) or classical immunohistochemistry (Liu et al., 2012; Iwasaki and Ingolia, 2017). Methods based on nascent protein labelling allow the measurement of bulk protein synthesis or, depending on the exact protocol, provide single-cell or even cell-compartment resolution. High-throughput techniques have been dominated by two approaches: mass spectrometry (MS)-based and mRNA sequencing-based. The former usually combines some type of labelling to isolate newly synthesised proteins with subsequent tandem MS measurement. Pulsed stable isotope labelling by amino acids in cell culture (pSILAC), the most popular technique, involves supplementing the growth medium with stable isotope-labelled amino acids for only a brief period. Proteins with incorporated heavy amino acids (newly synthesised) can easily be distinguished from pre-existing light proteins by their mass. The abundance of heavy-labelled proteins reflects the relative intensity of protein synthesis (Iwasaki and Ingolia, 2017). Other MS techniques, such as BONCAT, QuaNCAT or PUNCH-P, rely on isolation of labelled proteins with streptavidin beads. In bio-orthogonal noncanonical amino acid tagging (BONCAT), AHA-labelled proteins are tagged with biotin.
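The way heavy and light signals are read out in pSILAC can be illustrated with a minimal sketch. It assumes per-protein heavy (H) and light (L) MS intensities have already been quantified and normalised; the protein names and numbers are invented. The sketch computes the newly synthesised fraction H/(H + L) and the log2 ratio of heavy intensities between two conditions, one simple way of comparing relative synthesis.

```python
import math


def heavy_fraction(heavy: float, light: float) -> float:
    """Fraction of the protein pool synthesised during the pulse, H / (H + L)."""
    total = heavy + light
    if total <= 0:
        raise ValueError("heavy + light intensity must be positive")
    return heavy / total


def log2_synthesis_change(heavy_treated: float, heavy_control: float) -> float:
    """log2 ratio of heavy (newly made) intensities between two conditions."""
    return math.log2(heavy_treated / heavy_control)


# Invented, already-normalised intensities: {protein: (heavy, light)} per condition.
control = {"PROT_A": (2.0e6, 8.0e6), "PROT_B": (5.0e5, 9.5e6)}
treated = {"PROT_A": (4.0e6, 8.0e6), "PROT_B": (5.0e5, 9.0e6)}

for protein, (h_ctrl, l_ctrl) in control.items():
    h_trt, _ = treated[protein]
    print(protein,
          round(heavy_fraction(h_ctrl, l_ctrl), 3),
          round(log2_synthesis_change(h_trt, h_ctrl), 2))
```

Comparing heavy signals rather than total protein abundance is the key design choice here: the light (pre-existing) pool mostly reflects past synthesis and degradation and barely changes during a short pulse.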
A variation of BONCAT, quantitative noncanonical amino acid tagging (QuaNCAT), combines BONCAT with SILAC, allowing early changes in protein synthesis to be quantified with higher specificity and greater depth than standard BONCAT (Howden et al., 2013). Finally, in puromycin-associated nascent chain proteomics (PUNCH-P) (Aviner et al., 2013), translating ribosomes are isolated from cells by ultracentrifugation; nascent peptides are then labelled with biotin-dC-puromycin, pulled down with streptavidin beads and analysed by MS (Aviner, 2020). An advantage of MS-based approaches is their accuracy and high throughput. However, MS-based quantification of protein abundance has a limited dynamic range and a limited capacity to identify novel products, isoforms or proteins with post-translational modifications (Iwasaki and Ingolia, 2017). Polysome fractionation and ribosome footprinting (a.k.a. Ribosome Profiling or Ribo-Seq) are the two main high-throughput strategies to analyse translation from the perspective of individual transcripts (Figure 1.12). Both are almost equally popular nowadays, but their technical protocols differ substantially, and so does the biological interpretation of their results. Ribo-Seq and polysome fractionation work under the assumptions that 1) actively translated mRNAs are physically associated with ribosomes, and 2) the efficiency of translation is proportional to the number of ribosomes bound to an mRNA (Piccirillo et al., 2014). In polysome fractionation, cell lysates containing polysomes (mRNAs with multiple ribosomes), monosomes (mRNAs with a single 80S ribosome) and free ribosomal subunits are centrifuged through a sucrose density gradient, which separates transcripts according to the number of ribosomes attached. This is referred to as the polysome profiling step. Each fraction can be identified by changes in optical density (absorbance) and subjected to analysis.
The composition of each fraction can be evaluated in terms of individual mRNA or protein abundance. For mRNA-centric studies, cDNA microarrays (historical approach), RNA-Seq (current approach) or RT-PCR (low-throughput) can be used (Piccirillo et al., 2014; Johannes et al., 1999; Karginov and Hannon, 2013). Typically, more efficiently translated mRNAs are enriched in the heavy polysome fraction (more than three ribosomes per mRNA) (Piccirillo et al., 2014; Johannes et al., 1999), while less efficiently translated mRNAs are more abundant in the light fraction, which usually corresponds to polysomes with three or fewer ribosomes, and in the monosome fraction. For better resolution, each fraction can be quantified separately. Ribo-Seq is centred on the idea of ribosome footprinting, initially developed by Wolin and Walter (1988), which allows the position of a ribosome to be determined with single-nucleotide precision (Ingolia et al., 2009). Ribosomes bound to an mRNA protect a fragment of about 30 nucleotides from RNase digestion (ribosome-protected fragments, RPFs). By taking advantage of next-generation sequencing, it is possible to sequence those RPFs and thus infer the position of each translating ribosome. The number and location of ribosomal footprints can be used to compute the ribosome density of a single transcript, which serves as a proxy for its translation intensity (Ingolia et al., 2009). In contrast to Ribo-Seq, polysome profiling-based approaches do not reveal the precise location of attached ribosomes. This may be enough to judge which transcripts form polysomes (are actively translated), but no information about the exact location and reading frame is available. On the other hand, polysome fractionation keeps the entire transcript intact, which may be advantageous for studying the effect of differential splicing or polyadenylation on translation.
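A common way to express the ribosome density described above is footprint reads per kilobase of coding sequence per million mapped reads (RPKM). Dividing it by the matching RNA-Seq density gives a per-gene translational efficiency (TE), the ratio used by Ingolia et al. (2009). The sketch below assumes per-gene read counts over the CDS are already available; the gene names, counts and library sizes are invented. Real analyses usually model the counts statistically (for example with DESeq2 or edger) rather than taking raw ratios.

```python
def rpkm(counts: int, length_nt: int, library_size: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return counts / (length_nt / 1e3) / (library_size / 1e6)


def translational_efficiency(rpf_counts, rna_counts, cds_length, rpf_lib, rna_lib):
    """Ratio of ribosome footprint density to mRNA density over the same CDS."""
    rpf_density = rpkm(rpf_counts, cds_length, rpf_lib)
    rna_density = rpkm(rna_counts, cds_length, rna_lib)
    return rpf_density / rna_density


# Invented counts over the CDS of two hypothetical genes with equal mRNA levels.
genes = {
    # gene: (RPF counts, RNA-Seq counts, CDS length in nt)
    "GENE_A": (1200, 600, 1500),
    "GENE_B": (150, 600, 1500),
}
RPF_LIBRARY, RNA_LIBRARY = 10_000_000, 20_000_000

for gene, (rpf, rna, length) in genes.items():
    te = translational_efficiency(rpf, rna, length, RPF_LIBRARY, RNA_LIBRARY)
    print(gene, round(te, 2))
```

Because both densities are computed over the same CDS, the length term cancels in TE. It matters only when the densities themselves are compared between genes. Here both genes have the same mRNA density, but GENE_A carries eight times more ribosomes per transcript.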
Polysome fractionation combined with RNA-Seq reports the total number of ribosomes per mRNA, inferred from the fraction in which a transcript was found, while Ribo-Seq returns the position of every ribosome (regardless of whether the whole transcript was in a polysome or monosome fraction). Hence, Ribo-Seq allows quantification of the relative ribosome density per transcript, whereas polysome fractionation-RNA-Seq gives an estimate of the absolute number of ribosomes. Relative quantification may prove less suitable when studying the translation landscape following large shifts in global translation, but no systematic comparison has been performed so far (Piccirillo et al., 2014). Finally, with Ribo-Seq it is possible to annotate de novo, per sample and directly from the data, which regions of the genome are actively translated, at subcodon resolution (see Chapter 5); this is not possible with polysome profiling. In a typical Ribo-Seq experiment, treatment with a translation inhibitor briefly before harvesting aims to freeze translating ribosomes, preventing them from dissociating from the transcript and maintaining the fidelity of footprint localisation. Combining ribosomal footprinting with different translation inhibitors, targeting different stages of translation, broadens the biological applications of Ribo-Seq data. Cycloheximide (CHX), used in the original Ribo-Seq protocol (Ingolia et al., 2009; McGlincy and Ingolia, 2017), is the most popular choice. CHX is a small-molecule inhibitor that blocks eukaryotic translation elongation by binding to the ribosomal E-site and blocking eEF2-mediated translocation (Obrig et al., 1971; Schneider-Poetsch et al., 2010; de Loubresse et al., 2014). In contrast, two other popular inhibitors, lactimidomycin (LTM) and harringtonine (HARR), require an empty E-site and thus specifically inhibit only the first round of elongation, capturing ribosomes at translation initiation sites (Iwasaki and Ingolia, 2017; de Loubresse et al., 2014).
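Subcodon resolution comes from converting the 5′ end of each footprint into the position of the ribosomal P-site and then asking which codon frame that P-site occupies. The sketch below assumes read 5′ ends are already expressed as transcript coordinates relative to the annotated start codon. It applies a fixed offset of 12 nt to 28-30 nt footprints, a common rule of thumb that is an assumption here, since offsets are normally estimated per read length and per library. Footprints from an actively translated ORF should be strongly biased towards frame 0, which is the 3-nt periodicity signal.

```python
from collections import Counter
from typing import Optional

# Hypothetical P-site offsets (nt from the read 5' end) per footprint length.
# In practice these are estimated per library, e.g. from reads at start codons.
P_SITE_OFFSETS = {28: 12, 29: 12, 30: 12}


def p_site(read_start: int, read_length: int) -> Optional[int]:
    """P-site position relative to the first nucleotide of the start codon.

    Returns None for read lengths without an assigned offset.
    """
    offset = P_SITE_OFFSETS.get(read_length)
    return None if offset is None else read_start + offset


def frame_distribution(reads):
    """Fraction of P-sites in each codon frame (0, 1, 2).

    reads: iterable of (5' end relative to the start codon, read length).
    P-sites upstream of the start codon are ignored.
    """
    frames = Counter()
    for start, length in reads:
        site = p_site(start, length)
        if site is not None and site >= 0:
            frames[site % 3] += 1
    total = sum(frames.values())
    return {f: frames[f] / total for f in range(3)} if total else {}


# Invented reads: (5' end relative to the AUG, footprint length).
reads = [(-12, 29), (0, 28), (3, 30), (4, 29), (17, 28), (25, 31)]
print(frame_distribution(reads))  # prints {0: 0.6, 1: 0.2, 2: 0.2}
```

The 31-nt read is skipped because no offset is defined for that length. Restricting the analysis to well-phased read lengths is what keeps frame assignment reliable.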
Ribo-Seq combined with HARR or LTM treatment allows a genome-wide map of start codons to be built; the LTM-based approach is known as global translation initiation sequencing (GTI-Seq) (Iwasaki and Ingolia, 2017). Ribosomes not blocked by the translation inhibitor run off the transcripts, leaving only the initiation site footprints for sequencing. A variation of this protocol, quantitative translation initiation sequencing (QTI-Seq), involves sequential treatment with LTM and puromycin, which facilitates run-off of non-initiating ribosomes and increases the resolution of translation initiation site mapping (Gao et al., 2015). The full spectrum of ribosome footprinting-based techniques has been reviewed by Iwasaki and Ingolia (2017).
Figure 1.12: Comparison of polysome fractionation and Ribo-Seq techniques. Polysome fractionation and Ribo-Seq allow analysis of the translation intensity of individual transcripts. Polysome fractionation is based on the technique of polysome profiling, in which a sucrose density gradient separates polysomes, monosomes and individual ribosomal subunits. Each polysome fraction contains transcripts with a different number of ribosomes attached. Using a next generation sequencing technology (e.g. RNA-Seq), it is possible to infer which transcripts are abundant in each fraction. In contrast, in the Ribo-Seq workflow, transcripts are digested with RNase, generating a mixture of mRNA fragments protected from digestion by the attached ribosomes. These so-called ribosomal footprints can then be analysed by next generation sequencing.
1.7 Project aims
The aim of this thesis is to explore the role of translational control during lymphoma development. Firstly, I established an automated bioinformatic pipeline for efficient processing of Ribo-Seq datasets that keeps the computational workflow transparent, reproducible and flexible.
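The logic of start-codon mapping from initiation-inhibitor data can be sketched as a codon-by-codon comparison. At each position, the normalised P-site density in an LTM or HARR library is compared with that in a matched CHX library, and true initiation sites should show a sharp excess in the initiation library. This is a simplified, hypothetical scoring scheme, not the published GTI-Seq or QTI-Seq algorithm. The near-cognate codon set, the threshold and the toy data are all assumptions.

```python
# Hypothetical start-codon set: AUG plus common near-cognate codons (assumption).
START_CODONS = {"ATG", "CTG", "GTG", "TTG"}


def normalised_density(psite_counts):
    """Scale per-nucleotide P-site counts to fractions of the transcript total."""
    total = sum(psite_counts)
    return [c / total if total else 0.0 for c in psite_counts]


def candidate_start_sites(seq, init_counts, chx_counts, min_excess=0.1):
    """Return start-like codon positions enriched in the initiation library.

    seq:         transcript sequence in the DNA alphabet
    init_counts: P-site counts per nucleotide from the LTM or HARR library
    chx_counts:  P-site counts per nucleotide from the matched CHX library
    min_excess:  hypothetical threshold on (initiation - CHX) normalised density
    """
    if not (len(seq) == len(init_counts) == len(chx_counts)):
        raise ValueError("sequence and count vectors must have equal length")
    init_d = normalised_density(init_counts)
    chx_d = normalised_density(chx_counts)
    return [i for i in range(len(seq) - 2)
            if seq[i:i + 3].upper() in START_CODONS
            and init_d[i] - chx_d[i] >= min_excess]


# Toy transcript: AUG at position 3, CUG at positions 7 and 12.
seq = "GCCATGGCTGAACTGTAA"
init = [0] * len(seq)
init[3], init[7], init[12] = 80, 5, 15
chx = [5] * len(seq)
print(candidate_start_sites(seq, init, chx))  # prints [3]
```

Lowering `min_excess` to 0.05 would also report the weaker CUG peak at position 12. This illustrates why the choice of threshold, and of which near-cognate codons to admit, strongly shapes how much non-AUG initiation such an analysis detects.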
Next, I reviewed and benchmarked current methods for identifying genes regulated at the level of translation. This allowed me to select a strategy with which I show that overexpression of two B-cell oncogenes, BCL6 or MYC, is followed by preferential translation of selected transcripts. Then, I focus on the role of the RNA helicase DDX3X in MYC-driven lymphoma. I reveal that loss-of-function mutations in DDX3X are common in MYC-translocated B-cell lymphomas and facilitate early tumour development. By controlling the translation of selected transcripts, mutated DDX3X buffers the effects of MYC on the translation of ribosomal proteins and on the rate of global protein synthesis. Finally, I analyse a large dataset of 79 Ribo-Seq libraries to investigate the genome-wide distribution of translating ribosomes and the extent of non-canonical translation in lymphoid cells. I show pervasive translation of ostensibly non-coding regions and design a knock-down CRISPR screen library to identify those important for B-cell survival.

CHAPTER 2 Materials and methods

2.1 Materials

2.1.1 Overview of genomic sequences and annotations used in this study

Resource | Source | Purpose
Gencode v.29 | Frankish et al. (2019) | Gene and transcript models
GRCh38 | Genome Reference Consortium (FASTA file downloaded from Gencode v.29) | Nucleotide sequence of the GRCh38 primary genome assembly
H. sapiens rRNA | RefSeq (FASTA files) | Pre-alignment of Ribo-Seq reads
Reactome | Joshi-Tope et al. (2005) | Pathway knowledgebase
MSigDB | Subramanian et al. (2005) | Molecular Signatures Database for GSEA
sORFdb | Olexiouk et al. (2018) | Proteogenomic database
OpenProt | Brunet et al. (2021) | Proteogenomic database

2.1.2 Overview of external datasets used in this study

Accession ID | Source | Description
GSE125966 | McCord et al. (2019a) | RNA-Seq from the GOYA clinical trial
GTEx (V8) | GTEx Consortium et al. (2020) | RNA-Seq samples from 54 tissue sites in non-diseased individuals
TCGA | Weinstein et al. (2013) | Pan-cancer RNA-Seq dataset
CGCI-BLGSP-2019 | Grande et al. (2019) | RNA-Seq from Burkitt lymphoma patients
GSE35163 | Schmitz et al. (2012) | RNA-Seq from Burkitt lymphoma patients
EGAS00001003560 | Caeser et al. (2019) | RNA-Seq from primary germinal centre B-cells
COSMIC | Tate et al. (2019) | Database of somatic mutations in cancer
MSV000084172 | Sarkizova et al. (2020) | Mono-allelic MHC-I peptidome from the B721.221 cell line
PXD000332 | Deeb et al. (2014) | N-glyco FASP and super-SILAC mass spectrometry from lymphoma patients
PXD002004 | Johnston et al. (2018) | TMT 10-plex mass spectrometry from CLL patients and peripheral B-cells
PXD002098 | Deeb et al. (2012) | Super-SILAC dataset from lymphoma cell lines
PXD004452 | Bekker-Jensen et al. (2017) | Deep-proteome dataset of HeLa cells, tissue samples (colon, prostate, liver) and 5 human cell lines
PXD004746 | Khodadoust et al. (2017) | MHC-I and MHC-II peptidomes from lymphoma patients
PXD010808 | Khodadoust et al. (2019) | MHC-I peptidome from lymphoma patients

2.1.3 Overview of computational software used in this study

Tool | Source | Purpose
FastQC | Andrews et al. (2017) | Quality check of FASTQ files
Cutadapt | Martin (2011) | Adapter trimming
Bowtie2 | Langmead and Salzberg (2012) | Pre-alignment of Ribo-Seq reads to rRNA
STAR | Dobin et al. (2013) | Sequencing read alignment
MultiQC | Ewels et al. (2016) | Summary of QC and alignment reports
Samtools | Li et al. (2009b) | BAM file indexing
GenomicFeatures | Lawrence et al. (2013) | Manipulating genomic locations
Rsamtools | Morgan et al. (2016) | Processing of SAM or BAM files
DESeq2 | Love et al. (2014) | Differential expression analysis
edgeR | Robinson et al. (2010) | Differential expression analysis
deepTools | Ramírez et al. (2016) | Metagene analysis
clusterProfiler | Yu et al. (2012) | GO and GSEA analysis
GATK | Van der Auwera and O'Connor (2020) | SNV calling from RNA-Seq data
Picard | Broad Institute | BAM file processing
GSEA | Subramanian et al. (2005) | GSEA analysis
ORFLine | Hu et al. (2021) | ORF identification from Ribo-Seq data
ORF-RATER | Fields et al. (2015) | ORF identification from Ribo-Seq data
RiboCode | Xiao et al. (2018) | ORF identification from Ribo-Seq data
RibORF | Ji (2018) | ORF identification from Ribo-Seq data
Salmon | Patro et al. (2017) | Isoform quantification
UCSC toolkit | Kent et al. (2002) | Processing of BigWig files
CHOPCHOP | Labun et al. (2019) | gRNA scoring and selection
MaxQuant | Cox and Mann (2008) | Mass spectrometry data analysis
Comet | Eng et al. (2013) | Mass spectrometry data analysis
NewAnce | Chong et al. (2020) | Mass spectrometry data analysis
ProteoWizard | Chambers et al. (2012) | Mass spectrometry data analysis
Xtail | Xiao et al. (2016) | Differential translation analysis
Riborex | Li et al. (2017) | Differential translation analysis
anota | Larsson et al. (2011) | Differential translation analysis
anota2seq | Oertlin et al. (2019) | Differential translation analysis
deltaTE | Chothani et al. (2019) | Differential translation analysis
riboWaltz | Lauria et al. (2018) | Ribo-Seq QC

2.2 Methods

2.2.1 Next Generation Sequencing library preparation and sequencing

2.2.1.1 RNA-Seq

Performed by Dr. Jie Gao. Total RNA was extracted using the NucleoSpin RNA extraction kit (Macherey-Nagel, Cat No. 740955.250) according to the manufacturer's protocol. RNA-Seq libraries were prepared from 500 ng of total RNA using the NEBNext Poly(A) mRNA Magnetic Isolation Module (Cat No. NEB E7490) as per the manufacturer's instructions. Final libraries were amplified by PCR for 12 cycles, purified with AMPure XP beads and analysed on an Agilent Bioanalyzer before sequencing on an Illumina HiSeq 4000.

2.2.1.2 Ribo-Seq

Performed by Dr. Jie Gao. Ribosome profiling was conducted as previously described (Ingolia et al., 2012) with minor modifications. Five million cells per sample were treated with 100 µg/ml cycloheximide, immediately centrifuged and lysed in 300 µl of buffer containing 20 mM Tris-HCl, pH 7.4, 150 mM NaCl, 5 mM MgCl2, 1% NP40, 1 mM DTT and 100 µg/ml cycloheximide.
A 100 µl aliquot of the lysate was reserved for RNA-Seq and the remainder was treated with DNase I and RNase I. Ribosome monomers were purified using MicroSpin S-400 columns. The ribosome-protected RNA fragments (RPFs) were extracted using the RNA Clean-Up and Concentration Kit (23600, Norgen Biotek). RPFs were resolved on 15% Novex TBE-Urea gels and stained with SYBR Gold, and fragments of 26-34 nt were excised from the gel. The RNA was extracted from the gel by electrophoresis using a D-Tube (MWCO 3.5 kDa, 71506-3, Merck Chemical). Precipitated RNA was dephosphorylated with T4 polynucleotide kinase and ligated to the Universal miRNA Cloning Linker (NEB) using truncated T4 RNA ligase 2. cDNA was prepared with SuperScript III and a reverse transcription primer containing a degenerate 5-nucleotide molecular barcode sequence. The cDNA was resolved by polyacrylamide gel electrophoresis, excised and extracted by dialysis in a D-Tube (71504-3, Merck). The extracted cDNA was then circularised and PCR-amplified. The final library was separated from PCR primers by electrophoresis and extracted by dialysis in a D-Tube (71504-3, Merck) before sequencing on an Illumina HiSeq 4000 as 50 nt single-end reads.

2.2.2 Processing and quality control of Next Generation Sequencing data

2.2.2.1 Adapter trimming and alignment to the reference genome

Adapter sequences were removed from raw FASTQ files using Cutadapt, and reads shorter than 15 nucleotides were discarded. After quality checking with FastQC 0.11.5, Ribo-Seq reads were additionally filtered to remove rRNA-derived reads using Bowtie2 2.3.4 with a seed length of 15. The remaining reads were then mapped to the human genome (GRCh38) using STAR 2.5.4a with default parameters. The reference human rRNA index was constructed from the RefSeq database, and the STAR genome index was built with the GENCODE v.29 comprehensive gene annotation set.

2.2.2.2 Read counting

Trimmed gene models were built from the GENCODE v.29 comprehensive gene annotation set with the GenomicFeatures R package.
The first 30 nt and the last 30 nt of each CDS were removed so that counts reflect translation elongation intensity, as described previously. Trimmed CDS models representing different transcript isoforms were merged by gene. Footprint counts for 5'UTRs and 3'UTRs were obtained using similar logic: the UTR regions were trimmed by 5 nucleotides adjacent to the annotated start codon (for 5'UTRs) or stop codon (for 3'UTRs). As expected for Ribo-Seq, 28-30 nt fragments with evident triplet nucleotide periodicity relative to the start and stop codons were the most abundant and were selected for further analysis. The position of the ribosomal P-site was determined from the offset between the 5' end of fragments spanning the translation start site and the annotated start codon, which was 12 nt for read lengths of 28-30 nt. Per-sample, per-gene P-site count matrices were built using the GenomicRanges R package, allowing a read to be assigned to more than one overlapping feature. At least 25 overlapping bases were required to assign a read to a gene. Corresponding RNA-Seq samples were counted using the same gene models. Genes with low counts were filtered out using a threshold of at least 128 counts.

2.2.3 Differential translation analysis of Ribo-Seq data

Differential translation analysis was performed as previously described (Sendoel et al., 2017; Hsieh et al., 2012). Briefly, differentially expressed genes were identified using DESeq2, analysing RNA-Seq and Ribo-Seq read counts separately. The log2 fold change in translation efficiency (TE) was computed by subtracting the log2 fold change in mRNA abundance from the log2 fold change in ribosome footprint abundance, both taken from the DESeq2 output. The following rules were applied to define differentially translated genes:
1. Statistically significant change in ribosomal footprint abundance (evaluated with the standard DESeq2 workflow), FDR < 0.1;
2. Absolute mean log2 fold change in mRNA abundance < 0.3;
3. Absolute mean log2 fold change in TE > 0.3.

In addition, two other gene regulatory classes were defined depending on the relationship between the expression changes measured by Ribo-Seq and RNA-Seq. Genes with a statistically significant change in mRNA abundance (FDR < 0.1 and absolute log2 fold change in mRNA abundance > 0.3) and a concordant change in Ribo-Seq (mean log2 fold change in ribosomal footprint abundance > 0 or < 0, depending on the direction of change and regardless of statistical significance) were classified as 'homodirectional'. When the direction of change was opposite, genes were classified as either 'buffered mRNA up' or 'buffered mRNA down'.

2.2.4 Differential expression and downstream analysis of RNA-Seq data

2.2.5 Metagene analysis of iCLIP and Ribo-Seq data

A metagene analysis of the scaled density of Ribo-Seq reads or iCLIP hits relative to start and stop codons was performed using deepTools. Read coverage was normalised per sample by the total number of uniquely mapped reads (CPM), excluding sex chromosomes. Scaled coverage per transcript was computed using the computeMatrix function with the following parameters: scale-regions -m 10000 -bs 20 -m 5000 -b 3000 -a 3000 -p 40 --metagene --exonID CDS --transcriptID transcript --skipZeros. Transcript models were built using the GENCODE v.29 basic gene annotation set.

2.2.6 Differential expression analysis of RNA-Seq data

Uniquely mapped reads were assigned to genes using the featureCounts function from the Rsubread package, allowing a read to be assigned to more than one overlapping feature. At least 25 overlapping bases were required to assign a read to a gene. Differential expression analysis was performed using the standard DESeq2 workflow.

2.2.7 Downstream data analysis

Identified up-regulated and down-regulated genes were used to perform gene ontology analysis with the enrichGO function from the clusterProfiler package.
Enrichment scores were computed using in-house scripts as the ratio between the number of differentially expressed genes overlapping a gene ontology set and the number of background genes assigned to this set. Pathway analysis was performed using the browser-based Reactome Pathway Database. A list of all expressed genes detected by RNA-Seq was used as the background set for over-representation testing. Gene Set Enrichment Analysis (GSEA) was performed using the GSEA function from the clusterProfiler package. Gene expression measurements were normalised using the variance stabilising transformation implemented in the DESeq2 package and analysed for enrichment in the hallmark gene sets from MSigDB v7.0.

2.2.7.1 Individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP)

iCLIP libraries were prepared by Dr Chun Gong in the Hodson lab from two lymphoma cell lines and from primary human germinal centre B cells purified from donor tonsils. iCLIP data preprocessing (alignment, trimming and peak calling) was performed by Dr Igor Ruiz De Los Mozos using the iMaps web server. I performed the downstream analysis and data visualisation. Briefly, unique molecular identifiers (UMIs) were used to identify and remove PCR duplicates before experimental barcodes and Solexa adapters were removed. The trimmed reads were mapped to GENCODE GRCh38 v.29 using STAR with default parameters. The first nucleotide after the UMI was assigned as the crosslink site, defined by the truncated cDNA. Significant crosslink sites were determined by the iCount peak-finding algorithm (FDR < 0.05), which weighs the enrichment of crosslinks against shuffled random positions. Neighbouring cDNA start positions less than 15 nt apart were joined into high-confidence crosslink clusters with the iCount clusters function. Genes with more than 4 crosslinking peaks in at least one experiment were considered valid DDX3X targets.
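The clustering and target-calling rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the iCount implementation: the function names are hypothetical, positions are assumed to come from a single chromosome and strand, and "less than 15 nt apart" is interpreted as the gap between consecutive significant sites.

```python
def merge_sites(positions, max_gap=15):
    """Join crosslink sites (one chromosome and strand) lying less than max_gap nt apart.

    Returns a list of (first, last) positions per cluster, both inclusive.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][1] < max_gap:
            clusters[-1] = (clusters[-1][0], pos)  # extend the open cluster
        else:
            clusters.append((pos, pos))  # start a new cluster
    return clusters


def call_targets(peaks_per_experiment, max_peaks=4):
    """Return genes with more than max_peaks peaks in at least one experiment.

    peaks_per_experiment maps an experiment name to a {gene: peak count} dict.
    """
    targets = set()
    for counts in peaks_per_experiment.values():
        targets.update(gene for gene, n in counts.items() if n > max_peaks)
    return targets
```

For example, sites at positions 100, 110, 130, 131 and 200 form three clusters, (100, 110), (130, 131) and (200, 200), because only consecutive gaps shorter than 15 nt are joined.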
2.2.8 Identification of DDX3X mutations from RNA-Seq data

I used published RNA-Seq data to identify cases with DDX3X mutations. Single nucleotide variant (SNV) calling was performed according to the GATK guidelines using paired-end RNA-Seq data from diagnostic biopsies of 553 patients enrolled in the GOYA trial, a phase III trial of first-line treatment in DLBCL. Briefly, paired-end reads were mapped to the reference genome using STAR 2.5.4a in 2-pass mode (Dobin et al., 2013). Read groups were added with AddOrReplaceReadGroups and duplicate reads were identified with MarkDuplicates from Picard tools. Sequences overhanging intronic regions were hard-clipped and STAR mapping qualities were reassigned to match GATK requirements. Variant calling was performed with GATK HaplotypeCaller, using a Phred-scaled score of 20 as the minimum variant calling confidence. Variant clusters (at least 3 valid variants in a window of 35 bases) were removed to reduce the effect of RNA-Seq mapping errors. Standard variant quality filtering was applied with GATK VariantFiltration, using Fisher Strand (FS) > 30.0 and Quality by Depth (QD) < 2.0 as filtering thresholds. Individual variants were then annotated with gene names and their predicted consequences on protein function using the VariantAnnotation Bioconductor package and gene models from the GENCODE comprehensive gene annotation set (v.28). To identify samples with potential DDX3X mutations, the ENSG00000215301 (gene ID) and ENST00000644876 (transcript ID) DDX3X models were used as the reference. All nonsense, frameshift and non-synonymous variants with a variant-to-reference coverage ratio > 0.2, located in the DDX3X helicase domain and not previously reported in the ExAC database of common population variants, were considered valid hits.
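A minimal Python sketch of the post-calling filters described above is shown below. It assumes each variant has already been annotated with FS, QD, read depths, consequence, affected residue and ExAC status. The `Variant` class, its field names and the helicase-domain residue range passed as `domain` are hypothetical placeholders; in this work, the filtering itself was done with GATK VariantFiltration and VariantAnnotation. Calls with no reference reads are treated as passing the coverage-ratio filter. This is an assumption, motivated by DDX3X being X-linked, so male samples can carry hemizygous calls.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    pos: int          # genomic position
    fs: float         # FisherStrand (FS) annotation
    qd: float         # QualByDepth (QD) annotation
    alt_depth: int    # reads supporting the variant allele
    ref_depth: int    # reads supporting the reference allele
    consequence: str  # e.g. "nonsense", "frameshift", "nonsynonymous", "synonymous"
    residue: int      # affected residue in the reference DDX3X transcript
    in_exac: bool     # reported in ExAC as a common population variant


DAMAGING = {"nonsense", "frameshift", "nonsynonymous"}


def clustered(positions, n=3, window=35):
    """Positions belonging to any window of `window` bases that holds at least n variants."""
    ps = sorted(positions)
    flagged = set()
    for i, start in enumerate(ps):
        inside = [p for p in ps[i:] if p < start + window]
        if len(inside) >= n:
            flagged.update(inside)
    return flagged


def valid_hits(variants, domain):
    """Apply the filters described above; `domain` is a (first, last) residue range."""
    in_clusters = clustered([v.pos for v in variants])
    hits = []
    for v in variants:
        if v.pos in in_clusters or v.fs > 30.0 or v.qd < 2.0:
            continue  # clustered calls and hard-filter failures
        # Assumption: calls without reference reads (e.g. hemizygous) pass the ratio filter.
        ratio = v.alt_depth / v.ref_depth if v.ref_depth else float("inf")
        if (v.alt_depth > 0 and ratio > 0.2 and v.consequence in DAMAGING
                and domain[0] <= v.residue <= domain[1] and not v.in_exac):
            hits.append(v)
    return hits
```

Each filter maps to one clause of the method description, so relaxing a single criterion (for example the 0.2 coverage ratio) only requires changing one constant.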
2.2.9 DLBCL cell-of-origin identification from RNA-Seq data

Classification of DLBCL biopsies into four transcriptomic subtypes (ABC, GCB, Unclassified and Molecular High Grade (MHG) DLBCL) was performed as previously described (Reddy et al., 2017; Sha et al., 2019).

2.2.10 Chromosome Y expression identification from RNA-Seq data

Chromosome Y expression in human RNA-Seq data was identified using the decision tree algorithm implemented in the rpart R package. The model was trained on RNA-Seq data from the Genotype-Tissue Expression (GTEx) Project, which includes gene expression samples from 54 tissue sites in non-diseased individuals (11688 samples in total) (2013). The GTEx datasets (V8) were obtained from https://gtexportal.org/home/. Raw RNA-Seq counts were filtered and TMM-normalised, and per-gene scaled expression values were used as input. The GTEx data were randomly split into a training set (80%, 9333 samples) and a test set (20%, 2355 samples). Run with default parameters, the algorithm achieved high performance (F1 score = 0.9973, AUC = 0.9973), with 8 males misclassified. KDM5D, DDX3Y, USP9Y, RPS4Y1, TXLNGY and XIST were identified as classifying genes. To assess the ability of the algorithm to classify cancer samples, the GTEx-trained model was benchmarked against a cancer dataset. TCGA gene expression data (RNA-Seq only) (Weinstein et al., 2013) were downloaded using TCGAbiolinks, a Bioconductor package for integrative analysis of GDC data. As for GTEx, this dataset was split into a training set (80%, 9198 samples) and a test set (20%, 2336 samples). Here the algorithm achieved lower performance than on GTEx data (F1 score = 0.9624, AUC = 0.9779), with 46 males and 37 females misclassified using default parameters. When the GTEx-trained model was tested on the TCGA test dataset, 2 females and 295 males were misclassified.
The 295 males classified as females showed markedly lower expression of genes located on chromosome Y. This could reflect the previously reported loss of chromosome Y during oncogenesis (Dunford et al., 2017). To obtain a high-confidence set of male DLBCL patient samples with chromosome Y expression, the GTEx-trained model was used to classify samples in the GOYA dataset.

2.2.11 Hierarchical de novo identification of translated regions from Ribo-Seq data

Ribo-Seq and RNA-Seq FASTQ files were processed and a standard quality check was performed as described in section 2.2.2. P-site locations were determined automatically with the Plastid suite using the psite and phase_by_size scripts with default settings. Only reads 25-33 nt long with a clear three-nucleotide periodicity were used to call ORFs. Aligned sequencing reads were used to run ORFLine (Hu et al., 2021), ORF-RATER (Fields et al., 2015), RibORF (Ji, 2018) and RiboCode (Xiao et al., 2018) with default settings. Harringtonine-treated samples, where available, were used as additional input for ORF-RATER to increase the precision of translation initiation site localisation. Identified ORFs were annotated against a reference transcriptome customised for lymphoid cells so that it contained only the single most highly expressed isoform per gene. Three different RNA-Seq datasets were queried for that purpose: 1) 50 bp single-end RNA-Seq data matching the Ribo-Seq dataset (77 samples), 2) 50 bp paired-end RNA-Seq data from the GOYA clinical trial (553 samples) (McCord et al., 2019a). Transcript-level quantification of expression was performed with Salmon following the recommendations in the user manual (Patro et al., 2017). Salmon .sf files were loaded into R with the listings package. Scaled TPMs (transcript expression abundance scaled to library size) were used to determine the mean expression of each isoform.
The isoform with the highest mean expression was selected as the representative. For protein-coding genes, only protein-coding isoforms were considered. Identified ORFs were classified as canonical if both their start and stop codons were present in GENCODE v.29 annotations. All remaining ORFs were assigned to one of seven types as defined in Table 2.1.

Table 2.1: Unified naming system for classifying identified ORFs
Unified type | Definition
canonical | Matching start and stop codon
uORF | Initiating and terminating upstream of the canonical ORF
dORF | Initiating and terminating downstream of the canonical ORF
novel | Localised in a non-coding transcript
internal | Contained within a known CDS but out of frame with the canonical product
truncation | Matching stop codon but initiating downstream of the canonical start site
extension | Matching stop codon but initiating upstream of the canonical start site
readthrough | Matching start codon but terminating downstream of the canonical termination site
overlap uORF | Initiating upstream of the canonical start codon and terminating upstream of the canonical termination site
overlap dORF | Initiating upstream of the canonical termination site but terminating downstream of it

2.2.12 Reanalysis of published mass spectrometry datasets

The ProteomeXchange database (Deutsch et al., 2016) was searched for proteomic experiments performed in lymphoid cells (keywords: B-cells, lymphoma, lymphocyte) covering a variety of experimental conditions and mass spectrometry (MS) techniques. Experiments with incomplete submissions, targeted experiments (e.g. co-immunoprecipitation of a single protein) or very few replicates were excluded. RAW files were downloaded from the PRIDE (Perez-Riverol et al., 2019) or MassIVE repositories.
To avoid forced misannotation of MS/MS spectra to noncanonical products, a customised reference FASTA file was built by merging a database of human canonical proteins downloaded from UniProt with the amino-acid sequences of predicted micropeptides. To reduce the search space, noncanonical ORFs shorter than 6 amino acids, as well as those showing more than 20% in-frame overlap with known ORFs, were filtered out. The Andromeda search engine implemented in MaxQuant 1.6.3.4 (Cox and Mann, 2008) was used to process all collected MS datasets. The parameters were set as follows: precursor mass tolerance 20 ppm, MS/MS fragment tolerance 0.5 Da, and PSM FDR and protein FDR thresholds of 1%. Noncanonical reference sequences were provided as a separate proteogenomic FASTA file, which allows separate FDR calculations for canonical and noncanonical peptides. For immunopeptidomics data, the NewAnce workflow was followed (Chong et al., 2020), which required an additional search with the Comet engine (Eng et al., 2013). To match Comet input requirements, RAW files were converted to mzXML format with msConvert from ProteoWizard (Chambers et al., 2012). Search parameters for immunopeptidomics data were set as described above with a small modification: a peptide length of 8-15 amino acids was used when searching MHC-I data and 8-25 for MHC-II data. Variable and fixed modifications were kept as in the original publications.

2.2.13 Analysis of proteogenomic data downloaded from OpenProt and sORFdb

FASTA files containing the peptide sequences deposited in the OpenProt 1.6 and sORFdb databases were downloaded from https://www.openprot.org (Brunet et al., 2021) and http://sorfs.org (Olexiouk et al., 2018), respectively. Amino-acid sequences of candidate ORFs identified in this study were queried against these databases using BLASTP (Altschul et al., 1990) with default settings. ORFs with at least a 95% match to an entry in the queried database were defined as valid matches.
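The 95% matching criterion is not fully specified above. The sketch below assumes it means at least 95% identity over at least 95% of the ORF length, and that BLASTP was run with the custom tabular format `-outfmt "6 qseqid sseqid pident length qlen"`. Both this interpretation and the helper name `valid_matches` are assumptions for illustration. Because the alignment length can include gaps, the computed coverage is only an approximation of query coverage.

```python
def valid_matches(blast_tsv, min_identity=95.0, min_coverage=0.95):
    """Return query IDs with a hit passing both identity and query-coverage thresholds.

    Expects tab-separated BLASTP output produced with
    -outfmt "6 qseqid sseqid pident length qlen".
    """
    matched = set()
    for line in blast_tsv.splitlines():
        if not line.strip():
            continue  # skip blank lines
        qseqid, _sseqid, pident, length, qlen = line.split("\t")[:5]
        coverage = int(length) / int(qlen)  # aligned length relative to the ORF length
        if float(pident) >= min_identity and coverage >= min_coverage:
            matched.add(qseqid)
    return matched
```

A query counts as matched if any one of its hits passes both thresholds, so database redundancy does not affect the result.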
2.2.14 Evolutionary conservation of identified ORFs

Evolutionary conservation of each identified ORF was evaluated with the PhastCons score, which represents the probability that each nucleotide belongs to an evolutionarily conserved element. Per-base values for GRCh38/hg38 were downloaded from UCSC. BigWig files derived from three multiple alignments were selected: 1) 100-way, representing conservation between 99 vertebrate genomes and the human genome; 2) 20-way, 19 vertebrate genomes; and 3) 7-way, 6 vertebrate genomes, including 3 primates. Average scores over ORF regions were computed with the bigWigAverageOverBed command-line tool from the UCSC toolkit (Kent et al., 2010).

2.2.15 CRISPR screen design

To identify ORFs suitable for CRISPR screening, entries meeting any of the following criteria were removed: 1) ORFs shorter than 30 nucleotides, 2) ORFs with mean expression below the 20th percentile, 3) ORFs sharing more than 50% of their sequence with a known protein-coding region, and 4) ORFs whose unique part (not overlapping any CDS) was shorter than 30 nucleotides. The remaining ORFs were collapsed by their genomic coordinates, extended by 16 nucleotides in the 5' direction and used to generate a BED file. Because Cas9 cuts 3-4 nucleotides upstream of the PAM sequence, extending the queried region maximises the number of candidate gRNAs targeting the translation start site. The BED file was used as input for CHOPCHOP, a Python tool that searches for all gRNAs within given genomic coordinates (Labun et al., 2019). A customised wrapper script was used to search all regions at once. CHOPCHOP parameters were set as follows: chopchop.py -T 1 -M NGG --maxMismatches 3 -g 20 --scoringMethod ALL -f NN -n N -G hg38 -o temp -t WHOLE -Target [REGION]

The obtained gRNAs were then filtered based on their GC content and the number of predicted off-target locations. All gRNAs with any predicted off-targets with 0 mismatches, or with GC content lower than 40% or higher than 70%, were removed.
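The 5' extension of ORF intervals and the gRNA filters described above can be sketched as follows. The helper names are hypothetical. The sketch assumes BED coordinates (0-based, half-open), GC content computed over the 20-nt protospacer without the PAM, and a per-guide count of predicted perfect-match off-targets taken from the CHOPCHOP output. Minus-strand intervals are not clipped at the chromosome end here.

```python
def extend_5prime(start, end, strand, by=16):
    """Extend a BED interval (0-based, half-open) by `by` nt towards its 5' end."""
    if strand == "+":
        return max(0, start - by), end  # 5' end is the left edge on the plus strand
    return start, end + by  # 5' end is the right edge on the minus strand


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def keep_guide(protospacer, perfect_offtargets):
    """Keep a gRNA with no 0-mismatch off-targets and 40-70% GC content."""
    return perfect_offtargets == 0 and 0.40 <= gc_fraction(protospacer) <= 0.70
```

Making the extension strand-aware matters because the translation start site lies at the left edge of plus-strand ORFs but at the right edge of minus-strand ORFs.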
gRNAs were then reassigned to ORFs, and ORFs with fewer than 5 gRNAs were considered untargetable and filtered out. All sequences were prepended with a G at the 5' end to facilitate transcription from the U6 promoter. gRNAs were scored as implemented in the CHOPCHOP workflow: gRNAs with a G at position 20 (immediately upstream of the PAM) were prioritised, and on-target activity scores were computed as described previously (Moreno-Mateos et al., 2015; Xu et al., 2015; Doench et al., 2014; Dutt et al., 2011). The remaining ORFs were then scored and ranked based on features indicative of biological importance. The following characteristics were used to compute the mean score per ORF:
1. Mean scaled ORF score (Bazzini et al., 2014), reflecting the accumulation of ribosomal footprints in the first reading frame: $\mathrm{ORFscore} = \log_2\left(\sum_{i=1}^{3}\frac{(RRF_i - \overline{RRF})^2}{\overline{RRF}} + 1\right) \times s$, where $RRF_i$ is the proportion of reads in reading frame $i$, $\overline{RRF}$ is their mean, and $s = 1$ if $RRF_1 > RRF_2$ and $RRF_1 > RRF_3$ and $s = -1$ otherwise. The score was scaled to take values from 0 to 1, where 1 indicates that all ribosomal footprints belong to the first reading frame and 0 indicates no first-frame preference.
2. Expression level, measured as the percentile of mean expression across all unmerged samples included in the study.
3. Identification in the reanalysed mass spectrometry studies: 2 points for ORFs identified in both immunopeptidomics and full-proteome studies, 1 for ORFs identified in one, and 0 for ORFs not found by proteomics.
4. Presence in UniProt or either of the two proteogenomic databases (sORFdb or OpenProt): 1 if present, 0 if not.
5. Mean conservation score, computed by averaging the three PhastCons scores (100-way, 20-way and 7-way) described in section 2.2.14, taking values from 0 (no evolutionary conservation) to 1 (full evolutionary conservation).
6. The number of ORF-finding tools that identified an ORF as actively translated, scaled to take values from 0.25 (detected by 1 tool) to 1 (detected by all tools).
7. The number of samples in which an ORF was detected by at least one tool, scaled to take values from 0.0084 (detected in 1 sample) to 1 (detected in all samples).
8. The proportion of differential expression analyses in which an ORF reached the statistical significance threshold.

The library was constructed to target the top-scoring ORFs with the 4-5 top-scoring gRNAs per ORF and was synthesised as a pool of 6000 single-stranded oligos (Twist Bioscience).

2.2.16 Figure preparation

Figures were prepared using the R package ggplot2 (Wickham, 2016) and composed into panels with Adobe Illustrator or Adobe Photoshop.

2.2.17 R and Bioconductor

Statistical analysis was performed with R 4.0.2 and Bioconductor 3.12.

CHAPTER 3 Genome-wide quantification of translation in lymphoid malignancies

3.1 Background

Deregulation of protein synthesis is a hallmark of malignant transformation; however, the underlying molecular mechanisms remain elusive. The transcription factors BCL6 and MYC are essential for the induction and maintenance of the GC reaction (De Silva and Klein, 2015) and, when deregulated, become drivers of lymphomagenesis (Basso and Dalla-Favera, 2015). During the GC reaction, both proteins drive extensive reprogramming of the B-cell gene expression landscape, which has been studied mainly at the level of transcription. The contribution of other levels of regulation, including post-transcriptional regulation, is poorly understood. Sendoel et al. (2017) showed that overexpression of SOX2 in skin epidermis is followed by broad changes in the translation programme. SOX2 is a transcription factor that is highly expressed in fully developed squamous cell carcinoma as well as in cells considered to be the cancer stem cells of this tumour type. I hypothesised that similar reprogramming of protein synthesis may follow overexpression of the transcription factors BCL6 and MYC in GC B-cells.
We took advantage of a primary GC B-cell culture system developed in the Hodson lab (Caeser et al., 2019). Primary GC B-cells were isolated from tonsil tissue discarded after elective tonsillectomy and cultured with follicular dendritic cells (FDCs) expressing CD40L and IL-21, which mimic the supportive role of the GC microenvironment. The co-culture system has been described in detail by Caeser et al. (2019). To test whether BCL6 and MYC, in addition to their transcriptional effects, drive changes at the level of translation, we performed Ribo-Seq and RNA-Seq experiments in primary GC B-cells overexpressing MYC-T2A-BCL2, BCL6-T2A-BCL2 or BCL2 alone (Figure 3.1 A). Co-expression of BCL2 with BCL6 or MYC from the same viral construct via a T2A linker was essential to keep B-cells alive in culture and perform the experiments (Figure 3.1 B). Transduction of either MYC-T2A-BCL2 or BCL6-T2A-BCL2 is associated with malignant transformation of primary GC B-cells, inducing their prolonged expansion and survival (Caeser et al., 2019). Cell culture, transduction and sequencing library preparation were performed by Dr. Jie Gao as described in section 2.2.1.1.

Figure 3.1: Studying translational regulation in primary GC B-cells overexpressing common lymphoma oncogenes. A) Flowchart showing the experimental design for investigating translational changes following BCL6 or MYC overexpression. Primary GC B-cells are cultured on a monolayer of immortalised follicular dendritic cell-like feeder cells. The feeders express CD40L and IL-21, providing essential survival signals for the cultured primary cells. B) Diagram illustrating the design of a viral construct for simultaneous expression of BCL2 with BCL6 or MYC using a T2A linker. Thosea asigna virus 2A (T2A) is a small peptide (19-22 amino acids) that undergoes self-cleavage, resulting in simultaneous expression of two or more proteins in equivalent amounts.

In this chapter:
1. I introduce RiboStream, a bioinformatic pipeline developed to process Ribo-Seq and RNA-Seq data simultaneously;
2. I provide an overview and comparison of available tools for identifying differentially translated genes. Despite the growing popularity of Ribo-Seq in studies of protein translation, the analytical workflow is not well established. I use an external dataset of 54 Ribo-Seq and RNA-Seq samples to evaluate the available tools and choose an approach for further analysis;
3. I apply my optimised strategy to analyse changes in translation as non-malignant GC B-cells undergo transformation driven by 2 different oncogene combinations.

3.1.1 Establishing a bioinformatic pipeline for processing of translatome and transcriptome data

The analysis of large genomic datasets usually involves the sequential application of various command-line tools, software packages and custom scripts. The design of such workflows depends on the specific next-generation sequencing technology and is usually tailored to the available computational resources. To facilitate dataset processing for this and future projects, an in-house computational pipeline was needed. I developed a pipeline, which I named RiboStream, for parallel processing of Ribo-Seq and RNA-Seq datasets; https://github.com/ashakru/RiboStream_bpipe.git. Successive stages of the workflow were linked using Bpipe, a Groovy-based platform for managing bioinformatic jobs (Sadedin et al., 2012). A workflow management system such as Bpipe can help achieve reproducibility and transparency of the analysis. The Bpipe pipeline architecture has numerous advantages over running all jobs manually or through a series of shell scripts. The central idea of a workflow management system is to automate, orchestrate and monitor the execution of successive stages of data processing.
Automatic logging of executed commands and management of outputs improve the readability of, and control over, the workflow, which facilitates troubleshooting and customisation when needed. It is not surprising that workflow management systems have been widely adopted by the bioinformatic community and are now considered part of good research practice. The choice of Bpipe was dictated by the simplicity of its architecture, its compatibility with cluster resource managers, and the availability of comprehensive documentation and user community support. Briefly, RiboStream takes raw FASTQ files with Ribo-Seq or RNA-Seq reads, as received from a sequencing facility, and returns aligned BAM files, various quality check metrics and read count matrices, which are ready for downstream analysis tailored to the biological question of interest. Optionally, Sequence Read Archive (SRA) accession numbers can be provided instead of FASTQ files. In that scenario the pipeline downloads the FASTQ files from SRA, the largest publicly available repository of next-generation sequencing data. Flexibility and control over each stage are provided by a single configuration file, in which all the parameters of the executed tools can be adjusted. Sample-specific parameters, such as the location of the FASTQ files (or SRA accession numbers), the type of experiment (Ribo-Seq or RNA-Seq) or the adapter sequences for trimming, are contained in a sample sheet and parsed by the pipeline when needed. An outline of the pipeline is shown in Figure 3.2.

Figure 3.2: Overview of RiboStream, a Bpipe pipeline for Ribo-Seq data analysis Flowchart showing a basic bioinformatic workflow that I developed for Ribo-Seq and RNA-Seq data processing. A typical run performs automatic preprocessing of raw FASTQ files, alignment to the reference genome and a routine quality check focused on ribosome footprint data. The output, in the form of count matrices, BAM and BigWig files, is ready for downstream analysis.
Bioinformatic tools implemented in the workflow at each stage are indicated. SRA - Sequence Read Archive; QC - quality check.

While most of the stages are the same for both data types, three steps are specific to Ribo-Seq samples: prealignment to non-coding RNA sequences, extended quality control and a modified read counting strategy. The first involves removal of common contaminating sequences derived from ribosomal RNA (rRNA), tRNA or small nuclear RNA (Ingolia et al., 2012). The advantage of this extra purification step is an increase in the proportion of true ribosome-derived mRNA footprints in the final BAM file containing reads aligned to the human genome. This facilitates interpretation of the alignment pattern and improves the estimation of the ribosomal P-site location. An additional benefit is a smaller final output file, which accelerates further processing. Quality control is a critical step of all next-generation sequencing experiments and usually involves evaluation of raw read parameters (e.g. total number of reads, base quality scores, GC content), alignment efficiency and reproducibility of biological or technical replicates. In addition, there are several metrics characterising a good quality Ribo-Seq experiment that should be checked before performing further analyses. Ribo-Seq-specific quality measures include the proportion of uniquely mapped reads mapping to various genomic locations (introns, exons, known CDS or untranslated regions) and to transcript biotypes (e.g. protein coding, lncRNAs, pseudogenes). We expect the majority of ribosomal footprints to map to the CDS regions of known protein-coding genes. This step was performed with the riboWaltz package and custom R scripts using basic functions of the versatile GenomicFeatures and GenomicAlignments packages. Another important quality check is the pattern of the 5' ends of the footprints around annotated start and stop codons.
This should reveal enrichment of ribosomal footprints in the CDS compared with the 5'UTR or 3'UTR, and a clear three-nucleotide periodicity of alignment reflecting the single-frame preference of translating ribosomes (Figure 3.3 A). These patterns are usually evident for reads between 26 and 30 nucleotides long, which most likely correspond to true ribosomal footprints. A related metric, which also examines the characteristic distribution of the footprints, is the metagene profile. RiboStream performs a metagene analysis to examine the distribution of aligned Ribo-Seq reads over all known protein-coding transcripts. The aggregated (metagene) profile visualises general patterns in read coverage over sets of genomic regions of interest, for example promoters, transcripts or CDSs (Figure 3.3 B). A metagene analysis examining the density of Ribo-Seq reads mapping between known start and stop codons is vital to ensure data quality and may reveal biologically important changes in ribosome occupancy between experimental conditions (Gerashchenko and Gladyshev, 2014). The expected pattern for a successful Ribo-Seq experiment is a footprint density over the first 30 to 40 codons that is far greater than in the rest of the CDS. The mechanism behind this pattern has been widely discussed in the literature. One proposed biological explanation is that it reflects the speed of translation initiation, which is slower than translation elongation. A specific codon context around the start codon may explain the accumulation of ribosomal footprints, mirroring the local translation rate (Ingolia et al., 2009; Tuller et al., 2010). An alternative hypothesis involves treatment with translation inhibitors. Cycloheximide, which was used in this and numerous other studies, including the original Ribo-Seq paper (Ingolia et al., 2009), inhibits translation elongation by interrupting ribosome translocation.
Until the inhibitor has saturated all cells, some ribosomes continue to initiate, which may explain the observed accumulation of footprints in the first codons (Gerashchenko and Gladyshev, 2014).

Figure 3.3: Illustration of critical steps in data processing specific to Ribo-Seq datasets A) Diagram showing the mechanism underlying the periodic pattern of ribosome footprint alignment, which indicates involvement in active translation. B) Flowchart of the metagene analysis implemented in RiboStream. The workflow consists of three steps in which read coverage profiles are scaled to the same length and then aggregated to create a metagene profile. The final plot represents the average mapping density over all known CDSs. The deepTools commands (bamCoverage, computeMatrix or plotProfile) used at each stage are indicated. C) Diagram showing the position of the estimated ribosomal P-site in a 28-nucleotide-long Ribo-Seq read. D) Diagram showing the strategy to assign mapped reads to different genomic regions: 5'UTR, CDS or 3'UTR. CDS - Coding Sequence Region, 5'UTR - 5' Untranslated Region, 3'UTR - 3' Untranslated Region

An initiation peak also occurs in untreated Ribo-Seq samples (Ingolia et al., 2009; Weinberg et al., 2016), but it is much smaller than the peak in samples treated with a translation inhibitor. Small peaks can also be observed at the stop codon, but their mechanism is unknown. Regardless of the mechanism, the accumulation of Ribo-Seq reads around the start codon has real consequences for data interpretation. Firstly, it is a useful measure of the purity of the Ribo-Seq signal. Secondly, it allows translation initiation sites to be identified directly from the data, which is exploited by various ORF identification algorithms (Calviello and Ohler, 2017). Lastly, it requires a read counting strategy different from the approach established for RNA-Seq data.
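The scale-then-aggregate logic of the metagene analysis in Figure 3.3 B can be illustrated with a minimal pure-Python sketch. It is conceptually similar to the scale-regions mode of deepTools computeMatrix but is not its implementation: it assumes each CDS coverage profile is already available as a list of per-nucleotide read counts, rescales every profile to a fixed number of bins by averaging, and then averages the bins across CDSs. The function names, toy profiles and bin count are hypothetical.

```python
from typing import List, Sequence


def scale_profile(coverage: Sequence[float], n_bins: int) -> List[float]:
    """Rescale one per-nucleotide coverage profile to n_bins by averaging.

    Bin i covers positions [i*L//n_bins, (i+1)*L//n_bins), so CDSs of different
    lengths are placed on the same relative (start-to-stop) scale.
    """
    length = len(coverage)
    if length < n_bins:
        raise ValueError("profile is shorter than the number of bins")
    scaled = []
    for i in range(n_bins):
        window = coverage[i * length // n_bins:(i + 1) * length // n_bins]
        scaled.append(sum(window) / len(window))
    return scaled


def metagene(profiles: Sequence[Sequence[float]], n_bins: int = 10) -> List[float]:
    """Average the scaled profiles bin by bin to obtain a metagene profile."""
    scaled = [scale_profile(p, n_bins) for p in profiles]
    return [sum(column) / len(scaled) for column in zip(*scaled)]


# Two toy CDSs of different lengths, both with a peak over their first 20% of positions.
cds_a = [5.0] * 4 + [1.0] * 16   # 20 nt
cds_b = [9.0] * 8 + [1.0] * 32   # 40 nt
print(metagene([cds_a, cds_b], n_bins=10))
# [7.0, 7.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Because each CDS is first reduced to the same number of bins, a start-codon peak in a short and a long transcript lands in the same metagene bins and is reinforced rather than smeared.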
Just as differential expression analysis is a core application of RNA-Seq, differential translation analysis is one of the main aims of a typical Ribo-Seq experiment. Counting how many fragments have aligned to each gene is an essential step of such an analysis. Ribo-Seq read counting in the Bpipe pipeline was implemented following the original protocol of Ingolia et al. (2009), with minor modifications. Ribosomal footprints were assigned to genomic regions (CDS, intron, 5'UTR or 3'UTR) based on the position of the estimated P-site (Figure 3.3 C). I built the models of the genomic regions with the GenomicFeatures R package, using the GENCODE v29 comprehensive gene annotation set as reference. The first 15 and last 15 nucleotides of each CDS region were trimmed, so that the assigned footprints reflect the baseline translation intensity rather than the rate of initiation or termination (Figure 3.3 D). Trimmed CDS models representing different transcript isoforms were merged by gene. Only fragments with evident triplet nucleotide periodicity relative to the start and stop codons, typically between 27 and 30 nucleotides long, were selected. The pipeline determines the position of the ribosomal P-site with the psite script from the Plastid toolkit, which examines the offset between the 5' end of fragments spanning the translation start site and the annotated start codon; for reads between 27 and 30 nucleotides this offset is usually 12 nucleotides (Figure 3.3 C). Finally, per-sample and per-gene count matrices are built using the GenomicRanges R package, allowing a read to be assigned to more than one overlapping feature (Figure 3.3 D). Corresponding RNA-Seq samples are counted using the same gene models. Processing a small Ribo-Seq experiment (9-12 samples) with RiboStream installed on a High Performance Computing cluster (18 nodes, 176 CPUs) takes about 3 hours.
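The counting rule described above, in which each footprint is reduced to its estimated P-site and counted only if that P-site falls inside the CDS after trimming 15 nucleotides from each end, can be sketched as follows. This is a simplified illustration, not the GenomicRanges-based implementation in RiboStream: it assumes 0-based, plus-strand transcript coordinates, a single fixed 12-nucleotide offset for all retained lengths (Plastid estimates the offset from the data), and ignores minus-strand genes and overlapping features. All function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

P_SITE_OFFSET = 12             # nt from footprint 5' end to P-site (assumed fixed)
CDS_TRIM = 15                  # nt excluded at each end of the CDS
KEPT_LENGTHS = range(27, 31)   # footprint lengths retained: 27-30 nt


def p_site(five_prime: int, length: int) -> Optional[int]:
    """Return the estimated P-site position, or None if the read length is not retained."""
    if length not in KEPT_LENGTHS:
        return None
    return five_prime + P_SITE_OFFSET


def assign_region(psite: int, cds_start: int, cds_end: int) -> str:
    """Assign a P-site to 5'UTR, CDS or 3'UTR (cds_start and cds_end are inclusive)."""
    if psite < cds_start:
        return "5UTR"
    if psite > cds_end:
        return "3UTR"
    return "CDS"


def count_trimmed_cds(reads: Iterable[Tuple[int, int]], cds_start: int, cds_end: int) -> int:
    """Count footprints whose P-site lies in the CDS after trimming CDS_TRIM nt from each end."""
    low, high = cds_start + CDS_TRIM, cds_end - CDS_TRIM
    total = 0
    for five_prime, length in reads:
        site = p_site(five_prime, length)
        if site is not None and low <= site <= high:
            total += 1
    return total


# Toy transcript: CDS spans positions 100-399; reads are (5' end, length) pairs.
reads = [(103, 28), (102, 28), (200, 26), (372, 30), (373, 29)]
print(count_trimmed_cds(reads, 100, 399))  # 2
```

Only the first and fourth reads are counted: the second places its P-site in the trimmed start region, the third is outside the retained length window, and the fifth falls in the trimmed stop region.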
3.1.2 Quality of translatome profiling in primary GC B-cells

Overall, the average number of uniquely mapped reads in Ribo-Seq samples was almost 80 million, and about 70% of reads corresponded to rRNA sequences (Figure 3.4 A). Although the amount of rRNA-mapping reads depends on the organism, footprint isolation method and translational status, such reads usually account for the majority of sequenced reads (McGlincy and Ingolia, 2017). Given that a ribosome-footprint complex consists of several kilobases of rRNA but only about 28 bases of ribosome-protected mRNA, this is not unexpected (Ingolia et al., 2012). Higher rRNA content may also be observed in conditions associated with a global change in translation rate, when the ratio of actively translating ribosomes to other RNAs is affected (Ingolia et al., 2012). In contrast to RNA-Seq experiments, where rRNA depletion is a routine library strategy, in Ribo-Seq only a few specific rRNA fragments account for a large proportion of the contamination. This may be related to RNase digestion of rRNA at specific and reproducible positions (Ingolia et al., 2012). Our study adopted the rRNA removal protocol introduced in the original Ribo-Seq workflow, in which contaminating sequences are hybridised to biotinylated oligonucleotides and depleted with streptavidin beads (Ingolia et al., 2012). Of the non-rRNA reads, the majority were about 29 nucleotides long, corresponding to the length of ribosome-protected mRNA fragments. The average GC content of uniquely mapped reads was about 50%, and about 60% of them mapped to known coding sequence regions (Figure 3.4 B-C). Analysis of the mean distribution of read 5' ends around the start and stop codons revealed that the 5' ends start aligning about 12 nucleotides upstream of the annotated start codon and show strong 3-nucleotide periodicity (Figure 3.5 A).
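Frame usage by read length, as summarised in Figure 3.5 A, can be computed by taking the position of each read's 5' end relative to the annotated start codon modulo 3 and tabulating the result per read length. A minimal sketch, assuming the input is a list of (offset from the first nucleotide of the start codon, read length) pairs; the function name and toy reads are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def frame_usage(reads: Iterable[Tuple[int, int]]) -> Dict[int, List[float]]:
    """Fraction of reads in frames 0, 1 and 2 for each read length.

    Each read is (offset of its 5' end from the first nt of the annotated start
    codon, read length). Python's % with a positive divisor always returns 0-2,
    so reads starting upstream of the start codon (negative offsets) are handled.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for offset, length in reads:
        counts[length][offset % 3] += 1
    usage: Dict[int, List[float]] = {}
    for length, per_frame in sorted(counts.items()):
        total = sum(per_frame.values())
        usage[length] = [per_frame[frame] / total for frame in range(3)]
    return usage


# 28-nt reads starting 12, 9 or 6 nt upstream of the start codon share frame 0;
# one 28-nt read is out of frame.
reads = [(-12, 28), (-9, 28), (-6, 28), (-11, 28), (0, 25), (1, 25)]
print(frame_usage(reads))  # {25: [0.5, 0.5, 0.0], 28: [0.75, 0.25, 0.0]}
```

A read length with a strong single-frame preference (here 28 nt) is the kind of length retained for P-site estimation, while lengths with flat frame usage are more likely to be non-footprint contaminants.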
The strongest 3-nucleotide pattern of alignment was seen for reads between 26 and 29 nucleotides long, in line with previous work (Ingolia et al., 2009, 2012). The 12-nucleotide offset between the 5' end of a read and the translation initiation site made it possible to estimate the position of the ribosomal P-site, as described in section 3.1.1. The distribution of estimated P-sites around the start codon showed a clear frame preference and the expected accumulation in the first 20 bases of the CDS (Figure 3.5 B). The same analysis applied to samples treated with a different translation inhibitor, harringtonine, which immobilises ribosomes shortly after translation initiation, revealed a distinct pattern. While samples treated with cycloheximide showed evident 3-nucleotide periodicity over the entire length of the CDS, estimated P-sites from the harringtonine group were found almost exclusively in the first 25 nucleotides (Figure 3.5 B).

Figure 3.4: Basic mapping statistics for Ribo-Seq reads for the BCL6/MYC experiment in human GC B-cells A) The fate of Ribo-Seq reads after the two rounds of alignment, shown for each replicate. The left panel shows the percentage of reads in each of five groups (aligned to rRNA, removed during trimming, mapped to multiple locations, unmapped or aligned uniquely). The right panel depicts the total number of uniquely mapped reads eligible for further analysis. B) Histogram of read lengths of uniquely mapped Ribo-Seq reads stratified by experimental condition. C) Distribution of mapped reads from Ribo-Seq and RNA-Seq experiments to gene features, showing the expected restriction of Ribo-Seq reads to the CDS with only a small portion mapping to UTRs.

On a global scale, a metagene plot of cycloheximide-treated samples showed a characteristic peak at the expected translation initiation site (TIS) and an abrupt drop-off of the signal after the translation termination site (TTS).
The pattern of ribosome occupancy was similar for all experimental conditions (Figure 3.5 C). When P-sites were stratified by transcript region (5'UTR, CDS or 3'UTR) and read length, they again showed a strong frame restriction in known CDS regions (Figure 3.5 D). Even in a good quality experiment, genes with low expression typically have higher variance, which may decrease the sensitivity of differential expression analysis. With only a few read counts assigned, it is difficult to distinguish biological differences from technical (sampling) noise. Therefore, it is common practice to remove lowly expressed genes before attempting differential expression analysis. I set the filtering threshold following the strategy introduced by Ingolia et al. (2009), removing all genes with fewer than 128 RNA-Seq reads in more than 25% of samples. I also filtered out histone genes, which are not polyadenylated and are therefore underrepresented in our RNA-Seq data generated with a poly(A) enrichment library strategy (Figure 3.6 A). Next, I evaluated the reproducibility of gene expression measurements obtained from Ribo-Seq and RNA-Seq. The distribution of between-replicate measurement error was relatively narrow: on average, normalised expression values from the same experimental condition differed by less than 1.22-fold for 90% of genes (Figure 3.6 B). Translation efficiency (TE), the ratio of normalised ribosomal footprint abundance to mRNA level, showed an almost 100-fold dynamic range (Figure 3.6 C), similar to that reported in the original Ribo-Seq study (Ingolia et al., 2009). Lastly, the values for both techniques were highly reproducible between biological and technical replicates: the average Pearson's product-moment correlation coefficient was 0.965 for both Ribo-Seq and RNA-Seq (Figure 3.6 D). Interestingly, when Ribo-Seq expression levels are juxtaposed with their RNA-Seq counterparts, two observations can be made.
Firstly, ribosomal footprint abundance has a smaller dynamic range than mRNA abundance and, secondly, the relationship between Ribo-Seq and RNA-Seq signal appears to be intensity dependent (Figure 3.6 D). This means that the high correlation between the number of ribosomal footprints and mRNA abundance can be spurious, particularly for lowly expressed transcripts, and the translation intensity of an mRNA needs to be interpreted with care. I conclude that all the quality measures and the reproducibility of gene expression measurements meet the expectations for Ribo-Seq data. Compared with previous Ribo-Seq studies, this experiment has a larger number of replicates (4 versus the usual 2) and shows overall high reproducibility and good library quality. This confirms that ribosomal footprints obtained from primary GC B-cells 1) correspond to ribosomes involved in active translation and 2) can be used for further analyses.

Figure 3.5: Reading frame restriction of Ribo-Seq reads for the BCL6/MYC experiment in human GC B-cells A) Heatmap of Ribo-Seq read frame usage by read length, showing read frame restriction in the 28-31 nucleotide ribosome-protected fragments. B) Histogram of mean fold changes in TMM-normalised ribosome footprint density between replicates of the same experimental condition. C) Metagene plot showing the distribution of mapped Ribo-Seq reads over regions of the transcript. D) Heatmap of Ribo-Seq read frame usage by gene feature, showing the characteristic read frame bias within the CDS but not the UTRs.

Figure 3.6: Low counts filtering and reproducibility of Ribo-Seq and RNA-Seq samples for the BCL6/MYC experiment in human GC B-cells A) Scatter plot showing trimmed mean of M-values (TMM) normalised ribosome footprint density against mRNA abundance. Genes classified as low counts and histones were excluded from further analysis.
B) Histogram of mean log2 fold changes in ribosome footprint abundance between samples from the same experimental condition. This is similar to the reproducibility reported by Ingolia et al. (2009). C) Histogram of mean Translation Efficiency (TE) values obtained from paired Ribo-Seq and RNA-Seq samples from the same experimental condition. A similar dynamic range was shown by Ingolia et al. (2009). D) Scatter plots showing consistency between four replicates of Ribo-Seq and RNA-Seq. Pearson correlation coefficients are shown for each comparison.

3.1.3 Benchmarking statistical approaches for differential translation analysis

The aim of differential translation analysis is to identify genes regulated predominantly at the level of translation. Although many computational tools have been developed for the statistical analysis of differentially translated genes, very few have been used beyond their initial publication. The popularity of the Ribo-Seq technique, however, is growing. In 2021 the original paper (Ingolia et al., 2009) reached 3050 citations and, according to PubMed, almost 1000 papers contain Ribo-Seq or ribosome profiling in the title or abstract. The opposite trend has been observed for the computational methods. Only two of the seven most popular tools for differential translation analysis reviewed in this section exceed 100 citations. For comparison, the numbers of citations for DESeq2 and edgeR, the two most popular packages for differential expression analysis of RNA-Seq experiments, exceed 27 000 and 20 000, respectively (Love et al., 2014; Robinson et al., 2010). To make an informed decision on how to identify translationally regulated genes associated with BCL6 or MYC overexpression, I reviewed previous approaches to differential translation analysis and evaluated their performance using an external dataset of 54 Ribo-Seq/RNA-Seq pairs from Yoruba lymphoblastoid cell lines (Battle et al., 2015).
The choice of this dataset was dictated by the large number of replicates available, which allows me to benchmark tools using various experimental designs. I will return to the BCL6/MYC data at the end of this chapter.

Translation efficiency

Ribo-Seq relies on the assumption that the density of ribosomal footprints (Ribo-Seq reads) mapping to a gene is a proxy for its translation intensity (Ingolia et al., 2009). Because more abundant mRNAs produce more ribosomal footprints, the Ribo-Seq signal depends on transcript abundance. In other words, the number of ribosomal footprints mapping to a gene is positively correlated with the number of RNA-Seq reads assigned to the same gene. Therefore, each Ribo-Seq sample is always paired with a corresponding RNA-Seq sample to enable calculation of an mRNA-specific translation rate. Translation Efficiency (TE) is a measure first introduced by Ingolia et al. (2009) to compute the translation rate relative to mRNA abundance. By definition, TE is the ratio of ribosomal footprint abundance to mRNA abundance, typically presented as a log ratio. Thus TE for a gene $g$ is:

$$TE_g = \log\left(\frac{RF_g}{mRNA_g}\right),$$

where $RF_g$ and $mRNA_g$ stand for ribosomal footprint and mRNA abundance, respectively. Different normalisation methods can be used to calculate these values. A simple approach is to normalise the number of ribosomal footprints per gene by library size and CDS length with Reads Per Kilobase per Million mapped reads (RPKM), or by library size alone, e.g. with Counts Per Million (CPM). Normalisation methods developed initially for differential expression analysis of RNA-Seq data, such as Trimmed Mean of M-values (TMM) (Robinson et al., 2010) or Relative Log Expression (RLE) (Love et al., 2014), are also applicable. Fold change in TE has been widely adopted to identify genes with a translational load disproportionate to transcript abundance. However, as pointed out by Larsson et al.
(2010), TE does not fully control for the effect of mRNA level. The bias of the TE ratio can be explained mathematically by spurious correlation, described by Pearson in 1897 (Pearson, 1897). It refers to a situation in which a ratio (TE in this case) correlates with its denominator (mRNA abundance) even when mRNA and ribosomal footprint abundance are uncorrelated, as illustrated by the following equation:

$$r_{(Y-Z),Z} = \frac{-s_Z}{\sqrt{s_Y^2 + s_Z^2}},$$

where $Y$ is a vector of log ribosomal footprint abundances, $Z$ is a vector of paired log mRNA abundances (so that $Y - Z$ is the log TE), $r$ is the Pearson correlation coefficient and $s$ is the sample standard deviation. Here, the correlation between TE and log mRNA abundance is determined solely by the standard deviations of the footprint and mRNA abundance vectors across experimental replicates. Larsson et al. (2010) conclude that this property gives rise to false positives and false negatives when TE is the only metric used to identify translationally regulated genes, especially if large shifts in mRNA abundance are expected.

Brief overview of current methods for differential translation analysis

The lack of an established analytic workflow, such as exists for RNA-Seq data analysis, suggests that differential translation analysis is not straightforward. Despite their different biological origins, Ribo-Seq and RNA-Seq data share common features: both consist of sequenced mRNA fragments that can be summarised as matrices of gene counts per sample. Most Ribo-Seq-specific tools take advantage of these similarities by tailoring transcriptomic workflows to differential translation analysis. A comprehensive review and benchmarking of the developed strategies has been missing, which complicates informed strategy selection. To bridge this gap and pick an optimal strategy to analyse the data for this project, I reviewed five tools:
1. Xtail (Xiao et al., 2016),
2. Riborex (Li et al., 2017),
3. anota (Larsson et al., 2011),
4. anota2 (Oertlin et al., 2019),
5. deltaTE (Chothani et al., 2019),
in terms of their statistical approaches, performance, and the impact of experimental design on their accuracy and ability to identify differentially translated genes.

Because the majority of tools either run DESeq2 in the background or use similar strategies to analyse count data, I will first outline the basics of DESeq2 analysis. In DESeq2, raw counts for each gene are modelled with a negative binomial (NB) distribution using a generalized linear model (GLM). Gene means are normalised to account for differences in the total number of reads between samples, and the NB parameters (mean and dispersion) are estimated from the data. With a simple experimental design (for example, control and treatment), the GLM will have one coefficient corresponding to the log2 fold change between experimental conditions. The significance of this effect size is typically tested with the Wald test, as detailed by Love et al. (2014). A straightforward implementation of the DESeq2 method for differential translation analysis is deltaTE, which takes a combined matrix of Ribo-Seq and RNA-Seq counts and models the condition-specific difference between the two. deltaTE models this effect by introducing an interaction term into the GLM formula and tests its statistical significance with the Wald test. A potential problem with this approach is that using both data types to estimate the NB parameters may lead to biased results. A very similar strategy (combined analysis with an interaction term) is employed by Riborex. However, in addition to the DESeq2-based workflow, two more methods are available: edgeR-based and voom-based. All three process Ribo-Seq and RNA-Seq reads together. It is important to mention that edgeR (Robinson et al., 2010) and DESeq2 are very similar in principle: both use GLM methods, model count data with the NB distribution and assume that most genes are not differentially expressed.
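The interaction-term design used by deltaTE and Riborex can be made concrete by writing out the design matrix for a combined Ribo-Seq and RNA-Seq count table. The sketch below only builds the matrix in plain Python for a formula of the form ~ assay + condition + assay:condition with treatment coding; it does not fit the negative binomial GLM, which deltaTE delegates to DESeq2. The choice of RNA-Seq and control as reference levels, and all names used here, are assumptions of this illustration. The second function shows, for a saturated model on log2 group means, why the interaction coefficient equals the change in log2 TE between conditions.

```python
from typing import List, Sequence, Tuple

COLUMNS = ["intercept", "assay_ribo", "condition_treatment", "ribo_x_treatment"]


def design_row(assay: str, condition: str) -> List[int]:
    """Treatment-coded row for ~ assay + condition + assay:condition.

    Reference levels (coded 0) are assumed to be assay "rna" and condition "control".
    """
    ribo = int(assay == "ribo")
    treatment = int(condition == "treatment")
    return [1, ribo, treatment, ribo * treatment]


def design_matrix(samples: Sequence[Tuple[str, str]]) -> List[List[int]]:
    """One design row per (assay, condition) sample."""
    return [design_row(assay, condition) for assay, condition in samples]


def coefficients_from_group_means(rna_ctrl: float, rna_treat: float,
                                  ribo_ctrl: float, ribo_treat: float
                                  ) -> Tuple[float, float, float, float]:
    """Coefficients of the saturated model given log2 group means.

    The interaction equals the Ribo-Seq log2 fold change minus the RNA-Seq
    log2 fold change, i.e. the change in log2 TE between conditions.
    """
    intercept = rna_ctrl
    assay_ribo = ribo_ctrl - rna_ctrl
    condition = rna_treat - rna_ctrl
    interaction = (ribo_treat - ribo_ctrl) - (rna_treat - rna_ctrl)
    return intercept, assay_ribo, condition, interaction


samples = [("rna", "control"), ("rna", "treatment"), ("ribo", "control"), ("ribo", "treatment")]
for sample, row in zip(samples, design_matrix(samples)):
    print(sample, dict(zip(COLUMNS, row)))
```

Further columns, such as a sequencing batch indicator, can be appended to each row in the same way, which is what makes the combined-GLM approach flexible for complex experimental designs.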
Differences can be mainly attributed to the handling of outliers and lowly expressed genes, and to the normalisation approach, which may affect the estimation of the NB parameters. Voom is based on limma, which was originally developed to analyse microarray data (Law et al., 2014). Voom estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline (Law et al., 2014). Xtail also uses the NB distribution and estimates its parameters by running DESeq2 in the background separately for RNA-Seq and Ribo-Seq counts. It then builds joint probability matrices of two distributions: the first for the difference between experimental conditions (separately for Ribo-Seq and RNA-Seq), and the second reflecting the overall difference in gene expression measured by Ribo-Seq and RNA-Seq (separately for each experimental condition). Finally, Xtail compares either the log2 fold changes of ribosomal footprints with the log2 fold changes of mRNA abundance, or the difference in the disproportion between translational and mRNA responses in the two conditions. The final p-value is selected depending on which of the two approaches returns the more conservative result. The last family of tools, anota/anota2, employs a different approach. While the majority of methods focus on a simple difference in TE between two conditions, anota/anota2 rely on analysis of partial variance (APV) and linear regression to control for cytosolic mRNA levels, evaluating changes that are independent of transcript abundance. To sum up, all tools designed for differential translation analysis (except anota/anota2) assume that read counts follow the NB distribution, model expression with a GLM and test for a difference in TE. However, only two (Riborex and deltaTE) allow for experimental designs more complex than two conditions, e.g. including time series, sequencing batches or different biological models.
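The spurious correlation underlying the argument of Larsson et al. (2010), which motivates the regression-based approach of anota/anota2 instead of a TE ratio, can be checked numerically. The sketch below draws independent log footprint (Y) and log mRNA (Z) values, computes log TE as Y - Z, and compares the empirical correlation of TE with Z against the expected value -s_Z / sqrt(s_Y^2 + s_Z^2). The standard deviations, sample size and seed are arbitrary illustrative choices.

```python
import math
import random
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(s_y: float = 1.0, s_z: float = 2.0, n: int = 20000,
             seed: int = 1) -> Tuple[float, float, float]:
    """Return (corr(Y, Z), corr(Y - Z, Z), expected corr(Y - Z, Z))."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, s_y) for _ in range(n)]   # log footprint abundance
    z = [rng.gauss(0.0, s_z) for _ in range(n)]   # log mRNA abundance, independent of y
    te = [a - b for a, b in zip(y, z)]            # log TE = log RF - log mRNA
    expected = -s_z / math.sqrt(s_y ** 2 + s_z ** 2)
    return pearson(y, z), pearson(te, z), expected


r_yz, r_te_z, expected = simulate()
print(f"corr(Y, Z) = {r_yz:.3f}, corr(TE, Z) = {r_te_z:.3f}, expected = {expected:.3f}")
```

Although footprints and mRNA are generated independently, log TE is strongly negatively correlated with log mRNA abundance, purely because mRNA abundance appears in the denominator of the ratio.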
Differences between the strategies also relate to the estimation of NB parameters, the null hypothesis and the statistical approach used to identify differentially translated genes. An alternative approach to identifying differentially translated genes, used successfully by other groups (Sendoel et al., 2017; Hsieh et al., 2012), is to combine a standard differential expression analysis performed separately for translatome and transcriptome data with a heuristic system of (arbitrary) thresholds to extract genes with biologically relevant effect sizes. An example set of rules for calling a gene differentially translated is as follows:
1. a statistically significant change in ribosomal footprint abundance (evaluated with the standard DESeq2 workflow), FDR < 0.1,
2. an absolute mean log2 fold change in mRNA abundance < 0.3,
3. an absolute mean log2 fold change in TE > 0.3.
This heuristic approach can be naturally extended to classify genes into distinct regulatory subtypes based on the direction and coordination of translation and mRNA level changes. Genes with a significant change in mRNA level and a concordant direction of translational change are classified as homodirectional (up or down), while genes with opposite directions of change (e.g. mRNA downregulated with ribosomal footprint abundance upregulated) are considered 'buffered' (see section 2.2.3). I refer to this strategy as DESeq2T and, for the purpose of this benchmarking analysis, focus only on the differentially translated genes.

Comparison of strategies to perform differential translation analysis

To compare the ability of existing tools to accurately identify differentially translated genes, I generated a series of Ribo-Seq analysis sets with varying numbers of replicates and differentially expressed genes. For that purpose I used data from a real translatome study of 54 Yoruba lymphoblastoid cell lines (Battle et al., 2015).
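The DESeq2T rules described earlier can be sketched as a simple classifier applied to the per-gene outputs of two separate DESeq2 runs (Ribo-Seq and RNA-Seq). This is a hedged illustration rather than the exact code used here: it approximates the TE log2 fold change as the Ribo-Seq minus the RNA-Seq log2 fold change, and the use of the same FDR cut-off for calling a significant mRNA change, as well as the precise conditions for the homodirectional and buffered classes, are assumptions (the full definitions are in section 2.2.3). The function name and class labels are hypothetical.

```python
FDR_MAX = 0.1       # rule 1: significance of the ribosomal footprint change
RNA_LFC_MAX = 0.3   # rule 2: maximal absolute mRNA log2 fold change
TE_LFC_MIN = 0.3    # rule 3: minimal absolute TE log2 fold change


def classify(ribo_lfc: float, ribo_fdr: float, rna_lfc: float, rna_fdr: float) -> str:
    """Assign a gene to a regulatory class from two DESeq2 results (sketch)."""
    te_lfc = ribo_lfc - rna_lfc          # assumed approximation of the TE log2 fold change
    ribo_sig = ribo_fdr < FDR_MAX
    # Rules 1-3: differential translation without a substantial mRNA change.
    if ribo_sig and abs(rna_lfc) < RNA_LFC_MAX and abs(te_lfc) > TE_LFC_MIN:
        return "translation_up" if te_lfc > 0 else "translation_down"
    if rna_fdr < FDR_MAX:                # assumed: same FDR cut-off for mRNA
        if ribo_sig and ribo_lfc * rna_lfc > 0:
            return "homodirectional_up" if rna_lfc > 0 else "homodirectional_down"
        if ribo_lfc * rna_lfc < 0:
            return "buffered"
    return "unclassified"


print(classify(1.0, 0.01, 0.1, 0.5))    # translation_up
print(classify(0.4, 0.5, -1.0, 0.01))   # buffered
```

Keeping the rules as explicit thresholds makes the definition of a differentially translated gene transparent, at the cost of the arbitrariness acknowledged above.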
Previous performance analyses, associated with the initial publication of the differential translation tools, predominantly used artificial data. Here, by using true experimental data, we come closer to mimicking a real-life user experience. I chose the Battle et al. (2015) data because of the large number of paired Ribo-Seq and RNA-Seq samples of the same cell type. I downloaded raw FASTQ files from SRA and processed them with my RiboStream pipeline to obtain CDS count matrices. First, I evaluated the performance of each tool under a NULL model. From the set of all 54 samples, I generated a series of analysis sets with varying numbers of replicates. As they all come from repeated sequencing of the same cell type, no genes are expected to be identified as differentially translated. This allowed me to estimate the frequency of type I errors (false positives). Half of the randomly selected samples in each analysis set were labelled as 'treatment', the remaining half as 'control'. No separation between 'treatment' and 'control' was observed in the PCA plot (Figure 3.7 A). The tools were applied to datasets with 2-20 replicates, with 50 runs per tool per replicate group.

Figure 3.7: Comparison of available tools for differential translation analysis A) Representative PCA plot showing a NULL model dataset with 7 replicates per condition. Treatment and control samples were randomly assigned from a set of 54 samples of the same cell type. B) Heatmap showing the median number of differentially translated genes per tool per replicate group in the NULL model test (assuming no difference in expression between the two conditions). For each replicate group, 50 count matrices were generated and each tool was applied.

Overall, the median number of false positive findings, at FDR < 0.05 with an absolute log2 fold change in TE larger than 0.3, was highest for Xtail (122 genes), which is in line with a previous study by Oertlin et al.
(2019) and suggests that Xtail may be inferior to other tools in terms of false positive findings (Figure 3.7 B). For the other tools, the median number of false positive findings was higher in comparisons with a low number of replicates. When the number of replicates was higher than 3, the number of false positives in the NULL model was negligible (with the exception of Xtail). Next, I applied the tools to a differential translation model with simulated fold changes. This time, randomly selected genes from the 'treatment' group were assigned artificial fold changes sampled from a normal distribution. The fold changes were synchronised between RNA-Seq and Ribo-Seq samples, so that there was a 0.4 Pearson correlation between fold changes observed in RNA-Seq and Ribo-Seq, which resembles the natural relationship between mRNA and ribosomal footprint abundance (Figure 3.8 A). The count matrices with designed fold changes were generated using seqgendiff, an R package for adding a known amount of signal to a real read count matrix. A clear separation between the treatment and control groups was observed in the PCA plot (Figure 3.8 B). As in the NULL model analysis, the tools were applied to datasets with 2-20 replicates, with 50 runs per tool per replicate group (350 comparisons in total). For each tool and each run I defined a ground truth, which was the result of the differential translation analysis performed using 20 replicates. The first striking difference between the tools was the total number of differentially translated genes (DTGs) identified with 20 replicates (Figure 3.8 C). While the TE-based strategies (deltaTE, Riborex and Xtail) identified more than 5000 DTGs, DESeq2T returned almost five times fewer genes. For anota and anota2 the mean number of DTGs was 1950 and 3600, respectively. This may reflect different definitions of differential translation between the tools.
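The idea of synchronising simulated fold changes so that RNA-Seq and Ribo-Seq changes are correlated at about 0.4 can be illustrated with a short sketch. It uses the standard construction b = rho * a + sqrt(1 - rho^2) * e with independent standard normal a and e, which gives corr(a, b) = rho. It does not reproduce seqgendiff, which was used to add the signal to real count matrices; the standard deviation, number of genes and seed are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def correlated_fold_changes(n_genes: int, rho: float = 0.4, sd: float = 1.0,
                            seed: int = 7) -> Tuple[List[float], List[float]]:
    """Draw paired (RNA-Seq, Ribo-Seq) log2 fold changes with correlation rho.

    b = rho * a + sqrt(1 - rho^2) * e, with a and e independent N(0, 1), is also
    N(0, 1) and has corr(a, b) = rho; both values are then scaled by sd.
    """
    rng = random.Random(seed)
    rna, ribo = [], []
    for _ in range(n_genes):
        a = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        rna.append(sd * a)
        ribo.append(sd * (rho * a + math.sqrt(1.0 - rho ** 2) * e))
    return rna, ribo


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rna_lfc, ribo_lfc = correlated_fold_changes(20000)
print(round(pearson(rna_lfc, ribo_lfc), 2))  # close to 0.4
```

A moderate rather than perfect correlation leaves room for genes whose footprint change departs from their mRNA change, which are exactly the differentially translated genes the tools are asked to recover.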
For TE-based strategies, a statistically significant change in TE (which may depend on a change in ribosome footprints, mRNA abundance or both) is sufficient to classify a gene as differentially translated. In contrast, for anota/anota2 and DESeq2T, the change in ribosome footprint abundance must be significant and either independent of mRNA abundance (anota/anota2) or accompanied by a negligible change at the mRNA level (DESeq2T).

Next, I compared the robustness of DTG identification between the tools. For each tool and replicate group, I computed the following metrics: false positive rate (FPR), false negative rate (FNR), true positive rate (TPR), accuracy, precision and the F1 score (Figure 3.8 D). As in the NULL model test, the highest rate of false positives was seen for Xtail.

Figure 3.8: Comparison of available tools for differential translation analysis. A) Representative scatter plot of the log2 fold change distribution for a simulation of Ribo-Seq and RNA-Seq differential expression analysis. B) Representative PCA plot showing a model differential translation dataset used for benchmarking. C) Boxplot showing the total number of differentially translated genes (DTGs) identified by each tool when 20 replicates were used for comparisons. Each tool was applied to 20 analysis sets. D) Performance of 9 algorithms for identifying DTGs by the number of replicates used for the comparison. For each replicate group, mean values of the performance metrics over 50 runs were computed.

Overall, TE-based strategies had higher TPR, lower FNR and higher precision (positive predictive value), but this was accompanied by higher FPR. In contrast, accuracy (the proportion of correct calls among all genes tested) and the F1 score (the harmonic mean of precision and TPR) were highest for DESeq2T, while TE-based strategies performed worse. The number of replicates affected the performance of all methods, but to different extents.
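The performance metrics listed above follow from a confusion matrix of called DTGs against the ground truth, where the ground truth is the 20-replicate result for the same tool and run. The Python sketch below is a minimal, hypothetical illustration. The function name and the set-based input format are assumptions, and the universe is taken to be all genes tested in a given run.

```python
def benchmark_metrics(predicted, truth, universe):
    """Confusion-matrix metrics for called DTGs against a ground truth.

    predicted: set of gene IDs called differentially translated
    truth:     set of gene IDs in the ground truth
    universe:  set of all gene IDs tested
    """
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(universe) - tp - fp - fn

    def ratio(num, den):
        # undefined metrics (empty denominators) are reported as NaN
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)   # positive predictive value
    tpr = ratio(tp, tp + fn)         # recall / sensitivity
    return {
        "FPR": ratio(fp, fp + tn),
        "FNR": ratio(fn, fn + tp),
        "TPR": tpr,
        "accuracy": ratio(tp + tn, len(universe)),
        "precision": precision,
        "F1": ratio(2 * precision * tpr, precision + tpr),  # harmonic mean
    }


# Toy example with 10 genes: TP = 3, FP = 1, FN = 1, TN = 5
genes = {f"g{i}" for i in range(1, 11)}
m = benchmark_metrics({"g1", "g2", "g3", "g5"}, {"g1", "g2", "g3", "g4"}, genes)
print(m)  # TPR = 0.75, FPR = 1/6, accuracy = 0.8, precision = 0.75, F1 = 0.75
```

Because accuracy counts true negatives, it is dominated by the many genes that no tool calls. A tool that makes few calls with a low FPR can therefore score well on accuracy even when its TPR is modest.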
The general trend was poorer performance with a low number of replicates (2-3) and improvement for comparisons with 8-10 replicates. Interestingly, FPR was less dependent on the number of replicates: it increased steadily between 2 and 5 replicates and, with the exception of Riborex and deltaTE, stabilised at 5 and more. For Riborex and deltaTE, FPR started growing again when the number of replicates exceeded 10.

These benchmarking results show that no single method for identifying differentially translated genes is superior. However, once the experimental design and the biological question of interest are taken into account, better and worse approaches can be distinguished. Firstly, a striking feature of differential translation analysis, regardless of the method, is its relatively low recall (true positive rate) and high FNR when the number of replicates is low. It seems that at least 8 replicates are needed to reach 50-70% true positive findings in the group of identified DTGs. Secondly, with FDR < 0.05, the rate of false positive findings is well controlled for almost all TE-based tools. The exception is Xtail, which showed a high FPR in both the NULL model and the differential translation model. The lowest rate of false positive findings across all replicate groups was observed for DESeq2T. Lastly, the flexibility to analyse more complex experimental designs, such as time series or sequencing batches, might be key for method selection. Among the benchmarked tools, this is possible in Riborex, DESeq2T and deltaTE. I conclude that DESeq2T and Riborex are the preferred choices for analysing small to medium-sized translatome experiments with a complex experimental design. However, when the primary interest is the change in ribosome footprint abundance that is independent of the mRNA level, DESeq2T may provide a better answer because of its lower rate of false positive findings.
Therefore, this is the approach I adopted to identify differentially translated genes in this and the next chapter.

3.2 Translational regulation following BCL6 and MYC overexpression in primary GC B-cells

Having settled on an analysis strategy to identify differentially translated genes, I returned to the experiment with primary GC B-cells. I analysed gene expression responses following MYC and BCL6 overexpression in primary GC B-cells using the DESeq2T method tested above and detailed in section 2.2.3. Briefly, I performed a standard differential expression analysis with DESeq2, separately for Ribo-Seq and RNA-Seq data. For both BCL6 and MYC, the fold changes in mRNA abundance and ribosome footprint density were highly correlated (Pearson's r = 0.71 and 0.82, respectively). This suggests that mRNA-driven changes are the dominant mechanism of regulation (Figure 3.9 A-B). Across both data types, I identified 946 differentially expressed genes for BCL6 and more than four times as many (4247) for MYC. Of these, 31.8% and 43.7%, respectively, overlapped between the transcriptome and translatome analyses (Figure 3.9 C).

Next, I classified differentially expressed genes into 6 regulatory groups according to the direction (up or down) and the type of expression response: translation only, coordinated (homodirectional) between translation and mRNA level, or buffered (see section 2.2.3).

To characterise each of the potential regulatory programmes, I performed gene ontology (GO) and pathway analyses. The top statistically significant findings are shown in Figure 3.9 D-E. In BCL6 overexpressing cells, genes showing coordinated downregulation at the level of transcription and translation were enriched for numerous terms associated with the unfolded protein response and inflammation. Indeed, the expression of many core components of the ER stress pathway, including ERN1, XBP1 and DDIT3 (CHOP), was significantly reduced in both RNA-Seq and Ribo-Seq data.
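The six-group classification described above can be sketched as a small rule set applied to the two DESeq2 result tables. This is a hypothetical Python illustration, not the implementation from section 2.2.3, whose exact thresholds are not reproduced here. Several elements are assumptions: the GeneResult record, the classify function, the 0.05 cut-off on adjusted p-values, and the convention that buffered genes are labelled by the direction of the mRNA change.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Per-gene output from the separate Ribo-Seq and RNA-Seq DESeq2 analyses."""
    ribo_lfc: float   # log2 fold change in ribosome footprints
    ribo_padj: float  # adjusted p-value, Ribo-Seq
    rna_lfc: float    # log2 fold change in mRNA
    rna_padj: float   # adjusted p-value, RNA-Seq


def classify(g, alpha=0.05):
    """Assign one of six hypothetical regulatory groups, or 'not_regulated'."""
    ribo_sig = g.ribo_padj < alpha
    rna_sig = g.rna_padj < alpha
    same_sign = (g.ribo_lfc > 0) == (g.rna_lfc > 0)

    if ribo_sig and not rna_sig:
        # footprints change while mRNA does not: translation only
        return "translation_only_" + ("up" if g.ribo_lfc > 0 else "down")
    if ribo_sig and rna_sig and same_sign:
        # both layers move in the same direction: coordinated (homodirectional)
        return "coordinated_" + ("up" if g.ribo_lfc > 0 else "down")
    if rna_sig:
        # mRNA changes but footprints do not follow (or move the other way): buffered
        return "buffered_" + ("up" if g.rna_lfc > 0 else "down")
    return "not_regulated"


print(classify(GeneResult(ribo_lfc=0.1, ribo_padj=0.7, rna_lfc=-1.0, rna_padj=0.001)))
# buffered_down
```

The rules are evaluated in order, so every gene with a significant change in at least one layer falls into exactly one group.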
Pathways related to extracellular matrix organisation and signalling through receptor tyrosine kinases were enriched in the upregulated group. For MYC, I observed strong upregulation of terms related to translation and ribosome biogenesis, in line with previous studies of the MYC regulatory network. The downregulated group consisted of terms related to inflammatory responses, G protein-coupled receptor (GPCR) signalling and PD-L1 signalling; the latter mainly included genes related to HLA presentation. Interestingly, in MYC overexpressing cells the decrease in mRNA abundance of CD274 (programmed death-ligand 1, PD-L1) was buffered by an increase in ribosome footprint density (TE log2 fold change = 1.280). BCL6 overexpressing cells showed a similar trend: mild mRNA downregulation buffered by ribosome footprint abundance (TE log2 fold change = 0.42). Cell-surface PD-L1 is known to inhibit T cell-mediated immune responses. This is a well-established mechanism for maintaining physiological self-tolerance, but it also promotes immune escape during tumour formation (Pardoll, 2012). Maintenance of PD-L1 translation, combined with MYC-induced downregulation of HLA-related genes, could create a permissive environment for aberrant clonal expansion of GC B-cells and thus facilitate tumour formation. However, average PD-L1 expression in primary GC B-cells was low, just above the 10th percentile of expression.

Overall, differences in TE had a narrow dynamic range: only about 1% of genes had absolute TE log2 fold changes larger than 1. Translation-level changes were dominated by regulation of the synthesis of housekeeping proteins, such as components of the mitochondrial respiratory chain (MYC only) or ribosomal proteins (MYC and BCL6) (Figure 3.9 E).
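The TE log2 fold changes quoted above can be read through the relationship below. Let F and M denote normalised footprint and mRNA counts. The identity holds exactly if TE is defined as F/M and both fold changes are estimated on the same scale; whether DESeq2T estimates TE in exactly this way is an assumption here. The numbers in the second line are illustrative, not the measured values for CD274. They show how an mRNA decrease with unchanged footprints appears as a positive TE change, that is, as buffering.

```latex
\begin{align*}
\mathrm{TE} = \frac{F}{M}
  &\;\Rightarrow\; \log_2 \mathrm{FC}_{\mathrm{TE}} = \log_2 \mathrm{FC}_{F} - \log_2 \mathrm{FC}_{M} \\
\log_2 \mathrm{FC}_{M} = -0.8,\; \log_2 \mathrm{FC}_{F} = 0
  &\;\Rightarrow\; \log_2 \mathrm{FC}_{\mathrm{TE}} = 0 - (-0.8) = 0.8
\end{align*}
```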
Four of the seven mitochondrially encoded subunits of NADH dehydrogenase (complex I) were translationally upregulated in MYC overexpressing cells (TE log2 fold change between 0.4 and 1). The same was true of CYCS (cytochrome c) and 3 components of cytochrome c oxidase (Figure 3.9 F). This may complement the known function of MYC in promoting mitochondrial biogenesis (Morrish and Hockenbery, 2014). For ribosomal proteins (RPs), both BCL6 and MYC overexpression were associated with translational control of a few of them, but in opposite directions (Figure 3.9 E). RPL28, RPL32, RPL36AL, RPLP0 and RPS9 were translationally suppressed in MYC overexpressing cells. In contrast, BCL6 overexpression was associated with translational upregulation of RPL5, RPL24, RPL18, RPS27L and RPL30. Translational regulation of RPs is typically associated with mTOR signalling and preferential translation of transcripts with 5' terminal oligopyrimidine (5'TOP) motifs. However, since MYC overexpression increases RP mRNA abundance, changes in TE may reflect mRNA-driven rather than translation-driven changes.

Overall, I conclude that overexpression of MYC or BCL6 is associated with relatively minor changes in translation intensity that are independent of mRNA-driven reprogramming. Translational control affects only selected transcripts, predominantly those involved in energy-demanding processes such as ribosome biogenesis or the respiratory chain. In addition, I found translational buffering of decreasing PD-L1 mRNA levels, which may suggest that translational control is involved in immune surveillance mechanisms. The data presented here do not allow me to determine the significance of such adaptive mechanisms. Nevertheless, they provide an interesting insight into the scope of changes that can accompany deregulation of MYC or BCL6 in B-cell lymphoma.
Figure 3.9: Differential translation analysis of GC B-cells overexpressing BCL6 or MYC. A) Scatter plot showing log2 fold changes in ribosome footprint occupancy against log2 fold changes in mRNA level for the BCL6-t2A-BCL2 versus BCL2 comparison. Colours represent regulatory groups. B) Scatter plot showing log2 fold changes in ribosome footprint occupancy against log2 fold changes in mRNA level for the MYC-t2A-BCL2 versus BCL2 comparison. Colours represent regulatory groups. C) Venn diagrams showing the overlap in differentially expressed genes identified by translatome (Ribo-Seq) or transcriptome (RNA-Seq) data. D) Reactome pathway analysis for genes with a coordinated (homodirectional) change in expression. The three top enriched pathways are shown per group (FDR < 0.1). E) Gene ontology analysis of differentially translated genes (FDR < 0.1). F) Network of protein-protein interactions of translationally upregulated genes following MYC overexpression, obtained from STRING 1.1.5.

3.3 Discussion

Main findings

Ribo-Seq, also known as ribosome profiling, is a Next Generation Sequencing technique that has transformed studies of mRNA translation. Here, I introduce RiboStream, a bioinformatic pipeline that I developed to process translatome data efficiently and accurately. I applied RiboStream to a large dataset of 54 Ribo-Seq and RNA-Seq samples, which allowed me to systematically benchmark available tools for differential translation analysis. Based on this, I selected DESeq2T as the preferred method for analysing Ribo-Seq datasets in this study. DESeq2T combines a standard differential expression analysis in DESeq2 with a set of rules that classify genes subject to distinct mechanisms of regulation. It showed superior accuracy, a low false positive rate and the flexibility to analyse experiments with complex experimental designs.
A similar strategy, applied in previous studies (Hsieh et al., 2012; Sendoel et al., 2017), led to biologically relevant, experimentally confirmed results despite using fewer replicates than our study. I then used DESeq2T to dissect the translational consequences of BCL6 or MYC overexpression in the primary GC B-cell model. This revealed preferential translation of selected transcripts encoding certain ribosomal proteins and components of the respiratory chain. In the majority of genes the translational response followed changes in the mRNA level, but there were a few exceptions to that rule. A particularly interesting example was translational buffering of CD274 (PD-L1) expression.

Limitations and artefacts of translatome studies

Given the dynamic nature of translation, Ribo-Seq library preparation is critical for accurate measurement of the cellular translatome. To keep translating ribosomes in place during cell harvesting, standard practice is to treat the cells with translation inhibitors just before the procedure. The most common drug used to freeze ribosomes before harvesting is cycloheximide (CHX). However, the fidelity and precision of ribosome footprint mapping in CHX-treated samples have been questioned (Duncan and Mata, 2017; Gerashchenko and Gladyshev, 2014; Santos et al., 2019; Hussmann et al., 2015). Analyses in lower eukaryotes (Schizosaccharomyces sp. and Saccharomyces sp.) showed that CHX induces artefacts in the ribosome coverage profile of cells subjected to different types of stress. CHX artefacts skewed codon occupancy towards CGA and CGG (Hussmann et al., 2015). They also caused ribosome footprints to accumulate in the 5'UTR and in the first 100–200 nucleotides of the coding sequence, the so-called 5' translation ramp (Gerashchenko and Gladyshev, 2014; Tuller et al., 2010).
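One simple way to check a dataset for the 5' ramp artefact described above is to compare footprint density at the start of each CDS with the density over the rest of the CDS. The Python sketch below is a hypothetical illustration and not part of the analysis in this thesis. The function name and the 150-nt window (chosen within the 100–200 nt range mentioned above) are assumptions, and per-nucleotide P-site counts along each CDS are assumed to have been computed upstream.

```python
def ramp_ratio(psite_counts, window=150):
    """Mean footprint density in the first `window` nt of a CDS divided by the
    mean density over the remainder of the CDS (about 1.0 means no ramp)."""
    if len(psite_counts) <= window:
        raise ValueError("CDS must be longer than the ramp window")
    head = psite_counts[:window]
    tail = psite_counts[window:]
    head_density = sum(head) / len(head)
    tail_density = sum(tail) / len(tail)
    if tail_density == 0:
        return float("inf") if head_density > 0 else float("nan")
    return head_density / tail_density


# Hypothetical 1000-nt CDS profiles: one with a four-fold excess in the first 150 nt
ramped = [4] * 150 + [1] * 850
flat = [2] * 1000
print(ramp_ratio(ramped), ramp_ratio(flat))  # 4.0 1.0
```

Comparing the distribution of such ratios between CHX-treated and untreated libraries, or between conditions, would indicate whether a ramp-like artefact differs between samples.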
Highly expressed genes, such as ribosomal proteins, were more prone to CHX-induced artefacts (Santos et al., 2019), and the effect was dose-dependent. However, a similar study examining these findings in yeast and mammalian cells showed that the effect is species-specific (Sharma et al., 2019). Sharma et al. (2019) showed that in human cells CHX disturbed neither the codon occupancy pattern nor gene-level measurements of translation intensity. Important factors determining the quality of the experiment included the type of RNase and its concentration (Sharma et al., 2019; Gerashchenko and Gladyshev, 2017). Another was the time between the onset of harvesting and flash-freezing (Sharma et al., 2019; Rooijers et al., 2013). These results suggest that findings from yeast cannot simply be extrapolated to mammalian systems. We cannot assume that differential translation analysis in human cells is free from artefacts and biases. Nevertheless, it is unlikely that the results presented here are caused by the use of CHX during Ribo-Seq library preparation.

A group of housekeeping genes undergoes translational regulation following BCL6 or MYC overexpression

The analysis of the cellular translatome and transcriptome following BCL6 or MYC overexpression revealed a strong correlation between mRNA and ribosome footprint abundance, which suggests that mRNA-driven changes dominate. During dynamic cell transitions, up to 92% of per-mRNA translation rates can be explained by mRNA levels (Jovanovic et al., 2015). This finding is therefore not surprising for BCL6 and MYC, which are bona fide transcription factors.
Interestingly, the same study (Jovanovic et al., 2015) combined pulsed SILAC and RNA-Seq and showed that the majority of changes in protein level are indeed driven by mRNA abundance. However, two groups of housekeeping genes, ribosomal proteins (RPs) and mitochondrion-related genes, depended more on translation and on protein degradation, respectively. The proposed explanation was that translational regulation responds to dynamic changes in cellular state by finely tuning the rate of specific metabolic processes (Jovanovic et al., 2015). Here, I identified RPs and oxidative phosphorylation-related genes as translationally regulated. MYC overexpression decreased the translation rate of selected RPs and increased the synthesis of several key enzymes of the respiratory chain. In contrast, BCL6 overexpression was associated with increased translation of a few ribosomal proteins.

Translational control of ribosomal protein expression

Translational regulation of RPs has been reported previously. Strong translational repression of RPs was observed during differentiation of mouse embryonic stem cells (Ingolia et al., 2011) and in PC3 human prostate cancer cells treated with an mTOR inhibitor (Hsieh et al., 2012). RPs were also translationally upregulated in heart tissue samples from patients with dilated cardiomyopathy compared with normal hearts (van Heesch et al., 2019). These studies reveal two patterns of regulation: global and selective. The global pattern is 5'TOP-mediated control, downstream of mTOR signalling, over translation of the core protein synthesis machinery, which includes RPs and a few translation factors (Thoreen et al., 2012; Hsieh et al., 2012; Philippe et al., 2020). MYC expression is known to correlate with mTOR activation (Lu et al., 2021; Pourdehnad et al., 2013; Liu et al., 2017), but no such pattern has been shown for BCL6.
Inhibition of mTOR signalling is associated with translational repression of all RPs (Hsieh et al., 2012). In contrast, other studies (van Heesch et al., 2019; Ingolia et al., 2011), including this one, showed differential translation only for individual proteins. Forced expression of MYC or BCL6 may induce adaptive, mTOR-mediated adjustments of RP synthesis of which only a few individual RPs reached the detection threshold. Ribosomal proteins are highly expressed, and every cell contains millions of ribosomes, so even small changes in their abundance can have a profound effect on translation (Genuth and Barna, 2018).

Selective regulation of the translation of specific RPs may be explained by ribosome heterogeneity. Tissue-specific or developmental stage-specific patterns of core ribosomal protein expression have been reported in humans and other organisms (reviewed in Genuth and Barna, 2018; Shi and Barna, 2015). Changes in the abundance of selected RPs and the assembly of specialised ribosomes could drive a specific translation programme that facilitates the oncogenic function of MYC or BCL6. For example, I found RPL28 translationally downregulated in MYC overexpressing cells. Knockdown of RPL28 is not lethal to the cell but promotes MHC I presentation of non-canonical peptides derived from non-AUG start codons and non-coding regions of the transcriptome. Thus, RP-mediated regulation of MHC I peptide presentation may facilitate immune surveillance (Wei et al., 2019). Two further factors add complexity: differences in RP stoichiometry between free ribosomal subunits and translationally active ribosomes, and the extra-ribosomal roles of certain RPs (Shi et al., 2017). Mutations in RPs are well known in ribosomopathies, hereditary disorders with an increased risk of malignancy, including lymphoma.
Abnormalities in ribosome biogenesis can trigger p53 activation and the DNA damage response (Lindström et al., 2018). Therefore, fine-tuning the abundance of individual ribosomal proteins may be essential for tumour initiation and maintenance.

An alternative explanation is that the disproportionate translation of specific ribosomal proteins is a marker of profound transcriptional reorganisation following BCL6 or MYC overexpression. Both Ribo-Seq and RNA-Seq provide only relative quantification of gene expression, which may mask global shifts in translation intensity. It would be interesting to measure changes in the polysome fractions of individual transcripts and compare them with our Ribo-Seq results. This could give better resolution of the translational response during dynamic reprogramming of the transcriptome.

MYC controls PD-L1 translation in primary GC B-cells

Translational control of immune checkpoints and HLA surface molecules has attracted much attention in the context of immunosurveillance of developing tumours. Here, MYC overexpression reduced the mRNA abundance of 24 of the 26 genes encoding HLA class I and II molecules. This was combined with translational control of PD-L1, an important component of an immune checkpoint. PD-L1 is a ligand of the PD-1 receptor, which is found on activated T-cells and is expressed during persistent antigen stimulation to constrain the immune response (Sharpe and Pauken, 2018).

MYC promotes tumour immune escape through a variety of mechanisms. Its ability to downregulate HLA expression has long been known (Versteeg et al., 1988; God et al., 2015; Staege et al., 2002), so this finding in MYC-overexpressing GC B-cells is not surprising. MYC also enhances the immunosuppressive phenotype through transcriptional control of two immune checkpoints, PD-L1 and CD47 (Casey et al., 2016, 2018).
Inactivation of Myc led to an abrupt decrease in PD-L1 and CD47 mRNA and protein levels in multiple cancer types, both in vitro and in vivo (Casey et al., 2016, 2018). This may explain the increased recruitment of immune cells to the tumour tissue following depletion of MYC (Rakhra et al., 2010). PD-L1 mRNA levels in cancer, including B-cell lymphoma, are also regulated by MYC-independent mechanisms. These include CD274 gene amplification, translocation under an active promoter, and truncation of the 3'UTR, which stabilises PD-L1 mRNA (Ansell et al., 2015; Green et al., 2010; Twa et al., 2014; Kataoka et al., 2016). In contrast, MYC overexpression in the primary GC B-cells studied here did not increase PD-L1 or CD47 mRNA levels; instead, PD-L1 mRNA decreased almost two-fold. This decrease was buffered by increased ribosome footprint abundance, suggesting a contribution of post-transcriptional control.

PD-L1 mRNA is known to contain two upstream open reading frames (uORFs), which impede efficient translation of the PD-L1 protein. MYC-induced activation of the integrated stress response and the subsequent phosphorylation of eIF2α were shown to shift the uORF/PD-L1 balance towards higher PD-L1 expression (Xu et al., 2019). Ribosome footprints were evident in the PD-L1 5'UTR in our data, but their occupancy did not differ between experimental conditions. This does not exclude uORF-mediated regulation of PD-L1 translation. It does suggest, however, that any such regulation is not mediated by binary on/off translation of the uORFs. Interestingly, PD-L1 translation in cancer cells requires a particular configuration of the translational apparatus. PD-L1 translation was promoted by eIF2α phosphorylation during the stress response. Conversely, ablation of eIF4E phosphorylation at serine 209 (Xu et al., 2019) and depletion of eIF5B (Suresh et al., 2020) reduced PD-L1 expression.
So far, PD-1/PD-L1 inhibition therapy in non-Hodgkin lymphoma has shown limited efficacy: most patients respond poorly to monotherapy, or their response is temporary (Zhang et al., 2018b). The role of the PD-1/PD-L1 axis in aggressive B-cell lymphoma is complex. T follicular helper cells express high levels of PD-1. They are considered important in regulating GC B-cell differentiation and the formation of long-lived plasma cells (Goodman et al., 2017; Good-Jacobson et al., 2010). However, cell-surface PD-L1 expression was found in only 11-30% of DLBCL patients, depending on the study. PD-L1 expression was associated with EBV-positive tumours, which had inferior overall survival (Goodman et al., 2017; Kiyasu et al., 2015). The direction of PD-L1 change upon altered MYC levels differs between established tumours, including B-cell lymphoma (Casey et al., 2016), and the primary GC B-cells studied here. This discrepancy may suggest that the nature of PD-L1 control by MYC changes in the course of tumour development. It would be interesting to elucidate the cross-talk between transcription, translation and mRNA stability in regulating PD-L1. This could shed light on the susceptibility of lymphoma cells to PD-1/PD-L1 inhibition.

CHAPTER 4

Mutations in RNA helicase DDX3X facilitate MYC-driven lymphomagenesis

4.1 Background

Burkitt lymphoma (BL) is a highly aggressive form of non-Hodgkin lymphoma (NHL) with a 3:1 male-to-female incidence ratio (Smith et al., 2015; Morton et al., 2006). It presents in three distinct forms: endemic, sporadic and immunodeficiency-associated. Endemic Burkitt lymphoma (eBL) is the most common childhood cancer in sub-Saharan Africa, accounting for nearly half of all paediatric cancers there. An estimated 3900 new eBL cases were diagnosed in Africa in 2018 (Hämmerl et al., 2019). BL not associated with immunodeficiency and occurring outside endemic Africa is defined as sporadic Burkitt lymphoma (sBL).
It is a rare disease, accounting for only 2% of all newly diagnosed NHL. The annual incidence rate of BL in Europe is about 0.36 per 100,000 (Smith et al., 2015). sBL shows two age peaks, at 10 and 75 years. With a growth fraction approaching 100% and a mass doubling time of around 25 hours, BL is one of the most aggressive human tumours. Despite such a fulminant onset, it is potentially curable with intensive chemotherapy; 5-year survival is 87% for patients younger than 20 (Costa et al., 2013). The toxicity of such regimens poses a substantial risk for older individuals, so treatment intensity must often be reduced. Five-year survival for patients aged 60 and older is only 25-33%. Moreover, multi-agent treatment consists of sequential administration of four to six drugs, and full adherence is difficult to achieve in countries with limited access to medical care. Long-term survival of paediatric BL in sub-Saharan Africa is estimated at 30% to 50% and has remained unchanged since the 1970s (Ozuah et al., 2020).

Recent advances in understanding the molecular mechanisms of lymphoma development open new opportunities to improve therapy for BL patients. From a molecular perspective, BL arises from the germinal centre (GC) stage of B-cell development. This stage selects and expands mature lymphocytes that produce high-affinity antibodies (De Silva and Klein, 2015; Klein and Dalla-Favera, 2008; Basso and Dalla-Favera, 2015). The process was explained in detail in section 1.4. The oncoprotein MYC is essential for a successful GC reaction. However, its expression is transient and limited to a small proportion of GC B-cells undergoing positive selection (Calado et al., 2012; Dominguez-Sola et al., 2012). MYC is a transcription factor that regulates many key cell functions, such as proliferation, DNA replication, protein biosynthesis and metabolism.
DNA strand breaks and non-homologous end joining are the principal mechanisms of immunoglobulin gene recombination. A side effect is the risk of translocations involving the highly expressed immunoglobulin loci. Translocation between the MYC oncogene and the immunoglobulin heavy or light chain loci is an almost universal feature of BL, observed in more than 95% of cases (Swerdlow et al., 2016). However, sustained MYC upregulation alone is not sufficient to drive lymphomagenesis. In non-cancer cells, MYC overexpression triggers apoptosis through both p53-dependent and p53-independent pathways. It also amplifies the apoptotic signal in mitochondria by inhibiting expression of anti-apoptotic BCL2 (McMahon, 2014). A mouse model of human lymphoma revealed that MYC-induced malignant transformation requires additional co-operating mutational mechanisms, such as PI3K activation (Sander et al., 2012). Genes and pathways recurrently mutated in BL include the following: transcription factors such as TCF3/ID3 and FOXO1; the SWI/SNF chromatin remodelling complex (ARID1A, SMARCA4); apoptosis-related genes (TP53, USP7, CDKN2A); GPCR signalling; B-cell receptor/PI3K signalling; and epigenetic regulators (Grande et al., 2019; Schmitz et al., 2012; Bouska et al., 2017; López et al., 2019; Richter et al., 2012). Interestingly, in the most recent genomic study of BL (Grande et al., 2019), the RNA-binding protein DDX3X was the most frequently mutated gene after MYC when both point mutations and copy number changes were considered. Despite this high recurrence rate, little is known about the role of DDX3X mutations in BL.
DDX3X mutations have also been reported in chronic lymphocytic leukaemia (Ojha et al., 2015; Takahashi et al., 2018), medulloblastoma (Jones et al., 2012; Pugh et al., 2012; Robinson et al., 2012), head and neck squamous cell carcinoma (Stransky et al., 2011) and NK/T-cell lymphoma (Jiang et al., 2015). Yet the function of DDX3X in cancer remains puzzling, as it has been classified both as a tumour suppressor and as an oncogene (Soto-Rifo et al., 2012; He et al., 2018). This dual function has been reported not only between different cancer types but also within the same type (He et al., 2018), which underscores the context-specific effect of DDX3X in malignancy. The role of DDX3X in human disease is not limited to cancer. Heterozygous DDX3X mutations have previously been linked to neurodevelopmental disorders associated with autism spectrum disorder, intellectual disability and seizures (Johnson-Kerner et al., 2020; Lennox et al., 2020a; Kellaris et al., 2018; Iossifov et al., 2014).

DDX3X is a highly conserved ATP-dependent RNA helicase involved in various aspects of RNA biology: transcription, splicing, nuclear export, stress granule formation and resolution, microRNA biogenesis, mRNA translation and decay (Linder and Jankowsky, 2011; Mo et al., 2021). Its known RNA-independent functions include regulation of WNT and NF-κB signalling (Pugh et al., 2012; Xiang et al., 2016). DDX3X is located in the non-pseudoautosomal region of chromosome X and is known to escape X inactivation in a wide range of tissues (Berletch et al., 2011; Cotton et al., 2015). Its Y-chromosome paralogue, DDX3Y, shares 92% amino acid similarity with DDX3X. Although DDX3Y is widely transcribed, the DDX3Y protein is expressed exclusively in spermatogonia (Ditton et al., 2004; Rauschendorf et al., 2011; Foresta et al., 2000a).

Given the high frequency of DDX3X mutations in BL, I set out to establish the contribution of this gene to lymphomagenesis in Burkitt lymphoma. In this chapter:
1. I examine the frequency and distribution of DDX3X mutations in BL;
2. I perform a bioinformatic analysis of multi-omic datasets to elucidate the molecular role of DDX3X in BL.

This project was conducted in collaboration with Dr Chun Gong in the Hodson lab. Dr Gong performed the wet lab experiments, whilst I performed all computational analyses.

4.2 Results

4.2.1 Examining the prevalence and distribution of DDX3X mutations

4.2.1.1 DDX3X is preferentially mutated in MYC-driven lymphomas

To establish the frequency of point mutations in BL in the UK cohort, I reviewed the results of a 293-gene targeted sequencing panel applied to 39 cases of previously untreated sporadic BL. Consistent with previous reports, MYC, ID3, TP53, CCND3, DDX3X, ARID1A, FOXO1 and SMARCA4 were the most frequently mutated genes. DDX3X mutations were found in 30.8% (12/39) of patients and were much more common in males than in females (11 versus 1; Figure 4.1 A). The same analysis was applied to 928 cases of DLBCL (Lacy et al., 2020). Of these, only 5.2% had a DDX3X mutation, in line with other recent sequencing studies (Figure 4.1 B) (Chapuy et al., 2018; Reddy et al., 2017; Schmitz et al., 2012).

Figure 4.1: Frequency of DDX3X mutations in BL. A) Barplot showing mutation frequency (%) for the indicated genes detected using a 293-gene panel applied to 39 cases of Burkitt lymphoma (sequencing, mutation calling and filtering performed by Dr. Peter Campbell and Dr. Philip Beer). B) Barplot showing the frequency of DDX3X mutation across published sequencing studies of BL and DLBCL.

Unlike BL, DLBCL is a highly heterogeneous disease. The recently described Molecular High Grade (MHG) subtype of DLBCL shares several similarities with BL. These are reflected in a BL-like gene expression signature with high expression of genes related to the cell cycle, TCF3 signalling and ribosome biogenesis (Sha et al., 2019).
To establish a link between DDX3X mutation and MYC-driven lymphomagenesis, I compared DDX3X mutation frequency with MYC status in targeted sequencing data from the published UK cohort of 550 DLBCL cases with available fluorescence in situ hybridisation (FISH) data for MYC (Cucco et al., 2020). MYC locus rearrangement was significantly more frequent in cases with DDX3X mutation (Figure 4.2 A; Chi-squared test, p-value = 0.001). Of 34 patients with DDX3X mutation, 16 (47.06%) showed MYC translocation, compared with only 21.12% (109/516) in the DDX3X wild-type group. In the same study, a comparison of DDX3X mutation frequencies between DLBCL subtypes revealed a remarkable enrichment of DDX3X mutation in the MHG group (Figure 4.2 B). Of the 558 cases with a gene expression profile available for transcriptomic subtyping, DDX3X was mutated in 16.7% of MHG cases, versus 4.3% of GCB and 2.2% of ABC DLBCL. DDX3X mutations were significantly enriched in the MHG subtype compared with the other subtypes (p-value = 0.001, Chi-squared test). To validate this finding, I re-analysed RNA-Seq data from a large, publicly available dataset of 553 DLBCL patients from the GOYA clinical trial (McCord et al., 2019b). I downloaded the raw FASTQ files from the Sequence Read Archive (SRA), then performed quality control, alignment to the reference genome and read counting, as detailed in section 2.2.2. The aims of this analysis were to 1) classify the cases by their transcriptional subtypes, 2) identify cases in which DDX3X was either mutated or not expressed and 3) compare the frequency of DDX3X alteration between the subtypes. I combined two previously developed DLBCL classifiers. Firstly, I segregated cases into ABC, GCB and Unclassified (Reddy et al., 2017). Secondly, I distinguished the MHG group among samples belonging to the GCB-DLBCL subtype, as described by Sha et al. (2019).
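The MYC-rearrangement comparison can be re-derived from the counts reported above: 16 of 34 DDX3X-mutant cases versus 109 of 516 wild-type cases. The following is a minimal Python sketch. It assumes a Pearson chi-squared test with Yates' continuity correction, which is the default of R's `chisq.test` for 2×2 tables. The function name `chi_squared_2x2_yates` is hypothetical and is not part of the original analysis.

```python
import math
from typing import Tuple


def chi_squared_2x2_yates(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Chi-squared test with Yates' continuity correction for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p-value) with 1 degree of freedom."""
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Yates' correction: shrink |observed - expected| by 0.5, never below 0
            deviation = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            statistic += deviation ** 2 / expected
    # Upper tail of the chi-squared distribution with 1 df equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: DDX3X mutant, DDX3X wild-type; columns: MYC rearranged, not rearranged
stat, p = chi_squared_2x2_yates(16, 34 - 16, 109, 516 - 109)
print(f"chi-squared = {stat:.2f}, p = {p:.4f}")
```

With these counts, the corrected statistic is about 10.8 and the p-value is about 0.001, which agrees with the value reported in the text. Without the continuity correction, the p-value is somewhat smaller.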
The frequency of each identified subtype in the GOYA dataset was as expected, with the MHG group accounting for about 10% of all cases (Figure 4.2 C). Next, I performed single nucleotide variant (SNV) calling from paired-end RNA-Seq data according to the GATK Best Practices guidelines. Because no matched germline control was available for any sequenced tumour, I imposed stringent criteria to establish DDX3X mutation status for each sample. I defined DDX3X mutant samples as those with nonsense, frameshift or non-synonymous variants localised in the evolutionarily conserved DDX3X helicase domain, with a ratio of variant coverage to reference coverage > 0.2. I filtered out all common population variants reported in the ExAC database.
Figure 4.2: DDX3X mutations are enriched in MYC-driven DLBCL. A) Barplot showing the proportion of cases with MYC rearrangement detected by FISH in a cohort of 550 cases of DLBCL from Cucco et al. (2020), stratified by DDX3X mutation status. B) Barplot showing the frequency of DDX3X mutation in 558 cases of DLBCL from Cucco et al. (2020), stratified by transcriptional subtype. C) Barplot showing the frequency of DDX3X mutation in 554 cases of DLBCL from the GOYA study (McCord et al., 2019b), stratified by transcriptional subtype. D) Heatmap showing DDX3X mutation status by transcriptional subtype in 553 DLBCL cases from the GOYA trial. Rows represent the gene expression signatures used to assign transcriptional subtypes (obtained from Reddy et al. (2017) and Sha et al. (2019)).
Five samples with disproportionately low expression of DDX3X were also classified as loss-of-DDX3X. See section 2.2.8 for a detailed description of the analytic workflow. Concordant with the previous result, DDX3X mutation or absent expression was significantly enriched in the MHG subtype (19.0%), compared with GCB (8.1%) and ABC (3.0%) DLBCL (p-value < 10⁻⁵, Chi-squared test) (Figure 4.2 C-D).
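The mutation-status rules above can be written as a small predicate over variant calls. The following is a minimal sketch under several assumptions:

- Each call carries its consequence, amino-acid position, read support and a flag for presence in a population database such as ExAC.
- "Non-synonymous" is read as missense.
- Truncating variants are accepted anywhere in the gene, while missense variants must fall inside the helicase core. This is one possible reading of the criteria, not a confirmed one.

The class and function names are hypothetical, and the helicase-core boundaries are placeholders that should be replaced with the real domain annotation. The additional rule for the five low-expression samples is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

TRUNCATING = {"nonsense", "frameshift"}


@dataclass
class Variant:
    sample: str
    consequence: str         # e.g. "nonsense", "frameshift", "missense", "synonymous"
    protein_pos: int         # affected amino-acid position in DDX3X
    alt_reads: int           # reads supporting the variant allele
    ref_reads: int           # reads supporting the reference allele
    in_population_db: bool   # True if reported as a population variant (e.g. in ExAC)


def is_qualifying(v: Variant, helicase_core: Tuple[int, int], min_ratio: float = 0.2) -> bool:
    """Apply the (assumed) mutation-status rules to one variant call."""
    if v.in_population_db:
        return False                     # likely germline polymorphism
    if v.ref_reads == 0:
        ratio = float("inf") if v.alt_reads > 0 else 0.0
    else:
        ratio = v.alt_reads / v.ref_reads
    if ratio <= min_ratio:
        return False                     # variant-to-reference coverage too low
    if v.consequence in TRUNCATING:
        return True                      # assumption: accepted anywhere in the gene
    start, end = helicase_core
    return v.consequence == "missense" and start <= v.protein_pos <= end


def mutant_samples(variants: Iterable[Variant], helicase_core: Tuple[int, int]) -> Set[str]:
    """Samples carrying at least one qualifying DDX3X variant."""
    return {v.sample for v in variants if is_qualifying(v, helicase_core)}


core = (200, 575)  # hypothetical placeholder boundaries of the helicase core
calls = [
    Variant("S1", "missense", 475, 30, 40, False),    # in core, ratio 0.75 -> kept
    Variant("S2", "missense", 50, 30, 40, False),     # outside the core -> ignored
    Variant("S3", "nonsense", 20, 10, 90, False),     # ratio 0.11 -> ignored
    Variant("S4", "frameshift", 20, 25, 50, True),    # population variant -> ignored
]
print(sorted(mutant_samples(calls, core)))
```

Keeping each rule as a separate early return makes it easy to see which criterion removed a given call.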
The computational evidence for co-operation between MYC and DDX3X was subsequently tested in the lab by Dr. Chun Gong, in a culture system using transduced primary human germinal centre B cells. Ex vivo GC B cells co-transduced with both MYC and helicase-mutant DDX3X showed a competitive advantage. This advantage was not seen in cells co-transduced with MYC and WT DDX3X, or in cells co-transduced with mutant DDX3X and BCL6. These functional experiments confirmed the computational prediction of co-operation between MYC and helicase-mutant DDX3X.
4.2.1.2 Context-dependent pattern of DDX3X mutation in different cancer types
Since DDX3X has been reported to be both a tumour suppressor and an oncogene, I asked whether the types of point mutations differ between cancer types in a way that would suggest a differential role of DDX3X. I therefore examined the characteristics of DDX3X mutations downloaded from COSMIC, the Catalogue of Somatic Mutations in Cancer. The total number of mutations reported in COSMIC is biased by the availability of sequencing datasets for each tissue type. Nevertheless, an interesting observation can be made from the proportion of disrupting mutations, such as nonsense or frameshift. Most cancers had a varying mixture of disrupting and non-disrupting mutations. In contrast, almost all mutations reported in the central nervous system (CNS), thyroid and pancreas were of the missense type (Figure 4.3 A). This supports the view that the role of DDX3X mutations is context-dependent and may differ between cancer types.
4.2.1.3 DDX3X mutations in B-cell lymphomas cluster within the C-terminal helicase domain
As a member of the DEAD-box RNA helicase family, DDX3X (like DDX3Y) contains an evolutionarily conserved helicase core of two RecA-like domains (Linder and Jankowsky, 2011; Mo et al., 2021).
The two core domains contain 12 sequence motifs, which are involved in either ATP binding and hydrolysis (motifs Q, I, II/DEAD and VI) or RNA binding (Ia, Ib, Ic, IV, IVa and V). The helicase core is flanked by two low-complexity domains (LCDs). These are known to participate in the assembly of RNA-protein condensates, such as stress granules, through liquid-liquid phase separation (Molliex et al., 2015; Valentin-Vega et al., 2016). I investigated the distribution of DDX3X point mutations across the protein domains in patients diagnosed with BL or DLBCL.
Figure 4.3: Types of DDX3X mutations in different cancer types. A) Barplot showing the distribution of DDX3X mutation types in different cancer types included in the COSMIC database (v.89). B) Lollipop plot showing the distribution of DDX3X mutations identified in this and published studies of BL and DLBCL across the functional domains and motifs of the DDX3X protein.
This revealed an accumulation of mutations within the helicase domains, especially the C-terminal helicase domain. The mutational hot-spots in BL and DLBCL were R488, R475, R311, R528 and R534 (Figure 4.3 B). Some of these are shared with medulloblastoma and are known to abolish helicase activity. In previous studies, RNA unwinding assays showed complete loss of helicase activity for the R475 mutation (Lennox et al., 2020a) and an almost 100-fold reduction in activity for R534 (Floor et al., 2016). There were also multiple disrupting mutations, many of them localised near the N-terminus. This is in line with the mutation pattern observed in COSMIC data and with the frequent deletions of the DDX3X locus reported in a recent whole-genome sequencing study of BL (Grande et al., 2019). These findings suggest that loss of RNA helicase function is the predominant consequence of DDX3X mutation in B-cell lymphoma.
4.2.1.4 Males with Burkitt lymphoma and DLBCL are more likely to have DDX3X mutation
Given the strong male skew in DDX3X mutation frequency observed in the targeted sequencing panel of BL patients, I set out to establish whether there is a relationship between sex and the probability of DDX3X mutation in B-cell lymphoma. To improve the confidence and precision of the effect size estimate, I performed a meta-analysis of the sex skew in DDX3X mutation frequency using previously published sequencing studies of BL and DLBCL with available sex data. I collected data from 7 published studies with available DDX3X mutation status and patient sex. Data from the targeted sequencing of 39 patients from this study were also included. Patient sex was unavailable for the 553 DLBCL patients from the GOYA clinical trial. I therefore classified these RNA-Seq samples as male or female using the decision tree algorithm implemented in the rpart R package. In total, the analysis included 395 patients with BL (6 studies) and 2180 patients with DLBCL (3 studies). Pooled male to female ratios were 4 for BL and 1.2 for DLBCL. The median percentage of DDX3X-mutated samples was 30.39% for BL and 5.97% for DLBCL. Per-study data points and the sequencing type used to call variants are shown in Figure 4.4 A-B and Table 4.1. The analysis revealed that males diagnosed with BL or DLBCL have approximately 1.23 times the risk of DDX3X mutation compared with females (Figure 4.4 A-B; random-effects model, p-value = 0.0002). The allele frequencies suggested that DDX3X mutations were predominantly clonal in both females and males, and that in females only one copy of DDX3X was mutated (Figure 4.4 C).
Figure 4.4: Meta-analysis of the sex skew in DDX3X mutation in published DLBCL and Burkitt lymphoma studies. A) Forest plot showing the effect sizes for each study. The red bar represents the prediction interval around the pooled effect, shown by the grey diamond.
B) L'Abbé plot showing the sex skew of DDX3X mutation across this and other studies of BL and DLBCL. C) Dot plot showing DDX3X mutant allele frequency by sex. Data are taken from this study and three other BL sequencing studies for which sex and MAF were available (Grande et al., 2019; López et al., 2019; Zhou et al., 2019). A Mantel-Haenszel random-effects model was used to calculate the overall relative risk (RR) and its 95% confidence interval (CI). The pooled RR was 1.23 (95% CI 1.06–1.45). Heterogeneity across studies was assessed by Cochran's Q test and tau-squared (p-value = 0.8338, τ² = 0.0033). All computations were performed using the meta R package.
Table 4.1: An overview of BL and DLBCL datasets used for the meta-analysis of the sex skew of DDX3X mutation occurrence
Study | Disease | Number of patients | Male:Female ratio | Sequencing type
Abate et al. (2015) | BL | 20 | 5.67 | RNA-Seq (2x75 bp)
Gong, Krupka et al. (2021) | BL | 39 | 5.50 | Targeted panel
Kaymaz et al. (2017) | BL | 28 | 2.50 | RNA-Seq (2x100 bp)
López et al. (2019) | BL | 21 | 6.00 | WGS
Zhou et al. (2019) | BL | 167 | 2.41 | WES
Grande et al. (2019) | BL | 120 | 1.79 | WGS
Cucco et al. (2020) | DLBCL | 337 | 1.15 | Targeted panel
McCord et al. (2019b) (GOYA) | DLBCL | 553 | 0.93 | RNA-Seq (2x50 bp)
Reddy et al. (2017) | DLBCL | 998 | 1.30 | WES
4.2.2 DDX3X regulates ribosome biogenesis and global protein synthesis
4.2.2.1 DDX3X binds preferentially to mRNA encoding components of the core translation machinery
The existing literature suggests that DDX3X has a versatile role in cell biology, encompassing many aspects of RNA biology, regulation of cell proliferation, stress response and apoptosis. To uncover which of these may be relevant in lymphoma, Dr. Chun Gong performed immunoprecipitation of endogenous DDX3X followed by SILAC mass spectrometry of the interacting proteins (Figure 4.5 A).
Gene Ontology analysis of proteins interacting with DDX3X in at least one cell line revealed a strong enrichment for proteins participating in translation initiation. These included almost all components of the eIF3 complex, as well as eIF4A, eIF4E and eIF4G (Figure 4.5 C). This is in line with previous studies (Lee et al., 2008a; Soto-Rifo et al., 2012; Shih et al., 2008) reporting an association of DDX3X with the translation initiation complex. The interacting proteins also included 7 components of stress granules (SGs). SGs are membraneless assemblies of messenger ribonucleoproteins (mRNPs) that form from mRNAs stalled in translation initiation in response to stress (Protter and Parker, 2016; Buchan and Parker, 2009; Jain et al., 2016). The SG-associated proteins were DDX1, ATXN2L, NUFIP2, PDCD4, USP10, UPF1 and EWSR1. These findings suggest that the role of DDX3X in lymphoma cell lines centres on translation and protein synthesis, through participation either in translation initiation or in stress granule assembly.
Figure 4.5: DDX3X co-immunoprecipitates with essential components of the translation machinery. A) DDX3X-interacting proteins were identified by SILAC-MS following immunoprecipitation of endogenous DDX3X in U2932 (RRID: CVCL_1896) and Mutu (RRID: CVCL_ZY05). The scatter plot shows log2 SILAC ratios of interacting proteins. Proteins significantly enriched in both cell lines are labelled. The experiment was performed by Jade Gong. B) Venn diagram showing the overlap of DDX3X-interacting proteins in two lymphoma cell lines, Mutu and U2932. C) Barplot showing Gene Ontology (GO) enrichment of DDX3X-interacting proteins identified in both cell lines.
As an RNA helicase, DDX3X can bind directly to RNA, affecting its fate and function. To identify transcripts bound by DDX3X, we performed individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP).
This technique combines immunoprecipitation of UV-crosslinked protein-RNA complexes with Next Generation Sequencing, which allows protein-RNA contacts to be mapped with single-nucleotide precision (Hafner et al., 2021). Other groups have used a similar technique previously, but all of these studies used HEK293T cells transfected to express FLAG-tagged DDX3X (Valentin-Vega et al., 2016; Oh et al., 2016; Calviello et al., 2021). To investigate the binding profile of DDX3X at physiological expression levels, and to take its context-specific functions into account, it was crucial to perform the experiment with endogenous DDX3X protein in lymphoid cells. iCLIP was conducted in two lymphoma cell lines, U2932 and Mutu, and in non-malignant human GC B cells purified from discarded tonsil tissue (Caeser et al., 2019). Dr. Chun Gong performed the experiment in at least two biological replicates per condition, with an isotype control using an IgG antibody that does not recognise DDX3X. Details of the experimental protocol are described in Gong, Krupka et al. (2021). Overall, DDX3X iCLIP yielded between 4 and 24 million uniquely mapped reads. As expected for this technique, these accounted for about 30-40% of all aligned reads. In contrast, the IgG control samples yielded fewer than 15,800 uniquely mapped reads, suggesting a high signal-to-noise ratio. The density of crosslinking sites per gene was highly consistent between replicates (Pearson's correlation coefficient between 0.7 and 0.96). I therefore pooled the data from each cell type to increase the sensitivity of the DDX3X binding analysis (see Methods, section 2.2.7.1). First, I examined the distribution of DDX3X crosslinking sites across genomic regions. In all cell types, DDX3X bound ubiquitously to mature coding transcripts (Figure 4.6 A).
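The replicate check and pooling step can be illustrated with a short sketch. It assumes two things: per-gene density is taken as crosslink sites per kilobase of transcript, and pooling means summing crosslink counts across replicates of the same cell type. Both are assumptions rather than details confirmed by the text. The gene names and counts are toy values.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def density_per_kb(counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Crosslink sites per kilobase of transcript, per gene (assumed metric)."""
    return {gene: 1000.0 * n / lengths[gene] for gene, n in counts.items()}


def replicate_correlation(rep1: Dict[str, int], rep2: Dict[str, int],
                          lengths: Dict[str, int]) -> float:
    d1, d2 = density_per_kb(rep1, lengths), density_per_kb(rep2, lengths)
    genes = sorted(set(d1) | set(d2))  # a gene missing from one replicate counts as 0
    return pearson([d1.get(g, 0.0) for g in genes], [d2.get(g, 0.0) for g in genes])


def pool_replicates(replicates: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum crosslink counts per gene across replicates of one cell type."""
    pooled: Dict[str, int] = {}
    for replicate in replicates:
        for gene, n in replicate.items():
            pooled[gene] = pooled.get(gene, 0) + n
    return pooled


lengths = {"GENE_A": 1000, "GENE_B": 1000, "GENE_C": 1000}
rep1 = {"GENE_A": 10, "GENE_B": 20, "GENE_C": 5}
rep2 = {"GENE_A": 12, "GENE_B": 18, "GENE_C": 6}
print(round(replicate_correlation(rep1, rep2, lengths), 3))
print(pool_replicates([rep1, rep2]))
```

Summing counts rather than averaging densities keeps the pooled data in count space, which suits downstream peak calling.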
To visualise the precise location of DDX3X binding sites from the perspective of the mature mRNA, I performed a metagene analysis. In this analysis, the aggregated coverage of all crosslinking sites over all expressed transcripts is plotted relative to the annotated translation start and termination sites. This showed a strong enrichment of DDX3X binding at translation initiation sites (TIS) and further into the open reading frame, with another peak in crosslinking site density at the translation termination site (TTS) (Figure 4.6 C). These findings are consistent with the association of DDX3X with proteins of the translation initiation machinery demonstrated above (Figure 4.5).
Figure 4.6: Binding profile of endogenous DDX3X in lymphoid cells. A) Barplot showing the density of iCLIP cross-link sites mapping to the indicated genomic features in each cell type. B) Venn diagram showing the overlap between DDX3X-bound transcripts detected in iCLIP experiments in lymphoma cell lines and primary human GC B cells. C) Metagene summary of cross-link density across DDX3X-bound mRNA transcripts, showing the length-scaled coding region and 3 kb of the 5′ and 3′ untranslated regions. TIS = translation initiation site, TTS = translation termination site, ORF = open reading frame.
As an RNA helicase, DDX3X is known to facilitate translation of transcripts with complex secondary structures in their 5′UTR. To address this, I investigated whether DDX3X binding shows any preference for the adjacent sequence context, GC content or RNA secondary structure. De novo motif search revealed no consensus binding motif, and the analysis of folding energy profiles and GC content around the binding sites showed no pattern. This suggests that the binding of DDX3X to mRNA is not directly related to any particular RNA sequence context. Next, I examined whether DDX3X binds to a specific family of transcripts.
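The metagene profile described above can be sketched as a binning scheme. Crosslink sites upstream of the TIS and downstream of the TTS fall into fixed-width flank bins, and sites inside the CDS are length-scaled into a fixed number of bins. The per-transcript profiles are then summed. This is a minimal illustration, not the pipeline used in the thesis:

- The window and bin sizes are illustrative; the figure itself uses 3 kb UTR windows.
- No per-transcript normalisation is applied.
- The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def metagene_bins(sites: Iterable[int], cds_start: int, cds_end: int,
                  flank: int = 300, flank_bins: int = 10, cds_bins: int = 20) -> List[int]:
    """Count crosslink sites of one transcript in [5' flank | scaled CDS | 3' flank] bins.

    Positions are 0-based transcript coordinates and cds_end is exclusive.
    The flanks have a fixed width in nucleotides; the CDS is scaled to cds_bins.
    """
    counts = [0] * (2 * flank_bins + cds_bins)
    flank_width = flank / flank_bins
    cds_length = cds_end - cds_start
    for pos in sites:
        if cds_start - flank <= pos < cds_start:        # upstream of the TIS
            b = int((pos - (cds_start - flank)) / flank_width)
        elif cds_start <= pos < cds_end:                # CDS, length-scaled
            b = flank_bins + int((pos - cds_start) / cds_length * cds_bins)
        elif cds_end <= pos < cds_end + flank:          # downstream of the TTS
            b = flank_bins + cds_bins + int((pos - cds_end) / flank_width)
        else:
            continue                                    # outside the plotted window
        counts[b] += 1
    return counts


def metagene(transcripts: Iterable[Tuple[List[int], int, int]]) -> List[int]:
    """Sum per-transcript profiles; each item is (sites, cds_start, cds_end)."""
    total: List[int] = []
    for sites, cds_start, cds_end in transcripts:
        profile = metagene_bins(sites, cds_start, cds_end)
        total = profile if not total else [a + b for a, b in zip(total, profile)]
    return total


profile = metagene([([99, 100, 600], 100, 600), ([50, 60, 1200], 50, 1250)])
print(profile)
```

Scaling only the CDS lets transcripts of different lengths be aligned on both the TIS and the TTS at once. This is what makes separate peaks at the start and stop codons visible.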
To define a list of high-confidence DDX3X targets, I first filtered out all binding peaks with fewer than 10 crosslinking sites and then discarded genes with fewer than 3 iCLIP peaks. DDX3X targets overlapped substantially between the three cell types: 45.73% of all high-confidence targets were detected in at least 2 cell types (Figure 4.6 B). Most cell type-specific targets could be explained by differences in transcript expression level and in the total number of uniquely mapped reads between the groups. I had previously hypothesised that the R475 mutation, situated in the RNA-binding domain, may abolish interaction with RNA. However, the strong overlap of iCLIP targets between Mutu (DDX3X R475S) and the WT cells (U2932, primary GC B-cells) argues against this. Subtle differences in binding nevertheless cannot be excluded.
Figure 4.7: DDX3X protein binds predominantly to mRNA of ribosomal proteins. Barplot showing Gene Ontology (GO) enrichment of DDX3X-bound transcripts identified by iCLIP in the indicated cell types. BP = Biological Process, CC = Cellular Component, MF = Molecular Function.
Gene Ontology analysis of mRNA targets shared between at least two cell types (441 genes) revealed a strong enrichment for mRNAs encoding components of the core translation machinery. These comprised 45 ribosomal proteins, 9 translation initiation factors, 67 genes associated with cellular response to stress and 23 genes annotated with 3′UTR binding (Figure 4.7). The core components of the translation machinery enriched here are highly expressed, which could be a confounding factor suggesting that the iCLIP signal is non-specific. However, the iCLIP peaks span a broad range of expression levels. In addition, these highly expressed mRNAs are not recovered in other iCLIP experiments using an identical protocol (personal communication with Dr. Martin Turner). Both observations support the specificity of DDX3X binding.
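The two-step target definition and the cross-cell-type overlap can be sketched as follows. This is a minimal illustration, assuming that peaks have already been called and are represented as (gene, number of crosslink sites) pairs. The thresholds mirror the text: keep peaks with at least 10 sites, then keep genes with at least 3 peaks, then keep genes shared by at least 2 cell types. The function names and toy gene lists are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def high_confidence_targets(peaks: Iterable[Tuple[str, int]],
                            min_sites_per_peak: int = 10,
                            min_peaks_per_gene: int = 3) -> Set[str]:
    """peaks: (gene, number of crosslink sites in the peak) for one cell type."""
    # Step 1: drop peaks supported by fewer than min_sites_per_peak crosslink sites
    peaks_per_gene = Counter(gene for gene, n_sites in peaks if n_sites >= min_sites_per_peak)
    # Step 2: keep genes that still have at least min_peaks_per_gene peaks
    return {gene for gene, n in peaks_per_gene.items() if n >= min_peaks_per_gene}


def shared_targets(targets_by_cell_type: Dict[str, Set[str]], min_cell_types: int = 2) -> Set[str]:
    """Genes called as targets in at least min_cell_types cell types."""
    counts = Counter(g for targets in targets_by_cell_type.values() for g in targets)
    return {g for g, n in counts.items() if n >= min_cell_types}


# Toy data: three peaks pass for RPL3, only two pass for RPS6
mutu_peaks = [("RPL3", 12), ("RPL3", 15), ("RPL3", 10), ("RPS6", 25), ("RPS6", 8), ("RPS6", 11)]
targets = {
    "Mutu": high_confidence_targets(mutu_peaks),
    "U2932": {"RPL3", "EIF4A1"},
    "GC B": {"EIF4A1", "ODC1"},
}
print(targets["Mutu"], shared_targets(targets))
```

Filtering peaks before counting them per gene means that a gene cannot qualify on the strength of many weak, noisy peaks.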
These findings suggest that DDX3X binds preferentially to mature transcripts encoding proteins linked to various aspects of protein translation.
4.2.2.2 DDX3X regulates translation of a subset of expressed transcripts
Two observations linked DDX3X to the protein synthesis machinery: 1) co-immunoprecipitation of DDX3X with several translation initiation factors and stress granule components, and 2) preferential binding of DDX3X to a subset of mature transcripts. Together, they prompted me to hypothesise that DDX3X may alter the translation of its mRNA targets. To determine which transcripts are sensitive to DDX3X depletion, we performed transcriptome-wide translational profiling (Ribo-Seq) in lymphoma cell lines. The basic assumptions, strengths and limitations of the Ribo-Seq technique were discussed in detail in Chapter 3. Sequencing libraries were prepared by Dr. Jie Gao, as detailed in section 2.2.2.2. The experiment was performed in two lymphoma cell lines, Mutu and U2932, using two different DDX3X shRNAs and a scrambled shRNA as control. For each sample, paired RNA-Seq and Ribo-Seq libraries were sequenced at two time points (24h and 48h) (Figure 4.8).
Figure 4.8: Identification of DDX3X-sensitive transcripts: experimental setting. Diagram showing the experimental design of translational profiling in DDX3X shRNA cells.
The average number of uniquely mapped (non-rRNA) reads in the Ribo-Seq samples was 9,623,015. In all samples, the profile of aligned sequencing reads met the expectations for a Ribo-Seq experiment: 1. fragment length between 26 and 32 nucleotides, with a peak at 28 nucleotides; 2. enrichment of fragments in the CDS; 3. evidence of three-nucleotide periodicity in the frame preference (Figure 4.9 A-C); 4.
a characteristic pattern in the metagene analysis of the estimated P-site location, with enrichment of footprints at the start codon and an abrupt drop-off at the stop codon (Figure 4.9 D). The metagene analysis was performed as described in section 3.1.1. Gene expression measurements from Ribo-Seq and RNA-Seq were highly reproducible between experimental replicates: the Pearson correlation coefficient was higher than 0.97 for all samples of the same sequencing type. mRNA levels were also strongly correlated with ribosome abundance (Pearson correlation coefficient 0.92-0.95). Together, these observations support the good quality of the experiment. I then sought to classify transcripts into distinct regulatory profiles. Juxtaposing the difference in ribosomal footprints with the difference in mRNA abundance distinguishes whether a change in expression occurs at the level of translation, transcription or both. Because mRNA abundance correlates positively with the number of ribosomal footprints mapping to a transcript, the ribosomal footprint density for each transcript should be normalised by transcript abundance. The resulting metric reflects the relative density of ribosomes per transcript and is known as Translation Efficiency (TE). The interpretation and limitations of TE were reviewed in section 3.1.3. I dissected mRNA-driven and translation-driven changes using the method described in section 3.1.3. This is the same strategy that was applied in Chapter 3 to elucidate the translational consequences of BCL6 and MYC overexpression. The analysis revealed that the changes in expression profile after DDX3X depletion were mainly limited to the level of translation. In the DDX3X WT cell line U2932, 70 of 200 differentially expressed genes showed decreased translation rates and 90 showed increased translation rates (Figure 4.10 A).
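The logic of separating translational from transcriptional changes can be illustrated with fold changes alone. TE is footprints divided by mRNA, so on the log2 scale the change in TE is the change in footprints minus the change in mRNA. The sketch below is a simplified illustration, not the method of section 3.1.3, which applies statistical testing across replicates. Here, fixed fold-change cutoffs stand in for that test. The pseudocount and cutoff are hypothetical, and the counts are assumed to be already library-size normalised.

```python
import math
from typing import Tuple


def log2_fold_change(treated: float, control: float, pseudocount: float = 0.5) -> float:
    """log2 ratio of normalised counts, with a pseudocount to avoid log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def te_change(ribo_kd: float, ribo_ctrl: float,
              rna_kd: float, rna_ctrl: float) -> Tuple[float, float, float]:
    """Return (delta_ribo, delta_rna, delta_te) for one gene on the log2 scale.

    TE = footprints / mRNA, so delta_te = delta_ribo - delta_rna.
    """
    d_ribo = log2_fold_change(ribo_kd, ribo_ctrl)
    d_rna = log2_fold_change(rna_kd, rna_ctrl)
    return d_ribo, d_rna, d_ribo - d_rna


def classify(d_rna: float, d_te: float, cutoff: float = 1.0) -> str:
    """Assign a regulatory class from log2 fold changes (hypothetical cutoff)."""
    if d_te <= -cutoff:
        return "TE down"
    if d_te >= cutoff:
        return "TE up"
    if d_rna <= -cutoff:
        return "mRNA down"
    if d_rna >= cutoff:
        return "mRNA up"
    return "stable"


# Footprints fall while mRNA is unchanged: a translational change
_, d_rna, d_te = te_change(ribo_kd=400, ribo_ctrl=1000, rna_kd=500, rna_ctrl=500)
print(classify(d_rna, d_te))
# Footprints and mRNA rise together: TE is stable, the change is transcriptional
_, d_rna, d_te = te_change(ribo_kd=800, ribo_ctrl=200, rna_kd=400, rna_ctrl=100)
print(classify(d_rna, d_te))
```

TE is checked before mRNA, so a gene whose footprints move independently of its mRNA is labelled as translationally regulated even if its mRNA also changes.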
Because the number of genes differentially expressed at the level of mRNA was relatively small (40 genes), I simplified the original classification described in section 2.2.3. The remaining 40 genes differentially expressed in RNA-Seq data were classified into two groups, mRNA down or mRNA up, depending on the direction of change.
Figure 4.9: Quality control of the Ribo-Seq dataset examining the translational consequences of DDX3X depletion in U2932 and Mutu. A) Histogram showing the distribution of read length in Ribo-Seq and RNA-Seq experiments. Ribo-Seq shows a characteristic peak at 28-29 nt, which corresponds to the length of the ribosome-protected mRNA fragment. B) Heatmap of Ribo-Seq read frame usage by read length, showing read frame restriction in the 28-31 nt ribosome-protected fragments, characteristic of ribosome position on the mRNA transcript. C) Heatmap of Ribo-Seq read frame usage by gene feature, showing the characteristic read frame bias within the CDS but not the UTRs. D) Metagene plot showing the distribution of mapped Ribo-Seq reads across regions of the transcript, with the characteristic peak at the translation initiation site (TIS) and the abrupt drop-off at the translation termination site (TTS).
Translationally downregulated genes were enriched for components of the core translational machinery, in particular protein constituents of the ribosome (Figure 4.10 B-C). This was specific to the cytosolic ribosome: no significant difference in TE was observed for transcripts encoding mitochondrial ribosomal proteins (Figure 4.10 B; p-value = 2.2 × 10⁻¹⁶, Kolmogorov-Smirnov test). The translationally downregulated genes also included ODC1, a gene known to be translationally regulated by DDX3X (Calviello et al., 2021). No significantly enriched terms were found in the translationally upregulated, mRNA up or mRNA down groups.
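The frame-usage heatmaps in Figure 4.9 B-C summarise, for each read length, how often the footprint P-site falls in each codon position. A minimal sketch of that tally is shown below. It assumes that reads are given as (5′ end, length) pairs in transcript coordinates and that a P-site offset is already known for each read length. The offset values here are hypothetical placeholders; in practice they are calibrated from the data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def frame_usage(reads: Iterable[Tuple[int, int]], cds_start: int, cds_end: int,
                p_site_offsets: Dict[int, int]) -> Dict[int, Counter]:
    """Tally the reading frame of each footprint's P-site, per read length.

    reads: (5' end position, read length) in 0-based transcript coordinates.
    p_site_offsets: distance from the read 5' end to the P-site, per length.
    """
    usage: Dict[int, Counter] = defaultdict(Counter)
    for five_prime, length in reads:
        offset = p_site_offsets.get(length)
        if offset is None:
            continue                       # read length without a calibrated offset
        p_site = five_prime + offset
        if cds_start <= p_site < cds_end:  # score only P-sites inside the CDS
            usage[length][(p_site - cds_start) % 3] += 1
    return usage


def dominant_frame_fraction(frames: Counter) -> float:
    """Fraction of footprints in the most used frame (about 1/3 means no periodicity)."""
    total = sum(frames.values())
    return max(frames.values()) / total if total else 0.0


offsets = {28: 12, 29: 12}                 # hypothetical P-site offsets
reads = [(38, 28), (41, 28), (44, 28), (39, 28), (47, 29), (30, 27)]
usage = frame_usage(reads, cds_start=50, cds_end=500, p_site_offsets=offsets)
print(dict(usage[28]), dominant_frame_fraction(usage[28]))
```

A strong bias towards one frame, seen in some read lengths but not others, is the pattern used to choose which footprint lengths to keep.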
Genes in the TE down group were more likely to be identified as iCLIP targets (Figure 4.11 A-C; p-value = 3 × 10⁻¹⁴, Chi-squared test). The identified translationally regulated genes spanned a broad range of mRNA expression levels.
Figure 4.10: DDX3X-sensitive transcripts are enriched for components of the cytosolic ribosome. A) Scatter plot comparing changes in mRNA abundance (RNA-Seq) with changes in ribosome footprint density (Ribo-Seq) following shRNA depletion of DDX3X in U2932. Data are from eight replicate knockdowns using two different shRNAs. Transcripts with altered translational efficiency (TE) or mRNA abundance are indicated by colour. B) Cumulative distribution of TE change in U2932 following shRNA depletion of DDX3X, plotted for genes encoding cytosolic or mitochondrial ribosomal proteins and for all other genes. P-value calculated using the Kolmogorov-Smirnov test. C) Barplot showing Gene Ontology (GO) enrichment of genes with reduced TE following DDX3X depletion.
Figure 4.11: Differentially translated mRNAs following DDX3X depletion in U2932 are more likely to be bound by DDX3X. A) Scatter plot showing changes in TE plotted against mRNA abundance. Genes identified as differentially translated are coloured. Ribosomal proteins and ODC1 (a known DDX3X-regulated gene) are indicated. B) Scatter plot showing changes in TE plotted against cross-linking density from iCLIP experiments. C) Bar chart showing the proportion of transcripts with differential translation identified as direct targets of DDX3X in iCLIP experiments. The number of genes within each category is indicated. Adjusted p-values (Fisher's exact test) are shown and reflect the comparison of each category with stable genes. D) Violin plot showing the GC content distribution across different categories of differentially expressed genes. Adjusted p-values (Wilcoxon test) are shown and reflect the comparison of each category with stable genes.
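The cytosolic versus mitochondrial ribosomal protein comparison (Figure 4.10 B) is a two-sample Kolmogorov-Smirnov test on TE changes. In practice one would call R's `ks.test` or SciPy's `scipy.stats.ks_2samp`. The dependency-free sketch below only illustrates what the test measures: the largest vertical gap between two empirical cumulative distributions. Its p-value uses the asymptotic Kolmogorov series with the small-sample correction popularised by Numerical Recipes, which differs from exact small-sample p-values. The TE values are toy numbers.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_two_sample(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # D is the largest vertical gap between the two empirical CDFs
    d = 0.0
    for v in xs + ys:
        gap = abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        d = max(d, gap)
    # Asymptotic Kolmogorov distribution with a small-sample correction:
    # Q(lam) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lam^2)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return d, 1.0  # Q is indistinguishable from 1 here and the series converges slowly
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Toy log2 TE changes: a shifted group versus background genes
cytosolic_rp = [-1.2, -0.9, -1.1, -0.8, -1.0]
other_genes = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1]
d, p = ks_two_sample(cytosolic_rp, other_genes)
print(d, p)
```

Because the test compares whole distributions, it detects a consistent shift of an entire gene class, such as the cytosolic ribosomal proteins, even when many individual genes fall below a significance cutoff.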
E) Scatter plot showing the log2 fold changes in TE against the 5′TOP score, which reflects the strength of the 5′TOP motif. Canonical 5′TOP genes are indicated.
The helicase activity of DDX3X may be linked to preferential translation of mRNAs with complex 5′UTR structure. To explore this hypothesis, I compared the GC content and RNA folding energy of 5′UTR sequences in translationally regulated mRNAs. Translationally downregulated genes had slightly higher GC content (adjusted p-value = 0.0092, Wilcoxon rank sum test), but RNA folding energy did not differ significantly between the regulatory groups (Figure 4.11 D). Translational control of ribosomal proteins is linked to certain RNA motifs, such as the 5′ terminal oligopyrimidine (5′TOP) motif, so I also explored this possibility. I downloaded the list of canonical 5′TOP mRNAs and the 5′TOP scores from a recent study surveying 5′TOP sequences in the human transcriptome (Philippe et al., 2020). Strikingly, all canonical 5′TOP mRNAs, which include ribosomal proteins and a few translation factors, showed decreased translation efficiency following DDX3X depletion (p-value < 2.2 × 10⁻¹⁶, Wilcoxon rank sum test). This suggests that DDX3X may regulate translation of its targets through the 5′TOP motif (Figure 4.11 E). Next, we validated the Ribo-Seq predictions in three ways:
1. Mass spectrometry of DDX3X shRNA U2932 cells: the Ribo-Seq results were confirmed at the protein level with tandem mass tag (TMT) mass spectrometry. In line with the Ribo-Seq results, the abundance of almost all cytosolic ribosomal proteins was reduced after shRNA depletion of DDX3X in U2932 cells (Figure 4.12 A). GO analysis of proteins with reduced abundance in the DDX3X shRNA group showed enrichment for terms associated with protein synthesis and the protein constituents of the ribosome, reiterating the Ribo-Seq findings (Figure 4.12 B-C).
Interestingly, depletion of the R475S mutant DDX3X from Mutu cells had minimal effect on the translation or mRNA abundance of specific transcripts, which supports the loss-of-function nature of this mutation (Figure 4.12 B).
2. ProteomeHD database of protein co-regulatory groups: to examine DDX3X interactions in a broader context, I queried the ProteomeHD database (Kustatscher et al., 2019). This repository integrates data from 5,288 mass spectrometry runs spanning diverse human tissue types and biological conditions to create a map of the functional associations between co-expressed proteins. The co-regulation landscape of DDX3X comprised 81 proteins. Of these, 27 were part of the GO Cytosolic Ribosome group (adjusted p-value = 7.2 × 10⁻⁴²) and 8 were components of the core translation machinery (Figure 4.13 A). Overall, the GO terms most strongly enriched among proteins co-regulated with DDX3X were RNP complex, mRNA metabolism, translation initiation and the cytosolic ribosome.
Figure 4.12: Validation of Ribo-Seq results with proteomic profiling of DDX3X shRNA cells. A) Heatmap showing altered abundance of ribosomal proteins in mass spectrometry analysis following shRNA depletion of DDX3X in U2932. B) Heatmap showing fold change in RNA-Seq, Ribo-Seq and TE across eight replicate knockdowns for all differentially translated genes. Protein abundance changes are shown (Mass Spec). DDX3X targets identified by iCLIP and genes encoding ribosomal proteins (RPs) are indicated by purple and red bands, respectively. C) GO terms enriched amongst proteins with reduced abundance in proteomic profiling (MS) following DDX3X depletion. Results from Ribo-Seq experiments are included for comparison.
Figure 4.13: Validation of Ribo-Seq results with ProteomeHD analysis of proteins co-regulated with DDX3X and OPP assay of DDX3X-depleted cells. A) Map of the co-regulated human proteome plotted using data downloaded from ProteomeHD (Kustatscher et al., 2019).
The 81 proteins identified as statistically co-regulated with DDX3X are coloured. Ribosomal proteins identified as co-regulated with DDX3X are shown in red. B) Barplot showing global protein synthesis quantified by OPP incorporation at the indicated time points following shRNA depletion of DDX3X in U2932, normalised to control shRNA. Data show mean + SEM; * p < 0.05, *** p < 0.001, ANOVA with multiple comparison testing, n = 4 replicate experiments. Performed by Dr. Chun Gong.
3. O-propargyl-puromycin (OPP) assay for measuring global translation rate: translational downregulation of ribosomal proteins after shRNA depletion of DDX3X in U2932 was associated with a reduction in the global protein synthesis rate (Figure 4.13 B). A similar effect was observed in primary GC B-cells transduced with helicase-dead DDX3X mutants (Gong, Krupka et al., 2021).
Taken together, these results reveal that DDX3X promotes translation of transcripts encoding core components of the translation machinery, in particular ribosomal proteins. The net effect is the regulation of global protein synthesis capacity.
4.2.3 Deregulation of MYC in primary GC B-cells increases ribosome biogenesis and triggers ER stress
Loss-of-function mutations in DDX3X affect ribosomal protein synthesis and hence decrease global translation load. This contrasts sharply with the known role of MYC in promoting ribosome biogenesis. A relevant experimental system is key to understanding the molecular mechanisms of lymphoma. GC B-cells are the cell of origin of BL, so it was essential to investigate the transcriptional consequences of MYC overexpression in this system. Owing to previous difficulties with genetic manipulation of ex vivo human B-cells, this experiment had not been performed before.
For this, I analysed the RNA-Seq data I generated in Chapter 3 (section 3.2). Transduction of MYC alone into primary GC B-cells triggers apoptosis, which does not happen when MYC is transduced into established lymphoma cell lines. Therefore, co-expressing MYC with BCL2 (MYC-t2A-BCL2) should allow me to infer the transcriptional response to MYC in the cell of origin of BL. Differential expression analysis comparing MYC-BCL2 with BCL2 alone revealed massive upregulation of a ribosome biogenesis signature. This was specific to MYC, as no such pattern was seen in BCL6-BCL2 cells. In addition, Gene Set Enrichment Analysis (GSEA) revealed strong upregulation of an Unfolded Protein Response (UPR) signature. Moreover, ERN1, a key sensor of the UPR, was one of the most significantly upregulated genes in MYC-transduced GC B-cells. The UPR is an adaptive response to disturbed Endoplasmic Reticulum (ER) homeostasis and the resulting accumulation of misfolded proteins in the ER lumen. It induces the expression of ER chaperones and aims to clear misfolded proteins through ER-associated degradation (ERAD) or, failing that, commits the cell to apoptosis (Walter and Ron, 2011; Ruggiano et al., 2014; Hetz et al., 2015; Zhang et al., 2020). Molecular markers of the UPR include splicing of the transcription factor XBP1 initiated by ERN1 and phosphorylation of eIF2α by EIF2AK3 (PERK) (Ron and Walter, 2007).

Figure 4.14: Deregulation of MYC in primary GC B-cells increases ribosome biogenesis and triggers the Unfolded Protein Response. A) Heatmap showing mRNA expression levels of genes belonging to the gene set Ribosome Biogenesis (GO: 0042254) in human GC B-cells transduced with BCL2, BCL6-2A-BCL2, or MYC-2A-BCL2. B) Gene set enrichment analysis (GSEA) of RNA-seq from human GC B-cells transduced with MYC-2A-BCL2 compared to BCL2 alone, showing enrichment of gene sets related to MYC, UPR, and mammalian target of rapamycin complex 1 (mTORC1) signalling.
Genes ordered by DESeq2 test statistic (decreasing order). C) Volcano plot of log2 fold change against -log10(FDR) from differential expression analysis of MYC-2A-BCL2 compared to BCL2 alone. The positions of MYC and ERN1 are indicated.

Elevated levels of phosphorylated eIF2α and of the spliced isoform of XBP1 in primary GC B-cells transduced with MYC-t2A-BCL2 were confirmed experimentally by Dr. Chun Gong. The UPR turned out to be, at least partly, related to MYC-induced apoptosis: primary GC B-cells transduced with MYC alone and treated with rapamycin, an allosteric inhibitor of mTORC1, had significantly lower protein synthesis together with a modest reduction in apoptosis rate. This experiment was performed by Dr. Chun Gong (Gong, Krupka et al., 2021). These results show that deregulation of MYC in primary GC B-cells is associated with a higher protein synthesis rate and induction of the ER stress response. By limiting global protein synthesis, loss of DDX3X might protect cells from proteotoxic stress, thus alleviating MYC-induced apoptosis. Support for this hypothesis comes from a series of experiments described in the next section.

4.2.4 DDX3X mutation interferes with the endoplasmic reticulum stress response

4.2.4.1 DDX3X R475C mutation is associated with suppression of the unfolded protein response in U2932 cells

A combination of computational and experimental evidence led us to hypothesise that loss-of-function mutations in DDX3X may play a role similar to rapamycin treatment in protecting cells from MYC-induced apoptosis. If deregulation of MYC in primary GC B-cells is associated with proteotoxic stress and apoptosis, DDX3X mutations may lower the translation load and alleviate ER stress, thus decreasing the apoptosis rate. To interrogate the transcriptional consequences of DDX3X mutation in lymphoid cells, I analysed RNA-Seq data from an experiment performed by Dr. Gong comparing CRISPR-edited clones of U2932 cells expressing the R475C helicase mutant of DDX3X.
Control clones with a synonymous mutation in endogenous DDX3X were created in parallel. Differential expression analysis comparing R475C-edited and control clones revealed strong downregulation of the key regulators of the ER stress response, ERN1 and XBP1. Moreover, the mRNA expression pattern of R475C-edited clones resembled the profile of samples with the strongest depletion of DDX3X following shRNA treatment. GSEA of DDX3X-mutant clones and of shDDX3X-depleted cells also revealed a striking overlap: in both comparisons, ‘MYC targets V1’, ‘MTORC1 signalling’ and ‘Unfolded Protein Response’ were among the most downregulated gene sets.

Figure 4.15: DDX3X R475C mutation is associated with downregulation of genes associated with the unfolded protein response in U2932 cells. A) GSEA of RNA-seq data comparing DDX3X R475C-edited and control clones (left), and DDX3X shRNA knockdown experiments (right). Genes ordered by DESeq2 test statistic (decreasing order). B) Heatmap showing genes that are differentially expressed between control and homozygous R475C-edited clones (left). The same genes are shown for shRNA knockdown (right). For shRNA experiments, samples are ordered left to right by DDX3X mRNA expression. The top bar indicates the expression of DDX3X mRNA, showing how stronger knockdown recapitulates the signature seen in R475C-edited clones. C) Boxplot showing relative expression of the Unfolded Protein Response (UPR) marker transcripts ERN1 (encoding IRE1) and XBP1 in RNA-seq from DDX3X R475C-edited clones. Statistical significance from differential expression analysis (DESeq2).

Figure 4.16: DDX3X R475C mutation is associated with downregulation of proteins associated with ER stress in U2932 cells. A) Heatmap showing proteins with altered abundance in proteomic profiling of DDX3X R475C-edited clones.
Proteins included in the Gene Ontology (GO) terms Endoplasmic Reticulum (ER), GO: 0005783, and ER-associated protein degradation pathway (ERAD), GO: 0036503, are indicated by red and orange highlighting, respectively. B) Barplot showing the statistical significance of the top GO terms enriched among proteins with decreased expression in DDX3X R475C-edited clones. BP, biological process; CC, cellular component; MF, molecular function.

Given the evidence that DDX3X drives gene expression changes predominantly at the level of translation, the R475C mutant and control clones were subjected to proteomic profiling. In line with previous experiments, GO terms linked to protein processing in the ER and ER stress were enriched among proteins downregulated in R475C mutant clones; nearly one-third of the downregulated proteins were annotated with GO terms associated with the ER or ER stress. I conclude that the R475C mutation can be considered loss-of-function and is associated with suppressed expression of genes related to the unfolded protein response.

4.2.4.2 DDX3X mutation is associated with suppression of the unfolded protein response in BL patients

In an orthogonal approach, I examined gene expression profiles in human biopsy samples from two published BL datasets (Grande et al., 2019; Schmitz et al., 2012). I downloaded raw FASTQ files from the Sequence Read Archive and performed quality control, alignment to the reference genome, differential expression analysis and GSEA comparing DDX3X mutant with wild-type samples. The expression pattern of DDX3X mutant cases mirrored the changes observed in our CRISPR-edited clones: reduced expression of the ‘MYC targets’, ‘MTORC1 signalling’ and ‘Unfolded Protein Response’ sets in GSEA, accompanied by strong downregulation of ERN1 and XBP1 mRNA. The link between DDX3X mutation and the ER stress response was confirmed experimentally by Dr. Chun Gong.
Co-transduction of primary GC B-cells with MYC and mutant DDX3X recapitulated the pattern observed after rapamycin treatment: loss-of-function DDX3X mutations (R475C and K230E) abrogated both MYC-induced apoptosis and the MYC-induced increase in global translation rate. DDX3X mutation also alleviated the ER stress response induced by thapsigargin, which inhibits the sarco/endoplasmic reticulum Ca²⁺ ATPase (SERCA) (Gong, Krupka et al., 2021). Taken together, I conclude that loss-of-function DDX3X mutations can counteract the ability of MYC to drive global protein synthesis and trigger proteotoxic stress. This also reveals a potential vulnerability of MYC-driven lymphomas to drugs inducing ER stress.

Figure 4.17: Downregulation of genes associated with the unfolded protein response in BL patients with DDX3X mutation. A) GSEA of RNA-seq data from the indicated studies, reanalysed to compare cases of sporadic BL with either WT or mutant DDX3X. Genes ordered by DESeq2 test statistic (decreasing order). B) Relevant gene sets downregulated in the presence of mutant DDX3X, and the relative abundance of the Unfolded Protein Response (UPR) transcripts ERN1 and XBP1. Statistical significance is from DESeq2. C) Heatmap showing mRNA expression of GSEA core enrichment genes in the gene set Hallmark Unfolded Protein Response in BL biopsies from two published Burkitt lymphoma RNA-Seq datasets and in U2932 DDX3X R475C-edited clones.

4.2.5 Up-regulation of DDX3Y in established tumours rescues loss of DDX3 helicase activity

Context-dependent effects of DDX3X activity were a recurrent finding in this and previous studies. While loss-of-function DDX3X mutation was beneficial for primary GC B-cells co-transduced with MYC, it triggered apoptosis in established lymphoid cell lines (Gong, Krupka et al., 2021).
Because an upregulated translation rate is a feature of established tumours, and the ability to increase global protein synthesis is essential for MYC-driven lymphomas (Barna et al., 2008), we hypothesised that there must be a way to compensate for the reduced translation capacity during the later stages of lymphomagenesis. Both BL and the MHG subtype of DLBCL have sex ratios strongly skewed towards males, and DDX3X mutations are more frequent in male-skewed cancers (Alkallas et al., 2020; Dunford et al., 2017). DDX3X shares almost 92% amino-acid and 70% nucleotide sequence identity with its Y-chromosome homologue, DDX3Y. At the functional level, DDX3Y is redundant with DDX3X in regulating protein synthesis (Venkataramanan et al., 2020): firstly, DDX3Y can rescue the decrease in total translation after DDX3X depletion and, secondly, the translation profiles of individual transcripts in male-derived HCT 116 colorectal cancer cells are indistinguishable between the two paralogues (Venkataramanan et al., 2020). DDX3Y is widely transcribed in many adult tissue types, but it is not expressed at the protein level anywhere except spermatogonia (Ditton et al., 2004; Foresta et al., 2000a; Rauschendorf et al., 2011). To establish the status of DDX3Y expression in lymphoid cells, I examined mRNA abundance in a previously published RNA-Seq dataset covering different states of GC B-cells, which I processed from FASTQ files (Caeser et al., 2019). I did not observe significant changes in DDX3X mRNA between freshly isolated GC B-cells, established lymphoma cell lines, and transduced and transformed GC B-cells. Likewise, no significant difference in DDX3X mRNA level was observed between DDX3X mutant and wild-type samples from male BL patients (Grande et al., 2019). However, the coverage of ribosomal footprints over DDX3Y in Mutu cells was similar to that of known protein-coding genes with similar expression levels. In line with this, immunoblotting for DDX3Y in male lymphoma cell lines, performed by Dr. Chun Gong, revealed strong protein expression. The same was observed in patient-derived BL xenografts and in 5 primary biopsies from male patients (Gong, Krupka et al., 2021). This is consistent with previous work proposing that DDX3Y protein expression in testis is regulated predominantly at the level of translation, through alternative usage of a translation initiation site in the 5′UTR of DDX3Y (Jaroszynski et al., 2011).

I asked whether DDX3Y translation might be directly influenced by DDX3X. However, I found no evidence from the iCLIP or Ribo-Seq experiments that the DDX3Y transcript is a direct target of DDX3X. Nevertheless, the link between loss-of-function mutation in DDX3X and protein expression of DDX3Y remains strong. An in vivo tumorigenesis experiment performed by Dr. Chun Gong to validate this observation showed that primary GC B-cells with deleted DDX3X, transduced with MYC-2A-BCL2 and implanted subcutaneously in Matrigel into immunodeficient mice, formed tumours that expressed DDX3Y protein seven weeks after injection. This suggests that DDX3Y expression at the protein level is unique to transformed B-cells. The expression of DDX3Y is thus linked, albeit indirectly, to loss of DDX3X, but the exact mechanism remains unclear.

Figure 4.18: Expression of DDX3Y and DDX3X in primary GC B-cells and lymphoma cell lines. A) Boxplot showing the mRNA expression level of DDX3X and DDX3Y in primary GC B-cells, GC B-cells transduced with BCL6-BCL2 or MYC-BCL2 constructs and lymphoid cell lines (left panel), and in BL patients for males and females separately (right panel). VST, Variance Stabilising Transformation. B) Scatter plot of mean mRNA and ribosomal footprint abundance (VST-transformed) showing the expression level of DDX3Y in comparison to other genes. DDX3X, DDX41 (ubiquitously expressed RNA helicase), FOXO1 (BL oncogene) and EIF3J (translation initiation factor) are indicated for comparison.
C) Coverage plot of Ribo-Seq and RNA-Seq reads over the DDX3Y region in representative samples from Mutu cells.

4.3 Discussion

Main findings

Although recurrent DDX3X mutations have been identified in a variety of malignancies, the molecular role of DDX3X in malignancy remains puzzling. The versatile function of DDX3X in regulating multiple stages of RNA biogenesis is mirrored by its complex and context-specific role in tumorigenesis: both oncogenic and tumour suppressor activities have been reported. This study examines the function of DDX3X in MYC-driven B-cell lymphoma and reveals functional cooperation between MYC and mutant DDX3X. Deregulation of MYC expression, through translocation to the highly expressed immunoglobulin loci or mutations increasing protein stability, is central to the development of Burkitt lymphoma and of the Double Hit and Molecular High-Grade subtypes of DLBCL. All three diseases share a germinal centre origin and are associated with poor clinical prognosis. The model proposed here highlights the vulnerability of MYC-driven lymphoma to proteotoxic stress, opening an attractive opportunity for therapeutic intervention.

By integrating data from different high-throughput sequencing techniques, I show that the effect of DDX3X mutations is mainly mediated by changes in the translation of selected transcripts, predominantly those encoding ribosomal proteins, which in turn control global translation intensity. Translational demands change during lymphomagenesis, so a state of decreased translation capacity might not meet the needs of a fully established tumour. Sequential deregulation of DDX3X and DDX3Y makes it possible to buffer global protein synthesis to support these changing needs. Therefore, drugs that disrupt this delicate balance between translation load and proteotoxic stress may prove effective against MYC-driven lymphoma.
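Changes in the translation of individual transcripts are usually quantified as translational efficiency (TE), the ratio of ribosome-footprint abundance to mRNA abundance for a gene, so that a change in TE equals the Ribo-Seq log2 fold change minus the RNA-Seq log2 fold change. The sketch below illustrates only that arithmetic, under stated assumptions: it takes already normalised counts for a single control and a single knockdown sample, adds a pseudocount, and ignores replicate variance, whereas differential translation is normally tested with count-based statistical models. The function name and the gene counts are hypothetical.

```python
import math


def log2_te_change(ribo_ctrl, ribo_kd, rna_ctrl, rna_kd, pseudocount=1.0):
    """Log2 fold change in translational efficiency (TE = Ribo / RNA) for one gene.

    Inputs are normalised counts in a control and a knockdown sample.
    log2 dTE = log2FC(Ribo-Seq) - log2FC(RNA-Seq); the pseudocount avoids log(0).
    """
    lfc_ribo = math.log2((ribo_kd + pseudocount) / (ribo_ctrl + pseudocount))
    lfc_rna = math.log2((rna_kd + pseudocount) / (rna_ctrl + pseudocount))
    return lfc_ribo - lfc_rna


# Hypothetical normalised counts: (ribo_ctrl, ribo_kd, rna_ctrl, rna_kd)
counts = {
    "GENE_A": (799, 399, 499, 499),  # footprints halve, mRNA unchanged: TE falls
    "GENE_B": (199, 399, 199, 399),  # footprints track mRNA: TE unchanged
}

for gene, c in counts.items():
    print(gene, round(log2_te_change(*c), 2))
```

With a pseudocount of 1, GENE_A gives a log2 TE change of -1.0 (footprints halved at constant mRNA), while GENE_B gives 0.0 even though its footprints doubled, because the increase is fully explained by mRNA abundance.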
Context-specific role of DDX3X in malignant cells

DDX3X mutations have been widely studied in medulloblastoma, the most common malignant brain tumour in children, where they are associated with the Wingless (WNT) and Sonic hedgehog (SHH) subtypes (Northcott et al., 2017; Patmore et al., 2020). The pattern of DDX3X mutations in medulloblastoma differs from what I observed in BL, which suggests a different molecular role of mutant DDX3X in the two tumours. While nonsense and frameshift mutations are frequent in BL, they are never observed in medulloblastoma. Moreover, amplification of MYC family genes, a hallmark of the Group 3 and Group 4 medulloblastoma subtypes, is never accompanied by mutations in DDX3X (Northcott et al., 2017), indicating that the cooperative effect between DDX3X and MYC is also context-specific. DDX3X has been shown to regulate brain development and to mediate tumour-suppressive stress and inflammasome responses (Patmore et al., 2020). Medulloblastoma-associated DDX3X mutations promote oncogenic WNT signalling directly, in a helicase-independent fashion, through an association with CSNK1E (Cruciat et al., 2013; Pugh et al., 2012), and indirectly, by preventing pyroptosis, an inflammatory form of cell death driven by inflammasome activation, in concert with WNT signalling (Huang et al., 2020; Samir et al., 2019; Patmore et al., 2020). I did not find compelling evidence of an association between DDX3X and CSNK1E in B-cells. Other previously reported individual targets of DDX3X include ODC1 (Van Steeg et al., 1991; Calviello et al., 2021), KLF4 (Cannizzaro et al., 2018) and MITF (Phung et al., 2019). Of these, only ODC1 was found to be DDX3X-sensitive in lymphoid cells, highlighting the strongly tissue-specific context of DDX3X activity. In contrast, I collected multiple orthogonal lines of evidence supporting the regulation of ER stress and protein synthesis as the dominant phenotype of DDX3X mutations in MYC-driven lymphomas.
In light of these data, this is achieved through regulating the translation of selected transcripts encoding core components of the translation machinery. The DDX3X–DDX3Y axis maintains a stage-specific balance between translation load and proteotoxicity that allows germinal centre B-cells to adapt to deregulated MYC expression. However, this study does not exclude the possibility that loss-of-function mutations in DDX3X have additional effects that may converge with functions of DDX3X reported in other tissue types.

Role of DDX3X as an auxiliary translation factor

DDX3X is reported to play multiple roles in RNA biology; however, this study indicates that in primary GC B-cells and lymphoma cell lines, DDX3X acts predominantly at the level of translation. Previous studies of the role of DDX3X in regulating protein synthesis have revealed both activating and repressing functions. Although this study does not define the precise mechanism of translational repression upon loss of DDX3X in B-cell lymphoma, at least two models could explain this phenomenon.

Firstly, DDX3X is known to act as an auxiliary translation factor, interacting directly, and independently of RNA, with eIF3 and the 40S ribosomal subunit (Geissler et al., 2012; Lee et al., 2008b). DDX3X induces conformational changes in the 43S pre-initiation complex, promoting the release of translation initiation factors and the joining of the 60S ribosomal subunit, which leads to 80S complex formation (Geissler et al., 2012). Consistent with the findings of this study, the role of DDX3X in translation seems to be limited mainly to the initiation stage. In lymphoid cells, we observed that DDX3X protein co-immunoprecipitates with almost all components of the eIF3 complex and that the DDX3X iCLIP binding profile resembles the pattern of known translation initiation factors (Calviello et al., 2021). Moreover, Geissler et al. (2012) did not detect DDX3X in polysome fractions of human hepatoma cells, reinforcing translation initiation as the main point of DDX3X-mediated translational control. Although DDX3X depletion affects translation intensity globally, it is worth noting that Geissler et al. (2012) found only about 50% of newly assembled ribosomes to contain DDX3X, which suggests that regulation of translation by DDX3X might be limited to a subset of transcripts. As an RNA helicase, DDX3X can participate in unwinding highly structured regions within 5′UTRs, thus facilitating translation initiation, but the evidence for this activity is conflicting. Work by Soto-Rifo et al. (2012) and Calviello et al. (2021) showed that viral and host transcripts, respectively, with complex 5′UTRs are sensitive to DDX3X depletion. In contrast, Geissler et al. (2012) argue that depletion of DDX3X had a substantial impact on the translation of all studied viral transcripts regardless of the complexity of the leader sequence. The almost complete loss of helicase activity of BL-associated DDX3X mutants suggests that the molecular function of DDX3X in BL is helicase-dependent. This would argue in favour of 5′UTR structure as a key determinant of DDX3X sensitivity. However, although DDX3X-sensitive transcripts had slightly higher GC content, the folding energy of their 5′UTRs was not significantly different from that of other genes, suggesting that other factors may be involved.

An alternative hypothesis linking DDX3X with translational regulation of the core components of the translation machinery is 5′TOP-mediated regulation. 5′ terminal oligopyrimidine (5′TOP) motifs are located at the very 5′ end of the 5′UTR of selected transcripts. The translation of 5′TOP mRNAs is inhibited by LARP1, an RNA-binding protein that is inactivated by mTORC1. DDX3X may compete with LARP1 to control translation of 5′TOP transcripts or act as a 5′TOP translation modulator.
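The two 5′UTR features discussed here, GC content and the 5′TOP motif, can be checked with simple sequence rules. The sketch below is a minimal illustration under stated assumptions: it treats a 5′TOP-like motif as a C at the transcription start site followed by an uninterrupted run of pyrimidines (the 4 to 14 nt bounds are an assumed, adjustable choice), assumes each 5′UTR sequence begins exactly at the annotated transcription start site, and computes GC content as the fraction of G and C bases. Folding energy would require an RNA secondary-structure tool and is not computed here. Function names and example sequences are hypothetical.

```python
def has_5top_motif(utr5, min_pyr=4, max_pyr=14):
    """Return True if a 5'UTR (starting at its TSS) carries a 5'TOP-like motif.

    Assumed rule: a C at position +1 followed by an uninterrupted run of
    min_pyr..max_pyr pyrimidines (C or U, written as T after conversion).
    """
    seq = utr5.upper().replace("U", "T")
    if not seq.startswith("C"):
        return False
    run = 0
    for base in seq[1:]:
        if base not in "CT":
            break
        run += 1
    return min_pyr <= run <= max_pyr


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical 5'UTR sequences written from the transcription start site.
utrs = {
    "TOP_like": "CTTTCCTCTGAGCGATG",
    "non_TOP": "GAGCGCCGCAGCGATG",
}
for name, utr in utrs.items():
    print(name, has_5top_motif(utr), round(gc_content(utr), 2))
```

Because the motif must start at the first transcribed nucleotide, a classification like this is only as reliable as the transcription start site annotation used to extract the 5′UTRs.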
Notably, almost all canonical 5′TOP mRNAs show negative TE changes after DDX3X knockdown in U2932, but further experiments are needed to confirm whether translational control by DDX3X is mediated through 5′TOP motifs and/or mTOR signalling.

Different DDX3X expression levels may promote different cellular responses

The dose-dependent mRNA expression profile of DDX3Xsh cells, which resembles the profile of R475C cells only when depletion is strongest, could suggest that even a small amount of DDX3X is sufficient to maintain translation of target transcripts. A dual, dose-dependent role of DDX3X in translation has been reported previously: while moderate overexpression of DDX3 stimulated translation (Geissler et al., 2012), massive overexpression led to a halt in global translation. Given the versatile function of DDX3X in regulating different aspects of RNA biology, different functions may require precise titration of DDX3X levels in the cell. In addition to regulating translation initiation, the yeast homologue of DDX3X, Ded1, is able to nucleate stress granule (SG) formation (Hilliker et al., 2011). While sequestration of selected transcripts in SGs does not require Ded1 helicase activity, the reverse process of releasing mRNAs into the cytoplasm, thereby promoting their translation, does (Hilliker et al., 2011). Indeed, SG formation in human cells has been reported in studies overexpressing mutant DDX3X (Lennox et al., 2020a; Valentin-Vega et al., 2016). Valentin-Vega et al. (2016) argue that the low-complexity domains flanking the helicase core of DDX3X are the main effectors of SG assembly. This could explain the observed toxicity of supraphysiological levels of either wild-type or mutant DDX3X (Lennox et al., 2020a; Valentin-Vega et al., 2016).
Pro-oncogenic properties of decreased translation load

Oncogenesis is typically associated with an increased translation load that meets the demands of a deregulated cell cycle and growth (Ruggero, 2013). However, in several malignancies, the opposite strategy has evolved. In neuroblastoma, the most common extracranial solid tumour in children, adaptation to limited nutrient supply is mediated by activation of eEF2 kinase, which suppresses translation elongation (Leprivier et al., 2013). Interestingly, this effect was found predominantly in MYCN-driven tumour models (Delaidelli et al., 2017), suggesting that tight regulation of translation load might be a general feature of malignancies with deregulation of the MYC gene family. RUNX1 deficiency, which is common in myelodysplastic syndrome and acute myelogenous leukaemia (AML), reduces the rate of ribosome biogenesis, making haematopoietic stem cells (HSCs) resistant to stress and thus providing a competitive advantage over normal HSCs (Cai et al., 2015). Another example comes from a mouse model of medulloblastoma, in which activation of PERK is essential for premalignant cells to decrease their translation load, which is restored at later stages of tumour formation (Ho et al., 2016). PERK (EIF2AK3) is a key activator of the integrated stress response, which targets cap-dependent translation initiation. Similarly to our model, lowering global translation at early stages of tumorigenesis decreases ER stress and protects cells from apoptosis (Ho et al., 2016).

The conclusion that suppressed translation is advantageous for MYC-induced lymphomagenesis might appear inconsistent with studies using existing mouse models of MYC-driven lymphoma. Haploinsufficiency for Rpl24 in Eµ-Myc mice, which overexpress Myc in the B-cell compartment, decreases global protein synthesis and delays lymphoma onset (Barna et al., 2008). This differs fundamentally from the two-stage model proposed in this study.
Genetic deletion of Rpl24 induces permanent suppression of translation, whilst loss of DDX3X can be rescued by expression of DDX3Y protein. Thus, the level of translation can be tuned to the changing requirements of the newly forming tumour. This hypothesis relies strongly on the assumption that DDX3X and DDX3Y are functionally redundant. Germline heterozygous mutations in DDX3X have been linked to cortical malformations, intellectual disability and autism-spectrum disorders in females (Lennox et al., 2020b). Males with DDX3Y mutations, in turn, present with infertility, but no cognitive dysfunction has been observed, suggesting that DDX3X cannot compensate for loss of DDX3Y (Ferlin et al., 2003; Foresta et al., 2000b). This might be attributed to differences in expression pattern between tissue types; however, I cannot exclude the possibility that ectopic expression of DDX3Y in transformed B-cells does more than rescue DDX3X insufficiency. Nevertheless, the evidence supporting overlapping molecular roles of DDX3X and DDX3Y is substantial. Firstly, DDX3X and DDX3Y are 92% identical in amino-acid sequence. Secondly, recently published work reveals that substituting DDX3X with DDX3Y completely recapitulates the gene expression profile at the level of transcription and translation (Venkataramanan et al., 2020). Finally, we observed that DDX3Y bound the same mRNA targets as DDX3X and had a similar influence on global protein synthesis (Gong, Krupka et al., 2021). The ability to reactivate DDX3Y protein expression may explain the observed male sex bias in BL and other cancers with recurrent DDX3X mutations. Given the importance of restored protein synthesis capacity in mature tumours, a natural question arises: what compensatory mechanism exists in lymphomas from female patients, who also (albeit at much lower frequency) present with DDX3X mutations?
A review of mutation co-occurrence in BL exome sequencing data (Grande et al., 2019) shows that mutations in SIN3A are enriched in the presence of DDX3X mutation (24% vs. 6%; p < 0.05, Fisher's exact test). SIN3A is a transcriptional corepressor that antagonises MYC by interacting directly with MXD1–MAX heterodimers (Nascimento et al., 2011). This observation allows me to speculate that SIN3A inactivation at the later stages of lymphomagenesis could increase MYC activity and thereby global translation capacity. Other complementary mechanisms, genetic or non-genetic, that restore protein synthesis or increase MYC activity are also possible. In contrast, this problem does not exist in males: ectopic expression of DDX3Y has been found in all investigated cell lines. Moreover, there are almost no mutations in DDX3Y across published lymphoma datasets, which suggests that functional DDX3Y is important for successful lymphomagenesis.

The importance of the DDX3X/DDX3Y axis in MYC-driven lymphoma opens new therapeutic opportunities

The results presented here reinforce the therapeutic relevance of drugs targeting ER stress for the treatment of MYC-driven lymphoma. Firstly, they support the vulnerability of the lymphomagenesis process to proteotoxic stress. This is especially interesting in light of recent results from the REMoDL-B study examining bortezomib as an addition to standard R-CHOP therapy in DLBCL. An unexpected outcome of a post-hoc analysis was that the addition of the proteasome inhibitor appeared to benefit patients with the MHG subtype.

This study also proposes DDX3Y as an attractive therapeutic target. DDX3Y is essential for maintaining the translational capacity of transformed B-cells but is not expressed at the protein level in normal cells (Ditton et al., 2004; Foresta et al., 2000a; Rauschendorf et al., 2011), which allows us to speculate that the toxicity of a potential DDX3Y inhibitor would be low.
Indeed, males with germline DDX3Y mutations present with infertility, but no phenotype other than azoospermia has been observed (Foresta et al., 2000a; Rauschendorf et al., 2011). To date, two small-molecule inhibitors of DDX3 are available for pre-clinical studies (Bol et al., 2015; Brai et al., 2016). Both target the core helicase domain of DDX3, which is almost identical between DDX3X and DDX3Y. There is therefore an urgent need to develop more DDX3Y-specific agents. An interesting solution could be Proteolysis Targeting Chimera (PROTAC) technology, which could target unique parts of DDX3Y and direct the protein to the ubiquitin–proteasome system for degradation (Gao et al., 2020).

CHAPTER 5

Elucidating the role of translated micropeptides in Diffuse Large B-cell Lymphoma

5.1 Background

The human genome is thought to contain approximately 20,000 protein-coding sequences. The annotation of these Open Reading Frames (ORFs) has been based on a set of rules predicting translation of stable and functional proteins, for example requiring a minimum length of 100 amino acids (aa), an AUG (methionine) start codon, biased codon usage and high sequence conservation between species (Brent, 2005; Ingolia et al., 2011). The "one gene, one enzyme" hypothesis ("one gene, one polypeptide" in later variants), first formulated in 1941 (Beadle and Tatum, 1941), also largely shaped our understanding of the organisation of protein-coding regions. However, recent studies have revealed several exceptions to these rules, suggesting that the human proteome might be much more complex and dynamic than previously thought.

Firstly, the number of variants and isoforms of known proteins, recently termed proteoforms (Smith and Kelleher, 2013), may exceed the fixed, gene-centric set of 20,000 sequences.
Some estimates put the number of proteoforms produced from the human genome at 70,000, or even a few million if all post-translational modifications are included (Aebersold et al., 2018a). Another layer of proteome complexity comes from the discovery of new coding regions that were missed by consensus annotations owing to, for example, their small size or unexpected location. The existence of these noncanonical proteins, also referred to as micropeptides or cryptic peptides, has already been confirmed by several studies, which demonstrated a wide range of biological activities of these products. For instance, the chemokine family, whose members are secreted by cells to coordinate the immune response, comprises more than 40 known proteins with lengths between 67 and 127 amino acids (Moser and Willimann, 2004). Other examples of biologically relevant micropeptides include the defensin family (18–45 aa), ribosomal protein L41 (RPL41, 25 aa) and three calcium transporter regulators: phospholamban (PLN, 52 aa), sarcolipin (SLN, 31 aa) and myoregulin (MLN, 46 aa).

The ability of new sequencing techniques such as Ribo-Seq to reveal the position of every translating ribosome in the cell opens new opportunities to re-examine the translation landscape. Ribo-Seq has proved highly effective at identifying new ORFs in ostensibly non-coding regions of the genome, such as long non-coding RNAs (lncRNAs) or the 5′ and 3′ untranslated regions (5′UTR and 3′UTR, respectively) flanking known ORFs (Chen et al., 2020; Chong et al., 2020; Jackson et al., 2018; Calviello and Ohler, 2017). However, only a handful of these have been validated to encode micropeptides with potent biological functions. The scope of noncanonical translation and its exact role are still unclear.
In an attempt to bridge this gap, I took advantage of a large translatome dataset generated in the Hodson lab. It comprises 79 Ribo-Seq libraries with matching RNA-Seq samples obtained from primary B-cells and lymphoma cell lines. In this chapter:
1. I introduce a systematic approach to annotate actively translated regions de novo, directly from Ribo-Seq data,
2. I query a set of mass-spectrometry datasets downloaded from public repositories for the identified noncanonical ORFs,
3. I rank putative micropeptides according to their biological relevance,
4. I use the top-scoring entries to design a knockout CRISPR screen to identify ORFs essential for B-cell growth and survival.

5.2 Results

5.2.1 A systematic approach for de novo identification of noncanonical translation products in lymphoid cells

5.2.1.1 An integrated ORF identification workflow

To annotate regions of noncanonical translation, I developed a bioinformatic pipeline that integrates translatome (Ribo-Seq) and transcriptome (RNA-Seq) data. The pipeline creates a non-redundant database of putative peptides and proteins (Figure 5.1 A). The core idea of this workflow is to analyse the phasing pattern of ribosomal footprints, which is expected to be enriched in actively translated regions.

It is important to point out that the mere alignment of Ribo-Seq reads to a genomic region is not evidence of active translation. Of the RNase-protected mRNA fragments sequenced in Ribo-Seq, only about 85% correspond to ribosome positions; the remaining footprints come from other RNA-protein complexes (Ji et al., 2016). The two situations can be distinguished by the distribution of Ribo-Seq reads over a queried region. Ribosome-derived reads:
1. are about 28-29 nucleotides long,
2. span the entire translated region (signal uniformity),
3. show 3-nucleotide periodicity in the signal shape (frame preference), which reflects the ribosome decoding three nucleotides at a time.
In contrast, nonribosomal footprints are usually highly localised and produce reads of varying lengths (Ji et al., 2016).

To find genomic regions showing features of active translation, I combined four independent algorithms developed specifically for this task (Table 5.1):
- ORFLine (Hu et al., 2021),
- ORF-RATER (Fields et al., 2015),
- RibORF (Ji, 2018),
- RiboCode (Xiao et al., 2018).

All four tools exploit the regular alignment pattern of ribosome-derived footprints, but they adopt different statistical models to rank and select the regions most likely to be translated. Some tools also incorporate additional metrics that reflect signal uniformity. These include the inside/outside ratio (I/O ratio), sequence coverage and the Ribosome Release Score (RRS), which quantifies the abrupt drop in Ribo-Seq signal after the stop codon. Translation can sometimes begin at codons other than AUG (Kearse and Wilusz, 2017). I therefore considered all possible reading frames defined by any combination of start codons (AUG, UUG, GUG or CUG) and stop codons (UAA, UGA or UAG). All parameters of the ORF-finding algorithms were kept at their default values, as recommended in the original publications; see section 2.2.11 for details.

Figure 5.1: Systematic approach to identify translated ORFs in lymphoid cells. A) Flow chart showing the workflow for identifying ORFs from Ribo-Seq datasets. B) Diagram explaining the strategy of hierarchical merging adapted from Ouspenskaia et al. (2020). Samples are represented as leaves, which are merged by biological similarity into clades and, finally, into one large root file containing all sequenced reads. Accumulating Ribo-Seq reads over multiple samples should strengthen frame preference (increasing sensitivity). Tissue specificity is maintained because individual samples are also used for ORF identification.
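A minimal Python sketch can illustrate the two ingredients described above:
- candidate ORFs that may start at any NUG codon,
- the frame preference of ribosome-derived reads within such a candidate.

It is not the implementation used by ORFLine, ORF-RATER, RibORF or RiboCode. The function names and the 6-codon minimum length are hypothetical. The sketch also assumes that P-site positions have already been inferred from the footprints, for example by keeping 28-29 nt reads and applying a read-length-specific offset.

```python
from collections import Counter

# DNA-alphabet versions of the NUG start codons and the stop codons considered.
START_CODONS = {"ATG", "CTG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_orfs(seq, min_codons=6):
    """Return (start, end) 0-based half-open coordinates of every ORF that begins
    at an NUG codon and ends at the first in-frame stop codon (stop included).
    min_codons is a hypothetical cut-off on the number of sense codons."""
    seq = seq.upper()
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in START_CODONS:
            continue
        for j in range(i, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                break  # candidates without an in-frame stop are never recorded
    return orfs


def frame_preference(psite_positions, orf_start, orf_end):
    """Fraction of P-sites inside the ORF that fall in its frame 0; values well
    above 1/3 (the expectation for frame-less signal) indicate periodicity."""
    frames = Counter((p - orf_start) % 3 for p in psite_positions
                     if orf_start <= p < orf_end)
    total = sum(frames.values())
    return frames[0] / total if total else 0.0


transcript = "CCATGAAACCCGGGTTTAAATAGCC"
print(candidate_orfs(transcript))                                  # [(2, 23)]
print(round(frame_preference([2, 5, 8, 11, 3, 14, 40], 2, 23), 3))  # 0.833
```

Real tools do more than threshold a single fraction. As Table 5.1 summarises, they test the in-frame signal statistically and may add uniformity or release metrics.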
Table 5.1: An overview of ORF-finding algorithms used in this study

| Tool | Input | Start codons | Translation metrics | Statistical evaluation |
|---|---|---|---|---|
| ORFLine | Ribo-Seq, RNA-Seq | NUG | Frame preference, Ribosome Release Score, inside/outside read ratio, sequence coverage | Log-scaled chi-squared goodness-of-fit test for frame preference |
| ORF-RATER | Ribo-Seq, RNA-Seq | NUG | Frame preference, aggregate profiles over start/stop codons | Linear regression model |
| RibORF | Ribo-Seq | NUG | Frame preference, signal uniformity | Logistic regression model, Support Vector Machine classifier |
| RiboCode | Ribo-Seq | NUG | Frame preference | Modified Wilcoxon signed-rank test |

First, the RiboStream pipeline was applied to a large dataset of 79 paired Ribo-Seq and RNA-Seq libraries covering a variety of lymphoid cell types. These included 12 samples from primary germinal centre B-cells, 5 samples from DLBCL tumour biopsies and 60 samples from established lymphoma cell lines. To maximise the sensitivity and specificity of detecting translated regions, I adopted a hierarchical ORF detection strategy first introduced by Ouspenskaia et al. (2020). If the number of aligned footprints falls below a certain level, 3-nucleotide periodicity becomes indistinguishable from non-translational noise and may be missed by ORF identification algorithms. Aggregating the Ribo-Seq signal across multiple samples, and running the tools on the merged set of footprints, addresses this problem. Truly translated regions should then accumulate enough mapped reads to show a clear phasing pattern and reach the detection threshold of the ORF-finding algorithms. Ribo-Seq and RNA-Seq BAM files (aligned reads) are merged hierarchically according to biological similarity. This increases detection sensitivity while preserving cell-type-specific expression (Figure 5.1 B).
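The bookkeeping behind hierarchical merging can be sketched as a small tree walk. Every leaf is one sample, and every internal node stands for one BAM file obtained by merging everything beneath it, so each node corresponds to one ORF-calling run. The nested-list representation and the sample names below are hypothetical, and the sketch only counts files; it does not merge any data. A design of 3 conditions with 4 replicates each gives 12 leaf runs, 4 merged files and 16 runs in total.

```python
def merge_tree_summary(node):
    """Return (leaf_files, merged_files) for a hierarchical merge tree.

    A leaf is a sample name (str); an internal node is a list of child nodes
    and represents one file produced by merging all samples beneath it."""
    if isinstance(node, str):
        return 1, 0
    leaves, merged = 0, 1  # this internal node is itself one merged file
    for child in node:
        child_leaves, child_merged = merge_tree_summary(child)
        leaves += child_leaves
        merged += child_merged
    return leaves, merged


# Hypothetical sample names: 3 conditions x 4 replicates, grouped by condition.
design = [[f"{cond}_rep{r}" for r in range(1, 5)] for cond in ("A", "B", "C")]
leaves, merged = merge_tree_summary(design)
print(leaves, merged, leaves + merged)  # 12 4 16
```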
For example, with the data generated in chapter 3 (3 experimental conditions, 4 replicates each), 4 merged files can be built and 16 ORF identification runs can be performed at three levels:
1. Leaf level: individual sample runs (12 samples)
2. Group (clade) level: files merged by experimental condition (3 files, one per condition)
3. Root level: all 12 individual samples merged into 1 file

The same logic was applied to all 79 Ribo-Seq samples, giving rise to 177 files on which the ORF identification tools were run.

5.2.1.2 Pervasive translation of ostensibly noncoding regions in lymphoid cells

In total, I found 13,483 canonical ORFs and 43,537 noncanonical ORFs identified by at least two algorithms (Figure 5.2 A). Interestingly, only 420 (0.7%, 420/57,020) noncanonical ORFs were picked up by all four tools. This suggests substantial differences between the ORF-finding strategies and reinforces the need to use more than one tool for comprehensive identification. As expected, the number of ORFs was positively correlated with the number of ribosomal footprints used as input. The merged files generated through the hierarchical merging strategy yielded almost an order of magnitude more ORFs than the raw, unmerged files alone (Figure 5.2 B).

All tools routinely classify identified ORFs into distinct types, but their strategies and exact definitions of ORF types differ. To make the results comparable, I reassigned the ORFs to a representative transcript isoform and unified the ORF classification. This produced nine ORF categories based on genomic location and reading frame (Figure 5.2 C); see section 2.2.11 for a detailed description of the classification strategy. Of the 43,537 noncanonical ORFs, the largest group (31,232/43,537, 71.7%) comprised ORFs located in the noncoding parts of coding transcripts or overlapping the known CDS out of frame with the annotated product.
The noncanonical ORFs were found in 8,586 protein-coding transcripts, each of which contained more than one additional ORF (2.8 on average). A further 6,386 (14.7%) noncanonical ORFs showed substantial in-frame overlap with annotated ORFs. These were considered new isoforms of known ORFs with either a truncated or an extended amino-acid sequence. Among the noncanonical ORFs, 5,919 (5,919/43,537, 13.6%) were located on transcripts annotated as noncoding, predominantly long noncoding RNAs (Figure 5.2 C). Consistent with previous studies (Chen et al., 2020; Cuevas et al., 2021), noncanonical ORFs were much shorter than canonical proteins: almost 50% were shorter than 100 amino acids (Figure 5.2 D). Noncanonical ORFs showed the hallmarks of active translation, resembling the pattern observed in known protein-coding regions (Figure 5.2 E):
- strong frame preference,
- accumulation of footprints at the start codon,
- a sudden drop in signal after the estimated termination site.

Figure 5.2: Ribo-Seq reveals pervasive translation of noncanonical ORFs in B-cells. A) Venn diagram showing the total number of ORFs identified and the overlap between the ORF-finding algorithms. Only ORFs detected by more than one tool (numbers in black) were selected for further analysis. B) Scatter plot showing the relationship between the mean number of ORFs detected per sample and the total number of ribosomal footprints used as input. The colour indicates which input files were obtained by merging the original FASTQ files. C) Barplot showing the percentage contribution of each ORF type to the total of 57,020 ORFs identified. ORF types were divided into three classes: known ORFs and their variants (blue), noncanonical ORFs located on coding transcripts (violet) and novel ORFs in noncoding transcripts (pink). D) Histogram showing the ORF length distribution for canonical and noncanonical ORFs.
E) Barplot showing average three-nucleotide periodicity and single-frame preference in canonical and noncanonical ORFs.

5.2.2 Noncanonical ORFs account for about 10% of proteins detected in proteomics experiments

The analysis of the Ribo-Seq dataset revealed pervasive translation of thousands of noncanonical regions in lymphoid cells. The most pressing questions arising in this context are:
1. Are the products of noncanonical translation detectable at the protein level?
2. Is this finding reproducible?
3. Is there any evidence, other than Ribo-Seq, that these regions are biologically relevant?

I addressed the first question by reanalysing 13 publicly available mass spectrometry (MS) datasets, covering various cell types and conditions, and querying them for peptides originating from predicted ORFs. Searching for the products of noncanonical translation in proteomic data is a nontrivial challenge. Firstly, their low molecular weight can impede detection by standard mass spectrometry techniques, so less commonly used methods should also be explored. Recent studies demonstrated that noncanonical proteins are enriched among Major Histocompatibility Complex (MHC)-bound peptides (MAPs) (Chong et al., 2020; Chen et al., 2020; Ouspenskaia et al., 2020). Mass-spectrometry-based identification of MAPs (immunopeptidomics) has attracted much attention in the context of tumour neoantigens, and numerous datasets have been deposited in public repositories. Another interesting technique is so-called deep-proteome, or high-resolution, MS, which offers higher protein coverage and a broader dynamic range of detection (Bekker-Jensen et al., 2017). The collected datasets therefore included experiments using SILAC or tandem mass tag (TMT) labelling, as well as immunopeptidomics and deep-proteome experiments. Section 2.2.12 details the bioinformatic workflow.
A typical analysis of a proteomic dataset involves assigning the collected spectra to reference sequences. This poses another difficulty, because an inflated search database can lead to spurious matches and increase the number of false-positive findings (Blakeley et al., 2012; Nesvizhskii, 2014). To reduce the search space for MS analysis, I:
1. removed all ORFs shorter than 18 nucleotides (6 amino acids),
2. filtered out all in-frame ORFs with more than 20% overlap with known ORFs, focusing on unique ORFs that are unlikely to be variants of known proteins.

After filtering, my list of putative micropeptides contained 30,188 unique sequences, which were used to build a customised search database for MS analysis. To avoid forced assignment of mass spectra to noncanonical ORFs, the final search database contained both the noncanonical peptide sequences and a set of reference proteins downloaded from UniProtKB. I queried the MS datasets using either MaxQuant or the NewAnce workflow, which was designed specifically to process immunopeptidomics data (Chong et al., 2020).

Across all MS datasets surveyed, I identified 564,764 peptides assigned to 12,518 unique proteins (PSM FDR 1%, protein FDR 1%) (Figure 5.3 A). Of these, between 1% and 5.5% were assigned to noncanonical ORFs (Figure 5.3 B), accounting for 1,311 unique ORFs in total. The total number of peptides identified, and the percentage derived from noncanonical ORFs, varied slightly between datasets and MS techniques. This difference did not reach statistical significance (Kruskal-Wallis p = 0.1). The highest proportion of peptides derived from noncanonical ORFs was found in MHC-I immunopeptidomics experiments and in a dataset using a deep-proteome technique (Bekker-Jensen et al., 2017) (Figure 5.3 B).
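The database-reduction step described above can be sketched in Python. This is an illustration under stated assumptions, not the workflow of section 2.2.12:
- the `Orf` record and its fields are hypothetical,
- in-frame overlap is assumed to be expressed as a fraction of ORF length,
- identical amino-acid sequences are collapsed so that each enters the search database once.

In practice, the UniProtKB reference proteins would be appended to the same FASTA file before searching.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    orf_id: str
    protein: str            # amino-acid sequence without the stop codon
    inframe_overlap: float  # assumed: fraction of the ORF overlapping a known CDS in frame


def build_search_entries(orfs, min_aa=6, max_inframe_overlap=0.20):
    """Keep ORFs of at least min_aa residues with at most 20% in-frame overlap,
    collapsing identical sequences (the first ORF ID seen is kept)."""
    entries = {}
    for orf in orfs:
        if len(orf.protein) < min_aa:
            continue  # too short to yield informative peptides
        if orf.inframe_overlap > max_inframe_overlap:
            continue  # likely a variant of a known protein
        entries.setdefault(orf.protein, orf.orf_id)
    return entries


def to_fasta(entries):
    """Render {sequence: id} as FASTA text."""
    return "".join(f">{orf_id}\n{seq}\n" for seq, orf_id in entries.items())


orfs = [
    Orf("orf_a", "MKV", 0.0),       # removed: shorter than 6 aa
    Orf("orf_b", "MKVLLAG", 0.5),   # removed: 50% in-frame overlap
    Orf("orf_c", "MPEPTIDE", 0.1),  # kept
    Orf("orf_d", "MPEPTIDE", 0.0),  # same sequence as orf_c, collapsed
]
print(to_fasta(build_search_entries(orfs)), end="")
```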
The higher yield in immunopeptidomics and deep-proteome experiments may indicate that certain properties of these approaches facilitate the detection of noncanonical proteins. These may be technical as well as biological, such as preferential access to the antigen presentation pathway, or the difficulty of detecting small or short-lived proteins with standard techniques. MHC complex pull-down, multi-fractionation or the use of multiple proteases in deep-proteome studies may enable the detection of otherwise undetectable proteins (Bekker-Jensen et al., 2017).

Figure 5.3: Analysis of mass spectrometry datasets in search of peptides matching predicted noncanonical ORFs. A) Barplot showing the total number of unique peptides identified in each mass spectrometry dataset. Bar colour indicates whether a peptide was assigned to a noncanonical (red) or canonical (grey) protein. B) Boxplot showing the percentage of unique peptides matching predicted noncanonical proteins. Statistical significance of the difference in the proportion of peptides derived from noncanonical ORFs between MS groups was determined with the Kruskal-Wallis test, p-value = 0.1.

Next, I evaluated the accuracy of noncanonical protein identification using four metrics. In all datasets, two distributions were very similar for noncanonical and canonical proteins (Figure 5.4 A-B):
- mass measurement error, the difference between an individual measurement and the expected value for a peptide,
- the Andromeda score, a probabilistic score reflecting the quality of the peptide-to-spectrum match.

The observed retention time (RT) of eluted peptides was highly correlated with the chromatographic Hydrophobicity Index (HI) predicted from each peptide's amino-acid composition (Krokhin and Spicer, 2010) (Figure 5.4 C). The correlation was similar for peptides derived from canonical and noncanonical ORFs.
Lastly, the length distribution of MHC-bound peptides was similar in both groups. Most peptides were 9-10 amino acids long for MHC-I and 11-17 amino acids long for MHC-II (Figure 5.4 E).

I then addressed the question of the reproducibility of ORF identification. I aligned the noncanonical ORFs against an extended database of human proteins downloaded from UniProt (Swiss-Prot and TrEMBL sets). I also aligned them against two public repositories, OpenProt and sORFdb, which contain the sequences of new proteins predicted by other proteogenomic studies. In total, 5,774 ORFs (19.13%, 5,774/30,188) matched UniProt, OpenProt or sORFdb with at least 95% sequence similarity (Figure 5.4 F). Almost half of the ORFs with matching peptides from the reanalysed MS experiments (47.52%, 623/1,311) were confirmed in at least one of the databases (Figure 5.4 F).

In summary, a relatively small proportion of noncanonical ORFs was identified in proteomic experiments. Nevertheless, they accounted for about 10% of all unique proteins quantified, and their detection accuracy was comparable to that of known proteins. In total, 21.41% of noncanonical ORFs (6,462/30,188) had evidence of protein-level expression (UniProt or proteomics) or had been identified as actively translated by other groups, which supports the reliability of our Ribo-Seq workflow. The remaining 78.6% were unique to the Ribo-Seq data from this study, with no protein-level evidence and no identification in other translatome studies.

Figure 5.4: Features of peptides from canonical and noncanonical proteins identified in MS experiments. A) Histogram of mass error for identified canonical and noncanonical peptide-spectrum matches (PSMs). p-value < 10^-16, Kolmogorov–Smirnov test (number of canonical PSMs = 7,378,711; number of noncanonical PSMs = 80,829). B) Histogram of the Andromeda score, which reflects the quality of the peptide-spectrum match, for identified canonical and noncanonical peptides.
p-value < 10^-16, Kolmogorov–Smirnov test (number of canonical PSMs = 7,378,711; number of noncanonical PSMs = 80,829). C) Pearson correlations between observed and SSRCalc-predicted retention times of peptides derived from canonical and noncanonical proteins. D) Total number of unique canonical and noncanonical proteins and mean sequence coverage of matched peptides. E) Histogram of the length of MAPs derived from canonical and noncanonical proteins. F) Venn diagram showing the overlap between ORFs identified in MS datasets, found in UniProt and found in proteogenomic databases.

5.2.3 Characteristics of noncanonical ORFs producing MHC-bound peptides

Next, I sought the features of noncanonical proteins that increase the probability of protein-level expression. In contrast to known proteins, the noncanonical ORFs detected in immunopeptidomics and in full proteome MS data overlapped very little: only 4.3% were identified by both techniques (Figure 5.5 A-B). The distribution of ORF types also differed markedly between proteomic techniques (Figure 5.5 C). MHC-bound peptides were more likely to come from uORFs, whereas peptides derived from noncoding transcripts (pseudogenes or lncRNAs) were more common in full proteome data (Chi-square test for independence, p-value < 2.2 × 10^-6). Interestingly, there was no difference in the distribution of ORF types between standard MS experiments and deep-proteome studies (Chi-square test for independence, p-value = 0.59). This suggests that the ability to detect ORF-derived peptides may be related to their biological properties.

To test this, I integrated different ORF characteristics into a Random Forest machine-learning model to predict MS detection. Because immunopeptidomics and full proteome studies detected largely different noncanonical ORFs, I built separate binary classification models to distinguish MHC-detected vs.
not detected, and full proteome (deep proteome or standard MS) detected vs. not detected. The models incorporated the following sequence characteristics:
- the percentage overlap with known CDS,
- ribosomal footprint density,
- translation efficiency,
- gene type (protein-coding, lncRNA or pseudogene),
- GC content,
- length,
- amino-acid composition,
- isoelectric point (pI).

Variants or isoforms of known genes (ORFs classified as canonical, truncated, readthrough or extended) were filtered out. All remaining noncanonical ORFs with mean mRNA expression > 0 and ribosome footprint density > 0 were included. Of these, a randomly selected 75% were used as a training set and the remaining 25% for testing. Performance varied substantially between the two models, suggesting that ORF features are not always sufficient to predict detection in proteomics.

ORF characteristics predicted the identification of MHC-bound peptides well (80% recall at 40% precision, AUPRC = 0.4919). The most important factors for accurate prediction were chromosome location, ORF score and the number of tools recognising an ORF as translated. ORFs with detected MHC-bound peptides:
- were significantly enriched on chromosomes 4, 20 and 6 (z-test of two proportions, adjusted p-value < 0.01),
- had higher ORF scores (Kolmogorov-Smirnov test, p-value < 10^-16),
- were more likely to be detected by more than one tool (Chi-squared test, p-value < 10^-16).

Surprisingly, despite a similar number of ORFs detected in full proteome studies, the prediction did not work efficiently for this group (AUPRC = 0.0264).

Figure 5.5: Ribo-Seq reveals pervasive translation of noncanonical ORFs in B-cells. A) Venn diagram showing the overlap between MHC-derived and full-proteome-derived peptides from noncanonical proteins. B) Venn diagram showing the overlap between MHC-derived and full-proteome-derived peptides from canonical proteins. C) Barplot showing the proportion of ORF types among peptides identified by different mass spectrometry techniques.
Chi-square test for independence, p-value < 2.2 × 10^-6. D) Performance of machine-learning classifiers in predicting the mass-spectrometry detection of peptides derived from noncanonical ORFs. Random forest classifiers were trained on the set of noncanonical ORF characteristics, and performance was assessed by tenfold cross-validation (CV). E) Feature importance of the random forest classifier predicting MAP detection, represented as mean decrease in accuracy. F) Percentage of noncanonical ORFs with MAPs per chromosome. Highlighted points indicate chromosomes with statistically significant enrichment of MAPs. Z-test of two proportions, adjusted p-value < 0.01. G) Violin plot showing the difference in ORF score between noncanonical ORFs with and without detected MAPs. Kolmogorov-Smirnov test, p-value < 10^-16. H) Percentage of noncanonical ORFs detected by more than one tool, compared between ORFs with and without detected MAPs. Chi-squared test for independence, p-value < 10^-16. I) Scatter plot showing feature importance of a random forest regression model predicting the ORF score, measured as mean decrease in accuracy and mean increase in mean squared error (MSE).

I also investigated the factors determining the ORF score. This score directly reflects the strength of three-nucleotide periodicity and is therefore the key indicator of active translation. Overall, ORF features explained 61.29% of the variance in ORF score. The most influential predictors were overlap with known CDS, ribosome footprint density, mRNA expression, conservation score and an AUG start codon. The high importance of expression-related measures (mRNA expression or ribosomal footprint density) is consistent with the positive relationship between the number of ORFs annotated and the number of mapped ribosomal footprints (Figure 5.2 B).
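AUPRC, the metric reported above, suits this setting because MS-detected ORFs are a small minority of all candidates. A minimal Python sketch of the step-wise average-precision estimate of AUPRC follows. For untied scores it matches the definition used by scikit-learn's `average_precision_score`, but it is an illustration, not the code used in this analysis. A useful reference point is that a random ranking has an expected AUPRC close to the fraction of positives. An AUPRC such as 0.0264 is therefore best read against the proportion of ORFs actually detected in full proteome data.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 if the ORF was detected by MS (positive), else 0.
    scores: classifier scores, higher = more likely detected.
    Tied scores keep their input order, which is a simplification."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined without positive examples")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(labels, scores), 3))  # 0.833
print(sum(labels) / len(labels))                    # 0.5, the no-skill baseline
```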
This analysis suggests two conclusions. First, certain biological factors and sequence characteristics may determine MHC presentation of peptides derived from noncanonical ORFs. Second, low sequence coverage or low expression may be the primary factor limiting efficient detection and validation of noncanonical ORFs.

5.2.4 Design of a customised knockout CRISPR screen to identify noncanonical ORFs important for B-cell survival

To identify noncanonical ORFs essential for B-cell survival systematically, I set out to screen the most promising candidates with a customised CRISPR library. The library contains 6,000 gRNAs targeting 1,625 non-overlapping ORFs. I designed this knockout CRISPR screen to target selected ORFs with features indicating an important biological function in lymphoid cells. I divided the selection of ORFs for screening into two stages:
1. negative selection, which discards all ORFs unsuitable for CRISPR-Cas9 targeting,
2. positive selection, which enriches for regions with interesting biological features.

The first stage discarded ORFs that met any of the following conditions (Figure 5.6 A-B):
- too short to be targeted with a sufficient number of gRNAs,
- low expression in the primary cells or cell lines chosen for the screen,
- overlap with a known CDS so extensive that any observed phenotype could be explained by disruption of the canonical coding region.

Next, I searched the remaining ORFs for compatible gRNAs cutting within the predicted ORF. I filtered out all untargetable ORFs, defined as those with fewer than 5 good-quality gRNAs. By good quality I mean gRNAs with:
- GC content between 40% and 70%,
- no direct off-target locations,
- no homopolymers of 4 or more consecutive Ts.

Homopolymers in the gRNA sequence can decrease cutting activity, and TTTT is known to act as a minimal T-stretch termination signal for RNA polymerase III (Gao et al., 2018).
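The gRNA quality rules listed above translate directly into a filter. The Python sketch below is illustrative, and its function names are hypothetical. It assumes 20-nt spacers given without the PAM. It also takes the number of perfect-match off-target sites as a precomputed input, because finding off-targets requires a genome-wide search that is not shown.

```python
def gc_fraction(spacer):
    spacer = spacer.upper()
    return (spacer.count("G") + spacer.count("C")) / len(spacer)


def has_t_run(spacer, run_length=4):
    """True if the spacer contains run_length or more consecutive Ts (TTTT can
    act as an RNA polymerase III termination signal)."""
    return "T" * run_length in spacer.upper()


def is_good_guide(spacer, n_offtargets):
    """GC content 40-70%, no T homopolymer of length >= 4, no direct off-targets."""
    return (0.40 <= gc_fraction(spacer) <= 0.70
            and not has_t_run(spacer)
            and n_offtargets == 0)


def is_targetable(guides, min_good=5):
    """guides: (spacer, n_offtargets) pairs for one ORF; the ORF needs at least
    min_good good-quality guides to stay in the candidate list."""
    return sum(is_good_guide(s, n) for s, n in guides) >= min_good


print(is_good_guide("GACGTACGTAGCTAGCATGC", 0))  # True: 55% GC, no TTTT
print(is_good_guide("GACGTTTTACGTAGCTAGCA", 0))  # False: contains TTTT
```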
Indeed, I compared dropout efficiency for genes identified as essential in a knockout CRISPR-Cas9 screen in lymphoid cell lines (Phelan et al., 2018). Dropout was significantly lower for gRNAs containing homopolymers, especially runs of Ts (Figure 5.6 C).

Figure 5.6: Design of a CRISPR-Cas9 screen library to study noncanonical ORFs in lymphoid cells. A) Barplot showing the proportion of noncanonical ORFs excluded from the screen. In total, about 68.49% of ORFs were excluded because of low expression, too much overlap with a known CDS or being too short. B) Violin plots showing the mean expression level of noncanonical ORFs in the cells selected for the screen. The red dashed line corresponds to the 20th percentile, which was taken as the threshold for low expression. C) Barplots showing dropout efficiency of gRNAs targeting essential genes from the Phelan et al. (2018) knockout CRISPR-Cas9 screen, stratified by the presence of homopolymers in the gRNA sequence. Statistical significance (compared with the 'None' group) was determined with the two-sample Wilcoxon test; adjusted p-values were computed with the Benjamini-Hochberg method. **** < 0.0001, *** < 0.001, ** < 0.01. D) Violin plot showing RNA folding energy of gRNAs included and not included in the final CRISPR-Cas9 screen library. E) Violin plot showing one of the gRNA efficiency scores for gRNAs included and not included in the final CRISPR-Cas9 screen library. F) Barplot showing the percentage of different ORF types targeted by the gRNA library. G) Histograms showing that the ORFs targeted by the gRNA library have overall higher evolutionary conservation, higher ORF scores and higher expression levels.
The p-values in all groups (folding energy, DOENCH 2016 score, evolutionary conservation, ORF score and expression level) were < 10^-10, Kolmogorov–Smirnov test.

The efficiency of all gRNAs was evaluated using several efficiency scores (Figure 5.6 E). These included prioritisation of a G at position 20 (upstream of the PAM) and several multi-factor scores predicting gRNA stability and activity (Moreno-Mateos et al., 2015; Xu et al., 2015; Doench et al., 2014, 2016). Given reports of inferior on-target activity of gRNAs with internal hairpins and regions of self-complementarity, I also prioritised gRNAs with lower folding energy (Thyme et al., 2016) (Figure 5.6 D).

The 9,514 ORFs remaining after negative selection were then ranked on features that make them interesting from a biological perspective:
- mean ORF score, reflecting the strength of frame preference,
- mRNA expression level,
- detection at the protein level (in immunopeptidomics or full proteome data) or a match to predicted proteins in external databases,
- evolutionary conservation,
- the number of tools that identified the ORF as actively translated,
- the number of samples in which this occurred,
- the differential expression pattern observed in our large Ribo-Seq dataset.

The final library targets the 1,625 top-scoring ORFs with the gRNAs showing the highest predicted on-target activity, on average 4-5 gRNAs per ORF (Figure 5.6 F-G). As negative controls, I used 250 non-targeting gRNAs from the established Brunello CRISPR-Cas9 library (Sanson et al., 2018), which do not recognise any sequence in the human genome. These controls will help distinguish the effect of neutral drift from the selective disadvantage caused by disrupting biologically important ORFs.
As positive controls, I selected two known oncogenes, MYC and POU2AF1. Their knockout should be disadvantageous for established lymphoma cell lines and primary GC B-cells (Phelan et al., 2018; Caeser et al., 2019), which allows the expected dropout capacity of the screen to be estimated. The library was ordered as an oligo pool from Twist Bioscience and cloned into a lentiviral backbone, maintaining a representation of > 100 colonies per guide. It was then introduced into six Cas9-expressing lymphoma cell lines and ex vivo human GC B-cells, maintaining a representation of > 1,000 transduced cells per guide. Cells were harvested at day 21 and sequencing libraries were prepared. At the time of writing, these libraries are awaiting sequencing. These CRISPR screening experiments were conducted by Dr. Stamatia Vori, a Masters student in the Hodson lab.

5.3 Discussion

The last four years have seen an increase in studies investigating noncanonical translation. Our extensive collection of 79 B-cell translatomes, generated from primary GC B-cells and malignant B-cells, provides a powerful resource to investigate this question in both physiology and pathology. To our knowledge, this is the largest study of this topic in B-cells so far. The proteogenomic method described here integrates Ribo-Seq, RNA-Seq, mass spectrometry (MS) and external databases. It comprehensively evaluates the scope of noncanonical translation in lymphoid cells and identifies the most promising targets for a high-throughput knockout CRISPR screen. The initial systematic analysis of B-cell translatomes revealed 43,537 noncanonical Open Reading Frames (ORFs) and 13,483 canonical (known) ORFs. Noncanonical ORFs were typically located in ostensibly noncoding regions of the genome. They accounted for about 10% of all proteins detected in the reanalysis of external mass spectrometry experiments, and almost 20% were also found in other proteogenomic databases.
To identify noncanonical ORFs essential for B-cell survival, I designed a customised knockout CRISPR screen library. It targets the ORFs with the highest likelihood of protein-level expression and biological relevance.

Evidence behind pervasive translation of noncanonical ORFs

This and previous studies (Chen et al., 2020; Chong et al., 2020; Cuevas et al., 2021; van Heesch et al., 2019) used Ribo-Seq to provide evidence of widespread ribosome occupancy in noncoding regions of the transcriptome. This occupancy has been observed in:
- a range of model organisms (Zhang et al., 2019; Mackowiak et al., 2015),
- studies using different translation inhibitors for Ribo-Seq library preparation (Ingolia et al., 2011; Zhang et al., 2018a; Lee et al., 2012),
- polysome-centric studies (Poly-Ribo-Seq) (Aspden et al., 2014),
- in vivo studies with RiboTag RNA sequencing (Jackson et al., 2018; Sanz et al., 2009).

This striking finding calls into question some rules of eukaryotic translation. These include the monocistronic organisation of eukaryotic transcripts and the purely noncoding nature of several long noncoding RNAs.

Limitations of proteomic approaches for novel protein discovery

Although noncanonical translation is a recurrent observation in translatome studies, validation of its products at the protein level is lagging behind. In the most popular approach, shotgun proteomics, proteins extracted from the cell are fragmented with a protease (typically trypsin). The mixture of digested peptides is then analysed by tandem mass spectrometry (MS/MS). Mass spectra are identified by comparison with a reference database of spectra generated in silico from the provided reference protein sequences (Perez-Riverol et al., 2018).
This approach struggles to pinpoint three kinds of peptides (Griss et al., 2016):

- peptides originating from proteins not included in the reference sequence database;
- peptides with unexpected post-translational modifications (PTMs);
- peptides with a low signal-to-noise ratio.

On the other hand, an inflated reference database leads to a high number of false positive findings (Blakeley et al., 2012; Nesvizhskii, 2014), so the reference must be prepared with care. Here, I adopted a parsimonious strategy. I combined the sequences of known proteins with the sequences of ORFs predicted from Ribo-Seq, so as to use the smallest possible reference database to search for novel proteins.

Only about 4% of noncanonical ORFs predicted in our study had evidence of protein-level expression. Are the remaining 96% technical noise, or did we simply fail to detect their products with proteomics? Both scenarios are possible.

Two arguments favour the first scenario. ORF calls were relatively poorly reproducible between different ORF identification algorithms. The overlap with external proteogenomic databases was also small: only about 20% of noncanonical ORFs were observed in other studies. Moreover, we cannot exclude the possibility that the Ribo-Seq signal does not always correspond to actively translating ribosomes, even when it shows a periodic (three-nucleotide) alignment pattern. For example, could noncoding transcripts act as ribosome sponges, sequestering ribosomes or directing their transport? Noncanonical translation could also reflect mRNA quality control or maturation, e.g. during the pioneer round of translation (Maquat et al., 2010).

On the other hand, orthogonal evidence for about 20% of noncanonical ORFs may look like a low recall, but it may be substantial given how little we know about the biology of noncanonical translation. Even among the 20,386 manually curated human proteins in Swiss-Prot, almost 20% (3,989/20,386) lack strong protein-level evidence (Perdigão et al., 2015).
Likewise, of about 20,000 canonical proteins, only half were identified in our MS data analysis. On average, 75% of mass spectra acquired in a typical MS experiment remain unidentified (Griss et al., 2016). Many of these are of high quality and likely derive from real proteins (Chick et al., 2015).

Noncanonical ORF products may differ from canonical proteins in their life cycle (synthesis and degradation rates), PTM pattern or subcellular localisation, which may hinder their detection with standard proteomic techniques. For example, not all noncanonical micropeptides contain a tryptic cleavage site, which may complicate their identification in assays using trypsin for protein digestion.

The enrichment of noncanonical translation products in the MHC-bound peptidome (immunopeptidome) is an unprecedented finding (Chong et al., 2020; Ouspenskaia et al., 2020; Chen et al., 2020; Cuevas et al., 2021). Firstly, it reinforces the concept that the search for micropeptides ‘hidden’ in the cellular proteome should include a broad range of proteomic techniques. Secondly, it suggests that the role of micropeptides may be linked to immunity and the generation of peptides destined for MHC presentation.

Finally, the MS data analysed here were downloaded from external repositories. Our ability to detect mass spectra matching predicted micropeptides may therefore be lower than in studies where MS and Ribo-Seq data were generated from the same biological model. For comparison, matching mass spectra were found for about 7% of the noncanonical ORFs identified by Chen et al. (2020).
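The point about tryptic cleavage sites can be illustrated with a minimal in silico digest. The sketch below uses the common simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). Three further assumptions apply:

- the 7–30 residue length window stands in for typical search-engine settings;
- missed cleavages, which real searches allow, are ignored;
- the example micropeptide sequence is invented.

```python
from typing import List


def tryptic_digest(protein: str, min_len: int = 7, max_len: int = 30) -> List[str]:
    """Fully cleaved in silico tryptic digest (no missed cleavages).

    Cuts after K or R unless the next residue is P, then keeps only peptides
    whose length lies in [min_len, max_len], roughly what a search engine would
    consider (the window is an assumption of this sketch).
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1:i + 2]  # empty string at the C-terminus
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal fragment without K/R
    return [p for p in peptides if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    # Hypothetical 44-residue micropeptide containing no K or R: the only
    # "peptide" is the full sequence, which falls outside the length window.
    micropeptide = "MSLAGEFTTNQWAHDCSEGLAVFPSIDLYGNTAEHCWQVFSGLT"
    print(tryptic_digest(micropeptide))  # []
```

A canonical protein of average length usually yields many searchable peptides. A short product lacking K/R may yield none, so it is invisible to a trypsin-based search however abundant it is.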
Potential role of micropeptides in immunity and immune surveillance

Noncanonical translation has recently drawn much attention in the context of antigen presentation and as a source of prospective targets for cancer immunotherapy. This is because a considerable fraction of MHC-associated peptides (MAPs) has been attributed to noncanonical proteins (Chong et al., 2020; Ouspenskaia et al., 2020; Cuevas et al., 2021). In line with those studies, I also observed a sizeable proportion of MAPs (between 2 and 5%) originating from noncanonical ORFs. Of 9,242 proteins identified across all reanalysed immunopeptidomics datasets, 652 (7.05%) were noncanonical proteins encoded by ORFs from noncoding regions of the genome.

MHC genes are among the most polymorphic genes, and each allele (allomorph) binds a distinct set of peptides. This may limit our ability to identify MAPs accurately if the immunopeptidomics data were generated from different cells than the Ribo-Seq samples used for ORF prediction.

Jonathan Yewdell formulated a fascinating concept linking noncanonical translation to the immunopeptidome and defective ribosomal products (DRiPs). DRiPs are short-lived polypeptides (half-lives of minutes). They originate from defective rounds of translation, e.g. due to mutations, synthesis errors, misfolding or truncation (Dersh et al., 2021). The DRiPs hypothesis was initially put forward to explain the rapid presentation of antigens derived from stable viral proteins (Yewdell et al., 1996). DRiPs can also arise in abundance from noncanonical translation (e.g. of ORFs with near-cognate start codons), especially when 5′-cap-dependent translation is shut down during stress or viral infection. Interestingly, the composition of MAPs is poorly reflected by the transcriptome and proteome: the most abundant mRNAs or proteins do not necessarily produce the largest number of MAPs (Pearson et al., 2016).
Certain genomic regions are ‘hot spots’ of MAPs despite their relatively small contribution to the cellular proteome or transcriptome (Pearson et al., 2016). Preferential access of certain peptide groups to the MHC presentation pathway could be part of the immunosurveillance process, which is especially important in the context of tumour formation. Immunosurveillance involves a complex interplay between tumour immunogenicity, immune cell infiltration, cytotoxic T-cell activation, immune checkpoints and the microenvironment (Dersh et al., 2021). A better understanding of cancer-specific antigens could aid the design of cancer vaccines, personalised CAR T-cell therapy, or drugs that increase peptide generation in cancer cells and thereby the immune visibility of the tumour.

Possible roles of upstream Open Reading Frames (uORFs)

Translation of thousands of upstream Open Reading Frames (uORFs) is a recurrent finding in global translatome studies, including this one. The regulatory properties of uORFs have been known for a long time. However, the evidence was mainly anecdotal, limited to a handful of transcripts such as ATF4, MDM2, CEBPA and CEBPB (Wethmar et al., 2014). Current estimates place the percentage of mammalian protein-coding genes with potentially functional uORFs at around 40–50% (Johnstone et al., 2016; Lee et al., 2012). In our study, I identified uORFs in 37% (5,034/13,483) of protein-coding genes, with 2–3 uORFs per gene.

uORFs are usually permissive for translation of the main coding sequence, but the efficiency of re-initiation at the downstream start codon may be reduced (Smith et al., 2021; Zhang et al., 2019; Hinnebusch et al., 2016). uORFs may thus harm canonical ORF translation, and population variants creating polymorphic uORFs are depleted. Despite this, uORFs present in the genome show a higher level of conservation than expected from neutral evolution (Zhang et al., 2019; Churbanov et al., 2005; Zhang et al., 2021).
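The uORF features discussed here can be made concrete with a simple sequence scan upstream of the annotated CDS. These features are an upstream AUG, its stop codon, its frame relative to the canonical start codon, and its Kozak context. The sketch below is illustrative only. It is not the ORF-calling strategy used in this thesis, which relied on Ribo-Seq signal. It considers AUG starts only and ignores near-cognate codons. The three-level Kozak classification, based on positions -3 and +4, is a simplification assumed here. Function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class UORF:
    start: int             # 0-based position of the A of the upstream ATG
    end: Optional[int]     # position just after the stop codon; None if no stop found
    in_frame: bool         # True if in frame with the annotated CDS start codon
    overlaps_cds: bool     # True if the uORF runs past the CDS start codon
    kozak: str             # "strong", "adequate", "weak" or "unknown" (simplified)


def kozak_strength(seq: str, atg: int) -> str:
    """Simplified Kozak call: purine at -3 and G at +4 (A of ATG is +1)."""
    if atg < 3 or atg + 3 >= len(seq):
        return "unknown"
    purine_minus3 = seq[atg - 3] in "AG"
    g_plus4 = seq[atg + 3] == "G"
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


def find_uorfs(transcript: str, cds_start: int) -> List[UORF]:
    """Report every ATG upstream of cds_start with its first in-frame stop codon.

    In-frame ATGs with no stop before the CDS are effectively N-terminal
    extensions; they are reported with overlaps_cds=True.
    """
    seq = transcript.upper().replace("U", "T")
    uorfs = []
    for i in range(cds_start):
        if seq[i:i + 3] != "ATG":
            continue
        end = None
        for j in range(i, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
        in_frame = (cds_start - i) % 3 == 0
        overlaps = end is None or end > cds_start
        uorfs.append(UORF(i, end, in_frame, overlaps, kozak_strength(seq, i)))
    return uorfs


if __name__ == "__main__":
    utr5 = "GGGAAAATGCCCTGAGGG"   # toy 5'UTR containing one short uORF
    cds = "ATGGCCTAA"             # toy main ORF
    for uorf in find_uorfs(utr5 + cds, cds_start=len(utr5)):
        print(uorf)
```

Counting the returned records gives the number of uORFs per 5′UTR. The `in_frame`, `overlaps_cds` and `kozak` fields correspond to the positional and sequence-context determinants of uORF repressiveness discussed in the literature cited here.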
A recent systematic analysis of 16,907,129 upstream AUGs (uAUGs) in 478 eukaryotic species showed strong purifying selection for the vast majority of uORFs, suggesting that these regions are biologically important (Zhang et al., 2021). In line with these studies, I observed many uORFs localised in regions of high evolutionary conservation. To date, regulation of translation of the downstream (canonical) ORF is the main known role of uORFs. The amino-acid sequences of putative uORF-encoded micropeptides are less conserved. This suggests that the primary function of most uORFs might be fine-tuning downstream ORF translation rather than encoding stable proteins (Zhang et al., 2021).

The balance between uORF and main ORF translation is influenced by a number of factors. Proposed determinants include:

- the total number of uORFs in the 5′UTR (Zhang et al., 2018a; Chew et al., 2016);
- uORF position relative to the canonical start codon, i.e. distance and in-frame or out-of-frame (Johnstone et al., 2016; Chew et al., 2016; Calvo et al., 2009);
- the activity of certain translation factors;
- the Kozak sequence context (Rogozin et al., 2001);
- the adjacent mRNA secondary structure (Chew et al., 2016; Zhang et al., 2019).

A well-studied setting of preferential uORF translation is the stress and immune response. The frequency of uORF leaky scanning (start codon skipping) correlates negatively with the availability of the ternary complex (Orr et al., 2020). A stress-response-associated programme of uORF translation has been linked to the preferential translation of the immune regulator programmed cell death ligand 1 (PD-L1). It has also been linked to a set of genes related to the development of squamous cell carcinoma (Sendoel et al., 2017).

Active translation of long non-coding RNAs

The concept of actively translated long noncoding RNAs (lncRNAs) remains controversial and widely debated.
Critics of this hypothesis argue that evidence of lncRNA translation comes almost exclusively from Ribo-Seq studies. RNase-protected fragments can originate from nonribosomal footprints (Ji et al., 2016), so the abundance of footprints observed in certain lncRNAs might simply be an artefact. Aebersold et al. (2018b) argue that only 69 of all possible amino-acid sequences that could be produced from lncRNAs showed proteomic evidence. In most cases this evidence was limited to a single peptide match or could be explained by pseudogene misannotation or by exons overlapping adjacent protein-coding transcripts. Moreover, most candidate lncRNA-encoded peptides lack a detectable functional protein domain. They also show lower expression and lower evolutionary conservation than known protein-coding genes (Ji et al., 2015).

This, however, provides only a glimpse of a complex system. LncRNAs form a heterogeneous group of 9,640 transcripts (according to ENCODE) with a broad range of biological activities. These include regulation of transcription, mRNA splicing, sequestration of certain mRNAs and chromatin remodelling (Aebersold et al., 2018b; Statello et al., 2021; Derrien et al., 2012). Annotation of lncRNAs has been based predominantly on three criteria (Guttman et al., 2009):

- cDNA alignment to the genome;
- a chromatin signature indicating active transcription;
- the lack of an ORF meeting strict protein-coding criteria.

By convention lncRNAs do not encode canonical proteins, yet their biogenesis is almost indistinguishable from that of coding mRNAs: they are capped, spliced and polyadenylated (Statello et al., 2021). Pervasive mapping of Ribo-Seq reads to lncRNAs has been reported in several studies (Aspden et al., 2014; Bazzini et al., 2014; Ruiz-Orera et al., 2014; Chong et al., 2020; Ouspenskaia et al., 2020; Fields et al., 2015).
Indeed, ribosome occupancy or translation efficiency alone is not sufficient to distinguish truly coding from noncoding regions (Guttman et al., 2013; Xiao et al., 2016). Other metrics have been developed to assist with this task, such as ORFscore and the Ribosome Release Score (RRS). Both ORFscore (the strength of frame preference) and RRS (ribosome clearance after the stop codon) are highly dependent on ORF coverage and may underperform for transcripts with low expression. I expect that combining multiple ORF identification algorithms with hierarchical merging can bring two benefits:

- reduced dependence on a single metric for scoring putative ORFs;
- use of a large Ribo-Seq dataset to increase the sensitivity and specificity of ORF prediction for transcripts across a broad range of expression levels.

Many lncRNA-encoded proteins have turned out to be functional and have been validated experimentally in human, fly and mouse (Jackson et al., 2018; Chen et al., 2020). Peptides encoded by translated lncRNAs were more hydrophobic, with predicted alpha-helical secondary structure, and the transcripts carried a Kozak sequence around the translation initiation site. These features, however, varied between studies, suggesting strong dependence on context or methodology (Ji et al., 2015; Li and Liu, 2019). Ji et al. (2015) showed that lncRNAs with evident features of active translation were almost exclusively cytoplasmic. These lncRNAs also had higher conservation scores and showed evidence of purifying selection on their amino-acid sequence.

A consensus model explaining the apparent coding properties of lncRNAs can be proposed. First, current lncRNA annotations may include true protein-coding transcripts that were missed because of their unusual features. Second, some coding lncRNAs could be bifunctional RNAs, serving both as templates for protein synthesis and as biologically important noncoding transcripts.
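The two coverage-dependent metrics mentioned above can be sketched as follows. The `orf_score` function follows the formula described by Bazzini et al. (2014), as we understand it:

1. footprints are assigned to the three reading frames;
2. a chi-squared-like statistic against a uniform frame distribution is log2-transformed;
3. the score is negated when the annotated frame is not the most covered.

The RRS compares the Ribo-Seq/RNA-Seq ratio in the ORF with the same ratio in the 3′UTR (Guttman et al., 2013). The input formats are assumptions of this sketch, not the implementation of any specific tool. These comprise pre-computed P-site positions and pre-counted reads. The exclusion of the first and last codons and the pseudocount of 1 are also assumptions.

```python
import math
from typing import Iterable


def orf_score(psite_positions: Iterable[int], orf_start: int, orf_end: int) -> float:
    """Frame-preference score for one ORF, after Bazzini et al. (2014).

    psite_positions: 0-based P-site positions of footprints on the transcript.
    orf_start: first nucleotide of the start codon; orf_end: one past the stop codon.
    The first and last codons are skipped (an assumption of this sketch), since
    initiating and terminating ribosomes tend to pile up there.
    """
    frames = [0, 0, 0]
    for p in psite_positions:
        if orf_start + 3 <= p < orf_end - 3:
            frames[(p - orf_start) % 3] += 1
    mean = sum(frames) / 3
    if mean == 0:
        return 0.0  # no coverage: the score is undefined, report 0
    statistic = sum((f - mean) ** 2 / mean for f in frames)
    score = math.log2(statistic + 1)
    # Negative score when the annotated frame (frame 1) is not the most covered.
    if frames[0] < frames[1] or frames[0] < frames[2]:
        score = -score
    return score


def ribosome_release_score(rpf_orf: float, rna_orf: float,
                           rpf_utr3: float, rna_utr3: float,
                           pseudocount: float = 1.0) -> float:
    """(Ribo-Seq / RNA-Seq in the ORF) divided by (Ribo-Seq / RNA-Seq in the 3'UTR)."""
    orf_ratio = (rpf_orf + pseudocount) / (rna_orf + pseudocount)
    utr_ratio = (rpf_utr3 + pseudocount) / (rna_utr3 + pseudocount)
    return orf_ratio / utr_ratio


if __name__ == "__main__":
    in_frame = range(3, 27, 3)   # 8 toy footprints, all in the annotated frame
    uniform = range(3, 27)       # 24 toy footprints spread evenly over frames
    print(orf_score(in_frame, 0, 30), orf_score(uniform, 0, 30))
    print(ribosome_release_score(99, 9, 0, 9))
```

With only a handful of footprints per ORF, both statistics become unstable. A few reads shifting frame can flip the sign of the ORFscore, and a few 3′UTR reads can change the RRS by orders of magnitude. This is the coverage dependence that motivates combining several ORF callers.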
A handful of bifunctional RNAs have already been described in humans, bacteria and a few model organisms, including Xenopus, Drosophila and zebrafish (Aebersold et al., 2018b; Hube et al., 2011; Chooniedass-Kothari et al., 2004; Kondo et al., 2010; Ingolia et al., 2011; Kumari and Sampath, 2015).

The dark side of the human proteome may shed light on noncanonical translation

The topic of noncanonical translation converges with the concept of the dark proteome. The dark proteome is the portion of the cellular proteome that has an unknown structure, is encoded by bona fide noncoding transcripts, or shows atypical folding (Perdigão et al., 2015). Intriguingly, our preliminary analysis showed that noncanonical ORFs are enriched for intrinsically disordered regions (IDRs). IDRs are regions with a compositional bias in the amino-acid sequence: they contain more hydrophilic amino acids and proline residues than structured regions (Dyson and Wright, 2005). IDRs provide a large interaction surface enriched in short linear motifs (SLiMs). These include peptide- or nucleic-acid-binding motifs and sites of post-translational modification (Tompa et al., 2014; Dinkel et al., 2014). IDR-containing proteins can bind and interact with other molecules. They may therefore act as chaperones or co-factors complementing the function of other proteins, and this may include the products of the identified noncanonical ORFs (Dyson and Wright, 2005).

CHAPTER 6

Perspectives

Ribo-Seq as a tool to study genome-wide translation in lymphoid cells

There is no doubt that the technology to study ribosome occupancy with single-nucleotide precision has been a breakthrough for translation studies. Ribo-Seq has been applied to analyse protein synthesis in two ways. Quantitatively, it estimates the translation intensity of a chosen region. Qualitatively, it shows which portions of the transcriptome undergo active translation.
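For the quantitative use of Ribo-Seq, the translation intensity of a gene is commonly summarised as translational efficiency (TE). TE is the ratio of normalised ribosome footprint counts to normalised mRNA counts, and differential translation is the change in log2 TE between conditions. The sketch below computes these point estimates for a single pair of libraries per condition. It assumes counts-per-million normalisation and a pseudocount of 1, and all function names are hypothetical. Dedicated tools such as deltaTE (Chothani et al., 2019) work differently: they model replicate variability, typically as an interaction term in a generalised linear model, which this sketch omits.

```python
import math
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: scale each library to the same total size."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_te(rpf: Dict[str, int], rna: Dict[str, int],
            pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 translational efficiency per gene from one Ribo-Seq/RNA-Seq pair."""
    rpf_cpm, rna_cpm = cpm(rpf), cpm(rna)
    return {
        gene: math.log2((rpf_cpm[gene] + pseudocount) / (rna_cpm[gene] + pseudocount))
        for gene in rpf_cpm
        if gene in rna_cpm
    }


def delta_te(te_control: Dict[str, float], te_treated: Dict[str, float]) -> Dict[str, float]:
    """Change in log2 TE (treated minus control) for genes present in both."""
    return {gene: te_treated[gene] - te_control[gene]
            for gene in te_control if gene in te_treated}


if __name__ == "__main__":
    # Toy counts: g1 gains footprints in the treated sample; mRNA is unchanged.
    control = log2_te({"g1": 100, "g2": 100}, {"g1": 100, "g2": 100})
    treated = log2_te({"g1": 300, "g2": 100}, {"g1": 100, "g2": 100})
    print(delta_te(control, treated))
```

The toy example also shows a property worth keeping in mind. The footprint count of g2 is unchanged, yet its TE falls by about one log2 unit. Libraries are normalised to their total size, so a gain in one gene's share of footprints lowers the apparent share of all others. This relative-quantification caveat matters most when large shifts in gene expression are expected.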
All Ribo-Seq data presented in this thesis achieved the expected quality and reproducibility, similar to those of the original protocol (Ingolia et al., 2012). This would not have been possible without an efficient bioinformatic workflow. The RiboStream pipeline, which I developed for parallel and transparent data processing, was therefore a core component of this project. Despite the widespread application of Ribo-Seq in research, computational workflows have been poorly standardised. Bioinformatic tools developed specifically for Ribo-Seq have also rarely been used beyond their initial publication.

In this thesis, I performed a basic benchmark of available tools for differential translation analysis to make an informed decision on the strategy for analysing our Ribo-Seq data. The most striking outcome of this analysis was the strong dependence of tool performance on experimental design. With the most common design in the literature (2 experimental conditions, 2 replicates each), on average only 20% of truly differentially translated genes could be recalled. Only with as many as 8 replicates did the true positive rate rise to about 70% and the false positive rate stabilise around the desired level of 5%. This suggests that at least 8 replicates are recommended to use the full power of Ribo-Seq for differential translation analysis. Our benchmarking would benefit from an additional Ribo-Seq dataset to validate these observations. It would also be interesting to dissect the characteristics of transcripts arising as false positives or false negatives, which could help in interpreting differential translation results.

Translational landscape of GC B-cell malignant transformation

In Chapter 3 I also studied the translational consequences of deregulated expression of two transcription factors, BCL6 and MYC, in primary GC B-cells.
My experimental design and biological motivation were similar to those of Sendoel et al. (2017): to find differentially translated genes following overexpression of an oncogenic transcription factor. However, I did not observe translational reprogramming as profound as theirs. A possible limitation of this study is that Ribo-Seq provides only relative quantification of ribosome footprint abundance. Such data may be challenging to interpret when massive changes in the gene expression landscape are expected. It would be interesting to repeat the experiment using polysome fractionation combined with RNA-Seq. To date, no systematic comparison of Ribo-Seq and polysome-fractionation-based differential translation analysis has been performed.

Changes in mRNA levels drove the majority of changes in gene expression. The differentially translated genes, in contrast, were mainly associated with cellular housekeeping functions, such as ribosome biogenesis or oxidative phosphorylation. Translational control may have an adaptive role in fine-tuning energy-intensive metabolic processes, but this role is not well understood, nor is its effect on malignant transformation. The concept of ribosome heterogeneity is another fascinating topic worth deeper exploration. BCL6 or MYC may change the translation of transcripts encoding individual ribosomal proteins. If these changes alter the stoichiometry of ribosomal proteins incorporated into ribosomes, this could be an essential mechanism of post-transcriptional regulation.

The role of DDX3X in facilitating MYC-driven lymphomagenesis

In Chapter 4 I revealed that DDX3X loss of function promotes lymphomagenesis by buffering the MYC-driven increase in global protein synthesis and proteotoxic stress. I showed that DDX3X controls the translation of ribosomal proteins and thus the global translational load.
This involves direct binding of DDX3X to transcripts encoding ribosomal proteins, but the exact molecular mechanism remains unclear. It may depend on the RNA-unwinding activity of DDX3X, involve the mTOR/LARP1/5′TOP axis, or be mediated by one of the other versatile functions of DDX3X. Other open questions about the role of DDX3X in MYC-driven lymphomas concern the mechanism of DDX3Y protein upregulation in transformed B-cells. They also concern the functional redundancy between DDX3Y and DDX3X in lymphoid cells.

Exploring the role of the noncanonical proteome in immunity and tumour immunosurveillance

Finally, in Chapter 5 I revealed pervasive translation of noncanonical ORFs in lymphoid cells, which is a captivating finding. This is not the only study uncovering widespread ribosome occupancy of bona fide noncoding regions. It is, however, one of the largest and the first to investigate this topic in primary GC B-cells. The introductory analysis presented in this thesis is undoubtedly hypothesis-generating. The most pressing questions about noncanonical translation concern:

- the extent of protein-level expression;
- the interplay between uORF and main ORF translation;
- the frequency of potentially functional protein domains and short linear motifs, which could shed light on the biological role of the synthesised micropeptides.

The analysis of three external immunopeptidomics datasets revealed that between 2 and 5% of MHC-bound peptides (MAPs) could originate from predicted noncanonical ORFs. Whether this reflects an immunogenic function of noncanonical translation or the inferior performance of other proteomic methods remains to be addressed.

Bibliography

Abate, F., Ambrosio, M. R., Mundo, L., Laginestra, M. A., Fuligni, F., Rossi, M., Zairis, S., Gazaneo, S., De Falco, G., Lazzi, S., et al. (2015). Distinct viral and mutational spectrum of endemic Burkitt lymphoma. PLoS Pathogens, 11(10):e1005158.
Aebersold, R., Agar, J. N., Amster, I. J., Baker, M. S., Bertozzi, C. R., Boja, E. S., Costello, C. E., Cravatt, B. F., Fenselau, C., Garcia, B. A., et al. (2018a). How many human proteoforms are there? Nature Chemical Biology, 14(3):206.
Aebersold, R., Agar, J. N., Amster, I. J., Baker, M. S., Bertozzi, C. R., Boja, E. S., Costello, C. E., Cravatt, B. F., Fenselau, C., Garcia, B. A., et al. (2018b). How many human proteoforms are there? Nature Chemical Biology, 14(3):206–214.
Aitken, C. E. and Lorsch, J. R. (2012). A mechanistic overview of translation initiation in eukaryotes. Nature Structural and Molecular Biology, 19(6):568–576.
Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T., Yu, X., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403(6769):503–511.
Alkallas, R., Lajoie, M., Moldoveanu, D., Hoang, K. V., Lefrançois, P., Lingrand, M., Ahanfeshar-Adams, M., Watters, K., Spatz, A., Zippin, J. H., et al. (2020). Multi-omic analysis reveals significantly mutated genes and DDX3X as a sex-specific tumor suppressor in cutaneous melanoma. Nature Cancer, 1(6):635–652.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410.
Andrews, S. et al. (2017). FastQC: a quality control tool for high throughput sequence data. 2010.
Ansell, S. M., Lesokhin, A. M., Borrello, I., Halwani, A., Scott, E. C., Gutierrez, M., Schuster, S. J., Millenson, M. M., Cattry, D., Freeman, G. J., et al. (2015). PD-1 blockade with nivolumab in relapsed or refractory Hodgkin’s lymphoma. New England Journal of Medicine, 372(4):311–319.
Aspden, J. L., Eyre-Walker, Y. C., Phillips, R. J., Amin, U., Mumtaz, M. A. S., Brocard, M., and Couso, J.-P. (2014). Extensive translation of small open reading frames revealed by Poly-Ribo-Seq. eLife, 3:e03528.
Aviner, R. (2020). The science of puromycin: From studies of ribosome function to applications in biotechnology. Computational and Structural Biotechnology Journal, 18:1074–1083.
Aviner, R., Geiger, T., and Elroy-Stein, O. (2013). Novel proteomic approach (PUNCH-P) reveals cell cycle-specific fluctuations in mRNA translation. Genes & Development, 27(16):1834–1844.
Babaian, A., Rothe, K., Girodat, D., Minia, I., Djondovic, S., Milek, M., Miko, S. E. S., Wieden, H.-J., Landthaler, M., Morin, G. B., et al. (2020). Loss of m1acp3Ψ ribosomal RNA modification is a major feature of cancer. Cell Reports, 31(5):107611.
Barna, M., Pusic, A., Zollo, O., Costa, M., Kondrashov, N., Rego, E., Rao, P. H., and Ruggero, D. (2008). Suppression of Myc oncogenic activity by ribosomal protein haploinsufficiency. Nature, 456(7224):971–975.
Basso, K. and Dalla-Favera, R. (2015). Germinal centres and B cell lymphomagenesis. Nature Reviews Immunology, 15(3):172–184.
Bastide, A. and David, A. (2018). The ribosome, (slow) beating heart of cancer (stem) cell. Oncogenesis, 7(4):1–13.
Battle, A., Khan, Z., Wang, S. H., Mitrano, A., Ford, M. J., Pritchard, J. K., and Gilad, Y. (2015). Impact of regulatory variation from RNA to protein. Science, 347(6222):664–667.
Bazzini, A. A., Johnstone, T. G., Christiano, R., Mackowiak, S. D., Obermayer, B., Fleming, E. S., Vejnar, C. E., Lee, M. T., Rajewsky, N., Walther, T. C., et al. (2014). Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. The EMBO Journal, 33(9):981–993.
Beadle, G. W. and Tatum, E. L. (1941). Genetic control of biochemical reactions in Neurospora. Proceedings of the National Academy of Sciences of the United States of America, 27(11):499.
Bekaert, M., Ivanov, I. P., Atkins, J. F., and Baranov, P. V. (2008). Ornithine decarboxylase antizyme finder (OAF): fast and reliable detection of antizymes with frameshifts in mRNAs. BMC Bioinformatics, 9(1):1–10.
Bekker-Jensen, D. B., Kelstrup, C. D., Batth, T. S., Larsen, S. C., Haldrup, C., Bramsen, J. B., Sørensen, K. D., Høyer, S., Ørntoft, T. F., Andersen, C. L., et al. (2017). An optimized shotgun strategy for the rapid generation of comprehensive human proteomes. Cell Systems, 4(6):587–599.
Berletch, J. B., Yang, F., Xu, J., Carrel, L., and Disteche, C. M. (2011). Genes that escape from X inactivation. Human Genetics, 130(2):237–245.
Bhat, M., Robichaud, N., Hulea, L., Sonenberg, N., Pelletier, J., and Topisirovic, I. (2015). Targeting the translation machinery in cancer. Nature Reviews Drug Discovery, 14(4):261–278.
Blakeley, P., Overton, I. M., and Hubbard, S. J. (2012). Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. Journal of Proteome Research, 11(11):5221–5234.
Bol, G. M., Vesuna, F., Xie, M., Zeng, J., Aziz, K., Gandhi, N., Levine, A., Irving, A., Korz, D., Tantravedi, S., et al. (2015). Targeting DDX3 with a small molecule inhibitor for lung cancer therapy. EMBO Molecular Medicine, 7(5):648–669.
Boon, K., Caron, H. N., Van Asperen, R., Valentijn, L., Hermus, M.-C., Van Sluis, P., Roobeek, I., Weis, I., Voute, P., Schwab, M., et al. (2001). N-myc enhances the expression of a large set of genes functioning in ribosome biogenesis and protein synthesis. The EMBO Journal, 20(6):1383–1393.
Bourgeois, C. F., Mortreux, F., and Auboeuf, D. (2016). The multiple functions of RNA helicases as drivers and regulators of gene expression. Nature Reviews Molecular Cell Biology, 17(7):426–438.
Bouska, A., Bi, C., Lone, W., Zhang, W., Kedwaii, A., Heavican, T., Lachel, C. M., Yu, J., Ferro, R., Eldorghamy, N., et al. (2017). Adult high-grade B-cell lymphoma with Burkitt lymphoma signature: genomic features and potential therapeutic targets. Blood, 130(16):1819–1831.
Brai, A., Fazi, R., Tintori, C., Zamperini, C., Bugli, F., Sanguinetti, M., Stigliano, E., Esté, J., Badia, R., Franco, S., et al. (2016). Human DDX3 protein is a valuable target to develop broad spectrum antiviral agents. Proceedings of the National Academy of Sciences, 113(19):5388–5393.
Brent, M. R. (2005). Genome annotation past, present, and future: how to define an ORF at each locus. Genome Research, 15(12):1777–1786.
Brunet, M. A., Lucier, J.-F., Levesque, M., Leblanc, S., Jacques, J.-F., Al-Saedi, H. R., Guilloy, N., Grenier, F., Avino, M., Fournier, I., et al. (2021). OpenProt 2021: deeper functional annotation of the coding potential of eukaryotic genomes. Nucleic Acids Research, 49(D1):D380–D388.
Buchan, J. R. and Parker, R. (2009). Eukaryotic stress granules: the ins and outs of translation. Molecular Cell, 36(6):932–941.
Buttgereit, F. and Brand, M. D. (1995). A hierarchy of ATP-consuming processes in mammalian cells. Biochemical Journal, 312(1):163–167.
Caeser, R., Di Re, M., Krupka, J. A., Gao, J., Lara-Chica, M., Dias, J. M., Cooke, S. L., Fenner, R., Usheva, Z., Runge, H. F., et al. (2019). Genetic modification of primary human B cells to model high-grade lymphoma. Nature Communications, 10(1):1–16.
Cai, X., Gao, L., Teng, L., Ge, J., Oo, Z. M., Kumar, A. R., Gilliland, D. G., Mason, P. J., Tan, K., and Speck, N. A. (2015). Runx1 deficiency decreases ribosome biogenesis and confers stress resistance to hematopoietic stem and progenitor cells. Cell Stem Cell, 17(2):165–177.
Calado, D. P., Sasaki, Y., Godinho, S. A., Pellerin, A., Köchert, K., Sleckman, B. P., De Alborán, I. M., Janz, M., Rodig, S., and Rajewsky, K. (2012). The cell-cycle regulator c-Myc is essential for the formation and maintenance of germinal centers. Nature Immunology, 13(11):1092–1100.
Calviello, L. and Ohler, U. (2017). Beyond read-counts: Ribo-seq data analysis to understand the functions of the transcriptome. Trends in Genetics, 33(10):728–744.
Calviello, L., Venkataramanan, S., Rogowski, K. J., Wyler, E., Wilkins, K., Tejura, M., Thai, B., Krol, J., Filipowicz, W., Landthaler, M., et al. (2021). DDX3 depletion represses translation of mRNAs with complex 5′ UTRs. Nucleic Acids Research, 49(9):5336–5350.
Calvo, S. E., Pagliarini, D. J., and Mootha, V. K. (2009). Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans. Proceedings of the National Academy of Sciences, 106(18):7507–7512.
Cannizzaro, E., Bannister, A. J., Han, N., Alendar, A., and Kouzarides, T. (2018). DDX3X RNA helicase affects breast cancer cell cycle progression by regulating expression of KLF4. FEBS Letters, 592(13):2308–2322.
Casey, S. C., Baylot, V., and Felsher, D. W. (2018). The MYC oncogene is a global regulator of the immune response. Blood, The Journal of the American Society of Hematology, 131(18):2007–2015.
Casey, S. C., Tong, L., Li, Y., Do, R., Walz, S., Fitzgerald, K. N., Gouw, A. M., Baylot, V., Gütgemann, I., Eilers, M., et al. (2016). MYC regulates the antitumor immune response through CD47 and PD-L1. Science, 352(6282):227–231.
Cech, T. R. (2000). The ribosome is a ribozyme. Science, 289(5481):878–879.
Chambers, M. C., Maclean, B., Burke, R., Amodei, D., Ruderman, D. L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J., et al. (2012). A cross-platform toolkit for mass spectrometry and proteomics. Nature Biotechnology, 30(10):918–920.
Chapuy, B., Stewart, C., Dunford, A. J., Kim, J., Kamburov, A., Redd, R. A., Lawrence, M. S., Roemer, M. G., Li, A. J., Ziepert, M., et al. (2018). Molecular subtypes of diffuse large B cell lymphoma are associated with distinct pathogenic mechanisms and outcomes. Nature Medicine, 24(5):679–690.
Chen, H., Liu, H., and Qing, G. (2018). Targeting oncogenic Myc as a strategy for cancer treatment. Signal Transduction and Targeted Therapy, 3(1):1–7.
Chen, J., Brunner, A.-D., Cogan, J. Z., Nuñez, J. K., Fields, A. P., Adamson, B., Itzhak, D. N., Li, J. Y., Mann, M., Leonetti, M. D., et al. (2020). Pervasive functional translation of noncanonical human open reading frames. Science, 367(6482):1140–1146.
Chew, G.-L., Pauli, A., and Schier, A. F. (2016). Conservation of uORF repressiveness and sequence features in mouse, human and zebrafish. Nature Communications, 7(1):1–10.
Chick, J. M., Kolippakkam, D., Nusinow, D. P., Zhai, B., Rad, R., Huttlin, E. L., and Gygi, S. P. (2015). A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nature Biotechnology, 33(7):743–749.
Chong, C., Müller, M., Pak, H., Harnett, D., Huber, F., Grun, D., Leleu, M., Auger, A., Arnaud, M., Stevenson, B. J., et al. (2020). Integrated proteogenomic deep sequencing and analytics accurately identify non-canonical peptides in tumor immunopeptidomes. Nature Communications, 11(1):1–21.
Chooniedass-Kothari, S., Emberley, E., Hamedani, M., Troup, S., Wang, X., Czosnek, A., Hube, F., Mutawe, M., Watson, P., and Leygue, E. (2004). The steroid receptor RNA activator is the first functional RNA encoding a protein. FEBS Letters, 566(1-3):43–47.
Chothani, S., Adami, E., Ouyang, J. F., Viswanathan, S., Hubner, N., Cook, S. A., Schafer, S., and Rackham, O. J. (2019). deltaTE: detection of translationally regulated genes by integrative analysis of Ribo-seq and RNA-seq data. Current Protocols in Molecular Biology, 129(1):e108.
Churbanov, A., Rogozin, I. B., Babenko, V. N., Ali, H., and Koonin, E. V. (2005). Evolutionary conservation suggests a regulatory function of AUG triplets in 5′-UTRs of eukaryotic genes. Nucleic Acids Research, 33(17):5512–5520.
Ci, W., Polo, J. M., Cerchietti, L., Shaknovich, R., Wang, L., Yang, S. N., Ye, K., Farinha, P., Horsman, D. E., Gascoyne, R. D., et al. (2009). The BCL6 transcriptional program features repression of multiple oncogenes in primary B cells and is deregulated in DLBCL. Blood, 113(22):5536–5548.
Clarke, H. J., Chambers, J. E., Liniker, E., and Marciniak, S. J. (2014). Endoplasmic reticulum stress in malignancy. Cancer Cell, 25(5):563–573.
Consortium, G. et al. (2020). The GTEx consortium atlas of genetic regulatory effects across human tissues. Science, 369(6509):1318–1330.
Costa, L. J., Xavier, A. C., Wahlquist, A. E., and Hill, E. G. (2013). Trends in survival of patients with Burkitt lymphoma/leukemia in the USA: an analysis of 3691 cases. Blood, 121(24):4861–4866.
Costa-Mattioli, M. and Walter, P. (2020). The integrated stress response: From mechanism to disease. Science, 368(6489).
Cotton, A. M., Price, E. M., Jones, M. J., Balaton, B. P., Kobor, M. S., and Brown, C. J. (2015). Landscape of DNA methylation on the X chromosome reflects CpG density, functional chromatin state and X-chromosome inactivation. Human Molecular Genetics, 24(6):1528–1539.
Cox, J. and Mann, M. (2008). MaxQuant enables high peptide identification rates, individualized ppb-range mass accuracies and proteome-wide protein quantification. Nature Biotechnology, 26(12):1367–1372.
Crick, F. H. (1958). On protein synthesis. In Symp Soc Exp Biol, volume 12, page 8.
Cruciat, C.-M., Dolde, C., De Groot, R. E., Ohkawara, B., Reinhard, C., Korswagen, H. C., and Niehrs, C. (2013). RNA helicase DDX3 is a regulatory subunit of casein kinase 1 in Wnt–β-catenin signaling. Science, 339(6126):1436–1441.
Cucco, F., Barrans, S., Sha, C., Clipson, A., Crouch, S., Dobson, R., Chen, Z., Thompson, J. S., Care, M. A., Cummin, T., et al. (2020). Distinct genetic changes reveal evolutionary history and heterogeneous molecular grade of DLBCL with MYC/BCL2 double-hit. Leukemia, 34(5):1329–1341.
Cuevas, M. V. R., Hardy, M.-P., Hollý, J., Bonneil, É., Durette, C., Courcelles, M., Lanoix, J., Côté, C., Staudt, L. M., Lemieux, S., et al. (2021). Most non-canonical proteins uniquely populate the proteome or immunopeptidome. Cell Reports, 34(10):108815.
Culjkovic-Kraljacic, B., Fernando, T. M., Marullo, R., Calvo-Vidal, N., Verma, A., Yang, S., Tabbò, F., Gaudiano, M., Zahreddine, H., Goldstein, R. L., et al. (2016). Combinatorial targeting of nuclear export and translation of RNA inhibits aggressive B-cell lymphomas. Blood, 127(7):858–868.
Cutmore, Krupka, Hodson (2022). Molecular profiling in diffuse large B cell lymphoma – challenges and opportunities. Under review.
Dai, M.-S., Sun, X.-X., and Lu, H. (2010). Ribosomal protein L11 associates with c-Myc at 5S rRNA and tRNA genes and regulates their expression. Journal of Biological Chemistry, 285(17):12587–12594.
Dang, C. V. (2012). Myc on the path to cancer. Cell, 149(1):22–35.
de Loubresse, N. G., Prokhorova, I., Holtkamp, W., Rodnina, M. V., Yusupova, G., and Yusupov, M. (2014). Structural basis for the inhibition of the eukaryotic ribosome. Nature, 513(7519):517–522.
De Silva, N. S. and Klein, U. (2015). Dynamics of B cells in germinal centres. Nature Reviews Immunology, 15(3):137–148.
Deeb, S. J., Cox, J., Schmidt-Supprian, M., and Mann, M. (2014). N-linked glycosylation enrichment for in-depth cell surface proteomics of diffuse large B-cell lymphoma subtypes. Molecular & Cellular Proteomics, 13(1):240–251.
Deeb, S. J., D’Souza, R. C., Cox, J., Schmidt-Supprian, M., and Mann, M. (2012). Super-SILAC allows classification of diffuse large B-cell lymphoma subtypes by their protein expression profiles. Molecular & Cellular Proteomics, 11(5):77–89.
Delaidelli, A., Leprivier, G., and Sorensen, P. H. (2017). eEF2K protects MYCN-amplified cells from starvation. Cell Cycle, 16(18):1633.
Derrien, T., Johnson, R., Bussotti, G., Tanzer, A., Djebali, S., Tilgner, H., Guernec, G., Martin, D., Merkel, A., Knowles, D. G., et al. (2012). The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Research, 22(9):1775–1789.
Dersh, D., Hollý, J., and Yewdell, J. W. (2021). A few good peptides: MHC class I-based cancer immunosurveillance and immunoevasion. Nature Reviews Immunology, 21(2):116–128.
Desnoyers, G., Frost, L. D., Courteau, L., Wall, M. L., and Lewis, S. M. (2015).
Decreased eIF3e expression can mediate epithelial-to-mesenchymal transition through activation of the TGF signaling pathway. Molecular Cancer Research, 13(10):1421–1430. Deutsch, E. W., Csordas, A., Sun, Z., Jarnuczak, A., Perez-Riverol, Y., Ternent, T., Campbell, D. S., Bernal-Llinares, M., Okuda, S., Kawano, S., et al. (2016). The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Research, page gkw936. Dinkel, H., Van Roey, K., Michael, S., Davey, N. E., Weatheritt, R. J., Born, D., Speck, T., Kru¨ger, D., Grebnev, G., Kuban´, M., et al. (2014). The eukaryotic linear motif resource ELM: 10 years and counting. Nucleic Acids Research, 42(D1):D259–D266. Dittmar, K. A., Goodenbour, J. M., and Pan, T. (2006). Tissue-specific di↵erences in human transfer RNA expression. PLoS genetics, 2(12):e221. Ditton, H., Zimmer, J., Kamp, C., Rajpert-De Meyts, E., and Vogt, P. (2004). The AZFa gene dby (DDX3Y) is widely transcribed but the protein is limited to the male germ cells by translation control. Human Molecular Genetics, 13(19):2333–2341. Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T. R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1):15–21. Doench, J. G., Fusi, N., Sullender, M., Hegde, M., Vaimberg, E. W., Donovan, K. F., Smith, I., Tothova, Z., Wilen, C., Orchard, R., et al. (2016). Optimized sgRNA design to maximize activity and minimize o↵-target e↵ects of CRISPR-Cas9. Nature Biotechnology, 34(2):184–191. Doench, J. G., Hartenian, E., Graham, D. B., Tothova, Z., Hegde, M., Smith, I., Sullender, M., Ebert, B. L., Xavier, R. J., and Root, D. E. (2014). Rational design of highly active sgRNAs for CRISPR-Cas9–mediated gene inactivation. Nature Biotechnology, 32(12):1262–1267. Dominguez-Sola, D., Victora, G. D., Ying, C. Y., Phan, R. T., Saito, M., Nussenzweig, M. C., and Dalla-Favera, R. (2012). 
The proto-oncogene MYC is required for selection in the germinal center and cyclic reentry. Nature Immunology, 13(11):1083–1091. Dresios, J., Chappell, S. A., Zhou, W., and Mauro, V. P. (2006). An mRNA-rRNA base-pairing mechanism for translation initiation in eukaryotes. Nature structural & Molecular Biology, 13(1):30–34. 166 Duncan, C. D. and Mata, J. (2017). E↵ects of cycloheximide on the interpretation of ribosome profiling experiments in Schizosaccharomyces pombe. Scientific Reports, 7(1):1–11. Dunford, A., Weinstock, D. M., Savova, V., Schumacher, S. E., Cleary, J. P., Yoda, A., Sullivan, T. J., Hess, J. M., Gimelbrant, A. A., Beroukhim, R., et al. (2017). Tumor- suppressor genes that escape from X-inactivation contribute to cancer sex bias. Nature Genetics, 49(1):10–16. Dutt, S., Narla, A., Lin, K., Mullally, A., Abayasekara, N., Megerdichian, C., Wilson, F. H., Currie, T., Khanna-Gupta, A., Berliner, N., et al. (2011). Haploinsuciency for ribosomal protein genes causes selective activation of p53 in human erythroid progenitor cells. Blood, 117(9):2567–2576. Dyson, H. J. and Wright, P. E. (2005). Intrinsically unstructured proteins and their functions. Nature Reviews Molecular Cell biology, 6(3):197–208. Eng, J. K., Jahan, T. A., and Hoopmann, M. R. (2013). Comet: an open-source MS/MS sequence database search tool. Proteomics, 13(1):22–24. Ennishi, D., Jiang, A., Boyle, M., Collinge, B., Grande, B. M., Ben-Neriah, S., Rushton, C., Tang, J., Thomas, N., Slack, G. W., et al. (2019). Double-hit gene expression signature defines a distinct subgroup of germinal center b-cell-like di↵use large b-cell lymphoma. Journal of Clinical Oncology, 37(3):190. Eswarappa, S. M., Potdar, A. A., Koch, W. J., Fan, Y., Vasu, K., Lindner, D., Willard, B., Graham, L. M., DiCorleto, P. E., and Fox, P. L. (2014). Programmed translational readthrough generates antiangiogenic VEGF-Ax. Cell, 157(7):1605–1618. Etzioni, A. and Ochs, H. D. (2004). The hyper IgM syndrome—an evolving story. 
Pediatric research, 56(4):519–525. Ewels, P., Magnusson, M., Lundin, S., and Ka¨ller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19):3047–3048. Ferlin, A., Moro, E., Rossi, A., Dallapiccola, B., and Foresta, C. (2003). The human Y chromosome’s azoospermia factor b (azfb) region: sequence, structure, and deletion analysis in infertile men. Journal of Medical Genetics, 40(1):18–24. Fields, A. P., Rodriguez, E. H., Jovanovic, M., Stern-Ginossar, N., Haas, B. J., Mertins, P., Raychowdhury, R., Hacohen, N., Carr, S. A., Ingolia, N. T., et al. (2015). A regression- based analysis of ribosome-profiling data reveals a conserved complexity to mammalian translation. Molecular Cell, 60(5):816–827. 167 Filipowicz, W., Bhattacharyya, S. N., and Sonenberg, N. (2008). Mechanisms of post- transcriptional regulation by microRNAs: are the answers in sight? Nature Reviews Genetics, 9(2):102–114. Floor, S. N., Condon, K. J., Sharma, D., Jankowsky, E., and Doudna, J. A. (2016). Autoinhibitory interdomain interactions and subfamily-specific extensions redefine the catalytic core of the human dead-box protein ddx3. Journal of Biological Chemistry, 291(5):2412–2421. Foresta, C., Ferlin, A., and Moro, E. (2000a). Deletion and expression analysis of AZFa genes on the human Y chromosome revealed a major role for DBY in male infertility. Human Molecular Genetics, 9(8):1161–1169. Foresta, C., Ferlin, A., and Moro, E. (2000b). Deletion and expression analysis of AZFa genes on the human Y chromosome revealed a major role for DBY in male infertility. Human Molecular Genetics, 9(8):1161–1169. Frankish, A., Diekhans, M., Ferreira, A.-M., Johnson, R., Jungreis, I., Loveland, J., Mudge, J. M., Sisu, C., Wright, J., Armstrong, J., et al. (2019). GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Research, 47(D1):D766–D773. Furic, L., Rong, L., Larsson, O., Koumakpayi, I. 
H., Yoshida, K., Brueschke, A., Petroulakis, E., Robichaud, N., Pollak, M., Gaboury, L. A., et al. (2010). eIF4E phosphorylation promotes tumorigenesis and is associated with prostate cancer progression. Proceedings of the National Academy of Sciences, 107(32):14134–14139. Gani, R. (1976). The nucleoli of cultured human lymphocytes: I. nucleolar morphology in relation to transformation and the DNA cycle. Experimental Cell Research, 97(2):249– 258. Gao, H., Sun, X., and Rao, Y. (2020). PROTAC technology: opportunities and challenges. ACS Medicinal Chemistry Letters, 11(3):237–240. Gao, X., Wan, J., Liu, B., Ma, M., Shen, B., and Qian, S.-B. (2015). Quantitative profiling of initiating ribosomes in vivo. Nature Methods, 12(2):147–153. Gao, Z., Herrera-Carrillo, E., and Berkhout, B. (2018). Delineation of the exact transcrip- tion termination signal for type 3 polymerase III. Molecular Therapy-Nucleic Acids, 10:36–44. Geissler, R., Golbik, R. P., and Behrens, S.-E. (2012). The DEAD-box helicase DDX3 supports the assembly of functional 80s ribosomes. Nucleic Acids Research, 40(11):4998– 5011. 168 Genuth, N. R. and Barna, M. (2018). The discovery of ribosome heterogeneity and its implications for gene regulation and organisms life. Molecular Cell, 71(3):364–374. Gerashchenko, M. V. and Gladyshev, V. N. (2014). Translation inhibitors cause abnormal- ities in ribosome profiling experiments. Nucleic Acids Research, 42(17):e134–e134. Gerashchenko, M. V. and Gladyshev, V. N. (2017). Ribonuclease selection for ribosome profiling. Nucleic Acids Research, 45(2):e6–e6. Ghaddar, N., Wang, S., Woodvine, B., Krishnamoorthy, J., van Hoef, V., Darini, C., Kazimierczak, U., Ah-Son, N., Popper, H., Johnson, M., et al. (2021). The integrated stress response is tumorigenic and constitutes a therapeutic liability in KRAS-driven lung cancer. Nature Communications, 12(1):1–15. Gingras, A.-C., Raught, B., and Sonenberg, N. (1999). 
eIF4 initiation factors: e↵ectors of mRNA recruitment to ribosomes and regulators of translation. Annual review of biochemistry, 68(1):913–963. God, J. M., Cameron, C., Figueroa, J., Amria, S., Hossain, A., Kempkes, B., Bornkamm, G. W., Stuart, R. K., Blum, J. S., and Haque, A. (2015). Elevation of c-MYC disrupts HLA class II–mediated immune recognition of human B cell tumors. The Journal of Immunology, 194(4):1434–1445. Gong, Krupka, Gao, J., Grigoropoulos, N. F., Screen, M., Usheva, Z., Cucco, F., Barrans, S., Painter, D., Mohammed, M., et al. (2021). Sequential inverse dysregulation of the RNA helicases DDX3X and DDX3Y facilitates MYC-driven lymphomagenesis. Good-Jacobson, K. L., Szumilas, C. G., Chen, L., Sharpe, A. H., Tomayko, M. M., and Shlomchik, M. J. (2010). PD-1 regulates germinal center B cell survival and the formation and anity of long-lived plasma cells. Nature Immunology, 11(6):535–542. Goodman, A., Patel, S. P., and Kurzrock, R. (2017). PD-1–PD-L1 immune-checkpoint blockade in B-cell lymphomas. Nature Reviews Clinical Oncology, 14(4):203–220. Grande, B. M., Gerhard, D. S., Jiang, A., Griner, N. B., Abramson, J. S., Alexander, T. B., Allen, H., Ayers, L. W., Bethony, J. M., Bhatia, K., et al. (2019). Genome-wide discovery of somatic coding and noncoding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood, 133(12):1313–1324. Grandori, C., Gomez-Roman, N., Felton-Edkins, Z. A., Ngouenet, C., Galloway, D. A., Eisenman, R. N., and White, R. J. (2005). c-Myc binds to human ribosomal DNA and stimulates transcription of rRNA genes by RNA polymerase I. Nature Cell Biology, 7(3):311–318. 169 Green, M. R., Monti, S., Rodig, S. J., Juszczynski, P., Currie, T., O’Donnell, E., Chapuy, B., Takeyama, K., Neuberg, D., Golub, T. R., et al. (2010). Integrative analysis reveals selective 9p24. 
1 amplification, increased PD-1 ligand expression, and further induction via JAK2 in nodular sclerosing Hodgkin lymphoma and primary mediastinal large b-cell lymphoma. Blood, 116(17):3268–3277. Griss, J., Perez-Riverol, Y., Lewis, S., Tabb, D. L., Dianes, J. A., Del-Toro, N., Rurik, M., Walzer, M., Kohlbacher, O., Hermjakob, H., et al. (2016). Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nature Methods, 13(8):651–656. Guttman, M., Amit, I., Garber, M., French, C., Lin, M. F., Feldser, D., Huarte, M., Zuk, O., Carey, B. W., Cassady, J. P., et al. (2009). Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature, 458(7235):223–227. Guttman, M., Russell, P., Ingolia, N. T., Weissman, J. S., and Lander, E. S. (2013). Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins. Cell, 154(1):240–251. Hafner, M., Katsantoni, M., Ko¨ster, T., Marks, J., Mukherjee, J., Staiger, D., Ule, J., and Zavolan, M. (2021). CLIP and complementary methods. Nature Reviews Methods Primers, 1(1):1–23. Ha¨mmerl, L., Colombet, M., Rochford, R., Ogwang, D. M., and Parkin, D. M. (2019). The burden of Burkitt lymphoma in Africa. Infectious Agents and Cancer, 14(1):1–6. Hanahan, D. and Weinberg, R. A. (2011). Hallmarks of cancer: the next generation. cell, 144(5):646–674. Hanson, G. and Coller, J. (2018). Codon optimality, bias and usage in translation and mRNA decay. Nature reviews Molecular cell biology, 19(1):20. Hariri, F., Arguello, M., Volpon, L., Culjkovic-Kraljacic, B., Nielsen, T. H., Hiscott, J., Mann, K. K., and Borden, K. L. (2013). The eukaryotic translation initiation factor eif4e is a direct transcriptional target of NF-B and is aberrantly regulated in acute myeloid leukemia. Leukemia, 27(10):2047–2055. He, Y., Zhang, D., Yang, Y., Wang, X., Zhao, X., Zhang, P., Zhu, H., Xu, N., and Liang, S. (2018). 
A double-edged function of DDX3, as an oncogene or tumor suppressor, in cancer progression. Oncology reports, 39(3):883–892. Hellen, C. U. (2018). Translation termination and ribosome recycling in eukaryotes. Cold Spring Harbor perspectives in biology, 10(10):a032656. 170 Helmrich, A., Ballarino, M., and Tora, L. (2011). Collisions between replication and transcription complexes cause common fragile site instability at the longest human genes. Molecular cell, 44(6):966–977. Henderson, A., Warburton, D., and Atwood, K. (1972). Location of ribosomal DNA in the human chromosome complement. Proceedings of the National Academy of Sciences, 69(11):3394–3398. Hershey, J. W. B., Sonenberg, N., and Mathews, M. B. (2012). Principles of Translational Control: An Overview. Cold Spring Harbor Perspectives in Biology, 4(12):a011528– a011528. Hetz, C. (2012). The unfolded protein response: controlling cell fate decisions under ER stress and beyond. Nature reviews Molecular cell biology, 13(2):89–102. Hetz, C., Chevet, E., and Oakes, S. A. (2015). Proteostasis control by the unfolded protein response. Nature Cell Biology, 17(7):829–838. Hilliker, A., Gao, Z., Jankowsky, E., and Parker, R. (2011). The DEAD-box protein Ded1 modulates translation by the formation and resolution of an eIF4F-mRNA complex. Molecular Cell, 43(6):962–972. Hinnebusch, A. G. (2005). Translational regulation of GCN4 and the general amino acid control of yeast. Annu. Rev. Microbiol., 59:407–450. Hinnebusch, A. G. (2014). The scanning mechanism of eukaryotic translation initiation. Annual Review of Biochemistry, 83:779–812. Hinnebusch, A. G., Ivanov, I. P., and Sonenberg, N. (2016). Translational control by 5-untranslated regions of eukaryotic mRNAs. Science, 352(6292):1413–1416. Ho, J. S., Ma, W., Mao, D. Y., and Benchimol, S. (2005). p53-dependent transcriptional repression of c-myc is required for G1 cell cycle arrest. Molecular and Cellular Biology, 25(17):7423–7431. 
Ho, Y., Li, X., Jamison, S., Harding, H. P., McKinnon, P. J., Ron, D., and Lin, W. (2016). PERK activation promotes medulloblastoma tumorigenesis by attenuating premalignant granule cell precursor apoptosis. The American Journal of Pathology, 186(7):1939–1951.
Horvilleur, E., Sbarrato, T., Hill, K., Spriggs, R., Screen, M., Goodrem, P., Sawicka, K., Chaplin, L., Touriol, C., Packham, G., et al. (2014). A role for eukaryotic initiation factor 4B overexpression in the pathogenesis of diffuse large B-cell lymphoma. Leukemia, 28(5):1092–1102.
Horvilleur, E., Wilson, L. A., and Willis, A. E. (2010). Translation deregulation in B-cell lymphomas.
Howden, A. J., Geoghegan, V., Katsch, K., Efstathiou, G., Bhushan, B., Boutureira, O., Thomas, B., Trudgian, D. C., Kessler, B. M., Dieterich, D. C., et al. (2013). QuaNCAT: quantitating proteome dynamics in primary cells. Nature Methods, 10(4):343–346.
Hsieh, A. C., Liu, Y., Edlind, M. P., Ingolia, N. T., Janes, M. R., Sher, A., Shi, E. Y., Stumpf, C. R., Christensen, C., Bonham, M. J., Wang, S., Ren, P., Martin, M., Jessen, K., Feldman, M. E., Weissman, J. S., Shokat, K. M., Rommel, C., and Ruggero, D. (2012). The translational landscape of mTOR signalling steers cancer initiation and metastasis. Nature, 485(7396):55–61.
Hu, F., Lu, J., Munoz, M. D., Saveliev, A., and Turner, M. (2021). ORFLine: a bioinformatic pipeline to prioritise small open reading frames identifies candidate secreted small proteins from lymphocytes. bioRxiv.
Huang, L., Luo, R., Li, J., Wang, D., Zhang, Y., Liu, L., Zhang, N., Xu, X., Lu, B., and Zhao, K. (2020). β-catenin promotes NLRP3 inflammasome activation via increasing the association between NLRP3 and ASC. Molecular Immunology, 121:186–194.
Hube, F., Velasco, G., Rollin, J., Furling, D., and Francastel, C. (2011). Steroid receptor RNA activator protein binds to and counteracts SRA RNA-mediated activation of MyoD and muscle differentiation. Nucleic Acids Research, 39(2):513–525.
Hussmann, J. A., Patchett, S., Johnson, A., Sawyer, S., and Press, W. H. (2015). Understanding biases in ribosome profiling experiments reveals signatures of translation dynamics in yeast. PLoS Genetics, 11(12):e1005732.
Ingolia, N. T., Brar, G. A., Rouskin, S., McGeachy, A. M., and Weissman, J. S. (2012). The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nature Protocols, 7(8):1534–1550.
Ingolia, N. T., Ghaemmaghami, S., Newman, J. R., and Weissman, J. S. (2009). Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science, 324(5924):218–223.
Ingolia, N. T., Lareau, L. F., and Weissman, J. S. (2011). Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell, 147(4):789–802.
Iossifov, I., O’Roak, B. J., Sanders, S. J., Ronemus, M., Krumm, N., Levy, D., Stessman, H. A., Witherspoon, K. T., Vives, L., Patterson, K. E., et al. (2014). The contribution of de novo coding mutations to autism spectrum disorder. Nature, 515(7526):216–221.
Ivanov, P., Emara, M. M., Villen, J., Gygi, S. P., and Anderson, P. (2011). Angiogenin-Induced tRNA Fragments Inhibit Translation Initiation. Molecular Cell, 43(4):613–623.
Ivanov, P., O’Day, E., Emara, M. M., Wagner, G., Lieberman, J., and Anderson, P. (2014). G-quadruplex structures contribute to the neuroprotective effects of angiogenin-induced tRNA fragments. Proceedings of the National Academy of Sciences, 111(51):18201–18206.
Iwasaki, S. and Ingolia, N. T. (2017). The growing toolbox for protein synthesis studies. Trends in Biochemical Sciences, 42(8):612–624.
Jackson, R., Kroehling, L., Khitun, A., Bailis, W., Jarret, A., York, A. G., Khan, O. M., Brewer, J. R., Skadow, M. H., Duizer, C., et al. (2018). The translation of non-canonical open reading frames controls mucosal immunity. Nature, 564(7736):434–438.
Jackson, R. J., Hellen, C. U., and Pestova, T. V. (2010). The mechanism of eukaryotic translation initiation and principles of its regulation. Nature Reviews Molecular Cell Biology, 11(2):113–127.
Jain, S., Wheeler, J. R., Walters, R. W., Agrawal, A., Barsic, A., and Parker, R. (2016). ATPase-modulated stress granules contain a diverse proteome and substructure. Cell, 164(3):487–498.
Jaroszynski, L., Zimmer, J., Fietz, D., Bergmann, M., Kliesch, S., and Vogt, P. (2011). Translational control of the AZFa gene DDX3Y by 5′UTR exon-T extension. International Journal of Andrology, 34(4pt1):313–326.
Ji, Z. (2018). RibORF: identifying genome-wide translated open reading frames using ribosome profiling. Current Protocols in Molecular Biology, 124(1):e67.
Ji, Z., Song, R., Huang, H., Regev, A., and Struhl, K. (2016). Transcriptome-scale RNase-footprinting of RNA-protein complexes. Nature Biotechnology, 34(4):410–413.
Ji, Z., Song, R., Regev, A., and Struhl, K. (2015). Many lncRNAs, 5′UTRs, and pseudogenes are translated and some are likely to express functional proteins. eLife, 4:e08890.
Jiang, L., Gu, Z.-H., Yan, Z.-X., Zhao, X., Xie, Y.-Y., Zhang, Z.-G., Pan, C.-M., Hu, Y., Cai, C.-P., Dong, Y., et al. (2015). Exome sequencing identifies somatic mutations of DDX3X in natural killer/T-cell lymphoma. Nature Genetics, 47(9):1061–1066.
Johannes, G., Carter, M. S., Eisen, M. B., Brown, P. O., and Sarnow, P. (1999). Identification of eukaryotic mRNAs that are translated at reduced cap binding complex eIF4F concentrations using a cDNA microarray. Proceedings of the National Academy of Sciences, 96(23):13118–13123.
Johnson, L. F., Levis, R., Abelson, H. T., Green, H., and Penman, S. (1976). Changes in RNA in relation to growth of the fibroblast. IV. Alterations in the production and processing of mRNA and rRNA in resting and growing cells. The Journal of Cell Biology, 71(3):933–938.
Johnson-Kerner, B., Blok, L. S., Suit, L., Thomas, J., Kleefstra, T., and Sherr, E. H. (2020). DDX3X-related neurodevelopmental disorder. GeneReviews® [Internet].
Johnston, H. E., Carter, M. J., Larrayoz, M., Clarke, J., Garbis, S. D., Oscier, D., Strefford, J. C., Steele, A. J., Walewska, R., and Cragg, M. S. (2018). Proteomics profiling of CLL versus healthy B-cells identifies putative therapeutic targets and a subtype-independent signature of spliceosome dysregulation. Molecular & Cellular Proteomics, 17(4):776–791.
Johnstone, T. G., Bazzini, A. A., and Giraldez, A. J. (2016). Upstream ORFs are prevalent translational repressors in vertebrates. The EMBO Journal, 35(7):706–723.
Jones, D. T., Jäger, N., Kool, M., Zichner, T., Hutter, B., Sultan, M., Cho, Y.-J., Pugh, T. J., Hovestadt, V., Stütz, A. M., et al. (2012). Dissecting the genomic complexity underlying medulloblastoma. Nature, 488(7409):100–105.
Joshi-Tope, G., Gillespie, M., Vastrik, I., D’Eustachio, P., Schmidt, E., de Bono, B., Jassal, B., Gopinath, G., Wu, G., Matthews, L., et al. (2005). Reactome: a knowledgebase of biological pathways. Nucleic Acids Research, 33(suppl 1):D428–D432.
Jovanovic, M., Rooney, M. S., Mertins, P., Przybylski, D., Chevrier, N., Satija, R., Rodriguez, E. H., Fields, A. P., Schwartz, S., Raychowdhury, R., et al. (2015). Dynamic profiling of the protein life cycle in response to pathogens. Science, 347(6226).
Kampen, K. R., Sulima, S. O., Vereecke, S., and De Keersmaecker, K. (2020). Hallmarks of ribosomopathies. Nucleic Acids Research, 48(3):1013–1028.
Kapadia, B., Nanaji, N. M., Bhalla, K., Bhandary, B., Lapidus, R., Beheshti, A., Evens, A. M., and Gartenhaus, R. B. (2018). Fatty acid synthase induced S6Kinase facilitates USP11-eIF4B complex formation for sustained oncogenic translation in DLBCL. Nature Communications, 9(1):1–15.
Karginov, F. V. and Hannon, G. J. (2013). Remodeling of Ago2–mRNA interactions upon cellular stress reflects miRNA complementarity and correlates with altered translation rates. Genes & Development, 27(14):1624–1632.
Kataoka, K., Shiraishi, Y., Takeda, Y., Sakata, S., Matsumoto, M., Nagano, S., Maeda, T., Nagata, Y., Kitanaka, A., Mizuno, S., et al. (2016). Aberrant PD-L1 expression through 3-UTR disruption in multiple cancers. Nature, 534(7607):402–406. 174 Kaymaz, Y., Oduor, C. I., Yu, H., Otieno, J. A., Ong’echa, J. M., Moormann, A. M., and Bailey, J. A. (2017). Comprehensive transcriptome and mutational profiling of endemic Burkitt lymphoma reveals EBV type–specific di↵erences. Molecular Cancer Research, 15(5):563–576. Kearse, M. G. and Wilusz, J. E. (2017). Non-AUG translation: a new start for protein synthesis in eukaryotes. Genes & Development, 31(17):1717–1731. Kellaris, G., Khan, K., Baig, S. M., Tsai, I.-C., Zamora, F. M., Ruggieri, P., Natowicz, M. R., and Katsanis, N. (2018). A hypomorphic inherited pathogenic variant in DDX3X causes male intellectual disability with additional neurodevelopmental and neurodegenerative features. Human Genomics, 12(1):1–9. Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., and Haussler, D. (2002). The human genome browser at UCSC. Genome research, 12(6):996–1006. Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S., and Karolchik, D. (2010). BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics, 26(17):2204– 2207. Ketteler, R. (2012). On programmed ribosomal frameshifting: the alternative proteomes. Frontiers in genetics, 3:242. Kevil, C. G., De Benedetti, A., Payne, D. K., Coe, L. L., Laroux, F. S., and Alexander, J. S. (1996). Translational regulation of vascular permeability factor by eukaryotic initiation factor 4E: implications for tumor angiogenesis. International Journal of Cancer, 65(6):785–790. Khodadoust, M. S., Olsson, N., Chen, B., Sworder, B., Shree, T., Liu, C. L., Zhang, L., Czerwinski, D. K., Davis, M. M., Levy, R., et al. (2019). B-cell lymphomas present immunoglobulin neoantigens. Blood, 133(8):878–881. Khodadoust, M. S., Olsson, N., Wagar, L. 
E., Haabeth, O. A., Chen, B., Swaminathan, K., Rawson, K., Liu, C. L., Steiner, D., Lund, P., et al. (2017). Antigen presentation profiling reveals recognition of lymphoma immunoglobulin neoantigens. Nature, 543(7647):723– 727. Kim, J. and Guan, K. L. (2019a). mTOR as a central hub of nutrient signalling and cell growth. Nature Cell Biology, 21(1):63–71. Kim, J. and Guan, K.-L. (2019b). mTOR as a central hub of nutrient signalling and cell growth. Nature Cell Biology, 21(1):63–71. 175 Kiyasu, J., Miyoshi, H., Hirata, A., Arakawa, F., Ichikawa, A., Niino, D., Sugita, Y., Yufu, Y., Choi, I., Abe, Y., et al. (2015). Expression of programmed cell death ligand 1 is associated with poor overall survival in patients with di↵use large b-cell lymphoma. Blood, 126(19):2193–2201. Klein, U. and Dalla-Favera, R. (2008). Germinal centres: role in B-cell physiology and malignancy. Nature Reviews Immunology, 8(1):22–33. Knight, J. R., Garland, G., Poyry, T., Mead, E., Vlahov, N., Sfakianos, A., Grosso, S., De-Lima-Hedayioglu, F., Mallucci, G. R., Von Der Haar, T., Smales, C. M., Sansom, O. J., and Willis, A. E. (2020). Control of translation elongation in health and disease. DMM Disease Models and Mechanisms, 13(3). Komar, A. A. and Hatzoglou, M. (2011). Cellular IRES-mediated translation: the war of ITAFs in pathophysiological states. Cell Cycle, 10(2):229–240. Kondo, T., Plaza, S., Zanet, J., Benrabah, E., Valenti, P., Hashimoto, Y., Kobayashi, S., Payre, F., and Kageyama, Y. (2010). Small peptides switch the transcriptional activity of shavenbaby during drosophila embryogenesis. Science, 329(5989):336–339. Kozak, M. (1987). At least six nucleotides preceding the aug initiator codon enhance translation in mammalian cells. Journal of Molecular Biology, 196(4):947–950. Krokhin, O. V. and Spicer, V. (2010). Predicting peptide retention times for proteomics. Current Protocols in Bioinformatics, 31(1):13–14. Kumari, P. and Sampath, K. (2015). 
cncRNAs: Bi-functional RNAs with protein coding and non-coding functions. In Seminars in Cell & Developmental Biology, volume 47, pages 40–51. Elsevier. Ku¨ppers, R. and Dalla-Favera, R. (2001). Mechanisms of chromosomal translocations in B cell lymphomas. Oncogene, 20(40):5580–5594. Kurosaki, T. and Maquat, L. E. (2016). Nonsense-mediated mRNA decay in humans at a glance. Journal of Cell Science, 129(3):461–467. Kustatscher, G., Grabowski, P., Schrader, T. A., Passmore, J. B., Schrader, M., and Rappsilber, J. (2019). Co-regulation map of the human proteome enables identification of protein functions. Nature Biotechnology, 37(11):1361–1371. Labun, K., Montague, T. G., Krause, M., Torres Cleuren, Y. N., Tjeldnes, H., and Valen, E. (2019). CHOPCHOP v3: expanding the CRISPR web toolbox beyond genome editing. Nucleic Acids Research, 47(W1):W171–W174. 176 Lacy, S. E., Barrans, S. L., Beer, P. A., Painter, D., Smith, A. G., Roman, E., Cooke, S. L., Ruiz, C., Glover, P., Van Hoppe, S. J., et al. (2020). Targeted sequencing in DLBCL, molecular subtypes, and outcomes: a haematological malignancy research network report. Blood, 135(20):1759–1771. Lafontaine, D. L. (2015). Noncoding RNAs in eukaryotic ribosome biogenesis and function. Nature Structural & Molecular Biology, 22(1):11–19. Lafontaine, D. L., Riback, J. A., Bascetin, R., and Brangwynne, C. P. (2021). The nucleolus as a multiphase liquid condensate. Nature Reviews Molecular Cell Biology, 22(3):165–182. Langmead, B. and Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4):357–359. Larsson, O., Sonenberg, N., and Nadon, R. (2010). Identification of di↵erential translation in genome wide studies. Proceedings of the National Academy of Sciences, 107(50):21487– 21492. Larsson, O., Sonenberg, N., and Nadon, R. (2011). anota: Analysis of di↵erential translation in genome-wide studies. Bioinformatics, 27(10):1440–1441. Lauria, F., Tebaldi, T., Bernabo`, P., Groen, E. J., Gillingwater, T. 
H., and Viero, G. (2018). ribowaltz: optimization of ribosome p-site positioning in ribosome profiling data. PLoS computational biology, 14(8):e1006169. Law, C. W., Chen, Y., Shi, W., and Smyth, G. K. (2014). voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15(2):1–17. Lawrence, M., Huber, W., Pages, H., Aboyoun, P., Carlson, M., Gentleman, R., Morgan, M. T., and Carey, V. J. (2013). Software for computing and annotating genomic ranges. PLoS Computational Biology, 9(8):e1003118. Lawrie, C. H., Chi, J., Taylor, S., Tramonti, D., Ballabio, E., Palazzo, S., Saunders, N. J., Pezzella, F., Boultwood, J., Wainscoat, J. S., et al. (2009). Expression of microRNAs in di↵use large b cell lymphoma is associated with immunophenotype, survival and transformation from follicular lymphoma. Journal of Cellular and Molecular Medicine, 13(7):1248–1260. Lazaris-Karatzas, A., Montine, K. S., and Sonenberg, N. (1990). Malignant transfor- mation by a eukaryotic initiation factor subunit that binds to mRNA 5’cap. Nature, 345(6275):544–547. 177 Lee, A. S., Kranzusch, P. J., Doudna, J. A., and Cate, J. H. (2016). eIF3d is an mRNA cap-binding protein that is required for specialized translation initiation. Nature, 536(7614):96–99. Lee, C.-S., Dias, A. P., Jedrychowski, M., Patel, A. H., Hsu, J. L., and Reed, R. (2008a). Human DDX3 functions in translation and interacts with the translation initiation factor eIF3. Nucleic Acids Research, 36(14):4708–4718. Lee, C.-S., Dias, A. P., Jedrychowski, M., Patel, A. H., Hsu, J. L., and Reed, R. (2008b). Human DDX3 functions in translation and interacts with the translation initiation factor eIF3. Nucleic Acids Research, 36(14):4708–4718. Lee, S., Liu, B., Lee, S., Huang, S.-X., Shen, B., and Qian, S.-B. (2012). Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proceedings of the National Academy of Sciences, 109(37):E2424–E2432. Lennox, A. L., Hoye, M. 
L., Jiang, R., Johnson-Kerner, B. L., Suit, L. A., Venkataramanan, S., Sheehan, C. J., Alsina, F. C., Fregeau, B., Aldinger, K. A., et al. (2020a). Pathogenic DDX3X mutations impair RNA metabolism and neurogenesis during fetal cortical development. Neuron, 106(3):404–420. Lennox, A. L., Hoye, M. L., Jiang, R., Johnson-Kerner, B. L., Suit, L. A., Venkataramanan, S., Sheehan, C. J., Alsina, F. C., Fregeau, B., Aldinger, K. A., et al. (2020b). Pathogenic DDX3X mutations impair RNA metabolism and neurogenesis during fetal cortical development. Neuron, 106(3):404–420. Leprivier, G., Remke, M., Rotblat, B., Dubuc, A., Mateo, A.-R. F., Kool, M., Agnihotri, S., El-Naggar, A., Yu, B., Somasekharan, S. P., et al. (2013). The eEF2 kinase confers resistance to nutrient deprivation by blocking translation elongation. Cell, 153(5):1064–1079. Leucci, E., Cocco, M., Onnis, A., De Falco, G., Van Cleef, P., Bellan, C., Van Rijk, A., Nyagol, J., Byakika, B., Lazzi, S., et al. (2008). MYC translocation-negative classical Burkitt lymphoma cases: an alternative pathogenetic mechanism involving miRNA deregulation. The Journal of Pathology: A Journal of the Pathological Society of Great Britain and Ireland, 216(4):440–450. Li, C., Kim, S.-W., Rai, D., Bolla, A. R., Adhvaryu, S., Kinney, M. C., Robetorye, R. S., and Aguiar, R. C. (2009a). Copy number abnormalities, MYC activity, and the genetic fingerprint of normal B cells mechanistically define the microRNA profile of diffuse large B-cell lymphoma. Blood, 113(26):6681–6690. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009b). The sequence alignment/map format and SAMtools. Bioinformatics, 25(16):2078–2079. Li, J. and Liu, C. (2019). Coding or noncoding, the converging concepts of RNAs. Frontiers in Genetics, 10:496. Li, W., Wang, W., Uren, P. J., Penalva, L. O., and Smith, A. D. (2017).
Riborex: fast and flexible identification of differential translation from Ribo-seq data. Bioinformatics, 33(11):1735–1737. Liao, J.-M., Zhou, X., Gatignol, A., and Lu, H. (2014). Ribosomal proteins L5 and L11 co-operatively inactivate c-Myc via RNA-induced silencing complex. Oncogene, 33(41):4916–4923. Linder, P. and Jankowsky, E. (2011). From unwinding to clamping the DEAD box RNA helicase family. Nature Reviews Molecular Cell Biology, 12(8):505–516. Lindström, M. S., Jurada, D., Bursac, S., Orsolic, I., Bartek, J., and Volarevic, S. (2018). Nucleolus as an emerging hub in maintenance of genome stability and cancer pathogenesis. Oncogene, 37(18):2351–2366. Liu, G. Y. and Sabatini, D. M. (2020). mTOR at the nexus of nutrition, growth, ageing and disease. Nature Reviews Molecular Cell Biology, 21(4):183–203. Liu, J., Xu, Y., Stoleru, D., and Salic, A. (2012). Imaging protein synthesis in cells and tissues with an alkyne analog of puromycin. Proceedings of the National Academy of Sciences, 109(2):413–418. Liu, P., Ge, M., Hu, J., Li, X., Che, L., Sun, K., Cheng, L., Huang, Y., Pilo, M. G., Cigliano, A., et al. (2017). A functional mammalian target of rapamycin complex 1 signaling is indispensable for c-Myc-driven hepatocarcinogenesis. Hepatology, 66(1):167–181. Liu, Y., Beyer, A., and Aebersold, R. (2016). On the dependency of cellular protein levels on mRNA abundance. Cell, 165(3):535–550. López, C., Kleinheinz, K., Aukema, S. M., Rohde, M., Bernhart, S. H., Hübschmann, D., Wagener, R., Toprak, U. H., Raimondi, F., Kreuz, M., et al. (2019). Genomic and transcriptomic changes complement each other in the pathogenesis of sporadic Burkitt lymphoma. Nature Communications, 10(1):1–19. Love, M. I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12):1–21. Lu, J., Cannizzaro, E., Meier-Abt, F., Scheinost, S., Bruch, P.-M., Giles, H.
A., Lütge, A., Hüllein, J., Wagner, L., Giacopelli, B., et al. (2021). Multi-omics reveals clinically relevant proliferative drive associated with mTOR-MYC-OXPHOS activity in chronic lymphocytic leukemia. Nature Cancer, 2(8):853–864. Lundberg, E., Fagerberg, L., Klevebring, D., Matic, I., Geiger, T., Cox, J., Älgenäs, C., Lundeberg, J., Mann, M., and Uhlen, M. (2010). Defining the transcriptome and proteome in three functionally different human cell lines. Molecular Systems Biology, 6(1):450. Lynch, M. and Marinov, G. K. (2015). The bioenergetic costs of a gene. Proceedings of the National Academy of Sciences, 112(51):15690–15695. Mackowiak, S. D., Zauber, H., Bielow, C., Thiel, D., Kutz, K., Calviello, L., Mastrobuoni, G., Rajewsky, N., Kempa, S., Selbach, M., et al. (2015). Extensive identification and analysis of conserved small ORFs in animals. Genome Biology, 16(1):1–21. Malumbres, R., Sarosiek, K. A., Cubedo, E., Ruiz, J. W., Jiang, X., Gascoyne, R. D., Tibshirani, R., and Lossos, I. S. (2009). Differentiation stage–specific expression of microRNAs in B lymphocytes and diffuse large B-cell lymphomas. Blood, 113(16):3754–3764. Mangus, D. A., Evans, M. C., and Jacobson, A. (2003). Poly (A)-binding proteins: multifunctional scaffolds for the post-transcriptional control of gene expression. Genome Biology, 4(7):1–14. Maquat, L. E., Tarn, W.-Y., and Isken, O. (2010). The pioneer round of translation: features and functions. Cell, 142(3):368–374. Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1):10–12. Mathieson, T., Franken, H., Kosinski, J., Kurzawa, N., Zinn, N., Sweetman, G., Poeckel, D., Ratnu, V. S., Schramm, M., Becher, I., et al. (2018). Systematic analysis of protein turnover in primary cells. Nature Communications, 9(1):1–10. McCord, R., Bolen, C. R., Koeppen, H., Kadel, E. E., Oestergaard, M. Z., Nielsen, T., Sehn, L. H., and Venstrom, J. M. (2019a).
PD-L1 and tumor-associated macrophages in de novo DLBCL. Blood Advances, 3(4):531–540. McCord, R., Bolen, C. R., Koeppen, H., Kadel, E. E., Oestergaard, M. Z., Nielsen, T., Sehn, L. H., and Venstrom, J. M. (2019b). PD-L1 and tumor-associated macrophages in de novo DLBCL. Blood Advances, 3(4):531–540. McGlincy, N. J. and Ingolia, N. T. (2017). Transcriptome-wide measurement of translation by ribosome profiling. Methods, 126:112–129. McMahon, S. B. (2014). MYC and the control of apoptosis. Cold Spring Harbor Perspectives in Medicine, 4(7):a014407. Meyer, K. D., Patil, D. P., Zhou, J., Zinoviev, A., Skabkin, M. A., Elemento, O., Pestova, T. V., Qian, S.-B., and Jaffrey, S. R. (2015). 5′ UTR m6A promotes cap-independent translation. Cell, 163(4):999–1010. Meyer, N. and Penn, L. Z. (2008). Reflecting on 25 years with MYC. Nature Reviews Cancer, 8(12):976–990. Mlynarczyk, C., Fontán, L., and Melnick, A. (2019). Germinal center-derived lymphomas: The darkest side of humoral immunity. Immunological Reviews, 288(1):214–239. Mo, J., Liang, H., Su, C., Li, P., Chen, J., and Zhang, B. (2021). DDX3X: structure, physiologic functions and cancer. Molecular Cancer, 20(1):1–20. Modelska, A., Turro, E., Russell, R., Beaton, J., Sbarrato, T., Spriggs, K., Miller, J., Gräf, S., Provenzano, E., Blows, F., et al. (2015). The malignant phenotype in breast cancer is driven by eIF4A1-mediated changes in the translational landscape. Cell Death & Disease, 6(1):e1603–e1603. Mohr, I. (2016). Virology: Closing in on the causes of host shutoff. eLife, 5:e20755. Molliex, A., Temirov, J., Lee, J., Coughlin, M., Kanagaraj, A. P., Kim, H. J., Mittag, T., and Taylor, J. P. (2015). Phase separation by low complexity domains promotes stress granule assembly and drives pathological fibrillization. Cell, 163(1):123–133. Moreno-Mateos, M. A., Vejnar, C. E., Beaudoin, J.-D., Fernandez, J. P., Mis, E. K., Khokha, M. K., and Giraldez, A. J. (2015).
CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nature Methods, 12(10):982–988. Morgan, M., Pagès, H., Obenchain, V., and Hayden, N. (2016). Rsamtools: Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import. R package version, 1(0):677–689. Morrish, F. and Hockenbery, D. (2014). MYC and mitochondrial biogenesis. Cold Spring Harbor Perspectives in Medicine, 4(5):a014225. Morton, L. M., Wang, S. S., Devesa, S. S., Hartge, P., Weisenburger, D. D., and Linet, M. S. (2006). Lymphoma incidence patterns by WHO subtype in the United States, 1992–2001. Blood, 107(1):265–276. Moser, B. and Willimann, K. (2004). Chemokines: role in inflammation and immune surveillance. Annals of the Rheumatic Diseases, 63(suppl 2):ii84–ii89. Muramatsu, M., Kinoshita, K., Fagarasan, S., Yamada, S., Shinkai, Y., and Honjo, T. (2000). Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell, 102(5):553–563. Nakagawa, R. and Calado, D. P. (2021). Positive selection in the light zone of germinal centers. Frontiers in Immunology, 12:1053. Nascimento, E. M., Cox, C. L., MacArthur, S., Hussain, S., Trotter, M., Blanco, S., Suraj, M., Nichols, J., Kübler, B., Benitah, S. A., et al. (2011). The opposing transcriptional functions of Sin3a and c-Myc are required to maintain tissue homeostasis. Nature Cell Biology, 13(12):1395–1405. Navarro, A., Gaya, A., Martinez, A., Urbano-Ispizua, A., Pons, A., Balagué, O., Gel, B., Abrisqueta, P., Lopez-Guillermo, A., Artells, R., et al. (2008). MicroRNA expression profiling in classic Hodgkin lymphoma. Blood, 111(5):2825–2832. Neelagandan, N., Lamberti, I., Carvalho, H. J., Gobet, C., and Naef, F. (2020). What determines eukaryotic translation elongation: recent molecular and quantitative analyses of protein synthesis. Open Biology, 10(12):200292. Nesvizhskii, A. I. (2014).
Proteogenomics: concepts, applications and computational strategies. Nature Methods, 11(11):1114–1125. Northcott, P. A., Buchhalter, I., Morrissy, A. S., Hovestadt, V., Weischenfeldt, J., Ehrenberger, T., Gröbner, S., Segura-Wang, M., Zichner, T., Rudneva, V. A., et al. (2017). The whole-genome landscape of medulloblastoma subtypes. Nature, 547(7663):311–317. Obrig, T. G., Culp, W. J., McKeehan, W. L., and Hardesty, B. (1971). The mechanism by which cycloheximide and related glutarimide antibiotics inhibit peptide synthesis on reticulocyte ribosomes. Journal of Biological Chemistry, 246(1):174–181. Oertlin, C., Lorent, J., Murie, C., Furic, L., Topisirovic, I., and Larsson, O. (2019). Generally applicable transcriptome-wide analysis of translation using anota2seq. Nucleic Acids Research, 47(12):e70–e70. Oh, S., Flynn, R. A., Floor, S. N., Purzner, J., Martin, L., Do, B. T., Schubert, S., Vaka, D., Morrissy, S., Li, Y., et al. (2016). Medulloblastoma-associated DDX3 variant selectively alters the translational response to stress. Oncotarget, 7(19):28169. Ojha, J., Ayres, J., Secreto, C., Tschumper, R., Rabe, K., Van Dyke, D., Slager, S., Shanafelt, T., Fonseca, R., Kay, N. E., et al. (2015). Deep sequencing identifies genetic heterogeneity and recurrent convergent evolution in chronic lymphocytic leukemia. Blood, The Journal of the American Society of Hematology, 125(3):492–498. Olexiouk, V., Van Criekinge, W., and Menschaert, G. (2018). An update on sORFs.org: a repository of small ORFs identified by ribosome profiling. Nucleic Acids Research, 46(D1):D497–D502. Orr, M. W., Mao, Y., Storz, G., and Qian, S.-B. (2020). Alternative ORFs and small ORFs: shedding light on the dark proteome. Nucleic Acids Research, 48(3):1029–1042. Ouspenskaia, T., Law, T., Clauser, K. R., Klaeger, S., Sarkizova, S., Aguet, F., Li, B., Christian, E., Knisbacher, B. A., Le, P. M., Hartigan, C. R., Keshishian, H., Apffel, A., Oliveira, G., Zhang, W., Chow, Y. T., Ji, Z., Shukla, S.
A., Bachireddy, P., Getz, G., Hacohen, N., Keskin, D. B., Carr, S. A., Wu, C. J., and Regev, A. (2020). Thousands of novel unannotated proteins expand the MHC I immunopeptidome in cancer. bioRxiv. Ozuah, N. W., Lubega, J., Allen, C. E., and El-Mallawany, N. K. (2020). Five decades of low intensity and low survival: adapting intensified regimens to cure pediatric Burkitt lymphoma in Africa. Blood Advances, 4(16):4007–4019. Pakos-Zebrucka, K., Koryga, I., Mnich, K., Ljujic, M., Samali, A., and Gorman, A. M. (2016). The integrated stress response. EMBO Reports, 17(10):1374–1395. Palade, G. E. (1955). A small particulate component of the cytoplasm. The Journal of Cell Biology, 1(1):59–68. Pardoll, D. M. (2012). The blockade of immune checkpoints in cancer immunotherapy. Nature Reviews Cancer, 12(4):252–264. Patmore, D. M., Jassim, A., Nathan, E., Gilbertson, R. J., Tahan, D., Hoffmann, N., Tong, Y., Smith, K. S., Kanneganti, T.-D., Suzuki, H., et al. (2020). DDX3X suppresses the susceptibility of hindbrain lineages to medulloblastoma. Developmental Cell, 54(4):455–470. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., and Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14(4):417–419. Pearson, H., Daouda, T., Granados, D. P., Durette, C., Bonneil, E., Courcelles, M., Rodenbrock, A., Laverdure, J.-P., Côté, C., Mader, S., et al. (2016). MHC class I–associated peptides derive from selective regions of the human genome. The Journal of Clinical Investigation, 126(12):4690–4701. Pearson, K. (1897). On a form of spurious correlation which may arise when indices are used in the measurement of organs. In Royal Soc., London, Proc., volume 60, pages 489–502. Pelletier, J., Thomas, G., and Volarević, S. (2018). Ribosome biogenesis in cancer: new players and therapeutic avenues. Nature Reviews Cancer, 18(1):51–63. Perdigão, N., Heinrich, J., Stolte, C., Sabir, K. S., Buckley, M.
J., Tabor, B., Signal, B., Gloss, B. S., Hammang, C. J., Rost, B., et al. (2015). Unexpected features of the dark proteome. Proceedings of the National Academy of Sciences, 112(52):15898–15903. Perez-Riverol, Y., Csordas, A., Bai, J., Bernal-Llinares, M., Hewapathirana, S., Kundu, D. J., Inuganti, A., Griss, J., Mayer, G., Eisenacher, M., et al. (2019). The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Research, 47(D1):D442–D450. Perez-Riverol, Y., Vizcaíno, J. A., and Griss, J. (2018). Future prospects of spectral clustering approaches in proteomics. Proteomics, 18(14):1700454. Pfeifer, M., Grau, M., Lenze, D., Wenzel, S.-S., Wolf, A., Wollert-Wulf, B., Dietze, K., Nogai, H., Storek, B., Madle, H., et al. (2013). PTEN loss defines a PI3K/AKT pathway-dependent germinal center subtype of diffuse large B-cell lymphoma. Proceedings of the National Academy of Sciences, 110(30):12420–12425. Pfister, A. S. (2019). Emerging role of the nucleolar stress response in autophagy. Frontiers in Cellular Neuroscience, 13:156. Phelan, J. D., Young, R. M., Webster, D. E., Roulland, S., Wright, G. W., Kasbekar, M., Shaffer, A. L., Ceribelli, M., Wang, J. Q., Schmitz, R., et al. (2018). A multiprotein supercomplex controlling oncogenic signalling in lymphoma. Nature, 560(7718):387–391. Philippe, L., van den Elzen, A. M., Watson, M. J., and Thoreen, C. C. (2020). Global analysis of LARP1 translation targets reveals tunable and dynamic features of 5′TOP motifs. Proceedings of the National Academy of Sciences, 117(10):5319–5328. Phung, B., Cieśla, M., Sanna, A., Guzzi, N., Beneventi, G., Ngoc, P. C. T., Lauss, M., Cabrita, R., Cordero, E., Bosch, A., et al. (2019). The X-linked DDX3X RNA helicase dictates translation reprogramming and metastasis in melanoma. Cell Reports, 27(12):3573–3586. Pianese, G. (1896). Beitrag zur Histologie und Aetiologie des Carcinoms, volume 1. G. Fischer. Piccirillo, C.
A., Bjur, E., Topisirovic, I., Sonenberg, N., and Larsson, O. (2014). Translational control of immune responses: from transcripts to translatomes. Nature Immunology, 15(6):503–511. Pourdehnad, M., Truitt, M. L., Siddiqi, I. N., Ducker, G. S., Shokat, K. M., and Ruggero, D. (2013). Myc and mTOR converge on a common node in protein synthesis control that confers synthetic lethality in Myc-driven cancers. Proceedings of the National Academy of Sciences, 110(29):11988–11993. Presnyak, V., Alhusaini, N., Chen, Y.-H., Martin, S., Morris, N., Kline, N., Olson, S., Weinberg, D., Baker, K. E., Graveley, B. R., et al. (2015). Codon optimality is a major determinant of mRNA stability. Cell, 160(6):1111–1124. Protter, D. S. and Parker, R. (2016). Principles and properties of stress granules. Trends in Cell Biology, 26(9):668–679. Pugh, T. J., Weeraratne, S. D., Archer, T. C., Krummel, D. A. P., Auclair, D., Bochicchio, J., Carneiro, M. O., Carter, S. L., Cibulskis, K., Erlich, R. L., et al. (2012). Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations. Nature, 488(7409):106–110. Pyronnet, S., Imataka, H., Gingras, A.-C., Fukunaga, R., Hunter, T., and Sonenberg, N. (1999). Human eukaryotic translation initiation factor 4G (eIF4G) recruits Mnk1 to phosphorylate eIF4E. The EMBO Journal, 18(1):270–279. Rakhra, K., Bachireddy, P., Zabuawala, T., Zeiser, R., Xu, L., Kopelman, A., Fan, A. C., Yang, Q., Braunstein, L., Crosby, E., et al. (2010). CD4+ T cells contribute to the remodeling of the microenvironment required for sustained tumor regression upon oncogene inactivation. Cancer Cell, 18(5):485–498. Ramírez, F., Ryan, D. P., Grüning, B., Bhardwaj, V., Kilpert, F., Richter, A. S., Heyne, S., Dündar, F., and Manke, T. (2016). deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research, 44(W1):W160–W165. Ratje, A. H., Loerke, J., Mikolajka, A., Brünner, M., Hildebrand, P. W., Starosta, A.
L., Dönhöfer, A., Connell, S. R., Fucini, P., Mielke, T., et al. (2010). Head swivel on the ribosome facilitates translocation by means of intra-subunit tRNA hybrid sites. Nature, 468(7324):713–716. Raught, B. and Gingras, A.-C. (1999). eIF4E activity is regulated at multiple levels. The International Journal of Biochemistry & Cell Biology, 31(1):43–57. Rauschendorf, M.-A., Zimmer, J., Hanstein, R., Dickemann, C., and Vogt, P. (2011). Complex transcriptional control of the AZFa gene DDX3Y in human testis. International Journal of Andrology, 34(1):84–96. Reddy, A., Zhang, J., Davis, N. S., Mott, A. B., Love, C. L., Waldrop, A., Leppa, S., Pasanen, A., Meriranta, L., Karjalainen-Lindsberg, M.-L., et al. (2017). Genetic and functional drivers of diffuse large B cell lymphoma. Cell, 171(2):481–494. Richter, J., Schlesner, M., Hoffmann, S., Kreuz, M., Leich, E., Burkhardt, B., Rosolowski, M., Ammerpohl, O., Wagener, R., Bernhart, S. H., et al. (2012). Recurrent mutation of the ID3 gene in Burkitt lymphoma identified by integrated genome, exome and transcriptome sequencing. Nature Genetics, 44(12):1316. Richter, J. D. and Coller, J. (2015). Pausing on polyribosomes: make way for elongation in translational control. Cell, 163(2):292–300. Robichaud, N., Hsu, B. E., Istomine, R., Alvarez, F., Blagih, J., Ma, E. H., Morales, S. V., Dai, D. L., Li, G., Souleimanova, M., et al. (2018). Translational control in the tumor microenvironment promotes lung metastasis: Phosphorylation of eIF4E in neutrophils. Proceedings of the National Academy of Sciences, 115(10):E2202–E2209. Robinson, G., Parker, M., Kranenburg, T. A., Lu, C., Chen, X., Ding, L., Phoenix, T. N., Hedlund, E., Wei, L., Zhu, X., et al. (2012). Novel mutations target distinct subgroups of medulloblastoma. Nature, 488(7409):43–48. Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
Bioinformatics, 26(1):139–140. Rogozin, I. B., Kochetov, A. V., Kondrashov, F. A., Koonin, E. V., and Milanesi, L. (2001). Presence of ATG triplets in 5′ untranslated regions of eukaryotic cDNAs correlates with a ‘weak’ context of the start codon. Bioinformatics, 17(10):890–900. Rolfe, D. and Brown, G. C. (1997). Cellular energy utilization and molecular origin of standard metabolic rate in mammals. Physiological Reviews, 77(3):731–758. Ron, D. and Walter, P. (2007). Signal integration in the endoplasmic reticulum unfolded protein response. Nature Reviews Molecular Cell Biology, 8(7):519–529. Rooijers, K., Loayza-Puch, F., Nijtmans, L. G., and Agami, R. (2013). Ribosome profiling reveals features of normal and disease-associated mitochondrial translation. Nature Communications, 4(1):1–8. Rouschop, K. M., Van Den Beucken, T., Dubois, L., Niessen, H., Bussink, J., Savelkouls, K., Keulers, T., Mujcic, H., Landuyt, W., Voncken, J. W., et al. (2010). The unfolded protein response protects human tumor cells during hypoxia through regulation of the autophagy genes MAP1LC3B and ATG5. The Journal of Clinical Investigation, 120(1):127–141. Ruggero, D. (2013). Translational control in cancer etiology. Cold Spring Harbor Perspectives in Biology, 5(2):a012336. Ruggero, D., Montanaro, L., Ma, L., Xu, W., Londei, P., Cordon-Cardo, C., and Pandolfi, P. P. (2004). The translation factor eIF-4E promotes tumor formation and cooperates with c-Myc in lymphomagenesis. Nature Medicine, 10(5):484–486. Ruggiano, A., Foresti, O., and Carvalho, P. (2014). ER-associated degradation: Protein quality control and beyond. Journal of Cell Biology, 204(6):869–879. Ruiz-Orera, J., Messeguer, X., Subirana, J. A., and Alba, M. M. (2014). Long non-coding RNAs as a source of new peptides. eLife, 3:e03523. Runte, F., Renner IV, P., and Hoppe, M. (2019). Kuby immunology. Sabi, R. and Tuller, T. (2014). Modelling the Efficiency of Codon–tRNA Interactions Based on Codon Usage Bias.
DNA Research, 21(5):511–526. Sadedin, S. P., Pope, B., and Oshlack, A. (2012). Bpipe: a tool for running and managing bioinformatics pipelines. Bioinformatics, 28(11):1525–1526. Samir, P., Kesavardhana, S., Patmore, D. M., Gingras, S., Malireddi, R. S., Karki, R., Guy, C. S., Briard, B., Place, D. E., Bhattacharya, A., et al. (2019). DDX3X acts as a live-or-die checkpoint in stressed cells by regulating NLRP3 inflammasome. Nature, 573(7775):590–594. Sander, S., Calado, D. P., Srinivasan, L., Köchert, K., Zhang, B., Rosolowski, M., Rodig, S. J., Holzmann, K., Stilgenbauer, S., Siebert, R., et al. (2012). Synergy between PI3K signaling and MYC in Burkitt lymphomagenesis. Cancer Cell, 22(2):167–179. Sanson, K. R., Hanna, R. E., Hegde, M., Donovan, K. F., Strand, C., Sullender, M. E., Vaimberg, E. W., Goodale, A., Root, D. E., Piccioni, F., et al. (2018). Optimized libraries for CRISPR-Cas9 genetic screens with multiple modalities. Nature Communications, 9(1):1–15. Santos, D. A., Shi, L., Tu, B. P., and Weissman, J. S. (2019). Cycloheximide can distort measurements of mRNA levels and translation efficiency. Nucleic Acids Research, 47(10):4974–4985. Sanz, E., Yang, L., Su, T., Morris, D. R., McKnight, G. S., and Amieux, P. S. (2009). Cell-type-specific isolation of ribosome-associated mRNA from complex tissues. Proceedings of the National Academy of Sciences, 106(33):13939–13944. Sarkizova, S., Klaeger, S., Le, P. M., Li, L. W., Oliveira, G., Keshishian, H., Hartigan, C. R., Zhang, W., Braun, D. A., Ligon, K. L., et al. (2020). A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nature Biotechnology, 38(2):199–209. Saxton, R. A. and Sabatini, D. M. (2017). mTOR Signaling in Growth, Metabolism, and Disease. Cell, 168(6):960–976. Schatz, J. H., Oricchio, E., Wolfe, A. L., Jiang, M., Linkov, I., Maragulia, J., Shi, W., Zhang, Z., Rajasekhar, V. K., Pagano, N. C., et al. (2011).
Targeting cap-dependent translation blocks converging survival signals by AKT and PIM kinases in lymphoma. Journal of Experimental Medicine, 208(9):1799–1807. Schmidt, E. V. (2004). The role of c-myc in regulation of translation initiation. Oncogene, 23(18):3217–3221. Schmitz, R., Wright, G. W., Huang, D. W., Johnson, C. A., Phelan, J. D., Wang, J. Q., Roulland, S., Kasbekar, M., Young, R. M., Shaffer, A. L., et al. (2018). Genetics and pathogenesis of diffuse large B-cell lymphoma. New England Journal of Medicine, 378(15):1396–1407. Schmitz, R., Young, R. M., Ceribelli, M., Jhavar, S., Xiao, W., Zhang, M., Wright, G., Shaffer, A. L., Hodson, D. J., Buras, E., et al. (2012). Burkitt lymphoma pathogenesis and therapeutic targets from structural and functional genomics. Nature, 490(7418):116–120. Schneider-Poetsch, T., Ju, J., Eyler, D. E., Dang, Y., Bhat, S., Merrick, W. C., Green, R., Shen, B., and Liu, J. O. (2010). Inhibition of eukaryotic translation elongation by cycloheximide and lactimidomycin. Nature Chemical Biology, 6(3):209–217. Schueren, F. and Thoms, S. (2016). Functional translational readthrough: a systems biology perspective. PLoS Genetics, 12(8):e1006196. Schuller, A. P. and Green, R. (2018). Roadblocks and resolutions in eukaryotic translation. Nature Reviews Molecular Cell Biology, 19(8):526–541. Schwanhäusser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W., and Selbach, M. (2011). Global quantification of mammalian gene expression control. Nature, 473(7347):337–342. Sen, N. D., Zhou, F., Ingolia, N. T., and Hinnebusch, A. G. (2015). Genome-wide analysis of translational efficiency reveals distinct but overlapping functions of yeast DEAD-box RNA helicases Ded1 and eIF4A. Genome Research, 25(8):1196–1205. Sendoel, A., Dunn, J. G., Rodriguez, E. H., Naik, S., Gomez, N. C., Hurwitz, B., Levorse, J., Dill, B. D., Schramek, D., Molina, H., et al. (2017). Translation from unconventional 5′ start sites drives tumour initiation.
Nature, 541(7638):494–499. Sha, C., Barrans, S., Cucco, F., Bentley, M. A., Care, M. A., Cummin, T., Kennedy, H., Thompson, J. S., Uddin, R., Worrillow, L., et al. (2019). Molecular high-grade B-cell lymphoma: defining a poor-risk group that requires different approaches to therapy. Journal of Clinical Oncology, 37(3):202. Shaffer III, A. L., Young, R. M., and Staudt, L. M. (2012). Pathogenesis of human B cell lymphomas. Annual Review of Immunology, 30:565–610. Sharma, P., Nilges, B. S., Wu, J., and Leidel, S. A. (2019). The translation inhibitor cycloheximide affects ribosome profiling data in a species-specific manner. bioRxiv, page 746255. Sharpe, A. H. and Pauken, K. E. (2018). The diverse functions of the PD1 inhibitory pathway. Nature Reviews Immunology, 18(3):153–167. Shatsky, I. N., Terenin, I. M., Smirnova, V. V., and Andreev, D. E. (2018). Cap-independent translation: What’s in a name? Trends in Biochemical Sciences, 43(11):882–895. Shi, Z. and Barna, M. (2015). Translating the genome in time and space: specialized ribosomes, RNA regulons, and RNA-binding proteins. Annual Review of Cell and Developmental Biology, 31:31–54. Shi, Z., Fujii, K., Kovary, K. M., Genuth, N. R., Röst, H. L., Teruel, M. N., and Barna, M. (2017). Heterogeneous ribosomes preferentially translate distinct subpools of mRNAs genome-wide. Molecular Cell, 67(1):71–83. Shih, J., Tsai, T., Chao, C.-H., and Lee, Y. W. (2008). Candidate tumor suppressor DDX3 RNA helicase specifically represses cap-dependent translation by acting as an eIF4E inhibitory protein. Oncogene, 27(5):700–714. Shiue, C. N., Berkson, R. G., and Wright, A. P. (2009). c-Myc induces changes in higher order rDNA structure on stimulation of quiescent cells. Oncogene, 28(16):1833–1842. Silvera, D., Formenti, S. C., and Schneider, R. J. (2010). Translational control in cancer. Nature Reviews Cancer, 10(4):254–266. Skourti-Stathaki, K. and Proudfoot, N. J. (2014).
A double-edged sword: R loops as threats to genome integrity and powerful regulators of gene expression. Genes & Development, 28(13):1384–1396. Smith, A., Crouch, S., Lax, S., Li, J., Painter, D., Howell, D., Patmore, R., Jack, A., and Roman, E. (2015). Lymphoma incidence, survival and prevalence 2004–2014: Sub-type analyses from the UK’s Haematological Malignancy Research Network. British Journal of Cancer, 112(9):1575–1584. Smith, L. M. and Kelleher, N. L. (2013). Proteoform: a single term describing protein complexity. Nature Methods, 10(3):186–187. Smith, R. C., Kanellos, G., Vlahov, N., Alexandrou, C., Willis, A. E., Knight, J. R., and Sansom, O. J. (2021). Translation initiation in cancer at a glance. Journal of Cell Science, 134(1). Sole, C., Larrea, E., Manterola, L., Goicoechea, I., Armesto, M., Arestin, M., M Caffarel, M., M Araujo, A., Fernandez-Mercado, M., Araiz, M., et al. (2016). Aberrant expression of microRNAs in B-cell lymphomas. Microrna, 5(2):87–105. Sonenberg, N. (1996). mRNA 5’cap-binding protein eIF4E and control of cell growth. Translational Control. Sonenberg, N. and Hinnebusch, A. G. (2009). Regulation of translation initiation in eukaryotes: mechanisms and biological targets. Cell, 136(4):731–745. Song, S., Cao, C., Choukrallah, M.-A., Tang, F., Christofori, G., Kohler, H., Wu, F., Fodor, B. D., Frederiksen, M., Willis, S. N., et al. (2021). OBF1 and Oct factors control the germinal center transcriptional program. Blood, The Journal of the American Society of Hematology, 137(21):2920–2934. Soto-Rifo, R., Rubilar, P. S., Limousin, T., De Breyne, S., Decimo, D., and Ohlmann, T. (2012). DEAD-box protein DDX3 associates with eIF4F to promote translation of selected mRNAs. The EMBO Journal, 31(18):3745–3756. Staege, M. S., Lee, S. P., Frisan, T., Mautner, J., Scholz, S., Pajic, A., Rickinson, A. B., Masucci, M. G., Polack, A., and Bornkamm, G. W. (2002).
MYC overexpression imposes a nonimmunogenic phenotype on Epstein–Barr virus-infected B cells. Proceedings of the National Academy of Sciences, 99(7):4550–4555. Statello, L., Guo, C.-J., Chen, L.-L., and Huarte, M. (2021). Gene regulation by long non-coding RNAs and its biological functions. Nature Reviews Molecular Cell Biology, 22(2):96–118. Steinhardt, J. J., Peroutka, R. J., Mazan-Mamczarz, K., Chen, Q., Houng, S., Robles, C., Barth, R. N., DuBose, J., Bruns, B., Tesoriero, R., et al. (2014). Inhibiting CARD11 translation during BCR activation by targeting the eIF4A RNA helicase. Blood, 124(25):3758–3767. Stransky, N., Egloff, A. M., Tward, A. D., Kostic, A. D., Cibulskis, K., Sivachenko, A., Kryukov, G. V., Lawrence, M. S., Sougnez, C., McKenna, A., et al. (2011). The mutational landscape of head and neck squamous cell carcinoma. Science, 333(6046):1157–1160. Stults, D. M., Killen, M. W., Williamson, E. P., Hourigan, J. S., Vargas, H. D., Arnold, S. M., Moscow, J. A., and Pierce, A. J. (2009). Human rRNA gene clusters are recombinational hotspots in cancer. Cancer Research, 69(23):9096–9104. Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43):15545–15550. Suresh, S., Chen, B., Zhu, J., Golden, R. J., Lu, C., Evers, B. M., Novaresi, N., Smith, B., Zhan, X., Schmid, V., et al. (2020). eIF5B drives integrated stress response-dependent translation of PD-L1 in lung cancer. Nature Cancer, 1(5):533–545. Swerdlow, S. H., Campo, E., Pileri, S. A., Harris, N. L., Stein, H., Siebert, R., Advani, R., Ghielmini, M., Salles, G. A., Zelenetz, A. D., et al. (2016). The 2016 revision of the World Health Organization classification of lymphoid neoplasms. Blood, 127(20):2375–2390.
Takahashi, K., Hu, B., Wang, F., Yan, Y., Kim, E., Vitale, C., Patel, K. P., Strati, P., Gumbs, C., Little, L., et al. (2018). Clinical implications of cancer gene mutations in patients with chronic lymphocytic leukemia treated with lenalidomide. Blood, 131(16):1820–1832. Tate, J. G., Bamford, S., Jubb, H. C., Sondka, Z., Beare, D. M., Bindal, N., Boutselakis, H., Cole, C. G., Creatore, C., Dawson, E., et al. (2019). COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Research, 47(D1):D941–D947. Taylor, J., Yeomans, A. M., and Packham, G. (2020). Targeted inhibition of mRNA translation initiation factors as a novel therapeutic strategy for mature B-cell neoplasms. Exploration of Targeted Anti-tumor Therapy, 1:3. Terenin, I. M., Andreev, D. E., Dmitriev, S. E., and Shatsky, I. N. (2013). A novel mechanism of eukaryotic translation initiation that is neither m7G-cap-, nor IRES-dependent. Nucleic Acids Research, 41(3):1807–1816. Thoreen, C. C., Chantranupong, L., Keys, H. R., Wang, T., Gray, N. S., and Sabatini, D. M. (2012). A unifying model for mTORC1-mediated regulation of mRNA translation. Nature, 485(7396):109–113. Thyme, S. B., Akhmetova, L., Montague, T. G., Valen, E., and Schier, A. F. (2016). Internal guide RNA interactions interfere with Cas9-mediated cleavage. Nature Communications, 7(1):1–7. Tompa, P., Davey, N. E., Gibson, T. J., and Babu, M. M. (2014). A million peptide motifs for the molecular biologist. Molecular Cell, 55(2):161–169. Torrent, M., Chalancon, G., de Groot, N. S., Wuster, A., and Madan Babu, M. (2018). Cells alter their tRNA abundance to selectively regulate protein synthesis during stress conditions. Science Signaling, 11(546). Truitt, M. L., Conn, C. S., Shi, Z., Pang, X., Tokuyasu, T., Coady, A. M., Seo, Y., Barna, M., and Ruggero, D. (2015). Differential requirements for eIF4E dose in normal development and cancer. Cell, 162(1):59–71. Tschochner, H. and Hurt, E. (2003).
Pre-ribosomes on the road from the nucleolus to the cytoplasm. Trends in Cell Biology, 13(5):255–263. Tuller, T., Carmi, A., Vestsigian, K., Navon, S., Dorfan, Y., Zaborske, J., Pan, T., Dahan, O., Furman, I., and Pilpel, Y. (2010). An evolutionarily conserved mechanism for controlling the eciency of protein translation. Cell, 141(2):344–354. Twa, D. D., Chan, F. C., Ben-Neriah, S., Woolcock, B. W., Mottok, A., Tan, K. L., Slack, G. W., Gunawardana, J., Lim, R. S., McPherson, A. W., et al. (2014). Genomic rearrangements involving programmed death ligands are recurrent in primary mediastinal large b-cell lymphoma. Blood, 123(13):2062–2065. Unterluggauer, J. J., Prochazka, K., Tomazic, P. V., Huber, H. J., Seeboeck, R., Fechter, K., Steinbauer, E., Gruber, V., Feichtinger, J., Pichler, M., et al. (2018). Expression profile of translation initiation factor eIF2B5 in di↵use large B-cell lymphoma and its correlation to clinical outcome. Blood Cancer Journal, 8(9):1–5. Urra, H., Dufey, E., Avril, T., Chevet, E., and Hetz, C. (2016). Endoplasmic reticulum stress and the hallmarks of cancer. Trends in Cancer, 2(5):252–262. 192 Valentin-Vega, Y. A., Wang, Y.-D., Parker, M., Patmore, D. M., Kanagaraj, A., Moore, J., Rusch, M., Finkelstein, D., Ellison, D. W., Gilbertson, R. J., et al. (2016). Cancer- associated DDX3X mutations drive stress granule assembly and impair global translation. Scientific reports, 6(1):1–16. Van der Auwera, G. A. and O’Connor, B. D. (2020). Genomics in the Cloud: Using Docker, GATK, and WDL in Terra. O’Reilly Media. van Heesch, S., Witte, F., Schneider-Lunitz, V., Schulz, J. F., Adami, E., Faber, A. B., Kirchner, M., Maatz, H., Blachut, S., Sandmann, C.-L., et al. (2019). The translational landscape of the human heart. Cell, 178(1):242–260. Van Riggelen, J., Yetil, A., and Felsher, D. W. (2010). MYC as a regulator of ribosome biogenesis and protein synthesis. Nature Reviews Cancer, 10(4):301–309. Van Steeg, H., Van Oostrom, C. T., Hodemaekers, H. 
M., Peters, L., and Thomas, A. A. (1991). The translation in vitro of rat ornithine decarboxylase mRNA is blocked by its 5 untranslated region in a polyamine-independent way. Biochemical Journal, 274(2):521–526. Vattem, K. M. and Wek, R. C. (2004). Reinitiation involving upstream ORFs regulates ATF4 mRNA translation in mammalian cells. Proceedings of the National Academy of Sciences, 101(31):11269–11274. Venkataramanan, S., Calviello, L., Wilkins, K., and Floor, S. N. (2020). DDX3X and DDX3Y are redundant in protein synthesis. Biorxiv. Versteeg, R., Noordermeer, I. A., Kru¨se-Wolters, M., Ruiter, D. J., and Schrier, P. I. (1988). c-myc down-regulates class I HLA expression in human melanomas. The EMBO journal, 7(4):1023–1029. Walter, P. and Ron, D. (2011). The unfolded protein response: from stress pathway to homeostatic regulation. Science, 334(6059):1081–1086. Wang, X., Zhao, B. S., Roundtree, I. A., Lu, Z., Han, D., Ma, H., Weng, X., Chen, K., Shi, H., and He, C. (2015). N6-methyladenosine modulates messenger RNA translation eciency. Cell, 161(6):1388–1399. Wei, J., Kishton, R. J., Angel, M., Conn, C. S., Dalla-Venezia, N., Marcel, V., Vincent, A., Catez, F., Ferre´, S., Ayadi, L., et al. (2019). Ribosomal proteins regulate MHC class I peptide generation for immunosurveillance. Molecular cell, 73(6):1162–1173. 193 Weinberg, D. E., Shah, P., Eichhorn, S. W., Hussmann, J. A., Plotkin, J. B., and Bartel, D. P. (2016). Improved ribosome-footprint and mRNA measurements provide insights into dynamics and regulation of yeast translation. Cell Reports, 14(7):1787–1799. Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A., Ellrott, K., Shmulevich, I., Sander, C., and Stuart, J. M. (2013). The cancer genome atlas pan-cancer analysis project. Nature Genetics, 45(10):1113–1120. Wendel, H.-G., De Stanchina, E., Fridman, J. S., Malina, A., Ray, S., Kogan, S., Cordon- Cardo, C., Pelletier, J., and Lowe, S. W. (2004). 
Survival signalling by Akt and eIF4E in oncogenesis and cancer therapy. Nature, 428(6980):332–337. Wethmar, K., Barbosa-Silva, A., Andrade-Navarro, M. A., and Leutz, A. (2014). uORFdb — a comprehensive literature database on eukaryotic uorf biology. Nucleic Acids Research, 42(D1):D60–D67. Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. Wilmore, S., Rogers-Broadway, K.-R., Taylor, J., Lemm, E., Fell, R., Stevenson, F. K., Forconi, F., Steele, A. J., Coldwell, M., Packham, G., et al. (2021). Targeted inhibition of eIF4A suppresses b-cell receptor-induced translation and expression of MYC and MCL1 in chronic lymphocytic leukemia cells. Cellular and Molecular Life Sciences, pages 1–13. Wolfe, A. L., Singh, K., Zhong, Y., Drewe, P., Rajasekhar, V. K., Sanghvi, V. R., Mavrakis, K. J., Jiang, M., Roderick, J. E., Van der Meulen, J., et al. (2014a). Rna g-quadruplexes cause eif4a-dependent oncogene translation in cancer. Nature, 513(7516):65–70. Wolfe, A. L., Singh, K., Zhong, Y., Drewe, P., Rajasekhar, V. K., Sanghvi, V. R., Mavrakis, K. J., Jiang, M., Roderick, J. E., Van der Meulen, J., Schatz, J. H., Rodrigo, C. M., Zhao, C., Rondou, P., de Stanchina, E., Teruya-Feldstein, J., Kelliher, M. A., Speleman, F., Porco, J. A., Pelletier, J., Ra¨tsch, G., and Wendel, H. G. (2014b). RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer. Nature, 513(7516):65–70. Wolin, S. L. and Walter, P. (1988). Ribosome pausing and stacking during translation of a eukaryotic mRNA. The EMBO journal, 7(11):3559–3569. Xiang, N., He, M., Ishaq, M., Gao, Y., Song, F., Guo, L., Ma, L., Sun, G., Liu, D., Guo, D., et al. (2016). The DEAD-box RNA helicase DDX3 interacts with nf-b subunit p65 and suppresses p65-mediated transcription. PloS one, 11(10):e0164471. 194 Xiao, Z., Huang, R., Xing, X., Chen, Y., Deng, H., and Yang, X. (2018). De novo annotation and characterization of the translatome with ribosome profiling data. 
Nucleic Acids Research, 46(10):e61–e61. Xiao, Z., Zou, Q., Liu, Y., and Yang, X. (2016). Genome-wide assessment of di↵erential translations with ribosome profiling data. Nature Communications, 7(1):1–11. Xu, H., Xiao, T., Chen, C.-H., Li, W., Meyer, C. A., Wu, Q., Wu, D., Cong, L., Zhang, F., Liu, J. S., et al. (2015). Sequence determinants of improved CRISPR sgRNA design. Genome Research, 25(8):1147–1157. Xu, Y., Poggio, M., Jin, H. Y., Shi, Z., Forester, C. M., Wang, Y., Stumpf, C. R., Xue, L., Devericks, E., So, L., et al. (2019). Translation control of the immune checkpoint in cancer and its therapeutic targeting. Nature Medicine, 25(2):301–311. Xu, Y. and Ruggero, D. (2020). The role of translation control in tumorigenesis and its therapeutic implications. Annual Review of Cancer Biology, 4:437–457. Yang, H.-S., Jansen, A. P., Komar, A. A., Zheng, X., Merrick, W. C., Costes, S., Lockett, S. J., Sonenberg, N., and Colburn, N. H. (2003). The transformation suppressor Pdcd4 is a novel eukaryotic translation initiation factor 4A binding protein that inhibits translation. Molecular and Cellular Biology, 23(1):26–37. Yang, X., Zhong, W., and Cao, R. (2020). Phosphorylation of the mRNA cap-binding protein eIF4E and cancer. Cellular Signalling, page 109689. Yang, Y., Fan, X., Mao, M., Song, X., Wu, P., Zhang, Y., Jin, Y., Yang, Y., Chen, L.-L., Wang, Y., et al. (2017). Extensive translation of circular RNAs driven by N 6-methyladenosine. Cell Research, 27(5):626–641. Yang, Y. and Wang, Z. (2019). IRES-mediated cap-independent translation, a path leading to hidden proteome. Journal of Molecular Cell Biology, 11(10):911–919. Yeomans, A., Thirdborough, S. M., Valle-Argos, B., Linley, A., Krysov, S., Hidalgo, M. S., Leonard, E., Ishfaq, M., Wagner, S. D., Willis, A. E., et al. (2016). Engagement of the b-cell receptor of chronic lymphocytic leukemia cells drives global and MYC-specific mRNA translation. Blood, 127(4):449–457. Yewdell, J. W., Anto´n, L. C., and Bennink, J. 
R. (1996). Defective ribosomal products (DRiPs): a major source of antigenic peptides for MHC class I molecules? The Journal of Immunology, 157(5):1823–1826. 195 Yu, G., Wang, L.-G., Han, Y., and He, Q.-Y. (2012). clusterprofiler: an R package for comparing biological themes among gene clusters. Omics: a journal of integrative biology, 16(5):284–287. Zhai, W. and Comai, L. (2000). Repression of RNA polymerase i transcription by the tumor suppressor p53. Molecular and Cellular Biology, 20(16):5930. Zhang, H., Dou, S., He, F., Luo, J., Wei, L., and Lu, J. (2018a). Genome-wide maps of ribosomal occupancy provide insights into adaptive evolution and regulatory roles of uORFs during Drosophila development. PLoS Biology, 16(7):e2003903. Zhang, H., Wang, Y., and Lu, J. (2019). Function and evolution of upstream ORFs in eukaryotes. Trends in Biochemical Sciences, 44(9):782–794. Zhang, H., Wang, Y., Wu, X., Tang, X., Wu, C., and Lu, J. (2021). Determinants of genome-wide distribution and evolution of uORFs in eukaryotes. Nature Communications, 12(1):1–17. Zhang, J., Medeiros, L. J., and Young, K. H. (2018b). Cancer immunotherapy in di↵use large b-cell lymphoma. Frontiers in oncology, 8:351. Zhang, T., Li, N., Sun, C., Jin, Y., and Sheng, X. (2020). MYC and the unfolded protein response in cancer: synthetic lethal partners in crime? EMBO Molecular Medicine, 12(5):e11845. Zhao, J.-J., Lin, J., Lwin, T., Yang, H., Guo, J., Kong, W., Dessureault, S., Moscinski, L. C., Rezania, D., Dalton, W. S., et al. (2010). microRNA expression profile and identification of miR-29 as a prognostic marker and pathogenetic factor by targeting CDK6 in mantle cell lymphoma. Blood, 115(13):2630–2639. Zhou, P., Blain, A. E., Newman, A. M., Zaka, M., Chagaluka, G., Adlar, F. R., O↵or, U. T., Broadbent, C., Chaytor, L., Whitehead, A., et al. (2019). Sporadic and endemic Burkitt lymphoma have frequent FOXO1 mutations but distinct hotspots in the AKT recognition motif. Blood Advances, 3(14):2118–2127. 
Zuberek, J., Wyslouch-Cieszynska, A., Niedzwiecka, A., Dadlez, M., Stepinski, J., Augustyniak, W., Gingras, A.-C., Zhang, Z., Burley, S. K., Sonenberg, N., et al. (2003). Phosphorylation of eIF4E attenuates its interaction with mRNA 5′ cap analogs by electrostatic repulsion: intein-mediated protein ligation strategy to obtain phosphorylated protein. RNA, 9(1):52–61.