Towards Clinical Utility of Polygenic Risk Scores

Samuel A. Lambert1-4, Gad Abraham1-2,5, Michael Inouye1-6

1. Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, United Kingdom
2. Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
3. MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, United Kingdom
4. Cambridge Substantive Site, Health Data Research UK, Wellcome Genome Campus, Hinxton, UK
5. Department of Clinical Pathology, University of Melbourne, Parkville, VIC 3010, Australia
6. The Alan Turing Institute, London, UK

Abstract

Prediction of disease risk is an essential part of preventative medicine, often guiding clinical management. Risk prediction typically includes risk factors such as age, sex, family history of disease, and lifestyle (e.g. smoking status); however, in recent years there has been increasing interest in incorporating genomic information into risk models. Polygenic risk scores (PRS) aggregate the effects of many genetic variants across the human genome into a single score, and have recently been shown to have predictive value for multiple common diseases. In this review, we summarise the potential use cases for seven common diseases (breast cancer, prostate cancer, coronary artery disease, obesity, type 1 diabetes, type 2 diabetes, Alzheimer's disease) where PRS have or could have clinical utility. PRS analyses for these diseases frequently revolved around (i) risk prediction performance of a PRS alone and in combination with other non-genetic risk factors, (ii) estimation of lifetime risk trajectories, (iii) the independent information of PRS and family history of disease or monogenic mutations, and (iv) estimation of the value of adding a PRS to specific clinical risk prediction scenarios.
We summarise open questions regarding PRS usability, ancestry bias, and transferability, emphasising the need for the next wave of studies to focus on the implementation and health-economic value of PRS testing. In conclusion, it is becoming clear that PRS have value in disease risk prediction and there are multiple areas where this may have clinical utility.

Downloaded from https://academic.oup.com/hmg/advance-article-abstract/doi/10.1093/hmg/ddz187/5540980 by University of Cambridge user on 31 July 2019

Introduction

A multitude of human traits and diseases are heritable to varying degrees. Further, the genetic basis for many such traits has been established as polygenic, that is, explained by the contributions of many genes, each with moderate or weak contribution to the trait, in contrast to Mendelian traits which are caused by variation in one gene or a small set of genes with large effect. The combination of large-scale genome variation projects, such as the HapMap (1) and 1000 Genomes projects (2), together with low-cost robust genotyping platforms, has enabled genome-wide association studies (GWAS) on large cohorts. GWAS have focused on identifying disease- or trait-associated genetic variants (typically SNPs, single nucleotide polymorphisms) which are common in a given population (e.g. minor allele frequency [MAF] > 1%). To date, GWAS have identified thousands of loci that are associated with a range of complex human traits and diseases, including cardiovascular diseases, cancers, obesity, and Alzheimer’s disease (3). These data have provided numerous insights into the genes and pathways that cause disease, but more recently the use of these data for disease risk prediction has gained interest (4–6). Polygenic risk scores (PRS), sometimes referred to as genomic risk scores (GRS), are one such method to predict an individual's genetic predisposition for disease.
In its simplest and most common form, a PRS is a sum of the effects of m SNPs, weighted by the estimated SNP effect sizes $\hat{\beta}_j$ (obtained from GWAS summary statistics): $\mathrm{PRS}_i = \sum_{j=1}^{m} \hat{\beta}_j x_{ij}$, where $x_{ij}$ is the genotype for the ith individual and jth SNP (usually encoded as 0, 1, or 2 for the effect allele dosage). Typically, these scores include hundreds-to-thousands of SNPs, motivated by theory and data showing that many diseases are polygenic (7). In this way, PRS aggregate the contribution of an individual’s germline genome into a single number proportional to the risk for a given disease. There are numerous considerations related to the data and methods used to develop and validate a PRS (see (8, 9) for details). Here, we briefly summarise approaches that use GWAS summary statistics (alleles and effect sizes, and/or p-values) (10) rather than individual-level genotypes, although the principles are broadly similar. Initially, PRS tended to be constructed from genome-wide significant SNPs (typically, $P < 5 \times 10^{-8}$), which for many diseases led to weakly predictive PRS as the number of genome-wide significant SNPs was small (11, 12). In contrast with GWAS, which was designed for detecting SNPs associated with the disease while maintaining a low false positive rate, the task of prediction allows for methods with a more lenient signal to noise trade-off. Thus, more powerful PRS can typically be constructed by incorporating larger numbers of SNPs; however, there is a trade-off between using a small number of SNPs with precise effect estimates and a large number of SNPs with increasingly noisy effect size estimates. There is no universal set of parameters for this trade-off, as they depend on the genetic architecture of the disease, genotyping density, and sample size.
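The weighted sum and the effect of the significance threshold can be illustrated with a minimal sketch. The SNP identifiers, effect sizes, p-values, and the helper name `polygenic_risk_score` below are hypothetical and chosen only for illustration; real pipelines additionally handle allele alignment, missing genotypes, and LD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpWeight:
    snp_id: str     # hypothetical SNP identifier
    beta: float     # estimated per-allele effect size from GWAS summary statistics
    p_value: float  # GWAS association p-value for this SNP


def polygenic_risk_score(dosages, weights, p_threshold=5e-8):
    """Sum beta_j * x_ij over SNPs whose GWAS p-value is <= p_threshold.

    dosages maps snp_id -> effect-allele dosage (0, 1, or 2).
    SNPs absent from dosages are skipped (an assumption of this sketch).
    """
    score = 0.0
    for w in weights:
        if w.p_value <= p_threshold and w.snp_id in dosages:
            score += w.beta * dosages[w.snp_id]
    return score


# Toy summary statistics (made-up values).
weights = [
    SnpWeight("snp_a", 0.20, 1e-12),
    SnpWeight("snp_b", -0.10, 3e-9),
    SnpWeight("snp_c", 0.05, 2e-4),
    SnpWeight("snp_d", 0.02, 0.03),
]
individual = {"snp_a": 2, "snp_b": 1, "snp_c": 0, "snp_d": 1}

# Genome-wide significant SNPs only: snp_a and snp_b contribute.
strict = polygenic_risk_score(individual, weights, p_threshold=5e-8)
# Lenient threshold: all four SNPs contribute, including noisier effects.
lenient = polygenic_risk_score(individual, weights, p_threshold=0.05)
print(round(strict, 3), round(lenient, 3))
```

Relaxing `p_threshold` adds more SNPs to the sum, which is the signal-versus-noise trade-off described above; the threshold that predicts best has to be chosen empirically for each disease and dataset.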
In practice, a training set comprising individual-level genotypes and phenotypes is often used to optimise the PRS. Using an independent validation dataset, or cross-validation, allows unbiased estimation of the predictive performance, avoiding optimism due to overfitting. Generally, once predictive performance plateaus or declines in the validation set, the optimal trade-off of signal and noise has been reached. Another consideration is linkage disequilibrium (LD), the correlations between nearby SNPs, which leads to over-representation of high LD regions in the model, thus potentially reducing its predictive performance. Common methods for constructing PRS include LD pruning (randomly removing one SNP from a pair in high LD), P-value thresholding, and clumping (pruning by LD while preferentially retaining more significantly-associated SNPs), as well as more complex methods that explicitly account for LD, such as LDpred (13) and lassosum (14). The result of PRS development is the set of SNPs and effect sizes that can be applied to an independent sample. After the PRS has been constructed, it is essential to assess its predictive performance with the disease of interest in an external cohort, one not used for the underlying GWAS or for tuning the PRS. The accuracy of a PRS is bounded by the disease's heritability (total amount of disease variance that can be explained by genetics), and current PRS agree with estimates from theory (see Box 1). For polygenic scores of quantitative traits, the effect size per standard deviation (SD) change is usually reported, as well as the proportion of variance explained ($R^2$) by the score. However, as most diseases are binary outcomes, the effect sizes are expressed as odds ratios (OR) or hazard ratios (HR), depending on the study design (case/control vs.
prospective) and the availability of age at event. The model’s performance can be measured using variance explained (Nagelkerke's or pseudo-$R^2$), or classification accuracy using area under the receiver-operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), or Harrell’s C-index (15). However, caution must be exercised when interpreting prediction metrics such as AUC or C-index without sufficient context; even small increases in these metrics can lead to several percent of the population being reclassified into different risk categories, changing their clinical management. Further, these metrics do not take into account the costs and benefits of various clinical decisions (e.g. use of statins), which can only be done within a public health and health-economic framework. In addition, when comparing metrics across studies, it is important to note that the ancestry as well as study design (e.g. covariates included in the risk model) can affect these measures (as well as the standard deviation of the PRS).

Potential for PRS utility

We reviewed the literature for seven well-studied diseases where PRS could potentially have clinical value. These diseases include coronary artery disease (CAD), diabetes (types 1 and 2), obesity (and body mass index (BMI)), breast cancer, prostate cancer, and Alzheimer's disease. Table 1 summarizes information about each disease, their conventional risk factors, potential uses of a PRS, and recent references evaluating the clinical use of PRS in each case. A common theme is the expectation that the utility of PRS will be to predict future disease risk or identify those most at risk, and use this information to target treatments or alter screening paradigms.
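To make the AUC discussed above, and its dependence on how much variance a PRS explains, more concrete, the following sketch simulates an additive liability threshold model using the CAD-like values quoted in Box 1 (prevalence K = 0.05; a PRS explaining 10% or 50% of variance). It is an illustration under stated assumptions, not the authors' simulation: the variance explained is taken to be on the liability scale, liabilities are standard normal, and the function names, sample size, and seed are arbitrary.

```python
import random
from statistics import NormalDist


def simulate(r2, prevalence=0.05, n=100_000, seed=1):
    """Simulate (PRS, case status) pairs under a liability threshold model.

    Assumption: liability = g + e with g ~ N(0, r2) the part captured by the
    PRS and e ~ N(0, 1 - r2) everything else; cases exceed the threshold that
    yields the requested prevalence.
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    sd_g, sd_e = r2 ** 0.5, (1.0 - r2) ** 0.5
    people = []
    for _ in range(n):
        g = rng.gauss(0.0, sd_g)
        liability = g + rng.gauss(0.0, sd_e)
        people.append((g, liability > threshold))
    return people


def auc(people):
    """Rank-based (Mann-Whitney) AUC: the probability that a random case
    has a higher score than a random non-case."""
    ranked = sorted(people, key=lambda p: p[0])
    n_cases = sum(1 for _, is_case in ranked if is_case)
    n_controls = len(ranked) - n_cases
    rank_sum = sum(rank for rank, (_, is_case) in enumerate(ranked, start=1) if is_case)
    return (rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)


def risk_in_top(people, fraction=0.05):
    """Proportion of cases among the top `fraction` of PRS values."""
    ranked = sorted(people, key=lambda p: p[0], reverse=True)
    top = ranked[: int(len(ranked) * fraction)]
    return sum(1 for _, is_case in top if is_case) / len(top)


scenario_a = simulate(r2=0.10)  # PRS explains 10% of liability variance
scenario_b = simulate(r2=0.50)  # PRS explains all of an assumed h2 of 0.5
auc_a, auc_b = auc(scenario_a), auc(scenario_b)
risk_a, risk_b = risk_in_top(scenario_a), risk_in_top(scenario_b)
print(f"AUC: {auc_a:.2f} vs {auc_b:.2f}; top-5% risk: {risk_a:.2f} vs {risk_b:.2f}")
```

Under these assumptions the simulated values should land close to the figures quoted in Box 1 (an AUC of about 0.9 when all heritability is explained, and top-5% risks of roughly 15% and 40%), up to sampling noise. The sketch says nothing about costs or benefits of acting on the score, which is the caveat raised above for AUC-type metrics.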
In this section, we elaborate on two examples where the clinical benefit of a PRS has been suggested: (i) providing better CAD risk estimates to guide treatments, and (ii) the potential to target screening to populations at high risk of prostate cancer. For cardiovascular disease, traditional risk factors such as systolic blood pressure, cholesterol levels, and smoking habits (Table 1) are routinely used to predict risk and guide initiation of treatment (e.g. statins) to lower low-density lipoprotein (LDL) cholesterol and reduce the disease risk. Recent studies have shown that adding PRS for CAD to the Framingham Risk Score and the ACC/AHA pooled risk equations resulted in increased predictive power (16). Additionally, in two re-analyses of clinical trials evaluating the effect of statin use on cardiovascular disease prevention, it was shown that the treatment benefit (absolute CAD risk reduction) was highest in those with the highest CAD polygenic risk (17, 18). The MI-GENES study found that disclosing CAD genetic risk to patients when deciding whether to initiate statin therapy resulted in improved LDL reduction, and the effect was again higher in those with the most genetic risk (19). Preliminary health economic analysis has also shown the potential cost benefits of using PRS in targeted testing for CAD prevention within the Finnish health system (20). Together these results show that a PRS for CAD can inform a more accurate risk estimate and define individuals most likely to benefit from statin therapy; however, the exact net benefit will likely vary across health systems and thus will require evaluation within each one. Another potential use case for PRS may be to increase the utility of lower sensitivity diagnostics. 
The serum prostate-specific antigen (PSA) test was used to screen for prostate cancer, but large trials showed that it results in a significant amount of overdiagnosis (false-positives leading to overtreatment) (21); while still used in diagnosis, it has been abandoned for broad screening. Multiple prostate cancer PRS have been developed that can accurately stratify individuals' risk; a key finding from these studies has been that the probability of overdiagnosis by screening decreases as an individual’s prostate cancer polygenic risk increases (22–24). This finding suggests that the PSA test could be targeted to a higher-risk population, as measured by a PRS, where the PSA test has a higher positive predictive value. In other disease areas there is similar interest in adjusting screening test frequency and/or age of initiation, and in breast cancer the WISDOM clinical trial (25) is currently evaluating the use of risk (including PRS) instead of age-based guidelines (26) to guide these decisions.

Lessons learned from PRS prediction studies

PRS define a lifetime risk trajectory

The majority of common complex diseases are late onset, with risk accumulating over time. Age is typically the strongest predictor of risk for many common diseases (Table 1), since it encapsulates the time dimension over which environmental exposures (risk factors) occur, as well as the ageing process (which can accelerate disease processes) (Figure 1). Thus the goal of risk prediction for such diseases is to evaluate whether the risk, either lifetime risk or shorter time horizons (e.g. 10-year risk), is higher than a threshold given by clinical guidelines or by age-adjusted average risk. The predicted risk can then be used to plan appropriate clinical action, whether it be treatment or increased screening.
The shape of this risk trajectory can be different from birth, and is modified by an individual’s genetics as well as environment and behaviors, such as smoking, diet, exercise, and medication usage. Analyses across a range of complex diseases have utilised methods from survival analysis to examine how PRS affect the trajectory of cumulative risk over a lifetime, including CAD (16, 27), breast cancer (28, 29), prostate cancer (23), Alzheimer’s disease (30, 31), and weight gain trajectories (32). These trajectories stratified by genetic risk can be estimated from an early age, prior to any clinical risk factors manifesting. For example, an average male in the UK Biobank would reach 10% cumulative risk of CAD by the age of 68 (27). On the other hand, individuals with the highest and lowest 20% of CAD PRS would attain 10% cumulative risk by 61 and 75 years, respectively. Similarly, the risk trajectory of breast cancer was modified in a cohort of Estonian women (29), whereby at age 70 the average risk of breast cancer was 5%, but was 12% for those >95th percentile of genetic risk, and 2.4% in those of the bottom quintile. Taken together, and with evidence from other diseases, it is clear that genetic risk can substantially stratify individual disease risk trajectories above what can be predicted by age alone.

PRS capture risk not quantified by family history and rare monogenic mutations

Two other major predictors that have been used for disease risk prediction are (i) family history and (ii) monogenic mutations. A family history of disease is a composite of genetic risk (both common and rare) and a shared environment. For instance, many breast cancer risk prediction methods implemented in clinical practice (e.g. BOADICEA (33)) use family history, often represented in a pedigree, to estimate risk alongside other predictors.
Family history, however, suffers from several drawbacks: (i) family history depends on actual disease events occurring (a cancer diagnosis), and thus cannot detect individuals who are at high risk but have not experienced an event; (ii) complex trait theory predicts that the majority of cases of complex disease arise in individuals without any family history of disease (sporadic cases) (34); and (iii) family history information is often incomplete or imprecise in practice, leading to further reduction in its predictive power. PRS can be thought of as a method to explicitly capture the common polygenic component of family history. Indeed, even early PRS could predict lower prevalence diseases better than family history (35). Recently, more predictive PRS for higher prevalence diseases, such as CAD, have been shown to be associated with CAD independently of family history (16, 36). Since family history includes an environmental as well as genetic component, we expect that as PRS get more powerful, they will better capture the common genetic component of family history, without affecting the shared environment or the monogenic (rare) component. Thus, it is likely that for prediction purposes, models combining both family history and PRS will be stronger than either factor alone, and that family history will not be made redundant by PRS.

Another form of genetic risk is monogenic in origin, namely, Mendelian germline mutations with high penetrance. Such examples are given by BRCA1/2 for breast, ovarian, and prostate cancers, and familial hypercholesterolemia (FH, caused by LDLR/APOB/PCSK9 mutations) for CAD. While these mutations are often highly penetrant, their relative rarity in the population means that they only explain a small fraction of disease cases.
Furthermore, these rare genetic variants are generally not well-genotyped by standard genome-wide genotyping arrays (nor well-imputed from reference panels) (37), such that PRS derived from standard GWAS summary statistics with typical MAF thresholds do not capture rarer variation with high accuracy. Comparing the relative contributions of polygenic and monogenic risk is not straightforward since PRS represent continuous risk while monogenic risk is typically represented as presence/absence of known mutations. One approach used by Khera et al. (2018) was to find what proportion of the population had a PRS level high enough to be considered as equivalent to carrying monogenic mutations. For example, the top 8% CAD PRS confers an odds ratio of 3, which is similar to that of FH (38, 39), but far more prevalent (1 in 13 and 1 in 200 for the PRS top 8% and FH, respectively), thus representing a much higher disease burden on the population level. Since monogenic and polygenic risk are largely independent, individuals can inherit any combination of these two factors, and some small proportion of the population may receive both high polygenic risk as well as monogenic mutations for the same disease, putting them at extreme risk; conversely, some monogenic carriers may be at lower risk than their average peers. This has been shown for LDL cholesterol levels in carriers of both FH mutations and high CAD risk (39), and by the ability of CAD PRS to predict CAD in cohorts of high-risk FH cases (16, 40). Outside of CAD, PRS for diseases have been combined with well-studied mutations to show that PRS provides additional stratification in carriers of BRCA1/2 mutations in prostate cancer (41) and breast cancer (42), and APOE ε4 carriers in Alzheimer’s disease (30, 43–45). 
There is some evidence from Alzheimer’s disease (44) and breast cancer (42) that polygenic risk may interact non-additively with monogenic risk, but more research is needed to understand the impact on risk prediction. Ultimately, combined monogenic/polygenic scores will likely provide the most information for individual risk prediction.

PRS are largely independent of traditional risk factors and can improve current clinical risk prediction models

When considering adding PRS to risk models based on traditional risk factors, there are three main questions: (i) is the PRS associated with disease risk independently of traditional risk factors; (ii) does the PRS combine additively or non-additively with traditional risk factors in affecting risk; and (iii) does the PRS increase predictive power over traditional risk factors. The PRS for several diseases have been shown to be associated with disease risk largely independently of traditional risk factors. For example, in CAD, the association of PRS with disease is only partially attenuated by adjusting for a range of traditional risk factors such as systolic blood pressure, LDL cholesterol, BMI, and others (16, 27, 46). In addition, the PRSs are often only weakly associated with these risk factors.
This is likely due to several reasons: (i) PRS are based on a large number of SNPs representing a multitude of biological pathways, some of which are not represented by traditional risk factors; (ii) many risk factors are themselves driven both by genetics and environment, and PRS can only capture the genetic component; (iii) current PRS are incomplete in that they typically only explain a small proportion of heritability; (iv) some risk factors, such as blood pressure, can exhibit substantial temporal variation and noise in measurements, whereas the PRS is capturing a life-long effect. A subsequent question is whether PRS and traditional risk factors combine additively in affecting disease, or does one modify the other in a non-additive way (statistical interaction). Results so far in CAD (16, 27) indicate that PRS and traditional risk factors combine largely additively; there is some evidence that PRS for breast cancer may interact with a minority of its risk factors including alcohol consumption, height, and hormone therapy (47), however, it is unknown whether the magnitude of these interactions has substantial implications for improved risk prediction. The final issue is whether PRS add substantial new information on top of traditional risk factors as to increase predictive power. In breast cancer this has been tested with multiple PRS (varying in GWAS summary statistics, training datasets, number of SNPs in the score) and multiple established risk predictors (varying in the genetic and non-genetic risk factors included; models listed in Table 1 and (48, 49)). In a systematic review and meta-analysis of these studies Fung et al. 
found that the AUC of any risk predictor improved by 0.004 with the inclusion of a PRS, and the net reclassification improvement (NRI, a measure of change in classification accuracy based on established risk thresholds) improved in all studies but one (49); however, care should be taken when interpreting these results as all of the scores included fewer than 100 SNPs. In a recent study of PRS utility for risk prediction in 101 breast cancer families without BRCA1/2 mutations, the inclusion of a 161 SNP PRS into BOADICEA changed screening recommendations for 11.5–19.8% of women, depending on the risk guidelines used (50). Using another recent PRS (PRS-77; (51)) resulted in a similar fraction of risk categories changing when included into BOADICEA and a number of other risk prediction methods (BRCAPRO, BCRAT, and IBIS) in a small Australian cohort (52), and a smaller 67 SNP PRS was independently predictive of risk when included in the Gail risk model along with mammographic density and endogenous hormones (53) (similar findings are observed using PRS-77 (54)). The use of larger cancer PRS (28, 29, 38) will likely improve risk stratification further.

PRS are most informative for prevention

While there is benefit to adding PRS to existing clinical risk scores, the unique characteristics of PRS open up possibilities for earlier prevention. Indeed, a study to predict the development of T1D in high-risk children (family history of T1D) found that a PRS was only predictive of progression to T1D before any metabolic abnormalities were present (high DPT-1 score), indicating the value of a T1D PRS for predicting those likely to progress to disease (55). For cardiovascular disease, traditional risk factors are typically not measured early in life and can have substantial temporal variation.
In contrast, individuals can be genotyped early in life, and have their PRS calculated for a wide range of complex diseases. For those at substantially increased lifetime risk of disease, but without elevated traditional risk factors, targeted lifestyle interventions could be used to reduce their risk, for example by more frequent follow-ups or more stringent targets for traditional risk factors (e.g. cholesterol) (56).

Open questions and challenges for the PRS field

We have outlined the value and potential of PRS for disease risk prediction but there remain a number of technical, practical, and ethical concerns that should be resolved before widespread clinical adoption.

Improving the replicability and comparability of PRS predictions

Currently PRS exist in the research domain, where scoring methods and standards are constantly developing. The PRS for a single disease area can vary widely in their risk predictions because they will include different numbers ($10$–$10^6$) and non-overlapping sets of SNPs, with different effect sizes in different scores, depending on the GWAS summary statistics used to create the score (e.g., number of samples and their ancestry, phenotype definition, imputation panel for SNPs), along with the computational method and samples used to train the score. Apparent performance can also vary due to the covariates adjusted for in the risk prediction, such as age and sex. We believe this lack of consistency to be a prime concern for the PRS field and additional resources, such as a centralised public database of published polygenic scores, are necessary to increase PRS comparability and evaluation, and thus improve their potential for translation.
However, further major challenges remain, including those discussed below: increasing the diversity of genotyped cohorts to reduce the bias of PRS performance for European ancestries; investigating sex-based differences in PRS performance; and delineating clinical utility in disease-specific scenarios, rather than relying on generic prediction metrics, such as AUC.

Sources of bias in PRS predictions: stratification by ancestry and sex?

Currently, the majority of PRS are developed and evaluated using individuals of European ancestries, since the majority of GWAS and genetic reference panels (used for imputation) are currently biased toward European ancestries (57, 58). Because of this it has been observed that PRS developed using data from European ancestries are less predictive in non-European ancestries (59–63). There are various possible reasons for this lack of transferability including (i) population stratification in the original summary statistics (confounding the association results); (ii) differences in LD patterns between ancestries; and (iii) differences in the true genetic architectures of disease, including gene-environment interactions (58, 64). Population stratification can generally be adjusted for during the GWAS or in the evaluation of the score on new datasets, using principal component analysis (PCA) or linear mixed models (65). Care must be taken even within a single ancestry group, as there can be regional variations of PRS driven by subtle population stratification (66, 67).
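One simple way to picture the PCA-based adjustment mentioned above is to regress the PRS on a genetic principal component and keep the residual. The sketch below does this for a single, simulated component; the data, the strength of the stratification effect, and the function names are assumptions made for illustration. In practice several components are typically adjusted for jointly, often as covariates in the risk model rather than by residualising the score beforehand.

```python
import random


def residualise(score, covariate):
    """Residuals of `score` after simple linear regression on one covariate."""
    n = len(score)
    mean_s = sum(score) / n
    mean_c = sum(covariate) / n
    sxy = sum((c - mean_c) * (s - mean_s) for c, s in zip(covariate, score))
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    slope = sxy / sxx
    return [(s - mean_s) - slope * (c - mean_c) for s, c in zip(score, covariate)]


def correlation(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
# Toy data: PC1 stands in for an ancestry axis; the raw PRS drifts along it.
pc1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
raw_prs = [0.5 * p + rng.gauss(0.0, 1.0) for p in pc1]

adjusted_prs = residualise(raw_prs, pc1)
print(f"correlation with PC1 before: {correlation(raw_prs, pc1):.2f}")
print(f"correlation with PC1 after:  {correlation(adjusted_prs, pc1):.2f}")
```

The adjusted score is uncorrelated with the component by construction, so any remaining association with disease is not driven by that particular axis of population structure. This does not address the LD and genetic-architecture differences discussed in this section.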
As for performance differences due to diverging LD patterns, these arise since many of the GWAS SNPs are not necessarily the causal SNP but are in LD with the causal SNP (tagging); however, due to differences in LD between populations, the causal SNPs may no longer be well tagged, leading to reduced performance (58, 64). Whether genetic architecture differs between ancestry groups is difficult to assess without large GWAS in non-European ancestries. So far, the evidence from diseases such as T2D is that the genetic architecture is largely concordant between European and non-European ancestries (68–71); and directionally concordant effect sizes between different ancestries have been observed in multiple other comparisons of GWAS across ancestral groups (72–74). Assuming that this holds across the majority of complex diseases, LD differences are likely the main challenge to overcome. Some proposed solutions include a single pan-ancestry PRS or creating different ancestry-specific PRSs (62, 75–77). A related challenge will then be how to accurately align an individual to a PRS based on their ancestry. Another important yet relatively unexplored aspect of PRS predictive performance is how it differs by sex. Many traits, including disease risk, differ by sex and some of that may be partly genetic (78). However, most GWAS are not sex-specific, and often exclude sex chromosomes (particularly X) from the analysis. This is an area of interest for future PRS research, with recent results showing stronger predictive power for obesity (79) and Alzheimer’s disease (80) using sex-specific PRS.

What is the value of PRS, and how do we achieve it?

In this review we have outlined the benefits of how PRS can improve risk prediction, and highlighted cases of potential clinical utility.
However, the evaluation of a PRS in public health and health economic terms as well as in feasibility of implementation is necessary to motivate adoption; these aspects, however, have not been extensively explored. Public health, economic, and implementation assessment will be highly dependent on the PRS use case and costs of the clinical action (e.g. medication, or altered screening guidelines). A previous review outlined the potential value of PRS in optimally allocating therapies in reducing the Number Needed to Treat (NNT) (81); however, cost-benefit analyses represent another large step to be taken. To our knowledge the cost-benefit of PRS testing has only been explored in CAD and breast cancer. In a simulation framework of the Finnish health system, it was found that an optimal allocation of a CAD PRS alongside traditional risk factors would be cost-beneficial if deployed in a targeted, rather than population-wide, approach (20). A UK-based analysis found that allocating breast cancer screening using risk-based (a combined predictor including a PRS) rather than age-based estimates would improve the cost-effectiveness and the benefit-to-harm ratio over current guidelines (26). While these cases suggest the value of genetic testing in their specific use cases, they may be underestimating the potential benefit due to the multiple PRS that can be estimated from a single genotype array. It is possible that there would be a significant health and economic benefit for genotyping once and receiving concurrent risk predictions for multiple diseases, optimizing treatment or screening for each.

Conclusions

Sixteen years since the human genome sequence was finished and nearly 13 years since the GWAS era began, PRS have emerged as a powerful tool to predict genetic predisposition of disease.
For the seven diseases we evaluate here, the addition of PRS generally increased the accuracy of existing risk models of established risk factors, with the resulting improved risk prediction models affecting clinical management (diagnostic screening and/or treatment) in sizeable fractions of patients (~10% in the case of breast cancer). While these studies demonstrate the potential clinical impact and benefits of using PRS, there are still open questions regarding their eventual utility. The utility of PRS for informing disease risk is further evidenced by its practical implementation as a one-time, minimally invasive DNA extraction (e.g. saliva or blood draw) at any point in a lifetime, coupled with low-cost array genotyping and, in the future, genome sequencing. A single individual's genotype data allows for the parallel calculation of PRS for many diseases. From this single test, preexisting risk prediction models for multiple diseases appear to be improved, and lifetime risk trajectories can be estimated. In the future, these risk estimates may be used to guide screening frequencies, therapeutic interventions, and targeted recommendations for lifestyle. Regardless, the predictive accuracy of PRS will continue to improve with larger and more diverse cohorts as well as improved methods to derive and apply PRS, all of which are likely to increase the potential clinical utility of PRS and accelerate translation.

Display Items

Box 1: Empirical results for CAD closely follow predictions from polygenic trait theory.
Under an additive genetic liability threshold model, and by assuming several key quantities, including the population prevalence K and the heritability h² on the liability scale (and/or the sibling recurrence risk λs), we can derive the expected predictive power of a PRS, measured by sensitivity, specificity, AUC, and other quantities (82, 83). The adjacent figure shows simulation results for two scenarios relevant to CAD (assuming a population prevalence K=0.05 and h²=0.5): (a) a PRS explaining 10% of the phenotypic variance, similar to the results achieved by the latest CAD PRS (27); and (b) a PRS explaining all the known heritability of CAD (50% of the phenotypic variance). Clearly, as the PRS explains more of the heritability, there is greater separation between the average scores of cases and non-cases (quantified by the AUC) and correspondingly larger effect sizes (odds ratio per standard deviation, ORstdev). For a disease such as CAD, the expected AUC from a PRS explaining all of the known heritability is 0.9. For scenario (a), the top 5% of the population will have an average absolute (lifetime) CAD risk of 15%, but for scenario (b) this goes up to a risk of 40%, and the top 15% of the population have a risk of >10%. Note that the genetic liability threshold model has no direct bearing on how to increase the heritability explained by PRS, only on the consequences of such an increase. To increase the explained heritability we will likely need larger GWAS sample sizes (84, 85), together with wider genotyping of rarer genetic variants, such as via whole-genome sequencing (86, 87). In the absence of larger sample sizes, multi-trait prediction models can also be used to make small but consistent gains in predictive power (88, 89).

Table 1. An overview of PRS in seven different diseases.
Cardiometabolic Traits/Diseases

Obesity & BMI
Risk Factors
  Mendelian Risk Factors: MC4R mutations
  Other Factors: Age, Sex, Family History; Lifestyle: Diet, Physical Activity
Potential clinical utility for PRS
  • Targeting lifestyle interventions and potential treatments (e.g. bariatric surgery) to those at most risk of developing obesity
  • BMI PRS is enriched in those who have undergone bariatric surgery in UK Biobank (32)
  • Predicting weight gain trajectories (32, 90, 91)
  • Useful as a risk predictor of other diseases where obesity is a causal risk factor (79)

Coronary artery disease (CAD)
Risk Factors
  Mendelian Risk Factors: Familial Hypercholesterolemia (FH) mutations: LDLR, APOB, PCSK9
  Other Factors: Age, Sex, Family History; Systolic blood pressure, LDL or non-HDL cholesterol, BMI; Lifestyle: Smoking, Diet, Physical Activity
Potential clinical utility for PRS
  • Adds accuracy to clinical risk predictors (e.g. Framingham Risk Score, ACC/AHA13 (16))
  • Useful for defining most benefit from statin prescription (17, 18)
  • Useful for estimating lifetime risk trajectories (27, 56)

Diabetes (Type 1)
Risk Factors
  Mendelian Risk Factors: Maturity onset diabetes of the young (MODY) related genes; HLA susceptibility alleles
  Other Factors: Age, Sex, Family History; DPT-1 Metabolic Risk Score: BMI, glucose, and C-peptide
Potential clinical utility for PRS
  • Predicting at-risk children who are most likely to progress to disease (55, 75, 92, 93)
  • Discriminating between Type 1 and 2 Diabetes (93)

Diabetes (Type 2)
Risk Factors
  Mendelian Risk Factors: Undetermined
  Other Factors: Age, Sex, Family History; BMI, waist circumference, waist-hip ratio, history of hypertension, history of high blood glucose; Lifestyle: Smoking, Diet, Physical Activity level
Potential clinical utility for PRS
  • Adding additional stratification to already accurate risk models (e.g. age, sex, BMI) (94)
  • Estimating lifetime risk trajectories (94)

Cancers

Breast Cancer
Risk Factors
  Mendelian Risk Factors: Pathogenic BRCA1/2 mutations; Lower risk pathogenic variants: PALB2, ATM, CHEK2
  Other Factors: Age, Sex, Family History; Age at menarche, age at menopause, nulliparity and age at first childbirth, BMI, Hormone replacement therapy
Potential clinical utility for PRS
  • Currently implemented within the BOADICEA risk model (33)
  • Added to other models including: Gail, Tyrer-Cuzick, BCSC, BI-RADS, Rosner-Colditz, NCI (29, 49, 53)
  • Has value when included in risk models that can be applied to study risk-based vs. age-based screening programs (26)
  • Disease subtyping: Can be used to estimate genetic risk for ER-positive or negative breast cancer separately (28)

Prostate Cancer
Risk Factors
  Mendelian Risk Factors: Pathogenic BRCA1/2 mutations
  Other Factors: Age, Sex, Family History
Potential clinical utility for PRS
  • Improve predictions for risk-based screening and target PSA test to those with higher genetic risk (22)
  • Positive predictive value (PPV) of the PSA test increases with genetic risk (23)

Other

Alzheimer's Disease
Risk Factors
  Mendelian Risk Factors: APOE ε4 and ε2 alleles
  Other Factors: Age, Sex, Family History
Potential clinical utility for PRS
  • Current polygenic scores can explain the majority of heritability for common variants (95)
  • Polygenic Hazard Scores (PHS) to estimate age-of-onset (30)

Figure 1. PRS define lifetime risk trajectories. (A) Example density plot of a population according to polygenic risk.
The distribution is filled and labeled according to the lowest (0-20%; blue), population average (40-60%; grey), and highest (80-100%; red) quintiles of genetic risk. (B) Example of a risk trajectory (Kaplan-Meier cumulative risk curve) for the population average (grey), and the highest and lowest quintiles of genetic risk (coloured as in A). A representative risk threshold is shown as an example.

References

1. Altshuler,D.M., Gibbs,R.A., Peltonen,L., Schaffner,S.F., Yu,F., Dermitzakis,E., Bonnen,P.E., De Bakker,P.I.W., Deloukas,P., Gabriel,S.B., et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature, 467, 52–58. 2. 1000 Genomes Project Consortium, Auton,A., Brooks,L.D., Durbin,R.M., Garrison,E.P., Kang,H.M., Korbel,J.O., Marchini,J.L., McCarthy,S., McVean,G.A., et al. (2015) A global reference for human genetic variation. Nature, 526, 68–74. 3. Buniello,A., MacArthur,J.A.L., Cerezo,M., Harris,L.W., Hayhurst,J., Malangone,C., McMahon,A., Morales,J., Mountjoy,E., Sollis,E., et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012. 4. Abraham,G. and Inouye,M. (2015) Genomic risk prediction of complex human disease and its clinical application. Curr. Opin. Genet. Dev., 33, 10–16. 5. Torkamani,A., Wineinger,N.E. and Topol,E.J. (2018) The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet., 19, 581–590. 6. Martin,A.R., Daly,M.J., Robinson,E.B., Hyman,S.E. and Neale,B.M. (2018) Predicting Polygenic Risk of Psychiatric Disorders. Biol. Psychiatry, 10.1016/j.biopsych.2018.12.015. 7. Visscher,P.M., Wray,N.R., Zhang,Q., Sklar,P., McCarthy,M.I., Brown,M.A. and Yang,J. (2017) 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum.
Genet., 101, 5–22. 8. Choi,S.W., Shin,T., Mak,H. and Reilly,P.F.O. (2018) A guide to performing Polygenic Risk Score analyses. bioRxiv, 10.1101/416545. 9. Chatterjee,N., Shi,J. and García-Closas,M. (2016) Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet., 17, 392–406. 10. Pasaniuc,B. and Price,A.L. (2017) Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet., 18, 117–127. 11. International Schizophrenia Consortium, Purcell,S.M., Wray,N.R., Stone,J.L., Visscher,P.M., O’Donovan,M.C., Sullivan,P.F. and Sklar,P. (2009) Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature, 460, 748–52. 12. Evans,D.M., Visscher,P.M. and Wray,N.R. (2009) Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk. Hum. Mol. Genet., 18, 3525–3531. 13. Vilhjálmsson,B.J., Yang,J., Finucane,H.K., Gusev,A., Lindström,S., Ripke,S., Genovese,G., Loh,P.-R., Bhatia,G., Do,R., et al. (2015) Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am. J. Hum. Genet., 97, 576–592. 14. Mak,T.S.H., Porsch,R.M., Choi,S.W., Zhou,X. and Sham,P.C. (2017) Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol., 41, 469–480. 15. Steyerberg,E.W., Vickers,A.J., Cook,N.R., Gerds,T., Gonen,M., Obuchowski,N., Pencina,M.J. and Kattan,M.W. (2010) Assessing the Performance of Prediction Models. Epidemiology, 21, 128–138. 16. Abraham,G., Havulinna,A.S., Bhalala,O.G., Byars,S.G., De Livera,A.M., Yetukuri,L., Tikkanen,E., Perola,M., Schunkert,H., Sijbrands,E.J., et al. (2016) Genomic prediction of coronary heart disease. Eur. Heart J., 37, 3267–3278. 17. Natarajan,P., Young,R., Stitziel,N.O., Padmanabhan,S., Baber,U., Mehran,R., Sartori,S., Fuster,V., Reilly,D.F., Butterworth,A., et al. 
(2017) Polygenic Risk Score Identifies Subgroup With Higher Burden of Atherosclerosis and Greater Relative Benefit From Statin Therapy in the Primary Prevention Setting. Circulation, 135, 2091–2101. 18. Mega,J.L., Stitziel,N.O., Smith,J.G., Chasman,D.I., Caulfield,M., Devlin,J.J., Nordio,F., Hyde,C., Cannon,C.P., Sacks,F., et al. (2015) Genetic risk, coronary heart disease events, and the clinical benefit of statin therapy: an analysis of primary and secondary prevention trials. Lancet (London, England), 385, 2264–2271. 19. Kullo,I.J., Jouni,H., Austin,E.E., Brown,S.-A., Kruisselbrink,T.M., Isseh,I.N., Haddad,R.A., Marroush,T.S., Shameer,K., Olson,J.E., et al. (2016) Incorporating a Genetic Risk Score Into Coronary Heart Disease Risk Estimates: Effect on Low-Density Lipoprotein Cholesterol Levels (the MI-GENES Clinical Trial). Circulation, 133, 1181–8. 20. Hynninen,Y., Linna,M. and Vilkkumaa,E. (2018) Value of genetic testing in the prevention of cardiovascular events. PLoS One, 14, e0210010. 21. Grossman,D.C., Curry,S.J., Owens,D.K., Bibbins-Domingo,K., Caughey,A.B., Davidson,K.W., Doubeni,C.A., Ebell,M., Epling,J.W., Kemper,A.R., et al. (2018) Screening for Prostate Cancer. JAMA, 319, 1901. 22. Pashayan,N., Pharoah,P.D., Schleutker,J., Talala,K., Tammela,T.L., Määttänen,L., Harrington,P., Tyrer,J., Eeles,R., Duffy,S.W., et al. (2015) Reducing overdiagnosis by polygenic risk-stratified screening: findings from the Finnish section of the ERSPC. Br. J. Cancer, 113, 1086–1093. 23. Seibert,T.M., Fan,C.C., Wang,Y., Zuber,V., Karunamuni,R., Parsons,J.K., Eeles,R.A., Easton,D.F., Kote-Jarai,Z., Al Olama,A.A., et al. (2018) Polygenic hazard score to guide screening for aggressive prostate cancer: development and validation in large scale cohorts. BMJ, 360, j5757. 24.
Pashayan,N., Duffy,S.W., Neal,D.E., Hamdy,F.C., Donovan,J.L., Martin,R.M., Harrington,P., Benlloch,S., Amin Al Olama,A., Shah,M., et al. (2015) Implications of polygenic risk-stratified screening for prostate cancer on overdiagnosis. Genet. Med., 17, 789–795. 25. Shieh,Y., Eklund,M., Madlensky,L., Sawyer,S.D., Thompson,C.K., Stover Fiscalini,A., Ziv,E., van’t Veer,L.J., Esserman,L.J. and Tice,J.A. (2017) Breast Cancer Screening in the Precision Medicine Era: Risk-Based Screening in a Population-Based Trial. J. Natl. Cancer Inst., 109, djw290. 26. Pashayan,N., Morris,S., Gilbert,F.J. and Pharoah,P.D.P. (2018) Cost-effectiveness and Benefit-to-Harm Ratio of Risk-Stratified Screening for Breast Cancer. JAMA Oncol., 4, 1504. 27. Inouye,M., Abraham,G., Nelson,C.P., Wood,A.M., Sweeting,M.J., Dudbridge,F., Lai,F.Y., Kaptoge,S., Brozynska,M., Wang,T., et al. (2018) Genomic Risk Prediction of Coronary Artery Disease in 480,000 Adults: Implications for Primary Prevention. J. Am. Coll. Cardiol., 72, 1883–1893. 28. Mavaddat,N., Michailidou,K., Dennis,J., Lush,M., Fachal,L., Lee,A., Tyrer,J.P., Chen,T.-H., Wang,Q., Bolla,M.K., et al. (2019) Polygenic Risk Scores for Prediction of Breast Cancer and Breast Cancer Subtypes. Am. J. Hum. Genet., 104, 21–34. 29. Läll,K., Lepamets,M., Palover,M., Esko,T., Metspalu,A., Tõnisson,N., Padrik,P., Mägi,R. and Fischer,K. (2019) Polygenic prediction of breast cancer: comparison of genetic predictors and implications for risk stratification. BMC Cancer, 19, 557. 30. Desikan,R.S., Fan,C.C., Wang,Y., Schork,A.J., Cabral,H.J., Cupples,L.A., Thompson,W.K., Besser,L., Kukull,W.A., Holland,D., et al. (2017) Genetic assessment of age-associated Alzheimer disease risk: Development and validation of a polygenic hazard score. PLoS Med., 14, 1–17. 31. Tan,C.H., Hyman,B.T., Tan,J.J.X., Hess,C.P., Dillon,W.P., Schellenberg,G.D., Besser,L.M., Kukull,W.A., Kauppi,K., McEvoy,L.K., et al. (2017) Polygenic hazard scores in preclinical Alzheimer disease. Ann.
Neurol., 82, 484–488. 32. Khera,A. V., Chaffin,M., Wade,K.H., Zahid,S., Brancale,J., Xia,R., Distefano,M., Senol-Cosar,O., Haas,M.E., Bick,A., et al. (2019) Polygenic Prediction of Weight and Obesity Trajectories from Birth to Adulthood. Cell, 177, 587-596.e9. 33. Lee,A., Mavaddat,N., Wilcox,A.N., Cunningham,A.P., Carver,T., Hartley,S., Babb de Villiers,C., Izquierdo,A., Simard,J., Schmidt,M.K., et al. (2019) BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genet. Med., 0, 1. 34. Yang,J., Visscher,P.M. and Wray,N.R. (2010) Sporadic cases are the norm for complex disease. Eur. J. Hum. Genet., 18, 1039–1043. 35. Do,C.B., Hinds,D.A., Francke,U. and Eriksson,N. (2012) Comparison of Family History and SNPs for Predicting Risk of Complex Disease. PLoS Genet., 8, e1002973. 36. Tada,H., Melander,O., Louie,J.Z., Catanese,J.J., Rowland,C.M., Devlin,J.J., Kathiresan,S. and Shiffman,D. (2016) Risk prediction by genetic risk scores for coronary heart disease is independent of self-reported family history. Eur. Heart J., 37, 561–7. 37. Weedon,M., Jackson,L., Harrison,J., Ruth,K., Tyrrell,J., Hattersley,A. and Wright,C. (2019) Very rare pathogenic genetic variants detected by SNP-chips are usually false positives: implications for direct-to-consumer genetic testing. bioRxiv, 10.1101/696799. 38. Khera,A. V., Chaffin,M., Aragam,K.G., Haas,M.E., Roselli,C., Choi,S.H., Natarajan,P., Lander,E.S., Lubitz,S.A., Ellinor,P.T., et al. (2018) Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet., 50, 1219–1224. 39. Khera,A. V, Chaffin,M., Zekavat,S.M., Collins,R.L., Roselli,C., Natarajan,P., Lichtman,J.H., D’Onofrio,G., Mattera,J.A., Dreyer,R.P., et al.
(2018) Whole Genome Sequencing to Characterize Monogenic and Polygenic Contributions in Patients Hospitalized with Early-Onset Myocardial Infarction. Circulation, 10.1161/CIRCULATIONAHA.118.035658. 40. Paquette,M., Chong,M., Thériault,S., Dufour,R., Paré,G. and Baass,A. (2017) Polygenic risk score predicts prevalence of cardiovascular disease in patients with familial hypercholesterolemia. J. Clin. Lipidol., 11, 725-732.e5. 41. Lecarpentier,J., Silvestri,V., Kuchenbaecker,K.B., Barrowdale,D., Dennis,J., McGuffog,L., Soucy,P., Leslie,G., Rizzolo,P., Navazio,A.S., et al. (2017) Prediction of Breast and Prostate Cancer Risks in Male BRCA1 and BRCA2 Mutation Carriers Using Polygenic Risk Scores. J. Clin. Oncol., 35, 2240–2250. 42. Kuchenbaecker,K.B., McGuffog,L., Barrowdale,D., Lee,A., Soucy,P., Dennis,J., Domchek,S.M., Robson,M., Spurdle,A.B., Ramus,S.J., et al. (2017) Evaluation of Polygenic Risk Scores for Breast and Ovarian Cancer Risk Prediction in BRCA1 and BRCA2 Mutation Carriers. J. Natl. Cancer Inst., 109, 248–252. 43. Escott-Price,V., Sims,R., Bannister,C., Harold,D., Vronskaya,M., Majounie,E., Badarinarayan,N., GERAD/PERADES, IGAP consortia, Morgan,K., et al. (2015) Common polygenic variation enhances risk prediction for Alzheimer’s disease. Brain, 138, 3673–84. 44. van der Lee,S.J., Wolters,F.J., Ikram,M.K., Hofman,A., Ikram,M.A., Amin,N. and van Duijn,C.M. (2018) The effect of APOE and other common genetic variants on the onset of Alzheimer’s disease and dementia: a community-based cohort study. Lancet Neurol., 17, 434–444. 45. Stocker,H., Möllers,T., Perna,L. and Brenner,H. (2018) The genetic risk of Alzheimer’s disease beyond APOE ε4: systematic review of Alzheimer’s genetic risk scores. Transl. Psychiatry, 8, 166. 46. Hindy,G., Wiberg,F., Almgren,P., Melander,O. and Orho-Melander,M. (2018) Polygenic Risk Score for Coronary Heart Disease Modifies the Elevated Risk by Cigarette Smoking for Disease Incidence. Circ. Genomic Precis. Med., 11, e001856. 47.
Rudolph,A., Song,M., Brook,M.N., Milne,R.L., Mavaddat,N., Michailidou,K., Bolla,M.K., Wang,Q., Dennis,J., Wilcox,A.N., et al. (2018) Joint associations of a polygenic risk score and environmental risk factors for breast cancer in the Breast Cancer Association Consortium. Int. J. Epidemiol., 47, 526–536. 48. Willoughby,A., Andreassen,P.R. and Toland,A.E. (2019) Genetic testing to guide risk-stratified screens for breast cancer. J. Pers. Med., 9. 49. Fung,S.M., Wong,X.Y., Lee,S.X., Miao,H., Hartman,M. and Wee,H.L. (2019) Performance of single-nucleotide polymorphisms in breast cancer risk prediction models: A Systematic Review and Meta-analysis. Cancer Epidemiol. Biomarkers Prev., 28, 506–521. 50. Lakeman,I.M.M., Hilbers,F.S., Rodríguez-Girondo,M., Lee,A., Vreeswijk,M.P.G., Hollestelle,A., Seynaeve,C., Meijers-Heijboer,H., Oosterwijk,J.C., Hoogerbrugge,N., et al. (2019) Addition of a 161-SNP polygenic risk score to family history-based risk prediction: impact on clinical management in non-BRCA1/2 breast cancer families. J. Med. Genet., 10.1136/jmedgenet-2019-106072. 51. Mavaddat,N., Pharoah,P.D.P., Michailidou,K., Tyrer,J., Brook,M.N., Bolla,M.K., Wang,Q., Dennis,J., Dunning,A.M., Shah,M., et al. (2015) Prediction of Breast Cancer Risk Based on Profiling With Common Genetic Variants. JNCI J. Natl. Cancer Inst., 107, 1–15. 52. Dite,G.S., Macinnis,R.J., Bickerstaffe,A., Dowty,J.G., Allman,R., Apicella,C., Milne,R.L., Tsimiklis,H., Phillips,K.A., Giles,G.G., et al. (2016) Breast cancer risk prediction using clinical models and 77 independent risk-associated SNPs for women aged under 50 years: Australian breast cancer family registry. Cancer Epidemiol. Biomarkers Prev., 25, 359–365. 53.
Zhang,X., Rice,M., Tworoger,S.S., Rosner,B.A., Eliassen,A.H., Tamimi,R.M., Joshi,A.D., Lindstrom,S., Qian,J., Colditz,G.A., et al. (2018) Addition of a polygenic risk score, mammographic density, and endogenous hormones to existing breast cancer risk prediction models: A nested case–control study. PLOS Med., 15, e1002644. 54. Vachon,C.M., Scott,C.G., Tamimi,R.M., Thompson,D.J., Fasching,P.A., Stone,J., Southey,M.C., Winham,S., Lindström,S., Lilyquist,J., et al. (2019) Joint association of mammographic density adjusted for age and body mass index and polygenic risk score with breast cancer risk. Breast Cancer Res., 21, 1–10. 55. Redondo,M.J., Geyer,S., Steck,A.K., Sharp,S., Wentworth,J.M., Weedon,M.N., Antinozzi,P., Sosenko,J., Atkinson,M., Pugliese,A., et al. (2018) A type 1 diabetes genetic risk score predicts progression of islet autoimmunity and development of type 1 diabetes in individuals at risk. Diabetes Care, 41, 1887–1894. 56. Natarajan,P. (2018) Polygenic Risk Scoring for Coronary Heart Disease. J. Am. Coll. Cardiol., 72, 1894–1897. 57. Morales,J., Welter,D., Bowler,E.H., Cerezo,M., Harris,L.W., McMahon,A.C., Hall,P., Junkins,H.A., Milano,A., Hastings,E., et al. (2018) A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog. Genome Biol., 19, 21. 58. Martin,A.R., Kanai,M., Kamatani,Y., Okada,Y., Neale,B.M. and Daly,M.J. (2019) Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet., 51, 584–591. 59. Ware,E.B., Schmitz,L.L., Faul,J., Gard,A., Smith,J.A., Zhao,W., Weir,D. and Kardia,S.L.R. (2017) Heterogeneity in polygenic scores for common human traits. bioRxiv, 10.1101/106062. 60. Reisberg,S., Iljasenko,T., Läll,K., Fischer,K. and Vilo,J. (2017) Comparing distributions of polygenic risk scores of type 2 diabetes and coronary heart disease within different populations. PLoS One, 12, e0179238. 61. Kim,M.S., Patel,K.P., Teng,A.K., Berens,A.J.
and Lachance,J. (2018) Genetic disease risks can be misestimated across global populations. Genome Biol., 19, 179. 62. Onengut-Gumuscu,S., Chen,W.-M., Robertson,C.C., Bonnie,J.K., Farber,E., Zhu,Z., Oksenberg,J.R., Brant,S.R., Bridges,S.L., Edberg,J.C., et al. (2019) Type 1 Diabetes Risk in African-Ancestry Participants and Utility of an Ancestry-Specific Genetic Risk Score. Diabetes Care, 42, 406–415. 63. Curtis,D. (2018) Polygenic risk score for schizophrenia is more strongly associated with ancestry than with schizophrenia. Psychiatr. Genet., 28, 85–89. 64. Martin,A.R., Gignoux,C.R., Walters,R.K., Wojcik,G.L., Neale,B.M., Gravel,S., Daly,M.J., Bustamante,C.D. and Kenny,E.E. (2017) Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am. J. Hum. Genet., 100, 635–649. 65. Price,A.L., Zaitlen,N.A., Reich,D. and Patterson,N. (2010) New approaches to population stratification in genome-wide association studies. Nat. Rev. Genet., 11, 459–463. 66. Haworth,S., Mitchell,R., Corbin,L., Wade,K.H., Dudding,T., Budu-Aggrey,A., Carslake,D., Hemani,G., Paternoster,L., Smith,G.D., et al. (2019) Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nat. Commun., 10, 333. 67. Kerminen,S., Martin,A.R., Koskela,J., Ruotsalainen,S.E., Havulinna,A.S., Surakka,I., Palotie,A., Perola,M., Salomaa,V., Daly,M.J., et al. (2019) Geographic Variation and Bias in the Polygenic Scores of Complex Diseases and Traits in Finland. Am. J. Hum. Genet., 104, 1169–1181. 68. Mahajan,A., Go,M.J., Zhang,W., Below,J.E., Gaulton,K.J., Ferreira,T., Horikoshi,M., Johnson,A.D., Ng,M.C.Y., Prokopenko,I., et al. (2014) Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat. Genet., 46, 234–244. 69.
Waters,K.M., Stram,D.O., Hassanein,M.T., Le Marchand,L., Wilkens,L.R., Maskarinec,G., Monroe,K.R., Kolonel,L.N., Altshuler,D., Henderson,B.E., et al. (2010) Consistent Association of Type 2 Diabetes Risk Variants Found in Europeans in Diverse Racial and Ethnic Groups. PLoS Genet., 6, e1001078. 70. Hassanali,N., De Silva,N.M.G., Robertson,N., Rayner,N.W., Barrett,A., Bennett,A.J., Groves,C.J., Matthews,D.R., Katulanda,P., Frayling,T.M., et al. (2014) Evaluation of Common Type 2 Diabetes Risk Variants in a South Asian Population of Sri Lankan Descent. PLoS One, 9, e98608. 71. Gan,W., Walters,R.G., Holmes,M. V., Bragg,F., Millwood,I.Y., Banasik,K., Chen,Y., Du,H., Iona,A., Mahajan,A., et al. (2016) Evaluation of type 2 diabetes genetic risk variants in Chinese adults: findings from 93,000 individuals from the China Kadoorie Biobank. Diabetologia, 59, 1446–1457. 72. Wojcik,G.L., Graff,M., Nishimura,K.K., Tao,R., Haessler,J., Gignoux,C.R., Highland,H.M., Patel,Y.M., Sorokin,E.P., Avery,C.L., et al. (2019) Genetic analyses of diverse populations improves discovery for complex traits. Nature, 570, 514–518. 73. Gurdasani,D., Barroso,I., Zeggini,E. and Sandhu,M.S. (2019) Genomics of disease risk in globally diverse populations. Nat. Rev. Genet., 10.1038/s41576-019-0144-0. 74. Lam,M., Chen,C.-Y., Li,Z., Martin,A., Bryois,J., Ma,X., Gaspar,H., Ikeda,M., Benyamin,B., Brown,B., et al. (2018) Comparative genetic architectures of schizophrenia in East Asian and European populations. bioRxiv, 10.1101/445874. 75. Perry,D.J., Wasserfall,C.H., Oram,R.A., Williams,M.D., Posgai,A., Muir,A.B., Haller,M.J., Schatz,D.A., Wallet,M.A., Mathews,C.E., et al. (2018) Application of a Genetic Risk Score to Racially Diverse Type 1 Diabetes Populations Demonstrates the Need for Diversity in Risk-Modeling. Sci. Rep., 8, 4529. 76. Starlard-Davenport,A., Allman,R., Dite,G.S., Hopper,J.L., Tuff,E.S., Macleod,S., Kadlubar,S., Preston,M. and Henry-Tillman,R. 
(2018) Validation of a genetic risk score for Arkansas women of color. PLoS One, 13. 77. Shieh,Y., Fejerman,L., Lott,P.C., Marker,K., Sawyer,S.D., Hu,D., Huntsman,S., Torres,J., Echeverry,M., Bohorquez,M.E., et al. (2019) A polygenic risk score for breast cancer in U.S. Latinas and Latin-American women. bioRxiv, 10.1101/598730. 78. Khramtsova,E.A., Davis,L.K. and Stranger,B.E. (2019) The role of sex in the genomics of human complex traits. Nat. Rev. Genet., 20, 173–190. 79. Censin,J.C., Bovijn,J., Ferreira,T., Pulit,S.L., Magi,R., Mahajan,A., Holmes,M. V and Lindgren,C.M. (2019) Causal relevance of obesity on the leading causes of death in women and men: A Mendelian randomization study. bioRxiv, 10.1101/523217. 80. Tan,C.H., Fan,C.C., Mormino,E.C., Sugrue,L.P., Broce,I.J., Hess,C.P., Dillon,W.P., Bonham,L.W., Yokoyama,J.S., Karch,C.M., et al. (2018) Polygenic hazard score: an enrichment marker for Alzheimer’s associated amyloid and tau deposition. Acta Neuropathol., 135, 85–93. 81. Gibson,G. (2019) On the utilization of polygenic risk scores for therapeutic targeting. PLOS Genet., 15, e1008060. 82. Wray,N.R., Yang,J., Goddard,M.E. and Visscher,P.M. (2010) The Genetic Interpretation of Area under the ROC Curve in Genomic Profiling. PLoS Genet., 6, e1000864. 83. So,H.-C. and Sham,P.C. (2010) A Unifying Framework for Evaluating the Predictive Power of Genetic Variants Based on the Level of Heritability Explained. PLoS Genet., 6, e1001230. 84. Dudbridge,F. (2013) Power and Predictive Accuracy of Polygenic Risk Scores. PLoS Genet., 9, e1003348. 85. Chatterjee,N., Wheeler,B., Sampson,J., Hartge,P., Chanock,S.J. and Park,J.H. (2013) Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat. Genet., 45, 400–405. 86. Wainschtein,P., Jain,D.P., Yengo,L.
and Zheng,Z. (2019) Recovery of trait heritability from whole genome sequence data. bioRxiv, 10.1101/588020. 87. Wray,N.R., Kemper,K.E., Hayes,B.J., Goddard,M.E. and Visscher,P.M. (2019) Complex Trait Prediction from Genome Data: Contrasting EBV in Livestock to PRS in Humans. Genetics, 211, 1131–1141. 88. Turley,P., Walters,R.K., Maghzian,O., Okbay,A., Lee,J.J., Fontana,M.A., Nguyen-Viet,T.A., Wedow,R., Zacher,M., Furlotte,N.A., et al. (2018) Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet., 50, 229–237. 89. Maier,R.M., Zhu,Z., Lee,S.H., Trzaskowski,M., Ruderfer,D.M., Stahl,E.A., Ripke,S., Wray,N.R., Yang,J., Visscher,P.M., et al. (2018) Improving genetic prediction by leveraging genetic correlations among human diseases and traits. Nat. Commun., 9, 989. 90. Song,M., Zheng,Y., Qi,L., Hu,F.B., Chan,A.T. and Giovannucci,E.L. (2018) Longitudinal analysis of genetic susceptibility and BMI throughout adult life. Diabetes, 67, 248–255. 91. Brandkvist,M., Bjørngaard,J.H., Ødegård,R.A., Åsvold,B.O., Sund,E.R. and Vie,G.Å. (2019) Quantifying the impact of genes on body mass index during the obesity epidemic: longitudinal findings from the HUNT Study. BMJ, 10.1136/bmj.l4067. 92. Bonifacio,E., Beyerlein,A., Hippich,M., Winkler,C., Vehik,K., Weedon,M.N., Laimighofer,M., Hattersley,A.T., Krumsiek,J., Frohnert,B.I., et al. (2018) Genetic scores to stratify risk of developing multiple islet autoantibodies and type 1 diabetes: A prospective study in children. PLoS Med., 15, e1002548. 93. Sharp,S.A., Rich,S.S., Wood,A.R., Jones,S.E., Beaumont,R.N., Harrison,J.W., Schneider,D.A., Locke,J.M., Tyrrell,J., Weedon,M.N., et al. (2019) Development and Standardization of an Improved Type 1 Diabetes Genetic Risk Score for Use in Newborn Screening and Incident Diagnosis. Diabetes Care, 42, 200–207. 94. Läll,K., Mägi,R., Morris,A., Metspalu,A. and Fischer,K.
(2017) Personalized risk prediction for type 2 diabetes: the potential of genetic risk scores. Genet. Med., 19, 322–329. 95. Escott-Price,V., Shoai,M., Pither,R., Williams,J. and Hardy,J. (2017) Polygenic score prediction captures nearly all common genetic risk for Alzheimer’s disease. Neurobiol. Aging, 49, 214.
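
Illustrative sketch for Box 1. The liability threshold calculations summarised in Box 1 can be approximated with a short script. The sketch below is not the authors' simulation code: it assumes the closed-form AUC expression under the liability threshold model (as we understand it from ref. 82), treats the variance explained by the PRS as being on the liability scale, and obtains the average risk in the top of the PRS distribution by simple numerical integration. The function names (expected_auc, mean_risk_in_top) and the integration settings are hypothetical choices made for illustration, and the approach requires the PRS to explain less than 100% of liability variance.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def expected_auc(prevalence: float, r2_liability: float) -> float:
    """Expected AUC of a PRS explaining r2_liability of liability variance.

    Assumes a normally distributed liability with unit variance and a
    disease threshold set by the population prevalence K.
    """
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)  # liability threshold T
    z = STD_NORMAL.pdf(threshold)                     # normal density at T
    i = z / prevalence                                # mean liability of cases
    v = -z / (1.0 - prevalence)                       # mean liability of non-cases
    # Variance of the PRS within cases and within non-cases.
    var_cases = r2_liability * (1.0 - r2_liability * i * (i - threshold))
    var_noncases = r2_liability * (1.0 - r2_liability * v * (v - threshold))
    # Standardised difference in mean PRS between cases and non-cases.
    separation = (i - v) * r2_liability / (var_cases + var_noncases) ** 0.5
    return STD_NORMAL.cdf(separation)


def mean_risk_in_top(prevalence: float, r2_liability: float,
                     top_fraction: float, steps: int = 2000) -> float:
    """Average absolute risk among the top `top_fraction` of PRS values.

    Integrates P(liability > T | PRS = g) over the upper tail of the PRS
    distribution using the midpoint rule (illustrative settings).
    """
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)
    prs = NormalDist(0.0, r2_liability ** 0.5)        # PRS on the liability scale
    residual_sd = (1.0 - r2_liability) ** 0.5         # sd of liability not captured
    lower = prs.inv_cdf(1.0 - top_fraction)           # PRS cut-off for the top group
    upper = 8.0 * prs.stdev                           # effectively +infinity
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        g = lower + (k + 0.5) * width
        risk = 1.0 - STD_NORMAL.cdf((threshold - g) / residual_sd)
        total += risk * prs.pdf(g) * width
    return total / top_fraction


if __name__ == "__main__":
    # Scenarios (a) and (b) from Box 1: K = 0.05, PRS explaining 10% or 50%.
    for label, r2 in (("a", 0.10), ("b", 0.50)):
        auc = expected_auc(0.05, r2)
        risk = mean_risk_in_top(0.05, r2, 0.05)
        print(f"scenario ({label}): AUC={auc:.2f}, mean risk in top 5%={risk:.2f}")
```

Under these assumptions the values for scenario (b) should land close to the AUC of 0.9 and the ~40% average risk in the top 5% quoted in Box 1, with scenario (a) giving an average risk near 15%; small differences from the published figure are to be expected because the figure was produced by simulation.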