Cardiovascular Disease Risk Prediction using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants

Ahmed M. Alaa1*, Thomas Bolton2,3, Emanuele Di Angelantonio2,3, James H. F. Rudd4, Mihaela van der Schaar1,5,6

1 University of California Los Angeles, Los Angeles, California, United States of America
2 Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
3 National Institute for Health Research (NIHR) Blood and Transplant Research Unit (BTRU) in Donor Health and Genomics, University of Cambridge, Cambridge, United Kingdom
4 Department of Cardiovascular Medicine, University of Cambridge and Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom
5 University of Oxford, Oxford, United Kingdom
6 Alan Turing Institute, London, United Kingdom

* ahmedmalaa@ucla.edu

Abstract

Background

Identifying people at risk of cardiovascular diseases (CVD) is a cornerstone of preventative cardiology. Risk prediction models currently recommended by clinical guidelines are typically based on a limited number of predictors with sub-optimal performance across all patient groups. Data-driven techniques based on machine learning (ML) might improve the performance of risk predictions by agnostically discovering novel risk predictors and learning the complex interactions between them. We tested (1) whether ML techniques based on a state-of-the-art automated ML framework (AutoPrognosis) could improve CVD risk prediction compared to traditional approaches, and (2) whether considering non-traditional variables could increase the accuracy of CVD risk predictions.

Methods and Findings

Using data on 423,604 participants without CVD at baseline in UK Biobank, we developed an ML-based model for predicting CVD risk based on 473 available variables.
Our ML-based model was derived using AutoPrognosis, an algorithmic tool that automatically selects and tunes ensembles of ML modeling pipelines (comprising data imputation, feature processing, classification and calibration algorithms). We compared our model with a well-established risk prediction algorithm based on conventional CVD risk factors (Framingham score), a Cox proportional hazards (PH) model based on familiar risk factors (i.e., age, gender, smoking status, systolic blood pressure, history of diabetes, receipt of treatment for hypertension and body mass index), and a Cox PH model based on all of the 473 available variables. Predictive performance was assessed using the area under the receiver operating characteristic curve (AUC-ROC). Overall, our AutoPrognosis model improved risk prediction (AUC-ROC: 0.774, 95% CI: 0.768-0.780) compared to the Framingham score (AUC-ROC: 0.724, 95% CI: 0.720-0.728, p < 0.001), the Cox PH model with conventional risk factors (AUC-ROC: 0.734, 95% CI: 0.729-0.739, p < 0.001), and the Cox PH model with all UK Biobank variables (AUC-ROC: 0.758, 95% CI: 0.753-0.763, p < 0.001). Out of 4,801 CVD cases recorded within 5 years of baseline, AutoPrognosis correctly predicted 368 more cases than the Framingham score. Our AutoPrognosis model included predictors that are not usually considered in existing risk prediction models, such as the individuals' usual walking pace and their self-reported overall health rating. Furthermore, our model improved risk prediction in potentially relevant sub-populations, such as individuals with a history of diabetes. We also highlight the relative benefits accrued from including more information in a predictive model (information gain) as compared to the benefits of using more complex models (modeling gain).

March 8, 2019

Conclusions

Our AutoPrognosis model improves the accuracy of CVD risk prediction in the UK Biobank population.
This approach performs well in traditionally poorly served patient subgroups. Additionally, AutoPrognosis uncovered novel predictors of CVD that may now be tested in prospective studies. We found that the "information gain" achieved by considering more risk factors in the predictive model was significantly higher than the "modeling gain" achieved by adopting complex predictive models.

Introduction

Globally, cardiovascular disease (CVD) remains the leading cause of morbidity and mortality [1]. Current clinical guidelines for primary prevention of CVD emphasize the need to identify asymptomatic patients who may benefit from preventive action (e.g., initiation of statin therapy [2]) based on their predicted risk [3-6]. Different guidelines recommend different algorithms for risk prediction. For example, the 2010 American College of Cardiology/American Heart Association (ACC/AHA) guideline [7] recommended use of the Framingham Risk Score [4], whereas the 2016 European guidelines recommended use of the Systematic Coronary Risk Evaluation (SCORE) algorithm [8]. In the UK, the current National Institute for Health and Care Excellence (NICE) guidelines recommend use of the QRISK2 score to guide the initiation of lipid lowering therapies [9, 10].

Existing risk prediction algorithms are typically developed using multivariate regression models that combine information on a limited number of well-established risk factors, and generally assume that all such factors are related to the CVD outcomes in a linear fashion, with limited or no interactions between the different factors. Because of their restrictive modeling assumptions and limited number of predictors, existing algorithms generally exhibit modest predictive performance [11], especially for certain sub-populations such as individuals with diabetes [12-15] or rheumatoid arthritis [3].
Data-driven techniques based on machine learning (ML) can improve the performance of risk predictions by exploiting large data repositories to agnostically identify novel risk predictors and more complex interactions between them. However, only a few studies have investigated the potential advantages of using ML approaches for CVD risk prediction, focusing only on a limited number of ML methods [16,17] or a limited number of risk predictors [18].

Here, we aim to assess the potential value of using ML approaches to derive risk prediction models for CVD. We analyzed data on 423,604 participants without CVD at baseline in UK Biobank, a large prospective cohort study in which participants were recruited from 22 centers throughout the UK. We used a state-of-the-art automated ML method (AutoPrognosis) to develop ML-based risk prediction models and evaluated their predictive performances in the overall population and clinically relevant sub-populations. In this paper, we do not focus on the algorithmic aspects of the ML methods involved and rather focus on their clinical application. Methodological details on our automated ML algorithm can be found in our technical publication in [19].

Materials and methods

Study design and participants

Participants were enrolled in the UK Biobank from 22 assessment centers across England, Wales, and Scotland, during the period spanning from 2006 to 2010 [20]. We extracted a cohort of participants who were 40 years of age or older and had no known history of CVD at baseline. That is, patients with previous history of coronary heart disease, other heart disease, stroke, transient ischaemic attack, peripheral arterial disease, or cardiovascular surgery were excluded from the analysis. The total number of participants who met the inclusion criteria was 423,604. The last available date of participant follow-up was Feb 17, 2016.
UK Biobank obtained approval from the North West Multi-centre Research Ethics Committee (MREC), and the Community Health Index Advisory Group (CHIAG). All participants provided written informed consent prior to enrollment in the study. The UK Biobank protocol is available online [21].

The UK Biobank dataset keeps track of a large number of variables for each participant, but most of those variables are missing for most patients. In order to include the maximum possible number of (informative) variables in our analysis, we included all variables that were missing for less than 50% of patients with CVD outcomes. This corresponded to a rate of missingness of 85% for the entire population of participants. Our rationale for assessing the missingness rate among patients with CVD is that missingness itself may be informative (i.e., the chance of a variable being missing may depend on the outcome). By excluding all variables that were missing for more than 85% of the participants, a total of 473 variables were included in our analysis. We categorized all variables in the UK Biobank into 9 categories: health and medical history, lifestyle and environment, blood assays, physical activity, family history, physical measures, psychosocial factors, dietary and nutritional information, and sociodemographics [22]. The (categorized) lists of variables involved in our analysis are provided in the supporting information (S1 to S9 Tables).

Outcome

The primary outcome was the first fatal or non-fatal CVD event. A CVD event was defined as the assignment of any of the ICD-10 diagnosis codes F01 (vascular dementia), I20-I25 (coronary/ischaemic heart diseases), I50 (heart failure events, including acute and chronic systolic heart failures), and I60-I69 (cerebrovascular diseases), or any of the ICD-9 codes 410-414 (ischemic heart disease), 430-434, and 436-438 (cerebrovascular disease).
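As an illustration of the outcome definition above, the following minimal Python sketch checks whether a diagnosis code falls within the listed ICD-10 or ICD-9 ranges. The function name `is_cvd_event` and the assumed code-string formats (for example `I21.4` or `410.1`) are hypothetical choices made for this example; they are not taken from the UK Biobank data dictionary or from the authors' code.

```python
def is_cvd_event(code: str, version: int) -> bool:
    """Return True if a diagnosis code matches the CVD outcome definition.

    Assumes ICD-10 codes look like 'I21' or 'I21.4' and ICD-9 codes look
    like '410' or '410.1' (hypothetical formats, for illustration only).
    """
    # Keep only the three-character category, e.g. 'I21.4' -> 'I21'
    category = code.strip().upper().split(".")[0][:3]
    if version == 10:
        if category in ("F01", "I50"):
            return True
        if category.startswith("I") and category[1:].isdigit():
            number = int(category[1:])
            return 20 <= number <= 25 or 60 <= number <= 69
        return False
    if version == 9:
        if not category.isdigit():
            return False
        number = int(category)
        return 410 <= number <= 414 or 430 <= number <= 434 or 436 <= number <= 438
    raise ValueError("version must be 9 or 10")
```

Matching on the three-character category keeps the check independent of how many sub-classification digits a given record carries.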
Follow-up data were obtained from the Hospital Episode Statistics (a data warehouse containing records of all patients admitted to NHS hospitals), and the equivalent datasets in Scotland and Wales [23].

Models Tested

Framingham Risk Score

At the time of conducting this study, the UK Biobank had not yet released data on the participants' total cholesterol, HDL cholesterol and LDL cholesterol, which are used as predictors in various established algorithms, such as the Framingham score [4], ACC/AHA [24], QRISK2 [9], and SCORE [5]. The Framingham score, however, provides a version of its underlying model based on non-laboratory predictors, which replaces lipids with Body Mass Index (BMI) [4]. Since BMI is currently collected for 99.38% of the UK Biobank participants, we compared our model with the BMI version of the Framingham score. We used the published prediction equations (beta-coefficients and survival functions) of the BMI-based Framingham model developed in [4]. (The Framingham risk calculator and model coefficients are publicly available at https://www.framinghamheartstudy.org.)

The Framingham score is based on 7 core risk factors: gender, age, systolic blood pressure, treatment for hypertension, smoking status, history of diabetes, and BMI. All of those variables were complete for the participants in the extracted cohort, with the exception of systolic blood pressure (missing for 6.8% of the participants), and BMI (missing for 0.62% of the participants). We used the MissForest non-parametric data imputation algorithm [25] to recover the missing values. Using the MissForest algorithm, we sampled 5 imputed datasets and averaged the model predictions for each participant on the 5 datasets (this is known in the literature as Rubin's rules [25]). The number of imputed datasets was selected via cross-validation.
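The averaging step described above can be sketched as follows. The sketch assumes that a risk model has already been applied to each of the imputed copies of the data, giving one list of predicted risks per copy; the function name and the toy numbers are illustrative only and do not come from the study.

```python
from statistics import fmean


def average_over_imputations(predictions_per_dataset):
    """Average each participant's predicted risk across imputed datasets.

    predictions_per_dataset: list of M lists, one per imputed dataset,
    each holding one predicted risk per participant (same order in all).
    """
    n_participants = len(predictions_per_dataset[0])
    if any(len(p) != n_participants for p in predictions_per_dataset):
        raise ValueError("all imputed datasets must cover the same participants")
    # zip(*...) groups the M predictions that belong to the same participant
    return [fmean(risks) for risks in zip(*predictions_per_dataset)]


# Toy example: 3 participants, 5 imputed datasets (made-up risks)
toy = [
    [0.10, 0.30, 0.05],
    [0.12, 0.28, 0.05],
    [0.08, 0.32, 0.05],
    [0.10, 0.30, 0.05],
    [0.10, 0.30, 0.05],
]
pooled = average_over_imputations(toy)
```

Participants with no missing values receive the same prediction in every imputed dataset, so averaging leaves their risk unchanged, as with the third participant in the toy data.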
Cox Proportional Hazards Model

We evaluated the performance of two Cox Proportional Hazards (PH) models derived from the analysis cohort: a model that only uses the traditional 7 risk factors used by the Framingham score, and a model that uses all of the 473 variables in the UK Biobank. To fit the Cox PH models, we imputed the missing data using the MissForest imputation algorithm (with 5 imputations). The Cox PH model that uses the traditional 7 risk factors used by Framingham score can be thought of as a variant of Framingham score calibrated to the UK population (the Framingham score was originally derived for a US population).

For the Cox PH model that uses all of the 473 predictors, we applied variable selection using the LASSO method [26]. (Variable selection was applied since fitting the Cox model with all variables resulted in an inferior performance due to the numerical collapse of the Cox model solvers in high dimensions.) To apply variable selection, we fit a LASSO regression model (a linear model penalized with the L1 norm) to predict the (binary) CVD outcomes. The fitted model gives a sparse solution whereby many of the estimated coefficients are zero. We select all the variables with non-zero coefficients in the fitted LASSO model and feed those variables into a Cox model fitted on the same batch of data. We optimize the LASSO model regularization parameter via cross-validation.

Standard ML Models

We considered 5 standard ML benchmarks that cover different classes of ML modeling approaches. The models under consideration are: linear support vector machines (SVM) [27] (a linear classifier), random forest [28] (a tree-based ensemble method), neural networks [29] (a deep learning method), AdaBoost [30] and gradient boosting machines [31] (boosting ensemble methods).
(We also attempted to fit a kernel SVM, but fitting such a model was computationally infeasible for the UK Biobank cohort because it entails a cubic complexity in the number of datapoints.) The purpose of including those models in our experimental evaluations is to ensure that AutoPrognosis has automatically selected and tuned the best possible ML model, and that no individually-tuned ML model performed better than the model selected by AutoPrognosis. (We decided to include a gradient boosting model in retrospect because it was assigned the largest weight in the ensemble formed by AutoPrognosis.) We implemented all these models using the Scikit-learn library in the Python programming language [32]. The models' hyper-parameters were determined via grid search. Data imputation for all models was conducted using the MissForest algorithm (with 5 imputed datasets). (We also tried other imputation algorithms, such as multiple imputation by chained equations, but MissForest provided better predictive performance.)

Model Development using AutoPrognosis

We developed an ML-based model for CVD risk prediction using AutoPrognosis, an algorithmic framework for automating the design of ML-based clinical prognostic models [19]. A schematic for the AutoPrognosis framework is provided in Fig 1. Given the participants' variables and CVD outcomes, AutoPrognosis uses an advanced Bayesian optimization technique [33,34] in order to (automatically) design a prognostic model made out of a weighted ensemble of ML pipelines. Each ML pipeline comprises design choices for data imputation, feature processing, classification and calibration algorithms (and their hyper-parameters). (Calibration means that the numerical outputs of a model correspond to the actual risk of a CVD event. That is, an output prediction of 20% means that the patient's 5-year risk of a CVD event is 20%.)
The design space of AutoPrognosis contains 5,460 possible ML pipelines (7 possible imputation algorithms, 9 feature processing algorithms, 20 classification algorithms, and 3 calibration methods). The list of algorithms that constitute the design space of AutoPrognosis is provided in Table 1. A detailed technical and methodological description of AutoPrognosis can be found in our previous work in [19].

Fig 1. An illustrative schematic for AutoPrognosis. In this depiction, AutoPrognosis constructs an ensemble of three ML pipelines. Pipeline 1 uses the MissForest algorithm to impute missing data, and then compresses the data into a lower-dimensional space using the principal component analysis (PCA) algorithm, before using the random forest algorithm to issue predictions. Pipelines 2 and 3 use different algorithms for imputation, feature processing, classification and calibration. AutoPrognosis uses the algorithm in [19] to make decisions on what pipelines to select and how to tune the pipelines' parameters.

To train our model, we set AutoPrognosis to conduct 200 iterations of the Bayesian optimization procedure in [19], where in each iteration the algorithm explores a new ML pipeline and tunes its hyper-parameters. Cross-validation was used in every iteration to evaluate the performance of the pipeline under evaluation. The (in-sample) model learned by AutoPrognosis combined 200 weighted ML pipelines, the strongest of which comprised the MissForest data imputation algorithm, no feature processing steps, an XGBoost ensemble classifier (with 200 estimators) [35], and sigmoid regression for calibration. Details of the model learned by AutoPrognosis are provided in the supporting information (S10 Appendix). In the Results Section, we will directly refer to our model as "AutoPrognosis".
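To make the idea of a weighted ensemble of calibrated pipelines concrete, the sketch below combines per-pipeline risk estimates by a weighted average after a sigmoid calibration step. This is an illustration under stated assumptions, not the AutoPrognosis implementation: the weighted-average combination rule, the function names, the calibration parameters `a` and `b`, and all numbers are hypothetical.

```python
import math


def sigmoid_calibrate(score: float, a: float, b: float) -> float:
    """Map a raw classifier score to a probability with a fitted sigmoid.

    a and b would normally be fitted on held-out data; here they are
    hypothetical constants chosen for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def ensemble_risk(pipeline_risks, weights):
    """Weighted average of per-pipeline risks (assumed combination rule)."""
    if len(pipeline_risks) != len(weights):
        raise ValueError("one weight per pipeline is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * r for w, r in zip(weights, pipeline_risks)) / total


# Three hypothetical pipelines scoring one participant
raw_scores = [-2.0, -1.5, -2.5]
risks = [sigmoid_calibrate(s, a=1.0, b=0.0) for s in raw_scores]
risk = ensemble_risk(risks, weights=[0.6, 0.3, 0.1])
```

Normalizing by the sum of the weights keeps the combined output between the smallest and largest pipeline estimates, so it remains a valid probability.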
Variable Ranking

In order to identify the relative importance of the 473 variables used to build our model, we use a post-hoc approach to rank the contribution of the different variables in the predictions issued by the model. The ranking is obtained by fitting a random forest model with the participants' variables as the inputs, and the predictions of our model as the outputs, and then assigning variable importance scores to the different variables using the standard permutation method in [36]. Using the permutation method, we assess the mean decrease in classification accuracy for every variable after permuting that variable over all trees. The resulting variable importance scores reflect the impact each variable has on the predictions issued by AutoPrognosis. We used the random forest algorithm for post-hoc variable ranking because it is a nonparametric algorithm that can recognize complex patterns of variable interaction while enabling principled evaluation of variable importance [36]. Other variable ranking methods based on associative classifiers (such as the one proposed in [19]) entail a computational complexity that is exponential in the number of variables, and hence are not suitable for our study as it involves more than 400 variables.

To disentangle the "modeling gain" achieved by utilizing ML-based techniques from the "information gain" achieved by just using more variables, we created a simpler version of AutoPrognosis that only uses the same 7 core risk factors (age, gender, systolic blood pressure, smoking status, treatment of hypertension, history of diabetes, and BMI) used by the existing prediction algorithms. In addition, we created another version of the AutoPrognosis model that uses only non-laboratory variables in UK Biobank.

Table 1. List of algorithms included in AutoPrognosis.
Pipeline Stage      Algorithms
Data Imputation     missForest, Median, Most-frequent, Mean, EM, Matrix completion, MICE, None
Feature process.    Feature agglomeration, Kernel PCA, Polynomial, R. kitchen sinks, Fast ICA, PCA, Select Rates, Nystroem, Linear SVM
Classification      Bernoulli NB, AdaBoost, Decision Tree, Linear SVM, Gradient Boosting, LDA, Gaussian NB, XGBoost, Extr. Random Trees, Multinomial NB, Random Forest, Neural Network, Light GBM, Logistic Regression, Gaussian Process, Survival Forest, Bagging, k-NN, Cox Regression, Ridge Classifier
Calibration         Sigmoid, Isotonic, None

MICE: multiple imputation by chained equations, EM: expectation maximization, PCA: principal component analysis, ICA: independent component analysis, SVM: support vector machines, NB: Naive Bayes, NN: nearest neighbors, LDA: linear discriminant analysis, GBM: gradient boosting machine.

Statistical analysis

In order to avoid over-fitting, we evaluated the prediction accuracy of all models under consideration via 10-fold stratified cross-validation using area under the receiver operating characteristic curve (AUC-ROC). In every cross-validation fold, a training sample (381,244 participants) was used to derive the Cox PH models, standard ML models, and our model (AutoPrognosis), and then a held-out sample (42,360 participants) was used for performance evaluation. We report the mean AUC-ROC and the 95% confidence intervals (Wilson score intervals) for all models. The calibration performance of our model was evaluated via the Brier score.

Results

Characteristics of the Study Population

A total of 423,604 participants had sufficient information for inclusion in this analysis. Overall, the mean (SD) age of participants at baseline was 56.4 (8.1) years, and 188,577 participants (44.5%) were male. Over a median follow-up of 7 years (5th-95th percentile: 5.7-8.4 years; 3 million person-years at risk), there were 6,703 CVD cases.
The mean age of CVD cases was 60.5 years (60.2 years for men and 61.1 years for women). Because the minimum follow-up period for all participants was 5 years, we evaluated the accuracy of the different models in predicting the 5-year risk of CVD. At a 5-year horizon, the total number of CVD cases was 4,801.

Prediction Accuracy

Comparison of Prediction Models

The prediction accuracy of the different models under consideration evaluated at a 5-year horizon is shown in Table 2. We used the Framingham score as a baseline model for performance evaluation (AUC-ROC: 0.724, 95% CI: 0.720-0.728). Both the Cox PH model with the 7 conventional risk factors (AUC-ROC: 0.734, 95% CI: 0.729-0.739), and the Cox PH model with all variables (AUC-ROC: 0.758, 95% CI: 0.753-0.763) achieved an improvement in the AUC-ROC compared to the baseline model (p < 0.001). The improvement achieved by the Cox PH model that uses the same predictors used by the Framingham score is due in part to the fact that the Cox PH model is directly derived from the analysis cohort, whereas the Framingham score coefficients were derived from a different population.

Table 2. Performance of all prediction models under consideration.

Model                                     AUC-ROC          Absolute AUC-ROC Change
Framingham score                          0.724 ± 0.004    Baseline model
Cox PH Model (7 core variables)           0.734 ± 0.005    + 1.0%
Cox PH Model (all variables)              0.758 ± 0.005    + 3.4%
Support Vector Machines                   0.709 ± 0.061    - 1.5%
Random Forest                             0.730 ± 0.004    + 0.6%
Neural Networks                           0.755 ± 0.005    + 3.1%
AdaBoost                                  0.759 ± 0.004    + 3.5%
Gradient Boosting                         0.769 ± 0.005    + 4.5%
AutoPrognosis (7 core variables)          0.744 ± 0.005    + 2.0%
AutoPrognosis (369 non-lab. variables)    0.761 ± 0.005    + 3.7%
AutoPrognosis (104 lab. variables)        0.735 ± 0.008    + 1.1%
AutoPrognosis (all variables)             0.774 ± 0.005    + 5.0%

The Framingham score is provided as the reference model for comparative purposes.
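For readers unfamiliar with the metric, the AUC-ROC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The sketch below implements that definition directly and recomputes the "absolute change" column of Table 2 for two rows. It is a minimal, quadratic-time illustration with hypothetical function names, not the evaluation code used in the study.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (case, non-case) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


def absolute_change(auc_model, auc_baseline):
    """Absolute AUC-ROC difference in percentage points, to one decimal."""
    return round((auc_model - auc_baseline) * 100, 1)


# AUC-ROC values taken from Table 2
framingham = 0.724
change_autoprognosis = absolute_change(0.774, framingham)  # all variables
change_svm = absolute_change(0.709, framingham)            # support vector machines
```

The pairwise form is convenient for small examples; on a cohort of this size a rank-based implementation would be the practical choice.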
With the exception of support vector machines, all the standard ML models achieved statistically significant improvements compared to the baseline Framingham score. Furthermore, when compared to the Cox PH model that uses all variables, neural networks, AdaBoost, gradient boosting, and AutoPrognosis all achieved a significantly higher AUC-ROC. AutoPrognosis achieved a higher AUC-ROC compared to all other standard ML models (AUC-ROC: 0.774, 95% CI: 0.768-0.780, p < 0.001), which suggests that the automated ML system managed to automatically select and tune the "right" ML model. (The AutoPrognosis model trained on all variables was also well-calibrated, with an in-sample Brier score of 0.0121.) Compared to the most competitive benchmark (the Cox PH model that uses all of the variables), the net re-classification index (NRI) was +12.5% in favor of AutoPrognosis. AutoPrognosis trained only with the 7 conventional risk factors still outperformed the baseline Framingham score (p < 0.001).

Most of the variables in the UK Biobank are non-laboratory variables collected through an automated touchscreen questionnaire about lifestyle, clinical history and nutritional habits. We evaluated the accuracy of AutoPrognosis in two settings: once when trained with the 369 variables corresponding to the participants' self-reported information (questionnaires) only, and once when trained with the 104 variables obtained from blood assays, diagnostic tests, and physiological measurements. As we can see in Table 2, AutoPrognosis with only questionnaire-related variables still achieves a significant improvement over the baseline Framingham score (AUC-ROC: 0.752, 95% CI: 0.747-0.757, p < 0.001), and is superior to the model that only uses laboratory-based variables.
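The Brier score quoted earlier in this subsection is the mean squared difference between predicted risks and observed binary outcomes; lower values are better and 0 corresponds to perfect predictions. The sketch below is a minimal illustration with made-up numbers and a hypothetical function name; it is not the evaluation code used in the study.

```python
def brier_score(outcomes, predicted_risks):
    """Mean squared error between binary outcomes and predicted risks."""
    if len(outcomes) != len(predicted_risks):
        raise ValueError("outcomes and predictions must have equal length")
    if not outcomes:
        raise ValueError("at least one observation is required")
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, predicted_risks)]
    return sum(squared_errors) / len(squared_errors)


# Made-up example: four participants, one of whom had an event
outcomes = [0, 0, 1, 0]
predicted = [0.1, 0.2, 0.7, 0.0]
score = brier_score(outcomes, predicted)
```

Because the score mixes calibration with discrimination, it is usually read alongside a ranking measure such as the AUC-ROC rather than on its own.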
We also evaluated the survival prediction accuracy of all models under consideration using the (right censored) time-to-event outcomes rather than the binarized outcomes at the 5-year horizon. In this case, we used Harrell's C-index for performance evaluation: results are reported in Table 3. As we can see, the performance trends with respect to the C-index resemble those in Table 2.

Table 3. Performance of the prediction models under consideration.

Model                                     C-index          Absolute C-index Change
Framingham score                          0.746 ± 0.004    Baseline model
Cox PH Model (7 core variables)           0.758 ± 0.004    + 1.2%
Cox PH Model (all variables)              0.777 ± 0.005    + 3.1%
AutoPrognosis (7 core variables)          0.765 ± 0.005    + 1.9%
AutoPrognosis (369 non-lab. variables)    0.781 ± 0.005    + 3.5%
AutoPrognosis (104 lab. variables)        0.756 ± 0.007    + 1.0%
AutoPrognosis (all variables)             0.791 ± 0.004    + 4.5%

Classification Analysis

In order to better assess the clinical significance of our results, we compared the AutoPrognosis model with the traditional Framingham score in predicting 7.5% CVD risk (threshold for initiating lipid-lowering therapies recommended by the NICE guidelines [10]). At this operating point, the Framingham baseline model predicted 2,989 CVD cases correctly from 4,801 total cases, resulting in a sensitivity of 62.2% and PPV of 1.5%. Our AutoPrognosis model correctly predicted 3,357 out of the 4,801 CVD cases, resulting in a sensitivity of 69.9% and PPV of 2.6%. This corresponds to a net increase of 368 in the number of CVD patients who would benefit from receiving a preventive treatment in a timely manner when utilizing the predictions of our model.

Variable Importance

Table 4 lists the 20 most important variables ranked according to their contribution to the predictions of the AutoPrognosis model (along with their importance scores).
Variables related to physical activity (usual walking pace) and information on blood measurements appeared to be more important for the predictions of AutoPrognosis than traditional risk factors included in most existing scoring systems. For women, a remarkable predictor of CVD risk was the measured "ankle spacing width". This may be linked to symptoms of poor circulation, such as swollen legs, which are predictive of future CVD events [37]. We also found that usage of hormone-replacement therapy (HRT) was on the list of top predictors of CVD risk for women. For men, blood measurements such as haematocrit percentage and haemoglobin concentration, and variables such as urinary sodium concentration were among the most important risk factors.

Table 4. Variable ranking by their contribution to the predictions of AutoPrognosis.

Variable (Men)                       Score    Variable (Women)                     Score
Age                                  0.346    Age                                  0.370
Smoking                              0.101    Smoking                              0.099
Usual walking pace                   0.052    Usual walking pace                   0.057
Systolic blood pressure              0.040    Ankle spacing width                  0.035
Microalbumin in urine                0.032    Self-reported health rating          0.030
High blood pressure                  0.030    Systolic blood pressure              0.026
Red blood cell distribution width    0.025    High blood pressure                  0.024
Self-reported health rating          0.019    Red blood cell distribution width    0.023
Haematocrit percentage               0.014    Microalbumin in urine                0.017
Father age at death                  0.014    Father age at death                  0.017
BMI                                  0.013    White blood cell count               0.011
Diastolic blood pressure             0.012    Number of Treatments                 0.011
White blood cell count               0.012    Mean reticulocyte volume             0.008
Impedance of arm (left)              0.009    Leg predicted mass (right)           0.006
Haemoglobin concentration            0.007    Neutrophil count                     0.006
Neutrophil count                     0.005    Basal metabolic rate                 0.005
Number of Treatments                 0.004    Hormone-replac. therapy usage        0.005
Mean reticulocyte volume             0.004    Blood clot in the leg                0.004
Urinary sodium concentration         0.004    Forced expiratory volume             0.004
Monocyte count                       0.004    Duration of fitness test             0.004

Risk factors utilized by existing risk prediction algorithms. Explanations for the different variables in this table are provided in S11 Appendix.

Prediction Accuracy in Individuals with History of Diabetes

Among the 423,604 participants included in our cohort, a total of 17,908 participants (4.22%) had a known history of diabetes (either Type 1 or Type 2) at baseline. In Table 5, we show the AUC-ROC performance of AutoPrognosis and the baseline Framingham score when validated separately on the diabetic and non-diabetic populations. As we can see, the baseline Framingham score was less accurate in the diabetic population (AUC-ROC: 0.578, 95% CI: 0.560-0.596) compared to its achieved accuracy for the overall population (AUC-ROC: 0.724, 95% CI: 0.720-0.728, p < 0.001). In contrast, AutoPrognosis maintained high predictive accuracy for the diabetic population (AUC-ROC: 0.713, 95% CI: 0.703-0.723).

The variable ranking for the diabetic sub-population is provided in Table 6. We note that the list of important variables in the diabetic subgroup is substantially different from that of the overall population. One major difference is that for diabetic patients, microalbuminuria appeared to be strongly linked to an elevated CVD risk. In the overall population (423,604 participants), the average measure of microalbumin in urine was 27.8 mg/L for participants with no CVD events, and 52.2 mg/L for participants with CVD events. In the diabetic population (17,908 participants), participants with no CVD events had an average microalbumin in urine of 61.0 mg/L, whereas for those with a CVD event, the average microalbumin in urine was 128.76 mg/L.
(Information on microalbumin in urine was available for 30% of the patients in the overall population, and 50% of patients in the diabetic population.)

Table 5. Performance of AutoPrognosis in the diabetic patient subgroup.

Model               AUC-ROC (No diabetes)    AUC-ROC (Diabetes)
Framingham score    0.724 ± 0.004            0.578 ± 0.018
AutoPrognosis       0.774 ± 0.005            0.713 ± 0.010

Performance of AutoPrognosis and the Framingham score validated separately on a testing cohort of diabetic patients (1,790 participants), and a testing cohort of non-diabetic patients (40,570 participants) via 10-fold cross-validation. AutoPrognosis was trained using the entire training cohort that combines both diabetic and non-diabetic individuals (381,244 participants).

Table 6. Variable ranking for the diabetic population.

Variable                             Score
Age                                  0.207
Microalbumin in urine                0.110
Usual walking pace                   0.078
Smoking status                       0.064
Systolic blood pressure              0.034
Red blood cell distribution width    0.027
Neutrophil count                     0.018
Number of Treatments                 0.018
High blood pressure                  0.014
Urinary sodium concentration         0.014

Predictive Ability of Individual Variables in UK Biobank

In order to evaluate the individual predictive ability of the UK Biobank variables, we exhaustively fitted simple versions of our AutoPrognosis model for each of the 473 variables. For each such model, we use one distinct variable as an input and evaluate the resulting AUC-ROC. Because most variables are correlated with age and gender, we use the age variable as a second predictor for all models, and fit separate models for men and women. The AUC-ROC values of the resulting models are depicted in the scatter-plot in Fig 2.

Fig 2. Predictive ability of the UK Biobank variables for men and women. Each point represents a variable in the UK Biobank ordered by the ability to predict CVD events for men and women. Predictions based solely on age achieved an AUC-ROC of 0.632 ± 0.003 for men and 0.665 ± 0.002 for women.
We report the AUC-ROC from models trained with individual variables in addition to age, and only display variables that achieved a statistically significant improvement in AUC-ROC compared to predictions based on age only. Each color represents a different variable category. Variables deviating from the (dotted gray) regression line have an AUC-ROC that differs between men and women more than expected in view of the overall association between the two genders, suggesting a stronger relative importance in one gender group.

As shown in Fig 2, variables related to smoking habits or exposure to tobacco smoke displayed the highest predictive ability. Self-reported health rating was predictive for both genders, but more predictive for women. Existence of long-standing illness was strongly predictive of CVD events for women, and less predictive for men. Variables extracted from the electrocardiogram (ECG) records possessed stronger predictive ability for men.

Discussion

In this large prospective cohort study, we developed a ML model based on the AutoPrognosis framework for predicting CVD events in asymptomatic individuals. The model was built using data for more than 400,000 UK Biobank participants, with over 450 variables for each participant. Our study conveys several key messages. First, AutoPrognosis significantly improved the accuracy of CVD risk prediction compared to well-established scoring systems based on conventional risk factors and currently recommended by primary prevention guidelines (Framingham score). Second, AutoPrognosis was able to agnostically discover new predictors of CVD risk. Among the discovered predictors were non-laboratory variables that can be collected relatively easily via questionnaires, such as the individuals' self-reported health ratings and usual walking pace.
Third, AutoPrognosis uncovered complex interaction effects between different characteristics of an individual, which led to recognition of risk predictors that are specific to certain sub-populations for whom existing guidelines were providing unreliable predictions.

When can ML help in prognostic modeling?

The availability of a large number of informative variables in the UK Biobank (473 variables) guarantees an "information gain" that can be achieved by any data-driven model, including the standard Cox PH model, compared to the existing prediction algorithms that use only a limited number of conventional risk factors (e.g., Framingham score). The results in Table 2 show that, in addition to the information gain, AutoPrognosis also attained a "modeling gain" that allowed it to outperform the standard Cox PH model that uses all of the 473 variables. In general, the modeling gain achieved by AutoPrognosis would result from its ability to select among different models with various levels of complexity and numerical robustness in a completely data-driven fashion, without committing to any presupposition about the superiority of any given model. In our experiments, the Cox PH model supplied with all of the 473 variables (without variable selection) performed noticeably poorly (i.e., an average AUC-ROC of 0.6). This is because the numerical solvers of the Cox PH model collapse when the data dimensionality is very large, which is why a variable selection pre-processing step was essential for fitting the Cox PH model. This implies that, even if the true underlying data model is perfectly linear, fitting standard linear models such as Cox PH or linear regression may not be sufficient for harnessing the information gain, since such models are not numerically robust in high-dimensional settings.
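As a rough illustration of the two gains, the short Python sketch below splits the overall AUC-ROC improvement using the point estimates quoted in the abstract. Treating the information gain as the difference between the two Cox PH models, and the modeling gain as the difference between AutoPrognosis and the Cox PH model with all variables, is an assumption of this sketch and not a formula stated in the paper; confidence intervals are ignored.

```python
# AUC-ROC point estimates quoted in the abstract of this paper.
auc = {
    "framingham": 0.724,
    "cox_conventional": 0.734,   # Cox PH, conventional risk factors
    "cox_all_variables": 0.758,  # Cox PH, all UK Biobank variables
    "autoprognosis": 0.774,
}

# Illustrative split (an assumption of this sketch, not the paper's formula):
# more variables with the same model class vs. a richer model class
# with the same variables.
information_gain = auc["cox_all_variables"] - auc["cox_conventional"]
modeling_gain = auc["autoprognosis"] - auc["cox_all_variables"]
total_gain = auc["autoprognosis"] - auc["cox_conventional"]

print(f"information gain: {information_gain:.3f}")
print(f"modeling gain:    {modeling_gain:.3f}")
print(f"total gain:       {total_gain:.3f}")
```

Under these assumptions the split is roughly 0.024 (information) and 0.016 (modeling) out of a total of 0.040.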
AutoPrognosis addresses this lack of numerical robustness by selecting more robust models that better fit the high-dimensional data; in our experiments, these were tree-based models such as XGBoost and random forests. This observation shows that information gain and modeling gain are inherently entangled: to harness the information gain, we need to consider a more complex modeling space.

While the information gain appeared to be larger than the modeling gain in our experiments, we note that even when provided with the same 7 core risk factors used by the Framingham score, AutoPrognosis was still able to offer a statistically significant AUC-ROC gain compared to the Framingham score and a Cox PH model that uses the same 7 variables. This shows that the modeling gain is not necessarily limited to settings where many predictors are available and numerical robustness is at stake, but is rather achievable whenever a small number of predictors display complex interactions.

Because not every ML model would necessarily improve over the Framingham score or the simple Cox PH model, our usage of the AutoPrognosis algorithm was essential for realizing the full benefits of ML modeling. As the results in Table 2 demonstrate, some ML models did not improve over the baseline Framingham score, whereas others provided modest improvements. This is because selection of the right ML model and careful tuning of the model's hyper-parameters are two crucial steps for realizing the potential benefits of ML. AutoPrognosis automates those steps, which makes ML application easily accessible for mainstream clinical research. The importance of model selection and hyper-parameter optimization has been overlooked in previous clinical studies that applied ML in prognostic modeling [16-18].
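The idea of choosing a model by its held-out AUC-ROC can be sketched in a few lines of standard-library Python. This is a deliberately simplified stand-in and not the AutoPrognosis procedure, which searches over full modeling pipelines [19]. The data are synthetic, and the helper names (`auc_roc`, `select_best`) and the three fixed scoring rules are hypothetical, introduced only for illustration.

```python
import random


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random case outscores a random
    non-case (ties count one half); an O(n_pos * n_neg) pairwise version."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one case and one non-case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def select_best(candidates, features, labels):
    """Score every candidate on validation data; return the best name and all AUCs."""
    aucs = {name: auc_roc(labels, [score(x) for x in features])
            for name, score in candidates.items()}
    best_name = max(aucs, key=aucs.get)
    return best_name, aucs


# Synthetic validation set (NOT UK Biobank data): x = (age, slow_pace).
rng = random.Random(0)
X_val, y_val = [], []
for _ in range(500):
    age = rng.uniform(40, 70)
    slow_pace = 1 if rng.random() < 0.3 else 0
    risk = 0.02 + 0.004 * (age - 40) + 0.10 * slow_pace
    X_val.append((age, slow_pace))
    y_val.append(1 if rng.random() < risk else 0)

# Hypothetical candidate risk scores of increasing richness.
candidates = {
    "age_only": lambda x: x[0],
    "pace_only": lambda x: float(x[1]),
    "age_plus_pace": lambda x: 0.004 * (x[0] - 40) + 0.10 * x[1],
}

best_name, aucs = select_best(candidates, X_val, y_val)
for name, value in aucs.items():
    print(f"{name}: AUC-ROC = {value:.3f}")
print("selected:", best_name)
```

In practice the candidates would be fitted models with tuned hyper-parameters, and the comparison would use cross-validation, as in the 10-fold protocol reported for Table 5, instead of a single validation set.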
Our study is unique in that, to the best of our knowledge, it is the first to carry out a comprehensive investigation of the performance of ML models in a large cohort with such an extensive number of predictors.

Risk prediction with non-laboratory variables

Individuals in developed countries tend to seek out health information through online resources and web-based risk calculators [38]. In developing countries, where 80% of all world-wide CVD deaths occur [39], there are limited resources for risk assessment strategies that require laboratory testing [39,40]. The results in Table 2 show that AutoPrognosis could potentially provide reliable risk predictions by using information from non-laboratory variables about the participants' lifestyle and medical history. The most predictive non-laboratory variables included in our model were age, gender, smoking status, usual walking pace, self-reported overall health rating, previous diagnoses of high blood pressure, income, Townsend index and parents' ages at death. Inclusion of such variables in web-based risk calculators can help provide reasonably accurate risk predictions when obtaining laboratory variables is not viable.

One remarkable finding in Table 2 (and Fig 2) is that, apart from the well-established age and gender risk factors, two other non-laboratory variables were found to be very predictive of the CVD outcomes: the "self-reported health rating" and the "usual walking pace". (Both variables were also found to be predictive of the overall mortality risk in a recent study on the UK Biobank [22].) Neither of the two variables is included in any of the existing risk prediction tools. Walking pace was equally predictive for men and women, but the self-reported health rating was more predictive for women than for men. This may be explained by either gender-specific reporting bias or true clinical differences.
Therefore, prediction tools that include subjective non-laboratory variables, such as the self-reported health rating, should be carefully designed in such a way that self-reporting bias is reduced.

Risk predictors specific to diabetic patients

Unlike the Framingham score, AutoPrognosis was able to maintain high predictive accuracy for participants diagnosed with diabetes at baseline (Table 5). This suggests that the AutoPrognosis model has learned diabetes-specific risk factors that were not previously captured by the existing prediction algorithms. By investigating the risk factor ranking within the diabetic subgroup (Table 6), we found that urinary microalbumin (measured in mg/L) is a very strong marker for increased CVD risk among individuals with diabetes. The omission of urinary microalbumin from existing risk scoring systems may explain their poor prognostic performance when validated in cohorts of diabetic patients [12,13]. Our results indicate that predictions based on AutoPrognosis can provide better guidance for CVD preventive care in diabetic patients.

It is worth mentioning that the microalbumin in urine measures were available for only 125,406 participants in the overall cohort (29.6%). In a standard prognostic study, such a variable may get omitted from the analysis because of its high missingness rate. AutoPrognosis automatically recognized that this variable is relevant for diabetic patients, and hence did not omit it in its feature processing stage.

Limitations

The main limitation of our study is the absence of the cholesterol biomarkers (total cholesterol, HDL cholesterol and LDL cholesterol) from the latest release of the UK Biobank data repository, which hindered direct comparisons with the QRISK2 scores currently recommended by the NICE guidelines.
Furthermore, other blood-based biomarkers have been reported to be associated with CVD risk, but were also not yet released in the UK Biobank data repository, such as triglycerides [41], measures of glycemia [42], markers of inflammation [43], and natriuretic peptides [44]. Inclusion of such predictors could improve the predictive accuracy of all models tested in this study, and could also alter the risk predictors' ranking in Table 2, but is unlikely to change our conclusions on the usefulness of ML modeling in CVD risk prediction.

Another limitation of our study is that the UK Biobank cohort is ethnically homogeneous: 94% of the participants were of white ethnicity. Hence, assessment of the importance of ethnicity as a predictor of CVD events and the recognition of ethnicity-specific risk predictors was not possible in our study.

Supporting information

S1 Table. List of blood test measurements collected for the UK Biobank participants.
S2 Table. List of variables on the participants' family history.
S3 Table. List of variables on the participants' health and medical history.
S4 Table. List of variables on the participants' dietary and nutritional information.
S5 Table. List of variables on the participants' physical measures.
S6 Table. List of variables on the participants' psychosocial status.
S7 Table. List of variables on the participants' physical activity.
S8 Table. List of variables on the participants' life style and environment.
S9 Table. List of variables on the participants' sociodemographics.
S10 Appendix. Machine learning pipelines used by the AutoPrognosis model.
S11 Appendix. Explanation for the variables in Table 4.

Acknowledgments

This research has been conducted using the UK Biobank resource under application number 26865. AMA and MvdS are supported by the Office of Naval Research (ONR) and the National Science Foundation (NSF).
JHFR is part-supported by the NIHR Cambridge Biomedical Research Centre, the British Heart Foundation, HEFCE, the EPSRC and the Wellcome Trust.

The UK Biobank data is accessible through a request process (http://www.ukbiobank.ac.uk/register-apply/). The authors had no special access or privileges accessing the data that other researchers would not have. A Python implementation of AutoPrognosis is publicly available at https://github.com/ahmedmalaa/AutoPrognosis.

References

1. Thomas MR, Lip GY. Novel risk markers and risk assessments for cardiovascular disease. Circulation research. 2017;120(1):133-149.
2. Ridker PM, Danielson E, Fonseca F, Genest J, Gotto Jr AM, Kastelein J, et al. Rosuvastatin to prevent vascular events in men and women with elevated C-reactive protein. New England Journal of Medicine. 2008;359(21):2195.
3. Kremers HM, Crowson CS, Therneau TM, Roger VL, Gabriel SE. High ten-year risk of cardiovascular disease in newly diagnosed rheumatoid arthritis patients: A population-based cohort study. Arthritis & Rheumatology. 2008;58(8):2268-2274.
4. D'Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008;117(6):743-753.
5. Conroy R, Pyörälä K, Fitzgerald Ae, Sans S, Menotti A, De Backer G, et al. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. European heart journal. 2003;24(11):987-1003.
6. Sjöström L, Lindroos AK, Peltonen M, Torgerson J, Bouchard C, Carlsson B, et al. Lifestyle, diabetes, and cardiovascular risk factors 10 years after bariatric surgery. New England Journal of Medicine. 2004;351(26):2683-2693.
7. Greenland P, Alpert JS, Beller GA, Benjamin EJ, Budoff MJ, Fayad ZA, et al.
2010 ACCF/AHA guideline for assessment of cardiovascular risk in asymptomatic adults: a report of the American College of Cardiology Foundation/American Heart Association task force on practice guidelines developed in collaboration with the American Society of Echocardiography, American Society of Nuclear Cardiology, Society of Atherosclerosis Imaging and Prevention, Society for Cardiovascular Angiography and Interventions, Society of Cardiovascular Computed Tomography, and Society for Cardiovascular Magnetic Resonance. Journal of the American College of Cardiology. 2010;56(25):e50-e103.
8. Piepoli MF, Hoes AW, Agewall S, Albus C, Brotons C, Catapano AL, et al. 2016 European Guidelines on cardiovascular disease prevention in clinical practice: The Sixth Joint Task Force of the European Society of Cardiology and Other Societies on Cardiovascular Disease Prevention in Clinical Practice (constituted by representatives of 10 societies and by invited experts) Developed with the special contribution of the European Association for Cardiovascular Prevention & Rehabilitation (EACPR). Atherosclerosis. 2016;252:207-274.
9. Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, Minhas R, Sheikh A, et al. Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ. 2008;336(7659):1475-1482.
10. Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ. 2017;357:j2099.
11. Siontis GC, Tzoulaki I, Siontis KC, Ioannidis JP. Comparisons of established risk prediction models for cardiovascular disease: systematic review. BMJ. 2012;344:e3318.
12. Coleman RL, Stevens RJ, Retnakaran R, Holman RR.
Framingham, SCORE, and DECODE risk equations do not provide reliable cardiovascular risk estimates in type 2 diabetes. Diabetes care. 2007;30(5):1292-1293.
13. McEwan P, Williams J, Griffiths J, Bagust A, Peters J, Hopkinson P, et al. Evaluating the performance of the Framingham risk equations in a population with diabetes. Diabetic medicine. 2004;21(4):318-323.
14. Martín-Timón I, Sevillano-Collantes C, Segura-Galindo A, del Cañizo-Gómez FJ. Type 2 diabetes and cardiovascular disease: have all risk factors the same strength? World journal of diabetes. 2014;5(4):444.
15. Buse JB, Ginsberg HN, Bakris GL, Clark NG, Costa F, Eckel R, et al. Primary prevention of cardiovascular diseases in people with diabetes mellitus: a scientific statement from the American Heart Association and the American Diabetes Association. Circulation. 2007;115(1):114-126.
16. Ambale-Venkatesh B, Wu CO, Liu K, Hundley W, McClelland RL, Gomes AS, et al. Cardiovascular event prediction by machine learning: the Multi-Ethnic Study of Atherosclerosis. Circulation research. 2017; p. CIRCRESAHA-117.
17. Ahmad T, Lund LH, Rao P, Ghosh R, Warier P, Vaccaro B, et al. Machine Learning Methods Improve Prognostication, Identify Clinically Distinct Phenotypes, and Detect Heterogeneity in Response to Therapy in a Large Cohort of Heart Failure Patients. Journal of the American Heart Association. 2018;7(8):e008081.
18. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PloS one. 2017;12(4):e0174944.
19. Alaa AM, van der Schaar M. AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning. International Conference on Machine Learning. 2018.
20. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al.
UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS medicine. 2015;12(3):e1001779.
21. Palmer LJ. UK Biobank: bank on it. The Lancet. 2007;369(9578):1980-1982.
22. Ganna A, Ingelsson E. 5 year mortality predictors in 498 103 UK Biobank participants: a prospective population-based study. The Lancet. 2015;386(9993):533-540.
23. Adamska L, Allen N, Flaig R, Sudlow C, Lay M, Landray M. Challenges of linking to routine healthcare records in UK Biobank. Trials. 2015;16(2):O68.
24. Goff DC, Lloyd-Jones DM, Bennett G, Coady S, D'Agostino RB, Gibbons R, et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Journal of the American College of Cardiology. 2014;63(25 Part B):2935-2959.
25. Stekhoven DJ, Bühlmann P. MissForest - non-parametric missing value imputation for mixed-type data. Bioinformatics. 2011;28(1):112-118.
26. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological). 1996; p. 267-288.
27. Hearst MA, Dumais ST, Osuna E, Platt J, Schölkopf B. Support vector machines. IEEE Intelligent Systems and their applications. 1998;13(4):18-28.
28. Breiman L. Random forests. Machine learning. 2001;45(1):5-32.
29. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436.
30. Rätsch G, Onoda T, Müller KR. Soft margins for AdaBoost. Machine learning. 2001;42(3):287-320.
31. Friedman JH. Greedy function approximation: a gradient boosting machine. Annals of statistics. 2001; p. 1189-1232.
32. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. Journal of machine learning research. 2011;12(Oct):2825-2830.
33. Ghahramani Z. Probabilistic machine learning and artificial intelligence. Nature. 2015;521(7553):452.
34. Snoek J, Larochelle H, Adams RP. Practical bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems; 2012. p. 2951-2959.
35. Chen T, Guestrin C. Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM; 2016. p. 785-794.
36. Strobl C, Boulesteix AL, Kneib T, Augustin T, Zeileis A. Conditional variable importance for random forests. BMC bioinformatics. 2008;9(1):307.
37. Allison MA, Hiatt WR, Hirsch AT, Coll JR, Criqui MH. A high ankle-brachial index is associated with increased cardiovascular disease morbidity and lower quality of life. Journal of the American College of Cardiology. 2008;51(13):1292-1298.
38. Hesse BW, Nelson DE, Kreps GL, Croyle RT, Arora NK, Rimer BK, et al. Trust and sources of health information: the impact of the Internet and its implications for health care providers: findings from the first Health Information National Trends Survey. Archives of internal medicine. 2005;165(22):2618-2624.
39. Gaziano TA, Young CR, Fitzmaurice G, Atwood S, Gaziano JM. Laboratory-based versus non-laboratory-based method for assessment of cardiovascular disease risk: the NHANES I Follow-up Study cohort. The Lancet. 2008;371(9616):923-931.
40. Mendis S, Lindholm LH, Mancia G, Whitworth J, Alderman M, Lim S, et al. World Health Organization (WHO) and International Society of Hypertension (ISH) risk prediction charts: assessment of cardiovascular risk for prevention and control of cardiovascular disease in low and middle-income countries. Journal of hypertension. 2007;25(8):1578-1582.
41. Assmann G, Cullen P, Schulte H.
Simple scoring scheme for calculating the risk of acute coronary events based on the 10-year follow-up of the prospective cardiovascular Münster (PROCAM) study. Circulation. 2002;105(3):310-315.
42. Eeg-Olofsson K, Cederholm J, Nilsson P, Zethelius B, Svensson AM, Gudbjörnsdottir S, et al. New aspects of HbA1c as a risk factor for cardiovascular diseases in type 2 diabetes: an observational study from the Swedish National Diabetes Register (NDR). Journal of internal medicine. 2010;268(5):471-482.
43. Collaboration ERF. C-reactive protein, fibrinogen, and cardiovascular disease prediction. New England Journal of Medicine. 2012;367(14):1310-1320.
44. Willeit P, Kaptoge S, Welsh P, Butterworth AS, Chowdhury R, Spackman SA, et al. Natriuretic peptides and integrated risk assessment for cardiovascular disease: an individual-participant-data meta-analysis. The Lancet Diabetes & Endocrinology. 2016;4(10):840-849.