Article https://doi.org/10.1038/s41467-023-39253-3

Genome-wide association analysis and Mendelian randomization proteomics identify drug targets for heart failure

Danielle Rasooly1,2,30, Gina M. Peloso2,3,30, Alexandre C. Pereira4,5, Hesam Dashti1,6, Claudia Giambartolomei7,8, Eleanor Wheeler9, Nay Aung10,11, Brian R. Ferolito2, Maik Pietzner9,12,13, Eric H. Farber-Eger14, Quinn Stanton Wells15, Nicole M. Kosik2, Liam Gaziano2,16, Daniel C. Posner2, A. Patrícia Bento17, Qin Hui18,19, Chang Liu18, Krishna Aragam2,6,20, Zeyuan Wang18, Brian Charest2, Jennifer E. Huffman2, Peter W. F. Wilson19,21, Lawrence S. Phillips19,22, John Whittaker23, Patricia B. Munroe24,25, Steffen E. Petersen11,26, Kelly Cho1,2, Andrew R. Leach17, María Paula Magariños17, John Michael Gaziano1,2, VA Million Veteran Program*, Claudia Langenberg9,12,13,31, Yan V. Sun18,19,27,31, Jacob Joseph28,29,31 & Juan P. Casas1,2,31

We conduct a large-scale meta-analysis of heart failure genome-wide association studies (GWAS) consisting of over 90,000 heart failure cases and more than 1 million control individuals of European ancestry to uncover novel genetic determinants for heart failure. Using the GWAS results and blood protein quantitative trait loci, we perform Mendelian randomization and colocalization analyses on human proteins to provide putative causal evidence for the role of druggable proteins in the genesis of heart failure. We identify 39 genome-wide significant heart failure risk variants, of which 18 are previously unreported. Using a combination of Mendelian randomization proteomics and genetic cis-only colocalization analyses, we identify 10 additional putatively causal genes for heart failure. Findings from GWAS and Mendelian randomization-proteomics identify seven proteins (CAMK2D, PRKD1, PRKD3, MAPK3, TNFSF12, APOC3, and NAE1) as potential targets for interventions to be used in primary prevention of heart failure.
Heart failure (HF) is one of the most important threats to the sustainability of health systems in the United States1. Despite major improvements in the understanding of risk factors for incident HF2, this knowledge has not yet been fully translated into effective interventions for the primary prevention of HF, except for blood pressure (BP) lowering medications3 and statins4. Due to the inherent attributes of human genetics that minimize the risk of residual confounding and reverse causation5, large-scale genomic analyses provide an opportunity to uncover putative causal mechanisms for complex phenotypes such as HF6. Recent genome-wide association studies (GWAS) of HF by the Heart Failure Molecular Epidemiology for Therapeutic Targets (HERMES) and the Million Veteran Program (MVP)7 have identified 26 genomic loci associated with HF8. This emerging knowledge has served to identify novel biological mechanisms associated with incident HF and may inform the development of novel interventions for the primary prevention of HF.

Received: 8 December 2022. Accepted: 5 June 2023.

A full list of affiliations appears at the end of the paper. *A list of authors and their affiliations appears at the end of the paper.
e-mail: drasooly@bwh.harvard.edu; jacob.joseph@va.gov

Nature Communications | (2023) 14:3826

Novel technological developments can simultaneously measure thousands of human proteins in a single blood sample. The SOMAscan V4 assay includes 5207 aptamers capable of measuring 4988 unique human proteins, of which 514 are the target of drugs licensed or in the clinical phase, 1153 are the target of compounds in the preclinical phase, and 1377 are proteins predicted to be druggable9,10.
This offers a unique opportunity for translating the genomic findings of HF into novel interventions for the primary prevention of HF. Given that human proteins account for the majority of targets for approved drugs to date and that their expression or activity is central to the development of human disease11, leveraging GWAS data of HF and protein quantitative trait loci (pQTL) offers an opportunity to provide mechanistic insight into the causal pathways involved in the emergence of HF as well as to inform novel therapeutic targets.

Here, we conduct a meta-analysis of GWAS on HF from the MVP and the HERMES consortium and leverage our GWAS of HF with pQTLs from the Fenland study to conduct Mendelian randomization (MR) and genetic colocalization analyses on human proteins covered by SOMAscan V412. We then perform extensive downstream analyses covering HF risk factors, cardiac MRI traits, and -omics, including transcriptomics, to investigate the biological credibility of our genetic findings.

Results

Genome-wide meta-analysis identifies 18 novel loci for HF

We meta-analyzed GWAS on HF from the HERMES consortium and MVP (Supplementary Data 1) and identified variants at genome-wide significance (p < 5 × 10^−8) (Fig. 1). The quantile-quantile (Q-Q) plot of the meta-analysis is shown in Supplementary Fig. 1. We performed follow-up analysis of the newly discovered HF variants to identify the likely causal gene for each signal and to investigate associations with 15 HF risk factors and nine left ventricular (LV) cardiac MRI traits.

We performed meta-analyses of genome-wide association results for HF from two studies: MVP (ncases = 43,344; ncontrols = 258,943) and HERMES (ncases = 47,309; ncontrols = 930,014). After quality control, we obtained association results for 10,227,138 genetic variants with HF. We observed 39 variants with genome-wide significant signals with HF, of which 18 variants were >500 kb from a previously reported index variant (Fig. 2 and Supplementary Data 2). We performed fine-mapping using GWAS summary statistics (Supplementary Fig. 2). We determined the gene closest to the index SNP, as well as the gene with the highest Polygenic Priority Score (PoPS)13 within a 500 kb region of the index SNP (Table 1). PoPS takes genome-wide features into account while the nearest gene is based on local information, providing complementary information for annotation of index variants (see Methods). For all the genes suggested by the nearest gene and PoPS, we retrieved the results from gene-burden tests using putative loss-of-function (pLoF) variants from the Genebass-UK Biobank resource (see Methods)14. RFX4 and UBC, both suggested by PoPS, showed the most significant gene-based p values with HF (p values of 9.12 × 10^−4 and 4.6 × 10^−3, respectively). Hereafter, we use genes suggested by PoPS as the default to describe the distinct variants.

Except for rs6945340/HIP1 and rs79682748/SGIP1, all other distinct variants for HF had an association (defined as 0.01/number of secondary traits, p < 1 × 10^−4) with at least one HF risk factor (Fig. 3a).

Fig. 1 | Schematic diagram of the datasets and analyses. HF heart failure, MVP Million Veteran Program cohort, GWAS genome-wide association study, pQTL protein quantitative trait loci, PheWAS phenome-wide association study, MR Mendelian randomization, FDR false discovery rate, PP.H4 posterior probability of H4.
Five variants had the largest number of associations with HF risk factors: rs9352691/PHIP (blood pressure, body mass index (BMI), high-density lipoprotein cholesterol (HDL-C), alcohol consumption, and atrial fibrillation (AF)), rs12992672/TMEM18 (BMI, HDL-C, type-2 diabetes mellitus (T2DM), AF, and smoking), rs4755720/HSD17B12 (BMI, HDL-C, T2DM, and coronary artery disease (CAD)), rs233806/BANK1 (blood pressure, HDL-C, and BMI), and rs959388/PRKD1 (BMI, smoking, and blood pressure); details are in Supplementary Data 3. We observed that the directionality of the associations with HF risk factors was concordant with the findings on HF risk in 32 out of the 42 (76%) associations. HDL-C and diastolic BP accounted for nine of the ten discordant associations (Supplementary Fig. 3). We did not find associations with troponin, NT-proBNP, and IL-6 (Supplementary Data 3).

Only three variants (rs3820888/SPATS2L, rs4755720/HSD17B12, and rs72688573/FAF1) showed at least one association (p < 1 × 10^−4) with LV cardiac MRI traits (Supplementary Fig. 4 and Supplementary Data 3).

Fig. 2 | Manhattan plots showing associations with HF from a GWAS meta-analysis on n = 1,266,315 individuals and b MR-wide proteomics. a Manhattan plot showing the −log10(P value) of association for each SNP from the GWAS meta-analysis plotted on the y-axis against genomic position on the x-axis. The red dotted line corresponds to the genome-wide significance threshold. The summary statistics of independent lead SNPs are noted in Supplementary Data 1. b Manhattan plot showing the −log10-transformed FDR-adjusted P value of association for each gene plotted against genomic position on the x-axis. All tests were two-sided and adjusted for multiple comparisons. The blue line corresponds to an FDR threshold of 5% and points are color-coded by drug tractability information based on data provided by OpenTargets; green for druggable genes. FDR false discovery rate.
The rs3820888/SPATS2L variant was associated with six LV cardiac MRI traits and AF; all these associations were directionally concordant with the HF findings. The rs4755720/HSD17B12 variant was associated with LV end-diastolic volume indexed to body surface area and four HF risk factors, and rs72688573/FAF1 was associated with LV mass to end-diastolic volume ratio and two HF risk factors (see details in Supplementary Data 3). In the African-American subpopulation from the MVP GWAS (Supplementary Data 4), none of the 39 distinct variants that were genome-wide significant for HF in the European datasets achieved genome-wide significance (Supplementary Data 5).

MR Proteomics and colocalization identifies ten genes for HF

Using the GWAS data on SOMAscan V4 proteomics, we selected conditionally independent cis-variants, defined as any variant within a +/− 1 Mb region of the protein-encoding gene that is associated with plasma levels of SOMAscan proteins (p < 5 × 10^−8). We proposed these variants as instrumental variables for measured SOMAscan proteins and conducted two-sample MR analyses using our European-descent GWAS meta-analysis of HF from the MVP and HERMES consortium. We conducted several analyses to minimize confounding and biases.
For the MR results that passed our significance threshold (FDR <5%), we performed genetic colocalization analysis to ensure the MR results were unlikely to be confounded by linkage disequilibrium (LD). For the MR results with evidence of colocalization, we conducted MR and colocalization analyses against HF risk factors and cardiac MRI traits and cis-eQTL searches. Then, we conducted a novel multi-step analytical approach to reduce the risk of horizontal pleiotropy.

Table 1 | Loci reported for HF in the meta-analysis of HERMES and MVP HF GWAS datasets

rsID         Chr  Pos        Nearest gene  PoPS gene  CADD Phred  NEA  EA  MAF    Beta    SE     P value

Novel variants
rs4755720    11   43628749   HSD17B12      HSD17B12   7.526       C    T   0.400  −0.037  0.006  8.14E-11
rs7766436    6    22598259   HDGFL1        HDGFL1     1.902       C    T   0.286  0.037   0.006  5.18E-10
rs3820888    2    201180023  SPATS2L       SPATS2L    6.261       C    T   0.378  −0.034  0.006  1.20E-09
rs12992672   2    632592     TMEM18        TMEM18     1.061       G    A   0.172  0.045   0.007  1.70E-09
rs10846742   12   125308682  SCARB1        UBC        0.277       G    A   0.173  −0.046  0.008  1.73E-09
rs17620390   4    114384328  CAMK2D        CAMK2D     1.907       C    A   0.284  −0.037  0.006  1.90E-09
rs72688573   1    50746997   FAF1          FAF1       6.834       C    T   0.022  −0.122  0.021  4.31E-09
rs10938398   4    45186139   GNPDA2        N/A        1.094       G    A   0.421  0.033   0.006  4.50E-09
rs6945340    7    75100124   POM121C       HIP1       3.58        C    T   0.208  −0.040  0.007  5.89E-09
rs7564469    2    145258445  ZEB2          GTDC1      19.56       C    T   0.165  −0.043  0.007  6.66E-09
rs7977247    12   107259470  RIC8B         RFX4       1.868       C    T   0.434  0.032   0.006  1.07E-08
rs1016287    2    59305625   FANCL         N/A        19.22       C    T   0.280  0.037   0.006  1.11E-08
rs959388     14   30169987   PRKD1         PRKD1      0.596       G    T   0.417  −0.031  0.006  1.30E-08
rs233806     4    103212846  SLC39A8       BANK1      9.713       C    T   0.207  −0.037  0.007  1.57E-08
rs17038861   2    37233265   HEATR5B       PRKD3      0.403       G    T   0.195  0.039   0.007  2.35E-08
rs9352691    6    79785607   PHIP          PHIP       6.373       C    T   0.365  0.032   0.006  2.65E-08
rs10520390   19   46327831   SYMPK         DMWD       2.849       G    C   0.059  0.074   0.013  2.87E-08
rs79682748   1    66989719   SGIP1         SGIP1      4.719       G    A   0.018  −0.155  0.028  3.00E-08

Previously reported variants
rs7859727    9    22102165   CDKN2B        CDKN2A     1.448       C    T   0.488  0.061   0.006  3.11E-29
rs2634071    4    111669220  PITX2         PITX2      1.622       C    T   0.219  0.079   0.007  3.64E-29
rs11642015   16   53802494   FTO           RPGRIP1L   4.826       C    T   0.432  0.058   0.006  2.69E-25
rs10455872   6    161010118  LPA           PLG        0.146       G    A   0.074  −0.104  0.011  8.20E-23
rs3176326    6    36647289   CDKN1A        CDKN1A     10.88       G    A   0.173  −0.068  0.007  2.51E-22
rs602633     1    109821511  PSRC1         CELSR2     8.63        G    T   0.207  −0.054  0.007  5.19E-16
rs1739833    1    16331108   C1orf64       ZBTB17     4.205       C    T   0.331  −0.048  0.006  7.89E-15
rs17617337   10   121426884  BAG3          BAG3       0.079       C    T   0.218  −0.050  0.007  8.88E-14
rs600038     9    136151806  ABO           SURF1      7.596       C    T   0.215  −0.049  0.007  9.44E-14
rs34163229   10   75406912   SYNPO2L       SEC24C     24          G    T   0.133  −0.056  0.008  7.40E-13
rs113437066  17   65836220   BPTF          BPTF       2.588       ATTT A   0.197  0.061   0.010  1.81E-10
rs11746435   5    137006762  KLHL3         HNRNPA0    7.216       T    A   0.229  0.042   0.007  2.04E-10
rs2832275    21   30602994   BACH1         LTN1       1.08        T    A   0.139  −0.047  0.008  2.90E-10
rs7795282    7    74122857   GTF2I         GTF2IRD1   0.262       G    A   0.221  −0.042  0.007  7.69E-10
rs12933292   16   69566309   NFAT5         NFAT5      0.403       G    C   0.425  0.034   0.006  8.96E-10
rs216199     17   2200871    SMG6          SMG6       3.442       C    T   0.388  −0.037  0.006  1.11E-09
rs2013002    12   112200150  ALDH2         ATXN2      4.336       C    T   0.415  0.033   0.006  5.68E-09
rs17163345   1    222806218  MIA3          MIA3       7.541       G    A   0.270  −0.034  0.006  2.15E-08
rs3764351    17   37824339   PNMT          MED1       5.182       G    A   0.340  −0.033  0.006  2.27E-08
rs9349379    6    12903957   PHACTR1       PHACTR1    5.478       G    A   0.401  −0.031  0.006  2.58E-08
rs4327120    18   36532976   N/A           N/A        1.032       C    T   0.128  0.050   0.009  3.09E-08

Findings were identified using fixed effects inverse-variance weighted meta-analysis. The chromosomal position is based on GRCh37/hg19 reference. Gene names are italicized. Genes that are druggable or predicted to be druggable are highlighted in bold. CADD combined annotation-dependent depletion, NEA non-effect allele, EA effect allele, MAF minor allele frequency, SE standard error.
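The FDR <5% threshold applied to the MR results can be illustrated with a Benjamini-Hochberg adjustment. This section does not name the FDR procedure, so Benjamini-Hochberg is an assumption here, although the first rows of Table 2 are consistent with it (for example, 2.51E-07 × 1557 / 1 ≈ 3.9E-04 for the top-ranked gene among 1557 tested). The function name `benjamini_hochberg` and the toy p values are hypothetical and serve only as a minimal sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order.

    Assumption: the paper's FDR is BH; the section does not state the method.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy input (hypothetical values, not taken from the paper).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = benjamini_hochberg(toy_p)
passing = [i for i, q in enumerate(toy_q) if q < 0.05]
print(toy_q, passing)
```

Under this sketch, a gene passes the screen when its adjusted value is below 0.05; only those genes would then be carried forward to colocalization.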
We used 2900 cis-pQTLs across 1557 genes from the Fenland study as proposed instrumental variables for conducting two-sample MR of proteomics with HF. We found that 16 genes passed our MR threshold (FDR <5%), of which ten genes also showed suggestive evidence of colocalization between HF and pQTL signals (posterior probability of Hypothesis 4 (PP.H4): one common causal variant >0.5) for at least one of the instruments, and of which three genes showed strong evidence of colocalization (PP.H4 >0.8); see details in Table 2 and Supplementary Data 6. Except for ENPEP, no other gene that colocalized was within 500 kb of a known HF GWAS locus. For genes with more than one instrument, we did not observe any evidence of heterogeneity based on Cochran's Q statistic according to the IVW model or by the MR-Egger intercept test (Table 2). This lack of heterogeneity suggests that average directional horizontal pleiotropy may not explain these findings.

Except for ENPEP, TNXB, and SIRPA, all the other genes that passed thresholds for MR and colocalization with HF also showed an association (defined as MR p < 1 × 10^−4 and colocalization PP.H4 >0.5) with at least one of the 15 HF risk factors (Fig. 3b and Supplementary Data 7). We observed that the directionality of the MR associations with HF risk factors was concordant with the MR findings on HF in 10 out of the 14 (71%) associations. HDL-C, LDL-C, and systolic BP accounted for the discordant associations. Only the TNFSF12 gene showed an association with an LV cardiac MRI trait that passed statistical thresholds for MR and colocalization; see details in Supplementary Data 7, 8. We investigated whether the cis-pQTL instruments for the ten MR genes were also cis-eQTLs (p < 5 × 10^−8). Twelve of the 18 proposed instruments were also cis-eQTLs in at least one tissue.
None of the cis-pQTLs used as proposed instruments for the TNXB, APOC3, and APOH genes showed a cis-eQTL association (Supplementary Data 9).

In our assessment of horizontal pleiotropy (see Methods and Supplementary Fig. 5), the 18 proposed instruments for the ten MR genes were associated (p < 5 × 10^−8) with 251 proteins or gene-expression traits, based on the SOMAscan V4 Fenland study and eQTLGen, respectively (Supplementary Data 10). For 217 of the 251 proteins/gene-expression traits, we identified at least one cis-pQTL or cis-eQTL at p < 5 × 10^−8 associated with protein levels based on the SOMAscan V4 Fenland study, or with gene expression based on eQTLGen. We then conducted two-sample MR of these secondary proteins/gene-expression traits against HF and identified four genes (TP53, ZNF259, ACVR2A, and MYRF) that passed multiple testing thresholds (0.05/217, p < 2 × 10^−4, Supplementary Data 10). These four secondary genes correspond to the following genes identified by MR proteomics as hits for HF: TNXB (ACVR2A and MYRF), APOH (TP53), and APOC3 (ZNF259). TP53 and ACVR2A were in a different biological pathway than APOH and TNXB, respectively, suggesting potential horizontal pleiotropy. ZNF259 and MYRF did not retrieve any biological pathways; hence, it is unknown whether these are due to horizontal pleiotropy. We then determined protein-protein interaction (PPI) networks for the APOH and TNXB proteins using the Enrichr and GPS-Prot databases. Enrichr's PPI Hub Protein pathways reported interactions between APOH and CDC42, AKT1, TP53, and GRB2 (adjusted p values <0.04), while GPS-Prot showed that the APOH protein is directly connected to TP53 with confidence >0.6 (Supplementary Fig. 6). No significant interaction was identified for the TNXB and ACVR2A proteins.

Genetic correlation estimates

Estimates of the genetic correlation between HF and 15 HF risk factors are reported in Supplementary Data 11.
Results that pass multiple testing at 5% FDR are denoted, including a positive genetic correlation between HF and BMI of 0.56 (0.03) and with AF of 0.11 (0.02), as well as a negative genetic correlation between HF and HDL-C of −0.36 (0.03) (Supplementary Data 11).

Polygenic risk score validation

To test the PRS for HF in an out-of-sample cohort, we used data from 75,119 participants of European descent from the BioVU, of which 5845 participants had HF. Individuals with a 1-standard deviation increase in the PRS had 1.28-fold higher odds of HF (95% confidence interval (CI), 1.24–1.31; p < 2 × 10^−16). Participants in the top decile had 1.82-fold (95% CI, 1.60–2.06; p < 0.0001) higher odds of HF compared to those in the bottom PRS decile.

Pathway enrichment analysis recovers pathways relevant to HF

We used previously published and our newly identified HF GWAS variants (n = 40) together with the 18 proposed instruments for the ten MR-proteomics genes associated with HF and conducted gene pathway enrichment analysis using GTEx V8. These 58 variants are associated with 1605 GTEx V8 cis-eQTLs (p < 1 × 10^−4), corresponding to a total of 165 unique genes (see Supplementary Data 12). After restricting the analysis to pathways described in Gene Ontology, KEGG, and Reactome, we observed 56 enriched pathways (FDR <5%). Biological pathways include muscle adaptation (adjusted p value = 0.03), ventricular system development (p = 0.03), sarcomere organization (p = 0.04), regulation of vasculature development (p = 0.04), and aldosterone-regulated sodium reabsorption (p = 0.04); details are in Supplementary Fig. 7 and Supplementary Data 13.

For the 18 GWAS distinct variants on HF, we determined the differential gene expression associated with the novel HF variants (p < 1 × 10^−4) in each GTEx V8 tested tissue (heart atrial, heart ventricle, artery aorta, adipose, liver, kidney, and whole-blood tissues, and transformed cultured fibroblasts).
We then used the set of differentially expressed genes to conduct an overrepresentation analysis on a per-tissue basis (Supplementary Fig. 8). A total of 605 enriched pathways had at least two differentially expressed genes, with heart left ventricle being the tissue with the most significantly enriched pathways (n = 393). The rs6945340/HIP1 variant showed the largest number of enriched pathways (n = 391, all tissues) with the heart's left ventricle being the primary tissue. Pathways to highlight for this variant include the Krebs cycle, respiratory electron transport chain (both with p = 4.8 × 10^−30), and oxidative phosphorylation (p = 3.2 × 10^−5). Further details are available in Supplementary Data 14. For eight of the MR-proteomics genes, we identified 77 reported associations with HF-related medical terms according to the EpiGraphDB database (Supplementary Data 15).

Fig. 3 | Plots showing a genetic association of 18 HF loci against risk factors for HF and b MR and colocalization estimates of MR-proteomic gene-hits against HF risk factors. a The color of the bubble corresponds to the beta coefficient of the genetic association between the loci (x-axis) and trait (y-axis). Blue corresponds to a negative and red corresponds to a positive beta coefficient. The size of each bubble corresponds to the negative logarithm of the association p value; larger size corresponds to lower p values. Loci are grouped by druggable and non-druggable genes. All tests were two-sided without adjustment for multiple comparisons. Associations which passed the p value threshold (p < 1 × 10^−4) are denoted by a yellow diamond. b This bubble plot shows MR estimates for which p < 1 × 10^−4. The size of each bubble corresponds to the posterior probability for hypothesis 4 derived from colocalization. The color of the bubble corresponds to the beta coefficient derived from MR. Blue corresponds to a negative association and red corresponds to a positive association; note that a positive β indicates either an increase in protein levels corresponding to an increase in HF risk or a decrease in protein levels corresponding to a decrease in HF risk, while a negative β indicates either a decrease in protein levels corresponding to an increase in HF risk or an increase in protein levels corresponding to a decrease in HF risk. The intensity of the color corresponds to −log10(P value) for the strength of association in the MR. All tests were two-sided without adjustment for multiple comparisons. Loci are grouped by druggable and non-druggable genes. TNXB, SIRPA, and ENPEP genes are not included as these had no MR estimates on HF risk factors that pass the p < 1 × 10^−4 threshold. β Beta coefficient, AC alcohol consumption, AF atrial fibrillation, BMI body mass index, CAD coronary artery disease, COPD chronic obstructive pulmonary disease, DBP diastolic blood pressure, eGFR estimated glomerular filtration rate, HDL-C high-density lipoprotein cholesterol, IL-6 Interleukin-6, LDL-C low-density lipoprotein cholesterol, NT-proBNP N-terminal proBNP, SBP systolic blood pressure, SMK smoking, T2D type-2 diabetes, TRP troponin I cardiac muscle, PP.H4 posterior probability of H4.

Table 2 | Protein-hits for heart failure identified through Mendelian randomization that passed an FDR threshold of 5%

Gene     SNPs  OR    95% CI        P value   pHet†  FDR       MR-Egger intercept (95% CI); P value  PP.H4*  Druggability classification (Chemical Modality)  Near known HF gene**
ITIH4*   2     1.13  (1.07, 1.17)  2.51E-07  0.66   3.90E-04  N/A                                   0.97    Non-druggable                                    No
APOC3*   1     1.19  (1.11, 1.28)  1.74E-06  NA     1.35E-03  N/A                                   0.99    Advanced Clinical Phase (Oligonucleotide)        No
MAPK3    3     0.95  (0.93, 0.97)  6.70E-06  0.41   3.48E-03  0.02 (−0.01, 0.05); 0.43              0.52    Advanced Clinical Phase (Small molecule)         No
TNFSF12  2     0.96  (0.94, 0.98)  1.78E-05  0.05   6.94E-03  N/A                                   0.79    Clinical Phase 1 (Antibody)                      No
ABO      2     1.02  (1.01, 1.03)  2.89E-05  0.11   8.99E-03  N/A                                   0.01    Non-druggable                                    ABO
APOH*    2     0.96  (0.94, 0.98)  5.24E-05  0.83   1.36E-02  N/A                                   0.89    Non-druggable                                    No
B3GNT8   2     0.97  (0.96, 0.99)  9.35E-05  0.96   2.08E-02  N/A                                   0.48    Non-druggable                                    No
NTN4     2     1.08  (1.04, 1.13)  1.10E-04  0.68   2.14E-02  N/A                                   0.04    Non-druggable                                    No
DLL1     1     0.87  (0.8, 0.93)   1.53E-04  NA     2.65E-02  N/A                                   0.75    Non-druggable                                    No
MST1     3     1.02  (1.01, 1.03)  1.99E-04  0.11   3.10E-02  −0.20 (−0.38, −0.01); 0.29            0.37    Non-druggable                                    No
ENPEP    4     0.96  (0.94, 0.98)  3.12E-04  0.18   4.27E-02  0.01 (−0.02, 0.03); 0.62              0.74    Non-druggable                                    PITX2, FAM241A
NAE1     1     0.82  (0.74, 0.91)  3.55E-04  NA     4.27E-02  N/A                                   0.6     Advanced Clinical Phase (Small molecule)         No
TNXB     1     1.03  (1.02, 1.05)  3.56E-04  NA     4.27E-02  N/A                                   0.61    Non-druggable                                    No
SIRPA    1     0.98  (0.97, 0.99)  3.94E-04  NA     4.39E-02  N/A                                   0.56    Non-druggable                                    No
EBI3     1     0.75  (0.64, 0.89)  4.44E-04  NA     4.61E-02  N/A                                   0.01    Non-druggable                                    No
IL27     1     0.75  (0.64, 0.89)  4.44E-04  NA     4.61E-02  N/A                                   0.4     Non-druggable                                    No

Gene names are italicized. Significant MR results, FDR <5%. MR estimates were calculated using Wald ratio for instruments with one variant and inverse-variance weighting and fixed effects for instruments that contained more than one variant. Note that an OR >1 indicates an increase in protein corresponding with an increase in HF risk or vice versa, suggesting that the therapeutic solution may be an inhibitor; an OR <1 indicates either a decrease in protein levels corresponding with an increase in HF risk or an increase in protein levels corresponding with a decrease in HF risk, suggesting the therapeutic solution may be an agonist. Genes that passed a colocalization threshold of PP.H4 >0.5 (suggestive threshold) are highlighted in bold and PP.H4 >0.8 (strong threshold) are marked with an asterisk. MR Mendelian randomization, FDR false discovery rate, PP.H4 posterior probability of H4.
*Posterior probability of H4 (one common causal variant) from colocalization of pQTL and GWAS results.
**Previously reported HF GWAS gene for instruments in GWAS loci (within 500 kb up or down from each locus).
†MR pHet was measured by Cochran's Q-test for heterogeneity across individual-variant MR estimates within a genetic instrument; instruments containing one variant were not tested for heterogeneity.
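The Table 2 footnotes name three quantities: a Wald ratio for single-variant instruments, a fixed-effects inverse-variance-weighted (IVW) estimate for multi-variant instruments, and Cochran's Q for heterogeneity. A minimal Python sketch of the textbook formulas follows. It assumes a first-order standard error for the Wald ratio, uses hypothetical function names and made-up toy inputs, and is not the authors' pipeline. The same IVW pooling formula also underlies a fixed-effects meta-analysis of per-cohort GWAS estimates.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant MR estimate (log odds per unit of protein level).

    Assumption: first-order standard error that ignores uncertainty
    in the variant-exposure association.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_fixed(estimates, ses):
    """Fixed-effects inverse-variance-weighted pooling of per-variant estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def cochran_q(estimates, ses):
    """Cochran's Q: weighted squared deviations from the pooled IVW estimate."""
    pooled, _ = ivw_fixed(estimates, ses)
    return sum(((b - pooled) / se) ** 2 for b, se in zip(estimates, ses))


# Toy two-variant instrument (hypothetical numbers, not from the paper).
ratios = [wald_ratio(0.5, 0.10, 0.02), wald_ratio(0.4, 0.16, 0.04)]
per_variant = [r[0] for r in ratios]
per_variant_se = [r[1] for r in ratios]
pooled, pooled_se = ivw_fixed(per_variant, per_variant_se)
odds_ratio = math.exp(pooled)  # MR estimate expressed as an odds ratio
q_stat = cochran_q(per_variant, per_variant_se)
print(pooled, pooled_se, odds_ratio, q_stat)
```

Under the null of no heterogeneity, Q is compared against a chi-squared distribution with (number of variants − 1) degrees of freedom, which is why single-variant instruments have no pHet value in Table 2.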
Mouse knock-out models for novel genes identified by GWAS or MR-proteomics

We queried for knock-out (KO) mouse models, using the Mouse Genome Informatics (MGI) resource, for evidence that modification of the target produces a phenotype relevant to HF. In 13 genes (eight GWAS and five MR-proteomics genes), we retrieved evidence of a KO associated with cardiovascular abnormalities. KO models on CAMK2D, PRKD1, MAPK3, NAE1, SLC39A8, PHIP, RFX4, SCARB1, and TNXB showed phenotypes such as myocardial abnormalities, dilated cardiomyopathy, abnormal response to cardiac infarction, and cardiac hypertrophy, suggesting an intrinsic role in heart function regulation (Supplementary Data 16).

Druggability

A total of seven novel genes from the GWAS (CAMK2D, PRKD1, and PRKD3) and MR-proteomics (MAPK3, TNFSF12, APOC3, and NAE1) were identified to encode proteins that are predicted to be druggable (CAMK2D) or are targets for 14 unique drugs that are either licensed or in the clinical phase (PRKD1, PRKD3, MAPK3, TNFSF12, APOC3, and NAE1). Except for the two drugs targeting apolipoprotein C-III mRNA (volanesorsen and AKCEA-APO-CIII-LRx), which were evaluated for familial chylomicronemia syndrome, the other 12 drugs are either licensed or under clinical investigation for cancer (n = 10; MAPK3, PRKD1, PRKD3, and NAE1) or autoimmune disorders (n = 2; TNFSF12). In four of the seven druggable genes, we were able to use our MR findings to infer the type of pharmacological action (agonist versus antagonist) needed to prevent HF and compared this against the pharmacological action of the existing drugs with a single target (which are most likely to reproduce genetic findings). Through this process, we observed a match in one gene (APOC3); for the other druggable genes (MAPK3, NAE1, and TNFSF12), the existing drugs were inhibitors/antagonists, while MR suggested an agonist (details in Supplementary Data 17).
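The comparison between the MR-inferred action and the action of existing drugs can be sketched in a few lines of Python. The odds ratios below are the Table 2 values for the four genes, and the rule (OR >1 suggests an inhibitor, OR <1 suggests an agonist) comes from the Table 2 footnote. The function and variable names are hypothetical, and the "inhibitor" label for existing APOC3 drugs is inferred from the reported match rather than stated directly in this section.

```python
def suggested_action(odds_ratio):
    """Map an MR odds ratio to the action suggested by the Table 2 footnote."""
    if odds_ratio > 1:
        return "inhibitor"  # higher protein level tracks with higher HF risk
    if odds_ratio < 1:
        return "agonist"    # higher protein level tracks with lower HF risk
    return "undetermined"


# MR odds ratios copied from Table 2.
mr_odds_ratio = {"APOC3": 1.19, "MAPK3": 0.95, "TNFSF12": 0.96, "NAE1": 0.82}

# Action of existing single-target drugs as described in the text;
# the APOC3 entry is implied by the reported match (assumption).
existing_action = {
    "APOC3": "inhibitor",
    "MAPK3": "inhibitor",
    "TNFSF12": "inhibitor",
    "NAE1": "inhibitor",
}

matches = [
    gene
    for gene, odds_ratio in mr_odds_ratio.items()
    if suggested_action(odds_ratio) == existing_action[gene]
]
print(matches)
```

With these inputs only APOC3 is returned, in line with the single match described above.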
In silico trials
We searched for genetic associations for the GWAS hits and conducted two-sample MR for the MR-proteomics hits to evaluate safety and efficacy outcomes relevant to primary prevention trials on HF. Seven of the 18 distinct GWAS variants and two of the ten MR-proteomics genes were additionally associated (p < 1 × 10⁻⁴) with efficacy outcomes (CAD, T2DM) in the same direction as HF (Supplementary Data 18). None of the 18 distinct GWAS variants or ten MR-proteomics genes showed an association (p < 1 × 10⁻⁴) with the following safety traits: cancers (lung, prostate, colorectal, breast), chronic kidney disease, Alzheimer’s disease, liver enzymes, or creatinine.

Comparison with Global Biobank Meta-analysis Initiative (GBMI) on HF
An unpublished study from the GBMI reporting a multi-ancestry HF GWAS (68,408 HF cases and 1,286,331 controls) identified 11 potentially novel loci for HF15. We compared these associations with our HERMES-MVP GWAS and determined that seven of the 11 GBMI variants were associated (p < 5 × 10⁻⁸) in our HF meta-analysis. None of these variants were associated (p < 5 × 10⁻⁸) in the HF GWAS in the MVP African-American dataset (Supplementary Data 19). Two GBMI loci correspond to the same variants (rs10455872/PLG and rs600038/SURF1) previously reported by HERMES or MVP, and an additional five loci were in LD (r² range: 0.39 to 1) with our findings (Supplementary Data 19). Finally, two GBMI GWAS variants (rs17035646 and rs61208973) showed suggestive evidence of association in our HF GWAS (p < 0.003). In a replication study of the 18 novel loci, findings from the HF GWAS in the GBMI multi-ancestry analysis excluding UK Biobank indicate that 33.3% (6 of 18) of variants are significant (p value < 0.05/18), 61.1% (11 of 18) are nominally significant (p value < 0.05), and 100% have a beta estimate that is directionally concordant with our meta-analysis (Supplementary Data 20).
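The replication thresholds and percentages quoted above follow from simple arithmetic, sketched here. The variable and function names are illustrative; only the counts (6, 11, and 18) and the 0.05 level come from the text.

```python
N_LOCI = 18          # novel loci taken forward for replication
ALPHA = 0.05         # nominal significance level

# Bonferroni-corrected replication threshold: 0.05 / 18
bonferroni_threshold = ALPHA / N_LOCI


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * count / total, 1)


def replication_tier(p_value: float) -> str:
    """Classify a replication p value against the two thresholds."""
    if p_value < bonferroni_threshold:
        return "significant"
    if p_value < ALPHA:
        return "nominal"
    return "not replicated"


print(f"threshold = {bonferroni_threshold:.5f}")
print(percent(6, N_LOCI), percent(11, N_LOCI))
```

Note that the 11 nominally significant variants include the 6 that pass the stricter threshold, which is why the two percentages are not additive.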
Discussion
Our genetic analysis of HF, comprising 90,653 cases, identified 18 distinct HF variants through GWAS and an additional ten putatively causal genes for HF through MR and colocalization using proteomic instruments. Our study expands the knowledge on the biological pathways associated with all HF risk loci discovered to date and identifies seven druggable genes as potential drug targets for the primary prevention of HF.

We pursued several strategies to provide biological credibility to our 18 distinct GWAS variants. First, 16 of the 18 variants showed genetic associations with HF risk factors that were directionally concordant with the HF findings, and with several LV cardiac MRI traits. Second, overrepresentation analysis using genes differentially expressed by each GWAS variant identified the heart LV myocardium as the most significantly enriched tissue and recovered several pathways of HF relevance. Third, systematic querying of KO mouse models identified CAMK2D, PRKD1, PHIP, RFX4, SLC39A8, and SCARB1, genes found by our GWAS, with phenotypes relevant to HF.

Novel variants to highlight include rs3820888/SPATS2L and rs4755720/HSD17B12, which showed associations with HF risk factors and LV cardiac MRI traits. The rs3820888/SPATS2L variant showed evidence of colocalization with six cardiac MRI traits, including LVEF, LV mass to end-diastolic volume ratio, and AF, all of which were directionally concordant with the HF findings. Previous GWAS have also indicated that the same variant is associated with QT interval16. The rs4755720/HSD17B12 variant colocalized with LV end-diastolic volume indexed to BSA and HF risk factors that were directionally concordant with the HF findings, all showing a protective effect. Previous GWAS indicated that this variant, as well as others in strong LD, is associated with a reduction in adiposity measures and an increase in lung function metrics, suggesting that cardiometabolic fitness may explain the association with HF17–19.
We conducted MR-proteomic analyses to uncover the putative causal role of human proteins in HF. Ten genes passed our genetic colocalization test, of which nine were also not in LD with a previously reported HF variant, minimizing the probability of confounding by LD. Seven of the 10 genes showed associations with at least one HF risk factor, and in the majority (71%) of these associations, the point estimate was directionally concordant with the MR findings on HF.

Four (MAPK3, PRKD1, CAMK2D, and PRKD3) of the seven druggable genes identified by our analyses encode proteins with serine/threonine kinase activity. These four genes are associated with HF risk factors in a manner that is concordant with the findings on HF. CAMK2D also showed a suggestive association (p = 9 × 10⁻⁴) with LV mass. In support of our findings, a mouse model with deletion of MAPK3/MAPK1 genes developed cardiac hypertrophy and ventricular dilation followed by reduced ventricular performance20. CAMK2D, PRKD1, and PRKD3 are calcium/calmodulin-dependent protein kinases known to be associated with cardiac pathophysiology. Protein kinase-D, encoded by the PRKD1 gene, appears to be a regulator of myocardial structure and function. Mice with a deletion of PRKD1 in cardiomyocytes were reported to be resistant to stress-induced hypertrophy in response to pressure overload, angiotensin-II, and adrenergic activation21. Calcium/Calmodulin-Dependent Protein Kinase II (CamKII) is composed of four chains, one of which, delta (δ), is encoded by the CAMK2D gene. CamKII-δ is largely expressed in cardiac tissue (confirmed by our pathway enrichment analysis), where it regulates proteins involved in calcium handling, excitation-contraction coupling, activation of hypertrophy, cell death, and inflammation22. Several case-control studies have shown an upregulation of cardiac CamKII-δ expression and activity in patients with HF, dilated cardiomyopathy, and diabetic cardiomyopathy.
In support of this, several experimental studies in animal models of dilated cardiomyopathy and HF have shown that chemical inhibition of CamKII led to protection from cardiac dysfunction, adverse cardiac remodeling, and cardiac arrhythmias22. More recently, the administration of a novel ATP-competitive CaMKII-δ oral inhibitor (RA306) in a dilated cardiomyopathy mouse model led to an improvement in ejection fraction23. This oral inhibitor offers the opportunity to test the causal role of CamKII-δ through clinical trials for the prevention of HF. Interestingly, the CAMK2D gene was also associated with AF, confirming an association demonstrated by in-vitro and animal models of AF22.

Additional druggable genes identified were APOC3, TNFSF12, and NAE1. The APOC3 gene, which achieved the highest level of evidence in our analyses (FDR 5% and PP.H4 >0.8), is known for its associations with lipids and CAD, which were confirmed in our analysis. Apolipoprotein C-III mRNA is targeted by two different antisense oligonucleotides (ASO), Volanesorsen and AKCEA-APO-CIII-LRx, evaluated for familial chylomicronemia syndrome. Phase 3 trials on Volanesorsen have shown an increase in LDL-C levels and thrombocytopenia, which makes it an unlikely candidate for the prevention of HF24. AKCEA-APO-CIII-LRx is a liver-specific ASO that appears to have a better safety profile and may be more suitable for long-term use25. The TNFSF12 gene encodes the TNF superfamily member 12 protein; increased levels of this protein were associated with a risk reduction in HF according to our MR and colocalization findings. Similar, directionally concordant findings were reported by recent MR proteomics (using various proteomics platforms) against ischemic stroke26.
These results are consistent with the finding that TNFSF12 is MR associated and colocalized with AF, a risk factor for both ischemic stroke and HF. In addition, we observed a clear reduction in LV mass to end-diastolic volume ratio and a suggestive (p = 2 × 10⁻³) increase in LVEF, both directionally concordant with a risk reduction in HF. Transgenic mice and adenoviral-mediated gene expression models have also pointed to the role of TNFSF12 in the development of dilated cardiomyopathy and severe cardiac dysfunction27. The NAE1 gene encodes the NEDD8 activating enzyme E1 subunit 1 protein, and our MR and colocalization findings showed this gene was associated with lower values of blood pressure, which coincides with the reduced risk of HF.

The strengths of the current analysis are multiple. First, the large number of HF cases included in our analysis led us to identify new variants and putatively causal genes for HF through GWAS and MR proteomics. Second, we used three complementary strategies, namely nearest gene (local method), PoPS (global method), and pLoF, to assign the most likely gene responsible for the GWAS signal with HF. Through this process, we observed agreement in 11 of 18 GWAS variants, which provided some degree of confidence in the gene prioritization. However, we acknowledge that the PoPS method will miss variants that do not act through the mechanisms captured by PoPS13, highlighting the challenge in assigning the gene responsible for GWAS loci28–30. Third, we provide biological credibility for most of our genetic findings through an extensive and complementary analysis covering HF risk factors, LV cardiac MRI, and -omics. Fourth, in seven MR hits for HF, we showed that our proposed instruments, in addition to associations with HF risk factors or LV cardiac MRI traits, were also associated with gene expression and protein levels, all acting in cis.
Fifth, KO models of thirteen genes identified through GWAS and MR developed phenotypes highly relevant to HF, and in some cases (CAMK2D), specific pharmacological inhibition showed reversibility of the HF phenotypes. Sixth, the lack of associations between the distinct GWAS loci and MR genes with safety outcomes used in the primary prevention trials of HF provides some reassurance on target safety profiles.

The degree of credibility on the causality of proteins identified by MR depends on whether the MR assumptions are valid. First, our colocalization analysis on HF, risk factors for HF, and LV cardiac MRI traits makes confounding by LD unlikely. The selection of cis-variants as proposed instruments minimizes the chances of horizontal pleiotropy. To further minimize the chances of horizontal pleiotropy, we developed a novel analysis that attempted to empirically test the relevant conditions needed for horizontal pleiotropy to invalidate MR. First, we looked for secondary proteins or gene expression associated with our MR protein-hits, and then evaluated whether those secondary proteins/gene expression were associated with HF and fell in a biological or PPI pathway outside our protein-hits. After doing this, only TNXB showed some evidence of horizontal pleiotropy. Interestingly, cis-pQTLs used as instruments for TNXB were not associated with cis-eQTLs, HF risk factors, or LV cardiac MRI traits. Although we used multiple lines of evidence to determine putative causal genes, the pathway enrichment analysis identifies pathways linked to cardiac biology but may not point to specific insights for HF. We also did not functionally validate any of our results; functional validation remains the highest level of evidence to support causal roles for the hits, especially those that pass the suggestive MR and coloc thresholds of FDR 5% and PP.H4 >0.5.
Although most of our variants and genes showed associations with HF risk factors that were biologically concordant with HF risk, some discordant associations were observed. HDL-C and diastolic BP accounted for most of these discordant associations. It has been reported that higher levels of diastolic BP may be protective against HF31,32, instead of deleterious as we assumed, while the HDL-C association with HF seems to be non-linear32, which was not accounted for in our MR analysis that included HDL-C as a co-variable. We validated seven of the 11 variants reported in an unpublished multi-ancestry HF GWAS by GBMI15. Another limitation is that our analysis was restricted to individuals of European ancestry. While this does reduce the potential bias caused by population stratification, our results may not apply to populations of other ancestral groups. Future HF GWAS meta-analyses including larger releases of MVP, All of Us, and GBMI will not only provide chances for replication of variants identified in Europeans, but also include non-white populations to further increase the discovery of genetic determinants of HF.

Although the absence of HF subtypes in this analysis most certainly decreased our ability to detect signals specific to HF subtypes, it does not invalidate the ones identified. Evidence from primary prevention trials using HF as an outcome (as in our genetic study) that uncovered the benefits of BP lowering therapies and statins indicates the plausibility for translation of our genetic findings. Future genomic analyses should extend to different HF subtypes, with a focus on HF with preserved ejection fraction, a major unmet need in medicine. Although our design attempted to emulate a primary prevention trial on HF, further studies with access to individual participant data that reliably recreate eligibility criteria and outcome ascertainment that cover efficacy (including HF subtypes) and safety outcomes are needed.
In conclusion, we discovered a total of 18 distinct novel HF-associated variants and ten putatively causal genes for HF through GWAS and MR-proteomics with evidence of biological plausibility. The new mechanisms and pathways, together with the seven druggable genes discovered, provide a tractable path for the translation of our genomic findings for the primary prevention of HF.

Methods
Clinical and demographic characteristics
The study population for the meta-analysis consisted of 1,279,610 participants, of which 302,287 were from MVP (43,344 cases and 258,943 controls) and 977,323 were from the HERMES Consortium (47,309 cases and 930,014 controls). The clinical and demographic features of the participants are summarized in Supplementary Data 1. A detailed breakdown of clinical and demographic characteristics according to each study included in the HERMES Consortium has been previously published8. The population characteristics of the BioVU PRS cohort can be found in Supplementary Data 21.

Genotyping, quality control, and imputation of genetic data
For the data obtained from the Million Veteran Program (MVP), DNA was extracted from participants’ blood and genotyped using the MVP 1.0 Genotyping Array, which is enriched for both common and rare genetic variants of clinical significance. Imputation performance was assessed, and variants determined to be of poor quality were removed from further analyses. All studies included in the HERMES Consortium utilized high-density genotyping arrays. A detailed table summarizing the genotyping, quality control, imputation, and analysis across the 29 distinct datasets included in the HERMES Consortium has been previously described8. For quality control, the per-variant call rate and the per-sample call rate across all studies were greater than 90%8. The MAF threshold ranged from >0 to 1% across studies8.
Further details can be found in the Supplementary Information.

Phenotyping of heart failure
Across all 26 cohorts of the HERMES Consortium, cases with HF were identified by a clinical diagnosis of HF of any etiology, as determined by physician diagnosis or adjudication, ICD codes, and imaging, and controls were participants without a clinical diagnosis of HF. In the MVP, HF patients were identified as those with an International Classification of Diseases (ICD)-9 code of 428.x or ICD-10 code of I50.x and an echocardiogram performed within 6 months of diagnosis (median time period from diagnosis to echocardiography was 3 days, interquartile range 0–32 days). Further details can be found in the Supplementary Information.

Genome-wide association study for HF
We performed a fixed effects inverse-variance weighted meta-analysis of HF from the published MVP (n = 302,258) and HERMES (n = 964,057)8 GWAS using METAL33 (version release 2020-05-05) in a total of 1,266,315 individuals. We removed variants with a MAF <0.5%, resulting in 10,227,138 associations. We used FUMA34 to annotate our results using the default settings. In accordance with the default FUMA parameters, we defined distinct variants to have an R² < 0.6 and determined the associations that were >500 KB from a previously reported indexed variant in MVP and HERMES. We used the closest gene to the indexed variant and the top gene per locus identified by PoPS to prioritize genes for our GWA-significant (p < 5 × 10⁻⁸) loci. The PoPS method13 is a new gene prioritization method that identifies the causal genes by integrating GWAS summary statistics with gene expression, biological pathway, and predicted protein–protein interaction data. We applied the PoPS score because it has been shown to nominate causal genes at non-coding GWAS loci with greater predictive confidence compared to other similarity-based or locus-based methods13.
By leveraging a framework unbiased by previous trait-specific knowledge, the PoPS tool can prioritize causal genes and therefore highlight relevant biological pathways with greater confidence. First, as part of the PoPS analysis, we used MAGMA to compute gene association statistics (z-scores) and gene–gene correlations from GWAS summary statistics and LD information from the 1000 Genomes. Next, PoPS performs marginal feature selection by using MAGMA to perform enrichment analysis for each gene feature separately. The model is fit by generalized least squares (GLS), and MAGMA results are used to perform marginal feature selection, retaining only features that pass a nominal significance threshold (p < 0.05). Then, PoPS computes a joint enrichment of all selected features simultaneously in a leave one chromosome out (LOCO) framework. The gene features employed by PoPS are listed here: https://github.com/FinucaneLab/gene_features. The PoPS method uses data from gene expression datasets, protein–protein interaction networks, and pathway databases; however, variants that act through mechanisms not captured by the PoPS model would not be identified. Finally, PoPS computes polygenic priority scores for each gene by fitting a joint model for the enrichment of all selected features. The PoPS score for a gene is independent of the GWAS data on the chromosome where the gene is located. The PoPS analysis returned scores for a total of 18,383 genes per set of GWAS datasets. We then annotated our GWAS loci with the Ensembl genes in a 500 kb window and selected the gene with the highest PoPS score in the locus as the prioritized gene. For all the genes suggested by the nearest gene and PoPS, we conducted gene-burden tests derived using a gene-based (mean) approach in a mixed model framework using the Genebass-UK Biobank resource (see Supplementary Information).
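The fixed effects inverse-variance weighted meta-analysis named at the start of this section combines per-cohort effect estimates with weights equal to the inverse of their squared standard errors. The sketch below shows that calculation for a single variant; it is not METAL itself, and the function name and example values are hypothetical.

```python
import math


def ivw_fixed_effect(betas, standard_errors):
    """Combine per-cohort log-odds estimates for one variant.

    Returns the pooled beta, its standard error, the z-score, and a
    two-sided p value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail
    return beta, se, z, p


# Hypothetical estimates for one variant in two cohorts (illustration only).
beta, se, z, p = ivw_fixed_effect([0.05, 0.03], [0.01, 0.01])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

With equal standard errors the pooled estimate is the simple mean of the two inputs, which is a convenient sanity check on the weighting.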
Genome-wide association study in the African-American MVP subpopulation
We conducted a GWAS of HF in the African-American MVP subpopulation and performed lookups for our novel HF variants as well as the previously described HF variants. The African-American subpopulation in the MVP is composed of 11,399 cases with heart failure and 69,726 controls, of which 94.9% of cases and 85.4% of controls were male, with a mean age of 63.82 (9.92) and 56.39 (12.20) for the cases and controls, respectively (Supplementary Data 4).

Associations of HF GWAS variants with HF risk factors and LV cardiac MRI traits
For genetic variants that passed the GWAS threshold for HF (p < 5 × 10⁻⁸), we determined genetic associations for 15 HF risk factors and nine LV cardiac MRI traits derived from available GWAS. Data on HF risk factors were obtained from European-descent GWAS studies: BMI35, smoking36, alcohol intake frequency37, AF38, diastolic and systolic BP39, T2DM40, CAD41, LDL-C42, HDL-C42, estimated glomerular filtration rate (eGFR)29, chronic obstructive airways disease (COPD)36, troponin I cardiac muscle, N-terminal proBNP (NT-proBNP), and interleukin-6 (IL-6). For LV cardiac MRI traits, we determined genetic associations from two separate publications: seven LV cardiac MRI measurements in 36,041 participants of the UK Biobank from ref. 43, and LV mass and LV mass to end-diastolic volume ratio from cardiac MRI in 42,157 UK Biobank participants from Aung et al. (unpublished) using automated CMR analysis techniques and LV GWAS techniques44,45. We used p < 1 × 10⁻⁴ (0.01/number of traits secondary to HF tested in the manuscript) to account for multiple testing. For associations that passed our p value threshold, we evaluated whether the directionality of HF risk factor associations was concordant with findings on HF; for example, for a variant that showed an increased risk of HF, we expect a positive association with a deleterious risk factor.
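The threshold and the concordance rule described above can be expressed as two small checks. This is a sketch under stated assumptions: the function names are illustrative, the `trait_is_deleterious` flag is a hypothetical way to encode whether higher values of a trait are expected to raise HF risk, and the count of 100 tests is inferred from 0.01 / (1 × 10⁻⁴) rather than stated in the text.

```python
def passes_threshold(p_value: float, n_tests: int, alpha: float = 0.01) -> bool:
    """Multiple-testing rule: p must fall below alpha / number of tests."""
    return p_value < alpha / n_tests


def is_concordant(beta_hf: float, beta_trait: float,
                  trait_is_deleterious: bool = True) -> bool:
    """True when the variant's effect on the trait points the same way as
    its effect on HF risk, given the expected direction of the trait."""
    if beta_hf == 0 or beta_trait == 0:
        return False                      # no direction to compare
    same_sign = (beta_hf > 0) == (beta_trait > 0)
    return same_sign if trait_is_deleterious else not same_sign


# Hypothetical effect sizes for an HF risk-increasing allele.
N_TESTS = 100   # assumption: 0.01 / 100 reproduces the 1e-4 threshold
print(passes_threshold(5e-5, N_TESTS))                            # deleterious trait, raised
print(is_concordant(0.05, 0.10))                                  # concordant
print(is_concordant(0.05, -0.10, trait_is_deleterious=False))     # protective trait, lowered
```

Which traits count as deleterious is a modelling choice; the Discussion notes that this assumption may not hold for diastolic BP and HDL-C.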
Mendelian randomization on 1557 proteins and HF
Selection of proposed pQTL instruments. We obtained pQTLs from a genome-proteome-wide association study in the Fenland study of 10,708 participants of European descent12 (retrieved from www.omicscience.org). The genome-proteome-wide association study was conducted using 10.2 million genetic variants and plasma abundances of 4775 distinct protein targets (proteins targeted by at least one aptamer) measured using the SOMAscan V4 assay12. Significant genetic variant pQTLs were defined as passing a Bonferroni p value threshold of p < 1.004 × 10⁻¹¹. Approximate conditional analysis was performed to detect secondary signals for each genomic region identified by distance-based clumping of association statistics12. To diminish the likelihood of horizontal pleiotropy, we restricted proposed instrumental variables to cis-pQTLs (lead and secondary signals) using a p value threshold of p < 5 × 10⁻⁸ in marginal statistics, where cis is defined as any variant within a +/− 1 Mb region of the protein-encoding gene. A total of 2900 cis-pQTLs across 1557 genes (mean = 1.9, min = 1, max = 14) covering an equal number of proteins from the Fenland study were used as proposed instrumental variables for conducting two-sample MR of proteomics against HF.

Mendelian randomization and colocalization
We performed two-sample MR using the TwoSampleMR package in R (https://mrcieu.github.io/TwoSampleMR/)46. The Wald ratio was used for instruments with one variant and the inverse-variance weighted MR method was used for instruments with two or more variants.
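The two estimators named above can be sketched in a few lines. The analysis itself used the TwoSampleMR package in R; the Python below is only an illustration of the standard formulas (Wald ratio with a first-order standard error, and fixed-effect IVW as a weighted regression through the origin), it assumes uncorrelated variants, and all names and input values are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant MR estimate: outcome effect divided by exposure effect.
    The standard error ignores uncertainty in the exposure effect."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_fixed(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate across several variants: regression of
    outcome effects on exposure effects through the origin, weighted by
    1 / se_outcome**2."""
    num = sum(bx * by / s ** 2
              for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2
              for bx, s in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical pQTL (exposure) and HF (outcome) effects, illustration only.
est1, se1 = wald_ratio(0.5, 0.05, 0.01)
est2, se2 = ivw_fixed([0.5, 0.4], [0.05, 0.04], [0.01, 0.01])
print(f"Wald: {est1:.3f} (SE {se1:.3f}), OR {math.exp(est1):.3f}")
print(f"IVW:  {est2:.3f} (SE {se2:.4f}), OR {math.exp(est2):.3f}")
```

Exponentiating the estimate gives the odds ratio per unit increase in protein level, which is the scale used in the results table.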
We tested the heterogeneity across variant-level MR estimates using Cochran’s Q method (mr_heterogeneity function in the TwoSampleMR package) and plotted the effects of the variants on the proteins against the effects of the variants on HF to validate our instruments when more than one variant was included. We defined significant MR results using a false discovery rate (FDR) of 0.05 calculated by the Benjamini–Hochberg method (corresponding p value = 5 × 10⁻⁴). We used the MR-Egger intercept test to detect potential directional pleiotropy, and report the Egger intercept and corresponding standard error and p value for genes with three or more variants, where the MR-Egger intercept can be interpreted as an estimate of the average horizontal pleiotropic effect of the genetic variants47. MR assumes the SNP influences the outcome only through the exposure. To help guard against the existence of distinct but correlated causal variants for the exposure and outcome, for results that passed our MR threshold (FDR <0.05), we performed colocalization using the COLOC package48 in R. Colocalization assesses the probability of a shared causal variant (PP.H4) or distinct causal variants (PP.H3) between the HF GWAS and cis-pQTL instruments for the protein of interest. We performed conditional analysis on the pQTL data to identify conditionally distinct pQTL signals and performed colocalization using marginal (unadjusted) pQTL results as well as results conditional on each of the instruments used in the MR. Statistically significant MR hits with a posterior probability of a shared causal variant (PP.H4) >0.5 for at least one instrumental variant were then investigated further.
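The two-stage filter described above (Benjamini–Hochberg FDR <0.05 on the MR p values, then PP.H4 >0.5 from colocalization) can be illustrated as follows. This is a sketch, not the TwoSampleMR or COLOC code: the function names, protein labels, p values, and PP.H4 values are all hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # walk from largest p to smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min           # enforce monotone adjusted values
    return adjusted


def select_hits(names, pvalues, pp_h4, fdr=0.05, pp_min=0.5):
    """Keep proteins that pass both the FDR and the PP.H4 thresholds."""
    q = benjamini_hochberg(pvalues)
    return [n for n, qv, pp in zip(names, q, pp_h4) if qv < fdr and pp > pp_min]


# Hypothetical MR p values and colocalization posteriors (illustration only).
names = ["PROT_1", "PROT_2", "PROT_3", "PROT_4", "PROT_5"]
mr_p = [0.001, 0.008, 0.039, 0.041, 0.6]
pp_h4 = [0.90, 0.30, 0.70, 0.85, 0.10]
print(benjamini_hochberg(mr_p))
print(select_hits(names, mr_p, pp_h4))
```

In this toy input, PROT_2 passes the FDR step but fails colocalization, while PROT_3 and PROT_4 colocalize but do not survive the FDR correction.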
Colocalization was performed using default priors (prior probability of initial trait association is 1 × 10⁻⁴, prior probability of shared causal variant across two traits is 1 × 10⁻⁵). We also investigated whether the cis-pQTL instruments for genes that passed both MR and colocalization thresholds were also cis-eQTLs (p < 5 × 10⁻⁸). Tissues used were whole blood from eQTLGen and heart atrial, heart ventricle, artery aorta, adipose, liver, kidney tissues, and transformed cultured fibroblasts from GTEx V8.

MR and colocalization for HF risk factors and cardiac MRI traits
For proteins that passed both MR and colocalization thresholds, we conducted two-sample MR analyses of these proteins, using cis-pQTLs from the Fenland study as proposed instrumental variables, against 15 HF risk factors and nine cardiac MRI traits described in the previous section (see Supplementary Material for details on traits and datasets). For the MR results that passed a p value threshold of p < 1 × 10⁻⁴, we conducted colocalization analyses as previously described. We defined significant findings as those that passed thresholds for MR (p < 1 × 10⁻⁴) and colocalization (PP.H4 >0.5).

Assessment of horizontal pleiotropy
For statistical findings that passed the MR and colocalization thresholds, we evaluated the possibility that horizontal pleiotropy may invalidate our findings. The pipeline of analysis is depicted in Supplementary Fig. 5.
Step-1: We determined whether our cis-pQTLs were associated (p < 5 × 10⁻⁸) with the levels of other proteins included in SOMAscan V4 or with gene expression using data from eQTLGen.
Step-2: We queried whether the genes (including genes that encode SOMAscan proteins) identified in Step-1 were within 1 MB of the risk loci for HF identified by GWAS conducted to date.
Step-3: We conducted a two-sample MR to identify whether the secondary genes/proteins (identified in Step-1) were associated with HF, using a Bonferroni-corrected p value (0.05/number of unique genes/proteins identified in Step-1).
We leveraged as proposed instruments the lead cis-pQTL (p < 5 × 10⁻⁸) from the Fenland study, and if it was not available, we used the lead cis-eQTL (p < 5 × 10⁻⁸) identified from eQTLGen.
Step-4: We then mapped all secondary genes/proteins identified in Step-3 to Reactome/KEGG pathways and assessed whether these pathways were the same as (vertical pleiotropy) or different from (horizontal pleiotropy) the pathway associated with the primary genes identified through MR proteomics for HF. To further investigate the physiological functionalities of our findings retrieved in Step-4, we queried two databases: Enrichr49–51, an interactive gene knowledge discovery database, and the GPS-Prot server52, a platform with aggregated information about protein–protein interactions.

LD score regression
We used LD Score regression53 (LDSC) to estimate genetic correlations between heart failure and 15 cardiovascular traits. We estimated these correlations using European LD scores obtained from the 1000 Genomes Project Phase 3 data for the HapMap2 SNPs. We used MungeSumstats to perform standardization of association statistics54.

Polygenic risk score analysis
A polygenic score for heart failure was calculated from the HF meta-analysis using the PRS-CS package55, which utilizes a Bayesian regression framework to calculate posterior SNP effect sizes under a continuous shrinkage prior. We used the LD reference panel constructed using the 1000 Genomes Project Phase 3 data. We conducted these analyses in Python, using the packages scipy and h5py. The PRS was evaluated in the Vanderbilt University Medical Center (VUMC) BioVU, a biobank that links the de-identified electronic medical record (EMR) system containing phenotypic data to discarded blood samples from routine clinical testing for the extraction of genetic data56. A full description of the BioVU resource has been previously published56.
Participants with heart failure were identified by a modified version of the eMERGE definition for heart failure, which includes the International Classification of Diseases, Tenth Revision (ICD-10) codes, where age was defined as age at heart failure for cases and age at last medical visit for controls. To determine the ability of the PRS to stratify heart failure cases from controls, we used a logistic regression model, adjusting for age, sex, and three principal components of ancestry in the BioVU. We assessed enrichment in the more extreme tail of the PRS distribution by evaluating the odds ratio for individuals in the top PRS decile compared to individuals in the bottom PRS decile. In the top decile of PRS, there were 723 participants with HF and 6788 controls, and in the bottom decile, there were 416 participants with HF and 7096 controls.

Pathway enrichment analysis
We conducted an enrichment analysis to identify biological pathways associated with HF risk loci (established and novel) that passed the GWAS p value thresholds. For each locus, we selected the top variant and then identified cis-eQTLs (within a 1 Mb region) from GTEx V8 in any tissue associated with the top variants and extracted all genes with a p < 1 × 10⁻⁴. We merged all retrieved genes into a gene set that was then used to query for enriched pathways. This set of genes was submitted to an overrepresentation analysis using the pathways described in Gene Ontology, KEGG, and Reactome. Selected pathways were those significantly enriched at an FDR <0.05.

Additionally, we explored the downstream transcriptional consequences associated with the distinct variants identified by our GWAS on HF and those not previously reported. We used the distinct variants and conducted a differential gene expression analysis (using a dominant model) for all transcripts available in GTEx V8 for heart atrial, heart ventricle, artery aorta, adipose, liver, kidney, transformed cultured fibroblasts, and whole-blood tissues.
After fitting models for our variants, we retrieved all genes differentially expressed at a p < 1 × 10⁻⁴ and conducted an enrichment pathway analysis (through an overrepresentation analysis, as described above). Enrichment analyses were performed using the R packages clusterProfiler and enrichplot57.

EpiGraphDB queries
To investigate the current knowledge about the biomedical functions of the hit genes in association with HF, we used the EpiGraphDB database58. We queried the biomedical and epidemiological relationships curated in the database to identify associations between the genes we identified and cardiovascular-related outcomes and risk factors (see Supplementary Methods).

Querying the MGI database
We queried the Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) resource for all candidate genes from our novel GWAS hits list or those suggested as causal from our MR/colocalization approach. MGI uses a standardized nomenclature and controlled vocabularies such as the Mouse Developmental Anatomy Ontology, the Mammalian Phenotype Ontology, and the Gene Ontologies. As MGI extracts and organizes data from the primary literature, we parsed all system abnormalities associated with models on all of the queried genes59. For models that displayed cardiovascular abnormalities, we hand-curated the abnormalities and organized them into three distinct groups associated with (1) congenital heart malformations, (2) myocardial abnormalities, and (3) vascular abnormalities.

Druggability annotations
Proteins encoded by genes identified in the GWAS and MR analyses for HF were annotated with drug tractability information based on information provided by OpenTargets10,60,61 (release 2021-03-08).
The OpenTargets tractability system stratified drug targets into nine mutually exclusive groups (termed “buckets”) based on the drug type and the stage of the drug discovery pipeline. For easier interpretation, we regrouped the original buckets into four mutually exclusive groups, as follows:
Licensed drugs: bucket 1 for antibodies, small molecules, and other modalities.
Drugs in clinical development: buckets 2 and 3 for antibodies, small molecules, and other modalities.
Compounds in the preclinical phase: buckets 4 and 5 for small molecules.
Predicted druggable: buckets 6 to 8 for small molecules plus buckets 4 and 5 for antibodies.
The remaining proteins were considered non-druggable. For genes that were the target of licensed drugs, we checked whether the disease indication was also a risk factor for HF, as this may introduce a bias analogous to confounding by indication in MR.

GBMI replication of novel loci
We conducted a replication of the 18 novel loci in the Global Biobank Meta-analysis Initiative (GBMI) multi-ancestry GWAS on heart failure, which includes 859,141 controls and 60,605 cases from BioBank Japan, BioMe, BioVU, China Kadoorie Biobank, Estonian Biobank, FinnGen, Genes & Health, HUNT, Lifelines, Michigan Genomics Initiative, Partners Biobank, and UCLA Precision Health Biobank, excluding UK Biobank⁶². Heart failure cases were ascertained by ICD code (phecode 428.2). We consider p < 0.05/18 as the level of significance for replication and p < 0.05 as the level of nominal significance.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The MVP GWAS summary statistics used in this study are available through dbGaP under accession code phs001672.v10. The only restriction is that use of the data is limited to health/medical/biomedical purposes, and does not include the study of population origins or ancestry.
Use of the data does include methods development research (e.g., development and testing of software or algorithms) and requesters agree to make the results of studies using the data available to the larger scientific community. The HERMES GWAS summary statistics used in this study are publicly available in the GWAS Catalog under accession code GCST009541. Fenland-SomaLogic protein GWAS data are available at https://omicscience.org/. GTEx project v.8 data are publicly available at https://gtexportal.org/home/. Mouse Genome Informatics (MGI) data are publicly available at http://www.informatics.jax.org/. The GWAS summary statistics for the risk factor analyses used in this study are deposited in the GWAS Catalog (https://www.ebi.ac.uk/gwas/) and the accession codes are as follows: body mass index (GCST006900), alcohol consumption (GCST007325), atrial fibrillation (GCST006414), systolic blood pressure (GCST006624), diastolic blood pressure (GCST006630), type-2 diabetes (GCST006867), coronary artery disease (GCST005194), troponin (GCST005806), NT-proBNP (GCST005806), and IL-6 (GCST90012049). The GWAS summary statistics for smoking and chronic obstructive airways disease used in this study are available at https://gwas.mrcieu.ac.uk under GWAS IDs ukb-b-5779 and ukb-b-13447, respectively, and the GWAS summary statistics for the traits examined in the in silico trials are available at https://gwas.mrcieu.ac.uk using the GWAS IDs listed in the Supplementary Data. The GWAS summary statistics for LDL-cholesterol and HDL-cholesterol are publicly available at http://csg.sph.umich.edu/willer/public/glgc-lipids2021/results/ancestry_specific/. The summary statistics for estimated glomerular filtration rate (eGFR) are deposited at http://www.uni-regensburg.de/medizin/epidemiologie-praeventivmedizin/genetische-epidemiologie/gwas-summary-statistics/index.html. The cardiac MRI datasets provided by Pirruccello et al.
are deposited under Dataset Name “UK Biobank Cardiac MRI LV GWAS” on https://cvd.hugeamp.org/downloads.html. The Open Targets data are deposited in https://platform.opentargets.org/. The EpiGraphDB database used in this study is provided at: https://www.epigraphdb.org/.

Code availability
We used publicly available software for the analyses, and all software used is listed and described in the Methods section of our manuscript. Statistical analyses were conducted in R version 3.6.3. Mendelian randomization analyses were conducted using the TwoSampleMR R package, version 0.5.3 (https://mrcieu.github.io/TwoSampleMR/). Genetic colocalization analyses were conducted using the coloc package in R (https://cran.r-project.org/web/packages/coloc/index.html and https://chr1swallace.github.io/coloc, using default priors). Pathway enrichment analyses were conducted using the clusterProfiler package in R (https://pubmed.ncbi.nlm.nih.gov/22455463/) and the enrichplot R package. LD Score regression was conducted using LDSC (https://github.com/bulik/ldsc), and polygenic risk scores were calculated using the PRS-cs package v1.0.0 (https://github.com/getian107/PRScs). Meta-analyses of GWAS summary statistics were prepared using publicly available software, including METAL (https://genome.sph.umich.edu/wiki/METAL_Documentation), version release 2020-05-05. The software used to annotate our results is described in the Methods section of the manuscript.

References
1. Roth, G. A. et al. Global Burden of cardiovascular diseases and risk factors, 1990-2019: update from the GBD 2019 Study. J. Am. Coll. Cardiol. 76, 2982–3021 (2020).
2. Roger, V. L. Epidemiology of heart failure: a contemporary perspective. Circ. Res. 128, 1421–1434 (2021).
3. Blood Pressure Lowering Treatment Trialists’ Collaboration.
Pharmacological blood pressure lowering for primary and secondary prevention of cardiovascular disease across different levels of blood pressure: an individual participant-level data meta-analysis. Lancet 397, 1625–1636 (2021).
4. Nissen, S. E. et al. Statin therapy, LDL cholesterol, C-reactive protein, and coronary artery disease. N. Engl. J. Med. 352, 29–38 (2005).
5. Smith, G. D. & Ebrahim, S. Mendelian randomization: prospects, potentials, and limitations. Int. J. Epidemiol. 33, 30–42 (2004).
6. Levin, M. G. et al. Genome-wide association and multi-trait analyses characterize the common genetic architecture of heart failure. Nat. Commun. 13, 6914 (2022).
7. Joseph, J. et al. Genetic architecture of heart failure with preserved versus reduced ejection fraction. Nat. Commun. 13, 7753 (2022).
8. Shah, S. et al. Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure. Nat. Commun. 11, 163 (2020).
9. Williams, S. A. et al. Plasma protein patterns as comprehensive indicators of health. Nat. Med. 25, 1851–1857 (2019).
10. Ochoa, D. et al. Open Targets Platform: supporting systematic drug-target identification and prioritisation. Nucleic Acids Res. 49, D1302–D1310 (2021).
11. Santos, R. et al. A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov. 16, 19–34 (2017).
12. Pietzner, M. et al. Mapping the proteo-genomic convergence of human diseases. Science 374, eabj1541 (2021).
13. Weeks, E. M. et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Preprint at bioRxiv https://doi.org/10.1101/2020.09.08.20190561 (2020).
14. Karczewski, K. J. et al. Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genom. 2, 100168 (2022).
15. Wu, K.-H. H. et al. Polygenic risk score from a multi-ancestry GWAS uncovers susceptibility of heart failure. Preprint at bioRxiv https://doi.org/10.1101/2021.12.06.21267389 (2021).
16. Verweij, N. et al. The genetic makeup of the electrocardiogram. Cell Syst. 11, 229–238.e5 (2020).
17. Karlsson, T. et al. Contribution of genetics to visceral adiposity and its relation to cardiovascular and metabolic disease. Nat. Med.
25, 1390–1395 (2019).
18. Hoffmann, T. J. et al. A large multiethnic genome-wide association study of adult body mass index identifies novel loci. Genetics 210, 499–515 (2018).
19. Pulit, S. L. et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Hum. Mol. Genet. 28, 166–174 (2019).
20. Kehat, I. et al. Extracellular signal-regulated kinases 1 and 2 regulate the balance between eccentric and concentric cardiac growth. Circ. Res. 108, 176–183 (2011).
21. Fielitz, J. et al. Requirement of protein kinase D1 for pathological cardiac remodeling. Proc. Natl. Acad. Sci. USA 105, 3059–3063 (2008).
22. Swaminathan, P. D., Purohit, A., Hund, T. J. & Anderson, M. E. Calmodulin-dependent protein kinase II: linking heart failure and arrhythmias. Circ. Res. 110, 1661–1677 (2012).
23. Beauverger, P. et al. Reversion of cardiac dysfunction by a novel orally available calcium/calmodulin-dependent protein kinase II inhibitor, RA306, in a genetic model of dilated cardiomyopathy. Cardiovasc. Res. 116, 329–338 (2020).
24. Witztum, J. L. et al. Volanesorsen and triglyceride levels in familial chylomicronemia syndrome. N. Engl. J. Med. 381, 531–542 (2019).
25. Esan, O. & Wierzbicki, A. S. Volanesorsen in the treatment of familial chylomicronemia syndrome or hypertriglyceridaemia: Design, development and place in therapy. Drug Des. Devel. Ther. 14, 2623–2636 (2020).
26. Chong, M. et al. Novel drug targets for ischemic stroke identified through Mendelian randomization analysis of the blood proteome. Circulation 140, 819–830 (2019).
27. Jain, M. et al. A novel role for tumor necrosis factor-like weak inducer of apoptosis (TWEAK) in the development of cardiac dysfunction and failure. Circulation 119, 2058–2068 (2009).
28. Gallagher, M. D. & Chen-Plotkin, A. S. The post-GWAS era: from association to function. Am. J. Hum. Genet. 102, 717–730 (2018).
29. Stanzick, K. J. et al.
Discovery and prioritization of variants and genes for kidney function in >1.2 million individuals. Nat. Commun. 12, 4350 (2021).
30. Votava, J. A. & Parks, B. W. Cross-species data integration to prioritize causal genes in lipid metabolism. Curr. Opin. Lipidol. 32, 141–146 (2021).
31. Uijl, A. et al. Risk factors for incident heart failure in age- and sex-specific strata: a population-based cohort using linked electronic health records. Eur. J. Heart Fail. 21, 1197–1206 (2019).
32. Emerging Risk Factors Collaboration. et al. Major lipids, apolipoproteins, and risk of vascular disease. JAMA 302, 1993–2000 (2009).
33. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
34. Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
35. Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).
36. Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife 7, e34408 (2018).
37. Karlsson Linnér, R. et al. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat. Genet. 51, 245–257 (2019).
38. Nielsen, J. B. et al. Biobank-driven genomic discovery yields new insight into atrial fibrillation biology. Nat. Genet. 50, 1234–1239 (2018).
39. Evangelou, E. et al. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat. Genet. 50, 1412–1425 (2018).
40. Xue, A. et al. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nat. Commun. 9, 2941 (2018).
41. van der Harst, P. & Verweij, N.
Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ. Res. 122, 433–443 (2018).
42. Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).
43. Pirruccello, J. P. et al. Analysis of cardiac magnetic resonance imaging in 36,000 individuals yields genetic insights into dilated cardiomyopathy. Nat. Commun. 11, 2254 (2020).
44. Bai, W. et al. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. J. Cardiovasc. Magn. Reson. 20, 65 (2018).
45. Aung, N. et al. Genome-wide analysis of left ventricular image-derived phenotypes identifies fourteen loci associated with cardiac morphogenesis and heart failure development. Circulation 140, 1318–1330 (2019).
46. Hemani, G., Tilling, K. & Davey Smith, G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet. 13, e1007081 (2017).
47. Burgess, S. & Thompson, S. G. Interpreting findings from Mendelian randomization using the MR-Egger method. Eur. J. Epidemiol. 32, 377–389 (2017).
48. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
49. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 14, 128 (2013).
50. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
51. Xie, Z. et al. Gene set knowledge discovery with Enrichr. Curr. Protoc. 1, e90 (2021).
52. Fahey, M. E. et al.
GPS-Prot: a web-based visualization platform for integrating host-pathogen interaction data. BMC Bioinform. 12, 298 (2011).
53. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
54. Murphy, A. E., Schilder, B. M. & Skene, N. G. MungeSumstats: a bioconductor package for the standardisation and quality control of many GWAS summary statistics. Bioinformatics 37, 4593–4596 (2021).
55. Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).
56. Roden, D. M. et al. Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clin. Pharmacol. Ther. 84, 362–369 (2008).
57. Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation 2, 100141 (2021).
58. Liu, Y. et al. EpiGraphDB: a database and data mining platform for health data science. Bioinformatics 37, 1304–1311 (2021).
59. Shaw, D. R. Searching the Mouse Genome Informatics (MGI) resources for information on mouse biology from genotype to phenotype. Curr. Protoc. Bioinformatics 56, 1.7.1–1.7.16 (2016).
60. Brown, K. K. et al. Approaches to target tractability assessment – a practical perspective. Medchemcomm 9, 606–613 (2018).
61. Schneider, M. et al. The PROTACtable genome. Nat. Rev. Drug Discov. 20, 789–797 (2021).
62. Zhou, W. et al. Global Biobank Meta-analysis Initiative: powering genetic discovery across human disease. Cell Genom. 2, 100192 (2022).

Acknowledgements
We are grateful to all the MVP investigators; a list of MVP investigators can be found in Supplementary Information. This research is supported by funding from the Department of Veterans Affairs Office of Research and Development, Million Veteran Program Grant I01 CX001737 (PI: Phillips), and I01-BX004821 (PI: Wilson/Cho).
This publication does not represent the views of the Department of Veterans Affairs or the United States Government. We also acknowledge the VA Merit Grant I01-CX001025 (PI: Wilson/Cho). The Fenland study was approved by the National Health Service (NHS) Health Research Authority Research Ethics Committee (NRES Committee - East of England Cambridge Central, ref. 04/Q0108/19), and all participants provided written informed consent. We are grateful to all Fenland volunteers and to the General Practitioners and practice staff for assistance with recruitment. We thank the Fenland Study Investigators, Fenland Study Co-ordination team, and the Epidemiology Field, Data and Laboratory teams. The Fenland Study (10.22025/2017.10.101.00001) is funded by the Medical Research Council (MC_UU_12015/1). We further acknowledge support for genomics from the Medical Research Council (MC_PC_13046). Proteomic measurements were supported and governed by a collaboration agreement between the University of Cambridge and SomaLogic. P.B.M. and S.E.P. acknowledge the support of the National Institute for Health and Care Research Barts Biomedical Research Centre (NIHR203330); a delivery partnership of Barts Health NHS Trust, Queen Mary University of London, St George’s University Hospitals NHS Foundation Trust and St George’s University of London. N.A. acknowledges support from the NIHR Integrated Academic Training program which supports his Academic Clinical Lectureship post. C.G. has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 754490 (MINDED project). L.S.P. is supported in part by VA awards CSP #2008, I01 CX001899, I01 CX001737, and I01 BX005831; NIH awards R01 DK127083, R21 AI156161, UL1 TR002378, and U18DP006711; and a Cystic Fibrosis Foundation award PHILLI12A0.
The sponsors had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript. L.S.P. is also supported in part by the Veterans Health Administration (VA). This work is not intended to reflect the official opinion of the VA or the US government. J.P.C. moved to work with Novartis Institute for Biomedical Research during the submission of this project.

Author contributions
J.P.C. conceived the study design, oversaw all analyses and interpretations, and wrote the manuscript. J.P.C., J.J., Y.V.S., and C.La. conceived of the project. D.R., G.M.P, A.C.P., H.D., C.G., and B.R.F. performed the formal analyses and visualizations, and wrote the manuscript. E.W., N.A., M.P., and Q.H. contributed data. E.H.F.-E. and Q.S.W. contributed data. E.H.F.-E. performed analysis. N.M.K. contributed to project administration. J.W. edited the manuscript. L.G., D.C.P., A.P.B., C.Li., K.A., Z.W., B.C., J.E.H., P.W.F.W., L.S.P., P.B.M., S.E.P., K.C., A.R.L., M.P.M., and J.M.G. participated in the contribution of data or analysis tools. All authors critically reviewed the manuscript.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-023-39253-3.

Correspondence and requests for materials should be addressed to Danielle Rasooly or Jacob Joseph.

Peer review information Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Reprints and permissions information is available at http://www.nature.com/reprints

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2023
Article https://doi.org/10.1038/s41467-023-39253-3 Nature Communications | (2023) 14:3826

¹Division of Aging, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis St., Boston, MA 02130, USA.
²Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, 150 S. Huntington Ave, Boston, MA 02130, USA.
³Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave Crosstown Centre, Boston, MA 02118, USA.
⁴Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo, Av Dr Eneas de Carvalho Aguiar 54, São Paulo 5403000, Brazil.
⁵Genetics Department, Harvard Medical School, Harvard University, 77 Avenue Louis Pasteur, Boston, MA 02115, USA.
⁶Broad Institute of MIT and Harvard, 415 Main St., Cambridge, MA 02142, USA.
⁷Health Data Science Centre, Human Technopole, V.le Rita Levi-Montalcini, 1, Milan 20157, Italy.
⁸Central RNA Lab, Non-coding RNAs and RNA-based Therapeutics, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy.
⁹MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Addenbrookes Hospital, IMS, Box 285, Cambridge CB2 0QQ, UK.
¹⁰William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK.
¹¹Barts Heart Centre, St Bartholomew’s Hospital, Barts Health NHS Trust, West Smithfield, London, UK.
¹²Computational Medicine, Berlin Institute of Health (BIH) at Charité – Universitätsmedizin Berlin, Kapelle Ufer 2, Berlin 10117, Germany.
¹³Precision Healthcare University Research Institute, Queen Mary University of London, London, UK.
¹⁴Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA.
¹⁵Vanderbilt University Med. Ctr., Departments of Medicine (Cardiology), Biomedical Informatics, and Pharmacology, Nashville, TN, USA.
¹⁶BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Worts Causeway, Cambridge CB1 8RN, UK.
¹⁷Department of Chemical Biology, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK.
¹⁸Department of Epidemiology, Emory University Rollins School of Public Health, 1518 Clifton Rd NE, Atlanta, GA 30322, USA.
¹⁹Atlanta VA Health Care System, 1670 Clairmont Road, Decatur, GA 30033, USA.
²⁰Massachusetts General Hospital, Boston, MA 02114, USA.
²¹Division of Cardiology, Department of Medicine, Emory University School of Medicine, 1639 Pierce Dr NE, Atlanta, GA 30322, USA.
²²Division of Endocrinology, Emory University, 101 Woodruff Circle, WMRB 1027, Atlanta, GA 30322, USA.
²³MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom.
²⁴William Harvey Research Institute, Barts and The London Faculty of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK.
²⁵National Institute for Health Research, Barts Biomedical Research Centre, Queen Mary University of London, London, UK.
²⁶William Harvey Research Institute, NIHR Barts Biomedical Research Centre, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK.
²⁷Department of Biomedical Informatics, Emory University School of Medicine, 1639 Pierce Dr NE, Atlanta, GA 30332, USA.
²⁸Cardiology Section, VA Providence Healthcare System, 830 Chalkstone Avenue, Providence, RI 02908, USA.
²⁹Department of Medicine, Warren Alpert Medical School of Brown University, 222 Richmond Street, Providence, RI 02903, USA.
³⁰These authors contributed equally: Danielle Rasooly, Gina M. Peloso.
³¹These authors jointly supervised this work: Claudia Langenberg, Yan V. Sun, Jacob Joseph, Juan P. Casas.
e-mail: drasooly@bwh.harvard.edu; jacob.joseph@va.gov

VA Million Veteran Program
Jennifer E. Huffman², Peter W. F. Wilson¹⁹,²¹, Lawrence S. Phillips¹⁹,²², Kelly Cho¹,², John Michael Gaziano¹,², Yan V. Sun¹⁸,¹⁹,²⁷,³¹, Jacob Joseph²⁸,²⁹,³¹ & Juan P. Casas¹,²,³¹
A full list of members and their affiliations appears in the Supplementary Information.