Abstract

bioadv

Bioinformatics Advances

2635-0041

Oxford University Press

10.1093/bioadv/vbad185

vbad185

Original Article

Systems biology

AcademicSubjects/SCI01060

StabJGL: a stability approach to sparsity and similarity selection in multiple-network reconstruction

https://orcid.org/0000-0003-2701-5686

Lingjærde

Camilla

MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom camilla.lingjaerde@mrc-bsu.cam.ac.uk Richardson

Sylvia

MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom

Lengauer

Thomas

Associate Editor

Corresponding author. MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge CB2 0SR, United Kingdom. E-mail: camilla.lingjaerde@mrc-bsu.cam.ac.uk

2023

05 01 2023

19 12 2023

3 1

vbad185

vbad185_Supplementary_Data

28 06 2023 23 11 2023 11 12 2023 18 12 2023 26 12 2023

2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract Motivation

In recent years, network models have gained prominence for their ability to capture complex associations. In statistical omics, networks can be used to model and study the functional relationships between genes, proteins, and other types of omics data. If a Gaussian graphical model is assumed, a gene association network can be determined from the non-zero entries of the inverse covariance matrix of the data. Due to the high-dimensional nature of such problems, integrative methods that leverage similarities between multiple graphical structures have become increasingly popular. The joint graphical lasso is a powerful tool for this purpose, however, the current AIC-based selection criterion used to tune the network sparsities and similarities leads to poor performance in high-dimensional settings.

Results

We propose stabJGL, which equips the joint graphical lasso with a stable and well-performing penalty parameter selection approach that combines the notion of model stability with likelihood-based similarity selection. The resulting method makes the powerful joint graphical lasso available for use in omics settings, and outperforms the standard joint graphical lasso, as well as state-of-the-art joint methods, in terms of all performance measures we consider. Applying stabJGL to proteomic data from a pan-cancer study, we demonstrate the potential for novel discoveries the method brings.

Availability and implementation

A user-friendly R package for stabJGL with tutorials is available on Github https://github.com/Camiling/stabJGL.

UK Medical Research Council programme

MRC MC UU 00002/10

1 Introduction

Network models have in recent years gained great popularity in many areas. In statistical omics, networks can be used to decode aspects of unknown structures, and hence study the relationships between genes, proteins, and other types of omics data. In health data sciences, rich datasets are more and more frequently encountered, enabling the development of models integrating a variety of biological resources. In the high-dimensional setting commonly found in omics, sharing information between independent observations from data sources with shared structures—which could be different tissues, conditions or patient subgroups—can give a valuable increase in statistical power while elucidating shared biological function. A key question is how to combine the different data sources into a single model.

If a Gaussian graphical model is assumed, a conditional (in)dependence network can be estimated by determining the non-zero entries of the inverse covariance (precision) matrix of the data. With its good performance in numerical studies, the “graphical lasso” (Glasso) of Friedman et al. (2008) is a state-of-the-art method for precision matrix estimation in the setting of Gaussian graphical models. The method combines L1 regularization with maximum likelihood estimation. Other notable methods include the neighborhood selection approach of Meinshausen and Bühlmann (2006) and the graphical SCAD (Fan et al. 2009). Notable Bayesian methods include the Bayesian Glasso (Wang 2012), Bayesian spike-and-slab approaches (Wang 2015), and the graphical horseshoe (Li et al. 2019a).

If multiple related datasets available, there are several ways to leverage common network structures. If focusing on one data type’s network structure, data from other types can enhance inference via weighted Glasso methods (Li and Jackson 2015, Lingjærde et al. 2021). However, to compare network structures across datasets, such as patient subgroups, a joint approach that leverages common information while preserving the differences can increase statistical power and provide interpretable insight.

In the area of multiple Gaussian graphical models, existing methods include the group extension of the Glasso to multiple networks of Guo et al. (2011), the Bayesian spike-and-slab joint graphical lasso (JGL) (Li et al. 2019b), and the Markov random field approach of Peterson et al. (2015). The widely used JGL of Danaher et al. (2014) extends the Glasso to a multiple-network setting and provides a powerful tool for inferring graphs with common traits. It employs two different penalty functions—group joint graphical lasso (GGL) and fused joint graphical lasso (FGL)—with the latter recommended for most applications. From this point forward, any mention of the JGL will imply the fused version, unless otherwise specified. The method needs tuning of two regularization parameters for controlling (i) the number of non-zero effects, and (ii) the similarity between networks, respectively. However, the default parameter selection routine based on the AIC (Akaike et al. 1973) often results in severe over-selection in high-dimensional data, potentially impacting performance negatively (Foygel and Drton 2010, Liu et al. 2010).

We propose a stable and well-performing penalty parameter selection method for the JGL, combining the model stability principle of Liu et al. (2010) with likelihood-based selection for high-dimensional data (Foygel and Drton 2010). The resulting method inherits the powerful traits of the JGL while mitigating the risk of severe under- or over-selection of edges in high-dimensional settings. We provide an R package, stabJGL (stable sparsity and similarity selection for the JGL), which implements the method.

The article is organized as follows. In Section 2, we first describe the Gaussian graphical model framework and the penalized log-likelihood problem we aim to solve. We then describe our proposed algorithm. In Section 3, we demonstrate the performance of our proposed method on simulated data and apply it proteomic data from a pan-cancer study of hormonally responsive cancers. Finally, we highlight possible extensions in Section 4.

2 Methods 2.1 Gaussian graphical models

In a gene network model, genes are represented by “nodes” and associations between them are represented by “edges.” Given measurable molecular units each corresponding to one gene (e.g. the encoded protein or mRNA), a network, or graph, can be constructed from their observed values.

Consider n observed values of the multivariate random vector x=(X1,…,Xp)T of node attributes, with each entry corresponding to one of p nodes. If we assume multivariate Gaussian node attributes, with an n×p observation matrix X with i.i.d. rows x1,…,xn∼N(0,Σ), a “partial correlation network” can be determined by estimating the inverse covariance matrix, or precision matrix, Θ=Σ−1. Specifically, the partial correlation between nodes i and j, conditional upon the rest of the graph, is given by ρij|V\{i,j}=−θijθiiθjj, where the θij’s are the entries of Θ and V the set of all node pairs (Lauritzen 1996). The partial correlations coincide with the conditional correlations in the Gaussian setting. Because correlation (resp. partial correlation) equal to zero is equivalent to independence (resp. conditional independence) for Gaussian variables, a conditional independence graph can thus be constructed by determining the non-zero entries of the precision matrix. To ensure invertibility, the precision matrix also required to be positive definite, Θ≻0.

In high-dimensional settings, the sample covariance matrix S=1n−1XTX is rarely of full rank and thus its inverse cannot be estimated directly. It is common to assume sparse network, meaning the number of edges in the edge set E is small relative to the number of potential edges in the graph (i.e. the sparsity measure 2|E|/(p2−p) is small). Penalized methods, such as the Glasso (Friedman et al. 2008) are well established for sparse Gaussian graphical model estimation. In the case of there being multiple (related) datasets available, such as from different tissue types, rather than estimating each network separately much statistical power could be gained by sharing information across networks through a joint approach.

2.2 Penalized log-likelihood problem

Assume a network inference problem with K groups. We let {Θ}=(Θ(1),…,Θ(K)) be the set of their (unknown) precision matrices, and assume that the set of ∑k=1Knk observations are independent. We aim to solve the penalized log-likelihood problem (Danaher et al. 2014) (1){Θ^}=argmax{Θ≻0}{∑k=1Knk[log(det(Θ(k)))−tr(S(k)Θ(k))]−P({Θ})}, where S(k) is the sample covariance matrix of group k and P(⋅) is a penalty function. In (1), det(⋅) denotes the determinant and tr(⋅) denotes the trace. The JGL employs the fused penalty function (2)P({Θ})=λ1∑k=1K∑i≠jabs(θij(k))+λ2∑k<k′||Θ(k)−Θ(k′)||1, where λ1 and λ2 are positive penalty parameters, abs(⋅) denotes the absolute value function and ||⋅||1 denotes the L1 penalty. This penalty applies L1 penalties to each off-diagonal element of the K precision matrices as well as to the differences between corresponding elements of each pair of precision matrices. The parameter λ1 controls the sparsity, and the similarity parameter λ2 controls the degree to which the K precision matrices are forced toward each other, encouraging not only similar network structures but also similar precision matrix entries. This way, the dependency between observed datasets is modeled through their underlying graphical structures. It is important to note that a necessary assumption of the resulting model is that given their respective graphical structures, observations are independent across and within each dataset. The current penalty parameter selection approach for λ1 and λ2 is based on the AIC (Danaher et al. 2014), and while suitable for determining network similarities, likelihood-based selection criteria can lead to severe under- or over-selection and thus poor performance in high-dimensional settings (Liu et al. 2010).

2.3 The stabJGL algorithm

To improve the performance of the JGL with the fused penalty for omics applications and other high-dimensional problems, we propose the stabJGL algorithm for stable sparsity and similarity selection in multiple-network reconstruction. StabJGL jointly estimates multiple networks by leveraging their common information and gives a basis for deeper exploration of their differences, as shown in Fig. 1. Below we outline the algorithm, which comprised two steps: (i) selecting the sparsity parameter λ1 in the fused penalty (2) based on the notion of model stability, and (ii) selecting the similarity parameter λ2 based on model likelihood. The full StabJGL algorithm is given in Supplementary Algorithm S1.

Figure 1.

The workflow of stabJGL, where the network structures of different data types or conditions are jointly estimated and can then be compared.

2.3.1 Selecting <inline-formula id="IE30"><mml:math id="IM30" xmlns:mml="http://www.w3.org/1998/Math/MathML" display="inline"><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">λ</mml:mi></mml:mrow><mml:mi mathvariant="bold">1</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula>

We first select λ1 by extending the framework introduced by Liu et al. (2010) in their Stability Approach to regularization Criterion (StARS) to a multiple-network setting. The aim is to select the least amount of penalization that makes graphs sparse as well as reproducible under random subsampling. This is done by drawing many random subsamples from each of the K data types and using them to construct JGL graphs over a range of λ1 values. The smallest parameter value for which a given graph estimation variability measure does not surpass a specified threshold is then selected. We use a measure of edge assignment instability across subsamples to quantify the variability.

Specifically, we consider a grid of λ1 values in a suitable interval, i.e. (0,1] and keep the similarity parameter λ2 fixed to some small value, such as 0.01 in the first instance. For η=1,…,Nsample, we draw a random subsample from each group k’s set of nk observations without replacement, each of size bk<nk. For each value of λ1 to consider, we next construct the corresponding set of JGL graphs {G(k)η(λ1)}k=1K from these K sets of subsamples, using the fused penalty (2).

The following is then done for each value of λ1 we consider. For each group k=1,…,K and all possible node pairs (i,j) we estimate the probability of an edge between the nodes over the Nsample inferred sets of graphs (3)ψ^ij(k)(λ1)=1Nsample∑η=1Nsample1[(i,j)∈G(k)η(λ1)], where 1[⋅] is the indicator function. Using this estimated probability, we find (4)ξ^ij(k)(λ1)=2ψ^ij(k)(λ1)(1−ψ^ij(k)(λ1)), which is an estimate of two times the variance of the Bernoulli indicator of the edge (i,j) in group k. It lies in [0, 0.5] and can be regarded as an estimate of the proportion of times two inferred graphs for group k found with the given λ1 value will disagree on the presence of the edge (i,j). Due to the L1 penalty in (2), the number of inferred edges will decrease as λ1 is increased.

For a given λ1, ξ^ij(k)(λ1) can be regarded as a measure of the variability of the edge (i,j) in group k across subsamples, and the total variability of graph k can be measured by averaging over all edges, yielding the estimate (5)D^(k)(λ1)=1(p2)∑i<jξ^ij(k)(λ1).

For each value of λ1, the total variability of the whole set of graphs found by the JGL is then found by averaging the variability over all K networks (6)D^(λ1)=1K∑k=1KD^(k)(λ1).

For sufficiently large λ1, all edges are excluded from the model and so the variability D^(λ1) will be 0. The variability will in general increase as the penalty λ1 decreases, however, for small enough λ1 the graphs will become so dense that the variability starts to decrease again. As sparse network inference is the aim, we therefore monotonize the variability function by letting D¯(λ1)=supt≥λ1D^(t).

Finally, for a given variability threshold β1, the optimal penalty is chosen to be λ^1=inf{λ1:D¯(λ1)≤β1}. As opposed to λ1, β1 is an interpretable quantity and we propose a default threshold of β1=0.1 as suggested by Liu et al. for the original StARS algorithm, which reflects an acceptance of 10% variability in the edge assignments.

2.3.2 Selecting <inline-formula id="IE66"><mml:math id="IM66" xmlns:mml="http://www.w3.org/1998/Math/MathML" display="inline"><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">λ</mml:mi></mml:mrow><mml:mi mathvariant="bold-italic">2</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula>

After λ1 has been selected, we select λ2 with a multiple-network version of the extended BIC (eBIC or BICγ) of Foygel and Drton (2010). The eBIC is an extension of the Bayesian Information Criterion of Schwarz (1978), where the prior is reformulated to account for high-dimensional graphical settings. We propose an adaptation the eBIC to a multiple-network setting, (7)BICγ(λ1,λ2)=∑k=1K[nktr(S(k)Θ^λ1,λ2(k))−nk log(det(Θ^λ1,λ2(k)))+|Ek| log nk+4|Ek|γ log p], where Θ^λ1,λ2(k) is the estimated precision matrix of network k obtained with the penalty parameters λ1 and λ2, γ∈[0,1] is an additional penalty parameter and |Ek| is the size of the corresponding edge set. A grid of λ2 values is considered, with λ1 fixed to the value selected in the previous step. The value of λ2 that minimizes (7) is selected, and the final graph estimate is obtained by running the JGL with the selected penalty parameters λ1 and λ2 in the fused penalty (2). An investigation into the effect of selecting the two penalty parameters in reverse order is detailed in Supplementary Section S5, where we find that selecting λ1 after λ2 might be desired in settings where one is not concerned about a high false discovery rate.

2.3.3 Implementation details

StabJGL is implemented in R, and available as an R package at https://github.com/Camiling/stabJGL. The subsampling routine is implemented so it can be done in parallel. The JGL fittings are done as in Danaher et al. (2014), using an ADMM (Alternating Direction Method of Multipliers) algorithm (Boyd et al. 2010) for general penalty functions to solve the penalized log-likelihood problem (1) with the fused penalty (2). The stabJGL algorithm itself does not inherently handle missing data, and requires complete data for the precision matrix estimation. In the case of missing data, we recommend using state-of-the-art approaches, such as multiple imputation to defer the issue. By default, 20 subsamples are used and we evaluate 20 values each of λ1∈[0.01,1] and λ2∈[0,0.1]. We use a subsample size of bk=⌊10nk⌋ for group k=1,…,K, which as Liu et al. (2010) show in a single network setting maintains theoretical properties for containing the true graph with high probability as well as high empirical performance. The additional penalty parameter γ in the eBIC for similarity selection is set to 0 by default, corresponding to the standard BIC. The choice of γ is not crucial as we are using it for similarity and not sparsity selection, and thus compare graphs of similar sparsity. We found this value to be suitable in most applications but leave the option to increase the penalization. We employ a default variability threshold of β1=0.1.

3 Results 3.1 Simulated data

We first assess the performance of stabJGL on simulated data. We compare the network reconstruction ability of stabJGL to that of state-of-the-art methods, including the JGL with the fused penalty (FGL) and group penalty (GGL) with penalty parameters selected with the default AIC-based criterion (Danaher et al. 2014). To assess the performance of another selection criterion specifically designed for high-dimensional graph selection, we also consider FGL with penalty parameters tuned by the extended BIC for multiple graphs (7). We further include the Bayesian spike-and-slab JGL (SSJGL) of Li et al. (2019b), as well as the Glasso of Friedman et al. (2008) tuned by StARS (Liu et al. 2010). The latter estimates each network separately. We generate data that closely resembles our omics application of interest, featuring partial correlations between 0.1 and 0.2 in absolute value, while also exhibiting the “scale-free” property—a typical assumption for omics data where the “degree distribution” (i.e. the distribution of the number of edges that are connected to the nodes) adheres to a power-law distribution (Chen and Sharp 2004). We consider a wide range of settings, simulating K∈{2,3,4} networks with p∈{100,200,300,1000} nodes. We manipulate the degree of similarity in their “true” graphical structures to assess the performance of the method over a wide range of scenarios. We then apply different network reconstruction techniques to determine the networks from the data and report the results averaged over N=100 replicates. The parameter specifications for the different methods are given in Supplementary Section S2. We also investigate the effect of the variability threshold β1 in stabJGL on the results in a setting with p=100 nodes and K=2 networks. Finally, to compare the scalability of the respective methods, we consider the time needed to infer networks for various p and K. Further details and code for the simulation study can be found at https://github.com/Camiling/stabJGL_simulations.

Estimation accuracy is assessed with the “precision” (positive predictive value), and the “recall” (sensitivity). The precision gives the fraction of predicted edges that were correct, while the recall is the fraction of edges in the true graph that were identified by the inference. Because the sparsity of estimated networks will vary between methods, the precision–recall trade-off should be taken into consideration. In general, the recall will increase with the number of selected edges while the precision will decrease. Since sparsity selection is a main feature of our proposed method, we do not consider threshold-free comparison metrics, such as the AUC. We therefore put emphasis on the following characteristics in our comparative simulation study; (i) suitable sparsity level selection, (ii) utilization of common information at any level of network similarity, i.e. inference improves with increased network similarity, and (iii) a suitable precision–recall trade-off that overly favors either measure.

3.2 Simulation results

The results for K=2 and K=4 networks with p=100 nodes are summarized in Fig. 2. The full tables of results for these settings and the additional settings with K∈{2,3,4} networks and p∈{100,200,300,1000} nodes are shown in Supplementary Section S2, where details on selected sparsity and penalty parameters are given for all methods. Additional simulations investigating the effect of random noise on the graph reconstruction performance of the different methods are also given in Supplementary Section S6. In Fig. 2, FGL tuned with eBIC did not select any edges in any of the settings and is therefore not shown. In all settings considered, the FGL and GGL with the default AIC-based penalty parameter selection strongly over-select edges. This leads to high recall, but very low precision. Second, they do not appear to sufficiently utilize network similarities; the performance of the two methods, particularly GGL, differs little between completely unrelated and identical networks. Notably, in all cases, the selected value of λ2 is smaller for FGL and GGL tuned by AIC than it is for stabJGL. Consequently, similarity is not sufficiently encouraged even in settings where the networks are identical. The AIC criterion does not seem to provide sufficient penalization to encourage suitable sparsity and similarity. On the other hand, the alternative eBIC criterion gives extremely sparse FGL estimates, selecting an empty graph in most settings, i.e. no edges. Although the extended BIC is developed specifically for graphical model selection, likelihood-based criteria for sparsity selection tend to perform poorly in high-dimensional settings and risk both severe under- and over-selection (Foygel and Drton 2010). This issue is avoided in the stabJGL algorithm as the eBIC only is used to select similarity and not sparsity.

Figure 2.

Performance of the Glasso, FGL, and GGL tuned by AIC, SSJGL, and stabJGL, reconstructing K∈{2,4} graphs with p=100 nodes and various similarity of the true graph structures. The similarity between the graphs is shown as the percentage of edges they have in common. The results are averaged over N=100 replicates and show the precision and recall for the first estimated graph in each setting, reconstructed from n∈{100,150} observations and n∈{150,200,250,300} observations for K=2 and K=4, respectively. Standard deviation bars are shown for all methods. All graphs have true sparsity 0.02.

The Bayesian SSJGL tends to select very few edges, leading to high precision but low recall. Its performance deteriorates drastically as the network differences increase, leading to extremely low recall. This implies a lack of flexibility to adapt to varying network similarity levels, as has previously been observed (Lingjærde et al. 2022). Out of all the joint methods, stabJGL gives the most accurate sparsity estimate. This ensures that we neither get very low precision like FGL and GGL tuned by AIC, nor very low recall like SSJGL and FGL tuned by eBIC. Replacing the mean of the λ1’s over the N=100 replicates by alternative measures, such as the median yields the same reported penalty parameter values and hence results in all settings considered in Fig. 2. The mean and median could be expected to differ slightly if a finer grid of λ1 values were to be considered, but our findings suggest they would yield highly similar results. StabJGL also appears to adapt well to the similarity between networks, with the prediction accuracy increasing with the number of shared edges. As a result, the method either outperforms the Glasso tuned by StARS for highly similar networks or performs comparably to it for unrelated networks. The similar performance for unrelated networks can be explained by the fact that the sparsity controlling penalty parameters of both methods are tuned with a stability-based approach. The results suggest that stabJGL adapts well to the level of similarity and hence can be employed agnostically in settings where there is no prior knowledge about the degree to which the networks have shared structures.

A key question is whether stabJGL can achieve as high precision as the methods that give sparser networks (i.e. SSJGL) by using a lower variability threshold. Similarly, we want to see if stabJGL can achieve as high recall as the methods that infer more edges (i.e. FGL and GGL). To investigate this, we consider the same setting as in Fig. 2 with K=2 networks. Supplementary Figure S4 compares the performance of stabJGL for different values of the variability threshold β1 to the other methods, and we find that by decreasing (resp. increasing) β1 we can obtain at least as high precision (resp. recall) as the other methods at any level of similarity. This illustrates that the method can be adapted to reflect the priorities of the user (i.e. concern for false positives versus false negatives). For most applications, a middle-ground value, such as 0.1 yields a good balance between false positives and false negatives as demonstrated in the simulations.

3.3 Runtime profiling

Figure 3 shows the CPU time used to jointly infer networks for K∈{2,3,4} networks and various numbers of nodes p, with n∈{100,150} observations, for the JGL with the fused penalty (FGL) with penalty parameters tuned with the AIC and stabJGL with the same parameter specifications as in the previously described simulations. Due to an efficient parallelized implementation, stabJGL has an almost identical runtime to FGL when the same number of λ1 and λ2 values are considered. Thus, the increased estimation accuracy of stabJGL does not come at a computational cost. It is important to note that due to the generalized fused lasso problem having a closed-form expression for the updates in the special case of K=2 (Danaher et al. 2014), stabJGL is substantially faster for only two networks than for K>2. As stabJGL uses the fused penalty this comparison is the most relevant, but a runtime comparison of all methods considered in our simulation study can be found in Supplementary Fig. S1. In the Supplementary Material, we also demonstrate that stabJGL can be applied to problems with p≥2500 nodes or K=10 networks within reasonable time (Supplementary Fig. S3).

Figure 3.

CPU time in seconds on a logarithmic scale used to jointly infer networks for K∈{2,3,4} networks and various numbers of nodes p, with n∈{100,150} observations, for FGL tuned with AIC and stabJGL. The computations were performed on a 16-core Intel Xeon CPU, 2.60 GHz.

3.4 Pan-cancer data

We perform a proteomic network analysis of Reverse Phase Protein Array (RPPA) data from The Cancer Genome Atlas (TCGA) across different pan-Cancer tumor types (Cancer Genome Atlas Network and others 2012). In a large proteomic pan-Cancer study of 11 TCGA tumor types, Akbani et al. (2014) identified a major tumor super cluster consisting of hormonally responsive “women’s cancers” [Luminal breast cancer (BRCA), ovarian cystadenocarcinoma (OVCA), and uterine corpus endometrial carcinoma (UCEC)]. Our objective is to map the proteomic network structure of the respective tumor types, so that we can get a better grasp of the common mechanisms at play in the hormonally responsive tumors. We are also interested in highlighting the differences.

We consider RPPA data from Luminal BRCA (n=273), high-grade serous OVCA (n=412), and UCEC (n=404). All data are downloaded from the UCSC Xena Browser (Goldman et al. 2020). The data are measured with p=131 high-quality antibodies that target (phospho)-proteins. To alleviate batch effects, the RPPA data are normalized with replicate-base normalization (Akbani et al. 2014). We use stabJGL to jointly estimate the proteomic networks of the respective tumor types and interpret the results and their implications. We compare the output with that obtained with the FGL of Danaher et al. (2014) with the default penalty parameter tuning with AIC as described in Section 3.1. Further details and code for the analysis are given at https://github.com/Camiling/stabJGL_analysis.

3.5 Pan-cancer analysis results 3.5.1 Estimated proteomic networks

The resulting stabJGL proteomic networks of the three tumor types are shown in Fig. 4, where we observe plenty of common edges as well as network-specific ones. The sparsity as well as the selected penalty parameter values in the resulting stabJGL and FGL networks is shown in Table 1. The tendency as observed in the simulations of FGL tuned by the AIC to over-select edges appears to be consistent with the findings in this context. With more than two-thirds of all potential edges being determined as present by FGL, the results are challenging to interpret and derive meaningful conclusions from. From a biological standpoint, we would not expect a proteomic network to be this saturated in terms of associations due to the expected scale-free property of the degree distribution (Barabasi and Oltvai 2004). While the degree distributions of the sparse stabJGL networks all follow a power-law with many low-degree nodes and fewer high-degree ones (hubs), an expected trait for omics data (Chen and Sharp 2004), the degree distributions of the FGL networks do not. The full degree distributions are shown in Supplementary Fig. S6.

Figure 4.

Proteomic network structure identified by stabJGL for the BRCA, OVCA, and UCEC tumors. The nodes represent proteins, and edges common to all three networks are darker.

Table 1.

Network analysis results for stabJGL and FGL tuned by the AIC, applied to data from BRCA, OVCA, and UCEC tumors.

			Sparsity
	λ 1	λ 2	BRCA	UCEC	OVCA
FGL	0.010	0.000	0.689	0.709	0.679
stabJGL	0.323	0.008	0.049	0.036	0.039

In terms of penalty parameters, we see that just like for the simulated data the AIC selects very small penalty parameters for FGL, resulting in little sparsity and similarity encouragement. Given the findings of Akbani et al. (2014) about the presence of a super cluster consisting of the three hormonally responsive cancer types, it is not unreasonable to expect at least some proteomic network similarity to be encouraged by a joint method. This is achieved by stabJGL, which selects a large enough value of λ2 to encourage similarity. A comparison of the pairwise similarities of the proteomic networks reveals that stabJGL indeed finds the networks of the three tumor types to be more similar than FGL, in accordance with the findings of Akbani et al. (2014) (Supplementary Fig. S5).

3.5.2 Edge validation in STRING

To compare the level of evidence supporting the edges detected by stabJGL and FGL tuned by the AIC in the literature, we conduct edge validation using the STRING database of known and predicted protein–protein interactions (Szklarczyk et al. 2019). To ensure the reliability of the validation process, we only consider the experimentally validated interactions in STRING as evidence, with default confidence score threshold ≥0.4. The number of predicted edges with supporting evidence in the STRING database, as well as the proportion of all predicted edges that have evidence in STRING, is computed for the respective stabJGL and FGL networks and shown in Table 2. While the fused Glasso identifies a larger number of edges with evidence in STRING, the method also includes far more edges than stabJGL, resulting in an overall lower percentage of edges with supporting evidence in the STRING database than stabJGL, for all three tumor types investigated. Complete lists of the protein–protein edges identified by stabJGL for each tumor type are provided as Supplementary Files.

Table 2.

Comparison of evidence for edges in the respective FGL tuned by AIC and stabJGL proteomic networks of BRCA, OVCA, and UCEC tumors, considering experimentally determined protein–protein interactions documented in the STRING database.^a

	Edge evidence %		Edge evidence #
Dataset	FGL (%)	stabJGL (%)	FGL	stabJGL
BRCA	5.4	12.3	408	68
UVEC	5.6	10.0	398	44
OVCA	5.7	12.4	424	42

Both the percentage of predicted edges with evidence and the number of predicted edges with evidence are shown. The highest percentage of edges with evidence is in bold.

3.5.3 Findings consistent with literature

StabJGL successfully identifies protein–protein interactions known from literature. To highlight the findings of the proposed methodology, we only discuss edges and central proteins identified by stabJGL but not FGL. One example is the edge between activated (S345-phosphorylated) Checkpoint kinase 1 (Chk1) and DNA repair protein RAD51 homolog 1 (Rad51) in ovarian and BRCA. The complex between the tumor suppressor BRCA2, which manifests predominantly in ovarian and BRCA, and Rad51, is mediated by the DNA damage checkpoint Chk1 through Rad51 phosphorylation (Bahassi et al. 2008, Nair et al. 2020). It is also reassuring that stabJGL identifies many relevant tumor type-specific proteins as hubs in the relevant tumor type only, such as mammalian target of rapamycin (mTOR), Tuberous Sclerosis Complex 2, and Ribosomal protein S6 in BRCA, all of which are involved or up/downstream of the PI3K/AKT/mTOR pathway known to frequently be deregulated in Luminal BRCA (Miricescu et al. 2020). Lists of the top hubs in the respective stabJGL and FGL networks of the different tumor types, and their node degree, are given in Supplementary Tables S9 and S10.

StabJGL also captures edges that we expect to be present in all three tumor types, such as the known interaction between the transcription factor Forkhead box O3 and 14–3-3-epsilon, which facilitates cancer cell proliferation (Nielsen et al. 2008, Tzivion et al. 2011). This common interaction is documented in the STRING database. Figure 5 shows the network structure identified by stabJGL that is common to all three tumor types. Central proteins in this common network structure include Oncoprotein 18 (Stathmin), which is known to be relevant in all three hormonally responsive cancers due to its role in the regulation of cell growth and motility (Bieche et al. 1998, Belletti and Baldassarre 2011, Trovik et al. 2011).

Figure 5.

The proteomic network structure identified by stabJGL common to all three tumor types (BRCA, UCEC, and OCVA). The node size indicates the degree in the common network structure, with proteins with more edges being represented by larger nodes.

3.5.4 Potential candidate hubs

The recovery of documented links in the protein networks estimated by stabJGL highlights its capability to detect numerous relevant proteins and interactions. The potential for new discoveries is however an important aspect of stabJGL, as suggested by its good performance on simulated data. For example, stabJGL identifies phosphorylated epidermal growth factor receptor (EGFR) as a central hub protein in all three tumor types. While known to be relevant in ovarian cancer (Yang et al. 2013, Zhang et al. 2016), the role of activated EGFR in UCEC and Luminal BRCA and is not yet clarified. Our findings suggest it could be relevant in all three hormonally responsive tumor types. Further, Platelet endothelial cell adhesion molecule (CD31) is found to be a central protein in UCEC only. The protein is important for angiogenesis and has been implicated in other tumor types, such as hemangioma (Bergom et al. 2005). Its prominence in the proteomic UCEC network suggests it may play a crucial role in this tumor type as well. Overall, these results showcase how stabJGL can aid in generating hypotheses by identifying central proteins and associations.

4 Discussion

Suitable sparsity and similarity selection is key for capturing and studying multiple-related biological networks. We have proposed the stabJGL algorithm, which determines the penalty parameters in the fused Glasso for multiple networks based on the principle of model stability. StabJGL demonstrably avoids the under- or over-selection of edges observed in state-of-the-art selection methods based on information criteria and succeeds at leveraging network similarities to a suitable degree. Consequently, the method can be employed in situations where the actual degree of similarity is uncertain, resulting in marked benefits with minimal risks associated with its use. StabJGL offers a fast parallelized implementation, particularly for K=2 networks as a closed-form expression of the updates exists. We successfully apply the method to problems with p>2000 nodes or K>10 networks.

With our novel approach, we can identify both common and distinct mechanisms in the proteomic networks of different types of hormonally responsive women’s cancers. The results obtained with stabJGL are in line with known biology and compliment those of Akbani et al. (2014) by offering additional understanding of the underlying mechanisms in action. By recognizing various proteins as highly critical in the proteomic networks, stabJGL suggests their possible involvement in driving the diseases. The method both identifies proteins that are central in all three hormonally responsive cancers (e.g. phosphorylated EGFR) and proteins of tumor-specific relevance (e.g. CD31 in UCEC).

The proposed approach can be applied to any type of omics data that can be transformed to adhere to the normality assumption. It is however important to note that if the datasets that are to be jointly analyzed are different types of omics data, a one-to-one mapping between the nodes in each dataset would be needed, e.g. having protein and mRNA expression data corresponding to the “same” encoding genes.

In the setting of the data sources being from the same set of patients, the joint approach is not applicable due to the key assumption that all observations are independent. One such example could be when the data sources are mRNA and protein expression from the same patients. In that case, alternative approaches developed for multi-omic data integration might be more suitable, see, e.g. Bonnet et al. (2015) and Li and Jackson (2015). In the case of multiple shared axes, an extension of our approach could be to concatenate the different omics data types and determine the global network. If independent observations from multiple sources, such as clinical groups or conditions is available as well, stabJGL could then be used to determine links across and within omics data types simultaneously. Such an approach might warrant the introduction of different penalties in (1), both within and across the different omics data types.

While stabJGL is highly competitive in terms of scalability relative to the existing methodology, all algorithms for estimating Gaussian graphical models have inevitable matrix inversion steps. This is computationally demanding for large p, and, at present, joint graphical methods based on Gaussian graphical models are realistically not feasible for the analysis of extremely high-dimensional data with tens of thousands of nodes. It can be argued that even if such an analysis was possible, it is not sensible to attempt to determine the conditional independence structure for problems of this size. For example, if we were to consider the expression of p=20 000 genes and K=3 groups, we would have the problem of determining over half a billion edges. With only a couple hundred samples in most studies and hence little statistical power, low estimation accuracy and many false discoveries would be expected. Meaningful interpretation of the results would also be a challenge. In settings where p is extremely large, an option could instead be to use a fast marginal correlation-based network approach, such as the weighted gene co-expression network analysis method of Langfelder and Horvath (2008), to identify subnetworks that are relevant for a more refined conditional independence network analysis.

Future extensions of the method can include alternative measures of variability, such as the entropy [see, e.g. Lartigue et al. (2020)]. Further, while the method is formulated specifically for the JGL with the fused penalty, it can in principle be used for any joint network approach requiring the tuning of sparsity- and similarity controlling parameters. Another relevant extension could be the introduction of network-specific sparsity penalty parameters λ1,k and network pair-specific similarity penalty parameters λ2,kl for graphs 1≤k<l≤K, which would provide additional flexibility in terms of differing levels of sparsity and similarity across networks. The introduction of these parameters would however invoke a sharp increase in computational complexity, proportional to K2.

To conclude, stabJGL provides a reliable approach to joint network inference of omics data. The output can provide a better understanding of both common and data type-specific mechanisms, which can be used for hypothesis generation regarding potential therapeutic targets.

Author contributions

C.L. and S.R. developed the methodology. C.L. performed the data simulation and analysis, developed the software, and drafted the manuscript. Both authors have contributed to the manuscript and read and approved the final version.

Supplementary data

Supplementary data are available at Bioinformatics Advances online.

Conflict of interest

None declared.

Funding

This work was supported by the UK Medical Research Council programme [MRC MC UU 00002/10 to C.L. and S.R.]; and Aker Scholarship (C.L.).

Data availability

The TCGA data sets analysed in this article is publicly available in the University of California Santa Cruz Xena Browser repository (http://xena.ucsc.edu), from where we downloaded RPPA data for BRCA (dataset ID: TCGA.BRCA.sampleMap/RPPA_RBN), UCEC (dataset ID: TCGA.UCEC.sampleMap/RPPA_RBN) and OVCA (dataset ID: TCGA.OV.sampleMap/RPPA_RBN).

References

Akaike

, Petrov

, Csaki

Information theory and an extension of the maximum likelihood principle. In: Second International Symposium on Information Theory, New York, NY. New York: Springer, 1973, 267–81.

Akbani

, Ng

PKS

, Werner

et al A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat Commun2014;5:3887.

Bahassi

, Ovesen

, Riesenberg

et al The checkpoint kinases Chk1 and Chk2 regulate the functional associations between hBRCA2 and Rad51 in response to DNA damage. Oncogene2008;27:3977–85.

Barabasi

A-L

, Oltvai

ZN.

Network biology: understanding the cell’s functional organization. Nat Rev Genet2004;5:101–13.

Belletti

, Baldassarre

Stathmin: a protein with many tasks. New biomarker and potential target in cancer. Expert Opin Ther Targets2011;15:1249–66.

Bergom

, Gao

, Newman

PJ.

Mechanisms of PECAM-1-mediated cytoprotection and implications for cancer cell survival. Leuk Lymphoma2005;46:1409–21.

Bieche

, Lachkar

, Becette

et al Overexpression of the stathmin gene in a subset of human breast cancer. Br J Cancer1998;78:701–9.

Bonnet

, Calzone

, Michoel

Integrative multi-omics module network inference with lemon-tree. PLoS Comput Biol2015;11:e1003983.

Boyd

, Parikh

, Chu

et al Distributed optimization and statistical learning via the alternating direction method of multipliers. FNT Mach Learn2010;3:1–122.

Cancer Genome Atlas Network and others. Comprehensive molecular portraits of human breast tumours. Nature2012;490:61–70.

Chen

, Sharp

BM.

Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinformatics2004;5:147.

Danaher

, Wang

, Witten

DM.

The joint graphical lasso for inverse covariance estimation across multiple classes. J R Stat Soc Series B Stat Methodol2014;76:373–97.

Fan

, Feng

, Wu

Network exploration via the adaptive LASSO and SCAD penalties. Ann Appl Stat2009;3:521.

Foygel

, Drton

Extended Bayesian information criteria for Gaussian graphical models. In Advances in Neural Information Processing Systems, Vancouver, BC, 604–12. San Diego, CA: NeurIPS, 2010.

Friedman

, Hastie

, Tibshirani

Sparse inverse covariance estimation with the graphical lasso. Biostatistics2008;9:432–41.

Goldman

, Craft

, Hastie

et al Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol2020;38:675–8. doi: 10.1038/s41587-020-0546-8

Guo

, Levina

, Michailidis

et al Joint estimation of multiple graphical models. Biometrika2011;98:1–15.

Langfelder

, Horvath

WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics2008;1:559.

Lartigue

, Bottani

, Baron

et al Gaussian graphical model exploration and selection in high dimension low sample size setting. IEEE Trans Pattern Anal Mach Intell2020;43:3196–213.

Lauritzen

SL.

Graphical Models. Vol. 17. Oxford, UK: Clarendon Press, 1996.

, Craig

, Bhadra

The graphical horseshoe estimator for inverse covariance matrices. J Comput Graph Stat2019a;28:747–57.

, Jackson

SA.

Gene network reconstruction by integration of prior biological knowledge. G3 (Bethesda)2015;5:1075–9. doi: 10.1534/g3.115.018127.

, Mccormick

, Clark

Bayesian joint spike-and-slab graphical lasso. In: International Conference on Machine Learning, Long Beach, CA. 3877–85. PMLR, 2019b.

Lingjærde

, Fairfax

, Richardson

et al Scalable multiple network inference with the joint graphical horseshoe. Ann Appl Stat2023; in press.

Lingjærde

, Lien

, Borgan

et al Tailored graphical lasso for data integration in gene network reconstruction. BMC Bioinformatics2021;22:498.

Liu

, Roeder

, Wasserman

Stability approach to regularization selection (StARS) for high dimensional graphical models. Adv Neural Inf Process Syst2010;24:1432–40.

Meinshausen

, Bühlmann

High-dimensional graphs and variable selection with the lasso. Ann Statist2006;34:1436–62.

Miricescu

, Totan

, Stanescu-Spinu

I-I

et al PI3K/AKT/mTOR signaling pathway in breast cancer: from molecular landscape to clinical aspects. Int J Mol Sci2020;22:173.

Nair

, Huang

T-T

, Murai

et al Resistance to the CHK1 inhibitor prexasertib involves functionally distinct CHK1 activities in BRCA wild-type ovarian cancer. Oncogene2020;39:5520–35.

Nielsen

, Luo

, Biteau

et al 14-3-3-Epsilon antagonizes FoxO to control growth, apoptosis and longevity in Drosophila. Aging Cell2008;7:688–99.

Peterson

, Stingo

, Vannucci

Bayesian inference of multiple Gaussian graphical models. J Am Stat Assoc2015;110:159–74.

Schwarz

Estimating the dimension of a model. Ann Statist1978;6:461–4.

Szklarczyk

, Gable

, Lyon

et al STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res2019;47:D607–13.

Trovik

, Wik

, Stefansson

et al; MoMaTec Study Group. Stathmin overexpression identifies high-risk patients and lymph node metastasis in endometrial cancer. Clin Cancer Res2011;17:3368–77.

Tzivion

, Dobson

, Ramakrishnan

FoxO transcription factors; Regulation by AKT and 14-3-3 proteins. Biochim Biophys Acta2011;1813:1938–45.

Wang

Bayesian graphical lasso models and efficient posterior computation. Bayesian Anal2012;7:867–86.

Wang

Scaling it up: stochastic search structure learning in graphical models. Bayesian Anal2015;10:351–77.

Yang

J-Y

, Yoshihara

, Tanaka

et al; Cancer Genome Atlas (TCGA) Research Network. Predicting time to ovarian carcinoma recurrence using protein markers. J Clin Investig2013;123:3740–50.

Zhang

, Liu

, Zhang

et al; CPTAC Investigators. Integrated proteogenomic characterization of human high-grade serous ovarian cancer. Cell2016;166:755–65.