Biometrical Journal () , zzz–zzz / DOI: 10.1002/bimj.200100000 An empirical Bayes approach to network recovery using external knowledge Gino B. Kpogbezan∗ ,1, Aad W. van der Vaart 1, Wessel N. van Wieringen 2,3, Gwenaël G. R. Leday 4 , and Mark A. van de Wiel 2,3 1 Mathematical Institute, University of Leiden, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands 2 Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands 3 Department of Epidemiology and Biostatistics, VU University Medical Center, PO Box 7057, 1007 MB Amsterdam, The Netherlands 4 MRC Biostatistics Unit, Cambridge Institute of Public Health, Forvie Site, Cambridge CB2 0SR, United Kingdom Received zzz, revised zzz, accepted zzz Reconstruction of a high-dimensional network may benefit substantially from the inclusion of prior knowl- edge on the network topology. In the case of gene interaction networks such knowledge may come for instance from pathway repositories like KEGG, or be inferred from data of a pilot study. The Bayesian framework provides a natural means of including such prior knowledge. Based on a Bayesian Simultaneous Equation Model, we develop an appealing Empirical Bayes (EB) procedure which automatically assesses the agreement of the used prior knowledge with the data at hand. We use variational Bayes method for pos- terior densities approximation and compare its accuracy with that of Gibbs sampling strategy. Our method is computationally fast, and can outperform known competitors. In a simulation study we show that accurate prior data can greatly improve the reconstruction of the network, but need not harm the reconstruction if wrong. We demonstrate the benefits of the method in an analysis of gene expression data from GEO. In par- ticular, the edges of the recovered network have superior reproducibility (compared to that of competitors) over resampled versions of the data. Key words: Empirical Bayes; High-dimensional Bayesian inference; Prior information; Undi- rected network; Variational approximation. Supporting Information for this article is available from the author or on the WWW under http://dx.doi.org/10.1022/bimj.XXXXXXX (please delete if not applicable) 1. Introduction Many areas of the quantitative sciences have witnessed a data deluge in recent years. This is due to an increased capacity of measuring and storing data in combination with a reduction in costs of acquiring this data. For instance, in the medical field high-throughput platforms yield mea- surements of many molecular aspects (e.g. gene expression) of the cell. As many as 20, 000 genes of a single patient can be characterized simultaneously. However, although the costs of such techniques have gone down over the years, the number of patients n in a typical clinical study is still small compared to the number of variables p measured. Reliable analysis of data of such a “n p” study is difficult. In this paper we try to solve the problem of few replicate measurements by incorporating external (or "prior”) data in the analysis. To allow interpretation, we restrict ourselves to predefined subsets of genes (e.g. pathways) for which p is usually moderately larger than n. High-dimensional modelling based on a small data set is particularly challenging in studies of relation- ships between variables. The number of potential pairwise relationships between even a modest number of ∗Corresponding author: e-mail: g.b.kpogbezan@math.leidenuniv.nl c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com 2 Kpogbezan et al.: Undirected network inference genes is p(p−1)/2. However, some of these relationships may be known from the vast body of medical lit- erature available. For instance, the current beliefs on interactions among genes is condensed in repositories like KEGG and Reactome. Although such information may not be reliable, or be only partially relevant for the case at hand, its flexible inclusion may help the analysis of high-dimensional data. Methodology that exploits such prior information may accelerate our understanding of complex systems like the cell. The cohesion of variables constituting a complex system is often represented by a network, also referred to as a graph. A graph G consists of a pair (I, E) where I = {1, ..., p} is a set of indices representing nodes (the variables of the system) and E is the set of edges (relations between the variables) in I × I. An edge can be characterized in many ways, we concentrate on it representing conditional independence between the node pair it connects. More formally, a pair (i1, i2) ∈ E if and only if random variables represented by nodes i1 and i2 are conditionally dependent, given all remaining nodes in I. All pairs of nodes of I not in E are conditionally independent given the remaining nodes. Graphs endowed with this operationalization of the edges are referred to as conditional independence graphs (Whittaker, 1990). Conditional independence graphs are learned from data by graphical models. Graphical models specify how data are generated obeying the relations among the variables as specified by a conditional indepen- dence graph. A Gaussian Graphical Model (GGM) assumes data are drawn from a multivariate normal distribution: Y j ∼iid N(0,Ω−1p ), j ∈ {1, ..., n}. (1) Here Y j is a p-dimensional random vector comprising the p random variables Y j1 , . . . , Y j p corresponding to the nodes of I and Ω−1p is a non-singular (p × p)-dimensional covariance matrix. The matrix Ωp, as opposed to its inverse, is referred to as the precision matrix. For a GGM the edge set E of the underlying conditional independence graph corresponds to the nonzero elements of Ωp (Lauritzen, 1996). Hence, to reconstruct the conditional independence graph it suffices to determine the non-zeros elements of this matrix. Reconstruction of the conditional independence graph may concentrate on the direct estimation of the precision matrix. Here we choose a different estimation strategy. This exploits an equivalence between Gaussian graphical models and Simultaneous Equations Models (SEMs), which we introduce first before pointing out the equivalence. Our choice for SEM is mainly motivated by its flexibility and its performance. It can account for experimental or biological covariates in the regression, and extensions to non-Gaussian data are available (Chen et al., 2015; Allen and Liu, 2013; Yang et al., 2012; Ravikumar et al., 2010). Its Bayesian counterpart is appealing for including prior knowledge, which likely is more complicated in many other frameworks. Its good performance in comparison with alternatives including (sparse) graphical models was demonstrated by Leday et al. (2015). In addition, SEM is also computational efficient (Mein- shausen and Bühlmann, 2006). We treat SEMs as a system of regression equations, with each equation modelling the conditional distribution of a node given the other nodes. If we collect all observations on node i ∈ I in a vector Yi := (Y 1i , . . . , Y ni )T , then we can write: Yi = Xiβi + i, i ∈ I, (2) where Xi is the n× (p− 1)-matrix with columns the observations of the p− 1 nodes different from i, i.e. Xi = [Y1, Y2, ..., Yi−1, Yi+1, ..., Yp] (where the square brackets mean “combine the vectors in a matrix”). The error vector i is defined by the equation, and possesses a multivariate Gaussian distribution N(0, σ2i In) under the GGM. (The covariances between the errors of different equations are in general non-zero, but are left unspecified.) The equivalence between the thus formulated SEM and the GGM as specified above stems from the one-to-one relationship between the regression parameters of the SEM and the elements of the GGM’s precision matrix (Lauritzen (1996)): βi,r = −ω−1ii ωir. In particular, (non)zero entries in the i-th row vector of the precision matrix Ωp correspond to the (non)zero coefficients of βi. The problem of identifying (non)zero entries in Ωp can therefore be cast as a variable selection problem in the p regression models (2). Lasso regression (Tibshirani, 1996) may be used for this purpose (as in Meinshausen and c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com Biometrical Journal () 3 Bühlmann (2006)), but other variable selection methods have also been employed. The problem that every partial correlation appears in two regression equations is usually resolved by post-symmetrization through application of the ‘AND’-rule: an edge (i, j) ∈ E if and only if βi,j 6= 0 and βj,i 6= 0 (Meinshausen and Bühlmann, 2006). Graph structures recovery based on model (2) performs well and is widely used in practice. Previously, we proposed a Bayesian formulation of the SEM (Leday et al., 2015). In this Bayesian SEM (henceforth BSEM) the structural model (2) is endowed with the following prior: i|σ2i , τ2i ∼ N(0n, σ2i In), βi|σ2i , τ2i ∼ N(0s, σ2i τ−2i Is), τ2i ∼ Gamma(a1, b1), σ−2i ∼ Gamma(a2, b2), (3) where I is an identity matrix, s = p − 1, and Gamma(a, b) denotes a gamma distribution with shape parameter a and rate parameter b, and τ2i and σ −2 i are independent. The normal-gamma-gamma (NGG) prior of model (3) regularizes the parameter estimates (e.g. estimated as the posterior mean) in two distinct ways. First, due to the normal prior on the regression coefficients βi,r (corresponding to a ridge penalty), the estimates of these parameters are shrunken locally (i.e. within each equation) to zero. Second, the esti- mates are simultaneously shrunken globally (i.e. across equations), due to the fact that the hyperparameters α = {a1, b1, a2, b2} do not depend on the index i. There seems to be no reason to connect the error vari- ances (which reflect the noise levels of the genes) across the equations, and hence we use a vague prior (e.g. a2 = b2 = 0.001). In contrast, estimating the parameters a1, b1 in EB fashion is advantageous, as it further “borrows information” across the regression equations. The resulting global shrinkage improves inference in particular for large networks (see also Section 5). Note that assuming a Gaussian distribution for the regression coefficients is also done in ridge regression and random effects models. The BSEM model can be fit computationally efficiently by a variational method, and generally outperforms the aforementioned lasso regression approach to the estimation of model (2). Furthermore, variables can be accurately selected based on the marginal posterior distributions of the regression coefficients (Leday et al., 2015). The problem of network reconstruction is challenging due to the vast space of possible graphs for even a moderate number of variables. This endeavour is further complicated by the inherent noise in the measurements used for the reconstruction. Fortunately, network reconstruction need not start from scratch, as often similar networks have been studied previously. Prior information on the network may be available in the literature, repositories, or simply as pilot data. It is natural to take such information along in network reconstruction. Many works have already been devoted to incorporating prior knowledge into network reconstruction. Among these studies, Imoto et al. (2003) use energy functions to incorporate prior knowledge sources into Bayesian gene regulatory network models and propose the incorporation of many types of different prior knowledge, including literature-based knowledge. The approach of Imoto et al. has been extended by Werhli and Husmeier which proposed a framework to incorporate multiple sources of prior knowledge into dynamic Bayesian network using MCMC sampling (Werhli and Husmeier, 2007). In the same line, Steele et al. proposed an advanced text-mining technique to incorporate literature-based prior knowledge into Bayesian network learning of gene networks. Similarly, Li et al. developed an approach that combines literature mining and microarray analysis in constructing biological networks (Li et al., 2006). Murkherjee and Speed (2008) proposed a method to incorporate network features including edges, classes of edges, degree distributions, and sparsity using MCMC sampling in Bayesian network learning. Still in Bayesian network learning, Isci et al. (2013) proposed also a framework to incorporate multiple sources of external knowledge where the incorporation of external knowledge uses Bayesian network infrastructure itself. However, none of these proposed methods explicitly estimate the agreement of the prior knowledge with the data at hand. In this paper we develop a method for incorporating external data or prior information into the recon- struction of a conditional independence network. To this aim we extend in Section 2 the Bayesian SEM c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com 4 Kpogbezan et al.: Undirected network inference framework (2)-(3). The extension incorporates prior knowledge in a flexible manner. Next in Section 3 we develop a variational Bayes approach to approximate the posterior distributions of the regression parame- ters for given hyperparameters, and show this to be comparable in accuracy to Gibbs sampling, although computationally much more efficient. In Section 4 this is complemented by a derivation of an empirical Bayes approach to estimate the hyperparameters. Using simulations we show in Section 5 that the method performs better, in terms of ROC curves, than BSEM when the prior knowledge agrees with the data, and is as accurate when it is not. In Section 6 we show the full potential of our approach on real data. We conclude the paper with a discussion. 2. Model The BSEM approach, comprising model (2) with priors (3), is modified to incorporate external information on the to-be-reconstructed network. The resulting model is referred to as BSEMed (BSEM with external data). Prior knowledge on the network is assumed to be available as a “prior network”, which specifies which edges (conditional independencies) are present and absent. This is coded in an adjacency matrix P, which contains only zeros and ones corresponding to the absence and presence of an edge in the prior network. That is, Pi,r = 1 if node i is connected with node r and Pi,r = 0 otherwise. Note that the adjacency matrix P is symmetric (for the purpose of undirected network reconstruction). The BSEMed approach keeps equation (2), but replaces the priors (3) of BSEM by: i|σ2i , τ2i,0, τ2i,1 ∼ N(0n, σ2i In), βi|σ2i , τ2i,0, τ2i,1 ∼ N(0s, σ2iDτ−2i ), Dτ−2i = diag(τ−2i,1 , ..., τ −2 i,s ), τ2i,r = { τ2i,0 ∼ Gamma(a0, b0), if Pi,r = 0, τ2i,1 ∼ Gamma(a1, b1), if Pi,r = 1, σ−2i ∼ Gamma(a2, b2). (4) where βi = βi,1, ..., βi,i−1, βi,i+1, ..., βi,p. The normal-gamma-gamma-gamma (NGGG) prior (4) retains the ridge-type regularization of the regres- sion parameters βi,r of (3), through Gaussian priors on these coefficients. The crucial difference between the two priors reveals itself in the variances of the latter priors. For each regression equation i there are two possible variances: βi,r ∼ { N(0, σ2i τ −2 i,0 ), if Pi,r = 0, N(0, σ2i τ −2 i,1 ), if Pi,r = 1. Hence, the regression coefficients corresponding to edges that are present according to the prior information share the same variance, and similarly for the other set of regression coefficients. Both variances can be both small and large, as they are themselves modelled through Gamma priors, where small values lead to small regression coefficients. If the prior information on the network were correct, then naturally a small value of τ−2i,0 would be desirable, smaller than the value of τ −2 i,1 . However, the construction is flexible in that the two values, and even their priors, are not fixed a-priori. In (4) the two parameters τ−2i,0 and τ−2i,1 are assumed to have gamma priors, with different hyperparameters (a0, b0) and (a1, b1). For further flexibility these hyperparameters will be estimated from the data with an empirical Bayes method. Then, if the absence of an edge in the prior network is supported by the current data, the corresponding regression coefficient βi,r may stem from a prior with a small variance, and will tend to be small; a similar, but opposite, situation will occur for edges that are present in the prior network. Indeed in Section 5 we shall see that the EB approach will tend to find similar values of τ2i,0 and τ 2 i,1 when the prior knowledge is non-informative, and rather different values otherwise. The fact that model (4) contains the model (3) as a submodel, provides robustness against the misspec- ification of the prior information. Although the number of latent variables in (4) is considerably higher c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com Biometrical Journal () 5 (namely p− 1 additional variances, one for each regression equation), the actual number of extra parame- ters is only two (the pair (a1, b1)). This suggests that if the prior information doesn’t agree with the data at hand, then the cost in terms of precision of the estimators is minor. It is amply compensated by the gains if the prior information is correct. We corroborate this in our simulation study in Section 5. In this connection it is also of interest to note the flexible roles of τ2i,0 and τ 2 i,1, τ 2 i,0 (resp. τ 2 i,1) is freely estimated from the data using the absent (resp. present) prior connections. We allow τ2i,0 < τ 2 i,1 which accommodates (rare) situations in which a prior is complementary to the data. 3. Variational Bayes method and Gibbs sampling In this section we develop a variational Bayes approach to approximate the (marginal) posterior distributions of the parameters βi,r, τ2i,0, τ 2 i,1, σ 2 i in model (4). The algorithm is similar, but still significantly different, from the algorithm developed in Leday et al. (2015) for the model (3). In the following we can see that, due to (4), the variational parameters have a form which renders the implementation of (4) much more challenging. We also verify that these approximations are accurate by comparing them to the results obtained using a Gibbs sampling strategy, which is much slower. Computational efficiency is an important characteristic, especially for fitting large networks. In this section we work on a single regression equation, i.e. for a fixed index i, and given hyperpa- rameters ak, bk, for k = 0, 1, 2. In the next section we combine the regression equations to estimate the hyperparameters. 3.1. Variational Bayes inference. In general a “variational approximation” to a distribution is simply the closest element in a given target set Q of distributions, usually with “distance” measured by Kullback-Leibler divergence. The set Q is chosen both for its computational tractability and accuracy of approximation. Distributions Q with stochastically independent marginals (i.e. product laws) are popular, and then the “accuracy” of approximation is naturally restricted to the marginal distributions. In our situation we wish to approximate the posterior distribution of the parameter θ := (βi, τ2i,0, τ 2 i,1, σ 2 i ) given the prior (4) and the observation Yi given in (2), for a fixed i. Here in (2) we take Xi (which depends on Yj for j 6= i) as given, as in a fixed-effects linear regression model. For p(·|Yi) the posterior density in this model, the variational Bayes approximation is given as q∗ = argmin q∈Q Eq log q(θ) p(θ|Yi) , where the expectation is taken with respect to the density q ∈ Q. For p(Yi, θ) the joint density of (Yi, θ), this is equivalent to finding the maximizer of Eq log p(Yi, θ) q(θ) . (5) By the nonnegativity of the Kullback-Leibler divergence, the latter expression is a lower bound on the marginal density p(Yi) = ∫ p(Yi, θ) dθ of the observation, and it is usually referred to as “the lower bound”. Solving the variational problem is equivalent to maximizing this lower bound (over Q). We choose the collection Q equal to the set of distributions of θ for which the components βi, τ2i,0, τ2i,1 and σ 2 i are stochastically independent, i.e. q(θ) = ∏4 l=1 ql(θl), where the marginal densities ql are arbitrary. Given such a factorization of q it can be shown in general (see e.g. Ormerod and Wand (2010)), that the optimal marginal densities q∗l satisfy: q∗l (θl) ∝ exp(Eq\l log p(Yi, θ)), where Eq\l = Eq1 . . .Eql−1Eql+1 . . .Eq4 . c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com 6 Kpogbezan et al.: Undirected network inference It can be shown (see the Supplementary Material) that in model (4) for regression equation i, with θ = (βi, τ 2 i,0, τ 2 i,1, σ −2 i ), this identity can be written in the “conjugate” closed-form βi|Yi ∼ N ( β∗i ,Σ ∗ i ) , τ2i,0|Yi ∼ Gamma ( a∗i,0, b ∗ i,0 ) , τ2i,1|Yi ∼ Gamma ( a∗i,1, b ∗ i,1 ) , σ−2i |Yi ∼ Gamma ( a∗i,2, b ∗ i,2 ) , (6) where Σ∗i = [ Eq∗4 (σ −2 i ) ( XTi Xi +DEq∗2 ·q∗3 (τ 2 i ) )]−1 , β∗i = [ XTi Xi +DEq∗2 ·q∗3 (τ 2 i ) ]−1 XTi Yi, a∗i,0 = a0 + 1 2s 0, b∗i,0 = b0 + 1 2Eq∗4 (σ −2 i )Eq∗1 (β 0 i T β0i ), a∗i,1 = a1 + 1 2s 1, b∗i,1 = b1 + 1 2Eq∗4 (σ −2 i )Eq∗1 (β 1 i T β1i ), a∗i,2 = a2 + 1 2n+ 1 2s, b ∗ i,2 = b2 + 1 2Eq∗\4 ( βTi Dτ2i βi ) + 12Eq∗1 (Yi −Xiβi)T (Yi −Xiβi), where s0 and s1 are the number of 0’s and 1’s in the i-th row of the adjacency matrix P, not counting the diagonal element; and β0i = {βi,r : r ∈ I\i,Pi,r = 0} and β1i = {βi,r : r ∈ I\i,Pi,r = 1} are the coordinates of the vector of regression parameters corresponding to these 0’s and 1’s. Furthermore DEq∗2 ·q∗3 (τ 2 i ) = diag ( Eq∗2Eq∗3 (τ 2 i,1), ...,Eq∗2Eq∗3 (τ 2 i,s) ) . In these identities the optimal densities q∗l appear both on the left and the right of the equations and hence the identities describe the optimal densities only as a fixed point. In practice the identities are iterated “until convergence” from suitable starting values. The iterations also depend on the hyperparameters ak, bk. In the next section we describe how these parameters can be estimated from the data by incorporating updates of these parameters in the iterations. 3.2. Variational Bayes vs Gibbs sampling. Under the true posterior distribution the coordinates βi, τ 2 i,0, τ 2 i,1, σ 2 i are not independent. This raises the question how close the variational approximation is to the true posterior distribution. As the latter is not available in closed form, we investigate this question in this section by comparing the variational approximation to the distribution obtained by running a Gibbs sampling algorithm. As for the network reconstruction we only use the marginal posterior distributions of the regression parameters, we restrict ourselves to these marginal distributions. The full conditional densities of BSEMed can be seen to take the explicit form: βi|Yi, τ2i,0, τ2i,1, σ−2i ∼ N(β∗i ,Σ∗i ), τ2i,0|Yi, βi, τ2i,1, σ−2i ∼ Gamma(a∗i,0, b∗i,0), τ2i,1|Yi, βi, τ2i,0, σ−2i ∼ Gamma(a∗i,1, b∗i,1), σ−2i |Yi, βi, τ2i,0, τ2i,1 ∼ Gamma(a∗i,2, b∗i,2), where the parameters Σ∗i , β ∗ i , a ∗ i,k and b ∗ i,k satisfy the same system of equations as in the variational algorithm, except that all expectations Eq∗ must be replaced by the “current” values taken from the condi- tioning (see Supplementary Material). Thus Gibbs sampling of the full posterior (βi, τ2i,0, τ 2 i,1, σ −2 i )|Yi is easy to implement, although slow. c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com Biometrical Journal () 7 We ran a simulation study with a single regression equation (say i = 1) with n = p = 50, and compared the variational Bayes estimates of the marginal densities with the corresponding Gibbs sampling-based estimates. Thus we sampled n = 50 independent replicates from a p = 50-dimensional normal distribution with mean zero and (p × p)-precision matrix Ω, and formed the vector Y1 and matrix X1 as indicated in (2). The precision matrix was chosen to be a band matrix with a lower bandwidth bl equal to the upper bandwith bu. It is bl = bu = 4, thus a total number of 9 band elements including the diagonal. For both the variational approximation and the Gibbs sampler we used prior hyperparameters a2 = b2 = 0.001 and prior hyperparameters aˆ0, bˆ0, aˆ1, bˆ1 fixed to the values set by the global empirical Bayes method described in Section 4. The Gibbs iterations were run nIter = 100, 000 times, after which the first nBurnin = 1000 iterates were discarded. Histograms based on subsampling every 10th value of the iterations are compared with the variational Bayes approximation to the marginal posterior densities. The correspondence between the two methods is remarkably good (see the Supplementary Material). We conclude that the variational Bayes method gives reliable estimates of the posterior marginal dis- tributions. The computing times in seconds are 40 for BSEMed and 2542 × 50 = 127, 100 for the Gibbs sampling (in R). The variational method clearly outperforms the Gibbs sampling method, which would hardly be feasible even for n = p = 50. 4. Global empirical Bayes for BSEMed Model (4) possesses three pairs of hyperparameters (ak, bk), for k ∈ {0, 1, 2}. The pair (a2, b2) controls the prior of the error variances σ2i ; we fix this to numerical values that render a vague prior, e.g. to (0.001, 0.001). In contrast, we let the values of the parameters α = (a0, b0, a1, b1) be determined by the data. As these hyperparameters are the same in every regression model i, this allows information to be borrowed across the regression equations, leading to global shrinkage of the regression parameters. A natural method to estimate the parameter α is to apply maximum likelihood to the marginal likelihood of the observations in the Bayesian BSEMed model determined by (2) and (4). Here “marginal” means that all parameters except α are integrated out of the likelihood according to their prior. The approach is similar to the one in van de Wiel et al. (2012). As a first simplification of this procedure we treat the vectors Y1, . . . , Yp as independent, thus leading to a likelihood of product form. As the exact marginal likelihoods of the Yi are intractable, we make a second simplication and replace these likelihoods by the lower bound (5) to the variational Bayes criterion (see Supplementary Material). Recall that in model (4) each regression parameter βi,r corresponds to one of two normal priors, that is: βi,r ∼ { N(0, σ2i τ −2 i,0 ), if Pi,r = 0, N(0, σ2i τ −2 i,1 ), if Pi,r = 1. It is the regression coefficients corresponding to edges that are not present according to the prior informa- tion share the same precision τ2i,0, and similarly the coefficients corresponding to the edges that are present obtain the precision τ2i,1. Both precisions are assumed to have gamma priors with different hyperparame- ters that are adapted by the current data by the means of the global EB procedure described above. Then, if the absence of an edge in the prior network is supported by the current data, the corresponding regression coefficient βi,r will have a small variance, and will tend to be small; a similar, but opposite, situation will occur for edges that are present in the prior network. In next Section we shall see that the EB approach will tend to find similar values of τ2i,0 and τ 2 i,1 when the prior knowledge is non-informative, and rather different values otherwise. We developed a dedicated edge selection algorithm for BSEM model in Leday et al. (2015). It is based on summarizing βi,r and βr,i by κ¯i,r, κ¯i,r = (κi,r + κr,i)/2 with κi,r = ∣∣Eqi∗ [βi,r|yi]∣∣√ Vqi∗ [βi,r|yi] (7) c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com 8 Kpogbezan et al.: Undirected network inference where Eqi∗ [βi,r|yi] and Vqi∗ [βi,r|yi] denote the approximate posterior expectation and variance of βi,r obtained in Section 3. The κ¯i,r values are ranked and corresponding edges are consecutively included according to a local false discovery rate (lfdr) criterion, which explores the relationship between lfdr and Bayes factors. Details are given in the Supplementary material. 5. Numerical investigation To study the effect of including a prior network in the model frame- work we compare BSEMed with BSEM. Hereto, we generated data Y 1, . . . , Y n according to (1), for p = 100 and n ∈ {50, 200}, which reflect a high- and a low-dimensional situation, respectively. We con- sidered precision matrices Ωp, which imply band, cluster and hub network topologies (Zhao et al., 2012) (See Supplementary Material). For BSEMed we vary the quality of the prior network information: ‘perfect’ prior information, i.e. the generating model; ‘75%’ true edges; ‘50%’ true edges; ‘0%’ true edges. To generate 75% (or 50%, or 0%) true information, we swapped 25% (or 50%, or 100%) of the true edges with the same number of absent edges, i.e. in the adjacency matrix P that describes the prior network we swapped these percentages of 1s with 0s. It may be noted that in the last case the prior network is completely wrong for the true edges, but not for the absent edges due to over-sampling of the 0’s, which seems realistic. Each simulation is repeated 50 times. We display the performances of BSEM and BSEMed by ROC curves, as based on ranking κ¯i,r, which summarizes βi,r and βr,i (7) (see Figure 1). We observe from Figure 1 that BSEMed performs better than BSEM when the prior information agrees the data and as good as BSEM when the prior doesn’t. The latter reflects the adaptive nature of the EB procedure. We also consider the EB estimates. We summarize the precisions by their prior means, as estimated by the EB procedure: E(τ2i,k) = aˆk/bˆk, for k ∈ {0, 1}. When there is some agreement of the prior knowledge with the data, we expect aˆ0/bˆ0 > aˆ1/bˆ1. In the case with 0% true edges, the prior is partly wrong: none of the truly present edges are in the prior network while some of the truly absent edges are part of the prior network. Hence, we expect the EB procedure to produce aˆ1/bˆ1 that are slightly larger than aˆ0/bˆ0. As discussed in Section 2 for the complementary case, reversal of the roles of the two priors can still improve performance of BSEMed, or at least not deteriorate it. The EB estimates of the prior means are presented in Table 1 for the case corresponding to Figure 1(a): band structure, n = 50. aˆ0/bˆ0 aˆ1/bˆ1 ratio true 366.10 8.08 45.30 0.75% true edges 272.97 14,36 19.00 0.50% true edges 216.10 27.56 7.84 0% true edges 142.59 152.95 1.07 Table 1 EB estimates of the prior means of precisions τ2i,0 and τ2i,1 in case of the band structure and n = 50 for various qualities of prior information Table 1 displays the prior means of precision, as estimated by EB, for BSEMed models with various qualities of prior information. It is clear that the better the quality of the prior information is, the larger the ratio of mean prior precisions is. Tables for other simulation settings are available in the Supplementary material. These generally show the same pattern. Figure 2 displays BSEM and BSEMed estimates of βi,r (3) and (4) for the band structure when n = 50 and p = 100 using the R package rags2ridges (Peeters and van Wieringen, 2014; van Wieringen and Peeters, 2014). Figures 1 & 2 show that BSEMed estimates become more accurate when prior knowledge quality increases and are as good as BSEM estimates when using 0% true edges information. It is also easy to see (Figure 2) a convergence of the BSEMed estimates to the true graph when the prior knowledge quality increases. c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com Biometrical Journal () 9 Average false positive rate Av e ra ge tr u e p os itiv e r a te 0.00 0.05 0.10 0.15 0.20 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 BSEM BSEMed 0% true BSEMed 50% true BSEMed 75% true BSEMed 100% true (a) Band: n = 50 Average false positive rate Av e ra ge tr u e p os itiv e r a te 0.00 0.05 0.10 0.15 0.20 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 BSEM BSEMed 0% true BSEMed 50% true BSEMed 75% true BSEMed 100% true (b) Band: n = 200 Average false positive rate Av e ra ge tr u e p os itiv e r a te 0.00 0.05 0.10 0.15 0.20 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 BSEM BSEMed 0% true BSEMed 50% true BSEMed 75% true BSEMed 100% true (c) Cluster: n =50 Average false positive rate Av e ra ge tr u e p os itiv e r a te 0.00 0.05 0.10 0.15 0.20 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 BSEM BSEMed 0% true BSEMed 50% true BSEMed 75% true BSEMed 100% true (d) Cluster: n = 200 Average false positive rate Av e ra ge tr u e p os itiv e r a te 0.00 0.05 0.10 0.15 0.20 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 BSEM BSEMed 0% true BSEMed 50% true BSEMed 75% true BSEMed 100% true (e) Hub: n = 50 Average false positive rate Av e ra ge tr u e p os itiv e r a te 0.00 0.05 0.10 0.15 0.20 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 BSEM BSEMed 0% true BSEMed 50% true BSEMed 75% true BSEMed 100% true (f) Hub: n = 200 Figure 1 ROC curves for BSEM (dashed) and BSEMed using perfect prior information (blue), BSEMed using 75% true edges present in the prior (brown), BSEMed using 50% true edges present in the prior (black) and BSEMed using 0% true edges present in the prior (red). Here, p = 100 and n ∈ {50, 200}. c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com 10 Kpogbezan et al.: Undirected network inference (a) True graph (b) BSEMed: perfect prior (c) BSEMed: 50 % true Info (d) BSEM Figure 2 Visualization of BSEMed ’κ¯i,r’ using perfect prior (b), BSEMed ’κ¯i,r’ using 50% true edges information (c), BSEM ’κ¯i,r’ (d) and the true graph (a) in case n = 50 and p = 100. c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com Biometrical Journal () 11 6. Illustration We turn to real data in this section. We use gene expression data from the Gene Expression Omnibus (GEO) to illustrate and evaluate methods for reconstructing gene networks. We consider two types of cancer and cancer-related pathways. First, we focus on the Apoptosis pathway with p = 84 genes in a lung data set (Landi et al., 2008), consisting of nlung1 = 49 observations from normal tissue and nlung2 = 58 observations from tumor tissue, so n lung = 107 in total. Secondly, we considered the p53 pathway in a pancreas data set (Badea et al., 2008) with p = 68 genes, consisting of npancreas1 = 39 observations from normal tissue and npancreas2 = 39 observations from tumor tissue, hence n pancreas = 78 in total. Note that the data were scaled per gene prior to the computations. BSEMed, BSEM, Graphical Lasso (GLλ) (Friedman et al., 2008), SEM with the Lasso penalty (SEML) (Meinshausen and Bühlmann, 2006) and GeneNet (Schäfer et al., 2006) were applied on the tumor data parts of the data sets. For BSEMed, the corresponding data parts from normal tissue were used as prior knowledge by fitting genes networks on the normal data using BSEM. The idea is that, while tumors and normal tissue may differ quite strongly in terms of mean gene expression, the gene-gene interaction network may be relatively more stable. We first illustrate the results from BSEM and BSEMed. Before considering the edge selection, we compare the total log-marginal likehood, as estimated by the variational lower bound, across the regression models for BSEM (3) and BSEMed (4) as a measure for goodness-of-fit. For the lung data set (resp. pancreas data set) we obtained −7082.93 for BSEM and −7071.99 for BSEMed (resp. −3807.58 for BSEM and −3798.91 for BSEMed). These improvements are clearly larger than what may be expected under random prior information of the same size, as shown in Supplementary Material in Section 7. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37383940414243444546 4748 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 (a) BSEM network estimate 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37383940414243444546 4748 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 (b) BSEMed network estimate Figure 3 BSEM vs BSEMed network estimates in lung cancer. Red edges are the overlap edges. Figure 3 (Figure 4) displays the estimated gene-gene network interaction in lung cancer (pancreas can- cer) and their overlaps using the described selection procedure with estimated lfdr ≤ 0.1. Considerable overlap (red edges), but also notable differences can be seen. Table 2 displays the prior means of precision, as estimated by EB. The prior network is clearly of use: the mean prior precision for regression parameters corresponding to the edges absent in the prior network is relatively large, which effectuates stronger shrinkage towards zero than for parameters corresponding to edges present in the prior network. c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com 12 Kpogbezan et al.: Undirected network inference 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31323334353637 3839 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 (a) BSEM network estimate 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31323334353637 3839 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 (b) BSEMed network estimate Figure 4 BSEM vs BSEMed network estimates in pancreas cancer. Red edges are the overlap edges. aˆ0/bˆ0 aˆ1/bˆ1 ratio Lung 27.32 1.71 15.97 Pancreas 20.03 1.21 12.97 Table 2 EB estimates of precisions τ2i,0 and τ2i,1 of prior distributions in lung data (resp. pancreas data) set. In the following, we argue that BSEMed network estimates may be more reliable in this setting than those of BSEM, Graphical Lasso (GLλ) (Friedman et al., 2008), SEM with the Lasso penalty (SEML) (Meinshausen and Bühlmann, 2006) and GeneNet (Schäfer et al., 2006) (see the Supplementary Material for methodological details). For that, we assess performance of all methods by studying reproducibility of edges. We randomly split the tumor data part of the lung data set (pancreas data set) into two equal and independent parts: nlung2,1 and n lung 2,2 (resp. n pancreas 2,1 and n pancreas 2,2 ). BSEM, BSEMed, GLλ, GeneNet and SEML were applied on each subset of the tumor data. We repeated the procedure 50 times. We report in Table 3 (Table 4) the average number of overlapping edges between the two subsets for each method when the total number of edges selected by each method on each subset is set to 50, 100 and 200. # edges BSEM overlap GeneNet overlap SEML overlap GLλ overlap BSEMed overlap # prior edges in BSEMed 50 4.56 1.88 1.32 3.42 29.58 13.4 100 10.68 5.7 5.64 7.86 37.88 22.14 200 24.16 17.2 16.46 18.14 51.54 33.7 Table 3 Lung data, reproducibility study: Average number of overlapping edges among the top 50 (100, 200) strongest ones in two equally-sized splits of the tumor data for BSEMed, BSEM, GLλ, GeneNet and SEML. c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com Biometrical Journal () 13 # edges BSEM overlap GeneNet overlap SEML overlap GLλ overlap BSEMed overlap # prior edges in BSEMed 50 7.42 3.32 2.8 4.52 27.82 11.92 100 17.46 10.34 9.08 11.4 57.18 29.22 200 44.14 30.94 28.54 33.66 81.66 54.1 Table 4 Pancreas data, reproducibility study: Average number of overlapping edges among the top 50 (100, 200) strongest ones in two equally-sized splits of the tumor data for BSEMed, BSEM, GLλ, GeneNet and SEML. We observe from Tables 3 & 4 that the results from the BSEMed networks are much more reproducible than that of BSEM, which is on its turn more reproducible than the other ones. Clearly, the improvement can partly be explained by overlapping edges that were also part of the prior network. However, it is clear from Figure 5 that the BSEMed network estimate in tumor tissue is not just a ‘finger print’ of the prior network (normal tissue network): BSEMed can even reveal edges that are neither in prior network nor in BSEM network estimate. BSEM BSEMed Priors Edges 2 3.2 0.7 20.14 12.54 4.78 (a) Lung data BSEM BSEMed Priors Edges 4.24 2 0.24 24.98 25.96 10.98 (b) Pancreas data Figure 5 Venn diagrams displaying the mean overlap of reproduced top-ranking edges, corresponding to the second row of Table 3 (Figure 5.a) and Table 4 (Figure 5.b). Figure 6 (resp. Figure 7) displays the network in normal tissue against the network in tumor tissue in the lung data (resp. in the pancreas data). The purpose of displaying Figure 6 and 7 is to emphasize the dysregulation of gene-gene interactions in cancer (Vogelstein and Kinzler, 2004; van Wieringen and van der Vaart, 2015) which may be caused by the heterogeneity of cancer (Nowell, 1976). Heterogeneity of tumor samples makes it more difficult to pinpoint reliable links, hence our selection algorithm which is based on local fdr ≤ 0.1 is likely to select fewer links in cancer samples. c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com 14 Kpogbezan et al.: Undirected network inference 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37383940414243444546 4748 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 (a) Network estimate in Normal tissue 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37383940414243444546 4748 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 (b) BSEMed network estimate in tumor tissue Figure 6 Network in a normal cell vs BSEMed network in lung cancer. Red edges are the overlap edges between prior and posterior networks. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31323334353637 3839 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 (a) Network estimate in Normal tissue 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31323334353637 3839 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 (b) BSEMed network estimate in tumor tissue Figure 7 Network in a normal cell vs BSEMed network in pancreas cancer. Red edges are the overlap edges between prior and posterior networks. 7. Discussion We have presented a new method for incorporating prior information in undi- rected network reconstrustion based on Bayesian SEM. Our approach allows the use of two central Gaus- sian distributions per regression equation for coefficients βi,r’s of our SEMs, where the prior information determines which of the two applies to a specific βi,r. Empirical Bayes estimation of the parameters of the two hyper priors of the precisions introduces shrinkage and accommodates the situation where there would c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com Biometrical Journal () 15 not be an agreement of the prior information with the data at hand. We showed in simulation with different graph structures that BSEMed outperforms BSEM when the used prior knowledge (partially) agrees with the data and as good as when not. In addition, for two real data sets we showed better reproducibility of top ranking edges with respect to other methods . In some cases, it may be desirable to give more weight only to some important edges of the prior graph rather than the whole graph. In gene regulatory networks reconstruction particularly, this may be edges that are known to characterise the disease biology. Assuming one is able to express such prior information as prior probabilities on edges, our software is able to incorporate such information via the Bayes factors used in the post-hoc selection procedure (Leday et al., 2015). Likewise, a user can also increase the weight of the entire prior graph uniformly (See Supplementary Material for details). Instead of assigning Gaussian distributions to the coefficients, other (e.g. sparse) priors can be used. However, the fast variational Bayes method for posterior density approximation may not be valid anymore. For instance, would one use Horseshoe priors (Carvalho et al., 2010), the variational marginals are non- existent. The complement property (Section 2 ) is preserved whenever the same functional forms of the priors are used for both classes. However, a combination of e.g. a Gaussian and a sparse prior ruins this property, which renders such a combination less attractive. Future research also focuses on extending our method to situations with more than two classes. For example, when considering integrative networks for two sets of molecular markers or two (related) path- ways, the three class setting is relevant: two classes represent the connections within the two sets and a third one between the two sets. Finally, multiple sources of external data may be available for incorpora- tion in BSEMed. This requires to model the parameter(s) of the priors in terms of contibutions of those external sources, and weigh those sources in a data-driven manner, as it is unlikely that the sources are equally informative. Acknowledgements The research leading to these results has received funding from the European Research Coun- cil under ERC Grant Agreement 320637. Conflict of Interest The authors have declared no conflict of interest. References Allen, G. I. and Liu, Z. (2013). A Local Poisson Graphical Model for Inferring Networks From Sequencing Data. NanoBioscience, IEEE Transactions on 12, 189–198. Badea, L., Herlea, V., Dima, S. O., Dumitrascu, T. and Popescu, I. (2008). Combined gene expression analysis of whole-tissue and microdissected pancreatic ductal adenocarcinoma identifies genes specifically overexpressed in tumor epithelia. Hepatogastroenterology 55, 2016–2027. Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97, 465–480. Chen, S., Witten, D. M. and Shojaie, A. (2015). Selection and estimation for mixed graphical models. Biometrika 102, 47–64. Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441. Imoto, S., Higuchi, T., Goto, T., Tashiro, K., Kuhara, S., and Miyano, S. (2003). Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks In Proceedings of the IEEE Computer Science Bioinformatics Conference (CSB’03), IEEE, Washington DC, USA, pp. 104–113. Isci, S., Dogan, H., Ozturk, C., and Otu, H. H. (2013). Bayesian network prior: network analysis of biological data using external knowledge. Bioinformatics , 860–867. Landi, M. T., Dracheva, T., Rotunno, M., Figueroa, J. D., Dasgupta A., Liu, H., Mann, F. E., Fukuoka, J., Hames, M., Bergen, A. W., Murphy, S. E., Yang, P., Pesatori, A. C., Consonni, D., Bertazzi, P. A., Wacholder, S., Shih, J. c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com 16 Kpogbezan et al.: Undirected network inference H., Caporaso, N. E., and Jen, J. J. (2008). Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS ONE 3, e1651. Lauritzen, S. (1996). Graphical Models. The Clarendon Press, Oxford University Press, New York . Leday, G. G. R., de Gunst, M., Kpogbezan, G. B., van der Vaart, A. W., van Wieringen, W. N., and van de Wiel, M. A. (2015). Gene network reconstruction using global-local shrinkage priors. arXiv preprint arXiv:1510.03771[stat.ME]. To appear in The Annals of Applied Statistics. Li, S., Wu, L., and Zhang, Z. (2006). Constructing biological networks through combined literature mining and mi- croarray analysis: a LMMA approach Bioinformatics 22, pp. 2143—2150. Meinshausen, N., and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. The Annals of Statistics 34, 1436–1462. Mohammadi, A. and Wit, E. C. (2015). Bayesian structure learning in sparse Gaussian graphical models. Bayesian Anal. 10, 109–138. Mukherjee, S., and Speed, T. (2008). Network inference using informative priors. Proceedings of the National Academy of Sciences 105, 14313–14318. Nowel, P. C. (1976). The clonal evolution of tumor cell populations. Sciences 194, 23–28 Omerod, J., and Wand, M. (2010). Explaining variational approximations. The American Statistician 64, 140–153. Peeters, C. F. W., and van Wieringen, W. N. (2014). rags2ridges: Ridge estimation of precision matrices from high- dimensional data. R package version 1.4 Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using l1- regularized logistic regression. Ann. Statist. 38, 1287—1319. Schäfer, J., Opgen-Rhein, R., and Strimmer, K. (2006). Reverse engineering genetic networks using the GeneNet package. R News 6, 50–53. Steele, E., Tucker, A., ’t Hoen, P.A.C., and Schuemie, M. J. (2009). Literature-based priors for gene regulatory net- works Bioinformatics 25, pp. 1768—1774. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267–288. van de Wiel, M. A., Leday, G. G. R., Pardo, L., Rue, H., van der Vaart, A. W., and van Wieringen, W. N. (2012). Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors. Biostatistics . van Wieringen, W. N. and Peeters, C. F. W. (2014). Ridge estimation of inverse covariance matrices from high- dimensional data. arXiv preprint arXiv:1403.0904[stat.ME]. van Wieringen, W. N. and van der Vaart, A. W. (2015). Transcriptomic heterogeneity in cancer as a consequence of dysregulation of the genegene interaction network. Bull. Math. Biol. 77, 1768—1786 . Vogelstein, B. and Kinzler, K. W. (2004). Cancer genes and the pathways they control. Nature Medicine 10, 789–799. Werhli, A. and Husmeier, D. (2007). Reconstructing gene regulatory networks with bayesian networks by combining expression data with multiple sources of prior knowledge. Statistical Applications in Genetics and Molecular Biology 6. Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley, Chichester. Yang, E., Ravikumar, P., Allen, G. and Liu, Z. (2012). Graphical models via generalized linear models. In Advances in Neural Information Processing Systems 25 (P. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger, eds.) 1367—1375. Zhao, T., Liu, H., Roeder, K., Lafferty, J. and Wasserman, L. (2012). The huge package for high-dimensional undi- rected graph estimation in R. J. Mach. Learn. Res. 13, 1059—1062. c© WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com