Propensity score analysis with partially observed covariates: how should multiple imputation be used? Cle´mence Leyrat 1, Shaun R. Seaman 2, Ian R. White 2, Ian Douglas 3, Liam Smeeth 3, Joseph Kim 1,4, Matthieu Resche-Rigon 5,6, James R. Carpenter 1,7 and Elizabeth J. Williamson1,8 May 11, 2017 1Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK 2MRC Biostatistics Unit, Cambridge CB2 0SR, UK 3Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK 4IMS Health, Real-World Evidence Solutions, London N1 9JY , UK 5SBIM Biostatistics and Medical information, Hpital Saint-Louis, APHP, 75010 Paris, France 6ECSTRA Team (Epide´miologie Clinique et Statistiques pour la Recherche en Sante´), UMR 1153 INSERM, Universite´ Paris Diderot, Sorbonne Paris Cite´, 75010 Paris, France 7MRC Clinical Trials Unit at UCL, London WC2B 6NH, UK 8Farr Institute of Health Informatics, London NW1 2DA, UK Corresponding author: Cle´mence Leyrat, London School of Hygiene and Tropical Medicine. Department of Medical Statistics. Keppel Street, London WC1E 7HT. UK clemence.leyrat@lshtm.ac.uk Abstract Inverse probability of treatment weighting (IPTW) is a popular propensity score (PS)-based approach to estimate marginal treatment effects in observational studies at risk of confounding bias. A major issue when estimating the PS is the presence of partially observed covariates. Multiple imputation (MI) is a natural approach to handle missing data on covariates: covariates are imputed and a PS analysis is performed in each imputed dataset to estimate the treatment effect. The treatment effect estimates from each imputed dataset are then combined to obtain an overall estimate. We call this method MIte. However, an alternative approach has been proposed, in which the PSs are combined across the imputed datasets (MIps). Therefore, there are remaining uncertainties about how to implement MI for PS analysis: (i) should we apply Rubin’s rules to the IPTW treatment effect estimates or to the PS estimates themselves? (ii) does the outcome have to be included in the imputation model? (iii) how should we estimate the variance of the IPTW estimator after MI? We studied the consistency and balancing properties of the MIte and MIps estimators and per- formed a simulation study to empirically assess their performance for the analysis of a binary outcome. We also compared the performance of these methods to complete case (CC) analysis and the missingness pattern (MP) approach, which uses a different PS model for each pattern of missingness, and a third MI approach in which the PS parameters are combined rather than the PSs themselves (MIpar). Under a missing at random (MAR) mechanism, CC and MP analyses were biased in most cases for estimating the marginal treatment effect, whereas MI approaches were approximately unbiased as long as the outcome was included in the imputation model. Only MIte was unbiased in all the studied scenarios and Rubin’s rules provided good variance estimates for MIte. The PS estimated in the MIte approach showed good balancing properties. In conclusion, when using MI in the IPTW context, MIte with the outcome included in the imputation model is the preferred approach. Keywords: missing covariates, chained equations, Rubin’s rules, inverse probability of treatment weighting, missingness pattern 1 1 Introduction Data from observational studies provide useful information to address health-related questions and notably estimate treatment effects in real settings [1]. However, because individuals are not randomised, the treatment groups are often not comparable, which may lead to confounding bias [2] if these studies are analyzed without appropriate adjustment for confounding. Propensity scores (PS) have been proposed as a means to recover bal- ance between groups on observed covariates and so obtain a consistent estimate of the causal treatment effect [3]. The PS is defined as the individual’s probability of receiving the treatment rather than the control given their baseline characteristics [4]. One popular method to achieve covariate balance between treatment groups is to weight individuals by the inverse of their PS. This approach, known as inverse probability of treatment weighting (IPTW) [5] aims to emulate the sample that would have been observed in a randomised trial. In practice, a major issue when estimating the PS is the presence of partially observed covariates, as the PS cannot be estimated for individuals with at least one missing covariate value. Standard analyses include complete case analysis [6] or the missingness pattern approach [7], but a popular alternative to handle missing data is multiple imputation (MI) [8]. However, there remain four important unresolved questions about how to implement MI for PS analysis. First, two approaches to combine information from the imputed datasets have been proposed in the context of PS analyses: combining the treatment effects estimated on each imputed dataset (method called MIte hereafter) or combining for each individual their PS value across the imputed datasets (method called MIps hereafter) [9, 10]. MIte intuitively seems to adhere best to the MI philosophy by applying the full analysis strategy on each imputed dataset. Furthermore, Seaman and White [11] proved that for an infinite number of imputations, the MIte estimator is consistent. However, Mitra and Reiter [9] published recommendations on the use of MI for PS analysis, and they advocated using MIps, rather than MIte, for PS matching. The second question is whether or not to include the outcome in the imputation model for the missing covari- ates. Mitra and Reiter did not include it in their study [9], which could explain the bias they observed for both MIte and MIps. One of the advantages claimed for the PS approach in general is that it allows the investigator to use the data on the treatment and covariates to develop a well-fitting PS model without needing to look at the data on the outcome [9]. This makes it possible to avoid the temptation to search for a PS model that gives a significant treatment effect estimate, a temptation which may be experienced by an analysis which handles confounding by using regression adjustment. Intuition may therefore lead one to believe that imputation of missing covariates should also be done without using the outcome variable [9]. However, this intuition conflicts with advice to include the outcome when imputing missing covariates in a regression model whose parameters are the quantities of interest [12]. The third question is how to estimate the variance of the IPTW estimator after MI. When pooling the treatment effects (MIte), Seaman and White [11] showed that Rubin’s rule for estimating the variance performs well in practice, although theoretical justification for Rubin’s rules relies on the parameter of interest being estimated with maximum likelihood, which is not the case for the IPTW method. For MIps, there is, to our knowledge, no variance estimator that takes into account both the uncertainty due to the estimation of the PS given the complete data and that due to the imputation of the missing data. Finally, the fourth question is how well MI performs in comparison to other popular approaches to handle miss- ing data, namely complete case (CC) analysis or the missingness pattern (MP) approach. Qu and Lipkovitch [13] and Seaman and White [11] assessed the performance of MIte additionally including the missingness pattern indicator in the PS model, but they did not compare this approach to other combination rules after MI or to the MP approach alone. In this article, we show that the best method when using MI with IPTW is to include the outcome in the PS model and use Rubin’s rules to combine the treatment effect estimates from the imputed datasets (the method MIte). An informal survey of recent articles using MI with IPTW revealed that suboptimal methods 2 are commonly being used. In particular, several studies have following Mitra and Reiter’s recommendation to use MIps rather than MIte, and in several of these studies imputation is done without including the outcome in the imputation model (e.g. [14, 15, 16, 17, 18, 19, 20]). Another example of suboptimal MI is that of Hayes and Groner [21], who randomly selected one imputation for each individual and estimated that individual’s PS based on just this single imputation. Therefore, we conducted this work to address the unresolved questions and provide new recommendations. This paper is organised as follows: we present a motivating example looking at the effect of statins on short- term mortality after pneumonia in Section 2. A description of IPTW and its underlying assumptions is given in Section 3. Section 4 presents different strategies to handle partially missing covariates for PS analysis. The consistency and balancing properties of the MI approaches are studied in Section 5 and empirically assessed in Sections 6 and 7 through a simulation study. The application of these approaches to the statin motivating example is presented in Section 8, followed by a discussion in Section 9. 2 Motivating example: effect of statin use on short-term mortality after pneumonia To illustrate the importance of an adequate handling of missing covariates for PS analysis, we focused on a published study of the effect of statin use on short term mortality after pneumonia [22]. We utilised the THIN database, which consists of anonymised patient records from general practitioners (GPs) in the UK. As of the end of 2015, the database represented 3.5 million unique active patients, or approximately 6% of the UK population. The database has been found to be broadly representative of the UK population, and the validity of recorded information has been established in previous studies [23, 24]. Douglas et al carried out an analysis of 9073 patients who had a pneumonia episode, of whom 1398 were under statin treatment when pneumonia was diagnosed. In the statin group, 305 patients (21.8%) died within 6 months, while 2839 (37.0%) of the non-users died within 6 months. However, statin users and non users were very different in terms of characteristics, in particular on characteristics associated with mortality. In Douglas et al., propensity scores were used to recover balance between groups. However, three impor- tant potential confounders were only partially observed: body mass index (BMI), smoking status and alcohol consumption, with respectively 19.2%, 6.2% and 18.5% of missing data. In the original analysis, a missing indicator method was used. This approach is similar to the missingness pattern approach described later in this paper. We will explore whether handling the missing data using MI gives results that differ from those given by Douglas et al. 3 Inverse probability of treatment weighting 3.1 Propensity score estimation and assumptions Propensity scores (PS) have become a major tool in causal inference to estimate the causal effect of a binary treatment Z (Z=1 if treated, Z=0 otherwise) on an outcome in the presence of confounding [3]. The PS is the individual’s probability of receiving the treatment conditional on the individual’s values on a set of baseline covariates X [25]. The PS is usually estimated from the data using a logistic regression model which predicts each individual’s probability of receiving the treatment from their baseline covariates [26]. The PS approach is best understood using the potential outcomes framework [27], in which the causal effect of the treatment is de- fined as the difference between the two potential outcomes Y Z=0 and Y Z=1, which are the outcomes that would have been observed if an individual had been not treated and treated, respectively (this notation was invented in the context of randomised experiments). Three assumptions are usually made to consistently estimate the causal effect of a treatment: (i) positivity [28], (ii) Stable unit treatment value assumption (SUTVA) [29] and 3 (iii) strongly ignorable treatment assignment (SITA). Assumptions (i) and (ii) mean that each individual has a non-null probability of receiving either treatment and has only one possible potential outcome value for each treatment, respectively. Assumption (iii) implies that there are no unmeasured confounders. The key property of the PS is that it is a balancing score. That is, if assumptions (i)-(iii) are valid and the PS model is correctly specified, the variables included in the PS model are balanced between treatment groups at any level of the PS, i.e. they have the same conditional distribution in both groups given the PS. This leads to the consistency of PS-based estimators. Initially, three different PS-based approaches were proposed [3]: PS matching, subclassification and covariate adjustment. Although PS matching is the most common approach nowadays, matching often discards a substantial number of individuals from the analysis [30] and variance estimation after PS matching is not straightforward [31]. The two other approaches have drawbacks as well: residual bias due to heterogeneity within strata can remain with subclassification [7], whereas covariate adjustment can be biased in some circumstances [32]. Thus, we focus on a fourth approach [33]: inverse probability of treatment weighting (IPTW). 3.2 IPTW estimator and its variance for complete data IPTW aims to create a pseudo-population similar to a randomised trial by weighting the individuals by the inverse of their probability of receiving the treatment they actually received (i.e. eˆ−1i for treated individuals and (1− eˆi)−1 for untreated individuals). Thus, the IPTW estimators for the marginal proportions for a binary outcome Y , µ1 and µ0, among the treated and the untreated are [34]: µˆ1 = ( n∑ i=1 YiZi eˆi )( n∑ i=1 Zi eˆi )−1 , µˆ0 = ( n∑ i=1 Yi(1− Zi) 1− eˆi )( n∑ i=1 1− Zi 1− eˆi )−1 . (1) Zi is the treatment indicator for individual i (Zi = 1 if treated, 0 otherwise) and Yi is their outcome value. From these two marginal estimates, it is possible to estimate a relative risk ( µˆ1 µˆ0 ) , an odds ratio ( µˆ1/(1−µˆ1) µˆ0/(1−µˆ0) ) or a risk difference (µˆ1 − µˆ0) for a binary outcome. The IPTW estimator, as with other PS-based estimators, is a ”two-step estimator”. If the uncertainty linked to the PS estimation in the first step is not taken into account in the second step (treatment effect estimation), the repeated sampling variance of the treatment effect will be overestimated and inference will be conservative [31]. Lunceford and Davidian [35] and Williamson et al. [34] proposed a large-sample variance estimator for the IPTW treatment effect estimator in which a correction term including the variance/covariance matrix of the estimated PS parameters is applied. 4 Handling missing data in propensity score analysis A major issue in PS estimation is the presence of partially observed covariates. In this section, we describe five methods for applying IPTW to incomplete data. We assume the treatment status Z and outcome Y are fully observed. 4.1 Complete case analysis In complete case analysis, the PS is estimated only within the subgroup of individuals with observed values for all of the variables included in the PS model, and only these individuals contribute to the estimation of the treatment effect [36]. Although the complete case analysis is known to provide an unbiased estimate of the parameters of an outcome regression model when the missingness is independent of the outcome [6], little is known about complete case analysis for IPTW. Moreover, excluding individuals with missing covariates can reduce statistical power, because no use is made of partially observed records. 4 4.2 Missingness pattern approach Rosenbaum and Rubin [7] defined the generalized PS eˆ∗ as the probability of receiving the treatment given the observed covariates and the pattern of missing data. In practice, the PS is estimated separately in each stratum defined by missingness patterns using the covariates observed in that stratum, as long as the sample size is large enough in each stratum. When treatment allocation is independent of the potential outcomes given the observed covariates and the missingness pattern, the generalized propensity score balances the missingness indicators and the observed component of the partially observed covariates but, not surprisingly, may not balance the unobserved component [37]. 4.3 Multiple imputation The principle of multiple imputation (MI) is to generate multiple sets of plausible values for the missing variables by drawing from the posterior predictive distribution of these variables given the observed data. Variables of different types are often included in the PS model. Therefore, we focused on chained equations, in which a specific imputation model, including the outcome, is specified for each partially observed variable, rather than joint modelling to impute the missing data [38]. M complete datasets are created and analysed independently to produce estimates θˆk, (k = 1, ...,M) of θ the vector of the parameters of interest (eg. regression coefficients) and estimates W k of their associated variance matrix. Then θˆk and W k , (k = 1, ...,M) are combined across the M imputed datasets. Rubin’s rules for the mean and variance state that an overall estimate, θˆMI , of θ and an estimate of the variance of θˆMI , V̂ ar(θˆMI), are [8]: θˆMI = 1 M M∑ k=1 θˆk, V̂ ar(θˆMI) = W + ( 1 + 1 M ) B, (2) where W is the within-imputation variance-covariance matrix, which reflects the variability of the parameter estimates in each imputed dataset, and B is the between-imputation variance matrix reflecting the variability in the estimates caused by the missing information. These two components are defined as: W = 1 M M∑ k=1 W k, B = 1 M − 1 M∑ k=1 ( θˆk − θˆMI )2 . (3) Three variants of MI are MIte, MIps and MIpar. For complete data, the IPTW method involves three steps: i) estimation of the parameters α in the PS model; ii) calculation of individual PS’s; and iii) use of equation (1) to calculate µˆ0 and µˆ1. In MIte, all three steps are applied to each of the M imputed datasets separately. Then the M treatment effect estimates obtained are averaged. In MIps, the method recommended by Mitra and Reiter [9], the first two steps are applied to each imputed dataset separately. Then the M PS’s for each individual are averaged and equation (1) is applied once using these average PS’s. In MIpar, the first step is applied separately to each of the imputed datasets. The M values of α and each individual’s covariates Xi are then averaged over the M datasets. Each individual’s PS is then estimated as: eˆi (x¯i) = expα¯x¯i 1 + expα¯x¯i , (4) (i = 1, ..., n) with α¯ the (p + 1) vector of the average PS parameter values (for the p covariates and the intercept) and x¯i a (p+ 1) vector of the average p covariates across imputed datasets for individual i. Whereas the MIps estimate of treatment effect is based on the average PS, e¯i (xi), the MIpar estimate is based on the PS corresponding to the average of the covariates, eˆi (x¯i). Rubin’s Rules are based on the assumption that θˆk is approximately normally distributed, and the distribution of αˆk might be expected to be more approximately normal than the distribution of a PS (which is constrained to lie in [0,1]). The 3 MI approaches are illustrated in Figure 1. To obtain a variance estimator for the MIte estimator, Rubin’s rule for the variance (above) can be applied to the standard IPTW variance estimator for full data which takes account of the PS estimation. For 5 MIps and MIpar, because the PS is obtained from M imputations, the standard variance estimator for IPTW is no longer valid, since it does not take into account the uncertainty due to missing data. A large-sample estimate of the variance for MIpar, derived from [34], is given in Appendix 1. 5 Balancing properties and consistency of IPTW estimator after MI Without missing data, Rosenbaum and Rubin showed that the PS is a balancing score [3]. A balancing score b(x) is defined as a function of the observed covariates x such that the conditional distribution of x given b(x) is the same for Z = 0 and Z = 1. Moreover, Rosenbaum and Rubin showed that any balancing score b(x) is ’finer’ than the true PS, that is e(x) = f {b(x)}, for some function f(.). The consistency of PS estimators comes from this balancing property. Lunceford and Davidian [35] studied theoretical properties of the IPTW estimator when data are complete and gave a proof of consistency of this estimator. In this section, we study the consistency of the IPTW estimators obtained from MIte, MIps and MIpar and how this relates to balancing properties of the PS models used in these approaches. We suppose hereafter that (i) the SITA assumption required for IPTW (see 3.1) holds, (ii) the missing data are missing at random (MAR) and (iii) the imputation model is correctly specified. For simplicity, we consider here the estimation of θ = E[Y Z=1]. 5.1 Combining the treatment effects after MI (MIte) Let X, the vector of covariates, be split into observed and missing components, X = (Xobs,Xmiss). X (k) m is the imputed value of Xmiss in the k th imputed dataset (k = 1, ..,M). We show (see Appendix 2a) that in each imputed dataset: e(Xobs,X (k) m ) = E[Z|Xobs,X(k)m ]. (5) If X (k) m is imputed from the true model (i.e. correctly specified at the true parameter values), we can also show (Appendix 2b) that a SITA-type assumption holds in each imputed dataset, i.e.: Y z=1 ⊥ Z |Xobs,X(k)m and Y z=0 ⊥ Z |Xobs,X(k)m These two assumptions are the imputed-data version of what Imbens calls weak unconfoundedness [39]. Note that we do not have the analogue of the usual, stronger, assumption (Y z=1, Y z=0) ⊥ Z |Xobs,X(k)m , which requires the treatment to be independent of the set of potential outcomes. This is because our imputation model is a model for Xmiss|Z = z, Y Z=z,Xobs. The stronger assumption would require our imputation model to capture Xmiss|Z = z, Y z=0, Y z=1,Xobs. However, it is important to note, as Imbens does, that the weak unconfoundedness suffices to obtain unbiased estimates of the causal treatment effect. Balancing properties: We show in Appendix 2 that Xobs ⊥ Z | e(Xobs,X(k)m ) and X(k)m ⊥ Z | e(Xobs,X(k)m ). Thus the true PS in each completed dataset balance both the unobserved and the imputed values of the covariates across treatment groups. This balancing property is what leads to consistency of the MIte estimator. Consistency: Seaman and White [11] proved that for an infinite number of imputations, the estimator obtained by combining the treatment effects after MI (MIte) is consistent. To understand how this consistency relates to the SITA-type assumption above, it is helpful to consider the following expectation: E [ Y Z e(Xobs,X (k) m ) ] = E [ E [ Y Z e(Xobs,X (k) m ) ∣∣∣∣∣Xobs,X(k)m ]] = E [ E[Y z=1|Xobs,X(k)m ] E[Z|Xobs,X(k)m ] e(Xobs,X (k) m ) ] (6) = E[E[Y z=1|Xobs,X(k)m ]] (7) = E[Y z=1] = θ, 6 Step 6 requires the SITA-type assumption 6. Step 7 relies on PS in the kth imputed dataset being equal to the probability of being treated given the observed and imputed part of the covariates (equation 5). 5.2 Combining the PS or the PS parameters after MI (MIps and MIpar) For PS methods, consistency comes from the ability of PS to balance covariates between groups. MIps and MIpar create a single overall PS used to estimate the treatment effect. Thus, consistency for these methods would rely on the ability of these overall PS to balance both the observed and the missing parts of the covariates in the original incomplete dataset. Rosenbaum and Rubin’s results show that this can happen only if the single PS (as estimated in MIps or MIpar) is ‘finer’ than the true (full data) PS (in other words, if the true PS is a function of the single pooled PS) [3]. However, when combining the PS or the PS parameters, the overall PS used for the analysis is not a function of the observed covariates but rather is a function of the average covariates across imputed datasets, including the imputed values. Thus, the pooled PS (as estimated either in MIps or MIpar) is not ‘finer’ than the true PS according to Rosenbaum and Rubins definition (i.e. the true PS is not a function of the pooled PS). Consequently, it cannot be a balancing score. Thus, neither θˆMIps nor θˆMIpar are consistent estimators. We illustrate the lack of consistency with a counter example in Appendix 3. We also discuss the balancing properties of the MP approach in Appendix 4. 6 Simulation study design The aim of this simulation study is to assess the performance of the three MI approaches, complete case analysis and missingness pattern approach for IPTW when the outcome is binary and to compare them with non-MI methods. Data generation mechanisms We generated datasets of sample size n = 2000, reflecting an observational study comparing a treatment Z = 1 to a control treatment Z = 0 on a fully observed binary outcome Y with 3 measured covariates X = (X1, X2, X3). X1 and X2 were continuous and X3 was binary. X2 was fully observed whereas X1 and X3 were partially observed. We generated the data as follows: • Covariates: The 3 covariates X = (X1, X2, X3) are generated from a multivariate normal distribution X ∼ N3(0,Σ), with Σii = 1 and Σij = ρ for i 6= j. X3 is then dichotomised according to a threshold of 0 to obtain a prevalence of 0.5. • Treatment assignment depends on X according to the following model: logit(p(Z = 1|x)) = −1.15 + 0.7x1 + 0.6x2 + 0.6x3. (8) These coefficients give E(Z) = pZ = 0.3 and an important imbalance on covariates between treatment groups, as shown in Appendix 6. • Binary outcome: The outcome depends on the 3 covariates and the treatment received according the following model: logit(p(Y = 1|Z,x)) = −1.5 + 0.5x1 + 0.5x2 + 0.3x3 + θcZ. (9) In this model, exp θc is the conditional odds ratio (OR). We used the method described in [40] to find the value of θc that gives the desired relative risk (RR). • Missingness mechanism: In this simulation study, we consider a missing at random (MAR) mechanism. The missingness of each partially observed covariate (X1 and X3) depends on the fully observed covariate 7 X2, the treatment received Z and the outcome Y : logit(p(M1 = 1|Z, x1, x2, x3, y)) = γ0 + z + x2 + γY y, (10) logit(p(M3 = 1|Z, x1, x2, x3, y)) = γ0 + z + x2 + γY y, (11) where M1 and M3 are the missingness indicators for X1 and X3 (equal to 1 if the value is missing), respectively. The values considered for each simulation parameter are in Table 1. A full factorial design (2× 2× 2) leads to 8 scenarios, but 5 scenarios are added as a sensitivity analysis (Table 2). Methods For each studied scenario, 5000 datasets were simulated. We compared 9 methods, all of them using IPTW: treatment effect were estimated on the full dataset (before the imposition of missingness) and using CC, MP, and the 3 MI approaches (MIte, MIps and MIpar). For each of the 3 MI approaches, we considered two imputation models, including or not the outcome and we generated M = 10 imputed datasets. Simulations were performed in R and the mi package was used for multiple imputation [41], based on full conditional specification (FCS). Estimands We focused on 3 estimands: the log(RR), log(OR) and the risk difference (RD). Rubin’s rules were applied to the logarithm of the RRs (or the ORs), rather than to to RRs (or ORs) themselves, because the asymptotic normal approximation is likely to be better for the log RR (or log OR) than for the RR (or OR). This is important in particular when constructing confidence intervals, because log(OR) and log(RR) have unrestricted parameter spaces [42]. Measures of performance For each method, we computed the absolute bias of the mean treatment effect. We also estimated the variance of the treatment effect, as well as the empirical variance, the coverage rate and the standardized differences of the covariates after IPTW. The standardized differences for each covariate were used as a measure of method performance. In the absence of weighting, standardized differences are defined as: SDiff = 100× ∣∣X¯1 − X¯0∣∣√ sˆ21+sˆ 2 0 2 , (12) with X¯0, X¯1, sˆ 2 0 and sˆ 2 1 denoting the average value (or proportion if the covariate is binary) for the covariate and its estimated variance in the control and treatment group, respectively. After IPTW, standardized differences are calculated by replacing the unweighted means and variances in (12) by their weighted equivalents (weighted by inverse PS). For the MIte approach, standardized differences were calculated using the PS estimated from each imputed dataset, both to assess the balance on the originally simulated complete dataset (before imposing missingness) and on the given imputed dataset. For MIps and MIpar, standardized differences were calculated using the pooled PS to assess balance on (i) the original dataset, (ii) on the average value of the covariates across the imputed datasets. For (ii) we also calculated the standardized differences separately on the observed part of the covariates and the average imputed part. 8 7 Simulation study results 7.1 Main simulation study Because results were similar for the three measures of interest, we present the results for relative risks (RR) on the log scale only in the main text, while results for odds ratios and risk differences are in the appendices. 7.1.1 Bias The absolute bias of the log(RR) of the treatment, for ρ = 0.6, is presented in Figure 2. Since results for ρ = 0.3 are similar, they are presented in the Appendices. Full data, CC and MP analyses: As expected, the IPTW estimator on the full data (before generating missingness for X1 and X3) is approximately unbiased and the complete case (CC) estimator is strongly biased in all scenarios except those where the outcome is not associated with missingness and there is no treatment effect. The MP approach is always biased in the situations considered, with a bias which can be even stronger than that which is observed for the CC approach. The reason for this is an incorrect PS model specification in each pattern of missingness: in the strata in which a covariate is not observed, the covariate is omitted from the model. Multiple imputation: First, the results show that the imputation model must include the outcome, even if the outcome is not a predictor of missingness. All 3 MI estimators are strongly biased in all scenarios when the outcome is not included in the imputation model. Second, when the outcome is included in the imputation model, the 3 MI approaches lead to a decrease in bias relative to the crude analysis. However, only the MIte approach leads to an unbiased estimate in the 8 main scenarios. Combining the PS parameters to estimate the PS of the average covariates (MIpar) performed better than combining the PS themselves, but both these approaches are slightly biased. 7.1.2 Standardized differences between groups The bias observed for the MP, MIps and MIpar methods can be explained by a remaining imbalance on the covariates between groups. Standardized differences for each covariate are in Table 3. A covariate is usually considered adequately balanced if its standardized difference is < 10%. IPTW on the full data achieved a very good balance between groups on the 3 covariates (as expected, standardized difference < 5% for each of the 3 covariates). For the CC approach, groups were balanced but the bias occurs since excluded individuals are different from included individuals on confounding factors. The PS obtained from the MP approaches balanced the observed part of the covariates, but not the unobserved part. This means that within each pattern of missingness, treated and untreated individuals are balanced for the covariates included in the PS model, but unbalanced on the missing covariate because this covariate is an unmeasured confounder in the PS model. Thus, when the missingness rate increases, imbalances (and consequently, bias of the treatment effect estimate) increase. MIte performs well because the PS within each imputed dataset balances the observed and imputed components of the covariates in that imputed dataset (see Table 3). Conversely, MIps and MIpar could only be consistent by balancing the observed and missing covariates, which is not the case, as seen in Table 3. 7.1.3 Coverage rate and standard errors Figure 3 displays the coverage rate for each method when the outcome is included in the imputation model. 9 Each boxplot represents the coverage distribution for the 8 main scenarios. Because the CC and MP approaches are strongly biased, their coverage rates are not relevant. The coverage rate for the MIte approach is close to the nominal value of 95%, confirming that Rubin’s rules perform well in this context provided that the within- imputation variance estimation takes into account the uncertainty in PS estimation. Table 1 in Appendix 5.1 shows the mean estimates from different variance estimators for each method when the outcome is included in the imputation model. For MIte one could apply Rubin’s rules to a variance estimator that ignores the PS estimation (treating the PS weights as being known) or to a variance estimator that accounts for the PS estimation. We know that failing to account for the PS estimation results in an overestimation of the variance in the full-data context [43]. Table in Appendix 5.1 shows that for the analysis on full data and for MIte, the variances accounting for PS estimation are close to the empirical variance, whereas an estimator not taking the uncertainty in PSs estimation tends to overestimate the variance for these approaches, as expected. For MIps and MIpar, one could estimate the variance accounting for the variability linked to PS estimation but not the imputation procedure by using the pooled PS in the IPTW variance estimator or one could incorporate the PS estimation and imputation by applying the variance estimator we proposed in Appendix 1. For MIps and MIpar, there was little difference between the variance estimates accounting for the imputation and those that did not. Surprisingly, in our simulations, additionally accounting for the imputation procedure resulted in a lower variance estimator. This can be explained as follows: the within imputation variance of the PS parameters (reflecting the correlation between the covariates and treatment; the higher this is the larger gain in precision for IPTW) is higher than the between imputation variance component (noise due to missing data). However, when the missingness rate increases, the variance that correctly accounts for the imputation procedure is typically higher than the variance which ignores the imputation, because of a larger heterogeneity between imputed datasets. 7.2 Sensitivity simulation study Appendix 5.3 presents the results of one scenario with a non-null treatment effect with a smaller sample size (n = 500). Results were similar in terms of bias for n = 500 and n = 2000. Because the variance estimator for IPTW has been developed for large samples, we observed slightly underestimated variances for the full data analysis, MIps and MIpar. This underestimation is more pronounced in the CC analysis because the sample for the analysis is even smaller (269 on average when n = 500). Figure 4 shows the bias when 10% or 60% of each partially observed covariate is missing. Full results are presented in Appendix 5.4. For a low missingness rate, the CC and MP approaches are still biased but the 3 MI approaches corrected the bias. For a missingness rate of about 60% for each covariate, only the MIte approach showed good performance in terms of bias reduction, confirming the good statistical properties of this approach even with this large amount of missing data. In our sensitivity simulation study, increasing the number of imputed datasets did not strongly impact the results in terms of bias or variance (See Appendix 5.5). 8 Application to the motivating example We applied CC analysis, the MP approach and the three MI strategies to estimate the effect of statin treatment on mortality after pneumonia from our motivating example dataset. For simplicity, we analysed the primary outcome, mortality within 6 months, as a binary outcome, and estimated the corresponding relative risk and its 95% confidence interval. For each approach, IPTW was used to account for the confounding. We focused the analysis on the 7158 patients without coronary heart disease. The propensity score was estimated from a logistic regression modeling statin use as a function of the following covariates: age, sex, body mass index (BMI), alcohol consumption, smoking status, diabetes, cardiovascular disease, circulatory disease, heart fail- ure, dementia, cancer, hyperlipidaemia, hypertension and prescription of antipsychotics, hormone replacement therapy, antidepressants, steroids, nitrates, beta-blockers, diuretics, anticoagulants and use of antihypertensive 10 drugs. The imbalance between the study groups is illustrated in Figure 5. Complete case (CC) analysis was conducted on the 5168 individuals with complete records. For the missingness pattern approach (MP), 8 pat- terns were identified. However, some of these patterns were very rare. For instance, only 6 individuals had only the smoking status missing, and only 8 had both smoking status and BMI missing. Thus, we considered only 4 groups: • complete records (n=5168) for which all the covariates listed above are included • individuals with only the alcohol consumption missing (n=455) • individuals with only BMI missing (n=575) • individuals with the smoking status missing (alone or in addition to BMI and alcohol consumption) and individuals with both BMI and alcohol consumption missing (n=960) For multiple imputation (MI), 10 imputed datasets were created. The imputation model included statin use, mortality and all the variables listed above. The standardized differences estimated before weighting and after weighting by PS for CC, MP, MIte, MIps and MIpar are presented in Table 5. The MP and the 3 MI approaches lead to a similar reduction in imbalance between groups on the observed variables as compared to the crude standardized differences. Nevertheless, because of the poor overlap of the patients characteristics between groups (Figure 5), some covariates are still unbalanced even after MI. However, for binary covariates, large standardized differences can occur even for slight imbalance when the prevalence is low. Estimated RR are presented in Table 6. First, all approaches based on IPTW lead to a treatment effect estimate smaller than the unweighted treatment effect. The 3 MI approaches lead to similar RR and these were smaller than the RR obtained from CC and MP analyses. The small differences between the 3 MI approaches in this example can be explained by a low rate of missing data and the fact that the 3 partially observed covariates were not strong confounders. 9 Discussion This paper aimed to address three main questions about multiple imputation in the context of IPTW: (i) does the outcome have to be included in the imputation model? (ii) should we apply Rubin’s rules on the IPTW treatment effect estimates or on the PS estimates themselves? (iii) how should we estimate the variance of the IPTW estimator after MI? First, results showed that the outcome must be included in the imputation model, even if the outcome is not a predictor of missingness. This is well known in the context of multivariable regression, but can be seen as counter-intuitive in the PS paradigm, because the PS model for the full data does not use the outcome data. The simulation results showed a bias in the 3 MI estimators when the outcome was omitted from the imputation model, even when the outcome was not a predictor of missingness. In practice, to obtain an unbiased estimate of the intervention effect when using MI, the outcome should be included in the imputation model. This may not be always possible if the outcome data has not been yet recorded and the PS is used at the design stage of the study (to find good matched controls for example), rather than at the analysis stage. In such situations, it is worth considering whether the assumptions required for complete case analysis or the MP approach may be valid. If they are, one of these methods could be used instead of using MI omitting the outcome. Furthermore, the inclusion of the outcome in the imputation model may go against the principle of objective design, as described by Rubin [44]. However, when using MI, this is necessary to obtain a consistent estimator of the treatment effect. Second, we showed that combining the treatment effects after MI (MIte approach) is the preferred MI strategy in terms of bias reduction under a MAR mechanism. This estimator is the only estimator of the 3 MI estimators to be consistent and to provide good balancing properties. Even though MIps and MIpar are not consistent estimators of the treatment effect, they can reduce the bias of the CC analysis, if the rate of missing data is not too high. Combining the PS or the PS parameters has 11 no clear advantage for IPTW, but may be useful in the context of PS matching: because it involves only one estimation of the treatment effect, it could provide a computational advantage for large datasets. In addition, MIte for PS matching implies that the M treatment effect estimates are estimated from different matched sets, potentially of different sample sizes, leading to a more complex variance estimation because of these different sample sizes. Third, as long as the uncertainty in the PS estimation is taken into account in the variance estimation [34], Rubin’s rules perform well for MIte, even for moderate sample size (n=500). For MIpar, the proposed variance approximation (Appendix 1) showed good performance in our simulation study. The 3 MI approaches differ in terms of their balancing properties. We showed that whereas the PS estimated in each dataset in the MIte approach can balance covariates between groups in each imputed dataset, this is no longer true for MIps and MIpar. However, the best method to assess covariate balance after MI remains unknown. With MIte, the aim being to estimate a treatment effect from each dataset, we require balance between groups within each imputed dataset. In contrast, for MIps and MIpar, further investigation is needed to know if we should assess the balancing properties of the pooled PS on the average covariate values across the imputed dataset or on each dataset. Our recommendations for the optimal way to implement multiple imputation for PS analysis are in contradic- tion to those given by Mitra and Reiter, who suggest that pooling propensity scores across imputed datasets can sometimes out-perform our recommended method of pooling treatment effects [9]. In their simulation studies MIps (or MIpar) led to a bigger reduction in bias than MIte in some scenarios. This was due to a near violation of the positivity assumption in their simulated datasets [45] combined with the omission of the outcome from the imputation model. Penning de Vries and Groenwold [46] used Mitra and Reiters data-generating mecha- nism to perform a simulation study but they increased the effective sample size to ensure positivity. Under this condition and when the outcome was included in the imputation model, as we recommend, they showed MIte was superior, in accordance with our findings. While we understand the desire to adhere to the principle of objective design by omitting the outcome from the imputation model, our results show that this leads to bias in MIte and an increase in bias for MIps (and MIpar). Thus, when it is either undesirable or impossible to include the outcome in the imputation model we would suggest considering the validity of alternative approaches, such as complete case analysis or the MP approach, rather than adopting the systematically biased approach of imputation omitting the outcome from the imputation model. The MP approach, which is widely used in practice to handle missing data for PS analysis, showed poor per- formance under a MAR mechanism. In our simulation study, the MP approach does not perform well, because missing data were generated under a MAR mechanism but the MP approach relies on different assumptions about the missing data mechanism [37]. When the assumptions for the MP approach are valid, it avoids the need for an imputation model altogether, and hence avoids the need to use the outcome when calculating the PS. Because the assumptions for MI and MP are different, we believe that MP could be a promising alternative when the assumptions for MI are not valid. Further investigations are needed to understand the usefulness of the MP approach in practice. Moreover, its application to our real-life example dataset was challenging, because the sample size within each missingness pattern was not large enough to estimate the PS. This work has some limitations. We generated only 3 covariates in our simulation, whereas PS are often calculated from a large number of covariates. Moreover, we studied only the common situation where the PS model only includes main effects, and we assumed as well the model including main effects only for the outcome given the covariates. When there are interactions or quadratic terms in these two models, the specification of the imputation model can be less straightforward, requiring further efforts to ensure the imputation model is compatible with the substantive (analysis) model [46]. Furthermore, we focussed on the IPTW estimator only, because of its increasing use and its application in the presence of time-varying covariates and treatment. 12 Indeed, IPTW is widely used in marginal structural models (MSMs) to handle time-dependent confounding. Although this paper is about the simpler situation of a single time-point, it is important to clarify how to use IPTW after MI correctly in this context, because otherwise it is unlikely to be done correctly in the more complex situation of time-dependent confounding. However, some issues may arise with IPTW when some estimated weights are too extreme; PS matching can be a better solution in this context [47]. Nevertheless, because the PS estimated in either MIps or MIpar is not a balancing score, PS matching (like IPTW - and like also PS stratification and PS covariate adjustment) would not be expected to perform well when used in combination with MIPs or MIPar. Conversely, because in MIte, the estimated PS can balance both the observed and imputed component of the covariates in the imputed dataset, we would expect all the PS-based methods (PS matching, stratification, covariate adjustment and IPTW) to perform well to estimate the treatment effect within each imputed dataset as long as the underlying assumptions for these methods are fulfilled. In conclusion, for IPTW, multiple imputation followed by pooling of treatment effect estimates is the pre- ferred approach amongst those studied, when data are missing at random, and the outcome must be included in the imputation model. Funding This work has been supported by the Medical Research Council (project grant MR/M013278/1 and Unit Programme number U105260558). References [1] Concato J, Shah N, Horwitz RI. Randomized, Controlled Trials, Observational Studies, and the Hierarchy of Research Designs. New England Journal of Medicine. 2000 Jun;342(25):1887–1892. [2] Cochran WG, Rubin DB. Controlling Bias in Observational Studies: A Review. Sankhya¯: The Indian Journal of Statistics, Series A (1961-2002). 1973;35(4):417–446. [3] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41 –55. [4] Guo S. Propensity score analysis : statistical methods and applications. Thousand Oaks Calif: Sage Publications; 2009. [5] Hirano K, Imbens GW. Estimation of Causal Effects using Propensity Score Weighting: An Application to Data on Right Heart Catheterization. Health Services and Outcomes Research Methodology. 2001 Dec;2(3):259–278. [6] White IR, Carlin JB. Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Statistics in Medicine. 2010 Dec;29(28):2920–2931. [7] Rosenbaum PR, Rubin DB. Reducing Bias in Observational Studies Using Subclassification on the Propen- sity Score. Journal of the American Statistical Association. 1984 Sep;79(387):516–524. [8] Carpenter J, Kenward M. Multiple Imputation and its Application. John Wiley & Sons; 2012. [9] Mitra R, Reiter JP. A comparison of two methods of estimating propensity scores after multiple imputation. Statistical Methods in Medical Research. 2012 Jun;0(0):1–17. [10] Mitra R, Reiter JP. Estimating propensity scores with missing covariate data using general location mixture models [Monograph]; 2009. [Online]. Available from: http://eprints.soton.ac.uk/67154/. [11] Seaman S, White I. Inverse Probability Weighting with Missing Predictors of Treatment Assignment or Missingness. Communications in Statistics - Theory and Methods. 2014 Aug;43(16):3499–3515. [12] Moons KGM, Donders RART, Stijnen T, Harrell Jr FE. Using the outcome for imputation of missing predictor values was preferred. Journal of Clinical Epidemiology. 2006 Oct;59(10):1092–1101. 13 [13] Qu Y, Lipkovich I. Propensity score estimation with missing values using a multiple imputation missingness pattern (MIMP) approach. Statistics in Medicine. 2009 Apr;28(9):1402–1414. [14] Sulkowski JP, Cooper JN, Duggan EM, Balci O, Anandalwar SP, Blakely ML, et al. Does timing of neonatal inguinal hernia repair affect outcomes? Journal of Pediatric Surgery. 2015 Jan;50(1):171–176. [15] de Groot S, Redekop WK, Sleijfer S, Oosterwijk E, Bex A, Kiemeney LALM, et al. Survival in Patients With Primary Metastatic Renal Cell Carcinoma Treated With Sunitinib With or Without Previous Cytoreductive Nephrectomy: Results From a Population-based Registry. Urology. 2016 Sep;95:121–127. [16] Goel SS, Aksoy O, Gupta S, Houghtaling PL, Tuzcu EM, Marwick T, et al. Renin-angiotensin system blockade therapy after surgical aortic valve replacement for severe aortic stenosis: a cohort study. Annals of Internal Medicine. 2014 Nov;161(10):699–710. [17] Franklin JM, Eddings W, Schneeweiss S, Rassen JA. Incorporating linked healthcare claims to improve confounding control in a study of in-hospital medication use. Drug Safety. 2015 Jun;38(6):589–600. [18] Neuderth S, Schwarz B, Gerlich C, Schuler M, Markus M, Bethge M. Work-related medical rehabilitation in patients with musculoskeletal disorders: the protocol of a propensity score matched effectiveness study (EVA-WMR, DRKS00009780). BMC public health. 2016 Aug;16:804. [19] Jobarteh K, Shiraishi RW, Malimane I, Samo Gudo P, Decroo T, Auld AF, et al. Community ART Support Groups in Mozambique: The Potential of Patients as Partners in Care. PloS One. 2016;11(12):e0166444. [20] Weber-Schoendorfer C, Hoeltzenbein M, Wacker E, Meister R, Schaefer C. No evidence for an increased risk of adverse pregnancy outcome after paternal low-dose methotrexate: an observational cohort study. Rheumatology (Oxford, England). 2014 Apr;53(4):757–763. [21] Hayes JR, Groner JI. Using multiple imputation and propensity scores to test the effect of car seats and seat belt usage on injury severity from trauma registry data. Journal of pediatric surgery. 2008 May;43(5):924– 927. [22] Douglas I, Evans S, Smeeth L. Effect of statin treatment on short term mortality after pneumonia episode: cohort study. BMJ. 2011 Apr;342:d1642. [23] Lewis JD, Schinnar R, Bilker WB, Wang X, Strom BL. Validation studies of the health improvement network (THIN) database for pharmacoepidemiology research. Pharmacoepidemiology and Drug Safety. 2007 Apr;16(4):393–401. [24] Blak BT, Thompson M, Dattani H, Bourke A. Generalisability of The Health Improvement Network (THIN) database: demographics, chronic disease prevalence and mortality rates. Informatics in Primary Care. 2011;19(4):251–255. [25] Imbens GW, Rubin DB. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. New York, NY, USA: Cambridge University Press; 2015. [26] D’Agostino RB. Propensity score methods for bias reduction in the comparison of a treatment to a non- randomized control group. Statistics in Medicine. 1998 Oct;17(19):2265–2281. [27] Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology. 1974;66(5):688–701. [28] Cole SR, Herna´n MA. Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology. 2008 Sep;168(6):656–664. [29] Rubin DB. Comment: Which Ifs Have Causal Answers. Journal of the American Statistical Association. 1986 Dec;81(396):961–962. [30] Austin PC. A comparison of 12 algorithms for matching on the propensity score. Statistics in Medicine. 2014 Mar;33(6):1057–1069. [31] An W. Bayesian propensity score estimators: incorporating uncertainties in propensity scores into causal inference. Sociological Methodology. 2010 Aug;40(1):151–189. [32] Hade EM, Lu B. Bias associated with using the estimated propensity score as a regression covariate. Statistics in medicine. 2014 Jan;33(1):74–87. 14 [33] Rosenbaum PR. Model-Based Direct Adjustment. Journal of the American Statistical Association. 1987;82:387–394. [34] Williamson EJ, Forbes A, White IR. Variance reduction in randomised trials by inverse probability weight- ing using the propensity score. Statistics in Medicine. 2014 Feb;33(5):721–737. [35] Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine. 2004 Oct;23(19):2937–2960. [36] Hill J. Reducing Bias in Treatment Effect Estimation in Observational Studies Suffering from Missing Data. Columbia University Commons. 2004;Available from: http://hdl.handle.net/10022/AC:P:9697. [37] D’Agostino RB, Rubin DB. Estimating and Using Propensity Scores with Partially Missing Data. Journal of the American Statistical Association. 2000 Sep;95(451):749–759. [38] Azur MJ, Stuart EA, Frangakis C, Leaf PJ. Multiple Imputation by Chained Equations: What is it and how does it work? International journal of methods in psychiatric research. 2011 Mar;20(1):40–49. [39] Imbens GW. The role of the propensity score in estimating dose-response functions. Biometrika. 2000 Jan;87(3):706–710. [40] Austin PC. The performance of different propensity score methods for estimating marginal odds ratios. Statistics in Medicine. 2007 Jul;26(16):3078–3094. [41] Gelman A, Hill J. Multiple Imputation with Diagnostics (mi) in R: Opening Windows to the Black Box. Journal of Statistical Software. 2011;40. [42] Collett D. Modelling Binary Data, Second Edition. CRC Press; 2002. Google-Books-ID: LMRAIBEbdqsC. [43] Tsiatis A. Semiparametric Theory and Missing Data. Springer Science & Business Media; 2007. Google- Books-ID: xqZFi2EMB40C. [44] Rubin DB. For Objective Causal Inference, Design Trumps Analysis. The Annals of Applied Statistics. 2008;2(3):808–840. Available from: http://www.jstor.org/stable/30245110. [45] Penning de Vries B, Groenwold R. Comments on propensity score matching following multiple imputation. Statistical Methods in Medical Research. 2016 Dec;25(6):3066–3068. [46] Bartlett JW, Seaman SR, White IR, Carpenter JR. Multiple imputation of covariates by fully condi- tional specification: Accommodating the substantive model. Statistical Methods in Medical Research. 2015 Aug;24(4):462–487. [47] Gutman R, Rubin D. Estimation of causal effects of binary treatments in unconfounded studies with one continuous covariate. Statistical Methods in Medical Research. 2015 Feb;. 15 Table 1: Factors used in the simulation study (2× 2× 2 factorial design) Factor Values Description Comments ρ 0.3 or 0.6 Correlation between the covariates After dichotomization, corr(X1;X3) = corr(X2, X3) = 0.24 or 0.46 RR 1 or 2 Relative risk In model (9), θc = 0 or 1.221 when ρ = 0.3, and 0 or 1.289 when ρ = 0.6. γY 0 or -0.4 Association between the outcome and the probabil- ity of missingness When γY = 0, γ0 = −1.5 and when γY = −0.4, γ0 = −1.3 in model (10) and (11) For each scenario, 5000 datasets of sample size 2000 were generated. The prevalence of the treatment was 0.3 and each partially observed covariate had 30% of data missing. 10 imputed datasets were created. Table 2: Simulation parameters used for the sensitivity analysis Factor Values Description n 500 Sample size pm 0.1 or 0.6 Missingness rate M 5 or 20 Number of imputed datasets For each scenario of the sensitivity analysis, 5000 datasets were generated. The other parameter values were: RR=2, ρ = 0.6 and γY = −0.4. 16 Table 3: Standardized differences (in %) after IPTW for each method for one scenario: RR=2, ρ = 0.6, outcome predictor of missingness and included in the imputation model. n=2000. Method X1 X2 X3 Crude (without IPTW) 81.3 74.7 51.7 Full 4.6 4.6 2.4 CC (n=1074) 7.6 7.3 3.5 MP Balance on full data 14.6 4.3 8.5 Balance on the observed part of the covariate 6.1 4.3 2.9 Balance on the missing part of the covariate 48.6 NA 28.3 MIte Balance on full data 15.0 4.5 9.1 Balance on each imputed dataset 4.5 4.5 2.4 MIps Balance on full data 15.9 5.5 10.7 Balance on the average imputed dataset 15.8 5.5 10.6 Balance on the observed part of the covariate 7.6 5.5 4.9 Balance on the imputed part of the covariate 58.1 NA 36.9 MIpar Balance on full data 15.1 4.8 9.6 Balance on the average imputed dataset 14.7 4.8 9.7 Balance on the observed part of the covariate 7.7 4.8 5.4 Balance on the imputed part of the covariate 52.5 NA 34.3 CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation. RR: relative risk. NA: not applicable because X2 is fully observed. 17 T ab le 4: B ia s of th e lo g( R R ), it s es ti m at ed va ri an ce an d co ve ra g e ra te fo r th e 3 M I a p p ro a ch es a cc o rd in g th e sa m p le si ze n fo r o n e sc en a ri o (R R = 2 , ρ = 0 .6 , o u tc o m e p re d ic to r of m is si n gn es s an d in cl u d ed in th e im p u ta ti on m o d el ). R es u lt s b a se d o n 5 0 0 0 si m u la ti o n s. F u ll C C M P M It e M Ip s M Ip a r n = 5 0 0 n = 2 0 0 0 n = 5 0 0 n = 2 0 0 0 n = 5 0 0 n = 2 0 0 0 n = 5 0 0 n = 2 0 0 0 n = 5 0 0 n = 2 0 0 0 n = 5 0 0 n = 2 0 0 0 B ia s 0 .0 0 7 0 .0 0 2 0 .1 1 0 0 .1 4 1 0 .1 5 3 0 .1 3 0 0 .0 1 0 0 .0 0 5 0 .0 3 8 0 .0 2 8 0 .0 2 4 0 .0 1 7 V a ri a n c e 0 .0 2 2 0 .0 0 6 0 .0 5 0 0 .0 1 4 0 .0 2 9 0 .0 0 8 0 .0 2 6 0 .0 0 7 0 .0 2 2 0 .0 0 6 0 .0 2 3 0 .0 0 6 E m p ir ic a l v a ri a n c e 0 .0 2 4 0 .0 0 6 0 .0 5 9 0 .0 1 4 0 .0 2 7 0 .0 0 7 0 .0 2 5 0 .0 0 6 0 .0 2 4 0 .0 0 6 0 .0 2 5 0 .0 0 6 C o v e ra g e ra te 0 .9 4 0 0 .9 4 7 0 .8 8 7 0 .7 6 9 0 .8 5 5 0 .6 9 1 0 .9 5 5 0 .9 5 7 0 .9 3 9 0 .9 3 2 0 .9 4 3 0 .9 4 2 M It e: tr ea tm en t eff ec ts co m b in ed a ft er m u lt ip le im p u ta ti o n ; M Ip s: p ro p en si ty sc o re s co m b in ed a ft er m u lt ip le im p u ta ti o n ; M Ip a r: p ro p en si ty sc o re p a ra m et er s co m b in ed a ft er m u lt ip le im p u ta ti o n . 18 T ab le 5: D es cr ip ti o n a n d co m p a ri so n o f st a ti n u se rs a n d n o n u se rs . n = 7 1 5 8 . V a r ia b le M is s in g ( % ) S t a t in u s e r s M is s in g ( % ) N o n s t a t in u s e r s S t a n d a r d iz e d d iff e r e n c e ( % ) n = 5 9 9 n = 6 5 5 9 C r u d e C C * M P M It e M Ip s M Ip a r C h a r a c t e r is t ic s A g e [m e a n (s d )] 6 6 .9 (1 0 .7 ) 6 9 .8 (1 0 .9 ) 2 7 .0 3 .8 2 .0 1 .4 1 .4 1 .4 M a le 3 2 2 (5 3 .8 ) 3 1 7 3 (4 8 .4 ) 1 0 .8 2 .0 6 .2 2 .2 2 .1 2 .2 B M I [m e a n (s d )] 4 3 (7 .2 ) 2 7 .6 (5 .9 ) 1 4 4 4 (2 2 .0 ) 2 5 .8 (5 .9 ) 3 1 .9 7 .8 9 .0 9 .0 1 1 .4 1 1 .4 D ri n k e rs 6 7 (1 1 .2 ) 9 8 (1 8 .4 ) 1 3 3 4 (2 0 .3 ) 8 1 4 (1 5 .6 ) 7 .6 2 .1 0 .3 2 .3 2 .9 3 .0 S m o k e rs 7 (1 .2 ) 2 5 6 (4 3 .2 ) 5 0 5 (7 .7 ) 2 7 2 8 (4 5 .1 ) 3 .7 1 .7 1 .5 2 .8 3 .0 3 .0 M e d ic a l h is t o r y D ia b e te s 2 4 3 (4 0 .6 ) 7 1 5 (1 0 .9 ) 7 2 .1 5 .0 7 .7 7 .1 7 .2 7 .1 C a rd io v a sc u la r d is e a se 1 4 1 (2 3 .5 ) 6 5 1 (9 .9 ) 3 7 .1 1 1 .4 1 1 .4 1 3 .6 1 3 .6 1 3 .6 C ir c u la to ry d is e a se 4 2 6 (7 1 .1 ) 3 4 7 1 (5 2 .9 ) 3 8 .2 1 3 .6 9 .8 1 6 .6 1 6 .7 1 6 .6 H e a rt fa il u re 5 1 (8 .5 ) 4 2 6 (6 .5 ) 7 .7 1 1 .6 6 .2 1 2 .8 1 2 .8 1 2 .8 C a n c e r 3 7 (6 .2 ) 6 0 7 (9 .2 ) 1 1 .5 2 .1 0 .4 0 .4 0 .0 0 .1 D e m e n ti a 6 (1 .0 ) 1 9 0 (2 .9 ) 1 3 .7 7 .3 1 3 .0 1 1 .6 1 1 .6 1 1 .6 H y p e rt e n si o n 3 3 6 (5 6 .1 ) 1 1 6 5 (1 7 .8 ) 5 2 .1 1 3 .3 2 1 .5 1 8 .7 1 8 .7 1 8 .7 H y p e rl ip id e m ia 2 0 5 (3 4 .2 ) 1 8 2 (2 .8 ) 8 8 .5 1 .1 4 .1 1 .9 2 .0 2 .0 T r e a t m e n t s A n ti d e p re ss a n t 1 0 8 (1 8 .0 ) 9 9 5 (1 5 .2 ) 7 .7 1 .7 5 .9 0 .3 0 .1 0 .1 A n ti p sy ch o ti c 1 1 (1 .8 ) 3 4 0 (5 .2 ) 1 8 .3 0 .5 1 1 .3 5 .0 5 .0 5 .0 H o rm o n e re p la c e m e n t th e ra p y 3 7 (6 .2 ) 2 7 7 (4 .2 ) 8 .8 0 .9 0 .9 1 .0 1 .0 1 .0 S te ro id 9 3 (1 5 .5 ) 1 0 9 0 (1 6 .6 ) 3 .0 1 .0 2 .2 0 .4 0 .3 0 .3 A n ti h y p e rt e n si v e 2 7 2 (4 5 .4 ) 1 1 6 5 (1 7 .8 ) 6 2 .3 1 2 .6 2 7 .5 1 8 .0 1 7 .8 1 7 .9 D iu re ti c s 3 1 9 (5 3 .3 ) 2 4 1 6 (3 6 .8 ) 3 3 .4 1 4 .3 1 9 .8 1 5 .8 1 5 .9 1 5 .9 B e ta b lo ck e r 1 9 3 (3 2 .2 ) 1 0 6 1 (1 6 .2 ) 3 8 .1 1 1 .4 7 .2 1 3 .8 1 3 .8 1 3 .8 N it ra te 7 4 (1 2 .4 ) 3 3 4 (5 .1 ) 2 5 .9 1 7 .3 1 4 .8 1 7 .5 1 7 .6 1 7 .6 F o r C C a n a ly si s, n = 5 1 6 8 (5 0 3 st a ti n u se rs a n d 4 6 6 5 n o n u se rs ). 19 Table 6: Estimate of the relative risk of mortality and its 95% confidence interval for statin vs non statin users (motivating example). n=7158. Method R̂R 95% CI(R̂R) Crude 0.587 [0.497;0.684] CC 0.702 [0.534;0.924] MP 0.708 [0.555;0.904] MIte 0.654 [0.513;0.835] MIps 0.653 [0.512;0.834] MIpar 0.654 [0.513;0.834] RR: relative risk. CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation. RR: relative risk. 20 Figure 1: The three approaches considered after multiple imputation (MI) of the partially observed covariates ? are missing values on the original dataset. ∗(k), (k = 1, ...,M) are imputed values in the kth imputed dataset. θˆ(k) and eˆ(k) are the estimated treatment effect and estimated propensity scores, respectively from the kth imputed dataset , (k = 1, ...,M). The MIte approach consists of pooling the M treatment effects estimated with IPTW on each imputed dataset. MIps estimate is obtained by using the average PS across the M imputed datasets in the IPTW estimator. Finally, the MIpar approach uses the PS of the average covariate value across the M imputed dataset. The PS is estimated using the average PS parameters as regression coefficients. 21 Missingness associated with Y B ia s B ia s 0 0. 05 0. 10 0. 15 0. 20 Full CC MP MIte+ MIte− MIps+ MIps− MIpar+ MIpar− Missingness independent of Y B ia s B ia s 0 0. 05 0. 10 0. 15 0. 20 Full CC MP MIte+ MIte− MIps+ MIps− MIpar+ MIpar− Missingness associated with Y B ia s B ia s 0 0. 05 0. 10 0. 15 0. 20 Full CC MP MIte+ MIte− MIps+ MIps− MIpar+ MIpar− Missingness independent of Y B ia s B ia s 0 0. 05 0. 10 0. 15 0. 20 Full CC MP MIte+ MIte− MIps+ MIps− MIpar+ MIpar− RR=1 RR=2 Figure 2: Absolute value of the bias for the 4 scenarios in which ρ = 0.6. CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation. For the 3 MI approaches ’+’ means that the outcome is included in the imputation model, ’-’ means that the outcome is not in the imputation model. RR: relative risk. 22 Co ve ra ge ra te (% ) 55 65 75 85 95 Full CC MP MIte MIps MIpar Figure 3: Coverage rate of the 95% CI for each method compared Results are pooled for the 8 main scenarios. CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation. RR: relative risk. For the 3 MI methods, the outcome is included in the imputation model. 10% of missing data B ia s 0 0. 05 0. 10 0. 15 0. 20 Full CC MP MIte MIps MIpar 60% of missing data B ia s 0 0. 05 0. 10 0. 15 0. 20 Full CC MP MIte MIps MIpar Figure 4: Absolute value of the bias according to the missingness rate CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation. RR: relative risk. For the 3 MI methods, the outcome is included in the imputation model. 23 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 0 5 10 15 20 25 PS distribution D en si ty Statin users Non statin users Figure 5: Distribution of the propensity score estimated on complete cases for statin users and non users (n=5168). 24 Appendix Appendix 1: Variance estimate for MIpar In this section, we provide a crude approximation to the variance for our MIpar estimator. In MIpar, we take the average propensity score parameters across imputed datasets, α, and the average of each covariate across imputed dataset for each individual, Xi = (Xobs,i, ∑ k X (k) m,i/K) and use these to calculate the propensity score, e(Xi;α). The MIpar treatment effect estimate is: θˆMIpar = ∑ i YiZi e(Xi;α)∑ i Zi e(Xi;α) − ∑ i Yi(1−Zi) (1−e(Xi;α))∑ i (1−Zi) (1−e(Xi;α)) If we let I represent the imputation distribution, and W the distribution of observed data (Y,Z,Xobs), the variance of the treatment effect estimate is given by V [θˆMIpar ] = EW [ VI|W (θˆMIpar ) ] + VW ( EI|W [θˆMIpar ] ) (13) Suppose αˆF denotes the propensity score parameters that would have been obtained in the full data, then EI|W [ θˆMIpar ] = ∑ i YiZi e(Xi;αˆF )∑ i Zi e(Xi;αˆF ) − ∑ i Yi(1−Zi) (1−e(Xi;αˆF ))∑ i (1−Zi) (1−e(Xi;αˆF )) + rn with rn p→ 0 as n → ∞. This is simply the full-data IPTW estimate, with the propensity scores evaluated at X (and at the full data parameter estimates) rather than X. Following Williamson et al., the same calculation shows that the large-sample variance of this full-data estimate, evaluated at X rather than X, is given by: VW ( EI|W [ θˆMI ] ) = Vun − vTCαv +  where Vun is the uncorrected variance estimate, that is considering that the PS is a true value rather than an estimate, which can be estimated by: Vˆun = K21 nwˆ1 2 n∑ i=1 (Yi − µˆ1)2Zi eˆ2i + K20 nwˆ0 2 n∑ i=1 (Yi − µˆ0)2(1− Zi) (1− eˆi)2 , and vˆ = K1 nwˆ1 n∑ i=1 x¯i(Yi − µˆ1)Zi(1− eˆi) eˆi + K0 nwˆ0 n∑ i=1 x¯i(Yi − µˆ1)(1− Zi)eˆi (1− eˆi) . µˆ1 and µˆ0 are the marginal (weighted) means in the treated and control group, and wˆ1 and wˆ0 the average estimated weights for IPTW in the treated and control group, respectively, and eˆi = e(x¯i; αˆ). Finally, the quantities K0 and K1 depend on the measure of interest for the treatment effect. For a difference in mean or a risk difference, K0 = K1 = 1. For a relative risk, K0 = µˆ0 −1 and K1 = µˆ1−1, and for an odds ratio, K0 = {µˆ0(1− µˆ0)}−1 and K1 = {µˆ1(1− µˆ1)}−1. Cα is n 2×the variance/covariance matrix of the estimated PS parameters αˆ in the full data, which we estimate by the within-imputation matrix W. The additional part of the variance is  = 2vCα ∗, where ∗ = 1 nE[Z/e(x¯;αF )] E [ x>(Y − µF1 )Z { (e(x;αF )− e(x;αF ) e(x;αF ) }] + 1 nE[(1− Z)/(1− e(x¯;αF ))]E [ x>(Y − µF0 )(1− Z) { (1− e(x;αF ))− (1− e(x;αF ))) 1− e(x;αF ) }] This is the only part of the variance which involves the full (unobserved) data. The magnitude of this term is driven partly by the difference between the propensity score evaluated at the average covariates across imputed datasets and the true (possibly unobserved) covariate values. For complete cases, therefore, this term is 0. We could attempt to estimate this component using the imputed datasets. However, we take a pragmatic approach and assume that pooling of the propensity scores is undertaken due to a desire to avoid the need to retain all K imputed datasets. Therefore, we ignore this term in the simulation study and evaluate the performance of 25 the estimator without this term. Thus, we assume that approximately V̂W ( EI|W [ θˆMI ] ) = Vˆun − vˆTWvˆ (14) This is our estimator for the second term in 13. For the first term, we begin by noting that, conditional on the observed data (Y, Z,Xobs), we have approximately, VI|W (θˆMIpar ) = ( ∂θˆMIpar ∂α )T CovI|W (α) ( ∂θˆMIpar ∂α ) Noting that the average of the propensity score parameters across imputed datasets is also the standard MI estimate of these parameters, Ĉov(α) = ( 1 + 1 M ) B where B is the between-imputation covariance matrix of the propensity score parameters. Differentiating the estimate above gives ∂θˆMI ∂α = − ∑ i [ (Y−µˆ1)Zxe(x;α)(1−e(x;α)) (e(x;α))2 ] ∑ i [ Z e(x;α) ] − ∑i [ (Y−µˆ0)(1−Z)xe(x;α)(1−e(x;α)) (1−e(x;α))2 ] ∑ i [ (1−Z) (1−e(x;α)) ] = −vˆ Thus, we can estimate Ê [ VI|W (θˆMI) ] = ( 1 + 1 M ) vˆTBvˆ (15) Putting (14) and (15) into (13), we have Vˆ (θˆMI) = Vˆun − vˆT { W − ( 1 + 1 M ) B } vˆ Note that the two components of variability in the propensity score parameters - W and B - act in opposite directions on the variance of the treatment effect estimator. The between-imputation variance reflects noise due to the missing data, thus this adds to the overall variance of the treatment effect estimator. The within- imputation variance reflects the correlation between the covariates and treatment (ie covariate imbalance), which results in smaller variance of the treatment effect estimator (since the IPTW estimator gains precision by rebalancing imbalanced covariates). 26 Appendix 2: Assumptions required when pooling treatment effects Notations Let X the vector of covariates be split into observed and partially missing, X = (Xobs,Xmiss). X (k) m is the imputed value for Xmiss in the k th imputed dataset (k = 1, ..,M) and α(k) the true propensity score parameters in the kth imputed dataset (with α(k) = α, the overall true PS parameters, k=1,...,M). Appendix 2a: Proof that E[Z|Xobs,X(k)m ] = e(Xobs,X(k)m ;α(k)) We have: E[Z|Xobs,X(k)m ] = Pr(X (k) m |Z = 1,Xobs)Pr(Z = 1|Xobs)∑ z=0,1 Pr(X (k) m |Z = z,Xobs)Pr(Z = z|Xobs) . (16) We now express the probabilities Pr(X (k) m |Z = z,Xobs) in terms of the (unobserved) missing values: Pr(X(k)m |Z = z,Xobs) = ∑ y=0,1 Pr(X(k)m |Z = z,Xobs, Y = y)Pr(Y = y |Z = z,Xobs) = ∑ y=0,1 Pr(Xmiss|Z = z,Xobs, Y = y)Pr(Y = y |Z = z,Xobs) = Pr(Xmiss|Z = z,Xobs), if the values are imputed from the true distribution. Substituting this back into (16) gives E[Z|Xobs,X(k)m = x] = E[Z|Xobs,Xmiss = x] = e(Xobs,x;α) = e(Xobs,X(k)m = x;α(k)). 27 Appendix 2b: Proof that Y z=1 ⊥ Z |Xobs,X(k)m We have: Pr(Z = 1|Y z=1 = y1,Xobs,X(k)m ) = Pr(X (k) m |Z = 1, Y z=1 = y1,Xobs)Pr(Z = 1|Y z=1 = y1,Xobs) Pr(X (k) m |Y z=1 = y1,Xobs) = Pr(Xmiss|Z = 1, Y z=1 = y1,Xobs)Pr(Z = 1|Y z=1 = y1,Xobs) Pr(Xmiss|Y1 = y1,Xobs) (17) = Pr(Z = 1|Y z=1 = y1,Xobs,Xmiss) = Pr(Z = 1|Xobs,Xmiss) (18) = Pr(Z = 1|Xobs,X(k)m ), (19) where (17) was reached by noting that Y1 = Y when Z = 1, (18) is true by the SITA assumption for the full data (Assumption 1), and (19) was shown in Appendix 1a. Proof that Y z=0 ⊥ Z |Xobs,X(k)m follows in a similar way. 28 Appendix 3: inconsistency of MIps and MIpar estimators: a counter example Neither θˆMIps nor θˆMIpar are consistent estimators. This can be shown by a simple counter-example. Suppose we have a single covariate, X, which is MAR depending on treatment Z only: X ∼ Bernouilli(0.5) Z ∼ { Bernouilli(0.1) if X = 0 Bernouilli(0.9) if X = 1 Y ∼ { Bernouilli(0.1) if X = 0 and/or Z = 0 Bernouilli(0.9) if X = Z = 1 R ∼ { 1 if Z = 0 Bernouilli(0.1) if Z = 1 Xobs = X : R = 1 and Xmiss = X : R = 0. The treatment model corresponds to: ln(e/(1− e)) = −2.1972246 + 4.3944492X, thus α = (−2.1972246, 4.3944492)T . The true expected potential outcome is E[Y z=1] = 0.5. Consistency when pooling propensity scores When Z = 1 and Y = 1, the expected propensity score is E[e(Xobs, X(k)m ;α)] = E [E[e(X,α)|X]] = 0.89. When Z = 1, Y = 1, X can take values of 0, 1 or missing, producing average (expected) propensity scores of 0.1, 0.9, and 0.89, respectively. Thus E [ Y Z E[e(Xobs, X(k)m ;αt)] ] = Pr(X = 0, Z = 1, Y = 1, R = 1)/0.1 + Pr(X = 1, Z = 1, Y = 1, R = 1)/0.9 + Pr(Z = 1, Y = 1, R = 0)/0.89 = 0.465 So E[ Y Z E [ e(Xobs,X (k) m ;α) ] ] 6= E[Y z=1]. Thus pooling the propensity scores will produce an inconsistent estimator here. Consistency when pooling propensity score parameters When Y = 1, Z = 1, the expected value of X is µbarXE[Xmiss|Z = 1, Y = 1] = Pr(X = 1, Z = 1, Y = 1)/Pr(Z = 1, Y = 1) = 81/82. The true propensity score evaluated at α with X = 0, 1 and 81/82 is 0, 1, and 0.895, respectively. Then, E [ Y Z e(Xobs, µX ;α) ] = Pr(X = 0, Z = 1, Y = 1, R = 1)/0.1 + Pr(X = 1, Z = 1, Y = 1, R = 1)/0.9 + Pr(Z = 1, Y = 1, R = 0)/0.895 = 0.462 So E [ Y Z e(Xobs,µX ;α) ] 6= E[Y z=1]. Thus pooling the propensity scores will produce an inconsistent estimator here also. Appendix 4: Consistency for the MP approach Let e∗ be the generalized PS estimated from the MP approach. Rosenbaum and Rubin showed that Xobs ⊥ Z|e∗. This implies that for each value of e∗, the observed part of the covariates and the frequency of missingness are balanced between groups. However, they state that we do not have: X ⊥ Z|e∗. Because we do not have this conditional independence, e∗ is not a balancing score for (Xobs,Xmiss) and thus the MP estimator of treatment effect is inconsistent, without making further conditional independence assumptions. 29 A p p e n d ix 5 : A d d it io n a l si m u la ti o n re su lt s 5 .1 : V a ri a n ce e st im a ti o n T ab le 7: U n co rr ec te d an d co rr ec te d m o d el -b as ed va ri a n ce a n d em p ir ic a l va ri a n ce o f th e tr ea tm en t eff ec t es ti m a to r fo r th e 8 m a in sc en a ri o s. S c e n a ri o F u ll M It e M Ip s M Ip a r Vˆ u Vˆ P S Vˆ e Vˆ m Vˆ P S + m Vˆ e Vˆ P S Vˆ P S + m Vˆ e Vˆ P S Vˆ P S + m Vˆ e R R = 1 , ρ = 0 .3 , Y 0 .0 1 0 0 .0 0 9 0 .0 0 9 0 .0 1 1 0 .0 1 0 0 .0 0 9 0 .0 0 9 0 .0 0 8 0 .0 0 9 0 .0 1 0 0 .0 0 9 0 .0 0 9 R R = 1 , ρ = 0 .3 0 .0 1 0 0 .0 0 9 0 .0 0 9 0 .0 1 1 0 .0 1 0 0 .0 0 9 0 .0 0 9 0 .0 0 8 0 .0 0 9 0 .0 1 0 0 .0 0 9 0 .0 0 9 R R = 2 , ρ = 0 .3 , Y 0 .0 0 7 0 .0 0 6 0 .0 0 6 0 .0 0 7 0 .0 0 6 0 .0 0 6 0 .0 0 6 0 .0 0 6 0 .0 0 6 0 .0 0 6 0 .0 0 6 0 .0 0 6 R R = 2 , ρ = 0 .3 0 .0 0 7 0 .0 0 6 0 .0 0 6 0 .0 0 7 0 .0 0 6 0 .0 0 6 0 .0 0 6 0 .0 0 6 0 .0 0 6 0 .0 0 6 0 .0 0 6 0 .0 0 6 R R = 1 , ρ = 0 .6 , Y 0 .0 1 1 0 .0 0 9 0 .0 0 9 0 .0 1 2 0 .0 0 9 0 .0 0 9 0 .0 1 0 0 .0 0 9 0 .0 0 9 0 .0 1 0 0 .0 0 9 0 .0 0 9 R R = 1 , ρ = 0 .6 0 .0 1 1 0 .0 0 9 0 .0 0 9 0 .0 1 2 0 .0 0 9 0 .0 0 9 0 .0 1 0 0 .0 0 9 0 .0 0 9 0 .0 1 0 0 .0 0 9 0 .0 0 9 R R = 2 , ρ = 0 .6 , Y 0 .0 0 8 0 .0 0 6 0 .0 0 6 0 .0 0 8 0 .0 0 7 0 .0 0 6 0 .0 0 7 0 .0 0 6 0 .0 0 6 0 .0 0 7 0 .0 0 6 0 .0 0 6 R R = 2 , ρ = 0 .6 0 .0 0 8 0 .0 0 6 0 .0 0 6 0 .0 0 8 0 .0 0 7 0 .0 0 6 0 .0 0 7 0 .0 0 6 0 .0 0 6 0 .0 0 7 0 .0 0 6 0 .0 0 6 N u m b er s a re m ea n es ti m a te s o f th e v a ri a n ce s in th e 8 m a in sc en a ri o s. Y m ea n s th e o u tc o m e is in cl u d ed in th e im p u ta ti o n m o d el . R R : re la ti v e ri sk ; M It e: tr ea tm en t eff ec ts co m b in ed a ft er m u lt ip le im p u ta ti o n ; M Ip s: p ro p en si ty sc o re s co m b in ed a ft er m u lt ip le im p u ta ti o n ; M Ip a r: p ro p en si ty sc o re p a ra m et er s co m b in ed a ft er m u lt ip le im p u ta ti o n . Vˆ u V a ri a n ce es ti m a te n o t a cc o u n ti n g fo r th e u n ce rt a in ty in P S es ti m a ti o n o r m is si n g d a ta Vˆ P S V a ri a n ce es ti m a te a cc o u n ti n g fo r th e u n ce rt a in ty in P S es ti m a ti o n (W il li a m so n et . a l fo rm u la ) Vˆ m V a ri a n ce es ti m a te a cc o u n ti n g fo r m is si n g d a ta b u t n o t fo r th e u n ce rt a in ty in P S es ti m a ti o n Vˆ P S + m V a ri a n ce es ti m a te a cc o u n ti n g fo r th e u n ce rt a in ty in P S es ti m a ti o n a n d a cc o u n ti n g fo r th e p re se n ce o f m is si n g d a ta Vˆ e E m p ir ic a l v a ri a n ce 30 5.2: Full results for a binary outcome Table 8: Scenario 1: RR=1, ρ = 0.3 and outcome predictor of missingness and included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.538 0.000 0.092 0.182 0.003 0.036 0.021 Variance 0.006 0.009 0.025 0.011 0.010 0.008 0.009 Empirical variance 0.006 0.009 0.026 0.009 0.009 0.009 0.009 Coverage rate 0.000 0.947 0.898 0.570 0.959 0.929 0.944 OR Bias 0.722 0.000 0.121 0.239 0.004 0.048 0.028 Variance 0.012 0.015 0.041 0.019 0.016 0.015 0.015 Empirical variance 0.012 0.015 0.042 0.016 0.015 0.015 0.015 Coverage rate 0.000 0.948 0.908 0.580 0.958 0.931 0.946 Risk difference Bias 0.136 0.000 0.022 0.043 0.001 0.009 0.005 Variance 0.000 0.000 0.001 0.001 0.001 0.000 0.000 Empirical variance 0.000 0.000 0.001 0.001 0.000 0.000 0.000 Coverage rate 0.000 0.949 0.928 0.602 0.960 0.935 0.947 * For complete case analysis, the average sample size is 1061. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. Table 9: Scenario 2: RR=1, ρ = 0.3 and outcome independent of missingness but included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.539 -0.001 -0.007 0.082 0.002 0.030 0.016 Variance 0.006 0.009 0.030 0.011 0.010 0.008 0.009 Empirical variance 0.006 0.009 0.031 0.009 0.009 0.009 0.009 Coverage rate 0.000 0.949 0.941 0.890 0.958 0.938 0.943 OR Bias 0.722 0.000 -0.006 0.107 0.003 0.040 0.022 Variance 0.012 0.015 0.044 0.018 0.016 0.015 0.015 Empirical variance 0.012 0.015 0.046 0.016 0.015 0.015 0.015 Coverage rate 0.000 0.949 0.943 0.895 0.958 0.939 0.945 Risk difference Bias 0.136 0.000 0.001 0.019 0.001 0.007 0.004 Variance 0.000 0.000 0.001 0.001 0.001 0.000 0.000 Empirical variance 0.000 0.000 0.001 0.001 0.000 0.000 0.000 Coverage rate 0.000 0.948 0.942 0.903 0.959 0.942 0.946 * For complete case analysis, the average sample size is 1103. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 31 Table 10: Scenario 3: RR=2, ρ = 0.3 and outcome predictor of missingness and included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.438 0.000 0.111 0.145 0.002 0.028 0.016 Variance 0.004 0.006 0.013 0.007 0.006 0.006 0.006 Empirical variance 0.004 0.006 0.014 0.006 0.006 0.006 0.006 Coverage rate 0.000 0.949 0.815 0.584 0.958 0.931 0.940 OR Bias 0.747 0.002 0.130 0.227 0.004 0.047 0.026 Variance 0.011 0.014 0.036 0.018 0.016 0.014 0.015 Empirical variance 0.011 0.015 0.038 0.016 0.015 0.015 0.015 Coverage rate 0.000 0.947 0.887 0.622 0.958 0.930 0.942 Risk difference Bias 0.163 0.000 0.018 0.048 0.001 0.010 0.005 Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.948 0.928 0.661 0.957 0.928 0.939 * For complete case analysis, the average sample size is 1076. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. Table 11: Scenario 4: RR=2, ρ = 0.3 and outcome independent of missingness but included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.439 0.002 0.102 0.074 0.003 0.027 0.016 Variance 0.004 0.006 0.015 0.007 0.006 0.006 0.006 Empirical variance 0.004 0.006 0.016 0.006 0.006 0.006 0.006 Coverage rate 0.000 0.950 0.850 0.868 0.958 0.931 0.941 OR Bias 0.748 0.005 0.051 0.112 0.007 0.041 0.023 Variance 0.011 0.014 0.036 0.018 0.016 0.014 0.015 Empirical variance 0.011 0.015 0.038 0.016 0.015 0.015 0.015 Coverage rate 0.000 0.950 0.935 0.883 0.959 0.933 0.941 Risk difference Bias 0.163 0.001 -0.014 0.023 0.001 0.008 0.004 Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.951 0.919 0.898 0.959 0.936 0.943 * For complete case analysis, the average sample size is 1103. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 32 Table 12: Scenario 5: RR=1, ρ = 0.6 and outcome predictor of missingness and included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.650 0.001 0.099 0.162 0.006 0.036 0.022 Variance 0.006 0.009 0.027 0.012 0.009 0.009 0.009 Empirical variance 0.006 0.009 0.029 0.010 0.009 0.009 0.009 Coverage rate 0.000 0.945 0.886 0.686 0.955 0.928 0.937 OR Bias 0.883 0.002 0.128 0.214 0.008 0.048 0.029 Variance 0.011 0.015 0.044 0.020 0.017 0.015 0.015 Empirical variance 0.012 0.016 0.047 0.017 0.016 0.016 0.016 Coverage rate 0.000 0.945 0.896 0.693 0.955 0.928 0.938 Risk difference Bias 0.169 0.001 0.023 0.040 0.002 0.009 0.006 Variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.945 0.914 0.705 0.956 0.932 0.941 * For complete case analysis, the average sample size is 1058. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. Table 13: Scenario 6: RR=1, ρ = 0.6 and outcome independent of missingness but included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.649 -0.001 -0.001 0.063 0.003 0.030 0.016 Variance 0.006 0.009 0.033 0.011 0.009 0.009 0.009 Empirical variance 0.006 0.009 0.035 0.010 0.009 0.009 0.009 Coverage rate 0.000 0.944 0.938 0.927 0.954 0.931 0.940 OR Bias 0.881 0.000 0.002 0.084 0.005 0.040 0.022 Variance 0.011 0.015 0.048 0.020 0.017 0.015 0.015 Empirical variance 0.011 0.016 0.051 0.017 0.016 0.015 0.016 Coverage rate 0.000 0.944 0.941 0.929 0.954 0.933 0.941 Risk difference Bias 0.169 0.000 0.002 0.016 0.001 0.007 0.004 Variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.945 0.938 0.934 0.953 0.936 0.941 * For complete case analysis, the average sample size is 1099. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 33 Table 14: Scenario 7: RR=2, ρ = 0.6 and outcome predictor of missingness and included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.529 0.002 0.141 0.130 0.005 0.028 0.017 Variance 0.004 0.006 0.014 0.008 0.007 0.006 0.006 Empirical variance 0.004 0.006 0.014 0.007 0.006 0.006 0.006 Coverage rate 0.000 0.947 0.769 0.691 0.957 0.932 0.942 OR Bias 0.924 0.005 0.148 0.210 0.009 0.047 0.028 Variance 0.011 0.016 0.040 0.022 0.018 0.016 0.016 Empirical variance 0.011 0.017 0.040 0.018 0.017 0.016 0.017 Coverage rate 0.000 0.948 0.887 0.712 0.958 0.932 0.943 Risk difference Bias 0.199 0.001 0.017 0.045 0.002 0.011 0.006 Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.945 0.937 0.731 0.957 0.932 0.942 * For complete case analysis, the average sample size is 1074. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. Table 15: Scenario 8: RR=2, ρ = 0.6 and outcome independent of missingness but included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.531 0.002 0.138 0.057 0.005 0.027 0.016 Variance 0.004 0.006 0.016 0.008 0.007 0.006 0.006 Empirical variance 0.004 0.006 0.017 0.007 0.006 0.006 0.006 Coverage rate 0.000 0.947 0.793 0.921 0.960 0.937 0.946 OR Bias 0.927 0.006 0.077 0.091 0.009 0.041 0.024 Variance 0.011 0.016 0.039 0.021 0.018 0.016 0.016 Empirical variance 0.011 0.016 0.041 0.017 0.016 0.016 0.016 Coverage rate 0.000 0.949 0.935 0.924 0.959 0.938 0.944 Risk difference Bias 0.200 0.002 -0.015 0.020 0.002 0.009 0.005 Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.947 0.918 0.926 0.955 0.934 0.941 * For complete case analysis, the average sample size is 1099. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 34 Table 16: Scenario 9: RR=1, ρ = 0.3 and outcome predictor of missingness but not included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.540 0.000 0.093 0.182 0.084 0.100 0.090 Variance 0.006 0.009 0.025 0.011 0.010 0.009 0.009 Empirical variance 0.006 0.009 0.026 0.010 0.008 0.008 0.008 Coverage rate 0.000 0.946 0.892 0.573 0.883 0.807 0.835 OR Bias 0.724 0.001 0.121 0.239 0.111 0.132 0.119 Variance 0.012 0.015 0.041 0.019 0.017 0.015 0.015 Empirical variance 0.012 0.015 0.043 0.017 0.015 0.015 0.015 Coverage rate 0.000 0.947 0.901 0.580 0.886 0.810 0.838 Risk difference Bias 0.136 0.000 0.022 0.043 0.020 0.024 0.022 Variance 0.000 0.000 0.001 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.000 0.001 0.001 0.000 0.000 0.000 Coverage rate 0.000 0.947 0.920 0.598 0.892 0.820 0.849 * For complete case analysis, the average sample size is 1060. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. Table 17: Scenario 10: RR=1, ρ = 0.3 and outcome independent of missingness and not included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.540 0.000 -0.006 0.082 0.088 0.098 0.090 Variance 0.006 0.009 0.030 0.011 0.010 0.008 0.009 Empirical variance 0.006 0.009 0.032 0.009 0.008 0.008 0.008 Coverage rate 0.000 0.947 0.942 0.891 0.882 0.824 0.845 OR Bias 0.724 0.000 -0.005 0.108 0.116 0.129 0.119 Variance 0.012 0.015 0.045 0.018 0.017 0.015 0.015 Empirical variance 0.012 0.015 0.047 0.016 0.014 0.014 0.014 Coverage rate 0.000 0.947 0.941 0.894 0.884 0.827 0.847 Risk difference Bias 0.136 0.000 0.001 0.020 0.021 0.024 0.022 Variance 0.000 0.000 0.001 0.001 0.001 0.000 0.000 Empirical variance 0.000 0.000 0.001 0.001 0.000 0.000 0.000 Coverage rate 0.000 0.948 0.940 0.903 0.893 0.835 0.855 * For complete case analysis, the average sample size is 1103. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 35 Table 18: Scenario 11: RR=2, ρ = 0.3 and outcome predictor of missingness but not included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.436 0.000 0.111 0.144 0.065 0.077 0.070 Variance 0.004 0.006 0.013 0.007 0.006 0.006 0.006 Empirical variance 0.004 0.006 0.014 0.006 0.006 0.006 0.006 Coverage rate 0.000 0.946 0.818 0.583 0.893 0.825 0.854 OR Bias 0.743 0.001 0.130 0.227 0.108 0.128 0.115 Variance 0.011 0.014 0.036 0.018 0.017 0.015 0.015 Empirical variance 0.011 0.015 0.038 0.017 0.015 0.015 0.015 Coverage rate 0.000 0.945 0.894 0.614 0.887 0.819 0.847 Risk difference Bias 0.162 0.000 0.018 0.048 0.024 0.028 0.025 Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.944 0.923 0.646 0.886 0.822 0.851 * For complete case analysis, the average sample size is 1075. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. Table 19: Scenario 12: RR=2, ρ = 0.3 and outcome independent of missingness and not included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.437 0.002 0.099 0.073 0.069 0.079 0.073 Variance 0.004 0.006 0.015 0.007 0.006 0.006 0.006 Empirical variance 0.004 0.006 0.016 0.006 0.006 0.006 0.006 Coverage rate 0.000 0.947 0.860 0.875 0.879 0.824 0.843 OR Bias 0.745 0.004 0.047 0.110 0.115 0.126 0.117 Variance 0.011 0.014 0.036 0.018 0.017 0.015 0.015 Empirical variance 0.011 0.015 0.038 0.016 0.014 0.014 0.015 Coverage rate 0.000 0.947 0.936 0.887 0.877 0.829 0.846 Risk difference Bias 0.162 0.001 -0.015 0.023 0.026 0.027 0.025 Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.946 0.913 0.901 0.879 0.836 0.855 * For complete case analysis, the average sample size is 1103. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 36 Table 20: Scenario 13: RR=1, ρ = 0.6 and outcome predictor of missingness but not included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.650 0.001 0.096 0.161 0.061 0.082 0.070 Variance 0.006 0.009 0.027 0.012 0.010 0.009 0.009 Empirical variance 0.006 0.009 0.029 0.010 0.009 0.009 0.009 Coverage rate 0.000 0.939 0.896 0.683 0.919 0.858 0.882 OR Bias 0.883 0.002 0.124 0.214 0.081 0.109 0.094 Variance 0.011 0.015 0.044 0.020 0.017 0.015 0.015 Empirical variance 0.012 0.016 0.046 0.017 0.015 0.015 0.015 Coverage rate 0.000 0.938 0.905 0.690 0.921 0.861 0.885 Risk difference Bias 0.169 0.001 0.022 0.040 0.015 0.020 0.017 Variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.939 0.925 0.703 0.925 0.867 0.890 * For complete case analysis, the average sample size is 1058. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. Table 21: Scenario 14: RR=1, ρ = 0.6 and outcome independent of missingness and not included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.650 -0.001 -0.001 0.063 0.063 0.080 0.070 Variance 0.006 0.009 0.033 0.011 0.010 0.009 0.009 Empirical variance 0.006 0.009 0.034 0.009 0.008 0.008 0.008 Coverage rate 0.000 0.952 0.937 0.931 0.928 0.870 0.893 OR Bias 0.882 0.000 0.002 0.083 0.084 0.106 0.093 Variance 0.011 0.015 0.048 0.020 0.017 0.015 0.015 Empirical variance 0.012 0.015 0.051 0.016 0.014 0.014 0.014 Coverage rate 0.000 0.953 0.939 0.935 0.929 0.873 0.896 Risk difference Bias 0.169 0.000 0.002 0.015 0.016 0.020 0.017 Variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.001 0.001 0.001 0.000 0.000 Coverage rate 0.000 0.953 0.938 0.938 0.932 0.882 0.904 * For complete case analysis, the average sample size is 1099. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 37 Table 22: Scenario 15: RR=2, ρ = 0.6 and outcome predictor of missingness but not included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.529 0.000 0.139 0.129 0.048 0.064 0.056 Variance 0.004 0.006 0.014 0.008 0.007 0.006 0.006 Empirical variance 0.004 0.006 0.014 0.006 0.006 0.006 0.006 Coverage rate 0.000 0.953 0.763 0.696 0.929 0.878 0.901 OR Bias 0.924 0.001 0.144 0.207 0.081 0.107 0.092 Variance 0.011 0.016 0.039 0.021 0.018 0.017 0.017 Empirical variance 0.011 0.016 0.041 0.018 0.016 0.016 0.016 Coverage rate 0.000 0.953 0.897 0.716 0.927 0.872 0.894 Risk difference Bias 0.199 0.001 0.016 0.045 0.018 0.024 0.021 Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.948 0.936 0.737 0.925 0.869 0.894 * For complete case analysis, the average sample size is 1074. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. Table 23: Scenario 16: RR=2, ρ = 0.6 and outcome independent of missingness and not included in the imputation model. Crude Full CC* MP MIte MIps MIpar RR Bias 0.530 0.001 0.139 0.057 0.051 0.065 0.058 Variance 0.004 0.006 0.016 0.008 0.007 0.006 0.006 Empirical variance 0.004 0.006 0.018 0.007 0.006 0.006 0.006 Coverage rate 0.000 0.944 0.795 0.918 0.923 0.869 0.891 OR Bias 0.926 0.003 0.077 0.090 0.086 0.105 0.093 Variance 0.011 0.016 0.039 0.021 0.018 0.016 0.017 Empirical variance 0.011 0.017 0.042 0.018 0.016 0.017 0.017 Coverage rate 0.000 0.945 0.928 0.925 0.921 0.875 0.894 Risk difference Bias 0.199 0.001 -0.015 0.019 0.019 0.023 0.020 Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.945 0.912 0.927 0.919 0.877 0.894 * For complete case analysis, the average sample size is 1099. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 38 5.3: Impact of the sample size Table 24: Results for one scenario (RR=2, ρ = 0.6 and outcome predictor of missingness and included in the imputation model) with n = 500. Crude Full CC* MP MIte MIps MIpar RR Bias 0.442 0.007 0.110 0.153 0.010 0.038 0.024 Variance 0.017 0.022 0.050 0.029 0.026 0.022 0.023 Empirical variance 0.018 0.024 0.059 0.027 0.025 0.024 0.025 Coverage rate 0.065 0.940 0.887 0.855 0.955 0.939 0.943 OR Bias 0.753 0.015 0.145 0.244 0.020 0.066 0.041 Variance 0.044 0.057 0.139 0.077 0.065 0.058 0.058 Empirical variance 0.045 0.061 0.168 0.071 0.063 0.062 0.063 Coverage rate 0.047 0.942 0.913 0.865 0.952 0.934 0.937 Risk difference Bias 0.162 0.002 0.021 0.051 0.003 0.013 0.008 Variance 0.002 0.003 0.007 0.004 0.003 0.003 0.003 Empirical variance 0.002 0.003 0.009 0.004 0.003 0.003 0.003 Coverage rate 0.046 0.934 0.902 0.869 0.947 0.928 0.929 * For complete case analysis, the average sample size is 269. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 39 5.4: Impact of the missingness rate Table 25: Results for one scenario (RR=2, ρ = 0.6 and outcome predictor of missingness and included in the imputation model) with 10% of data missing for X1 and X3. Crude Full CC* MP MIte MIps MIpar RR Bias 0.529 0.000 0.066 0.061 0.001 0.009 0.005 Variance 0.004 0.006 0.007 0.008 0.006 0.006 0.006 Empirical variance 0.004 0.006 0.008 0.007 0.006 0.006 0.006 Coverage rate 0.000 0.946 0.874 0.907 0.951 0.945 0.947 OR Bias 0.924 0.002 0.073 0.098 0.003 0.015 0.009 Variance 0.011 0.016 0.020 0.021 0.017 0.016 0.016 Empirical variance 0.012 0.017 0.021 0.018 0.017 0.017 0.017 Coverage rate 0.000 0.944 0.913 0.904 0.949 0.945 0.946 Risk difference Bias 0.199 0.001 0.010 0.021 0.001 0.004 0.002 Variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.942 0.933 0.905 0.948 0.943 0.944 * For complete case analysis, the average sample size is 1609. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. Table 26: Results for one scenario (RR=2, ρ = 0.6 and outcome predictor of missingness and included in the imputation model) with 60% of data missing for X1 and X3. Crude Full CC* MP MIte MIps MIpar RR Bias 0.530 0.001 0.184 0.175 0.010 0.063 0.037 Variance 0.004 0.006 0.074 0.008 0.008 0.006 0.006 Empirical variance 0.004 0.006 0.101 0.007 0.007 0.006 0.007 Coverage rate 0.000 0.945 0.802 0.490 0.962 0.869 0.916 OR Bias 0.925 0.004 0.184 0.285 0.018 0.105 0.060 Variance 0.011 0.016 0.208 0.025 0.021 0.016 0.017 Empirical variance 0.012 0.017 0.285 0.022 0.019 0.017 0.018 Coverage rate 0.000 0.944 0.889 0.544 0.963 0.863 0.916 Risk difference Bias 0.199 0.001 0.010 0.062 0.004 0.023 0.013 Variance 0.000 0.001 0.011 0.001 0.001 0.001 0.001 Empirical variance 0.000 0.001 0.014 0.001 0.001 0.001 0.001 Coverage rate 0.000 0.945 0.867 0.578 0.963 0.862 0.913 * For complete case analysis, the average sample size is 399. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 40 5.5: Number of imputed datasets Table 27: Bias of the log(RR), its variance and coverage rate for the 3 MI approaches according the number M of imputed datasets for one scenario (RR=2, ρ = 0.6, outcome predictor of missingness and included in the imputation model). MIte MIps MIpar M=5 M=10 M=20 M=5 M=10 M=20 M=5 M=10 M=20 Bias 0.002 0.005 0.003 0.026 0.028 0.031 0.015 0.017 0.018 Variance 0.006 0.007 0.006 0.006 0.006 0.006 0.006 0.006 0.006 Empirical variance 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 Coverage rate 0.957 0.957 0.953 0.931 0.932 0.924 0.940 0.942 0.937 MIte: treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 41 5.6: Partially observed outcome and treatment indicator Table 28: Results for one scenario (RR=2, ρ = 0.6 and outcome predictor of missingness and included in the imputation model) with 30% of data missing for X1, X3, the outcome Y and the treatment indicator Z. Crude Full CC* MIte RR Bias 0.530 0.002 0.207 0.000 Variance 0.004 0.006 0.032 0.014 Empirical variance 0.004 0.006 0.034 0.012 Coverage rate 0.000 0.949 0.770 0.956 OR Bias 0.926 0.005 0.164 -0.002 Variance 0.011 0.016 0.078 0.034 Empirical variance 0.011 0.016 0.083 0.031 Coverage rate 0.000 0.947 0.906 0.952 Risk difference Bias 0.200 0.001 -0.002 -0.001 Variance 0.000 0.001 0.004 0.002 Empirical variance 0.000 0.001 0.004 0.002 Coverage rate 0.000 0.950 0.923 0.953 * For complete case analysis, the average sample size is 635. RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple imputation. 42 Appendix 6: PS distribution in the simulation study Figure 6: Distribution of the propensity score in both groups in our simulation study. These distributions are obtained from one dataset of size n=1000000. 43