Propensity score analysis with partially observed covariates: how
should multiple imputation be used?
Cle´mence Leyrat 1, Shaun R. Seaman 2, Ian R. White 2,
Ian Douglas 3, Liam Smeeth 3, Joseph Kim 1,4,
Matthieu Resche-Rigon 5,6, James R. Carpenter 1,7 and Elizabeth J. Williamson1,8
May 11, 2017
1Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
2MRC Biostatistics Unit, Cambridge CB2 0SR, UK
3Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
4IMS Health, Real-World Evidence Solutions, London N1 9JY , UK
5SBIM Biostatistics and Medical information, Hpital Saint-Louis, APHP, 75010 Paris, France
6ECSTRA Team (Epide´miologie Clinique et Statistiques pour la Recherche en Sante´), UMR 1153 INSERM, Universite´ Paris
Diderot, Sorbonne Paris Cite´, 75010 Paris, France
7MRC Clinical Trials Unit at UCL, London WC2B 6NH, UK
8Farr Institute of Health Informatics, London NW1 2DA, UK
Corresponding author: Cle´mence Leyrat, London School of Hygiene and Tropical Medicine. Department of Medical Statistics.
Keppel Street, London WC1E 7HT. UK
clemence.leyrat@lshtm.ac.uk
Abstract
Inverse probability of treatment weighting (IPTW) is a popular propensity score (PS)-based approach to
estimate marginal treatment effects in observational studies at risk of confounding bias. A major issue when
estimating the PS is the presence of partially observed covariates. Multiple imputation (MI) is a natural
approach to handle missing data on covariates: covariates are imputed and a PS analysis is performed in
each imputed dataset to estimate the treatment effect. The treatment effect estimates from each imputed
dataset are then combined to obtain an overall estimate. We call this method MIte. However, an alternative
approach has been proposed, in which the PSs are combined across the imputed datasets (MIps). Therefore,
there are remaining uncertainties about how to implement MI for PS analysis: (i) should we apply Rubin’s
rules to the IPTW treatment effect estimates or to the PS estimates themselves? (ii) does the outcome have
to be included in the imputation model? (iii) how should we estimate the variance of the IPTW estimator
after MI? We studied the consistency and balancing properties of the MIte and MIps estimators and per-
formed a simulation study to empirically assess their performance for the analysis of a binary outcome. We
also compared the performance of these methods to complete case (CC) analysis and the missingness pattern
(MP) approach, which uses a different PS model for each pattern of missingness, and a third MI approach in
which the PS parameters are combined rather than the PSs themselves (MIpar). Under a missing at random
(MAR) mechanism, CC and MP analyses were biased in most cases for estimating the marginal treatment
effect, whereas MI approaches were approximately unbiased as long as the outcome was included in the
imputation model. Only MIte was unbiased in all the studied scenarios and Rubin’s rules provided good
variance estimates for MIte. The PS estimated in the MIte approach showed good balancing properties. In
conclusion, when using MI in the IPTW context, MIte with the outcome included in the imputation model
is the preferred approach.
Keywords: missing covariates, chained equations, Rubin’s rules, inverse probability of treatment weighting,
missingness pattern
1
1 Introduction
Data from observational studies provide useful information to address health-related questions and notably
estimate treatment effects in real settings [1]. However, because individuals are not randomised, the treatment
groups are often not comparable, which may lead to confounding bias [2] if these studies are analyzed without
appropriate adjustment for confounding. Propensity scores (PS) have been proposed as a means to recover bal-
ance between groups on observed covariates and so obtain a consistent estimate of the causal treatment effect
[3]. The PS is defined as the individual’s probability of receiving the treatment rather than the control given
their baseline characteristics [4]. One popular method to achieve covariate balance between treatment groups
is to weight individuals by the inverse of their PS. This approach, known as inverse probability of treatment
weighting (IPTW) [5] aims to emulate the sample that would have been observed in a randomised trial.
In practice, a major issue when estimating the PS is the presence of partially observed covariates, as the
PS cannot be estimated for individuals with at least one missing covariate value. Standard analyses include
complete case analysis [6] or the missingness pattern approach [7], but a popular alternative to handle missing
data is multiple imputation (MI) [8]. However, there remain four important unresolved questions about how to
implement MI for PS analysis.
First, two approaches to combine information from the imputed datasets have been proposed in the context of
PS analyses: combining the treatment effects estimated on each imputed dataset (method called MIte hereafter)
or combining for each individual their PS value across the imputed datasets (method called MIps hereafter)
[9, 10]. MIte intuitively seems to adhere best to the MI philosophy by applying the full analysis strategy on
each imputed dataset. Furthermore, Seaman and White [11] proved that for an infinite number of imputations,
the MIte estimator is consistent. However, Mitra and Reiter [9] published recommendations on the use of MI
for PS analysis, and they advocated using MIps, rather than MIte, for PS matching.
The second question is whether or not to include the outcome in the imputation model for the missing covari-
ates. Mitra and Reiter did not include it in their study [9], which could explain the bias they observed for both
MIte and MIps. One of the advantages claimed for the PS approach in general is that it allows the investigator
to use the data on the treatment and covariates to develop a well-fitting PS model without needing to look at
the data on the outcome [9]. This makes it possible to avoid the temptation to search for a PS model that gives
a significant treatment effect estimate, a temptation which may be experienced by an analysis which handles
confounding by using regression adjustment. Intuition may therefore lead one to believe that imputation of
missing covariates should also be done without using the outcome variable [9]. However, this intuition conflicts
with advice to include the outcome when imputing missing covariates in a regression model whose parameters
are the quantities of interest [12].
The third question is how to estimate the variance of the IPTW estimator after MI. When pooling the treatment
effects (MIte), Seaman and White [11] showed that Rubin’s rule for estimating the variance performs well in
practice, although theoretical justification for Rubin’s rules relies on the parameter of interest being estimated
with maximum likelihood, which is not the case for the IPTW method. For MIps, there is, to our knowledge,
no variance estimator that takes into account both the uncertainty due to the estimation of the PS given the
complete data and that due to the imputation of the missing data.
Finally, the fourth question is how well MI performs in comparison to other popular approaches to handle miss-
ing data, namely complete case (CC) analysis or the missingness pattern (MP) approach. Qu and Lipkovitch
[13] and Seaman and White [11] assessed the performance of MIte additionally including the missingness pattern
indicator in the PS model, but they did not compare this approach to other combination rules after MI or to
the MP approach alone.
In this article, we show that the best method when using MI with IPTW is to include the outcome in the
PS model and use Rubin’s rules to combine the treatment effect estimates from the imputed datasets (the
method MIte). An informal survey of recent articles using MI with IPTW revealed that suboptimal methods
2
are commonly being used. In particular, several studies have following Mitra and Reiter’s recommendation to
use MIps rather than MIte, and in several of these studies imputation is done without including the outcome
in the imputation model (e.g. [14, 15, 16, 17, 18, 19, 20]). Another example of suboptimal MI is that of Hayes
and Groner [21], who randomly selected one imputation for each individual and estimated that individual’s PS
based on just this single imputation. Therefore, we conducted this work to address the unresolved questions
and provide new recommendations.
This paper is organised as follows: we present a motivating example looking at the effect of statins on short-
term mortality after pneumonia in Section 2. A description of IPTW and its underlying assumptions is given
in Section 3. Section 4 presents different strategies to handle partially missing covariates for PS analysis. The
consistency and balancing properties of the MI approaches are studied in Section 5 and empirically assessed
in Sections 6 and 7 through a simulation study. The application of these approaches to the statin motivating
example is presented in Section 8, followed by a discussion in Section 9.
2 Motivating example: effect of statin use on short-term mortality
after pneumonia
To illustrate the importance of an adequate handling of missing covariates for PS analysis, we focused on a
published study of the effect of statin use on short term mortality after pneumonia [22]. We utilised the THIN
database, which consists of anonymised patient records from general practitioners (GPs) in the UK. As of
the end of 2015, the database represented 3.5 million unique active patients, or approximately 6% of the UK
population. The database has been found to be broadly representative of the UK population, and the validity of
recorded information has been established in previous studies [23, 24]. Douglas et al carried out an analysis of
9073 patients who had a pneumonia episode, of whom 1398 were under statin treatment when pneumonia was
diagnosed. In the statin group, 305 patients (21.8%) died within 6 months, while 2839 (37.0%) of the non-users
died within 6 months. However, statin users and non users were very different in terms of characteristics, in
particular on characteristics associated with mortality.
In Douglas et al., propensity scores were used to recover balance between groups. However, three impor-
tant potential confounders were only partially observed: body mass index (BMI), smoking status and alcohol
consumption, with respectively 19.2%, 6.2% and 18.5% of missing data. In the original analysis, a missing
indicator method was used. This approach is similar to the missingness pattern approach described later in this
paper. We will explore whether handling the missing data using MI gives results that differ from those given
by Douglas et al.
3 Inverse probability of treatment weighting
3.1 Propensity score estimation and assumptions
Propensity scores (PS) have become a major tool in causal inference to estimate the causal effect of a binary
treatment Z (Z=1 if treated, Z=0 otherwise) on an outcome in the presence of confounding [3]. The PS is
the individual’s probability of receiving the treatment conditional on the individual’s values on a set of baseline
covariates X [25]. The PS is usually estimated from the data using a logistic regression model which predicts
each individual’s probability of receiving the treatment from their baseline covariates [26]. The PS approach is
best understood using the potential outcomes framework [27], in which the causal effect of the treatment is de-
fined as the difference between the two potential outcomes Y Z=0 and Y Z=1, which are the outcomes that would
have been observed if an individual had been not treated and treated, respectively (this notation was invented
in the context of randomised experiments). Three assumptions are usually made to consistently estimate the
causal effect of a treatment: (i) positivity [28], (ii) Stable unit treatment value assumption (SUTVA) [29] and
3
(iii) strongly ignorable treatment assignment (SITA). Assumptions (i) and (ii) mean that each individual has
a non-null probability of receiving either treatment and has only one possible potential outcome value for each
treatment, respectively. Assumption (iii) implies that there are no unmeasured confounders.
The key property of the PS is that it is a balancing score. That is, if assumptions (i)-(iii) are valid and the
PS model is correctly specified, the variables included in the PS model are balanced between treatment groups
at any level of the PS, i.e. they have the same conditional distribution in both groups given the PS. This
leads to the consistency of PS-based estimators. Initially, three different PS-based approaches were proposed
[3]: PS matching, subclassification and covariate adjustment. Although PS matching is the most common
approach nowadays, matching often discards a substantial number of individuals from the analysis [30] and
variance estimation after PS matching is not straightforward [31]. The two other approaches have drawbacks
as well: residual bias due to heterogeneity within strata can remain with subclassification [7], whereas covariate
adjustment can be biased in some circumstances [32]. Thus, we focus on a fourth approach [33]: inverse
probability of treatment weighting (IPTW).
3.2 IPTW estimator and its variance for complete data
IPTW aims to create a pseudo-population similar to a randomised trial by weighting the individuals by the
inverse of their probability of receiving the treatment they actually received (i.e. eˆ−1i for treated individuals
and (1− eˆi)−1 for untreated individuals). Thus, the IPTW estimators for the marginal proportions for a binary
outcome Y , µ1 and µ0, among the treated and the untreated are [34]:
µˆ1 =
(
n∑
i=1
YiZi
eˆi
)(
n∑
i=1
Zi
eˆi
)−1
, µˆ0 =
(
n∑
i=1
Yi(1− Zi)
1− eˆi
)(
n∑
i=1
1− Zi
1− eˆi
)−1
. (1)
Zi is the treatment indicator for individual i (Zi = 1 if treated, 0 otherwise) and Yi is their outcome value.
From these two marginal estimates, it is possible to estimate a relative risk
(
µˆ1
µˆ0
)
, an odds ratio
(
µˆ1/(1−µˆ1)
µˆ0/(1−µˆ0)
)
or
a risk difference (µˆ1 − µˆ0) for a binary outcome.
The IPTW estimator, as with other PS-based estimators, is a ”two-step estimator”. If the uncertainty linked
to the PS estimation in the first step is not taken into account in the second step (treatment effect estimation),
the repeated sampling variance of the treatment effect will be overestimated and inference will be conservative
[31]. Lunceford and Davidian [35] and Williamson et al. [34] proposed a large-sample variance estimator for
the IPTW treatment effect estimator in which a correction term including the variance/covariance matrix of
the estimated PS parameters is applied.
4 Handling missing data in propensity score analysis
A major issue in PS estimation is the presence of partially observed covariates. In this section, we describe five
methods for applying IPTW to incomplete data. We assume the treatment status Z and outcome Y are fully
observed.
4.1 Complete case analysis
In complete case analysis, the PS is estimated only within the subgroup of individuals with observed values
for all of the variables included in the PS model, and only these individuals contribute to the estimation of
the treatment effect [36]. Although the complete case analysis is known to provide an unbiased estimate of
the parameters of an outcome regression model when the missingness is independent of the outcome [6], little
is known about complete case analysis for IPTW. Moreover, excluding individuals with missing covariates can
reduce statistical power, because no use is made of partially observed records.
4
4.2 Missingness pattern approach
Rosenbaum and Rubin [7] defined the generalized PS eˆ∗ as the probability of receiving the treatment given
the observed covariates and the pattern of missing data. In practice, the PS is estimated separately in each
stratum defined by missingness patterns using the covariates observed in that stratum, as long as the sample
size is large enough in each stratum. When treatment allocation is independent of the potential outcomes given
the observed covariates and the missingness pattern, the generalized propensity score balances the missingness
indicators and the observed component of the partially observed covariates but, not surprisingly, may not
balance the unobserved component [37].
4.3 Multiple imputation
The principle of multiple imputation (MI) is to generate multiple sets of plausible values for the missing variables
by drawing from the posterior predictive distribution of these variables given the observed data. Variables of
different types are often included in the PS model. Therefore, we focused on chained equations, in which a
specific imputation model, including the outcome, is specified for each partially observed variable, rather than
joint modelling to impute the missing data [38]. M complete datasets are created and analysed independently
to produce estimates θˆk, (k = 1, ...,M) of θ the vector of the parameters of interest (eg. regression coefficients)
and estimates W k of their associated variance matrix. Then θˆk and W k , (k = 1, ...,M) are combined across
the M imputed datasets. Rubin’s rules for the mean and variance state that an overall estimate, θˆMI , of θ and
an estimate of the variance of θˆMI , V̂ ar(θˆMI), are [8]:
θˆMI =
1
M
M∑
k=1
θˆk, V̂ ar(θˆMI) = W +
(
1 +
1
M
)
B, (2)
where W is the within-imputation variance-covariance matrix, which reflects the variability of the parameter
estimates in each imputed dataset, and B is the between-imputation variance matrix reflecting the variability
in the estimates caused by the missing information. These two components are defined as:
W =
1
M
M∑
k=1
W k, B =
1
M − 1
M∑
k=1
(
θˆk − θˆMI
)2
. (3)
Three variants of MI are MIte, MIps and MIpar. For complete data, the IPTW method involves three steps:
i) estimation of the parameters α in the PS model; ii) calculation of individual PS’s; and iii) use of equation
(1) to calculate µˆ0 and µˆ1. In MIte, all three steps are applied to each of the M imputed datasets separately.
Then the M treatment effect estimates obtained are averaged. In MIps, the method recommended by Mitra
and Reiter [9], the first two steps are applied to each imputed dataset separately. Then the M PS’s for each
individual are averaged and equation (1) is applied once using these average PS’s. In MIpar, the first step is
applied separately to each of the imputed datasets. The M values of α and each individual’s covariates Xi are
then averaged over the M datasets. Each individual’s PS is then estimated as:
eˆi (x¯i) =
expα¯x¯i
1 + expα¯x¯i
, (4)
(i = 1, ..., n) with α¯ the (p + 1) vector of the average PS parameter values (for the p covariates and the
intercept) and x¯i a (p+ 1) vector of the average p covariates across imputed datasets for individual i. Whereas
the MIps estimate of treatment effect is based on the average PS, e¯i (xi), the MIpar estimate is based on the
PS corresponding to the average of the covariates, eˆi (x¯i). Rubin’s Rules are based on the assumption that θˆk
is approximately normally distributed, and the distribution of αˆk might be expected to be more approximately
normal than the distribution of a PS (which is constrained to lie in [0,1]). The 3 MI approaches are illustrated
in Figure 1. To obtain a variance estimator for the MIte estimator, Rubin’s rule for the variance (above) can be
applied to the standard IPTW variance estimator for full data which takes account of the PS estimation. For
5
MIps and MIpar, because the PS is obtained from M imputations, the standard variance estimator for IPTW is
no longer valid, since it does not take into account the uncertainty due to missing data. A large-sample estimate
of the variance for MIpar, derived from [34], is given in Appendix 1.
5 Balancing properties and consistency of IPTW estimator after MI
Without missing data, Rosenbaum and Rubin showed that the PS is a balancing score [3]. A balancing score
b(x) is defined as a function of the observed covariates x such that the conditional distribution of x given b(x)
is the same for Z = 0 and Z = 1. Moreover, Rosenbaum and Rubin showed that any balancing score b(x)
is ’finer’ than the true PS, that is e(x) = f {b(x)}, for some function f(.). The consistency of PS estimators
comes from this balancing property. Lunceford and Davidian [35] studied theoretical properties of the IPTW
estimator when data are complete and gave a proof of consistency of this estimator. In this section, we study
the consistency of the IPTW estimators obtained from MIte, MIps and MIpar and how this relates to balancing
properties of the PS models used in these approaches. We suppose hereafter that (i) the SITA assumption
required for IPTW (see 3.1) holds, (ii) the missing data are missing at random (MAR) and (iii) the imputation
model is correctly specified. For simplicity, we consider here the estimation of θ = E[Y Z=1].
5.1 Combining the treatment effects after MI (MIte)
Let X, the vector of covariates, be split into observed and missing components, X = (Xobs,Xmiss). X
(k)
m is the
imputed value of Xmiss in the k
th imputed dataset (k = 1, ..,M). We show (see Appendix 2a) that in each
imputed dataset:
e(Xobs,X
(k)
m ) = E[Z|Xobs,X(k)m ]. (5)
If X
(k)
m is imputed from the true model (i.e. correctly specified at the true parameter values), we can also
show (Appendix 2b) that a SITA-type assumption holds in each imputed dataset, i.e.:
Y z=1 ⊥ Z |Xobs,X(k)m and Y z=0 ⊥ Z |Xobs,X(k)m
These two assumptions are the imputed-data version of what Imbens calls weak unconfoundedness [39]. Note
that we do not have the analogue of the usual, stronger, assumption (Y z=1, Y z=0) ⊥ Z |Xobs,X(k)m , which
requires the treatment to be independent of the set of potential outcomes. This is because our imputation
model is a model for Xmiss|Z = z, Y Z=z,Xobs. The stronger assumption would require our imputation model
to capture Xmiss|Z = z, Y z=0, Y z=1,Xobs. However, it is important to note, as Imbens does, that the weak
unconfoundedness suffices to obtain unbiased estimates of the causal treatment effect.
Balancing properties: We show in Appendix 2 that Xobs ⊥ Z | e(Xobs,X(k)m ) and X(k)m ⊥ Z | e(Xobs,X(k)m ).
Thus the true PS in each completed dataset balance both the unobserved and the imputed values of the covariates
across treatment groups. This balancing property is what leads to consistency of the MIte estimator.
Consistency: Seaman and White [11] proved that for an infinite number of imputations, the estimator
obtained by combining the treatment effects after MI (MIte) is consistent. To understand how this consistency
relates to the SITA-type assumption above, it is helpful to consider the following expectation:
E
[
Y Z
e(Xobs,X
(k)
m )
]
= E
[
E
[
Y Z
e(Xobs,X
(k)
m )
∣∣∣∣∣Xobs,X(k)m
]]
= E
[
E[Y z=1|Xobs,X(k)m ] E[Z|Xobs,X(k)m ]
e(Xobs,X
(k)
m )
]
(6)
= E[E[Y z=1|Xobs,X(k)m ]] (7)
= E[Y z=1] = θ,
6
Step 6 requires the SITA-type assumption 6. Step 7 relies on PS in the kth imputed dataset being equal to the
probability of being treated given the observed and imputed part of the covariates (equation 5).
5.2 Combining the PS or the PS parameters after MI (MIps and MIpar)
For PS methods, consistency comes from the ability of PS to balance covariates between groups. MIps and
MIpar create a single overall PS used to estimate the treatment effect. Thus, consistency for these methods
would rely on the ability of these overall PS to balance both the observed and the missing parts of the covariates
in the original incomplete dataset. Rosenbaum and Rubin’s results show that this can happen only if the single
PS (as estimated in MIps or MIpar) is ‘finer’ than the true (full data) PS (in other words, if the true PS is
a function of the single pooled PS) [3]. However, when combining the PS or the PS parameters, the overall
PS used for the analysis is not a function of the observed covariates but rather is a function of the average
covariates across imputed datasets, including the imputed values. Thus, the pooled PS (as estimated either in
MIps or MIpar) is not ‘finer’ than the true PS according to Rosenbaum and Rubins definition (i.e. the true
PS is not a function of the pooled PS). Consequently, it cannot be a balancing score. Thus, neither θˆMIps nor
θˆMIpar are consistent estimators. We illustrate the lack of consistency with a counter example in Appendix 3.
We also discuss the balancing properties of the MP approach in Appendix 4.
6 Simulation study design
The aim of this simulation study is to assess the performance of the three MI approaches, complete case analysis
and missingness pattern approach for IPTW when the outcome is binary and to compare them with non-MI
methods.
Data generation mechanisms
We generated datasets of sample size n = 2000, reflecting an observational study comparing a treatment Z = 1 to
a control treatment Z = 0 on a fully observed binary outcome Y with 3 measured covariates X = (X1, X2, X3).
X1 and X2 were continuous and X3 was binary. X2 was fully observed whereas X1 and X3 were partially
observed. We generated the data as follows:
• Covariates: The 3 covariates X = (X1, X2, X3) are generated from a multivariate normal distribution
X ∼ N3(0,Σ), with Σii = 1 and Σij = ρ for i 6= j. X3 is then dichotomised according to a threshold of 0
to obtain a prevalence of 0.5.
• Treatment assignment depends on X according to the following model:
logit(p(Z = 1|x)) = −1.15 + 0.7x1 + 0.6x2 + 0.6x3. (8)
These coefficients give E(Z) = pZ = 0.3 and an important imbalance on covariates between treatment
groups, as shown in Appendix 6.
• Binary outcome: The outcome depends on the 3 covariates and the treatment received according the
following model:
logit(p(Y = 1|Z,x)) = −1.5 + 0.5x1 + 0.5x2 + 0.3x3 + θcZ. (9)
In this model, exp θc is the conditional odds ratio (OR). We used the method described in [40] to find the
value of θc that gives the desired relative risk (RR).
• Missingness mechanism: In this simulation study, we consider a missing at random (MAR) mechanism.
The missingness of each partially observed covariate (X1 and X3) depends on the fully observed covariate
7
X2, the treatment received Z and the outcome Y :
logit(p(M1 = 1|Z, x1, x2, x3, y)) = γ0 + z + x2 + γY y, (10)
logit(p(M3 = 1|Z, x1, x2, x3, y)) = γ0 + z + x2 + γY y, (11)
where M1 and M3 are the missingness indicators for X1 and X3 (equal to 1 if the value is missing),
respectively.
The values considered for each simulation parameter are in Table 1. A full factorial design (2× 2× 2) leads
to 8 scenarios, but 5 scenarios are added as a sensitivity analysis (Table 2).
Methods
For each studied scenario, 5000 datasets were simulated. We compared 9 methods, all of them using IPTW:
treatment effect were estimated on the full dataset (before the imposition of missingness) and using CC, MP, and
the 3 MI approaches (MIte, MIps and MIpar). For each of the 3 MI approaches, we considered two imputation
models, including or not the outcome and we generated M = 10 imputed datasets. Simulations were performed
in R and the mi package was used for multiple imputation [41], based on full conditional specification (FCS).
Estimands
We focused on 3 estimands: the log(RR), log(OR) and the risk difference (RD). Rubin’s rules were applied to
the logarithm of the RRs (or the ORs), rather than to to RRs (or ORs) themselves, because the asymptotic
normal approximation is likely to be better for the log RR (or log OR) than for the RR (or OR). This is
important in particular when constructing confidence intervals, because log(OR) and log(RR) have unrestricted
parameter spaces [42].
Measures of performance
For each method, we computed the absolute bias of the mean treatment effect. We also estimated the variance
of the treatment effect, as well as the empirical variance, the coverage rate and the standardized differences of
the covariates after IPTW.
The standardized differences for each covariate were used as a measure of method performance. In the absence
of weighting, standardized differences are defined as:
SDiff =
100× ∣∣X¯1 − X¯0∣∣√
sˆ21+sˆ
2
0
2
, (12)
with X¯0, X¯1, sˆ
2
0 and sˆ
2
1 denoting the average value (or proportion if the covariate is binary) for the covariate and
its estimated variance in the control and treatment group, respectively. After IPTW, standardized differences
are calculated by replacing the unweighted means and variances in (12) by their weighted equivalents (weighted
by inverse PS). For the MIte approach, standardized differences were calculated using the PS estimated from
each imputed dataset, both to assess the balance on the originally simulated complete dataset (before imposing
missingness) and on the given imputed dataset. For MIps and MIpar, standardized differences were calculated
using the pooled PS to assess balance on (i) the original dataset, (ii) on the average value of the covariates
across the imputed datasets. For (ii) we also calculated the standardized differences separately on the observed
part of the covariates and the average imputed part.
8
7 Simulation study results
7.1 Main simulation study
Because results were similar for the three measures of interest, we present the results for relative risks (RR) on
the log scale only in the main text, while results for odds ratios and risk differences are in the appendices.
7.1.1 Bias
The absolute bias of the log(RR) of the treatment, for ρ = 0.6, is presented in Figure 2. Since results for ρ = 0.3
are similar, they are presented in the Appendices.
Full data, CC and MP analyses: As expected, the IPTW estimator on the full data (before generating
missingness for X1 and X3) is approximately unbiased and the complete case (CC) estimator is strongly biased
in all scenarios except those where the outcome is not associated with missingness and there is no treatment
effect. The MP approach is always biased in the situations considered, with a bias which can be even stronger
than that which is observed for the CC approach. The reason for this is an incorrect PS model specification in
each pattern of missingness: in the strata in which a covariate is not observed, the covariate is omitted from
the model.
Multiple imputation: First, the results show that the imputation model must include the outcome, even
if the outcome is not a predictor of missingness. All 3 MI estimators are strongly biased in all scenarios when
the outcome is not included in the imputation model. Second, when the outcome is included in the imputation
model, the 3 MI approaches lead to a decrease in bias relative to the crude analysis. However, only the MIte
approach leads to an unbiased estimate in the 8 main scenarios. Combining the PS parameters to estimate
the PS of the average covariates (MIpar) performed better than combining the PS themselves, but both these
approaches are slightly biased.
7.1.2 Standardized differences between groups
The bias observed for the MP, MIps and MIpar methods can be explained by a remaining imbalance on the
covariates between groups. Standardized differences for each covariate are in Table 3. A covariate is usually
considered adequately balanced if its standardized difference is < 10%. IPTW on the full data achieved a very
good balance between groups on the 3 covariates (as expected, standardized difference < 5% for each of the
3 covariates). For the CC approach, groups were balanced but the bias occurs since excluded individuals are
different from included individuals on confounding factors. The PS obtained from the MP approaches balanced
the observed part of the covariates, but not the unobserved part. This means that within each pattern of
missingness, treated and untreated individuals are balanced for the covariates included in the PS model, but
unbalanced on the missing covariate because this covariate is an unmeasured confounder in the PS model.
Thus, when the missingness rate increases, imbalances (and consequently, bias of the treatment effect estimate)
increase. MIte performs well because the PS within each imputed dataset balances the observed and imputed
components of the covariates in that imputed dataset (see Table 3). Conversely, MIps and MIpar could only be
consistent by balancing the observed and missing covariates, which is not the case, as seen in Table 3.
7.1.3 Coverage rate and standard errors
Figure 3 displays the coverage rate for each method when the outcome is included in the imputation model.
9
Each boxplot represents the coverage distribution for the 8 main scenarios. Because the CC and MP approaches
are strongly biased, their coverage rates are not relevant. The coverage rate for the MIte approach is close to
the nominal value of 95%, confirming that Rubin’s rules perform well in this context provided that the within-
imputation variance estimation takes into account the uncertainty in PS estimation.
Table 1 in Appendix 5.1 shows the mean estimates from different variance estimators for each method when
the outcome is included in the imputation model. For MIte one could apply Rubin’s rules to a variance estimator
that ignores the PS estimation (treating the PS weights as being known) or to a variance estimator that accounts
for the PS estimation. We know that failing to account for the PS estimation results in an overestimation of
the variance in the full-data context [43]. Table in Appendix 5.1 shows that for the analysis on full data and for
MIte, the variances accounting for PS estimation are close to the empirical variance, whereas an estimator not
taking the uncertainty in PSs estimation tends to overestimate the variance for these approaches, as expected.
For MIps and MIpar, one could estimate the variance accounting for the variability linked to PS estimation but
not the imputation procedure by using the pooled PS in the IPTW variance estimator or one could incorporate
the PS estimation and imputation by applying the variance estimator we proposed in Appendix 1. For MIps
and MIpar, there was little difference between the variance estimates accounting for the imputation and those
that did not. Surprisingly, in our simulations, additionally accounting for the imputation procedure resulted
in a lower variance estimator. This can be explained as follows: the within imputation variance of the PS
parameters (reflecting the correlation between the covariates and treatment; the higher this is the larger gain
in precision for IPTW) is higher than the between imputation variance component (noise due to missing data).
However, when the missingness rate increases, the variance that correctly accounts for the imputation procedure
is typically higher than the variance which ignores the imputation, because of a larger heterogeneity between
imputed datasets.
7.2 Sensitivity simulation study
Appendix 5.3 presents the results of one scenario with a non-null treatment effect with a smaller sample size
(n = 500). Results were similar in terms of bias for n = 500 and n = 2000. Because the variance estimator
for IPTW has been developed for large samples, we observed slightly underestimated variances for the full data
analysis, MIps and MIpar. This underestimation is more pronounced in the CC analysis because the sample
for the analysis is even smaller (269 on average when n = 500).
Figure 4 shows the bias when 10% or 60% of each partially observed covariate is missing. Full results are
presented in Appendix 5.4. For a low missingness rate, the CC and MP approaches are still biased but the 3 MI
approaches corrected the bias. For a missingness rate of about 60% for each covariate, only the MIte approach
showed good performance in terms of bias reduction, confirming the good statistical properties of this approach
even with this large amount of missing data.
In our sensitivity simulation study, increasing the number of imputed datasets did not strongly impact the
results in terms of bias or variance (See Appendix 5.5).
8 Application to the motivating example
We applied CC analysis, the MP approach and the three MI strategies to estimate the effect of statin treatment
on mortality after pneumonia from our motivating example dataset. For simplicity, we analysed the primary
outcome, mortality within 6 months, as a binary outcome, and estimated the corresponding relative risk and
its 95% confidence interval. For each approach, IPTW was used to account for the confounding. We focused
the analysis on the 7158 patients without coronary heart disease. The propensity score was estimated from
a logistic regression modeling statin use as a function of the following covariates: age, sex, body mass index
(BMI), alcohol consumption, smoking status, diabetes, cardiovascular disease, circulatory disease, heart fail-
ure, dementia, cancer, hyperlipidaemia, hypertension and prescription of antipsychotics, hormone replacement
therapy, antidepressants, steroids, nitrates, beta-blockers, diuretics, anticoagulants and use of antihypertensive
10
drugs. The imbalance between the study groups is illustrated in Figure 5. Complete case (CC) analysis was
conducted on the 5168 individuals with complete records. For the missingness pattern approach (MP), 8 pat-
terns were identified. However, some of these patterns were very rare. For instance, only 6 individuals had only
the smoking status missing, and only 8 had both smoking status and BMI missing. Thus, we considered only
4 groups:
• complete records (n=5168) for which all the covariates listed above are included
• individuals with only the alcohol consumption missing (n=455)
• individuals with only BMI missing (n=575)
• individuals with the smoking status missing (alone or in addition to BMI and alcohol consumption) and
individuals with both BMI and alcohol consumption missing (n=960)
For multiple imputation (MI), 10 imputed datasets were created. The imputation model included statin use,
mortality and all the variables listed above.
The standardized differences estimated before weighting and after weighting by PS for CC, MP, MIte, MIps
and MIpar are presented in Table 5. The MP and the 3 MI approaches lead to a similar reduction in imbalance
between groups on the observed variables as compared to the crude standardized differences. Nevertheless,
because of the poor overlap of the patients characteristics between groups (Figure 5), some covariates are still
unbalanced even after MI. However, for binary covariates, large standardized differences can occur even for
slight imbalance when the prevalence is low. Estimated RR are presented in Table 6. First, all approaches
based on IPTW lead to a treatment effect estimate smaller than the unweighted treatment effect. The 3 MI
approaches lead to similar RR and these were smaller than the RR obtained from CC and MP analyses. The
small differences between the 3 MI approaches in this example can be explained by a low rate of missing data
and the fact that the 3 partially observed covariates were not strong confounders.
9 Discussion
This paper aimed to address three main questions about multiple imputation in the context of IPTW: (i) does
the outcome have to be included in the imputation model? (ii) should we apply Rubin’s rules on the IPTW
treatment effect estimates or on the PS estimates themselves? (iii) how should we estimate the variance of
the IPTW estimator after MI? First, results showed that the outcome must be included in the imputation
model, even if the outcome is not a predictor of missingness. This is well known in the context of multivariable
regression, but can be seen as counter-intuitive in the PS paradigm, because the PS model for the full data does
not use the outcome data. The simulation results showed a bias in the 3 MI estimators when the outcome was
omitted from the imputation model, even when the outcome was not a predictor of missingness. In practice, to
obtain an unbiased estimate of the intervention effect when using MI, the outcome should be included in the
imputation model. This may not be always possible if the outcome data has not been yet recorded and the
PS is used at the design stage of the study (to find good matched controls for example), rather than at the
analysis stage. In such situations, it is worth considering whether the assumptions required for complete case
analysis or the MP approach may be valid. If they are, one of these methods could be used instead of using MI
omitting the outcome. Furthermore, the inclusion of the outcome in the imputation model may go against the
principle of objective design, as described by Rubin [44]. However, when using MI, this is necessary to obtain
a consistent estimator of the treatment effect. Second, we showed that combining the treatment effects after
MI (MIte approach) is the preferred MI strategy in terms of bias reduction under a MAR mechanism. This
estimator is the only estimator of the 3 MI estimators to be consistent and to provide good balancing properties.
Even though MIps and MIpar are not consistent estimators of the treatment effect, they can reduce the bias
of the CC analysis, if the rate of missing data is not too high. Combining the PS or the PS parameters has
11
no clear advantage for IPTW, but may be useful in the context of PS matching: because it involves only one
estimation of the treatment effect, it could provide a computational advantage for large datasets. In addition,
MIte for PS matching implies that the M treatment effect estimates are estimated from different matched sets,
potentially of different sample sizes, leading to a more complex variance estimation because of these different
sample sizes.
Third, as long as the uncertainty in the PS estimation is taken into account in the variance estimation [34],
Rubin’s rules perform well for MIte, even for moderate sample size (n=500). For MIpar, the proposed variance
approximation (Appendix 1) showed good performance in our simulation study.
The 3 MI approaches differ in terms of their balancing properties. We showed that whereas the PS estimated
in each dataset in the MIte approach can balance covariates between groups in each imputed dataset, this is
no longer true for MIps and MIpar. However, the best method to assess covariate balance after MI remains
unknown. With MIte, the aim being to estimate a treatment effect from each dataset, we require balance
between groups within each imputed dataset. In contrast, for MIps and MIpar, further investigation is needed
to know if we should assess the balancing properties of the pooled PS on the average covariate values across the
imputed dataset or on each dataset.
Our recommendations for the optimal way to implement multiple imputation for PS analysis are in contradic-
tion to those given by Mitra and Reiter, who suggest that pooling propensity scores across imputed datasets can
sometimes out-perform our recommended method of pooling treatment effects [9]. In their simulation studies
MIps (or MIpar) led to a bigger reduction in bias than MIte in some scenarios. This was due to a near violation
of the positivity assumption in their simulated datasets [45] combined with the omission of the outcome from
the imputation model. Penning de Vries and Groenwold [46] used Mitra and Reiters data-generating mecha-
nism to perform a simulation study but they increased the effective sample size to ensure positivity. Under this
condition and when the outcome was included in the imputation model, as we recommend, they showed MIte
was superior, in accordance with our findings. While we understand the desire to adhere to the principle of
objective design by omitting the outcome from the imputation model, our results show that this leads to bias in
MIte and an increase in bias for MIps (and MIpar). Thus, when it is either undesirable or impossible to include
the outcome in the imputation model we would suggest considering the validity of alternative approaches, such
as complete case analysis or the MP approach, rather than adopting the systematically biased approach of
imputation omitting the outcome from the imputation model.
The MP approach, which is widely used in practice to handle missing data for PS analysis, showed poor per-
formance under a MAR mechanism. In our simulation study, the MP approach does not perform well, because
missing data were generated under a MAR mechanism but the MP approach relies on different assumptions
about the missing data mechanism [37]. When the assumptions for the MP approach are valid, it avoids the
need for an imputation model altogether, and hence avoids the need to use the outcome when calculating the
PS. Because the assumptions for MI and MP are different, we believe that MP could be a promising alternative
when the assumptions for MI are not valid. Further investigations are needed to understand the usefulness of
the MP approach in practice. Moreover, its application to our real-life example dataset was challenging, because
the sample size within each missingness pattern was not large enough to estimate the PS.
This work has some limitations. We generated only 3 covariates in our simulation, whereas PS are often
calculated from a large number of covariates. Moreover, we studied only the common situation where the PS
model only includes main effects, and we assumed as well the model including main effects only for the outcome
given the covariates. When there are interactions or quadratic terms in these two models, the specification
of the imputation model can be less straightforward, requiring further efforts to ensure the imputation model
is compatible with the substantive (analysis) model [46]. Furthermore, we focussed on the IPTW estimator
only, because of its increasing use and its application in the presence of time-varying covariates and treatment.
12
Indeed, IPTW is widely used in marginal structural models (MSMs) to handle time-dependent confounding.
Although this paper is about the simpler situation of a single time-point, it is important to clarify how to
use IPTW after MI correctly in this context, because otherwise it is unlikely to be done correctly in the more
complex situation of time-dependent confounding. However, some issues may arise with IPTW when some
estimated weights are too extreme; PS matching can be a better solution in this context [47]. Nevertheless,
because the PS estimated in either MIps or MIpar is not a balancing score, PS matching (like IPTW - and
like also PS stratification and PS covariate adjustment) would not be expected to perform well when used in
combination with MIPs or MIPar. Conversely, because in MIte, the estimated PS can balance both the observed
and imputed component of the covariates in the imputed dataset, we would expect all the PS-based methods
(PS matching, stratification, covariate adjustment and IPTW) to perform well to estimate the treatment effect
within each imputed dataset as long as the underlying assumptions for these methods are fulfilled.
In conclusion, for IPTW, multiple imputation followed by pooling of treatment effect estimates is the pre-
ferred approach amongst those studied, when data are missing at random, and the outcome must be included
in the imputation model.
Funding
This work has been supported by the Medical Research Council (project grant MR/M013278/1 and Unit
Programme number U105260558).
References
[1] Concato J, Shah N, Horwitz RI. Randomized, Controlled Trials, Observational Studies, and the Hierarchy
of Research Designs. New England Journal of Medicine. 2000 Jun;342(25):1887–1892.
[2] Cochran WG, Rubin DB. Controlling Bias in Observational Studies: A Review. Sankhya¯: The Indian
Journal of Statistics, Series A (1961-2002). 1973;35(4):417–446.
[3] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal
effects. Biometrika. 1983;70(1):41 –55.
[4] Guo S. Propensity score analysis : statistical methods and applications. Thousand Oaks Calif: Sage
Publications; 2009.
[5] Hirano K, Imbens GW. Estimation of Causal Effects using Propensity Score Weighting: An Application
to Data on Right Heart Catheterization. Health Services and Outcomes Research Methodology. 2001
Dec;2(3):259–278.
[6] White IR, Carlin JB. Bias and efficiency of multiple imputation compared with complete-case analysis for
missing covariate values. Statistics in Medicine. 2010 Dec;29(28):2920–2931.
[7] Rosenbaum PR, Rubin DB. Reducing Bias in Observational Studies Using Subclassification on the Propen-
sity Score. Journal of the American Statistical Association. 1984 Sep;79(387):516–524.
[8] Carpenter J, Kenward M. Multiple Imputation and its Application. John Wiley & Sons; 2012.
[9] Mitra R, Reiter JP. A comparison of two methods of estimating propensity scores after multiple imputation.
Statistical Methods in Medical Research. 2012 Jun;0(0):1–17.
[10] Mitra R, Reiter JP. Estimating propensity scores with missing covariate data using general location mixture
models [Monograph]; 2009. [Online]. Available from: http://eprints.soton.ac.uk/67154/.
[11] Seaman S, White I. Inverse Probability Weighting with Missing Predictors of Treatment Assignment or
Missingness. Communications in Statistics - Theory and Methods. 2014 Aug;43(16):3499–3515.
[12] Moons KGM, Donders RART, Stijnen T, Harrell Jr FE. Using the outcome for imputation of missing
predictor values was preferred. Journal of Clinical Epidemiology. 2006 Oct;59(10):1092–1101.
13
[13] Qu Y, Lipkovich I. Propensity score estimation with missing values using a multiple imputation missingness
pattern (MIMP) approach. Statistics in Medicine. 2009 Apr;28(9):1402–1414.
[14] Sulkowski JP, Cooper JN, Duggan EM, Balci O, Anandalwar SP, Blakely ML, et al. Does timing of neonatal
inguinal hernia repair affect outcomes? Journal of Pediatric Surgery. 2015 Jan;50(1):171–176.
[15] de Groot S, Redekop WK, Sleijfer S, Oosterwijk E, Bex A, Kiemeney LALM, et al. Survival in Patients With
Primary Metastatic Renal Cell Carcinoma Treated With Sunitinib With or Without Previous Cytoreductive
Nephrectomy: Results From a Population-based Registry. Urology. 2016 Sep;95:121–127.
[16] Goel SS, Aksoy O, Gupta S, Houghtaling PL, Tuzcu EM, Marwick T, et al. Renin-angiotensin system
blockade therapy after surgical aortic valve replacement for severe aortic stenosis: a cohort study. Annals
of Internal Medicine. 2014 Nov;161(10):699–710.
[17] Franklin JM, Eddings W, Schneeweiss S, Rassen JA. Incorporating linked healthcare claims to improve
confounding control in a study of in-hospital medication use. Drug Safety. 2015 Jun;38(6):589–600.
[18] Neuderth S, Schwarz B, Gerlich C, Schuler M, Markus M, Bethge M. Work-related medical rehabilitation
in patients with musculoskeletal disorders: the protocol of a propensity score matched effectiveness study
(EVA-WMR, DRKS00009780). BMC public health. 2016 Aug;16:804.
[19] Jobarteh K, Shiraishi RW, Malimane I, Samo Gudo P, Decroo T, Auld AF, et al. Community ART Support
Groups in Mozambique: The Potential of Patients as Partners in Care. PloS One. 2016;11(12):e0166444.
[20] Weber-Schoendorfer C, Hoeltzenbein M, Wacker E, Meister R, Schaefer C. No evidence for an increased
risk of adverse pregnancy outcome after paternal low-dose methotrexate: an observational cohort study.
Rheumatology (Oxford, England). 2014 Apr;53(4):757–763.
[21] Hayes JR, Groner JI. Using multiple imputation and propensity scores to test the effect of car seats and seat
belt usage on injury severity from trauma registry data. Journal of pediatric surgery. 2008 May;43(5):924–
927.
[22] Douglas I, Evans S, Smeeth L. Effect of statin treatment on short term mortality after pneumonia episode:
cohort study. BMJ. 2011 Apr;342:d1642.
[23] Lewis JD, Schinnar R, Bilker WB, Wang X, Strom BL. Validation studies of the health improvement
network (THIN) database for pharmacoepidemiology research. Pharmacoepidemiology and Drug Safety.
2007 Apr;16(4):393–401.
[24] Blak BT, Thompson M, Dattani H, Bourke A. Generalisability of The Health Improvement Network
(THIN) database: demographics, chronic disease prevalence and mortality rates. Informatics in Primary
Care. 2011;19(4):251–255.
[25] Imbens GW, Rubin DB. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction.
New York, NY, USA: Cambridge University Press; 2015.
[26] D’Agostino RB. Propensity score methods for bias reduction in the comparison of a treatment to a non-
randomized control group. Statistics in Medicine. 1998 Oct;17(19):2265–2281.
[27] Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of
Educational Psychology. 1974;66(5):688–701.
[28] Cole SR, Herna´n MA. Constructing inverse probability weights for marginal structural models. American
Journal of Epidemiology. 2008 Sep;168(6):656–664.
[29] Rubin DB. Comment: Which Ifs Have Causal Answers. Journal of the American Statistical Association.
1986 Dec;81(396):961–962.
[30] Austin PC. A comparison of 12 algorithms for matching on the propensity score. Statistics in Medicine.
2014 Mar;33(6):1057–1069.
[31] An W. Bayesian propensity score estimators: incorporating uncertainties in propensity scores into causal
inference. Sociological Methodology. 2010 Aug;40(1):151–189.
[32] Hade EM, Lu B. Bias associated with using the estimated propensity score as a regression covariate.
Statistics in medicine. 2014 Jan;33(1):74–87.
14
[33] Rosenbaum PR. Model-Based Direct Adjustment. Journal of the American Statistical Association.
1987;82:387–394.
[34] Williamson EJ, Forbes A, White IR. Variance reduction in randomised trials by inverse probability weight-
ing using the propensity score. Statistics in Medicine. 2014 Feb;33(5):721–737.
[35] Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal
treatment effects: a comparative study. Statistics in Medicine. 2004 Oct;23(19):2937–2960.
[36] Hill J. Reducing Bias in Treatment Effect Estimation in Observational Studies Suffering from Missing
Data. Columbia University Commons. 2004;Available from: http://hdl.handle.net/10022/AC:P:9697.
[37] D’Agostino RB, Rubin DB. Estimating and Using Propensity Scores with Partially Missing Data. Journal
of the American Statistical Association. 2000 Sep;95(451):749–759.
[38] Azur MJ, Stuart EA, Frangakis C, Leaf PJ. Multiple Imputation by Chained Equations: What is it and
how does it work? International journal of methods in psychiatric research. 2011 Mar;20(1):40–49.
[39] Imbens GW. The role of the propensity score in estimating dose-response functions. Biometrika. 2000
Jan;87(3):706–710.
[40] Austin PC. The performance of different propensity score methods for estimating marginal odds ratios.
Statistics in Medicine. 2007 Jul;26(16):3078–3094.
[41] Gelman A, Hill J. Multiple Imputation with Diagnostics (mi) in R: Opening Windows to the Black Box.
Journal of Statistical Software. 2011;40.
[42] Collett D. Modelling Binary Data, Second Edition. CRC Press; 2002. Google-Books-ID: LMRAIBEbdqsC.
[43] Tsiatis A. Semiparametric Theory and Missing Data. Springer Science & Business Media; 2007. Google-
Books-ID: xqZFi2EMB40C.
[44] Rubin DB. For Objective Causal Inference, Design Trumps Analysis. The Annals of Applied Statistics.
2008;2(3):808–840. Available from: http://www.jstor.org/stable/30245110.
[45] Penning de Vries B, Groenwold R. Comments on propensity score matching following multiple imputation.
Statistical Methods in Medical Research. 2016 Dec;25(6):3066–3068.
[46] Bartlett JW, Seaman SR, White IR, Carpenter JR. Multiple imputation of covariates by fully condi-
tional specification: Accommodating the substantive model. Statistical Methods in Medical Research.
2015 Aug;24(4):462–487.
[47] Gutman R, Rubin D. Estimation of causal effects of binary treatments in unconfounded studies with one
continuous covariate. Statistical Methods in Medical Research. 2015 Feb;.
15
Table 1: Factors used in the simulation study (2× 2× 2 factorial design)
Factor Values Description Comments
ρ 0.3 or 0.6 Correlation between the
covariates
After dichotomization, corr(X1;X3) =
corr(X2, X3) = 0.24 or 0.46
RR 1 or 2 Relative risk In model (9), θc = 0 or 1.221 when ρ = 0.3,
and 0 or 1.289 when ρ = 0.6.
γY 0 or -0.4 Association between the
outcome and the probabil-
ity of missingness
When γY = 0, γ0 = −1.5 and when γY =
−0.4, γ0 = −1.3 in model (10) and (11)
For each scenario, 5000 datasets of sample size 2000 were generated. The prevalence of the treatment was 0.3 and each partially
observed covariate had 30% of data missing. 10 imputed datasets were created.
Table 2: Simulation parameters used for the sensitivity analysis
Factor Values Description
n 500 Sample size
pm 0.1 or 0.6 Missingness rate
M 5 or 20 Number of imputed datasets
For each scenario of the sensitivity analysis, 5000 datasets were generated. The other parameter values were: RR=2, ρ = 0.6 and
γY = −0.4.
16
Table 3: Standardized differences (in %) after IPTW for each method for one scenario: RR=2, ρ = 0.6, outcome
predictor of missingness and included in the imputation model. n=2000.
Method X1 X2 X3
Crude (without IPTW) 81.3 74.7 51.7
Full 4.6 4.6 2.4
CC (n=1074) 7.6 7.3 3.5
MP
Balance on full data 14.6 4.3 8.5
Balance on the observed part of the covariate 6.1 4.3 2.9
Balance on the missing part of the covariate 48.6 NA 28.3
MIte
Balance on full data 15.0 4.5 9.1
Balance on each imputed dataset 4.5 4.5 2.4
MIps
Balance on full data 15.9 5.5 10.7
Balance on the average imputed dataset 15.8 5.5 10.6
Balance on the observed part of the covariate 7.6 5.5 4.9
Balance on the imputed part of the covariate 58.1 NA 36.9
MIpar
Balance on full data 15.1 4.8 9.6
Balance on the average imputed dataset 14.7 4.8 9.7
Balance on the observed part of the covariate 7.7 4.8 5.4
Balance on the imputed part of the covariate 52.5 NA 34.3
CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity
scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation. RR: relative
risk. NA: not applicable because X2 is fully observed.
17
T
ab
le
4:
B
ia
s
of
th
e
lo
g(
R
R
),
it
s
es
ti
m
at
ed
va
ri
an
ce
an
d
co
ve
ra
g
e
ra
te
fo
r
th
e
3
M
I
a
p
p
ro
a
ch
es
a
cc
o
rd
in
g
th
e
sa
m
p
le
si
ze
n
fo
r
o
n
e
sc
en
a
ri
o
(R
R
=
2
,
ρ
=
0
.6
,
o
u
tc
o
m
e
p
re
d
ic
to
r
of
m
is
si
n
gn
es
s
an
d
in
cl
u
d
ed
in
th
e
im
p
u
ta
ti
on
m
o
d
el
).
R
es
u
lt
s
b
a
se
d
o
n
5
0
0
0
si
m
u
la
ti
o
n
s.
F
u
ll
C
C
M
P
M
It
e
M
Ip
s
M
Ip
a
r
n
=
5
0
0
n
=
2
0
0
0
n
=
5
0
0
n
=
2
0
0
0
n
=
5
0
0
n
=
2
0
0
0
n
=
5
0
0
n
=
2
0
0
0
n
=
5
0
0
n
=
2
0
0
0
n
=
5
0
0
n
=
2
0
0
0
B
ia
s
0
.0
0
7
0
.0
0
2
0
.1
1
0
0
.1
4
1
0
.1
5
3
0
.1
3
0
0
.0
1
0
0
.0
0
5
0
.0
3
8
0
.0
2
8
0
.0
2
4
0
.0
1
7
V
a
ri
a
n
c
e
0
.0
2
2
0
.0
0
6
0
.0
5
0
0
.0
1
4
0
.0
2
9
0
.0
0
8
0
.0
2
6
0
.0
0
7
0
.0
2
2
0
.0
0
6
0
.0
2
3
0
.0
0
6
E
m
p
ir
ic
a
l
v
a
ri
a
n
c
e
0
.0
2
4
0
.0
0
6
0
.0
5
9
0
.0
1
4
0
.0
2
7
0
.0
0
7
0
.0
2
5
0
.0
0
6
0
.0
2
4
0
.0
0
6
0
.0
2
5
0
.0
0
6
C
o
v
e
ra
g
e
ra
te
0
.9
4
0
0
.9
4
7
0
.8
8
7
0
.7
6
9
0
.8
5
5
0
.6
9
1
0
.9
5
5
0
.9
5
7
0
.9
3
9
0
.9
3
2
0
.9
4
3
0
.9
4
2
M
It
e:
tr
ea
tm
en
t
eff
ec
ts
co
m
b
in
ed
a
ft
er
m
u
lt
ip
le
im
p
u
ta
ti
o
n
;
M
Ip
s:
p
ro
p
en
si
ty
sc
o
re
s
co
m
b
in
ed
a
ft
er
m
u
lt
ip
le
im
p
u
ta
ti
o
n
;
M
Ip
a
r:
p
ro
p
en
si
ty
sc
o
re
p
a
ra
m
et
er
s
co
m
b
in
ed
a
ft
er
m
u
lt
ip
le
im
p
u
ta
ti
o
n
.
18
T
ab
le
5:
D
es
cr
ip
ti
o
n
a
n
d
co
m
p
a
ri
so
n
o
f
st
a
ti
n
u
se
rs
a
n
d
n
o
n
u
se
rs
.
n
=
7
1
5
8
.
V
a
r
ia
b
le
M
is
s
in
g
(
%
)
S
t
a
t
in
u
s
e
r
s
M
is
s
in
g
(
%
)
N
o
n
s
t
a
t
in
u
s
e
r
s
S
t
a
n
d
a
r
d
iz
e
d
d
iff
e
r
e
n
c
e
(
%
)
n
=
5
9
9
n
=
6
5
5
9
C
r
u
d
e
C
C
*
M
P
M
It
e
M
Ip
s
M
Ip
a
r
C
h
a
r
a
c
t
e
r
is
t
ic
s
A
g
e
[m
e
a
n
(s
d
)]
6
6
.9
(1
0
.7
)
6
9
.8
(1
0
.9
)
2
7
.0
3
.8
2
.0
1
.4
1
.4
1
.4
M
a
le
3
2
2
(5
3
.8
)
3
1
7
3
(4
8
.4
)
1
0
.8
2
.0
6
.2
2
.2
2
.1
2
.2
B
M
I
[m
e
a
n
(s
d
)]
4
3
(7
.2
)
2
7
.6
(5
.9
)
1
4
4
4
(2
2
.0
)
2
5
.8
(5
.9
)
3
1
.9
7
.8
9
.0
9
.0
1
1
.4
1
1
.4
D
ri
n
k
e
rs
6
7
(1
1
.2
)
9
8
(1
8
.4
)
1
3
3
4
(2
0
.3
)
8
1
4
(1
5
.6
)
7
.6
2
.1
0
.3
2
.3
2
.9
3
.0
S
m
o
k
e
rs
7
(1
.2
)
2
5
6
(4
3
.2
)
5
0
5
(7
.7
)
2
7
2
8
(4
5
.1
)
3
.7
1
.7
1
.5
2
.8
3
.0
3
.0
M
e
d
ic
a
l
h
is
t
o
r
y
D
ia
b
e
te
s
2
4
3
(4
0
.6
)
7
1
5
(1
0
.9
)
7
2
.1
5
.0
7
.7
7
.1
7
.2
7
.1
C
a
rd
io
v
a
sc
u
la
r
d
is
e
a
se
1
4
1
(2
3
.5
)
6
5
1
(9
.9
)
3
7
.1
1
1
.4
1
1
.4
1
3
.6
1
3
.6
1
3
.6
C
ir
c
u
la
to
ry
d
is
e
a
se
4
2
6
(7
1
.1
)
3
4
7
1
(5
2
.9
)
3
8
.2
1
3
.6
9
.8
1
6
.6
1
6
.7
1
6
.6
H
e
a
rt
fa
il
u
re
5
1
(8
.5
)
4
2
6
(6
.5
)
7
.7
1
1
.6
6
.2
1
2
.8
1
2
.8
1
2
.8
C
a
n
c
e
r
3
7
(6
.2
)
6
0
7
(9
.2
)
1
1
.5
2
.1
0
.4
0
.4
0
.0
0
.1
D
e
m
e
n
ti
a
6
(1
.0
)
1
9
0
(2
.9
)
1
3
.7
7
.3
1
3
.0
1
1
.6
1
1
.6
1
1
.6
H
y
p
e
rt
e
n
si
o
n
3
3
6
(5
6
.1
)
1
1
6
5
(1
7
.8
)
5
2
.1
1
3
.3
2
1
.5
1
8
.7
1
8
.7
1
8
.7
H
y
p
e
rl
ip
id
e
m
ia
2
0
5
(3
4
.2
)
1
8
2
(2
.8
)
8
8
.5
1
.1
4
.1
1
.9
2
.0
2
.0
T
r
e
a
t
m
e
n
t
s
A
n
ti
d
e
p
re
ss
a
n
t
1
0
8
(1
8
.0
)
9
9
5
(1
5
.2
)
7
.7
1
.7
5
.9
0
.3
0
.1
0
.1
A
n
ti
p
sy
ch
o
ti
c
1
1
(1
.8
)
3
4
0
(5
.2
)
1
8
.3
0
.5
1
1
.3
5
.0
5
.0
5
.0
H
o
rm
o
n
e
re
p
la
c
e
m
e
n
t
th
e
ra
p
y
3
7
(6
.2
)
2
7
7
(4
.2
)
8
.8
0
.9
0
.9
1
.0
1
.0
1
.0
S
te
ro
id
9
3
(1
5
.5
)
1
0
9
0
(1
6
.6
)
3
.0
1
.0
2
.2
0
.4
0
.3
0
.3
A
n
ti
h
y
p
e
rt
e
n
si
v
e
2
7
2
(4
5
.4
)
1
1
6
5
(1
7
.8
)
6
2
.3
1
2
.6
2
7
.5
1
8
.0
1
7
.8
1
7
.9
D
iu
re
ti
c
s
3
1
9
(5
3
.3
)
2
4
1
6
(3
6
.8
)
3
3
.4
1
4
.3
1
9
.8
1
5
.8
1
5
.9
1
5
.9
B
e
ta
b
lo
ck
e
r
1
9
3
(3
2
.2
)
1
0
6
1
(1
6
.2
)
3
8
.1
1
1
.4
7
.2
1
3
.8
1
3
.8
1
3
.8
N
it
ra
te
7
4
(1
2
.4
)
3
3
4
(5
.1
)
2
5
.9
1
7
.3
1
4
.8
1
7
.5
1
7
.6
1
7
.6
F
o
r
C
C
a
n
a
ly
si
s,
n
=
5
1
6
8
(5
0
3
st
a
ti
n
u
se
rs
a
n
d
4
6
6
5
n
o
n
u
se
rs
).
19
Table 6: Estimate of the relative risk of mortality and its 95% confidence interval for statin vs non statin users
(motivating example). n=7158.
Method R̂R 95% CI(R̂R)
Crude 0.587 [0.497;0.684]
CC 0.702 [0.534;0.924]
MP 0.708 [0.555;0.904]
MIte 0.654 [0.513;0.835]
MIps 0.653 [0.512;0.834]
MIpar 0.654 [0.513;0.834]
RR: relative risk. CC: complete case; MP: missingness pattern; MIte: treatment effects combined after
multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score
parameters combined after multiple imputation. RR: relative risk.
20
Figure 1: The three approaches considered after multiple imputation (MI) of the partially observed covariates
? are missing values on the original dataset. ∗(k), (k = 1, ...,M) are imputed values in the kth imputed dataset. θˆ(k) and eˆ(k)
are the estimated treatment effect and estimated propensity scores, respectively from the kth imputed dataset , (k = 1, ...,M).
The MIte approach consists of pooling the M treatment effects estimated with IPTW on each imputed dataset. MIps estimate is
obtained by using the average PS across the M imputed datasets in the IPTW estimator. Finally, the MIpar approach uses the PS
of the average covariate value across the M imputed dataset. The PS is estimated using the average PS parameters as regression
coefficients.
21
Missingness associated with Y
B
ia
s
B
ia
s
0
0.
05
0.
10
0.
15
0.
20
Full
CC
MP
MIte+
MIte−
MIps+
MIps−
MIpar+
MIpar−
Missingness independent of Y
B
ia
s
B
ia
s
0
0.
05
0.
10
0.
15
0.
20
Full CC
MP
MIte+
MIte−
MIps+
MIps−
MIpar+
MIpar−
Missingness associated with Y
B
ia
s
B
ia
s
0
0.
05
0.
10
0.
15
0.
20
Full
CC
MP
MIte+
MIte−
MIps+
MIps−
MIpar+
MIpar−
Missingness independent of Y
B
ia
s
B
ia
s
0
0.
05
0.
10
0.
15
0.
20
Full
CC
MP
MIte+
MIte−
MIps+
MIps−
MIpar+
MIpar−
RR=1
RR=2
Figure 2: Absolute value of the bias for the 4 scenarios in which ρ = 0.6.
CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores
combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation. For the 3 MI
approaches ’+’ means that the outcome is included in the imputation model, ’-’ means that the outcome is not in the imputation
model. RR: relative risk.
22
Co
ve
ra
ge
 ra
te
 (%
)
55
65
75
85
95
Full CC MP MIte MIps MIpar
Figure 3: Coverage rate of the 95% CI for each method compared
Results are pooled for the 8 main scenarios. CC: complete case; MP: missingness pattern; MIte: treatment effects combined after
multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined
after multiple imputation. RR: relative risk. For the 3 MI methods, the outcome is included in the imputation model.
10% of missing data
B
ia
s
0
0.
05
0.
10
0.
15
0.
20
Full
CC MP
MIte
MIps MIpar
60% of missing data
B
ia
s
0
0.
05
0.
10
0.
15
0.
20
Full
CC
MP
MIte
MIps
MIpar
Figure 4: Absolute value of the bias according to the missingness rate
CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity
scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation. RR: relative
risk. For the 3 MI methods, the outcome is included in the imputation model.
23
−0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2
0
5
10
15
20
25
PS distribution
D
en
si
ty
Statin users
Non statin users
Figure 5: Distribution of the propensity score estimated on complete cases for statin users and non users
(n=5168).
24
Appendix
Appendix 1: Variance estimate for MIpar
In this section, we provide a crude approximation to the variance for our MIpar estimator. In MIpar, we take
the average propensity score parameters across imputed datasets, α, and the average of each covariate across
imputed dataset for each individual, Xi = (Xobs,i,
∑
k X
(k)
m,i/K) and use these to calculate the propensity score,
e(Xi;α). The MIpar treatment effect estimate is:
θˆMIpar =
∑
i
YiZi
e(Xi;α)∑
i
Zi
e(Xi;α)
−
∑
i
Yi(1−Zi)
(1−e(Xi;α))∑
i
(1−Zi)
(1−e(Xi;α))
If we let I represent the imputation distribution, and W the distribution of observed data (Y,Z,Xobs), the
variance of the treatment effect estimate is given by
V [θˆMIpar ] = EW
[
VI|W (θˆMIpar )
]
+ VW
(
EI|W [θˆMIpar ]
)
(13)
Suppose αˆF denotes the propensity score parameters that would have been obtained in the full data, then
EI|W [ θˆMIpar ] =
∑
i
YiZi
e(Xi;αˆF )∑
i
Zi
e(Xi;αˆF )
−
∑
i
Yi(1−Zi)
(1−e(Xi;αˆF ))∑
i
(1−Zi)
(1−e(Xi;αˆF ))
+ rn
with rn
p→ 0 as n → ∞. This is simply the full-data IPTW estimate, with the propensity scores evaluated at
X (and at the full data parameter estimates) rather than X. Following Williamson et al., the same calculation
shows that the large-sample variance of this full-data estimate, evaluated at X rather than X, is given by:
VW
(
EI|W [ θˆMI ]
)
= Vun − vTCαv + 
where Vun is the uncorrected variance estimate, that is considering that the PS is a true value rather than
an estimate, which can be estimated by:
Vˆun =
K21
nwˆ1
2
n∑
i=1
(Yi − µˆ1)2Zi
eˆ2i
+
K20
nwˆ0
2
n∑
i=1
(Yi − µˆ0)2(1− Zi)
(1− eˆi)2 ,
and
vˆ =
K1
nwˆ1
n∑
i=1
x¯i(Yi − µˆ1)Zi(1− eˆi)
eˆi
+
K0
nwˆ0
n∑
i=1
x¯i(Yi − µˆ1)(1− Zi)eˆi
(1− eˆi) .
µˆ1 and µˆ0 are the marginal (weighted) means in the treated and control group, and wˆ1 and wˆ0 the average
estimated weights for IPTW in the treated and control group, respectively, and eˆi = e(x¯i; αˆ). Finally, the
quantities K0 and K1 depend on the measure of interest for the treatment effect. For a difference in mean
or a risk difference, K0 = K1 = 1. For a relative risk, K0 = µˆ0
−1 and K1 = µˆ1−1, and for an odds ratio,
K0 = {µˆ0(1− µˆ0)}−1 and K1 = {µˆ1(1− µˆ1)}−1.
Cα is n
2×the variance/covariance matrix of the estimated PS parameters αˆ in the full data, which we
estimate by the within-imputation matrix W. The additional part of the variance is  = 2vCα
∗, where
∗ =
1
nE[Z/e(x¯;αF )]
E
[
x>(Y − µF1 )Z
{
(e(x;αF )− e(x;αF )
e(x;αF )
}]
+
1
nE[(1− Z)/(1− e(x¯;αF ))]E
[
x>(Y − µF0 )(1− Z)
{
(1− e(x;αF ))− (1− e(x;αF )))
1− e(x;αF )
}]
This is the only part of the variance which involves the full (unobserved) data. The magnitude of this term is
driven partly by the difference between the propensity score evaluated at the average covariates across imputed
datasets and the true (possibly unobserved) covariate values. For complete cases, therefore, this term is 0. We
could attempt to estimate this component using the imputed datasets. However, we take a pragmatic approach
and assume that pooling of the propensity scores is undertaken due to a desire to avoid the need to retain all
K imputed datasets. Therefore, we ignore this term in the simulation study and evaluate the performance of
25
the estimator without this term. Thus, we assume that approximately
V̂W
(
EI|W [ θˆMI ]
)
= Vˆun − vˆTWvˆ (14)
This is our estimator for the second term in 13. For the first term, we begin by noting that, conditional on the
observed data (Y, Z,Xobs), we have approximately,
VI|W (θˆMIpar ) =
(
∂θˆMIpar
∂α
)T
CovI|W (α)
(
∂θˆMIpar
∂α
)
Noting that the average of the propensity score parameters across imputed datasets is also the standard MI
estimate of these parameters,
Ĉov(α) =
(
1 +
1
M
)
B
where B is the between-imputation covariance matrix of the propensity score parameters. Differentiating the
estimate above gives
∂θˆMI
∂α
= −
∑
i
[
(Y−µˆ1)Zxe(x;α)(1−e(x;α))
(e(x;α))2
]
∑
i
[
Z
e(x;α)
] − ∑i
[
(Y−µˆ0)(1−Z)xe(x;α)(1−e(x;α))
(1−e(x;α))2
]
∑
i
[
(1−Z)
(1−e(x;α))
] = −vˆ
Thus, we can estimate
Ê
[
VI|W (θˆMI)
]
=
(
1 +
1
M
)
vˆTBvˆ (15)
Putting (14) and (15) into (13), we have
Vˆ (θˆMI) = Vˆun − vˆT
{
W −
(
1 +
1
M
)
B
}
vˆ
Note that the two components of variability in the propensity score parameters - W and B - act in opposite
directions on the variance of the treatment effect estimator. The between-imputation variance reflects noise
due to the missing data, thus this adds to the overall variance of the treatment effect estimator. The within-
imputation variance reflects the correlation between the covariates and treatment (ie covariate imbalance),
which results in smaller variance of the treatment effect estimator (since the IPTW estimator gains precision
by rebalancing imbalanced covariates).
26
Appendix 2: Assumptions required when pooling treatment effects
Notations Let X the vector of covariates be split into observed and partially missing, X = (Xobs,Xmiss). X
(k)
m
is the imputed value for Xmiss in the k
th imputed dataset (k = 1, ..,M) and α(k) the true propensity score
parameters in the kth imputed dataset (with α(k) = α, the overall true PS parameters, k=1,...,M).
Appendix 2a: Proof that E[Z|Xobs,X(k)m ] = e(Xobs,X(k)m ;α(k))
We have:
E[Z|Xobs,X(k)m ] =
Pr(X
(k)
m |Z = 1,Xobs)Pr(Z = 1|Xobs)∑
z=0,1 Pr(X
(k)
m |Z = z,Xobs)Pr(Z = z|Xobs)
. (16)
We now express the probabilities Pr(X
(k)
m |Z = z,Xobs) in terms of the (unobserved) missing values:
Pr(X(k)m |Z = z,Xobs) =
∑
y=0,1
Pr(X(k)m |Z = z,Xobs, Y = y)Pr(Y = y |Z = z,Xobs)
=
∑
y=0,1
Pr(Xmiss|Z = z,Xobs, Y = y)Pr(Y = y |Z = z,Xobs)
= Pr(Xmiss|Z = z,Xobs),
if the values are imputed from the true distribution. Substituting this back into (16) gives
E[Z|Xobs,X(k)m = x] = E[Z|Xobs,Xmiss = x] = e(Xobs,x;α) = e(Xobs,X(k)m = x;α(k)).
27
Appendix 2b: Proof that Y z=1 ⊥ Z |Xobs,X(k)m
We have:
Pr(Z = 1|Y z=1 = y1,Xobs,X(k)m )
=
Pr(X
(k)
m |Z = 1, Y z=1 = y1,Xobs)Pr(Z = 1|Y z=1 = y1,Xobs)
Pr(X
(k)
m |Y z=1 = y1,Xobs)
=
Pr(Xmiss|Z = 1, Y z=1 = y1,Xobs)Pr(Z = 1|Y z=1 = y1,Xobs)
Pr(Xmiss|Y1 = y1,Xobs) (17)
= Pr(Z = 1|Y z=1 = y1,Xobs,Xmiss)
= Pr(Z = 1|Xobs,Xmiss) (18)
= Pr(Z = 1|Xobs,X(k)m ), (19)
where (17) was reached by noting that Y1 = Y when Z = 1, (18) is true by the SITA assumption for the full
data (Assumption 1), and (19) was shown in Appendix 1a.
Proof that Y z=0 ⊥ Z |Xobs,X(k)m follows in a similar way.
28
Appendix 3: inconsistency of MIps and MIpar estimators: a counter
example
Neither θˆMIps nor θˆMIpar are consistent estimators. This can be shown by a simple counter-example. Suppose
we have a single covariate, X, which is MAR depending on treatment Z only:
X ∼ Bernouilli(0.5)
Z ∼
{
Bernouilli(0.1) if X = 0
Bernouilli(0.9) if X = 1
Y ∼
{
Bernouilli(0.1) if X = 0 and/or Z = 0
Bernouilli(0.9) if X = Z = 1
R ∼
{
1 if Z = 0
Bernouilli(0.1) if Z = 1
Xobs = X : R = 1 and Xmiss = X : R = 0. The treatment model corresponds to: ln(e/(1− e)) = −2.1972246 +
4.3944492X, thus α = (−2.1972246, 4.3944492)T . The true expected potential outcome is E[Y z=1] = 0.5.
Consistency when pooling propensity scores When Z = 1 and Y = 1, the expected propensity score
is E[e(Xobs, X(k)m ;α)] = E [E[e(X,α)|X]] = 0.89. When Z = 1, Y = 1, X can take values of 0, 1 or missing,
producing average (expected) propensity scores of 0.1, 0.9, and 0.89, respectively. Thus
E
[
Y Z
E[e(Xobs, X(k)m ;αt)]
]
= Pr(X = 0, Z = 1, Y = 1, R = 1)/0.1
+ Pr(X = 1, Z = 1, Y = 1, R = 1)/0.9 + Pr(Z = 1, Y = 1, R = 0)/0.89
= 0.465
So E[ Y Z
E
[
e(Xobs,X
(k)
m ;α)
] ] 6= E[Y z=1]. Thus pooling the propensity scores will produce an inconsistent estimator
here.
Consistency when pooling propensity score parameters When Y = 1, Z = 1, the expected value of
X is µbarXE[Xmiss|Z = 1, Y = 1] = Pr(X = 1, Z = 1, Y = 1)/Pr(Z = 1, Y = 1) = 81/82. The true propensity
score evaluated at α with X = 0, 1 and 81/82 is 0, 1, and 0.895, respectively. Then,
E
[
Y Z
e(Xobs, µX ;α)
]
= Pr(X = 0, Z = 1, Y = 1, R = 1)/0.1
+ Pr(X = 1, Z = 1, Y = 1, R = 1)/0.9 + Pr(Z = 1, Y = 1, R = 0)/0.895
= 0.462
So E
[
Y Z
e(Xobs,µX ;α)
]
6= E[Y z=1]. Thus pooling the propensity scores will produce an inconsistent estimator here
also.
Appendix 4: Consistency for the MP approach
Let e∗ be the generalized PS estimated from the MP approach. Rosenbaum and Rubin showed that Xobs ⊥ Z|e∗.
This implies that for each value of e∗, the observed part of the covariates and the frequency of missingness are
balanced between groups. However, they state that we do not have: X ⊥ Z|e∗. Because we do not have this
conditional independence, e∗ is not a balancing score for (Xobs,Xmiss) and thus the MP estimator of treatment
effect is inconsistent, without making further conditional independence assumptions.
29
A
p
p
e
n
d
ix
5
:
A
d
d
it
io
n
a
l
si
m
u
la
ti
o
n
re
su
lt
s
5
.1
:
V
a
ri
a
n
ce
e
st
im
a
ti
o
n
T
ab
le
7:
U
n
co
rr
ec
te
d
an
d
co
rr
ec
te
d
m
o
d
el
-b
as
ed
va
ri
a
n
ce
a
n
d
em
p
ir
ic
a
l
va
ri
a
n
ce
o
f
th
e
tr
ea
tm
en
t
eff
ec
t
es
ti
m
a
to
r
fo
r
th
e
8
m
a
in
sc
en
a
ri
o
s.
S
c
e
n
a
ri
o
F
u
ll
M
It
e
M
Ip
s
M
Ip
a
r
Vˆ
u
Vˆ
P
S
Vˆ
e
Vˆ
m
Vˆ
P
S
+
m
Vˆ
e
Vˆ
P
S
Vˆ
P
S
+
m
Vˆ
e
Vˆ
P
S
Vˆ
P
S
+
m
Vˆ
e
R
R
=
1
,
ρ
=
0
.3
,
Y
0
.0
1
0
0
.0
0
9
0
.0
0
9
0
.0
1
1
0
.0
1
0
0
.0
0
9
0
.0
0
9
0
.0
0
8
0
.0
0
9
0
.0
1
0
0
.0
0
9
0
.0
0
9
R
R
=
1
,
ρ
=
0
.3
0
.0
1
0
0
.0
0
9
0
.0
0
9
0
.0
1
1
0
.0
1
0
0
.0
0
9
0
.0
0
9
0
.0
0
8
0
.0
0
9
0
.0
1
0
0
.0
0
9
0
.0
0
9
R
R
=
2
,
ρ
=
0
.3
,
Y
0
.0
0
7
0
.0
0
6
0
.0
0
6
0
.0
0
7
0
.0
0
6
0
.0
0
6
0
.0
0
6
0
.0
0
6
0
.0
0
6
0
.0
0
6
0
.0
0
6
0
.0
0
6
R
R
=
2
,
ρ
=
0
.3
0
.0
0
7
0
.0
0
6
0
.0
0
6
0
.0
0
7
0
.0
0
6
0
.0
0
6
0
.0
0
6
0
.0
0
6
0
.0
0
6
0
.0
0
6
0
.0
0
6
0
.0
0
6
R
R
=
1
,
ρ
=
0
.6
,
Y
0
.0
1
1
0
.0
0
9
0
.0
0
9
0
.0
1
2
0
.0
0
9
0
.0
0
9
0
.0
1
0
0
.0
0
9
0
.0
0
9
0
.0
1
0
0
.0
0
9
0
.0
0
9
R
R
=
1
,
ρ
=
0
.6
0
.0
1
1
0
.0
0
9
0
.0
0
9
0
.0
1
2
0
.0
0
9
0
.0
0
9
0
.0
1
0
0
.0
0
9
0
.0
0
9
0
.0
1
0
0
.0
0
9
0
.0
0
9
R
R
=
2
,
ρ
=
0
.6
,
Y
0
.0
0
8
0
.0
0
6
0
.0
0
6
0
.0
0
8
0
.0
0
7
0
.0
0
6
0
.0
0
7
0
.0
0
6
0
.0
0
6
0
.0
0
7
0
.0
0
6
0
.0
0
6
R
R
=
2
,
ρ
=
0
.6
0
.0
0
8
0
.0
0
6
0
.0
0
6
0
.0
0
8
0
.0
0
7
0
.0
0
6
0
.0
0
7
0
.0
0
6
0
.0
0
6
0
.0
0
7
0
.0
0
6
0
.0
0
6
N
u
m
b
er
s
a
re
m
ea
n
es
ti
m
a
te
s
o
f
th
e
v
a
ri
a
n
ce
s
in
th
e
8
m
a
in
sc
en
a
ri
o
s.
Y
m
ea
n
s
th
e
o
u
tc
o
m
e
is
in
cl
u
d
ed
in
th
e
im
p
u
ta
ti
o
n
m
o
d
el
.
R
R
:
re
la
ti
v
e
ri
sk
;
M
It
e:
tr
ea
tm
en
t
eff
ec
ts
co
m
b
in
ed
a
ft
er
m
u
lt
ip
le
im
p
u
ta
ti
o
n
;
M
Ip
s:
p
ro
p
en
si
ty
sc
o
re
s
co
m
b
in
ed
a
ft
er
m
u
lt
ip
le
im
p
u
ta
ti
o
n
;
M
Ip
a
r:
p
ro
p
en
si
ty
sc
o
re
p
a
ra
m
et
er
s
co
m
b
in
ed
a
ft
er
m
u
lt
ip
le
im
p
u
ta
ti
o
n
.
Vˆ
u
V
a
ri
a
n
ce
es
ti
m
a
te
n
o
t
a
cc
o
u
n
ti
n
g
fo
r
th
e
u
n
ce
rt
a
in
ty
in
P
S
es
ti
m
a
ti
o
n
o
r
m
is
si
n
g
d
a
ta
Vˆ
P
S
V
a
ri
a
n
ce
es
ti
m
a
te
a
cc
o
u
n
ti
n
g
fo
r
th
e
u
n
ce
rt
a
in
ty
in
P
S
es
ti
m
a
ti
o
n
(W
il
li
a
m
so
n
et
.
a
l
fo
rm
u
la
)
Vˆ
m
V
a
ri
a
n
ce
es
ti
m
a
te
a
cc
o
u
n
ti
n
g
fo
r
m
is
si
n
g
d
a
ta
b
u
t
n
o
t
fo
r
th
e
u
n
ce
rt
a
in
ty
in
P
S
es
ti
m
a
ti
o
n
Vˆ
P
S
+
m
V
a
ri
a
n
ce
es
ti
m
a
te
a
cc
o
u
n
ti
n
g
fo
r
th
e
u
n
ce
rt
a
in
ty
in
P
S
es
ti
m
a
ti
o
n
a
n
d
a
cc
o
u
n
ti
n
g
fo
r
th
e
p
re
se
n
ce
o
f
m
is
si
n
g
d
a
ta
Vˆ
e
E
m
p
ir
ic
a
l
v
a
ri
a
n
ce
30
5.2: Full results for a binary outcome
Table 8: Scenario 1: RR=1, ρ = 0.3 and outcome predictor of missingness and included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.538 0.000 0.092 0.182 0.003 0.036 0.021
Variance 0.006 0.009 0.025 0.011 0.010 0.008 0.009
Empirical variance 0.006 0.009 0.026 0.009 0.009 0.009 0.009
Coverage rate 0.000 0.947 0.898 0.570 0.959 0.929 0.944
OR
Bias 0.722 0.000 0.121 0.239 0.004 0.048 0.028
Variance 0.012 0.015 0.041 0.019 0.016 0.015 0.015
Empirical variance 0.012 0.015 0.042 0.016 0.015 0.015 0.015
Coverage rate 0.000 0.948 0.908 0.580 0.958 0.931 0.946
Risk difference
Bias 0.136 0.000 0.022 0.043 0.001 0.009 0.005
Variance 0.000 0.000 0.001 0.001 0.001 0.000 0.000
Empirical variance 0.000 0.000 0.001 0.001 0.000 0.000 0.000
Coverage rate 0.000 0.949 0.928 0.602 0.960 0.935 0.947
* For complete case analysis, the average sample size is 1061.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
Table 9: Scenario 2: RR=1, ρ = 0.3 and outcome independent of missingness but included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.539 -0.001 -0.007 0.082 0.002 0.030 0.016
Variance 0.006 0.009 0.030 0.011 0.010 0.008 0.009
Empirical variance 0.006 0.009 0.031 0.009 0.009 0.009 0.009
Coverage rate 0.000 0.949 0.941 0.890 0.958 0.938 0.943
OR
Bias 0.722 0.000 -0.006 0.107 0.003 0.040 0.022
Variance 0.012 0.015 0.044 0.018 0.016 0.015 0.015
Empirical variance 0.012 0.015 0.046 0.016 0.015 0.015 0.015
Coverage rate 0.000 0.949 0.943 0.895 0.958 0.939 0.945
Risk difference
Bias 0.136 0.000 0.001 0.019 0.001 0.007 0.004
Variance 0.000 0.000 0.001 0.001 0.001 0.000 0.000
Empirical variance 0.000 0.000 0.001 0.001 0.000 0.000 0.000
Coverage rate 0.000 0.948 0.942 0.903 0.959 0.942 0.946
* For complete case analysis, the average sample size is 1103.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
31
Table 10: Scenario 3: RR=2, ρ = 0.3 and outcome predictor of missingness and included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.438 0.000 0.111 0.145 0.002 0.028 0.016
Variance 0.004 0.006 0.013 0.007 0.006 0.006 0.006
Empirical variance 0.004 0.006 0.014 0.006 0.006 0.006 0.006
Coverage rate 0.000 0.949 0.815 0.584 0.958 0.931 0.940
OR
Bias 0.747 0.002 0.130 0.227 0.004 0.047 0.026
Variance 0.011 0.014 0.036 0.018 0.016 0.014 0.015
Empirical variance 0.011 0.015 0.038 0.016 0.015 0.015 0.015
Coverage rate 0.000 0.947 0.887 0.622 0.958 0.930 0.942
Risk difference
Bias 0.163 0.000 0.018 0.048 0.001 0.010 0.005
Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.948 0.928 0.661 0.957 0.928 0.939
* For complete case analysis, the average sample size is 1076.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
Table 11: Scenario 4: RR=2, ρ = 0.3 and outcome independent of missingness but included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.439 0.002 0.102 0.074 0.003 0.027 0.016
Variance 0.004 0.006 0.015 0.007 0.006 0.006 0.006
Empirical variance 0.004 0.006 0.016 0.006 0.006 0.006 0.006
Coverage rate 0.000 0.950 0.850 0.868 0.958 0.931 0.941
OR
Bias 0.748 0.005 0.051 0.112 0.007 0.041 0.023
Variance 0.011 0.014 0.036 0.018 0.016 0.014 0.015
Empirical variance 0.011 0.015 0.038 0.016 0.015 0.015 0.015
Coverage rate 0.000 0.950 0.935 0.883 0.959 0.933 0.941
Risk difference
Bias 0.163 0.001 -0.014 0.023 0.001 0.008 0.004
Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.951 0.919 0.898 0.959 0.936 0.943
* For complete case analysis, the average sample size is 1103.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
32
Table 12: Scenario 5: RR=1, ρ = 0.6 and outcome predictor of missingness and included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.650 0.001 0.099 0.162 0.006 0.036 0.022
Variance 0.006 0.009 0.027 0.012 0.009 0.009 0.009
Empirical variance 0.006 0.009 0.029 0.010 0.009 0.009 0.009
Coverage rate 0.000 0.945 0.886 0.686 0.955 0.928 0.937
OR
Bias 0.883 0.002 0.128 0.214 0.008 0.048 0.029
Variance 0.011 0.015 0.044 0.020 0.017 0.015 0.015
Empirical variance 0.012 0.016 0.047 0.017 0.016 0.016 0.016
Coverage rate 0.000 0.945 0.896 0.693 0.955 0.928 0.938
Risk difference
Bias 0.169 0.001 0.023 0.040 0.002 0.009 0.006
Variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.945 0.914 0.705 0.956 0.932 0.941
* For complete case analysis, the average sample size is 1058.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
Table 13: Scenario 6: RR=1, ρ = 0.6 and outcome independent of missingness but included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.649 -0.001 -0.001 0.063 0.003 0.030 0.016
Variance 0.006 0.009 0.033 0.011 0.009 0.009 0.009
Empirical variance 0.006 0.009 0.035 0.010 0.009 0.009 0.009
Coverage rate 0.000 0.944 0.938 0.927 0.954 0.931 0.940
OR
Bias 0.881 0.000 0.002 0.084 0.005 0.040 0.022
Variance 0.011 0.015 0.048 0.020 0.017 0.015 0.015
Empirical variance 0.011 0.016 0.051 0.017 0.016 0.015 0.016
Coverage rate 0.000 0.944 0.941 0.929 0.954 0.933 0.941
Risk difference
Bias 0.169 0.000 0.002 0.016 0.001 0.007 0.004
Variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.945 0.938 0.934 0.953 0.936 0.941
* For complete case analysis, the average sample size is 1099.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
33
Table 14: Scenario 7: RR=2, ρ = 0.6 and outcome predictor of missingness and included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.529 0.002 0.141 0.130 0.005 0.028 0.017
Variance 0.004 0.006 0.014 0.008 0.007 0.006 0.006
Empirical variance 0.004 0.006 0.014 0.007 0.006 0.006 0.006
Coverage rate 0.000 0.947 0.769 0.691 0.957 0.932 0.942
OR
Bias 0.924 0.005 0.148 0.210 0.009 0.047 0.028
Variance 0.011 0.016 0.040 0.022 0.018 0.016 0.016
Empirical variance 0.011 0.017 0.040 0.018 0.017 0.016 0.017
Coverage rate 0.000 0.948 0.887 0.712 0.958 0.932 0.943
Risk difference
Bias 0.199 0.001 0.017 0.045 0.002 0.011 0.006
Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.945 0.937 0.731 0.957 0.932 0.942
* For complete case analysis, the average sample size is 1074.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
Table 15: Scenario 8: RR=2, ρ = 0.6 and outcome independent of missingness but included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.531 0.002 0.138 0.057 0.005 0.027 0.016
Variance 0.004 0.006 0.016 0.008 0.007 0.006 0.006
Empirical variance 0.004 0.006 0.017 0.007 0.006 0.006 0.006
Coverage rate 0.000 0.947 0.793 0.921 0.960 0.937 0.946
OR
Bias 0.927 0.006 0.077 0.091 0.009 0.041 0.024
Variance 0.011 0.016 0.039 0.021 0.018 0.016 0.016
Empirical variance 0.011 0.016 0.041 0.017 0.016 0.016 0.016
Coverage rate 0.000 0.949 0.935 0.924 0.959 0.938 0.944
Risk difference
Bias 0.200 0.002 -0.015 0.020 0.002 0.009 0.005
Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.947 0.918 0.926 0.955 0.934 0.941
* For complete case analysis, the average sample size is 1099.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
34
Table 16: Scenario 9: RR=1, ρ = 0.3 and outcome predictor of missingness but not included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.540 0.000 0.093 0.182 0.084 0.100 0.090
Variance 0.006 0.009 0.025 0.011 0.010 0.009 0.009
Empirical variance 0.006 0.009 0.026 0.010 0.008 0.008 0.008
Coverage rate 0.000 0.946 0.892 0.573 0.883 0.807 0.835
OR
Bias 0.724 0.001 0.121 0.239 0.111 0.132 0.119
Variance 0.012 0.015 0.041 0.019 0.017 0.015 0.015
Empirical variance 0.012 0.015 0.043 0.017 0.015 0.015 0.015
Coverage rate 0.000 0.947 0.901 0.580 0.886 0.810 0.838
Risk difference
Bias 0.136 0.000 0.022 0.043 0.020 0.024 0.022
Variance 0.000 0.000 0.001 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.000 0.001 0.001 0.000 0.000 0.000
Coverage rate 0.000 0.947 0.920 0.598 0.892 0.820 0.849
* For complete case analysis, the average sample size is 1060.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
Table 17: Scenario 10: RR=1, ρ = 0.3 and outcome independent of missingness and not included in the
imputation model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.540 0.000 -0.006 0.082 0.088 0.098 0.090
Variance 0.006 0.009 0.030 0.011 0.010 0.008 0.009
Empirical variance 0.006 0.009 0.032 0.009 0.008 0.008 0.008
Coverage rate 0.000 0.947 0.942 0.891 0.882 0.824 0.845
OR
Bias 0.724 0.000 -0.005 0.108 0.116 0.129 0.119
Variance 0.012 0.015 0.045 0.018 0.017 0.015 0.015
Empirical variance 0.012 0.015 0.047 0.016 0.014 0.014 0.014
Coverage rate 0.000 0.947 0.941 0.894 0.884 0.827 0.847
Risk difference
Bias 0.136 0.000 0.001 0.020 0.021 0.024 0.022
Variance 0.000 0.000 0.001 0.001 0.001 0.000 0.000
Empirical variance 0.000 0.000 0.001 0.001 0.000 0.000 0.000
Coverage rate 0.000 0.948 0.940 0.903 0.893 0.835 0.855
* For complete case analysis, the average sample size is 1103.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
35
Table 18: Scenario 11: RR=2, ρ = 0.3 and outcome predictor of missingness but not included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.436 0.000 0.111 0.144 0.065 0.077 0.070
Variance 0.004 0.006 0.013 0.007 0.006 0.006 0.006
Empirical variance 0.004 0.006 0.014 0.006 0.006 0.006 0.006
Coverage rate 0.000 0.946 0.818 0.583 0.893 0.825 0.854
OR
Bias 0.743 0.001 0.130 0.227 0.108 0.128 0.115
Variance 0.011 0.014 0.036 0.018 0.017 0.015 0.015
Empirical variance 0.011 0.015 0.038 0.017 0.015 0.015 0.015
Coverage rate 0.000 0.945 0.894 0.614 0.887 0.819 0.847
Risk difference
Bias 0.162 0.000 0.018 0.048 0.024 0.028 0.025
Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.944 0.923 0.646 0.886 0.822 0.851
* For complete case analysis, the average sample size is 1075.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
Table 19: Scenario 12: RR=2, ρ = 0.3 and outcome independent of missingness and not included in the
imputation model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.437 0.002 0.099 0.073 0.069 0.079 0.073
Variance 0.004 0.006 0.015 0.007 0.006 0.006 0.006
Empirical variance 0.004 0.006 0.016 0.006 0.006 0.006 0.006
Coverage rate 0.000 0.947 0.860 0.875 0.879 0.824 0.843
OR
Bias 0.745 0.004 0.047 0.110 0.115 0.126 0.117
Variance 0.011 0.014 0.036 0.018 0.017 0.015 0.015
Empirical variance 0.011 0.015 0.038 0.016 0.014 0.014 0.015
Coverage rate 0.000 0.947 0.936 0.887 0.877 0.829 0.846
Risk difference
Bias 0.162 0.001 -0.015 0.023 0.026 0.027 0.025
Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.946 0.913 0.901 0.879 0.836 0.855
* For complete case analysis, the average sample size is 1103.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
36
Table 20: Scenario 13: RR=1, ρ = 0.6 and outcome predictor of missingness but not included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.650 0.001 0.096 0.161 0.061 0.082 0.070
Variance 0.006 0.009 0.027 0.012 0.010 0.009 0.009
Empirical variance 0.006 0.009 0.029 0.010 0.009 0.009 0.009
Coverage rate 0.000 0.939 0.896 0.683 0.919 0.858 0.882
OR
Bias 0.883 0.002 0.124 0.214 0.081 0.109 0.094
Variance 0.011 0.015 0.044 0.020 0.017 0.015 0.015
Empirical variance 0.012 0.016 0.046 0.017 0.015 0.015 0.015
Coverage rate 0.000 0.938 0.905 0.690 0.921 0.861 0.885
Risk difference
Bias 0.169 0.001 0.022 0.040 0.015 0.020 0.017
Variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.939 0.925 0.703 0.925 0.867 0.890
* For complete case analysis, the average sample size is 1058.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
Table 21: Scenario 14: RR=1, ρ = 0.6 and outcome independent of missingness and not included in the
imputation model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.650 -0.001 -0.001 0.063 0.063 0.080 0.070
Variance 0.006 0.009 0.033 0.011 0.010 0.009 0.009
Empirical variance 0.006 0.009 0.034 0.009 0.008 0.008 0.008
Coverage rate 0.000 0.952 0.937 0.931 0.928 0.870 0.893
OR
Bias 0.882 0.000 0.002 0.083 0.084 0.106 0.093
Variance 0.011 0.015 0.048 0.020 0.017 0.015 0.015
Empirical variance 0.012 0.015 0.051 0.016 0.014 0.014 0.014
Coverage rate 0.000 0.953 0.939 0.935 0.929 0.873 0.896
Risk difference
Bias 0.169 0.000 0.002 0.015 0.016 0.020 0.017
Variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.001 0.001 0.001 0.000 0.000
Coverage rate 0.000 0.953 0.938 0.938 0.932 0.882 0.904
* For complete case analysis, the average sample size is 1099.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
37
Table 22: Scenario 15: RR=2, ρ = 0.6 and outcome predictor of missingness but not included in the imputation
model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.529 0.000 0.139 0.129 0.048 0.064 0.056
Variance 0.004 0.006 0.014 0.008 0.007 0.006 0.006
Empirical variance 0.004 0.006 0.014 0.006 0.006 0.006 0.006
Coverage rate 0.000 0.953 0.763 0.696 0.929 0.878 0.901
OR
Bias 0.924 0.001 0.144 0.207 0.081 0.107 0.092
Variance 0.011 0.016 0.039 0.021 0.018 0.017 0.017
Empirical variance 0.011 0.016 0.041 0.018 0.016 0.016 0.016
Coverage rate 0.000 0.953 0.897 0.716 0.927 0.872 0.894
Risk difference
Bias 0.199 0.001 0.016 0.045 0.018 0.024 0.021
Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.948 0.936 0.737 0.925 0.869 0.894
* For complete case analysis, the average sample size is 1074.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
Table 23: Scenario 16: RR=2, ρ = 0.6 and outcome independent of missingness and not included in the
imputation model.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.530 0.001 0.139 0.057 0.051 0.065 0.058
Variance 0.004 0.006 0.016 0.008 0.007 0.006 0.006
Empirical variance 0.004 0.006 0.018 0.007 0.006 0.006 0.006
Coverage rate 0.000 0.944 0.795 0.918 0.923 0.869 0.891
OR
Bias 0.926 0.003 0.077 0.090 0.086 0.105 0.093
Variance 0.011 0.016 0.039 0.021 0.018 0.016 0.017
Empirical variance 0.011 0.017 0.042 0.018 0.016 0.017 0.017
Coverage rate 0.000 0.945 0.928 0.925 0.921 0.875 0.894
Risk difference
Bias 0.199 0.001 -0.015 0.019 0.019 0.023 0.020
Variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.002 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.945 0.912 0.927 0.919 0.877 0.894
* For complete case analysis, the average sample size is 1099.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
38
5.3: Impact of the sample size
Table 24: Results for one scenario (RR=2, ρ = 0.6 and outcome predictor of missingness and included in the
imputation model) with n = 500.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.442 0.007 0.110 0.153 0.010 0.038 0.024
Variance 0.017 0.022 0.050 0.029 0.026 0.022 0.023
Empirical variance 0.018 0.024 0.059 0.027 0.025 0.024 0.025
Coverage rate 0.065 0.940 0.887 0.855 0.955 0.939 0.943
OR
Bias 0.753 0.015 0.145 0.244 0.020 0.066 0.041
Variance 0.044 0.057 0.139 0.077 0.065 0.058 0.058
Empirical variance 0.045 0.061 0.168 0.071 0.063 0.062 0.063
Coverage rate 0.047 0.942 0.913 0.865 0.952 0.934 0.937
Risk difference
Bias 0.162 0.002 0.021 0.051 0.003 0.013 0.008
Variance 0.002 0.003 0.007 0.004 0.003 0.003 0.003
Empirical variance 0.002 0.003 0.009 0.004 0.003 0.003 0.003
Coverage rate 0.046 0.934 0.902 0.869 0.947 0.928 0.929
* For complete case analysis, the average sample size is 269.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
39
5.4: Impact of the missingness rate
Table 25: Results for one scenario (RR=2, ρ = 0.6 and outcome predictor of missingness and included in the
imputation model) with 10% of data missing for X1 and X3.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.529 0.000 0.066 0.061 0.001 0.009 0.005
Variance 0.004 0.006 0.007 0.008 0.006 0.006 0.006
Empirical variance 0.004 0.006 0.008 0.007 0.006 0.006 0.006
Coverage rate 0.000 0.946 0.874 0.907 0.951 0.945 0.947
OR
Bias 0.924 0.002 0.073 0.098 0.003 0.015 0.009
Variance 0.011 0.016 0.020 0.021 0.017 0.016 0.016
Empirical variance 0.012 0.017 0.021 0.018 0.017 0.017 0.017
Coverage rate 0.000 0.944 0.913 0.904 0.949 0.945 0.946
Risk difference
Bias 0.199 0.001 0.010 0.021 0.001 0.004 0.002
Variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.001 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.942 0.933 0.905 0.948 0.943 0.944
* For complete case analysis, the average sample size is 1609.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
Table 26: Results for one scenario (RR=2, ρ = 0.6 and outcome predictor of missingness and included in the
imputation model) with 60% of data missing for X1 and X3.
Crude Full CC* MP MIte MIps MIpar
RR
Bias 0.530 0.001 0.184 0.175 0.010 0.063 0.037
Variance 0.004 0.006 0.074 0.008 0.008 0.006 0.006
Empirical variance 0.004 0.006 0.101 0.007 0.007 0.006 0.007
Coverage rate 0.000 0.945 0.802 0.490 0.962 0.869 0.916
OR
Bias 0.925 0.004 0.184 0.285 0.018 0.105 0.060
Variance 0.011 0.016 0.208 0.025 0.021 0.016 0.017
Empirical variance 0.012 0.017 0.285 0.022 0.019 0.017 0.018
Coverage rate 0.000 0.944 0.889 0.544 0.963 0.863 0.916
Risk difference
Bias 0.199 0.001 0.010 0.062 0.004 0.023 0.013
Variance 0.000 0.001 0.011 0.001 0.001 0.001 0.001
Empirical variance 0.000 0.001 0.014 0.001 0.001 0.001 0.001
Coverage rate 0.000 0.945 0.867 0.578 0.963 0.862 0.913
* For complete case analysis, the average sample size is 399.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
40
5.5: Number of imputed datasets
Table 27: Bias of the log(RR), its variance and coverage rate for the 3 MI approaches according the number
M of imputed datasets for one scenario (RR=2, ρ = 0.6, outcome predictor of missingness and included in the
imputation model).
MIte MIps MIpar
M=5 M=10 M=20 M=5 M=10 M=20 M=5 M=10 M=20
Bias 0.002 0.005 0.003 0.026 0.028 0.031 0.015 0.017 0.018
Variance 0.006 0.007 0.006 0.006 0.006 0.006 0.006 0.006 0.006
Empirical variance 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006
Coverage rate 0.957 0.957 0.953 0.931 0.932 0.924 0.940 0.942 0.937
MIte: treatment effects pooled after multiple imputation; MIps: propensity scores pooled after multiple imputation; MIpar:
propensity score parameters pooled after multiple imputation.
41
5.6: Partially observed outcome and treatment indicator
Table 28: Results for one scenario (RR=2, ρ = 0.6 and outcome predictor of missingness and included in the
imputation model) with 30% of data missing for X1, X3, the outcome Y and the treatment indicator Z.
Crude Full CC* MIte
RR
Bias 0.530 0.002 0.207 0.000
Variance 0.004 0.006 0.032 0.014
Empirical variance 0.004 0.006 0.034 0.012
Coverage rate 0.000 0.949 0.770 0.956
OR
Bias 0.926 0.005 0.164 -0.002
Variance 0.011 0.016 0.078 0.034
Empirical variance 0.011 0.016 0.083 0.031
Coverage rate 0.000 0.947 0.906 0.952
Risk difference
Bias 0.200 0.001 -0.002 -0.001
Variance 0.000 0.001 0.004 0.002
Empirical variance 0.000 0.001 0.004 0.002
Coverage rate 0.000 0.950 0.923 0.953
* For complete case analysis, the average sample size is 635.
RR: relative risk; OR: odds ratio; CC: complete case; MP: missingness pattern; MIte:treatment effects pooled after multiple
imputation; MIps: propensity scores pooled after multiple imputation; MIpar: propensity score parameters pooled after multiple
imputation.
42
Appendix 6: PS distribution in the simulation study
Figure 6: Distribution of the propensity score in both groups in our simulation study.
These distributions are obtained from one dataset of size n=1000000.
43