Cambridge Working Papers in Economics: 1906 
 
SEMIPARAMETRIC NONLINEAR PANEL DATA MODELS WITH 
MEASUREMENT ERROR 
 
 
Oliver Linton Ji-Liang Shiu 
  
18 January 2019 
 
 
This paper develops identification and estimation of the parameters of a nonlinear semi-parametric 
panel data model with mismeasured variables as well as the corresponding average partial effects 
using only three periods of data. The past observables are used as instruments to control the 
measurement error problem, and the time averages of perfectly observed variables are used to restrict 
the unobserved individual-specific effect by a correlated random effects specification. The proposed 
approach relies on the Fourier transforms of several conditional expectations of observable variables. 
We estimate the model via the semi-parametric sieve minimum distance estimator. The finite-sample 
properties of the estimator are investigated through Monte Carlo simulations. We use our method to 
estimate the effect of the wage rate on labor supply using PSID data. 
 
Faculty of Economics 
Semiparametric Nonlinear Panel Data Models
with Measurement Error
Oliver Linton∗ Ji-Liang Shiu†
January 12, 2019
Keywords: Correlated random effects, Measurement error, Nonlinear panel
data models, Semi-parametric identification
∗Department of Economics, University of Cambridge, Email: obl20@cam.ac.uk.
†Institute for Economic and Social Research, Jinan University. Email: jishiu.econ@gmail.com.
1. Introduction
The availability of panel data allows economists to control for unobservable individual-specific characteristics that may be correlated with explanatory variables. Substantial progress has been made on linear and nonlinear panel data models, but largely ignoring the potential presence of measurement error. However, many economic quantities in surveys, such as work hours, earnings, fringe benefits, employment, and health, are frequently measured with error, especially when longitudinal information is collected through one-time retrospective surveys.1 This concern has been heightened by the increased use of longitudinal data sets: mismeasurement in panel data may lead to misleading results or obscure the true economic relationships. The estimation problems caused by mismeasured economic data may be greatly exacerbated when one tries to control for unobserved individual effects using standard fixed effects or first-differenced estimators.
We consider the following semi-parametric nonlinear panel data model with unknown finite-dimensional parameter $\beta_0$:
$$Y_{it} = m(W_{it}, X^*_{it}, C_i; \beta_0) + U_{it}, \qquad i = 1, \dots, n, \quad t = 1, \dots, T. \tag{1}$$
In this model, $Y_{it}$ is an observed scalar dependent variable, $W_{it}$ are perfectly observed explanatory variables, $X^*_{it}$ is a latent, continuously distributed mismeasured variable, $C_i$ is an unobserved individual-specific effect, and $U_{it}$ is an unobserved random variable. The function $m$ need not be separable in $W_{it}$, $X^*_{it}$, and $C_i$, but it belongs to a known finite-dimensional parametric family. We focus on the case where the data consist of a large number of individuals observed over a small (fixed) number of time periods. The observed variable $X_{it}$ is a proxy, or error-ridden measure, of the unobserved true regressor $X^*_{it}$.
The model in Eq. (1) has two features that are new in the literature on panel data models with measurement error. First, the unobserved heterogeneity enters the structural regression function nonseparably and without a linear index structure. Second, the potentially nonlinear regression function also contains a mismeasured variable, again nonseparably, along with other explanatory variables. This regression model is consistent with a structural function derived from a dynamic utility maximization problem with flexible preferences. For example, models of this form arise in the study of life-cycle labor supply with individual-specific preferences; see, e.g., Koebel, Laisney, Pohlmeier, and Staat (2008).2

1The problems caused by measurement error have raised great concern in a number of practical applications. Bollinger (1998), Bound, Brown, Duncan, and Rodgers (1994), and Bound, Brown, and Mathiowetz (2001) provide evidence of measurement error in economic data sets.
Linear panel data models with measurement error have been widely studied, including by Griliches and Hausman (1986), Wansbeek and Koning (1991), Biørn (1992), and Wansbeek (2001). Their approaches first apply an appropriate transformation to remove the unobserved effect and then use instruments in a generalized method of moments (GMM) framework. On the other hand, if we ignore the measurement error problem in Eq. (1), the model belongs to the class of nonseparable panel data models, which have been studied by Evdokimov (2011), Chernozhukov, Fernández-Val, Hahn, and Newey (2013), Hoderlein and White (2012), Chen and Swanson (2012), Hoderlein and Mammen (2007), Altonji and Matzkin (2005), and Chernozhukov, Fernández-Val, Hoderlein, Holzmann, and Newey (2015). In particular, Chernozhukov, Fernández-Val, Hahn, and Newey (2013), Graham and Powell (2012), and Hoderlein and White (2012) use changes over time in $x$ to obtain the ceteris paribus effect of $x$ on $y$ for identification and estimation of nonseparable models. Wilhelm (2015) considers nonlinear panel data models with measurement error in which the fixed effects are additively separable. He differences out the fixed effects and provides a nonparametric identification result that requires no variables other than the outcomes and the observed regressors. In nonseparable panel data models, however, it is not clear how to remove the unobserved heterogeneity and address the measurement error simultaneously (first differencing does not work), so there is a fundamental difference between additively separable and nonseparable models.
2Our model accommodates Eq. (23.13) in Koebel, Laisney, Pohlmeier, and Staat (2008) with $\delta = \delta_i$ depending on individual $i$, so that equation is a special case of our model.
Besides the short panel setting considered here, there is a substantial body of closely related work on large panels, although it does not allow for measurement error. Alvarez and Arellano (2003) study linear panel regression models with fixed effects when both $n$ and $T$ are large, and find that their GMM estimator has an asymptotic bias of order $1/n$ and is not biased as a result of large $T$. Akashi and Kunitomo (2012) use the approach of Alvarez and Arellano (2003) to study panel dynamic simultaneous equation models. Hahn and Kuersteiner (2002) characterize the bias of the fixed effects estimator when both $n$ and $T$ go to infinity with the ratio $n/T$ converging to a constant.
We develop an identification technique that builds on Schennach (2007), who studies the identification and estimation of nonlinear measurement error models with instruments. The identification strategy employs Fourier transforms of conditional expectations of observable variables and provides a closed-form solution for the regression function in terms of these transforms. We generalize the method of Schennach (2007) by allowing the regression function to contain, in addition to the mismeasured regressor, an unobserved individual-specific effect in a panel data setting. The proposed method exploits the fact that panel data contain enough information on observables to deal with both the mismeasured variable $X^*_{it}$ and the unobserved individual-specific effect $C_i$: the past observables are used as instruments to control the measurement error problem, while the time averages of perfectly observed variables restrict the unobserved individual-specific effect through a correlated random effects specification. Altonji and Matzkin (2005) and Wooldridge (2005) have used correlated random effects (CRE) approaches in nonlinear panel data models to control for the unobserved individual-specific effect. As a result, under a mild regularity condition, the nonseparable regression function of interest admits a closed-form representation similar to that in Schennach (2007).
Our estimation method closely follows the identification result; in particular, it builds on knowledge of three conditional expectations. We propose a sieve minimum distance (hereafter SMD) estimator for the parameters of interest, implementing the series or sieve estimation methods developed in Ai and Chen (2007), which extend the SMD estimation of Ai and Chen (2003) and Newey and Powell (2003). The procedure applies the SMD method to a vector of moment conditions with different conditioning variables derived from the identification result. It follows that the SMD estimator of the finite-dimensional parameters of the structural function is $\sqrt{n}$-consistent and asymptotically normally distributed.
The rest of the paper is organized as follows. Section 2 describes the identification
assumptions and strategy for nonlinear panel data models with measurement errors.
Section 3 covers the SMD estimation procedure based on the identification restrictions in Section 2. Section 4 discusses the implementation of the SMD estimator and presents Monte Carlo simulation evidence. Section 5 presents our empirical application, the
elasticity of labor supply. Section 6 concludes. All proofs are collected in the Appendix.
2. Semiparametric Identification
Without loss of generality, we take both $W_{it}$ and $X^*_{it}$ to be scalars (the multivariate case is a straightforward extension). To avoid confusion, upper-case letters are used exclusively for random variables and lower-case letters exclusively for the corresponding non-random quantities. The data $\{y_{it}, w_{it}, x_{it}\}$ are independently and identically distributed across $i$ for each $t$ and form an observable random sample of $\{Y_{it}, W_{it}, X_{it}\}$ for $i = 1, 2, \dots, n$ and $t = 1, \dots, T$, with $T \geq 2$.
Assumption 2.1. (Correlated Random Effects (CRE)) There exists a nonzero coefficient $\lambda_0$ such that
$$C_i = \lambda_0 \bar W_i + \eta_i,$$
where $\bar W_i = \frac{1}{T}\sum_{t=1}^{T} W_{it}$ denotes the time average of the perfectly observed explanatory variables. The remainder term $\eta_i$ is independent of $\bar W_i$.
Assumption 2.1 can be generalized to include more perfectly observed explanatory variables. For example, if there exists another time-invariant variable $Z_i$, we can consider the CRE specification
$$C_i = \lambda_{01} \bar W_i + \lambda_{02} Z_i + \eta_i.$$
Including more control variables in the specification may make the independence assumption on the projection error $\eta_i$ more plausible.
Assumption 2.2. (Classical measurement error):
(i) (Past variables as IV) There exists an unknown function $h_t$ at time $t$ satisfying
$$X^*_{it} = h_t(G_{i,<t}) + V_{it},$$
where $G_{i,<t} = (W_{it-1}, X_{it-1}, \dots, W_{i1}, X_{i1})$, while $V_{it}$ is independent of $G_{i,<t}$ and $E[V_{it}] = 0$.
(ii) (Measurement error)
$$X_{it} = X^*_{it} + e_{it}, \qquad E[e_{it} \mid W_{it}, G_{i,<t}, V_{it}, \bar W_i, \eta_i, U_{it}] = 0.$$
(iii) (Conditional mean independence) $E[U_{it} \mid W_{it}, G_{i,<t}, V_{it}, \bar W_i] = 0$.
(iv) (Independent distribution) The CRE remainder error $\eta_i$ and the unobservable $V_{it}$ are independent.
The setting for the measurement error is the same as in Schennach (2007), who uses external instruments to identify her nonlinear errors-in-variables model. Assumption 2.2(i) can be regarded as a control function assumption: it uses the past variables as instruments to construct the estimable $h_t(G_{i,<t})$ and thereby extracts from the unobserved true regressor $X^*_{it}$ the unobservable $V_{it}$, which is independent of $G_{i,<t}$. This assumption is commonly used for identification of nonlinear models.3 We may further assume that $X^*_{it}$ follows a first-order stationary (Markov-type) process by setting $X^*_{it} = h(W_{it-1}, X_{it-1}) + V_{it}$. Assumption 2.2(ii) implies that $E[X^*_{it} e_{it}] = 0$, i.e., the unobserved true regressor is uncorrelated with the measurement error. Assumption 2.2(iii) imposes only the standard conditional moment restriction $E[U_{it} \mid W_{it}, G_{i,<t}, V_{it}, \bar W_i] = 0$; the disturbance $U_{it}$ need not be independent of $W_{it}$, $G_{i,<t}$, $V_{it}$, and $\bar W_i$, and its distribution need not be the same across time periods. For example, $U_{it}$ may follow an AR(1) process.

3Combining Assumption 2.2(i) and (ii) yields $X_{it} = h_t(G_{i,<t}) + V_{it} + e_{it}$. As mentioned in Schennach (2007), an indirect test of the independence of $V_{it}$ in Assumption 2.2(i) and of the conditional mean independence of $e_{it}$ in Assumption 2.2(ii) can be conducted by testing for dependence between $G_{i,<t}$ and the estimated residual $X_{it} - \hat h_t(G_{i,<t})$.
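A minimal sketch of the control-function step, assuming the Markov case $X^*_{it} = h(W_{it-1}, X_{it-1}) + V_{it}$ and approximating $h$ by a quadratic polynomial (the same kind of sieve used in Section 4): regress $X_{it}$ on the lagged observables by least squares, so that the fitted values estimate $h$ and the residuals estimate $V_{it} + e_{it}$. The data-generating process, seed, and helper names below are hypothetical. Least-squares residuals are orthogonal to the basis by construction, so the dependence test in footnote 3 would have to look at nonlinear forms of dependence (for example, between squared residuals and the lags).

```python
import numpy as np

def quad_basis_2d(w, x):
    """Quadratic polynomial basis in (w, x): 1, w, x, w^2, x^2, w*x."""
    return np.column_stack([np.ones_like(w), w, x, w**2, x**2, w * x])

def control_function_step(x_now, w_lag, x_lag):
    """Least-squares sieve fit of X_it on (W_{it-1}, X_{it-1}).

    Returns fitted values (estimate of h at the data) and residuals
    (estimate of V_it + e_it under Assumption 2.2).
    """
    B = quad_basis_2d(w_lag, x_lag)
    coef, *_ = np.linalg.lstsq(B, x_now, rcond=None)
    fitted = B @ coef
    return fitted, x_now - fitted

# Hypothetical simulated data (not the design of Section 4)
rng = np.random.default_rng(0)
n = 2000
w_lag = rng.uniform(0.0, 1.0, n)
x_lag = rng.uniform(0.0, 1.0, n) + 0.3 * rng.standard_normal(n)   # observed, mismeasured lag
v = 0.5 * rng.standard_normal(n)                                   # V_it, independent of the lags
x_star_now = 0.2 + 0.5 * w_lag + 0.8 * x_lag - 0.3 * x_lag**2 + v  # X*_it = h(W_{it-1}, X_{it-1}) + V_it
x_now = x_star_now + 0.3 * rng.standard_normal(n)                  # X_it = X*_it + e_it

fitted, resid = control_function_step(x_now, w_lag, x_lag)
```

The residual vector `resid` is the generated regressor that the later moment conditions would condition on through $\tilde H_{i,<t}$.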
As in Eq. (A.3) in the appendix, the measurement error equation and the correlated random effects can be written as
$$X^*_{it} = \tilde G_{i,<t} - \tilde V_{it}, \qquad C_i = \lambda_0 \bar W_i - \tilde\eta_i,$$
where $h_t(G_{i,<t}) \equiv \tilde G_{i,<t} = E[X_{it} \mid G_{i,<t}]$, $\tilde V_{it} = -V_{it}$, and $\tilde\eta_i = -\eta_i$. The following assumption guarantees that the Fourier transforms of the relevant conditional expectations are well defined.
Assumption 2.3. Define $R_t(g, \bar w; w) = E[Y_{it} \mid W_{it} = w, \tilde G_{i,<t} = g, \bar W_i = \bar w]$ and $S_t(g, \bar w; w) = E[X_{it} Y_{it} \mid W_{it} = w, \tilde G_{i,<t} = g, \bar W_i = \bar w]$, and consider these as functions of $(g, \bar w)$ for fixed values of $w$. They belong to a function space $S_\gamma$ that contains functions $f : \mathbb{R}^2 \to \mathbb{R}$ satisfying
$$\int (1 + \xi^\top \xi)^{\gamma} \lvert f(\xi) \rvert \, d\xi \leq A < \infty \quad \text{for some } \gamma > 0.$$
Assumption 2.3 ensures that the Fourier transforms of the conditional expectations are well-defined members of a subclass of locally integrable functions.
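Define the Fourier transforms of the function $m$ and of the conditional expectations $R_t(g, \bar w; w)$ and $S_t(g, \bar w; w)$ defined in Assumption 2.3:
$$F_y(w, \xi_1, \xi_2) = \iint R_t(g, \bar w; w)\, e^{\mathrm{i}\xi_1 g} e^{\mathrm{i}\xi_2 \bar w}\, dg\, d\bar w, \tag{2}$$
$$F_{xy}(w, \xi_1, \xi_2) = \iint S_t(g, \bar w; w)\, e^{\mathrm{i}\xi_1 g} e^{\mathrm{i}\xi_2 \bar w}\, dg\, d\bar w, \tag{3}$$
$$F_m(w, \xi_1, \xi_2; \beta_0) = \iint m(w, x, c; \beta_0)\, e^{\mathrm{i}\xi_1 x} e^{\mathrm{i}\xi_2 c}\, dx\, dc, \tag{4}$$
where $\mathrm{i} = \sqrt{-1}$. Define also $\phi_v(\xi_1) = \int e^{\mathrm{i}\xi_1 \tilde v} f_{\tilde V_{it}}(\tilde v)\, d\tilde v$ and $\phi_\eta(\xi_2) = \int e^{\mathrm{i}\xi_2 \tilde\eta} f_{\tilde\eta_i}(\tilde\eta)\, d\tilde\eta$, where $f_{\tilde V_{it}}(\tilde v)$ and $f_{\tilde\eta_i}(\tilde\eta)$ are the density functions of $\tilde V_{it}$ and $\tilde\eta_i$, respectively.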
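Lemma 2.1. Suppose that Assumptions 2.1, 2.2, and 2.3 hold. Then
$$F_y(w, \xi_1, \xi_2) = \frac{1}{\lambda_0} F_m\!\left(w, \xi_1, \frac{\xi_2}{\lambda_0}\right) \phi_v(\xi_1)\, \phi_\eta\!\left(\frac{\xi_2}{\lambda_0}\right), \tag{5}$$
$$F_{xy}(w, \xi_1, \xi_2) = -\frac{\mathrm{i}}{\lambda_0}\, \frac{\partial F_m}{\partial \xi_1}\!\left(w, \xi_1, \frac{\xi_2}{\lambda_0}\right) \phi_v(\xi_1)\, \phi_\eta\!\left(\frac{\xi_2}{\lambda_0}\right). \tag{6}$$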
Proof. See the appendix.
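For the reader's convenience, here is a short derivation sketch of Eq. (5); it is not the appendix proof. We assume $\lambda_0 > 0$ (otherwise $1/\lambda_0$ becomes $1/\lvert\lambda_0\rvert$), suppress $\beta_0$, and take the interchange of integrals as justified by Assumptions 2.3 and 2.4(i). Starting from the double-integral representation of $R_t$ used in Eq. (A.5) and in Section 3, change variables from $(g, \bar w)$ to $x = g - \tilde v$ and $c = \lambda_0 \bar w - \tilde\eta$, so that $dg\, d\bar w = dx\, dc / \lambda_0$:

$$
\begin{aligned}
F_y(w, \xi_1, \xi_2)
&= \iint \left[ \iint m(w, g - \tilde v, \lambda_0 \bar w - \tilde\eta)\, f_{\tilde V_{it}}(\tilde v)\, f_{\tilde\eta_i}(\tilde\eta)\, d\tilde v\, d\tilde\eta \right] e^{\mathrm{i}\xi_1 g}\, e^{\mathrm{i}\xi_2 \bar w}\, dg\, d\bar w \\
&= \frac{1}{\lambda_0} \iiiint m(w, x, c)\, e^{\mathrm{i}\xi_1 (x + \tilde v)}\, e^{\mathrm{i}\frac{\xi_2}{\lambda_0}(c + \tilde\eta)}\, f_{\tilde V_{it}}(\tilde v)\, f_{\tilde\eta_i}(\tilde\eta)\, dx\, dc\, d\tilde v\, d\tilde\eta \\
&= \frac{1}{\lambda_0} F_m\!\left(w, \xi_1, \frac{\xi_2}{\lambda_0}\right) \phi_v(\xi_1)\, \phi_\eta\!\left(\frac{\xi_2}{\lambda_0}\right).
\end{aligned}
$$

For Eq. (6), the same substitution applies to $S_t(g, \bar w; w) = \iint (g - \tilde v)\, m(w, g - \tilde v, \lambda_0 \bar w - \tilde\eta)\, f_{\tilde V_{it}}(\tilde v)\, f_{\tilde\eta_i}(\tilde\eta)\, d\tilde v\, d\tilde\eta$ (the terms involving $e_{it}$ and $U_{it}$ vanish by Assumption 2.2(ii)-(iii)), and the extra factor $x = g - \tilde v$ is handled by $\int x\, m(w, x, c)\, e^{\mathrm{i}\xi_1 x}\, dx = -\mathrm{i}\, \frac{\partial}{\partial \xi_1} \int m(w, x, c)\, e^{\mathrm{i}\xi_1 x}\, dx$, which produces the factor $-\mathrm{i}\, \partial F_m / \partial \xi_1$.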
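Assumption 2.4. Suppose that: (i) $\int \lvert\tilde v\rvert f_{\tilde V_{it}}(\tilde v)\, d\tilde v \leq A < \infty$ and $\int \lvert\tilde\eta\rvert f_{\tilde\eta_i}(\tilde\eta)\, d\tilde\eta \leq A < \infty$; and (ii) the characteristic functions satisfy $\phi_v(\xi_1) \neq 0$ and $\phi_\eta(\xi_2) \neq 0$ and are continuously differentiable for all $\xi_1, \xi_2 \in \mathbb{R}$.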
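Assumption 2.5. Let $\Theta$ be a parameter space containing $\beta_0$. There exist a finite or infinite constant $\bar\zeta > 0$ and some $w_{it}$ such that for all $\beta \in \Theta$: (i) $F_m(w_{it}, \xi_1, \xi_2; \beta) \neq 0$ almost everywhere in $[-\bar\zeta, \bar\zeta]^2$ and (ii) $F_m(w_{it}, \xi_1, \xi_2; \beta) = 0$ for all $\lvert\xi_1\rvert, \lvert\xi_2\rvert > \bar\zeta$.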
Assumptions 2.4 and 2.5 are standard in the deconvolution literature. Assumption 2.4(ii) requires the characteristic functions of $\tilde V_{it}$ and $\tilde\eta_i$ to be non-vanishing, which excludes, for example, uniform and triangular distributions.
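A quick numerical illustration of why these distributions are excluded, as a sketch with densities of our own choosing: the characteristic function of Uniform$[-1, 1]$ is $\sin(\xi)/\xi$, which vanishes at $\xi = \pi$; that of the triangular density on $[-1, 1]$ is $2(1 - \cos\xi)/\xi^2$, which vanishes at $\xi = 2\pi$; the standard normal characteristic function $e^{-\xi^2/2}$ never vanishes. The helper `char_fn` is a hypothetical name; it approximates $\int e^{\mathrm{i}\xi v} f(v)\, dv$ by the trapezoid rule.

```python
import numpy as np

def char_fn(density, xi, lo, hi, n_grid=200_001):
    """Trapezoid-rule approximation of phi(xi) = integral of exp(i*xi*v) f(v) dv on [lo, hi]."""
    v = np.linspace(lo, hi, n_grid)
    vals = np.exp(1j * xi * v) * density(v)
    dv = v[1] - v[0]
    return dv * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def uniform_pdf(v):      # Uniform[-1, 1]; phi(xi) = sin(xi)/xi
    return np.where(np.abs(v) <= 1.0, 0.5, 0.0)

def triangular_pdf(v):   # triangular on [-1, 1]; phi(xi) = 2(1 - cos xi)/xi^2
    return np.clip(1.0 - np.abs(v), 0.0, None)

def normal_pdf(v):       # N(0, 1); phi(xi) = exp(-xi^2/2) > 0 for every xi
    return np.exp(-v**2 / 2) / np.sqrt(2 * np.pi)

phi_uniform = char_fn(uniform_pdf, np.pi, -1.0, 1.0)            # vanishes at xi = pi
phi_triangular = char_fn(triangular_pdf, 2 * np.pi, -1.0, 1.0)  # vanishes at xi = 2*pi
phi_normal = char_fn(normal_pdf, np.pi, -10.0, 10.0)            # strictly positive
```

At such zeros the ratio arguments behind Lemma 2.1 break down, because $\phi_v$ or $\phi_\eta$ would appear in a denominator.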
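Replacing $f_{\tilde\eta_i}(\tilde\eta)$ in the conditional mean function of Eq. (A.5) by the parametrized density $f_{\tilde\eta_i;\gamma}(\tilde\eta)$ yields the parametric conditional mean function in Eq. (A.16). Denote $\gamma = (\beta, \lambda)$, a $(d_\beta + 2) \times 1$ vector, and consider
$$R_t(g, \bar w; w, \gamma) = E[Y_{it} \mid W_{it} = w, \tilde G_{i,<t} = g, \bar W_i = \bar w; \gamma] = \iint m(w, g - \tilde v, \lambda_1 \bar w - \tilde\eta; \beta)\, f_{\tilde V_{it}}(\tilde v)\, f_{\tilde\eta_i;\gamma}(\tilde\eta)\, d\tilde v\, d\tilde\eta.$$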
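Define the gradient of $R_t(g, \bar w; w, \gamma) = E[Y_{it} \mid W_{it} = w, \tilde G_{i,<t} = g, \bar W_i = \bar w; \gamma]$ and the information matrix as follows:
$$\nabla_\gamma R_t(g, \bar w; w, \gamma) = \left(\frac{\partial R_t(g, \bar w; w, \gamma)}{\partial \beta_1}, \dots, \frac{\partial R_t(g, \bar w; w, \gamma)}{\partial \lambda_2}\right)^\top,$$
$$I(\gamma) = E\left[\nabla_\gamma R_t(\tilde G_{i,<t}, \bar W_i; W_{it}, \gamma)\, \nabla_\gamma R_t(\tilde G_{i,<t}, \bar W_i; W_{it}, \gamma)^\top\right].$$
Assumption 2.6. (Nonsingular Parametric Structure) Let $\Gamma = \Theta \times \Upsilon$ be a parameter space containing $(\beta_0, \lambda_0)$. The elements of the vector $\nabla_\gamma E[Y_{it} \mid w_{it}, \tilde g_{i,<t}, \bar w_i; \gamma]$ exist and are continuous in $\Gamma$ for each $(w_{it}, \tilde g_{i,<t}, \bar w_i)$, and the matrix $I(\beta_0, \lambda_0)$ is nonsingular.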
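For a candidate specification, the rank condition in Assumption 2.6 can be checked numerically. The sketch below is hypothetical throughout: it takes a user-supplied conditional mean `R(g, wbar, w, gamma)`, approximates $\nabla_\gamma R_t$ by central finite differences, averages the outer products over sample points to form a sample analog of $I(\gamma)$, and reports its smallest eigenvalue (a value near zero signals a near-singular structure). The linear toy mean function is an illustration only, not the model of this paper.

```python
import numpy as np

def info_matrix(R, g, wbar, w, gamma, h=1e-5):
    """Sample analog of I(gamma) = E[grad R grad R'] using central finite differences.

    R(g, wbar, w, gamma) must return the conditional mean at each sample point.
    """
    gamma = np.asarray(gamma, dtype=float)
    k = gamma.size
    grads = np.empty((g.size, k))
    for j in range(k):
        step = np.zeros(k)
        step[j] = h
        grads[:, j] = (R(g, wbar, w, gamma + step) - R(g, wbar, w, gamma - step)) / (2 * h)
    return grads.T @ grads / g.size

def R_toy(g, wbar, w, gamma):
    """Hypothetical linear conditional mean: b0 + b1*w + b2*g + lam*wbar."""
    b0, b1, b2, lam = gamma
    return b0 + b1 * w + b2 * g + lam * wbar

rng = np.random.default_rng(1)
g, wbar, w = rng.normal(size=(3, 1000))
gamma0 = [0.5, 0.5, -0.5, 0.3]
min_eig_ok = np.linalg.eigvalsh(info_matrix(R_toy, g, wbar, w, gamma0)).min()
# Setting wbar equal to g makes the b2 and lam gradient columns collinear
min_eig_singular = np.linalg.eigvalsh(info_matrix(R_toy, g, g.copy(), w, gamma0)).min()
```

In the second case $\beta_2$ and $\lambda$ cannot be separated, which is exactly the situation Assumption 2.6 rules out.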
Theorem 2.1. Suppose that Assumptions 2.1, 2.2, 2.3, 2.4, 2.5, and 2.6 hold. Then the three unknown objects of interest, namely the finite-dimensional parameters $\beta_0$ and $\lambda_0$, the density $f_{\tilde V_{it}}(\tilde v)$ of the control-function remainder error, and the density $f_{\tilde\eta_i}(\tilde\eta)$ of the CRE remainder error $\eta_i$, are identified.
Proof. See the appendix.
The identification strategy for Theorem 2.1 has two main steps. In the first step, we use the method of Theorem 1 in Schennach (2007) and of Theorem 3(B) in Zinde-Walsh (2014) to identify the distribution of the measurement error. In the second step, we use the CRE specification and the behavior of Fourier transforms under convolution to connect the distribution of the individual effect to a parametric conditional moment function. Identification then follows from the nonsingularity of the information matrix formed from this parametric conditional moment function (Assumption 2.6).
A quantity of interest in many applications is the partial effect. Because $C_i$ is unobserved, it is not clear at which values of the individual effect the partial effect should be evaluated. One solution is to average the partial effects over the distribution of the individual effect; this quantity is also identified by Theorem 2.1. Given the identified distribution of $\eta_i$ and its independence from $\bar W_i$ in Assumption 2.1, we have $f(c \mid \bar w_i) = f_{\tilde\eta_i}(\lambda_0 \bar w_i - c)$. The distribution of the individual effect is then identified from
$$f_{C_i}(c) = \int f(c \mid \bar w_i)\, \underbrace{f(\bar w_i)}_{\text{estimable from data}}\, d\bar w_i. \tag{7}$$
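Because $f(\bar w_i)$ is the density of an observed variable, the integral in Eq. (7) can be replaced by a sample average over the observed time averages, $\hat f_{C_i}(c) = n^{-1}\sum_{i=1}^{n} f_{\tilde\eta_i}(\lambda_0 \bar w_i - c)$. The sketch below assumes $\lambda_0$ and $f_{\tilde\eta_i}$ are known (in practice they would be the SMD estimates); the standard normal choice for $f_{\tilde\eta_i}$ and the normal distribution of $\bar W_i$ are our own assumptions, made so that the answer can be checked in closed form. Function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, sd=1.0):
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def density_of_C(c, wbar_obs, lam0, f_eta_tilde):
    """Sample analog of Eq. (7): average over i of f_eta_tilde(lam0 * wbar_i - c)."""
    c = np.atleast_1d(c)
    return f_eta_tilde(lam0 * wbar_obs[None, :] - c[:, None]).mean(axis=1)

rng = np.random.default_rng(2)
lam0, s = -0.5, 0.8
wbar_obs = rng.normal(0.0, s, size=200_000)     # hypothetical observed time averages
grid = np.array([-1.0, 0.0, 0.5])
est = density_of_C(grid, wbar_obs, lam0, normal_pdf)
# Closed form when Wbar ~ N(0, s^2) and eta_tilde ~ N(0, 1): C ~ N(0, lam0^2 s^2 + 1)
exact = normal_pdf(grid, sd=np.sqrt(lam0**2 * s**2 + 1.0))
```

The averaging over observed $\bar w_i$ avoids estimating $f(\bar w_i)$ itself.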
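Suppose that $x^*_{it}$ takes continuous values. The average partial effect (APE) of $x^*_{it}$ at the point $(w_0, x^*_0)$ is
$$\mathrm{APE}(w_0, x^*_0) = \int_{\mathcal{C}} \left.\frac{\partial m(w_{it}, x_{it}, c; \beta_0)}{\partial x_{it}}\right|_{(w_{it}, x_{it}) = (w_0, x^*_0)} f_{C_i}(c)\, dc. \tag{8}$$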
Corollary 2.1. Suppose that Assumptions 2.1, 2.2, 2.3, 2.4, 2.5, and 2.6 hold. Then,
both the distribution of the individual effect and the average partial effect defined in
Eq. (8) are identified.
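In practice the integral in Eq. (8) can be approximated by averaging the derivative over draws of $C_i$ from (an estimate of) $f_{C_i}$. The sketch below uses a regression function of the form of Simulation III in Section 4 with the parameter values given there, a central finite difference for $\partial m / \partial x$, and hypothetical standard normal draws of $C_i$; helper names are our own.

```python
import numpy as np

beta = (0.5, 0.5, -0.5)   # (b00, b01, b02)

def m(w, x, c, b=beta):
    """Simulation III form: (b00 + b01(1+c)w + b02(1+c)x + c)^2."""
    b00, b01, b02 = b
    return (b00 + b01 * (1 + c) * w + b02 * (1 + c) * x + c) ** 2

def ape(w0, x0, c_draws, h=1e-5):
    """Sample analog of Eq. (8): average over c of dm/dx at (w0, x0)."""
    dm_dx = (m(w0, x0 + h, c_draws) - m(w0, x0 - h, c_draws)) / (2 * h)
    return dm_dx.mean()

rng = np.random.default_rng(3)
c_draws = rng.normal(0.0, 1.0, size=100_000)   # hypothetical draws from f_C
w0, x0 = 0.5, 0.5
ape_num = ape(w0, x0, c_draws)

# Analytic derivative for comparison: dm/dx = 2 * index * b02 * (1 + c)
b00, b01, b02 = beta
index = b00 + b01 * (1 + c_draws) * w0 + b02 * (1 + c_draws) * x0 + c_draws
ape_exact = np.mean(2 * index * b02 * (1 + c_draws))
```

With $w_0 = x_0 = 0.5$ the terms in $w$ and $x$ cancel inside the index, which reduces to $0.5 + c$, so under $C_i \sim N(0, 1)$ the population APE is $-E[(0.5 + C_i)(1 + C_i)] = -1.5$; the simulated average approximates this value.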
3. SMD Estimation
We have shown in Theorem 2.1 that the three unknown objects of interest, namely the finite-dimensional parameters $\beta_0$ and $\lambda_0$, the density $f_{\tilde V_{it}}(\tilde v)$ of the control-function remainder error, and the density $f_{\tilde\eta_i}(\tilde\eta)$ of the CRE remainder error $\eta_i$, are uniquely identified. Identification is based on knowledge of three observable conditional expectations, $E[X_{it} \mid G_{i,<t}]$, $E[Y_{it} \mid W_{it}, \tilde G_{i,<t}, \bar W_i]$, and $E[X_{it} Y_{it} \mid W_{it}, \tilde G_{i,<t}, \bar W_i]$, where $\tilde G_{i,<t} = h_t(G_{i,<t})$. In general the conditioning set is high-dimensional, and nonparametric estimation procedures would perform poorly. We therefore impose a Markov assumption, which reduces the dimensionality considerably.
Assumption 3.1. (Stationary Markov motion) The mismeasured covariate $X^*_{it}$ follows a first-order stationary Markov process, $X^*_{it} = h(W_{it-1}, X_{it-1}) + V_{it}$ for each $t$.
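Denote $\tilde H_{i,<t} = h(W_{it-1}, X_{it-1})$ and $D_{it} = (W_{it}, W_{it-1}, X_{it-1}, \bar W_i)$. Under the assumptions of Theorem 2.1 and Assumption 3.1, we rewrite these conditional expectations as follows:4
$$0 \equiv E[X_{it} \mid W_{it-1}, X_{it-1}] - h(W_{it-1}, X_{it-1}),$$
$$0 \equiv E[Y_{it} \mid D_{it}] - \iint m(W_{it}, \tilde H_{i,<t} - \tilde v, \lambda_0 \bar W_i - \tilde\eta; \beta_0)\, f_{\tilde V_{it}}(\tilde v)\, f_{\tilde\eta_i}(\tilde\eta)\, d\tilde v\, d\tilde\eta,$$
$$0 \equiv E[X_{it} Y_{it} \mid D_{it}] - \iint (\tilde H_{i,<t} - \tilde v)\, m(W_{it}, \tilde H_{i,<t} - \tilde v, \lambda_0 \bar W_i - \tilde\eta; \beta_0)\, f_{\tilde V_{it}}(\tilde v)\, f_{\tilde\eta_i}(\tilde\eta)\, d\tilde v\, d\tilde\eta.$$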
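Denote $\alpha_0 = (\beta_0, \lambda_0, f_{\tilde V_{it}}(\cdot), f_{\tilde\eta_i}(\cdot), h(\cdot))^\top$ and define the residual functions
$$\rho_1(X_{it}, Y_{it}, D_{it}; \alpha_0) \equiv X_{it} - h(W_{it-1}, X_{it-1}),$$
$$\rho_2(X_{it}, Y_{it}, D_{it}; \alpha_0) \equiv Y_{it} - \iint m(W_{it}, \tilde H_{i,<t} - \tilde v, \lambda_0 \bar W_i - \tilde\eta; \beta_0)\, f_{\tilde V_{it}}(\tilde v)\, f_{\tilde\eta_i}(\tilde\eta)\, d\tilde v\, d\tilde\eta,$$
$$\rho_3(X_{it}, Y_{it}, D_{it}; \alpha_0) \equiv X_{it} Y_{it} - \iint (\tilde H_{i,<t} - \tilde v)\, m(W_{it}, \tilde H_{i,<t} - \tilde v, \lambda_0 \bar W_i - \tilde\eta; \beta_0)\, f_{\tilde V_{it}}(\tilde v)\, f_{\tilde\eta_i}(\tilde\eta)\, d\tilde v\, d\tilde\eta.$$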
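The double integrals in $\rho_2$ and $\rho_3$ have no closed form in general. One simple way to evaluate them, sketched below under our own assumptions (densities evaluated on a fixed square grid and integrated by a Riemann sum; a generic regression function `m` passed in; all names hypothetical), is to precompute the grid once and reuse it for every observation. The toy linear `m` and standard normal densities are chosen only so the result can be checked by hand.

```python
import numpy as np

def rho_2_3(x, y, w, h_lag, wbar, m, beta, lam, f_v, f_eta, grid):
    """Residuals rho_2 and rho_3 for one observation.

    The double integral over (v, eta) is approximated by a Riemann sum on the
    square grid `grid` x `grid`; f_v and f_eta are the (sieve) densities.
    """
    step = grid[1] - grid[0]
    V, E = np.meshgrid(grid, grid, indexing="ij")      # V: v-values, E: eta-values
    weights = f_v(V) * f_eta(E) * step * step
    x_star = h_lag - V                                 # H_tilde_{i,<t} - v
    m_vals = m(w, x_star, lam * wbar - E, beta)        # m(W_it, H - v, lam*Wbar - eta; beta)
    rho2 = y - np.sum(m_vals * weights)
    rho3 = x * y - np.sum(x_star * m_vals * weights)
    return rho2, rho3

def m_linear(w, x, c, beta):
    """Toy regression function with closed-form integrals."""
    b0, b1, b2 = beta
    return b0 + b1 * w + b2 * x + c

def std_normal(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

grid = np.linspace(-8.0, 8.0, 801)
r2, r3 = rho_2_3(x=1.2, y=0.7, w=0.3, h_lag=0.9, wbar=0.4, m=m_linear,
                 beta=(0.5, 0.5, -0.5), lam=-0.5,
                 f_v=std_normal, f_eta=std_normal, grid=grid)
```

With these toy inputs, $\beta_{00} + \beta_{01} w + \beta_{02} h + \lambda \bar w = 0$, so the first integral is $0$ and the second equals $\beta_{02} E[\tilde v^2] = -0.5$; hence $\rho_2 = 0.7$ and $\rho_3 = 0.84 + 0.5 = 1.34$.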
Define the $3 \times 1$ vector of residual functions $\rho(X_{it}, Y_{it}, D_{it}; \alpha)$ with elements $\rho_j(X_{it}, Y_{it}, D_{it}; \alpha)$, $j = 1, 2, 3$. The parameter vector $\alpha = (\beta, \lambda, f_V(\cdot), f_\eta(\cdot), h(\cdot))^\top$ contains three infinite-dimensional nuisance parameters, namely the unknown functions $f_V(\cdot)$, $f_\eta(\cdot)$, and $h(\cdot)$. The conditional moment functions for $\alpha_0$ can be summarized as the following conditional moment restrictions with different conditioning variables:
$$m(D_{it}; \alpha) \equiv \begin{pmatrix} m_1(W_{it-1}, X_{it-1}; \alpha) \\ m_2(D_{it}; \alpha) \\ m_3(D_{it}; \alpha) \end{pmatrix} \equiv \begin{pmatrix} E[\rho_1(X_{it}, Y_{it}, D_{it}; \alpha) \mid W_{it-1}, X_{it-1}] \\ E[\rho_2(X_{it}, Y_{it}, D_{it}; \alpha) \mid D_{it}] \\ E[\rho_3(X_{it}, Y_{it}, D_{it}; \alpha) \mid D_{it}] \end{pmatrix},$$
with $m(D_{it}; \alpha_0) = 0$. The conditioning variables in the first conditional moment restriction are $(W_{it-1}, X_{it-1})$, whereas those in the second and third are $D_{it}$. The model therefore fits into the general framework of Ai and Chen (2007) for conditional moment restrictions with different conditioning variables, which contain finite-dimensional unknown parameters and infinite-dimensional unknown functions.

4The detailed derivations can be found in Eqs. (A.5) and (A.6) in the appendix.
We consider a nonparametric least squares (LS) regression estimator for each component of $m(D_{it}; \alpha)$. Let $p^k(\cdot) = (p_1(\cdot), \dots, p_k(\cdot))^\top$ be a vector of known univariate basis functions and $p^k(\cdot, \dots, \cdot) = (p_1(\cdot, \dots, \cdot), \dots, p_k(\cdot, \dots, \cdot))^\top$ a multivariate basis generated by the tensor-product construction. Let $p^{k_{1n}}(W_{it-1}, X_{it-1}) = (p_1(W_{it-1}, X_{it-1}), \dots, p_{k_{1n}}(W_{it-1}, X_{it-1}))^\top$ and $H_1 = (p^{k_{1n}}(W_{11}, X_{11}), \dots, p^{k_{1n}}(W_{n,T-1}, X_{n,T-1}))^\top$. The series LS estimator of $m_1(W_{it-1}, X_{it-1}; \alpha)$ is
$$\hat m_1(W_{it-1}, X_{it-1}; \alpha) = p^{k_{1n}}(W_{it-1}, X_{it-1})^\top (H_1^\top H_1)^{-1} \sum_{i=1}^{n} \sum_{t=3}^{T} p^{k_{1n}}(W_{it-1}, X_{it-1})\, \rho_1(X_{it}, Y_{it}, D_{it}; \alpha). \tag{9}$$
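Eq. (9), and Eq. (10) below for $j = 2, 3$, is an ordinary least-squares projection of a residual on a sieve basis, evaluated at the sample points; note that $\sum_i \sum_t p^{k}(\cdot)\rho_j = H^\top \rho_j$. A minimal sketch, assuming the basis matrix has already been built; `series_projection` is a hypothetical helper name. It solves the least-squares problem with `numpy.linalg.lstsq` instead of forming $(H^\top H)^{-1}$ explicitly, which gives the same fitted values when $H$ has full column rank and is numerically more stable.

```python
import numpy as np

def series_projection(H, rho):
    """Fitted values p(D_it)' (H'H)^{-1} H' rho at every sample point (Eqs. 9-10).

    H   : (N, k) matrix whose rows are the basis vectors at the N sample points
    rho : (N,) residuals rho_j(X_it, Y_it, D_it; alpha) at a trial alpha
    """
    coef, *_ = np.linalg.lstsq(H, rho, rcond=None)   # minimizes ||H c - rho||
    return H @ coef

rng = np.random.default_rng(4)
H = rng.normal(size=(50, 4))
rho = rng.normal(size=50)
m_hat = series_projection(H, rho)
m_explicit = H @ np.linalg.solve(H.T @ H, H.T @ rho)   # literal (H'H)^{-1} H' rho form
```

Because the fitted values are a projection, applying `series_projection` a second time leaves them unchanged.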
For the other conditional moment restrictions, $j = 2, 3$, denote the $k_{jn} \times 1$ vector of approximating functions by $p^{k_{jn}}(D_{it}) = (p_1(D_{it}), \dots, p_{k_{jn}}(D_{it}))^\top$, constructed from known basis functions that can approximate any square-integrable real-valued function of $D_{it}$. A linear consistent sieve estimator $\hat m_j(D_{it}; \alpha)$ is obtained by regressing $\rho_j(X_{it}, Y_{it}, D_{it}; \alpha)$ on $p^{k_{jn}}(D_{it})$:
$$\hat m_j(D_{it}; \alpha) = p^{k_{jn}}(D_{it})^\top (H_j^\top H_j)^{-1} \sum_{i=1}^{n} \sum_{t=3}^{T} p^{k_{jn}}(D_{it})\, \rho_j(X_{it}, Y_{it}, D_{it}; \alpha), \tag{10}$$
where $H_j = (p^{k_{jn}}(D_{12}), \dots, p^{k_{jn}}(D_{nT}))^\top$. It follows that $\hat m(D_{it}; \alpha) \equiv (\hat m_1(W_{it-1}, X_{it-1}; \alpha), \hat m_2(D_{it}; \alpha), \hat m_3(D_{it}; \alpha))^\top$ is a consistent estimator of $m(D_{it}; \alpha)$. Let $\mathcal{A}_n$ be a sequence of approximating sieve spaces for the parameter space $\mathcal{A}$ containing $\alpha_0$. The SMD estimator $\hat\alpha_n$ minimizes the following sample analog of a minimum distance objective function, with the parameters restricted to the sieve spaces $\mathcal{A}_n$:
$$\hat\alpha_n = \arg\min_{\alpha \in \mathcal{A}_n} \frac{1}{n(T-1)} \sum_{i=1}^{n} \sum_{t=3}^{T} \hat m(D_{it}; \alpha)^\top \hat m(D_{it}; \alpha).$$
For simplicity, we use the identity weighting matrix in the sample objective function. Two approximations make the estimator feasible and consistent: $\hat m(D_{it}; \alpha)$ approximates $m(D_{it}; \alpha)$, and $\mathcal{A}_n$ approximates $\mathcal{A}$. This GMM-type estimator was proposed by Ai and Chen (2007) and is called a modified SMD estimator, in contrast to the SMD estimators of Ai and Chen (2003) and Newey and Powell (2003), which are based on conditional moment restriction models with the same conditioning variables in every restriction. Ai and Chen (2007) show that the modified SMD estimator is consistent and that its parametric components have an asymptotically normal limiting distribution under suitable regularity conditions.
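Putting the pieces together, the sample criterion is the average over observations of the squared fitted conditional moments. The sketch below is hypothetical (function and argument names are ours): it takes a function returning the three residual vectors at a trial $\alpha$ and the three basis matrices, projects each residual on its own basis by least squares, and returns the identity-weighted criterion. It divides by the number of stacked observation rows; the $n(T-1)$ normalization only rescales the criterion and does not change the minimizer. In an actual SMD implementation this function would be minimized over the sieve coefficients, for example with a general-purpose numerical optimizer.

```python
import numpy as np

def smd_objective(residual_fn, alpha, bases):
    """Identity-weighted SMD criterion: mean over observations of m_hat' m_hat.

    residual_fn(alpha) -> list of three (N,) arrays (rho_1, rho_2, rho_3)
    bases              -> list of three (N, k_j) basis matrices (H_1, H_2, H_3)
    """
    residuals = residual_fn(alpha)
    total = 0.0
    for H, rho in zip(bases, residuals):
        coef, *_ = np.linalg.lstsq(H, rho, rcond=None)
        m_hat = H @ coef                 # series estimate of E[rho_j | conditioning variables]
        total += np.sum(m_hat ** 2)
    return total / len(residuals[0])

# Toy check: residuals are zero exactly at alpha = 2
rng = np.random.default_rng(5)
z = rng.normal(size=200)
H = np.column_stack([np.ones_like(z), z, z**2])
y = 2.0 * z
toy_residuals = lambda a: [y - a * z] * 3   # same toy residual in all three slots
bases = [H, H, H]
obj_true = smd_objective(toy_residuals, 2.0, bases)
obj_off = smd_objective(toy_residuals, 1.5, bases)
```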
4. Monte Carlo Simulation
This section examines the finite-sample properties of the SMD estimator defined in Section 3 through Monte Carlo simulations. We focus on the estimation of $\beta_0$ and $\lambda_0$, which enter the regression function $m(W_{it}, X^*_{it}, C_i; \beta_0)$ and the CRE specification $C_i = \lambda_{01} \bar W_i + \lambda_{02} Z_i + \eta_i$, respectively. The densities $f_{\tilde V_{it}}(\tilde v)$ and $f_{\tilde\eta_i}(\tilde\eta)$, by contrast, are treated nonparametrically and approximated by a sequence of truncated sieves.
The simulation design follows the DGP below. Let $\mathrm{Trun}(\Phi, [a, b])$ denote the distribution of the random variable $\Phi^{-1}(u \cdot (\Phi(b) - \Phi(a)) + \Phi(a))$, where $\Phi$ is the standard normal CDF, $\Phi^{-1}$ is its inverse, and $u$ is uniform on $[0, 1]$; that is, the standard normal distribution truncated to $[a, b]$. Both $W_{i1}$ and $X^*_{i1}$ are drawn from $\mathrm{Trun}(\Phi, [0, 1])$. The covariates $(W_{it}, X^*_{it})$ for $t = 2, 3$ are generated according to
$$W_{it} = \rho W_{it-1} + U_{W,it-1}, \qquad U_{W,it-1} \sim \mathrm{Trun}(\Phi, [-2, 2]),$$
$$X^*_{it} = \rho X^*_{it-1} + U_{X,it-1}, \qquad U_{X,it-1} \sim \mathrm{Trun}(\Phi, [-2, 2]),$$
where $\rho = 0.8$. The measurement error is specified as
$$X_{it} = X^*_{it} + e_{it}, \qquad e_{it} \sim \mathrm{Trun}(\Phi, [-2, 2]).$$
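Let $\bar W_i = \frac{1}{3}\sum_{t=1}^{3} W_{it}$ and $Z_i \sim \mathrm{Trun}(\Phi, [0, 1])$. The individual effect is then specified as
$$C_i = \lambda_{01} \bar W_i + \lambda_{02} Z_i + \eta_i, \qquad (\lambda_{01}, \lambda_{02}) = (-0.5, 0.5), \quad \eta_i \sim \mathrm{Trun}(\Phi, [-2, 2]).$$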
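The covariate part of this design can be simulated directly from the formulas above. The sketch below uses the Python standard library's `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$ together with NumPy; the seed, function names, and array layout are our own choices, and the regression error $U_{it}$ is not generated because its distribution is not specified above.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()   # .cdf is Phi, .inv_cdf is Phi^{-1}

def trun(a, b, size, rng):
    """Draws from Trun(Phi, [a, b]) = Phi^{-1}(u*(Phi(b) - Phi(a)) + Phi(a)), u ~ U[0, 1)."""
    pa, pb = _STD_NORMAL.cdf(a), _STD_NORMAL.cdf(b)
    p = rng.uniform(0.0, 1.0, size) * (pb - pa) + pa
    flat = np.array([_STD_NORMAL.inv_cdf(q) for q in np.ravel(p)])
    return flat.reshape(np.shape(p))

def simulate_design(n, rho=0.8, lam=(-0.5, 0.5), T=3, seed=0):
    """Covariates, mismeasured regressor, and individual effect of the DGP above."""
    rng = np.random.default_rng(seed)
    W = np.empty((n, T))
    X_star = np.empty((n, T))
    W[:, 0] = trun(0.0, 1.0, n, rng)
    X_star[:, 0] = trun(0.0, 1.0, n, rng)
    for t in range(1, T):
        W[:, t] = rho * W[:, t - 1] + trun(-2.0, 2.0, n, rng)
        X_star[:, t] = rho * X_star[:, t - 1] + trun(-2.0, 2.0, n, rng)
    X = X_star + trun(-2.0, 2.0, (n, T), rng)      # X_it = X*_it + e_it
    W_bar = W.mean(axis=1)                         # time average used in the CRE
    Z = trun(0.0, 1.0, n, rng)
    eta = trun(-2.0, 2.0, n, rng)
    C = lam[0] * W_bar + lam[1] * Z + eta          # C_i = lam01*Wbar_i + lam02*Z_i + eta_i
    return W, X_star, X, W_bar, Z, C

W, X_star, X, W_bar, Z, C = simulate_design(500)
```

The inverse-CDF construction mirrors the definition of $\mathrm{Trun}(\Phi, [a, b])$ term by term, which makes it easy to audit against the text.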
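Set $\beta_0 = (\beta_{00}, \beta_{01}, \beta_{02}) = (0.5, 0.5, -0.5)$. We consider three specifications for the regression function:
Simulation I: $m(W_{it}, X^*_{it}, C_i; \beta_0) = \beta_{00} + \beta_{01} W_{it} + \beta_{02} X^{*2}_{it} + C_i$,
Simulation II: $m(W_{it}, X^*_{it}, C_i; \beta_0) = (\beta_{00} + \beta_{01} W_{it} + \beta_{02} X^*_{it} + C_i)^2$,
Simulation III: $m(W_{it}, X^*_{it}, C_i; \beta_0) = (\beta_{00} + \beta_{01}(1 + C_i) W_{it} + \beta_{02}(1 + C_i) X^*_{it} + C_i)^2$.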
The SMD procedure requires approximating three nonparametric components by sieves: the conditional expectation function $h_t$, $f_{\tilde V_{it}}(\tilde v)$, and $f_{\tilde\eta_i}(\tilde\eta)$. We use a polynomial basis for the sieve approximation of $h_t$:
$$h_t(w, x) = \gamma_0 + \gamma_1 w + \gamma_2 x + \gamma_3 w^2 + \gamma_4 x^2 + \gamma_5 x w.$$
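Let $f_1$ and $f_2$ be the nonparametric series estimators of $f_{\tilde V_{it}}(\tilde v)$ and $f_{\tilde\eta_i}(\tilde\eta)$, respectively. We construct $f_1^{1/2}$ and $f_2^{1/2}$ from univariate Hermite functions,
$$f_1^{1/2}(\tilde v) = \sum_{i=0}^{3} \delta_{1i} H_i(\tilde v), \qquad f_2^{1/2}(\tilde\eta) = \sum_{i=0}^{3} \delta_{2i} H_i(\tilde\eta),$$
where $H_0(x) = e^{-x^2/4}$, $H_1(x) = x e^{-x^2/4}$, $H_2(x) = (x^2 - 1) e^{-x^2/4}$, and $H_3(x) = (x^3 - 3x) e^{-x^2/4}$. The sieve coefficients of both $f_1$ and $f_2$ must satisfy density restrictions. Because the Hermite functions form an orthogonal series satisfying
$$\int_{-\infty}^{\infty} H_n(x) H_m(x)\, dx = \sqrt{2\pi}\, n!\, \delta_{nm},$$
where $\delta_{nm} = 1$ if $n = m$ and $\delta_{nm} = 0$ otherwise, the density restriction on the sieve coefficients is
$$\sqrt{2\pi}\,(\delta_{10}^2 + \delta_{11}^2 + 2!\,\delta_{12}^2 + 3!\,\delta_{13}^2) = 1,$$
and analogously for the coefficients $\delta_{2i}$.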
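The normalization $\sqrt{2\pi}\, n!\, \delta_{nm}$ is the orthogonality relation of the probabilists' Hermite polynomials $He_n$ under the weight $e^{-x^2/2}$, so it holds for the Hermite functions $He_n(x)\, e^{-x^2/4}$, whose pairwise products carry exactly that weight. The sketch below checks this numerically with NumPy's `numpy.polynomial.hermite_e.hermeval` and shows one way (our own choice, with hypothetical helper names) to rescale an arbitrary coefficient vector so that the squared sieve $(f^{1/2})^2$ integrates to one.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_fn(n, x):
    """He_n(x) * exp(-x^2/4); integral of H_n * H_m equals sqrt(2*pi) * n! * delta_nm."""
    coef = np.zeros(n + 1)
    coef[n] = 1.0
    return He.hermeval(x, coef) * np.exp(-x**2 / 4)

x = np.linspace(-20.0, 20.0, 40_001)
dx = x[1] - x[0]
gram = np.array([[np.sum(hermite_fn(i, x) * hermite_fn(j, x)) * dx for j in range(4)]
                 for i in range(4)])
target = np.diag([math.sqrt(2 * math.pi) * math.factorial(n) for n in range(4)])

def normalize(delta):
    """Rescale sieve coefficients so that sqrt(2*pi) * sum_n n! * delta_n^2 = 1."""
    delta = np.asarray(delta, dtype=float)
    scale = math.sqrt(2 * math.pi) * sum(math.factorial(n) * d**2 for n, d in enumerate(delta))
    return delta / math.sqrt(scale)

delta = normalize([1.0, 0.3, -0.2, 0.1])
f = sum(d * hermite_fn(n, x) for n, d in enumerate(delta)) ** 2   # f = (f^{1/2})^2 >= 0
mass = np.sum(f) * dx
```

Squaring the series guarantees nonnegativity, and the single quadratic constraint then enforces unit mass, which is why the restriction takes the simple diagonal form.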
As discussed in Section 3, we use a tensor-product polynomial sieve to approximate each component of the conditional mean function $m(D_{it}; \alpha)$; the basis terms serve as the instruments. For the first conditional moment restriction, the instruments in $p^{k_{1n}}(W_{it-1}, X_{it-1})$ are $\{1, W_{it-1}, X_{it-1}, W_{it-1}^2, W_{it-1} X_{it-1}, X_{it-1}^2\}$. For the second and third conditional moment restrictions, the instruments in $p^{k_{2n}}(D_{it})$ and $p^{k_{3n}}(D_{it})$ are
$$\{1, W_{it}, W_{it-1}, X_{it-1}, \bar W_i, Z_i, W_{it}^2, W_{it} W_{it-1}, W_{it} X_{it-1}, W_{it} \bar W_i, W_{it} Z_i, W_{it-1}^2, W_{it-1} X_{it-1}, W_{it-1} \bar W_i, W_{it-1} Z_i, X_{it-1}^2, X_{it-1} \bar W_i, X_{it-1} Z_i, \bar W_i^2, \bar W_i Z_i, Z_i^2\}.$$
The total number of instruments is $\sum_{j=1}^{3} k_{jn} = 6 + 2 \times 21 = 48$.
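The count of 21 instruments for the second and third restrictions is the number of monomials of degree at most two in the five conditioning variables $(W_{it}, W_{it-1}, X_{it-1}, \bar W_i, Z_i)$, that is $\binom{5+2}{2} = 21$; the first restriction uses the six monomials of degree at most two in $(W_{it-1}, X_{it-1})$. A short sketch that builds such a basis (the function name and simulated inputs are ours) confirms the total.

```python
import itertools
import numpy as np

def quad_basis(*cols):
    """Columns: 1, each variable, and all pairwise products including squares."""
    cols = [np.asarray(c, dtype=float) for c in cols]
    terms = [np.ones_like(cols[0])] + cols
    terms += [a * b for a, b in itertools.combinations_with_replacement(cols, 2)]
    return np.column_stack(terms)

rng = np.random.default_rng(6)
w_t, w_lag, x_lag, wbar, z = rng.normal(size=(5, 100))   # placeholder data
H1 = quad_basis(w_lag, x_lag)                   # first restriction
H2 = quad_basis(w_t, w_lag, x_lag, wbar, z)     # second (and third) restriction
total = H1.shape[1] + 2 * H2.shape[1]
```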
For each of the three data generating processes, corresponding to the different regression functions $m(\cdot)$, we draw 150 replications with 500 and 1000 observations. The simulation results in Tables 1-2 show that the proposed SMD estimator performs well in these samples. The mean estimates are almost the same as the median estimates across sample sizes and simulation designs, suggesting little skewness in the sampling distributions. For each estimated coefficient, the RMSE declines as the sample size increases, as expected. We can further use Eq. (7), with the estimated $\lambda$ and the observed $\bar w_i$, to recover the distribution of the individual effect $f_{C_i}(\cdot)$, and then compute the APEs from Eq. (8). Tables 3-4 report the mean, standard deviation (SD), and RMSE of the APE estimates. All estimates are nearly unbiased, and the APE estimator performs best in DGP II. The RMSE generally declines as the sample size increases.
5. Empirical Study
In this section, we apply the proposed nonlinear panel data model to investigate the effect of individuals' hourly wage rates on their labor supply, given their demographic characteristics. The dependent variable is the log of annual hours of work for those with positive working hours, and the variable of interest is the hourly wage rate. Measurement error can be a significant problem for the hourly wage rate in survey data. Our model allows for measurement error in the hourly wage rate and provides consistent estimates of the effects of interest. It uses the correlated random effect to control for unobserved time-invariant factors, such as an individual's unobserved skill level, ability, or motivation, which may be correlated with the hourly wage rate.5 The data are from Ziliak (1997), Waves XIX-XXI of the PSID. Table 5 presents summary statistics for working hours, the hourly wage rate, and socioeconomic variables. The between and within sample standard deviations are 0.233 and 0.172 for ln(hours) and 0.432 and 0.118 for ln(wage), respectively. We use a three-period panel with a cross-section of 532 men.
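We consider the following empirical model for labor supply:
$$\ln(\text{hours}_{it}) = \beta_0 + \beta_1 (1 + c_i) \ln(\text{wage}_{it}) + \beta_2 \text{kids}_{it} + \beta_3 \text{age}_{it} + \beta_4 \text{age}_{it}^2 + \beta_5 \text{disab}_{it} + \beta_6 t + c_i + u_{it}.$$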
This specification allows interactions between observables and unobservables through the term $\beta_1(1 + c_i)$; working in differences does not eliminate this effect. Unlike many studies in the literature (e.g., MaCurdy, 1981), the growth of individual hours is allowed to proceed heterogeneously. The variable $c_i$ represents unmeasured ability or motivation factors that affect hours of work, while $u_{it}$ may contain time-varying unobserved macro shocks. Because individuals may misreport their wage rates, $\ln(\text{wage}_{it})$ is likely to be measured with error.6 The vector of time-varying covariates is $(\text{kids}_{it}, \text{age}_{it}, \text{age}_{it}^2, \text{disab}_{it})^\top$, and the time averages of these variables are used in the CRE specification for this estimation of the labor supply elasticity. Standard labor supply theory implies that a wage increase has two effects on labor supply: an income effect and a substitution effect. The income effect reduces hours of work, whereas the substitution effect increases them. Because the two effects work in opposite directions, the overall effect of a wage increase on labor supply is theoretically ambiguous.
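Because $c_i$ multiplies $\ln(\text{wage}_{it})$, the wage elasticity in this specification is individual-specific. Differentiating the labor supply equation and averaging over the distribution of $c_i$, as in Eq. (8), gives the following direct calculation, stated here for reference:

$$\frac{\partial \ln(\text{hours}_{it})}{\partial \ln(\text{wage}_{it})} = \beta_1 (1 + c_i), \qquad \mathrm{APE} = \int \beta_1 (1 + c)\, f_{C_i}(c)\, dc = \beta_1 \bigl(1 + E[C_i]\bigr).$$

Since the partial effect is linear in $c_i$, the APE depends on $f_{C_i}$ only through its mean, and it differs from the coefficient $\beta_1$ whenever $E[C_i] \neq 0$.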
The identification assumptions in Section 2 must hold for the proposed sieve GMM estimator to apply. We now discuss these assumptions in the context of this application. Assumption 2.1 models the individual effect by replacing it with its linear projection onto the time averages of the explanatory variables, which allows $C_i$ to be correlated with $W_{it}$ and $X^*_{it}$. If $C_i$ represents the willingness to work long hours, the specification implies that it depends on the average number of kids, age, and the average disability status. Assumption 2.2 relies on the validity of past variables as IVs. It addresses the endogeneity bias caused by measurement error by exploiting the fact that the measurement error in period $t$ is uncorrelated with past explanatory variables, while those past variables may be related to the current hourly wage rate. Assumptions 2.3-2.5 are technical conditions ensuring that the Fourier transforms or characteristic functions of the relevant conditional mean functions and density functions are well defined for the algebraic manipulations. Assumption 2.6 requires a nonsingular parametric structure around the population parameters.

5Borjas (2013) reviews the literature on the estimation of the labor supply elasticity and also discusses the problems caused by measurement error.
6See the detailed discussion in Bound, Brown, and Mathiowetz (2001).
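Three comparable estimators are based on the same regression model but without the factor $(1 + c_i)$ multiplying the wage, i.e.,
$$\ln(\text{hours}_{it}) = \beta_0 + \beta_1 \ln(\text{wage}_{it}) + \beta_2 \text{kids}_{it} + \beta_3 \text{age}_{it} + \beta_4 \text{age}^2_{it} + \beta_5 \text{disab}_{it} + \beta_6 t + c_i + u_{it}.$$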
The first estimator (Linear Fixed Effect) applies the within transformation to remove the individual effect $C_i$. The second estimator (First Differencing IV) first-differences the equation and then estimates the parameters using past variables as instruments: for the contemporaneous period we use $(\text{kids}_{it}, \text{age}_{it}, \text{age}^2_{it}, \text{disab}_{it})^\top$ from periods $t-1$ and $t-2$ and $\ln(\text{wage}_{it})$ from period $t-2$. The third estimator (Linear Correlated Random Effects) estimates the parameters using the CRE specification in Assumption 2.1.
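The first two linear estimators can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code behind Table 6: the function names `within_fe` and `fd_iv` are hypothetical, the simulated data are invented for the example, and the single instrument (the period-1 level of the observed regressor, i.e. dated $t-2$ for the differenced period-3 equation) is a simplified stand-in for the instrument set described above.

```python
import numpy as np


def within_fe(y, X):
    """Fixed-effect slopes via the within transformation.

    y: (n, T) outcomes; X: (n, T, k) time-varying regressors.
    Subtracting each unit's time mean removes any additive individual effect c_i.
    """
    n, T, k = X.shape
    y_dm = y - y.mean(axis=1, keepdims=True)
    X_dm = X - X.mean(axis=1, keepdims=True)
    beta, *_ = np.linalg.lstsq(X_dm.reshape(n * T, k), y_dm.reshape(n * T), rcond=None)
    return beta


def fd_iv(dy, dX, Z):
    """Two-stage least squares on first-differenced data.

    dy: (m,) differenced outcome; dX: (m, k) differenced regressors;
    Z: (m, l) instruments with l >= k, e.g. lagged levels dated t-2.
    """
    # First stage: project the differenced regressors onto the instruments.
    pi, *_ = np.linalg.lstsq(Z, dX, rcond=None)
    dX_hat = Z @ pi
    # Second stage: regress the differenced outcome on the fitted regressors.
    beta, *_ = np.linalg.lstsq(dX_hat, dy, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, beta_true = 2000, 0.5
    c = rng.normal(size=n)                          # individual effect
    x_star = np.empty((n, 3))                       # true regressor, three periods
    x_star[:, 0] = c + rng.normal(size=n)           # correlated with c
    for t in (1, 2):
        x_star[:, t] = 0.5 * x_star[:, t - 1] + rng.normal(size=n)
    y = beta_true * x_star + c[:, None] + 0.1 * rng.normal(size=(n, 3))
    x_obs = x_star + 0.5 * rng.normal(size=(n, 3))  # classical measurement error

    print("within FE:", within_fe(y, x_obs[:, :, None]))
    dy = y[:, 2] - y[:, 1]
    dX = (x_obs[:, 2] - x_obs[:, 1])[:, None]
    Z = np.column_stack([np.ones(n), x_obs[:, 0]])  # period-1 level as instrument
    print("FD-IV:", fd_iv(dy, dX, Z))
```

In this invented design the within estimate is attenuated by the measurement error, while the FD-IV estimate should stay close to 0.5, because the period-1 measurement error is independent of the differenced errors; the exact numbers depend on the seed.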
Table 6 reports the estimates obtained with our sieve GMM method and with the three linear estimators. The estimated elasticity coefficients are similar across the models, except for the linear fixed effect estimate, which is negative: the coefficients on $\ln(\text{wage})$ are -0.094, 0.049, 0.041, and 0.033, respectively. In terms of the APE, however, the elasticity estimate in our semi-parametric nonlinear panel data model (0.097) is roughly two to two and a half times the estimates in the linear first differencing IV model (0.049) and the linear correlated random effects model (0.041). Since both hours and the wage enter in logarithms, a 1% increase in the wage is associated with an increase of approximately 0.097% in working hours. Given the flexible nature of our estimation approach, this difference suggests that the estimates from the other linear models may be biased downward when the measurement error problem is not accounted for. As for the sign of the labor supply elasticity, all estimates are positive except for the fixed effect estimate, which indicates that the number of hours worked increases with the wage, i.e., the substitution effect is stronger than the income effect.
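The APE formula in the note to Table 6, $\beta_1 \int_C (1 + c) f_{C_i}(c)\,dc = \beta_1 E[1 + C_i]$, can be evaluated either from draws of $C_i$ or from a density on a grid. A minimal sketch under those two input assumptions (the function names are hypothetical; in the paper the inputs would come from the estimated CRE specification). As a rough arithmetic check on Table 6, $\beta_1 = 0.033$ and APE $= 0.097$ imply $E[1 + C_i] \approx 0.097/0.033 \approx 2.9$, up to rounding.

```python
import numpy as np


def ape_from_draws(beta1, c_draws):
    """APE of ln(wage) when its coefficient is beta1 * (1 + c_i), i.e. beta1 * E[1 + C_i].

    The expectation is approximated by a sample average over draws of C_i.
    """
    c_draws = np.asarray(c_draws, dtype=float)
    return beta1 * np.mean(1.0 + c_draws)


def ape_from_density(beta1, c_grid, f_c):
    """Same quantity when f_C is known on an increasing grid (trapezoid rule)."""
    c_grid = np.asarray(c_grid, dtype=float)
    g = (1.0 + c_grid) * np.asarray(f_c, dtype=float)
    return beta1 * np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(c_grid))


print(ape_from_draws(0.5, [0.0, 1.0, 2.0]))                                  # 0.5 * mean(1, 2, 3)
print(ape_from_density(2.0, np.linspace(0.0, 1.0, 11), np.ones(11)))          # C ~ Uniform(0, 1)
```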
Figure 1 shows that the estimated distribution of the error $\eta$ in the CRE specification is asymmetric. It is also bimodal, with two peaks, which suggests that the error comes from two different groups: a large mass of values near zero and another concentration of values above zero.
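Whether an estimated density has one peak or two can be checked mechanically once it is evaluated on a grid. The sketch below counts interior local maxima; the function names, the relative threshold used to ignore tiny numerical bumps, and the two-component normal mixture in the usage lines are illustrative assumptions, not the estimated density in Figure 1.

```python
import numpy as np


def count_modes(density_values, rel_tol=1e-3):
    """Count interior local maxima of a density evaluated on an increasing grid.

    Peaks lower than rel_tol * max(density) are ignored as numerical noise.
    """
    f = np.asarray(density_values, dtype=float)
    is_peak = (f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])
    return int(np.sum(f[1:-1][is_peak] > rel_tol * f.max()))


def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


grid = np.linspace(-5.0, 5.0, 1001)
mixture = 0.5 * normal_pdf(grid, 0.0, 0.3) + 0.5 * normal_pdf(grid, 2.0, 0.3)
print(count_modes(mixture), count_modes(normal_pdf(grid, 0.0, 1.0)))  # prints: 2 1
```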
6. Conclusion
This paper presents the semi-parametric identification and estimation of nonlinear panel data models with mismeasured variables, together with their corresponding average partial effects, using only three periods of data. The approach applies in settings without external information such as a validation or replicate data set. This study was motivated by the richer structure of panel data. We have shown how to use past observables as instruments to identify the nonlinear regression model in the presence of measurement error, while applying the correlated random effects specification to control for the unobserved individual heterogeneity.
In simulation experiments we showed that the sieve GMM estimators perform well for both linear and nonlinear panel models with measurement errors. In the application we found that the substitution effect is stronger than the income effect, and that a 1% increase in the wage is associated with an increase of approximately 0.1% in working hours.
Appendix
A. Identification Results
The proof of Lemma 2.1: Because $W_{it}$ and $X^*_{it}$ are both scalars, we can write $C_i = \lambda_0 \bar{W}_i + \eta_i$. Combining Assumptions 2.2(i) and (ii) yields
$$X_{it} = h_t(G_{i,<t}) + V_{it} + e_{it}. \tag{A.1}$$
Taking the conditional expectation with respect to $G_{i,<t}$ and applying the zero conditional mean of $V_{it}$ and $e_{it}$ implies
$$E[X_{it} \mid G_{i,<t}] = h_t(G_{i,<t}) \equiv \tilde{G}_{i,<t}. \tag{A.2}$$
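Eq. (A.2) says that $\tilde{G}_{i,<t}$ is the conditional mean of the observed regressor given past observables, so in practice it has to be estimated. One common way to approximate such a conditional mean is a polynomial series (sieve) regression. The sketch below assumes a scalar past observable and a fixed polynomial degree, both simplifications chosen for illustration rather than the paper's sieve specification; the function name and the simulated design are hypothetical.

```python
import numpy as np


def series_conditional_mean(x, g, degree=3):
    """Least-squares polynomial-series estimate of h_t(g) = E[X_it | G_i,<t = g].

    x: (n,) observed (mismeasured) regressor; g: (n,) scalar past observable.
    Returns the fitted values, i.e. an estimate of G~_i,<t for each unit.
    """
    basis = np.vander(np.asarray(g, dtype=float), degree + 1, increasing=True)  # 1, g, g^2, ...
    coef, *_ = np.linalg.lstsq(basis, np.asarray(x, dtype=float), rcond=None)
    return basis @ coef


rng = np.random.default_rng(0)
g = rng.uniform(-1.0, 1.0, size=1000)
x_star = np.sin(g) + rng.normal(scale=0.3, size=1000)   # X*_it = h_t(G) + V_it
x_obs = x_star + rng.normal(scale=0.5, size=1000)       # X_it = X*_it + e_it, as in (A.1)
g_tilde = series_conditional_mean(x_obs, g)
```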
Rewrite the measurement error equation and the correlated random effects specification as follows:
$$X^*_{it} = \tilde{G}_{i,<t} - \tilde{V}_{it}, \qquad C_i = \lambda_0 \bar{W}_i - \tilde{\eta}_i. \tag{A.3}$$
Use the relations in Eq. (A.3) to write
$$Y_{it} = m\big(W_{it}, \tilde{G}_{i,<t} - \tilde{V}_{it}, \lambda_0 \bar{W}_i - \tilde{\eta}_i; \beta_0\big) + U_{it}. \tag{A.4}$$
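Then, using the conditional mean independence of $U_{it}$ in Assumption 2.2(iii) and the independence of $\tilde{V}_{it}$ and $\tilde{\eta}_i$ in Assumption 2.2(iv), we obtain
$$E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i] = \int\!\!\int m\big(w_{it}, \tilde{g}_{i,<t} - \tilde{v}_{it}, \lambda_0 \bar{w}_i - \tilde{\eta}_i; \beta_0\big)\, f_{\tilde{V}_{it}}(\tilde{v}_{it})\, f_{\tilde{\eta}_i}(\tilde{\eta}_i)\, d\tilde{v}_{it}\, d\tilde{\eta}_i. \tag{A.5}$$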
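The double integral in (A.5) averages the regression function over the distributions of $\tilde{V}_{it}$ and $\tilde{\eta}_i$. Purely for illustration, assume both are normal with known standard deviations; then Gauss-Hermite quadrature evaluates (A.5) for any user-supplied regression function. The normality, the function name `conditional_mean_y`, and the example function `m_example` are assumptions of this sketch; the paper leaves these distributions unrestricted and identifies them.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def conditional_mean_y(m, w_it, g_tilde, w_bar, lam, beta, sd_v, sd_eta, n_nodes=20):
    """Evaluate the right-hand side of (A.5) by tensor-product Gauss-Hermite quadrature.

    Illustrative assumption: V~_it ~ N(0, sd_v^2) and eta~_i ~ N(0, sd_eta^2), independent.
    m(w, x_star, c, beta) must broadcast over NumPy arrays.
    """
    nodes, weights = hermegauss(n_nodes)          # rule for the weight exp(-z^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)      # rescale into a N(0, 1) rule (weights sum to 1)
    v = sd_v * nodes[:, None]                     # quadrature points for V~ along axis 0
    eta = sd_eta * nodes[None, :]                 # quadrature points for eta~ along axis 1
    values = m(w_it, g_tilde - v, lam * w_bar - eta, beta)
    return float(np.sum(weights[:, None] * weights[None, :] * values))


def m_example(w, x_star, c, beta):
    """Hypothetical regression function with an interaction: b0 + b1*w + b2*(1 + c)*x_star."""
    return beta[0] + beta[1] * w + beta[2] * (1.0 + c) * x_star


print(conditional_mean_y(m_example, 1.0, 0.3, 2.0, 0.5, (1.0, 2.0, 3.0), 0.7, 1.2))
```

For this bilinear example the answer is available in closed form, $b_0 + b_1 w + b_2 (1 + \lambda \bar{w}) \tilde{g}$, because $\tilde{V}_{it}$ and $\tilde{\eta}_i$ are independent with mean zero; the quadrature reproduces it exactly up to rounding.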
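Expanding out the term $X_{it}Y_{it}$ and taking the conditional expectation with respect to $(w_{it}, \tilde{g}_{i,<t}, \bar{w}_i)$ results in
$$\begin{aligned}
E[X_{it}Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i]
&= E\big[(\tilde{G}_{i,<t} - \tilde{V}_{it})\, m(W_{it}, \tilde{G}_{i,<t} - \tilde{V}_{it}, \lambda_0 \bar{W}_i - \tilde{\eta}_i; \beta_0) \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i\big] \\
&\quad + E\big[\Delta X_{it}\, m(W_{it}, \tilde{G}_{i,<t} - \tilde{V}_{it}, \lambda_0 \bar{W}_i - \tilde{\eta}_i; \beta_0) \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i\big] \\
&\quad + E\big[(\tilde{G}_{i,<t} - \tilde{V}_{it})\, U_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i\big] + E\big[\Delta X_{it} U_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i\big] \\
&= E\big[(\tilde{G}_{i,<t} - \tilde{V}_{it})\, m(W_{it}, \tilde{G}_{i,<t} - \tilde{V}_{it}, \lambda_0 \bar{W}_i - \tilde{\eta}_i; \beta_0) \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i\big] \\
&= \int\!\!\int (\tilde{g}_{i,<t} - \tilde{v}_{it})\, m(w_{it}, \tilde{g}_{i,<t} - \tilde{v}_{it}, \lambda_0 \bar{w}_i - \tilde{\eta}_i; \beta_0)\, f_{\tilde{V}_{it}}(\tilde{v}_{it})\, f_{\tilde{\eta}_i}(\tilde{\eta}_i)\, d\tilde{v}_{it}\, d\tilde{\eta}_i,
\end{aligned} \tag{A.6}$$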
where we have used the zero conditional mean of $\Delta X_{it}$ in Assumption 2.2(ii), the zero conditional mean of $U_{it}$ in Assumption 2.2(iii), and the law of iterated expectations.
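Given $w_{it}$, taking the Fourier transform of both sides of Eqs. (A.5) and (A.6) with respect to $\tilde{G}_{i,<t}$ and $\bar{W}_i$, we have
$$\begin{aligned}
F_y(w_{it}, \xi_1, \xi_2)
&= \int\!\!\int E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i]\, e^{i\xi_1 \tilde{g}_{i,<t}} e^{i\xi_2 \bar{w}_i}\, d\tilde{g}_{i,<t}\, d\bar{w}_i \\
&= \int\!\!\int \Big( \int\!\!\int m(w_{it}, \tilde{g}_{i,<t} - \tilde{v}_{it}, \lambda_0 \bar{w}_i - \tilde{\eta}_i; \beta_0)\, f_{\tilde{V}_{it}}(\tilde{v}_{it})\, f_{\tilde{\eta}_i}(\tilde{\eta}_i)\, d\tilde{v}_{it}\, d\tilde{\eta}_i \Big)\, e^{i\xi_1 \tilde{g}_{i,<t}} e^{i\xi_2 \bar{w}_i}\, d\tilde{g}_{i,<t}\, d\bar{w}_i \\
&= \frac{1}{\lambda_0} \Big( \int\!\!\int m(w_{it}, x^*_{it}, c_i; \beta_0)\, e^{i\xi_1 x^*_{it}} e^{i\xi_2 c_i/\lambda_0}\, dx^*_{it}\, dc_i \Big) \Big( \int e^{i\xi_1 \tilde{v}_{it}} f_{\tilde{V}_{it}}(\tilde{v}_{it})\, d\tilde{v}_{it} \Big) \Big( \int e^{i\xi_2 \tilde{\eta}_i/\lambda_0} f_{\tilde{\eta}_i}(\tilde{\eta}_i)\, d\tilde{\eta}_i \Big) \\
&= \frac{1}{\lambda_0} F_m\Big(w_{it}, \xi_1, \frac{\xi_2}{\lambda_0}\Big)\, \phi_v(\xi_1)\, \phi_\eta\Big(\frac{\xi_2}{\lambda_0}\Big),
\end{aligned}$$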
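and
$$\begin{aligned}
F_{xy}(w_{it}, \xi_1, \xi_2)
&= \int\!\!\int E[X_{it}Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i]\, e^{i\xi_1 \tilde{g}_{i,<t}} e^{i\xi_2 \bar{w}_i}\, d\tilde{g}_{i,<t}\, d\bar{w}_i \\
&= \int\!\!\int \Big( \int\!\!\int (\tilde{g}_{i,<t} - \tilde{v}_{it})\, m(w_{it}, \tilde{g}_{i,<t} - \tilde{v}_{it}, \lambda_0 \bar{w}_i - \tilde{\eta}_i; \beta_0)\, f_{\tilde{V}_{it}}(\tilde{v}_{it})\, f_{\tilde{\eta}_i}(\tilde{\eta}_i)\, d\tilde{v}_{it}\, d\tilde{\eta}_i \Big)\, e^{i\xi_1 \tilde{g}_{i,<t}} e^{i\xi_2 \bar{w}_i}\, d\tilde{g}_{i,<t}\, d\bar{w}_i \\
&= \frac{1}{\lambda_0} \Big( \int\!\!\int x^*_{it}\, m(w_{it}, x^*_{it}, c_i; \beta_0)\, e^{i\xi_1 x^*_{it}} e^{i\xi_2 c_i/\lambda_0}\, dx^*_{it}\, dc_i \Big) \Big( \int e^{i\xi_1 \tilde{v}_{it}} f_{\tilde{V}_{it}}(\tilde{v}_{it})\, d\tilde{v}_{it} \Big) \Big( \int e^{i\xi_2 \tilde{\eta}_i/\lambda_0} f_{\tilde{\eta}_i}(\tilde{\eta}_i)\, d\tilde{\eta}_i \Big) \\
&= \frac{1}{\lambda_0} (-i)\, \frac{\partial F_m(w_{it}, \xi_1, \xi_2/\lambda_0)}{\partial \xi_1}\, \phi_v(\xi_1)\, \phi_\eta\Big(\frac{\xi_2}{\lambda_0}\Big).
\end{aligned}$$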
This yields Eqs. (5) and (6). Q.E.D.
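Two steps in the derivation of Eqs. (5) and (6) are used without comment. The first is the change of variables $x^*_{it} = \tilde{g}_{i,<t} - \tilde{v}_{it}$ and $c_i = \lambda_0 \bar{w}_i - \tilde{\eta}_i$ for fixed $(\tilde{v}_{it}, \tilde{\eta}_i)$, whose Jacobian produces the factor $1/\lambda_0$ (this presumes $\lambda_0 > 0$, as the formula does; for $\lambda_0 < 0$ the factor would be $1/|\lambda_0|$). The second is the derivative identity that turns the extra factor $x^*_{it}$ in $F_{xy}$ into $-i\,\partial/\partial\xi_1$:

$$\begin{aligned}
d\tilde{g}_{i,<t}\, d\bar{w}_i &= \frac{1}{\lambda_0}\, dx^*_{it}\, dc_i, \qquad
e^{i\xi_1 \tilde{g}_{i,<t}} e^{i\xi_2 \bar{w}_i} = e^{i\xi_1 x^*_{it}} e^{i\xi_2 c_i/\lambda_0} \cdot e^{i\xi_1 \tilde{v}_{it}} e^{i\xi_2 \tilde{\eta}_i/\lambda_0}, \\
\int x^*_{it}\, m(w_{it}, x^*_{it}, c_i; \beta_0)\, e^{i\xi_1 x^*_{it}}\, dx^*_{it}
&= -i\, \frac{\partial}{\partial \xi_1} \int m(w_{it}, x^*_{it}, c_i; \beta_0)\, e^{i\xi_1 x^*_{it}}\, dx^*_{it}.
\end{aligned}$$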
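The proof of Theorem 2.1: We first recover $f_{\tilde{V}_{it}}(\tilde{v})$. Differentiating the definition of $F_y(w_{it}, \xi_1, \xi_2)$ in Eq. (2) with respect to $\xi_1$ yields
$$\begin{aligned}
\frac{\partial}{\partial \xi_1} F_y(w_{it}, \xi_1, \xi_2)
&= \frac{\partial}{\partial \xi_1} \int\!\!\int E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i]\, e^{i\xi_1 \tilde{g}_{i,<t}} e^{i\xi_2 \bar{w}_i}\, d\tilde{g}_{i,<t}\, d\bar{w}_i \\
&= i \int\!\!\int E[\tilde{G}_{i,<t} Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i]\, e^{i\xi_1 \tilde{g}_{i,<t}} e^{i\xi_2 \bar{w}_i}\, d\tilde{g}_{i,<t}\, d\bar{w}_i.
\end{aligned} \tag{A.7}$$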
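The second equality in (A.7) differentiates under the integral sign, using $\partial_{\xi_1} e^{i\xi_1 \tilde{g}} = i\tilde{g}\, e^{i\xi_1 \tilde{g}}$. The identity can be checked numerically for any integrable stand-in for the conditional expectation. The sketch below uses a standard normal density as that stand-in, an assumption made only so that the ordinary integrals converge; the helper name `fourier` is hypothetical.

```python
import numpy as np


def fourier(values, grid, xi):
    """Trapezoid approximation of the integral of values(g) * exp(i * xi * g) over the grid."""
    integrand = values * np.exp(1j * xi * grid)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))


g = np.linspace(-12.0, 12.0, 4001)
f = np.exp(-0.5 * g ** 2) / np.sqrt(2.0 * np.pi)  # integrable stand-in for E[Y_it | w_it, g~, w_bar]
xi, h = 0.8, 1e-5

lhs = (fourier(f, g, xi + h) - fourier(f, g, xi - h)) / (2.0 * h)  # d/dxi of the transform
rhs = 1j * fourier(g * f, g, xi)                                    # i * transform of g * f
exact = -xi * np.exp(-0.5 * xi ** 2)                                # closed form for this stand-in
print(abs(lhs - rhs), abs(rhs - exact))
```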
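Notice that Eq. (6) can be written as $\frac{1}{\lambda_0} \frac{\partial F_m(w_{it}, \xi_1, \xi_2/\lambda_0)}{\partial \xi_1}\, \phi_v(\xi_1)\, \phi_\eta(\xi_2/\lambda_0) = i F_{xy}(w_{it}, \xi_1, \xi_2)$. On the other hand, differentiating Eq. (5) with respect to $\xi_1$, we obtain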
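$$\begin{aligned}
\frac{\partial}{\partial \xi_1} F_y(w_{it}, \xi_1, \xi_2)
&= \frac{1}{\lambda_0} \Big[ \frac{\partial F_m(w_{it}, \xi_1, \xi_2/\lambda_0)}{\partial \xi_1}\, \phi_v(\xi_1) + F_m\Big(w_{it}, \xi_1, \frac{\xi_2}{\lambda_0}\Big) \frac{\partial \phi_v(\xi_1)}{\partial \xi_1} \Big] \phi_\eta\Big(\frac{\xi_2}{\lambda_0}\Big) \\
&= i F_{xy}(w_{it}, \xi_1, \xi_2) + \frac{1}{\lambda_0} F_m\Big(w_{it}, \xi_1, \frac{\xi_2}{\lambda_0}\Big) \frac{\partial \phi_v(\xi_1)}{\partial \xi_1}\, \phi_\eta\Big(\frac{\xi_2}{\lambda_0}\Big) \\
&= i \int\!\!\int E[X_{it}Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i]\, e^{i\xi_1 \tilde{g}_{i,<t}} e^{i\xi_2 \bar{w}_i}\, d\tilde{g}_{i,<t}\, d\bar{w}_i + \frac{1}{\lambda_0} F_m\Big(w_{it}, \xi_1, \frac{\xi_2}{\lambda_0}\Big) \frac{\partial \phi_v(\xi_1)}{\partial \xi_1}\, \phi_\eta\Big(\frac{\xi_2}{\lambda_0}\Big).
\end{aligned} \tag{A.8}$$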
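Combining Eqs. (A.7) and (A.8) yields
$$\begin{aligned}
i F_{(\tilde{g}-x)y}(w_{it}, \xi_1, \xi_2)
&\equiv i \int\!\!\int E[(\tilde{G}_{i,<t} - X_{it}) Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i]\, e^{i\xi_1 \tilde{g}_{i,<t}} e^{i\xi_2 \bar{w}_i}\, d\tilde{g}_{i,<t}\, d\bar{w}_i \\
&= \frac{1}{\lambda_0} F_m\Big(w_{it}, \xi_1, \frac{\xi_2}{\lambda_0}\Big) \frac{\partial \phi_v(\xi_1)}{\partial \xi_1}\, \phi_\eta\Big(\frac{\xi_2}{\lambda_0}\Big).
\end{aligned} \tag{A.9}$$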
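Because $\phi_v(\xi_1)$, $\phi_\eta(\xi_2)$, and $F_m(w_{it}, \xi_1, \xi_2; \beta_0)$ are all nonzero by Assumptions 2.4(ii) and 2.5, we can divide each side of Eq. (A.9) by the corresponding side of Eq. (5) to obtain
$$-i F_{(\tilde{g}-x)y}(w_{it}, \xi_1, \xi_2) + \frac{\partial \phi_v(\xi_1)/\partial \xi_1}{\phi_v(\xi_1)}\, F_y(w_{it}, \xi_1, \xi_2) = 0. \tag{A.10}$$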
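By Theorem 1(b) in Zinde-Walsh (2014), there exists a unique function $Q(\xi_1) \equiv \frac{\partial \phi_v(\xi_1)/\partial \xi_1}{\phi_v(\xi_1)}$ such that
$$-i F_{(\tilde{g}-x)y}(w_{it}, \xi_1, \xi_2) + Q(\xi_1)\, F_y(w_{it}, \xi_1, \xi_2) = 0. \tag{A.11}$$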
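Because $Q(\xi_1) = \partial \ln \phi_v(\xi_1)/\partial \xi_1$, integrating from 0 to $\xi_1$ with the boundary condition $\phi_v(0) = \int f_{\tilde{V}_{it}}(\tilde{v}_{it})\, d\tilde{v}_{it} = 1$ yields
$$\phi_v(\xi_1) = \exp\Big( \int_0^{\xi_1} Q(\xi)\, d\xi \Big).$$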
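Numerically, this step is a cumulative quadrature of $Q$ followed by exponentiation. In the paper $Q(\xi_1)$ would be computed from the Fourier transforms in Eq. (A.11); the sketch below instead plugs in closed-form $Q$ functions for a normal and a Laplace $\tilde{V}_{it}$, chosen only because their characteristic functions are known, so that the recovery can be checked. The function name `phi_from_Q` is hypothetical.

```python
import numpy as np


def phi_from_Q(Q_values, xi_grid):
    """Recover phi_v on xi_grid from Q = phi_v' / phi_v via phi_v(xi) = exp(int_0^xi Q).

    xi_grid must be monotone and start at 0, where the boundary condition phi_v(0) = 1 holds.
    The integral is computed with a cumulative trapezoid rule.
    """
    Q_values = np.asarray(Q_values, dtype=complex)
    steps = 0.5 * (Q_values[1:] + Q_values[:-1]) * np.diff(xi_grid)
    return np.exp(np.concatenate(([0.0], np.cumsum(steps))))


xi = np.linspace(0.0, 3.0, 30001)
sigma, b = 0.7, 1.0
phi_normal = phi_from_Q(-sigma ** 2 * xi, xi)                                 # V~ ~ N(0, sigma^2)
phi_laplace = phi_from_Q(-2.0 * b ** 2 * xi / (1.0 + b ** 2 * xi ** 2), xi)  # V~ ~ Laplace(0, b)
print(np.max(np.abs(phi_normal - np.exp(-0.5 * sigma ** 2 * xi ** 2))),
      np.max(np.abs(phi_laplace - 1.0 / (1.0 + b ** 2 * xi ** 2))))
```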
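Because $Q(\xi_1) = i F_{(\tilde{g}-x)y}(w_{it}, \xi_1, \xi_2)/F_y(w_{it}, \xi_1, \xi_2)$ by Eq. (A.11), and the right-hand side involves only Fourier transforms of observable conditional expectations, $\phi_v(\xi_1)$ is identified. It follows that the distribution $f_{\tilde{V}_{it}}(\tilde{v}_{it})$ is identified. Replacing $\xi_2$ by $\lambda_0 \xi_2$ in Eq. (5) and rearranging the terms, we have
$$\lambda_0 F_y(w_{it}, \xi_1, \lambda_0 \xi_2) = F_m(w_{it}, \xi_1, \xi_2; \beta_0)\, \phi_v(\xi_1)\, \phi_\eta(\xi_2). \tag{A.12}$$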
Solving the above equation for $\phi_\eta(\xi_2)$ yields
$$\phi_\eta(\xi_2) = \frac{\lambda_0 F_y(w_{it}, \xi_1, \lambda_0 \xi_2)}{F_m(w_{it}, \xi_1, \xi_2; \beta_0)\, \phi_v(\xi_1)}. \tag{A.13}$$
Because $F_y(w_{it}, \xi_1, \xi_2)$ is known from the data, $F_m(w_{it}, \xi_1, \xi_2; \beta)$ is known up to $\beta$ from the proposed semi-parametric regression function, and $\phi_v(\xi_1)$ is identified, we can generalize the relation into the following parametric function of $\gamma = (\beta, \lambda)$:
$$\phi_{\eta;\gamma}(\xi_2) = \frac{\lambda F_y(w_{it}, \xi_1, \lambda \xi_2)}{F_m(w_{it}, \xi_1, \xi_2; \beta)\, \phi_v(\xi_1)}, \tag{A.14}$$
where $\phi_{\eta;\gamma_0}(\xi_2) = \phi_\eta(\xi_2)$. Hence identification of the true parameter $\gamma_0$ leads to identification of $\phi_\eta(\xi_2)$. Consider the following parametric function obtained by applying the inverse Fourier transform to $\phi_{\eta;\gamma}(\xi_2)$:
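$$f_{\tilde{\eta}_i;\gamma}(\tilde{\eta}) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\xi_2 \tilde{\eta}}\, \phi_{\eta;\gamma}(\xi_2)\, d\xi_2. \tag{A.15}$$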
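Eq. (A.15) can be evaluated numerically by truncating the frequency range and applying a quadrature rule. A minimal sketch, assuming the characteristic function is available on a fine and sufficiently wide grid (the truncation range, the step, and the function name are choices of the example); the check uses the standard normal, whose characteristic function is $e^{-\xi^2/2}$.

```python
import numpy as np


def inverse_fourier_density(phi_values, xi_grid, points):
    """Approximate f(x) = (1 / 2pi) * integral of exp(-i xi x) phi(xi) dxi by the trapezoid rule.

    phi_values: characteristic function on an increasing xi_grid wide enough for phi to decay.
    Returns the real part; for a valid characteristic function the imaginary part is negligible.
    """
    points = np.atleast_1d(np.asarray(points, dtype=float))
    integrand = np.exp(-1j * np.outer(points, xi_grid)) * phi_values[None, :]
    dxi = np.diff(xi_grid)
    integral = np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * dxi, axis=1)
    return (integral / (2.0 * np.pi)).real


xi = np.linspace(-40.0, 40.0, 8001)
eta = np.array([0.0, 1.0, 2.0])
f_hat = inverse_fourier_density(np.exp(-0.5 * xi ** 2), xi, eta)
f_true = np.exp(-0.5 * eta ** 2) / np.sqrt(2.0 * np.pi)
print(np.max(np.abs(f_hat - f_true)))
```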
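Evaluating the parametric function at $\gamma_0$, we have $f_{\tilde{\eta}_i;\gamma_0}(\tilde{\eta}) = f_{\tilde{\eta}_i}(\tilde{\eta})$ by the Fourier inversion theorem. Replacing $f_{\tilde{\eta}_i}(\tilde{\eta}_i)$ by $f_{\tilde{\eta}_i;\gamma}(\tilde{\eta}_i)$ in the conditional mean function in Eq. (A.5), we have
$$E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma] = \int\!\!\int m\big(w_{it}, \tilde{g}_{i,<t} - \tilde{v}_{it}, \lambda \bar{w}_i - \tilde{\eta}_i; \beta\big)\, f_{\tilde{V}_{it}}(\tilde{v}_{it})\, f_{\tilde{\eta}_i;\gamma}(\tilde{\eta}_i)\, d\tilde{v}_{it}\, d\tilde{\eta}_i, \tag{A.16}$$
with $E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_0] = E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i]$. Next, we show that $\gamma_0$ is locally identifiable. Suppose, to the contrary, that $\gamma_0$ is not locally identifiable. Then there exists a sequence of distinct parameters $\gamma_s \equiv (\beta_s, \lambda_s)$ converging to $\gamma_0 = (\beta_0, \lambda_0)$ such that $\|(\beta_s, \lambda_s) - (\beta_0, \lambda_0)\| \neq 0$ and $E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_s] = E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i]$. Applying the mean value theorem to $E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_s]$ around $\gamma_0$ yields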
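$$\begin{aligned}
&E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_s] - E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_0] \\
&\quad = \sum_{\tau=1}^{d_\beta} \frac{\partial E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma^*]}{\partial \beta_\tau} (\beta_{s\tau} - \beta_{0\tau}) + \sum_{k=1}^{2} \frac{\partial E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma^*]}{\partial \lambda_k} (\lambda_{sk} - \lambda_{0k}),
\end{aligned} \tag{A.17}$$
where $\gamma^* \equiv (\beta^*, \lambda^*)$ is a parameter between $\gamma_s$ and $\gamma_0$. Combining these relationships yields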
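$$\begin{aligned}
0 &= \sum_{\tau=1}^{d_\beta} \frac{\partial E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma^*]}{\partial \beta_\tau} \frac{\beta_{s\tau} - \beta_{0\tau}}{\|(\beta_s, \lambda_s) - (\beta_0, \lambda_0)\|} + \sum_{k=1}^{2} \frac{\partial E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma^*]}{\partial \lambda_k} \frac{\lambda_{sk} - \lambda_{0k}}{\|(\beta_s, \lambda_s) - (\beta_0, \lambda_0)\|} \\
&= \nabla_\gamma E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma^*]^{T} \Big[ \frac{(\beta_s - \beta_0)'}{\|(\beta_s, \lambda_s) - (\beta_0, \lambda_0)\|}, \ \frac{(\lambda_s - \lambda_0)'}{\|(\beta_s, \lambda_s) - (\beta_0, \lambda_0)\|} \Big]^{T} \\
&\equiv \nabla_\gamma E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma^*]^{T} S_{\gamma_s}.
\end{aligned} \tag{A.18}$$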
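Because $\|S_{\gamma_s}\|_E^2 = 1$ for all $s$, $\{S_{\gamma_s} : s = 1, \ldots\}$ is a sequence on the unit sphere. Since the unit sphere is compact, there exists a convergent subsequence $\{S_{\gamma_{s_j}} : j = 1, \ldots\}$ whose limit is also on the unit sphere. Denote the limit by $S_{\gamma_0}$. Combining the continuity assumption in Assumption 2.6 with Eq. (A.18), and noting that $\gamma^* \to \gamma_0$ along the subsequence because $\gamma^*$ lies between $\gamma_s$ and $\gamma_0$, we obtain
$$0 = \nabla_\gamma E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_0]^{T} S_{\gamma_0}. \tag{A.19}$$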
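Multiplying each side by $\nabla_\gamma E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_0]$ yields
$$0 = \Big( \nabla_\gamma E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_0] \cdot \nabla_\gamma E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_0]^{T} \Big) S_{\gamma_0}. \tag{A.20}$$
Taking an expectation, we obtain
$$0 = E\Big[ \nabla_\gamma E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_0] \cdot \nabla_\gamma E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma_0]^{T} \Big] S_{\gamma_0} = I(\beta_0, \lambda_0)\, S_{\gamma_0} \quad \text{with } S_{\gamma_0} \neq 0. \tag{A.21}$$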
Since $I(\beta_0, \lambda_0)$ is nonsingular by Assumption 2.6, Eq. (A.21) implies $S_{\gamma_0} = 0$, which contradicts $\|S_{\gamma_0}\|_E = 1$. Hence $(\beta_0, \lambda_0)$ is locally identifiable. Q.E.D.
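The contradiction rests on the nonsingularity of $I(\beta_0, \lambda_0)$. For a given candidate conditional-mean function this can be checked numerically with finite-difference gradients and the smallest eigenvalue of the averaged outer product. In the sketch below, `mean_fn` is a hypothetical stand-in for $E[Y_{it} \mid w_{it}, \tilde{g}_{i,<t}, \bar{w}_i; \gamma]$ in Eq. (A.16), which in practice would itself be computed by the deconvolution steps above; the two toy models illustrate an identified and a non-identified parameterization.

```python
import numpy as np


def information_matrix(mean_fn, gamma0, Z, h=1e-6):
    """Sample analogue of E[grad mu(Z; gamma0) grad mu(Z; gamma0)^T].

    mean_fn(Z, gamma) returns the model-implied conditional mean for each element of Z;
    gradients with respect to gamma use central finite differences with step h.
    """
    gamma0 = np.asarray(gamma0, dtype=float)
    columns = []
    for j in range(gamma0.size):
        step = np.zeros_like(gamma0)
        step[j] = h
        columns.append((mean_fn(Z, gamma0 + step) - mean_fn(Z, gamma0 - step)) / (2.0 * h))
    grad = np.column_stack(columns)  # (n, d): one gradient per observation
    return grad.T @ grad / grad.shape[0]


def is_nonsingular(I, rel_tol=1e-8):
    """Check that the smallest eigenvalue of the symmetric matrix I is clearly positive."""
    eigvals = np.linalg.eigvalsh(I)
    return bool(eigvals.min() > rel_tol * max(1.0, eigvals.max()))


Z = np.linspace(-1.0, 1.0, 101)
identified = lambda Z, g: g[0] + g[1] * Z        # intercept and slope enter separately
not_identified = lambda Z, g: (g[0] + g[1]) * Z  # only the sum g[0] + g[1] enters
print(is_nonsingular(information_matrix(identified, [1.0, 2.0], Z)),
      is_nonsingular(information_matrix(not_identified, [1.0, 2.0], Z)))  # prints: True False
```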
References
AI, C., AND X. CHEN (2003): “Efficient Estimation of Models with Conditional Moment
Restrictions Containing Unknown Functions,” Econometrica, 71(6), 1795–1843.
AI, C., AND X. CHEN (2007): “Estimation of Possibly Misspecified Semiparametric Conditional Moment Restriction Models with Different Conditioning Variables,” Journal of Econometrics, 141(1), 5–43.
AKASHI, K., AND N. KUNITOMO (2012): “Some Properties of the LIML Estimator in a
Dynamic Panel Structural Equation,” 166, 167–183.
ALTONJI, J., AND R. MATZKIN (2005): “Cross Section and Panel Data Estimators
for Nonseparable Models with Endogenous Regressors,” Econometrica, 73(4), 1053–
1102.
ALVAREZ, J., AND M. ARELLANO (2003): “The Time Series and Cross-Section Asymp-
totics of Dynamic Panel Data Estimators,” Econometrica, 71(4), 1121–1159.
BIØRN, E. (1992): “The Bias of Some Estimators for Panel Data Models with Measure-
ment Errors,” Empirical Economics, 17(1), 51–66.
BOLLINGER, C. (1998): “Measurement Error in the Current Population Survey: A
Nonparametric Look,” Journal of Labor Economics, 16(3), 576–594.
BORJAS, G. J. (2013): Labor Economics. McGraw-Hill/Irwin.
BOUND, J., C. BROWN, G. DUNCAN, AND W. RODGERS (1994): “Evidence on the Va-
lidity of Cross-sectional and Longitudinal Labor Market Data,” Journal of Labor
Economics, 12(3), 345–368.
BOUND, J., C. BROWN, AND N. MATHIOWETZ (2001): Measurement Error in Survey
Data. North-Holland.
CHEN, X., AND N. SWANSON (2012): Dynamic Nonseparable Panel Data Models, in:
Recent Advances and Future Directions in Causality, Prediction, and Specification
Analysis: Essays in Honor of Halbert L. White Jr. Springer.
CHERNOZHUKOV, V., I. FERNÁNDEZ-VAL, J. HAHN, AND W. NEWEY (2013): “Average
and Quantile Effects in Nonseparable Panel Models,” Econometrica, 81(2), 535–580.
CHERNOZHUKOV, V., I. FERNANDEZ-VAL, S. HODERLEIN, S. HOLZMANN, AND
W. NEWEY (2015): “Nonparametric Identification in Panels using Quantiles,” Jour-
nal of Econometrics, 188(2), 378–392.
EVDOKIMOV, K. (2011): “Identification and Estimation of a Nonparametric Panel Data
Model with Unobserved Heterogeneity,” Working Paper.
GRAHAM, B., AND J. POWELL (2012): “Identification and Estimation of Average Partial
Effects in "Irregular" Correlated Random Coefficient Panel Data Models,” Economet-
rica, 80(5), 2105–2152.
GRILICHES, Z., AND J. HAUSMAN (1986): “Errors in Variables in Panel Data,” Journal
of Econometrics, 31(1), 93–118.
HAHN, J., AND G. KUERSTEINER (2002): “Asymptotically Unbiased Inference for a Dy-
namic Panel Model with Fixed Effects When Both n and T Are Large,” Econometrica,
70(4), 1639–1657.
HODERLEIN, S., AND E. MAMMEN (2007): “Identification of Marginal Effects in Non-
separable Models without Monotonicity,” Econometrica, 75(5), 1513–1518.
HODERLEIN, S., AND H. WHITE (2012): “Nonparametric Identification in Nonsepa-
rable Panel Data Models with Generalized Fixed Effects,” Journal of Econometrics,
168(2), 300–314.
KOEBEL, B., F. LAISNEY, W. POHLMEIER, AND M. STAAT (2008): “Life Cycle Labor
Supply and Panel Data: a Survey,” in The Econometrics of Panel Data, pp. 761–794.
Springer.
MACURDY, T. E. (1981): “An Empirical Model of Labor Supply in a Life-cycle Setting,”
Journal of Political Economy, 89(6), 1059–1085.
NEWEY, W., AND J. POWELL (2003): “Instrumental Variable Estimation of Nonpara-
metric Models,” Econometrica, 71(5), 1565–1578.
SCHENNACH, S. (2007): “Instrumental Variable Estimation of Nonlinear Errors-in-
variables Models,” Econometrica, 75(1), 201–239.
WANSBEEK, T. J. (2001): “GMM Estimation in Panel Data Models with Measurement
Error,” Journal of Econometrics, 104(2), 259–268.
WANSBEEK, T. J., AND R. H. KONING (1991): “Measurement Error and Panel Data,”
Statistica Neerlandica, 45(2), 85–92.
WILHELM, D. (2015): “Identification and Estimation of Nonparametric Panel Data
Regressions with Measurement Error,” Discussion paper, Cemmap Working Paper.
WOOLDRIDGE, J. (2005): “Simple Solutions to the Initial Conditions Problem in Dy-
namic, Nonlinear Panel Data Models with Unobserved Heterogeneity,” Journal of
Applied Econometrics, 20(1), 39–54.
ZILIAK, J. (1997): “Efficient Estimation With Panel Data When Instruments are Pre-
determined: An Empirical Comparison of Moment-Condition Estimators,” Journal
of Business & Economic Statistics, 15(4), 419–431.
ZINDE-WALSH, V. (2014): “Measurement Error and Deconvolution in Spaces of Gener-
alized Functions,” Econometric Theory, 30(6), 1207–1246.
Table 1: Estimations of Nonlinear Panel Data Models with Measurement Error (n=500)
β0 = 0.5 β1 = 0.5 β2 =−0.5 λ1 =−0.5 λ2 = 0.5
Simulation I
Mean 0.527 0.501 -0.436 -0.493 0.496
Median 0.530 0.493 -0.436 -0.497 0.493
RMSE 0.122 0.109 0.118 0.113 0.106
Simulation II
Mean 0.533 0.505 -0.425 -0.501 0.525
Median 0.527 0.507 -0.426 -0.519 0.522
RMSE 0.119 0.110 0.105 0.100 0.122
Simulation III
Mean 0.524 0.501 -0.436 -0.491 0.502
Median 0.521 0.501 -0.437 -0.502 0.505
RMSE 0.114 0.107 0.102 0.101 0.110
Note: Standard deviations of the parameters are computed as the standard deviation of the estimates across 150 simulations and are called (simulation) standard deviations.
Table 2: Estimations of Nonlinear Panel Data Models with Measurement Error (n=1000)
β0 = 0.5 β1 = 0.5 β2 =−0.5 λ1 =−0.5 λ2 = 0.5
Simulation I
Mean 0.522 0.503 -0.435 -0.491 0.532
Median 0.522 0.500 -0.434 -0.501 0.524
RMSE 0.116 0.112 0.111 0.100 0.131
Simulation II
Mean 0.541 0.506 -0.424 -0.512 0.517
Median 0.541 0.506 -0.424 -0.517 0.509
RMSE 0.131 0.110 0.106 0.110 0.109
Simulation III
Mean 0.525 0.510 -0.434 -0.499 0.503
Median 0.527 0.511 -0.436 -0.510 0.514
RMSE 0.117 0.118 0.101 0.106 0.109
Note: Standard deviations of the parameters are computed as the standard deviation of the estimates across 150 simulations and are called (simulation) standard deviations.
Table 3: Estimation of the APEs in Simulations (n=500)
Infeasible Sieve MD
Simulation I:
Mean -0.250 -0.218
Std. dev. 0.000 0.049
RMSE – 0.059
Simulation II:
Mean -0.375 -0.387
Std. dev. 0.038 0.114
RMSE – 0.114
Simulation III:
Mean -1.662 -1.205
Std. dev. 0.083 0.260
RMSE – 0.526
Note: Standard deviations of the parameters are computed as the standard deviation of the estimates across 150 simulations.
Table 4: Estimation of the APEs in Simulations (n=1000)
Infeasible Sieve MD
Simulation I:
Mean -0.250 -0.217
Std. dev. 0.000 0.045
RMSE – 0.056
Simulation II:
Mean -0.375 -0.394
Std. dev. 0.025 0.125
RMSE – 0.126
Simulation III:
Mean -1.662 -1.204
Std. dev. 0.060 0.230
RMSE – 0.511
Note: Standard deviations of the parameters are computed as the standard deviation of the estimates across 150 simulations.
Table 5: Data Summary
Variable                 Mean   Std. Dev.        Min        Max
ln(hours)  overall      7.671       0.289      2.770      8.560
           between                  0.233      4.950      8.407
           within                   0.172      5.491     10.011
ln(wage)   overall      2.614       0.448     -0.220      4.600
           between                  0.432      0.877      4.367
           within                   0.118      1.274      3.344
kids       overall      1.484       1.218          0          6
           between                  1.191          0      5.333
           within                   0.257     -0.183      3.150
age        overall     42.415       7.973         29         60
           between                  7.933         30         59
           within                   0.849     40.748     44.081
age2       overall  1,862.545     708.068        841      3,600
           between                704.740    900.667  3,481.667
           within                  72.973  1,668.212  2,051.545
disab      overall      0.083       0.276          0          1
           between                  0.230          0          1
           within                   0.153     -0.583      0.750
Note: The data form a three-period panel with a cross-sectional size of 532.
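The overall/between/within rows in Table 5 appear to follow the usual panel-summary convention (as in Stata's `xtsum`), under which the within values are $x_{it} - \bar{x}_i + \bar{x}$; this is why the within minimum for kids can be negative. A minimal sketch of that decomposition for a balanced panel, assuming this convention (the function name and the two-household example are hypothetical):

```python
import numpy as np


def panel_summary(x):
    """Overall, between and within summaries for a balanced panel x of shape (n, T).

    Between statistics use the unit means x_bar_i; within values are
    x_it - x_bar_i + x_bar, so their minimum and maximum can lie outside the raw data range.
    """
    x = np.asarray(x, dtype=float)
    unit_means = x.mean(axis=1)
    grand_mean = x.mean()
    within = x - unit_means[:, None] + grand_mean
    return {
        "mean": grand_mean,
        "sd_overall": x.std(ddof=1),
        "sd_between": unit_means.std(ddof=1),
        "sd_within": within.std(ddof=1),
        "between_min": unit_means.min(),
        "between_max": unit_means.max(),
        "within_min": within.min(),
        "within_max": within.max(),
    }


kids_example = [[0, 1, 2], [3, 3, 3]]  # two hypothetical households observed in three periods
print(panel_summary(kids_example))
```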
Table 6: Estimates for the Elasticity of Labor Supply
Dependent Variable: ln(hours)
                          Linear Fixed  First Differencing        Linear Corr.     Semi-parametric
                                Effect                  IV      Random Effects      Nonlinear Reg.
ln(wage)                        -0.094               0.049               0.041               0.033
                               (0.045)             (0.081)             (0.021)             (0.020)
kids                            -0.019               0.000              -0.015              -0.011
                               (0.021)             (0.021)             (0.021)             (0.013)
age                             -0.008               0.019              -0.011              -0.009
                               (0.043)             (0.072)             (0.043)             (0.012)
age2                             0.000               0.000               0.000              -0.002
                               (0.000)             (0.001)             (0.000)             (0.003)
disab                           -0.042              -0.063              -0.048              -0.040
                               (0.035)             (0.091)             (0.035)             (0.044)
time trend                       0.002               0.000               0.002               0.003
                               (0.029)             (0.004)             (0.029)             (0.002)
kids (time avg.)                     –                   –               0.018               0.026
                                     –                   –             (0.024)             (0.014)
age (time avg.)                      –                   –               0.016               0.014
                                     –                   –             (0.046)             (0.013)
age2 (time avg.)                     –                   –               0.000               0.002
                                     –                   –             (0.000)             (0.004)
disab (time avg.)                    –                   –              -0.109              -0.084
                                     –                   –             (0.056)             (0.162)
constant                         7.918                   –               7.523               8.435
                               (1.314)                   –             (0.325)             (5.723)
APE                             -0.094               0.049               0.041               0.097
                               (0.045)             (0.081)             (0.021)             (0.020)
Note: Bootstrap (simulation) standard errors are reported in parentheses, using 150 bootstrap replications for the semi-parametric nonlinear regression model. The linear fixed effects model, the first differencing IV model, and the linear correlated random effects model are the proposed model without the factor $(1 + c_i)$ multiplying the wage. The linear correlated random effects model is estimated using the CRE specification in Assumption 2.1. While the APE of labor supply for the linear models is $\beta_1$, the APE for the nonlinear model is $\beta_1 \int_C (1 + c)\, f_{C_i}(c)\, dc$.
[Figure: panel titled "The Density of the CRE Error: f"; horizontal axis from -5 to 5, vertical axis from 0 to 4.5]
Figure 1: The Estimated Density of the Error in the CRE Specification