A multidimensional latent class IRT model for non-ignorable missing responses∗
Silvia Bacci†
Department of Economics, University of Perugia (IT)
email: silvia.bacci@stat.unipg.it

Francesco Bartolucci∗
Department of Economics, University of Perugia (IT)
email: bart@stat.unipg.it

October 23, 2014
Abstract

We propose a structural equation model, which reduces to a multidimensional latent class item response theory model, for the analysis of binary item responses with non-ignorable missingness. The missingness mechanism is driven by two sets of latent variables: one describing the propensity to respond and the other referring to the abilities measured by the test items. These latent variables are assumed to have a discrete distribution, so as to reduce the number of parametric assumptions regarding the latent structure of the model. Individual covariates may also be included through a multinomial logistic parametrization of the probabilities of each support point of the distribution of the latent variables. Given the discrete nature of this distribution, the proposed model is efficiently estimated by the Expectation-Maximization algorithm. A simulation study is performed to evaluate the finite-sample properties of the parameter estimates. Moreover, an application is illustrated with data coming from a Students’ Entry Test for admission to some university courses.

Keywords: EM algorithm, Finite mixture models, Item response theory, Semiparametric inference, Students’ Entry Test
∗ The present paper has been accepted for publication in Structural Equation Modeling: A Multidisciplinary Journal.
† Both authors acknowledge the financial support from the grant FIRB (“Futuro in ricerca”) 2012 on “Mixture and latent variable models for causal inference and analysis of socio-economic data”, which is funded by the Italian Government (RBFR12SHVV). The authors are also grateful to Dr. B. Bertaccini of the University of Florence (IT) for making the data available.
arXiv:1410.4856v1 [stat.ME] 17 Oct 2014
1 Introduction
A relevant problem in applications of Item Response Theory (IRT) models is related to missing responses to some items. Following the general theory of Little and Rubin (2002), we define missing item responses to be ignorable if: (i) these responses are missing at random (MAR), that is, the event that the response to an item is missing is conditionally independent of the (unobservable) response to this item given the observed responses to the other items and (ii) the missing mechanism is governed by a model based on a distinct set of parameters with respect to the parameters of the model governing the response process. Under ignorability, maximum likelihood estimation of the parameters of the IRT model of interest is only based on the observed responses.

Obviously, when condition (i) or (ii) above does not hold, then missing responses are non-ignorable and the missingness mechanism must be modeled along with the relationships of direct interest to avoid wrong inferential conclusions and loss of relevant information for the assessment of the examinees’ ability level. A typical example of non-ignorable missing responses (or missing not at random, MNAR) is observed with educational tests where, in order to avoid guessing, a wrong item response is penalized to a greater extent in comparison with a missing response. In such a context, it is natural to suppose that the choice of not answering a given item is related to the ability (or abilities) measured by the test.

In the statistical literature there exist different approaches to model a non-ignorable missing mechanism. Among the best known, we recall the selection approach (Diggle and Kenward, 1994) and the pattern-mixture approach (Little, 1993). The first formulation defines a joint model of observed and missing responses and factorizes the corresponding distribution in a marginal distribution for the complete data (union of observed and missing responses) and a conditional distribution for the missing data given the complete data. In contrast, the pattern-mixture approach specifies the marginal distribution for the missing responses and the conditional distribution of the complete data given the missing responses.

Recently, Formann (2007) showed that on the basis of the latent class (LC) model (Lazarsfeld and Henry, 1968; Goodman, 1974) it is possible to define a class of MNAR models, which is distinct from that of selection models and that of pattern-mixture models. He treated the presence of non-ignorable missing entries in the case of repeated measurements of the same variable or observations of different variables made on the same individuals. The approach is based on creating an extra category for missing responses, so as to model the missingness mechanism, and analyzing the data coded in this way by means of an LC model. In this way each latent class is also characterized in terms of missing responses. Moreover, individual covariates may influence the class weights, so that the latent class distribution becomes individual-specific as in the latent regression models of Bandeen-Roche et al. (1997) and Bartolucci and Forcina (2006).

In the statistical literature, Harel and Schafer (2009) proposed the use of LC models to treat cases where missingness is only partially ignorable. They introduced the concepts of partial ignorability, which supposes that a summary of the missing-data indicators depends on the missing values, and of latent ignorability, based on assuming that the missing-data indicators depend on a summary of the missing values. In the context of item responses, they proposed to create a binary missingness indicator corresponding to each item and to fit an LC model treating these indicators as additional items. In this way, each latent class summarizes not only the answers to questionnaire items, but also the individual propensity to answer.

In the IRT context, which is of our specific interest, several approaches may be adopted to deal with MNAR responses. The most naive ones consist of adopting simple IRT models that ignore the missing responses or consider the omissions as wrong responses. Bradlow and Thomas (1998) and Rose et al. (2010) warned against the drawbacks of these approaches, which lead to
biased estimates of model parameters and, therefore, to unfair comparisons between persons. To overcome these limitations, several authors proposed approaches based on modeling the non-ignorable missingness process. Moustaki and Knott (2000) and Moustaki and O’Muircheartaigh (2000), among others, discussed a nominal IRT model, with possible covariates for the ability, where the missing responses are treated as separate response categories, elaborating an original idea by Bock (1972). On the other hand, Rose et al. (2010) proposed a latent regression model where the latent ability is regressed on the observed response rate, referring in this way the missingness mechanism to the covariates rather than to the responses.

An interesting stream of research has been introduced by Lord (1983), who suggested treating the problem of MNAR responses by assuming that the observed item responses depend both on the latent ability (or abilities), intended to be measured by the test, and on another latent variable which represents the “temperament” of respondents, and describes their propensity to respond. Elaborating the approach of Lord (1983), Holman and Glas (2005) discussed a unified model-based approach for handling non-ignorable missing data and, therefore, assessing the extent to which the missingness mechanism may be ignored. The adopted approach relies on multidimensional IRT models (Reckase, 2010) and on the assumption that the latent traits are normally distributed.

It is also worth recalling the work of Bertoli-Barsotti and Punzo (2013), who proposed an alternative non-parametric approach based on the conditional maximum likelihood estimation method, where a multidimensional IRT model is specified according to the Rasch model assumptions (Rasch, 1960). The main drawback of the conditional approach is that it does not allow us to measure the correlation between the assumed latent variables; moreover, its use is limited to settings for which the Rasch model is realistic.

The above mentioned approaches based on the introduction of a latent variable describing the tendency to respond are well suited to a Structural Equation Model (SEM) formulation (Goldberger, 1972; Duncan, 1975; Bollen et al., 2008), which allows for several types of generalizations (e.g., semiparametric specification of the latent trait distribution and effect of individual covariates on the latent traits).

The aim of the present article is to introduce a SEM, which reduces to a special type of multidimensional LC IRT model, to deal with non-ignorable missing responses to a set of test items. The model is based on the assumption of discreteness of the latent variables, not only for the response process but also for the missingness process. Therefore, with respect to traditional SEM, the proposed model takes the form of a finite mixture SEM (Jedidi et al., 1997; Dolan and van der Maas, 1998; Arminger et al., 1999).

The basic model we rely on was introduced by Bartolucci (2007) and it is based on two main assumptions: (i) more latent traits can be simultaneously considered and each item is associated with only one of them (between-item multidimensionality, see Adams et al., 1997), and (ii) these latent traits are represented by a random vector with a discrete distribution common to all subjects, so that each support point of such a distribution identifies a different latent class of individuals having homogeneous unobservable characteristics. Moreover, with binary response variables, either a Rasch or a two-parameter logistic (2PL) parametrization (Birnbaum, 1968) may be adopted for the probability of a correct response to each item. In this context, we propose to include a further discrete latent variable to model the probability of observing a response to each item, so that the non-ignorable missing process may be treated in a semiparametric way, as done for the response process. Other than extending the model of Bartolucci (2007) to allow for missingness, we also extend it to allow for latent individual covariates which may explain the probability of belonging to a given latent class.

The approach proposed in this article joins the latent class approach of Formann (2007)
with the parametric approach of Holman and Glas (2005) developed in the IRT setting. Several advantages with respect to the latter may be found. First, the proposed model is more flexible because it does not introduce any parametric assumption about the distribution of the latent variables. Second, detecting homogeneous classes of individuals is convenient for certain decisional processes, because individuals in the same class may be associated with the same decision (e.g., students admitted, admitted with reserve, not admitted to university courses). Finally, our model allows us to skip the well-known problem of the intractability of multidimensional integrals which characterizes the marginal log-likelihood function of a continuous multidimensional IRT model. Indeed, parameter estimation may be performed through the discrete marginal maximum likelihood method, based on an Expectation-Maximization (EM) algorithm (Dempster et al., 1977), and implemented in an R function that we make publicly available.

In order to assess the finite-sample properties of the parameter estimates obtained from the EM algorithm, we have performed a simulation study under different scenarios corresponding to different structures of missing data. In this way, we can also assess the impact of missing data on the quality of the parameter estimates with respect to the case in which all data are observed. The proposed approach is also illustrated through an application to real data coming from the Students’ Entry Test given at the Faculty of Economics of an Italian university in 2011. The test is composed of 36 multiple-choice items devoted to measuring three latent abilities (Logic, Mathematics, and Verbal comprehension) and certain covariates are also included.

The remainder of the paper is organized as follows. We first describe the proposed structural model to account for the presence of non-ignorable missing responses in the IRT context and its statistical formulation. Then, some details about the estimation procedure through the EM algorithm are described together with other details about likelihood inference. In the sequel, we illustrate the simulation study to evaluate the adequacy of the proposed approach. The application of the proposed approach to the data arising from the Students’ Entry Test is illustrated in the last section.
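The remark in this section that a discrete latent distribution avoids multidimensional integrals can be made concrete with a small example. The following is only a minimal sketch in Python, not the authors' R function: it assumes a hypothetical unidimensional latent class Rasch model with complete binary data, and the names `lc_rasch_loglik`, `support`, `weights`, and `beta` are illustrative choices that do not come from the paper. The point it shows is that the marginal likelihood of a response pattern is a finite weighted sum over the support points, so no numerical integration is needed.

```python
import itertools
import numpy as np

def lc_rasch_loglik(y, support, weights, beta):
    """Marginal log-likelihood of binary responses y (n x m) when the
    latent trait equals support[c] with probability weights[c] and
    item j has difficulty beta[j] (hypothetical Rasch-type sketch)."""
    y = np.asarray(y, dtype=float)
    support = np.asarray(support, dtype=float)
    beta = np.asarray(beta, dtype=float)
    logit = support[:, None] - beta[None, :]      # shape (k, m)
    log_p1 = -np.logaddexp(0.0, -logit)           # log P(Y_j = 1 | class c)
    log_p0 = -np.logaddexp(0.0, logit)            # log P(Y_j = 0 | class c)
    # log P(y_i | class c) under local independence, shape (n, k)
    cond = y @ log_p1.T + (1.0 - y) @ log_p0.T
    joint = cond + np.log(np.asarray(weights, dtype=float))[None, :]
    # Finite sum over the k support points (stable log-sum-exp);
    # this sum plays the role of the integral in the continuous case.
    mx = joint.max(axis=1, keepdims=True)
    return float(np.sum(mx[:, 0] + np.log(np.exp(joint - mx).sum(axis=1))))

# Sanity check: the probabilities of all 2^3 response patterns sum to one.
patterns = np.array(list(itertools.product([0, 1], repeat=3)))
total = sum(
    np.exp(lc_rasch_loglik(p[None, :], [-1.0, 1.0], [0.4, 0.6], [0.2, -0.3, 0.5]))
    for p in patterns
)
print(total)  # equals 1 up to floating-point error
```

With k support points, the cost of one likelihood evaluation grows linearly in k, which is what makes an EM algorithm over the class weights and item parameters practical in this setting.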
2 Proposed SEM formulation
In this section, we describe the proposed approach to model MNAR item responses (Little and Rubin, 2002). We begin by illustrating the proposed SEM and then we provide the resulting statistical formulation, which may be cast in the class of multidimensional IRT models.
2.1 Structural model
For a random subject drawn from the population of interest, denote by $Y_j$ the response provided by the subject to binary item $j$, with $j = 1,\ldots,m$. In order to model the response process, we have to consider that the subject may answer correctly ($Y_j = 1$) or incorrectly ($Y_j = 0$) or he/she may skip the question, so that $Y_j$ can be observed or not. Therefore, for $j = 1,\ldots,m$, we also introduce the binary indicator $R_j$ equal to 1 if the individual provides a response to item $j$ and to 0 otherwise (i.e., $Y_j$ is missing); see also Harel and Schafer (2009). Moreover, we consider a set of $c$ exogenous individual covariates denoted by $X_1,\ldots,X_c$.

In order to explain the association between the exogenous variables $X_1,\ldots,X_c$ and the endogenous variables $Y_1,\ldots,Y_m$, we introduce two latent variables. The first of these latent variables, denoted by $U$, represents the latent trait that is measured by the test items (e.g., ability in Mathematics). The second latent variable, denoted by $V$, is interpreted as the propensity to answer (as in Lord, 1983), the opposite of an aversion to risk if a wrong response is somehow penalized. Based on these latent variables and considering $Y_j$ and $R_j$ as deriving from a discretization of continuous variables denoted by $Y_j^*$ and $R_j^*$, we formulate the following equations entering the measurement component of the proposed SEM:

$$Y_j = I\{Y_j^* \geq 0\}, \tag{1}$$
$$R_j = I\{R_j^* \geq 0\}, \tag{2}$$
$$Y_j^* = \alpha_j U - \beta_j + \varepsilon_{1j}, \tag{3}$$
$$R_j^* = \gamma_{1j} U + \gamma_{2j} V - \delta_j + \varepsilon_{2j}, \tag{4}$$

for $j = 1,\ldots,m$, where $I\{\cdot\}$ is the indicator function equal to 1 if its argument is true and to 0 otherwise and $\varepsilon_{1j}$ and $\varepsilon_{2j}$ are independent error terms. Moreover, the slope $\alpha_j$ measures the effect of an increase of the latent variable $U$ on $Y_j^*$ and, similarly, $\gamma_{1j}$ and $\gamma_{2j}$ measure the effect on $R_j^*$ of $U$ and $V$, respectively.

According to the proposed model, the observed response to a given item $j$ depends only on the latent ability $U$ measured by the test, whereas the event of answering item $j$ depends both on $U$ and on the propensity to respond $V$. Therefore, provided that $\gamma_{2j} > 0$, $R_j^*$ tends to increase with the propensity to respond given the latent ability level. Similarly, provided that $\gamma_{1j} > 0$, $R_j^*$ tends to increase with the ability level even if the propensity to answer remains constant. The idea behind this assumption is that better students are more willing to respond due to their confidence in the correctness of the response. Note that the adopted formulation recalls model G3 of Holman and Glas (2005), whereas model G2 proposed by the same authors is obtained by imposing the constraint $\gamma_{1j} = 0$, $j = 1,\ldots,m$, which implies the absence of any direct effect of $U$ on $R_j^*$ and, therefore, denotes that the missingness process may be ignored. Finally, $\beta_j$ and $\delta_j$ denote two other parameters characterizing item $j$, which may be interpreted as difficulty parameters because higher values of them correspond to smaller values of $Y_j^*$ and $R_j^*$.

The proposed SEM formulation is completed by assuming that: (i) the latent variables $U$ and $V$ are conditionally independent given the covariates $X_1,\ldots,X_c$ and that (ii) a direct effect of these covariates on the response variables is ruled out. How we formulate the conditional distributions of $U$ and $V$ given the covariates will be clarified in the following section.

The above approach may be easily extended to the multidimensional case with items measuring $s$ different latent traits (e.g., ability in Mathematics, ability in Logic, ability in Verbal comprehension), which are represented by the latent variables $U_1,\ldots,U_s$, assuming, in addition to (1) and (2), that

$$Y_j^* = \alpha_j \sum_{d=1}^{s} z_{dj} U_d - \beta_j + \varepsilon_{1j}, \tag{5}$$
$$R_j^* = \gamma_{1j} \sum_{d=1}^{s} z_{dj} U_d + \gamma_{2j} V - \delta_j + \varepsilon_{2j}, \tag{6}$$

for $j = 1,\ldots,m$. In comparison with the structural model based on equations (1)-(4), the new one changes in the last two equations involving the indicator variables $z_{dj}$, which are equal to 1 if item $j$ measures latent trait of type $d$ and to 0 otherwise. A between-item multidimensional approach (Adams et al., 1997) is assumed with reference to the measurement of the $s$ latent abilities, indicating that each item measures only one of them. On the other hand, a within-item multidimensional approach is here adopted for the indicator $R_j$, since it is affected by two latent variables. In any case, our conceptual model still assumes one latent variable $V$ for the propensity to answer (for an illustration see Figure 1). A possible alternative, which is more
