Title: | Supplementary Item Response Theory Models |
---|---|
Description: | Supplementary functions for item response models aiming to complement existing R packages. The functionality includes among others multidimensional compensatory and noncompensatory IRT models (Reckase, 2009, <doi:10.1007/978-0-387-89976-3>), MCMC for hierarchical IRT models and testlet models (Fox, 2010, <doi:10.1007/978-1-4419-0742-4>), NOHARM (McDonald, 1982, <doi:10.1177/014662168200600402>), Rasch copula model (Braeken, 2011, <doi:10.1007/s11336-010-9190-4>; Schroeders, Robitzsch & Schipolowski, 2014, <doi:10.1111/jedm.12054>), faceted and hierarchical rater models (DeCarlo, Kim & Johnson, 2011, <doi:10.1111/j.1745-3984.2011.00143.x>), ordinal IRT model (ISOP; Scheiblechner, 1995, <doi:10.1007/BF02301417>), DETECT statistic (Stout, Habing, Douglas & Kim, 1996, <doi:10.1177/014662169602000403>), local structural equation modeling (LSEM; Hildebrandt, Luedtke, Robitzsch, Sommer & Wilhelm, 2016, <doi:10.1080/00273171.2016.1142856>). |
Authors: | Alexander Robitzsch [aut,cre]
|
Maintainer: | Alexander Robitzsch <[email protected]> |
License: | GPL (>= 2) |
Version: | 4.2-106 |
Built: | 2025-01-30 06:57:27 UTC |
Source: | https://github.com/alexanderrobitzsch/sirt |
The sirt package enables the estimation of the following models:
Multidimensional marginal maximum likelihood estimation (MML) of generalized logistic Rasch type models using the generalized logistic link function (Stukel, 1988) can be conducted with rasch.mml2 and the argument itemtype="raschtype". This model also allows the estimation of the 4PL item response model (Loken & Rulison, 2010). Multiple group estimation, latent regression models and plausible value imputation are supported. In addition, pseudo-likelihood estimation for fractional item response data can be conducted.
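As a quick illustration of the 4PL model mentioned above, its item response function has a simple closed form; the base-R helper below sketches it (irf_4pl is an illustrative name, not a sirt function):

```r
# 4PL item response function (Loken & Rulison, 2010):
#   P(X=1 | theta) = c + (d - c) * plogis(a * (theta - b))
# with discrimination a, difficulty b, lower asymptote c (guessing) and
# upper asymptote d (slipping). 'irf_4pl' is an illustrative helper,
# not a sirt function.
irf_4pl <- function(theta, a=1, b=0, c=0, d=1){
    c + (d - c) * stats::plogis(a * (theta - b))
}
# probabilities stay bounded between the asymptotes c and d
irf_4pl(c(-4, 0, 4), a=1.2, b=0.5, c=.15, d=.95)
```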
Multidimensional noncompensatory, compensatory and partially compensatory item response models for dichotomous item responses (Reckase, 2009) can be estimated with the smirt function and the options irtmodel="noncomp", irtmodel="comp" and irtmodel="partcomp".
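The difference between the compensatory and noncompensatory variants can be sketched in base R (toy helpers under a multidimensional 2PL parameterization, not the smirt implementation):

```r
# Compensatory vs. noncompensatory multidimensional 2PL probabilities
# (Reckase, 2009); toy helpers, not the smirt implementation.
prob_comp <- function(theta, a, b){
    # compensatory: dimensions add up in a single logit, so a high
    # ability on one dimension can offset a low one elsewhere
    stats::plogis(sum(a * theta) - b)
}
prob_noncomp <- function(theta, a, b){
    # noncompensatory: product of per-dimension probabilities, so a
    # deficit on one dimension cannot be compensated
    prod(stats::plogis(a * theta - b))
}
theta <- c(2, -2)                          # high on dim 1, low on dim 2
prob_comp(theta, a=c(1, 1), b=0)           # abilities cancel out: 0.5
prob_noncomp(theta, a=c(1, 1), b=c(0, 0))  # low dimension dominates
```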
The unidimensional quotient model (Ramsay, 1989) can be estimated using rasch.mml2 with itemtype="ramsay.qm".

Unidimensional nonparametric item response models can be estimated employing MML estimation (Rossi, Wang & Ramsay, 2002) by making use of rasch.mml2 with itemtype="npirt". Kernel smoothing for item response function estimation (Ramsay, 1991) is implemented in np.dich.

The multidimensional IRT copula model (Braeken, 2011) can be applied for handling local dependencies; see rasch.copula3.
Unidimensional joint maximum likelihood estimation (JML) of the Rasch model is possible with the rasch.jml function. Bias correction methods for item parameters are included in rasch.jml.jackknife1 and rasch.jml.biascorr.

The multidimensional latent class Rasch and 2PL model (Bartolucci, 2007), which employs a discrete trait distribution, can be estimated with rasch.mirtlc.

The unidimensional 2PL rater facets model (Linacre, 1994) can be estimated with rm.facets. A hierarchical rater model based on signal detection theory (DeCarlo, Kim & Johnson, 2011) can be conducted with rm.sdt. A simple latent class model for two exchangeable raters is implemented in lc.2raters. See Robitzsch and Steinfeld (2018) for more details.
The discrete grade of membership model (Erosheva, Fienberg & Joutard, 2007) and the Rasch grade of membership model can be estimated by gom.em.

Some hierarchical IRT models and random item models for dichotomous and normally distributed data (van den Noortgate, de Boeck & Meulders, 2003; Fox & Verhagen, 2010) can be estimated with mcmc.2pno.ml.

Unidimensional pairwise conditional likelihood estimation (PCML; Zwinderman, 1995) is implemented in rasch.pairwise or rasch.pairwise.itemcluster.

Unidimensional pairwise marginal likelihood estimation (PMML; Renard, Molenberghs & Geys, 2004) can be conducted using rasch.pml3. In this function, local dependence can be handled by imposing a residual error structure or by omitting item pairs within a dependent item cluster from the estimation.

The function rasch.evm.pcm estimates the multiple group partial credit model based on the pairwise eigenvector approach, which avoids iterative estimation.
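The core idea of pairwise estimation can be sketched in base R (a Choppin-type toy illustration under the Rasch model; not the rasch.pairwise or rasch.evm.pcm algorithm):

```r
# Toy pairwise (Choppin-type) difficulty estimation under the Rasch model;
# not the rasch.pairwise algorithm. Assumes every ordered item pair is
# observed at least once. dat: persons x items matrix of 0/1 responses.
pairwise_difficulty <- function(dat){
    n <- crossprod(dat, 1 - dat)  # n[i,j]: persons solving i but not j
    B <- log(t(n) / n)            # B[i,j] estimates b_i - b_j
    diag(B) <- 0
    b <- rowMeans(B)              # average over all comparison items
    b - mean(b)                   # center the difficulties
}
# an item solved more often gets a lower estimated difficulty
pairwise_difficulty(rbind(c(1,0), c(1,0), c(1,1), c(0,1)))
```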
Some item response models in sirt can be estimated via Markov Chain Monte Carlo (MCMC) methods. In mcmc.2pno the two-parameter normal ogive model can be estimated. A hierarchical version of this model (Janssen, Tuerlinckx, Meulders & de Boeck, 2000) is implemented in mcmc.2pnoh. The 3PNO testlet model (Wainer, Bradlow & Wang, 2007; Glas, 2012) can be estimated with mcmc.3pno.testlet. Some hierarchical IRT models and random item models (van den Noortgate, de Boeck & Meulders, 2003) can be estimated with mcmc.2pno.ml.

For dichotomous response data, the free NOHARM software (McDonald, 1982, 1997) estimates the multidimensional compensatory 3PL model, and the function R2noharm runs NOHARM from within R. Note that NOHARM must first be downloaded from http://noharm.niagararesearch.ca/nh4cldl.html. A pure R implementation of the NOHARM model with some extensions can be found in noharm.sirt.
The measurement-theoretically founded nonparametric item response models of Scheiblechner (1995, 1999) – the ISOP and the ADISOP model – can be estimated with isop.dich or isop.poly. Item scoring within this theory can be conducted with isop.scoring.

The functional unidimensional item response model (Ip et al., 2013) can be estimated with f1d.irt.

The Rasch model can be estimated by variational approximation (Rijmen & Vomlel, 2008) using rasch.va.

The unidimensional probabilistic Guttman model (Proctor, 1970) can be specified with prob.guttman.

A jackknife method for the estimation of standard errors of the weighted likelihood trait estimate (Warm, 1989) is available in wle.rasch.jackknife.
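The jackknife scheme behind such standard error estimates can be sketched generically (illustrated with the sample mean; jackknife_se is a hypothetical helper, not sirt code):

```r
# Generic jackknife standard error: recompute the statistic with each
# observation left out, then use the variability of these leave-one-out
# values. Illustrated with the sample mean; 'jackknife_se' is a
# hypothetical helper, not sirt code.
jackknife_se <- function(x, stat=mean){
    n <- length(x)
    loo <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
    sqrt((n - 1) / n * sum((loo - mean(loo))^2))
}
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
jackknife_se(x)   # for the mean this equals sd(x)/sqrt(length(x))
```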
Model-based reliability for dichotomous data can be calculated by the method of Green and Yang (2009) with greenyang.reliability and by the marginal true score method of Dimitrov (2003) using the function marginal.truescore.reliability.
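Model-based reliability rests on an omega-type composite reliability formula from a fitted factor model. A simplified unidimensional sketch of that formula (not the greenyang.reliability code, which works with a factor model for the latent responses of dichotomous items; the loadings below are made up):

```r
# Omega-type composite reliability from a one-factor model:
#   omega = (sum lambda)^2 / ((sum lambda)^2 + sum residual variances)
# A sketch of the formula only, not the greenyang.reliability code.
omega_reliability <- function(loadings, resid_var){
    s <- sum(loadings)^2
    s / (s + sum(resid_var))
}
lam <- c(.7, .6, .8, .5)          # toy standardized loadings
omega_reliability(lam, 1 - lam^2)
```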
Essential unidimensionality can be assessed by the DETECT index (Stout, Habing, Douglas & Kim, 1996); see the function conf.detect.

Item parameters from several studies can be linked using the Haberman method (Haberman, 2009) in linking.haberman. See also equating.rasch and linking.robust.
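The idea behind Haberman linking can be sketched for item slopes: observed log slopes decompose additively into a study effect and an item effect, which are estimable by least squares. A noise-free toy illustration with made-up values (not the linking.haberman implementation):

```r
# Toy illustration of the decomposition behind Haberman linking:
# log slopes log(a_is) from study s and item i are modeled additively as
#   log(a_is) = log(A_s) + log(a_i),
# where A_s is the study-specific linking constant. With complete data
# the least squares solution reduces to column means. All values are toy.
item_eff  <- c(0.2, -0.1, 0.4, -0.5)      # true log a_i (toy values)
study_eff <- c(0, 0.3)                    # true log A_s; study 1 is reference
log_a <- outer(item_eff, study_eff, "+")  # "observed" log slopes, no noise
# study effects relative to the reference study
A_hat <- exp(colMeans(log_a) - colMeans(log_a)[1])
A_hat
```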
The alignment procedure (Asparouhov & Muthen, 2014) invariance.alignment was originally proposed for confirmatory factor analysis and aims at obtaining approximate invariance.

Some person fit statistics in the Rasch model (Meijer & Sijtsma, 2001) are included in personfit.stat.

An alternative to the linear logistic test model (LLTM), the so-called least squares distance model for cognitive diagnosis (LSDM; Dimitrov, 2007), can be estimated with the function lsdm.

Local structural equation models (LSEM) can be estimated with the lsem.estimate function (Hildebrandt et al., 2016).
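LSEM fits a model repeatedly at focal points of a continuous moderator (e.g., age), weighting observations by a kernel. The weighting step can be sketched as follows (lsem_weights is an illustrative helper, not part of sirt, and the bandwidth choice is simplified):

```r
# LSEM weighting step: observations receive kernel weights according to
# their distance from a focal point of the moderator. A sketch only;
# 'lsem_weights' is an illustrative helper, not part of sirt, and the
# bandwidth is fixed by hand rather than derived from the data.
lsem_weights <- function(moderator, focal_point, bw){
    stats::dnorm((moderator - focal_point) / bw)
}
age <- c(20, 25, 30, 35, 40)
w <- lsem_weights(age, focal_point=30, bw=5)
round(w / max(w), 3)   # observations near the focal point get weight ~1
```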
Alexander Robitzsch [aut,cre] (<https://orcid.org/0000-0002-8226-3132>)
Maintainer: Alexander Robitzsch <[email protected]>
Asparouhov, T., & Muthen, B. (2014). Multiple-group factor analysis alignment. Structural Equation Modeling, 21(4), 1-14. doi:10.1080/10705511.2014.919210
Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141-157.
Braeken, J. (2011). A boundary mixture approach to violations of conditional independence. Psychometrika, 76(1), 57-76. doi:10.1007/s11336-010-9190-4
DeCarlo, T., Kim, Y., & Johnson, M. S. (2011). A hierarchical rater model for constructed responses, with a signal detection rater model. Journal of Educational Measurement, 48(3), 333-356. doi:10.1111/j.1745-3984.2011.00143.x
Dimitrov, D. (2003). Marginal true-score measures and reliability for binary items as a function of their IRT parameters. Applied Psychological Measurement, 27, 440-458.
Dimitrov, D. M. (2007). Least squares distance method of cognitive validation and analysis for binary items using their item response theory parameters. Applied Psychological Measurement, 31, 367-387.
Erosheva, E. A., Fienberg, S. E., & Joutard, C. (2007). Describing disability through individual-level mixture models for multivariate binary data. Annals of Applied Statistics, 1, 502-537.
Fox, J.-P. (2010). Bayesian item response modeling. New York: Springer. doi:10.1007/978-1-4419-0742-4
Fox, J.-P., & Verhagen, A.-J. (2010). Random item effects modeling for cross-national survey data. In E. Davidov, P. Schmidt, & J. Billiet (Eds.), Cross-cultural Analysis: Methods and Applications (pp. 467-488), London: Routledge Academic.
Fraser, C., & McDonald, R. P. (1988). NOHARM: Least squares item factor analysis. Multivariate Behavioral Research, 23, 267-269.
Glas, C. A. W. (2012). Estimating and testing the extended testlet model. LSAC Research Report Series, RR 12-03.
Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74, 155-167.
Haberman, S. J. (2009). Linking parameter estimates derived from an item response model through separate calibrations. ETS Research Report ETS RR-09-40. Princeton, ETS. doi:10.1002/j.2333-8504.2009.tb02197.x
Hildebrandt, A., Luedtke, O., Robitzsch, A., Sommer, C., & Wilhelm, O. (2016). Exploring factor model parameters across continuous variables with local structural equation models. Multivariate Behavioral Research, 51(2-3), 257-278. doi:10.1080/00273171.2016.1142856
Ip, E. H., Molenberghs, G., Chen, S. H., Goegebeur, Y., & De Boeck, P. (2013). Functionally unidimensional item response models for multivariate binary data. Multivariate Behavioral Research, 48, 534-562.
Janssen, R., Tuerlinckx, F., Meulders, M., & de Boeck, P. (2000). A hierarchical IRT model for criterion-referenced measurement. Journal of Educational and Behavioral Statistics, 25, 285-306.
Jeon, M., & Rijmen, F. (2016). A modular approach for item response theory modeling with the R package flirt. Behavior Research Methods, 48(2), 742-755. doi:10.3758/s13428-015-0606-z
Linacre, J. M. (1994). Many-Facet Rasch Measurement. Chicago: MESA Press.
Loken, E., & Rulison, K. L. (2010). Estimation of a four-parameter item response theory model. British Journal of Mathematical and Statistical Psychology, 63, 509-525.
McDonald, R. P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement, 6(4), 379-396. doi:10.1177/014662168200600402
McDonald, R. P. (1997). Normal-ogive multidimensional model. In W. van der Linden & R. K. Hambleton (1997): Handbook of modern item response theory (pp. 257-269). New York: Springer. doi:10.1007/978-1-4757-2691-6_15
Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25, 107-135.
Proctor, C. H. (1970). A probabilistic formulation and statistical analysis for Guttman scaling. Psychometrika, 35, 73-78.
Ramsay, J. O. (1989). A comparison of three simple test theory models. Psychometrika, 54, 487-499.
Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611-630.
Reckase, M. (2009). Multidimensional item response theory. New York: Springer. doi:10.1007/978-0-387-89976-3
Renard, D., Molenberghs, G., & Geys, H. (2004). A pairwise likelihood approach to estimation in multilevel probit models. Computational Statistics & Data Analysis, 44, 649-667.
Rijmen, F., & Vomlel, J. (2008). Assessing the performance of variational methods for mixed logistic regression models. Journal of Statistical Computation and Simulation, 78, 765-779.
Robitzsch, A., & Steinfeld, J. (2018). Item response models for human ratings: Overview, estimation methods, and implementation in R. Psychological Test and Assessment Modeling, 60(1), 101-139.
Rossi, N., Wang, X., & Ramsay, J. O. (2002). Nonparametric item response function estimates with the EM algorithm. Journal of Educational and Behavioral Statistics, 27, 291-317.
Rusch, T., Mair, P., & Hatzinger, R. (2013). Psychometrics with R: A Review of CRAN Packages for Item Response Theory. http://epub.wu.ac.at/4010/1/resrepIRThandbook.pdf
Scheiblechner, H. (1995). Isotonic ordinal probabilistic models (ISOP). Psychometrika, 60(2), 281-304. doi:10.1007/BF02301417
Scheiblechner, H. (1999). Additive conjoint isotonic probabilistic models (ADISOP). Psychometrika, 64, 295-316.
Schroeders, U., Robitzsch, A., & Schipolowski, S. (2014). A comparison of different psychometric approaches to modeling testlet structures: An example with C-tests. Journal of Educational Measurement, 51(4), 400-418. doi:10.1111/jedm.12054
Stout, W., Habing, B., Douglas, J., & Kim, H. R. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20(4), 331-354. doi:10.1177/014662169602000403
Stukel, T. A. (1988). Generalized logistic models. Journal of the American Statistical Association, 83(402), 426-431. doi:10.1080/01621459.1988.10478613
Uenlue, A., & Yanagida, T. (2011). R you ready for R?: The CRAN psychometrics task view. British Journal of Mathematical and Statistical Psychology, 64(1), 182-186. doi:10.1348/000711010X519320
van den Noortgate, W., De Boeck, P., & Meulders, M. (2003). Cross-classification multilevel logistic models in psychometrics. Journal of Educational and Behavioral Statistics, 28, 369-386.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427-450.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. Cambridge: Cambridge University Press.
Zwinderman, A. H. (1995). Pairwise parameter estimation in Rasch models. Applied Psychological Measurement, 19, 369-375.
For estimating multidimensional models for polytomous item responses see the mirt, flirt (Jeon & Rijmen, 2016) and TAM packages.
For conditional maximum likelihood estimation see the eRm package.
For pairwise likelihood estimation methods (also known as composite likelihood methods) see the pln or lavaan packages.
The estimation of cognitive diagnostic models is possible using the CDM package.
For the multidimensional latent class IRT model see the MultiLCIRT package, which also allows the estimation of IRT models with polytomous item responses.
Latent class analysis can be carried out with the covLCA, poLCA, BayesLCA, randomLCA or lcmm packages.
Markov Chain Monte Carlo estimation for item response models can also be found in the MCMCpack package (see the MCMCirt functions therein).
See Rusch, Mair and Hatzinger (2013) and Uenlue and Yanagida (2011) for reviews of psychometrics packages in R.
##
## |-----------------------------------------------------------------|
## | sirt 0.40-4 (2013-11-26)                                        |
## | Supplementary Item Response Theory                              |
## | Maintainer: Alexander Robitzsch <a.robitzsch at bifie.at >      |
## | https://sites.google.com/site/alexanderrobitzsch/software       |
## |-----------------------------------------------------------------|
##
##      _/          _/
##  _/_/_/    _/  _/_/    _/_/_/_/
## _/_/      _/  _/_/        _/
##   _/_/    _/    _/        _/
## _/_/_/    _/    _/          _/_/
##
This function calculates scoring keys for a dataset with raw item responses. It starts by setting the most frequent category of each item to 1. Then, in each iteration, keys are changed such that the highest item discrimination is obtained.
automatic.recode(data, exclude=NULL, pstart.min=0.6, allocate=200, maxiter=20, progress=TRUE)
data | Dataset with raw item responses
exclude | Vector with categories to be excluded when searching for the key
pstart.min | Minimum probability for an initial solution of keys
allocate | Maximum number of categories per item; this argument is used in the underlying tabulation function
maxiter | Maximum number of iterations
progress | Logical indicating whether iteration progress should be displayed
A list with the following entries
item.stat | Data frame with item name, p value, item discrimination and the calculated key
data.scored | Scored data frame using the calculated keys
categ.stats | Data frame with statistics for all categories of all items
## Not run:
#############################################################################
# EXAMPLE 1: data.raw1
#############################################################################

data(data.raw1)

# recode data.raw1 and exclude keys 8 and 9 (missing codes) and
# start with initially setting all categories larger than 50
res1 <- sirt::automatic.recode( data.raw1, exclude=c(8,9), pstart.min=.50 )
# inspect calculated keys
res1$item.stat

#############################################################################
# EXAMPLE 2: data.timssAusTwn from TAM package
#############################################################################

miceadds::library_install("TAM")
data(data.timssAusTwn, package="TAM")
raw.resp <- data.timssAusTwn[,1:11]
res2 <- sirt::automatic.recode( data=raw.resp )

## End(Not run)
Functions for simulating and estimating the Beta item response model (Noel & Dauvier, 2007). brm.sim can be used for simulating the model; brm.irf computes the item response function. The Beta item response model is estimated as a discrete version to enable estimation in standard IRT software like the mirt or TAM packages.
# simulating the beta item response model
brm.sim(theta, delta, tau, K=NULL)

# computing the item response function of the beta item response model
brm.irf( Theta, delta, tau, ncat, thdim=1, eps=1E-10 )
theta | Ability vector of $\theta$ values
delta | Vector of item difficulty parameters $\delta_i$
tau | Vector of item dispersion parameters $\tau_i$
K | Number of discretized categories. The default is NULL
Theta | Matrix of the ability vector $\theta$
ncat | Number of categories
thdim | Theta dimension in the matrix Theta
eps | Nuisance parameter which stabilizes probabilities
The discrete version of the beta item response model is defined as follows. Assume that for item $i$ there are $K$ categories resulting in values $k=0,1,\ldots,K-1$. Each value $k$ is associated with a corresponding transformed value in $[0,1]$, namely $q_k=\frac{2k+1}{2K}$. The item response model is defined as
$$P(X_{pi}=k \mid \theta_p) \propto q_k^{m_i(\theta_p)-1} (1-q_k)^{n_i(\theta_p)-1}$$
This density is a discrete version of a Beta distribution with shape parameters $m_i(\theta_p)$ and $n_i(\theta_p)$. These parameters are defined as
$$m_i(\theta_p)=\exp\left[\frac{\theta_p-\delta_i+\tau_i}{2}\right] \quad \mbox{and} \quad n_i(\theta_p)=\exp\left[\frac{-(\theta_p-\delta_i)+\tau_i}{2}\right]$$
The item response function can also be formulated as
$$\log P(X_{pi}=k \mid \theta_p) \propto (m_i(\theta_p)-1)\log(q_k)+(n_i(\theta_p)-1)\log(1-q_k)$$
The item parameters can be reparameterized as $a_i=\exp\left[\frac{\tau_i-\delta_i}{2}\right]$ and $b_i=\exp\left[\frac{\tau_i+\delta_i}{2}\right]$. Then, the original item parameters can be retrieved by $\delta_i=\log(b_i/a_i)$ and $\tau_i=\log(a_i b_i)$. Using $\gamma_p=\exp(\theta_p/2)$, we obtain
$$\log P(X_{pi}=k \mid \theta_p) \propto a_i\gamma_p\log(q_k)+b_i\gamma_p^{-1}\log(1-q_k)-\left[\log(q_k)+\log(1-q_k)\right]$$
This formulation enables the specification of the Beta item response model as a structured latent class model (see TAM::tam.mml.3pl; Example 1).
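The formulas above translate directly into a few lines of base R; this sketch evaluates the discrete beta item response function for a single theta value (beta_irf is an illustrative helper; brm.irf additionally uses the eps nuisance parameter to stabilize the probabilities):

```r
# Discrete beta item response function following the formulas above;
# an illustrative base-R sketch for a single theta value, not the
# brm.irf implementation (which also uses the eps nuisance parameter).
beta_irf <- function(theta, delta, tau, K){
    q <- (2 * (0:(K - 1)) + 1) / (2 * K)    # midpoints q_k in (0, 1)
    m <- exp(( theta - delta + tau) / 2)    # first shape parameter
    n <- exp((-(theta - delta) + tau) / 2)  # second shape parameter
    p <- q^(m - 1) * (1 - q)^(n - 1)
    p / sum(p)                              # normalize over the K categories
}
beta_irf(theta=0.5, delta=-0.4, tau=0.6, K=5)
```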
See Smithson and Verkuilen (2006) for motivations for treating continuous indicators not as normally distributed variables.
A simulated dataset of item responses if brm.sim is applied. A matrix of item response probabilities if brm.irf is applied.
Gruen, B., Kosmidis, I., & Zeileis, A. (2012). Extended Beta regression in R: Shaken, stirred, mixed, and partitioned. Journal of Statistical Software, 48(11), 1-25. doi:10.18637/jss.v048.i11
Noel, Y., & Dauvier, B. (2007). A beta item response model for continuous bounded responses. Applied Psychological Measurement, 31(1), 47-73. doi:10.1177/0146621605287691
Smithson, M., & Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods, 11(1), 54-71. doi:10.1037/1082-989X.11.1.54
See also the betareg package for fitting Beta regression models in R (Gruen, Kosmidis & Zeileis, 2012).
#############################################################################
# EXAMPLE 1: Simulated data beta response model
#############################################################################

#*** (1) Simulation of the beta response model
# Table 3 (p. 65) of Noel and Dauvier (2007)
delta <- c( -.942, -.649, -.603, -.398, -.379, .523, .649, .781, .907 )
tau <- c( .382, .166, 1.799, .615, 2.092, 1.988, 1.899, 1.439, 1.057 )
K <- 5              # number of categories for discretization
N <- 500            # number of persons
I <- length(delta)  # number of items
set.seed(865)
theta <- stats::rnorm( N )
dat <- sirt::brm.sim( theta=theta, delta=delta, tau=tau, K=K)
psych::describe(dat)

#*** (2) some preliminaries for estimation of the model in mirt
#*** define a mirt function
library(mirt)
Theta <- matrix( seq( -4, 4, len=21), ncol=1 )

# compute item response function
ii <- 1     # item ii=1
b1 <- sirt::brm.irf( Theta=Theta, delta=delta[ii], tau=tau[ii], ncat=K )
# plot item response functions
graphics::matplot( Theta[,1], b1, type="l" )

#*** defining the beta item response function for estimation in mirt
par <- c( 0, 1, 1)
names(par) <- c( "delta", "tau", "thdim")
est <- c( TRUE, TRUE, FALSE )
names(est) <- names(par)
brm.icc <- function( par, Theta, ncat ){
    delta <- par[1]
    tau <- par[2]
    thdim <- par[3]
    probs <- sirt::brm.irf( Theta=Theta, delta=delta, tau=tau, ncat=ncat,
                   thdim=thdim)
    return(probs)
}
name <- "brm"
# create item response function
brm.itemfct <- mirt::createItem(name, par=par, est=est, P=brm.icc)

#*** define model in mirt
mirtmodel <- mirt::mirt.model(" F1=1-9 " )
itemtype <- rep("brm", I )
customItems <- list("brm"=brm.itemfct)
# define parameters to be estimated
mod1.pars <- mirt::mirt(dat, mirtmodel, itemtype=itemtype,
                   customItems=customItems, pars="values")

## Not run:
#*** (3) estimate beta item response model in mirt
mod1 <- mirt::mirt(dat, mirtmodel, itemtype=itemtype,
              customItems=customItems, pars=mod1.pars, verbose=TRUE )
# model summaries
print(mod1)
summary(mod1)
coef(mod1)
# estimated coefficients and comparison with simulated data
cbind( sirt::mirt.wrapper.coef( mod1 )$coef, delta, tau )
mirt.wrapper.itemplot(mod1, ask=TRUE)

#---------------------------
# estimate beta item response model in TAM
library(TAM)
# define the skill space: standard normal distribution
TP <- 21    # number of theta points
theta.k <- diag(TP)
theta.vec <- seq( -6, 6, len=TP)
d1 <- stats::dnorm(theta.vec)
d1 <- d1 / sum(d1)
delta.designmatrix <- matrix( log(d1), ncol=1 )
delta.fixed <- cbind( 1, 1, 1 )

# define design matrix E
E <- array(0, dim=c(I, K, TP, 2*I + 1) )
dimnames(E)[[1]] <- items <- colnames(dat)
dimnames(E)[[4]] <- c( paste0( rep( items, each=2 ),
         rep( c("_a","_b" ), I) ), "one" )
for (ii in 1:I){
    for (kk in 1:K){
        for (tt in 1:TP){
            qk <- (2*(kk-1)+1)/(2*K)
            gammap <- exp( theta.vec[tt] / 2 )
            E[ii, kk, tt, 2*(ii-1) + 1 ] <- gammap * log( qk )
            E[ii, kk, tt, 2*(ii-1) + 2 ] <- 1 / gammap * log( 1 - qk )
            E[ii, kk, tt, 2*I+1 ] <- - log(qk) - log( 1 - qk )
        }
    }
}
gammaslope.fixed <- cbind( 2*I+1, 1 )
gammaslope <- exp( rep(0, 2*I+1) )

# estimate model in TAM
mod2 <- TAM::tam.mml.3pl(resp=dat, E=E, control=list(maxiter=100),
            skillspace="discrete", delta.designmatrix=delta.designmatrix,
            delta.fixed=delta.fixed, theta.k=theta.k, gammaslope=gammaslope,
            gammaslope.fixed=gammaslope.fixed, notA=TRUE )
summary(mod2)

# extract original tau and delta parameters
m1 <- matrix( mod2$gammaslope[1:(2*I) ], ncol=2, byrow=TRUE )
m1 <- as.data.frame(m1)
colnames(m1) <- c("a","b")
m1$delta.TAM <- log( m1$b / m1$a)
m1$tau.TAM <- log( m1$a * m1$b )

# compare estimated parameters
m2 <- cbind( sirt::mirt.wrapper.coef( mod1 )$coef, delta, tau )[,-1]
colnames(m2) <- c( "delta.mirt", "tau.mirt", "thdim", "delta.true", "tau.true" )
m2 <- cbind(m1, m2)
round( m2, 3 )

## End(Not run)
The function btm estimates an extended Bradley-Terry model (Hunter, 2004; see Details). Parameter estimation uses a bias-corrected joint maximum likelihood estimation method based on $\varepsilon$-adjustment (see Bertoli-Barsotti, Lando & Punzo, 2014). See Details for the algorithm.

The function btm_sim simulates data from the extended Bradley-Terry model.
btm(data, judge=NULL, ignore.ties=FALSE, fix.eta=NULL, fix.delta=NULL,
    fix.theta=NULL, maxiter=100, conv=1e-04, eps=0.3, wgt.ties=.5)

## S3 method for class 'btm'
summary(object, file=NULL, digits=4, ...)

## S3 method for class 'btm'
predict(object, data=NULL, ...)

btm_sim(theta, eta=0, delta=-99, repeated=FALSE)
data | Data frame with three columns. The first two columns contain labels of the units in the pair comparison. The third column contains the result of the comparison: "1" means that the first unit wins, "0" means that the second unit wins, and "0.5" means a draw (a tie).
judge | Optional vector of judge identifiers (if multiple judges are available)
ignore.ties | Logical indicating whether ties should be ignored
fix.eta | Numeric value for a fixed $\eta$ parameter
fix.delta | Numeric value for a fixed $\delta$ parameter
fix.theta | Vector with entries for fixed $\theta$ values
maxiter | Maximum number of iterations
conv | Convergence criterion
eps | The $\varepsilon$ parameter of the $\varepsilon$-adjustment method (see Details)
wgt.ties | Weighting parameter for ties; see the formula in Details. The default is .5
object | Object of class btm
file | Optional file name for sinking the summary into
digits | Number of digits after decimal to print
... | Further arguments to be passed
theta | Vector of abilities $\theta$
eta | Value of the $\eta$ parameter
delta | Value of the $\delta$ parameter
repeated | Logical indicating whether repeated ratings of dyads (for the home advantage effect) should be simulated
The extended Bradley-Terry model for the comparison of individuals i and j is defined as

P(X_ij=1) ∝ exp(eta + theta_i)   (unit i wins against unit j)
P(X_ij=0) ∝ exp(theta_j)   (unit j wins against unit i)
P(X_ij=1/2) ∝ exp(delta + w_T*(eta + theta_i + theta_j))   (tie between i and j)

The parameters theta_i denote the abilities, delta is the tendency of the occurrence of ties, and eta is the home-advantage effect. The weighting parameter w_T governs the importance of ties and can be chosen in the argument wgt.ties.

A joint maximum likelihood (JML) estimation is applied for simultaneous estimation of eta, delta and all theta_i parameters. In the Rasch model, it was shown that JML can result in biased parameter estimates. The epsilon-adjustment approach has been proposed to reduce the bias in parameter estimates (Bertoli-Barsotti, Lando & Punzo, 2014). This estimation approach is adapted to the Bradley-Terry model in the btm function. To this end, the likelihood function is modified for the purpose of bias reduction. It can easily be shown that there exist sufficient statistics for eta, delta and all theta_i parameters. In the epsilon-adjustment approach, the sufficient statistic for the theta_i parameter is modified. In JML estimation of the Bradley-Terry model, the sum score S_i of person i is a sufficient statistic for theta_i. Let M_i be the maximum score for person i, that is, the number of terms appearing in S_i. In the epsilon-adjustment approach, the sufficient statistic S_i is modified to

S_i,eps = eps + ( (M_i - 2*eps) / M_i ) * S_i

and S_i,eps instead of S_i is used in JML estimation. Hence, original scores S_i are linearly transformed for all persons i.
List with the following entries
pars |
Parameter summary for eta and delta |
effects |
Parameter estimates for the theta (ability) parameters |
summary.effects |
Summary of the theta parameter estimates |
mle.rel |
MLE reliability, also known as separation reliability |
sepG |
Separation index |
probs |
Estimated probabilities |
data |
Used dataset with integer identifiers |
fit_judges |
Fit statistics (outfit and infit) for judges if judge is provided |
residuals |
Unstandardized and standardized residuals for each observation |
Bertoli-Barsotti, L., Lando, T., & Punzo, A. (2014). Estimating a Rasch Model via fuzzy empirical probability functions. In D. Vicari, A. Okada, G. Ragozini & C. Weihs (Eds.). Analysis and Modeling of Complex Data in Behavioral and Social Sciences. Springer. doi:10.1007/978-3-319-06692-9_4
Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models. Annals of Statistics, 32, 384-406. doi:10.1214/aos/1079120141
See also the R packages BradleyTerry2, psychotools, psychomix and prefmod.
#############################################################################
# EXAMPLE 1: Bradley-Terry model | data.pw01
#############################################################################

data(data.pw01)

dat <- data.pw01
dat <- dat[, c("home_team", "away_team", "result") ]

# recode results according to needed input
dat$result[ dat$result==0 ] <- 1/2   # code for ties
dat$result[ dat$result==2 ] <- 0     # code for victory of away team

#********************
# Model 1: Estimation with ties and home advantage
mod1 <- sirt::btm( dat)
summary(mod1)

## Not run:
#*** Model 2: Estimation with ties, no epsilon adjustment
mod2 <- sirt::btm( dat, eps=0)
summary(mod2)

#*** Model 3: Estimation with ties, no epsilon adjustment, weight for ties of .333
# which corresponds to the rule of 3 points for a victory and 1 point for a draw
# in football
mod3 <- sirt::btm( dat, eps=0, wgt.ties=1/3)
summary(mod3)

#*** Model 4: Some fixed abilities
fix.theta <- c("Anhalt Dessau"=-1 )
mod4 <- sirt::btm( dat, eps=0, fix.theta=fix.theta)
summary(mod4)

#*** Model 5: Ignoring ties, no home advantage effect
mod5 <- sirt::btm( dat, ignore.ties=TRUE, fix.eta=0)
summary(mod5)

#*** Model 6: Ignoring ties, no home advantage effect (JML approach and eps=0)
mod6 <- sirt::btm( dat, ignore.ties=TRUE, fix.eta=0, eps=0)
summary(mod6)

#############################################################################
# EXAMPLE 2: Venice chess data
#############################################################################

# See http://www.rasch.org/rmt/rmt113o.htm
# Linacre, J. M. (1997). Paired Comparisons with Standard Rasch Software.
# Rasch Measurement Transactions, 11:3, 584-585.

# dataset with chess games -> "D" denotes a draw (tie)
chessdata <- scan( what="character")
    1D.0..1...1....1.....1......D.......D........1.........1.......... Browne
    0.1.D..0...1....1.....1......D.......1........D.........1......... Mariotti
    .D0..0..1...D....D.....1......1.......1........1.........D........ Tatai
    ...1D1...D...D....1.....D......D.......D........1.........0....... Hort
    ......010D....D....D.....1......D.......1........1.........D...... Kavalek
    ..........00DDD.....D.....D......D.......1........D.........1..... Damjanovic
    ...............00D0DD......D......1.......1........1.........0.... Gligoric
    .....................000D0DD.......D.......1........D.........1... Radulov
    ............................DD0DDD0D........0........0.........1.. Bobotsov
    ....................................D00D00001.........1.........1. Cosulich
    .............................................0D000D0D10..........1 Westerinen
    .......................................................00D1D010000 Zichichi

L <- length(chessdata) / 2
games <- matrix( chessdata, nrow=L, ncol=2, byrow=TRUE )
G <- nchar(games[1,1])

# create matrix with results
results <- matrix( NA, nrow=G, ncol=3 )
for (gg in 1:G){
    games.gg <- substring( games[,1], gg, gg )
    ind.gg <- which( games.gg !="." )
    results[gg, 1:2 ] <- games[ ind.gg, 2]
    results[gg, 3 ] <- games.gg[ ind.gg[1] ]
}
results <- as.data.frame(results)
results[,3] <- paste(results[,3] )
results[ results[,3]=="D", 3] <- 1/2
results[,3] <- as.numeric( results[,3] )

# fit model ignoring draws
mod1 <- sirt::btm( results, ignore.ties=TRUE, fix.eta=0, eps=0 )
summary(mod1)

# fit model with draws
mod2 <- sirt::btm( results, fix.eta=0, eps=0 )
summary(mod2)

#############################################################################
# EXAMPLE 3: Simulated data from the Bradley-Terry model
#############################################################################

set.seed(9098)
N <- 22
theta <- seq(2,-2, len=N)

#** simulate and estimate data without repeated dyads
dat1 <- sirt::btm_sim(theta=theta)
mod1 <- sirt::btm( dat1, ignore.ties=TRUE, fix.delta=-99, fix.eta=0)
summary(mod1)

#*** simulate data with home advantage effect and ties
dat2 <- sirt::btm_sim(theta=theta, eta=.8, delta=-.6, repeated=TRUE)
mod2 <- sirt::btm(dat2)
summary(mod2)

#############################################################################
# EXAMPLE 4: Estimating the Bradley-Terry model with multiple judges
#############################################################################

#*** simulating data with multiple judges
set.seed(987)
N <- 26   # number of objects to be rated
theta <- seq(2,-2, len=N)
s1 <- stats::sd(theta)
dat <- NULL
# judge discriminations which define tendency to provide reliable ratings
discrim <- c( rep(.9,10), rep(.5,2), rep(0,2) )
#=> last four raters provide less reliable ratings
RR <- length(discrim)
for (rr in 1:RR){
    theta1 <- discrim[rr]*theta + stats::rnorm(N, mean=0, sd=s1*sqrt(1-discrim[rr]))
    dat1 <- sirt::btm_sim(theta1)
    dat1$judge <- rr
    dat <- rbind(dat, dat1)
}

#** estimate the Bradley-Terry model and compute judge-specific fit statistics
mod <- sirt::btm( dat[,1:3], judge=paste0("J",100+dat[,4]), fix.eta=0,
           ignore.ties=TRUE)
summary(mod)

## End(Not run)
The function categorize
defines categories for variables in
a data frame, starting with a user-defined index (e.g. 0 or 1).
Continuous variables can be categorized by discretizing them into quantile groups.
The function decategorize
does the reverse operation.
categorize(dat, categorical=NULL, quant=NULL, lowest=0)

decategorize(dat, categ_design=NULL)
dat |
Data frame |
categorical |
Vector with variable names which should be converted into categories,
beginning with integer |
quant |
Vector with the number of classes for each variable. Variables are categorized according to quantiles. The vector must have names containing the variable names. |
lowest |
Lowest category index. Default is 0. |
categ_design |
Data frame containing information about the
categorization which is the output of |
For categorize
, it is a list with entries
data |
Converted data frame |
categ_design |
Data frame containing some information about the categorization |
For decategorize
it is a data frame.
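The quantile-based discretization performed for continuous variables can be sketched as follows (a toy Python re-implementation for illustration only; quantile_categorize is a hypothetical helper, not part of sirt):

```python
def quantile_categorize(x, n_classes, lowest=0):
    """Assign each value of x to one of n_classes quantile groups,
    with category indices starting at `lowest` (as in the lowest= argument)."""
    xs = sorted(x)
    n = len(xs)
    # empirical cut points at the 1/n_classes, 2/n_classes, ... quantiles
    cuts = [xs[min(n - 1, int(n * k / n_classes))] for k in range(1, n_classes)]
    out = []
    for v in x:
        cat = sum(v >= c for c in cuts)   # number of cut points not exceeding v
        out.append(lowest + cat)
    return out

x = [1.2, 3.4, 2.2, 5.0, 4.1, 0.3, 2.9, 3.8]
print(quantile_categorize(x, n_classes=4))   # four roughly equal-sized groups
```

decategorize then reverses the mapping using the stored cut points (the categ_design output), e.g. by replacing each category with a representative value of its interval.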
## Not run:
library(mice)
library(miceadds)

#############################################################################
# EXAMPLE 1: Categorize questionnaire data
#############################################################################

data(data.smallscale, package="miceadds")
dat <- data.smallscale

# (0) select dataset
dat <- dat[, 9:20 ]
summary(dat)
categorical <- colnames(dat)[2:6]

# (1) categorize data
res <- sirt::categorize( dat, categorical=categorical )

# (2) multiple imputation using the mice package
dat2 <- res$data
VV <- ncol(dat2)
impMethod <- rep( "sample", VV )   # define random sampling imputation method
names(impMethod) <- colnames(dat2)
imp <- mice::mice( as.matrix(dat2), method=impMethod, maxit=1, m=1 )
dat3 <- mice::complete(imp, action=1)

# (3) decategorize dataset
dat3a <- sirt::decategorize( dat3, categ_design=res$categ_design )

#############################################################################
# EXAMPLE 2: Categorize ordinal and continuous data
#############################################################################

data(data.ma01, package="miceadds")
dat <- data.ma01
summary(dat[,-c(1:2)] )

# define variables to be categorized
categorical <- c("books", "paredu" )
# define quantiles
quant <- c(6,5,11)
names(quant) <- c("math", "read", "hisei")

# categorize data
res <- sirt::categorize( dat, categorical=categorical, quant=quant)
str(res)

## End(Not run)
This function estimates conditional covariances of item pairs
(Stout, Habing, Douglas & Kim, 1996; Zhang & Stout,
1999). The function is used for the estimation of the DETECT index.
The ccov.np
function has the (default) option to smooth item response
functions (argument smooth
) in the computation of conditional covariances
(Douglas, Kim, Habing, & Gao, 1998).
ccov.np(data, score, bwscale=1.1, thetagrid=seq(-3, 3, len=200),
    progress=TRUE, scale_score=TRUE, adjust_thetagrid=TRUE, smooth=TRUE,
    use_sum_score=FALSE, bias_corr=TRUE)
data |
An N x I data frame of item responses |
score |
An ability estimate, e.g. the WLE |
bwscale |
Bandwidth factor for calculation of conditional covariance. The bandwidth
used in the estimation is bwscale * N^(-1/5). |
thetagrid |
A vector which contains theta values where conditional covariances are evaluated. |
progress |
Display progress? |
scale_score |
Logical indicating whether score should be standardized |
adjust_thetagrid |
Logical indicating whether thetagrid should be adjusted if values of score fall outside its range |
smooth |
Logical indicating whether smoothing should be applied for conditional covariance estimation |
use_sum_score |
Logical indicating whether sum score should be used. With this option, the bias corrected conditional covariance of Zhang and Stout (1999) is used. |
bias_corr |
Logical indicating whether bias correction (Zhang & Stout, 1999)
should be utilized if use_sum_score=TRUE |
This function is used in conf.detect
and expl.detect
.
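Conceptually, a conditional covariance of an item pair is the covariance of the two item responses among examinees with (approximately) equal score, averaged over the score distribution. A minimal unsmoothed sketch (Python for illustration; the actual estimation uses kernel smoothing over thetagrid with a bandwidth governed by bwscale):

```python
def conditional_covariance(x1, x2, score):
    """Average within-score-group covariance of item responses x1 and x2."""
    groups = {}
    for a, b, s in zip(x1, x2, score):
        groups.setdefault(s, []).append((a, b))
    total, n = 0.0, 0
    for pairs in groups.values():
        m = len(pairs)
        if m < 2:
            continue   # covariance undefined for a single examinee
        mean1 = sum(p[0] for p in pairs) / m
        mean2 = sum(p[1] for p in pairs) / m
        cov = sum((p[0] - mean1) * (p[1] - mean2) for p in pairs) / (m - 1)
        total += m * cov   # weight each score group by its size
        n += m
    return total / n if n else 0.0
```

Item pairs that share a nuisance dimension beyond the conditioning score show positive conditional covariance; under a unidimensional model the expected value is slightly negative.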
Douglas, J., Kim, H. R., Habing, B., & Gao, F. (1998). Investigating local dependence with conditional covariance functions. Journal of Educational and Behavioral Statistics, 23(2), 129-151. doi:10.3102/10769986023002129
Stout, W., Habing, B., Douglas, J., & Kim, H. R. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20(4), 331-354. doi:10.1177/014662169602000403
Zhang, J., & Stout, W. (1999). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64(2), 129-152. doi:10.1007/BF02294532
## Not run:
#############################################################################
# EXAMPLE 1: data.read | different settings for computing conditional covariance
#############################################################################

data(data.read, package="sirt")
dat <- data.read

#* fit Rasch model
mod <- sirt::rasch.mml2(dat)
score <- sirt::wle.rasch(dat=dat, b=mod$item$b)$theta

#* ccov with smoothing
cmod1 <- sirt::ccov.np(data=dat, score=score, bwscale=1.1)
#* ccov without smoothing
cmod2 <- sirt::ccov.np(data=dat, score=score, smooth=FALSE)

#- compare results
100*cbind( cmod1$ccov.table[1:6, "ccov"], cmod2$ccov.table[1:6, "ccov"])

## End(Not run)
Estimates a unidimensional factor model based on the normal distribution fitting
function under full and partial measurement invariance.
Item loadings and item intercepts are successively freed based on the largest
modification index and a chosen significance level alpha
.
cfa_meas_inv(dat, group, weights=NULL, alpha=0.01, verbose=FALSE, op=c("~1","=~"))
dat |
Data frame containing items |
group |
Vector of group identifiers |
weights |
Optional vector of sampling weights |
alpha |
Significance level |
verbose |
Logical indicating whether progress should be shown |
op |
Operators (intercepts or loadings) for which estimation should be freed |
List with several entries
pars_mi |
Model parameters under full invariance |
pars_pi |
Model parameters under partial invariance |
mod_mi |
Fitted model under full invariance |
mod_pi |
Fitted model under partial invariance |
... |
More output |
See also sirt::invariance.alignment
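The stepwise freeing of parameters can be sketched as a greedy loop (Python for illustration; stepwise_free and the mod_index_fun interface are hypothetical stand-ins for refitting the model and reading modification indices):

```python
def stepwise_free(mod_index_fun, crit):
    """Greedy partial-invariance search: starting from the fully invariant
    model, repeatedly free the parameter with the largest modification index
    until none exceeds `crit` (the chi-square critical value for the chosen
    alpha, about 6.63 for alpha=0.01 with 1 df). `mod_index_fun(freed)`
    returns {parameter: MI} for not-yet-freed parameters and stands in for
    refitting the model after each step."""
    freed = []
    while True:
        mis = mod_index_fun(freed)
        if not mis:
            break
        param, mi = max(mis.items(), key=lambda kv: kv[1])
        if mi <= crit:
            break   # no remaining parameter is significantly noninvariant
        freed.append(param)
    return freed

# toy stand-in: a fixed MI table; freeing a parameter removes it from the table
table = {"nu_g2_i1": 25.0, "lambda_g3_i4": 12.0, "nu_g1_i5": 3.1}
fake_mi = lambda freed: {k: v for k, v in table.items() if k not in freed}
print(stepwise_free(fake_mi, crit=6.63))
```

In the real function, the freed intercepts and loadings are those reported as group-specific in pars_pi.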
## Not run:
#############################################################################
# EXAMPLE 1: Factor model under full and partial invariance
#############################################################################

#--- data simulation
set.seed(65)
G <- 3   # number of groups
I <- 5   # number of items
# define lambda and nu parameters
lambda <- matrix(1, nrow=G, ncol=I)
nu <- matrix(0, nrow=G, ncol=I)
err_var <- matrix(1, nrow=G, ncol=I)
# define size of noninvariance
dif <- 1
#- 1st group: N(0,1)
lambda[1,3] <- 1+dif*.4; nu[1,5] <- dif*.5
#- 2nd group: N(0.3,1.5)
gg <- 2
lambda[gg,5] <- 1-.5*dif; nu[gg,1] <- -.5*dif
#- 3rd group: N(.8,1.2)
gg <- 3
lambda[gg,4] <- 1-.7*dif; nu[gg,2] <- -.5*dif
#- define distributions of groups
mu <- c(0,.3,.8)
sigma <- sqrt(c(1,1.5,1.2))
N <- rep(1000,3)   # sample sizes per group

#* use simulation function
dat <- sirt::invariance_alignment_simulate(nu, lambda, err_var, mu, sigma, N,
            exact=TRUE)

#--- estimate CFA
mod <- sirt::cfa_meas_inv(dat=dat[,-1], group=dat$group, verbose=TRUE, alpha=0.05)
mod$pars_mi
mod$pars_pi

## End(Not run)
This function computes the classification accuracy in the Rasch model for the maximum likelihood (person parameter) estimate according to the method of Rudner (2001).
class.accuracy.rasch(cutscores, b, meantheta, sdtheta, theta.l, n.sims=0)
cutscores |
Vector of cut scores |
b |
Vector of item difficulties |
meantheta |
Mean of the trait distribution |
sdtheta |
Standard deviation of the trait distribution |
theta.l |
Discretized theta distribution |
n.sims |
Number of simulated persons in a data set. The default is 0 which means that no simulation is performed. |
A list with the following entries:
class.stats |
Data frame containing classification accuracy statistics. The columns agree0 and agree1 refer to exact classification agreement and agreement within at most one category, respectively |
class.prob |
Probability table of classification |
Rudner, L.M. (2001). Computing the expected proportions of misclassified examinees. Practical Assessment, Research & Evaluation, 7(14).
Classification accuracy of other IRT models can be obtained with the R package cacIRT.
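The Rudner (2001) method treats the person parameter estimate as normally distributed around the true theta with a standard error derived from the test information. A simplified sketch (Python for illustration, with a constant standard error instead of the theta-dependent one used by class.accuracy.rasch):

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rudner_accuracy(theta_grid, weights, cutscores, se):
    """Expected classification accuracy: for each true theta (weighted by the
    discretized trait distribution) integrate the normal measurement-error
    distribution of the theta estimate over the true class interval."""
    bounds = [-math.inf] + list(cutscores) + [math.inf]
    acc = 0.0
    for theta, w in zip(theta_grid, weights):
        k = sum(theta >= c for c in cutscores)        # true class of theta
        lo, hi = bounds[k], bounds[k + 1]
        p_correct = normal_cdf((hi - theta) / se) - normal_cdf((lo - theta) / se)
        acc += w * p_correct
    return acc

# toy trait grid with equal weights, two cut scores, constant SE of 0.4
grid = [-2, -1.5, -0.5, 0, 0.5, 1.5, 2]
w = [1 / len(grid)] * len(grid)
print(rudner_accuracy(grid, w, cutscores=(-1.0, 0.3), se=0.4))
```

Accuracy drops when cut scores lie in dense regions of the trait distribution or when the standard error is large relative to the class widths.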
#############################################################################
# EXAMPLE 1: Reading dataset
#############################################################################

data( data.read, package="sirt")
dat <- data.read

# estimate the Rasch model
mod <- sirt::rasch.mml2( dat )

# estimate classification accuracy (3 levels)
cutscores <- c( -1, .3 )    # cut scores at theta=-1 and theta=.3
sirt::class.accuracy.rasch( cutscores=cutscores, b=mod$item$b,
        meantheta=0, sdtheta=mod$sd.trait, theta.l=seq(-4,4,len=200),
        n.sims=3000)
  ## Cut Scores
  ## [1] -1.0  0.3
  ##
  ## WLE reliability (by simulation)=0.671
  ## WLE consistency (correlation between two parallel forms)=0.649
  ##
  ## Classification accuracy and consistency
  ##            agree0 agree1 kappa consistency
  ## analytical   0.68  0.990 0.492          NA
  ## simulated    0.70  0.997 0.489       0.599
  ##
  ## Probability classification table
  ##              Est_Class1 Est_Class2 Est_Class3
  ## True_Class1      0.136      0.041      0.001
  ## True_Class2      0.081      0.249      0.093
  ## True_Class3      0.009      0.095      0.294
This function computes the DETECT statistics for dichotomous item responses and the polyDETECT statistic for polytomous item responses under a confirmatory specification of item clusters (Stout, Habing, Douglas & Kim, 1996; Zhang & Stout, 1999a, 1999b; Zhang, 2007; Bonifay, Reise, Scheines, & Meijer, 2015).
Item responses in a multi-matrix design are allowed (Zhang, 2013).
An exploratory DETECT analysis can be conducted using the
expl.detect
function.
conf.detect(data, score, itemcluster, bwscale=1.1, progress=TRUE,
    thetagrid=seq(-3, 3, len=200), smooth=TRUE, use_sum_score=FALSE,
    bias_corr=TRUE)

## S3 method for class 'conf.detect'
summary(object, digits=3, file=NULL, ...)
data |
An N x I data frame of item responses |
score |
An ability estimate, e.g. the WLE, sum score or mean score |
itemcluster |
Item cluster for each item. The order of entries must correspond
to the columns in data |
bwscale |
Bandwidth factor for calculation of conditional covariance
(see ccov.np) |
progress |
Display progress? |
smooth |
Logical indicating whether smoothing should be applied for conditional covariance estimation |
thetagrid |
A vector which contains theta values where conditional covariances are evaluated. |
use_sum_score |
Logical indicating whether sum score should be used. With this option, the bias corrected conditional covariance of Zhang and Stout (1999) is used. |
bias_corr |
Logical indicating whether bias correction (Zhang & Stout, 1999)
should be utilized if use_sum_score=TRUE |
object |
Object of class conf.detect |
digits |
Number of digits for rounding in the summary output |
file |
Optional file name to which the summary output is written |
... |
Further arguments to be passed |
The results of DETECT are the indices DETECT, ASSI and RATIO (see Zhang, 2007, for details), calculated for the options unweighted and weighted. The option unweighted means that all conditional covariances of item pairs are equally weighted; weighted means that these covariances are weighted by the sample size of item pairs. In case of multi-matrix item designs, both types of indices can differ.
The classification scheme for these indices is as follows (Jang & Roussos, 2007; Zhang, 2007):
Strong multidimensionality | DETECT > 1.00 |
Moderate multidimensionality | .40 < DETECT < 1.00 |
Weak multidimensionality | .20 < DETECT < .40 |
Essential unidimensionality | DETECT < .20 |
Maximum value under simple structure | ASSI=1 | RATIO=1 |
Essential deviation from unidimensionality | ASSI > .25 | RATIO > .36 |
Essential unidimensionality | ASSI < .25 | RATIO < .36 |
Note that the expected value of a conditional covariance for an item pair is negative when a unidimensional model holds. In consequence, the DETECT index can become negative for unidimensional data (see Example 3). This can also be seen in the statistic MCOV100 in the value detect.
A list with following entries:
detect |
Data frame with statistics DETECT, ASSI, RATIO, MADCOV100 and MCOV100 |
ccovtable |
Individual contributions to conditional covariance |
ccov.matrix |
Evaluated conditional covariance |
Bonifay, W. E., Reise, S. P., Scheines, R., & Meijer, R. R. (2015). When are multidimensional data unidimensional enough for structural equation modeling? An evaluation of the DETECT multidimensionality index. Structural Equation Modeling, 22(4), 504-516. doi:10.1080/10705511.2014.938596
Jang, E. E., & Roussos, L. (2007). An investigation into the dimensionality of TOEFL using conditional covariance-based nonparametric approach. Journal of Educational Measurement, 44(1), 1-21. doi:10.1111/j.1745-3984.2007.00024.x
Stout, W., Habing, B., Douglas, J., & Kim, H. R. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20(4), 331-354. doi:10.1177/014662169602000403
Zhang, J. (2007). Conditional covariance theory and DETECT for polytomous items. Psychometrika, 72(1), 69-91. doi:10.1007/s11336-004-1257-7
Zhang, J. (2013). A procedure for dimensionality analyses of response data from various test designs. Psychometrika, 78(1), 37-58. doi:10.1007/s11336-012-9287-z
Zhang, J., & Stout, W. (1999a). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64(2), 129-152. doi:10.1007/BF02294532
Zhang, J., & Stout, W. (1999b). The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika, 64(2), 213-249. doi:10.1007/BF02294536
For a download of the free DIM-Pack software (DIMTEST, DETECT) see https://psychometrics.onlinehelp.measuredprogress.org/tools/dim/.
See expl.detect
for exploratory DETECT analysis.
#############################################################################
# EXAMPLE 1: TIMSS mathematics data set (dichotomous data)
#############################################################################

data(data.timss)

# extract data
dat <- data.timss$data
dat <- dat[, substring( colnames(dat),1,1)=="M" ]
# extract item information
iteminfo <- data.timss$item

# estimate Rasch model
mod1 <- sirt::rasch.mml2( dat )
# estimate WLEs
wle1 <- sirt::wle.rasch( dat, b=mod1$item$b )$theta

# DETECT for content domains
detect1 <- sirt::conf.detect( data=dat, score=wle1,
                 itemcluster=iteminfo$Content.Domain )
  ##        unweighted weighted
  ## DETECT      0.316    0.316
  ## ASSI        0.273    0.273
  ## RATIO       0.355    0.355

## Not run:
# DETECT for cognitive domains
detect2 <- sirt::conf.detect( data=dat, score=wle1,
                 itemcluster=iteminfo$Cognitive.Domain )
  ##        unweighted weighted
  ## DETECT      0.251    0.251
  ## ASSI        0.227    0.227
  ## RATIO       0.282    0.282

# DETECT for item format
detect3 <- sirt::conf.detect( data=dat, score=wle1,
                 itemcluster=iteminfo$Format )
  ##        unweighted weighted
  ## DETECT      0.056    0.056
  ## ASSI        0.060    0.060
  ## RATIO       0.062    0.062

# DETECT for item blocks
detect4 <- sirt::conf.detect( data=dat, score=wle1,
                 itemcluster=iteminfo$Block )
  ##        unweighted weighted
  ## DETECT      0.301    0.301
  ## ASSI        0.193    0.193
  ## RATIO       0.339    0.339
## End(Not run)

# Exploratory DETECT: application of a cluster analysis employing the Ward method
detect5 <- sirt::expl.detect( data=dat, score=wle1, nclusters=10, N.est=nrow(dat) )
# Plot cluster solution
pl <- graphics::plot( detect5$clusterfit, main="Cluster solution" )
stats::rect.hclust(detect5$clusterfit, k=4, border="red")

## Not run:
#############################################################################
# EXAMPLE 2: Big 5 data set (polytomous data)
#############################################################################

# attach Big5 dataset
data(data.big5)
# select 6 items of each dimension
dat <- data.big5
dat <- dat[, 1:30]

# estimate person score by simply using a transformed sum score
score <- stats::qnorm( ( rowMeans( dat )+.5 ) / ( 30 + 1 ) )
# extract item cluster (Big 5 dimensions)
itemcluster <- substring( colnames(dat), 1, 1 )

# DETECT for item cluster
detect1 <- sirt::conf.detect( data=dat, score=score, itemcluster=itemcluster )
  ##        unweighted weighted
  ## DETECT      1.256    1.256
  ## ASSI        0.384    0.384
  ## RATIO       0.597    0.597

# Exploratory DETECT
detect5 <- sirt::expl.detect( data=dat, score=score, nclusters=9, N.est=nrow(dat) )
  ## DETECT (unweighted)
  ## Optimal Cluster Size is 6 (Maximum of DETECT Index)
  ##   N.Cluster N.items N.est N.val      size.cluster DETECT.est ASSI.est RATIO.est
  ## 1         2      30   500     0              6-24      1.073    0.246     0.510
  ## 2         3      30   500     0           6-10-14      1.578    0.457     0.750
  ## 3         4      30   500     0         6-10-11-3      1.532    0.444     0.729
  ## 4         5      30   500     0        6-8-11-2-3      1.591    0.462     0.757
  ## 5         6      30   500     0       6-8-6-2-5-3      1.610    0.499     0.766
  ## 6         7      30   500     0     6-3-6-2-5-5-3      1.557    0.476     0.740
  ## 7         8      30   500     0   6-3-3-2-3-5-5-3      1.540    0.462     0.732
  ## 8         9      30   500     0 6-3-3-2-3-5-3-3-2      1.522    0.444     0.724

# Plot cluster solution
pl <- graphics::plot( detect5$clusterfit, main="Cluster solution" )
stats::rect.hclust(detect5$clusterfit, k=6, border="red")

#############################################################################
# EXAMPLE 3: DETECT index for unidimensional data
#############################################################################

set.seed(976)
N <- 1000
I <- 20
b <- sample( seq( -2, 2, len=I) )
dat <- sirt::sim.raschtype( stats::rnorm(N), b=b )

# estimate Rasch model and corresponding WLEs
mod1 <- TAM::tam.mml( dat )
wmod1 <- TAM::tam.wle(mod1)$theta

# define item cluster
itemcluster <- c( rep(1,5), rep(2,I-5) )
# compute DETECT statistic
detect1 <- sirt::conf.detect( data=dat, score=wmod1, itemcluster=itemcluster)
  ##           unweighted weighted
  ## DETECT        -0.184   -0.184
  ## ASSI          -0.147   -0.147
  ## RATIO         -0.226   -0.226
  ## MADCOV100      0.816    0.816
  ## MCOV100       -0.786   -0.786
## End(Not run)
List of item parameters for cultural activities of Austrian students, provided separately for 9 Austrian countries.
data(data.activity.itempars)
The format is a list with the number of students per group (N), item loadings (lambda), and item intercepts (nu):
List of 3
$ N : 'table' int [1:9(1d)] 2580 5279 15131 14692 5525 11005 7080 ...
..- attr(*, "dimnames")=List of 1
.. ..$ : chr [1:9] "1" "2" "3" "4" ...
$ lambda: num [1:9, 1:5] 0.423 0.485 0.455 0.437 0.502 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:9] "country1" "country2" "country3" "country4" ...
.. ..$ : chr [1:5] "act1" "act2" "act3" "act4" ...
$ nu : num [1:9, 1:5] 1.65 1.53 1.7 1.59 1.7 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:9] "country1" "country2" "country3" "country4" ...
.. ..$ : chr [1:5] "act1" "act2" "act3" "act4" ...
The synthetic dataset is based on the standardization sample of the Berlin Test of Fluid and Crystallized Intelligence (BEFKI, Wilhelm, Schroeders, & Schipolowski, 2014). The underlying sample consists of N=11,756 students from all German federal states (except for the smallest one) and all school types of the general educational system attending Grades 5 to 12. A detailed description of the study, the sample, and the measure is given in Schroeders, Schipolowski, and Wilhelm (2015).
data(data.befki) data(data.befki_resp)
The dataset data.befki contains 11756 students, nested within 581 classes.
'data.frame': 11756 obs. of 12 variables:
$ idclass: int 1276 1276 1276 1276 1276 1276 1276 1276 1276 1276 ...
$ idstud : int 127601 127602 127603 127604 127605 127606 127607 127608 127609 127610 ...
$ grade : int 5 5 5 5 5 5 5 5 5 5 ...
$ gym : int 0 0 0 0 0 0 0 0 0 0 ...
$ female : int 0 1 0 0 0 0 1 0 0 0 ...
$ age : num 12.2 11.8 11.5 10.8 10.9 ...
$ sci : num -3.14 -3.44 -2.62 -2.16 -1.01 -1.91 -1.01 -4.13 -2.16 -3.44 ...
$ hum : num -1.71 -1.29 -2.29 -2.48 -0.65 -0.92 -1.71 -2.31 -1.99 -2.48 ...
$ soc : num -2.87 -3.35 -3.81 -2.35 -1.32 -1.11 -1.68 -2.96 -2.69 -3.35 ...
$ gfv : num -2.25 -2.19 -2.25 -1.17 -2.19 -3.05 -1.7 -2.19 -3.05 -1.7 ...
$ gfn : num -2.2 -1.85 -1.85 -1.85 -1.85 -0.27 -1.37 -2.58 -1.85 -3.13 ...
$ gff : num -0.91 -0.43 -1.17 -1.45 -0.61 -1.78 -1.17 -1.78 -1.78 -3.87 ...
The dataset data.befki_resp contains response indicators for the observed data points in the dataset data.befki.
num [1:11756, 1:12] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:12] "idclass" "idstud" "grade" "gym" ...
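A response indicator matrix of this 0/1 form can be derived from any data frame with missing values. The following base R one-liner is a generic sketch of the idea (not the code that generated data.befki_resp):

```r
# build a response indicator matrix: 1=observed, 0=missing (generic sketch)
dat <- data.frame(x=c(1, NA, 3), y=c(NA, 2, 2))
resp <- 1 * !is.na(dat)
resp
```

Here resp is a numeric matrix with the same dimensions and column names as dat, analogous to how data.befki_resp mirrors data.befki.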
The procedure for generating this dataset is based on a factorization of the joint distribution. All variables are simulated from unidimensional conditional parametric regression models including several interaction and quadratic terms. The multilevel structure is approximated by including cluster means as predictors in the regression models.
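The generating procedure described above can be sketched in base R. This is a toy illustration with invented variables and coefficients, not the actual BEFKI synthesis: variables are drawn sequentially from conditional regression models, with cluster means entering as predictors to approximate the multilevel structure and a quadratic term standing in for the nonlinear effects.

```r
# toy sketch of sequential conditional simulation (assumed coefficients)
set.seed(42)
nclass <- 50; npercl <- 10
idclass <- rep(seq_len(nclass), each=npercl)
# first variable: class-level component plus individual component
x1 <- rnorm(nclass)[idclass] + rnorm(nclass*npercl)
# cluster means enter the regression for the next variable,
# approximating the multilevel structure; x1^2 mimics a quadratic term
x1.bar <- ave(x1, idclass)
x2 <- 0.5*x1 + 0.3*x1.bar + 0.1*x1^2 + rnorm(nclass*npercl)
dat <- data.frame(idclass=idclass, x1=x1, x2=x2)
```

Each further variable would be simulated the same way, conditioning on all previously generated variables, which corresponds to the factorization of the joint distribution mentioned above.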
Synthetic dataset
Schroeders, U., Schipolowski, S., & Wilhelm, O. (2015). Age-related changes in the mean and covariance structure of fluid and crystallized intelligence in childhood and adolescence. Intelligence, 48, 15-29. doi:10.1016/j.intell.2014.10.006
Wilhelm, O., Schroeders, U., & Schipolowski, S. (2014). Berliner Test zur Erfassung fluider und kristalliner Intelligenz fuer die 8. bis 10. Jahrgangsstufe [Berlin test of fluid and crystallized intelligence for grades 8-10]. Goettingen: Hogrefe.
This is a Big 5 dataset from the qgraph package (Dolan, Oort, Stoel, & Wicherts, 2009). It contains 500 subjects on 240 items.
data(data.big5) data(data.big5.qgraph)
The format of data.big5 is:
num [1:500, 1:240] 1 0 0 0 0 1 1 2 0 1 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:240] "N1" "E2" "O3" "A4" ...
The format of data.big5.qgraph is:
num [1:500, 1:240] 2 3 4 4 5 2 2 1 4 2 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:240] "N1" "E2" "O3" "A4" ...
In these datasets, there are 48 items for each dimension. The Big 5 dimensions are Neuroticism (N), Extraversion (E), Openness (O), Agreeableness (A), and Conscientiousness (C). Note that data.big5 differs from data.big5.qgraph in that the original items were recoded into three categories 0, 1 and 2.
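The recoding into three categories can be reproduced for any Likert-type item with base R. The cut points below are assumptions for illustration; the exact recoding used for data.big5 is not documented here.

```r
# recode a 1-5 Likert item into three categories 0, 1, 2 (assumed cut points)
x <- c(1, 2, 3, 4, 5, 3, 2)
x3 <- cut(x, breaks=c(0, 2, 3, 5), labels=FALSE) - 1
x3
```

With these breaks, responses 1-2 map to 0, response 3 maps to 1, and responses 4-5 map to 2.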
See big5 in the qgraph package.
Dolan, C. V., Oort, F. J., Stoel, R. D., & Wicherts, J. M. (2009). Testing measurement invariance in the target rotated multigroup exploratory factor model. Structural Equation Modeling, 16, 295-314.
## Not run:
# list of packages needed for the following examples
packages <- scan(what="character")
     sirt TAM eRm CDM mirt ltm mokken psychotools psychomix psych

# load packages; install them first if necessary
miceadds::library_install(packages)

#############################################################################
# EXAMPLE 1: Unidimensional models for the openness scale
#############################################################################

data(data.big5)
# extract first 10 openness items
items <- which( substring( colnames(data.big5), 1, 1 )=="O" )[1:10]
dat <- data.big5[, items ]
I <- ncol(dat)
summary(dat)
  ## > colnames(dat)
  ##  [1] "O3"  "O8"  "O13" "O18" "O23" "O28" "O33" "O38" "O43" "O48"
# descriptive statistics
psych::describe(dat)

#****************
# Model 1: Partial credit model
#****************

#-- M1a: rm.facets (in sirt)
m1a <- sirt::rm.facets( dat )
summary(m1a)

#-- M1b: tam.mml (in TAM)
m1b <- TAM::tam.mml( resp=dat )
summary(m1b)

#-- M1c: gdm (in CDM)
theta.k <- seq( -6, 6, len=21 )
m1c <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal" )
summary(m1c)
# compare results with loglinear skillspace
m1c2 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="loglinear" )
summary(m1c2)

#-- M1d: PCM (in eRm)
m1d <- eRm::PCM( dat )
summary(m1d)

#-- M1e: gpcm (in ltm)
m1e <- ltm::gpcm( dat, constraint="1PL", control=list(verbose=TRUE) )
summary(m1e)

#-- M1f: mirt (in mirt)
m1f <- mirt::mirt( dat, model=1, itemtype="1PL", verbose=TRUE )
summary(m1f)
coef(m1f)

#-- M1g: PCModel.fit (in psychotools)
mod1g <- psychotools::PCModel.fit(dat)
summary(mod1g)
plot(mod1g)

#****************
# Model 2: Generalized partial credit model
#****************

#-- M2a: rm.facets (in sirt)
m2a <- sirt::rm.facets( dat, est.a.item=TRUE )
summary(m2a)
# Note that in rm.facets the mean of the item discriminations is fixed to 1

#-- M2b: tam.mml.2pl (in TAM)
m2b <- TAM::tam.mml.2pl( resp=dat, irtmodel="GPCM" )
summary(m2b)

#-- M2c: gdm (in CDM)
m2c <- CDM::gdm( dat, irtmodel="2PL", theta.k=seq(-6,6,len=21),
           skillspace="normal", standardized.latent=TRUE )
summary(m2c)

#-- M2d: gpcm (in ltm)
m2d <- ltm::gpcm( dat, control=list(verbose=TRUE) )
summary(m2d)

#-- M2e: mirt (in mirt)
m2e <- mirt::mirt( dat, model=1, itemtype="GPCM", verbose=TRUE )
summary(m2e)
coef(m2e)

#****************
# Model 3: Nonparametric item response model
#****************

#-- M3a: ISOP and ADISOP model - isop.poly (in sirt)
m3a <- sirt::isop.poly( dat )
summary(m3a)
plot(m3a)

#-- M3b: Mokken scale analysis (in mokken)
# scalability coefficients
mokken::coefH(dat)
# assumption of monotonicity
monotonicity.list <- mokken::check.monotonicity(dat)
summary(monotonicity.list)
plot(monotonicity.list)
# assumption of non-intersecting ISRFs using method restscore
restscore.list <- mokken::check.restscore(dat)
summary(restscore.list)
plot(restscore.list)

#****************
# Model 4: Graded response model
#****************

#-- M4a: mirt (in mirt)
m4a <- mirt::mirt( dat, model=1, itemtype="graded", verbose=TRUE )
print(m4a)
sirt::mirt.wrapper.coef(m4a)

#-- M4b: WLSMV estimation with cfa (in lavaan)
lavmodel <- "F=~ O3__O48
             F ~~ 1*F "
# transform lavaan syntax with lavaanify.IRT
lavmodel <- TAM::lavaanify.IRT( lavmodel, items=colnames(dat) )$lavaan.syntax
mod4b <- lavaan::cfa( data=as.data.frame(dat), model=lavmodel, std.lv=TRUE,
             ordered=colnames(dat), parameterization="theta" )
summary(mod4b, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)
coef(mod4b)

#****************
# Model 5: Normally distributed residuals
#****************

#-- M5a: cfa (in lavaan)
lavmodel <- "F=~ O3__O48
             F ~~ 1*F
             F ~ 0*1
             O3__O48 ~ 1 "
lavmodel <- TAM::lavaanify.IRT( lavmodel, items=colnames(dat) )$lavaan.syntax
mod5a <- lavaan::cfa( data=as.data.frame(dat), model=lavmodel, std.lv=TRUE,
             estimator="MLR" )
summary(mod5a, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)

#-- M5b: mirt (in mirt)
# create user-defined item response function
name <- 'normal'
par <- c("d"=1, "a1"=0.8, "vy"=1)
est <- c(TRUE, TRUE, FALSE)
P.normal <- function(par, Theta, ncat){
    d <- par[1]
    a1 <- par[2]
    vy <- par[3]
    psi <- vy - a1^2
    # expected values given Theta
    mui <- a1*Theta[,1] + d
    TP <- nrow(Theta)
    probs <- matrix( NA, nrow=TP, ncol=ncat )
    eps <- .01
    for (cc in 1:ncat){
        probs[,cc] <- stats::dnorm( cc, mean=mui, sd=sqrt( abs( psi + eps ) ) )
    }
    psum <- matrix( rep( rowSums(probs), each=ncat ), nrow=TP, ncol=ncat, byrow=TRUE )
    probs <- probs / psum
    return(probs)
}
# create item response function
normal <- mirt::createItem(name, par=par, est=est, P=P.normal)
customItems <- list("normal"=normal)
itemtype <- rep( "normal", I )
# define parameters to be estimated
mod5b.pars <- mirt::mirt(dat, 1, itemtype=itemtype,
                   customItems=customItems, pars="values")
ind <- which( mod5b.pars$name=="vy" )
vy <- apply( dat, 2, var, na.rm=TRUE )
mod5b.pars[ ind, "value" ] <- vy
ind <- which( mod5b.pars$name=="a1" )
mod5b.pars[ ind, "value" ] <- .5 * sqrt(vy)
ind <- which( mod5b.pars$name=="d" )
mod5b.pars[ ind, "value" ] <- colMeans( dat, na.rm=TRUE )
# estimate model
mod5b <- mirt::mirt(dat, 1, itemtype=itemtype, customItems=customItems,
               pars=mod5b.pars, verbose=TRUE )
sirt::mirt.wrapper.coef(mod5b)$coef
# some item plots
graphics::par(ask=TRUE)
plot(mod5b, type='trace', layout=c(1,1))
graphics::par(ask=FALSE)
# alternatively: sirt::mirt.wrapper.itemplot(mod5b)
## End(Not run)
Datasets from the book by Borg and Staufenbiel (2007), Lehrbuch Theorie und Methoden der Skalierung [Textbook of theory and methods of scaling].
data(data.bs07a)
The dataset data.bs07a contains the data Gefechtsangst (combat anxiety, p. 130) and includes 8 of the original 9 items. The items are symptoms of anxiety in combat. GF1: strong heart palpitations, GF2: queasy feeling in the stomach, GF3: feeling of weakness, GF4: feeling of nausea, GF5: vomiting, GF6: shivering, GF7: urinating/defecating in one's pants, GF9: feeling of paralysis.
The format is
'data.frame': 100 obs. of 9 variables:
$ idpatt: int 44 29 1 3 28 50 50 36 37 25 ...
$ GF1 : int 1 1 1 1 1 0 0 1 1 1 ...
$ GF2 : int 0 1 1 1 1 0 0 1 1 1 ...
$ GF3 : int 0 0 1 1 0 0 0 0 0 1 ...
$ GF4 : int 0 0 1 1 0 0 0 1 0 1 ...
$ GF5 : int 0 0 1 1 0 0 0 0 0 0 ...
$ GF6 : int 1 1 1 1 1 0 0 0 0 0 ...
$ GF7 : num 0 0 1 1 0 0 0 0 0 0 ...
$ GF9 : int 0 0 1 1 1 0 0 0 0 0 ...
Borg, I., & Staufenbiel, T. (2007). Lehrbuch Theorie und Methoden der Skalierung. Bern: Hogrefe.
## Not run:
#############################################################################
# EXAMPLE 07a: Dataset Gefechtsangst
#############################################################################

data(data.bs07a)
dat <- data.bs07a
items <- grep( "GF", colnames(dat), value=TRUE )

#************************
# Model 1: Rasch model
mod1 <- TAM::tam.mml( dat[,items] )
summary(mod1)
TAM::IRT.WrightMap(mod1)

#************************
# Model 2: 2PL model
mod2 <- TAM::tam.mml.2pl( dat[,items] )
summary(mod2)

#************************
# Model 3: Latent class analysis (LCA) with two classes
tammodel <- "
  ANALYSIS:
    TYPE=LCA;
    NCLASSES(2)
    NSTARTS(5,10)
  LAVAAN MODEL:
    F=~ GF1__GF9
  "
mod3 <- TAM::tamaan( tammodel, dat )
summary(mod3)

#************************
# Model 4: LCA with three classes
tammodel <- "
  ANALYSIS:
    TYPE=LCA;
    NCLASSES(3)
    NSTARTS(5,10)
  LAVAAN MODEL:
    F=~ GF1__GF9
  "
mod4 <- TAM::tamaan( tammodel, dat )
summary(mod4)

#************************
# Model 5: Located latent class model (LOCLCA) with two classes
tammodel <- "
  ANALYSIS:
    TYPE=LOCLCA;
    NCLASSES(2)
    NSTARTS(5,10)
  LAVAAN MODEL:
    F=~ GF1__GF9
  "
mod5 <- TAM::tamaan( tammodel, dat )
summary(mod5)

#************************
# Model 6: Located latent class model with three classes
tammodel <- "
  ANALYSIS:
    TYPE=LOCLCA;
    NCLASSES(3)
    NSTARTS(5,10)
  LAVAAN MODEL:
    F=~ GF1__GF9
  "
mod6 <- TAM::tamaan( tammodel, dat )
summary(mod6)

#************************
# Model 7: Probabilistic Guttman model
mod7 <- sirt::prob.guttman( dat[,items] )
summary(mod7)

#-- model comparison
CDM::IRT.compareModels( mod1, mod2, mod3, mod4, mod5, mod6, mod7 )
## End(Not run)
Examples with datasets from Eid and Schmidt (2014), illustrated with several R packages. The examples closely follow the online material of Hosoya (2014). The datasets are completely synthetic; they were resimulated from the originally available data.
data(data.eid.kap4) data(data.eid.kap5) data(data.eid.kap6) data(data.eid.kap7)
data.eid.kap4 is the dataset from Chapter 4.
'data.frame': 193 obs. of 11 variables:
$ sex : int 0 0 0 0 0 0 1 0 0 1 ...
$ Freude_1: int 1 1 1 0 1 1 1 1 1 1 ...
$ Wut_1 : int 1 1 1 0 1 1 1 1 1 1 ...
$ Angst_1 : int 1 0 0 0 1 1 1 0 1 0 ...
$ Trauer_1: int 1 1 1 0 1 1 1 1 1 1 ...
$ Ueber_1 : int 1 1 1 0 1 1 0 1 1 1 ...
$ Trauer_2: int 0 1 1 1 1 1 1 1 1 0 ...
$ Angst_2 : int 0 0 1 0 0 1 0 0 0 0 ...
$ Wut_2 : int 1 1 1 1 1 1 1 1 1 1 ...
$ Ueber_2 : int 1 0 1 0 1 1 1 0 1 1 ...
$ Freude_2: int 1 1 1 0 1 1 1 1 1 1 ...
data.eid.kap5 is the dataset from Chapter 5.
'data.frame': 499 obs. of 7 variables:
$ sex : int 0 0 0 0 1 1 1 0 0 0 ...
$ item_1: int 2 3 3 2 4 1 0 0 0 2 ...
$ item_2: int 1 1 4 1 3 3 2 1 2 3 ...
$ item_3: int 1 3 3 2 3 3 0 0 0 1 ...
$ item_4: int 2 4 3 4 3 3 3 2 0 2 ...
$ item_5: int 1 3 2 2 0 0 0 0 1 2 ...
$ item_6: int 4 3 4 3 4 3 2 1 1 3 ...
data.eid.kap6 is the dataset from Chapter 6.
'data.frame': 238 obs. of 7 variables:
$ geschl: int 1 1 0 0 0 1 0 1 1 0 ...
$ item_1: int 3 3 3 3 2 0 1 4 3 3 ...
$ item_2: int 2 2 2 2 2 0 2 3 1 3 ...
$ item_3: int 2 2 1 3 2 0 0 3 1 3 ...
$ item_4: int 2 3 3 3 3 0 2 4 3 4 ...
$ item_5: int 1 2 1 2 2 0 1 2 2 2 ...
$ item_6: int 2 2 2 2 2 0 1 2 1 2 ...
data.eid.kap7 is the dataset Emotionale Klarheit (emotional clarity) from Chapter 7.
'data.frame': 238 obs. of 9 variables:
$ geschl : int 1 0 1 1 0 1 0 1 0 1 ...
$ reakt_1: num 2.13 1.78 1.28 1.82 1.9 1.63 1.73 1.49 1.43 1.27 ...
$ reakt_2: num 1.2 1.73 0.95 1.5 1.99 1.75 1.58 1.71 1.41 0.96 ...
$ reakt_3: num 1.77 1.42 0.76 1.54 2.36 1.84 2.06 1.21 1.75 0.92 ...
$ reakt_4: num 2.18 1.28 1.39 1.82 2.09 2.15 2.1 1.13 1.71 0.78 ...
$ reakt_5: num 1.47 1.7 1.08 1.77 1.49 1.73 1.96 1.76 1.88 1.1 ...
$ reakt_6: num 1.63 0.9 0.82 1.63 1.79 1.37 1.79 1.11 1.27 1.06 ...
$ kla_th1: int 8 11 11 8 10 11 12 5 6 12 ...
$ kla_th2: int 7 11 12 8 10 11 12 5 8 11 ...
The material and original datasets can be downloaded from http://www.hogrefe.de/buecher/lehrbuecher/psychlehrbuchplus/lehrbuecher/testtheorie-und-testkonstruktion/zusatzmaterial/.
Eid, M., & Schmidt, K. (2014). Testtheorie und Testkonstruktion. Goettingen: Hogrefe.
Hosoya, G. (2014). Einfuehrung in die Analyse testtheoretischer Modelle mit R. Available at http://www.hogrefe.de/buecher/lehrbuecher/psychlehrbuchplus/lehrbuecher/testtheorie-und-testkonstruktion/zusatzmaterial/.
## Not run:
miceadds::library_install("foreign")

#---- load some IRT packages in R
miceadds::library_install("TAM")        # package (a)
miceadds::library_install("mirt")       # package (b)
miceadds::library_install("sirt")       # package (c)
miceadds::library_install("eRm")        # package (d)
miceadds::library_install("ltm")        # package (e)
miceadds::library_install("psychomix")  # package (f)

#############################################################################
# EXAMPLES Ch. 4: Unidimensional IRT models | dichotomous data
#############################################################################

data(data.eid.kap4)
data0 <- data.eid.kap4
# alternatively, read the original SPSS file (linkname must point to the file)
# data0 <- foreign::read.spss( linkname, to.data.frame=TRUE, use.value.labels=FALSE)
# extract items
dat <- data0[,2:11]

#*********************************************************
# Model 1: Rasch model
#*********************************************************

#-----------
#-- 1a: estimation with the TAM package

# estimation with tam.mml
mod1a <- TAM::tam.mml(dat)
summary(mod1a)
# person parameters in TAM
pp1a <- TAM::tam.wle(mod1a)
# plot item response functions
plot(mod1a, export=FALSE, ask=TRUE)
# infit and outfit in TAM
itemf1a <- TAM::tam.fit(mod1a)
itemf1a
# model fit
modf1a <- TAM::tam.modelfit(mod1a)
summary(modf1a)

#-----------
#-- 1b: estimation with the mirt package

# estimation with mirt
mod1b <- mirt::mirt( dat, 1, itemtype="Rasch")
summary(mod1b)
print(mod1b)
# person parameters
pp1b <- mirt::fscores(mod1b, method="WLE")
# extract coefficients
sirt::mirt.wrapper.coef(mod1b)
# plot item response functions
plot(mod1b, type="trace" )
graphics::par(mfrow=c(1,1))
# item fit
itemf1b <- mirt::itemfit(mod1b)
itemf1b
# model fit
modf1b <- mirt::M2(mod1b)
modf1b

#-----------
#-- 1c: estimation with the sirt package

# estimation with rasch.mml2
mod1c <- sirt::rasch.mml2(dat)
summary(mod1c)
# person parameters (EAP)
pp1c <- mod1c$person
# plot item response functions
plot(mod1c, ask=TRUE )
# model fit
modf1c <- sirt::modelfit.sirt(mod1c)
summary(modf1c)

#-----------
#-- 1d: estimation with the eRm package

# estimation with RM
mod1d <- eRm::RM(dat)
summary(mod1d)
# estimation of person parameters
pp1d <- eRm::person.parameter(mod1d)
summary(pp1d)
# plot item response functions
eRm::plotICC(mod1d)
# person-item map
eRm::plotPImap(mod1d)
# item fit
itemf1d <- eRm::itemfit(pp1d)
# person fit
persf1d <- eRm::personfit(pp1d)

#-----------
#-- 1e: estimation with the ltm package

# estimation with rasch
mod1e <- ltm::rasch(dat)
summary(mod1e)
# estimation of person parameters
pp1e <- ltm::factor.scores(mod1e)
# plot item response functions
plot(mod1e)
# item fit
itemf1e <- ltm::item.fit(mod1e)
# person fit
persf1e <- ltm::person.fit(mod1e)
# goodness of fit with bootstrap
modf1e <- ltm::GoF.rasch(mod1e, B=20)   # use more bootstrap samples in practice
modf1e

#*********************************************************
# Model 2: 2PL model
#*********************************************************

#-----------
#-- 2a: estimation with the TAM package

mod2a <- TAM::tam.mml.2pl(dat)
summary(mod2a)
# model fit
modf2a <- TAM::tam.modelfit(mod2a)
summary(modf2a)
# item response functions
plot(mod2a, export=FALSE, ask=TRUE)
# model comparison
anova(mod1a, mod2a)

#-----------
#-- 2b: estimation with the mirt package

mod2b <- mirt::mirt(dat, 1, itemtype="2PL")
summary(mod2b)
print(mod2b)
sirt::mirt.wrapper.coef(mod2b)
# model fit
modf2b <- mirt::M2(mod2b)
modf2b

#-----------
#-- 2c: estimation with the sirt package

I <- ncol(dat)
mod2c <- sirt::rasch.mml2(dat, est.a=1:I)
summary(mod2c)
# model fit
modf2c <- sirt::modelfit.sirt(mod2c)
summary(modf2c)

#-----------
#-- 2e: estimation with the ltm package

mod2e <- ltm::ltm(dat ~ z1 )
summary(mod2e)
# item response functions
plot(mod2e)

#*********************************************************
# Model 3: Mixture Rasch model
#*********************************************************

#-----------
#-- 3a: estimation with the TAM package

# avoid "_" in column names if the "__" operator is used in the tamaan syntax
dat1 <- dat
colnames(dat1) <- gsub("_", "", colnames(dat1) )
# define tamaan model
tammodel <- "
  ANALYSIS:
    TYPE=MIXTURE ;
    NCLASSES(2);
    NSTARTS(20,25);   # 20 random starts with 25 initial iterations each
  LAVAAN MODEL:
    F=~ Freude1__Freude2
    F ~~ F
  ITEM TYPE:
    ALL(Rasch);
  "
mod3a <- TAM::tamaan( tammodel, resp=dat1 )
summary(mod3a)
# extract class-specific item parameters
ipars <- mod3a$itempartable_MIXTURE[ 1:10, ]
plot( 1:10, ipars[,3], type="o", ylim=range( ipars[,3:4] ), pch=16,
      xlab="Item", ylab="Item difficulty")
lines( 1:10, ipars[,4], type="l", col=2, lty=2)
points( 1:10, ipars[,4], col=2, pch=2)

#-----------
#-- 3f: estimation with the psychomix package

mod3f <- psychomix::raschmix( as.matrix(dat), k=2, scores="meanvar")
summary(mod3f)
# plot class-specific item difficulties
plot(mod3f)

#############################################################################
# EXAMPLES Ch. 5: Unidimensional IRT models | polytomous data
#############################################################################

data(data.eid.kap5)
data0 <- data.eid.kap5
# extract items
dat <- data0[,2:7]

#*********************************************************
# Model 1: Partial credit model
#*********************************************************

#-----------
#-- 1a: estimation with the TAM package

mod1a <- TAM::tam.mml(dat)
summary(mod1a)
# person parameters in TAM
pp1a <- TAM::tam.wle(mod1a)
# plot item response functions
plot(mod1a, export=FALSE, ask=TRUE)
# infit and outfit in TAM
itemf1a <- TAM::tam.fit(mod1a)
itemf1a
# model fit
modf1a <- TAM::tam.modelfit(mod1a)
summary(modf1a)

#-----------
#-- 1b: estimation with the mirt package

mod1b <- mirt::mirt( dat, 1, itemtype="Rasch")
summary(mod1b)
print(mod1b)
sirt::mirt.wrapper.coef(mod1b)
# plot item response functions
plot(mod1b, type="trace" )
graphics::par(mfrow=c(1,1))
# item fit
itemf1b <- mirt::itemfit(mod1b)
itemf1b

#-----------
#-- 1c: estimation with the sirt package

# estimation with rm.facets
mod1c <- sirt::rm.facets(dat)
summary(mod1c)

#-----------
#-- 1d: estimation with the eRm package

mod1d <- eRm::PCM(dat)
summary(mod1d)
# plot item response functions
eRm::plotICC(mod1d)
# person-item map
eRm::plotPImap(mod1d)
# item fit (based on estimated person parameters)
pp1d <- eRm::person.parameter(mod1d)
itemf1d <- eRm::itemfit(pp1d)

#-----------
#-- 1e: estimation with the ltm package

mod1e <- ltm::gpcm(dat, constraint="1PL")
summary(mod1e)
# plot item response functions
plot(mod1e)

#*********************************************************
# Model 2: Generalized partial credit model
#*********************************************************

#-----------
#-- 2a: estimation with the TAM package

mod2a <- TAM::tam.mml.2pl(dat, irtmodel="GPCM")
summary(mod2a)
# model fit
modf2a <- TAM::tam.modelfit(mod2a)
summary(modf2a)

#-----------
#-- 2b: estimation with the mirt package

mod2b <- mirt::mirt( dat, 1, itemtype="gpcm")
summary(mod2b)
print(mod2b)
sirt::mirt.wrapper.coef(mod2b)

#-----------
#-- 2c: estimation with the sirt package

mod2c <- sirt::rm.facets(dat, est.a.item=TRUE)
summary(mod2c)

#-----------
#-- 2e: estimation with the ltm package

mod2e <- ltm::gpcm(dat)
summary(mod2e)
plot(mod2e)
## End(Not run)
## Not run:
miceadds::library_install("foreign")

#---- load some IRT packages in R
miceadds::library_install("TAM")        # package (a)
miceadds::library_install("mirt")       # package (b)
miceadds::library_install("sirt")       # package (c)
miceadds::library_install("eRm")        # package (d)
miceadds::library_install("ltm")        # package (e)
miceadds::library_install("psychomix")  # package (f)

#############################################################################
# EXAMPLES Ch. 4: Unidimensional IRT models | dichotomous data
#############################################################################

data(data.eid.kap4)
data0 <- data.eid.kap4
# alternatively, load the data from an SPSS file (linkname must be supplied)
# data0 <- foreign::read.spss(linkname, to.data.frame=TRUE, use.value.labels=FALSE)
# extract items
dat <- data0[,2:11]

#*********************************************************
# Model 1: Rasch model
#*********************************************************

#-----------
#-- 1a: estimation with TAM package

# estimation with tam.mml
mod1a <- TAM::tam.mml(dat)
summary(mod1a)
# person parameters in TAM
pp1a <- TAM::tam.wle(mod1a)
# plot item response functions
plot(mod1a, export=FALSE, ask=TRUE)
# Infit and outfit in TAM
itemf1a <- TAM::tam.fit(mod1a)
itemf1a
# model fit
modf1a <- TAM::tam.modelfit(mod1a)
summary(modf1a)

#-----------
#-- 1b: estimation with mirt package

# estimation with mirt
mod1b <- mirt::mirt(dat, 1, itemtype="Rasch")
summary(mod1b)
print(mod1b)
# person parameters
pp1b <- mirt::fscores(mod1b, method="WLE")
# extract coefficients
sirt::mirt.wrapper.coef(mod1b)
# plot item response functions
plot(mod1b, type="trace")
par(mfrow=c(1,1))
# item fit
itemf1b <- mirt::itemfit(mod1b)
itemf1b
# model fit
modf1b <- mirt::M2(mod1b)
modf1b

#-----------
#-- 1c: estimation with sirt package

# estimation with rasch.mml2
mod1c <- sirt::rasch.mml2(dat)
summary(mod1c)
# person parameters (EAP)
pp1c <- mod1c$person
# plot item response functions
plot(mod1c, ask=TRUE)
# model fit
modf1c <- sirt::modelfit.sirt(mod1c)
summary(modf1c)

#-----------
#-- 1d: estimation with eRm package

# estimation with RM
mod1d <- eRm::RM(dat)
summary(mod1d)
# estimation person parameters
pp1d <- eRm::person.parameter(mod1d)
summary(pp1d)
# plot item response functions
eRm::plotICC(mod1d)
# person-item map
eRm::plotPImap(mod1d)
# item fit
itemf1d <- eRm::itemfit(pp1d)
# person fit
persf1d <- eRm::personfit(pp1d)

#-----------
#-- 1e: estimation with ltm package

# estimation with rasch
mod1e <- ltm::rasch(dat)
summary(mod1e)
# estimation person parameters
pp1e <- ltm::factor.scores(mod1e)
# plot item response functions
plot(mod1e)
# item fit
itemf1e <- ltm::item.fit(mod1e)
# person fit
persf1e <- ltm::person.fit(mod1e)
# goodness of fit with bootstrap
modf1e <- ltm::GoF.rasch(mod1e, B=20)  # use more bootstrap samples
modf1e

#*********************************************************
# Model 2: 2PL model
#*********************************************************

#-----------
#-- 2a: estimation with TAM package

# estimation
mod2a <- TAM::tam.mml.2pl(dat)
summary(mod2a)
# model fit
modf2a <- TAM::tam.modelfit(mod2a)
summary(modf2a)
# item response functions
plot(mod2a, export=FALSE, ask=TRUE)
# model comparison
anova(mod1a, mod2a)

#-----------
#-- 2b: estimation with mirt package

# estimation
mod2b <- mirt::mirt(dat, 1, itemtype="2PL")
summary(mod2b)
print(mod2b)
sirt::mirt.wrapper.coef(mod2b)
# model fit
modf2b <- mirt::M2(mod2b)
modf2b

#-----------
#-- 2c: estimation with sirt package

I <- ncol(dat)
# estimation
mod2c <- sirt::rasch.mml2(dat, est.a=1:I)
summary(mod2c)
# model fit
modf2c <- sirt::modelfit.sirt(mod2c)
summary(modf2c)

#-----------
#-- 2e: estimation with ltm package

# estimation
mod2e <- ltm::ltm(dat ~ z1)
summary(mod2e)
# item response functions
plot(mod2e)

#*********************************************************
# Model 3: Mixture Rasch model
#*********************************************************

#-----------
#-- 3a: estimation with TAM package

# avoid "_" in column names if the "__" operator is used
# in the tamaan syntax
dat1 <- dat
colnames(dat1) <- gsub("_", "", colnames(dat1))
# define tamaan model
tammodel <- "
ANALYSIS:
  TYPE=MIXTURE;
  NCLASSES(2);
  NSTARTS(20,25);  # 20 random starts with 25 initial iterations each
LAVAAN MODEL:
  F=~ Freude1__Freude2
  F ~~ F
ITEM TYPE:
  ALL(Rasch);
"
mod3a <- TAM::tamaan(tammodel, resp=dat1)
summary(mod3a)
# extract item parameters
ipars <- mod3a$itempartable_MIXTURE[1:10, ]
plot(1:10, ipars[,3], type="o", ylim=range(ipars[,3:4]), pch=16,
     xlab="Item", ylab="Item difficulty")
lines(1:10, ipars[,4], type="l", col=2, lty=2)
points(1:10, ipars[,4], col=2, pch=2)

#-----------
#-- 3f: estimation with psychomix package

# estimation
mod3f <- psychomix::raschmix(as.matrix(dat), k=2, scores="meanvar")
summary(mod3f)
# plot class-specific item difficulties
plot(mod3f)

#############################################################################
# EXAMPLES Ch. 5: Unidimensional IRT models | polytomous data
#############################################################################

data(data.eid.kap5)
data0 <- data.eid.kap5
# extract items
dat <- data0[,2:7]

#*********************************************************
# Model 1: Partial credit model
#*********************************************************

#-----------
#-- 1a: estimation with TAM package

# estimation with tam.mml
mod1a <- TAM::tam.mml(dat)
summary(mod1a)
# person parameters in TAM
pp1a <- TAM::tam.wle(mod1a)
# plot item response functions
plot(mod1a, export=FALSE, ask=TRUE)
# Infit and outfit in TAM
itemf1a <- TAM::tam.fit(mod1a)
itemf1a
# model fit
modf1a <- TAM::tam.modelfit(mod1a)
summary(modf1a)

#-----------
#-- 1b: estimation with mirt package

# estimation with mirt
mod1b <- mirt::mirt(dat, 1, itemtype="Rasch")
summary(mod1b)
print(mod1b)
sirt::mirt.wrapper.coef(mod1b)
# plot item response functions
plot(mod1b, type="trace")
par(mfrow=c(1,1))
# item fit
itemf1b <- mirt::itemfit(mod1b)
itemf1b

#-----------
#-- 1c: estimation with sirt package

# estimation with rm.facets
mod1c <- sirt::rm.facets(dat)
summary(mod1c)
summary(mod1a)

#-----------
#-- 1d: estimation with eRm package

# estimation
mod1d <- eRm::PCM(dat)
summary(mod1d)
# plot item response functions
eRm::plotICC(mod1d)
# person-item map
eRm::plotPImap(mod1d)
# item fit (requires person parameters)
pp1d <- eRm::person.parameter(mod1d)
itemf1d <- eRm::itemfit(pp1d)

#-----------
#-- 1e: estimation with ltm package

# estimation
mod1e <- ltm::gpcm(dat, constraint="1PL")
summary(mod1e)
# plot item response functions
plot(mod1e)

#*********************************************************
# Model 2: Generalized partial credit model
#*********************************************************

#-----------
#-- 2a: estimation with TAM package

# estimation with tam.mml
mod2a <- TAM::tam.mml.2pl(dat, irtmodel="GPCM")
summary(mod2a)
# model fit
modf2a <- TAM::tam.modelfit(mod2a)
summary(modf2a)

#-----------
#-- 2b: estimation with mirt package

# estimation
mod2b <- mirt::mirt(dat, 1, itemtype="gpcm")
summary(mod2b)
print(mod2b)
sirt::mirt.wrapper.coef(mod2b)

#-----------
#-- 2c: estimation with sirt package

# estimation with rm.facets
mod2c <- sirt::rm.facets(dat, est.a.item=TRUE)
summary(mod2c)

#-----------
#-- 2e: estimation with ltm package

# estimation
mod2e <- ltm::gpcm(dat)
summary(mod2e)
plot(mod2e)

## End(Not run)
This dataset contains item loadings and intercepts
for 26 countries for the European Social Survey (ESS 2005;
see Asparouhov & Muthen, 2014).
data(data.ess2005)
The format of the dataset is:
List of 2
$ lambda: num [1:26, 1:4] 0.688 0.721 0.72 0.687 0.625 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "ipfrule" "ipmodst" "ipbhprp" "imptrad"
$ nu : num [1:26, 1:4] 3.26 2.52 3.41 2.84 2.79 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "ipfrule" "ipmodst" "ipbhprp" "imptrad"
Asparouhov, T., & Muthen, B. (2014). Multiple-group factor analysis alignment. Structural Equation Modeling, 21(4), 1-14. doi:10.1080/10705511.2014.919210
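The lambda and nu matrices hold country-specific loadings and intercepts from a configural model, i.e. the input for the alignment method of Asparouhov and Muthen (2014). A minimal base-R sketch of the alignment transformation; the group factor means (alpha) and SDs (psi) below are made up for illustration and are not taken from the dataset:

```r
# Alignment rescales configural loadings/intercepts by group factor
# means (alpha) and SDs (psi); all numbers are invented for illustration.
lambda <- matrix(c(0.69, 0.72, 0.72, 0.70), nrow=2)   # 2 groups x 2 items
nu     <- matrix(c(3.26, 2.52, 3.41, 2.84), nrow=2)
alpha  <- c(0, 0.3)    # hypothetical group factor means
psi    <- c(1, 1.2)    # hypothetical group factor SDs

lambda_aligned <- lambda / psi                 # lambda[g,i] / psi[g]
nu_aligned     <- nu - alpha * lambda / psi    # nu[g,i] - alpha[g] * lambda_aligned[g,i]
round(nu_aligned, 3)
```

Under approximate invariance, the aligned loadings and intercepts should agree across groups up to small deviations.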
Some datasets of C-tests are provided. The dataset data.g308
was used in Schroeders, Robitzsch and Schipolowski (2014).
data(data.g308)
The dataset data.g308
is a C-test containing 20 items. It was used in
Schroeders, Robitzsch and Schipolowski (2014) and has the
following format:
'data.frame': 747 obs. of 21 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ G30801: int 1 1 1 1 1 0 0 1 1 1 ...
$ G30802: int 1 1 1 1 1 1 1 1 1 1 ...
$ G30803: int 1 1 1 1 1 1 1 1 1 1 ...
$ G30804: int 1 1 1 1 1 0 1 1 1 1 ...
[...]
$ G30817: int 0 0 0 0 1 0 1 0 1 0 ...
$ G30818: int 0 0 1 0 0 0 0 1 1 0 ...
$ G30819: int 1 1 1 1 0 0 1 1 1 0 ...
$ G30820: int 1 1 1 1 0 0 0 1 1 0 ...
Schroeders, U., Robitzsch, A., & Schipolowski, S. (2014). A comparison of different psychometric approaches to modeling testlet structures: An example with C-tests. Journal of Educational Measurement, 51(4), 400-418.
## Not run:

#############################################################################
# EXAMPLE 1: Dataset G308 from Schroeders et al. (2014)
#############################################################################

data(data.g308)
dat <- data.g308[,-1]   # remove student identifier

library(TAM)
library(sirt)

# define testlets
testlet <- c(1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6)

#****************************************
#*** Model 1: Rasch model
mod1 <- TAM::tam.mml(resp=dat, control=list(maxiter=300, snodes=1500))
summary(mod1)

#****************************************
#*** Model 2: Rasch testlet model

# testlets are dimensions, assign items to Q-matrix
TT <- length(unique(testlet))
Q <- matrix(0, nrow=ncol(dat), ncol=TT + 1)
Q[,1] <- 1   # first dimension constitutes g-factor
for (tt in 1:TT){ Q[testlet==tt, tt+1] <- 1 }
# In a testlet model, all dimensions are uncorrelated among
# each other, that is, all pairwise correlations are set to 0,
# which can be accomplished with the "variance.fixed" command
variance.fixed <- cbind(t(utils::combn(TT+1, 2)), 0)
mod2 <- TAM::tam.mml(resp=dat, Q=Q, variance.fixed=variance.fixed,
           control=list(snodes=1500, maxiter=300))
summary(mod2)

#****************************************
#*** Model 3: Partial credit model
scores <- list()
testlet.names <- NULL
dat.pcm <- NULL
for (tt in 1:max(testlet)){
    scores[[tt]] <- rowSums(dat[, testlet==tt, drop=FALSE])
    dat.pcm <- c(dat.pcm, list(c(scores[[tt]])))
    testlet.names <- append(testlet.names, paste0("testlet", tt))
}
dat.pcm <- as.data.frame(dat.pcm)
colnames(dat.pcm) <- testlet.names
mod3 <- TAM::tam.mml(resp=dat.pcm, control=list(snodes=1500, maxiter=300))
summary(mod3)

#****************************************
#*** Model 4: Copula model
mod4 <- sirt::rasch.copula2(dat=dat, itemcluster=testlet)
summary(mod4)

## End(Not run)
Dataset for invariance testing with 4 groups.
data(data.inv4gr)
A data frame with 4000 observations on the following 12 variables. The first variable is a group identifier, the other variables are items.
group
A group identifier
I01
a numeric vector
I02
a numeric vector
I03
a numeric vector
I04
a numeric vector
I05
a numeric vector
I06
a numeric vector
I07
a numeric vector
I08
a numeric vector
I09
a numeric vector
I10
a numeric vector
I11
a numeric vector
Simulated dataset
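Before fitting invariance models to data of this kind, a quick descriptive check is to compare item means across the four groups. A base-R sketch with simulated responses (all values invented; the real dataset has 11 items):

```r
# Compare item means across groups as a first, purely descriptive
# invariance check; data are simulated for illustration.
set.seed(1)
group <- rep(1:4, each=50)
dat <- data.frame(I01=rbinom(200, 1, 0.6),
                  I02=rbinom(200, 1, 0.4))
item_means <- aggregate(dat, by=list(group=group), FUN=mean)
item_means
```

Large between-group differences in these raw means motivate the formal invariance tests that this dataset is intended for.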
Dataset 'Liking for science' published by Wright and Masters (1982).
data(data.liking.science)
The format is:
num [1:75, 1:24] 1 2 2 1 1 1 2 2 0 2 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:24] "LS01" "LS02" "LS03" "LS04" ...
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.
This dataset contains 200 observations on 12 items.
Six items (I1T1, ..., I6T1) were administered at measurement
occasion T1 and six items (I3T2, ..., I8T2) at measurement
occasion T2. Four anchor items (I3, I4, I5, I6) were presented
at both time points.
The first column in the dataset contains the student identifier.
data(data.long)
The format of the dataset is
'data.frame': 200 obs. of 13 variables:
$ idstud: int 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 ...
$ I1T1 : int 1 1 1 1 1 1 1 0 1 1 ...
$ I2T1 : int 0 0 1 1 1 1 0 1 1 1 ...
$ I3T1 : int 1 0 1 1 0 1 0 0 0 0 ...
$ I4T1 : int 1 0 0 1 0 0 0 0 1 1 ...
$ I5T1 : int 1 0 0 1 0 0 0 0 1 0 ...
$ I6T1 : int 1 0 0 0 0 0 0 0 0 0 ...
$ I3T2 : int 1 1 0 0 1 1 1 1 0 1 ...
$ I4T2 : int 1 1 0 0 1 1 0 0 0 1 ...
$ I5T2 : int 1 0 1 1 1 1 1 0 1 1 ...
$ I6T2 : int 1 1 0 0 0 0 0 0 0 1 ...
$ I7T2 : int 1 0 0 0 0 0 0 0 0 1 ...
$ I8T2 : int 0 0 0 0 1 0 0 0 0 0 ...
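Which items serve as anchors can be read off the column names: the item stem recurs at both occasions. A small base-R sketch:

```r
# Identify the anchor items (administered at both T1 and T2)
# from the column names of data.long (id column excluded).
items <- c("I1T1","I2T1","I3T1","I4T1","I5T1","I6T1",
           "I3T2","I4T2","I5T2","I6T2","I7T2","I8T2")
stem <- substring(items, 1, 2)               # item label without occasion
anchors <- intersect(stem[grepl("T1", items)], stem[grepl("T2", items)])
anchors
```

The resulting anchor set (I3 to I6) is exactly the `itemnr` constraint structure used in the linking examples below.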
## Not run:
data(data.long)
dat <- data.long
dat <- dat[,-1]
I <- ncol(dat)

#*************************************************
# Model 1: 2-dimensional Rasch model
#*************************************************

# define Q-matrix
Q <- matrix(0, I, 2)
Q[1:6, 1] <- 1
Q[7:12, 2] <- 1
rownames(Q) <- colnames(dat)
colnames(Q) <- c("T1", "T2")
# vector with same items
itemnr <- as.numeric(substring(colnames(dat), 2, 2))
# fix mean at T2 to zero
mu.fixed <- cbind(2, 0)

#--- M1a: rasch.mml2 (in sirt)
mod1a <- sirt::rasch.mml2(dat, Q=Q, est.b=itemnr, mu.fixed=mu.fixed)
summary(mod1a)

#--- M1b: smirt (in sirt)
mod1b <- sirt::smirt(dat, Qmatrix=Q, irtmodel="comp", est.b=itemnr,
             mu.fixed=mu.fixed)

#--- M1c: tam.mml (in TAM)
# assume equal item difficulty of I3T1 and I3T2, I4T1 and I4T2, ...
# create draft design matrix and modify it
A <- TAM::designMatrices(resp=dat)$A
dimnames(A)[[1]] <- colnames(dat)
##  > str(A)
##  num [1:12, 1:2, 1:12] 0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ : chr [1:12] "Item01" "Item02" "Item03" "Item04" ...
##   ..$ : chr [1:2] "Category0" "Category1"
##   ..$ : chr [1:12] "I1T1" "I2T1" "I3T1" "I4T1" ...
A1 <- A[,, c(1:6, 11:12)]
A1[7,2,3] <- -1   # difficulty(I3T1)=difficulty(I3T2)
A1[8,2,4] <- -1   # I4T1=I4T2
A1[9,2,5] <- A1[10,2,6] <- -1
dimnames(A1)[[3]] <- substring(dimnames(A1)[[3]], 1, 2)
##  > A1[,2,]
##       I1 I2 I3 I4 I5 I6 I7 I8
##  I1T1 -1  0  0  0  0  0  0  0
##  I2T1  0 -1  0  0  0  0  0  0
##  I3T1  0  0 -1  0  0  0  0  0
##  I4T1  0  0  0 -1  0  0  0  0
##  I5T1  0  0  0  0 -1  0  0  0
##  I6T1  0  0  0  0  0 -1  0  0
##  I3T2  0  0 -1  0  0  0  0  0
##  I4T2  0  0  0 -1  0  0  0  0
##  I5T2  0  0  0  0 -1  0  0  0
##  I6T2  0  0  0  0  0 -1  0  0
##  I7T2  0  0  0  0  0  0 -1  0
##  I8T2  0  0  0  0  0  0  0 -1
# estimate model
# set intercept of second dimension (T2) to zero
beta.fixed <- cbind(1, 2, 0)
mod1c <- TAM::tam.mml(resp=dat, Q=Q, A=A1, beta.fixed=beta.fixed)
summary(mod1c)

#*************************************************
# Model 2: 2-dimensional 2PL model
#*************************************************

# set variance at T2 to 1
variance.fixed <- cbind(2, 2, 1)
#--- M2a: rasch.mml2 (in sirt)
mod2a <- sirt::rasch.mml2(dat, Q=Q, est.b=itemnr, est.a=itemnr,
             mu.fixed=mu.fixed, variance.fixed=variance.fixed, mmliter=100)
summary(mod2a)

#*************************************************
# Model 3: Concurrent calibration by assuming invariant item parameters
#*************************************************

library(mirt)   # use mirt for concurrent calibration
data(data.long)
dat <- data.long[,-1]
I <- ncol(dat)
# create user-defined function for between-item dimensionality 4PL model
name <- "4PLbw"
par <- c("low"=0, "upp"=1, "a"=1, "d"=0, "dimItem"=1)
est <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
# item response function
irf <- function(par, Theta, ncat){
    low <- par[1]
    upp <- par[2]
    a <- par[3]
    d <- par[4]
    dimItem <- par[5]
    P1 <- low + (upp - low) * plogis(a * Theta[, dimItem] + d)
    cbind(1 - P1, P1)
}
# create item response function
fourPLbetw <- mirt::createItem(name, par=par, est=est, P=irf)
head(dat)
# create mirt model (use variable names in mirt.model)
mirtsyn <- "
    T1=I1T1,I2T1,I3T1,I4T1,I5T1,I6T1
    T2=I3T2,I4T2,I5T2,I6T2,I7T2,I8T2
    COV=T1*T2,,T2*T2
    MEAN=T1
    CONSTRAIN=(I3T1,I3T2,d),(I4T1,I4T2,d),(I5T1,I5T2,d),(I6T1,I6T2,d),
        (I3T1,I3T2,a),(I4T1,I4T2,a),(I5T1,I5T2,a),(I6T1,I6T2,a)
    "
# create mirt model
mirtmodel <- mirt::mirt.model(mirtsyn, itemnames=colnames(dat))
# define parameters to be estimated
mod3.pars <- mirt::mirt(dat, mirtmodel$model, rep("4PLbw", I),
                 customItems=list("4PLbw"=fourPLbetw), pars="values")
# select dimensions
ind <- intersect(grep("T2", mod3.pars$item),
                 which(mod3.pars$name=="dimItem"))
mod3.pars[ind, "value"] <- 2
# set item parameters low and upp to non-estimated
ind <- which(mod3.pars$name %in% c("low", "upp"))
mod3.pars[ind, "est"] <- FALSE
# estimate 2PL model
mod3 <- mirt::mirt(dat, mirtmodel$model, itemtype=rep("4PLbw", I),
            customItems=list("4PLbw"=fourPLbetw), pars=mod3.pars,
            verbose=TRUE, technical=list(NCYCLES=50))
sirt::mirt.wrapper.coef(mod3)

#****** estimate model in lavaan
library(lavaan)
# specify syntax
lavmodel <- "
   #**** T1
   F1=~ a1*I1T1+a2*I2T1+a3*I3T1+a4*I4T1+a5*I5T1+a6*I6T1
   I1T1 | b1*t1 ; I2T1 | b2*t1 ; I3T1 | b3*t1 ; I4T1 | b4*t1
   I5T1 | b5*t1 ; I6T1 | b6*t1
   F1 ~~ 1*F1
   #**** T2
   F2=~ a3*I3T2+a4*I4T2+a5*I5T2+a6*I6T2+a7*I7T2+a8*I8T2
   I3T2 | b3*t1 ; I4T2 | b4*t1 ; I5T2 | b5*t1 ; I6T2 | b6*t1
   I7T2 | b7*t1 ; I8T2 | b8*t1
   F2 ~~ NA*F2
   F2 ~ 1
   #*** covariance
   F1 ~~ F2
   "
# estimate model using theta parameterization
mod3lav <- lavaan::cfa(data=dat, model=lavmodel, std.lv=TRUE,
               ordered=colnames(dat), parameterization="theta")
summary(mod3lav, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)

#*************************************************
# Model 4: Linking with items of different item slope groups
#*************************************************

data(data.long)
dat <- data.long
# dataset for T1
dat1 <- dat[, grep("T1", colnames(dat))]
colnames(dat1) <- gsub("T1", "", colnames(dat1))
# dataset for T2
dat2 <- dat[, grep("T2", colnames(dat))]
colnames(dat2) <- gsub("T2", "", colnames(dat2))
# 2PL model with slope groups T1
mod1 <- sirt::rasch.mml2(dat1, est.a=c(rep(1,2), rep(2,4)))
summary(mod1)
# 2PL model with slope groups T2
mod2 <- sirt::rasch.mml2(dat2, est.a=c(rep(1,4), rep(2,2)))
summary(mod2)

#------- Link 1: Haberman linking
# collect item parameters
dfr1 <- data.frame("study1", mod1$item$item, mod1$item$a, mod1$item$b)
dfr2 <- data.frame("study2", mod2$item$item, mod2$item$a, mod2$item$b)
colnames(dfr2) <- colnames(dfr1) <- c("study", "item", "a", "b")
itempars <- rbind(dfr1, dfr2)
# linking
link1 <- sirt::linking.haberman(itempars=itempars)

#------- Link 2: Invariance alignment method
# create objects for invariance.alignment
nu <- rbind(c(mod1$item$thresh, NA, NA), c(NA, NA, mod2$item$thresh))
lambda <- rbind(c(mod1$item$a, NA, NA), c(NA, NA, mod2$item$a))
colnames(lambda) <- colnames(nu) <- paste0("I", 1:8)
rownames(lambda) <- rownames(nu) <- c("T1", "T2")
# linking
link2a <- sirt::invariance.alignment(lambda, nu)
summary(link2a)

## End(Not run)
Datasets for local structural equation models or moderated factor analysis.
data(data.lsem01) data(data.lsem02) data(data.lsem03)
The dataset data.lsem01
has the following structure
'data.frame': 989 obs. of 6 variables:
$ age: num 4 4 4 4 4 4 4 4 4 4 ...
$ v1 : num 1.83 2.38 1.85 4.53 -0.04 4.35 2.38 1.83 4.81 2.82 ...
$ v2 : num 6.06 9.08 7.41 8.24 6.18 7.4 6.54 4.28 6.43 7.6 ...
$ v3 : num 1.42 3.05 6.42 -1.05 -1.79 4.06 -0.17 -2.64 0.84 6.42 ...
$ v4 : num 3.84 4.24 3.24 3.36 2.31 6.07 4 5.93 4.4 3.49 ...
$ v5 : num 7.84 7.51 6.62 8.02 7.12 7.99 7.25 7.62 7.66 7.03 ...
The dataset data.lsem02
is a slightly perturbed dataset of the
Woodcock-Johnson III (WJ-III) Tests of Cognitive Abilities used in Hildebrandt et al.
(2016) and has the following structure
'data.frame': 1129 obs. of 8 variables:
$ age : int 4 4 4 4 4 4 4 4 4 4 ...
$ gcw : num -3.53 -3.73 -3.77 -3.84 -4.26 -4.6 -3.66 -4.31 -4.46 -3.64 ...
$ gvw : num -1.98 -1.35 -1.66 -3.24 -1.17 -2.78 -2.97 -3.88 -3.22 -0.68 ...
$ gfw : num -2.49 -2.41 -4.48 -4.17 -4.43 -5.06 -3.94 -3.66 -3.7 -2.74 ...
$ gsw : num -4.85 -5.05 -5.66 -4.3 -5.23 -5.63 -4.91 -5.75 -6.29 -5.47 ...
$ gsmw: num -2.99 -1.13 -4.21 -3.59 -3.79 -4.77 -2.98 -4.48 -2.99 -3.83 ...
$ glrw: num -2.49 -2.91 -3.45 -2.91 -3.31 -3.78 -3.5 -3.96 -2.97 -3.14 ...
$ gaw : num -3.22 -3.77 -3.54 -3.6 -3.22 -3.5 -1.27 -2.08 -2.23 -3.25 ...
The dataset data.lsem03
is a synthetic dataset based on the SON-R application
used in Hueluer et al. (2011) and has the following structure
'data.frame': 1027 obs. of 10 variables:
$ id : num 10001 10002 10003 10004 10005 ...
$ female : int 0 0 0 0 0 0 0 0 0 0 ...
$ age : num 2.62 2.65 2.66 2.67 2.68 2.68 2.68 2.69 2.71 2.71 ...
$ age_group: int 1 1 1 1 1 1 1 1 1 1 ...
$ p1 : num -1.98 -1.98 -1.67 -2.29 -1.67 -1.98 -2.29 -1.98 -2.6 -1.67 ...
$ p2 : num -1.51 -1.51 -0.55 -1.84 -1.51 -1.84 -2.16 -1.84 -2.48 -1.84 ...
$ p3 : num -1.4 -2.31 -1.1 -2 -1.4 -1.7 -2.31 -1.4 -2.31 -0.79 ...
$ r1 : num -1.46 -1.14 -0.49 -2.11 -1.46 -1.46 -2.11 -1.46 -2.75 -1.78 ...
$ r2 : num -2.67 -1.74 0.74 -1.74 -0.81 -1.43 -2.05 -1.43 -1.74 -1.12 ...
$ r3 : num -1.64 -1.64 -1.64 -0.9 -1.27 -3.11 -2.74 -1.64 -2.37 -1.27 ...
The subtests Mosaics (p1), Puzzles (p2), and Patterns (p3)
constitute the performance subscale;
the subtests Categories (r1), Analogies (r2), and
Situations (r3) constitute the reasoning subscale.
Hildebrandt, A., Luedtke, O., Robitzsch, A., Sommer, C., & Wilhelm, O. (2016). Exploring factor model parameters across continuous variables with local structural equation models. Multivariate Behavioral Research, 51(2-3), 257-278. doi:10.1080/00273171.2016.1142856
Hueluer, G., Wilhelm, O., & Robitzsch, A. (2011). Intelligence differentiation in early childhood. Journal of Individual Differences, 32(3), 170-179. doi:10.1027/1614-0001/a000049
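In local structural equation modeling, the model is estimated at a grid of focal points of the moderator (here: age), with each observation weighted by its kernel distance to the focal point. A minimal base-R sketch of Gaussian kernel weights; the bandwidth `h` and the ages are assumptions for illustration, not package defaults:

```r
# LSEM weights observations around a focal value of the moderator;
# here: Gaussian kernel weights for a focal age of 7 (toy values).
age   <- c(4, 5, 6, 7, 8, 9, 10)
focal <- 7
h     <- 1.5                      # assumed bandwidth
w <- dnorm((age - focal) / h)
w <- w / max(w)                   # rescale so the focal point gets weight 1
round(w, 3)
```

Repeating the weighted estimation over a sequence of focal points yields the parameter-by-moderator curves reported in Hildebrandt et al. (2016).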
This is an example dataset involving mathematics items for German fourth graders. Items are classified into several domains and subdomains (see the Format section). The dataset contains responses of 664 students to 30 items.
data(data.math)
The dataset is a list. The list element data
contains the dataset with the demographic variables
student ID (idstud
) and a dummy variable
for female students (female
). The remaining
variables (starting with M
in the name) are
the mathematics items.
The item metadata are included in the list element
item
which contains item name (item
) and the
testlet label (testlet
). An item not included
in a testlet is indicated by NA
.
Each item is allocated to one and only one competence domain (domain
).
The format is:
List of 2
$ data:'data.frame':
..$ idstud: int [1:664] 1001 1002 1003 ...
..$ female: int [1:664] 1 1 0 0 1 1 1 0 0 1 ...
..$ MA1 : int [1:664] 1 1 1 0 0 1 1 1 1 1 ...
..$ MA2 : int [1:664] 1 1 1 1 1 0 0 0 0 1 ...
..$ MA3 : int [1:664] 1 1 0 0 0 0 0 1 0 0 ...
..$ MA4 : int [1:664] 0 1 1 1 0 0 1 0 0 0 ...
..$ MB1 : int [1:664] 0 1 0 1 0 0 0 0 0 1 ...
..$ MB2 : int [1:664] 1 1 1 1 0 1 0 1 0 0 ...
..$ MB3 : int [1:664] 1 1 1 1 0 0 0 1 0 1 ...
[...]
..$ MH3 : int [1:664] 1 1 0 1 0 0 1 0 1 0 ...
..$ MH4 : int [1:664] 0 1 1 1 0 0 0 0 1 0 ...
..$ MI1 : int [1:664] 1 1 0 1 0 1 0 0 1 0 ...
..$ MI2 : int [1:664] 1 1 0 0 0 1 1 0 1 1 ...
..$ MI3 : int [1:664] 0 1 0 1 0 0 0 0 0 0 ...
$ item:'data.frame':
..$ item : Factor w/ 30 levels "MA1","MA2","MA3",..: 1 2 3 4 5 ...
..$ testlet : Factor w/ 9 levels "","MA","MB","MC",..: 2 2 2 2 3 3 ...
..$ domain : Factor w/ 3 levels "arithmetic","geometry",..: 1 1 1 ...
..$ subdomain: Factor w/ 9 levels "","addition",..: 2 2 2 2 7 7 ...
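The list structure above can be queried with base R. The following sketch uses a made-up miniature list mimicking the data.math structure (not the shipped dataset), so it runs without the sirt package installed:

```r
# Toy list mimicking the data.math structure (hypothetical values)
data.math.toy <- list(
  data=data.frame(idstud=1001:1004, female=c(1,0,1,0),
                  MA1=c(1,0,1,1), MB1=c(0,1,1,0)),
  item=data.frame(item=c("MA1","MB1"),
                  testlet=c("MA","MB"),
                  domain=c("arithmetic","geometry"))
)
# number of items per competence domain
table(data.math.toy$item$domain)
# item responses only (columns whose names start with "M")
resp <- data.math.toy$data[ grep("^M", colnames(data.math.toy$data)) ]
colMeans(resp)   # item p-values (proportion correct)
```

With the real dataset, replace data.math.toy by data.math after data(data.math).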
Some datasets from McDonald (1999), especially related to using NOHARM for item response modeling. See Examples below.
data(data.mcdonald.act15)
data(data.mcdonald.LSAT6)
data(data.mcdonald.rape)
The format of the ACT15 data data.mcdonald.act15 is:
num [1:15, 1:15] 0.49 0.44 0.38 0.3 0.29 0.13 0.23 0.16 0.16 0.23 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:15] "A01" "A02" "A03" "A04" ...
..$ : chr [1:15] "A01" "A02" "A03" "A04" ...
The dataset (which is the product-moment covariance matrix)
is obtained from Ch. 12 in McDonald (1999).
The format of the LSAT6 data data.mcdonald.LSAT6 is:
'data.frame': 1004 obs. of 5 variables:
$ L1: int 0 0 0 0 0 0 0 0 0 0 ...
$ L2: int 0 0 0 0 0 0 0 0 0 0 ...
$ L3: int 0 0 0 0 0 0 0 0 0 0 ...
$ L4: int 0 0 0 0 0 0 0 0 0 1 ...
$ L5: int 0 0 0 1 1 1 1 1 1 0 ...
The dataset is obtained from Ch. 6 in McDonald (1999).
The format of the rape myth scale data data.mcdonald.rape is:
List of 2
$ lambda: num [1:2, 1:19] 1.13 0.88 0.85 0.77 0.79 0.55 1.12 1.01 0.99 0.79 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "male" "female"
.. ..$ : chr [1:19] "I1" "I2" "I3" "I4" ...
$ nu : num [1:2, 1:19] 2.88 1.87 3.12 2.32 2.13 1.43 3.79 2.6 3.01 2.11 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "male" "female"
.. ..$ : chr [1:19] "I1" "I2" "I3" "I4" ...
The dataset is obtained from Ch. 15 in McDonald (1999).
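The group-comparison formulas applied to this dataset in the Examples below can be illustrated with made-up numbers. The lambda and nu matrices here are invented for the sketch (not the shipped data.mcdonald.rape values):

```r
# Hypothetical loadings (lambda) and thresholds (nu) for two groups
lambda <- rbind(male=c(1.2, 0.9, 1.0), female=c(1.0, 0.8, 0.9))
nu     <- rbind(male=c(2.5, 2.0, 2.2), female=c(2.0, 1.6, 1.9))
# multiplier for factor loadings (Formula 15.5 in McDonald, 1999)
k <- sum(lambda[1,] * lambda[2,]) / sum(lambda[2,]^2)
# additive parameter (Formula 15.7)
c0 <- sum(lambda[2,] * (nu[1,] - nu[2,])) / sum(lambda[2,]^2)
# implied SD and mean in the second (female) group
c(SD=1/k, M=-c0/k)
```

The same two lines of arithmetic are applied to the real lambda and nu matrices in Example 3 below.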
Tables in McDonald (1999)
McDonald, R. P. (1999). Test theory: A unified treatment. Psychology Press.
## Not run: 
#############################################################################
# EXAMPLE 1: LSAT6 data | Chapter 6 McDonald (1999)
#############################################################################

data(data.mcdonald.LSAT6)
dat <- data.mcdonald.LSAT6
noharm.path <- "c:/NOHARM"   # define path where NOHARM is located

#************
# Model 1: 2-parameter normal ogive model

#++ NOHARM estimation
I <- ncol(dat)
# covariance structure
P.pattern <- matrix( 0, ncol=1, nrow=1 )
P.init <- 1+0*P.pattern
# loading matrix
F.pattern <- matrix( 1, nrow=I, ncol=1 )
F.init <- F.pattern
# estimate model
mod1a <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
            F.init=F.init, P.pattern=P.pattern, P.init=P.init,
            writename="LSAT6__1dim_2pno", noharm.path=noharm.path, dec="," )
summary(mod1a, logfile="LSAT6__1dim_2pno__SUMMARY")

#++ pairwise marginal maximum likelihood estimation using the probit link
mod1b <- sirt::rasch.pml3( dat, est.a=1:I, est.sigma=FALSE)

#************
# Model 2: 1-parameter normal ogive model

#++ NOHARM estimation
# covariance structure
P.pattern <- matrix( 0, ncol=1, nrow=1 )
P.init <- 1+0*P.pattern
# fix all entries in the loading matrix to 1
F.pattern <- matrix( 2, nrow=I, ncol=1 )
F.init <- 1+0*F.pattern
# estimate model
mod2a <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
            F.init=F.init, P.pattern=P.pattern, P.init=P.init,
            writename="LSAT6__1dim_1pno", noharm.path=noharm.path, dec="," )
summary(mod2a, logfile="LSAT6__1dim_1pno__SUMMARY")

# PMML estimation
mod2b <- sirt::rasch.pml3( dat, est.a=rep(1,I), est.sigma=FALSE )
summary(mod2b)

#************
# Model 3: 3-parameter normal ogive model with fixed guessing parameters

#++ NOHARM estimation
# covariance structure
P.pattern <- matrix( 0, ncol=1, nrow=1 )
P.init <- 1+0*P.pattern
# loading matrix
F.pattern <- matrix( 1, nrow=I, ncol=1 )
F.init <- 1+0*F.pattern
# estimate model
mod <- sirt::R2noharm( dat=dat, model.type="CFA", guesses=rep(.2,I),
            F.pattern=F.pattern, F.init=F.init, P.pattern=P.pattern,
            P.init=P.init, writename="LSAT6__1dim_3pno",
            noharm.path=noharm.path, dec="," )
summary(mod, logfile="LSAT6__1dim_3pno__SUMMARY")

#++ logistic link function employed in smirt function
mod1d <- sirt::smirt(dat, Qmatrix=F.pattern, est.a=matrix(1:I,I,1), c.init=rep(.2,I))
summary(mod1d)

#############################################################################
# EXAMPLE 2: ACT15 data | Chapter 12 McDonald (1999)
#############################################################################

data(data.mcdonald.act15)
pm <- data.mcdonald.act15

#************
# Model 1: 2-dimensional exploratory factor analysis
mod1 <- sirt::R2noharm( pm=pm, n=1000, model.type="EFA", dimensions=2,
            writename="ACT15__efa_2dim", noharm.path=noharm.path, dec="," )
summary(mod1)

#************
# Model 2: 2-dimensional independent clusters basis solution
P.pattern <- matrix(1,2,2)
diag(P.pattern) <- 0
P.init <- 1+0*P.pattern
F.pattern <- matrix(0,15,2)
F.pattern[ c(1:5,11:15),1] <- 1
F.pattern[ c(6:10,11:15),2] <- 1
F.init <- F.pattern
# estimate model
mod2 <- sirt::R2noharm( pm=pm, n=1000, model.type="CFA", F.pattern=F.pattern,
            F.init=F.init, P.pattern=P.pattern, P.init=P.init,
            writename="ACT15_indep_clusters", noharm.path=noharm.path, dec="," )
summary(mod2)

#************
# Model 3: Hierarchical model
P.pattern <- matrix(0,3,3)
P.init <- P.pattern
diag(P.init) <- 1
F.pattern <- matrix(0,15,3)
F.pattern[,1] <- 1                  # all items load on g factor
F.pattern[ c(1:5,11:15),2] <- 1     # Items 1-5 and 11-15 load on first nested factor
F.pattern[ c(6:10,11:15),3] <- 1    # Items 6-10 and 11-15 load on second nested factor
F.init <- F.pattern
# estimate model
mod3 <- sirt::R2noharm( pm=pm, n=1000, model.type="CFA", F.pattern=F.pattern,
            F.init=F.init, P.pattern=P.pattern, P.init=P.init,
            writename="ACT15_hierarch_model", noharm.path=noharm.path, dec="," )
summary(mod3)

#############################################################################
# EXAMPLE 3: Rape myth scale | Chapter 15 McDonald (1999)
#############################################################################

data(data.mcdonald.rape)
lambda <- data.mcdonald.rape$lambda
nu <- data.mcdonald.rape$nu

# obtain multiplier for factor loadings (Formula 15.5)
k <- sum( lambda[1,] * lambda[2,] ) / sum( lambda[2,]^2 )
  ## [1] 1.263243
# additive parameter (Formula 15.7)
c <- sum( lambda[2,]*(nu[1,]-nu[2,]) ) / sum( lambda[2,]^2 )
  ## [1] 1.247697
# SD in the female group
1/k
  ## [1] 0.7916132
# M in the female group
- c/k
  ## [1] -0.9876932
# Burt's coefficient of factorial congruence (Formula 15.10a)
sum( lambda[1,] * lambda[2,] ) / sqrt( sum( lambda[1,]^2 ) * sum( lambda[2,]^2 ) )
  ## [1] 0.9727831
# congruence for mean parameters
sum( (nu[1,]-nu[2,]) * lambda[2,] ) / sqrt( sum( (nu[1,]-nu[2,])^2 ) * sum( lambda[2,]^2 ) )
  ## [1] 0.968176

## End(Not run)
Dataset with mixed dichotomous and polytomous item responses.
data(data.mixed1)
data(data.mixed1)
A data frame with 1000 observations on the following 37 variables.
'data.frame': 1000 obs. of 37 variables:
$ I01: num 1 1 1 1 1 1 1 0 1 1 ...
$ I02: num 1 1 1 1 1 1 1 1 0 1 ...
[...]
$ I36: num 1 1 1 1 0 0 0 0 1 1 ...
$ I37: num 0 1 1 1 0 1 0 0 1 1 ...
data(data.mixed1)
apply( data.mixed1, 2, max )
  ## I01 I02 I03 I04 I05 I06 I07 I08 I09 I10 I11 I12 I13 I14 I15 I16
  ##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
  ## I17 I18 I19 I20 I21 I22 I23 I24 I25 I26 I27 I28 I29 I30 I31 I32
  ##   1   1   1   1   4   4   1   1   1   1   1   1   1   1   1   1
  ## I33 I34 I35 I36 I37
  ##   1   1   1   1   1
Datasets for conducting multilevel IRT analysis. These datasets are used in the examples of the function mcmc.2pno.ml.
data(data.ml1)
data(data.ml2)
data.ml1
A data frame with 2000 student observations in 100 classes on 17 variables.
The first variable group contains the class identifier.
The remaining 16 variables are dichotomous test items.
'data.frame': 2000 obs. of 17 variables:
$ group: num 1001 1001 1001 1001 1001 ...
$ X1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ X2 : num 1 1 1 0 1 1 1 1 1 1 ...
$ X3 : num 0 1 1 0 1 0 1 0 1 0 ...
$ X4 : num 1 1 1 0 0 1 1 1 1 1 ...
$ X5 : num 0 0 0 1 1 1 0 0 1 1 ...
[...]
$ X16 : num 0 0 1 0 0 0 1 0 0 0 ...
data.ml2
A data frame with 2000 student observations in 100 classes on 6 variables.
The first variable group contains the class identifier.
The remaining 5 variables are polytomous test items.
'data.frame': 2000 obs. of 6 variables:
$ group: num 1 1 1 1 1 1 1 1 1 1 ...
$ X1 : num 2 3 4 3 3 3 1 4 4 3 ...
$ X2 : num 2 2 4 3 3 2 2 3 4 3 ...
$ X3 : num 3 4 5 4 2 3 3 4 4 2 ...
$ X4 : num 2 3 3 2 1 3 1 4 4 3 ...
$ X5 : num 2 3 3 2 3 3 1 3 2 2 ...
Datasets for analyses in NOHARM (see R2noharm).
data(data.noharmExC)
data(data.noharm18)
data.noharmExC
The format of this dataset is
'data.frame': 300 obs. of 8 variables:
$ C1: int 1 1 1 1 1 0 1 1 1 1 ...
$ C2: int 1 1 1 1 0 1 1 1 1 1 ...
$ C3: int 1 1 1 1 1 0 0 0 1 1 ...
$ C4: int 0 0 1 1 1 1 1 0 1 0 ...
$ C5: int 1 1 1 1 1 0 0 1 1 0 ...
$ C6: int 1 0 0 0 1 0 1 1 0 1 ...
$ C7: int 1 1 0 0 1 1 0 0 0 1 ...
$ C8: int 1 0 1 0 1 0 1 0 1 1 ...
data.noharm18
A data frame with 200 observations on the following 18 variables I01, ..., I18. The format is
'data.frame': 200 obs. of 18 variables:
$ I01: int 1 1 1 1 1 0 1 1 0 1 ...
$ I02: int 1 1 0 1 1 0 1 1 1 1 ...
$ I03: int 1 0 0 1 0 0 1 1 0 1 ...
$ I04: int 0 1 0 1 0 0 0 1 1 1 ...
$ I05: int 1 0 0 0 1 0 1 1 0 1 ...
$ I06: int 1 1 0 1 0 0 1 1 0 1 ...
$ I07: int 1 1 1 1 0 1 1 1 1 1 ...
$ I08: int 1 1 1 1 1 1 1 1 0 1 ...
$ I09: int 1 1 1 1 0 0 1 1 0 1 ...
$ I10: int 1 0 0 1 1 0 1 1 0 1 ...
$ I11: int 1 1 1 1 0 0 1 1 0 1 ...
$ I12: int 0 0 0 0 0 1 0 0 0 0 ...
$ I13: int 1 1 1 1 0 1 1 0 1 1 ...
$ I14: int 1 1 1 0 1 0 1 1 0 1 ...
$ I15: int 1 1 1 0 0 1 1 1 0 1 ...
$ I16: int 1 1 0 1 1 0 1 0 1 1 ...
$ I17: int 0 1 0 0 0 0 1 1 0 1 ...
$ I18: int 0 0 0 0 0 0 0 0 1 0 ...
The datasets contain item parameters to be prepared for linking using the function linking.haberman.
data(data.pars1.rasch)
data(data.pars1.2pl)
The format of data.pars1.rasch is:
'data.frame': 22 obs. of 4 variables:
$ study: chr "study1" "study1" "study1" "study1" ...
$ item : Factor w/ 12 levels "M133","M176",..: 1 2 3 4 5 1 6 7 3 8 ...
$ a : num 1 1 1 1 1 1 1 1 1 1 ...
$ b : num -1.5862 0.40762 1.78031 2.00382 0.00862 ...
Item slopes a are fixed to 1 in 1PL estimation. Item difficulties are denoted by b.
The format of data.pars1.2pl is:
'data.frame': 22 obs. of 4 variables:
$ study: chr "study1" "study1" "study1" "study1" ...
$ item : Factor w/ 12 levels "M133","M176",..: 1 2 3 4 5 1 6 7 3 8 ...
$ a : num 1.238 0.957 1.83 1.927 2.298 ...
$ b : num -1.16607 0.35844 1.06571 1.17159 0.00792 ...
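This long format (one row per study-item combination, with common items repeated across studies) is what linking functions such as linking.haberman expect. A minimal sketch with invented values (not the shipped data.pars1.rasch) shows how the common items that identify the link can be located:

```r
# Hypothetical item parameters in the long study/item format
pars_long <- data.frame(study=rep(c("study1","study2"), each=3),
                        item=c("M133","M176","M202", "M133","M176","M211"),
                        a=1,   # 1PL: slopes fixed to 1
                        b=c(-1.59, 0.41, 1.78, -1.31, 0.62, 0.95))
# common items appear in both studies and identify the link
common <- intersect(pars_long$item[pars_long$study=="study1"],
                    pars_long$item[pars_long$study=="study2"])
common
```

Items present in only one study (here M202 and M211) contribute no linking information.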
This is a dataset from the PIRLS 2011 study for fourth graders, covering reading booklet 13 (the 'PIRLS reader') and 4 countries (Austria, Germany, France, Netherlands). Missing responses (missing by intention and not reached) are coded as 9.
data(data.pirlsmissing)
A data frame with 3480 observations on the following 38 variables.
The format is:
'data.frame': 3480 obs. of 38 variables:
$ idstud : int 1000001 1000002 1000003 1000004 1000005 ...
$ country : Factor w/ 4 levels "AUT","DEU","FRA",..: 1 1 1 1 1 1 1 1 1 1 ...
$ studwgt : num 1.06 1.06 1.06 1.06 1.06 ...
$ R31G01M : int 1 1 1 1 1 1 0 1 1 0 ...
$ R31G02C : int 0 9 0 1 0 0 0 0 1 0 ...
$ R31G03M : int 1 1 1 1 0 1 0 0 1 1 ...
[...]
$ R31P15C : int 1 9 0 1 0 0 0 0 1 0 ...
$ R31P16C : int 0 0 0 0 0 0 0 9 0 1 ...
data(data.pirlsmissing)

# inspect missing rates
round( colMeans( data.pirlsmissing==9 ), 3 )
  ##   idstud  country  studwgt  R31G01M  R31G02C  R31G03M  R31G04C  R31G05M
  ##    0.000    0.000    0.000    0.009    0.076    0.012    0.203    0.018
  ##  R31G06M  R31G07M R31G08CZ R31G08CA R31G08CB  R31G09M  R31G10C  R31G11M
  ##    0.010    0.020    0.189    0.225    0.252    0.019    0.126    0.023
  ##  R31G12C R31G13CZ R31G13CA R31G13CB R31G13CC  R31G14M  R31P01M  R31P02C
  ##    0.202    0.170    0.198    0.220    0.223    0.074    0.013    0.039
  ##  R31P03C  R31P04M  R31P05C  R31P06C  R31P07C  R31P08M  R31P09C  R31P10M
  ##    0.056    0.012    0.075    0.043    0.074    0.024    0.062    0.025
  ##  R31P11M  R31P12M  R31P13M  R31P14C  R31P15C  R31P16C
  ##    0.027    0.030    0.030    0.126    0.130    0.127
This is an example dataset of mathematics items from the PISA 2009 study of students from Austria. The dataset contains 565 students who worked on the 11 mathematics items from item cluster M3.
data(data.pisaMath)
The dataset is a list. The list element data contains the dataset with the demographic variables student ID (idstud), school ID (idschool), a dummy variable for female students (female), socioeconomic status (hisei) and migration background (migra). The remaining variables (names starting with M) are the mathematics items.
The item metadata are included in the list element item, which contains the item name (item) and the testlet label (testlet). An item not included in a testlet is indicated by NA.
The format is:
List of 2
$ data:'data.frame':
..$ idstud : num [1:565] 9e+10 9e+10 9e+10 9e+10 9e+10 ...
..$ idschool: int [1:565] 900015 900015 900015 900015 ...
..$ female : int [1:565] 0 0 0 0 0 0 0 0 0 0 ...
..$ hisei : num [1:565] -1.16 -1.099 -1.588 -0.365 -1.588 ...
..$ migra : int [1:565] 0 0 0 0 0 0 0 0 0 1 ...
..$ M192Q01 : int [1:565] 1 0 1 1 1 1 1 0 0 0 ...
..$ M406Q01 : int [1:565] 1 1 1 0 1 0 0 0 1 0 ...
..$ M406Q02 : int [1:565] 1 0 0 0 1 0 0 0 1 0 ...
..$ M423Q01 : int [1:565] 0 1 0 1 1 1 1 1 1 0 ...
..$ M496Q01 : int [1:565] 1 0 0 0 0 0 0 0 1 0 ...
..$ M496Q02 : int [1:565] 1 0 0 1 0 1 0 1 1 0 ...
..$ M564Q01 : int [1:565] 1 1 1 1 1 1 0 0 1 0 ...
..$ M564Q02 : int [1:565] 1 0 1 1 1 0 0 0 0 0 ...
..$ M571Q01 : int [1:565] 1 0 0 0 1 0 0 0 0 0 ...
..$ M603Q01 : int [1:565] 1 0 0 0 1 0 0 0 0 0 ...
..$ M603Q02 : int [1:565] 1 0 0 0 1 0 0 0 1 0 ...
$ item:'data.frame':
..$ item : Factor w/ 11 levels "M192Q01","M406Q01",..: 1 2 3 4 ...
..$ testlet: chr [1:11] NA "M406" "M406" NA ...
This data frame contains item parameters from two PISA studies. Because the Rasch model is used, only item difficulties are considered.
data(data.pisaPars)
A data frame with 25 observations on the following 4 variables.
item
Item names
testlet
The testlet to which an item belongs; items are arranged in corresponding testlets.
study1
Item difficulties of study 1
study2
Item difficulties of study 2
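Because only Rasch item difficulties are given, a simple mean-mean linking constant between the two studies is just the average difficulty shift on the common items. A sketch with hypothetical difficulties (not the shipped data.pisaPars values):

```r
# Hypothetical difficulties mimicking the data.pisaPars columns
pars <- data.frame(item=paste0("I", 1:5),
                   study1=c(-1.2, 0.3, 0.8, -0.5, 0.6),
                   study2=c(-1.0, 0.5, 1.1, -0.2, 0.9))
# mean-mean linking constant: average shift in item difficulty
shift <- mean(pars$study2 - pars$study1)
shift
```

Subtracting this constant from the study2 difficulties places both studies on a common scale; more refined approaches are implemented in linking functions of the sirt package.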
This is an example PISA dataset of reading items from the PISA 2009 study of students from Austria. The dataset contains 623 students who worked on the 12 reading items from item cluster R7.
data(data.pisaRead)
The dataset is a list. The list element data contains the dataset with the demographic variables student ID (idstud), school ID (idschool), a dummy variable for female students (female), socioeconomic status (hisei) and migration background (migra). The remaining variables (names starting with R) are the reading items.
The item metadata are included in the list element item, which contains the item name (item), testlet label (testlet), item format (ItemFormat), text type (TextType) and text aspect (Aspect).
The format is:
List of 2
$ data:'data.frame':
..$ idstud : num [1:623] 9e+10 9e+10 9e+10 9e+10 9e+10 ...
..$ idschool: int [1:623] 900003 900003 900003 900003 ...
..$ female : int [1:623] 1 0 1 0 0 0 1 0 1 0 ...
..$ hisei : num [1:623] -1.16 -0.671 1.286 0.185 1.225 ...
..$ migra : int [1:623] 0 0 0 0 0 0 0 0 0 0 ...
..$ R432Q01 : int [1:623] 1 1 1 1 1 1 1 1 1 1 ...
..$ R432Q05 : int [1:623] 1 1 1 1 1 0 1 1 1 0 ...
..$ R432Q06 : int [1:623] 0 0 0 0 0 0 0 0 0 0 ...
..$ R456Q01 : int [1:623] 1 1 1 1 1 1 1 1 1 1 ...
..$ R456Q02 : int [1:623] 1 1 1 1 1 1 1 1 1 1 ...
..$ R456Q06 : int [1:623] 1 1 1 1 1 1 0 0 1 1 ...
..$ R460Q01 : int [1:623] 1 1 0 0 0 0 0 1 1 1 ...
..$ R460Q05 : int [1:623] 1 1 1 1 1 1 1 1 1 1 ...
..$ R460Q06 : int [1:623] 0 1 1 1 1 1 0 0 1 1 ...
..$ R466Q02 : int [1:623] 0 1 0 1 1 0 1 0 0 1 ...
..$ R466Q03 : int [1:623] 0 0 0 1 0 0 0 1 0 1 ...
..$ R466Q06 : int [1:623] 0 1 1 1 1 1 0 1 1 1 ...
$ item:'data.frame':
..$ item : Factor w/ 12 levels "R432Q01","R432Q05",..: 1 2 3 4 ...
..$ testlet : Factor w/ 4 levels "R432","R456",..: 1 1 1 2 ...
..$ ItemFormat: Factor w/ 2 levels "CR","MC": 1 1 2 2 1 1 1 2 2 1 ...
..$ TextType : Factor w/ 3 levels "Argumentation",..: 1 1 1 3 ...
..$ Aspect : Factor w/ 3 levels "Access_and_retrieve",..: 2 3 2 1 ...
Some datasets for pairwise comparisons.
data(data.pw01)
The dataset data.pw01 contains results of a German football league from the season 2000/01.
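Match results of this kind are a typical input for pairwise-comparison models. A toy sketch with invented results (not the shipped data.pw01) shows how per-team win totals can be tabulated as a first step:

```r
# Invented match results: 1=home win, 0=away win, 0.5=draw
games <- data.frame(home=c("A","B","C","A"),
                    away=c("B","C","A","C"),
                    result=c(1, 0, 0.5, 1))
# "wins" credited to each team from home and away perspective
wins <- c(tapply(games$result, games$home, sum, default=0),
          tapply(1 - games$result, games$away, sum, default=0))
tapply(wins, names(wins), sum)   # total wins per team (draws count 0.5)
```

Such win counts are the sufficient statistics of the Bradley-Terry model for paired comparisons.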
Some rating datasets.
data(data.ratings1)
data(data.ratings2)
data(data.ratings3)
Dataset data.ratings1:
Data frame with 274 observations containing 5 criteria (k1, ..., k5), 135 students and 7 raters.
'data.frame': 274 obs. of 7 variables:
$ idstud: int 100020106 100020106 100070101 100070101 100100109 ...
$ rater : Factor w/ 16 levels "db01","db02",..: 3 15 5 10 2 1 5 4 1 5 ...
$ k1 : int 1 1 0 1 2 0 1 3 0 0 ...
$ k2 : int 1 1 1 1 1 0 0 3 0 0 ...
$ k3 : int 1 1 1 1 2 0 0 3 1 0 ...
$ k4 : int 1 1 1 2 1 0 0 2 0 1 ...
$ k5 : int 2 2 1 2 0 1 0 3 1 0 ...
Data from a 2009 Austrian survey of national educational standards for 8th graders in German language writing. Variables k1 to k5 denote several rating criteria of writing competency.
Dataset data.ratings2:
Data frame with 615 observations containing 5 criteria (k1, ..., k5), 178 students and 16 raters.
'data.frame': 615 obs. of 7 variables:
$ idstud: num 1001 1001 1002 1002 1003 ...
$ rater : chr "R03" "R15" "R05" "R10" ...
$ k1 : int 1 1 0 1 2 0 1 3 3 0 ...
$ k2 : int 1 1 1 1 1 0 0 3 3 0 ...
$ k3 : int 1 1 1 1 2 0 0 3 3 1 ...
$ k4 : int 1 1 1 2 1 0 0 2 2 0 ...
$ k5 : int 2 2 1 2 0 1 0 3 2 1 ...
Dataset data.ratings3:
Data frame with 3169 observations containing 4 criteria (crit2, ..., crit6), 561 students and 52 raters.
'data.frame': 3169 obs. of 6 variables:
$ idstud: num 10001 10001 10002 10002 10003 ...
$ rater : num 840 838 842 808 830 845 813 849 809 802 ...
$ crit2 : int 1 3 3 1 2 2 2 2 3 3 ...
$ crit3 : int 2 2 2 2 2 2 2 2 3 3 ...
$ crit4 : int 1 2 2 2 1 1 1 2 2 2 ...
$ crit6 : num 4 4 4 3 4 4 4 4 4 4 ...
Dataset with raw item responses.
data(data.raw1)
A data frame with raw item responses of 1200 persons on the following 77 items:
'data.frame': 1200 obs. of 77 variables:
$ I101: num 0 0 0 2 0 0 0 0 0 0 ...
$ I102: int NA NA 2 1 2 1 3 2 NA NA ...
$ I103: int 1 1 NA NA NA NA NA NA 1 1 ...
...
$ I179: chr "E" "C" "D" "E" ...
This dataset contains student responses to items measuring reading competence. All 12 items are arranged into 3 testlets (items sharing a common text stimulus) labeled A, B and C. The allocation of items to testlets is indicated by the variable names.
data(data.read)
A data frame with 328 persons on the following 12 variables.
Rows correspond to persons and columns to items. The following items are included in data.read:
Testlet A: A1, A2, A3, A4
Testlet B: B1, B2, B3, B4
Testlet C: C1, C2, C3, C4
## Not run: 
data(data.read)
dat <- data.read
I <- ncol(dat)

# list of needed packages for the following examples
packages <- scan(what="character")
     eRm  ltm  TAM  mRm  CDM  mirt  psychotools  IsingFit  igraph
     qgraph  pcalg  poLCA  randomLCA  psychomix  MplusAutomation  lavaan

# load packages. make an installation if necessary
miceadds::library_install(packages)

#*****************************************************
# Model 1: Rasch model
#*****************************************************

#---- M1a: rasch.mml2 (in sirt)
mod1a <- sirt::rasch.mml2(dat)
summary(mod1a)

#---- M1b: smirt (in sirt)
Qmatrix <- matrix(1,nrow=I, ncol=1)
mod1b <- sirt::smirt(dat,Qmatrix=Qmatrix)
summary(mod1b)

#---- M1c: gdm (in CDM)
theta.k <- seq(-6,6,len=21)
mod1c <- CDM::gdm(dat,theta.k=theta.k,irtmodel="1PL", skillspace="normal")
summary(mod1c)

#---- M1d: tam.mml (in TAM)
mod1d <- TAM::tam.mml( resp=dat )
summary(mod1d)

#---- M1e: RM (in eRm)
# eRm uses Conditional Maximum Likelihood (CML) as the estimation method.
mod1e <- eRm::RM( dat )
summary(mod1e)
eRm::plotPImap(mod1e)

#---- M1f: mrm (in mRm)
mod1f <- mRm::mrm( dat, cl=1)   # CML estimation
mod1f$beta                      # item parameters

#---- M1g: mirt (in mirt)
mod1g <- mirt::mirt( dat, model=1, itemtype="Rasch", verbose=TRUE )
print(mod1g)
summary(mod1g)
coef(mod1g)
# arrange coefficients in nicer layout
sirt::mirt.wrapper.coef(mod1g)$coef

#---- M1h: rasch (in ltm)
mod1h <- ltm::rasch( dat, control=list(verbose=TRUE ) )
summary(mod1h)
coef(mod1h)

#---- M1i: RaschModel.fit (in psychotools)
mod1i <- psychotools::RaschModel.fit(dat)   # CML estimation
summary(mod1i)
plot(mod1i)

#---- M1j: noharm.sirt (in sirt)
Fpatt <- matrix( 0, I, 1 )
Fval <- 1 + 0*Fpatt
Ppatt <- Pval <- matrix(1,1,1)
mod1j <- sirt::noharm.sirt( dat=dat, Ppatt=Ppatt, Fpatt=Fpatt, Fval=Fval, Pval=Pval)
summary(mod1j)
# Normal-ogive model, multiply item discriminations with constant D=1.7.
# The same holds for other examples with noharm.sirt and R2noharm.
plot(mod1j)

#---- M1k: rasch.pml3 (in sirt)
# pairwise marginal maximum likelihood estimation
mod1k <- sirt::rasch.pml3( dat=dat)
summary(mod1k)

#---- M1l: running Mplus (using MplusAutomation package)
mplus_path <- "c:/Mplus7/Mplus.exe"   # locate Mplus executable
#****************
# specify Mplus object
mplusmod <- MplusAutomation::mplusObject(
    TITLE="1PL in Mplus ;",
    VARIABLE=paste0( "CATEGORICAL ARE ", paste0(colnames(dat),collapse=" ") ),
    MODEL="
       ! fix all item loadings to 1
       F1 BY A1@1 A2@1 A3@1 A4@1 ;
       F1 BY B1@1 B2@1 B3@1 B4@1 ;
       F1 BY C1@1 C2@1 C3@1 C4@1 ;
       ! estimate variance
       F1 ;
         ",
    ANALYSIS="ESTIMATOR=MLR;",
    OUTPUT="stand;",
    usevariables=colnames(dat), rdata=dat )
#****************
# write Mplus syntax
filename <- "mod1u"   # specify file name
# create Mplus syntaxes
res2 <- MplusAutomation::mplusModeler(object=mplusmod, dataout=paste0(filename,".dat"),
              modelout=paste0(filename,".inp"), run=0 )
# run Mplus model
MplusAutomation::runModels( filefilter=paste0(filename,".inp"), Mplus_command=mplus_path)
# alternatively, the system() command can also be used
# get results
mod1l <- MplusAutomation::readModels(target=getwd(), filefilter=filename )
mod1l$summaries                       # summaries
mod1l$parameters$unstandardized       # parameter estimates

#*****************************************************
# Model 2: 2PL model
#*****************************************************

#---- M2a: rasch.mml2 (in sirt)
mod2a <- sirt::rasch.mml2(dat, est.a=1:I)
summary(mod2a)

#---- M2b: smirt (in sirt)
mod2b <- sirt::smirt(dat,Qmatrix=Qmatrix,est.a="2PL")
summary(mod2b)

#---- M2c: gdm (in CDM)
mod2c <- CDM::gdm(dat,theta.k=theta.k,irtmodel="2PL", skillspace="normal")
summary(mod2c)

#---- M2d: tam.mml (in TAM)
mod2d <- TAM::tam.mml.2pl( resp=dat )
summary(mod2d)

#---- M2e: mirt (in mirt)
mod2e <- mirt::mirt( dat, model=1, itemtype="2PL" )
print(mod2e)
summary(mod2e)
sirt::mirt.wrapper.coef(mod2e)$coef

#---- M2f: ltm (in ltm)
mod2f <- ltm::ltm( dat ~ z1, control=list(verbose=TRUE ) )
summary(mod2f)
coef(mod2f)
plot(mod2f)

#---- M2g: R2noharm (in NOHARM, running from within R using sirt package)
# define noharm.path where 'NoharmCL.exe' is located
noharm.path <- "c:/NOHARM"
# covariance matrix
P.pattern <- matrix( 1, ncol=1, nrow=1 )
P.init <- P.pattern
P.init[1,1] <- 1
# loading matrix
F.pattern <- matrix(1,I,1)
F.init <- F.pattern
# estimate model
mod2g <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
            F.init=F.init, P.pattern=P.pattern, P.init=P.init,
            writename="ex2g", noharm.path=noharm.path, dec="," )
summary(mod2g)

#---- M2h: noharm.sirt (in sirt)
mod2h <- sirt::noharm.sirt( dat=dat, Ppatt=P.pattern, Fpatt=F.pattern,
              Fval=F.init, Pval=P.init )
summary(mod2h)
plot(mod2h)

#---- M2i: rasch.pml2 (in sirt)
mod2i <- sirt::rasch.pml2(dat, est.a=1:I)
summary(mod2i)

#---- M2j: WLSMV estimation with cfa (in lavaan)
lavmodel <- "F=~ A1+A2+A3+A4+B1+B2+B3+B4+
                 C1+C2+C3+C4"
mod2j <- lavaan::cfa( data=dat, model=lavmodel, std.lv=TRUE, ordered=colnames(dat))
summary(mod2j, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)

#*****************************************************
# Model 3: 3PL model (note that results can be quite unstable!)
#*****************************************************

#---- M3a: rasch.mml2 (in sirt)
mod3a <- sirt::rasch.mml2(dat, est.a=1:I, est.c=1:I)
summary(mod3a)

#---- M3b: smirt (in sirt)
mod3b <- sirt::smirt(dat,Qmatrix=Qmatrix,est.a="2PL", est.c=1:I)
summary(mod3b)

#---- M3c: mirt (in mirt)
mod3c <- mirt::mirt( dat, model=1, itemtype="3PL", verbose=TRUE)
summary(mod3c)
coef(mod3c)
# stabilize parameter estimating using informative priors for guessing parameters
mirtmodel <- mirt::mirt.model("
        F=1-12
        PRIOR=(1-12, g, norm, -1.38, 0.25)
        ")
# a prior N(-1.38,.25) is specified for transformed guessing parameters: qlogis(g)
# simulate values from this prior for illustration
N <- 100000
logit.g <- stats::rnorm(N, mean=-1.38, sd=sqrt(.5) )
graphics::plot( stats::density(logit.g) )                    # transformed qlogis(g)
graphics::plot( stats::density( stats::plogis(logit.g)) )    # g parameters
# estimate 3PL with priors
mod3c1 <- mirt::mirt(dat, mirtmodel, itemtype="3PL",verbose=TRUE)
coef(mod3c1)
# In addition, set upper bounds for g parameters of .35
mirt.pars <- mirt::mirt( dat, mirtmodel, itemtype="3PL", pars="values")
ind <- which( mirt.pars$name=="g" )
mirt.pars[ ind, "value" ] <- stats::plogis(-1.38)
mirt.pars[ ind, "ubound" ] <- .35
# prior distribution for slopes
ind <- which( mirt.pars$name=="a1" )
mirt.pars[ ind, "prior_1" ] <- 1.3
mirt.pars[ ind, "prior_2" ] <- 2
mod3c2 <- mirt::mirt(dat, mirtmodel, itemtype="3PL", pars=mirt.pars,
                verbose=TRUE, technical=list(NCYCLES=100) )
coef(mod3c2)
sirt::mirt.wrapper.coef(mod3c2)

#---- M3d: ltm (in ltm)
mod3d <- ltm::tpm( dat, control=list(verbose=TRUE), max.guessing=.3)
summary(mod3d)
coef(mod3d)
#=> numerical instabilities

#*****************************************************
# Model 4: 3-dimensional Rasch model
#*****************************************************

# define Q-matrix
Q <- matrix( 0, nrow=12, ncol=3 )
Q[ cbind(1:12, rep(1:3,each=4) ) ] <- 1
rownames(Q) <- colnames(dat)
colnames(Q) <- c("A","B","C")
# define nodes
theta.k <- seq(-6,6,len=13)

#---- M4a: smirt (in sirt)
mod4a <- sirt::smirt(dat,Qmatrix=Q,irtmodel="comp", theta.k=theta.k, maxiter=30)
summary(mod4a)

#---- M4b: rasch.mml2 (in sirt)
mod4b <- sirt::rasch.mml2(dat,Q=Q,theta.k=theta.k, mmliter=30)
summary(mod4b)

#---- M4c: gdm (in CDM)
mod4c <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal",
              Qmatrix=Q, maxiter=30, centered.latent=TRUE )
summary(mod4c)

#---- M4d: tam.mml (in TAM)
mod4d <- TAM::tam.mml( resp=dat, Q=Q, control=list(nodes=theta.k, maxiter=30) )
summary(mod4d)

#---- M4e: R2noharm (in NOHARM, running from within R using sirt package)
noharm.path <- "c:/NOHARM"
# covariance matrix
P.pattern <- matrix( 1, ncol=3, nrow=3 )
P.init <- 0.8+0*P.pattern
diag(P.init) <- 1
# loading matrix
F.pattern <- 0*Q
F.init <- Q
# estimate model
mod4e <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
            F.init=F.init, P.pattern=P.pattern, P.init=P.init,
            writename="ex4e", noharm.path=noharm.path, dec="," )
summary(mod4e)

#---- M4f: mirt (in mirt)
cmodel <- mirt::mirt.model("
     F1=1-4
     F2=5-8
     F3=9-12
     # equal item slopes correspond to the Rasch model
     CONSTRAIN=(1-4, a1), (5-8, a2), (9-12,a3)
     COV=F1*F2, F1*F3, F2*F3
     " )
mod4f <- mirt::mirt(dat, cmodel, verbose=TRUE)
summary(mod4f)

#*****************************************************
# Model 5: 3-dimensional 2PL model
#*****************************************************

#---- M5a: smirt (in sirt)
mod5a <- sirt::smirt(dat,Qmatrix=Q,irtmodel="comp", est.a="2PL", theta.k=theta.k,
             maxiter=30)
summary(mod5a)

#---- M5b: rasch.mml2 (in sirt)
mod5b <- sirt::rasch.mml2(dat,Q=Q,theta.k=theta.k,est.a=1:12, mmliter=30)
summary(mod5b)

#---- M5c: gdm (in CDM)
mod5c <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, skillspace="loglinear",
             Qmatrix=Q, maxiter=30, centered.latent=TRUE,
             standardized.latent=TRUE)
summary(mod5c)

#---- M5d: tam.mml (in TAM)
mod5d <- TAM::tam.mml.2pl( resp=dat, Q=Q, control=list(nodes=theta.k, maxiter=30) )
summary(mod5d)

#---- M5e: R2noharm (in NOHARM, running from within R using sirt package)
noharm.path <- "c:/NOHARM"
# covariance matrix
P.pattern <- matrix( 1, ncol=3, nrow=3 )
diag(P.pattern) <- 0
P.init <- 0.8+0*P.pattern
diag(P.init) <- 1
# loading matrix
F.pattern <- Q
F.init <- Q
# estimate model
mod5e <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
            F.init=F.init, P.pattern=P.pattern, P.init=P.init,
            writename="ex5e", noharm.path=noharm.path, dec="," )
summary(mod5e)

#---- M5f: mirt (in mirt)
cmodel <- mirt::mirt.model("
     F1=1-4
     F2=5-8
     F3=9-12
     COV=F1*F2, F1*F3, F2*F3
     " )
mod5f <- mirt::mirt(dat, cmodel, verbose=TRUE)
summary(mod5f)

#*****************************************************
# Model 6: Network models (Graphical models)
#*****************************************************

#---- M6a: Ising model using the IsingFit package (undirected graph)
# - fit Ising model using the "OR rule" (AND=FALSE)
mod6a <- IsingFit::IsingFit(x=dat, family="binomial", AND=FALSE)
summary(mod6a)
  ##  Network Density:                 0.29
  ##            Gamma:                 0.25
  ##        Rule used:              Or-rule
# plot results
qgraph::qgraph(mod6a$weiadj,fade=FALSE)

#**-- graph estimation using pcalg package
# some packages from Bioconductor must be downloaded at first (if not yet done)
if (FALSE){   # set 'if (TRUE)' if packages should be downloaded
    source("http://bioconductor.org/biocLite.R")
    biocLite("RBGL")
    biocLite("Rgraphviz")
}

#---- M6b: graph estimation based on Pearson correlations
V <- colnames(dat)
n <- nrow(dat)
mod6b <- pcalg::pc(suffStat=list(C=stats::cor(dat), n=n ),
            indepTest=gaussCItest,   ## indep.test: partial correlations
            alpha=0.05, labels=V, verbose=TRUE)
plot(mod6b)
# plot in qgraph package
qgraph::qgraph(mod6b, label.color=rep( c( "red", "blue","darkgreen" ), each=4 ),
        edge.color="black")
summary(mod6b)

#---- M6c: graph estimation based on tetrachoric correlations
mod6c <- pcalg::pc(suffStat=list(C=sirt::tetrachoric2(dat)$rho, n=n ),
            indepTest=gaussCItest, alpha=0.05, labels=V, verbose=TRUE)
plot(mod6c)
summary(mod6c)

#---- M6d: Statistical implicative analysis (in sirt)
mod6d <- sirt::sia.sirt(dat, significance=.85 )
# plot results with igraph and qgraph package
plot( mod6d$igraph.obj, vertex.shape="rectangle", vertex.size=30 )
qgraph::qgraph( mod6d$adj.matrix )

#*****************************************************
# Model 7: Latent class analysis with 3 classes
#*****************************************************

#---- M7a: randomLCA (in randomLCA)
# - use two trials of starting values
mod7a <- randomLCA::randomLCA(dat,nclass=3, notrials=2, verbose=TRUE)
summary(mod7a)
plot(mod7a,type="l", xlab="Item")

#---- M7b: rasch.mirtlc (in sirt)
mod7b <- sirt::rasch.mirtlc( dat, Nclasses=3,seed=-30, nstarts=2 )
summary(mod7b)
matplot( t(mod7b$pjk), type="l", xlab="Item" )

#---- M7c: poLCA (in poLCA)
# define formula for outcomes
f7c <- paste0( "cbind(", paste0(colnames(dat),collapse=","), ") ~ 1 " )
dat1 <- as.data.frame( dat + 1 )   # poLCA needs integer values from 1,2,..
mod7c <- poLCA::poLCA( stats::as.formula(f7c),dat1,nclass=3, verbose=TRUE)
plot(mod7c)

#---- M7d: gom.em (in sirt)
# - the latent class model is a special grade of membership model
mod7d <- sirt::gom.em( dat, K=3, problevels=c(0,1), model="GOM" )
summary(mod7d)

#---- M7e: mirt (in mirt)
# define three latent classes
Theta <- diag(3)
# define mirt model
I <- ncol(dat)   # I=12
mirtmodel <- mirt::mirt.model("
        C1=1-12
        C2=1-12
        C3=1-12
        ")
# get initial parameter values
mod.pars <- mirt::mirt(dat, model=mirtmodel, pars="values")
# modify parameters: only slopes refer to item-class probabilities
set.seed(9976)
# set starting values for class specific item probabilities
mod.pars[ mod.pars$name=="d","value" ] <- 0
mod.pars[ mod.pars$name=="d","est" ] <- FALSE
b1 <- stats::qnorm( colMeans( dat ) )
mod.pars[ mod.pars$name=="a1","value" ] <- b1
# random starting values for other classes
mod.pars[ mod.pars$name %in% c("a2","a3"),"value" ] <- b1 + stats::runif(12*2,-1,1)
mod.pars
#** define prior for latent class analysis
lca_prior <- 
function(Theta,Etable){ # number of latent Theta classes TP <- nrow(Theta) # prior in initial iteration if ( is.null(Etable) ){ prior <- rep( 1/TP, TP ) } # process Etable (this is correct for datasets without missing data) if ( ! is.null(Etable) ){ # sum over correct and incorrect expected responses prior <- ( rowSums(Etable[, seq(1,2*I,2)]) + rowSums(Etable[,seq(2,2*I,2)]) )/I } prior <- prior / sum(prior) return(prior) } #** estimate model mod7e <- mirt::mirt(dat, mirtmodel, pars=mod.pars, verbose=TRUE, technical=list( customTheta=Theta, customPriorFun=lca_prior) ) # compare estimated results print(mod7e) summary(mod7b) # The number of estimated parameters is incorrect because mirt does not correctly count # estimated parameters from the user customized prior distribution. mod7e@nest <- as.integer(sum(mod.pars$est) + 2) # two additional class probabilities # extract log-likelihood mod7e@logLik # compute AIC and BIC ( AIC <- -2*mod7e@logLik+2*mod7e@nest ) ( BIC <- -2*mod7e@logLik+log(mod7e@Data$N)*mod7e@nest ) # RMSEA and SRMSR fit statistic mirt::M2(mod7e) # TLI and CFI does not make sense in this example #** extract item parameters sirt::mirt.wrapper.coef(mod7e) #** extract class-specific item-probabilities probs <- apply( coef1[, c("a1","a2","a3") ], 2, stats::plogis ) matplot( probs, type="l", xlab="Item", main="mirt::mirt") #** inspect estimated distribution mod7e@Theta mod7e@Prior[[1]] #***************************************************** # Model 8: Mixed Rasch model with two classes #***************************************************** #---- M8a: raschmix (in psychomix) mod8a <- psychomix::raschmix(data=as.matrix(dat), k=2, scores="saturated") summary(mod8a) #---- M8b: mrm (in mRm) mod8b <- mRm::mrm(data.matrix=dat, cl=2) mod8b$conv.to.bound plot(mod8b) print(mod8b) #---- M8c: mirt (in mirt) #* define theta grid theta.k <- seq( -5, 5, len=9 ) TP <- length(theta.k) Theta <- matrix( 0, nrow=2*TP, ncol=4) Theta[1:TP,1:2] <- cbind(theta.k, 1 ) Theta[1:TP + 
TP,3:4] <- cbind(theta.k, 1 ) Theta # define model I <- ncol(dat) # I=12 mirtmodel <- mirt::mirt.model(" F1a=1-12 # slope Class 1 F1b=1-12 # difficulty Class 1 F2a=1-12 # slope Class 2 F2b=1-12 # difficulty Class 2 CONSTRAIN=(1-12,a1),(1-12,a3) ") # get initial parameter values mod.pars <- mirt::mirt(dat, model=mirtmodel, pars="values") # set starting values for class specific item probabilities mod.pars[ mod.pars$name=="d","value" ] <- 0 mod.pars[ mod.pars$name=="d","est" ] <- FALSE mod.pars[ mod.pars$name=="a1","value" ] <- 1 mod.pars[ mod.pars$name=="a3","value" ] <- 1 # initial values difficulties b1 <- stats::qlogis( colMeans(dat) ) mod.pars[ mod.pars$name=="a2","value" ] <- b1 mod.pars[ mod.pars$name=="a4","value" ] <- b1 + stats::runif(I, -1, 1) #* define prior for mixed Rasch analysis mixed_prior <- function(Theta,Etable){ NC <- 2 # number of theta classes TP <- nrow(Theta) / NC prior1 <- stats::dnorm( Theta[1:TP,1] ) prior1 <- prior1 / sum(prior1) if ( is.null(Etable) ){ prior <- c( prior1, prior1 ) } if ( ! is.null(Etable) ){ prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) + rowSums( Etable[,seq(2,2*I,2)]) )/I a1 <- stats::aggregate( prior, list( rep(1:NC, each=TP) ), sum ) a1[,2] <- a1[,2] / sum( a1[,2]) # print some information during estimation cat( paste0( " Class proportions: ", paste0( round(a1[,2], 3 ), collapse=" " ) ), "\n") a1 <- rep( a1[,2], each=TP ) # specify mixture of two normal distributions prior <- a1*c(prior1,prior1) } prior <- prior / sum(prior) return(prior) } #* estimate model mod8c <- mirt::mirt(dat, mirtmodel, pars=mod.pars, verbose=TRUE, technical=list( customTheta=Theta, customPriorFun=mixed_prior ) ) # Like in Model 7e, the number of estimated parameters must be included. mod8c@nest <- as.integer(sum(mod.pars$est) + 1) # two class proportions and therefore one probability is freely estimated. 
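The custom prior functions above (`lca_prior`, `mixed_prior`) must return a probability vector over the rows of the `Theta` grid that sums to one; mirt calls them with `Etable=NULL` in the first iteration. This contract can be checked in plain R without running mirt — a minimal sketch, where the random `Etable` below is a purely illustrative stand-in for the expected-count table mirt would supply:

```r
# standalone contract check for a customPriorFun as used in Models 7e and 8c
I <- 12                  # number of items, as for dat
Theta <- diag(3)         # three latent classes (Model 7e grid)
lca_prior <- function(Theta, Etable){
    TP <- nrow(Theta)
    if ( is.null(Etable) ){   # uniform start-up prior
        prior <- rep( 1/TP, TP )
    } else {                  # expected counts -> class proportions
        prior <- ( rowSums(Etable[, seq(1,2*I,2)]) +
                   rowSums(Etable[, seq(2,2*I,2)]) ) / I
    }
    prior / sum(prior)
}
# first iteration: Etable is NULL
p0 <- lca_prior(Theta, NULL)
stopifnot( length(p0)==3, abs(sum(p0) - 1) < 1e-12 )
# later iterations: a fake Etable with 2*I columns (correct/incorrect pairs)
Etable <- matrix( stats::runif(3*2*I), nrow=3, ncol=2*I )
p1 <- lca_prior(Theta, Etable)
stopifnot( abs(sum(p1) - 1) < 1e-12, all(p1 > 0) )
```

The same check applies to `mixed_prior`, whose `Theta` has `2*TP` rows (one block per class).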
#* extract item parameters
sirt::mirt.wrapper.coef(mod8c)
#* estimated distribution
mod8c@Theta
mod8c@Prior

#---- M8d: tamaan (in TAM)
tammodel <- "
ANALYSIS:
  TYPE=MIXTURE ;
  NCLASSES(2);
  NSTARTS(7,20);
LAVAAN MODEL:
  F=~ A1__C4
  F ~~ F
ITEM TYPE:
  ALL(Rasch);
    "
mod8d <- TAM::tamaan( tammodel, resp=dat )
summary(mod8d)
# plot item parameters
I <- 12
ipars <- mod8d$itempartable_MIXTURE[ 1:I, ]
plot( 1:I, ipars[,3], type="o", ylim=range( ipars[,3:4] ), pch=16,
        xlab="Item", ylab="Item difficulty")
lines( 1:I, ipars[,4], type="l", col=2, lty=2)
points( 1:I, ipars[,4], col=2, pch=2)

#*****************************************************
# Model 9: Mixed 2PL model with two classes
#*****************************************************

#---- M9a: tamaan (in TAM)
tammodel <- "
ANALYSIS:
  TYPE=MIXTURE ;
  NCLASSES(2);
  NSTARTS(10,30);
LAVAAN MODEL:
  F=~ A1__C4
  F ~~ F
ITEM TYPE:
  ALL(2PL);
    "
mod9a <- TAM::tamaan( tammodel, resp=dat )
summary(mod9a)

#*****************************************************
# Model 10: Rasch testlet model
#*****************************************************

#---- M10a: tam.fa (in TAM)
dims <- substring( colnames(dat), 1, 1 )   # define dimensions
mod10a <- TAM::tam.fa( resp=dat, irtmodel="bifactor1", dims=dims,
              control=list(maxiter=60) )
summary(mod10a)

#---- M10b: mirt (in mirt)
cmodel <- mirt::mirt.model("
        G=1-12
        A=1-4
        B=5-8
        C=9-12
        CONSTRAIN=(1-12,a1), (1-4,a2), (5-8,a3), (9-12,a4)
        ")
mod10b <- mirt::mirt(dat, model=cmodel, verbose=TRUE)
summary(mod10b)
coef(mod10b)
mod10b@logLik   # equivalent is slot( mod10b, "logLik")
# alternatively, use a dimensional reduction approach (faster and better accuracy)
cmodel <- mirt::mirt.model("
        G=1-12
        CONSTRAIN=(1-12,a1), (1-4,a2), (5-8,a3), (9-12,a4)
        ")
item_bundles <- rep(c(1,2,3), each=4)
mod10b1 <- mirt::bfactor(dat, model=item_bundles, model2=cmodel, verbose=TRUE)
coef(mod10b1)

#---- M10c: smirt (in sirt)
# define Q-matrix
Qmatrix <- matrix(0, 12, 4)
Qmatrix[,1] <- 1
Qmatrix[ cbind( 1:12, match( dims, unique(dims)) + 1 ) ] <- 1
# uncorrelated factors
variance.fixed <- cbind( c(1,1,1,2,2,3), c(2,3,4,3,4,4), 0 )
# estimate model
mod10c <- sirt::smirt( dat, Qmatrix=Qmatrix, irtmodel="comp",
              variance.fixed=variance.fixed, qmcnodes=1000, maxiter=60)
summary(mod10c)

#*****************************************************
# Model 11: Bifactor model
#*****************************************************

#---- M11a: tam.fa (in TAM)
dims <- substring( colnames(dat), 1, 1 )   # define dimensions
mod11a <- TAM::tam.fa( resp=dat, irtmodel="bifactor2", dims=dims,
              control=list(maxiter=60) )
summary(mod11a)

#---- M11b: bfactor (in mirt)
dims1 <- match( dims, unique(dims) )
mod11b <- mirt::bfactor(dat, model=dims1, verbose=TRUE)
summary(mod11b)
coef(mod11b)
mod11b@logLik

#---- M11c: smirt (in sirt)
# define Q-matrix
Qmatrix <- matrix(0, 12, 4)
Qmatrix[,1] <- 1
Qmatrix[ cbind( 1:12, match( dims, unique(dims)) + 1 ) ] <- 1
# uncorrelated factors
variance.fixed <- cbind( c(1,1,1,2,2,3), c(2,3,4,3,4,4), 0 )
# estimate model
mod11c <- sirt::smirt( dat, Qmatrix=Qmatrix, irtmodel="comp", est.a="2PL",
              variance.fixed=variance.fixed, qmcnodes=1000, maxiter=60)
summary(mod11c)

#*****************************************************
# Model 12: Located latent class model: Rasch model with three theta classes
#*****************************************************

# use the 10th item as the reference item
ref.item <- 10
# ability grid
theta.k <- seq(-4, 4, len=9)

#---- M12a: rasch.mirtlc (in sirt)
mod12a <- sirt::rasch.mirtlc(dat, Nclasses=3, modeltype="MLC1", ref.item=ref.item)
summary(mod12a)

#---- M12b: gdm (in CDM)
theta.k <- seq(-1, 1, len=3)   # initial matrix
b.constraint <- matrix( c(10,1,0), nrow=1, ncol=3)
# estimate model
mod12b <- CDM::gdm( dat, theta.k=theta.k, skillspace="est", irtmodel="1PL",
              b.constraint=b.constraint, maxiter=200)
summary(mod12b)

#---- M12c: mirt (in mirt)
items <- colnames(dat)
# define three latent classes
Theta <- diag(3)
# define mirt model
I <- ncol(dat)   # I=12
mirtmodel <- mirt::mirt.model("
        C1=1-12
        C2=1-12
        C3=1-12
        CONSTRAIN=(1-12,a1), (1-12,a2), (1-12,a3)
        ")
# get parameters
mod.pars <- mirt::mirt(dat, model=mirtmodel, pars="values")
# set starting values for class specific item probabilities
mod.pars[ mod.pars$name=="d", "value" ] <- stats::qlogis( colMeans(dat, na.rm=TRUE) )
# set item difficulty of reference item to zero
ind <- which( ( paste(mod.pars$item)==items[ref.item] ) &
                ( paste(mod.pars$name)=="d" ) )
mod.pars[ ind, "value" ] <- 0
mod.pars[ ind, "est" ] <- FALSE
# initial values for a1, a2 and a3
mod.pars[ mod.pars$name %in% c("a1","a2","a3"), "value" ] <- c(-1,0,1)
mod.pars
#* define prior for latent class analysis
lca_prior <- function(Theta, Etable){
    # number of latent Theta classes
    TP <- nrow(Theta)
    # prior in initial iteration
    if ( is.null(Etable) ){
        prior <- rep( 1/TP, TP )
    }
    # process Etable (this is correct for datasets without missing data)
    if ( ! is.null(Etable) ){
        # sum over correct and incorrect expected responses
        prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) +
                   rowSums( Etable[, seq(2,2*I,2)] ) ) / I
    }
    prior <- prior / sum(prior)
    return(prior)
}
#* estimate model
mod12c <- mirt::mirt(dat, mirtmodel, technical=list( customTheta=Theta,
              customPriorFun=lca_prior), pars=mod.pars, verbose=TRUE )
# The number of estimated parameters is incorrect because mirt does not correctly count
# estimated parameters from the user customized prior distribution.
mod12c@nest <- as.integer(sum(mod.pars$est) + 2)
#* extract item parameters
coef1 <- sirt::mirt.wrapper.coef(mod12c)
#* inspect estimated distribution
mod12c@Theta
coef1$coef[1, c("a1","a2","a3")]
mod12c@Prior[[1]]

#*****************************************************
# Model 13: Multidimensional model with discrete traits
#*****************************************************

# define Q-matrix
Q <- matrix( 0, nrow=12, ncol=3)
Q[1:4,1] <- 1
Q[5:8,2] <- 1
Q[9:12,3] <- 1
# define discrete theta distribution with 3 dimensions
Theta <- scan(what="character", nlines=1)
  000 100 010 001 110 101 011 111
Theta <- as.numeric( unlist( lapply( Theta, strsplit, split="") ) )
Theta <- matrix(Theta, 8, 3, byrow=TRUE )
Theta

#---- Model 13a: din (in CDM)
mod13a <- CDM::din( dat, q.matrix=Q, rule="DINA")
summary(mod13a)
# compare used Theta distributions
cbind( Theta, mod13a$attribute.patt.splitted)

#---- Model 13b: gdm (in CDM)
mod13b <- CDM::gdm( dat, Qmatrix=Q, theta.k=Theta, skillspace="full")
summary(mod13b)

#---- Model 13c: mirt (in mirt)
# define mirt model
I <- ncol(dat)   # I=12
mirtmodel <- mirt::mirt.model("
        F1=1-4
        F2=5-8
        F3=9-12
        ")
# get parameters
mod.pars <- mirt::mirt(dat, model=mirtmodel, pars="values")
# starting values for d parameters (transformed guessing parameters)
ind <- which( mod.pars$name=="d" )
mod.pars[ind, "value"] <- stats::qlogis(.2)
# starting values for transformed slipping parameters
ind <- which( ( mod.pars$name %in% paste0("a",1:3) ) & ( mod.pars$est ) )
mod.pars[ind, "value"] <- stats::qlogis(.8) - stats::qlogis(.2)
mod.pars
#* define prior for latent class analysis
lca_prior <- function(Theta, Etable){
    TP <- nrow(Theta)
    if ( is.null(Etable) ){
        prior <- rep( 1/TP, TP )
    }
    if ( ! is.null(Etable) ){
        prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) +
                   rowSums( Etable[, seq(2,2*I,2)] ) ) / I
    }
    prior <- prior / sum(prior)
    return(prior)
}
#* estimate model
mod13c <- mirt::mirt(dat, mirtmodel, technical=list( customTheta=Theta,
              customPriorFun=lca_prior), pars=mod.pars, verbose=TRUE )
# The number of estimated parameters must again be corrected because mirt does not count
# estimated parameters from the user customized prior distribution.
mod13c@nest <- as.integer(sum(mod.pars$est) + 2)
#* extract item parameters
coef13c <- sirt::mirt.wrapper.coef(mod13c)$coef
#* inspect estimated distribution
mod13c@Theta
mod13c@Prior[[1]]

#-* comparisons of estimated parameters
# extract guessing and slipping parameters from din
dfr <- coef(mod13a)[, c("guess","slip") ]
colnames(dfr) <- paste0("din.", c("guess","slip") )
# estimated parameters from gdm
dfr$gdm.guess <- stats::plogis(mod13b$item$b)
dfr$gdm.slip <- 1 - stats::plogis(
          rowSums(mod13b$item[, c("b.Cat1","a.F1","a.F2","a.F3")] ) )
# estimated parameters from mirt
dfr$mirt.guess <- stats::plogis( coef13c$d )
dfr$mirt.slip <- 1 - stats::plogis( rowSums(coef13c[, c("d","a1","a2","a3")]) )
# comparison
round(dfr[, c(1,3,5,2,4,6)], 3)
  ##    din.guess gdm.guess mirt.guess din.slip gdm.slip mirt.slip
  ## A1     0.691     0.684      0.686    0.000    0.000     0.000
  ## A2     0.491     0.489      0.489    0.031    0.038     0.036
  ## A3     0.302     0.300      0.300    0.184    0.193     0.190
  ## A4     0.244     0.239      0.240    0.337    0.340     0.339
  ## B1     0.568     0.579      0.577    0.163    0.148     0.151
  ## B2     0.329     0.344      0.340    0.344    0.326     0.329
  ## B3     0.817     0.827      0.825    0.014    0.007     0.009
  ## B4     0.431     0.463      0.456    0.104    0.089     0.092
  ## C1     0.188     0.191      0.189    0.013    0.013     0.013
  ## C2     0.050     0.050      0.050    0.239    0.238     0.239
  ## C3     0.000     0.002      0.001    0.065    0.065     0.065
  ## C4     0.000     0.004      0.000    0.212    0.212     0.212
# estimated class sizes
dfr <- data.frame( "Theta"=Theta, "din"=mod13a$attribute.patt$class.prob,
          "gdm"=mod13b$pi.k, "mirt"=mod13c@Prior[[1]])
# comparison
round(dfr, 3)
  ##   Theta.1 Theta.2 Theta.3   din   gdm  mirt
  ## 1       0       0       0 0.039 0.041 0.040
  ## 2       1       0       0 0.008 0.009 0.009
  ## 3       0       1       0 0.009 0.007 0.008
  ## 4       0       0       1 0.394 0.417 0.412
  ## 5       1       1       0 0.011 0.011 0.011
  ## 6       1       0       1 0.017 0.042 0.037
  ## 7       0       1       1 0.042 0.008 0.016
  ## 8       1       1       1 0.480 0.465 0.467

#*****************************************************
# Model 14: DINA model with two skills
#*****************************************************

# define some simple Q-matrix (does not really make sense in this application)
Q <- matrix( 0, nrow=12, ncol=2)
Q[1:4,1] <- 1
Q[5:8,2] <- 1
Q[9:12,1:2] <- 1
# define discrete theta distribution with 2 dimensions
Theta <- scan(what="character", nlines=1)
  00 10 01 11
Theta <- as.numeric( unlist( lapply( Theta, strsplit, split="") ) )
Theta <- matrix(Theta, 4, 2, byrow=TRUE )
Theta

#---- Model 14a: din (in CDM)
mod14a <- CDM::din( dat, q.matrix=Q, rule="DINA")
summary(mod14a)
# compare used Theta distributions
cbind( Theta, mod14a$attribute.patt.splitted)

#---- Model 14b: mirt (in mirt)
# define mirt model
I <- ncol(dat)   # I=12
mirtmodel <- mirt::mirt.model("
        F1=1-4
        F2=5-8
        (F1*F2)=9-12
        ")
#-> constructions like (F1*F2*F3) are also allowed in mirt.model
# get parameters
mod.pars <- mirt::mirt(dat, model=mirtmodel, pars="values")
# starting values for d parameters (transformed guessing parameters)
ind <- which( mod.pars$name=="d" )
mod.pars[ind, "value"] <- stats::qlogis(.2)
# starting values for transformed slipping parameters
ind <- which( ( mod.pars$name %in% paste0("a",1:3) ) & ( mod.pars$est ) )
mod.pars[ind, "value"] <- stats::qlogis(.8) - stats::qlogis(.2)
mod.pars
#* use the above defined prior lca_prior
# lca_prior <- function(Theta,Etable) ...
#* estimate model
mod14b <- mirt::mirt(dat, mirtmodel, technical=list( customTheta=Theta,
              customPriorFun=lca_prior), pars=mod.pars, verbose=TRUE )
# The number of estimated parameters must again be corrected because mirt does not count
# estimated parameters from the user customized prior distribution.
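The skill-pattern matrices for Models 13 and 14 are typed in interactively via `scan()`, which does not work in a non-interactive script. A sketch of an equivalent non-interactive construction using `expand.grid` (note that the row order differs from the scanned version, so any comparison tables would be ordered differently):

```r
# all 2^3 binary skill patterns (Model 13 grid), without interactive scan()
Theta3 <- as.matrix( expand.grid( c(0,1), c(0,1), c(0,1) ) )
colnames(Theta3) <- NULL
stopifnot( nrow(Theta3)==8, ncol(Theta3)==3 )
# all 2^2 binary skill patterns (Model 14 grid)
Theta2 <- as.matrix( expand.grid( c(0,1), c(0,1) ) )
colnames(Theta2) <- NULL
stopifnot( nrow(Theta2)==4, ncol(Theta2)==2 )
```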
mod14b@nest <- as.integer(sum(mod.pars$est) + 2)
#* extract item parameters
coef14b <- sirt::mirt.wrapper.coef(mod14b)$coef

#-* comparisons of estimated parameters
# extract guessing and slipping parameters from din
dfr <- coef(mod14a)[, c("guess","slip") ]
colnames(dfr) <- paste0("din.", c("guess","slip") )
# estimated parameters from mirt
dfr$mirt.guess <- stats::plogis( coef14b$d )
dfr$mirt.slip <- 1 - stats::plogis( rowSums(coef14b[, c("d","a1","a2","a3")]) )
# comparison
round(dfr[, c(1,3,2,4)], 3)
  ##    din.guess mirt.guess din.slip mirt.slip
  ## A1     0.674      0.671    0.030     0.030
  ## A2     0.423      0.420    0.049     0.050
  ## A3     0.258      0.255    0.224     0.225
  ## A4     0.245      0.243    0.394     0.395
  ## B1     0.534      0.543    0.166     0.164
  ## B2     0.338      0.347    0.382     0.380
  ## B3     0.796      0.802    0.016     0.015
  ## B4     0.421      0.436    0.142     0.140
  ## C1     0.850      0.851    0.000     0.000
  ## C2     0.480      0.480    0.097     0.097
  ## C3     0.746      0.746    0.026     0.026
  ## C4     0.575      0.577    0.136     0.137
# estimated class sizes
dfr <- data.frame( "Theta"=Theta, "din"=mod13a$attribute.patt$class.prob,
          "mirt"=mod14b@Prior[[1]])
# comparison
round(dfr, 3)
  ##   Theta.1 Theta.2   din  mirt
  ## 1       0       0 0.357 0.369
  ## 2       1       0 0.044 0.049
  ## 3       0       1 0.047 0.031
  ## 4       1       1 0.553 0.551

#*****************************************************
# Model 15: Rasch model with non-normal distribution
#*****************************************************

# A non-normal theta distribution is specified by log-linear smoothing of the
# distribution as described in
# Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model
# to NAEP data. ETS Research Report ETS RR-08-27. Princeton: ETS.

# define theta grid
theta.k <- matrix( seq(-4, 4, len=15), ncol=1 )
# define design matrix for smoothing (up to cubic moments)
delta.designmatrix <- cbind( 1, theta.k, theta.k^2, theta.k^3 )
# constrain item difficulty of the fifth item (item B1) to zero
b.constraint <- matrix( c(5,1,0), ncol=3 )

#---- Model 15a: gdm (in CDM)
mod15a <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k,
              b.constraint=b.constraint )
summary(mod15a)
# plot estimated distribution
graphics::barplot( mod15a$pi.k[,1], space=0, names.arg=round(theta.k[,1],2),
        main="Estimated Skewed Distribution (gdm function)")

#---- Model 15b: mirt (in mirt)
# define mirt model
mirtmodel <- mirt::mirt.model("
        F=1-12
        ")
# get parameters
mod.pars <- mirt::mirt(dat, model=mirtmodel, pars="values", itemtype="Rasch")
# fix variance (just for correct counting of parameters)
mod.pars[ mod.pars$name=="COV_11", "est"] <- FALSE
# fix item difficulty of item B1
ind <- which( ( mod.pars$item=="B1" ) & ( mod.pars$name=="d" ) )
mod.pars[ ind, "value"] <- 0
mod.pars[ ind, "est"] <- FALSE
# define prior
loglinear_prior <- function(Theta, Etable){
    TP <- nrow(Theta)
    if ( is.null(Etable) ){
        prior <- rep( 1/TP, TP )
    }
    # process Etable (this is correct for datasets without missing data)
    if ( ! is.null(Etable) ){
        # sum over correct and incorrect expected responses
        prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) +
                   rowSums( Etable[, seq(2,2*I,2)] ) ) / I
        # smooth prior using the above design matrix and a log-linear model,
        # see Xu & von Davier (2008)
        y <- log( prior + 1E-15 )
        lm1 <- lm( y ~ 0 + delta.designmatrix, weights=prior )
        prior <- exp(fitted(lm1))   # smoothed prior
    }
    prior <- prior / sum(prior)
    return(prior)
}
#* estimate model
mod15b <- mirt::mirt(dat, mirtmodel, technical=list( customTheta=theta.k,
              customPriorFun=loglinear_prior ), pars=mod.pars, verbose=TRUE )
# The number of estimated parameters must again be corrected because mirt does not count
# estimated parameters from the user customized prior distribution.
mod15b@nest <- as.integer(sum(mod.pars$est) + 3)
#* extract item parameters
coef1 <- sirt::mirt.wrapper.coef(mod15b)$coef

#** compare estimated item parameters
dfr <- data.frame( "gdm"=mod15a$item$b.Cat1, "mirt"=coef1$d )
rownames(dfr) <- colnames(dat)
round(t(dfr), 4)
  ##          A1     A2      A3      A4 B1      B2     B3     B4     C1    C2     C3    C4
  ## gdm  0.9818 0.1538 -0.7837 -1.3197  0 -1.0902 1.6088 -0.170 1.9778 0.006 1.1859 0.135
  ## mirt 0.9829 0.1548 -0.7826 -1.3186  0 -1.0892 1.6099 -0.169 1.9790 0.007 1.1870 0.136
# compare estimated theta distributions
dfr <- data.frame( "gdm"=mod15a$pi.k, "mirt"=mod15b@Prior[[1]] )
round(t(dfr), 4)
  ##      1 2     3     4      5      6      7      8      9     10     11     12     13
  ## gdm  0 0 1e-04 9e-04 0.0056 0.0231 0.0652 0.1299 0.1881 0.2038 0.1702 0.1129 0.0612
  ## mirt 0 0 1e-04 9e-04 0.0056 0.0232 0.0653 0.1300 0.1881 0.2038 0.1702 0.1128 0.0611
  ##          14    15
  ## gdm  0.0279 0.011
  ## mirt 0.0278 0.011

## End(Not run)
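The log-linear smoothing step inside `loglinear_prior` (Model 15b) can be tried in isolation: regress the log probabilities on the cubic design matrix with the probabilities as weights, exponentiate the fitted values, and renormalize. A sketch of the Xu and von Davier (2008) idea, using an arbitrary skewed toy distribution on the same theta grid:

```r
# theta grid and cubic design matrix as in Model 15
theta.k <- matrix( seq(-4, 4, len=15), ncol=1 )
delta.designmatrix <- cbind( 1, theta.k, theta.k^2, theta.k^3 )
# a skewed toy distribution on the grid (illustrative only)
prior <- exp( -0.5*(theta.k[,1] - 1)^2 ) * stats::plogis( theta.k[,1] )
prior <- prior / sum(prior)
# log-linear smoothing: weighted least squares on log probabilities
y <- log( prior + 1E-15 )
lm1 <- stats::lm( y ~ 0 + delta.designmatrix, weights=prior )
smoothed <- exp( stats::fitted(lm1) )
smoothed <- smoothed / sum(smoothed)   # renormalize to a distribution
stopifnot( abs(sum(smoothed) - 1) < 1e-12, all(smoothed > 0) )
```

Restricting the design matrix to cubic moments is what forces the estimated distribution to be smooth while still allowing skewness.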
## Not run:
data(data.read)
dat <- data.read
I <- ncol(dat)

# list of needed packages for the following examples
packages <- scan(what="character")
     eRm  ltm  TAM  mRm  CDM  mirt  psychotools  IsingFit  igraph
     qgraph  pcalg  poLCA  randomLCA  psychomix  MplusAutomation  lavaan

# load packages; install them first if necessary
miceadds::library_install(packages)

#*****************************************************
# Model 1: Rasch model
#*****************************************************

#---- M1a: rasch.mml2 (in sirt)
mod1a <- sirt::rasch.mml2(dat)
summary(mod1a)

#---- M1b: smirt (in sirt)
Qmatrix <- matrix(1, nrow=I, ncol=1)
mod1b <- sirt::smirt(dat, Qmatrix=Qmatrix)
summary(mod1b)

#---- M1c: gdm (in CDM)
theta.k <- seq(-6, 6, len=21)
mod1c <- CDM::gdm(dat, theta.k=theta.k, irtmodel="1PL", skillspace="normal")
summary(mod1c)

#---- M1d: tam.mml (in TAM)
mod1d <- TAM::tam.mml( resp=dat )
summary(mod1d)

#---- M1e: RM (in eRm)
# eRm uses Conditional Maximum Likelihood (CML) as the estimation method
mod1e <- eRm::RM( dat )
summary(mod1e)
eRm::plotPImap(mod1e)

#---- M1f: mrm (in mRm)
mod1f <- mRm::mrm( dat, cl=1)   # CML estimation
mod1f$beta                      # item parameters

#---- M1g: mirt (in mirt)
mod1g <- mirt::mirt( dat, model=1, itemtype="Rasch", verbose=TRUE )
print(mod1g)
summary(mod1g)
coef(mod1g)
# arrange coefficients in a nicer layout
sirt::mirt.wrapper.coef(mod1g)$coef

#---- M1h: rasch (in ltm)
mod1h <- ltm::rasch( dat, control=list(verbose=TRUE) )
summary(mod1h)
coef(mod1h)

#---- M1i: RaschModel.fit (in psychotools)
mod1i <- psychotools::RaschModel.fit(dat)   # CML estimation
summary(mod1i)
plot(mod1i)

#---- M1j: noharm.sirt (in sirt)
Fpatt <- matrix( 0, I, 1 )
Fval <- 1 + 0*Fpatt
Ppatt <- Pval <- matrix(1, 1, 1)
mod1j <- sirt::noharm.sirt( dat=dat, Ppatt=Ppatt, Fpatt=Fpatt, Fval=Fval, Pval=Pval)
summary(mod1j)
# This is a normal-ogive model: multiply item discriminations with the constant
# D=1.7 to obtain the logistic metric. The same holds for other examples with
# noharm.sirt and R2noharm.
plot(mod1j)

#---- M1k: rasch.pml3 (in sirt)
# pairwise marginal maximum likelihood estimation
mod1k <- sirt::rasch.pml3( dat=dat)
summary(mod1k)

#---- M1l: running Mplus (using the MplusAutomation package)
mplus_path <- "c:/Mplus7/Mplus.exe"   # locate Mplus executable
#****************
# specify Mplus object
mplusmod <- MplusAutomation::mplusObject(
    TITLE="1PL in Mplus ;",
    VARIABLE=paste0( "CATEGORICAL ARE ", paste0(colnames(dat), collapse=" ") ),
    MODEL="
       ! fix all item loadings to 1
       F1 BY A1@1 A2@1 A3@1 A4@1 ;
       F1 BY B1@1 B2@1 B3@1 B4@1 ;
       F1 BY C1@1 C2@1 C3@1 C4@1 ;
       ! estimate variance
       F1 ;
       ",
    ANALYSIS="ESTIMATOR=MLR;",
    OUTPUT="stand;",
    usevariables=colnames(dat), rdata=dat )
#****************
# write Mplus syntax
filename <- "mod1u"   # specify file name
# create Mplus syntaxes
res2 <- MplusAutomation::mplusModeler(object=mplusmod,
            dataout=paste0(filename, ".dat"),
            modelout=paste0(filename, ".inp"), run=0 )
# run Mplus model
MplusAutomation::runModels( filefilter=paste0(filename, ".inp"),
            Mplus_command=mplus_path)
# alternatively, the system() command can also be used
# get results
mod1l <- MplusAutomation::readModels(target=getwd(), filefilter=filename )
mod1l$summaries                   # summaries
mod1l$parameters$unstandardized   # parameter estimates

#*****************************************************
# Model 2: 2PL model
#*****************************************************

#---- M2a: rasch.mml2 (in sirt)
mod2a <- sirt::rasch.mml2(dat, est.a=1:I)
summary(mod2a)

#---- M2b: smirt (in sirt)
mod2b <- sirt::smirt(dat, Qmatrix=Qmatrix, est.a="2PL")
summary(mod2b)

#---- M2c: gdm (in CDM)
mod2c <- CDM::gdm(dat, theta.k=theta.k, irtmodel="2PL", skillspace="normal")
summary(mod2c)

#---- M2d: tam.mml (in TAM)
mod2d <- TAM::tam.mml.2pl( resp=dat )
summary(mod2d)

#---- M2e: mirt (in mirt)
mod2e <- mirt::mirt( dat, model=1, itemtype="2PL" )
print(mod2e)
summary(mod2e)
sirt::mirt.wrapper.coef(mod2e)$coef

#---- M2f: ltm (in ltm)
mod2f <- ltm::ltm( dat ~ z1, control=list(verbose=TRUE) )
summary(mod2f)
coef(mod2f)
plot(mod2f)

#---- M2g: R2noharm (in NOHARM, running from within R using sirt package)
# define noharm.path where 'NoharmCL.exe' is located
noharm.path <- "c:/NOHARM"
# covariance matrix
P.pattern <- matrix( 1, ncol=1, nrow=1 )
P.init <- P.pattern
P.init[1,1] <- 1
# loading matrix
F.pattern <- matrix(1, I, 1)
F.init <- F.pattern
# estimate model
mod2g <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
             F.init=F.init, P.pattern=P.pattern, P.init=P.init,
             writename="ex2g", noharm.path=noharm.path, dec="," )
summary(mod2g)

#---- M2h: noharm.sirt (in sirt)
mod2h <- sirt::noharm.sirt( dat=dat, Ppatt=P.pattern, Fpatt=F.pattern,
             Fval=F.init, Pval=P.init )
summary(mod2h)
plot(mod2h)

#---- M2i: rasch.pml2 (in sirt)
mod2i <- sirt::rasch.pml2(dat, est.a=1:I)
summary(mod2i)

#---- M2j: WLSMV estimation with cfa (in lavaan)
lavmodel <- "F=~ A1+A2+A3+A4+B1+B2+B3+B4+C1+C2+C3+C4"
mod2j <- lavaan::cfa( data=dat, model=lavmodel, std.lv=TRUE,
             ordered=colnames(dat))
summary(mod2j, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)

#*****************************************************
# Model 3: 3PL model (note that results can be quite unstable!)
#*****************************************************

#---- M3a: rasch.mml2 (in sirt)
mod3a <- sirt::rasch.mml2(dat, est.a=1:I, est.c=1:I)
summary(mod3a)

#---- M3b: smirt (in sirt)
mod3b <- sirt::smirt(dat, Qmatrix=Qmatrix, est.a="2PL", est.c=1:I)
summary(mod3b)

#---- M3c: mirt (in mirt)
mod3c <- mirt::mirt( dat, model=1, itemtype="3PL", verbose=TRUE)
summary(mod3c)
coef(mod3c)
# stabilize parameter estimation using informative priors for guessing parameters
mirtmodel <- mirt::mirt.model("
        F=1-12
        PRIOR=(1-12, g, norm, -1.38, 0.25)
        ")
# a prior N(-1.38,.25) is specified for the transformed guessing parameters qlogis(g)
# simulate values from this prior for illustration
N <- 100000
logit.g <- stats::rnorm(N, mean=-1.38, sd=sqrt(.5) )
graphics::plot( stats::density(logit.g) )                   # transformed qlogis(g)
graphics::plot( stats::density( stats::plogis(logit.g)) )   # g parameters
# estimate 3PL with priors
mod3c1 <- mirt::mirt(dat, mirtmodel, itemtype="3PL", verbose=TRUE)
coef(mod3c1)
# in addition, set upper bounds of .35 for the g parameters
mirt.pars <- mirt::mirt( dat, mirtmodel, itemtype="3PL", pars="values")
ind <- which( mirt.pars$name=="g" )
mirt.pars[ ind, "value" ] <- stats::plogis(-1.38)
mirt.pars[ ind, "ubound" ] <- .35
# prior distribution for slopes
ind <- which( mirt.pars$name=="a1" )
mirt.pars[ ind, "prior_1" ] <- 1.3
mirt.pars[ ind, "prior_2" ] <- 2
mod3c2 <- mirt::mirt(dat, mirtmodel, itemtype="3PL", pars=mirt.pars,
              verbose=TRUE, technical=list(NCYCLES=100) )
coef(mod3c2)
sirt::mirt.wrapper.coef(mod3c2)

#---- M3d: tpm (in ltm)
mod3d <- ltm::tpm( dat, control=list(verbose=TRUE), max.guessing=.3)
summary(mod3d)
coef(mod3d)   #=> numerical instabilities

#*****************************************************
# Model 4: 3-dimensional Rasch model
#*****************************************************

# define Q-matrix
Q <- matrix( 0, nrow=12, ncol=3 )
Q[ cbind(1:12, rep(1:3, each=4) ) ] <- 1
rownames(Q) <- colnames(dat)
colnames(Q) <- c("A","B","C")
# define nodes
theta.k <- seq(-6, 6, len=13)

#---- M4a: smirt (in sirt)
mod4a <- sirt::smirt(dat, Qmatrix=Q, irtmodel="comp", theta.k=theta.k, maxiter=30)
summary(mod4a)

#---- M4b: rasch.mml2 (in sirt)
mod4b <- sirt::rasch.mml2(dat, Q=Q, theta.k=theta.k, mmliter=30)
summary(mod4b)

#---- M4c: gdm (in CDM)
mod4c <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal",
             Qmatrix=Q, maxiter=30, centered.latent=TRUE )
summary(mod4c)

#---- M4d: tam.mml (in TAM)
mod4d <- TAM::tam.mml( resp=dat, Q=Q, control=list(nodes=theta.k, maxiter=30) )
summary(mod4d)

#---- M4e: R2noharm (in NOHARM, running from within R using sirt package)
noharm.path <- "c:/NOHARM"
# covariance matrix
P.pattern <- matrix( 1, ncol=3, nrow=3 )
P.init <- 0.8 + 0*P.pattern
diag(P.init) <- 1
# loading matrix
F.pattern <- 0*Q
F.init <- Q
# estimate model
mod4e <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
             F.init=F.init, P.pattern=P.pattern, P.init=P.init,
             writename="ex4e", noharm.path=noharm.path, dec="," )
summary(mod4e)

#---- M4f: mirt (in mirt)
cmodel <- mirt::mirt.model("
        F1=1-4
        F2=5-8
        F3=9-12
        # equal item slopes correspond to the Rasch model
        CONSTRAIN=(1-4, a1), (5-8, a2), (9-12, a3)
        COV=F1*F2, F1*F3, F2*F3
        ")
mod4f <- mirt::mirt(dat, cmodel, verbose=TRUE)
summary(mod4f)

#*****************************************************
# Model 5: 3-dimensional 2PL model
#*****************************************************

#---- M5a: smirt (in sirt)
mod5a <- sirt::smirt(dat, Qmatrix=Q, irtmodel="comp", est.a="2PL",
             theta.k=theta.k, maxiter=30)
summary(mod5a)

#---- M5b: rasch.mml2 (in sirt)
mod5b <- sirt::rasch.mml2(dat, Q=Q, theta.k=theta.k, est.a=1:12, mmliter=30)
summary(mod5b)

#---- M5c: gdm (in CDM)
mod5c <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, skillspace="loglinear",
             Qmatrix=Q, maxiter=30, centered.latent=TRUE, standardized.latent=TRUE)
summary(mod5c)

#---- M5d: tam.mml (in TAM)
mod5d <- TAM::tam.mml.2pl( resp=dat, Q=Q, control=list(nodes=theta.k, maxiter=30) )
summary(mod5d)

#---- M5e: R2noharm (in NOHARM, running from within R using sirt package)
noharm.path <- "c:/NOHARM"
# covariance matrix
P.pattern <- matrix( 1, ncol=3, nrow=3 )
diag(P.pattern) <- 0
P.init <- 0.8 + 0*P.pattern
diag(P.init) <- 1
# loading matrix
F.pattern <- Q
F.init <- Q
# estimate model
mod5e <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
             F.init=F.init, P.pattern=P.pattern, P.init=P.init,
             writename="ex5e", noharm.path=noharm.path, dec="," )
summary(mod5e)

#---- M5f: mirt (in mirt)
cmodel <- mirt::mirt.model("
        F1=1-4
        F2=5-8
        F3=9-12
        COV=F1*F2, F1*F3, F2*F3
        ")
mod5f <- mirt::mirt(dat, cmodel, verbose=TRUE)
summary(mod5f)

#*****************************************************
# Model 6: Network models (Graphical models)
#*****************************************************

#---- M6a: Ising model using the IsingFit package (undirected graph)
# - fit Ising model using the "OR rule" (AND=FALSE)
mod6a <- IsingFit::IsingFit(x=dat, family="binomial", AND=FALSE)
summary(mod6a)
  ## Network Density:    0.29
  ## Gamma:              0.25
  ## Rule used:       Or-rule
# plot results
qgraph::qgraph(mod6a$weiadj, fade=FALSE)

#**-- graph estimation using pcalg package
# some packages from Bioconductor must be downloaded at first (if not yet done)
if (FALSE){   # set 'if (TRUE)' if packages should be downloaded
    source("http://bioconductor.org/biocLite.R")
    biocLite("RBGL")
    biocLite("Rgraphviz")
}

#---- M6b: graph estimation based on Pearson correlations
V <- colnames(dat)
n <- nrow(dat)
mod6b <- pcalg::pc(suffStat=list(C=stats::cor(dat), n=n),
             indepTest=gaussCItest,   ## indep. test: partial correlations
             alpha=0.05, labels=V, verbose=TRUE)
plot(mod6b)
# plot in qgraph package
qgraph::qgraph(mod6b, label.color=rep( c("red","blue","darkgreen"), each=4 ),
             edge.color="black")
summary(mod6b)

#---- M6c: graph estimation based on tetrachoric correlations
mod6c <- pcalg::pc(suffStat=list(C=sirt::tetrachoric2(dat)$rho, n=n),
             indepTest=gaussCItest, alpha=0.05, labels=V, verbose=TRUE)
plot(mod6c)
summary(mod6c)

#---- M6d: statistical implicative analysis (in sirt)
mod6d <- sirt::sia.sirt(dat, significance=.85 )
# plot results with igraph and qgraph package
plot( mod6d$igraph.obj, vertex.shape="rectangle", vertex.size=30 )
qgraph::qgraph( mod6d$adj.matrix )

#*****************************************************
# Model 7: Latent class analysis with 3 classes
#*****************************************************

#---- M7a: randomLCA (in randomLCA)
# - use two trials of starting values
mod7a <- randomLCA::randomLCA(dat, nclass=3, notrials=2, verbose=TRUE)
summary(mod7a)
plot(mod7a, type="l", xlab="Item")

#---- M7b: rasch.mirtlc (in sirt)
mod7b <- sirt::rasch.mirtlc( dat, Nclasses=3, seed=-30, nstarts=2 )
summary(mod7b)
matplot( t(mod7b$pjk), type="l", xlab="Item" )

#---- M7c: poLCA (in poLCA)
# define formula for outcomes
f7c <- paste0( "cbind(", paste0(colnames(dat), collapse=","), ") ~ 1" )
dat1 <- as.data.frame( dat + 1 )   # poLCA needs integer values of 1, 2, ...
mod7c <- poLCA::poLCA( stats::as.formula(f7c), dat1, nclass=3, verbose=TRUE)
plot(mod7c)

#---- M7d: gom.em (in sirt)
# - the latent class model is a special grade of membership model
mod7d <- sirt::gom.em( dat, K=3, problevels=c(0,1), model="GOM" )
summary(mod7d)

#---- M7e: mirt (in mirt)
# define three latent classes
Theta <- diag(3)
# define mirt model
I <- ncol(dat)   # I=12
mirtmodel <- mirt::mirt.model("
        C1=1-12
        C2=1-12
        C3=1-12
        ")
# get initial parameter values
mod.pars <- mirt::mirt(dat, model=mirtmodel, pars="values")
# modify parameters: only slopes refer to item-class probabilities
set.seed(9976)
# set starting values for class specific item probabilities
mod.pars[ mod.pars$name=="d", "value" ] <- 0
mod.pars[ mod.pars$name=="d", "est" ] <- FALSE
b1 <- stats::qnorm( colMeans( dat ) )
mod.pars[ mod.pars$name=="a1", "value" ] <- b1
# random starting values for other classes
mod.pars[ mod.pars$name %in% c("a2","a3"), "value" ] <- b1 + stats::runif(12*2, -1, 1)
mod.pars
#** define prior for latent class analysis
lca_prior <-
function(Theta,Etable){ # number of latent Theta classes TP <- nrow(Theta) # prior in initial iteration if ( is.null(Etable) ){ prior <- rep( 1/TP, TP ) } # process Etable (this is correct for datasets without missing data) if ( ! is.null(Etable) ){ # sum over correct and incorrect expected responses prior <- ( rowSums(Etable[, seq(1,2*I,2)]) + rowSums(Etable[,seq(2,2*I,2)]) )/I } prior <- prior / sum(prior) return(prior) } #** estimate model mod7e <- mirt::mirt(dat, mirtmodel, pars=mod.pars, verbose=TRUE, technical=list( customTheta=Theta, customPriorFun=lca_prior) ) # compare estimated results print(mod7e) summary(mod7b) # The number of estimated parameters is incorrect because mirt does not correctly count # estimated parameters from the user customized prior distribution. mod7e@nest <- as.integer(sum(mod.pars$est) + 2) # two additional class probabilities # extract log-likelihood mod7e@logLik # compute AIC and BIC ( AIC <- -2*mod7e@logLik+2*mod7e@nest ) ( BIC <- -2*mod7e@logLik+log(mod7e@Data$N)*mod7e@nest ) # RMSEA and SRMSR fit statistic mirt::M2(mod7e) # TLI and CFI does not make sense in this example #** extract item parameters sirt::mirt.wrapper.coef(mod7e) #** extract class-specific item-probabilities probs <- apply( coef1[, c("a1","a2","a3") ], 2, stats::plogis ) matplot( probs, type="l", xlab="Item", main="mirt::mirt") #** inspect estimated distribution mod7e@Theta mod7e@Prior[[1]] #***************************************************** # Model 8: Mixed Rasch model with two classes #***************************************************** #---- M8a: raschmix (in psychomix) mod8a <- psychomix::raschmix(data=as.matrix(dat), k=2, scores="saturated") summary(mod8a) #---- M8b: mrm (in mRm) mod8b <- mRm::mrm(data.matrix=dat, cl=2) mod8b$conv.to.bound plot(mod8b) print(mod8b) #---- M8c: mirt (in mirt) #* define theta grid theta.k <- seq( -5, 5, len=9 ) TP <- length(theta.k) Theta <- matrix( 0, nrow=2*TP, ncol=4) Theta[1:TP,1:2] <- cbind(theta.k, 1 ) Theta[1:TP + 
TP,3:4] <- cbind(theta.k, 1 ) Theta # define model I <- ncol(dat) # I=12 mirtmodel <- mirt::mirt.model(" F1a=1-12 # slope Class 1 F1b=1-12 # difficulty Class 1 F2a=1-12 # slope Class 2 F2b=1-12 # difficulty Class 2 CONSTRAIN=(1-12,a1),(1-12,a3) ") # get initial parameter values mod.pars <- mirt::mirt(dat, model=mirtmodel, pars="values") # set starting values for class specific item probabilities mod.pars[ mod.pars$name=="d","value" ] <- 0 mod.pars[ mod.pars$name=="d","est" ] <- FALSE mod.pars[ mod.pars$name=="a1","value" ] <- 1 mod.pars[ mod.pars$name=="a3","value" ] <- 1 # initial values difficulties b1 <- stats::qlogis( colMeans(dat) ) mod.pars[ mod.pars$name=="a2","value" ] <- b1 mod.pars[ mod.pars$name=="a4","value" ] <- b1 + stats::runif(I, -1, 1) #* define prior for mixed Rasch analysis mixed_prior <- function(Theta,Etable){ NC <- 2 # number of theta classes TP <- nrow(Theta) / NC prior1 <- stats::dnorm( Theta[1:TP,1] ) prior1 <- prior1 / sum(prior1) if ( is.null(Etable) ){ prior <- c( prior1, prior1 ) } if ( ! is.null(Etable) ){ prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) + rowSums( Etable[,seq(2,2*I,2)]) )/I a1 <- stats::aggregate( prior, list( rep(1:NC, each=TP) ), sum ) a1[,2] <- a1[,2] / sum( a1[,2]) # print some information during estimation cat( paste0( " Class proportions: ", paste0( round(a1[,2], 3 ), collapse=" " ) ), "\n") a1 <- rep( a1[,2], each=TP ) # specify mixture of two normal distributions prior <- a1*c(prior1,prior1) } prior <- prior / sum(prior) return(prior) } #* estimate model mod8c <- mirt::mirt(dat, mirtmodel, pars=mod.pars, verbose=TRUE, technical=list( customTheta=Theta, customPriorFun=mixed_prior ) ) # Like in Model 7e, the number of estimated parameters must be included. mod8c@nest <- as.integer(sum(mod.pars$est) + 1) # two class proportions and therefore one probability is freely estimated. 
#* extract item parameters sirt::mirt.wrapper.coef(mod8c) #* estimated distribution mod8c@Theta mod8c@Prior #---- M8d: tamaan (in TAM) tammodel <- " ANALYSIS: TYPE=MIXTURE ; NCLASSES(2); NSTARTS(7,20); LAVAAN MODEL: F=~ A1__C4 F ~~ F ITEM TYPE: ALL(Rasch); " mod8d <- TAM::tamaan( tammodel, resp=dat ) summary(mod8d) # plot item parameters I <- 12 ipars <- mod8d$itempartable_MIXTURE[ 1:I, ] plot( 1:I, ipars[,3], type="o", ylim=range( ipars[,3:4] ), pch=16, xlab="Item", ylab="Item difficulty") lines( 1:I, ipars[,4], type="l", col=2, lty=2) points( 1:I, ipars[,4], col=2, pch=2) #***************************************************** # Model 9: Mixed 2PL model with two classes #***************************************************** #---- M9a: tamaan (in TAM) tammodel <- " ANALYSIS: TYPE=MIXTURE ; NCLASSES(2); NSTARTS(10,30); LAVAAN MODEL: F=~ A1__C4 F ~~ F ITEM TYPE: ALL(2PL); " mod9a <- TAM::tamaan( tammodel, resp=dat ) summary(mod9a) #***************************************************** # Model 10: Rasch testlet model #***************************************************** #---- M10a: tam.fa (in TAM) dims <- substring( colnames(dat),1,1 ) # define dimensions mod10a <- TAM::tam.fa( resp=dat, irtmodel="bifactor1", dims=dims, control=list(maxiter=60) ) summary(mod10a) #---- M10b: mirt (in mirt) cmodel <- mirt::mirt.model(" G=1-12 A=1-4 B=5-8 C=9-12 CONSTRAIN=(1-12,a1), (1-4, a2), (5-8, a3), (9-12,a4) ") mod10b <- mirt::mirt(dat, model=cmodel, verbose=TRUE) summary(mod10b) coef(mod10b) mod10b@logLik # equivalent is slot( mod10b, "logLik") #alternatively, using a dimensional reduction approach (faster and better accuracy) cmodel <- mirt::mirt.model(" G=1-12 CONSTRAIN=(1-12,a1), (1-4, a2), (5-8, a3), (9-12,a4) ") item_bundles <- rep(c(1,2,3), each=4) mod10b1 <- mirt::bfactor(dat, model=item_bundles, model2=cmodel, verbose=TRUE) coef(mod10b1) #---- M10c: smirt (in sirt) # define Q-matrix Qmatrix <- matrix(0,12,4) Qmatrix[,1] <- 1 Qmatrix[ cbind( 1:12, match( dims, 
unique(dims)) +1 ) ] <- 1 # uncorrelated factors variance.fixed <- cbind( c(1,1,1,2,2,3), c(2,3,4,3,4,4), 0 ) # estimate model mod10c <- sirt::smirt( dat, Qmatrix=Qmatrix, irtmodel="comp", variance.fixed=variance.fixed, qmcnodes=1000, maxiter=60) summary(mod10c) #***************************************************** # Model 11: Bifactor model #***************************************************** #---- M11a: tam.fa (in TAM) dims <- substring( colnames(dat),1,1 ) # define dimensions mod11a <- TAM::tam.fa( resp=dat, irtmodel="bifactor2", dims=dims, control=list(maxiter=60) ) summary(mod11a) #---- M11b: bfactor (in mirt) dims1 <- match( dims, unique(dims) ) mod11b <- mirt::bfactor(dat, model=dims1, verbose=TRUE) summary(mod11b) coef(mod11b) mod11b@logLik #---- M11c: smirt (in sirt) # define Q-matrix Qmatrix <- matrix(0,12,4) Qmatrix[,1] <- 1 Qmatrix[ cbind( 1:12, match( dims, unique(dims)) +1 ) ] <- 1 # uncorrelated factors variance.fixed <- cbind( c(1,1,1,2,2,3), c(2,3,4,3,4,4), 0 ) # estimate model mod11c <- sirt::smirt( dat, Qmatrix=Qmatrix, irtmodel="comp", est.a="2PL", variance.fixed=variance.fixed, qmcnodes=1000, maxiter=60) summary(mod11c) #***************************************************** # Model 12: Located latent class model: Rasch model with three theta classes #***************************************************** # use 10th item as the reference item ref.item <- 10 # ability grid theta.k <- seq(-4,4,len=9) #---- M12a: rasch.mirtlc (in sirt) mod12a <- sirt::rasch.mirtlc(dat, Nclasses=3, modeltype="MLC1", ref.item=ref.item) summary(mod12a) #---- M12b: gdm (in CDM) theta.k <- seq(-1, 1, len=3) # initial matrix b.constraint <- matrix( c(10,1,0), nrow=1,ncol=3) # estimate model mod12b <- CDM::gdm( dat, theta.k=theta.k, skillspace="est", irtmodel="1PL", b.constraint=b.constraint, maxiter=200) summary(mod12b) #---- M12c: mirt (in mirt) items <- colnames(dat) # define three latent classes Theta <- diag(3) # define mirt model I <- ncol(dat) # I=12 mirtmodel <- 
mirt::mirt.model(" C1=1-12 C2=1-12 C3=1-12 CONSTRAIN=(1-12,a1),(1-12,a2),(1-12,a3) ") # get parameters mod.pars <- mirt(dat, model=mirtmodel, pars="values") # set starting values for class specific item probabilities mod.pars[ mod.pars$name=="d","value" ] <- stats::qlogis( colMeans(dat,na.rm=TRUE) ) # set item difficulty of reference item to zero ind <- which( ( paste(mod.pars$item)==items[ref.item] ) & ( ( paste(mod.pars$name)=="d" ) ) ) mod.pars[ ind,"value" ] <- 0 mod.pars[ ind,"est" ] <- FALSE # initial values for a1, a2 and a3 mod.pars[ mod.pars$name %in% c("a1","a2","a3"),"value" ] <- c(-1,0,1) mod.pars #* define prior for latent class analysis lca_prior <- function(Theta,Etable){ # number of latent Theta classes TP <- nrow(Theta) # prior in initial iteration if ( is.null(Etable) ){ prior <- rep( 1/TP, TP ) } # process Etable (this is correct for datasets without missing data) if ( ! is.null(Etable) ){ # sum over correct and incorrect expected responses prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) + rowSums( Etable[, seq(2,2*I,2)] ) )/I } prior <- prior / sum(prior) return(prior) } #* estimate model mod12c <- mirt(dat, mirtmodel, technical=list( customTheta=Theta, customPriorFun=lca_prior), pars=mod.pars, verbose=TRUE ) # estimated parameters from the user customized prior distribution. 
mod12c@nest <- as.integer(sum(mod.pars$est) + 2) #* extract item parameters coef1 <- sirt::mirt.wrapper.coef(mod12c) #* inspect estimated distribution mod12c@Theta coef1$coef[1,c("a1","a2","a3")] mod12c@Prior[[1]] #***************************************************** # Model 13: Multidimensional model with discrete traits #***************************************************** # define Q-Matrix Q <- matrix( 0, nrow=12,ncol=3) Q[1:4,1] <- 1 Q[5:8,2] <- 1 Q[9:12,3] <- 1 # define discrete theta distribution with 3 dimensions Theta <- scan(what="character",nlines=1) 000 100 010 001 110 101 011 111 Theta <- as.numeric( unlist( lapply( Theta, strsplit, split="") ) ) Theta <- matrix(Theta, 8, 3, byrow=TRUE ) Theta #---- Model 13a: din (in CDM) mod13a <- CDM::din( dat, q.matrix=Q, rule="DINA") summary(mod13a) # compare used Theta distributions cbind( Theta, mod13a$attribute.patt.splitted) #---- Model 13b: gdm (in CDM) mod13b <- CDM::gdm( dat, Qmatrix=Q, theta.k=Theta, skillspace="full") summary(mod13b) #---- Model 13c: mirt (in mirt) # define mirt model I <- ncol(dat) # I=12 mirtmodel <- mirt::mirt.model(" F1=1-4 F2=5-8 F3=9-12 ") # get parameters mod.pars <- mirt(dat, model=mirtmodel, pars="values") # starting values d parameters (transformed guessing parameters) ind <- which( mod.pars$name=="d" ) mod.pars[ind,"value"] <- stats::qlogis(.2) # starting values transformed slipping parameters ind <- which( ( mod.pars$name %in% paste0("a",1:3) ) & ( mod.pars$est ) ) mod.pars[ind,"value"] <- stats::qlogis(.8) - stats::qlogis(.2) mod.pars #* define prior for latent class analysis lca_prior <- function(Theta,Etable){ TP <- nrow(Theta) if ( is.null(Etable) ){ prior <- rep( 1/TP, TP ) } if ( ! 
is.null(Etable) ){ prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) + rowSums( Etable[, seq(2,2*I,2)] ) )/I } prior <- prior / sum(prior) return(prior) } #* estimate model mod13c <- mirt(dat, mirtmodel, technical=list( customTheta=Theta, customPriorFun=lca_prior), pars=mod.pars, verbose=TRUE ) # estimated parameters from the user customized prior distribution. mod13c@nest <- as.integer(sum(mod.pars$est) + 2) #* extract item parameters coef13c <- sirt::mirt.wrapper.coef(mod13c)$coef #* inspect estimated distribution mod13c@Theta mod13c@Prior[[1]] #-* comparisons of estimated parameters # extract guessing and slipping parameters from din dfr <- coef(mod13a)[, c("guess","slip") ] colnames(dfr) <- paste0("din.",c("guess","slip") ) # estimated parameters from gdm dfr$gdm.guess <- stats::plogis(mod13b$item$b) dfr$gdm.slip <- 1 - stats::plogis( rowSums(mod13b$item[,c("b.Cat1","a.F1","a.F2","a.F3")] ) ) # estimated parameters from mirt dfr$mirt.guess <- stats::plogis( coef13c$d ) dfr$mirt.slip <- 1 - stats::plogis( rowSums(coef13c[,c("d","a1","a2","a3")]) ) # comparison round(dfr[, c(1,3,5,2,4,6)],3) ## din.guess gdm.guess mirt.guess din.slip gdm.slip mirt.slip ## A1 0.691 0.684 0.686 0.000 0.000 0.000 ## A2 0.491 0.489 0.489 0.031 0.038 0.036 ## A3 0.302 0.300 0.300 0.184 0.193 0.190 ## A4 0.244 0.239 0.240 0.337 0.340 0.339 ## B1 0.568 0.579 0.577 0.163 0.148 0.151 ## B2 0.329 0.344 0.340 0.344 0.326 0.329 ## B3 0.817 0.827 0.825 0.014 0.007 0.009 ## B4 0.431 0.463 0.456 0.104 0.089 0.092 ## C1 0.188 0.191 0.189 0.013 0.013 0.013 ## C2 0.050 0.050 0.050 0.239 0.238 0.239 ## C3 0.000 0.002 0.001 0.065 0.065 0.065 ## C4 0.000 0.004 0.000 0.212 0.212 0.212 # estimated class sizes dfr <- data.frame( "Theta"=Theta, "din"=mod13a$attribute.patt$class.prob, "gdm"=mod13b$pi.k, "mirt"=mod13c@Prior[[1]]) # comparison round(dfr,3) ## Theta.1 Theta.2 Theta.3 din gdm mirt ## 1 0 0 0 0.039 0.041 0.040 ## 2 1 0 0 0.008 0.009 0.009 ## 3 0 1 0 0.009 0.007 0.008 ## 4 0 0 1 0.394 0.417 0.412 ## 
5 1 1 0 0.011 0.011 0.011 ## 6 1 0 1 0.017 0.042 0.037 ## 7 0 1 1 0.042 0.008 0.016 ## 8 1 1 1 0.480 0.465 0.467 #***************************************************** # Model 14: DINA model with two skills #***************************************************** # define some simple Q-Matrix (does not really make in this application) Q <- matrix( 0, nrow=12,ncol=2) Q[1:4,1] <- 1 Q[5:8,2] <- 1 Q[9:12,1:2] <- 1 # define discrete theta distribution with 3 dimensions Theta <- scan(what="character",nlines=1) 00 10 01 11 Theta <- as.numeric( unlist( lapply( Theta, strsplit, split="") ) ) Theta <- matrix(Theta, 4, 2, byrow=TRUE ) Theta #---- Model 14a: din (in CDM) mod14a <- CDM::din( dat, q.matrix=Q, rule="DINA") summary(mod14a) # compare used Theta distributions cbind( Theta, mod14a$attribute.patt.splitted) #---- Model 14b: mirt (in mirt) # define mirt model I <- ncol(dat) # I=12 mirtmodel <- mirt::mirt.model(" F1=1-4 F2=5-8 (F1*F2)=9-12 ") #-> constructions like (F1*F2*F3) are also allowed in mirt.model # get parameters mod.pars <- mirt(dat, model=mirtmodel, pars="values") # starting values d parameters (transformed guessing parameters) ind <- which( mod.pars$name=="d" ) mod.pars[ind,"value"] <- stats::qlogis(.2) # starting values transformed slipping parameters ind <- which( ( mod.pars$name %in% paste0("a",1:3) ) & ( mod.pars$est ) ) mod.pars[ind,"value"] <- stats::qlogis(.8) - stats::qlogis(.2) mod.pars #* use above defined prior lca_prior # lca_prior <- function(prior,Etable) ... #* estimate model mod14b <- mirt(dat, mirtmodel, technical=list( customTheta=Theta, customPriorFun=lca_prior), pars=mod.pars, verbose=TRUE ) # estimated parameters from the user customized prior distribution. 
mod14b@nest <- as.integer(sum(mod.pars$est) + 2) #* extract item parameters coef14b <- sirt::mirt.wrapper.coef(mod14b)$coef #-* comparisons of estimated parameters # extract guessing and slipping parameters from din dfr <- coef(mod14a)[, c("guess","slip") ] colnames(dfr) <- paste0("din.",c("guess","slip") ) # estimated parameters from mirt dfr$mirt.guess <- stats::plogis( coef14b$d ) dfr$mirt.slip <- 1 - stats::plogis( rowSums(coef14b[,c("d","a1","a2","a3")]) ) # comparison round(dfr[, c(1,3,2,4)],3) ## din.guess mirt.guess din.slip mirt.slip ## A1 0.674 0.671 0.030 0.030 ## A2 0.423 0.420 0.049 0.050 ## A3 0.258 0.255 0.224 0.225 ## A4 0.245 0.243 0.394 0.395 ## B1 0.534 0.543 0.166 0.164 ## B2 0.338 0.347 0.382 0.380 ## B3 0.796 0.802 0.016 0.015 ## B4 0.421 0.436 0.142 0.140 ## C1 0.850 0.851 0.000 0.000 ## C2 0.480 0.480 0.097 0.097 ## C3 0.746 0.746 0.026 0.026 ## C4 0.575 0.577 0.136 0.137 # estimated class sizes dfr <- data.frame( "Theta"=Theta, "din"=mod13a$attribute.patt$class.prob, "mirt"=mod14b@Prior[[1]]) # comparison round(dfr,3) ## Theta.1 Theta.2 din mirt ## 1 0 0 0.357 0.369 ## 2 1 0 0.044 0.049 ## 3 0 1 0.047 0.031 ## 4 1 1 0.553 0.551 #***************************************************** # Model 15: Rasch model with non-normal distribution #***************************************************** # A non-normal theta distributed is specified by log-linear smoothing # the distribution as described in # Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model # to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS. 
# define theta grid theta.k <- matrix( seq(-4,4,len=15), ncol=1 ) # define design matrix for smoothing (up to cubic moments) delta.designmatrix <- cbind( 1, theta.k, theta.k^2, theta.k^3 ) # constrain item difficulty of fifth item (item B1) to zero b.constraint <- matrix( c(5,1,0), ncol=3 ) #---- Model 15a: gdm (in CDM) mod15a <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, b.constraint=b.constraint ) summary(mod15a) # plot estimated distribution graphics::barplot( mod15a$pi.k[,1], space=0, names.arg=round(theta.k[,1],2), main="Estimated Skewed Distribution (gdm function)") #---- Model 15b: mirt (in mirt) # define mirt model mirtmodel <- mirt::mirt.model(" F=1-12 ") # get parameters mod.pars <- mirt::mirt(dat, model=mirtmodel, pars="values", itemtype="Rasch") # fix variance (just for correct counting of parameters) mod.pars[ mod.pars$name=="COV_11", "est"] <- FALSE # fix item difficulty ind <- which( ( mod.pars$item=="B1" ) & ( mod.pars$name=="d" ) ) mod.pars[ ind, "value"] <- 0 mod.pars[ ind, "est"] <- FALSE # define prior loglinear_prior <- function(Theta,Etable){ TP <- nrow(Theta) if ( is.null(Etable) ){ prior <- rep( 1/TP, TP ) } # process Etable (this is correct for datasets without missing data) if ( ! is.null(Etable) ){ # sum over correct and incorrect expected responses prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) + rowSums( Etable[, seq(2,2*I,2)] ) )/I # smooth prior using the above design matrix and a log-linear model # see Xu & von Davier (2008). y <- log( prior + 1E-15 ) lm1 <- lm( y ~ 0 + delta.designmatrix, weights=prior ) prior <- exp(fitted(lm1)) # smoothed prior } prior <- prior / sum(prior) return(prior) } #* estimate model mod15b <- mirt::mirt(dat, mirtmodel, technical=list( customTheta=theta.k, customPriorFun=loglinear_prior ), pars=mod.pars, verbose=TRUE ) # estimated parameters from the user customized prior distribution. 
mod15b@nest <- as.integer(sum(mod.pars$est) + 3) #* extract item parameters coef1 <- sirt::mirt.wrapper.coef(mod15b)$coef #** compare estimated item parameters dfr <- data.frame( "gdm"=mod15a$item$b.Cat1, "mirt"=coef1$d ) rownames(dfr) <- colnames(dat) round(t(dfr),4) ## A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 ## gdm 0.9818 0.1538 -0.7837 -1.3197 0 -1.0902 1.6088 -0.170 1.9778 0.006 1.1859 0.135 ## mirt 0.9829 0.1548 -0.7826 -1.3186 0 -1.0892 1.6099 -0.169 1.9790 0.007 1.1870 0.136 # compare estimated theta distribution dfr <- data.frame( "gdm"=mod15a$pi.k, "mirt"=mod15b@Prior[[1]] ) round(t(dfr),4) ## 1 2 3 4 5 6 7 8 9 10 11 12 13 ## gdm 0 0 1e-04 9e-04 0.0056 0.0231 0.0652 0.1299 0.1881 0.2038 0.1702 0.1129 0.0612 ## mirt 0 0 1e-04 9e-04 0.0056 0.0232 0.0653 0.1300 0.1881 0.2038 0.1702 0.1128 0.0611 ## 14 15 ## gdm 0.0279 0.011 ## mirt 0.0278 0.011 ## End(Not run)
Some simulated datasets from Reckase (2009).
data(data.reck21) data(data.reck61DAT1) data(data.reck61DAT2) data(data.reck73C1a) data(data.reck73C1b) data(data.reck75C2) data(data.reck78ExA) data(data.reck79ExB)
The format of the data.reck21 dataset (Table 2.1, p. 45) is:
List of 2
$ data: num [1:2500, 1:50] 0 0 0 1 1 0 0 0 1 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:50] "I0001" "I0002" "I0003" "I0004" ...
$ pars:'data.frame': 50 obs. of 3 variables:
..$ a: num [1:50] 1.83 1.38 1.47 1.53 0.88 0.82 1.02 1.19 1.15 0.18 ...
..$ b: num [1:50] 0.91 0.81 0.06 -0.8 0.24 0.99 1.23 -0.47 2.78 -3.85 ...
..$ c: num [1:50] 0 0 0 0.25 0.21 0.29 0.26 0.19 0 0.21 ...
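The columns a, b and c are the generating discrimination, difficulty and guessing parameters. As an illustrative sketch (not part of the package), the three-parameter logistic item response function can be evaluated in base R with the true parameters of the first item listed above (a=1.83, b=0.91, c=0); note that no scaling constant D is applied here, which may differ from the parameterization used to generate the data:

```r
# 3PL item response function (sketch; no scaling constant D applied)
irf_3pl <- function(theta, a, b, c){
    c + (1 - c) * stats::plogis( a * (theta - b) )
}
# first item of data.reck21: a=1.83, b=0.91, c=0
theta <- seq(-3, 3, len=7)
round( irf_3pl(theta, a=1.83, b=0.91, c=0), 3 )
```

At theta=b the curve passes through (1+c)/2, i.e. 0.5 for an item without guessing.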
The format of the datasets data.reck61DAT1
and
data.reck61DAT2
(Table 6.1, p. 153) is
List of 4
$ data : num [1:2500, 1:30] 1 0 0 1 1 0 0 1 1 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:30] "A01" "A02" "A03" "A04" ...
$ pars :'data.frame': 30 obs. of 4 variables:
..$ a1: num [1:30] 0.747 0.46 0.861 1.014 0.552 ...
..$ a2: num [1:30] 0.025 0.0097 0.0067 0.008 0.0204 0.0064 0.0861 ...
..$ a3: num [1:30] 0.1428 0.0692 0.404 0.047 0.1482 ...
..$ d : num [1:30] 0.183 -0.192 -0.466 -0.434 -0.443 ...
$ mu : num [1:3] -0.4 -0.7 0.1
$ sigma: num [1:3, 1:3] 1.21 0.297 1.232 0.297 0.81 ...
The dataset data.reck61DAT2
has correlated dimensions while
data.reck61DAT1
has uncorrelated dimensions.
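The difference can be inspected by converting the stored covariance matrix $sigma into a correlation matrix with cov2cor(). The matrix below uses made-up values for illustration only; with the sirt package installed, data.reck61DAT2$sigma can be passed to cov2cor() directly:

```r
# hypothetical covariance matrix standing in for data.reck61DAT2$sigma
sigma <- matrix( c(1.21, 0.30, 0.50,
                   0.30, 0.81, 0.40,
                   0.50, 0.40, 1.00), 3, 3 )
round( stats::cov2cor(sigma), 2 )  # latent trait correlations
```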
Datasets data.reck73C1a
and data.reck73C1b
use item parameters
from Table 7.3 (p. 188). The dataset C1a
has uncorrelated dimensions,
while C1b
has perfectly correlated dimensions. The items are sensitive to
3 dimensions. The format of the datasets is
List of 4
$ data : num [1:2500, 1:30] 1 0 1 1 1 0 1 1 1 1 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:30] "A01" "A02" "A03" "A04" ...
$ pars :'data.frame': 30 obs. of 4 variables:
..$ a1: num [1:30] 0.747 0.46 0.861 1.014 0.552 ...
..$ a2: num [1:30] 0.025 0.0097 0.0067 0.008 0.0204 0.0064 ...
..$ a3: num [1:30] 0.1428 0.0692 0.404 0.047 0.1482 ...
..$ d : num [1:30] 0.183 -0.192 -0.466 -0.434 -0.443 ...
$ mu : num [1:3] 0 0 0
$ sigma: num [1:3, 1:3] 0.167 0.236 0.289 0.236 0.334 ...
The dataset data.reck75C2
is simulated using item parameters
from Table 7.5 (p. 191). It contains items which are sensitive to only
one dimension but individuals which have abilities in three
uncorrelated dimensions. The format is
List of 4
$ data : num [1:2500, 1:30] 0 0 1 1 1 0 0 1 1 1 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:30] "A01" "A02" "A03" "A04" ...
$ pars :'data.frame': 30 obs. of 4 variables:
..$ a1: num [1:30] 0.56 0.48 0.67 0.57 0.54 0.74 0.7 0.59 0.63 0.64 ...
..$ a2: num [1:30] 0.62 0.53 0.63 0.69 0.58 0.69 0.75 0.63 0.64 0.64 ...
..$ a3: num [1:30] 0.46 0.42 0.43 0.51 0.41 0.48 0.46 0.5 0.51 0.46 ...
..$ d : num [1:30] 0.1 0.06 -0.38 0.46 0.14 0.31 0.06 -1.23 0.47 1.06 ...
$ mu : num [1:3] 0 0 0
$ sigma: num [1:3, 1:3] 1 0 0 0 1 0 0 0 1
The dataset data.reck78ExA
contains simulated item responses
from Table 7.8 (p. 204 ff.). There are three item clusters and
two ability dimensions. The format is
List of 4
$ data : num [1:2500, 1:50] 0 1 1 0 1 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:50] "A01" "A02" "A03" "A04" ...
$ pars :'data.frame': 50 obs. of 3 variables:
..$ a1: num [1:50] 0.889 1.057 1.047 1.178 1.029 ...
..$ a2: num [1:50] 0.1399 0.0432 0.016 0.0231 0.2347 ...
..$ d : num [1:50] 0.2724 1.2335 -0.0918 -0.2372 0.8471 ...
$ mu : num [1:2] 0 0
$ sigma: num [1:2, 1:2] 1 0 0 1
The dataset data.reck79ExB
contains simulated item responses
from Table 7.9 (p. 207 ff.). There are three item clusters and
three ability dimensions. The format is
List of 4
$ data : num [1:2500, 1:50] 1 1 0 1 0 0 0 1 1 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:50] "A01" "A02" "A03" "A04" ...
$ pars :'data.frame': 50 obs. of 4 variables:
..$ a1: num [1:50] 0.895 1.032 1.036 1.163 1.022 ...
..$ a2: num [1:50] 0.052 0.132 0.144 0.13 0.165 ...
..$ a3: num [1:50] 0.0722 0.1923 0.0482 0.1321 0.204 ...
..$ d : num [1:50] 0.2724 1.2335 -0.0918 -0.2372 0.8471 ...
$ mu : num [1:3] 0 0 0
$ sigma: num [1:3, 1:3] 1 0 0 0 1 0 0 0 1
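All of these datasets share the multidimensional compensatory logistic structure implied by the parameter layout above, with slopes a1, a2, ... and intercept d. As a self-contained sketch (the item parameters below are made up, not the tabled Reckase values), responses in the same style can be simulated with base R:

```r
# simulate binary responses under a compensatory MIRT model (sketch)
set.seed(987)
N <- 200; I <- 6                                  # persons, items
a <- matrix( stats::runif(I*3, 0.4, 1.2), I, 3 )  # item slopes a1-a3
d <- stats::rnorm(I)                              # item intercepts d
theta <- matrix( stats::rnorm(N*3), N, 3 )        # mu=0, identity sigma
prob <- stats::plogis( theta %*% t(a) + matrix(d, N, I, byrow=TRUE) )
dat <- 1 * ( matrix(stats::runif(N*I), N, I) < prob )
colMeans(dat)  # item p-values
```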
Simulated datasets
Reckase, M. (2009). Multidimensional item response theory. New York: Springer. doi:10.1007/978-0-387-89976-3
## Not run:
#############################################################################
# EXAMPLE 1: data.reck21 dataset, Table 2.1, p. 45
#############################################################################
data(data.reck21)
dat <- data.reck21$dat  # extract dataset
# items with zero guessing parameters
guess0 <- c( 1, 2, 3, 9, 11, 27, 30, 35, 45, 49, 50 )
I <- ncol(dat)

#***
# Model 1: 3PL estimation using rasch.mml2
est.c <- est.a <- 1:I
est.c[ guess0 ] <- 0
mod1 <- sirt::rasch.mml2( dat, est.a=est.a, est.c=est.c, mmliter=300 )
summary(mod1)

#***
# Model 2: 3PL estimation using smirt
Q <- matrix(1, I, 1)
mod2 <- sirt::smirt( dat, Qmatrix=Q, est.a="2PL", est.c=est.c, increment.factor=1.01)
summary(mod2)

#***
# Model 3: estimation in the mirt package
library(mirt)
itemtype <- rep("3PL", I )
itemtype[ guess0 ] <- "2PL"
mod3 <- mirt::mirt(dat, 1, itemtype=itemtype, verbose=TRUE)
summary(mod3)
c3 <- unlist( coef(mod3) )[ 1:(4*I) ]
c3 <- matrix( c3, I, 4, byrow=TRUE )
# compare estimates of rasch.mml2, smirt and the true parameters
round( cbind( mod1$item$c, mod2$item$c, c3[,3], data.reck21$pars$c ), 2 )
round( cbind( mod1$item$a, mod2$item$a.Dim1, c3[,1], data.reck21$pars$a ), 2 )
round( cbind( mod1$item$b, mod2$item$b.Dim1 / mod2$item$a.Dim1,
            - c3[,2] / c3[,1], data.reck21$pars$b ), 2 )

#############################################################################
# EXAMPLE 2: data.reck61 dataset, Table 6.1, p. 153
#############################################################################
data(data.reck61DAT1)
dat <- data.reck61DAT1$data

#***
# Model 1: Exploratory factor analysis

#-- Model 1a: tam.fa in TAM
library(TAM)
mod1a <- TAM::tam.fa( dat, irtmodel="efa", nfactors=3 )
# varimax rotation
varimax(mod1a$B.stand)

#-- Model 1b: EFA in NOHARM (promax rotation)
mod1b <- sirt::R2noharm( dat=dat, model.type="EFA", dimensions=3,
                writename="reck61__3dim_efa", noharm.path="c:/NOHARM", dec=",")
summary(mod1b)

#-- Model 1c: EFA with noharm.sirt
mod1c <- sirt::noharm.sirt( dat=dat, dimensions=3 )
summary(mod1c)
plot(mod1c)

#-- Model 1d: EFA with 2 dimensions in noharm.sirt
mod1d <- sirt::noharm.sirt( dat=dat, dimensions=2 )
summary(mod1d)
plot(mod1d, efa.load.min=.2)  # plot loadings of at least .20

#***
# Model 2: Confirmatory factor analysis

#-- Model 2a: tam.fa in TAM
dims <- c( rep(1,10), rep(3,10), rep(2,10) )
Qmatrix <- matrix( 0, nrow=30, ncol=3 )
Qmatrix[ cbind( 1:30, dims) ] <- 1
mod2a <- TAM::tam.mml.2pl( dat, Q=Qmatrix,
                control=list( snodes=1000, QMC=TRUE, maxiter=200) )
summary(mod2a)

#-- Model 2b: smirt in sirt
mod2b <- sirt::smirt( dat, Qmatrix=Qmatrix, est.a="2PL", maxiter=20, qmcnodes=1000 )
summary(mod2b)

#-- Model 2c: rasch.mml2 in sirt
mod2c <- sirt::rasch.mml2( dat, Qmatrix=Qmatrix, est.a=1:30, mmliter=200,
                theta.k=seq(-5,5,len=11) )
summary(mod2c)

#-- Model 2d: mirt in mirt
cmodel <- mirt::mirt.model("
        F1=1-10
        F2=21-30
        F3=11-20
        COV=F1*F2, F1*F3, F2*F3
        ")
mod2d <- mirt::mirt(dat, cmodel, verbose=TRUE)
summary(mod2d)
coef(mod2d)

#-- Model 2e: CFA in NOHARM
# specify covariance pattern
P.pattern <- matrix( 1, ncol=3, nrow=3 )
P.init <- .4*P.pattern
diag(P.pattern) <- 0
diag(P.init) <- 1
# fix all entries in the loading matrix to 1
F.pattern <- matrix( 0, nrow=30, ncol=3 )
F.pattern[1:10,1] <- 1
F.pattern[21:30,2] <- 1
F.pattern[11:20,3] <- 1
F.init <- F.pattern
# estimate model
mod2e <- sirt::R2noharm( dat=dat, model.type="CFA", P.pattern=P.pattern,
                P.init=P.init, F.pattern=F.pattern, F.init=F.init,
                writename="reck61__3dim_cfa", noharm.path="c:/NOHARM", dec=",")
summary(mod2e)

#-- Model 2f: CFA with noharm.sirt
mod2f <- sirt::noharm.sirt( dat=dat, Fval=F.init, Fpatt=F.pattern,
                Pval=P.init, Ppatt=P.pattern )
summary(mod2f)

#############################################################################
# EXAMPLE 3: DETECT analysis data.reck78ExA and data.reck79ExB
#############################################################################
data(data.reck78ExA)
data(data.reck79ExB)

#************************
# Example A
dat <- data.reck78ExA$data
#- estimate person score
score <- stats::qnorm( ( rowMeans( dat ) + .5 ) / ( ncol(dat) + 1 ) )
#- extract item cluster
itemcluster <- substring( colnames(dat), 1, 1 )
#- confirmatory DETECT item cluster
detectA <- sirt::conf.detect( data=dat, score=score, itemcluster=itemcluster )
  ##        unweighted weighted
  ## DETECT      0.571    0.571
  ## ASSI        0.523    0.523
  ## RATIO       0.757    0.757
#- exploratory DETECT analysis
detect_explA <- sirt::expl.detect(data=dat, score, nclusters=10, N.est=nrow(dat)/2 )
  ## Optimal Cluster Size is 5 (Maximum of DETECT Index)
  ##   N.Cluster N.items N.est N.val         size.cluster DETECT.est ASSI.est
  ## 1         2      50  1250  1250                31-19      0.531    0.404
  ## 2         3      50  1250  1250             10-19-21      0.554    0.407
  ## 3         4      50  1250  1250           10-19-14-7      0.630    0.509
  ## 4         5      50  1250  1250         10-19-3-7-11      0.653    0.546
  ## 5         6      50  1250  1250       10-12-7-3-7-11      0.593    0.458
  ## 6         7      50  1250  1250      10-12-7-3-7-9-2      0.604    0.474
  ## 7         8      50  1250  1250    10-12-7-3-3-9-4-2      0.608    0.481
  ## 8         9      50  1250  1250  10-12-7-3-3-5-4-2-4      0.617    0.494
  ## 9        10      50  1250  1250 10-5-7-7-3-3-5-4-2-4      0.592    0.460
# cluster membership
cluster_membership <- detect_explA$itemcluster$cluster3
# Cluster 1:
colnames(dat)[ cluster_membership==1 ]
  ## [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10"
# Cluster 2:
colnames(dat)[ cluster_membership==2 ]
  ##  [1] "B11" "B12" "B13" "B14" "B15" "B16" "B17" "B18" "B19" "B20" "B21" "B22"
  ## [13] "B23" "B25" "B26" "B27" "B28" "B29" "B30"
# Cluster 3:
colnames(dat)[ cluster_membership==3 ]
  ##  [1] "B24" "C31" "C32" "C33" "C34" "C35" "C36" "C37" "C38" "C39" "C40" "C41"
  ## [13] "C42" "C43" "C44" "C45" "C46" "C47" "C48" "C49" "C50"

#************************
# Example B
dat <- data.reck79ExB$data
#- estimate person score
score <- stats::qnorm( ( rowMeans( dat ) + .5 ) / ( ncol(dat) + 1 ) )
#- extract item cluster
itemcluster <- substring( colnames(dat), 1, 1 )
#- confirmatory DETECT item cluster
detectB <- sirt::conf.detect( data=dat, score=score, itemcluster=itemcluster )
  ##        unweighted weighted
  ## DETECT      0.715    0.715
  ## ASSI        0.624    0.624
  ## RATIO       0.855    0.855
#- exploratory DETECT analysis
detect_explB <- sirt::expl.detect(data=dat, score, nclusters=10, N.est=nrow(dat)/2 )
  ## Optimal Cluster Size is 4 (Maximum of DETECT Index)
  ##   N.Cluster N.items N.est N.val         size.cluster DETECT.est ASSI.est
  ## 1         2      50  1250  1250                30-20      0.665    0.546
  ## 2         3      50  1250  1250             10-20-20      0.686    0.585
  ## 3         4      50  1250  1250           10-20-8-12      0.728    0.644
  ## 4         5      50  1250  1250         10-6-14-8-12      0.654    0.553
  ## 5         6      50  1250  1250       10-6-14-3-12-5      0.659    0.561
  ## 6         7      50  1250  1250      10-6-14-3-7-5-5      0.664    0.576
  ## 7         8      50  1250  1250     10-6-7-7-3-7-5-5      0.616    0.518
  ## 8         9      50  1250  1250   10-6-7-7-3-5-5-5-2      0.612    0.512
  ## 9        10      50  1250  1250 10-6-7-7-3-5-3-5-2-2      0.613    0.512

## End(Not run)
sirt
Package
Some example datasets for the sirt package.
data(data.si01)
data(data.si02)
data(data.si03)
data(data.si04)
data(data.si05)
data(data.si06)
data(data.si07)
data(data.si08)
data(data.si09)
data(data.si10)
The format of the dataset data.si01
is:
'data.frame': 1857 obs. of 3 variables:
$ idgroup: int 1 1 1 1 1 1 1 1 1 1 ...
$ item1 : int NA NA NA NA NA NA NA NA NA NA ...
$ item2 : int 4 4 4 4 4 4 4 2 4 4 ...
The dataset data.si02 is the Stouffer-Toby dataset published in Lindsay, Clogg and Grego (1991; Table 1, p. 97, Cross-classification A):
List of 2
$ data : num [1:16, 1:4] 1 0 1 0 1 0 1 0 1 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "I1" "I2" "I3" "I4"
$ weights: num [1:16] 42 1 6 2 6 1 7 2 23 4 ...
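Because the response patterns of data.si02 are stored in compact form, the weights can be used to expand the dataset into one row per respondent. A minimal sketch, assuming the list structure shown above:

```r
# expand the 16 weighted response patterns into a person-level dataset;
# pattern i is repeated weights[i] times
data(data.si02, package="sirt")
dat_expanded <- data.si02$data[ rep( seq_len(nrow(data.si02$data)),
                     data.si02$weights ), ]
nrow(dat_expanded)    # equals sum(data.si02$weights)
```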
The format of the dataset data.si03
(containing item
parameters of two studies) is:
'data.frame': 27 obs. of 3 variables:
$ item : Factor w/ 27 levels "M1","M10","M11",..: 1 12 21 22 ...
$ b_study1: num 0.297 1.163 0.151 -0.855 -1.653 ...
$ b_study2: num 0.72 1.118 0.351 -0.861 -1.593 ...
The dataset data.si04 is adapted from Bartolucci, Montanari and Pandolfi (2012; Table 4, Table 7). The data contain responses of 4999 persons to 79 items on 5 dimensions. See rasch.mirtlc for using the data in an analysis.
List of 3
$ data : num [1:4999, 1:79] 0 1 1 0 1 1 0 0 1 1 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:79] "A01" "A02" "A03" "A04" ...
$ itempars :'data.frame': 79 obs. of 4 variables:
..$ item : Factor w/ 79 levels "A01","A02","A03",..: 1 2 3 4 5 6 7 8 9 10 ...
..$ dim : num [1:79] 1 1 1 1 1 1 1 1 1 1 ...
..$ gamma : num [1:79] 1 1 1 1 1 1 1 1 1 1 ...
..$ gamma.beta: num [1:79] -0.189 0.25 0.758 1.695 1.022 ...
$ distribution: num [1:9, 1:7] 1 2 3 4 5 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:7] "class" "A" "B" "C" ...
The dataset data.si05 contains double ratings of two exchangeable raters for three items, which are contained in Ex1, Ex2 and Ex3, respectively.
List of 3
$ Ex1:'data.frame': 199 obs. of 2 variables:
..$ C7040: num [1:199] NA 1 0 1 1 0 0 0 1 0 ...
..$ C7041: num [1:199] 1 1 0 0 0 0 0 0 1 0 ...
$ Ex2:'data.frame': 2000 obs. of 2 variables:
..$ rater1: num [1:2000] 2 0 3 1 2 2 0 0 0 0 ...
..$ rater2: num [1:2000] 4 1 3 2 1 0 0 0 0 2 ...
$ Ex3:'data.frame': 2000 obs. of 2 variables:
..$ rater1: num [1:2000] 5 1 6 2 3 3 0 0 0 0 ...
..$ rater2: num [1:2000] 7 2 6 3 2 1 0 1 0 3 ...
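Since the raters in data.si05 are exchangeable, a quick check of their agreement is to cross-tabulate the two ratings. A minimal sketch, assuming the list structure and column names shown above:

```r
data(data.si05, package="sirt")
# cross-tabulate the two raters for the dichotomous item in Ex1
table( data.si05$Ex1$C7040, data.si05$Ex1$C7041, useNA="ifany" )
# exact agreement rate across the double ratings
mean( data.si05$Ex1$C7040==data.si05$Ex1$C7041, na.rm=TRUE )
```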
The dataset data.si06 contains multiple-choice item responses. The correct alternative is coded as 0; distractors are indicated by the codes 1, 2 or 3.
'data.frame': 4441 obs. of 14 variables:
$ WV01: num 0 0 0 0 0 0 0 0 0 3 ...
$ WV02: num 0 0 0 3 0 0 0 0 0 1 ...
$ WV03: num 0 1 0 0 0 0 0 0 0 0 ...
$ WV04: num 0 0 0 0 0 0 0 0 0 1 ...
$ WV05: num 3 1 1 1 0 0 1 1 0 2 ...
$ WV06: num 0 1 3 0 0 0 2 0 0 1 ...
$ WV07: num 0 0 0 0 0 0 0 0 0 0 ...
$ WV08: num 0 1 1 0 0 0 0 0 0 0 ...
$ WV09: num 0 0 0 0 0 0 0 0 0 2 ...
$ WV10: num 1 1 3 0 0 2 0 0 0 0 ...
$ WV11: num 0 0 0 0 0 0 0 0 0 0 ...
$ WV12: num 0 0 0 2 0 0 2 0 0 0 ...
$ WV13: num 3 1 1 3 0 0 3 0 0 0 ...
$ WV14: num 3 1 2 3 0 3 1 3 3 0 ...
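Because the correct alternative in data.si06 is coded as 0, classical item difficulties and distractor frequencies can be read off directly. A brief sketch:

```r
data(data.si06, package="sirt")
# proportion correct per item (code 0 is the correct alternative)
round( colMeans( data.si06==0, na.rm=TRUE ), 3 )
# distractor distribution of the first item
table( data.si06$WV01 )
```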
The dataset data.si07 contains the parameters of the empirical illustration of DeCarlo (2020). The simulation function sim_fun can be used for simulating data from the IRSDT model (see DeCarlo, 2020).
List of 3
$ pars :'data.frame': 16 obs. of 3 variables:
..$ item: Factor w/ 16 levels "I01","I02","I03",..: 1 2 3 4 5 6 7 8 9 10 ...
..$ b : num [1:16] -1.1 -0.18 1.44 1.78 -1.19 0.45 -1.12 0.33 0.82 -0.43 ...
..$ d : num [1:16] 2.69 4.6 6.1 3.11 3.2 ...
$ trait :'data.frame': 20 obs. of 2 variables:
..$ x : num [1:20] 0.025 0.075 0.125 0.175 0.225 0.275 0.325 0.375 0.425 0.475 ...
..$ prob: num [1:20] 0.0238 0.1267 0.105 0.0594 0.0548 ...
$ sim_fun:function (lambda, b, d, items)
The dataset data.si08 contains 5 items on knowledge about lung cancer and the kind of information acquisition (Goodman, 1970; see also Rasch, Kubinger & Yanagida, 2011). L1: reading newspapers, L2: listening to the radio, L3: reading books and magazines, L4: attending talks, L5: knowledge about lung cancer.
'data.frame': 32 obs. of 6 variables:
$ L1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ L2 : num 1 1 1 1 1 1 1 1 0 0 ...
$ L3 : num 1 1 1 1 0 0 0 0 1 1 ...
$ L4 : num 1 1 0 0 1 1 0 0 1 1 ...
$ L5 : num 1 0 1 0 1 0 1 0 1 0 ...
$ wgt: num 23 8 102 67 8 4 35 59 27 18 ...
The dataset data.si09 was used in Fischer and Karl (2019), who asked employees in eight countries to report whether they typically help other employees (helping behavior, seven items, help) and whether they make suggestions to improve work conditions and products (voice behavior, five items, voice). Individuals responded to these items on a 1-7 Likert-type scale. The dataset was downloaded from https://osf.io/wkx8c/.
'data.frame': 5201 obs. of 13 variables:
$ country: Factor w/ 8 levels "BRA","CAN","KEN",..: 5 5 5 5 5 5 5 5 5 5 ...
$ help1 : int 6 6 5 5 5 6 6 6 4 6 ...
$ help2 : int 3 6 5 6 6 6 6 6 6 7 ...
$ help3 : int 5 6 6 7 7 6 5 6 6 7 ...
$ help4 : int 7 6 5 6 6 7 7 6 6 7 ...
$ help5 : int 5 5 5 6 6 6 6 6 6 7 ...
$ help6 : int 3 4 5 6 6 7 7 6 6 5 ...
$ help7 : int 5 4 4 5 5 7 7 6 6 6 ...
$ voice1 : int 3 6 5 6 4 7 6 6 5 7 ...
$ voice2 : int 3 6 4 7 6 5 6 6 4 7 ...
$ voice3 : int 6 6 5 7 6 5 6 6 6 5 ...
$ voice4 : int 6 6 6 5 5 7 5 6 6 6 ...
$ voice5 : int 6 7 4 7 6 6 6 6 5 7 ...
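For a first look at data.si09, scale scores for helping and voice behavior can be formed by averaging the respective items, grouped by country as in the invariance analyses of Fischer and Karl (2019). A hedged sketch, assuming the variable names listed above:

```r
data(data.si09, package="sirt")
help_items <- paste0("help", 1:7)
voice_items <- paste0("voice", 1:5)
# country means of the two scale scores
aggregate( cbind( help=rowMeans(data.si09[,help_items], na.rm=TRUE),
                  voice=rowMeans(data.si09[,voice_items], na.rm=TRUE) ),
           by=list(country=data.si09$country), FUN=mean )
```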
The dataset data.si10 contains votes of 435 members of the U.S. House of Representatives, 267 Democrats and 168 Republicans. The dataset was used by Fop and Murphy (2018).
'data.frame': 435 obs. of 17 variables:
$ party : Factor w/ 2 levels "democrat","republican": 2 2 1 1 1 1 1 2 2 1 ...
$ vote01: num 0 0 NA 0 1 0 0 0 0 1 ...
$ vote02: num 1 1 1 1 1 1 1 1 1 1 ...
$ vote03: num 0 0 1 1 1 1 0 0 0 1 ...
$ vote04: num 1 1 NA 0 0 0 1 1 1 0 ...
$ vote05: num 1 1 1 NA 1 1 1 1 1 0 ...
$ vote06: num 1 1 1 1 1 1 1 1 1 0 ...
$ vote07: num 0 0 0 0 0 0 0 0 0 1 ...
$ vote08: num 0 0 0 0 0 0 0 0 0 1 ...
$ vote09: num 0 0 0 0 0 0 0 0 0 1 ...
$ vote10: num 1 0 0 0 0 0 0 0 0 0 ...
$ vote11: num NA 0 1 1 1 0 0 0 0 0 ...
$ vote12: num 1 1 0 0 NA 0 0 0 1 0 ...
$ vote13: num 1 1 1 1 1 1 NA 1 1 0 ...
$ vote14: num 1 1 1 0 1 1 1 1 1 0 ...
$ vote15: num 0 0 0 0 1 1 1 NA 0 NA ...
$ vote16: num 1 NA 0 1 1 1 1 1 1 NA ...
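A simple descriptive statistic for data.si10 is the proportion of yes votes per party, which already shows the party alignment exploited by the variable selection methods in Fop and Murphy (2018). A sketch assuming the structure shown above:

```r
data(data.si10, package="sirt")
vote_items <- sprintf("vote%02d", 1:16)
# proportion of yes votes (coded 1) by party, ignoring missing votes
props <- aggregate( data.si10[,vote_items], by=list(party=data.si10$party),
              FUN=function(x) mean(x, na.rm=TRUE) )
props[,-1] <- round( props[,-1], 2 )
props
```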
Bartolucci, F., Montanari, G. E., & Pandolfi, S. (2012). Dimensionality of the latent structure and item selection via latent class multidimensional IRT models. Psychometrika, 77(4), 782-802. doi:10.1007/s11336-012-9278-0
DeCarlo, L. T. (2020). An item response model for true-false exams based on signal detection theory. Applied Psychological Measurement, 34(3). 234-248. doi:10.1177/0146621619843823
Fischer, R., & Karl, J. A. (2019). A primer to (cross-cultural) multi-group invariance testing possibilities in R. Frontiers in Psychology | Cultural Psychology, 10:1507. doi:10.3389/fpsyg.2019.01507
Fop, M., & Murphy, T. B. (2018). Variable selection methods for model-based clustering. Statistics Surveys, 12, 18-65. doi:10.1214/18-SS119
Goodman, L. A. (1970). The multivariate analysis of qualitative data: Interactions among multiple classifications. Journal of the American Statistical Association, 65(329), 226-256. doi:10.1080/01621459.1970.10481076
Lindsay, B., Clogg, C. C., & Grego, J. (1991). Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis. Journal of the American Statistical Association, 86(413), 96-107. doi:10.1080/01621459.1991.10475008
Rasch, D., Kubinger, K. D., & Yanagida, T. (2011). Statistics in psychology using R and SPSS. New York: Wiley. doi:10.1002/9781119979630
Some free datasets can be obtained from
Psychological questionnaires: http://personality-testing.info/_rawdata/
PISA 2012: http://pisa2012.acer.edu.au/downloads.php
PIAAC: http://www.oecd.org/site/piaac/publicdataandanalysis.htm
TIMSS 2011: http://timssandpirls.bc.edu/timss2011/international-database.html
ALLBUS: http://www.gesis.org/allbus/allbus-home/
## Not run:
#############################################################################
# EXAMPLE 1: Nested logit model multiple choice dataset data.si06
#############################################################################

data(data.si06, package="sirt")
dat <- data.si06

#** estimate 2PL nested logit model
library(mirt)
mod1 <- mirt::mirt( dat, model=1, itemtype="2PLNRM", key=rep(0,ncol(dat)),
             verbose=TRUE )
summary(mod1)
cmod1 <- sirt::mirt.wrapper.coef(mod1)$coef
cmod1[,-1] <- round( cmod1[,-1], 3 )

#** normalize item parameters according to Suh and Bolt (2010)
cmod2 <- cmod1
# slope parameters
ind <- grep( "ak", colnames(cmod2) )
h1 <- cmod2[,ind]
cmod2[,ind] <- t( apply( h1, 1, FUN=function(ll){ ll - mean(ll) } ) )
# item intercepts
ind <- paste0( "d", 0:9 )
ind <- which( colnames(cmod2) %in% ind )
h1 <- cmod2[,ind]
cmod2[,ind] <- t( apply( h1, 1, FUN=function(ll){ ll - mean(ll) } ) )
cmod2[,-1] <- round( cmod2[,-1], 3 )

#############################################################################
# EXAMPLE 2: Item response model based on signal detection theory (IRSDT model)
#############################################################################

data(data.si07, package="sirt")
data <- data.si07

#-- simulate data
set.seed(98)
N <- 2000    # define sample size
# generate membership scores
lambda <- sample( size=N, x=data$trait$x, prob=data$trait$prob, replace=TRUE )
b <- data$pars$b
d <- data$pars$d
items <- data$pars$item
dat <- data$sim_fun( lambda=lambda, b=b, d=d, items=items )

#- estimate IRSDT model as a grade of membership model with two classes
problevels <- seq( 0.025, 0.975, length=20 )
mod1 <- sirt::gom.em( dat, K=2, problevels=problevels )
summary(mod1)

## End(Not run)
This dataset contains TIMSS mathematics data from 345 students on 25 items.
data(data.timss)
This dataset is a list. data is the data frame containing the student ID (idstud), a dummy variable for female students (girl) and student age (age). The remaining variables (variable names starting with M) are the items.
The format is:
List of 2
$ data:'data.frame':
..$ idstud : num [1:345] 4e+09 4e+09 4e+09 4e+09 4e+09 ...
..$ girl : int [1:345] 0 0 0 0 0 0 0 0 1 0 ...
..$ age : num [1:345] 10.5 10 10.25 10.25 9.92 ...
..$ M031286 : int [1:345] 0 0 0 1 1 0 1 0 1 0 ...
..$ M031106 : int [1:345] 0 0 0 1 1 0 1 1 0 0 ...
..$ M031282 : int [1:345] 0 0 0 1 1 0 1 1 0 0 ...
..$ M031227 : int [1:345] 0 0 0 0 1 0 0 0 0 0 ...
[...]
..$ M041203 : int [1:345] 0 0 0 1 1 0 0 0 0 1 ...
$ item:'data.frame':
..$ item : Factor w/ 25 levels "M031045","M031068",..: ...
..$ Block : Factor w/ 2 levels "M01","M02": 1 1 1 1 1 1 ..
..$ Format : Factor w/ 2 levels "CR","MC": 1 1 1 1 2 ...
..$ Content.Domain : Factor w/ 3 levels "Data Display",..: 3 3 3 3 ...
..$ Cognitive.Domain: Factor w/ 3 levels "Applying","Knowing",..: 2 3 3 ..
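The item columns of data.timss can be selected by their leading M, e.g., to compute raw scores or to relate performance to the covariates. A minimal sketch, assuming the structure listed above:

```r
data(data.timss, package="sirt")
dat <- data.timss$data
items <- grep( "^M", colnames(dat), value=TRUE )
# raw score per student and a comparison by gender
dat$score <- rowSums( dat[,items], na.rm=TRUE )
aggregate( score ~ girl, data=dat, FUN=mean )
```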
This TIMSS 2007 dataset contains item responses of 4472 eighth grade Russian students in Mathematics and Science.
data(data.timss07.G8.RUS)
The dataset contains raw responses (raw), scored responses (scored) and item information (iteminfo).
The format of the dataset is:
List of 3
$ raw :'data.frame':
..$ idstud : num [1:4472] 3010101 3010102 3010104 3010105 3010106 ...
..$ M022043 : atomic [1:4472] NA 1 4 NA NA NA NA NA NA NA ...
.. ..- attr(*, "value.labels")=Named num [1:7] 9 6 5 4 3 2 1
.. .. ..- attr(*, "names")=chr [1:7] "OMITTED" "NOT REACHED" "E" "D*" ...
[...]
..$ M032698 : atomic [1:4472] NA NA NA NA NA NA NA 2 1 NA ...
.. ..- attr(*, "value.labels")=Named num [1:6] 9 6 4 3 2 1
.. .. ..- attr(*, "names")=chr [1:6] "OMITTED" "NOT REACHED" "D" "C" ...
..$ M032097 : atomic [1:4472] NA NA NA NA NA NA NA 2 3 NA ...
.. ..- attr(*, "value.labels")=Named num [1:6] 9 6 4 3 2 1
.. .. ..- attr(*, "names")=chr [1:6] "OMITTED" "NOT REACHED" "D" "C*" ...
.. [list output truncated]
$ scored : num [1:4472, 1:443] 3010101 3010102 3010104 3010105 3010106 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:443] "idstud" "M022043" "M022046" "M022049" ...
$ iteminfo:'data.frame':
..$ item : Factor w/ 442 levels "M022043","M022046",..: 1 2 3 4 5 6 21 7 8 17 ...
..$ content : Factor w/ 8 levels "Algebra","Biology",..: 7 7 6 1 6 7 4 6 7 7 ...
..$ topic : Factor w/ 49 levels "Algebraic Expression",..: 32 32 41 29 ...
..$ cognitive : Factor w/ 3 levels "Applying","Knowing",..: 2 1 3 2 1 1 1 1 2 1 ...
..$ item.type : Factor w/ 2 levels "CR","MC": 2 1 2 2 1 2 2 2 2 1 ...
..$ N.options : Factor w/ 4 levels "-"," -","4","5": 4 1 3 4 1 4 4 4 3 1 ...
..$ key : Factor w/ 7 levels "-"," -","A","B",..: 6 1 6 7 1 5 5 4 6 1 ...
..$ max.points: int [1:442] 1 1 1 1 1 1 1 1 1 2 ...
..$ item.label: Factor w/ 432 levels "1 teacher for every 12 students ",..: 58 351 ...
TIMSS 2007 8th Grade, Russian Sample
Dataset used in Stoyan, Pommerening and Wuensche (2018; see also Pommerening et al., 2018). In the dataset, 15 forest managers classify 387 trees either as trees to be maintained or as trees to be removed. They assign tree marks, either 0 or 1, where a mark of 1 means removal.
data(data.trees)
The dataset has the following structure.
'data.frame': 387 obs. of 16 variables:
$ Number: int 142 184 9 300 374 42 382 108 125 201 ...
$ FM1 : int 1 1 1 1 1 1 1 1 1 0 ...
$ FM2 : int 1 1 1 0 1 1 1 1 1 1 ...
$ FM3 : int 1 0 1 1 1 1 1 1 1 1 ...
$ FM4 : int 1 1 1 1 1 1 0 1 1 1 ...
$ FM5 : int 1 1 1 1 1 1 0 0 0 1 ...
$ FM6 : int 1 1 1 1 0 1 1 1 1 0 ...
$ FM7 : int 1 0 1 1 0 0 1 0 1 1 ...
$ FM8 : int 1 1 1 1 1 0 0 1 0 1 ...
$ FM9 : int 1 1 0 1 1 1 1 0 1 1 ...
$ FM10 : int 0 1 1 0 1 1 1 1 0 0 ...
$ FM11 : int 1 1 1 1 0 1 1 0 1 0 ...
$ FM12 : int 1 1 1 1 1 1 0 1 0 0 ...
$ FM13 : int 0 1 0 0 1 1 1 1 1 1 ...
$ FM14 : int 1 1 1 1 1 0 1 1 1 1 ...
$ FM15 : int 1 1 0 1 1 0 1 0 0 1 ...
https://www.pommerening.org/wiki/images/d/dc/CoedyBreninSortedforPublication.txt
Pommerening, A., Ramos, C. P., Kedziora, W., Haufe, J., & Stoyan, D. (2018). Rating experiments in forestry: How much agreement is there in tree marking? PloS ONE, 13(3), e0194747. doi:10.1371/journal.pone.0194747
Stoyan, D., Pommerening, A., & Wuensche, A. (2018). Rater classification by means of set-theoretic methods applied to forestry data. Journal of Environmental Statistics, 8(2), 1-17.
## Not run:
#############################################################################
# EXAMPLE 1: Latent class models, latent trait models, mixed membership models
#############################################################################

data(data.trees, package="sirt")
dat <- data.trees[,-1]
I <- ncol(dat)

#** latent class models with 2, 3, and 4 classes
problevels <- seq( 0, 1, len=2 )
mod02 <- sirt::gom.em(dat, K=2, problevels, model="GOM")
mod03 <- sirt::gom.em(dat, K=3, problevels, model="GOM")
mod04 <- sirt::gom.em(dat, K=4, problevels, model="GOM")

#** grade of membership models
mod11 <- sirt::gom.em(dat, K=2, theta0.k=10*seq(-1,1,len=11), model="GOMnormal")
problevels <- seq( 0, 1, len=3 )
mod12 <- sirt::gom.em(dat, K=2, problevels, model="GOM")
mod13 <- sirt::gom.em(dat, K=3, problevels, model="GOM")
mod14 <- sirt::gom.em(dat, K=4, problevels, model="GOM")
problevels <- seq( 0, 1, len=4 )
mod22 <- sirt::gom.em(dat, K=2, problevels, model="GOM")
mod23 <- sirt::gom.em(dat, K=3, problevels, model="GOM")
mod24 <- sirt::gom.em(dat, K=4, problevels, model="GOM")

#** latent trait models
#- 1PL
mod31 <- sirt::rasch.mml2(dat)
#- 2PL
mod32 <- sirt::rasch.mml2(dat, est.a=1:I)

#- model comparison
IRT.compareModels(mod02, mod03, mod04, mod11, mod12, mod13, mod14,
          mod22, mod23, mod24, mod31, mod32)

#-- inspect model results
summary(mod12)
round( cbind( mod12$theta.k, mod12$pi.k ), 3 )
summary(mod13)
round( cbind( mod13$theta.k, mod13$pi.k ), 3 )

## End(Not run)
Converts a data frame from wide format into long format.
data.wide2long(dat, id=NULL, X=NULL, Q=NULL)
dat |
Data frame with item responses and a person identifier if
|
id |
An optional string with the variable name of the person identifier. |
X |
Data frame with person covariates for inclusion in the data frame of long format |
Q |
Data frame with item predictors. Item labels must be included
as a column named by |
Data frame in long format
## Not run:
#############################################################################
# EXAMPLE 1: data.pisaRead
#############################################################################

miceadds::library_install("lme4")
data(data.pisaRead)
dat <- data.pisaRead$data
Q <- data.pisaRead$item     # item predictors
# define items
items <- colnames(dat)[ substring( colnames(dat), 1, 1 )=="R" ]
dat1 <- dat[, c( "idstud", items ) ]
# matrix with person predictors
X <- dat[, c("idschool", "hisei", "female", "migra") ]
# create dataset in long format
dat.long <- sirt::data.wide2long( dat=dat1, id="idstud", X=X, Q=Q )

#***
# Model 1: Rasch model
mod1 <- lme4::glmer( resp ~ 0 + ( 1 | idstud ) + as.factor(item),
            data=dat.long, family="binomial", verbose=TRUE )
summary(mod1)

#***
# Model 2: Rasch model and inclusion of person predictors
mod2 <- lme4::glmer( resp ~ 0 + ( 1 | idstud ) + as.factor(item) + female
            + hisei + migra, data=dat.long, family="binomial", verbose=TRUE )
summary(mod2)

#***
# Model 3: LLTM
mod3 <- lme4::glmer( resp ~ ( 1 | idstud ) + as.factor(ItemFormat)
            + as.factor(TextType), data=dat.long, family="binomial",
            verbose=TRUE )
summary(mod3)

#############################################################################
# EXAMPLE 2: Rasch model in lme4
#############################################################################

set.seed(765)
N <- 1000    # number of persons
I <- 10      # number of items
b <- seq(-2, 2, length=I)
dat <- sirt::sim.raschtype( stats::rnorm(N, sd=1.2), b=b )
dat.long <- sirt::data.wide2long( dat=dat )

#***
# estimate Rasch model with lmer
library(lme4)
mod1 <- lme4::glmer( resp ~ 0 + as.factor(item) + ( 1 | id_index ),
            data=dat.long, verbose=TRUE, family="binomial" )
summary(mod1)
  ## Random effects:
  ##  Groups   Name        Variance Std.Dev.
  ##  id_index (Intercept) 1.454    1.206
  ## Number of obs: 10000, groups: id_index, 1000
  ##
  ## Fixed effects:
  ##                      Estimate Std. Error z value Pr(>|z|)
  ## as.factor(item)I0001  2.16365    0.10541  20.527  < 2e-16 ***
  ## as.factor(item)I0002  1.66437    0.09400  17.706  < 2e-16 ***
  ## as.factor(item)I0003  1.21816    0.08700  14.002  < 2e-16 ***
  ## as.factor(item)I0004  0.68611    0.08184   8.383  < 2e-16 ***
  ## [...]

## End(Not run)
This function calculates the DETECT and polyDETECT indices (Stout, Habing, Douglas & Kim, 1996; Zhang & Stout, 1999a; Zhang, 2007). Conditional covariances must first be estimated using the ccov.np function.
detect.index(ccovtable, itemcluster)
ccovtable |
Output of the ccov.np function |
itemcluster |
Item cluster for each item. The order of entries must correspond
to the items in ccovtable |
Stout, W., Habing, B., Douglas, J., & Kim, H. R. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20, 331-354.
Zhang, J., & Stout, W. (1999a). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64, 129-152.
Zhang, J., & Stout, W. (1999b). The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika, 64, 213-249.
Zhang, J. (2007). Conditional covariance theory and DETECT for polytomous items. Psychometrika, 72, 69-91.
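The DETECT statistic aggregates the signed conditional covariances: item pairs within the same cluster enter with weight +1, pairs from different clusters with weight -1, and the average is scaled by 100 (Zhang & Stout, 1999a). A minimal sketch of this computation (detect_sketch is an illustrative helper, not a sirt function):

```r
# Minimal sketch of the DETECT computation from a symmetric matrix of
# pairwise conditional covariances and an item cluster assignment.
# detect_sketch is an illustrative helper, not a sirt function.
detect_sketch <- function(ccov, itemcluster){
    pairs <- which( upper.tri(ccov), arr.ind=TRUE )
    # weight +1 if both items belong to the same cluster, -1 otherwise
    delta <- ifelse( itemcluster[ pairs[,1] ]==itemcluster[ pairs[,2] ], 1, -1 )
    # DETECT: 100 times the average signed conditional covariance
    100 * mean( delta * ccov[ pairs ] )
}

# toy example: two clusters with positive within-cluster covariances
ccov <- matrix(0, 4, 4)
ccov[1,2] <- ccov[2,1] <- .05    # within cluster 1
ccov[3,4] <- ccov[4,3] <- .04    # within cluster 2
ccov[1,3] <- ccov[3,1] <- -.02   # between clusters
detect_sketch(ccov, itemcluster=c(1,1,2,2))
```

A positive value indicates that the chosen item clustering picks up multidimensional structure in the conditional covariances.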
For examples see conf.detect
.
This function assesses differential item functioning using logistic regression analysis (Zumbo, 1999).
dif.logistic.regression(dat, group, score, quant=1.645)
dat |
Data frame with dichotomous item responses |
group |
Group identifier |
score |
Ability estimate, e.g. the WLE. |
quant |
Used quantile of the normal distribution for assessing statistical significance |
Items are classified into A (negligible DIF), B (moderate DIF) and C (large DIF) levels according to the ETS classification system (Longford, Holland & Thayer, 1993, p. 175). See also Monahan, McHorney, Stump and Perkins (2007) for further DIF effect size classifications.
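The underlying item-wise model regresses the item response on the ability score, the group, and their interaction: the group main effect is the uniform DIF effect, the interaction the nonuniform DIF effect. A minimal simulation sketch with stats::glm (an illustration of the model, not the sirt implementation itself):

```r
# Minimal sketch of the item-wise logistic regression DIF model:
#   logit P(X=1) = beta0 + beta1*score + beta2*group + beta3*score*group
# beta2 is the uniform DIF effect, beta3 the nonuniform DIF effect.
set.seed(1)
N <- 2000
group <- rep( 0:1, each=N/2 )
score <- stats::rnorm(N)
# simulate a uniform DIF effect of .5 for the focal group
p <- stats::plogis( -0.2 + 1*score + .5*group )
resp <- stats::rbinom( N, size=1, prob=p )
mod <- stats::glm( resp ~ score + group + score:group, family="binomial" )
coef(summary(mod))
# row 'group': uniform DIF estimate; row 'score:group': nonuniform DIF
```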
A data frame with the following variables:
itemnr |
Numeric index of the item |
sortDIFindex |
Rank of item with respect to the uniform DIF (from negative to positive values) |
item |
Item name |
N |
Sample size per item |
R |
Value of the group variable for the reference group |
F |
Value of the group variable for the focal group |
nR |
Sample size per item in reference group |
nF |
Sample size per item in focal group |
p |
Item p value (proportion correct) |
pR |
Item p value in the reference group |
pF |
Item p value in the focal group |
pdiff |
Difference in item p values between reference and focal group |
pdiff.adj |
Adjusted difference in item p values |
uniformDIF |
Uniform DIF estimate |
se.uniformDIF |
Standard error of uniform DIF |
t.uniformDIF |
The t value of the uniform DIF estimate |
sig.uniformDIF |
Significance label for uniform DIF |
DIF.ETS |
DIF classification according to the ETS classification system (see Details) |
uniform.EBDIF |
Empirical Bayes estimate of uniform DIF (Longford, Holland & Thayer, 1993) which takes degree of DIF standard error into account |
DIF.SD |
Value of the DIF standard deviation |
nonuniformDIF |
Nonuniform DIF estimate |
se.nonuniformDIF |
Standard error of nonuniform DIF |
t.nonuniformDIF |
The t value of the nonuniform DIF estimate |
sig.nonuniformDIF |
Significance label for nonuniform DIF |
Longford, N. T., Holland, P. W., & Thayer, D. T. (1993). Stability of the MH D-DIF statistics across populations. In P. W. Holland & H. Wainer (Eds.). Differential Item Functioning (pp. 171-196). Hillsdale, NJ: Erlbaum.
Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42(3), 847-862. doi:10.3758/BRM.42.3.847
Monahan, P. O., McHorney, C. A., Stump, T. E., & Perkins, A. J. (2007). Odds ratio, delta, ETS classification, and standardization measures of DIF magnitude for binary logistic regression. Journal of Educational and Behavioral Statistics, 32(1), 92-109. doi:10.3102/1076998606298035
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
For assessing DIF variance see dif.variance and dif.strata.variance.
See also rasch.evm.pcm for assessing differential item functioning
in the partial credit model.
See the difR package for a large collection of DIF detection methods (Magis, Beland, Tuerlinckx, & De Boeck, 2010).
For a download of the free DIF-Pack software (SIBTEST, ...) see http://psychometrictools.measuredprogress.org/home.
#############################################################################
# EXAMPLE 1: Mathematics data | Gender DIF
#############################################################################

data( data.math )
dat <- data.math$data
items <- grep( "M", colnames(dat))

# estimate item parameters and WLEs
mod <- sirt::rasch.mml2( dat[,items] )
wle <- sirt::wle.rasch( dat[,items], b=mod$item$b )$theta

# assess DIF by logistic regression
mod1 <- sirt::dif.logistic.regression( dat=dat[,items], score=wle, group=dat$female)

# calculate DIF variance
dif1 <- sirt::dif.variance( dif=mod1$uniformDIF, se.dif=mod1$se.uniformDIF )
dif1$unweighted.DIFSD
  ## > dif1$unweighted.DIFSD
  ## [1] 0.1963958

# calculate stratified DIF variance
# stratification based on domains
dif2 <- sirt::dif.strata.variance( dif=mod1$uniformDIF, se.dif=mod1$se.uniformDIF,
             itemcluster=data.math$item$domain )
  ## $unweighted.DIFSD
  ## [1] 0.1455916

## Not run:
#****
# Likelihood ratio test and graphical model test in eRm package
miceadds::library_install("eRm")
# estimate Rasch model
res <- eRm::RM( dat[,items] )
summary(res)
# LR-test with respect to female
lrres <- eRm::LRtest(res, splitcr=dat$female)
summary(lrres)
# graphical model test
eRm::plotGOF(lrres)

#############################################################################
# EXAMPLE 2: Comparison with Mantel-Haenszel test
#############################################################################

library(TAM)
library(difR)

#*** (1) simulate data
set.seed(776)
N <- 1500    # number of persons per group
I <- 12      # number of items
mu2 <- .5    # impact (group difference)
sd2 <- 1.3   # standard deviation group 2
# define item difficulties
b <- seq( -1.5, 1.5, length=I)
# simulate DIF effects
bdif <- scale( stats::rnorm(I, sd=.6 ), scale=FALSE )[,1]
# item difficulties per group
b1 <- b + 1/2 * bdif
b2 <- b - 1/2 * bdif
# simulate item responses
dat1 <- sirt::sim.raschtype( theta=stats::rnorm(N, mean=0, sd=1 ), b=b1 )
dat2 <- sirt::sim.raschtype( theta=stats::rnorm(N, mean=mu2, sd=sd2 ), b=b2 )
dat <- rbind( dat1, dat2 )
group <- rep( c(1,2), each=N )   # define group indicator

#*** (2) scale data
mod <- TAM::tam.mml( dat, group=group )
summary(mod)

#*** (3) extract person parameter estimates
mod_eap <- mod$person$EAP
mod_wle <- TAM::tam.wle( mod )$theta

#*********************************
# (4) techniques for assessing differential item functioning

# Model 1: assess DIF by logistic regression and WLEs
dif1 <- sirt::dif.logistic.regression( dat=dat, score=mod_wle, group=group)

# Model 2: assess DIF by logistic regression and EAPs
dif2 <- sirt::dif.logistic.regression( dat=dat, score=mod_eap, group=group)

# Model 3: assess DIF by Mantel-Haenszel statistic
dif3 <- difR::difMH(Data=dat, group=group, focal.name="1", purify=FALSE )
print(dif3)
  ## Mantel-Haenszel Chi-square statistic:
  ##
  ##        Stat.    P-value
  ## I0001  14.5655   0.0001 ***
  ## I0002 300.3225   0.0000 ***
  ## I0003   2.7160   0.0993 .
  ## I0004 191.6925   0.0000 ***
  ## I0005   0.0011   0.9740
  ## [...]
  ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  ## Detection threshold: 3.8415 (significance level: 0.05)
  ##
  ## Effect size (ETS Delta scale):
  ##
  ## Effect size code:
  ##  'A': negligible effect
  ##  'B': moderate effect
  ##  'C': large effect
  ##
  ##       alphaMH deltaMH
  ## I0001  1.3908 -0.7752 A
  ## I0002  0.2339  3.4147 C
  ## I0003  1.1407 -0.3093 A
  ## I0004  2.8515 -2.4625 C
  ## I0005  1.0050 -0.0118 A
  ## [...]
  ##
  ## Effect size codes: 0 'A' 1.0 'B' 1.5 'C'
  ##  (for absolute values of 'deltaMH')

# recompute DIF parameter from alphaMH
uniformDIF3 <- log(dif3$alphaMH)

# compare different DIF statistics
dfr <- data.frame( "bdif"=bdif, "LR_wle"=dif1$uniformDIF,
          "LR_eap"=dif2$uniformDIF, "MH"=uniformDIF3 )
round( dfr, 3 )
  ##      bdif LR_wle LR_eap     MH
  ## 1   0.236  0.319  0.278  0.330
  ## 2  -1.149 -1.473 -1.523 -1.453
  ## 3   0.140  0.122  0.038  0.132
  ## 4   0.957  1.048  0.938  1.048
  ## [...]
colMeans( abs( dfr[,-1] - bdif ))
  ##     LR_wle     LR_eap         MH
  ## 0.07759187 0.19085743 0.07501708

## End(Not run)
Calculation of stratified DIF variance
dif.strata.variance(dif, se.dif, itemcluster)
dif |
Vector of uniform DIF effects |
se.dif |
Standard error of uniform DIF effects |
itemcluster |
Vector of item strata |
A list with the following entries:
stratadif |
Summary statistics of DIF effects within item strata |
weighted.DIFSD |
Weighted DIF standard deviation |
unweighted.DIFSD |
Unweighted DIF standard deviation |
Longford, N. T., Holland, P. W., & Thayer, D. T. (1993). Stability of the MH D-DIF statistics across populations. In P. W. Holland & H. Wainer (Eds.). Differential Item Functioning (pp. 171-196). Hillsdale, NJ: Erlbaum.
See dif.logistic.regression for examples.
This function calculates the variance of DIF effects, the so called DIF variance (Longford, Holland & Thayer, 1993).
dif.variance(dif, se.dif, items=paste("item", 1:length(dif), sep="") )
dif |
Vector of uniform DIF effects |
se.dif |
Standard error of uniform DIF effects |
items |
Optional vector of item names |
A list with the following entries
weighted.DIFSD |
Weighted DIF standard deviation |
unweighted.DIFSD |
Unweighted DIF standard deviation |
mean.se.dif |
Mean of standard errors of DIF effects |
eb.dif |
Empirical Bayes estimates of DIF effects |
Longford, N. T., Holland, P. W., & Thayer, D. T. (1993). Stability of the MH D-DIF statistics across populations. In P. W. Holland & H. Wainer (Eds.). Differential Item Functioning (pp. 171-196). Hillsdale, NJ: Erlbaum.
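The idea behind the DIF variance is a decomposition of the observed variance of the DIF estimates into true DIF variance and sampling variance; the empirical Bayes estimates shrink each DIF effect according to its standard error. A minimal sketch under these assumptions (dif_variance_sketch is an illustrative helper; sirt::dif.variance additionally reports a weighted variant):

```r
# Minimal sketch of the DIF variance decomposition: the observed variance
# of the DIF estimates equals the true DIF variance (tau^2) plus the
# average sampling variance of the estimates.
dif_variance_sketch <- function(dif, se.dif){
    tau2 <- max( stats::var(dif) - mean( se.dif^2 ), 0 )
    # empirical Bayes shrinkage of each DIF effect toward the mean effect
    lambda <- tau2 / ( tau2 + se.dif^2 )
    list( DIFSD=sqrt(tau2), eb.dif=lambda * ( dif - mean(dif) ) )
}
dif_variance_sketch( dif=c(.4, -.3, .1, -.2), se.dif=c(.1, .1, .15, .12) )
```

Items with larger standard errors are shrunken more strongly toward the mean DIF effect.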
See dif.logistic.regression for examples.
Maximum likelihood estimation of the parameters of the Dirichlet distribution
dirichlet.mle(x, weights=NULL, eps=10^(-5), convcrit=1e-05, maxit=1000, oldfac=.3, progress=FALSE)
x |
Data frame with observations from a Dirichlet distribution; each row contains proportions |
weights |
Optional vector of frequency weights |
eps |
Small tolerance constant added to prevent taking logarithms of zero |
convcrit |
Convergence criterion |
maxit |
Maximum number of iterations |
oldfac |
Convergence acceleration factor. It must be a parameter between 0 and 1. |
progress |
Display iteration progress? |
A list with the following entries
alpha |
Vector of estimated alpha parameters |
alpha0 |
The concentration parameter |
xsi |
Vector of proportions |
Minka, T. P. (2012). Estimating a Dirichlet distribution. Technical Report.
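A simple moment-based estimate of the Dirichlet parameters matches the first two moments of one component (Minka, 2012); such values are also useful as starting values for maximum likelihood iterations. A minimal sketch (dirichlet_mom is an illustrative helper, not the sirt estimation routine):

```r
# Minimal sketch of a method-of-moments estimate of the Dirichlet
# parameters (Minka, 2012): the mean and the second moment of the first
# component identify alpha0; then alpha=alpha0 times the mean proportions.
dirichlet_mom <- function(x){
    m <- colMeans(x)
    m2 <- mean( x[,1]^2 )
    alpha0 <- ( m[1] - m2 ) / ( m2 - m[1]^2 )
    alpha0 * m
}

# simulate Dirichlet data via normalized gamma draws
set.seed(789)
alpha <- c(2, 1, 1)
y <- sapply( alpha, function(a) stats::rgamma( 5000, shape=a ) )
x <- y / rowSums(y)
dirichlet_mom(x)   # close to c(2, 1, 1)
```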
For simulating Dirichlet vectors with matrix-wise parameters see dirichlet.simul.
For a variety of functions concerning the Dirichlet distribution see the DirichletReg package.
#############################################################################
# EXAMPLE 1: Simulate and estimate Dirichlet distribution
#############################################################################

# (1) simulate data
set.seed(789)
N <- 200
probs <- c(.5, .3, .2 )
alpha0 <- .5
alpha <- alpha0*probs
alpha <- matrix( alpha, nrow=N, ncol=length(alpha), byrow=TRUE )
x <- sirt::dirichlet.simul( alpha )

# (2) estimate Dirichlet parameters
dirichlet.mle(x)
  ## $alpha
  ## [1] 0.24507708 0.14470944 0.09590745
  ## $alpha0
  ## [1] 0.485694
  ## $xsi
  ## [1] 0.5045916 0.2979437 0.1974648

## Not run:
#############################################################################
# EXAMPLE 2: Fitting Dirichlet distribution with frequency weights
#############################################################################

# define observed data
x <- scan( nlines=1)
  1 0   0 1   .5 .5
x <- matrix( x, nrow=3, ncol=2, byrow=TRUE)
# transform observations x into (0,1)
eps <- .01
x <- ( x + eps ) / ( 1 + 2 * eps )

# compare results with likelihood fitting package maxLik
miceadds::library_install("maxLik")
# define likelihood function
dirichlet.ll <- function(param){
    ll <- sum( weights * log( ddirichlet( x, param ) ) )
    ll
}

#*** weights 10-10-1
weights <- c(10, 10, 1 )
mod1a <- sirt::dirichlet.mle( x, weights=weights )
mod1a
# estimation in maxLik
mod1b <- maxLik::maxLik(dirichlet.ll, start=c(.5,.5))
print( mod1b )
coef( mod1b )

#*** weights 10-10-10
weights <- c(10, 10, 10 )
mod2a <- sirt::dirichlet.mle( x, weights=weights )
mod2a
# estimation in maxLik
mod2b <- maxLik::maxLik(dirichlet.ll, start=c(.5,.5))
print( mod2b )
coef( mod2b )

#*** weights 30-10-2
weights <- c(30, 10, 2 )
mod3a <- sirt::dirichlet.mle( x, weights=weights )
mod3a
# estimation in maxLik
mod3b <- maxLik::maxLik(dirichlet.ll, start=c(.25,.25))
print( mod3b )
coef( mod3b )

## End(Not run)
This function makes random draws from a Dirichlet distribution.
dirichlet.simul(alpha)
alpha |
A matrix of alpha parameters; every row corresponds to one Dirichlet draw |
A data frame with Dirichlet distributed responses
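A Dirichlet draw can be obtained from independent gamma variates y_k ~ Gamma(alpha_k, 1) by the normalization x = y / sum(y). A minimal sketch of this construction (dirichlet_simul_sketch is an illustrative helper; the row-wise alpha matrix mimics the alpha argument):

```r
# Minimal sketch of the gamma construction: y_k ~ Gamma(alpha_k, 1) and
# x=y/sum(y) yields a Dirichlet(alpha) draw, applied row-wise to a
# matrix of alpha parameters.
dirichlet_simul_sketch <- function(alpha){
    y <- apply( alpha, c(1,2), function(a) stats::rgamma( 1, shape=a ) )
    y / rowSums(y)
}
set.seed(1)
alpha <- matrix( c(.5, .3, .2), nrow=4, ncol=3, byrow=TRUE )
x <- dirichlet_simul_sketch(alpha)
rowSums(x)   # every row sums to 1
```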
#############################################################################
# EXAMPLE 1: Simulation with two components
#############################################################################

set.seed(789)
N <- 2000
probs <- c(.7, .3)   # define (extremal) class probabilities

#*** alpha0=.2 -> nearly crisp latent classes
alpha0 <- .2
alpha <- alpha0*probs
alpha <- matrix( alpha, nrow=N, ncol=length(alpha), byrow=TRUE )
x <- sirt::dirichlet.simul( alpha )
htitle <- expression(paste( alpha[0], "=.2, ", p[1], "=.7" ) )
hist( x[,1], breaks=seq(0,1,len=20), main=htitle)

#*** alpha0=3 -> strong deviation from crisp membership
alpha0 <- 3
alpha <- alpha0*probs
alpha <- matrix( alpha, nrow=N, ncol=length(alpha), byrow=TRUE )
x <- sirt::dirichlet.simul( alpha )
htitle <- expression(paste( alpha[0], "=3, ", p[1], "=.7" ) )
hist( x[,1], breaks=seq(0,1,len=20), main=htitle)

## Not run:
#############################################################################
# EXAMPLE 2: Simulation with three components
#############################################################################

set.seed(986)
N <- 2000
probs <- c( .5, .35, .15 )

#*** alpha0=.2
alpha0 <- .2
alpha <- alpha0*probs
alpha <- matrix( alpha, nrow=N, ncol=length(alpha), byrow=TRUE )
x <- sirt::dirichlet.simul( alpha )
htitle <- expression(paste( alpha[0], "=.2, ", p[1], "=.7" ) )
miceadds::library_install("ade4")
ade4::triangle.plot(x, label=NULL, clabel=1)

#*** alpha0=3
alpha0 <- 3
alpha <- alpha0*probs
alpha <- matrix( alpha, nrow=N, ncol=length(alpha), byrow=TRUE )
x <- sirt::dirichlet.simul( alpha )
htitle <- expression(paste( alpha[0], "=3, ", p[1], "=.7" ) )
ade4::triangle.plot(x, label=NULL, clabel=1)

## End(Not run)
The function dmlavaan compares model parameters from different lavaan
models fitted to the same dataset, which leads to dependent coefficients.
Statistical inference is conducted either by M-estimation (i.e., the robust
sandwich method; method="sandwich") or by bootstrap (method="bootstrap").
See Mize et al. (2019) or Weesie (1999) for more details.
dmlavaan(fun1, args1, fun2, args2, method="sandwich", R=50)
fun1 |
lavaan function of the first model (e.g., cfa) |
args1 |
arguments for lavaan function in the first model |
fun2 |
lavaan function of the second model (e.g., cfa) |
args2 |
arguments for lavaan function in the second model |
method |
Estimation method for standard errors: "sandwich" or "bootstrap" |
R |
Number of bootstrap samples |
In bootstrap estimation, a normal approximation is applied in the
computation of confidence intervals. Hence, the number of bootstrap
samples R can be chosen relatively small.
TO DO (not yet implemented):
1) | inclusion of sampling weights |
2) | cluster robust standard errors in hierarchical sampling |
3) | stratification |
A list with the following entries
coef |
Model parameters of both models |
vcov |
Covariance matrix of model parameters of both models |
partable |
Parameter table containing all univariate model parameters |
... |
More entries |
Mize, T.D., Doan, L., & Long, J.S. (2019). A general framework for comparing predictions and marginal effects across models. Sociological Methodology, 49(1), 152-189. doi:10.1177/0081175019852763
Weesie, J. (1999) Seemingly unrelated estimation and the cluster-adjusted sandwich estimator. Stata Technical Bulletin, 9, 231-248.
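Because both models are fitted to the same data, a difference of two parameters must be tested with their joint covariance matrix. A minimal numeric sketch of the resulting Wald statistic (the parameter names and all numbers are purely illustrative):

```r
# Minimal numeric sketch of the Wald test for the difference of two
# dependent model parameters, given stacked estimates and their joint
# covariance matrix (as collected in res$coef and res$vcov; the names
# and numbers here are purely illustrative).
coef_all <- c( par_mod1=0.62, par_mod2=0.55 )
V <- matrix( c( .010, .006,
                .006, .012 ), nrow=2 )   # nonzero covariance: dependence
a <- c(1, -1)                            # contrast: parameter 1 minus parameter 2
est <- sum( a * coef_all )
se <- sqrt( ( t(a) %*% V %*% a )[1,1] )
z <- est / se
c( est=est, se=se, z=z )   # est=0.07, se=0.10, z=0.70
```

Ignoring the covariance term would overstate the standard error of the difference here and understate the z statistic.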
## Not run:
############################################################################
# EXAMPLE 1: Confirmatory factor analysis with and without fourth item
#############################################################################

#**** simulate data
N <- 200    # number of persons
I <- 4      # number of items

# loadings and error correlations
lam <- seq(.7,.4, len=I)
PSI <- diag( 1-lam^2 )
# define some model misspecification
sd_error <- .1
S1 <- matrix( c( -1.84, 0.39,-0.68, 0.13,
                  0.39,-1.31,-0.07,-0.27,
                 -0.68,-0.07, 0.90, 1.91,
                  0.13,-0.27, 1.91,-0.56 ), nrow=4, ncol=4, byrow=TRUE)
S1 <- ( S1 - mean(S1) ) / sd(S1) * sd_error
Sigma <- lam %*% t(lam) + PSI + S1
dat <- MASS::mvrnorm(n=N, mu=rep(0,I), Sigma=Sigma)
colnames(dat) <- paste0("X",1:4)
dat <- as.data.frame(dat)
rownames(Sigma) <- colnames(Sigma) <- colnames(dat)

#*** define two lavaan models
lavmodel1 <- "F=~ X1 + X2 + X3 + X4"
lavmodel2 <- "F=~ X1 + X2 + X3"

#*** define lavaan estimation arguments and functions
fun2 <- fun1 <- "cfa"
args1 <- list( model=lavmodel1, data=dat, std.lv=TRUE, estimator="MLR")
args2 <- args1
args2$model <- lavmodel2

#* run model comparison
res1 <- sirt::dmlavaan( fun1=fun1, args1=args1, fun2=fun2, args2=args2)

# inspect results
sirt:::print_digits(res1$partable, digits=3)

## End(Not run)
This function computes the eigenvalue decomposition of
symmetric positive definite matrices. The eigenvalues are computed
by the Rayleigh quotient method (Lange, 2010, p. 120). In addition,
the inverse matrix can be calculated.
eigenvalues.manymatrices(Sigma.all, itermax=10, maxconv=0.001, inverse=FALSE )
Sigma.all |
A matrix containing the symmetric matrices; each row is one vectorized (row-wise) matrix |
itermax |
Maximum number of iterations |
maxconv |
Convergence criterion for convergence of eigenvectors |
inverse |
A logical which indicates if the inverse matrix shall be calculated |
A list with the following entries
lambda |
Matrix with eigenvalues |
U |
A matrix with the corresponding eigenvectors, stored row-wise per input matrix |
logdet |
Vector of logarithm of determinants |
det |
Vector of determinants |
Sigma.inv |
Inverse matrices if inverse=TRUE |
Lange, K. (2010). Numerical Analysis for Statisticians. New York: Springer.
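The Rayleigh quotient method combines power iteration with the quotient u'Su to approximate the leading eigenvalue. A minimal sketch for a single symmetric positive definite matrix (rayleigh_leading is an illustrative helper, not the vectorized sirt routine):

```r
# Minimal sketch of power iteration with a Rayleigh quotient for the
# leading eigenvalue of one symmetric positive definite matrix.
rayleigh_leading <- function(S, itermax=50){
    u <- rep( 1, nrow(S) ) / sqrt( nrow(S) )
    for (iter in seq_len(itermax)){
        v <- S %*% u
        u <- v / sqrt( sum(v^2) )
    }
    # Rayleigh quotient u'Su of the converged unit vector
    list( lambda=as.numeric( t(u) %*% S %*% u ), u=as.numeric(u) )
}
S <- matrix( c(1, .4, .4, 1), nrow=2 )
rayleigh_leading(S)$lambda   # largest eigenvalue: 1.4
```

The remaining eigenvalues can be obtained by deflating S with the converged eigenvector and repeating the iteration.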
# define matrices
Sigma <- diag(1,3)
Sigma[ lower.tri(Sigma) ] <- Sigma[ upper.tri(Sigma) ] <- c(.4,.6,.8 )
Sigma1 <- Sigma
Sigma <- diag(1,3)
Sigma[ lower.tri(Sigma) ] <- Sigma[ upper.tri(Sigma) ] <- c(.2,.1,.99 )
Sigma2 <- Sigma

# collect matrices in a "super-matrix"
Sigma.all <- rbind( matrix( Sigma1, nrow=1, byrow=TRUE),
        matrix( Sigma2, nrow=1, byrow=TRUE) )
Sigma.all <- Sigma.all[ c(1,1,2,2,1 ), ]

# eigenvalue decomposition
m1 <- sirt::eigenvalues.manymatrices( Sigma.all )
m1

# eigenvalue decomposition for Sigma1
s1 <- svd(Sigma1)
s1
This function performs linking in the generalized logistic item response
model. Only item difficulties (b item parameters) are allowed.
Mean-mean linking and the methods of Haebara and Stocking-Lord
are implemented (Kolen & Brennan, 2004).
equating.rasch(x, y, theta=seq(-4, 4, len=100), alpha1=0, alpha2=0)
x |
Matrix with two columns: First column items, second column item difficulties |
y |
Matrix with two columns: First column items, second column item difficulties |
theta |
Vector of theta values at which the linking functions
should be evaluated. If a weighting according to a prespecified normal
distribution |
alpha1 |
Fixed alpha1 parameter of the generalized item response model |
alpha2 |
Fixed alpha2 parameter of the generalized item response model |
B.est |
Estimated linking constants for the methods Mean.Mean, Haebara and Stocking.Lord |
descriptives |
Descriptives of the linking, including the linking error (linkerror) |
anchor |
Original and transformed item parameters of anchor items |
transf.par |
Original and transformed item parameters of all items |
Kolen, M. J., & Brennan, R. L. (2004). Test Equating, Scaling, and Linking: Methods and Practices. New York: Springer.
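In the Rasch case only a shift constant B has to be estimated: mean-mean linking takes the difference of the mean difficulties, while the Haebara method minimizes squared differences of the item characteristic curves over a theta grid. A minimal sketch with made-up anchor parameters:

```r
# Minimal sketch of mean-mean and Haebara linking in the Rasch case,
# where only a shift constant B is estimated; the anchor item
# difficulties below are made up for illustration.
b_x <- c(-1, 0, 1)           # anchor difficulties, study 1
b_y <- c(-0.8, 0.15, 1.15)   # the same items, study 2
# mean-mean: B is the difference of the mean difficulties
B_mm <- mean(b_y) - mean(b_x)
# Haebara: minimize squared differences of item characteristic curves
theta <- seq(-4, 4, len=100)
haeb_crit <- function(B){
    sum( ( stats::plogis( outer(theta, b_x, "-") ) -
           stats::plogis( outer(theta, b_y - B, "-") ) )^2 )
}
B_hae <- stats::optimize( haeb_crit, interval=c(-2, 2) )$minimum
c( Mean.Mean=B_mm, Haebara=B_hae )
```

Both criteria agree when the study differences are identical across items; with heterogeneous item drift the estimates diverge slightly, as in the PISA example below.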
For estimating standard errors (due to inference with respect to the
item domain) of this procedure see equating.rasch.jackknife.
For linking several studies see linking.haberman or invariance.alignment.
A robust alternative to mean-mean linking is implemented in linking.robust.
For linking under more general item response models see the plink package.
#############################################################################
# EXAMPLE 1: Linking item parameters of the PISA study
#############################################################################

data(data.pisaPars)
pars <- data.pisaPars

# linking the two studies with the Rasch model
mod <- sirt::equating.rasch(x=pars[,c("item","study1")], y=pars[,c("item","study2")])
  ##   Mean.Mean    Haebara Stocking.Lord
  ## 1   0.08828 0.08896269    0.09292838

## Not run:
#*** linking using the plink package
# The plink package is not available on CRAN anymore.
# You can download the package with
# utils::install.packages("plink", repos="http://www2.uaem.mx/r-mirror")
library(plink)
I <- nrow(pars)
pm <- plink::as.poly.mod(I)
# linking parameters
plink.pars1 <- list( "study1"=data.frame( 1, pars$study1, 0 ),
        "study2"=data.frame( 1, pars$study2, 0 ) )
# the parameters are arranged in the columns:
# Discrimination, Difficulty, Guessing Parameter
# common items
common.items <- cbind("study1"=1:I,"study2"=1:I)
# number of categories per item
cats.item <- list( "study1"=rep(2,I), "study2"=rep(2,I))
# convert into plink object
x <- plink::as.irt.pars( plink.pars1, common.items, cat=cats.item,
        poly.mod=list(pm,pm))
# linking using plink: first group is reference group
out <- plink::plink(x, rescale="MS", base.grp=1, D=1.7)
# summary for linking
summary(out)
  ## ------- group2/group1* -------
  ## Linking Constants
  ##
  ##                      A         B
  ## Mean/Mean     1.000000 -0.088280
  ## Mean/Sigma    1.000000 -0.088280
  ## Haebara       1.000000 -0.088515
  ## Stocking-Lord 1.000000 -0.096610
# extract linked parameters
pars.out <- plink::link.pars(out)

## End(Not run)
This function estimates the linking error in linking based on Jackknife (Monseur & Berezner, 2007).
equating.rasch.jackknife(pars.data, display=TRUE, se.linkerror=FALSE, alpha1=0, alpha2=0)
pars.data |
Data frame with four columns: jackknife unit (1st column), item parameter study 1 (2nd column), item parameter study 2 (3rd column), item (4th column) |
display |
Display progress? |
se.linkerror |
Compute standard error of the linking error |
alpha1 |
Fixed |
alpha2 |
Fixed |
A list with following entries:
pars.data |
Used item parameters |
itemunits |
Used units for jackknife |
descriptives |
Descriptives for the jackknife estimation |
Monseur, C., & Berezner, A. (2007). The computation of equating errors in international surveys in education. Journal of Applied Measurement, 8, 323-335.
For more details on linking methods see equating.rasch.
#############################################################################
# EXAMPLE 1: Linking errors PISA study
#############################################################################

data(data.pisaPars)
pars <- data.pisaPars

# Linking error: Jackknife unit is the testlet
res1 <- sirt::equating.rasch.jackknife(pars[, c("testlet","study1","study2","item") ])
res1$descriptives
##   N.items N.units      shift        SD linkerror.jackknife SE.SD.jackknife
## 1      25       8 0.09292838 0.1487387          0.04491197      0.03466309

# Linking error: Jackknife unit is the item
res2 <- sirt::equating.rasch.jackknife(pars[, c("item","study1","study2","item") ])
res2$descriptives
##   N.items N.units      shift        SD linkerror.jackknife SE.SD.jackknife
## 1      25      25 0.09292838 0.1487387          0.02682839      0.02533327
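The jackknife linking error of Monseur and Berezner (2007) can be computed by hand: the shift is re-estimated with each unit (testlet or item) removed, and the leave-one-out deviations are aggregated. A minimal base-R sketch with made-up difficulty differences, using the Mean-Mean shift for simplicity:

```r
# made-up item-wise differences of Rasch difficulties (study2 - study1)
d    <- c(0.2, 0.1, 0.3, -0.1, 0.0, 0.4)
unit <- c(1, 1, 2, 2, 3, 3)          # jackknife unit (e.g., testlet) per item

shift <- mean(d)                     # Mean-Mean shift based on all items
# leave-one-unit-out shifts
shift_j <- sapply(unique(unit), function(u) mean(d[unit != u]))
J <- length(unique(unit))
# jackknife linking error (Monseur & Berezner, 2007)
linkerror <- sqrt((J - 1) / J * sum((shift_j - shift)^2))
```

Choosing testlets rather than single items as jackknife units acknowledges the local dependence of items within a testlet, which is why the testlet-based linking error in the example above is larger.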
This function estimates the DETECT index (Stout, Habing, Douglas & Kim, 1996; Zhang & Stout, 1999a, 1999b) in an exploratory way. Conditional covariances of item pairs are transformed into a distance matrix such that items are clustered by the hierarchical Ward algorithm (Roussos, Stout & Marden, 1998). Note that the function will not provide the same output as the original DETECT software.
expl.detect(data, score, nclusters, N.est=NULL, seed=NULL, bwscale=1.1, smooth=TRUE, use_sum_score=FALSE, hclust_method="ward.D", estsample=NULL)
data |
An |
score |
An ability estimate, e.g. the WLE, sum score or mean score |
nclusters |
Maximum number of clusters used in the exploratory analysis |
N.est |
Number of students in a (possible) validation of the DETECT index |
seed |
Random seed |
bwscale |
Bandwidth scale factor |
smooth |
Logical indicating whether smoothing should be applied for conditional covariance estimation |
use_sum_score |
Logical indicating whether sum score should be used. With this option, the bias corrected conditional covariance of Zhang and Stout (1999) is used. |
hclust_method |
Clustering method passed as the method argument to stats::hclust |
estsample |
Optional vector of subject indices that defines the estimation sample |
A list with following entries
detect.unweighted |
Unweighted DETECT statistics |
detect.weighted |
Weighted DETECT statistics. Weighting is done proportionally to sample sizes of item pairs. |
clusterfit |
Fit of the cluster method |
itemcluster |
Cluster allocations |
Roussos, L. A., Stout, W. F., & Marden, J. I. (1998). Using new proximity measures with hierarchical cluster analysis to detect multidimensionality. Journal of Educational Measurement, 35, 1-30.
Stout, W., Habing, B., Douglas, J., & Kim, H. R. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20, 331-354.
Zhang, J., & Stout, W. (1999a). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64, 129-152.
Zhang, J., & Stout, W. (1999b). The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika, 64, 213-249.
For examples see conf.detect.
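The exploratory procedure described above can be illustrated without sirt. The following base-R sketch (simulated two-dimensional data; a simplified stand-in, not the package's estimator) computes score-conditional covariances for all item pairs, converts them into a distance matrix, and clusters the items with Ward's method:

```r
set.seed(1)
N <- 500; I <- 6
theta1 <- rnorm(N); theta2 <- rnorm(N)
# items 1-3 load on dimension 1, items 4-6 on dimension 2
dat <- cbind(sapply(1:3, function(i) 1 * (theta1 + rnorm(N) > 0)),
             sapply(1:3, function(i) 1 * (theta2 + rnorm(N) > 0)))
score <- rowMeans(dat)
grp <- cut(score, breaks=unique(quantile(score, 0:4/4)), include.lowest=TRUE)

# conditional covariance of an item pair, averaged over score groups
ccov <- function(i, j) {
  mean(sapply(split(seq_len(N), grp),
              function(idx) cov(dat[idx, i], dat[idx, j])), na.rm=TRUE)
}
CC <- matrix(0, I, I)
for (i in 1:(I - 1)) for (j in (i + 1):I)
  CC[i, j] <- CC[j, i] <- ccov(i, j)

# larger conditional covariance => smaller distance (Roussos et al., 1998)
D <- max(CC) - CC; diag(D) <- 0
hc <- hclust(as.dist(D), method="ward.D")
itemcluster <- cutree(hc, k=2)
```

Items measuring the same dimension retain positive conditional covariance after conditioning on the total score, so they end up close in the distance matrix and in the same cluster.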
Estimates the functional unidimensional item response model for dichotomous data (Ip, Molenberghs, Chen, Goegebeur & De Boeck, 2013). Either the IRT model is estimated using a probit link and employing tetrachoric correlations or item discriminations and intercepts of a pre-estimated multidimensional IRT model are provided as input.
f1d.irt(dat=NULL, nnormal=1000, nfactors=3, A=NULL, intercept=NULL, mu=NULL, Sigma=NULL, maxiter=100, conv=10^(-5), progress=TRUE)
dat |
Data frame with dichotomous item responses |
nnormal |
Number of |
nfactors |
Number of dimensions to be estimated |
A |
Matrix of item discriminations (if the IRT model is already estimated) |
intercept |
Vector of item intercepts (if the IRT model is already estimated) |
mu |
Vector of estimated means. In the default it is assumed that all means are zero. |
Sigma |
Estimated covariance matrix. In the default it is the identity matrix. |
maxiter |
Maximum number of iterations |
conv |
Convergence criterion |
progress |
Display progress? The default is TRUE. |
The functional unidimensional item response model (F1D model)
for dichotomous item responses is based on a multidimensional model with a
link function g (probit or logit):

P( X_pi = 1 | theta_p ) = g( a_i1 theta_p1 + ... + a_iD theta_pD + c_i )

It is assumed that theta_p = (theta_p1, ..., theta_pD) is multivariate normally
distributed with a zero mean vector and identity covariance matrix.
The F1D model estimates unidimensional item response functions such that

P( X_pi = 1 | theta_p* ) ≈ g( a_i* theta_p* + c_i* )

The optimization function minimizes the deviations of
the approximation equations

g^{-1}[ P( X_pi = 1 | theta_p* ) ] ≈ a_i* theta_p* + c_i*

The optimization function is defined by