Package 'sirt'

Title: Supplementary Item Response Theory Models
Description: Supplementary functions for item response models aiming to complement existing R packages. The functionality includes among others multidimensional compensatory and noncompensatory IRT models (Reckase, 2009, <doi:10.1007/978-0-387-89976-3>), MCMC for hierarchical IRT models and testlet models (Fox, 2010, <doi:10.1007/978-1-4419-0742-4>), NOHARM (McDonald, 1982, <doi:10.1177/014662168200600402>), Rasch copula model (Braeken, 2011, <doi:10.1007/s11336-010-9190-4>; Schroeders, Robitzsch & Schipolowski, 2014, <doi:10.1111/jedm.12054>), faceted and hierarchical rater models (DeCarlo, Kim & Johnson, 2011, <doi:10.1111/j.1745-3984.2011.00143.x>), ordinal IRT model (ISOP; Scheiblechner, 1995, <doi:10.1007/BF02301417>), DETECT statistic (Stout, Habing, Douglas & Kim, 1996, <doi:10.1177/014662169602000403>), local structural equation modeling (LSEM; Hildebrandt, Luedtke, Robitzsch, Sommer & Wilhelm, 2016, <doi:10.1080/00273171.2016.1142856>).
Authors: Alexander Robitzsch [aut,cre]
Maintainer: Alexander Robitzsch <[email protected]>
License: GPL (>= 2)
Version: 4.2-57
Built: 2024-04-20 16:33:39 UTC
Source: https://github.com/alexanderrobitzsch/sirt

Help Index


Supplementary Item Response Theory Models

Description

Supplementary functions for item response models aiming to complement existing R packages. The functionality includes among others multidimensional compensatory and noncompensatory IRT models (Reckase, 2009, <doi:10.1007/978-0-387-89976-3>), MCMC for hierarchical IRT models and testlet models (Fox, 2010, <doi:10.1007/978-1-4419-0742-4>), NOHARM (McDonald, 1982, <doi:10.1177/014662168200600402>), Rasch copula model (Braeken, 2011, <doi:10.1007/s11336-010-9190-4>; Schroeders, Robitzsch & Schipolowski, 2014, <doi:10.1111/jedm.12054>), faceted and hierarchical rater models (DeCarlo, Kim & Johnson, 2011, <doi:10.1111/j.1745-3984.2011.00143.x>), ordinal IRT model (ISOP; Scheiblechner, 1995, <doi:10.1007/BF02301417>), DETECT statistic (Stout, Habing, Douglas & Kim, 1996, <doi:10.1177/014662169602000403>), local structural equation modeling (LSEM; Hildebrandt, Luedtke, Robitzsch, Sommer & Wilhelm, 2016, <doi:10.1080/00273171.2016.1142856>).

Details

The sirt package enables the estimation of following models:

Author(s)

Alexander Robitzsch [aut,cre] (<https://orcid.org/0000-0002-8226-3132>)

Maintainer: Alexander Robitzsch <[email protected]>

References

Asparouhov, T., & Muthen, B. (2014). Multiple-group factor analysis alignment. Structural Equation Modeling, 21(4), 1-14. doi:10.1080/10705511.2014.919210

Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141-157.

Braeken, J. (2011). A boundary mixture approach to violations of conditional independence. Psychometrika, 76(1), 57-76. doi:10.1007/s11336-010-9190-4

DeCarlo, T., Kim, Y., & Johnson, M. S. (2011). A hierarchical rater model for constructed responses, with a signal detection rater model. Journal of Educational Measurement, 48(3), 333-356. doi:10.1111/j.1745-3984.2011.00143.x

Dimitrov, D. (2003). Marginal true-score measures and reliability for binary items as a function of their IRT parameters. Applied Psychological Measurement, 27, 440-458.

Dimitrov, D. M. (2007). Least squares distance method of cognitive validation and analysis for binary items using their item response theory parameters. Applied Psychological Measurement, 31, 367-387.

Erosheva, E. A., Fienberg, S. E., & Joutard, C. (2007). Describing disability through individual-level mixture models for multivariate binary data. Annals of Applied Statistics, 1, 502-537.

Fox, J.-P. (2010). Bayesian item response modeling. New York: Springer. doi:10.1007/978-1-4419-0742-4

Fox, J.-P., & Verhagen, A.-J. (2010). Random item effects modeling for cross-national survey data. In E. Davidov, P. Schmidt, & J. Billiet (Eds.), Cross-cultural Analysis: Methods and Applications (pp. 467-488), London: Routledge Academic.

Fraser, C., & McDonald, R. P. (1988). NOHARM: Least squares item factor analysis. Multivariate Behavioral Research, 23, 267-269.

Glas, C. A. W. (2012). Estimating and testing the extended testlet model. LSAC Research Report Series, RR 12-03.

Green, S.B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74, 155-167.

Haberman, S. J. (2009). Linking parameter estimates derived from an item response model through separate calibrations. ETS Research Report ETS RR-09-40. Princeton, ETS. doi:10.1002/j.2333-8504.2009.tb02197.x

Hildebrandt, A., Luedtke, O., Robitzsch, A., Sommer, C., & Wilhelm, O. (2016). Exploring factor model parameters across continuous variables with local structural equation models. Multivariate Behavioral Research, 51(2-3), 257-278. doi:10.1080/00273171.2016.1142856

Ip, E. H., Molenberghs, G., Chen, S. H., Goegebeur, Y., & De Boeck, P. (2013). Functionally unidimensional item response models for multivariate binary data. Multivariate Behavioral Research, 48, 534-562.

Janssen, R., Tuerlinckx, F., Meulders, M., & de Boeck, P. (2000). A hierarchical IRT model for criterion-referenced measurement. Journal of Educational and Behavioral Statistics, 25, 285-306.

Jeon, M., & Rijmen, F. (2016). A modular approach for item response theory modeling with the R package flirt. Behavior Research Methods, 48(2), 742-755. doi:10.3758/s13428-015-0606-z

Linacre, J. M. (1994). Many-Facet Rasch Measurement. Chicago: MESA Press.

Loken, E. & Rulison, K. L. (2010). Estimation of a four-parameter item response theory model. British Journal of Mathematical and Statistical Psychology, 63, 509-525.

McDonald, R. P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement, 6(4), 379-396. doi:10.1177/014662168200600402

McDonald, R. P. (1997). Normal-ogive multidimensional model. In W. van der Linden & R. K. Hambleton (1997): Handbook of modern item response theory (pp. 257-269). New York: Springer. doi:10.1007/978-1-4757-2691-6_15

Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25, 107-135.

Proctor, C. H. (1970). A probabilistic formulation and statistical analysis for Guttman scaling. Psychometrika, 35, 73-78.

Ramsay, J. O. (1989). A comparison of three simple test theory models. Psychometrika, 54, 487-499.

Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611-630.

Reckase, M. (2009). Multidimensional item response theory. New York: Springer. doi:10.1007/978-0-387-89976-3

Renard, D., Molenberghs, G., & Geys, H. (2004). A pairwise likelihood approach to estimation in multilevel probit models. Computational Statistics & Data Analysis, 44, 649-667.

Rijmen, F., & Vomlel, J. (2008). Assessing the performance of variational methods for mixed logistic regression models. Journal of Statistical Computation and Simulation, 78, 765-779.

Robitzsch, A., & Steinfeld, J. (2018). Item response models for human ratings: Overview, estimation methods, and implementation in R. Psychological Test and Assessment Modeling, 60(1), 101-139.

Rossi, N., Wang, X. & Ramsay, J. O. (2002). Nonparametric item response function estimates with the EM algorithm. Journal of Educational and Behavioral Statistics, 27, 291-317.

Rusch, T., Mair, P., & Hatzinger, R. (2013). Psychometrics with R: A Review of CRAN Packages for Item Response Theory. http://epub.wu.ac.at/4010/1/resrepIRThandbook.pdf

Scheiblechner, H. (1995). Isotonic ordinal probabilistic models (ISOP). Psychometrika, 60(2), 281-304. doi:10.1007/BF02301417

Scheiblechner, H. (1999). Additive conjoint isotonic probabilistic models (ADISOP). Psychometrika, 64, 295-316.

Schroeders, U., Robitzsch, A., & Schipolowski, S. (2014). A comparison of different psychometric approaches to modeling testlet structures: An example with C-tests. Journal of Educational Measurement, 51(4), 400-418. doi:10.1111/jedm.12054

Stout, W., Habing, B., Douglas, J., & Kim, H. R. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20(4), 331-354. doi:10.1177/014662169602000403

Stukel, T. A. (1988). Generalized logistic models. Journal of the American Statistical Association, 83(402), 426-431. doi:10.1080/01621459.1988.10478613

Uenlue, A., & Yanagida, T. (2011). R you ready for R?: The CRAN psychometrics task view. British Journal of Mathematical and Statistical Psychology, 64(1), 182-186. doi:10.1348/000711010X519320

van den Noortgate, W., De Boeck, P., & Meulders, M. (2003). Cross-classification multilevel logistic models in psychometrics. Journal of Educational and Behavioral Statistics, 28, 369-386.

Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427-450.

Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. Cambridge: Cambridge University Press.

Zwinderman, A. H. (1995). Pairwise parameter estimation in Rasch models. Applied Psychological Measurement, 19, 369-375.

See Also

For estimating multidimensional models for polytomous item responses see the mirt, flirt (Jeon & Rijmen, 2016) and TAM packages.

For conditional maximum likelihood estimation see the eRm package.

For pairwise estimation likelihood methods (also known as composite likelihood methods) see pln or lavaan.

The estimation of cognitive diagnostic models is possible using the CDM package.

For the multidimensional latent class IRT model see the MultiLCIRT package which also allows the estimation IRT models with polytomous item responses.

Latent class analysis can be carried out with covLCA, poLCA, BayesLCA, randomLCA or lcmm packages.

Markov Chain Monte Carlo estimation for item response models can also be found in the MCMCpack package (see the MCMCirt functions therein).

See Rusch, Mair and Hatzinger (2013) and Uenlue and Yanagida (2011) for reviews of psychometrics packages in R.

Examples

##
##   |-----------------------------------------------------------------|
##   | sirt 0.40-4 (2013-11-26)                                        |
##   | Supplementary Item Response Theory                              |
##   | Maintainer: Alexander Robitzsch <a.robitzsch at bifie.at >      |
##   | https://sites.google.com/site/alexanderrobitzsch/software       |
##   |-----------------------------------------------------------------|
##
##                       _/              _/
##              _/_/_/      _/  _/_/  _/_/_/_/
##           _/_/      _/  _/_/        _/
##              _/_/  _/  _/          _/
##         _/_/_/    _/  _/            _/_/
##

Automatic Method of Finding Keys in a Dataset with Raw Item Responses

Description

This function calculates keys of a dataset with raw item responses. It starts with setting the most frequent category of an item to 1. Then, in each iteration keys are changed such that the highest item discrimination is found.

Usage

automatic.recode(data, exclude=NULL, pstart.min=0.6, allocate=200,
    maxiter=20, progress=TRUE)

Arguments

data

Dataset with raw item responses

exclude

Vector with categories to be excluded for searching the key

pstart.min

Minimum probability for an initial solution of keys.

allocate

Maximum number of categories per item. This argument is used in the function tam.ctt3 of the TAM package.

maxiter

Maximum number of iterations

progress

A logical which indicates if iteration progress should be displayed

Value

A list with following entries

item.stat

Data frame with item name, p value, item discrimination and the calculated key

data.scored

Scored data frame using calculated keys in item.stat

categ.stats

Data frame with statistics for all categories of all items

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: data.raw1
#############################################################################
data(data.raw1)

# recode data.raw1 and exclude keys 8 and 9 (missing codes) and
# start with initially setting all categories larger than 50 
res1 <- sirt::automatic.recode( data.raw1, exclude=c(8,9), pstart.min=.50 )
# inspect calculated keys
res1$item.stat

#############################################################################
# EXAMPLE 2: data.timssAusTwn from TAM package
#############################################################################

miceadds::library_install("TAM")
data(data.timssAusTwn,package="TAM")
raw.resp <- data.timssAusTwn[,1:11]
res2 <- sirt::automatic.recode( data=raw.resp )

## End(Not run)

Functions for the Beta Item Response Model

Description

Functions for simulating and estimating the Beta item response model (Noel & Dauvier, 2007). brm.sim can be used for simulating the model, brm.irf computes the item response function. The Beta item response model is estimated as a discrete version to enable estimation in standard IRT software like mirt or TAM packages.

Usage

# simulating the beta item response model
brm.sim(theta, delta, tau, K=NULL)

# computing the item response function of the beta item response model
brm.irf( Theta, delta, tau, ncat, thdim=1, eps=1E-10 )

Arguments

theta

Ability vector of θ\theta values

delta

Vector of item difficulty parameters

tau

Vector item dispersion parameters

K

Number of discretized categories. The default is NULL which means that the simulated item responses are real number values between 0 and 1. If an integer K chosen, then values are discretized such that values of 0, 1, ..., KK-1 arise.

Theta

Matrix of the ability vector θ\bold{\theta}

ncat

Number of categories

thdim

Theta dimension in the matrix Theta on which the item loads.

eps

Nuisance parameter which stabilize probabilities.

Details

The discrete version of the beta item response model is defined as follows. Assume that for item ii there are KK categories resulting in values k=0,1,,K1k=0,1,\dots,K-1. Each value kk is associated with a corresponding the transformed value in [0,1][0,1], namely q(k)=1/(2K),1/(2K)+1/K,,11/(2K)q (k)=1/(2 \cdot K), 1/(2 \cdot K) + 1/K, \ldots, 1 - 1/(2 \cdot K). The item response model is defined as

P(Xpi=xpiθp)q(xpi)mpi1[1q(xpi)]npi1P( X_{pi}=x_{pi} | \theta_p) \propto q( x_{pi} )^{ m_{pi} - 1 } [ 1- q( x_{pi} ) ]^{ n_{pi} - 1 }

This density is a discrete version of a Beta distribution with shape parameters mpim_{pi} and npin_{pi}. These parameters are defined as

mpi=exp[(θpδi+τi)/2]andnpi=exp[(θp+δi+τi)/2]m_{pi}=\mathrm{exp} \left[ ( \theta_p - \delta_i + \tau_i ) / 2 \right] \qquad \mbox{and} \qquad n_{pi}=\mathrm{exp} \left[ ( - \theta_p + \delta_i + \tau_i ) / 2 \right]

The item response function can also be formulated as

log[P(Xpi=xpiθp)](mpi1)log[q(xpi)]+(npi1)log[1q(xpi)]\mathrm{log} \left[ P( X_{pi}=x_{pi} | \theta_p) \right] \propto ( m_{pi} - 1 ) \cdot \mathrm{log} [ q( x_{pi} ) ] + ( n_{pi} - 1 ) \cdot \mathrm{log} [ 1- q( x_{pi} ) ]

The item parameters can be reparameterized as ai=exp[(δi+τi)/2]a_{i}=\mathrm{exp} \left[ ( - \delta_i + \tau_i ) / 2 \right] and bi=exp[(δi+τi)/2]b_{i}=\mathrm{exp} \left[ ( \delta_i + \tau_i ) / 2 \right].

Then, the original item parameters can be retrieved by τi=log(aibi)\tau_i=\mathrm{log} ( a_i b_i) and δi=log(bi/ai)\delta_i=\mathrm{log} ( b_i / a_i). Using γp=exp(θp/2)\gamma _p=\mathrm{exp} ( \theta_p / 2), we obtain

log[P(Xpi=xpiθp)]aiγplog[q(xpi)]+bi/γplog[1q(xpi)][logq(xpi)+log[1q(xpi)]]\mathrm{log} \left[ P( X_{pi}=x_{pi} | \theta_p) \right] \propto a_{i} \gamma_p \cdot \mathrm{log} [ q( x_{pi} ) ] + b_i / \gamma_p \cdot \mathrm{log} [ 1- q( x_{pi} ) ] - \left[ \mathrm{log} q( x_{pi} ) + \mathrm{log} [ 1- q( x_{pi} ) ] \right]

This formulation enables the specification of the Beta item response model as a structured latent class model (see TAM::tam.mml.3pl; Example 1).

See Smithson and Verkuilen (2006) for motivations for treating continuous indicators not as normally distributed variables.

Value

A simulated dataset of item responses if brm.sim is applied.

A matrix of item response probabilities if brm.irf is applied.

References

Gruen, B., Kosmidis, I., & Zeileis, A. (2012). Extended Beta regression in R: Shaken, stirred, mixed, and partitioned. Journal of Statistical Software, 48(11), 1-25. doi:10.18637/jss.v048.i11

Noel, Y., & Dauvier, B. (2007). A beta item response model for continuous bounded responses. Applied Psychological Measurement, 31(1), 47-73. doi:10.1177/0146621605287691

Smithson, M., & Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods, 11(1), 54-71. doi: 10.1037/1082-989X.11.1.54

See Also

See also the betareg package for fitting Beta regression regression models in R (Gruen, Kosmidis & Zeileis, 2012).

Examples

#############################################################################
# EXAMPLE 1: Simulated data beta response model
#############################################################################

#*** (1) Simulation of the beta response model
# Table 3 (p. 65) of Noel and Dauvier (2007)
delta <- c( -.942, -.649, -.603, -.398, -.379, .523, .649, .781, .907 )
tau <- c( .382, .166, 1.799, .615, 2.092, 1.988, 1.899, 1.439, 1.057 )
K <- 5        # number of categories for discretization
N <- 500        # number of persons
I <- length(delta) # number of items

set.seed(865)
theta <- stats::rnorm( N )
dat <- sirt::brm.sim( theta=theta, delta=delta, tau=tau, K=K)
psych::describe(dat)

#*** (2) some preliminaries for estimation of the model in mirt
#*** define a mirt function
library(mirt)
Theta <- matrix( seq( -4, 4, len=21), ncol=1 )

# compute item response function
ii <- 1     # item ii=1
b1 <- sirt::brm.irf( Theta=Theta, delta=delta[ii], tau=tau[ii],  ncat=K )
# plot item response functions
graphics::matplot( Theta[,1], b1, type="l" )

#*** defining the beta item response function for estimation in mirt
par <- c( 0, 1,  1)
names(par) <- c( "delta", "tau","thdim")
est <- c( TRUE, TRUE, FALSE )
names(est) <- names(par)
brm.icc <- function( par, Theta, ncat ){
     delta <- par[1]
     tau <- par[2]
     thdim <- par[3]
     probs <- sirt::brm.irf( Theta=Theta, delta=delta, tau=tau,  ncat=ncat,
            thdim=thdim)
     return(probs)
            }
name <- "brm"
# create item response function
brm.itemfct <- mirt::createItem(name, par=par, est=est, P=brm.icc)
#*** define model in mirt
mirtmodel <- mirt::mirt.model("
           F1=1-9
            " )
itemtype <- rep("brm", I )
customItems <- list("brm"=brm.itemfct)

# define parameters to be estimated
mod1.pars <- mirt::mirt(dat, mirtmodel, itemtype=itemtype,
                   customItems=customItems, pars="values")

## Not run: 
#*** (3) estimate beta item response model in mirt
mod1 <- mirt::mirt(dat,mirtmodel, itemtype=itemtype, customItems=customItems,
               pars=mod1.pars, verbose=TRUE  )
# model summaries
print(mod1)
summary(mod1)
coef(mod1)
# estimated coefficients and comparison with simulated data
cbind( sirt::mirt.wrapper.coef( mod1 )$coef, delta, tau )
mirt.wrapper.itemplot(mod1,ask=TRUE)

#---------------------------
# estimate beta item response model in TAM
library(TAM)

# define the skill space: standard normal distribution
TP <- 21                   # number of theta points
theta.k <- diag(TP)
theta.vec <-  seq( -6,6, len=TP)
d1 <- stats::dnorm(theta.vec)
d1 <- d1 / sum(d1)
delta.designmatrix <- matrix( log(d1), ncol=1 )
delta.fixed <- cbind( 1, 1, 1 )

# define design matrix E
E <- array(0, dim=c(I,K,TP,2*I + 1) )
dimnames(E)[[1]] <- items <- colnames(dat)
dimnames(E)[[4]] <- c( paste0( rep( items, each=2 ),
        rep( c("_a","_b" ), I) ), "one" )
for (ii in 1:I){
    for (kk in 1:K){
      for (tt in 1:TP){
        qk <- (2*(kk-1)+1)/(2*K)
        gammap <- exp( theta.vec[tt] / 2 )
        E[ii, kk, tt, 2*(ii-1) + 1 ] <- gammap * log( qk )
        E[ii, kk, tt, 2*(ii-1) + 2 ] <- 1 / gammap * log( 1 - qk )
        E[ii, kk, tt, 2*I+1 ] <- - log(qk) - log( 1 - qk )
                    }
            }
        }
gammaslope.fixed <- cbind( 2*I+1, 1 )
gammaslope <- exp( rep(0,2*I+1) )

# estimate model in TAM
mod2 <- TAM::tam.mml.3pl(resp=dat, E=E,control=list(maxiter=100),
              skillspace="discrete", delta.designmatrix=delta.designmatrix,
              delta.fixed=delta.fixed, theta.k=theta.k, gammaslope=gammaslope,
              gammaslope.fixed=gammaslope.fixed, notA=TRUE )
summary(mod2)

# extract original tau and delta parameters
m1 <- matrix( mod2$gammaslope[1:(2*I) ], ncol=2, byrow=TRUE )
m1 <- as.data.frame(m1)
colnames(m1) <- c("a","b")
m1$delta.TAM <- log( m1$b / m1$a)
m1$tau.TAM <- log( m1$a * m1$b )

# compare estimated parameter
m2 <- cbind( sirt::mirt.wrapper.coef( mod1 )$coef, delta, tau )[,-1]
colnames(m2) <- c(  "delta.mirt", "tau.mirt", "thdim","delta.true","tau.true"   )
m2 <- cbind(m1,m2)
round( m2, 3 )

## End(Not run)

Extended Bradley-Terry Model

Description

The function btm estimates an extended Bradley-Terry model (Hunter, 2004; see Details). Parameter estimation uses a bias corrected joint maximum likelihood estimation method based on ε\varepsilon-adjustment (see Bertoli-Barsotti, Lando & Punzo, 2014). See Details for the algorithm.

The function btm_sim simulated data from the extended Bradley-Terry model.

Usage

btm(data, judge=NULL, ignore.ties=FALSE, fix.eta=NULL, fix.delta=NULL, fix.theta=NULL,
       maxiter=100, conv=1e-04, eps=0.3, wgt.ties=.5)

## S3 method for class 'btm'
summary(object, file=NULL, digits=4,...)

## S3 method for class 'btm'
predict(object, data=NULL, ...)

btm_sim(theta, eta=0, delta=-99, repeated=FALSE)

Arguments

data

Data frame with three columns. The first two columns contain labels from the units in the pair comparison. The third column contains the result of the comparison. "1" means that the first units wins, "0" means that the second unit wins and "0.5" means a draw (a tie).

judge

Optional vector of judge identifiers (if multiple judges are available)

ignore.ties

Logical indicating whether ties should be ignored.

fix.eta

Numeric value for a fixed η\eta value

fix.delta

Numeric value for a fixed δ\delta value

fix.theta

A vector with entries for fixed theta values.

maxiter

Maximum number of iterations

conv

Convergence criterion

eps

The ε\varepsilon parameter for the ε\varepsilon-adjustment method (see Bertoli-Barsotti, Lando & Punzo, 2014) which reduces bias in ability estimates. In case of ε=0\varepsilon=0, persons with extreme scores are removed from the pairwise comparison.

wgt.ties

Weighting parameter for ties, see formula in Details. The default is .5

object

Object of class btm

file

Optional file name for sinking the summary into

digits

Number of digits after decimal to print

...

Further arguments to be passed.

theta

Vector of abilities

eta

Value of η\eta parameter

delta

Value of δ\delta parameter

repeated

Logical indicating whether repeated ratings of dyads (for home advantage effect) should be simulated

Details

The extended Bradley-Terry model for the comparison of individuals ii and jj is defined as

P(Xij=1)exp(η+θi)P(X_{ij}=1 ) \propto \exp( \eta + \theta_i )

P(Xij=0)exp(θj)P(X_{ij}=0 ) \propto \exp( \theta_j )

P(Xij=0.5)exp(δ+wT(η+θi+θj))P(X_{ij}=0.5) \propto \exp( \delta + w_T ( \eta + \theta_i +\theta_j ) )

The parameters θi\theta_i denote the abilities, δ\delta is the tendency of the occurrence of ties and η\eta is the home-advantage effect. The weighting parameter wTw_T governs the importance of ties and can be chosen in the argument wgt.ties.

A joint maximum likelihood (JML) estimation is applied for simulataneous estimation of η\eta, δ\delta and all θi\theta_i parameters. In the Rasch model, it was shown that JML can result in biased parameter estimates. The ε\varepsilon-adjustment approach has been proposed to reduce the bias in parameter estimates (Bertoli-Bersotti, Lando & Punzo, 2014). This estimation approach is adapted to the Bradley-Terry model in the btm function. To this end, the likelihood function is modified for the purpose of bias reduction. It can be easily shown that there exist sufficient statistics for η\eta, δ\delta and all θi\theta_i parameters. In the ε\varepsilon-adjustment approach, the sufficient statistic for the θi\theta_i parameter is modified. In JML estimation of the Bradley-Terry model, Si=ji(xij+xji)S_i=\sum_{j \ne i} ( x_{ij} + x_{ji} ) is a sufficient statistic for θi\theta_i. Let MiM_i the maximum score for person ii which is the number of xijx_{ij} terms appearing in SiS_i. In the ε\varepsilon-adjustment approach, the sufficient statistic SiS_i is modified to

Si,ε=ε+Mi2εMiSiS_{i, \varepsilon}=\varepsilon + \frac{M_i - 2 \varepsilon}{M_i} S_i

and Si,εS_{i, \varepsilon} instead of SiS_{i} is used in JML estimation. Hence, original scores SiS_i are linearly transformed for all persons ii.

Value

List with following entries

pars

Parameter summary for η\eta and δ\delta

effects

Parameter estimates for θ\theta and outfit and infit statistics

summary.effects

Summary of θ\theta parameter estimates

mle.rel

MLE reliability, also known as separation reliability

sepG

Separation index GG

probs

Estimated probabilities

data

Used dataset with integer identifiers

fit_judges

Fit statistics (outfit and infit) for judges if judge is provided. In addition, average agreement of the rating with the mode of the ratings is calculated for each judge (at least three ratings per dyad has to be available for computing the agreement).

residuals

Unstandardized and standardized residuals for each observation

References

Bertoli-Barsotti, L., Lando, T., & Punzo, A. (2014). Estimating a Rasch Model via fuzzy empirical probability functions. In D. Vicari, A. Okada, G. Ragozini & C. Weihs (Eds.). Analysis and Modeling of Complex Data in Behavioral and Social Sciences. Springer. doi:10.1007/978-3-319-06692-9_4

Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models. Annals of Statistics, 32, 384-406. doi: 10.1214/aos/1079120141

See Also

See also the R packages BradleyTerry2, psychotools, psychomix and prefmod.

Examples

#############################################################################
# EXAMPLE 1: Bradley-Terry model | data.pw01
#############################################################################

data(data.pw01)

dat <- data.pw01
dat <- dat[, c("home_team", "away_team", "result") ]

# recode results according to needed input
dat$result[ dat$result==0 ] <- 1/2   # code for ties
dat$result[ dat$result==2 ] <- 0     # code for victory of away team

#********************
# Model 1: Estimation with ties and home advantage
mod1 <- sirt::btm( dat)
summary(mod1)

## Not run: 
#*** Model 2: Estimation with ties, no epsilon adjustment
mod2 <- sirt::btm( dat, eps=0)
summary(mod2)

#*** Model 3: Estimation with ties, no epsilon adjustment, weight for ties of .333 which
#    corresponds to the rule of 3 points for a victory and 1 point of a draw in football
mod3 <- sirt::btm( dat, eps=0, wgt.ties=1/3)
summary(mod3)

#*** Model 4: Some fixed abilities
fix.theta <- c("Anhalt Dessau"=-1 )
mod4 <- sirt::btm( dat, eps=0, fix.theta=fix.theta)
summary(mod4)

#*** Model 5: Ignoring ties, no home advantage effect
mod5 <- sirt::btm( dat, ignore.ties=TRUE, fix.eta=0)
summary(mod5)

#*** Model 6: Ignoring ties, no home advantage effect (JML approach and eps=0)
mod6 <- sirt::btm( dat, ignore.ties=TRUE, fix.eta=0, eps=0)
summary(mod5)

#############################################################################
# EXAMPLE 2: Venice chess data
#############################################################################

# See http://www.rasch.org/rmt/rmt113o.htm
# Linacre, J. M. (1997). Paired Comparisons with Standard Rasch Software.
# Rasch Measurement Transactions, 11:3, 584-585.

# dataset with chess games -> "D" denotes a draw (tie)
chessdata <- scan( what="character")
    1D.0..1...1....1.....1......D.......D........1.........1.......... Browne
    0.1.D..0...1....1.....1......D.......1........D.........1......... Mariotti
    .D0..0..1...D....D.....1......1.......1........1.........D........ Tatai
    ...1D1...D...D....1.....D......D.......D........1.........0....... Hort
    ......010D....D....D.....1......D.......1........1.........D...... Kavalek
    ..........00DDD.....D.....D......D.......1........D.........1..... Damjanovic
    ...............00D0DD......D......1.......1........1.........0.... Gligoric
    .....................000D0DD.......D.......1........D.........1... Radulov
    ............................DD0DDD0D........0........0.........1.. Bobotsov
    ....................................D00D00001.........1.........1. Cosulich
    .............................................0D000D0D10..........1 Westerinen
    .......................................................00D1D010000 Zichichi

L <- length(chessdata) / 2
games <- matrix( chessdata, nrow=L, ncol=2, byrow=TRUE )
G <- nchar(games[1,1])
# create matrix with results
results <- matrix( NA, nrow=G, ncol=3 )
for (gg in 1:G){
    games.gg <- substring( games[,1], gg, gg )
    ind.gg <- which( games.gg !="." )
    results[gg, 1:2 ] <- games[ ind.gg, 2]
    results[gg, 3 ] <- games.gg[ ind.gg[1] ]
}
results <- as.data.frame(results)
results[,3] <- paste(results[,3] )
results[ results[,3]=="D", 3] <- 1/2
results[,3] <- as.numeric( results[,3] )

# fit model ignoring draws
mod1 <- sirt::btm( results, ignore.ties=TRUE, fix.eta=0, eps=0 )
summary(mod1)

# fit model with draws
mod2 <- sirt::btm( results, fix.eta=0, eps=0 )
summary(mod2)

#############################################################################
# EXAMPLE 3: Simulated data from the Bradley-Terry model
#############################################################################

set.seed(9098)
N <- 22
theta <- seq(2,-2, len=N)

#** simulate and estimate data without repeated dyads
dat1 <- sirt::btm_sim(theta=theta)
mod1 <- sirt::btm( dat1, ignore.ties=TRUE, fix.delta=-99, fix.eta=0)
summary(mod1)

#*** simulate data with home advantage effect and ties
dat2 <- sirt::btm_sim(theta=theta, eta=.8, delta=-.6, repeated=TRUE)
mod2 <- sirt::btm(dat2)
summary(mod2)

#############################################################################
# EXAMPLE 4: Estimating the Bradley-Terry model with multiple judges
#############################################################################

#*** simulating data with multiple judges
set.seed(987)
N <- 26  # number of objects to be rated
theta <- seq(2,-2, len=N)
s1 <- stats::sd(theta)
dat <- NULL
# judge discriminations which define tendency to provide reliable ratings
discrim <- c( rep(.9,10), rep(.5,2), rep(0,2) )
#=> last four raters provide less reliable ratings

RR <- length(discrim)
for (rr in 1:RR){
    theta1 <- discrim[rr]*theta + stats::rnorm(N, mean=0, sd=s1*sqrt(1-discrim[rr]))
    dat1 <- sirt::btm_sim(theta1)
    dat1$judge <- rr
    dat <- rbind(dat, dat1)
}

#** estimate the Bradley-Terry model and compute judge-specific fit statistics
mod <- sirt::btm( dat[,1:3], judge=paste0("J",100+dat[,4]), fix.eta=0, ignore.ties=TRUE)
summary(mod)

## End(Not run)

Categorize and Decategorize Variables in a Data Frame

Description

The function categorize defines categories for variables in a data frame, starting with a user-defined index (e.g. 0 or 1). Continuous variables can be categorized by defining categories by discretizing the variables in different quantile groups.

The function decategorize does the reverse operation.

Usage

categorize(dat, categorical=NULL, quant=NULL, lowest=0)

decategorize(dat, categ_design=NULL)

Arguments

dat

Data frame

categorical

Vector with variable names which should be converted into categories, beginning with integer lowest

quant

Vector with number of classes for each variables. Variables are categorized among quantiles. The vector must have names containing variable names.

lowest

Lowest category index. Default is 0.

categ_design

Data frame containing informations about categorization which is the output of categorize.

Value

For categorize, it is a list with entries

data

Converted data frame

categ_design

Data frame containing some informations about categorization

For decategorize it is a data frame.

Examples

## Not run: 
library(mice)
library(miceadds)

#############################################################################
# EXAMPLE 1: Categorize questionnaire data
#############################################################################

data(data.smallscale, package="miceadds")
dat <- data.smallscale

# (0) select dataset
dat <- dat[, 9:20 ]
summary(dat)
categorical <- colnames(dat)[2:6]

# (1) categorize data
res <- sirt::categorize( dat, categorical=categorical )

# (2) multiple imputation using the mice package
dat2 <- res$data
VV <- ncol(dat2)
impMethod <- rep( "sample", VV )    # define random sampling imputation method
names(impMethod) <- colnames(dat2)
imp <- mice::mice( as.matrix(dat2), impMethod=impMethod, maxit=1, m=1 )
dat3 <- mice::complete(imp,action=1)

# (3) decategorize dataset
dat3a <- sirt::decategorize( dat3, categ_design=res$categ_design )

#############################################################################
# EXAMPLE 2: Categorize ordinal and continuous data
#############################################################################

data(data.ma01,package="miceadds")
dat <- data.ma01
summary(dat[,-c(1:2)] )

# define variables to be categorized
categorical <- c("books", "paredu" )
# define quantiles
quant <-  c(6,5,11)
names(quant) <- c("math", "read", "hisei")

# categorize data
res <- sirt::categorize( dat, categorical=categorical, quant=quant)
str(res)

## End(Not run)

Nonparametric Estimation of Conditional Covariances of Item Pairs

Description

This function estimates conditional covariances of itempairs (Stout, Habing, Douglas & Kim, 1996; Zhang & Stout, 1999a). The function is used for the estimation of the DETECT index. The ccov.np function has the (default) option to smooth item response functions (argument smooth) in the computation of conditional covariances (Douglas, Kim, Habing, & Gao, 1998).

Usage

ccov.np(data, score, bwscale=1.1, thetagrid=seq(-3, 3, len=200),
    progress=TRUE, scale_score=TRUE, adjust_thetagrid=TRUE, smooth=TRUE,
    use_sum_score=FALSE, bias_corr=TRUE)

Arguments

data

An N×IN \times I data frame of dichotomous responses. Missing responses are allowed.

score

An ability estimate, e.g. the WLE

bwscale

Bandwidth factor for calculation of conditional covariance. The bandwidth used in the estimation is bwscale times N1/5N^{-1/5}.

thetagrid

A vector which contains theta values where conditional covariances are evaluated.

progress

Display progress?

scale_score

Logical indicating whether score should be z standardized in advance of the calculation of conditional covariances

adjust_thetagrid

Logical indicating whether thetagrid should be adjusted if observed values in score are outside of thetagrid.

smooth

Logical indicating whether smoothing should be applied for conditional covariance estimation

use_sum_score

Logical indicating whether sum score should be used. With this option, the bias corrected conditional covariance of Zhang and Stout (1999) is used.

bias_corr

Logical indicating whether bias correction (Zhang & Stout, 1999) should be utilized if use_sum_score=TRUE.

Note

This function is used in conf.detect and expl.detect.

References

Douglas, J., Kim, H. R., Habing, B., & Gao, F. (1998). Investigating local dependence with conditional covariance functions. Journal of Educational and Behavioral Statistics, 23(2), 129-151. doi:10.3102/10769986023002129

Stout, W., Habing, B., Douglas, J., & Kim, H. R. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20(4), 331-354. doi:10.1177/014662169602000403

Zhang, J., & Stout, W. (1999). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64(2), 129-152. doi:10.1007/BF02294532

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: data.read | different settings for computing conditional covariance
#############################################################################

data(data.read, package="sirt")
dat <- data.read

#* fit Rasch model
mod <- sirt::rasch.mml2(dat)
score <- sirt::wle.rasch(dat=dat, b=mod$item$b)$theta

#* ccov with smoothing
cmod1 <- sirt::ccov.np(data=dat, score=score, bwscale=1.1)
#* ccov without smoothing
cmod2 <- sirt::ccov.np(data=dat, score=score, smooth=FALSE)

#- compare results
100*cbind( cmod1$ccov.table[1:6, "ccov"], cmod2$ccov.table[1:6, "ccov"])

## End(Not run)

Estimation of a Unidimensional Factor Model under Full and Partial Measurement Invariance

Description

Estimates a unidimensional factor model based on the normal distribution fitting function under full and partial measurement invariance. Item loadings and item intercepts are successively freed based on the largest modification index and a chosen significance level alpha.

Usage

cfa_meas_inv(dat, group, weights=NULL, alpha=0.01, verbose=FALSE, op=c("~1","=~"))

Arguments

dat

Data frame containing items

group

Vector of group identifiers

weights

Optional vector of sampling weights

alpha

Significance level

verbose

Logical indicating whether progress should be shown

op

Operators (intercepts or loadings) for which estimation should be freed

Value

List with several entries

pars_mi

Model parameters under full invariance

pars_pi

Model parameters under partial invariance

mod_mi

Fitted model under full invariance

mod_pi

Fitted model under partial invariance

...

More output

See Also

See also sirt::invariance.alignment

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: Factor model under full and partial invariance
#############################################################################

#--- data simulation

set.seed(65)
G <- 3  # number of groups
I <- 5  # number of items
# define lambda and nu parameters
lambda <- matrix(1, nrow=G, ncol=I)
nu <- matrix(0, nrow=G, ncol=I)
err_var <- matrix(1, nrow=G, ncol=I)

# define size of noninvariance
dif <- 1
#- 1st group: N(0,1)
lambda[1,3] <- 1+dif*.4; nu[1,5] <- dif*.5
#- 2nd group: N(0.3,1.5)
gg <- 2 ;
lambda[gg,5] <- 1-.5*dif; nu[gg,1] <- -.5*dif
#- 3nd group: N(.8,1.2)
gg <- 3
lambda[gg,4] <- 1-.7*dif; nu[gg,2] <- -.5*dif
#- define distributions of groups
mu <- c(0,.3,.8)
sigma <- sqrt(c(1,1.5,1.2))
N <- rep(1000,3) # sample sizes per group

#* use simulation function
dat <- sirt::invariance_alignment_simulate(nu, lambda, err_var, mu, sigma, N,
                exact=TRUE)

#--- estimate CFA
mod <- sirt::cfa_meas_inv(dat=dat[,-1], group=dat$group, verbose=TRUE, alpha=0.05)
mod$pars_mi
mod$pars_pi

## End(Not run)

Classification Accuracy in the Rasch Model

Description

This function computes the classification accuracy in the Rasch model for the maximum likelihood (person parameter) estimate according to the method of Rudner (2001).

Usage

class.accuracy.rasch(cutscores, b, meantheta, sdtheta, theta.l, n.sims=0)

Arguments

cutscores

Vector of cut scores

b

Vector of item difficulties

meantheta

Mean of the trait distribution

sdtheta

Standard deviation of the trait distribution

theta.l

Discretized theta distribution

n.sims

Number of simulated persons in a data set. The default is 0 which means that no simulation is performed.

Value

A list with following entries:

class.stats

Data frame containing classification accuracy statistics. The column agree0 refers to absolute agreement, agree1 to the agreement of at most a difference of one level.

class.prob

Probability table of classification

References

Rudner, L.M. (2001). Computing the expected proportions of misclassified examinees. Practical Assessment, Research & Evaluation, 7(14).

See Also

Classification accuracy of other IRT models can be obtained with the R package cacIRT.

Examples

#############################################################################
# EXAMPLE 1: Reading dataset
#############################################################################
data( data.read, package="sirt")
dat <- data.read

# estimate the Rasch model
mod <- sirt::rasch.mml2( dat )

# estimate classification accuracy (3 levels)
cutscores <- c( -1, .3 )    # cut scores at theta=-1 and theta=.3
sirt::class.accuracy.rasch( cutscores=cutscores, b=mod$item$b,
           meantheta=0,  sdtheta=mod$sd.trait,
           theta.l=seq(-4,4,len=200), n.sims=3000)
  ##   Cut Scores
  ##   [1] -1.0  0.3
  ##
  ##   WLE reliability (by simulation)=0.671
  ##   WLE consistency (correlation between two parallel forms)=0.649
  ##
  ##   Classification accuracy and consistency
  ##              agree0 agree1 kappa consistency
  ##   analytical   0.68  0.990 0.492          NA
  ##   simulated    0.70  0.997 0.489       0.599
  ##
  ##   Probability classification table
  ##               Est_Class1 Est_Class2 Est_Class3
  ##   True_Class1      0.136      0.041      0.001
  ##   True_Class2      0.081      0.249      0.093
  ##   True_Class3      0.009      0.095      0.294

Confirmatory DETECT and polyDETECT Analysis

Description

This function computes the DETECT statistics for dichotomous item responses and the polyDETECT statistic for polytomous item responses under a confirmatory specification of item clusters (Stout, Habing, Douglas & Kim, 1996; Zhang & Stout, 1999a, 1999b; Zhang, 2007; Bonifay, Reise, Scheines, & Meijer, 2015).

Item responses in a multi-matrix design are allowed (Zhang, 2013).

An exploratory DETECT analysis can be conducted using the expl.detect function.

Usage

conf.detect(data, score, itemcluster, bwscale=1.1, progress=TRUE,
        thetagrid=seq(-3, 3, len=200), smooth=TRUE, use_sum_score=FALSE, bias_corr=TRUE)

## S3 method for class 'conf.detect'
summary(object, digits=3, file=NULL, ...)

Arguments

data

An N×IN \times I data frame of dichotomous or polytomous responses. Missing responses are allowed.

score

An ability estimate, e.g. the WLE, sum score or mean score

itemcluster

Item cluster for each item. The order of entries must correspond to the columns in data.

bwscale

Bandwidth factor for calculation of conditional covariance (see ccov.np)

progress

Display progress?

smooth

Logical indicating whether smoothing should be applied for conditional covariance estimation

thetagrid

A vector which contains theta values where conditional covariances are evaluated.

use_sum_score

Logical indicating whether sum score should be used. With this option, the bias corrected conditional covariance of Zhang and Stout (1999) is used.

bias_corr

Logical indicating whether bias correction (Zhang & Stout, 1999) should be utilized if use_sum_score=TRUE.

object

Object of class conf.detect

digits

Number of digits for rounding in summary

file

Optional file name to be sunk for summary

...

Further arguments to be passed

Details

The result of DETECT are the indices DETECT, ASSI and RATIO (see Zhang 2007 for details) calculated for the options unweighted and weighted. The option unweighted means that all conditional covariances of item pairs are equally weighted, weighted means that these covariances are weighted by the sample size of item pairs. In case of multi matrix item designs, both types of indices can differ.

The classification scheme of these indices are as follows (Jang & Roussos, 2007; Zhang, 2007):

Strong multidimensionality DETECT > 1.00
Moderate multidimensionality .40 < DETECT < 1.00
Weak multidimensionality .20 < DETECT < .40
Essential unidimensionality DETECT < .20
Maximum value under simple structure ASSI=1 RATIO=1
Essential deviation from unidimensionality ASSI > .25 RATIO > .36
Essential unidimensionality ASSI < .25 RATIO < .36

Note that the expected value of a conditional covariance for an item pair is negative when a unidimensional model holds. In consequence, the DETECT index can become negative for unidimensional data (see Example 3). This can be also seen in the statistic MCOV100 in the value detect.

Value

A list with following entries:

detect

Data frame with statistics DETECT, ASSI, RATIO, MADCOV100 and MCOV100

ccovtable

Individual contributions to conditional covariance

ccov.matrix

Evaluated conditional covariance

References

Bonifay, W. E., Reise, S. P., Scheines, R., & Meijer, R. R. (2015). When are multidimensional data unidimensional enough for structural equation modeling? An evaluation of the DETECT multidimensionality index. Structural Equation Modeling, 22(4), 504-516. doi:10.1080/10705511.2014.938596

Jang, E. E., & Roussos, L. (2007). An investigation into the dimensionality of TOEFL using conditional covariance-based nonparametric approach. Journal of Educational Measurement, 44(1), 1-21. doi:10.1111/j.1745-3984.2007.00024.x

Stout, W., Habing, B., Douglas, J., & Kim, H. R. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20(4), 331-354. doi:10.1177/014662169602000403

Zhang, J. (2007). Conditional covariance theory and DETECT for polytomous items. Psychometrika, 72(1), 69-91. doi:10.1007/s11336-004-1257-7

Zhang, J. (2013). A procedure for dimensionality analyses of response data from various test designs. Psychometrika, 78(1), 37-58. doi:10.1007/s11336-012-9287-z

Zhang, J., & Stout, W. (1999a). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64(2), 129-152. doi:10.1007/BF02294532

Zhang, J., & Stout, W. (1999b). The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika, 64(2), 213-249. doi:10.1007/BF02294536

See Also

For a download of the free DIM-Pack software (DIMTEST, DETECT) see https://psychometrics.onlinehelp.measuredprogress.org/tools/dim/.

See expl.detect for exploratory DETECT analysis.

Examples

#############################################################################
# EXAMPLE 1: TIMSS mathematics data set (dichotomous data)
#############################################################################
data(data.timss)

# extract data
dat <- data.timss$data
dat <- dat[, substring( colnames(dat),1,1)=="M" ]
# extract item informations
iteminfo <- data.timss$item
# estimate Rasch model
mod1 <- sirt::rasch.mml2( dat )
# estimate WLEs
wle1 <- sirt::wle.rasch( dat, b=mod1$item$b )$theta

# DETECT for content domains
detect1 <- sirt::conf.detect( data=dat, score=wle1,
                    itemcluster=iteminfo$Content.Domain )
  ##          unweighted weighted
  ##   DETECT      0.316    0.316
  ##   ASSI        0.273    0.273
  ##   RATIO       0.355    0.355

## Not run: 
# DETECT cognitive domains
detect2 <- sirt::conf.detect( data=dat, score=wle1,
                    itemcluster=iteminfo$Cognitive.Domain )
  ##          unweighted weighted
  ##   DETECT      0.251    0.251
  ##   ASSI        0.227    0.227
  ##   RATIO       0.282    0.282

# DETECT for item format
detect3 <- sirt::conf.detect( data=dat, score=wle1,
                    itemcluster=iteminfo$Format )
  ##          unweighted weighted
  ##   DETECT      0.056    0.056
  ##   ASSI        0.060    0.060
  ##   RATIO       0.062    0.062

# DETECT for item blocks
detect4 <- sirt::conf.detect( data=dat, score=wle1,
                    itemcluster=iteminfo$Block )
  ##          unweighted weighted
  ##   DETECT      0.301    0.301
  ##   ASSI        0.193    0.193
  ##   RATIO       0.339    0.339 
## End(Not run)

# Exploratory DETECT: Application of a cluster analysis employing the Ward method
detect5 <- sirt::expl.detect( data=dat, score=wle1,
                nclusters=10, N.est=nrow(dat)  )
# Plot cluster solution
pl <- graphics::plot( detect5$clusterfit, main="Cluster solution" )
stats::rect.hclust(detect5$clusterfit, k=4, border="red")

## Not run: 
#############################################################################
# EXAMPLE 2: Big 5 data set (polytomous data)
#############################################################################

# attach Big5 Dataset
data(data.big5)

# select 6 items of each dimension
dat <- data.big5
dat <- dat[, 1:30]

# estimate person score by simply using a transformed sum score
score <- stats::qnorm( ( rowMeans( dat )+.5 )  / ( 30 + 1 ) )

# extract item cluster (Big 5 dimensions)
itemcluster <- substring( colnames(dat), 1, 1 )

# DETECT Item cluster
detect1 <- sirt::conf.detect( data=dat, score=score, itemcluster=itemcluster )
  ##        unweighted weighted
  ## DETECT      1.256    1.256
  ## ASSI        0.384    0.384
  ## RATIO       0.597    0.597

# Exploratory DETECT
detect5 <- sirt::expl.detect( data=dat, score=score,
                     nclusters=9, N.est=nrow(dat)  )
  ## DETECT (unweighted)
  ## Optimal Cluster Size is  6  (Maximum of DETECT Index)
  ##   N.Cluster N.items N.est N.val      size.cluster DETECT.est ASSI.est RATIO.est
  ## 1         2      30   500     0              6-24      1.073    0.246     0.510
  ## 2         3      30   500     0           6-10-14      1.578    0.457     0.750
  ## 3         4      30   500     0         6-10-11-3      1.532    0.444     0.729
  ## 4         5      30   500     0        6-8-11-2-3      1.591    0.462     0.757
  ## 5         6      30   500     0       6-8-6-2-5-3      1.610    0.499     0.766
  ## 6         7      30   500     0     6-3-6-2-5-5-3      1.557    0.476     0.740
  ## 7         8      30   500     0   6-3-3-2-3-5-5-3      1.540    0.462     0.732
  ## 8         9      30   500     0 6-3-3-2-3-5-3-3-2      1.522    0.444     0.724

# Plot Cluster solution
pl <- graphics::plot( detect5$clusterfit, main="Cluster solution" )
stats::rect.hclust(detect5$clusterfit, k=6, border="red")

#############################################################################
# EXAMPLE 3: DETECT index for unidimensional data
#############################################################################

set.seed(976)
N <- 1000
I <- 20
b <- sample( seq( -2, 2, len=I) )
dat <- sirt::sim.raschtype( stats::rnorm(N), b=b )

# estimate Rasch model and corresponding WLEs
mod1 <- TAM::tam.mml( dat )
wmod1 <- TAM::tam.wle(mod1)$theta

# define item cluster
itemcluster <- c( rep(1,5), rep(2,I-5) )

# compute DETECT statistic
detect1 <- sirt::conf.detect( data=dat, score=wmod1, itemcluster=itemcluster)
  ##            unweighted weighted
  ##  DETECT        -0.184   -0.184
  ##  ASSI          -0.147   -0.147
  ##  RATIO         -0.226   -0.226
  ##  MADCOV100      0.816    0.816
  ##  MCOV100       -0.786   -0.786

## End(Not run)

Item Parameters Cultural Activities

Description

List with item parameters for cultural activities of Austrian students for 9 Austrian countries.

Usage

data(data.activity.itempars)

Format

The format is a list with number of students per group (N), item loadings (lambda) and item intercepts (nu):

List of 3
$ N : 'table' int [1:9(1d)] 2580 5279 15131 14692 5525 11005 7080 ...
..- attr(*, "dimnames")=List of 1
.. ..$ : chr [1:9] "1" "2" "3" "4" ...
$ lambda: num [1:9, 1:5] 0.423 0.485 0.455 0.437 0.502 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:9] "country1" "country2" "country3" "country4" ...
.. ..$ : chr [1:5] "act1" "act2" "act3" "act4" ...
$ nu : num [1:9, 1:5] 1.65 1.53 1.7 1.59 1.7 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:9] "country1" "country2" "country3" "country4" ...
.. ..$ : chr [1:5] "act1" "act2" "act3" "act4" ...


BEFKI Dataset (Schroeders, Schipolowski, & Wilhelm, 2015)

Description

The synthetic dataset is based on the standardization sample of the Berlin Test of Fluid and Crystallized Intelligence (BEFKI, Wilhelm, Schroeders, & Schipolowski, 2014). The underlying sample consists of N=11,756 students from all German federal states (except for the smallest one) and all school types of the general educational system attending Grades 5 to 12. A detailed description of the study, the sample, and the measure is given in Schroeders, Schipolowski, and Wilhelm (2015).

Usage

data(data.befki)
data(data.befki_resp)

Format

Details

The procedure for generating this dataset is based on a factorization of the joint distribution. All variables are simulated from unidimensional conditional parametric regression models including several interaction and quadratic terms. The multilevel structure is approximated by including cluster means as predictors in the regression models.

Source

Synthetic dataset

References

Schroeders, U., Schipolowski, S., & Wilhelm, O. (2015). Age-related changes in the mean and covariance structure of fluid and crystallized intelligence in childhood and adolescence. Intelligence, 48, 15-29. doi:10.1016/j.intell.2014.10.006

Wilhelm, O., Schroeders, U., & Schipolowski, S. (2014). Berliner Test zur Erfassung fluider und kristalliner Intelligenz fuer die 8. bis 10. Jahrgangsstufe [Berlin test of fluid and crystallized intelligence for grades 8-10]. Goettingen: Hogrefe.


Dataset Big 5 from qgraph Package

Description

This is a Big 5 dataset from the qgraph package (Dolan, Oorts, Stoel, Wicherts, 2009). It contains 500 subjects on 240 items.

Usage

data(data.big5)
data(data.big5.qgraph)

Format

Details

In these datasets, there exist 48 items for each dimension. The Big 5 dimensions are Neuroticism (N), Extraversion (E), Openness (O), Agreeableness (A) and Conscientiousness (C). Note that the data.big5 differs from data.big5.qgraph in a way that original items were recoded into three categories 0,1 and 2.

Source

See big5 in qgraph package.

References

Dolan, C. V., Oort, F. J., Stoel, R. D., & Wicherts, J. M. (2009). Testing measurement invariance in the target rotates multigroup exploratory factor model. Structural Equation Modeling, 16, 295-314.

Examples

## Not run: 
# list of needed packages for the following examples
packages <- scan(what="character")
     sirt   TAM   eRm   CDM   mirt  ltm   mokken  psychotools  psychomix
     psych

# load packages. make an installation if necessary
miceadds::library_install(packages)

#############################################################################
# EXAMPLE 1: Unidimensional models openness scale
#############################################################################

data(data.big5)
# extract first 10 openness items
items <- which( substring( colnames(data.big5), 1, 1 )=="O"  )[1:10]
dat <- data.big5[, items ]
I <- ncol(dat)
summary(dat)
  ##   > colnames(dat)
  ##    [1] "O3"  "O8"  "O13" "O18" "O23" "O28" "O33" "O38" "O43" "O48"
# descriptive statistics
psych::describe(dat)

#****************
# Model 1: Partial credit model
#****************

#-- M1a: rm.facets (in sirt)
m1a <- sirt::rm.facets( dat )
summary(m1a)

#-- M1b: tam.mml (in TAM)
m1b <- TAM::tam.mml( resp=dat )
summary(m1b)

#-- M1c: gdm (in CDM)
theta.k <- seq(-6,6,len=21)
m1c <- CDM::gdm( dat, irtmodel="1PL",theta.k=theta.k, skillspace="normal")
summary(m1c)
# compare results with loglinear skillspace
m1c2 <- CDM::gdm( dat, irtmodel="1PL",theta.k=theta.k, skillspace="loglinear")
summary(m1c2)

#-- M1d: PCM (in eRm)
m1d <- eRm::PCM( dat )
summary(m1d)

#-- M1e: gpcm (in ltm)
m1e <- ltm::gpcm( dat, constraint="1PL", control=list(verbose=TRUE))
summary(m1e)

#-- M1f: mirt (in mirt)
m1f <- mirt::mirt( dat, model=1, itemtype="1PL", verbose=TRUE)
summary(m1f)
coef(m1f)

#-- M1g: PCModel.fit (in psychotools)
mod1g <- psychotools::PCModel.fit(dat)
summary(mod1g)
plot(mod1g)

#****************
# Model 2: Generalized partial credit model
#****************

#-- M2a: rm.facets (in sirt)
m2a <- sirt::rm.facets( dat, est.a.item=TRUE)
summary(m2a)
# Note that in rm.facets the mean of item discriminations is fixed to 1

#-- M2b: tam.mml.2pl (in TAM)
m2b <- TAM::tam.mml.2pl( resp=dat, irtmodel="GPCM")
summary(m2b)

#-- M2c: gdm (in CDM)
m2c <- CDM::gdm( dat, irtmodel="2PL",theta.k=seq(-6,6,len=21),
                   skillspace="normal", standardized.latent=TRUE)
summary(m2c)

#-- M2d: gpcm (in ltm)
m2d <- ltm::gpcm( dat, control=list(verbose=TRUE))
summary(m2d)

#-- M2e: mirt (in mirt)
m2e <- mirt::mirt( dat, model=1,  itemtype="GPCM", verbose=TRUE)
summary(m2e)
coef(m2e)

#****************
# Model 3: Nonparametric item response model
#****************

#-- M3a: ISOP and ADISOP model - isop.poly (in sirt)
m3a <- sirt::isop.poly( dat )
summary(m3a)
plot(m3a)

#-- M3b: Mokken scale analysis (in mokken)
# Scalability coefficients
mokken::coefH(dat)
# Assumption of monotonicity
monotonicity.list <- mokken::check.monotonicity(dat)
summary(monotonicity.list)
plot(monotonicity.list)
# Assumption of non-intersecting ISRFs using method restscore
restscore.list <- mokken::check.restscore(dat)
summary(restscore.list)
plot(restscore.list)

#****************
# Model 4: Graded response model
#****************

#-- M4a: mirt (in mirt)
m4a <- mirt::mirt( dat, model=1,  itemtype="graded", verbose=TRUE)
print(m4a)
mirt.wrapper.coef(m4a)

#----  M4b: WLSMV estimation with cfa (in lavaan)
lavmodel <- "F=~ O3__O48
             F ~~ 1*F
                "
# transform lavaan syntax with lavaanify.IRT
lavmodel <- TAM::lavaanify.IRT( lavmodel, items=colnames(dat) )$lavaan.syntax
mod4b <- lavaan::cfa( data=as.data.frame(dat), model=lavmodel, std.lv=TRUE,
                 ordered=colnames(dat),  parameterization="theta")
summary(mod4b, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)
coef(mod4b)

#****************
# Model 5: Normally distributed residuals
#****************

#----  M5a: cfa (in lavaan)
lavmodel <- "F=~ O3__O48
             F ~~ 1*F
             F ~ 0*1
             O3__O48 ~ 1
                "
lavmodel <- TAM::lavaanify.IRT( lavmodel, items=colnames(dat) )$lavaan.syntax
mod5a <- lavaan::cfa( data=as.data.frame(dat), model=lavmodel, std.lv=TRUE,
                 estimator="MLR" )
summary(mod5a, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)

#----  M5b: mirt (in mirt)

# create user defined function
name <- 'normal'
par <- c("d"=1, "a1"=0.8, "vy"=1)
est <- c(TRUE, TRUE,FALSE)
P.normal <- function(par,Theta,ncat){
     d <- par[1]
     a1 <- par[2]
     vy <- par[3]
     psi <- vy - a1^2
     # expected values given Theta
     mui <- a1*Theta[,1] + d
     TP <- nrow(Theta)
     probs <- matrix( NA, nrow=TP, ncol=ncat )
     eps <- .01
     for (cc in 1:ncat){
        probs[,cc] <- stats::dnorm( cc, mean=mui, sd=sqrt( abs( psi + eps) ) )
                    }
     psum <- matrix( rep(rowSums( probs ),each=ncat), nrow=TP, ncol=ncat, byrow=TRUE)
     probs <- probs / psum
     return(probs)
}

# create item response function
normal <- mirt::createItem(name, par=par, est=est, P=P.normal)
customItems <- list("normal"=normal)
itemtype <- rep( "normal",I)
# define parameters to be estimated
mod5b.pars <- mirt::mirt(dat, 1, itemtype=itemtype,
                   customItems=customItems, pars="values")
ind <- which( mod5b.pars$name=="vy")
vy <- apply( dat, 2, var, na.rm=TRUE )
mod5b.pars[ ind, "value" ] <- vy
ind <- which( mod5b.pars$name=="a1")
mod5b.pars[ ind, "value" ] <- .5* sqrt(vy)
ind <- which( mod5b.pars$name=="d")
mod5b.pars[ ind, "value" ] <- colMeans( dat, na.rm=TRUE )

# estimate model
mod5b <- mirt::mirt(dat, 1, itemtype=itemtype, customItems=customItems,
                 pars=mod5b.pars, verbose=TRUE    )
sirt::mirt.wrapper.coef(mod5b)$coef

# some item plots
    par(ask=TRUE)
plot(mod5b, type='trace', layout=c(1,1))
    par(ask=FALSE)
# Alternatively:
sirt::mirt.wrapper.itemplot(mod5b)

## End(Not run)

Datasets from Borg and Staufenbiel (2007)

Description

Datasets of the book of Borg and Staufenbiel (2007) Lehrbuch Theorien and Methoden der Skalierung.

Usage

data(data.bs07a)

Format

References

Borg, I., & Staufenbiel, T. (2007). Lehrbuch Theorie und Methoden der Skalierung. Bern: Hogrefe.

Examples

## Not run: 
#############################################################################
# EXAMPLE 07a: Dataset Gefechtsangst
#############################################################################

data(data.bs07a)
dat <- data.bs07a
items <- grep( "GF", colnames(dat), value=TRUE )

#************************
# Model 1: Rasch model
mod1 <- TAM::tam.mml(dat[,items] )
summary(mod1)
IRT.WrightMap(mod1)

#************************
# Model 2: 2PL model
mod2 <- TAM::tam.mml.2pl(dat[,items] )
summary(mod2)

#************************
# Model 3: Latent class analysis (LCA) with two classes
tammodel <- "
ANALYSIS:
  TYPE=LCA;
  NCLASSES(2)
  NSTARTS(5,10)
LAVAAN MODEL:
  F=~ GF1__GF9
  "
mod3 <- TAM::tamaan( tammodel, dat )
summary(mod3)

#************************
# Model 4: LCA with three classes
tammodel <- "
ANALYSIS:
  TYPE=LCA;
  NCLASSES(3)
  NSTARTS(5,10)
LAVAAN MODEL:
  F=~ GF1__GF9
  "
mod4 <- TAM::tamaan( tammodel, dat )
summary(mod4)

#************************
# Model 5: Located latent class model (LOCLCA) with two classes
tammodel <- "
ANALYSIS:
  TYPE=LOCLCA;
  NCLASSES(2)
  NSTARTS(5,10)
LAVAAN MODEL:
  F=~ GF1__GF9
  "
mod5 <- TAM::tamaan( tammodel, dat )
summary(mod5)

#************************
# Model 6: Located latent class model with three classes
tammodel <- "
ANALYSIS:
  TYPE=LOCLCA;
  NCLASSES(3)
  NSTARTS(5,10)
LAVAAN MODEL:
  F=~ GF1__GF9
  "
mod6 <- TAM::tamaan( tammodel, dat )
summary(mod6)

#************************
# Model 7: Probabilistic Guttman model
mod7 <- sirt::prob.guttman( dat[,items] )
summary(mod7)

#-- model comparison
IRT.compareModels( mod1, mod2, mod3, mod4, mod5, mod6, mod7 )

## End(Not run)

Examples with Datasets from Eid and Schmidt (2014)

Description

Examples with datasets from Eid and Schmidt (2014), illustrations with several R packages. The examples follow closely the online material of Hosoya (2014). The datasets are completely synthetic datasets which were resimulated from the originally available data.

Usage

data(data.eid.kap4)
data(data.eid.kap5)
data(data.eid.kap6)
data(data.eid.kap7)

Format

Source

The material and original datasets can be downloaded from http://www.hogrefe.de/buecher/lehrbuecher/psychlehrbuchplus/lehrbuecher/ testtheorie-und-testkonstruktion/zusatzmaterial/.

References

Eid, M., & Schmidt, K. (2014). Testtheorie und Testkonstruktion. Goettingen, Hogrefe.

Hosoya, G. (2014). Einfuehrung in die Analyse testtheoretischer Modelle mit R. Available at http://www.hogrefe.de/buecher/lehrbuecher/psychlehrbuchplus/lehrbuecher/testtheorie-und-testkonstruktion/zusatzmaterial/.

Examples

## Not run: 
miceadds::library_install("foreign")
#---- load some IRT packages in R
miceadds::library_install("TAM")        # package (a)
miceadds::library_install("mirt")       # package (b)
miceadds::library_install("sirt")       # package (c)
miceadds::library_install("eRm")        # package (d)
miceadds::library_install("ltm")        # package (e)
miceadds::library_install("psychomix")  # package (f)

#############################################################################
# EXAMPLES Ch. 4: Unidimensional IRT models | dichotomous data
#############################################################################

data(data.eid.kap4)
data0 <- data.eid.kap4

# load data
data0 <- foreign::read.spss( linkname, to.data.frame=TRUE, use.value.labels=FALSE)
# extract items
dat <- data0[,2:11]

#*********************************************************
# Model 1: Rasch model
#*********************************************************

#-----------
#-- 1a: estimation with TAM package

# estimation with tam.mml
mod1a <- TAM::tam.mml(dat)
summary(mod1a)

# person parameters in TAM
pp1a <- TAM::tam.wle(mod1a)

# plot item response functions
plot(mod1a,export=FALSE,ask=TRUE)

# Infit and outfit in TAM
itemf1a <- TAM::tam.fit(mod1a)
itemf1a

# model fit
modf1a <- TAM::tam.modelfit(mod1a)
summary(modf1a)

#-----------
#-- 1b: estimation with mirt package

# estimation with mirt
mod1b <- mirt::mirt( dat, 1, itemtype="Rasch")
summary(mod1b)
print(mod1b)

# person parameters
pp1b <- mirt::fscores(mod1b, method="WLE")

# extract coefficients
sirt::mirt.wrapper.coef(mod1b)

# plot item response functions
plot(mod1b, type="trace" )
par(mfrow=c(1,1))

# item fit
itemf1b <- mirt::itemfit(mod1b)
itemf1b

# model fit
modf1b <- mirt::M2(mod1b)
modf1b

#-----------
#-- 1c: estimation with sirt package

# estimation with rasch.mml2
mod1c <- sirt::rasch.mml2(dat)
summary(mod1c)

# person parameters (EAP)
pp1c <- mod1c$person

# plot item response functions
plot(mod1c, ask=TRUE )

# model fit
modf1c <- sirt::modelfit.sirt(mod1c)
summary(modf1c)

#-----------
#-- 1d: estimation with eRm package

# estimation with RM
mod1d <- eRm::RM(dat)
summary(mod1d)

# estimation person parameters
pp1d <- eRm::person.parameter(mod1d)
summary(pp1d)

# plot item response functions
eRm::plotICC(mod1d)

# person-item map
eRm::plotPImap(mod1d)

# item fit
itemf1d <- eRm::itemfit(pp1d)

# person fit
persf1d <- eRm::personfit(pp1d)

#-----------
#-- 1e: estimation with ltm package

# estimation with rasch
mod1e <- ltm::rasch(dat)
summary(mod1e)

# estimation person parameters
pp1e <- ltm::factor.scores(mod1e)

# plot item response functions
plot(mod1e)

# item fit
itemf1e <- ltm::item.fit(mod1e)

# person fit
persf1e <- ltm::person.fit(mod1e)

# goodness of fit with Bootstrap
modf1e <- ltm::GoF.rasch(mod1e,B=20)    # use more bootstrap samples
modf1e

#*********************************************************
# Model 2: 2PL model
#*********************************************************

#-----------
#-- 2a: estimation with TAM package

# estimation
mod2a <- TAM::tam.mml.2pl(dat)
summary(mod2a)

# model fit
modf2a <- TAM::tam.modelfit(mod2a)
summary(modf2a)

# item response functions
plot(mod2a, export=FALSE, ask=TRUE)

# model comparison
anova(mod1a,mod2a)

#-----------
#-- 2b: estimation with mirt package

# estimation
mod2b <- mirt::mirt(dat,1,itemtype="2PL")
summary(mod2b)
print(mod2b)
sirt::mirt.wrapper.coef(mod2b)

# model fit
modf2b <- mirt::M2(mod2b)
modf2b

#-----------
#-- 2c: estimation with sirt package

I <- ncol(dat)
# estimation
mod2c <- sirt::rasch.mml2(dat,est.a=1:I)
summary(mod2c)

# model fit
modf2c <- sirt::modelfit.sirt(mod2c)
summary(modf2c)

#-----------
#-- 2e: estimation with ltm package

# estimation
mod2e <- ltm::ltm(dat ~ z1 )
summary(mod2e)

# item response functions
plot(mod2e)

#*********************************************************
# Model 3: Mixture Rasch model
#*********************************************************

#-----------
#-- 3a: estimation with TAM package

# avoid "_" in column names if the "__" operator is used in
# the tamaan syntax
dat1 <- dat
colnames(dat1) <- gsub("_", "", colnames(dat1) )
# define tamaan model
tammodel <- "
ANALYSIS:
  TYPE=MIXTURE ;
  NCLASSES(2);
  NSTARTS(20,25);   # 20 random starts with 25 initial iterations each
LAVAAN MODEL:
  F=~ Freude1__Freude2
  F ~~ F
ITEM TYPE:
  ALL(Rasch);
    "
mod3a <- TAM::tamaan( tammodel, resp=dat1 )
summary(mod3a)
# extract item parameters
ipars <- mod2$itempartable_MIXTURE[ 1:10, ]
plot( 1:10, ipars[,3], type="o", ylim=range( ipars[,3:4] ), pch=16,
        xlab="Item", ylab="Item difficulty")
lines( 1:10, ipars[,4], type="l", col=2, lty=2)
points( 1:10, ipars[,4],  col=2, pch=2)

#-----------
#-- 3f: estimation with psychomix package

# estimation
mod3f <- psychomix::raschmix( as.matrix(dat), k=2, scores="meanvar")
summary(mod3f)
# plot class-specific item difficulties
plot(mod3f)

#############################################################################
# EXAMPLES Ch. 5: Unidimensional IRT models | polytomous data
#############################################################################

data(data.eid.kap5)
data0 <- data.eid.kap5
# extract items
dat <- data0[,2:7]

#*********************************************************
# Model 1: Partial credit model
#*********************************************************

#-----------
#-- 1a: estimation with TAM package

# estimation with tam.mml
mod1a <- TAM::tam.mml(dat)
summary(mod1a)

# person parameters in TAM
pp1a <- tam.wle(mod1a)

# plot item response functions
plot(mod1a,export=FALSE,ask=TRUE)

# Infit and outfit in TAM
itemf1a <- TAM::tam.fit(mod1a)
itemf1a

# model fit
modf1a <- TAM::tam.modelfit(mod1a)
summary(modf1a)

#-----------
#-- 1b: estimation with mirt package

# estimation with tam.mml
mod1b <- mirt::mirt( dat, 1, itemtype="Rasch")
summary(mod1b)
print(mod1b)
sirt::mirt.wrapper.coef(mod1b)

# plot item response functions
plot(mod1b, type="trace" )
par(mfrow=c(1,1))

# item fit
itemf1b <- mirt::itemfit(mod1b)
itemf1b

#-----------
#-- 1c: estimation with sirt package

# estimation with rm.facets
mod1c <- sirt::rm.facets(dat)
summary(mod1c)
summary(mod1a)

#-----------
#-- 1d: estimation with eRm package

# estimation
mod1d <- eRm::PCM(dat)
summary(mod1d)

# plot item response functions
eRm::plotICC(mod1d)

# person-item map
eRm::plotPImap(mod1d)

# item fit
itemf1d <- eRm::itemfit(pp1d)

#-----------
#-- 1e: estimation with ltm package

# estimation
mod1e <- ltm::gpcm(dat, constraint="1PL")
summary(mod1e)
# plot item response functions
plot(mod1e)

#*********************************************************
# Model 2: Generalized partial credit model
#*********************************************************

#-----------
#-- 2a: estimation with TAM package

# estimation with tam.mml
mod2a <- TAM::tam.mml.2pl(dat, irtmodel="GPCM")
summary(mod2a)

# model fit
modf2a <- TAM::tam.modelfit(mod2a)
summary(modf2a)

#-----------
#-- 2b: estimation with mirt package

# estimation
mod2b <- mirt::mirt( dat, 1, itemtype="gpcm")
summary(mod2b)
print(mod2b)
sirt::mirt.wrapper.coef(mod2b)

#-----------
#-- 2c: estimation with sirt package

# estimation with rm.facets
mod2c <- sirt::rm.facets(dat, est.a.item=TRUE)
summary(mod2c)

#-----------
#-- 2e: estimation with ltm package

# estimation
mod2e <- ltm::gpcm(dat)
summary(mod2e)
plot(mod2e)

## End(Not run)

Dataset European Social Survey 2005

Description

This dataset contains item loadings λ\lambda and intercepts ν\nu for 26 countries for the European Social Survey (ESS 2005; see Asparouhov & Muthen, 2014).

Usage

data(data.ess2005)

Format

The format of the dataset is:

List of 2
$ lambda: num [1:26, 1:4] 0.688 0.721 0.72 0.687 0.625 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "ipfrule" "ipmodst" "ipbhprp" "imptrad"
$ nu : num [1:26, 1:4] 3.26 2.52 3.41 2.84 2.79 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "ipfrule" "ipmodst" "ipbhprp" "imptrad"

References

Asparouhov, T., & Muthen, B. (2014). Multiple-group factor analysis alignment. Structural Equation Modeling, 21(4), 1-14. doi:10.1080/10705511.2014.919210


C-Test Datasets

Description

Some datasets of C-tests are provided. The dataset data.g308 was used in Schroeders, Robitzsch and Schipolowski (2014).

Usage

data(data.g308)

Format

References

Schroeders, U., Robitzsch, A., & Schipolowski, S. (2014). A comparison of different psychometric approaches to modeling testlet structures: An example with C-tests. Journal of Educational Measurement, 51(4), 400-418.

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: Dataset G308 from Schroeders et al. (2014)
#############################################################################

data(data.g308)
dat <- data.g308

library(TAM)
library(sirt)

# define testlets
testlet <- c(1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6)

#****************************************
#*** Model 1: Rasch model
mod1 <- TAM::tam.mml(resp=dat, control=list(maxiter=300, snodes=1500))
summary(mod1)

#****************************************
#*** Model 2: Rasch testlet model

# testlets are dimensions, assign items to Q-matrix
TT <- length(unique(testlet))
Q <- matrix(0, nrow=ncol(dat), ncol=TT + 1)
Q[,1] <- 1 # First dimension constitutes g-factor
for (tt in 1:TT){Q[testlet==tt, tt+1] <- 1}

# In a testlet model, all dimensions are uncorrelated among
# each other, that is, all pairwise correlations are set to 0,
# which can be accomplished with the "variance.fixed" command
variance.fixed <- cbind(t( utils::combn(TT+1,2)), 0)
mod2 <- TAM::tam.mml(resp=dat, Q=Q, variance.fixed=variance.fixed,
            control=list(snodes=1500, maxiter=300))
summary(mod2)

#****************************************
#*** Model 3: Partial credit model

scores <- list()
testlet.names <- NULL
dat.pcm <- NULL
for (tt in 1:max(testlet) ){
   scores[[tt]] <- rowSums (dat[, testlet==tt, drop=FALSE])
   dat.pcm <- c(dat.pcm, list(c(scores[[tt]])))
   testlet.names <- append(testlet.names, paste0("testlet",tt) )
   }
dat.pcm <- as.data.frame(dat.pcm)
colnames(dat.pcm) <- testlet.names
mod3 <- TAM::tam.mml(resp=dat.pcm, control=list(snodes=1500, maxiter=300) )
summary(mod3)

#****************************************
#*** Model 4: Copula model

mod4 <- sirt::rasch.copula2 (dat=dat, itemcluster=testlet)
summary(mod4)

## End(Not run)

Dataset for Invariance Testing with 4 Groups

Description

Dataset for invariance testing with 4 groups.

Usage

data(data.inv4gr)

Format

A data frame with 4000 observations on the following 12 variables. The first variable is a group identifier, the other variables are items.

group

A group identifier

I01

a numeric vector

I02

a numeric vector

I03

a numeric vector

I04

a numeric vector

I05

a numeric vector

I06

a numeric vector

I07

a numeric vector

I08

a numeric vector

I09

a numeric vector

I10

a numeric vector

I11

a numeric vector

Source

Simulated dataset


Dataset 'Liking For Science'

Description

Dataset 'Liking for science' published by Wright and Masters (1982).

Usage

data(data.liking.science)

Format

The format is:

num [1:75, 1:24] 1 2 2 1 1 1 2 2 0 2 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:24] "LS01" "LS02" "LS03" "LS04" ...

References

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.


Longitudinal Dataset

Description

This dataset contains 200 observations on 12 items. 6 items (I1T1, ...,I6T1) were administered at measurement occasion T1 and 6 items at T2 (I3T2, ..., I8T2). There were 4 anchor items which were presented at both time points. The first column in the dataset contains the student identifier.

Usage

data(data.long)

Format

The format of the dataset is

'data.frame': 200 obs. of 13 variables:
$ idstud: int 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 ...
$ I1T1 : int 1 1 1 1 1 1 1 0 1 1 ...
$ I2T1 : int 0 0 1 1 1 1 0 1 1 1 ...
$ I3T1 : int 1 0 1 1 0 1 0 0 0 0 ...
$ I4T1 : int 1 0 0 1 0 0 0 0 1 1 ...
$ I5T1 : int 1 0 0 1 0 0 0 0 1 0 ...
$ I6T1 : int 1 0 0 0 0 0 0 0 0 0 ...
$ I3T2 : int 1 1 0 0 1 1 1 1 0 1 ...
$ I4T2 : int 1 1 0 0 1 1 0 0 0 1 ...
$ I5T2 : int 1 0 1 1 1 1 1 0 1 1 ...
$ I6T2 : int 1 1 0 0 0 0 0 0 0 1 ...
$ I7T2 : int 1 0 0 0 0 0 0 0 0 1 ...
$ I8T2 : int 0 0 0 0 1 0 0 0 0 0 ...

Examples

## Not run: 
data(data.long)
dat <- data.long
dat <- dat[,-1]
I <- ncol(dat)

#*************************************************
# Model 1: 2-dimensional Rasch model
#*************************************************
# define Q-matrix
Q <- matrix(0,I,2)
Q[1:6,1] <- 1
Q[7:12,2] <- 1
rownames(Q) <- colnames(dat)
colnames(Q) <- c("T1","T2")

# vector with same items
itemnr <- as.numeric( substring( colnames(dat),2,2) )
# fix mean at T2 to zero
mu.fixed <- cbind( 2,0 )

#--- M1a: rasch.mml2 (in sirt)
mod1a <- sirt::rasch.mml2(dat, Q=Q, est.b=itemnr, mu.fixed=mu.fixed)
summary(mod1a)

#--- M1b: smirt (in sirt)
mod1b <- sirt::smirt(dat, Qmatrix=Q, irtmodel="comp", est.b=itemnr,
                  mu.fixed=mu.fixed )

#--- M1c: tam.mml (in TAM)

# assume equal item difficulty of I3T1 and I3T2, I4T1 and I4T2, ...
# create draft design matrix and modify it
A <- TAM::designMatrices(resp=dat)$A
dimnames(A)[[1]] <- colnames(dat)
  ##   > str(A)
  ##    num [1:12, 1:2, 1:12] 0 0 0 0 0 0 0 0 0 0 ...
  ##    - attr(*, "dimnames")=List of 3
  ##     ..$ : chr [1:12] "Item01" "Item02" "Item03" "Item04" ...
  ##     ..$ : chr [1:2] "Category0" "Category1"
  ##     ..$ : chr [1:12] "I1T1" "I2T1" "I3T1" "I4T1" ...
A1 <- A[,, c(1:6, 11:12 ) ]
A1[7,2,3] <- -1     # difficulty(I3T1)=difficulty(I3T2)
A1[8,2,4] <- -1     # I4T1=I4T2
A1[9,2,5] <- A1[10,2,6] <- -1
dimnames(A1)[[3]] <- substring( dimnames(A1)[[3]],1,2)
  ##   > A1[,2,]
  ##        I1 I2 I3 I4 I5 I6 I7 I8
  ##   I1T1 -1  0  0  0  0  0  0  0
  ##   I2T1  0 -1  0  0  0  0  0  0
  ##   I3T1  0  0 -1  0  0  0  0  0
  ##   I4T1  0  0  0 -1  0  0  0  0
  ##   I5T1  0  0  0  0 -1  0  0  0
  ##   I6T1  0  0  0  0  0 -1  0  0
  ##   I3T2  0  0 -1  0  0  0  0  0
  ##   I4T2  0  0  0 -1  0  0  0  0
  ##   I5T2  0  0  0  0 -1  0  0  0
  ##   I6T2  0  0  0  0  0 -1  0  0
  ##   I7T2  0  0  0  0  0  0 -1  0
  ##   I8T2  0  0  0  0  0  0  0 -1

# estimate model
# set intercept of second dimension (T2) to zero
beta.fixed <- cbind( 1, 2, 0 )
mod1c <- TAM::tam.mml( resp=dat, Q=Q, A=A1, beta.fixed=beta.fixed)
summary(mod1c)

#*************************************************
# Model 2: 2-dimensional 2PL model
#*************************************************

# set variance at T2 to 1
variance.fixed <- cbind(2,2,1)

# M2a: rasch.mml2 (in sirt)
mod2a <- sirt::rasch.mml2(dat, Q=Q, est.b=itemnr, est.a=itemnr, mu.fixed=mu.fixed,
             variance.fixed=variance.fixed, mmliter=100)
summary(mod2a)

#*************************************************
# Model 3: Concurrent calibration by assuming invariant item parameters
#*************************************************

library(mirt)   # use mirt for concurrent calibration
data(data.long)
dat <- data.long[,-1]
I <- ncol(dat)

# create user defined function for between item dimensionality 4PL model
name <- "4PLbw"
par <- c("low"=0,"upp"=1,"a"=1,"d"=0,"dimItem"=1)
est <- c(TRUE, TRUE,TRUE,TRUE,FALSE)
# item response function
irf <- function(par,Theta,ncat){
     low <- par[1]
     upp <- par[2]
     a <- par[3]
     d <- par[4]
     dimItem <- par[5]
     P1 <- low + ( upp - low ) * plogis( a*Theta[,dimItem] + d )
     cbind(1-P1, P1)
}

# create item response function
fourPLbetw <- mirt::createItem(name, par=par, est=est, P=irf)
head(dat)

# create mirt model (use variable names in mirt.model)
mirtsyn <- "
     T1=I1T1,I2T1,I3T1,I4T1,I5T1,I6T1
     T2=I3T2,I4T2,I5T2,I6T2,I7T2,I8T2
     COV=T1*T2,,T2*T2
     MEAN=T1
     CONSTRAIN=(I3T1,I3T2,d),(I4T1,I4T2,d),(I5T1,I5T2,d),(I6T1,I6T2,d),
                 (I3T1,I3T2,a),(I4T1,I4T2,a),(I5T1,I5T2,a),(I6T1,I6T2,a)
        "
# create mirt model
mirtmodel <- mirt::mirt.model( mirtsyn, itemnames=colnames(dat) )
# define parameters to be estimated
mod3.pars <- mirt::mirt(dat, mirtmodel$model, rep( "4PLbw",I),
                   customItems=list("4PLbw"=fourPLbetw), pars="values")
# select dimensions
ind <- intersect( grep("T2",mod3.pars$item), which( mod3.pars$name=="dimItem" ) )
mod3.pars[ind,"value"] <- 2
# set item parameters low and upp to non-estimated
ind <- which( mod3.pars$name %in% c("low","upp") )
mod3.pars[ind,"est"] <- FALSE

# estimate 2PL model
mod3 <- mirt::mirt(dat, mirtmodel$model, itemtype=rep( "4PLbw",I),
                customItems=list("4PLbw"=fourPLbetw), pars=mod3.pars, verbose=TRUE,
                technical=list(NCYCLES=50)  )
mirt.wrapper.coef(mod3)

#****** estimate model in lavaan
library(lavaan)

# specify syntax
lavmodel <- "
             #**** T1
             F1=~ a1*I1T1+a2*I2T1+a3*I3T1+a4*I4T1+a5*I5T1+a6*I6T1
             I1T1 | b1*t1 ; I2T1 | b2*t1 ; I3T1 | b3*t1 ; I4T1 | b4*t1
             I5T1 | b5*t1 ; I6T1 | b6*t1
             F1 ~~ 1*F1
             #**** T2
             F2=~ a3*I3T2+a4*I4T2+a5*I5T2+a6*I6T2+a7*I7T2+a8*I8T2
             I3T2 | b3*t1 ; I4T2 | b4*t1 ; I5T2 | b5*t1 ; I6T2 | b6*t1
             I7T2 | b7*t1 ; I8T2 | b8*t1
             F2 ~~ NA*F2
             F2 ~ 1
             #*** covariance
             F1 ~~ F2
                "
# estimate model using theta parameterization
mod3lav <- lavaan::cfa( data=dat, model=lavmodel,
            std.lv=TRUE, ordered=colnames(dat), parameterization="theta")
summary(mod3lav, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)

#*************************************************
# Model 4: Linking with items of different item slope groups
#*************************************************

data(data.long)
dat <- data.long
# dataset for T1
dat1 <- dat[, grep( "T1", colnames(dat) ) ]
colnames(dat1) <- gsub("T1","", colnames(dat1) )
# dataset for T2
dat2 <- dat[, grep( "T2", colnames(dat) ) ]
colnames(dat2) <- gsub("T2","", colnames(dat2) )

# 2PL model with slope groups T1
mod1 <- sirt::rasch.mml2( dat1, est.a=c( rep(1,2), rep(2,4) ) )
summary(mod1)

# 2PL model with slope groups T2
mod2 <- sirt::rasch.mml2( dat2, est.a=c( rep(1,4), rep(2,2) ) )
summary(mod2)

#------- Link 1: Haberman Linking
# collect item parameters
dfr1 <- data.frame( "study1", mod1$item$item, mod1$item$a, mod1$item$b )
dfr2 <- data.frame( "study2", mod2$item$item, mod2$item$a, mod2$item$b )
colnames(dfr2) <- colnames(dfr1) <- c("study", "item", "a", "b" )
itempars <- rbind( dfr1, dfr2 )
# Linking
link1 <- sirt::linking.haberman(itempars=itempars)

#------- Link 2: Invariance alignment method
# create objects for invariance.alignment
nu <- rbind( c(mod1$item$thresh,NA,NA), c(NA,NA,mod2$item$thresh) )
lambda <- rbind( c(mod1$item$a,NA,NA), c(NA,NA,mod2$item$a ) )
colnames(lambda) <- colnames(nu) <- paste0("I",1:8)
rownames(lambda) <- rownames(nu) <- c("T1", "T2")
# Linking
link2a <- sirt::invariance.alignment( lambda, nu )
summary(link2a)

## End(Not run)

Datasets for Local Structural Equation Models / Moderated Factor Analysis

Description

Datasets for local structural equation models or moderated factor analysis.

Usage

data(data.lsem01)
data(data.lsem02)
data(data.lsem03)

Format

References

Hildebrandt, A., Luedtke, O., Robitzsch, A., Sommer, C., & Wilhelm, O. (2016). Exploring factor model parameters across continuous variables with local structural equation models. Multivariate Behavioral Research, 51(2-3), 257-278. doi:10.1080/00273171.2016.1142856

Hueluer, G., Wilhelm, O., & Robitzsch, A. (2011). Intelligence differentiation in early childhood. Journal of Individual Differences, 32(3), 170-179. doi:10.1027/1614-0001/a000049


Dataset Mathematics

Description

This is an example dataset involving Mathematics items for German fourth graders. Items are classified into several domains and subdomains (see Section Format). The dataset contains 664 students on 30 items.

Usage

data(data.math)

Format

The dataset is a list. The list element data contains the dataset with the demographic variables student ID (idstud) and a dummy variable for female students (female). The remaining variables (starting with M in the name) are the mathematics items.
The item metadata are included in the list element item which contains item name (item) and the testlet label (testlet). An item not included in a testlet is indicated by NA. Each item is allocated to one and only competence domain (domain).

The format is:

List of 2
$ data:'data.frame':
..$ idstud: int [1:664] 1001 1002 1003 ...
..$ female: int [1:664] 1 1 0 0 1 1 1 0 0 1 ...
..$ MA1 : int [1:664] 1 1 1 0 0 1 1 1 1 1 ...
..$ MA2 : int [1:664] 1 1 1 1 1 0 0 0 0 1 ...
..$ MA3 : int [1:664] 1 1 0 0 0 0 0 1 0 0 ...
..$ MA4 : int [1:664] 0 1 1 1 0 0 1 0 0 0 ...
..$ MB1 : int [1:664] 0 1 0 1 0 0 0 0 0 1 ...
..$ MB2 : int [1:664] 1 1 1 1 0 1 0 1 0 0 ...
..$ MB3 : int [1:664] 1 1 1 1 0 0 0 1 0 1 ...
[...]
..$ MH3 : int [1:664] 1 1 0 1 0 0 1 0 1 0 ...
..$ MH4 : int [1:664] 0 1 1 1 0 0 0 0 1 0 ...
..$ MI1 : int [1:664] 1 1 0 1 0 1 0 0 1 0 ...
..$ MI2 : int [1:664] 1 1 0 0 0 1 1 0 1 1 ...
..$ MI3 : int [1:664] 0 1 0 1 0 0 0 0 0 0 ...
$ item:'data.frame':
..$ item : Factor w/ 30 levels "MA1","MA2","MA3",..: 1 2 3 4 5 ...
..$ testlet : Factor w/ 9 levels "","MA","MB","MC",..: 2 2 2 2 3 3 ...
..$ domain : Factor w/ 3 levels "arithmetic","geometry",..: 1 1 1 ...
..$ subdomain: Factor w/ 9 levels "","addition",..: 2 2 2 2 7 7 ...


Some Datasets from McDonald's Test Theory Book

Description

Some datasets from McDonald (1999), especially related to using NOHARM for item response modeling. See Examples below.

Usage

data(data.mcdonald.act15)
data(data.mcdonald.LSAT6)
data(data.mcdonald.rape)

Format

Source

Tables in McDonald (1999)

References

McDonald, R. P. (1999). Test theory: A unified treatment. Psychology Press.

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: LSAT6 data    | Chapter 12 McDonald (1999)
#############################################################################
data(data.mcdonald.act15)

#************
# Model 1: 2-parameter normal ogive model

#++ NOHARM estimation
I <- ncol(dat)
# covariance structure
P.pattern <- matrix( 0, ncol=1, nrow=1 )
P.init <- 1+0*P.pattern
# fix all entries in the loading matrix to 1
F.pattern <- matrix( 1, nrow=I, ncol=1 )
F.init <- F.pattern
# estimate model
mod1a <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
             F.init=F.init, P.pattern=P.pattern, P.init=P.init,
             writename="LSAT6__1dim_2pno", noharm.path=noharm.path, dec="," )
summary(mod1a, logfile="LSAT6__1dim_2pno__SUMMARY")

#++ pairwise marginal maximum likelihood estimation using the probit link
mod1b <- sirt::rasch.pml3( dat, est.a=1:I, est.sigma=FALSE)

#************
# Model 2: 1-parameter normal ogive model

#++ NOHARM estimation
# covariance structure
P.pattern <- matrix( 0, ncol=1, nrow=1 )
P.init <- 1+0*P.pattern
# fix all entries in the loading matrix to 1
F.pattern <- matrix( 2, nrow=I, ncol=1 )
F.init <- 1+0*F.pattern
# estimate model
mod2a <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
                F.init=F.init, P.pattern=P.pattern, P.init=P.init,
                writename="LSAT6__1dim_1pno", noharm.path=noharm.path, dec="," )
summary(mod2a, logfile="LSAT6__1dim_1pno__SUMMARY")

# PMML estimation
mod2b <- sirt::rasch.pml3( dat, est.a=rep(1,I), est.sigma=FALSE )
summary(mod2b)

#************
# Model 3: 3-parameter normal ogive model with fixed guessing parameters

#++ NOHARM estimation
# covariance structure
P.pattern <- matrix( 0, ncol=1, nrow=1 )
P.init <- 1+0*P.pattern
# fix all entries in the loading matrix to 1
F.pattern <- matrix( 1, nrow=I, ncol=1 )
F.init <- 1+0*F.pattern
# estimate model
mod <- sirt::R2noharm( dat=dat, model.type="CFA",  guesses=rep(.2,I),
            F.pattern=F.pattern, F.init=F.init, P.pattern=P.pattern,
            P.init=P.init, writename="LSAT6__1dim_3pno",
            noharm.path=noharm.path, dec="," )
summary(mod, logfile="LSAT6__1dim_3pno__SUMMARY")

#++ logistic link function employed in smirt function
mod1d <- sirt::smirt(dat, Qmatrix=F.pattern, est.a=matrix(1:I,I,1), c.init=rep(.2,I))
summary(mod1d)

#############################################################################
# EXAMPLE 2: ACT15 data    | Chapter 6 McDonald (1999)
#############################################################################
data(data.mcdonald.act15)
pm <- data.mcdonald.act15

#************
# Model 1: 2-dimensional exploratory factor analysis
mod1 <- sirt::R2noharm( pm=pm, n=1000, model.type="EFA", dimensions=2,
             writename="ACT15__efa_2dim", noharm.path=noharm.path, dec="," )
summary(mod1)

#************
# Model 2: 2-dimensional independent clusters basis solution
P.pattern <- matrix(1,2,2)
diag(P.pattern) <- 0
P.init <- 1+0*P.pattern
F.pattern <- matrix(0,15,2)
F.pattern[ c(1:5,11:15),1] <- 1
F.pattern[ c(6:10,11:15),2] <- 1
F.init <- F.pattern

# estimate model
mod2 <- sirt::R2noharm( pm=pm, n=1000,  model.type="CFA", F.pattern=F.pattern,
            F.init=F.init, P.pattern=P.pattern,P.init=P.init,
            writename="ACT15_indep_clusters", noharm.path=noharm.path, dec="," )
summary(mod2)

#************
# Model 3: Hierarchical model

P.pattern <- matrix(0,3,3)
P.init <- P.pattern
diag(P.init) <- 1
F.pattern <- matrix(0,15,3)
F.pattern[,1] <- 1    # all items load on g factor
F.pattern[ c(1:5,11:15),2] <- 1   # Items 1-5 and 11-15 load on first nested factor
F.pattern[ c(6:10,11:15),3] <- 1  # Items 6-10 and 11-15 load on second nested factor
F.init <- F.pattern

# estimate model
mod3 <- sirt::R2noharm( pm=pm, n=1000,  model.type="CFA", F.pattern=F.pattern,
           F.init=F.init, P.pattern=P.pattern, P.init=P.init,
           writename="ACT15_hierarch_model", noharm.path=noharm.path, dec="," )
summary(mod3)

#############################################################################
# EXAMPLE 3: Rape myth scale | Chapter 15 McDonald (1999)
#############################################################################
data(data.mcdonald.rape)
lambda <- data.mcdonald.rape$lambda
nu <- data.mcdonald.rape$nu

# obtain multiplier for factor loadings (Formula 15.5)
k <- sum( lambda[1,] * lambda[2,] ) / sum( lambda[2,]^2 )
  ##   [1] 1.263243

# additive parameter (Formula 15.7)
c <- sum( lambda[2,]*(nu[1,]-nu[2,]) ) / sum( lambda[2,]^2 )
  ##   [1] 1.247697

# SD in the female group
1/k
  ##   [1] 0.7916132

# M in the female group
- c/k
  ##   [1] -0.9876932

# Burt's coefficient of factorial congruence (Formula 15.10a)
sum( lambda[1,] * lambda[2,] ) / sqrt( sum( lambda[1,]^2 ) * sum( lambda[2,]^2 ) )
  ##   [1] 0.9727831

# congruence for mean parameters
sum(  (nu[1,]-nu[2,]) * lambda[2,] ) / sqrt( sum( (nu[1,]-nu[2,])^2 ) * sum( lambda[2,]^2 ) )
  ##   [1] 0.968176

## End(Not run)

Dataset with Mixed Dichotomous and Polytomous Item Responses

Description

Dataset with mixed dichotomous and polytomous item responses.

Usage

data(data.mixed1)

Format

A data frame with 1000 observations on the following 37 variables.

'data.frame': 1000 obs. of 37 variables:
$ I01: num 1 1 1 1 1 1 1 0 1 1 ...
$ I02: num 1 1 1 1 1 1 1 1 0 1 ...
[...]
$ I36: num 1 1 1 1 0 0 0 0 1 1 ...
$ I37: num 0 1 1 1 0 1 0 0 1 1 ...

Examples

data(data.mixed1)
apply( data.mixed1, 2, max )
  ##   I01 I02 I03 I04 I05 I06 I07 I08 I09 I10 I11 I12 I13 I14 I15 I16
  ##     1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
  ##   I17 I18 I19 I20 I21 I22 I23 I24 I25 I26 I27 I28 I29 I30 I31 I32
  ##     1   1   1   1   4   4   1   1   1   1   1   1   1   1   1   1
  ##   I33 I34 I35 I36 I37
  ##     1   1   1   1   1

Multilevel Datasets

Description

Datasets for conducting multilevel IRT analysis. This dataset is used in the examples of the function mcmc.2pno.ml.

Usage

data(data.ml1)
data(data.ml2)

Format


Datasets for NOHARM Analysis

Description

Datasets for analyses in NOHARM (see R2noharm).

Usage

data(data.noharmExC)
data(data.noharm18)

Format


Item Parameters for Three Studies Obtained by 1PL and 2PL Estimation

Description

The datasets contain item parameters to be prepared for linking using the function linking.haberman.

Usage

data(data.pars1.rasch)
data(data.pars1.2pl)

Format


Dataset from PIRLS Study with Missing Responses

Description

This is a dataset of the PIRLS 2011 study for 4th graders for the reading booklet 13 (the 'PIRLS reader') and 4 countries (Austria, Germany, France, Netherlands). Missing responses (missing by intention and not reached) are coded by 9.

Usage

data(data.pirlsmissing)

Format

A data frame with 3480 observations on the following 38 variables.

The format is:

'data.frame': 3480 obs. of 38 variables:
$ idstud : int 1000001 1000002 1000003 1000004 1000005 ...
$ country : Factor w/ 4 levels "AUT","DEU","FRA",..: 1 1 1 1 1 1 1 1 1 1 ...
$ studwgt : num 1.06 1.06 1.06 1.06 1.06 ...
$ R31G01M : int 1 1 1 1 1 1 0 1 1 0 ...
$ R31G02C : int 0 9 0 1 0 0 0 0 1 0 ...
$ R31G03M : int 1 1 1 1 0 1 0 0 1 1 ...
[...]
$ R31P15C : int 1 9 0 1 0 0 0 0 1 0 ...
$ R31P16C : int 0 0 0 0 0 0 0 9 0 1 ...

Examples

data(data.pirlsmissing)
# inspect missing rates
round( colMeans( data.pirlsmissing==9 ), 3 )
  ##    idstud  country  studwgt  R31G01M  R31G02C  R31G03M  R31G04C  R31G05M
  ##     0.000    0.000    0.000    0.009    0.076    0.012    0.203    0.018
  ##   R31G06M  R31G07M R31G08CZ R31G08CA R31G08CB  R31G09M  R31G10C  R31G11M
  ##     0.010    0.020    0.189    0.225    0.252    0.019    0.126    0.023
  ##   R31G12C R31G13CZ R31G13CA R31G13CB R31G13CC  R31G14M  R31P01M  R31P02C
  ##     0.202    0.170    0.198    0.220    0.223    0.074    0.013    0.039
  ##   R31P03C  R31P04M  R31P05C  R31P06C  R31P07C  R31P08M  R31P09C  R31P10M
  ##     0.056    0.012    0.075    0.043    0.074    0.024    0.062    0.025
  ##   R31P11M  R31P12M  R31P13M  R31P14C  R31P15C  R31P16C
  ##     0.027    0.030    0.030    0.126    0.130    0.127

Dataset PISA Mathematics

Description

This is an example PISA dataset of reading items from the PISA 2009 study of students from Austria. The dataset contains 565 students who worked on the 11 mathematics items from item cluster M3.

Usage

data(data.pisaMath)

Format

The dataset is a list. The list element data contains the dataset with the demographical variables student ID (idstud), school ID (idschool), a dummy variable for female students (female), socioeconomic status (hisei) and migration background (migra). The remaining variables (starting with M in the name) are the mathematics items.
The item metadata are included in the list element item which contains item name (item) and the testlet label (testlet). An item not included in a testlet is indicated by NA.

The format is:

List of 2
$ data:'data.frame':
..$ idstud : num [1:565] 9e+10 9e+10 9e+10 9e+10 9e+10 ...
..$ idschool: int [1:565] 900015 900015 900015 900015 ...
..$ female : int [1:565] 0 0 0 0 0 0 0 0 0 0 ...
..$ hisei : num [1:565] -1.16 -1.099 -1.588 -0.365 -1.588 ...
..$ migra : int [1:565] 0 0 0 0 0 0 0 0 0 1 ...
..$ M192Q01 : int [1:565] 1 0 1 1 1 1 1 0 0 0 ...
..$ M406Q01 : int [1:565] 1 1 1 0 1 0 0 0 1 0 ...
..$ M406Q02 : int [1:565] 1 0 0 0 1 0 0 0 1 0 ...
..$ M423Q01 : int [1:565] 0 1 0 1 1 1 1 1 1 0 ...
..$ M496Q01 : int [1:565] 1 0 0 0 0 0 0 0 1 0 ...
..$ M496Q02 : int [1:565] 1 0 0 1 0 1 0 1 1 0 ...
..$ M564Q01 : int [1:565] 1 1 1 1 1 1 0 0 1 0 ...
..$ M564Q02 : int [1:565] 1 0 1 1 1 0 0 0 0 0 ...
..$ M571Q01 : int [1:565] 1 0 0 0 1 0 0 0 0 0 ...
..$ M603Q01 : int [1:565] 1 0 0 0 1 0 0 0 0 0 ...
..$ M603Q02 : int [1:565] 1 0 0 0 1 0 0 0 1 0 ...
$ item:'data.frame':
..$ item : Factor w/ 11 levels "M192Q01","M406Q01",..: 1 2 3 4 ...
..$ testlet: chr [1:11] NA "M406" "M406" NA ...


Item Parameters from Two PISA Studies

Description

This data frame contains item parameters from two PISA studies. Because the Rasch model is used, only item difficulties are considered.

Usage

data(data.pisaPars)

Format

A data frame with 25 observations on the following 4 variables.

item

Item names

testlet

Items are arranged in corresponding testlets. These names are located in this column.

study1

Item difficulties of study 1

study2

Item difficulties of study 2


Dataset PISA Reading

Description

This is an example PISA dataset of reading items from the PISA 2009 study of students from Austria. The dataset contains 623 students who worked on the 12 reading items from item cluster R7.

Usage

data(data.pisaRead)

Format

The dataset is a list. The list element data contains the dataset with the demographical variables student ID (idstud), school ID (idschool), a dummy variable for female students (female), socioeconomic status (hisei) and migration background (migra). The remaining variables (starting with R in the name) are the reading items.
The item metadata are included in the list element item which contains item name (item), testlet label (testlet), item format (ItemFormat), text type (TextType) and text aspect (Aspect).

The format is:

List of 2
$ data:'data.frame':
..$ idstud : num [1:623] 9e+10 9e+10 9e+10 9e+10 9e+10 ...
..$ idschool: int [1:623] 900003 900003 900003 900003 ...
..$ female : int [1:623] 1 0 1 0 0 0 1 0 1 0 ...
..$ hisei : num [1:623] -1.16 -0.671 1.286 0.185 1.225 ...
..$ migra : int [1:623] 0 0 0 0 0 0 0 0 0 0 ...
..$ R432Q01 : int [1:623] 1 1 1 1 1 1 1 1 1 1 ...
..$ R432Q05 : int [1:623] 1 1 1 1 1 0 1 1 1 0 ...
..$ R432Q06 : int [1:623] 0 0 0 0 0 0 0 0 0 0 ...
..$ R456Q01 : int [1:623] 1 1 1 1 1 1 1 1 1 1 ...
..$ R456Q02 : int [1:623] 1 1 1 1 1 1 1 1 1 1 ...
..$ R456Q06 : int [1:623] 1 1 1 1 1 1 0 0 1 1 ...
..$ R460Q01 : int [1:623] 1 1 0 0 0 0 0 1 1 1 ...
..$ R460Q05 : int [1:623] 1 1 1 1 1 1 1 1 1 1 ...
..$ R460Q06 : int [1:623] 0 1 1 1 1 1 0 0 1 1 ...
..$ R466Q02 : int [1:623] 0 1 0 1 1 0 1 0 0 1 ...
..$ R466Q03 : int [1:623] 0 0 0 1 0 0 0 1 0 1 ...
..$ R466Q06 : int [1:623] 0 1 1 1 1 1 0 1 1 1 ...
$ item:'data.frame':
..$ item : Factor w/ 12 levels "R432Q01","R432Q05",..: 1 2 3 4 ...
..$ testlet : Factor w/ 4 levels "R432","R456",..: 1 1 1 2 ...
..$ ItemFormat: Factor w/ 2 levels "CR","MC": 1 1 2 2 1 1 1 2 2 1 ...
..$ TextType : Factor w/ 3 levels "Argumentation",..: 1 1 1 3 ...
..$ Aspect : Factor w/ 3 levels "Access_and_retrieve",..: 2 3 2 1 ...


Datasets for Pairwise Comparisons

Description

Some datasets for pairwise comparisons.

Usage

data(data.pw01)

Format

The dataset data.pw01 contains results of a German football league from the season 2000/01.


Rating Datasets

Description

Some rating datasets.

Usage

data(data.ratings1)
data(data.ratings2)
data(data.ratings3)

Format


Dataset with Raw Item Responses

Description

Dataset with raw item responses

Usage

data(data.raw1)

Format

A data frame with raw item responses of 1200 persons on the following 77 items:

'data.frame': 1200 obs. of 77 variables:
$ I101: num 0 0 0 2 0 0 0 0 0 0 ...
$ I102: int NA NA 2 1 2 1 3 2 NA NA ...
$ I103: int 1 1 NA NA NA NA NA NA 1 1 ...
...
$ I179: chr "E" "C" "D" "E" ...


Dataset Reading

Description

This dataset contains N=328N=328 students and I=12I=12 items measuring reading competence. All 12 items are arranged into 3 testlets (items with common text stimulus) labeled as A, B and C. The allocation of items to testlets is indicated by their variable names.

Usage

data(data.read)

Format

A data frame with 328 persons on the following 12 variables. Rows correspond to persons and columns to items. The following items are included in data.read:

Testlet A: A1, A2, A3, A4

Testlet B: B1, B2, B3, B4

Testlet C: C1, C2, C3, C4

Examples

## Not run: 
data(data.read)
dat <- data.read
I <- ncol(dat)

# list of needed packages for the following examples
packages <- scan(what="character")
     eRm  ltm  TAM mRm  CDM  mirt psychotools  IsingFit  igraph  qgraph  pcalg
     poLCA  randomLCA psychomix MplusAutomation lavaan

# load packages. make an installation if necessary
miceadds::library_install(packages)

#*****************************************************
# Model 1: Rasch model
#*****************************************************

#----  M1a: rasch.mml2 (in sirt)
mod1a <- sirt::rasch.mml2(dat)
summary(mod1a)

#----  M1b: smirt (in sirt)
Qmatrix <- matrix(1,nrow=I, ncol=1)
mod1b <- sirt::smirt(dat,Qmatrix=Qmatrix)
summary(mod1b)

#----  M1c: gdm (in CDM)
theta.k <- seq(-6,6,len=21)
mod1c <- CDM::gdm(dat,theta.k=theta.k,irtmodel="1PL", skillspace="normal")
summary(mod1c)

#----  M1d: tam.mml (in TAM)
mod1d <- TAM::tam.mml( resp=dat )
summary(mod1d)

#----  M1e: RM (in eRm)
mod1e <- eRm::RM( dat )
  # eRm uses Conditional Maximum Likelihood (CML) as the estimation method.
summary(mod1e)
eRm::plotPImap(mod1e)

#----  M1f: mrm (in mRm)
mod1f <- mRm::mrm( dat, cl=1)   # CML estimation
mod1f$beta  # item parameters

#----  M1g: mirt (in mirt)
mod1g <- mirt::mirt( dat, model=1, itemtype="Rasch", verbose=TRUE )
print(mod1g)
summary(mod1g)
coef(mod1g)
    # arrange coefficients in nicer layout
sirt::mirt.wrapper.coef(mod1g)$coef

#----  M1h: rasch (in ltm)
mod1h <- ltm::rasch( dat, control=list(verbose=TRUE ) )
summary(mod1h)
coef(mod1h)

#----  M1i: RaschModel.fit (in psychotools)
mod1i <- psychotools::RaschModel.fit(dat)  # CML estimation
summary(mod1i)
plot(mod1i)

#----  M1j: noharm.sirt (in sirt)
Fpatt <- matrix( 0, I, 1 )
Fval <- 1 + 0*Fpatt
Ppatt <- Pval <- matrix(1,1,1)
mod1j <- sirt::noharm.sirt( dat=dat, Ppatt=Ppatt, Fpatt=Fpatt, Fval=Fval, Pval=Pval)
summary(mod1j)
  #   Normal-ogive model, multiply item discriminations with constant D=1.7.
  #   The same holds for other examples with noharm.sirt and R2noharm.
plot(mod1j)

#----  M1k: rasch.pml3 (in sirt)
mod1k <- sirt::rasch.pml3( dat=dat)
  #         pairwise marginal maximum likelihood estimation
summary(mod1k)

#----  M1l: running Mplus (using MplusAutomation package)
mplus_path <- "c:/Mplus7/Mplus.exe"    # locate Mplus executable
#****************
  # specify Mplus object
mplusmod <- MplusAutomation::mplusObject(
    TITLE="1PL in Mplus ;",
    VARIABLE=paste0( "CATEGORICAL ARE ", paste0(colnames(dat),collapse=" ") ),
    MODEL="
       ! fix all item loadings to 1
       F1 BY A1@1 A2@1 A3@1 A4@1 ;
       F1 BY B1@1 B2@1 B3@1 B4@1 ;
       F1 BY C1@1 C2@1 C3@1 C4@1 ;
       ! estimate variance
       F1 ;
            ",
    ANALYSIS="ESTIMATOR=MLR;",
    OUTPUT="stand;",
    usevariables=colnames(dat),  rdata=dat )
#****************

  # write Mplus syntax
filename <- "mod1u"   # specify file name
  # create Mplus syntaxes
res2 <- MplusAutomation::mplusModeler(object=mplusmod, dataout=paste0(filename,".dat"),
               modelout=paste0(filename,".inp"), run=0 )
  # run Mplus model
MplusAutomation::runModels( filefilter=paste0(filename,".inp"), Mplus_command=mplus_path)
  # alternatively, the system() command can also be used
  # get results
mod1l <- MplusAutomation::readModels(target=getwd(), filefilter=filename )
mod1l$summaries    # summaries
mod1l$parameters$unstandardized   # parameter estimates

#*****************************************************
# Model 2: 2PL model
#*****************************************************

#----  M2a: rasch.mml2 (in sirt)
mod2a <- sirt::rasch.mml2(dat, est.a=1:I)
summary(mod2a)

#----  M2b: smirt (in sirt)
mod2b <- sirt::smirt(dat,Qmatrix=Qmatrix,est.a="2PL")
summary(mod2b)

#----  M2c: gdm (in CDM)
mod2c <- CDM::gdm(dat,theta.k=theta.k,irtmodel="2PL", skillspace="normal")
summary(mod2c)

#----  M2d: tam.mml (in TAM)
mod2d <- TAM::tam.mml.2pl( resp=dat )
summary(mod2d)

#----  M2e: mirt (in mirt)
mod2e <- mirt::mirt( dat, model=1, itemtype="2PL" )
print(mod2e)
summary(mod2e)
sirt::mirt.wrapper.coef(mod1g)$coef

#----  M2f: ltm (in ltm)
mod2f <- ltm::ltm( dat ~ z1, control=list(verbose=TRUE ) )
summary(mod2f)
coef(mod2f)
plot(mod2f)

#----  M2g: R2noharm (in NOHARM, running from within R using sirt package)
  # define noharm.path where 'NoharmCL.exe' is located
noharm.path <- "c:/NOHARM"
  # covariance matrix
P.pattern <- matrix( 1, ncol=1, nrow=1 )
P.init <- P.pattern
P.init[1,1] <- 1
  # loading matrix
F.pattern <- matrix(1,I,1)
F.init <- F.pattern
  # estimate model
mod2g <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
             F.init=F.init, P.pattern=P.pattern, P.init=P.init,
             writename="ex2g", noharm.path=noharm.path, dec="," )
summary(mod2g)

#----  M2h: noharm.sirt (in sirt)
mod2h <- sirt::noharm.sirt( dat=dat, Ppatt=P.pattern,Fpatt=F.pattern,
              Fval=F.init, Pval=P.init )
summary(mod2h)
plot(mod2h)

#----  M2i: rasch.pml2 (in sirt)
mod2i <- sirt::rasch.pml2(dat, est.a=1:I)
summary(mod2i)

#----  M2j: WLSMV estimation with cfa (in lavaan)
lavmodel <- "F=~ A1+A2+A3+A4+B1+B2+B3+B4+
                        C1+C2+C3+C4"
mod2j <- lavaan::cfa( data=dat, model=lavmodel, std.lv=TRUE, ordered=colnames(dat))
summary(mod2j, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)

#*****************************************************
# Model 3: 3PL model (note that results can be quite unstable!)
#*****************************************************

#----  M3a: rasch.mml2 (in sirt)
mod3a <- sirt::rasch.mml2(dat, est.a=1:I, est.c=1:I)
summary(mod3a)

#----  M3b: smirt (in sirt)
mod3b <- sirt::smirt(dat,Qmatrix=Qmatrix,est.a="2PL", est.c=1:I)
summary(mod3b)

#----  M3c: mirt (in mirt)
mod3c <- mirt::mirt( dat, model=1, itemtype="3PL", verbose=TRUE)
summary(mod3c)
coef(mod3c)
  # stabilize parameter estimating using informative priors for guessing parameters
mirtmodel <- mirt::mirt.model("
            F=1-12
            PRIOR=(1-12, g, norm, -1.38, 0.25)
            ")
  # a prior N(-1.38,.25) is specified for transformed guessing parameters: qlogis(g)
  # simulate values from this prior for illustration
N <- 100000
logit.g <- stats::rnorm(N, mean=-1.38, sd=sqrt(.5) )
graphics::plot( stats::density(logit.g) )  # transformed qlogis(g)
graphics::plot( stats::density( stats::plogis(logit.g)) )  # g parameters
  # estimate 3PL with priors
mod3c1 <- mirt::mirt(dat, mirtmodel, itemtype="3PL",verbose=TRUE)
coef(mod3c1)
  # In addition, set upper bounds for g parameters of .35
mirt.pars <- mirt::mirt( dat, mirtmodel, itemtype="3PL",  pars="values")
ind <- which( mirt.pars$name=="g" )
mirt.pars[ ind, "value" ] <- stats::plogis(-1.38)
mirt.pars[ ind, "ubound" ] <- .35
  # prior distribution for slopes
ind <- which( mirt.pars$name=="a1" )
mirt.pars[ ind, "prior_1" ] <- 1.3
mirt.pars[ ind, "prior_2" ] <- 2
mod3c2 <- mirt::mirt(dat, mirtmodel, itemtype="3PL",
                pars=mirt.pars,verbose=TRUE, technical=list(NCYCLES=100) )
coef(mod3c2)
sirt::mirt.wrapper.coef(mod3c2)

#----  M3d: ltm (in ltm)
mod3d <- ltm::tpm( dat, control=list(verbose=TRUE), max.guessing=.3)
summary(mod3d)
coef(mod3d) #=> numerical instabilities

#*****************************************************
# Model 4: 3-dimensional Rasch model
#*****************************************************

# define Q-matrix
Q <- matrix( 0, nrow=12, ncol=3 )
Q[ cbind(1:12, rep(1:3,each=4) ) ] <- 1
rownames(Q) <- colnames(dat)
colnames(Q) <- c("A","B","C")

# define nodes
theta.k <- seq(-6,6,len=13)

#----  M4a: smirt (in sirt)
mod4a <- sirt::smirt(dat,Qmatrix=Q,irtmodel="comp", theta.k=theta.k, maxiter=30)
summary(mod4a)

#----  M4b: rasch.mml2 (in sirt)
mod4b <- sirt::rasch.mml2(dat,Q=Q,theta.k=theta.k, mmliter=30)
summary(mod4b)

#----  M4c: gdm (in CDM)
mod4c <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal",
            Qmatrix=Q, maxiter=30, centered.latent=TRUE )
summary(mod4c)

#----  M4d: tam.mml (in TAM)
mod4d <- TAM::tam.mml( resp=dat, Q=Q, control=list(nodes=theta.k, maxiter=30) )
summary(mod4d)

#----  M4e: R2noharm (in NOHARM, running from within R using sirt package)
noharm.path <- "c:/NOHARM"
  # covariance matrix
P.pattern <- matrix( 1, ncol=3, nrow=3 )
P.init <- 0.8+0*P.pattern
diag(P.init) <- 1
  # loading matrix
F.pattern <- 0*Q
F.init <- Q
  # estimate model
mod4e <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
    F.init=F.init, P.pattern=P.pattern, P.init=P.init,
    writename="ex4e", noharm.path=noharm.path, dec="," )
summary(mod4e)

#----  M4f: mirt (in mirt)
cmodel <- mirt::mirt.model("
     F1=1-4
     F2=5-8
     F3=9-12
     # equal item slopes correspond to the Rasch model
     CONSTRAIN=(1-4, a1), (5-8, a2), (9-12,a3)
     COV=F1*F2, F1*F3, F2*F3
     " )
mod4f <- mirt::mirt(dat, cmodel, verbose=TRUE)
summary(mod4f)

#*****************************************************
# Model 5: 3-dimensional 2PL model
#*****************************************************

#----  M5a: smirt (in sirt)
mod5a <- sirt::smirt(dat,Qmatrix=Q,irtmodel="comp", est.a="2PL", theta.k=theta.k,
                 maxiter=30)
summary(mod5a)

#----  M5b: rasch.mml2 (in sirt)
mod5b <- sirt::rasch.mml2(dat,Q=Q,theta.k=theta.k,est.a=1:12, mmliter=30)
summary(mod5b)

#----  M5c: gdm (in CDM)
mod5c <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, skillspace="loglinear",
            Qmatrix=Q, maxiter=30, centered.latent=TRUE,
            standardized.latent=TRUE)
summary(mod5c)

#----  M5d: tam.mml (in TAM)
mod5d <- TAM::tam.mml.2pl( resp=dat, Q=Q, control=list(nodes=theta.k, maxiter=30) )
summary(mod5d)

#----  M5e: R2noharm (in NOHARM, running from within R using sirt package)
noharm.path <- "c:/NOHARM"
  # covariance matrix
P.pattern <- matrix( 1, ncol=3, nrow=3 )
diag(P.pattern) <- 0
P.init <- 0.8+0*P.pattern
diag(P.init) <- 1
  # loading matrix
F.pattern <- Q
F.init <- Q
  # estimate model
mod5e <- sirt::R2noharm( dat=dat, model.type="CFA", F.pattern=F.pattern,
    F.init=F.init, P.pattern=P.pattern, P.init=P.init,
    writename="ex5e", noharm.path=noharm.path, dec="," )
summary(mod5e)

#----  M5f: mirt (in mirt)
cmodel <- mirt::mirt.model("
   F1=1-4
   F2=5-8
   F3=9-12
   COV=F1*F2, F1*F3, F2*F3
   "  )
mod5f <- mirt::mirt(dat, cmodel, verbose=TRUE)
summary(mod5f)

#*****************************************************
# Model 6: Network models (Graphical models)
#*****************************************************

#----  M6a: Ising model using the IsingFit package (undirected graph)
#        - fit Ising model using the "OR rule" (AND=FALSE)
mod6a <- IsingFit::IsingFit(x=dat, family="binomial", AND=FALSE)
summary(mod6a)
##           Network Density:                 0.29
##    Gamma:                  0.25
##    Rule used:              Or-rule
# plot results
qgraph::qgraph(mod6a$weiadj,fade=FALSE)

#**-- graph estimation using pcalg package

# some packages from Bioconductor must be downloaded at first (if not yet done)
if (FALSE){  # set 'if (TRUE)' if packages should be downloaded
     source("http://bioconductor.org/biocLite.R")
     biocLite("RBGL")
     biocLite("Rgraphviz")
}

#----  M6b: graph estimation based on Pearson correlations
V <- colnames(dat)
n <- nrow(dat)
mod6b <- pcalg::pc(suffStat=list(C=stats::cor(dat), n=n ),
             indepTest=gaussCItest, ## indep.test: partial correlations
             alpha=0.05, labels=V, verbose=TRUE)
plot(mod6b)
# plot in qgraph package
qgraph::qgraph(mod6b, label.color=rep( c( "red", "blue","darkgreen" ), each=4 ),
         edge.color="black")
summary(mod6b)

#----  M6c: graph estimation based on tetrachoric correlations
mod6c <- pcalg::pc(suffStat=list(C=sirt::tetrachoric2(dat)$rho, n=n ),
             indepTest=gaussCItest, alpha=0.05, labels=V, verbose=TRUE)
plot(mod6c)
summary(mod6c)

#----  M6d: Statistical implicative analysis (in sirt)
mod6d <- sirt::sia.sirt(dat, significance=.85 )
  # plot results with igraph and qgraph package
plot( mod6d$igraph.obj, vertex.shape="rectangle", vertex.size=30 )
qgraph::qgraph( mod6d$adj.matrix )

#*****************************************************
# Model 7: Latent class analysis with 3 classes
#*****************************************************

#----  M7a: randomLCA (in randomLCA)
  #        - use two trials of starting values
mod7a <- randomLCA::randomLCA(dat,nclass=3, notrials=2, verbose=TRUE)
summary(mod7a)
plot(mod7a,type="l", xlab="Item")

#----  M7b: rasch.mirtlc (in sirt)
mod7b <- sirt::rasch.mirtlc( dat, Nclasses=3,seed=-30,  nstarts=2 )
summary(mod7b)
matplot( t(mod7b$pjk), type="l", xlab="Item" )

#----  M7c: poLCA (in poLCA)
  #   define formula for outcomes
f7c <- paste0( "cbind(", paste0(colnames(dat),collapse=","), ") ~ 1 " )
dat1 <- as.data.frame( dat + 1 ) # poLCA needs integer values from 1,2,..
mod7c <- poLCA::poLCA( stats::as.formula(f7c),dat1,nclass=3, verbose=TRUE)
plot(mod7c)

#----  M7d: gom.em (in sirt)
  #    - the latent class model is a special grade of membership model
mod7d <- sirt::gom.em( dat, K=3, problevels=c(0,1),  model="GOM"  )
summary(mod7d)

#---- - M7e: mirt (in mirt)
  # define three latent classes
Theta <- diag(3)
  # define mirt model
I <- ncol(dat)  # I=12
mirtmodel <- mirt::mirt.model("
        C1=1-12
        C2=1-12
        C3=1-12
        ")
  # get initial parameter values
mod.pars <- mirt::mirt(dat, model=mirtmodel,  pars="values")
  # modify parameters: only slopes refer to item-class probabilities
set.seed(9976)
  # set starting values for class specific item probabilities
mod.pars[ mod.pars$name=="d","value" ]  <- 0
mod.pars[ mod.pars$name=="d","est" ]  <- FALSE
b1 <- stats::qnorm( colMeans( dat ) )
mod.pars[ mod.pars$name=="a1","value" ]  <- b1
  # random starting values for other classes
mod.pars[ mod.pars$name %in% c("a2","a3"),"value" ]  <- b1 + stats::runif(12*2,-1,1)
mod.pars
  #** define prior for latent class analysis
lca_prior <- function(Theta,Etable){
  # number of latent Theta classes
  TP <- nrow(Theta)
  # prior in initial iteration
  if ( is.null(Etable) ){
    prior <- rep( 1/TP, TP )
  }
  # process Etable (this is correct for datasets without missing data)
  if ( ! is.null(Etable) ){
    # sum over correct and incorrect expected responses
    prior <- ( rowSums(Etable[, seq(1,2*I,2)]) + rowSums(Etable[,seq(2,2*I,2)]) )/I
  }
  prior <- prior / sum(prior)
  return(prior)
}
  #** estimate model
mod7e <- mirt::mirt(dat, mirtmodel, pars=mod.pars, verbose=TRUE,
            technical=list( customTheta=Theta, customPriorFun=lca_prior) )
  # compare estimated results
print(mod7e)
summary(mod7b)
  # The number of estimated parameters is incorrect because mirt does not correctly count
  # estimated parameters from the user customized  prior distribution.
mod7e@nest <- as.integer(sum(mod.pars$est) + 2)  # two additional class probabilities
  # extract log-likelihood
mod7e@logLik
  # compute AIC and BIC
( AIC <- -2*mod7e@logLik+2*mod7e@nest )
( BIC <- -2*mod7e@logLik+log(mod7e@Data$N)*mod7e@nest )
  # RMSEA and SRMSR fit statistic
mirt::M2(mod7e)     # TLI and CFI does not make sense in this example
  #** extract item parameters
sirt::mirt.wrapper.coef(mod7e)
  #** extract class-specific item-probabilities
probs <- apply( coef1[, c("a1","a2","a3") ], 2, stats::plogis )
matplot( probs, type="l", xlab="Item", main="mirt::mirt")
  #** inspect estimated distribution
mod7e@Theta
mod7e@Prior[[1]]

#*****************************************************
# Model 8: Mixed Rasch model with two classes
#*****************************************************

#----  M8a: raschmix (in psychomix)
mod8a <- psychomix::raschmix(data=as.matrix(dat), k=2, scores="saturated")
summary(mod8a)

#----  M8b: mrm (in mRm)
mod8b <- mRm::mrm(data.matrix=dat, cl=2)
mod8b$conv.to.bound
plot(mod8b)
print(mod8b)

#----  M8c: mirt (in mirt)
  #* define theta grid
theta.k <- seq( -5, 5, len=9 )
TP <- length(theta.k)
Theta <- matrix( 0, nrow=2*TP, ncol=4)
Theta[1:TP,1:2] <- cbind(theta.k, 1 )
Theta[1:TP + TP,3:4] <- cbind(theta.k, 1 )
Theta
  # define model
I <- ncol(dat)  # I=12
mirtmodel <- mirt::mirt.model("
        F1a=1-12  # slope Class 1
        F1b=1-12  # difficulty Class 1
        F2a=1-12  # slope Class 2
        F2b=1-12  # difficulty Class 2
        CONSTRAIN=(1-12,a1),(1-12,a3)
        ")
  # get initial parameter values
mod.pars <- mirt::mirt(dat, model=mirtmodel,  pars="values")
  # set starting values for class specific item probabilities
mod.pars[ mod.pars$name=="d","value" ]  <- 0
mod.pars[ mod.pars$name=="d","est" ]  <- FALSE
mod.pars[ mod.pars$name=="a1","value" ]  <- 1
mod.pars[ mod.pars$name=="a3","value" ]  <- 1
  # initial values difficulties
b1 <-  stats::qlogis( colMeans(dat) )
mod.pars[ mod.pars$name=="a2","value" ]  <- b1
mod.pars[ mod.pars$name=="a4","value" ]  <- b1 + stats::runif(I, -1, 1)
  #* define prior for mixed Rasch analysis
mixed_prior <- function(Theta,Etable){
  NC <- 2   # number of theta classes
  TP <- nrow(Theta) / NC
  prior1 <- stats::dnorm( Theta[1:TP,1] )
  prior1 <- prior1 / sum(prior1)
  if ( is.null(Etable) ){   prior <- c( prior1, prior1 ) }
  if ( ! is.null(Etable) ){
    prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) +
                   rowSums( Etable[,seq(2,2*I,2)]) )/I
    a1 <- stats::aggregate( prior, list( rep(1:NC, each=TP) ), sum )
    a1[,2] <- a1[,2] / sum( a1[,2])
    # print some information during estimation
    cat( paste0( " Class proportions: ",
              paste0( round(a1[,2], 3 ), collapse=" " ) ), "\n")
    a1 <- rep( a1[,2], each=TP )
    # specify mixture of two normal distributions
    prior <- a1*c(prior1,prior1)
  }
  prior <- prior / sum(prior)
  return(prior)
}
  #* estimate model
mod8c <- mirt::mirt(dat, mirtmodel, pars=mod.pars, verbose=TRUE,
        technical=list(  customTheta=Theta, customPriorFun=mixed_prior ) )
  # Like in Model 7e, the number of estimated parameters must be included.
mod8c@nest <- as.integer(sum(mod.pars$est) + 1)
      # two class proportions and therefore one probability is freely estimated.
  #* extract item parameters
sirt::mirt.wrapper.coef(mod8c)
  #* estimated distribution
mod8c@Theta
mod8c@Prior

#----  M8d: tamaan (in TAM)

tammodel <- "
ANALYSIS:
  TYPE=MIXTURE ;
  NCLASSES(2);
  NSTARTS(7,20);
LAVAAN MODEL:
  F=~ A1__C4
  F ~~ F
ITEM TYPE:
  ALL(Rasch);
    "
mod8d <- TAM::tamaan( tammodel, resp=dat )
summary(mod8d)
# plot item parameters
I <- 12
ipars <- mod8d$itempartable_MIXTURE[ 1:I, ]
plot( 1:I, ipars[,3], type="o", ylim=range( ipars[,3:4] ), pch=16,
        xlab="Item", ylab="Item difficulty")
lines( 1:I, ipars[,4], type="l", col=2, lty=2)
points( 1:I, ipars[,4],  col=2, pch=2)

#*****************************************************
# Model 9: Mixed 2PL model with two classes
#*****************************************************

#----  M9a: tamaan (in TAM)

tammodel <- "
ANALYSIS:
  TYPE=MIXTURE ;
  NCLASSES(2);
  NSTARTS(10,30);
LAVAAN MODEL:
  F=~ A1__C4
  F ~~ F
ITEM TYPE:
  ALL(2PL);
    "
mod9a <- TAM::tamaan( tammodel, resp=dat )
summary(mod9a)

#*****************************************************
# Model 10: Rasch testlet model
#*****************************************************

#----  M10a: tam.fa (in TAM)
dims <- substring( colnames(dat),1,1 )  # define dimensions
mod10a <- TAM::tam.fa( resp=dat, irtmodel="bifactor1", dims=dims,
                control=list(maxiter=60) )
summary(mod10a)

#----  M10b: mirt (in mirt)
cmodel <- mirt::mirt.model("
        G=1-12
        A=1-4
        B=5-8
        C=9-12
        CONSTRAIN=(1-12,a1), (1-4, a2), (5-8, a3), (9-12,a4)
      ")
mod10b <- mirt::mirt(dat, model=cmodel, verbose=TRUE)
summary(mod10b)
coef(mod10b)
mod10b@logLik   # equivalent is slot( mod10b, "logLik")

#alternatively, using a dimensional reduction approach (faster and better accuracy)
cmodel <- mirt::mirt.model("
      G=1-12
      CONSTRAIN=(1-12,a1), (1-4, a2), (5-8, a3), (9-12,a4)
     ")
item_bundles <- rep(c(1,2,3), each=4)
mod10b1 <- mirt::bfactor(dat, model=item_bundles, model2=cmodel, verbose=TRUE)
coef(mod10b1)

#----  M10c: smirt (in sirt)
  # define Q-matrix
Qmatrix <- matrix(0,12,4)
Qmatrix[,1] <- 1
Qmatrix[ cbind( 1:12, match( dims, unique(dims)) +1 ) ]  <- 1
  # uncorrelated factors
variance.fixed <- cbind( c(1,1,1,2,2,3), c(2,3,4,3,4,4), 0 )
  # estimate model
mod10c <- sirt::smirt( dat, Qmatrix=Qmatrix, irtmodel="comp",
              variance.fixed=variance.fixed, qmcnodes=1000, maxiter=60)
summary(mod10c)

#*****************************************************
# Model 11: Bifactor model
#*****************************************************

#----  M11a: tam.fa (in TAM)
dims <- substring( colnames(dat),1,1 )  # define dimensions
mod11a <- TAM::tam.fa( resp=dat, irtmodel="bifactor2", dims=dims,
                 control=list(maxiter=60) )
summary(mod11a)

#----  M11b: bfactor (in mirt)
dims1 <- match( dims, unique(dims) )
mod11b <- mirt::bfactor(dat, model=dims1, verbose=TRUE)
summary(mod11b)
coef(mod11b)
mod11b@logLik

#----  M11c: smirt (in sirt)
  # define Q-matrix
Qmatrix <- matrix(0,12,4)
Qmatrix[,1] <- 1
Qmatrix[ cbind( 1:12, match( dims, unique(dims)) +1 ) ]  <- 1
  # uncorrelated factors
variance.fixed <- cbind( c(1,1,1,2,2,3), c(2,3,4,3,4,4), 0 )
  # estimate model
mod11c <- sirt::smirt( dat, Qmatrix=Qmatrix, irtmodel="comp", est.a="2PL",
                variance.fixed=variance.fixed, qmcnodes=1000, maxiter=60)
summary(mod11c)

#*****************************************************
# Model 12: Located latent class model: Rasch model with three theta classes
#*****************************************************

# use 10th item as the reference item
ref.item <- 10
# ability grid
theta.k <- seq(-4,4,len=9)

#----  M12a: rasch.mirtlc (in sirt)
mod12a <- sirt::rasch.mirtlc(dat, Nclasses=3, modeltype="MLC1", ref.item=ref.item)
summary(mod12a)

#----  M12b: gdm (in CDM)
theta.k <- seq(-1, 1, len=3)      # initial matrix
b.constraint <- matrix( c(10,1,0), nrow=1,ncol=3)
  # estimate model
mod12b <- CDM::gdm( dat, theta.k=theta.k, skillspace="est", irtmodel="1PL",
              b.constraint=b.constraint, maxiter=200)
summary(mod12b)

#----  M12c: mirt (in mirt)
items <- colnames(dat)
  # define three latent classes
Theta <- diag(3)
  # define mirt model
I <- ncol(dat)  # I=12
mirtmodel <- mirt::mirt.model("
        C1=1-12
        C2=1-12
        C3=1-12
        CONSTRAIN=(1-12,a1),(1-12,a2),(1-12,a3)
        ")
  # get parameters
mod.pars <- mirt(dat, model=mirtmodel,  pars="values")
 # set starting values for class specific item probabilities
mod.pars[ mod.pars$name=="d","value" ]  <- stats::qlogis( colMeans(dat,na.rm=TRUE) )
  # set item difficulty of reference item to zero
ind <- which( ( paste(mod.pars$item)==items[ref.item] ) &
               ( ( paste(mod.pars$name)=="d" ) ) )
mod.pars[ ind,"value" ]  <- 0
mod.pars[ ind,"est" ]  <- FALSE
  # initial values for a1, a2 and a3
mod.pars[ mod.pars$name %in% c("a1","a2","a3"),"value" ]  <- c(-1,0,1)
mod.pars
  #* define prior for latent class analysis
lca_prior <- function(Theta,Etable){
  # number of latent Theta classes
  TP <- nrow(Theta)
  # prior in initial iteration
  if ( is.null(Etable) ){
    prior <- rep( 1/TP, TP )
              }
  # process Etable (this is correct for datasets without missing data)
  if ( ! is.null(Etable) ){
    # sum over correct and incorrect expected responses
    prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) + rowSums( Etable[, seq(2,2*I,2)] ) )/I
            }
  prior <- prior / sum(prior)
  return(prior)
   }
 #* estimate model
mod12c <- mirt(dat, mirtmodel, technical=list(
            customTheta=Theta, customPriorFun=lca_prior),
            pars=mod.pars, verbose=TRUE )
  # estimated parameters from the user customized  prior distribution.
mod12c@nest <- as.integer(sum(mod.pars$est) + 2)
  #* extract item parameters
coef1 <- sirt::mirt.wrapper.coef(mod12c)
  #* inspect estimated distribution
mod12c@Theta
coef1$coef[1,c("a1","a2","a3")]
mod12c@Prior[[1]]

#*****************************************************
# Model 13: Multidimensional model with discrete traits
#*****************************************************
# define Q-Matrix
Q <- matrix( 0, nrow=12,ncol=3)
Q[1:4,1] <- 1
Q[5:8,2] <- 1
Q[9:12,3] <- 1
# define discrete theta distribution with 3 dimensions
Theta <- scan(what="character",nlines=1)
  000 100 010 001 110 101 011 111
Theta <- as.numeric( unlist( lapply( Theta, strsplit, split="")   ) )
Theta <- matrix(Theta, 8, 3, byrow=TRUE )
Theta

#----  Model 13a: din (in CDM)
mod13a <- CDM::din( dat, q.matrix=Q, rule="DINA")
summary(mod13a)
# compare used Theta distributions
cbind( Theta, mod13a$attribute.patt.splitted)

#----  Model 13b: gdm (in CDM)
mod13b <- CDM::gdm( dat, Qmatrix=Q, theta.k=Theta, skillspace="full")
summary(mod13b)

#----  Model 13c: mirt (in mirt)
  # define mirt model
I <- ncol(dat)  # I=12
mirtmodel <- mirt::mirt.model("
        F1=1-4
        F2=5-8
        F3=9-12
        ")
  # get parameters
mod.pars <- mirt(dat, model=mirtmodel,  pars="values")
# starting values d parameters (transformed guessing parameters)
ind <- which(  mod.pars$name=="d"  )
mod.pars[ind,"value"] <- stats::qlogis(.2)
# starting values transformed slipping parameters
ind <- which( ( mod.pars$name %in% paste0("a",1:3)  ) &  ( mod.pars$est ) )
mod.pars[ind,"value"] <- stats::qlogis(.8) - stats::qlogis(.2)
mod.pars

  #* define prior for latent class analysis
lca_prior <- function(Theta,Etable){
  TP <- nrow(Theta)
  if ( is.null(Etable) ){
    prior <- rep( 1/TP, TP )
              }
  if ( ! is.null(Etable) ){
    prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) + rowSums( Etable[, seq(2,2*I,2)] ) )/I
            }
  prior <- prior / sum(prior)
  return(prior)
}
 #* estimate model
mod13c <- mirt(dat, mirtmodel, technical=list(
            customTheta=Theta, customPriorFun=lca_prior),
            pars=mod.pars, verbose=TRUE )
  # estimated parameters from the user customized  prior distribution.
mod13c@nest <- as.integer(sum(mod.pars$est) + 2)
  #* extract item parameters
coef13c <- sirt::mirt.wrapper.coef(mod13c)$coef
  #* inspect estimated distribution
mod13c@Theta
mod13c@Prior[[1]]

 #-* comparisons of estimated  parameters
# extract guessing and slipping parameters from din
dfr <- coef(mod13a)[, c("guess","slip") ]
colnames(dfr) <- paste0("din.",c("guess","slip") )
# estimated parameters from gdm
dfr$gdm.guess <- stats::plogis(mod13b$item$b)
dfr$gdm.slip <- 1 - stats::plogis( rowSums(mod13b$item[,c("b.Cat1","a.F1","a.F2","a.F3")] ) )
# estimated parameters from mirt
dfr$mirt.guess <- stats::plogis( coef13c$d )
dfr$mirt.slip <- 1 - stats::plogis( rowSums(coef13c[,c("d","a1","a2","a3")]) )
# comparison
round(dfr[, c(1,3,5,2,4,6)],3)
  ##      din.guess gdm.guess mirt.guess din.slip gdm.slip mirt.slip
  ##   A1     0.691     0.684      0.686    0.000    0.000     0.000
  ##   A2     0.491     0.489      0.489    0.031    0.038     0.036
  ##   A3     0.302     0.300      0.300    0.184    0.193     0.190
  ##   A4     0.244     0.239      0.240    0.337    0.340     0.339
  ##   B1     0.568     0.579      0.577    0.163    0.148     0.151
  ##   B2     0.329     0.344      0.340    0.344    0.326     0.329
  ##   B3     0.817     0.827      0.825    0.014    0.007     0.009
  ##   B4     0.431     0.463      0.456    0.104    0.089     0.092
  ##   C1     0.188     0.191      0.189    0.013    0.013     0.013
  ##   C2     0.050     0.050      0.050    0.239    0.238     0.239
  ##   C3     0.000     0.002      0.001    0.065    0.065     0.065
  ##   C4     0.000     0.004      0.000    0.212    0.212     0.212

# estimated class sizes
dfr <- data.frame( "Theta"=Theta, "din"=mod13a$attribute.patt$class.prob,
                   "gdm"=mod13b$pi.k, "mirt"=mod13c@Prior[[1]])
# comparison
round(dfr,3)
  ##     Theta.1 Theta.2 Theta.3   din   gdm  mirt
  ##   1       0       0       0 0.039 0.041 0.040
  ##   2       1       0       0 0.008 0.009 0.009
  ##   3       0       1       0 0.009 0.007 0.008
  ##   4       0       0       1 0.394 0.417 0.412
  ##   5       1       1       0 0.011 0.011 0.011
  ##   6       1       0       1 0.017 0.042 0.037
  ##   7       0       1       1 0.042 0.008 0.016
  ##   8       1       1       1 0.480 0.465 0.467

#*****************************************************
# Model 14: DINA model with two skills
#*****************************************************

# define some simple Q-Matrix (does not really make in this application)
Q <- matrix( 0, nrow=12,ncol=2)
Q[1:4,1] <- 1
Q[5:8,2] <- 1
Q[9:12,1:2] <- 1
# define discrete theta distribution with 3 dimensions
Theta <- scan(what="character",nlines=1)
  00 10 01 11
Theta <- as.numeric( unlist( lapply( Theta, strsplit, split="")   ) )
Theta <- matrix(Theta, 4, 2, byrow=TRUE )
Theta

#----  Model 14a: din (in CDM)
mod14a <- CDM::din( dat, q.matrix=Q, rule="DINA")
summary(mod14a)
# compare used Theta distributions
cbind( Theta, mod14a$attribute.patt.splitted)

#----  Model 14b: mirt (in mirt)
  # define mirt model
I <- ncol(dat)  # I=12
mirtmodel <- mirt::mirt.model("
        F1=1-4
        F2=5-8
        (F1*F2)=9-12
        ")
#-> constructions like (F1*F2*F3) are also allowed in mirt.model
  # get parameters
mod.pars <- mirt(dat, model=mirtmodel,  pars="values")
# starting values d parameters (transformed guessing parameters)
ind <- which(  mod.pars$name=="d"  )
mod.pars[ind,"value"] <- stats::qlogis(.2)
# starting values transformed slipping parameters
ind <- which( ( mod.pars$name %in% paste0("a",1:3)  ) &  ( mod.pars$est ) )
mod.pars[ind,"value"] <- stats::qlogis(.8) - stats::qlogis(.2)
mod.pars
 #* use above defined prior lca_prior
 # lca_prior <- function(prior,Etable) ...
 #* estimate model
mod14b <- mirt(dat, mirtmodel, technical=list(
            customTheta=Theta, customPriorFun=lca_prior),
            pars=mod.pars, verbose=TRUE )
  # estimated parameters from the user customized  prior distribution.
mod14b@nest <- as.integer(sum(mod.pars$est) + 2)
  #* extract item parameters
coef14b <- sirt::mirt.wrapper.coef(mod14b)$coef

 #-* comparisons of estimated  parameters
# extract guessing and slipping parameters from din
dfr <- coef(mod14a)[, c("guess","slip") ]
colnames(dfr) <- paste0("din.",c("guess","slip") )
# estimated parameters from mirt
dfr$mirt.guess <- stats::plogis( coef14b$d )
dfr$mirt.slip <- 1 - stats::plogis( rowSums(coef14b[,c("d","a1","a2","a3")]) )
# comparison
round(dfr[, c(1,3,2,4)],3)
  ##      din.guess mirt.guess din.slip mirt.slip
  ##   A1     0.674      0.671    0.030     0.030
  ##   A2     0.423      0.420    0.049     0.050
  ##   A3     0.258      0.255    0.224     0.225
  ##   A4     0.245      0.243    0.394     0.395
  ##   B1     0.534      0.543    0.166     0.164
  ##   B2     0.338      0.347    0.382     0.380
  ##   B3     0.796      0.802    0.016     0.015
  ##   B4     0.421      0.436    0.142     0.140
  ##   C1     0.850      0.851    0.000     0.000
  ##   C2     0.480      0.480    0.097     0.097
  ##   C3     0.746      0.746    0.026     0.026
  ##   C4     0.575      0.577    0.136     0.137

# estimated class sizes
dfr <- data.frame( "Theta"=Theta, "din"=mod13a$attribute.patt$class.prob,
                    "mirt"=mod14b@Prior[[1]])
# comparison
round(dfr,3)
  ##     Theta.1 Theta.2   din  mirt
  ##   1       0       0 0.357 0.369
  ##   2       1       0 0.044 0.049
  ##   3       0       1 0.047 0.031
  ##   4       1       1 0.553 0.551

#*****************************************************
# Model 15: Rasch model with non-normal distribution
#*****************************************************

# A non-normal theta distributed is specified by log-linear smoothing
# the distribution as described in
# Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model
# to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS.

# define theta grid
theta.k <- matrix( seq(-4,4,len=15), ncol=1 )
# define design matrix for smoothing (up to cubic moments)
delta.designmatrix <- cbind( 1, theta.k, theta.k^2, theta.k^3 )
# constrain item difficulty of fifth item (item B1) to zero
b.constraint <- matrix( c(5,1,0), ncol=3 )

#----  Model 15a: gdm (in CDM)
mod15a <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k,
               b.constraint=b.constraint  )
summary(mod15a)
 # plot estimated distribution
graphics::barplot( mod15a$pi.k[,1], space=0, names.arg=round(theta.k[,1],2),
           main="Estimated Skewed Distribution (gdm function)")

#----  Model 15b: mirt (in mirt)
 # define mirt model
mirtmodel <- mirt::mirt.model("
    F=1-12
    ")
 # get parameters
mod.pars <- mirt::mirt(dat, model=mirtmodel, pars="values", itemtype="Rasch")
  # fix variance (just for correct counting of parameters)
mod.pars[ mod.pars$name=="COV_11", "est"] <- FALSE
  # fix item difficulty
ind <- which( ( mod.pars$item=="B1" ) & ( mod.pars$name=="d" ) )
mod.pars[ ind, "value"] <- 0
mod.pars[ ind, "est"] <- FALSE

 # define prior
loglinear_prior <- function(Theta,Etable){
    TP <- nrow(Theta)
    if ( is.null(Etable) ){
    prior <- rep( 1/TP, TP )
           }
    # process Etable (this is correct for datasets without missing data)
    if ( ! is.null(Etable) ){
          # sum over correct and incorrect expected responses
       prior <- ( rowSums( Etable[, seq(1,2*I,2)] ) + rowSums( Etable[, seq(2,2*I,2)] ) )/I
       # smooth prior using the above design matrix and a log-linear model
       # see Xu & von Davier (2008).
       y <- log( prior + 1E-15 )
       lm1 <- lm( y ~ 0 + delta.designmatrix, weights=prior )
       prior <- exp(fitted(lm1))   # smoothed prior
           }
    prior <- prior / sum(prior)
    return(prior)
}

#* estimate model
mod15b <- mirt::mirt(dat, mirtmodel, technical=list(
                customTheta=theta.k, customPriorFun=loglinear_prior ),
                pars=mod.pars, verbose=TRUE )
# estimated parameters from the user customized prior distribution.
mod15b@nest <- as.integer(sum(mod.pars$est) + 3)
#* extract item parameters
coef1 <- sirt::mirt.wrapper.coef(mod15b)$coef

#** compare estimated item parameters
dfr <- data.frame( "gdm"=mod15a$item$b.Cat1, "mirt"=coef1$d )
rownames(dfr) <- colnames(dat)
round(t(dfr),4)
  ##            A1     A2      A3      A4 B1      B2     B3     B4     C1    C2     C3    C4
  ##   gdm  0.9818 0.1538 -0.7837 -1.3197  0 -1.0902 1.6088 -0.170 1.9778 0.006 1.1859 0.135
  ##   mirt 0.9829 0.1548 -0.7826 -1.3186  0 -1.0892 1.6099 -0.169 1.9790 0.007 1.1870 0.136
# compare estimated theta distribution
dfr <- data.frame( "gdm"=mod15a$pi.k, "mirt"=mod15b@Prior[[1]] )
round(t(dfr),4)
  ##        1 2     3     4      5      6      7      8      9     10     11     12     13
  ##   gdm  0 0 1e-04 9e-04 0.0056 0.0231 0.0652 0.1299 0.1881 0.2038 0.1702 0.1129 0.0612
  ##   mirt 0 0 1e-04 9e-04 0.0056 0.0232 0.0653 0.1300 0.1881 0.2038 0.1702 0.1128 0.0611
  ##            14    15
  ##   gdm  0.0279 0.011
  ##   mirt 0.0278 0.011

## End(Not run)

Datasets from Reckase' Book Multidimensional Item Response Theory

Description

Some simulated datasets from Reckase (2009).

Usage

data(data.reck21)
data(data.reck61DAT1)
data(data.reck61DAT2)
data(data.reck73C1a)
data(data.reck73C1b)
data(data.reck75C2)
data(data.reck78ExA)
data(data.reck79ExB)

Format

Source

Simulated datasets

References

Reckase, M. (2009). Multidimensional item response theory. New York: Springer. doi:10.1007/978-0-387-89976-3

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: data.reck21 dataset, Table 2.1, p. 45
#############################################################################
data(data.reck21)

dat <- data.reck21$dat      # extract dataset

# items with zero guessing parameters
guess0 <- c( 1, 2, 3, 9,11,27,30,35,45,49,50 )
I <- ncol(dat)

#***
# Model 1: 3PL estimation using rasch.mml2
est.c <- est.a <- 1:I
est.c[ guess0 ] <- 0
mod1 <- sirt::rasch.mml2( dat, est.a=est.a, est.c=est.c, mmliter=300 )
summary(mod1)

#***
# Model 2: 3PL estimation using smirt
Q <- matrix(1,I,1)
mod2 <- sirt::smirt( dat, Qmatrix=Q, est.a="2PL", est.c=est.c, increment.factor=1.01)
summary(mod2)

#***
# Model 3: estimation in mirt package
library(mirt)
itemtype <- rep("3PL", I )
itemtype[ guess0 ] <- "2PL"
mod3 <- mirt::mirt(dat, 1, itemtype=itemtype, verbose=TRUE)
summary(mod3)

c3 <- unlist( coef(mod3) )[ 1:(4*I) ]
c3 <- matrix( c3, I, 4, byrow=TRUE )
# compare estimates of rasch.mml2, smirt and true parameters
round( cbind( mod1$item$c, mod2$item$c,c3[,3],data.reck21$pars$c ), 2 )
round( cbind( mod1$item$a, mod2$item$a.Dim1,c3[,1], data.reck21$pars$a ), 2 )
round( cbind( mod1$item$b, mod2$item$b.Dim1 / mod2$item$a.Dim1, - c3[,2] / c3[,1],
            data.reck21$pars$b ), 2 )

#############################################################################
# EXAMPLE 2: data.reck61 dataset, Table 6.1, p. 153
#############################################################################

data(data.reck61DAT1)
dat <- data.reck61DAT1$data

#***
# Model 1: Exploratory factor analysis

#-- Model 1a: tam.fa in TAM
library(TAM)
mod1a <- TAM::tam.fa( dat, irtmodel="efa", nfactors=3 )
# varimax rotation
varimax(mod1a$B.stand)

# Model 1b: EFA in NOHARM (Promax rotation)
mod1b <- sirt::R2noharm( dat=dat, model.type="EFA",  dimensions=3,
              writename="reck61__3dim_efa", noharm.path="c:/NOHARM",dec=",")
summary(mod1b)

# Model 1c: EFA with noharm.sirt
mod1c <- sirt::noharm.sirt( dat=dat, dimensions=3  )
summary(mod1c)
plot(mod1c)

# Model 1d: EFA with 2 dimensions in noharm.sirt
mod1d <- sirt::noharm.sirt( dat=dat, dimensions=2  )
summary(mod1d)
plot(mod1d, efa.load.min=.2)   # plot loadings of at least .20

#***
# Model 2: Confirmatory factor analysis

#-- Model 2a: tam.fa in TAM
dims <- c( rep(1,10), rep(3,10), rep(2,10)  )
Qmatrix <- matrix( 0, nrow=30, ncol=3 )
Qmatrix[ cbind( 1:30, dims) ] <- 1
mod2a <- TAM::tam.mml.2pl( dat,Q=Qmatrix,
            control=list( snodes=1000, QMC=TRUE, maxiter=200) )
summary(mod2a)

#-- Model 2b: smirt in sirt
mod2b <- sirt::smirt( dat,Qmatrix=Qmatrix, est.a="2PL", maxiter=20, qmcnodes=1000 )
summary(mod2b)

#-- Model 2c: rasch.mml2 in sirt
mod2c <- sirt::rasch.mml2( dat,Qmatrix=Qmatrix, est.a=1:30,
                mmliter=200, theta.k=seq(-5,5,len=11) )
summary(mod2c)

#-- Model 2d: mirt in mirt
cmodel <- mirt::mirt.model("
     F1=1-10
     F2=21-30
     F3=11-20
     COV=F1*F2, F1*F3, F2*F3 " )
mod2d <- mirt::mirt(dat, cmodel, verbose=TRUE)
summary(mod2d)
coef(mod2d)

#-- Model 2e: CFA in NOHARM
# specify covariance pattern
P.pattern <- matrix( 1, ncol=3, nrow=3 )
P.init <- .4*P.pattern
diag(P.pattern) <- 0
diag(P.init) <- 1
# fix all entries in the loading matrix to 1
F.pattern <- matrix( 0, nrow=30, ncol=3 )
F.pattern[1:10,1] <- 1
F.pattern[21:30,2] <- 1
F.pattern[11:20,3] <- 1
F.init <- F.pattern
# estimate model
mod2e <- sirt::R2noharm( dat=dat, model.type="CFA", P.pattern=P.pattern,
            P.init=P.init, F.pattern=F.pattern, F.init=F.init,
            writename="reck61__3dim_cfa", noharm.path="c:/NOHARM",dec=",")
summary(mod2e)

#-- Model 2f: CFA with noharm.sirt
mod2f <- sirt::noharm.sirt( dat=dat, Fval=F.init, Fpatt=F.pattern,
                 Pval=P.init, Ppatt=P.pattern )
summary(mod2f)

#############################################################################
# EXAMPLE 3: DETECT analysis data.reck78ExA and data.reck79ExB
#############################################################################

data(data.reck78ExA)
data(data.reck79ExB)

#************************
# Example A
dat <- data.reck78ExA$data
#- estimate person score
score <- stats::qnorm( ( rowMeans( dat )+.5 )  / ( ncol(dat) + 1 ) )
#- extract item cluster
itemcluster <- substring( colnames(dat), 1, 1 )
#- confirmatory DETECT Item cluster
detectA <- sirt::conf.detect( data=dat, score=score, itemcluster=itemcluster )
  ##          unweighted weighted
  ##   DETECT      0.571    0.571
  ##   ASSI        0.523    0.523
  ##   RATIO       0.757    0.757

#- exploratory DETECT analysis
detect_explA <- sirt::expl.detect(data=dat, score, nclusters=10, N.est=nrow(dat)/2  )
  ##  Optimal Cluster Size is  5  (Maximum of DETECT Index)
  ##     N.Cluster N.items N.est N.val         size.cluster DETECT.est ASSI.est
  ##   1         2      50  1250  1250                31-19      0.531    0.404
  ##   2         3      50  1250  1250             10-19-21      0.554    0.407
  ##   3         4      50  1250  1250           10-19-14-7      0.630    0.509
  ##   4         5      50  1250  1250         10-19-3-7-11      0.653    0.546
  ##   5         6      50  1250  1250       10-12-7-3-7-11      0.593    0.458
  ##   6         7      50  1250  1250      10-12-7-3-7-9-2      0.604    0.474
  ##   7         8      50  1250  1250    10-12-7-3-3-9-4-2      0.608    0.481
  ##   8         9      50  1250  1250  10-12-7-3-3-5-4-2-4      0.617    0.494
  ##   9        10      50  1250  1250 10-5-7-7-3-3-5-4-2-4      0.592    0.460

# cluster membership
cluster_membership <- detect_explA$itemcluster$cluster3
# Cluster 1:
colnames(dat)[ cluster_membership==1 ]
  ##   [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10"
# Cluster 2:
colnames(dat)[ cluster_membership==2 ]
  ##    [1] "B11" "B12" "B13" "B14" "B15" "B16" "B17" "B18" "B19" "B20" "B21" "B22"
  ##   [13] "B23" "B25" "B26" "B27" "B28" "B29" "B30"
# Cluster 3:
colnames(dat)[ cluster_membership==3 ]
  ##    [1] "B24" "C31" "C32" "C33" "C34" "C35" "C36" "C37" "C38" "C39" "C40" "C41"
  ##   [13] "C42" "C43" "C44" "C45" "C46" "C47" "C48" "C49" "C50"

#************************
# Example B
dat <- data.reck79ExB$data
#- estimate person score
score <- stats::qnorm( ( rowMeans( dat )+.5 )  / ( ncol(dat) + 1 ) )
#- extract item cluster
itemcluster <- substring( colnames(dat), 1, 1 )
#- confirmatory DETECT Item cluster
detectB <- sirt::conf.detect( data=dat, score=score, itemcluster=itemcluster )
  ##          unweighted weighted
  ##   DETECT      0.715    0.715
  ##   ASSI        0.624    0.624
  ##   RATIO       0.855    0.855

#- exploratory DETECT analysis
detect_explB <- sirt::expl.detect(data=dat, score, nclusters=10, N.est=nrow(dat)/2  )
  ##   Optimal Cluster Size is  4  (Maximum of DETECT Index)
  ##
  ##     N.Cluster N.items N.est N.val         size.cluster DETECT.est ASSI.est
  ##   1         2      50  1250  1250                30-20      0.665    0.546
  ##   2         3      50  1250  1250             10-20-20      0.686    0.585
  ##   3         4      50  1250  1250           10-20-8-12      0.728    0.644
  ##   4         5      50  1250  1250         10-6-14-8-12      0.654    0.553
  ##   5         6      50  1250  1250       10-6-14-3-12-5      0.659    0.561
  ##   6         7      50  1250  1250      10-6-14-3-7-5-5      0.664    0.576
  ##   7         8      50  1250  1250     10-6-7-7-3-7-5-5      0.616    0.518
  ##   8         9      50  1250  1250   10-6-7-7-3-5-5-5-2      0.612    0.512
  ##   9        10      50  1250  1250 10-6-7-7-3-5-3-5-2-2      0.613    0.512

## End(Not run)

Some Example Datasets for the sirt Package

Description

Some example datasets for the sirt package.

Usage

data(data.si01)
data(data.si02)
data(data.si03)
data(data.si04)
data(data.si05)
data(data.si06)
data(data.si07)
data(data.si08)
data(data.si09)
data(data.si10)

Format

References

Bartolucci, F., Montanari, G. E., & Pandolfi, S. (2012). Dimensionality of the latent structure and item selection via latent class multidimensional IRT models. Psychometrika, 77(4), 782-802. doi:10.1007/s11336-012-9278-0

DeCarlo, L. T. (2020). An item response model for true-false exams based on signal detection theory. Applied Psychological Measurement, 34(3). 234-248. doi:10.1177/0146621619843823

Fischer, R., & Karl, J. A. (2019). A primer to (cross-cultural) multi-group invariance testing possibilities in R. Frontiers in Psychology | Cultural Psychology, 10:1507. doi:10.3389/fpsyg.2019.01507

Fop, M., & Murphy, T. B. (2018). Variable selection methods for model-based clustering. Statistics Surveys, 12, 18-65. https://doi.org/10.1214/18-SS119

Goodman, L. A. (1970). The multivariate analysis of qualitative data: Interactions among multiple classifications. Journal of the American Statistical Association, 65(329), 226-256. doi:10.1080/01621459.1970.10481076

Lindsay, B., Clogg, C. C., & Grego, J. (1991). Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis. Journal of the American Statistical Association, 86(413), 96-107. doi:10.1080/01621459.1991.10475008

Rasch, D., Kubinger, K. D., & Yanagida, T. (2011). Statistics in psychology using R and SPSS. New York: Wiley. doi:10.1002/9781119979630

See Also

Some free datasets can be obtained from
Psychological questionnaires: http://personality-testing.info/_rawdata/
PISA 2012: http://pisa2012.acer.edu.au/downloads.php
PIAAC: http://www.oecd.org/site/piaac/publicdataandanalysis.htm
TIMSS 2011: http://timssandpirls.bc.edu/timss2011/international-database.html
ALLBUS: http://www.gesis.org/allbus/allbus-home/

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: Nested logit model multiple choice dataset data.si06
#############################################################################

data(data.si06, package="sirt")
dat <- data.si06

#** estimate 2PL nested logit model
library(mirt)
mod1 <- mirt::mirt( dat, model=1, itemtype="2PLNRM", key=rep(0,ncol(dat) ),
            verbose=TRUE  )
summary(mod1)
cmod1 <- sirt::mirt.wrapper.coef(mod1)$coef
cmod1[,-1] <- round( cmod1[,-1], 3)

#** normalize item parameters according Suh and Bolt (2010)
cmod2 <- cmod1

# slope parameters
ind <-  grep("ak",colnames(cmod2))
h1 <- cmod2[,ind ]
cmod2[,ind] <- t( apply( h1, 1, FUN=function(ll){ ll - mean(ll) } ) )
# item intercepts
ind <-  paste0( "d", 0:9 )
ind <- which( colnames(cmod2) %in% ind )
h1 <- cmod2[,ind ]
cmod2[,ind] <- t( apply( h1, 1, FUN=function(ll){ ll - mean(ll) } ) )
cmod2[,-1] <- round( cmod2[,-1], 3)

#############################################################################
# EXAMPLE 2: Item response modle based on signal detection theory (IRSDT model)
#############################################################################

data(data.si07, package="sirt")
data <- data.si07

#-- simulate data
set.seed(98)
N <- 2000 # define sample size
# generate membership scores
lambda <- sample(size=N, x=data$trait$x, prob=data$trait$prob, replace=TRUE)
b <- data$pars$b
d <- data$pars$d
items <- data$pars$item
dat <- data$sim_fun(lambda=lambda, b=b, d=d, items=items)

#- estimate IRSDT model as a grade of membership model with two classes
problevels <- seq( 0.025, 0.975, length=20 )
mod1 <- sirt::gom.em( dat, K=2, problevels=problevels )
summary(mod1)

## End(Not run)

Dataset TIMSS Mathematics

Description

This datasets contains TIMSS mathematics data from 345 students on 25 items.

Usage

data(data.timss)

Format

This dataset is a list. data is the dataset containing student ID (idstud), a dummy variable for female (girl) and student age (age). The following variables (starting with M in the variable name are items.

The format is:

List of 2
$ data:'data.frame':
..$ idstud : num [1:345] 4e+09 4e+09 4e+09 4e+09 4e+09 ...
..$ girl : int [1:345] 0 0 0 0 0 0 0 0 1 0 ...
..$ age : num [1:345] 10.5 10 10.25 10.25 9.92 ...
..$ M031286 : int [1:345] 0 0 0 1 1 0 1 0 1 0 ...
..$ M031106 : int [1:345] 0 0 0 1 1 0 1 1 0 0 ...
..$ M031282 : int [1:345] 0 0 0 1 1 0 1 1 0 0 ...
..$ M031227 : int [1:345] 0 0 0 0 1 0 0 0 0 0 ...
[...]
..$ M041203 : int [1:345] 0 0 0 1 1 0 0 0 0 1 ...
$ item:'data.frame':
..$ item : Factor w/ 25 levels "M031045","M031068",..: ...
..$ Block : Factor w/ 2 levels "M01","M02": 1 1 1 1 1 1 ..
..$ Format : Factor w/ 2 levels "CR","MC": 1 1 1 1 2 ...
..$ Content.Domain : Factor w/ 3 levels "Data Display",..: 3 3 3 3 ...
..$ Cognitive.Domain: Factor w/ 3 levels "Applying","Knowing",..: 2 3 3 ..


TIMSS 2007 Grade 8 Mathematics and Science Russia

Description

This TIMSS 2007 dataset contains item responses of 4472 eigth grade Russian students in Mathematics and Science.

Usage

data(data.timss07.G8.RUS)

Format

The datasets contains raw responses (raw), scored responses (scored) and item informations (iteminfo).

The format of the dataset is:

List of 3
$ raw :'data.frame':
..$ idstud : num [1:4472] 3010101 3010102 3010104 3010105 3010106 ...
..$ M022043 : atomic [1:4472] NA 1 4 NA NA NA NA NA NA NA ...
.. ..- attr(*, "value.labels")=Named num [1:7] 9 6 5 4 3 2 1
.. .. ..- attr(*, "names")=chr [1:7] "OMITTED" "NOT REACHED" "E" "D*" ...
[...]
..$ M032698 : atomic [1:4472] NA NA NA NA NA NA NA 2 1 NA ...
.. ..- attr(*, "value.labels")=Named num [1:6] 9 6 4 3 2 1
.. .. ..- attr(*, "names")=chr [1:6] "OMITTED" "NOT REACHED" "D" "C" ...
..$ M032097 : atomic [1:4472] NA NA NA NA NA NA NA 2 3 NA ...
.. ..- attr(*, "value.labels")=Named num [1:6] 9 6 4 3 2 1
.. .. ..- attr(*, "names")=chr [1:6] "OMITTED" "NOT REACHED" "D" "C*" ...
.. [list output truncated]
$ scored : num [1:4472, 1:443] 3010101 3010102 3010104 3010105 3010106 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:443] "idstud" "M022043" "M022046" "M022049" ...
$ iteminfo:'data.frame':
..$ item : Factor w/ 442 levels "M022043","M022046",..: 1 2 3 4 5 6 21 7 8 17 ...
..$ content : Factor w/ 8 levels "Algebra","Biology",..: 7 7 6 1 6 7 4 6 7 7 ...
..$ topic : Factor w/ 49 levels "Algebraic Expression",..: 32 32 41 29 ...
..$ cognitive : Factor w/ 3 levels "Applying","Knowing",..: 2 1 3 2 1 1 1 1 2 1 ...
..$ item.type : Factor w/ 2 levels "CR","MC": 2 1 2 2 1 2 2 2 2 1 ...
..$ N.options : Factor w/ 4 levels "-"," -","4","5": 4 1 3 4 1 4 4 4 3 1 ...
..$ key : Factor w/ 7 levels "-"," -","A","B",..: 6 1 6 7 1 5 5 4 6 1 ...
..$ max.points: int [1:442] 1 1 1 1 1 1 1 1 1 2 ...
..$ item.label: Factor w/ 432 levels "1 teacher for every 12 students ",..: 58 351 ...

Source

TIMSS 2007 8th Grade, Russian Sample


Dataset Used in Stoyan, Pommerening and Wuensche (2018)

Description

Dataset used in Stoyan, Pommerening and Wuensche (2018; see also Pommerening et al., 2018). In the dataset, 15 forest managers classify 387 trees either as trees to be maintained or as trees to be removed. They assign tree marks, either 0 or 1, where mark 1 means remove.

Usage

data(data.trees)

Format

The dataset has the following structure.

'data.frame': 387 obs. of 16 variables:
$ Number: int 142 184 9 300 374 42 382 108 125 201 ...
$ FM1 : int 1 1 1 1 1 1 1 1 1 0 ...
$ FM2 : int 1 1 1 0 1 1 1 1 1 1 ...
$ FM3 : int 1 0 1 1 1 1 1 1 1 1 ...
$ FM4 : int 1 1 1 1 1 1 0 1 1 1 ...
$ FM5 : int 1 1 1 1 1 1 0 0 0 1 ...
$ FM6 : int 1 1 1 1 0 1 1 1 1 0 ...
$ FM7 : int 1 0 1 1 0 0 1 0 1 1 ...
$ FM8 : int 1 1 1 1 1 0 0 1 0 1 ...
$ FM9 : int 1 1 0 1 1 1 1 0 1 1 ...
$ FM10 : int 0 1 1 0 1 1 1 1 0 0 ...
$ FM11 : int 1 1 1 1 0 1 1 0 1 0 ...
$ FM12 : int 1 1 1 1 1 1 0 1 0 0 ...
$ FM13 : int 0 1 0 0 1 1 1 1 1 1 ...
$ FM14 : int 1 1 1 1 1 0 1 1 1 1 ...
$ FM15 : int 1 1 0 1 1 0 1 0 0 1 ...

Source

https://www.pommerening.org/wiki/images/d/dc/CoedyBreninSortedforPublication.txt

References

Pommerening, A., Ramos, C. P., Kedziora, W., Haufe, J., & Stoyan, D. (2018). Rating experiments in forestry: How much agreement is there in tree marking? PloS ONE, 13(3), e0194747. doi:10.1371/journal.pone.0194747

Stoyan, D., Pommerening, A., & Wuensche, A. (2018). Rater classification by means of set-theoretic methods applied to forestry data. Journal of Environmental Statistics, 8(2), 1-17.

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: Latent class models, latent trait models, mixed membership models
#############################################################################

data(data.trees, package="sirt")
dat <- data.trees[,-1]
I <- ncol(dat)

#** latent class models with 2, 3, and 4 classes
problevels <- seq( 0, 1, len=2 )
mod02 <- sirt::gom.em(dat, K=2, problevels, model="GOM")
mod03 <- sirt::gom.em(dat, K=3, problevels, model="GOM")
mod04 <- sirt::gom.em(dat, K=4, problevels, model="GOM")

#** grade of membership models
mod11 <- sirt::gom.em(dat, K=2, theta0.k=10*seq(-1,1,len=11), model="GOMnormal")
problevels <- seq( 0, 1, len=3 )
mod12 <- sirt::gom.em(dat, K=2, problevels, model="GOM")
mod13 <- sirt::gom.em(dat, K=3, problevels, model="GOM")
mod14 <- sirt::gom.em(dat, K=4, problevels, model="GOM")
problevels <- seq( 0, 1, len=4 )
mod22 <- sirt::gom.em(dat, K=2, problevels, model="GOM")
mod23 <- sirt::gom.em(dat, K=3, problevels, model="GOM")
mod24 <- sirt::gom.em(dat, K=4, problevels, model="GOM")

#** latent trait models
#- 1PL
mod31 <- sirt::rasch.mml2(dat)
#- 2PL
mod32 <- sirt::rasch.mml2(dat, est.a=1:I)

#- model comparison
IRT.compareModels(mod02, mod03, mod04, mod11, mod12, mod13, mod14,
                     mod22, mod23, mod24, mod31, mod32)

#-- inspect model results
summary(mod12)
round( cbind( mod12$theta.k, mod12$pi.k ),3)

summary(mod13)
round(cbind( mod13$theta.k, mod13$pi.k ),3)

## End(Not run)

Converting a Data Frame from Wide Format in a Long Format

Description

Converts a data frame in wide format into long format.

Usage

data.wide2long(dat, id=NULL, X=NULL, Q=NULL)

Arguments

dat

Data frame with item responses and a person identifier if id !=NULL.

id

An optional string with the variable name of the person identifier.

X

Data frame with person covariates for inclusion in the data frame of long format

Q

Data frame with item predictors. Item labels must be included as a column named by "item".

Value

Data frame in long format

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: data.pisaRead
#############################################################################
miceadds::library_install("lme4")

data(data.pisaRead)
dat <- data.pisaRead$data
Q <- data.pisaRead$item   # item predictors

# define items
items <- colnames(dat)[ substring( colnames(dat), 1, 1 )=="R" ]
dat1 <- dat[, c( "idstud", items ) ]
# matrix with person predictors
X <- dat[, c("idschool", "hisei", "female", "migra") ]

# create dataset in long format
dat.long <- sirt::data.wide2long( dat=dat1, id="idstud", X=X, Q=Q )

#***
# Model 1: Rasch model
mod1 <- lme4::glmer( resp ~ 0 + ( 1 | idstud ) + as.factor(item), data=dat.long,
            family="binomial", verbose=TRUE)
summary(mod1)

#***
# Model 2: Rasch model and inclusion of person predictors
mod2 <- lme4::glmer( resp ~ 0 + ( 1 | idstud ) + as.factor(item) + female + hisei + migra,
           data=dat.long, family="binomial", verbose=TRUE)
summary(mod2)

#***
# Model 3: LLTM
mod3 <- lme4::glmer(resp ~ (1|idstud) + as.factor(ItemFormat) + as.factor(TextType),
            data=dat.long, family="binomial", verbose=TRUE)
summary(mod3)

#############################################################################
# EXAMPLE 2: Rasch model in lme4
#############################################################################

set.seed(765)
N <- 1000  # number of persons
I <- 10    # number of items
b <- seq(-2,2,length=I)
dat <- sirt::sim.raschtype( stats::rnorm(N,sd=1.2), b=b )
dat.long <- sirt::data.wide2long( dat=dat )
#***
# estimate Rasch model with lmer
library(lme4)
mod1 <- lme4::glmer( resp ~ 0 + as.factor( item ) + ( 1 | id_index), data=dat.long,
             verbose=TRUE, family="binomial")
summary(mod1)
  ##   Random effects:
  ##    Groups   Name        Variance Std.Dev.
  ##    id_index (Intercept) 1.454    1.206
  ##   Number of obs: 10000, groups: id_index, 1000
  ##
  ##   Fixed effects:
  ##                        Estimate Std. Error z value Pr(>|z|)
  ##   as.factor(item)I0001  2.16365    0.10541  20.527  < 2e-16 ***
  ##   as.factor(item)I0002  1.66437    0.09400  17.706  < 2e-16 ***
  ##   as.factor(item)I0003  1.21816    0.08700  14.002  < 2e-16 ***
  ##   as.factor(item)I0004  0.68611    0.08184   8.383  < 2e-16 ***
  ##   [...]

## End(Not run)

Calculation of the DETECT and polyDETECT Index

Description

This function calculated the DETECT and polyDETECT index (Stout, Habing, Douglas & Kim, 1996; Zhang & Stout, 1999a; Zhang, 2007). At first, conditional covariances have to be estimated using the ccov.np function.

Usage

detect.index(ccovtable, itemcluster)

Arguments

ccovtable

A value of ccov.np.

itemcluster

Item cluster for each item. The order of entries must correspond to the columns in data (submitted to ccov.np).

References

Stout, W., Habing, B., Douglas, J., & Kim, H. R. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20, 331-354.

Zhang, J., & Stout, W. (1999a). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64, 129-152.

Zhang, J., & Stout, W. (1999b). The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika, 64, 213-249.

Zhang, J. (2007). Conditional covariance theory and DETECT for polytomous items. Psychometrika, 72, 69-91.

See Also

For examples see conf.detect.


Differential Item Functioning using Logistic Regression Analysis

Description

This function assesses differential item functioning using logistic regression analysis (Zumbo, 1999).

Usage

dif.logistic.regression(dat, group, score,quant=1.645)

Arguments

dat

Data frame with dichotomous item responses

group

Group identifier

score

Ability estimate, e.g. the WLE.

quant

Used quantile of the normal distribution for assessing statistical significance

Details

Items are classified into A (negligible DIF), B (moderate DIF) and C (large DIF) levels according to the ETS classification system (Longford, Holland & Thayer, 1993, p. 175). See also Monahan, McHorney, Stump and Perkins (2007) for further DIF effect size classifications.

Value

A data frame with following variables:

itemnr

Numeric index of the item

sortDIFindex

Rank of item with respect to the uniform DIF (from negative to positive values)

item

Item name

N

Sample size per item

R

Value of group variable for reference group

F

Value of group variable for focal group

nR

Sample size per item in reference group

nF

Sample size per item in focal group

p

Item pp value

pR

Item pp value in reference group

pF

Item pp value in focal group

pdiff

Item pp value differences

pdiff.adj

Adjusted pp value difference

uniformDIF

Uniform DIF estimate

se.uniformDIF

Standard error of uniform DIF

t.uniformDIF

The tt value for uniform DIF

sig.uniformDIF

Significance label for uniform DIF

DIF.ETS

DIF classification according to the ETS classification system (see Details)

uniform.EBDIF

Empirical Bayes estimate of uniform DIF (Longford, Holland & Thayer, 1993) which takes degree of DIF standard error into account

DIF.SD

Value of the DIF standard deviation

nonuniformDIF

Nonuniform DIF estimate

se.nonuniformDIF

Standard error of nonuniform DIF

t.nonuniformDIF

The tt value for nonuniform DIF

sig.nonuniformDIF

Significance label for nonuniform DIF

References

Longford, N. T., Holland, P. W., & Thayer, D. T. (1993). Stability of the MH D-DIF statistics across populations. In P. W. Holland & H. Wainer (Eds.). Differential Item Functioning (pp. 171-196). Hillsdale, NJ: Erlbaum.

Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42(3), 847-862. doi:10.3758/BRM.42.3.847

Monahan, P. O., McHorney, C. A., Stump, T. E., & Perkins, A. J. (2007). Odds ratio, delta, ETS classification, and standardization measures of DIF magnitude for binary logistic regression. Journal of Educational and Behavioral Statistics, 32(1), 92-109. doi:10.3102/1076998606298035

Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.

See Also

For assessing DIF variance see dif.variance and dif.strata.variance

See also rasch.evm.pcm for assessing differential item functioning in the partial credit model.

See the difR package for a large collection of DIF detection methods (Magis, Beland, Tuerlinckx, & De Boeck, 2010).

For a download of the free DIF-Pack software (SIBTEST, ...) see http://psychometrictools.measuredprogress.org/home.

Examples

#############################################################################
# EXAMPLE 1: Mathematics data | Gender DIF
#############################################################################

data( data.math )
dat <- data.math$data
items <- grep( "M", colnames(dat))

# estimate item parameters and WLEs
mod <- sirt::rasch.mml2( dat[,items] )
wle <- sirt::wle.rasch( dat[,items], b=mod$item$b )$theta

# assess DIF by logistic regression
mod1 <- sirt::dif.logistic.regression( dat=dat[,items], score=wle, group=dat$female)

# calculate DIF variance
dif1 <- sirt::dif.variance( dif=mod1$uniformDIF, se.dif=mod1$se.uniformDIF )
dif1$unweighted.DIFSD
  ## > dif1$unweighted.DIFSD
  ## [1] 0.1963958

# calculate stratified DIF variance
# stratification based on domains
dif2 <- sirt::dif.strata.variance( dif=mod1$uniformDIF, se.dif=mod1$se.uniformDIF,
              itemcluster=data.math$item$domain )
  ## $unweighted.DIFSD
  ## [1] 0.1455916

## Not run: 
#****
# Likelihood ratio test and graphical model test in eRm package
miceadds::library_install("eRm")
# estimate Rasch model
res <- eRm::RM( dat[,items] )
summary(res)
# LR-test with respect to female
lrres <- eRm::LRtest(res, splitcr=dat$female)
summary(lrres)
# graphical model test
eRm::plotGOF(lrres)

#############################################################################
# EXAMPLE 2: Comparison with Mantel-Haenszel test
#############################################################################

library(TAM)
library(difR)

#*** (1) simulate data
set.seed(776)
N <- 1500   # number of persons per group
I <- 12     # number of items
mu2 <- .5   # impact (group difference)
sd2 <- 1.3  # standard deviation group 2

# define item difficulties
b <- seq( -1.5, 1.5, length=I)
# simulate DIF effects
bdif <- scale( stats::rnorm(I, sd=.6 ), scale=FALSE )[,1]
# item difficulties per group
b1 <- b + 1/2 * bdif
b2 <- b - 1/2 * bdif
# simulate item responses
dat1 <- sirt::sim.raschtype( theta=stats::rnorm(N, mean=0, sd=1 ), b=b1 )
dat2 <- sirt::sim.raschtype( theta=stats::rnorm(N, mean=mu2, sd=sd2 ), b=b2 )
dat <- rbind( dat1, dat2 )
group <- rep( c(1,2), each=N ) # define group indicator

#*** (2) scale data
mod <- TAM::tam.mml( dat, group=group )
summary(mod)

#*** (3) extract person parameter estimates
mod_eap <- mod$person$EAP
mod_wle <- tam.wle( mod )$theta

#*********************************
# (4) techniques for assessing differential item functioning

# Model 1: assess DIF by logistic regression and WLEs
dif1 <- sirt::dif.logistic.regression( dat=dat, score=mod_wle, group=group)
# Model 2: assess DIF by logistic regression and EAPs
dif2 <- sirt::dif.logistic.regression( dat=dat, score=mod_eap, group=group)
# Model 3: assess DIF by Mantel-Haenszel statistic
dif3 <- difR::difMH(Data=dat, group=group, focal.name="1",  purify=FALSE )
print(dif3)
  ##  Mantel-Haenszel Chi-square statistic:
  ##
  ##        Stat.    P-value
  ##  I0001  14.5655   0.0001 ***
  ##  I0002 300.3225   0.0000 ***
  ##  I0003   2.7160   0.0993 .
  ##  I0004 191.6925   0.0000 ***
  ##  I0005   0.0011   0.9740
  ##  [...]
  ##  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  ##  Detection threshold: 3.8415 (significance level: 0.05)
  ##
  ##  Effect size (ETS Delta scale):
  ##
  ##  Effect size code:
  ##   'A': negligible effect
  ##   'B': moderate effect
  ##   'C': large effect
  ##
  ##        alphaMH deltaMH
  ##  I0001  1.3908 -0.7752 A
  ##  I0002  0.2339  3.4147 C
  ##  I0003  1.1407 -0.3093 A
  ##  I0004  2.8515 -2.4625 C
  ##  I0005  1.0050 -0.0118 A
  ##  [...]
  ##
  ##  Effect size codes: 0 'A' 1.0 'B' 1.5 'C'
  ##   (for absolute values of 'deltaMH')

# recompute DIF parameter from alphaMH
uniformDIF3 <- log(dif3$alphaMH)

# compare different DIF statistics
dfr <- data.frame( "bdif"=bdif, "LR_wle"=dif1$uniformDIF,
        "LR_eap"=dif2$uniformDIF, "MH"=uniformDIF3 )
round( dfr, 3 )
  ##       bdif LR_wle LR_eap     MH
  ##  1   0.236  0.319  0.278  0.330
  ##  2  -1.149 -1.473 -1.523 -1.453
  ##  3   0.140  0.122  0.038  0.132
  ##  4   0.957  1.048  0.938  1.048
  ##  [...]
colMeans( abs( dfr[,-1] - bdif ))
  ##      LR_wle     LR_eap         MH
  ##  0.07759187 0.19085743 0.07501708

## End(Not run)

Stratified DIF Variance

Description

Calculation of stratified DIF variance

Usage

dif.strata.variance(dif, se.dif, itemcluster)

Arguments

dif

Vector of uniform DIF effects

se.dif

Standard error of uniform DIF effects

itemcluster

Vector of item strata

Value

A list with following entries:

stratadif

Summary statistics of DIF effects within item strata

weighted.DIFSD

Weighted DIF standard deviation

unweigted.DIFSD

DIF standard deviation

References

Longford, N. T., Holland, P. W., & Thayer, D. T. (1993). Stability of the MH D-DIF statistics across populations. In P. W. Holland & H. Wainer (Eds.). Differential Item Functioning (pp. 171-196). Hillsdale, NJ: Erlbaum.

See Also

See dif.logistic.regression for examples.


DIF Variance

Description

This function calculates the variance of DIF effects, the so called DIF variance (Longford, Holland & Thayer, 1993).

Usage

dif.variance(dif, se.dif, items=paste("item", 1:length(dif), sep="") )

Arguments

dif

Vector of uniform DIF effects

se.dif

Standard error of uniform DIF effects

items

Optional vector of item names

Value

A list with following entries

weighted.DIFSD

Weighted DIF standard deviation

unweigted.DIFSD

DIF standard deviation

mean.se.dif

Mean of standard errors of DIF effects

eb.dif

Empirical Bayes estimates of DIF effects

References

Longford, N. T., Holland, P. W., & Thayer, D. T. (1993). Stability of the MH D-DIF statistics across populations. In P. W. Holland & H. Wainer (Eds.). Differential Item Functioning (pp. 171-196). Hillsdale, NJ: Erlbaum.

See Also

See dif.logistic.regression for examples.


Maximum Likelihood Estimation of the Dirichlet Distribution

Description

Maximum likelihood estimation of the parameters of the Dirichlet distribution

Usage

dirichlet.mle(x, weights=NULL, eps=10^(-5), convcrit=1e-05, maxit=1000,
     oldfac=.3, progress=FALSE)

Arguments

x

Data frame with NN observations and KK variables of a Dirichlet distribution

weights

Optional vector of frequency weights

eps

Tolerance number which is added to prevent from logarithms of zero

convcrit

Convergence criterion

maxit

Maximum number of iterations

oldfac

Convergence acceleration factor. It must be a parameter between 0 and 1.

progress

Display iteration progress?

Value

A list with following entries

alpha

Vector of α\alpha parameters

alpha0

The concentration parameter α0=kαk\alpha_0=\sum_k \alpha_k

xsi

Vector of proportions ξk=αk/α0\xi_k=\alpha_k / \alpha_0

References

Minka, T. P. (2012). Estimating a Dirichlet distribution. Technical Report.

See Also

For simulating Dirichlet vectors with matrix-wise α\bold{\alpha} parameters see dirichlet.simul.

For a variety of functions concerning the Dirichlet distribution see the DirichletReg package.

Examples

#############################################################################
# EXAMPLE 1: Simulate and estimate Dirichlet distribution
#############################################################################

# (1) simulate data
set.seed(789)
N <- 200
probs <- c(.5, .3, .2 )
alpha0 <- .5
alpha <- alpha0*probs
alpha <- matrix( alpha, nrow=N, ncol=length(alpha), byrow=TRUE  )
x <- sirt::dirichlet.simul( alpha )

# (2) estimate Dirichlet parameters
dirichlet.mle(x)
  ##   $alpha
  ##   [1] 0.24507708 0.14470944 0.09590745
  ##   $alpha0
  ##   [1] 0.485694
  ##   $xsi
  ##   [1] 0.5045916 0.2979437 0.1974648

## Not run: 
#############################################################################
# EXAMPLE 2: Fitting Dirichlet distribution with frequency weights
#############################################################################

# define observed data
x <- scan( nlines=1)
    1 0   0 1   .5 .5
x <- matrix( x, nrow=3, ncol=2, byrow=TRUE)

# transform observations x into (0,1)
eps <- .01
x <- ( x + eps ) / ( 1 + 2 * eps )

# compare results with likelihood fitting package maxLik
miceadds::library_install("maxLik")
# define likelihood function
dirichlet.ll <- function(param) {
    ll <- sum( weights * log( ddirichlet( x, param ) ) )
    ll
}

#*** weights 10-10-1
weights <- c(10, 10, 1 )
mod1a <- sirt::dirichlet.mle( x, weights=weights )
mod1a
# estimation in maxLik
mod1b <- maxLik::maxLik(loglik, start=c(.5,.5))
print( mod1b )
coef( mod1b )

#*** weights 10-10-10
weights <- c(10, 10, 10 )
mod2a <- sirt::dirichlet.mle( x, weights=weights )
mod2a
# estimation in maxLik
mod2b <- maxLik::maxLik(loglik, start=c(.5,.5))
print( mod2b )
coef( mod2b )

#*** weights 30-10-2
weights <- c(30, 10, 2 )
mod3a <- sirt::dirichlet.mle( x, weights=weights )
mod3a
# estimation in maxLik
mod3b <- maxLik::maxLik(loglik, start=c(.25,.25))
print( mod3b )
coef( mod3b )

## End(Not run)

Simulation of a Dirichlet Distributed Vectors

Description

This function makes random draws from a Dirichlet distribution.

Usage

dirichlet.simul(alpha)

Arguments

alpha

A matrix with α\bold{\alpha} parameters of the Dirichlet distribution

Value

A data frame with Dirichlet distributed responses

Examples

#############################################################################
# EXAMPLE 1: Simulation with two components
#############################################################################

set.seed(789)
N <- 2000
probs <- c(.7, .3)    # define (extremal) class probabilities

#*** alpha0=.2  -> nearly crisp latent classes
alpha0 <- .2
alpha <- alpha0*probs
alpha <- matrix( alpha, nrow=N, ncol=length(alpha), byrow=TRUE  )
x <- sirt::dirichlet.simul( alpha )
htitle <- expression(paste( alpha[0], "=.2, ", p[1], "=.7"   ) )
hist( x[,1], breaks=seq(0,1,len=20), main=htitle)

#*** alpha0=3 -> strong deviation from crisp membership
alpha0 <- 3
alpha <- alpha0*probs
alpha <- matrix( alpha, nrow=N, ncol=length(alpha), byrow=TRUE  )
x <- sirt::dirichlet.simul( alpha )
htitle <- expression(paste( alpha[0], "=3, ", p[1], "=.7"   ) )
hist( x[,1], breaks=seq(0,1,len=20), main=htitle)

## Not run: 
#############################################################################
# EXAMPLE 2: Simulation with three components
#############################################################################

set.seed(986)
N <- 2000
probs <- c( .5, .35, .15 )

#*** alpha0=.2
alpha0 <- .2
alpha <- alpha0*probs
alpha <- matrix( alpha, nrow=N, ncol=length(alpha), byrow=TRUE  )
x <- sirt::dirichlet.simul( alpha )
htitle <- expression(paste( alpha[0], "=.2, ", p[1], "=.7"   ) )
miceadds::library_install("ade4")
ade4::triangle.plot(x, label=NULL, clabel=1)

#*** alpha0=3
alpha0 <- 3
alpha <- alpha0*probs
alpha <- matrix( alpha, nrow=N, ncol=length(alpha), byrow=TRUE  )
x <- sirt::dirichlet.simul( alpha )
htitle <- expression(paste( alpha[0], "=3, ", p[1], "=.7"   ) )
ade4::triangle.plot(x, label=NULL, clabel=1)

## End(Not run)

Comparing Regression Parameters of Different lavaan Models Fitted to the Same Dataset

Description

The function dmlavaan compares model parameters from different lavaan models fitted to the same dataset. This leads to dependent coefficients. Statistical inference is either conducted by M-estimation (i.e., robust sandwich method; method="bootstrap") or bootstrap (method="bootstrap"). See Mize et al. (2019) or Weesie (1999) for more details.

Usage

dmlavaan(fun1, args1, fun2, args2, method="sandwich", R=50)

Arguments

fun1

lavaan function of the first model (e.g., "lavaan", "cfa", or "sem")

args1

arguments for lavaan function in the first model

fun2

lavaan function of the second model (e.g., "lavaan", "cfa", or "sem")

args2

arguments for lavaan function in the second model

method

estimation method for standard errors

R

Number of bootstrap samples

Details

In bootstrap estimation, a normal approximation is applied in the computation of confidence intervals. Hence, R could be chosen relatively small.

TO DO (not yet implemented):

1) inclusion of sampling weights
2) cluster robust standard errors in hierarchical sampling
3) stratification

Value

A list with following entries

coef

Model parameters of both models

vcov

Covariance matrix of model parameters of both models

partable

Parameter table containing all univariate model parameters

...

More entries

References

Mize, T.D., Doan, L., & Long, J.S. (2019). A general framework for comparing predictions and marginal effects across models. Sociological Methodology, 49(1), 152-189. doi:10.1177/0081175019852763

Weesie, J. (1999) Seemingly unrelated estimation and the cluster-adjusted sandwich estimator. Stata Technical Bulletin, 9, 231-248.

Examples

## Not run: 
############################################################################
# EXAMPLE 1: Confirmatory factor analysis with and without fourth item
#############################################################################

#**** simulate data
N <- 200  # number of persons
I <- 4    # number of items

# loadings and error correlations
lam <- seq(.7,.4, len=I)
PSI <- diag( 1-lam^2 )

# define some model misspecification
sd_error <- .1
S1 <- matrix( c( -1.84, 0.39,-0.68, 0.13,
  0.39,-1.31,-0.07,-0.27,
 -0.68,-0.07, 0.90, 1.91,
  0.13,-0.27, 1.91,-0.56 ), nrow=4, ncol=4, byrow=TRUE)
S1 <- ( S1 - mean(S1) ) / sd(S1) * sd_error

Sigma <- lam %*% t(lam) + PSI + S1
dat <- MASS::mvrnorm(n=N, mu=rep(0,I), Sigma=Sigma)
colnames(dat) <- paste0("X",1:4)
dat <- as.data.frame(dat)
rownames(Sigma) <- colnames(Sigma) <- colnames(dat)


#*** define two lavaan models
lavmodel1 <- "F=~ X1 + X2 + X3 + X4"
lavmodel2 <- "F=~ X1 + X2 + X3"

#*** define lavaan estimation arguments and functions
fun2 <- fun1 <- "cfa"
args1 <- list( model=lavmodel1, data=dat, std.lv=TRUE, estimator="MLR")
args2 <- args1
args2$model <- lavmodel2

#* run model comparison
res1 <- sirt::dmlavaan( fun1=fun1, args1=args1, fun2=fun2, args2=args2)

# inspect results
sirt:::print_digits(res1$partable, digits=3)

## End(Not run)

Computation of Eigenvalues of Many Symmetric Matrices

Description

This function computes the eigenvalue decomposition of NN symmetric positive definite matrices. The eigenvalues are computed by the Rayleigh quotient method (Lange, 2010, p. 120). In addition, the inverse matrix can be calculated.

Usage

eigenvalues.manymatrices(Sigma.all, itermax=10, maxconv=0.001,
    inverse=FALSE )

Arguments

Sigma.all

An N×D2N \times D^2 matrix containing the D2D^2 entries of NN symmetric matrices of dimension D×DD \times D

itermax

Maximum number of iterations

maxconv

Convergence criterion for convergence of eigenvectors

inverse

A logical which indicates if the inverse matrix shall be calculated

Value

A list with following entries

lambda

Matrix with eigenvalues

U

An N×D2N \times D^2 Matrix of orthonormal eigenvectors

logdet

Vector of logarithm of determinants

det

Vector of determinants

Sigma.inv

Inverse matrix if inverse=TRUE.

References

Lange, K. (2010). Numerical Analysis for Statisticians. New York: Springer.

Examples

# define matrices
Sigma <- diag(1,3)
Sigma[ lower.tri(Sigma) ] <- Sigma[ upper.tri(Sigma) ] <- c(.4,.6,.8 )
Sigma1 <- Sigma

Sigma <- diag(1,3)
Sigma[ lower.tri(Sigma) ] <- Sigma[ upper.tri(Sigma) ] <- c(.2,.1,.99 )
Sigma2 <- Sigma

# collect matrices in a "super-matrix"
Sigma.all <- rbind( matrix( Sigma1, nrow=1, byrow=TRUE),
                matrix( Sigma2, nrow=1, byrow=TRUE) )
Sigma.all <- Sigma.all[ c(1,1,2,2,1 ), ]

# eigenvalue decomposition
m1 <- sirt::eigenvalues.manymatrices( Sigma.all )
m1

# eigenvalue decomposition for Sigma1
s1 <- svd(Sigma1)
s1

Equating in the Generalized Logistic Rasch Model

Description

This function does the linking in the generalized logistic item response model. Only item difficulties (bb item parameters) are allowed. Mean-mean linking and the methods of Haebara and Stocking-Lord are implemented (Kolen & Brennan, 2004).

Usage

equating.rasch(x, y, theta=seq(-4, 4, len=100),
       alpha1=0, alpha2=0)

Arguments

x

Matrix with two columns: First column items, second column item difficulties

y

Matrix with two columns: First columns item, second column item difficulties

theta

Vector of theta values at which the linking functions should be evaluated. If a weighting according to a prespecified normal distribution N(μ,σ2)N( \mu,\sigma^2) is aimed, then choose theta=stats::qnorm( seq(.001, .999, len=100), mean=mu, sd=sigma)

alpha1

Fixed α1\alpha_1 parameter in the generalized item response model

alpha2

Fixed α2\alpha_2 parameter in the generalized item response model

Value

B.est

Estimated linking constants according to the methods Mean.Mean (Mean-mean linking), Haebara (Haebara method) and Stocking.Lord (Stocking-Lord method).

descriptives

Descriptives of the linking. The linking error (linkerror) is calculated under the assumption of simple random sampling of items

anchor

Original and transformed item parameters of anchor items

transf.par

Original and transformed item parameters of all items

References

Kolen, M. J., & Brennan, R. L. (2004). Test Equating, Scaling, and Linking: Methods and Practices. New York: Springer.

See Also

For estimating standard errors (due to inference with respect to the item domain) of this procedure see equating.rasch.jackknife.

For linking several studies see linking.haberman or invariance.alignment.

A robust alternative to mean-mean linking is implemented in linking.robust.

For linking under more general item response models see the plink package.

Examples

#############################################################################
# EXAMPLE 1: Linking item parameters of the PISA study
#############################################################################

data(data.pisaPars)
pars <- data.pisaPars

# linking the two studies with the Rasch model
mod <- sirt::equating.rasch(x=pars[,c("item","study1")], y=pars[,c("item","study2")])
  ##   Mean.Mean    Haebara Stocking.Lord
  ## 1   0.08828 0.08896269    0.09292838

## Not run: 
#*** linking using the plink package
# The plink package is not available on CRAN anymore.
# You can download the package with
# utils::install.packages("plink", repos="http://www2.uaem.mx/r-mirror")
library(plink)
I <- nrow(pars)
pm <- plink::as.poly.mod(I)
# linking parameters
plink.pars1 <- list( "study1"=data.frame( 1, pars$study1, 0 ),
                     "study2"=data.frame( 1, pars$study2, 0 ) )
      # the parameters are arranged in the columns:
      # Discrimination, Difficulty, Guessing Parameter
# common items
common.items <- cbind("study1"=1:I,"study2"=1:I)
# number of categories per item
cats.item <- list( "study1"=rep(2,I), "study2"=rep(2,I))
# convert into plink object
x <- plink::as.irt.pars( plink.pars1, common.items, cat=cats.item,
          poly.mod=list(pm,pm))
# linking using plink: first group is reference group
out <- plink::plink(x, rescale="MS", base.grp=1, D=1.7)
# summary for linking
summary(out)
  ##   -------  group2/group1*  -------
  ##   Linking Constants
  ##
  ##                        A         B
  ##   Mean/Mean     1.000000 -0.088280
  ##   Mean/Sigma    1.000000 -0.088280
  ##   Haebara       1.000000 -0.088515
  ##   Stocking-Lord 1.000000 -0.096610
# extract linked parameters
pars.out <- plink::link.pars(out)

## End(Not run)

Jackknife Equating Error in Generalized Logistic Rasch Model

Description

This function estimates the linking error in linking based on Jackknife (Monseur & Berezner, 2007).

Usage

equating.rasch.jackknife(pars.data, display=TRUE,
   se.linkerror=FALSE, alpha1=0, alpha2=0)

Arguments

pars.data

Data frame with four columns: jackknife unit (1st column), item parameter study 1 (2nd column), item parameter study 2 (3rd column), item (4th column)

display

Display progress?

se.linkerror

Compute standard error of the linking error

alpha1

Fixed α1\alpha_1 parameter in the generalized item response model

alpha2

Fixed α2\alpha_2 parameter in the generalized item response model

Value

A list with following entries:

pars.data

Used item parameters

itemunits

Used units for jackknife

descriptives

Descriptives for Jackknife. linkingerror.jackknife is the estimated linking error.

References

Monseur, C., & Berezner, A. (2007). The computation of equating errors in international surveys in education. Journal of Applied Measurement, 8, 323-335.

See Also

For more details on linking methods see equating.rasch.

Examples

#############################################################################
# EXAMPLE 1: Linking errors PISA study
#############################################################################

data(data.pisaPars)
pars <- data.pisaPars

# Linking error: Jackknife unit is the testlet
vars <- c("testlet","study1","study2","item")
res1 <- sirt::equating.rasch.jackknife(pars[, vars])
res1$descriptives
  ##   N.items N.units      shift        SD linkerror.jackknife SE.SD.jackknife
  ## 1      25       8 0.09292838 0.1487387          0.04491197      0.03466309

# Linking error: Jackknife unit is the item
res2 <- sirt::equating.rasch.jackknife(pars[, vars ] )
res2$descriptives
  ##   N.items N.units      shift        SD linkerror.jackknife SE.SD.jackknife
  ## 1      25      25 0.09292838 0.1487387          0.02682839      0.02533327

Exploratory DETECT Analysis

Description

This function estimates the DETECT index (Stout, Habing, Douglas & Kim, 1996; Zhang & Stout, 1999a, 1999b) in an exploratory way. Conditional covariances of itempairs are transformed into a distance matrix such that items are clustered by the hierarchical Ward algorithm (Roussos, Stout & Marden, 1998). Note that the function will not provide the same output as the original DETECT software.

Usage

expl.detect(data, score, nclusters, N.est=NULL, seed=NULL, bwscale=1.1,
    smooth=TRUE, use_sum_score=FALSE, hclust_method="ward.D", estsample=NULL)

Arguments

data

An N×IN \times I data frame of dichotomous or polytomous responses. Missing responses are allowed.

score

An ability estimate, e.g. the WLE, sum score or mean score

nclusters

Maximum number of clusters used in the exploratory analysis

N.est

Number of students in a (possible) validation of the DETECT index. N.est students are drawn at random from data.

seed

Random seed

bwscale

Bandwidth scale factor

smooth

Logical indicating whether smoothing should be applied for conditional covariance estimation

use_sum_score

Logical indicating whether sum score should be used. With this option, the bias corrected conditional covariance of Zhang and Stout (1999) is used.

hclust_method

Clustering method used as the argument method in stats::hclust.

estsample

Optional vector of subject indices that defines the estimation sample

Value

A list with following entries

detect.unweighted

Unweighted DETECT statistics

detect.weighted

Weighted DETECT statistics. Weighting is done proportionally to sample sizes of item pairs.

clusterfit

Fit of the cluster method

itemcluster

Cluster allocations

use_sum_score

References

Roussos, L. A., Stout, W. F., & Marden, J. I. (1998). Using new proximity measures with hierarchical cluster analysis to detect multidimensionality. Journal of Educational Measurement, 35, 1-30.

Stout, W., Habing, B., Douglas, J., & Kim, H. R. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20, 331-354.

Zhang, J., & Stout, W. (1999a). Conditional covariance structure of generalized compensatory multidimensional items, Psychometrika, 64, 129-152.

Zhang, J., & Stout, W. (1999b). The theoretical DETECT index of dimensionality and its application to approximate simple structure, Psychometrika, 64, 213-249.

See Also

For examples see conf.detect.


Functional Unidimensional Item Response Model

Description

Estimates the functional unidimensional item response model for dichotomous data (Ip, Molenberghs, Chen, Goegebeur & De Boeck, 2013). Either the IRT model is estimated using a probit link and employing tetrachoric correlations or item discriminations and intercepts of a pre-estimated multidimensional IRT model are provided as input.

Usage

f1d.irt(dat=NULL, nnormal=1000, nfactors=3, A=NULL, intercept=NULL,
    mu=NULL, Sigma=NULL, maxiter=100, conv=10^(-5), progress=TRUE)

Arguments

dat

Data frame with dichotomous item responses

nnormal

Number of θp\theta_p grid points for approximating the normal distribution

nfactors

Number of dimensions to be estimated

A

Matrix of item discriminations (if the IRT model is already estimated)

intercept

Vector of item intercepts (if the IRT model is already estimated)

mu

Vector of estimated means. In the default it is assumed that all means are zero.

Sigma

Estimated covariance matrix. In the default it is the identity matrix.

maxiter

Maximum number of iterations

conv

Convergence criterion

progress

Display progress? The default is TRUE.

Details

The functional unidimensional item response model (F1D model) for dichotomous item responses is based on a multidimensional model with a link function gg (probit or logit):

P(Xpi=1θp)=g(daidθpddi)P( X_{pi}=1 | \bold{\theta}_p )= g( \sum_d a_{id} \theta_{pd} - d_i )

It is assumed that θp\bold{\theta}_p is multivariate normally distribution with a zero mean vector and identity covariance matrix.

The F1D model estimates unidimensional item response functions such that

P(Xpi=1θp)g(aiθpdi)P( X_{pi}=1 | \theta_p^\ast ) \approx g \left( a_{i}^\ast \theta_{p}^\ast - d_i^\ast \right)

The optimization function FF minimizes the deviations of the approximation equations

aiθpdidaidθpddia_{i}^\ast \theta_{p}^\ast - d_i^\ast \approx \sum_d a_{id} \theta_{pd} - d_i

The optimization function FF is defined by

F({ai,di}i,{θp}p)=piwp(aidθpddiaiθp+di)2Min!F( \{ a_i^\ast, d_i^\ast \}_i, \{ \theta_p^\ast \}_p )= \sum_p \sum_i w_p ( a_{id} \theta_{pd} - d_i- a_{i}^\ast \theta_{p}^\ast + d_i^\ast )^2 \rightarrow Min!

All items ii are equally weighted whereas the ability distribution of persons pp are weighted according to the multivariate normal distribution (using weights wpw_p). The estimation is conducted using an alternating least squares algorithm (see Ip et al. 2013 for a different algorithm). The ability distribution θp\theta_p^\ast of the functional unidimensional model is assumed to be standardized, i.e. does have a zero mean and a standard deviation of one.

Value

A list with following entries:

item

Data frame with estimated item parameters: Item intercepts for the functional unidimensional aia_{i}^\ast (ai.ast) and the ('ordinary') unidimensional (ai0) item response model. The same holds for item intercepts did_{i}^\ast (di.ast and di0 respectively).

person

Data frame with estimated θp\theta_p^\ast distribution. Locations are theta.ast with corresponding probabilities in wgt.

A

Estimated or provided item discriminations

intercept

Estimated or provided intercepts

dat

Used dataset

tetra

Object generated by tetrachoric2 if dat is specified as input. This list entry is useful for applying greenyang.reliability.

References

Ip, E. H., Molenberghs, G., Chen, S. H., Goegebeur, Y., & De Boeck, P. (2013). Functionally unidimensional item response models for multivariate binary data. Multivariate Behavioral Research, 48, 534-562.

See Also

For estimation of bifactor models and Green-Yang reliability based on tetrachoric correlations see greenyang.reliability.

For estimation of bifactor models based on marginal maximum likelihood (i.e. full information maximum likelihood) see the TAM::tam.fa function in the TAM package.

Examples

#############################################################################
# EXAMPLE 1: Dataset Mathematics data.math | Exploratory multidimensional model
#############################################################################
data(data.math)
dat <- ( data.math$data )[, -c(1,2) ] # select Mathematics items

#****
# Model 1: Functional unidimensional model based on original data

#++ (1) estimate model with 3 factors
mod1 <- sirt::f1d.irt( dat=dat, nfactors=3)

#++ (2) plot results
     par(mfrow=c(1,2))
# Intercepts
plot( mod1$item$di0, mod1$item$di.ast, pch=16, main="Item Intercepts",
        xlab=expression( paste( d[i], " (Unidimensional Model)" )),
        ylab=expression( paste( d[i], " (Functional Unidimensional Model)" )))
abline( lm(mod1$item$di.ast ~ mod1$item$di0), col=2, lty=2 )
# Discriminations
plot( mod1$item$ai0, mod1$item$ai.ast, pch=16, main="Item Discriminations",
        xlab=expression( paste( a[i], " (Unidimensional Model)" )),
        ylab=expression( paste( a[i], " (Functional Unidimensional Model)" )))
abline( lm(mod1$item$ai.ast ~ mod1$item$ai0), col=2, lty=2 )
     par(mfrow=c(1,1))

#++ (3) estimate bifactor model and Green-Yang reliability
gy1 <- sirt::greenyang.reliability( mod1$tetra, nfactors=3 )

## Not run: 
#****
# Model 2: Functional unidimensional model based on estimated multidimensional
#          item response model

#++ (1) estimate 2-dimensional exploratory factor analysis with 'smirt'
I <- ncol(dat)
Q <- matrix( 1, I,2 )
Q[1,2] <- 0
variance.fixed <- cbind( 1,2,0 )
mod2a <- sirt::smirt( dat, Qmatrix=Q, irtmodel="comp", est.a="2PL",
                variance.fixed=variance.fixed, maxiter=50)
#++ (2) input estimated discriminations and intercepts for
#       functional unidimensional model
mod2b <- sirt::f1d.irt( A=mod2a$a, intercept=mod2a$b )

#############################################################################
# EXAMPLE 2: Dataset Mathematics data.math | Confirmatory multidimensional model
#############################################################################

data(data.math)
library(TAM)

# dataset
dat <- data.math$data
dat <- dat[, grep("M", colnames(dat) ) ]
# extract item informations
iteminfo <- data.math$item
I <- ncol(dat)
# define Q-matrix
Q <- matrix( 0, nrow=I, ncol=3 )
Q[ grep( "arith", iteminfo$domain ), 1 ] <- 1
Q[ grep( "Meas", iteminfo$domain ), 2 ] <- 1
Q[ grep( "geom", iteminfo$domain ), 3 ] <- 1

# fit three-dimensional model in TAM
mod1 <- TAM::tam.mml.2pl(  dat, Q=Q, control=list(maxiter=40, snodes=1000) )
summary(mod1)

# specify functional unidimensional model
intercept <- mod1$xsi[, c("xsi") ]
names(intercept) <- rownames(mod1$xsi)
fumod1 <- sirt::f1d.irt( A=mod1$B[,2,], intercept=intercept, Sigma=mod1$variance)
fumod1$item

## End(Not run)

Fitting the ISOP and ADISOP Model for Frequency Tables

Description

Fit the isotonic probabilistic model (ISOP; Scheiblechner, 1995) and the additive isotonic probabilistic model (ADISOP; Scheiblechner, 1999).

Usage

fit.isop(freq.correct, wgt, conv=1e-04, maxit=100,
      progress=TRUE, calc.ll=TRUE)

fit.adisop(freq.correct, wgt, conv=1e-04, maxit=100,
      epsilon=0.01, progress=TRUE, calc.ll=TRUE)

Arguments

freq.correct

Frequency table

wgt

Weights for frequency table (number of persons in each cell)

conv

Convergence criterion

maxit

Maximum number of iterations

epsilon

Additive constant to handle cell frequencies of 0 or 1 in fit.adisop

progress

Display progress?

calc.ll

Calculate log-likelihood values? The default is TRUE.

Details

See isop.dich for more details of the ISOP and ADISOP model.

Value

A list with following entries

fX

Fitted frequency table

ResX

Residual frequency table

fit

Fit statistic: weighted least squares of deviations between observed and expected frequencies

item.sc

Estimated item parameters

person.sc

Estimated person parameters

ll

Log-likelihood of the model

freq.fitted

Fitted frequencies in a long data frame

Note

For fitting the ADISOP model it is recommended to first fit the ISOP model and then proceed with the fitted frequency table from ISOP (see Examples).

References

Scheiblechner, H. (1995). Isotonic ordinal probabilistic models (ISOP). Psychometrika, 60, 281-304.

Scheiblechner, H. (1999). Additive conjoint isotonic probabilistic models (ADISOP). Psychometrika, 64, 295-316.

See Also

For fitting the ISOP model to dichotomous and polytomous data see isop.dich.

Examples

#############################################################################
# EXAMPLE 1: Dataset Reading
#############################################################################

data(data.read)
dat <- as.matrix( data.read)
dat.resp <- 1 - is.na(dat) # response indicator matrix
I <- ncol(dat)

#***
# (1) Data preparation
#     actually only freq.correct and wgt are needed
#     but these matrices must be computed in advance.

# different scores of students
stud.p <- rowMeans( dat, na.rm=TRUE )
# different item p values
item.p <- colMeans( dat, na.rm=TRUE )
item.ps <- sort( item.p, index.return=TRUE)
dat <- dat[,  item.ps$ix ]
# define score groups students
scores <- sort( unique( stud.p ) )
SC <- length(scores)
# create table
freq.correct <- matrix( NA, SC, I )
wgt <- freq.correct
# percent correct
a1 <- stats::aggregate( dat==1, list( stud.p ), mean, na.rm=TRUE )
freq.correct <- a1[,-1]
# weights
a1 <- stats::aggregate( dat.resp, list( stud.p ), sum, na.rm=TRUE )
wgt <- a1[,-1]

#***
# (2) Fit ISOP model
res.isop <- sirt::fit.isop( freq.correct, wgt )
# fitted frequency table
res.isop$fX

#***
# (3) Fit ADISOP model
# use monotonely smoothed frequency table from ISOP model
res.adisop <- sirt::fit.adisop( freq.correct=res.isop$fX, wgt )
# fitted frequency table
res.adisop$fX

Clustering for Continuous Fuzzy Data

Description

This function performs clustering for continuous fuzzy data for which membership functions are assumed to be Gaussian (Denoeux, 2013). The mixture is also assumed to be Gaussian and (conditionally cluster membership) independent.

Usage

fuzcluster(dat_m, dat_s, K=2, nstarts=7, seed=NULL, maxiter=100,
     parmconv=0.001, fac.oldxsi=0.75, progress=TRUE)

## S3 method for class 'fuzcluster'
summary(object,...)

Arguments

dat_m

Centers for individual item specific membership functions

dat_s

Standard deviations for individual item specific membership functions

K

Number of latent classes

nstarts

Number of random starts. The default is 7 random starts.

seed

Simulation seed. If one value is provided, then only one start is performed.

maxiter

Maximum number of iterations

parmconv

Maximum absolute change in parameters

fac.oldxsi

Convergence acceleration factor which should take values between 0 and 1. The default is 0.75.

progress

An optional logical indicating whether iteration progress should be displayed.

object

Object of class fuzcluster

...

Further arguments to be passed

Value

A list with following entries

deviance

Deviance

iter

Number of iterations

pi_est

Estimated class probabilities

mu_est

Cluster means

sd_est

Cluster standard deviations

posterior

Individual posterior distributions of cluster membership

seed

Simulation seed for cluster solution

ic

Information criteria

References

Denoeux, T. (2013). Maximum likelihood estimation from uncertain data in the belief function framework. IEEE Transactions on Knowledge and Data Engineering, 25, 119-130.

See Also

See fuzdiscr for estimating discrete distributions for fuzzy data.

See the fclust package for fuzzy clustering.

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: 2 classes and 3 items
#############################################################################

#*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
# simulate data (2 classes and 3 items)
set.seed(876)
library(mvtnorm)
Ntot <- 1000  # number of subjects
# define SDs for simulating uncertainty
sd_uncertain <- c( .2, 1, 2 )

dat_m <- NULL   # data frame containing mean of membership function
dat_s <- NULL   # data frame containing SD of membership function

# *** Class 1
pi_class <- .6
Nclass <- Ntot * pi_class
mu <- c(3,1,0)
Sigma <- diag(3)
# simulate data
dat_m1 <- mvtnorm::rmvnorm( Nclass, mean=mu, sigma=Sigma )
dat_s1 <- matrix( stats::runif( Nclass * 3 ), nrow=Nclass )
for ( ii in 1:3){ dat_s1[,ii] <- dat_s1[,ii] * sd_uncertain[ii] }
dat_m <- rbind( dat_m, dat_m1 )
dat_s <- rbind( dat_s, dat_s1 )

# *** Class 2
pi_class <- .4
Nclass <- Ntot * pi_class
mu <- c(0,-2,0.4)
Sigma <- diag(c(0.5, 2, 2 ) )
# simulate data
dat_m1 <- mvtnorm::rmvnorm( Nclass, mean=mu, sigma=Sigma )
dat_s1 <- matrix( stats::runif( Nclass * 3 ), nrow=Nclass )
for ( ii in 1:3){ dat_s1[,ii] <- dat_s1[,ii] * sd_uncertain[ii] }
dat_m <- rbind( dat_m, dat_m1 )
dat_s <- rbind( dat_s, dat_s1 )
colnames(dat_s) <- colnames(dat_m) <- paste0("I", 1:3 )

#*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
# estimation

#*** Model 1: Clustering with 8 random starts
res1 <- sirt::fuzcluster(K=2,dat_m, dat_s, nstarts=8, maxiter=25)
summary(res1)
  ##  Number of iterations=22 (Seed=5090 )
  ##  ---------------------------------------------------
  ##  Class probabilities (2 Classes)
  ##  [1] 0.4083 0.5917
  ##
  ##  Means
  ##           I1      I2     I3
  ##  [1,] 0.0595 -1.9070 0.4011
  ##  [2,] 3.0682  1.0233 0.0359
  ##
  ##  Standard deviations
  ##         [,1]   [,2]   [,3]
  ##  [1,] 0.7238 1.3712 1.2647
  ##  [2,] 0.9740 0.8500 0.7523

#*** Model 2: Clustering with one start with seed 4550
res2 <- sirt::fuzcluster(K=2,dat_m, dat_s, nstarts=1, seed=5090 )
summary(res2)

#*** Model 3: Clustering for crisp data
#             (assuming no uncertainty, i.e. dat_s=0)
res3 <- sirt::fuzcluster(K=2,dat_m, dat_s=0*dat_s, nstarts=30, maxiter=25)
summary(res3)
  ##  Class probabilities (2 Classes)
  ##  [1] 0.3645 0.6355
  ##
  ##  Means
  ##           I1      I2      I3
  ##  [1,] 0.0463 -1.9221  0.4481
  ##  [2,] 3.0527  1.0241 -0.0008
  ##
  ##  Standard deviations
  ##         [,1]   [,2]   [,3]
  ##  [1,] 0.7261 1.4541 1.4586
  ##  [2,] 0.9933 0.9592 0.9535

#*** Model 4: kmeans cluster analysis
res4 <- stats::kmeans( dat_m, centers=2 )
  ##   K-means clustering with 2 clusters of sizes 607, 393
  ##   Cluster means:
  ##             I1        I2          I3
  ##   1 3.01550780  1.035848 -0.01662275
  ##   2 0.03448309 -2.008209  0.48295067

## End(Not run)

Estimation of a Discrete Distribution for Fuzzy Data (Data in Belief Function Framework)

Description

This function estimates a discrete distribution for uncertain data based on the belief function framework (Denoeux, 2013; see Details).

Usage

fuzdiscr(X, theta0=NULL, maxiter=200, conv=1e-04)

Arguments

X

Matrix with fuzzy data. Rows corresponds to subjects and columns to values of the membership function

theta0

Initial vector of parameter estimates

maxiter

Maximum number of iterations

conv

Convergence criterion

Details

For nn subjects, membership functions mn(k)m_n(k) are observed which indicate the belief in data Xn=kX_n=k. The membership function is interpreted as epistemic uncertainty (Denoeux, 2011). However, associated parameters in statistical models are crisp which means that models are formulated at the basis of precise (crisp) data if they would be observed.

In the present estimation problem of a discrete distribution, the parameters of interest are category probabilities θk=P(X=k)\theta_k=P( X=k).

The parameter estimation follows the evidential EM algorithm (Denoeux, 2013).

Value

Vector of probabilities of the discrete distribution

References

Denoeux, T. (2011). Maximum likelihood estimation from fuzzy data using the EM algorithm. Fuzzy Sets and Systems, 183, 72-91.

Denoeux, T. (2013). Maximum likelihood estimation from uncertain data in the belief function framework. IEEE Transactions on Knowledge and Data Engineering, 25, 119-130.

Examples

#############################################################################
# EXAMPLE 1: Binomial distribution Denoeux Example 4.3 (2013)
#############################################################################

#*** define uncertain data
X_alpha <- function( alpha ){
    Q <- matrix( 0, 6, 2 )
    Q[5:6,2] <- Q[1:3,1] <- 1
    Q[4,] <- c( alpha, 1 - alpha )
    return(Q)
        }

# define data for alpha=0.5
X <- X_alpha( alpha=.5 )
  ##   > X
  ##        [,1] [,2]
  ##   [1,]  1.0  0.0
  ##   [2,]  1.0  0.0
  ##   [3,]  1.0  0.0
  ##   [4,]  0.5  0.5
  ##   [5,]  0.0  1.0
  ##   [6,]  0.0  1.0

  ## The fourth observation has equal plausibility for the first and the
  ## second category.

# parameter estimate uncertain data
fuzdiscr( X )
  ##   > sirt::fuzdiscr( X )
  ##   [1] 0.5999871 0.4000129

# parameter estimate pseudo likelihood
colMeans( X )
  ##   > colMeans( X )
  ##   [1] 0.5833333 0.4166667
##-> Observations are weighted according to belief function values.

#*****
# plot parameter estimates as function of alpha
alpha <- seq( 0, 1, len=100 )
res <- sapply( alpha, FUN=function(aa){
             X <- X_alpha( alpha=aa )
             c( sirt::fuzdiscr( X )[1], colMeans( X )[1] )
                    } )
# plot
plot( alpha, res[1,], xlab=expression(alpha), ylab=expression( theta[alpha] ), type="l",
        main="Comparison Belief Function and Pseudo-Likelihood (Example 1)")
lines( alpha, res[2,], lty=2, col=2)
legend( 0, .67, c("Belief Function", "Pseudo-Likelihood" ), col=c(1,2), lty=c(1,2) )

#############################################################################
# EXAMPLE 2: Binomial distribution (extends Example 1)
#############################################################################

X_alpha <- function( alpha ){
    Q <- matrix( 0, 6, 2 )
    Q[6,2] <- Q[1:2,1] <- 1
    Q[3:5,] <- matrix( c( alpha, 1 - alpha ), 3, 2, byrow=TRUE)
    return(Q)
        }

X <- X_alpha( alpha=.5 )
alpha <- seq( 0, 1, len=100 )
res <- sapply( alpha, FUN=function(aa){
           X <- X_alpha( alpha=aa )
           c( sirt::fuzdiscr( X )[1], colMeans( X )[1] )
                    } )
# plot
plot( alpha, res[1,], xlab=expression(alpha), ylab=expression( theta[alpha] ), type="l",
        main="Comparison Belief Function and Pseudo-Likelihood (Example 2)")
lines( alpha, res[2,], lty=2, col=2)
legend( 0, .67, c("Belief Function", "Pseudo-Likelihood" ), col=c(1,2), lty=c(1,2) )

#############################################################################
# EXAMPLE 3: Multinomial distribution with three categories
#############################################################################

# define uncertain data
X <- matrix( c( 1,0,0, 1,0,0,   0,1,0, 0,0,1, .7, .2, .1,
         .4, .6, 0 ), 6, 3, byrow=TRUE )
  ##   > X
  ##        [,1] [,2] [,3]
  ##   [1,]  1.0  0.0  0.0
  ##   [2,]  1.0  0.0  0.0
  ##   [3,]  0.0  1.0  0.0
  ##   [4,]  0.0  0.0  1.0
  ##   [5,]  0.7  0.2  0.1
  ##   [6,]  0.4  0.6  0.0

##->  Only the first four observations are crisp.

#*** estimation for uncertain data
fuzdiscr( X )
  ##   > sirt::fuzdiscr( X )
  ##   [1] 0.5772305 0.2499931 0.1727764

#*** estimation pseudo-likelihood
colMeans(X)
  ##   > colMeans(X)
  ##   [1] 0.5166667 0.3000000 0.1833333

##-> Obviously, the treatment uncertainty is different in belief function
##   and in pseudo-likelihood framework.

Discrete (Rasch) Grade of Membership Model

Description

This function estimates the grade of membership model (Erosheva, Fienberg & Joutard, 2007; also called mixed membership model) by the EM algorithm assuming a discrete membership score distribution. The function is restricted to dichotomous item responses.

Usage

gom.em(dat, K=NULL, problevels=NULL, weights=NULL, model="GOM", theta0.k=seq(-5,5,len=15),
    xsi0.k=exp(seq(-6, 3, len=15)), max.increment=0.3, numdiff.parm=1e-4,
    maxdevchange=1e-6, globconv=1e-4, maxiter=1000, msteps=4, mstepconv=0.001,
    theta_adjust=FALSE, lambda.inits=NULL, lambda.index=NULL, pi.k.inits=NULL,
    newton_raphson=TRUE, optimizer="nlminb", progress=TRUE)

## S3 method for class 'gom'
summary(object, file=NULL, ...)

## S3 method for class 'gom'
anova(object,...)

## S3 method for class 'gom'
logLik(object,...)

## S3 method for class 'gom'
IRT.irfprob(object,...)

## S3 method for class 'gom'
IRT.likelihood(object,...)

## S3 method for class 'gom'
IRT.posterior(object,...)

## S3 method for class 'gom'
IRT.modelfit(object,...)

## S3 method for class 'IRT.modelfit.gom'
summary(object,...)

Arguments

dat

Data frame with dichotomous responses

K

Number of classes (only applies for model="GOM")

problevels

Vector containing probability levels for membership functions (only applies for model="GOM"). If a specific space of probability levels should be estimated, then a matrix can be supplied (see Example 1, Model 2a).

weights

Optional vector of sampling weights

model

The type of grade of membership model. The default "GOM" is the nonparametric grade of membership model. A parametric multivariate normal representation can be requested by "GOMnormal". The probabilities and membership functions specifications described in Details are called via "GOMRasch".

theta0.k

Vector of θ~k\tilde{\theta}_k grid (applies only for model="GOMRasch")

xsi0.k

Vector of ξp\xi_p grid (applies only for model="GOMRasch")

max.increment

Maximum increment

numdiff.parm

Numerical differentiation parameter

maxdevchange

Convergence criterion for change in relative deviance

globconv

Global convergence criterion for parameter change

maxiter

Maximum number of iterations

msteps

Number of iterations within a M step

mstepconv

Convergence criterion within a M step

theta_adjust

Logical indicating whether multivariate normal distribution should be adaptively chosen during the EM algorithm.

lambda.inits

Initial values for item parameters

lambda.index

Optional integer matrix with integers indicating equality constraints among λ\lambda item parameters

pi.k.inits

Initial values for distribution parameters

newton_raphson

Logical indicating whether Newton-Raphson should be used for final iterations

optimizer

Type of optimizer. Can be "optim" or "nlminb".

progress

Display iteration progress? Default is TRUE.

object

Object of class gom

file

Optional file name for summary output

...

Further arguments to be passed

Details

The item response model of the grade of membership model (Erosheva, Fienberg & Junker, 2002; Erosheva, Fienberg & Joutard, 2007) with KK classes for dichotomous correct responses XpiX_{pi} of person pp on item ii is as follows (model="GOM")

P(Xpi=1gp1,,gpK)=kλikgpk,k=1Kgpk=1,0gpk1P(X_{pi}=1 | g_{p1}, \ldots, g_{pK} )=\sum_k \lambda_{ik} g_{pk} \quad, \quad \sum_{k=1}^K g_{pk}=1 \quad, \quad 0 \leq g_{pk} \leq 1

In most applications (e.g. Erosheva et al., 2007), the grade of membership function {gpk}\{g_{pk}\} is assumed to follow a Dirichlet distribution. In our gom.em implementation the membership function is assumed to be discretely represented by a grid u=(u1,,uL)u=(u_1, \ldots, u_L) with entries between 0 and 1 (e.g. seq(0,1,length=5) with L=5L=5). The values gpkg_{pk} of the membership function can then only take values in {u1,,uL}\{ u_1, \ldots, u_L \} with the restriction kgpkl1(gpk=ul)=1\sum_k g_{pk} \sum_l \bold{1}(g_{pk}=u_l )=1. The grid uu is specified by using the argument problevels.

The Rasch grade of membership model (model="GOMRasch") poses constraints on probabilities λik\lambda_{ik}