| Title: | Bayesian Psychometric Measurement Using 'Stan' |
|---|---|
| Description: | Estimate diagnostic classification models (also called cognitive diagnostic models) with 'Stan'. Diagnostic classification models are confirmatory latent class models, as described by Rupp et al. (2010, ISBN: 978-1-60623-527-0). Automatically generate 'Stan' code for the general loglinear cognitive diagnostic diagnostic model proposed by Henson et al. (2009) <doi:10.1007/s11336-008-9089-5> and other subtypes that introduce additional model constraints. Using the generated 'Stan' code, estimate the model evaluate the model's performance using model fit indices, information criteria, and reliability metrics. |
| Authors: | W. Jake Thompson [aut, cre] (ORCID: <https://orcid.org/0000-0001-7339-0300>), Jeffrey Hoover [aut] (ORCID: <https://orcid.org/0000-0002-0276-0308>), Auburn Jimenez [ctb] (ORCID: <https://orcid.org/0000-0002-7072-2960>), Nathan Jones [ctb] (ORCID: <https://orcid.org/0000-0001-6177-7161>), Matthew Johnson [cph] (Authored code adapted for measrdcm method for `reliability()`), University of Kansas [cph], Institute of Education Sciences [fnd] |
| Maintainer: | W. Jake Thompson <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 2.0.1.9000 |
| Built: | 2026-05-31 06:10:43 UTC |
| Source: | https://github.com/r-dcm/measr |
Calculate information criteria for diagnostic models not estimated with full
Markov chain Monte Carlo (i.e., with method = "optim"). Available
information include the Akaike information criterion (AIC; Akaike, 1973) and
the Bayesian information criterion (BIC; Schwarz, 1978).
aic(x, ..., force = FALSE) bic(x, ..., force = FALSE)aic(x, ..., force = FALSE) bic(x, ..., force = FALSE)
x |
A measrdcm object estimated with
|
... |
Unused. |
force |
If the criterion has already been added to the
model object with |
The numeric value of the information criterion.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), Proceedings of the Second International Symposium on Information Theory (pp. 267-281). Akademiai Kiado.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. doi:10.1214/aos/1176344136
model_spec <- dcm_specify( qmatrix = dcmdata::mdm_qmatrix, identifier = "item" ) model <- dcm_estimate( dcm_spec = model_spec, data = dcmdata::mdm_data, identifier = "respondent", method = "optim", seed = 63277 ) aic(model) bic(model)model_spec <- dcm_specify( qmatrix = dcmdata::mdm_qmatrix, identifier = "item" ) model <- dcm_estimate( dcm_spec = model_spec, data = dcmdata::mdm_data, identifier = "respondent", method = "optim", seed = 63277 ) aic(model) bic(model)
Calculate the Bayes factor for model comparisons, which represents the posterior odds of the null hypothesis when the prior probability of the null model is 0.5 (Jeffreys, 1935; Kass & Raftery, 1995). Consistent with the Bayesian reporting guidelines from Kruschke (2021), we calculate the posterior probability of the null model for a variety of prior probabilities, in addition to the Bayes factor.
bayes_factor( x, ..., model_names = NULL, prior_prob = seq(0.02, 0.98, by = 0.02) )bayes_factor( x, ..., model_names = NULL, prior_prob = seq(0.02, 0.98, by = 0.02) )
x |
A measrdcm object. |
... |
Additional measrdcm to be compared to |
model_names |
Names given to each provided model in the comparison
output. If |
prior_prob |
A numeric vector of prior probabilities for the null model used to calculate the posterior probability of the null model relative to alternative model. See details for more information. |
Bayes factors will be calculated for all possible pairwise comparisons
between the models provided to x and .... In each comparison, one model
is identified as the null model, and the other is the alternative. This
distinction is not terribly meaningful from a calculation standpoint, as the
probabilities for the alternative model are simply 1 minus the null
probabilities. If you want particular models to be labeled as the "null", the
determination is made by the order the models are sent to the function. That
is, x will always be the null model. The first model included in ... will
be the alternative model when compared to x and the null model when
compared to all other models included in .... Similarly, the second model
included in ... will be the alternative model when compared to x and the
first model included in ... and the null model in all other comparisons.
prior_prob is used to specify a vector of possible prior probabilities for
the null model. These are used in conjunction with the Bayes factor to
determine the posterior model probability for the null model, relative to the
alternative model. The posterior probability for the alternative model can
be calculated as 1 minus the null model's posterior probability. You may
specify a specific prior probability, or specify a range of possibilities to
construct a graph similar to Kruschke's (2021) Figure 1. These probabilities
can be interpreted as, "If the prior probability is {prior_prob_null},
then the posterior is {posterior_prob_null}" (or 1 minus for the
alternative model).
A tibble with one row per model comparison and four columns.
null_model: The null model in the comparison.
alt_model: The alternative model in the comparison.
bf: The estimated Bayes factor.
posterior_probs: A nested list column, where element element is a
tibble with two columns:
prior_prob_null: The prior probability that the null model is
correct.
posterior_prob_null: The posterior probability that the null model is
correct.
The list column can be unnested with tidyr::unnest() (see examples). If
prior_prob is NULL, the posterior_probs column is excluded from
the returned object.
Jeffreys, H. (1935). Some tests of significance, treated by the theory of probability. Mathematical Proceedings of the Cambridge Philosophical Society, 31(2), 203-222. doi:10.1017/S030500410001330X
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773-795. doi:10.1080/01621459.1995.10476572
Kruschke, J. K. (2021). Bayesian analysis reporting guidelines. Nature, 5, 1282-1291. doi:10.1038/s41562-021-01177-7
mdm_dina <- dcm_estimate( dcm_specify( qmatrix = dcmdata::mdm_qmatrix, identifier = "item", measurement_model = dina() ), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "mcmc", seed = 63277, backend = "rstan", iter = 700, warmup = 500, chains = 2, refresh = 0 ) mdm_dino <- dcm_estimate( dcm_specify( qmatrix = dcmdata::mdm_qmatrix, identifier = "item", measurement_model = dino() ), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "mcmc", seed = 63277, backend = "rstan", iter = 700, warmup = 500, chains = 2, refresh = 0 ) bf <- bayes_factor(mdm_dina, mdm_dino) bf tidyr::unnest(bf, "posterior_probs")mdm_dina <- dcm_estimate( dcm_specify( qmatrix = dcmdata::mdm_qmatrix, identifier = "item", measurement_model = dina() ), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "mcmc", seed = 63277, backend = "rstan", iter = 700, warmup = 500, chains = 2, refresh = 0 ) mdm_dino <- dcm_estimate( dcm_specify( qmatrix = dcmdata::mdm_qmatrix, identifier = "item", measurement_model = dino() ), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "mcmc", seed = 63277, backend = "rstan", iter = 700, warmup = 500, chains = 2, refresh = 0 ) bf <- bayes_factor(mdm_dina, mdm_dino) bf tidyr::unnest(bf, "posterior_probs")
The cognitive diagnostic index (CDI) is a measure of how well an assessment is able to distinguish between attribute profiles. The index was originally proposed by Henson & Douglas (2005) for item- and test-level discrimination, and then expanded by Henson et al. (2008) to include attribute-level discrimination indices.
cdi(model, weight_prevalence = TRUE)cdi(model, weight_prevalence = TRUE)
model |
The estimated model to be evaluated. |
weight_prevalence |
Logical indicating whether the discrimination indices should be weighted by the prevalence of the attribute profiles. See details for additional information. |
Henson et al. (2008) described two attribute-level discrimination indices,
(Equation 8) and
(Equation 13), which are similar in that
both are the sum of item-level discrimination indices.
In both cases, item-level discrimination indices are calculated as the
average of Kullback-Leibler information for all pairs of attributes profiles
for the item.
The item-level indices are then summed to achieve the test-level
discrimination index for each attribute, or the test overall.
However, whereas is an unweighted
average of the Kullback-Leibler information,
is a weighted average, where the weight
is defined by the prevalence of each profile (i.e.,
measr_extract(model, what = "strc_param")).
A list with two elements:
item_discrimination: A tibble with one row
per item containing the CDI for the item and any relevant
attributes.
test_discrimination: A tibble with one row
containing the total CDI for the assessment and for each
attribute.
Henson, R., & Douglas, J. (2005). Test construction for cognitive diagnosis. Applied Psychological Measurement, 29(4), 262-277. doi:10.1177/0146621604272623
Henson, R., Roussos, L., Douglas, J., & Xuming, H. (2008). Cognitive diagnostic attribute-level discrimination indices. Applied Psychological Measurement, 32(4), 275-288. doi:10.1177/0146621607302478
rstn_ecpe_lcdm <- dcm_estimate( dcm_specify(dcmdata::ecpe_qmatrix, identifier = "item_id"), data = dcmdata::ecpe_data, missing = NA, identifier = "resp_id", method = "optim", seed = 63277, backend = "rstan" ) cdi(rstn_ecpe_lcdm)rstn_ecpe_lcdm <- dcm_estimate( dcm_specify(dcmdata::ecpe_qmatrix, identifier = "item_id"), data = dcmdata::ecpe_data, missing = NA, identifier = "resp_id", method = "optim", seed = 63277, backend = "rstan" ) cdi(rstn_ecpe_lcdm)
Estimate diagnostic classification models (DCMs; also known as cognitive diagnostic models) using 'Stan'. Models can be estimated using Stan's optimizer, or full Markov chain Monte Carlo (MCMC).
dcm_estimate( dcm_spec, data, missing = NA, identifier = NULL, method = c("variational", "mcmc", "optim", "pathfinder"), backend = getOption("measr.backend", "rstan"), file = NULL, file_refit = getOption("measr.file_refit", "never"), ... )dcm_estimate( dcm_spec, data, missing = NA, identifier = NULL, method = c("variational", "mcmc", "optim", "pathfinder"), backend = getOption("measr.backend", "rstan"), file = NULL, file_refit = getOption("measr.file_refit", "never"), ... )
dcm_spec |
A DCM specification created with |
data |
Response data. A data frame with 1 row per respondent and 1 column per item. |
missing |
An |
identifier |
Optional. Variable name of a column in |
method |
Estimation method. Options are |
backend |
Character string naming the package to use as the backend for
fitting the Stan model. Options are |
file |
Either |
file_refit |
Controls when a saved model is refit. Options are
|
... |
Additional arguments passed to Stan.
|
A measrdcm object.
model_spec <- dcm_specify( qmatrix = dcmdata::mdm_qmatrix, identifier = "item" ) model <- dcm_estimate( dcm_spec = model_spec, data = dcmdata::mdm_data, identifier = "respondent", method = "optim", seed = 63277 )model_spec <- dcm_specify( qmatrix = dcmdata::mdm_qmatrix, identifier = "item" ) model <- dcm_estimate( dcm_spec = model_spec, data = dcmdata::mdm_data, identifier = "respondent", method = "optim", seed = 63277 )
For models estimated with a method that results in posterior distributions (e.g., "mcmc", "variational"), use the posterior distributions to compute expected distributions for fit statistics and compare to values in the observed data.
fit_ppmc( x, ..., model_fit = NULL, item_fit = NULL, ndraws = NULL, probs = c(0.025, 0.975), return_draws = 0, force = FALSE )fit_ppmc( x, ..., model_fit = NULL, item_fit = NULL, ndraws = NULL, probs = c(0.025, 0.975), return_draws = 0, force = FALSE )
x |
An estimated model object (e.g., from |
... |
Unused. For future extensions. |
model_fit |
The posterior predictive model checks to compute for an
evaluation of model-level fit. If |
item_fit |
The posterior predictive model checks to compute for an
evaluation of item-level fit. If |
ndraws |
The number of posterior draws to base the checks on. Must be
less than or equal to the total number of posterior draws retained in the
estimated model. If |
probs |
The percentiles to be computed by the |
return_draws |
Number of posterior draws for each specified fit
statistic to be returned. This does not affect the calculation of the
posterior predictive checks, but can be useful for visualizing the fit
statistics. Must be less than |
force |
If all requested PPMCs have already been added to the
model object using |
Posterior predictive model checks (PPMCs) use the posterior distribution of an estimated model to compute different statistics. This creates an expected distribution of the given statistic, if our estimated parameters are correct. We then compute the statistic in our observed data and compare the observed value to the expected distribution. Observed values that fall outside of the expected distributions indicate incompatibility between the estimated model and the observed data.
For DCMs, we currently support PPMCs at the model and item level. At the
model level, we calculate the expected raw score distribution
(model_fit = "raw_score") as described by Thompson (2019) and Park et al.
(2015). At the item level, we can calculate the conditional probability that
a respondent in each class provides a correct response (item_fit = "conditional_prob") as described by Thompson (2019) and Sinharay & Almond
(2007) or the overall proportion correct for an item (item_fit = "pvalue"),
as described by Thompson (2019). We can also calculate the odds ratio for
each pair of items (item_fit = "odds_ratio") as described by Park et al.
(2015) and Sinharay et al. (2006).
A list with two elements, "model_fit" and "item_fit". If either
model_fit = NULL or item_fit = NULL in the function call, this will be
a one-element list, with the null criteria excluded. Each list element, is
itself a list with one element for each specified PPMC containing
a tibble. For example if
item_fit = c("conditional_prob", "odds_ratio"), the "item_fit" element
will be a list of length two, where each element is a tibble containing the
results of the PPMC. All tibbles follow the same general
structure:
obs_{ppmc}: The value of the relevant statistic in the observed
data.
ppmc_mean: The mean of the ndraws posterior samples calculated
for the given statistic.
Quantile columns: 1 column for each value of probs, providing the
corresponding quantiles of the ndraws posterior samples calculated for
the given statistic.
samples: A list column, where each element contains a vector of
length return_draws, representing samples from the posterior
distribution of the calculated statistic. This column is excluded if
return_draws = 0.
ppp: The posterior predictive p-value. This is the proportion of
posterior samples for calculated statistic that are greater than the
observed value. Values very close to 0 or 1 indicate incompatibility
between the fitted model and the observed data.
Park, J. Y., Johnson, M. S., Lee, Y-S. (2015). Posterior predictive model checks for cognitive diagnostic models. International Journal of Quantitative Research in Education, 2(3-4), 244-264. doi:10.1504/IJQRE.2015.071738
Sinharay, S., & Almond, R. G. (2007). Assessing fit of cognitive diagnostic models. Educational and Psychological Measurement, 67(2), 239-257. doi:10.1177/0013164406292025
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298-321. doi:10.1177/0146621605285517
Thompson, W. J. (2019). Bayesian psychometrics for diagnostic assessments: A proof of concept (Research Report No. 19-01). University of Kansas; Accessible Teaching, Learning, and Assessment Systems. doi:10.35542/osf.io/jzqs8
mdm_dina <- dcm_estimate( dcm_specify( dcmdata::mdm_qmatrix, identifier = "item", measurement_model = dina() ), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "mcmc", seed = 63277, backend = "rstan", iter = 700, warmup = 500, chains = 2, refresh = 0 ) fit_ppmc(mdm_dina, model_fit = "raw_score")mdm_dina <- dcm_estimate( dcm_specify( dcmdata::mdm_qmatrix, identifier = "item", measurement_model = dina() ), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "mcmc", seed = 63277, backend = "rstan", iter = 700, warmup = 500, chains = 2, refresh = 0 ) fit_ppmc(mdm_dina, model_fit = "raw_score")
Calculate the log marginal likelihood with bridge sampling (Meng & Wong,
1996). This is a wrapper around bridgesampling::bridge_sampler().
Therefore, log marginal likelihood calculation is currently only available
for models estimated with {rstan} using MCMC.
log_mll(x, ..., force = FALSE)log_mll(x, ..., force = FALSE)
x |
A measrdcm object estimated with
|
... |
Unused. |
force |
If the criterion has already been added to the
model object with |
The estimate of the log marginal likelihood.
Meng, X.-L., & Wong, W. H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistical Sinica, 6(4), 831-860. https://www.jstor.org/stable/24306045
model_spec <- dcm_specify( qmatrix = dcmdata::mdm_qmatrix, identifier = "item" ) model <- dcm_estimate( dcm_spec = model_spec, data = dcmdata::mdm_data, identifier = "respondent", method = "variational", seed = 63277 ) log_mll(model)model_spec <- dcm_specify( qmatrix = dcmdata::mdm_qmatrix, identifier = "item" ) model <- dcm_estimate( dcm_spec = model_spec, data = dcmdata::mdm_data, identifier = "respondent", method = "variational", seed = 63277 ) log_mll(model)
The loglik_array() methods for measrdcm objects
calculates the log-likelihood for an estimated model via the generated
quantities functionality in Stan and returns the draws of the log_lik
parameter.
loglik_array(model, ...)loglik_array(model, ...)
model |
A measrdcm object. |
... |
Unused. For future extensions. |
A "draws_array" object containing the
log-likelihood estimates for the model.
rstn_mdm_lcdm <- dcm_estimate( dcm_specify(dcmdata::mdm_qmatrix, identifier = "item"), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) loglik_array(rstn_mdm_lcdm)rstn_mdm_lcdm <- dcm_estimate( dcm_specify(dcmdata::mdm_qmatrix, identifier = "item"), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) loglik_array(rstn_mdm_lcdm)
For models estimated with MCMC, relative model fit comparisons can be made using the LOO-CV or WAIC indicates (Vehtari et al., 2017). These functions are wrappers for the loo package. See the loo package vignettes for details on the implementation.
## S3 method for class ''measr::measrdcm'' loo(x, ..., r_eff = NA, force = FALSE) ## S3 method for class ''measr::measrdcm'' waic(x, ..., force = FALSE) ## S3 method for class ''measr::measrdcm'' loo_compare(x, ..., criterion = c("loo", "waic"), model_names = NULL)## S3 method for class ''measr::measrdcm'' loo(x, ..., r_eff = NA, force = FALSE) ## S3 method for class ''measr::measrdcm'' waic(x, ..., force = FALSE) ## S3 method for class ''measr::measrdcm'' loo_compare(x, ..., criterion = c("loo", "waic"), model_names = NULL)
x |
A measrdcm object. |
... |
For |
r_eff |
Vector of relative effective sample size estimates for the
likelihood ( |
force |
If the LOO criterion has already been added to the
model object with |
criterion |
The name of the criterion to be extracted from the
|
model_names |
Names given to each provided model in the comparison
output. If |
For loo() and waic(), the information criteria returned by
loo::loo.array() or loo::waic.array(), respectively.
For loo_compare(), the criterion comparison returned by
loo::loo_compare().
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413-1432. doi:10.1007/s11222-016-9696-4
For diagnostic classification models, the M2 statistic is calculated as described by Hansen et al. (2016) and Liu et al. (2016).
## S3 method for class ''measr::measrdcm'' fit_m2(model, ..., ci = 0.9, force = FALSE)## S3 method for class ''measr::measrdcm'' fit_m2(model, ..., ci = 0.9, force = FALSE)
model |
An estimated diagnostic classification model. |
... |
Unused, for extensibility. |
ci |
The confidence interval for the RMSEA. |
force |
If the M2 has already
been saved to the model object with |
A data frame created by dcm2::fit_m2().
Hansen, M., Cai, L., Monroe, S., & Li, Z. (2016). Limited-information goodness-of-fit testing of diagnostic classification item response models. British Journal of Mathematical and Statistical Psychology, 69(3), 225-252. doi:10.1111/bmsp.12074
Liu, Y., Tian, W., & Xin, T. (2016). An application of M2 statistic to evaluate the fit of cognitive diagnostic models. Journal of Educational and Behavioral Statistics, 41(1), 3-26. doi:10.3102/1076998615621293
rstn_mdm_lcdm <- dcm_estimate( dcm_specify(dcmdata::mdm_qmatrix, identifier = "item"), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) fit_m2(rstn_mdm_lcdm)rstn_mdm_lcdm <- dcm_estimate( dcm_specify(dcmdata::mdm_qmatrix, identifier = "item"), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) fit_m2(rstn_mdm_lcdm)
Used for determining examples that shouldn't be run on CRAN, but can be run for the pkgdown website.
measr_examples()measr_examples()
A logical value indicating whether or not the examples should be run.
measr_examples()measr_examples()
measrfit objectExtract model metadata, parameter estimates, and model evaluation results.
measr_extract(model, what, ...)measr_extract(model, what, ...)
model |
The estimated to extract information from. |
what |
Character string. The information to be extracted. See details for available options. |
... |
Additional arguments passed to each extract method.
|
For diagnostic classification models, we can extract the following information:
prior: The priors used when estimating the model.
classes: The possible classes or profile patterns. This will show the
class label (i.e., the pattern of proficiency) and the attributes
included in each class.
item_param: The estimated item parameters. This shows the name of the
parameter, the class of the parameter, and the estimated value.
strc_param: The estimated structural parameters. This is the base rate
of membership in each class. This shows the class pattern, the attributes
present in each class, and the estimated proportion of respondents in
each class.
attribute_base_rate: The estimated base rate of attribute proficiency.
Calculated from the structural parameters of the classes where each
attribute is present.
pi_matrix: The model estimated probability that a respondent in the
given class provides a correct response to the item. The output shows the
the item (rows), class (columns), and estimated p-values.
exp_pvalues: Model expected p-values for each item. This is
equivalent to the pi_matrix, but also includes an "overall" field,
which represents the expected p-value for each item (i.e., an average
of the class-specific p-values, weighted by the prevalence of each
class).
class_prob: The probability that each respondent belongs to class
(i.e., has the given pattern of proficiency).
attribute_prob: The proficiency probability for each respondent and
attribute.
m2: The M2 fit statistic.
See fit_m2() for details.
rmsea: The root mean square error of approximation (RMSEA) fit
statistic and associated confidence interval. See fit_m2() for details.
srmsr: The standardized root mean square residual (SRMSR) fit
statistic. See fit_m2() for details.
ppmc_raw_score: The observed and posterior predicted chi-square
statistic for the raw score distribution. See fit_ppmc() for details.
ppmc_conditional_prob: The observed and posterior predicted conditional
probabilities of each class providing a correct response to each item.
See fit_ppmc() for details.
ppmc_conditional_prob_flags: A subset of the PPMC conditional
probabilities where the ppp is outside the specified ppmc_interval.
ppmc_odds_ratio: The observed and posterior predicted odds ratios of
each item pair. See fit_ppmc() for details.
ppmc_odds_ratio_flags: A subset of the PPMC odds ratios where the ppp
is outside the specified ppmc_interval.
ppmc_pvalue: The observed and posterior predicted proportion of correct
responses to each item. See fit_ppmc() for details.
ppmc_pvalue_flags: A subset of the PPMC proportion correct values where
the ppp is outside the specified ppmc_interval.
loo: The leave-one-out cross validation results. See loo::loo() for
details.
waic: The widely applicable information criterion results. See
loo::waic() for details.
aic: The Akaike information criterion results. See aic() for details.
bic: The Bayesian information criterion results. See bic() for
details.
pattern_reliability: The accuracy and consistency of the overall
attribute profile classification, as described by Cui et al. (2012).
classification_reliability: The classification accuracy and consistency
for each attribute, using the metrics described by Johnson & Sinharay
(2018).
probability_reliability: Reliability estimates for the probability of
proficiency on each attribute, as described by Johnson & Sinharay (2020).
The extracted information. The specific structure will vary depending on what is being extracted, but usually the returned object is a tibble with the requested information.
Cui, Y., Gierl, M. J., & Chang, H.-H. (2012). Estimating classification consistency and accuracy for cognitive diagnostic assessment. Journal of Educational Measurement, 49(1), 19-38. doi:10.1111/j.1745-3984.2011.00158.x
Johnson, M. S., & Sinharay, S. (2018). Measures of agreement to assess attribute-level classification accuracy and consistency for cognitive diagnostic assessments. Journal of Educational Measurement, 55(4), 635-664. doi:10.1111/jedm.12196
Johnson, M. S., & Sinharay, S. (2020). The reliability of the posterior probability of skill attainment in diagnostic classification models. Journal of Educational and Behavioral Statistics, 45(1), 5-31. doi:10.3102/1076998619864550
Templin, J., & Bradshaw, L. (2013). Measuring the reliability of diagnostic classification model examinee estimates. Journal of Classification, 30(2), 251-275. doi:10.1007/s00357-013-9129-4
rstn_mdm_lcdm <- dcm_estimate( dcm_specify(dcmdata::mdm_qmatrix, identifier = "item"), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) measr_extract(rstn_mdm_lcdm, "strc_param")rstn_mdm_lcdm <- dcm_estimate( dcm_specify(dcmdata::mdm_qmatrix, identifier = "item"), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) measr_extract(rstn_mdm_lcdm, "strc_param")
The measrdcm constructor is exported to facilitate the conversion of other
model objects (e.g., stanfit) to measrdcm objects. We do not expect or
recommend calling this function directly, unless you are creating a method
for converting to measrdcm. Rather, to create a measrdcm object, one
should use dcm_estimate().
measrdcm( model_spec = NULL, data = list(), stancode = character(0), method = stanmethod(), algorithm = character(0), backend = stanbackend(), model = list(), respondent_estimates = list(), fit = list(), criteria = list(), reliability = list(), file = character(0), version = list() )measrdcm( model_spec = NULL, data = list(), stancode = character(0), method = stanmethod(), algorithm = character(0), backend = stanbackend(), model = list(), respondent_estimates = list(), fit = list(), criteria = list(), reliability = list(), file = character(0), version = list() )
model_spec |
The model specification used to estimate the model. |
data |
The data used to estimate the model. |
stancode |
The model code in Stan language. |
method |
The method used to fit the model. |
algorithm |
The name of the algorithm used to fit the model. |
backend |
The name of the backend used to fit the model. |
model |
The fitted Stan model. This will object of class
rstan::stanfit if |
respondent_estimates |
An empty list for adding estimated person parameters after fitting the model. |
fit |
An empty list for adding model fit information after fitting the model. |
criteria |
An empty list for adding information criteria after fitting the model. |
reliability |
An empty list for adding reliability information after fitting the model. |
file |
Optional name of a file which the model objects was saved to or loaded from. |
version |
The versions of measr, Stan, and rstan or cmdstanr that were used to fit the model. |
A measrdcm object.
qmatrix <- tibble::tibble( att1 = sample(0:1, size = 15, replace = TRUE), att2 = sample(0:1, size = 15, replace = TRUE), att3 = sample(0:1, size = 15, replace = TRUE), att4 = sample(0:1, size = 15, replace = TRUE) ) spec <- dcm_specify(qmatrix = qmatrix) measrdcm(spec)qmatrix <- tibble::tibble( att1 = sample(0:1, size = 15, replace = TRUE), att2 = sample(0:1, size = 15, replace = TRUE), att3 = sample(0:1, size = 15, replace = TRUE), att4 = sample(0:1, size = 15, replace = TRUE) ) spec <- dcm_specify(qmatrix = qmatrix) measrdcm(spec)
Add model evaluation metrics to fitted model objects. These functions are wrappers around other functions that compute the metrics. The benefit of using these wrappers is that the model evaluation metrics are saved as part of the model object so that time-intensive calculations do not need to be repeated. See Details for specifics.
add_criterion( x, criterion = c("loo", "waic", "log_mll", "aic", "bic"), overwrite = FALSE, save = TRUE, ..., r_eff = NA ) add_reliability(x, overwrite = FALSE, save = TRUE, ...) add_fit( x, method = c("m2", "ppmc"), ndraws = NULL, return_draws = ndraws, overwrite = FALSE, save = TRUE, ..., ci = 0.9 ) add_respondent_estimates( x, probs = c(0.025, 0.975), overwrite = FALSE, save = TRUE )add_criterion( x, criterion = c("loo", "waic", "log_mll", "aic", "bic"), overwrite = FALSE, save = TRUE, ..., r_eff = NA ) add_reliability(x, overwrite = FALSE, save = TRUE, ...) add_fit( x, method = c("m2", "ppmc"), ndraws = NULL, return_draws = ndraws, overwrite = FALSE, save = TRUE, ..., ci = 0.9 ) add_respondent_estimates( x, probs = c(0.025, 0.975), overwrite = FALSE, save = TRUE )
x |
A measrdcm object. |
criterion |
A vector of information criteria to calculate and add to the
model object. Must be |
overwrite |
Logical. Indicates whether specified elements that have
already been added to the estimated model should be overwritten. Default is
|
save |
Logical. Only relevant if a file was specified in the
measrdcm object passed to |
... |
Arguments passed on to
|
r_eff |
Vector of relative effective sample size estimates for the
likelihood ( |
method |
A vector of model fit methods to evaluate and add to the model object. |
ndraws |
The number of posterior draws to base the checks on. Must be
less than or equal to the total number of posterior draws retained in the
estimated model. If |
return_draws |
Number of posterior draws for each specified fit
statistic to be returned. This does not affect the calculation of the
posterior predictive checks, but can be useful for visualizing the fit
statistics. Must be less than |
ci |
The confidence interval for the RMSEA, computed from the M2 |
probs |
The percentiles to be computed by the |
For add_respondent_estimates(), estimated person parameters are added to
the @respondent_estimates element of the fitted model.
For add_fit(), model and item fit information are added to the @fit
element of the fitted model. This function wraps fit_m2() to calculate the
M2 statistic (Hansen et al., 2016;
Liu et al., 2016) and/or fit_ppmc() to calculate posterior predictive model
checks (Park et al., 2015; Sinharay & Almond, 2007; Sinharay et al., 2006;
Thompson, 2019), depending on which methods are specified.
For add_criterion(), relative fit criteria are added to the @criteria
element of the fitted model. For models estimated with MCMC, this function
wraps loo() or waic() to calculate the LOO-CV (Vehtari et al., 2017) or
WAIC (Watanabe, 2010), respectively, or log_mll() to calculate the log
marginal likelihood, which is used for calculating Bayes factors.
For models estimated with the optimizer, this wraps aic() or bic()
to estimate the AIC (Akaike, 1973) or BIC (Schwarz, 1978), respectively.
For add_reliability(), reliability information is added to the
@reliability element of the fitted model. Pattern level reliability is
described by Cui et al. (2012). Classification reliability and posterior
probability reliability are described by Johnson & Sinharay (2018, 2020),
respectively. This function wraps reliability(). Arguments supplied to
... are passed to reliability().
A modified measrdcm object with the corresponding slot populated with the specified information.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), Proceedings of the Second International Symposium on Information Theory (pp. 267-281). Akademiai Kiado.
Cui, Y., Gierl, M. J., & Chang, H.-H. (2012). Estimating classification consistency and accuracy for cognitive diagnostic assessment. Journal of Educational Measurement, 49(1), 19-38. doi:10.1111/j.1745-3984.2011.00158.x
Hansen, M., Cai, L., Monroe, S., & Li, Z. (2016). Limited-information goodness-of-fit testing of diagnostic classification item response models. British Journal of Mathematical and Statistical Psychology, 69(3), 225-252. doi:10.1111/bmsp.12074
Johnson, M. S., & Sinharay, S. (2018). Measures of agreement to assess attribute-level classification accuracy and consistency for cognitive diagnostic assessments. Journal of Educational Measurement, 55(4), 635-664. doi:10.1111/jedm.12196
Johnson, M. S., & Sinharay, S. (2020). The reliability of the posterior probability of skill attainment in diagnostic classification models. Journal of Educational and Behavioral Statistics, 45(1), 5-31. doi:10.3102/1076998619864550
Liu, Y., Tian, W., & Xin, T. (2016). An application of M2 statistic to evaluate the fit of cognitive diagnostic models. Journal of Educational and Behavioral Statistics, 41(1), 3-26. doi:10.3102/1076998615621293
Park, J. Y., Johnson, M. S., Lee, Y-S. (2015). Posterior predictive model checks for cognitive diagnostic models. International Journal of Quantitative Research in Education, 2(3-4), 244-264. doi:10.1504/IJQRE.2015.071738
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. doi:10.1214/aos/1176344136
Sinharay, S., & Almond, R. G. (2007). Assessing fit of cognitive diagnostic models. Educational and Psychological Measurement, 67(2), 239-257. doi:10.1177/0013164406292025
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298-321. doi:10.1177/0146621605285517
Thompson, W. J. (2019). Bayesian psychometrics for diagnostic assessments: A proof of concept (Research Report No. 19-01). University of Kansas; Accessible Teaching, Learning, and Assessment Systems. doi:10.35542/osf.io/jzqs8
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413-1432. doi:10.1007/s11222-016-9696-4
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11(116), 3571-3594. https://jmlr.org/papers/v11/watanabe10a.html
cmds_mdm_dina <- dcm_estimate( dcm_specify( dcmdata::mdm_qmatrix, identifier = "item", measurement_model = dina(), priors = c( prior(beta(5, 17), type = "slip"), prior(beta(5, 17), type = "guess") ) ), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) cmds_mdm_dina <- add_reliability(cmds_mdm_dina) cmds_mdm_dina <- add_fit(cmds_mdm_dina, method = "m2") cmds_mdm_dina <- add_respondent_estimates(cmds_mdm_dina)cmds_mdm_dina <- dcm_estimate( dcm_specify( dcmdata::mdm_qmatrix, identifier = "item", measurement_model = dina(), priors = c( prior(beta(5, 17), type = "slip"), prior(beta(5, 17), type = "guess") ) ), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) cmds_mdm_dina <- add_reliability(cmds_mdm_dina) cmds_mdm_dina <- add_fit(cmds_mdm_dina, method = "m2") cmds_mdm_dina <- add_respondent_estimates(cmds_mdm_dina)
Calculate Q-matrix validation metrics for a fitted model objects using methods described by de la Torre and Chiu (2016). See details for additional information.
qmatrix_validation(x, ..., pvaf_threshold = 0.95)qmatrix_validation(x, ..., pvaf_threshold = 0.95)
x |
A measrdcm object. |
... |
Unused. |
pvaf_threshold |
The threshold for proportion of variance accounted for to flag items for appropriate empirical specifications. The default is .95 as implemented by de la Torre and Chiu (2016). |
Q-matrix validation is conducted by evaluating the proportion of variance accounted for by different Q-matrix specifications. Following the method described by de la Torre and Chiu (2016), we use the following steps for each item:
Calculate the total variance explained if an item measured all possible attributes.
For each possible Q-matrix entry, calculate the variance explained if the item measured the given attributes. Calculate the proportion of variance explained (PVAF) as the variance explained by the current Q-matrix entry divided by the variance explained by the saturated entry (Step 1).
After computing the PVAF for all possible Q-matrix entries, filter to only
those with a PVAF greater than the specified pvaf_threshold threshold.
Filter the remaining Q-matrix entries to those that measure the fewest number of attributes (i.e., we prefer a more parsimonious model).
If there is more than one Q-matrix entry remaining, select the entry with the highest PVAF.
A tibble containing the Q-matrix validation results. There is one row per item with 5 columns:
The item identifier, as specified in the Q-matrix and used to estimate the model.
original_specification: The original Q-matrix entry for the item.
original_pvaf: The proportion of variance accounted for by the original
specification, compared to a specification where the item measures all
attributes.
empirical_specification: The Q-matrix specification that measures the
fewest attributes with a proportion of variance accounted for over the
the specified pvaf_threshold threshold. If the original specification is
optimal, empirical_specification will be NA.
empirical_pvaf: The proportion of variance accounted for by the empirical
specification, compared to a specification where the item measures all
attributes. If the original specification is optimal, emprirical_pvaf
will be NA.
de la Torre, J., & Chiu, C.-Y. (2016). A general method of empirical Q-matrix validation. Psychometrika, 81(2), 253-273. doi:10.1007/s11336-015-9467-8
mod_spec <- dcm_specify( qmatrix = dcmdata::ecpe_qmatrix, identifier = "item_id", measurement_model = dcmstan::lcdm(), structural_model = dcmstan::hdcm( hierarchy = "lexical -> cohesive -> morphosyntactic" ) ) rstn_ecpe <- dcm_estimate( mod_spec, data = dcmdata::ecpe_data, identifier = "resp_id", backend = "rstan", method = "optim" ) q_matrix_validation <- qmatrix_validation(rstn_ecpe)mod_spec <- dcm_specify( qmatrix = dcmdata::ecpe_qmatrix, identifier = "item_id", measurement_model = dcmstan::lcdm(), structural_model = dcmstan::hdcm( hierarchy = "lexical -> cohesive -> morphosyntactic" ) ) rstn_ecpe <- dcm_estimate( mod_spec, data = dcmdata::ecpe_data, identifier = "resp_id", backend = "rstan", method = "optim" ) q_matrix_validation <- qmatrix_validation(rstn_ecpe)
For diagnostic classification models, reliability can be estimated at the pattern or attribute level. Pattern-level reliability represents the classification consistency and accuracy of placing students into an overall mastery profile. Rather than an overall profile, attributes can also be scored individually. In this case, classification consistency and accuracy should be evaluated for each individual attribute, rather than the overall profile. This is referred to as the maximum a posteriori (MAP) reliability. Finally, it may be desirable to report results as the probability of proficiency or mastery on each attribute instead of a proficient/not proficient classification. In this case, the reliability of the posterior probability should be reported. This is the expected a posteriori (EAP) reliability.
reliability(x, ..., threshold = 0.5, force = FALSE)reliability(x, ..., threshold = 0.5, force = FALSE)
x |
The estimated model to be evaluated. |
... |
Unused. For future extensions. |
threshold |
For |
force |
If reliability information has already been added to the model
object with |
The pattern-level reliability (pattern_reliability) statistics are
described in Cui et al. (2012). Attribute-level classification reliability
statistics (map_reliability) are described in Johnson & Sinharay (2018).
Reliability statistics for the posterior mean of the skill indicators (i.e.,
the mastery or proficiency probabilities; eap_reliability) are described in
Johnson & Sinharay (2019).
For class measrdcm, a list with 3 elements:
pattern_reliability: The pattern-level accuracy (p_a) and consistency
(p_c) described by Cui et al. (2012).
map_reliability: A list with 2 elements: accuracy and consistency,
which include the attribute-level classification reliability statistics
described by Johnson & Sinharay (2018).
eap_reliability: The attribute-level posterior probability reliability
statistics described by Johnson & Sinharay (2020).
Cui, Y., Gierl, M. J., & Chang, H.-H. (2012). Estimating classification consistency and accuracy for cognitive diagnostic assessment. Journal of Educational Measurement, 49(1), 19-38. doi:10.1111/j.1745-3984.2011.00158.x
Johnson, M. S., & Sinharay, S. (2018). Measures of agreement to assess attribute-level classification accuracy and consistency for cognitive diagnostic assessments. Journal of Educational Measurement, 55(4), 635-664. doi:10.1111/jedm.12196
Johnson, M. S., & Sinharay, S. (2020). The reliability of the posterior probability of skill attainment in diagnostic classification models. Journal of Educational and Behavioral Statistics, 45(1), 5-31. doi:10.3102/1076998619864550
rstn_mdm_lcdm <- dcm_estimate( dcm_specify(dcmdata::mdm_qmatrix, identifier = "item"), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) reliability(rstn_mdm_lcdm)rstn_mdm_lcdm <- dcm_estimate( dcm_specify(dcmdata::mdm_qmatrix, identifier = "item"), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) reliability(rstn_mdm_lcdm)
Calculate posterior draws of respondent proficiency. Optionally retain all posterior draws or return only summaries of the distribution for each respondent.
score( x, newdata = NULL, missing = NA, identifier = NULL, summary = TRUE, probs = c(0.025, 0.975), force = FALSE )score( x, newdata = NULL, missing = NA, identifier = NULL, summary = TRUE, probs = c(0.025, 0.975), force = FALSE )
x |
An estimated model (e.g., from |
newdata |
Optional new data. If not provided, the data used to estimate
the model is scored. If provided, |
missing |
An |
identifier |
Optional. Variable name of a column in |
summary |
Should summary statistics be returned instead of the raw
posterior draws? Only relevant if the model was estimated with a method
that results in posterior distributions (e.g., "mcmc", "variational").
Default is |
probs |
The percentiles to be computed by the |
force |
If respondent estimates have already been added to the model
object with |
A list with two elements: class_probabilities and
attribute_probabilities.
If summary is FALSE, each element is a tibble with one row per
respondent. The columns include the respondent identifier, and one column
of probabilities for each of the possible classes or attributes (as
posterior::rvar() objects).
If summary is TRUE, each element is a tibble with one row per respondent
and class or attribute. The columns include the respondent identifier,
class or attribute, mean, and one column for every value specified in
probs.
rstn_mdm_lcdm <- dcm_estimate( dcm_specify(dcmdata::mdm_qmatrix, identifier = "item"), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) score(rstn_mdm_lcdm, summary = FALSE)rstn_mdm_lcdm <- dcm_estimate( dcm_specify(dcmdata::mdm_qmatrix, identifier = "item"), data = dcmdata::mdm_data, missing = NA, identifier = "respondent", method = "optim", seed = 63277, backend = "rstan" ) score(rstn_mdm_lcdm, summary = FALSE)
The constructors for Stan back-ends and methods are exported to support
extensions to measr, for example converting other models to measrfit
objects. We do not expect or recommend calling these functions directly
unless you are converting objects, or creating new methods for measrfit
objects.
rstan() cmdstanr() mcmc() optim() variational() pathfinder() gqs()rstan() cmdstanr() mcmc() optim() variational() pathfinder() gqs()
There are two classes for estimation backends, which define the package that
should be used, or was used, to estimate a model. Both classes inherit from
measr::stanbackend.
The rstan() class indicates use of the {rstan} package.
cmdstanr() indicates use of the {cmdstanr} package.
The method classes define which estimation method should be used, or was
used, for a model. All method classes inherit from measr::stanmethod.
The mcmc() class indicates the use of Markov chain Monte Carlo via
rstan::sampling() when using {rstan} or the
$sample() method of the
CmdStanModel class when using {cmdstanr}.
The variational() class indicated the use of Stan's variational algorithm
for approximate posterior sampling via rstan::vb() when using {rstan} or
the $variational() method of the
CmdStanModel class when using {cmdstanr}.
The pathfinder() class indicates the use of pathfinder variational
inference algorithm via the $pathfinder() method
of the CmdStanModel. This method is only available
when using {cmdstanr}.
The optim() class indicates the use maximum-likelihood via
rstan::optimizing() when using {rstan} or the
$optimize() method of the
CmdStanModel class when using {cmdstanr}.
Finally, there is a gqs() class for use when a model has previously been
estimated and were are interested in calculating generated quantities (e.g.,
score(), loglik_array()).
The gqs() class indicates the use of rstan::gqs() when using {rstan}
and the $generate_quantities() method of
the CmdStanModel class when using {cmdstanr}.
An S7 object with the corresponding class.
rstan() mcmc()rstan() mcmc()
Calculate the Q3 statistic to evaluate the assumption of independent items.
yens_q3(x, ..., crit_value = 0.2, summary = NULL)yens_q3(x, ..., crit_value = 0.2, summary = NULL)
x |
A measrdcm object. |
... |
Unused. |
crit_value |
The critical value threshold for flagging the residual correlation of a given item pair. The default is 0.2, as described by Chen and Thissen (1997). |
summary |
A summary statistic to be returned. Must be one of |
Psychometric models assume that items are independent of each other, conditional on the latent trait. The Q3 statistic (Yen, 1984) is used to evaluate this assumption. For each observed item response, we calculate the residual between the model predicted score and the observed score and then estimate correlations between the residuals across items. Each residual correlation is a Q3 statistic.
Often, a critical values is used to flag a residual correlation above a given threshold (e.g., Chen & Thissen, 1997). Alternatively, we may use a summary statistic such as the maximum Q3 statistic (Q3,max; Christensen et al., 2017), or the mean-adjusted maximum Q3 statistic (Q3,*; Marais, 2013).
If summary = NULL, a tibble with the residual correlation and
flags for all item pairs. Otherwise, a numeric value representing the
requested summary statistic.
Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265-389. doi:10.3102/10769986022003265
Christensen, K. B., Makransky, G., & Horton, M. (2017). Critical values for Yen's Q3: Identification of local dependence in the Rasch model using residual correlations. Applied Psychological Measurement, 41(3), 178-194. doi:10.1177/0146621616677520
Marais, I. (2013). Local dependence. In K. B. Christensen, S. Kreiner, & M. Mesbah (Eds.), Rasch models in health (pp. 111-130). Wiley.
Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8(2), 125-145. doi:10.1177/014662168400800201
model_spec <- dcm_specify( qmatrix = dcmdata::mdm_qmatrix, dentifier = "item" ) model <- dcm_estimate( dcm_spec = model_spec, data = dcmdata::mdm_data, identifier = "respondent", method = "optim", seed = 63277 ) yens_q3(model)model_spec <- dcm_specify( qmatrix = dcmdata::mdm_qmatrix, dentifier = "item" ) model <- dcm_estimate( dcm_spec = model_spec, data = dcmdata::mdm_data, identifier = "respondent", method = "optim", seed = 63277 ) yens_q3(model)