36-835: Statistics and Model-Based Measurement

The readings are organized into topic clusters.  After the introduction,
we can pick and choose topical clusters, and the readings within
clusters can be adjusted.  Below I first list all the clusters I
thought of, and then the individual papers in each cluster.

Here are the topical clusters.  For the first 4--6 weeks, it seems
clear to me what to cover, and the readings go at the rate of about one
cluster (1.5--2 papers) per week.  After that the plan is tentative.

1. Introductory Readings				
2. Maximum Likelihood and Empirical Bayes 	
3. Notes on MCMC 				
4. Two Parametric Applications 		
5. "Nonparametric" IRT  			
6. Model Fit I  				tentative
7. Model Fit II  				tentative

After this we have to choose clusters for the rest of the semester.
Some suggested clusters are below.  Within most clusters I have listed
many more papers than we can read; I will narrow the list once I see
how many clusters we want to talk about.  They can be done in more or
less any order, except that clusters 9 and 10 go together.

8. Latent Class and Related Models
9. Some Cognitive Assessment Models
10. Applications in Computer-Based Cognitive Tutoring
11. Modeling the Judgements of Graders/Raters
12. Two Social Science Models [ grade-of-membership and unfolding models ]
13. Computerized Adaptive Tests (Sequential Estimation)

The following are supplemental [ but we could choose one or more for
discussion if there is interest ]:

14. Some References on Intelligent Tutors
15. General information on Bayesian Networks
16. More Estimation and Numerical Methods



Here are the papers themselves.  Class discussion will focus on the
starred readings (at least in the first 4--6 clusters where I've
marked them).  The other papers are for background or elaboration.



1. Introductory Readings
------------------------

(*) Junker, B. W. (2000). Factor analysis and latent structure: IRT and
Rasch models.  To appear, {\em International Encyclopedia of the
Social and Behavioral Sciences}.

(*) van der Linden, W. J. and Hambleton, R. K. (1997). Item response
theory: brief history, common models and extensions.  Chapter 1
(pp. 1--28) in van der Linden, W. J. and Hambleton, R. K. (eds)
(1997). {\em Handbook of modern item response theory.}  New York:
Springer-Verlag.





2. Maximum Likelihood and Empirical Bayes
-----------------------------------------

(*) Holland, P. W. (1990). On the sampling theory foundations of item
response theory models.  {\em Psychometrika, 55,} 577--601.

	[ Esp. the ten pages: 577--581 and 592--597 ] 

(*) Tanner, M. A. (1996).  "Maximum likelihood", Section 2.3 [pp 26--30];
and "The EM algorithm", Chapter 4 [pp. 64--86] in Tanner,
M. A. (1996). {\em Tools for statistical inference: methods for the
exploration of posterior distributions and likelihood functions.  3rd
Ed.} New York: Springer-Verlag.

	[ Minimally, look at the ten pages: 26--30, 64--65, 69--71 ]

(*) Muraki, E. (1997).  A generalized partial credit model.  Chapter
(pp. 153--164) in van der Linden, W. J. and Hambleton, R. K. (eds)
(1997). {\em Handbook of modern item response theory.}  New York:
Springer-Verlag.





3. Notes on MCMC
----------------

(*) Patz, R. J. and Junker, B. W. (1999). A straightforward approach
to Markov Chain Monte Carlo methods for item response models.  {\em
Journal of Educational and Behavioral Statistics, 24,} 146--178.

(*) Junker, B. W. (1999).  Some schematic MCMC algorithms for IRT and
related educational and cognitive assessment models.  Notes.






4. Two Parametric Applications
------------------------------

(*) Johnson, M. S., Cohen, W. and Junker, B. W. (1999).  Measuring
Appropriability in Research and Development with Item Response Models.
CMU Statistics Department Technical Report \#690.  [WWW Document.] URL
{\tt http://www.stat.cmu.edu/cmu-stats/tr}.

(*) Ip, E. H. and Scott, S. L. (2000).  Empirical Bayes and item
clustering effects in a latent variable hierarchical model: A case
study from the National Assessment of Educational Progress.  Paper
presented at the Annual North American Meeting of the Psychometric
Society, July 2000. Vancouver BC, Canada.






5. "Nonparametric" IRT 
----------------------

(*) Sijtsma, K. (1998). Methodology review: Nonparametric IRT approaches
to the analysis of dichotomous item scores. {\em Applied Psychological
Measurement, 22,} 3--31.

(*) Ramsay, J. O. (1996). A geometrical approach to item response theory.
{\em Behaviormetrika, 23,} 3--17. [ draft obtained from
http://www.psych.mcgill.ca/faculty/ramsay.html ]

  [ the following are for background only, if you are interested ]

Hemker, B. T., Sijtsma K., Molenaar, I. W. and Junker, B. W. (1997).
Stochastic ordering using the latent trait and the sum score in
polytomous IRT models.  {\em Psychometrika, 62,} 331--347.

Ramsay, J. O. (1991).  Kernel smoothing approaches to nonparametric
item characteristic curve estimation. {\em Psychometrika, 56,} 611--630.

Stout, W. F. (1990).  A new item response theory modeling approach
with applications to unidimensionality assessment and ability
estimation.  {\it Psychometrika, 55,} 293--325.








6. Model Fit I  --  tentative
--------------

(*) Stone, C. A. (2000).  Monte Carlo based null distribution for an
alternative goodness-of-fit test statistic for IRT models.  {\em
Journal of Educational Measurement, 37,} 58--75.

(*) Hoijtink, H. and Molenaar, I. W. (1997). A multidimensional item
response model: constrained latent class analysis using the Gibbs
Sampler and posterior predictive checks. {\em Psychometrika, 62,}
171--189.


	[ Following for background only, if you are interested ]

Gelman, A., Goegebeur, Y., Tuerlinckx, F., and Van Mechelen, I.
(2000).  Diagnostic checks for discrete-data regression models using
posterior predictive simulations.  To appear. {\em Applied
Statistics}.  






7. Model Fit II  --  tentative
---------------

(*) Kass, R. E. and Raftery, A. E. (1995). Bayes Factors. {\em Journal of
the American Statistical Association, 90,} 773--795.

(*) Spiegelhalter, D. J., Best, N. G., and Carlin, B. P. (1998).  Bayesian
deviance, the effective number of parameters and the comparison of
arbitrarily complex models.  Discussion paper obtained 27 Aug 2000
from http://www.mrc-bsu.cam.ac.uk/Publications/preslid.shtml


	[ Following for further reading only, if you are interested ]

Han, C. and Carlin, B. P. (2000). MCMC methods for computing Bayes
Factors: a comparative review.  Research Report 2000-001, Obtained 28
August 2000 from http://www.biostat.umn.edu/





********************************************
*                                          *
*  From here on out, we can discuss what   *
*  the emphasis of the seminar should be.  *
*  Some suggested clusters appear below;   *
*  the number of papers per cluster will   *
*  be tailored to the time available for   *
*  the clusters we choose.                 *
*                                          *
********************************************





8. Latent Class and Related Models
----------------------------------

Bartholomew, D. J. (1987). Latent class models.  Chapter 2 in
Bartholomew, D. J. (1987). {\em Latent variable models and factor
analysis.} New York: Oxford University Press.

[ there is a 1999 edition with Knott that may be more up to date ]

Gelman, A., Leenen, I., Van Mechelen, I., and De Boeck, P. (2000).
Bridges between deterministic and probabilistic models for binary
data.  Manuscript obtained 27 Aug 2000 from
http://www.stat.columbia.edu/~gelman/research/unpublished/







9. Some Cognitive Assessment Models
-----------------------------------

Sijtsma, K. and Verweij, A. C. (1999). Knowledge of solution
strategies and IRT modeling of items for transitive reasoning. {\em
Applied Psychological Measurement, 23,} 55--68.

Mislevy, R. J. (1996). Test theory reconceived.  {\em Journal of
Educational Measurement, 33,} 379--416.

Junker, B. W. (2000).  An orientation to some statistical models for
educational and cognitive assessment.  Manuscript.

VanLehn, K., Niu, Z., Siler, S. and Gertner, A. (1998).  Student
modeling from conventional test data: a Bayesian approach without
priors. pp. 434--443 in Goettl, B. et al. (Eds.) (1998).  {\em
Proceedings of the Intelligent Tutoring Systems Fourth International
Conference, ITS 98.}  Berlin, Heidelberg: Springer-Verlag.

Haertel, E. H. and Wiley, D. E. (1993).  Representations of
ability structures: implications for testing.  Chapter 14 in
Frederiksen, N. and Mislevy, R. J. (eds.) (1993).  {\em Test theory for
a new generation of tests.}  Hillsdale, NJ: Lawrence Erlbaum
Associates.

Tatsuoka, K. K. (1995).  Architecture of knowledge structures and
cognitive diagnosis: a statistical pattern recognition and
classification approach.  Chapter 14 (pp. 327-359) in Nichols, P. D.,
Chipman, S. F. and Brennan, R. L. (eds.) (1995).  {\em Cognitively
diagnostic assessment.}  Hillsdale, NJ: Lawrence Erlbaum Associates.

Falmagne, J.-C., Koppen, M., Villano, M., Doignon, J.-P., and
Johannesen, L. (1990).  Introduction to knowledge spaces: how to
build, test and search them. {\em Psychological Review, 97,} 201--224.

Tatsuoka, C. (2000).  Model fitting and item analysis for cognitive
diagnosis.  Technical Report 00-03, Department of Statistics, George
Washington University, Washington DC.








10. Applications in Computer-Based Cognitive Tutoring
-----------------------------------------------------

Corbett, A. T., Anderson, J. R. and O'Brien, A. T. (1995).  Student
modeling in the ACT programming tutor.  Chapter 2 in Nichols, P. D.,
Chipman, S. F. and Brennan, R. L. (eds.) (1995).  {\em Cognitively
diagnostic assessment.}  Hillsdale, NJ: Lawrence Erlbaum Associates.

Draney, K. L., Pirolli, P. and Wilson, M. (1995).  A measurement model
for a complex cognitive skill.  Chapter 5 in Nichols, P. D., Chipman,
S. F. and Brennan, R. L. (eds.) (1995).  {\em Cognitively diagnostic
assessment.}  Hillsdale, NJ: Lawrence Erlbaum Associates.

VanLehn, K. and Niu, Z. (1999). Bayesian student modeling, user
interfaces and feedback: A sensitivity analysis. In press, Journal of
Artificial Intelligence in Education.

Jameson, A. (1995).  Numerical uncertainty management in user and
student modeling: an overview of systems and issues.  {\em User
Modeling and User-Adapted Interaction, 5,} xxx-xxx.

   [ other papers from this special issue are also worth looking at ]






11. Modeling the Judgements of Graders/Raters
---------------------------------------------

Brennan, R. L. (1997). A perspective on the history of
generalizability theory.  {\em Educational Measurement: Issues and Practice, 16,} 14--20.

Engelhard, G., Jr. (1994).  Examining rater errors in the assessment
of written composition with many-faceted Rasch models.  {\em Journal
of Educational Measurement, 31,} 93--112.

Myford, C. M., and Mislevy, R. J. (1995).  {\em Monitoring and
improving a portfolio assessment system.}  Center for Performance
Assessment Research Report.  Princeton, NJ: Educational Testing
Service.

Patz, R. J., Junker, B. W. and Johnson, M. S. (2000).  The
hierarchical rater model for rated test items and its application to
large-scale educational assessment data.  Submitted to {\em Journal of
Educational and Behavioral Statistics.}  CMU Statistics Department
Technical Report \#712.  [WWW Document.] URL {\tt
http://www.stat.cmu.edu/cmu-stats/tr}.








12. Two Social Science Models
-----------------------------

Manton, K. and Woodbury, M. A. (1989). Grade of Membership analysis of
depression related psychiatric disorders.  Chapter 5 in Latent
Variable Models for Dichotomous Outcomes: Applications to Data from
the NIMH Epidemiological Catchment Area Program (W. W. Eaton and
G. W. Bohrnstedt, eds.).  {\em Sociological Methods and Research, 18,}
126--163.


Roberts, J. S., Donoghue, J. R., and Loughlin, J. E. (2000).  A
general item response theory model for unfolding unidimensional
polytomous responses.  {\em Applied Psychological Measurement, 24,}
3--32.







13. Computerized Adaptive Tests (Sequential Estimation)
-------------------------------------------------------

Mislevy, R. J., and Chang, H.-H. (2000).  Does adaptive testing
violate local independence?  {\em Psychometrika, 65,} 149--156.

van der Linden, W. J., and Reese, L. M. (1998).  A model for optimal
constrained adaptive testing.  {\em Applied Psychological Measurement,
22,} 259--270.

Bradlow, E. T., Weiss, R. E., and Cho, M. (1998). Bayesian
identification of outliers in computerized adaptive tests.  {\em
Journal of the American Statistical Association, 93,} 910--919.

Chang, H.-H. and Ying, Z.-L. (in press). Nonlinear sequential designs
for logistic item response theory models, with applications to
computerized adaptive tests.  {\em Annals of Statistics, xx,}
xxx--xxx.

Wainer, H. (2000).  Rescuing computerized testing by breaking Zipf's
Law.  {\em Journal of Educational and Behavioral Statistics, 25,} 203--224.










14. Some References on Intelligent Tutors
-----------------------------------------

Intelligent Tutoring Systems aka (Computer-based) Cognitive Tutors:

Shute, V. J., & Psotka, J. (1996). Intelligent tutoring systems: Past,
Present and Future. In D. Jonassen (Ed.), Handbook of Research on
Educational Communications and Technology : Scholastic Publications.



Andes (a Bayes Net - based elementary Physics Tutor):

Gertner, A., and VanLehn, K. (2000).  Andes: A coached problem solving
environment for physics.  In {\em Proceedings 5th International
Conference, ITS 2000,} Montreal Canada, June 2000.  Obtained 28 August
2000 from http://www.pitt.edu/~vanlehn/distrib/titles.html.

VanLehn, K., Freedman, R., Jordan, P., Murray, C., Osan, R.,
Ringenberg, M., Rose, C., Schulze, K., Shelby, R., Treacy, D.,
Weinstein, A., and Wintersgill, A. (2000).  Fading and deepening: The
next steps for Andes and other model-tracing tutors.  In {\em
Proceedings 5th International Conference, ITS 2000,} Montreal Canada.
Obtained 28 August 2000 from http://www.pitt.edu/~vanlehn/distrib/titles.html.


PUMP Algebra Tutors (ACT-R - based elementary Algebra Tutor):

Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark,
M. A. (1995). Intelligent tutoring goes to school in the big city. In
J. Greer (Ed.), Proceedings of the 7th World Conference on Artificial
Intelligence and Education (pp. 421-428). Charlottesville, VA:
AACE. [8 pgs.]

Aleven, V., Koedinger, K. R., and Cross, K. (1999). Tutoring
answer-explanation fosters learning with understanding.   In
Lajoie, S. P. & Vivet, M. (Eds.) Artificial Intelligence in
Education, Open Learning Environments: New Computational Technologies
to Support Learning, Exploration, and Collaboration, Proceedings
of AIED-99, (pp.  199-206).  Amsterdam: IOS Press.
  







15. General information on Bayesian Networks
--------------------------------------------

Spiegelhalter, D. J., Dawid, A. P., Lauritzen, S. L., and Cowell,
R. G. (1993).  Bayesian analysis in expert systems (Disc: P247-283).
{\em Statistical Science, 8,} 219--247.

http://www.mrc-bsu.cam.ac.uk/bugs/

http://www2.sis.pitt.edu/~genie/

http://www.hugin.dk/






16. More Estimation and Numerical Methods
-----------------------------------------


Jaakkola, T. S., and Jordan, M. I. (1999).  Bayesian parameter
estimation via variational methods.  In press, {\em Statistics and
Computing}.  Ms. obtained from the World Wide Web at address {\tt
http://www.cs.berk\-eley.edu/\~{}jordan/}, September 1999.

Hrycej, T. (1990).  Gibbs sampling in Bayesian networks.  {\em
Artificial Intelligence, 46,} 351--363.

Lauritzen, S. L., and Spiegelhalter, D. J. (1988).  Local computations
with probabilities on graphical structures.  {\em Journal of the Royal
Statistical Society, Series B, Methodological, 50,} 157--194.

Michalewicz, Z., Esquivel, S., Gallard, R., Michalewicz, M., and Tau,
G. (1999).  {\em The spirit of evolutionary algorithms [some remarks
on design of evolutionary algorithms].}  Paper presented at the 3rd
On-line World Conference on Soft Computing in Engineering Design and
Manufacturing (WSC3), Internet, June 21--30, 1998.  Obtained from the
World Wide Web at address
{\tt http://www.coe.uncc.edu/\~{}zbyszek/}, September 1999.