Educational and Psychological Measurement

Interjudge Reliability and Decision Reproducibility

Mary E. Lunz

American Society of Clinical Pathologists

John A. Stahl

American Society of Clinical Pathologists

Benjamin D. Wright

University of Chicago

The purpose of this article is to discuss the importance of decision reproducibility for performance assessments. When decisions from two judges about a student's performance using comparable tasks correlate, decisions have been considered reproducible. However, when judges differ in expectations and tasks differ in difficulty, decisions may not be independent of the particular judges or tasks encountered unless appropriate adjustments for the observable differences are made. In this study, data were analyzed with the Facets model and provided evidence that judges grade differently, whether or not the scores given correlate well. This outcome suggests that adjustments for differences among judge severities should be made before student measures are estimated to produce reproducible decisions for certification, achievement, or promotion.
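The central claim, that scores can correlate well while judges still grade differently, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the ratings, the cut score, and the function names are hypothetical, and the adjustment shown is a crude mean-centering of each judge's scores, not the Facets (many-facet Rasch) estimation used in the study. Mean-centering is only reasonable here because both hypothetical judges rate the same students.

```python
from statistics import mean

# Hypothetical ratings (0-10 scale) of the same five students by two judges.
# Judge B is consistently 2 points more severe than judge A.
judge_a = [8, 7, 6, 5, 4]
judge_b = [6, 5, 4, 3, 2]
CUT_SCORE = 5  # hypothetical pass mark


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def decisions(scores, cut=CUT_SCORE):
    """Pass/fail decision for each score."""
    return [s >= cut for s in scores]


def adjust_for_severity(scores, grand_mean):
    """Shift one judge's scores so that judge's mean equals the grand mean.

    This is a simple stand-in for a severity adjustment, not a Rasch estimate.
    """
    offset = mean(scores) - grand_mean
    return [s - offset for s in scores]


def agreement(x, y):
    """Number of students on whom the two judges reach the same decision."""
    return sum(a == b for a, b in zip(decisions(x), decisions(y)))


r = pearson(judge_a, judge_b)
raw_agreement = agreement(judge_a, judge_b)

grand_mean = mean(judge_a + judge_b)
adj_a = adjust_for_severity(judge_a, grand_mean)
adj_b = adjust_for_severity(judge_b, grand_mean)
adj_agreement = agreement(adj_a, adj_b)

print("correlation:", r)
print("decisions agreed (raw):", raw_agreement, "of", len(judge_a))
print("decisions agreed (adjusted):", adj_agreement, "of", len(judge_a))
```

With these made-up numbers the raw scores correlate perfectly, yet the two judges reach the same pass/fail decision for only three of the five students; after the severity shift they agree on all five.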

Educational and Psychological Measurement, Vol. 54, No. 4, 913-925 (1994)
DOI: 10.1177/0013164494054004007


This article has been cited by other articles:


B. Garrett, E. Towles, H. Kleinert, and J. Kearns
Portfolios in Large-Scale Alternate Assessment Systems: Frameworks for Reliability
Assessment for Effective Intervention, January 1, 2003; 28(2): 17 - 27.