Educational and Psychological Measurement
Published Studies of Interrater Reliability Often Overestimate Reliability: Computing the Correct Coefficient

Xitao Fan

Utah State University

Michael Chen

University of Mississippi

It is erroneous to generalize the interrater reliability coefficient estimated from two or more raters rating only a (small) portion of the sample to the rest of the sample data for which only one rater is used for scoring, although such generalization is often made implicitly in practice. If the interrater reliability estimate from part of a sample is available, the score reliability for the rest of the sample data for which only one rater is used for scoring can be estimated both within the framework of classical reliability theory and that of generalizability theory. As intuitively expected, score reliability when only one rater is used for scoring is lower than the score reliability for which two raters are used. The authors provide a sample of published studies in different disciplines that inappropriately generalized reliability coefficients involving several raters to scores generated by a single rater.
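The classical-theory adjustment described above can be illustrated with the Spearman-Brown formula, which relates the reliability of a score averaged over k raters to the reliability of a single rater's score. The sketch below is not taken from the article; it assumes that the reported coefficient is the reliability of the k-rater composite and that raters are interchangeable (parallel) in the classical sense. The function names and the example value 0.90 are hypothetical.

```python
def spearman_brown(reliability: float, factor: float) -> float:
    """Projected reliability when the number of raters is multiplied by `factor`.

    Standard Spearman-Brown formula; assumes parallel (interchangeable) raters.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor * reliability / (1.0 + (factor - 1.0) * reliability)


def single_rater_reliability(composite_reliability: float, n_raters: int) -> float:
    """Step a k-rater composite reliability down to a single rater.

    Assumption: `composite_reliability` describes scores averaged (or summed)
    over `n_raters` raters, not the correlation between two single raters.
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    return spearman_brown(composite_reliability, 1.0 / n_raters)


# Hypothetical example: a two-rater composite reliability of 0.90
# corresponds to a lower reliability for scores from one rater.
print(round(single_rater_reliability(0.90, 2), 3))  # 0.818
```

Under these assumptions, a coefficient of 0.90 obtained from two raters corresponds to roughly 0.82 for the part of the sample scored by one rater. If the reported coefficient is instead the correlation between two raters' scores, it already estimates single-rater reliability and no step-down applies.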

Educational and Psychological Measurement, Vol. 60, No. 4, 532-542 (2000)
DOI: 10.1177/00131640021970709




This article has been cited by other articles:

G. Randsley de Moura, T. Leader, J. Pelletier, and D. Abrams
Prospects for Group Processes and Intergroup Relations Research: A Review of 70 Years' Progress
Group Processes Intergroup Relations, October 1, 2008; 11(4): 575-596.

C. Roever
Validation of a web-based test of ESL pragmalinguistics
Language Testing, April 1, 2006; 23(2): 229-256.