Inter-rater reliability of wound care skills checklist in objective structured clinical examination

Ina Laela Abdillah, Intansari Nurjannah


Background: The wound care skills checklist used in the objective structured clinical examination (OSCE) should be valid and reliable, so its reliability needs to be tested. The purpose of this study was to determine the inter-rater reliability of the wound care skills checklist.

Methods: This was a descriptive, non-experimental quantitative study with a cross-sectional design, conducted at the School of Nursing, Universitas Gadjah Mada, Indonesia. The respondents were 94 second-year students of the school. Two raters scored the students during the OSCE to assess inter-rater reliability. Kappa and percent agreement (PA) were used to analyze the reliability of the checklist.
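As an illustration of the two statistics named above, the following is a minimal sketch that assumes the kappa in question is Cohen's kappa for two raters and that each checklist item is scored categorically (for example 1 = done, 0 = not done). The function names and the ten ratings are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Percentage of cases on which the two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every category either rater used.
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined when both raters use a single category
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical scores for ten students on one checklist item.
rater_a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
rater_b = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

pa = percent_agreement(rater_a, rater_b)   # 80.0
kappa = cohen_kappa(rater_a, rater_b)      # about 0.52
print(f"PA = {pa:.2f}%, kappa = {kappa:.4f}")
```

In this toy example the raters agree on 8 of 10 students, yet kappa is only about 0.52 because much of that agreement would be expected by chance given how often both raters score "done".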

Results: The inter-rater reliability of the wound care skills checklist was categorized as good based on the kappa value (0.7613) and acceptable based on the PA value (89.36%). The twenty-two checklist items fell into five categories. Sixteen items were in the first category, in which both kappa (≥0.41) and PA (>70%) were acceptable. One item was in the second category, with unacceptable kappa and PA values; one item was in the third category, with a low kappa value (0.3974) but a high PA (89.36%); one item was in the fourth category, with a kappa value of 0; and three items were in the fifth category, with negative kappa values.
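The item-level categories above, in which PA can be high while kappa is low, zero, or negative, can be illustrated with a short sketch. It assumes Cohen's kappa computed from a 2x2 agreement table for 94 students. The cell counts are hypothetical, chosen only to show how such values can arise when almost every student is scored "done"; they are not the study's data, and the function name is illustrative.

```python
def agreement_stats(both_done, a_only, b_only, both_not_done):
    """Return (PA in percent, Cohen's kappa) for a 2x2 agreement table.

    both_done      both raters scored the step as done
    a_only         rater A scored done, rater B scored not done
    b_only         rater A scored not done, rater B scored done
    both_not_done  both raters scored the step as not done
    """
    n = both_done + a_only + b_only + both_not_done
    p_observed = (both_done + both_not_done) / n
    # Chance agreement from each rater's marginal totals.
    a_done, a_not = both_done + a_only, b_only + both_not_done
    b_done, b_not = both_done + b_only, a_only + both_not_done
    p_expected = (a_done * b_done + a_not * b_not) / (n * n)
    if p_expected == 1.0:
        return 100.0 * p_observed, float("nan")
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return 100.0 * p_observed, kappa


# Hypothetical tables, each summing to 94 students.
examples = {
    "high PA, low kappa": (82, 5, 5, 2),   # PA about 89.36, kappa about 0.23
    "kappa of zero": (90, 0, 4, 0),        # rater B scores every student "done"
    "negative kappa": (88, 3, 3, 0),       # raters never agree on "not done"
}

for label, cells in examples.items():
    pa, kappa = agreement_stats(*cells)
    print(f"{label}: PA = {pa:.2f}%, kappa = {kappa:.4f}")
```

In all three tables the raters agree on roughly 89% to 96% of students, but because nearly all scores fall in one category, the agreement expected by chance is almost as high as the observed agreement, which pushes kappa toward or below zero.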

Conclusions: The inter-rater reliability of the wound care skills checklist used in the OSCE at this nursing school can be categorized as good based on kappa and acceptable based on PA.


Keywords: Checklist, Evaluation, Inter-rater reliability, Kappa, OSCE, Percent agreement

