Reliability and Discriminant Validity of a Checklist for Surgical Scrubbing, Gowning and Gloving




Keywords: Medical education, Surgery, Augmented reality, Virtual reality


Background: Surgical scrubbing, gowning, and gloving (SGG) is challenging for medical trainees to learn in the operating room environment. Few reliable or valid tools currently exist to evaluate a trainee’s ability to scrub, gown, and glove. The objective of this study was to test the reliability and validity of a checklist that evaluates SGG technique.

Methods: This Institutional Review Board-approved study recruited medical students, residents, and fellows from an academic, tertiary care institution. Trainees were stratified by prior surgical experience as novices, intermediates, or experts. Participants were instructed to scrub, gown, and glove in a staged operating room while being video-recorded. Two blinded raters scored the videos according to the SGG checklist. Inter-rater reliability was assessed using the intraclass correlation coefficient for total scores and Cohen’s kappa for item completion. The internal consistency and discriminant validity of the SGG checklist were assessed using Cronbach’s alpha and the Wilcoxon rank sum test, respectively.
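As an illustration of two of the statistics named in the Methods, the following minimal Python sketch computes Cohen's kappa for a single checklist item scored by two raters and Cronbach's alpha across checklist items. It is not the authors' analysis code; the function names and all data are hypothetical, and item scores are assumed to be coded 0 (not completed) or 1 (completed).

```python
from statistics import variance


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same participants on one item."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Proportion of participants on whom the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every participant
    return (observed - expected) / (1 - expected)


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` holds one list of item scores per participant."""
    k = len(scores[0])  # number of checklist items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical item-completion scores from two raters for eight participants.
rater_a = [1, 1, 0, 1, 0, 1, 1, 0]
rater_b = [1, 1, 0, 1, 1, 1, 1, 0]
print(round(cohen_kappa(rater_a, rater_b), 3))

# Hypothetical three-item checklist scores for four participants.
item_scores = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(round(cronbach_alpha(item_scores), 3))
```

In this toy data the raters disagree on one of eight participants, so kappa falls below 1 after the chance correction, while the perfectly correlated items give an alpha of 1.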

Results: Fifty-six participants were recruited (18 novices, 19 intermediates, 19 experts). The intraclass correlation coefficient demonstrated excellent inter-rater reliability for the overall checklist (0.990), and Cohen’s kappa for individual items ranged from 0.598 to 1.00. The checklist also had excellent internal consistency (Cronbach’s alpha 0.950). Scores differed significantly between all groups (p < 0.001).
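A group comparison of this kind can be sketched with a Wilcoxon rank sum test on the total checklist scores of two experience groups. The Python example below uses the large-sample normal approximation without a tie correction to the variance. The abstract does not state which variant the authors used, so this choice, the function name, and the scores are assumptions made for illustration only.

```python
import math


def rank_sum_test(group_x, group_y):
    """Wilcoxon rank sum test via the normal approximation.

    Returns (w, z, p): w is the rank sum of group_x, z its standardized value,
    and p the two-sided p-value. Tied scores receive mid-ranks; the variance
    is not corrected for ties.
    """
    pooled = sorted(
        (value, group)
        for group, values in enumerate((group_x, group_y))
        for value in values
    )
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    w = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(group_x), len(group_y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return w, z, p


# Hypothetical total checklist scores for two experience groups.
novices = [10, 12, 11, 13, 9]
experts = [20, 22, 21, 19, 23]
w, z, p = rank_sum_test(novices, experts)
print(w, round(z, 2), p < 0.05)
```

With three experience groups, the test would be repeated for each pair of groups. For samples as small as this toy example, an exact test is generally preferable to the normal approximation.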

Conclusion: This checklist demonstrates high inter-rater reliability, discriminant validity, and internal consistency. It has the potential to enhance medical education curricula.




2021-11-24 — Updated on 2022-04-13


How to Cite

Canton, S. P., Foley, C. E., Fulcher, I., Newcomb, L. K., Rindos, N., & Donnellan, N. M. (2022). Reliability and Discriminant Validity of a Checklist for Surgical Scrubbing, Gowning and Gloving. International Journal of Medical Students, 10(1), 18–24. (Original work published April 5, 2022)



Original Article