Understanding Validity and Reliability in Healthcare Simulation Assessment

Messick’s Unified Validity Theory and Kane’s Argument-based Validity Framework are used to support validation work in healthcare simulation. Both frameworks have been applied in healthcare and healthcare education, and their use within clinical simulation continues to grow. As medical simulation becomes more widely used in high-stakes assessment, being able to assure the public that it can be used reliably and is a valid way to assess knowledge and skills is vital. This HealthySimulation.com article by Jill Sanko, PhD, APRN, CHSE-A, FSSH, will explore these validity theories as applied to healthcare simulation.

What is Validity?

Validity is a key concept in any field that measures outcomes, which is to say nearly all fields. In disciplines like clinical medicine, physical instruments are used to measure specific properties (such as scales used to weigh materials, cytometers used to count cells, and pulse oximeters used to measure oxygen saturation), and these instruments are calibrated. Calibration ensures that the measurements made by these tools are accurate, reliable, and reproducible.

In disciplines like healthcare simulation or education, the tools used to measure something are not physical instruments but rather rating or self-report tools that aim to measure an unobservable (latent) variable rather than a physical one. These measurement tools cannot physically be calibrated, yet they still need to be accurate, so a method is needed to provide evidence of their accuracy. The simplest way to demonstrate the accuracy of these tools is to show that they reliably measure the construct (an abstract concept or theoretical idea that researchers attempt to measure, e.g. intelligence, competence, knowledge) that the tool was designed to measure, each time it is used and under all circumstances in which it is used.


View the HealthySimulation.com Webinar Assessing Team Performance in Simulated Neonatal Resuscitation Programs to learn more!


The ability to support the validity and reliability of a measurement tool is important to assure users that a valid and reliable measure will be achieved at each use and in each circumstance. Why is this important? Consider the context of high-stakes assessments, though this really matters for all assessments. High-stakes assessments are ubiquitous in the healthcare disciplines: nursing students take the National Council Licensure Examination (NCLEX), medical students take a multi-part licensing exam called the United States Medical Licensing Examination (USMLE; the Step exams), and physical therapy students take the National Physical Therapy Exam (NPTE).

The organizations that administer these milestone licensing exams must assure the public that each exam, regardless of the date it is taken, the state in which it is administered, or the person taking it, accurately and reproducibly assesses an individual’s mastery of disciplinary knowledge and readiness for professional practice. Imagine if these tests varied from administration to administration. Would you feel confident that a student who took a licensing exam in March 2020 was as qualified as one who took the exam in September 2024?

Before discussing validity in more detail, consider a few tenets of validity as proposed in validity theory:

  1. Validity is not a property of a test
  2. A test itself is not valid or invalid
  3. What is evaluated are the inferences drawn from scores and the uses of those scores
  4. Validity is not an all-or-none attribute
  5. Validity must be evaluated with respect to a specific testing purpose
  6. Evaluating the validity of inferences from any one test score requires multiple sources of evidence
  7. Validation is ongoing; continued data collection helps to support claims of validity for a tool
  8. Construct validity is an inclusive form of validity evidence in which the construct is measured in a precise and accurate way

Validation has evolved over the last 100 years and continues to evolve. Both Messick and Kane have been instrumental in this evolution, shifting the focus from the validity of the test itself to the validity of the interpretations and uses of the scores the tools generate. Healthcare simulationists can thank both Messick and Kane for our more modern view of validation. More on the evolution of validation can be read in Kane and Bridgman’s 2021 commentary on the topic.


View the new HealthySimulation.com Community Simulation Research Group to discuss this topic with your Global Healthcare Simulation peers!


Messick’s Unified Theory Validity Framework

Messick’s Validity Framework, also known as Messick’s Unified Validity Framework, was published in 1995 and is, as reported in the literature, the most commonly used validity framework in medical education. It has been adopted and modified by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education as part of their Standards for Educational and Psychological Testing. Messick’s framework describes five sources of validity evidence:

  • Content Aspect: alignment between content and construct
  • Substantive Aspect: alignment among raters or respondents and construct
  • Structural Aspect: reliability across items (questions, observations) that make up the tool
  • Relational Aspect: evidence of the statistical associations between assessment scores and another measure with a specified theoretical relationship
  • Generalizability: the ability of the data or tool to hold true in other contexts
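The structural aspect above, reliability across the items that make up a tool, is commonly quantified with an internal-consistency statistic such as Cronbach’s alpha (one of several possible coefficients; the choice here, and the data, are illustrative rather than drawn from the article). A minimal Python sketch:

```python
def cronbachs_alpha(item_scores):
    """Internal consistency of a multi-item tool.

    item_scores: one list per item, each holding that item's score
    for every examinee. alpha = k/(k-1) * (1 - sum(item variances)/var(totals)).
    """
    k = len(item_scores)          # number of items
    n = len(item_scores[0])       # number of examinees

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

    sum_item_vars = sum(variance(item) for item in item_scores)
    totals = [sum(item[j] for item in item_scores) for j in range(n)]
    return k / (k - 1) * (1 - sum_item_vars / variance(totals))

# Hypothetical data: three checklist items scored 1-5 for four examinees
items = [
    [3, 4, 3, 5],
    [2, 4, 3, 5],
    [3, 5, 4, 5],
]
print(round(cronbachs_alpha(items), 2))  # → 0.96
```

A high alpha (here 0.96) indicates the items vary together across examinees, which is one piece of structural evidence; it does not by itself establish that score interpretations are valid, which is precisely the point of the unified framework.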

Messick considered all validity to be construct validity. His unifying framework boils down to a key idea: validity is a unified concept that incorporates both traditional measurement concerns and broader social and contextual considerations.

Kane’s Argument-based Validity Framework

Following Messick, Kane (2006, 2013) extended this work and developed his own Argument-based Validity Framework. Kane’s framework is founded on the idea that constructing a logical, reasoned argument around an assessment’s interpretations and uses is a better way to understand a tool’s ability to capture and measure a construct (latent variable). The development of his framework shifted the focus from simply demonstrating a relationship to making a case for how an assessment accurately measures and explains a specific construct. In other words, Kane’s framework focuses on the specific uses of an assessment, the meaning of its scores, and the collection of evidence to demonstrate that the interpretations are plausible. His framework represents a pragmatic, flexible, and transparent model of validity. Kane’s framework is based on four inferences:

  • Scoring: what is observed
  • Generalization: applicability across contexts
  • Extrapolation: meaning making
  • Implications: usefulness

Simulation-based education and validation continue to evolve, driven by an increased focus on the use of healthcare simulation for assessment. Validation studies will become increasingly needed, as will the ability to provide data on the validity and reliability of scenarios and of the measurement tools used to evaluate learner outcomes. Knowledge of this topic is therefore important for healthcare simulation practitioners, researchers, and educators.

Messick’s and Kane’s frameworks are two theories that should help guide efforts as the need grows to expand knowledge about the validity and reliability of simulation as a means to assess knowledge and skills. Calhoun and Scerbo (2022) present a helpful guide with a deeper dive into the use of these frameworks and into publishing validation work in healthcare simulation. Continuing the work to generate data demonstrating the validity and reliability of healthcare simulation, scenarios, and simulation assessment tools is important to the field of simulation education and, more broadly, healthcare-related education. As data increases, so does confidence in the use of simulation. Moreover, these efforts generate disciplinary accountability that helps to ensure fair, defensible, and reproducible educational assessment in the field of simulation.

Learn More About Healthcare Simulation Research Journals!

Resources Cited

Other Related Reads

  • Cook DA, Brydges R, Zendejas B, Hamstra SJ, Hatala R. Technology-Enhanced Simulation to Assess Health Professionals: A Systematic Review of Validity Evidence, Research Methods, and Reporting Quality. Academic Medicine. 2013;88(6):872-883. DOI: 10.1097/ACM.0b013e31828ffdcf

Jill Sanko, PhD, APRN, CHSE-A, FSSH

Adjunct professor at Walden University and MGH IHP

Jill Sanko is an award-winning nurse scientist focused on using simulation-based education and research to improve healthcare systems. She began her career in simulation at the National Institutes of Health Clinical Center as the founding Associate Director of the Simulation and Patient Safety Program. Over two decades in the field have afforded her many opportunities to impact healthcare simulation through her efforts. She has over 50 published articles, book chapters, and media pieces and has presented her work nationally and internationally. Outside her current teaching and research roles, she is active in her community, currently serving in several roles within the Society for Simulation in Healthcare: co-chair of the distance simulation affinity group, chair of the meetings oversight commission, and chair of the Academy of Simulation Fellows. She also recently took on a role in the Distance Simulation Collaboration as co-chair of the communication and visibility committee.