Understanding Validity and Reliability in Healthcare Simulation Assessment

Messick’s Unified Validity Theory and Kane’s Argument-based Validity Framework are used to support validation work in healthcare simulation. Both frameworks have been applied in healthcare and healthcare education, and their use within clinical simulation continues to grow. As medical simulation becomes more widely used in high-stakes assessment, being able to assure the public that it can be used reliably and is a valid way to assess knowledge and skills is vital. This HealthySimulation.com article by Jill Sanko, PhD, APRN, CHSE-A, FSSH, will explore these validity theories as applied to healthcare simulation.

What is Validity?

Validity is a key concept in any field that measures outcomes, which is to say nearly all fields. In disciplines like clinical medicine, physical instruments are used to measure specific properties (such as scales used to weigh materials, cytometers used to count cells, and pulse oximeters used to measure oxygen saturation), and these instruments are calibrated. Calibration ensures that the measurements made by these tools are accurate, reliable, and reproducible.

In disciplines like healthcare simulation or education, the tools used to measure something are not physical instruments but rather rating or self-report tools that aim to measure an unobservable (latent) variable rather than a physical one. These measurement tools cannot physically be calibrated, yet they still need to be accurate, so a method is needed to provide evidence of their accuracy. The simplest way to demonstrate the accuracy of these tools is to show that they reliably measure the construct (an abstract concept or theoretical idea that researchers attempt to measure, e.g. intelligence, competence, knowledge) that the tool was designed to measure, each time it is used and under all circumstances in which it is used.


View the HealthySimulation.com Webinar Assessing Team Performance in Simulated Neonatal Resuscitation Programs to learn more!


The ability to support the validity and reliability of a measurement tool is important to assure users that a valid and reliable measure will be achieved at each use and in each circumstance. Why is this important? Consider the context of high-stakes assessments, though this really matters for all assessments. High-stakes assessments are ubiquitous in the healthcare disciplines: nursing students take the National Council Licensure Examination (NCLEX), medical students take a multi-part licensing exam called the United States Medical Licensing Examination (USMLE; the Step exams), and physical therapy students take the National Physical Therapy Exam (NPTE).

The organizations that administer these milestone licensing exams must assure the public that each exam, regardless of the date it is taken, the state in which it is administered, or the person taking it, accurately and reproducibly assesses an individual’s mastery of disciplinary knowledge and readiness for professional practice. Imagine if these tests varied from administration to administration. Would you feel confident that a student who took a licensing exam in March 2020 was as qualified as one who took the exam in September 2024?

Before discussing validity in more detail, consider a few tenets of validity as proposed in validity theory:

  1. Validity is not a property of a test
  2. A test itself is not valid or invalid
  3. What is evaluated are the inferences drawn from scores and the uses of those scores
  4. Validity is not an all-or-none attribute
  5. Validity must be evaluated with respect to a specific testing purpose
  6. Evaluating the validity of inferences from any one test score requires multiple sources of evidence
  7. Validation is ongoing; continued data collection helps to support claims of validity for a tool
  8. Construct validity is an inclusive form of validity evidence in which the construct is measured in a precise and accurate way

Validation has evolved over the last 100 years and continues to evolve. Both Messick and Kane have been instrumental in this evolution, shifting the focus from the validity of the test itself to the validity of the interpretations and uses of the scores the tools generate. Healthcare simulationists can thank both Messick and Kane for our more modern view of validation. More on the evolution of validation can be read in Kane and Bridgman’s 2021 commentary on the topic.


View the new HealthySimulation.com Community Simulation Research Group to discuss this topic with your Global Healthcare Simulation peers!


Messick’s Unified Theory Validity Framework

Messick’s Validity Framework, also known as Messick’s Unified Validity Framework, was published in 1995 and is, as reported in the literature, the most commonly used validity framework in medical education. It has been adopted and modified by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education as part of their Standards for Educational and Psychological Testing. Messick’s framework describes five sources of validity evidence:

  • Content Aspect: alignment between content and construct
  • Substantive Aspect: alignment among raters or respondents and construct
  • Structural Aspect: reliability across items (questions, observations) that make up the tool
  • Relational Aspect: evidence of the statistical associations between assessment scores and another measure with a specified theoretical relationship
  • Generalizability: the ability of the data or tool to hold true in other contexts
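The structural aspect above, reliability across the items that make up a tool, is commonly quantified with an internal-consistency statistic such as Cronbach’s alpha (one of several possible coefficients; the choice here, and the data, are illustrative rather than drawn from the article). A minimal Python sketch:

```python
def cronbachs_alpha(item_scores):
    """Internal consistency of a multi-item tool.

    item_scores: one list per item, each holding that item's score
    for every examinee. alpha = k/(k-1) * (1 - sum(item variances)/var(totals)).
    """
    k = len(item_scores)          # number of items
    n = len(item_scores[0])       # number of examinees

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

    sum_item_vars = sum(variance(item) for item in item_scores)
    totals = [sum(item[j] for item in item_scores) for j in range(n)]
    return k / (k - 1) * (1 - sum_item_vars / variance(totals))

# Hypothetical data: three checklist items scored 1-5 for four examinees
items = [
    [3, 4, 3, 5],
    [2, 4, 3, 5],
    [3, 5, 4, 5],
]
print(round(cronbachs_alpha(items), 2))  # → 0.96
```

A high alpha (here 0.96) indicates the items vary together across examinees, which is one piece of structural evidence; it does not by itself establish that score interpretations are valid, which is precisely the point of the unified framework.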

Messick considered all validity to be construct validity. His unifying framework boils down to a key idea: validity is a unified concept that incorporates both traditional measurement concerns and broader social and contextual considerations.

Kane’s Argument-based Validity Framework

Following Messick, Kane (2006, 2013) extended this work and developed his own Argument-based Validity Framework. Kane’s framework is founded on the idea that constructing a logical, reasoned argument around an assessment’s interpretations and uses is a better way to understand a tool’s ability to capture and measure a construct (latent variable). The development of his framework shifted the focus from simply demonstrating a relationship to making a case for how an assessment accurately measures and explains a specific construct. In other words, Kane’s framework focuses on the specific uses of an assessment, the meaning of its scores, and the collection of evidence to demonstrate that the interpretations are plausible. His framework represents a pragmatic, flexible, and transparent model of validity. Kane’s framework is based on four inferences:

  • Scoring: what is observed
  • Generalization: applicability across contexts
  • Extrapolation: meaning making
  • Implications: usefulness

Simulation-based education and validation continue to evolve, driven by an increased focus on the use of healthcare simulation for assessment. Validation studies will become increasingly needed, as will the ability to provide data on the validity and reliability of scenarios and of the measurement tools used to evaluate learner outcomes. Knowledge of this topic is therefore important for healthcare simulation practitioners, researchers, and educators.

Messick’s and Kane’s frameworks are two theories that should help guide efforts as the need grows to expand knowledge about the validity and reliability of simulation as a means to assess knowledge and skills. Calhoun and Scerbo (2022) present a helpful guide with a deeper dive into the use of these frameworks and into publishing validation work in healthcare simulation. Continuing the work to generate data demonstrating the validity and reliability of healthcare simulation, scenarios, and simulation assessment tools is important to the field of simulation education and, more broadly, healthcare-related education. As data increases, so does confidence in the use of simulation. Moreover, these efforts generate disciplinary accountability that helps to ensure fair, defensible, and reproducible educational assessment in the field of simulation.

Learn More About Healthcare Simulation Research Journals!

Resources Cited

Other Related Reads

  • Cook DA, Brydges R, Zendejas B, Hamstra SJ, Hatala R. Technology-Enhanced Simulation to Assess Health Professionals: A Systematic Review of Validity Evidence, Research Methods, and Reporting Quality. Academic Medicine. 2013;88(6):872-883. DOI: 10.1097/ACM.0b013e31828ffdcf

Jill Sanko, PhD, APRN, CHSE-A, FSSH

Adjunct professor at Walden University and MGH IHP

Jill Sanko is an award-winning nurse scientist focused on using simulation-based education and research to improve healthcare systems. She began her career in simulation at the National Institutes of Health Clinical Center as the founding Associate Director of the Simulation and Patient Safety Program. Over two decades in the field have afforded her many opportunities to impact healthcare simulation through her efforts. She has over 50 published articles, book chapters, and media pieces and has presented her work nationally and internationally. Outside her current teaching and research roles, she is active in her community, currently serving in several roles within the Society for Simulation in Healthcare: co-chair of the distance simulation affinity group, chair of the meetings oversight commission, and chair of the Academy of Simulation Fellows. She also recently took on a role in the Distance Simulation Collaboration as co-chair of the communication and visibility committee.