Xiaolin Wang, Amy Ribera, Bob Gonyea

Messick (1989) defines validity as an "integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the...appropriateness of interpretations and actions based on...scores or other modes of assessment" (p. 1487). One of many ways to build a case for the validity of an instrument is to examine the stylistic tendencies of survey respondents, known as response styles (Cronbach, 1946).
Survey respondents are known to exhibit various response styles, such as the tendency to agree with items regardless of their content and the tendency to select the middle points of a Likert scale (Baumgartner & Steenkamp, 2001). The tendency to select the endpoints of a Likert scale is known as extreme response style (ERS; Greenleaf, 1992). For example, on a five-point Likert scale ranging from "strongly disagree" to "strongly agree," ERS refers to the tendency to select the endpoints, "strongly disagree" or "strongly agree," rather than the middle points from "disagree" to "agree." Response styles such as ERS can contaminate group comparisons and lead to incorrect conclusions. When ERS is present, observed group score differences are a mixture of true differences on the construct and differences in response style, which threatens survey validity because the instrument is measuring more than the construct of interest. Because survey responses are commonly used for group comparisons, it is important to understand whether ERS exists and whether it contaminates group comparison results.
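As a concrete illustration, the crudest descriptive index of ERS is simply the proportion of a respondent's answers that fall on the scale endpoints. The sketch below computes that proportion for each respondent on a five-point scale; it is a simplification for intuition only, not the IRT-ERS model used in the study, and the data are hypothetical.

```python
import numpy as np

def extreme_response_proportion(responses, low=1, high=5):
    """Proportion of each respondent's answers on the scale endpoints.

    responses : 2-D array, shape (n_respondents, n_items), coded 1..5.
    Returns a 1-D array with one crude ERS index per respondent.
    """
    responses = np.asarray(responses)
    extreme = (responses == low) | (responses == high)
    return extreme.mean(axis=1)

# Hypothetical example: three respondents, four five-point Likert items.
data = np.array([
    [5, 5, 1, 5],  # all answers on the endpoints -> index of 1.0
    [3, 4, 2, 3],  # all answers in the middle    -> index of 0.0
    [1, 3, 5, 4],  # half on the endpoints        -> index of 0.5
])
print(extreme_response_proportion(data))  # [1.0, 0.0, 0.5]
```

Unlike this raw proportion, the model-based approach described next separates ERS tendency from the respondent's actual standing on the construct.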
To investigate ERS in higher education assessment, we applied a generalized IRT-ERS model (Jin & Wang, 2014) to the responses of 22,450 senior college students who participated in the National Survey of Student Engagement in 2014. We restricted our analysis to students from 71 four-year institutions that opted to administer the module item set, the Global Perspectives Inventory. After estimating each individual's ERS parameter via Markov chain Monte Carlo (MCMC) estimation, we compared students' ERS across eight demographic factors (gender, enrollment status, international status, first-generation status, race and ethnicity, STEM major, sexual orientation, and disability status). Based on group comparisons using t-tests and F-tests, we found significant (p < .05) differences in ERS tendency for two demographic factors: STEM major and first-generation status. Specifically, STEM students and non-first-generation students were more likely to select either "Strongly Agree" or "Strongly Disagree" over "Agree", "Disagree", or "Neither".
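The group comparison step can be illustrated with a short sketch. It assumes per-respondent ERS estimates have already been obtained from the IRT-ERS model; the arrays, group sizes, and effect sizes here are hypothetical placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-respondent ERS estimates, standing in for posterior
# means from an IRT-ERS model (not the study's actual estimates).
ers_stem = rng.normal(loc=0.10, scale=0.5, size=500)
ers_non_stem = rng.normal(loc=0.00, scale=0.5, size=500)

# Two groups (e.g., STEM vs. non-STEM): independent-samples t-test.
t_stat, p_value = stats.ttest_ind(ers_stem, ers_non_stem)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# More than two groups (e.g., race/ethnicity categories): one-way ANOVA F-test.
group_a = rng.normal(0.05, 0.5, size=300)
group_b = rng.normal(0.00, 0.5, size=300)
group_c = rng.normal(-0.05, 0.5, size=300)
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```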
We recommend that researchers, when evaluating the validity of score interpretations, consider assessing survey response style effects through an IRT lens. We also suggest examining whether different demographic groups show different ERS tendencies so that the data can be interpreted appropriately. ERS is especially important when survey results inform policy or high-stakes decisions: unaddressed ERS can exaggerate group differences and draw attention where it is unwarranted.
For more information, you may read the full paper, presented at the 2017 American Educational Research Association Annual Meeting in San Antonio, TX, here.