(Post last updated June 16, 2022)
Review panel summary
The Students' Understanding of Models in Science (SUMS) instrument is a twenty-seven item, 5-point Likert-scale assessment, with response options of strongly disagree (1), disagree (2), not sure (3), agree (4), and strongly agree (5). SUMS was designed to assess students’ understanding of the role of scientific models in science. It has been evaluated with students enrolled in high school science (chemistry, physics, and biology) classes in both Australia [1] and the United States [2, 3, 5], as well as in undergraduate first-semester general chemistry courses in the United States [4]. Several aspects of validity and reliability have been assessed for the data generated by SUMS. During the development of this instrument, it was noted that items containing the word “phenomenon” drew a high rate of “not sure” responses [1]; the SUMS developers inferred that students were likely unfamiliar with the word and advised readers to interpret results from items containing it with caution, although no student interviews were conducted to probe the meaning of responses to these specific items. In a subsequent study [4], student interviews were conducted to provide evidence for response process validity: students were asked to respond to each SUMS item while describing their thought processes and pointing out any unclear language. Potential issues regarding clarity were found with SUMS items 8 and 15 (confusing, double-barreled items) as well as SUMS items 16 and 23 (difficult to interpret); these issues were also noted in factor analyses [4]. In terms of validity, evidence for internal structure validity was provided through an exploratory factor analysis, resulting in a five-factor solution [1].
Subsequent studies in the literature have used this five-factor model without conducting any additional internal structure analyses [2, 3, 5]; however, one study within the context of first-semester general chemistry [4] reported an exploratory factor analysis that resulted in a four-factor solution, based on an included scree plot and eigenvalues greater than 1.00. That study also excluded four items because of insufficient factor loadings and cross-loadings on more than one factor [4]; ultimately, a confirmatory factor analysis was conducted, which failed to indicate good model fit for the four-factor model. Evidence for relation to other variables (subject matter/age) [2] has also been reported in connection with SUMS; although gender was reported as having a significant correlation with one of the SUMS factors, no data were shown in the publication [1]. In terms of evidence for reliability, coefficient alpha has been used to estimate single-administration reliability separately for each of the five factors [1-3]; however, one study [4] intentionally did not report coefficient alpha because of concerns about inconsistencies in the evidence for internal structure validity of SUMS data in the context of undergraduate chemistry students’ knowledge of scientific models.
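As an illustration of the per-factor reliability estimate described above, the following is a minimal sketch of coefficient alpha computed from item variances and the variance of the total score for a single factor. The function name and the response data are hypothetical and are not taken from any SUMS study; in practice each factor's items would be passed in separately.

```python
from statistics import variance


def coefficient_alpha(responses):
    """Estimate coefficient alpha for the items of ONE factor.

    responses: list of respondents, each a list of numeric item scores
    (same items, same order, no missing values).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    if len(responses) < 2:
        raise ValueError("coefficient alpha needs at least two respondents")

    # Transpose so that each tuple holds all responses to one item.
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(col) for col in item_columns)

    # Variance of each respondent's summed score across the k items.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses from five students to a two-item factor.
example = [[1, 2], [2, 1], [3, 3], [4, 5], [5, 4]]
alpha = coefficient_alpha(example)
```

Because alpha assumes that the items in a factor measure a single construct, an estimate like this is only meaningful once the factor structure itself is supported, which is the concern raised in [4].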
Recommendations for use
SUMS was originally developed on a 5-point Likert scale and is intended to assess students’ understanding of the role of scientific models in learning science [1]. Differences exist in the literature regarding how this response scale is used: most studies collect and analyze data on the 5-point scale [2-4], whereas others, including the original SUMS study, truncate or collapse the scale for analysis [1, 5]. This lack of consistency in how the SUMS scale is used makes it unclear which scale is most appropriate; it is suggested that future researchers provide support for the selection of a response scale based on theoretical framing. Additionally, differences exist in the literature regarding how best to analyze the data for item and factor scores. Although the original development study does not clarify scoring [1], a subsequent study reverse codes some items within one of the factors (models as exact replicas (ER)) [2], meaning that a value of 5 was assigned to strongly disagree responses and a value of 1 to strongly agree responses. Other reports in the literature explicitly do not reverse code any items [4] in order to best compare with the original development of the instrument [1], meaning that a higher level of agreement with items does not necessarily align with a higher level of understanding of scientific models. This lack of consistency in how SUMS is scored makes it unclear which scoring strategy is most appropriate. It is suggested that future researchers provide support for the selection of a scoring strategy based on theoretical framing. Finally, it has been reported that SUMS does not generate valid and reliable data in undergraduate chemistry student contexts [4].
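To make these scoring choices concrete, here is a minimal sketch of reverse coding on a 5-point scale and of one possible way to collapse the scale to three categories. The collapse mapping, the use of a mean as the factor score, the function names, the item labels, and the responses are all assumptions for illustration; the source does not specify how the cited studies implemented these steps.

```python
def reverse_code(score, scale_min=1, scale_max=5):
    """Flip a Likert score so that 1 <-> 5 and 2 <-> 4 on a 1-5 scale."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - score


def collapse_to_three(score):
    """Assumed mapping: 1-2 -> 'disagree', 3 -> 'not sure', 4-5 -> 'agree'."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"score {score} is not on the 5-point scale")
    if score <= 2:
        return "disagree"
    if score == 3:
        return "not sure"
    return "agree"


def factor_score(responses, reverse_items=()):
    """Mean item score for one factor (assumed scoring rule).

    responses: dict mapping item label -> score on the 1-5 scale.
    reverse_items: labels of the items to reverse code before averaging.
    """
    scored = [
        reverse_code(value) if item in reverse_items else value
        for item, value in responses.items()
    ]
    return sum(scored) / len(scored)


# Hypothetical responses from one student to three items of one factor.
er_responses = {"ER_a": 5, "ER_b": 4, "ER_c": 2}

raw_score = factor_score(er_responses)
reversed_score = factor_score(er_responses, reverse_items={"ER_a", "ER_b"})
```

The same set of responses yields a high factor score without reverse coding and a low one with it, which is why results from studies that score SUMS differently are hard to compare directly.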
Details from panel review
While some aspects of validity and reliability evidence for the SUMS instrument have been reported, the panel found it concerning that there is currently no evidence for test content validity, although one study provides minimal details on the exclusion of SUMS item 23 based on expert interpretation of problematic wording within that item. Regarding internal structure validity, the original development of SUMS allowed for factor cross-loadings and reported correlations between factors without theoretical backing for why those factors might be related [1]. Given the failed four-factor model [4] and potential issues with the five-factor model and its origins [1], more evidence related to the internal structure of the SUMS instrument is warranted in future studies.
