(Post last updated 09 June 2023)
Review panel summary
Lawson’s Classroom Test of Scientific Reasoning (LCTSR) was developed in 2000 by modifying the 1978 version of Lawson’s Classroom Test of Formal Reasoning (CTFR-78) [1,5]. The LCTSR has been used to measure the scientific reasoning of students in undergraduate introductory biology [1,2,6,8,9], physics [3,5], astronomy [3], and chemistry [4,10] courses, as well as during a professional development workshop for secondary teachers [7]. The LCTSR is generally administered as a 24-item (12 question pair) multiple-choice test [2-10], although the original development paper states it was administered as a written test [1]. The items are associated with six different reasoning patterns: conservation, proportional reasoning, control of variables, probability, correlation reasoning, and hypothetical-deductive reasoning [1,5].
The first 10 question pairs follow a common two-tier multiple-choice design in which participants first select a response to a prompt and then select an answer that supports that response [5]. The last two question pairs are designed slightly differently: the first asks participants to select an experimental design to test a hypothesis and then select an outcome that would disprove the hypothesis, while the last question pair asks participants to select outcomes that would disprove two hypotheses for an experiment [5]. These two question pairs were specifically developed to assess participants’ hypothetical-deductive reasoning [1,5].
The structure of the items has led to three different methods of scoring. The first is to score each item individually, for a maximum of 24 points [2,5,6,7]. The second is pair scoring, in which participants must answer both items in a question pair correctly to earn 1 point; pair scoring all 12 question pairs gives a maximum of 12 points [3,4,5,7]. Because the last question pair asks students about two different hypotheses, some studies score this last pair individually while scoring the other 11 question pairs pairwise, leading to a maximum of 13 points [1,8,10]. Some studies also grouped participants into reasoning levels based on their score [1,2,3,7,10]. For example, when scoring out of 13 total points, participants could be grouped as Level 0 (0-3 points), Low Level 1 (4-6 points), High Level 1 (7-10 points), or Level 2 (11-13 points) [1]. Reasoning levels derived from LCTSR scores have also been used to form collaborative groups with varying compositions of student reasoning levels in order to assess the effect of group composition on reasoning gains in different classroom environments (e.g., inquiry vs. didactic) [2].
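To make the three scoring schemes concrete, the sketch below implements them under the assumption that a participant’s 24 responses are stored as a list of booleans in item order (items 1 and 2 form pair 1, and so on). The function and variable names are our own, and the level cut scores are the 13-point groupings reported in [1]; this is an illustrative sketch, not code from any of the cited studies.

```python
# Sketch of the three LCTSR scoring methods described above.
# `correct` is assumed to be a list of 24 booleans (True = item answered
# correctly), ordered so that items 2k and 2k+1 form question pair k.

def score_individual(correct):
    """Individual scoring: 1 point per item, maximum 24."""
    return sum(correct)

def score_pairwise(correct):
    """Pair scoring: 1 point only if both items in a pair are correct, maximum 12."""
    return sum(correct[i] and correct[i + 1] for i in range(0, 24, 2))

def score_hybrid(correct):
    """Hybrid scoring: pairs 1-11 scored pairwise, the last two items individually, maximum 13."""
    first_11_pairs = sum(correct[i] and correct[i + 1] for i in range(0, 22, 2))
    return first_11_pairs + sum(correct[22:24])

def reasoning_level(hybrid_score):
    """Group a 13-point score into the reasoning levels reported in [1]."""
    if hybrid_score <= 3:
        return "Level 0"
    elif hybrid_score <= 6:
        return "Low Level 1"
    elif hybrid_score <= 10:
        return "High Level 1"
    else:
        return "Level 2"
```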
Evidence based on test content of data collected with the LCTSR is minimal. Expert feedback from seven science faculty raised concerns regarding five question pairs that showed a high proportion of inconsistent student responses (i.e., where students answered one item in a question pair correctly but the other item incorrectly). These concerns were related to the multiple-choice options, the design and presentation of the items, and the complexity of some of the scenarios [5]. Evidence based on response process was evaluated through open-ended written explanations and interviews with students; the results further supported the expert concerns about the five question pairs with highly inconsistent response patterns [5]. Evidence based on relation to other variables has been gathered using both LCTSR scores and reasoning levels. Pretest-to-posttest comparisons found that students’ scientific reasoning improved over a semester of instruction in an introductory biology course [1]. Additionally, LCTSR scores were found to be positively related to exam scores [1], ACS exam scores [4], and course grades (i.e., success in the course) [8]. Significant correlations have been found between students’ LCTSR scores and their normalized learning gains on concept inventories in physics and astronomy courses [3], as well as between LCTSR scores and a chemistry concept inventory and measures of intelligence and proportional reasoning ability [4]. When students were grouped into reasoning levels, students’ success on a transfer problem (i.e., an unrelated problem that assesses students’ reasoning) increased with higher reasoning levels [1]. Students in higher reasoning levels were also found to perform better on both algorithmic and conceptual questions, as well as on an ACS exam [10].
Evidence of single administration reliability was evaluated by calculating coefficient alpha for the entire instrument using the individual item scoring method (alpha = 0.85) and the pairwise scoring method (alpha = 0.86) [5]. Additionally, coefficient alpha was calculated for each question pair and found to range between 0.52 and 0.97 [5]. The difficulty of most question pairs was found to be in the suggested range of 0.3 - 0.9, although three question pairs fell outside this range and were deemed too easy; these same three question pairs also showed poor discrimination (< 0.3) [5]. The point biserial correlation coefficient, which relates each question pair score to the total test score, was above the suggested value of 0.2 for all question pairs [5].
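For reference, the classical test theory statistics mentioned above are commonly defined as follows. These are the standard textbook forms; the exact variants used in [5] may differ slightly (for example, in how the high- and low-scoring groups are formed for the discrimination index).

```latex
% Standard classical test theory definitions (textbook forms, not
% necessarily the exact variants used in [5]).
\begin{align*}
  \text{difficulty:}\quad & p_j = \frac{\text{number of participants answering item } j \text{ correctly}}{N},\\[4pt]
  \text{discrimination:}\quad & D_j = p_j^{\text{upper}} - p_j^{\text{lower}}
    \quad \text{(proportion correct in high- vs. low-scoring groups)},\\[4pt]
  \text{point biserial:}\quad & r_{pb,j} = \frac{\bar{X}_{1} - \bar{X}_{0}}{s_X}\,\sqrt{p_j\,(1-p_j)},
\end{align*}
```

where \(\bar{X}_1\) and \(\bar{X}_0\) are the mean total scores of participants who answered item \(j\) correctly and incorrectly, respectively, and \(s_X\) is the standard deviation of total scores.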
Recommendations for use
The LCTSR has been administered in multiple different introductory courses in order to assess students’ scientific reasoning; however, the validity evidence for data collected with the LCTSR is limited. Evidence based on relation to other variables has been provided through relations between LCTSR scores and/or reasoning levels and other variables, such as scores on a transfer problem [1], exam scores [1,4,10], concept inventories [3,4], and course grade [8]. This supports the use of the measure to predict student achievement outcomes (i.e., participants who score higher on the LCTSR would be expected to score higher on content-based assessments).
The predecessor to the LCTSR, the CTFR-78, was developed with the aid of experts in Piaget’s developmental theory and student interviews, providing some support based on test content and response process; however, there is no indication that the same type of expert and/or student feedback was sought when modifying and creating additional items for the LCTSR [1]. A later use study obtained expert feedback on, and conducted student interviews about, the LCTSR question pairs and found multiple concerns with at least five question pairs [5]. Therefore, although there is evidence that the CTFR-78 measures formal reasoning, there is no equivalent evidence presented specifically to support the LCTSR as a measure of scientific reasoning. Concerns with some of the question pairs indicate that users should proceed with caution when interpreting results.
Limited evidence based on the internal structure of LCTSR data is provided. Although studies generally report an overall LCTSR score [1-10], with some also reporting subscale scores for the six reasoning patterns [3,5], no evidence is provided to support the use of these composite scores. There is also some concern about inconsistent student response patterns to the question pairs, which may increase the uncertainty in participant scores and should be considered before interpreting data collected with the LCTSR [5].
Given the limited support from validity evidence, future users are encouraged to provide evidence based on test content, response process, and internal structure before interpreting results from data collected with the LCTSR.
Details from panel review
The multiple-choice 2000 version of the LCTSR is based on modifications to an earlier 1978 version (CTFR-78). The CTFR-78 showed evidence based on test content and response process, as it was developed using Piaget’s developmental theory with the aid of experts in Piagetian research and interview data from middle-school students [5]. Additionally, quantitative data collected with the CTFR-78 were analyzed with principal components analysis to gain some information about the internal structure [5]. However, the validity evidence presented to support the modified multiple-choice LCTSR is more limited. Regarding evidence based on test content, information about the process behind the selection and modification of CTFR-78 items is not included in the development paper [1]. Additionally, although two new items are explicitly included in the development paper, there are no details about how these items were created [1]. A later study obtained expert feedback about the LCTSR items [5]; however, the feedback was used to note concerns about some question pairs and was not used for the development of the items. Similarly, evidence based on response process for the LCTSR is minimal, with the same later study noting concerns with some question pairs based on student interviews about the items [5].
Regarding evidence based on internal structure, one study evaluated the dependencies within and between question pairs using the correlations between item residuals from a Rasch analysis (i.e., the absolute value of the Q3 statistic) [5]. Most items correlated most strongly with their associated item in a question pair; however, there were six question pairs where this was not the case, indicating possible validity concerns for data collected with these items [5]. Student response patterns to the items within each question pair were also examined for consistency. Results indicated that although seven question pairs showed evidence of good consistency (i.e., students answered both items in a question pair correctly or both incorrectly), five of the question pairs showed high levels of inconsistency (i.e., students responded correctly to one item in a question pair and incorrectly to the other item in the same pair) [5]. The relations between LCTSR results and related outcomes and measures have been evaluated in multiple studies [1,3,4,8,10], providing evidence based on relation to other variables.
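For readers unfamiliar with the Q3 statistic, a common definition (Yen’s Q3, which we assume is the statistic referred to in [5]) is the correlation between item residuals after fitting a Rasch or other item response model:

```latex
% Yen's Q3 statistic: correlation between item residuals after a Rasch/IRT fit
% (assumed to be the statistic referred to in [5]).
\[
  d_{ij} = x_{ij} - E\!\left[x_{ij}\mid\hat{\theta}_i\right],
  \qquad
  Q_{3,jk} = \operatorname{corr}\!\left(d_{\cdot j},\, d_{\cdot k}\right),
\]
```

where \(x_{ij}\) is person \(i\)’s score on item \(j\), \(\hat{\theta}_i\) is the estimated ability, and the correlation is taken over persons. Large values of \(|Q_{3,jk}|\) flag local dependence between items \(j\) and \(k\), which is why items were expected to correlate most strongly with the other item in their question pair.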
Single administration reliability of both the individual and pairwise scoring methods was evaluated with coefficient alpha, which initially gave 0.85 and 0.76, respectively [5]. As the individual and pairwise scoring methods result in different test lengths (i.e., 24 points vs. 12 points), the alpha value for the pairwise scoring method was adjusted using the Spearman-Brown prophecy formula, which gave an alpha value of 0.86 [5]. The sample-to-sample reliability of the measure was assessed in another study, which found that the mean and standard deviation of the results obtained with the LCTSR from one year to the next were very similar [4]. Additionally, a correlation of 0.65 between pretest and posttest LCTSR scores was reported, which could be considered evidence of test-retest reliability [1]. However, as there was instruction between the pretest and posttest, it is not clear that the results from both administrations are a good indicator of test-retest reliability, as students’ reasoning skills may be expected to change during this time.
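As a check on the reported adjustment, the Spearman-Brown prophecy formula for a test lengthened by a factor \(k\) is shown below; doubling the 12-pair form (\(k = 2\)) carries the reported pairwise alpha of 0.76 to approximately 0.86, consistent with the adjusted value above.

```latex
% Spearman-Brown prophecy formula for a test lengthened by a factor k,
% applied to the reported pairwise alpha of 0.76 with k = 2.
\[
  \rho_{k} = \frac{k\,\rho}{1 + (k-1)\,\rho}
  \quad\Longrightarrow\quad
  \rho_{2} = \frac{2 \times 0.76}{1 + 0.76} \approx 0.86 .
\]
```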
References
[1] Lawson, A.E., Clark, B., Cramer-Meldrum, E., Falconer, K.A., Sequist, J.M., & Kwon, Y.J. (2000). Development of Scientific Reasoning in College Biology: Do Two Levels of General Hypothesis-Testing Skills Exist? J. Res. Sci. Teach. 37, 81-101.
[2] Jensen, J. L. & Lawson, A. (2011). Effects of collaborative group composition and inquiry instruction on reasoning gains and achievement in undergraduate biology. CBE LSE 10(1), 63-73.
[3] Moore, J. C. & Rubbo, L. J. (2012). Scientific reasoning abilities of nonscience majors in physics-based courses. Phys. Rev. Phys. Educ., 8, 010106.
[4] Cracolice, M. S. & Busby, B. D. (2015). Preparation for College General Chemistry: More than Just a Matter of Content Knowledge Acquisition. J. Chem. Educ., 92(11), 1790-1797.
[5] Bao, L., Xiao, Y., Koenig, K. & Han, J. (2018). Validity evaluation of the Lawson classroom test of scientific reasoning. Phys. Rev. Phys. Educ., 14, 020106.
[6] Jensen, J. L., Holt, E. A., Sowards, J. B., Heath Ogden, T. & West, R. E. (2018). Investigating Strategies for Pre-Class Content Learning in a Flipped Classroom. J. Sci. Educ. Tech., 27, 523-535.
[7] Stammen, A. N., Malone, K. L. & Irving, K. E. (2018). Effects of Modeling Instruction Professional Development on Biology Teachers’ Scientific Reasoning Skills. Educ. Sciences 8(3), 119.
[8] Thompson, E. D., Bowling, B. V. & Markle, R. E. (2018). Predicting Student Success in a Major’s Introductory Biology Course via Logistic Regression Analysis of Scientific Reasoning Ability and Mathematics Scores. Res. Sci. Educ. 48, 151-163.
[9] Jensen, J. L., McDaniel, M. A., Kummer, T. A., Godoy, P. D. D. M. & St. Clair, B. (2020). Testing Effect on High-Level Cognitive Skills. CBE LSE 19:ar39.
[10] Cracolice, M. S., Deming, J. C. & Ehlert, B. (2008). Concept Learning versus Problem Solving: A Cognitive Difference. J. Chem. Educ. 85(6), 873-878.