(Post last updated 09 June 2023)
Review panel summary
The Flame Test Concept Inventory (FTCI) was designed to measure students’ understanding of atomic emission. The concept inventory consists of 19 multiple-choice items: 11 are independent, while the remaining 8 form 4 two-tier ‘answer-reason’ pairs [1]. The FTCI has been administered to students enrolled in high school chemistry and advanced placement chemistry courses across the United States, as well as to students enrolled in a first year undergraduate chemistry course at a single institution.
Evidence based on test content was collected during the development of the FTCI. Development of the items was informed by cognitive interview data collected from students enrolled in high school, advanced placement, undergraduate first year, and upper division (instrumental analysis or analytical) chemistry courses. "The alternative conceptions that emerged through the analysis of student interviews formed the basis for creating questions and distracters" [1]. Once the developers had created a preliminary version of the FTCI, it was reviewed by seven faculty members who had experience teaching first-year chemistry and/or analytical chemistry. Analysis of the experts’ suggestions led to revisions in item wording and the addition of two new items, resulting in the pilot version of the FTCI.
Evidence based on response process for the pilot version of the FTCI was collected through cognitive interviews with undergraduate first year and upper division chemistry students. Analysis of these interviews resulted in the deletion of two items, the addition of one item, and revisions in wording to others, resulting in the final 19-item version of the FTCI. In order to collect additional evidence based on response process, a second round of cognitive interviews was conducted with undergraduate first year chemistry students who completed the final 19-item version. Based on analysis of this second set of cognitive interviews, the developers of the instrument deemed that no additional modifications to the FTCI were necessary.
Evidence related to single administration reliability was assessed by calculating coefficient alpha and Kuder-Richardson (KR-20) indices for each course level at which FTCI data were collected (i.e., high school, advanced placement, and first year undergraduate chemistry). The alpha and KR-20 values for the advanced placement and first year undergraduate data were acceptable, while the values for the high school data fell below the typically recommended cutoffs for both metrics. The developers noted that the high school results may reflect “a more fragmented knowledge about the concepts assessed by the FTCI” [1].
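For items scored right/wrong (0/1), KR-20 is the special case of coefficient alpha, so the two indices coincide when they are computed with the same variance convention. The Python sketch below computes both from a student-by-item score matrix. The function names and the small score matrix are hypothetical, population variances are an assumption, and this is not the developers' analysis code.

```python
from statistics import pvariance

def kr20(scores):
    """KR-20 for a matrix of 0/1 item scores (rows = students, columns = items)."""
    n, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]            # each student's total score
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    # p_j = proportion of students answering item j correctly
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)          # sum of item variances p_j * q_j
    return (k / (k - 1)) * (1 - sum_pq / var_total)

def cronbach_alpha(scores):
    """Coefficient alpha from item variances and total-score variance."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    totals = [sum(row) for row in scores]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / var_total)

# Hypothetical responses: 5 students x 4 items (1 = correct, 0 = incorrect)
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3), round(cronbach_alpha(data), 3))  # 0.8 0.8
```

Because each 0/1 item's variance equals p(1 - p), the two functions return the same value for dichotomous data.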
Validity evidence based on relation to other variables was assessed by comparing the performance of first year undergraduate chemistry students and upper division chemistry students on the pilot version of the FTCI. A t-test indicated that, as expected, the upper division chemistry students significantly outperformed the first year undergraduate chemistry students. This provides some supportive evidence; however, the final 19-item version of the FTCI was not used in this analysis.
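The review does not state which form of the t-test was used. As an illustration only, the sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom for two independent groups; the scores and the function name are hypothetical. A p-value would come from the t distribution, for example via SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite df for two independent samples."""
    se2_a = variance(a) / len(a)   # squared standard error of group a's mean (sample variance / n)
    se2_b = variance(b) / len(b)   # squared standard error of group b's mean
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df

# Hypothetical total scores on a 19-item test
upper_division = [15, 17, 14, 16, 18]
first_year = [10, 12, 11, 9, 13]
t, df = welch_t(upper_division, first_year)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 5.00, df = 8.0
```

Welch's form is chosen here because it does not assume equal variances in the two course levels; a pooled-variance t-test is the other common option.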
In order to collect evidence related to test-retest reliability, the final 19-item version of the FTCI was administered to the same group of first year undergraduate chemistry students two times, with three weeks between the two administrations. A stability coefficient, as measured by the Pearson correlation, did not meet the recommended threshold (0.7). The developers argued that a lower correlation should not be a cause for concern, as “the appropriateness of internal consistency thresholds for concept inventories and alternative conceptions have been recently questioned, given the challenges associated with measuring incomplete and incorrect student understandings” [1].
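A test-retest stability coefficient is the Pearson correlation between the same students' total scores on the two administrations. The minimal Python sketch below computes it by hand and compares it with the 0.7 threshold mentioned above; the scores and the function name are hypothetical. On Python 3.10 or later, `statistics.correlation` returns the same value.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between paired scores, e.g., test and retest totals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # sum of cross-products of deviations
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical totals for the same five students at time 1 and time 2
time1 = [10, 12, 14, 16, 18]
time2 = [12, 11, 15, 13, 16]
r = pearson_r(time1, time2)
THRESHOLD = 0.7  # recommended threshold cited in the review
print(f"r = {r:.2f}; meets {THRESHOLD}: {r >= THRESHOLD}")  # r = 0.76; meets 0.7: True
```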
The item difficulty and item discrimination of the FTCI data were also examined. For data collected from high school, advanced placement, and undergraduate general chemistry students, item difficulty fell within the acceptable range for all but seven, two, and three items, respectively. Overall, only one item (Item 9) was uniformly difficult, and no items were uniformly easy, across all three groups of students. Regarding discrimination, the data from all three groups yielded acceptable values of Ferguson’s delta, indicating that total scores were spread broadly enough to distinguish students’ understanding across the full range of scores. Item-level discrimination was also investigated to assess how well each item distinguished the top-performing students from the lowest-performing students. For the high school, advanced placement, and undergraduate general chemistry data, item discrimination fell above the acceptable cutoff for all but six, three, and three items, respectively. As with item difficulty, only one item (Item 10) showed poor item-level discrimination across all three groups of students. Finally, the point-biserial correlation (ρbis) was calculated for each item. Most items functioned acceptably for all three groups; however, Item 9 had low ρbis values for both the high school and advanced placement students, and Item 10 had low ρbis values for all three groups.
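Ferguson’s delta compares the observed spread of total scores with that of a uniform distribution over all possible scores: delta = (N^2 - sum of f_i^2) / (N^2 - N^2/(K + 1)), where N is the number of students, K the number of items, and f_i the number of students with total score i. The point-biserial correlation is the Pearson correlation between a 0/1 item score and the total score. The Python sketch below implements both on hypothetical data. It correlates each item with a total that includes that item, which is one of several conventions and may differ from the developers' choice; the function names are illustrative.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev

def fergusons_delta(totals, k):
    """Ferguson's delta for total scores on a k-item test (possible scores 0..k)."""
    n = len(totals)
    freq = Counter(totals)                        # f_i = number of students with total score i
    sum_f2 = sum(f * f for f in freq.values())
    return (n * n - sum_f2) / (n * n - n * n / (k + 1))

def point_biserial(item, totals):
    """Point-biserial r between a 0/1 item and total score (item included in the total)."""
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    if not right or not wrong:
        raise ValueError("item must have both correct and incorrect responses")
    sd = pstdev(totals)                           # population SD of total scores
    if sd == 0:
        raise ValueError("total scores have zero variance")
    p = len(right) / len(item)                    # proportion answering correctly
    return (mean(right) - mean(wrong)) / sd * sqrt(p * (1 - p))

# Hypothetical responses: 5 students x 4 items
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in data]
print(fergusons_delta(totals, k=4))               # 1.0 (every possible score occurs once)
item1 = [row[0] for row in data]
print(round(point_biserial(item1, totals), 3))    # 0.707
```

Delta reaches 1.0 when every possible total score occurs equally often and falls to 0 when all students earn the same score.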
Recommendations for use
The FTCI has been used to assess students’ understanding of atomic emission in high school, advanced placement, and first year undergraduate chemistry courses. The developers of the concept inventory collected substantial evidence based on test content for the pilot version of the FTCI through cognitive interviews with students and item-level feedback from a panel of chemistry instructors. Future researchers using the FTCI may wish to collect additional evidence based on test content for the items that were added or modified between the pilot and final versions of the FTCI.
Regarding evidence based on response process, cognitive interviews have been conducted only with undergraduate chemistry students (first year and upper division students for the pilot version, first year students for the final version), while the instrument itself has been used to collect data from students enrolled in other chemistry courses. Collecting additional response process evidence from students in high school and/or advanced placement chemistry courses would help establish that students at these levels understand and interpret the FTCI items and distracters as intended.
Additionally, validity evidence based on relation to other variables has been assessed for the pilot version of the FTCI by comparing the performance of first year undergraduate chemistry students to that of upper division chemistry students. While this comparison provides some supportive evidence, future researchers interested in using the FTCI to measure student differences between course levels are encouraged to collect additional evidence in this area.
Details from panel review
During the development of the FTCI, evidence based on response process was collected through cognitive interviews with undergraduate first year and upper division chemistry students only. The collection of additional evidence related to response process, especially from students enrolled in high school or advanced placement chemistry courses, could be useful to researchers interested in using the FTCI to collect data from these populations.
For item difficulty, the data from high school chemistry students fell within the acceptable range of 0.3 to 0.8 for all but seven items (Items 3, 6, 8, 9, 11, 18, 19), the data from advanced placement students fell within this range for all but two items (Items 9, 10), and the data from undergraduate general chemistry students fell within this range for all but three items (Items 6, 9, 10). Turning to discrimination, the data from all three groups yielded acceptable values of Ferguson’s delta (> 0.9). Item-level discrimination was also investigated to assess how well each item distinguished the top 27% of students from the bottom 27% of students. For item-level discrimination, the data from high school chemistry students fell above the acceptable cutoff of 0.3 for all but six items (Items 3, 5, 6, 9, 10, 11), the data from advanced placement students fell above the cutoff for all but three items (Items 1, 9, 10), and the data from undergraduate general chemistry students fell above the cutoff for all but three items (Items 6, 10, 14). The developers of the FTCI noted that “the most difficult items also poorly discriminated, given that students in both the top and bottom 27% found them difficult” [1].
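The difficulty index for an item is the proportion of students who answer it correctly, and the upper-lower discrimination index is the proportion correct in the top 27% of students (by total score) minus the proportion correct in the bottom 27%. The Python sketch below applies the 0.3 to 0.8 difficulty range and the 0.3 discrimination cutoff described above to a hypothetical 10-student, 4-item data set. The function names, the rounding of the group size, the tie handling (Python's stable sort keeps the original order of tied students), and treating a value of exactly 0.3 as acceptable are all assumptions, not the developers' documented procedure.

```python
def item_difficulty(scores):
    """Proportion of students answering each item correctly (difficulty index p)."""
    n = len(scores)
    return [sum(row[j] for row in scores) / n for j in range(len(scores[0]))]

def discrimination_index(scores, fraction=0.27):
    """D = p(upper group) - p(lower group), groups = top/bottom `fraction` by total score."""
    ranked = sorted(scores, key=sum, reverse=True)   # highest totals first; ties keep input order
    g = max(1, round(fraction * len(scores)))        # group size (rounding rule is an assumption)
    upper, lower = ranked[:g], ranked[-g:]
    return [u - l for u, l in zip(item_difficulty(upper), item_difficulty(lower))]

def flag_items(scores, p_range=(0.3, 0.8), d_min=0.3):
    """1-based item numbers outside the difficulty range, and those below the D cutoff."""
    p = item_difficulty(scores)
    d = discrimination_index(scores)
    off_p = [j + 1 for j, v in enumerate(p) if not p_range[0] <= v <= p_range[1]]
    low_d = [j + 1 for j, v in enumerate(d) if v < d_min]
    return off_p, low_d

# Hypothetical responses: 10 students x 4 items (1 = correct, 0 = incorrect)
data = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(item_difficulty(data))  # [1.0, 0.5, 0.2, 0.4]
print(flag_items(data))       # ([1, 3], [1])
```

In this made-up data, Item 1 is answered correctly by everyone, so it is both outside the difficulty range and non-discriminating, the easy-item counterpart of the pattern the developers describe for the most difficult FTCI items.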
References
[1] Bretz, S.L. & Mayo, A.V.M. Development of the Flame Test Concept Inventory: Measuring Student Thinking about Atomic Emission. J. Chem. Educ. 2018, 95(1), 17-27.