Panel Review: Classroom Observation Protocol For Undergraduate STEM (COPUS)

(Post last updated 09 June 2023)

Review panel summary   
Classroom Observation Protocol for Undergraduate STEM (COPUS) is an observation protocol used to document and describe classroom practices in a non-evaluative manner. COPUS has been used in numerous environments varying in course level (middle school and high school [5] through lower- and upper-level college courses and graduate classes [6]) [1-10], discipline (both STEM [1-10] and non-STEM [9]), and class size (<25 to >200 students) [1-10]. When using COPUS, observers record instructor and student behaviors within 2-minute time periods. COPUS consists of 25 behavioral codes: 12 instructor behaviors and 13 student behaviors [1]. Example instructor codes include lecturing, asking and answering questions, guiding small-group discussions, and writing on the board; example student codes include working in small groups, asking and answering questions, and listening to the instructor [1]. To ease interpretation of the data, the 25 individual codes are most often reduced to a smaller set. Statistical approaches used for code reduction include cluster analysis [3,6,9] and item profiles [9]. A heuristic approach has also been used, collapsing the codes into 4 instructor-behavior categories and 4 student-behavior categories, although the methods by which the categories were collapsed are not described in the literature [3,4,5,8].
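
To make the 2-minute coding cycle and the code reduction concrete, the sketch below shows one way a single observation could be stored and collapsed. It is an illustration only: the code abbreviations follow COPUS, but the data layout and the collapsed-category mapping are hypothetical, not the published scheme.

```python
# A minimal sketch (not official COPUS tooling): one observation stored as a
# mapping from 2-minute interval index to the set of codes marked in that
# interval. The collapsed-category mapping below is hypothetical.
from collections import Counter

observation = {
    0: {"Lec", "RtW", "L"},  # lecturing, real-time writing; students listening
    1: {"Lec", "L"},
    2: {"CQ", "MG", "CG"},   # clicker question; instructor guiding; group discussion
    3: {"MG", "WG"},         # students working in groups
}

# Hypothetical collapse of individual codes into broader categories
COLLAPSE = {
    "Lec": "Instructor presenting", "RtW": "Instructor presenting",
    "CQ": "Instructor guiding", "MG": "Instructor guiding",
    "L": "Students receiving", "CG": "Students talking", "WG": "Students talking",
}

# Fraction of 2-minute intervals in which each collapsed category appears
# (categories are deduplicated within an interval before counting)
counts = Counter(
    category
    for codes in observation.values()
    for category in {COLLAPSE[c] for c in codes}
)
fractions = {category: n / len(observation) for category, n in counts.items()}
print(fractions)
```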

The length of training for COPUS observers differs across the literature, and no consensus has been reached on how much training is needed. However, trained observers have included people without experience in faculty development or education research (e.g., undergraduate students, graduate students, and faculty), as well as faculty developers and science-education specialists. The literature has also not reached consensus on the number of observations needed to characterize an instructor's teaching style, although it is agreed that one observation is not sufficient [6,10]. Lastly, observations using COPUS have been performed both in person (i.e., in real time) and via video recordings of courses [3,10].

Validity Evidence
COPUS was developed from the Teaching Dimensions Observation Protocol (TDOP) [13]. The initial development and collection of validity evidence for COPUS took place across two years and two institutions [1]. The development process included an iterative cycle of classroom observation, feedback from observers, and refinement of the COPUS codes. Evidence based on test content was collected by the COPUS developers through feedback from science-education specialists ("who are highly trained and experienced classroom observers" [1]) and interviews with K-12 teachers, all of whom had completed at least one observation using COPUS [1]. The developers used multiple observers (including science-education specialists, K-12 teachers, and STEM faculty) across multiple STEM departments, course sizes, and course levels. The final version of COPUS was used by 16 K-12 teachers to observe 23 undergraduate STEM courses across 7 departments (biology, molecular biology, engineering, chemistry, math, physics, and geology; 96% introductory, with 35% enrolling >100 students) at one institution, and by 7 faculty observers in 8 departments (including biology, chemistry, math, and physics; 100% introductory, with 63% enrolling >100 students) at a second institution.

Since its development, most of the validity evidence supporting COPUS has come in the form of relations to other variables. These encompass comparisons of COPUS results with the Teaching Practices Inventory (TPI) [2], student self-report surveys of engagement [9], the Learner-Centered Teaching Rubric (LCTR) [10], the Classroom Discourse Observation Protocol (CDOP) [8], biometric responses [7], and other observation protocols (see Table 1 below). COPUS results align with instruments that measure more holistic teaching methods (i.e., the LCTR and TPI), measurements of student engagement (i.e., biometric responses and a student self-report engagement survey), and classroom discourse (i.e., the CDOP). Additionally, COPUS codes overlap with the codes of numerous observation protocols (Table 1), showing alignment with the factors other science educators have used to measure teaching practices.

Table 1: Observation Protocols Compared with COPUS
Observation Protocol                                        Reference
Three Dimensional Learning Observation Protocol (3D-LOP)   [6], Supplemental Material (SM)
Behavioral Engagement Related to Instruction (BERI)        [6], SM
Teaching Dimensions Observation Protocol (TDOP)            [6], SM
Flanders Interaction Analysis (FIA)                        [6], SM
Observing Patterns of Adaptive Learning (OPAL)             [6], SM
Real-time Instructor Observation Tool (RIOT)               [6], SM
Reformed Teaching Observation Protocol (RTOP)              [3]
STROBE                                                     [6], SM
UTeach Observation Protocol (UTOP)                         [6], SM

Reliability Evidence

Inter-rater reliability (IRR) is the most commonly reported source of reliability evidence for COPUS data, and all studies referenced in this summary have reported acceptable IRR values. Cohen's kappa is the most widely used statistic [1-6,9], with others using percent agreement [1,9], Jaccard similarity scores [1,6], and Fleiss' kappa [8,10]. IRR has been computed for classroom observations across multiple course sizes and types, and the number of coders varies widely by study. IRR values for in-person observations [1,2,4,5,7-9,12] and video recordings [3,10] have been similar and acceptable.
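
As a concrete illustration of these statistics, the sketch below computes Cohen's kappa, percent agreement, and a Jaccard score for two observers. The data layout (one binary value per interval-code cell) and the toy marks are assumptions for the example; scikit-learn supplies the kappa computation.

```python
# Toy inter-rater reliability (IRR) computation for two COPUS observers.
# Each list holds one binary value per (interval, code) cell:
# 1 if the observer marked that code in that 2-minute interval, 0 otherwise.
from sklearn.metrics import cohen_kappa_score

obs_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
obs_b = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]

# Cohen's kappa: agreement corrected for chance
kappa = cohen_kappa_score(obs_a, obs_b)

# Percent agreement: share of cells where the observers agree
agreement = sum(a == b for a, b in zip(obs_a, obs_b)) / len(obs_a)

# Jaccard similarity: overlap of marked cells only (ignores joint absences)
both = sum(a and b for a, b in zip(obs_a, obs_b))
either = sum(a or b for a, b in zip(obs_a, obs_b))
jaccard = both / either

print(f"kappa={kappa:.2f}, agreement={agreement:.2f}, jaccard={jaccard:.2f}")
```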

Recommendations for use   
COPUS was developed to document teaching practices in STEM courses; it does not provide data about the quality of the behaviors observed or of the material covered in class. Since its first appearance in the biology-education literature, COPUS has been used for research in settings from secondary schools through universities, in STEM and non-STEM courses, and in courses of various sizes. The validity and reliability evidence reported in the literature supports using COPUS data to measure teaching practices in a wide variety of settings.

Different amounts of training have been reported when preparing observers to use COPUS. Unlike with some observation protocols, COPUS observers have included non-experts in education research and faculty development (e.g., undergraduates, K-12 teachers, and university faculty), as well as faculty developers and education researchers. For resources that could be useful for training COPUS observers, see the Trestle COPUS Observation Resources (https://trestlenetwork.ku.edu/copus-observation-resources/).

Studies have used different numbers of observations to determine an instructor's teaching style. While there is agreement that one observation is not enough, the required number has not been established in the literature [6,10]. A statistical study of the number of observations needed to draw valid inferences about an instructor's practices found that the number varied by course and instructor [14]. Hence, the recommended number of observations per instructor likely depends on the consistency and variety of the practices employed within a course.

Researchers have used both in-person observations and video recordings for COPUS. For in-person observations, single [5] and multiple [1-3,8] observers have been used for each classroom observation. When there are multiple observers, two approaches for handling code discrepancies have been reported: 1) only codes marked in a 2-minute period by both observers are retained, and any discrepancies are removed from the data [1-3], and 2) the observers meet immediately after each observation and resolve any code discrepancies through discussion [8]; the first approach is sketched below.
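
A minimal sketch of the first approach, assuming each observer's data is stored as a mapping from 2-minute interval to the set of marked codes (the data shapes and toy codes are assumptions):

```python
# Retain only codes marked by both observers in each 2-minute interval;
# codes marked by just one observer are dropped as discrepancies.
observer_1 = {0: {"Lec", "RtW"}, 1: {"Lec"}, 2: {"CQ", "MG"}}
observer_2 = {0: {"Lec"}, 1: {"Lec", "FUp"}, 2: {"CQ"}}

consensus = {
    interval: observer_1[interval] & observer_2[interval]  # set intersection
    for interval in observer_1
}
print(consensus)  # {0: {'Lec'}, 1: {'Lec'}, 2: {'CQ'}}
```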

The literature shows that COPUS can capture nuances in instructional practices across contexts, using both statistical and non-statistical methods. Because most studies reduce the number of COPUS codes when interpreting the data, researchers should think carefully about which reduction method to use for analysis. Researchers who have used COPUS have suggested that the protocol could be used for professional development and for providing instructor feedback in a non-judgmental manner, although no applications of such use appeared in the literature used for this summary.

Details from panel review   
COPUS has been used in many environments with no additional codes needed. For example, Stains et al. studied 709 university STEM courses and showed the distribution of three main teaching styles (didactic, interactive, and student-centered) over a wide range of class sizes, classroom layouts, course levels, and STEM disciplines (including chemistry courses) [6]. Denaro et al. observed 250 college classrooms, of which 42% were non-STEM courses, and analyzed the data via the COPUS Analyzer, k-means clustering, and the partitioning-around-medoids (PAM) method [9]. The authors found that the frequency of COPUS codes varied minimally between STEM and non-STEM courses (i.e., the frequency differed for only two codes: Student Individual Thinking/Problem Solving and Instructor Real-Time Writing on the Board). Akiha et al. observed 118 secondary-school classes and 364 university STEM courses and found a shift toward lecturing in the transition from middle-school classes to university courses [5].

Most studies analyzed the observation data using a reduced number of codes, which eases interpretation of teaching practices for research and for feedback to instructors. Some studies used a heuristic approach of collapsing the codes into 4 instructor-behavior categories and 4 student-behavior categories, although the methods by which the categories were collapsed were not described [2,4,5,8]. For example, Lewin et al. used COPUS to demonstrate the multiple ways instructors implement clickers, with implications for faculty developers, and found that implementing clickers does not automatically lead to more overall time being spent in active-learning, student-centered engagement [4]. Other studies used statistical approaches such as cluster analysis [3,6,9] and item profiles [9]. In addition, Stains et al. developed the COPUS Analyzer [6], which automatically categorizes observation data into instructional styles called COPUS profiles (http://www.copusprofiles.org/). While researchers have found the COPUS profiles useful, one study showed that some formative-assessment behaviors could not be differentiated between the COPUS profiles [12]. These results suggest that researchers should take caution when interpreting and drawing conclusions from COPUS profiles.
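
For readers curious how a cluster-based code reduction might look in practice, here is a minimal sketch of k-means applied to observation data, assuming each class session has already been reduced to a vector of code frequencies (the fraction of 2-minute intervals in which each code appears). The feature matrix and the choice of k are illustrative assumptions, not values from the studies cited above.

```python
# Cluster class sessions by their COPUS code-frequency profiles.
import numpy as np
from sklearn.cluster import KMeans

# Rows: observed class sessions; columns: per-code frequencies in [0, 1]
# (e.g., fraction of intervals containing Lec, CG, WG).
X = np.array([
    [0.95, 0.05, 0.00],  # mostly lecture
    [0.90, 0.10, 0.05],
    [0.40, 0.45, 0.50],  # mixed / interactive
    [0.35, 0.50, 0.55],
    [0.10, 0.80, 0.85],  # mostly group work
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # one cluster label per session, read as an instructional style
```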

References

[1] Smith, M.K., Jones, F.H.M., Gilbert, S.L., & Wieman, C. (2013). The Classroom Observation Protocol for Undergraduate STEM (COPUS): A New Instrument to Characterize University STEM Classroom Practices. CBE LSE. 12(4), 618-627.

[2] Smith, M.K., Vinson, E.L., Smith, J.A., Lewin, J.D., & Stetzer, M.R. (2014). A Campus-Wide Study of STEM Courses: New Perspectives on Teaching Practices and Perceptions. CBE LSE. 13(4), 624-635.

[3] Lund, T.J., Pilarz, M., Velasco, J.B., Chakraverty, D., Rosploch, K., Undersander, M., & Stains, M. (2015). The Best of Both Worlds: Building on the COPUS and RTOP Observation Protocols to Easily and Reliably Measure Various Levels of Reformed Instructional Practice. CBE LSE. 14(2), ar18, 1-12.

[4] Lewin, J.D., Vinson, E.L., Stetzer, M.R., & Smith, M.K. (2016). A Campus-Wide Investigation of Clicker Implementation: The Status of Peer Discussion in STEM Classes. CBE LSE. 15(1), ar6, 1-12.

[5] Akiha, K., Brigham, E., Couch, B.A., Lewin, J., Stains, M., Stetzer, M.R., Vinson, E.L., & Smith, M.K. (2018). What Types of Instructional Shifts Do Students Experience? Investigating Active Learning in Science, Technology, Engineering, and Math Classes across Key Transition Points from Middle School to the University Level. Frontiers in Education. 2(68), 1-18.

[6] Stains, M., Harshman, J., Barker, M. K., Chasteen, S. V., Cole, R., DeChenne-Peters, S. E., Eagan, M. K., Esson, J. M., Knight, J. K., Laski, F. A., Levis-Fitzgerald, M., Lee, C. J., Lo, S. M., McDonnell, L. M., McKay, T. A., Michelotti, N., Musgrove, A., Palmer, M. S., Plank, K. M., … Young, A. M. (2018). Anatomy of STEM teaching in North American universities. Science. 359(6383), 1468-1470.

[7] McNeal, K.S., Zhong, M., Soltis, N.A., Doukopoulos, L., Johnson, E.T., Courtney, S., Alwan, A., & Porch, M. (2020). Biosensors Show Promise as a Measure of Student Engagement in a Large Introductory Biology Course. CBE LSE. 19(4), ar50, 1-10.

[8] Alkhouri, J.S., Donham, C., Pusey, T.S., Signorini, A., Stivers, A.H., & Kranzfelder, P. (2021). Look Who's Talking: Teaching and Discourse Practices across Discipline, Position, Experience, and Class Size in STEM College Classrooms. BioScience. 71(10), 1063-1078.

[9] Denaro, K., Sato, B., Harlow, A., Aebersold, A., & Verma, M. (2021). Comparison of Cluster Analysis Methodologies for Characterization of Classroom Observation Protocol for Undergraduate STEM (COPUS) Data. CBE LSE. 20(1), ar3, 1-11.

[10] Shi, L., Popova, M., Erdmann, R.M., Pellegrini, A., Johnson, V., Le, B., Popple, T., Nelson, Z., Undersander, M.G., & Stains, M. (2023). Exploring the Complementarity of Measures of Instructional Practices. CBE LSE. 22(1), ar1, 1-9.

[11] Connell, G.L., Donovan, D.A., & Chambers, T.G. (2016). Increasing the Use of Student-Centered Pedagogies from Moderate to High Improves Student Learning and Attitudes about Biology. CBE LSE. 15(1), ar3, 1-15.

[12] McConnell, M., Boyer, J., Montplaisir, L.M., Arneson, J.B., Harding, R.L.S., Farlow, B., & Offerdahl, E.G. (2021). Interpret with Caution: COPUS Instructional Styles May Not Differ in Terms of Practices That Support Student Learning. CBE LSE. 20(2), ar26, 1-9.

[13] Hora, M.T., Oleson, A., Ferrare, J.J. (2013). Teaching Dimensions Observation Protocol (TDOP) User’s Manual, Madison: Wisconsin Center for Education Research, University of Wisconsin–Madison. http://tdop.wceruw.org/Document/TDOP-Users-Guide.pdf

[14] Sbeglia, G.C., Goodridge, J.A., Gordon, L.H., & Nehm, R.H. (2021). Are Faculty Changing? How Reform Frameworks, Sampling Intensities, and Instrument Measures Impact Inferences about Student-Centered Teaching Practices. CBE LSE. 20(3), ar39, 1-16.