
Classroom Observation Protocol For Undergraduate STEM


    Listed below is general information about the instrument.
    Original author(s)
    • Smith, M. K., Jones, F. H. M., Gilbert, S. L., & Wieman, C. E.

    Original publication
    • Smith, M. K., Jones, F. H., Gilbert, S. L., & Wieman, C. E. (2013). The Classroom Observation Protocol for Undergraduate STEM (COPUS): A new instrument to characterize university STEM classroom practices. CBE—Life Sciences Education, 12(4), 618-627.
    Year original instrument was published 2013
    Number of items 25
    Number of versions/translations 1
    Cited implementations 4
    Language(s)
    • English
    Country Canada, United States
    Instrument type
    • Observation Protocol
    Intended population(s)
    • Students
    • Undergraduate
    • Faculty
    • Tertiary
    • Unknown
    • Behavioral
    • Observation
    The CHIRAL team carefully combs through every reference that cites this instrument and pulls all evidence that relates to the instrument’s validity and reliability. These data are presented in the following table, which simply notes the presence or absence of evidence related to each concept but does not indicate the quality of that evidence. Similarly, if evidence is lacking, that does not necessarily mean the instrument is “less valid,” just that such evidence was not presented in the literature. Learn more about this process by viewing the CHIRAL Process, and consult the instrument’s Review (next tab), if available, for better insights into the usability of this instrument.

    Information in the table is given in four different categories:
    1. General - information about how each article used the instrument:
      • Original development paper - indicates the paper(s) in which the instrument was initially developed
      • Uses the instrument in data collection - indicates whether an article administered the instrument and collected responses
      • Modified version of existing instrument - indicates whether an article has modified a prior version of this instrument
      • Evaluation of existing instrument - indicates whether an article explicitly provides evidence that attempts to evaluate the performance of the instrument; lack of a checkmark here implies an article that administered the instrument but did not evaluate the instrument itself
    2. Reliability - information about the evidence presented to establish reliability of data generated by the instrument; please see the Glossary for term definitions
    3. Validity - information about the evidence presented to establish validity of data generated by the instrument; please see the Glossary for term definitions
    4. Other Information - information that may or may not directly relate to the evidence for validity and reliability but is commonly reported when evaluating instruments; please see the Glossary for term definitions
    Publications: 1 2 3 4

    General
    • Original development paper
    • Uses the instrument in data collection
    • Modified version of existing instrument
    • Evaluation of existing instrument

    Reliability
    • Test-retest reliability
    • Internal consistency
    • Coefficient (Cronbach's) alpha
    • McDonald's Omega
    • Inter-rater reliability
    • Person separation
    • Generalizability coefficients
    • Other reliability evidence

    Validity
    • Expert judgment
    • Response process
    • Factor analysis, IRT, Rasch analysis
    • Differential item functioning
    • Evidence based on relationships to other variables
    • Evidence based on consequences of testing
    • Other validity evidence

    Other information
    • Evidence based on fairness
    • Other general evidence
    DISCLAIMER: The evidence supporting the validity and reliability of the data summarized below is for use of this assessment instrument within the reported settings and populations. The continued collection and evaluation of validity and reliability evidence, in both similar and dissimilar contexts, is encouraged and will support the chemistry education community’s ongoing understanding of this instrument and its limitations.
    This review was generated by a CHIRAL review panel. Each CHIRAL review panel consists of multiple experts who first individually review the citations of the assessment instrument listed on this page for evidence in support of the validity and reliability of the data generated by the instrument. Panels then meet to discuss the evidence and summarize their opinions in the review posted in this tab. These reviews summarize only the evidence that was discussed during the panel, which may not represent all evidence available in the published literature or that which appears on the Evidence tab.
    If you feel that evidence is missing from this review, or that something was documented in error, please use the CHIRAL Feedback page.

    Panel Review: Classroom Observation Protocol For Undergraduate STEM (COPUS)

    (Post last updated 09 June 2023)

    Review panel summary   
    Classroom Observation Protocol for Undergraduate STEM (COPUS) is an observation protocol used to document and describe classroom practices in a non-evaluative manner. COPUS has been used in numerous environments varying in course level (middle school/high school [5] through college, including lower- and upper-level courses and graduate classes [6]) [1-10], discipline (both STEM [1-10] and non-STEM [9]), and class size (<25 to >200) [1-10]. When using COPUS, observers identify instructor and student behaviors within 2-minute time periods. COPUS consists of 25 behavioral codes, including 12 instructor behaviors and 13 student behaviors [1]. Example instructor codes include lecturing, asking and answering questions, guiding small-group discussions, and writing on the board; example student codes include working in small groups, asking and answering questions, and listening to the instructor [1]. For ease of interpretation, the 25 individual codes are most often reduced to a smaller set of categories. Statistical approaches used for code reduction have included cluster analysis [3,6,9] and item profiles [9]. Additionally, a heuristic approach that collapses codes into categories based on instructor behavior (4 categories) and student behavior (4 categories) has been used, although the methods by which the categories were collapsed are not described in the literature [3,4,5,8].
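    As a rough illustration of the interval-based coding described above, the sketch below tallies how often each code appears across 2-minute intervals and then collapses codes into broader categories. The interval data and the collapse mapping are hypothetical (the actual category mappings are not fully documented in the literature); the code abbreviations (Lec, RtW, PQ, AnQ, FUp, MG, 1o1) follow the COPUS scheme.

```python
from collections import Counter

# Hypothetical record of one class session: each entry is the set of
# instructor codes marked during one 2-minute interval. Abbreviations
# follow COPUS (Lec = lecturing, RtW = real-time writing, PQ = posing a
# question, AnQ = answering questions, FUp = follow-up, MG = moving and
# guiding group work, 1o1 = one-on-one discussion).
intervals = [
    {"Lec", "RtW"}, {"Lec"}, {"PQ", "AnQ"}, {"Lec"},
    {"MG"}, {"MG", "1o1"}, {"Lec", "RtW"}, {"FUp"},
]

# Fraction of intervals in which each code appears -- a common way to
# summarize a COPUS observation before reducing codes.
counts = Counter(code for interval in intervals for code in interval)
fractions = {code: n / len(intervals) for code, n in counts.items()}

# A purely illustrative collapse into two broader categories; the
# mappings used in published studies are not fully described.
COLLAPSE = {
    "Lec": "Presenting", "RtW": "Presenting",
    "PQ": "Guiding", "AnQ": "Guiding", "FUp": "Guiding",
    "MG": "Guiding", "1o1": "Guiding",
}
collapsed = Counter()
for interval in intervals:
    for category in {COLLAPSE[code] for code in interval}:
        collapsed[category] += 1
```

    An interval is counted once per collapsed category even if several of its codes map to the same category, which mirrors the presence/absence character of the original coding.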

    The reported length of training for COPUS observers differs across the literature, and no consensus has been reached on the length of training needed. However, trained observers have included people without experience in faculty development or education research (e.g., undergraduates, graduate students, and faculty), as well as faculty developers and science-education specialists. In addition, the literature has not come to a consensus on the number of observations needed to determine an instructor’s teaching style, although it is agreed that one observation is not sufficient [6,10]. Lastly, observations using COPUS have been performed both in person (i.e., in real time) and via video recordings of courses [3,10].

    Validity Evidence
    COPUS was developed from the Teaching Dimensions Observation Protocol (TDOP) [13]. The initial development and collection of validity evidence for COPUS took place across two years and two institutions [1]. The development process included an iterative cycle of classroom observation, feedback from observers, and refinement of COPUS codes. Evidence based on test content was collected by the COPUS developers through feedback from science-education specialists (“who are highly trained and experienced classroom observers” [1]) and interviews with K-12 teachers, all of whom had completed at least one observation using COPUS [1]. The developers used multiple observers (including science-education specialists, K-12 teachers, and STEM faculty) across multiple STEM departments, course sizes, and course levels. The final version of COPUS was used by 16 K-12 teachers to observe 23 undergraduate STEM courses across 7 departments (biology, molecular biology, engineering, chemistry, math, physics, and geology; 96% introductory, with 35% having >100 students) at one institution, and by 7 faculty observers in 8 departments (biology, chemistry, math, physics; 100% introductory, with 63% having >100 students) at a second institution.

    After its development, most of the validity evidence supporting COPUS comes in the form of relationships to other variables. These encompass comparisons of COPUS results with the Teaching Practices Inventory (TPI [2]), student self-report surveys (engagement [9]), the Learner-Centered Teaching Rubric (LCTR [10]), classroom discourse observation protocols (CDOP [8]), biometric responses [7], and other observation protocols (see Table 1 below). COPUS results align with instruments that measure more holistic teaching methods (i.e., LCTR and TPI), measurements of student engagement (i.e., biometric response and student self-report engagement survey), and classroom discourse observation protocols (i.e., CDOP). Additionally, COPUS codes overlap with the codes from numerous observation protocols (Table 1), showing alignment with factors for measuring teaching practices developed by other science educators.

    Table 1: Observation Protocols Compared with COPUS

    Observation Protocol | Reference
    Three Dimensional Learning Observation Protocol (3D-LOP) | [6], Supplemental Material (SM)
    Behavioral Engagement Related to Instruction (BERI) | [6], SM
    Teaching Dimensions Observation Protocol (TDOP) | [6], SM
    Flanders Interaction Analysis (FIA) | [6], SM
    Observing Patterns of Adaptive Learning (OPAL) | [6], SM
    Real-time Instructor Observation Tool (RIOT) | [6], SM
    Reformed Teaching Observation Protocol (RTOP) | [6], SM
    UTeach Observation Protocol (UTOP) | [6], SM

    Reliability Evidence

    Inter-rater reliability (IRR) is the most commonly reported source of reliability evidence for COPUS data. All studies referenced in this summary have reported acceptable IRR values. Cohen’s kappa was the most widely used statistic [1-6,9], with others using percent agreement [1,9], Jaccard similarity scores [1,6], and Fleiss’ kappa [8,10]. IRR has been assessed on classroom observations for multiple course sizes and types, and the number of coders used for IRR varies widely by study. IRR values for in-person observations [1,2,4,5,7-9,12] and video recordings [3,10] have been similar and acceptable.
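    The most common IRR statistics mentioned above are straightforward to compute. The sketch below, using made-up presence/absence marks for a single code over ten 2-minute intervals, shows percent agreement and Cohen’s kappa; it is an illustration of the statistics themselves, not code from any cited study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is chance agreement from marginal frequencies."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Hypothetical marks (1 = code present) for the "Lec" code in ten
# 2-minute intervals, as recorded by two observers.
obs_1 = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
obs_2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 1]

percent_agreement = sum(a == b for a, b in zip(obs_1, obs_2)) / len(obs_1)
kappa = cohens_kappa(obs_1, obs_2)  # about 0.78 for these marks
```

    Because kappa discounts chance agreement, it is lower than raw percent agreement (0.9 here); this gap widens when one code dominates the marks, which is why kappa-family statistics are usually preferred over percent agreement alone.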

    Recommendations for use   
    COPUS was developed to document teaching practices in STEM courses without evaluating the quality of the behaviors observed or the material covered in class. Since its first appearance in the biology education literature, COPUS has been used for research in settings from secondary schools through universities, in STEM and non-STEM courses, and in courses of various sizes. The validity and reliability evidence reported in the literature supports the use of COPUS data to measure teaching practices in a wide variety of settings.

    Different amounts of observer training have been reported for COPUS. Unlike some observation protocols that require expert observers, COPUS has been used by non-experts in education research and faculty development (e.g., undergraduates, K-12 teachers, and university faculty), as well as by faculty-development and education-research experts. For resources that could be useful for training COPUS observers, see the Trestle COPUS Observation Resources.

    Studies have used different numbers of observations to determine an instructor’s teaching style. While there is agreement that one observation is not enough, the actual number needed has not yet been established in the literature [6,10]. A statistical study on the number of observations needed for valid inferences about an instructor’s practices showed that the number varies by course and instructor [14]. Hence, the recommended number of observations per instructor likely depends on the consistency and variety of practices employed within a course.

    Researchers have used both in-person observations and video recordings with COPUS. For in-person observations, both single [5] and multiple observers [1-3,8] have been used for each classroom observation. When there are multiple observers, two approaches for handling code discrepancies have been reported: 1) only codes marked in a given 2-minute period by both observers are retained, and any discrepancies are removed from the data [1-3]; and 2) the observers meet immediately after each observation and resolve any code discrepancies through discussion [8].
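    Approach (1) above amounts to intersecting the two observers’ code sets in each interval. A minimal sketch, with hypothetical interval data and COPUS code abbreviations:

```python
def reconcile_by_intersection(obs_a, obs_b):
    """Keep only codes marked by both observers in the same 2-minute
    interval; discrepant codes are dropped (approach 1)."""
    return [a & b for a, b in zip(obs_a, obs_b)]

# Hypothetical codes from two observers over three 2-minute intervals
# (Lec = lecturing, RtW = real-time writing, FUp = follow-up,
# CQ = clicker question, PQ = posing a question).
observer_a = [{"Lec", "RtW"}, {"FUp"}, {"Lec", "PQ"}]
observer_b = [{"Lec"}, {"FUp", "CQ"}, {"Lec", "PQ"}]

retained = reconcile_by_intersection(observer_a, observer_b)
# retained == [{"Lec"}, {"FUp"}, {"Lec", "PQ"}]
```

    This approach trades data loss (dropped discrepant codes such as RtW and CQ above) for conservatism, whereas approach (2) preserves more data at the cost of a post-observation discussion step.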

    COPUS has been shown in the literature to capture nuances in instructional practices across contexts, using both statistical and non-statistical methods. Because most studies reduce the number of COPUS codes when interpreting the data, researchers should think carefully about which methods to use when reducing the codes for analysis. Researchers who have used COPUS have suggested that the protocol could be used for professional development and for providing instructor feedback in a non-judgmental manner, although no applications of such use appeared in the literature used for this summary.

    Details from panel review   
    COPUS has been used in many environments with no additional codes needed. For example, Stains et al. studied 709 university STEM courses, showing the distributions of three main teaching styles (didactic, interactive, and student-centered) over a wide range of class sizes, classroom layouts, course levels, and STEM disciplines (including chemistry courses) [6]. Denaro et al. observed 250 college classrooms, of which 42% were non-STEM courses, and analyzed the data via the COPUS Analyzer, k-means clustering, and the partitioning around medoids (PAM) method [9]. The authors found that the frequency of COPUS codes varied minimally between STEM and non-STEM courses (i.e., the frequency differed for only two codes: Student Individual Thinking/Problem Solving and Instructor Real-Time Writing on the Board). Akiha et al. observed 118 secondary school classes and 364 university STEM courses, finding a shift toward lecturing when moving from middle school classes to university courses [5].

    Most studies analyzed the observation data using a reduced number of codes, which eases interpretation of teaching practices both for research and for feedback to instructors. Some studies used a heuristic approach of collapsed categories based on instructor behavior (4 categories) and student behavior (4 categories), although details describing the methods by which the categories were collapsed were not described [2,4,5,8]. For example, Lewin et al. used COPUS to demonstrate the multiple ways instructors implement clickers, with implications for faculty developers, and also found that implementing clickers does not automatically lead to more overall time being spent in active-learning, student-centered engagement [4]. Other studies used statistical approaches such as cluster analyses [3,6,9] and item profiles [9]. In addition, Stains et al. developed the COPUS Analyzer [6], which automatically categorizes observation data into instructional styles called COPUS profiles. While researchers have found the COPUS profiles useful, one study showed that some formative-assessment behaviors could not be differentiated between the COPUS profiles [12]. These results suggest that researchers should take caution when interpreting and drawing conclusions from COPUS profiles.


    [1] Smith, M.K., Jones, F.H.M., Gilbert, S.L., & Wieman, C. (2013). The Classroom Observation Protocol for Undergraduate STEM (COPUS): A New Instrument to Characterize University STEM Classroom Practices. CBE LSE. 12(4), 618-627.

    [2] Smith, M.K., Vinson, E.L., Smith, J.A., Lewin, J.D., & Stetzer, M.R. (2014). A Campus-Wide Study of STEM Courses: New Perspectives on Teaching Practices and Perceptions. CBE LSE. 13(4), 624-635.

    [3] Lund, T.J., Pilarz, M., Velasco, J.B., Chakraverty, D., Rosphloch, K., Undersander, M., & Stains, M. (2015). The Best of Both Worlds: Building on the COPUS and RTOP Observation Protocols to Easily and Reliably Measure Various Levels of Reformed Instructional Practice. CBE LSE. 14(2), 14:ar1-12.

    [4] Lewin, J.D., Vinson, E.L., Stetzer, M.R., & Smith, M.K. (2016). A Campus-Wide Investigation of Clicker Implementation: The Status of Peer Discussion in STEM Classes. CBE LSE. 15(1), 15:ar1-12.

    [5] Akiha, K., Brigham, E., Couch, B.A., Lewin, J., Stains, M., Stetzer, M.R., Vinson, E.L. & Smith, M. (2018). What Types of Instructional Shifts Do Students Experience? Investigating Active Learning in Science, Technology, Engineering, and Math Classes across Key Transition Points from Middle School to the University Level. Frontiers in Education. 2(68), 1-18.

    [6] Stains, M., Harshman, J., Barker, M. K., Chasteen, S. V., Cole, R., DeChenne-Peters, S. E., Eagan, M. K., Esson, J. M., Knight, J. K., Laski, F. A., Levis-Fitzgerald, M., Lee, C. J., Lo, S. M., McDonnell, L. M., McKay, T. A., Michelotti, N., Musgrove, A., Palmer, M. S., Plank, K. M., … Young, A. M. (2018). Anatomy of STEM teaching in North American universities. Science. 359(6383), 1468-1470.

    [7] McNeal, K.S., Zhong, M., Soltis, N.A., Doukopoulos, L., Johnson, E.T., Courtney, S., Alwan, A., & Porch, M. (2020). Biosensors Show Promise as a Measure of Student Engagement in a Large Introductory Biology Course. CBE LSE. 19(4), 19:ar50, 1-10.

    [8] Alkhouri, J.S., Donham, C., Pusey, T.S., Signorini, A., Stivers, A.H., & Kranzfelder, P. (2021). Look Who's Talking: Teaching and Discourse Practices across Discipline, Position, Experience, and Class Size in STEM College Classrooms. BioScience. 71(10), 1063-1078.

    [9] Denaro, K., Sato, B., Harlow, A., Aebersold, A., & Verma, M. (2021). Comparison of Cluster Analysis Methodologies for Characterization of Classroom Observation Protocol for Undergraduate STEM (COPUS) Data. CBE LSE. 20(1), 20:ar3, 1-11.

    [10] Shi, L., Popova, M., Erdmann, R.M., Pellegrini, A., Johnson, V., Le, B., Popple, T., Nelson, Z., Undersander, M.G., Stains, M. (2023). Exploring the Complementarity of Measures of Instructional Practices. CBE LSE. 22(1), 22:ar1, 1-9.

    [11] Connell, G.L., Donovan, D.A., Chambers, T.G. (2016). Increasing the Use of Student-Centered Pedagogies from Moderate to High Improves Student Learning and Attitudes about Biology. CBE LSE. 15(1), 15:ar3, 1-15.

    [12] McConnell, M., Boyer, J., Montplaisir, L.M., Arneson, J.B., Harding, R.L.S., Farlow, B., & Offerdahl, E.G. (2021). Interpret with Caution: COPUS Instructional Styles May Not Differ in Terms of Practices That Support Student Learning. CBE LSE. 20(2), 20:ar26, 1-9.

    [13] Hora, M.T., Oleson, A., Ferrare, J.J. (2013). Teaching Dimensions Observation Protocol (TDOP) User’s Manual, Madison: Wisconsin Center for Education Research, University of Wisconsin–Madison.

    [14] Sbeglia, G.C., Goodridge, J.A., Gordon, L.H., & Nehm. R.H. (2021). Are Faculty Changing? How Reform Frameworks, Sampling Intensities, and Instrument Measures Impact Inferences about Student-Centered Teaching Practices. CBE LSE. 20(3), 20:ar39, 1-16.

    Listed below are all versions and modifications that were based on this instrument or that this instrument was based on.
    Instrument is derived from:
    Name | Authors
    • Teaching Dimensions Observation Protocol (TDOP) | Hora, M. T., Oleson, A., & Ferrare, J. J.

    Listed below is all literature that develops, implements, modifies, or references the instrument.
    1. Smith, M. K., Jones, F. H., Gilbert, S. L., & Wieman, C. E. (2013). The Classroom Observation Protocol for Undergraduate STEM (COPUS): A new instrument to characterize university STEM classroom practices. CBE—Life Sciences Education, 12(4), 618-627.

    2. Reisner, B. A., Pate, C. L., Kinkaid, M. M., Paunovic, D. M., Pratt, J. M., Stewart, J. L., ... & Smith, S. R. (2020). I’ve Been Given COPUS (Classroom Observation Protocol for Undergraduate STEM) Data on My Chemistry Class... Now What? Journal of Chemical Education, 97(4), 1181-1189.

    3. Pelletreau, K. N., Knight, J. K., Lemons, P. P., McCourt, J. S., Merrill, J. E., Nehm, R. H., ... & Smith, M. K. (2018). A faculty professional development model that improves student learning, encourages active-learning instructional practices, and works for faculty at multiple institutions. CBE—Life Sciences Education, 17(2), es5.

    4. Flynn, A. B. (2017). Flipped chemistry courses: Structure, aligning learning outcomes, and evaluation. In Online approaches to chemical education (pp. 151-164). American Chemical Society.