Presenting a Validated Mid-Semester Evaluation of College Teaching to Improve Online Teaching

Mid-semester formative evaluations of college teaching are a promising, low-cost way to provide online instructors with in-the-moment feedback to improve their online teaching practices. However, existing instruments suffer from issues of validity and bias and fail to align with evidence-based strategies. This paper presents and psychometrically validates a research-based Mid-Semester Evaluation of College Teaching for Online Instructors (MSECT-O) among 170 undergraduate students in seven online courses. Pilot study results demonstrated that the MSECT-O is a valid and reliable tool for online educators seeking to improve their online teaching and classroom climate.

because of instructor identity, the data are too general to inform faculty, and items are not developed using evidence-based psychometrics (e.g., Hammonds, Mariano, Ammons, & Chambers, 2017).
Most institutions develop their own SET instrument, but these measures are limited by their lack of validation or alignment with the education literature and, as with all surveys, are only as strong as the literature used to create them (Hammonds et al., 2017). Most traditional student evaluations fail to reliably measure the multidimensionality of teaching (Knol, Dolan, Mellenbergh, & van der Maas, 2016) and instead load onto only one dimension of teaching (e.g., student satisfaction; Marsh, 1987). Teaching effectiveness, however, is more validly and accurately measured as a multidimensional construct rather than as satisfaction alone (Donlan & Byrne, 2020; Ambrose, Bridges, DiPietro, Lovett, & Norman, 2010). For example, Bangert (2008) posits that online teaching effectiveness consists of multiple factors, including active learning.
End-of-semester SETs are also the primary method used for gathering institution-wide student evaluations of online teaching (Byrne, 2018). SETs for online courses are often developed by the institution and include a mix of Likert-scale and open-ended core questions used for all in-person and online classes (Hammonds et al., 2017). However, just like evaluations of face-to-face courses, most evaluations of online courses were not developed using evidence-based psychometrics (Hammonds et al., 2017). Exceptions include the costly eSIR, a questionnaire developed by Educational Testing Service (Liu, 2012), which the company recently discontinued. Additionally, Gómez-Rey, Barbera, and Fernández-Navarro (2016) adapted the Online Learning Consortium scorecard into a student-facing evaluation; however, at 36 items, its length may hinder students' completion of the survey. Validated, evidence-based surveys enable institutions to ask students meaningful questions about the multidimensional factors of effective online teaching and to focus on constructs meaningful to online teachers. Unfortunately, summative end-of-semester feedback cannot be implemented until the following semester, so it does not benefit the students who complete the survey.

Mid-Semester Student Evaluations of Teaching
Formative feedback is often used to improve teaching (Overall & Marsh, 1979) and to identify issues mid-semester, before end-of-semester evaluations are collected. Mid-semester evaluations (MSEs) allow instructors to collect actionable, timely feedback from students in time to implement changes in the current semester (Berridge, Penney, & Wells, 2012; Costello et al., 2002). By collecting feedback mid-semester, instructors have time to respond to it and make changes that could improve their end-of-semester evaluations, a critical measure for promotion and tenure. Thus, instructors have both intrinsic and extrinsic motivations to improve their teaching and the student experience. A formative feedback process has been found to inform the use of better teaching practices (e.g., Hampton & Reiser, 2004) and can contribute to improved student satisfaction with the course (Costello et al., 2002; Overall & Marsh, 1979). MSEs have been found to be particularly beneficial for new instructors (Hampton & Reiser, 2004).

Bias in Student Evaluations
Existing student evaluations of teaching have been found to be biased measures of instruction that depend on student-level variables (Spooren, Vandermoere, Vanderstraeten, & Pepermans, 2017). For example, students tend to provide more positive evaluations when they expect to earn a higher grade or when the course is an elective rather than a required course (Ting, 2000). Additionally, older students (e.g., juniors and seniors) are more likely than younger students to provide higher evaluations (Spooren, 2010). Despite this bias, Thomas, Graham, and Piña (2018) note that student evaluations of teaching "capture a critical perspective on online instructor behaviors that may be missed otherwise" (p. 6). Thus, an evidence-based formative feedback tool is needed to provide new online instructors with less biased student feedback on their online teaching.

Mid-Semester Evaluations in Online Teaching
Online instructors gather MSE data in different ways. The approach most discussed in the literature is an instructor-made online survey that collects anonymous open-ended feedback from students (Peterson, 2016). Although students appreciate these opportunities to provide feedback, they perceive repeatedly generating and writing out ideas for course improvements as tedious (Winchester & Winchester, 2012). Additionally, instructors often adapt in-person formative feedback processes to the online environment; for example, Berridge et al. (2012) and O'Neil-Hixon, Long, and Block (2017) both adapted the Small Group Instructional Diagnosis (SGID; Coffman, 1998) process for online courses. During an online SGID, the instructor identifies a colleague or faculty developer who repeatedly communicates with students via email to collect their thoughts on the teacher's effectiveness using open-ended prompts such as "What suggestions do you have for this class?" The responses are then synthesized into a report and recommendations for the instructor. This process is both labor-intensive and unfocused, with students providing feedback on topics unrelated to online teaching effectiveness, such as technical issues or their desire for face-to-face time with the instructor (Berridge et al., 2012). Considering these limitations, researchers such as Thomas et al. (2018) and Walker (2005) suggest the use of formative quantitative evaluations with constructs specific to aspects of the online course that the teacher can control. The field, however, lacks a literature-based instrument that gathers valid and reliable mid-semester feedback from online students using questions they can answer and that gives instructors useful feedback on the multiple dimensions of online teaching.

Fearless Teaching Framework
This study is part of a larger research project on mid-semester evaluations based on the University of Maryland's Fearless Teaching Framework (Donlan, Loughlin, & Byrne, 2019; Donlan & Byrne, 2020), a literature-based model for effective college teaching organized into four dimensions: classroom climate, course content, teaching practices, and learning assessments. These four dimensions emerged from education theory and empirical literature as strong predictors of student engagement, motivation, and success (e.g., Lave & Wenger, 1991; Wigfield & Eccles, 2000) and are aspects of the classroom that the instructor can control (as opposed to classroom design, student demographics, etc.).
The Fearless Teaching Framework posits that in courses with a positive climate, the instructor supports students' learning and designs the curriculum to be challenging, yet accessible, to all students (Morin, Marsh, Nagengast, & Scalas, 2014). Effective course content is meaningfully relevant to students' personal and professional lives and interests, developmentally appropriate for their existing knowledge level, and aligned with learning objectives (Howard, 2001; Lave & Wenger, 1991). Positive teaching practices are intentionally organized, aligned with active learning research, and connected with prior knowledge (Alexander & Winne, 2006; Wentzel & Brophy, 2014). Effective instructors communicate high and clear expectations for student work and engagement (Online Learning Consortium, 2019). Assessments are most effective when they are aligned with learning objectives, communicated clearly (e.g., via a rubric), and do not contribute to unnecessary student stress (Wass, Timmermans, Harland, & McLean, 2018).
Building on the extensive literature review and expert validation that informed the Framework, the SET and MSE literatures were reviewed to develop the Mid-Semester Evaluation for College Teaching (MSECT; Donlan & Byrne, 2020). The original MSECT was designed to capture students' evaluations of effective face-to-face teaching and provide instructors with reliable and actionable feedback to improve their teaching (Donlan & Byrne, 2020). In partnership with education and faculty development experts, the original 13-item MSECT was designed to be applicable to all instructors from all departments who teach all types of courses. When piloted among 29 instructors and 1,350 undergraduate students, the MSECT items loaded convincingly onto the four latent factors of the Fearless Teaching Framework (every item loaded onto one of the four factors at or above the .40 threshold). That study provided sufficient evidence that the MSECT is a valid formative evaluation instrument for in-person teaching (Donlan & Byrne, 2020).
However, a review of the literature on student evaluations of online teaching suggests that MSECT items developed and validated for face-to-face courses may not provide sufficient feedback for online instructors, which warrants an online-specific MSECT instrument (Byrne, 2018). The purpose of this paper is to present the design and pilot validation of the MSECT for Online Instructors (MSECT-O). Additionally, this paper explores the extent to which MSECT-O feedback is susceptible to documented patterns of student bias in teaching evaluations.

Research Questions
1. In alignment with the Fearless Teaching Framework, to what extent do latent climate, content, practice, and assessment factors fit the underlying structure of the data?
2. To what extent does the MSECT-O data differ by student-reported variables?
3. To what extent does the MSECT-O data differ by course-level variables?

Instrument Development
Through a three-step process, the original MSECT instrument was amended to gather feedback about online teaching. First, the authors reviewed the original items, identified two that were too specific to a face-to-face classroom, and rewrote them to apply to an online course. Second, experts in college teaching and online teaching reviewed the new MSECT-O instrument and provided feedback on phrasing and relevance. After incorporating this feedback, a pilot 13-item MSECT-O instrument was developed. Third, the MSECT-O was administered to the students of seven online instructors to collect data to confirm the factor structure and validity and, potentially, to reduce the number of items (Table 3 presents the items in the reduced 12-item MSECT-O).
The MSECT-O items used a 6-point Likert scale (1 = "Strongly Disagree" to 6 = "Strongly Agree") and were based on the four dimensions of the Fearless Teaching Framework (Donlan, Loughlin, & Byrne, 2019): climate, content, practices, and assessments. The initial MSECT-O consisted of 13 items (reduced to 12 items in the analyses presented below) that load onto four factors:
1. Climate is a three-item measure of the extent to which the instructor fosters a classroom environment that is inclusive and positive (α = .85). Items include "My instructor creates an inclusive learning environment where everyone is welcome."
2. Content is a three-item measure of the extent to which the instructor conveys the relevance and connection of the course content (α = .74). Items include "I have the prior knowledge necessary to be successful in this course."
3. Practice is a three-item measure of the extent to which the instructor enacts active and engaging activities to teach the content (α = .86). Items include "During online classes, this course includes activities other than watching recorded lectures."
4. Assessment is a three-item measure of the extent to which the instructor designs graded assignments that are fair and aligned with the learning objectives (α = .84). Items include "My instructor provides me with timely feedback on my work."
The full instrument can be found in Table 3.

Sample
In fall 2018, seven online instructors at a large research university in the mid-Atlantic agreed to participate in the study as unpaid volunteers. Instructors were interested in getting student feedback for their professional development. The participating students were enrolled in one of the seven online courses taught by a participating instructor: an introductory-level journalism course, a lower-level agriculture course, two upper-level humanities and social sciences courses, and three upper-level STEM courses.
A total of 170 undergraduate students completed a mid-semester evaluation of teaching about their online instructor. Of these students, 13 (7.65%) identified as first-year students, 22 (12.94%) as sophomores, 43 (25.29%) as juniors, and 91 (53.53%) as seniors; 41.18% identified as men and 55.88% as women (see demographic and enrollment information in Table 1). This sample is demographically representative of the University of Maryland's student body. The University's Institutional Review Board approved this study.

Data Collection
In fall 2018, online instructors from all departments, ranks, and course types (including tenured faculty, tenure-track faculty, professional-track faculty, and graduate instructors) were recruited to participate in this study via email. Seven online instructors agreed to participate. Prior to agreeing to participate, instructors were informed of the Fearless Teaching Framework and the MSECT-O instrument, and that their participation would not impact their employment. Midway through the semester, participating instructors were provided with a link to the MSECT-O instrument to distribute to students. Instructors were asked to collect data after students had completed and received feedback on one major assignment (e.g., a midterm exam or project).
Students of the seven participating online instructors completed an online survey that included the 13 original MSECT-O items and questions about their course and demographic information (e.g., year in school, expected course grade, and the degree to which the course is a degree program requirement). Instructors distributed the online survey via email or through the learning management system's messaging platform on or around the eighth week of the 16-week semester and gave students three business days to complete the instrument. The survey was designed to be mobile- and tablet-compatible so that students could complete the evaluation even when away from a laptop or desktop computer.
Before completing the survey, students were informed that their participation would not impact their employment with the institution, nor would it impact their grade. The Informed Consent form stated that the evaluation results would be shared only with the research team, who would aggregate and anonymize the results before sharing them with the instructors. The results would be used to study the instrument itself and provide instructors with formative feedback.

Data Analysis
Because preliminary normality testing (Shapiro & Wilk, 1965) provided evidence that the sampling distribution for the MSECT-O items was statistically different from a normal distribution (p < .001), robust nonparametric versions of traditional statistical methods in SPSS 24 and a robust maximum likelihood estimator (MLR) in MPlus 8.0 were adopted. To address the first research question, confirmatory factor analyses (CFAs) were conducted in MPlus to determine the fit of the four-factor solution, that is, the extent to which the data aligned with the theoretically expected model based on the Fearless Teaching Framework. Because the MSECT-O items were developed to reflect the four dimensions of the Fearless Teaching Framework (Donlan, Loughlin, & Byrne, 2019; Kline, 2011), a confirmatory rather than exploratory factor analysis was appropriate: the goal was to confirm whether the hypothesized factor structure fit the data, not to explore alternative factor structures. The CFA compared a four-factor model to a one-factor model to assess the extent to which this short scale measures four subscales, as opposed to one omnibus latent construct. Output included loadings, factor covariance statistics, and multiple fit indices used to judge model fit: a non-statistically significant chi-square test statistic, a comparative fit index (CFI) and Tucker-Lewis index (TLI) greater than .95, a standardized root mean square residual (SRMR) lower than .08, and a root mean square error of approximation (RMSEA) lower than .06 (Hu & Bentler, 1999; Thompson, 2004). The model with lower Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values was considered the better-fitting model. Factor replicability was assessed with coefficient H, with values of .80 or above indicating a well-defined, replicable factor (Hammer, 2016; Hancock & Mueller, 2001), and reliability was assessed using Cronbach's alpha (Raykov & Marcoulides, 2011).
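The two statistics named at the end of this paragraph are simple functions of the item data and the CFA output. The sketch below is a minimal Python illustration, not the authors' SPSS/MPlus procedure: it assumes one subscale's item scores are stored as an (n respondents × k items) NumPy array and that one factor's standardized loadings have been copied from the CFA output. The function names `cronbach_alpha` and `coefficient_h` are hypothetical, and the demo values are made up.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for one subscale.

    items: array of shape (n_respondents, k_items), e.g. 1-6 Likert scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of summed scores)
    """
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)        # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)    # sample variance of the summed scale score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def coefficient_h(std_loadings) -> float:
    """Coefficient H (construct replicability) for one factor.

    std_loadings: standardized loadings of that factor's items (each |l| < 1).
    H = 1 / (1 + 1 / sum(l^2 / (1 - l^2)))
    """
    l2 = np.asarray(std_loadings, dtype=float) ** 2
    return 1.0 / (1.0 + 1.0 / np.sum(l2 / (1.0 - l2)))


if __name__ == "__main__":
    # Hypothetical responses from four students to a two-item subscale.
    demo_items = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])
    print(round(cronbach_alpha(demo_items), 3))        # 0.75
    # Hypothetical standardized loadings for a three-item factor.
    print(round(coefficient_h([0.8, 0.8, 0.8]), 3))    # 0.842
```

Alpha treats all items as equally weighted, whereas H weights each item by its loading, so the two can diverge when a factor's loadings differ noticeably.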
To answer the remaining questions, robust nonparametric methods were adopted: Spearman rank-order correlations (the nonparametric counterpart of the Pearson correlation), the Kruskal-Wallis test (the nonparametric counterpart of the one-way ANOVA), and the Mann-Whitney U test (the nonparametric counterpart of the independent-samples t-test; Byrne, 2017). Effect sizes for the significant Mann-Whitney U tests were calculated by hand to interpret the percentage of variance in the dependent variable explained by the independent variable. These analyses assessed the extent to which students' responses differed by students' demographic variables, which may indicate some bias in the MSECT-O scales.
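A by-hand Mann-Whitney effect size is commonly computed as r = |Z| / √N, where Z is the normal approximation of the U statistic and N is the total sample size; r² is then read as the approximate proportion of variance explained. The paper does not give its exact formula, so the sketch below assumes this common approach, adds a tie correction (ties are frequent with 6-point Likert scores), and uses SciPy's `mannwhitneyu` only to obtain U. The function name `mann_whitney_r` and the example scores are hypothetical.

```python
import math

import numpy as np
from scipy.stats import mannwhitneyu


def mann_whitney_r(group_a, group_b):
    """Return (U, Z, r, r_squared) for two independent groups of scores.

    Z uses the normal approximation with a correction for tied ranks;
    r = |Z| / sqrt(N), and r_squared approximates the share of variance explained.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n1, n2 = a.size, b.size
    n = n1 + n2
    # Recent SciPy versions report U for the first sample; the sign of Z depends
    # on which U is reported, but r uses |Z| and is unaffected.
    u = float(mannwhitneyu(a, b, alternative="two-sided").statistic)

    # Tie correction: sum of (t^3 - t) over each group of tied values in the pooled data.
    _, counts = np.unique(np.concatenate([a, b]), return_counts=True)
    tie_term = float(np.sum(counts.astype(float) ** 3 - counts)) / (n * (n - 1))
    sigma_u = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term))

    z = (u - n1 * n2 / 2.0) / sigma_u
    r = abs(z) / math.sqrt(n)
    return u, z, r, r ** 2


if __name__ == "__main__":
    # Hypothetical mean climate scores for students in required vs. elective courses.
    required = [5.3, 5.7, 6.0, 5.0, 5.7, 4.7]
    elective = [4.3, 5.0, 4.7, 5.3, 4.0, 4.3]
    u, z, r, r2 = mann_whitney_r(required, elective)
    print(f"U = {u:.1f}, Z = {z:.2f}, r = {r:.2f}, r^2 = {r2:.2f}")
```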

Research Question 1. In alignment with the theoretical frame, to what extent do latent climate, content, practice, and assessment factors fit the underlying structure of the data?
Two CFAs were conducted: one with one factor and another with four factors based on the Fearless Teaching Framework (see Table 2). During analyses, the modification indices indicated that allowing one item (Content3: "The instructor communicated clear learning outcomes for this course") to cross-load onto the climate and assessment factors, in addition to the content factor, would improve model fit. To address this cross-loading, Content3 was removed from the final analysis. We recommend that Content3 not be used in future assessments of online instruction because it does not provide unique information to the instructor within the Fearless Teaching Framework. The one-factor model had poor fit, whereas the reduced four-factor model had good fit (see Tables 2 and 3). To statistically compare the reduced one- and four-factor models and determine which fit best, a χ² difference test, which compares the fit of nested latent models, was conducted (Kline, 2011). Table 4 presents the correlations and descriptive statistics for the individual items.
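With MLR estimation, simply subtracting the two models' chi-square values does not yield a chi-square-distributed statistic; Mplus documentation instead describes a Satorra-Bentler scaled difference that uses each model's scaling correction factor. The paper does not state which variant was used, so the sketch below is an assumption-labeled illustration of the scaled test, with model 0 as the nested (more restricted) one-factor model and model 1 as the four-factor model. The function name is hypothetical; its inputs would be the chi-square value, degrees of freedom, and scaling correction factor reported for each model.

```python
from scipy.stats import chi2


def scaled_chi2_difference(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler scaled chi-square difference test for MLR estimates.

    Model 0 is the nested (more restricted) model, e.g. the one-factor model;
    model 1 is the comparison model, e.g. the four-factor model.
    t = MLR (scaled) chi-square, df = degrees of freedom,
    c = scaling correction factor reported for that model.
    Returns (TRd, difference in df, p-value).
    """
    if df0 <= df1:
        raise ValueError("model 0 must be the nested model with more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)
    if cd <= 0:
        raise ValueError("non-positive cd; the scaled difference test is not usable here")
    trd = (t0 * c0 - t1 * c1) / cd
    ddf = df0 - df1
    return trd, ddf, float(chi2.sf(trd, ddf))
```

Assuming simple structure and no residual covariances, the reduced 12-item instrument has 78 unique variances and covariances; the one-factor model frees 24 parameters (df = 54) and the four-factor model frees 30, including six factor covariances (df = 48), so the difference test would have 6 degrees of freedom.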

H = Factor replicability scores
Note. The item Content3: "The instructor communicated clear learning outcomes for this course" was removed from the final instrument. It should not be used to assess online instruction.

Research Question 2. To what extent does the MSECT-O data differ by student-reported variables?
Students' year in school was not significantly correlated with any of the MSECT-O constructs (see Table 5), meaning that students provided comparable feedback regardless of their year in college. In contrast, students' expected grade had a moderate, positive, statistically significant correlation with all four constructs, indicating that feedback became more positive as expected grade increased. As presented in Table 6, the extent to which students identified the course as a requirement for their major or minor was not correlated with the practice and assessment constructs. Identifying the course as a degree requirement did, however, have a weak, positive correlation with the climate and content constructs; in other words, students provided more positive climate and content feedback if they perceived the course to be a requirement.

Research Question 3. To what extent does the MSECT-O data differ by course-level variables?
Finally, a Kruskal-Wallis H test found no significant difference in MSECT-O responses (across all four factors) between the different course disciplines, p > .05. This suggests that, for example, students provided comparable feedback to STEM and to journalism instructors.
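The discipline comparison can be sketched with SciPy's `kruskal`, which applies a tie correction to H. As an addition not reported in the paper, the sketch also computes epsilon-squared, ε² = H / (N − 1), one common effect size for the Kruskal-Wallis test. The function name `compare_disciplines` and the grouping of factor scores by discipline label are hypothetical.

```python
from collections.abc import Mapping, Sequence

from scipy.stats import kruskal


def compare_disciplines(scores: Mapping[str, Sequence[float]], alpha: float = 0.05):
    """Kruskal-Wallis H test of one MSECT-O factor across course disciplines.

    scores maps a discipline label (e.g. "STEM") to its students' factor scores.
    Returns (H, p, epsilon_squared, significant_at_alpha).
    """
    groups = [list(v) for v in scores.values()]
    if len(groups) < 2:
        raise ValueError("need at least two disciplines")
    h, p = kruskal(*groups)                  # tie-corrected H and its chi-square p-value
    n = sum(len(g) for g in groups)
    epsilon_sq = h / (n - 1)                 # epsilon-squared effect size
    return float(h), float(p), float(epsilon_sq), bool(p < alpha)
```

Running the function once per factor (climate, content, practice, assessment) mirrors the four-factor comparison described above.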

Discussion
In summary, this study found evidence that the MSECT-O is a valid and reliable instrument for gathering formative feedback from undergraduate students about online teaching practices. The CFA results indicate that the reduced MSECT-O provides online instructors with formative feedback specific to the four factors of teaching effectiveness outlined in the Fearless Teaching Framework. With the MSECT-O, online instructors can collect feedback from their students across these four dimensions and use the Framework to improve their teaching during the same semester in which the data were collected. Institutions or instructors interested in the emerging practice of online MSEs (Thomas, Graham, & Piña, 2018) can use the free and brief MSECT-O with confidence in its validity and reliability. The MSECT-O fills a gap in the field of online student evaluations because it is grounded in the education literature, designed to collect less biased formative feedback, and developed with rigorous psychometrics.
The MSECT-O responses were generally consistent across students' year in school, their reason for taking the course (i.e., whether it was a requirement), and the discipline of the course. These findings suggest that instructors from all disciplines can feel confident using the MSECT-O to collect mid-semester feedback regardless of whether their online class is in STEM or the humanities. However, identifying the course as a requirement had a weak, positive correlation with more positive evaluations of the course content and climate; that is, students in elective courses provided slightly more negative feedback to their instructors about the classroom climate and content. These results differ from those of Ting (2000) and suggest that more research is needed to understand the relationship between students' perceptions of required courses and their evaluations of the classroom climate and content.
Finally, the findings align with those of other student evaluation researchers (e.g., Spooren et al., 2017): as students' expected course grade increases, so does their positive evaluation of the instructor's teaching. Although this correlation is only moderate, it is statistically significant and warrants further investigation. Future analyses will include a qualitative study of students' comments to explore the types of feedback they provide and whether their constructiveness differs by course factors or expected grade.

Limitations
Limitations should be considered when interpreting the findings. First, the participating instructors opted in and therefore may be more open to student feedback than instructors who did not participate. Critics of SETs may not value or trust student feedback because existing student-facing instruments ask questions that either students cannot meaningfully answer (e.g., about technology choices that require knowledge of the learning management system that students might not have) or that instructors do not find valuable for improving their teaching (Peterson, 2016). In response to this perception, the MSECT-O items were built to be clear and actionable.
Second, the effect of instructor-level variables, such as instructor race and gender, on students' responses was not explored because few instructors participated. Future research will explore issues of bias by purposefully sampling a pool of instructors from diverse backgrounds.
Finally, although participating courses spanned disciplines, all student responses came from a single university. Further replication could therefore assess the extent to which the measure provides robust information in other higher education contexts, such as small teaching colleges and community colleges.

Conclusion
As more faculty, staff, and graduate students move their courses online, the MSECT-O provides a useful and valid way to gather student input for improving course climate, content, practices, and assessments. An important contribution of this paper is an instrument that gathers formative feedback for online instructors about multiple aspects of their teaching, including how they foster an inclusive online classroom climate. Faculty developers can provide tools like the MSECT-O to help instructors evaluate whether they are providing an inclusive and equitable learning experience for all students and how they can further foster a climate in which all students feel supported.