Instructional Quality of Business MOOCs: Indicators and Initial Findings

The concept of instructional quality is central to the design and evaluation of massive open online courses (MOOCs). As MOOCs from the field of business and management are gaining importance both in academia and professional learning, questions on how to determine and improve the quality of these offerings arise. In this paper, we introduce an instrument for evaluating MOOCs against a set of theoretically grounded instructional design principles. After an overview of related research, we describe the concise course scan rubric and its application in detail. A pilot study with N = 101 business MOOCs reveals their rather low overall instructional quality. While most aspects of structuredness and clarity are rated high, the implementation of instructional design principles falls notably behind. The implications of our study point toward a learner-oriented notion of instructional quality, with more individualized learning and increased learner support in business MOOCs.


Massive open online courses (MOOCs) have been a trending topic in educational technology since their inception in 2008. Departing from utopian-like expectations, such as the "democratization" of higher education with unrestricted and ubiquitous access, MOOCs have overcome much disillusionment and criticism (Wiley, 2015) and reached a state of productivity. In the past, many MOOCs showed unsatisfactory completion rates (Jordan, 2015), leading research toward topics like motivation, retention and completion, and satisfaction or engagement (Joksimović et al., 2018; Zhu, Sari, & Lee, 2018). Parts of these phenomena investigated in the past few years are associated with the instructional quality of MOOCs. Margaryan, Bianco, and Littlejohn (2015) have operationalized these concerns under the [...]
The elaborate evaluation of 15 courses from 12 different providers involved participant observation and the documentation of interaction and activities. Results show high average scores in the content category over all courses, and striking deficiencies in the interaction category in four of the courses.
In a qualitative embedded single case study, Kocdar, Okur, and Bozkurt (2017) analyzed three Coursera-style xMOOCs in depth. As a research framework, they applied the 12 dimensions for characterizing MOOCs by Conole (2013), some of which can be directly associated with instructional quality (e.g., degree of communication, type of learner pathway, and amount of reflection). The results of Kocdar et al.'s (2017) study showed that the "openness," "massiveness," "diversity," "use of multimedia," "communication among learners," "learning pathway," and "amount of reflection" dimensions were rated high. The "communication with instructors," "degree of collaboration," and "autonomy" dimensions were rated medium, whereas the "quality assurance," "certification," and "formal learning" dimensions were rated low.
Yilmaz, Ünal, and Çakır (2017) evaluated six Turkish MOOCs from a single platform according to instructional design principles. The 32 items of their online evaluation form were structured according to the seven principles for good practice in undergraduate education (e.g., ease of use, emphasizing time on task, encourage active learning, feedback) by Chickering and Gamson (1987) and based on the 2016 version of the Quality Online Course Initiative Rubric (Illinois Online Network, 2018). Results showed that paid courses had no advantages over free courses. The authors also found a number of drawbacks, such as limited instructor feedback or lack of opportunities for resource sharing among students.
Building on the well-known e-learning design principles (i.e., segmentation, redundancy, pretraining, contiguity, learner control, modality, practice, worked examples, feedback, coherence, multimedia, and personalization principle) by Clark and Mayer (2008), Oh, Chang, and Park (2018) analyzed 40 STEM MOOCs. Their initial findings showed differences in the application of those principles: segmentation and redundancy were applied to a very large extent, whereas practice, worked examples, and feedback principles were least applied. Further analyses revealed significant platform differences in the application of the contiguity, practice, and feedback principles, as well as significant differences in the application of the redundancy, practice, and feedback principles according to the course level difficulty (introductory vs. intermediate).
As a clearly pedagogically oriented approach, the assessing MOOC pedagogies (AMP) tool (Swan, Day, Bogle, & van Prooyen, 2015) builds on an existing instrument for evaluating the pedagogical dimensions of computer-based education by Reeves (1996). AMP generates a course-specific profile over 10 pedagogical dimensions (i.e., epistemology, role of teacher, focus of activities, structure, approach to content, feedback, cooperative learning, accommodation of individual differences, activities/assessment, and user role), each being rated on a bipolar scale. An initial comparison of 13 STEM MOOCs revealed differences in pedagogies on the provider level. The expanded sample then showed further differences between STEM and non-STEM courses. Additionally, three pedagogical patterns, so-called "metaphors for learning" (Swan, Day, & Bogle, 2016), have been identified (i.e., acquisition, participation, self-direction). Fan (2017) later used the AMP tool to evaluate 10 MOOCs from the Chinese provider XuetangX. This analysis revealed differences in the pedagogical approaches of STEM and non-STEM MOOCs. In an analysis of four MOOCs from the Malaysian UNIMAS platform, Taib, Chuah, and Aziz (2017) asked both learners and instructors to apply the AMP tool. Results showed differences in the respective course profiles, with only four dimensions rated unequivocally by learners and instructors over the courses surveyed. Quintana and Tan (2019) recently introduced an expanded version of the AMP tool with adjusted terminology and more sophisticated indicators. After rating 20 MOOCs (from the same platform and institution but from different subject areas), they demonstrated how nearest neighbor cluster analysis can help identify pedagogically similar MOOCs.
The evaluation framework used by Margaryan et al. (2015) is based on a set of design criteria originally developed for professional learning (Collis & Margaryan, 2005) and the Expanded Pebble-in-the-Pond Instructional Design Checklist (Merrill, 2013). The Course Scan rating scheme builds on the first principles of instruction, as synthesized by Merrill (2002): Learning is promoted when (1) instruction is problem- or task-centered, (2) learners activate existing knowledge and connect it to new knowledge, (3) learners are exposed to demonstrations of what they are expected to learn, (4) learners apply and practice what they have learned, and (5) learners integrate what they have learned into their everyday life. These five principles focus on learning activities. In addition, five further theoretically grounded principles focusing on learning resources and learning support were incorporated in the rating instrument: (6) collective knowledge: learning is promoted when learners contribute to the collective knowledge; (7) collaboration: learning is promoted when learners collaborate with others; (8) differentiation: learning is promoted when different learners are provided with individualized learning pathways; (9) authentic resources: learning is promoted when learning resources come from real-world settings; and (10) feedback: learning is promoted when learners are given expert feedback on their performance.
The Course Scan instrument has 37 items in three sections: (a) Course Details (7 items), (b) Objectives and Organization (6 items), and (c) Instructional Principles (24 items). Among a heterogeneous sample of 76 MOOCs with different pedagogies (xMOOCs and cMOOCs) from different providers and domains, the instructional quality was essentially low: Out of 72 possible total points, no MOOC scored above 28 points. While nearly all MOOCs presented well-packaged, structured offerings, there was only limited evidence of instructional principles. Chukwuemeka, Yoila, and Iscioglu (2015) used the Course Scan rubric to evaluate 27 random courses from the Open Education Europa Network. Their results indicated low overall instructional quality, as most of the courses did not follow the principles of instruction. Likewise, the 12 offerings from Eastern Mediterranean University Open CourseWare analyzed by Yoila and Chukwuemeka (2015) scored rather low. Watson, Watson, and Janakiraman (2017) used an extended version of the Course Scan instrument to assess nine MOOCs on attitudinal change, yielding better results than in the reference study.

Analyzing MOOCs in the Field of Business and Management
Research Questions
Given the partially inconclusive findings on pedagogical aspects of MOOCs on the one hand and the importance of content-related pedagogies on the other, we decided to analyze instructional quality not as an overarching generic concept but rather in a domain-specific approach. As MOOCs from the field of business and management represent one of the largest sections in the global MOOC market and as there is only scarce evidence concerning their instructional quality, the following research questions (RQs) formed the basis of this exploratory study:
• RQ 1: How can the instructional quality of MOOCs in the field of business and management be described in terms of structuredness and fit with existing instructional design principles?
• RQ 2: Which categories point toward high instructional quality of business MOOCs, and which categories indicate room for improvement?
• RQ 3: Are there systematic differences concerning instructional quality based on distinctive features of business MOOCs, such as provider/platform, geographic region, and authoring institution?
Rating Instrument, Sample, and Procedure
Due to its conceptual fit with some common principles of business education (e.g., problem-centeredness and active learning) and its focus on professional learning, we used the Course Scan rating scheme as a basis for our instrument. After an initial review, we decided to drop similar and potentially equivocal indicators and thus reduce the number of items (e.g., "To what extent are the problems in the course typical of those learners will encounter in the real world?" vs. "To what extent do the activities in the course relate to the participants' real workplace problems?"). In contrast to the original instrument, with item numbers ranging between 1 (e.g., activation) and 6 (problem centeredness), we decided to address each of Merrill's principles with two distinctive items and each of the more straightforward additional principles with only one single item. The final Concise Course Scan (CCS) rubric consists of three sections with 20 items in total.
Section A comprises five items in five categories, which refer to the structuredness and clarity of a course. High ratings imply a clear and comprehensive description of the course structure, its contents, the expected effort, the target audience, and the corresponding learning goals. In Section B, we operationalized Merrill's first principles of instruction. Ten items address the five categories: problem-centeredness, activation, demonstration, application, and integration (covered by two items each). Section C comprises five items in five additional categories, which reflect key instructional quality aspects, like feedback, collaboration and cooperation, authenticity of learning materials, and individualization and differentiation. Following the assumption that learner activity plays a crucial role in instructional quality, we exchanged the contribution to a collective knowledge pool category (whose operationalization was very close to the collaboration category) from the original Course Scan rubric accordingly. Table 1 illustrates the CCS rubric and its sections, categories, and items. The categories in Sections A and C are operationalized by one item, those in Section B by two items each. Every item is rated on a scale from 0 (not at all true, i.e., not in place) to 3 (very much true, i.e., in place to a large extent) points. For the weighting of the sections, we decided on a ratio of 1:2:2 for the points to be achieved in A, B, and C. This was based on the assumptions that instructional quality should be determined by the implementation of instructional principles rather than by course organization, and that the first principles and the additional principles should be equally important. Therefore, we doubled the raw points of Section C before adding them to the calculation. All in all, a weighted sum score adding up to a maximum of 75 points was calculated over the three sections as a measure for the overall instructional quality of a MOOC.
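The scoring rule described above can be illustrated with a minimal Python sketch. The item counts, the rating scale from 0 to 3, and the weighting factors are taken from the text; the function names, the dictionary layout, and the example ratings are hypothetical and serve illustration only.

```python
# Section layout as described in the text: number of items and weighting factor.
SECTIONS = {
    "A": {"items": 5, "weight": 1},   # structuredness and clarity
    "B": {"items": 10, "weight": 1},  # first principles of instruction
    "C": {"items": 5, "weight": 2},   # additional principles (raw points doubled)
}
MAX_ITEM_SCORE = 3  # every item is rated from 0 to 3 points


def weighted_sum_score(ratings):
    """Return the weighted sum score for one course.

    `ratings` maps a section letter to the list of item ratings (0..3).
    """
    total = 0
    for section, spec in SECTIONS.items():
        items = ratings[section]
        if len(items) != spec["items"]:
            raise ValueError(f"Section {section} needs {spec['items']} ratings")
        if any(r not in range(MAX_ITEM_SCORE + 1) for r in items):
            raise ValueError(f"Ratings in section {section} must be 0..3")
        total += spec["weight"] * sum(items)
    return total


def max_score():
    """Maximum attainable weighted sum score over all sections."""
    return sum(
        spec["items"] * MAX_ITEM_SCORE * spec["weight"]
        for spec in SECTIONS.values()
    )


# Hypothetical ratings for a single course (not taken from the study data).
example = {
    "A": [3, 3, 2, 2, 2],
    "B": [2, 1, 2, 2, 2, 2, 1, 1, 2, 2],
    "C": [2, 1, 0, 1, 1],
}
print(weighted_sum_score(example), "of", max_score())
```

With these weights, the attainable maxima are 15, 30, and 30 points for Sections A, B, and C, which corresponds to the 1:2:2 ratio and the overall maximum of 75 points.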
An analysis of the internal consistency of the instrument revealed a Cronbach's alpha of .822, which is satisfactory. In Section A, there were two items that slightly affected the internal consistency negatively, namely, learning goals (1) and requirements/effort (3). As these items are highly relevant for determining the course objectives and organization, excluding them from the rubric was not considered. The CCS rubric is subject to ongoing development concerning the formulation of categories, items, and indicators.
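For readers who want to reproduce this kind of reliability check, the following Python sketch computes Cronbach's alpha with the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum scores). Whether the reported coefficient was computed on raw or weighted item scores is not stated, so the input format (one list of unweighted item ratings per course) is an assumption, as are the function name and the toy data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of courses, each a list of k item ratings.

    Uses sample variances (n - 1 in the denominator) throughout.
    """
    if len(rows) < 2:
        raise ValueError("At least two rated courses are required")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("Every course needs the same number (>= 2) of items")

    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("Sum scores do not vary; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (hypothetical): three courses rated on two items.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Dropping one item at a time and recomputing alpha on the remaining items is a common way to identify items that lower internal consistency, as described for items 1 and 3.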

Table 1. CCS rubric: sections, categories, and items

2. Audience
   The target audience is clearly described.
3. Requirements/effort
   Course requirements are described sufficiently.
4. Course contents
   The course contents are described in detail.
5. Course structure
   The course structure is clear.

B) First principles of instruction (30 points x weight 1)
6. Problem centeredness
   The course tasks are linked to real-world problems.
   The course tasks are at the center of activities.
7. Activation
   The necessary prior knowledge is clearly described.
   The course elements (contents, tasks) build on prior knowledge.
8. Demonstration
   New knowledge is being demonstrated in a coherent way.
   Media is being used adequately to demonstrate new knowledge.
9. Application
   New knowledge can be applied and practiced in a coherent way.
   The knowledge transfer to additional contexts is being promoted.
10. Integration
   The reflection of new knowledge is being promoted.
   The discussion of new knowledge is being promoted.

C) Additional principles of instruction (15 points x weight 2)
11. Feedback
   Feedback is an integral element of the course.
12. Authentic resources
   The course materials are authentic.
13. Differentiation
   The course enables different learning pathways, according to learners' needs.
14. Cooperation/collaboration
   The course promotes collaboration and cooperation.
15. Learner/activity orientation
   The course promotes active learning.

Note. Items scored from 0 to 3 points each.

The sample of our pilot study (see Appendix) consisted of N = 101 courses. We randomly selected the courses from MOOC aggregators and course catalogues. Primary inclusion criteria were course language (generally English, with one "outlier" taught in German selected for comparison only) and course accessibility during the assessment period. In an attempt to approximate the market shares from the time of the assessment, we included courses from seven different MOOC providers, with a different number of courses each. The sample included MOOCs from eight topic areas in the field of business and management. Eighty-six courses were authored by academic institutions and 15 by nonacademic institutions. Most of the authoring institutions were North American (n = 38) or European (n = 37). In addition, 17 courses were authored by Australian institutions, eight from Asia, and just one from Africa. Session-based courses (n = 76) outweighed the self-paced courses (n = 25) in the sample. As calculated from the given information in the course specifications, the mean course length was 5.1 weeks (SD = 2.5; min = 1 week, max = 13 weeks), and the participants were engaged in coursework for approximately four hours per week (SD = 2.1; min = .5 hours; max = 11 hours).
Three trained raters, each with a background in pedagogy and instructional design, performed the assessment within a period of four months. After an initial training, it took about one-and-a-half hours on average to rate one single course. Five courses were coded by all three raters. Intercoder reliability was analyzed with Kendall's coefficient of concordance. The overall reliability was satisfactory (W = .85). Pairwise comparisons of raters led to values between W = .83 and W = .99.
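Kendall's coefficient of concordance can be computed as in the following Python sketch, which ranks the rated objects per rater (tied values receive average ranks) and applies the usual tie correction. The text does not state which scores entered the analysis (item ratings or sum scores), so the input format, the function names, and the example values are assumptions made for illustration.

```python
from collections import Counter


def average_ranks(values):
    """Rank values in ascending order; ties get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(scores_by_rater):
    """Kendall's W for m raters who each scored the same n objects."""
    m = len(scores_by_rater)
    n = len(scores_by_rater[0])
    if m < 2 or n < 2 or any(len(r) != n for r in scores_by_rater):
        raise ValueError("Need >= 2 raters with equally many (>= 2) scores")

    ranked = [average_ranks(r) for r in scores_by_rater]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)

    # Tie correction: sum of (t^3 - t) over all groups of tied values per rater.
    ties = sum(t ** 3 - t for r in scores_by_rater for t in Counter(r).values())
    denominator = m ** 2 * (n ** 3 - n) - m * ties
    if denominator == 0:
        raise ValueError("All scores are tied; W is undefined")
    return 12 * s / denominator


# Hypothetical sum scores of five courses, rated by three raters.
raters = [
    [40, 55, 31, 47, 52],
    [42, 53, 30, 45, 50],
    [39, 56, 33, 48, 51],
]
print(kendalls_w(raters))
```

Pairwise values such as those reported above would result from calling the function with the scores of two raters at a time.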

RQ1: Overall Instructional Quality of Business MOOCs
For the first research question, we analyzed the mean scores and standard deviations for each section and for the weighted sum scores. Concerning Section A (i.e., structuredness and clarity), the courses reached 11.55 points out of 15 on average (SD = 2.10). The lowest score of seven was reached by three courses in the sample, while the highest score of 15 was reached by six of the 101 MOOCs we analyzed. In terms of Section B (i.e., first principles of instruction), the mean score was 16.34 points out of 30 (SD = 5.58). A minimum score of 5, which indicates very low instructional quality, was assigned to two courses with the topics business intelligence and strategic management. The highest score of 27 points was assigned to only one MOOC on social enterprises. In Section C (i.e., additional principles), the mean score was 12.85 points out of 30 (SD = 3.35).
Across all category groups, the mean weighted sum score was 40.75 points of a potential 75 points (SD = 9.25). The courses with the highest ratings reached 56 points, and the lowest ratings only added up to 17 points. The 10 top courses, reaching between 53 and 56 points on the CCS rubric, are shown in Table 2. Reflecting on the achieved ratings over the three sections, it becomes obvious that even among the top-rated courses, Section C falls behind when compared to Section B.
Note. Raw points in Section C weighted with factor 2.
Further, a correlation analysis revealed significant interrelations between the three sections. High ratings on structuredness and clarity (Section A) correspond with a higher quality related to Merrill's (2002) first principles of instruction detailed in Section B (r = .418**) as well as with better scores regarding the additional principles of instruction found in Section C (r = .342**). The strongest correlation, however, was found between Section B and C (r = .646**). Not too surprisingly, it appears that courses that address principles like problem-centeredness or integration are likely to show higher values concerning authentic resources or learner/activity orientation.

RQ2: Areas of Improvement
In the next step, we set out to identify categories that showed room for improvement. Table 3 offers an overview of the means and standard deviations for all categories. The highest average rating within Section A (M = 2.56; SD = .65) was reached in the category covering clear descriptions of the course contents, with the highest score of 3 reached by n = 66 courses of the sample. The lowest mean score was noted for the category clear description of the target audience. Notably, seven courses were rated with the minimum score of 0 in this category. In the other categories in Section A, there were only a few courses with the lowest rating (n < 10), and most courses reached higher scores.
Pertaining to Section B, the highest mean ratings (M = 2.08; SD = .65) were observed for the item on the adequate implementation of media (demonstration category). The highest score was reached by n = 36 courses here. The lowest ratings were achieved for the item on problem orientation (problem centeredness category; M = 1.39; SD = .87). Lower rated categories were integration (M = 1.68; SD = .66), application (M = 1.55; SD = .84), and activation (M = 1.49; SD = .69). The number of courses that were rated 0 on an item varied between n = 1 (integration: reflection being promoted) and n = 35 (application: knowledge transfer being promoted). On average, n = 17 courses were rated 0 on an item, which is a higher number than in Section A.
In Section C, finally, the best ratings were assigned for a regular integration of feedback during the course (M = 1.99; SD = .84). The maximum score of 3 points was assigned to 32 courses. Learner orientation (M = .68; SD = .49) as well as the degree of differentiation (M = .50; SD = .50) were rated particularly low. Concerning the implementation of different learning pathways according to the learners' needs, n = 50 courses were rated 0.
All in all, Section A shows much less room for improvement than the other sections, while two categories in Section B and C were rated particularly low.
Note. Categories in Sections A and C based on single items. Categories in Section B based on two-item scales.

RQ3: Distinctive Course Features and Instructional Quality
Concerning systematic differences between different groups of business MOOCs, we focused on six distinctive features. We considered provider/platform, course topic, region, pacing, course type, and authoring institution as relevant categories that could have an influence on instructional quality. We conducted variance analyses and found significant differences due to provider/platform, region, and authoring institution, as shown in Table 4.
[Table 4, recoverable entries. T-value; η²: 1.659, η² = .027; 3.266**, η² = .097; 2.463*, η² = .058; 3.274**, η² = .098. Note. Analysis based on weighted scores. * p < .05. ** p < .01.]
Concerning provider/platform, we found significant differences between Udacity and the other MOOC providers (.002 < p < .039) as well as between Open2Study and the other providers evaluated in this study (.000 < p < .039). Udacity showed significantly lower mean ratings than the rest. The effect sizes were the strongest for Section B (η² = .415). The highest means were reached by courses administered by FutureLearn and iversity. However, these differences were not statistically significant.
In search of potential regional differences, we analyzed MOOCs from five geographic regions (i.e., North America, Europe, Asia, Australia, and Africa). We found small but significant differences in instructional quality in every section except Section A. In our sample, Australian courses showed the lowest means in most of the categories. This, however, relates to the fact that most of the Australian courses in our sample were offered by the provider/platform Open2Study and that these courses did not fare too well in our evaluation rubric. In contrast, courses from Europe scored significantly higher (p = .018; η² = .115).
With regard to the authoring institution, we found that MOOCs that were authored by academic institutions showed slightly higher instructional quality than those from nonacademic institutions. The total effect was small but statistically significant (p = .001; η² = .098).
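The relation between a t statistic and the corresponding effect size can be made explicit with a short Python sketch based on the standard conversion η² = t² / (t² + df). The four pairs of t-values and effect sizes are those listed with Table 4; the last pair matches the total effect reported for the authoring institution. The degrees of freedom are an assumption: df = N - 2 = 99, as would apply to an independent two-group comparison over all N = 101 courses. The function and constant names are hypothetical.

```python
def eta_squared_from_t(t, df):
    """Convert a t statistic into eta squared: t^2 / (t^2 + df)."""
    return t * t / (t * t + df)


# Assumption: two-group comparison over all N = 101 courses, hence df = N - 2.
N_COURSES = 101
DF = N_COURSES - 2

# (t-value, reported eta squared) pairs as listed with Table 4.
REPORTED = [(1.659, 0.027), (3.266, 0.097), (2.463, 0.058), (3.274, 0.098)]

for t, reported in REPORTED:
    computed = eta_squared_from_t(t, DF)
    print(f"t = {t:.3f}: computed = {computed:.3f}, reported = {reported:.3f}")
```

Under this assumption, the computed values agree with the reported effect sizes to three decimal places, which suggests that the listed row stems from a two-group comparison.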
Significant effects were not revealed for any of the other variables and categories analyzed. In detail, course topic, course type, and pacing were irrelevant when discussing potential impact factors on instructional quality. First of all, in terms of the eight different topic areas addressed by the MOOCs in the sample (see Appendix), we did not find any statistically significant differences. There was no systematic variation of instructional quality due to course topics here. Secondly, we analyzed different course types, as we differentiated four groups by a median split of the variables weekly course load and course length. This led to four distinctive course types: short course/high effort, short course/low effort, long course/high effort, and long course/low effort. However, the intensity and duration of the coursework implemented in the MOOCs of our sample were not systematically related to their instructional quality. Finally, being either session based or self-paced, the MOOCs in this study did not significantly differ with respect to instructional quality.
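The median split used to derive the four course types can be sketched in Python as follows. The text does not specify how courses lying exactly on a median were assigned; this sketch places them in the "short" and "low effort" groups, which is an assumption. The function names and the example courses are hypothetical.

```python
from statistics import median


def course_type(length_weeks, hours_per_week, length_median, hours_median):
    """Label one course; values equal to the median count as short / low (assumption)."""
    length_label = "long course" if length_weeks > length_median else "short course"
    effort_label = "high effort" if hours_per_week > hours_median else "low effort"
    return f"{length_label}/{effort_label}"


def classify(courses):
    """Classify (length in weeks, weekly hours) pairs by a median split."""
    length_median = median(length for length, _ in courses)
    hours_median = median(hours for _, hours in courses)
    return [
        course_type(length, hours, length_median, hours_median)
        for length, hours in courses
    ]


# Hypothetical courses: (course length in weeks, workload in hours per week).
courses = [(2, 1.0), (4, 3.0), (6, 5.0), (8, 2.0)]
for course, label in zip(courses, classify(courses)):
    print(course, "->", label)
```

The resulting labels can then serve as the grouping variable in a variance analysis of the weighted sum scores.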

Findings and Implications
This research focused on the analysis of the instructional quality of MOOCs from the field of business and management. We introduced a rating instrument with 20 items in 15 categories in three sections. In an exploratory study, three trained raters analyzed N = 101 business MOOCs. The overall findings indicate low overall instructional quality of the analyzed MOOCs. This finding corresponds to previous research in the field (e.g., Margaryan, Bianco, & Littlejohn, 2015). Structuredness and clarity as well as adequate media integration as part of the Demonstration category were rated best, but otherwise the implementation of instructional design principles (first principles from Merrill [2002] as well as additional principles) was rather insufficient. More specifically, the rated courses showed substantial shortcomings with regard to an adequate individualized support of learners and the implementation of collaborative elements. Such results correspond with Spector's (2017) call for greater personalized learning in MOOCs, be it with adaptive digital technology or through instructor-selected activities.
Our results also point toward ample room for improvement in MOOC design. From the domain-specific perspective, the low scores in problem/task orientation are of most concern. In their present implementations, business MOOCs do not fit too well with the case-based teaching approach widely accepted as good practice in business education. For problem-centered business MOOCs, there is a clear need for "relevant and intentionally designed activities with both formative and summative assessments" (Spector, 2017, p. 143) developed around complex, real-world tasks with corresponding authentic materials. This, of course, might come into conflict with one of the defining characteristics of the MOOC concept, which is to provide highly scalable online instruction at very low marginal costs. Hence, it remains a challenging task for instructional designers to bridge this gap and to explicitly address domain-specific pedagogical affordances.
In line with Reich (2015), our study also focused on comparisons of MOOCs across different contexts. With respect to systematic differences between business MOOCs depending on their characteristic features, we analyzed the potential effects of six variables: provider/platform, region, authoring institution, course type, pacing, and course topic. We found that courses administered by Open2Study and Udacity scored significantly lower than MOOCs from other providers, with Udacity (which has been focusing on corporate training in recent years) scoring lowest in most of the categories. Further, courses authored by nonacademic institutions scored slightly lower. One suggestion, therefore, is that providers of vocational education and training (VET) or professional development MOOCs should take adequate actions not to fall behind (cf. Paton, Fluck, & Scanlan, 2018), especially when following the demands for smaller course sizes and tailored "learning nuggets" that seem to evolve around MOOCs in professional contexts (e.g., Egloffstein & Schwerer, 2019). In contrast, academic business MOOCs can be considered suitable for professional learning and development given that these MOOCs seem to align better with the instructional quality standards established in the field. The observed variations due to provider/platform and regional differences point in the same direction, as most of the Australian courses in our sample ran on the Open2Study platform. Although one could have expected that "platform capabilities have a strong influence on what can and will be done pedagogically" (Blackmon & Major, 2017, p. 210), we did not find any additional platform differences of statistical significance. Here, a deeper analysis with an extended sample is necessary to further clarify possible effects. With regard to course type (intensity), topic, and pacing, no systematic differences could be found.

Limitations and Future Research
The reported study has some evident limitations. First, the sample size and selection could be questioned, as the 101 business MOOCs in this study are far from being representative. Although we tried to approximate the market shares with a "snapshot" at the time of our analysis, we could, of course, capture only a fraction of the global MOOC market. XuetangX from China, for example, the third-largest MOOC provider in terms of registered students, had to be omitted due to language barriers. The same applies to Miríadax, which serves the Ibero-American world, France Université Numérique, and a number of other regional providers. Cross-cultural studies could provide fruitful insights here, as it is largely unclear how regional influences could affect the concept of instructional quality.
Likewise, the rating instrument must be continuously improved, with a constant focus on valid indicators. As business MOOCs keep on evolving, we will continue our study and try to include more courses in our sample. Repeated measures, on the other hand, could provide valuable insights not only for research but also for a systematic quality assurance. MOOC providers then could build on empirically grounded instructional design knowledge to improve their offerings.
Additionally, it seems necessary to analyze learner interactions and instructional processes in MOOCs more rigorously. Such research is needed because the relationship between instructional design quality and instructional process quality is still debated. Most probably, a thorough course scan with participant observation over a longer period could lead to a better understanding here.
Regarding the instructional quality of MOOCs in general, we concur with Littlejohn and Hood's (2018) call for the development and evaluation of new measures. Measures from the instructor perspective must be complemented by measures capturing the learner perspective. Learner characteristics, learning processes, and learning outcomes (Biggs, 1993) could provide a rich set of additional indicators for instructional quality. An extended learning analytics approach focusing on learner motivation and emotions could add other layers of detail.
The current study presents valuable insights into the instructional quality of MOOCs in the field of business and management. Drawing upon the results, future tasks for instructional designers in this rapidly evolving field of distance education become evident. As this occurs, a prospective agenda for MOOC research can be mapped and interrogated.