
Assessing Student Learning and Developing Comprehensive Standards for Measuring Teaching Effectiveness in Medical School | BMC Medical Education

Evaluation of curriculum and faculty is critical for all institutions of higher education, including medical schools. Student evaluations of teaching (SET) typically take the form of anonymous questionnaires, and although they were originally developed to evaluate courses and programs, over time they have also been used to measure teaching effectiveness and, subsequently, to inform important decisions about teaching and teacher professional development. However, certain factors and biases may affect SET scores, and teaching effectiveness cannot be measured entirely objectively. Although the literature on course and faculty evaluation in general higher education is well established, there are concerns about using the same tools to evaluate courses and faculty in medical programs. In particular, SETs designed for general higher education cannot be directly applied to the curriculum design and delivery of medical schools. This review provides an overview of how SET can be improved at the instrument, administration, and interpretation levels. It also argues that by using methods such as peer review, focus groups, and self-assessment to collect and triangulate data from multiple sources, including students, peers, program administrators, and faculty self-reflection, a comprehensive assessment system can be constructed that effectively measures teaching effectiveness, supports the professional development of medical educators, and improves the quality of teaching in medical education.
Course and program evaluation is an internal quality control process in all institutions of higher education, including medical schools. Student Evaluation of Teaching (SET) usually takes the form of an anonymous paper or online questionnaire using a rating scale, such as a Likert scale (usually five, seven, or more points), that allows respondents to indicate their degree of agreement or disagreement with specific statements [1, 2, 3]. Although SETs were originally developed to evaluate courses and programs, over time they have also been used to measure teaching effectiveness [4, 5, 6]. Teaching effectiveness is considered important because it is assumed that there is a positive relationship between teaching effectiveness and student learning [7]. Although the literature does not clearly define teaching effectiveness, it is usually specified through specific characteristics of teaching, such as “group interaction”, “preparation and organization”, and “feedback to students” [8].
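As an aside on how such Likert data are typically summarised, the sketch below is purely illustrative (it is not from the article): the item wording, responses, and the choice of "4 or 5 counts as agreement" on a five-point scale are all assumptions.

```python
# Illustrative only: summarising Likert-scale SET responses for one item.
# The item text, responses, and agreement threshold are hypothetical.
from statistics import mean

def summarize_item(ratings, scale_max=5):
    """Return the mean score, percent agreement (top two scale points),
    and the number of respondents for one questionnaire item."""
    agree = sum(1 for r in ratings if r >= scale_max - 1)
    return {
        "mean": round(mean(ratings), 2),
        "percent_agree": round(100 * agree / len(ratings), 1),
        "n": len(ratings),
    }

# Hypothetical responses to "The instructor provided feedback to students" (1-5)
responses = [5, 4, 4, 3, 5, 2, 4, 5, 4, 4]
print(summarize_item(responses))
```

Reporting percent agreement alongside the mean is a common convention, since the mean alone can mask a split between very positive and very negative raters.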
Information obtained from SET can be useful, for example in indicating whether the teaching materials or teaching methods used in a particular course need to be adjusted. SET is also used to make important decisions related to teacher professional development [4, 5, 6]. However, the appropriateness of this approach is questionable when higher education institutions use SET to make decisions regarding faculty, such as promotion to higher academic ranks (often associated with seniority and salary increases) and appointment to key administrative positions within the institution [4, 9]. In addition, institutions often require new faculty to include SETs from previous institutions in their applications for new positions, so SET results influence not only faculty promotions within an institution but also potential new employers [10].
Although the literature on curriculum and teacher evaluation is well established in the field of general higher education, this is not the case in medicine and health care [11]. The curriculum and needs of medical educators differ from those of general higher education. For example, team teaching is often used in integrated medical education courses: the medical school curriculum consists of a series of courses delivered by a number of faculty members with training and experience in various medical disciplines. Although students benefit from the in-depth knowledge of experts in the field under this structure, they often face the challenge of adapting to each teacher’s different teaching style [1, 12, 13, 14].
Although there are differences between general higher education and medical education, SETs designed for the former are sometimes also used in medicine and health care courses. However, implementing general higher education SETs poses many challenges for curriculum and faculty evaluation in health professional programs [11]. In particular, due to differences in teaching methods and teacher qualifications, course evaluation results may not capture student opinions of all teachers or classes. Research by Uitdehaage and O’Neill (2015) [5] suggests that asking students to rate all individual teachers at the end of a course may be inappropriate, because it is nearly impossible for students to remember and rate multiple teachers across multiple categories. In addition, many medical education teachers are also physicians for whom teaching is only a small part of their responsibilities [15, 16]. Because they are primarily involved in patient care and, in many cases, research, they often have little time to develop their teaching skills. Nevertheless, physicians who teach should receive time, support, and constructive feedback from their organizations [16].
Medical students tend to be highly motivated and hard-working individuals who have successfully gained admission to medical school through a process that is competitive and demanding internationally. In addition, during medical school, students are expected to acquire a large amount of knowledge and develop a large number of skills in a short period of time, and to succeed in complex internal and comprehensive national assessments [17,18,19,20]. Because of the high standards expected of them, medical students may be more critical and have higher expectations for high-quality teaching than students in other disciplines, and may therefore give lower ratings to their professors. Interestingly, previous studies have shown a positive relationship between student motivation and individual teacher evaluations [21]. In addition, over the past 20 years, most medical school curricula around the world have become vertically integrated [22], so that students are exposed to clinical practice from the earliest years of their program. Physicians have thus become increasingly involved in the education of medical students, even early in their programs, underscoring the importance of developing SETs tailored to this specific faculty population [22].
Due to the specific nature of medical education mentioned above, SETs used to evaluate general higher education courses taught by a single faculty member should be adapted to evaluate the integrated curriculum and clinical faculty of medical programs [14]. Therefore, there is a need to develop more effective SET models and comprehensive assessment systems for more effective application in medical education.
The current review describes recent advances in the use of SET in (general) higher education and its limitations, and then outlines the distinct needs of SET for medical education courses and faculty. The review provides an update on how SET can be improved at the instrument, administration, and interpretation levels, and focuses on developing effective SET models and comprehensive assessment systems that effectively measure teaching effectiveness, support the professional development of health educators, and improve the quality of teaching in medical education.
This study follows the recommendations of Green et al. (2006) [23] and Baumeister (2013) [24] on writing narrative reviews. We decided to write a narrative review on this topic because this type of review helps present a broad perspective. Moreover, because narrative reviews draw on methodologically diverse studies, they help answer broader questions. Narrative reviews can also help stimulate thought and discussion about a topic.
How is SET used in medical education, and what are the challenges compared to SET used in general higher education?
The PubMed and ERIC databases were searched using combinations of the search terms “student teaching evaluation,” “teaching effectiveness,” “medical education,” “higher education,” and “curriculum and faculty evaluation,” joined with Boolean operators, for peer-reviewed articles published between 2000 and 2021. Inclusion criteria: included studies were original studies or review articles relevant to the areas of the three main research questions. Exclusion criteria: studies not in English, studies for which full-text articles could not be found, and studies not relevant to the three main research questions were excluded from the current review. After selection, publications were organized into the following topics and associated subtopics: (a) the use of SET in general higher education and its limitations; (b) the use of SET in medical education and the challenges involved compared with general higher education; (c) improving SET at the instrument, administration, and interpretation levels to develop effective SET models.
Figure 1 provides a flowchart of selected articles included and discussed in the current portion of the review.
SET has traditionally been used in higher education, and the topic is well studied in the literature [10, 21]. However, a large number of studies have examined its many limitations and efforts to address them.
Research shows that there are many variables that influence SET scores [10, 21, 25, 26]. Therefore, it is important for administrators and teachers to understand these variables when interpreting and using data. The next section provides a brief overview of these variables. Figure 2 shows some of the factors that influence SET scores, which are detailed in the following sections.
In recent years, the use of online SETs has increased compared to paper-based SETs. However, evidence in the literature suggests that online SETs can be completed without students devoting the necessary attention to the process. In an interesting study by Uitdehaage and O’Neill [5], non-existent teachers were added to the SET, and many students nevertheless rated them [5]. Moreover, evidence suggests that students often believe that completing SETs does not lead to educational improvements, which, combined with the busy schedules of medical students, may result in lower response rates [27]. Although research shows that the opinions of students who respond do not differ from those of the entire group, low response rates can still lead teachers to take the results less seriously [28].
Most online SETs are completed anonymously. The idea is to allow students to express their opinions freely, without fear that doing so will affect their future relationships with teachers. In Alfonso et al.’s study [29], researchers used both anonymous ratings and ratings in which raters had to give their names (open ratings) to evaluate the teaching effectiveness of medical school faculty by residents and medical students. The results showed that teachers generally scored lower on the anonymous assessments. The authors argue that students are more honest in anonymous assessments because of certain barriers in open assessments, such as the risk of damaging working relationships with the teachers involved [29]. However, it should also be noted that the anonymity of online SETs may lead some students to be disrespectful and retaliatory towards the instructor if their grades do not meet their expectations [30]. Research shows, though, that students rarely provide disrespectful feedback, and such feedback can be further limited by teaching students to give constructive feedback [30].
Several studies have shown that there is a correlation between students’ SET scores, their performance expectations, and their satisfaction with their grades [10, 21]. For example, Stroebe (2020) [9] reported that students reward easy courses and lenient grading, which can encourage poor teaching and lead to grade inflation [9]. In a recent study, Looi et al. (2020) [31] reported that more favorable SETs were associated with easier assessments. Moreover, there is disturbing evidence that SET is inversely related to student performance in subsequent courses: the higher the rating, the worse the performance in subsequent courses. Kornell et al. (2016) [32] conducted a study to examine whether college students learned relatively more from teachers whose SETs they rated highly. The results show that when learning is assessed at the end of a course, the highest-rated teachers also appear to contribute most to student learning. However, when learning is measured by performance in subsequent related courses, teachers who score relatively low are the most effective. The researchers hypothesized that making a course more challenging in a productive way may lower ratings but improve learning. Thus, student evaluations should not be the sole basis for evaluating teaching, but should be treated as one source of evidence among several.
Several studies show that SET scores are influenced by the course itself and its organization. Ming and Baozhi [33] found significant differences in SET scores between subjects. For example, clinical sciences received higher SET scores than basic sciences. The authors explained that because medical students aspire to become doctors, they have a personal interest and higher motivation to engage more in clinical science courses than in basic science courses [33]. As in the case of electives, student motivation for the subject also has a positive effect on scores [21]. Several other studies also support that course type may influence SET scores [10, 21].
Moreover, other studies have shown that the smaller the class, the higher the SET scores teachers receive [10, 33]. One possible explanation is that smaller class sizes increase opportunities for teacher–student interaction. In addition, the conditions under which the assessment is conducted may influence the results. For example, SET scores appear to be influenced by the time and day the course is taught, as well as the day of the week on which the SET is completed (e.g., assessments completed on weekends tend to yield more positive scores than those completed during the week) [10].
An interesting study by Hessler et al. [34] also calls the validity of SET into question. In this randomized controlled trial, conducted in an emergency medicine course, third-year medical students were randomly assigned to either a control group or a group that received free chocolate chip cookies (cookie group). Both groups were taught by the same teachers, and the training content and course materials were identical. After the course, all students were asked to complete a SET. The cookie group rated the teachers significantly better than the control group [34].
Evidence in the literature also supports that gender may influence SET scores [35,36,37,38,39,40,41,42,43,44,45,46]. For example, some studies have shown a relationship between students’ gender and assessment results, with female students giving higher scores than male students [27]. Most evidence indicates that students rate female teachers lower than male teachers [37, 38, 39, 40]. For example, Boring et al. [38] showed that both male and female students believed that men were more knowledgeable and had stronger leadership abilities than women. That gender stereotypes influence SET is also supported by the study of MacNell et al. [41], who reported that students in their study rated female teachers lower than male teachers on various aspects of teaching [41]. Moreover, Morgan et al. [42] provided evidence that female physicians received lower teaching ratings than male physicians in four major clinical rotations (surgery, pediatrics, obstetrics and gynecology, and internal medicine).
In Murray et al.’s (2020) study [43], the researchers reported that faculty attractiveness and student interest in the course were associated with higher SET scores. Conversely, course difficulty is associated with lower SET scores. Additionally, students gave higher SET scores to young white male humanities teachers and to faculty holding full professorships. There were no correlations between SET teaching evaluations and teacher survey results. Other studies also confirm the positive impact of teachers’ physical attractiveness on assessment results [44].
Clayson et al. (2017) [45] reported that although there is general agreement that SET produces reliable results in the sense that class and teacher averages are consistent, inconsistencies still exist in individual student responses. In effect, individual students did not agree on what they were being asked to evaluate. Reliability measures derived from student evaluations of teaching are therefore insufficient on their own to establish validity, and SET may sometimes provide more information about the students than about the teachers.
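To make the reliability-versus-validity distinction concrete, one common internal-consistency check applied to questionnaires is Cronbach's alpha. The sketch below is illustrative only (the article does not present this calculation); the item labels and response data are hypothetical. A high alpha shows only that items hang together, not that the instrument measures teaching effectiveness.

```python
# Illustrative sketch: Cronbach's alpha for a SET questionnaire.
# Item labels and responses are hypothetical, not from the article.
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list of scores per questionnaire item, with
    respondents in the same order in every list."""
    k = len(item_scores)
    # Per-respondent total across all items
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var = sum(pvariance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

# Hypothetical 1-5 Likert responses: 3 items x 4 respondents
items = [
    [4, 5, 3, 4],  # "preparation and organization"
    [4, 4, 3, 5],  # "group interaction"
    [5, 5, 2, 4],  # "feedback to students"
]
alpha = cronbach_alpha(items)
```

Population variance (`pvariance`) is used consistently for both the item and total-score terms, which is the standard form of the coefficient.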
SET in health professions education differs from traditional SET, yet educators often use instruments developed for general higher education rather than instruments specific to health professions programs. Studies conducted over the years have identified several resulting problems.
Jones et al. (1994) [46] conducted a study to determine how medical school faculty evaluation was perceived by faculty and administrators. Overall, the most frequently mentioned issues related to teaching evaluation. The most common were general complaints about the inadequacy of current performance assessment methods, with respondents also making specific complaints about SET and about the lack of recognition of teaching in academic reward systems. Other reported problems included inconsistent evaluation procedures and promotion criteria across departments, a lack of regular evaluations, and a failure to link evaluation results to salaries.
Royal et al. (2018) [11] outline some of the limitations of using general higher education SETs to evaluate curriculum and faculty in health professional programs. The researchers report that such SETs face various challenges because they cannot be directly applied to curriculum design and course teaching in medical schools. Questions about the instructor and questions about the course are often combined into one questionnaire, so students often have trouble distinguishing between them. In addition, courses in medical programs are often taught by multiple faculty members, which raises validity concerns given the potentially limited number of interactions between students and each teacher being assessed [11]. In a study by Hwang et al. (2017) [14], researchers examined whether retrospective course evaluations comprehensively reflect student perceptions of the various instructors in a course. Their results suggest that individual class assessment is necessary to manage multidepartmental courses within an integrated medical school curriculum.
Uitdehaage and O’Neill (2015) [5] examined the extent to which medical students completed SETs deliberately in multi-instructor classroom courses. A fictitious instructor was added to each of two preclinical courses. Students were asked to provide anonymous ratings of all instructors (including the fictitious one) within two weeks of completing the course, but could decline to evaluate any instructor. The following year the study was repeated, but this time a photograph of the fictitious lecturer was included. Sixty-six percent of students rated the fictitious instructor when no photograph was shown, and fewer (49%) did so when the photograph was present. These findings suggest that many medical students complete SETs blindly, even when photographs are provided, without carefully considering whom they are assessing, let alone the instructor’s performance. This hinders the improvement of program quality and can be detrimental to the academic progress of teachers. The researchers propose a framework offering a radically different approach to SET that actively engages students.
There are many other differences between the curricula of medical programs and those of other higher education programs [11]. Medical education, like other health professions education, is focused on the development of clearly defined professional roles (clinical practice). As a result, medical and health program curricula are more fixed, with limited choice of courses and faculty. Medical education courses are also typically offered in a cohort format, with all students taking the same course at the same time each semester. Enrolling a large number of students (usually n = 100 or more) can therefore affect the teaching format as well as the teacher–student relationship. Moreover, in many medical schools, the psychometric properties of most instruments are not assessed on initial use and may remain unknown [11].
Several studies over the past few years have provided evidence that SET can be improved by addressing some important factors that may influence the effectiveness of SET at the instrumental, administrative, and interpretive levels. Figure 3 shows some of the steps that can be used to create an effective SET model. The following sections provide a more detailed description.
Improve SET at the instrumental, managerial, and interpretive levels to develop effective SET models.
As mentioned earlier, the literature confirms that gender bias can influence teacher evaluations [35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46]. Peterson et al. (2019) [40] conducted a study that examined whether student gender influenced student responses to bias mitigation efforts. In this study, SET was administered to four classes (two taught by male teachers and two taught by female teachers). Within each course, students were randomly assigned to receive a standard assessment tool or the same tool but using language designed to reduce gender bias. The study found that students who used anti-bias assessment tools gave female teachers significantly higher SET scores than students who used standard assessment tools. Moreover, there were no differences in ratings of male teachers between the two groups. The results of this study are significant and demonstrate how a relatively simple language intervention can reduce gender bias in student evaluations of teaching. Therefore, it is good practice to carefully consider all SETs and use language to reduce gender bias in their development [40].
To get useful results from any SET, it is important to consider the purpose of the assessment and the wording of the questions carefully in advance. Although most SET surveys clearly separate a section on the organizational aspects of the course (“Course Evaluation”) from a section on faculty (“Teacher Evaluation”), in some surveys the difference may not be obvious, or students may be confused about how to assess each of these areas individually. The questionnaire design must therefore clearly distinguish the two parts and make students aware of what should be assessed in each. In addition, pilot testing is recommended to determine whether students interpret the questions in the intended way [24]. In a study by Oermann et al. (2018) [26], the researchers searched and synthesized literature describing the use of SET across a wide range of disciplines in undergraduate and graduate education to provide educators with guidance on the use of SET in nursing and other health professional programs. The results suggest that SET instruments should be evaluated before use, including pilot testing with students, because students may not interpret the instrument’s items or questions as the instructor intended.
Several studies have examined whether the SET governance model influences student engagement.
Daumier et al. (2004) [47] compared student ratings of instruction completed in class with ratings collected online, examining both response rates and scores. The study found that online surveys typically have lower response rates than in-class surveys; however, the online assessments did not produce mean scores significantly different from those of traditional in-class assessments.
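The kind of comparison reported there, whether two administration modes yield different mean scores, is typically checked with a two-sample test. The sketch below is illustrative only and not the study's actual analysis; the two score lists are hypothetical.

```python
# Illustrative sketch (not the analysis from the cited study): comparing mean
# SET scores from online vs. in-class administration using Welch's t statistic.
# The score lists below are hypothetical.
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

online = [4.1, 3.8, 4.0, 4.2, 3.9]     # hypothetical mean course ratings
in_class = [4.0, 4.1, 3.9, 4.2, 4.0]
t = welch_t(online, in_class)           # small |t| suggests no detectable difference
```

With real data one would also compute degrees of freedom and a p-value (e.g., via a statistics package); the statistic alone is shown here to keep the sketch self-contained.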
A lack of two-way communication between students and teachers during the completion of online (and printed) SETs has been reported, leaving no opportunity for clarification; as a result, the meaning of SET questions, student comments, or ratings may not always be clear [48]. Some institutions have addressed this issue by bringing students together for an hour and allocating a specific time to complete the SET online (anonymously) [49]. In their study, Malone et al. (2018) [49] held several meetings to discuss with students the purpose of SET, who would see the results and how they would be used, and any other issues students raised. The SET was conducted much like a focus group, with the group answering open-ended questions through informal voting, debate, and clarification. Response rates exceeded 70–80%, providing teachers, administrators, and curriculum committees with extensive information [49].
As mentioned above, in Uitdehaage and O’Neill’s study [5], the researchers reported that students in their study rated non-existent teachers. As mentioned earlier, this is a common problem in medical school courses, where each course may be taught by many faculty members, but students may not remember who contributed to each course or what each faculty member did. Some institutions have addressed this issue by providing a photograph of each lecturer, his/her name, and the topic/date presented to refresh students’ memories and avoid problems that compromise the effectiveness of SET [49].
Perhaps the most important problem associated with SET is that teachers may misinterpret quantitative and qualitative SET results. Some teachers may want to make statistical comparisons across years, some may view minor increases or decreases in mean scores as meaningful changes, some want to believe every survey, and others are outright skeptical of any survey [45, 50, 51].
Failure to interpret results correctly or to process student feedback constructively can affect teachers’ attitudes toward teaching. The results of Lutovac et al. (2017) [52] support that teacher training in handling student feedback is necessary and beneficial. Medical education urgently needs training in the correct interpretation of SET results; medical school faculty should therefore receive training on how to evaluate the results and on the important areas on which they should focus [50, 51].
Thus, the results described suggest that SETs should be carefully designed, administered, and interpreted to ensure that SET results have a meaningful impact on all relevant stakeholders, including faculty, medical school administrators, and students.
Because of some of the limitations of SET, we should continue to strive to create a comprehensive evaluation system to reduce bias in teaching effectiveness and support the professional development of medical educators.
A more complete understanding of clinical faculty teaching quality can be gained by collecting and triangulating data from multiple sources, including students, colleagues, program administrators, and faculty self-assessments [53, 54, 55, 56, 57]. The following sections describe possible other tools/methods that can be used in addition to effective SET to help develop a more appropriate and complete understanding of training effectiveness (Figure 4).
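As a purely illustrative sketch of what triangulation across sources might look like in practice, the code below aggregates hypothetical ratings from students, peer reviewers, and self-assessment into one profile. The sources, scores, and the unweighted averaging scheme are all assumptions; the article does not prescribe any particular aggregation method.

```python
# Illustrative only: triangulating teaching-effectiveness evidence from
# multiple sources. Sources, scores, and the unweighted-mean scheme are
# hypothetical assumptions, not a method from the article.
def triangulate(scores_by_source):
    """Average each source's ratings, then report per-source means and an
    overall unweighted mean across sources."""
    per_source = {src: sum(v) / len(v) for src, v in scores_by_source.items()}
    overall = sum(per_source.values()) / len(per_source)
    return per_source, round(overall, 2)

profile = {
    "students": [4.0, 3.8, 4.2],       # SET item means
    "peer_review": [4.5, 4.3],         # PRT observation scores
    "self_assessment": [3.9],
}
per_source, overall = triangulate(profile)
```

Keeping the per-source means visible, rather than reporting only the combined number, matches the spirit of triangulation: disagreement between sources is itself informative.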
Methods that can be used to develop a comprehensive model of a system for assessing the effectiveness of teaching in a medical school.
A focus group is defined as “a group discussion organized to explore a specific set of issues” [58]. Over the past few years, medical schools have created focus groups to obtain quality feedback from students and address some of the pitfalls of online SET. These studies show that focus groups are effective in providing quality feedback and increasing student satisfaction [59, 60, 61].
In a study by Brundle et al. [59], the researchers implemented a student evaluation group process that allowed course directors and students to discuss courses in focus groups. The results indicate that focus group discussions complement online assessments and increase student satisfaction with the overall course assessment process. Students valued the opportunity to communicate directly with course directors and believed that the process could contribute to educational improvement; they also felt they came to understand the course director’s point of view. Course directors, in turn, reported that the focus groups facilitated more effective communication with students [59]. Thus, the use of focus groups can give medical schools a more complete understanding of the quality of each course and the teaching effectiveness of the respective faculty. However, focus groups have their own limitations: only a small number of students can participate, compared with online SETs available to all students, and conducting focus groups for multiple courses can be time-consuming for facilitators and students. This poses significant constraints, especially for medical students, who have very busy schedules and may undertake clinical placements in different geographic locations. Focus groups also require a large number of experienced facilitators. Nevertheless, incorporating focus groups into the evaluation process can provide more detailed and specific information about teaching effectiveness [48, 59, 60, 61].
Schiekierka-Schwacke et al. (2018) [62] examined student and faculty perceptions of a new tool for assessing faculty performance and student learning outcomes in two German medical schools. Focus group discussions and individual interviews were conducted with faculty and medical students. Teachers appreciated the personal feedback provided by the assessment tool, and students reported that a feedback loop, including goals and consequences, should be created to encourage the reporting of assessment data. Thus, the results of this study support the importance of closing the loop of communication with students and informing them of assessment results.
Peer Review of Teaching (PRT) programs are very important and have been implemented in higher education for many years. PRT involves a collaborative process of observing teaching and providing feedback to the observed teacher to improve teaching effectiveness [63]. In addition, self-reflection exercises, structured follow-up discussions, and the systematic assignment of trained colleagues can help improve the effectiveness of PRT and the teaching culture of a department [64]. These programs are reported to have many benefits: they can help teachers receive constructive feedback from peers who may have faced similar difficulties in the past and who can offer useful suggestions for improvement [63]. Moreover, when used constructively, peer review can improve course content and delivery methods and support medical educators in improving the quality of their teaching [65, 66].
A recent study by Campbell et al. (2019) [67] provides evidence that a workplace peer support model is an acceptable and effective teacher development strategy for clinical health educators. In another study, Caygill et al. [68] sent a specially designed questionnaire to health educators at the University of Melbourne, allowing them to share their experiences of PRT. The results indicate that there is pent-up interest in PRT among medical educators and that a voluntary, formative peer review format is considered an important and valuable opportunity for professional development.
It is worth noting that PRT programs must be carefully designed to avoid creating a judgmental, “managerial” environment, which often increases anxiety among observed teachers [69]. The goal should be PRT plans that foster a safe environment and provide constructive feedback. Reviewers therefore need special training, and PRT programs should involve only genuinely interested and experienced teachers. This is especially important if information obtained from PRT is used in faculty decisions such as promotion to higher academic ranks, salary increases, and appointment to important administrative positions. It should also be noted that PRT is time-consuming and, like focus groups, requires the participation of many experienced faculty members, making the approach difficult to implement in low-resource medical schools.
Newman et al. (2019) [70] describe strategies to be used before, during, and after teaching observations that highlight best practices and identify solutions to learning problems. The researchers offered 12 suggestions to reviewers, including: (1) choose your words wisely; (2) let the observed teacher set the direction of the discussion; (3) keep feedback confidential and formative; (4) focus feedback on teaching skills rather than on the individual teacher; (5) get to know your colleagues; (6) be considerate of yourself and others; (7) remember that pronouns play an important role in providing feedback; (8) use questions to shed light on the teacher’s perspective; (10) establish trust and feedback processes in peer observation; (11) make observation of teaching a win-win; (12) create an action plan. The researchers also explore the impact of bias on observations and how the process of teaching, observing, and discussing feedback can provide valuable learning experiences for both parties, leading to long-term partnerships and improved educational quality. Gomaly et al. (2014) [71] reported that effective feedback should (1) clarify the task by providing direction, (2) increase motivation to encourage greater effort, and (3) be perceived by the recipient as a valuable process provided by a credible source.
Although medical school faculty receive feedback on PRT, it is important to train faculty on how to interpret feedback (similar to the recommendation to receive training in SET interpretation) and to allow faculty sufficient time to constructively reflect on the feedback received.