09. Assessment, Evaluation, Testing and Measurement
Paper
TIMSS Repeat in Flanders: a longitudinal follow-up to TIMSS 2019
Dries Verhelst, Marijn Gijsen, Lies Appels, Sven De Maeyer, Peter Van Petegem
University of Antwerp, Belgium
Presenting Author: Verhelst, Dries
Flanders has a history of participating in International Large-Scale Assessment studies (ILSAs) like TIMSS, where it has often ranked highly. However, the last cycles of TIMSS have shown a gradual decline in the academic achievement of Flemish students. This has sparked a debate about the quality of education in Flanders. Between the TIMSS cycles of 2015 and 2019, Flemish students' achievement levels decreased by 14 points for mathematics and 11 points for science (Faddar et al., 2020).
Although ILSAs are crucial tools for policymakers to assess the quality of educational systems, their primary purpose is periodic benchmarking (Addey and Sellar, 2019). However, the decline found among Flemish students has prompted a deeper investigation and monitoring of the evolution of Flemish learning gains throughout the remaining two years of primary schooling, which goes beyond benchmarking. To this end, a longitudinal study based on TIMSS-2019 was set up in Flanders: TIMSS-repeat.
Using a longitudinal design, TIMSS-repeat retested students who participated in the TIMSS 2019 cycle in 2021, when most of the students were in the sixth grade of primary education. In total, 4,301 students, their teachers, and their school principals participated in TIMSS-repeat. The main purpose of TIMSS-repeat was to investigate the learning gains of Flemish students during the last two years of primary school, allowing an inquiry into the connection between students' background characteristics and their learning gains for mathematics and science. Moreover, the timing of the data collection in May 2021, just after the school closures and quarantines due to COVID-19, allowed additional information to be collected on the impact of COVID-19 on learning gains in mathematics and science. The following research questions were central:
- What are the achievement gains for Flemish students in the last two years of primary education?
- What are the differential effects of student background characteristics on their achievement gains in the last two years of primary education?
- How did the COVID-19 pandemic and the resulting school closures impact student achievement gains in Flanders?
The first and second questions aim to analyze how students in Flanders progress through the last years of primary school. With these questions, we aim to reveal how large students’ learning gains are and whether specific background characteristics facilitate or hamper these gains. In previous TIMSS studies, home language and students’ socioeconomic status were found to be linked to achievement (Faddar et al., 2020; IEA, 2020). The third question seeks to provide valuable information to both researchers and policymakers regarding the impact of the COVID-19 pandemic on student learning and achievement. Not only does TIMSS-repeat in Flanders provide answers to these research questions, but it also aligns with the research goals of the TIMSS longitudinal study that is following the TIMSS 2023 cycle (The International Association for the Evaluation of Educational Achievement (IEA), 2022).
TIMSS-repeat in Flanders provided a valuable but tentative insight into Flemish learning gains during the final grades of primary education in Flanders, characterized by one of the most impactful global events of our time. In this presentation, we will discuss the different steps taken to conduct the TIMSS-repeat study in Flanders and present our most important findings.
Methodology, Methods, Research Instruments or Sources Used
The research presented here utilizes data from the TIMSS 2019 cycle collected in May 2019 (T1) and a repeated measurement two years later in May 2021 (T2), based on the same sample of schools and students. For T2, 91.9% of the schools from TIMSS 2019 agreed to participate, resulting in a sample of 4,301 students nested in 133 schools. Rigorous checks were conducted for selection bias in comparison to the T1 sample, including factors at both the school and the student level such as school performance, educational network, geographical location, gender, and socioeconomic status. The T1 and T2 samples are comparable, with no significant selection bias at either the school or the student level.
To ensure the reliability of the data, several precautions were taken. To avoid a mode effect (Martin et al., 2020), paper-based achievement tests were administered for both T1 and T2. Additionally, to minimize the likelihood of a ceiling effect, adjustments were made to the test materials: easier items were excluded and more difficult mathematics and science items were included from the Flemish national assessment tests conducted in 2015, 2016, and 2021. In the selection of these new items, we maintained a distribution that aligns with the content domains (measurement & geometry, numbers and data for mathematics; life, physical, and earth for science) and cognitive domains (knowing, applying, and reasoning) (Martin et al., 2020). To allow for a precise description of the learning gains, the test items of T1 and T2 were calibrated (Scharfen et al., 2018). Finally, to avoid a retest effect, individual students were administered different test items from those they received in the 2019 test.
To grasp the impact of COVID-19, specific scales were added to the background questionnaires for the students, teachers, and school leaders. All new instruments were found to be reliable and valid.
The analysis began by calculating weights, jackknife estimates, and plausible values for students’ mathematics and science achievement (Martin et al., 2020). The R package “EdSurvey” was used for all analyses (Bailey, 2020), specifically the “mixed.sdf” function was used to estimate mixed effect models mapping differential effects of student characteristics on achievement. The analysis used a scale ranging from 0 to 1000 points.
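As an illustration of how results from several plausible values are typically combined, the following is a minimal Python sketch of Rubin's combination rules. It is not the EdSurvey implementation used in the study: the function name `pool_plausible_values` and all numbers are hypothetical, and the sketch assumes that a point estimate and a sampling (for example jackknife) variance have already been computed separately for each plausible value.

```python
from statistics import mean, variance


def pool_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results with Rubin's rules.

    estimates: one point estimate per plausible value.
    sampling_variances: the matching sampling variances (e.g. jackknife).
    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need matching lists with at least two plausible values")
    pooled = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across plausible values (m - 1 denominator)
    total_variance = within + (1 + 1 / m) * between
    return pooled, total_variance ** 0.5


# Hypothetical values for five plausible values; not taken from the study.
pv_means = [531.0, 533.0, 532.0, 530.0, 534.0]
pv_jk_variances = [4.0, 4.2, 3.8, 4.1, 3.9]
estimate, standard_error = pool_plausible_values(pv_means, pv_jk_variances)
print(round(estimate, 1), round(standard_error, 2))
```

The between-imputation term is what makes the standard error reflect measurement uncertainty in addition to sampling uncertainty; analysing a single plausible value alone would understate it.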
Conclusions, Expected Outcomes or Findings
Looking at the first research question, Flemish pupils demonstrated achievement gains in both mathematics and science over the last two years of primary education, with an increase of 117 points in mathematics and 107 points in science. In terms of cognitive domains, Flemish students exhibited the largest improvements in the Applying domain for both mathematics and science, in line with the hypothesis of Faddar et al. (2020) regarding the emphasis on higher cognitive skills in the later years of primary education.
Answering the second research question, we found that boys obtained slightly higher learning gains compared to girls, with an increase of 120 points in mathematics and 113 points in science, compared to 116 and 109 points, respectively. For home language, noteworthy results were found: students who never spoke the language of the test at home demonstrated the most substantial achievement gains in both mathematics (137 points) and science (134 points). Additionally, students with a room for themselves and access to a significant number of books at home experienced the highest achievement gains in both subjects.
When answering the first and second research questions, caution is advised: while we found learning gains, empirical evidence to compare the size of these learning gains is lacking. Potential benchmarks such as Bloom et al. (2008), Martin et al. (1997), or Mullis et al. (1997) are based on empirical data, but may also not be as pertinent due to their age and dissimilar contexts.
Finally, the descriptive data on how schools, teachers, and students adapted to COVID-19 provide an answer to the third research question. Results include, among others, a shift in didactics and teaching and difficulties with online teaching.
References
Addey, C., and Sellar, S. (2019). Rationales for (non) participation in international large-scale learning assessments. Education Research and Foresight: UNESCO Working paper.
Bailey, P., Lee, M., Nguyen, T., & Zhang, T. (2020). Using EdSurvey to Analyze TIMSS Data. In
Faddar, J., Appels, L., Merckx, B., Boeve-de Pauw, J., Delrue, K., De Maeyer, S., and Van Petegem, P. (2020). Vlaanderen in TIMSS 2019. Wiskunde- en wetenschapsprestaties van het vierde leerjaar in internationaal perspectief en doorheen de tijd.
IEA. (2020). TIMSS 2019 International Results in Mathematics and Science.
Martin, M. O., von Davier, M., and Mullis, I. V. S. (2020). Methods and Procedures: TIMSS 2019 Technical Report. TIMSS & PIRLS International Study Center, Boston College. https://timssandpirls.bc.edu/timss2019/methods
Scharfen, J., Peters, J. M., and Holling, H. (2018). Retest effects in cognitive ability tests: A meta-analysis. Intelligence, 67, 44-66. https://doi.org/10.1016/j.intell.2018.01.003
The International Association for the Evaluation of Educational Achievement (IEA). (2022). TIMSS Longitudinal Study: Measuring Student Progress over One Year.
09. Assessment, Evaluation, Testing and Measurement
Paper
Role of Metacognitive Skills and Self-Efficacy in Predicting Academic Results of Middle School Students
Diana Akhmedjanova, Elizaveta Korotkikh
National Research University "Higher School of Economics"
Presenting Author: Korotkikh, Elizaveta
Metacognition or metacognitive skills refer to students’ “understanding and control of their own cognition” (Sternberg, 2007, p. 18). Metacognition, or knowledge about thinking, includes declarative, procedural, and conditional knowledge (McCormick, 2003). Students who have well-developed metacognitive skills tend to thrive academically. For example, research shows that systematic metacognitive monitoring leads to better understanding and academic performance (Zimmerman & Cleary, 2009). However, many studies in education report low to medium associations between metacognition and academic achievement (Fleur et al., 2021; Winne & Azevedo, 2022).
Self-efficacy is another construct that relates to academic achievement across educational settings and age groups (DiBenedetto & Schunk, 2022). Self-efficacy refers to students’ beliefs that they can successfully tackle a task (Anderman & Wolters, 2008; Bandura, 2006). Students’ self-efficacy is related to their engagement with a task and the types of strategies they use (Bandura, 1994). Years of research indicate that self-efficacy relates to students’ learning, motivation, achievement, and self-regulated learning (DiBenedetto & Schunk, 2022). High self-efficacy is a strong predictor of students’ achievement and success (DiBenedetto & Schunk, 2022) and strongly relates to academic achievement for middle school students (Carpenter, 2007).
Available research studies suggest positive yet small correlations between metacognition and general and domain-specific self-efficacy (Cera et al., 2013; Ridlo & Lutfiya, 2017). In addition, metacognitive scaffolding improved the metacognitive awareness, academic self-efficacy, and learning achievement of biology students (Valencia-Vallejo et al., 2019). Research evidence from other countries supports positive relationships among metacognition, self-efficacy, and academic achievement. However, it is not clear how these constructs relate to each other in other contexts such as Russia. Therefore, the goal of this study is to examine the role of metacognitive skills and self-efficacy in predicting middle school students’ academic results.
Theoretical framework
The role of metacognition and self-efficacy in students’ academic results in this study is examined through a Model of Self- and Socially Regulated Learning (Author). The model is organized around three broad areas: self-regulated learning (SRL; C–I, M–N), socially regulated learning (SoRL; A–B, J–N), and culture (O). Each area has its own set of processes contributing to the development of self-/socially regulated skills. Thus, SoRL includes instructional techniques (A–B) and formative assessment practices, such as feedback, which occur in classrooms (J–N). SRL includes the processes that activate students’ background knowledge and motivational beliefs, which lead to the choice of goals and strategies to do the task (C–I, M–N). Finally, culture (O) situates both types of processes within a socio-cultural context.
This model reflects the complexity of school classrooms and includes a number of variables. In this paper, however, the focus is on such components of SRL as metacognition and self-efficacy. For the purposes of this study, metacognition includes the processes of planning, progress monitoring, and reflection. According to Albert Bandura (2006), self-efficacy is domain-specific, which is why separate self-efficacy scales were developed for each of the domains. The main purpose of this study was to examine the role of metacognition and self-efficacy in predicting middle school students’ academic results. The study addresses the following research questions:
- Are there differences in middle school students’ metacognitive skills, self-efficacy, and academic results by gender and grade?
- Do metacognition and self-efficacy predict students’ academic results by subject domains?
Methodology, Methods, Research Instruments or Sources Used
This study employed a cross-sectional survey design.
Sample. The sample included 1,167 students (55.3% girls, n = 645) from seventh (n = 345), eighth (n = 514), and ninth (n = 308) grades.
Instruments. The metacognition scale is an adaptation of the SRL survey for DAACS (Lui et al., 2018). It includes the subscales of planning (5 items), monitoring (6 items), and reflection (7 items), rated on a Likert-type scale (4 – almost always, 1 – almost never). The scale showed good internal consistency (α = 0.92; ω = 0.93). Example item: “I plan when I am going to do my homework”.
The self-efficacy surveys for mathematics (4 items, α = 0.85, ω = 0.9), Russian (4 items, α = 0.79, ω = 0.85), reading (4 items, α = 0.84, ω = 0.86), foreign language (5 items, α = 0.93, ω = 0.94), biology (4 items, α = 0.87, ω = 0.9), and physics (5 items, α = 0.93, ω = 0.95) used a Likert-type scale (4 – I can do it well, 1 – I cannot do it at all) with good reliability estimates. An example item: “Can you solve a math problem?”.
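For readers less familiar with the α coefficients reported above, the following is a minimal Python sketch of the standard Cronbach's alpha formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the toy responses are hypothetical and are not the study's data; the authors' analyses were run in R, and ω is not computed here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the summed scale score
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy 4-item, 4-point Likert responses from five hypothetical students
toy_responses = [
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 2, 1, 2],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 2))
```

Alpha approaches 1 when items rise and fall together across respondents, which is why short scales with highly consistent items can still reach values such as those reported here.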
Procedures. After receiving approval from the Ethics Committee, the data were collected online in public schools. Parents signed online consent forms, and children provided their assent to participate. The data analyses were conducted in RStudio.
Results
RQ1: While no differences were observed for planning and reflection, girls showed higher scores for monitoring than boys, t = 2, df = 1090.6, p = 0.04, d = 0.12. No differences were observed in self-efficacy for math, reading, foreign language, and biology. However, girls had higher self-efficacy for Russian, t = 7.81, df = 1023.6, p < 0.0001, d = 0.47. Boys had higher self-efficacy for physics, t = -3.72, df = 1095.9, p < 0.001, d = 0.22. Girls reported higher academic results than boys across all subjects. Examination by grade level revealed that students from the 9th grade had higher estimates for planning, reflection, and self-efficacy across most subjects than students from the 7th and 8th grades.
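The relation between the reported t statistics and effect sizes can be sketched with the common approximation d ≈ t × √(1/n₁ + 1/n₂). This is only an illustration, not the authors' procedure: the abstract does not state how d was computed, the non-integer degrees of freedom suggest Welch-type tests for which the formula is approximate, and the group sizes below assume that all 645 girls and the remaining 522 students answered every scale.

```python
from math import sqrt


def cohens_d_from_t(t, n1, n2):
    """Approximate Cohen's d from an independent-samples t statistic."""
    return t * sqrt(1 / n1 + 1 / n2)


# Assumed group sizes: 645 girls (reported) and 1167 - 645 = 522 other students,
# ignoring any missing responses.
N_GIRLS = 645
N_BOYS = 1167 - 645

reported_t = {
    "monitoring": 2.0,
    "Russian self-efficacy": 7.81,
    "physics self-efficacy": -3.72,
}
approx_d = {
    name: round(abs(cohens_d_from_t(t, N_GIRLS, N_BOYS)), 2)
    for name, t in reported_t.items()
}
print(approx_d)
```

Under these assumptions the approximation gives values close to the reported d for all three comparisons; small discrepancies are to be expected from rounding of t, missing responses, and unequal variances.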
RQ2: Linear regression analyses revealed that planning predicted students’ scores in foreign language and biology, and reflection predicted scores for foreign language and physics. For all other subjects, contributions of metacognition were not significant. In contrast, self-efficacy significantly predicted scores for all subjects, explaining between 16% and 32% of variance in scores.
Conclusions, Expected Outcomes or Findings
This paper examined the role of metacognition and self-efficacy in predicting middle school students’ academic results. The group comparison results revealed that girls had higher scores in metacognitive monitoring than boys. No differences were observed for metacognitive planning and reflection. Also, girls indicated higher self-efficacy in Russian and boys higher self-efficacy in physics. These results are partially in line with research showing gender differences with boys scoring higher in mathematics (Breda & Napp, 2019) and with research on perceived self-efficacy (Pajares & Valiante, 2002). Students from the 9th grade had higher scores for planning, reflection, and self-efficacy across most subjects. Ninth grade is the final grade of middle school in Russia: students take a final examination and then decide whether to continue to high school or move to other educational institutions. In 9th grade, students’ abstract thinking and analysing skills necessary to reflect on behaviours and emotions are developed enough to engage in metacognitive thinking (Uytun, 2018).
The results of the regression analysis indicated that metacognition was not as strong a predictor of students’ scores in the respective subjects as self-efficacy. However, planning and reflection contributed to scores in foreign language, biology, and physics. These results support research studies reporting weak and moderate relationships of metacognition with academic results (Cera et al., 2013; Ridlo & Lutfiya, 2017) and significant contributions of self-efficacy to academic achievement (DiBenedetto & Schunk, 2022).
The scholarly significance of this study is that it examined the relationships among metacognition, self-efficacy by domain, and academic achievement of middle school students, using a relatively large sample in Russia. It provides evidence of the links between students’ perceived self-efficacy beliefs and their results in subject domains, and of the positive role of planning and reflection for some subjects.
References
Anderman, E. M., & Wolters, C. A. (2008). Goals, values, and affect: Influences on student motivation. In P. A. Alexander and P. H. Winne (Eds.), Handbook of educational psychology, 369–390, 2nd ed. Lawrence Erlbaum Associates Publishers.
Bandura, A. (2006). Toward a psychology of human agency. Perspectives on psychological science, 1(2), 164-180.
Breda, T. & Napp, C. (2019). Girls’ comparative advantage in reading can largely explain the gender gap in math-related fields. Proceedings of the National Academy of Sciences, 116(31), 15435-15440. https://doi.org/10.1073/pnas.1905779116
Carpenter, S. L. (2007). A comparison of the relationships of students' self-efficacy, goal orientation, and achievement across grade levels: a meta-analysis. https://summit.sfu.ca/_flysystem/fedora/sfu_migrate/2661/etd2816.pdf
DiBenedetto, M. K., & Schunk, D. H. (2022). Assessing academic self-efficacy. In M. S. Khine and T. Nielsen (Eds.), Academic Self-Efficacy in Education: Nature, Assessment, and Research (pp. 11-37). Springer.
Cera, R., Mancini, M., & Antonietti, A. (2013). Relationships between metacognition, self-efficacy and self-regulation in learning. Journal of Educational, Cultural and Psychological Studies (ECPS Journal), 4(7), 115-141.
Fleur, D.S., Bredeweg, B. & van den Bos, W. Metacognition: ideas and insights from neuro- and educational sciences. npj Sci. Learn. 6, 13 (2021). https://doi.org/10.1038/s41539-021-00089-5
McCormick, C. B. (2003). Metacognition and learning. In W. M. Reynolds & G. E. Miller (Eds.), Handbook of psychology: Educational psychology (Vol. 7, pp. 79-102). John Wiley & Sons Inc.
Pajares, F., & Valiante, G. (2002). Students’ self-efficacy in their self-regulated learning strategies: a developmental perspective. Psychologia, 45(4), 211-221.
Ridlo, S., & Lutfiya, F. (2017, March). The correlation between metacognition level with self-efficacy of biology education college students. In Journal of Physics: Conference Series (Vol. 824, No. 1, p. 012067). IOP Publishing.
Sternberg, R. J. (2007). Intelligence, competence, and expertise. In A. J. Elliot, & C. S. Dweck (Eds.), Handbook of competence and motivation (pp. 15–30). The Guilford Press.
Uytun, M. C. (2018). Development period of prefrontal cortex. In A. Starcevic and B. Filipovic (Eds.), Prefrontal Cortex. IntechOpen. DOI: 10.5772/intechopen.78697
Valencia-Vallejo, N., López-Vargas, O., & Sanabria-Rodríguez, L. (2019). Effect of a metacognitive scaffolding on self-efficacy, metacognition, and achievement in e-learning environments. Knowledge Management & ELearning, 11(1), 1–19. https://doi.org/10.34105/j.kmel.2019.11.001
Winne, P., & Azevedo, R. (2022). Metacognition and self-regulated learning. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences, 93-113. Cambridge University Press.
Zimmerman, B. J., & Cleary, T. J. (2009). Motives to self-regulate learning: A social-cognitive account. In K. Wentzel, & A. Wigfield (Eds.), Handbook on Motivation at School. Taylor & Francis.
09. Assessment, Evaluation, Testing and Measurement
Paper
The Impact of the Negative Grading Effect in Different School Subjects
David Clarke, Alli Klapp, Monica Rosén
University of Gothenburg, Sweden
Presenting Author: Clarke, David
The negative grading effect (NGE) is the decrement in grade outcomes associated with the process of being assessed and graded. By exploiting the natural experimental conditions resulting from the introduction or abolition of grades earlier in the school career, researchers have been able to contrast the outcomes of comparable groups of Swedish students with different grading backgrounds, i.e. whether they were previously graded or not. The effect has repeatedly been identified in students’ year 9 (age 15/16) grades, and seems to particularly affect low-ability students and boys (Facchinello, 2014; Klapp, Cliffordson, & Gustafsson, 2016; Clarke, Klapp, & Rosen, under review). Despite substantial reforms to the grading and assessment system, the effect persists and thus seems to have an enduring and robust impact on compulsory school students’ grades. When investigating the effect, previous research has, however, tended to combine the results from multiple or all school subjects, thus losing the possibility to identify detail and nuance in how the NGE may differentially impact individual school subjects. There is a strong imperative for investigating the NGE for individual subjects rather than using aggregated grades like SAT- or GPA-scores. As they are the primary means of sorting students into higher education and employment, Swedish school grades have become increasingly high-stakes (Lundahl, Hultén, & Tveit, 2017) and may be driving student testing and grading anxiety, a factor known to affect test performance. Previous research indicates that students’ levels of test anxiety vary between subjects, with the suggestion that there may even be subject-specific anxieties (e.g. mathematics anxiety (Mammarella, Donolato, Caviolo, & Giofrè, 2018)). Student self-efficacy beliefs also vary between subjects, with high self-efficacy thought to be a protective factor against test anxiety (Marsh, 1990; Mammarella, Donolato, Caviolo, & Giofrè, 2018).
Different subjects are also taught by different teachers, have different content requiring different modes of thinking, and are likely to receive very different test feedback, another factor which impacts subjects differently (Azmat & Iriberri, 2010). These few examples suggest that the impact of being graded depends on several factors which vary from subject to subject, and support the need to investigate the NGE for different subjects. This research intends to replicate and extend previous research to investigate how the NGE manifests in different school subjects. By comparing the year 9 grade outcomes in different subjects of students who were either previously graded in year 6 or not, it is hoped to establish how the NGE varies across subjects. As with previous research, how factors such as gender, parental educational background, immigration status, and students’ cognitive ability affect the expression of the NGE will also be investigated. The research is also relevant to audiences beyond the Swedish education system. As the NGE has been shown to be robust across variations in the Swedish system, it is reasonable to infer that it might exist in the results of other countries, especially given that Sweden is not unique in using end-of-year grades for high-stakes purposes like admission to further education and employment.
Methodology, Methods, Research Instruments or Sources Used
This quasi-experimental study plans to use structural equation modelling or multivariate regression analyses of data collected in the Evaluation Through Follow-up project of Sweden’s compulsory school students. The database contains information from recurring studies of cohorts of students from 1948 to the present. It contains student and parental demographic background and questionnaire data, as well as teacher and school information. The data include student academic performance measures from multiple points in their academic career as well as cognitive ability measures collected by testing the students in year 6 (age 12/13).
The analysis uses birth cohorts 1992 (N = 10147) and 2004 (N = 9775). This comparison allows for the evaluation of the academic outcomes of students in cohorts before and after a reform that lowered the age at which students are first graded. The reforms also introduced changes which increased the stakes of grades by, among other things, introducing a fail grade. The outcomes of students who have previously been graded will be compared at the subject level to those of students who have not previously received grades, to determine whether having previously received grades has differential effects for different subjects.
In addition to the grading status of the students, the analysis will also include the independent variables for student gender, parental education level, immigration background, and student cognitive ability levels. The dependent variables will be the grade outcomes for the school subjects studied achieved at the end of school year 9 (age 15/16). Data are available for around 14 subjects.
Statistical analysis and modelling will use Mplus version 8.5 (Muthén & Muthén, 1998-2019) which can account for missing data and possible clustering effects of students within schools.
Conclusions, Expected Outcomes or Findings
Further support for the presence of the NGE is expected. The NGE is expected to vary in magnitude between subjects. However, at this stage, the exact nature of how the NGE varies between the various school subjects or the presence of any patterns or groupings of the subjects has not yet been determined. The remaining independent variables are expected to show similar relationships to the grade outcomes as previous research has established, though again, some between-subject variation is expected, but has not yet been determined. The study is ongoing and results are expected around summer 2024. The study is part of a research project funded by the Swedish Research Council (2019-04531).
References
Azmat, G., & Iriberri, N. (2010). The importance of relative performance feedback information: Evidence from a natural experiment using high school students. Journal of Public Economics, 94, 435-452. doi:https://doi.org/10.1016/j.jpubeco.2010.04.001
Clarke, D. R., Klapp, A., & Rosen, M. (under review). The negative effect of earlier grading.
Facchinello, L. (2014). The impact of early grading on academic choices: mechanisms and social implications. Department of Economics. Stockholm: Stockholm School of Economics. Retrieved from https://mysu.sabanciuniv.edu/events/sites/mysu.sabanciuniv.edu.events/files/units/FASS%20Editor/jmp_-_luca_facchinello.pdf
Klapp, A., Cliffordson, C., & Gustafsson, J.-E. (2016). The effect of being graded on later achievement: evidence from 13-year olds in Swedish compulsory school. Educational Psychology, 36(10), 1771-1789. doi:https://doi.org/10.1080/01443410.2014.933176
Lundahl, C., Hultén, M., & Tveit, S. (2017). The power of teacher-assigned grades in outcome-based education. Nordic Journal of Studies in Educational Policy, 3(1), 56-66. doi:https://doi.org/10.1080/20020317.2017.1317229
Mammarella, I. C., Donolato, E., Caviolo, S., & Giofrè, D. (2018). Anxiety profiles and protective factors: A latent profile analysis in children. Personality and Individual Differences, 124, 201-208. doi:https://doi.org/10.1016/j.paid.2017.12.017
Marsh, H. W. (1990). The structure of academic self-concept: The Marsh/Shavelson Model. Journal of Educational Psychology, 82(4), 623-636.
Muthén, B., & Muthén, L. (1998-2019). Mplus user's guide (8th ed.). Los Angeles, CA: Author.