Conference Agenda
Overview and details of the sessions of this conference.
Please note that all times are shown in the time zone of the conference (EEST).
Session Overview
Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor], Capacity: 56
Date: Tuesday, 27/Aug/2024
9:15 - 11:45 | 00 SES 0.5 WS L: MAXQDA and AI on Education Research – A Starter Workshop Session Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor] Session Chair: Francisco Freitas Workshop
Please bring your laptop. Participation is based on pre-registration.
00. Central & EERA Sessions
Research Workshop MAXQDA and AI on Education Research – A Starter Workshop Session MAXQDA - Verbi Software GmbH, Portugal Presenting Author: This workshop is designed for researchers and practitioners who want to learn about computer-assisted qualitative and mixed-methods research. This hands-on session will present the main options available for coding and extracting meaning from data using MAXQDA. Workshop participants will be given the opportunity to test different options, ranging from more traditional approaches to automation using Artificial Intelligence (AI) tools. The workshop will consist of a quick guided practice tour through the opening stages of a research data project. Main features and tasks will be practiced in detail, including importing data, creating and applying codes, performing searches and queries, writing memos, retrieving selected coded data segments, analyzing data, and reporting results using some of the special features available (e.g. summaries, QTT, reports, AI Assist). The main goal of this workshop is to provide an overview of the data analysis process with the assistance of MAXQDA, a state-of-the-art software package for qualitative and mixed-methods research. Upon completing the session, workshop participants will be able to identify the main options available for tackling their qualitative research data. References: Verbi GmbH (2023), MAXQDA User Manual (https://www.maxqda.com/help-mx22/welcome)
13:15 - 14:45 | 09 SES 01 B: Insights into Learning and Assessment Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor] Session Chair: Sarah Howie Paper Session |
09. Assessment, Evaluation, Testing and Measurement
Paper Dangers of the Skip-Button - How Learning Analytics Can Enhance Learning Platforms and Student Learning University of Flensburg, Germany Presenting Author: A high number of European students face challenges in reading (Betthäuser et al., 2023; Mullis et al., 2023) and require reading support, as the most recent Progress in International Reading Literacy Study (PIRLS) showed last year. Consequently, it is crucial to employ formative diagnostics to record the learning levels of students, enabling the design of targeted interventions at an early stage. As a result, the German Kultusministerkonferenz [Commission of the Conference of Ministers of Education] advocates for the early implementation of nationwide diagnostics (Köller et al., 2022). Furthermore, PIRLS has not only been conducted as a computer-based assessment since 2016 but also places a focus on digital forms of reading (Mullis & Martin, 2019). Formative assessment serves not only to monitor the students’ learning but also to provide ongoing feedback, refine teaching approaches, and address the unique needs of individual students (OECD, 2005; OECD, 2008); it is essential for both students and teachers. Additionally, learning analytics, which monitors and aggregates student data, plays a vital role for researchers and developers of educational applications. Researchers seek to understand students' learning behaviors and their utilization of platforms, while developers strive to leverage this information for enhancing learning platforms (SoLAR, 2021). In the project DaF-L, an adaptive, digital, and competence-oriented reading screening with aligned reading packages, consisting of literary texts and reading exercises, was developed, tested, standardized, and subsequently made available as an Open Education Resource (OER) on the online learning platform Levumi (Gebhardt et al., 2016) for primary schools in Germany. The digital reading packages were developed for three reading ability levels, into which the students were sorted through the screening. The packages’ reading ability levels consist of the same story line for the literary texts and the same exercise formats, with some variation depending on the ability level. A key feature of the reading packages is their digitalization. The reading packages support different ability levels and individualized learning, allowing students to work on the exercises at their own speed with integrated tools such as immediate individualized feedback, second-try options, and solutions. Crucially, the reading packages can also be used as a diagnostic tool, which enables teachers to support students in the best way possible. Teachers are required and encouraged to employ diagnostics in order to support their students. However, diagnostics are very time-consuming and difficult for teachers to carry out, as they often do not know which ability test(s) they should administer, how to determine whether the application was helpful, whether students used their full potential to answer questions, where they might need help, and which tools were helpful or unnecessary in an application. Therefore, the DaF-L project provides teachers with the diagnostic tools they need in order to support their students. Furthermore, through the intervention study, the researchers who developed the digital reading packages received essential data.
The gathered data offered insights into the reading packages, allowing for an assessment of their strengths and weaknesses. This information was utilized to make essential adjustments, ensuring the development of the best possible application. Moreover, the research team conducted a four-week intervention in a regular classroom setting, continuously gathering data to gain a deeper understanding of students' learning behaviors and their utilization of the learning platform. The presentation will discuss the digitalization of the reading packages with a focus on learning analytics, with the objective of exploring how learning analytics, such as examining the time dedicated to reading literary texts, the time spent on answering questions, and the use of the skip-button, can improve learning platforms worldwide. Methodology, Methods, Research Instruments or Sources Used The collaborative project follows a multi-method design. An ABA-design was selected for the intervention study (Graham et al., 2012). The study was conducted from April 2023 until July 2023 and collected quantitative data on individuals, groups, and classes. It consisted of a survey group (N = 59) and a control group (N = 53). A) The initial testing consisted of the self-developed digital and competence-oriented reading screening and the ELFE 2, which is an established diagnostic test. B) Approximately two weeks later, in the first lesson, the students took a self-developed digital a-version test tailored to their reading packages, marking the initiation of a four-week intervention phase. The intervention (reading support) occurred three times a week for 30 minutes within a classroom setting. Students were provided with a reading package based on their proficiency levels and worked on it individually. Throughout all intervention sessions, students’ responses as well as any additional information regarding their learning and their platform usage were digitally recorded. At the end of the intervention, students participated in the b-version of the aligned test as well as a second administration of the competence-oriented reading screening and the ELFE 2. A) A follow-up was conducted with the ELFE 2, the screening, and the c-version of the aligned test. Throughout the study, educators were interviewed, and observation protocols were employed. The learning platform Levumi and the digital reading packages underwent adjustments based on insights gleaned from these interviews and observation protocols. However, the most intriguing insights into the students’ learning behavior emerged from the data collected during the intervention (learning analytics). These data encompassed various aspects, including the duration students spent completing the entire reading package, the time allocated to each exercise, the time students devoted to initially reading the literary text, the time spent on rereading the text, whether students attempted exercises a second time, and the frequency of skip-button usage. Moreover, the scrutinized learning behavior could also be compared with the test results, examining whether factors such as the duration spent on reading materials align with test scores.
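To make the learning-analytics indicators described above concrete, the R sketch below shows how such platform log data could, in principle, be aggregated per student and related to test scores. It is purely illustrative; the file names, event labels, and column names (student_id, duration_sec, event, score) are assumptions, not the actual Levumi log format.

library(dplyr)
# Hypothetical event log: one row per logged platform event
logs   <- read.csv("levumi_event_log.csv")
# Hypothetical post-test results: one row per student
scores <- read.csv("post_test_scores.csv")

per_student <- logs %>%
  group_by(student_id) %>%
  summarise(
    total_time_min   = sum(duration_sec) / 60,                       # time on the whole package
    reading_time_min = sum(duration_sec[event == "read_text"]) / 60, # time reading the literary text
    skip_count       = sum(event == "skip_button"),                  # frequency of skip-button use
    second_tries     = sum(event == "second_attempt")                # use of the second-try option
  )

merged <- inner_join(per_student, scores, by = "student_id")
# e.g., does heavier skip-button use go together with lower test scores?
cor(merged$skip_count, merged$score, use = "complete.obs")
cor(merged$reading_time_min, merged$score, use = "complete.obs")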
Conclusions, Expected Outcomes or Findings The insights gained from interviews with educators and the observation protocols played a crucial role in advancing and enhancing the learning platform Levumi and the digital reading packages. While the observation protocols partially revealed intriguing findings, the data, especially concerning the misuse of the skip-button, were a noteworthy revelation. The collected information also shed light on important aspects of students' reading and response behavior, including the time spent reading the text before tackling exercises, the duration devoted to each exercise, utilization of the go-back function for rereading the text, use of the show-solution feature, engagement with the second attempt, and the frequency of skip-button usage. Furthermore, based on the collected data, adjustments were made to the reading packages; others, such as the removal of the skip-button in response to the observed misuse, are planned. Moreover, the learning behavior could also be examined in relation to the test results, for example whether excessive use of the skip-button has a negative effect on the test results or whether a short reading time correlates with the test results. Additionally, the insights gained in the project can be applied to other learning platforms worldwide in order to enhance them. References Betthäuser, B. A., Bach-Mortensen, A. M. & Engzell, P. (2023). A systematic review and meta-analysis of the evidence on learning during the COVID-19 pandemic. Nature Human Behaviour. Advance online publication. Gebhardt, M., Diehl, K. & Mühling, A. (2016). Online Lernverlaufsmessung für alle SchülerInnen in inklusiven Klassen. www.LEVUMI.de. Zeitschrift für Heilpädagogik, 67(10), 444-454. Graham, J. E., Karmarkar, A. M., Ottenbacher, K. J. (2012). Small Sample Research Designs for Evidence-Based Rehabilitation: Issues and Methods. Archives of Physical Medicine and Rehabilitation, 93(8, Supplement), S111-S116. https://doi.org/10.1016/j.apmr.2011.12.017 Köller, O., Thiel, F., van Ackeren, I., Anders, Y., Becker-Mrotzek, M., Cress, U., Diehl, C., Kleickmann, T., Lütje-Klose, B., Prediger, S., Seeber, S., Ziegler, B., Kuper, H., Stanat, P., Maaz, K. & Lewalter, D. (2022). Basale Kompetenzen vermitteln – Bildungschancen sichern. Perspektiven für die Grundschule. Gutachten der Ständigen Wissenschaftlichen Kommission der Kultusministerkonferenz (SWK). SWK: Bonn. Mullis, I. V. S., von Davier, M., Foy, P., Fishbein, B., Reynolds, K. A., & Wry, E. (2023). PIRLS 2021 International Results in Reading. Boston College, TIMSS & PIRLS International Study Center. https://doi.org/10.6017/lse.tpisc.tr2103.kb5342 Mullis, I. V. S., & Martin, M. O. (Eds.). (2019). PIRLS 2021 Assessment Frameworks. Retrieved from Boston College, TIMSS & PIRLS International Study Center website: https://timssandpirls.bc.edu/pirls2021/frameworks/ Organisation for Economic Co-Operation and Development (OECD). (2008). Assessment for Learning – Formative Assessment. https://www.oecd.org/site/educeri21st/40600533.pdf Organisation for Economic Co-Operation and Development (OECD). (2005). Formative Assessment: Improving Learning in Secondary Classrooms. Policy Brief. OECD. https://www.oecd.org/education/ceri/35661078.pdf Society for Learning Analytics Research (SoLAR). (2021). What is Learning Analytics? https://www.solaresearch.org/about/what-is-learning-analytics/ Stanat, P., Schipolowski, S., Schneider, R., Sachse, K. A., Weirich, S. & Henschel, S. (2022). IQB-Bildungstrend 2021: Kompetenzen in den Fächern Deutsch und Mathematik am Ende der 4.
Jahrgangsstufe im dritten Ländervergleich. Waxmann Verlag. 09. Assessment, Evaluation, Testing and Measurement
Paper The Missing Piece in Multi-Informant Assessments? A Systematic Review on Self-Reports of School-Aged Participants with ADHD. 1University of Vienna, Austria; 2North-West University, Research Focus Area Optentia, Vanderbijlpark, South Africa Presenting Author:Recent evidence suggests that multi-informant assessments of children and adolescents with Attention-Deficit/Hyperactivity Disorder (ADHD) and co-occurring problems are more likely to provide sufficient sensitivity and specificity for population screening and clinical use than single measures (De Los Reyes et al., 2015); however, the perspectives of children and adolescents are underrepresented in scientific studies (Caye et al., 2017; for review see Mulraney et al., 2022). The question remains whether children and adolescents are reliable sources of information about their own ADHD symptoms. This may point to the need to investigate the complex interplay between self- and other (i.e. teachers, parents) reported ADHD symptoms (e.g. hyperactivity, inattention) and other externalizing (e.g. aggression) and internalizing (e.g. anxiety) problems. This review aims to systematically analyze and examine existing empirical studies that have focused on the comparison of self-reported and other-reported ADHD symptoms and co-occurring behavior problems in children and adolescents with ADHD. The purpose is to evaluate (1) the overall inclusion of self-reports in the assessment process (2) the agreements between informants (3) which types of informants are frequently used and (4) the instruments utilized. Methodology, Methods, Research Instruments or Sources Used Eligible studies published over the past decade in four major databases (PubMed, ERIC, PsycINFO, Web of Science) and retrieved from educational and psychological peer-reviewed journals through a thorough manual hand-search process were identified. Following PRISMA 2020 (Brennan & Munn, 2021) guidelines for inclusion and exclusion criteria, the study focuses on prospective data collection of school-aged participants to minimize recall bias associated with retrospective data reported in previous studies (von Wirth et al., 2021). Conclusions, Expected Outcomes or Findings Only 11 studies out of 467 selected studies published between 2003 and 2023 that involved a sample of diagnosed school-aged participants met the pre-defined inclusion criteria. Agreements of raters differ by (1) type of other informants (i.e. teachers or parents), (2) methodological procedures, (3) utilized assessment instruments and their psychometric properties, and (4) measured constructs. A variety of screening measures were utilized, with questionnaires predominating over interviews. In addition to teacher reports, parent reports were commonly included, with only one study gathering information from objective measurement methods. The review emphasizes that researchers who include self-reports need to be aware that young participants with ADHD often tend to underreport their behavior problems. Considering the strengths and limitations of the study, implications for practice and future research concerning existing inconsistencies in the conceptualization of externalizing problems are discussed. References Brennan, S. E., & Munn, Z. (2021). PRISMA 2020: A reporting guideline for the next generation of systematic reviews. JBI Evidence Synthesis, 19(5), 906–908. https://doi.org/10.11124/JBIES-21-00112 Caye, A., Machado, J. D., & Rohde, L. A. (2017). 
Evaluating parental disagreement in ADHD diagnosis: Can we rely on a single report from home? Journal of Attention Disorders, 21(7), 561–566. APA PsycInfo. https://doi.org/10.1177/1087054713504134 De Los Reyes, A., Augenstein, T. M., Wang, M., Thomas, S. A., Drabick, D. A. G., Burgers, D. E., & Rabinowitz, J. (2015). The validity of the multi-informant approach to assessing child and adolescent mental health. Psychological Bulletin, 141(4), 858–900. https://doi.org/10.1037/a0038498 Mulraney, M., Arrondo, G., Musullulu, H., Iturmendi-Sabater, I., Cortese, S., Westwood, S. J., Donno, F., Banaschewski, T., Simonoff, E., Zuddas, A., Döpfner, M., Hinshaw, S. P., & Coghill, D. (2022). Systematic Review and Meta-analysis: Screening Tools for Attention-Deficit/Hyperactivity Disorder in Children and Adolescents. Journal of the American Academy of Child & Adolescent Psychiatry, 61(8), 982–996. https://doi.org/10.1016/j.jaac.2021.11.031 von Wirth, E., Mandler, J., Breuer, D., & Döpfner, M. (2021). The Accuracy of Retrospective Recall of Childhood ADHD: Results from a Longitudinal Study. Journal of Psychopathology & Behavioral Assessment, 43(2), 413–426. 09. Assessment, Evaluation, Testing and Measurement
Paper The Relationship Between Students’ Response Times and Their Socioeconomic Status in European Countries: A Case of Achievement Motivation Questionnaire Items The Anchoring Center for Educational Research, Faculty of Education, Charles University, Czech Republic Presenting Author:Student questionnaire data, typically collected via Likert-scale items, is commonly used to compare different groups of students, be it across countries or based on student characteristics such as gender and socioeconomic status (e.g., OECD, 2017). However, such analyses can lead to inaccurate conclusions as the data might be biased due to the differences in reporting behavior between different groups of students (e.g., He & van de Vijver, 2016; Kyllonen & Bertling, 2013). Students can, for example, differ in the amount of effort they put into filling in the questionnaire.
Our theoretical framework relates to reporting behavior in surveys. The terms "careless responding" and "insufficient effort responding" have been used to describe response patterns in which respondents lack the motivation to answer accurately and do not pay attention to the content of items and survey instructions (Goldammer et al., 2020). A number of approaches have been suggested to identify such careless responding, the analysis of response time (to the whole survey or parts of it) being one of them (Curran, 2016; Goldammer et al., 2020). This approach rests on the assumption that there is a minimum time needed to read and answer a questionnaire item (Goldammer et al., 2020). The term "speeding" has been used for responding to questionnaire items too fast to give much thought to the answers (Zhang & Conrad, 2014).
The analysis of response times is a promising tool for identification of the differences in the amount of effort put into filling in questionnaire surveys between different groups of students. It could help identify careless responding (a) between different groups of students during a single wave of measurement as well as (b) changes in careless responding of particular groups of students across different waves of measurement. This could be exploited, for example, in longitudinal research studies using questionnaires (e.g., [foreign language] learning motivation studies) as well as international large-scale assessment (ILSA) studies such as, for example, Programme for International Student Assessment (PISA). So far, however, the knowledge concerning the differences in questionnaire item response times between different groups of students in the context of ILSA studies is rather limited.
Previous research has suggested that students’ reporting behavior may differ across different socioeconomic groups (Vonkova et al., 2017), encouraging further exploration of the reporting behavior-socioeconomic status relationship. In this contribution, we address this research area. Our aim is to analyze the relationship between students’ response times to achievement motivation questionnaire items and their socioeconomic status in European PISA 2015 participating countries. Our research question is: How do questionnaire response times to achievement motivation items differ between students with parents achieving different education levels in European PISA 2015 participating countries? Methodology, Methods, Research Instruments or Sources Used We analyze data from the PISA 2015 questionnaire, focusing on 171,762 respondents from 29 European countries who were administered the questionnaire via computer. Specifically, we look at the response time to question ST119 (Achievement motivation), and we use the highest achieved education by parents (PISA variable HISCED) as an indicator for the socioeconomic status of students. Only respondents, who had complete information on all analysed variables were included in this analysis. In the question ST119, respondents were asked to answer five statements on achievement motivation using responses Strongly disagree (1), Disagree (2), Agree (3), and Strongly agree (4). The five statements were: (1) I want top grades in most or all of my courses, (2) I want to be able to select from among the best opportunities available when I graduate, (3) I want to be the best, whatever I do, (4) I see myself as an ambitious person, and (5) I want to be one of the best students in my class (OECD, 2014). Response times were taken from the response time dataset for PISA 2015, specifically the variable ST119_TT. They were logged for each screen (in this case screen containing five items relating to achievement motivation) and they were logged in milliseconds. For the purposes of our analysis, we have set an upper limit of two minutes for students to be included in the analysis. That is because a vast majority of respondents were able to respond in this time interval and only 406 respondents took longer. These were typically respondents who took extremely long (one even nearing an hour spent on the screen) and as such would negatively impact the analysis through not displaying standard response behavior. Information on parental education levels (HISCED) was extracted from the PISA 2015 dataset which uses the ISCED (International Standard Classification of Education) 1997 classification. HISCED categories ranged from 0 to 6, representing various levels of educational attainment. HISCED0 represents unfinished ISCED 1 level, HISCED1 and HISCED2 represent ISCED 1 and 2 levels, respectively, HISCED3 represents ISCED 3B and 3C, HISCED4 represents ISCED 3A and 4, HISCED5 represents ISCED 5B, and HISCED6 represents ISCED 5A and 6. Due to the low number of observations, we combined HISCED 0-2 categories for the purposes of our analysis. Conclusions, Expected Outcomes or Findings Our initial analysis showed a notable inverse relationship between mean response times to question ST119 and HISCED for European countries. Specifically, respondents from families with a lower educational background took longer when answering. 
This is further highlighted when looking both at median times and results of linear regression with country fixed effects (time being the explained variable and HISCED levels and country dummies being the explanatory variables), both of which display the same trend. However, when examining the variation in each HISCED group, data showed that HISCED0-2 group had fairly higher variation of response time than all other HISCED groups, the lowest being in the HISCED5 group. This suggests that there is a greater heterogeneity in response time within the HISCED0-2 group. This indicates that this group consists of a mix of respondents with low and high response times to the question ST119. Our results show that it is necessary to take response times into consideration when comparing groups of respondents, as they can potentially affect the analysis. Further research may be focused on the relationship of response times and home possessions or other indicators of socioeconomic status. Additionally, further research may also analyze other world regions and compare them with the European results. References Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4–19. https://doi.org/10.1016/j.jesp.2015.07.006 Goldammer, P., Annen,H., Stöckli, P.L., & Jonas, K.(2020). Careless responding in questionnaire measures: Detection, impact, and remedies. The Leadership Quarterly, 31(4), Article 101384. https://doi.org/10.1016/j.leaqua.2020.101384 He, J., & Van de Vijver, F. J. R. (2016). The motivation-achievement paradox in international educational achievement tests: Toward a better understanding. In R. B. King & A. B. I. Bernardo (Eds.), The psychology of Asian learners: A festschrift in honor of David Watkins (pp. 253–268). Springer Science. https://doi.org/10.1007/978-981-287-576-1 Kyllonen, P. C., & Bertling, J. (2013). Innovative questionnaire assessment methods to increase cross-country comparability. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), A handbook of international large-scale assessment data analysis: Background, technical issues, and methods of data analysis (pp. 277–285). Chapman Hall/CRC Press. OECD. (2014). PISA 2015 student questionnaire (computer-based version). https://www.oecd.org/pisa/data/CY6_QST_MS_STQ_CBA_Final.pdf OECD. (2017). PISA 2015 results (volume III): Students' well-being. https://doi.org/10.1787/9789264273856-en Vonkova, H., Bendl, S., & Papajoanu, O. (2017). How students report dishonest behavior in school: Self-assessment and anchoring vignettes. The Journal of Experimental Education, 85(1), 36-53. https://doi.org/10.1080/00220973.2015.1094438 Zhang, C., & Conrad, F. (2014). Speeding in web surveys: The tendency to answer very fast and its association with straightlining. Survey Research Methods, 8(2), 127–135. https://doi.org/10.18148/srm/2014.v8i2.5453 |
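As a rough illustration of the screening and modelling steps described in this contribution, the R sketch below filters out response times above two minutes, collapses the lowest HISCED categories, and fits a linear regression of response time on parental education with country fixed effects. It is a sketch only; the input file and variable names (st119_tt, hisced, cnt) are assumptions standing in for the actual PISA 2015 data files.

# Hypothetical extract of the PISA 2015 response-time and background data
d <- read.csv("pisa2015_st119_times.csv")

# Keep only respondents who answered the ST119 screen within two minutes (times in ms)
d <- subset(d, st119_tt <= 120000)

# Collapse HISCED 0-2 into a single category, as described in the methodology
d$hisced_grp <- ifelse(d$hisced <= 2, "HISCED0-2", paste0("HISCED", d$hisced))

# Median response time per parental-education group
aggregate(st119_tt ~ hisced_grp, data = d, FUN = median)

# Linear regression with country fixed effects:
# response time explained by HISCED group and country dummies
fit <- lm(st119_tt ~ factor(hisced_grp) + factor(cnt), data = d)
summary(fit)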
17:15 - 18:45 | 09 SES 03 B: Challenges in Educational Measurement Practices Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor] Session Chair: Elena Papanastasiou Paper Session |
09. Assessment, Evaluation, Testing and Measurement
Paper A Peculiarity in Educational Measurement Practices University of Oslo, Norway Presenting Author: This paper discusses a peculiarity in institutionalized educational measurement practices, namely an inherent contradiction between guidelines for how scales/tests are developed and how those scales/tests are typically analyzed. Standard guidelines for developing scales/tests emphasize the need to identify the intended construct and select items to capture the construct’s full breadth, leading items (or subsets of items) to target different aspects of the construct. This occurs in test development through specifying the test’s content domain along with a blueprint allocating items to content domains, item formats, and/or cognitive demand levels (AERA, APA, & NCME, 2014, ch. 4). Similarly, scale development guidelines emphasize identifying sub-facets of constructs, such that items can be targeted to capture each sub-facet, ensuring that the full construct is measured (e.g., Gehlbach & Brinkworth, 2011; Steger et al., 2022). These guidelines intentionally ensure that items (or subsets of items) contain construct-relevant variation that is not contained in every other item (e.g., it is recommended to include geometry-related items when measuring math ability because such items capture construct-relevant variation in math ability that is not present in, e.g., algebra-related items; cf. Stadler et al., 2021). At the same time, scales/tests are typically analyzed with reflective measurement models (Fried, 2020). I focus on factor models for simplicity, but the same basic point applies to item-response theory models, as a reparameterization of item-response theory models to non-linear factor models would show (McDonald, 2013). In the unidimensional factor model, the item response X_ip is modelled as X_ip = (alpha_i + lambda_i*F_p) + e_ip, where i indexes items, p indexes persons, alpha_i is an item intercept, lambda_i is a factor loading, F_p is the latent factor (the construct), and e_ip is the item- and person-specific error. The (alpha_i + lambda_i*F_p) term can be understood as an item-specific linear rescaling of the latent factor (which is on an arbitrary scale) to the item’s scale, just as one might rescale a test to obtain more interpretable scores. The factor model, then, consists of two parts, the rescaled factor and the error term. Since each item is defined as containing a rescaling of the factor, and this is the only construct-relevant variation contained in items, each item must contain all construct-related variation (i.e., all changes in the construct are reflected in each item). Note that these points are conceptual, stemming from the mathematics of the factor model, not claims about the results of fitting models to specific data. There is a contradiction here: scales/tests are intentionally designed so that each item (or subset of items) captures unique, construct-related variation, but analyses are conducted under the assumption that no item (or subset of items) contains unique, construct-related variation. To have such a clear contradiction baked into the institutionalized practices of measurement in the educational and social sciences is peculiar indeed.
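For reference, the unidimensional factor model described above can be written out as a display equation (a restatement of the in-text formula; following standard factor-analytic convention, the error term is indexed by both item and person):

X_{ip} = \underbrace{\alpha_i + \lambda_i F_p}_{\text{item-specific rescaling of the factor}} + \underbrace{e_{ip}}_{\text{error term}}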
Methodology, Methods, Research Instruments or Sources Used This is a discussion paper, so there are no true methods per se. The analyses are based on a careful study of institutionalized guidelines for how to construct tests and survey scales and of the typical approaches for analyzing data from tests and survey scales. The presentation will focus on reviewing direct quotes from these guidelines in order to build the case that there is an inbuilt contradiction baked into current “best practices” in measuring in the educational sciences. I will then present a logical analysis of the implications of this contradiction. Drawing on past and recent critiques of reflective modeling, I will propose that this contradiction persists because reflective models provide a clear and direct set of steps to support a set of epistemological claims about measuring the intended construct reliably and invariantly. I will then argue that, given the contradiction, these epistemological claims are not strongly supported through appeal to reflective modelling approaches. Rather, this contradiction leads to breakdowns in scientific practice (White & Stovner, 2023). Conclusions, Expected Outcomes or Findings The reflective measurement models that are used to evaluate the quality of educational measurement are built using a set of assumptions that contradict those used to build tests and scales. This peculiarity leaves the field evaluating the quality of measurement using models that, by design, do not fit the data to which they are applied. This raises important questions about the accuracy of claims that one has measured a specific construct, that measurement is reliable, and/or that measurement is or is not invariant. There is a need for measurement practices to shift to create alignment between the ways that tests/scales are created and how they are analyzed. I will discuss new modelling approaches that would facilitate this alignment (e.g., Henseler et al., 2014; Schuberth, 2021). However, questions of construct validity, reliability, and invariant measurement become more difficult when moving away from the reflective measurement paradigm. References American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for Educational and Psychological Testing. American Educational Research Association. http://www.apa.org/science/programs/testing/standards.aspx Fried, E. I. (2020). Theories and Models: What They Are, What They Are for, and What They Are About. Psychological Inquiry, 31(4), 336–344. https://doi.org/10.1080/1047840X.2020.1854011 Gehlbach, H., & Brinkworth, M. E. (2011). Measure Twice, Cut down Error: A Process for Enhancing the Validity of Survey Scales. Review of General Psychology, 15(4), 380–387. https://doi.org/10.1037/a0025704 Henseler, J., Dijkstra, T. K., Sarstedt, M., Ringle, C. M., Diamantopoulos, A., Straub, D. W., Ketchen, D. J., Hair, J. F., Hult, G. T. M., & Calantone, R. J. (2014). Common Beliefs and Reality About PLS: Comments on Rönkkö and Evermann (2013). Organizational Research Methods, 17(2), 182–209. https://doi.org/10.1177/1094428114526928 Maraun, M. D. (1996). The Claims of Factor Analysis. Multivariate Behavioral Research, 31(4), 673–689. https://doi.org/10.1207/s15327906mbr3104_20 McDonald, R. P. (2013). Test Theory: A Unified Treatment. Psychology Press. Schuberth, F. (2021). The Henseler-Ogasawara specification of composites in structural equation modeling: A tutorial. Psychological Methods, 28(4), 843–859. https://doi.org/10.1037/met0000432 Stadler, M., Sailer, M., & Fischer, F. (2021). Knowledge as a formative construct: A good alpha is not always better. New Ideas in Psychology, 60, 1-14. https://doi.org/hqcg Steger, D., Jankowsky, K., Schroeders, U., & Wilhelm, O. (2022).
The Road to Hell Is Paved With Good Intentions: How Common Practices in Scale Construction Hurt Validity. Assessment, 1-14. https://doi.org/10.1177/10731911221124846 White, M., & Stovner, R. B. (2023). Breakdowns in Scientific Practices: How and Why Practices Can Lead to Less than Rational Conclusions (and Proposed Solutions). OSF Preprints. https://doi.org/10.31219/osf.io/w7e8q 09. Assessment, Evaluation, Testing and Measurement
Paper Exploring Mode-of-Delivery Effect in Reading Achievement in Sweden: A study using PIRLS 2021 data Göteborgs Universitet, Sweden Presenting Author: Reading literacy is considered an essential factor for learning and personal development (Mullis & Martin, 2015). International assessments like PIRLS track trends and shape literacy policies. They seek to evaluate global student learning, offering crucial insights into educational performance to inform policy decisions. Given the ongoing technological expansion and innovation, a shift in delivery mode became an inevitable progression (Jerrim, 2018). PIRLS has adapted to these changes, introducing a digital format in 2016 (ePIRLS) and achieving a significant milestone in 2021 with the partial transition to a digital assessment through a web-based digital delivery system. Digital PIRLS included a variety of reading texts presented in an engaging and visually attractive format, designed to motivate students to read and interact with the texts and answer comprehension questions. While considerable effort has been invested to ensure content similarity between the two formats, variations persist due to the distinct modes of administration (Almaskut et al., 2023). This creates the need for further analysis and exploration to better understand the impact of these differences on the overall outcomes and effectiveness of the administered modes. Previous research has highlighted the presence of a mode effect, varying in magnitude, when comparing paper-based and digital assessments (Jerrim, 2018; Kingston, 2009). Jerrim’s (2018) analysis of PISA 2015 field trial data across Germany, Ireland, and Sweden indicates a consistent trend of students scoring lower in digital assessments compared to their counterparts assessed on paper. Furthermore, Kingston’s meta-analysis (2009) indicates that, on average, elementary students score higher on paper and exhibit small effect sizes when transitioning from paper-based to digital reading assessments. On the other hand, PIRLS 2016 was administered both on paper and digitally in 14 countries; students in nine countries performed better in the digital assessments, while in only five countries did students perform better on paper (Grammatikopoulou et al., 2024). Additionally, research underscores the distinct consequences of printed and digital text on memory, concentration, and comprehension (Delgado et al., 2018; Baron, 2021). Furthermore, previous findings support the fact that there is variation in the factors influencing performance in these two modes. Time spent on internet and computer use for school was found to be a significant predictor of performance in digital assessments, but not in paper-based ones (Gilleece & Eivers, 2018). The present study: Sweden was among the 26 countries out of 57 that administered the digital format in PIRLS 2021. Another paper-based test – replicated from PIRLS 2016 – was also administered to a ‘bridge’ sample. To maintain consistency across formats, both digital PIRLS and paper PIRLS share identical content in terms of reading passages and questions. However, digital PIRLS utilizes certain features and item types that are not accessible in the traditional paper-and-pencil mode. The digital version showcased advantages such as operational efficiency and enhanced features, while maintaining content consistency with the paper format.
The primary aim of the present study is to investigate a potential mode effect between the digital and paper formats, if any, and to explore any variations in reading achievement between the two formats. Despite advancements in digital assessment, there remains a gap in our understanding of how the shift from traditional paper-based assessments to digital formats may impact reading literacy outcomes. By delving into these potential differences, we aim to contribute valuable insights into the evolving landscape of educational assessments, informing educators, policymakers, and researchers about the effectiveness and potential challenges associated with the integration of digital modes in literacy evaluation. Methodology, Methods, Research Instruments or Sources Used The present study uses PIRLS 2021 data for Sweden. Sweden participated in digital PIRLS 2021 with 5175 students. A bridge sample, separate and equivalent, was administered on paper to 1863 students (Almaskut et al., 2023). The study aims to explore the potential mode effect in both paper-based and digital assessments, utilising item data from digital PIRLS and paper PIRLS. To assess and compare digital PIRLS and paper PIRLS as measures, we will employ a bifactor structural equation model, with a general reading achievement factor and specific factors representing the digital and paper formats. Constructing a bifactor model involves specifying key components to capture the nuances of reading achievement in both digital and paper formats. In this framework, a general reading achievement factor is introduced alongside specific factors representing the unique aspects of the digital and paper assessment modes. Notably, PIRLS categorizes reading into two broad purposes: reading for literary experience and reading to acquire and use information. Building upon this categorization, we will construct two variables based on the stated purposes of reading: the literary and the informational. We will explore how these variables contribute to reading achievement and whether there are variations in reading achievement between the digital and paper formats. The model will incorporate paths from ‘Literary’ and ‘Information’ to both the general factor and the specific factors. These paths facilitate the examination of how each observed variable influences overall reading achievement and its specific manifestations in the digital and paper contexts. Additionally, observed indicators for each variable are included, ensuring a comprehensive representation of the constructs in the bifactor model. Furthermore, the analysis will control for socio-economic status (SES), immigrant background, and gender while exploring mode effects or bias in either mode. Conclusions, Expected Outcomes or Findings The study will employ a bifactor model in the context of PIRLS 2021 data for Sweden to elucidate the multifaceted construct of reading literacy/achievement and potential mode effects between digital and paper formats. While the empirical results are pending, we anticipate several key outcomes. We expect to observe variations in the relationships between our latent constructs and observed indicators based on the mode of assessment. Based on previous findings, we tentatively expect to discern the presence of both general and specific factors, indicating that there are unique aspects associated with digital and paper reading processes that significantly impact reading achievement beyond the shared aspects captured by the general factor.
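As a purely illustrative sketch of the bifactor specification described in the Methodology above, the model could be expressed along these lines in the R package lavaan. The indicator names (dig_lit1, ..., pap_inf2) and the data frame pirls_swe are hypothetical placeholders, not actual PIRLS variables, and the real analysis would additionally need to handle plausible values, sampling weights, and the covariates (SES, immigrant background, gender).

library(lavaan)

bifactor_model <- '
  # general reading achievement factor loading on all indicators
  g =~ dig_lit1 + dig_lit2 + dig_inf1 + dig_inf2 + pap_lit1 + pap_lit2 + pap_inf1 + pap_inf2
  # specific (mode) factors for the digital and paper formats
  digital =~ dig_lit1 + dig_lit2 + dig_inf1 + dig_inf2
  paper   =~ pap_lit1 + pap_lit2 + pap_inf1 + pap_inf2
  # bifactor convention: specific factors orthogonal to the general factor and to each other
  g ~~ 0*digital
  g ~~ 0*paper
  digital ~~ 0*paper
'

fit <- cfa(bifactor_model, data = pirls_swe, std.lv = TRUE)  # pirls_swe is a hypothetical data frame
summary(fit, fit.measures = TRUE, standardized = TRUE)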
Our expectation is grounded in the understanding that different areas and processes of reading may exhibit varied patterns. For instance, we speculate that while informational reading might predominantly contribute to the general reading achievement factor, fictional or longer text reading may exhibit specific factors. This differentiation in our analysis aims to provide a more nuanced understanding of the complex relationships within the reading achievement construct, considering the diverse aspects of reading activities and processes associated with digital and paper formats. The complexities showed in our analyses may prompt inquiries into additional contextual factors, the stability of mode effects across different populations, and the longitudinal impact on reading outcomes. In conclusion, our study's expected outcomes encompass a comprehensive exploration of mode effects, the unique contributions of latent factors, the significance of specific indicators, implications for educational practice, and the identification of future research directions. References Almaskut, A., LaRoche, S., & Foy, P. (2023). Sample Design in PIRLS 2021. TIMSS & PIRLS International Study Center. https://doi.org/10.6017/lse.tpisc.tr2103.kb9560 Baron, N. S. (2021). Know what? How digital technologies undermine learning and remembering. Journal of Pragmatics, 175, 27–37. https://doi.org/10.1016/j.pragma.2021.01.011 Cheung, K., Mak, S., & Sit, P. (2013). Online Reading Activities and ICT Use as Mediating Variables in Explaining the Gender Difference in Digital Reading Literacy: Comparing Hong Kong and Korea. The Asia-Pacific Education Researcher, 22(4), 709–720. https://doi.org/10.1007/s40299-013-0077-x Cho, B.-Y., Hwang, H., & Jang, B. G. (2021). Predicting fourth grade digital reading comprehension: A secondary data analysis of (e)PIRLS 2016. International Journal of Educational Research, 105, 101696. https://doi.org/10.1016/j.ijer.2020.101696 Delgado, P., Vargas, C., Ackerman, R., & Salmerón, L. (2018). Don’t throw away your printed books: A meta-analysis on the effects of reading media on reading comprehension. Educational Research Review, 25, 23–38. https://doi.org/10.1016/j.edurev.2018.09.003 Gilleece, L., & Eivers, E. (2018). Characteristics associated with paper-based and online reading in Ireland: Findings from PIRLS and ePIRLS 2016. International Journal of Educational Research, 91, 16–27. https://doi.org/10.1016/j.ijer.2018.07.004 Grammatikopoulou, E., Johansson, S., & Rosén, M., (2024). Paper-based and Digital Reading in 14 countries: Exploring cross-country variation in mode effects. Unpublished manuscript. Jerrim, J., Micklewright, J., Heine, J.-H., Salzer, C., & McKeown, C. (2018). PISA 2015: How big is the ‘mode effect’ and what has been done about it? Oxford Review of Education, 44(4), 476–493. https://doi.org/10.1080/03054985.2018.1430025 Kingston, N. M. (2008). Comparability of Computer- and Paper-Administered Multiple-Choice Tests for K–12 Populations: A Synthesis. Applied Measurement in Education, 22(1), 22–37. https://doi.org/10.1080/08957340802558326 Krull, J. L., & MacKinnon, D. P. (2001). Multilevel Modeling of Individual and Group Level Mediated Effects. Multivariate Behavioral Research, 36(2), 249–277. https://doi.org/10.1207/S15327906MBR3602_06 Mullis, I. V. S., & Martin, M. O. (Eds.). (2015). PIRLS 2016 Assessment Framework (2nd ed.). 
Retrieved from Boston College, TIMSS & PIRLS International Study Center website: http://timssandpirls.bc.edu/pirls2016/framework.html Rasmusson, M., & Åberg-Bengtsson, L. (2015). Does Performance in Digital Reading Relate to Computer Game Playing? A Study of Factor Structure and Gender Patterns in 15-Year-Olds’ Reading Literacy Performance. Scandinavian Journal of Educational Research, 59(6), 691–709. https://doi.org/10.1080/00313831.2014.965795 09. Assessment, Evaluation, Testing and Measurement
Paper What’s the Effect of Person Nonresponse in PISA and ICCS? The Swedish National Agency for Education Presenting Author: International Large-Scale Assessments (ILSAs), such as PISA and ICCS, provide internationally comparative data on students' knowledge and abilities in various subjects. The results across assessments permit countries to make comparisons of their educational systems over time and in a global context. To make this possible, the implementation and the methodology on which the studies are based need to be rigorously standardized and of high quality. But even in a well-designed study, missing data almost always occurs. Missing data can reduce the statistical power of a study and can produce biased estimates, leading to invalid conclusions. The mechanisms by which missing data occurs are many. Such mechanisms emerge, for example, in studies based on low-stakes tests (which ILSAs should be considered to be). In low-stakes tests, neither the students nor their teachers receive any feedback based on the students' results. Besides risking reduced validity of results from comparisons, both over time and between countries, low-stakes tests run the risk of giving rise to a greater proportion of missing data. Sweden has a long tradition of high-quality population administrative register data, and this tradition has resulted in a great deal of data being linked to individuals via so-called social security numbers. It is relatively common for researchers and authorities to employ these high-quality data in their analyses, yielding more reliable results. The Swedish National Agency for Education regularly uses register data when producing the official statistics and, to a certain extent, also when carrying out evaluation studies. The ILSAs are used to evaluate the condition of the Swedish schooling system, both by the Swedish National Agency for Education and by decision-makers and other stakeholders. To further the possibilities of secondary analyses and to increase relevance to the national context, it is therefore pertinent to collate data from registers with data from the ILSAs. Historically, the Swedish National Agency for Education has only been able to link register data to ILSA data for participating students. This is because the participating students are considered as having given their consent for such linkages. However, before conducting PISA 2022 and ICCS 2022, the legal requirements changed so that it became possible for the Swedish National Agency for Education to link register data also to nonresponding students, i.e. not only to the participating students. Methodology, Methods, Research Instruments or Sources Used The Swedish samples in PISA 2022 and ICCS 2022 consist of 7,732 15-year-olds and 3,900 students in grade 8, respectively. After excluding students due to cognitive or physical impairment or due to insufficient skills in the Swedish language, 7,203 students in PISA 2022 and 3,632 in ICCS 2022 remain. Of these, the weighted student nonresponse is 15 percent and 13 percent in PISA and ICCS, respectively. By employing register data on the full sample, such as the students' final grades in primary school, migration background, and the parents' level of education, we have studied the covariation of student nonresponse and student background characteristics (Swedish National Agency for Education, 2023a; Swedish National Agency for Education, 2023b).
Furthermore, we have carried out post-stratification-type analyses (Little & Rubin, 2020) to estimate the effect of nonresponse on students’ achievement. Finally, we compared students’ achievement computed with the rather non-informative nonparticipation-adjusted weights of PISA and ICCS with students’ achievement computed with nonparticipation weights adjusted with register data (OECD, 2023; IEA, 2023). Conclusions, Expected Outcomes or Findings The results indicate that student nonresponse leads to a bias that contributes to a certain overestimation of the students' average results. Hence, Sweden's results seem to be too high given which students participated and which did not. However, the overestimation differs between the two studies. In PISA the bias seems to be larger than in ICCS, and where the bias seems to lead to a significant overestimation of the students’ results in PISA, the bias in ICCS seems to have a non-significant effect on the students’ results. Furthermore, we find that regardless of whether we study PISA or ICCS, two studies that differ methodologically in several respects but are similar in the way they compensate for any person nonresponse bias, the effect of the missing-data-compensating elements on students’ achievement is negligible. The results of this study, in terms of how the missingness leads to an overestimation of students' average results in ILSAs, are consistent with previously published studies, both in relation to ILSAs (Micklewright et al., 2012; Meinck et al., 2023) and in relation to sample surveys in general (Groves & Peytcheva, 2008; Brick & Tourangeau, 2017). However, more work is needed, as we do not know the relationship between the proportion of missingness and the size of its bias, and we do not know whether this relationship changes over time or how it might differ in an international comparison. Furthermore, when compensating for missing data, our results lead to the questioning of how reasonable it is to make the assumption of missing completely at random (MCAR), an assumption that is commonly made in ILSAs given a sampled school or class. References Groves & Peytcheva. (2008). The Impact of Nonresponse Rates on Nonresponse Bias: A Meta-Analysis. IEA. (2023). ICCS 2022 Technical Report. Meinck et al. (2023). Bias risks in ILSA related to non-participation: evidence from a longitudinal large-scale survey in Germany (PISA Plus). Micklewright et al. (2012). Non-response biases in surveys of schoolchildren: the case of the English Programme for International Student Assessment (PISA) samples. OECD. (2023). PISA 2022 Technical Report. Little, R. J. A., & Rubin, D. B. (2020). Statistical Analysis with Missing Data (3rd ed.). Swedish National Agency for Education (2023a). ICCS 2022 metodbilaga. Swedish National Agency for Education (2023b). PISA 2022 metodbilaga. |
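The post-stratification-type adjustment described in this contribution can be illustrated with a minimal R sketch: respondent weights are rescaled within register-defined strata (here, parental education) so that the weighted respondent distribution matches the full eligible sample. The file and variable names (w0, hisced, participated, score) are assumptions for illustration, not the agency's actual data layout.

# Hypothetical file: one row per sampled student, with register variables,
# a design weight (w0), a participation indicator, and achievement for respondents
full <- read.csv("sample_with_register.csv")
resp <- subset(full, participated == 1)

# Stratum shares in the full eligible sample vs. among respondents (weighted)
target   <- tapply(full$w0, full$hisced, sum) / sum(full$w0)
achieved <- tapply(resp$w0, resp$hisced, sum) / sum(resp$w0)

# Post-stratification factor per stratum, applied to respondent weights
adj        <- target / achieved
resp$w_adj <- resp$w0 * adj[as.character(resp$hisced)]

# Compare the weighted mean achievement under the original and adjusted weights
weighted.mean(resp$score, resp$w0)
weighted.mean(resp$score, resp$w_adj)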
Date: Wednesday, 28/Aug/2024
9:30 - 11:00 | 09 SES 04 B: Exploring Educational Dynamics and Academic Performance Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor] Session Chair: Gasper Cankar Paper Session |
09. Assessment, Evaluation, Testing and Measurement
Paper TIMSS Repeat in Flanders: a longitudinal follow-up to TIMSS 2019 University of Antwerp, Belgium Presenting Author:Flanders has a history of participating in International Large-Scale Assessment studies (ILSAs) like TIMSS, where it has often ranked highly. However, the last cycles of TIMSS have shown a gradual decline in the academic achievement of Flemish students. This has sparked a debate about the quality of education in Flanders. Between the TIMSS cycles of 2015 and 2019, Flemish students' achievement levels decreased by 14 points for mathematics and 11 points for science (Faddar et al., 2020). Although ILSAs are crucial tools for policymakers to assess the quality of educational systems, their primary purpose is periodic benchmarking (Addey and Sellar, 2019). However, the decline found among Flemish students has prompted a deeper investigation and monitoring of the evolution of Flemish learning gains throughout the remaining two years of primary schooling, which goes beyond benchmarking. To this end, a longitudinal study based on TIMSS-2019 was set up in Flanders: TIMSS-repeat. Using a longitudinal design, TIMSS-repeat retested students who participated in the TIMSS 2019 cycle in 2021, when most of the students were in the sixth grade of primary education. In total, 4.301 students, their teachers, and their school principals participated in TIMSS repeat. The main purpose of TIMSS-repeat was to investigate the learning gains of Flemish students during the last two years of primary school, allowing an inquiry into the connection between students' background characteristics and their learning gains for mathematics and science. Moreover, the specific timing of the data collection in May 2021, just after the school closures and quarantines due to COVID-19, allowed for additional information regarding the impact of COVID-19 to be collected. This enabled the investigation of COVID-19's impact on the learning gains in mathematics and science. The following research questions were central:
The first and second questions aim to analyze how students in Flanders progress through the last years of primary school. With these questions, we aim to reveal how students’ learning gains increase and whether specific background characteristics facilitate or hamper student learning gains. In previous TIMSS studies, it was found that home language or students’ socioeconomic status is linked to their achievement (Faddar et al., 2020; IEA, 2020). The third question seeks to provide valuable information to both researchers and policymakers regarding the impact of the COVID-19 pandemic on student learning and achievement. Not only does TIMSS-repeat in Flanders provide answers to these research questions, but it also aligns with the research goals of the TIMSS longitudinal study that is following the TIMSS 2023 cycle (The International Association for the Evaluation of Educational Achievement (IEA), 2022). TIMSS-repeat in Flanders provided a valuable but tentative insight into Flemish learning gains during the final grades of primary education in Flanders, characterized by one of the most impactful global events of our time. In this presentation, we will discuss the different steps taken to conduct the TIMSS-repeat study in Flanders and present our most important findings. Methodology, Methods, Research Instruments or Sources Used The research presented here utilizes data from the TIMSS 2019 cycle collected in May 2019 (T1) and a repeated measurement after two years in May 2021 (T2), based on the same sample of schools and students. For T2 91.9% of the schools from TIMSS 2019 agreed to participate, resulting in a sample of 4301 students nested in 133 schools. Rigorous checks were conducted for selection bias in comparison to the T1 sample, including factors from both the school and the student level such as school performance, educational network, geographical location, gender, and socioeconomic status. Both T1 and T2 samples are comparable, revealing no significant selection bias, and this is on both the school and the student level. To ensure the reliability of the data, several precautions were taken. To avoid a modus effect (Martin et al., 2020), paper-based achievement tests were administered for both T1 and T2. Additionally, to minimize the likelihood of a ceiling effect, adjustments were made to the test materials: easier items were excluded and more difficult mathematics and science items were included from the Flemish national assessment tests conducted in 2015, 2016, and 2021. In the selection of these new items, we maintained a distribution that aligns with the content domains (measurement & geometry, numbers and data for mathematics; life, physical, and earth for science) and cognitive domains (knowing, applying, and reasoning) (Martin et al., 2020). To allow for a precise description of the learning gains, the test items of T1 and T2 were calibrated (Scharfen et al., 2018). Finally, to avoid a retest effect individual students were administered different test items compared to the 2019 test. To grasp the impact of COVID-19, specific scales were added to the background questionnaires for the students, teachers, and school leaders. All new instruments were found to be reliable and valid. The analysis began by calculating weights, jackknife estimates, and plausible values for students’ mathematics and science achievement (Martin et al., 2020). 
The R package “EdSurvey” was used for all analyses (Bailey et al., 2020); specifically, the “mixed.sdf” function was used to estimate mixed-effects models mapping differential effects of student characteristics on achievement. The analysis used a scale ranging from 0 to 1000 points. Conclusions, Expected Outcomes or Findings Looking at the first research question, Flemish pupils demonstrated achievement gains in both mathematics and science over the last two years of primary education, with an increase of 117 points in mathematics and 107 points in science. In terms of cognitive domains, Flemish students exhibited the most significant improvements in the Applying domain for both mathematics and science, aligning with Faddar's hypothesis regarding the emphasis on higher cognitive skills in later years of primary education (Faddar et al., 2020). Answering the second research question, we found that boys obtained slightly higher learning gains than girls, with an increase of 120 points in mathematics and 113 points in science, compared to 116 and 109 points, respectively. For home language, noteworthy results were found: students who never spoke the language of the test at home demonstrated the most substantial achievement gains in both mathematics (137 points) and science (134 points). Additionally, students with a room for themselves and access to a significant number of books at home experienced the highest achievement gains in both subjects. When answering the first and second research questions, caution is advised: while we found learning gains, empirical evidence against which to compare the size of these learning gains is lacking. Potential benchmarks such as Bloom et al. (2008), Martin et al. (1997), or Mullis et al. (1997) are based on empirical data, but may not be as pertinent due to their age and dissimilar contexts. Finally, the descriptive data on how schools, teachers, and students adapted to COVID-19 provides an answer to the third research question. Results include, among others, a shift in didactics and teaching and difficulties with online teaching. References Addey, C., and Sellar, S. (2019). Rationales for (non) participation in international large-scale learning assessments. Education Research and Foresight: UNESCO Working paper. Bailey, P., Lee, M., Nguyen, T., & Zhang, T. (2020). Using EdSurvey to Analyze TIMSS Data. In Faddar, J., Appels, L., Merckx, B., Boeve-de Pauw, J., Delrue, K., De Maeyer, S., and Van Petegem, P. (2020). Vlaanderen in TIMSS 2019. Wiskunde- en wetenschapsprestaties van het vierde leerjaar in internationaal perspectief en doorheen de tijd [Flanders in TIMSS 2019: Mathematics and science achievement of the fourth grade in international perspective and over time]. IEA. (2020). TIMSS 2019 International Results in Mathematics and Science. Martin, M. O., von Davier, M., and Mullis, I. V. S. (2020). Methods and Procedures: TIMSS 2019 Technical Report. TIMSS & PIRLS International Study Center, Boston College. https://timssandpirls.bc.edu/timss2019/methods Scharfen, J., Peters, J. M., and Holling, H. (2018). Retest effects in cognitive ability tests: A meta-analysis. Intelligence, 67, 44-66. https://doi.org/10.1016/j.intell.2018.01.003 The International Association for the Evaluation of Educational Achievement (IEA). (2022). TIMSS Longitudinal Study: Measuring Student Progress over One Year. 09. Assessment, Evaluation, Testing and Measurement
Paper Role of Metacognitive Skills and Self-Efficacy in Predicting Academic Results of Middle School Students National Research University "Higher School of Economics" Presenting Author:Metacognition or metacognitive skills refer to students’ “understanding and control of their own cognition” (Sternberg, 2007, p. 18). Metacognition or knowledge about thinking includes declarative, procedural, and conditional knowledge (McCormick, 2003). Students who have well developed metacognitive skills tend to thrive academically. For example, research shows that systematic metacognitive monitoring leads to better understanding and academic performance (Zimmerman & Cleary, 2009). However, many studies in education report on low to medium associations between metacognition and academic achievement (Fleur et al., 2021; Winne & Azevedo, 2022). Self-efficacy is another construct that relates to academic achievement across educational settings and age groups (DiBenedetto & Schunk, 2022). Self-efficacy refers to students’ beliefs that they can successfully tackle a task (Anderman & Wolters, 2008; Bandura, 2006). Students’ self-efficacy is related to their engagement with a task and the types of strategies they use (Bandura, 1994). Years of research indicate that self-efficacy relates to students’ learning, motivation, achievement, and self-regulated learning (DiBenedetto & Schunk, 2022). High self-efficacy is a strong predictor of students’ achievement and success (DiBenedetto & Schunk, 2022) and strongly relates to academic achievement for middle school students (Carpenter, 2007). Available research studies suggest positive yet small correlations between metacognition and general and domain-specific self-efficacy (Cera et al., 2013; Ridlo & Lutfia, 2016). In addition, metacognitive scaffolding improved metacognitive awareness, academic self-efficacy, and learning achievement of biology students (Valencia-Valejo et al., 2019). Research evidence from other countries provides support in positive relationships among metacognition, self-efficacy, and academic achievement. However, it is not clear how these constructs relate to each other in other contexts such as Russia. Therefore, the goal of this study is to examine the role of metacognitive skills and self-efficacy in predicting middle school students’ academic results. Theoretical framework The role of metacognition and self-efficacy in students’ academic results in this study is examined through a Model of Self- and Socially Regulated Learning (Author). The model is organized around three broad areas: self-regulated learning (SRL; C–I, M–N), socially regulated learning (SoRL; A–B, J–N), and culture (O). Each area has its own set of processes contributing to the development of self-/socially regulated skills. Thus, SoRL includes instructional techniques (A–B) and formative assessment practices, such as feedback, which occur in classrooms (J–N). SRL includes the processes that activate student’s background knowledge and motivational beliefs, which lead to the choice of goals and strategies to do the task (C–I, M–N). Finally, culture (O) situates both types of processes within a socio-cultural context. This model reflects the complexity of school classrooms and includes a number of variables. In this paper, however, the focus is on such components of SRL as metacognition and self-efficacy. For the purposes of this study, metacognition includes the processes of planning, progress monitoring, and reflection. 
According to Albert Bandura (2006), self-efficacy is domain-specific, which is why separate self-efficacy scales were developed for each of the domains. The main purpose of this study was to examine the role of metacognition and self-efficacy in predicting middle school students’ academic results. The study addresses the following research questions:
Methodology, Methods, Research Instruments or Sources Used This study employed a cross-sectional survey design. Sample. The sample included 1,167 students (55.3% girls, n = 645) from seventh (n = 345), eighth (n = 514), and ninth (n = 308) grades. Instruments. The metacognition subscale is an adaptation from the SRL survey for DAACS (Lui et al., 2018). It includes the subscales of planning (5 items), monitoring (6 items), and reflection (7 items), using a Likert-type scale (4 – almost always, 1 – almost never); the scale showed a good internal consistency estimate (α = 0.92; ω = 0.93). Example item: “I plan when I am going to do my homework”. The self-efficacy surveys for mathematics (4 items, α = 0.85, ω = 0.9), Russian (4 items, α = 0.79, ω = 0.85), reading (4 items, α = 0.84, ω = 0.86), foreign language (5 items, α = 0.93, ω = 0.94), biology (4 items, α = 0.87, ω = 0.9), and physics (5 items, α = 0.93, ω = 0.95) used a Likert-type scale (4 – I can do it well, 1 – I cannot do it at all) with good reliability estimates. An example item: “Can you solve a math problem?”. Procedures. After receiving approval from the Ethics Committee, the data were collected online in public schools. Parents signed online consent forms, and children provided their assent to participate. The data analyses were conducted in R Studio. Results RQ1: While no differences were observed for planning and reflection, girls showed higher scores for monitoring than boys, t = 2, df = 1090.6, p = 0.04, d = 0.12. No differences were observed in self-efficacy for math, reading, foreign language, and biology. However, girls had higher self-efficacy for Russian, t = 7.81, df = 1023.6, p < 0.0001, d = 0.47. Boys had higher self-efficacy for physics, t = -3.72, df = 1095.9, p < 0.001, d = 0.22. Girls reported higher scores across all subjects than boys. Examination by grade levels revealed that students from the 9th grade had higher estimates for planning, reflection, and self-efficacy across most subjects than students from the 7th and 8th grades. RQ2: Linear regression analyses revealed that planning predicted students' scores in foreign language and biology, and reflection predicted scores for foreign language and physics. For all other subjects, contributions of metacognition were not significant. In contrast, self-efficacy significantly predicted scores for all subjects, explaining between 16% and 32% of the variance in scores. Conclusions, Expected Outcomes or Findings This paper examined the role of metacognition and self-efficacy in predicting middle school students' academic results. The group comparison results revealed that girls had higher scores in metacognitive monitoring than boys. No differences were observed for metacognitive planning and reflection. Also, girls indicated higher self-efficacy in Russian and boys higher self-efficacy in physics. These results are partially in line with research studies showing gender differences, with boys scoring higher in mathematics (Breda & Napp, 2019), and research on perceived self-efficacy (Pajares & Valiante, 2002). Students from the 9th grade seemed to have higher scores for planning, reflection, and self-efficacy across all subjects. Ninth grade is considered the final grade of middle school in Russia; students take the final examination and then decide whether they continue in high school or switch to other educational institutions.
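The reliability estimates, group comparisons, and regressions reported above correspond to standard R routines; the sketch below illustrates them with simulated data and invented column names, and is not the authors' script.

```r
# Illustrative sketch with simulated data; not the authors' script.
library(psych)

set.seed(2)
n <- 300
dat <- data.frame(gender = sample(c("girl", "boy"), n, replace = TRUE),
                  monitoring = rnorm(n, 3, 0.5),
                  planning = rnorm(n, 2.8, 0.6),
                  reflection = rnorm(n, 2.9, 0.6),
                  se_math = rnorm(n, 3, 0.6))
dat$math_score <- 40 + 8 * dat$se_math + 2 * dat$planning + rnorm(n, 0, 10)
plan_items <- data.frame(replicate(5, round(dat$planning + rnorm(n, 0, 0.4))))

# Internal consistency of a scale (alpha; omega total under a 1-factor simplification)
psych::alpha(plan_items)
psych::omega(plan_items, nfactors = 1)

# Gender comparison: Welch's t-test plus Cohen's d with a pooled SD
tt <- t.test(monitoring ~ gender, data = dat)
g  <- split(dat$monitoring, dat$gender)
sp <- sqrt(((length(g$girl) - 1) * var(g$girl) + (length(g$boy) - 1) * var(g$boy)) /
             (length(g$girl) + length(g$boy) - 2))
c(t = unname(tt$statistic), p = tt$p.value, d = (mean(g$girl) - mean(g$boy)) / sp)

# Do metacognition and self-efficacy predict subject scores?
fit <- lm(math_score ~ planning + monitoring + reflection + se_math, data = dat)
summary(fit)$r.squared
```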
In 9th grade, students’ abstract thinking and analysing skills necessary to reflect on behaviours and emotions are developed enough to engage in metacognitive thinking (Uytun, 2018). The results of the regression analysis indicated that metacognition was not as strong in predicting students’ scores in respective subjects as self-efficacy. However, planning and reflection contributed to scores in foreign language, biology, and physics. These results support research studies reporting weak and moderate relationships of metacognition with academic results (Cera et al., 2013; Ridlo & Lutfia, 2016) and significant contributions of self-efficacy to academic achievement (DiBenedetto & Schunk, 2022). The scholarly significance of this study is that it examined the relationships among metacognition, self-efficacy by domains, and academic achievement of middle school students, using a relatively large sample in Russia. It provides evidence of the links between students perceived self-efficacy beliefs and their results in subject domains, and positive role of planning and reflection for some subjects. References Anderman, E. M., & Wolters, C. A. (2008). Goals, values, and affect: Influences on student motivation. In P. A. Alexander and P. H. Winne (Eds.), Handbook of educational psychology, 369–390, 2nd ed. Lawrence Erlbaum Associates Publishers. Bandura, A. (2006). Toward a psychology of human agency. Perspectives on psychological science, 1(2), 164-180. Breda, T. & Napp, C. (2019). Girls’ comparative advantage in reading can largely explain the gender gap in math-related fields.” Proceedings of the National Academy of Sciences, 116(31), 15435-15440. https://doi.org/10.1073/pnas.1905779116 Carpenter, S. L. (2007). A comparison of the relationships of students' self-efficacy, goal orientation, and achievement across grade levels: a meta-analysis. https://summit.sfu.ca/_flysystem/fedora/sfu_migrate/2661/etd2816.pdf DiBenedetto, M. K., & Schunk, D. H. (2022). Assessing academic self-efficacy. In M. S. Khine and Tine Nielsen (Eds.), Academic Self-Efficacy in Education: Nature, Assessment, and Research 11-37. Springer. Cera, R., Mancini, M., & Antonietti, A. (2013). Relationships between metacognition, self-efficacy and self-regulation in learning. Journal of Educational, Cultural and Psychological Studies (ECPS Journal), 4(7), 115-141. Fleur, D.S., Bredeweg, B. & van den Bos, W. Metacognition: ideas and insights from neuro- and educational sciences. npj Sci. Learn. 6, 13 (2021). https://doi.org/10.1038/s41539-021-00089-5 McCormick, C. B. (2003). Metacognition and learning. In W. M. Reynolds & G. E. Miller (Eds.), Handbook of psychology: Educational psychology (Vol. 7, pp. 79-102). John Wiley & Sons Inc. Pajares, F., & Valiante, G. (2002). Students’self-efficacy in their self-regulated learning strategies: a developmental perspective. Psychologia, 45(4), 211-221. Ridlo, S., & Lutfiya, F. (2017, March). The correlation between metacognition level with self-efficacy of biology education college students. In Journal of Physics: Conference Series (Vol. 824, No. 1, p. 012067). IOP Publishing. Sternberg, R. J. (2007). Intelligence, competence, and expertise. In A. J. Elliot, & C. S. Dweck (Eds.), Handbook of competence and motivation (pp. 15–30). The Guilford Press. Uytun, M. C. (2018). Development period of prefrontal cortex. In A. Starcevic and B. Filipovic (Eds.), Prefrontal Cortex. IntechOpen. DOI: 10.5772/intechopen.78697 Valencia-Vallejo, N., López-Vargas, O., & Sanabria-Rodríguez, L. (2019). 
Effect of a metacognitive scaffolding on self-efficacy, metacognition, and achievement in e-learning environments. Knowledge Management & ELearning, 11(1), 1–19. https://doi.org/10.34105/j.kmel.2019.11.001 Winne, P., & Azevedo, R. (2022). Metacognition and self-regulated learning. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences, 93-113. Cambridge University Press. Zimmerman, B. J., & Cleary, T. J. (2009). Motives to self-regulate learning: A social-cognitive account. In K. Wentzel, & A. Wigfield (Eds.), Handbook on Motivation at School. Taylor & Francis. 09. Assessment, Evaluation, Testing and Measurement
Paper The Impact of the Negative Grading Effect in Different School Subjects University of Gothenburg, Sweden Presenting Author:The negative grading effect (NGE) is the decrement in grade outcomes associated with the process of being assessed and graded. By exploiting the natural experimental conditions resulting from the introduction or abolition of grades earlier in the school career, researchers have been able to contrast the outcomes of comparable groups of Swedish students with different grading backgrounds, i.e. whether they were previously graded or not. The effect has repeatedly been identified in students’ year 9 (age 15/16) grades, and seems to particularly affect low-ability students and boys (Facchinello, 2014; Klapp, Cliffordson, & Gustafsson, 2016; Clarke, Klapp, & Rosen, under review). Despite substantial reforms to the grading and assessment system, the effect persists and thus seems to have an enduring and robust impact on compulsory school students’ grades. Methodology, Methods, Research Instruments or Sources Used This quasi-experimental study plans to use structural equation modelling or multivariate regression analyses of data collected in the Evaluation Through Follow-up project of Sweden’s compulsory school students. The database contains information from recurring studies of cohorts of students since 1948 to present. The database contains student and parental demographic background and questionnaire data, as well as teacher and school information. The data contains student academic performance measures from multiple points in their academic career as well as cognitive ability measures collected by testing the students in year 6 (age 12/13). The analysis uses birth-cohorts 1992 (N = 10147) and 2004 (N = 9775). This comparison allows for the evaluation of the academic outcomes of students in cohorts before and after a reform that lowered the age at which students are first graded. The reforms also introduced changes which increased the stakes of grades by i. a. introduction of a fail grade. The outcomes of students who have previously been graded will be compared on a by-subject level to those who have not previously received grades to determine whether having previously received grades has differential effects for different subjects. In addition to the grading status of the students, the analysis will also include the independent variables for student gender, parental education level, immigration background, and student cognitive ability levels. The dependent variables will be the grade outcomes for the school subjects studied achieved at the end of school year 9 (age 15/16). Data are available for around 14 subjects. Statistical analysis and modelling will use Mplus version 8.5 (Muthén & Muthén, 1998-2019) which can account for missing data and possible clustering effects of students within schools. Conclusions, Expected Outcomes or Findings Further support for the presence of the NGE is expected. The NGE is expected to vary in magnitude between subjects. However, at this stage, the exact nature of how the NGE varies between the various school subjects or the presence of any patterns or groupings of the subjects has not yet been determined. The remaining independent variables are expected to show similar relationships to the grade outcomes as previous research has established, though again, some between-subject variation is expected, but has not yet been determined. The study is ongoing and results are expected around Summer 2024. 
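The authors plan their modelling in Mplus; purely as an illustration of the kind of by-subject model described in the methodology (grade outcomes regressed on prior grading status and background variables, allowing for clustering of students within schools), a hedged R sketch with invented variable names and simulated data might look as follows, with a random intercept for school standing in for the clustering that Mplus would handle.

```r
# Illustrative sketch only; the authors specify their models in Mplus. Variable
# names and data are invented; previously_graded is coded 0/1.
library(lme4)

set.seed(3)
n <- 2000
nge <- data.frame(
  school_id            = sample(1:120, n, replace = TRUE),
  previously_graded    = rbinom(n, 1, 0.5),
  gender               = rbinom(n, 1, 0.5),
  parent_education     = sample(1:5, n, replace = TRUE),
  immigrant_background = rbinom(n, 1, 0.15),
  cognitive_ability    = rnorm(n)
)
nge$grade9_math    <- 12 + 2 * nge$cognitive_ability - 0.4 * nge$previously_graded + rnorm(n, 0, 3)
nge$grade9_swedish <- 12 + 1.8 * nge$cognitive_ability - 0.2 * nge$previously_graded + rnorm(n, 0, 3)

# One model per subject: the coefficient on previously_graded is the estimated
# negative grading effect for that subject, net of the covariates.
subjects <- c("grade9_math", "grade9_swedish")
nge_effects <- sapply(subjects, function(y) {
  f <- reformulate(c("previously_graded", "gender", "parent_education",
                     "immigrant_background", "cognitive_ability",
                     "(1 | school_id)"), response = y)
  fixef(lmer(f, data = nge))["previously_graded"]
})
nge_effects
```

Comparing the estimated coefficients across subjects is what would reveal the between-subject variation in the NGE that the study expects.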
The study is a part of the research project funded by the Swedish Research Council (2019-04531). References Azmat, G., & Iriberri, N. (2010). The importance of relative performance feedback information: Evidence from a natural experiment using high school students. Journal of Public Economics, 94, 435-452. doi:https://doi.org/10.1016/j.jpubeco.2010.04.001 Clarke, D. R., Klapp, A., & Rosen, M. (under review). The negative effect of earlier grading. Facchinello, L. (2014). The impact of early grading on academic choices: mechanisms and social implications. Department of Economics. Stockholm: Stockholm Schools of Economics. Retrieved from https://mysu.sabanciuniv.edu/events/sites/mysu.sabanciuniv.edu.events/files/units/FASS%20Editor/jmp_-_luca_facchinello.pdf Klapp, A., Cliffordson, C., & Gustafsson, J.-E. (2016). The effect of being graded on later achievement: evidence from 13-year olds in Swedish compulsory school. Educational Psychology, 36(10), 1771-1789. doi:https://doi.org/10.1080/01443410.2014.933176 Lundahl, C., Hultén, M., & Tveit, S. (2017). The power of teacher-assigned grades in outcome-based education. Nordic Journal of Studies in Educational Policy, 3(1), 56-66. doi:https://doi.org/10.1080/20020317.2017.1317229 Mammarella, I. C., Donolato, E., Caviolo, S., & Giofrè, D. (2018). Anxiety profiles and protective factors: A latent profile analysis in children. Personality and Individual Differences, 124, 201-208. doi:https://doi.org/10.1016/j.paid.2017.12.017 Marsh, H. W. (1990). The structure of academic self-concept: The Marsh/Shavelson Model. Journal of Educational Psychology, 82(4), 623-636. Muthén, B., & Muthén, L. (1998-2019). Mplus user's guide (8th ed.). Los Angeles, CA: Author. |
13:45 - 15:15 | 100 SES 06 B: Working Meeting Hasmik Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor] Session Chair: Hasmik Kyureghyan Paper Session |
Date: Thursday, 29/Aug/2024 | |
9:30 - 11:00 | 09 SES 09 B: Innovative Approaches to Educational Practice and Assessment Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor] Session Chair: Leonidas Kyriakides Paper Session |
09. Assessment, Evaluation, Testing and Measurement
Paper Exploring Implementation of Value Added Model in Slovenia NEC, Slovenia Presenting Author:Value-added indicators are a more accurate method of assessing school performance since they eliminate more non-school factors (Meyer et al., 2017). Slovenian upper secondary schools in the General education track finishing with the General Matura have been able to assess value-added measures and track changes over time since 2014. The two time points in question are achievement at the end of Grade 9, just before entering upper secondary school, and achievement at the General Matura examinations. Lower secondary schools have similarly been able to check value-added between Grade 6 and Grade 9 (the finishing grade) in different subjects since 2018. These measures are not part of any accountability scheme and are provided for schools' self-evaluation purposes along with other achievement results. There can be many reasons for this, and within this presentation, we will explore the following research questions: Could the observed negative average value be associated with school composition factors (primarily the size of the school)? Methodology, Methods, Research Instruments or Sources Used To address the mentioned research questions, we will use simple regression techniques or hierarchical linear regression where needed. Data on external examinations and national assessments to calculate value-added measures will come from the National Examinations Centre, while the data on municipalities will originate from the Slovenian Statistical Office. We will use value-added measures for the last five years to demonstrate the stability of findings over time. Data will be used and analyzed in a responsible manner to protect individual privacy and adhere to legal requirements. This is especially important since the data on whole cohorts of students will be used. Conclusions, Expected Outcomes or Findings Value-added models can provide important information and identify underperforming schools, as demonstrated by Ferrão and Couto (2014) in the case of Portuguese schools. We expect to provide insight into the problem and either identify the causes of constant negative averages or propose further steps needed to explore and resolve the issue. As value-added measures are also present in other European countries, this research will help other researchers evaluate their value-added models and contribute to a better understanding of the field. References Cankar, G. (2011). Opredelitev dodane vrednosti znanja (Izhodišča, primeri in dileme) [Defining the value added of knowledge (Starting points, examples and dilemmas)]. In Kakovost v šolstvu v Sloveniji [Quality in education in Slovenia] (str. 431). Pedagoška fakulteta. http://ceps.pef.uni-lj.si/dejavnosti/sp/2012-01-17/kakovost.pdf Ferrão, M., & Couto, A. (2014). The use of a school value-added model for educational improvement: a case study from the Portuguese primary education system. School Effectiveness and School Improvement, 25, 174-190. https://doi.org/10.1080/09243453.2013.785436. Koedel, C., Mihaly, K., & Rockoff, J. (2015). Value-added modeling: A review. Economics of Education Review, 47, 180-195. https://doi.org/10.1016/J.ECONEDUREV.2015.01.006. Meyer, R. (1997). Value-added indicators of school performance: A primer. Economics of Education Review, 16, 283-301. https://doi.org/10.1016/S0272-7757(96)00081-7. Papay, J. (2011). Different Tests, Different Answers. American Educational Research Journal, 48, 163-193. https://doi.org/10.3102/0002831210362589. 09. Assessment, Evaluation, Testing and Measurement
Paper Using the Dynamic Approach to Promote Formative Assessment in Mathematics: Αn Experimental Study 1Department of Education, University of Cyprus; 2Centre for Educational Research and Evaluation, Cyprus Pedagogical Institute; 3Department of Secondary General Education, Cyprus Ministry of Education, Sport and Youth Presenting Author:Teachers who use assessment for formative rather than summative purposes are more effective in promoting student learning outcomes (Chen et al., 2017; Kyriakides et al., 2020). Teachers appear to acknowledge the benefit of formative assessment. However, their assessment practice remains mainly summative oriented (Suurtamm & Koch, 2014; Wiliam, 2017). This can partly be attributed to the fact that teachers do not receive sufficient training in classroom assessment (DeLuca & Klinger, 2010). Teacher Professional Development (TPD) programs intended to improve assessment practice have so far provided mixed results regarding their impact on teachers’ assessment skills (Chen et al., 2017), whereas many studies do not provide any empirical evidence on the impact of student assessment TPD programs on student learning outcomes (Christoforidou & Kyriakides, 2021). In this context, this study aims to explore the impact of a TPD course in formative assessment on improving teachers’ assessment skills and through that on promoting student learning outcomes in mathematics (cognitive and meta-cognitive). During the first phase of the study, a framework that enables the determination and measurement of classroom assessment skills was developed. This framework examines assessment looking at three main aspects. First, skills associated with the main phases of the assessment process are considered (Gardner et al., 2010; Wiliam et al., 2004): (i) appropriate assessment instruments are used to collect valid and reliable data; (ii) appropriate procedures in administering these instruments are followed; (iii) data emerging from assessment are recorded in an efficient way and without losing important information; (iv) assessment results are analysed, interpreted, and used in ways that can promote student learning; and (v) assessment results are reported to all intended users to help them take decisions on how to improve student learning outcomes. The second aspect of this framework has to do with the fact that assessment skills are defined and measured in relation to teachers’ ability to use the main assessment techniques. Specifically, the framework looks at assessment techniques by considering two important decisions affecting assessment technique selection: a) the mode of response and b) who performs the assessment. Finally, the third aspect of the framework refers to the five measurement dimensions suggested in the Dynamic Model of Educational Effectiveness (Kyriakides et al., 2020): frequency, focus, stage, quality and differentiation. These dimensions allow us to better describe the functioning of each characteristic of an effective teacher (Scheerens, 2016). Based on the theoretical framework and its dimensions, a questionnaire measuring teachers’ skills in assessment was developed. A study provided support to the validity of the instrument. It was also found that assessment skills can be grouped into three stages of assessment behaviour. These stages were used to make decisions in relation to the content and design of the TPD course which was based on the main assumptions of the DA. 
First, the DA considers the importance of identifying the specific needs and priorities for improvement of each teacher or group of teachers. Second, it is acknowledged that teachers should be actively involved in their professional development to better understand how and why the factors addressed have an impact on student learning. Third, the DA holds that the Advisory and Research Team should support teachers in their efforts to develop and implement their action plans. Fourth, monitoring the implementation of teacher action plans in classroom settings is considered essential. This implies that teachers should continuously develop and improve their action plans based on the information collected through formative evaluation. Methodology, Methods, Research Instruments or Sources Used At the beginning of school year 2019-20, 62 secondary school teachers who taught mathematics in Grades 7, 8 and 9 in Nicosia (Cyprus) agreed to participate. These teachers were randomly split into the experimental (n=31) and the control group (n=31). Randomization was done at the school level to avoid any spillover effect. Students of Grades 7, 8 and 9 of the teacher sample participated in the study. All students of two randomly selected classrooms per teacher participated. Our student sample comprised 2,588 students from 124 classrooms. Teachers of the experimental group were invited to participate in a TPD course with a focus on student assessment. Teachers of the control group did not attend any TPD course. However, they were provided the opportunity to participate in the TPD course during the next school year. Data on teacher skills and student achievement were collected at the beginning and at the end of the TPD course. The instruments used were: (1) a teacher questionnaire, (2) a battery of curriculum-based written tests in mathematics (measuring cognitive skills), and (3) a battery of tests measuring metacognitive skills in mathematics. To measure the impact of the TPD course on improving teachers' assessment skills, the Extended Logistic Model of Rasch was used to analyse the data that emerged from the teacher questionnaire at each measurement period. Then, the Mann Whitney analysis was used to search for any differences between the control and experimental group in terms of teachers' assessment skills at the beginning and at the end of the intervention. To measure the impact of the TPD course on improving students' cognitive learning outcomes, multilevel regression analysis was conducted to find out whether teachers employing the DA were more effective than the teachers of the control group in terms of promoting their students' learning outcomes in mathematics. In addition, to search for the impact of the intervention on improving students' metacognitive learning outcomes, three separate multilevel regression analyses, one for each scale measuring regulation of cognition (i.e., Prediction, Planning, Evaluation), were also conducted. Conclusions, Expected Outcomes or Findings The Wilcoxon Signed Ranks test revealed that the mean scores of teachers' assessment skills were higher at the end of the intervention compared to their scores at the beginning of the intervention (Z=4.80, p<0.001). On the other hand, no statistically significant improvement in the skills of the control group was identified (Z=1.21, p=0.23).
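The non-parametric comparisons and multilevel regressions reported for this study map onto standard R functions; the sketch below is illustrative only, with simulated scores rather than the authors' Rasch-based skill estimates or achievement data.

```r
# Illustrative sketch with simulated scores; not the authors' analysis files.
library(lme4)

set.seed(4)
pre  <- rnorm(31, 0, 1)            # experimental group, skill scores before the course
post <- pre + rnorm(31, 0.6, 0.5)  # the same teachers after the course
ctrl <- rnorm(31, 0, 1)            # control group at the end of the intervention

wilcox.test(post, pre, paired = TRUE)   # Wilcoxon signed-rank: within-group change
wilcox.test(post, ctrl)                 # Mann-Whitney: experimental vs control group

# Multilevel regression of student post-test achievement on group membership,
# controlling for the pre-test, with students nested in classrooms.
students <- data.frame(classroom = rep(1:124, length.out = 2588),
                       group     = rep(c(0, 1), length.out = 2588),
                       pre_math  = rnorm(2588))
students$post_math <- students$pre_math + 0.3 * students$group + rnorm(2588, 0, 1)
summary(lmer(post_math ~ group + pre_math + (1 | classroom), data = students))
```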
The Mann Whitney test did not reveal any statistically significant difference between the control and the experimental group in terms of the stage that each teacher was found to be situated at the beginning of the intervention (Z= -0.57, p=0.57). A statistically significant difference at the end of the intervention (Z=2.53, p=0.011) was found. It was observed that none of the teachers of the control group managed to move from the stage he/she was found to be situated at the beginning of the intervention to a more demanding stage. A stepwise progression was observed in the experimental group since 13 out of 31 teachers managed to move at the next more demanding stage. Moreover, the results of all four multilevel analyses revealed that the DA had a statistically significant effect on student achievement in mathematics (in both cognitive and meta-cognitive learning outcomes). The DA considers the importance of designing a course according to the specific needs and priorities for improvement of each group of teachers, unlike most ‘one size fits all’ professional development approaches. This argument has received some support since it was found that teachers’ assessment skills can be grouped into three stages. This study also reveals that teachers can improve and ultimately progress to the next developmental stage of assessment skills, by undertaking appropriate trainings. Finally, this study has shown the impact of the TPD course based on DA on both cognitive and metacognitive learning outcomes. Finally, implications for research, policy and practice are discussed. References Chen, F., Lui, A. M., Andrade, H., Valle, C., & Mir, H. (2017). Criteria-referenced formative assessment in the arts. Educational Assessment, Evaluation and Accountability, 29(3), 297-314. Christoforidou, M., & Kyriakides, L. (2021). Developing teacher assessment skills: The impact of the dynamic approach to teacher professional development. Studies in Educational Evaluation, 70, 101051. https://doi.org/10.1016/j.stueduc.2021.101051 DeLuca, C., & Klinger, D. A. (2010). Assessment literacy development: Identifying gaps in teacher candidates’ learning. Assessment in Education: Principles, Policy & Practice, 17(4), 419-438. https://doi.org/10.1080/0969594X.2010.516643 Gardner, J., Wynne, H., Hayward L., & Stobart, G. (2010). Developing Teacher Assessment. McGraw-Hill/Open University Press. Kyriakides, L., Creemers, B.P.M., Panayiotou, A., & Charalambous, E. (2020). Quality and Equity in Education: Revisiting Theory and Research on Educational Effectiveness and Improvement. Routledge. Scheerens, J. (2016). Educational effectiveness and ineffectiveness: A critical review of the knowledge base. Dordrecht, the Netherlands: Springer. DOI 10.1007/978-94-017-7459-8 Suurtamm, C., & Koch, M. J. (2014). Navigating dilemmas in transforming assessment practices: experiences of mathematics teachers in Ontario, Canada. Educational Assessment, Evaluation and Accountability, 26(3), 263-287. https://doi.org/10.1007/s11092-014-9195-0 Wiliam, D. (2017). Assessment for learning: meeting the challenge of implementation, Assessment in Education: Principles, Policy & Practice, 25(6), 686–689. https://doi.org/10.1080/0969594X.2017.1401526 Wiliam, D., Lee, C., Harrison, C., & Black, P. J. (2004). Teachers developing assessment for learning: Impact on student achievement. Assessment in Education: Principles Policy and Practice, 11(1), 49-65. https://doi.org/10.1080/0969594042000208994 09. Assessment, Evaluation, Testing and Measurement
Paper Teaching Quality In Classrooms Of Different Compositions. A Mixed Methods Approach. University of Oslo, Norway Presenting Author:Teachers’ instruction is at the heart of education, and previous research has shown that teaching quality is important for students’ learning outcomes (e.g. Charalambous & Praetorius, 2020; Seidel & Shavelson, 2007). However, teaching is a two-way process, and less is known about how the composition of the classroom affects teaching quality (TQ). Do for instance high socio-economic (SES) classrooms receive different TQ than low-SES classrooms? To examine this, one would first need to establish whether a so-called compositional effect exists. Compositional effect refers to the effects of, for instance the classroom’s socio-economic status (SES) on student learning outcomes, over and above the effect of students’ individual SES (Van Ewijk & Sleegers, 2010). Both compositional effects and unfair distribution of high-quality teachers have been found in previous studies in a number of countries (Gustafsson et al., 2018; Luschei & Jeong, 2018; Van Ewijk & Sleegers, 2010) However, in Norway, that for a long time was considered an egalitarian society (Buchholtz et al., 2020), there is a lack of such studies. At the same time, educational inequality has increased in Norway (Sandsør et al., 2021). Hence, the overarching aim of the present study is to examine whether a compositional effect exists, and how the composition of the classrooms affects TQ. We further aim to describe more in depth what characterizes the TQ in classrooms of different compositions in Oslo where the gaps between students are larger and there are more minority students than in the rest of Norway (Fløtten et al., 2023). The following research questions were asked: 1) What is the effect of the classroom composition (in terms of SES and minority status) on students learning outcomes in science, over and above students’ individual SES and minority status (i.e. the compositional effect)? How does this differ between Oslo and the rest of Norway? 2) What is the effect of the classroom composition on TQ in science, and how does this differ between Oslo and the rest of Norway? 3) What characterizes TQ in science classrooms of different compositions in Oslo?
Theoretical framework for teaching quality. We chose The Three Basic Dimensions (TBD) framework (Klieme et al., 2009; Praetorius et al., 2018) to conceptualize TQ, as this framework is the most commonly used in Europe and by the international large-scale studies (Klieme & Nilsen, 2022). TQ is here defined as the type of instruction that predicts students' learning outcomes, and includes the following three dimensions: 1) Classroom management refers to how teachers manage the classroom environment and includes, for instance, preventing undesirable behaviors and setting clear and consistent rules and expectations for student behavior. 2) Supportive teaching focuses on the teacher's ability to support students both professionally and socio-emotionally, such as providing clear and comprehensive instruction and seeing and listening to every individual student. 3) Cognitive activation includes instruction that enables students to engage in higher-level cognitive thinking that promotes conceptual understanding. Such instruction is characterized by challenging and interactive learning. The TBD is a generic framework used across subject domains. To address research question 3, and to investigate in more depth the subject-specific aspects of TQ in science, a fourth dimension from the framework of the Teacher Education and Development Study–Instruct (TEDS-Instruct, e.g. Schlesinger et al., 2018) was included. This framework was adapted to the Norwegian context and to the subject domain of science, and validated. The fourth dimension is called Educational structuring and refers to subject-specific aspects of instruction such as inquiry or dealing with students' misconceptions in science. Methodology, Methods, Research Instruments or Sources Used Design and sample. The project Teachers' effect on student learning (TESO), funded by the Norwegian Research Council, collected data through an extended version of TIMSS 2019, including a representative sample of fifth graders in Norway, a representative sub-sample of Oslo, and video observations of grade six classrooms in Oslo. The students who participated in the video observations in sixth grade also participated in TIMSS 2019 when they were fifth graders. All students answered questionnaires and the TIMSS mathematics and science tests. Measures. To measure the generic TQ in the second research question, students' responses to the questionnaire were used. In the questionnaire, Classroom management was measured by 6 items (e.g. "Students don't listen to what the teacher says"). Cognitive activation was measured by 5 items (e.g. "The teacher asks us to contribute in planning experiments"). Both of these were measured using a 4-point frequency scale (from Never to Every or almost every lesson). Teacher support included 6 items on 4-point Likert scales (from Disagree a lot to Agree a lot), e.g. "My teacher has clear answers to my questions". To answer research question 3 and provide more in-depth descriptions of TQ, the more fine-grained TEDS-Instruct observation manual (including 21 items rated from 1 through 4) was used to rate the videos. The manual conceptually measures the same three aspects as TIMSS, in addition to educational structuring. SES was measured by students' responses to the number of books at home (the parents' responses on their education had more than 40% missing data and were hence excluded as an SES indicator). Minority status was measured by students' answers to how often they speak Norwegian at home.
Methods of analyses To answer research questions 1 and 2, we employed multilevel (students and classes) structural equation modelling (SEM) and a multi-group approach to examine differences between Oslo and the rest of Norway. To avoid multi-collinearity, each aspect of teaching quality was modelled separately and as a latent variable. Compositional effects were estimated by subtracting the within-level effects from the between-level effects. To answer research question 3, the questionnaires, achievements, and ratings of the videos were linked and merged into one file. Descriptives were used to create profiles of the ratings of the video observations to describe the characteristics of TQ in classrooms of different compositions. Conclusions, Expected Outcomes or Findings RQ1. Compositional effects The compositional effects were all significant (p < .05) and positive. The effect of SES was 0.44 for Norway, and the multigroup analyses yielded an effect of 0.57 for Oslo and 0.31 for the rest of Norway. The compositional effects of language were 0.45 for Norway, 0.76 for Oslo and 0.45 for the rest of Norway. In other words, the compositional effects for Oslo were very high, while the compositional effects for Norway overall were in line with other Scandinavian countries (Yang Hansen et al., 2022). RQ2. Relations between classroom composition and TQ High-SES, and especially low-minority, classrooms had positive and significant associations with both classroom management and teacher support. These effects were stronger in Oslo than in the rest of Norway. This indicates an unfair distribution of high teaching quality to advantaged classrooms. However, for cognitive activation, there were no significant results at the class level, but a negative association between high-SES, low-minority classrooms and students' perceptions of cognitive activation. This indicates that advantaged students perceive less challenge and interactive learning. RQ3. Characteristics of TQ Results from the video observations showed that TQ in high-SES classrooms was characterized by better classroom management, teacher support, and educational structuring than in low-SES classrooms, albeit with less cognitive activation. Furthermore, high-SES classrooms were characterized by fewer minority students and higher achievement than low-SES classrooms. These findings are in line with the results from the questionnaires. Taken together, the findings from our three research questions point to a school that contributes to increasing the gap between students. Classrooms with high shares of advantaged students have access to better teaching quality than classrooms with many disadvantaged students, thus generating unequal opportunities to learn. References Buchholtz, N., Stuart, A., & Frønes, T. S. (2020). Equity, equality and diversity—Putting educational justice in the Nordic model to a test. Equity, equality and diversity in the Nordic model of education, 13-41. Charalambous, C. Y., & Praetorius, A.-K. (2020). Creating a forum for researching teaching and its quality more synergistically. Studies in Educational Evaluation, 67, 100894. Fløtten, T., Kavli, H., & Bråten, B. (2023). Oslo er fortsatt en delt by [Oslo is still a divided city]. Aftenposten. Retrieved from https://www.aftenposten.no/meninger/kronikk/i/dw2z8o/oslo-er-fortsatt-en-delt-by Gustafsson, J.-E., Nilsen, T., & Hansen, K. Y. (2018). School characteristics moderating the relation between student socio-economic status and mathematics achievement in grade 8.
Evidence from 50 countries in TIMSS 2011. Studies in Educational Evaluation, 57, 16-30. Klieme, E., & Nilsen, T. (2022). Teaching Quality and Student Outcomes in TIMSS and PISA. International Handbook of Comparative Large-Scale Studies in Education: Perspectives, Methods and Findings, 1089-1134. Klieme, E., Pauli, C., & Reusser, K. (2009). The pythagoras study: Investigating effects of teaching and learning in Swiss and German mathematics classrooms. The power of video studies in investigating teaching and learning in the classroom, 137-160. Luschei, T. F., & Jeong, D. W. (2018). Is teacher sorting a global phenomenon? Cross-national evidence on the nature and correlates of teacher quality opportunity gaps. Educational researcher, 47(9), 556-576. Praetorius, A.-K., Klieme, E., Herbert, B., & Pinger, P. (2018). Generic dimensions of teaching quality: The German framework of three basic dimensions. ZDM, 50(3), 407-426. Sandsør, A. M. J., Zachrisson, H. D., Karoly, L. A., & Dearing, E. (2021). Achievement Gaps by Parental Income and Education Using Population-Level Data from Norway. https://osf.io/preprints/edarxiv/unvcy Schlesinger, L., Jentsch, A., Kaiser, G., König, J., & Blömeke, S. (2018). Subject-specific characteristics of instructional quality in mathematics education. ZDM, 50, 475-490. Seidel, T., & Shavelson, R. J. (2007). Teaching effectiveness research in the past decade: The role of theory and research design in disentangling meta-analysis results. Review of Educational Research, 77(4), 454-499. Van Ewijk, R., & Sleegers, P. (2010). The effect of peer socioeconomic status on student achievement: A meta-analysis. Educational Research Review, 5(2), 134-150. Yang Hansen, K., Radišić, J., Ding, Y., & Liu, X. (2022). Contextual effects on students’ achievement and academic self-concept in the Nordic and Chinese educational systems. Large-scale Assessments in Education, 10(1), 16. |
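As a sketch of the compositional-effect logic described in the methods above (the between-level effect minus the within-level effect), a common simplification uses group-mean centring in a two-level model. The data frame and variable names below are simulated and hypothetical, and the authors report using multilevel SEM in a multi-group setup rather than this reduced specification.

```r
# Illustrative sketch of a compositional (contextual) effect via group-mean
# centring; simulated data, not the authors' multilevel SEM specification.
library(lme4)

set.seed(5)
n <- 1500
d <- data.frame(class_id = sample(1:75, n, replace = TRUE), ses = rnorm(n))
class_effect    <- rnorm(75, 0, 0.3)
d$science_score <- 0.3 * d$ses + 0.4 * ave(d$ses, d$class_id) +
  class_effect[d$class_id] + rnorm(n, 0, 1)

d$ses_class  <- ave(d$ses, d$class_id)     # classroom mean SES (between level)
d$ses_within <- d$ses - d$ses_class        # student deviation from the class mean

m <- lmer(science_score ~ ses_within + ses_class + (1 | class_id), data = d)
fixef(m)
# Compositional effect: the between-level effect minus the within-level effect
unname(fixef(m)["ses_class"] - fixef(m)["ses_within"])
```

In this parametrisation the coefficient on the classroom mean is the between-level effect, so the final line reproduces the subtraction described in the abstract.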
15:45 - 17:15 | 09 SES 12 B: Reimagining Assessment Practices and Teacher Autonomy Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor] Session Chair: Ana María Mejía-Rodríguez Paper Session |
09. Assessment, Evaluation, Testing and Measurement
Paper Do teachers prefer to be free? Teachers’ Appreciation of Autonomy in students' assessment as a Personal Interpretation of Professional Reality 1Beit Ber Academic College, Israel; 2University of Haifa Presenting Author:Our aim in this study was to learn about teachers’ understanding and appreciation of their autonomy in the context of student's assessment. The study’s context was a reform in Israel’s national matriculation exams (declared in 2022), that involved transitioning from external state-governed examinations into school-based assessment. The reform triggered discussions and re-evaluation of teachers’ professional autonomy, and of assessment policy. In this context we explored teachers' perceptions of the effect of assessment on professional autonomy. Furthermore, we broadened the scope of our study beyond the confines of the local reform, utilizing this specific case to draw more general insights regarding how teachers attribute significance to the professional conditions within which they work and how these conditions effect their sense of autonomy. We looked at the relation between autonomy in assessment, and autonomy in other aspects of teachers work. Furthermore, we studied the role of autonomy in the teachers' professional identity. Our main research questions were: Which factors do the teachers acknowledge as contributing to their sense and preferences of autonomy? What are teachers' perceptions of the effect of assessment on their professional autonomy? The theoretical framework of the study includes several types of literature. First, we draw on a philosophical analysis of teachers’ freedom and responsibility, based on Existential philosophy of Jean-Paul Sartre (1946/ 2017). Teachers’ professional identity has been recognized as an extreme case of human destiny portrayed by Sartre (Author 1, 2022). While practicing the art of teaching (Schwab, 1983), teachers have a constant need to make choices in class, interpreting system’s regulations, practicing an inevitable autonomy, and exerting professional responsibility. Secondly, we looked at current studies, and learned that teacher autonomy research mirrors trends in national and global education. Several studies indicate the favorable effects of teacher autonomy on teachers' perceived self-efficacy, work satisfaction, and empowerment, and on creating a positive work climate. They also show constraints on autonomy correlate with teacher turnover and the risk of emotional exhaustion, and burnout (Skaalvik & Skaalvik,2014). Despite the recognition of the importance of teacher autonomy for job satisfaction (Juntunen, 2017), successful schools, and professional development (Wermke et al., 2019), there is less consensus on its definition (Pearson & Moomaw 2005). Autonomous teachers have a high control over daily practice issues (Wermke et al., 2019). Friedman’s scale for teacher-work autonomy (TWA 1999) includes four functioning areas pertinent to teachers’ sense of autonomy: class teaching, school operating, staff development, and curriculum development. In a re-evaluation of Friedman's scale (Strong & Yoshida, 2014), the number of autonomy areas grew to six and included assessment. In this paper we adopt Lennert-Da Silva’s (2022) definition which relates to the decision-making scope and control teachers have in relation to the national educational policy. Thirdly, we read studies that look at autonomy in the context of student assessment and examine it as part of the larger theme of accountability. 
In the context of marketization, schools' decentralization places school leaders within a framework including bureaucratic regulations, discourses of competitive enterprise, and external public accountability measures that are spreading worldwide (Hammersley-Fletcher et al., 2021; Verger et al., 2019). External assessment is a central factor in accountability (Ben-Peretz, 2012). High-stakes accountability casts a shadow on teachers' professional practice (Clarke, 2012; Mausethagen & Granlund, 2012), and their everyday practice is constrained by external testing (Ball, 2003, 2008a, 2008b). Focusing on assessment as one expression of accountability, studies discuss the tension between external testing and autonomy. State-controlled assessment is viewed as a shift away from teacher professionalism towards the adoption of teaching methods that erode teacher autonomy in curriculum development and instructional decision-making (Day & Smethem, 2009). Methodology, Methods, Research Instruments or Sources Used Drawing on existential philosophy and empirical literature on the connection between student assessment and teacher autonomy, we adopted a qualitative approach and conducted in-depth interviews with 12 teachers, who were selected from four diverse schools to ensure a broad representation of student populations. For our data collection, we employed a semi-structured interview format that began with general questions, giving the teachers an opportunity to freely express their perspectives on their autonomy. We aimed to ascertain whether teachers would refer to assessment processes and to the reform as aspects of autonomy and factors in their general work experience before we asked them specifically about these topics. We asked: Do you like your work? What aspects contribute to your enjoyment in teaching? What factors disturb you or minimize your satisfaction? Do you feel free at work? The subsequent phase of the interview centered on the matriculation reform, exploring whether teachers had perceived alterations to their level of autonomy. We used questions like: How do you usually evaluate your students? What is your opinion about the reform in the matriculation examinations? The interviews lasted one and a half hours, on average. They were conducted face to face, recorded, and later transcribed. To analyze our data, we utilized inductive qualitative content analysis methodology (Cho & Lee, 2014). We conducted open coding of the data, asking questions such as: What do the teachers' responses reveal about their views about the 'is' and the 'ought' of their professional autonomy? Do they see a difference between internal assessment (INA) and external assessment (EXA) as factors influencing their autonomy? This procedure resulted in preliminary categories. Next, we explored the data to identify commonalities, disparities, complementarities, and interconnections among the teachers, while also considering their individual characteristics. To ensure trustworthiness, the categories obtained from this procedure were abstracted by each researcher individually. We then compared notes and agreed on the final categorization scheme. The overarching categories addressing the two research questions relate to professional circumstances: the national education system and the school in which each teacher works. As informed by inductive data analysis methodology, the analysis process also revealed professional qualities that influence teachers' view of autonomy.
These were specifically identified by the teachers in the interviews and included professional confidence and a sense of purpose. The final categorial scheme is concerned not only with the individual categories but more significantly with their arrangement and interplay. Conclusions, Expected Outcomes or Findings Overall, our analysis shows that teachers' sense and preference of autonomy, as expressed in their response to the matriculation reform, stemmed from personal subjective interpretation of the objective circumstances of their professional environment. Despite diverse attitudes, the majority of teachers express a preference for autonomy, especially in assessment. Given the global teacher shortage and challenges in retaining high-quality teachers (García et al., 2022; Guthery & Bailes, 2022), recognizing that external assessments constrain teachers' experienced autonomy has significant implications for policymakers deciding on state assessments. The teachers highlighted the significance of two elements shaping their professional experience, and determining the degree of autonomy they have: the national education system and the school. They referred to assessment as a clear example of the complex interplay between those two elements; However, the teachers emphasized a holistic approach to autonomy, in which assessment cannot stand alone. For them, autonomy included curricular planning and assessment design together. Moreover, teachers’ appreciation of their autonomy is inspired by two professional qualities: confidence and a sense of purpose. This conclusion, regarding the relationship between teachers’ confidence, sense of purpose, and their views about autonomy, bares important conclusion for teacher professional learning and development, as well as for teacher education. We recognize the need for further elaboration of this conclusion, designing ways to enhance and promote these professional qualities as part of the shaping of professional identity of novice teachers, as well as that of experienced teachers . References Ball, S. (2008b). Performativity, privatisation, professionals and the state. In B. Cunningham (Ed.), Exploring professionalism (pp. 50–72). Institute of Education Day, C., & Smethem, L. (2009). The effects of reform: Have teachers really lost their sense of professionalism? Journal of Educational Change, 10, 141–157. Ben-Peretz, M. (2012). Accountability vs. teacher autonomy: An issue of balance. In The Routledge international handbook of teacher and school development (pp. 83-92). Routledge. Cho, J. Y., & Lee, E. H. (2014). Reducing confusion about grounded theory and qualitative content analysis: Similarities and differences. Qualitative report, 19(32), 1-20. Friedman, I. A. (1999). Teacher-perceived work autonomy: The concept and its measurement. Educational and Psychological Measurement, 59(1), 58-76. García, E., Han, E., & Weiss, E. (2022). Determinants of teacher attrition: Evidence from district-teacher matched data. Education Policy Analysis Archives, 30(25), n25. Guthery, S., & Bailes, L. P. (2022). Building experience and retention: the influence of principal tenure on teacher retention rates. Journal of Educational Administration, 60(4), 439-455. Hammersley-Fletcher, L., Kılıçoğlu, D., & Kılıçoğlu, G. (2021). Does autonomy exist? Comparing the autonomy of teachers and senior leaders in England and Turkey. Oxford Review of Education, 47(2), 189-206. Juntunen, M. L. (2017). 
National assessment meets teacher autonomy: national assessment of learning outcomes in music in Finnish basic education. Music Education Research, 19(1), 1-16. Lennert Da Silva, A. L. (2022). Comparing teacher autonomy in different models of educational governance. Nordic Journal of Studies in Educational Policy, 8(2), 103-118. Pearson, L. C., & Moomaw, W. (2005). The relationship between teacher autonomy and stress, work satisfaction, empowerment, and professionalism. Educational Research Quarterly, 29(1), 38-54. Sartre, J. P. (1946 / 2017). Existentialism is a humanism (C. Macomber, Trans.). Yale University Press. Schwab, J. J. (1973). The practical 3: Translation into curriculum. The School Review, 81(4), 501-522. Schwab, J. J. (1983). The practical 4: Something for curriculum professors to do. Curriculum Inquiry, 13(3), 239-265. Skaalvik, E. M., & Skaalvik, S. (2014). Teacher self-efficacy and perceived autonomy: Wermke, W., Olason Rick, S., & Salokangas, M. (2019). Decision-making and control: Perceived autonomy of teachers in Germany and Sweden. Journal of Curriculum Studies, 51(3), 306-325. Strong, L. E., & Yoshida, R. K. (2014). Teachers’ autonomy in today's educational climate: Current perceptions from an acceptable instrument. Educational Studies, 50(2), 123-145. 09. Assessment, Evaluation, Testing and Measurement
Paper How Well Can AI Identify Effective Teachers? 1Texas Tech University, United States of America; 2Gargani & Co Inc Presenting Author:We report ongoing research that assesses how well AI can evaluate teaching, which we define as “effective” to the degree it helps students learn. Our current research builds on a body of prior work in which we assessed how well human judges performed the same task. Under varying conditions (length of instructional sample; instruction documented as video, audio, and transcript; and judgments based on intuition alone, high-inference rubrics, and low-inference rubrics) human judges demonstrated significant limitations. Experts and nonexperts did no better than chance when they relied solely on their intuitive judgment. Experts fared no better when using high-inference rubrics. However, experts and nonexperts were more accurate than chance when they used low-inference rubrics, and just as accurate when using transcripts of instruction as when using video. Machines are very good at performing low-inference tasks, and AI in particular is very good at “understanding” written text, such as transcripts. Is AI better at judging teaching effectiveness from transcripts than humans? If so, should human judges be replaced by machines? We provide data that may help answer these questions, and engage our audience in a discussion of the moral dilemmas this poses. Methodology, Methods, Research Instruments or Sources Used We investigate two types of evaluative judgments—unstructured and structured. Unstructured judgments were investigated by asking subjects to “use what they know” to classify classroom instruction of known quality as being of either high or low effectiveness. Structured judgments were investigated by asking subjects to count the occurrences of six concrete teaching behaviors using the RATE rubric. The performance of two groups of subjects is compared—human judges and AI. The tasks with human subjects are replications of experiments we previously conducted and published (Strong et al., 2011; Gargani & Strong, 2014, 2015). We are, therefore, able to compare the performance of AI and humans on the same tasks at the same time, as well as to human judges in previous studies. A contribution of our work concerns the difficult problem of developing prompts for AI that instruct it to complete the evaluation tasks. Our protocol is iterative—we developed and piloted prompts, revised them, piloted again, and so on until satisfied that any failure to complete a task well would not be attributable to weaknesses in the prompts. We developed our own criteria for prompts, which we will share. One hundred human subjects were recruited to act as a benchmark for the AI, and they use an online platform to complete the tasks. Comparisons of accuracy and reliability will be made across groups and tasks, providing a basis for judging the relative success of AI and human judges. Conclusions, Expected Outcomes or Findings We hypothesize that the use of lesson transcripts versus video or audio only will reduce the sources of bias such that humans will be able to more accurately distinguish between above-average and below-average teachers. We further hypothesize that AI will be more accurate than humans, and can be successfully trained to produce reliable evaluations using a formal observation system. References Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage. Shrout, P. E., & Fleiss, J. L. (1979). 
Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428. Strong, M. (2011). The highly qualified teacher: What is teacher quality and how do we measure it? New York: Teachers College Press. Strong, M., Gargani, J., & Hacifazlioğlu, Ö. (2011). Do we know a successful teacher when we see one? Experiments in the identification of effective teachers. Journal of Teacher Education, 20(10), 1-16. 09. Assessment, Evaluation, Testing and Measurement
Paper The Evaluation of Online Content. Development and Empirical Evaluation of a Measurement Instrument for Primary School Children University of Wuerzburg, Germany Presenting Author:Children today are growing up in a digitally connected world, which sets them apart from previous generations. For example, 42% of 5- to 7-year-olds have their own tablet and 93% of 8- to 11-year-olds spend an average of 13.5 hours online (Ofcom 2022). Digital media provides opportunities for easier access to information and communication with peers. However, it also presents a range of risks, especially for children who are particularly vulnerable due to their young age. This becomes clear when they are confronted with violent, sexual, advertising, or judgmental content in the digital space (Livingstone et al. 2015). Other challenges in digital communication and information channels include fake news, propaganda and deepfakes. With regard to the aforementioned aspects, it is necessary to possess skills that enable a critical examination of information. For this reason, information evaluation is considered an important subskill for social participation and learning inside and outside of school. When examining the internet preferences of children and young people, it becomes apparent that they are primarily interested in extracurricular activities rather than child-friendly services commonly discussed in school settings, such as children's search engines. The top four internet activities include WhatsApp, watching films and videos, and using YouTube and search engines (Feierabend et al., 2023). In this respect, WhatsApp, YouTube, and TikTok are the most popular (social media) platforms (Reppert-Bismarck et al. 2019). The evaluation of content is not limited to online research alone. It can also occur in different scenarios, such as browsing the internet for entertainment or out of boredom. In this regard, the strategies for evaluating content vary depending on the purpose of the discussion (Weisberg et al. 2023), allowing the assessment of information, data, and content from different angles. One approach to evaluate content is to verify its credibility. In research literature, credibility encompasses multiple aspects. This includes assessing the trustworthiness of content, such as recognizing intention, as well as the expertise of the author. However, studies show that young people tend to lack critical evaluation skills when it comes to the credibility of online content (Kiili et al. 2018) and are also insufficiently prepared to verify the truthfulness of information (Hasebrink et al. 2019). In the context of social media in particular, the question of the realism of the shared content (e.g., factuality or plausibility) arises. Recipients are faced with the challenge of multiplicity resulting from the different ‘realities’ on social media. These realities are shaped by different motivations, attitudes, and political or social contexts which can blur boundaries (Cho et al. 2022). Overall, the evaluation process of online content is influenced by various factors. For instance, research suggests that reading competence affects the evaluation process. Furthermore, socioeconomic status has been found to influence the digitalization-related skills of young people (see ICILS results). Another important aspect to consider is the influence of platform-specific knowledge, such as understanding the YouTube algorithm, and topic-specific knowledge on content evaluation, such as the subject of a news video. 
In addition, the design of both the platform and the content can also have an impact. This includes factors such as image-to-text ratio, layout, effects, and the focus of the central message. To what extent these assumptions apply to primary school children is unclear, as most empirical results relate to adults or adolescents. Therefore, the overarching goal of the project is to develop a standardized measurement instrument for primary school children in order to assess to what extent they are able to evaluate internet content. The creation of a standardized measurement instrument involves several substeps, which are outlined below. Methodology, Methods, Research Instruments or Sources Used Model The development of a measurement instrument requires a theoretical and empirical foundation. We believe there is a limited number of models that specifically address the evaluation of online content by primary school children. Therefore, we examined constructs related to the subcompetence of 'evaluation' to develop a theoretically and empirically based measurement model. For this purpose, we used normatively formulated standards, theoretical models and empirical studies that systematize, assess, or discuss information, media, digital, internet and social media skills. The analysis of these constructs can yield various criteria for evaluating online content, such as credibility or realism. For instance, context is crucial when evaluating content (e.g., advertising content; Purington Drake et al., 2023). As most of these analyses do not relate to primary schools, all German curricula (e.g., based on DigComp, Ferrari 2013) were examined for relevant subcompetencies and content areas. The aim is to compare the research results with normative requirements in the primary school sector to ensure that competence targets are not set unrealistically high. Assessment instrument Based on the measurement model, we developed a digital performance test with 20 multiple-choice tasks. To increase content validity, the instrument includes multimodal test items from the age group's most popular platforms (e.g., YouTube). The operationalization includes phenomena that are platform-specific (e.g., clickbait). Assessment criteria were derived for each content area and subcompetency and adapted to the specific platform content, such as a promotional video with child influencers. Expert interviews in the children's online sector additionally contributed to the development of age-appropriate content and evaluation criteria (Brückner et al. 2020). Validation steps/procedures To validate the 20 test items, a qualitative comprehensibility analysis was conducted in small group discussions with school and university experts (n=12). Following that, five children worked through the test items while thinking aloud (Brandt and Moosbrugger 2020). Both validation steps led to linguistic and content-related adjustments. Pilot study An initial test of the measurement instrument was conducted with 81 pupils (56.8% female) in Grade 3/4 (M=10.4, SD=0.64). 57 children were given parental permission to provide information on their socioeconomic status (HISEI=47.44, SD=16.42). 51.9% predominantly speak another language at home. The aim of the pilot study was to perform an initial descriptive item analysis to determine task difficulty, variance, and selectivity. 
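As an illustration of such a descriptive item analysis, the following minimal Python sketch computes classical item difficulty (percentage correct) and corrected item-total correlations from hypothetical scored responses; the data and variable names are invented and the code is not the project's actual analysis:

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 scored responses: rows = pupils, columns = 20 test items.
rng = np.random.default_rng(0)
responses = pd.DataFrame(
    rng.binomial(1, 0.6, size=(81, 20)),
    columns=[f"item_{i+1}" for i in range(20)],
)

# Classical item difficulty P_i: percentage of correct answers per item.
difficulty = responses.mean() * 100

# Corrected item-total correlation r_it: each item correlated with the
# total score computed from the remaining items (rest score).
total = responses.sum(axis=1)
selectivity = pd.Series(
    {col: responses[col].corr(total - responses[col]) for col in responses.columns}
)

print(pd.DataFrame({"P_i": difficulty.round(1), "r_it": selectivity.round(2)}))
```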
The calculation of an overall score requires item homogeneity, wherein high selectivity indices serve as an initial indication (Kelava and Moosbrugger 2020). Conclusions, Expected Outcomes or Findings The results of the piloting showed that 15 out of 20 test items had a task difficulty of 45≤Pi≤78. Five items had a higher difficulty (25≤Pi≤39). These items primarily dealt with phishing, clickbait, the use of third-party data, and bots. The calculation of correlational relationships showed an inconsistent picture for the respective tasks, which resulted in low selectivity indices (rit<.3) in some cases. Due to the small sample size, it was not possible to definitively determine whether the data had a unidimensional or multidimensional structure (principal component analysis/varimax rotation). As a result, the selectivity was not further interpreted (Kelava and Moosbrugger 2020). It is not surprising that students struggled with test tasks involving deception and personality interference, as even adults find phenomena like bots to be challenging (Wineburg et al. 2019). This raises the question of whether this content is appropriate for primary schools despite its real-world relevance. Methodological challenges in investigating such phenomena and implications for school support are discussed in the main study. As a result of the pilot study, the five most challenging tasks were adjusted in terms of difficulty without altering the core content (e.g., linguistic adaptations of questions/answers, replacement of videos). To obtain precise information on unidimensionality, IRT models were utilized for data analysis in the main study (Kelava and Moosbrugger 2020). The data collection for the main study was completed in December 2023 (n=672) and aims to provide more precise insights into item and test quality. The quality results of the measurement instrument will be reported at the conference with a focus on the area of deception. This study raises the question of whether primary school children are able to evaluate deceptive content and what methodological challenges this poses for measurement. It will also investigate whether individual variables (socioeconomic status, migration history) influence the evaluation of deceptive content. References Brandt, Holger; Moosbrugger, Helfried (2020): Planungsaspekte und Konstruktionsphasen von Tests und Fragebogen. In: Helfried Moosbrugger und Augustin Kelava (Hg.): Testtheorie und Fragebogenkonstruktion. Berlin, Heidelberg: Springer Berlin Heidelberg, S. 41–66. Brückner, Sebastian; Zlatkin-Troitschanskaia, Olga; Pant, Hans Anand (2020): Standards für pädagogisches Testen. In: Helfried Moosbrugger und Augustin Kelava (Hg.): Testtheorie und Fragebogenkonstruktion. Berlin, Heidelberg: Springer Berlin Heidelberg, S. 217–248. Cho, Hyunyi; Cannon, Julie; Lopez, Rachel; Li, Wenbo (2022): Social media literacy: A conceptual framework. In: New Media & Society, 146144482110685. DOI: 10.1177/14614448211068530. Feierabend, Sabine; Rathgeb, Thomas; Kheredmand, Hediye; Glöckler, Stephan (2023): KIM-Studie 2022 Kindheit, Internet, Medien. Basisuntersuchung zum Medienumgang 6- bis 13-Jähriger. Hg. v. Medienpädagogischer Forschungsverbund Südwest (mpfs). Online verfügbar unter https://www.mpfs.de/studien/kim-studie/2022/. Ferrari, Anusca (2013): DIGCOMP: A Framework for Developing and Understanding Digital Competence in Europe. European Commission Joint Research Center. Online verfügbar unter https://publications.jrc.ec.europa.eu/repository/handle/JRC83167, zuletzt geprüft am 16.05.2023. 
Hasebrink, Uwe; Lampert, Claudia; Thiel, Kira (2019): Online-Erfahrungen von 9- bis 17-Jährigen. Ergebnisse der EU Kids Online-Befragung in Deutschland 2019. 2. überarb. Auflage. Hamburg: Verlag Hans-Bredow. Kelava, Augustin; Moosbrugger, Helfried (2020): Deskriptivstatistische Itemanalyse und Testwertbestimmung. In: Helfried Moosbrugger und Augustin Kelava (Hg.): Testtheorie und Fragebogenkonstruktion. Berlin, Heidelberg: Springer Berlin Heidelberg, S. 143–158. Kiili, Carita; Leu, Donald J.; Utriainen, Jukka; Coiro, Julie; Kanniainen, Laura; Tolvanen, Asko et al. (2018): Reading to Learn From Online Information: Modeling the Factor Structure. In: Journal of Literacy Research 50 (3), S. 304–334. DOI: 10.1177/1086296X18784640. Livingstone, S.; Mascheroni, G.; Staksrud, E. (2015): Developing a framework for researching children’s online risks and opportunities in Europe. EU Kids Online. Online verfügbar unter https://eprints.lse.ac.uk/64470/1/__lse.ac.uk_storage_LIBRARY_Secondary_libfile_shared_repository_Content_EU%20Kids%20Online_EU%20Kids%20Online_Developing%20framework%20for%20researching_2015.pdf, zuletzt geprüft am 11.01.2024. Ofcom (2022): Children and parents: media use and attitudes report. Online verfügbar unter https://www.ofcom.org.uk/__data/assets/pdf_file/0024/234609/childrens-media-use-and-attitudes-report-2022.pdf. Purington Drake, Amanda; Masur, Philipp K.; Bazarova, Natalie N.; Zou, Wenting; Whitlock, Janis (2023): The youth social media literacy inventory: development and validation using item response theory in the US. In: Journal of Children and Media, S. 1–21. DOI: 10.1080/17482798.2023.2230493. Reppert-Bismarck; Dombrowski, Tim; Prager, Thomas (2019): Tackling Disinformation Face to Face: Journalists' Findings From the Classroom. In: Lie Directors. Weisberg, Lauren; Wan, Xiaoman; Wusylko, Christine; Kohnen, Angela M. (2023): Critical Online Information Evaluation (COIE): A comprehensive model for curriculum and assessment design. In: JMLE 15 (1), S. 14–30. DOI: 10.23860/JMLE-2023-15-1-2. Wineburg, Sam; Breakstone, Joel; Smith, Mark; McGrew, Sarah; Ortega, Teresa (2019): Civic Online Reasoning: Curriculum Evaluation (working paper 2019-A2, Stanford History Education Group, Stanford University). Online verfügbar unter https://stacks.stanford.edu/file/druid:xr124mv4805/COR%20Curriculum%20Evaluation.pdf, zuletzt geprüft am 29.06.2023. 09. Assessment, Evaluation, Testing and Measurement
Paper A Multilevel Meta-Analysis of the Validity of Student Rating Scales in Teaching Evaluation. Which Psychometric Characteristics Matter Most? 1West University of Timisoara, Romania; 2University of Bucharest, Romania Presenting Author:Student Teaching Evaluation (STE) is the procedure by which teaching performance is measured and assessed through questionnaires administered to students. Typically, these questionnaires or scales refer to the teaching practices of academic staff and are conducted in one of the last meetings of the semester. Generally, and from a practical standpoint, the primary purpose of implementing this procedure is the necessity of universities to report STE results to quality assurance agencies. Another main objective of STE procedures, and certainly the most important from a pedagogical perspective, is to provide feedback to teachers about their teaching practices. Previous studies on the highlighted topic present arguments both for and against the validity and utility of STE. On one hand, there are studies suggesting that STE results are influenced by other external variables, such as the teacher's gender or ethnicity (e.g., Boring, 2017), lenient grading (e.g., Griffin, 2004), or even the teacher's personality (e.g., Clayson & Sheffet, 2006). On the other hand, there are published works showing that STE scales are valid and useful (e.g., Hammonds et al., 2017; Wright & Jenkins, 2012). Furthermore, when STE scales are rigorously developed and validated, as is the case with SEEQ (Marsh, 1982, 2009), there is a consistent level of agreement and evidence suggesting that STE scale scores are multidimensional, precise, valid, and relatively unaffected by other external variables (Marsh, 2007; Richardson, 2005; Spooren et al., 2013). Even though this debate was very active in the 1970s and the evidence leaned more in favor of STE validity (Richardson, 2005; Marsh, 2007), a recent meta-analysis (Uttl et al., 2017) presented evidence that seriously threatens the validity of STE results. They suggest that there is no relationship between STE results and student performance levels. The existence of this relationship is vital for the debate on STE validity, starting from the premise that if STE results accurately reflect good or efficient teaching, then teachers identified as more performant should facilitate a higher level of performance among their students. In light of all the above and referring to the results of the meta-analysis conducted by Uttl et al. (2017), the present study aims to investigate whether the relationship between STE results and student learning/performance is stronger when the STE scale used is more rigorously developed and validated. For this purpose, a multilevel meta-analysis was conducted, allowing us to consider multiple effect sizes for each study included in the analysis. The results of this study can be useful in nuancing the picture of the validity of STE scales, in the sense that they can show us whether scales developed and validated in accordance with field standards can measure the quality of teaching more correctly and precisely. Additionally, this research can help outline a picture of which psychometric characteristics of STE scales contribute to a better measurement of teaching efficiency/effectiveness. Therefore, the research questions guiding the present study are as follows:
Methodology, Methods, Research Instruments or Sources Used The present study is a multilevel meta-analysis on the relationship between STE (Student Teaching Evaluation) results and student performance in multi-section STE studies, and on the moderating effect of this relationship, of different psychometric characteristics (level and type of validity evidence of the STE scales, the content of dimensions, and the level of observability/clarity of the items) of the STE scales used in these studies. To be included in this meta-analysis, a study had to meet the following inclusion criteria: 1. Present correlational results between STE results and student performance. 2. Analyze the relationship between STE results and student performance in multiple sections of the same discipline (“multi-section STE studies”). 3. Students completed the same STE scale and the same performance assessment tests. 4. Student performance was measured through objective assessments focusing on actual learning, not students' perceptions of it. 5. The correlation between STE results and student performance was estimated using aggregate data at the section level, not at the individual student level. The search for studies in the specialized literature was conducted through three procedures: 1) analysis of the reference list of similar meta-analyses; 2) examination of all articles citing Uttl (2017); 3) use of a search algorithm in the Academic Search Complete, Scopus, PsycINFO, and ERIC databases. After analyzing the abstracts and reading the full text of promising studies, 43 studies were identified and extracted that met the inclusion criteria. For coding the level of validity evidence of the STE measures used, we adapted a specific framework of psychometric evaluation criteria, proposed by Hunsley & Mash (2008). In adapting the previously mentioned evaluation framework, the recommendations put forth by Onwuegbuzie (2009) and the recommendations of AERA, APA & NCME (2014) were also considered. For coding the level of observability/clarity of the items that make up the STE scales used in the analyzed studies, we created a coding grid based on Murray (2007), which presents and explains the importance of using items with a high degree of measurability to reduce the subjectivity of the students responding to these items. The data were analyzed in R (metafor package) using the multilevel meta-analysis technique because most of the included studies report multiple effect sizes, usually one for each dimension of the STE scale. This type of analysis helps to better calculate average effects, starting from the original structure of the data presented in the primary studies. Conclusions, Expected Outcomes or Findings The obtained results suggest that: 1) STE (Student Teaching Evaluation) scales with more validity evidence tend to measure teaching effectiveness better; 2) there is a set of dimensions that are more suitable than others for correctly measuring teaching effectiveness (for example, clarity of presentation, instructor enthusiasm, interaction with students, and availability for support had the strongest relationships with performance); and 3) the degree of observability of the items that make up the STE scales is a major factor regarding the ability of these scales to accurately measure teaching effectiveness. 
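To make the pooling step concrete, here is a minimal Python sketch of a single-level random-effects pooling of section-level correlations; it is a simplified stand-in for the multilevel model fitted with R's metafor package in the study, and the correlations and section counts below are invented rather than taken from the 43 included studies:

```python
import numpy as np

# Hypothetical section-level correlations between STE ratings and student
# performance, with the number of sections behind each correlation.
r = np.array([0.35, 0.12, 0.48, 0.20, 0.05])
n = np.array([25, 40, 18, 60, 33])

# Fisher z-transform so effect sizes are approximately normal.
z = np.arctanh(r)
v = 1 / (n - 3)                              # sampling variance of z

# DerSimonian-Laird estimate of between-study variance (tau^2).
w = 1 / v
z_fixed = np.sum(w * z) / np.sum(w)
q = np.sum(w * (z - z_fixed) ** 2)
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (q - (len(z) - 1)) / c)

# Random-effects pooled correlation, back-transformed to the r metric.
w_re = 1 / (v + tau2)
z_pooled = np.sum(w_re * z) / np.sum(w_re)
print(f"pooled r = {np.tanh(z_pooled):.3f}, tau^2 = {tau2:.3f}")
```

A full multilevel analysis would additionally model the dependency of several effect sizes nested within the same study, which is what the metafor-based model described above does.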
Regarding the level of observability of the items contained in the STE scales, they were divided into 3 categories (low/medium/high observability) and the relationship between STE results and student performance was comparatively analyzed for each category. As expected, the moderating effect is significant, meaning that there are significant differences between the correlations obtained within each category of studies. The strongest relationships exist in the case of items with a high degree of observability, and as this degree of observability decreases, the intensity of the correlation between STE results and student performance also significantly decreases. These results can help nuance the picture of the validity of STE scales, suggesting that STE scales developed and validated in accordance with the standards of the field can measure the quality of teaching more correctly and precisely. It can also be said that the proposed dimensionality and the level of observability of the items are of major importance in the development of any STE scale. These recommendations can be useful in any process of development or adaptation of an STE scale for use in the process of ensuring the quality of teaching in the university environment. References American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association. Boring, A. (2017). Gender biases in student evaluations of teaching. Journal of public economics, 145, 27-41. Clayson, D. E., & Sheffet, M. J. (2006). Personality and the student evaluation of teaching. Journal of Marketing Education, 28, 149–160. Griffin, B. W. (2004). Grading leniency, grade discrepancy, and student ratings of instruction. Contemporary Educational Psychology, 29, 410–425. Hammonds, F., Mariano, G. J., Ammons, G., & Chambers, S. (2017). Student evaluations of teaching: improving teaching quality in higher education. Perspectives: Policy and Practice in Higher Education, 21(1), 26-33. Hunsley, J., & Mash, E. J. (2008). Developing criteria for evidence-based assessment: An introduction to assessments that work. A guide to assessments that work, 2008, 3-14. Marsh, H. W. (2007). Students’ evaluations of university teaching: Dimensionality, reliability, validity, potential biases and usefulness. In P.R., Pintrich & A. Zusho (Coord.), The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 319-383). Springer, Dordrecht. McPherson, M. A., Todd Jewell, R., & Kim, M. (2009). What determines student evaluation scores? A random effects analysis of undergraduate economics classes. Eastern Economic Journal, 35, 37–51. Onwuegbuzie, A. J., Daniel, L. G., & Collins, K. M. (2009). A meta-validation model for assessing the score-validity of student teaching evaluations. Quality & Quantity, 43(2), 197-209. Richardson, J. T. (2005). Instruments for obtaining student feedback: A review of the literature. Assessment & evaluation in higher education, 30(4), 387-415. Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598-642. Spooren, P., Vandermoere, F., Vanderstraeten, R., & Pepermans, K. (2017). Exploring high impact scholarship in research on student's evaluation of teaching (SET). Educational Research Review, 22, 129-141. Uttl, B., White, C. A., & Gonzalez, D. W. 
(2017). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22-42. Wright, S. L., & Jenkins-Guarnieri, M. A. (2012). Student evaluations of teaching: Combining the meta-analyses and demonstrating further evidence for effective use. Assessment & Evaluation in Higher Education, 37(6), 683-699. |
Date: Friday, 30/Aug/2024 | |
9:30 - 11:00 | 09 SES 14 B: Educational Justice in Kosovo Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor] Session Chair: Heike Wendt Symposium |
|
09. Assessment, Evaluation, Testing and Measurement
Symposium Educational Justice in Kosovo In recent years, Kosovo has implemented a number of measures to improve quality assurance mechanisms in its education system. Participation in large-scale international comparative assessments is part of a monitoring strategy to compare the educational performance of Kosovo's primary and secondary school students with that of children and young adults in neighboring countries, the region, the European Union and other parts of the world, and as such has served as a key indicator of educational quality (MEST, 2020). International monitoring reports and the few academic studies that have been published reveal substantial differences in educational achievement at both primary and secondary levels in terms of student background and family indicators, place of residence (urban vs. rural areas) and school type (OECD, 2023; Mullis et al., 2017). However, in Kosovo, the rich sources of data obtained through participation in international large-scale assessments remain underutilized for educational research and monitoring, and thus have little to offer for evidence-based policy making in education. To date, inequalities have only been partially documented for achievement, but not for other important outcomes of schooling (Pavesic et al., 2022). Moreover, the causes of inequality have not been systematically investigated. This symposium aims to bring together different perspectives to provide a more coherent picture of educational equity and quality in Kosovo. It will also broaden the perspective on educational equity: to date, large-scale educational monitoring studies have mainly been used to analyse variations in student performance and equity within and across education systems over time. Other perspectives on educational equity, such as participation and recognition equity, are often neglected. These perspectives not only take into account achievement levels, variances or minimum competence levels, but also focus on capabilities for social and political participation and on factors such as the quality of social relations, the recognition of individual autonomy and voice, and well-being as an end in itself. On the basis of the four papers, inequality will be examined in terms of a) approaches to teaching, b) the relationship between motivation and achievement, c) explanations of gender differences, and d) comparisons with neighboring countries. The symposium will thus provide a basis for a critical review of the extent to which large-scale studies such as TIMSS and PISA incorporate different foci of educational equity in their concepts, indicators and analytical approaches. The relevance and challenges of including valid and reliable measures of equity concepts in the indicator set of these studies will be illustrated by discussing some secondary analyses of LSA data for Kosovo. References MEST (2022). Assessment Report for 2019 on the Kosovo Education Strategic Plan 2017-2021. Prishtina. Assessment Report for 2019 on the Kosovo Education Strategic Plan 2017-2021 - MASHT (rks-gov.net) Mullis, I. V. S., Martin, M. O., Foy, P., Olson, J. F. & Preuschoff, C. (2017). PIRLS 2016 International Results in Reading: Findings from IEA's trend in international mathematics and science study at the fourth and eighth grades. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College. https://timss.bc.edu/TIMSS2007/mathreport.html OECD. (2023). Programme for International Student Assessment. OECD. 
PISA - PISA (oecd.org) Pavešic, B., Koršňáková, P., & Meinck, S. (2022). Dinaric Perspectives on TIMSS 2019: Teaching and Learning Mathematics and Science in South-Eastern Europe. Springer Nature. https://doi.org/10.1007/978-3-030-85802-5 Presentations of the Symposium Student-centered Teaching Practices to Enhance Students’ Reading Performance
Teacher instructional practices are considered amongst the main determinants of student achievement (Cordero & Gil-Izquierdo, 2018). Moreover, depending on the practices used, teachers can either weaken or promote student achievement (Caro et al., 2016; Hwang et al., 2018). Since student achievement is positively related to teacher practices, the impact of different instructional practices on student achievement represents a topic of great relevance for educational equity (Le Donné et al., 2016). This study explores the prevalence of different instructional practices in classrooms and their association with students' reading achievement, focusing on the 2018 PISA results in Kosovo. Drawing on a dataset of 3,906 students, the research employs exploratory factor analysis to identify three latent variables representing student-centered instruction: individualized learning instructional practices (ILIP), research-based instructional practices (RBIP), and feedback-oriented instructional practices (FOIP). The study aims to answer two main research questions: (1) Which instructional practices are prevalent among teachers in Kosovo classrooms? (2) How do students of different reading proficiency levels perceive ILIP and FOIP, and is there a significant difference in reading performance among students exposed to different instructional practices? Results indicate that ILIP is the most prevalent instructional practice, followed by FOIP and RBIP. Students predominantly report occasional or minimal exposure to ILIP and FOIP. The study also reveals negative correlations of ILIP and RBIP with FOIP, suggesting a potential trade-off between student-centered and teacher-oriented practices. Benchmarking analyses demonstrate an equal distribution of students across FOIP and ILIP categories based on reading scores, indicating that exposure alone does not guarantee higher performance. However, RBIP stands out, showing a positive correlation with better reading scores, even at low exposure levels. This research contributes to the ongoing discourse on educational equity by examining the relationship between teaching practices and student outcomes in Kosovo's education system. The findings underscore the importance of considering various instructional practices in the pursuit of equitable education and inform policy discussions around teacher professional development. The study emphasizes the need for continued efforts to align teaching practices with the goals of the education reform implemented in Kosovo, fostering a student-centered approach to enhance reading performance and reduce achievement gaps.
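As an illustration of the factor-analytic step, the minimal Python sketch below extracts three rotated factors from hypothetical questionnaire items; the item names, the varimax rotation, and the labeling of the factors as ILIP, RBIP and FOIP are assumptions for illustration rather than the study's actual specification:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# Hypothetical 4-point responses to nine instructional-practice items
# (stand-ins for the PISA 2018 student questionnaire items used in the study).
rng = np.random.default_rng(42)
items = [f"q{i+1}" for i in range(9)]
data = pd.DataFrame(rng.integers(1, 5, size=(500, 9)), columns=items)

# Extract three latent factors, mirroring ILIP, RBIP and FOIP.
fa = FactorAnalysis(n_components=3, rotation="varimax")
fa.fit(data)

loadings = pd.DataFrame(
    fa.components_.T, index=items, columns=["ILIP", "RBIP", "FOIP"]
)
print(loadings.round(2))

# Factor scores per student, which can then be related to reading achievement.
scores = pd.DataFrame(fa.transform(data), columns=["ILIP", "RBIP", "FOIP"])
```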
References:
Caro, D. H., Lenkeit, J., & Kyriakides, L. (2016). Teaching strategies and differential effectiveness across learning contexts: Evidence from PISA 2012. Studies in Educational Evaluation, 49, 30–41.
Cordero, J. M., & Gil-Izquierdo, M. (2018). The effect of teaching strategies on student achievement: An analysis using TALIS-PISA-link. Journal of Policy Modeling, 40, 1313–1331.
Hwang, J., Choi, K. M., Bae, Y., and Shin, D. H. (2018). Do teachers’ instructional practices moderate equity in mathematical and scientific literacy?: An investigation of the PISA 2012 and 2015. Int J of Sci and Math Educ 16, 25–45.
Le Donné, N., Fraser, P., & Bousquet, G. (2016). Teaching strategies for instructional quality: Insights from the TALIS-PISA link data. OECD Education Working Papers, No. 148, OECD Publishing.
Withdrawn
Sub-paper had to be withdrawn.
Kosovan Perspective on Gender Equity
It is not only performance and the skills acquired that are of great importance within school and teaching, but also the students' attitudes towards the respective subjects. Learners' self-concepts, motivation and emotions are fundamental factors in learning; this is also evidenced by the performance of the students (OECD, 2023). In terms of gender, international studies report that girls are better readers than boys, partly due to differences in motivation and contextual effects. In addition, girls tend to have more positive attitudes towards reading and consider themselves to be more literate than boys (Mullis et al., 2017; OECD, 2023). Several studies identified the following characteristics as the cause of girls' better grade point averages: girls have higher self-discipline, self-control and self-regulation (Weis et al., 2013) and a higher interest in school in general (Houtte, 2004); they exert themselves more and work more while disrupting lessons less (Downey & Vogt Yuan, 2005); and they are less avoidant of work, show less problem behaviour and show better social behaviour (DiPrete & Jennings, 2012). Although the additional effort of girls is mentioned here as the cause of the gender differences, the costs borne by girls are hardly taken into account. Therefore, the focus of this article is the elaboration of gender differences in reading-related self-concept and reading competencies, taking into account the family background of 15-year-old students in Kosovo. For this purpose, data from the PISA 2018 study are analyzed using regression analysis with the IEA IDB Analyzer. Rather than testing factual knowledge, PISA tests students' ability to apply and connect such knowledge. The study is conducted every three years and covers three areas: reading, mathematics and science, with reading being the focus of the assessment in 2018 (OECD, 2023). The results show that although girls perform better in reading and have a higher reading-related self-concept, even taking into account their family background, this is associated with higher costs for these girls, as they also report greater fear of failure.
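To illustrate the logic behind such an analysis, the sketch below runs a weighted regression of reading achievement on gender and socioeconomic background separately for each of five plausible values and averages the coefficients, which mirrors what the IEA IDB Analyzer does internally; the variable names and data are invented, and the replicate-weight standard errors used in the real analysis are omitted:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical PISA-style student file (column names assumed, not real data):
# five reading plausible values, a girl indicator, an SES index, a final weight.
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "girl": rng.integers(0, 2, n),
    "escs": rng.normal(0, 1, n),
    "w_fstuwt": rng.uniform(0.5, 2.0, n),
})
for i in range(1, 6):
    df[f"pv{i}read"] = 450 + 15 * df["girl"] + 20 * df["escs"] + rng.normal(0, 80, n)

# One weighted regression per plausible value; estimates are then averaged.
X = sm.add_constant(df[["girl", "escs"]])
coefs = []
for i in range(1, 6):
    fit = sm.WLS(df[f"pv{i}read"], X, weights=df["w_fstuwt"]).fit()
    coefs.append(fit.params)

pooled = pd.concat(coefs, axis=1).mean(axis=1)
print(pooled.round(1))   # pooled gender and SES coefficients on reading
```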
References:
DiPrete, T. A. & Jennings, J. L. (2012). Social and behavioral skills and the gender gap in early educational achievement. Social Science Research, 41 (1), 1–15.
Downey, D. B. & Vogt Yuan, A. S. (2005). Sex differences in school performance during high school: Puzzling patterns and possible explanations. The Sociological Quarterly, 46 (2), 299–321
Houtte, M. v. (2004). Why boys achieve less at school than girls: The difference between boys’ and girls’ academic culture. Educational Studies, 30 (2), 159–173.
Mullis, I. V. S., Martin, M. O., Foy, P., Olson, J. F. & Preuschoff, C. (2017). PIRLS 2016 International Results in Reading: Findings from IEA's trend in international mathematics and science study at the fourth and eighth grades. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College. https://timss.bc.edu/TIMSS2007/mathreport.html
OECD. (2023). Programme for International Student Assessment. OECD. PISA - PISA (oecd.org)
Weis, M., Heikamp, T. & Trommsdorff, G. (2013). Gender differences in school achievement: The role of self-regulation. Frontiers in Psychology, 4 (442), 1–10.
Cross-cultural validity: Educational Justice in the Balkan region
Empirical educational research has shown that social inequalities influence students' educational achievement. Different theoretical frameworks, such as Bourdieu's capital theory, refer to how different types of capital can be related to achievement. For several countries, the correlation of socio-economic status and students' educational achievement has been shown (Wendt et al., 2012). About 23% of people in Kosovo live in poverty and the GDP is a quarter of the European average (UNICEF, 2021), which indicates a high level of social inequalities. The fact that post-conflict countries, such as Kosovo, have less beneficial conditions and are therefore also associated with lower levels of educational achievement is shown, for example, by the results of the TIMSS study (Mullis et al., 2020). However, analyses show that social inequality in Kosovo is quite small (Wendt et al., i.p.). This raises the question of whether the operationalization of socio-economic status in post-conflict countries can take place in the same way as in other countries and whether constructs developed in one culture are valid for other cultures (Matsumoto, 2003). In this article we therefore analyze the operationalization of the socio-economic status indicators. As a theoretical framework we used Bourdieu's (2003, 2012) theory of capital.
Therefore, we conducted secondary analyses of the TIMSS 2019 data for Kosovo at grade 4 to analyze the extent of differences in mathematics and science performance that can be explained by Bourdieu's theory of capital. We use data from the household survey and the student questionnaire, self-reported by parents and students (n = 4,496 students; mean age 9.9). We conducted multivariate regression analysis using the IEA IDB Analyzer.
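A minimal sketch of what such a multivariate regression can look like is given below; the TIMSS-style variable names, their coding, and the simulated effects are assumptions for illustration, and a full analysis would also pool all five plausible values, apply sampling weights, and use the IDB Analyzer's replicate-weight variance estimation:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical grade-4 records with capital indicators (names assumed):
# parental education, books at home, coming to school hungry, and whether
# the test language is spoken at home, plus one mathematics plausible value.
rng = np.random.default_rng(7)
n = 4496
df = pd.DataFrame({
    "parent_edu": rng.integers(1, 6, n),      # 1 = low ... 5 = university
    "books": rng.integers(1, 6, n),           # 1 = 0-10 books ... 5 = >200 books
    "hungry": rng.integers(0, 2, n),          # 1 = comes to school hungry
    "test_lang_home": rng.integers(0, 2, n),  # 1 = speaks test language at home
})
df["math_pv1"] = (400 + 8 * df["parent_edu"] + 7 * df["books"]
                  - 30 * df["hungry"] + 15 * df["test_lang_home"]
                  + rng.normal(0, 70, n))

# Multivariate regression of achievement on the capital indicators.
model = smf.ols(
    "math_pv1 ~ parent_edu + books + hungry + test_lang_home", data=df
).fit()
print(model.params.round(1))
print(f"R squared: {model.rsquared:.2f}")  # share of variance explained
```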
We found a significant relationship for parental education level and number of books for both mathematics and science. A lack of economic resources at home is negatively related to mathematics and science achievement. However, the largest difference in achievement is found among children who come to school hungry, with a difference of 31.5 points in science. The "effect" of language practices on science achievement remains significant when controlling for cultural and economic resources. This can be seen as a first indication that the cultural capital acquired by parents through conflict-related migration makes an independent explanatory contribution to the differences in their children's performance. The overall explanatory power of this model is limited: it explains only about 13% of the variation in student performance, indicating that other important factors may not be accounted for.
References:
Boudon, R. (1974). Education, Opportunity, and Social Inequality. Changing Prospects in Western Society. Wiley.
Bourdieu, P. (2003). Interventionen, 1961-2001: Sozialwissenschaft und politisches Handeln. Raisons d'agir. VSA-Verlag.
Bourdieu, P. (2012). Ökonomisches Kapital, kulturelles Kapital, soziales Kapital. In U. Bauer, U. H. Bittlingmayer, & A. Scherr (Eds.), Handbuch Bildungs- und Erziehungssoziologie (pp. 229–242). VS Verlag für Sozialwissenschaften. https://doi.org/10.1007/978-3-531-18944-4_15
Matsumoto, D. (2003). Cross‐cultural Research. In S. F. Davis (Ed.), Handbook of Research Methods in Experimental Psychology (pp. 189–208). Wiley. https://doi.org/10.1002/9780470756973.ch9
UNICEF. (2021). Annual Report. https://www.unicef.org/kosovoprogramme/media/2931/file/English-2022.pdf
Wendt, H., Stubbe, T., & Schwippert, K. (2012). Soziale Herkunft und Lesekompetenzen von Schülerinnen und Schülern. In W. Bos, I. Tarelli, A. Bremerich-Vos, & K. Schwippert (Eds.), IGLU 2011. Lesekompetenzen von Grundschulkindern in Deutschland im internationalen Vergleich (pp. 175–190). Waxmann Verlag.
|
11:30 - 13:00 | 09 SES 16 B: Exploring Factors Influencing Academic Achievement and Motivation Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor] Session Chair: Mari-Pauliina Vainikainen Paper Session |
|
09. Assessment, Evaluation, Testing and Measurement
Paper Development of Cognitive Learning to Learn Competences, Learning-related Beliefs, and School Achievement Through the Nine-year Basic Education in Finland 1University of Helsinki, Finland; 2Tampere University, Finland; 3University of Turku, Finland Presenting Author:Learning to learn skills are fundamental cognitive, metacognitive, motivational, and affective resources to help reach a learning goal (James, 2023). Acquiring these skills and abilities is vital for lifelong learning in the 21st century. The Finnish Learning to Learn (L2L: Hautamäki, 2002; Vainikainen & Hautamäki, 2022) scales have been developed and utilised in national and regional assessments since the late 1990s. They cover general cognitive competences needed in different school subjects, such as reading comprehension, mathematical thinking skills, general thinking and reasoning skills, and problem-solving. This paper reports on a longitudinal L2L study, in which around 1000 children were followed through the nine-year basic education in Finland. Longitudinal studies can collect a broad range of information and provide unique insight into the importance of cognitive development in the early stages of education, identify connections between student abilities and academic achievement, and allow for adjustments to the pedagogical process throughout schooling. Studying the characteristics of stability and trends in the development of cognitive abilities in different age groups makes it possible to identify the weakest points and direct pedagogical efforts to increase the level of abilities and motivation (Metsämuuronen & Tuohilampi, 2014). The level of development of cognitive abilities largely determines performance in mathematics and other subjects and seems to influence children's goal orientation in learning (Mägi et al., 2010; Williams & Williams, 2010). Longitudinal assessments of them also make it possible to identify certain trends in the development of certain skills at different age periods, which must be taken into account in the diagnosis and evaluation of the learning process (Weinstein, 2015). The present study focuses on the development and changes in the cross-curricular cognitive competences and learning-related beliefs measured by the Finnish L2L scales. We also study how they are reflected in pupils' school achievement as measured by grade point average (GPA). We aim to analyse how individual and group-level differences develop from when the pupils enter the formal education system until they complete basic education and move to the tracked upper secondary education. We answer the following questions: 1. How are the cognitive L2L competences, learning-related beliefs and school achievement connected and how do they influence each other over the years during basic education? 2. How stable are the individual and group-level trends observed in cognitive L2L competences, learning-related beliefs and school achievement throughout the school years? Methodology, Methods, Research Instruments or Sources Used A nine-year longitudinal L2L study was conducted in one large Finnish municipality starting in 16 randomly sampled schools with 744 first grade pupils. For the second measurement, 4 new schools were included, making the pupil-level sample size around 1000. Assessments were conducted on multiple occasions including the 1st, 4th, 6th, and 9th grade assessments reported in this paper. At the beginning of the first school year, the pupils completed a learning preparedness test. 
In the subsequent assessments, they completed mathematical thinking, reading comprehension, and general reasoning subscales of the Finnish learning-to-learn test, and answered questionnaires about their learning-related beliefs. In this paper, we used the subscale measuring pupils’ agency beliefs of effort based on Skinner's action-control theory (Skinner et al., 1988). The pupils rated themselves in relation to presented statements on a 7-point Likert scale. For the cognitive test and GPA, we calculated a manifest average score over different domains/subjects for each measurement point. Learning-related beliefs were included in the models as latent factors. The 1st grade learning preparedness test was used in the model as a latent factor consisting of three subscores (analogical reasoning; visuo-spatial memory; following instructions and inductively reasoning the applied rule). We specified a cross-lagged panel model in Mplus 8 to study the interrelations of the 4th, 6th, and 9th grade cognitive competences, learning-related beliefs and GPA. In addition, we predicted the 4th grade variables by the latent 1st grade learning preparedness test score. Before specifying the full model, we tested measurement invariance of latent factors over time and groups by constraining factor loadings and intercepts stepwise and studying the change in fit indices. In general, we used RMSEA < .06 and CFI and TLI > .95 (Kline, 2005) as criteria for a good model. We first ran the model in the full data, and after that we performed multiple-group comparisons. Conclusions, Expected Outcomes or Findings We first focused on studying the level of cognitive competences, learning-related beliefs and GPA over the years. As expected based on earlier literature, pupils’ cognitive competences considerably improved, but the level of learning-related beliefs declined from the 4th to the 9th grade. The cognitive differences between pupils observed when the pupils started their school path seemed relatively stable over time, as in the cross-lagged panel model (CFI = .984, TLI = .979, RMSEA = .026, p < .001), the first grade learning preparedness test score predicted 4th grade performance very strongly (β=.82), and there was a relatively strong connection between the test scores of subsequent assessments as well. The first grade learning preparedness predicted fourth grade GPA (β=.44), and GPA also seemed to be very stable over the years. Learning-related beliefs, on the contrary, were not predicted by learning preparedness in the fourth grade, and their connections with the other variables in the model were weak. However, the connections strengthened over time as pupils’ self-evaluation skills improved and the overly positive evaluations declined by the sixth grade. Overall, learning-related beliefs seemed to be somewhat more connected with GPA than cognitive competences, perhaps indicating that pupils are to some extent rewarded for the effort they put into schoolwork regardless of the cognitive outcomes. We also found some cross-lagged effects over time, and in the next stage, we will focus on studying these in multiple-group analyses based on competence levels and gender. References Hautamäki, J., Arinen, P., Eronen, S., Hautamäki, A., Kupiainen, S., Lindblom, B., & Scheinin, P. (2002). Assessing learning-to-learn: A framework. National Board of Education, Evaluation 4/2002. James, M. (2023). Assessing and learning, and learning to learn. International Encyclopedia of Education (Fourth Edition), pp. 10-20. 
https://doi.org/10.1016/b978-0-12-818630-5.09015-1. James, M. (2010). An overview of Educational Assessment. In: P. Peterson, E. Baker & B. McGaw (Eds.), International Encyclopedia of Education. Vol. 3: 161-171. Oxford: Elsevier. Marsh, H. W., Byrne, B. M., & Shavelson, R. J. (1988). A Multifaceted Academic Self-Concept: Its Hierarchical Structure and Its Relation to Academic Achievement. Journal of Educational Psychology, 82(4), 623–636. https://doi.org/10.1037/0022-0663.80.3.366 Metsämuuronen, J., & Tuohilampi, L. (2014). Changes in Achievement in and Attitude toward Mathematics of the Finnish Children from Grade 0 to 9—A Longitudinal Study. Journal of Educational and Developmental Psychology, 4(2), 145-169. https://doi.org/10.5539/jedp.v4n2p145 Mägi, K., Lerkkanen, M.-K., Poikkeus, A.-M., Rasku-Puttonen, H., & Kikas, E. (2010). Relations between achievement goal orientations and math achievement in primary grades: A follow-up study. Scandinavian Journal of Educational Research, 54(3), 295‒312. Skinner, E. A., Chapman, M., & Baltes, P. B. (1988). Control, means-ends, and agency beliefs: A new conceptualization and its measurement during childhood. Journal of Personality and Social Psychology, 54(1), 117–133. https://doi.org/10.1037/0022-3514.54.1.117 Vainikainen, M.-P., & Hautamäki, J. (2022). Three studies on learning to learn in Finland: Anti-Flynn effects 2001-2017. Scandinavian Journal of Educational Research, 66(1), 43-58. https://doi.org/10.1080/00313831.2020.1833240 Weinstein, C. E., Krause, J., Stano, N., Acee, T., & Jaimie, R. (2015). Learning to Learn. International Encyclopedia of the Social & Behavioral Sciences (Second Edition), pp. 712-719. Williams, T., & Williams, K. (2010). Self-efficacy and performance in mathematics: Reciprocal determinism in 33 nations. Journal of Educational Psychology, 102(2), 453-466. http://dx.doi.org/10.1037/a0017271 09. Assessment, Evaluation, Testing and Measurement
Paper Motivation Profiles as Explanatory Factors of Task Behaviour and Student Performance 1University of Helsinki, Finland; 2University of Turku, Finland; 3Tampere University, Finland Presenting Author:Students’ effort and the motivational factors behind it have an essential role in determining how students approach new tasks and perform in them (e.g., Kupiainen et al., 2014). Together, they affect the ability to apply the cognitive processes fundamental to identifying problems and designing and applying solutions (Kong & Abelson, 2019; Skinner et al., 1998). These processes have traditionally been measured and evaluated through self-reports and observation. While these methods undoubtedly have an important place in the human sciences, they have challenges regarding validity and large sample sizes. One solution to these challenges is technology's potential to allow seamless data collection from individuals in digital environments without disrupting their natural activities (Wise & Gao, 2017). Hence, this paper focuses on investigating what time on task, number of trials, and use of problem-solving strategies in different tasks tell us about student performance, and whether the results in different tasks are consistent with each other. The relations between these task behavior indicators are examined from the perspective of the motivational profiles students may hold, by examining whether the profiles differ in this matter. In this study, the focus is on students' control-related beliefs within the framework of Action-Control Theory (Skinner et al., 1988). According to the theory, perceived control encompasses beliefs about the relation of agents, means, and ends, shaping a student's perception of how school outcomes are achieved and the extent to which they are actively involved. These beliefs have been found to be related to school achievement to varying degrees and with varying hindering or fostering effects. Accordingly, while some students with beliefs that have been shown to positively predict school performance have done well, other students with similarly above-average beliefs have done less well, highlighting the existence and importance of different combinations of beliefs when considering their association with motivational orientation and performance (Malmberg & Little, 2007). Treating time use as a measure of motivational investment in a task is grounded in Carroll's Model of School Learning (Carroll, 1989). According to the model, students vary in the time they need to learn, which in turn depends on students' aptitude for the task, their ability to understand instruction, and the quality of instruction. Higher aptitude corresponds to shorter learning times, while lower aptitude may require more effort. The time students ultimately invest in learning is composed of the time allocated for learning and the time students are willing to dedicate. The required time, the time spent, and the quality of instruction act as the determinants of the level of learning (Kupiainen et al., 2014). Computer-based assessment (CBA) research has confirmed that students' too short time on task indicates a lack of effort and task commitment (e.g., Wise & Gao, 2017). This results from reacting too quickly compared to the time needed for a proper task solution (Schnipke, 1995). This supports findings in problem-solving tasks indicating that at every ability level longer response times positively correlate with correct answers as task difficulty increases (Goldhammer et al., 2014). The study delves into the diverse strategies individuals employ during problem-solving that guide the problem-solving process and ultimately influence how effectively they navigate problem-solving situations (Stubbart & Ramaprasad, 1990). Some problems may require multiple trials and inductive reasoning, while in other problems the most appropriate way is to test how individual variables affect the outcome, isolating the effect of other variables. CBA enables the exploration of these strategies by utilizing log data collected during tasks, which has been done in the past particularly for studying the differentiation of the effect of variables in solving more complex problems (e.g., Greiff et al., 2016). 
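As a rough illustration of how such indicators can be derived from log data, the Python sketch below computes time on task, number of trials, and the share of vary-one-thing-at-a-time (VOTAT) trials from a hypothetical log table; the column names and the VOTAT rule used here (exactly one setting changed relative to the previous trial) are assumptions for illustration, not the project's actual coding scheme:

```python
import pandas as pd

# Hypothetical log records from a VOTAT-style task: one row per trial,
# with the settings a student chose and a timestamp in seconds.
log = pd.DataFrame({
    "student": [1, 1, 1, 2, 2],
    "time_s":  [12, 34, 51, 20, 95],
    "water":   [1, 2, 3, 1, 3],
    "light":   [1, 1, 1, 2, 3],
    "soil":    [2, 2, 2, 1, 3],
})

settings = ["water", "light", "soil"]

def indicators(trials: pd.DataFrame) -> pd.Series:
    """Time on task, number of trials, and share of VOTAT-consistent trials."""
    changed = trials[settings].diff().ne(0).sum(axis=1).iloc[1:]  # settings changed
    votat_share = (changed == 1).mean() if len(changed) else 0.0
    return pd.Series({
        "time_on_task": trials["time_s"].max() - trials["time_s"].min(),
        "n_trials": len(trials),
        "votat_share": round(float(votat_share), 2),
    })

print(log.groupby("student").apply(indicators))
```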
The study delves into the diverse strategies individuals employ during problem-solving that guide the problem-solving process and ultimately influence how effectively they navigate problem-solving situations (Stubbart & Ramaprasad, 1990). Some problems may require multiple trials and inductive reasoning, while in other problems the most appropriate way is to test how individual variables affect the outcome, isolating the effect of other variables. CBA enables the exploration of these strategies by utilizing log data collected during tasks, which have been done in the past, particularly for studying the differentiation of the effect of variables in solving more complex problems (e.g., Greiff et al., 2016). Methodology, Methods, Research Instruments or Sources Used This study uses national longitudinal data for the academic year 2021-2022 (N = 8556) collected by the University of Tampere and the University of Helsinki in the framework of the DigiVOO project. This study does not use the longitudinal aspect but includes measures from three different measurement points. Motivational beliefs were assessed using Action-Control Theory Scales (e.g., Chapman et al., 1990), covering agency beliefs on ability and effort, control expectancy, and means-ends beliefs on various factors. Each scale included three items with a 7-point Likert-type scale (1 = not true at all, to 7 = very true). The success rate in problem-solving tasks was computed from the overall percentage of correct answers in programming tasks (code building and debugging) and a task measuring vary-one-thing-at-a-time (VOTAT) problem-solving strategies (Greiff et al., 2016). The programming tasks involved coding a robot to pick up a sock in a room with obstacles. The VOTAT-based task, Lilakki, required students to vary conditions for optimal plant growth. Task behavior indicators were derived from log data, including time on task measured in seconds and trials related to the number of completed items in programming tasks. Problem-solving strategies (VOTAT) in Lilakki were analyzed by calculating the relative percentage of used strategies from the overall number of trials in the task. General Point Average (GPA) reflected students' prior ability against the achievement in problem-solving tasks, incorporating grades in Finnish, mathematics, English, history, and chemistry. In this study, latent profile analysis (LPA) and multigroup structural equation modeling (SEM) will be conducted. LPA is used to identify subgroups of students based on their self-reports on the motivational measures. Fit indices for LPA are Bayesian Information Criterion (BIC), sample size adjusted BIC (SABIC), Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Vuong-Lo-Mendel-Rubin likelihood ratio test (VLMR), adjusted VLMR, and Bootstrap Loglikelihood ratio test (BLRT) and entropy. In addition, the elbow plot method for AIC, CAIC, BIC, and SABIC is used, and the qualitative investigation is done against substantive theory and previous studies. In multigroup SEM, the MLR estimator will be used. The goodness of fit of the model will be assessed by the following fit indices: RMSEA (< 0.05 = good model, < 0.08 = acceptable model) and CFI & TLI (> 0.95 = good model, > 0.90 = acceptable model). Conclusions, Expected Outcomes or Findings Preliminary results concerning motivational profiles have been analyzed. Based on the fit indices, elbow method, and qualitative inspection, a 5-class solution in LPA was considered the best fit. 
Conclusions, Expected Outcomes or Findings Preliminary results concerning motivational profiles have been analyzed. Based on the fit indices, the elbow method, and qualitative inspection, a 5-class solution in LPA was considered the best fit. The five motivational profiles are preliminarily named Avoidant, Normative, Mildly Agentic, Agentic, and Mixed. Students in the Agentic (Class 1) profile viewed their effort, ability, and control over school achievement most positively, compared to believing that luck and ability would determine school outcomes. Thus, this profile was considered to have the most adaptive beliefs. Mildly Agentic (Class 2) and Moderate (Class 3) students reflected the pattern demonstrated by Agentic students, but more moderately. Avoidant (Class 4) students had the least adaptive beliefs (i.e., beliefs about their ability, effort, and control, as well as effort as a means for success) and attributed school outcomes to ability over other beliefs. In the Mixed profile (Class 5), students held adaptive beliefs that were among the most positive, comparable to the Agentic profile. At the same time, they possessed the most positive means-ends beliefs on ability and luck. This profile is seen to indicate adaptive as well as maladaptive consequences for achievement (Malmberg & Little, 2007). In multigroup SEM, the hypothesis is that motivational profiles play a role in how task behavior indicators (time on task, trials, and strategies), prior ability, and performance in problem-solving tasks are related to each other, due to differences in students' approaches to novel tasks (see Callan et al., 2021; Skinner et al., 1998). In summary, this paper delves into the complex dynamics of effort, motivation, and cognitive processes during academic tasks, utilizing innovative technology for data collection. The findings provide novel insights into students' problem-solving strategies. References Callan, G. L., Rubenstein, L. D., Ridgley, L. M., Neumeister, K. S., & Finch, M. E. H. (2021). Self-regulated learning as a cyclical process and predictor of creative problem-solving. Educational Psychology, 1–21. https://doi.org/10.1080/01443410.2021.1913575 Carroll, J. B. (1989). The Carroll model: A 25-year retrospective and prospective view. Educational Researcher, 18, 26–31. https://doi.org/10.3102/0013189X018001026 Chapman, M., Skinner, E. A., & Baltes, P. B. (1990). Interpreting correlations between children’s perceived control and cognitive performance: Control, agency or means–ends beliefs. Developmental Psychology, 26, 246–253. https://doi.org/10.1037/0012-1649.26.2.246 Goldhammer, F., Naumann, J., Stelter, A., Klieme, E., Toth, K. & Roelke, H. (2014). The time on task effect in reading and problem solving is moderated by task difficulty and skill: Insights from a computer-based large-scale assessment. Journal of Educational Psychology, 106(3), 608–626. https://doi.org/10.1037/a0034716 Greiff, S., Niepel, C., Scherer, R., & Martin, R. (2016). Understanding students' performance in a computer-based assessment of complex problem solving. An analysis of behavioral data from computer-generated log files. Computers in Human Behavior, 61, 36–46. https://doi.org/10.1016/j.chb.2016.02.095 Kong, S.-C. & Abelson, H. (2019). Computational Thinking Education. Springer Singapore. https://doi.org/10.1007/978-981-13-6528-7 Malmberg, L.-E., & Little, T. D. (2007). Profiles of ability, effort, and difficulty: Relationships with worldviews, motivation and adjustment. Learning and Instruction, 17(6), 739–754. https://doi.org/10.1016/j.learninstruc.2007.09.014 Schnipke, D. L. (1995). Assessing speededness in computer-based tests using item response times [Dissertation, Johns Hopkins University]. The Johns Hopkins University ProQuest Dissertations Publishing. Skinner, E. A., Chapman, M. & Baltes, P.
B. (1988). Control, means-ends, and agency beliefs: A new conceptualization and its measurement during childhood. Journal of Personality and Social Psychology, 54, 117–133. https://doi.org/10.1037/0022-3514.54.1.117 Skinner, E. A., Zimmer-Gembeck, M. J. & Connell, J. P. (1998). Individual differences and the development of perceived control. Monographs of the Society for Research in Child Development, 6(2–3), 1–220. https://doi.org/10.2307/1166220 Stubbart, C. I., & Ramaprasad, A. (1990). Conclusion: The evolution of strategic thinking. In A. Huff (Ed.), Mapping strategic thought. John Wiley and Sons. Wise, S. L., & Gao, L. (2017). A general approach to measuring test-taking effort on computer-based tests. Applied Measurement in Education, 30(4), 343–354. https://doi.org/10.1080/08957347.2017.1353992 09. Assessment, Evaluation, Testing and Measurement
Paper Does the Use of ICT at School Predict Lower Reading Literacy Scores? Multiple Group Analyses with PISA 2000-2022 Data Tampere University, Finland Presenting Author:Previous studies have shown that the use of information and communication technologies (ICT) in leisure time, and also at school, is related to lower levels of school performance (Biagi & Loi, 2013; Gubbels, Swart, & Groen, 2020). Furthermore, data from the Programme for International Student Assessment (PISA) studies have indicated that higher levels of ICT use are related to lower scores in reading literacy both internationally and in Finland (OECD, 2011; Saarinen, 2020). Analyses of the PISA data from 2012 have also shown no significant improvements in student achievement in reading, mathematics or science in the countries that had invested heavily in ICT for education (OECD, 2015). These findings have sometimes been interpreted as an indication of the harmful effects of the digitalisation of education. PISA results have shown a declining trend in many countries (OECD, 2023). The most recent decrease in PISA 2022 scores has been explained, at least in Finland, for example, by the excessive use of ICT. On the other hand, mixed results have also been reported, and it is difficult to draw clear conclusions about the relationship between the use of digital technologies and learning (Harju, Koskinen, & Pehkonen, 2019). PISA studies have found that students who use computers moderately and for a variety of purposes have the highest levels of literacy (Leino et al. 2019, p. 94; OECD, 2011). The use of ICT in schools can be seen as a target of learning but also as a learning tool, which means that ICT can also be used as a means to support students (Jaakkola, 2022). Based on previous research, there are some indications that digital technology is used to differentiate teaching (Biagi & Loi 2013; Lintuvuori & Rämä, 2022; OECD 2011, pp. 20-21). This study will test the hypothesis that the use of ICT could be targeted especially at lower-performing students. The research questions investigated in this study are: 1. How is the use of ICT at school related to students’ reading literacy scores in PISA? Do the levels of proficiency in reading literacy explain the relationship between ICT use and reading performance? 2. Does the student’s special educational needs (SEN) status explain the relationship between ICT use and reading performance scores? Methodology, Methods, Research Instruments or Sources Used In this study, we use data from all eight PISA cycles collected between 2000 and 2022. We used the plausible values of reading literacy and the questions from the ICT questionnaire related to the use of digital technology at school. In the first three cycles, students were simply asked how often they used computers for schoolwork. We created dichotomously coded variables, comparing students selecting less often than once a month, 1–4 times a month, a few times every week, or almost every day to those who reported that they never used computers at school. From 2009 on, the questionnaires had longer scales measuring the different ways of using digital technology in schools, and indices of the use of computers and digital devices for schoolwork were created based on them. We analysed the data using Mplus 8.0.
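As an illustration of the recoding described above, a minimal pandas sketch follows; the column and category labels are hypothetical placeholders rather than the actual PISA variable names.

```python
# Illustrative recoding sketch with hypothetical column and category labels:
# each reported frequency of computer use at school is contrasted against the
# "never" reference group, as described for the PISA 2000-2006 cycles above.
import pandas as pd

students = pd.DataFrame({
    "computer_use_school": ["never", "less than once a month", "1-4 times a month",
                            "a few times every week", "almost every day", "never"]
})

# One dummy variable per frequency category, with "never" as the reference group
dummies = (pd.get_dummies(students["computer_use_school"], prefix="use")
             .drop(columns=["use_never"])
             .astype(int))
students = pd.concat([students, dummies], axis=1)
print(students)
```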
Regression models were run for each data set separately, using the categories for computer use (years 2000–2006), the index for computer use (years 2009–2012), and the index for the use of digital devices (years 2015–2022) at school as predictors of reading literacy performance. The stratified two-stage sample design was acknowledged by taking into account school-level clustering and by using house weights that scale the final student weights to sum to the sample size. First, we ran the analyses for the whole sample, and then as multiple group analyses comparing students at the different reading proficiency levels 1–6. For the 2018 data, we also performed multiple group analyses using information about students' support needs according to the Finnish support model (no support, intensified support, special support). For comparing the coefficients between groups, we bootstrapped confidence intervals for the coefficients using 1000 replicates. Conclusions, Expected Outcomes or Findings The results from the cycles 2009–2018 showed that ICT use was negatively related to reading literacy scores, and the effects were statistically significant. However, ICT use explained only one to three percent of the variation in reading literacy scores. Using the reading literacy proficiency levels, we examined whether these different levels of student performance explained the negative effects of ICT use on reading literacy scores. On average, students at the lowest proficiency levels used ICT at school more than students at higher levels. However, when examined by performance level, the majority of the relationships between ICT use and reading scores were statistically non-significant. Students with SEN used ICT at school more than other students, and students' SEN status explained the relationship between ICT use and reading literacy scores, which was negative and statistically significant. The results of this study suggest that the earlier PISA findings of a negative relationship between the use of ICT and student performance have often been interpreted as a causal effect and thus incorrectly: instead of digitalisation causing the decline in performance, schools might use digital technology as a means of support for lower-performing students and students with SEN. This, in turn, may at least partly explain the negative correlations between ICT use and student performance. So far, the analyses have been conducted with PISA 2000-2018 data. For this presentation, the same analyses will also be conducted with the most recent PISA 2022 data. The latest PISA results also reflect the impact of Covid-19. Furthermore, the pandemic might also have increased the use of ICT. It is therefore important to explore the PISA 2022 results and the effect of ICT use on reading performance.
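The group-wise comparison described in the Methodology above can be sketched schematically as follows. This is a hedged illustration in Python rather than the Mplus analysis itself, and all column names (reading_pv, ict_index, w_houseweight, school_id, prof_level) are hypothetical placeholders.

```python
# Schematic illustration (not the actual Mplus analysis) of a group-wise,
# house-weighted regression with a school-level cluster bootstrap for the
# coefficient confidence intervals used to compare proficiency groups.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def ict_slope(df):
    """House-weighted slope of reading performance on the ICT-use index."""
    fit = smf.wls("reading_pv ~ ict_index", data=df, weights=df["w_houseweight"]).fit()
    return fit.params["ict_index"]

def cluster_bootstrap_ci(df, n_reps=1000, seed=0):
    """Resample whole schools with replacement; return a 95% percentile interval."""
    rng = np.random.default_rng(seed)
    schools = df["school_id"].unique()
    slopes = []
    for _ in range(n_reps):
        drawn = rng.choice(schools, size=len(schools), replace=True)
        boot = pd.concat([df[df["school_id"] == s] for s in drawn], ignore_index=True)
        slopes.append(ict_slope(boot))
    return np.percentile(slopes, [2.5, 97.5])

# Usage: estimate the coefficient and its bootstrap interval per proficiency level,
# then judge whether the group-specific intervals overlap.
# for level, grp in pisa.groupby("prof_level"):
#     print(level, ict_slope(grp), cluster_bootstrap_ci(grp))
```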
References Biagi, F. & Loi, M. (2013). Measuring ICT Use and Learning Outcomes: Evidence from recent econometric studies. European Journal of Education, 48(1), 28–42. https://doi.org/10.1111/ejed.12016 Gubbels, J., Swart, N., & Groen, M. (2020). Everything in moderation: ICT and reading performance of Dutch 15-year-olds. Large-scale Assessments in Education, 8(1), 1–17. https://doi.org/10.1186/s40536-020-0079-0 Harju, V., Koskinen, A., & Pehkonen, L. (2019). An exploration of longitudinal studies of digital learning. Educational Research, 61(4), 388–407. https://doi.org/10.1080/00131881.2019.1660586 Jaakkola, T. (2022). Tieto- ja viestintäteknologia oppimisen kohteena ja välineenä. In N. Hienonen, P. Nilivaara, M. Saarnio & M.-P. Vainikainen (Eds.), Laaja-alainen osaaminen koulussa. Ajattelijana ja oppijana kehittyminen (pp. 179–189). Gaudeamus. Leino, K., Ahonen, A., Hienonen, N., Hiltunen, J., Lintuvuori, M., Lähteinen, S., Lämsä, J., Nissinen, K., Nissinen, V., Puhakka, E., Pulkkinen, J., Rautopuro, J., Sirén, M., Vainikainen, M.-P. & Vettenranta, J. (2019). PISA 18 ensituloksia – Suomi parhaiden joukossa. Opetus- ja kulttuuriministeriön julkaisuja 2019:40. Opetus- ja kulttuuriministeriö. http://urn.fi/URN:ISBN:978-952-263-678-2 Lintuvuori, M. & Rämä, I. (2022). Oppimisen ja koulunkäynnin tuki - Selvitys opetuksen järjestäjien näkemyksistä tuen järjestelyistä kunnissa. Opetus- ja kulttuuriministeriön julkaisuja 6:2022. Ministry of Education and Culture. OECD. (2011). PISA 2009 Results: Students on Line: Digital Technologies and Performance (Volume VI). http://dx.doi.org/10.1787/9789264112995-en OECD. (2015). Students, Computers and Learning: Making the Connection. OECD Publishing. http://dx.doi.org/10.1787/9789264239555-en OECD. (2023). PISA 2022 Results (Volume I): The State of Learning and Equity in Education. PISA, OECD Publishing. https://doi.org/10.1787/53f23881-en Saarinen, A. (2020). Equality in cognitive learning outcomes: The roles of educational practices. Kasvatustieteellisiä tutkimuksia 97. http://urn.fi/URN:ISBN:978-951-51-6713-2 |
14:15 - 15:45 | 09 SES 17 B: Investigating Gender Disparities in Academic Skills and Vocational Interests Location: Room 012 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor] Session Chair: Petra Grell Paper Session |
|
09. Assessment, Evaluation, Testing and Measurement
Paper Towards Understanding Gen Z’s Vocational Interests: Sex and Year Effects 1University of Bucharest, Romania; 2University of Bucharest, Romania Presenting Author:The Future of Jobs Report 2023 projected possible job creation and displacement over the next four years, revealing substantial growth in the domains in which AI knowledge and skills will be most in demand (e.g., AI and Machine Learning Specialists, Sustainability Specialists, Business Intelligence Analysts, Information Security Analysts), whereas other domains will see decreasing demand for employees (Administrative and Executive Secretaries, Data Entry Clerks, Bank Tellers and Related Clerks) (World Economic Forum, 2023).
In these times of uncertainty and challenge, the selection of an academic path with the potential to lead to a successful career involves a complex decision-making process for adolescents. Achievement in career choices and job performance is significantly shaped by vocational interests (Rounds & Su, 2014). In this context, obtaining a clear understanding of the vocational interests of high school students belonging to Generation Z (born between 1997 and 2008) would prove particularly valuable. A substantial number of these students make decisions about their college majors during the 10th and 11th grade. If their chosen path aligns with their vocational interests, it is likely to enhance their motivation to complete college (Nye, Prasad, & Rounds, 2021), at a time when higher education is confronted with serious drop-out rates (Eurostat Statistics, 2022). This latter source reveals that in 2022, the proportion of early leavers from education and training (aged 18 to 24) in the EU ranged from 2.3% in Croatia to 15.6% in Romania.
According to a recent national survey in Romania focusing on Generation Z, it was found that 76% of respondents identified a passion for their work as the primary motivating factor in their job search (Romanian Business Leaders, 2022). This indicates that, for this demographic, vocational interests take precedence over financial compensation when considering employment opportunities.
As with all previous generations, Gen Z has its distinct features, being described as more pragmatic and future-oriented compared with the more idealistic Millennials (Twenge, 2020, p. 231). Having been born into a digitalised, technology-driven world, this generation's vocational interests have also changed: it is interested in more fields of activity than previous generations, with an increased interest in information technologies (Roganova & Lanovenko, 2020).
Interests are defined as a cognitive and motivational factor encompassing both engagement and participation in specific content areas. The effectiveness of interest lies in its capacity to generate a rewarding experience through the information-search process (Renninger & Hidi, 2020). Interests have a significant influence on career choices and academic achievement (Hoff, Song, Wee, Phan, & Rounds, 2020; Stoll et al., 2020). This is why the present research endeavors to explore the patterns or clusters of interests within the Generation Z adolescent demographic.
A key objective of the study is to ascertain whether distinct patterns of interests emerge among the cohort based on factors such as the year of the examination, age, or gender. This multifaceted approach seeks to provide a nuanced understanding of the intricate interplay between vocational interests and demographic variables, contributing valuable insights to the broader discourse on college domain decisions among adolescents. Therefore, this research aims to address the following questions:
Methodology, Methods, Research Instruments or Sources Used A quantitative approach is employed in the current study. The selected variables include gender and the year of testing as independent variables, while the dependent variables comprise the 34 interest scales assessed in the Jackson Vocational Interest Survey (JVIS). The data collection took place between 2012 and 2023 at a career counseling center in Bucharest, Romania. Participants were evaluated as part of the counseling process they received as a service of the center. The participants completed the test on a dedicated online platform under the guidance of a counselor. The sample for this study was derived by extracting data from the center's database, adhering to specific inclusion criteria. The inclusion/exclusion criteria comprised individuals with a date of birth falling within the range of 1997 to 2007, aligning with the generational interval of Generation Z, 1997–2012 (Twenge, 2020). Additionally, participants included in the study were required to be between 16 and 17 years old at the time of taking the test and, specifically, to be enrolled in high school. By implementing these criteria, the study ensures a targeted focus on the Generation Z cohort (born between 1997 and 2007) during their adolescent years. Applying the specified criteria resulted in a sample of 1047 participants, with 580 females and 467 males included in the study. The data were collected using the Jackson Vocational Interest Survey (JVIS), which yields scores on 34 interest scales. The interest scales are categorized into two primary groups: Work Roles scales (such as Performing Arts, Life Science, Law, Social Sciences, Elementary Education, Finance, Business, etc.) and Work Styles scales (such as Accountability, Stamina, Independence, Planfulness, Supervision, etc.) (Iliescu & Livinti, 2007). Each interest scale is evaluated on a scale ranging from 1 to 99 points. The data analysis will be based on a statistical approach; the proposed methods include descriptive statistics, frequencies (to describe the variables), mean-level comparisons (to compare the three subgroups by year on the interest scale scores), ANOVA (to compare the 34 interest scale scores across and between subcohorts), and mixed ANOVA (when adding the gender variable).
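A simplified sketch of the planned comparisons follows, using hypothetical column names and a between-subjects cohort-by-gender factorial ANOVA in place of the full mixed design described above: for each JVIS interest scale, it tests cohort and gender effects and their interaction.

```python
# Simplified, illustrative sketch of the planned scale-by-scale comparisons.
# Column names (cohort, gender, and the scale columns) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def scale_anova(df: pd.DataFrame, scale: str) -> pd.DataFrame:
    """Two-way ANOVA of one interest scale (scores 1-99) on cohort and gender."""
    model = smf.ols(f"{scale} ~ C(cohort) * C(gender)", data=df).fit()
    return anova_lm(model, typ=2)

# Usage with a hypothetical frame holding one row per participant:
# jvis = pd.DataFrame({"cohort": [...], "gender": [...], "Performing_Arts": [...]})
# for scale in ["Performing_Arts", "Life_Science", "Finance"]:
#     print(scale)
#     print(scale_anova(jvis, scale))
```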
Conclusions, Expected Outcomes or Findings We expect our statistical analysis to reveal a complex depiction of the highest- and lowest-scoring interest scales among 16- to 17-year-olds, demonstrating the multifaceted nature of vocational interests. We anticipate that individuals in the subgroups before and after the Covid pandemic might exhibit higher scores on scales measuring aspects of the working environment, reflecting the potential influence of significant external events on individuals' perceptions and preferences. While we do not expect to observe sex differences in vocational interests overall, we anticipate potential variations in the Writing and Academia scales, where females may score higher. Given that vocational interests play a pivotal role in both career success and subjective well-being (Harris & Rottinghaus, 2017), comprehending the trends in vocational interests among Generation Z adolescents holds significant implications. This understanding can serve as a foundation for crafting improved educational policies, including enhancements in career counseling and higher education offerings. Additionally, insights into the vocational preferences of this demographic can inform adjustments within the future job market, facilitating a more tailored and responsive approach to meeting the evolving needs and aspirations of Generation Z as they navigate their educational and professional journeys. References World Economic Forum. (2023). Future of Jobs Report 2023. https://www3.weforum.org/docs/WEF_Future_of_Jobs_2023.pdf Harris, K. L., & Rottinghaus, P. (2017). Vocational interest and personal style patterns: Exploring subjective well-being using the Strong Interest Inventory. Journal of Career Assessment, 203–218. https://doi.org/10.1177/1069072715621009 Hoff, K. A., Song, Q., Wee, C., Phan, W., & Rounds, J. (2020). Interest fit and job satisfaction: A systematic review and meta-analysis. Journal of Vocational Behavior, 123. https://doi.org/10.1016/j.jvb.2020.103503 Katja, P., & Hell, B. (2020). Stability and change in vocational interests from late childhood to early adolescence. Journal of Vocational Behavior, 121. https://doi.org/10.1016/j.jvb.2020.103462 Romanian Business Leaders (2022). Raport public. https://mailchi.mp/9c64de820779/raport-insights-pulsez-2022?utm_source=mailchimp&utm_medium=Landing+page&utm_campaign=studiu Retrieved from https://izidata.ro/. Renninger, K. A., & Hidi, S. (2020). To Level the Playing Field, Develop Interest. Policy Insights from the Behavioral and Brain Sciences, 7(1), 10-18. https://doi.org/10.1177/2372732219864705 Roganova, A., & Lanovenko, Y. (2020). Transformation of interests and motivation to learn of Generation Z. Herald of Kiev Institute of Business and Technology, 44-49. https://doi.org/10.37203/kibit.2020.44.06 Stoll, G., Einarsdóttir, S., Song, Q., Ondish, P., Sun, O., & Rounds, J. (2020). The Roles of Personality Traits and Vocational Interests in Explaining What People Want Out of Life. Journal of Research in Personality, 86. https://doi.org/10.1016/j.jrp.2020.103939 Twenge, J. M. (2020). Generația internetului. București: Baroque books and art. Iliescu, D., & Livinti, R. (Trans.) (2007). Jackson Vocational Interest Survey – Manual Tehnic si Interpretativ. Cluj-Napoca: Ed. Sinapsis. 09. Assessment, Evaluation, Testing and Measurement
Paper Evaluation of Policy Factors Influencing the Youth's Choice of Teaching Profession: A Pseudo-Panel Data Approach The University of Tokyo, Japan Presenting Author:Recent empirical research in the social sciences has emphasized the importance of causal inference. However, causal inference is challenging when using observational, primarily cross-sectional data, even if they include relevant variable information, as in international and large-scale educational surveys. The difficulty is more pronounced when the variable of interest, such as a national-level policy, is systemic. This paper demonstrates that by using pseudo-panel data derived from repeated cross-sectional data, we can obtain findings relevant to policy-making, thereby mitigating some of the challenges in causal inference, particularly biases from unobserved confounding factors. The specific topic addressed in this paper is the assessment of policy factors related to the youth's choice to teach. In general, improving the availability and quality of teaching personnel is a universal and important issue for public education policy (OECD 2018). These research areas concerning the choice of a teaching career and teacher supply have been interdisciplinary, spanning education (educational policy studies, sociology of education, educational psychology, etc.) and economics (economics of education, labor economics). In particular, empirical research on the basic issues of "who chooses to teach" and "what factors increase the number of people who want to teach" has been conducted in many countries. While educational and psychological research has pointed out the importance of psychological factors, work environment factors have not been recognized as the main factors influencing career choice (Watt et al. 2017). On the other hand, empirical studies in the economics of education and labor economics have focused exclusively on the impact of salary levels as a policy variable on entry into and exit from the workforce and have partially argued for its contribution (Corcoran et al. 2004; Dolton 1990; Manski 1987). Moreover, Japan, where the presenter is from, has historically excelled in maintaining high-quality teachers, as evidenced by their high competency (Hanushek et al. 2019) and low turnover rates compared to other countries. However, recent years have seen a growing trend among young people to avoid teaching careers. Japan now faces challenges similar to many countries experiencing a structural teacher shortage. Public debates often cite the relatively inferior work environment of teaching compared to other white-collar jobs as a factor in this avoidance. Yet, substantial evidence is lacking to inform policy priorities in this area. In this study, we build on and extend the groundbreaking recent studies that have used PISA student-level data to analyze the youth's choice of the teaching profession (Park & Byun 2015; Han 2018). We differ from those studies in terms of methodology, using pseudo-panel data composed of subpopulations of countries as units; we apply a cross-classified hierarchical model to ask which policy factors promote whose entry into the teaching profession among young people. We specifically focus on policy factors related to the working environment, namely, the relative salary level of teachers compared to other professions and the workload of teachers (working hours, number of students per teacher, and time spent on non-teaching tasks).
Applying a cross-classified hierarchical model to the pseudo-panel data, we address the question of which policy factors encourage which groups of young people to enter the teaching profession, attending both to causal inference (controlling for time-invariant confounders) and to policy relevance (heterogeneity of policy effects). The cross-classified model, which specifies random effects/coefficients across two types of units (country and subpopulation), has the major advantage of allowing different policy implications for each country. To further increase the robustness of our model, we are expanding it into a semiparametric model (an infinite mixture model) that does not rely on a multivariate normal distribution for the random effects and coefficients. Methodology, Methods, Research Instruments or Sources Used One problem with existing quantitative empirical studies of the choice of the teaching profession and teacher supply is their weak consideration of causal inference (especially in addressing unobserved confounding factors). This paper attempts to address these problems through an analysis using pseudo-panel data. Pioneering studies based on pseudo-panel data in education (although on topics different from this paper) include Gustafsson (2008, 2013), who applied them to data from large-scale international surveys; the ideas in this paper also draw on that work. In this paper, we use student-level data from OECD member countries in the Programme for International Student Assessment (PISA) as data related to the choice of teaching. PISA survey data are usually used in empirical analyses with academic achievement as the outcome variable, but they have already been used in several studies of career choices because they include questions on the occupations in which students expect to be employed at age 30 (Park & Byun 2015; Han 2018; Han et al. 2018, 2020). Existing studies often rely on cross-sectional data from a specific time period. In contrast, our analysis uses pseudo-panel data compiled from multiple time points. As each PISA survey targets different respondents (15-year-old students from each country at each time point), it does not constitute individual-level panel data. However, by reorganizing these data into a subpopulation-based panel format, incorporating multiple attribute information, we can exploit the benefits of panel data, such as controlling for time-invariant confounding factors. In creating the pseudo-panel data, subpopulations were defined based on information about gender, parental occupation (whether a parent's occupation was in teaching or not), and cognitive ability (subdivided into 10 groups based on PISA scores). The rate of aspiring to primary and secondary teaching within each subpopulation is used as the dependent variable, to clarify which working-environment policy factors each group of young people responds to most strongly in choosing or rejecting the teaching profession. Policy factors concerning the working environment include 1) salary level, 2) teacher-student ratio, 3) working hours, and 4) the amount of non-teaching tasks, measured at the national and temporal levels. The data on policy factors are based on country and time units. These data are analyzed using Bayesian cross-classified parametric/semi-parametric hierarchical models. By employing a cross-classified hierarchical model, we can assume that the effects of policy factors vary between countries and subpopulations, allowing us to obtain policy-relevant insights.
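To illustrate the pseudo-panel construction described above, the sketch below collapses student-level records into subpopulation cells and attaches country-by-year policy indicators. It is a minimal, hedged illustration: all column names for the student-level PISA extract and the policy variables are hypothetical placeholders, and the subsequent Bayesian cross-classified modeling step is not shown.

```python
# Illustrative sketch of pseudo-panel construction from repeated cross-sections.
# Column names are hypothetical placeholders, not actual PISA variable names.
import pandas as pd

def build_pseudo_panel(students: pd.DataFrame, policy: pd.DataFrame) -> pd.DataFrame:
    """Collapse student records into subpopulation cells and attach policy factors.

    students: one row per student, with columns country, year, gender,
        parent_teacher (0/1), pisa_score, and aspires_teaching (0/1, derived from
        the expected-occupation-at-age-30 item).
    policy: one row per country-year, with columns country, year, rel_salary,
        student_teacher_ratio, working_hours, and nonteaching_tasks.
    """
    df = students.copy()
    # Ten ability groups from the PISA score distribution within each wave
    df["ability_group"] = df.groupby("year")["pisa_score"].transform(
        lambda s: pd.qcut(s, 10, labels=False))

    cells = (df.groupby(["country", "year", "gender", "parent_teacher", "ability_group"])
               .agg(aspiration_rate=("aspires_teaching", "mean"),
                    n_students=("aspires_teaching", "size"))
               .reset_index())

    # Policy factors vary only at the country-by-year level of the cross-classified model
    return cells.merge(policy, on=["country", "year"], how="left")
```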
Conclusions, Expected Outcomes or Findings By utilizing Bayesian cross-classified hierarchical models on pseudo-panel data regarding the youth's choice of the teaching profession, we could analyze the impact of various policy factors related to the working environment. This approach allowed us to control for time-invariant confounding factors and to clarify the heterogeneity of the effects of each policy factor across subpopulations and countries. Regarding overall trends, enhancing the working environment appears to motivate female students to choose teaching as a profession more than male students. Specifically, improvements in relative salary and student-teacher ratios, as well as reduced working hours, significantly encourage highly qualified individuals to enter the teaching field. Concerning the magnitude of the effects, we observed that a one standard deviation improvement in these factors increases the proportion of students aspiring to teach by 0 to 2 percentage points. However, for high-ability male students whose parents are not teachers, we found no significant incentive to pursue a career in teaching. While it is difficult to summarize the differences in policy effects across countries, focusing on Japan, which is the primary concern of the presenter, we find that the salary level and working hours relative to other occupations have a stronger impact. Similarly, the analysis results can point to specific characteristics in other countries. These findings contrast with previous research in education and psychology on the choice of the teaching profession, which often underestimates the role of extrinsic factors due to the analogical application of motivational theories of learning. Our findings reveal that the working environment plays a crucial role in influencing young people's decisions to enter the teaching profession and in determining the overall supply of teachers. Moreover, they identify which policy factors will affect the quality of teacher supply. References Bryk, A. S., and S. W. Raudenbush (2002) Hierarchical Linear Models: Applications and Data Analysis Methods, Sage Publications. Condon, P. D. (2020) Bayesian Hierarchical Models with Applications Using R, 2nd edition, CRC Press. Corcoran, S. P., W. N. Evans, and R. M. Schwab (2004) “Women, the Labor Market, and the Declining Relative Quality of Teachers,” Journal of Policy Analysis and Management, 23(3): 449-470. Dolton, P. J. (1990) “The Economics of UK Teacher Supply: The Graduate's Decision,” The Economic Journal, 100: 91–104. Gustafsson, J. (2008) “Effects of International Comparative Studies on Educational Quality on the Quality of Educational Research,” European Educational Research Journal, 7(1): 1-17. Gustafsson, J. (2013) “Causal Inference in Educational Effectiveness Research: A Comparison of Three Methods to Investigate Effects of Homework on Student Achievement,” School Effectiveness and School Improvement, 24(3): 275-295. Han, S. W. (2018) “Who Expects to Become a Teacher? The Role of Educational Accountability Policies in International Perspective,” Teaching and Teacher Education, 75: 141–152. Han, S. W., F. Borgonovi, and S. Guerriero (2018) “What Motivates High School Students to Want to Be Teachers? The Role of Salary, Working Conditions, and Societal Evaluations About Occupations in a Comparative Perspective,” American Educational Research Journal, 55(1): 3–39. Han, S. W., F. Borgonovi, and S. Guerriero (2020) "Why Don’t More Boys Want to Become Teachers?
The Effect of a Gendered Profession on Students’ Career Expectations," International Journal of Educational Research, 103: 101645. Hanushek, E. A., J. F. Kain, and S. G. Rivkin (2004) “Why Public Schools Lose Teachers,” Journal of Human Resources, 39(2): 326–354. Hanushek, E. A., M. Piopiunik, and S. Wiederhold (2019) “The Value of Smarter Teachers: International Evidence on Teacher Cognitive Skills and Student Performance,” Journal of Human Resources, 54(4): 857-899. Kleinman, K. P. and J. G. Ibrahim (1998) “A Semiparametric Bayesian Approach to the Random Effects Model," Biometrics, 54: 921-938. Manski, C. F. (1987) “Teachers Ability, Earnings, and the Decision to Become a Teacher: Evidence from the National Longitudinal Study of the High School Class of 1972,'' in D. A. Wise ed., Public Sector Payrolls, University of Chicago Press. OECD (2018) Effective Teacher Policies: Insights from PISA, OECD Publishing. Park, H., and S. Y. Byun (2015) “Why Some Countries Attract More High-Ability Young Students to Teaching: Cross-National Comparisons of Students’ Expectation of Becoming a Teacher,” Comparative Education Review, 59(3): 523–549. Watt, H. M. G., P. W. Richardson, and K. Smith eds. (2017) Global Perspectives on Teacher Motivation, Cambridge University Press. |