Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).

Please note that all times are shown in the time zone of the conference.

 
 
 
Session Overview
Location: Gilbert Scott, EQLT [Floor 2]
Capacity: 120 persons
Date: Tuesday, 22/Aug/2023
1:15pm - 2:45pm  09 SES 01 A: Getting Started with R in RStudio
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Monica Rosén
Research Workshop
 
09. Assessment, Evaluation, Testing and Measurement
Research Workshop

Getting started with R in RStudio

Monica Rosén, Erika Majoros

University of Gothenburg, Sweden

Presenting Author: Rosén, Monica; Majoros, Erika

The School of Quantitative Research Methods in Education (QRM, https://www.gu.se/en/qrm) is a research school and an emerging network funded by the Swedish Research Council. QRM aims to contribute to rebuilding competence in quantitative methods in educational research in Sweden and elsewhere. The proposed workshop contributes to this aim by training basic competence in the statistical environment R, which is a widely used, open-source project. However, R is still relatively new in the educational sciences.

R is an increasingly used statistical environment maintained by an international team of developers. The proposed workshop is intended for analysts and researchers who do not yet have knowledge of or experience with R. Working knowledge of basic statistics is required.

The workshop will begin with a brief overview of the R environment based on Venables et al. (2014). Then, based on Grolemund (2014), the workshop leader will assist participants in downloading R as well as RStudio, a software application that makes R easier to use.

Afterward, basic data management tasks will be demonstrated and practiced, including importing and exporting data. Then basic statistical analyses, including plotting the data, will be demonstrated and practiced using several R packages. Exporting outputs will also be practiced. Finally, depending on the workshop participants’ interests and pace of work, advanced statistical analyses will be demonstrated and practiced.

The workshop will end by informing participants about resources for further developing their knowledge of R and about commonly used R packages for working with international large-scale assessment (ILSA) data. Participants should bring a laptop. The workshop leaders will hand out the presentation slides beforehand and the R scripts on request afterward.

Draft agenda:

  • Introduction
  • Downloading and installing R and RStudio
  • Data management
  • Basic statistical analyses
  • Creating plots
  • Advanced statistical analyses (Optional)
  • Summary and further avenues and resources
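The data management and analysis steps in the agenda can be sketched in a few lines of base R. The snippet below is a hypothetical illustration with simulated scores and placeholder variable names, not the workshop's actual material (which uses the SIMS 1980 data):

```r
# Simulated data set (placeholder; the workshop uses SIMS 1980 data)
dat <- data.frame(
  student = 1:6,
  arith   = c(12, 15, 9, 14, 11, 13),
  algebra = c(8, 10, 7, 12, 9, 11)
)

dat$total <- dat$arith + dat$algebra       # computing a new variable
summary(dat$total)                         # basic descriptive statistics
high <- subset(dat, total >= 22)           # subsetting data
hist(dat$total, main = "Total score")      # a basic plot

f <- tempfile(fileext = ".csv")
write.csv(dat, f, row.names = FALSE)       # exporting data
dat2 <- read.csv(f)                        # importing data
```

Each of these tasks corresponds to an item in the draft agenda above; RStudio adds a graphical interface around the same commands.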

Methodology, Methods, Research Instruments or Sources Used
Data
Practical tasks will be performed using publicly available data from the Second International Mathematics Study (1980), retrieved from the Center for Comparative Analysis of Educational Achievement (COMPEAT) website: https://www.gu.se/en/center-for-comparative-analysis-of-educational-achievement-compeat/studies-before-1995/second-international-mathematics-study-1980#Data---Population. COMPEAT is an infrastructure project with the general aim of building databases of international large-scale studies in educational achievement conducted by IEA and OECD before the year 2000 and supporting secondary analyses of these data.

Methods
Basic statistics and, optionally, advanced statistical methods, such as regression analysis or item response theory modeling, will be used.

Conclusions, Expected Outcomes or Findings
By the end of the workshop, participants are expected to be able to:
• Install R and RStudio
• Download and apply an R package
• Effectively use the documentation of an R package
• Perform basic data management tasks, such as computing or recoding variables, subsetting data in RStudio
• Perform basic statistical analyses, including basic plots

References
Center for Comparative Analysis of Educational Achievement (nd). https://www.gu.se/en/center-for-comparative-analysis-of-educational-achievement-compeat
Grolemund, G. (2014). Hands-On Programming with R: Write Your Own Functions and Simulations. https://rstudio-education.github.io/hopr  
Venables, W. N., Smith, D. M., & R Core Team (2014). An Introduction to R. https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
 
3:15pm - 4:45pm  09 SES 02 A: Innovations in Higher Education Admission and Student Support Programs: Enhancing Access and Success
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Jana Strakova
Paper Session
 
09. Assessment, Evaluation, Testing and Measurement
Paper

Developing a Standardized Eligibility Test for Tertiary Education in Sweden

Gudrun Erickson1, Jan-Eric Gustafsson1, Frank Bach2, Jörgen Tholin1

1University of Gothenburg, Sweden; Dept of Education and Special Education; 2University of Gothenburg, Sweden; Dept of Pedagogical, Curricular and Professional Studies

Presenting Author: Erickson, Gudrun

The paper focuses on the development of a basic eligibility test for admission into tertiary education in Sweden, a test aimed at providing opportunities for a wider group of applicants than today, thereby increasing inclusion and diversity in higher education. This is an emerging type of test, reflecting increasing national and international needs for documenting the competences required for access to higher education (e.g., the HiSET exam; ETS 2021).

To be allowed to apply for tertiary education in Sweden, basic entry requirements must be met. This corresponds to a leaving certificate from upper secondary level, comprising a combination of core subjects, in particular Swedish/Swedish as a second language, English and mathematics, but also a number of generic/key competences embedded in the national curricula, as well as in corresponding documents, e.g., in the European context (European Commission, 2018). Examples of such generic competences are problem solving, critical thinking and inferencing.

Traditionally, individual students who lack the formal requirements for tertiary education may have their competences evaluated by the university to which they apply. However, this is normally done in relation to a single course and is generalizable neither to other courses nor to other universities. These local evaluations are based on a central ordinance (UHRFS 2021:4) that defines the basic components required but not the extent or methods of the validation procedures. Hence, some variability is inevitable.

To create a standardized alternative to local validation, a suggestion for a test of basic eligibility was made in a governmental investigation concerning entrance into higher education (SOU 2017:20). This resulted in a political decision to conduct a three-year trial round for a basic eligibility test (SFS 2018:1510). It was made clear that the test was intended for people from 24 years of age lacking formal, basic qualifications. In addition, it was decided that the result of the test should not be used for competition but for eligibility purposes only.

In 2020, the Swedish Council for Higher Education (UHR) formed a group of experts within different educational fields to discuss the development of a basic eligibility test. Three members of this group were given the task of developing a tentative framework for the activity. This work was conducted in close collaboration with the larger group, which led to gradual revisions of the text. The final document was officially approved in January 2021 (UHR, dnr. 00012-2020). It consisted of some 40 pages and comprised subsections focusing on aims and background, including references to different international studies, e.g., PISA (OECD, 2018), test components, quality measures and control, as well as guidelines for use and future changes and developments.

The assignment to conduct the trial was given by the National Council/UHR to [university], which has a long tradition of large-scale educational assessment, regarding test development as well as analyses of results at national and international levels (Author 2 et al.; Author 2 et al.; Author 1 et al.; Author 3 et al.). A three-year contract was signed in March 2021, after which the operative developmental phase started. The first large-scale trial test was administered in October 2022, comprising a wide range of tasks: dichotomous, selected-response items within different domains, English listening comprehension, integrated tasks also including graphs, tables, etc., and two tasks targeting written production in Swedish and English.

The aim of the current presentation is to briefly

  • describe and discuss the rationale and methodology of the test development process,
  • present and reflect on some results of the process, and to
  • look ahead to, and discuss, possible future uses of the test being developed.

Methodology, Methods, Research Instruments or Sources Used
Initially, a decision was made to form a project steering group of four people, the intention being twofold: to broaden competences and to decrease vulnerability. It was also essential to establish the basic character of the envisaged product, namely a unified test where only one standard, the pass level, was to be set. In addition, the cut-off point was to be determined based on a holistic assessment, not on partial requirements for individual components of the test. It was also emphasized that the generic competences were seen as overarching concepts, albeit with working groups linked to different competence domains. These groups were directed towards Swedish L1/L2, English, mathematics, natural and social sciences, and methodology. Closely linked to this structure were also so-called go-betweens (gb:s), i.e., people with competence in two or more of the domains.

There were basic requirements for each working group, namely three types of often overlapping competences and experiences: subject matter knowledge including pedagogical/teaching experience, experience of large-scale testing, and research competence. Furthermore, a specification matrix was developed for the different groups, on the basis of which they gradually documented information about the material developed, e.g., intended construct/competence, amount and type of input, format, estimated time, estimated/intended difficulty, etc.

The working groups followed internal plans when developing items and tasks. The members of the steering group, several of them with a domain-related background, kept in regular touch. Roughly once a month there were meetings for the whole group, approx. 25 people, in which different issues were discussed.

Piloting of material, item-based as well as productive writing tasks for Swedish and English, was conducted in a four-step, anonymous process: (1) mini trials per domain in smaller groups; (2) trials including material from two domains; (3) trials of a mix of tasks from different domains; and (4) pre-testing rounds aiming for a large number of participants, representing as wide a range as possible of individuals. In the latter case, tasks were always accompanied by a set of background questions and anchor items. In addition, extensive collection of test-taker feedback was made, using Likert-scales and open comments.

Standard setting is an integral aspect of test development, in this trial project as well. It is always a challenge, not least for tests comprising both dichotomous item data and ratings of productive tasks. A short account of this will be given, including reactions to the first large-scale trial round.


Conclusions, Expected Outcomes or Findings
The test development process focused upon in the current proposal is indeed work in progress, with continuous documentation undertaken. Hence, the conclusions drawn at this stage are tentative and should partly be regarded as observations, which, however, will be combined into final conclusions at the end of the trial. Furthermore, they emanate from distinctly different data. One such observation is related to pre-testing and clearly shows the difficulty of finding the intended, large groups of test takers. Another aspect that needs constant attention is the nature of the test as unified, not consisting of independent parts. This is further emphasized by the fact that two of the subtests, in particular, have a general character, including and integrating different competences in a way that makes them closely related to the overarching, generic purposes. Also, the rating of productive tasks is a multi-faceted activity that requires considerable consideration and re-consideration, especially when combined with other types of data and analyses. In addition, it can be concluded that anchor items are crucial for standard-setting purposes in making comparisons at different levels possible. Finally, it is obvious that test-taker feedback adds considerably to the quality of the process, as such, by giving the necessary perspective of the users, but also as a complement to analyses of performance.

We sometimes characterize the assignment to carry out the trial of this test as a task entailing building a boat while sailing it. This is obviously far from uncomplicated but still works quite well, much thanks to the distinct and positive collaboration that takes place in the process, between different actors: the national council and the university department, disciplines and groups within the project, and with potential users of what we hope will be a future national entrance test for tertiary education.

References
Author 2 et al. (year 1)
Author 1 et al. (year 3)
Author 2 et al. (year 2)
Author 3 et al. (year 4)

Educational Testing Service (2021). 2020 Annual Statistical Report. https://hiset.org/s/pdf/HiSET_2021_Annual_Statistical_Report.pdf

European Commission (2018). Council Recommendation of 22 May 2018 on key competences for lifelong learning. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.C_.2018.189.01.0001.01.ENG&toc=OJ:C:2018:189

OECD (2018). PISA 2018 Assessment and Analytical Framework. https://doi.org/10.1787/b25efab8-en

Regeringskansliet (2017). SOU 2017:20. Tillträde för nybörjare – ett öppnare och enklare system för tillträde till högskoleutbildning [Admission for beginners – a more open and accessible system for entrance into tertiary education]. https://www.regeringen.se/rattsliga-dokument/statens-offentliga-utredningar/2017/03/sou-201720/

Sveriges Riksdag (2018). Förordning om försöksverksamhet med behörighetsprov för tillträde till högskoleutbildning [Ordinance on trial activities with eligibility tests for admission to higher education]. (SFS 2018:1510). https://www.riksdagen.se/sv/dokument-lagar/dokument/svensk-forfattningssamling/forordning-20181510-om-forsoksverksamhet-med_sfs-2018-1510

Universitets- och högskolerådets författningssamling (UHRFS 2021:4). Föreskrifter om ändring i Universitets- och högskolerådets föreskrifter (UHRFS 2013:1) om grundläggande behörighet och urval [Regulations regarding change in UHR’s regulations on basic eligibility and selection]. https://www.uhr.se/globalassets/_uhr.se/publikationer/lagar-och-regler2/uhrfs/2021/uhrfs-2021-4.pdf

Universitets- och Högskolerådet (2021). Ramverk för det nationella behörighetsprovet [Framework for the national eligibility test]. (UHR, dnr. 00012-2020)


09. Assessment, Evaluation, Testing and Measurement
Paper

A Controversially Received Reform: The 2018 Reform of Finnish Higher Education Student Admission

Sirkku Kupiainen, Risto Hotulainen, Irene Rämä, Laura Heiskala

University of Helsinki, Finland

Presenting Author: Kupiainen, Sirkku

Upper secondary exit exams are common in education systems worldwide, marking the completion of upper secondary education and acting as gatekeepers for higher education (Noah & Eckstein, 1992). This double role of the exam is especially salient in countries where the share of the age cohort passing academic upper secondary education exceeds that of students accepted to higher education. If the exam plays a prominent role in admission, the high stakes of the exam are especially acute. This sets specific requirements on the comparability of the examination results across the exams of the different subjects, if the examination is so constructed, and across years, if delayed entry into higher education is common (cf. Béguin, 2000). Both conditions are relevant in Finland, the focus of the present study.

While an upper secondary exit exam is an integral part of the education systems of most European countries, both stratified and comprehensive, the form of the exam and the share of students sitting for it vary widely. Despite these differences, the academic track of upper secondary education usually comprises some form of exit examination or final grades taken into account in tertiary student selection, either as a sole factor or in addition to an entrance examination, unless access is open to all or selection happens later based on students’ study performance. In Finland, both academic and vocational upper secondary education provide a qualification for higher education (Orr et al., 2017), meaning that even though a 2018 reform decreed that half of students be accepted into higher education based solely on their matriculation examination results, Finland cannot fully abandon the entrance examinations, which previously applied to all applicants.

Orr and colleagues (2017) classify the European Union member states according to upper secondary tracking and higher education institutions’ autonomy over student intake. In Finland, the state decides, in collaboration with universities, the number of students admitted to different programs and the outlines of admission policy, while universities decide the details of the latter. In 2018, a student admission reform in Finland mandated that half of students be accepted on the basis of matriculation examination results, with universities deciding in collaboration how credit for the different subject-specific exams would be awarded. The main goal of the reform was to speed up Finnish students’ slow transition from secondary to tertiary education, a problem that the OECD has also pointed out as one of the weak points of the Finnish education system. Due to a backlog of older matriculates vying for a place, two-thirds of new matriculates are left each year without a place in higher education. The reform was backed by research on the drawbacks of the earlier entrance examination-based student selection (Sarvimäki & Pekkarinen, 2016) and tied the credit to the number of courses covered by each subject-specific exam. Yet, the reform has drawn vocal criticism. The second chance offered by an entrance exam has been dear to many, but the focus of criticism has been that, due to its biggest course load, advanced mathematics brings the most credit even in fields where proficiency in it might appear of less value. An earlier reform of medical faculties’ student admission in 2014 increased the weight given to advanced mathematics, with the positive consequence of increasing the share of girls sitting for the exam. We expect the present reform to have a similar impact despite the current critique.

In this presentation, we explore the impact of the reform on upper secondary schools – on students’ course choices and attainment, on their plans for the exams to include in their matriculation examination, on student wellbeing and possible burnout, and on students’ and teachers’ views on the reform.


Methodology, Methods, Research Instruments or Sources Used
The data are drawn from an ongoing (spring 2022 – spring 2023) study on the effect of the 2018 student admission reform on upper secondary schools and students, compiled to inform a re-writing of the respective admission criteria in 2023 due to the presumed negative impact of the reform on upper secondary students’ width and depth of studies (choice of subjects in the relatively free syllabus) and on their wellbeing. The data comprise questionnaires for students (n = 8,000), teachers, principals and guidance counsellors in sixteen upper secondary schools, register data on the sampled students’ course choices and attainment, and additional focus-group interviews with students and teachers in five upper secondary schools. Furthermore, the data comprise national matriculation examination data for 2016–2022 to investigate possible changes in students’ exam choices across the implementation of the reform.

Reflecting the cross-sectional survey data and the largely descriptive research questions, the results for the quantitative data will mainly be presented at the descriptive level, using ANOVA for variable-based profile analysis (e.g., math-oriented vs. humanistic-subjects-oriented students, high vs. low achievers, etc.) and group-level (e.g., gender, home background) comparisons. Due to the wide variability of students’ study paths within the relatively free upper secondary syllabus (of the 75 courses required for matriculation, only 45/52 are mandatory for basic/advanced mathematics) and students’ free choice of the order in which they study the different subjects (only advanced mathematics requires a course in each of the five periods across the year), multi-level analysis is expected to be a valid option for only some specific questions. The interview data will be used at this point to provide ‘real-life’ examples of how students and teachers see and talk about the issues brought up by the quantitative data, which served as the basis for the focus-group discussions.
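A group-level comparison of the kind described can be sketched in R with a one-way ANOVA. The data below are simulated placeholders (not the study's survey data), with group means loosely echoing a seven-point attitude scale:

```r
# Simulated attitude scores for two hypothetical student groups
set.seed(2)
df <- data.frame(
  math_track = rep(c("basic", "advanced"), each = 100),
  attitude   = c(rnorm(100, mean = 4.4, sd = 1.2),   # basic-math group
                 rnorm(100, mean = 5.5, sd = 1.2))   # advanced-math group
)

# One-way ANOVA: does mean attitude differ between the tracks?
fit <- aov(attitude ~ math_track, data = df)
tab <- summary(fit)[[1]]       # ANOVA table: Df, Sum Sq, F value, Pr(>F)

# Effect size (eta squared): between-group sum of squares / total sum of squares
eta_sq <- tab[["Sum Sq"]][1] / sum(tab[["Sum Sq"]])
```

With more than two groups or several profile variables, the same `aov` call extends naturally; post-hoc contrasts would then distinguish which profiles differ.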

Conclusions, Expected Outcomes or Findings
The preliminary data of 4,000 students suggest that, on average, students prefer the earlier practice of entrance examinations, still in use for half of new students. There were statistically significant but weak (p < .001, η2 < .01) differences related to students’ gender and choice of advanced vs. basic mathematics: students of advanced mathematics, who perform better on average in all exams (Kupiainen et al., 2018), were more in favor of matriculation examination-based student admission. They were also more confident of being accepted to university on the basis of their examination results. Yet, most students predicted that they will prepare for the entrance examination once the matriculation exams are over (a necessity for many, as the results of the matriculation examination-based selection come only just before the entrance examinations), a problem contrary to the goals of the reform and brought up also by the study on the impact of the reform from the universities’ point of view (Karhunen et al., 2022). As expected, the larger credit awarded for advanced mathematics was criticized especially by students of basic mathematics (5.46 vs. 4.43 on a seven-point Likert scale, p < .001, η2 = .068). While public discussion has blamed the reform for leading students to choose courses based on the credit awarded for the different exams in the admission process, according to the survey, students still see personal interest in the subject as a clearly stronger incentive for their choice of the exams they plan to sit for (mean 5.67 vs. 4.58 on a seven-point Likert scale). Even if the majority of students (58.6%) expected to be admitted to university based on their matriculation examination results, a much greater majority (82.5%) was ready to use the possibility allowed by the reform to take the exams anew if they were not.
References
Béguin, A. A. (2000). Robustness of equating high-stakes tests. Thesis, University of Twente, Enschede, Netherlands.

Karhunen, H., Pekkarinen, T., Suhonen, T. & Virkola, T (2022). Opiskelijavalintauudistuksen seurantatutkimuksen loppuraportti (The final report of the follow-up study of the student selection reform). VATT Muistiot 67.

Kupiainen, S., Marjanen, J. & Ouakrim-Soivio, N. (2018) Ylioppilas valintojen pyörteessä (Students at the whirlwind of choices). Suomen ainedidaktisen tutkimusseuran julkaisuja. Ainedidaktisia tutkimuksia 14. https://helda.helsinki.fi/handle/10138/231687

Noah, H. & Eckstein, A. (1992). The two faces of examinations: A comparative and international perspective. In Noah and Eckstein (eds.), Examinations: Comparative and International Studies.  Pergamon Press: pp.147-170.

Orr, D., Usher, D., Haj, C., Atherton, G., & Geanta, I. (2017). Study on the impact of admission systems on higher education outcomes. Executive summary. European Commission, Education and Training. Publication Office for the European Union.

Sarvimäki, M., & Pekkarinen, T. (2016). Parempi tapa valita korkeakouluopiskelijat (A better way to choose higher education students). VATT Policy Brief.


09. Assessment, Evaluation, Testing and Measurement
Paper

Impact of a Progression Support System Program (SiAP) on Students’ Retention Rates in Post-Secondary Technical Education in Chile

Cristian Cardenas, Jose Cancino, Carolina Barrientos, Fernando Alvarez

INACAP, Chile

Presenting Author: Cardenas, Cristian; Cancino, Jose

The International Standard Classification of Education (ISCED-11) labels post-secondary technical education as a short-cycle tertiary education program (5B). Chile, following OECD standards, has defined Tertiary Technical Education (hereafter TTE) as oriented toward providing the capacities and knowledge necessary to perform as a professional in different areas of the labor market (Ley 21.091, 2018). Additionally, it has emphasized the opportunity to enhance successful trajectories, especially for the population that has been historically excluded from higher education and skilled jobs. In this sense, access to TTE is seen as an instrument of social mobility that seeks to reduce inequality (Brunner et al., 2022).

In turn, retention and dropout, particularly for low-income students, have been a policy concern and a challenge for the technical educational system (Hällsten, 2017; Sarra et al., 2018; Brunner et al., 2022). The adverse effect of dropping out is dramatic and affects students and families in many ways, including greater marginalization and lower future labor market outcomes (Sosu & Pheunpha, 2019; Voelkle & Sander, 2008; O'Neill et al., 2011). Therefore, retention and dropout affect the goal of inclusion and equity that Chilean policymakers have tried to place at the core of Chilean TTE regulation since 2010 (Brunner et al., 2022).

Moreover, TTE has a disproportionate share of low-income students (Mountjoy, 2022; Sotomayor, 2018). In Chile, these institutions have greater participation of students from income quintiles 1-3: close to 50% of their enrollment comes from the poorest 60% of the Chilean population (SIES, 2022). In this regard, a 2022 study by the Higher Education Information Service of Chile (SIES) indicates that the first-year retention rate in tertiary education is higher for universities (85%) than for TTE institutions (70%) (SIES, 2022). These statistics are consistent with studies of the determinants of retention and dropout in TTE. When examining which factors influence the probability of student retention, evidence from across the globe points to four main groups of variables: i) the sociodemographic background of the family, ii) the student's previous academic results, iii) accessibility or financing restrictions, and iv) institutional factors (Behr et al., 2020; Li & Carroll, 2017; Millea et al., 2018; among others).

The Professional Institute INACAP is one of the largest TTEs in Chile, with 15% (N = 76,781) of the total enrollment of the Chilean technical college system (SIES, 2022). Since 2014, INACAP has developed new and different mechanisms to support a successful trajectory for students through a program called the Progression Support System (SiAP). The program has an inclusion and equity framework at its core and comprises a comprehensive set of initiatives articulated through a tutor who provides academic, psychosocial, and extracurricular support to students (particularly those at risk) to help them successfully navigate their career pathways. The program's ultimate impact indicator is the first-year retention rate: the aim is to increase retention and prevent dropout.

However, questions about the program's effectiveness have been raised, especially during the Covid-19 pandemic. In this regard, we set out to measure the effect of the SiAP program at INACAP on the first-year student retention rate for the 2017-2021 cohorts. Additionally, the study sought to describe changes in the program's policies and implementation during the pandemic (2020-2021) that might affect results. Finally, given INACAP's large enrollment, the findings will give the institution and the Chilean tertiary education system insights into the extent to which a tutoring program, as the core of a student academic progression support system, facilitates retention in the educational system for vulnerable students.


Methodology, Methods, Research Instruments or Sources Used
The INACAP SiAP-program is composed of different initiatives for all first-year students. The objective is to facilitate active and self-managed insertion into higher education. In this way, all students are assigned to an INACAP-SiAP tutor whose role is to support the development of academic skills and self-management of learning through academic monitoring and psychosocial accompaniment to identify support needs and activate internal and external networks promptly. Tutors must follow an order of priority for contact and accompaniment of students based on a student-risk predictor model.

The SiAP program has reached an average coverage of 70% or more (n = 37,523 students) of the total enrollment of first-year students for the 2017-2021 cohorts. As a first step in evaluating its effectiveness, we decided to develop its theory of change with the SiAP team (Weiss & Connell, 1995) to establish the causal relationship between the program's multiple actions and its expected impact (the retention rate of first-year students).

From this, institutional data were gathered to analyze the main results and limitations of previous analyses (2014-2016). Selection biases generated by the multiple mechanisms through which students can be referred to each of the program’s components were thus identified. On the one hand, students who voluntarily decide to participate in the different components of the SiAP generate a self-selection bias. On the other, there is endogeneity behind being referred to the program’s components, either due to having a low score in diagnostic evaluations (where academic performance would affect participation in the program) or by the tutor’s decision (where participation would be correlated with the error term).

Accordingly, the method used was quasi-experimental propensity score matching (PSM) (Caliendo & Kopeinig, 2008), the impact evaluation tool best suited to minimizing these biases and thus isolating the effect of treatment on the probability of retention.

The Kernel algorithm was used to take advantage of the information in all the observations located within the common support to build a more precise counterfactual, applying a weighted average in which greater weight was given to observations with propensity scores closest to those of the treatment group (Caliendo & Kopeinig, 2008). Additionally, policy and implementation changes in the program were tracked and analyzed to better understand changes in the program.
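The kernel-matching estimator described above can be sketched in R. This is a simplified illustration with simulated data and hypothetical variable names, using a Gaussian kernel; it is not INACAP's actual specification or data:

```r
# Simulated student data (placeholders for covariates like prior achievement
# and family background; 'treat' = SiAP participation, 'retain' = retention)
set.seed(1)
n  <- 500
df <- data.frame(gpa = rnorm(n), income = rnorm(n))
df$treat  <- rbinom(n, 1, plogis(0.8 * df$gpa - 0.5 * df$income))
df$retain <- rbinom(n, 1, plogis(0.3 + 0.5 * df$treat + 0.6 * df$gpa))

# 1. Estimate propensity scores with a logit model
ps <- fitted(glm(treat ~ gpa + income, data = df, family = binomial))

treated  <- which(df$treat == 1)
controls <- which(df$treat == 0)
h <- 0.05   # kernel bandwidth

# 2. For each treated unit, build a weighted counterfactual from all controls,
#    giving more weight to controls whose propensity scores are closer
att <- mean(sapply(treated, function(i) {
  w <- dnorm((ps[controls] - ps[i]) / h)          # Gaussian kernel weights
  df$retain[i] - weighted.mean(df$retain[controls], w)
}))

att   # average treatment effect on the treated (difference in retention)
```

In practice, the study additionally restricts the comparison to the common support and checks covariate balance after matching; both refinements bolt onto the same weighting logic.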

Conclusions, Expected Outcomes or Findings
The effects of the INACAP-SiAP-program on the retention rate of first-year students are positive and statistically significant. Participation in the program increases the probability of retention for the 2017 cohort by an average of 6.3 percentage points (pp) and 10 pp for the 2018 and 2019 cohorts.

For the 2020 and 2021 cohorts, the magnitude of the effect increases dramatically to 40 pp and 58 pp, respectively. However, these estimates should be interpreted with caution due to external validity problems arising from the pandemic, changes to the SiAP program guidelines during the Covid-19 pandemic, and the smaller control-group sample caused by the trend towards universal treatment.

For these reasons, heterogeneity analyses in subgroups of interest were limited to the 2017-2019 cohorts. For these cohorts, the program's impact on retention rates is substantially larger for students enrolled in evening programs, particularly in the 2018 and 2019 cohorts (6.8% vs. 16.6% and 7.5% vs. 17.6%, respectively); for the 2019 cohort, the impact is 135% higher than that obtained for daytime students, a difference similar in magnitude to the one in the 2018 cohort. Likewise, the effect on working students is approximately 50% larger in magnitude than on non-working students across the 2017-2019 cohorts, making this a consistent result over time.

The study highlights the importance of student support systems (like the INACAP SiAP) in helping students stay on their career pathways. This effort is aligned with the equity and inclusion framework that educational policy in Chile has sought to strengthen for students who see tertiary short-cycle education as a route to professional jobs that offer better opportunities in the labour market. Nonetheless, it remains essential to study the extent to which the program also helps students graduate on time.

References
Behr, A., Giese, M., Teguim K. & Theune, K. (2020). Dropping Out from Higher Education in Germany: An Empirical Evaluation of Determinants for Bachelor Students. Open Education Studies, 2(1), 126-148.

Brunner, J., Labrana, J., Alvarez, J. (2022). Educación superior técnico profesional en Chile: perspectivas comparadas. Santiago de Chile: Ediciones Universidad Diego Portales. https://vertebralchile.cl/wp-content/uploads/2022/07/Educacion-superior-tecnico-profesional-en-perspectiva-comparada.pdf

Caliendo, M. & Kopeinig, S. (2008). Some Practical Guidance for the Implementation of Propensity Score Matching. Journal of Economic Surveys, 22(1), 31-72. https://doi.org/10.1111/j.1467-6419.2007.00527.x

Faas, C., Benson, M. J., Kaestle, E. C., and Savla, J. (2018). Socioeconomic success and mental health profiles of young adults who drop out of college. J. Youth Stud. 21, 669–686. doi: 10.1080/13676261.2017.1406598

Ley 21091 (2018). Sobre educación superior. 11 de mayo de 2018 (Chile).

Li, W., & Carroll, D. (2017). Factors Influencing University Student Satisfaction, Dropout and Academic Performance. National Centre for Student Equity in Higher Education.

Millea, M., Wills, R., Elder, A. & Molina, D. (2018). What Matters in College Student Success? Determinants of College Retention and Graduation Rates. Education, 138(4), 309-322.

Mountjoy, J. (2022). Community Colleges and Upward Mobility (Working Paper No. 29254). National Bureau of Economic Research. https://www.nber.org/papers/w29254

O'Neill, L. D., Wallstedt, B., Eika, B., and Hartvigsen, J. (2011). Factors associated with dropout in medical education: a literature review. Med. Educ. 45, 440–454.

Ortiz, E. A., and Dehon, C. (2013). Roads to success in the Belgian French community's higher education system: predictors of dropout Bruxelles. Res. High. Educ. 54, 693–723.

Sarra, A., Fontanella, L., and Di Zio, S. (2018). Identifying students at risk of academic failure within the educational data mining framework. Soc. Ind. Res. 1–20.

Servicio de Información de Educación Superior (SIES) (2022). Ministerio de Educación. Matricula en Educación Superior en Chile. https://www.mifuturo.cl/wp-content/uploads/2022/10/Matricula_Educacion_Superior_2022_SIES_.pdf

Sosu, E. M., & Pheunpha, P. (2019). Trajectory of University Dropout: Investigating the Cumulative Effect of Academic Vulnerability and Proximity to Family Support. Frontiers in Education.

Sotomayor, C., & Valenzuela, J. P. (2018). Rentabilidad de la educación superior técnica entregada por los Centros de Formación Técnica. Estudios de Políticas Públicas, 120-133.

Voelkle, M. C., and Sander, N. (2008). A structural equation approach to discrete-time survival analysis. J. Individ. Dif. 29, 134–147.

Weiss, C.H. and Connell, J.P. (1995) Nothing as Practical as Good Theory: Exploring Theory-Based Evaluation for Comprehensive Community Initiatives for Children and Families. In: New Approaches to Evaluating Community Initiatives: Concepts, Methods, and Contexts, The Aspen Institute, 65-92.
 
5:15pm - 6:45pm 09 SES 03 A: Linking Education to Long-Term Outcomes
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Alli Klapp
Paper Session
 
09. Assessment, Evaluation, Testing and Measurement
Paper

General Matura Score as a Predictor of Personal Yearly Income More than 15 Years Later in Slovenia?

Gasper Cankar, Darko Zupanc

National Examinations Centre, Slovenia

Presenting Author: Cankar, Gasper; Zupanc, Darko

Every year, approximately 35% of Slovenian high school graduates, who complete the academically most demanding upper secondary education, take the General Matura (GM) examinations in Slovenia. The GM comprises five subject exams: Slovene language, Mathematics, a first foreign language, and two subjects chosen by the student from a selection of over 30 subjects. The GM score is calculated as the sum of the grades received in the five subject exams. The score can range from 10 (2+2+2+2+2), the lowest passing score, to 34 (8+8+8+5+5), the highest possible score. Success on the GM is considered equivalent to completing upper secondary education, and in cases where university study programs have a limited number of applicants, GM scores are used as a selection criterion in the admissions process.
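The scoring rule described above can be expressed directly. A minimal sketch (the function name is illustrative; the grade values follow the abstract):

```python
def gm_score(slovene, maths, foreign_lang, elective1, elective2):
    """General Matura score: the sum of the five subject-exam grades."""
    return slovene + maths + foreign_lang + elective1 + elective2

lowest = gm_score(2, 2, 2, 2, 2)    # lowest passing combination -> 10
highest = gm_score(8, 8, 8, 5, 5)   # highest possible combination -> 34
```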

The use of GM scores for university admissions has been studied multiple times (Bucik, 2001; Cankar, 2000; Sočan, Krebl, Špeh & Kutin, 2016), with findings similar to research on external examinations in other countries (Kuncel, Hezlett, & Ones, 2001). However, both the national and international literature lack research on the associations between GM scores (achieved at the age of 19) and various measures of personal success later in life and in professional careers at ages 33-40. In public discussions, assertions are often heard that students' school achievements and results on external exams have no relevance for later success in the labour market, income in their professional career, or other measures of success. Despite being a high-stakes examination, the GM has not been systematically examined to determine its role and long-term value in the Slovenian educational system.

Our research aims to explore the predictive value of GM scores on socio-economic status (SES) and specifically on yearly income later in life for several cohorts of students. Our null hypothesis is that GM scores do not predict SES or yearly income of students in their professional careers.

While it is commonly assumed that success on the GM examinations at the end of upper secondary education is associated with success at university and to some extent later in professional careers, such claims are difficult to verify scientifically due to a lack of representative and valid data. This research aims to provide a deeper understanding of the associations between GM scores, socio-economic measures and personal success in professional careers.


Methodology, Methods, Research Instruments or Sources Used
We will utilize databases of the National Examinations Centre covering whole cohorts of students taking the General Matura between 1995 and 2001, linking them to databases of the Slovenian Statistical Office on yearly personal income for 2016 as reported in the national tax database. We will also use other databases from the National Statistical Office to create a socio-economic status (SES) measure for each individual, also using data on completed level of education, value of real estate owned, and occupational status.

As the highest GM scores (30-34) are relatively rare, we will pool cohorts for the analysis, which will also increase statistical power. If we assume that GM graduates typically needed about five more years of university study after the GM before entering the labour market, then by 2016 most had 10-16 years of professional career behind them. This should enable us to see long-term effects on their income in the data.

We will explore regression models predicting graduates' yearly income or SES, using R as the statistical environment for most analyses.
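A regression of this kind can be sketched as follows. The study will use R; this Python/NumPy version with simulated toy data is purely illustrative (the income levels, effect size, and noise are invented, not taken from the study).

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: GM scores (10-34) and a hypothetical yearly income,
# generated with an assumed true slope of 800 per GM point
gm = rng.integers(10, 35, size=500).astype(float)
income = 20000 + 800 * gm + rng.normal(0, 5000, size=500)

# Simple linear regression of income on GM score
slope, intercept = np.polyfit(gm, income, 1)
```

With real data, the model would also include the SES controls described above; here the point is only the basic income-on-score regression.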

Conclusions, Expected Outcomes or Findings
The General Matura examinations test students in many different ways: they include written and oral/internal parts, multiple-choice and open-ended items, and even essays, and they are well aligned with the curriculum in both form and content. Students who excel and achieve the highest scores most likely possess the combination of knowledge, skills, attitudes, and perseverance that will enable success in later stages of their life, whether at university or in the labour market.

Previous studies (Bucik, 2001; Cankar, 2000; Sočan, Krebl, Špeh & Kutin, 2016) suggest that success on the General Matura is associated with success at university. We therefore expect that it is also associated with income or SES, as possible measures of success later in a person's career.

Although the association should be present, the expected predictive validity for the selected measures of individual success will probably be low, since individual aspirations, interests, career choices, motivation, and many other factors not included in our model contribute to and shape a person's career.
 
The size of the associations, or the lack of them, will also provide insight for discussions about the importance of school and school outcomes for life. Regardless of one's position in the engaging and often hot-tempered public discussion about the meaning of GM scores or school outcomes in general, this research will provide new facts that can complement the opinions and anecdotal arguments that mostly prevail today.

With these findings in view, the General Matura should be seen not as a goal in itself but as an indicator of a person's academic qualities that imply success later in life.

References
Bucik, V. (2001). Napovedna veljavnost slovenske mature [Predictive validity of the Slovenian Matura]. Psihološka obzorja, 10(3), 75-87.
Cankar, G. (2000). Napovedna veljavnost mature za študij psihologije [Predictive validity of the Matura for the psychology study course]. Psihološka obzorja, 9(1), 59-68.
Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2001). A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate student selection and performance. Psychological Bulletin, 127(1), 162-181. https://doi.org/10.1037/0033-2909.127.1.162
Sočan, G., Krebl, M., Špeh, A. & Kutin, A. (2016). Predictive validity of the Slovene Matura exam for academic achievement in humanities and social sciences. Horizons of Psychology, 25, 84-93.


09. Assessment, Evaluation, Testing and Measurement
Paper

Long-Term Effect of Academic Resilience on Salary Development: An Autoregressive Mediation Model

Cecilia Thorsen1, Kajsa Yang Hansen2, Stefan Johansson2

1University West, Sweden; 2University of Gothenburg, Sweden

Presenting Author: Thorsen, Cecilia

In research on educational equity, students from socioeconomically disadvantaged homes are typically depicted as low performers who are more likely to fail in school (Sirin, 2005). There are, however, students who, despite their disadvantaged backgrounds, manage to succeed in school. This capacity to overcome adversity in education and still achieve successfully is referred to as academic resilience (Agasisti et al., 2018). Academic resilience rests on two critical conditions: exposure to significant threat or severe adversity, and positive adaptation despite major assaults on the developmental process (Kiswarday, 2012). Resilience is often captured by identifying protective and risk factors that predict the likelihood of resilient outcomes. Risk factors are characteristics that heighten the risk of adverse outcomes, while protective factors buffer against negative impacts and are associated with positive adaptation and outcomes (Masten, 2014). Resilient students are often characterized by high self-confidence, perseverance, willingness and capacity to plan, lower anxiety (Martin & Marsh, 2006, 2009; OECD, 2011), and strong engagement in class and academic activities (Borman & Overman, 2004). Thorsen et al. (2021) also found that resilient students display both more perseverance and greater consistency of interest over time. Hence, both cognitive and non-cognitive skills are important for academic resilience.

Research on the economics of human development also highlights the value of skill formation for success in adulthood, particularly for disadvantaged children. Societal investments in strengthening both cognitive and non-cognitive skills for disadvantaged children yield significant economic returns at both the individual and societal levels (e.g., Heckman, 2006). More recent studies have particularly highlighted non-cognitive skill formation as a crucial enabler: non-cognitive skills are associated with economic and social mobility, economic productivity, and well-being in adulthood (e.g., Kautz et al., 2015; Soto, 2019). A wealth of labour market studies aligns with this reasoning, identifying positive associations between both cognitive and non-cognitive skills and labour market outcomes. Johannesson (2017) found that cognitive abilities and non-cognitive skills (academic self-concept and perseverance) predicted the risk of unemployment via school grades. Further, personality traits, i.e., extraversion and conscientiousness, were demonstrated to lead to higher earnings (Fletcher, 2013). Edin et al. (2022) found that a one-standard-deviation increase in cognitive skills is associated with a 6.6 percent salary increase, and an equivalent increase in non-cognitive skills with a 7.9 percent salary increase, after controlling for educational attainment.

Studies on academic resilience and skill formation are scarcer. Nevertheless, some studies have found that protective factors identified during childhood and youth, such as self-control and the ability to plan, predict a more successful transition into adulthood (see Burt & Paysnick, 2012, for a review). In a qualitative study following up on four resilient students a decade later, Morales (2008) found that the students continued to perform at high educational levels. The participants had adapted the protective factors identified at the start of the study (i.e., self-confidence and internal locus of control) and used them to meet new challenges.

Employing a resilience perspective, the present study aims to investigate differences in salary development between individuals identified as academically resilient and those who are not. We also explore whether salary development can be attributed to educational attainment (educational history) and work status as changing conditions, and to cognitive and non-cognitive skills, such as cognitive ability, perseverance, and academic self-concept, as time-invariant prerequisites.


Methodology, Methods, Research Instruments or Sources Used
Data were retrieved from the Evaluation through Follow-up (ETF) database, a longitudinal project built on 10% random, nationally representative samples of ten birth cohorts in Sweden (Härnqvist, 2000). The sampled students were followed up in grades 3, 6, and 9 of compulsory school (the Swedish school system comprises nine years of compulsory education from age 7) and in (non-compulsory) upper secondary school. The participants are about 9,000 individuals born in 1972. Of these, about 2,000 were identified as having low socioeconomic status (i.e., their parents only completed compulsory or vocational upper secondary education), and of these about 700 were identified as resilient (scoring above the country mean on the national standardized test).
Academic self-concept (ASC) in grade 6 was measured by three items (e.g., how do you feel about doing maths, reading, writing); answers were given on a three-point scale ranging from difficult to easy. In upper secondary school, ASC was measured using three items (e.g., do you experience any problems in maths, reading, writing); answers were given on a four-point scale from completely without problems to very big problems. Perseverance in grade 6 and upper secondary school was measured by four items (e.g., do you give up if you get a difficult task); answers were given on a dichotomous scale. Continuous variables were created for both constructs using the factor scores generated by a principal component analysis. Cognitive ability was measured in grade 6 using tests of inductive ability, spatial ability, and verbal ability (antonyms).
Information on salary was retrieved from population statistics and is available for these individuals between 1988 and 2010.
Method of analysis
To investigate the salary growth of resilient students, a multiple-group growth model with time-varying and time-invariant covariates will be used. Growth modelling allows the development of salary over time to be investigated for both resilient and non-resilient students, conditioned on the development of individuals' educational attainment and work status, and on their cognitive ability and personality traits. Academic self-concept and perseverance will be used as time-invariant covariates and educational attainment as a time-varying covariate.

Conclusions, Expected Outcomes or Findings
Our preliminary results revealed that the resilient group has a slower rate of change in salary level immediately after upper secondary education. This may be because the majority of individuals in this group did not enter the labour market directly but continued to higher education; we observed a steeper trajectory of salary development for these individuals after completing their higher education. Individuals in the non-resilient group had a higher starting salary level but slower salary growth over time. Additionally, we found that both cognitive and non-cognitive factors, i.e., perseverance and academic self-concept, explained the salary growth of academically resilient students; the explanatory power was much lower or non-significant for their counterparts. We expect even clearer differences in salary development between the resilient and non-resilient groups once we control for time-varying covariates such as education level and work status, as well as time-invariant covariates such as IQ.
References
Agasisti, T. et al. (2018). Academic resilience: What schools and countries do to help disadvantaged students succeed in PISA. OECD Education Working Papers, No. 167, OECD Publishing, Paris.
Borman, G. D., & Overman, L. T. (2004). Academic Resilience in Mathematics among Poor and Minority Students. The Elementary School Journal, 104(3), 177-195.
Burt K.B., & Paysnick A.A. (2012). Resilience in the transition to adulthood. Development and Psychopathology, 24(2), 493-505. doi:10.1017/S0954579412000119
Edin, P.-A., Fredriksson, P., Nybom, M., & Öckert, B. (2022). The Rising Return to Noncognitive Skill. American Economic Journal: Applied Economics, 14(2). https://www.aeaweb.org/articles?id=10.1257/app.20190199
Heckman, J. J. (2006). Skill Formation and the Economics of Investing in Disadvantaged Children. Science, 312(5782), 1900-1902. https://doi.org/10.1126/science.1128898
Härnqvist, K. (2000). Evaluation through follow-up. A longitudinal program for studying education and career development. In C.-G. Janson (ed.), Seven Swedish longitudinal studies in behavioural science (p. 76-114). Stockholm: Forskningsrådsnämnden. Retrieved from: http://hdl.handle.net/2077/2697078-100.
Johannesson, E. (2017). The Dynamic Development of Cognitive and Socioemotional Traits and Their Effects on School Grades and Risk of Unemployment. A Test of the Investment Theory. Doctoral Thesis, University of Gothenburg: Acta Universitatis Gothoburgensis.
Kiswarday, V. (2012). Empowering Resilience within the School Context. Methodological Horizons, 7(14). https://doi.org/10.32728/mo.07.1.2012.07
Kautz, T., Heckman, J.J., Diris, R., ter Weel, B., & Borghans, L. (2014). Fostering and Measuring Skills: Improving Cognitive and Non-Cognitive Skills to Promote Lifetime Success. National Bureau of Economic Research Working Paper Series, No. 20749. http://www.nber.org/papers/w20749
Martin, A. J., & Marsh, H. W. (2006). Academic resilience and its psychological and educational correlates: A construct validity approach. Psychology in the Schools, 43(3), 267-281. https://psycnet.apa.org/doi/10.1002/pits.20149
Masten, A. S. (2014). Ordinary magic: Resilience in development. New York, NY: Guilford Press. https://doi.org/10.1002/imhj.21625
Morales, E. E. (2008). Academic Resilience in Retrospect: Following Up a Decade Later. Journal of Hispanic Higher Education, 7(3), 228–248. https://doi.org/10.1177/1538192708317119
OECD. (2011). Against the odds: Disadvantaged students who succeed in school. Retrieved from http://dx.doi.org/10.1787/9789264090873-en
Sirin, S. R. (2005). Socioeconomic status and academic achievement: A meta-analytic review of research. Review of Educational Research, 75(3), 417-453.
Soto, C. J. (2019). How Replicable Are Links Between Personality Traits and Consequential Life Outcomes? The Life Outcomes of Personality Replication Project. Psychological Science, 30(5), 711-727. https://doi.org/10.1177/0956797619831612
Thorsen, C., Yang Hansen, K., & Johansson, S. (2021). The mechanisms of interest and perseverance in predicting achievement among academically resilient and non-resilient students: Evidence from Swedish longitudinal data. British Journal of Educational Psychology, 91, 1481-1497. https://doi.org/10.1111/bjep.12431


09. Assessment, Evaluation, Testing and Measurement
Paper

Relationship between Student Financial Aid and Degree Completion on Time in Portugal

Maria Eugenia Ferrao1,2

1Universidade da Beira Interior; 2CEMAPRE

Presenting Author: Ferrao, Maria Eugenia

Research studies on degree completion on due time are rare in the higher education literature. Regarding the European Higher Education Area (EHEA), no study based on nationwide representative data has so far addressed degree completion on due time (Yes/No). By considering an entire entrant cohort of first-time, full-time undergraduate students who attended their three-year program at the same institution, and simultaneously considering students' background, entrance scores and choices, eligibility for social scholarship, institutional organization characteristics, and area of study, this study explores the role of social scholarships/financial aid in overcoming the effects of students' socioeconomic disadvantages. A previous study (Ferrão, 2023) analyzed the relationship of the aforementioned students' characteristics with degree completion grade point average (GPA). Findings suggest that receiving (or not receiving) a social scholarship has no influence on students' GPA, confirming recent institutional research findings obtained with Universidade do Minho data (Ferrão et al., 2021) for first-year GPA. Nevertheless, Ferrão et al. (2020) found a statistically significant fixed effect of scholarship on students' persistence at the 10% level of significance. In addition, institutional research conducted at the Instituto Politécnico de Leiria points out that providing solutions for financial limitations may contribute to decreasing the risk of dropping out. In fact, Carreira and Lopes (2021) report that the "main motives for dropping out referred were 'financial difficulties' (27% of the students) and 'work-school incompatibility' (20%), while 'low academic performance' (11%), 'health reasons' (8%) or migration (2%), for example, were less mentioned, confirming the importance of financial assistance to reduce dropout risk (for traditional students) […]" (p. 1353).
Given that the financing of higher education in Portugal has progressively shifted from the state’s responsibility to that of the students' and their families, as in many other countries (Marginson, 2018; Tight, 2020), this calls for a more accountable evaluation of private/public funding and demands more effective social justice policies (Pitman et al., 2019). Thus, this study investigates the effect of receiving social scholarships/financial aid on degree completion on due time. Its specific objective is to estimate the fixed effect of receiving or not receiving a social scholarship on the probability of degree completion on due time. The study contributes to the themes of students’ success (degree completion), equity (financial aid), system evaluation and resource allocation. Since Portugal is one of the European countries where the costs of higher education are supported primarily by taxpayers, this topic of research matters not only for public policy regarding the increase of equity, but also for the efficiency of public resources allocation.


Methodology, Methods, Research Instruments or Sources Used
The population under study consists of students who entered undergraduate programs of 180 European Credit Transfer System (ECTS) credits through the national competition and who either obtained their diploma three years later or did not. The survey "Register of students enrolled in and graduated from higher education" (RAIDES) was used. The administrative RAIDES data (DGEEC - Direção-Geral de Estatísticas da Educação e Ciência, 2020), primarily collected for official statistics, offer great potential for secondary analyses such as quantitative scientific research. The RAIDES survey is carried out annually in Portugal within the scope of the National Statistical System, in which participation is mandatory. Data were collected by higher education institutions and exported in XML format to the DGEEC twice a year (in January and April, with December 31 and March 31 as reference dates, respectively) through the "Plataforma de Recolha de Informação do Ensino Superior" [Platform of Data Collection in Higher Education] (PRIES). Details on data collection, data processing, and the agreement for data privacy protection may be found in previous studies such as Ferrão (2023) or Ferrão et al. (2022). For the purpose of this study, data on students enrolled in the academic year 2013-14 were paired with data on graduates in the academic year 2016-17. Records of students who were not enrolled in their first year for the first time, or whose access to higher education was through a route other than the national competition, are not considered.
Random coefficient models are well grounded in the literature on higher education and success measurement. In this study, multilevel logistic models are applied to two two-level hierarchical structures, with the dependent variable representing degree completion on due time (DC, Yes/No). Preliminary results were obtained with the statistical software MLwiN (Rasbash et al., 2017), using the second-order penalized quasi-likelihood estimation procedure, PQL2 (Goldstein & Rasbash, 1996).

Conclusions, Expected Outcomes or Findings
Preliminary results show a statistically significant fixed effect, at the 5% level, of receiving a social scholarship on degree completion on time. The magnitude of the fixed effect depends on the set of control variables in the model; the odds ratio varies from 1.2 to 1.5. Other independent or control variables included in the model's linear predictor are: entrance score, first-choice programme-institution, gender, age at enrollment, parents' education, area of study, attendance at a non-local institution, type of institution, and the interaction between age and entrance score.
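To gauge what odds ratios of this size mean in practice, they can be converted into probability changes relative to a baseline completion rate. A quick sketch; the 40% baseline rate is hypothetical and not taken from the study.

```python
def prob_from_odds_ratio(p_base, odds_ratio):
    """Completion probability after multiplying baseline odds by an odds ratio."""
    odds = p_base / (1 - p_base) * odds_ratio
    return odds / (1 + odds)

# Hypothetical 40% baseline on-time completion rate
p_low = prob_from_odds_ratio(0.40, 1.2)   # lower bound of reported ORs
p_high = prob_from_odds_ratio(0.40, 1.5)  # upper bound of reported ORs
```

Under this assumed baseline, the reported odds ratios would correspond to roughly a 4 to 10 percentage-point increase in the completion probability for scholarship recipients.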
References
Carreira, P., & Lopes, A. S. (2021). Drivers of academic pathways in higher education: traditional vs. non-traditional students. Studies in Higher Education, 46(7), 1340–1355. https://doi.org/10.1080/03075079.2019.1675621
DGEEC - Direção-Geral de Estatísticas da Educação e Ciência. (2020). Documento técnico da plataforma de recolha de informaçao do Ensino Superior– RAIDES. DGEEC.
Ferrão, M. E. (2023). Differential effect of university entrance scores on graduates’ performance: the case of degree completion on time in Portugal. Assessment & Evaluation in Higher Education, 48(1), 95–106. https://doi.org/10.1080/02602938.2022.2052799
Ferrão, M. E., Almeida, L. S., & Ferreira, J. A. (2021). Higher Education equity in Portugal: On the relationship between student performance and student financial aid. World Education Research Association (WERA) 2020+1 Focal Meeting.
Ferrão, M. E., Prata, P., & Fazendeiro, P. (2022). Utility-driven assessment of anonymized data via clustering. Scientific Data, 9(456), 1–11. https://doi.org/10.1038/s41597-022-01561-6
Goldstein, H., & Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society. Series A (Statistics in Society), 159(3), 505–513. https://doi.org/10.2307/2983328
Marginson, S. (2018). Global trends in higher education financing: The United Kingdom. International Journal of Educational Development, 58, 26–36. https://doi.org/10.1016/j.ijedudev.2017.03.008
Pitman, T., Roberts, L., Bennett, D., & Richardson, S. (2019). An Australian study of graduate outcomes for disadvantaged students. Journal of Further and Higher Education, 43(1), 45–57. https://doi.org/10.1080/0309877X.2017.1349895
Rasbash, J., Browne, W., Healy, M., Cameron, B., & Charlton, C. (2017). MLwiN 3.05. Centre for Multilevel Modelling, University of Bristol.
Tight, M. (2020). Student retention and engagement in higher education. Journal of Further and Higher Education, 44(5), 689–704. https://doi.org/10.1080/0309877X.2019.1576860
 
Date: Wednesday, 23/Aug/2023
9:00am - 10:30am 09 SES 04 A: Network 09 Keynote: International Large-Scale Assessments and Education in the 21st Century
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Sarah Howie
Network 09 Keynote Lecture
 
09. Assessment, Evaluation, Testing and Measurement
Paper

International Large-Scale Assessments and Education in the 21st Century

Samuel Greiff

University of Luxembourg

Presenting Author: Greiff, Samuel

International large-scale assessments such as the Programme for International Student Assessment (PISA) or the Programme for the International Assessment of Adult Competencies (PIAAC) have shaped the political landscape of how educational systems are evaluated and have led to profound reforms of educational systems across the world. Beyond the political impact, large and representative data sets of previously unknown scale have become available to researchers for pursuing research questions that allow the extensive application of data science, learning analytics, and big data analytics. These data sets contain information on "core domain" skills such as mathematics and reading, a host of background data, but also data on "21st century skills" such as digital learning, problem solving, and collaboration. In this presentation, I will provide an overview of the history of large-scale assessments and give an update on recent additions and innovations, putting the spotlight on some of the most influential assessments and their policy and research impact. There will be a specific focus on the emerging role of 21st century skills and data science, both in the reporting and in the analyses of large-scale assessments.


Methodology, Methods, Research Instruments or Sources Used
The invited Keynote will present and address methodological challenges relevant to the research presented.
Conclusions, Expected Outcomes or Findings
The invited Keynote will provide conclusions with respect to both topic and methods and make suggestions for further research.
References
The invited Keynote will provide relevant references.
 
1:30pm - 3:00pm 09 SES 06 A JS: Accessing Data for Educational Research: Research, Best-Practices and Practical Implications for Researchers
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Jana Strakova
Joint Symposium NW 09, NW 12
 
09. Assessment, Evaluation, Testing and Measurement
Symposium (Copy for Joint Session)

Accessing Data for Educational Research: Research, Best-Practices and Practical Implications for Researchers

Chair: Jana Strakova (Charles University, Prague)

Discussant: David Schiller (University of Applied Sciences, Graubünden)

Open Science principles require that data collected and analysed as part of research projects are made available to other researchers at the end of the project (van der Zee & Reich, 2018). This allows not only for replication, validation and generalization of research findings (van der Zee & Reich, 2018; Tedersoo et al., 2021), but also for secondary data analyses. In general, data sharing is crucial for the efficiency of scientific knowledge generation (Allen & Mehler, 2019; Nosek et al., 2015). It is also highly valuable for the individual researcher, as scientific articles for which the data are published are cited more often than articles for which the data are not available (Colavizza et al., 2020; Drachen et al., 2016; Piwowar et al., 2007). While more and more data in educational research are available for secondary analyses, a considerable proportion are not shared.

The proposed symposium aims to shed more light on the factors explaining this reluctance to make data available, to give an overview of what can be accessed, and to put an emphasis on the legal requirements. It describes the educational data landscape across several European countries and elaborates on specific legal aspects researchers struggle with. More precisely, it highlights the potential of the rich data that exist but are not (yet) available to secondary users. Often, researchers are willing to share their data but are unsure how to make the data sharable and how to comply with the legal aspects, e.g. consent and copyright. Moreover, researchers are not always aware of the options for restricted access and the different layers of protection (including consent, anonymisation/pseudonymisation, and restricted access). Therefore, in this symposium we elaborate on challenges and best practices for sharing research data and provide practical guidance.

The symposium is a joint effort of researchers from four institutions from different European countries working on various aspects of data reuse and access to facilitate high quality educational research. The involved institutions are the DIPF | Leibniz Institute for Research and Information in Education, the International Association for the Evaluation of Educational Achievement (IEA), the University of Applied Sciences, Graubünden, and the Swiss Centre of Expertise in the Social Sciences (FORS).

The symposium consists of three contributions. The first paper takes a comparative perspective on the availability of educational research data in five European countries, namely England, Norway, France, Sweden and Switzerland. These countries are compared along a number of relevant factors with regard to data access, and implications for practice are derived from the analysis. The second paper provides insight into the legal challenges of data sharing in an international research project. The third paper zooms in on the GDPR and consists of two parts: first, the GDPR and its implications for research in education are described; then, how it was implemented in the context of TIMSS 2023.

With this symposium we aim to engage in a debate with members of Network 12 “Open Science in Education” as well as researchers from the other networks, and we encourage emerging researchers to join the debate.


References
Allen, C., & Mehler, D. M. A. (2019). Open science challenges, benefits and tips in early career and beyond. PLoS Biology, 17(5), e3000246. https://doi.org/10.1371/journal.pbio.3000246

Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLoS ONE, 15(4), e0230416. https://doi.org/10.1371/journal.pone.0230416

Drachen, T. M., Ellegaard, O., Larsen, A. V., & Dorch, S. B. F. (2016). Sharing data increases citations. LIBER Quarterly: The Journal of the Association of European Research Libraries, 26(2), 67-82. https://doi.org/10.18352/lq.10149

Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., . . . Yarkoni, T. (2015). Promoting an open research culture. Science, 348(6242), 1422-1425. https://doi.org/10.1126/science.aab2374

Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007). Sharing detailed research data is associated with increased citation rate. PLoS ONE, 2(3), e308. https://doi.org/10.1371/journal.pone.0000308

Tedersoo, L., Küngas, R., Oras, E., Köster, K., Eenmaa, H., Leijen, Ä., Pedaste, M., Raju, M., Astapova, A., Lukner, H., Kogermann, K., & Sepp, T. (2021). Data sharing practices and data availability upon request differ across scientific disciplines. Scientific Data, 8(1), 192. https://doi.org/10.1038/s41597-021-00981-0

van der Zee, T., & Reich, J. (2018). Open Education Science. AERA Open, 4(3), 2332858418787466. https://doi.org/10.1177/2332858418787466

 

Presentations of the Symposium

 

Research Data on Education and Learning: Access, Availability and Challenges in Five European Contexts

Marieke Heers (FORS, University of Lausanne), David Schiller (University of Applied Sciences, Graubünden), Rahel Haymoz (University of Applied Sciences, Graubünden)

In recent years, there have been major developments in the data landscape on education and learning, for several reasons. First, the ways in which data can be collected and analysed have expanded. In addition, research data infrastructures at both the national and international level have taken on a more prominent role and guide researchers throughout the research process with regard to questions of data management (Corti et al., 2019). Moreover, there has been a growing understanding that sound educational policy-making requires sound evidence based on high-quality and accessible data. Many research funders now require that data resulting from funded projects be made available at the end of the project (Logan, Hart & Schatschneider, 2021). At the same time, legal frameworks have evolved, and several countries now allow for easier data linkage; for example, linking administrative data to survey data has become more common (Harron et al., 2017). In this contribution, we draw a picture of the educational research data landscape in five European countries. The aim is threefold: we describe the educational data landscapes across varying contexts, compare them, and derive conclusions on what can be learnt from one context to the next. The countries included in the study are England, Norway, France, Sweden and Switzerland. They have been selected as contrasting cases in how educational data are administered, provided and accessed by researchers. The countries also have different legal bases for working with sensitive data and vary in the degree to which data infrastructures are centralized. To get a complete view of the educational data landscape in these countries, we carried out interviews with experts from each country. The five countries are compared along the following characteristics: (1) main data providers (e.g. universities, statistical offices, institutional repositories); (2) types of available data (level of education, statistical data, learning systems, competencies, etc.); (3) possibilities for data linkage; (4) laws and regulations. The analysis shows considerable variation in terms of what data are made available and in procedures, coverage and handling. At the same time, there are some commonalities across countries. The findings allow us to draw conclusions about future directions for the educational data landscape. We make suggestions on what can be learnt from other contexts and what would facilitate high-quality data-driven educational research that could inform educational policy and practice.
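The kind of record linkage discussed above can be sketched in a few lines. This is an illustrative example, not the study's actual procedure; the field names (`id`, `school`, `reading_score`) and the data are hypothetical, and real linkage would typically use pseudonymous identifiers under a legal agreement.

```python
# Hypothetical sketch: joining administrative register data to survey
# responses on a shared pupil identifier.
admin = {  # administrative register: pupil id -> school-level data
    "p01": {"school": "S1", "grade": 8},
    "p02": {"school": "S2", "grade": 8},
}
survey = [  # survey data collected separately from the same pupils
    {"id": "p01", "reading_score": 534},
    {"id": "p02", "reading_score": 501},
]
# Keep only survey rows whose id appears in the register, merging both records.
linked = [{**admin[row["id"]], **row} for row in survey if row["id"] in admin]
```

The merge keeps survey respondents who can be matched in the register; in practice, the match rate itself is a quality indicator worth reporting.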

References:

Corti, L., Van den Eynden, V., Bishop, L., & Woollard, M. (2019). Managing and sharing research data: A guide to good practice (2nd ed.). London: Sage.

Harron, K., Dibben, C., Boyd, J., Hjern, A., Azimaee, M., Barreto, M. L., & Goldstein, H. (2017). Challenges in administrative data linkage for research. Big Data & Society, 4(2), 2053951717745678.

Logan, J. A. R., Hart, S. A., & Schatschneider, C. (2021). Data sharing in education science. AERA Open, 7, 23328584211006475. https://doi.org/10.1177/23328584211006475
 

Navigating the Legal Complexities of Data Sharing in International Educational Research: Insights from a Case Study

Sonja Bayer (DIPF, Leibniz Institute for Research and Information in Education), Alexia Meyermann (DIPF, Leibniz Institute for Research and Information in Education)

Effective data sharing is crucial for fostering transparency, reproducibility, and collaboration in scientific research, making it a vital topic in the context of Open Science. Additionally, the ability to find, access, interoperate and reuse data is crucial for advancing scientific discovery and innovation, making it relevant to the FAIR data principles (Wilkinson et al., 2016). This presentation highlights the legal challenges of sharing data from international research projects, using the example of a qualitative research project in German and Australian schools. The research project was accompanied by a case study that analysed issues such as data protection, intellectual property and compliance, to gain a better understanding of how international data sharing can be done effectively and ethically. The study provides valuable insights for anyone involved in international research projects and helps uncover potential legal challenges. In the field of educational research, the data collected are often highly personal and may include minors. However, researchers in education often have little legal knowledge; it is not part of their training. Especially in an international context, differing legal regulations make it difficult to collect and share data compliantly. To investigate this issue, we selected an international qualitative research project for a case study (Argyrou, 2017; Eisenhardt, 2002; Yin, 2018). The project examined a digital student exchange between Australian and German students, using video recordings, interviews, student-created videos, chat communications, digital classroom materials, and student-drawn language biographies. This raised complex legal questions about the handling of students' personal data and of the personal data of third parties, such as their families.
Copyright issues are also key, especially when copyrighted material is to be shared as part of Open Science and used in scientific publications. For the case study, we explored the challenges and issues that arose in collecting and sharing data of the international research project as well as the resources and opportunities to find solutions through interviews and documentation analysis. In our presentation, we summarize the legal challenges of collecting and sharing research data in the international project and show examples of the measures taken to overcome these challenges. Overall, the results of our case study provide valuable insights for those involved in international educational research projects and help identify and deal with potential legal challenges.

References:

Argyrou, A. (2017). Making the case for case studies in empirical legal research. Utrecht Law Review, 13(3). https://doi.org/10.18352/ulr.409

Eisenhardt, K. (2002). Building theories from case study research. In M. Huberman & M. B. Miles (Eds.), The qualitative researcher's companion. Los Angeles: Sage.

Wilkinson, M. D., et al. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18

Yin, R. K. (2018). Case study research and applications: Design and methods. Los Angeles: Sage.
 

Taking Data Protection Seriously in International Educational Research: Insights from TIMSS 2023

Betina Borisova (IEA, the Netherlands), Paulína Koršňáková (IEA, the Netherlands)

The General Data Protection Regulation (GDPR), applicable in the European Economic Area (EEA), came into force in 2018 and affected the day-to-day activities of anyone who handles personal data, including IEA and the network of National Research Coordinators implementing IEA comparative studies and collecting data from students, parents/guardians, teachers, and school principals. This presentation takes the view that the GDPR provides benefits for international educational research and documents how IEA interpreted and implemented some of the GDPR's requirements, as well as the benefits of these arrangements for participants in IEA studies, IEA data, and Open Science as a whole. The first aspect we look at is the obligation to provide information to data subjects when personal information is collected from them. Data subjects must be informed, inter alia, about the purpose of the processing, who will process their personal data, who will have access to it, whether it will be transferred outside the European Economic Area, how long the personal data will be processed, and where it will be stored. In the field of educational research, this obligation on the data controller benefits research participants by respecting the child’s evolving capacities. To this end, IEA has prepared a Data Protection Declaration template, which provides participating students, their parents/guardians and teachers with the necessary information. A step further is to prepare an additional child-friendly version of these documents, targeting grade 4 and grade 8 students. A second aspect to consider is some of the GDPR’s core principles for the processing of personal data and their positive impact on international educational research. Respecting principles such as lawfulness, fairness and transparency, purpose limitation, data minimization, integrity and confidentiality, as well as accountability, strengthens the integrity and ethics of the research and keeps researchers accountable.
Compliance with the GDPR also directly benefits IEA data itself, as it ensures that the collected data can be later analysed and used for research purposes. Using TIMSS 2023 as an example, this presentation seeks to show how the rules of data protection can be reconciled with the goals of Open Science to the benefit of study participants and international educational research. While IEA encourages Open Science by having IEA data publicly available, as part of the international report and the International Database (IDB), IEA also ensures that any personal data of participants is protected by adopting appropriate safeguards such as anonymization and pseudonymization techniques.
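To make the pseudonymization safeguard mentioned above concrete, here is a minimal sketch, not IEA's actual procedure: direct identifiers are replaced by a keyed hash (HMAC-SHA-256), so records can still be linked across files without releasing identifying data. The identifiers, field names, and key in this example are hypothetical; in practice the secret key is held by the data controller, separately from the released data.

```python
# Hypothetical pseudonymization sketch using a keyed hash (HMAC-SHA-256).
import hmac
import hashlib

SECRET_KEY = b"held-separately-by-the-data-controller"  # hypothetical key

def pseudonymize(student_id: str) -> str:
    """Map a direct identifier to a stable pseudonym; without the key,
    the mapping cannot be reversed or recomputed."""
    digest = hmac.new(SECRET_KEY, student_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

records = [
    {"student_id": "AT-0042", "math_score": 512},
    {"student_id": "AT-0043", "math_score": 498},
]
# Release only the pseudonym and the analysis variables.
released = [{"pid": pseudonymize(r["student_id"]), "math_score": r["math_score"]}
            for r in records]
```

Because the same input always yields the same pseudonym under a given key, files pseudonymized with that key remain linkable, which is what distinguishes pseudonymization from full anonymization under the GDPR.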

References:

The full title of the GDPR: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).
 
1:30pm - 3:00pm 12 SES 06 A JS: Accessing Data for Educational Research: Research, Best-Practices and Practical Implications for Researchers
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Jana Strakova
Joint Symposium NW 09, NW 12
 
12. Open Research in Education
Symposium (Copy for Joint Session)

Accessing Data for Educational Research: Research, Best-Practices and Practical Implications for Researchers

 
3:30pm - 5:00pm 09 SES 07 A: Exploring Behavior, Learning, and Well-being in Diverse Educational Contexts
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Sarah Howie
Paper and Ignite Talk Session
 
09. Assessment, Evaluation, Testing and Measurement
Paper

Theory-Based Behavioral Indicators for Children’s Purchasing Self-Control in a Computer-Based Simulated Supermarket

Philine Drake1, Johannes Hartig1, Manuel Froitzheim2, Gunnar Mau3, Hanna Schramm-Klein2

1DIPF, Germany; 2School of Economic Disciplines, University of Siegen, Germany; 3Department of Psychology, DHGS German University of Health and Sport, Berlin, Germany

Presenting Author: Drake, Philine

Market demands with their multitude of stimuli and information can be particularly overwhelming for children, whose cognitive abilities and skills are not yet fully developed and who lack market experience and knowledge (Mau et al., 2014). To better understand children’s purchase decisions, associated behaviors, and deficits, Mau et al. (2016) analyzed children’s behaviors in a simulated supermarket environment. They showed that children often behave differently at the point of sale than they had intended and expected when making their purchase decision. Slightly more than half of the children indicated that they would primarily look for low prices when shopping. Although the children in the subsequent observation of their shopping behavior had a limited budget and were tasked with buying the cheapest products, they clearly tended to select products based more on package design or brand. These findings indicate that children have difficulty implementing a basic requirement of goal-oriented consumer behavior, namely, taking the right actions to achieve a set goal (Bagozzi & Dholakia, 1999).

Regarding the question of how actions in purchasing processes could be implemented in a goal-oriented manner, we draw on basic theoretical frameworks of action regulation that include feedback loops, such as the cybernetic TOTE model (Powers, 1973). Carver and Scheier (1981) assume that the successful realization of a goal state (e.g., fulfillment of the shopping list) occurs by passing through loops in which the existing and target states (e.g., contents of the shopping cart vs. the shopping list) are repeatedly compared with each other and a deviation is successively reduced by operations until the loop is exited. Consequently, the execution of action is expressed as a sequence of corresponding operations and is always in the interplay between the goals of the agent and the situational requirements. According to Gillebaart (2018), setting standards or goals as well as monitoring deviations are aspects of self-regulation, while successful self-control comes into play within the feedback loop in the ‘operate’ phase. Self-control has been described by Baumeister et al. (2007) as the mental processes that enable people to control their thoughts, emotions, and behaviors to achieve higher-level goals. While operating, various aspects of self-control can be observed, such as suppressing the impulse to be tempted by alluring stimuli that are not in line with our long-term goals (e.g., completing the shopping list), avoiding situations that might lead one into temptation (e.g., forgoing the candy shelf), or even delaying gratification by forgoing an immediate, smaller reward in order to obtain a larger, delayed reward. According to Inzlicht et al. (2014), as a result of repeated self-control efforts, there may be a change in the degree of self-control displayed. This is attributed to a change in task priorities, a shift in motivation away from so-called "have-to" goals to "want-to" goals that provide more pleasure and satisfaction.
Therefore, the process of action regulation is always under the influence of changing motivations and the attendant changes in emotions and attention. Although self-control has been highlighted in its importance for the successful implementation of consumer goals (e.g., by avoiding impulsive purchases; Baumeister, 2002), there is still no study that specifically captures children's operations in the purchasing process and relates the extent of self-controlled behavior to the successful implementation of a purchase intention.


Methodology, Methods, Research Instruments or Sources Used
To address this gap, we used a computer-based supermarket simulation to study children's shopping behavior at the point of sale. In this task, children were asked to complete a shopping task based on a shopping list in the supermarket simulation. The supermarket simulation is designed so that, at the behavioral level, children's attentional behavior can be inferred from the log data of the computer-based task (Silberer, 2009). Attentional behavior includes observable attention to objects in the store environment, i.e., how often or how long children look at individual products. The supermarket simulation was intentionally designed to include elements that are not required for completing the task. The extent to which children engage with these irrelevant elements can be gauged from their attentional behavior and, at the behavioral level, enables differentiation between actions that are more conducive to have-to goals (as defined by the task) and actions that are more conducive to want-to goals as defined by Inzlicht et al. (2014).
The data analysis focused on whether the covariance among behavioral indicators hypothesized to capture self-control (e.g., the extent of engagement with task-irrelevant products) could be explained by a single common factor and how that factor was related to task success, monitoring of task performance, and spending. A sample of 136 elementary school children was given a shopping list and a limited budget. To extract behavioral indicators from the log data, we used the finite-state machine approach (Kroehne & Goldhammer, 2018).  
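To make the idea of extracting behavioral indicators from log data concrete, the following Python sketch derives simplified counterparts of the indicators described here (time spent on task-irrelevant shelves, visits to irrelevant shelves, purchases of products not on the shopping list) from a toy event log. The event names, log layout, and shelf/product labels are invented for illustration; this is not the actual supermarket-simulation schema, nor the finite-state machine implementation of Kroehne and Goldhammer (2018).

```python
# Illustrative only: toy event log of one child's session.
# Each entry is (timestamp_sec, event, target); names are hypothetical.
IRRELEVANT_SHELVES = {"candy_shelf", "toy_shelf"}  # not needed for the task
SHOPPING_LIST = {"milk", "bread"}                  # task-relevant products

log = [
    (0.0,  "enter_shelf", "milk_shelf"),
    (5.0,  "enter_shelf", "candy_shelf"),
    (12.0, "enter_shelf", "bread_shelf"),
    (20.0, "buy",         "chocolate_bar"),
]

time_on_irrelevant = 0.0   # seconds of attention to irrelevant shelves
irrelevant_visits = 0      # number of visits to irrelevant shelves
off_list_purchases = 0     # purchases of products not on the list

current_shelf, entered_at = None, 0.0
for t, event, target in log:
    if event == "enter_shelf":
        # close the interval on the previous shelf before switching
        if current_shelf in IRRELEVANT_SHELVES:
            time_on_irrelevant += t - entered_at
        if target in IRRELEVANT_SHELVES:
            irrelevant_visits += 1
        current_shelf, entered_at = target, t
    elif event == "buy" and target not in SHOPPING_LIST:
        off_list_purchases += 1
# (a full implementation would also close the last open shelf interval)
```

In this toy log the child spends 7 seconds on the candy shelf, visits one irrelevant shelf, and buys one off-list product; aggregating such counts per child yields indicator variables of the kind entered into the factor model.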

Conclusions, Expected Outcomes or Findings
A one-dimensional confirmatory factor analysis (CFA) with all assumed indicators was conducted. The model for self-control included four variables: The temporal extent to which children paid attention to irrelevant shelves (S1) or products (S2), the frequency with which they purchased irrelevant products that were not on the shopping list (S3), or visited irrelevant shelves (S4). The model showed a largely good fit (χ2(1) = 2.276, p = .131, RMSEA = 0.105, 90% RMSEA CI [.000, .294], CFI = 0.993, TLI = 0.956, SRMR = .04). Only the RMSEA exceeded the cut-off criterion. Task success was estimated using the partial credit model. The significant correlation between task success (WLE) and the factor for self-control (r(113)=.44, p<.001) indicates that self-control plays an important role in the purchase process. Our results also show that children who monitored their spending (imprecision of estimates, r(108)= -.35, p<.001) and task success (r(111)=.40, p<.001) more carefully tended to show greater self-control in task performance. Our study illustrates how theory-based factors can be extracted from log data of computerized tasks and demonstrates their diagnostic potential, which can be used to improve the quality and richness of psychological and educational assessments.
References
Bagozzi, R. P., & Dholakia, U. (1999). Goal Setting and Goal Striving in Consumer Behavior. Journal of Marketing, 63(4), 19-32. https://doi.org/10.1177/00222429990634s104
Baumeister, R. F. (2002). Yielding to temptation: Self-control failure, impulsive purchasing, and consumer behavior. Journal of Consumer Research, 28(4), 670-676. https://doi.org/10.1086/338209
Baumeister, R. F., Vohs, K. D., & Tice, D. M. (2007). The Strength Model of Self-Control. Current Directions in Psychological Science, 16(6), 351-355. https://doi.org/10.1111/j.1467-8721.2007.00534.x
Carver, C. S., & Scheier, M. F. (1981). The self-attention-induced feedback loop and social facilitation. Journal of Experimental Social Psychology, 17(6), 545-568. https://doi.org/10.1016/0022-1031(81)90039-1
Gillebaart, M. (2018). The ‘operational’ definition of self-control. Frontiers in Psychology, 9, 1231. https://doi.org/10.3389/fpsyg.2018.01231
Inzlicht, M., Schmeichel, B. J., & Macrae, C. N. (2014). Why self-control seems (but may not be) limited. Trends in Cognitive Sciences, 18(3), 127-133. https://doi.org/10.1016/j.tics.2013.12.009
Kroehne, U., & Goldhammer, F. (2018). How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items. Behaviormetrika, 45(2), 527-563. https://doi.org/10.1007/s41237-018-0063-y
Mau, G., Schramm-Klein, H., & Reisch, L. (2014). Consumer socialization, buying decisions, and consumer behaviour in children: Introduction to the special issue. Journal of Consumer Policy, 37(2), 155-160. https://doi.org/10.1007/s10603-014-9258-0
Mau, G., Schuhen, M., Steinmann, S., & Schramm-Klein, H. (2016). How children make purchase decisions: Behaviour of the cued processors. Young Consumers, 17(2), 111-126. https://doi.org/10.1108/YC-10-2015-00563
Powers, W. T. (1973). Behavior: The Control of Perception. Chicago, IL: Aldine.
Silberer, G. (2009). Verhaltensforschung am Point of Sale-Ansatzpunkte und Methodik. Universitätsverlag Göttingen.


09. Assessment, Evaluation, Testing and Measurement
Paper

Learning Collaboration in the School Context in Serbia: Student Perceptions

Dragica Pavlović Babić, Marina Videnović, Smiljana Jošić, Kristina Mojović Zdravković

University of Belgrade, Serbia

Presenting Author: Pavlović Babić, Dragica

The increasing interest in collaboration as an educational competence important for successful schooling and a productive adult professional and civic life can be seen in the expanding literature and research evidence (e.g. Rychen & Salganik, 2003; National Research Council, 2011). Collaboration is marked as one of the social and emotional skills on the 2030 education development agenda, defined by the intergovernmental Organisation for Economic Co-operation and Development (OECD, 2019).

Collaborative problem solving (CPS) is an umbrella term for a variety of pedagogical models that enable students to learn by engaging in joint activities, relying on each other, integrating individual knowledge, skills, and efforts (Lai, 2011). With appropriate support and scaffolding, CPS could have a greater positive effect on student achievement, and peer social relationships than competitive and individual learning (e.g. Gillies, 2016; Johnson & Johnson, 2002).

The focus of this study is on students’ experience with CPS as symmetric peer interaction during regular school classes. These perceptions and experiences provided a basis for examining how CPS is applied in the context of secondary schools in Serbia. Research suggests that productive collaboration requires both cognitive skills (e.g. Campbell, 2021; Shi et al., 2021) and social and emotional skills (e.g. Newman, 2016; Rogat & Adams-Wiggins, 2015). That is why we paid attention to how students reported on the cognitive aspects (e.g., argumentation, consideration and evaluation of various perspectives...) and the social and emotional aspects (group cohesion, tolerance, atmosphere…) of collaborative work.

Analysing student responses to semi-structured interviews showed that, with certain inconsistencies and overlaps, two models of cooperation are clearly differentiated, presented here by key features.

Model 1 is oriented towards an efficient use of resources, including time, with a dominant utilitarian goal - getting the job done. It is characterized by a strict division of responsibilities, usually mechanical. The roles are defined, including the leader, who can be self-proclaimed. The product is a collection of individual works: either loosely bound or bound by one group member. Solution/product quality is judged on the basis of external indicators. The cognitive aspect of CPS includes prior knowledge, seen as a key success factor. Social and emotional aspect: a strict division of roles and the leader’s assumption of responsibility often exclude democratic patterns of behaviour such as negotiation and agreement; the atmosphere in the group depends on the degree of closeness of the members, and any disagreement during group work can grow into a conflict.

This model can be termed parallel or utilitarian and quasi-cooperation, as the key cooperation determinants cannot be easily identified, except for work arrangements. According to our respondents’ experiences, this model dominates.

Model 2 is oriented primarily towards product quality; sometimes learning cooperation as a competence is cited as an explicit goal of the cooperation. Cooperation primarily has a cognitive goal, reflected in the use of search strategies for task solving. There is a loose division of responsibilities and roles, usually according to participant competencies and interests; deadlines are on the back burner. The product is based on group consensus. The cognitive aspect includes awareness of the importance of argumentation and discussion. Social and emotional aspect: an atmosphere of mutual trust and equality between team members removes barriers and allows freedom in presenting and considering different solutions and/or ways of solving tasks. There is mutual knowledge and respect. The cognitive and the social and emotional aspects are interconnected, manifesting themselves as solidarity with others and connecting as a form of strengthening personal capacities.

This model can be called collaborative or constructive due to its orientation towards the joint construction of knowledge. Unfortunately, according to student experiences, it is rarely represented in school practice.


Methodology, Methods, Research Instruments or Sources Used
The study was conducted at the end of the 2021/2022 school year and included six secondary schools in Belgrade (3 vocational and 3 general/gymnasium schools). The sample consisted of 31 second-grade students (17 female), 15-17 years old. All students involved in the research had formal parental consent and gave their own assent. Students were examined using a semi-structured interview which lasted approximately 60 minutes.
The adolescents answered questions related to their perceptions of cooperation in everyday school work. The interview guide consists of five indicators, i.e. thematic units. The first indicator referred to the general impression of cooperation in the school context, whereupon the students were asked about the frequency and quality of peer cooperation in and outside school. The second theme was peer cooperation in the school context - what the organization of group work looks like in and outside class, and what the advantages and disadvantages of group work are compared with individual school work. The third indicator included questions related to the recognition of successful and unsuccessful peer cooperation factors, where the students discussed the roles of different actors in group work and described the experiences of successful and unsuccessful group work in which they had participated. The fourth topic was cooperation as a competence, where the accent was on how the competence is acquired and manifested, its importance, as well as whether and to what extent young people possess it. Finally, the fifth topic covered the personal perspective, i.e. an assessment of personal competences for cooperation.
Following the coding of interview transcripts, 612 coded segments were analyzed in MaxQDA using thematic analysis.

Conclusions, Expected Outcomes or Findings
Several conclusions can be drawn from these findings, with significant implications for the organization of regular classes in the Serbian educational system.
During joint work at school, important aspects of CPS (argumentation, sharing ideas...) are often missing. Research shows that the successful development of collaborative skills requires the support of adults (e.g., Gonzalez-Howard & McNeill, 2019; Rojas-Drummond & Mercer, 2003). Our results indicate that this support is often lacking. It is necessary to think about how to organize teaching in ways that encourage the development of these skills.
The parallel model, although more present in school practice, is a model that supports quasi-cooperation, as it only has the form of cooperative work but lacks the features of the processes that define cooperation. Student learning of quasi-cooperation can have lasting implications for student competencies, and thus it is recommended that the system recognize this organizational form of work as ineffective.
Time, restricted to the 45-minute duration of a school lesson, could be an obstacle to organizing cooperation in the school context. We perceive time management as a particularly sensitive point in collaborative learning and/or learning through collaboration. A strategy that is often applied in these cases is to transfer a task to an extracurricular environment (such as homework), which has both good and bad sides.
Finally, the two models presented are not developmental stages in the learning of cooperation in the school context but rather, two qualitatively different approaches. In fact, practicing the first one will not enable a transition to the second, constructive model.

References
Rychen, D. S., & Salganik, L. H. (Eds.). (2003). Key competencies for a successful life and a well-functioning society. Hogrefe & Huber Publishers.
National Research Council. (2011). Assessing 21st Century Skills: Summary of a Workshop. National Academies Press (US). http://www.ncbi.nlm.nih.gov/books/NBK84218/
OECD. (2019). Future of Education and Skills 2030. OECD Publishing.
Lai, E. (2011). Collaboration: A Literature Review. Pearson, Princeton.
Gillies, R. M. (2016). Cooperative learning: review of research and practice. Australian Journal of Teacher Education (Online), 41(3), 39-54. https://search.informit.org/doi/10.3316/informit.977489802155242
Johnson, D., Johnson, R. (2002). Learning together and alone: Overview and meta-analysis. Asia Pacific Journal of Education, 22, 95-105. https://doi.org/10.1080/0218879020220110
Campbell, T. (2021). Examining how middle grade mathematics students seize learning opportunities through conflict in small groups. Mathematical Thinking and Learning. https://doi.org/10.1080/10986065.2021.1949529
Shi, Y. C., Shen, X.M., Wang, T., Cheng, L. & Wang, A.C. (2021). Dialogic teaching of controversial public issues in Chinese middle school.  Learning Culture and Social Interaction, 30. https://doi.org/10.1016/j.lcsi.2021.100533
Newman, R. (2016). Working talk: developing a framework for the teaching of collaborative talk. Research Papers in Education, 31(1), 107–131. https://doi.org/10.1080/02671522.2016.1106698
Rogat, T. K., & Adams-Wiggins, K. R. (2015). Interrelation between regulatory and socioemotional processes within collaborative groups characterized by facilitative and directive other-regulation. Computers in Human Behavior, 52, 589-600. https://doi.org/10.1016/j.chb.2015.01.026


09. Assessment, Evaluation, Testing and Measurement
Paper

Examinee Timing Indicators Measured on a Continuous Scale: Some Insights from e-TIMSS 2019 in Mathematics

Elena Papanastasiou1, Michalis Michaelides2

1University of Nicosia, Cyprus; 2University of Cyprus

Presenting Author: Papanastasiou, Elena

When participating in assessments, it is assumed that examinees have invested effort to perform well; otherwise, scores will not reflect their true ability and will not be valid indicators of their proficiency (Baumert & Demmrich, 2001; Wise, 2015). However, lack of motivation and effort during the test-taking process creates a threat to the validity of test outcomes, especially in International Large-Scale Assessments (Rutkowski & Wild, 2015).

Response-time data from computerized tests have enabled researchers to study test-taking effort, e.g. by identifying respondents who respond rapidly before a certain time point. However, there is a need to move beyond the examination of rapid responses through thresholds, since the time needed to respond to a test item depends on various factors, including ability and test-taking behaviors. Consequently, it is argued that, to be interpreted appropriately, response times should be examined in relation to examinees’ performance at the item level.

The purpose of this study is to examine two novel response-based indicators of test-taking behavior that utilize a combination of examinee response and process (timing) data to better understand and describe test-taking effort in online assessments. These indicators, which have been named “Unsuccessful time management” and “Successful time management”, will be empirically estimated with data from the fourth-grade e-TIMSS 2019 mathematics assessment. This study further aims to examine these variables in relation to achievement benchmarks, student background characteristics such as attitudes towards mathematics, confidence in mathematics, and gender, as well as overall achievement. The ultimate goal of these analyses is to obtain further insights into examinees who participate in online assessments through the use of their timing data.


Methodology, Methods, Research Instruments or Sources Used
The sample utilized in the study was that of grade 4 students from the USA who had participated in e-TIMSS 2019. The sample included 10,029 students, of whom 49.44% were female. The average age of the students was 10.29 years (SD = 0.43).
To calculate the indicators of the current study, the average time spent on an item was first calculated for each test item separately. At a second stage, a deviation score was calculated for each student who was administered item i, by subtracting the average sample screen time for item i from the students’ time for the same item. Based on these deviation scores, a cumulative indicator was calculated as follows:
1) For items that were omitted or answered incorrectly in less time than average, this negative timing difference was added to the Unsuccessful Time Management indicator for the examinee. This indicator therefore represents the sum of the unused time on test items that were answered incorrectly, indicating that, most likely, the students made less than adequate effort to answer them correctly.
2) For items that were answered correctly in less time than average, this negative timing difference was added to the Successful Time Management indicator for the examinee. This indicator represents the sum of the unused time on correct answers, indicating that, most likely, the students were either already proficient in the specific content and thus did not need additional time to respond correctly to those items, or that the correct answer was the consequence of a lucky guess.
Overall, 86.459% of the participants used less time than average on at least one of their incorrect responses, resulting in an average of 320.881 seconds (SD = 166.792) of unused time for those examinees. The 48.489% of participants who used less time than average on at least one of their correct responses had an average of 20.617 seconds (SD = 17.576) of unused time. The correlation between the two indicators was 0.29 (SE = 0.01).
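The two cumulative indicators described above could be computed from item-level response data roughly as in the following Python/pandas sketch. The column names and toy data are illustrative and are not the e-TIMSS file layout; treating omitted responses as non-correct is an assumption consistent with the description of step 1.

```python
import pandas as pd

# One row per (student, item): screen time in seconds and correctness
# (1 = correct, 0 = incorrect, NaN = omitted). Toy data, purely illustrative.
resp = pd.DataFrame({
    "student": [1, 1, 2, 2],
    "item":    ["M1", "M2", "M1", "M2"],
    "time":    [30.0, 80.0, 50.0, 20.0],
    "correct": [1, 0, 0, 1],
})

# Deviation of each student's time from the item's sample average
resp["deviation"] = resp["time"] - resp.groupby("item")["time"].transform("mean")

# Only negative deviations ("unused time") feed the indicators
faster = resp[resp["deviation"] < 0]

# Unsuccessful Time Management: unused time on omitted/incorrect items
# (NaN correctness, i.e. omitted, falls into the != 1 branch)
utm = faster[faster["correct"] != 1].groupby("student")["deviation"].sum().abs()

# Successful Time Management: unused time on correctly answered items
stm = faster[faster["correct"] == 1].groupby("student")["deviation"].sum().abs()
```

In the toy data both students answer their faster-than-average item correctly, so only the Successful Time Management indicator accumulates unused time; with real response data both indicators would typically be non-empty.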

Conclusions, Expected Outcomes or Findings
The results of this study showed that, when examining these indicators by benchmark level, the successful time management indicator increased as the benchmark levels increased, while the unsuccessful time management indicator decreased. Also, students who spent less time on incorrect answers tended to be in the lower benchmarks. The correlation between the Successful Time Management indicator and achievement equaled r = 0.25 (SE = 0.01), while the correlation between the Unsuccessful Time Management indicator and achievement equaled r = -0.08 (SE = 0.02). So, students with higher levels of achievement tended to have more unused time on their correct answers (most likely an indicator of mastery of the test content) and less unused time on their incorrect answers, indicating that they generally struggled more with such items; however, this relationship was very small.

Further analyses found that the 89.50% of students who reached all test items had the greatest amount of unused time. Most likely, this occurred while they were trying to ensure that they had time to complete the test. These were also the students who had the highest average achievement (M = 537.75, SD = 84.80). The students who ran out of time were the ones who had the least amount of unused time. These are most likely students who spent more time than average on most items, which resulted in their running out of time in the end. Finally, the 5.36% of students who stopped responding were most likely the students who made the least effort and had the lowest average achievement (M = 482.50, SD = 81.59), which further verifies their low effort on the test.

Overall, the results of this study revealed that both indicators might provide additional insights related to examinee test-taking effort and characteristics, when conditioned on the accuracy of their responses. However, more research is needed to understand these indicators more comprehensively.

References
Baumert, J., & Demmrich, A. (2001). Test motivation in the assessment of student skills: The effects of incentives on motivation and performance. European Journal of Psychology of Education, 16(3), 441-462. https://doi.org/10.1007/BF03173192
Rutkowski, D., & Wild, J. (2015). Stakes matter: Student motivation and the validity of student assessments for teacher evaluation. Educational Assessment, 20(3), 165-179. https://doi.org/10.1080/10627197.2015.1059273
Wise, S. L. (2015). Effort analysis: Individual score validation of achievement test data. Applied Measurement in Education, 28(3), 237-252. https://doi.org/10.1080/08957347.2015.1042155


09. Assessment, Evaluation, Testing and Measurement
Paper

Digital Media Use and Sleep in Young Children: Insights from Advances in Long-Term ECG Monitoring of Toddlers

Marina Eglmaier1, Sigrid Hackl-Wimmer1, Manuela Paechter1, Helmut Karl Lackner2, Ilona Papousek1, Lars Eichen1

1University of Graz, Austria; 2Medical University of Graz

Presenting Author: Eglmaier, Marina

The availability of digital media devices such as smartphones, tablets, notebooks, etc. in households has become so commonplace today that we hardly think about our media use and its consequences. This also seems to be true for households with children. Even very young children come into contact with and use these devices from an early age. For example, a study in the UK reported that children started using touchscreen media (smartphones, tablets) as early as six months of age (Cheung et al., 2017). As for the duration of media use, an Austrian study of parents with children up to the age of six years found that one-third of the children use digital media every day and around fifty percent use them several times a week (Institut für empirische Sozialforschung [IFES], 2020). The numbers are similar across Europe (e.g., Germany see Kieninger et al., 2020; UK see Bedford et al., 2016; France see Cristia & Seidl, 2015; Italy see Chindamo et al., 2019).

Parents face various educational challenges and must deliberate how to manage and regulate their children’s media use. For the investigation of parents’ educational strategies and behaviors, “parental mediation theory” has proven to be a valuable theoretical framework. Within this framework, the present study focused on children’s sleep and parents’ mediation of media use. The prevalence of digital media presents parents with several challenges, including the duration and frequency of media use but also concerns about possible harmful effects of media use itself. In this regard, parental mediation theory describes strategies parents use to minimize potentially harmful consequences of media use (Clark, 2011; Valkenburg et al., 1999).

One important area of research concerns the effects of media use on children’s development and health. In this context, sleep is an important variable. During the first years of life, sufficient and restful sleep is of particular importance, because sleep is essential for developmental processes (El-Sheikh & Sadeh, 2015) such as neuronal and cognitive development processes (Ednick et al., 2009). Research suggests harmful effects of media use on children’s sleep (e.g., Hackl-Wimmer et al., 2021). Ensuring that children get sufficient and restful sleep is an important educational task for parents. Toddlers’ sleep may be influenced by a variety of factors, including digital media use (Hackl-Wimmer et al., 2021). There are diverse methods for quantifying sleep quantity and quality. However, research on children’s sleep suffers from methodological problems. Studies on young children’s sleep and media use mainly employ subjective data such as parent questionnaires to assess children’s sleep (e.g., Chindamo et al., 2019).

An approach that is often used in medicine is polysomnography (PSG). PSG comprises the recording of several physiological functions (e.g., heart rate and brain waves) in a sleep laboratory for clinical purposes. However, the methodology is not suitable for recording children’s sleep in daily situations at home. Therefore, we used and further developed a more practical approach, namely ECG (electrocardiogram) recordings with small portable devices. ECG is the recording of the electrical activity of the heart. ECG data allows the measurement of heart rate (HR) and the calculation of several parameters of heart rate variability (the variation in time between consecutive heartbeats). Due to technological progress, ECG data can be recorded using portable devices over a period of 24 hours or more. These devices are also suitable for studies with young children and allow recording of HR during sleep at home in the child’s familiar environment.

The aim of this study is to examine whether toddlers’ use of smartphones and audio media is related to their sleep quality (quantified as HR during restless sleep phases).


Methodology, Methods, Research Instruments or Sources Used
In the present study, two methodological approaches and their intertwining are used: a questionnaire on parental mediation behavior, children’s media use, and other variables plus long-term ECG monitoring.
The questionnaire included several types of media (e.g., smartphone and audio media) for which the duration of use on weekdays and the weekend was asked. Furthermore, parents were asked what their objectives were for their children’s smartphone and audio media use and for which activities their children used these devices.
To investigate toddlers’ sleep quality, long-term ECG monitoring was performed for approximately 30 hours. Data collection was performed as part of a field study at crèches in Austria and started in the morning at the crèches. The ECG device used is equipped with an integrated 3D acceleration sensor that provides information about body position and body movement. For the duration of the ECG measurement, parents and daycare educators were asked to keep an activity log to record the children’s activities with start and end times (e.g., sleep during the day and night and other activities such as mealtimes and media use). Processing of the ECG data and the quantification of sleep quality involved several steps. The two main steps comprised the following: First, ECG data, acceleration sensor data, and the activity log recordings were used to determine restful and restless sleep phases. Major determinants of restful sleep are a lying position, little body movement, and a calm, steady respiration pattern. Phases of restful sleep had to last at least ten consecutive minutes to be classified as restful sleep for further analysis; otherwise, the sleep phases were classified as restless sleep. In the second step, heart rate (HR) was calculated for restless sleep. The statistical analysis comprised partial correlations to investigate potential relationships between media use and HR. The analyses controlled for the children’s age (in months) and were calculated separately for smartphone and audio media use on weekdays, at the weekend, and for average weekly media use. Additionally, descriptive statistics are reported on the activities and objectives for the use of these devices.
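The two-step sleep processing described above can be sketched in Python, assuming minute-level flags for lying position and low movement derived from the acceleration sensor and activity logs. The thresholds, data structures, and minute-level resolution are illustrative assumptions, not the study's actual pipeline.

```python
def classify_sleep(lying, low_movement, min_run=10):
    """Label each minute 'restful' or 'restless'.

    A minute is a restful-sleep candidate if the child is lying and barely
    moving; only runs of at least `min_run` consecutive candidate minutes
    count as restful sleep, everything else is restless.
    """
    candidate = [l and m for l, m in zip(lying, low_movement)]
    labels = ["restless"] * len(candidate)
    i = 0
    while i < len(candidate):
        if candidate[i]:
            j = i
            while j < len(candidate) and candidate[j]:
                j += 1          # extend the current candidate run
            if j - i >= min_run:  # runs of >= 10 min become restful sleep
                for k in range(i, j):
                    labels[k] = "restful"
            i = j
        else:
            i += 1
    return labels

def mean_hr_restless(hr, labels):
    """Average heart rate over restless-sleep minutes only."""
    vals = [h for h, lab in zip(hr, labels) if lab == "restless"]
    return sum(vals) / len(vals) if vals else None
```

For example, twelve consecutive lying, low-movement minutes would be labeled restful, while a shorter run or the minutes after the child changes position would stay restless and contribute to the restless-sleep HR.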

Conclusions, Expected Outcomes or Findings
The results showed that smartphone use is associated with poorer night’s sleep (i.e., higher HR during restless sleep). However, audio media use is associated with more favorable sleep (i.e., lower HR during restless sleep), indicating that the investigation of media effects benefits from differentiating between media types.
The joint consideration of physiological data and parents’ educational behavior gives more insights into possible causes. Regarding the objectives of media use, parents most often reported that smartphones and audio media are used by their children for entertainment purposes or out of boredom. While parents most often reported that their children used the smartphone for activities such as watching movies, listening to music, and playing educational games, audio media were mainly used for listening to music and books. A possible explanation for the present results could be that smartphone use is related to sustained arousal due to its interactive component and the emitted blue light. On the other hand, assuming that calming content is played, using audio media to listen to music and books could help children to relax and unwind.
However, the possible selectivity of the sample must be taken into account. Media use was lower compared to other studies (e.g., Cheung et al., 2017), as some toddlers did not use smartphones or audio media at all.
In conclusion, not every type of media is detrimental to children’s sleep, so further research on media content is needed. ECG monitoring during the use of different types of media content allows the detection of psychophysiological processes that young children are unable to reflect and report on. Implications for education and development comprise the selection of media content appropriate to the situation, meaning arousing or exciting content for playtime, for example, and calming content for relaxing situations such as before bedtime.

References
Bedford, R., Saez de Urabain, I.R., Cheung, C.H.M., Karmiloff-Smith, A., & Smith, T. J. (2016). Toddlers’ fine motor milestone achievement is associated with early touchscreen scrolling. Frontiers in Psychology, 7, 1108.

Cheung, C.H.M., Bedford, R., Saez De Urabain, I.R., Karmiloff-Smith, A., & Smith, T.J. (2017). Daily touchscreen use in infants and toddlers is associated with reduced sleep and delayed sleep onset. Scientific Reports, 7, 46104.

Chindamo, S., Buja, A., DeBattisti, E., Terraneo, A., Marini, E., Gomez Perez, L.J., Marconi, L., Baldo, V., Chiamenti, G., Doria, M., Ceschin, F., Malorgio, E., Tommasi, M., Sperotto, M., Buzzetti, R., & Gallimberti, L. (2019). Sleep and new media usage in toddlers. European Journal of Pediatrics, 178(4), 483–490.

Clark, L.S. (2011). Parental mediation theory for the digital age. Communication Theory, 21, 323–343.

Cristia, A., & Seidl, A. (2015). Parental reports on touch screen use in early childhood. PloS ONE, 10(6), e0128338.

Ednick, M., Cohen, A.P., McPhail, G.L., Beebe, D., Simakajornboon, N., & Amin, R.S. (2009). A review of the effects of sleep during the first year of life on cognitive, psychomotor, and temperament development. Sleep, 32(11), 1449–1458.

El‐Sheikh, M., & Sadeh, A. (2015). I. Sleep and development: Introduction to the monograph. Monographs of the Society for Research in Child Development, 80(1), 1–14.

Hackl-Wimmer, S., Eglmaier, M.T.W., Eichen, L., Rettenbacher, K., Macher, D., Walter-Laager, C., Lackner, H.K., Papousek, I., & Paechter, M. (2021). Effects of touchscreen media use on toddlers’ sleep: Insights from longtime ECG monitoring. Sensors, 21(22), 7515.

Institut für empirische Sozialforschung. (2020). Die Allerjüngsten (0-6 J.) & digitale Medien [The very young (0–6 years) & digital media]. https://www.saferinternet.at/fileadmin/redakteure/Projekt-Seiten/Safer_Internet_Day/Safer_Internet_Day_2020/Praesentation_PK_Safer_Internet_Day_2020.pdf

Kieninger, J., Feierabend, S., Ratgeb, T., Kheredmand, H., & Glöckler, S. (2020). miniKIM-Studie 2020. Kleinkinder und Medien: Basisuntersuchung zum Medienumgang 2- bis 5-Jähriger in Deutschland [miniKIM Study 2020. Toddlers and media: A baseline study of media use among 2- to 5-year-olds in Germany]. www.mpfs.de/fileadmin/files/Studien/miniKIM/2020/lfk_miniKIM_2020_211020_WEB_barrierefrei.pdf

Valkenburg, P.M., Krcmar, M., Peeters, A.L., & Marseille, N.M. (1999). Developing a scale to assess three styles of television mediation: “Instructive mediation,” “restrictive mediation,” and “social coviewing”. Journal of Broadcasting & Electronic Media, 43(1), 52–66.
 
5:15pm - 6:45pm 09 SES 08 A JS: Assessment and Curriculum Reforms: Understanding Impacts and Enhancing Assessment Literacy
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Sarah Howie
Joint Paper Session, NW 09 and NW 24
 
09. Assessment, Evaluation, Testing and Measurement
Paper

The Impact of Curriculum and Assessment Reform in Secondary Education on Progression to Mathematics Post-16

Joanna Williamson, Carmen Vidal Rodeiro

Cambridge University Press & Assessment, United Kingdom

Presenting Author: Williamson, Joanna

In most education systems around the world there is a strong case for increasing the mathematical skills of young people beyond the age of 16. Evidence from international student surveys such as PISA shows, for example, that in the European Union about 23% of 15-year-olds did not reach basic skill levels in mathematics in 2018 (OECD, 2019).

Incentivising young people to continue studying mathematics post-16 should not only help satisfy labour-market demand for mathematically and quantitatively skilled people, but, more generally, help ensure that young people have the knowledge to succeed in an increasingly technological society (e.g., Mason et al., 2015; Smith, 2017; European Commission, 2022). Moreover, young people with good mathematical knowledge will benefit from the quantitative, analytical and problem-solving skills that mathematics qualifications develop, which will support attainment in other disciplines, particularly those with a significant quantitative component.

In England, unlike other countries in Europe and the rest of the world, the study of mathematics post-16 is not compulsory for all students. A recent study comparing upper secondary mathematics participation in 24 countries (Hodgen & Pepper, 2019) showed that in England fewer than 20% of students persist with mathematics education in any form beyond the age of 16. In contrast, 18 countries have post-16 participation rates higher than 50%, with rates at more than 95% in eight of them, including Sweden, Finland, Japan and Korea.

One reason for low progression to post-16 mathematics in England could be a longstanding concern about how well the mathematics qualifications offered to students aged 14 to 16 (GCSE, General Certificate of Secondary Education) prepare students for advanced study in mathematics, with algebra frequently mentioned as the key problem (e.g., Wiliam et al., 1999; Hernandez-Martinez et al., 2011; Noyes & Sealey, 2011; Rigby 2017).

To increase uptake of mathematics and to improve students’ mathematical skills at all levels takes effort, funding and a range of interventions. In England, GCSE qualifications (in all subjects) were recently reformed “to ensure they are rigorous and robust, and give students access to high quality qualifications which match expectations in the highest performing jurisdictions”. For mathematics specifically, the new GCSE “focuses on ensuring that every student masters the fundamental mathematics that is required for further education and future careers”, and, in particular, aims to “be more demanding” and “provide greater challenge for the most able students” (Gove, 2013).

There were concerns that the new mathematics GCSE could deter students from post-16 mathematics (e.g., by reducing their confidence) and unintentionally reduce uptake (ALCAB, 2014; Lee et al., 2018). A decrease in post-16 mathematics entries in 2019 lent weight to these fears but, to date, there has been little published research on how the reform of GCSE mathematics has affected mathematics learning and progression to post-16 study. One of the few studies to consider this issue in detail was carried out by Howard and Khan (2019). Their qualitative research found that, in general, teachers were positive about the extent to which the reformed GCSE prepared students for post-16 mathematics. Their participants also reported that the reformed GCSE had positive implications beyond studying mathematics and that it would support students studying other subjects with mathematical content. Grima and Golding (2019) and Pearson Education (2019) reported similar findings from qualitative research in schools.

The current research aims to complement the qualitative analyses of existing research described above, by approaching the question of how the reform of GCSE mathematics has affected progression to and performance in post-16 mathematics and maths-related subjects via quantitative analysis of entries and performance data.


Methodology, Methods, Research Instruments or Sources Used
This work addressed the research question via quantitative analysis of national results data available in the National Pupil Database (NPD). The NPD is a longitudinal database for children in schools in England, linking pupil characteristics to school and college learning aims and attainment. It holds individual pupil level attainment data for pupils in all schools and colleges who take part in the tests/exams, and pupil and school characteristics (e.g., age, gender, ethnicity, special educational needs, eligibility for free school meals, etc.) sourced from the School Census for state schools only.

Candidates who completed GCSE mathematics in each of the years from 2014 to 2017 (2014-2016 pre-reform; 2017 post-reform) were followed up for two years, and the post-16 qualifications they achieved were included in the research. For example, students who achieved a GCSE in mathematics in 2015 were followed up in 2016 and 2017 and the qualifications they achieved identified. Later cohorts could not be included because end-of-course exams were cancelled in 2020 and 2021 due to the Covid pandemic.
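The two-year follow-up can be sketched as a pupil-level linkage. All field names and records below are illustrative, not actual NPD variables:

```python
import pandas as pd

# Hypothetical pupil-level records: GCSE results and later post-16 awards.
gcse = pd.DataFrame({
    "pupil_id": [1, 2, 3, 4],
    "gcse_year": [2015, 2015, 2017, 2017],  # 2014-16 pre-reform, 2017 post-reform
    "gcse_grade": ["A", "C", "9", "5"],
})
post16 = pd.DataFrame({
    "pupil_id": [1, 3],
    "award_year": [2017, 2019],
    "subject": ["maths", "further maths"],
})

# Follow each GCSE cohort for two years: link awards by pupil, then flag
# progression to post-16 maths within the two-year window. Comparisons
# against the NaN award_year of unmatched pupils evaluate to False, so
# they count as not having progressed.
linked = gcse.merge(post16, on="pupil_id", how="left")
linked["progressed"] = linked["award_year"].le(linked["gcse_year"] + 2)

# Progression rate pre- vs post-reform.
rates = linked.groupby(linked["gcse_year"] > 2016)["progressed"].mean()
```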

Progression from GCSE mathematics pre- and post-reform to the following qualifications was then investigated: progression to a range of different post-16 mathematics qualifications (core maths, maths, further maths); and progression to post-16 maths-related subjects (Biology, Chemistry, Physics, Economics, Psychology).

Descriptive statistics on the number and proportion of GCSE mathematics students progressing to the qualifications listed above (overall and by GCSE grade), pre-reform (2014-2016) and post-reform (2017), were produced and compared. Marginal grade distributions for all qualifications, overall and by GCSE mathematics grade, pre- and post-reform, were also produced.

To further explore the effect of GCSE reform on progression to and performance in post-16 maths or maths-related subjects, multilevel logistic regression analyses were carried out. The regression analyses differ from the descriptive analyses in that they take students’ background characteristics into account when looking at the impact of GCSE reform on progression or performance post-16.
The outcomes modelled in the regression analyses were as follows:
- progression to post-16 maths (any qualification, core maths, maths, further maths);
- progression to maths-related subjects (Biology, Chemistry, Physics, Economics, Psychology);
- achievement of specific grade thresholds in post-16 maths qualifications, and in maths-related subjects.

The independent variables in the regression models included: year the GCSE maths was achieved (i.e., an indicator of pre- or post-reform), GCSE grade, gender, overall prior attainment at school, level of socio-economic deprivation and type of school attended (e.g., private vs. state).
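A minimal sketch of such a progression model, fit on synthetic data and deliberately simplified: it is a single-level logistic regression by gradient descent, ignoring the school-level clustering the study’s multilevel models account for, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic pupils: a reform indicator (0 = pre-2017 GCSE, 1 = post-reform)
# and a standardised GCSE maths grade.
reform = rng.integers(0, 2, n)
grade = rng.normal(0.0, 1.0, n)

# Assumed data-generating model: higher grades and the post-reform cohort
# progress to post-16 maths more often.
true_logit = -1.0 + 0.5 * reform + 1.5 * grade
progressed = rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))

# Fit logistic regression (intercept, reform, grade) by gradient ascent
# on the average log-likelihood.
X = np.column_stack([np.ones(n), reform, grade])
beta = np.zeros(3)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (progressed - p) / n

# Odds ratio for the reform indicator: > 1 means higher odds of
# progressing post-reform, holding grade constant.
odds_ratio_reform = np.exp(beta[1])
```

The same idea extends to each modelled outcome (progression to core maths, further maths, maths-related subjects, or achieving a grade threshold) by swapping the binary response.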

Conclusions, Expected Outcomes or Findings
Contrary to fears about reduced uptake, this research showed that progression to mathematics post-16 generally increased following the recent reforms to secondary level mathematics qualifications. The uptake of core maths and further maths increased independently of the grade achieved by students in their mathematics GCSE. However, for post-16 maths (i.e., the mainstream mathematics qualification, not core maths or further maths), the increase in uptake was higher amongst those who achieved top grades in their mathematics GCSE than for students with just a pass. Performance in all three post-16 maths qualifications was, in general, lower post-reform – in contrast to teacher expectations. However, it should be taken into account that students taking the reformed GCSE would have also taken newly reformed post-16 qualifications, and it is known that performance tends to dip in the first years of a new qualification.

The research also found that progression to five maths-related subjects (Biology, Chemistry, Physics, Economics, and Psychology) was higher post-reform than pre-reform. Compared to pre-reform years, performance in these maths-related subjects was generally worse post-reform. In particular, in science subjects (Biology, Chemistry and Physics) performance was very similar pre- and post-reform for students with the very top GCSE grades in mathematics, but it was lower post-reform for students with lower grades in the GCSE.

In conclusion, this research has shown that some of the aims of the curriculum and assessment reform in secondary mathematics (in particular, increasing uptake of mathematics post-16) seem to have been fulfilled. As with any reforms, changes take time to bed in, but this research has raised important issues for the mathematics education community as countries seek to increase the numbers of people that are well prepared to apply their mathematical knowledge and skills not only in further education and the workplace, but also in society more generally.

References
ALCAB (2014). Report of the ALCAB panel on Mathematics and Further Mathematics. Hemel Hempstead: The A Level Content Advisory Board.

European Commission (2022). Increasing achievement and motivation in mathematics and science learning in schools. Luxembourg: European Education and Culture Executive Agency.

Grima, G., and Golding, J. (2019). Reformed GCSE Mathematics qualifications: teachers’ views of the impact on students starting A levels. Ofqual Educational Assessment Seminar Scarman House, Warwick University.

Gove, M. (2013). Ofqual policy steer letter: reforming Key Stage 4 qualifications. [Letter from the Secretary of State for Education to Ofqual's Chief Regulator]. https://www.gov.uk/government/publications/letter-from-michael-gove-regarding-key-stage-4-reforms.

Hernandez-Martinez, P., Williams, J., Black, L., Davis, P., Pampaka, M., and Wake, G. (2011). Students' views on their transition from school to college mathematics: rethinking ‘transition’ as an issue of identity. Research in Mathematics Education, 13(2), 119-130.

Hodgen, J., and Pepper, D. (2019). An international comparison of upper secondary mathematics education. London: Nuffield Foundation.

Howard, E., and Khan, A. (2019). GCSE reform in schools: The impact of GCSE reforms on students’ preparedness for A level maths and English literature. Coventry: Office of Qualifications and Examinations Regulation.

Lee, S., Lord, K., Dudzic, S., and Stripp, C. (2018). Investigating the Impact of Curriculum and Funding Changes on Level 3 Mathematics Uptake. Trowbridge: Mathematics in Education and Industry.

Mason, G., Nathan, M. and Rosso, A. (2015). State of the nation: a review of evidence on the supply and demand of quantitative skills. London: British Academy and NIESR.

Noyes, A., and Sealey, P. (2011). Managing learning trajectories: the case of 14–19 mathematics. Educational Review, 63(2), 179-193.

Pearson Education (2019). GCSE Mathematics Qualification – UK Regulated qualification efficacy report. London: Pearson UK.

OECD (2019). PISA 2018 Results (Volumes I to IV). Paris: OECD.

Rigby, C. (2017). Exploring students’ perceptions and experiences of the transition between GCSE and AS Level mathematics. Research Papers in Education, 32(4), 501-517.

Smith, A. (2017). Review of Post-16 Mathematics. London: Department for Education

Wiliam, D., Brown, M., Kerslake, D., Martin, S., and Neill, H. (1999). The transition from GCSE to A level in mathematics: a preliminary study. Advances in Mathematics Education, 1(1), 41-56.


09. Assessment, Evaluation, Testing and Measurement
Paper

Quality of an Assessment Task Developed by a Preservice Mathematics Teacher: The Role of Feedback from Agencies

Gözde Kaplan-Can, Erdinç Çakıroğlu

Middle East Technical University, Turkiye

Presenting Author: Kaplan-Can, Gözde

Assessment literacy has been a recurring term in the assessment literature since it was popularized by Stiggins (1991) (Koh et al., 2018). Assessment literacy mainly concerns teachers’ assessment practices and skills in selecting, designing, or using assessments for various purposes (Stiggins, 1991). The term also covers knowledge of the principles behind selecting, adapting, or designing assessment tasks, judging students’ work, and using the data obtained to enhance their learning (Koh et al., 2018).

Mathematical thinking arises when students work on problem-like tasks (Jones & Pepin, 2016). However, traditional mathematics instruction and assessment mainly emphasize memorization instead of creative thinking or reasoning, a claim supported by several studies (see Jäder et al., 2015; Stein et al., 2009; Stevenson & Stigler, 1992; Vacc, 1993). Such instruction and assessment fail to enhance students’ competencies in mathematics and lead them to rote learning (Hiebert, 2003). Hence, students must face challenging and unfamiliar problems that activate their higher-order thinking skills (HOTS).

HOTS involve explanation, interpretation, and decision-making. Students with HOTS can learn how to improve their achievement and reduce their weaknesses (Tanujaya, 2016). Hence, mathematics teachers should be knowledgeable about HOTS and how to foster these skills in order to carry out quality mathematics instruction and assessment. For this reason, teacher education programs must support preservice mathematics teachers (PMTs) in understanding the significance of engaging students with higher-level tasks.

Several categorizations of HOTS have been proposed in education. Bloom’s taxonomy places HOTS at the analysis, synthesis, and evaluation levels (McDavitt, 1994). Stein et al. (1996) described a higher level of cognitive demand as doing mathematics or using procedures with connections to concepts, understanding, or meaning. A national categorization of mathematics competence levels was also provided in the Monitoring and Evaluating Academic Skills Study (MEASS) project. This framework comprises four categories, the last two of which are devoted to students’ higher-order thinking skills (MoNE, 2015). The framework will be introduced during the presentation.

Although challenging tasks can promote students’ HOTS, research has shown that designing worthwhile mathematical tasks is not trivial (Leavy & Hourigan, 2020), and preservice teachers (PTs) often cannot create such tasks (Silver et al., 1996). This is predictable, since they have few opportunities to write tasks in their teacher education programs (Crespo & Sinclair, 2008). Considerably less is known about PTs’ ability to develop mathematical tasks (Crespo, 2003), and there are few studies on how to help PTs recognize and discuss the quality of their mathematical tasks (Crespo & Sinclair, 2008). Thus, professional development (PD) studies must be conducted to increase PMTs’ capacity to develop tasks.

The study’s purpose is to improve the quality of PMTs’ assessment task-writing skills through feedback provided by different agencies such as researchers, peers, and students. It specifically aimed to answer the research question, “How does feedback provided by different agencies improve the quality of assessment tasks developed by PMTs?” This study also aimed to introduce the framework for mathematics competence levels (MoNE, 2015) to a European audience. Feedback was defined by Eggen and Kauchak (2004) as the information that teachers or students receive about the accuracy or relevancy of their work through classroom practices. In this study, the term refers to the information preservice mathematics teachers receive from researchers, students, and their peers about the quality and cognitive levels of their tasks.


Methodology, Methods, Research Instruments or Sources Used
The study’s data were drawn from design-based research that aimed (1) to examine and improve senior preservice middle school mathematics teachers’ (PMTs’) understanding and development of cognitively demanding, quality mathematical tasks for assessing students’ learning, and (2) to develop, test, and revise a conjecture map and professional development sequence serving the first purpose. The research was conducted in an elective course that met weekly for three course hours. Ten fourth-year PMTs enrolled in a four-year middle grades (grades 5-8) mathematics teacher education program at a public university in Türkiye participated in the course. The course consisted of two phases, including several PD activities. In the first phase, PMTs investigated sample tasks and criticized and revised them, considering their quality and cognitive demand. In the second phase, they conducted an independent study in which they developed two cognitively demanding, quality assessment tasks. The development of both tasks was a cyclic process that required revisions based on the researchers’, peers’, and students’ feedback. This study focused on the development process of a contextual task written by one of the preservice teachers (Mert).

The task development process involved four cycles and was based upon an iterative task design cycle suggested by Liljedahl et al. (2007), consisting of predictive analysis, trial, reflective analysis, and adjustment. Our processes emphasized the importance of feedback to re-develop the task and reflect on experiences. Hence, each cycle ended with a new version of the tasks. PMTs wrote the first version of their contextual tasks in the baseline. In task development cycle 1 (TDC1), the tasks were peer-reviewed by pairs of PMTs and criticized during the class discussion regarding their cognitive demand and quality. The researcher provided written feedback on the second version of the tasks in TDC2. In TDC3, PMTs interviewed middle school students using the third version of their task, while in TDC4, they implemented the tasks in real classrooms. They shared students’ thinking with their peers in the PD course after TDC3 and TDC4. They revised their task considering what they noticed about students’ thinking or difficulties and their peers’ feedback and prepared the last versions. Mert’s reflections at the end of each cycle, his reflections on the interviews with students and class implementation, his project report, and the post-interview provided the data for this study.

Conclusions, Expected Outcomes or Findings
Mert developed a cognitively demanding quality assessment task that involved the context in which a car turned around a center of rotation and consisted of two multiple-choice questions in the baseline. The first question asked students to compare the speeds of all wheels shown in the figures. The second question asked to choose the correct interpretation of the ratio of the front-right-wheel to the rear-left-wheel. Mert categorized its cognitive level as the highest level 4 and provided reasonable explanations.

In TDC1, peers criticized and gave feedback on pedagogical and mathematical task qualities such as the task’s clarity, appearance, cognitive level, and mathematical language. Mert changed the figures and the language he used in the second question. He asked students to compare the rear wheels’ speeds instead of comparing the speeds of a front and a rear wheel. However, he thought this second version’s cognitive level was slightly weakened. In TDC2, Mert revised the second question’s options, made changes to its appearance considering the researcher’s feedback, and categorized the task as more qualified. He made radical changes to his task in TDC3. Students’ perspectives guided him to change the figure again and to switch the question type from multiple-choice to open-ended. He completely changed the second question and asked for the difference between the distances traveled by the right and left rear wheels. He also wanted students to support their explanation using algebraic expressions. He did not revise his task after TDC4, since “it was sufficient to be a cognitively demanding quality task” (Mert). In sum, each cycle contributed to the task’s quality. Having the opportunity to enact the task with students, especially in a one-to-one setting, made the greatest contribution to the task’s pedagogical and mathematical quality. Hence, this process revealed the significance of assessing students’ responses to realize the quality of tasks (Norton & Kastberg, 2012).


References
Acknowledgment: The paper was supported by the H2020 project MaTeK, no. 951822.

Crespo, S. (2003). Learning to pose mathematical problems: Exploring changes in preservice teachers’ practices. Educational Studies in Mathematics, 52(3), 243–270.
Crespo, S., & Sinclair, N. (2008). What makes a problem mathematically interesting? Inviting prospective teachers to pose better problems. Journal of Mathematics Teacher Education, 11(5), 395–415.
Hiebert, J. (2003). What research says about the NCTM standards. In J. Kilpatrick, G. Martin, & D. Schifter (Eds.), A research companion to principles and standards for school mathematics (pp. 5–26). Reston, Va.: NCTM.
Jäder, J., Lithner, J., & Sidenvall, J. (2015). A cross-national textbook analysis with a focus on mathematical reasoning–The opportunities to learn. Licentiate thesis, Linköping University.
Jones, K., & Pepin, B. (2016). Research on mathematics teachers as partners in task design. Journal of Mathematics Teacher Education, 19(2), 105–121.
Koh, K. H., Burke, L. E. C., Luke, A., Gong, W. & Tan, C. (2018). Developing the assessment literacy of teachers in Chinese language classrooms: A focus on assessment task design. Language Teaching Research, 22(3), 264–288. https://doi.org/10.1177/13621688166843
Leavy, A., & Hourigan, M. (2020). Posing mathematically worthwhile problems: Developing the problem-posing skills of prospective teachers. Journal of Mathematics Teacher Education, 23(4), 341–361. https://doi.org/10.1007/s10857-018-09425-w
Liljedahl, P., Chernoff, E., & Zazkis, R. (2007). Interweaving mathematics and pedagogy in task design: A tale of one task. Journal of Mathematics Teacher Education, 10(4–6), 239–249.
McDavitt, D. S. (1994). Teaching for understanding: Attaining higher order learning and increased achievement through experiential instruction. Technical Report. Retrieved from https://files.eric.ed.gov/fulltext/ED374093.pdf
Ministry of National Education [MoNE] (2015). Akademik becerilerin izlenmesi ve değerlendirilmesi. Retrieved from https://abide.meb.gov.tr/
Norton, A., & Kastberg, S. (2012). Learning to pose cognitively demanding tasks through letter writing. Journal of Mathematics Teacher Education, 15, 109–130.
Silver, E. A., Mamona-Downs, J., & Leung, S. S. (1996). Posing mathematical problems: An exploratory study. Journal for Research in Mathematics Education, 27, 293–309.
Stevenson, H. W., & Stigler, J. W. (1992). The learning gap: Why our schools are failing and what we can learn from Japanese and Chinese education. NY: Summit Books.
Stiggins, R.J. (1991). Assessment literacy. Phi Delta Kappan, 72, 534−539.  
Tanujaya, B. (2016). Development of an instrument to measure higher order thinking skills in senior high school mathematics instruction. Journal of Education and Practice, 7(21), 144-148.
Vacc, N. (1993). Questioning in the mathematics classroom. Arithmetic Teacher, 41(2), 88–91.


09. Assessment, Evaluation, Testing and Measurement
Paper

Teachers’ Conceptions of Large-scale Assessment: Implications for Assessment Literacy

Serafina Pastore

University of Bari, Italy

Presenting Author: Pastore, Serafina

Against the backdrop of the recent educational data movement (Marsh et al., 2015; Schildkamp et al., 2019), teachers are expected to use different kinds of data to inform their instructional decision-making. However, several studies have demonstrated that teachers are reluctant to change their assessment practices (and conceptions), especially when new practices are framed within the rationale of institutional reforms (Boardman & Woodruf, 2004; Brown, 2004; Klieger, 2016; Remesal, 2007), or in new scenarios such as those that emerged during the COVID-19 pandemic. Despite the recognised importance of assessment, some studies (Hopfenbeck, 2015; Looney et al., 2018) have identified a lack of modernisation and indicated that assessment has not changed materially. Recent studies on the use of assessment data for decision-making and teaching practice have shown that although teachers recognise the importance of using data gathered through assessment, they are sometimes unable to manage several sources of information, including data from large-scale assessments (LSAs) (Farrell & Marsh, 2016; Mandinach & Gummer, 2016; Schildkamp et al., 2014).

While LSAs have progressively been recognised as relevant components of educational accountability systems, teachers’ negative attitudes towards LSA programmes and their lack of assessment literacy have been highlighted (Fullan et al., 2018; Klinger & Rogers, 2011). From this perspective, research evidence (Hopster-den Otter et al., 2017; Monteiro et al., 2021; Schildkamp et al., 2019) suggests that identifying practical assessment challenges for teachers, as well as understanding teachers’ conceptions of assessment, is of paramount importance for ensuring teacher assessment literacy, teacher professionalism, and effective school improvement.

The present paper, focusing on the Italian school system, offers new insights for this debate. Despite increasing interest in researching teachers’ assessment conceptions and in understanding how these conceptions affect the development of assessment literacy, these topics are still neglected in Italy. Therefore, given the current lack of empirical studies on teachers’ LSA conceptions, an exploratory qualitative study was conducted (Creswell, 2014; Strauss & Corbin, 2007).

In Italy, between 2007 and the time of writing (2022), only one LSA programme was adopted. This programme is administered by the Italian National Institute for School Evaluation and Assessment of Learning (INVALSI), which answers to the Ministry of Education. The LSA programme, aligned with the national curriculum, comprises a census-based administration of cognitive tests (grades 2, 5, 8, 10, and 13) in Italian, Mathematics, and English. INVALSI reports examine the quality of the national school system and support school improvement. Since its introduction, however, the national LSA programme has met resistance: teachers have attacked and boycotted it, and even today many continue to perceive the INVALSI programme as a means of control over schools, teachers, and students.

During the COVID-19 pandemic, schools and teachers shifted to remote instruction and experienced difficulties navigating new (and old) mechanisms within their assessment practice (e.g., marking and grading student work online or sharing feedback on assignments). In the school year 2019-2020, the INVALSI programme was not administered. Teachers’ (and students’) positive reactions to the cancellation of the yearly INVALSI programme contrast with the need for assessment that is embedded within the school system and aligned with the aim of improving school quality (Wiliam, 2013).

The study sought to better understand how Italian teachers conceptualise the LSA programme and how they use its results, addressing the following research questions:

  1. What do teachers think of the INVALSI programme?
  2. How do teachers use the INVALSI results for their instructional practice and decision-making (at classroom and school level)?

Methodology, Methods, Research Instruments or Sources Used
The present study was guided by the grounded theory interpretative method (Strauss & Corbin, 2007).
A total of 70 teachers from 5 schools in the district of XXXX (details removed to avoid identification) were selected to participate in the study. These schools share the same organization and jointly cover grades 1-5 (primary) and grades 6-8 (middle). Only teachers of Italian and Mathematics were considered because the INVALSI tests pertain to these two content domains.

Data were collected through semi-structured interviews by the author.
Drawing on relevant theoretical and empirical literature, questions were designed about teachers’ conceptions of the LSA programme, their experience with INVALSI data, and their instructional responses to data. The resulting semi-structured interview protocol comprised 10 questions divided into two main sections:
1. Assessment conceptions: Questions in this section sought information on the teachers’ conceptions of LSA, its aims, and values; and
2. Data usage: This section aimed to analyse if, and how, teachers use large-scale data in their instructional practice and decision-making.
Moreover, during the interview, information on teacher education courses attended on educational assessment and data on socio-demographic variables (e.g., gender, age, years of service) were gathered.
The data analysis followed a three-step process: open coding, axial coding, and selective coding.

The data set for this study is large, so only a selection of the main inquiry categories is presented here:
1. There are no substantial differences in teachers’ conceptions of assessment: gender, age, and subject matter do not affect answers. The slight differences found in the conceptions of interviewed teachers are related only to their years of service.
2. Even though participants were prompted to reflect on their answers, the data demonstrate their simplistic conceptions of assessment.
3. Teachers are unable to provide a definition of assessment that goes beyond the mere measurement of student learning. They appear very worried about large-scale assessment: they do not see its real value and fear that students’ results could be used for teachers’ performance appraisal and selection. For this reason, most of them admit to teaching to the test and cheating, although they recognize these as malpractices.
4. Data relating to the fourth section of the interview reveal a composite scenario. Classroom assessment is frequently performed in a formal way, and teachers tend not to use large-scale results to review and/or change their instructional practices.

Conclusions, Expected Outcomes or Findings
The relationships among LSA, the teaching-learning process, and the Italian school system are ambiguous and incoherent. While LSA is perceived as disconnected from school and teaching practice, classroom-based assessments are considered not entirely reliable even though they provide more information about student learning processes. The teachers in this study admitted their assessment illiteracy with respect to some practical aspects (e.g., how to gather valid and robust data in summative assessment). They said that they were not able to read, interpret, understand, and use the data gathered through the INVALSI programme. The major hindrance is the teachers’ conceptions of the LSA programme, which is rarely used to refocus and improve teaching for individual students (Herman, 2016). Even though national LSA programmes have spread widely across countries (Verger et al., 2019), research evidence shows that such assessments are sometimes perceived as a threat to teachers’ practice and professionalism (Emler et al., 2019).
In the Italian school system, there is an urgent need to invest in teachers’ assessment literacy and evaluation culture (Emler et al., 2019; Klinger & Rogers, 2011). The challenge is to equip teachers with this knowledge and to push the use of assessments forward in a more responsive manner. Teachers’ negative conceptions of LSA and assessment illiteracy can lead to the inappropriate use of INVALSI results over time; it is not surprising that the positive effects of LSA are absent and were not perceived by the interviewed teachers (Cizek, 2001).

Despite the recognition of assessment as a relevant component of teacher professionalism, assessment literacy paths are not responsive to teacher learning needs in this area. The increased relevance of data represents a challenge for teachers in terms of data use, decision-making, and public reporting.

References
Boardman, A. G., & Woodruff, A. L. (2004). Teacher change and “high stakes” assessment: What happens to professional development? Teaching and Teacher Education 20(6): 545-557.
Brown, G. T. L. (2004). Teachers’ conceptions of assessment: Implications for policy and professional development. Assessment in Education 11(3): 301-318. doi:10.1080/0969594042000304609.
Cizek, G. J. (2001). More unintended consequences of high-stakes testing. Educational Measurement: Issues and Practice 20(4): 19-27. doi: 10.1111/j.1745-3992.2001.tb00072.x.
Corbin, J., & Strauss, A. (2007). Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory (3rd ed.). Thousand Oaks, CA: Sage.
Creswell, J. W. (2014). Research Design: Qualitative, Quantitative and Mixed Methods Approaches (4th ed.). Thousand Oaks, CA: Sage.
Emler, T. E., Zhao, Y., Deng, Z., Yin, D., & Wang, Y. (2019). Side effects of large-scale assessments in education. Review of Education 2(3): 279-296. doi: 10.1177/2096531119878964.
Farrell, C. C., & Marsh, J. A. (2016). Contributing conditions: A qualitative comparative analysis of teachers’ instructional responses to data. Teaching and Teacher Education 60(1): 398-412.
Hopster-den Otter, D., Wools, S., Eggen, T. J. H. M., & Veldkamp, B. P. (2017). Formative use of test results: A user’s perspective. Studies in Educational Evaluation 52: 12-23. doi: 10.1016/j.stueduc.2016.11.002.
Klieger, A. (2016). Principals and teachers: Different perceptions of large-scale assessment. International Journal of Educational Research, 75: 143-145. doi: 10.1016/j.ijer.2015.11.006.
Looney, A., Cumming, J., van Der Kleij, F., & Harris, K. (2018). Reconceptualizing the role of teachers as assessors: Teacher assessment identity. Assessment in Education: Principles, Policy & Practice 25(5): 442-467.
Mandinach, E. B., &. Gummer, E. S. (2016). Data Literacy for Teachers: Making it Count in Teacher Preparation and Practice. New York, NY: Teachers College Press.
Marsh, J. A., Bertrand, M., Huguet, A. (2015). Using data to alter instructional practice: The mediating role of coaches and professional learning communities. Teachers College Record 117(4): 1-41.
Remesal, A. (2007). Educational reform and primary and secondary teachers’ conceptions of assessment: The Spanish instance, building upon Black and Wiliam (2005). The Curriculum Journal 18(1): 27-38. doi:10.1080/09585170701292133.
Schildkamp, K. (2019). Data-based decision-making for school improvement: Research insights and gaps. Educational Research 61: 257-273.
Schildkamp, K., Karbautzki, L., & Vanhoof, J. (2014). Exploring data use practices around Europe: Identifying enablers and barriers. Studies in Educational Evaluation, 42: 15-24.


09. Assessment, Evaluation, Testing and Measurement
Paper

Becoming Assessment Literate – Enhancing Teacher Students’ Assessment Literacy

Johanna Kainulainen, Mirja Tarnanen

University of Jyväskylä, Finland

Presenting Author: Kainulainen, Johanna

Teacher education plays a significant role in developing assessment culture and practices as well as teachers' assessment literacy (e.g., Xu & Brown, 2016; Atjonen et al., 2019). Assessment literacy (AL) comprises, among other things, an individual’s and a community’s awareness of the values and principles related to assessment, as well as knowledge of the purpose of assessment, assessment targets, strategies, and assessment practices (e.g., Atjonen, 2017; Xu & Brown, 2016). It is based on an understanding of learning processes and environments, of the impact of the individual and the community on them, and of the goals set for learning processes, along with the monitoring of their realization throughout the entire learning process (e.g., Atjonen, 2017). When assessment is considered as support for learning and its importance for learning is recognized, teachers need to deepen their understanding of the purpose of assessment and its implementation, i.e., to be assessment literate (DeLuca & Braund, 2019).

AL can be viewed as both a teacher's and a learner's competence. Learners' AL is reflected, among other things, in the skills of receiving and utilizing feedback and in the learner's activity in developing their own competence (Atjonen et al., 2019). Assessment should therefore be an interactive process in which the teacher guides and supports not only learning but also the development of the learner's assessment skills, i.e., their learning skills. Teacher education can prepare teacher students for their future work by focusing on assessment that, on the one hand, develops their assessment skills as students, such as self-assessment skills, and, on the other hand, provides the ability to plan, implement, and develop assessment as teachers (Boud et al., 2013; Kearney & Perkins, 2014). Teacher education plays an important role in building the foundations of a teacher’s AL and a basis for continuous development throughout a teacher's career (DeLuca & Braund, 2019).

The shaping of teachers’ AL is comprehensively described by the TALiP framework (Teacher Assessment Literacy in Practice; Xu & Brown, 2016). According to the framework, a teacher's AL can be thought of as complex identity work in which the teacher builds their assessment conceptions on the basis of diverse assessment knowledge and pedagogical content knowledge, which in turn is influenced by, e.g., curricula. Teachers’ perceptions of assessment are shaped by the teacher's perception of learning as well as by cognitive and affective dimensions. Assessment conceptions are enacted as assessment practices and are influenced by, for example, macro-level socio-cultural conditions related to the school system and, at the micro level, by the traditions and customs of the school or work community. According to the framework, teachers’ own assessment practices are ultimately the result of a number of compromises that the teacher has to make between their own knowledge, perceptions, and beliefs and external pressures. The teacher’s assessment identity is built through the teacher’s experiences, learning attitude and reflection, and inclusion and interaction. At its best, it develops throughout the teacher’s professional career (Xu & Brown, 2016; DeLuca & Braund, 2019).

In this study, we explore how assessment literacy (AL) of teacher students is constructed through self- and group-assessments and self- and group-reflections. The research questions are

RQ1: What kinds of challenges and opportunities do teacher students identify when reflecting on the development of their assessment literacy?

RQ2: What do teacher students consider to have the most significant impact on the development of their assessment literacy?


Methodology, Methods, Research Instruments or Sources Used
The data was collected in the context of an experimental research-based holistic learning unit within the master's studies in special education teacher training and in elementary school teacher training. The learning unit lasted 8 weeks and consisted of an intensive teaching period at the university, independent and group learning tasks, and a five-week training period in a school context, where teacher students (N=18) worked together in multi-professional teams. The teams can be called multi-professional because, in Finland, elementary school teacher students and special education teacher students study in separate master's programmes and have different eligibility criteria under Finnish educational legislation.

The data consists of elementary school teacher students' and special education teacher students' self-reflections and self-assessments during the learning process in the learning unit and teacher-student teams’ interview data and group reflections collected at the end of the learning unit.  

In this presentation, we examine teacher students' reflections on their assessment literacy using qualitative data-driven and theory-informed content analysis (Vaismoradi et al., 2016). The data were thematised to capture both explicit and underlying reflections on assessment literacy and its development, as well as reflections on assessment and evaluation in general. The analysis consists of three interactive sub-processes: generating initial codes based on qualitative content analysis, reviewing themes, and naming the main categories. The analysis can also be characterized as theory-informed because part of the data consisted of reflections in which students were asked to reflect on their own assessment literacy by mirroring it against the TALiP framework (Xu & Brown, 2016).

Conclusions, Expected Outcomes or Findings
The preliminary findings indicate that teacher students recognize the necessity of continuous reflection and are ready to reflect on and develop their assessment literacy from multiple angles in the future as well. They are aware of the multidimensionality, importance, and challenges of assessment in guiding learning and in instructional support (see Hamre et al., 2007). As challenges, the students highlighted, for example, the lack of experience and insufficient preparation during the teacher education programme, the comprehensive demands of the teacher's work and pedagogical expertise, and especially the influence of their own beliefs and experiences when considering assessment. On the other hand, the students' reflections revealed a belief in development and a desire to constantly develop their assessment literacy both individually and collaboratively in school communities. They strongly emphasized the importance of reflection in building their own competence and identifying development needs, as well as in developing their understanding and knowledge base of assessment.
Teacher students consider holistic and participatory ways of learning about assessment (as was done in the learning unit), their own active knowledge building, and continuous reflection and questioning of their own beliefs to be the most significant for developing their assessment literacy.

On the basis of the results, it may be concluded that teacher education should offer teacher students opportunities to build their assessment literacy consciously throughout their studies and provide comprehensive support for its development in authentic environments.

In our presentation, we will also discuss how assessment literacy could be more closely bridged with teacher students’ learning processes and professional development and what kinds of pedagogical practices would be relevant for developing assessment literacy in teacher education. - Most of the data will be analyzed in spring 2023 and the results will be discussed based on that.

References
Atjonen, P. (2017). Arviointiosaamisen kehittäminen yleissivistävän koulun opettajien koulutuksessa – Opetussuunnitelmatarkastelun virittämiä näkemyksiä [Developing assessment literacy in general education school teacher training - Insights from curriculum review]. Teoksessa V. Britschgi & J. Rautopuro (toim.) Kriteerit puntarissa. Suomen kasvatustieteellinen seura. Kasvatusalan tutkimuksia 74, 132–169.

Atjonen, P., Laivamaa, H., Levonen, A., Orell, S., Saari, M., Sulonen, K., Tamm, M., Kamppi, P., Rumpu, N., Hietala, R. & Immonen, J. (2019). ”Että tietää missä on menossa”. Oppimisen ja osaamisen arviointi perusopetuksessa ja lukiokoulutuksessa [Assessment of learning and competence in basic education and upper secondary education]. Kansallinen koulutuksen arviointikeskus, Julkaisut 7:2019.

Boud, D., Lawson, R. & Thompson, D. G. (2013). Does student engagement in self-assessment calibrate their judgement over time? Assessment & Evaluation in Higher Education, 38:8, 941–956.

DeLuca, C. & Braund, H. (2019). Preparing Assessment Literate Teachers. Oxford Research Encyclopedias of Education.

Hamre, B. K., Pianta, R. C., Mashburn, A. J., & Downer, J. T. (2007). Building a science of classrooms: Application of the CLASS framework in over 4,000 U.S. early childhood and elementary classrooms. New York: Foundation for Child Development.

Kearney, S. P. & Perkins, T. (2014). Engaging students through assessment: the success and limitations of the ASPAL (Authentic Self and Peer assessment for Learning) model. Journal of University Teaching & Learning Practice, 11 (3), 2014.

Vaismoradi M, Jones J, Turunen H, and Snelgrove, S. (2016). Theme development in qualitative content analysis and thematic analysis. Journal of Nursing Education and Practice 6(5), 100–110.

Xu, Y. & Brown, G. T. L. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education. Vol. 58, Aug. 2016, 149–162.
 
5:15pm - 6:45pm  24 SES 08 B JS: Assessment and Curriculum Reforms: Understanding Impacts and Enhancing Assessment Literacy
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Sarah Howie
Joint Paper Session NW 09 and NW 24

Full information in the programme under 09 SES 08 A JS (in ConfTool, set the filter to Network 09).
Date: Thursday, 24/Aug/2023
9:00am - 10:30am  09 SES 09 A: Bridging Research and Practice in Reading Literacy Interventions: Insights and Applications
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Lisa Palmqvist
Paper Session
 
09. Assessment, Evaluation, Testing and Measurement
Paper

The relevance of Rapid Automatized Naming (RAN) in research versus practice

Malena Avall

University of Gothenburg, Sweden

Presenting Author: Avall, Malena

Rapid Automatized Naming (RAN; a measure of the ability to name aloud objects, colours, digits, or letters) and phonological awareness are two reading-related measures shown to predict early reading ability strongly and reliably (e.g., Moll et al., 2014; Caravolas et al., 2019). However, the relevance and predictive power of each individual measure is still under debate. This study focuses on the influence of RAN on early reading ability, problematizing the extent to which RAN contributes to assessments aiming to predict reading ability. The main aim, however, is to problematize how this knowledge is best put into practice: when is RAN relevant to use as an indicator of reading ability, and when are other indicators more relevant?

Because of efforts to identify, at an early stage, children who are at risk of reading difficulties, children are screened in school for their reading ability. But screening children is both time-consuming and costly, so it must be well thought out what is being screened for, how results are to be interpreted, and how they are to be used and implemented in the school's operations. Given that children's time in school is limited, any activity that focuses on reading achievement must in one way or another be based on knowledge established to improve reading. Hence, the time it takes to screen children needs to be balanced against the time it takes away from teaching.

In previous research, RAN is claimed to be a measure of phonological processing time and reflects how fast representations can be retrieved from long-term memory (Bowey et al., 2005; Torgesen, et al., 1997). Further it has been debated whether RAN and phonological awareness each contribute unique information to early reading, or if the measures will be merely two ways of measuring one ability, phonological processing. For example, Chiappe et al. (2002) found that most of the variance contributed by RAN to reading ability is shared with phonological awareness. Further, and in line with the understanding of RAN and phonological awareness being two sides of the same coin, it is claimed by Ziegler et al. (2010) that RAN will only become the dominant predictor when phonological awareness tasks are not challenging enough.

However, another view is that RAN and phonological awareness are two distinct measures predicting early reading ability (Torppa et al., 2013). In a cross-sectional study de Groot et al., (2015) compared reading disabled children with more skilled readers and found that for the reading disabled children the combination of RAN and phonological awareness showed the highest predictive values. When comparing the effects of phonological awareness and RAN on reading ability phonological awareness appears to be the best predictor of reading disability whilst RAN is indicated to be the best predictor of above-average to excellent reading ability (de Groot et al., 2015).

Other longitudinal studies show that the predictive power of RAN and phonological awareness on reading appears to change with age (e.g., Kirby et al., 2003; Vaessen & Blomert, 2010), which some researchers hypothesize is connected to the reading strategy used (Vaessen & Blomert, 2010; Rodriguez et al., 2015). Reading development is assumed to shift from slow sequential phonological decoding to automatic orthographic processing (Ehri, 2005).

In order to investigate the relevance of RAN in reading assessment, the present study measured RAN repeatedly among a group of children who were followed from kindergarten through their time in elementary school.


Methodology, Methods, Research Instruments or Sources Used
Method
In this longitudinal study, 364 children were recruited from 45 preschools in 8 different municipalities. The children were followed between ages 4 and 15. RAN was measured using three different stimuli: objects, digits, and letters. RAN-objects was measured between ages 4 and 15; RAN-letters and RAN-digits were measured between ages 8 and 15.
Word reading and reading fluency were examined. Word reading was measured between ages 8 and 15 by two different tests. In the word chain test, three words are printed without inter-word spaces and the task is to mark the correct inter-word boundaries with vertical lines; the test is timed. The second test was a word reading list: the task is to read aloud as many printed real words as possible within 60 s. Words were presented in vertical lists and were not graded by difficulty. The test was specially developed for this study, and the number of correctly read words after 30 seconds was recorded.
Reading fluency was measured twice, when the children were 8 years old and when they were 10 years old. At both times, the child read aloud a narrative text consisting of words of varying complexity regarding, for example, clusters and phoneme/grapheme correspondence. Reading rate was recorded.
The main analytic method used in this study is regression analysis. RAN performance will be regressed on reading ability at different ages and differentiated by level of performance. Both word reading and reading fluency will be taken into account.

Conclusions, Expected Outcomes or Findings
Results
In the current study, the preliminary results suggest that while children are still learning to read, RAN predicts both word reading and reading fluency. For children slow at RAN, this appears to hold even as they get older. For children performing well on RAN, however, RAN appears to become more relevant as they get older, even though it is significant from the beginning.
Thus, in line with previous research, the preliminary results suggest that the predictive power of RAN on reading achievement changes as children get older. Further, it can be assumed that children’s reading development is important when interpreting the results, which might also apply to the reading measure used. The relevance of RAN in reading assessments will be discussed.

References
Bowey, J. A., McGuigan, M., & Ruschena, A. (2005). On the Association between Serial Naming Speed for Letters and Digits and Word-Reading Skill: Towards a Developmental Account. Journal of Research in Reading, 28(4), 400-422.
Caravolas, M., Lervåg, A., Mikulajová, M., Defior, S., Seidlová-Málková, G., & Hulme, C. (2019). A Cross-Linguistic, Longitudinal Study of the Foundations of Decoding and Reading Comprehension Ability. Scientific Studies of Reading, 23(5), 386-402. doi:10.1080/10888438.2019.1580284
Chiappe, P., Stringer, R., Siegel, L. S., & Stanovich, K. E. (2002). Why the timing deficit hypothesis does not explain reading disability in adults. Reading and Writing: An Interdisciplinary Journal, 15, 73-107.
de Groot, B. J. A., van den Bos, K. P., Minnaert, A. E. M. G., & van der Meulen, B. F. (2015). Phonological Processing and Word Reading in Typically Developing and Reading Disabled Children: Severity Matters. Scientific Studies of Reading, 19(2), 166-181. doi:10.1080/10888438.2014.973028
Ehri, L. C. (2005). Learning to Read Words: Theory, Findings, and Issues. Scientific Studies of Reading, 9(2), 167-188. doi:10.1207/s1532799xssr0902_4
Kirby, J., Parrila, R., & Pfeiffer, S. (2003). Naming speed and phonological awareness as predictors of reading development. Journal of Educational Psychology, 95, 453–464.
Moll, K., Ramus, F., Bartling, J., Bruder, J., Kunze, S., Neuhoff, N., Streiftau, S., Lyytinen, H.,. Leppänen, P. H.T, Lohvansuu, K., Tóth, D., Honbolygó, F., Csépe, V., Bogliotti, C., Iannuzzi, S., Démonet, J. F., Longeras, E., Valdois, S., George, F., . . . Landerl, K. (2014). Cognitive mechanisms underlying reading and spelling development in five European orthographies. Learning and Instruction, 29, 65-77. doi:http://dx.doi.org/10.1016/j.learninstruc.2013.09.003
Rodriguez, C., van den Boer, M., Jimenez, J. E., & de Jong, P. F. (2015). Developmental Changes in the Relations between RAN, Phonological Awareness, and Reading in Spanish Children. Scientific Studies of Reading, 19(4), 273-288.
Torgesen, J. K., Wagner, R. K., Rashotte, C. A., Burgess, S., & Hecht, S. (1997). Contributions of Phonological Awareness and Rapid Automatic Naming Ability to the Growth of Word-Reading Skills in Second-to Fifth-Grade Children. Scientific Studies of Reading, 1(2), 161.
Torppa, M., Parrila, R., Niemi, P., Poikkeus, A.-M., Lerkkanen, M.-K., & Nurmi, J.-E. (2013). The double deficit hypothesis in the transparent Finnish orthography: A longitudinal study from kindergarten to Grade 2. Reading and Writing, 26, 1353–1380. doi:10.1007/s11145-012-9423-2
Vaessen, A., & Blomert, L. (2010). Long-term cognitive dynamics of fluent reading development. Journal of Experimental Child Psychology, 105(3), 213-231. doi:http://dx.doi.org/10.1016/j.jecp.2009.11.005
Ziegler, J. C., Bertrand, D., Tóth, D., Csépe, V., Reis, A., Faísca, L., Saine, N., Lyytinen, H., Vaessen, A., & Blomert, L. (2010). Orthographic Depth and Its Impact on Universal Predictors of Reading: A Cross-Language Investigation. Psychological Science, 21(4), 551-559. doi:10.1177/0956797610363406


09. Assessment, Evaluation, Testing and Measurement
Paper

Early Phonological Intervention: A Ten Year Follow-up

Ulrika Wolff, Jan-Eric Gustafsson

University of Gothenburg, Sweden

Presenting Author: Wolff, Ulrika

An abundance of research has established that phonological awareness skills are important prerequisites for early reading acquisition (for a review, see Melby-Lervåg et al., 2012). Early development of phonological awareness implies that a child moves from implicit to explicit control of the sound structure of language, and this explicit control is critical when a child learns to understand and handle the alphabetic principle (e.g., Caravolas et al., 2013; Lundberg et al., 2010). Accordingly, there has been a long tradition of research on phonological training to prevent failure to acquire reading skills. Early examples of such studies are Bradley and Bryant (1983) and Lundberg, Frost and Petersen (1988), and results from later training studies have been summarized in several meta-analyses (e.g., Bus & Van Ijzendoorn, 1999; National Early Literacy Panel, 2008). However, even though the results of the training studies show positive effects, Torgesen (2000) found that around two to six percent of the participants in phonological interventions could be defined as “treatment resisters”. Thus, some children do not grasp the idea of phonemes as discrete entities, and they do not seem to enhance their phonological skills to an acceptable level through the training.

Most phonological interventions have been carried out in combination with, or just before, formal reading instruction starts, and studies have typically investigated development over short periods (Kjeldsen et al., 2019). In the present study phonological awareness training was carried out when children were four and five years old (school starts at age 7 in Sweden). The intention was to begin the study at this early stage when children’s explicit awareness of the structure of speech starts to emerge (Wolff & Gustafsson, 2015; Dodd & Gillon, 2001). The training addressed different aspects of phonological awareness, gradually moving from games and exercises with morphemes and syllables to phonemes. Explicit training of phoneme/grapheme mapping (National Reading Panel, 2000) was later introduced when children were six years old, one year before formal reading instruction started. This training was given to all children regardless of whether they belonged to the experimental or control group. Thus, since everyone received the six-year-old training, the potential effects in this study are derived from the early training at ages 4 and 5.

General fluid intelligence (Gf) is a core concept in the field of intelligence. It is interpreted as the capacity to solve novel, complex problems. Gf is highly correlated with phonological awareness in 4-year-old children (Wolff & Gustafsson, 2015), and both phonology and Gf have been found to relate to early reading ability. de Jong & van der Leij (1999) found that when Gf was controlled for, the relation between phonology and reading decreased, and the direct effect of Gf on reading decreased over time. These findings support the hypothesis that the influence of Gf on early reading skills is mediated through the development of phonological awareness. Thus, we may expect that children with high Gf typically will have a more favorable development of phonological awareness skills. One important question here is if the phonological training will decrease or increase this putative influence.

The research questions are: 1) Does structured phonological awareness training starting at the age of 4 affect reading-related skills ten years later, in grade 8? and 2) Are there differential effects of phonological awareness training as a function of children’s cognitive abilities? The present study thus aims to extend the rich knowledge of the effects of preventive phonological interventions preceding reading instruction. To our knowledge, very few previous studies have investigated the long-term effects of phonological awareness training over a ten-year period.


Methodology, Methods, Research Instruments or Sources Used
The participants (N=364) were recruited from 58 preschools in 8 municipalities. The participating preschools were situated in rural as well as urban regions, approximately representative of the Swedish population. Also, non-native Swedish speaking children (n=38) were included. The preschools were to have at least three children who could form a group, and who were between 3 and 10 months and 4 years 4 months old. The preschool groups were randomly assigned to an experimental group (n=138) or a control group (n=226). The groups comprised three to six children. If there were two groups at the same preschool, both were assigned to the same condition. The experimental group received phonological awareness training for six weeks at the age of 4 and for six weeks at the age of 5. Before the intervention at age 4 (t1), a pre-test was given assessing Gf and phonological awareness; four, five, and ten years later, in grade 2 (t2), grade 3 (t3), and grade 8 (t4), reading-related skills were assessed. Informed consent was obtained from all parents before t1.
The method applied in the current study is Structural Equation Modeling (SEM), and the models will be estimated with the Mplus 7.4 program (Muthén & Muthén, 1998–2012). The analyses will investigate direct and indirect effects of early phonological training. There are some obvious advantages of using SEM in the present study: it allows for the estimation of relations between multiple dependent variables, for reciprocal and indirect effects, and for the use of manifest and latent variables in the same model. The models will be estimated with the Robust Maximum Likelihood (MLR) estimator in Mplus. In order to take the cluster-sampling design of the study into account, the so-called ‘complex’ option in Mplus will be used to obtain cluster-robust estimates of standard errors. Chi-square, the Root Mean Square Error of Approximation (RMSEA) with confidence intervals, the Comparative Fit Index (CFI), and the Standardised Root Mean Square Residual (SRMR) will be reported.
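The fit indices listed above are deterministic functions of the model and baseline chi-square statistics. As a minimal illustration, the standard RMSEA and CFI formulas can be sketched as follows; the input values are hypothetical, not fit statistics from this study:

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation for a model with
    chi-square chi2, degrees of freedom df, and sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index: target model (_m) against the
    baseline (independence) model (_b)."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    return 1.0 - num / den if den > 0 else 1.0

# Hypothetical fit statistics for illustration only:
print(round(rmsea(85.0, 40, 364), 3))   # close fit when < .06 by common rules of thumb
print(round(cfi(85.0, 40, 900.0, 55), 3))
```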

Conclusions, Expected Outcomes or Findings
Previous findings in the project (Wolff & Gustafsson, 2022) demonstrated that early phonological awareness training preceding the ordinary kindergarten training improves children’s further development of phonological skills. Further, the training affected all the reading-related measures in grades 2 and 3 (effect sizes ranging from d = 0.37 to 0.54) and proved particularly beneficial for at-risk children. Bearing in mind the phonological training for all children at age 6, these effects five and six years after training are impressive.
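Effect sizes such as the d values above are standardized mean differences. A minimal sketch of Cohen’s d with a pooled standard deviation, using hypothetical group summaries rather than the study’s data:

```python
import math

def cohens_d(m1, m2, sd1, sd2, n1, n2):
    """Standardized mean difference between two groups,
    using the pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Hypothetical means/SDs for experimental (n=138) and control (n=226) groups:
print(round(cohens_d(14.2, 12.1, 5.0, 5.5, 138, 226), 2))
```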
The data files for the recently collected grade 8 data are not yet completely cleaned and organized. Still, the effects of the early phonological skills on reading in grade 8 were investigated in a preliminary analysis using SEM (Muthén & Muthén, 1998–2012). The assumption in the present study is that the effect of Gf on early reading is mediated through phonological awareness. Thus, phonological awareness was regressed on Gf, and reading-related skills in grade 8 were regressed on phonological awareness. A manifest variable representing group assignment was related to the reading measures. There was an effect of the early phonological training on a latent measure reflecting reading-related tasks in grade 8 (es = .40). When scrutinizing the effects on the manifest reading-related measures, there was an effect of training on word decoding (es = .25) and reading comprehension (es = .42), whereas there was no significant effect on spelling.
For the current presentation the model will be extended. Reading-related measures in grades 2 and 3 will be included; thus, most of the training effects on grade 8 reading are expected to be indirect, through grade 2 and 3 reading. Direct and indirect effects of Gf and phonological awareness will be investigated. Further, interaction effects between group assignment on the one hand and Gf and phonological awareness, respectively, on the other will be estimated.

References
Bradley, L. & Bryant, P. (1983). Categorizing sounds and learning to read – a causal connection. Nature, 301, 419-421.
Bus, A.G., & Van Ijzendoorn, M.H. (1999). Phonological Awareness and Early Reading: A Meta-Analysis of Experimental Training Studies. Journal of Educational Psychology, 91, 403-414.
Caravolas, M., Lervåg, A., Defior, S., Málková, G.S., & Hulme, C. (2013). Different patterns, but equivalent predictors, of growth in reading in consistent and inconsistent orthographies. Psychological Science, 24, 1398-1407. https://doi.org/10.1177/0956797612473122
De Jong, P.F., & Van der Leij, A. (1999). Specific contributions of phonological abilities to early reading acquisition: Results from a Dutch latent variable longitudinal study. Journal of Educational Psychology, 91, 450-476.
Dodd, B. & Gillon, G. (2001) Exploring the relationship between phonological awareness, speech impairment, and literacy. Advances in Speech and Language Pathology, 3, 139-147.
Kjeldsen, A. C., Saarento-Zaprudin, S., & Niemi, P. (2019). Kindergarten training in phonological awareness: Fluency and comprehension gains are greatest for readers-at-risk through grades 1 to 9. Journal of Learning Disabilities, 5, 366–382. https://doi.org/10.1177/0022219419847154
Lundberg, I., Frost, J. & Petersen, O. (1988). Effects of an extensive program for stimulating phonological awareness in pre-school children. Reading Research Quarterly, 23, 263-284.
Lundberg, I., Larsman, P. & Strid, A. (2010). Development of phonological awareness during the preschool year: the influence of gender and socio-economic status. Reading and Writing: An Interdisciplinary Journal, 25, 305-320.
Melby-Lervåg, M., Lyster, S-A. H. & Hulme, C. (2012). Phonological skills and their role in learning to read: A meta-analytic review. Psychological Bulletin, 138, 322-352.
Muthén, L. K. & Muthén, B. O. (2012). Mplus User’s Guide. Statistical Analysis with Latent Variables. Version 7. Los Angeles, CA: Muthén & Muthén.
National Early Literacy Panel. (2008). Developing early literacy: Report of the National Early Literacy Panel. Washington, DC: National Institute for Literacy.
National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Washington DC: National Institutes of Child Health and Human Development.
Torgesen, J.K. (2000). Individual differences in response to early intervention in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.
Wolff, U. & Gustafsson, J.-E. (2015). Structure of phonological ability at age four. Intelligence, 53, 108-117. https://doi.org/10.1016/j.intell.2015.09.003
Wolff, U. & Gustafsson, J.-E. (2022). Early phonological training preceding kindergarten training: Effects on reading and spelling. Reading and Writing, 35, 1865–1887. https://doi.org/10.1007/s11145-022-10261-x


09. Assessment, Evaluation, Testing and Measurement
Paper

Digital Inclusive Reading Support Evolving Through Practice To Research Transfer

Ralf Junger1, Judith Hanke2, Kirsten Diehl2

1Leipzig University, Germany; 2University of Flensburg, Germany

Presenting Author: Junger, Ralf; Hanke, Judith

A high percentage of students have considerable reading deficits, not only in Germany but also in other European countries (Betthäuser et al. 2023; Wallner-Paschon et al. 2017). Formative diagnostics are therefore essential to record students’ current learning levels so that specific interventions can be shaped at an early stage. This discourse has intensified in Germany as a result of the significant increase in heterogeneity at primary schools and the disconcerting learning deficits of a large number of primary school children after the COVID-19 pandemic (Stanat et al. 2022). As a result, the German Standing Scientific Commission of the Standing Conference of the Ministers of Education and Cultural Affairs of the Federal States recommends the early intensification of nationwide diagnostics and the "provision of scientifically based, quality-assured diagnostic instruments and related support instruments" as formative diagnostics to ensure basic competencies (Köller et al. 2022, p. 74). Additionally, the Progress in International Reading Literacy Study (PIRLS) has not only been conducted as a computer-based assessment since 2016 but also focuses on digital forms of reading (Mullis & Martin 2019).

In the collaborative BMBF-funded project DaF-L (digital everyday integrated supportive diagnostics – reading in inclusive education), an adaptive, digital, and competence-oriented reading screening with adapted reading packages, consisting of reading texts and reading exercises, is being developed, tested, standardized, and subsequently made available as OER on the online learning application Levumi for the third grade of primary schools in Germany. For the project partners from four German universities, an essential component in the development and improvement of digital applications is the cooperation between practice and research. Interviews with experts will be conducted in order to advance the reading packages as well as Levumi. The goal is to ensure a direct transfer of research to practice by examining the ecological validity and usability of the instrument as well as the professional development of the partners in the schools with regard to this unique form of the adaptive approach. Following this, the objective is to enable low-threshold, data-based, and effective reading support, to identify conditions for the success of everyday support-based diagnostics, and to improve the conditions for inclusive education in primary schools.

The reading packages for promoting reading comprehension in inclusive classes contain reading texts on three ability levels and reading tasks tailored to them. The reading packages are intended to promote the reading skill of fluent reading and the reading abilities of reading comprehension and strategies for reading comprehension. The ability to read fluently implies that the students can read “quietly, aloud, automatically, accurately, meaningfully, and quickly” (KMK 2020, p. 16). The students have to read texts and solve reading exercises; they therefore read both repeatedly (Mayer 2018) and a lot (Kruse et al. 2015), thus promoting reading fluency. In the case of reading comprehension, the students read texts that correspond to their ability level and understand their meaning. The skills involve students identifying textual information at the local level, either explicitly stated or gleaned through simple inferencing. In doing so, they also pay attention to linguistic means to establish the context of the text, link text information, draw conclusions, and construct an overall understanding using their previous knowledge (KMK 2020). In the case of strategies for reading comprehension, or reading strategies, the students know how to use basic cognitive and metacognitive reading strategies after reading. They work with after-reading strategies such as identifying central text statements (KMK 2020).

The global question of how students can benefit from digital reading packages and how the usability of the application can be improved through collaboration between educators and researchers will be explored.


Methodology, Methods, Research Instruments or Sources Used
The collaborative project follows a multi-method design.
An ABA design was selected for the intervention study. The study will run from April 2023 until July 2023 and will collect quantitative data on individuals, groups, and classes. It will consist of a survey group (N = 100) and a control group (N = 100).
A) It will start with the interviews and the initial testing. The testing consists of the self-developed digital and competence-oriented reading screening and the ELFE 2, which is an established diagnostic test.
B) Three weeks later, in the first lesson the students will take a self-developed digital a-version test aligned with the reading packages. This will be the start of the four-week intervention phase. The intervention (reading support) will be three times a week for 30 minutes in a classroom setting. The students will work on their digital reading exercises individually. Students will receive a reading package based on their ability level. During the intervention students’ answers will be saved digitally. At the end of the intervention, students will participate in the b-version of the aligned test as well as a second administration of the competence-oriented reading screening and the ELFE 2.
A) A follow-up will be conducted with the ELFE 2.

To ensure a direct practice-research transfer, the expert interviews are planned in a qualitative longitudinal design. Through the processual character of the design, the focus on the stakeholders' perspective is intended to improve usability and thus to increase ecological validity. Concurrently, the understanding of diagnosis and diagnostic practices will be examined. For this purpose, qualitative semi-structured expert interviews will be conducted at three different times to accompany the further development of Levumi in consultation with school practitioners. The following research questions will be pursued in the interviews:
1. How do educators rate the usability of Levumi before the redevelopment?
2. What changes would be beneficial from the educators' perspective to increase the usability of Levumi?
3. How do educators evaluate the ecological validity of the newly developed procedures?
4. What diagnostic practices characterize everyday teaching in schools?
In spring 2022, initial expert interviews (N = 7) were already conducted and evaluated as a needs analysis (M1). Based on this, expert interviews (N = 7) will be conducted in March and April 2023 to examine the reading texts and reading exercises (M2) in order to support the development of the reading packages as a practicable instrument.

Conclusions, Expected Outcomes or Findings
Results from the first expert interviews on the needs analysis (M1) show the necessity of the educators’ participation as users in the development of diagnostic procedures. That way, with the help of a practical research transfer, the usability in support-based diagnostics can be improved and the acceptance of the users can be increased. The results will contribute to the further development of the online learning application Levumi and will be verified through supplementary expert interviews (M2).
The outcomes of the study are expected to improve the students’ reading abilities. The collaboration and cooperation of educators and researchers in the development and digitalisation of the reading packages is expected to enhance usability and thereby further support students’ reading abilities. Additionally, in general and for the future, the alliance between educators and researchers could be highly beneficial for all involved, especially for the students, as the collaboration could foster the improvement and implementation of digital tools in the classroom.
In the presentation, the interviews (M1 and M2) as well as the development and implementation of the reading packages will be illustrated and discussed under the global question of how students benefit from the digital reading packages and how the usability of the application can be improved through collaboration and cooperation between educators and researchers. These results provide significant value for the development of reading support and the usability of digital applications in the pan-European context, especially given the increased and perpetuated learning gaps in primary schools due to the COVID-19 pandemic. Subsequently, the possibility of transferring the results to other countries will be discussed.

References
Betthäuser, B. A., Bach-Mortensen, A. M. & Engzell, P. (2023). A systematic review and meta-analysis of the evidence on learning during the COVID-19 pandemic. Nature Human Behaviour. Advance online publication.
Köller, O., Thiel, F., van Ackeren, I., Anders, Y., Becker-Mrotzek, M., Cress, U., Diehl, C., Kleickmann, T., Lütje-Klose, B., Prediger, S., Seeber, S., Ziegler, B., Kuper, H., Stanat, P., Maaz, K. & Lewalter, D. (2022). Basale Kompetenzen vermitteln – Bildungschancen sichern. Perspektiven für die Grundschule. Gutachten der Ständigen Wissenschaftlichen Kommission der Kultusministerkonferenz (SWK). SWK: Bonn.
Kruse, G., Rickli, U., Riss, M., & Sommer, T. (2015). Lesen. Das Training Klasse 2./3. Klett.
Kultusministerkonferenz (KMK) (2022). Bildungsstandards für das Fach Deutsch Primarbereich. Retrieved October 12, 2022, from https://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2022/2022_06_23-Bista-Primarbereich-Deutsch.pdf
Mayer, A. (2018). Blitzschnelle Worterkennung (BliWo): Grundlagen und Praxis. Borgmann.
Mullis, I. V. S., & Martin, M. O. (Eds.). (2019). PIRLS 2021 Assessment Frameworks. Retrieved from Boston College, TIMSS & PIRLS International Study Center website: https://timssandpirls.bc.edu/pirls2021/frameworks/
Stanat, P., Schipolowski, S., Schneider, R., Sachse, K. A., Weirich, S. & Henschel, S. (2022). IQB-Bildungstrend 2021: Kompetenzen in den Fächern Deutsch und Mathematik am Ende der 4. Jahrgangsstufe im dritten Ländervergleich. Waxmann Verlag.
Wallner-Paschon, C., Itzlinger-Bruneforth, U. & Schreiner, C. (Eds.). (2017). PIRLS 2016. Die Lesekompetenz am Ende der Volksschule. Erste Ergebnisse. Graz: Leykam.


09. Assessment, Evaluation, Testing and Measurement
Ignite Talk (20 slides in 5 minutes)

Teacher Ratings on the Strengths and Difficulties Questionnaire for Siblings of Children with Chronic Disorders

Caitlin Prentice, Stian Orm, Krister Fjermestad

University of Oslo, Norway

Presenting Author: Prentice, Caitlin

The educational inclusion of children with chronic disorders – such as developmental and physical disabilities – is a well-studied area, across Europe and globally. The siblings of these children, however, are less studied, particularly in relation to their educational experiences and outcomes. Siblings of children with chronic disorders have divergent, and often adverse, life experiences. Some siblings may experience positive outcomes, such as increases in prosocial functioning (Orm et al., 2022). Overall, however, siblings are at risk for negative psychological effects including emotional and behavioural problems (Havill et al., 2019; Vermaes et al., 2012). Reduced psychological well-being, in turn, can affect siblings’ educational experiences, functioning, and outcomes (Gan et al., 2017).

Studies of siblings of children with chronic disorders (herein “siblings”) tend to utilise mainly parent and self-ratings on measures of psychosocial well-being (Hayden et al., 2019). While these perspectives are important, they offer limited insight into the functioning of siblings within a school environment, particularly in the case of parent-rated measures. Given the centrality of school to children’s daily lives and the importance of education outcomes to later life outcomes, it is essential to consider the perspectives of teachers on sibling well-being. The present study aims to address this gap by examining:

1) Teacher ratings for siblings on the Strengths and Difficulties Questionnaire (SDQ)

2) Agreement between teacher and parent ratings on the SDQ, and

3) Factors that may explain disagreement between raters.

Previous studies of child psychosocial functioning – generally, rather than specific to siblings of children with chronic disorders – have found low to moderate levels of agreement between teacher and parent ratings on the SDQ (Murray et al., 2021). This pattern is also found across different measures of child psychosocial functioning. Across these studies, teachers tend to report fewer problems than parents, particularly in the case of internalising problems. Rather than signalling poor reliability of measures, however, the discrepancy between teacher and parent ratings suggests that children’s behaviour is context- and rater-specific. Teacher and parent ratings may be seen as complementary pieces of a larger picture, and understanding differences can facilitate better-targeted interventions (De Los Reyes et al., 2015). A number of factors may explain a lack of interrater agreement on the SDQ and other measures of child psychosocial functioning. Factors of the home environment, for example, can influence the level of agreement between raters; family stress has been found to be associated with less agreement between teacher and parent ratings, while a positive parent-child relationship is associated with more agreement (Cheng et al., 2018).

Overall, little is known about the educational experiences of siblings of children with chronic disorders. Furthermore, the SDQ is widely used across European countries and globally. A better understanding of the conditions under which teacher – parent agreement tends to be higher and lower will help researchers and practitioners to interpret SDQ ratings and target solutions and support accordingly.


Methodology, Methods, Research Instruments or Sources Used
The present study is part of a larger RCT evaluating a therapeutic intervention programme, “SIBS”, for siblings and parents of children with chronic disorders (Fjermestad et al., 2020). SIBS aims to improve the emotional and behavioural well-being of siblings and to improve communication between parents and siblings. Participants were recruited from six sites that provide support to children with chronic disorders and their families across Norway. The SIBS intervention consists of five sessions, with separate and joint sibling and parent components. Data – including SDQ scores – were collected at baseline and at 3, 6, and 12 months following the intervention. The present study uses baseline SDQ scores from teachers (n=127) and parents (n=173).

The SDQ is a measure of children’s behavioural and emotional functioning. It is composed of 25 items organised into five subscales: emotional difficulties, conduct problems, hyperactivity and inattention, peer difficulties, and pro-social behaviour. The first four of these subscales comprise a total difficulties scale of 20 items. Each item includes a statement about the child’s behaviour and three response options: not true (0), somewhat true (1), and certainly true (2). The SDQ has been validated across a range of populations and contexts. The psychometric properties of the teacher and parent versions of the SDQ are strong, with satisfactory internal consistency and test-retest reliability (Stone et al., 2010).
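The scoring scheme described above can be sketched as follows. The item-to-subscale grouping here is illustrative (five consecutive items per subscale), not the official SDQ scoring key:

```python
# Illustrative SDQ scoring sketch; the real instrument has a fixed
# item-to-subscale assignment and some reverse-scored items.
SUBSCALES = ["emotional", "conduct", "hyperactivity", "peer", "prosocial"]

def score_sdq(responses):
    """responses: list of 25 item scores, each 0 (not true),
    1 (somewhat true) or 2 (certainly true), grouped by subscale."""
    assert len(responses) == 25 and all(r in (0, 1, 2) for r in responses)
    scores = {name: sum(responses[i * 5:(i + 1) * 5])
              for i, name in enumerate(SUBSCALES)}
    # Total difficulties: the four difficulty subscales (20 items),
    # excluding pro-social behaviour.
    scores["total_difficulties"] = sum(scores[s] for s in SUBSCALES[:4])
    return scores

example = [1, 0, 2, 1, 0] * 5  # hypothetical ratings
print(score_sdq(example)["total_difficulties"])
```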

Means and standard deviations of teacher and parent scores will be calculated for total difficulties and each of the five subscales. Total scores will be compared with population norms using paired-sample t-tests. Agreement between teacher and parent ratings will be calculated using Pearson’s correlation coefficient for both the SDQ totals and the subscales. Finally, factors associated with agreement will be tested using logistic regressions and will include gender and age of the sibling, family stress, child-parent communication, and diagnosis of the child with the chronic disorder.
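The planned agreement analysis reduces, for each scale, to a Pearson correlation over paired teacher and parent scores. A self-contained sketch using hypothetical score pairs, not data from the study:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical paired teacher/parent total difficulties scores:
teacher = [5, 7, 3, 10, 6, 8, 4, 9]
parent = [8, 9, 5, 14, 7, 11, 6, 12]
print(round(pearson_r(teacher, parent), 2))
```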

Conclusions, Expected Outcomes or Findings
Data analysis is currently in progress and therefore results are forthcoming. Preliminary results suggest that siblings’ teacher total SDQ scores are similar to population norms (M = 6.50, SD = 4.89) and that teachers’ scores were significantly lower than parent reports (effect size difference: mother d = .37 and father d = .46).

Results of the study will provide a novel insight into the well-being of siblings of children with chronic disorders and how their strengths and difficulties may be enacted in a school environment. Additionally, the study contributes to discussions about agreement between teacher and parent ratings on the SDQ, providing data from a unique sample of children. Finally, results will contribute to the literature on factors associated with agreement / disagreement between raters, such as parent-child communication, family socio-economic situation, and family stress levels. In previous studies of siblings of children with chronic disorders, the diagnosis of the child with the disorder has affected sibling well-being; thus, it is possible that this factor may also be associated with differences in teacher and parent SDQ scores.    

School is an important part of young people’s lives and teachers play a key role in young people’s well-being and life outcomes. The results from this study can be used to build a foundation for understanding siblings’ experiences and behaviours at school, allowing practitioners to build on existing strengths and offer targeted support if required.

References
Cheng, S., Keyes, K. M., Bitfoi, A., Carta, M. G., Koç, C., Goelitz, D., Otten, R., Lesinskiene, S., Mihova, Z., Pez, O., & Kovess-Masfety, V. (2018). Understanding parent-teacher agreement of the Strengths and Difficulties Questionnaire (SDQ): Comparison across seven European countries. International Journal of Methods in Psychiatric Research, 27(1), e1589. https://doi.org/10.1002/mpr.1589
De Los Reyes, A., Augenstein, T. M., Wang, M., Thomas, S. A., Drabick, D. A. G., Burgers, D. E., & Rabinowitz, J. (2015). The validity of the multi-informant approach to assessing child and adolescent mental health. Psychological Bulletin, 141(4), 858–900. https://doi.org/10.1037/a0038498
Fjermestad, K. W., Silverman, W. K., & Vatne, T. M. (2020). Group intervention for siblings and parents of children with chronic disorders (SIBS-RCT): study protocol for a randomized controlled trial. Trials, 21(1), 851. https://doi.org/10.1186/s13063-020-04781-6
Gan, L. L., Lum, A., Wakefield, C. E., Nandakumar, B., & Fardell, J. E. (2017). School Experiences of Siblings of Children with Chronic Illness: A Systematic Literature Review. Journal of Pediatric Nursing, 33, 23–32. https://doi.org/10.1016/J.PEDN.2016.11.007
Havill, N., Fleming, L. K., & Knafl, K. (2019). Well siblings of children with chronic illness: A synthesis research study. Research in Nursing & Health, 42(5), 334–348. https://doi.org/10.1002/nur.21978
Hayden, N. K., Hastings, R. P., Totsika, V., & Langley, E. (2019). A Population-Based Study of the Behavioral and Emotional Adjustment of Older Siblings of Children with and without Intellectual Disability. Journal of Abnormal Child Psychology, 47(8), 1409–1419. https://doi.org/10.1007/s10802-018-00510-5
Murray, A. L., Speyer, L. G., Hall, H. A., Valdebenito, S., & Hughes, C. (2021). Teacher Versus Parent Informant Measurement Invariance of the Strengths and Difficulties Questionnaire. Journal of Pediatric Psychology, 46(10), 1249–1257. https://doi.org/10.1093/jpepsy/jsab062
Orm, S., Haukeland, Y., Vatne, T., Silverman, W. K., & Fjermestad, K. (2022). Prosocial Behavior Is a Relative Strength in Siblings of Children with Physical Disabilities or Autism Spectrum Disorder. Journal of Developmental and Physical Disabilities, 34(4), 591–608. https://doi.org/10.1007/s10882-021-09816-7
Stone, L. L., Otten, R., Engels, R. C. M. E., Vermulst, A. A., & Janssens, J. M. A. M. (2010). Psychometric Properties of the Parent and Teacher Versions of the Strengths and Difficulties Questionnaire for 4- to 12-Year-Olds: A Review. Clinical Child and Family Psychology Review, 13(3), 254–274. https://doi.org/10.1007/s10567-010-0071-2
Vermaes, I. P. R., van Susante, A. M. J., & van Bakel, H. J. A. (2012). Psychological Functioning of Siblings in Families of Children with Chronic Health Conditions: A Meta-Analysis. Journal of Pediatric Psychology, 37(2), 166–184. https://doi.org/10.1093/jpepsy/jsr081
 
12:15pm - 1:15pm09 SES 10.5 A: NW 09 Network Meeting
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Monica Rosén
NW 09 Network Meeting
 
09. Assessment, Evaluation, Testing and Measurement
Paper

NW 09 Network Meeting

Monica Rosén

University of Gothenburg, Sweden

Presenting Author: Rosén, Monica

All networks hold a meeting during ECER. All interested are welcome.


Methodology, Methods, Research Instruments or Sources Used
.
Conclusions, Expected Outcomes or Findings
.
References
.
 
1:30pm - 3:00pm09 SES 11 A: Addressing Educational Equity and Inequality: Insights from Research and Policy
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Gasper Cankar
Paper Session
 
09. Assessment, Evaluation, Testing and Measurement
Paper

The Lagom Effect: School Composition and Inequality of Opportunities in Sweden

Victoria Rolfe

University of Gothenburg, Sweden

Presenting Author: Rolfe, Victoria

Sweden’s self-image as a leader in education was rocked in the early 2010s by the so-called PISA-shock, in which this formerly high-flying education system saw its performance in international assessments decline dramatically. This period of decline dominated the public and political discourse around education and reform in Sweden (e.g. Lundahl & Serder, 2020) for the rest of the decade. Over time, Sweden’s performance in mathematics has recovered, as evidenced in numerous international assessments, most recently TIMSS 2019 and PISA 2018 (e.g. Mullis et al., 2020; OECD, 2019a). Nevertheless, the improvement in the overall achievement of Swedish youth somewhat masks a persistent achievement gap which has been observed within the Swedish school system since the early 2000s, with growing between-school variation in student grades (Skolverket, 2005, 2020). The achievement gap noted in domestic data has also been recorded in international data, with the gap widening (Chmielewski, 2019) and Sweden’s decline in socioeconomic equality of outcomes the most severe among peer nations (Hanushek et al., 2014).

Socioeconomic status is a well-established predictor of educational outcomes (e.g. Sirin, 2005), and previous research using TIMSS data has confirmed this relationship in relation to mathematics outcomes for Swedish youth over multiple cycles of TIMSS between 2003 and 2015 (Authors, 2021). A longstanding strand of scholarship suggests that in addition to predicting achievement, socioeconomic background indicates varied opportunity to learn (OTL) course material, which in turn predicts test performance (e.g. Eggen et al., 1987). While this pattern of relationships has been consistently evidenced in the English-speaking world (e.g. Authors, 2021; Schmidt et al.,2013), in the Swedish context inequalities of opportunities have been inconsistently observed, appearing only among the 2003 and 2015 TIMSS cohorts (Authors, 2021).

A possible explanation for the lack of observable social reproduction through the delivery of the curriculum lies in the nature of the Swedish school system. A distinctive feature of the Swedish education system is its retention of the comprehensive model, in which students are offered equal learning opportunities in integrated school settings (Arnesen & Lundahl, 2006) with limited within-school streaming when compared with other highly developed economies (Chmielewski, 2014). Of much interest to policy makers and researchers, reforms to the Swedish education system enacted in the 1990s introduced school choice and created a market for education (see Björklund et al., 2005). While admissions guidance prohibits cream-skimming (Põder et al., 2017), the exercise of school choice is socially segregated (Teske & Schneider, 2001), and the subsequent composition of schools can be interpreted as reflecting segregation beyond an expected neighbourhood effect (Böhlmark et al., 2016). Despite the observed social segregation between schools, analysis of international data suggests that the comprehensive school system in Sweden is still intact, with students of varying abilities attending the same schools, and that variation in performance between schools is low when compared to other economies (OECD, 2019b).

Against this background, the following research questions are considered:

  1. Are between school socioeconomic inequalities in mathematics outcomes and opportunity to learn mathematics observable among eighth graders in Sweden?
  2. Do the relationships between socioeconomic status, opportunity to learn, and achievement vary between high-, neutral-, and low-SES schools?

Methodology, Methods, Research Instruments or Sources Used
The study uses Swedish data from the grade 8 sample of the Trends in International Mathematics and Science Study (TIMSS) 2019. The focus of the study is between-school variation in the relation between inequalities in opportunity to learn (expressed as content coverage) and mathematics outcomes, and thus data from the student, teacher, and school questionnaires are used. Socioeconomic status and opportunity to learn are both conceived as unobserved phenomena and are thus modelled as latent factors. Socioeconomic status is indicated by the number of books in the home, the highest reported parental education level, and the sum of five home possession items. OTL is indicated by manifest variables in which teacher responses to items regarding when content is introduced are summed to create indicators of content coverage in each of the four sub-domains of mathematics (number, algebra, geometry, and data).

Structural equation modelling is used in the study to model the relations between SES, OTL, and achievement in mathematics. Complex survey data such as the TIMSS 2019 dataset favours a multilevel approach to modelling, as it allows the variance in the dependent variable, in this case achievement, to be split across individual and school levels and provides model estimates at both levels. A two-level model is specified with individual achievement regressed on SES at the student level, and a trio of relations – achievement is regressed on SES and OTL, and OTL is regressed on SES – are specified at the school level. As the focus of the study is between school differences, data from the student and teacher questionnaires is aggregated to school level to build the between level of the model. The modelling process features two stages to reflect the research questions. In the first stage, model one – the basic model – is run to identify whether socioeconomic inequalities in outcomes and opportunities can be identified for the sample as a whole. In the second stage, model two separates schools into three groups with each school classified as high-, neutral-, or low-SES, with the goal of establishing whether patterns of inequalities differ between different school profiles.
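The variance split at the heart of the two-level approach can be illustrated with a small simulation. This is only a sketch of the variance-decomposition idea on made-up data (all numbers are hypothetical), not the study's actual multilevel SEM with latent factors:

```python
import numpy as np

# Simulate achievement for 50 schools x 30 students: a school-level
# effect (between-school variance) plus a student residual (within-school).
rng = np.random.default_rng(0)
n_schools, n_students = 50, 30

school_effect = rng.normal(0.0, 2.0, n_schools)
scores = school_effect[:, None] + rng.normal(0.0, 8.0, (n_schools, n_students))

# Split the variance across levels, as a multilevel model does.
school_means = scores.mean(axis=1)
between_var = school_means.var(ddof=1)           # variance of school means
within_var = scores.var(axis=1, ddof=1).mean()   # pooled within-school variance

# Intraclass correlation: share of total variance lying between schools.
icc = between_var / (between_var + within_var)
print(round(icc, 3))
```

When the intraclass correlation is small, as in this simulation, most achievement differences lie within rather than between schools, which is the pattern the OECD reports for Sweden.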


Conclusions, Expected Outcomes or Findings
Preliminary results suggest that for the Swedish TIMSS 2019 grade eight cohort as a whole, SES remains a strong predictor of achievement at the individual and school levels, in line with earlier research. However, evidence of socioeconomic inequalities in OTL was not observed. When schools were categorised as high-, neutral-, or low-SES, patterns of inequalities differed between groups, with the most notable results seen in the neutral-SES group. For this group, SES at the school level was a very strong predictor of achievement, and OTL was a significant predictor of achievement; neither result was replicated in the other two groups.

In Swedish, the neutral-SES schools could be described as lagom, a concept which roughly translates to ‘not too much, not too little’ or ‘just the right amount’. It is therefore highly relevant to stakeholders in the educational project that it is in these schools with a balanced socioeconomic intake that the Swedish system goes beyond its comprehensive character and appears to act in a compensatory manner in terms of mathematics provision.


References
Arnesen, A. L., & Lundahl, L. (2006). Still social and democratic? Inclusive education policies in the Nordic welfare states. Scandinavian Journal of Educational Research, 50(3), 285-300.
Authors. (2021).
Björklund, A., Clark, M. A., Edin, P. A., Fredriksson, P., & Krueger, A. B. (2005). The market comes to education in Sweden: An evaluation of Sweden's surprising school reforms. Russell Sage Foundation.
Böhlmark, A., Holmlund, H., & Lindahl, M. (2016). Parental choice, neighbourhood segregation or cream skimming? An analysis of school segregation after a generalized choice reform. Journal of Population Economics, 29(4), 1155-1190.
Chmielewski, A. K. (2014). An international comparison of achievement inequality in within- and between-school tracking systems. American Journal of Education, 120(3), 293–324.
Chmielewski, A. K. (2019). The global increase in the socioeconomic achievement gap, 1964 to 2015. American Sociological Review, 84(3), 517-544.
Eggen, T. J. H. M., Pelgrum, W. J., & Plomp, T. (1987). The implemented and attained mathematics curriculum: Some results of the second international mathematics study in The Netherlands. Studies in Educational Evaluation, 13(1), 119-135.
Hanushek, E. A., Piopiunik, M., & Wiederhold, S. (2014). The value of smarter teachers: International evidence on teacher cognitive skills and student performance. National Bureau of Economic Research.
Lundahl, C., & Serder, M. (2020). Is PISA more important to school reforms than educational research? The selective use of authoritative references in media and in parliamentary debates. Nordic Journal of Studies in Educational Policy, 6(3), 193-206.
Mullis, I. V. S., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 International Results in Mathematics and Science. https://timssandpirls.bc.edu/timss2019/international-results/
OECD. (2019a). PISA 2018 Results (Volume I): What students know and can do.
OECD. (2019b). PISA 2018 Results (Volume II): Where All Students Can Succeed.
Põder, K., Lauri, T., & Veski, A. (2017). Does school admission by zoning affect educational inequality? A study of family background effect in Estonia, Finland, and Sweden. Scandinavian Journal of Educational Research, 61(6), 668-688.
Schmidt, W. H., Zoido, P., & Cogan, L. (2013). Schooling Matters: Opportunity to Learn in PISA 2012. OECD Education Working Papers(95).
Sirin, S. R. (2005). Socioeconomic status and academic achievement: A meta-analytic review of research. Review of Educational Research, 75(3), 417-453.
Skolverket. (2005). Skolverkets lägesbedömning 2005. https://www.skolverket.se/download/18.6bfaca41169863e6a655903/1553958898329/pdf1516.pdf
Skolverket. (2020). Skolverkets lägesbedömning 2020. https://www.skolverket.se/publikationer?id=6436
Teske, P., & Schneider, M. (2001). What research can tell policymakers about school choice. Journal of Policy Analysis and Management, 20, 609-631.


09. Assessment, Evaluation, Testing and Measurement
Paper

Does Tracking Increase School Segregation of Immigrants? A Difference-in-Differences Approach.

Janna Teltemann1, Max Brinkmann1, Nora Huth-Stöckle2

1Universität Hildesheim, Germany; 2Bergische Universität Wuppertal

Presenting Author: Teltemann, Janna

Social integration in the context of increasing immigration is a challenge faced by many industrialized countries. The institutional set-up of an education system is a natural candidate for scrutiny from a policy perspective, since education is at the forefront of social integration and institutional structures are malleable in nature. In this study, we examine immigrant social integration through school segregation and how it relates to the institutional structure of an educational system, in particular the existence of (early between-school) tracking. As a premise for integration, school segregation is crucial since it determines how much interaction between immigrants and non-immigrants occurs. Tracking on the other hand is as controversial as it is momentous since students are placed into different types of schools from an age as young as ten (e.g. Terrin & Triventi 2022, van de Werfhorst & Mijs 2010). This is in stark contrast to the practice of integrated education systems, in which students may be grouped by ability for certain topics or classes, but are only separated as they approach maturity.

We therefore examine whether tracked education systems show higher levels of school segregation of immigrants than education systems that delay between-school grouping. While sociologists have long documented the negative effects of tracking with regard to equality of opportunity (e.g. Terrin & Triventi 2022, van de Werfhorst & Mijs 2010), we argue that there may be counteracting mechanisms at work in tracked systems when it comes to ethnic segregation between schools.

We follow theories on educational inequality to understand school segregation and tracking. These theories relate differences in family resources to differences in educational attainment or achievement (i.e. Boudon 1974, Bourdieu 1987, Lareau 2011). Resources in this context can comprise economic capital, strategic knowledge, social contacts and familiarity with modes of behavior in the education system. In this context, we expect that immigrant students are disadvantaged, as many of these resources cannot easily be transferred from the home country to the receiving country. We can therefore expect that they will show, on average, lower achievement at the end of primary school (i.e. primary effects; Boudon 1974). This finding has been confirmed by numerous studies (e.g. Heath et al. 2008).

Since observed achievement is a major indicator of track placement, primary effects of ethnic and social origin increase the likelihood for immigrant students to be sorted into lower secondary school tracks. However, parental decision making (i.e. secondary effects; Boudon 1974) is another determinant of track placement, and it is well known that immigrant parents tend to choose more ambitious educational pathways (Esser 2016; Gresch et al. 2012), which could compensate for low track placement based on ability. Lastly, school segregation likely exists in non-tracked systems as well. First, because home-to-school distances are a main factor in selecting a school, residential segregation, which is a common phenomenon in many countries, is reflected in school segregation. Further, school choice behavior of non-immigrant families may contribute to ethnic school segregation, as particularly high-status families tend to avoid schools with larger numbers of immigrants (“white flight”; Armor 1980). They do so because they use immigrant concentration as a proxy for (lower) school quality. This tendency might be weaker in tracked systems, as track level is an accessible indicator of school quality, so non-immigrant families do not need to avoid schools with larger numbers of immigrants (cf. Meier & Schütz 2007).

In sum, there may be counteracting mechanisms with regard to school segregation and the age of first tracking. We therefore argue that it remains an empirical question to determine which mechanism outweighs the other.


Methodology, Methods, Research Instruments or Sources Used
Previous research on the effects of tracking on ethnic segregation points towards mixed effects (e.g. Kruse 2019). However, most previous studies examine data from single countries or cities. Moreover, they face the challenge that cross-sectional analyses might be biased by unobserved heterogeneity.
We therefore aim at generating more generalizable findings on the impact of tracking on segregation by combining all data from PISA, TIMSS and PIRLS cycles between 1995 and 2018 for a total of 45 countries. In order to combine the data, we harmonized the relevant information, most importantly information on immigrant background. We define immigrant background by the place of birth of the student (abroad). Based on this information, we calculated measures of segregation (index of dissimilarity D, Duncan & Duncan 1955) for each study-year and each country.
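The index of dissimilarity D compares each school's share of the immigrant group with its share of the non-immigrant group. A minimal sketch with hypothetical school counts:

```python
def dissimilarity(immigrant, native):
    """Index of dissimilarity D (Duncan & Duncan, 1955).

    immigrant[i], native[i]: counts of each group in school i.
    D = 0.5 * sum_i |a_i/A - b_i/B|, ranging from 0 (both groups
    spread evenly across schools) to 1 (complete segregation).
    """
    A, B = sum(immigrant), sum(native)
    return 0.5 * sum(abs(a / A - b / B) for a, b in zip(immigrant, native))

# Three hypothetical schools:
print(dissimilarity([20, 15, 5], [80, 60, 20]))  # identical shares -> 0.0
print(dissimilarity([40, 0, 0], [0, 80, 80]))    # complete separation -> 1.0
```

D can be read as the share of either group that would have to change schools for the two distributions to match.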
Crucial for our analytical approach is the fact that some of the studies are implemented in primary school - when no education system is tracked - and others are administered in secondary school (in grade 8 or at age 15), i.e. after tracking has taken place. According to our definition (tracking takes place before grade 8), this is the case for nine countries in our sample.
Our analysis is based on a difference-in-differences approach that compares the difference in ethnic segregation between primary and secondary school and between tracked and untracked countries. This approach enables us to account for all other time-stable differences between countries. We still included control variables that can change over time: the gross domestic product and the population density and the privatization of the education system. Such decisions (e.g. including control variables or excluding probable outliers) however might have substantial impact on the obtained estimates. We therefore do not conduct a single analysis, instead we follow the approach of multiverse analyses (Simonsohn et al. 2020).
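The core difference-in-differences contrast can be written directly from four group means. The numbers below are purely illustrative, not the study's estimates:

```python
def did(tracked_primary, tracked_secondary, untracked_primary, untracked_secondary):
    """Difference-in-differences: the extra change in segregation from
    primary to secondary school in tracked vs. untracked systems."""
    return ((tracked_secondary - tracked_primary)
            - (untracked_secondary - untracked_primary))

# Hypothetical dissimilarity indices D:
# tracked systems:   0.30 (primary) -> 0.42 (secondary)
# untracked systems: 0.30 (primary) -> 0.38 (secondary)
print(round(did(0.30, 0.42, 0.30, 0.38), 2))
```

Because both system types are differenced against their own primary-school level, all time-stable country differences cancel out, which is exactly the property the abstract relies on.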
The term "multiverse analysis" refers to a type of analysis that accounts for the problem of multiple “forking paths” (Gelman & Loken 2013), because a research design has to be operationalized with variables, samples and estimation techniques. By systematically varying these decisions across all possible paths, we will “expand” a multiverse that incorporates all possible paths. In other words, it is a systematic way of doing robustness checks.
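Expanding the multiverse amounts to enumerating every combination of analytic decisions. A schematic sketch, with hypothetical decision sets standing in for the study's actual choices of controls, samples and estimators:

```python
from itertools import product

# Hypothetical analytic decisions; each combination is one "universe".
controls = ["none", "gdp", "gdp+density", "gdp+density+private"]
samples = ["all countries", "drop outliers", "min immigrant share"]
estimators = ["country FE", "country+year FE"]

specifications = list(product(controls, samples, estimators))
print(len(specifications))  # 4 * 3 * 2 = 24 specifications

# In the full analysis, each specification would be estimated and its
# tracking coefficient collected into a specification curve.
```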

Conclusions, Expected Outcomes or Findings
Our preliminary results suggest that the presence of early between-school grouping (as compared to late between-school grouping) has no discernible impact on immigrant school segregation. While segregation increases in both types of education systems, there are heterogeneous effects across model specifications with respect to the effect of tracking. By varying the choice of fixed effects, control variables (GDP, private school density and population density) and sample restrictions (different GDP cut-offs and different cut-offs for the minimum or maximum share of immigrant students in a country), we obtain about 4000 model specifications, of which 60% show a small negative (but overwhelmingly insignificant) effect and 40% show a small positive (but overwhelmingly insignificant) effect on school segregation. In our next steps, we will examine the effects of selectivity on segregation. We expect that higher selectivity will limit the ambitious school choices of immigrant families and therefore lead to higher levels of school segregation.

References
Armor, D. J. (1980). White flight and the future of school desegregation. School desegregation: Past, present, and future, 187-226.    

Bourdieu, P. (1987). Die feinen Unterschiede. Suhrkamp.

Duncan, O. D., & Duncan, B. (1955). A Methodological Analysis of Segregation Indexes. American Sociological Review, 20(2), 210–217. Retrieved from http://www.jstor.org/stable/2088328

Esser, H. (2016). Bildungssysteme und ethnische Bildungsungleichheiten. Ethnische Ungleichheiten im Bildungsverlauf: Mechanismen, Befunde, Debatten, 331-396.
[English: “Education systems and ethnic educational inequalities” in “Ethnic inequality along the educational pathway: Mechanisms, Results, Debates”]

Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University, 348, 1-17.

Gresch, C., Maaz, K., Becker, M., & McElvany, N. (2012). Zur hohen Bildungsaspiration von Migranten beim Übergang von der Grundschule in die Sekundarstufe: Fakt oder Artefakt. Soziale Ungleichheit in der Einwanderungsgesellschaft. Kategorien, Konzepte, Einflussfaktoren, 56-67.
[English: “The case of high educational aspirations among migrants when transitioning from primary school to secondary school: fact or artifact?”]

Heath, A. F., Rothon, C., & Kilpi, E. (2008). The Second Generation in Western Europe: Education, Unemployment, and Occupational Attainment. Annual Review of Sociology, 34(1), 211–235. https://doi.org/10.1146/annurev.soc.34.040507.134728

Kruse, H. (2019). Between-school ability tracking and ethnic segregation in secondary schooling. Social Forces, 98(1), 119-146.

Lareau, A. (2011). Unequal Childhoods: Class, Race, and Family Life. Univ of California Press.

Meier, V., & Schütz, G. (2007). The economics of tracking and non-tracking (No. 50). Ifo working paper.

Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208-1214.

Terrin, E., & Triventi, M. (2022). The effect of school tracking on student achievement and inequality: A meta-analysis. Review of Educational Research, 00346543221100850.

Van de Werfhorst, H. G., & Mijs, J. J. (2010). Achievement inequality and the institutional structure of educational systems: A comparative perspective. Annual review of sociology, 36, 407-428.


09. Assessment, Evaluation, Testing and Measurement
Paper

The Importance of Relative Age for Academic Achievement and Socioemotional Competencies

Alli Klapp

University of Gothenburg, Sweden

Presenting Author: Klapp, Alli

This study examines the impact of the Relative Age Effect (RAE), measured by students' birth month, on cognitive ability, school achievement and socioemotional competencies. A longitudinal approach is applied using data from a Swedish cohort of students born in 1972, followed from Grade 3 (age 10) until the end of upper secondary school (age 19).

When Swedish students begin their first school year in August (age 7), some children are almost one year older than their peers in the same year level, because the school entry cut-off date is 1 January.

Being among the oldest or youngest in a group of students has been shown to affect many outcomes throughout the school trajectory. Being relatively young has been linked to negative effects on cognitive development and maturity (Rod Larsen & Solli, 2017), while being relatively old when starting school appears to benefit achievement, working life and later outcomes, such as higher achievement and educational attainment (Crawford et al., 2014). Relatively old students are also more likely to participate in high school leadership activities (Dhuey & Lipscomb, 2008) and to be successful in sports (Gibbs et al., 2012). Further, the literature shows that being relatively old brings physical advantages, which benefit individuals' identification processes during their upbringing (McCarthy et al., 2016). Mixed results exist on the impact of relative age on earnings (Black et al., 2008; Fredriksson & Öckert, 2014).

However, the economic literature has found a reversal of the relative age effect, suggesting that being older when starting school is beneficial for earnings early in the working career, while being younger is beneficial for earnings later in the working career (McCarthy et al., 2016).

The effect of relative age on noncognitive outcomes such as self-concept, self-confidence, self-esteem, and coping and resilience strategies is also evident (Duckworth et al., 2007; Dweck, 2006). Findings from several studies show that being relatively old in the school cohort positively influences children's and adolescents' self-confidence, self-beliefs, and social interactions in school (Crawford et al., 2014). Further, it has been shown that relatively old children have higher self-esteem (Thompson et al., 1999, 2004) and suffer less from psychological and behavioural problems (Mühlenweg et al., 2010) compared to relatively younger students.

Even though research has shown that many socioemotional competencies are affected by the RAE, some may be more crucial for success in learning, such as coping and resilience strategies (Duckworth et al., 2007; Dweck, 2006).

This study contributes to the research field by providing empirical support for the long-term consequences of relative age in school for cognitive ability, school achievement and noncognitive competencies in terms of students' academic self-concept and coping and resilience strategies.

Purposes

The main aim of this study is to examine the importance of the relative age effect, measured by birth month, for students' cognitive and socioemotional outcomes using a longitudinal approach. The following research questions will be investigated with longitudinal data from several time points:

How does relative age affect cognitive outcomes in terms of cognitive ability, GPA, and educational attainment?

How does relative age affect socioemotional outcomes in terms of perceived academic self-concept, coping and resilience strategies?

What are the long-term effects of relative age for cognitive and socioemotional outcomes and for subgroups of students related to gender and family socioeconomic status?


Methodology, Methods, Research Instruments or Sources Used
Data from the Evaluation Through Follow-up (UGU) longitudinal infrastructure is used. The UGU database contains 10% nationally representative samples of students from 11 birth cohorts, born between 1948 and 2010. The cohort relevant to the present study was born in 1972 (N=9037). The participants were in Grade 3 in the academic year 1987/88. The participants received a survey and a cognitive test in Grade 3 (age 10) and Grade 6 (age 13) and a follow-up survey in Grade 10 (age 16, without a cognitive test). The cognitive tests in Grades 3 and 6 were identical within the cohort and consisted of verbal, inductive and spatial task batteries.
The survey in Grade 10 (age 17) was sent to the participants' home addresses by mail. Administrative and register data such as birth month, grades and educational attainment are available for all the participants through upper secondary education (age 19). The 1972 cohort is unique in the sense that the participants received cognitive ability tests at two time points in compulsory school; a further distinctive feature is that the participants finished upper secondary school in 1991.
Descriptive statistics and regression analyses were conducted, with outcomes measured by cognitive ability in Grades 3 and 6, Grade Point Average in Grade 9 (age 16) and educational attainment in upper secondary school (age 19). Throughout the analyses, gender and socioeconomic status (SES) were included. Several multivariate multiple regression models have been estimated and logistic regressions are underway.
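The basic relative-age regression, an outcome regressed on birth month with gender and SES as covariates, can be sketched on simulated data. All coefficients and data below are hypothetical; the study's actual models are multivariate and estimated in Mplus:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# With a 1 January cut-off, children born late in the year are the
# youngest in the cohort; simulate a small negative birth-month effect.
birth_month = rng.integers(1, 13, n)
gender = rng.integers(0, 2, n)
ses = rng.normal(0.0, 1.0, n)
gpa = 10.0 - 0.10 * birth_month + 0.5 * gender + 1.0 * ses + rng.normal(0.0, 1.0, n)

# OLS via least squares on the design matrix [1, birth_month, gender, ses].
X = np.column_stack([np.ones(n), birth_month, gender, ses])
beta, *_ = np.linalg.lstsq(X, gpa, rcond=None)
print(round(beta[1], 2))  # estimated birth-month effect, near the true -0.10
```

A negative coefficient on birth month corresponds to the pattern reported in the preliminary results: later-born (relatively younger) students score lower.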
Confirmatory factor analyses (CFA) and structural equation models (SEM) will be estimated to investigate the importance of socioemotional competencies (about 20 items reflecting coping and resilience strategy factors) for the relative age effect. Analyses with a longitudinal growth modelling approach are ongoing.
Data management and preparation was conducted in the SPSS program, version 28. The analyses were conducted in the Mplus program, version 8.5 (Muthén & Muthén, 1998-2019).

Conclusions, Expected Outcomes or Findings
Preliminary results show a significant negative age effect on cognitive ability in Grades 3 and 6 and on GPA in Grade 9. The negative age effect decreases over time, being strongest for the measure of cognitive ability in Grade 3 (age 10). The results show no effects of the covariates on the main relations between birth month and the three outcome measures of cognitive ability in Grades 3 and 6, and GPA in Grade 9. Factors reflecting coping and resilience strategies are constructed by CFA and will be analysed in SEM. Growth model analyses including data from upper secondary school are ongoing.
References
Black, S., Devereux, P., Salvanes, K.G., 2011. Too young to leave the nest? The effects of school starting age. Rev. Econ. Stat. 93, 455–467.
Crawford, C., Dearden, L., Greaves, E., 2014. The drivers of month-of-birth differences in children's cognitive and non-cognitive skills. J. R. Stat. Soc. Ser. A (Stat. Soc.), 177, 829–860.
Dhuey, E., Lipscomb, S., 2008. What makes a leader? Relative age and high school leadership. Econ. Educ. Rev. 27, 173–183.
Duckworth, A. L., Peterson, C., Matthews, M. D., & Kelly, D. R. (2007). Grit: Perseverance and passion for long-term goals. Journal of Personality and Social Psychology, 92(6), 1087–1101.
Dweck, C. S. (2006). Mindset: The new psychology of success. New York, NY: Random House.
Fredriksson, P., Öckert, B., 2014. Life-cycle effects of age at school start. Econ. J. 124, 977–1004.
Gibbs, B., Jarvis, J., Dufur, M., 2012. The rise of the underdog? The relative age effect reversal among Canadian-born NHL hockey players. Int. Rev. Sociol. Sport, 47, 644– 649.
McCarthy, N., Collins, D., & Court, D. (2016). Start hard, finish better: further evidence for the reversal of the RAE advantage. Journal of Sports Science, 34(15), 1461–1465.
Mühlenweg, A.M., Puhani, P.A., 2010. The evolution of the school-entry age effect in a school tracking system. J. Hum. Resour. 45, 407–438.
Rod Larsen, E., & Solli, I.F. (2017). Born to run? Persisting birth month effects on earnings. Labour Economics, 46, 200-210.
Solli, I.F., 2017. Left Behind by Birth Month. Educ. Econ. http://dx.doi.org/10.1080/09645292.2017.1287881
Thompson, A.H., Barnsley, R.H., Battle, J., 2004. The relative age effect and the development of self-esteem. Educ. Res. 46, 313–320.
 
3:30pm - 5:00pm 09 SES 12 A: Exploring Teacher Factors and Educational Contexts: Implications for Practice and Policy
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Kajsa Yang Hansen
Paper and Ignite Talk Session
 
09. Assessment, Evaluation, Testing and Measurement
Paper

Teacher Turnover and School Composition in Sweden: a Panel Data Approach Using Register Data

Leah Glassow

University of Gothenburg, Sweden

Presenting Author: Glassow, Leah

It is widely accepted that teachers are one of the most important school-level inputs for student academic success. Most educational research focuses on teacher effectiveness in terms of their contribution to student test scores, but there is a growing need to examine the teaching profession as an outcome in itself. A teacher workforce characterized by high turnover rates negatively affects not only schools, through administrative burdens, and students and their educational futures, but also teachers themselves, through their working conditions and professional satisfaction.

There is a longstanding link between low-SES schools and teacher turnover, but this literature mostly comes out of the USA, with some exceptions. Disproportionate teacher turnover rates often affect lower-SES schools and classrooms in particular (Bacolod, 2007; Bonesrønning, Falch & Strøm, 2005; Hanushek et al., 2004; Feng, 2009; Glassow, 2023), impeding organizational and administrative school functioning, and potentially harming longer-term student outcomes such as college attendance and high school completion (Jackson, 2018). Moreover, high turnover rates may be symptomatic of worsening working conditions and professional satisfaction, which have been documented in a number of education systems (Ball, 2016; Craig, 2017).

There is therefore a need to document the extent to which teachers mirror the socioeconomic demographics of schools and concrete ways in which to democratize access to teacher competence in Sweden. This is a pertinent issue due to the demographic changes occurring in Sweden over the past several decades and the rising school inequality in the country (Karbownik, 2020; Yang Hansen & Gustafsson, 2016). The present study seeks to address this gap in knowledge and examine whether changes in school composition (by family education level or language spoken by the students) result in changes in teacher turnover rates. Using teacher and student register data, the study first examines in a descriptive fashion whether there are growing differences between schools in terms of teacher turnover rates. Next, using a panel data model, the link between changes in student school composition and teacher turnover is explored. Whether or not causal conclusions can be drawn from such an approach will also be discussed in the paper.

Allensworth, Ponisciak and Mazzeo (2009) outline several main reasons teachers cite for their dissatisfaction with certain schools: principal effectiveness, dysfunctional administration, challenging students, low salary, and limited autonomy, which may be due to additional accountability practices. Vagi and Pivovarova (2017) consolidate the literature employing theoretical frameworks for teacher mobility and offer person-environment fit theory (Dawis, 1992) as a theory which may encapsulate the myriad of environmental and personal factors relevant for teacher mobility. While the focus of the study is on the role of the socioeconomic composition of schools and classrooms in teacher mobility behaviours, person-environment fit theory helps identify factors which may bias results unless they are controlled for, or unless methodologies are used which account for unobserved heterogeneity. Dawis (2004) highlights that job satisfaction and work stress are the result of a successful or mismatched person-environment fit, respectively.

Against this background, the main research questions of the study are:

1) Have differences in teacher turnover rates between schools grown in Sweden over the past several decades?

2) Do changes in socioeconomic and migration demographics of schools result in higher turnover rates? Specifically, do schools with a higher proportion of students with main languages other than Swedish exhibit a significantly higher proportion of teachers who leave?

3) Does this change depending on teacher qualifications? For example, are more experienced teachers more or less likely to leave as a result of these changes?


Methodology, Methods, Research Instruments or Sources Used

The data come from the teacher and student registries of the Swedish National Agency for Education between the years 2000 and 2013. The information is collected yearly. This registry includes all teachers employed in Swedish schools, not just a sub-sample. It contains information on teachers’ qualifications (education, specialization, experience, certification) as well as their working conditions (workplace, permanent vs. fixed-term status, and workload). The data are matched to the pupil registry for lower and upper secondary schools. Since the teachers cannot be linked to students but only to schools, the analysis concerning socioeconomic composition is conducted at the school level.
OLS regressions may be biased due to unobserved differences between schools and their association with the model residuals. Educational researchers are increasingly becoming aware of the advantages of approaches using fixed effects. The analysis is conducted in Posit (formerly RStudio) using the plm package (Croissant & Millo, 2008). Panel data techniques are employed, which account for time-invariant unobserved heterogeneity associated with the teachers (the subjects). The restriction of variation to within individuals over time accounts for all factors at the individual level which are constant. The remaining variation is the change in school characteristics over time. The odds of changing schools will be regressed on school characteristics related to parental education and migration composition. The analysis controls for time-varying characteristics at the school level, such as material resources, or other factors such as geographic location. The analysis also considers effect heterogeneity, in terms of whether the link between school composition and teacher turnover changes as a function of teacher characteristics. In a final step, the reduction in variation imposed by the fixed effects is investigated by transforming the estimate by the within-unit standard deviation, and within-unit standard deviations are presented for each school.
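The within transformation that a fixed-effects ("within") estimator such as plm's applies can be sketched by hand: demeaning each variable within unit removes all time-constant unit characteristics. The simulated data below are purely illustrative, not the registry data:

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_years = 200, 10

# Each unit (e.g. a teacher) has a fixed, unobserved propensity that is
# correlated with the regressor x -- this is what biases pooled OLS.
alpha = rng.normal(0.0, 1.0, n_units)                      # unit fixed effects
x = alpha[:, None] + rng.normal(0.0, 1.0, (n_units, n_years))
y = 0.5 * x + 2.0 * alpha[:, None] + rng.normal(0.0, 1.0, (n_units, n_years))

# Within transformation: subtract unit means, removing the fixed effects.
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)

# Slope on demeaned data recovers the true within effect (0.5 here).
beta_fe = (x_w * y_w).sum() / (x_w ** 2).sum()
print(round(beta_fe, 2))

# Pooled OLS on raw data is biased upward, because x is correlated
# with the omitted fixed effects.
beta_pooled = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
print(round(beta_pooled, 2))
```

The gap between the two estimates illustrates why the study restricts variation to within individuals over time.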

Conclusions, Expected Outcomes or Findings
The study expects to find a link between student socioeconomic composition and teacher mobility, with schools with higher proportions of students with the right to Swedish language education and lower parental education levels experiencing higher turnover rates. A general positive trend of increasing inequality in teacher turnover between schools is also expected. It is more difficult to speculate about the effects across teacher characteristics, as research is mixed, highlighting the need for this study to shed more light on the issue (Glassow, 2023). The study will provide valuable empirical evidence regarding dimensions of inequality which are often overlooked: first, the fact that the working conditions of teachers may be becoming more unequal across job settings, and second, how this affects school functioning and cohesion from an organizational perspective.
References
Allensworth, E., Ponisciak, S., and Mazzeo, C. (2009). The schools teachers leave: teacher mobility in Chicago public schools. Consortium on Chicago School Research, 1-52.  
Bacolod, M. (2007). Who teaches and where they choose to teach: college graduates of the 1990s. Educational Evaluation and Policy Analysis, 29, 155-168.
Ball, S. (2016). Neoliberal education? Confronting the slouching beast. Policy Futures in Education, 14, 1046–1059.
Bonesrønning, H., Falch, T., & Strøm, B. (2005). Teacher sorting, teacher quality, and student composition. European Economic Review, 49, 457-483.
Craig, C. (2017). International teacher attrition: multiperspective views. Teachers and Teaching, 23, 859-862.
Croissant, Y., & Millo, G. (2008). Panel data econometrics in R: The plm package. Journal of Statistical Software, 27, 1–43.
Dawis, R. V. (2004). Job satisfaction. In J. C. Thomas (Ed.), Comprehensive handbook of psychological assessment, Vol. 4. Industrial and organizational assessment (pp. 470–481). John Wiley & Sons, Inc.
Feng, L. (2009). Opportunity wages, classroom characteristics, and teacher mobility. Southern Economic Journal, 75, 1165-1190.
Glassow, L. (2023). Teacher turnover and performance-based school accountability: a global issue? Journal of Education Policy, forthcoming.
Hanushek, E. A., Kain, J. F., & Rivkin, S. G. (2004). Why Public Schools Lose Teachers. The Journal of Human Resources, 39, 326-354.
Jackson, C.K. (2009). Student demographics, teacher sorting, and teacher quality: Evidence from the end of school desegregation. Journal of Labour Economics, 27, 213-256.
Jackson, C.K. (2018). What do test scores miss? The importance of teacher effects on non test score outcomes. Journal of Political Economy, 126, 2072-2107.
Karbownik, K. (2020). The effects of student composition on teacher turnover: evidence from an admission reform. Economics of Education Review, 75.
Vagi, R., & Pivovarova, M. (2017). "Theorizing teacher mobility": a critical review of literature. Teachers and Teaching, 23, 781-793.
Yang Hansen, K., & Gustafsson, J.E. (2016). Causes of educational segregation in Sweden – school choice or residential segregation. Educational Research and Evaluation, 22, 23-44.


09. Assessment, Evaluation, Testing and Measurement
Paper

Teachers' Job Satisfaction: Understanding the Links Between Teacher Characteristics, Sense of Workload and Job Satisfaction

Mari Lindström, Stefan Johansson, Linda Borger

Gothenburg University, Sweden

Presenting Author: Lindström, Mari

Whether or not teacher training equips teachers with the professional knowledge and competence they need to deliver high-quality teaching has been an important area of debate in recent decades (Darling-Hammond, 2016). Research has shown that teachers develop their knowledge, competence, and skills through teacher education and subject-specific specializations during teacher training (Coenen et al., 2018; Hill et al., 2019) as well as through years of teaching experience (Coenen et al., 2018) and professional development (Hill et al., 2019). There is evidence, too, that teachers play a key role in influencing student learning and achievement (e.g., Coenen et al., 2018). However, even with the best formal qualifications, the working environment and teachers’ job satisfaction can affect how teachers exercise their competence in classrooms (Collie et al., 2012). Indeed, teachers’ professional competence is recognized as a multi-dimensional construct consisting of a broad range of cognitive and affective teacher characteristics that interact with teacher work (Blömeke, 2017). For this reason, the importance of the working environment and working conditions cannot be overlooked, as previous research has shown that teachers’ workload affects their job satisfaction (Toropova et al., 2021). Teachers’ job satisfaction, in turn, is suggested to influence teacher instruction and the learning support offered to students (Klusmann et al., 2008). In addition, teachers’ working environment, in terms of greater classroom autonomy and fewer disciplinary problems (Nguyen et al., 2020), as well as the attractiveness of the teaching profession, are suggested to be factors influencing whether teachers remain in the profession (Viac & Fraser, 2020). Furthermore, research shows that school socio-economic status (SES) is associated with teachers’ working conditions and well-being.
Teachers working in schools with lower socio-economic status report not only a higher mental workload but also poorer well-being (Virtanen et al., 2007). Considering that working conditions and job satisfaction are associated with important teacher and student outcomes, more studies examining relationships in this area are needed.

Many European countries struggle with changes in recruitment to the teaching profession, the declining status of the profession, and increasing teacher turnover (e.g., Skaalvik & Skaalvik, 2011). In Sweden, these issues are perhaps particularly pertinent (Alatalo et al., 2021; Holmlund et al., 2020), since Sweden also faces increasing school segregation and widening achievement gaps (Yang Hansen & Gustafsson, 2016). Against this background, the present study aims to investigate factors related to teachers’ workload and job satisfaction in Swedish compulsory schools. Our theoretical point of departure is Blömeke’s (2017) modelling of teachers’ professional competence: teacher competence is modelled as a multi-dimensional construct in which all teacher resources interact to meet the demands and challenges of the classroom. We investigate the relationships between different teacher characteristics, working conditions, and teachers’ sense of job satisfaction. We hypothesize that teachers with more experience and a subject-specific specialization in mathematics have higher job satisfaction: more time in the profession may have helped teachers develop coping strategies, and more specialized teachers are likely to work with the subject and grade they are trained for, which in turn could reduce workload and increase the sense of job satisfaction. Moreover, we hypothesize that schools’ socio-economic composition is associated with teachers’ sense of workload and job satisfaction. More specifically, our research questions are:

1.) To what extent are teachers’ characteristics related to their working conditions and job satisfaction?

2.) Do teachers’ sense of workload and job satisfaction vary depending on students’ socio-economic background?


Methodology, Methods, Research Instruments or Sources Used
The current study is based on data from Sweden’s participation in the most recent Trends in International Mathematics and Science Study (TIMSS 2019). TIMSS is organized by the International Association for the Evaluation of Educational Achievement (IEA) and assesses fourth- and eighth-grade students’ mathematics and science achievement on a four-year cycle. The data were retrieved from the official TIMSS website (http://timssandpirls.bc.edu), and we used the data from the background questionnaires administered to the fourth-grade teachers.

To answer our research questions, we selected information about teachers’ sense of their current workload and about teachers’ job satisfaction, indicated by items concerning their sense of being a teacher. Further, we selected information about teachers’ teaching experience and subject orientation during teacher training. TIMSS provides a detailed specification of the differences in subject specializations, and we used this information to construct a variable indicating higher and lower degrees of specialization for teaching mathematics in grade four. From students, we retrieved information about their socio-economic background, measured by the number of books at home. Further elaboration of the socio-economic background measure will be carried out with variables from both the student and caregiver questionnaires.

TIMSS has a hierarchical design with students nested in classrooms/teachers, and for this reason the study relied on multilevel regression to account for potential cluster effects due to the nature of the data (e.g., Hox, 2002). Sampling weights were used to account for the stratification. By means of confirmatory factor analysis (CFA), we modelled latent variables for teachers’ workload and job satisfaction. The latter was used as the outcome variable in a multilevel structural equation model to investigate its relationships with workload, teaching experience, and subject specialization(s). From student background information, we constructed a variable intended to capture school segregation, which was used to measure differences in teacher workload and job satisfaction between classrooms/schools. The next step is to conduct a similar analysis for grade 8 to compare how the results differ between grades; these analyses will be carried out during spring. Data were analyzed with SPSS 29 and Mplus version 8.
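The case for multilevel modelling with clustered data of this kind is usually made via the intraclass correlation, the share of outcome variance lying between classrooms; a minimal stdlib-Python sketch with made-up scores (the study itself relies on Mplus):

```python
from statistics import mean

def icc1(groups):
    """One-way ANOVA intraclass correlation for balanced clusters:
    the share of outcome variance attributable to the cluster level."""
    k = len(groups[0])                       # common cluster size
    grand = mean(x for g in groups for x in g)
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (len(groups) - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (len(groups) * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Made-up classrooms with large between-class differences: the ICC is
# close to 1, signalling that single-level regression would be misleading.
icc = icc1([[10, 11, 9], [20, 21, 19], [30, 29, 31]])
```

A non-trivial ICC means observations within a classroom are not independent, which is precisely what the multilevel model corrects for.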

Conclusions, Expected Outcomes or Findings
The initial results demonstrate that teachers’ sense of lower workload has a significant relationship with teachers’ sense of better job satisfaction in grade 4 (b = .33 (.10), p = .001). The workload indicators (e.g., too much material to cover, too many hours, too many administrative tasks, and the need for more time to prepare and to assist students) showed large variability among teachers, and the results suggest that more experienced teachers experience a higher level of workload (b = -.27 (.08), p = .001). However, no significant relationship between experience and job satisfaction was found. Having a specialization in mathematics and science teaching, in turn, has a significant positive relationship with teachers’ sense of lower workload (b = .26 (.08), p = .01), but no significant relationship with job satisfaction. When adding school SES as a control variable, the relationships between experience, specialization, and workload change only slightly, suggesting that the socio-economic status of the school does not alter these relationships to any greater extent. The results indicate that the relationship between workload and job satisfaction is the same for teachers regardless of the school’s SES. In the next step, we aim to shed light on differences across grades by means of data from grade 8. We expect to see some differences due to the different working conditions of grade 4 and grade 8 teachers. For example, in Sweden, teachers in grade 8 assign grades, unlike teachers in grade 4, and this might be one factor that increases teacher workload.

There are several limitations to this study. First, causal relationships cannot be supported due to the cross-sectional study design. Second, the study is threatened by single-source bias, as it relies on teachers’ self-reported questionnaire answers.

References
Alatalo, T., Hansson, Å., & Johansson, S. (2021). Teachers' academic achievement: evidence from Swedish longitudinal register data. European journal of teacher education, ahead-of-print(ahead-of-print), 1-21. https://doi.org/10.1080/02619768.2021.1962281
Blömeke, S. (2017). Modelling teachers' professional competence as a multi-dimensional construct. In Pedagogical Knowledge and the Changing Nature of the Teaching Profession (p. 119-135). OECD Publishing. https://doi.org/10.1787/9789264270695-7-en
Coenen, J., Cornelisz, I., Groot, W., Maassen van den Brink, H., & Van Klaveren, C. (2018). Teacher characteristics and their effects on student test scores: a systematic review. Journal of economic surveys, 32(3), 848-877. https://doi.org/10.1111/joes.12210
Collie, R. J., Shapka, J. D., & Perry, N. E. (2012). School Climate and Social-Emotional Learning: Predicting Teacher Stress, Job Satisfaction, and Teaching Efficacy. Journal of Educational Psychology, 104(4), 1189-1204. https://doi.org/10.1037/a0029356
Darling-Hammond, L. (2016). Research on Teaching and Teacher Education and Its Influences on Policy and Practice. Educational researcher, 45(2), 83-91. https://doi.org/10.3102/0013189X16639597
Hill, H. C., Charalambous, C. Y., & Chin, M. J. (2019). Teacher Characteristics and Student Learning in Mathematics: A Comprehensive Assessment. Educational policy (Los Altos, Calif.), 33(7), 1103-1134. https://doi.org/10.1177/0895904818755468
Holmlund, H., Sjögren, A., & Öckert, B. (2020). Jämlikhet i möjligheter och utfall i den svenska skolan (Rapport 2020:7), [Equality in opportunities and outcomes in the Swedish school]. Institutet för Arbetsmarknads- och Utbildningspolitisk Utvärdering.
Hox, J. (2002). Multilevel Analysis: Techniques and Applications. Taylor and Francis. https://doi.org/10.4324/9781410604118
Klusmann, U., Kunter, M., Trautwein, U., Lüdtke, O., & Baumert, J. (2008). Teachers' Occupational Well-Being and Quality of Instruction: The Important Role of Self-Regulatory Patterns. Journal of Educational Psychology, 100(3), 702-715. https://doi.org/10.1037/0022-0663.100.3.702
Nguyen, T. D., Pham, L. D., Crouch, M., & Springer, M. G. (2020). The correlates of teacher turnover: An updated and expanded Meta-analysis of the literature. Educational research review, 31, 100355. https://doi.org/10.1016/j.edurev.2020.100355
Skaalvik, E. M., & Skaalvik, S. (2011). Teacher job satisfaction and motivation to leave the teaching profession: Relations with school context, feeling of belonging, and emotional exhaustion. Teaching and Teacher Education, 27(6), 1029-1038. https://doi.org/10.1016/j.tate.2011.04.001
Toropova, A., Myrberg, E., & Johansson, S. (2021). Teacher job satisfaction: the importance of school working conditions and teacher characteristics. Educational review, 73(1), 71-97. https://doi.org/10.1080/00131911.2019.1705247
Virtanen, M., Kivimäki, M., Elovainio, M., Linna, A., Pentti, J., & Vahtera, J. (2007). Neighbourhood socioeconomic status, health and working conditions of school teachers. Journal of Epidemiology and Community Health, 61(4), 326-330. https://doi.org/10.1136/jech.2006.052878
Yang Hansen, K., & Gustafsson, J.-E. (2016). Causes of educational segregation in Sweden - school choice or residential segregation. Educational research and evaluation, 22(1-2), 23-44. https://doi.org/10.1080/13803611.2016.1178589


09. Assessment, Evaluation, Testing and Measurement
Paper

Teacher Beliefs on the Nature of Mathematics: Do These Affect Students’ Motivation and Enjoyment of Mathematics Across Different European Countries

Xin Liu1, Jelena Radišić1, Kajsa Yang Hansen2,3, Nils Buchholtz4, Hege Kaarstein1

1University of Oslo; 2University West; 3University of Gothenburg; 4University of Hamburg

Presenting Author: Liu, Xin

Mathematics competence plays a crucial role in solving problems, developing analytical skills, and providing the essential foundation to build knowledge in understanding the content of other school subjects. However, international large-scale assessment (ILSA) studies have pointed to significant cross-country variation in students’ mathematics competency levels and their motivation to learn mathematics (Mullis et al., 2020).

Students’ motivation is seen as the driving force behind their learning of mathematics over time (Wigfield et al., 2016). This is coupled with more recent ideas on the need to support strong mathematics self-efficacy (Parker et al., 2014) and positive academic emotions. Expectancy-value theory points out that achievement-related choices are motivated by a combination of students’ expectations for success and task value in particular domains (Eccles & Wigfield, 2020). Control-value theory focuses on the emotions students experience while involved in an achievement activity, such as the emotions that arise from succeeding or failing in that activity (Pekrun et al., 2017). Indeed, empirical evidence speaks in favour of a significant relationship between the beliefs teachers hold about their students’ learning, about teaching a particular subject, or about its nature, and student motivation and academic achievement (Muis & Foy, 2010). Regarding the influence of teachers’ beliefs, research has found that students report gender-stereotyped teacher ability expectations, particularly in mathematics (e.g., Dickhauser & Meyer, 2006; Lazarides & Watt, 2015). Meanwhile, motivation, such as the motivation to learn mathematics, differs across domains and genders (Eccles & Wigfield, 2020).

The positive relationship between motivation and achievement in mathematics has been confirmed (e.g., Garon-Carrier et al., 2016), yet different theoretical perspectives have led to diverse ways of capturing motivation, and thus to different strengths and directions of the relationship (Pipa et al., 2017). Our review of the literature found that, while many studies have measured and incorporated motivation, the nature of the relationship between teacher beliefs and motivation, for example, whether gender mediates this relationship, remains unclear. This is particularly important in the context of motivation and its development, given that motivation is also seen as an essential outcome of learning. The present study investigates the relationship between teachers’ beliefs about the nature of mathematics and different aspects of students’ motivation, following expectancy-value theory (Eccles & Wigfield, 2020) and control-value theory for enjoyment of mathematics (Pekrun et al., 2017), with a focus on gender differences in motivational patterns. Building upon this conceptual framework and research objective, we focus on the following research questions: (a) Do teachers’ beliefs about the nature and learning of mathematics affect students’ motivation and enjoyment, taking into account students’ mathematics achievement and classroom composition? (b) Are these mechanisms different for boys and girls?


Methodology, Methods, Research Instruments or Sources Used
Data were collected from 3rd and 4th-grade mathematics teachers and their students across six European countries (i.e., Norway, Finland, Sweden, Portugal, Estonia, and Serbia). The scale used to capture teachers’ beliefs about the nature of mathematics was adapted from the Teacher Education and Development Study in Mathematics (TEDS-M; Laschke & Blömeke, 2014). Students’ answers were collected with the Expectancy-Value Scale (Peixoto et al., 2022), using the subscales of intrinsic value, utility, and perceived competence, while enjoyment was captured with a subscale from the Achievement Emotions Questionnaire-Elementary School (AEQ-ES; Lichtenfeld et al., 2012). Mathematics achievement was measured by a test covering major curricular topics, developed using established TIMSS items (Approval IEA-22-022). A joint mathematics competence scale was established across grades via overlapping items in the grade-specific tests. Mplus was used for the statistical analyses (Muthén & Muthén, 1998–2017). Missing data were handled using FIML. In all analyses, we used the robust maximum likelihood estimator (MLR). Confirmatory factor analysis (CFA) was applied to examine the measurement properties of the latent constructs and to test measurement invariance across the six countries. We specified two-level random slope structural equation models, with the classroom as the between level. The model treated the within-classroom slope and intercept for the regression of students’ motivation and enjoyment on gender (0 = girl, 1 = boy) as random coefficients and estimated the mean and variance of these slopes and intercepts. A separate analysis was conducted for each of the six educational systems. We refer to the estimated slope coefficients as Slope_Enjoy, Slope_Intrinsic, Slope_PC, and Slope_Utility. A significant slope mean implies an average within-classroom effect of gender, while a significant slope variance shows that the effect of gender varies between classrooms.

In the next step, we examined whether the student motivation and enjoyment–gender slopes can be explained by classroom teachers’ beliefs about the nature of mathematics (i.e., mathematics as a set of rules or as a process of inquiry), taking into account student mathematics achievement and classroom composition. This is, therefore, an investigation of a cross-level interaction, i.e., whether classroom teachers’ beliefs about the nature of mathematics moderate the within-classroom relationship between gender and students’ motivation and enjoyment of mathematics learning. The models also included the regression of classroom mathematics achievement on classroom composition (i.e., % low-SES students and % students with behavioural problems).
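The logic of the random-slope setup can be sketched by computing, per classroom, the within-classroom gender gap that the model treats as a random coefficient. The data, coefficients, and the simple mean-difference estimator below are all hypothetical (the study estimates the full model in Mplus):

```python
import random
from statistics import mean

random.seed(1)

def gender_slope(pupils):
    """OLS slope of motivation on a binary gender indicator
    (0 = girl, 1 = boy): the boys' mean minus the girls' mean."""
    boys = [m for g, m in pupils if g == 1]
    girls = [m for g, m in pupils if g == 0]
    return mean(boys) - mean(girls)

# Hypothetical classrooms in which the gender gap grows with the
# teacher's inquiry belief -- the cross-level interaction of interest.
classrooms = []
for belief in [0.0, 0.5, 1.0]:
    true_slope = -0.3 + 0.4 * belief
    pupils = [(g, 3.0 + true_slope * g + random.gauss(0, 0.1))
              for g in [0, 1] * 15]
    classrooms.append((belief, gender_slope(pupils)))
```

If the per-classroom slopes vary systematically with the teacher-level belief, as they do in this contrived example, that is the pattern a significant cross-level interaction captures.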

Conclusions, Expected Outcomes or Findings
Metric invariance across countries and grades was confirmed for the motivation dimensions (i.e., intrinsic value, utility, and perceived competence), enjoyment, and teacher beliefs about the nature of mathematics (mathematics as a set of rules or as a process of inquiry). The estimates of the variance and mean of the slopes tended to be small and, in most cases, non-significant. The variance of Slope_Intrinsic is significant in five countries (excl. Finland), and that of Slope_Enjoy is significant in four countries (excl. Estonia and Serbia). The variance of Slope_PC is significant in Portugal, while Norway and Sweden have a significant variance of Slope_Utility. Significant (negative) slope regressions were found only in Portugal, for Slope_PC on Inquiry and Slope_PC on Rules. The results showed that the effect of inquiry and rules beliefs on students’ perceived competence was gender-specific and higher for girls in Portugal: where teachers’ beliefs about the nature of mathematics were stronger, girls reported higher perceived competence in mathematics. The mean of Slope_PC was significant in all six countries. This pattern may reflect gender differences within the classroom, with girls perceiving themselves as less competent in mastering mathematics.

Observations from the student-level models indicate that students’ intrinsic value and perceived competence positively relate to their enjoyment of mathematics in all six countries. The positive relation between utility and enjoyment was confirmed in Finland, Norway, Serbia, and Sweden. At the classroom level, boys were more externally motivated (i.e., higher utility value) to learn mathematics in classrooms composed of students from socioeconomically disadvantaged families in Norway. Girls’ intrinsic value was higher in Norwegian and Swedish classrooms with more students with behavioural problems.

References
Dickhauser, O., & Meyer, W. (2006). Gender differences in young children’s math ability attributions. Psychology Science, 48(1), 3–16.
Eccles, J. S., & Wigfield, A. (2020). From expectancy-value theory to situated expectancy-value theory: A developmental, social cognitive, and sociocultural perspective on motivation. Contemporary Educational Psychology, 61.
Garon-Carrier, G., Boivin, M., Guay, F., Kovas, Y., Dionne, G., et al. (2016). Intrinsic Motivation and Achievement in Mathematics in Elementary School: A Longitudinal Investigation of Their Association. Child Development, 87, 165–175.
Laschke, C., & Blömeke, S. (2014). Teacher Education and Development Study: Learning to Teach Mathematics (TEDS-M 2008). Dokumentation der Erhebungsinstrumente. Waxmann Verlag.
Lazarides, R., Rubach, C., & Ittel, A. (2017). Adolescents’ Perceptions of Socializers’ Beliefs, Career-Related Conversations, and Motivation in Mathematics. Developmental Psychology, 53(3), 525-539.
Lichtenfeld, S., Pekrun, R., Stupnisky, R.H., Reiss, K., & Murayama, K. (2012). Measuring students’ emotions in the early years: The Achievement Emotions Questionnaire-Elementary School (AEQ-ES). Learning and Individual Differences 22, 190-201.
Muis, K. R., & Foy, M. J. (2010). The effects of teachers’ beliefs on elementary students’ beliefs, motivation, and achievement in mathematics. In L. D. Bendixen & F. C. Feucht (Eds.), Personal epistemology in the classroom: Theory, research, and implications for practice (pp. 435–469). Cambridge University Press.  
Mullis, I. V. S., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 International Results in Mathematics and Science. Boston College, TIMSS & PIRLS International Study Center.
Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user’s guide: Statistical analysis with latent variables (8th ed.). Muthén & Muthén.
Parker, P. D., Marsh, H. W., Ciarrochi, J., Marshall, S., & Abduljabbar, A. S. (2014). Juxtaposing math self-efficacy and self-concept as predictors of long-term achievement outcomes. Educational Psychology, 34(1), 29-48.
Peixoto, F., Radišić, J., Krstić, K., Hansen, K. Y., Laine, A., Baucal, A., Sõrmus, M., & Mata, L. (2022). Contribution to the Validation of the Expectancy-Value Scale for Primary School Students. Journal of Psychoeducational Assessment. https://doi.org/10.1177/07342829221144868
Pekrun, R., Lichtenfeld, S., Marsh, H. W., Murayama, K., & Goetz, T. (2017). Achievement emotions and academic performance: Longitudinal models of reciprocal effects. Child Development, 88(5), 1653-1670.
Pipa, J., Peixoto, F., Mata, L., Monteiro, V., & Sanches, C. (2017). The Goal Orientations Scale (GOS): Validation for Portuguese students. European Journal of Developmental Psychology, 14(4), 477-488.
Wigfield, A., Tonks, S., & Klauda, S. L. (2016). Expectancy-value theory. In K. R. Wentzel & A. Wigfield (Eds.), Handbook on motivation in school (2nd ed., pp. 55–76). Routledge.
 
5:15pm - 6:45pm 09 SES 13 A JS: Advancing Assessment Tools and Strategies in Subject-Specific Contexts
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Serafina Pastore
Joint Paper Session NW09 and NW 27
 
09. Assessment, Evaluation, Testing and Measurement
Paper

Construction and Validation of a Reading Literacy Test for English Language Learners in Kazakhstan

Aliya Olzhayeva

Nazarbayev University, Kazakhstan

Presenting Author: Olzhayeva, Aliya

Standardized testing is a manifestation of the neoliberal agenda and human capital theory (Rizvi & Lingard, 2010). Testing is perceived as one of the instruments to hold teachers and schools accountable for students’ performance, which can lead to either rewards or sanctions. Standardized testing is one of the means of implicit control and governance that allows policymakers and politicians to audit the education system (Graham & Neu, 2004). Critics of standardized testing argue that it widens the gap between different groups of the student population (Au, 2016), encourages teachers to teach to the test and ignore the unassessed curriculum content and other subjects (Lingard, 2011; Koretz, 2017; Bach, 2020), and facilitates the practice of gaming the system to illustrate growth in student performance (Rezai-Rashti & Segeren, 2020; Heilig & Darling-Hammond, 2008). Despite this severe criticism, standardized testing can still be used as an effective tool to inform teaching and learning. Testing can help curriculum designers, test developers, teachers, and educators identify students’ needs and tailor instruction to those needs (Hamilton et al., 2002; Brown, 2013; Singh et al., 2015). It can also allow policymakers to evaluate the success and efficacy of the education system and identify potential issues to be addressed (Campbell & Levin, 2009).

The purpose of the study is to construct and validate reading assessments that account for the local contextual factors such as curriculum standards and expectations and that could provide formative information to students and teachers. The current research study includes several stages: pre-pilot study, pilot study, and main studies. In this abstract, the results of the pilot study will be presented.

The study aims to answer the following research questions:

What are the students' perceptions of the proposed testing instrument?

What are the psychometric properties of the pilot test?

The theoretical framework that guides my research is evidence-centered design (ECD; Mislevy & Riconscente, 2006). ECD employs the concept of layers, where each layer possesses its own characteristics and processes. The goal of domain analysis is to collect substantive information about the target domain and to determine the knowledge, skills, and abilities about which assessment claims will be made. Domain modelling organizes the results of domain analysis to articulate an assessment argument that links observations of student actions to inferences about what they know or can do. Design patterns in domain modelling are arguments that enable assessment specialists and domain experts to gather evidence about student knowledge (Mislevy & Haertel, 2006). The third layer, the conceptual assessment framework (CAF), provides the internals and details of operational assessments; its structure is expressed as variables, task schemas, and scoring mechanisms. This layer generates a blueprint for the intended assessment and gives it concrete shape. Assessment implementation constructs and prepares all operational elements specified in the CAF: authoring tasks, finalizing scoring rubrics, establishing parameters in measurement models, and the like. The assessment delivery layer is where students engage with the assessment tasks, their performances are evaluated and measured, and feedback and reports are produced. Thus, ECD provides an essential framework for approaching test design, scoring, and analysis. In my study, the ECD framework acts as guidance to ensure that each layer is constructed and relevant evidence is accumulated.


Methodology, Methods, Research Instruments or Sources Used
The instrument of the proposed research is designed to assess students’ reading literacy in English. A number of standardized tests that measure reading literacy were reviewed. The main criteria for selecting tests were: 1) Anglophone tests; 2) availability of an online test for public use; 3) standardized reading literacy tests; 4) tests for secondary and high school students; 5) grade-appropriate language and cognitive difficulty levels of the reading passages and test items; 6) a sufficient number of test items; 7) tests that have been used with large populations. Texts that displayed cultural bias or other features that might negatively impact test validity and reliability were not selected.
Since it is important to ensure alignment between the assessment instrument and the curriculum, subject experts were involved in the present study. I also used one element of Webb’s (1997) alignment model, the depth-of-knowledge (DOK) criteria, which comprise four levels of cognitive complexity: recall of information, basic reasoning, complex reasoning, and extended reasoning.
First, experts matched DOK levels and curriculum objectives with the test items. Before coding the full set of test items independently, the subject experts coded five to ten items and then compared the DOK levels they had assigned to these items and the corresponding learning objectives (Webb, 2002). After this stage, experts identified one or two curriculum objectives corresponding to each test item. Experts were not required to reach a unanimous decision about the correspondence between items and objectives. Teachers’ feedback would help to eliminate items that (1) do not map onto the curriculum, or (2) might be ambiguous or confusing for students. Expert judgement may also help to identify items with construct-irrelevant sources of difficulty or items that might be far too easy for lower-ability students (AERA, 2014).
Piloting assessment items is one of the ways to ensure test validity. Standard 3.3 of AERA (2014) states that analyses carried out in pilot testing should identify the aspects of the test design, content, and format that might distort the interpretations of the test scores for the intended population. In the current study, pre-test items were piloted with 11th grade students in one of the target schools.
After test piloting, retrospective probing was conducted. The main goal of retrospective probing is to examine participants’ understanding of the tasks or questions (Leighton, 2017).
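Inter-rater agreement in expert coding exercises such as the DOK matching above is often summarized with Cohen’s kappa; a stdlib-Python sketch with hypothetical codes (the abstract describes the Webb comparison procedure, not a specific agreement statistic):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category labels."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(ca[c] * cb[c] for c in ca) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical DOK codes (1 = recall ... 4 = extended reasoning), 8 items.
kappa = cohens_kappa([1, 2, 2, 3, 4, 1, 2, 3],
                     [1, 2, 3, 3, 4, 1, 2, 2])
```

Values well above zero indicate agreement beyond chance; disagreements flag items whose DOK assignment should be discussed, as in the comparison stage described above.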

Conclusions, Expected Outcomes or Findings
Five Grade 11 students (three female, two male) were interviewed about their perceptions of the reading test. Overall, students made recommendations regarding some of the questions and distractors. For instance, students pointed out unclear distractors in some questions, and some argued that one of the questions had two possible correct options. The questions identified as problematic or confusing for students were reviewed and revised accordingly.
The reading literacy test comprised 32 multiple-choice questions (31 items were dichotomous, while one item was scored for partial credit: 0, 1, 2). The test was administered to 69 Grade 11 students at a pilot school site. The mean score was 17.23 (SD = 5.71); the minimum score was 5 and the maximum 29.
Cronbach’s alpha was estimated at .79, indicating an acceptable level of test reliability (DeVellis, 2017). However, the point-biserial correlations showed that some items had low discrimination, even though all items exhibited positive values, suggesting that every item taps the reading construct.
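Both statistics reported here are straightforward to compute in base R. The sketch below uses a simulated 0/1 response matrix of the same size as the pilot sample (69 persons, 32 items) purely for illustration; it does not reproduce the study’s data.

```r
# Illustrative internal-consistency and discrimination check, mirroring the
# reported analyses (Cronbach's alpha, point-biserial correlations).
set.seed(42)
n_persons <- 69; n_items <- 32
theta <- rnorm(n_persons)                  # simulated person abilities
beta  <- rnorm(n_items, sd = 0.9)          # simulated item difficulties
p     <- plogis(outer(theta, beta, "-"))   # Rasch response probabilities
resp  <- matrix(rbinom(length(p), 1, p), n_persons, n_items)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)
total <- rowSums(resp)
alpha <- n_items / (n_items - 1) *
  (1 - sum(apply(resp, 2, var)) / var(total))

# Corrected point-biserial: correlation of each item with the rest score
pbis <- sapply(seq_len(n_items), function(j) cor(resp[, j], total - resp[, j]))

round(alpha, 2)
summary(round(pbis, 2))
```

Using the rest score (total minus the item itself) avoids the slight inflation that comes from correlating an item with a total that already contains it.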
Test items were analyzed with the Rasch model using the TAM package (Robitzsch et al., 2021) and marginal maximum likelihood (MML) estimation (Bock & Aitkin, 1981). As one item involved partial scoring, Masters’ (1982) partial credit Rasch model was used. With the mean of the latent distribution constrained to zero, the mean item difficulty estimate was -0.09 (SD = 0.87).
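The study used TAM’s MML estimation; purely as a package-free sketch of what Rasch item difficulties look like, the classical logit-of-proportion (PROX-style) approximation below recovers difficulties from proportions correct on simulated data and centres them at zero. This is a rough stand-in for, not a replication of, the MML estimates.

```r
# PROX-style approximation of Rasch item difficulties (illustration only).
set.seed(7)
theta <- rnorm(200)                                 # simulated abilities
beta  <- c(-1.5, -0.5, 0, 0.5, 1.5)                 # true difficulties
resp  <- sapply(beta, function(b) rbinom(200, 1, plogis(theta - b)))

p_correct <- colMeans(resp)       # proportion correct per item
d_hat <- -qlogis(p_correct)       # logit difficulties: easy items negative
d_hat <- d_hat - mean(d_hat)      # centre the difficulty scale at zero

round(d_hat, 2)  # should broadly follow the ordering of the true difficulties
```

Because logits of marginal proportions ignore the spread of abilities, these values are attenuated relative to true MML estimates, but the ordering and zero-centred scale match the logic of the reported analysis.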
Item fit analysis revealed several problematic items that should be reviewed prior to testing with a larger population of students.
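Item fit in Rasch analysis is commonly screened with infit and outfit mean-squares, i.e. (weighted) averages of squared standardized residuals of observed responses against model probabilities, with values near 1 indicating good fit and roughly 0.7–1.3 a common rule of thumb. A minimal sketch on simulated data (not the study’s):

```r
# Infit/outfit mean-squares for three simulated Rasch items.
set.seed(11)
n <- 300
theta <- rnorm(n); beta <- c(-1, 0, 1)
p     <- plogis(outer(theta, beta, "-"))       # model probabilities
resp  <- matrix(rbinom(length(p), 1, p), n, length(beta))

res2 <- (resp - p)^2                           # squared residuals
w    <- p * (1 - p)                            # binomial variances
outfit <- colMeans(res2 / w)                   # unweighted mean-square
infit  <- colSums(res2) / colSums(w)           # information-weighted mean-square

round(outfit, 2); round(infit, 2)
```

Since the responses here are generated from the model itself, both statistics should land close to 1; misfitting real items show up as values well above (noisy) or below (overly deterministic) that benchmark.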

References
American Educational Research Association (2014). Standards for educational and psychological testing. American Educational Research Association.
Au, W. (2016). Meritocracy 2.0: High-stakes, standardized testing as a racial project of neoliberal multiculturalism. Educational Policy, 30(1), 39-62.
Bach, A. J. (2020). High-Stakes, standardized testing and emergent bilingual students in Texas. Texas Journal of Literacy Education, 8(1), 18-37. Retrieved September 30, 2021, from https://www.talejournal.com/index.php/TJLE/article/view/42
Brown, G. T. (2013). asTTle–A National Testing System for Formative Assessment: how the national testing policy ended up helping schools and teachers. In M. Lai & S. Kushner, A developmental and negotiated approach to school self-evaluation (pp. 39-56). Emerald Group Publishing Limited.
Campbell, C., & Levin, B. (2009). Using data to support educational improvement. Educational Assessment, Evaluation and Accountability, 21(1), 47-65.

DeVellis, R. F. (2017). Scale development. Theory and Applications (4th ed.). SAGE.
Hamilton, L. S., Stecher, B. M., & Klein, S. P. (2002). Introduction. In L.S. Hamilton, B.M. Stecher & S.P. Klein (Eds.), Making sense of test-based accountability in education (pp.1-12). RAND.
Heilig, J. V., & Darling-Hammond, L. (2008). Accountability Texas-style: The progress and learning of urban minority students in a high-stakes testing context. Educational Evaluation and Policy Analysis, 30(2), 75-110.
Koretz, D. (2017). The testing Charade: Pretending to make schools better. The University of Chicago Press.
Leighton, J.P. (2017). Using think-aloud interviews and cognitive labs in educational research. Oxford University Press.
Lingard, B. (2011). Policy as numbers: Ac/counting for educational research. The Australian Educational Researcher, 38(4), 355-382.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174. doi:10.1007/BF02296272
Mislevy, R. J., & Haertel, G. (2006). Implications of evidence-centered design for educational assessment. Educational Measurement: Issues and Practice, 25, 6–20.
Mislevy, R. J., & Riconscente, M. M. (2006). Evidence-centered assessment design: Layers, concepts, and terminology. In S. Downing & T. Haladyna (Eds.), Handbook of test development (pp. 61–90). Erlbaum.
Rezai-Rashti, G. M., & Segeren, A. (2020). The game of accountability: perspectives of urban school leaders on standardized testing in Ontario and British Columbia, Canada. International Journal of Leadership in Education, 1-18. doi.org/10.1080/13603124.2020.1808711
Robitzsch, A., Kiefer, T., & Wu, M. (2021). TAM: Test Analysis Modules. R package version 3.7-16. https://CRAN.R-project.org/package=TAM
Singh, P., Märtsin, M., & Glasswell, K. (2015). Dilemmatic spaces: High-stakes testing and the possibilities of collaborative knowledge work to generate learning innovations. Teachers and Teaching, 21(4), 379-399.


09. Assessment, Evaluation, Testing and Measurement
Paper

Developing A Linear Scaled Assessment-Tool For Mathematical Modelling In Chemistry

Benjamin Stöger, Claudia Nerdel

Technical University of Munich; Associate Professorship of Life Science Education

Presenting Author: Stöger, Benjamin

Model assumptions in the natural sciences rest on mathematical concepts and regularities. For this reason, mathematical modelling is central to understanding the development and validation of models in the natural sciences. The ability to evaluate, change and apply models in the service of gaining knowledge is understood as modelling competence. Modelling cycles divide the modelling process into individual steps and thus offer insight into it.

Blum & Leiß (2005) developed a framework for mathematical modelling. They distinguished between two main dimensions, "rest of the world" (which includes real-world problems, their structuring, mathematical description, and the interpretation and evaluation of mathematical results) and "mathematics". The translation between these dimensions is understood as mathematical modelling. Based on these dimensions, a seven-step modelling cycle was developed. Starting from a real situation/problem, the steps are to understand the situation (1), to simplify and structure it with a focus on the problem (2) followed by mathematisation (3), which results in the transition to the dimension of "mathematics". There, results are generated with mathematical methods (4) and translated back into the context and thus back into the dimension "rest of the world" with a focus on the problem (5). Now these results are validated in relation to the context (6) and an answer is given to the concrete problem (7).

Based on the cycle for mathematical modelling developed by Blum & Leiß (2005), various subject-specific modelling cycles have been derived. Goldhausen & Di Fuccia (2014) derived a mathematical modelling cycle for the subject of chemistry. For this purpose, an additional dimension, "chemistry", was added between "rest of the world" and "mathematics". This is necessary because a real chemical situation (e.g. a chemical experiment) must first be transferred into subject-specific models before the situation can be described and interpreted. The steps of the mathematical modelling cycle were adapted to the specific requirements of a chemical contextualisation. In the first step, a problem/experiment is identified on a macroscopic level and a situation model is created (1). This is then translated into a chemical model at the submicroscopic or symbolic level (Johnstone, 1991) (2). The chemical model is then mathematised (3), for which, according to Kimpel (2018), a deeper understanding of the model is necessary. With the developed mathematical model, mathematical results can be generated with the help of mathematical tools, as in Blum & Leiß (2005) (4). These results can then be translated back into the chemical model (5) and checked for their subject-specific usefulness (6) so that they can finally be applied to the experiment/problem (7).

As diagnostic models, modelling cycles offer the possibility of gaining insight into the complex cognitive processes of learners during modelling. In mathematics didactics, modelling cycles have been used to develop test instruments for measuring mathematical modelling ability (Haines, Crouch & Davis, 2001; Brand, 2014; Hankeln, Adamek & Greefrath, 2019). In all cases, the steps of a modelling cycle were divided into empirically based categories, and items were constructed for these categories. Prior to testing, various models were postulated for the items on the basis of empirical studies. With the help of Rasch measurement, the data were then compared with the postulated models.

Since this type of test development has so far only been carried out for mathematical modelling in general, this study investigated whether such a questionnaire can be used to assess learners’ mathematical modelling skills in chemistry.


Methodology, Methods, Research Instruments or Sources Used
A test instrument for mathematical modelling was developed on the basis of the modelling cycle by Goldhausen & Di Fuccia (2014) and the methodological approach of Brand (2014). For this purpose, the cycle was divided into five sections (A1-A5). Four sections (A1, A2, A4, A5) each describe the transition between the dimensions of the model (rest of the world, chemistry and mathematics). For these categories, 12 items each (including question and associated answer format) were constructed from different chemical subject areas and from different contexts in nature and technology. Category A3 focuses on answering mathematical questions and tasks from school mathematics of varying difficulty; twelve areas were developed for it, each containing three items of varying difficulty, for a total of 36 items. Each category focuses on a specific aspect of mathematical modelling ability.
For example, category A1 includes questions that focus on understanding and constructing a problem, or on structuring and simplifying problems or tasks. In addition, this category includes tasks in which relevant aspects of an issue have to be identified or suitable chemical models selected. Category A2 revolves around mathematising the selected model: selecting suitable mathematical formulae, describing mathematical relationships or developing mathematical formulae. The third category (A3) is about working mathematically; accordingly, mathematical concepts, working methods and solutions are applied here. In category four (A4), mathematical results have to be placed in the subject context, for example by identifying the unit of a mathematical result, assigning mathematical results to variables or classifying mathematical results within the subject context. The last category (A5) describes the interpretation of the result in light of the initial situation: checking a result for its meaningfulness, checking whether the result fits the model used, and formulating answer sentences.
All items in all categories have a closed answer format with five answer options each: one correct answer, two 'plausible' answers based on misconceptions, and two incorrect answers. The items were distributed across twelve test booklets, each containing three items per category (nine for A3). In order to obtain a linear scale, the coded data sets were evaluated using Rasch analyses (Boone et al., 2013). The person measures obtained for the individual categories served as the basis for a correlation analysis of the individual categories with the overall instrument.

Conclusions, Expected Outcomes or Findings
The data for validating the test instrument were collected by surveying students in the STEM fields of chemistry, mathematics, physics, biology and mechanical engineering; data collection will continue until the end of February 2023. N = 296 students have participated so far. On the basis of these data, a first analysis was carried out to identify an initial trend in the results. For this purpose, the response data were coded with respect to the distractors. With the help of the program Winsteps (Linacre, 2000), the data were examined using a Rasch analysis. In a first step, the quality of the items was analysed, i.e. how well the items fit. Mean-square fit values outside the reasonable range were found for only three of the 90 items. Within the individual categories, a misfit was calculated for a few further items (A1: none; A2: none; A3: 4 out of 36; A4: 1 out of 12; A5: 1 out of 12). Subsequently, the item reliabilities of the overall model and the individual categories were determined separately; these already showed values above 0.8 in almost all categories (A1-A5: 0.94; A1: 0.95; A2: 0.82; A3: 0.87; A4: 0.88; A5: 0.82). In addition, the students’ person abilities calculated in the Rasch analysis were used for a correlation analysis. Significant and highly significant pairwise correlations between the individual dimensions were found. Examples include the correlations of category A3 with categories A1, A2, A4 and A5 [A1 (r = .325**, p < .001, n = 159); A2 (r = .214**, p = .007, n = 159); A4 (r = .288**, p < .001, n = 140); A5 (r = .401**, p < .001, n = 137)]. This indicates that the individual dimensions capture the overall construct of mathematical modelling well.
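The pairwise correlations of person measures reported above correspond to standard Pearson tests. As a sketch only, with simulated person measures standing in for the study’s Winsteps estimates (the shared-ability construction and sample size are assumptions for illustration):

```r
# Illustrative pairwise correlation of Rasch person measures from two
# categories (e.g. A3 with A1); simulated data, not the study's.
set.seed(3)
n <- 159                               # pairwise n as reported for A3-A1
common <- rnorm(n)                     # shared modelling ability component
a1 <- common + rnorm(n, sd = 1.5)      # category-specific person measures
a3 <- common + rnorm(n, sd = 1.5)

ct <- cor.test(a1, a3)                 # Pearson r with significance test
round(unname(ct$estimate), 3)
ct$p.value < 0.05
```

With a moderate shared component, the expected correlation is in the same low-to-moderate range (~.3) as the reported coefficients, and at n = 159 such correlations are highly significant.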
References
Blum, W., & Leiß, D. (2005). Modellieren im Unterricht mit der „Tanken“-Aufgabe. Mathematik lehren, (128), 18-21.
Boone, W. J., Staver, J. R., & Yale, M. S. (2013). Rasch analysis in the human sciences. Springer Science & Business Media.
Brand, S. (2014). Erwerb von Modellierungskompetenzen: Empirischer Vergleich eines holistischen und eines atomistischen Ansatzes zur Förderung von Modellierungskompetenzen. Springer-Verlag.
Goldhausen, I., Di Fuccia, D.-S., (2014) ‚Mathematical Models in Chemistry Lessons’, Proceedings of the International Science Education Conference (ISEC) 2014, 25-27 November 2014, National Institute of Education, Singapore
Haines, C., Crouch, R., & Davis, J. (2001). Understanding students' modelling skills. In Modelling and mathematics education (pp. 366-380). Woodhead Publishing.
Hankeln, C., Adamek, C., & Greefrath, G. (2019). Assessing sub-competencies of mathematical modelling—Development of a new test instrument. In Lines of inquiry in mathematical modelling research in education (pp. 143-160). Springer, Cham.
Johnstone, A. H. (1991). Why is science difficult to learn? Things are seldom what they seem. Journal of Computer Assisted Learning, 7(2), 75-83.
Kimpel, L. (2018). Aufgaben in der Allgemeinen Chemie: zum Zusammenspiel von chemischem Verständnis und Rechenfähigkeit (Vol. 249). Logos Verlag Berlin GmbH.
Linacre, J. M., & Wright, B. D. (2000). Winsteps. URL: http://www.winsteps.com/index.htm [accessed 2013-06-27]


09. Assessment, Evaluation, Testing and Measurement
Paper

The Backwash Effect of Exam Preparation in IBDP English A and B Courses on Developing Real-life Skills.

Botagoz Issabekova, Aliya Baratova

Nazarbayev Intellectual School in Astana, Kazakhstan

Presenting Author: Issabekova, Botagoz; Baratova, Aliya

“Will we have it in our summative?” is quite a familiar phrase for English Language and Literature teachers (and not only them) in the International Baccalaureate Diploma Programme (DP), isn’t it? While we are proud of our students’ performance and diligence, we are painfully aware that their academic achievements may have come at a cost to both teaching (curriculum, teaching methods, delivery) and learning (curriculum, content, life skills). This influence is also evident in the literature, where Gates (1995) defines it as a “washback effect”, i.e., “the influence of testing on teaching and learning” (p. 102). Prodromou (1995), while strictly differentiating between ‘testing’ and ‘teaching’, suggests that backwash has a negative effect on teaching, which he believes it greatly complicates.

Although the external exams of the English A and B courses (Paper 1, Paper 2 and the Individual Oral) are hardly tests in the narrow sense, being full-fledged final written and spoken exams, we believe that the term “backwash effect” still applies to their influence on language and literature learning and teaching. Teachers have to cover a huge amount of material within the two years of the DP, while students have to process that material alongside their other subjects. So many exams and IB components, so little time. Unsurprisingly, teachers increasingly lean towards optimising the programme, which, to our concern, can lead to keeping only the essentials.

The question then arises of what is “essential” and what is not. Hughes (1989) proposes the following to ensure a positive backwash effect: the test should develop certain skills; test content should be varied and cover a wide spectrum of areas; the test should retain an element of unpredictability; and both teachers and students should understand the test procedure. In the same vein, Bailey (1996) puts forward the following aspects: meeting language learning objectives and requirements, authenticity of the tests and samples, ensuring learner autonomy and self-assessment, and providing feedback on the test results.

In our search for a balance between students’ academic achievements and their future non-academic lives, there is a sinking feeling in our teaching practice that we might be overlooking opportunities to develop their career and real-life aptitudes and skills. Although there is sufficient literature on the negative effects of backwash from testing, such as IELTS, high-stakes tests and standardized testing (Paker, 2013; Watkins, Dahlin & Ekholm, 2005), and on ways to turn it into a positive one, we seek to explore the backwash effect of written and spoken exams on language teaching and learning. We believe that the assignments covered in the Diploma Programme are challenging and intensive. They require time-management and analytical skills, as learners are asked to write a textual analysis of a given text type and a comparative essay based on at least two literary works studied in class (Language A). In Language B, learners have to produce a text type, which requires more than mere selection of the right answer from multiple choices. Despite these demanding final exam requirements, we are still concerned that there is more exam preparation in the classroom than skills development, which is, we believe, a prevailing phenomenon in education.

With that in mind, the following research questions have been addressed:

1) What are both benefits and drawbacks of predominance of the exam preparation in English class?

2) How can the classroom be modified to prepare students for life (not just exams)?


Methodology, Methods, Research Instruments or Sources Used
The given study is collaborative research by an English Language A and a Language B teacher, conducted as action research in the 2022-2023 academic year at Nazarbayev Intellectual School (an IB World School). The sample consists of two groups. The first group is an entire class of ten twelfth-grade students. These students were selected for several reasons, chiefly that they are currently exposed to the curriculum: they have not yet taken their exams and have the freshest memory of the classroom, so they can sincerely share their thoughts and feelings about their preparation and readiness for the upcoming exams. The second group (12 participants) is a mix of graduates from the 2020-2021 and 2021-2022 cohorts. They are currently pursuing their studies at universities and are a good source for investigating whether the skills they gained at school have been beneficial in their academic paths.
To address the research questions, a mixed research design was employed, since combining quantitative and qualitative data provides “a very powerful mix” (Miles & Huberman, 1994, p. 42). First, subject reports of two cohorts (2020-2021, 2021-2022) were analysed to compare students’ performance at the beginning of the 11th grade (MOCK exams) with their final exams. In addition, open-ended interviews capturing students’ reflections offered different perspectives on the research topic, providing “a complex structure of the situation” (Creswell, p. 537).
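The quantitative strand, comparing cohort performance at MOCK exams with final exams, amounts to a paired comparison. A minimal sketch in base R; the scores below (IB 1-7 scale) are invented for illustration and are not the cohorts’ data:

```r
# Hypothetical paired MOCK vs. final exam scores for ten students.
mock  <- c(4, 5, 5, 6, 4, 5, 6, 7, 5, 4)
final <- c(5, 5, 6, 6, 5, 6, 6, 7, 6, 5)

# Paired t-test on the gain from MOCK to final exam
tt <- t.test(final, mock, paired = TRUE)
mean(final - mock)   # average gain in IB points
tt$p.value
```

With small cohorts like these, a rank-based alternative (`wilcox.test(..., paired = TRUE)`) is a common robustness check, since IB grades are ordinal.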

Conclusions, Expected Outcomes or Findings
Having analysed the data, it has become evident that the so-called backwash effect on language learning and teaching has had a mostly positive effect on students’ learning. This can be seen from our school’s final results compared with world results: in 2022 our students scored on average 5.47 out of 7 in Language A, compared with the world average of 5.43. In Language B, our students’ average of 6.16 equalled the world average. It should be noted that English is a third language for our students (after Kazakh and Russian) and is taught in a non-English-speaking country.
Open-ended interviews showed students’ positive attitudes towards exam preparation: they “don’t feel threatened”, being accustomed to working under strict deadlines and time restrictions. This has inevitably developed their self-organisation skills (staying focused, being mindful, stress-resistant). Another positive aspect they mentioned was the ability to work with the broad range of text types they encounter in their academic lives at university. The skills learnt in English class (coding, decoding, analysing, evaluating authors’ choices) have also indirectly assisted them in their more extensive mid-term and final papers at university.
However, data from the open-ended interviews revealed that this backwash effect has had a negative effect on teachers’ teaching of the course. Students’ responses resonated with our concerns about a change in teaching methods: they noted that, even though such an approach is effective, it is monotonous and quite repetitive. This is a call for English DP teachers to vary the methods used in the classroom; even though students’ academic performance is high and they enter top-tier universities worldwide, teachers need to fulfil all facets of their profession.

References
Bailey, K. M. (1996). Working for washback: A review of the washback concept in language testing. Language testing, 13(3), 257-279.

Creswell, J. W., & Creswell, J. D. (2017). Research design: Qualitative, quantitative, and mixed methods approaches. Sage publications.

Gates, S. (1995). Exploiting washback from standardized tests. Language testing in Japan.

Hughes, A. (1989) Testing for Language Teachers. Second Edition. Cambridge University press.

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. sage.

Paker, T. (2013). The backwash effect of the test items in the achievement exams in preparatory classes. Procedia-Social and Behavioral Sciences, 70, 1463-1471.

Prodromou, L. (1995). The backwash effect: from testing to teaching. ELT Journal, 49(1), 13-25.

Watkins, D., Dahlin, B., & Ekholm, M. (2005). Awareness of the backwash effect of assessment: A phenomenographic study of the views of Hong Kong and Swedish lecturers. Instructional Science, 33, 283-309.


09. Assessment, Evaluation, Testing and Measurement
Paper

Modeling as a Tool for Formative Assessment in Biology Lessons.

Aigul Koishigarina, Gulnar Kashkinbayeva

Nazarbayev Intellectual school in Aktobe, Kazakhstan

Presenting Author: Kashkinbayeva, Gulnar

Modern education is inextricably linked with three main concepts: learning, teaching and assessment. The 21st-century teacher must equip students with solid knowledge that will help them in life. At present, the teacher directs and coordinates the work of students and contributes to the development of students’ independence, self-criticism and ability to find necessary, reliable information in a huge flow of knowledge. Thus, the student must be able to find, analyze and apply the necessary knowledge drawn from an enormous flow of information.

According to PISA research, “Kazakh schoolchildren have subject knowledge at the level of their reproduction or application in a familiar educational situation, but they have significant difficulties in applying knowledge in real life situations…” [1].

According to the results of the international PISA study for 2019, 9th grade students found it difficult to complete tasks on formulating scientific questions (11.6%) and tasks on the scientific interpretation of evidence (15.9%).

Taking the data of this international study into account, we conducted a comparative analysis of the biology curriculum and found that from the 7th to the 9th grade, the number of hours allotted to modeling various processes increases across certain topics and sections of the curriculum: for example, 5 hours are allotted to modeling in the 7th and 8th grades, but 6 hours in the 9th grade. Therefore, the teacher needs to develop skills in working with models and apply this pedagogical approach to the organization of the educational process starting from the 7th grade.

These results influenced the research and the search for solutions to the questions that arose.

The main research questions are:

-How to teach a student to master the relevant skills in life?

-How can subject knowledge be improved?

To answer these questions, the teacher needs to rethink teaching and consider new methods of assessing students’ knowledge.

The updated criteria-based assessment model helps participants in the educational process to understand and answer the questions “At what stage of learning is the student?” and “What does the student need to do to achieve the expected results?” [2]

The aim of the study is to show that the use of modeling in biology lessons helps students improve the quality of their knowledge and contributes to the development of functional literacy and creative thinking skills.

The presented work is the result of many years of experience of the authors, which has been tested in the educational process and supplemented in accordance with modern requirements. We sincerely hope that the methodological recommendations in the work will help teachers to increase the effectiveness of the educational process in order to educate competitive students.


Methodology, Methods, Research Instruments or Sources Used
The authors of personal development technology V.V. Davydov, V.I. Zvyagintsev, V.V. Kraevsky, I.Ya. Lerner, M.N. Skatkin, I.S. Yakimanskaya believe that it is necessary to pay attention not only to the knowledge of students, but also to their personal characteristics in the learning process.
The famous psychologist Jean Piaget argued that models help to consolidate acquired knowledge and apply it in practice.
In the works of the educators V.B. Filimonov, G.A. Korovkin, G.I. Patyako, M.A. Danilova, V.P. Strokova and V.V. Shtepenko, the differences in the concepts underlying students’ creative abilities are considered; without attention to these, and to participants’ emotional attitude and interest in the process being studied, the modeling technique cannot be implemented.
It should be noted that there is a relationship between STEM technology and modeling. When creating models, the student must draw on knowledge and skills not only from one subject but also from other subject areas. Such interdisciplinary interaction allows students to develop research skills, creativity and creative thinking, and contributes to the development of communication skills and teamwork. [4]
After studying the works of the above authors, we began our research on the topic. Two grade 8 classes (E and F) took part: class E served as the control group (the modeling technique was not used) and class F as the experimental group (the technique was used in lessons).
Note: the study spanned three years, so students who began in grade 8 had reached grade 10 by its completion.
Study passport: classes 8 E/F; number of students: 12.
Study start: 2019–2020 / Study completion time: 2021–2022.
Results of the study: in the experimental classes, the results of summative assessment for the quarter and for sections were about 10% higher than in the control group, where the modeling technique was not used. The difference between SAU and SAT indicates the objectivity of the assessment. The effectiveness of modeling can also be seen in the grade 10 exam results of the experimental group, where the pass rate in biology (the subject of choice) was 100%.
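A group comparison of this kind (experimental class roughly 10 percentage points above the control class on summative assessments) can be sketched as a two-sample test. The scores below are invented for illustration; given the small groups of 12, a rank-based test is assumed instead of a t-test:

```r
# Hypothetical summative assessment percentages for two classes of 12.
control      <- c(58, 62, 65, 70, 55, 68, 61, 64, 66, 59, 63, 67)
experimental <- c(71, 74, 78, 69, 81, 72, 76, 75, 80, 79, 73, 77)

# Wilcoxon rank-sum test: robust choice for small, possibly non-normal groups
wt <- wilcox.test(experimental, control)
mean(experimental) - mean(control)   # observed difference in mean percentage
wt$p.value
```

Reporting an effect size alongside the p-value (e.g. the mean difference above, or a rank-biserial correlation) would make the “+10%” claim easier to evaluate.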

Conclusions, Expected Outcomes or Findings
Every creative teacher should create favourable conditions for teaching students and provide opportunities to develop abilities that will help them in the future, in choosing a profession and in life in general.
In grades where this technique was used, students’ successful participation in olympiads and project activities, mostly at the republican and international levels, increased by 40–60%.
So, the proposed technique can be used at various stages of the lesson. Such lessons allow the teacher to:
1. develop skills of creative thinking, analysis, research and application of knowledge;
2. use modeling as an effective means of formative assessment;
3. help students develop key competencies: the ability to solve problems and manage processes independently;
4. involve all students in active learning;
5. improve the quality of learning;
6. prevent mechanical memorization and relieve stress before the perception of new material. [5]
Lessons using models become interesting and engaging, and help develop students' research skills and interest in the subject.[6]
Modeling in education is heuristic in nature and develops speech, memory and logic of thinking. [7] Teachers can use the technique to organize project work with students, as well as to conduct and organize elective courses and sections in order to develop creative thinking skills.
When working with models in the educational process, the following difficulties may arise: lack of time (time management) when building models at the initial stages of developing modeling skills, and assessment of models against criteria (originality/completeness/deliberation). [8]
The authors of the paper hope that this study will help school teachers develop students’ creative thinking skills and contribute to the education of competitive students.

References
1. Guidelines for the development of natural science literacy of students. Nur-Sultan: branch "Center for Educational Programs" of AEO "Nazarbayev Intellectual Schools", 2020. 56 pp. ISBN 978-601-328-922-9.
2. Avdeenko, N. A., Demidova, M. Yu., Kovaleva, G. S., Loginova, O. B., Mikhailova, A. M., & Yakovleva, S. G. (2019). Monitoring the formation and evaluation of functional literacy: Creative thinking. 17 pp. [Electronic resource]: https://inlnk.ru/VoV34z
3. Wagner, T. (2008). The Global Achievement Gap: Why Even Our Best Schools Don't Teach the New Survival Skills Our Children Need – And What We Can Do About It. Basic Books.
4. Azizov, R. Education of a new generation: 10 advantages of STEM education [Electronic resource]: https://ru.linkedin.com/pulse/ -stem-rufat-azizov.
5. Modeling as a method of knowledge: Classification and forms of representation of models [Electronic resource]: https://clck.ru/KcW26
6. Tarasova, S. A. The modeling method as a means of achieving metasubject results in the study of biology [Electronic resource]: https://www.prodlenka.org/srednjaja-shkola/3364-method-modelirovanija-kak-sredstvo-dostizhenij.html (06/22/2018)
7. Bielik, T., Opitz, S. T., & Novak, A. M. (2018). Supporting students in building and using models: Development on the quality and complexity dimensions. Education Sciences, 8(3), 149. https://doi.org/10.3390/educsci8030149
8. Dirksen, J. (2013). The art of teaching: How to make any learning fun and effective. Moscow: Mann, Ivanov and Ferber. 276 pp.
 
5:15pm - 6:45pm 27 SES 13 C JS: Advancing Assessment Tools and Strategies in Subject-Specific Contexts
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Serafina Pastore
Joint Paper Session NW 09 and NW 27. Full information under 09 SES 13 A JS
Date: Friday, 25/Aug/2023
9:00am - 10:30am 09 SES 14 A: Assessing Quality Management, Evaluation Feedback, and Professional Capital in Education
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Arto Ahonen
Paper Session
 
09. Assessment, Evaluation, Testing and Measurement
Paper

Evaluating the Implementation of a Nationwide Quality Management System for Schools. Concept and First Results

Erich Svecnik

IQS, Austria

Presenting Author: Svecnik, Erich

Like many other European countries, Austria is currently implementing a nationwide Quality Management System for schools (QMS; https://www.qms.at/). Its aim is systematic and targeted school and teaching development based on a plan–do–check–adjust quality circle (PDCA, 'Shewhart Cycle'), and it is thus similar to systems in other countries, especially some German Länder with which there is a continuous exchange. The most important features are the introduction of a mandatory quality framework for all schools as well as an increased data or evidence orientation in school and teaching development in general. QMS tools include the definition of a school's pedagogical guiding principles, a school development plan, a balance and target agreement meeting between the principal and the regional school quality manager (formerly 'school inspector') and a quality handbook. To support the data orientation, an internet platform with several hundred instruments for internal evaluation was provided. The formerly separate quality management programs for general and vocational schools (and thus different traditions and instruments) are being merged into QMS.

The implementation process and the diffusion of the QMS and its elements into the school system are formatively evaluated in an accompanying process (Rossi et al., 2019; Stockmann, 2011). The overall objective of this evaluation is the generation of knowledge for the optimization of the implementation process as well as its monitoring and documentation of progress.

The theoretical background of this research is Rogers' (2003) 'Diffusion of Innovations', which describes typical stages of immersion. Accordingly, knowledge of the innovation, in this case knowledge of the QMS model, is the starting point; it should subsequently lead to a positive attitude or acceptance (persuasion). The next stages are the actors' informed decision to adopt the innovation (decision) and the actual implementation, which in the best case leads to reinforcement and confirmation. Coburn (2003) focuses attention on the depth of change, its sustainability and ownership in the medium and long term, although these are of little importance in the initial phase. A closer look at the context of implementation and the creation of the necessary framework conditions follows the approach of implementation research (Petermann, 2014).

The design of the evaluation and the underlying theoretical assumptions lead to the following three guiding questions:

  • How deeply has the nationwide Quality Management System already diffused into everyday school life?
  • How can the implementation process be further promoted and supported?
  • Are there different patterns in this respect in different school sectors, specifically between general education and vocational education?

According to the underlying model of Utilization-Focused Evaluation (Patton, 1997, 2000), the detailed questions are defined in the further course of the evaluation in close coordination with the persons responsible for QMS.


Methodology, Methods, Research Instruments or Sources Used
In a first step, all 43 regional quality coordinators were surveyed using an online questionnaire. The survey focused on the challenges of their work, their need for support and their perceptions of the implementation of QMS to date in terms of diffusion at school level, acceptance, and realization processes. Results showed high acceptance of QMS among respondents and a high level of satisfaction with support from the ministry, but diffusion at the school level is not yet perceived as very far advanced.
The next step is a survey of a representative sample of school principals and school quality coordinators using adapted versions of the A-SEW (Carmignola et al., 2021), with the dimensions of meaningfulness, usefulness, and practicality of the innovation. Individual items from the Stages of Concern Questionnaire (George et al., 2006) also provide information on personal aspects of implementation. Other content-related aspects concern supportive and obstructive framework conditions, including support and training needs as well as possible improvements to the available material. First indications of unintended effects of the QMS introduction (e.g. Landwehr, 2015) are also to be obtained in order to be able to take countermeasures if necessary.

Conclusions, Expected Outcomes or Findings
First results will be available in August 2023. Data will be analysed using descriptive statistics to gain an overview and inferential statistics to examine differences between educational sectors. In addition, (multilevel) regression models will provide explanations, and cluster analyses will help in defining tailored support for schools by identifying schools with similar characteristics.
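The cluster-analysis step mentioned above can be sketched in miniature. This is an illustrative assumption, not the evaluation's actual method or data: a toy k-means (Lloyd's algorithm) grouping schools on two fabricated characteristics.

```python
# Illustrative sketch only: a minimal k-means that groups schools with similar
# characteristics, as proposed for tailoring support. All data are fabricated.

# Each school: (QMS diffusion score, support need), both on a 0-10 scale
schools = [(8, 2), (7, 3), (9, 1),   # implementation far advanced
           (2, 8), (1, 9), (3, 7)]   # early stage, high support need

def mean(points):
    # Component-wise mean of a list of coordinate tuples
    return tuple(sum(c) / len(points) for c in zip(*points))

def assign(points, centroids):
    # Put each point into the cluster of its nearest centroid
    clusters = [[] for _ in centroids]
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        clusters[dists.index(min(dists))].append(p)
    return clusters

centroids = [schools[0], schools[3]]  # naive initialisation with two schools
for _ in range(10):                   # Lloyd iterations: assign, then update
    clusters = assign(schools, centroids)
    centroids = [mean(c) for c in clusters]

print(centroids)  # → [(8.0, 2.0), (2.0, 8.0)]
```

A real analysis would standardise the variables, choose the number of clusters from the data, and use a vetted implementation, but the loop above is the whole idea: alternate cluster assignment and centroid updates until stable.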
Once the findings are available, answering the research questions will provide the persons/institution(s) responsible for and steering the implementation process of QMS with data/knowledge to optimize the processes and provide appropriate support.

References
Carmignola, M., Hofmann, F. & Gniewosz, B. (2021). Entwicklung und Validierung einer Kurzskala zur Einschätzung der Akzeptanz von Schulentwicklungsprojekten (A-SEW). Diagnostica, 67(4), 163–175.
Coburn, C. (2003). Rethinking Scale: Moving Beyond Numbers to Deep and Lasting Change. Educational Researcher, 32(6), 3–12.
George, A. A., Hall, G. E., & Stiegelbauer, S. M. (2006). Measuring implementation in schools: The stages of concern questionnaire. SEDL.
Landwehr, N. (2015). Die institutionelle und kulturelle Verankerung des Feedbacks. In: Buhren, C. G. (Ed.). Handbuch Feedback in der Schule. Weinheim Basel: Beltz.
Patton, M. Q. (1997). Utilization-Focused Evaluation: The New Century Text. Thousand Oaks; London; New Delhi: Sage Publications.
Patton, M. Q. (2000). Utilization-focused evaluation. In: Stufflebeam, D.L., Madaus, G.F., Kellaghan, T. (Eds.) Evaluation Models: Viewpoints on Educational and Human Services Evaluation (pp. 425-438). Boston: Kluwer Academic Publishers.
Petermann, F. (2014). Implementationsforschung: Grundbegriffe und Konzepte. Psychologische Rundschau, 65(3), 122–128.
Rogers, E. M. (2003). Diffusion of innovations. New York: The Free Press.
Rossi, P. H., Lipsey, M. W. & Henry, G. T. (2019). Evaluation: a systematic approach (Eighth edition.). Los Angeles: SAGE.
Stockmann, R. (Ed.). (2011). A Practitioner Handbook on Evaluation. Cheltenham, UK; Northampton, MA: Edward Elgar.


09. Assessment, Evaluation, Testing and Measurement
Paper

Use and Impact of External Evaluation Feedback in Schools in Iceland

Björk Ólafsdóttir, Jón Torfi Jónasson, Anna Kristín Sigurðardóttir

University of Iceland, Iceland

Presenting Author: Ólafsdóttir, Björk

Past findings concerning whether and how feedback from external evaluations benefits the improvement of schools are inconsistent and sometimes even conflicting, which highlights the contextual nature of such evaluations and underscores the importance of exploring them in diverse contexts. Considering that broad international debate, we investigated the use and impact of feedback from external evaluations in compulsory schools in Iceland, particularly as perceived by principals and teachers in six such schools. The research questions guiding the study were "How and to what extent do schools use the feedback presented in external evaluation reports?" and "To what extent do schools sustain the changes made after using the feedback from external evaluations instrumentally?" The framework used for analysing the use of evaluation feedback was based on Rossi et al. (2004) and Aderet-German and Ben-Peretz (2020) and distinguishes between instrumental, conceptual, strategic and reinforcement-oriented use.


Methodology, Methods, Research Instruments or Sources Used
To map the perceived use and long-term impact of the feedback, a qualitative research design was adopted, examining changes made in the schools during a 4–6-year period following external evaluations through semi-structured interviews with principals and teachers, along with a document analysis of evaluation reports, improvement plans and progress reports. Six schools were selected to participate based on the evaluation judgement, school size and geographical location; six principals and eight teachers were interviewed. In analysing the interview transcripts and documentation, a thematic approach (Braun & Clarke, 2006) was followed.

Conclusions, Expected Outcomes or Findings

The findings reveal that feedback from external evaluations has been used in a variety of ways, as the data revealed clear examples of instrumental, conceptual, persuasive and reinforcement-oriented use. Instrumental use could be seen in relation to (1) leadership and management: primarily respecting professional collaboration amongst staff members and the instructional leadership of school leaders; (2) learning and teaching: mainly regarding differentiated strategies for instruction, democratic participation of students and the use of assessments to improve students' learning; and (3) internal evaluation: mostly concerning evaluation plans and methods, stakeholder participation and improvement plans. Instrumental use varied between the schools, and not all of them made major changes in all three areas. Conceptual use was also evident at the schools; in that context, the usefulness of obtaining an external view of the school's functioning and of getting help in identifying where improvements were needed was highlighted. In some cases, the evaluation feedback led to productive discussions and reflections among the professionals, and for three newly appointed principals it provided useful guidance. Persuasive use of the evaluation feedback was identified in three interviews, in the context of supporting changes that the interviewee wanted to bring about. Likewise, reinforcement-oriented use was identified in three interviews at schools that had received positive evaluation feedback, which they experienced as empowering. The findings also showed that both teachers and principals had a positive attitude towards the external evaluation and had generally experienced the evaluation feedback as useful and as having contributed to changes in practices in the schools. The improvement actions presented in the schools' improvement plans were generally implemented or continue to be implemented in some way, and the changes made have mostly been sustained.


References
Aderet-German, T., & Ben-Peretz, M. (2020). Using data on school strengths and weaknesses for school improvement. Studies in Educational Evaluation, 64, Article 100831. https://doi.org/10.1016/j.stueduc.2019.100831

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.

Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A systematic approach (7th ed.). Sage.


09. Assessment, Evaluation, Testing and Measurement
Paper

Knowledge Mapping Of Learning Analytics And Professional Capital In Education: a Bibliometric Study

Javier de la Hoz-Ruíz1, Mohammad Khalil2, Jesús Domingo Segovia1

1University of Granada, Spain; 2SLATE, University of Bergen, Norway

Presenting Author: de la Hoz-Ruíz, Javier

Reaching Goals 4 and 17, adopted by the UN General Assembly in September 2015, requires working together: teachers and students need to form a community of knowledge seekers and builders, as UNESCO (2021) affirms. One way to achieve these two sustainable development goals is to form communities of professional practice.

Thus, improving education depends significantly on the ability of school leaders to connect everyone (teachers, families and the local community) into a professional community of practice, which increases their professional capital. Professional capital has three dimensions: human capital, the acquired and useful skills of all members of society (Smith, 1776); social capital, those features of social organization, such as trust, norms and networks, that can improve the efficiency of society by facilitating coordinated action (Putnam, 1995); and decision capital, what professionals acquire and accumulate through structured and unstructured experience, practice and reflection, the capital that allows them to make judgements.

In complex contexts, a new governance of the school is required, with horizontal leaders ('leadership from the middle') who build projects, cultures and environments with a community vocation of commitment to educational improvement, while expanding the social capital of the community of professional practice.

COMMUNITY OF PROFESSIONAL PRACTICE

Interestingly, Domingo-Segovia et al. (2020) have used "professional practice community" as a broader term than "professional learning community", one that includes the school and the local community context. Such a community requires the emergence of fluid networks of interrelationship, communication and support for the learning of all and for all, with shared and networked leadership, articulated from a broad perspective of "middle leadership" (Rincón, 2019).

Therefore, as an "extended" community, its stage and actors must be linked and expanded with the collective goal of educational improvement, widening networks of influence and opportunity. Hence the importance of "increasing professional capital" (Hargreaves & Fullan, 2014), the next point to be discussed.

PROFESSIONAL CAPITAL AND LEARNING ANALYTICS

The key to this term is the systemic development and integration of three types of capital (human, social and decision-making) into the teaching profession. Professional capital has to do with collective responsibility, not individual autonomy; with rigorous training, continuous learning, going beyond the evidence, and being open to the needs and priorities of students and society (Hargreaves & Fullan, 2012).

In addition, learning analytics, defined as "the measurement, collection, analysis and presentation of data about students and their contexts, in order to understand and optimize learning and the environments in which it occurs", acquires particular relevance here, as it can be used as a means to extract the most effective methodologies, processes and tools for measuring, collecting, analysing and reporting data on professional capital (Khalil & Ebner, 2016).

Thus, the rationale for this work was to understand how learning analytics can help improve and understand professional capital in the field of education by analysing its scientific output. For this purpose, bibliometric maps were used, as they offer a better understanding of the structure of a scientific domain through the graphical representation of the different units of analysis and their relationships (Small, 2006).

This research therefore answers the following questions:

RQ1: What are the key themes or knowledge grouped around the use of learning analytics for the improvement of professional capital?

RQ2: What is the research trend of the field under study?

RQ3: What are the research boundaries extracted from the network analysis?

The steps and tools used in this process are explained more specifically in the methodology.


Methodology, Methods, Research Instruments or Sources Used
Firstly, to guide the first part of the study and to survey the scientific production used to build our maps, the PRISMA guidelines and checklist (http://www.prisma-statement.org) were followed to ensure transparency in both the research process and the analysis (Moher et al., 2009).

Second, the cluster-based tool VOSviewer (van Eck & Waltman, 2011) was used to perform the analysis. It presents the structure, evolution, cooperation and other relationships of a field of knowledge from literature data, and allows the user to view and explore scientific mapping in cluster format based on scientometric networks.

Both are explained more specifically below.

PRISMA

Web of Science (WoS), Scopus and the Association for Computing Machinery (ACM) were used as data sources for this study.

In order to lend rigour to our search process, we established keywords extracted from the ERIC thesaurus (Hertzberg & Rudner, 1999). Records were retrieved across all fields (ALL) using the search formula (ALL="learning analytic*" OR ALL="academic analytic*" OR ALL="teaching analytic*") AND (ALL="social capital" OR ALL="human capital" OR ALL="decisional capital" OR ALL="professional capital"). After removing duplicate citations, 657 papers remained at this initial stage. The search was conducted on 23 February 2022.

The inclusion of studies was decided through peer review (Sarthou, 2016). We read the items returned by our search to identify only those relevant to our research questions and objectives. Of the 93 items returned, we selected 84; the reason for excluding the others was thematic inadequacy in relation to our study.

The articles thus obtained were then analysed with the software described below.

VOSVIEWER

The 84 articles were first imported into Zotero, a free, open, multiplatform bibliographic reference manager whose main purpose is to help collect and manage the resources needed for research (Alonso-Arévalo, 2015).

We also recommend exploring the effect of excluding a smaller or larger percentage of terms; in our case, we eliminated the words "study" and "analysis", as we believe they distort the results. We refer to Van Eck and Waltman (2011) for a brief explanation of the calculation of relevance scores.

The network visualization and cluster density maps showing the results of the analysis were produced with VOSviewer.

Conclusions, Expected Outcomes or Findings
This paper assessed global research trends from 2012 to 2019. Learning analytics for the improvement of professional capital has attracted considerable research over the last 10 years, and scientific output has increased exponentially in recent years. There is growing interest in research related to both learning analytics and professional capital independently, which corresponds to the urgent need to develop and improve these research fields jointly.

Therefore, this study provides three key points: a) it helps to better understand how learning analytics studies on the improvement of professional capital are carried out, as well as the fields and disciplines in which they are conducted; specifically, six clusters were detected (research on the influence of community improvement programmes, research in learning analytics, analysis of collaborative networks, relationship models for performance improvement, and theoretical background); b) the evolution of the studies over time, where a current interest in platforms for the improvement of professional capital can be appreciated; and c) a frontier of analysis is proposed, with content studies to observe and use more specific information from these articles.

Limitations of this bibliometric study should be acknowledged: expanding the research using other databases such as EBSCO, ProQuest, Emerald, SAGE or others is suggested; nevertheless, the sources used are consistent with the objective of the research. In addition, future studies could consider other types of maps offered by the software, such as co-citation between authors or journals.
In future work, the construction and comparison of two-dimensional bibliometric maps corresponding to several time periods would show where trends and research fronts are evolving.

References
The results of this publication are part of three research projects:

1) "Communities of professional practice and learning improvement: intermediate leadership, networks, and interrelationships. Schools in complex contexts" (Ref.: PID2020-117020GB-I00), funded by MCIN/AEI/10.13039/501100011033/ and ERDF "A way of doing Europe";
2) "Communities of professional practice and learning improvement" (Ref.: P20_00311), funded by the Andalusian Plan for Research, Development, and Innovation (PAIDI 2020); and
3) "Extended Professional Learning Communities and Collaboration Networks for Sustainable Development and Inclusion: New Governance and Social Capital" (Ref.: B-SEJ-234-UGR20), funded by the FEDER 2020 Operational Programme (Andalusia 2014-2020).

Bolam, R., et al. (2005). Creating and Sustaining Effective Professional Learning Communities. DfES Research Report RR637. University of Bristol, Bristol.

Domingo-Segovia, J., Bolívar-Ruano, R., Rodríguez-Fernández, S., & Bolívar, A. (2020). Professional Learning Community Assessment-Revised (PLCA-R) questionnaire: Translation and validation in Spanish context. Learning Environments Research, 23(3), 347-367. https://doi.org/10.1007/s10984-020-09306-1
Hargreaves, A., Fullan, M., & Pruden, J. (2012). Professional Capital.
Hargreaves, A., & Fullan, M. (2014). Capital profesional. Transformar la enseñanza en cada escuela. Madrid: Morata.
Khalil, M., & Ebner, M. (2016). What is Learning Analytics about? A survey of different methods used in 2013-2015. In Proceedings of Smart Learning Conference, Dubai, UAE, 7-9 March 2016 (pp. 294-304). Dubai: HBMSU Publishing House.
Leana, C. R. (2011). The missing link in school reform. Stanford Social Innovation Review, 34.
Moher, D., Liberati, A., Tetzlaff, J., & Altman, D.G. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA Statement. Annals of Internal Medicine, 51, 264-269.
Putnam, R. D. (1995). "Bowling Alone: America's Declining Social Capital", Journal of Democracy 6:65-78.
Rincón, S. (2019). Las redes escolares como entornos de aprendizaje para los líderes educativos. En J. Weinstein & G. Muñoz (eds.). Cómo cultivar el liderazgo educativo. Trece miradas. (pp.355-388) Santiago: Universidad Diego Portales
Small, H. (2006). Tracking and predicting growth areas in science. Scientometrics, 68(3), 595-610. https://doi.org/10.1007/s11192-006-0132-y
Smith, A. (1776). An inquiry into the nature and causes of the wealth of nations. Book II: Of the nature, accumulation, and employment of stock. New York: Classic House Books.
UNESCO (2021). Reimagining our futures together: A new social contract for education. Paris: UNESCO. https://unesdoc.unesco.org/ark:/48223/pf0000379707
United Nations General Assembly. (2015). Transforming our world: The 2030 agenda for sustainable development.
Van Eck, N.J., & Waltman, L. (2011). Text mining and visualization using VOSviewer. ISSI Newsletter, 7(3), 50–54
 
1:30pm - 3:00pm 09 SES 16 A: Understanding Learning Outcomes and Equity in Diverse Educational Contexts
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Kajsa Yang Hansen
Paper Session
 
09. Assessment, Evaluation, Testing and Measurement
Paper

Investigation of Factors Related to Immigrant Students' Mathematics Performance in PISA 2018

Ayse Akkir, Serkan Arikan

Bogazici University, Turkiye

Presenting Author: Akkir, Ayse

There has been a large number of studies on the performance gap between immigrant and native students (Arikan, Van de Vijver, & Yagmur, 2017; Martin, Liem, Mok, & Xu, 2012; Pivovarova & Powers, 2019; Rodríguez et al., 2020). It is critical to identify variables that could be related to the performance of immigrant students, and numerous studies have examined such variables. Some studies emphasize immigrants' resilience (Rodríguez et al., 2020), while others focus on exposure to bullying (Karakus, Courtney, & Aydin, 2022; Ponzo, 2013). Native students were found to score higher than immigrant students on three indicators of wellbeing: positive affect, self-efficacy-resilience, and a sense of belonging to the school (Rodríguez et al., 2020). Investigating student- and country-level factors that predict immigrant students' performance may assist policymakers in taking education-related action.

Thus, this study focuses on identifying student- and country-level variables that are associated with the mathematics performance of immigrant students using PISA 2018 data. Student-level variables are chosen based on Walberg's theory of academic achievement (Walberg, 2004), according to which a student's success is shaped by their characteristics and their environment. The main psychological factors influencing academic achievement fall into three categories: student aptitude, instruction, and psychological environment. Student aptitude refers to a student's capacity, growth, drive, or predisposition for extreme perseverance in academic work. Instruction comprises both the quantity and the quality of instructional time. Psychological environments refer to students' morale or their perceptions of their peers in the classroom and the home environment (Walberg, 2004). Country-level variables, on the other hand, are chosen based on prior research: some studies suggest that the Migrant Integration Policy Index (MIPEX) is associated with achievement (Arikan et al., 2017; He et al., 2017), and others have found the Human Development Index (HDI) to be associated with achievement (Arikan et al., 2020).

The research questions of the current study are:

RQ1: Which student-level (motivation to master tasks, resilience, cognitive flexibility/adaptivity, exposure to bullying, sense of belonging, discriminating school climate, students’ attitudes toward immigrants) and country-level (MIPEX and HDI) variables could predict mathematics performance of immigrant students and native students across European countries in PISA 2018?

RQ2: Is there a statistically significant difference between the mathematics performance of first-generation immigrant students, second-generation immigrant students and native students across European countries in PISA 2018?

RQ3: Is there a statistically significant difference between the mathematics performance of first-generation immigrant students, second-generation immigrant students and native students after controlling economic and social status (ESCS)?


Methodology, Methods, Research Instruments or Sources Used
Participants
Participants are immigrant students (first- and second-generation) and native students who took the PISA assessment in 2018. Students who were born in another country and whose parents were born in another country are considered first-generation. Second-generation students are those who were born in the country of assessment but whose parents were born elsewhere. Native students are those at least one of whose parents was born in the assessment country (OECD, 2019). The data from 14 European countries, namely Croatia, Estonia, Germany, Greece, Iceland, Ireland, Italy, Latvia, Malta, Portugal, Serbia, Slovenia, Spain and Switzerland, were included.

Measures
PISA measures not only the performance of students but also gathers data about students' backgrounds through questionnaires. Student-level variables were chosen from the student questionnaires: motivation to master tasks, resilience, cognitive flexibility/adaptivity, exposure to bullying, sense of belonging, discriminating school climate, and students' attitudes toward immigrants. At the country level, the Migrant Integration Policy Index and the Human Development Index were used. Economic and social status (ESCS) was used as a control variable.

Data Analysis
In order to answer the first research question, multilevel regression analysis will be used to investigate which student- and country-level variables predict the mathematics performance of immigrant and native students; the analyses will be run in MPLUS 7.4. For the second research question, independent-samples t-tests will be used to compare the performance of immigrant and native students. Sample weights and plausible values will be included in the analyses, using the IDB Analyzer, to obtain unbiased results (Rutkowski, Gonzalez, Joncas, & Von Davier, 2010). In order to answer the third research question, propensity score matching will be applied first, and then the related comparisons will be performed to examine the performance gap between immigrant and native students after controlling for economic and social status. The MatchIt R package (Ho, Imai, King, & Stuart, 2011) will be used for the propensity score matching.

Conclusions, Expected Outcomes or Findings
The intraclass correlation will be reported to partition the variation in immigrant students' mathematics performance into country-level and student-level differences. Moreover, R-squared will be used to quantify the variance in mathematics performance explained by the student- and country-level variables of the current study. Student- and country-level variables that are significantly related to mathematics performance will then be reported.
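The variance partition behind the intraclass correlation can be sketched with simulated data. This is an illustrative assumption, not PISA data or the authors' code: the ICC is the between-country variance divided by the total (between plus within) variance.

```python
# Illustrative sketch only: estimating the intraclass correlation (ICC),
#   ICC = var_between / (var_between + var_within),
# on simulated math scores for five hypothetical countries (placeholder names).
import random

random.seed(1)

country_effects = {"A": 30, "B": 0, "C": -20, "D": 10, "E": -10}
scores = {c: [500 + eff + random.gauss(0, 80) for _ in range(200)]
          for c, eff in country_effects.items()}

def variance(xs):
    # Sample variance
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Within-country variance: average of the per-country variances
within = sum(variance(v) for v in scores.values()) / len(scores)

# Between-country variance: variance of the country means
means = [sum(v) / len(v) for v in scores.values()]
between = variance(means)

icc = between / (between + within)
print(f"ICC = {icc:.3f}")  # share of score variance attributable to countries
```

With the small country effects relative to the large student-level spread, the ICC here comes out low, mirroring how the reported ICC in the study will indicate how much of the performance variation lies between countries.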

Multiple independent-samples t-tests will be used to test whether statistically significant differences exist between the mathematics performance of first-generation immigrant students, second-generation immigrant students and native students. First, the mathematics performance of first-generation immigrant students and native students will be compared; then second-generation immigrant students and native students; and finally first- and second-generation immigrant students. Confidence intervals, t-values and effect sizes will be presented. Since sample weights and plausible values have to be included in the analyses, ANOVA could not be used; the IDB Analyzer will be used because it accounts for both. Applying multiple t-tests increases the chance of a Type I error, so the Bonferroni adjustment will be used to lower the likelihood of false-positive findings. The adjustment divides the significance level by the number of t-tests (Napierala, 2012); here, the correction divides the alpha level (0.05) by the number of t-tests (3).
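The Bonferroni adjustment described above amounts to a one-line calculation. A minimal sketch, with hypothetical p-values (the comparison names and numbers below are illustrative assumptions, not study results):

```python
# Minimal sketch of the Bonferroni adjustment for three pairwise t-tests:
# each test is judged against alpha / 3 instead of alpha.
alpha = 0.05
n_tests = 3  # first-gen vs native, second-gen vs native, first- vs second-gen
adjusted_alpha = alpha / n_tests  # ≈ 0.0167

# Hypothetical p-values for the three comparisons (illustrative only)
p_values = {"first_vs_native": 0.004,
            "second_vs_native": 0.03,
            "first_vs_second": 0.20}

for name, p in p_values.items():
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"{name}: p = {p} -> {verdict} at adjusted alpha = {adjusted_alpha:.4f}")
```

Note the effect of the correction: a p-value of 0.03 would pass an unadjusted 0.05 threshold but fails the Bonferroni-adjusted one, which is exactly the protection against inflated Type I error the abstract describes.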

Propensity score matching will be used to investigate the performance difference between immigrant and native students after controlling for ESCS. Immigrant and native students will be matched on their economic and social status scores so that the groups are similar with regard to ESCS. The performance of immigrant and native students will then be compared with a t-test, and the effect sizes of the performance difference before and after matching will be compared.
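The matching step can be illustrated in miniature. The study itself uses the MatchIt R package; the sketch below merely mimics 1:1 nearest-neighbour matching on a single covariate (ESCS) in plain Python, with fabricated student IDs and scores.

```python
# Illustrative sketch only (the study uses MatchIt in R): 1:1 nearest-neighbour
# matching of immigrant to native students on ESCS, without replacement.
# All IDs and ESCS values are fabricated placeholders.

immigrant = [("i1", -0.8), ("i2", 0.1), ("i3", -1.5)]   # (id, ESCS)
native    = [("n1", -0.7), ("n2", 0.0), ("n3", 1.2),
             ("n4", -1.4), ("n5", 0.9)]

pairs = []
available = dict(native)  # native students still available for matching
for sid, escs in immigrant:
    # Pick the unmatched native student whose ESCS is closest
    best = min(available, key=lambda nid: abs(available[nid] - escs))
    pairs.append((sid, best))
    del available[best]   # matching without replacement

print(pairs)  # → [('i1', 'n1'), ('i2', 'n2'), ('i3', 'n4')]
```

After matching, the two groups have near-identical ESCS distributions, so a subsequent t-test on performance compares like with like; real propensity matching would use several covariates collapsed into an estimated propensity score rather than raw ESCS alone.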

References
Arikan, S., Van de Vijver, F. J., & Yagmur, K. (2017). PISA mathematics and reading performance differences of mainstream European and Turkish immigrant students. Educational Assessment, Evaluation and Accountability, 29(3), 229-246.
Arikan, S., van de Vijver, F. J., & Yagmur, K. (2020). Mainstream and immigrant students’ primary school mathematics achievement differences in European countries. European Journal of Psychology of Education, 35(4), 819-837.
Ho, D., Imai, K., King, G., & Stuart, E. A. (2011). MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. Journal of Statistical Software, 42(8), 1–28. doi:10.18637/jss
IEA (2022). Help Manual for the IEA IDB Analyzer (Version 5.0). Hamburg, Germany. (Available from www.iea.nl)
Karakus, M., Courtney, M., & Aydin, H. (2022). Understanding the academic achievement of the first- and second-generation immigrant students: A multi-level analysis of PISA 2018 data. Educational Assessment, Evaluation and Accountability, 1–46.
Martin, A. J., Liem, G. A., Mok, M., & Xu, J. (2012). Problem solving and immigrant student mathematics and science achievement: Multination findings from the Programme for International Student Assessment (PISA). Journal of Educational Psychology, 104(4), 1054.
Napierala, M. A. (2012). What is the Bonferroni correction? AAOS Now, 40–41.
OECD (2019), PISA 2018 Results (Volume III): What School Life Means for Students’ Lives, PISA, OECD Publishing, Paris, https://doi.org/10.1787/acd78851-en.
Pivovarova, M., & Powers, J. M. (2019). Generational status, immigrant concentration and academic achievement: comparing first and second-generation immigrants with third-plus generation students. Large-scale Assessments in Education, 7(1), 1-18.
Ponzo, M. (2013). Does bullying reduce educational achievement? An evaluation using matching estimators. Journal of Policy Modeling, 35(6), 1057–1078.
Rodríguez, S., Valle, A., Gironelli, L. M., Guerrero, E., Regueiro, B., & Estévez, I. (2020). Performance and well-being of native and immigrant students. Comparative analysis based on PISA 2018. Journal of Adolescence, 85, 96–105.
Rutkowski, L., Gonzalez, E., Joncas, M., & Von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39(2), 142-151.
Walberg, H. J. (2004). Improving educational productivity: An assessment of extant research. The LSS Review, 3(2), 11-14.


09. Assessment, Evaluation, Testing and Measurement
Paper

Differences in Component Reading Skills Profiles for Native and Immigrant Fifteen-Year-Old Students in Sweden

Camilla Olsson, Monica Rosén

University of Gothenburg

Presenting Author: Olsson, Camilla

Today, societies in many countries are multilingual. Multilingualism can contribute to success in school and later in working life. An individual's language development affects their reading development (Kirsch et al., 2002). Thus, in a learning context such as school, reading skills are effective tools for obtaining, organizing, and using information in various fields (Artelt, Schiefele & Schneider, 2001). Reading is a multi-component process (Grabe, 2009), and many students in middle school have difficulty moving from “learning to read” to “reading to learn”. Fluency, prior knowledge, experience, and word knowledge are important, since students are expected to read about subjects unknown to them, in which the words and linguistic structures are more complex (Wharton-McDonald & Erickson, 2016). For a student who reads school material in a second language (L2), the reading process becomes even more complex. Grabe (2009) summarizes the major overall differences between reading in a first language (L1) and a second language (L2): “Linguistic and processing differences, developmental and educational differences and sociocultural and institutional differences” (Grabe, 2009, p. 130). Research has also shown that it takes at least four to five years before an individual can use their second language (L2) as a school language (Cummins, 2017; Thomas & Collier, 2002). In PISA 2018, students in Sweden performed above the OECD average on the reading literacy test. However, students with a foreign background, both those born in Sweden and those born abroad, performed at a lower level than native Swedish students (National Agency for Education, 2019). Educators in several countries have expressed concern about how education for first- and second-generation immigrant students is designed (Cummins, 2011).

In this study, PISA 2018 data were used to investigate the effects of two components on reading literacy performance for multilingual fifteen-year-old students in a Swedish context: reading fluency (the component defined in PISA as reading fluently) and students’ awareness of the usefulness of reading strategies for memorizing and understanding texts (the PISA index UNDREM). The aim is to gain a better understanding of similarities and differences in students’ component skills reading profiles (CSRP), defined in this study as learners’ relative development of different reading subskills, between categories of students with different language backgrounds.

Two research questions were posed:

  • What is the relative importance of reading fluency and the awareness of the usefulness of reading strategies for memorizing and understanding texts on overall reading performance for native, second generation and first generation students?
  • Are there similarities and differences between the three categories of students regarding the effects of reading fluency and the awareness of the usefulness of reading strategies on the processes of locating information, understanding, and evaluating and reflecting?

To analyse the results, the component skills approach to reading was used. In this approach, the overall multicomponent reading process is divided into lower-level processes and higher-level processes (Grabe, 2009; Koda, 2005). The approach can show whether and how these processes interact, and how much each process contributes individually and collectively to reading comprehension for both L1 and L2 readers (Grabe, 2009; Koda, 2005). In this study, reading fluency is assumed to be related to lower-level processes, and awareness of the usefulness of reading strategies to higher-level processes. Both components are described in theory and research as important in the reading process (Grabe, 2009; Koda, 2005). Knowledge about differences and similarities in students’ component reading profiles can potentially inform the future development of reading instruction for multilingual students.


Methodology, Methods, Research Instruments or Sources Used
This study is based on a secondary analysis of data generated during PISA 2018, when reading was the major domain for the third time. In the Programme for International Student Assessment (PISA), students’ knowledge in reading literacy, mathematics and science is examined just before the students are about to leave compulsory school (OECD, 2019). The Swedish sample consisted of 5504 students, of which 4283 were native students, 556 were second generation students, and 499 were first generation students. The observed independent variables used in the study were the variable defined in PISA as “reading fluently” (reading fluency) and the students’ self-reported index variable UNDREM (Meta-cognition: understanding and remembering) (reading strategies). Data preparation and management were performed in SPSS 28, and the analyses were carried out in Mplus Version 8 (Muthén & Muthén, 1998-2017). Weights were used according to common practice in the analysis of PISA data. In order to investigate the differences and similarities in the students’ component reading profiles (in this study defined as learners’ relative development of reading subskills) between the different categories of students, five separate multigroup path analyses were conducted. In the first two analyses, the overall PISA score for each student was used as the dependent variable. In models three to five, the scores on the processes measured in PISA (locating information, understanding, and evaluating and reflecting) were used as dependent variables. All 10 plausible values for each of the processes were used. In the first path analysis, the test-takers were divided into two groups: native students (who were born in Sweden) and first generation students (who were born abroad to parents who were also born abroad). In the following models, the three categories of students (native, second generation and first generation) were modelled separately.
Additionally, the relations between the different categories of the students' reading fluency and awareness of the usefulness of reading and various levels of reading proficiency defined in the PISA assessment were compared and visualized.
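Analyses that use all 10 plausible values are typically run once per plausible value and then pooled. A hedged sketch of the standard pooling step (point estimate as the mean over plausible values; total variance adding an imputation component), with invented per-plausible-value estimates, is:

```python
# Illustrative-only sketch of pooling an estimate over plausible values.
# The per-PV estimates and sampling variances below are invented.
import statistics

pv_estimates = [502.1, 498.7, 501.3, 499.9, 500.5,
                503.0, 497.8, 500.2, 501.7, 499.4]
pv_sampling_vars = [4.0, 4.2, 3.9, 4.1, 4.0, 4.3, 3.8, 4.0, 4.1, 3.9]

m = len(pv_estimates)
pooled_estimate = statistics.mean(pv_estimates)      # final point estimate
within_var = statistics.mean(pv_sampling_vars)       # average sampling variance
between_var = statistics.variance(pv_estimates)      # variance across PVs
total_var = within_var + (1 + 1 / m) * between_var   # combined variance
standard_error = total_var ** 0.5

print(round(pooled_estimate, 2), round(standard_error, 2))
```

The between-imputation term inflates the standard error to reflect the measurement uncertainty that the plausible values encode; using a single plausible value would understate it.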
Conclusions, Expected Outcomes or Findings
The results revealed significant differences between students born in Sweden and students born abroad in the effects on reading literacy performance of the lower-level process related to reading fluency (β = 0.434 for students born in Sweden, i.e. native and second generation students; β = 0.631 for first generation students) and of the higher-level process related to awareness of the usefulness of reading strategies (β = 0.349 for students born in Sweden; β = 0.222 for first generation students). In model two, when all three categories of students were included, the effects of the two components on reading literacy performance were similar for native students (β(RF) = 0.422, β(RS) = 0.346) and second generation students (β(RF) = 0.491, β(RS) = 0.339), while for first generation students the effect of reading fluency was much larger (β(RF) = 0.631, β(RS) = 0.222). When the relation between reading fluency and students’ perceptions of the usefulness of reading strategies was compared with the proficiency levels defined in PISA, the results showed that first generation students have a different distribution of higher- and lower-level processes up to proficiency level three (480–553 score points on the PISA test) than native and second generation students. Thus, the analysis indicates that the groups of students have different component skills reading profiles and appear to rely on partly different kinds of processes at several reading proficiency levels. The patterns with regard to both reading fluency and awareness of the usefulness of reading strategies are similar for native and second generation students but different for first generation students, indicating that the relative importance of the two components differs for first generation students.
References
Artelt, C., Schiefele, U., & Schneider, W. (2001). Predictors of reading literacy. European Journal of Psychology of Education, 26(3), 363.

Grabe, W. (2009). Reading in a second language: Moving from theory to practice. Cambridge: Cambridge University Press.

Kirsch, I., de Jong, J., La Fontaine, D., McQueen, J., Mendelovits, J., & Monseur, C. (2002). Reading for change: Performance and engagement across countries. Paris: Organisation for Economic Co-operation and Development.

Koda, K. (2005). Insights into second language reading: A cross-linguistic approach. Cambridge: Cambridge University Press.

Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user’s guide. Muthén & Muthén.

National Agency for Education. (2019). PISA 2018, 15-åringars kunskaper i läsförståelse, matematik och naturvetenskap. Stockholm: National Agency for Education.

OECD. (2019). PISA 2018 Assessment and Analytical Framework. Paris: OECD Publishing. https://doi.org/10.1787/b25efab8-en

Wharton-McDonald, R., & Erickson, J. (2016). Reading comprehension in the middle grades: Characteristics, challenges, and effective supports. In S. E. Israel (Ed.), Handbook of research on reading comprehension (pp. 353–376). New York: Guilford Publications.


09. Assessment, Evaluation, Testing and Measurement
Paper

Gender Gap and Differentiating Trends in Learning Outcomes in Estonia and Finland

Arto Ahonen

University of Jyväskylä, Finland

Presenting Author: Ahonen, Arto

In PISA 2018, girls outperformed boys in reading by almost 30 score points. However, the size of the gender gap did not seem to be related to the average performance. In 16 out of the 25 countries and economies whose mean score was above the OECD average, the difference in the reading performance between boys and girls was smaller than the average gender gap across OECD countries (OECD, 2019b). Among these high-performing countries, the difference between girls' and boys' performance ranged from 13 score points in B-S-J-Z (China) to 52 in Finland.

In societies where gender equality is enhanced, girls often perform better in reading and maths (Scheeren, van de Werfhorst, & Bol, 2018). This paper examines the gender gap in learning in two well-performing neighbouring countries, Finland and Estonia. In both countries, gender equality is well established across the sectors of society. In Finland, there has been a declining trend in students’ PISA performance in all core assessment domains (reading, mathematics and science) since 2009. At the same time, the gender gap in Finland has shifted to favour girls in mathematics and science (OECD, 2019a). Meanwhile, in Estonia, the country’s average performance has increased in reading and mathematics and remained at the same level in science. In Estonia, the gender gap has narrowed in reading, is neutral in science, and has developed to favour boys in mathematics.

Even though gender differences are probably the most commonly examined education outcomes, the underlying causes of the observed differences remain unclear. Maccoby and Jacklin (1974) concluded from their extensive review that, whilst some patterns persist, for example female superiority in verbal skills and male superiority in mathematical skills, it is not easy to untangle the influence of stereotyping on individuals’ perceptions of, and behaviour towards, events and objects. According to them, it was also challenging to determine whether, and to what extent, innate or learned behaviours underpin the development of behavioural or cognitive gender differences. The focus on masculinity in crisis is potentially fruitful, however, because it shifts the emphasis away from structural factors in post-industrial societies, which position boys as inevitable ‘losers’. Instead, it would be necessary to explore the characteristics of masculinity that inhibit boys as learners and citizens and how these might be challenged (Epstein et al., 1998).

There is substantial variation in gender differences, but no equal starting point, given the considerable differences between countries in, for example, their provision of preschool education, age of entry into formal schooling, age of school tracking, community resources such as libraries, teacher training, and general learning cultures (Topping et al., 2003). From this societal and educational point of view, Estonia and Finland are very similar, so it is not easy to adduce which factors have the most significant influence and why. Previous research has shown that the socioeconomic status of students’ families has a somewhat differentiated effect on performance by gender (Van Hek, Buchmann, & Kraaykamp, 2019; Autor et al., 2019). Also, students’ motivation and self-efficacy are among the strongest correlates of their performance across PISA studies, specifically in Finland and Estonia (Lee & Stankov, 2018; Lau & Ho, 2022).

The following research questions were formulated to examine these topics: How do motivation and self-efficacy predict girls’ and boys’ proficiency in Finland and Estonia in PISA cycles from 2006 to 2018? Could the gender gap explain the differentiating trajectories of a country's educational outcomes?


Methodology, Methods, Research Instruments or Sources Used
Finnish and Estonian data are first compared with the IDB Analyzer utilising the SPSS program. A linear regression analysis was conducted separately for the predictors of girls’ and boys’ country average scores, calculated with ten plausible values, in the PISA cycles 2006, 2009, 2012, 2015 and 2018, each of which had mathematics, science or reading literacy as the main domain. Descriptive statistics were calculated and presented for each cycle. The predicting factors of self-efficacy and motivation, or enjoyment of the main-domain school subject, were examined as computed variables with Weighted Likelihood Estimate (WLE) values. The ESCS index was used as an indicator of students’ socioeconomic background, either as a control covariate or as a predicting variable, to examine the possible differentiated effect it may have on gender proficiency. Finally, regression analysis was conducted to form a predicting model for girls’ and boys’ proficiency in every domain, for both Finland and Estonia.
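Country average scores calculated with ten plausible values follow the logic of the minimal sketch below: a weighted mean is computed for each plausible value separately, and the results are averaged. The student records, weights, and the use of only two plausible values are invented for brevity.

```python
# Illustrative-only sketch of a weighted country mean over plausible values.
students = [
    {"weight": 1.2, "pvs": [510.0, 505.0]},  # invented records
    {"weight": 0.8, "pvs": [480.0, 490.0]},
    {"weight": 1.0, "pvs": [530.0, 525.0]},
]

n_pvs = len(students[0]["pvs"])
total_weight = sum(s["weight"] for s in students)

pv_means = []
for i in range(n_pvs):
    # weighted mean of the i-th plausible value across students
    weighted_sum = sum(s["weight"] * s["pvs"][i] for s in students)
    pv_means.append(weighted_sum / total_weight)

country_mean = sum(pv_means) / n_pvs  # average over plausible values
print(round(country_mean, 2))
```

Tools such as the IDB Analyzer automate this per-plausible-value computation, including the variance estimation that the sketch omits.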
Conclusions, Expected Outcomes or Findings
The preliminary results reveal that, while in the first two cycles of the PISA study gender differences were not as evident in Finland as they were later on, motivation towards the assessed domain was higher than in the later cycles. The motivational factors were also stronger predictors of main-domain proficiency in Finland than in Estonia in the earlier cycles, 2006 and 2009. In the recent cycles, 2015 and 2018, self-efficacy was the strongest predictor in both Finland and Estonia. It appears that the level of motivational factors has declined in Finland but remained stable or slightly increased in Estonia. Finally, the applied regression models could explain more of the variance for girls than for boys in each major domain in each cycle.
References
Autor, D., Figlio, D., Karbownik, K., Roth, J., & Wasserman, M. 2019. Family disadvantage and the gender gap in behavioural and educational outcomes. American Economic Journal: Applied Economics 11(3), 338–381. https://doi.org/10.1257/app.20170571
Epstein, D., Ellwood, J., Hey, V. & Maw, J., 1998. Failing boys? Issues in gender and achievement. Buckingham: Open University Press.
Van Hek, M., Buchmann, C., & Kraaykamp, G. 2019. Educational Systems and Gender Differences in Reading: A Comparative Multilevel Analysis. European Sociological Review 35 (2), 169–186. https://doi.org/10.1093/esr/jcy054
Lau, KC., Ho, SC. 2022. Attitudes Towards Science, Teaching Practices, and Science Performance in PISA 2015: Multilevel Analysis of the Chinese and Western Top Performers. Research in Science Education 52, 415–426 https://doi.org/10.1007/s11165-020-09954-6
Lee, J., & Stankov, L. 2018. Non-cognitive predictors of academic achievement: Evidence from TIMSS and PISA. Learning and Individual Differences 65 (3), 50–64.
Maccoby, E.E. & Jacklin, C.N., 1974. The psychology of sex differences. Stanford: Stanford University Press.
OECD 2019a. PISA 2018 Results. Volume I. What Students Know and Can Do. Paris: OECD Publishing.

OECD 2019b. PISA 2018 Results. Volume II. Where All Students Can Succeed. Paris: OECD Publishing.

Scheeren, L., van de Werfhorst, H., & Bol, T. 2018. The Gender Revolution in Context: How Later Tracking in Education Benefits Girls. Social Forces 97 (1), 193–220. https://doi.org/10.1093/sf/soy025


09. Assessment, Evaluation, Testing and Measurement
Paper

Alternative Indicators of Economic, Cultural, and Social Status for Monitoring Equity: A Construct Validity Approach

Alejandra Osses, Raymond J. Adams, Ursula Schwantner

Australian Council for Educ. Research, Australia

Presenting Author: Osses, Alejandra

Background: Young people’s economic, cultural and social status (ECSS) is one of the most prevalent constructs used for studying equity of educational outcomes. National, regional and international large-scale assessments have furthered the quantitative research concerning the relationship between economic, cultural, and social background indicators and educational outcomes (Broer et al., 2019; Lietz et al., 2017; OECD, 2018).

However, there are theoretical and analytical limitations in the use of existing ECSS indicators from large-scale assessments for the purpose of monitoring equity in education (Osses et al., Forthcoming). Theoretical limitations relate to inconsistencies in how the ECSS construct is defined and operationalised, which pose significant challenges for comparing results between large-scale assessments and limit the usability of findings in addressing policy issues concerning equity in education. For example, Osses et al. (2022) demonstrated that using alternative approaches to constructing an ECSS indicator leads to different judgements about education systems in terms of equity of learning achievement.

Analytical limitations relate to the validity and reliability of ECSS indicators used in large-scale assessments. Whilst studies often explore reliability, cross-national invariance and other psychometric properties of ECSS indicators, information about the performance of alternative indicators is not provided. In fact, no studies were found that compare the performance of alternative ECSS indicators constructed by large-scale assessments, although Oakes and Rossi (2003) provide an example from health research.

Objective: This paper focuses on analysing the properties of two ECSS indicators constructed using alternative theoretical and analytical approaches, applied to the same student sample. Evidence on validity is provided to evaluate the relative merits and the comparability of the two indicators for monitoring equity in education.

Method: This study analyses the properties of students’ ECSS indicators constructed by PISA and TIMSS with the aim of providing evidence concerning the validity and comparability of these two indicators. The novelty of the methodological approach lies in estimating both indicators for the same sample of students – those in PISA 2018, and thus analysing the merits of each analytical approach.

Indicators are analysed in terms of their content – ie, evaluating alignment between the theoretical construct, the indicators and the items chosen for their operationalisation – and their internal consistency. The indicators’ internal structure is investigated using confirmatory factor analysis and item response modelling in relation to model fit and the precision with which the indicators measure the ECSS construct – that is, targeting and reliability. The use of plausible values as a strategy to reduce error in making inferences about the population of interest is also explored.

Preliminary results show that the TIMSS-like indicator constructed using PISA 2018 data may benefit from a clearer definition of the underlying construct and from theoretical support for evaluating the adequacy of the indicators chosen in its operationalisation. In terms of internal consistency, results indicate that items in the TIMSS-like indicator are “too easy” for the PISA population of interest and, although the response data show a reasonable fit to the measurement model, the chosen items provide an imprecise measurement of students’ ECSS.

Three key conclusions emerge from the preliminary results. First, large-scale assessments should devote more time to clearly defining and providing theoretical support for the construct of students’ ECSS. Second, items used in summary indicators of ECSS should be carefully inspected, not only in terms of their reliability but also in terms of the adequacy of the response categories and fit to the measurement model. Third, the use of plausible values should be considered in order to avoid bias and improve the precision of population estimates. The PISA indicator is currently being analysed.


Methodology, Methods, Research Instruments or Sources Used
This work extends the analysis in Osses et al. (2022) to investigate the properties of two alternative ECSS indicators constructed with the same student sample using PISA 2018 data. The first indicator corresponds to the PISA Economic, Social, and Cultural Status index (hereinafter, PISA_ESCS). The second indicator is constructed by recoding PISA data to obtain variables that are identical to those used in the TIMSS Home Educational Resources scale for grade 8 students (hereinafter, PISA_HER) and following the procedures detailed in the TIMSS 2019 technical report (Martin, von Davier, & Mullis, 2020). Two main aspects of validity (AERA et al., 2014) are evaluated: evidence on indicators’ content and internal structure.
Evidence on indicators’ content
Evaluating alignment between the construct, indicators and items chosen for its operationalisation allows determining whether scores can be interpreted as a representation of individuals’ ECSS. This is typically referred to as evidence of content relevance and representation (AERA et al., 2014; Cizek, 2020; Messick, 1994). To investigate content relevance and representation, a review of published documentation of PISA and TIMSS was undertaken in relation to theoretical underpinning, conceptualisation and operationalisation of each indicator.
Evidence on indicators’ internal structure
The modelling approach of each indicator is analysed in relation to the appropriateness of analytical steps followed in its construction. The PISA_ESCS is the arithmetic mean of three components, highest parental education and occupation, and home possessions – the latter being a latent variable indicator constructed using Item Response Modelling – IRM (OECD, 2020). The PISA_HER is the application of an IRM to three items: highest parental education, study support items at home, and number of books at home.
Internal structure of indicators is investigated using the analytical tools provided by the modelling approach used in PISA and TIMSS in relation to model fit and the precision with which the indicators measure the ECSS construct – that is, targeting and reliability.
Confirmatory factor analysis – with a specification and constraints that matches the indicator construction method used by PISA (OECD, 2020), is used to investigate the internal structure of PISA_ESCS, including model fit and reliability. IRM is used to investigate the internal structure of PISA_HER and of the home possessions scale – a component in the PISA_ESCS. Within IRM analysis, item targeting, model fit, and reliability of estimates are investigated. The use of plausible values, as opposed to weighted likelihood estimates, is also explored (OECD, 2009; Wu, 2005).
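One common way to obtain a person separation (reliability) index for an IRM scale of this kind compares the observed variance of the person estimates with the average error variance. The sketch below is an assumption about the computation, not the assessments’ documented procedure, and the WLE estimates and standard errors are invented.

```python
# Illustrative-only sketch of a person separation (reliability) index:
# reliability = (observed variance - mean error variance) / observed variance.
# WLE estimates and their standard errors below are invented.
import statistics

wle_estimates = [0.2, 1.5, -0.4, 0.9, 0.6, -1.1, 1.8, 0.3]
wle_standard_errors = [0.8, 0.9, 0.7, 0.8, 0.9, 0.8, 1.0, 0.8]

observed_var = statistics.variance(wle_estimates)            # spread of estimates
error_var = statistics.mean(se ** 2 for se in wle_standard_errors)
separation_reliability = (observed_var - error_var) / observed_var
print(round(separation_reliability, 2))
```

When the error variance is large relative to the spread of person estimates, as in this invented example, the index is low, which is the situation described for the PISA_HER scale in the findings.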

Conclusions, Expected Outcomes or Findings
Indicators’ content: The published PISA and TIMSS documentation provides different levels of depth in the theoretical argument underpinning the ECSS construct. Although the indicators used in both summary scales are typically used in operationalisations of ECSS, neither assessment specifies a conceptual model relating theory to construct operationalisation.
Indicators’ internal structure: Preliminary results relate to PISA_HER indicator; PISA_ESCS indicator is currently being analysed.
Items in the PISA_HER scale are relatively easy for PISA students, with most thresholds located in the lower region of the scale – ie, below the mean latent attribute of 1.63. PISA_HER items fit well together (ie, have similar discrimination) and response data fit the partial credit model (mean squared statistic close to 1). However, the person separation index of the PISA_HER index is low (0.36). Using plausible values in relation to ability estimates is common practice in PISA and TIMSS, where the interest is on reducing error in making inferences about the population of interest. However, contextual information is typically analysed using an IRM approach with the use of point estimates (eg, WLE) to produce students’ scores. Preliminary results indicate that the analytic outcomes might be quite different if plausible values are used.
Preliminary results from this study suggest that ECSS indicators in PISA and TIMSS require a sounder definition and operationalisation of the ECSS construct, which should be supported by theory and empirical evidence. The analytical steps in constructing the summary indicator – ie, the measurement model, should reflect the underlying theory. For example, if the construct is theorised to be a latent variable, then the summary indicator should be constructed using a latent variable model. As large-scale assessments aim at making inferences about the population of interest, rather than about individual students, using plausible values is an approach that should be explored in constructing contextual indicators. 

References
AERA, APA, & NCME. (2014). Standards for Educational and Psychological Testing. AERA.
Broer, M., Bai, Y., & Fonseca, F. (2019). Socioeconomic Inequality and Educational Outcomes. Evidence from Twenty Years of TIMSS. SpringerOpen.
Cizek, G. J. (2020). Validity: An Integrated Approach to Test Score Meaning and Use. Routledge.
Hooper, M., Mullis, I. V. S., Martin, M. O., & Fishbein, B. (2017). TIMSS 2019 Context Questionnaire Framework. In I. V. S. Mullis & M. O. Martin (Eds.), TIMSS 2019 Assessment Frameworks. Boston College, TIMSS & PIRLS International Study Center. http://timssandpirls.bc.edu/timss2019/frameworks/
Lietz, P., Cresswell, J., Rust, K. F., & Adams, R. J. (2017). Implementation of Large‐Scale Education Assessments. John Wiley and Sons.
Martin, M. O., Mullis, I. V. S., Foy, P., & Arora, A. (2012). Methods and Procedures in TIMSS and PIRLS 2011. TIMSS & PIRLS International Study Center, Boston College. https://timssandpirls.bc.edu/methods/index.html
Martin, M. O., von Davier, M., & Mullis, I. V. S. (2020). Methods and Procedures: TIMSS 2019 Technical Report. TIMSS & PIRLS International Study Center.
Messick, S. (1994). Validity of Psychological Assessment: Validation of Inferences from Persons’ Responses and Performances as Scientific Inquiry into Score Meaning. Educational Testing Service. https://files.eric.ed.gov/fulltext/ED380496.pdf
Oakes, M., & Rossi, P. (2003). The measurement of SES in health research: Current practice and steps toward a new approach. Social Science & Medicine, 56(4), 769–784.
OECD. (2001). Knowledge and Skills for Life—First results from the OECD Programme for International Student Assessment (PISA) 2000. https://www.oecd-ilibrary.org/education/knowledge-and-skills-for-life_9789264195905-en
OECD. (2009). PISA Data Analysis Manual: SAS Second Edition. https://www.oecd.org/pisa/pisaproducts/pisadataanalysismanualspssandsassecondedition.htm
OECD. (2017). PISA 2015 Assessment and Analytical Framework: Science, Reading, Mathematic, Financial Literacy and Collaborative Problem Solving (revised edition). PISA, OECD Publishing. https://doi.org/10.1787/9789264255425-en
OECD. (2018). Equity in Education: Breaking down barriers to social mobility. OECD Publishing.
OECD. (2019). PISA 2018 Results (Volume II): Where All Students Can Succeed. https://www.oecd.org/pisa/publications/
OECD. (2020). Chapter 16. Scaling procedures and construct validation of context questionnaire data—PISA 2018. https://www.oecd.org/pisa/publications/
 
3:30pm - 5:00pm09 SES 17 A: Exclusions and Non-response: Contemporary Missing Data Issues in International Large-scale Studies
Location: Gilbert Scott, EQLT [Floor 2]
Session Chair: Rolf Strietholt
Session Chair: Mojca Rozman
Symposium
 
09. Assessment, Evaluation, Testing and Measurement
Symposium

Exclusions and Non-response: Contemporary Missing Data Issues in International Large-scale Studies

Chair: Rolf Strietholt (IEA)

Discussant: Mojca Rozman (IEA Hamburg)

This session combines studies that examine different forms of missing data in international comparative large-scale studies. The overall aim is to investigate current challenges that have emerged in recent years, including issues around the sample representativeness and the validity of performance measures. Five contributions from international scholars use data from the international studies TIMSS (Trends in International Mathematics and Science Study), PIRLS (Progress in International Reading Literacy Study), and PISA (Programme for International Student Assessment) to investigate missing data related issues including the bias, scaling or reliability of these data. The contributions in this session evaluate issues in achievement tests as well as background surveys.

Missing values are a practical problem in virtually all empirical surveys. In particular, missing values that are not missing by design are a problem in empirical research because they can compromise the integrity of the data. Typical problems relate to the representativeness of the data or the accurate measurement of constructs. In this session, we examine missing values in international comparative assessments, looking at missing values in both achievement tests and background surveys. The individual contributions examine both unit and item non-response, studying the reasons for missing data as well as the consequences for the integrity of the data. The papers address non-response and exclusions in background surveys and performance tests in large-scale assessments.

The first two papers look at missing values in performance data. In the first, the authors examine students who were excluded from PISA in Sweden, drawing on results from national tests. The second compares what happens when missing responses to individual test items are scored as incorrect or treated as not administered; it is based on PIRLS data as well as a simulation study. The remaining papers look at missing values in survey data. One investigates how exclusion rates have changed over a 20-year period using data from TIMSS, PIRLS, ICCS and ICILS, noting an increase in some countries. The other examines the impact of administration mode on survey participation, comparing paper-based and online parental surveys in TIMSS.

The session consolidates research on a theme that often receives too little attention: non-response and exclusion in large-scale tests and surveys. It investigates different methodological issues related to missing data across several international assessments. The session is divided into six parts: four presentations, a discussion by a renowned expert, and an open discussion.


References
Anders, J., Has, S., Jerrim, J., Shure, N., & Zieger, L. (2020). Is Canada really an education superpower? The impact of non-participation on results from PISA 2015. Educational Assessment, Evaluation and Accountability, 33(1), 229–249. https://doi.org/10.1007/s11092-020-09329-5
Debeer, D., Janssen, R., & De Boeck, P. (2017). Modeling Skipped and Not-Reached Items Using IRTrees. Journal of Educational Measurement, 54(3), 333–363. https://doi.org/10.1111/jedm.12147
De Boeck, P., & Partchev, I. (2012). IRTrees: Tree-Based Item Response Models of the GLMM Family. Journal of Statistical Software, 48, 1-28. https://doi.org/10.18637/jss.v048.c01
Gafni, N., & Melamed, E. (1994). Differential tendencies to guess as a function of gender and lingual-cultural reference group. Studies in Educational Evaluation, 20(3), 309–319. https://doi.org/10.1016/0191-491X(94)90018-3
DeLeeuw, E. D. (2018). Mixed-Mode: Past, Present, and Future. Survey Research Methods, 12(2), 75-89. https://doi.org/10.18148/srm/2018.v12i2.7402
 
Jerrim, J. (2021). PISA 2018 in England, Northern Ireland, Scotland and Wales: Is the data really representative of all four corners of the UK? Review of Education, 9(3). https://doi.org/10.1002/rev3.3270
Micklewright, J., Schnepf, S. V., & Skinner, C. J. (2012). Non-response biases in surveys of school children: the case of the English PISA samples. Journal of the Royal Statistical Society. Series A (General), 175, 915–938.

 

Presentations of the Symposium

 

Non-response Bias in PISA: Evidence from Comparisons with Swedish Register Data

Linda Borger (Gothenburg University), Stefan Johansson (Gothenburg University), Rolf Strietholt (TU Dortmund University)

The OECD claims that the PISA study allows for representative statements about student performance. In some countries, however, entire schools and individual students are excluded from taking the tests. In addition, an increasing number of students do not sit the test for various reasons and are therefore missing from the data. In PISA 2018, more than 10 percent of the Swedish students in the PISA sample were excluded from the test or did not participate for other reasons. While such exclusions and non-response are considered a problem (Anders et al., 2020; Micklewright et al., 2012), it is difficult to quantify the bias because the performance levels of the excluded or non-responding schools and students are generally unknown. To address this problem, we constructed a unique database combining Swedish PISA data with Swedish register data. This database includes national test results and subject grades as well as information on parental education and migration background for all Swedish students tested in PISA 2018. Moreover, it comprises the corresponding information for the full cohort of students eligible to sit the PISA test (100 000 students born in 2002). We compare the performance and background data of the PISA sample with the entire population of 15-year-olds to shed light on any bias in PISA. The results of the analyses reveal a certain degree of bias and cast doubt on the representativeness of the 2018 PISA results in Sweden. Based on the results of the national tests available for all students in Sweden, we find that students who participated in PISA perform, on average, more than one standard deviation better than students who were excluded from PISA or did not participate for other reasons. The findings are discussed in relation to the general problem of missingness in survey data as well as the comparability of results over time in PISA.
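The comparison behind this abstract can be sketched in a few lines: when a measure (here, national test scores) exists for both participants and the excluded/non-responding group, the bias can be expressed as a standardized mean difference. The figures below are invented for illustration and are not the study's data.

```python
# Illustrative sketch (hypothetical scores): quantify non-response bias by
# comparing PISA participants with excluded/non-participating students on a
# register measure that is available for everyone in the cohort.
from statistics import mean, stdev

participants = [52.0, 61.5, 58.0, 66.0, 59.5, 63.0]  # hypothetical national test scores
excluded = [41.0, 38.5, 47.0, 44.5]

cohort = participants + excluded
gap = mean(participants) - mean(excluded)
# Standardize against the full-cohort spread to express the bias in SD units.
effect_size = gap / stdev(cohort)
print(f"Raw gap: {gap:.2f} points, standardized: {effect_size:.2f} SD")
```

With real register data the same calculation would be run on the full cohort of roughly 100 000 students rather than toy lists.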

References:

Anders, J., Has, S., Jerrim, J., Shure, N., & Zieger, L. (2020). Is Canada really an education superpower? The impact of non-participation on results from PISA 2015. Educational Assessment, Evaluation and Accountability, 33(1), 229–249. https://doi.org/10.1007/s11092-020-09329-5
Micklewright, J., Schnepf, S. V., & Skinner, C. J. (2012). Non-response biases in surveys of school children: the case of the English PISA samples. Journal of the Royal Statistical Society. Series A (General), 175, 915–938.
 

IRTrees for Skipped Items in PIRLS

Andrés Christiansen (IEA), Rianne Janssen (KU Leuven)

In international large-scale assessments, students may not be compelled to answer every test item; hence, how these missing responses are treated may affect item calibration and ability estimation. Using a tree-based item response model, or IRTree, it is possible to disentangle the probability of attempting to answer an item from the probability of a correct response (Debeer, Janssen, & De Boeck, 2017). In an IRTree, intermediate individual decisions are represented as intermediate nodes and observed responses as final nodes. Intermediate and end nodes are connected by branches that depict all possible outcomes of a cognitive subprocess, and for each branch a distinct probability can be estimated (De Boeck & Partchev, 2012). In the present study, we evaluate the usefulness of an IRTree model for skipped (omitted) responses, first with a simulation study and then on data from the last three cycles (2006, 2011, and 2016) of the Progress in International Reading Literacy Study (PIRLS). In the simulation study, we tested missing at random (MAR) and missing not at random (MNAR) scenarios, along with four missing-response treatments that simulate the strategies of different large-scale assessments. The simulation study showed that the IRTree model maintained higher accuracy than traditional imputation methods when the proportion of omitted answers was high. In a second step, the IRTree model for skipped responses was applied to the PIRLS data, and correspondence between the official PIRLS results, a Rasch model, and the IRTree model was compared at three levels: items, students, and countries. We found some differences between the PIRLS and Rasch model estimates at the item level; however, these do not significantly affect either the estimation of student ability or the country means and rankings. Moreover, the correlation between the scores estimated by the Rasch model and the IRTree model at the student level is high, although not linear. In general, the results showed that while a change in the model may affect specific countries, it did not significantly change the overall results or the country rankings. Nonetheless, when the information is disaggregated to compare a country's results over time, it is possible to observe how an increase or decrease in the proportion of skipped items can affect its overall results.
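The two-node structure described in the abstract can be sketched as follows. This is a minimal illustration of the general IRTree idea, not the authors' fitted model: node 1 models whether the student attempts the item, node 2 models correctness given an attempt, and the branch probabilities multiply along the tree. All parameter values are invented.

```python
# Minimal IRTree sketch for one item: two Rasch-type nodes whose branch
# probabilities combine into probabilities for the three observable outcomes.
import math

def logistic(theta, b):
    """Rasch-type node probability for latent trait theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def irtree_outcome_probs(theta_attempt, theta_ability, b_attempt, b_item):
    p_attempt = logistic(theta_attempt, b_attempt)  # node 1: attempt vs skip
    p_correct = logistic(theta_ability, b_item)     # node 2: correct | attempt
    return {
        "skipped": 1 - p_attempt,
        "incorrect": p_attempt * (1 - p_correct),
        "correct": p_attempt * p_correct,
    }

probs = irtree_outcome_probs(theta_attempt=1.0, theta_ability=0.5,
                             b_attempt=-0.5, b_item=0.0)
print(probs)  # the three branch probabilities sum to 1
```

The point of the decomposition is that a skipped item is informative about the attempt propensity rather than being forced to count as incorrect.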

References:

Debeer, D., Janssen, R., & De Boeck, P. (2017). Modeling Skipped and Not-Reached Items Using IRTrees. Journal of Educational Measurement, 54(3), 333–363. https://doi.org/10.1111/jedm.12147
De Boeck, P., & Partchev, I. (2012). IRTrees: Tree-Based Item Response Models of the GLMM Family. Journal of Statistical Software, 48, 1–28. https://doi.org/10.18637/jss.v048.c01
 

Exclusion Rates from International Large-scale Assessments. An Analysis of 20 Years of IEA Data

Umut Atasever (IEA), John Jerrim (University College London), Sabine Tieck (IEA)

Cross-national comparisons of educational achievement rely upon each participating country collecting nationally representative data. When it comes to missing data, researchers usually think of omitted answers, not-reached questions, and perhaps more general non-response due to non-participation. In ILSAs, however, specific parts of the internationally defined target population in most countries are, for various reasons, considered out of scope or inaccessible, so that part of the population is disregarded from the start. While obtaining high response rates is a key part of reaching representativeness, other potentially important factors may also be at play. As noted by Anders et al. (2020) and Jerrim (2021), response rates are only part of the story. Other factors, such as the precise definition of the target population and decisions about how many schools and students to exclude, also have an impact. Taken together, this can leave the collected data with sub-optimal levels of population coverage, jeopardizing a key assumption underpinning cross-country comparisons: that the data for each nation are nationally representative. The paper focuses on one such issue, exclusion rates, which has received relatively little attention in the academic literature. We elaborate on the causes of excluding part of the target population, how exclusion rates are calculated and reported, and how they have changed over time. The data we analyze are drawn from all IEA studies conducted between 1999 and 2019: six rounds of TIMSS, four rounds of PIRLS, two rounds of ICCS, and two rounds of ICILS. All countries that took part in any of these studies and cycles are included. Using descriptive analyses (e.g. benchmarks, correlations, scatterplots) and ordinary least squares (OLS) regression, we find modest variation in exclusion rates across countries and a relatively small increase in exclusion rates in some countries over time. We also demonstrate that exclusion rates tend to be higher in studies of primary school students than in studies of secondary school students. Finally, while there seems to be little relationship between exclusion rates and response rates, there is a modest association between the level of exclusions and test performance. Given the relatively high and rising level of exclusions in some countries, it is important that exclusion rates do not increase any further and, ideally, start to decline.

References:

Anders, J., Has, S., Jerrim, J., Shure, N., & Zieger, L. (2020). Is Canada really an education superpower? The impact of non-participation on results from PISA 2015. Educational Assessment, Evaluation and Accountability, 33(1), 229–249. https://doi.org/10.1007/s11092-020-09329-5
Jerrim, J. (2021). PISA 2018 in England, Northern Ireland, Scotland and Wales: Is the data really representative of all four corners of the UK? Review of Education, 9(3). https://doi.org/10.1002/rev3.3270
 

From Paper-pencil to Online Delivery? The Mode Effect and Bias of Non-Participation in Home Questionnaires.

Alec Kennedy (IEA), Rune Müller Kristensen (Aarhus University), Yuan-Ling Liaw (IEA)

International large-scale assessments (ILSAs) administer home questionnaires to parents or guardians to collect important information about students' home context and early literacy learning experiences that cannot be collected from students directly. Non-participation rates in these home surveys are often much higher than in student surveys, so their representativeness may be in jeopardy, and non-participation may result in biased estimates. To make the home surveys more accessible, countries now have the option of administering them either in a paper-and-pencil or a computer-based format (e.g., DeLeeuw, 2018). In this study, we investigate two conditions related to parental non-participation. First, we investigate non-participation bias at different levels of non-participation in home questionnaires in both the TIMSS and PIRLS studies. Second, we investigate whether non-response rates can be attributed to an administration mode effect. To identify this mode effect, we take advantage of international data from several TIMSS and PIRLS cycles and compare participation rates on the home questionnaires before and after the switch from a paper-and-pencil format to an online survey system. Based on data from more than 70 countries, we examine the effect of changing the form of administration in fixed-effects analyses for countries and study cycles. In further analyses, we examine the interaction between respondents' characteristics (e.g., education level) and the mode effect on participation rates. We find lower participation in almost all countries that switched to online surveys, with differences of up to about 20 percent. Furthermore, we find evidence that socially disadvantaged families in particular participate less in the surveys.
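The intuition behind the before/after comparison described here can be shown with a within-country contrast: each country's change in home-questionnaire participation after switching modes is computed, and averaging those changes is the logic a country fixed-effects estimate formalizes. The participation rates below are invented for illustration, not results from TIMSS or PIRLS.

```python
# Illustrative sketch (invented rates): within-country participation change
# after a switch from paper to online home questionnaires.
from statistics import mean

paper = {"A": 0.92, "B": 0.88, "C": 0.95}    # participation under paper mode
online = {"A": 0.74, "B": 0.71, "C": 0.77}   # participation after the switch

# Country-specific changes; their average is the intuition behind a
# fixed-effects estimate of the mode effect.
changes = {c: online[c] - paper[c] for c in paper}
mode_effect = mean(changes.values())
print(f"Average change after switching online: {mode_effect:+.2f}")
```

A full fixed-effects analysis would additionally control for cycle effects and interact the mode indicator with respondent characteristics, as the abstract describes.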

References:

DeLeeuw, E. D. (2018). Mixed-Mode: Past, Present, and Future. Survey Research Methods, 12(2), 75-89. https://doi.org/10.18148/srm/2018.v12i2.7402
 

 
Conference: ECER 2023
Conference Software: ConfTool Pro 2.6.150+TC
© 2001–2024 by Dr. H. Weinreich, Hamburg, Germany