Session
09 SES 01 B: Insights into Learning and Assessment
Paper Session
Presentations
09. Assessment, Evaluation, Testing and Measurement
Paper

Dangers of the Skip-Button - How Learning Analytics Can Enhance Learning Platforms and Student Learning

University of Flensburg, Germany

Presenting Author: A high number of European students face challenges in reading (Betthäuser et al., 2023; Mullis et al., 2023) and require reading support, as the latest Progress in International Reading Literacy Study (PIRLS) showed last year. Consequently, it is crucial to employ formative diagnostics to record students' learning levels, enabling the design of targeted interventions at an early stage. As a result, the German Kultusministerkonferenz [Standing Conference of the Ministers of Education and Cultural Affairs] advocates for the early implementation of nationwide diagnostics (Köller et al., 2022). Furthermore, PIRLS has not only been conducted as a computer-based assessment since 2016 but also places a focus on digital forms of reading (Mullis & Martin, 2019). Formative assessment serves not only to monitor students' learning but also to provide ongoing feedback, refine teaching approaches, and address the unique needs of individual students (OECD, 2005; OECD, 2008); it is essential for both students and teachers. In the realm of monitoring and aggregating student data, learning analytics plays a vital role for researchers and developers of educational applications: researchers seek to understand students' learning behaviors and their use of platforms, while developers strive to leverage this information to enhance learning platforms (SoLAR, 2021). In the DaF-L project, an adaptive, digital, and competence-oriented reading screening with aligned reading packages, consisting of literary texts and reading exercises, was developed, tested, standardized, and subsequently made available as an Open Educational Resource (OER) on the online learning platform Levumi (Gebhardt et al., 2016) for primary schools in Germany. The digital reading packages were developed for three reading ability levels, into which students were sorted via the screening. Across ability levels, the packages share the same story line for the literary texts and the same exercise formats, with some variation depending on the level. One key feature of the reading packages is their digitalization: they support different ability levels and individualized learning in which students can work on the exercises at their own pace, with integrated tools such as immediate individualized feedback, second-try options, and solutions. Essentially, the reading packages can also be used as a diagnostic tool, enabling teachers to support students in the best way possible. Teachers are required and encouraged to conduct diagnostics in order to support their students. However, this is time-consuming and difficult to carry out, as teachers often do not know which ability test(s) they should administer, how to determine whether an application was helpful, whether students used their full potential when answering questions, where they might need help, and which tools in an application were helpful or unnecessary. Therefore, the DaF-L project provides teachers with the diagnostic tools they need to support their students. Furthermore, through the intervention study, the researchers who developed the digital reading packages received essential data.
The gathered data offered insights into the reading packages, allowing for an assessment of their strengths and weaknesses. This information was used to make essential adjustments, ensuring the development of an optimal application. Moreover, a four-week intervention was conducted in a regular classroom setting, continuously gathering data to gain a deeper understanding of students' learning behaviors and their use of the learning platform. The presentation will discuss the digitalization of the reading packages with a focus on learning analytics, exploring how measures such as the time dedicated to reading the literary texts, the time spent answering questions, and the use of the skip-button can improve learning platforms worldwide.

Methodology, Methods, Research Instruments or Sources Used

The collaborative project follows a multi-method design. An ABA design was selected for the intervention study (Graham et al., 2012). The study ran from April 2023 until July 2023 and collected quantitative data on individuals, groups, and classes. It consisted of a survey group (N = 59) and a control group (N = 53). A) The initial testing consisted of the self-developed digital and competence-oriented reading screening and the ELFE 2, an established diagnostic test. B) Approximately two weeks later, in the first lesson, the students took a self-developed digital a-version test tailored to their reading packages, marking the start of a four-week intervention phase. The intervention (reading support) occurred three times a week for 30 minutes within a classroom setting. Students were provided with a reading package based on their proficiency level and worked on it individually. Throughout all intervention sessions, students' responses, as well as any additional information regarding their learning and their platform usage, were digitally recorded. At the end of the intervention, students took the b-version of the aligned test as well as a second administration of the competence-oriented reading screening and the ELFE 2. A) A follow-up was conducted with the ELFE 2, the screening, and the c-version of the aligned test. Throughout the study, educators were interviewed, and observation protocols were employed. The learning platform Levumi and the digital reading packages underwent adjustments based on insights gleaned from these interviews and observation protocols. However, the most intriguing insights into the students' learning behavior emerged from the data collected during the intervention (learning analytics). These data encompassed various aspects, including the time students spent completing the entire reading package, the time allocated to each exercise, the time students devoted to initially reading the literary text, the time spent rereading the text, whether students attempted exercises a second time, and the frequency of skip-button usage. Moreover, the recorded learning behavior could also be compared with the test results, examining whether factors such as the time spent on reading materials align with test scores.
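To make the kind of aggregation behind these learning analytics concrete, the following minimal sketch computes per-student time on task and skip-button counts from platform event records. It is an illustrative assumption rather than the project's actual implementation: the event-log structure, field names, and the summarize helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical event records as a learning platform might log them:
# each event has a student id, an exercise id, an action, and a timestamp in seconds.
events = [
    {"student": "s01", "exercise": "ex1", "action": "open",   "t": 0},
    {"student": "s01", "exercise": "ex1", "action": "answer", "t": 95},
    {"student": "s01", "exercise": "ex2", "action": "open",   "t": 100},
    {"student": "s01", "exercise": "ex2", "action": "skip",   "t": 104},
]

def summarize(events):
    """Aggregate seconds spent per exercise and skip-button usage per student."""
    time_on_exercise = defaultdict(float)  # (student, exercise) -> seconds
    skips = defaultdict(int)               # student -> number of skips
    opened_at = {}                         # (student, exercise) -> timestamp of "open"
    for event in events:
        key = (event["student"], event["exercise"])
        if event["action"] == "open":
            opened_at[key] = event["t"]
        elif event["action"] in ("answer", "skip"):
            if key in opened_at:
                time_on_exercise[key] += event["t"] - opened_at.pop(key)
            if event["action"] == "skip":
                skips[event["student"]] += 1
    return dict(time_on_exercise), dict(skips)

times, skips = summarize(events)
print(times)  # {('s01', 'ex1'): 95.0, ('s01', 'ex2'): 4.0}
print(skips)  # {'s01': 1}
```

Per-exercise durations and skip counts of this kind can then be related to the test scores, as described above.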
Conclusions, Expected Outcomes or Findings

The insights gained from interviews with educators and from the observation protocols played a crucial role in advancing and enhancing the learning platform Levumi and the digital reading packages. While the observation protocols partially revealed intriguing findings, the collected data, especially concerning the misuse of the skip-button, yielded noteworthy revelations. The collected information also shed light on important aspects of students' reading and response behavior, including the time spent reading the text before tackling the exercises, the duration devoted to each exercise, the use of the go-back-to-text function for rereading, the use of the show-solution feature, engagement with the second attempt, and the frequency of skip-button usage. Furthermore, based on the collected data, adjustments were made to the reading packages; others, such as the removal of the skip-button in response to the observed misuse, are planned. Moreover, the learning behavior could also be examined in relation to the test results, for example whether excessive usage of the skip-button has a negative effect on test results or whether a short reading time correlates with them. Additionally, the insights gained in the project can be applied to enhance other learning platforms worldwide.

References

Betthäuser, B. A., Bach-Mortensen, A. M., & Engzell, P. (2023). A systematic review and meta-analysis of the evidence on learning during the COVID-19 pandemic. Nature Human Behaviour. Advance online publication.

Gebhardt, M., Diehl, K., & Mühling, A. (2016). Online Lernverlaufsmessung für alle SchülerInnen in inklusiven Klassen. www.LEVUMI.de. Zeitschrift für Heilpädagogik, 67(10), 444-454.

Graham, J. E., Karmarkar, A. M., & Ottenbacher, K. J. (2012). Small sample research designs for evidence-based rehabilitation: Issues and methods. Archives of Physical Medicine and Rehabilitation, 93(8, Supplement), S111-S116. https://doi.org/10.1016/j.apmr.2011.12.017

Köller, O., Thiel, F., van Ackeren, I., Anders, Y., Becker-Mrotzek, M., Cress, U., Diehl, C., Kleickmann, T., Lütje-Klose, B., Prediger, S., Seeber, S., Ziegler, B., Kuper, H., Stanat, P., Maaz, K., & Lewalter, D. (2022). Basale Kompetenzen vermitteln – Bildungschancen sichern. Perspektiven für die Grundschule. Gutachten der Ständigen Wissenschaftlichen Kommission der Kultusministerkonferenz (SWK). Bonn: SWK.

Mullis, I. V. S., von Davier, M., Foy, P., Fishbein, B., Reynolds, K. A., & Wry, E. (2023). PIRLS 2021 international results in reading. Boston College, TIMSS & PIRLS International Study Center. https://doi.org/10.6017/lse.tpisc.tr2103.kb5342

Mullis, I. V. S., & Martin, M. O. (Eds.). (2019). PIRLS 2021 assessment frameworks. Boston College, TIMSS & PIRLS International Study Center. https://timssandpirls.bc.edu/pirls2021/frameworks/

Organisation for Economic Co-operation and Development (OECD). (2005). Formative assessment: Improving learning in secondary classrooms. Policy brief. OECD. https://www.oecd.org/education/ceri/35661078.pdf

Organisation for Economic Co-operation and Development (OECD). (2008). Assessment for learning - formative assessment. OECD. https://www.oecd.org/site/educeri21st/40600533.pdf

Society for Learning Analytics Research (SoLAR). (2021). What is learning analytics? https://www.solaresearch.org/about/what-is-learning-analytics/

Stanat, P., Schipolowski, S., Schneider, R., Sachse, K. A., Weirich, S., & Henschel, S. (2022). IQB-Bildungstrend 2021: Kompetenzen in den Fächern Deutsch und Mathematik am Ende der 4. Jahrgangsstufe im dritten Ländervergleich. Waxmann.

09. Assessment, Evaluation, Testing and Measurement
Paper

The Missing Piece in Multi-Informant Assessments? A Systematic Review on Self-Reports of School-Aged Participants with ADHD

1 University of Vienna, Austria; 2 North-West University, Research Focus Area Optentia, Vanderbijlpark, South Africa

Presenting Author: Recent evidence suggests that multi-informant assessments of children and adolescents with Attention-Deficit/Hyperactivity Disorder (ADHD) and co-occurring problems are more likely to provide sufficient sensitivity and specificity for population screening and clinical use than single measures (De Los Reyes et al., 2015); however, the perspectives of children and adolescents are underrepresented in scientific studies (Caye et al., 2017; for a review see Mulraney et al., 2022). The question remains whether children and adolescents are reliable sources of information about their own ADHD symptoms. This may point to the need to investigate the complex interplay between self-reported and other-reported (i.e., by teachers and parents) ADHD symptoms (e.g., hyperactivity, inattention) and other externalizing (e.g., aggression) and internalizing (e.g., anxiety) problems. This review aims to systematically analyze and examine existing empirical studies that have compared self-reported and other-reported ADHD symptoms and co-occurring behavior problems in children and adolescents with ADHD. The purpose is to evaluate (1) the overall inclusion of self-reports in the assessment process, (2) the agreement between informants, (3) which types of informants are frequently used, and (4) the instruments utilized.

Methodology, Methods, Research Instruments or Sources Used

Eligible studies published over the past decade were identified in four major databases (PubMed, ERIC, PsycINFO, Web of Science) and retrieved from educational and psychological peer-reviewed journals through a thorough manual hand search. Following the PRISMA 2020 guidelines (Brennan & Munn, 2021) for inclusion and exclusion criteria, the study focuses on prospective data collection with school-aged participants to minimize the recall bias associated with retrospective data reported in previous studies (von Wirth et al., 2021).

Conclusions, Expected Outcomes or Findings

Only 11 out of 467 selected studies published between 2003 and 2023 that involved a sample of diagnosed school-aged participants met the pre-defined inclusion criteria. Rater agreement differs by (1) the type of other informant (i.e., teachers or parents), (2) methodological procedures, (3) the assessment instruments utilized and their psychometric properties, and (4) the constructs measured. A variety of screening measures were utilized, with questionnaires predominating over interviews. In addition to teacher reports, parent reports were commonly included, with only one study gathering information from objective measurement methods. The review emphasizes that researchers who include self-reports need to be aware that young participants with ADHD often tend to underreport their behavior problems. Considering the strengths and limitations of the study, implications for practice and future research concerning existing inconsistencies in the conceptualization of externalizing problems are discussed.

References

Brennan, S. E., & Munn, Z. (2021). PRISMA 2020: A reporting guideline for the next generation of systematic reviews. JBI Evidence Synthesis, 19(5), 906-908. https://doi.org/10.11124/JBIES-21-00112

Caye, A., Machado, J. D., & Rohde, L. A. (2017). Evaluating parental disagreement in ADHD diagnosis: Can we rely on a single report from home? Journal of Attention Disorders, 21(7), 561-566. https://doi.org/10.1177/1087054713504134

De Los Reyes, A., Augenstein, T. M., Wang, M., Thomas, S. A., Drabick, D. A. G., Burgers, D. E., & Rabinowitz, J. (2015). The validity of the multi-informant approach to assessing child and adolescent mental health. Psychological Bulletin, 141(4), 858-900. https://doi.org/10.1037/a0038498

Mulraney, M., Arrondo, G., Musullulu, H., Iturmendi-Sabater, I., Cortese, S., Westwood, S. J., Donno, F., Banaschewski, T., Simonoff, E., Zuddas, A., Döpfner, M., Hinshaw, S. P., & Coghill, D. (2022). Systematic review and meta-analysis: Screening tools for attention-deficit/hyperactivity disorder in children and adolescents. Journal of the American Academy of Child & Adolescent Psychiatry, 61(8), 982-996. https://doi.org/10.1016/j.jaac.2021.11.031

von Wirth, E., Mandler, J., Breuer, D., & Döpfner, M. (2021). The accuracy of retrospective recall of childhood ADHD: Results from a longitudinal study. Journal of Psychopathology and Behavioral Assessment, 43(2), 413-426.

09. Assessment, Evaluation, Testing and Measurement
Paper

The Relationship Between Students' Response Times and Their Socioeconomic Status in European Countries: A Case of Achievement Motivation Questionnaire Items

The Anchoring Center for Educational Research, Faculty of Education, Charles University, Czech Republic

Presenting Author: Student questionnaire data, typically collected via Likert-scale items, is commonly used to compare different groups of students, be it across countries or based on student characteristics such as gender and socioeconomic status (e.g., OECD, 2017). However, such analyses can lead to inaccurate conclusions, as the data might be biased due to differences in reporting behavior between different groups of students (e.g., He & van de Vijver, 2016; Kyllonen & Bertling, 2013). Students can, for example, differ in the amount of effort they put into filling in the questionnaire.
Our theoretical framework relates to reporting behavior in surveys. The terms "careless responding" and "insufficient effort responding" have been used to describe response patterns in which respondents lack the motivation to answer accurately and do not pay attention to the content of items and survey instructions (Goldammer et al., 2020). A number of approaches have been suggested to identify such careless responding, the analysis of response time (to the whole survey or parts of it) being one of them (Curran, 2016; Goldammer et al., 2020). It rests on the assumption that there is a minimum time needed to read and answer a questionnaire item (Goldammer et al., 2020). The term "speeding" has been used for responding to questionnaire items too fast to give much thought to the answers (Zhang & Conrad, 2014).
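As a minimal illustration of this response-time criterion, one could flag a screen response as potential speeding when its recorded time falls below a minimum plausible reading-and-answering time. The sketch below is for illustration only: the threshold value and data structure are hypothetical assumptions, not taken from the cited studies.

```python
# Hypothetical per-respondent response times (in seconds) for one questionnaire screen.
response_times = {"r1": 41.2, "r2": 7.5, "r3": 88.0, "r4": 12.9}

# Assumed minimum plausible time to read and answer the screen; in practice,
# such a threshold has to be justified for the specific instrument and items.
MIN_PLAUSIBLE_SECONDS = 15.0

def flag_speeding(times, threshold=MIN_PLAUSIBLE_SECONDS):
    """Return the ids of respondents whose response time suggests speeding."""
    return {respondent for respondent, t in times.items() if t < threshold}

print(sorted(flag_speeding(response_times)))  # ['r2', 'r4']
```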
The analysis of response times is a promising tool for identifying differences between groups of students in the amount of effort they put into filling in questionnaire surveys. It could help identify careless responding (a) across different groups of students within a single wave of measurement as well as (b) changes in the careless responding of particular groups of students across different waves of measurement. This could be exploited, for example, in longitudinal research studies using questionnaires (e.g., [foreign language] learning motivation studies) as well as in international large-scale assessment (ILSA) studies such as the Programme for International Student Assessment (PISA). So far, however, knowledge concerning the differences in questionnaire item response times between different groups of students in the context of ILSA studies is rather limited.
Previous research has suggested that students' reporting behavior may differ across socioeconomic groups (Vonkova et al., 2017), encouraging further exploration of the relationship between reporting behavior and socioeconomic status. In this contribution, we address this research area. Our aim is to analyze the relationship between students' response times to achievement motivation questionnaire items and their socioeconomic status in the European countries participating in PISA 2015. Our research question is: How do questionnaire response times to achievement motivation items differ between students whose parents have attained different education levels in the European countries participating in PISA 2015?

Methodology, Methods, Research Instruments or Sources Used

We analyze data from the PISA 2015 questionnaire, focusing on 171,762 respondents from 29 European countries who were administered the questionnaire via computer. Specifically, we look at the response time to question ST119 (achievement motivation), and we use the highest education achieved by parents (PISA variable HISCED) as an indicator of students' socioeconomic status. Only respondents who had complete information on all analysed variables were included in this analysis. In question ST119, respondents were asked to respond to five statements on achievement motivation using the responses Strongly disagree (1), Disagree (2), Agree (3), and Strongly agree (4). The five statements were: (1) I want top grades in most or all of my courses; (2) I want to be able to select from among the best opportunities available when I graduate; (3) I want to be the best, whatever I do; (4) I see myself as an ambitious person; and (5) I want to be one of the best students in my class (OECD, 2014). Response times were taken from the response time dataset for PISA 2015, specifically the variable ST119_TT. They were logged in milliseconds for each screen (in this case, the screen containing the five items relating to achievement motivation). For the purposes of our analysis, we set an upper limit of two minutes for students to be included, because the vast majority of respondents were able to respond within this time interval and only 406 respondents took longer. These were typically respondents who took extremely long (one even nearing an hour spent on the screen) and, as such, would negatively affect the analysis by not displaying standard response behavior. Information on parental education levels (HISCED) was extracted from the PISA 2015 dataset, which uses the ISCED (International Standard Classification of Education) 1997 classification. HISCED categories range from 0 to 6, representing various levels of educational attainment: HISCED0 represents uncompleted ISCED level 1; HISCED1 and HISCED2 represent ISCED levels 1 and 2, respectively; HISCED3 represents ISCED 3B and 3C; HISCED4 represents ISCED 3A and 4; HISCED5 represents ISCED 5B; and HISCED6 represents ISCED 5A and 6. Due to the low number of observations, we combined the HISCED 0-2 categories for the purposes of our analysis.
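The data preparation just described (complete cases only, a two-minute upper limit on ST119_TT, and collapsing HISCED 0-2) can be outlined as follows. This is an illustrative sketch rather than the authors' code: the file name and column layout are assumptions, while ST119_TT (screen-level response time in milliseconds) and HISCED (parental education) are the PISA variables named in the abstract.

```python
import pandas as pd

# Hypothetical merged dataset containing one row per respondent.
df = pd.read_csv("pisa2015_st119_times.csv")

# Keep complete cases and responses within the two-minute upper limit.
df = df.dropna(subset=["ST119_TT", "HISCED"])
df = df[df["ST119_TT"] <= 120_000]  # 120,000 ms = 2 minutes

# Collapse HISCED 0-2 into a single category due to the low number of observations.
df["hisced_group"] = df["HISCED"].astype(int).astype(str)
df.loc[df["HISCED"] <= 2, "hisced_group"] = "0-2"

# Summary of response times (converted to seconds) per parental education group.
summary = (
    df.assign(seconds=df["ST119_TT"] / 1000)
      .groupby("hisced_group")["seconds"]
      .agg(["count", "mean", "median", "std"])
)
print(summary)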
Conclusions, Expected Outcomes or Findings

Our initial analysis showed a notable inverse relationship between mean response times to question ST119 and HISCED in the European countries: respondents from families with a lower educational background took longer when answering. This is further highlighted when looking both at median times and at the results of a linear regression with country fixed effects (response time being the explained variable, and HISCED levels and country dummies the explanatory variables), both of which display the same trend. However, when examining the variation within each HISCED group, the data showed that the HISCED0-2 group had a considerably higher variation in response time than all other HISCED groups, the lowest being in the HISCED5 group. This suggests greater heterogeneity in response time within the HISCED0-2 group, indicating that this group consists of a mix of respondents with low and high response times to question ST119. Our results show that it is necessary to take response times into consideration when comparing groups of respondents, as they can potentially affect the analysis. Further research may focus on the relationship between response times and home possessions or other indicators of socioeconomic status. Additionally, further research may analyze other world regions and compare them with the European results.

References

Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4-19. https://doi.org/10.1016/j.jesp.2015.07.006

Goldammer, P., Annen, H., Stöckli, P. L., & Jonas, K. (2020). Careless responding in questionnaire measures: Detection, impact, and remedies. The Leadership Quarterly, 31(4), Article 101384. https://doi.org/10.1016/j.leaqua.2020.101384

He, J., & van de Vijver, F. J. R. (2016). The motivation-achievement paradox in international educational achievement tests: Toward a better understanding. In R. B. King & A. B. I. Bernardo (Eds.), The psychology of Asian learners: A festschrift in honor of David Watkins (pp. 253-268). Springer Science. https://doi.org/10.1007/978-981-287-576-1

Kyllonen, P. C., & Bertling, J. (2013). Innovative questionnaire assessment methods to increase cross-country comparability. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), A handbook of international large-scale assessment data analysis: Background, technical issues, and methods of data analysis (pp. 277-285). Chapman Hall/CRC Press.

OECD. (2014). PISA 2015 student questionnaire (computer-based version). https://www.oecd.org/pisa/data/CY6_QST_MS_STQ_CBA_Final.pdf

OECD. (2017). PISA 2015 results (Volume III): Students' well-being. https://doi.org/10.1787/9789264273856-en

Vonkova, H., Bendl, S., & Papajoanu, O. (2017). How students report dishonest behavior in school: Self-assessment and anchoring vignettes. The Journal of Experimental Education, 85(1), 36-53. https://doi.org/10.1080/00220973.2015.1094438

Zhang, C., & Conrad, F. (2014). Speeding in web surveys: The tendency to answer very fast and its association with straightlining. Survey Research Methods, 8(2), 127-135. https://doi.org/10.18148/srm/2014.v8i2.5453