Conference Agenda

Overview and details of the sessions of this conference.

Please note that all times are shown in the time zone of the conference.

Session Overview
Session
09 SES 13 B: Assessment Practices and School Development: Fostering Fairness and Effective Implementation
Time:
Thursday, 24/Aug/2023:
5:15pm - 6:45pm

Session Chair: Alli Klapp
Location: Gilbert Scott, 253 [Floor 2]

Capacity: 40 persons

Paper Session

Presentations
09. Assessment, Evaluation, Testing and Measurement
Paper

How to Deal with the Challenge of Assessment – Performance and Assessment Culture as an Issue of School Development

Marius Diekmann, Sabine Gruehn, Carolin Kruell

University of Münster, Germany

Presenting Author: Diekmann, Marius; Kruell, Carolin

Without doubt, everyday assessment and responses to student performance are central facets of school quality and an important field of innovation in schools. The development of a “new”, formative performance and assessment culture, suitable for initiating and supporting the acquisition and development of both subject-specific and interdisciplinary/generic competences of students (e.g. self-regulated learning, social competences), seems to be a necessary condition for the development of teaching and learning in general (cf. MfSW NRW 2009; Beutel et al. 2017; Wiliam 2018). In this context, the term “culture” refers to a fundamental change of assessment practice that is not limited to the selective use of a few additional or alternative diagnostic instruments by only a few teachers (cf. Jürgens & Diekmann 2006; Box 2019, 42/143). According to Sacher (2014, 264), the development of such a new, formative performance and assessment culture in schools will only succeed if it is based on a jointly formulated assessment concept in which the teaching staff fixes objectives, guiding principles and concrete agreements on assessment practice. Such an assessment concept, Sacher points out, must be implemented, regularly evaluated, discussed, and revised as an essential part of the school program.

In which ways, to what extent and how successfully individual schools undertake efforts regarding this change and development of performance and assessment culture(s) (cf. e.g. Winter 2012) has hardly been investigated empirically (in Germany). Most findings on performance and assessment culture and associated innovations relate to schools that can be described as “extraordinary”: schools that, for example, have been nominated for the (nationwide) German School Award (cf. Porsch et al. 2014; Beutel & Pant 2020) or have a special pedagogical profile (e.g. Montessori, cf. Diekmann 2018). There are almost no empirical findings that give a broader impression of focal points, achievements, or school form/grade-specific characteristics of the change in performance/assessment culture at “ordinary” schools.

One exception is the findings obtained in the context of an external evaluation (“Qualitätsanalyse”) of schools in North Rhine-Westphalia (a federal state of Germany). During this evaluation, various methods were used to gain a comprehensive impression of the work and quality of schools: among other things, classroom observations were conducted and so-called school portfolios were reviewed. The school portfolios contained various documents specific to individual schools, such as school programs; the performance concepts developed by the individual schools were a mandatory component. In summary, the performance/assessment concepts schools had to submit during the evaluation were characterized as unsatisfactory and in need of development (cf. MfSW NRW 2009, 34). Unfortunately, this conclusion is not explained in detail. Regarding specific differences between school types and levels, only a few findings are reported: for example, the performance/assessment concepts at secondary schools are comparatively subject-specific (compared with those at primary schools), whereas the concepts at primary schools prove to be more elaborate about the formative use of individual diagnostics (cf. MfSW NRW 2016, 30-32). The following questions arise from this:
Research question 1: How are performance/assessment concepts designed in terms of scope and content? Are there any school level/form-specific priorities or features?
Research question 2: Are performance/assessment concepts embedded in a whole school approach of school development?


Methodology, Methods, Research Instruments or Sources Used
To gain fundamental insights into the questions raised above, we conducted an explorative content analysis of diverse documents dealing with performance and assessment, found on the homepages of 100 randomly selected primary schools and 100 randomly selected secondary schools in North Rhine-Westphalia. In addition to the texts explicitly designated as performance/assessment concepts, we also incorporated texts and text passages that deal, for example, with grading practice in various subjects. After downloading the documents from the schools’ homepages between January and May 2022, we developed, tested, and revised the category system for the content analysis. The categories were derived both from the material itself and from the academic discussion (cf. e.g. Bohl 2018) and guidelines issued by the educational administration (cf. QUA-LiS NRW 2011). Statements in the performance/assessment concepts and in related performance/assessment information that could be assigned to the following (superordinate) categories were coded and counted: general and subject-specific principles and objectives of (performance) assessment; quality criteria for (performance) assessment; forms and instruments of (performance) assessment; concretization and implementation of legal requirements; the performance/assessment concept in the context of school and teaching development; innovations; evaluation and revision. Analyses of variance and t-tests were used to examine whether there are significant school-level and school-form-specific differences. One advantage of document analysis is that it is much less prone to social desirability effects than, for example, a written survey or an interview. One of its disadvantages, however, is that the origin and authorship of the analyzed material cannot always be traced. It is therefore usually recommended to combine different methods of data collection, which is what we intend to do in the next step: based on the findings of our document analyses, we plan to conduct in-depth interviews with school administrators and written surveys of teachers.
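Purely as an illustration of the quantitative step described above, the following minimal Python sketch compares coded category counts between the two school forms with a Welch t-test. The file name and column names are hypothetical assumptions, not part of the study's actual tooling.

import pandas as pd
from scipy import stats

# One row per analyzed school document (hypothetical layout):
# 'school_type' is "primary" or "secondary"; 'n_innovation_codes' is the
# number of coded statements in the superordinate category "innovations".
df = pd.read_csv("coded_concepts.csv")

primary = df.loc[df["school_type"] == "primary", "n_innovation_codes"]
secondary = df.loc[df["school_type"] == "secondary", "n_innovation_codes"]

# Welch's t-test (no equal-variance assumption) for the school-form difference
t, p = stats.ttest_ind(primary, secondary, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")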
Conclusions, Expected Outcomes or Findings
Research question 1: The length of the performance/assessment concepts and of their subject-specific parts varies considerably within the sample, ranging from 3 to 149 pages (primary schools) and from 0 to 139 pages (secondary schools). Performance/assessment concepts at primary schools typically consist of an interdisciplinary part and a subject-specific part. The latter is usually not included in the performance/assessment concepts of secondary schools but may be found in a separate document (a subject-specific performance/assessment concept). Practically all of the performance/assessment concepts contain statements on quality criteria and principles of performance measurement to which the respective school feels (particularly) committed, as well as statements on the concretization and implementation of legal requirements found in the School Act, in examination regulations or in decrees.
Research question 2: Just under half (primary schools) to two-thirds (secondary schools) of the performance/assessment concepts contain basic statements about how they came into being. References to the evaluation and revision of the performance/assessment concepts are less frequent and less extensive. It is quite remarkable that, especially at primary schools, a connection to the individual school program/school profile is established only in exceptional cases. In contrast, innovations (e.g., the use of new/formative instruments) are reported more frequently in the performance/assessment concepts of primary schools than in those of secondary schools.

To put it simply, the analyzed performance/assessment concepts largely prove to be descriptions of the existing practice of performance measurement and assessment, some of them specific to the school level, as well as concretizations of binding, general requirements. As programs for innovation, i.e. for the development and implementation of a “new” performance culture as suggested by Sacher (2014, 264), the performance/assessment concepts seem to be (still?) little used. Examining the reasons for this finding is one of the purposes of our planned follow-up study.

References
Beutel, S.-I.; Höhmann, K.; Pant, H. A.; Schratz, M. (Hg.) (2017): Handbuch Gute Schule. Sechs Qualitätsbereiche für eine zukunftsweisende Praxis. 2. Auflage. Seelze: Kallmeyer.
Beutel, S.-I.; Pant, H. A. (2020): Lernen ohne Noten. Alternative Konzepte der Leistungsbeurteilung. Stuttgart: Kohlhammer.
Bohl, T. (2018): Ewige Baustelle? Von pädagogischer Innovation und diagnostischer Qualität. In: Lernende Schule 21 (84), 3-7.
Box, Cathy (2019): Formative Assessment in United States Classrooms. Changing the Landscape of Teaching and Learning. London: Palgrave Macmillan.
Diekmann, M. (2018): Wortgutachten, Zeugnisbriefe und Rasterzeugnisse. Zur Beurteilungspraxis an bayerischen Montessori-Schulen. In: Lernende Schule 21 (84), 30-34.
Jürgens, E.; Diekmann, M. (2006): Lernleistungen von und mit Kindern erfassen und bewerten. In: P. Hanke (Hg.): Grundschule in Entwicklung. Herausforderungen und Perspektiven für die Grundschule heute. Münster: Waxmann, 206-229.
Ministerium für Schule und Weiterbildung des Landes Nordrhein-Westfalen (MfSW NRW) (2009): Qualitätsanalyse in Nordrhein-Westfalen. Impulse für die Weiterentwicklung von Schulen. Düsseldorf.
Ministerium für Schule und Weiterbildung des Landes Nordrhein-Westfalen (MfSW NRW) (2016): Qualitätsanalyse in Nordrhein-Westfalen. Landesbericht 2016. Düsseldorf.
Porsch, R.; Ruberg, C.; Testroet, I. (2014): Elemente einer Didaktik der Vielfalt. Die Bewerbungsportfolios der Schulen. In: S.-I. Beutel, W. Beutel (Hg.): Individuelle Lernbegleitung und Leistungsbeurteilung. Lernförderung und Schulqualität an Schulen des Deutschen Schulpreises. Schwalbach/Ts.: Wochenschau Verlag, 16-87.
Qualitäts- und UnterstützungsAgentur – Landesinstitut für Schule (QUA-LiS NRW) (2011): Anlage 1.4 Checkliste – Leistungskonzept (Material Nr. 2955), verfügbar unter: schulentwicklung-nrw.de.
Sacher, W. (2014): Leistungen entwickeln, überprüfen und beurteilen. Bewährte und neue Wege für die Primar- und Sekundarstufe. 6., überarbeitete und erweiterte Auflage. Bad Heilbrunn: Klinkhardt.
Wiliam, D. (2018): Embedded Formative Assessment. Second Edition. Bloomington, IN: Solution Tree.
Winter, F. (2012): Leistungsbewertung. Eine neue Lernkultur braucht einen anderen Umgang mit den Schülerleistungen. 5., überarbeitete und erweiterte Auflage. Baltmannsweiler: Schneider.


09. Assessment, Evaluation, Testing and Measurement
Paper

Knowing without Doing: Chinese Primary Citizenship Teachers’ Perceptions and Practices of Assessment Policies

Peng Zhang, Enze Guo

IOE, UCL’s Faculty of Education and Society

Presenting Author: Zhang, Peng; Guo, Enze

Based on the policy enactment perspective (Ball et al., 2011), this study investigates how primary citizenship teachers 'do' assessment policies in their practice and discusses the factors that shape this. While the citizenship programme remains non-statutory at the primary level in many countries, such as England (Richardson, 2010), it is mandatory in China owing to the emphasis on fostering socialist identity and moral cultivation. The programme standards mandate an assessment approach that focuses on students' 'values' and 'process performance' (Ministry of Education, 2022, p. 49) rather than test scores. Advocating 'assessment for learning' (ibid., p. 50), the standards call for a greater emphasis on formative assessment. To ensure the full implementation of national education policies (Lu et al., 2018, p. 113), China has established an internal agency system, the System of Pedagogical Research Officers. Pedagogical research officers for the primary citizenship programme are appointed by the district education authorities and have administrative powers. As intermediaries, they hold the authority and responsibility to interpret, translate, and organise citizenship assessment policies in practice.
Contrasting with the popular perception in China that relegates teachers to the role of policy implementers, scholars (Braun et al., 2011; Ball et al., 2011) acknowledge teachers as policy enactors. As Ball (1994, p. 19) asserts, policies do not typically provide a set course of action; rather, they create situations in which the choices for what to do are limited or altered, or specific aims or outcomes are established. The majority of educational policies depend on their realisation through teaching, positioning teachers not merely as implementers but as interpreters and 'translators' of policy (Perryman et al., 2017, p. 745). This act of 'translation' suggests that while teachers adhere to policy, they also make adaptive modifications.
The study reveals that all schools employ standardised tests, developed by the district's education authorities, as summative assessments for students from Year 3 onwards, despite the absence of such tests for Years 1 and 2. Notwithstanding the stipulation in the assessment policies that 'assessment results should be graded rather than scored' (Ministry of Education, 2022, p. 52), test results are ultimately rendered as scores. Formative assessment from Year 3 onwards is notably sparse, predominantly consisting of verbal feedback within classes. This is the case despite a unanimous acknowledgment that the curriculum standards advise against judging students' learning performance solely on test results.
Teachers perceive this disjunction, being aware of but not adhering to the assessment policies, as an outcome of the 'internal disintegration of the policies', a consequence of intermediary influences. In a manner akin to medieval bishops interpreting the Bible, the pedagogical research officer wields unassailable authority within their community to interpret and translate official assessment policies. Their instructions and guidelines are considered the truly applicable policies, while the national assessment policies are often disregarded as overly 'idealistic' and 'abstract'. Furthermore, formative assessment is dismissed as a privilege available only to economically developed regions, which have the financial means to engage national and international experts for knowledge dissemination and practical guidance. Teachers also face considerable pressure from parents: as primary schools increasingly serve as childcare providers, teachers interact more directly and frequently with parents, many of whom are sceptical of formative assessment because it yields no tangible scores or rankings. The teachers identify the prevailing culture of competition, the 'rat race' endemic in today's China, as the root of these challenges. As the country's economy slows, societal pressure to compete escalates, underscoring the view that 'excellence is not the sole goal, but more importantly, to be better than others'.


Methodology, Methods, Research Instruments or Sources Used
One district was chosen as the case for this study, located in the capital of a border province, a city which, despite being officially defined as a regional centre, has significantly less economic and cultural impact than Beijing and Shanghai. The collected data comprised both interview and documentary sources. Primary school citizenship educators within this district, invited via purposive sampling, contributed the interview data; this method helped ensure that participants could freely express their authentic opinions. Thirteen teachers, covering all primary schools in the district, were interviewed across two rounds. Each participant had over five years of experience teaching the citizenship programme and was actively involved in student assessment practices. Due to pandemic-induced international mobility restrictions, the interviews were conducted online.

The interview data were gathered in two rounds using semi-structured interviews. In the initial phase, during 2019-2020, educators were interviewed for approximately 60 minutes to capture their perspectives on formative and summative assessment. Following the introduction of the new citizenship standards in 2022, the same teachers were invited for a second round of interviews, with each session lasting close to 120 minutes; the objective was to gauge their views on the assessment policies and on particular practices. The interviews were conducted in Chinese and later translated into English. Subsequently, the interview data were subjected to thematic coding and analysis. Although this research is patently theory-driven, underpinned by the policy enactment perspective (Ball et al., 2011), the attempt was made to suspend pre-existing theoretical expectations and biases during the coding phase as far as practicable. This was not only because the study aimed to present 'open' results displaying authentic teacher viewpoints and practices, but also because it anticipated the emergence of themes beyond existing frameworks. Thematic data were continually compared throughout the coding process until saturation was achieved.

The documentary evidence encompassed the latest Primary Citizenship Programme Standards (2022 edition), 17 examination papers pertaining to the Year 3-6 citizenship programme (September 2016 to September 2022), and the topic outlines and associated documentation for in-service citizenship teacher training over the past four years (September 2018 to September 2022). These data were contributed by participants who believed the documents played a policy role and exerted a structural impact on their assessment practice.

Conclusions, Expected Outcomes or Findings
This study diverges from previous studies' emphasis on teachers' personal factors and the Confucian testing tradition (Herman et al., 2015; Poole, 2016; Yan et al., 2021), instead investigating how teachers 'do' official assessment policies using the policy enactment perspective (Ball et al., 2011). Contrary to findings from England (Braun et al., 2011) and Ireland (Skerritt et al., 2021), the agency system in China does not invariably act as a catalyst and facilitator; on the contrary, it tends to fragment national assessment policies. This study therefore disputes the widespread belief in China that inadequate policy execution is due to teachers' incompetence. Teachers often rely on intermediaries for policy interpretation, and these interpretations significantly influence their behaviours. Additionally, most prior assessment studies in East Asia were centred on metropolitan regions, such as Hong Kong (Yan et al., 2021) and Beijing (Lu et al., 2018). This study, however, revealed that teachers in non-metropolitan areas perceive formative assessment as a cultural benefit deriving from economic development, because metropolitan teachers have easier access to pertinent resources and support. The child-care role that China's primary schools play also exerts greater parental pressure on teachers' assessment practice than is reported for English secondary schools (Richardson, 2010).

Encouragingly, however, change is already underway. During the second round of interviews conducted in 2022, many teachers indicated efforts being made to enhance the status of formative assessment. They expressed gratitude towards this study, as it illuminated the utility of formative strategies in advancing student progress through practical experience, despite a lack of adequate support and training.

References
Ball, S.J. (1994). Education reform: A critical and post-structural approach, Buckingham, UK: Open University Press.

Ball, S. J., Maguire, M., & Braun, A. (2011). How schools do policy: Policy enactments in secondary schools. Routledge.

Braun, A., Ball, S. J., Maguire, M., & Hoskins, K. (2011). Taking context seriously: Towards explaining policy enactments in the secondary school. Discourse: Studies in the Cultural Politics of Education, 32(4), 585-596.

Herman, J., Osmundson, E., Dai, Y., Ringstaff, C., & Timms, M. (2015). Investigating the dynamics of formative assessment: Relationships between teacher knowledge, assessment practice and learning. Assessment in Education: Principles, Policy & Practice, 22(3), 344-367.

Lu, L. T., Shen, X., & Liang, W. (2018). The composition and characteristics of practical knowledge of district and prefectural level pedagogical research officers: An example of district and prefectural level pedagogical research officers in Beijing. Teacher Education Research (教师教育研究), 30(06), 112-118.

Ministry of Education, (2022). Curriculum standards for morality and the rule of law in compulsory education. Beijing: Beijing Normal University Press.

Perryman, J., Ball, S. J., Braun, A., & Maguire, M. (2017). Translating policy: Governmentality and the reflective teacher. Journal of Education Policy, 32(6), 745-756.

Poole, A. (2016). ‘Complex teaching realities’ and ‘deep rooted cultural traditions’: Barriers to the implementation and internalisation of formative assessment in China. Cogent Education, 3(1),1-14.

Richardson, M. (2010). Assessing the assessment of citizenship. Research Papers in Education, 25(4), 457-478.

Skerritt, C., McNamara, G., Quinn, I., O’Hara, J., & Brown, M. (2021). Middle leaders as policy translators: Prime actors in the enactment of policy. Journal of Education Policy, 1-19.

Yan, Z., & Brown, G. T. (2021). Assessment for learning in the Hong Kong assessment reform: A case of policy borrowing. Studies in Educational Evaluation, 68, 100985.


09. Assessment, Evaluation, Testing and Measurement
Paper

Investigation of Careless Responding on Self-Report Measures

Başak Erdem Kara

Anadolu University, Türkiye

Presenting Author: Erdem Kara, Başak

Scores from self-report measures are widely used in many areas of research. Since these instruments allow researchers to measure psychological constructs such as personality, attitudes, beliefs and emotions for many respondents in a short time, they are a popular means of data collection (Alarcon & Lee, 2022; Curran, 2015; Ulitzsch et al., 2022). However, important problems may occur when respondents do not give their best effort to select the response that correctly reflects them, which is very common especially among unmotivated respondents (Rios & Soland, 2021; Schroeders et al., 2022). Individuals may respond to items without reading them, misinterpret them, or be unmotivated to think about them (Huang et al., 2012; Ward & Meade, 2022). This type of responding behaviour has been described in the literature as random (Beach, 1989), careless (Meade & Craig, 2012), insufficient effort (Huang et al., 2012), or disengaged responding (Soland et al., 2019). In this study, the term 'careless responding', abbreviated CR, is used. CR behaviour is a major concern for any research based on data from self-report scales (Meade & Craig, 2012). Even if its amount is small, it may severely affect data quality and study results. Careless responses may introduce measurement error, weaken the relationships between variables and inflate Type II error. They may also introduce a new source of construct-irrelevant variance and thereby have undesirable effects on the psychometric properties of the scale (item difficulty, average scores, test reliability, factor structure, etc.). Briefly, CR has the potential to weaken the validity of test scores in different ways (Beck et al., 2019; Rios & Soland, 2021).

Considering the factors stated above, CR has become an important research topic attracting growing interest. One of the most important questions in CR research is how to detect and deal with careless responses to ensure the quality of survey data. Identifying careless respondents and removing them from the dataset is one suggested way to increase data quality. The literature offers several data screening methods, mainly classified into two groups: a priori and post hoc. A priori methods are planned and incorporated into the data collection process before the survey is administered. Post-hoc methods, by contrast, enter the process after data collection: they are applied to the collected dataset and are typically based on a statistical calculation.
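As an illustration of such a post-hoc calculation (not one of the methods used in this study), the 'longstring' index counts a respondent's longest run of identical consecutive responses; unusually long runs suggest careless straight-lining (cf. Curran, 2015). A minimal Python sketch with illustrative data:

from itertools import groupby

def longstring(responses):
    # Length of the longest run of identical consecutive responses
    return max(len(list(run)) for _, run in groupby(responses))

# A straight-liner versus an attentive respondent (illustrative data)
print(longstring([3, 3, 3, 3, 3, 3, 2]))  # -> 6, suspicious
print(longstring([4, 2, 5, 1, 4, 3, 2]))  # -> 1, attentive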

While several studies have examined the effect of careless responding on datasets and compared the efficacy of CR identification methods, there is still no clear answer about the detection accuracy of these methods (Goldammer et al., 2020). This study therefore focuses on a priori methods, which have received less attention in previous studies.

The present study examines three a priori methods (instructed response items, reverse items and self-report items), which are explained in detail in the methodology section. These three ways of CR identification will be applied, respondents flagged by each method will be removed from the dataset separately, and the effects on the psychometric properties of the data will be investigated. This study addresses the following research questions:

- How are careless respondents distributed across the three CR identification methods?

- How do the psychometric properties of the data (scale mean, reliability, correlation between factors, factor structure, etc.) change when careless respondents are removed according to the different CR identification methods?


Methodology, Methods, Research Instruments or Sources Used
The purpose of this study is to examine self-report data for careless responding, to investigate the effect of CR on the psychometric properties of the dataset, and to compare the performance of CR identification methods. Three a priori methods will be used for this purpose: instructed response items, reverse items and self-report items. Instructed response items instruct respondents to select one specific category; respondents who choose an option other than the instructed one are flagged as careless. Reverse items are used as attention checks: individuals are expected to select responses in opposite directions for an item and its reversed counterpart, so giving the same or very similar answers to both is taken as an indicator of CR. Lastly, self-report items directly ask individuals about their effort (e.g. 'I put forth my best effort in responding to this survey'; Meade & Craig, 2012).
In this study, a self-report scale will be used for data collection. A manipulated version of this instrument will be formed by adding one instructed response item ('Please select "strongly agree" for this item'), one purposefully reversed version of an existing scale item, and one self-report effort item at the end of the scale ('I did my best while responding to the scale'). The manipulated form will be administered to approximately 500 students.

Respondents will be flagged as careless by each method separately: those selecting a response other than the instructed one, those choosing the same or similar responses to the item and its reversed counterpart, and those whose answer to the self-report item indicates low effort. The percentage of careless respondents will be calculated for each method, and the psychometric properties of the data (scale mean, factor loadings, reliability, explained variance, etc.) will be examined. Careless respondents will then be excluded from the dataset according to the three methods separately, and the three remaining datasets will be examined again to see how the psychometric properties (scale means, reliabilities, correlations between factors, etc.) were affected by the removal. Lastly, to determine which CR identification method performed most efficiently and improved data quality most, the psychometric properties (reliability, factor structure, etc.) of the remaining datasets will be compared.
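For illustration only, the following minimal Python sketch shows how the three a priori flags might be computed and how one psychometric property (Cronbach's alpha) could be re-examined after removal. The column names, cut-offs and 5-point coding are hypothetical assumptions, not the study's actual instrument.

import pandas as pd

def cronbach_alpha(items):
    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

df = pd.read_csv("survey.csv")  # one row per respondent (assumed layout)

# 1) Instructed response item: any choice other than the instructed
#    category (here 5 = 'strongly agree') is flagged.
flag_instructed = df["instructed_item"] != 5

# 2) Reverse item: the same or a very similar response to an item and its
#    purposefully reversed counterpart is flagged.
flag_reverse = (df["item_07"] - df["item_07_reversed"]).abs() <= 1

# 3) Self-report item: respondents reporting low effort (< 3 on a 5-point
#    agreement scale) are taken at their word.
flag_selfreport = df["effort_item"] < 3

# Original scale items only (the reversed check item is excluded)
scale_items = [c for c in df.columns
               if c.startswith("item_") and not c.endswith("_reversed")]

for name, flag in [("instructed", flag_instructed),
                   ("reverse", flag_reverse),
                   ("self-report", flag_selfreport)]:
    alpha = cronbach_alpha(df.loc[~flag, scale_items])
    print(f"{name}: {flag.mean():.1%} flagged, alpha after removal = {alpha:.3f}")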

Conclusions, Expected Outcomes or Findings
The findings of this study are important for researchers and practitioners who use self-report measures for data collection and draw conclusions from such data. Careless responses may produce 'dirty data' and may affect results significantly, so data screening and cleaning should be considered. In addition, the results will shed light on the efficiency of the different a priori methods, and suggestions will be made on CR identification. I hope that this study will help to fill some gaps in identifying careless responding and eliminating its effects.
References
Alarcon, G. M., & Lee, M. A. (2022). The relationship of insufficient effort responding and response styles: An online experiment. Frontiers in Psychology, 12. https://www.frontiersin.org/article/10.3389/fpsyg.2021.784375

Beck, M. F., Albano, A. D., & Smith, W. M. (2019). Person-fit as an index of inattentive responding: A comparison of methods using polytomous survey Data. Applied Psychological Measurement, 43(5), 374–387. https://doi.org/10.1177/0146621618798666

Curran, P. G. (2015). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66(2016), 4–19. https://doi.org/10.1016/j.jesp.2015.07.006

Goldammer, P., Annen, H., Stöckli, P. L., & Jonas, K. (2020). Careless responding in questionnaire measures: Detection, impact, and remedies. The Leadership Quarterly, 31(4). https://doi.org/10.1016/j.leaqua.2020.101384

Huang, J. L., Curran, P. G., Keeney, J., Poposki, E. M., & DeShon, R. P. (2012). Detecting and deterring insufficient effort responding to surveys. Journal of Business and Psychology, 27(1), 99–114. https://doi.org/10.1007/s10869-011-9231-8

Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455. https://doi.org/10.1037/a0028085

Rios, J. A., & Soland, J. (2021). Parameter estimation accuracy of the effort-moderated item response theory model under multiple assumption violations. Educational and Psychological Measurement.
 
Schroeders, U., Schmidt, C., & Gnambs, T. (2022). Detecting careless responding in survey data using stochastic gradient boosting. Educational and Psychological Measurement, 82(1), 29–56. https://doi.org/10.1177/00131644211004708

Ulitzsch, E., Yildirim-Erbasli, S. N., Gorgun, G., & Bulut, O. (2022). An explanatory mixture IRT model for careless and insufficient effort responding in self-report measures.

Ward, M. K., & Meade, A. W. (2022). Dealing with careless responding in survey data: Prevention, identification, and recommended best practices. Annual Review of Psychology, 74(1). https://doi.org/10.1146/annurev-psych-040422-045007


09. Assessment, Evaluation, Testing and Measurement
Paper

Shaping and Inspiring a Fair Thinking in Assessment: A Study with Pre-service and In-service Teachers

Debora Aquario1, Norberto Boggino2, Elisabetta Ghedin1, Juan Gonzalez Martinez3, Griselda Guarnieri2, Teresa Maria Sgaramella1

1University of Padova, Italy; 2Universidad Nacional de Rosario, Argentina; 3Universitat de Girona, Spain

Presenting Author: Aquario, Debora; Ghedin, Elisabetta

How can we activate and create assessment systems that lead to a flourishing school where everyone is able to fulfil their potential and achieve both success and well-being? How might we shift assessment practices toward equity and justice/fairness? How do assessment methods meet the diversity of students? The research project SHIFT (Shaping and Inspiring a Fair Thinking in assessment) aims to investigate how a range of emerging trends within the international community can be used to answer these questions. These trends concern the literature on: (1) human capabilities (Sen, 1999) as a framework for 'social justice'; (2) Assessment for Learning (Swaffield, 2011) as the horizon for understanding assessment; (3) Universal Design for Assessment (CAST, 2011) as a philosophy that attempts to go beyond the 'model of adjustment'; and (4) approaches such as fair and equitable assessment (Tierney, 2013; Montenegro & Jankowski, 2020), culturally responsive assessment (Nortvedt et al., 2020), and inclusive and universal assessment (Waterfield & West, 2006; Nieminen, 2022; Tai et al., 2021).

An increased focus on equity and justice emerges from the 2030 Agenda for Sustainable Development, with its commitment to provide inclusive and equitable quality education at all levels, as well as from other European and international documents (OECD, 2005, 2012; UNESCO, 2015, 2022). The same concern is evident in empirical studies focused mainly on higher education contexts (Nieminen, 2022; Tai et al., 2021), and as more and more assessment researchers and practitioners engage with the equity conversation, a desire has become evident to consider these issues in school contexts as well. Moreover, most assessment research is based on what can be described as a 'technical perspective', looking at whether assessment is efficient, reliable and valid, leaving less space for a 'humanistic' perspective that sees assessment as fostering learning for human flourishing and for responsibility toward and within society (Swaffield, 2011; Fuller, 2012; Gergen & Gill, 2020; Hadji, 2021).

Dialogue about the paradigms of assessment is of paramount importance if assessment is to embrace a focus on equity, ethics and humanization and to meet the challenges of these times. The paradigm shift was initiated many years ago, moving from assessment of learning towards assessment for learning: greater attention to the role of learners (opening the way to participatory approaches connecting school and community), a shift from product-focused to process-focused assessment, and a view of learning as a lifelong process rather than something done to prepare for an exam. Although these changes have been partially incorporated into the debate about educational assessment, work remains to be done to ensure the necessary attention to diversity among learners. Such an approach would strengthen the value of the shift and enlarge the potential of the assessment process for promoting all students' learning and growth, moving away from a model of adjustments, which makes specific reasonable accommodations for some students, towards assessment models that allow all students to participate fully and learn in the most equitable way.

Consistent with this theoretical framework, the research design stresses the importance of engagement, participation and opportunities for access, combining a community-based approach with an appreciative one and seeking to produce a new imaginary for approaches to assessment, with implications for both cultures and practices. The aim of the programme is to connect assessment with justice and equity through a participatory process that sustains the shift towards fair thinking in assessment.


Methodology, Methods, Research Instruments or Sources Used
SHIFT intends to give value to bottom-up research practices, articulating the logic and flow of Community-Based Participatory Research (CBPR) and Appreciative Inquiry as follows. At this point, steps 1 and 2 have been implemented.
STEP 1: Discovering & identifying the community. This step was implemented through an initial phase consisting of various Public Engagement activities (PEa). The following activities were carried out to raise awareness and mutual understanding (by establishing a common language and participatory multi-actor dialogues) and to promote and develop a shared assessment literacy about inclusiveness and diversity: an open webinar day devoted to the discussion of the main topics of accessibility and equity; a website; a project logo; initiatives specifically oriented to schools ('dialoghi pedagogici', pedagogical dialogues); and a blog about the research keywords.
The PEa represented the ground on which a Call to Action (CtA) was opened as a strategy for fostering engagement. It was addressed to schools of all educational levels in order to gather interest in the key issues and to co-construct the research community, composed of a network of 7 schools (from early childhood to middle school level). Moreover, a group of 250 prospective teachers was involved in the research, with the aim of exploring the same issues with pre-service teachers and comparing the understandings and purposes of assessment in the two groups (Brown & Remesal, 2012).
STEP 2: Dreaming & co-creating shared images of a preferred future. Based on the results of the CtA, step 2 invites participants to begin 'envisioning together', asking 'what might be?' and imagining how things might work well in the future. The aim is therefore to collaborate in identifying and fostering the capacity to aspire and to imagine possible future actions.
The following instruments were used: panel discussions with teachers (one for each of the 7 schools, for a total of 60 teachers from early childhood to secondary school level, plus 4 panel discussions with 50 students enrolled in teacher education programmes from 5 different countries: UK, Turkey, Lithuania, the Netherlands and Portugal); and written interviews administered to 200 students enrolled in the teacher education programme at the University of Padova. These questions guided the reflection: How might we shift assessment practices toward equity and accessibility? How might we assess for the learning and growth of all students? How do assessment methods meet the diversity of the students?

Conclusions, Expected Outcomes or Findings
First analyses of the panel discussions and written interviews with pre-service and in-service teachers show that assessment is perceived as fair when the following aspects become relevant: assessment as an integral part of the teaching strategy (the teacher's use of assessment to make teaching decisions, and the consequent need to consider changes in teaching and assessment jointly and reciprocally); the 'pedagogical' process as distinct from the 'administrative' process (the process of reflection and the use of evaluation criteria not being confused with the attribution of marks/grades); and communication (the need to pay attention to communication moments, both in progress and final, with students and their families, and the need to act for change in assessment by working on all dimensions, student, family, teachers and school head, in a parallel and integrated way). Differences between the narratives of pre-service and in-service teachers will be presented.
The next steps, 3 and 4, concern the design of innovative ways to bring into existence the preferred future that participants have envisioned in the Dream step, through the use of participatory videos (Boni et al., 2020) and a narrative storytelling process for sustaining the change in assessment culture and practice towards fair assessment.
 Expected final outcomes consist of:
- an open digital toolkit for a fair assessment: flexible/modular in its structure/implementation, accessible and usable. Examples of the toolkit contents: a guide for a universal approach to assessment; guidelines (with different resources, tips and examples) for designing accessible and universal assessments; good practices of fair assessment.
- multimedia open educational resources (OER, Wiley, 2006) with the aim to offer an open learning path for all those who want to be self-trained in the research topics (audiovisual material, references and readings, simulations, workshops, guides about assessment contents, multimedia resources from public engagement activities and participatory videos).

References
Aquario D. (2021). Through the lens of justice. A systematic review on equity and fairness in learning assessment. Education Sciences & Society, 2, 96-110.
Brown G. T. L., & Remesal A. (2012). Prospective teachers’ conceptions of assessment: A cross- cultural comparison. The Spanish Journal of Psychology, 15(1), 75–89.
Bushe G. R. (2012). Foundations of Appreciative Inquiry: History, Criticism, and Potential. AI Practitioner: The International Journal of AI Best Practice, 14(1), 8-20.
Center for Applied Special Technology (2011). Universal Design for Learning (UDL). Guidelines version 2.0, CAST, Wakefield.
Hanesworth P., Bracken S. and Elkington S. (2019). A typology for a social justice approach to assessment: Learning from universal design and culturally sustaining pedagogy. Teaching in Higher Education, 24 (1), 98-114.
Heritage M., & Wylie C. (2018). Reaping the benefits of assessment for learning: achievement, identity, and equity. ZDM, 50 (4), 729–741.
Klenowski V. (2014). Towards fairer assessment. Australian Educational Researcher, 41, 445–470.
Levy, J., & Heiser, C. (2018). Inclusive assessment practice (Equity Response). Urbana, IL: University of Illinois and Indiana University, NILOA.
McArthur J. (2016). Assessment for social justice: The role of assessment in achieving social justice. Assessment & Evaluation in Higher Education, 41(7), 967-981.
Montenegro E., & Jankowski N. A. (2020). A new decade for assessment: Embedding equity into assessment praxis. Urbana, IL: University of Illinois and Indiana University, NILOA.
Murillo F. J., Hidalgo N. (2017). Students’ conceptions about a fair assessment of their learning. Studies in Educational Evaluation, 53, 10-16.
Nortvedt, G.A., Wiese, E., Brown, M. et al. (2020). Aiding culturally responsive assessment in schools in a globalising world. Educational Assessment, Evaluation and Accountability, 32, 5–27.
Scott S., Webber C. F., Lupart J. L., Aitken N. and Scott D. E. (2014). Fair and equitable assessment practices for all students. Assessment in Education: Principles, Policy & Practice, 21(1), 52-70.
Sen A. (1999). Development as freedom. Oxford: Oxford University Press.
Stobart G. (2005). Fairness in multicultural assessment systems. Assessment in Education, 12(3), 275–287.
Swaffield S. & Williams M. (Eds.) (2008), Unlocking assessment: Understanding for reflection and application. London: David Fulton.
Tierney R.D. (2014). Fairness as a multifaceted quality in classroom assessment. Studies in Educational Evaluation, 43, 55-69.
Zhang, J., Takacs, S., Truong, L., Smulders, D., & Lee, H. (2021). Assessment Design: Perspectives and Examples Informed by Universal Design for Learning. Centre for Teaching, Learning, and Innovation. Justice Institute of British Columbia.

