Conference Agenda

Session
16 SES 01 A: Chatbots and Robotics
Time:
Tuesday, 27/Aug/2024:
13:15 - 14:45

Session Chair: Ruth Wood
Location: Room 016 in ΧΩΔ 02 (Common Teaching Facilities [CTF02]) [Ground Floor]

Cap: 56

Paper Session

Presentations
16. ICT in Education and Training
Paper

A Meta Scoping Review of Programming and Robotics in K-12 Education: Characteristics, Benefits and Challenges

Sanna Forsström1, Melissa Bond1,2

1The Knowledge Centre for Education, University of Stavanger, Norway; 2EPPI Centre, University College London

Presenting Author: Forsström, Sanna

The digitalisation of education offers transformative potential, enriching teaching practices and broadening instructional possibilities within schools. However, this shift also introduces a set of complex challenges that affect both pedagogy and curriculum. Within this evolving digital landscape, which includes domains such as artificial intelligence, data management, cloud computing, and sustainable technologies (González-Pérez & Ramírez-Montoya, 2022), teachers face complex considerations, including classroom management, assessment, ethical concerns, and the integration of digital technologies. A key area of focus within digital transformation is the development of computational thinking through programming and educational robotics, targeting 21st-century skills such as collaboration and critical and ethical thinking (González-Pérez & Ramírez-Montoya, 2022; Ye et al., 2022).

In response to these educational imperatives, programming has been integrated into the school curricula of several countries. While some countries have introduced programming as a separate subject or as a subsection of Mathematics, others like Finland and Norway have embraced a cross-curricular approach, incorporating programming into diverse subjects such as Art and Design, Music, and Science, in addition to Mathematics.

Given the interdisciplinary landscape of programming education, its research intersects with various academic disciplines and pedagogical approaches. To shed light on how these interdisciplinary perspectives are brought together in current K-12 programming research, to gauge the scope and quality of previously undertaken evidence syntheses, and to identify research gaps, a meta scoping review (Booth et al., 2022) was undertaken. The overarching research question guiding this study is:

What is the nature and scope of evidence synthesis on programming and robotics in primary and secondary education?

To provide a comprehensive answer to this main research question, the study is broken down into the following sub-questions:

  1. What types of evidence syntheses have been conducted, and in what years were they published?
  2. In which journals are these evidence syntheses published?
  3. What is the geographical distribution of the authors, their affiliations and disciplines?
  4. To what extent is there collaboration in the systematic reviews on programming in primary and secondary education?
  5. What is the quality of K-12 programming evidence syntheses?
  6. What overlap and gaps exist among research questions across evidence syntheses in terms of topic or subject area?
  7. What benefits and challenges of programming in K-12 education have been identified?

By conducting this meta scoping review, the study aims to lay a foundational groundwork for future primary and secondary research in the domain of programming education.


Methodology, Methods, Research Instruments or Sources Used
In order to answer the research questions, a meta scoping review was conducted (Booth et al., 2022), using explicit and predefined criteria (Gough et al., 2012; Zawacki-Richter et al., 2020), with reporting guided by the PRISMA guidelines (Page et al., 2021). A meta scoping review is a type of tertiary review (Kitchenham et al., 2009), which synthesises secondary research such as systematic reviews and meta-analyses. The review was undertaken based on previous tertiary reviews (Bond et al., 2024; Buntins et al., 2023), with the first search conducted in April 2023 and subsequent searches conducted until 17 January 2024 to ensure the inclusion of the most recent literature. The platforms and databases searched were Web of Science, Scopus, EBSCOhost (including ERIC), and ProQuest, with the OpenAlex platform (Priem et al., 2022) also searched via the evidence synthesis software EPPI Reviewer (Thomas et al., 2023). A search string was developed based on two previous tertiary reviews (Bond et al., 2024; Buntins et al., 2023), focusing on programming, computational thinking and robotics in K-12, as well as variations of evidence synthesis (Sutton et al., 2019).

The search strategy yielded 4,369 items, which were exported as .txt or .ris files and imported into EPPI Reviewer. Following the automatic removal of 485 duplicates, two reviewers screened the same 200 items on title and abstract (2 x 100), applying the inclusion/exclusion criteria to ensure inter-rater reliability. After achieving perfect agreement, the remaining 3,684 items were screened on title and abstract. Studies were included if they explored programming or computational thinking in K-12, were journal articles published after 2010 in English, and were a form of evidence synthesis, leaving 195 items to screen on full text. To ensure continued inter-rater reliability, a further 10 items were double screened at this stage, and again the reviewers were in complete agreement. After screening the remaining items, 121 evidence syntheses were identified for data extraction and synthesis within EPPI Reviewer. For the purposes of this paper, however, only the 50 reviews pertaining to programming and robotics are included.
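The flow of counts reported above can be reproduced with a few lines of arithmetic. The sketch below is purely illustrative; the figures are taken directly from the text.

```python
# Illustrative reconstruction of the PRISMA-style flow described above.
retrieved = 4369      # items yielded by the search strategy
duplicates = 485      # removed automatically in EPPI Reviewer
pilot = 200           # double-screened on title/abstract (2 x 100)

after_dedup = retrieved - duplicates   # items entering title/abstract screening
remaining = after_dedup - pilot        # screened after the reliability pilot
full_text = 195                        # items screened on full text
included = 121                         # evidence syntheses identified
this_paper = 50                        # reviews on programming and robotics

assert after_dedup == 3884 and remaining == 3684  # matches the reported counts
print(f"{retrieved} retrieved -> {after_dedup} deduplicated -> "
      f"{full_text} full text -> {included} included -> {this_paper} analysed")
```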


Conclusions, Expected Outcomes or Findings
This meta scoping review explores the evolution, distribution, and quality of evidence syntheses in programming education research from 2011 to 2023, focusing on primary and secondary education. Whilst analysis is currently ongoing, systematic reviews and meta-analyses have been dominant, with a gradual increase in the number and range of syntheses conducted since 2021. The 50 journal articles in the corpus were published in 37 unique journals, reflecting wide interest not only in the topic but also in synthesis methods.
Geographically, authors span five continents, with most hailing from Europe (42%) and Asia (38%), suggesting worldwide engagement in this research area. However, there was a notable lack of representation from Africa and Oceania. Collaboration patterns showed a heavy preference for domestic partnerships (64.8% of co-authored articles), with only 18% of research published by international research teams. The quality of studies also varied, with a preponderance of medium- and low-quality evidence and very few high-quality studies, highlighting the need for more rigorous and transparent approaches to evidence synthesis and echoing findings in the wider field of education (Bond et al., 2024; Buntins et al., 2023).
Thematic analysis revealed a focus on sub-themes such as skill development, teaching methods, and pedagogical goals. However, gaps were evident, particularly in subjects like Mathematics, on the ethical considerations of AI and robotics, and the role of teachers in programming education. The benefits of programming and robotics education emerged as significant, enhancing cognitive development, creativity, and interdisciplinary learning. Challenges included resource constraints, curriculum integration, teacher training needs, cognitive load concerns, and the need for more parental involvement in robot-assisted learning.
While programming education research is extensive and diverse, areas identified for future exploration include research in underrepresented regions, ethical issues in technology use, and more inclusive pedagogical strategies.

References
Bond, M., Khosravi, H., De Laat, M., Bergdahl, N., Negrea, V., Oxley, E., Pham, P., Chong, S.W., & Siemens, G. (2024). A meta systematic review of Artificial Intelligence in Higher Education: A call for increased ethics, collaboration, and rigour. International Journal of Educational Technology in Higher Education, 21. https://doi.org/10.1186/s41239-023-00436-z

Booth, A., Sutton, A., Clowes, M., & Martyn-St James, M. (2022). Systematic approaches to a successful literature review. SAGE.

Buntins, K., Bedenlier, S., Marín, V., Händel, M., & Bond, M. (2023). Methodological approaches to evidence synthesis in educational technology: A tertiary systematic mapping review. MedienPädagogik, 54, 167–191. https://doi.org/10.21240/mpaed/54/2023.12.20.X

González, M. Á., Rodríguez-Sedano, F. J., Llamas, C. F., Gonçalves, J., Lima, J., & García-Peñalvo, F. J. (2020). Fostering STEAM through challenge-based learning, robotics, and physical devices: A systematic mapping literature review. Computer Applications in Engineering Education, 29, 46–65.

Gough, D., Oliver, S., & Thomas, J. (Eds.). (2012). An introduction to systematic reviews. SAGE.

Kitchenham, B., Pearl Brereton, O., Budgen, D., Turner, M., Bailey, J., & Linkman, S. (2009). Systematic literature reviews in software engineering – A systematic literature review. Information and Software Technology, 51(1), 7–15. https://doi.org/10.1016/j.infsof.2008.09.009

Page, M. J., McKenzie, J. E., Bossuyt, P. M., ... Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372. https://doi.org/10.1136/bmj.n71

Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. ArXiv. https://arxiv.org/abs/2205.01833

Sutton, A., Clowes, M., Preston, L., & Booth, A. (2019). Meeting the review family: Exploring review types and associated information retrieval requirements. Health Information and Libraries Journal, 36(3), 202–222. https://doi.org/10.1111/hir.12276

Thomas, J., Graziosi, S., Brunton, J., Ghouze, Z., O'Driscoll, P., Bond, M., & Koryakina, A. (2023). EPPI Reviewer: Advanced software for systematic reviews, maps and evidence synthesis [Computer software]. EPPI Centre, UCL Social Research Institute. https://eppi.ioe.ac.uk/cms/Default.aspx?alias=eppi.ioe.ac.uk/cms/er4

Ye, J., Lai, X., & Wong, G. K.‑W. (2022). The transfer effects of computational thinking: A systematic review with meta‐analysis and qualitative synthesis. Journal of Computer Assisted Learning, 38(6), 1620–1638. https://doi.org/10.1111/jcal.12723  

Zawacki-Richter, O., Kerres, M., Bedenlier, S., Bond, M., & Buntins, K. (Eds.). (2020). Systematic Reviews in Educational Research. Springer. https://doi.org/10.1007/978-3-658-27602-7


16. ICT in Education and Training
Paper

Harnessing AI to Scale Dialogic Education and Reduce Polarization

Yifat Ben-David Kolikant, Omri Hadar, Asaf Salman

Hebrew University of Jerusalem, Israel

Presenting Author: Ben-David Kolikant, Yifat

Dialogic education has promising potential for reducing polarization, which is widely seen as a threat to democracy (Wegerif, 2022; Parker, 2023). Engaging students in an internally persuasive discourse (IPD) (Bakhtin, 1981) means creating a space where students examine their vested truth in light of critique and alternatives presented by a different, conflicting Other (Matusov, 2009). Successful implementation of IPD increases students' polyphony, manifested in legitimizing the right of other opinions (other voices) to exist and in engaging in a dialogic relationship with those voices (Parker, 2023). It could bring democracy to life inside the school (Apple & Beane, 2007; Gibson, 2020).

In previous work, we developed and successfully implemented a pedagogical model aimed at IPD. Our design relied on the ample evidence in the literature that a dyadic interaction, in which students engage with textual, inanimate representations of the Other, conflicting voice, is less likely to generate IPD, because students' reading is mediated by the mechanism of appropriation/resistance (Wertsch, 1998). Namely, students tend to unquestioningly accept representations in line with their in-group voice and to ignore or reject (with ostensive argumentative efforts) the Other voice (Brand et al., 2023).

We thus structured a triadic interaction: students from both sides of the conflict, together with a text. The hypothesis was that the animated Other is flexible and attuned to one's voice, thereby metaphorically "amplifying" the text. Nonetheless, meticulous scaffolding is required to (a) prevent heated discussions from deteriorating into mere disputes, (b) enable a safe space in which to argue and criticize, and (c) encourage reasoning and re-examination.

In one successful implementation of this model, Israeli post-secondary students, Jewish and Arab, e-investigated an event from the Israeli-Palestinian conflict. As expected, the discussions were disputatious. Nonetheless, they were fruitful. While students did not abandon their in-group narratives, their voices became polyphonic, that is, enriched by the Other voice. This was expressed, for example, in moving from a zero-sum viewpoint on historical events and the exercise of moral judgment towards a portrayal of an entangled relationship between the agents and the assumption of (some) accountability towards in-group historical agents (Ben-David Kolikant & Pollack, 2015).

Intuitively, chatbots based on large language models (LLMs) (e.g., ChatGPT, Llama, Bard) can be used to scale dialogic education because, owing to their nature, they could enable, provoke, and facilitate a productive dyadic interaction between student and text. Specifically, the text that a chatbot provides is not inanimate; it "talks" and can hence dynamically attune its responses to the interlocutor. Moreover, it can introduce students to a myriad of voices and ideas attuned to the unfolding conversation.

The use of chatbots also lessens the need for careful structuring of the encounter aimed at preventing "explosions", that is, students being offended or stressed by the Other, which may lead to the opposite result: a boost to polarization. Since chatbots are not human, there is no fear that they would be offended by their interlocutors. Additionally, students can feel safe to utter their voices and critique, ask for clarifications, experience uttering the Other's voice, and admit that they changed their minds or realized there is merit in the other's viewpoint, without feeling that they betrayed themselves and/or their in-group.

To gain insights into the potential and limitations of LLMs for scaling dialogic education, in particular for engaging students in IPD, we fine-tuned an LLM with a corpus of discussions in which IPD was evident. We then conducted discussions on controversial topics with the chatbot and analyzed its discursive moves. Our focus was on how, if at all, the chatbot provokes and enables its interlocutors to revisit their ideas.


Methodology, Methods, Research Instruments or Sources Used
We fine-tuned "Llama-2-7b-chat-hf" with a corpus of 1,000 discussions taken from Reddit's Change My View (CMV) forum. Llama 2 is a collection of pre-trained and fine-tuned generative text models which (a) range in scale from 7 billion to 70 billion parameters; (b) are auto-regressive language models that use an optimized transformer architecture; and, importantly, (c) can be optimized for dialogue use cases. We named the resulting chatbot "LlamaLo" (meaning 'why not' in Hebrew).
CMV is self-described as “A place to post an opinion you accept may be flawed in an effort to understand other perspectives on the issue” (www.reddit.com/r/changemyview/). CMV is heavily moderated. To encourage users to respond to each other, whoever succeeds in shifting or expanding (i.e., changing) the view of the original poster can be rewarded with a Delta (∆).
The idea was that LlamaLo would grasp the discursive "ground rules" embedded in Delta-awarded discussions and use them in future conversations. Owing to the high quality of discussions in CMV, they are commonly used in natural language processing (NLP) and social science research, ranging from argument mining to the study of the effects of forum norms and moderation (Dutta et al., 2020; Na & DeDeo, 2022; Nguyen & Young, 2022). In those studies, the Delta reward is treated as an indicator of a productive discussion, since it declares a change or expansion in view.
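To make the fine-tuning step concrete, a minimal sketch is given below using the Hugging Face transformers, datasets, and peft libraries. The data file name, prompt format, LoRA settings, and hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# A hypothetical sketch of fine-tuning Llama-2-7b-chat-hf on CMV data.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "meta-llama/Llama-2-7b-chat-hf"  # the base model named in the text

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 defines no pad token
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)

# LoRA adapters keep fine-tuning a 7B model tractable on a single GPU;
# rank and target modules here are common illustrative defaults.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# Assumed data layout: one JSON record per Delta-awarded CMV exchange, e.g.
# {"text": "[INST] <original poster's view> [/INST] <Delta-winning reply>"}
data = load_dataset("json", data_files="cmv_delta_discussions.jsonl",
                    split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

data = data.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    train_dataset=data,
    args=TrainingArguments(output_dir="llamalo", num_train_epochs=3,
                           per_device_train_batch_size=2, learning_rate=2e-4),
    # mlm=False yields standard next-token (causal LM) labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```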
We then discussed 10 controversial topics with LlamaLo (e.g., religion and state; bi-national conflicts) and examined its responses to several discursive situations we had created (e.g., unreasoned disagreement, fake knowledge, complex argumentation, and critical questions).
We analyzed the conversations, focusing on LlamaLo’s (a) quality of arguments presented, (b) extent of knowledge added, (c) transactivity, i.e., building on the interlocutor’s utterances, and (d) discursive acts that invite the interlocutor to expand and refine their voice.

Conclusions, Expected Outcomes or Findings
The significance of this work lies in its proof of concept that dialogic education can be scaled through a dyadic interaction between a chatbot and users. Specifically, our preliminary findings are encouraging. LlamaLo, for the most part, presented alternative ideas using well-grounded claims and added relevant knowledge. It mitigated disagreement (i.e., softening) and provided to-the-point critique and alternative claims. It also made discursive moves inviting the interlocutor to continue the conversation, such as the probing question "What do you think?". However, like other LLM-based chatbots, it was not free of flaws, such as hallucinations, and it sometimes stuck to one point rather than allowing the conversation to expand.
We are now in the process of further improving LlamaLo by fine-tuning the base model and formulating effective prompts in order to scrutinize the potential and limits of such a tool. This phase lays the groundwork for future work, in which we will carefully and thoughtfully design a pedagogical model that leverages the learning potential of dyadic student-chatbot interactions. We will then carry out design-based research to examine and improve the learning that takes place when the model is implemented in schools.

References
Apple, M. W., & Beane, J. A. (2007). Schooling for democracy. Principal Leadership, 8(2), 34–38.
Bakhtin, M. (1981). The dialogic imagination: Four essays (C. Emerson & M. Holquist, Trans.). University of Texas Press.
Ben-David Kolikant, Y., & Pollack, S. (2015). The dynamics of non-convergent learning with a conflicting other: Internally persuasive discourse as a framework for articulating successful collaborative learning. Cognition and Instruction, 33(4), 322–356.
Brand, C. O., Brady, D., & Stafford, T. (2023, June 27). The Ideological Turing Test: A behavioural measure of open-mindedness and perspective-taking. https://doi.org/10.31234/osf.io/2e9wn
Dutta, S., Das, D., & Chakraborty, T. (2020). Changing views: Persuasion modeling and argument extraction from online discussions. Information Processing & Management, 57(2), 102085.
Gibson, M. (2020). From deliberation to counter-narration: Toward a critical pedagogy for democratic citizenship. Theory & Research in Social Education, 48(3), 431–454.
Matusov, E. (2009). Journey into dialogic pedagogy. Nova Science Publishers.
Na, R. W., & DeDeo, S. (2022). The diversity of argument-making in the wild: From assumptions and definitions to causation and anecdote in Reddit's "Change My View". In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (pp. 969–975).
Nguyen, H., & Young, W. (2022, March). Knowledge construction and uncertainty in real world argumentation: A text analysis approach. In LAK22: 12th International Learning Analytics and Knowledge Conference (pp. 34–44).
Parker, W. C. (2023). Education for liberal democracy: Using classroom discussion to build knowledge and voice. Teachers College Press.
Wegerif, R. (2022). Beyond democracy: Education as design for dialogue. In Liberal democratic education: A paradigm in crisis (pp. 157–179). Brill mentis.
Wertsch, J. V. (1998). Mind as action. Oxford University Press.


16. ICT in Education and Training
Paper

The Prompt, A Crucial Component for the Use of the Chatbots to Support Written Feedback and Assessment Routines

Stefanie A. Hillen

University of Agder, Norway

Presenting Author: Hillen, Stefanie A.

One of the main insights from working with chatbots as a teacher or instructor concerns their proper and reflective use. Specifically, a first and decisive step is a thoroughly crafted prompt. In generative AI, a prompt is the specific form of interaction between a human and a large language model that leads the model to generate the intended output; in this study, constructive feedback for the learner.

One could almost say that the importance of the prompt is itself already a research result when starting to work with chatbots systematically and for educational purposes. In line with the old proverb that "we reap what we sow", one needs to consider thoroughly how to design a prompt. Whereas chatbot applications for learners are implemented and researched, for instance, in Learning Management Systems (Lee et al., 2020) to assist student learning (Edubots, n.d.), applications for teachers, specifically for assessment, are less in focus, with some exceptions: just 6% of the Edubots support assessment activities (Okonkwo & Ade-Ibijola, 2021, pp. 5–6). The prompts and approaches researched here are therefore intended to support teachers' feedback work on student learning. Besides different types of prompts addressing different purposes and styles of answers, one needs to respect principles found in publications based on the experience of language modellers for AI bots (Atlas, 2023). These principles have influenced the approaches developed and presented in this paper.

These principles are described differently in the literature, but can be summarized as the following basic handling principles:

• choose the words carefully

• define the conversation's purpose

• define the conversation's focus

• be specific and concise

• provide context

Other recommendations are to include the following types of components (Research project at our university, n.d.), as illustrated in the sketch after this list:

• role (the expertise or perspective that should be taken)

• task (the specific task or objective the bot should carry out)

• format (the intended presentation format of the bot's answer)
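As an illustration, the following sketch assembles these three components into a single feedback prompt. The role, task wording, and format constraint are hypothetical examples, not the prompts used in this study.

```python
# A hypothetical role/task/format prompt template for written feedback.
ROLE = ("You are an experienced university lecturer in international "
        "education who gives constructive written feedback.")
TASK = ("Give formative feedback on the student's midterm assignment below: "
        "name two strengths and two concrete points for improvement, "
        "referring to the assessment criteria: {criteria}.")
FORMAT = ("Answer in at most 250 words, as a short paragraph followed by "
          "a bulleted list of improvement points.")

def build_feedback_prompt(assignment_text: str, criteria: str) -> str:
    """Combine role, task, and format into one prompt for the chatbot."""
    return "\n\n".join([
        ROLE,
        TASK.format(criteria=criteria),
        FORMAT,
        f"Student assignment:\n{assignment_text}",
    ])

# Example call with placeholder content:
print(build_feedback_prompt("<assignment text>", "argumentation, use of sources"))
```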

Ekin (2023, p. 4) presents five factors influencing the so-called "engineering" of prompts, which in a way encompass the handling principles and the component types above, but add a bigger picture of the technology being used itself:

User intent: Understand the user’s goal and desired output. This helps in crafting a prompt that aligns with the user’s expectations.

Model understanding: Familiarize yourself with the strengths and limitations of ChatGPT. This knowledge assists in designing prompts that exploit the model’s capabilities while mitigating its weaknesses. Keep in mind that even state-of-the-art models like ChatGPT may struggle with certain tasks or produce incorrect information.

Domain specificity: When dealing with a specialized domain, consider using domain-specific vocabulary or context to guide the model towards the desired response. Providing additional context or examples can help the model generate more accurate and relevant outputs.

Clarity and specificity: Ensure the prompt is clear and specific to avoid ambiguity or confusion, which can result in suboptimal responses. Ambiguity can arise from unclear instructions, vague questions, or insufficient context.

Constraints: Determine if any constraints (e.g., response length or format) are necessary to achieve the desired output. Explicitly specifying constraints can help guide the model towards generating responses that meet specific requirements, such as character limits or structured formats.

Independent of which factors one considers, which basic principles one follows, or which components one applies, a choice must be made in order to use the bots purposefully and efficiently. Literature and training programmes for so-called "prompt engineering" are available (see Ekin, 2023). The research question is: How does the use of different prompt types influence the support for teachers' written feedback?


Methodology, Methods, Research Instruments or Sources Used
This study includes a summarizing literature study on 'prompt engineering' as well as two approaches conducted at our university. These approaches can be understood as an incremental development, shaped by accumulated experience and increasing practice as well as by the theoretical development of knowledge: creating useful prompts on the one hand, and analysing a useful educational framework for providing chatbot-supported feedback on the other.
• The first approach uses data from a university course at the bachelor's level in international education. The students' midterm assignments served as the data source for the written chatbot-supported feedback.
• The second approach uses midterm reports and applies specifically structured prompts (a rubric) in a bachelor-level course taught in English.
The analysis will be conducted on two levels. First, the prompts will be analysed with regard to their structure and the principles or factors used, in relation to the quality of the feedback produced by applying them.
A further guiding question is how one can design coursework tasks that respect, in advance, the prompt logic embraced by the given syllabus or teaching plan.

Conclusions, Expected Outcomes or Findings
One result will be an overview of studies on prompt guidelines, principles, and factors. A second result will show which types of the analysed prompts lead to which kinds of outcomes, with respect to the given educational assignment and context. A third result will be a recommendation on which types of assignments can be properly supported by AI-generated feedback.
References

Atlas, S. (2023). Chatbot Prompting: A guide for students, educators, and an AI-augmented workforce. University of Rhode Island. Independent publication. https://www.researchgate.net/publication/367464129_Chatbot_Prompting_A_guide_for_students_educators_and_an_AI-augmented_workforce


Edubots (n.d.). Best Practices of Pedagogical Chatbots in Higher Education. https://uploads-ssl.webflow.com/5eb417ec5e1a81e0e30258a0/6241a9addc2a994a9b1018ec_WP5_D6_Whitepaper_Best_Practices_of_Chatbots_in_higher_education.pdf

Ekin, S. (2023). Prompt engineering for ChatGPT: A quick guide to techniques, tips, and best practices. TechRxiv. https://doi.org/10.36227/techrxiv.22683919.v1

Lee, L.-K., Fung, Y.-C., Pun, Y.-W., Wong, K.-K., Yu, M. T.-Y., & Wu, N.-I. (2020). Using a multiplatform chatbot as an online tutor in a university course. 2020 International Symposium on Educational Technology (ISET), 53–56. https://doi.org/10.1109/ISET49818.2020.00021

Okonkwo, C. W., & Ade-Ibijola, A. (2021). Chatbots applications in education: A systematic review. Computers and Education: Artificial Intelligence, 2, 100033.

Our project (n.d.). University information website specified after review.