
Conference Agenda



Session

SS04-AR-2C: Trustworthy Autonomous AI for Digital Transformation (II)

Time: Tuesday, 23 June 2026, 4:40pm - 6:00pm

Session Chair: Dr. Nahid Farhady Ghalaty, Microsoft
Session Chair: Jordan Hull, Microsoft
Session Chair: Dr. Abhilasha Bhargav-Spantzel, Microsoft

Location: Room Arrábida

Presentations

Quantum-enhanced remote video identification

Claudiu-Leonard Guster, Eugen Pop, Alexandra Cernian, Mihnea-Alexandru Moisescu, Miruna-Elena Iliuta

University POLITEHNICA Bucharest, Romania

Remote video identification is an increasingly common solution that is gaining ground in processes that traditionally required physical presence to verify identity. The traditional video identification process involves several steps: analyzing the identity document, taking selfies, recording a video so that the system can verify liveness, and confirming a transmitted code to complete multi-factor authentication. In recent years, quantum algorithms have made great strides, especially after the second technological revolution, and there are now industrial applications of quantum technologies. This study provides a state-of-the-art review of the classic remote video identification process, focusing on the legal framework governing these solutions, aspects of biometric analysis, and the risks associated with this process. Based on this review, a security architecture is proposed to support the solution infrastructure at an adequate level and reduce the risks to which user data is exposed, specifically targeting identity fraud, communication interception, and long-term storage vulnerabilities.

From Hot Chocolate to High Throughput: How Multi-Agent AI is Redefining CTFs

Arnold Spantzel, Heidi Spantzel

John Adams Academy, United States of America

At 9:59 a.m. on March 22, I, as a 14-year-old, was ready for the BSides CTF with my four-person high-school team. My hot chocolate could wait. By 10:02 a.m., we had completed 100% of the challenges and earned second place. To many observers, it looked like we must have had behind-the-scenes support from seasoned experts. In reality, our advantage came from a multi-agent AI system that could explore, reason, and solve in parallel, with us acting as supervisors rather than operators.

Capture the Flag competitions (CTFs) are becoming defined not just by technical skill, but also by how effectively teams build AI systems under real-world constraints. This talk explores a high-performance multi-agent system built using custom Python orchestration that coordinates multiple models from providers such as OpenAI, Anthropic, and Google. Challenges are retrieved through the CTFd API and distributed across several parallel swarms, each using multiple models working together in isolated execution environments.

Instead of using a simple input-to-output workflow, the system implements a multi-agent workflow and resource-optimization pipeline. Smaller, faster, and less expensive models, such as gpt-5.3-spark-xhigh or gpt-5.4-mini, are used for exploration, while larger models are used for tasks that require deeper thinking. All agents and sessions for each problem run in parallel to maximize speed. Once one agent has found the flag and submitted it via the CTFd API, all other processes related to that challenge are stopped in order to save resources.
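The "first flag wins, everything else stops" behavior described above can be sketched with Python's asyncio. This is a minimal illustration, not the authors' orchestration code: the `agent` coroutine, its simulated delays, and the flag strings are hypothetical stand-ins for real model sessions and for submission via the CTFd API.

```python
import asyncio

async def agent(name, delay, flag):
    """Hypothetical agent: sleeps to simulate work, then returns a flag or None."""
    await asyncio.sleep(delay)
    return flag

async def solve_challenge(agent_specs):
    """Run all agents in parallel; return the first flag found and cancel the rest."""
    tasks = [asyncio.create_task(agent(*spec)) for spec in agent_specs]
    found = None
    try:
        # as_completed yields results in finishing order, fastest agent first.
        for finished in asyncio.as_completed(tasks):
            result = await finished
            if result is not None:
                found = result
                break
    finally:
        # Stop every remaining session for this challenge to save resources.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return found

def run_demo():
    specs = [
        ("explorer-small", 0.01, None),         # finishes first, finds nothing
        ("explorer-fast", 0.02, "flag{demo}"),  # finds the flag
        ("deep-thinker", 5.0, "flag{late}"),    # cancelled before it finishes
    ]
    return asyncio.run(solve_challenge(specs))
```

Calling `run_demo()` returns as soon as the second agent reports a flag; the slow agent is cancelled rather than awaited to completion.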

One key feature of our system is that it can run fully autonomously or accept human input. The human-in-the-loop steering layer shares the outline of its plan with the user. The user can change the plan or suggest new ideas to the overall structure in real time. This allows users to steer different agents in different directions or intervene when there are clear mistakes, hallucinations, or inefficient approaches, or when the human identifies an opportunity the AI missed.

Through specific examples in recent CTFs, we show how manual steering and strategic optimization can improve speed, solve rate, and cost. We contrast this with real-world pen-testing workflows, where exhaustive exploration is needed instead of just finding vulnerabilities specific to a task like capturing a flag.

We will also analyze examples where the autonomous system malfunctions, including hallucinations, going directly to exploitation after little exploration, and cases where the AI breaks competition rules, such as using disallowed tools or performing unintended external actions like registering additional accounts to access non-existent source code. We explore why these actions happen and what safeguards can be put in place to prevent them.

To avoid wasting tokens, we do not simply run as many AI models as possible in parallel. Instead, we use targeted pruning so that only productive AI sessions continue working on a problem. This strategy yields faster solves for a fixed budget: exploration starts broad and then narrows, which also reduces the workload for the human overseeing the system.
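One simple way to picture such pruning is a ranking step that keeps only the best-scoring sessions. The abstract does not say how productivity is measured, so the per-session progress score, the `keep` limit, and the `min_score` floor below are all assumptions made for illustration.

```python
def prune_sessions(scores, keep=2, min_score=0.2):
    """Return the names of sessions that should keep running.

    scores maps a session name to a hypothetical progress score in [0, 1].
    At most `keep` sessions survive, and none below `min_score`.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked[:keep] if score >= min_score]
```

For example, `prune_sessions({"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3})` keeps sessions `a` and `c` and stops the others.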

Finally, we show how the transition from purely manual work to AI workflows does not remove the need for humans. Rather than writing exploit code or reviewing thousands of lines of code manually, one person can oversee multiple AI systems by suggesting new directions to explore and steering them toward effective solutions.

Automated Discovery of Prompt Injection Vulnerabilities via Mutated Prompt Generation

Bogdan Stelea¹, Arnav Garg²

¹Microsoft, Romania; ²Microsoft, USA

AI vulnerability validation presents fundamental challenges that differ from traditional security testing. The non-deterministic behavior of large language models (LLMs) complicates reproducibility, while the proliferation of LLM-powered agentic assistant interfaces across heterogeneous application surfaces limits consistent validation and systematic coverage. Indirect prompt injection attacks (IPIA) frequently require multi-step reasoning, coordinated tool usage, and repeated execution to manifest, resulting in time-intensive and fragile manual validation workflows. We introduce the Mutated Prompt Generator (MPG), a framework that combines automated adversarial payload generation with scalable, end-to-end security testing of agentic AI systems. MPG automatically generates mutated IPIA proof-of-concept payloads through four operator classes (encoding transforms, structural rewrites, LLM-guided semantic mutations, and a semantic camouflage operator that blends injection tokens toward the carrier text's semantic domain) and constructs an explicit mutation tree to systematically explore the vulnerability surface around confirmed or historical attack vectors. A key design principle is semantically aware mutation: MPG classifies each payload component as benign carrier content or malicious injection at the token-segment level, applying role-appropriate mutation operators to each. MPG uses a Subtree Viability Scoring algorithm that efficiently terminates the generation of mutations based on the cumulative Attack Success Rate (ASR) of the evaluated mutation subtree while preserving promising mutation generation nodes, enabling focused exploration of complex data exfiltration vectors.
The framework supports multi-modal attack surfaces including document uploads, multi-turn prompt chains, and cross-format delivery substitution, and is built on a Playwright-based testing harness that automates browser-level interaction, DOM-level inspection, and multi-surface orchestration across diverse LLM-powered agentic environments. By automating both vulnerability generation and execution, MPG enables scalable exploration of non-deterministic AI behaviors and supports fixed validation of historical security cases across evolving product interfaces.
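The idea of stopping mutation generation based on a subtree's cumulative ASR can be illustrated with a small tree structure. The abstract does not give the actual Subtree Viability Scoring algorithm, so this is only a sketch under assumptions: the `MutationNode` class, the `min_trials` warm-up, and the `asr_floor` threshold are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MutationNode:
    payload: str
    successes: int = 0   # attack runs that succeeded for this payload
    trials: int = 0      # attack runs executed for this payload
    children: list = field(default_factory=list)

def subtree_counts(node):
    """Sum successes and trials over a node and all of its descendants."""
    successes, trials = node.successes, node.trials
    for child in node.children:
        child_successes, child_trials = subtree_counts(child)
        successes += child_successes
        trials += child_trials
    return successes, trials

def subtree_asr(node):
    """Cumulative Attack Success Rate of the subtree rooted at node."""
    successes, trials = subtree_counts(node)
    return successes / trials if trials else 0.0

def should_expand(node, min_trials=10, asr_floor=0.05):
    """Keep mutating under a node until enough evidence shows it is unproductive."""
    successes, trials = subtree_counts(node)
    if trials < min_trials:
        return True  # too little evidence to terminate yet
    return successes / trials >= asr_floor
```

Under these assumptions, a subtree with 0 successes in 12 trials stops generating mutations, while one with 3 successes in 12 trials keeps being explored.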

Making AI Ethics Operational: A Post-Deployment Accountability Framework with Monitoring-to-Incident Governance

Safayat Bin Hakim¹, Kanchon Gharami³, Nahid Farhady Ghalaty², Shafika Showkat Moni¹, Houbing Song¹

¹University of Maryland Baltimore County; ²Microsoft; ³Embry-Riddle Aeronautical University

Deployed AI systems cause real-world harm when models drift, data distributions shift, or usage contexts change, yet current ethics frameworks focus almost exclusively on design-time principles, leaving practitioners without operational tools for post-deployment oversight. We address this gap with a lightweight accountability framework that connects technical monitoring to governance actions through three integrated pillars: provenance tracking, continuous monitoring beyond accuracy (fairness, drift, security), and structured incident response. Our framework includes a monitoring-to-incident algorithm that translates metric violations into severity-classified tickets, an incident taxonomy with explicit retirement criteria, and lifecycle artifacts that enable audit reconstruction. Unlike MLOps monitoring tools, model risk management, high-level governance principles, or retrospective incident databases, our approach provides a general-purpose accountability layer spanning deployment contexts. We show how this framework operationalizes requirements from ISO/IEC 42001 and NIST AI RMF, define accountability metrics including time-to-detect and time-to-remediate, illustrate the end-to-end pipeline through a worked clinical example, and design ethical oversight to be testable, repeatable, and practical in production systems.
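A monitoring-to-incident step of the kind described above might look like the following sketch. It is not the authors' algorithm: the metric names, the limits, and the severity cut-offs (relative excess over the limit) are hypothetical choices made only to show how a metric violation could become a severity-classified ticket.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ticket:
    metric: str
    observed: float
    limit: float
    severity: str

def classify(observed, limit):
    """Map the relative excess over the limit to a severity (hypothetical cut-offs)."""
    excess = (observed - limit) / limit
    if excess >= 0.5:
        return "critical"
    if excess >= 0.1:
        return "major"
    return "minor"

def monitoring_to_incident(readings, limits):
    """Open one ticket for every monitored metric that exceeds its limit."""
    tickets = []
    for metric, observed in readings.items():
        limit = limits.get(metric)
        if limit is None or observed <= limit:
            continue  # metric is unmonitored or within bounds
        tickets.append(Ticket(metric, observed, limit, classify(observed, limit)))
    return tickets
```

With readings of 0.12 against a fairness-gap limit of 0.10 and 0.35 against a drift limit of 0.20, this sketch opens a "major" and a "critical" ticket respectively, and opens none for metrics that stay within their limits.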

32nd ICE IEEE/ITMC Conference
(ICE 2026)

22 - 24 June 2026, Porto - Portugal
