2:00pm - 2:15pm
The Human in the Infinite Loop: A Case Study on Revealing and Explaining Human-AI Interaction Loop Failures
1LMU Munich, Germany; 2University of Bayreuth, Germany
Interactive AI systems increasingly employ a human-in-the-loop strategy. This creates new challenges for the HCI community when designing such systems. We reveal and investigate some of these challenges in a case study with an industry partner, for which we developed a prototype human-in-the-loop system for preference-guided 3D model processing. Two 3D artists used it in their daily work for 3 months. We found that the human-AI loop often did not converge towards a satisfactory result and designed a lab study (N=20) to investigate this further. We analyze interaction data and user feedback through the lens of theories of human judgment to explain the observed human-in-the-loop failures with two key insights: 1) optimization using preferential choices lacks mechanisms to deal with inconsistent and contradictory human judgments; 2) machine outcomes, in turn, influence future user inputs via heuristic biases and loss aversion. To mitigate these problems, we propose descriptive UI design guidelines. Our case study draws attention to challenging and practically relevant imperfections in human-AI loops that need to be considered when designing human-in-the-loop systems.
2:15pm - 2:30pm
Activity Recognition over Temporal Distance Using Supervised Learning in the Context of Dementia Diagnostics
Hochschule RheinMain, Germany
In neurological diseases, the progression of the disease can be detected by monitoring movements and activities. Implementing such monitoring requires a high documentation workload, which can hardly be covered given the steadily growing shortage of nursing staff. In cooperation with two dementia shared-living communities, we are working step by step to relieve the caregivers by developing an approach to documentation.
This paper presents an approach for aggregating individual activities over a care period using smartwatches in combination with supervised learning algorithms. A smartwatch makes it possible to integrate sensor technology into a patient's daily routine without disturbing them, since many patients wear watches anyway.
We investigate promising supervised learning algorithms, record data from the accelerometer, heart rate sensor, gyroscope, gravity sensor, and orientation sensor at 20 Hz, and send them to a web server. The activities are then classified multiple times over a care shift using Fast Forest, Logistic Regression, and Support Vector Machines.
We present a prototype for activity classification over temporal distance for automated activity recognition. Based on the number of classifications and their probabilities, it proposes to the nursing staff a statement of the activities over the period of a care shift, in the form of a pre-filled documentation record.
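The aggregation step described above, which combines the number of classifications with their probabilities over a care shift, might look roughly like the following minimal sketch. The function name `summarize_shift`, the two thresholds, and the output layout are hypothetical and not taken from the paper; the abstract does not state how counts and probabilities are actually combined.

```python
from collections import defaultdict


def summarize_shift(predictions, min_count=3, min_mean_prob=0.6):
    """Aggregate per-window predictions into suggested documentation entries.

    predictions: iterable of (activity_label, probability) pairs, one pair
    per classified sensor window during a care shift.
    min_count, min_mean_prob: hypothetical thresholds (assumptions).
    """
    probs_by_activity = defaultdict(list)
    for label, prob in predictions:
        probs_by_activity[label].append(prob)

    suggestions = {}
    for label, probs in probs_by_activity.items():
        mean_prob = sum(probs) / len(probs)
        # Suggest an activity only if it was seen often enough and the
        # classifier was, on average, confident enough about it.
        if len(probs) >= min_count and mean_prob >= min_mean_prob:
            suggestions[label] = {
                "count": len(probs),
                "mean_probability": mean_prob,
            }
    return suggestions
```

Requiring both a minimum count and a minimum mean probability is one simple way to keep isolated or low-confidence classifications out of the suggested documentation; other combination rules are equally plausible.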
2:30pm - 2:45pm
Using Artificial Neural Networks to Compensate Negative Effects of Latency in Commercial Real-Time Strategy Games
University of Regensburg, Germany
Cloud-based game streaming allows gamers to play Triple-A games on any device, anywhere, almost instantly. However, it entails one major disadvantage: latency. Latency, the time between input and output, worsens game experience and gamers' performance. Reducing the latency of streaming games is essential to provide gamers with the same game experience and performance as local games. Previous work indicates that deep learning-based techniques can compensate for a game's latency if the neural network has access to the game's internal state during inference. However, it is unclear whether deep learning can be used to compensate for the latency of unmodified commercial video games. Hence, this work investigates the use of deep learning-based latency compensation in commercial video games. In a first study, we collected data from 21 participants playing real-time strategy games. We used the data to train two artificial neural networks. In a second study with 12 participants, we compared three different scenarios: (1) playing without latency, (2) playing with 50 ms of controlled latency, and (3) playing with 50 ms latency fully compensated by our system. Our results show that players associated the gaming session with fewer negative feelings and were less tired when supported by our system. We conclude that deep learning-based latency compensation can compensate for the latency of cloud-based streaming games. Ultimately, our work enables cloud-based game streaming providers to offer gamers a better and more responsive gaming experience.
2:45pm - 3:00pm
Suggestion Lists vs Continuous Generation: Interaction Design for Writing with Generative Models on Mobile Devices Affect Text Length, Wording and Perceived Authorship
University of Bayreuth, Germany
Neural language models have the potential to support human writing. However, questions remain on their integration and influence on writing and output. To address this, we designed and compared two user interfaces for writing with AI on mobile devices, which manipulate levels of initiative and control: 1) Writing with continuously generated text, where the AI adds text word-by-word and the user steers. 2) Writing with suggestions, where the AI suggests phrases and the user selects from a list. In a supervised online study (N=18), participants used these prototypes and a baseline without AI. We collected touch interactions, ratings on inspiration and authorship, and interview data. With AI suggestions, people wrote less actively, yet felt they were the author. Continuously generated text reduced this perceived authorship, yet increased editing behavior. In both designs, AI increased text length and was perceived to influence wording. Our findings add new empirical evidence on the impact of UI design decisions on user experience and output with co-creative systems.
3:00pm - 3:15pm
Sign H3re: Symbol and X-Mark Writer Identification Using Audio and Motion Data from a Digital Pen
Leibniz University Hannover, Germany
Although many contracts can be concluded or terminated digitally, laws require handwritten signatures in certain cases. Forgeries are a major challenge, as validity is not always immediately apparent without forensic methods. However, illiteracy or disabilities may result in a person being unable to write their full name. Then, x-mark signatures are used, which require a witness for validity. In cases of suspected fraud, the relationship of the witnesses must be questioned, which involves a great amount of effort. In this paper we use audio and motion data from a digital pen to identify users by handwritten symbols. We conducted a study with 30 participants to evaluate our approach on 19 symbols. We found that x-marks offer fewer individual features than other symbols like arrows or circles. By training on three samples and averaging three predictions, we reach an average F1-score of 0.87 using statistical and spectral features fed into SVMs.
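Averaging several predictions before deciding on a writer can be illustrated with the minimal sketch below. It assumes that each prediction is a vector of per-writer probabilities and that the vectors are averaged element-wise; the abstract does not say whether probabilities, decision scores, or votes are averaged, and the name `identify_writer` is hypothetical.

```python
def identify_writer(prob_vectors, writers):
    """Average per-writer probability vectors and return the top writer.

    prob_vectors: list of lists, one probability vector per written sample,
    each with one entry per candidate writer (assumed classifier output).
    writers: list of writer identifiers in the same order as the entries.
    """
    n = len(prob_vectors)
    # Element-wise mean across the predictions for the individual samples.
    averaged = [sum(column) / n for column in zip(*prob_vectors)]
    best_index = max(range(len(writers)), key=lambda i: averaged[i])
    return writers[best_index], averaged
```

In this sketch a single ambiguous sample cannot decide the outcome alone, which is the usual motivation for averaging over multiple samples.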
3:15pm - 3:30pm
How Accurate Does It Feel? – Human Perception of Different Types of Classification Mistakes
1Leibniz Institute for the Social Sciences; 2Hochschule der Medien; 3University of Duisburg-Essen
Supervised machine learning utilizes large datasets, often with ground truth labels annotated by humans. While some data points are easy to classify, others are hard to classify, which reduces the inter-annotator agreement. This causes noise for the classifier and might affect the user’s perception of the classifier’s performance. In our research, we investigated whether the classification difficulty of a data point influences how strongly a prediction mistake reduces the “perceived accuracy”. In an experimental online study, 225 participants interacted with three fictitious classifiers with equal accuracy (73%). The classifiers made prediction mistakes on three different types of data points (easy, difficult, impossible). After the interaction, participants judged the classifier’s accuracy. We found that not all prediction mistakes reduced the perceived accuracy equally. Furthermore, the perceived accuracy differed significantly from the calculated accuracy. To conclude, accuracy and related measures seem unsuitable to represent how users perceive the performance of classifiers.
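That calculated accuracy does not depend on which data points are misclassified can be shown with a small sketch. The data set size, the split into easy and difficult items, and the two classifiers below are hypothetical illustrations and are not taken from the study; only the 73% figure comes from the abstract.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical data: 100 items with the same true label, where the first
# 50 are treated as "easy" and the last 50 as "difficult".
y_true = [1] * 100

# Classifier A makes all 27 of its mistakes on easy items.
pred_a = [0] * 27 + [1] * 73
# Classifier B makes all 27 of its mistakes on difficult items.
pred_b = [1] * 73 + [0] * 27

acc_a = accuracy(y_true, pred_a)
acc_b = accuracy(y_true, pred_b)
```

Both classifiers reach the same calculated accuracy although their mistakes fall on different kinds of items, which is the situation in which the study reports differences in perceived accuracy.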