Conference Agenda

Session
Thought Provoking Papers 1
Time: Wednesday, 18/Sept/2024, 1:30pm - 2:30pm

Session Chairs: Till Winkler, Konstantin Hopf
Location: 0.001


Presentations

“Please help me!” Using large language models to improve titles of user-generated posts in online health communities

J. Chen

Goethe University Frankfurt, Germany

In online health communities, users can post questions to seek health-related advice from healthcare professionals. However, the titles they formulate often lack key information. Given that many people only scan titles, users may not get their questions answered. Large language models (LLMs) offer a potential solution by generating titles that better align with the information needs of healthcare professionals. In this study, we fine-tuned an LLM on over 330,000 posts from the subreddit r/askdocs. Subsequently, we conducted a survey with 70 healthcare professionals to evaluate their preference between user- and LLM-generated titles. Our findings indicate that healthcare professionals perceive LLM-generated titles as better suited to the corresponding posts, more informative, and conveying a greater sense of urgency. With our work, we contribute to research on online health communities and large language models by demonstrating that LLMs can generate more effective titles for user-generated posts than the users themselves.

Chen-“Please help me!” Using large language models-145_a.pdf


Age Ain’t Just a Number: Exploring the Volume vs. Age Dilemma for Textual Data to Enhance Decision Making

L. Hägele, M. Klier, A. Obermeier, T. Widmann

University of Ulm, Germany

The common belief that more data leads to better results often means that all available data is used to derive the best possible decision. However, the age of data can strongly affect data-driven decision making. Consequently, the desire for both larger data volumes and contemporary data gives rise to the “volume vs. age” dilemma, which has not yet been sufficiently researched. In this work, we rigorously investigate the “volume vs. age” dilemma for textual data using four experiments with real-world customer reviews from the Yelp platform. Contributing to theory and practice, we show that more data is not always better, as the effect of data age can outweigh the effect of data volume, resulting in overall poorer performance. Moreover, we demonstrate that different aspects within textual data can exhibit different temporal effects and that considering these effects when selecting training data can clearly outperform existing practices.

Hägele-Age Ain’t Just a Number-214_a.pdf