Session Chair: Luanne Freund, University of British Columbia
Location: Lecture Theatre 5 (Diamond), The Diamond
Using Full-text of Research Articles to Analyze Academic Impact of Algorithms
Yuzhuo Wang, Chengzhi Zhang
Nanjing University of Science and Technology, People's Republic of China
This paper uses a full-text corpus of research articles published at the ACL conference to investigate the top-10 data mining algorithms in the field of Natural Language Processing. The academic influence of the algorithms is analyzed according to their usage information. The application of the ten algorithms is compared in four aspects: the number of papers that mention the algorithm, mention frequency, mention location, and the correlation coefficient between algorithm and task. This research offers a new way to evaluate the influence of algorithms quantitatively. Results show obvious differences in influence among the algorithms. Specifically, the impact of the SVM algorithm is significantly higher than that of the other algorithms. Moreover, the tasks most closely related to each algorithm differ.
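As a rough illustration of the first three aspects (papers mentioning an algorithm, mention frequency, and mention location), the counting could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the input layout (paper id mapped to section texts), the function name `mention_stats`, and the regular expressions are all hypothetical.

```python
import re
from collections import Counter


def mention_stats(papers, patterns):
    """Count algorithm mentions in a toy full-text corpus.

    papers:   dict of paper_id -> dict of section_name -> text (assumed layout)
    patterns: dict of algorithm_name -> regex matching its surface forms (assumed)
    Returns (papers_mentioning, mention_frequency, mentions_by_section).
    """
    papers_mentioning = Counter()
    mention_frequency = Counter()
    mentions_by_section = {name: Counter() for name in patterns}

    for sections in papers.values():
        seen_in_paper = set()
        for section, text in sections.items():
            for name, pattern in patterns.items():
                hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
                if hits:
                    mention_frequency[name] += hits
                    mentions_by_section[name][section] += hits
                    seen_in_paper.add(name)
        # Each paper counts at most once per algorithm.
        for name in seen_in_paper:
            papers_mentioning[name] += 1

    return papers_mentioning, mention_frequency, mentions_by_section
```

The fourth aspect, the correlation between algorithm and task, would need task labels per paper and is left out of this sketch.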
Metadata Versus Full-Text: Tracking Electronic Theses and Dissertations (ETDs) Users’ Behavior
Daniel Gelaw Alemneh, Mark Phillips
University of North Texas, United States of America
This presentation provides data from a recent research project at the University of North Texas (UNT) Libraries to better understand how users are discovering electronic theses and dissertations (ETDs) in the UNT Libraries. The data was obtained from server logs containing more than 178 million lines of requests, from which the specific requests for ETDs in the UNT Digital Library were extracted. From these requests, queries executed in an ambiguous way (that is, not as specific fielded searches) were extracted to create a dataset of item-query pairs. These item-query pairs were presented to the Solr full-text indexer that powers the search and retrieval side of the UNT Digital Library, which reported statistics that help explain whether a specific query was satisfied by the ETD's full text, its metadata, or both. The resulting data helps us understand how our users are arriving at a given ETD in the collection. Among other questions, the role of metadata in the discovery process and the possible overlap between the metadata and the full text of the ETD itself will be analyzed and discussed.
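The classification of an item-query pair as satisfied by metadata, full text, or both can be illustrated with a simple stand-in. This sketch does not use Solr and is not the project's method: it assumes a query is "satisfied" by a field when every query term occurs in that field, and the function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify_match(query, metadata, full_text):
    """Label which field(s) contain all terms of an unfielded query.

    Assumption: containment of every query term stands in for a real
    search-engine match; actual Solr scoring and analysis differ.
    """
    query_terms = tokens(query)
    in_metadata = bool(query_terms) and query_terms <= tokens(metadata)
    in_full_text = bool(query_terms) and query_terms <= tokens(full_text)
    if in_metadata and in_full_text:
        return "both"
    if in_metadata:
        return "metadata"
    if in_full_text:
        return "full-text"
    return "neither"
```

Tallying these labels over all item-query pairs would give the kind of overlap statistics the abstract describes.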
Music Artist Similarity: An Exploratory Study on A Large-Scale Dataset of Online Streaming Services
Xiao Hu1, Ira Keung Kit Tam1, Meijun Liu1, J. Stephen Downie2
1University of Hong Kong, Hong Kong S.A.R. (China); 2University of Illinois
To support music search, online music streaming services often suggest artists who are deemed similar to those listened to or liked by users. However, there has been an ongoing debate about what constitutes artist similarity. Approaching this problem from an empirical perspective, this study collected a large-scale dataset of similar artists recommended by four well-known online music streaming services, namely Spotify, Last.fm, the Echo Nest, and KKBOX, on which an exploratory quantitative analysis was conducted. Preliminary results reveal that similar artists in these services were related to the genre and popularity of the artists. The findings shed light on how the concept of artist similarity is manifested in widely adopted real-world applications, which will in turn help enhance our understanding of music similarity and recommendation.
Can Word Embedding Help Term Mismatch Problem? – a Result Analysis on Clinical Retrieval Tasks
Danchen Zhang, Daqing He
University of Pittsburgh, United States of America
Clinical Decision Support (CDS) systems help doctors make clinical decisions by searching the medical literature based on patients’ medical records. Past studies showed that correctly predicting a patient’s diagnosis can significantly increase the performance of such clinical retrieval systems. However, a large portion of relevant documents are still ranked very low because of the term mismatch problem. In this paper, we explore solutions to the term mismatch problem to enhance such systems. Unlike other retrieval tasks, queries in disease-prediction-based clinical retrieval systems have already been expanded with the most informative terms (i.e., the predicted disease). It is therefore a great challenge for traditional Pseudo Relevance Feedback (PRF) methods to incorporate new informative terms from the top K pseudo-relevant documents. It is in this scenario that word embedding, which is trained on much larger collections and can identify words used in contexts similar to those of a given word, is used to obtain further improvements. Our method was evaluated using test collections from the CDS track in TREC 2015, trained on 2014 data. Experimental results show that word embedding can significantly improve retrieval performance and that the term mismatch problem can be largely resolved, particularly for low-ranked relevant documents. However, for highly ranked documents with less term mismatch, the improvement from word embedding can also be achieved by a traditional language model.
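The general idea of using word embeddings against term mismatch can be illustrated by expanding a query with each term's nearest neighbours in embedding space. This is a minimal sketch, not the authors' method: the toy two-dimensional vectors, the function names, and the choice of cosine similarity with a fixed `k` are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query_terms, embeddings, k=2):
    """Append the k most similar vocabulary words for each query term.

    embeddings: dict of word -> vector (toy values here; a real system
    would load vectors trained on a large collection).
    """
    expanded = list(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are left unexpanded
        scored = sorted(
            (
                (cosine(embeddings[term], vector), word)
                for word, vector in embeddings.items()
                if word != term and word not in expanded
            ),
            reverse=True,
        )
        expanded.extend(word for _, word in scored[:k])
    return expanded
```

With toy vectors in which "cardiac" and "myocardial" lie close to "heart", the query `["heart"]` would be expanded with those two words, so documents that use either synonym can still match.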