|8:30am - 10:00am||Workshop 1a: Conceptualizing the Future of Information Privacy Research|
Conceptualizing the Future of Information Privacy Research
Information and communication technologies (ICTs) have created innumerable opportunities for connecting people, simplifying complex tasks, and developing ecosystems of tools for our homes and workplaces. At the same time, they create new threats to the privacy of individuals, groups, and organizations due to the collection, sharing, and analysis practices that companies employ on user-generated data. iSchools are especially well-positioned to address these challenges because of our interdisciplinary expertise in sociotechnical spaces. In this workshop, we will work with participants to identify and define key domains where iSchool researchers can contribute to our understanding of information privacy over the next 10 years. Topics of interest include research with marginalized groups; ethical issues with privacy research; bridging research, policy, and design; and expanding research on networked or group-managed privacy. Workshop goals include mapping these topic areas, connecting and networking with privacy researchers in the global iSchool community, writing a public statement on the state and future of information privacy research, and launching a call for a special issue of JASIST on this topic.
|10:30am - 12:00pm||Workshop 1b: Conceptualizing the Future of Information Privacy Research|
Part 2 of 4
|1:30pm - 3:00pm||Workshop 1c: Conceptualizing the Future of Information Privacy Research|
Part 3 of 4
|3:30pm - 5:00pm||Workshop 1d: Conceptualizing the Future of Information Privacy Research|
Part 4 of 4
|10:30am - 12:00pm||Papers 2: Methodological Concerns in (Big) Data Research|
Session Chair: Mohammad Hossein Jarrahi, University of North Carolina at Chapel Hill
Methodological Transparency and Big Data: A Critical Comparative Analysis of Institutionalization
1Princeton University, United States of America; 2Indiana University, United States of America
Big data is increasingly employed in predictive social analyses, yet there are many visible instances of unreliable models or outright failure, raising questions about methodological validity in data-driven approaches. A meta-analysis of methodological institutionalization across three scholarly disciplines provides evidence that traditional statistical quantitative methods, which are more institutionalized and consistent, are important for developing, structuring, and institutionalizing data-scientific approaches for new, large-n quantitative methods, and indicates that data-driven research approaches may be limited in reliability, validity, generalizability, and interpretability. Results also indicate that interdisciplinary collaborations describe methods in significantly greater detail on projects employing big data, with the effect that institutionalization makes data science approaches more transparent.
Spanning the Boundaries of Data Visualization Work: An Exploration of Functional Affordances and Disciplinary Values
1University of Washington, Seattle, WA; 2University of Maryland, College Park, MD
Creating data visualizations requires diverse skills including computer programming, statistics, and graphic design. Visualization practitioners, often formally trained in one but not all of these areas, increasingly face the challenge of reconciling, integrating and prioritizing competing disciplinary values, norms and priorities. To inform multidisciplinary visualization pedagogy, we analyze the negotiation of values in the rhetoric and affordances of two common tools for creating visual representations of data: R and Adobe Illustrator. Features of, and discourse around, these standard visualization tools illustrate both a convergence of values and priorities (clear, attractive, and communicative data-driven graphics) side-by-side with a retention of rhetorical divisions between disciplinary communities (statistical analysis in contrast to creative expression). We discuss implications for data-driven work and data science curricula within the current environment where data visualization practice is converging while values in rhetoric remain divided.
Modeling the process of information encountering based on the analysis of secondary data
1School of Information Management, Wuhan University, China, People's Republic of; 2Center for Studies of Information Resources, Wuhan University, Wuhan, Hubei, China, People's Republic of
The critical incident technique (CIT) has been applied extensively in research on information encountering (IE), and abundant IE incident descriptions have accumulated in the literature. This study used these descriptions as secondary data for the purpose of creating a general model of the IE process. The grounded theory approach was employed to systematically analyze 279 IE incident descriptions extracted from 14 IE studies published since 1995. In total, 230 conceptual labels, 33 subcategories, and 9 categories were created during the data analysis process, leading to one core category, i.e. "IE process". A general IE process model was then established to demonstrate the relationships among the major components, including environments, foreground activities, stimuli, reactions, examination of information content, interaction with encountered information, valuable outcomes, and emotional states before/after encountering. This study not only enriches the understanding of IE as a universal information phenomenon, but also shows methodological significance by making use of secondary data to lower cost, enlarge sample size, and diversify data sources.
|1:30pm - 3:00pm||Papers 6: Limits and Affordances of Automation|
Session Chair: Radhika Garg, Syracuse University
Automating Documentation: A critical perspective into the role of artificial intelligence in clinical documentation
1Oxford Internet Institute, University of Oxford. OX1 3JS, UK; 2University of North Carolina, Chapel Hill, NC 27599, USA
The current conversation around automation and artificial intelligence technologies creates a future vision in which humans cannot possibly compete against intelligent machines, and everything that can be automated through deep learning, machine learning, and other AI technologies will be automated. In this article, we focus on general practitioners' documentation of patients' clinical encounters, and explore how these work practices lend themselves to automation by AI. While these work practices may appear perfect candidates for automation, we reveal potential negative consequences of automating these tasks, and illustrate how AI may render important aspects of this work invisible and remove critical thinking. We conclude by highlighting the specific features of clinical documentation work that could leverage the benefits of human-AI symbiosis.
Toward Three-Stage Automation of Detecting and Classifying Human Values
1Kyushu University, Japan; 2University of Maryland, USA; 3The University of Texas at Austin, USA; 4National Sun Yat-sen University, Taiwan
Prior work on automated annotation of human values has sought to train text classification techniques to label text spans with labels that reflect specific human values such as freedom, justice, or safety. This confounds three tasks: (1) selecting the documents to be labeled, (2) selecting the text spans that express or reflect human values, and (3) assigning labels to those spans. This paper proposes a three-stage model in which separate systems can be optimally trained for each of the three stages. Experiments from the first stage, document selection, indicate that annotation diversity trumps annotation quality, suggesting that when multiple annotators are available, the traditional practice of adjudicating conflicting annotations of the same documents is not as cost effective as an alternative in which each annotator labels different documents. Preliminary results for the second stage, selecting value sentences, indicate that high recall (94%) can be achieved on that task with levels of precision (above 80%) that seem suitable for use as part of a multi-stage annotation pipeline. The annotations created for these experiments are being made freely available, and the content that was annotated is available from commercial sources at modest cost.
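The staged design described above composes simply: an item reaches the final labels only if every stage keeps it, so overall recall is bounded by the product of per-stage recalls. A minimal sketch of that arithmetic, using the stage-2 figures reported in the abstract (recall 0.94, precision above 80%); the raw counts below are illustrative only, not the paper's data:

```python
# Hypothetical sketch of multi-stage pipeline arithmetic
# (not the authors' implementation).

def precision_recall(tp, fp, fn):
    """Standard precision and recall from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

def pipeline_recall(stage_recalls):
    """An item survives the pipeline only if every stage keeps it,
    so overall recall is at most the product of per-stage recalls
    (assuming stage errors are independent)."""
    result = 1.0
    for r in stage_recalls:
        result *= r
    return result

# Illustrative counts matching the reported stage-2 figures:
p, r = precision_recall(tp=94, fp=23, fn=6)
print(round(p, 2), r)  # 0.8 0.94

# If document selection (stage 1) hypothetically recalled 0.90
# of the relevant documents, the two-stage recall is bounded by:
print(round(pipeline_recall([0.90, 0.94]), 3))  # 0.846
```

The bound makes the design trade-off concrete: an early stage tuned for very high recall (at some cost in precision) keeps the ceiling on end-to-end recall high, leaving later stages to recover precision.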
Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling
1University of Konstanz, Germany; 2University of Wuppertal, Germany
Media bias, i.e., slanted news coverage, can strongly impact the public perception of topics reported in the news. While the analysis of media bias has recently gained attention in computer science, the automated methods and results tend to be simplistic compared to approaches and results in the social sciences, where researchers have studied media bias for decades. We propose Newsalyze, a work-in-progress prototype that imitates a manual analysis concept for media bias established in the social sciences. Newsalyze aims to find instances of bias by word choice and labeling in a set of news articles reporting on the same event. Bias by word choice and labeling (WCL) occurs when journalists use different phrases to refer to the same semantic concept, e.g., actors or actions. In this way, bias by WCL can induce strongly divergent emotional responses in readers, as illustrated by the terms "illegal aliens" vs. "undocumented immigrants." We describe two critical tasks of the analysis workflow: finding and mapping such phrases, and estimating their effects on readers. For both tasks, we also present first results, which indicate the effectiveness of exploiting methods and models from the social sciences in an automated approach.
|3:30pm - 5:00pm||Papers 10: Data-Driven Storytelling and Modeling|
Session Chair: Matthew Andrew Willis, University of Oxford
Engaging the Community Through Places: A User Study of People's Festival Stories
Pennsylvania State University, United States of America
People’s lived experiences, stories, and memories about local places endow meaning to a community, which can play an important role in community engagement. We investigated the meaning of place through the lens of people’s memories of a local arts festival. We first designed, developed, and deployed a web application to collect people’s festival stories. We then developed our interview study based on 28 stories collected through the web app in order to generate rich conversations with 18 festival attendees. Our study identifies three parallel meanings that a place can hold based on the following types of festival attendees: experience seekers, nostalgia travelers, and familiar explorers. We further discuss how information technology can facilitate community engagement based on those parallel meanings of place.
Understanding Partitioning and Sequence in Data-Driven Storytelling
1University of Maryland, College Park, United States of America; 2United States Naval Academy
The comic strip narrative style is an effective method for data-driven storytelling. However, surely it is not enough to just add some speech bubbles and clip art to your PowerPoint slideshow to turn it into a data comic? In this paper, we investigate aspects of partitioning and sequence as fundamental mechanisms for comic strip narration: chunking complex visuals into manageable pieces, and organizing them into a meaningful order, respectively. We do this by presenting results from a qualitative study designed to elicit differences in participant behavior when solving questions using a complex infographic compared to when the same visuals are organized into a data comic.
Modeling adoption behavior for innovation diffusion
1School of Information Management, Sun Yat-Sen University, Guangzhou, China; 2School of Informatics and Computing, Indiana University, Bloomington, United States
Studying the diffusion of innovation is increasingly critical: in the current AI era, a growing number of new technologies have been developed to promote disruptive innovation. Unlike previous work, which mainly considers direct influence between new-technology adoption behaviors, we propose a new model named the Adoption Behavior based Graphical Model (ABGM), which incorporates influence factors (i.e., homophily and heterophily) among users' adoption behavior towards new AI technologies. The model simulates the process of innovation diffusion and learns diffusion patterns in a unified framework. We evaluate the proposed model on a large-scale AI publication dataset spanning 2006 to 2015. Results show that ABGM outperforms state-of-the-art baselines, and demonstrate that the probability of an individual user adopting an innovation is significantly influenced by the diffusion process through the correlation network.
|10:30am - 12:00pm||Papers 14: Data and Information in the Public Sphere|
Session Chair: Heather Moulaison, University of Missouri
Connecting Users, Data and Utilization: A Demand-side Analysis of Open Government Data
1Wuhan University, China, People's Republic of; 2Macquarie University, Sydney, NSW, Australia
Open government data (OGD) could bring various benefits through transparency and access. Thus, governments have proposed policies and practices to disclose more data to the public. However, studies have identified the utilization of OGD, rather than its disclosure, as a key problem. Although citizens are recognized as key participants in the utilization of OGD on the demand side, few studies have examined the possible relationships among OGD users, their demands for data, and utilization. Therefore, our study surveyed a Chinese population to analyse the relationships among these three. Results show citizens' limited awareness of OGD and its portals, and that their demands for OGD subjects differ with socio-demographic characteristics. Daily life and anticorruption were the two main types of OGD utilization by citizens. Their types of usage are affected by education and knowledge of OGD, and different types of utilization lead to different demands for OGD subjects. We suggest that governments improve citizens' awareness of their efforts to provide OGD, and deliver more data in the subject categories that citizens need most. Further studies are needed on citizens' motivations for OGD utilization.
“Just my intuition”: Awareness of versus Acting on Political News Misinformation
1School of Creative Media, City University of Hong Kong, Hong Kong S.A.R. (China); 2Florida State University
Citizens are becoming increasingly aware of the prevalence of misinformation, disinformation, and rumors, especially on political topics. But currently, the literature lacks clarity on how citizens deal with this issue, and information science and HCI researchers propose design solutions such as diverse information platforms assuming that citizens - with more information at hand - will be able to rationalize political misinformation on their own. In this paper, we conducted semi-structured interviews with 21 Hong Kong residents. Our findings point out that while most of our participants were aware of misinformation, they mostly did not act on it. This suggests that while it is important for designers to further develop information-rich news representations, researchers also need to develop alternative solutions, such as news literacy education, as long-term remedies.
Public-Private Partnerships in Data Services: Learning From Genealogy
1School of Information and Communications Studies, University College Dublin, Ireland.; 2Information School, University of Wisconsin-Madison, United States of America
As one strategy for expanding access to archival data, libraries and data archives are increasingly entering into public-private partnerships (PPPs) with commercial entities. In exchange for access to publicly held sources of information of interest to genealogists, commercial companies provide financial resources for digitization and access. This paper reviews recent literature on these public-private partnerships and considers the challenges and long-term implications of these relationships for data services, including tensions with institutional missions, access differentiation, exclusivity and nondisclosure agreements, and marginalization of services financed by public data.
|1:30pm - 3:00pm||Papers 17: Algorithms at Work|
Session Chair: Monica Grace Maceli, Pratt Institute
Context-aware Coproduction: Implications for Recommendation Algorithms
1Information Sciences and Technology, Pennsylvania State University, University Park, Pennsylvania, United States; 2Human Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States; 3Hasso Plattner Institute, University of Potsdam, Potsdam, Germany; 4System Sciences Laboratory, PARC Inc., a Xerox Company, Palo Alto, California, United States; 5University of Washington, Seattle, Washington, United States
Coproduction is an important form of service exchange in local communities, where members perform and receive services among each other on a non-profit basis. Local coproduction systems enhance community connections and re-energize neighborhoods, but face difficulties matching relevant and convenient transaction opportunities. Context-aware recommendations offer promising solutions, but are so far limited to matching spatio-temporal and static user contexts.
By analyzing data from a transportation-share app during a 3-week study with 23 participants, we extend the design scope for context-aware recommendation algorithms to include important community-based parameters such as sense of community. We find that inter- and intra-relationships between spatio-temporal and community-based social contexts significantly impact users' motivation to request or provide service. The results provide novel insights for designing context-aware recommendation algorithms for community coproduction services.
Algorithmic Management and Algorithmic Competencies: Understanding and Appropriating Algorithms in Gig work
1University of North Carolina at Chapel Hill, United States of America; 2University of Washington, United States of America
Data-driven algorithms now enable digital labor platforms to automatically manage transactions between thousands of gig workers and service recipients. Recent research on algorithmic management outlines information asymmetries, which make it difficult for gig workers to gain control over their work due to a lack of understanding of how algorithms on digital labor platforms make important decisions, such as assigning work and evaluating workers. Building on an empirical study of Upwork users, we show that users are not passive recipients of algorithmic management. We explain how workers make sense of different automated features of the Upwork platform, developing a literacy for understanding and working with algorithms. We also highlight the ways in which workers may use this knowledge of algorithms to work around or manipulate them, retaining some professional autonomy while working through the platform.
Agency Laundering and Algorithmic Decision Systems
1University of Wisconsin-Madison, United States of America; 2Florida International University, United States of America
This paper has two aims. The first is to explain a type of wrong that arises when agents obscure responsibility for their actions. Call it "agency laundering." The second is to use the concept of agency laundering to understand the underlying moral issues in a number of recent cases involving algorithmic decision systems.
|3:30pm - 5:00pm||Papers 20: Data Mining and NLP|
Session Chair: Catherine Blake, Illinois
Identifying Finding Sentences in Conclusion Subsections of Biomedical Abstracts
Syracuse University, United States of America
Segmenting scientific abstracts and full text by rhetorical function is an essential task in text classification. Small rhetorical segments can be useful for fine-grained literature search, summarization, and comparison. Current efforts have focused on segmenting documents into general sections such as introduction, method, and conclusion, and much less on the roles of individual sentences within those segments. For example, not all sentences in the conclusion section describe research findings. In this work, we developed rule-based and machine learning methods and compared their performance in identifying finding sentences in the conclusion subsections of biomedical abstracts. A sample of 1,100 conclusion subsections from observational and randomized clinical trial study designs, covering five common health topics, was drawn from PubMed to develop and evaluate the methods. The rule-based method and the bag-of-words machine learning method both achieved high accuracy. The better performance of the simple rule-based approach shows that although advanced machine learning approaches can capture the main patterns, human experts may still outperform them on such a specialized task.
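The rule-based approach described in this abstract can be illustrated with a minimal sketch: cue phrases that typically signal a reported finding are matched against each sentence. The cue patterns below are hypothetical examples for illustration; the paper's actual rules are not reproduced here.

```python
import re

# Hypothetical cue patterns signalling a reported finding
# (illustrative only; not the paper's rule set).
FINDING_CUES = [
    r"\bwe (found|observed|show(ed)?)\b",
    r"\bresults (show|indicate|suggest)\b",
    r"\bwas (significantly )?associated with\b",
    r"\b(higher|lower|increased|decreased) (risk|rates?|odds)\b",
]
CUE_RE = re.compile("|".join(FINDING_CUES), re.IGNORECASE)

def is_finding_sentence(sentence: str) -> bool:
    """Flag a sentence as reporting a finding if any cue pattern matches."""
    return bool(CUE_RE.search(sentence))

conclusion = [
    "We found that daily exercise was significantly associated "
    "with lower blood pressure.",
    "Further randomized trials are needed to confirm these results.",
]
print([is_finding_sentence(s) for s in conclusion])  # [True, False]
```

A sketch like this makes the paper's comparison point tangible: a handful of expert-written patterns can already separate finding sentences from hedging or future-work sentences, which is the kind of specialized signal a bag-of-words learner must rediscover from training data.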
How to Make a Successful Movie: Factor Analysis from Both Financial and Critical Perspectives
1Indiana University Bloomington, United States of America; 2Nanjing University of Science and Technology
Over the past twenty years, the film industry has grown considerably. There are two common measurements of movie quality: the financial metric of net profit, and the reception metric of ratings assigned by moviegoers on websites. Researchers have utilized these two metrics to build models for movie success prediction separately, but few have investigated their combination. Therefore, in this paper, we analyze movie success from the financial and critical perspectives in tandem. Here, optimal success is defined as a film that is both profitable and highly acclaimed, while the worst outcome involves financial loss and critical panning at the same time. Features that are salient to both financial and critical outcomes are identified in an attempt to uncover what makes a "good" movie "good" and a "bad" one "bad", as well as to explain common phenomena in the movie industry quantitatively.
Authority Claim in Rationale-Containing Online Comments
Syracuse University, United States of America
We examined whether the existence of authority claims signifies one's rationales in online communication content, potentially contributing to research on rationale identification and rationale generation. Authority claims are statements that reveal the writer's intention to bolster the writer's credibility. In open online communications, anonymity and dynamic participation make it challenging for participants to establish the credibility of their viewpoints and reasoning. We therefore hypothesized that online participants will tend to use authority claims to bolster their credibility when presenting their justifications. We annotated authority claims in 271 text segments that contain online users' rationales; these text segments are adapted from the open access corpora provided by Rutgers' Argument Mining group. Contrary to our hypothesis, we found that in our dataset users scarcely attempted to bolster their credibility when presenting their reasoning to others. We call for more investigations exploring how activity context affects participants' use of authority claims in their reasoning traces. We further argue that the effects of the communication medium on individuals' cognitive and metacognitive processes are important to consider in argument mining research.
|10:30am - 12:00pm||Papers 23: Environmental and Visual Literacy|
Exploring and Visualizing Household Electricity Consumption Patterns in Singapore: A Geospatial Analytics Approach
Singapore Management University, Singapore
Despite being a small city-state, Singapore is said to have non-homogeneous electricity consumption, as exploratory data analysis showed that the distributions of electricity consumption differ across and within administrative boundaries and dwelling types. Local indicators of spatial association (LISA) were calculated for public housing postal codes using June 2016 data to discover local clusters of households based on electricity consumption patterns. A detailed walkthrough of the analytical process describes the R packages and framework used in the R environment. The LISA results are visualized on three levels: country, region, and planning subzone. At all levels we observe that households do cluster together based on their electricity consumption. By faceting the visualizations by dwelling type, the electricity consumption of planning subzones can be said to fall under one of three profiles: low-consumption, high-consumption, and mixed-consumption subzones. These categories describe how consumption differs across dwelling types in the same postal code (HDB block). LISA visualizations can guide electricity retailers in making informed business decisions, such as which geographical zones to enter, and the variety and pricing of plans to offer consumers.
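The core LISA statistic used in analyses like this, local Moran's I, can be sketched on toy data. For each unit, I_i = z_i * (sum over neighbours j of w_ij * z_j) / m2, where z are deviations from the mean and m2 = sum(z^2)/n; a large positive value marks a unit sitting in a cluster of similarly high or low values. The paper's analysis used R packages on real postal-code data; this is only a minimal illustration on a made-up 1-D arrangement of areas.

```python
import numpy as np

def local_morans_i(values, weights):
    """Local Moran's I for each unit:
    I_i = z_i * sum_j(w_ij * z_j) / m2, with m2 = sum(z^2) / n.
    Positive I_i: the unit resembles its neighbours (local cluster);
    negative I_i: the unit differs from its neighbours (spatial outlier)."""
    z = values - values.mean()
    m2 = (z ** 2).sum() / len(z)
    return z * (weights @ z) / m2

# Five areas in a row; neighbours are the adjacent areas,
# with row-standardized weights.
consumption = np.array([120.0, 130.0, 125.0, 40.0, 45.0])
w = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
])

lisa = local_morans_i(consumption, w)
# Areas deep inside the high cluster (0, 1) and the low cluster (3, 4)
# get positive I; the boundary area (2) borders a dissimilar neighbour
# and gets a negative I.
print(np.round(lisa, 2))
```

Mapping these per-unit values, as the paper does at country, region, and subzone levels, is what turns a single global clustering number into a picture of where the high- and low-consumption pockets actually sit.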
Creen: A Carbon Footprint Calculator Designed for Calculation In Context
Indiana University, United States of America
The impact humans have on the environment has been a growing concern for decades, but there is still a substantial lack of environmental literacy and action among most of the population regarding what they can do to reduce the damage they may be indirectly causing. Given that many people express an interest in helping the environment, this paper presents a prototype of a carbon footprint calculator that interprets a carbon footprint estimate into a form more accessible to people, so that they may be empowered to make more informed decisions.
Environmental Monitoring of Archival Collections: An Exploratory Study of Professionals' Data Monitoring Dashboard Needs and Related Challenges
Pratt Institute, United States of America
This work explores the data dashboard monitoring needs and challenges encountered by archives professionals engaged in environmental monitoring, such as the collection of temperature and humidity data, across a variety of cultural heritage domains. The results of a practitioner focus group and a data dashboard feature ideation session are presented. Findings suggest that practitioners' environmental monitoring struggles stem from a variety of factors, ranging from little budget or staff buy-in to struggles with environmental monitoring device features, data collection, and interpretation. Suggested revisions to popular data dashboard tools in use included integrating multiple sensors' data into a single, remotely accessible real-time control interface. Participants' required features in a data dashboard included charts, export options, value ranges with exceeded-threshold alerts, web and mobile access, real-time data, and a date range selector. An initial data dashboard mockup based on the expressed end-user needs and challenges is presented.