Session
SC1 - AI3: Data matters

Presentations
Optimizing data collection for machine learning
1University of Ottawa; 2NVIDIA Corporation; 3University of Toronto; 4Vector Institute

Deep learning systems use huge training data sets to reach desired performance, but over- or under-collecting training data can incur unnecessary costs and workflow delays. We propose and solve an optimal data collection problem that incorporates performance targets, collection costs, a time horizon, and penalties. Experiments on six deep learning tasks show that we reduce the risk of failing to meet performance targets by over 2x compared with existing estimation-based heuristics.

Policy learning with adaptively collected data
1Hong Kong University of Science and Technology; 2Chicago University; 3Stanford University; 4New York University

Learning optimal policies from historical data enables personalization in many applications. Adaptive data collection is becoming more common because it improves inferential efficiency and optimizes operational performance, but adaptivity complicates ex post policy learning. Our work complements the literature by learning policies from adaptively collected data. We propose an algorithm with a proven finite-sample regret bound that is minimax optimal, matching our established lower bound.

Quality vs. quantity of data in contextual decision-making: exact analysis under newsvendor loss
Columbia University, United States of America

We study the performance implications of the quality and quantity of data in contextual decision-making. We focus on the newsvendor loss and consider a data-driven model in which outcomes observed in similar contexts have similar distributions. We exactly characterize the worst-case regret of a classical class of kernel policies. Our exact analysis unveils new structural insights into the learning behavior of these policies that cannot be obtained through state-of-the-art general-purpose bounds.
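To make the newsvendor setting concrete, the sketch below shows the standard newsvendor loss and a kernel-weighted decision rule of the kind the last abstract studies: demands observed in similar contexts receive higher weight, and the order quantity is the critical-ratio quantile of the weighted empirical demand distribution. This is a minimal illustration, not the authors' method; the Gaussian kernel, the one-dimensional context, and all function names and cost parameters are assumptions for the example.

```python
import numpy as np

def newsvendor_loss(q, d, b=1.0, h=1.0):
    """Newsvendor loss for order quantity q and realized demand d.
    b: per-unit underage (lost-sale) cost; h: per-unit overage (holding) cost."""
    return b * np.maximum(d - q, 0.0) + h * np.maximum(q - d, 0.0)

def kernel_newsvendor_policy(x, contexts, demands, bandwidth=0.5, b=1.0, h=1.0):
    """Order quantity for a new context x from past (context, demand) pairs."""
    # Gaussian kernel weights: observations from similar contexts count more.
    w = np.exp(-0.5 * ((contexts - x) / bandwidth) ** 2)
    w = w / w.sum()
    # The loss-minimizing order is the critical-ratio quantile b/(b+h)
    # of the kernel-weighted empirical demand distribution.
    tau = b / (b + h)
    order = np.argsort(demands)
    cum_w = np.cumsum(w[order])
    return demands[order][np.searchsorted(cum_w, tau)]

# Example: identical contexts give equal weights, so the policy returns
# the median demand when underage and overage costs are symmetric.
q = kernel_newsvendor_policy(0.0, np.array([0.0, 0.0, 0.0]),
                             np.array([1.0, 2.0, 3.0]))  # → 2.0
```

With asymmetric costs (e.g. b > h) the critical ratio exceeds 1/2, so the policy orders a higher quantile of demand, which matches the classical newsvendor intuition of over-ordering when stockouts are costlier than leftovers.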