Session: Epidemiology

Presentations
9:15am - 9:35am
Risk Prediction using Case-Cohort Samples: A Scoping Review and Empirical Comparison
1University of Oxford, United Kingdom; 2Department of Medical Statistics, London School of Hygiene & Tropical Medicine, United Kingdom
Risk prediction models are commonly developed using data from observational cohorts. When measuring certain variables across an entire cohort is infeasible due to high cost, a case-cohort sample offers an alternative. The case-cohort design limits full covariate measurement to a randomly sampled subcohort together with all cases outside it, an efficient strategy that maintains reasonable statistical power. However, developing risk prediction models from case-cohort data poses challenges because of the outcome-dependent sampling design: the overrepresentation of cases can distort predictor-outcome relationships if not appropriately accounted for. Most existing case-cohort analyses have used traditional weighted Cox regression models (Prentice, Self-Prentice, and Barlow weights), although several methods for developing risk prediction models under the case-cohort design have been proposed in recent years, including multiple imputation techniques that leverage full-cohort information (Keogh et al., 2013), inverse probability-weighted kernel machine methods (Payne et al., 2016), and Bayesian frameworks for case-cohort Cox regression (Yiu et al., 2021). While these newer methods are promising, their practical applicability, advantages, limitations, and real-world performance remain unvalidated. This study aims to bridge this gap by: (1) reviewing methodological advances in case-cohort analysis, including variable selection, missing data imputation, sampling design, and model evaluation; (2) summarizing research that has developed risk prediction models with real-world case-cohort data; and (3) comparing available tools (e.g., R, Stata, and Python packages) for case-cohort analysis. Additionally, we plan an empirical comparison of these methods and software tools on case-cohort data from the China Kadoorie Biobank (CKB), allowing us to assess their practical strengths and limitations. Finally, we will discuss current limitations in the field and highlight areas for further development.
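As context for the weighted Cox approaches the review surveys (Prentice, Self-Prentice, Barlow), the sketch below shows a minimal inverse-probability-weighted Cox fit on a simulated case-cohort sample using the Python lifelines package. The cohort size, the 5% subcohort sampling fraction, the single covariate, and the constant weighting scheme are illustrative assumptions, not the data or the specific estimators compared in the talk.

```python
# Minimal sketch: IPW-weighted Cox model on a simulated case-cohort sample.
# The data-generating model, sampling fraction, and variable names are
# illustrative assumptions only.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 20_000                                   # full cohort size
x = rng.normal(size=n)                       # "expensive" covariate
t = rng.exponential(scale=np.exp(-0.5 * x))  # event times, true log-HR = 0.5
c = rng.exponential(scale=5.0, size=n)       # censoring times
time = np.minimum(t, c)
event = (t <= c).astype(int)

# Case-cohort sample: a random subcohort plus all cases outside it.
p_sub = 0.05
in_subcohort = rng.random(n) < p_sub
sampled = in_subcohort | (event == 1)

df = pd.DataFrame({"time": time, "event": event, "x": x})[sampled].copy()
# Simple inverse-probability weights: cases weight 1, sampled non-cases 1 / p_sub.
df["w"] = np.where(df["event"] == 1, 1.0, 1.0 / p_sub)

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event",
        weights_col="w", robust=True)        # robust (sandwich) variance
print(cph.summary[["coef", "se(coef)"]])
```

Time-varying Prentice or Barlow weights would require restructuring the data into counting-process form; the constant weights above are only the simplest case-cohort IPW variant.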
9:35am - 9:55am
Implication of the choice of time scales in survival analysis
1Institute of Statistics, Ulm University, Ulm; 2Institute of Epidemiology and Medical Biometry, Ulm University, Ulm
The choice of time scale is an important and often discussed topic in time-to-event analysis. While the rule

9:55am - 10:15am
Leveraging Cancer Incidence for Lead Time Estimation in Cancer Screening Programmes
Institute for Biostatistics and Medical Informatics, Slovenia
In cancer screening programmes, participants are regularly screened every few years using blood tests, urine tests, or medical imaging to detect cancer earlier, when it is presumed to be more curable. Without screening, the cancer would likely progress undetected until symptoms appear. The interval between early detection and the eventual onset of symptoms, had screening not been conducted, is known as the lead time. Because of lead time, survival for screen-detected cancers is artificially extended compared with cancers detected on the basis of symptoms, since screen-detected cases are followed from an earlier point in time, resulting in a biased comparison. Understanding and estimating lead time is therefore crucial for mitigating lead-time bias in cancer screening studies. Estimating lead time is challenging because it is a hypothetical random variable that can only be inferred indirectly. In our study, we introduce a novel method for estimating lead time using a data source previously untapped for this purpose: cancer incidence. We hypothesize that earlier detection of cancer due to screening should produce an observable shift in cancer incidence rates stratified by age and year of diagnosis. Our method leverages this information and estimates lead time with a maximum likelihood estimator. In principle, the user specifies the distribution of lead time, and the method finds the parameters that best fit the observed shift in cancer incidence. Our approach is flexible, allowing for the inclusion of additional covariates and accounting for overdiagnosis. The data required for this method are routinely available from cancer registries and population tables, making it easy to implement. We validated our method through simulations and applied it to data from the Slovenian breast cancer screening programme, demonstrating its effectiveness and utility.
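To make the incidence-shift idea concrete, the following is a minimal toy sketch: it simulates an exponential lead time, observes only the shifted age-specific incidence counts, and recovers the lead-time rate by maximum likelihood. The exponential lead-time model, the Poisson likelihood on yearly counts, and all numbers are illustrative assumptions, not the authors' estimator.

```python
# Toy sketch: fit a lead-time distribution from the shift it induces in
# age-specific incidence. All modelling choices here are assumptions.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(7)

# Ages at symptomatic (no-screening) diagnosis for a simulated cohort.
age_sympt = rng.normal(loc=65.0, scale=6.0, size=50_000)
true_rate = 0.4                                 # exponential rate, mean lead time 2.5 years
age_screen = age_sympt - rng.exponential(1.0 / true_rate, size=age_sympt.size)

bins = np.arange(40, 91)                        # one-year age bands
obs, _ = np.histogram(age_screen, bins=bins)    # observed, shifted incidence counts

u = rng.random(age_sympt.size)                  # common random numbers -> smooth objective

def neg_log_lik(rate):
    # Expected shifted counts under an Exponential(rate) lead time,
    # approximated by shifting the no-screening ages by -log(u) / rate.
    sim = age_sympt + np.log(u) / rate
    exp_counts, _ = np.histogram(sim, bins=bins)
    return -poisson.logpmf(obs, np.maximum(exp_counts, 1e-8)).sum()

fit = minimize_scalar(neg_log_lik, bounds=(0.05, 5.0), method="bounded")
print(f"true rate {true_rate:.2f}, estimated rate {fit.x:.2f}")
```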