Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
Session
Epidemiology
Time:
Wednesday, 19/Mar/2025:
9:15am - 10:15am

Session Chair: Benjamin Aretz
Location: Wolfgang-Paul-Saal

Ground floor Uniclub Bonn

Presentations
9:15am - 9:35am

Risk Prediction using Case-Cohort Samples: A Scoping Review and Empirical Comparison

Yangfan Li1, Ruth Keogh2, Christiana Kartsonaki1

1University of Oxford, United Kingdom; 2Department of Medical Statistics, London School of Hygiene & Tropical Medicine, United Kingdom

Risk prediction models are commonly developed using data from observational cohorts. When measuring certain variables across an entire cohort is infeasible due to high costs, a case-cohort sample offers an alternative. The case-cohort design limits full measurement of covariates to a randomly sampled subcohort plus all remaining cases, an efficient strategy that maintains reasonable statistical power. However, developing risk prediction models with case-cohort data poses challenges due to the outcome-dependent sampling design: the overrepresentation of cases can distort predictor-outcome relationships if not appropriately accounted for.

Most existing case-cohort analyses have used traditional weighted Cox regression models (Prentice, Self-Prentice, and Barlow), although several methods have been proposed in recent years for developing risk prediction models under the case-cohort design, including multiple imputation techniques that leverage full-cohort information (Keogh et al., 2013), inverse probability-weighted kernel machine methods (Payne et al., 2016), and Bayesian frameworks for case-cohort Cox regression (Yiu et al., 2021). While these newer methods are promising, their practical applicability, advantages, limitations, and real-world performance remain unvalidated.
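As a toy illustration of the sampling scheme underlying these weighted analyses, the sketch below draws a case-cohort sample and attaches Barlow-style inverse-probability weights (cases weight 1, subcohort non-cases 1/p). The data, function names, and the particular weighting convention are illustrative assumptions, not the authors' implementation.

```python
import random

def draw_case_cohort(cohort, subcohort_fraction, seed=0):
    """Draw a case-cohort sample: a random subcohort plus all cases
    outside it, with Barlow-style inverse-probability weights.

    `cohort` is a list of dicts with a boolean 'case' flag. The weight
    scheme used here (cases weight 1, subcohort non-cases weight 1/p)
    is one common convention, not the only one in the literature.
    """
    rng = random.Random(seed)
    sample = []
    for subject in cohort:
        in_subcohort = rng.random() < subcohort_fraction
        if in_subcohort or subject["case"]:
            weight = 1.0 if subject["case"] else 1.0 / subcohort_fraction
            sample.append({**subject, "weight": weight})
    return sample

# Toy cohort: 1000 subjects, 10% cases (hypothetical data).
cohort = [{"id": i, "case": i % 10 == 0} for i in range(1000)]
sample = draw_case_cohort(cohort, subcohort_fraction=0.15)

# Every case in the full cohort is retained by design; only non-cases
# are subsampled, which is where the efficiency gain comes from.
n_cases_kept = sum(s["case"] for s in sample)
```

The resulting weights could then be passed to any weighted Cox regression routine (e.g. a `weights` argument in survival software) to correct for the overrepresentation of cases.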

This study aims to bridge this gap by: (1) reviewing methodological advancements in case-cohort analysis, including variable selection, missing data imputation, sampling design, and model evaluation; (2) summarizing research that has developed risk prediction models with real-world case-cohort data; and (3) comparing available tools (e.g., R, Stata, and Python packages) for case-cohort analysis. Additionally, we plan an empirical comparison of these methods and software tools on case-cohort data from the China Kadoorie Biobank (CKB), allowing us to assess their practical strengths and limitations. Finally, we will discuss current limitations in the field and highlight areas for further development.



9:35am - 9:55am

Implication of the choice of time scales in survival analysis

Judith Vilsmeier1, Gisela Büchele2, Martin Rehm2, Dietrich Rothenbacher2, Jan Beyersmann1

1Institute of Statistics, Ulm University, Ulm; 2Institute of Epidemiology and Medical Biometry, Ulm University, Ulm

The choice of time scale is an important and often discussed topic in time-to-event analysis. While the rule of thumb is to choose the natural time scale for the underlying problem, it is often unclear what that natural time scale is. This can cause confusion, and if researchers pick the "wrong" time scale for the underlying problem, the results can be biased. Two frequently discussed time scales are time since study entry and time since birth, i.e. age, but other time scales are possible. The time scale we focus on in this talk is so-called calendar time. Its origin is an arbitrary day before the study entry of the first patients, and it is useful if calendar time carries important information beyond the time from study entry. However, we will illustrate situations where it can lead to an inflated estimate of the hazard ratio and overoptimistic p-values when using the Cox proportional hazards model. In this talk, a real data example from the EvaCoM project is used to demonstrate this hazard ratio inflation and the reasons why it occurs. In this project, the impact of quality audits of healthcare providers was of interest, with possible time scales being the calendar time in which the audit acts or a patient's time-to-event. It serves as an example to raise awareness of why researchers should be careful when deciding which time scale to use, by highlighting a situation in which the choice of time scale is not clear but is impactful. Additionally, possible solutions are proposed for scenarios where more than one time scale seems to be important.
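The contrast between time scales can be sketched concretely: in a Cox model, the choice of time scale determines who is "at risk" when each event occurs. The following minimal Python illustration (hypothetical data, not the EvaCoM analysis) shows how the risk set for one event differs between the time-since-entry scale and the calendar-time scale with left truncation at study entry.

```python
def risk_set_time_on_study(subjects, event_subject):
    """Risk set on the time-since-entry scale: everyone whose total
    follow-up is at least the event subject's time in study."""
    t = event_subject["event_cal"] - event_subject["entry"]
    return {s["id"] for s in subjects
            if s["event_cal"] - s["entry"] >= t}

def risk_set_calendar(subjects, event_subject):
    """Risk set on the calendar-time scale: everyone already enrolled
    and still under follow-up at the event's calendar date (late
    entries handled by left truncation at the entry date)."""
    c = event_subject["event_cal"]
    return {s["id"] for s in subjects
            if s["entry"] <= c <= s["event_cal"]}

# Hypothetical subjects: calendar entry date and calendar date of
# event/censoring, in arbitrary time units.
subjects = [
    {"id": "A", "entry": 0, "event_cal": 2},  # early entry, leaves early
    {"id": "B", "entry": 3, "event_cal": 4},  # late entry, event after 1 unit
    {"id": "C", "entry": 1, "event_cal": 9},
]
event = subjects[1]  # B's event: calendar time 4, time on study 1

on_study = risk_set_time_on_study(subjects, event)  # A is still at risk here
calendar = risk_set_calendar(subjects, event)       # A has already left
```

On the time-since-entry scale, A contributes to B's risk set (A was followed for longer than one time unit); on the calendar scale, A has already left follow-up by calendar time 4. Different risk sets mean different partial likelihoods, which is one route to the differing hazard ratio estimates discussed above.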



9:55am - 10:15am

Leveraging Cancer Incidence for Lead Time Estimation in Cancer Screening Programmes

Bor Vratanar, Maja Pohar Perme

Institute for Biostatistics and Medical Informatics, Slovenia

In cancer screening programmes, participants are regularly screened every few years using blood tests, urine tests, or medical imaging to detect cancer at an earlier time, when it is presumed to be more curable. Without screening, the cancer would likely progress undetected until symptoms appear. The interval between early detection and the eventual onset of symptoms, had screening not been conducted, is known as the lead time. Because of lead time, the survival time for screen-detected cancers is artificially extended compared with cancers detected on the basis of symptoms, since screen-detected cancers are followed from an earlier point in time, resulting in a biased comparison. Understanding and estimating lead time is thus crucial for mitigating lead time bias in cancer screening studies.

Estimating lead time is challenging because it is a hypothetical random variable that can only be inferred indirectly. In our study, we introduce a novel method for estimating lead time using a data source previously untapped for this purpose: cancer incidence. We hypothesize that earlier detection of cancer due to screening should result in an observable shift in cancer incidence rates, stratified by age and year of diagnosis. Our method leverages this information and estimates lead time with a maximum likelihood estimator. In principle, the user specifies the distribution of lead time, and the method finds the parameters that best fit the observed shift in cancer incidence.
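The fitting idea can be sketched in miniature: a toy Python illustration (not the authors' estimator) that discretises age, specifies a parametric lead-time distribution, and grid-searches for the mean lead time whose implied shift in incidence best fits the observed counts under a Poisson likelihood. The data, the geometric lead-time family, and all function names are invented for illustration.

```python
import math

def shifted_incidence(pre, probs):
    """Expected screen-era incidence by age: each case's diagnosis is
    advanced by a whole-year lead time L with probability probs[L]."""
    post = [0.0] * len(pre)
    for age, count in enumerate(pre):
        for L, p in enumerate(probs):
            if age - L >= 0:
                post[age - L] += p * count
    return post

def geometric_pmf(mean_lead, max_lead):
    """Discretised lead-time distribution (geometric on 0, 1, 2, ...),
    parametrised by its mean before truncation; any user-specified
    distribution could be plugged in instead."""
    q = mean_lead / (1.0 + mean_lead)
    pmf = [(1 - q) * q ** L for L in range(max_lead + 1)]
    total = sum(pmf)
    return [p / total for p in pmf]

def poisson_loglik(observed, expected):
    """Poisson log-likelihood of observed counts (constant terms dropped)."""
    return sum(o * math.log(e) - e for o, e in zip(observed, expected) if e > 0)

# Toy registry data: pre-screening incidence counts by age band, and a
# screen-era series generated with a true mean lead time of 2 years.
pre = [0, 0, 5, 20, 60, 100, 120, 110, 80, 40]
observed = shifted_incidence(pre, geometric_pmf(2.0, 6))

# Maximum-likelihood fit: grid search over candidate mean lead times.
candidates = [m / 4 for m in range(1, 17)]  # 0.25 .. 4.0 years
best = max(candidates,
           key=lambda m: poisson_loglik(
               observed, shifted_incidence(pre, geometric_pmf(m, 6))))
```

Because the toy "observed" series is generated from the model itself, the grid search recovers the true mean of 2 years; with real registry data the same machinery would compare pre- and post-screening incidence curves.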

Our approach is flexible, allowing for the inclusion of additional covariates and accounting for overdiagnosis. The data required for this method are routinely available from cancer registries and population tables, making it easy to implement. We validated the method through simulations and applied it to data from the Slovenian breast cancer screening programme, demonstrating its effectiveness and utility.