Assurance methods for designing a clinical trial with a delayed treatment effect
James Salsbury1, Jeremy Oakley1, Steven Julious2, Lisa Hampson3
1The School of Mathematics and Statistics, University of Sheffield, United Kingdom; 2The School of Health and Related Research, University of Sheffield, United Kingdom; 3Advanced Methodology and Data Science, Novartis Pharma AG, Switzerland
An assurance (probability of success) calculation is a Bayesian alternative to a power calculation. Such calculations are increasingly performed in industry, especially in the design of Phase III confirmatory trials. Immuno-oncology (IO) is a rapidly evolving area in the development of anticancer drugs. A common phenomenon arising in IO trials is a delayed treatment effect, that is, a delay in the separation of the Kaplan-Meier survival curves. To calculate assurance for a trial in which a delayed treatment effect is likely to be present, uncertainty about key parameters needs to be considered. If this uncertainty is ignored, the number of patients recruited may be too small to provide adequate statistical power to detect a clinically relevant treatment effect. We present an elicitation technique for the case where a delayed treatment effect is likely to be present and show how to compute assurance using the elicited prior distributions. We provide an example to illustrate how this could be used in practice.
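As a rough illustration of the calculation described above, the following R sketch approximates assurance by drawing the delay and post-delay hazard ratio from prior distributions, simulating a trial with a piecewise-constant hazard, and recording the proportion of simulated trials with a significant log-rank test. All priors, hazards, sample sizes and the follow-up time are hypothetical placeholders, not the elicited quantities from this work.

```r
## Minimal assurance sketch for a delayed treatment effect (illustrative values only)
library(survival)

set.seed(1)
n_sims   <- 2000          # number of prior draws / simulated trials
n_arm    <- 300           # patients per arm (assumed)
lambda_c <- log(2) / 12   # control hazard, i.e. a 12-month median (assumed)
followup <- 36            # administrative censoring time in months (assumed)

## illustrative priors: delay until separation (months) and post-delay hazard ratio
delay_draws <- rgamma(n_sims, shape = 9, rate = 3)
hr_draws    <- rlnorm(n_sims, meanlog = log(0.7), sdlog = 0.15)

## inverse-transform sampler for a hazard equal to lambda before the delay d
## and lambda * hr afterwards
r_delayed <- function(n, lambda, hr, d) {
  e <- rexp(n)
  ifelse(e < lambda * d, e / lambda, d + (e - lambda * d) / (lambda * hr))
}

reject <- logical(n_sims)
for (i in seq_len(n_sims)) {
  t_c <- rexp(n_arm, lambda_c)
  t_t <- r_delayed(n_arm, lambda_c, hr_draws[i], delay_draws[i])
  time   <- pmin(c(t_c, t_t), followup)
  status <- as.numeric(c(t_c, t_t) <= followup)
  arm    <- rep(0:1, each = n_arm)
  reject[i] <- survdiff(Surv(time, status) ~ arm)$chisq > qchisq(0.95, df = 1)
}
mean(reject)   # Monte Carlo estimate of the assurance
```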
Investigating the most suitable modelling framework to predict long-term restricted mean survival time and life expectancy.
Hannah Louise Cooper, Mark Rutherford, Sarah Booth
Biostatistics Research Group, Department of Population Health Sciences, University of Leicester
Background: Survival statistics provided at fixed points after diagnosis are often misinterpreted, with difficulties stemming from understanding the scale of the measures and from the fact that various estimands are frequently used interchangeably despite having different interpretations. Life expectancy estimates have been shown to be more easily understood; however, they are not commonly used in practice as they often require extrapolation. We sought to explore the best modelling framework for providing this extrapolation, how much follow-up is required, and under which circumstances valid estimates of Life Expectancy/40-year Restricted Mean Survival Time (LE/40-year RMST) can be obtained.
Methods: Data from the Surveillance, Epidemiology and End Results (SEER) program, covering nine cancer registries across the USA, were used to develop two flexible parametric models of varying complexity (Model 1: main effects only; Model 2: main effects, interactions and time-dependent effects) for patients diagnosed with colon cancer. A complete case analysis was conducted, leaving 35,903 colon cancer patients under investigation. The two models were compared within all-cause, cause-specific and relative survival frameworks, and the amount of follow-up time used to develop the models was varied (2, 3, 5, 10 and 20 years). Forty years of follow-up was also used to assess model fit with all available data. To assess the accuracy of the extrapolations, marginal predictions were made from each statistical model and for each follow-up scenario across 30 covariate groupings based on age, sex and tumour stage. LE/40-year RMST and 40-year survival probability were calculated, and estimates were then compared with observed values based on Kaplan-Meier estimates.
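As a minimal sketch of the extrapolation step only, the R code below fits an all-cause flexible parametric (Royston-Parmar) model to simulated data standing in for SEER and integrates the extrapolated survival curve to 40 years; the relative survival framework and covariate groupings of the study are not reproduced here.

```r
library(survival)
library(flexsurv)

set.seed(1)
## toy data in years, standing in for the registry data used in the study
ev <- rweibull(2000, shape = 1.2, scale = 15)
cs <- runif(2000, 0, 10)                      # roughly 10 years of available follow-up
d  <- data.frame(time = pmin(ev, cs), status = as.numeric(ev <= cs))

## flexible parametric model on the log cumulative hazard scale, 2 internal knots
fit <- flexsurvspline(Surv(time, status) ~ 1, data = d, k = 2, scale = "hazard")

## 40-year RMST by trapezoidal integration of the extrapolated survival curve
tt <- seq(0, 40, by = 0.1)
S  <- summary(fit, t = tt, type = "survival", ci = FALSE)[[1]]$est
rmst_40 <- sum(diff(tt) * (head(S, -1) + tail(S, -1)) / 2)
rmst_40
```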
Results: Reasonable accuracy of LE/40-year RMST for all covariate groups was generally achieved using 10 or more years of follow-up. The relative survival framework was generally the most suitable for predicting LE/40-year RMST. Increasing model complexity improved predictions for younger patients; however, little difference was seen for higher-risk patients (older and/or with distant cancer).
Conclusion: This study demonstrated that increasing model complexity is not always necessary to estimate LE/40-year RMST, with the recommended model complexity depending on the patient's risk. The results show that LE/40-year RMST estimates do not require long follow-up; often 10 years is sufficient. These findings make it possible to estimate LE reasonably well for colon cancer patients of varying risk. Further work is needed to understand the generalisability of this study to other cancers. Likewise, internal and external validation should be conducted to support the findings.
Methods for Analyzing Multiple Time-to-Event Endpoints in Randomized Clinical Trials: A Comprehensive Overview
Duoerkongjiang Alidan, Ann-Kathrin Ozga
University Medical Center Hamburg-Eppendorf, Germany
In clinical trials, time-to-event analysis is crucial for assessing treatment efficacy and patient outcomes, e.g. time to myocardial infarction, time to hospitalization, or time to death in cardiovascular trials. Often, only the time to the first occurring event is considered and time-to-first-event methods, such as the Cox proportional hazards model and the Kaplan-Meier estimate, are applied. However, these methods inherently overlook subsequent or competing events and fail to fully capture the complex nature of disease progression, potentially leading to incomplete or biased insights into treatment efficacy.
This research highlights a critical gap in conventional methods: their inability to differentiate between the clinical relevance of non-fatal and fatal events, and their neglect of the correlation and competition between multiple event types. For instance, myocardial infarction may heighten the risk of future events, including subsequent infarctions or death, creating a complex interrelationship that traditional models fail to address. This not only limits the insights drawn from such trials but also undervalues the comprehensive patient experience.
To overcome these limitations, we explore advanced methods for recurrent and competing events, including Cox-based models for recurrent event analysis, multistate models, and flexible parametric models. Through a thorough comparison of semi-parametric, non-parametric, and parametric methods, we aim to provide applied researchers with a clear understanding of the advantages, assumptions, and limitations of each approach. Importantly, we examine which methods best align with the complex nature of real-world clinical trial data, particularly in randomized controlled trials comparing two treatment groups.
The novelty of our work lies not only in providing a systematic overview but also in applying these methods to a real-world cardiovascular dataset, showcasing how multiple event types (e.g., hospitalization, stroke, and death) can be more accurately analyzed.
By addressing these challenges, this study paves the way for more sophisticated analyses that can better inform treatment decisions and improve patient care. We aim to demonstrate how embracing these advanced methods can unlock deeper, clinically relevant insights that were previously inaccessible using traditional time-to-first-event techniques.
Reconstructing Survival Curves: Using Imputation Strategies to Construct Kaplan-Meier Estimates with No or Limited Data on Survivors
Luzia Berchtold1, Thierry Gorlia2, Marjolein Geurts3, Michael Weller4, Matthias Preusser1, Franz König1
1Medical University of Vienna, Austria; 2EORTC Headquarter, Brussels, Belgium; 3Erasmus MC Cancer Institute, Rotterdam, the Netherlands; 4University Hospital and University of Zurich, Zurich, Switzerland
Secondary use of clinical trial data has become increasingly valuable in medical research, providing new insights without the expense of conducting additional trials. However, legal, ethical, and proprietary restrictions often limit access to datasets, e.g., by providing access only to the data of deceased patients. Analysing only such restricted datasets carries a risk of bias, since the results depend on the proportion of subjects missing. This creates a major challenge when interpreting such survival analyses. Conventional methods like multiple imputation cannot be applied as they would require information on the missing patients, making them unsuitable in cases where no data from surviving patients are available.
Motivated by a real case study, we conducted a simulation study to explore the strength of the bias in different settings. The main methods of analysis include Kaplan-Meier survival curves, log-rank tests, and Cox regression models to estimate and compare survival probabilities. We explored different imputation strategies for censored patients when using other publicly available information, e.g., minimum and maximum censoring times. In the context of a randomized clinical trial we will discuss which additional information and assumptions will be needed to perform different imputation strategies. The impact of various strategies on operating characteristics such as type 1 error rate, power, and bias were evaluated. All analyses were conducted in R using the "survival" package.
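A minimal sketch of one such strategy, uniform imputation of censoring times for the withheld survivors over a publicly known censoring window, is shown below; all inputs are assumed for illustration and do not come from the case study.

```r
library(survival)

set.seed(1)
## restricted data set: only deceased patients are available
deceased <- data.frame(time   = rexp(200, 0.1),
                       status = 1,
                       arm    = rep(0:1, each = 100))

n_alive <- 100            # number of withheld survivors, assumed known from the protocol
cmin <- 12; cmax <- 36    # publicly known administrative censoring window

## impute censoring times uniformly over the known window and append to the data
imputed <- data.frame(time   = runif(n_alive, cmin, cmax),
                      status = 0,
                      arm    = rep(0:1, each = n_alive / 2))  # assumed 1:1 allocation
full <- rbind(deceased, imputed)

survfit(Surv(time, status) ~ arm, data = full)    # Kaplan-Meier including imputed survivors
survdiff(Surv(time, status) ~ arm, data = full)   # log-rank test on the imputed data
```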
We show the severe limitations of performing an analysis on restricted datasets only. Kaplan-Meier estimates for the groups move closer together, resulting in a loss of power. For instance, even in a scenario with only 10% administrative censoring, a true underlying hazard ratio of 0.7 and a sample size of 300, the power decreases from 84% when using the complete dataset to 39% in the restricted setting. As censoring increases, differences in the Kaplan-Meier plot become harder to detect, and both the survival function and estimates such as median overall survival (OS) are underestimated.
In conclusion, restricting survival analysis to deceased patients introduces a significant bias, including underestimated survival functions and reduced statistical power, especially with higher proportions of censoring. When some information about the missing patients and the censoring process is available, even a uniform imputation strategy greatly improves estimation accuracy. However, this information is often not available when analyzing non-randomized variables, and the imputation strategy is then not applicable. In these cases a worst-case analysis should be carried out.
A Pareto-Driven Ensemble Feature Selection Approach Optimizes Biomarker Discovery in Multi-omics Pancreatic Cancer Studies
John Zobolas1, Anne-Marie George2, Alberto López1, Sebastian Fischer3, Marc Becker3, Tero Aittokallio1
1Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; 2Department of Informatics, University of Oslo, Oslo, Norway; 3Department of Statistics, LMU Munich
To address the pressing clinical demands of today, it is crucial to implement models that select minimal, cost-effective features. Feature selection in machine learning aims to fulfill this need by identifying the most predictive biomarkers with minimal redundancy. We have developed a multi-omics ensemble feature selection (EFS) approach that identifies the most significant biomarkers for a given cohort of patients. Our approach leverages multiple machine learning algorithms to discover optimal features for classification, regression, and survival analysis tasks.
The EFS method ranks features using voting theory, ensuring that all ensemble model perspectives are considered. The optimal number of selected features is determined through a Pareto-based knee-point identification method, providing a trade-off between sparsity and performance. When applied to multi-omics datasets from pancreatic cancer studies, our approach successfully identifies minimal biomarkers relevant to both the clinical outcome and the underlying biology of the disease. Overall, EFS offers a reliable and clinically valuable tool for biomarker discovery in cancer research.
Increasing flexibility for the meta-analysis of full ROC curves – a copula approach
Ferdinand Valentin Stoye1, Oliver Kuss2, Annika Hoyer1
1Biostatistics and Medical Biometry, Medical School OWL, Bielefeld University, Germany; 2Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich Heine University Düsseldorf, Germany
The development of new statistical methods for the meta-analysis of diagnostic test accuracy (DTA) studies is an active field of research, especially with respect to summarizing full receiver operating characteristic (ROC) curves. Most current approaches to this task utilize Gaussian random effects to account for between-study heterogeneity. While Gaussian random effects are generally tractable using established statistical modeling frameworks and therefore a convenient choice, there is no conceptual reason to restrict the dependence structure to be Gaussian. To increase flexibility in the meta-analysis of ROC curves, we substitute the Gaussian random effects with copulas, leading to the ability to directly model the dependence between sensitivity and specificity and to increased control over the estimation procedure. While the resulting models are numerically challenging, they lead to much more flexible and modular model structures compared to Gaussian random effects. Combined with re-arranging the results reported by DTA studies as bivariate interval-censored time-to-event data and clinically plausible parametric assumptions for the resulting mixtures in the marginal distributions, this leads to a powerful model for estimating summary ROC curves. An additional advantage of using copulas is the availability of a closed-form likelihood, making it possible to use general-purpose likelihood optimization strategies. In a simulation study, we use Clayton, Galambos, and Joe copulas with Weibull-binomial and Weibull-normal marginal distributions to compare the resulting models to three alternative approaches from the literature. Our copula models are able to create very flexible model fits with high convergence probabilities and perform similarly to competing models. However, they are also numerically unstable, leading to larger variations in bias as well as lower empirical coverage in the simulation compared to alternative models. This behavior gives rise to the need for a more robust estimation procedure for the copulas. We also show the practical applicability of our copula models to data from a meta-analysis on screening for type 2 diabetes, leading to plausible estimates for summary ROC curves and the areas under the curves.
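To make the copula construction concrete, the sketch below shows the Clayton copula and the rectangle probability that would form a single bivariate interval-censored likelihood contribution; the marginal CDF values u and v would come from the Weibull-type marginal models, and the code is a generic illustration rather than the authors' implementation.

```r
## Clayton copula C(u, v; theta) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0
clayton <- function(u, v, theta) (u^(-theta) + v^(-theta) - 1)^(-1 / theta)

## P(u1 < U <= u2, v1 < V <= v2): the likelihood contribution of one
## bivariate interval-censored observation, given marginal CDF values
rect_prob <- function(u1, u2, v1, v2, theta) {
  clayton(u2, v2, theta) - clayton(u1, v2, theta) -
    clayton(u2, v1, theta) + clayton(u1, v1, theta)
}

rect_prob(u1 = 0.2, u2 = 0.5, v1 = 0.3, v2 = 0.7, theta = 2)
```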
Effective sample size for Cox models: A measure of individual uncertainty in survival predictions
Toby Hackmann1, Doranne Thomassen1, Saskia le Cessie1,2, Hein Putter1, Ewout W Steyerberg1,3, Liesbeth C de Wreede1
1Department of Biomedical Data Sciences, Leiden University Medical Center, the Netherlands; 2Department of Clinical Epidemiology, Leiden University Medical Center, the Netherlands; 3Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, the Netherlands
Background/Introduction: Clinical prediction models are becoming increasingly popular to support shared decision-making. Most prediction models prioritize the accurate estimation and clear communication of point predictions. Uncertainty around the point prediction may be expressed by confidence intervals or left out altogether. To present prediction uncertainty in an intuitive way, the concept of effective sample size has recently been introduced for linear and generalized linear models, but not yet for the Cox model.[1] Our goal is to provide estimates of the effective sample size for individual predictions based on a Cox model.
Methods: Effective sample size is defined as the hypothetical sample size of patients with the same characteristics (with respect to the model) as a new patient whom the prediction is for, such that the variance of the outcome in that sample would be the same as the prediction variance. It can be calculated as a ratio of the outcome variance conditional on the predictor values to the prediction variance. We estimate the effective sample size for Cox model predictions and investigate the behaviour of this estimate in an illustrative clinical data set of colon cancer patients and through simulations.
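A minimal sketch of how such a calculation could look for one new patient, using the colon cancer data shipped with the R "survival" package; treating the outcome at a fixed horizon as a survival indicator with variance S(1 - S) is an assumption made here for illustration, and the cited paper [1] gives the formal definition.

```r
library(survival)

d   <- subset(colon, etype == 2)                         # death outcome
fit <- coxph(Surv(time, status) ~ age + sex + nodes, data = d)

new_patient <- data.frame(age = 60, sex = 1, nodes = 2)  # hypothetical patient
sf  <- survfit(fit, newdata = new_patient, se.fit = TRUE)

t0  <- 3 * 365                                           # 3-year horizon (days)
s   <- summary(sf, times = t0)
S   <- s$surv                                            # predicted survival probability
seS <- s$std.err                                         # its standard error

## effective sample size = conditional outcome variance / prediction variance
ess <- S * (1 - S) / seS^2
ess
```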
Results: The variance of a prediction based on a Cox model depends on the variance of the estimated coefficients and the variance of the baseline hazard, which is impacted by censoring. This variance can readily be estimated asymptotically based on the delta method, or using resampling-based methods such as the bootstrap. We will show the behaviour of the effective sample size for patients as a function of follow-up time. Patients with covariate values close to the population mean have a higher effective sample size, while patients with rare covariate combinations may have a very small effective sample size, and thus high uncertainty.
Conclusions: Effective sample size can express the uncertainty of predictions for individual patients from a Cox model. Future studies should clarify its role to communicate uncertainty around the point prediction of survival probabilities and a possible role in model building.
References:
[1] Thomassen, D., le Cessie, S., van Houwelingen, H. C., & Steyerberg, E. W. (2024). Effective sample size: A measure of individual uncertainty in predictions. Statistics in Medicine. https://doi.org/10.1002/sim.10018
Funded by the European Union under Horizon Europe Work Programme 101057332.
Imputation Free Deep Survival Prediction using Conditional Variational Autoencoders
Natalia Hong1,2,3, Christopher Yau1,2
1University of Oxford; 2Health Data Research UK; 3The Alan Turing Institute
The availability of Electronic Health Records (EHR) offers an opportunity to develop risk prediction tools to support clinical decision making. Yet, EHRs are generated from active clinical processes, and unlike data from controlled studies, lack complete records as only information deemed necessary for patient management is captured. This selective capture implies that apparent missingness can itself be informative for risk prediction. Crucially, missing data occurs both during model training and deployment, which is challenging for algorithms that rely on fixed-size inputs. Effective predictive tools must accommodate this variability in inputs to ensure the optimal use of data available to clinicians at the point of care.
Missing data is commonly addressed through imputation, where missing entries are "filled in" before being passed to the model. This approach requires extra validation of the imputation model and relies on untestable assumptions about the missingness mechanism, potentially leading to unreliable predictions. Imputation can also overlook the informative nature of missingness, as models cannot distinguish between imputed and observed entries. Alternatively, the Missing Indicator Method introduces missingness indicators into the model, but risks overfitting to specific patterns of missingness. The sharing pattern submodel (SPS) approach builds a separate model for each missing pattern while encouraging information sharing across submodels, without making missingness mechanism assumptions. However, it suffers from combinatorial inefficiencies due to the exponential growth in the number of patterns (2^p) and is limited to linear models.
We introduce an imputation-free framework that employs Conditional Variational Autoencoders (VAEs) jointly trained with any deep survival model to predict risk using incomplete EHRs. Our approach leverages VAE’s ability to learn the distribution of missing patterns within a latent space to capture similarities across patterns. The learned latent embedding is fed directly into the deep survival model, enabling non-linear modelling while avoiding the combinatorial inefficiencies of SPS. We demonstrate our proposed framework with the deep survival model DeSurv, through simulation studies and two retrospective cohorts from the Clinical Practice Research Datalink primary care database. Our results show that the proposed framework is more robust in generalising to unseen or rare missingness patterns, with improved performance according to calibration-based survival metrics. The incorporation of a variational structure allows the model to decouple the learning of data and missingness, offering a more nuanced understanding of how missingness influences predictions. This framework provides a practical, consistent approach to handling missing data across development, validation, and deployment stages, all while maintaining strong performance.
Building risk prediction models by synthesizing national registry and prevention trial data
Oksana Chernova, Donna Ankerst
Technical University of Munich, Germany
Models that predict cancer incidence aid the development of personalized screening regimens, and models that predict outcomes after a cancer diagnosis are used to individually tailor treatments. Developing risk prediction models requires specifying a set of predictors at the baseline of the prediction and a projection period of interest, commonly five years, thus requiring cohorts with adequate follow-up. In recent decades, there has been a notable increase in the accessibility of extremely large datasets for research purposes. These expansive repositories encompass diverse data sources such as population-based census records, disease registries, healthcare databases, and collaborative initiatives pooling numerous individual studies. The notable advantage of these extensive datasets is their impressive sample size, which enhances their utility for research. The first breast cancer incidence risk prediction model, the Breast Cancer Risk Assessment Tool (BCRAT, also known as the Gail model), was built by combining odds ratio estimates for risk factors, approximating relative risks from a case-control study, with age-specific breast cancer incidence and competing mortality data on US women from the Surveillance, Epidemiology and End Results (SEER) registries. This statistical method of combining has since been extended to develop similar clinical risk models for projecting cancer risk in special populations as well as for different cancers; see, for example, Gail et al., 1989, 2007, Chen et al., 2006, Freedman et al., 2009, and Pfeiffer et al., 2013. To the knowledge of the authors of this study, the method has not been applied to build risk models for the projection of five-year prostate cancer risk. Gelfond et al., 2022, developed a five-year prostate cancer risk model using two widely known prostate cancer prospective screening studies completed in the early 2000s, which are used in this article; their model did not incorporate contemporary incidence rates from SEER. The purpose of this study was to provide the statistical methodology and a corresponding R implementation for building pure risk prediction models that merge cohort and SEER incidence data, exemplified through the development of a five-year prostate cancer risk prediction tool.
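A toy version of the combining step, with constant hazards over the projection window, is sketched below; the hazard and relative risk values are placeholders, and the actual methodology additionally handles age-specific SEER rates, attributable-risk adjustment and piecewise-varying hazards.

```r
## Pure-risk projection from a relative risk plus registry incidence and
## competing mortality, assuming constant hazards over the projection window
five_year_risk <- function(h_inc, rr, h_mort, t = 5) {
  ## h_inc:  baseline incidence hazard (e.g. from SEER), per year
  ## rr:     individual relative risk from the risk-factor model
  ## h_mort: competing mortality hazard, per year
  lam <- h_inc * rr
  (lam / (lam + h_mort)) * (1 - exp(-(lam + h_mort) * t))
}

five_year_risk(h_inc = 0.003, rr = 1.8, h_mort = 0.02)   # placeholder inputs
```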
An R function for data preparation for an acyclic multistate model with non-ordered intermediate states.
Lorenzo Del Castello1, Davide Bernasconi1,2, Laura Antolini1, Maria Grazia Valsecchi1,3
1Bicocca Bioinformatics Biostatistics and Bioimaging Centre - B4, School of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy; 2Functional Department for Higher Education, Research, and Development, ASST Grande Ospedale Metropolitano Niguarda, Milan, Italy; 3Biostatistics and Clinical Epidemiology, Fondazione IRCCS San Gerardo dei Tintori, Monza, Italy
Background: The package "mstate" is a popular R library for performing multistate analyses [1]. Its function "msprep" provides a simple tool for data preparation, where a wide-format dataset (one row per patient) must be transformed into long format (one row for each possible transition). However, "msprep" allows only for triangular transition matrices corresponding to irreversible acyclic Markov processes. Sometimes this is too restrictive. Hazard et al. [2] used multistate models to analyze care pathways of patients admitted to the ICU for severe COVID-19, where patients were allowed to move back and forth between non-absorbing states. The authors therefore developed a dedicated R function for data preparation which overcomes the limits of "msprep".
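For reference, a minimal example of the standard "msprep" workflow that the proposed function extends, using the illness-death example data "ebmt3" distributed with "mstate"; it is shown only to illustrate the triangular transition matrix restriction discussed above.

```r
library(mstate)

## triangular (irreversible, acyclic) transition matrix: 1 -> 2, 1 -> 3, 2 -> 3
tmat <- transMat(x = list(c(2, 3), c(3), c()),
                 names = c("Tx", "PR", "RelDeath"))

data(ebmt3)
long <- msprep(time   = c(NA, "prtime", "rfstime"),
               status = c(NA, "prstat", "rfsstat"),
               data   = ebmt3, trans = tmat,
               keep   = c("age", "dissub"))
head(long)   # long format: one row per subject and possible transition
```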
Methods: A further complexity, not present in the application by Hazard, is the case of an acyclic model with no defined order between the intermediate states. In our application, patients affected by cardiogenic shock are admitted to the ICU and then may be treated at different times with none, one, two or three mechanical circulatory supports, namely IABP, Impella or ECMO. A patient may be treated with any of these devices and there is no specific order between treatments. However, no patient can be treated twice with the same device (e.g., after IABP one could receive another device but cannot go back to IABP). The final absorbing state is a composite endpoint (first event among LVAD implantation, heart transplant or death). In this context, transition to a certain intermediate state blocks the possibility of future back-transitions to that state.
Results: We developed an R function allowing for data preparation in this situation. The proposed function transforms a dataset from wide to long format so that it can be passed to the function "msfit". The new dataset is prepared so that the contribution to the risk sets over time is consistent with the dynamic update of possible transitions. This is achieved through the definition of a new matrix, called "blockmat", in which the paths that cannot be observed are declared blocked. Without the specification of the blockmat, some subjects would be considered at risk of transitions which they can never make. In practice, this would result in additional, unnecessary rows in the long dataset, causing an underestimation of the transition hazards.
Conclusion: We created an R function allowing for data preparation in case of an acyclic multistate model with non-ordered intermediate states.
A Pragmatic Approach to the Estimation of the Interventional Absolute Risk in Continuous Time
Johan Sebastian Ohlendorff, Thomas Alexander Gerds, Anders Munch
University of Copenhagen, Denmark
In medical research, causal effects of treatments that may change over time on an outcome can be defined in the context of an emulated target trial. We are concerned with estimands that are defined as contrasts of the absolute risk that an outcome event occurs before a given time horizon (tau) under prespecified treatment regimens. Most of the existing estimators based on observational data require a projection onto a discretized time scale (van der Laan et al., 2011). We consider a recently developed continuous-time approach to causal inference in this setting (Rytgaard et al., 2022), which theoretically allows preservation of the precise event timing at the subject level. Working on a continuous time scale may improve predictive accuracy and reduce the loss of information. However, continuous-time extensions of the standard estimators come at the cost of an increased computational burden. In this talk, I will discuss a new sequential regression type estimator for the continuous-time framework which estimates the nuisance parameter models by backtracking through the number of events. This estimator significantly reduces the computational complexity and allows for efficient, single-step targeting using machine learning methods from survival analysis and point processes, enabling robust continuous-time causal effect estimation.
References:
Rose, S., & van der Laan, M. J. (2011). Introduction to TMLE. In: Targeted Learning. Springer Series in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9782-1_4
Rytgaard, H. C., Gerds, T. A., & van der Laan, M. J. (2022). Continuous-Time Targeted Minimum Loss-Based Estimation of Intervention-Specific Mean Outcomes. The Annals of Statistics, 50, 2469–2491. https://doi.org/10.1214/21-AOS2114
Testing the Similarity of Healthcare Pathways based on Transition Probabilities - A New Bootstrap Procedure
Zoe Kristin Lange1, Holger Dette1, Maryam Farhadizadeh2, Nadine Binder2
1Ruhr-Universität Bochum, Germany; 2Universität Freiburg, Germany
Background
Establishing a common standard of care across clinics and finding the best treatment strategies for diseases are important goals in the healthcare system. To achieve these goals we work with healthcare pathways of patients, consisting of sequences of treatments and other events, like hospital readmissions or diagnostic procedures, over time. Working with healthcare pathway data is attractive since these data are routinely collected by clinics and are therefore highly available. However, the healthcare pathways of different patients tend to be highly heterogeneous, especially for rare diseases lacking a standardized treatment strategy. With the similarity testing approach presented here, we can find patterns, namely typical pathways, in these heterogeneous data.
Methods
We model the treatment strategies for two different groups of patients by multistate models X1 and X2 and analyze their similarity based on their transition probabilities. That is, we test the hypotheses
H0: d(P1,P2) ≥ ε versus H1: d(P1,P2) < ε.
Here, P1 and P2 are the transition probability matrices of X1 and X2, respectively, while ε > 0 is a threshold and d a distance on the space of matrices. Groups with similar healthcare pathways, according to the test, are pooled into one group, representing a typical pathway. Based on these pooled data sets, one can perform further estimation tasks, such as estimating the chances of recovery for a typical treatment. The increased sample size that results from pooling similar pathways yields more accurate statistical inference, especially in small sample settings. This is crucial to reliably identify the best treatment strategy.
Results
We introduce a parametric bootstrap test that is tailored to our similarity hypotheses above. The special attribute of this test is that the estimators used for resampling are calculated subject to a constraint. We prove the validity of this constrained parametric bootstrap test for different measures of similarity d. In the next three months, we will run simulations for the bootstrap test for different sample sizes and analyze the performance of the test on real prostate cancer data from the "Universitätsklinik Freiburg".
Conclusion
Testing the similarity of healthcare pathways to identify the best treatment strategies is a new and promising approach that accounts for small sample sizes and draws on easily available data. Additionally, the constrained parametric bootstrap test can easily be adapted to other settings beyond healthcare pathways and can therefore be of interest in general similarity testing problems.
Introducing a flexible model for regression models with a left-censored response and covariate.
Inez De Batselier, Roel Braekers
Hasselt University, Belgium
In most studies with left-censored data, the focus lies on regression models where only the response is left-censored. In this work we also add a left-censored covariate, which complicates the situation. People often resort to naive methods in this setting, such as a complete case analysis or substitution of the limit of detection. To move away from these methods, Tran et al. (2021) proposed a method in which a parametric assumption is made about the distribution of the covariate. With our new method we want to avoid this parametric assumption. Instead of imposing such an assumption on the covariate, we estimate its distribution via a flexible model, namely a piecewise exponential distribution with fixed cut-off points. This piecewise exponential distribution offers robustness against misspecification of the covariate's distribution. In our methodology we propose a one-stage and a two-stage estimation. In the one-stage estimation all parameters are estimated simultaneously, while in the two-stage estimation the parameters of the piecewise exponential distribution are estimated first and then plugged into the likelihood to estimate the remaining regression parameters. The one-stage estimation is theoretically preferable, but may introduce dimensionality problems due to the many parameters to be estimated; this problem can be reduced by the two-stage estimation. Via simulations we show that our new methodology produces unbiased estimates for medium-sized datasets and fairly large amounts of censoring in the covariate and the response. A comparison of our method with other methods, via simulations, shows that it outperforms them.
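For concreteness, a small sketch of the piecewise exponential building block with fixed cut-off points is given below; this is generic code, not the authors' estimation routine.

```r
## CDF of a piecewise exponential distribution
## cuts:  interval start points, with cuts[1] == 0
## rates: constant hazard within each interval
ppexp <- function(x, rates, cuts) {
  H <- function(t) sum(rates * pmin(pmax(t - cuts, 0), c(diff(cuts), Inf)))
  1 - exp(-vapply(x, H, numeric(1)))
}

## example: hazard 0.5 on [0, 1), 1.0 on [1, 2) and 0.2 on [2, Inf)
ppexp(c(0.5, 1.5, 3), rates = c(0.5, 1.0, 0.2), cuts = c(0, 1, 2))
```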
References:
Tran, T., Abrams, S., Aerts, M., Maertens, K., & Hens, N. (2021). Measuring association among censored antibody titer data. Statistics in Medicine, 1-22.
Extending landmarking to mixture cure models with time-varying covariates
Marta Cipriani1,2
1Sapienza University of Rome, Italy; 2Leiden University, the Netherlands
Mixture cure models are commonly used in survival analysis when a subset of individuals is considered cured and no longer at risk for the event of interest. A key challenge in such models arises when covariates are time-varying, requiring dynamic updates in both the cure probability and the survival prediction for the uncured population.
Landmarking, a dynamic approach that updates survival predictions at specific time points, has been proposed for mixture cure models (Shi and Yin, 2017). However, existing landmarking methods, such as the last observation carried forward (LOCF) approach, often fail to fully capture the complexity of longitudinal data.
To address this limitation, we propose a novel framework that extends hierarchical (mixed-effects) models to incorporate time-varying covariates in both the incidence (cure probability) and the latency (survival for the uncured) components of a mixture cure model. Hierarchical models, also known as multilevel or mixed-effects models, allow for random effects accounting for the individual-specific unobserved heterogeneity over time.
In the proposed framework, we model longitudinal covariate trajectories using a mixed-effects model to estimate individual-level random effects. These random effects are then integrated into the mixture cure model, providing a robust way to predict survival outcomes based on dynamically updated covariate information.
Our approach integrates the most recent covariate information into both the incidence (cured) and the latency (uncured) components of the mixture cure model, enabling more accurate, dynamic predictions for individual patients. This approach addresses the uncertainty in covariate estimation and adjusts survival predictions in real time as new information becomes available.
We show the features of this approach through a simulation study to compare our extension to current landmarking methods. The simulations help to illustrate the advantages of using individual-specific random effects in improving prediction accuracy, reducing bias, and providing more flexible modelling of time-varying covariates. Additionally, we apply our method to a real-world dataset including liver-transplanted patients, highlighting the improved performance of dynamic survival predictions in complex longitudinal settings.
Multi-state models for individualized treatment response prediction and risk assessment in Multiple Myeloma
Sebastian Schwick1, Shammi More1, Holger Fröhlich1,2
1Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany; 2Bonn-Aachen International Center for Information Technology (b-it), University of Bonn, Bonn, Germany
Multiple Myeloma (MM) is a complex and highly heterogeneous hematologic malignancy. Despite frequent advancements in treatments, MM remains incurable, with patients experiencing multiple relapses at various disease stages and a wide range of overall survival. This heterogeneity in disease progression underscores the need for a personalized treatment approach. Novel treatments for relapsed/refractory MM (RRMM) such as chimeric antigen receptor (CAR) T-cell immunotherapy have shown promising results but may come with significant risks for adverse events. Therefore, thorough research on eligibility criteria for CAR T-cell therapy and risk assessment is critical. Recent studies have shown the advantage of a multi-state framework for individualized risk prediction in newly diagnosed MM (NDMM) by integrating clinical, genomic and treatment variables. We compare and evaluate different multi-state methods and structures, including treatment steps or level of response as possible states, to predict and simulate patient trajectories across multiple lines of therapy. We incorporate longitudinal measurements of MM specific biomarkers (e.g. M protein), treatment regimen and procedures (e.g. stem cell transplantation) as transition-specific covariates to assess their impact on the course of the disease. We explore the advantages of utilizing Deep Learning approaches, like neural ordinary differential equations for multi-state models, to reduce model assumptions and increase flexibility. Furthermore, we aim to extend these models to RRMM by incorporating the patient’s extensive treatment history and its influence on future treatment outcomes, including the response to CAR T-cell therapy and the risk of adverse events. The developed models will be compared to other machine learning models such as random survival forests and XGBoost. The models will be designed as part of a virtual twin (VT) for MM patients developed by the CERTAINTY (A CEllulaR ImmunoTherapy VirtuAl Twin for PersonalIzed Cancer TreatmeNT) project. Clinical study data (Multiple Myeloma Research Foundation CoMMpass trial: NCT01454297) as well as real world data provided by members of the CERTAINTY consortium will be used for model development. For this purpose, we also consider federated learning approaches, to allow for continuous model improvement as new data becomes available across multiple healthcare centers, while ensuring patient data privacy.
Bayesian Joint Modeling of Bivariate Longitudinal and Time-to-Event Data: With Application to Micro- and Macrovascular Complications in People with Type 2 Diabetes and Hypertension.
Mequanent Mekonen, Edoardo Otranto, Angela Alibrandi
University of Messina, Italy
Researchers often collect data on chronic disease, including multiple longitudinal measures and time-to-event outcomes. When the time-to-event and the longitudinal outcomes are associated, modeling them separately may give biased estimates. Univariate joint modeling of longitudinal and time-to-event outcomes is an effective method for evaluating their association. However, this model-based analysis can lead to biased estimates when multiple longitudinal outcomes are significantly correlated and deviate from a multivariate normal distribution. We used a bivariate joint model with a skewed multivariate normal distribution to provide a flexible approach for non-symmetrical and correlated longitudinal outcomes. This involved a bivariate linear mixed effects model for the longitudinal outcomes and a Cox proportional hazards model for the time-to-event outcome, linked through shared random effects. We estimate the parameters in a Bayesian framework and implement Markov chain Monte Carlo methods via R with the JAGS software. The methods are demonstrated using retrospective cohort data from individuals with type 2 diabetes and hypertension at Felege Hiwot Referral Hospital in northern Ethiopia. We conduct simulation studies to evaluate the performance of the proposed method. The bivariate linear mixed effects model shows a significant positive relationship between the trajectories of systolic blood pressure and fasting blood glucose, which increase significantly over time. The risk of experiencing microvascular complications increased as the subject-specific rates of change in fasting blood sugar and systolic blood pressure increased (hazard ratio = 1.55, 95% confidence interval: 1.067 to 2.556, and hazard ratio = 3.18, 95% confidence interval: 0.62 to 13.243, respectively). Baseline body mass index (hazard ratio = 1.768, 95% confidence interval: 1.232 to 2.55) and triglycerides (hazard ratio = 1.685, 95% confidence interval: 1.185 to 2.418) were positively associated with the risk of microvascular complications. Our studies suggest a strong and significant positive relationship between the patterns of blood glucose levels and systolic blood pressure. Over time, increases in both blood glucose levels and systolic blood pressure raised the risk of microvascular and macrovascular complications. In bivariate joint modeling, using a skewed multivariate normal distribution instead of a normal distribution improves model fit and gives more accurate parameter estimates for the longitudinal biomarkers.
Does a SARS-CoV-2 infection increase the risk of dementia? An application of causal time-to-event analysis on real-world patient data
Jannis Guski1, Holger Fröhlich1,2
1Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI, Germany; 2Universität Bonn, Germany
Despite the relative recency of the global pandemic, there is evidence that a SARS-CoV-2 infection may act as a catalyst in the gradual manifestation of neurodegenerative diseases. For example, comparisons of biomarkers before and after an infection revealed signs of changes in brain structure, neuroinflammation and disruptions of the blood-brain barrier.
As part of the EU-funded COMMUTE project, we deploy and adapt time-to-event models on real-world data to investigate whether a SARS-CoV-2 infection increases the risk or accelerates the process of developing Alzheimer’s or Parkinson’s disease, and which factors from a patient’s history may play a role in this assumed co-pathology.
Any findings are potentially affected by baseline confounding, dependent censoring and death from any cause acting as a competing risk. These problems may be amplified by the fact that control patients have to be sampled from before the pandemic because, presumably, almost everyone has had a documented or undocumented SARS-CoV-2 infection at some point after 2020.
In a first step, we estimate average exposure effects of COVID-19 on event risks with statistical models that address the potential sources of bias via Targeted Maximum Likelihood Estimation (TMLE). Predicted effects undergo a detailed sensitivity analysis with respect to different datasets, alternative control group design and dataset stratification (e.g., by infection wave, age and sex).
Furthermore, causal machine learning models are trained that will be able to make individualized risk and exposure effect predictions for new patients. These models may derive encodings of structured data from electronic health records, e.g., via transformer architectures.
At the SAfJR, we would like to present preliminary results from our medical application of causal inference in time-to-event analysis, and discuss with the expert audience how our methodology may be further refined.
Do commonly used Machine Learning implementations allow for IPCW to address censoring? A closer look at scikit-learn.
Lukas Klein, Gunter Grieser, Henrik Stahl, Antje Jahn
UAS Darmstadt, Germany
In 2016 the central German organ transplantation registry (TxReg) was established [3] to enhance research in organ transplantation. Despite its potential, the data are rarely used for analyses [3]. One possible reason for this limited use could be data issues [4]. With respect to survival analysis, significant issues are the short maximum follow-up time and annual reporting schedules with only occasional in-between reports, limiting the ability to create long-term survival predictions.
Potentially motivated by similar issues, predictions based on transplantation registries from other countries are often derived for specific time points [5-7], using classifiers for probability predictions. When adopting this approach, one common method to address censoring is inverse probability of censoring weighting (IPCW) [8]. With IPCW, instance weights are used in the fitting process to achieve unbiased predictions.
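As a reminder of what IPCW amounts to in this setting, the sketch below constructs weights for predicting event status at a fixed horizon t0 from simulated data; the resulting labels and weights could then be passed to any learner that accepts instance weights (for example via a scikit-learn sample_weight argument). The data and horizon are illustrative.

```r
library(survival)

set.seed(1)
t0 <- 5                                              # prediction horizon (assumed units)
ev <- rexp(500, 0.15); cs <- rexp(500, 0.05)
d  <- data.frame(time = pmin(ev, cs), status = as.numeric(ev <= cs), x = rnorm(500))

## Kaplan-Meier estimate of the censoring distribution G(t)
Gfit <- survfit(Surv(time, 1 - status) ~ 1, data = d)
G    <- stepfun(Gfit$time, c(1, Gfit$surv))

## the label at t0 is known unless the subject was censored before t0
usable <- with(d, (status == 1 & time <= t0) | time > t0)
y <- as.numeric(d$time <= t0 & d$status == 1)        # event by t0?
w <- 1 / G(pmin(d$time, t0))                         # inverse probability of censoring weight

## (y[usable], w[usable], covariates) are then fed to a weighted classifier
```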
Many machine learning libraries such as scikit-learn accept sample weights. However, the documentation of instance weight arguments for the training process usually lacks implementation details: where and how exactly the weights are used and, ultimately, whether the implementation supports the IPCW approach.
For example, in random forests, weights can be applied during the fitting process, either in the bootstrap sampling or in the split criteria within trees. In algorithms derived from gradient descent, weighted loss functions are used. The weights then also affect the Hessian matrix, influencing the optimization process.
We examined the implementations of commonly used classification methods, such as the Random Forest Classifier and the Gradient Boosting Classifier, to assess whether their handling of sample weights can be used to address censoring by IPCW. To support our findings, we conducted a simulation study following ADEMP [9] principles. Our goal was to identify implementations that produce unbiased predictions when IPCW is applied under increasing censoring rates. Data for the simulation were generated using a Weibull model with varying censoring rates and normally distributed covariates. The objective was to predict survival at a specific time point, using linear models, dense neural networks, tree-based methods, gradient boosting approaches and a model-independent weighted bootstrap approach. For the evaluation, the bias on a separate test dataset was calculated. We also compared model performance on the TxReg data for transplantations from deceased donors.
We provide an overview of different instance weighting implementations in current libraries and demonstrate that there are situations where some implementations prevent the IPCW approach from correcting the bias caused by right censoring.
Combining machine learning methods for subgroup identification in time-to-event data with approximate Bayesian computation for bias correction
Henrik Stahl1, Lukas Klein1, Gunter Grieser1, Antje Jahn1, Heiko Götte2
1UAS Darmstadt, Germany; 2Merck Healthcare KGaA, Darmstadt, Germany
Personalized medicine is a crucial aspect in finding effective treatments for patients. In clinical development it is essential to identify subgroups of patients who exhibit a beneficial treatment effect, ideally before moving to confirmatory trials. The identified subgroups could be defined by predictive biomarkers with corresponding cut-off values. However, once biomarkers or corresponding cut-offs are selected in a data-driven manner a selection bias is introduced, i.e. the treatment effect within the selected subgroup is overestimated.
In previous work, the approximate Bayesian computation (ABC) algorithm was utilized to correct for this selection bias [1]. That approach, however, covers only a reduced range of potential subgroups defined by cut-off values. Machine learning (ML)-based subgroup identification methods allow many more potential subgroups to be covered, with the downside of even greater bias and less interpretable subgroup definitions. Our goal is to extend the ABC algorithm to correct for selection bias in these situations as well. Since our research is motivated by clinical trials in oncology, we focus on time-to-event data such as overall survival or progression-free survival time.
ABC is a simulation approach that selects simulation runs in which a particular statistic calculated from the trial data at hand is similar to that calculated from simulated data, where the true treatment effects are known. The true treatment effects from the selected simulation runs then define an approximation of their posterior distribution, which is used for bias correction. Compared to [1], ML methods raise additional questions that make an extension not straightforward: the higher the complexity of the ML approach, the less comparable the subgroup definitions are between simulation runs. Therefore, in addition to bias correction, the "overlap with the true subgroup", the "rate of correct biomarker inclusion" and the "similarity in subgroup size" have to be assessed. Depending on the underlying goal of the ML algorithm, there is also a higher or lower inherent tendency for bias, and a method's potential for correcting that bias needs to be traded off against its potential to identify the "right" patients.
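A stripped-down illustration of the ABC rejection mechanics is given below: the sampling distribution of the log hazard ratio estimate is approximated by a normal, and all numbers are placeholders. In the actual application each simulation run would involve simulating the trial and the data-driven subgroup selection itself, which is precisely what induces the bias to be corrected.

```r
set.seed(1)
obs_loghr <- log(0.60)   # effect estimated in the selected subgroup (placeholder)
n_events  <- 80          # events in that subgroup (placeholder)

n_sim      <- 1e5
true_loghr <- rnorm(n_sim, 0, 0.5)                 # prior over the true effect
## simulated estimate given the truth; se(log HR) ~ 2 / sqrt(events) under 1:1 allocation
sim_est    <- rnorm(n_sim, true_loghr, 2 / sqrt(n_events))

## accept runs whose simulated statistic is close to the observed one
keep <- abs(sim_est - obs_loghr) < 0.02
exp(quantile(true_loghr[keep], c(0.025, 0.5, 0.975)))   # approximate posterior, HR scale
```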
All those aspects are investigated in simulation studies based on the ADEMP framework [3]. We start with two approaches: model-based partitioning (MOB) [4,5,6] as an ML approach and use LASSO regression [2] with treatment interactions as a comparator. In both approaches ABC is investigated for correcting selection bias.
Planning early-phase clinical trials in oncology: A comprehensive simulation approach for Response, Progression-Free Survival, and Overall Survival
Udeerna Ippagunta1,2, Heiko Götte1
1Merck Healthcare KGaA, Germany; 2Universität Regensburg, Germany
In oncology, clinical trials rely on endpoints such as Objective Response, Progression-Free Survival (PFS), and Overall Survival (OS) to assess therapeutic efficacy [1]. These endpoints are interdependent; for instance, a patient must be alive and progression-free to qualify as a responder. Their evaluation varies across follow-up times, with response typically assessed before PFS and OS within a trial. Ignoring these dependencies may affect decision-making accuracy.
To address this, we aim to model these endpoints simultaneously to support early-phase clinical trials and reduce late-phase failures. After considering various modeling options, we chose a multi-state model framework [2]. Historically, such models have been applied in retrospective analyses and predictions [3-6], whereas for trial planning they have generally been limited to late-phase settings with a focus on PFS and OS, often excluding response as it is not a primary endpoint in confirmatory trials [7-9].
To bridge this gap, we developed a multi-state model encompassing states such as “stable disease,” “response,” “progression,” “death,” “no further radiological assessments,” and “terminal drop-out,” and evaluate it for early-phase trial planning. Physicians are more accustomed to assumptions on response rates and median PFS and OS times than to transition hazards, which are rarely published. We therefore derive calculations for OS and PFS curves and response probabilities under constant transition hazards, facilitating alignment with physician assumptions. This approach allows rapid evaluation of parameter constellations, enabling effective planning before more complex simulation studies are initiated.
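Under constant transition hazards, state occupation probabilities, and hence OS and PFS curves, follow directly from the matrix exponential of the intensity matrix. The sketch below uses a simplified four-state structure with placeholder hazards; the model described above contains additional drop-out and assessment states.

```r
library(expm)   # for the matrix exponential

## states (simplified): 1 = stable disease, 2 = response, 3 = progression, 4 = death
Q <- matrix(0, 4, 4)
Q[1, 2] <- 0.10; Q[1, 3] <- 0.06; Q[1, 4] <- 0.01   # placeholder monthly hazards
Q[2, 3] <- 0.04; Q[2, 4] <- 0.005
Q[3, 4] <- 0.08
diag(Q) <- -rowSums(Q)

P <- function(t) expm(Q * t)   # transition probability matrix P(t) = exp(Qt)

p12    <- P(12)[1, ]           # state occupation at 12 months, starting in "stable"
pfs_12 <- p12[1] + p12[2]      # alive and progression-free
os_12  <- 1 - p12[4]           # alive
```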
Such a more complex simulation study also incorporates practice-relevant aspects like patient recruitment, assessment schedules and varying analysis time points. We compare our multi-state simulation set-up with separate trial planning for each endpoint and evaluate the operating characteristics in terms of correct/false go/stop probabilities.
Preliminary observations suggest that integrating endpoints through simulation enhances early-phase decision-making, reduces costly trial failures, optimizes resources, and improves oncology drug development success rates.
[1] Delgado A, Guddati AK. American Journal of Cancer Research. 2021;11:1121.
[2] Putter H, et al. Stat Med. 2007;26(11):2389-430.
[3] de Wreede LC, et al. Comput Methods Programs Biomed. 2010;99(3):261-74.
[4] Beyer U, et al. Biometrical Journal. 2020;62:550.
[5] Kunzmann K (2023). https://github.com/Boehringer-Ingelheim/oncomsm
[6] Krishnan SM, et al. CPT Pharmacometrics Syst Pharmacol. 2021;10:1255-1266.
[7] Xia F, et al. Stat Biopharm Res. 2016;8(1):12-21.
[8] Erdmann A, et al. arXiv:2301.10059v2
[9] Ristl R, et al. Pharm Stat. 2021;20:129-145.
CORALE project: Cumulative lifetime multi-exposures to ionising radiation and other risk factors and associations with chronic diseases in the CONSTANCES cohort
Justine Sauce1, Sophie Ancelet1, Corinne Mandin1, Philippe Renaud2, Jean-Michel Métivier2, Abdulhamid Chaikh1, Eric Blanchardon1, David Broggio1, Claire Gréau1, Caroline Vignaud1, Marie-Odile Bernier1, Enora Cléro1, Marcel Goldberg3, Christelle Huet1, Stéphane Le Got3, Emeline Lequy-Flahault3, Aurélie Isambert1, Afi Mawulawoe Sylvie Henyoh1, Géraldine Ielsh1, Célian Michel1, Choisie Mukakalisa1, Mireille Cœuret-Pellicer3, Hervé Roy1, Céline Ribet3, Lionel Saey1, Marie Zins2, Olivier Laurent1
1Institut de Radioprotection et de Sûreté Nucléaire (IRSN), PSE-SANTE/SESANE/LEPID, PSE-SANTE/SER/UEM, PSE-ENV/SERPEN/BERAP, PSE-SANTE/SDOS/LEDI, PSE-ENV/SERPEN/BERAD, PSE-SANTE/SDOS/LDRIF, PSE-SANTE/SER/BASEP, F-92260 Fontenay-aux-Roses, France; 2Institut de Radioprotection et de Sûreté Nucléaire (IRSN), PSE-ENV, PSE-ENV/SERPEN/LEREN, F-13115 Saint-Paul-Lez-Durance, France; 3Université Paris Cité, Université Paris-Saclay, UVSQ, Inserm, Cohortes Epidémiologiques en population, UMS 11, Villejuif, France
Introduction: The exposome is a concept encompassing the totality of human exposures from conception onwards. As a first application of this concept to ionising radiation (IR), the CORALE project aims to reconstruct lifetime IR exposures and analyze their associations with chronic diseases in the French general population CONSTANCES cohort. After estimating IR doses to six organs (brain, breast, thyroid, lungs, colon, prostate) from medical and environmental exposures since birth (1941 for the oldest), we focus on evaluating the combined effects of IR exposures and other factors such as smoking and alcohol, on the risks of cancers and cardiovascular diseases.
Methods: A questionnaire was sent to 76,693 volunteers from the CONSTANCES cohort to collect data on several exposures to IR. Medical exposures were estimated by combining data from the questionnaire and the National Health Data System (SNDS), with a thorough review of the literature on French medical imaging practices. Environmental exposures related to geography were reconstructed using residential histories linked with radioactivity databases, while those based on lifestyle were estimated using exposure frequency and age. To assess the combined effects of these cumulated IR exposures treated as time-dependent variables, along with other risk factors, we will apply advanced survival statistical models, including multivariate regression and Bayesian approaches to identify specific exposure profiles.
Results: We estimated lifetime exposure to IR for 30,964 respondents. The mean age at the end of follow-up in 2020 is 55 years, with a standard deviation of 13 years. We assessed doses, taking into account year and/or age at exposure and sex when applicable, from several medical procedures, including diagnostic nuclear medicine and CT scans, mammograms, panoramic dental radiographs and chest X-rays, and from environmental sources, including radon, telluric and cosmic radiation (ground-level cosmic rays and those received during air travel), fallout from the Chernobyl accident and atmospheric nuclear tests, and seafood consumption. The average lifetime dose to the colon (often used to study associations with solid cancers) is estimated to be 90 mSv; the 5th and 95th percentiles of this dose are 27 mSv and 198 mSv, respectively. We identified that 37.4% of respondents consumed alcohol regularly, while 48.7% had a history of smoking.
Discussion: Multi-exposure analyses are essential for understanding complex interactions between environmental and behavioral factors and their combined impact on health. We are developing models to explore the associations of these co-exposures with chronic disease incidence. A further extension to other risk factors, such as chemical exposures, is underway.
Modeling the early-redemption of fixed interest rate mortgages: a survival analysis approach
Leonardo Perotti1,2, Lech A Grzelak1,2, Cornelis W Oosterlee1
1Utrecht University, Netherlands; 2Rabobank, Netherlands
Accurately modeling the early redemption of fixed-rate mortgages is essential for ensuring fair pricing and effective risk management for financial institutions. Early redemption, or prepayment, often arises from behavioral factors, with relocation to a new house being the predominant driver in the Dutch mortgage market. We address this phenomenon by employing survival analysis techniques to map observable drivers to the distribution of relocation timing. Leveraging a rich dataset of over one million mortgages spanning the past decade, we calibrate our model to capture the underlying dynamics. Furthermore, we analyze the impact of stochastic drivers on the "cost of prepayment" and propose strategies to mitigate this risk. Our findings provide valuable insights into prepayment modeling and offer actionable recommendations for risk management in mortgage portfolios.
Bootstrapping LASSO-type estimators in Cox Frailty Models
Lena Schemet
Universität Augsburg, Germany
The principle of Occam’s Razor states that among several plausible explanations for a phenomenon, the simplest is best. Applied to regression analysis, this implies that the smallest model that fits the data is to be preferred. For the analysis of high-dimensional time-to-event data, following this principle therefore requires variable selection techniques. One way to achieve variable selection is LASSO regularization. However, LASSO regularization does not allow for easy derivation of confidence intervals or p-values for the estimated coefficients, especially in complicated settings such as the Cox frailty model. We propose bootstrap-based methods to derive confidence intervals for regularized Cox frailty models with time-varying effects, more specifically for Hohberg and Groll’s (2024) Coxlasso model. We simulate high-dimensional time-to-event data with random effects and time-varying effects to validate the bootstrap approach empirically, and we illustrate the methods using a real data example. Future work includes a theoretical underpinning of these empirical results.
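A minimal sketch of the bootstrap idea, assuming an L1-penalised Cox model fitted with Python's lifelines; the frailty term and time-varying effects of the Coxlasso model are not reproduced here, and the penalty value and example data are illustrative only.

```python
# Minimal sketch: percentile bootstrap confidence intervals for the coefficients
# of an L1-regularised Cox model (no frailty term or time-varying effects here).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                     # example time-to-event data shipped with lifelines
covariates = [c for c in df.columns if c not in ("week", "arrest")]

def fit_lasso_cox(data, penalizer=0.1):
    m = CoxPHFitter(penalizer=penalizer, l1_ratio=1.0)    # pure L1 penalty
    m.fit(data, duration_col="week", event_col="arrest")
    return m.params_.reindex(covariates).to_numpy()

rng = np.random.default_rng(0)
B = 200
boot = np.empty((B, len(covariates)))
for b in range(B):
    idx = rng.integers(0, len(df), len(df))               # resample subjects
    boot[b] = fit_lasso_cox(df.iloc[idx].reset_index(drop=True))

lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
print(pd.DataFrame({"2.5%": lower, "97.5%": upper}, index=covariates))
```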
A Nonparametric Bayesian Approach for High-Dimensional Causal Effect Estimation in Survival Analysis
Tijn Jacobs1, Wessel van Wieringen1,2, Stéphanie van der Pas1
1Vrije Universiteit Amsterdam, The Netherlands; 2Amsterdam UMC, The Netherlands
Estimation of causal effects on survival in the presence of many confounders is hampered by the high-dimensionality of the data. Published work often addresses either high-dimensionality or survival data, but rarely both. We address this gap by studying high-dimensional causal inference methods that account for confounding bias and the complexities introduced by censored observations.
We present a nonparametric Bayesian method to estimate causal effects in high-dimensional, right-censored, and interval-censored survival data. Our approach employs a Bayesian ensemble of Additive Regression Trees (BART) with global-local shrinkage priors on the leaf node parameters (i.e., step heights). This contrasts with existing methods that induce sparsity through the tree skeletons. By focusing shrinkage on the leaf parameters, our method retains all covariates in the model. This reduces the risk of omitting relevant confounders and ensures compliance with the unconfoundedness assumption. The ensemble of Bayesian trees also captures complex, nonlinear relationships between covariates, treatment, and survival outcomes.
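For concreteness, one standard way to write a horseshoe-type global-local shrinkage prior on the leaf (step-height) parameters of tree \(j\) is the following (notation assumed here, not taken from the authors):
\[
\mu_{jk} \mid \lambda_{jk}, \tau \;\sim\; \mathcal{N}\!\left(0, \lambda_{jk}^{2}\tau^{2}\right), \qquad
\lambda_{jk} \;\sim\; \mathcal{C}^{+}(0, 1), \qquad
\tau \;\sim\; \mathcal{C}^{+}(0, \tau_0),
\]
where \(\mathcal{C}^{+}\) denotes the half-Cauchy distribution, the local scales \(\lambda_{jk}\) allow individual step heights to escape the global shrinkage governed by \(\tau\), and other scale mixtures are obtained by changing the distributions of \(\lambda_{jk}\) and \(\tau\).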
We use an efficient Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm to sample from the posterior distribution of both the regression trees and the causal treatment effect. Our framework supports a wide range of global-local shrinkage priors. We demonstrate the performance of our method using the Horseshoe prior, which adapts to various levels of sparsity. The general implementation accommodates any scale mixture prior, providing a fast and flexible computational approach for high-dimensional data.
We evaluate our method across a diverse set of simulated data settings, both sparse and dense. This evaluation showcases the method's robustness and flexibility. In sparse settings, the method effectively identifies confounders, while in dense settings, it captures intricate interactions between covariates, treatment, and outcomes. We demonstrate the practical utility of our approach on real-life data from lung cancer patients.
Comparison of the Prognostic Performance of Machine Learning Algorithms on Gene Expression Data in Acute Myeloid Leukemia
Adriana Blanda1, Sara Pizzamiglio1, Sabina Sangaletti2, Paolo Verderio1
1Unit of Bioinformatics and Biostatistics, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy; 2Molecular Immunology Unit, Department of Experimental Oncology, Fondazione IRCCS Istituto Nazionale Tumori, Milan, Italy.
BACKGROUND:
Acute Myeloid Leukaemia (AML) is a heterogeneous hematologic malignancy characterized by the clonal expansion of abnormal myeloid progenitor cells. Despite advances in treatment, the prognosis for patients remains variable, with an estimated 5-year overall survival (OS) of approximately 30%. In recent years, gene expression analysis has emerged as a powerful tool for identifying novel biological markers that can aid in predicting patient outcomes. The main objective of this study is to assess and compare the prognostic performance of different Machine Learning (ML) algorithms.
METHODS:
This study utilized a dataset comprising gene expression profiles from approximately 20,000 genes in 457 AML patients aged under 60, generated through RNA sequencing. The dataset is available on the Gene Expression Omnibus platform (GSE6891). The outcome variable is OS, defined as the time from diagnosis to death. The median follow-up time is 44 months (range 18-68), with a total of 290 events. The dataset was split into training (70%) and testing (30%) sets. As a first filtering step, a univariate Cox proportional hazards model was fitted for each gene in the training set, followed by p-value adjustment using the False Discovery Rate (FDR) method (P<0.05). As feature selection methods, Lasso, Elastic-net, Forward and Backward approaches were applied. Subsequently, supervised algorithms such as Survival Random Forest, mBoost, Support Vector Machine and Bayes Cox models were implemented, with cross-validation and parameter tuning used to improve model performance. Harrell's concordance index (C-index) and its confidence interval were used to compare prognostic performance.
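As a hedged sketch of the screening step only (a univariate Cox fit per gene followed by Benjamini-Hochberg FDR adjustment), the following Python fragment uses lifelines and statsmodels; the simulated expression matrix and gene names are placeholders, not the GSE6891 data.

```python
# Minimal sketch of the filtering step: one univariate Cox model per gene,
# then FDR (Benjamini-Hochberg) adjustment. Data are simulated placeholders.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(42)
n, p = 300, 50                                   # patients, genes (toy sizes)
expr = pd.DataFrame(rng.normal(size=(n, p)),
                    columns=[f"gene_{i}" for i in range(p)])
time = rng.exponential(scale=np.exp(-0.4 * expr["gene_0"]))  # gene_0 is prognostic
event = (rng.uniform(size=n) < 0.8).astype(int)              # toy censoring indicator

pvals = []
for g in expr.columns:
    d = pd.DataFrame({"time": time, "event": event, g: expr[g]})
    cph = CoxPHFitter().fit(d, duration_col="time", event_col="event")
    pvals.append(cph.summary.loc[g, "p"])

keep, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("FDR-selected candidate genes:", [g for g, k in zip(expr.columns, keep) if k])
```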
RESULTS:
Based on FDR-adjusted p-values, 83 candidate genes were retained. After feature selection, the best model in terms of Harrell's C-index was a 7-gene model obtained by the backward selection procedure (C-index: 0.68 [95% CI: 0.56–0.78]). The algorithm with the best performance was mBoost, achieving a C-index of 0.68 [95% CI: 0.55–0.79], followed by the Survival Random Forest model with a C-index of 0.65 [95% CI: 0.51–0.76]. In contrast, both the SVM and Bayes Cox models had lower C-indices of 0.56, with CIs of 0.42–0.68 and 0.48–0.52, respectively.
CONCLUSION:
These results show that the mBoost algorithm performed best on this dataset. To obtain more robust and generalizable results, we plan to evaluate the prognostic performance of the signature on an external dataset. Furthermore, the present analysis illustrates a process that integrates standard statistical methods for feature selection with ML algorithms to build the final model.
Double-truncated and censored corporate lifetimes: Likelihood and identification.
Fiete Sieg
University of Rostock, Germany
We consider a parametric model for corporate lifetimes with a double truncation and censoring structure. The goal is to estimate the parameter of an exponential distribution, which is truncated by a uniform distribution and predominantly subject to censored observations. A maximum likelihood estimator is derived from a point process-theoretic approximation to the underlying truncation process, and we demonstrate both its consistency and asymptotic normality. The developed estimation method is applied to a set of German corporate data, and its behavior is studied through Monte Carlo simulations.
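A minimal sketch of the basic conditional likelihood for a doubly truncated exponential lifetime, maximised numerically in Python, is given below; censoring and the point-process approximation developed in the paper are deliberately omitted, and the truncation bounds and data are simulated for illustration.

```python
# Minimal sketch: maximum likelihood for an exponential rate under double
# truncation (a lifetime is observed only if it falls in [a_i, b_i]).
# Censoring and the point-process approximation of the paper are omitted.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
lam_true, n = 0.2, 20000
t_all = rng.exponential(1 / lam_true, n)
a = rng.uniform(0, 5, n)                       # lower truncation bounds
b = a + rng.uniform(5, 15, n)                  # upper truncation bounds
obs = (t_all >= a) & (t_all <= b)              # only these lifetimes are observed
t, a, b = t_all[obs], a[obs], b[obs]

def neg_loglik(lam):
    # sum of log f(t_i) - log(F(b_i) - F(a_i)) for the exponential distribution
    return -np.sum(np.log(lam) - lam * t
                   - np.log(np.exp(-lam * a) - np.exp(-lam * b)))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 5.0), method="bounded")
print("true rate:", lam_true, " MLE:", round(res.x, 4))
```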
Enhancing Healthcare Understanding from Clinical Routine Data by Simplifying the Representation of Treatment Pathways
Maryam Farhadizadeh1, Zoe Lange2, August Sigle3, Holger Dette2, Harald Binder4, Nadine Binder1
1Institute of General Practice/Family Medicine, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany; 2Department of Mathematics, Ruhr University Bochum, Bochum, Germany; 3Department of Urology, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany; 4Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
Background:
In clinical routine care data, patients' pathways often exhibit considerable heterogeneity, encompassing the order in which a wide variety of interventions occur as well as varying sequence lengths. This presents a challenge for understanding and managing healthcare paths effectively, e.g., when using multistate models, which can only deal with a limited number of states. Therefore, there is a critical need to identify and group similar pathways within this diverse landscape [1, 2].
Methods:
To identify typical treatment pathways for inpatient stays, we have developed an algorithmic solution that uses coded clinical data, i.e., diagnoses and procedures. This approach helps to visually represent the data in treatment path diagrams. Specifically, the algorithm detects important procedures by calculating their importance based on patient counts and node significance and comparing these measures to predefined thresholds. Additionally, the algorithm groups less important procedures to put the focus on the essential components of these pathways.
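A much-simplified sketch of the grouping step is shown below (thresholding procedures by patient counts and collapsing the rest into an "other" category); the threshold and example pathways are purely illustrative, and the node-significance criterion and pathway diagrams of the actual algorithm are not reproduced.

```python
# Minimal sketch of the grouping idea: keep procedures whose patient counts exceed
# a threshold, collapse the rest into "OTHER". Hypothetical codes and pathways.
from collections import Counter

pathways = [
    ["OPS-A", "OPS-B", "OPS-C"],
    ["OPS-A", "OPS-C"],
    ["OPS-A", "OPS-D", "OPS-C"],
    ["OPS-A", "OPS-E", "OPS-C"],
]

threshold = 2                                                   # minimum patient count
counts = Counter(code for path in pathways for code in set(path))
important = {code for code, c in counts.items() if c >= threshold}

simplified = [[code if code in important else "OTHER" for code in path]
              for path in pathways]
print(Counter(tuple(p) for p in simplified))                    # typical simplified pathways
```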
Results:
We demonstrate our algorithm using clinical routine data from prostate cancer patients receiving radical prostatectomy in the Department of Urology at the Medical Center, University of Freiburg. To further explore the efficacy of our approach in simplifying the representation of treatment pathways, we evaluate it through an extensive simulation study. This involves varying pathway similarities, i.e., different levels of heterogeneity of interventions, and sequence lengths. We find that a representation with a manageable number of typical pathways can be obtained, and characterize the sensitivity with respect to different tuning parameters of our algorithm.
Conclusion:
This method simplifies the identification and visualization of common healthcare trajectories, aiding in understanding patient paths and informing healthcare decisions. Improving our understanding of these pathways can elevate clinical care standards and enhance health outcomes, in particular by enabling subsequent analysis with multistate models and statistical tests.
[1] Binder, Möllenhoff, Sigle, Dette. Similarity of competing risks models with constant intensities in an application to clinical healthcare pathways involving prostate cancer surgery. Statistics in Medicine, 2022.
[2] Möllenhoff, Binder, Dette. Testing similarity of parametric competing risks models for identifying potentially similar pathways in healthcare. arXiv:2401.04490v1, 2024.
Improving Cox Regression Estimates by Using the Stochastic Approximation Expectation-Maximization Algorithm to Handle Missing Data
Eliz Peyraud1,2, Julien Jacques1, Guillaume Metzler1
1Université Lumière Lyon 2, France; 2Institut Georges Lopez, France
Introduction:
Missing data is a common issue in survival analysis, where traditional approaches to handling missing values, such as single imputation, can often lead to biased results and cause model parameters to shrink toward zero. This study explores the Stochastic Approximation Expectation-Maximization (SAEM) algorithm as a more robust alternative, especially for Cox regression. By comparing SAEM to standard imputation techniques, we evaluate its ability to improve parameter estimation across varying levels of missing data.
Methodology:
This study adapts the SAEM algorithm to handle missing data in Cox regression, offering a robust approach for incomplete datasets. We assume that data are Missing At Random and that the covariates follow a normal distribution. In our approach, the Cox model includes both observed and missing covariates, with the SAEM algorithm iterating to refine the parameter estimates. Rather than the traditional expectation step, SAEM uses a simulation step in which a Metropolis-Hastings algorithm generates plausible values for the missing data from their conditional distribution given the observed data. Through the stochastic approximation step, SAEM then updates the parameter estimates and the log-likelihood, and the algorithm cycles through simulation, approximation, and maximization steps until the estimates stabilize.
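For reference, the stochastic approximation at the heart of SAEM can be written in its standard form (notation assumed here, not taken from the authors):
\[
Q_k(\theta) \;=\; Q_{k-1}(\theta) \;+\; \gamma_k \left[ \log L\!\left(\theta;\, y_{\mathrm{obs}}, z^{(k)}\right) - Q_{k-1}(\theta) \right],
\]
where \(z^{(k)}\) denotes the missing covariate values simulated at iteration \(k\) (here via Metropolis-Hastings), \(\gamma_k\) is a step-size sequence with \(\sum_k \gamma_k = \infty\) and \(\sum_k \gamma_k^2 < \infty\), and the maximization step sets \(\theta^{(k)} = \arg\max_\theta Q_k(\theta)\).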
Simulation Study:
In the simulation study, we generated survival data using a Cox proportional hazards model with five covariates, introducing missing data at varying levels (2%, 5%, 10%, 20%) across the covariates. These levels allowed us to observe how the accuracy of the regression coefficient estimates changed as the amount of missing data increased. We compare our method with two well-known imputation methods (single imputation by the mean and Multiple Imputation by Chained Equations) in terms of the quality of the estimated model parameters, and show that the proposed method yields more accurate parameter estimates up to high rates of missing data.
Perspectives:
Future perspectives for this research include testing more complex variants of the SAEM algorithm, for example by using different assumptions about the data. The current approach assumes that there is no correlation between the covariates, and we believe that taking the links between variables into account can increase both the performance and the stability of the estimation. Additionally, applying SAEM to real-world data, particularly liver transplant datasets in which missing covariates are common, would help improve existing models by better handling missing data while preserving model accuracy and interpretability in clinical settings.
Life expectancies and blood-based biomarkers for Alzheimer’s disease in primary care
Luca Kleineidam1,2, Pamela Martino-Adami3, Selcuk Oezdemir2, Michael Wagner1,2, Alfredo Ramirez2,3, Anja Schneider1,2, for the AgeCoDe study group1
1Department of Old Age Psychiatry and Cognitive Disorders, University Hospital Bonn, Germany; 2German Center for Neurodegenerative Diseases (DZNE), Germany; 3Division of Neurogenetics and Molecular Psychiatry Department of Psychiatry, University of Cologne, Medical Faculty, Germany
The timely detection of Alzheimer’s disease and the prevention of dementia are public health challenges. Measuring plasma pTau217 in primary care holds promise for successful early detection, but an easily interpretable illustration of its prognostic value for dementia incidence is lacking.
Therefore, we estimated remaining life expectancies (LEs) with and without dementia depending on age, sex, and plasma pTau217 levels (low, intermediate, or high) in the AgeCoDe cohort, which includes dementia-free primary care patients sampled randomly from general practitioner registries in six German cities. We used 1451 individuals with available pTau217 data (mean age = 83.7 years). The incidence of dementia and death was assessed during up to eight years of follow-up.
We fitted continuous-time multi-state Markov models (R package msm) modeling transitions from the healthy state to dementia and from both of these states to death, using age as the time scale. Since dementia and mortality rates grow exponentially with age, we included age as a time-varying covariate, so that the transition hazards follow a Gompertz form. Sex and pTau217 were included as additional covariates. LEs were estimated using the R package elect.
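To make the Gompertz link explicit (notation assumed here), including age \(a\) as a log-linear time-varying covariate in an otherwise constant-intensity model gives transition hazards
\[
q_{rs}(a \mid x) \;=\; \exp\!\left( \beta_{0,rs} + \beta_{a,rs}\, a + \beta_{rs}^{\top} x \right),
\]
which, viewed as a function of age, is a Gompertz hazard with shape parameter \(\beta_{a,rs}\); here \(x\) collects sex and the pTau217 category.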
As expected, higher age and male sex were associated with shorter LEs with and without dementia. Point estimates for total LEs closely resembled LEs from national registry data.
Intermediate (HR[95%-CI]=3.72 [2.62-5.30]) and high (HR[95%-CI]=6.26 [4.63-8.47]) levels of pTau217 were associated with a higher dementia risk compared to low levels, beyond age and sex, which was reflected in estimated LEs with and without dementia.
Across all ages (80-95 years) and sexes, LEs without dementia were the highest in individuals with low pTau217 levels (Est[95%-CI]=2.4 [2.1-3.1] to 10.7 [10.1-11.2]), lower in individuals with intermediate levels (1.9 [1.5-2.4] to 7.4 [6.8-8.2] years) and the lowest in individuals with high levels (1.6 [1.4-1.8] to 5.9 [5.4-6.4]). LEs with dementia were very low in individuals with low pTau217 levels (0.1 [0.1-0.2] to 0.7 [0.6-1.0] years, 4% [2%-5%] to 9% [5%-14%] of remaining lifetime). However, LEs with dementia were considerably higher in individuals with intermediate levels (0.5 [0.3-0.8] to 2.4 [1.9-3.0] years, 15% [12%-21%] to 33% [25%-41%] of remaining lifetime) and the highest in individuals with high pTau217 (0.7 [0.5-0.9] to 3.1 [2.7-3.6] years, 23% [19%-28%] to 45% [36%-56%] of remaining lifetime).
In conclusion, plasma pTau217 is related to marked differences in expected years lived with and without dementia. Estimating LEs provides an intuitively understandable metric for illustrating the influence of risk factors.
Propagator Methods for Survival Analysis
Julian Schlecker1,2,3, Ina Kurth1,3,4, Wahyu Wijaya Hadiwikarta1,3,4
1Deutsches Krebsforschungszentrum Heidelberg, Germany; 2Department of Physics and Astronomy, Heidelberg University, Germany; 3National Center for Radiation Research in Oncology (NCRO), Heidelberg Institute for Radiation Oncology (HIRO), Heidelberg, Germany; 4German Cancer Consortium (DKTK), Core Center Heidelberg, Heidelberg, Germany
To address a number of shortcomings of conventional survival analysis methods, such as Kaplan-Meier curves and Cox regression, regarding the assumptions and requirements imposed on the survival function and its covariate dependence, we introduce a novel survival model inspired by, and synthesising, multistate Markov chain and diffusive stochastic process models. By conceptualising the patient as a particle moving stochastically within a compound state space of discrete categorical and continuous numerical variables that represent the disease state at any given time, we apply methods from quantum and statistical physics to describe the patient's trajectory in this space. This model introduces randomness as a conceptual result of the dynamics of obscured variables, offering distinct interpretability. We treat this perceived randomness, due to lack of information, with a transition probability function, the propagator, which assigns a probability to every possible progression of states given our knowledge. The regression parameters in this picture have direct meaning in terms of generalised deterministic and random forces acting on the patient. Moreover, assumptions are made close to the level of the individual patient, clarifying what is required for applicability. Furthermore, the new model provides a more detailed interpretation of the shape of survival curves, naturally accommodating interactions between parameters. In the presented model, the survival curve is a resultant quantity of the description, in contrast to methods assuming its shape or covariate dependence a priori. With the propagator, we describe the statistical evolution of patient status, allocating space in the description to the combined effects of both observables and "unobservables". Ultimately, we aim to aid treatment strategy decisions through application of this survival model. We present the conceptual framework and construction of the model, emphasizing how it deals with perceived randomness.
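One way to formalise the central relation (notation assumed here, not taken from the authors) is to write \(K(x, t \mid x_0, t_0)\) for the propagator, i.e. the probability (density) of occupying state \(x\) at time \(t\) given state \(x_0\) at time \(t_0\); the survival curve then follows by integrating the propagator over the non-absorbing part \(\mathcal{A}\) of the state space,
\[
S(t \mid x_0, t_0) \;=\; \int_{\mathcal{A}} K(x, t \mid x_0, t_0)\, \mathrm{d}x,
\qquad
K(x, t \mid x_0, t_0) \;=\; \int K(x, t \mid x', t')\, K(x', t' \mid x_0, t_0)\, \mathrm{d}x',
\]
the second identity (for any intermediate time \(t_0 < t' < t\)) being the Chapman-Kolmogorov consistency condition of the underlying stochastic process.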
Machine Learning for Survival Analysis: Predicting Time-to-Event Through Decomposition
Lubomír Štěpánek1,2
1Institute of Biophysics and Informatics, First Faculty of Medicine, Charles University, Prague, Czech Republic; 2Department of Statistics and Probability, Faculty of Informatics and Statistics, Prague University of Economics and Business, Prague, Czech Republic
Background/Introduction: Predicting the duration of event-free intervals using multiple covariates is a common task in survival analysis, widely applied in biostatistics. Traditional methods, such as the Cox proportional hazards model and its variants, dominate this field. However, their applicability is constrained by strict statistical assumptions, which may not always hold.
Methods: This study addresses the limitations of traditional approaches by introducing a machine-learning framework for predicting event-free intervals through a novel decomposition of the time-to-event variable. This decomposition reformulates the time-to-event variable into two distinct components: (1) a binary variable indicating whether the event occurred, and (2) a continuous variable representing the timing of the event, if it occurred, or the censoring time otherwise. This separation enables the problem to be approached as a supervised learning task: classification of event occurrence, described by the event component, with the time component used as a predictor. A range of machine-learning classification algorithms can be employed for this task, including logistic regression, support vector machines, decision trees, random forests, naïve Bayes classifiers, and neural networks. By leveraging this decomposition, machine-learning models can be applied with fewer assumptions than traditional survival analysis methods.
Results: The proposed method was tested using a COVID-19 dataset comprising multiple explanatory covariates and a response variable representing the time until COVID-19 antibody levels dropped below a laboratory-defined cut-off. Machine-learning classification models were constructed to predict event occurrences at expected time points. Additionally, the Cox proportional hazards model was used to estimate the same intervals for comparison. Performance metrics indicated similar predictive accuracy between the machine-learning approach and the Cox model, demonstrating that the proposed decomposition-based framework is a viable alternative.
Conclusion: The decomposition of the time-to-event variable into event and time components allows machine-learning models to perform survival analysis with fewer statistical assumptions. When applied to a COVID-19 dataset, the machine-learning approach achieved performance comparable to that of the Cox model. This novel method offers an assumption-minimal alternative for predicting event occurrences, broadening the toolbox available for survival analysis.
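A hedged sketch of the decomposition in Python follows: the event indicator becomes a classification target and the time component enters as a predictor alongside the covariates; the simulated data and the choice of a random forest classifier are illustrative and not taken from the study.

```python
# Minimal sketch of the decomposition: the time-to-event variable is split into an
# event indicator (classification target) and a time component (used as a predictor
# alongside the covariates). Data are simulated, not the COVID-19 antibody set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                       # explanatory covariates
latent = rng.exponential(np.exp(0.5 * X[:, 0]))   # latent event times
censor = rng.uniform(0.5, 3.0, n)
time = np.minimum(latent, censor)                 # time component
event = (latent <= censor).astype(int)            # event component

features = np.column_stack([X, time])             # time enters as a predictor
X_tr, X_te, y_tr, y_te = train_test_split(features, event, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("accuracy for event occurrence:",
      round(accuracy_score(y_te, clf.predict(X_te)), 3))
```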
A Machine Learning Approach for Comparing Multiple Survival Curves: Random Forests with Reduced Assumption Dependency
Lubomír Štěpánek1,2
1Institute of Biophysics and Informatics, First Faculty of Medicine, Charles University, Prague, Czech Republic; 2Department of Statistics and Probability, Faculty of Informatics and Statistics, Prague University of Economics and Business, Prague, Czech Republic
Background/Introduction: Traditional methods for comparing survival curves, such as the log-rank test and Cox proportional hazards models, are widely used in survival analysis. However, these methods rely on strict statistical assumptions which, if violated, can lead to biased results. Addressing this limitation is important for improving the robustness of survival analysis.
Methods: This study introduces a novel method for comparing multiple survival curves using a random forest algorithm that minimizes dependency on assumptions. Unlike conventional random survival forests, which require a covariate structure and predictors, our approach utilizes only the time-to-event data. The method generates variables using common estimators, such as the Kaplan-Meier estimator, which are constant within groups but individualized by imputing missing values into each individual's vector of values after an event occurs. The core principle leverages the decision-making capability of random forests, where each tree can classify the data into one, two, or more groups based on the survival curves. The fraction of trees that classify into multiple groups, thereby rejecting the null hypothesis of no difference between survival curves, correlates with a calculated ϕ-value, analogous to a p-value. Using the Poisson-binomial distribution, we demonstrate that the ϕ-value is a consistent and efficient estimator. Additionally, the Le Cam theorem and the Chernoff bound are employed to derive an upper bound for the probability that the ϕ-value falls below a predefined threshold α, offering insights into the statistical power of the method. Tree complexity can be managed through pruning, which reduces the type I error rate but may also decrease statistical power.
Results: Simulations involving pairs and triplets of statistically distinct and of similar survival curves compared the proposed method with the log-rank test and the Cox model. The findings indicate that higher levels of tree pruning reduce the type I error rate, at the cost of a trade-off with statistical power.
Conclusion: This study presents a machine learning-based method for comparing multiple survival curves, providing an almost assumption-free alternative to traditional techniques in survival analysis. The proposed method offers adjustable control over the type I error rate and demonstrates robustness in varied scenarios, though it may exhibit lower statistical power under certain conditions. This approach represents a possible addition to the toolbox for survival analysis, particularly in scenarios where standard assumptions are difficult to meet.
Principled estimation and prediction with competing risks: a Bayesian nonparametric approach
Claudio Del Sole1, Antonio Lijoi2, Igor Prünster2
1University of Milan - Bicocca, Milan (Italy); 2Bocconi University, Milan (Italy)
Competing risks occur in survival analysis when multiple causes of death are present. They play a prominent role in several domains, extending beyond biostatistics to epidemiology, actuarial sciences and reliability theory. In this contribution, a multi-state modeling framework for competing risks is adopted; we introduce a flexible nonparametric prior, specified in terms of hierarchical completely random measures, for the transition rates, inducing dependence among the different sources of risk. The marginal distribution of the data and of a latent random partition, which admits a characterization in terms of a variant of the Chinese restaurant franchise process, is determined. Leveraging these distributional results, we are able to evaluate the predictive probability that a future event is of a specific type (e.g. death from a particular cause), as a function of the time at which the event occurs. The resulting functions, derived from sound principles, represent a major innovation in the literature and are termed prediction curves. In addition, we characterize the posterior distribution of the hierarchical random measures and provide estimates of the survival function and of the cause-specific incidence and subdistribution functions, conditionally on the latent partition structure. Both marginal and conditional sampling algorithms for posterior inference are devised; the performance of the model and the effectiveness of the two algorithms are then assessed by means of a simulation study, in which the proposed model is also compared with its non-hierarchical counterpart, which models transition rates independently for each source of risk. Finally, some applications to clinical datasets are presented.
Prediction Stability of Survival Models
Sara Matijevic, Christopher Yau
Big Data Institute, University of Oxford, Oxford, United Kingdom
In clinical settings, survival prediction models are essential for estimating patient outcomes over time, such as time to disease progression or mortality risk. Advancements in statistical and machine learning methods have significantly expanded the number of available survival models. However, many of these models lack the stability required for clinical use, as their outputs heavily depend on the specific development data sample, making predictions highly variable if a different dataset is used. This instability can be detrimental to patient health, as model predictions are crucial for guiding treatment options and supporting personalised medicine. Furthermore, significant prediction instability is likely to undermine clinicians' trust in the model, thereby limiting its adoption in clinical practice.
In this study, we evaluated the stability of six widely used survival models: the Cox Proportional Hazards model, Weibull model, Bayesian Weibull model, Random Survival Forest, DeepSurv, and DeSurv. Using synthetic data and a bootstrapping framework, we first assessed model stability across four levels: consistency of population-level mean predictions, stability in the distribution of predictions, robustness within patient subgroups, and reliability of individual predictions. We then concentrated specifically on the fourth level, individual prediction stability, which is arguably the most clinically relevant. To examine this, we used the SUPPORT dataset and evaluated stability through mean absolute prediction error (MAPE), prediction instability plots, calibration instability plots, and MAPE instability plots.
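A minimal sketch of the individual-level check, assuming a Cox model refitted on bootstrap resamples and compared with a reference fit via the mean absolute difference of predicted risks at a fixed horizon; the example data, horizon and exact MAPE definition below are illustrative and not those of the study.

```python
# Minimal sketch: individual prediction instability of a Cox model via bootstrapping.
# A reference model is fitted on the full data; bootstrap refits predict the same
# individuals, and the mean absolute difference at a fixed horizon summarises
# per-patient instability. Illustrative data, not the SUPPORT dataset.
import numpy as np
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()
horizon = 26                                     # weeks; illustrative horizon

ref = CoxPHFitter().fit(df, duration_col="week", event_col="arrest")
ref_risk = 1 - ref.predict_survival_function(df, times=[horizon]).iloc[0].values

B, abs_err = 100, []
for b in range(B):
    boot = df.sample(frac=1.0, replace=True, random_state=b).reset_index(drop=True)
    m = CoxPHFitter().fit(boot, duration_col="week", event_col="arrest")
    risk_b = 1 - m.predict_survival_function(df, times=[horizon]).iloc[0].values
    abs_err.append(np.abs(risk_b - ref_risk))

mape_per_patient = np.mean(abs_err, axis=0)      # instability for each individual
print("average individual MAPE:", round(float(np.mean(mape_per_patient)), 4))
```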
Our findings indicate that classical statistical models, such as the Cox Proportional Hazards (CPH) and Weibull models, provide more stable predictions than machine learning and deep learning survival models. Specifically, the average MAPE for CPH was 0.0105, significantly lower than DeSurv’s average MAPE of 0.0908, illustrating greater variability in the deep learning model’s predictions. Plots further corroborated these findings, with CPH demonstrating consistently lower prediction and calibration instability. This pattern was also evident in subgroup stability analyses, where CPH maintained lower MAPE values across both balanced and imbalanced subgroups compared to other models.
As the volume of patient data continues to grow, deep learning survival models represent a scalable and efficient approach for analysing large datasets and deriving novel inferences. Stable and reliable models will enable clinicians to make more informed decisions, ultimately improving patient outcomes. More work is needed to improve the prediction stability of deep learning survival models, so they can achieve reliability comparable to their classical statistical counterparts for safe use in clinical settings.