Session
MS07 2: Regularization for Learning from Limited Data: From Theory to Medical Applications

Presentations
Imbalanced data sets in a magnetic resonance imaging case study of preterm neonates: a strategy for identifying informative variables
Medical University of Innsbruck, Austria
Background and objective: Variable selection is the process of identifying relevant
data characteristics (features, biomarkers) that are predictive of future outcomes.
There is an arsenal of methods addressing the variable selection problem,
but the available techniques may not work on the so-called imbalanced data sets
containing mainly examples of the same outcome. Retrospective clinical data
often exhibit such imbalance. This is the case for neuroimaging data derived
from magnetic resonance images of prematurely born infants, used in an
attempt to identify prognostic biomarkers of their possible neurodevelopmental
delays, which is the main objective of the present study.
Methods: The variable selection algorithm used in our study scores the combinations
of variables according to the performance of prediction functions involving
these variables. The considered functions are constructed by kernel
ridge regression with various input variables as regressors. As regression kernels
we used universal Gaussian kernels and the kernels adjusted for underlying data
manifolds. The prediction functions were trained on data randomly extracted
from the available clinical data sets. Prediction performance was measured
as the area under the receiver operating characteristic (ROC) curve, and the
maximum performance exhibited by the prediction functions was averaged over
simulations. The resulting average value is then assigned as the performance
index associated with the considered combination of input variables. The
variables attaining the largest index value are selected as the informative ones.
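The scoring procedure described above can be sketched as follows. Everything here (function names, the half-split scheme, the kernel width, and the ridge parameter) is an illustrative assumption rather than the authors' exact implementation, and only a plain Gaussian kernel is used, not the manifold-adjusted kernels mentioned in the abstract:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of A and B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def roc_auc(scores, labels):
    # Area under the ROC curve via the Mann-Whitney rank statistic.
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def score_variable_subset(X, y, cols, n_splits=50, lam=1e-2, rng=None):
    # Performance index of a candidate variable combination: average test AUC
    # of kernel ridge predictors trained on random half-splits of the data,
    # using only the columns in `cols` as regressors.
    rng = np.random.default_rng(rng)
    Xs = X[:, cols]
    aucs = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        tr, te = idx[: len(y) // 2], idx[len(y) // 2 :]
        K = gaussian_kernel(Xs[tr], Xs[tr])
        alpha = np.linalg.solve(K + lam * np.eye(len(tr)), y[tr].astype(float))
        pred = gaussian_kernel(Xs[te], Xs[tr]) @ alpha
        if 0 < y[te].sum() < len(te):      # AUC needs both classes present
            aucs.append(roc_auc(pred, y[te]))
    return float(np.mean(aucs))
```

On a synthetic imbalanced data set, the index of an informative variable separates clearly from that of pure noise, mirroring the selection rule of the abstract.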
Results: The proposed variable selection strategy has been applied to two
retrospective clinical datasets of preterm infants who underwent magnetic
resonance imaging of the brain at term-equivalent age and a developmental
evaluation at around 12 months corrected age. The first
dataset contains data of 94 infants, with 13 of them being later classified as delayed
in motor skills. The second set contains data of 95 infants, with 7 of them
being later classified as cognitively delayed. The application of the proposed
strategy clearly indicates 2 metabolite ratios and 6 diffusion tensor imaging parameters
as being predictive of motor outcome, as well as 2 metabolite ratios and
2 diffusion tensor imaging parameters as being predictive of cognitive outcome.
Conclusion: The proposed strategy demonstrates its ability to extract the
meaningful variables from the imbalanced clinical datasets. The application of
the strategy provides independent evidence supporting several previous studies
separately suggesting different biomarkers. The application also shows that the
predictor involving several informative variables can exhibit better performance
than single variable predictors.
On Approximation for Multi-Source Domain Adaptation in the Space of Copulas
1Institute for Mathematical Methods in Medicine and Data Based Modeling, Johannes Kepler University Linz, Linz, Austria; 2Software Competence Center Hagenberg, Hagenberg, Austria; 3Johann Radon Institute for Computational and Applied Mathematics, Austrian Academy of Sciences, Linz, Austria
The set of $d$-copulas $(d \geq 2)$, denoted by $\mathcal{C}_d$, is a compact subspace of $(\Xi(\mathbb{I}^d), d_{\infty})$, the space of all continuous functions with domain $\mathbb{I}^d$, where $\mathbb{I}$ is the unit interval and $d_{\infty}(f_1,f_2)=\underset{\mathbf{u} \in \mathbb{I}^d}{\sup}|f_1(\mathbf{u})-f_2(\mathbf{u})|$. A function $C:\mathbb{I}^d\to \mathbb{I}$ is a $d$-copula if, and only if, the following conditions hold:
(i) $C(u_1,\ldots,u_d)=0$ whenever $u_j=0$ for at least one index $j\in\{1,\ldots,d\}$;
(ii) when all the arguments of $C$ are equal to $1$, except possibly the $j$-th one, then
$$C(1,\ldots,1,u_j,1,\ldots,1)=u_j;$$
(iii) $C$ is $d$-increasing, i.e., for every rectangle $]\mathbf{a}, \mathbf{b}] \subseteq \mathbb{I}^d$, $V_C(]\mathbf{a},\mathbf{b}]):=\underset{\mathbf{v} \in \text{ver}(]\mathbf{a},\mathbf{b}])}{\sum}\text{sign}(\mathbf{v})\,C(\mathbf{v}) \geq 0$, where $\text{ver}(]\mathbf{a},\mathbf{b}])$ denotes the set of vertices of $]\mathbf{a},\mathbf{b}]$, $\text{sign}(\mathbf{v})=1$ if $v_j=a_j$ for an even number of indices, and $\text{sign}(\mathbf{v})=-1$ if $v_j=a_j$ for an odd number of indices.
Note that every copula $C\in \mathcal{C}_d$ induces a $d$-fold stochastic measure $\mu_{C}$ on $(\mathbb{I}^d, \mathcal{B}(\mathbb{I})^d)$ defined on the rectangles $R = ]\mathbf{a}, \mathbf{b}]$ contained in $\mathbb{I}^d$, by
$$\mu_{C}(R):=V_{C}(]\mathbf{a}, \mathbf{b}]).$$
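Condition (iii) and the rectangle measure $\mu_C$ can be evaluated numerically from the signed vertex sum. A minimal sketch for the independence copula $\Pi(\mathbf{u}) = \prod_j u_j$, chosen here purely for illustration:

```python
import numpy as np
from itertools import product

def C_indep(u):
    # Independence copula Pi(u) = u_1 * ... * u_d.
    return float(np.prod(u))

def copula_volume(C, a, b):
    # C-volume V_C(]a, b]) as the signed sum of C over the 2^d vertices of
    # the rectangle ]a, b]: sign +1 when v_j = a_j for evenly many indices,
    # sign -1 when v_j = a_j for an odd number of indices.
    d = len(a)
    vol = 0.0
    for choice in product((0, 1), repeat=d):
        v = [b[j] if c else a[j] for j, c in enumerate(choice)]
        n_from_a = d - sum(choice)
        vol += (1.0 if n_from_a % 2 == 0 else -1.0) * C(v)
    return vol
```

For the independence copula the C-volume of a box equals the product of its side lengths, so condition (iii) holds trivially and $\mu_{\Pi}$ is the Lebesgue measure on $\mathbb{I}^d$.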
We will focus on specific copulas whose support is possibly a fractal set and discuss the uniform convergence of empirical copulas induced by orbits of the so-called chaos game (a Markov process induced by transformation matrices $\mathcal{T}$; compare [4]). We aim at learning, i.e., approximating, an unknown function $f$ (see also [5]) from random samples generated by such a chaos game.
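To illustrate how chaos-game orbits induce empirical copulas, the following sketch runs the chaos game for a $2 \times 2$ matrix of cell probabilities. The normalization assumed here (rows and columns each summing to $1/k$, which keeps both marginals uniform) and all function names are conventions of this sketch, not the exact construction of [4]:

```python
import numpy as np

def chaos_game_sample(T, n, burn_in=50, rng=None):
    # Orbit of the chaos game for a k x k matrix T: each step picks cell
    # (i, j) of the unit square with probability T[i, j] and maps the current
    # point affinely into that cell. Entries must be nonnegative and sum to 1;
    # uniform marginals additionally need each row and column to sum to 1/k.
    rng = np.random.default_rng(rng)
    T = np.asarray(T, dtype=float)
    k = T.shape[0]
    cells = [(i, j) for i in range(k) for j in range(k)]
    probs = T.ravel()
    x = np.array([0.5, 0.5])
    pts = np.empty((n, 2))
    for step in range(burn_in + n):
        i, j = cells[rng.choice(len(cells), p=probs)]
        x = (x + np.array([j, i])) / k     # contract into cell (i, j)
        if step >= burn_in:
            pts[step - burn_in] = x
    return pts

def empirical_copula(pts, u, v):
    # Empirical copula: fraction of orbit points below (u, v) coordinatewise.
    return float(np.mean((pts[:, 0] <= u) & (pts[:, 1] <= v)))
```

With $T = \mathrm{diag}(1/2, 1/2)$ the orbit lives on the diagonal of the unit square, and the empirical copula approaches $\min(u,v)$ as the orbit length grows.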
Further details on copulas can be found in the monographs [1,2,3].
In this talk, we will first investigate the problem of learning in a relevant function space for an individual domain with the chaos game representation. Within this framework, we further formulate the problem of domain adaptation with multiple sources [6], where we discuss the method of aggregating the already obtained approximated functions in each domain to derive a function with a small error with respect to the target domain.
Acknowledgement:
This research was carried out under the Austrian COMET program (project S3AI with FFG no. 872172, www.S3AI.at, at SCCH, www.scch.at), which is funded by the Austrian ministries BMK, BMDW, and the province of Upper Austria.
[1] F. Durante, C. Sempi. Principles of copula theory. CRC Press, 2016.
[2] R. B. Nelsen. An introduction to copulas. Springer Series in Statistics. Springer, second edition, 2006.
[3] C. Alsina, M. J. Frank, B. Schweizer. Associative functions: triangular norms and copulas. World Scientific Publishing Co. Pte. Ltd., 2006.
[4] W. Trutschnig, J.F. Sanchez. Copulas with continuous, strictly increasing singular conditional distribution functions. J. Math. Anal. Appl. 410(2): 1014–1027, 2014.
[5] F. Cucker, S. Smale. On the mathematical foundations of learning. Bull. Amer. Math. Soc. (N.S.) 39(1): 1–49, 2002.
[6] Y. Mansour, M. Mohri, A. Rostamizadeh. Domain adaptation with multiple sources. Advances in Neural Information Processing Systems 21, 2008.
Learning segmentation on unlabeled MRI data using labeled CT data
University of Vienna, Austria
The goal of supervised learning is to deduce a classifier from a given labeled data set. In several concrete applications, such as medical imaging, however, one often operates in the setup of domain adaptation.
Here, a classifier is learnt from a labeled source data set and generalised to an unlabeled target data set, with the two data sets moreover belonging to different domains (e.g., different patients, different machine setups).
In our work, we use the SIFA framework [1] as a basis for medical image segmentation under cross-modality adaptation between MRI and CT images. We have combined the SIFA algorithm with linear aggregation, as well as importance-weighted validation of the trained models, to remove the arbitrariness in the choice of parameters.
This presentation shall give an overview of domain adaptation and show the latest version of our experiments.
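The importance-weighted validation step can be illustrated generically: source validation losses are reweighted by an estimated density ratio so that their weighted average approximates the target-domain risk under covariate shift. The probe probabilities and function names below are hypothetical, not the SIFA pipeline's API:

```python
import numpy as np

def importance_weights(probe_probs):
    # Density-ratio weights w(x) = P(target | x) / P(source | x), read off a
    # probabilistic domain classifier evaluated on source validation points.
    p = np.asarray(probe_probs, dtype=float)
    return p / np.clip(1.0 - p, 1e-6, None)

def iw_validation_score(losses, weights):
    # Importance-weighted average of per-sample source validation losses:
    # under covariate shift this estimates the target-domain risk, and the
    # candidate model with the smallest score is selected.
    w = np.asarray(weights, dtype=float)
    return float((w / w.sum() * np.asarray(losses, dtype=float)).sum())
```

Candidate models (e.g., different aggregation coefficients) are then ranked by this score instead of the unweighted source loss.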
[1] C. Chen, Q. Dou, H. Chen, J. Qin, P. Heng. Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation. IEEE Transactions on Medical Imaging 39: 2494-2505, 2020.
Parameter choice in distance-regularized domain adaptation
Austrian Academy of Sciences, Austria
We address the open algorithm-design problem of choosing a justified regularization parameter in unsupervised domain adaptation, i.e., the problem of learning from unlabeled data using labeled data from a different distribution. Our approach starts from the observation that the widely used method of minimizing the source error, penalized by a distance measure between source and target feature representations, shares characteristics with penalized regularization methods. This observation allows us to extend Lepskii's balancing principle, and its related error bound, to unsupervised domain adaptation. This talk is partially based on [1].
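A generic Lepskii-type balancing rule, in the penalized-regularization form the abstract builds on, can be sketched as follows. The candidate ordering (decreasing variance bounds), the constant $c = 4$, and all names are illustrative assumptions of this sketch, not the exact rule of [1]:

```python
import numpy as np

def lepskii_balancing(estimates, variance_bounds, c=4.0):
    # Lepskii-type balancing: `estimates[i]` is the model obtained with the
    # i-th regularization parameter, ordered so that the known variance bound
    # `variance_bounds[i]` decreases in i (stronger regularization). Select
    # the largest i whose estimate stays within c * variance_bounds[j] of
    # every less-regularized estimate j < i.
    best = 0
    for i in range(1, len(estimates)):
        if all(np.linalg.norm(estimates[i] - estimates[j]) <= c * variance_bounds[j]
               for j in range(i)):
            best = i
    return best
```

The selected index balances the unknown approximation error against the known noise-propagation bound without ever evaluating the former.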
[1] W. Zellinger, N. Shepeleva, M.-C. Dinu, H. Eghbal-zadeh, H. D. Nguyen, B. Nessler, S. V. Pereverzyev, B. Moser. The balancing principle for parameter choice in distance-regularized domain adaptation. Advances in Neural Information Processing Systems (NeurIPS) 34, 2021.