Based on work by Cohen, Dahmen and DeVore [1] and Bourrier et al. [2], we propose a framework that highlights the importance of knowing the measurement model $F$ and model class $\mathcal{M}_1$ when solving ill-posed (non-)linear inverse problems, by introducing the concept of kernel size. Previous work has assumed that the problem is injective on the model class $\mathcal{M}_1$; we remove this assumption. The framework is therefore applicable in Deep Learning (DL) based settings where $\mathcal{M}_1$ can be an arbitrary data set.
$\textbf{Setting and initial main result}$
Let $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ be non-empty sets,
$\mathcal{M}_1 \subset \mathcal{X}$ be the $\textit{model class}$, $\mathcal{E}\subset \mathcal{Z}$ be the $\textit{noise class}$ and $F \colon \mathcal{M}_1\times \mathcal{E} \to \mathcal{Y}$ be the $\textit{forward map}$. An inverse problem has the form:
$$
\text{Given noisy measurements } y = F(x,e) \text{ of } x \in \mathcal{M}_1\text{ and } e \in \mathcal{E}, \text{ recover } x.\quad (1)
$$
Here $e$ represents the model noise and $x$ the signal (a function or vector) we wish to recover or approximate. This setting also includes linear problems with additive noise, where $\mathcal{Y}=\mathcal{Z}$, $A \colon \mathcal{X} \to \mathcal{Y}$ is a linear map between vector spaces and the forward map is
$$
F(x,e) = Ax+e. \quad (2)
$$
To measure accuracy we equip $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ with metrics $d_{\mathcal{X}}$, $d_{\mathcal{Y}}$ and $d_{\mathcal{Z}}$ such that the induced topology is second countable (it admits a countable base). We assume that the metrics $d_\mathcal{X}$ and $d_\mathcal{Z}$ satisfy the Heine-Borel property, i.e., all closed and bounded sets are compact, and that for every attainable measurement $y \in \mathcal{M}_2^{\mathcal{E}} := F(\mathcal{M}_1 \times \mathcal{E})$ the $\textit{feasible set}$
$$
\pi_1(F^{-1}(y)) = \{x \in \mathcal{M}_1: \exists e\in\mathcal{E} \text{ s.t. } F(x,e)=y\}
$$
is compact. Under these assumptions we define the optimality constant as the smallest error any reconstruction map can achieve, and study both the case of worst-case noise (critical when there is risk of adversarial attacks) and average random noise (more typical in practical applications). We prove that the optimality constant is bounded from below and from above (sharply) by the $\textit{kernel size}$, which is a quantity intrinsic to the inverse problem. For simplicity, we restrict to the best worst-case reconstruction error here. As we consider set-valued reconstruction maps $\varphi\colon \mathcal{M}_2^{\mathcal{E}} \rightrightarrows \mathcal{X}$, we use the Hausdorff distance $d_{\mathcal{X}}^H(\cdot, \cdot)$.
$\textbf{Definition: Optimality constant under worst-case noise}$
The $\textit{optimality constant}$ of $(1)$ is
$$
c_{\mathrm{opt}}(F,\mathcal{M}_1, \mathcal{E}) = \inf_{\varphi\colon \mathcal{M}_2^{\mathcal{E}} \rightrightarrows \mathcal{X}} \sup_{x\in \mathcal{M}_1} \sup_{e \in \mathcal{E}} \ d_{\mathcal{X}}^H(x, \varphi(F(x,e))).
$$
A mapping $\varphi\colon \mathcal{M}_2^{\mathcal{E}} \rightrightarrows \mathcal{X}$ that attains such an infimum is called an $\textit{optimal map}$.
$\textbf{Definition: Kernel size with worst-case noise}$
The kernel size of the problem $(1)$ is
$$
\operatorname{kersize}(F,\mathcal{M}_1, \mathcal{E}) = \sup_{\substack{ (x,e),(x',e')\in\mathcal{M}_1\times\mathcal{E} \text{ s.t. }\\ F(x,e) = F(x',e')} } d_{\mathcal{X}}(x,x').
$$
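For finite model and noise classes, the kernel size can be evaluated by brute force directly from its definition. The following sketch (our illustrative toy example, not from the framework above: a sign-blind forward map $F(x,e)=|x|+e$ on scalar signals) enumerates all pairs of signal-noise combinations with colliding measurements:

```python
from itertools import product

def kersize(F, M1, E, dX):
    """Kernel size: the largest dX-distance between two signals whose
    (signal, noise) pairs produce identical measurements."""
    best = 0.0
    for (x, e), (xp, ep) in product(product(M1, E), repeat=2):
        if F(x, e) == F(xp, ep):
            best = max(best, dX(x, xp))
    return best

# Toy problem: F loses the sign of x, so -1 and 1 collide on y = 1.
M1 = [-1.0, 0.0, 1.0]
E = [0.0]
F = lambda x, e: abs(x) + e
dX = lambda a, b: abs(a - b)
print(kersize(F, M1, E, dX))  # -> 2.0, since F(-1, 0) == F(1, 0)
```

Exhaustive enumeration is only viable for small discrete classes; for continuous model classes the kernel size is a supremum that must be analysed rather than enumerated.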
$\textbf{Theorem: Worst case optimality bounds}$
Under the stated assumptions, the following holds.
$(i)$ We have that
$$
\operatorname{kersize}(F,\mathcal{M}_1, \mathcal{E})/2 \leq c_{\mathrm{opt}}(F,\mathcal{M}_1, \mathcal{E}) \leq \operatorname{kersize}(F,\mathcal{M}_1, \mathcal{E}). \quad (3)
$$
$(ii)$ Moreover, the map
$$
\Psi(y) = \operatorname*{argmin}_{z \in \mathcal{X}}\sup_{(x,e) \in F^{-1}(y)} d_{\mathcal{X}}(x,z) = \operatorname*{argmin}_{z \in \mathcal{X}} d_{\mathcal{X}}^H(z, \pi_1(F^{-1}(y))),
$$
has non-empty, compact values and it is an optimal map.
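In the finite setting, the optimal map $\Psi$ amounts to finding a Chebyshev centre of the feasible set. A minimal sketch, assuming finite classes and a finite candidate grid standing in for $\mathcal{X}$ (names and the toy forward map are illustrative, not from the text):

```python
def optimal_map(y, F, M1, E, X, dX):
    """Chebyshev-centre reconstruction: among candidates z in X, keep those
    minimising the worst-case distance to the feasible set pi_1(F^{-1}(y))."""
    feasible = [x for x in M1 for e in E if F(x, e) == y]
    radius = {z: max(dX(x, z) for x in feasible) for z in X}
    r_min = min(radius.values())
    return {z for z, r in radius.items() if r == r_min}

# Same toy problem: a sign-blind forward map, so y = 1 has feasible set {-1, 1}.
M1 = [-1.0, 0.0, 1.0]
E = [0.0]
X = [k / 10 for k in range(-20, 21)]  # grid of candidate reconstructions
F = lambda x, e: abs(x) + e
dX = lambda a, b: abs(a - b)
print(optimal_map(1.0, F, M1, E, X, dX))  # -> {0.0}, the midpoint of {-1, 1}
```

The returned midpoint incurs worst-case error $1$, half the distance between the two feasible signals, matching the lower bound in $(3)$.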
This illustrates a fundamental limit for the inverse problem $(1)$. Indeed, one would hope to find a solution for $(1)$ whose error is as small as possible. The lower bound in $(3)$ shows that there is a constant intrinsic to the problem -- the kernel size -- such that no reconstruction error can be made smaller than this constant for all possible choices of $x \in \mathcal{M}_1$. Note that the above theorem is extended to the average reconstruction error in our work.
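To make the lower bound in $(3)$ concrete, consider the following toy instance (ours, purely for illustration). Take $\mathcal{X}=\mathcal{Y}=\mathbb{R}$ with the usual metric, $\mathcal{M}_1=\{-1,1\}$, $\mathcal{E}=\{0\}$ and $F(x,0)=|x|$. Then $F(-1,0)=F(1,0)=1$, so $\operatorname{kersize}(F,\mathcal{M}_1,\mathcal{E})=2$, while the reconstruction map $\varphi(1)=\{0\}$ attains the worst-case error
$$
\max_{x\in\{-1,1\}} d_{\mathcal{X}}^H(x,\{0\}) = 1 = \operatorname{kersize}(F,\mathcal{M}_1,\mathcal{E})/2,
$$
so the lower bound in $(3)$ is sharp for this instance.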
$\textbf{Background and related work}$
Linear inverse problems $(2)$ arise in image reconstruction for scientific, industrial and medical applications [3-7]. Traditional image reconstruction methods are model-based, and they have also been studied in a Bayesian setting [8]. Less studied are non-linear inverse problems, which appear in geophysics [9,10] and in inverse scattering problems [11,12]. Accuracy and error bounds have been studied in [13]. Today, DL-based methods often obtain higher accuracy than traditional methods; overviews are given in [14-17]. The key point of DL-based methods for solving inverse problems in imaging is that, given enough data, a neural network can be trained to approximate a decoder that solves $(2)$.
[1] A. Cohen, W. Dahmen, R. DeVore. Compressed sensing and best k-term approximation. Journal of the American Mathematical Society, 22(1):211–231, 2009.
[2] A. Bourrier, M. E. Davies, T. Peleg, P. Perez, R. Gribonval. Fundamental performance limits for ideal decoders in high-dimensional linear inverse problems. IEEE Transactions on Information Theory, 60(12):7928–7946, 2014.
[3] C. A. Bouman. Foundations of Computational Imaging: A Model-Based Approach. SIAM, Philadelphia, PA, 2022.
[4] H. H. Barrett, K. J. Myers. Foundations of image science. John Wiley & Sons, 2013.
[5] C. L. Epstein. Introduction to the mathematics of medical imaging. SIAM, 2007.
[6] P. Kuchment. The Radon transform and medical imaging. SIAM, 2013.
[7] F. Natterer, F. Wübbeling. Mathematical methods in image reconstruction. SIAM, 2001.
[8] A. M. Stuart. Inverse problems: A Bayesian perspective. Acta Numerica, 19:451–559, 2010.
[9] C. G. Farquharson, D. W. Oldenburg. Non-linear inversion using general measures of data misfit and model structure. Geophysical Journal International, 134(1):213–227, 1998.
[10] C. G. Farquharson, D. W. Oldenburg. A comparison of automatic techniques for estimating the regularization parameter in non-linear inverse problems. Geophysical Journal International, 156(3):411–425, 2004.
[11] J. L. Mueller, S. Siltanen. Linear and nonlinear inverse problems with practical applications. SIAM, 2012.
[12] M. T. Bevacqua, L. Crocco, L. Di Donato, T. Isernia. An algebraic solution method for nonlinear inverse scattering. IEEE Transactions on Antennas and Propagation, 63(2):601–610, 2014.
[13] N. Keriven, R. Gribonval. Instance optimal decoding and the restricted isometry property. In Journal of Physics: Conference Series, page 012002. IOP Publishing, 2018.
[14] M. T. McCann, M. Unser. Biomedical image reconstruction: From the foundations to deep neural networks. Foundations and Trends in Signal Processing, 13(3):283–359, 2019.
[15] S. Arridge, P. Maass, O. Öktem, C.-B. Schönlieb. Solving inverse problems using data-driven models. Acta Numerica, 28:1–174, 2019.
[16] G. Wang, J. C. Ye, K. Mueller, J. A. Fessler. Image reconstruction is a new frontier of machine learning. IEEE Transactions on Medical Imaging, 37(6):1289–1296, 2018.
[17] H. Ben Yedder, B. Cardoen, G. Hamarneh. Deep learning for biomedical image reconstruction: A survey. Artificial Intelligence Review, 54(1):215–251, 2021.