The present application claims priority to European Patent Application No. 18191494.6, filed Aug. 29, 2018, the disclosure of which is hereby incorporated by reference herein in its entirety.
The present invention pertains to a method of sampling of signals based on specially learned sampling operators and decoders as well as to a device implementing such a sampling method, both the method and the device being adapted to be used for reconstruction of a signal of interest from a sampled signal and/or for decision making based on the sampled signal.
In general, the present invention concerns the problem of recovery of a signal of interest x based on an observation signal y, which can be seen as a reconstruction problem. Besides, there are several other problems related to this context that might include the classification, recognition, identification or authentication of said signal of interest x based on the observation signal y. In the following, these latter problems will be referred to as classification problems.
In this context, it is known in the prior art to use data acquisition systems that are based on signal sampling in some transform domain Ψ∈ℂ^(N×N) according to a model that can be described by equation
$$y = Q(P_{\Omega}\Psi x + z), \tag{1}$$
where y∈ℂ^n denotes the observation signal, x∈ℂ^N or x∈ℝ^N denotes the signal of interest, z∈ℂ^n denotes the observation noise, and n and N denote the dimensionalities of the observed and original signals, respectively. The sets ℂ and ℝ denote the sets of complex and real numbers, respectively. PΩ: ℂ^N→ℂ^n denotes a sampling operator and Q denotes a quantizing operator that maps ℂ^n onto a set of quantized signals of the same dimensionality n. It is often assumed that Ψ forms an orthogonal basis that might represent Fourier, Discrete Cosine, Hadamard, wavelet, etc. matrices. However, in certain cases, the basis vectors of the transform Ψ might be non-orthogonal. Therefore, the following will proceed with a consideration of the general case of a non-orthogonal basis.
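The acquisition model of equation (1) can be sketched numerically as follows. This is only an illustration under stated assumptions: Ψ is taken to be a unitary DFT matrix, PΩ is realized by indexing with an index set `omega`, and Q is a uniform quantizer applied separately to real and imaginary parts. The function names `unitary_dft` and `sample`, and the parameters `noise_std` and `step`, are hypothetical and do not appear in the present disclosure.

```python
import numpy as np

def unitary_dft(N):
    """Unitary DFT matrix, used here as an example transform Psi (assumption)."""
    return np.fft.fft(np.eye(N)) / np.sqrt(N)

def sample(x, Psi, omega, noise_std=0.0, step=None, rng=None):
    """Return y = Q(P_Omega Psi x + z) for the index set `omega`."""
    rng = np.random.default_rng() if rng is None else rng
    coeffs = Psi @ x                      # transform-domain representation
    y = coeffs[omega]                     # P_Omega keeps only the entries in omega
    if noise_std > 0:                     # additive complex observation noise z
        z = noise_std * (rng.standard_normal(len(omega))
                         + 1j * rng.standard_normal(len(omega)))
        y = y + z
    if step is not None:                  # assumed uniform quantizer Q
        y = step * (np.round(y.real / step) + 1j * np.round(y.imag / step))
    return y

# Example: keep n = 3 of N = 8 transform coefficients.
N = 8
Psi = unitary_dft(N)
x = np.arange(N, dtype=float)
omega = np.array([0, 1, 7])
y = sample(x, Psi, omega)
```

Choosing `omega` is exactly the design freedom discussed in the following paragraphs, since Ψ is usually fixed by the physics of the measurement.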
The domains of application of such sampling methods, on the one hand, relate to image reconstruction and cover, but are not limited to, medical imaging such as magnetic resonance imaging (MRI), computerized tomography (CT) and ultrasound, non-destructive testing, remote and astronomy observations based on sparse antennas and antenna arrays, radar imaging, as well as CMOS and CCD sensor-based imaging including colour imaging based on mosaicking principles, Fourier optics, or the like. On the other hand, the domains of application also include security applications like, for example, recognition, identification or authentication of objects and/or of persons. In many applications, the selection of the basis is fixed by the nature of measurements. However, the sampling operator PΩ can be chosen and optimized so as to take into account some technical constraints. Additionally, the measured signal might optionally be quantized by using the above mentioned quantizing operator Q.
In most cases, on the one hand, the sampling of a band-limited signal x should follow the Shannon-Nyquist principles to ensure preservation of the information and uniqueness of the reconstruction. However, if x possesses some structure or sparsity, the signal of interest x can be reconstructed from the observation signal y even when n<N using special decoding algorithms. On the other hand, the above mentioned classification problems need sufficient statistics for the classifier, but do not require complete signal reconstruction. Instead, one is interested in a sampling that ensures the reliable classification or distinguishability of different classes of signals. Since the above mentioned reconstruction problem is stricter in terms of requirements, the following will proceed with its consideration and will indicate the particularities of said classification problems only when needed.
In the most popular case, the sampling with a random matrix A, which replaces the operators PΩΨ in equation (1), is implemented by a compressive sampling as considered by S. Foucart and H. Rauhut in the article “A mathematical introduction to compressive sensing”, Birkhäuser Basel, 2013, vol. 1, no. 3. However, technological advances, in particular in terms of computational complexity and memory storage of currently available hardware, motivate the use of sampling in the form of equation (1), which can be considered as a structured counterpart of compressive sampling. At the same time, the well developed field of compressive sampling suggests the necessary amount of observations for sampling test signals with special sparsity properties under properly generated matrices A. In the general case, it is assumed that the signal of interest x is sparse with some sparsity S or can be sparsely represented in some transform domain.
From a practical perspective, it is extremely attractive to acquire signals with the smallest possible number of samples. Practically, this means that the sampling operator should be adapted to the data, which can be achieved on the fly using some generic information about the properties of the signals in the transform domain Ψ or by using special training or learning procedures.
The optimal on-the-fly construction of a sampling operator with an iterative reconstruction was considered in the past, amongst others, by I. Prudyus et al. in the article “Robust image restoration matched with adaptive aperture formation in radar imaging systems with sparse antenna arrays” published in European Signal Processing Conference EUSIPCO 1998, 9 Sep. 1998, pp. 1-4 and in the article “Adaptive aperture formation matched with radiometry image spatial spectrum” by I. Prudyus et al., published in IEEE International Microwave and Radar Conference, Krakow, Poland, 1998. In these disclosures, no training on external training datasets was assumed, whereas the process of adaptation of the sampling operator was based on the fact that many natural images have dominating energy frequency components located along some spatial frequency directions. A small number of samples S in the sampling operator PΩ was used to estimate these directions, mainly in the low frequency part of the spectrum, and the remaining budget of sampling components was adapted accordingly to sample along the dominating directions that have most of the energy and information. In this case, the sampling operator was adapted to the properties of each image. One can consider this approach as an analogue of the so-called S-best selection with the given number of sampling components to be S.
The learning approach was considered in the past, amongst others, by L. Baldassarre et al. in the article “Learning-based compressive subsampling,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 4, pp. 809-822, 2016, and by V. Cevher et al. in the article “Learning-based subsampling” as well as in US 2017/0109650, filed on Oct. 19, 2015. These disclosures extend the classical formulation of compressive sensing using the sub-sampling structured matrices of the form of equation (1) with an orthonormal operator Ψ and a sampling operator PΩ. The main idea behind the proposed extension consists in learning the sampling operator PΩ on a training set of images to minimize the average reconstruction error as a cost function. It should be pointed out that due to physical imaging constraints there is no freedom in the selection of the operator Ψ, such that the only possible adaptation to data is via the operator PΩ. The solution to the above adaptation problem thus leads to the natural conclusion that the highest sampling rate of the operator PΩ should be concentrated in the region of largest magnitude Fourier transform components, similar to the conclusion set out in the above mentioned article “Robust image restoration matched with adaptive aperture formation in radar imaging systems with sparse antenna arrays” by I. Prudyus et al. In turn, the authors suggest that the optimal sampling geometry, i.e. the operator PΩ, computed on average for a set of natural images should be in the region of low frequencies possessing the highest magnitude Fourier transform components, i.e. the highest information content. The difference between the proposals of I. Prudyus et al. set out in the above mentioned article “Robust image restoration matched with adaptive aperture formation in radar imaging systems with sparse antenna arrays” and of V. Cevher et al. set out in the above mentioned article “Learning-based subsampling” as well as in US 2017/0109650 therefore mainly consists in the fact that the sampling operator of the former method is adapted on the fly for each image by use of the S-best selection, whereas the latter method obtains an adapted sampling operator on average by use of a training set. Additionally, in contrast to compressive sensing based on non-linear reconstruction algorithms, V. Cevher et al. in said article “Learning-based subsampling” as well as in US 2017/0109650 consider a linear decoder of form x̂=Ψ*PΩ^T y, where * denotes the complex conjugate, and also mention the possibility of using non-linear decoders like basis pursuit (BP) allowing representation of the signal of interest x in a sparse form like u=ϕx, where ϕ is a sparsifying operator, mainly orthonormal, allowing reverse reconstruction as x̂=ϕ*u.
It should be pointed out here that the problem of optimal training of the sampling operator PΩ is generally solved as an optimization problem according to equation
$$\hat{\Omega} = \arg\min_{\Omega:\,|\Omega|\le s} \tfrac{1}{2}\sum_{i=1}^{M}\|x_i - \hat{x}_{\Omega}^{i}\|_2^2, \tag{2}$$
where x̂_Ω^i denotes the i-th reconstructed sample for i=1, …, M from a training set {x_i}, i=1, …, M, for a particular sampling operator Ω. V. Cevher et al. in said article “Learning-based subsampling” as well as in US 2017/0109650 also mention that other cost functions can be used besides the ℓ2-metric demonstrated in these disclosures. The same authors also consider average- and worst-case strategies.
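For the special case of a unitary Ψ combined with the linear decoder x̂=Ψ*PΩ^T y, the reconstruction error of each training signal equals the energy of its unsampled transform coefficients, so equation (2) is minimized by keeping the s indices with the largest total energy over the training set. The following sketch illustrates this prior-art selection rule under exactly these assumptions; the function names `learn_sampling_set` and `reconstruction_cost` are hypothetical.

```python
import numpy as np

def learn_sampling_set(X, Psi, s):
    """Pick the s transform-domain indices with the largest total energy.

    X   : (M, N) array, one training signal per row
    Psi : (N, N) unitary transform (assumption)
    """
    coeffs = X @ Psi.T                            # row i holds Psi @ x_i
    energy = np.sum(np.abs(coeffs) ** 2, axis=0)  # energy per index over the set
    return np.sort(np.argsort(energy)[::-1][:s])

def reconstruction_cost(X, Psi, omega):
    """Cost of equation (2) for the zero-filling linear decoder."""
    coeffs = X @ Psi.T
    kept = np.zeros_like(coeffs)
    kept[:, omega] = coeffs[:, omega]             # P_Omega^T P_Omega Psi x_i
    X_hat = kept @ Psi.conj()                     # row i holds Psi^* applied to kept_i
    return 0.5 * np.sum(np.abs(X - X_hat) ** 2)
```

With a different decoder this simple energy ranking is no longer guaranteed to be optimal, which is the point raised in the discussion of the prior art that follows.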
Although the above mentioned learned structured sampling operators have some obvious advantages over previously used random sampling strategies, these procedures inherently comprise several problems. First, the training set {x_i}, i=1, …, M, is solely used to train the sampling operator PΩ, whereas the corresponding linear or non-linear decoders are “hand-crafted” in nature and, therefore, are not adapted to the data of interest. Second, the manner of obtaining an adapted sampling operator in the above mentioned prior art approaches also has a serious impact on both the optimal sampling strategy in the operator PΩ and the decoding algorithm. For example, it is worth mentioning that said article “Learning-based subsampling” as well as US 2017/0109650 conclude, on the basis of the sampling operators proposed therein, that one should aim to sample only the largest magnitude components in the transform domain to ensure the optimal reconstruction of test signals, whilst this is questionable if different operators are used. In particular, modification of the structure of the decoder might lead to different and potentially better sampling principles.
The solutions according to prior art therefore present several inconveniences related to the manner of obtaining an adapted sampling operator as well as to the hand-crafted nature of corresponding decoding algorithms.
Moreover, all of the prior art documents “Robust image restoration matched with adaptive aperture formation in radar imaging systems with sparse antenna arrays” by I. Prudyus et al., “Learning-based compressive subsampling” by L. Baldassarre et al., and “Learning-based subsampling” as well as US 2017/0109650 by V. Cevher et al. only consider signal sampling for the above mentioned reconstruction problem, but do not address signal sampling for the above mentioned classification problems.
In this context, one might finally add that a number of specific techniques used at different stages in signal modeling and training are known in prior art. For example, the technical report “Fast inference in sparse coding algorithms with applications to object recognition” by K. Kavukcuoglu et al., Computational and Biological Learning Lab, Courant Institute, NYU, Tech Report CBLL-TR-2008-12-01, discloses a model training method known as predictive sparse decomposition. The book “Deep Learning” by I. Goodfellow et al., The MIT Press, 2016 also, at least partially, concerns this topic. Furthermore, concerning model training, it is known in prior art to use the so-called ADMM and ADAM techniques which are a sort of standard minimizers or optimizers. For example, the ADMM technique is disclosed by S. Boyd et al. in the article “Distributed optimization and statistical learning via the alternating direction method of multipliers”, Foundations and Trends in Machine Learning, 3(1), pp. 1-122, 2011, and the ADAM technique is disclosed by D. Kingma and J. Ba in the article “Adam: A method for stochastic optimization”, CoRR, 2014. Prior art also discloses techniques like the encoder-decoder training via the Nash equilibrium which is, for example, discussed by A. Danielyan et al. in the article “BM3D frames and variational image deblurring,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1715-1728, April 2012. Furthermore, K. Gregor and Y. LeCun disclose in the article “Learning fast approximations of sparse coding” published in Proceedings of the 27th International Conference on Machine Learning (ICML 2010) by J. Fürnkranz and T. Joachims (Eds.), pp. 399-406, 2010, the so-called LISTA implementation. In general, techniques of deep learning such as suggested by D. Ulyanov et al. in the article “Deep image prior”, arXiv, preprint arXiv:1711.10925, 2017, are of course also known in prior art.
However, these disclosures mostly only present specific techniques or mathematical models which are adapted to be used, in isolated manner, at a particular stage in the context of signal processing or classification, but do not by themselves form a signal sampling method that would make it possible to overcome the above mentioned problems.
In view of the above presentation of prior art approaches, it is an object of the present invention to overcome the above mentioned difficulties and to realize a method of signal sampling which efficiently uses training signals in an optimal manner for creating and adapting sampling operators to reconstruct images depending on the type of image and/or application, i.e. in the context of the above mentioned reconstruction problem. Within the same context, it is a further object of the present invention to avoid use of decoders that are essentially “hand-crafted” in nature and to provide improved decoders to be used in combination with the adapted sampling operators of the present invention. Moreover, it is an object of the present invention to exploit training signals to their full potential. It is another object of the present invention to also address signal sampling for the above mentioned classification problems.
To this effect, the present invention proposes a method of signal sampling with learnable priors which is characterized by the features enumerated in claim 1 and which makes it possible to achieve the objectives identified above.
In particular, the method according to the present invention distinguishes by the fact that the method comprises, at the training stage, the step of
According to this first aspect of the present invention, a joint training of the sampling operator and of the decoder is realized, in contrast to prior art sampling methods. This makes it possible to avoid use of “hand-crafted” decoders, to provide improved decoders most adapted to the corresponding sampling operators, as well as to exploit training signals to their full potential. In particular, the training data {x_i}, i=1, …, M, is not solely used for training the sampling operator PΩ, as is done in the above cited prior art disclosures which keep the decoder gθg(.) “hand-crafted” and generally data-independent, but is used for a joint training of the sampling operator PΩ and of the decoder gθg(.). Additionally, the sampling paradigm that emerges from this approach makes it possible to combine the best from the two worlds of optimal sampling and data compression, which seems to be extremely important in the era of Big Data when both the dimensionality and volume of data are greatly increasing.
Moreover, an embodiment of the method is distinguished by the fact that the cost function used in the cost minimization step for determining the set of sampling parameters Ω and of decoding parameters θg during the joint training of the sampling operator PΩ and of the decoder gθg(.) is chosen depending on the targeted application. The latter may, for example, consist in reconstruction, classification, recognition, identification and/or authentication.
According to this second aspect of the present invention, the proposed method makes it possible to make a decision about a class of the signal of interest in recognition and identification applications. In fact, in many applications it is important to produce not only an estimate x̂Ω of the signal of interest x by the decoder gθg(.). On the contrary, many security applications require distinguishing a genuine test signal representing some object from a fake one in order to establish its authenticity, searching for similar signals in big data sets, etc. Without loss of generality, the decoder gθg(.) according to the present invention will produce in all these cases a corresponding decision in a form described by equation
$$\hat{m} = g_{\theta_g}(y), \tag{3}$$
where m∈{1, …, C} with C being the number of classes that can also be encoded into some class label l(m)∈Λ^d, with d denoting the dimensionality of the label, whereas the sampling operator PΩ is optimized to ensure the best performance according to the chosen performance criteria such as, for example, minimization of overall error probability Pe.
Furthermore, another embodiment of the method is distinguished by the fact that the cost minimization step for determining the set of sampling parameters Ω and of decoding parameters θg during the joint training of the sampling operator PΩ and of the decoder gθg(.) is implemented by an optimization procedure iterated in alternating directions until convergence, the optimization procedure using in a particularly preferred embodiment signal regularization priors Ωx(x) and regularization parameter λx.
According to this third aspect of the present invention, the training procedure of the sampling operator PΩ may be improved by taking into account in more precise manner as compared to prior art the predetermined properties of specific types of signals, for example of the kind of images, to be sampled and reconstructed, respectively classified.
The present invention also concerns computer program means stored in a computer readable medium adapted to implement this method as well as a corresponding device for signal sampling carrying out the above proposed methods.
Other features and advantages of the present invention are mentioned in the dependent claims as well as in the description disclosing in the following, with reference to the figures, the invention in more detail.
The attached drawings exemplarily and schematically illustrate the principles as well as several embodiments of the present invention.
In the following, the invention shall be described in detail with reference to the above mentioned figures.
The present invention relates to a method of sampling of signals based on especially learned sampling operators and decoders as well as to a device implementing such a sampling method. As already mentioned before, the following description, in general, will concentrate on the consideration of the method according to the present invention when used for the above mentioned reconstruction problem, i.e. for image reconstruction and decoding, and will only highlight, respectively exemplify, differences of the method when used for the above mentioned classification problems in the course of the description. Furthermore, as usual in the domain of signal, image and video processing, the term “signal” is considered throughout the description to cover both analog and digital data in one or more dimensions.
According to the present invention, the sampling operator is trained in such a way as to ensure the most accurate and discriminative estimation of signal model parameters that in turn lead to accurate and efficient decoding for the reconstruction or classification. Accordingly, the method according to the present invention comprises two stages: A first stage includes joint learning of a sampling operator PΩ and of a generalized decoder gθg(.), which has a set of parameters θg, the first stage being referred to as learning or training stage. A second stage includes testing comprising reconstruction of a signal of interest x from an observed signal y and/or other tasks such as detection, classification, recognition, identification and/or authentication using the previously learned, i.e. trained, sampling operator PΩ̂ and decoder gθ̂g, the second stage being referred to as sampling or testing stage. In the context of identification and authentication, as part of a general recognition problem, the training stage is also called enrolment and the testing stage is referred to as verification stage.
More specifically, the training procedure generally illustrated in
The present invention proposes, in particular in the context of the above mentioned reconstruction problem, two approaches for the joint training stage of the sampling operator and of the decoder as well as for the testing/sampling stage schematically shown in
According to the inverse problem solution approach to reconstruction, the joint training of the sampling operator and of the decoder comprises two steps. During a first step, the set of sampling parameters Ω is fixed and one tries to find the estimate x̂ (100) of the signal of interest and the decoding parameters θg by solving an optimization problem using hand-crafted or learned properties of the signal of interest x from the set of training signals {x_i}, i=1, …, M. The optimization is described by equation
$$\hat{x}_{\Omega} = \arg\min_{x} \tfrac{1}{2}\|y - P_{\Omega}\Psi x\|_2^2 + \lambda_x \Omega_x(x), \tag{4}$$
where Ωx(x) is a signal regularization prior linked to the properties and priors of the signals to be sampled and reconstructed, λx is a regularization parameter and ∥.∥2 denotes the ℓ2-norm. The regularization prior Ωx(x) and the regularization parameter λx are not used in prior art methods and make it possible to take into account, in an improved manner, the predetermined properties of specific types of signals, i.e. of the kind of images, to be sampled and reconstructed by introducing corresponding priors which are learned during the training stage of the method according to the present invention. In fact, to illustrate the utility of these regularization priors and parameters in a practical manner, it is clear that medical images, e.g. images of human kidney obtained by MRI, have different, but predetermined, properties compared to images obtained by astronomy observations or images of a given kind of objects obtained by CMOS/CCD cameras like simple facial portraits or the like. The predetermined properties of the corresponding images are reflected in the method according to the present invention in different, previously trained regularization priors Ωx(x) and parameters λx for each type of images.
Different strategies for the design of the signal priors Ωx(x) will be considered here below. The resulting estimate x̂ (100) of the signal of interest can be expressed by equation
$$\hat{x}_{\Omega} = g_{\hat{\theta}_g}(y, \Omega), \tag{5}$$
where gθ̂g(.) is a decoder obtained from equation (1) and θ̂g denotes the particular set of decoding parameters. The set of sampling parameters Ω is used in the decoder gθ̂g(.) as an argument to signify the fact that the decoder depends on the set of sampling parameters Ω.
During a second step, using equation (2) and the set of training signals {x_i}, i=1, …, M, consisting of M training samples, one tries to optimize the set of sampling parameters Ω according to equation
$$\hat{\Omega} = \arg\min_{\Omega} \sum_{i=1}^{M} \|x_i - \hat{x}_{\Omega}^{i}\|_2^2 + \lambda_{\Omega}\Omega_{\Omega}(\Omega), \tag{6}$$
where ΩΩ(.) is a prior on desirable properties of the set of sampling parameters Ω combining the geometry, the number of samples, etc.
The optimization procedure iterates the first and second steps until convergence. This procedure is similar to an alternating direction minimization.
After joint training of the sampling operator and of the decoder, the testing/sampling stage of the inverse problem solution approach also has two steps during which the learned sampling operator and decoder are applied to real data, i.e. to a signal, which was not seen before, but which is assumed to follow the same statistics as the training data.
During a first step, by using a given set of particular sampling parameters Ω̂, one tries to produce the observation signal y by use of equation
$$y = P_{\hat{\Omega}}\Psi x. \tag{7}$$
During a second step, by using a given observation signal y, the decoder (2) produces an estimate x̂ (100) of the signal of interest using equation
$$\hat{x} = g_{\hat{\theta}_g}(y, \hat{\Omega}). \tag{8}$$
According to the regression approach to the reconstruction problem, the joint training of the sampling operator and of the decoder may also be formulated as an optimization problem which, in this case, may be described by equation
(Ω̂, θ̂g)=argminΩ,θ
where the decoder is assumed to be factorized as a deep structure of a form described by equation g_θg(x)=σ_D^g(W_D^g … σ_1^g(W_1^g x)), with θg=(W_1^g, …, W_D^g) being parameters of a deep network, whilst σ_k^g(.) stands for point-wise non-linearities, k=1, …, D, with D denoting the number of layers; ΩΩ(Ω) denotes the constraints on the geometry and properties of the sampling operator and Ωθg(θg) defines constraints on decoder parameters; γ1 and γ2 are regularization parameters. Equation (9) also includes a factor for additional robustification to possible noise perturbations via additive component zp, p=1, …, P, following some predefined distribution, where P denotes the number of noisy training examples.
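The layered decoder structure and the noise robustification described above can be sketched as follows. This is a hedged illustration only: real-valued signals are assumed, the data-fidelity term is taken to be a squared ℓ2 error averaged over the M training signals and P noise draws, Gaussian noise is assumed as the predefined distribution, and the regularization terms weighted by γ1 and γ2 are omitted. The matrix `A` stands for PΩΨ, and all function names are hypothetical.

```python
import numpy as np

def relu(v):
    """Example point-wise non-linearity sigma_k."""
    return np.maximum(v, 0.0)

def identity(v):
    return v

def decoder(y, weights, activations):
    """g_theta(y) = sigma_D(W_D ... sigma_1(W_1 y)) for lists of D layers."""
    h = y
    for W, sigma in zip(weights, activations):
        h = sigma(W @ h)
    return h

def noisy_reconstruction_loss(X, A, weights, activations, noise_std, P, rng):
    """Mean of ||x_i - g(A x_i + z_p)||^2 over M signals and P noise draws."""
    total = 0.0
    for x in X:
        for _ in range(P):
            z = noise_std * rng.standard_normal(A.shape[0])  # assumed Gaussian z_p
            total += np.sum((x - decoder(A @ x + z, weights, activations)) ** 2)
    return total / (len(X) * P)
```

In practice such a loss would be minimized over the weights with a stochastic optimizer; the sketch only evaluates it for given parameters.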
Since simultaneous minimization is not feasible due to the non-convexity in both Ω and θg, one can proceed by reformulating the problem as an iterative alternating minimization, keeping one of the parameters fixed and optimizing the other one, whilst preferably skipping the noise component for simplicity. This manner of proceeding also has two steps.
During a first step, the set of sampling parameters Ω is kept fixed and one tries to find the decoding parameters according to equation
θ̂g=argminθ
During a second step, the decoding parameters θ̂g are kept fixed and one tries to optimize the set of sampling parameters Ω according to equation
$$\hat{\Omega} = \arg\min_{\Omega} \sum_{i=1}^{M} \tfrac{1}{2}\|x_i - \hat{x}_{\Omega}^{i}\|_2^2 + \lambda_1 \Omega_{\Omega}(\Omega), \tag{11}$$
similarly to equation (6) in the inverse problem solution approach. The optimization procedure iterates the first and second steps till convergence.
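To give a feeling for what joint training of sampling set and decoder can achieve, the following sketch uses a deliberately simplified stand-in rather than the two-step alternation itself: the decoder is a single linear layer fitted by ridge regression, and the sampling set is grown by greedy forward selection, refitting the decoder for every candidate index. Real-valued signals are assumed, and the names `fit_decoder`, `cost`, `greedy_joint_training` and the parameter `gamma` are hypothetical.

```python
import numpy as np

def fit_decoder(Y, X, gamma):
    """Ridge decoder W minimising sum_i ||x_i - W y_i||^2 + gamma ||W||_F^2.

    Y : (M, n) observations, X : (M, N) training signals; returns W of shape (N, n).
    """
    G = Y.T @ Y + gamma * np.eye(Y.shape[1])
    return np.linalg.solve(G, Y.T @ X).T

def cost(Y, X, W):
    """Half the summed squared reconstruction error for decoder W."""
    return 0.5 * np.sum((X - Y @ W.T) ** 2)

def greedy_joint_training(X, Psi, s, gamma=1e-6):
    """Grow the sampling set one index at a time, refitting the decoder each time."""
    C = X @ Psi.T                      # transform coefficients, one row per signal
    omega = []
    for _ in range(s):
        best = None
        for j in range(C.shape[1]):
            if j in omega:
                continue
            idx = omega + [j]
            W = fit_decoder(C[:, idx], X, gamma)
            c = cost(C[:, idx], X, W)
            if best is None or c < best[0]:
                best = (c, j)
        omega.append(best[1])
    omega = sorted(omega)
    return omega, fit_decoder(C[:, omega], X, gamma)
```

When the training signals have correlated coefficients, a decoder fitted to the data can infer unsampled coefficients from sampled ones, so the selected set may differ from a pure largest-energy ranking.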
The testing/sampling stage of the regression approach is identical to the one of the inverse problem solution approach and is based on above mentioned equations (7) and (8), however with the difference in the structure of the decoder gθg which is represented in the regression approach in a form of a deep network, such as indicated in equation (9).
For reasons of completeness and of clarity, it is added here that the nomenclature “inverse problem solution approach” and “regression approach” pertains to the fact that, on the one hand, in the first named approach the decoder gθ̂g(.) is only implicitly constructed as a solution of equation (4), whilst in the second named approach the decoder gθ̂g(.) is explicitly constructed in equation (9). On the other hand, in the first named approach equation (4) represents a typical signal processing or physics based solution of an inverse problem, this explaining the choice of name for this approach, whilst in the second named approach equation (9) represents a direct joint optimization formulation for both the sampling operator and the decoder, also called regression formulation, this being a typical machine learning formulation.
The presence of reliable priors Ωx(x) plays a crucial role for the above mentioned reconstruction problem, in particular within the above described inverse problem solution approach, which is mainly due to the, in principle, ill-posed nature of equation (4).
In general, one can distinguish three cases as regards the construction of a model for the regularization priors Ωx(x) in equation (4), namely a first case where no priors are used, a second case using hand-crafted priors, both of which belong to prior art, and a third case using learnable priors and corresponding to a particularly preferred embodiment of the sampling method according to the present invention.
To describe these differences in more detail, it is to be noted that the first case corresponds to the method disclosed in the above mentioned article “Learning-based subsampling” and US 2017/0109650 by V. Cevher et al., such method completely disregarding the above mentioned signal regularization priors, but in fact only solving a least squares problem described by equation
$$\hat{x} = \arg\min_{x} \tfrac{1}{2}\|y - P_{\Omega}\Psi x\|_2^2, \tag{12}$$
leading to a so-called linear decoder described by equation
$$\hat{x} = \Psi^{*}P_{\Omega}^{T}y, \tag{13}$$
that can also be considered as a particular case of equation (4) for λx=0.
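For n<N the least squares problem of equation (12) has infinitely many minimizers. When Ψ is unitary, the product PΩΨ has orthonormal rows, so its pseudo-inverse coincides with Ψ*PΩ^T and the linear decoder of equation (13) returns the minimum-norm minimizer. The following sketch checks this numerically under the assumption of a unitary Ψ; `selection_matrix` and `linear_decoder` are hypothetical names.

```python
import numpy as np

def selection_matrix(omega, N):
    """P_Omega as an explicit (n, N) matrix with a single unit entry per row."""
    P = np.zeros((len(omega), N))
    P[np.arange(len(omega)), omega] = 1.0
    return P

def linear_decoder(y, Psi, omega):
    """x_hat = Psi^* P_Omega^T y: zero-fill, then apply the adjoint transform."""
    P = selection_matrix(omega, Psi.shape[0])
    return Psi.conj().T @ (P.T @ y)

# Example with a unitary DFT as Psi (assumption).
N = 8
Psi = np.fft.fft(np.eye(N)) / np.sqrt(N)
omega = np.array([0, 2, 5])
rng = np.random.default_rng(0)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)
x_hat = linear_decoder(y, Psi, omega)
x_pinv = np.linalg.pinv(selection_matrix(omega, N) @ Psi) @ y
```

Here `x_hat` and `x_pinv` agree up to rounding; for a non-orthogonal Ψ the two would in general differ.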
The second case is based on hand-crafted priors assuming that the signal of interest x might be sparsely represented in some orthogonal basis ϕ like Fourier, DCT, wavelet, Hadamard, etc. One can distinguish two models here.
In a first model using a so-called synthesis or sparse approximation model, one can represent the signal of interest x by equation
$$x = \phi u + e_x, \tag{14}$$
where ex is a residual error of approximation and u is assumed to be sparse, leading to l1- or l0-“norm” regularizers. The corresponding regularizer can be described by equation
where Ωu(u) is a regularizer which imposes sparsity on u, and γ and αu are regularization coefficients. This is described for example by M. Aharon et al. in the article “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation”, IEEE Transactions on Signal Processing, 54, 11, pp. 4311-4322, 2006.
In contrast, a second model known as the analysis model or also known as sparsifying transform, disclosed for example by S. Ravishankar and Y. Bresler in the article “Learning sparsifying transforms”, IEEE Transactions on Signal Processing, 61, 5-8, pp. 1072-1086, 2013, assumes that
$$u = Wx + e_u, \tag{16}$$
where W is the analysis transform, which for the orthonormal case is just W=ϕ^T, and eu represents the noise of approximation. The corresponding regularizer can be described by equation
The solution to equation (4) with priors according to equations (15) or (17) is generally obtained by iterative methods.
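One widely used iterative method for a synthesis-model prior with an ℓ1 regularizer is the iterative shrinkage-thresholding algorithm (ISTA). The sketch below is an illustration under assumptions and not the specific scheme of the cited disclosures: it solves min over u of ½∥y−Bu∥² + λ∥u∥₁, where B stands for the product PΩΨϕ and the signal estimate is then obtained as ϕu. The names `soft_threshold`, `ista`, `lam` and `iters` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (valid for real or complex v)."""
    mag = np.abs(v)
    return v * np.maximum(1.0 - t / np.maximum(mag, 1e-12), 0.0)

def ista(y, B, lam, iters=500):
    """Minimise 0.5 * ||y - B u||_2^2 + lam * ||u||_1 by proximal gradient steps."""
    L = np.linalg.norm(B, 2) ** 2              # Lipschitz constant of the gradient
    u = np.zeros(B.shape[1], dtype=np.result_type(B, y))
    for _ in range(iters):
        grad = B.conj().T @ (B @ u - y)        # gradient of the data-fidelity term
        u = soft_threshold(u - grad / L, lam / L)
    return u
```

With the step size 1/L used here the objective does not increase from one iteration to the next; accelerated or learned variants such as the LISTA implementation mentioned in the introduction reduce the number of iterations needed.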
The third case corresponds to a particularly preferred embodiment of the sampling method according to the present invention and is based on learnable priors that exploit the full power of training with the set of training signals {x_i}, i=1, …, M. In principle, one can reformulate equation (4) jointly with equations (14) and (16) to learn the basis ϕ and the analysis transform W as a single overcomplete transform or layer.
However, in view of the complexity of this problem and the benefits offered by deep architectures, the following will consider an alternative model. Given a set of training signals {x_i}, i=1, …, M, training of a model prior factorized via a model encoder-decoder pair such as shown in
where Ωθ
In this context, one can use the ADMM approach mentioned in the introduction and known as such by a person skilled in the art, in combination with efficient optimizers such as ADAM, also mentioned in the introduction and known as such by a person skilled in the art, at each stage.
Alternatively, one can also consider even simpler approaches, such as k-means applied to the subbands of the transform domain Ψx, to train the encoder-decoder pair. This alternative will be explored further below in order to demonstrate the power of the method according to the present invention, even when it is combined with a model as simple as k-means for determining the learnable priors, over the above described first and second cases, in which no priors or only hand-crafted priors are used, the decoder does not use learnable priors, and only the sampling operator is trained.
Once the encoder-decoder pair ({circumflex over (θ)}E, {circumflex over (θ)}D) is trained on the set of training signals {xi}i=1M, one can reformulate equation (4) using the parametrized regularization prior Ωx(x) into cost functions described by the following equations
Applying the Nash equilibrium mentioned in the introduction and known by a person skilled in the art to equations (20) and (21) similarly to leads to a consensus of restrictions imposed by said cost functions J1 and J2. Said consensus of restrictions is defined as a fixed point (u*, x*) fulfilling equations
x*=argminxJ1(u*,x), (22)
u*=argminuJ2(u,x*), (23)
this being similar in spirit to the above mentioned ADMM approach.
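For illustration only, the fixed point of equations (22) and (23) may be approached by alternating the two minimizations. The sketch below does not use the cost functions of equations (20) and (21); it assumes instead two scalar toy cost functions J1(u,x)=(y−a·x)²+γ(x−u)² and J2(u,x)=γ(x−u)²+α|u|, whose minimizers are available in closed form. The names a, gamma and alpha are hypothetical parameters of this example.

```python
# Sketch: alternating minimization towards a fixed point (u*, x*).
# The toy costs J1, J2 are assumptions, not the patent's equations (20)-(21).


def soft(t, tau):
    """Scalar soft-thresholding, the minimizer of gamma*(x-u)^2 + alpha*|u|
    over u when tau = alpha / (2*gamma) and t = x."""
    if t > tau:
        return t - tau
    if t < -tau:
        return t + tau
    return 0.0


def fixed_point(y, a=1.0, gamma=1.0, alpha=0.5, iters=100):
    x, u = 0.0, 0.0
    for _ in range(iters):
        # x-update: minimizer of J1(u, x) over x for the current u.
        x = (a * y + gamma * u) / (a * a + gamma)
        # u-update: minimizer of J2(u, x) over u for the current x.
        u = soft(x, alpha / (2.0 * gamma))
    return u, x


u_star, x_star = fixed_point(3.0)
```

With the default values, each pass halves the distance to the fixed point, so the iteration settles at x*=2.75 and u*=2.5, where neither update changes the other variable any further.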
One can demonstrate that a solution to equations (22) and (23) is obtained, by skipping here the dependence of the estimate of the signal of interest on the set of sampling parameters Ω, as an iterative decoder of a form described by equation
{circumflex over (x)}k=βFy−(1−βFPΩΨ){tilde over (x)}k, (24)
with an estimate of the signal of interest {tilde over (x)}k=φ({circumflex over (x)}k−1, αu), where φ(.) is a non-linear function depending on the regularizer Ωu(.), and with F=(Ψ*PΩTPΩΨ)+ΨPΩT, where “+” denotes a pseudo-inverse; F can be further reduced for special forms of the transform domain Ψ.
Moreover, one can also consider, for equation (24), to use the LISTA implementation which is mentioned in the introduction as well as known by a person skilled in the art and which has a form of a deep network such as shown schematically in
This section presents how the model training may be implemented in the context of the proposed joint learning of the sampling operator and the decoder by use of a k-means model applied to the subbands of the transform domain Ψx to train the encoder-decoder pair. The procedure again comprises two stages, namely training/learning and testing/sampling.
The training/learning stage may be described by the following procedure:
1. One assumes some complex transform domain Ψ, such as the Fourier transform, consisting of Ψ=(Ψre,Ψim), which leads to a representation described by equation
Ψx=Ψrex+jΨimx=xre+jxim. (25)
2. The Fourier spectrum is split into L subbands. The splitting might be overlapping or non-overlapping. The splitting is done such as to ensure that the magnitude of the Fourier transform domain components of each subband
is the same, where Vl is the number of samples in each subband l.
3. Given a set of training signals X=(x1, . . . , xM), the latter is split into L subbands as (ΨX)=((ΨX1), . . . , (ΨXL)).
4. Thereafter, two codebooks for the real Cre and imaginary Cim parts are generated using the k-means algorithm which is well known to a person skilled in the art. One can generate the codebooks independently or jointly.
5. The sampling operator training following thereafter consists of two parts that lead to minimization of the overall reconstruction error. The first part aims at minimizing the reconstruction error from the direct observation signal y in the set of sampling parameters Ω. It naturally leads to selection of the coefficients with the largest magnitudes in the Fourier spectrum ΨX of the set of training signals. The second part seeks to minimize the error between the set of centroids and the samples in the set of sampling parameters Ω. At the same time, it is important to note that the set of sampling parameters Ω should be chosen in such a way as to ensure the maximum distinguishability of codewords in the codebooks. Otherwise, if the distances to several codewords in the sampling points of the set of sampling parameters Ω are very small, the codewords are not distinguishable and any random perturbation might flip them. The result of this procedure is a set of sampling parameters Ω ensuring the most informative and accurate selection of the codewords in the trained codebooks Cre and Cim. The considered trained representation also closely resembles the compressed representation of data in the trained codebooks.
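The training steps above may be sketched as follows, purely by way of example. The sketch assumes a naive discrete Fourier transform as the transform Ψ, non-overlapping subbands of equal width instead of the equal-magnitude splitting of step 2, independently trained codebooks, and only the first part of step 5, namely the selection of the coefficients with the largest mean magnitude; the distinguishability criterion is not implemented. All function and variable names are hypothetical.

```python
import cmath
import random


def dft(x):
    """Naive discrete Fourier transform, standing in for the transform Psi."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids


def train(signals, num_subbands, k, n_samples):
    spectra = [dft(x) for x in signals]
    n = len(spectra[0])
    width = n // num_subbands  # assumes n is divisible by num_subbands
    bands = [range(l * width, (l + 1) * width) for l in range(num_subbands)]
    # Step 4: separate codebooks for real and imaginary parts of each subband.
    c_re = [kmeans([[s[i].real for i in b] for s in spectra], k) for b in bands]
    c_im = [kmeans([[s[i].imag for i in b] for s in spectra], k) for b in bands]
    # Step 5, first part only: keep the n_samples coefficients with the
    # largest mean magnitude over the training set.
    energy = [sum(abs(s[i]) for s in spectra) / len(spectra) for i in range(n)]
    omega = sorted(sorted(range(n), key=lambda i: -energy[i])[:n_samples])
    return bands, c_re, c_im, omega


# Toy training set: two constant and two alternating signals of length 8.
signals = [[1.0] * 8, [2.0] * 8, [1.0, -1.0] * 4, [2.0, -2.0] * 4]
bands, c_re, c_im, omega = train(signals, num_subbands=2, k=2, n_samples=2)
```

For this toy training set the energy is concentrated in the coefficients of index 0 and 4, which are therefore the two positions retained in the set of sampling parameters.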
The testing/sampling stage may be described by the following procedure:
Given an observation signal which may be represented as a vector y=PΩΨx=(PΩ1Ψx,PΩ2Ψx, . . . , PΩLΨx) as a subband based sampling, the decoder first finds the closest representatives in the codebooks Cre and Cim for each subband l by use of equation
cre(Ĵ)=argmin1≤j≤K
cim(Ĵ)=argmin1≤j≤K
where Kl denotes the number of codewords in each subband l∈(1, . . . , L).
One can also include the magnitude and phase components to improve the overall system performance, such as described by equation
where |.| and arg(.) denote the magnitude and phase of a complex vector, respectively, ∥.∥2π is a mean angular difference normalized in the range (0,2π), and α1 and α2 are Lagrangian multipliers.
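One possible reading of such a combined magnitude and phase criterion is sketched below for illustration. Since the exact equation is not reproduced here, the sketch assumes a squared difference of magnitudes weighted by alpha1 plus a mean angular difference weighted by alpha2, where each phase difference is wrapped modulo 2π and the shorter arc is used. The function names and default weights are hypothetical.

```python
import cmath
import math


def mag_phase_distance(y, c, alpha1=1.0, alpha2=1.0):
    """Assumed distance between complex vectors y and c of equal length."""
    # Magnitude term: squared difference of the component magnitudes.
    mag = sum((abs(a) - abs(b)) ** 2 for a, b in zip(y, c))
    # Phase term: mean angular difference, wrapped modulo 2*pi.
    arcs = []
    for a, b in zip(y, c):
        d = (cmath.phase(a) - cmath.phase(b)) % (2.0 * math.pi)
        arcs.append(min(d, 2.0 * math.pi - d))
    return alpha1 * mag + alpha2 * sum(arcs) / len(arcs)


def nearest(y, codebook, alpha1=1.0, alpha2=1.0):
    """Index of the codeword minimizing the assumed distance."""
    return min(range(len(codebook)),
               key=lambda j: mag_phase_distance(y, codebook[j], alpha1, alpha2))


y = [1 + 0j, 1j]
codebook = [[1 + 0j, -1j],    # same magnitudes, one phase flipped by pi
            [1.1 + 0j, 1j]]   # same phases, slightly different magnitude
best = nearest(y, codebook)
```

In this example the second codeword is selected, because a small magnitude error is penalized less than a phase error of π on one component.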
The reconstruction is based on the decoder producing an estimate of the signal of interest according to equation
where {.} denotes the concatenation of vectors. The main difference of this decoder as compared to the linear decoder disclosed in the article “Learning-based subsampling” and the document US 2017/0109650 consists in the presence, in the decoder according to the present invention, of injected learnable priors coming from the learned K-means codebooks. Another difference between the decoders is the presence, in the decoder according to the present invention, of multiple subbands optimized for optimal sampling.
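The lookup and concatenation performed by the decoder may be sketched as follows, by way of example only. The sketch assumes that the distance to each codeword is a squared Euclidean distance evaluated only at the sampled positions of each subband, that real and imaginary codebooks are searched independently, and that the output is the estimated spectrum, the inverse transform being omitted. The data layout (a dictionary from coefficient index to complex observation) and all names are assumptions of this example.

```python
def decode(y, omega, bands, c_re, c_im):
    """Estimate the full spectrum from sampled coefficients.

    y:     dict mapping sampled coefficient index -> complex observation
    omega: list of sampled coefficient indices
    bands: list of index ranges, one per subband
    c_re, c_im: per-subband codebooks (lists of codewords) for the real
                and imaginary parts
    """
    spectrum = []
    for l, band in enumerate(bands):
        sampled = [i for i in band if i in omega]

        def dist(codeword, part):
            # Squared error restricted to the sampled positions of the band.
            return sum((part(y[i]) - codeword[i - band[0]]) ** 2
                       for i in sampled)

        re = min(c_re[l], key=lambda c: dist(c, lambda v: v.real))
        im = min(c_im[l], key=lambda c: dist(c, lambda v: v.imag))
        # Concatenate the selected codewords of all subbands.
        spectrum.extend(complex(a, b) for a, b in zip(re, im))
    return spectrum


bands = [range(0, 2), range(2, 4)]
c_re = [[[1.0, 0.0], [5.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
c_im = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 3.0], [0.0, -3.0]]]
omega = [0, 3]
y = {0: 4.6 + 0.9j, 3: 1.8 - 2.5j}
spectrum = decode(y, omega, bands, c_re, c_im)
```

Although only two of the four coefficients are observed, the selected codewords supply values for the unobserved positions, which is the role of the learnable priors in this simple setting.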
In general, the cost function used in the sampling method according to the present invention at the cost minimization step (180) for determining the set of sampling parameters Ω and the decoding parameters θg during the joint training of the sampling operator PΩ (120) and of the decoder gθg(.) (140) is chosen depending on the targeted application. In particular, the cost function may comprise a metric of closeness between two signals of interest x (60) in the direct domain or latent space for recognition of the signal of interest x (60) or may comprise a metric of closeness between the targeted and estimated labels for classification, recognition, authentication, identification or forensic analysis of the signal of interest x (60). Furthermore, the cost function used in the cost minimization step (180) for determining the set of sampling parameters Ω and the decoding parameters θg during the joint training of the sampling operator PΩ (120) and of the decoder gθg(.) (140) may optionally contain an additional constraint on the latent space ensuring a rate-efficient representation of model parameters for a joint minimization of the sampling rate in the sampling operator PΩ and of the compression rate in the model latent space for a desired reconstruction distortion, as used for joint sampling-compression applications. These features are particularly interesting in applications of the present method related to security, like identification or authentication.
For applying the method according to the present invention to the above mentioned classification problems, in particular to classification and/or recognition of the signal of interest x based on the observation signal y, the training stage is formulated as an optimization problem according to equation
({circumflex over (Ω)},{circumflex over (θ)}g)=argminΩ,θ
wherein the decoder gθg(.) is a classifier, i.e. gθg:n→{1, . . . , C} and l(m) is an encoded class label. L(.) denotes a corresponding cost function typically used in classification problems. For reasons of clarity, it is added here that this description refers to the classification problem as a general problem. The classification problem can be split into two problems: if there are only two hypotheses, it is referred to as an authentication problem, and if there are several hypotheses, it is referred to as a recognition or identification problem. For the sake of completeness, it shall be noted that the terms “recognition” and “identification” are sometimes used with different meanings in research communities. In the present description, identification shall signify a classification problem having M+1 hypotheses, i.e. including a reject option in case the probe does not fit any of the known classes, whereas recognition shall signify a classification problem having M hypotheses, in which the probe should be classified as one of the known classes without a reject option.
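The distinction between recognition and identification drawn above may be illustrated by the following sketch. It assumes that the classifier outputs one score per known class, that a higher score indicates a better match, and that the reject option is implemented by a simple threshold on the best score; the label REJECT and the threshold value are hypothetical.

```python
REJECT = -1  # hypothetical label for "none of the known classes"


def recognize(scores):
    """Recognition: M hypotheses, the best known class is always returned."""
    return max(range(len(scores)), key=lambda m: scores[m])


def identify(scores, threshold):
    """Identification: M+1 hypotheses, including a reject option."""
    best = recognize(scores)
    return best if scores[best] >= threshold else REJECT


scores = [0.2, 0.5, 0.3]
label_recognition = recognize(scores)
label_identification_strict = identify(scores, threshold=0.6)
label_identification_loose = identify(scores, threshold=0.4)
```

With these scores, recognition returns class 1 in every case, whereas identification returns class 1 only when the threshold is low enough and rejects the probe otherwise.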
In order to allow for authentication by the method according to the present invention, the training stage is formulated as an optimization problem according to equation
({circumflex over (Ω)},{circumflex over (θ)}g)=argminΩ,θ
wherein the decoder gθg(.) is a binary authentication classifier, i.e. gθg:n→{0,1} and l(m) is an encoded class label. L(.) again denotes a corresponding cost function.
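As one common example of a cost function L(.) for the binary authentication formulation, the following sketch evaluates the binary cross-entropy between encoded labels and predicted probabilities. The present description does not prescribe this particular cost function; it is assumed here for illustration, together with a classifier that outputs the probability of the hypothesis “authentic”.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; labels are 0 or 1, probs lie in [0, 1]."""
    total = 0.0
    for l, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= l * math.log(p) + (1 - l) * math.log(1.0 - p)
    return total / len(labels)


labels = [1, 0, 1, 0]                      # 1 = authentic, 0 = fake
loss_uninformed = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])
loss_confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
```

A classifier that always outputs 0.5 yields a cost of log 2, and the cost decreases as the predicted probabilities move towards the correct labels.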
Both approaches for classification based on the recognition and on the authentication formulations are shown in
The proposed method can be used in several kinds of imaging applications. Without being exhaustive, only a few of a high number of potential applications will be considered explicitly in the following by way of example, bearing in mind that analogous setups may easily be transposed by a person skilled in the art to applications not explicitly mentioned in the following.
Application of the method according to the present invention to medical imaging may include, but is not limited to, magnetic resonance imaging (MRI), computerized tomography (CT) and ultrasound imaging. In all these applications, the image is sampled in some transform domain Ψ determined by the physics of the corresponding imaging problem.
Non-destructive testing and imaging include applications where the sampling is performed in some transform domain Ψ and one is interested in visualizing the scene under investigation or in making some decision about it.
Sparse sensor arrays include numerous applications in radar, radioastronomy, acoustic arrays, surveillance systems and remote sensing applications.
Applications related to CCD and CMOS sensors include all professional, multimedia and specialized applications where the use of huge amounts of pixels is limited or not desirable due to technical limitations that include, but are not limited to, energy consumption, memory, communication burden, etc. Additionally, it might be interesting to use systems with a larger sensor size to increase the sensitivity without a big sacrifice in resolution, in order to cope with photon noise. Moreover, one can also develop efficient sampling schemes adapted to the compression requirements of Big Data applications. Finally, one can also significantly benefit from the proposed sampling and the corresponding demosaicking as a kind of decoder in color imaging applications.
The invention is also of interest for classification and recognition applications in which some object or human should be recognized from a limited number of samples. This situation is of interest not only from technical perspectives but also from security and privacy considerations. In particular, it is possible to construct a working system if the number of samples is just sufficient for performing a reliable identification or authentication, but is insufficient to reliably reconstruct the whole object or biometric parameters. Such a sampling is also of interest in order to develop new countermeasures against adversarial attacks in machine learning. Additionally, it is possible to randomize the sets of sampling parameters Ω using secret keys jointly with sets of training signals. Additionally, it is possible to introduce a permutation of the acquired samples by way of an operator Q:n→n in equation (1), which ensures key-based permutations. In this way, one can create a non-differentiable operator preventing adversarial learning and attacks.
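A key-based permutation of the acquired samples may be sketched as follows, for illustration only. The sketch assumes that the secret key seeds a pseudo-random generator which shuffles the sample order; the generator of the Python standard library used here is not cryptographically secure and merely stands in for a keyed primitive chosen by the implementer. All names are hypothetical.

```python
import random


def key_permutation(key, n):
    """Derive a permutation of range(n) from a secret key.

    random.Random is NOT cryptographically secure; illustration only.
    """
    rng = random.Random(key)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def permute(samples, perm):
    """Reorder the acquired samples according to perm."""
    return [samples[i] for i in perm]


def unpermute(samples, perm):
    """Invert permute() for a holder of the same key."""
    out = [None] * len(perm)
    for pos, i in enumerate(perm):
        out[i] = samples[pos]
    return out


y = [10, 11, 12, 13, 14, 15, 16, 17]       # acquired samples
perm = key_permutation("secret-key", len(y))
y_protected = permute(y, perm)
y_restored = unpermute(y_protected, perm)
```

Only a party holding the same key can regenerate the permutation and restore the original sample order before decoding.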
The proposed joint training of the sampling operator and of the decoder is also of interest for anticounterfeiting applications, in which case the sampling operator and the decoder are trained on a set of training signals consisting of authentic objects/biometrics and of fakes with the above mentioned binary authentication cost function. In this manner, the sampling regions found make it possible to quickly and reliably establish the authenticity of objects or the identity of humans without the need to sample the whole range. Additionally, this leads to more efficient implementations and increased reliability.
Finally, the proposed invention has a wide range of applications for both image reconstruction and classification problems in case the signals of interest represent high-dimensional spatio-temporal tensors. In these cases, the proposed sampling method might be especially beneficial in comparison to prior art sampling methods.
In order to illustrate the results that may be achieved by application of the method according to the present invention in comparison to prior art methods,
For further illustration,
Finally, it shall be noted here that the present invention also pertains to computer program means stored in a computer readable medium adapted to implement the above described method as well as to a device equipped with such computer program means. The device preferably is chosen from the group comprising a mobile phone, in particular a smart phone equipped with a camera, a digital photo apparatus, a digital video camera, a scanning device, a tablet or personal computer, a server.
In light of the above description of the method of signal sampling according to the present invention, its advantages are clear. In particular, due to the joint adaptation of the sampling operator and decoder to the training data, the method of signal sampling according to the present invention has a higher overall performance under a targeted cost function for a signal of interest acquired with a finite number of samples. This makes it possible to reduce the number of samples in many applications that have constraints in terms of technology, cost, memory and exposure/observation time as well as in terms of health safety, security or privacy, etc., while simultaneously ensuring high quality of decoding. Furthermore, the joint training of the sampling operator and of the decoder makes it possible, in contrast to prior art sampling methods, to avoid the use of “hand-crafted” decoders, to provide improved decoders best adapted to the corresponding sampling operators, and to exploit training signals to their full power. Moreover, the proposed method makes it possible to make a decision about a class of the signal of interest in recognition and identification applications, and the training procedure of the sampling operator may be improved by taking into account, more precisely than in the prior art, the predetermined properties of specific types of signals, for example of the kind of images, to be sampled and reconstructed or classified.
Number | Date | Country | Kind |
---|---|---|---|
18191494.6 | Aug 2018 | EP | regional |