Tolerance to Ambiguity: Sparsity and the Bayesian Perspective

I just came across a video version of Richard Hamming's famous talk, "You and Your Research," and was reminded of this part (in the video, he is even more eloquent):

....There's another trait on the side which I want to talk about; that trait is ambiguity. It took me a while to discover its importance. Most people like to believe something is or is not true. Great scientists tolerate ambiguity very well. They believe the theory enough to go ahead; they doubt it enough to notice the errors and faults so they can step forward and create the new replacement theory....

Which brings us to today's paper on matching the Bayesian and compressive sensing approaches in the reconstruction stage. The prior in vanilla compressive sensing is different from the usual Bayesian prior (usually a Laplacian distribution), yet the Bayesian approach brings extraordinarily good results in the compressive sensing realm. Today's paper tries to resolve that ambiguity, more or less: Sparsity and the Bayesian Perspective by Jean-Luc Starck, David Donoho, M. Jalal Fadili, and Anais Rassat. The abstract reads:

Sparsity has been recently introduced in cosmology for weak-lensing and CMB data analysis for different applications such as denoising, component separation or inpainting (i.e. filling the missing data or the mask). Although it gives very nice numerical results, CMB sparse inpainting has been severely criticized by top researchers in cosmology, based on arguments derived from a Bayesian perspective. Trying to understand their point of view, we realize that interpreting a regularization penalty term as a prior in a Bayesian framework can lead to erroneous conclusions. This paper is by no means against the Bayesian approach, which has proven to be very useful for many applications, but warns about a Bayesian-only interpretation in data analysis, which can be misleading in some cases.
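
For reference, the step of "interpreting a regularization penalty term as a prior" can be written out for the ℓ1 case. The notation below is an assumption made for illustration and is not copied from the paper: a linear model y = Ax + n with Gaussian noise of variance σ², and an i.i.d. Laplacian prior with rate λ. Under those assumptions, the MAP estimate is an ℓ1-penalized least-squares problem:

```latex
% Assumed notation (not taken from the paper):
%   y = A x + n,  n ~ N(0, \sigma^2 I),  p(x) \propto \exp(-\lambda \|x\|_1)
\begin{align*}
\hat{x}_{\mathrm{MAP}}
  &= \arg\max_x \; p(x \mid y)
   = \arg\max_x \; p(y \mid x)\, p(x) \\
  &= \arg\min_x \; \bigl[ -\log p(y \mid x) - \log p(x) \bigr] \\
  &= \arg\min_x \; \frac{1}{2\sigma^2} \, \|y - A x\|_2^2 + \lambda \, \|x\|_1 .
\end{align*}
```

Multiplying the objective by σ² shows that the usual regularization weight plays the role of σ²λ, so the same optimization problem can be read either as a penalized fit or as a MAP estimate.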

## 1 comment:

Hi,

This is an interesting paper. However, I think that there is something important missing in this discussion: the loss function. Any (Bayesian) point estimate is the minimizer of the posterior expectation of some loss; in particular, the MAP is the minimizer of the posterior expectation of the 0/1 loss (in fact, it is the limit of a family of estimates, but that can be ignored in this discussion). Accordingly, there is no reason whatsoever why the distribution of MAP estimates has to be similar to the prior; why would it? Furthermore, for a MAP estimate to yield "correct results" (whatever "correct" means), there is no reason why typical samples from the prior should look like those "correct results".

In fact, the compressed sensing (CS) example in Section 3.3 of the paper illustrates this quite clearly: (a) CS theory guarantees that solving (4) or (5) yields the "correct" solution; (b) as explained in Section 3.1, (5) is the MAP estimate of x under a Laplacian prior (and the linear-Gaussian likelihood therein explained); (c) thus, the solution of (5) is the minimizer of the posterior expectation of the 0/1 loss under the likelihood and prior just mentioned; (d) in conclusion, if the underlying vectors are exactly sparse enough (obviously not typical samples of a Laplacian) they can be recovered by computing the MAP estimate under a Laplacian prior, that is, by computing the minimizer of the posterior expectation of the 0/1 loss.

This is simply a fact. There is nothing surprising here: the message is that the prior is only half of the story and it doesn't make sense to look at a prior without looking also at the loss function. In (Bayesian) point estimation, a prior is "good", not if it describes well the underlying objects to be estimated, but if used (in combination with the likelihood function and observations) to obtain a minimizer of the posterior expectation of some loss it leads to "good" estimates.

Regards,

Mario Figueiredo.
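
The claim in the comment, that exactly sparse vectors can be recovered by a MAP estimate under a Laplacian prior even though they are not typical Laplacian samples, is easy to check numerically. The following is a minimal sketch under stated assumptions, not the paper's experiment: the problem sizes, the noise level, the weight `lam`, and the helper names `soft_threshold` and `l1_map_estimate` are all hypothetical choices made for illustration, and plain ISTA (iterative soft-thresholding) is swapped in as a simple solver for the ℓ1-penalized least-squares objective.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_map_estimate(A, y, lam, n_iter=3000):
    """Minimize 0.5 * ||y - A x||_2^2 + lam * ||x||_1 with plain ISTA.

    Up to a rescaling by the noise variance, this objective is the negative
    log-posterior under a Gaussian likelihood and an i.i.d. Laplacian prior.
    """
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)          # gradient of the quadratic data term
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
n, m, k = 200, 80, 5  # signal length, measurements, nonzeros (illustrative values)

# Exactly sparse ground truth: k nonzero entries with magnitudes in [1, 2).
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))

# Random Gaussian sensing matrix and slightly noisy measurements.
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true + 0.01 * rng.standard_normal(m)

x_hat = l1_map_estimate(A, y, lam=0.02)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

# For contrast: one typical draw from the Laplacian prior itself.
laplace_draw = rng.laplace(size=n)

print("relative recovery error:", rel_err)
print("nonzeros in sparse ground truth:", np.count_nonzero(x_true))
print("nonzeros in a Laplacian draw:", np.count_nonzero(laplace_draw))
```

A draw from the Laplacian prior has essentially no exact zeros, while the ground truth has only `k` nonzero entries, yet the ℓ1 estimate computed from `m < n` measurements lands close to it. That is the sense in which the prior need not describe the objects being estimated in order to produce good estimates.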
