Tuesday, June 01, 2010

CS: The long entry of the week.


Here is a selection of papers that caught my interest this past week:

L1 Minimization via Randomized First Order Algorithms by Anatoli Juditsky, Fatma Kilinc Karzan, Arkadi Nemirovski. The abstract reads:
In the paper, we propose a randomized algorithm for solving bilinear saddle point problems, present its theoretical efficiency estimates and discuss a number of applications, primarily to the problem of $\ell_1$ minimization arising in sparsity-oriented Signal Processing. We demonstrate, both theoretically and by numerical examples, that when seeking medium-accuracy solutions of large-scale $\ell_1$ minimization problems, our randomized algorithm significantly outperforms (and progressively more so as the size of the problem grows) the state-of-the-art deterministic methods.
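This is not the randomized saddle-point scheme of the paper, but as a point of reference, here is a minimal sketch of the kind of $\ell_1$ problem being targeted, solved with plain ISTA (proximal gradient). The matrix sizes and the regularization weight below are made up for illustration:

import numpy as np

def soft_threshold(x, t):
    # Entrywise soft-thresholding: the proximal operator of t * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_l1(A, y, lam, n_iter=500):
    # Minimize 0.5 * ||A x - y||_2^2 + lam * ||x||_1 by proximal gradient (ISTA)
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

# tiny demo: recover a 5-sparse vector from 80 random measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 200)) / np.sqrt(80)
x_true = np.zeros(200)
x_true[rng.choice(200, 5, replace=False)] = rng.standard_normal(5)
y = A @ x_true + 0.01 * rng.standard_normal(80)
x_hat = ista_l1(A, y, lam=0.02)
print(np.linalg.norm(x_hat - x_true))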

On Low Rank Matrix Approximations with Applications to Synthesis Problem in Compressed Sensing by Anatoli Juditsky, Fatma Kilinc Karzan, Arkadi Nemirovski. The abstract reads:
We consider the synthesis problem of Compressed Sensing: given $s$ and an $M \times n$ matrix $A$, extract from it an $m \times n$ submatrix $A_m$, certified to be $s$-good, with $m$ as small as possible. Starting from the verifiable sufficient conditions of $s$-goodness, we express the synthesis problem as the problem of approximating a given matrix by a matrix of specified low rank in the uniform norm. We propose randomized algorithms for efficient construction of rank $k$ approximations of matrices of size $m \times n$ achieving accuracy bounds $O(1)\sqrt{\ln(mn)/k}$ which hold in expectation or with high probability. We also supply derandomized versions of the approximation algorithms which do not require random sampling of matrices and attain the same accuracy bounds. We further demonstrate that our algorithms are optimal up to a logarithmic in $m, n$ factor, i.e., the accuracy of such an approximation for the identity matrix $I_n$ cannot be better than $O(k^{-1/2})$. We provide preliminary numerical results on the performance of our algorithms for the synthesis problem.
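For readers who want to play with randomized low-rank approximation, here is a generic sketch in the Halko/Martinsson/Tropp spirit (a random range finder followed by a small SVD). It is not necessarily the construction analyzed by the authors, and the oversampling parameter is an arbitrary choice:

import numpy as np

def randomized_low_rank(B, k, oversample=10, seed=0):
    # Generic randomized rank-k approximation (random range finder + small SVD).
    # Not necessarily the construction analyzed by Juditsky et al.
    rng = np.random.default_rng(seed)
    m, n = B.shape
    Omega = rng.standard_normal((n, k + oversample))   # random test matrix
    Q, _ = np.linalg.qr(B @ Omega)                     # orthonormal basis for the approximate range
    U, s, Vt = np.linalg.svd(Q.T @ B, full_matrices=False)
    return (Q @ U[:, :k]) * s[:k] @ Vt[:k]             # rank-k approximation of B

# sanity check on an exactly rank-10 matrix
rng = np.random.default_rng(1)
B = rng.standard_normal((300, 10)) @ rng.standard_normal((10, 200))
print(np.linalg.norm(B - randomized_low_rank(B, 10)) / np.linalg.norm(B))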


In the next paper, Michael Elad introduces his work as follows:
In a new paper with Raja Giryes, we study the performance of three algorithms: the Subspace Pursuit, the CoSaMP, and the Iterative Hard Thresholding. Our analysis aims to show that under the assumption of random additive noise, these estimation algorithms lead to near-oracle performance, with an error that is a constant and a log-factor away from the oracle's deviation. Our work also establishes uniform bounds, implying that the obtained error-bounds hold depending ONLY on the properties of the dictionary and the cardinality of the representation vector. This puts the three algorithms in the same line as Basis Pursuit and the Dantzig Selector, and differentiates them from greedy algorithms (e.g. OMP, thresholding). Our analysis is RIP-based, and as such, it is mostly relevant to random matrices as practiced in compressed-sensing.
The paper: RIP-Based Near-Oracle Performance Guarantees for Subspace-Pursuit, CoSaMP, and Iterative Hard-Thresholding by Raja Giryes and Michael Elad. The abstract reads:
This paper presents an average case denoising performance analysis for the Subspace Pursuit (SP), the CoSaMP and the IHT algorithms. This analysis considers the recovery of a noisy signal, with the assumptions that (i) it is corrupted by an additive random white Gaussian noise; and (ii) it has a K-sparse representation with respect to a known dictionary D. The proposed analysis is based on the Restricted-Isometry-Property (RIP), establishing a near-oracle performance guarantee for each of these algorithms. The results for the three algorithms differ in the bounds’ constants and in the cardinality requirement (the upper bound on K for which the claim is true). Similar RIP-based analysis was carried out previously for the Dantzig Selector (DS) and the Basis Pursuit (BP). Past work also considered a mutual-coherence-based analysis of the denoising performance of the DS, BP, the Orthogonal Matching Pursuit (OMP) and the thresholding algorithms. This work differs from the above as it addresses a different set of algorithms. Also, despite the fact that SP, CoSaMP, and IHT are greedy-like methods, the performance guarantees developed in this work resemble those obtained for the relaxation-based methods (DS and BP), suggesting that the performance is independent of the sparse representation entries contrast and magnitude.
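As a reminder of how simple these greedy-like schemes are, here is a bare-bones sketch of IHT; SP and CoSaMP add more elaborate support-selection steps on top of the same ingredients. The step size is left to the user, since a fixed unit step implicitly assumes a well-conditioned (RIP-like) matrix:

import numpy as np

def iht(A, y, K, n_iter=200, step=1.0):
    # Iterative Hard Thresholding: gradient step, then keep the K largest entries.
    # The step size should be tuned to A; normalized IHT variants adapt it automatically.
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + step * (A.T @ (y - A @ x))
        small = np.argsort(np.abs(x))[:-K]   # indices of all but the K largest magnitudes
        x[small] = 0.0
    return x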

Penalized least squares regression is often used for signal denoising and inverse problems, and is commonly interpreted in a Bayesian framework as a Maximum A Posteriori (MAP) estimator, the penalty function being the negative logarithm of the prior. For example, the widely used quadratic program (with an $\ell^1$ penalty) associated to the LASSO / Basis Pursuit Denoising is very often considered as the MAP under a Laplacian prior. The objective of this paper is to highlight the fact that, while this is {\em one} possible Bayesian interpretation, there can be other equally acceptable Bayesian interpretations. Therefore, solving a penalized least squares regression problem with penalty $\phi(x)$ should not necessarily be interpreted as assuming a prior $C\cdot \exp(-\phi(x))$ and using the MAP estimator. In particular, we show that for {\em any} prior $p_X(x)$, the conditional mean can be interpreted as a MAP with some prior $C \cdot \exp(-\phi(x))$. Vice-versa, for {\em certain} penalties $\phi(x)$, the solution of the penalized least squares problem is indeed the {\em conditional mean}, with a certain prior $p_X(x)$. In general we have $p_X(x) \neq C \cdot \exp(-\phi(x))$.
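The point of the paper can be checked numerically on a scalar toy problem: with a Laplacian prior, the MAP estimate is the familiar soft-thresholding rule, while the conditional mean is a different, smooth shrinkage. The sketch below, with arbitrary noise and prior parameters, compares the two:

import numpy as np

# Toy scalar model: y = x + w, w ~ N(0, sigma^2), Laplacian prior p(x) ~ exp(-|x|/b).
# The MAP estimate is soft-thresholding; the conditional mean (MMSE) is a different,
# smooth shrinkage rule. Parameter values below are arbitrary.
sigma, b = 1.0, 1.0
grid = np.linspace(-20.0, 20.0, 4001)     # integration grid for the posterior

def map_estimate(y):
    t = sigma**2 / b                      # threshold of the soft-thresholding rule
    return np.sign(y) * max(abs(y) - t, 0.0)

def conditional_mean(y):
    post = np.exp(-0.5 * (y - grid)**2 / sigma**2) * np.exp(-np.abs(grid) / b)
    return np.sum(grid * post) / np.sum(post)   # E[x | y] by brute-force quadrature

for y in (0.5, 1.5, 3.0):
    print(y, map_estimate(y), conditional_mean(y))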
The next two abstracts contain many formulas; please check the papers themselves for a fully coherent reading:

Deterministic Sparse Fourier Approximation via Fooling Arithmetic Progressions by Adi Akavia. The abstract reads:
A significant Fourier transform (SFT) algorithm, given a threshold $\tau$ and oracle access to a function $f$, outputs (the frequencies and approximate values of) all the $\tau$-significant Fourier coefficients of $f$, i.e., the Fourier coefficients whose magnitude exceeds $\tau \|f\|_2^2$. In this paper we present the first deterministic SFT algorithm for functions $f$ over $Z_N$ which is: (1) Local, i.e., its running time is polynomial in $\log N$, $1/\tau$ and $L_1(\hat f)$ (the $L_1$ norm of $f$'s Fourier transform). (2) Robust to random noise. This strictly extends the class of compressible/Fourier sparse functions over $Z_N$ efficiently handled by prior deterministic algorithms. As a corollary we obtain deterministic and robust algorithms for sparse Fourier approximation, compressed sensing and sketching. As a central tool, we prove that there are: 1. Explicit sets $A$ of size $\mathrm{poly}((\ln N)^d, 1/\varepsilon)$ with $\varepsilon$-discrepancy in all rank $d$ Bohr sets in $Z_N$. This extends the Razborov-Szemeredi-Wigderson result on $\varepsilon$-discrepancy in arithmetic progressions to Bohr sets, which are their higher rank analogue. 2. Explicit sets $A_P$ of size $\mathrm{poly}(\ln N, 1/\varepsilon)$ that $\varepsilon$-approximate the uniform distribution over a given arithmetic progression $P$ in $Z_N$, in the sense that $|E_{x \in A_P}\, \chi(x) - E_{x \in P}\, \chi(x)| \le \varepsilon$ for all linear tests $\chi$ in $Z_N$. This extends results on small biased sets, which are sets approximating the uniform distribution over the entire domain, to sets approximating uniform distributions over (arbitrary size) arithmetic progressions. These results may be of independent interest.
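To fix ideas, here is what "outputting the $\tau$-significant Fourier coefficients" means in a naive, non-local way using a full FFT (the normalization is illustrative); Akavia's contribution is doing this deterministically with running time polynomial in $\log N$ rather than reading all $N$ samples:

import numpy as np

def significant_fourier_coeffs(f_samples, tau):
    # Naive, full-FFT baseline: return (frequency, coefficient) pairs whose squared
    # magnitude exceeds tau times the energy of f (Parseval, same normalization).
    N = len(f_samples)
    fhat = np.fft.fft(f_samples) / N
    energy = np.sum(np.abs(fhat) ** 2)
    keep = np.abs(fhat) ** 2 > tau * energy
    return [(int(k), fhat[k]) for k in np.nonzero(keep)[0]]

# tiny demo: one strong tone at frequency 17 plus a little noise
x = np.arange(256)
f = 3.0 * np.exp(2j * np.pi * 17 * x / 256) + 0.1 * np.random.default_rng(0).standard_normal(256)
print(significant_fourier_coeffs(f, 0.5))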
Fast Singular Value Thresholding without Singular Value Decomposition by Jian-Feng Cai and Stanley Osher. The abstract reads:
We are interested in solving the minimization problem $\min_X \frac{1}{2}\|X - Y\|_F^2 + \tau \|X\|_*$, where $Y$ is a given matrix, $\|\cdot\|_F$ is the Frobenius norm and $\|\cdot\|_*$ the nuclear norm. This problem serves as a basic subroutine in many popular numerical schemes for nuclear norm minimization problems, which arise from low rank matrix recovery such as matrix completion. As the solution has an explicit expression which shrinks the singular values of $Y$ and keeps the singular vectors, it is referred to as the singular value thresholding (SVT) operator in the literature. Conventional approaches first find the singular value decomposition (SVD) of $Y$ and then shrink the singular values. However, such approaches are time consuming under some circumstances, especially when the rank of the solution is not low compared to the matrix dimension, or is completely unpredictable. In this paper, we propose a fast algorithm for computing the SVT operator directly without using SVDs. Numerical experiments show that the proposed algorithm is much more efficient than the approach using the full SVD.
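For comparison, the conventional route the authors want to avoid is a full SVD followed by shrinkage of the singular values; a minimal sketch:

import numpy as np

def svt_full_svd(Y, tau):
    # Conventional singular value thresholding: full SVD, shrink the singular values
    # by tau, keep the singular vectors. This is the (slow) baseline the paper avoids.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt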

The following has similarities with the multiplicative noise papers found in CS: Sparse Recovery under Matrix Uncertainty by Mathieu Rosenbaum and Alexandre B. Tsybakov. The abstract reads:

We consider the model
$y = X\theta^* + \xi$, $Z = X + \Xi$,
where the random vector $y \in \mathbb{R}^n$ and the random $n \times p$ matrix $Z$ are observed, the $n \times p$ matrix $X$ is unknown, $\Xi$ is an $n \times p$ random noise matrix, $\xi \in \mathbb{R}^n$ is a noise independent of $\Xi$, and $\theta^*$ is a vector of unknown parameters to be estimated. The matrix uncertainty is in the fact that $X$ is observed with additive error. For dimensions $p$ that can be much larger than the sample size $n$ we consider the estimation of sparse vectors $\theta^*$. Under the matrix uncertainty, the Lasso and Dantzig selector turn out to be extremely unstable in recovering the sparsity pattern (i.e., the set of non-zero components of $\theta^*$), even if the noise level is very small. We suggest new estimators called matrix uncertainty selectors (or, shortly, MU-selectors) which are close to $\theta^*$ in different norms and in the prediction risk if the Restricted Eigenvalue assumption on $X$ is satisfied. We also show that under somewhat stronger assumptions these estimators recover correctly the sparsity pattern.
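A tiny simulation of the observation model (sizes and noise levels are arbitrary) makes the setting concrete; the MU-selector itself, which budgets for the design noise in its constraint, is not implemented here:

import numpy as np

# Toy instance of the model y = X theta* + xi, Z = X + Xi: the estimator only
# sees the noisy design Z. Sizes and noise levels are arbitrary.
rng = np.random.default_rng(1)
n, p, s = 100, 500, 5
X = rng.standard_normal((n, p)) / np.sqrt(n)
theta_star = np.zeros(p)
theta_star[:s] = 1.0
xi = 0.01 * rng.standard_normal(n)            # observation noise
Xi = 0.05 * rng.standard_normal((n, p))       # additive noise on the design matrix
y = X @ theta_star + xi
Z = X + Xi
# Feeding (Z, y) to an ordinary Lasso is exactly the regime where the paper reports
# unstable support recovery; the MU-selector instead adds a term budgeting for ||Xi||.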

Estimation of High-Dimensional Low Rank Matrices By Angelika Rohde and Alexandre B. Tsybakov. The abstract reads:
Suppose that we observe entries or, more generally, linear combinations of entries of an unknown m×T-matrix A corrupted by noise. We are particularly interested in the high-dimensional setting where the number mT of unknown entries can be much larger than the sample size N. Motivated by several applications, we consider estimation of matrix A under the assumption that it has small rank. This can be viewed as dimension reduction or sparsity assumption. In order to shrink towards a low-rank representation, we investigate penalized least squares estimators with a Schatten-p quasi-norm penalty term, $p \le 1$. We study these estimators under two possible assumptions – a modified version of the restricted isometry condition and a uniform bound on the ratio "empirical norm induced by the sampling operator / Frobenius norm". The main results are stated as non-asymptotic upper bounds on the prediction risk and on the Schatten-q risk of the estimators, where $q \in [p, 2]$. The rates that we obtain for the prediction risk are of the form rm/N (for m = T), up to logarithmic factors, where r is the rank of A. The particular examples of multitask learning and matrix completion are worked out in detail. The proofs are based on tools from the theory of empirical processes. As a by-product we derive bounds for the kth entropy numbers of the quasi-convex Schatten class embeddings $S_p^M \hookrightarrow S_2^M$, $p < 1$, which are of independent interest.
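For the p = 1 member of the Schatten family (the nuclear norm), the penalized least squares estimator can be computed by proximal gradient with singular value soft-thresholding. Here is a minimal matrix-completion sketch; it illustrates the estimator, not the risk analysis of the paper:

import numpy as np

def nuclear_prox(M, tau):
    # Proximal operator of tau * ||.||_* : soft-threshold the singular values
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete_matrix(Y, mask, lam, n_iter=300):
    # Nuclear-norm (Schatten-1) penalized least squares for matrix completion:
    # minimize 0.5 * ||mask * (A - Y)||_F^2 + lam * ||A||_* by proximal gradient.
    A = np.zeros_like(Y)
    for _ in range(n_iter):
        grad = mask * (A - Y)              # gradient of the data-fit term (1-Lipschitz)
        A = nuclear_prox(A - grad, lam)    # unit step is valid for a 1-Lipschitz gradient
    return A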

Giuseppe Paleologo on Twitter pointed me to: Spectral Density of Sparse Sample Covariance Matrices by Taro Nagao, Toshiyuki Tanaka. The abstract reads:
Applying the replica method of statistical mechanics, we evaluate the eigenvalue density of the large random matrix (sample covariance matrix) of the form $J = A^{\rm T} A$, where $A$ is an $M \times N$ real sparse random matrix. The difference from a dense random matrix is the most significant in the tail region of the spectrum. We compare the results of several approximation schemes, focusing on the behavior in the tail region.
from there:
One of the simplest ways to modify the random matrix J so that the Marcenko-Pastur law breaks down is to make the matrix A sparse. An example demonstrating the significance of considering random matrix ensembles defined on the basis of sparse A can be found in communication theory: the information theoretic channel capacity of a randomly-spread Code-Division Multiple-Access (CDMA) channel is evaluated in terms of the eigenvalue distribution of the matrix $J = A^{\rm T} A$ with A defining the random spreading. It has been argued [11] that some wideband CDMA schemes can be modeled as a sparsely-spread system, where deviations from the Marcenko-Pastur law may affect the performance of such systems. Sparse random matrices in general are also of interest in many branches of applications. In particular, the eigenvector localization expected to occur in sparse random matrices is one of the most outstanding phenomena in disordered systems. The appearance of isolated eigenvalue spectra in the tail region is another interesting feature. Such features are also observed in heavy-tailed random matrices and were recently studied in detail [12, 13, 14].
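The deviation from the Marcenko-Pastur law is easy to observe empirically. Here is a small sketch (dimensions and sparsity level chosen arbitrarily) that compares the extreme eigenvalues of $J = A^{\rm T} A$ for dense and sparse A with equal entry variance:

import numpy as np

# Compare the spectrum of J = A^T A for dense and sparse A with equal entry variance.
# The tail of the sparse spectrum is where the deviation from Marcenko-Pastur shows up.
rng = np.random.default_rng(0)
M, N, c = 2000, 1000, 0.02                   # c = fraction of nonzero entries of A
A_dense = rng.standard_normal((M, N)) / np.sqrt(M)
A_sparse = A_dense * (rng.random((M, N)) < c) / np.sqrt(c)   # rescaled to the same variance
eig_dense = np.linalg.eigvalsh(A_dense.T @ A_dense)
eig_sparse = np.linalg.eigvalsh(A_sparse.T @ A_sparse)
print("largest eigenvalue, dense vs sparse:", eig_dense[-1], eig_sparse[-1])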
While I was watching some presentations at the conference on Random Matrices, I came across this paper:

In this contribution, we provide a theoretical study of two hypothesis tests allowing one to detect the presence of an unknown transmitter using several sensors. Both tests are based on the analysis of the eigenvalues of the sampled covariance matrix of the received signal. The Generalized Likelihood Ratio Test (GLRT) derived in [1] is analyzed under the assumption that both the number K of sensors and the length N of the observation window tend to infinity at the same rate: $K/N \to c \in (0, 1)$. The GLRT is compared with a test based on the condition number which is used in cognitive radio applications. Using results of random matrix theory for spiked models and tools of Large Deviations, we provide the error exponent curve associated with both tests and prove that the GLRT outperforms the test based on the condition number.
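Up to normalization details, the two statistics being compared are easy to compute from the sample covariance matrix; a toy sketch with made-up sensor counts and a rank-one source:

import numpy as np

def detection_statistics(Y):
    # Eigenvalue-based detection statistics from K sensors x N snapshots, up to
    # normalization details: the GLRT compares the largest eigenvalue of the sample
    # covariance with the average eigenvalue; the alternative uses the condition number.
    K, N = Y.shape
    R = (Y @ Y.T) / N                        # sample covariance matrix
    eig = np.linalg.eigvalsh(R)              # eigenvalues in ascending order
    glrt = eig[-1] / eig.mean()              # largest vs. average eigenvalue
    cond = eig[-1] / eig[0]                  # condition number
    return glrt, cond

# made-up example: 8 sensors, 200 snapshots, noise only vs. a rank-one source added
rng = np.random.default_rng(3)
noise = rng.standard_normal((8, 200))
source = 0.5 * np.outer(rng.standard_normal(8), rng.standard_normal(200))
print(detection_statistics(noise), detection_statistics(noise + source))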
Eventually, Raj Rao and I talked a little bit about the fascinating Heard Island Feasibility Test. Brian Dushaw, the main scientist involved in the project, let me know that he'll look into whether the 3 GB of data can be made available on the site. I wonder if one could do some sort of blind deconvolution on this dataset.

Of interest also:

Manifold reconstruction using Tangential Delaunay Complexes by Jean-Daniel Boissonnat, Arijit Ghosh. The abstract reads:
We give a provably correct algorithm to reconstruct a k-dimensional manifold embedded in d-dimensional Euclidean space. Input to our algorithm is a point sample coming from an unknown manifold. Our approach is based on two main ideas: the notion of tangential Delaunay complex, and the technique of sliver removal by weighting the sample points. Differently from previous methods, we do not construct any subdivision of the embedding d-dimensional space. As a result, the running time of our algorithm depends only linearly on the extrinsic dimension d, while it depends quadratically on the size of the input sample and exponentially on the intrinsic dimension k. To the best of our knowledge, this is the first certified algorithm for manifold reconstruction whose complexity depends linearly on the ambient dimension. We also prove that for a dense enough sample the output of our algorithm is isotopic to the manifold and a close geometric approximation of the manifold.
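The tangent-space estimation that such a reconstruction relies on can be sketched with local PCA; this is only a building block, not the tangential Delaunay construction or the sliver removal step:

import numpy as np

def tangent_space(points, i, k_neighbors, intrinsic_dim):
    # Estimate the tangent space at sample point i by local PCA over its nearest
    # neighbors. This is only the tangent-estimation building block; the paper then
    # builds Delaunay triangulations restricted to these tangent planes and removes
    # slivers by weighting the sample points.
    p = points[i]
    d2 = np.sum((points - p) ** 2, axis=1)
    nbrs = points[np.argsort(d2)[1:k_neighbors + 1]] - p     # centered nearest neighbors
    _, _, Vt = np.linalg.svd(nbrs, full_matrices=False)
    return Vt[:intrinsic_dim]      # rows form an orthonormal basis of the tangent estimate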

1 comment:

Anonymous said...

singular value thresholding:
here is an even simpler algorithm that does not need any SVDs, in case you are interested: www.m8j.net/data/List/Files-149/fastRegNuclearNormOptimization.pdf
