Friday, March 28, 2014

The Long Post of the Month


Here are some interesting preprints found on arXiv:

We propose a method for calculating Wannier functions of periodic solids directly from a modified variational principle for the energy, subject to the requirement that the Wannier functions are orthogonal to all their translations ("shift-orthogonality"). Localization is achieved by adding an $L_1$ regularization term to the energy functional. This approach results in "compressed" Wannier modes with compact support, where one parameter $\mu$ controls the trade-off between the accuracy of the total energy and the size of the support of the Wannier modes. Efficient algorithms for shift-orthogonalization and solution of the variational minimization problem are demonstrated.
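
For the curious, here is a toy numpy sketch of the idea behind such compressed modes (my own illustration, not the authors' algorithm): a gradient step on the energy, a soft-thresholding step for the $L_1$ term, and a plain re-orthonormalization standing in for the paper's shift-orthogonality, with a 1D periodic discrete Laplacian playing the Hamiltonian. All sizes and parameters are made up.

```python
import numpy as np

# Toy "compressed modes" sketch (not the authors' exact scheme):
# minimize tr(W^T H W) + mu * ||W||_1 over (approximately) orthonormal W.
n, k, mu, step = 128, 4, 0.05, 0.1

# 1D discrete Laplacian with periodic boundary, standing in for the Hamiltonian
I = np.eye(n)
H = 2 * I - np.roll(I, 1, axis=0) - np.roll(I, -1, axis=0)

rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.standard_normal((n, k)))      # random orthonormal start
for _ in range(500):
    G = W - step * (H @ W)                            # gradient step on the energy
    G = np.sign(G) * np.maximum(np.abs(G) - step * mu, 0.0)  # L1 prox (soft threshold)
    W, _ = np.linalg.qr(G)                            # re-orthonormalize (simplified)

print("energy:", np.trace(W.T @ H @ W))
print("fraction of near-zero entries:", np.mean(np.abs(W) < 1e-3))
```

Larger $\mu$ shrinks the support of the modes at the cost of a less accurate total energy, which is exactly the trade-off the abstract describes.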

The Sketching Complexity of Graph Cuts

We study the problem of sketching an input graph so that, given the sketch, one can estimate the value (capacity) of any cut in the graph up to a $1+\epsilon$ approximation. Our results include both upper and lower bounds on the sketch size, expressed in terms of the vertex-set size $n$ and the accuracy $\epsilon$. 
We design a randomized scheme which, given $\epsilon\in(0,1)$ and an $n$-vertex graph $G=(V,E)$ with edge capacities, produces a sketch of size $\tilde O(n/\epsilon)$ bits, from which the capacity of any cut $(S,V\setminus S)$ can be reported, with high probability, within approximation factor $(1+\epsilon)$. The previous upper bound is $\tilde O(n/\epsilon^2)$ bits, which follows by storing a cut sparsifier graph as constructed by Benczur and Karger (1996) and follow-up work. 
In contrast, we show that if a sketch succeeds in estimating the capacity of all cuts $(S,\bar S)$ in the graph (simultaneously), it must be of size $\Omega(n/\epsilon^2)$ bits.

DIPP - Distributed Parallel Pursuit

We consider a distributed compressed sensing scenario where multiple sensors observe correlated sparse signals and the sensors are connected through a network. The signal correlation is realized by a partially common support-set. For such a system, the main objective of this paper is to develop a greedy pursuit algorithm. To fulfill this objective, we develop the distributed parallel pursuit algorithm and provide a restricted isometry property based analysis of its convergence and reconstruction performance. Through simulations we show that distributed parallel pursuit performs well over a reasonably connected network.

Compressive Pattern Matching on Multispectral Data

We introduce a new constrained minimization problem that performs template and pattern detection on a multispectral image in a compressive sensing context. We use an original minimization problem from Guo and Osher that uses $L_1$ minimization techniques to perform template detection in a multispectral image. We first adapt this minimization problem to work with compressive sensing data. Then we extend it to perform pattern detection using a formal transform called the spectralization along a pattern. That extension brings out the problem of measurement reconstruction. We introduce shifted measurements that allow us to reconstruct all the measurements with a small overhead, and we give an optimality constraint for simple patterns. We present numerical results showing the performance of the original minimization problem and its compressed variants at different measurement rates, applied to remotely sensed data.

Minimization of multi-penalty functionals by alternating iterative thresholding and optimal parameter choices

Inspired by several recent developments in regularization theory, optimization, and signal processing, we present and analyze a numerical approach to multi-penalty regularization in spaces of sparsely represented functions. The sparsity prior is motivated by the largely expected geometrical/structured features of high-dimensional data, which may not be well represented in the framework of typically more isotropic Hilbert spaces. In this paper, we are particularly interested in regularizers that are able to correctly model and separate the multiple components of additively mixed signals. This situation is rather common, as pure signals may be corrupted by additive noise. To this end, we consider a regularization functional composed of a data-fidelity term, in which signal and noise are additively mixed, a non-smooth and non-convex sparsity-promoting term, and a penalty term to model the noise. We propose an iterative alternating algorithm based on simple iterative thresholding steps to minimize this functional, and we analyze its convergence. By means of this algorithm, we explore the effect of choosing different regularization parameters and penalization norms in terms of the quality of recovering the pure signal and separating it from additive noise. For a given fixed noise level, numerical experiments confirm a significant improvement in performance compared to standard one-parameter regularization methods. Using high-dimensional data analysis methods such as Principal Component Analysis, we are able to show the correct geometrical clustering of regularized solutions around the expected solution. Finally, for the compressive sensing problems considered in our experiments, we provide a guideline for the choice of regularization norms and parameters.
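
To make the alternating-thresholding idea concrete, here is a toy convex simplification (mine; the paper also treats non-smooth, non-convex sparsity terms): minimize $\|A(u+v)-y\|_2^2 + \alpha\|u\|_1 + \beta\|v\|_2^2$ with a soft-thresholded gradient step in the sparse component $u$ and an exact ridge update in the noise component $v$. All data below are made up.

```python
import numpy as np

# Toy two-penalty decomposition: sparse component u, noise-like component v.
rng = np.random.default_rng(1)
m, n = 80, 200
A = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n)
u_true[rng.choice(n, 8, replace=False)] = rng.standard_normal(8)
y = A @ u_true + 0.05 * rng.standard_normal(m)        # sparse signal plus noise

alpha, beta = 0.05, 5.0
Lu = 2 * np.linalg.norm(A, 2) ** 2                    # Lipschitz constant in u
Minv = np.linalg.inv(A.T @ A + beta * np.eye(n))      # matrix for the exact v-update
u, v = np.zeros(n), np.zeros(n)
for _ in range(200):
    z = u - (2 / Lu) * (A.T @ (A @ (u + v) - y))      # gradient step in u
    u = np.sign(z) * np.maximum(np.abs(z) - alpha / Lu, 0.0)  # soft threshold (L1 prox)
    v = Minv @ (A.T @ (y - A @ u))                    # exact ridge update in v

print("recovered support:", np.flatnonzero(np.abs(u) > 1e-3))
```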

Beyond L2-Loss Functions for Learning Sparse Models

Incorporating sparsity priors in learning tasks can give rise to simple and interpretable models for complex high-dimensional data. Sparse models have found widespread use in structure discovery, in recovering data from corruptions, and in a variety of large-scale unsupervised and supervised learning problems. Assuming the availability of sufficient data, these methods infer dictionaries for sparse representations by optimizing for high-fidelity reconstruction. In most scenarios, the reconstruction quality is measured using the squared Euclidean distance, and efficient algorithms have been developed for both batch and online learning. However, new application domains motivate looking beyond conventional loss functions. For example, robust loss functions such as $\ell_1$ and Huber are useful in learning outlier-resilient models, and the quantile loss is beneficial in discovering structures that are representative of a particular quantile. These new applications motivate our work in generalizing sparse learning to a broad class of convex loss functions. In particular, we consider the class of piecewise linear quadratic (PLQ) cost functions that includes Huber, as well as $\ell_1$, quantile, Vapnik, hinge loss, and smoothed variants of these penalties. We propose an algorithm to learn dictionaries and obtain sparse codes when the data reconstruction fidelity is measured using any smooth PLQ cost function. We provide convergence guarantees for the proposed algorithm, and demonstrate the convergence behavior using empirical experiments. Furthermore, we present three case studies that require the use of PLQ cost functions: (i) robust image modeling, (ii) tag refinement for image annotation and retrieval, and (iii) computing empirical confidence limits for subspace clustering.

Random coordinate descent methods for $\ell_0$ regularized convex optimization

In this paper we study the minimization of $\ell_0$ regularized optimization problems, where the objective function is composed of a smooth convex function and the $\ell_0$ regularization. We analyze necessary optimality conditions for this nonconvex problem, which lead to the separation of the local minima into two restricted classes. Based on these restricted classes of local minima, we devise new random coordinate descent type methods for solving these problems. In particular, we analyze the properties of an iterative hard thresholding based random coordinate descent algorithm, for which we prove that any limit point is a local minimum from the first restricted class of local minimizers. We then analyze the convergence of a random proximal alternating minimization method and show that any limit point of this algorithm is a local minimum from the second restricted class of local minimizers. Under a strong convexity assumption, we prove linear convergence in probability for both methods. We also provide numerical experiments which show the superior behavior of our methods in comparison with the usual iterative hard thresholding algorithm.
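
Here is a toy numpy sketch with the flavor of these methods (illustrative only, not the paper's exact scheme): for $\min_x \|Ax-y\|_2^2 + \lambda\|x\|_0$, minimize exactly over one randomly chosen coordinate at a time, which yields a per-coordinate hard-thresholding decision.

```python
import numpy as np

# Random coordinate descent for l0-regularized least squares (toy sketch):
# min_x ||A x - y||^2 + lam * ||x||_0, exact update of one random coordinate.
rng = np.random.default_rng(2)
m, n, lam = 60, 120, 0.02
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(m)

x = np.zeros(n)
col_sq = np.sum(A**2, axis=0)                         # ||a_i||^2 for each column
r = y - A @ x                                         # running residual
for _ in range(20000):
    i = rng.integers(n)
    t = (A[:, i] @ r + col_sq[i] * x[i]) / col_sq[i]  # best value for x_i alone
    xi_new = t if col_sq[i] * t * t > lam else 0.0    # keep it only if it beats lam
    r += A[:, i] * (x[i] - xi_new)                    # keep the residual in sync
    x[i] = xi_new

print("support found:", np.flatnonzero(x))
```

The decision rule is the coordinate-wise hard threshold: setting $x_i$ to its unconstrained optimum decreases the quadratic term by $\|a_i\|^2 t^2$, so it is worthwhile only when that gain exceeds the $\lambda$ paid to the $\ell_0$ term.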

A note on compressed sensing of structured sparse wavelet coefficients from subsampled Fourier measurements

This note complements the paper "The quest for optimal sampling: Computationally efficient, structure-exploiting measurements for compressed sensing" [2]. Its purpose is to present a proof of a result stated therein concerning the recovery via compressed sensing of a signal that has structured sparsity in a Haar wavelet basis when sampled using a multilevel-subsampled discrete Fourier transform. In doing so, it provides a simple exposition, in the case of Haar wavelets and discrete Fourier samples, of a more general result recently provided in the paper "Breaking the coherence barrier: A new theory for compressed sensing" [1].

The quest for optimal sampling: Computationally efficient, structure-exploiting measurements for compressed sensing

An intriguing phenomenon in many instances of compressed sensing is that the reconstruction quality is governed not just by the overall sparsity of the signal, but also by its structure. This paper is about understanding this phenomenon and demonstrating how it can be fruitfully exploited through the design of suitable sampling strategies that outperform more standard compressed sensing techniques based on random matrices.

Minimax Optimal Rates for Poisson Inverse Problems with Physical Constraints

This paper considers fundamental limits for solving sparse inverse problems in the presence of Poisson noise with physical constraints. Such problems arise in a variety of applications, including photon-limited imaging systems based on compressed sensing. Most prior theoretical results in compressed sensing and related inverse problems apply to idealized settings where the noise is i.i.d., and do not account for signal-dependent noise and physical sensing constraints. Prior results on Poisson compressed sensing with signal-dependent noise and physical constraints provided upper bounds on mean squared error performance for a specific class of estimators. However, it was unknown whether those bounds were tight or if other estimators could achieve significantly better performance. This work provides minimax lower bounds on mean-squared error for sparse Poisson inverse problems under physical constraints. Our lower bounds are complemented by minimax upper bounds. Together, they reveal that, due to the interplay between the Poisson noise model, the sparsity constraint, and the physical constraints: (i) the mean-squared error does not depend on the sample size $n$ other than through the requirement that the sensing matrix satisfy RIP-like conditions, while the intensity $T$ of the input signal plays a critical role; and (ii) the mean-squared error has two distinct regimes, low-intensity and high-intensity, whose transition point depends on the input signal $f^*$. In the low-intensity regime the mean-squared error is independent of $T$, while in the high-intensity regime it scales as $\frac{s \log p}{T}$, where $s$ is the sparsity level, $p$ is the number of pixels or parameters, and $T$ is the signal intensity.

A Family of Subgradient-Based Methods for Convex Optimization Problems in a Unifying Framework

We consider subgradient- and gradient-based methods for convex optimization problems whose feasible region is simple enough. We unify the construction of the subproblems that must be solved at each iteration of these methods. Our unifying framework provides a novel analysis of (sub)gradient-based methods and permits us to propose a family of optimal-complexity methods. For the non-smooth case, we propose an extension of the mirror-descent method originally introduced by Nemirovski and Yudin, and overcome its drawback concerning optimal convergence. Our analysis also applies to the dual-averaging method proposed by Nesterov, with simpler arguments for its complexity analysis. Finally, we propose (inexact) gradient-based methods for structured optimization problems, such as those with composite structure or an inexact oracle model. The proposed family of classical gradient methods and their accelerations generalizes several primal/dual gradient methods as well as Tseng's accelerated proximal gradient methods.
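
As a reminder of the classical starting point, here is a minimal sketch of the Nemirovski-Yudin entropic mirror-descent update on the probability simplex, applied to a toy quadratic objective (my own illustration, not from the paper):

```python
import numpy as np

# Entropic mirror descent on the simplex for f(x) = 0.5*||A x - b||^2:
# the mirror (entropy) update is multiplicative, and the Bregman projection
# back onto the simplex is just a renormalization.
rng = np.random.default_rng(3)
n = 50
A = rng.standard_normal((20, n))
b = rng.standard_normal(20)

x = np.full(n, 1.0 / n)                               # start at the simplex center
for k in range(1, 501):
    g = A.T @ (A @ x - b)                             # (sub)gradient at x
    eta = 0.1 / np.sqrt(k)                            # classical step-size schedule
    w = x * np.exp(-eta * g)                          # mirror update in the dual
    x = w / w.sum()                                   # project back onto the simplex

print(f"objective = {0.5 * np.linalg.norm(A @ x - b)**2:.4f}")
```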

Stabilizing dual-energy X-ray computed tomography reconstructions using patch-based regularization

Recent years have seen growing interest in exploiting dual- and multi-energy measurements in computed tomography (CT) in order to characterize material properties as well as object shape. Material characterization is performed by decomposing the scene into constitutive basis functions, such as Compton scatter and photoelectric absorption functions. While well motivated physically, the joint recovery of the spatial distribution of photoelectric and Compton properties is severely complicated by the fact that the data are several orders of magnitude more sensitive to Compton scatter coefficients than to photoelectric absorption, so small errors in Compton estimates can create large artifacts in the photoelectric estimate. To address these issues, we propose a model-based iterative approach which uses patch-based regularization terms to stabilize the inversion of photoelectric coefficients, and we solve the resulting problem through the use of computationally attractive Alternating Direction Method of Multipliers (ADMM) solution techniques. Using simulations and experimental data acquired on a commercial scanner, we demonstrate that the proposed processing can lead to more stable material property estimates, which should aid materials characterization in future dual- and multi-energy CT systems.

Dynamical systems and forward-backward algorithms associated with the sum of a convex subdifferential and a monotone cocoercive operator

In a Hilbert framework, we introduce continuous and discrete dynamical systems which aim at solving inclusions governed by structured monotone operators $A=\partial\Phi+B$, where $\partial\Phi$ is the subdifferential of a convex lower semicontinuous function $\Phi$, and $B$ is a monotone cocoercive operator. We first consider the extension to this setting of the regularized Newton dynamic with two potentials. Then, we revisit some related dynamical systems, namely the semigroup of contractions generated by $A$, and the continuous gradient projection dynamic. By a Lyapunov analysis, we show the convergence properties of the orbits of these systems.
The time discretization of these dynamics yields various forward-backward splitting methods (some of them new) for solving structured monotone inclusions involving non-potential terms. The convergence of these algorithms is obtained under a classical step-size limitation. Perspectives are given in the field of numerical splitting methods for optimization and multi-criteria decision processes.
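
A minimal sketch of one such forward-backward discretization, with $\Phi = \lambda\|\cdot\|_1$ (so the backward step is a soft-thresholding prox) and an affine cocoercive operator $B$ handled by an explicit forward step; the data below are made up:

```python
import numpy as np

# Forward-backward iteration for 0 in dPhi(x) + B(x), with Phi = lam*||.||_1
# and B an affine monotone operator built from a positive definite matrix
# (hence cocoercive). Step size respects the classical limitation.
rng = np.random.default_rng(4)
n, lam = 100, 0.1
M = rng.standard_normal((n, n))
Q = M @ M.T / n + 0.1 * np.eye(n)                     # symmetric positive definite
c = rng.standard_normal(n)
B = lambda x: Q @ x + c                               # cocoercive affine operator

gamma = 1.0 / np.linalg.norm(Q, 2)                    # step within 2/L, L = ||Q||
x = np.zeros(n)
for _ in range(2000):
    z = x - gamma * B(x)                              # forward (explicit) step on B
    x = np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)  # backward prox on Phi

z = x - gamma * B(x)                                  # fixed-point residual check
fp = np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)
print("fixed-point residual:", np.linalg.norm(fp - x))
```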

Embedding Cryptographic Features in Compressive Sensing

Compressive sensing (CS) has been widely studied and applied in many fields. Recently, how to perform secure compressive sensing (SCS) has become a topic of growing interest. Existing works on SCS usually take the sensing matrix as a key, and the resulting security level is not evaluated in depth. They can only be considered a preliminary exploration of SCS, as a concrete and operable encryption model has not yet been given. In this paper, we investigate SCS in a systematic way. The relationship between CS and symmetric-key ciphers indicates some possible encryption models. To this end, we propose two-level protection models (TLPM) for SCS, built on the taking of measurements and on an additional scrambling layer, respectively. It is believed that these models will provide a new point of view and stimulate further research in both CS and cryptography. Specifically, an efficient and secure encryption scheme for parallel compressive sensing (PCS) is designed by embedding a two-layer protection in PCS using chaos. The first layer is a random permutation of a two-dimensional signal, which is proved to be an acceptable permutation with overwhelming probability. The second layer samples the permuted signal column by column with the same chaotic measurement matrix, which satisfies the restricted isometry property of PCS with overwhelming probability. Both the random permutation and the measurement matrix are constructed under the control of a chaotic system. Simulation results show that, unlike general joint compression and encryption schemes in which encryption always leads to the same or a lower compression ratio, the proposed approach of embedding encryption in PCS actually improves the compression performance. Moreover, the proposed approach possesses high transmission robustness against additive white Gaussian noise and cropping attacks.

Random Matrices and Erasure Robust Frames

Data erasure can often occur in communication. Guarding against erasures involves redundancy in data representation. Mathematically this may be achieved through the use of frames. One way to measure the robustness of a frame against erasures is to examine the worst-case condition number of the frame with a certain number of vectors erased from it. The term {\em numerically erasure-robust frames (NERFs)} was introduced in \cite{FicMix12} to give a more precise characterization of the erasure robustness of frames. In that paper the authors established that random frames whose entries are drawn independently from the standard normal distribution can be robust against up to approximately 15\% erasures, and asked whether there exist frames that are robust against erasures of more than 50\%. In this paper we show that, with very high probability, random frames are, independently of the dimension, robust against any number of erasures as long as the number of remaining vectors is at least $1+\delta_0$ times the dimension, for some $\delta_0>0$. This is the best possible result, and it also implies that the proportion of erasures can be arbitrarily close to 1 while still maintaining robustness. Our result depends crucially on a new estimate for the smallest singular value of a rectangular random matrix with independent standard normal entries.
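
The claim is easy to probe numerically. The toy experiment below (mine, not from the paper) erases all but $(1+\delta_0)d$ vectors of a standard Gaussian frame, here roughly 97\% erasures, and checks the condition number of what remains:

```python
import numpy as np

# Empirical erasure robustness of a Gaussian random frame: keep only
# (1 + delta0) * d of the N frame vectors and inspect the condition number.
rng = np.random.default_rng(5)
d, N, delta0 = 50, 2000, 0.2
F = rng.standard_normal((N, d)) / np.sqrt(d)          # N frame vectors as rows

keep = rng.choice(N, size=int((1 + delta0) * d), replace=False)  # survivors
s = np.linalg.svd(F[keep], compute_uv=False)          # singular values of the subframe
print(f"kept {len(keep)} of {N} vectors, condition number = {s[0]/s[-1]:.2f}")
```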

Viewing the Welch bound inequality from the kernel trick viewpoint

This brief note views the Welch bound inequality through the kernel trick from machine learning research, and from this angle gives a novel interpretation of the inequality.
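
For the curious, the Gram-matrix ("kernel") view is easy to illustrate: $G = FF^T$ has trace $N$ and rank at most $d$, so $\|G\|_F^2 \ge N^2/d$, which forces a large off-diagonal entry. A quick numeric check of the resulting Welch bound $\max_{i\neq j}\langle f_i,f_j\rangle^2 \ge \frac{N-d}{d(N-1)}$:

```python
import numpy as np

# Numeric illustration of the Welch bound for N unit vectors in R^d.
rng = np.random.default_rng(6)
d, N = 8, 32
F = rng.standard_normal((N, d))
F /= np.linalg.norm(F, axis=1, keepdims=True)         # unit-norm frame vectors

G = F @ F.T                                           # Gram (kernel) matrix
coherence_sq = np.max((G - np.eye(N))**2)             # largest squared cross-correlation
welch = (N - d) / (d * (N - 1))
print(f"max squared cross-correlation = {coherence_sq:.4f} >= Welch bound {welch:.4f}")
print(f"||G||_F^2 = {np.sum(G**2):.2f} >= N^2/d = {N**2/d:.2f}")
```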

SRA: Fast Removal of General Multipath for ToF Sensors

A major issue with Time of Flight sensors is the presence of multipath interference. We present Sparse Reflections Analysis (SRA), an algorithm for removing this interference which has two main advantages. First, it allows for very general forms of multipath, including interference with three or more paths, diffuse multipath resulting from Lambertian surfaces, and combinations thereof. SRA removes this general multipath with robust techniques based on $L_1$ optimization. Second, due to a novel dimension reduction, we are able to produce a very fast version of SRA, which is able to run at frame rate. Experimental results on both synthetic data with ground truth, as well as real images of challenging scenes, validate the approach.

On Compressive Sensing in Coding Problems: A Rigorous Approach

We take an information theoretic perspective on a classical sparse-sampling noisy linear model and present an analytical expression for the mutual information, which plays a central role in a variety of communications/processing problems. Such an expression was previously addressed either by bounds, by simulations, or by the (non-rigorous) replica method. The expression of the mutual information is based on techniques used in [1], addressing the minimum mean square error (MMSE) analysis. Using these expressions, we study specifically a variety of sparse linear communications models which include coding in different settings, accounting also for multiple access channels and different wiretap problems. For those, we provide single-letter expressions and derive achievable rates, capturing the communications/processing features of these timely models.

Sample-Optimal Fourier Sampling in Any Constant Dimension -- Part I

We give an algorithm for $\ell_2/\ell_2$ sparse recovery from Fourier measurements using $O(k\log N)$ samples, matching the lower bound of \cite{DIPW} for non-adaptive algorithms up to constant factors for any $k\leq N^{1-\delta}$. The algorithm runs in $\tilde O(N)$ time. Our algorithm extends to higher dimensions, leading to sample complexity of $O_d(k\log N)$, which is optimal up to constant factors for any $d=O(1)$. These are the first sample-optimal algorithms for these problems. 
A preliminary experimental evaluation indicates that our algorithm has empirical sampling complexity comparable to that of other recovery methods known in the literature, while providing strong provable guarantees on the recovery quality.

Resource Allocation for Coordinated Multipoint Networks with Wireless Information and Power Transfer

This paper studies the resource allocation algorithm design for multiuser coordinated multipoint (CoMP) networks with simultaneous wireless information and power transfer (SWIPT). In particular, remote radio heads (RRHs) are connected to a central processor (CP) via capacity-limited backhaul links to facilitate CoMP joint transmission. In addition, the CP transfers energy to the RRHs for more efficient network operation. The considered resource allocation algorithm design is formulated as a non-convex optimization problem with a minimum required signal-to-interference-plus-noise ratio (SINR) constraint at multiple information receivers and a minimum required power transfer constraint at the energy harvesting receivers. By optimizing the transmit beamforming vectors at the CP and the energy sharing between the CP and the RRHs, we aim at jointly minimizing the total network transmit power and the maximum capacity consumption per backhaul link. The resulting non-convex optimization problem is NP-hard. In light of the intractability of the problem, we reformulate it by replacing the non-convex objective function with its convex hull, which enables the derivation of an efficient iterative resource allocation algorithm. In each iteration, a non-convex optimization problem is solved by semi-definite programming (SDP) relaxation, and the proposed iterative algorithm converges to a locally optimal solution of the original problem. Simulation results illustrate that our proposed algorithm achieves close-to-optimal performance and provides a significant reduction in backhaul capacity consumption compared to full cooperation. Moreover, the considered CoMP network is shown to provide superior system performance, as far as power consumption is concerned, compared to a traditional system with co-located multiple antennas.

Randomized pick-freeze for sparse Sobol indices estimation in high dimension

Yohann De Castro (LM-Orsay), Alexandre Janon (LM-Orsay, Méthodes d'Analyse Stochastique des Codes et Traitements Numériques)
This article investigates a new procedure to estimate the influence of each variable of a given function defined on a high-dimensional space. More precisely, we are concerned with describing a function $f$ of a large number $p$ of parameters that depends only on a small number $s$ of them. Our proposed method is an unconstrained $\ell_{1}$-minimization based on Sobol's method. We prove that, with only $\mathcal O(s\log p)$ evaluations of $f$, one can identify the relevant parameters.

Sparse Learning over Infinite Subgraph Features

We present a supervised-learning algorithm for graph data (a set of graphs) for arbitrary twice-differentiable loss functions and sparse linear models over all possible subgraph features. To date, it has been shown that under all possible subgraph features, several types of sparse learning, such as Adaboost, LPBoost, LARS/LASSO, and sparse PLS regression, can be performed. Particular emphasis is placed on the simultaneous learning of relevant features from an infinite set of candidates. We first generalize techniques used in all these preceding studies to derive a unifying bounding technique for arbitrary separable functions. We then carefully use this bound to make block coordinate gradient descent feasible over infinite subgraph features, resulting in a fast-converging algorithm that can solve a wider class of sparse learning problems over graph data. We also empirically study the differences from existing approaches in convergence behavior, selected subgraph features, and search-space sizes. We further discuss several previously unnoticed issues in sparse learning over all possible subgraph features.

A Compressive Sensing Based Approach to Sparse Wideband Array Design

Sparse wideband sensor array design for sensor location optimisation is highly nonlinear, and it is traditionally solved by genetic algorithms, simulated annealing, or similar optimization methods. However, this is an extremely time-consuming process, and more efficient solutions are needed. In this work, the problem is studied from the viewpoint of compressive sensing, and a formulation based on a modified $\ell_1$ norm is derived. As there are multiple coefficients associated with each sensor, the key is to make sure that these coefficients are simultaneously minimized in order to discard the corresponding sensor locations. Design examples are provided to verify the effectiveness of the proposed methods.

An Optimal-Dimensionality Sampling Scheme on the Sphere for Fast Spherical Harmonic Transforms

We develop a sampling scheme on the sphere that permits accurate computation of the spherical harmonic transform and its inverse for signals band-limited at $L$ using only $L^2$ samples. We obtain the optimal number of samples given by the degrees of freedom of the signal in harmonic space. The number of samples required in our scheme is a factor of two or four fewer than existing techniques, which require either $2L^2$ or $4L^2$ samples. We note, however, that we do not recover a sampling theorem on the sphere, where spherical harmonic transforms are theoretically exact. Nevertheless, we achieve high accuracy even for very large band-limits. For our optimal-dimensionality sampling scheme, we develop a fast and accurate algorithm to compute the spherical harmonic transform (and inverse), with computational complexity comparable with existing schemes in practice. We conduct numerical experiments to study in detail the stability, accuracy and computational complexity of the proposed transforms. We also highlight the advantages of the proposed sampling scheme and associated transforms in the context of potential applications.

Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties

We describe an asynchronous parallel stochastic proximal coordinate descent algorithm for minimizing a composite objective function, which consists of a smooth convex function plus a separable convex function. In contrast to previous analyses, our model of asynchronous computation accounts for the fact that components of the unknown vector may be written by some cores simultaneously with being read by others. Despite the complications arising from this possibility, the method achieves a linear convergence rate on functions that satisfy an optimal strong convexity property and a sublinear rate ($1/k$) on general convex functions. Near-linear speedup on a multicore system can be expected if the number of processors is $O(n^{1/4})$. We describe results from implementation on ten cores of a multicore processor.

Explicit Matrices with the Restricted Isometry Property: Breaking the Square-Root Bottleneck

Matrices with the restricted isometry property (RIP) are of particular interest in compressed sensing. To date, the best known RIP matrices are constructed using random processes, while explicit constructions are notorious for performing at the "square-root bottleneck," i.e., they only accept sparsity levels on the order of the square root of the number of measurements. The only known explicit matrix which surpasses this bottleneck was constructed by Bourgain, Dilworth, Ford, Konyagin and Kutzarova. This chapter provides three contributions to further the groundbreaking work of Bourgain et al.: (i) we develop an intuition for their matrix construction and underlying proof techniques; (ii) we prove a generalized version of their main result; and (iii) we apply this more general result to maximize the extent to which their matrix construction surpasses the square-root bottleneck.

A frame theoretic approach to the Non-Uniform Fast Fourier Transform

Nonuniform Fourier data are routinely collected in applications such as magnetic resonance imaging, synthetic aperture radar, and synthetic imaging in radio astronomy. To acquire a fast reconstruction that does not require an online inverse process, the non-uniform fast Fourier transform (NFFT), also called convolutional gridding, is frequently employed. While various investigations have led to improvements in accuracy, efficiency, and robustness of the NFFT, not much attention has been paid to the fundamental analysis of the scheme, and in particular its convergence properties. This paper analyzes the convergence of the NFFT by casting it as a Fourier frame approximation. In so doing, we are able to design parameters for the method that satisfy conditions for numerical convergence. Our so-called frame-theoretic convolutional gridding algorithm can also be applied to detect features (such as edges) from non-uniform Fourier samples of piecewise smooth functions.

Iterative Detection for Compressive Sensing: Turbo CS

We consider compressive sensing as a source coding method for signal transmission. We concatenate a convolutional coding system with 1-bit compressive sensing to obtain a serial concatenated system model for sparse signal transmission over an AWGN channel. The proposed source/channel decoder, which we refer to as turbo CS, is robust against channel noise, and its signal reconstruction performance at the receiver improves considerably through iterations. We show a 12 dB improvement with six turbo CS iterations compared to a non-iterative concatenated source/channel decoder.

Compressive Signal Processing with Circulant Sensing Matrices

Compressive sensing achieves effective dimensionality reduction of signals, under a sparsity constraint, by means of a small number of random measurements acquired through a sensing matrix. In a signal processing system, the problem arises of processing the random projections directly, without first reconstructing the signal. In this paper, we show that circulant sensing matrices make it possible to perform a variety of classical signal processing tasks, such as filtering, interpolation, registration, and transforms, directly in the compressed domain and in an exact fashion, \emph{i.e.}, without relying on estimators as proposed in the existing literature. The advantage of the techniques presented in this paper is to enable direct measurement-to-measurement transformations, without the need for costly recovery procedures.
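
The essential fact is that circulant matrices commute with circular filters, so filtering the measurements is exactly measuring the filtered signal. A toy check for a full (non-subsampled) circulant matrix, using FFT-based circular convolution:

```python
import numpy as np

# Circulant sensing commutes with circular filtering: H (C x) = C (H x).
rng = np.random.default_rng(7)
n = 256
x = rng.standard_normal(n)                            # signal
c = rng.standard_normal(n)                            # first column of sensing matrix C
h = np.zeros(n); h[:5] = 1.0 / 5.0                    # simple moving-average filter H

# circular convolution via the FFT (both C and H act by circular convolution)
circ = lambda v, u: np.real(np.fft.ifft(np.fft.fft(v) * np.fft.fft(u)))

y = circ(c, x)                                        # measurements y = C x
filtered_then_measured = circ(c, circ(h, x))          # C (H x)
measured_then_filtered = circ(h, y)                   # H (C x)
print("max difference:", np.max(np.abs(filtered_then_measured - measured_then_filtered)))
```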

An asymptotic existence result on compressed sensing matrices

For any rational number $h$ and all sufficiently large $n$ we give a deterministic construction for an $n\times \lfloor hn\rfloor$ compressed sensing matrix with $(\ell_1,t)$-recoverability where $t=O(\sqrt{n})$. Our method uses pairwise balanced designs and complex Hadamard matrices in the construction of $\epsilon$-equiangular frames, which we introduce as a generalisation of equiangular tight frames. The method is general and produces good compressed sensing matrices from any appropriately chosen pairwise balanced design. The $(\ell_1,t)$-recoverability performance is specified as a simple function of the parameters of the design. To obtain our asymptotic existence result we prove new results on the existence of pairwise balanced designs in which the numbers of blocks of each size are specified.

Quality-based Multimodal Classification Using Tree-Structured Sparsity

Recent studies have demonstrated the advantages of information fusion based on sparsity models for multimodal classification. Among several sparsity models, tree-structured sparsity provides a flexible framework for extracting cross-correlated information from different sources and for enforcing group sparsity at multiple granularities. However, the existing algorithm only solves an approximated version of the cost functional, and the resulting solution is not necessarily sparse at group levels. This paper reformulates the tree-structured sparse model for the multimodal classification task. An accelerated proximal algorithm is proposed to solve the optimization problem, which is an efficient tool for feature-level fusion among either homogeneous or heterogeneous sources of information. In addition, a (fuzzy-set-theoretic) possibilistic scheme is proposed to weight the available modalities, based on their respective reliability, in a joint optimization problem for finding the sparsity codes. This approach provides a general framework for quality-based fusion that offers added robustness to several sparsity-based multimodal classification algorithms. To demonstrate their efficacy, the proposed methods are evaluated on three different applications: multiview face recognition, multimodal face recognition, and target classification.

A Fast Active Set Block Coordinate Descent Algorithm for $\ell_1$-regularized least squares

The problem of finding sparse solutions to underdetermined systems of linear equations arises in several real-world problems (e.g. signal and image processing, compressive sensing, statistical inference). A standard tool for dealing with sparse recovery is the $\ell_1$-regularized least-squares approach, which has recently been attracting the attention of many researchers. In this paper, we describe an efficient block active set coordinate descent algorithm that at each iteration uses a block of variables (namely, those variables that are non-active and most violate certain optimality conditions) to improve the objective function. We further analyze the convergence properties of the proposed method. Finally, we report numerical results showing the effectiveness of the approach.
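
A hedged sketch of what such an active-set block coordinate descent can look like for $\min_x \frac12\|Ax-b\|_2^2 + \lambda\|x\|_1$ (illustrative; the paper's selection rule and analysis are more refined): pick the block of variables most violating the optimality conditions, then update each with its closed-form soft-thresholding step.

```python
import numpy as np

# Block coordinate descent for the lasso with a most-violating-block selection.
rng = np.random.default_rng(8)
m, n, lam, block = 100, 400, 0.05, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 10, replace=False)] = rng.standard_normal(10)
b = A @ x_true + 0.01 * rng.standard_normal(m)

x = np.zeros(n)
col_sq = np.sum(A**2, axis=0)                         # ||a_i||^2 for each column
r = b - A @ x                                         # running residual
for _ in range(200):
    g = A.T @ r                                       # correlations with the residual
    # optimality violation: |g_i| <= lam if x_i = 0, g_i = lam*sign(x_i) otherwise
    viol = np.where(x == 0, np.maximum(np.abs(g) - lam, 0),
                    np.abs(g - lam * np.sign(x)))
    idx = np.argsort(viol)[-block:]                   # most-violating variables
    for i in idx:
        z = A[:, i] @ r + col_sq[i] * x[i]            # a_i^T (residual excluding i)
        xi_new = np.sign(z) * max(abs(z) - lam, 0) / col_sq[i]  # closed-form update
        r += A[:, i] * (x[i] - xi_new)                # keep residual in sync
        x[i] = xi_new

print("recovered support:", np.flatnonzero(np.abs(x) > 1e-3))
```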

Exact Performance Analysis of the Oracle Receiver for Compressed Sensing Reconstruction

A sparse or compressible signal can be recovered from a certain number of noisy random projections, smaller than what is dictated by classic Shannon/Nyquist theory. In this paper, we derive a closed-form expression for the mean square error performance of the oracle receiver, i.e., the receiver that knows the sparsity pattern of the signal. With respect to existing bounds, our result is exact and does not depend on a particular realization of the sensing matrix. Moreover, our result holds irrespective of whether the noise affecting the measurements is white or correlated. Numerical results show a perfect match between equations and simulations, confirming the validity of the result.
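
The oracle receiver itself is simple to simulate: least squares restricted to the known support. The sketch below only measures the empirical MSE by Monte Carlo; the paper's contribution is an exact closed-form expression for this quantity, which is not reproduced here.

```python
import numpy as np

# Oracle receiver: least-squares estimation on the known sparsity pattern S.
rng = np.random.default_rng(9)
m, n, s, sigma = 64, 256, 8, 0.05
A = rng.standard_normal((m, n)) / np.sqrt(m)
S = rng.choice(n, s, replace=False)                   # support known to the oracle
x = np.zeros(n)
x[S] = rng.standard_normal(s)

trials, err = 1000, 0.0
for _ in range(trials):
    y = A @ x + sigma * rng.standard_normal(m)        # noisy random projections
    x_hat = np.zeros(n)
    x_hat[S] = np.linalg.lstsq(A[:, S], y, rcond=None)[0]  # oracle LS estimate
    err += np.sum((x_hat - x)**2)

print(f"empirical oracle MSE = {err / trials:.6f}")
```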

Robust PCA with Partial Subspace Knowledge

In recent work, robust PCA has been posed as a problem of recovering a low-rank matrix $L$ and a sparse matrix $S$ from their sum, $M:= L + S$ and a provably exact convex optimization solution called PCP has been proposed. Suppose that we have a partial estimate of the column subspace of the low rank matrix $L$. Can we use this information to improve the PCP solution, i.e. allow recovery under weaker assumptions? We propose here a simple modification of the PCP idea, called modified-PCP, that allows us to use this knowledge. We derive its correctness result which shows that modified-PCP indeed requires significantly weaker incoherence assumptions than PCP, when the available subspace knowledge is accurate. Extensive simulations are also used to illustrate this. Finally, we explain how this problem naturally occurs in many applications involving time series data, e.g. in separating a video sequence into foreground and background layers, in which the subspace spanned by the background images is not fixed but changes over time and the changes are gradual. A corollary for this case is also given.

Density matrix minimization with $\ell_1$ regularization

We propose a convex variational principle to find sparse representations of the low-lying eigenspace of symmetric matrices. In the context of electronic structure calculations, this corresponds to a sparse density matrix minimization algorithm with $\ell_1$ regularization. The minimization problem can be efficiently solved by a split Bregman iteration type algorithm. We further prove that, from any initial condition, the algorithm converges to a minimizer of the variational principle.

Decentralized Subspace Pursuit for Joint Sparsity Pattern Recovery

To solve the problem of joint sparsity pattern recovery in a decentralized network, we propose an algorithm named decentralized and collaborative subspace pursuit (DCSP). The basic idea of DCSP is to embed collaboration among nodes and a fusion strategy into each iteration of the standard subspace pursuit (SP) algorithm. In DCSP, each node collaborates with several of its neighbors by sharing high-dimensional coefficient estimates and communicates with other remote nodes by exchanging low-dimensional support set estimates. Experimental evaluations show that, compared with several existing algorithms for sparsity pattern recovery, DCSP produces satisfactory results in terms of accuracy of sparsity pattern recovery with much less communication cost.

First Order Methods for Robust Non-negative Matrix Factorization for Large Scale Noisy Data

Nonnegative matrix factorization (NMF) has been shown to be identifiable under the separability assumption, under which all the columns (or rows) of the input data matrix belong to the convex cone generated by only a few of these columns (or rows) [1]. In real applications, however, such a separability assumption is hard to satisfy. Following [4] and [5], in this paper we look at the Linear Programming (LP) based reformulation to locate the extreme rays of the convex cone, but in a noisy setting. Furthermore, in order to deal with large-scale data, we employ First-Order Methods (FOM) to mitigate the computational complexity of the LP, which primarily results from its large number of constraints. We show the performance of the algorithm on real and synthetic data sets.

Finding the largest low-rank clusters with Ky Fan $2$-$k$-norm and $\ell_1$-norm

We propose a convex optimization formulation with the Ky Fan $2$-$k$-norm and $\ell_1$-norm to find $k$ largest approximately rank-one submatrix blocks of a given nonnegative matrix that has low-rank block diagonal structure with noise. We analyze low-rank and sparsity structures of the optimal solutions using properties of these two matrix norms. We show that, under certain hypotheses, with high probability, the approach can recover rank-one submatrix blocks even when they are corrupted with random noise and inserted into a much larger matrix with other random noise blocks.

A Split-and-Merge Dictionary Learning Algorithm for Sparse Representation

In big data image/video analytics, we encounter the problem of learning an overcomplete dictionary for sparse representation from a large training dataset, which cannot be processed at once because of storage and computational constraints. To tackle the problem of dictionary learning in such scenarios, we propose an algorithm for parallel dictionary learning. The fundamental idea behind the algorithm is to learn a sparse representation in two phases. In the first phase, the whole training dataset is partitioned into small non-overlapping subsets, and a dictionary is trained independently on each small subset. In the second phase, the dictionaries are merged to form a global dictionary. We show that the proposed algorithm is efficient in its usage of memory and computational complexity, and performs on par with the standard learning strategy operating on the entire data at once. As an application, we consider the problem of image denoising. We present a comparative analysis of our algorithm with standard learning techniques that use the entire database at once, in terms of training and denoising performance. We observe that the split-and-merge algorithm results in a remarkable reduction of training time, without significantly affecting the denoising performance.

Randomized Block Kaczmarz Method with Projection for Solving Least Squares

The Kaczmarz method is an iterative method for solving overdetermined linear systems of equations $Ax=b$. The randomized version of the Kaczmarz method put forth by Strohmer and Vershynin iteratively projects onto a randomly chosen solution space given by a single row of the matrix $A$, and converges exponentially in expectation to the solution of a consistent system. In this paper we analyze two block versions of the method, each with a randomized projection, that converge in expectation to the least squares solution of inconsistent systems. Our approach utilizes a paving of the matrix $A$ to guarantee exponential convergence, and suggests that paving yields a significant improvement in performance in certain regimes. The proposed method is an extension of the block Kaczmarz method analyzed by Needell and Tropp and the Randomized Extended Kaczmarz method of Zouzias and Freris. The contribution is thus two-fold: unlike the standard Kaczmarz method, our methods converge to the least-squares solution of inconsistent systems, and by using appropriate blocks of the matrix this convergence can be significantly accelerated. Numerical experiments suggest that the proposed algorithm can indeed lead to advantages in practice.
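
A minimal sketch of a randomized block Kaczmarz iteration, with a naive fixed row partition in place of the paving and without the extra projection step of the extended method (so on noisy systems it only reaches a neighborhood of the least-squares solution):

```python
import numpy as np

# Randomized block Kaczmarz: project onto the solution set of a random
# block of rows via the block pseudoinverse.
rng = np.random.default_rng(10)
m, n, block = 300, 50, 10
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)  # slightly inconsistent

blocks = np.arange(m).reshape(-1, block)              # naive fixed row partition
x = np.zeros(n)
for _ in range(2000):
    rows = blocks[rng.integers(len(blocks))]          # pick a random block
    Ablk, bblk = A[rows], b[rows]
    x += np.linalg.pinv(Ablk) @ (bblk - Ablk @ x)     # project onto block solution set

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print("distance to least-squares solution:", np.linalg.norm(x - x_ls))
```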

Transfer Learning across Networks for Collective Classification

This paper addresses the problem of transferring useful knowledge from a source network to predict node labels in a newly formed target network. While existing transfer learning research has primarily focused on vector-based data, in which the instances are assumed to be independent and identically distributed, how to effectively transfer knowledge across different information networks has not been well studied, mainly because networks may have their distinct node features and link relationships between nodes. In this paper, we propose a new transfer learning algorithm that attempts to transfer common latent structure features across the source and target networks. The proposed algorithm discovers these latent features by constructing label propagation matrices in the source and target networks, and mapping them into a shared latent feature space. The latent features capture common structure patterns shared by two networks, and serve as domain-independent features to be transferred between networks. Together with domain-dependent node features, we thereafter propose an iterative classification algorithm that leverages label correlations to predict node labels in the target network. Experiments on real-world networks demonstrate that our proposed algorithm can successfully achieve knowledge transfer between networks to help improve the accuracy of classifying nodes in the target network.

Bayesian dynamic financial networks with time-varying predictors

We propose a Bayesian nonparametric model that includes time-varying predictors in dynamic network inference. The model is applied to infer the dependence structure among financial markets during the global financial crisis, estimating the effects of verbal and material cooperation efforts. Interestingly, we learn contagion effects, with an increasing influence of verbal relations during the financial crisis and the opposite pattern during the United States housing bubble.


Join the CompressiveSensing subreddit or the Google+ Community and post there!
