## Monday, November 24, 2014

### Nuclear Norm Minimization via Active Subspace Selection

Nuclear Norm Minimization via Active Subspace Selection by Cho-Jui Hsieh, Peder A. Olsen
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Randomized Interpolative Decomposition of Separated Representations

Random projections with tensors.

We introduce tensor Interpolative Decomposition (tensor ID) for the reduction of the separation rank of Canonical Tensor Decompositions (CTDs). Tensor ID selects, for a user-defined accuracy ε, a near optimal subset of terms of a CTD to represent the remaining terms via a linear combination of the selected terms. Tensor ID can be used as an alternative to or a step of the Alternating Least Squares (ALS) algorithm. In addition, we briefly discuss Q-factorization to reduce the size of components within an ALS iteration. Combined, tensor ID and Q-factorization lead to a new paradigm for the reduction of the separation rank of CTDs. In this context, we also discuss the spectral norm as a computational alternative to the Frobenius norm.
We reduce the problem of finding tensor IDs to that of constructing Interpolative Decompositions of certain matrices. These matrices are generated via either randomized projection or randomized sampling of the given tensor. We provide cost estimates and several examples of the new approach to the reduction of separation rank.
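Since a tensor ID here reduces to an Interpolative Decomposition of a sketched matrix, the matrix-level step can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `randomized_id`, the oversampling value, and the greedy pivoting rule (a simple stand-in for pivoted QR) are all choices made for this sketch.

```python
import numpy as np

def randomized_id(A, k, oversample=10, seed=0):
    """Pick k columns of A and coefficients T such that A ~= A[:, cols] @ T.
    A random projection shrinks the rows before column selection."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Y = rng.standard_normal((k + oversample, m)) @ A  # random sketch of A
    # Greedy pivoted column selection on the sketch: repeatedly take the
    # column with the largest residual norm, then deflate it out.
    R = Y.astype(float).copy()
    cols = []
    for _ in range(k):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        cols.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)
    # Express every column of A in terms of the selected columns.
    T, *_ = np.linalg.lstsq(A[:, cols], A, rcond=None)
    return cols, T
```

For a matrix of exact rank k, the k selected columns span the column space and the least-squares coefficients reproduce the matrix essentially to machine precision.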


## Saturday, November 22, 2014

### Sunday Morning Insight: It's Complicated...On Games, NP-hardness, Grothendieck, Learning and Friends.

At the last Machine Learning meetup in Paris, someone (I believe it was Vincent Guigue) made the comment that "not" was a relatively powerful feature for sentiment analysis of a written paragraph. Some of us wondered if the same was really true in French (it does not look like it is). Clearly it opens another line of thought: if a model works well in English because it has been able to put the right importance on the "not" feature, will that model be as efficient at sentiment analysis in another language (say French)? [2] Complexity can sometimes be hidden in the model used on top of the underlying complexity of the problem at hand. Which brings us to a more general comment on complexity, models and all that.

One of the most interesting aspects of games is how addictive they are. There is a generic idea that seems to indicate that a game's addictiveness stems from the fact that these games are NP-Hard/NP-Complete. Here is a fascinating list of NP-complete games and puzzles from Wikipedia.

Our brain seems to act as a computing device that can handle small NP-complete problems, while it seems to devise greedy strategies for larger ones. Small NP-complete problems probably end up being addictive because we store all the combinations, filling up our natural RAM. So much so that we lose our ability to do something else: an eerie parallel to the definition of addictiveness. When trying to solve bigger problems, we generally go about them with greedy-like techniques and find out that they more or less work... or not.

From [3]

If you have been reading Nuit Blanche for a little while now, you can see this issue as a sharp phase transition problem similar to the one we see in compressive sensing and advanced matrix factorization: most of the addictive problems are either solvable greedily or are very small NP-complete problems. For small problems, the phase transition is irrelevant, but when the problem gets bigger, we end up on one side of the phase diagram.

I was thinking about this this past week, which also happened to be when we learned of Grothendieck's passing.

For instance, this week we noted that vision and language could be seen as problems that can be formulated as NP-hard advanced matrix factorizations. We also know that within these problems there are areas of the phase space where greedy solvers can do well, because the problem can be stated in some relaxed version. Here is my open question: could we learn more about ourselves from our ability to solve NP-hard problems and to devise specific greedy solvers?

For instance, in 3D Shape Reconstruction from 2D Landmarks: A Convex Formulation, we wondered if the phase transition found there could transpose into optical illusions. But it is not just vision; it could be the way we talk (Neural Word Embeddings as Implicit Matrix Factorization) or the way we learn (Sparse Reinforcement Learning via Convex Optimization). Every time there is a convex relaxation, it is a clue that there is a potential sharp phase transition hidden somewhere. And then, as we mentioned before (Sunday Morning Insight: Watching P vs NP), "We expect mathematicians to clear the waters", as Olivier Guédon and Roman Vershynin just did in [1], or as Afonso Bandeira explained in this Lecture on Semidefinite Relaxation for Community Detection. Maybe problem solving is key to our brains being part of the same hunting tribe and making friends....

[1] Community detection in sparse networks via Grothendieck's inequality by Olivier Guédon, Roman Vershynin

We present a simple and flexible method to prove consistency of semidefinite optimization problems on random graphs. The method is based on Grothendieck's inequality. Unlike the previous uses of this inequality that lead to constant relative accuracy, we achieve arbitrary relative accuracy by leveraging randomness. We illustrate the method with the problem of community detection in sparse networks. Despite much progress in recent years, almost no rigorous results have been known for totally sparse networks -- those with bounded average degrees. We demonstrate that even in this regime, various natural semidefinite programs can be used to recover the community structure up to an arbitrarily small fraction of misclassified vertices. The method is general; it can be applied to a variety of stochastic models of networks and semidefinite programs.
[2] As an observation, at the very least, any country that spends real money on research and for which English is not the national tongue ought to invest in language-centric databases freely available to all researchers.

## Friday, November 21, 2014

### CKN: Convolutional Kernel Networks - implementation -

You probably recall this entry on Deep Learning and Convolutional Kernel Networks, well there is an attendant implementation available now. From the project page:

### What is CKN?

CKN (Convolutional Kernel Networks) is a software package corresponding to the NIPS paper "Convolutional Kernel Networks".

Version 1.0 and later are open-source under the GPLv3 licence.

### Related publications

J. Mairal, P. Koniusz, Z. Harchaoui and C. Schmid. Convolutional Kernel Networks. Advances in Neural Information Processing Systems (NIPS). 2014
The attendant package can be downloaded from here.


### Provable Bounds for Learning Some Deep Representations

Following up on this entry, here are some headways connecting the deep architectures of some neural networks with random matrix theory and some of the findings in compressive sensing. From the paper:

In fact our result that a single layer of our generative model is a sparse denoising autoencoder can be seen as an analog of the fact that random matrices are good for compressed sensing/sparse reconstruction (see Donoho (Donoho,2006) for general matrices and Berinde et al. (Berinde et al., 2008) for sparse matrices). Of course, in compressed sensing the matrix of edge weights is known whereas here it has to be learnt, which is the main contribution of our work. Furthermore, we show that our algorithm for learning a single layer of weights can be extended to do layerwise learning of the entire network.

Provable Bounds for Learning Some Deep Representations by Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an n node multilayer network that has degree at most n^γ for some γ < 1 and each edge has a random edge weight in [-1, 1]. Our algorithm learns almost all networks in this class with polynomial running time. The sample complexity is quadratic or cubic depending upon the details of the model. The algorithm uses layerwise learning. It is based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure. The analysis of the algorithm reveals interesting structure of neural nets with random edge weights.
By the way, what's up with 0.4 in Lemma 8, why not 42 ?
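The compressed-sensing fact the quoted passage leans on -- that a random Gaussian matrix permits recovery of a sparse vector -- is easy to demonstrate with a greedy solver. Below is a minimal Orthogonal Matching Pursuit sketch (standard OMP, not the paper's learning algorithm; dimensions and names are illustrative):

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily pick the column most
    correlated with the residual, then refit by least squares."""
    residual = y.copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x
```

With a normalized Gaussian matrix and a number of measurements comfortably above the sparsity level, exact recovery of the sparse vector is the typical outcome.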

Relevant:


## Thursday, November 20, 2014

### Nonlinear Information-Theoretic Compressive Measurement Design

Compressive sensing, nonlinearity, classification, terrific !

We investigate design of general nonlinear functions for mapping high-dimensional data into a lower-dimensional (compressive) space. The nonlinear measurements are assumed contaminated by additive Gaussian noise. Depending on the application, we are either interested in recovering the high-dimensional data from the non-linear compressive measurements, or performing classification directly based on these measurements. The latter case corresponds to classification based on nonlinearly constituted and noisy features. The nonlinear measurement functions are designed based on constrained mutual-information optimization. New analytic results are developed for the gradient of mutual information in this setting, for arbitrary input-signal statistics. We make connections to kernel-based methods, such as the support vector machine. Encouraging results are presented on multiple datasets, for both signal recovery and classification. The nonlinear approach is shown to be particularly valuable in high-noise scenarios.

Some supplementary material

Also semi-relevant:

Compressive Classification of a Mixture of Gaussians: Analysis, Designs and Geometrical Interpretation by Hugo Reboredo, Francesco Renna, Robert Calderbank, Miguel R. D. Rodrigues

This paper derives fundamental limits on the performance of compressive classification when the source is a mixture of Gaussians. It provides an asymptotic analysis of a Bhattacharya based upper bound on the misclassification probability for the optimal Maximum-A-Posteriori (MAP) classifier that depends on quantities that are dual to the concepts of diversity-order and coding gain in multi-antenna communications. The diversity-order of the measurement system determines the rate at which the probability of misclassification decays with signal-to-noise ratio (SNR) in the low-noise regime. The counterpart of coding gain is the measurement gain which determines the power offset of the probability of misclassification in the low-noise regime. These two quantities make it possible to quantify differences in misclassification probability between random measurement and (diversity-order) optimized measurement. Results are presented for two-class classification problems first with zero-mean Gaussians then with nonzero-mean Gaussians, and finally for multiple-class Gaussian classification problems. The behavior of misclassification probability is revealed to be intimately related to certain fundamental geometric quantities determined by the measurement system, the source and their interplay. Numerical results, representative of compressive classification of a mixture of Gaussians, demonstrate alignment of the actual misclassification probability with the Bhattacharya based upper bound. The connection between the misclassification performance and the alignment between source and measurement geometry may be used to guide the design of dictionaries for compressive classification.
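To make the setting concrete, here is a hedged sketch of MAP classification from compressive measurements y = Ax + w when x is drawn from one of several Gaussian classes with equal priors. The function name and all dimensions are illustrative, not taken from the paper:

```python
import numpy as np

def compressive_map_classify(Y, A, means, covs, noise_var):
    """MAP classification (equal priors) of compressed measurements
    y = A x + w, where x comes from one of several Gaussian classes.
    Each class k induces y ~ N(A mu_k, A S_k A^T + noise_var I)."""
    scores = []
    for mu, S in zip(means, covs):
        m = A @ mu
        C = A @ S @ A.T + noise_var * np.eye(A.shape[0])
        Ci = np.linalg.inv(C)
        _, logdet = np.linalg.slogdet(C)
        d = Y - m  # rows are residuals of each sample
        # log-likelihood up to a common constant
        scores.append(-0.5 * (np.einsum('ij,jk,ik->i', d, Ci, d) + logdet))
    return np.argmax(np.array(scores), axis=0)
```

When the compressed class means stay well separated relative to the compressed covariances, the misclassification probability is small, which is the regime the diversity-order analysis quantifies.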


### Riemannian Pursuit for Big Matrix Recovery

Greedy algorithm for nuclear norm minimization on the manifold (hopefully we'll get to see an implementation of it at some point in time)

Low rank matrix recovery is a fundamental task in many real-world applications. The performance of existing methods, however, deteriorates significantly when applied to ill-conditioned or large-scale matrices. In this paper, we therefore propose an efficient method, called Riemannian Pursuit (RP), that aims to address these two problems simultaneously. Our method consists of a sequence of fixed-rank optimization problems. Each subproblem, solved by a nonlinear Riemannian conjugate gradient method, aims to correct the solution in the most important subspace of increasing size. Theoretically, RP converges linearly under mild conditions and experimental results show that it substantially outperforms existing methods when applied to large-scale and ill-conditioned matrices.

Supplementary material


## Wednesday, November 19, 2014

### Sparse Reinforcement Learning via Convex Optimization

How do you learn policies when you have too many features ?

Sparse Reinforcement Learning via Convex Optimization by Zhiwei Qin, Weichang Li

We propose two new algorithms for the sparse reinforcement learning problem based on different formulations. The first algorithm is an off-line method based on the alternating direction method of multipliers for solving a constrained formulation that explicitly controls the projected Bellman residual. The second algorithm is an online stochastic approximation algorithm that employs the regularized dual averaging technique, using the Lagrangian formulation. The convergence of both algorithms is established. We demonstrate the performance of these algorithms through several classical examples.

Supplementary material.
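The off-line algorithm builds on the alternating direction method of multipliers (ADMM). As background, here is a hedged generic ADMM sketch for an l1-regularized least-squares problem; it is not the paper's projected-Bellman-residual formulation, and `admm_lasso` with its parameter values is purely illustrative:

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """ADMM for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Alternates a ridge-like x-update, soft-thresholding, and a dual update."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    Atb = A.T @ b
    # Factor once; reused in every x-update.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    for _ in range(iters):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)
        u = u + x - z
    return z  # the sparse iterate
```

The soft-thresholding step is what produces exact zeros, which is the mechanism sparse RL formulations exploit to prune uninformative features.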

N00231074.jpg was taken on November 04, 2014 and received on Earth November 04, 2014. The camera was pointing toward SKY, and the image was taken using the CL1 and CL2 filters. This image has not been validated or calibrated. Image Credit: NASA/JPL/Space Science Institute


### How to Scale Up Kernel Methods to Be As Good As Deep Neural Nets

How to Scale Up Kernel Methods to Be As Good As Deep Neural Nets by Zhiyun Lu, Avner May, Kuan Liu, Alireza Bagheri Garakani, Dong Guo, Aurélien Bellet, Linxi Fan, Michael Collins, Brian Kingsbury, Michael Picheny, Fei Sha

In this paper, we investigate how to scale up kernel methods to take on large-scale problems, on which deep neural networks have been prevailing. To this end, we leverage existing techniques and develop new ones. These techniques include approximating kernel functions with features derived from random projections, parallel training of kernel models with 100 million parameters or more, and new schemes for combining kernel functions as a way of learning representations. We demonstrate how to muster those ideas skillfully to implement large-scale kernel machines for challenging problems in automatic speech recognition. We validate our approaches with extensive empirical studies on real-world speech datasets on the tasks of acoustic modeling. We show that our kernel models are as competitive as well-engineered deep neural networks (DNNs). In particular, kernel models either attain similar performance to, or surpass, their DNN counterparts. Our work thus avails more tools to machine learning researchers in addressing large-scale learning problems.
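The first technique in the abstract -- approximating kernel functions with features derived from random projections -- is the classic random Fourier features construction of Rahimi and Recht. A minimal sketch for the Gaussian (RBF) kernel, with illustrative names and parameter values:

```python
import numpy as np

def rff(X, D, gamma, seed=0):
    """Random Fourier features z(x) such that
    z(x) . z(y) ~= exp(-gamma * ||x - y||^2).
    Frequencies are drawn from the kernel's Fourier transform."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

With D random features, the inner product z(x).z(y) concentrates around the exact kernel value at the usual Monte Carlo rate O(1/sqrt(D)), which is what makes linear models on these features competitive with exact kernel machines at scale.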
