Thursday, April 17, 2014

Non-invasive real-time imaging through scattering layers and around corners via speckle correlations

Wow !

Several themes mentioned in this blog are colliding in the following preprint. First, thanks to Moore's law, we have the ability to do high-resolution imaging with smartphone cameras like the Lumia 1020. Then there is the imaging-with-nature theme, and then there is the sparse phase retrieval problem. I say sparse, but there is no connection to that in this preprint and, much like FROG (see here and here also), time will tell. What is so interesting here is that there is no need to know the transmission matrix, or even to use a time-of-flight camera, to see around corners: a smartphone camera is all you need.

On a more general note, these quantum-optics problems seem to have solutions for sparse imaging, but there isn't much of a literature on the subject. In particular, since everybody is focused on correlation-type imaging, other types of nonlinearities, such as triple correlation interferometry, seem to be left to their own devices (see the vocabulary used here and here). Anyway, enjoy!

(and by the way, no, the speckle is not the problem; it was, and continues to be, the solution)

Imaging with optical resolution through and inside complex samples is a difficult challenge with important applications in many fields. The fundamental problem is that inhomogeneous samples, such as biological tissues, randomly scatter and diffuse light, impeding conventional image formation. Despite many advancements, no current method enables noninvasive real-time imaging using diffused light. Here, we show that owing to the memory-effect for speckle correlations, a single image of the scattered light, captured with a standard high-resolution camera, encodes all the information that is required to image through the medium or around a corner. We experimentally demonstrate single-shot imaging through scattering media and around corners using incoherent light and various samples, from white paint to dynamic biological samples. Our lensless technique is simple, does not require laser sources, wavefront-shaping, or time-gated detection, and is realized here using a camera-phone. It has the potential to enable imaging in currently inaccessible scenarios.
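The phase-retrieval step alluded to above can be sketched with Fienup's classic error-reduction algorithm. To be clear, the paper first estimates the object's autocorrelation from the speckle image and only then applies phase retrieval; the sketch below shows just that second building block, on a hypothetical toy object, and everything in it (object, sizes, iteration count) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy object: a few bright points on a 32x32 grid.
n = 32
obj = np.zeros((n, n))
obj[8, 10], obj[20, 5], obj[15, 25] = 1.0, 0.7, 1.3

mag = np.abs(np.fft.fft2(obj))        # the "measured" Fourier magnitude

# Error-reduction phase retrieval: alternate between imposing the measured
# magnitude in the Fourier domain and non-negativity in the object domain.
x = rng.random((n, n))
errs = []
for _ in range(200):
    F = np.fft.fft2(x)
    errs.append(np.linalg.norm(np.abs(F) - mag))
    F = mag * np.exp(1j * np.angle(F))   # keep the phase, impose the magnitude
    x = np.real(np.fft.ifft2(F))
    x = np.maximum(x, 0.0)               # impose non-negativity
print(errs[0], errs[-1])
```

A known property of error reduction is that this Fourier-domain error never increases from one iteration to the next, which is what makes the scheme easy to sanity-check.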

Wednesday, April 16, 2014

3000th post. Wow !

So, this is the 3000th published post. Wow! While there is nothing magic about this number, here is some perspective: it's about ten times 300 posts, or roughly the equivalent of 10 years of constant weekly posting, and then some weekends. I went from being ignorant about many things to having the blog mentioned in different quarters of the academic world. Let's not forget that part of the relative success of Nuit Blanche stems from the broken publishing model. On top of not providing quality peer review, the publishing cycle only allows relentless authors to be published (most have no choice but to be relentless). Using Google, some of my own spidering processes and, more recently, ArXiv, Nuit Blanche features ideas and authors two to three years ahead of the publishing cycle. Anybody who wants a headstart and to be productive is bound to use it as one of their sources of information. Without getting into the details, the blog has followed the maturation of a field. There is yet so much to see, since the blog follows the unfolding of the different crafts involved in sensing and making sense of what is sensed. That story keeps on giving. You want to show your appreciation? Don't hesitate to provide a recommendation on LinkedIn under my profile. Seven other readers have done so in the past. By adding one, you are sending a signal that this type of academic blogging is serious and highly relevant, while at the same time bringing clarity. When asked outside of this community about Nuit Blanche, I find it difficult to articulate how relevant it is. Your recommendations send this louder signal that it is relevant.

In the meantime, here are some links that have blossomed here over the past few years.
  • CS (1870): All blog entries related to compressive sensing
  • MF (293): All blog entries related to advanced matrix factorizations
  • implementation (234): All blog entries related to an implementation made available by their authors
  • CSHardware (80): All blog entries related to compressive sensing hardware and sensors.
  • CSjobs (69): All blog entries related to compressive sensing related jobs.
  • SundayMorningInsight (37): All blog entries featuring some Sunday Morning Reviews of something that has to be said :-)
  • ML (54): All blog entries related to Machine Learning
  • Csstats (22): All blog entries featuring statistics on the comings and goings on this site.
The blog has more than 122 Google +1s.

Several groups have emerged

But also several reference pages: 

Tuesday, April 15, 2014

Recent Developments in the Sparse Fourier Transform: A Fast DSP Chip and a faster GPS

A 0.75-million-point Fourier-transform chip for frequency-sparse signals by Omid Abari, Ezz Hamed, Haitham Hassanieh, Abhinav Agarwal, Dina Katabi, Anantha P. Chandrakasan, Vladimir Stojanovic
Applications like spectrum sensing, radar signal processing, and pattern matching by convolving a signal with a long code, as in GPS, require large FFT sizes. ASIC implementations of such FFTs are challenging due to their large silicon area and high power consumption. However, the signals in these applications are sparse, i.e., the energy at the output of the FFT/IFFT is concentrated at a limited number of frequencies and with zero/negligible energy at most frequencies. Recent advances in signal processing have shown that, for such sparse signals, a new algorithm called the sparse FFT (sFFT) can compute the Fourier transform more efficiently than traditional FFTs [1].
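The core trick behind sparse FFT algorithms can be illustrated in a few lines: aliasing folds a frequency-sparse spectrum into a small FFT (bucketization), and a one-sample time shift reveals which exact frequency landed in each bucket. This is a noise-free, collision-free sketch with made-up parameters, not the chip's actual algorithm:

```python
import numpy as np

# Hypothetical sparse spectrum: 3 frequencies out of n = 1024.
n, B = 1024, 16                 # B buckets; B must divide n
L = n // B
true_freqs = {10: 1.0, 37: 0.5, 100: 2.0}

t = np.arange(n)
x = sum(a * np.exp(2j * np.pi * f * t / n) for f, a in true_freqs.items())

# Aliasing: a B-point FFT of every L-th sample folds frequency f into
# bucket f mod B. A second FFT of the samples shifted by one step lets
# us read f itself off the phase rotation of each bucket.
U0 = np.fft.fft(x[0::L])
U1 = np.fft.fft(x[1::L])

recovered = {}
for b in range(B):
    if abs(U0[b]) > 1e-6:       # non-empty bucket
        # the one-sample shift multiplies the bucket by exp(2*pi*1j*f/n)
        f = int(round(np.angle(U1[b] / U0[b]) * n / (2 * np.pi))) % n
        recovered[f] = abs(U0[b]) / B
print(sorted(recovered))        # → [10, 37, 100]
```

Only two length-16 FFTs were needed instead of one length-1024 FFT; real sFFT algorithms add random permutations and filters to handle collisions and noise.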

Faster GPS Via the Sparse Fourier Transform  by Haitham Hassanieh, Fadel Adib, Dina Katabi, and Piotr Indyk. ( Slides )
GPS is one of the most widely used wireless systems. A GPS receiver has to lock on the satellite signals to calculate its position. The process of locking on the satellites is quite costly and requires hundreds of millions of hardware multiplications, leading to high power consumption. The fastest known algorithm for this problem is based on the Fourier transform and has a complexity of O(n log n), where n is the number of signal samples. This paper presents the fastest GPS locking algorithm to date. The algorithm reduces the locking complexity to O(n√(log n)). Further, if the SNR is above a threshold, the algorithm becomes linear, i.e., O(n). Our algorithm builds on recent developments in the growing area of sparse recovery. It exploits the sparse nature of the synchronization problem, where only the correct alignment between the received GPS signal and the satellite code causes their cross-correlation to spike. We further show that the theoretical gain translates into empirical gains for GPS receivers. Specifically, we built a prototype of the design using software radios and tested it on two GPS datasets collected in the US and Europe. The results show that the new algorithm reduces the median number of multiplications by 2.2× in comparison to the state of the art design, for real GPS signals.
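For reference, the baseline the paper improves on is the O(n log n) FFT-based code matching: the circular cross-correlation of the received signal with the satellite code spikes only at the correct alignment. A sketch with a random stand-in code (real C/A codes, lengths, and noise levels differ):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for a satellite spreading code and a delayed,
# noisy received copy of it.
n = 4096
code = rng.choice([-1.0, 1.0], size=n)
true_shift = 777
signal = np.roll(code, true_shift) + 0.5 * rng.standard_normal(n)

# Circular cross-correlation via the FFT: O(n log n) instead of O(n^2).
# The correlation spikes only at the correct alignment.
corr = np.fft.ifft(np.fft.fft(signal) * np.conj(np.fft.fft(code))).real
print(int(np.argmax(corr)))
```

The sFFT-based algorithm in the paper exploits the fact that this correlation output is itself sparse (one spike), so it never needs to compute all n correlation values.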

These examples of applications were found in the survey Recent Developments in the Sparse Fourier Transform by Anna C. Gilbert, Piotr Indyk, Mark Iwen, and Ludwig Schmidt:

The Discrete Fourier Transform (DFT) is one of the most important discrete transformations used in many computational settings from signal or image processing to scientific computing. The most popular approach to computing the DFT uses the Fast Fourier Transform (FFT). However, with the emergence of big data problems, in which the size of the processed data sets can easily exceed terabytes, the "Fast" in Fast Fourier Transform is not fast enough. Furthermore, in many big data applications it is hard to acquire a sufficient amount of data to compute the desired Fourier transform. The Sparse Fourier Transform (SFT) can compute a compressed Fourier transform using only a subset of the input data in time considerably shorter than the data set size. The goal of this article is to survey these recent developments, to explain the basic techniques coupled with examples and applications in big data, to demonstrate trade-offs in empirical performance of the algorithms, and to discuss the connection between the SFT and other techniques for massive data analysis such as streaming algorithms and compressive sensing.
An implementation of the sFFT is here. Other implementations are available under the FFT tag.

Monday, April 14, 2014

Compressive classification and the rare eclipse problem

This is interesting !

Compressive classification and the rare eclipse problem by Afonso S. Bandeira, Dustin G. Mixon, Benjamin Recht
This paper addresses the fundamental question of when convex sets remain disjoint after random projection. We provide an analysis using ideas from high-dimensional convex geometry. For ellipsoids, we provide a bound in terms of the distance between these ellipsoids and simple functions of their polynomial coefficients. As an application, this theorem provides bounds for compressive classification of convex sets. Rather than assuming that the data to be classified is sparse, our results show that the data can be acquired via very few measurements yet will remain linearly separable. We demonstrate the feasibility of this approach in the context of hyperspectral imaging.

from the conclusion:

It is also interesting how similar random projection and PCA perform. Note that the PCA method has an unfair advantage since it is data-adaptive, meaning that it uses the training data to select the projection, and in practical applications for which the sensing process is expensive, one might be interested in projecting in a non-adaptive way, thereby allowing for less sensing. Our simulations suggest that the expense is unnecessary, as a random projection will provide almost identical performance. As indicated in the previous subsection, random projection is also better understood as a means to maintain linear separability, and so there seems to be little benefit in choosing PCA over random projection (at least for this sort of classification task).
and this

As such, it would be interesting to extend the results of this paper to more general classes of random projections, in particular, random projections which can be implemented with a hyperspectral imager (say).

Maybe, we could try random projections as instantiated by Nature.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Confidence Intervals and Hypothesis Testing for High-Dimensional Regression - implementation -

As we were discussing the release of an implementation of the Sparse PCA algorithm, Andrea mentioned to me the release of an R program for hypothesis testing with the Lasso:

Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance such as confidence intervals or p-values for these models. We consider here the high-dimensional linear regression problem, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate, made publicly available by [BKM14].

A web page, maintained by Hamid Javadi, Adel Javanmard, Andrea Montanari and Sven Schmit, featuring other papers and an implementation in R, is here.
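The de-biasing construction can be sketched in the simplest possible setting: an i.i.d. standard Gaussian design, where the population covariance is the identity and the de-correlating matrix M of the paper can simply be taken to be I. This is a much-simplified illustration of the idea (known noise level, ISTA in place of a polished Lasso solver), not the released R package's algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sizes: n samples, p > n features, s true non-zeros.
n, p, s, sigma = 100, 200, 5, 1.0
X = rng.standard_normal((n, p))           # i.i.d. design, so E[X^T X / n] = I
theta = np.zeros(p)
theta[:s] = 2.0
y = X @ theta + sigma * rng.standard_normal(n)

# Lasso via ISTA (proximal gradient descent).
lam = sigma * np.sqrt(2 * np.log(p) / n)  # usual noise-level scaling
step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)
th = np.zeros(p)
for _ in range(2000):
    g = th + step * X.T @ (y - X @ th) / n
    th = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)

# De-biasing step; taking M = I is only justified because the design
# covariance is the identity here (the paper constructs M in general).
th_d = th + X.T @ (y - X @ th) / n
half = 1.96 * sigma / np.sqrt(n)          # per-coordinate 95% CI half-width
lo, hi = th_d - half, th_d + half
print((lo[:s] > 0).all(), ((lo[s:] <= 0) & (hi[s:] >= 0)).mean())
```

The intervals around the true non-zero coefficients should exclude zero, while most intervals around null coefficients should contain it; the paper's contribution is making this work without assuming a nice design.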

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Saturday, April 12, 2014

Saturday Morning Video: Seth Lloyd, Quantum algorithms for supervised and unsupervised machine learning

Guillaume Palacios's presentation (The D-Wave "Quantum Computer": Myths and Realities) at the recent Paris Machine Learning meetup got me thinking that this Quantum Computer might not be all PR after all. It turns out that Google Tech Talks just released a video of Seth Lloyd on the subject of Quantum Machine Learning. At around 30 minutes, Seth describes the encoding of information from a CD into a quantum state in a setup that parallels the single pixel camera.

Here is a recent ArXiv preprint on the subject:

Quantum algorithms for supervised and unsupervised machine learning by Seth Lloyd, Masoud Mohseni, Patrick Rebentrost

Machine-learning tasks frequently involve problems of manipulating and classifying large numbers of vectors in high-dimensional spaces. Classical algorithms for solving such problems typically take time polynomial in the number of vectors and the dimension of the space. Quantum computers are good at manipulating high-dimensional vectors in large tensor product spaces. This paper provides supervised and unsupervised quantum machine learning algorithms for cluster assignment and cluster finding. Quantum machine learning can take time logarithmic in both the number of vectors and their dimension, an exponential speed-up over classical algorithms.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Friday, April 11, 2014

Sparse PCA: Covariance Thresholding and an AMP based approach

No implementation yet, but new ways of going about the Sparse PCA problem. I will be adding the following figure to the phase transition figure in the Advanced Matrix Factorization Jungle page for the Sparse PCA paragraph. 

As I mentioned in slide 13 of this recent ML Meetup presentation, to the untrained eye sparse PCA is a solved problem, but the back story is a little bit more complex, as related in the first paper:
....Over the last ten years, a significant effort has been devoted to developing practical algorithms that outperform diagonal thresholding, see e.g. [MWA05, ZHT06, dEGJL07, dBG08, WTH09]. In particular, d’Aspremont et al. [dEGJL07] developed a promising M-estimator based on a semidefinite programming (SDP) relaxation. Amini and Wainwright [AW09] carried out an analysis of this method and proved that, if (i) k ≤ C(β) n/ log p, and (ii) if the SDP solution has rank one, then the SDP relaxation provides a consistent estimator of the support of v. At first sight, this appears as a satisfactory solution of the original problem. No procedure can estimate the support of v from less than k log p samples, and the SDP relaxation succeeds in doing it from –at most– a constant factor more samples. This picture was upset by a recent, remarkable result by Krauthgamer, Nadler and Vilenchik [KNV13]....

.... the rest is in the paper: Sparse PCA via Covariance Thresholding by Yash Deshpande and Andrea Montanari,
In sparse principal component analysis we are given noisy observations of a rank-one (or low rank) matrix of dimension n×p and seek to reconstruct it under additional sparsity assumptions. In particular, we assume here that the principal component v has at most k non-zero entries, and study the high-dimensional regime in which p is of the same order as n. In an influential paper, Johnstone and Lu introduced a simple algorithm that estimates the support of v by the largest entries in the diagonal of the empirical covariance. This method can be shown to succeed with high probability if k ≤ C1 √(n/log p), and to fail with high probability if k ≥ C2 √(n/log p) for two constants 0 < C1, C2 < ∞. Despite a considerable amount of work over the last ten years, no practical algorithm exists with provably better support recovery guarantees. Here we analyze a covariance thresholding algorithm that was recently proposed by Krauthgamer, Nadler and Vilenchik. We confirm empirical evidence presented by these authors and rigorously prove that the algorithm succeeds with high probability for k of order √n. Recent conditional lower bounds suggest that it might be impossible to do significantly better. Our analysis develops new bounds on the norm of kernel random matrices, in regimes that were not considered before.

The second paper takes the AMP-based approach:

Sparse Principal Component Analysis (PCA) is a dimensionality reduction technique wherein one seeks a low rank representation of a data matrix with additional sparsity constraints on the obtained representation. We consider two probabilistic formulations of sparse PCA: a spiked Wigner and spiked Wishart (or spiked covariance) model. We analyze an Approximate Message Passing (AMP) algorithm to estimate the underlying signal and show, in the high dimensional limit, that the AMP estimates are information-theoretically optimal. As an immediate corollary, our results demonstrate that the posterior expectation of the underlying signal, which is often intractable to compute, can be obtained using a polynomial-time scheme. Our results also effectively provide a single-letter characterization of the sparse PCA problem.
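The covariance thresholding idea itself is short: zero out the small entries of the empirical covariance, then take the leading eigenvector of what is left. A toy sketch with made-up dimensions and a made-up threshold constant (the paper's actual algorithm, constants, and analysis differ):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical spiked-covariance model: Sigma = I + beta * v v^T,
# with a k-sparse spike v.
n, p, k, beta = 1000, 200, 10, 2.0
v = np.zeros(p)
v[:k] = 1.0 / np.sqrt(k)
X = np.sqrt(beta) * rng.standard_normal((n, 1)) * v[None, :] \
    + rng.standard_normal((n, p))

S = X.T @ X / n                       # empirical covariance
tau = 3.5 / np.sqrt(n)                # threshold at the noise scale 1/sqrt(n)
S_t = np.where(np.abs(S) > tau, S, 0.0)
np.fill_diagonal(S_t, 0.0)            # keep only the correlation structure

# The leading eigenvector of the thresholded matrix estimates the spike.
w = np.linalg.eigh(S_t)[1][:, -1]
support = set(np.argsort(-np.abs(w))[:k])
print(sorted(support))
```

Thresholding at the 1/√n noise scale is what lets the signal block of the covariance survive while most spurious entries are zeroed; the hard part, addressed by the paper, is proving this works for k up to order √n.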

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Thursday, April 10, 2014

Slides and Video: Zurich Machine Learning Meetup #2

Martin Jaggi and the Zurich ML Meetup group also had a meetup this week, here are the video and slides of the talks:
#like or #fail - How Can Computers Tell the Difference? - Mark Cieliebak - #2 Slides:
Webcamaze - Amazing Webcams - Helmut Grabner - #2 Slides:
Machine Learning and Data Exploration in Python with scikit-learn - Andreas Müller - #2 Slides: Python notebook:
Their YouTube channel is here.
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Slides and Video: Paris Machine Learning Meetup #10: Dolphin Communications, Big Data, ConvNets and Quantum Computers

Last night, we had a blast with four different presentations at the Paris Machine Learning Meetup #10.

Brenda McCowan talked to us about the very difficult problem of making sense of something about which we basically know nothing, unsupervised learning at its hardest. Philippe Nieuwbourg provided a bird's-eye view of what Big Data entails, with several examples that included fashion among others. Gabriel Synnaeve described ConvNets and a cool demo of a 30-frame-per-second classification algo (OverFeat), and finally Guillaume Palacios described to us how the D-Wave computer might be something of interest beyond PR. Thank you to the speakers for the four thought-provoking presentations. Thank you also to the sponsors and partners, namely: TheFamily, HopWork and DojoCrea/DojoEvents.

More detailed comments will come later on the blog; here are the slides of the four presentations, in English:
The video features the four presentations with the attendant Q&As (the first one is in English, the other three in French).

Wednesday, April 09, 2014

Ce Soir / Tonight: Paris Machine Learning Meetup #10: Dolphin Communications, Big Data, ConvNets and Quantum Computers

The meetup will be streamed online at this address, or you can watch it right below. Please read below for the context of these presentations (in English and French).

First, thank you to our sponsors for this 10th Paris Machine Learning meetup. They are numerous, and these events just wouldn't happen without them. First is our historic partner DojoCrea, which allowed us to grow the meetup to what it currently is. We will be hosted at TheFamily this time, in their new location. Hopwork has kindly offered to cover the food and refreshments. The meetup will start at 6:45 PM, with the talks beginning at 7:30 PM (all Paris time). Aside from the first presentation and its Q&A, the other presentations will likely be in French.
Let's give some context: 

There are many important problems in Machine Learning besides strictly learning the identity (exploitation); there is also the exploration part. We will cover multi-armed bandit issues at the next meetup, but when it comes to pure exploration, unsupervised learning is at its hardest when trying to make sense of something that nobody has really understood before: animal communication.

In order to get a sense of what we know and what we don't, what sort of datasets are available, etc., on these truly hard problems, we will play the video of the talk Brenda McCowan gave at the Analyzing Animal Vocal Sequences Investigative Workshop at NIMBioS this past fall, and then we will have her in a remote Q&A session, live from California. Here is the video for those of you who are thinking about specific questions (you can add those in the comment section if you want me to ask them):

As isomorphismes said in a comment on a blog post featuring the videos of that workshop:
Fantastic. Secondary school teachers should point their kids here to prove maths isn't boring.
Yes, I agree; plus, any kid I know has always wanted to try to talk to Flipper or Skippy, a far cry from spending your working life worrying about click-thrus :-)

We will then have Philippe Nieuwbourg, live from Montreal, who will give us a bird's-eye view of how traditional businesses are changing thanks to the use of their own data.

At Meetup #8, Lenka Zdeborova (How hard is it to find a needle in a haystack?) showed us that certain problems in Machine Learning (clustering or other matrix factorizations) can only be explored through heuristics (i.e., easier problems to solve), but the downside is the appearance of sharp or not-so-sharp phase transitions (sometimes it works, sometimes it never works). The only way to avoid these problems is to directly solve the hard problems, which are combinatorial in nature, even if that means one has to develop new technologies such as quantum computers. Guillaume Palacios, formerly a physicist at CEA, will tell us what he thinks of the D-Wave quantum computers. To give more context and show that this is not a simple thought experiment, Google has a Quantum AI Lab team looking into what these computers can do.

Finally, Gabriel Synnaeve will introduce the subject of ConvNets. This presentation is a follow-up to last week's presentation by Pierre Sermanet (Deep ConvNets: "Astounding" baseline for vision) that we co-organized at Normale Sup as a Paris Machine Learning Specialist Talk. If you recall, we also had Pierre at an earlier meetup, when he talked to us about how he used this type of algorithm to win Kaggle's Dogs vs. Cats contest. Because the subject is getting considerable notice, Gabriel will go over the results mentioned by Pierre last week on these ConvNets, which are revolutionizing Computer Vision on nearly all known benchmarks (and allow people to win Kaggle competitions). Gabriel will also present the structure of these algorithms. Thank you to him for doing this on such short notice.

Our group now has more than 751 members, with about 168 who RSVP'ed that they would come.

Now the same announcement, originally posted in French (translated here):

For this 10th edition we will have four presentations: Philippe Nieuwbourg, Guillaume Palacios, Brenda McCowan and Gabriel Synnaeve.

We will be in TheFamily's new offices.

Beyond the usual networking, which will take place before and after the presentations, the themes of this meetup will be varied. We will have two remote presentations, one in English and the other in French. The two other presentations will be in French. The slides (in English) will be available in the archives shortly.

We will show a presentation by Brenda McCowan, who will talk to us about communication between and with dolphins. The video is already on YouTube and can be found here ( ). After the screening (as we did for meetup #8 with the crowdfunding projects), we will have Brenda live from California to answer our questions. The idea behind this presentation is to show that beyond the traditional applications of Machine Learning, these techniques can also let us do things very different from their usual uses. Franck and I think it is important that you watch the video beforehand, so that we can ask Brenda good questions. If you have questions in advance, send them to us and we will pass them on to Brenda before the Q&A.

Philippe Nieuwbourg will speak to us live from Montreal to give us a broader overview of the use of data in various domains.

We will also have two presentations by members of our meetup group.
At meetup #8, thanks to Lenka Zdeborova, we saw that certain ML problems, such as clustering or other matrix factorization problems (meetup #9), can only be approached through relaxations (that is, neighboring problems that are easier to solve), and that this sometimes has the downside of producing phase transitions (sometimes it works, sometimes it never works at all). The only way to avoid these issues is to solve the combinatorial problems directly, even if that means developing new technologies such as the quantum computer. Guillaume Palacios, a former physicist at CEA, will give us an introduction to the subject and describe what he thinks of D-Wave's quantum computer project. One of these computers has been acquired by Google.

Finally, Gabriel Synnaeve will talk to us about ConvNets. This presentation follows the one by Pierre Sermanet (meetup #8), who was in Paris last week and whom Gabriel and we hosted at Normale Sup (Paris Machine Learning Specialist Talk). Because the subject is red-hot, Gabriel will revisit the results shown by Pierre on these famous ConvNets, which for the past few months have been revolutionizing all existing benchmarks (and winning image-based Kaggle competitions). Gabriel will briefly describe the structure of these algorithms and what is available online in terms of implementations. Thank you to him for doing this at such short notice.

The abstracts:

+ Unraveling dolphin communication complexity: Past approaches and next steps, by Brenda McCowan

The presentation on YouTube:

+ Turn your data into dollars, by Philippe Nieuwbourg, independent analyst

How predictive analytics and big data are disrupting certain traditional sectors: case studies in fashion, behavioral studies, retail, and culture.

+ "The D-Wave quantum computer: Myths and Realities"

The Canadian company D-Wave produces the first commercially oriented quantum computer. Giants such as Google, NASA, and Amazon are among D-Wave's investors and are also among the very first users of their machine. But what is inside this strange machine when you lift the hood? The goal of this talk is to provide some elements of an answer and food for thought. After a short introduction to quantum computing, we will introduce the concept of quantum optimization, which is at the heart of how the D-Wave machine works. We will see how this technique applies to solving so-called NP-hard combinatorial problems, and we will mention a few applications to Machine Learning algorithms.
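The D-Wave machine is an annealer for QUBO (quadratic unconstrained binary optimization) problems. A classical baseline, simulated annealing on a tiny random QUBO checked against brute force, gives a feel for the problem class; everything below is a toy stand-in with made-up parameters, not a model of the hardware:

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)

nq = 8                                   # toy size; 2^8 states, brute-forceable
Q = rng.standard_normal((nq, nq))
Q = (Q + Q.T) / 2                        # symmetric QUBO matrix

def energy(z):
    # QUBO objective for a 0/1 assignment z: E(z) = z^T Q z
    return z @ Q @ z

# Exhaustive search; the exponential cost of this loop is exactly what
# annealing approaches (quantum or classical) try to sidestep.
best_bf = min(energy(np.array(z)) for z in itertools.product([0, 1], repeat=nq))

# Simulated annealing with restarts: the classical thermal cousin of the
# quantum annealing D-Wave performs.
best_sa = np.inf
for _ in range(20):
    z = rng.integers(0, 2, nq)
    e = energy(z)
    for T in np.geomspace(2.0, 0.01, 1500):   # geometric cooling schedule
        i = rng.integers(nq)
        z2 = z.copy()
        z2[i] ^= 1                            # propose a single bit flip
        e2 = energy(z2)
        # accept downhill moves, and uphill moves with Boltzmann probability
        if e2 < e or rng.random() < np.exp((e - e2) / T):
            z, e = z2, e2
    best_sa = min(best_sa, e)
print(best_sa, best_bf)
```

At this toy size the annealer reliably matches the brute-force optimum; the interesting (and contested) question is how either approach scales when the number of variables grows.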

+ ConvNets: