Friday, May 27, 2016

A proposal for a NIPS Workshop or Symposium on "Mapping Machine Learning to Hardware"

Now that the NIPS paper deadline has passed, I am interested in organizing a workshop on how Machine Learning is being mapped to hardware. If you are interested, or know somebody who might be, please get in touch with me and we will improve on this proposal (and yes, the title can change too). Here is the first draft:

Dear Colleagues,

Mapping Machine Learning to Hardware

With the recent successes of Deep Learning, we are beginning to see new specialized hardware dedicated to making some of these computations faster, more energy efficient, or both. These technologies use either CMOS (CPUs, GPUs, FPGAs, ASICs) or more exotic substrates (biological systems, memristors, quantum chips, photonics, etc.), and each seeks to address a specific trade-off in mapping Machine Learning algorithms to a particular hardware technology.


Conversely, there has been quite an effort on the empirical side to devise deep network architectures that can handle binary coefficients so as to be efficiently implementable on low-complexity hardware.


A somewhat related issue is the recent interest of the sensing community in merging the first stages of sensing hardware with the first layers of models such as deep networks. This approach has the potential to change the way image reconstruction and signal processing are performed in the future.


This workshop will bring together researchers at the interface of machine learning, hardware implementation, sensing, physics and biology.


The goals of the workshop are:
  • to present how machine learning computations and algorithms are mapped and improved as a result of new hardware technologies and architectures;
  • to evaluate the different trade-offs currently investigated in these approaches;
  • to understand how sensors may change as a result of this mapping to machine learning algorithms;
  • to evaluate the constraints put on recent deep learning architectures so as to reduce redundancy and enable a simpler mapping between computing hardware and models.
 
 
 
 

Riemannian stochastic variance reduced gradient on Grassmann manifold - implementation -

 
 
 
Bamdev just sent me the following:
 
Dear Igor,

I wish to share our recent technical report on "Riemannian stochastic variance reduced gradient on Grassmann manifold", available at http://arxiv.org/abs/1605.07367. In this paper, we extend the Euclidean SVRG algorithm to compact Riemannian manifolds. The results are encouraging.

Additionally, the code is available at https://bamdevmishra.com/codes/rsvrg/. We also provide a template file, Riemannian_svrg.m, that is compatible with the Manopt toolbox [1].


Regards,
Bamdev
[1] http://manopt.org

Thanks Bamdev! Here is the paper: Riemannian stochastic variance reduced gradient on Grassmann manifold by Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra

Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large, but finite, number of loss functions. In this paper, we propose a novel Riemannian extension of the Euclidean stochastic variance reduced gradient algorithm (R-SVRG) to a compact manifold search space. To this end, we show the developments on the Grassmann manifold. The key challenges of averaging, addition, and subtraction of multiple gradients are addressed with notions like logarithm mapping and parallel translation of vectors on the Grassmann manifold. We present a global convergence analysis of the proposed algorithm with decay step-sizes and a local convergence rate analysis under fixed step-size with some natural assumptions. The proposed algorithm is applied on a number of problems on the Grassmann manifold like principal components analysis, low-rank matrix completion, and the Karcher mean computation. In all these cases, the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm.
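For intuition, here is a rough numpy sketch of the R-SVRG loop on the Grassmann manifold. It is my own illustration, not the authors' Manopt implementation: it assumes a user-supplied function grad_i(X, i) returning the Riemannian gradient of the i-th loss at X, and it substitutes a QR-based retraction and a projection-based vector transport for the exponential map and parallel translation used in the paper.

import numpy as np

def grass_proj(X, G):
    # Project an ambient-space gradient G onto the tangent space at X (X has orthonormal columns).
    return G - X @ (X.T @ G)

def retract(X, V):
    # QR-based retraction: map the tangent vector V at X back onto the manifold.
    Q, _ = np.linalg.qr(X + V)
    return Q

def transport(X_new, V):
    # Projection-based vector transport, standing in for parallel translation.
    return grass_proj(X_new, V)

def rsvrg(grad_i, n, X0, step=0.01, outer=20, inner=100, seed=0):
    # grad_i(X, i): Riemannian gradient of the i-th loss at X; n: number of losses.
    rng = np.random.default_rng(seed)
    X_tilde = X0
    for _ in range(outer):
        # Full Riemannian gradient at the snapshot point.
        full_grad = sum(grad_i(X_tilde, i) for i in range(n)) / n
        X = X_tilde
        for _ in range(inner):
            i = rng.integers(n)
            # Variance-reduced direction: transport the snapshot terms to the current point.
            v = grad_i(X, i) - transport(X, grad_i(X_tilde, i)) + transport(X, full_grad)
            X = retract(X, -step * v)
        X_tilde = X
    return X_tilde

The per-iteration cost is one stochastic gradient plus two transported snapshot terms, which is what gives SVRG its reduced variance compared to plain Riemannian stochastic gradient descent.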

 
 
 

Thursday, May 26, 2016

Harvesting Nature: from Computational Imaging to Optical Computing

Today, at the 2016 SIAM Conference on Imaging Science in Albuquerque, Laurent, the CTO and co-founder of LightOn, will be presenting at the Computational Imaging Systems minisymposium (9:30 AM-11:30 AM, Room: Alvarado Ballroom C).
Among the work Laurent will present, he will mention our earlier proof of concept (Random Projections through multiple optical scattering: Approximating kernels at the speed of light).
I am currently thinking about submitting a workshop or symposium proposal around Machine Learning and Hardware at NIPS. If you are interested or know of a similar endeavor, please get in touch and let's make this happen.
This minisymposium emphasizes the interaction between sensing hardware and computational methods in computational imaging systems.  Novel sensing hardware together with algorithms that exploit the unique properties of the data acquired by this hardware has enabled the development of imaging systems with remarkable abilities that could not be achieved via traditional methods. This minisymposium brings together four leaders in the field of computational photography, and provides a cross-section of recent advances in this area.  

9:30-9:55 Macroscopic Fourier Ptychography, Oliver Cossairt, Northwestern University, USA; Jason R. Holloway and Ashok Veeraraghavan, Rice University, USA; Manoj Sharma, Northwestern University, USA; Salman Asif, Rice University, USA; Nathan Matsuda, Northwestern University, USA; Roarke Horstmeyer, California Institute of Technology, USA

10:00-10:25 Lensless Imaging, Ashok Veeraraghavan, Rice University, USA

10:30-10:55 Photon-Efficient Reflectivity and Depth Imaging under Significant Ambient Light, Vivek K. Goyal, Boston University, USA

11:00-11:25 Harvesting Nature: from Computational Imaging to Optical Computing, Laurent Daudet, Université Paris-Diderot, France
Thank you, Mario, for the invitation.


Tuesday, May 24, 2016

Compressively characterizing high-dimensional entangled states with complementary, random filtering

How is compressive sensing changing the world? Here is one way: it makes quantum mechanics simpler to understand: Compressively characterizing high-dimensional entangled states with complementary, random filtering by Gregory A. Howland, Samuel H. Knarr, James Schneeloch, Daniel J. Lum, John C. Howell

The resources needed to conventionally characterize a quantum system are overwhelmingly large for high-dimensional systems. This obstacle may be overcome by abandoning traditional cornerstones of quantum measurement, such as general quantum states, strong projective measurement, and assumption-free characterization. Following this reasoning, we demonstrate an efficient technique for characterizing high-dimensional, spatial entanglement with one set of measurements. We recover sharp distributions with local, random filtering of the same ensemble in momentum followed by position---something the uncertainty principle forbids for projective measurements. Exploiting the expectation that entangled signals are highly correlated, we use fewer than 5,000 measurements to characterize a 65,536-dimensional state. Finally, we use entropic inequalities to witness entanglement without a density matrix. Our method represents the sea change unfolding in quantum measurement where methods influenced by the information theory and signal-processing communities replace unscalable, brute-force techniques---a progression previously followed by classical sensing.
 

Thesis: Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks - implementation -


Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks by  Philipp Gysel

Convolutional neural networks (CNN) have achieved major breakthroughs in recent years. Their performance in computer vision has matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex non-linear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. To enable embedded devices such as smartphones, Google glasses and monitoring cameras with the astonishing power of deep learning, dedicated hardware accelerators can be used to decrease both execution time and power consumption. In applications where fast connection to the cloud is not guaranteed or where privacy is important, computation needs to be done locally. Many hardware accelerators for deep neural networks have been proposed recently. A first important step of accelerator design is hardware-oriented approximation of deep networks, which enables energy-efficient inference. We present Ristretto, a fast and automated framework for CNN approximation. Ristretto simulates the hardware arithmetic of a custom hardware accelerator. The framework reduces the bit-width of network parameters and outputs of resource-intense layers, which reduces the chip area for multiplication units significantly. Alternatively, Ristretto can remove the need for multipliers altogether, resulting in an adder-only arithmetic. The tool fine-tunes trimmed networks to achieve high classification accuracy. Since training of deep neural networks can be time-consuming, Ristretto uses highly optimized routines which run on the GPU. This enables fast compression of any given network. Given a maximum tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.
 
 
 The page for the Ristretto project is at: http://lepsucd.com/?page_id=621
From the page:
 
Ristretto is an automated CNN-approximation tool which condenses 32-bit floating point networks. Ristretto is an extension of Caffe and allows one to test, train and fine-tune networks with limited numerical precision.

Ristretto In a Minute

  • Ristretto Tool: The Ristretto tool performs automatic network quantization and scoring, using different bit-widths for number representation, to find a good balance between compression rate and network accuracy.
  • Ristretto Layers: Ristretto reimplements Caffe-layers with quantized numbers.
  • Testing and Training: Thanks to Ristretto’s smooth integration into Caffe, network description files can be manually changed to quantize different layers. The bit-width used for different layers, as well as other parameters, can be set in the network’s prototxt file. This allows one to directly test and train condensed networks, without any need for recompilation.

Approximation Schemes

Ristretto allows for three different quantization strategies to approximate Convolutional Neural Networks:
  • Dynamic Fixed Point: A modified fixed-point format with more flexibility.
  • Mini Floating Point: Bit-width reduced floating point numbers.
  • Power-of-two parameters: Layers with power-of-two parameters don’t need any multipliers, when implemented in hardware.
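To make the first of these strategies concrete, here is a small, hypothetical numpy illustration of dynamic fixed point quantization, written as a sketch rather than taken from the Ristretto code: a per-tensor fractional length is chosen so that the largest weight still fits in the budgeted bit-width, and values are then rounded to that grid.

import numpy as np

def quantize_dynamic_fixed_point(w, total_bits=8):
    # Choose the integer part so the largest magnitude fits, spend the rest on fractional bits.
    int_bits = int(np.ceil(np.log2(np.max(np.abs(w)) + 1e-12))) + 1  # +1 for the sign bit
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    qmin, qmax = -2 ** (total_bits - 1), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(w * scale), qmin, qmax)
    return q / scale  # de-quantized values, as used when simulating the hardware arithmetic

# Example: condense a random weight tensor to 8 bits and check the introduced error.
w = np.random.randn(64, 3, 3, 3).astype(np.float32) * 0.1
w_q = quantize_dynamic_fixed_point(w, total_bits=8)
print("max abs quantization error:", np.max(np.abs(w - w_q)))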


Cite us

Our approximation framework was presented in an extended abstract at ICLR’16. Check out our poster. All results can be reproduced with our code on Github. If Ristretto helps your research project, please cite us:
@article{gysel2016hardware,
  title={Hardware-oriented Approximation of Convolutional Neural Networks},
  author={Gysel, Philipp and Motamedi, Mohammad and Ghiasi, Soheil},
  journal={arXiv preprint arXiv:1604.03168},
  year={2016}
}
 
 
 
 
 

Monday, May 23, 2016

The Shallow Side of Deep Learning


On his Twitter feed, Atlas mentioned three preprints: the first builds deeper constructions out of kernels (shallow methods); the next shows how some deep nets can be viewed as ensembles of shallower nets; and the last makes the case that dropout and stochastic depth are, in a sense, equivalent to averaging over shallower networks.
 
  
In this paper, we propose a new image representation based on a multilayer kernel machine that performs end-to-end learning. Unlike traditional kernel methods, where the kernel is handcrafted or adapted to data in an unsupervised manner, we learn how to shape the kernel for a supervised prediction problem. We proceed by generalizing convolutional kernel networks, which originally provide unsupervised image representations, and we derive backpropagation rules to optimize model parameters. As a result, we obtain a new type of convolutional neural network with the following properties: (i) at each layer, learning filters is equivalent to optimizing a linear subspace in a reproducing kernel Hilbert space (RKHS), where we project data, (ii) the network may be learned with supervision or without, (iii) the model comes with a natural regularization function (the norm in the RKHS). We show that our method achieves reasonably competitive performance on some standard "deep learning" image classification datasets such as CIFAR-10 and SVHN, and also state-of-the-art results for image super-resolution, demonstrating the applicability of our approach to a large variety of image-related tasks.
 
 

Residual Networks are Exponential Ensembles of Relatively Shallow Networks by Andreas Veit, Michael Wilber, Serge Belongie

In this work, we introduce a novel interpretation of residual networks showing they are exponential ensembles. This observation is supported by a large-scale lesion study that demonstrates they behave just like ensembles at test time. Subsequently, we perform an analysis showing these ensembles mostly consist of networks that are each relatively shallow. For example, contrary to our expectations, most of the gradient in a residual network with 110 layers comes from an ensemble of very short networks, i.e., only 10-34 layers deep. This suggests that in addition to describing neural networks in terms of width and depth, there is a third dimension: multiplicity, the size of the implicit ensemble. Ultimately, residual networks do not resolve the vanishing gradient problem by preserving gradient flow throughout the entire depth of the network - rather, they avoid the problem simply by ensembling many short networks together. This insight reveals that depth is still an open research question and invites the exploration of the related notion of multiplicity.
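The "ensemble of paths" claim can be checked directly on a toy linear model. The following numpy snippet (my own illustration, not the authors' code) builds three linear residual blocks and verifies that the network output equals the sum over all 2^3 paths that either take or skip each block:

import numpy as np
from itertools import product

rng = np.random.default_rng(0)
d, n_blocks = 4, 3
Fs = [rng.normal(scale=0.1, size=(d, d)) for _ in range(n_blocks)]
x = rng.normal(size=d)

# Standard forward pass through the residual blocks: y = (I + F3)(I + F2)(I + F1) x.
y = x.copy()
for F in Fs:
    y = y + F @ y

# Unraveled view: enumerate every path (each block is either skipped or taken) and sum them.
y_paths = np.zeros(d)
for choice in product([False, True], repeat=n_blocks):
    v = x.copy()
    for F, take in zip(Fs, choice):
        if take:
            v = F @ v
    y_paths += v

print(np.allclose(y, y_paths))  # True: the network acts like an implicit ensemble of 2^n paths

With nonlinear blocks the expansion is no longer exact, but the lesion study in the paper suggests the ensemble picture still captures how trained residual networks behave.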
 
 
Swapout: Learning an ensemble of deep architectures by  Saurabh Singh, Derek Hoiem, David Forsyth

We describe Swapout, a new stochastic training method that outperforms ResNets of identical network structure yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout, stochastic depth and residual architectures as special cases. When viewed as a regularization method swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to existing architectures and suggests a much richer set of architectures to be explored. We show that our formulation suggests an efficient training method and validate our conclusions on CIFAR-10 and CIFAR-100 matching state of the art accuracy. Remarkably, our 32 layer wider model performs similar to a 1001 layer ResNet model.
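The special-case claim is easy to see from the sampling rule itself. Below is a tiny numpy sketch of a swapout unit, y = Θ1 ⊙ x + Θ2 ⊙ F(x), written as my own illustration under the assumption of independent per-unit Bernoulli masks; the comments note which choices of the masks recover dropout, stochastic depth, residual and plain feed-forward units.

import numpy as np

rng = np.random.default_rng(0)

def swapout_unit(x, F, p1=0.8, p2=0.8):
    # Sample independent per-unit masks and combine the skip path with the block output.
    theta1 = rng.binomial(1, p1, size=x.shape)
    theta2 = rng.binomial(1, p2, size=x.shape)
    return theta1 * x + theta2 * F(x)

# Corner cases of the rule:
#   theta1 = 1, theta2 = 1        -> residual unit, y = x + F(x)
#   theta1 = 1, theta2 = 0        -> the block is skipped (stochastic-depth style)
#   theta1 = 0, theta2 ~ Bern(p)  -> dropout applied to F(x)
#   theta1 = 0, theta2 = 1        -> plain feed-forward unit, y = F(x)

x = rng.normal(size=8)
F = lambda v: np.tanh(v)  # stand-in for a learned block
print(swapout_unit(x, F))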
 

 
 
 

Saturday, May 21, 2016

Saturday Morning Videos: Random Instances and Phase Transitions Workshop @ Simons Institute


Image: Dawn LAMO Image 79 of Ceres (May 2, 2016). Zadeni Crater, at 80 miles (128 kilometers) wide, is a prominent impact feature in the southern hemisphere of Ceres; this image from NASA's Dawn spacecraft shows terrain in Zadeni Crater.


The Random Instances and Phase Transitions workshop took place at the Simons Institute at the beginning of the month. The hashtag for this workshop was #SimonsRIPT. All the videos and attendant presentations can be found in the links below:




Saturday Morning Video: Solvable Model of Unsupervised Feature Learning, Lenka Zdeborova @ Simons Institute

Lenka, who will be at our next Paris Machine Learning meetup on June 8th, was recently speaking at the Simons Institute's Random Instances and Phase Transitions workshop within the trimester-long Counting Complexity and Phase Transitions program. Here is Solvable Model of Unsupervised Feature Learning by Lenka Zdeborova:

We introduce factorization of certain randomly generated matrices as a probabilistic model for unsupervised feature learning. Using the replica method we obtain an exact closed formula for the minimum mean-squared error and sample complexity for reconstruction of the features from data. An alternative way to derive this formula is via state evolution of the associated approximate message passing. Analyzing the fixed points of this formula we identify a gap between computationally tractable and information theoretically optimal inference, associated to a first order phase transition. The model can serve for benchmarking generic-purpose feature learning algorithms, its analysis provides insight into theoretical understanding of unsupervised learning from data.




Friday, May 20, 2016

ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels

Yes! The great convergence in action: merging sensing with the first layer of a deep learning architecture:


ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels by Huaijin Chen, Suren Jayasuriya, Jiyue Yang, Judy Stephen, Sriram Sivaramakrishnan, Ashok Veeraraghavan, Alyosha Molnar

Deep learning using convolutional neural networks (CNNs) is quickly becoming the state-of-the-art for challenging computer vision applications. However, deep learning's power consumption and bandwidth requirements currently limit its application in embedded and mobile systems with tight energy budgets. In this paper, we explore the energy savings of optically computing the first layer of CNNs. To do so, we utilize bio-inspired Angle Sensitive Pixels (ASPs), custom CMOS diffractive image sensors which act similar to Gabor filter banks in the V1 layer of the human visual cortex. ASPs replace both image sensing and the first layer of a conventional CNN by directly performing optical edge filtering, saving sensing energy, data bandwidth, and CNN FLOPS to compute. Our experimental results (both on synthetic data and a hardware prototype) for a variety of vision tasks such as digit recognition, object recognition, and face identification demonstrate 97% reduction in image sensor power consumption and 90% reduction in data bandwidth from sensor to CPU, while achieving similar performance compared to traditional deep learning pipelines.
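As a toy illustration of the "fixed optical first layer" idea (my own sketch, not the authors' ASP pipeline), one can replace a learned first convolutional layer by a hand-built bank of oriented Gabor-like filters and feed the resulting edge maps to the rest of a network:

import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size=7, theta=0.0, freq=0.25, sigma=2.0):
    # Hand-built Gabor filter, a rough stand-in for an angle-sensitive pixel response.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    envelope = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * freq * xr)

# A fixed "first layer": a small bank of oriented filters applied to an image, replacing
# both image sensing and the first learned convolutional layer in the conventional pipeline.
image = np.random.rand(28, 28)  # stand-in for a sensor frame
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
feature_maps = np.stack([convolve2d(image, k, mode="same") for k in bank])
print(feature_maps.shape)  # (4, 28, 28): edge maps handed to the remaining, learned layers

Since these filters are fixed, their computation can in principle be pushed into the optics or the sensor itself, which is where the energy and bandwidth savings reported in the paper come from.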



Thursday, May 19, 2016

FISTA and "Maximal Sparsity with Deep Networks?", Learning MMSE Optimal Thresholds for FISTA

 
 
Ulugbek just sent me the following:  
 
Hi Igor,

I just saw your post today on Nuit Blanche on "Maximal Sparsity with Deep Networks?".

You mentioned both FISTA and AMP, and the number of iterations required for convergence. I generally agree with your observations about studies on FISTA/AMP being hugely insightful. We have been investigating this from our side with Hassan, and I wanted to share a small update (see attached), which will be presented at iTWIST 2016 in Denmark this August. We haven’t uploaded the manuscript to arXiv as iTWIST publishes its proceedings there anyway.

Going from ISTA to FISTA is quite advantageous. We initialize the algorithm with 0 and also set the initial regularizer to 0, i.e., use the identity operator instead of a shrinkage (which corresponds to a least-squares estimator), and converge much faster towards a regularizer that promotes sparsity.

Thank you for keeping us all informed here,
Kind Regards,
- Ulugbek
Thanks Ulugbek and Hassan! Here is the paper, Learning MMSE Optimal Thresholds for FISTA:
 
 
Fast iterative shrinkage/thresholding algorithm (FISTA) is one of the most commonly used methods for solving linear inverse problems. In this work, we present a scheme that enables learning of optimal thresholding functions for FISTA from a set of training data. In particular, by relating iterations of FISTA to a deep neural network (DNN), we use the error backpropagation algorithm to find thresholding functions that minimize mean squared error (MSE) of the reconstruction for a given statistical distribution of data. Accordingly, the scheme can be used to computationally obtain MSE optimal variant of FISTA for performing statistical estimation.
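For reference, here is a bare-bones FISTA with the usual fixed soft-threshold, written in numpy as a sketch; the paper's contribution is to replace this hand-picked shrinkage with thresholding functions learned by backpropagation through the unfolded iterations, so as to minimize the MSE for a given data distribution.

import numpy as np

def soft_threshold(x, lam):
    # Standard soft-thresholding; the learned variant swaps this fixed nonlinearity
    # for a trainable thresholding function.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def fista(A, y, lam=0.1, n_iter=100):
    # Plain FISTA for min_x 0.5*||y - A x||^2 + lam*||x||_1, with a fixed step size 1/L.
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the quadratic term's gradient
    x = np.zeros(A.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)
        x_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)  # momentum (acceleration) step
        x, t = x_new, t_new
    return x

# Tiny sparse-recovery example.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.normal(size=5)
y = A @ x_true + 0.01 * rng.normal(size=50)
x_hat = fista(A, y, lam=0.05)
print(np.linalg.norm(x_hat - x_true))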
 
 
