Nuit Blanche: Saturday Morning Videos: Nonparametric Methods for Large Scale Representation Learning NIPS 2015 Workshop

Saturday, June 18, 2016

Here are the videos of the Nonparametric Methods for Large Scale Representation Learning workshop at NIPS 2015. Enjoy!

Francis Bach (INRIA & ENS) video

Title: Sharp Analysis of Random Feature Expansions

Abstract: Random feature expansions provide a simple way to avoid the usual quadratic running-time complexity of kernel methods. In this talk, I will present recent results about the approximation properties of these expansions. In particular, I will provide improved bounds on the number of features needed for a given approximation quality.
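The abstract does not say which expansion the talk analyses. As a minimal sketch, assuming the Gaussian kernel and the random Fourier features of Rahimi and Recht (one common choice), each input is mapped to D explicit features whose inner products approximate the kernel. The function names and parameters below are our own illustrations.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_rff_map(dim, num_features, sigma=1.0, seed=0):
    """Return a random Fourier feature map z: R^dim -> R^num_features.

    Frequencies w ~ N(0, I / sigma^2) and phases b ~ Uniform[0, 2*pi] are drawn
    once, so that E[z(x) . z(y)] equals gaussian_kernel(x, y, sigma).
    """
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        # One cosine feature per (w, b) pair.
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return z


def approx_kernel(z, x, y):
    """Monte Carlo estimate of the kernel as an inner product of explicit features."""
    return sum(a * b for a, b in zip(z(x), z(y)))


if __name__ == "__main__":
    z = make_rff_map(dim=3, num_features=4000, sigma=1.0, seed=42)
    x, y = [0.1, 0.5, -0.3], [0.4, 0.2, 0.0]
    print("exact :", gaussian_kernel(x, y))
    print("approx:", approx_kernel(z, x, y))
```

A linear model trained on z(x) costs O(nD) instead of the O(n^2) needed just to form the kernel matrix. The error on a single kernel value shrinks roughly like 1/sqrt(D), and how large D must be for a given approximation quality is the trade-off the talk's bounds quantify.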

Michael Mahoney (Berkeley) video

Title: Using Local Spectral Methods in Theory and in Practice

Abstract: Local spectral methods are algorithms that touch only a small part of a large data graph and yet come with locally-biased versions of the Cheeger-like quality-of-approximation guarantees that make the usual global spectral methods so popular. Since they touch only a small part of a large data graph, these methods come with strong scalability guarantees, and they can be applied to graphs with hundreds of millions or billions of nodes. Moreover, due to implicit regularization, they also come with interesting statistical guarantees, and they perform quite well in many practical situations. We will describe the basic ideas underlying these methods, how these methods tend to perform in practice at identifying different types of structure in data, and how an understanding of the implicit regularization properties underlying these methods leads to novel methods to robustify graph-based learning algorithms to the peculiarities of data preprocessing decisions.
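A standard example of a local spectral method is the push algorithm of Andersen, Chung and Lang for approximate personalized PageRank. The sketch below is an illustration of the idea rather than necessarily one of the methods covered in the talk. It assumes an unweighted, undirected graph stored as an adjacency dict with no isolated nodes or self-loops; the function name and defaults are our own.

```python
from collections import deque


def approximate_ppr(graph, seed, alpha=0.15, eps=1e-4):
    """Push-style approximate personalized PageRank (lazy random walk).

    graph: dict mapping node -> list of neighbours (undirected, no isolated
    nodes, no self-loops). Only nodes whose residual reaches eps * degree are
    ever processed, so the work depends on 1 / (eps * alpha), not on the
    total size of the graph.
    """
    p = {}               # sparse approximate PageRank vector
    r = {seed: 1.0}      # residual probability mass not yet distributed
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(graph[u])
        r_u = r.get(u, 0.0)
        if r_u < eps * deg_u:
            continue     # stale entry, nothing worth pushing
        # Settle an alpha fraction at u, keep half of the rest at u (lazy
        # step) and spread the other half evenly over u's neighbours.
        p[u] = p.get(u, 0.0) + alpha * r_u
        r[u] = (1.0 - alpha) * r_u / 2.0
        share = (1.0 - alpha) * r_u / (2.0 * deg_u)
        for v in graph[u]:
            before = r.get(v, 0.0)
            r[v] = before + share
            # Enqueue v only when its residual first crosses the threshold.
            if before < eps * len(graph[v]) <= r[v]:
                queue.append(v)
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p
```

On two 4-node cliques joined by a single edge, seeding in one clique puts most of the mass in that clique, and a separate connected component is never touched. In the original method, nodes are then sorted by p[v] / deg(v) and swept to find a low-conductance cluster around the seed.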

Yee Whye Teh (Oxford) video

Title: Random Tensor Decompositions for Regression and Collaborative Filtering

Abstract: In this talk I will present some ongoing work by Xiaoyu Lu, Hyunjik Kim, Seth Flaxman and myself on approximations for efficiently learning Gaussian processes and kernel methods. Our approximation is applicable when the kernel has Kronecker structure, even when the data are not on a grid. The idea is to make use of random feature expansions, low-rank tensors, and recent advances in stochastic gradient MCMC / Variational Inference / Descent. We will also present how this can be used in a novel formulation for collaborative filtering with side-information using Gaussian processes, arguing it is more natural than current proposals for using GPs in collaborative filtering, and showing interesting connections between our approximations and low-rank matrix factorization approaches to collaborative filtering.
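One way to see how Kronecker structure, random features and low-rank tensors can fit together is the following. For a product kernel k((x1, x2), (y1, y2)) = k1(x1, y1) k2(x2, y2), the Kronecker product of per-factor random features is a random-feature map for k. Restricting the weight tensor to low rank then avoids ever forming the D1 * D2 product features. This is our own hypothetical illustration (Gaussian factors, helper names rff_1d and low_rank_predict), not the construction from the talk.

```python
import math
import random


def rff_1d(num_features, lengthscale, seed):
    """Hypothetical helper: random Fourier features for a 1-D Gaussian kernel."""
    rng = random.Random(seed)
    ws = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return lambda x: [scale * math.cos(w * x + b) for w, b in zip(ws, bs)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def kron(a, b):
    """Kronecker (tensor) product of two vectors, flattened row-major."""
    return [ai * bj for ai in a for bj in b]


def low_rank_predict(phi1, phi2, us, vs, x1, x2):
    """Evaluate f(x) = <W, phi1(x1) kron phi2(x2)> for W = sum_r u_r v_r^T.

    Uses <u v^T, f1 f2^T> = (u . f1) * (v . f2), so the D1 * D2 product features
    are never formed: the cost is O(R * (D1 + D2)) instead of O(D1 * D2).
    """
    f1, f2 = phi1(x1), phi2(x2)
    return sum(dot(u, f1) * dot(v, f2) for u, v in zip(us, vs))
```

Because dot(kron(a1, a2), kron(b1, b2)) equals dot(a1, b1) * dot(a2, b2), the product features approximate k1 * k2 exactly as well as the factor features approximate k1 and k2. The rank R of W plays the same role as the rank in matrix factorization, which hints at the link to low-rank matrix factorization mentioned in the abstract.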

Fei Sha (UCLA) video
Title: Do shallow kernel methods match deep neural networks -- and if not, what can the shallow ones learn from the deep ones?

Abstract: Deep neural networks (DNNs) and other types of deep learning architectures have been hugely successful in a large number of applications. By contrast, kernel methods, which were exceedingly popular, have become lackluster. The crippling obstacle is the computational complexity of those methods. Nonetheless, there has been a resurgence of interest in these methods. In particular, several research groups have studied how to scale kernel methods to cope with large-scale learning problems.

Despite such progress, there has not been a systematic, head-on comparison between kernel methods and DNNs. Specifically, while recent approaches have shown exciting promise, we are still left with at least one nagging unanswered question: can kernel methods, after being scaled up for large datasets, truly match DNN performance?

In this talk, I will describe our efforts in (partially) answering that question. I will present extensive empirical studies comparing kernel methods and DNNs for automatic speech recognition, a key field to which DNNs have been applied. Our investigative studies highlight the similarities and differences between these two paradigms. I will leave our main conclusion as a surprise.

Jean-Philippe Vert (Mines ParisTech & Curie Institute) Video
Title: Learning from Rankings

Abstract: In many applications such as genomics, high-dimensional data are often subject to technical variability, such as noise or batch effects, which is difficult to remove or model. If the variability approximately preserves the relative order of the features within each sample, then one could keep only the information about the relative order of the features to characterize each sample, resulting in a representation of each sample as a permutation over the set of features. In this talk, I will discuss several new methods for supervised and unsupervised classification of such permutations, including new positive definite kernels on the symmetric group and a new method for supervised full-quantile normalization, illustrating the benefits of these techniques on cancer patient stratification from noisy gene expression and mutation data.
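A concrete example of a kernel on permutations is the Kendall kernel, i.e. Kendall's tau between two rankings, which is known to be positive definite for rankings without ties. The sketch below is an illustration, not the code from the talk. It assumes each sample is a plain list of feature values and uses only the relative order of those values; the function name is our own.

```python
from itertools import combinations


def kendall_kernel(x, y):
    """Kendall kernel between two samples, using only the order of their features.

    Each pair of features (i, j) contributes +1 if both samples order it the same
    way (concordant), -1 if they disagree (discordant) and 0 on a tie; the sum is
    divided by the number of pairs n(n-1)/2 (Kendall's tau-a).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("expected two samples with the same number (>= 2) of features")
    total = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        total += (s > 0) - (s < 0)
    return total / (n * (n - 1) / 2)
```

Because the kernel depends only on relative orders, any strictly increasing transformation of a sample (a crude model of an order-preserving batch effect) leaves it unchanged. The double loop is O(n^2) in the number of features; O(n log n) merge-sort-based algorithms for Kendall's tau exist for large n.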

Amr Ahmed (Google) Video
Title: Dirichlet-Hawkes Processes with Applications to Clustering Continuous-Time Document Streams

Abstract: Clustering in document streams, such as online news articles, can be induced by their textual contents, as well as by the temporal dynamics of their arrival patterns. Can we leverage both sources of information to obtain a better clustering of the documents, and distill information that is not possible to extract using contents only? In this talk, I will describe a novel random process, referred to as the Dirichlet-Hawkes process, that takes both sources of information into account in a unified framework. A distinctive feature of the proposed model is that the preferential attachment of items to clusters according to cluster sizes, present in Dirichlet processes, is now driven by the intensities of cluster-wise self-exciting temporal point processes, the Hawkes processes. This new model establishes a previously unexplored connection between Bayesian nonparametrics and temporal point processes, which lets the number of clusters grow to accommodate the increasing complexity of online streaming contents, while at the same time adapting to the ever-changing dynamics of the respective continuous arrival times. Large-scale experiments on both synthetic and real-world news articles showed that Dirichlet-Hawkes processes can recover both meaningful topics and temporal dynamics, which leads to better predictive performance in terms of content perplexity and arrival time of future documents.
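The key change with respect to a Dirichlet process shows up in the prior probability of assigning a newly arrived document to a cluster. A minimal sketch follows. It assumes an exponential triggering kernel with parameters alpha and beta shared by all clusters, a hypothetical base rate lambda0 for opening a new cluster, and no content likelihood. The full model combines this prior with a text model and may use richer triggering kernels.

```python
import math


def hawkes_intensity(event_times, t, alpha=1.0, beta=1.0):
    """Intensity at time t of a self-exciting process with an exponential kernel:
    alpha * sum over past events t_i < t of exp(-beta * (t - t_i))."""
    return sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)


def dhp_assignment_prior(clusters, t, lambda0=0.5, alpha=1.0, beta=1.0):
    """Prior probability that a document arriving at time t joins each cluster.

    clusters maps a cluster id to the arrival times of its documents. Where a
    Dirichlet process would weight cluster k by its size, this weights it by its
    Hawkes intensity at t; lambda0 is the weight of opening a new cluster, which
    is returned under the key None. Document content is ignored here.
    """
    weights = {k: hawkes_intensity(times, t, alpha, beta)
               for k, times in clusters.items()}
    total = lambda0 + sum(weights.values())
    probs = {k: w / total for k, w in weights.items()}
    probs[None] = lambda0 / total
    return probs
```

Consider a document arriving at t = 10. A cluster with documents at times 8.0, 9.0 and 9.5 then outweighs one with documents at 0.0, 1.0 and 2.0, whereas a Dirichlet process would treat the two identically because their sizes are equal.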

Graph Sparsification Approaches for Laplacian Smoothing, Veeranjaneyulu Sadhanala, Video

Word, graph and manifold embedding from Markov processes, Tatsu Hashimoto, video

Image of TITAN

N00261571.jpg was taken on 2016-06-04 18:19 (UTC) and received on Earth 2016-06-05 11:23 (UTC). The camera was pointing toward TITAN, and the image was taken using the CL1 and CB3 filters. This image has not been validated or calibrated. A validated/calibrated image will be archived with the NASA Planetary Data System.


Image Credit: NASA/JPL-Caltech/Space Science Institute
