
Friday, July 03, 2015

Proceedings of The 28th Conference on Learning Theory - COLT 2015

So the COLT conference started this morning in sweltering Paris. Many of the presentations have been featured in one fashion or another on Nuit Blanche. Here are the full proceedings:

 

Regular Papers

  • On Consistent Surrogate Risk Minimization and Property Elicitation, Arpit Agarwal, Shivani Agarwal [abs] [pdf]
  • Online Learning with Feedback Graphs: Beyond Bandits, Noga Alon, Nicolò Cesa-Bianchi, Ofer Dekel, Tomer Koren [abs] [pdf]
  • Learning Overcomplete Latent Variable Models through Tensor Methods, Animashree Anandkumar, Rong Ge, Majid Janzamin [abs] [pdf]
  • Simple, Efficient, and Neural Algorithms for Sparse Coding, Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra [abs] [pdf]
  • Label optimal regret bounds for online local learning, Pranjal Awasthi, Moses Charikar, Kevin A Lai, Andrej Risteski [abs] [pdf]
  • Efficient Learning of Linear Separators under Bounded Noise, Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, Ruth Urner [abs] [pdf]
  • Efficient Representations for Lifelong Learning and Autoencoding, Maria-Florina Balcan, Avrim Blum, Santosh Vempala [abs] [pdf]
  • Optimally Combining Classifiers Using Unlabeled Data, Akshay Balsubramani, Yoav Freund [abs] [pdf]
  • Minimax Fixed-Design Linear Regression, Peter L. Bartlett, Wouter M. Koolen, Alan Malek, Eiji Takimoto, Manfred K. Warmuth [abs] [pdf]
  • Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions, Alexandre Belloni, Tengyuan Liang, Hariharan Narayanan, Alexander Rakhlin [abs] [pdf]
  • Bandit Convex Optimization: √T Regret in One Dimension, Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres [abs] [pdf]
  • The entropic barrier: a simple and optimal universal self-concordant barrier, Sébastien Bubeck, Ronen Eldan [abs] [pdf]
  • Optimum Statistical Estimation with Strategic Data Sources, Yang Cai, Constantinos Daskalakis, Christos Papadimitriou [abs] [pdf]
  • On the Complexity of Learning with Kernels, Nicolò Cesa-Bianchi, Yishay Mansour, Ohad Shamir [abs] [pdf]
  • Learnability of Solutions to Conjunctive Queries: The Full Dichotomy, Hubie Chen, Matthew Valeriote [abs] [pdf]
  • Sequential Information Maximization: When is Greedy Near-optimal? Yuxin Chen, S. Hamed Hassani, Amin Karbasi, Andreas Krause [abs] [pdf]
  • Efficient Sampling for Gaussian Graphical Models via Spectral Sparsification, Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, Shang-Hua Teng [abs] [pdf]
  • Stochastic Block Model and Community Detection in Sparse Graphs: A spectral algorithm with optimal rate of recovery, Peter Chin, Anup Rao, Van Vu [abs] [pdf]
  • On-Line Learning Algorithms for Path Experts with Non-Additive Losses, Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Manfred Warmuth [abs] [pdf]
  • Truthful Linear Regression, Rachel Cummings, Stratis Ioannidis, Katrina Ligett [abs] [pdf]
  • A PTAS for Agnostically Learning Halfspaces, Amit Daniely [abs] [pdf]
  • S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification, Gautam Dasarathy, Robert Nowak, Xiaojin Zhu [abs] [pdf]
  • Improved Sum-of-Squares Lower Bounds for Hidden Clique and Hidden Submatrix Problems, Yash Deshpande, Andrea Montanari [abs] [pdf]
  • Contextual Dueling Bandits, Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi [abs] [pdf]
  • Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering, Justin Eldridge, Mikhail Belkin, Yusu Wang [abs] [pdf]
  • Faster Algorithms for Testing under Conditional Sampling, Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh [abs] [pdf]
  • Learning and inference in the presence of corrupted inputs, Uriel Feige, Yishay Mansour, Robert Schapire [abs] [pdf]
  • From Averaging to Acceleration, There is Only a Step-size, Nicolas Flammarion, Francis Bach [abs] [pdf]
  • Variable Selection is Hard, Dean Foster, Howard Karloff, Justin Thaler [abs] [pdf]
  • Vector-Valued Property Elicitation, Rafael Frongillo, Ian A. Kash [abs] [pdf]
  • Competing with the Empirical Risk Minimizer in a Single Pass, Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford [abs] [pdf]
  • A Chaining Algorithm for Online Nonparametric Regression, Pierre Gaillard, Sébastien Gerchinovitz [abs] [pdf]
  • Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition, Rong Ge, Furong Huang, Chi Jin, Yang Yuan [abs] [pdf]
  • Learning the dependence structure of rare events: a non-asymptotic study, Nicolas Goix, Anne Sabourin, Stéphan Clémençon [abs] [pdf]
  • Thompson Sampling for Learning Parameterized Markov Decision Processes, Aditya Gopalan, Shie Mannor [abs] [pdf]
  • Computational Lower Bounds for Community Detection on Random Graphs, Bruce Hajek, Yihong Wu, Jiaming Xu [abs] [pdf]
  • Adaptive Recovery of Signals by Convex Optimization, Zaid Harchaoui, Anatoli Juditsky, Arkadi Nemirovski, Dmitry Ostrovsky [abs] [pdf]
  • Tensor principal component analysis via sum-of-square proofs, Samuel B. Hopkins, Jonathan Shi, David Steurer [abs] [pdf]
  • Fast Exact Matrix Completion with Finite Samples, Prateek Jain, Praneeth Netrapalli [abs] [pdf]
  • Exp-Concavity of Proper Composite Losses, Parameswaran Kamalaruban, Robert Williamson, Xinhua Zhang [abs] [pdf]
  • On Learning Distributions from their Samples, Sudeep Kamath, Alon Orlitsky, Dheeraj Pichapati, Ananda Theertha Suresh [abs] [pdf]
  • MCMC Learning, Varun Kanade, Elchanan Mossel [abs] [pdf]
  • Online PCA with Spectral Bounds, Zohar Karnin, Edo Liberty [abs] [pdf]
  • Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem, Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa [abs] [pdf]
  • Second-order Quantile Methods for Experts and Combinatorial Games, Wouter M. Koolen, Tim Van Erven [abs] [pdf]
  • Hierarchical Label Queries with Data-Dependent Partitions, Samory Kpotufe, Ruth Urner, Shai Ben-David [abs] [pdf]
  • Algorithms for Lipschitz Learning on Graphs, Rasmus Kyng, Anup Rao, Sushant Sachdeva, Daniel A. Spielman [abs] [pdf]
  • Low Rank Matrix Completion with Exponential Family Noise, Jean Lafond [abs] [pdf]
  • Bad Universal Priors and Notions of Optimality, Jan Leike, Marcus Hutter [abs] [pdf]
  • Learning with Square Loss: Localization through Offset Rademacher Complexity, Tengyuan Liang, Alexander Rakhlin, Karthik Sridharan [abs] [pdf]
  • Achieving All with No Parameters: AdaNormalHedge, Haipeng Luo, Robert E. Schapire [abs] [pdf]
  • Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization, Mehrdad Mahdavi, Lijun Zhang, Rong Jin [abs] [pdf]
  • Correlation Clustering with Noisy Partial Information, Konstantin Makarychev, Yury Makarychev, Aravindan Vijayaraghavan [abs] [pdf]
  • Online Density Estimation of Bradley-Terry Models, Issei Matsumoto, Kohei Hatano, Eiji Takimoto [abs] [pdf]
  • First-order regret bounds for combinatorial semi-bandits, Gergely Neu [abs] [pdf]
  • Norm-Based Capacity Control in Neural Networks, Behnam Neyshabur, Ryota Tomioka, Nathan Srebro [abs] [pdf]
  • Cortical Learning via Prediction, Christos H. Papadimitriou, Santosh S. Vempala [abs] [pdf]
  • Partitioning Well-Clustered Graphs: Spectral Clustering Works!, Richard Peng, He Sun, Luca Zanetti [abs] [pdf]
  • Batched Bandit Problems, Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg [abs] [pdf]
  • Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints, Alexander Rakhlin, Karthik Sridharan [abs] [pdf]
  • Fast Mixing for Discrete Point Processes, Patrick Rebeschini, Amin Karbasi [abs] [pdf]
  • Generalized Mixability via Entropic Duality, Mark D. Reid, Rafael M. Frongillo, Robert C. Williamson, Nishant Mehta [abs] [pdf]
  • On the Complexity of Bandit Linear Optimization, Ohad Shamir [abs] [pdf]
  • An Almost Optimal PAC Algorithm, Hans U. Simon [abs] [pdf]
  • Minimax rates for memory-bounded sparse linear regression, Jacob Steinhardt, John Duchi [abs] [pdf]
  • Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery, Thomas Steinke, Jonathan Ullman [abs] [pdf]
  • Convex Risk Minimization and Conditional Probability Estimation, Matus Telgarsky, Miroslav Dudík, Robert Schapire [abs] [pdf]
  • Regularized Linear Regression: A Precise Analysis of the Estimation Error, Christos Thrampoulidis, Samet Oymak, Babak Hassibi [abs] [pdf]
  • Max vs Min: Tensor Decomposition and ICA with nearly Linear Sample Complexity, Santosh S. Vempala, Ying Xiao [abs] [pdf]
  • On Convergence of Emphatic Temporal-Difference Learning, H. Yu, [abs] [pdf]

Open Problems

  • Open Problem: Restricted Eigenvalue Condition for Heavy Tailed Designs, Arindam Banerjee, Sheng Chen, Vidyashankar Sivakumar [abs] [pdf]
  • Open Problem: The landscape of the loss surfaces of multilayer networks, Anna Choromanska, Yann LeCun, Gérard Ben Arous [abs] [pdf]
  • Open Problem: The Oracle Complexity of Smooth Convex Optimization in Nonstandard Settings, Cristóbal Guzmán [abs] [pdf]
  • Open Problem: Online Sabotaged Shortest Path, Wouter M. Koolen, Manfred K. Warmuth, Dmitri Adamskiy [abs] [pdf]
  • Open Problem: Learning Quantum Circuits with Queries, Jeremy Kun, Lev Reyzin [abs] [pdf]
  • Open Problem: Recursive Teaching Dimension Versus VC Dimension, Hans U. Simon, Sandra Zilles [abs] [pdf]
 
 
 

Compressed Sensing of Multi-Channel EEG Signals: The Simultaneous Cosparsity and Low Rank Optimization

 

Compressed Sensing of Multi-Channel EEG Signals: The Simultaneous Cosparsity and Low Rank Optimization by Yipeng Liu, Maarten De Vos, Sabine Van Huffel

Goal: This paper addresses two problems in compressed sensing of multi-channel EEG signals: some EEG signals have no good sparse representation, and single-channel processing is not computationally efficient. Methods: An optimization model with L0 norm and Schatten-0 norm is proposed to enforce cosparsity and low rank structures in the reconstructed multi-channel EEG signals. Both convex relaxation and global consensus optimization with the alternating direction method of multipliers are used to solve the optimization model. Results: The performance of multi-channel EEG signal reconstruction is improved in terms of both accuracy and computational complexity. Conclusion: The proposed method is a better candidate than previous sparse signal recovery methods for compressed sensing of EEG signals. Significance: The proposed method enables successful compressed sensing of EEG signals even when the signals have no good sparse representation. Using compressed sensing would greatly reduce the power consumption of wireless EEG systems.
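To make the model in the abstract a bit more concrete, a plausible convex-relaxed form of the simultaneous cosparsity / low-rank objective (an editorial sketch; the exact formulation, weights and constraints in the paper may differ) reads

$$ \min_{X \in \mathbb{R}^{n \times c}} \; \|\Omega X\|_{1} + \lambda \|X\|_{*} \quad \text{subject to} \quad \|Y - \Phi X\|_F \le \epsilon, $$

where the $c$ channels are stacked as the columns of $X$, $\Phi$ is the sensing matrix and $Y$ the compressed measurements, $\Omega$ is an analysis (cosparsifying) operator such as a second-order difference matrix, the $\ell_1$ norm relaxes the $\ell_0$ cosparsity penalty, and the nuclear norm $\|X\|_*$ relaxes the Schatten-0 (rank) penalty; global-consensus ADMM then splits the channel-wise and low-rank terms into parallel subproblems.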
 
 

Thursday, July 02, 2015

The Loss Surfaces of Multilayer Networks, Global Optimality in Tensor Factorization, Deep Learning, and Beyond, Generalization Bounds for Neural Networks through Tensor Factorization

Yesterday, Yann was talking at the Challenge in Optimization for Data Science workshop. One of his slides featured the distribution of losses across different neural network sizes (each color representing a specific network size; for each network size several randomly chosen networks were trained). The figure above was extracted from [1] but it was the one shown in the slide. The crux of the argument was that, by becoming larger, neural networks trained from initially random weights see a concentration of their final losses: once the networks are large enough, the non-convexity is not that bad because even the local minima are in fact not far from the global minimum. Roughly speaking, enlarging or lifting the phase space of hyperparameters allows the loss to converge toward zero while at the same time these losses become more and more concentrated. Understanding how we can train these networks is important because currently we have no real idea of how to streamline the process, nor a particular understanding of the sampling bounds. It turns out that I had a very similar discussion with the folks at Kalray, who were presenting their architectures to the NeuroSTIC people, who also happen to be holding their annual meeting just a few meters away.
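To make the concentration claim a bit more tangible, here is a toy sketch (an editorial illustration only, not the experiment behind the slide or the figures of [1]): train many one-hidden-layer networks of several widths from random initializations on the same synthetic regression task and compare the spread of their final training losses. All architecture, data and training choices below are arbitrary.

```python
import numpy as np

def train_mlp(X, y, width, seed, steps=2000, lr=0.05):
    """Train a one-hidden-layer tanh network with plain gradient descent;
    return the final mean-squared error on the training set."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(d, width))
    W2 = rng.normal(scale=1.0 / np.sqrt(width), size=(width, 1))
    for _ in range(steps):
        H = np.tanh(X @ W1)                 # hidden activations
        err = H @ W2 - y                    # residual of the network output
        g = 2.0 * err / len(X)              # gradient of the MSE w.r.t. the output
        gW2 = H.T @ g
        gW1 = X.T @ ((g @ W2.T) * (1.0 - H ** 2))
        W1 -= lr * gW1
        W2 -= lr * gW2
    return float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))

# Synthetic regression task generated by a random "teacher" network.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
teacher = rng.normal(size=(10, 1))
y = np.tanh(X @ teacher) + 0.1 * rng.normal(size=(200, 1))

for width in [2, 8, 32, 128]:
    losses = [train_mlp(X, y, width, seed) for seed in range(20)]
    print(f"width={width:4d}  mean final loss={np.mean(losses):.4f}  "
          f"spread (std)={np.std(losses):.4f}")
```

If the picture sketched above carries over to this toy setting, the spread of the final losses should shrink noticeably as the width grows.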

Coming back to this lifting argument, here are two papers that mention that paper ([1] The Loss Surfaces of Multilayer Networks by Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun), and both take the path of tensorization as a way to do the lifting: first there is [3] Generalization Bounds for Neural Networks through Tensor Factorization by Majid Janzamin, Hanie Sedghi, Anima Anandkumar, and then there is [2] Global Optimality in Tensor Factorization, Deep Learning, and Beyond by Benjamin D. Haeffele, Rene Vidal.

 From [3] one can read:  
Recently, Choromanska et al. (2015) analyze the loss surface of a multi-layer ReLU network by relating it to a spin glass system. They make several assumptions such as variable independence for the input, redundancy in network parameterization and so on. They show that under these assumptions, the lowest critical values of the random loss function form a layered structure, and the number of local minima outside that band diminishes exponentially with the network size.
Aside from the sampling bound they give on training, their approach is embarrassingly parallel, which has a big impact on the eventual hardware on which it is going to run. From [2] one can read:
Additionally, we have also shown that if the size of the network is allowed to be large enough then for any initialization a global minimizer can be found from purely local descent, and thus local minima are all equivalent. This is a similar conclusion to the work of Choromanska et al. (2014), who analyzed the problem from a statistical standpoint and showed that with sufficiently large networks and appropriate assumptions about the distributions of the data and network weights, then with high probability any family of networks learned with different initializations will have similar objective values, but we note that our results allow for a well defined set of conditions which will be sufficient to guarantee the property. Finally, many modern large scale networks do not use traditional regularization on the network weight parameters such as l1 or l2 norms during training and instead rely on alternative forms of regularization such as dropout as it tends to achieve better performance in practice (Srivastava et al., 2014; Krizhevsky et al., 2012; Wan et al., 2013). Given our commentary above regarding the critical importance of balancing the degree of homogeneity between the mapping and the regularizer, an immediate prediction of our analysis is that simply ensuring that the degrees of homogeneity are balanced could be a significant factor in improving the performance of deep networks.
I can't wait to see how this discussion on mapping and regularizer will turn out. Without further ado, here are the abstracts:
 [1] The Loss Surfaces of Multilayer Networks
Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun

We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the complexity of the fully decoupled neural network through the prism of the results from random matrix theory. We show that for large-size decoupled networks the lowest critical values of the random loss function form a layered structure and they are located in a well-defined band lower-bounded by the global minimum. The number of local minima outside that band diminishes exponentially with the size of the network. We empirically verify that the mathematical model exhibits similar behavior as the computer simulations, despite the presence of high dependencies in real networks. We conjecture that both simulated annealing and SGD converge to the band of low critical points, and that all critical points found there are local minima of high quality measured by the test error. This emphasizes a major difference between large- and small-size networks where for the latter poor quality local minima have non-zero probability of being recovered. Finally, we prove that recovering the global minimum becomes harder as the network size increases and that it is in practice irrelevant as global minimum often leads to overfitting.
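As a reminder of what that mapping looks like (an editorial note rather than a quote from [1]; normalization conventions vary), the Hamiltonian of the spherical p-spin glass model is

$$ H_{N,p}(\sigma) \;=\; \frac{1}{N^{(p-1)/2}} \sum_{i_1,\dots,i_p=1}^{N} X_{i_1,\dots,i_p}\,\sigma_{i_1}\cdots\sigma_{i_p}, \qquad \sigma \in S^{N-1}\!\big(\sqrt{N}\big), $$

with i.i.d. standard Gaussian couplings $X_{i_1,\dots,i_p}$; in the paper's reduction, $p$ plays the role of the network depth and $N$ that of its size, and the layered structure of critical values described in the abstract is a known property of this model.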
[2] Global Optimality in Tensor Factorization, Deep Learning, and Beyond
Benjamin D. Haeffele, Rene Vidal

Techniques involving factorization are found in a wide range of applications and have enjoyed significant empirical success in many fields. However, common to a vast majority of these problems is the significant disadvantage that the associated optimization problems are typically non-convex due to a multilinear form or other convexity destroying transformation. Here we build on ideas from convex relaxations of matrix factorizations and present a very general framework which allows for the analysis of a wide range of non-convex factorization problems - including matrix factorization, tensor factorization, and deep neural network training formulations. We derive sufficient conditions to guarantee that a local minimum of the non-convex optimization problem is a global minimum and show that if the size of the factorized variables is large enough then from any initialization it is possible to find a global minimizer using a purely local descent algorithm. Our framework also provides a partial theoretical justification for the increasingly common use of Rectified Linear Units (ReLUs) in deep neural networks and offers guidance on deep network architectures and regularization strategies to facilitate efficient optimization.
[3] Generalization Bounds for Neural Networks through Tensor Factorization
Majid Janzamin, Hanie Sedghi, Anima Anandkumar

Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for training a two-layer neural network. We prove efficient generalization bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for generalizability. Our training method is based on tensor decomposition, which provably converges to the global optimum, under a set of mild non-degeneracy conditions. It consists of simple embarrassingly parallel linear and multi-linear operations, and is competitive with standard stochastic gradient descent (SGD), in terms of computational complexity. Thus, for the first time, we have a computationally efficient method with guaranteed generalization bounds for training neural networks.
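To give a rough idea of the tensor object at play in [3] (paraphrased and hedged; see the paper for the exact statement and notation), the method exploits a cross-moment identity of the form

$$ \mathbb{E}\big[\,y \cdot \mathcal{S}_3(x)\,\big] \;=\; \sum_{j=1}^{k} \lambda_j \,(A_1)_j \otimes (A_1)_j \otimes (A_1)_j, $$

where $\mathcal{S}_3(x)$ is the third-order score function of the input distribution, $(A_1)_j$ are the columns of the first-layer weight matrix of the two-layer network, and the coefficients $\lambda_j$ absorb the second-layer weights and the nonlinearity. Decomposing the empirical version of this tensor recovers the first layer, and the remaining parameters are then estimated by standard means, which is where the embarrassingly parallel linear and multilinear operations mentioned in the abstract come from.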
 
 

Wednesday, July 01, 2015

Nuit Blanche in Review ( June 2015 )

Today, I will probably be at the Challenge in Optimization for Data Science workshop.

Since the last Nuit Blanche in Review ( May 2015 ) we certainly got closer to Pluto and Charon. We covered a few subjects ranging from 3rd generation genome sequencing (1), to hardware for machine learning (2), to a long series of posts on random projections (3-7), and more:

  1. Compressive phase transition, a million genomes and 3rd generation sequencing
  2. Hardware for Machine Learning
  3. Slides: Learning with Random Projections, Random Projections for Machine Learning and Data Mining: Theory & Applications
  4. Random Projections as Regularizers, Compressive Linear Least Squares Regression and more
     The Unreasonable Effectiveness of Random Projections in Computer Science
  5. Improved Bounds on the Dot Product under Random Projection and Random Sign Projection
     Random Features and random projections
  6. Random Maxout Features look like Quantized JL Embeddings
  7. Extreme Compressive Sampling for Covariance Estimation
From these, we shared two insights:

and quite a few implementations:
Some theses:

In-depth


Meetups:

Videos:


Conferences:


Videos:
Other:




Tuesday, June 30, 2015

Vowpal Wabbit: A Machine Learning System ( Paris Machine Learning Meetup "Hors Série" #4 Season 2 )


Following up on meetup #6 Season 2, John Langford just gave us a tutorial presentation on Vowpal Wabbit this morning in Paris (Thank you Christophe and AXA for hosting us). Here are the slides:
Here is some additional relevant background material.
Much like the example Leon Bottou gave us on counterfactual reasoning a year ago (see his slides: Learning to Interact), I very much liked the exploration bit for policy evaluation: if you don't explore you just don't know, and prediction errors are no substitute for controlled exploration.
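To illustrate the "if you don't explore you just don't know" point, here is a hedged numpy toy (not Vowpal Wabbit code): an epsilon-greedy logging policy records the propensity of each action it takes, and inverse-propensity scoring then lets you evaluate other policies offline from the same logs; with epsilon set to zero, the actions the logger never plays simply cannot be evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_rounds, eps = 4, 50_000, 0.1
true_reward = np.array([0.2, 0.5, 0.7, 0.4])   # hidden mean reward per action

# --- Logging phase: epsilon-greedy around a (possibly wrong) favorite action.
logged_action, logged_reward, logged_prop = [], [], []
favorite = 0
for _ in range(n_rounds):
    if rng.random() < eps:
        a = rng.integers(n_actions)
    else:
        a = favorite
    prop = eps / n_actions + (1 - eps) * (a == favorite)   # P(logger picks a)
    r = float(rng.random() < true_reward[a])               # Bernoulli reward
    logged_action.append(a); logged_reward.append(r); logged_prop.append(prop)

logged_action = np.array(logged_action)
logged_reward = np.array(logged_reward)
logged_prop = np.array(logged_prop)

# --- Offline evaluation of the deterministic policy "always play action a"
#     via inverse-propensity scoring on the exploration logs.
for a in range(n_actions):
    mask = logged_action == a
    ips_value = np.sum(logged_reward[mask] / logged_prop[mask]) / n_rounds
    print(f"action {a}: IPS estimate {ips_value:.3f}  (truth {true_reward[a]:.3f})")
```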


The following paper will be the subject of John's presentation at ICML in Lille next week:

Learning to Search Better than Your Teacher by Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford


 

Monday, June 29, 2015

Slides and Implementation: Random forests with random projections of the output space for high dimensional multi-label classification

 
 

The implementation of random output trees is available at: https://github.com/arjoly/random-output-trees
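As a rough illustration of the idea (a sketch built from scikit-learn pieces with a naive pseudo-inverse decoding step, not the API of the random-output-trees package linked above): project the binary label matrix onto a few random output dimensions, fit a forest on the compressed targets, then map predictions back to label space and threshold.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X, Y = make_multilabel_classification(n_samples=1000, n_features=50,
                                      n_classes=40, n_labels=3, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

# Gaussian random projection of the label space: 40 labels -> 10 output dims.
k = 10
G = rng.normal(size=(Y.shape[1], k)) / np.sqrt(k)
Z_tr = Y_tr @ G                         # compressed multi-label targets

# Fit a forest on the projected outputs, then decode with the pseudo-inverse.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, Z_tr)
Z_pred = forest.predict(X_te)
Y_score = Z_pred @ np.linalg.pinv(G)    # back to the original label space
Y_pred = (Y_score > 0.5).astype(int)

print("micro-F1:", f1_score(Y_te, Y_pred, average="micro"))
```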

 

Thesis: Learning in High Dimensions with Projected Linear Discriminants by Robert Durrant

 
The enormous power of modern computers has made possible the statistical modelling of data with dimensionality that would have made this task inconceivable only decades ago. However, experience in such modelling has made researchers aware of many issues associated with working in high-dimensional domains, collectively known as 'the curse of dimensionality', which can confound practitioners' desires to build good models of the world from these data. When the dimensionality is very large, low-dimensional methods and geometric intuition both break down in these high-dimensional spaces. To mitigate the dimensionality curse we can use low-dimensional representations of the original data that capture most of the information it contained. However, little is currently known about the effect of such dimensionality reduction on classifier performance. In this thesis we develop theory quantifying the effect of random projection (a recent, very promising, non-adaptive dimensionality reduction technique) on the classification performance of Fisher's Linear Discriminant (FLD), a successful and widely-used linear classifier. We tackle the issues associated with small sample size and high-dimensionality by using randomly projected FLD ensembles, and we develop theory explaining why our new approach performs well. Finally, we quantify the generalization error of Kernel FLD, a related non-linear projected classifier.
 
 

Sunday, June 28, 2015

Slides: Learning with Random Projections, Random Projections for Machine Learning and Data Mining: Theory & Applications

Following up on Friday's The Unreasonable Effectiveness of Random Projections in Computer Science, here are two series of slides:
 
Learning with Random Projections by Ata Kaban. The outline of the slides features the following subjects:
  • Introduction to compressive learning
  • Compressive classification
  • Compressive regression
  • Ensembles 
 
 
 

CfP: The 3rd International Workshop on High Dimensional Data Mining (HDM’15)

Here is an interesting workshop organized by Ata Kaban:
 

The 3rd International Workshop on High Dimensional Data Mining (HDM’15)
In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2015)
14 November 2015, Atlantic City, NJ, USA.


Description of Workshop

Over a decade ago, Stanford statistician David Donoho predicted that the 21st century will be the century of data. "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." -- D. Donoho, 2000.

Unprecedented technological advances lead to increasingly high dimensional data sets in all areas of science, engineering and businesses. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, web and market basket analysis, among many others. The number of features in such data is often of the order of thousands or millions - that is much larger than the available sample size.

For a number of reasons, classical data analysis methods are inadequate, questionable, or at best inefficient when faced with high dimensional data spaces:
 1. High dimensional geometry defeats our intuition rooted in low dimensional experiences, and this makes data presentation and visualisation particularly challenging.
 2. Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counter-intuitive for the data mining practitioner. For instance, distance concentration is the phenomenon that the contrast between pair-wise distances may vanish as the dimensionality increases (see the formulation sketched after this list).
 3. Bogus correlations and misleading estimates may result when trying to fit complex models for which the effective dimensionality is too large compared to the number of data points available.
 4. The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.
 5. The computational cost of processing high dimensional data or carrying out optimisation over high dimensional parameter spaces is often prohibitive.
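One standard way to formalize the distance-concentration phenomenon in point 2 (an editorial addition, in the spirit of the relative-contrast results of Beyer et al.) is

$$ \frac{\max_{j} d(q, x_j) - \min_{j} d(q, x_j)}{\min_{j} d(q, x_j)} \;\xrightarrow{\;P\;}\; 0 \quad \text{as the dimensionality } D \to \infty, $$

for a query point $q$ and sample points $x_1,\dots,x_n$ drawn under broad conditions on the data distribution: the nearest and farthest neighbours become nearly indistinguishable.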

Topics

This workshop aims to promote new advances and research directions to address the curses and uncover and exploit the blessings of high dimensionality in data mining. Topics of interest include all aspects of high dimensional data mining, including the following:
- Systematic studies of how the curse of dimensionality affects data mining methods
- Models of low intrinsic dimension: sparse representation, manifold models, latent structure models, large margin, other?
- How to exploit intrinsic dimension in optimisation tasks for data mining?
- New data mining techniques that scale with the intrinsic dimension, or exploit some properties of high dimensional data spaces
- Dimensionality reduction
- Methods of random projections, compressed sensing, and random matrix theory applied to high dimensional data mining and high dimensional optimisation
- Theoretical underpinning of mining data whose dimensionality is larger than the sample size
- Classification, regression, clustering, visualisation of high dimensional complex data sets
- Functional data mining
- Data presentation and visualisation methods for very high dimensional data sets
- Data mining applications to real problems in science, engineering or businesses where the data is high dimensional

Paper submission
High quality original submissions are solicited for oral and poster presentation at the workshop. Papers should not exceed a maximum of 8 pages, and must follow the IEEE ICDM format requirements of the main conference. All submissions will be peer-reviewed, and all accepted workshop papers will be published in the proceedings by the IEEE Computer Society Press. Submit your paper here.

Important dates
Submission deadline: July 20, 2015.
Notifications to authors: September 1, 2015.
Workshop date: November 13, 2015.

Program committee
Arthur Zimek – Ludwig-Maximilians-Universitaet, Muenchen, Germany
Ata Kaban – University of Birmingham, UK
Barbara Hammer – University of Bielefeld, Germany
Bob Durrant – Waikato University, NZ
John A. Lee – Universite Catholique de Louvain, Belgium
Mark Last – Ben-Gurion University of the Negev, Israel
Mehmed Kantardzic – University of Louisville, USA
Michael E. Houle – National Institute of Informatics, Japan
Milos Radovanovic – University of Novi Sad, Serbia
Nenad Tomasev – Google, Mountain View, CA, USA
Peter Tino – University of Birmingham, UK
Stephan Gunnemann – Carnegie Mellon University, USA
Udo Seiffert – Fraunhofer IFF Magdeburg & University of Magdeburg, Germany
Yiming Ying – SUNY, NY, USA

Workshop organisation & contact
School of Computer Science, University of Birmingham, UK
 
 

Friday, June 26, 2015

Random Projections as Regularizers, Compressive Linear Least Squares Regression and more

We've seen a few papers recently; here are a few more from the same authors:


Random Projections as Regularizers: Learning a Linear Discriminant from Fewer Observations than Dimensions by Robert Durrant, Ata Kabán 
We prove theoretical guarantees for an averaging-ensemble of randomly projected Fisher Linear Discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. The specific form and simplicity of this ensemble permits a direct and much more detailed analysis than existing generic tools in previous works. In particular, we are able to derive the exact form of the generalization error of our ensemble, conditional on the training set, and based on this we give theoretical guarantees which directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge these are the first theoretical results to prove such an explicit link for any classifier and classifier ensemble pair. Furthermore we show that the randomly projected ensemble is equivalent to implementing a sophisticated regularization scheme to the linear discriminant learned in the original data space and this prevents overfitting in conditions of small sample size where pseudo-inverse FLD learned in the data space is provably poor. Our ensemble is learned from a set of randomly projected representations of the original high dimensional data and therefore for this approach data can be collected, stored and processed in such a compressed form. We confirm our theoretical findings with experiments, and demonstrate the utility of our approach on several datasets from the bioinformatics domain and one very high dimensional dataset from the drug discovery domain, both settings in which fewer observations than dimensions are the norm.
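A minimal sketch of the flavour of this approach, using off-the-shelf scikit-learn components (an editorial illustration under arbitrary settings, not the authors' code or their exact combination rule): average the decisions of several linear discriminants, each learned on its own Gaussian random projection of data that has fewer observations than dimensions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.random_projection import GaussianRandomProjection
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Fewer observations than dimensions: n=100 samples, d=500 features.
X, y = make_classification(n_samples=100, n_features=500, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

n_members, k = 25, 20            # ensemble size and projected dimension
votes = np.zeros(len(X_te))
for m in range(n_members):
    proj = GaussianRandomProjection(n_components=k, random_state=m)
    Z_tr = proj.fit_transform(X_tr)              # random k-dim sketch of the data
    Z_te = proj.transform(X_te)
    lda = LinearDiscriminantAnalysis().fit(Z_tr, y_tr)
    votes += lda.predict_proba(Z_te)[:, 1]       # accumulate class-1 probability

y_pred = (votes / n_members > 0.5).astype(int)   # average-probability vote
print("RP-FLD ensemble accuracy:", accuracy_score(y_te, y_pred))
```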
New Bounds on Compressive Linear Least Squares Regression by Ata Kabán

In this paper we provide a new analysis of compressive least squares regression that removes a spurious log N factor from previous bounds, where N is the number of training points. Our new bound has a clear interpretation and reveals meaningful structural properties of the linear regression problem that makes it solvable effectively in a small dimensional random subspace. In addition, the main part of our analysis does not require the compressive matrix to have the Johnson-Lindenstrauss property, or the RIP property. Instead, we only require its entries to be drawn i.i.d. from a 0-mean symmetric distribution with finite first four moments.



In this paper, we present a new variant of EDA for high dimensional continuous optimisation, which extends a recently proposed random projections (RP) ensemble based approach by employing heavy tailed random matrices. In particular, we use random matrices with i.i.d. t-distributed entries. The use of t-distributions may look surprising in the context of random projections, however we show that the resulting ensemble covariance is enlarged when the degree of freedom parameter is lowered. Based on this observation, we develop an adaptive scheme to adjust this parameter during evolution, and this results in a flexible means of balancing exploration and exploitation of the search process. A comprehensive set of experiments on high dimensional benchmark functions demonstrate the usefulness of our approach.



Multivariate Cauchy EDA Optimisation by Momodou L. Sanyang, Ata Kabán




We consider Black-Box continuous optimization by Estimation of Distribution Algorithms (EDA). In continuous EDA, the multivariate Gaussian distribution is widely used as a search operator, and it has the well-known advantage of modelling the correlation structure of the search variables, which univariate EDA lacks. However, the Gaussian distribution as a search operator is prone to premature convergence when the population is far from the optimum. Recent work suggests that replacing the univariate Gaussian with a univariate Cauchy distribution in EDA holds promise in alleviating this problem because it is able to make larger jumps in the search space due to the Cauchy distribution's heavy tails. In this paper, we propose the use of a multivariate Cauchy distribution to blend together the advantages of multivariate modelling with the ability of escaping early convergence to efficiently explore the search space. Experiments on 16 benchmark functions demonstrate the superiority of multivariate Cauchy EDA against univariate Cauchy EDA, and its advantages against multivariate Gaussian EDA when the population lies far from the optimum

Two approaches of Using Heavy Tails in High dimensional EDA by Momodou L. Sanyang, Hanno Muehlbrandt, Ata Kabán

We consider the problem of high dimensional black-box optimisation via Estimation of Distribution Algorithms (EDA). The Gaussian distribution is commonly used as a search operator in most of the EDA methods. However there are indications in the literature that heavy tailed distributions may perform better due to their higher exploration capabilities. Univariate heavy tailed distributions were already proposed for high dimensional problems. In 2D problems it has been reported that a multivariate heavy tailed (such as Cauchy) search distribution is able to blend together the strengths of multivariate modelling with a high exploration power. In this paper, we study whether a similar scheme would work well in high dimensional search problems. To get around of the difficulty of multivariate model building in high dimensions we employ a recently proposed random projections (RP) ensemble based approach which we modify to get samples from a multivariate Cauchy using the scale-mixture representation of the Cauchy distribution. Our experiments show that the resulting RP-based multivariate Cauchy EDA consistently improves on the performance of the univariate Cauchy search distribution. However, intriguingly, the RP-based multivariate Gaussian EDA has the best performance among these methods. It appears that the highly explorative nature of the multivariate Cauchy sampling is exacerbated in high dimensional search spaces and the population based search loses its focus and effectiveness as a result. Finally, we present an idea to increase exploration while maintaining exploitation and focus by using the RP-based multivariate Gaussian EDA in which the RP matrices are drawn with i.i.d. heavy tailed entries. This achieves improved performance and is competitive with the state of the art.

How effective is Cauchy-EDA in high dimensions? by Momodou L. Sanyang and Ata Kabán

We consider the problem of high dimensional black-box optimisation via Estimation of Distribution Algorithms (EDA). The Gaussian distribution is commonly used as a search operator in most of the EDA methods. However there are indications in the literature that heavy tailed distributions may perform better due to their higher exploration capabilities. Univariate heavy tailed distributions were already proposed for high dimensional problems. In 2D problems it has been reported that a multivariate heavy tailed (such as Cauchy) search distribution is able to blend together the strengths of multivariate modelling with a high exploration power. In this paper, we study whether a similar scheme would work well in high dimensional search problems. To get around of the difficulty of multivariate model building in high dimensions we employ a recently proposed random projections (RP) ensemble based approach which we modify to get samples from a multivariate Cauchy using the scale-mixture representation of the Cauchy distribution. Our experiments show that the resulting RP-based multivariate Cauchy EDA consistently improves on the performance of the univariate Cauchy search distribution. However, intriguingly, the RP-based multivariate Gaussian EDA has the best performance among these methods. It appears that the highly explorative nature of the multivariate Cauchy sampling is exacerbated in high dimensional search spaces and the population based search loses its focus and effectiveness as a result. Finally, we present an idea to increase exploration while maintaining exploitation and focus by using the RP-based multivariate Gaussian EDA in which the RP matrices are drawn with i.i.d. heavy tailed entries. This achieves improved performance and is competitive with the state of the art.
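Since several of the abstracts above contrast Gaussian and heavy-tailed search distributions, here is a toy numpy sketch of a plain full-covariance EDA loop on the sphere function, comparing Gaussian sampling with multivariate Cauchy sampling obtained through the scale-mixture representation mentioned above (an editorial illustration of the mechanics only; it does not implement the papers' RP-ensemble machinery and all settings are arbitrary).

```python
import numpy as np

def eda(sample_fn, dim=10, pop=200, elite=40, iters=60, seed=0):
    """Plain EDA: fit mean/covariance on the elite set, resample, repeat."""
    rng = np.random.default_rng(seed)
    mean = np.full(dim, 5.0)              # start far from the optimum at 0
    cov = np.eye(dim)
    for _ in range(iters):
        pop_x = sample_fn(rng, mean, cov, pop)
        fitness = np.sum(pop_x ** 2, axis=1)          # sphere function
        elite_x = pop_x[np.argsort(fitness)[:elite]]  # keep the best candidates
        mean = elite_x.mean(axis=0)
        cov = np.cov(elite_x, rowvar=False) + 1e-6 * np.eye(dim)
    return np.sum(mean ** 2)

def gaussian_sample(rng, mean, cov, n):
    return rng.multivariate_normal(mean, cov, size=n)

def cauchy_sample(rng, mean, cov, n):
    # Multivariate Cauchy via its Gaussian scale-mixture representation:
    # x = mean + z / sqrt(u), with z ~ N(0, cov) and u ~ chi^2 with 1 d.o.f.
    z = rng.multivariate_normal(np.zeros(len(mean)), cov, size=n)
    u = rng.chisquare(df=1, size=(n, 1))
    return mean + z / np.sqrt(u)

print("Gaussian EDA, final f(mean):", eda(gaussian_sample))
print("Cauchy   EDA, final f(mean):", eda(cauchy_sample))
```

As the first abstract above notes, lowering the degrees of freedom of a t-distributed sampler (the Cauchy case being one degree of freedom) enlarges the effective spread of the samples, which is the exploration/exploitation knob that the adaptive scheme described there adjusts during evolution.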
 
 
