## Friday, September 30, 2016

### CfP: ESANN2017 Special sessions, 26 - 28 April 2017

Matthieu just sent me the following:

Dear Igor,

I hope you are doing well. I am writing you because I would really appreciate if you could share some information which should interest your blog readers.

Gilles Delmaire, Gilles Roussel, and myself are organizing a special session on Environmental Signal Processing for the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN https://www.elen.ucl.ac.be/esann/) but it would be fairer to promote all the special sessions (https://www.elen.ucl.ac.be/esann/index.php?pg=specsess), as all of them may interest the Nuit Blanche readers. I think it will really be worth attending ESANN in April 2017, in the magnificent city of Brugge!
All the special session organizers sent submission invitations to potential participants but, at least in our case, if we forget a Nuit Blanche reader who wish to submit a paper in our session, we would appreciate to be firstly contacted.
Best regards,
Matthieu
-- Matthieu PUIGT, Ph.D.

Thanks Matthieu ! according to the main page, prospective authors are invited to submit their contributions before 19 November 2016.

The following special sessions will be organized at ESANN2017:

Photo Copyright ESA/Rosetta/MPS for OSIRIS Team MPS/UPD/LAM/IAA/SSO/INTA/UPM/DASP/IDA

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### L1-PCA, Online Sparse PCA and Discretization and Minimization of the L1 Norm on Manifolds

Coming back to some of the themes around Matrix Factorizations, the L1 norm and phase transitions:

Iteratively Reweighted Least Squares Algorithms for L1-Norm Principal Component Analysis by Young Woong Park, Diego Klabjan
Principal component analysis (PCA) is often used to reduce the dimension of data by selecting a few orthonormal vectors that explain most of the variance structure of the data. L1 PCA uses the L1 norm to measure error, whereas the conventional PCA uses the L2 norm. For the L1 PCA problem minimizing the fitting error of the reconstructed data, we propose an exact reweighted and an approximate algorithm based on iteratively reweighted least squares. We provide convergence analyses, and compare their performance against benchmark algorithms in the literature. The computational experiment shows that the proposed algorithms consistently perform best.

Online Learning for Sparse PCA in High Dimensions: Exact Dynamics and Phase Transitions by Chuang Wang, Yue M. Lu

We study the dynamics of an online algorithm for learning a sparse leading eigenvector from samples generated from a spiked covariance model. This algorithm combines the classical Oja's method for online PCA with an element-wise nonlinearity at each iteration to promote sparsity. In the high-dimensional limit, the joint empirical measure of the underlying sparse eigenvector and its estimate provided by the algorithm is shown to converge weakly to a deterministic, measure-valued process. This scaling limit is characterized as the unique solution of a nonlinear PDE, and it provides exact information regarding the asymptotic performance of the algorithm. For example, performance metrics such as the cosine similarity and the misclassification rate in sparse support recovery can be obtained by examining the limiting dynamics. A steady-state analysis of the nonlinear PDE also reveals an interesting phase transition phenomenon. Although our analysis is asymptotic in nature, numerical simulations show that the theoretical predictions are accurate for moderate signal dimensions.

Consistent Discretization and Minimization of the L1 Norm on Manifolds by Alex Bronstein, Yoni Choukroun, Ron Kimmel, Matan Sela

The L1 norm has been tremendously popular in signal and image processing in the past two decades due to its sparsity-promoting properties. More recently, its generalization to non-Euclidean domains has been found useful in shape analysis applications. For example, in conjunction with the minimization of the Dirichlet energy, it was shown to produce a compactly supported quasi-harmonic orthonormal basis, dubbed as compressed manifold modes. The continuous L1 norm on the manifold is often replaced by the vector l1 norm applied to sampled functions. We show that such an approach is incorrect in the sense that it does not consistently discretize the continuous norm and warn against its sensitivity to the specific sampling. We propose two alternative discretizations resulting in an iteratively-reweighed l2 norm. We demonstrate the proposed strategy on the compressed modes problem, which reduces to a sequence of simple eigendecomposition problems not requiring non-convex optimization on Stiefel manifolds and producing more stable and accurate results.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Thursday, September 29, 2016

### A Randomized Tensor Singular Value Decomposition based on the t-product

A Randomized Tensor Singular Value Decomposition based on the t-product by Jiani Zhang, Arvind K. Saibaba, Misha Kilmer, Shuchin Aeron

The tensor Singular Value Decomposition (t-SVD) for third order tensors that was proposed by Kilmer and Martin~\cite{2011kilmer} has been applied successfully in many fields, such as computed tomography, facial recognition, and video completion. In this paper, we propose a method that extends a well-known randomized matrix method to the t-SVD. This method can produce a factorization with similar properties to the t-SVD, but is more computationally efficient on very large datasets. We present details of the algorithm, theoretical results, and provide numerical results that show the promise of our approach for compressing and analyzing datasets. We also present an improved analysis of the randomized subspace iteration for matrices, which may be of independent interest to the scientific community.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Coherence Pursuit: Fast, Simple, and Robust Principal Component Analysis

Interesting that such a simple algorithm could do the job:

Coherence Pursuit: Fast, Simple, and Robust Principal Component Analysis by Mostafa Rahmani, George Atia

This paper presents a remarkably simple, yet powerful, algorithm for robust Principal Component Analysis (PCA). In the proposed approach, an outlier is set apart from an inlier by comparing their coherence with the rest of the data points. As inliers lie on a low dimensional subspace, they are likely to have strong mutual coherence provided there are enough inliers. By contrast, outliers do not typically admit low dimensional structures, wherefore an outlier is unlikely to bear strong resemblance with a large number of data points. The mutual coherences are computed by forming the Gram matrix of normalized data points. Subsequently, the subspace is recovered from the span of a small subset of the data points that exhibit strong coherence with the rest of the data. As coherence pursuit only involves one simple matrix multiplication, it is significantly faster than the state of-the-art robust PCA algorithms. We provide a mathematical analysis of the proposed algorithm under a random model for the distribution of the inliers and outliers. It is shown that the proposed method can recover the correct subspace even if the data is predominantly outliers. To the best of our knowledge, this is the first provable robust PCA algorithm that is simultaneously non-iterative, can tolerate a large number of outliers and is robust to linearly dependent outliers

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Wednesday, September 28, 2016

### Large-Scale Strategic Games and Adversarial Machine Learning

Estimating the impact of random projections / Random Features on adversarial Machine Learning, I love it.

From the paper:

Specifically, nonzero-sum large-scale strategic games with high-dimensional continuous decision spaces and random projection methods are investigated as a starting point. Our investigation centers around the reduction of large-scale strategic games using transformations such as random projections and their effect on Nash Equilibrium solutions. Analytically tractable results are presented for quadratic games and in an adversarial machine learning setting.

Large-Scale Strategic Games and Adversarial Machine Learning by Tansu Alpcan, Benjamin I. P. Rubinstein, Christopher Leckie

Decision making in modern large-scale and complex systems such as communication networks, smart electricity grids, and cyber-physical systems motivate novel game-theoretic approaches. This paper investigates big strategic (non-cooperative) games where a finite number of individual players each have a large number of continuous decision variables and input data points. Such high-dimensional decision spaces and big data sets lead to computational challenges, relating to efforts in non-linear optimization scaling up to large systems of variables. In addition to these computational challenges, real-world players often have limited information about their preference parameters due to the prohibitive cost of identifying them or due to operating in dynamic online settings. The challenge of limited information is exacerbated in high dimensions and big data sets. Motivated by both computational and information limitations that constrain the direct solution of big strategic games, our investigation centers around reductions using linear transformations such as random projection methods and their effect on Nash equilibrium solutions. Specific analytical results are presented for quadratic games and approximations. In addition, an adversarial learning game is presented where random projection and sampling schemes are investigated.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Tuesday, September 27, 2016

### LightOn, Forward We Go.

I officially mentioned LightOn back in June (LightOn. And so it begins.). Since then, we've been busy: talking to potential investors, getting some hardware up and running and then some. Here are some of the elements we put out on our press section:

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

So in previous times, people would talk about upsamling or superresolution ( see this eight year old blog entry entitled CS: Very High Speed Incoherent Projections for Superresolution, and a conference. on how it is done for HDTV) , in these times of the great convergence, CNNs come to the rescue (the second prepritn gives an explanation of one of the step of the first paper that was presented at CVPR:

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network by Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang

Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.

Is the deconvolution layer the same as a convolutional layer? by Wenzhe Shi, Jose Caballero, Lucas Theis, Ferenc Huszar, Andrew Aitken, Christian Ledig, Zehan Wang

In this note, we want to focus on aspects related to two questions most people asked us at CVPR about the network we presented. Firstly, What is the relationship between our proposed layer and the deconvolution layer? And secondly, why are convolutions in low-resolution (LR) space a better choice? These are key questions we tried to answer in the paper, but we were not able to go into as much depth and clarity as we would have liked in the space allowance. To better answer these questions in this note, we first discuss the relationships between the deconvolution layer in the forms of the transposed convolution layer, the sub-pixel convolutional layer and our efficient sub-pixel convolutional layer. We will refer to our efficient sub-pixel convolutional layer as a convolutional layer in LR space to distinguish it from the common sub-pixel convolutional layer. We will then show that for a fixed computational budget and complexity, a network with convolutions exclusively in LR space has more representation power at the same speed than a network that first upsamples the input in high resolution space.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Monday, September 26, 2016

### Book: Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto is here.

The 455 page draft of the second of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto is here. Here is the table of context:
Preface to the First Edition ix
Preface to the Second Edition xiii
Summary of Notation xvii
1 The Reinforcement Learning Problem ...1
1.1 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . .1
1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4
1.3 Elements of Reinforcement Learning . . . . . . . . . . . . . . . . . . .6
1.4 Limitations and Scope . . . . . . . . . . . . . . . . . . . . . . . . . . .7
1.5 An Extended Example: Tic-Tac-Toe . . . . . . . . . . . . . . . . . . . 10
1.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.7 History of Reinforcement Learning . . . . . . . . . . . . . . . . . . . . 15
1.8 Bibliographical Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 23
I Tabular Solution Methods....25
2 Multi-arm Bandits...27
2.1 A k-Armed Bandit Problem . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2 Action-Value Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3 Incremental Implementation . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Tracking a Nonstationary Problem . . . . . . . . . . . . . . . . . . . . 34
2.5 Optimistic Initial Values . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.6 Upper-Con dence-Bound Action Selection . . . . . . . . . . . . . . . . 37
2.7 Gradient Bandit Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 38
2.8 Associative Search (Contextual Bandits) . . . . . . . . . . . . . . . . . 42
2.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
iii
iv
CONTENTS
3 Finite Markov Decision Processes...47
3.1 The Agent{Environment Interface . . . . . . . . . . . . . . . . . . . . 47
3.2 Goals and Rewards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3 Returns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.4 Unified Notation for Episodic and Continuing Tasks . . . . . . . . . . 54
3.5 The Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.6 Markov Decision Processes . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.7 Value Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.8 Optimal Value Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.9 Optimality and Approximation . . . . . . . . . . . . . . . . . . . . . . 72
3.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4 Dynamic Programming....79
4.1 Policy Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2 Policy Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 Policy Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.4 Value Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.5 Asynchronous Dynamic Programming . . . . . . . . . . . . . . . . . . 91
4.6 Generalized Policy Iteration . . . . . . . . . . . . . . . . . . . . . . . . 93
4.7 E ciency of Dynamic Programming . . . . . . . . . . . . . . . . . . . 94
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5 Monte Carlo Methods....99
5.1 Monte Carlo Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2 Monte Carlo Estimation of Action Values . . . . . . . . . . . . . . . . 104
5.3 Monte Carlo Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.4 Monte Carlo Control without Exploring Starts . . . . . . . . . . . . . 108
5.5 O -policy Prediction via Importance Sampling . . . . . . . . . . . . . 111
5.6 Incremental Implementation . . . . . . . . . . . . . . . . . . . . . . . . 116
5.7 O -Policy Monte Carlo Control . . . . . . . . . . . . . . . . . . . . . . 118

5.8 Return-Specific Importance Sampling . . . . . . . . . . . . . . . . . . 120
5.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6 Temporal-Difference Learning...127
6.1 TD Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2 Advantages of TD Prediction Methods . . . . . . . . . . . . . . . . . . 131
6.3 Optimality of TD(0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.4 Sarsa: On-Policy TD Control . . . . . . . . . . . . . . . . . . . . . . . 137
CONTENTS
v
6.5 Q-learning: O -Policy TD Control . . . . . . . . . . . . . . . . . . . . 140
6.6 Expected Sarsa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.7 Maximization Bias and Double Learning . . . . . . . . . . . . . . . . . 143
6.8 Games, Afterstates, and Other Special Cases . . . . . . . . . . . . . . 145
6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
7 Multi-step Bootstrapping.....151
7.1 n-step TD Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.2 n-step Sarsa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.3 n-step O -policy Learning by Importance Sampling . . . . . . . . . . 158
7.4 O -policy Learning Without Importance Sampling:
The n-step Tree Backup Algorithm . . . . . . . . . . . . . . . . . . . . 160
7.5 A Unifying Algorithm: n-stepQ() . . . . . . . . . . . . . . . . . . . . 162
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8 Planning and Learning with Tabular Methods.....167
8.1 Models and Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.2 Dyna: Integrating Planning, Acting, and Learning . . . . . . . . . . . 169
8.3 When the Model Is Wrong . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.4 Prioritized Sweeping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.5 Planning as Part of Action Selection . . . . . . . . . . . . . . . . . . . 180
8.6 Heuristic Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
8.7 Monte Carlo Tree Search . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
II Approximate Solution Methods......189
9 On-policy Prediction with Approximation.....191
9.1 Value-function Approximation . . . . . . . . . . . . . . . . . . . . . . . 191
9.2 The Prediction Objective (MSVE) . . . . . . . . . . . . . . . . . . . . 192
9.3 Stochastic-gradient and Semi-gradient Methods . . . . . . . . . . . . . 194
9.4 Linear Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
9.5 Feature Construction for Linear Methods . . . . . . . . . . . . . . . . 203
9.5.1 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
9.5.2 Fourier Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
9.5.3 Coarse Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
9.5.4 Tile Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
9.5.5 Radial Basis Functions . . . . . . . . . . . . . . . . . . . . . . . 215
vi
CONTENTS
9.6 Nonlinear Function Approximation: Arti cial Neural Networks . . . . 216
9.7 Least-Squares TD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
9.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
10 On-policy Control with Approximation....229
10.1 Episodic Semi-gradient Control . . . . . . . . . . . . . . . . . . . . . . 229
10.2 n-step Semi-gradient Sarsa . . . . . . . . . . . . . . . . . . . . . . . . 232
10.3 Average Reward: A New Problem Setting for Continuing Tasks . . . . 234
10.4 Deprecating the Discounted Setting . . . . . . . . . . . . . . . . . . . . 238
10.5 n-step Differential Semi-gradient Sarsa . . . . . . . . . . . . . . . . . . 239
10.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
11 O -policy Methods with Approximation....243
11.1 Semi-gradient Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
11.2 Baird's Counterexample . . . . . . . . . . . . . . . . . . . . . . . . . . 245
11.3 The Deadly Triad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
12 Eligibility Traces.....251
12.1 The-return . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
12.2 TD() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
12.3 An On-line Forward View . . . . . . . . . . . . . . . . . . . . . . . . . 259
12.4 True Online TD() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
12.5 Dutch Traces in Monte Carlo Learning . . . . . . . . . . . . . . . . . . 263
13.1 Policy Approximation and its Advantages . . . . . . . . . . . . . . . . 266
13.2 The Policy Gradient Theorem . . . . . . . . . . . . . . . . . . . . . . . 268
13.3 REINFORCE: Monte Carlo Policy Gradient . . . . . . . . . . . . . . . 270
13.4 REINFORCE with Baseline . . . . . . . . . . . . . . . . . . . . . . . . 272
13.5 Actor-Critic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
13.6 Policy Gradient for Continuing Problems (Average Reward Rate) . . . 275
13.7 Policy Parameterization for Continuous Actions . . . . . . . . . . . . . 278
III Looking Deeper...280
14 Psychology  281
14.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
14.2 Prediction and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
CONTENTS
vii
14.3 Classical Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
14.3.1 The Rescorla-Wagner Model . . . . . . . . . . . . . . . . . . . 289
14.3.2 The TD Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
14.3.3 TD Model Simulations . . . . . . . . . . . . . . . . . . . . . . . 292
14.4 Instrumental Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . 301
14.5 Delayed Reinforcement . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
14.6 Cognitive Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
14.7 Habitual and Goal-Directed Behavior . . . . . . . . . . . . . . . . . . . 309
14.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
14.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
14.10Bibliographical and Historical Remarks . . . . . . . . . . . . . . . . . 315
15 Neuroscience....319
15.1 Neuroscience Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
15.2 Reward Signals, Reinforcement Signals, Values, and Prediction Errors 322
15.3 The Reward Prediction Error Hypothesis . . . . . . . . . . . . . . . . 324
15.4 Dopamine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
15.5 Experimental Support for the Reward Prediction Error Hypothesis . . 329
15.6 TD Error/Dopamine Correspondence . . . . . . . . . . . . . . . . . . . 332
15.7 Neural Actor-Critic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
15.8 Actor and Critic Learning Rules . . . . . . . . . . . . . . . . . . . . . 342
15.9 Hedonistic Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
15.10Collective Reinforcement Learning . . . . . . . . . . . . . . . . . . . . 348
15.11Model-Based Methods in the Brain . . . . . . . . . . . . . . . . . . . . 351
15.12Addiction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
15.13Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
15.14Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
15.15Bibliographical and Historical Remarks . . . . . . . . . . . . . . . . . 357
16 Applications and Case Studies....365
16.1 TD-Gammon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
16.2 Samuel's Checkers Player . . . . . . . . . . . . . . . . . . . . . . . . . 370
16.3 The Acrobot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
16.4 Watson's Daily-Double Wagering . . . . . . . . . . . . . . . . . . . . 376
16.5 Optimizing Memory Control . . . . . . . . . . . . . . . . . . . . . . . . 379
16.6 Human-Level Video Game Play . . . . . . . . . . . . . . . . . . . . . . 384
16.7 Mastering the Game of Go . . . . . . . . . . . . . . . . . . . . . . . . . 389
viii
CONTENTS
16.8 Personalized Web Services . . . . . . . . . . . . . . . . . . . . . . . . . 396
16.9 Thermal Soaring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
17 Frontiers....403
17.1 The Unified View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
References....407

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Sunday, September 25, 2016

### Sunday Morning Video: Bay Area Deep Learning School Day 2 Live streaming

The Bay Area Deep Learning School Day 2 starts streaming in 30 minutes at 9:00AM PST / 12PM EST / 5:00PM London time / 6:00PM Paris time and it's all here. The whole schedule is here. Yesterday's video of day 1 has already garnered 16,000 views.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Sunday Morning Videos: HORSE2016, On “Horses” and “Potemkin Villages” in Applied Machine Learning

While waiting for day 2 of the Bay Area Deep Learning school in three hours, here are the ten videos of presentations made at the HORSE2016 workshop (On “Horses” and “Potemkin Villages” in Applied Machine Learning) organized by Bob Sturm. Bob came to present something around that theme at the 9th meetup of Season 3 of the Paris Machine Learning meetup. With this workshop, the field and attendant issues are becoming more visible: this is outstanding! as this has bearing on algorithm bias and explainability.  To make the video-watching more targeted, Bob even included commentaries with embedded videos in his blog post, here is the beginning of the whole blog entry, you should go there:

September 19, 2016 saw the successful premier edition of HORSE2016: On “Horses” and “Potemkin Villages” in Applied Machine Learning. I have now uploaded videos to the HORSE2016 YouTube channel, and posted slides to the HORSE2016 webpage. I embed the videos below with some commentary.
HORSE2016 had 10 speakers expound on a variety of interesting topics, and about 60 people in the audience. I am extremely pleased that the audience included several people from outside academia, including industry, government employees and artists. This shows how many have recognised the extent to which machine learning and artificial intelligence are impacting our daily lives. The issues explored at HORSE2016 are essential to ensuring this impact remains beneficial and not detrimental.
Here is my introductory presentation, “On Horse Taxonomy and Taxidermy”. This talk is all about “horses” in applied machine learning: what are they? Why is this important and relevant today? Why the metaphor, and why is it appropriate? I present an example “horse,” uncovered using an intervention experiment and a generation experiment. Finally, I discuss what a researcher should do if someone demonstrates their system is a “horse”.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.