Thursday, November 06, 2014

Zero SR1 quasi-Newton method - implementation -

From Stephen Becker's webpage, here it is:

Zero SR1 quasi-Newton method

Background

The zeroSR1 package is based on a proximal quasi-Newton algorithm to solve
 \min_{x} \; f(x) + h(x)
where f is a smooth convex function and h is a (possibly non-smooth, and possibly infinite-valued) convex function such that
  • the proximity operator \text{prox}_{h}(y) = \text{argmin}_x \; h(x) + \frac{1}{2}\|x-y\|_2^2 is easy to compute,
  • the proximity operator is separable in the components of the variable, and
  • the proximity operator is piecewise linear (as illustrated in the sketch below).
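As a toy illustration (mine, not from Becker's page): for h(x) = \lambda\|x\|_1 the prox is the classic soft-thresholding operator, which visibly has all three properties.

```python
import numpy as np

def prox_l1(y, lam=1.0):
    """Prox of h(x) = lam * ||x||_1, i.e. soft-thresholding.

    It is cheap to compute, acts on each component independently
    (separable), and each component map y_i -> x_i is piecewise linear.
    """
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Components with |y_i| <= lam are set exactly to zero:
print(prox_l1(np.array([-2.0, -0.5, 0.3, 1.7])))  # approx [-1., -0., 0., 0.7]
```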
Exploiting the nature of h, we show in 'A quasi-Newton proximal splitting method' (Becker, Fadili; NIPS 2012) that one can also compute the proximity operator of h in a scaled norm:
  \text{prox}_h^V(y) = \text{argmin}_x \; h(x) + \frac{1}{2}\|x-y\|_V^2, \quad V = D + \sigma uu^T
where D is a diagonal matrix, u is a vector so that uu^T is a rank-1 matrix, and \sigma is \pm 1.
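To make the rank-1 structure concrete, here is a minimal sketch (my own illustration in Python/NumPy, not the package's code) of the scaled prox for h = \lambda\|\cdot\|_1. The optimality condition reduces the n-dimensional problem to finding the root of a monotone scalar function g(\alpha); plain Brent root-finding is used here for clarity, whereas the paper's method sorts the breakpoints of this piecewise-linear g.

```python
import numpy as np
from scipy.optimize import brentq

def prox_l1_diag(y, d, lam):
    # prox of lam*||x||_1 in the metric D = diag(d) > 0: componentwise
    # soft-thresholding with per-coordinate threshold lam / d_i.
    return np.sign(y) * np.maximum(np.abs(y) - lam / d, 0.0)

def prox_l1_rank1(y, d, u, sigma, lam):
    """Sketch of prox_h^V(y) for h = lam*||.||_1, V = diag(d) + sigma*u u^T.

    From the optimality condition, x = prox^D(y - sigma*alpha*u/d) where
    alpha = u^T (x - y); the mismatch g(alpha) below is strictly decreasing
    (assuming V is positive definite, i.e. 1 + sigma*np.sum(u**2 / d) > 0),
    so a scalar root-find recovers alpha.
    """
    def g(alpha):
        x = prox_l1_diag(y - sigma * alpha * u / d, d, lam)
        return u @ (x - y) - alpha

    lo, hi = -1.0, 1.0           # expand a bracket until g changes sign
    while g(lo) < 0.0:
        lo *= 2.0
    while g(hi) > 0.0:
        hi *= 2.0
    alpha = brentq(g, lo, hi)
    return prox_l1_diag(y - sigma * alpha * u / d, d, lam)
```

Since g is piecewise linear with at most two breakpoints per coordinate, sorting them once locates the root exactly, which is where the O(n log n) complexity quoted below comes from.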
Because we can efficiently solve the scaled prox, a quasi-Newton method becomes possible. The SR1 update is a rank-1 update, and by using a 0-memory version, the updates to the inverse Hessian are exactly of the form D + \sigma uu^T.
This means that for the same cost as a proximal gradient method (or an accelerated one, like FISTA), we can incorporate second-order information, and the method converges very quickly.
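For concreteness, here is one plausible way (a hedged sketch; the scaling and safeguards in the actual zeroSR1 package may differ) to build the 0-memory SR1 inverse-Hessian approximation from a single step/gradient-change pair. Starting from a diagonal H_0 = \tau I, one SR1 correction yields exactly a diagonal-plus-rank-1 matrix:

```python
import numpy as np

def zero_memory_sr1(s, yk, r=1e-8):
    """0-memory SR1 inverse-Hessian approximation H = diag(d) + sigma*u u^T.

    s  : x_k - x_{k-1}                     (last step)
    yk : grad f(x_k) - grad f(x_{k-1})     (gradient change)

    H0 = tau*I uses a Barzilai-Borwein-type scaling (one common choice;
    assumes s @ yk > 0).  The SR1 correction is
    H = H0 + v v^T / (v^T yk) with v = s - H0 @ yk.
    """
    tau = (s @ yk) / (yk @ yk)
    d = np.full_like(s, tau)            # diagonal part, stored as a vector
    v = s - tau * yk
    denom = v @ yk
    if abs(denom) <= r * np.linalg.norm(v) * np.linalg.norm(yk):
        # standard SR1 safeguard: drop the rank-1 term when ill-defined
        return d, np.zeros_like(s), 0
    sigma = 1 if denom > 0 else -1
    u = v / np.sqrt(abs(denom))         # so sigma * u u^T = v v^T / denom
    return d, u, sigma

def apply_H(g, d, u, sigma):
    # inverse-Hessian-vector product, O(n) thanks to the structure
    return d * g + sigma * (u @ g) * u
```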

Types of non-smooth terms we can handle

The non-smooth term h can be infinite-valued; for example, it may be an indicator function of a set. The indicator function of a set C is denoted
 \iota_C(x) = \begin{cases} 0 & x \in C \\ +\infty & x \notin C \end{cases}
Equivalently, we are enforcing the constraint x in C in the optimization problem.
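In that case the prox is simply the Euclidean projection onto C. For instance (an illustration of the non-negativity entry in the table below):

```python
import numpy as np

def prox_nonneg(y):
    # prox of the indicator of C_+ = {x : x >= 0} is the projection onto
    # C_+: argmin_{x >= 0} (1/2)*||x - y||^2, solved by clipping at zero.
    return np.maximum(y, 0.0)
```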
We can solve the scaled prox of the following h in \mathcal{O}(n \log n) time (compared to \mathcal{O}(n) time for the regular prox) for inputs of dimension n:
  • \ell_1 norm: h(x) = \lVert x \rVert_1 = \sum_{i=1}^n \lvert x_i \rvert
  • non-negativity constraints: h(x) = \iota_{C_+}, \; C_+ = \{ x \mid x \ge 0 \}
  • \ell_1 norm and non-negativity: h(x) = \lVert x \rVert_1 + \iota_{C_+}
  • box constraints: h(x) = \iota_{C_{\text{box}}}, \; C_{\text{box}} = \{ x \mid \ell \le x \le u \}
  • \ell_\infty norm ball: h(x) = \iota_{C_\infty}, \; C_\infty = \{ x \mid \lVert x \rVert_\infty \le 1 \}
  • hinge loss: h(x) = \sum_{i=1}^n \max(0, 1 - x_i)
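For reference, the plain (unscaled) proxes of the remaining entries are all simple componentwise formulas; the sketch below (my illustration, with a weight lam that the list above takes to be 1) spells them out:

```python
import numpy as np

def prox_l1_nonneg(y, lam=1.0):
    # prox of lam*||x||_1 + indicator{x >= 0}: shrink, then clip at zero
    return np.maximum(y - lam, 0.0)

def prox_box(y, lo, up):
    # prox of the indicator of {x : lo <= x <= up}: projection by clipping
    return np.clip(y, lo, up)

def prox_linf_ball(y):
    # prox of the indicator of {x : ||x||_inf <= 1}: clip to [-1, 1]
    return np.clip(y, -1.0, 1.0)

def prox_hinge(y, lam=1.0):
    # prox of lam * sum_i max(0, 1 - x_i), componentwise:
    # y_i > 1 -> unchanged; y_i < 1 - lam -> y_i + lam; otherwise -> 1
    return np.where(y > 1.0, y, np.where(y < 1.0 - lam, y + lam, 1.0))
```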

Code

We have put a Matlab/Octave implementation on GitHub, under the BSD 3-clause license. If you are interested in contributing a version in Python or R, we will be glad to assist.
 
 
