Monday, February 25, 2013

Hessian Matrix



Learning!
I just love peeling back the onion. Now that I have the classic gradient descent working, what is the next step? 
BFGS apparently. http://en.wikipedia.org/wiki/BFGS_method
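As a quick sketch of what BFGS looks like in practice (this assumes SciPy is available; it's not code from my own project), here's the Rosenbrock test function minimized with SciPy's built-in BFGS:

```python
# Hedged example: minimizing the Rosenbrock function with BFGS via SciPy.
# rosen / rosen_der are SciPy's stock test function and its gradient.
from scipy.optimize import minimize, rosen, rosen_der

result = minimize(rosen, x0=[0.0, 0.0], method="BFGS", jac=rosen_der)
print(result.x)  # converges near the known minimum at [1.0, 1.0]
```

BFGS only needs the gradient; it builds up its own approximation to the Hessian as it goes.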

And I learn about a new matrix: the Hessian matrix (which I have been seeing mentioned a lot lately, but didn't know what it was).
http://en.wikipedia.org/wiki/Hessian_matrix

So, my intuition (before researching this) is: we are using gradient descent to move toward the solution (where the first derivative = 0). What would be really handy is the second derivative, so we know how the slope is changing. If it's very steep, then a small change in X makes a large change in Y, and we can size our jump accordingly.

And presto, the Hessian matrix is actually the second derivative — or more precisely, the multivariable version of it: the square matrix of all second partial derivatives.
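The intuition above is basically Newton's method. A minimal 1D sketch (my own toy illustration, not from any library): dividing the gradient by the second derivative means steep curvature automatically gives a smaller step, and shallow curvature a bigger one.

```python
def newton_1d(f_prime, f_double_prime, x, steps=20):
    """Minimize a 1D function using its first and second derivatives."""
    for _ in range(steps):
        # Newton step: scale the gradient by the inverse curvature.
        x = x - f_prime(x) / f_double_prime(x)
    return x

# Example: f(x) = (x - 3)^2, so f'(x) = 2(x - 3) and f''(x) = 2.
x_min = newton_1d(lambda x: 2 * (x - 3), lambda x: 2.0, x=10.0)
print(x_min)  # a quadratic converges in a single step, to 3.0
```

In higher dimensions that division becomes a solve against the Hessian matrix, which is where BFGS comes in — it approximates the Hessian instead of computing it exactly.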
Also of note: BFGS needs NxN elements for the Hessian approximation, where N is the number of parameters in the model, and there is also L-BFGS for limited memory.

So sizing this up: for 100,000 variables (which is an epic, non-human-readable model), at 8 bytes per entry we would need 80GB, but I'm using fewer than 1K variables, so 8MB should do.
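The back-of-the-envelope math checks out, assuming one 8-byte double per Hessian entry:

```python
def hessian_bytes(n_params):
    # N x N matrix of 8-byte doubles.
    return n_params * n_params * 8

print(hessian_bytes(100_000) / 1e9)  # -> 80.0 (GB)
print(hessian_bytes(1_000) / 1e6)    # -> 8.0 (MB)
```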

-JD
