Hessian Matrix
Learning!
I just love peeling back the onion. Now that I have the classic gradient descent working, what is the next step?
BFGS apparently. http://en.wikipedia.org/wiki/BFGS_method
And I learn about a new matrix: the Hessian matrix (which I had been seeing mentioned a lot lately but didn't know what it was).
http://en.wikipedia.org/wiki/Hessian_matrix
So, my intuition (before researching this) is: we are using gradient descent to move toward the solution (where the first derivative = 0). What would be really handy is the second derivative, so we know how the slope itself is changing. If the slope is very steep, a small change in X makes a large change in Y, so we should take smaller steps there; where it's shallow, we can make a bigger jump.
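This intuition is basically Newton's method: scale the step by the second derivative. A quick 1D sketch in Python (the toy function f(x) = (x - 3)² + 1 and all names here are mine, just for illustration):

```python
# Sketch: Newton's method in 1D on the toy function f(x) = (x - 3)**2 + 1.
# The step is the gradient divided by the curvature: steep curvature
# shrinks the step, shallow curvature lets us jump farther.

def f_prime(x):
    return 2 * (x - 3)   # first derivative of (x - 3)**2 + 1

def f_double_prime(x):
    return 2.0           # second derivative (constant for a quadratic)

x = 0.0
for _ in range(10):
    x -= f_prime(x) / f_double_prime(x)   # Newton step: scale by curvature

print(x)  # → 3.0 (a quadratic converges in a single step)
```

For a pure quadratic the curvature is constant, so one Newton step lands exactly on the minimum; plain gradient descent would need a tuned learning rate and many iterations to get there.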
And presto, the Hessian matrix is exactly that: the matrix of all the second partial derivatives, i.e. the multivariable version of the second derivative.
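To make that concrete, here is a sketch that approximates the Hessian numerically with central differences (the function f(x, y) = x² + 3xy + y², the helper name `hessian`, and the step size are all my own choices; its exact Hessian is [[2, 3], [3, 2]]):

```python
# Sketch: numerically approximating the Hessian (matrix of second partial
# derivatives) of f(x, y) = x**2 + 3*x*y + y**2 via central differences.

def f(v):
    x, y = v
    return x**2 + 3*x*y + y**2

def hessian(f, v, h=1e-3):
    """Approximate the n x n matrix H[i][j] = d^2 f / dx_i dx_j at point v."""
    n = len(v)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Perturb coordinates i and j in all four +/- combinations.
            vpp = list(v); vpp[i] += h; vpp[j] += h
            vpm = list(v); vpm[i] += h; vpm[j] -= h
            vmp = list(v); vmp[i] -= h; vmp[j] += h
            vmm = list(v); vmm[i] -= h; vmm[j] -= h
            H[i][j] = (f(vpp) - f(vpm) - f(vmp) + f(vmm)) / (4 * h * h)
    return H

print(hessian(f, [1.0, 2.0]))  # ≈ [[2, 3], [3, 2]]
```

Note the matrix is symmetric (H[i][j] = H[j][i]), which is one reason quasi-Newton methods like BFGS can get away with maintaining an approximation instead of recomputing it.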
Also of note: BFGS needs an NxN matrix of elements to approximate the Hessian, where N is the number of parameters in the model; there is also L-BFGS, a limited-memory variant.
So sizing this up at 8 bytes per double-precision entry: for 100,000 variables (which is an epic, non-human-readable model), we would need 80 GB, but I'm using fewer than 1K variables, so 8 MB should do.
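The back-of-the-envelope arithmetic, as a sketch (assuming 8-byte doubles; the helper name is mine):

```python
# Sketch: memory for a dense N x N Hessian at 8 bytes per double.
def hessian_bytes(n):
    return n * n * 8

print(hessian_bytes(100_000) / 1e9)  # → 80.0  (GB, for 100,000 parameters)
print(hessian_bytes(1_000) / 1e6)    # → 8.0   (MB, for 1,000 parameters)
```

This quadratic blow-up is exactly why L-BFGS exists: it stores only a handful of recent gradient/update vector pairs instead of the full NxN matrix.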
-JD