I went to this talk yesterday.
Here is his Stanford page:
And the research paper he talked about.
Some really interesting stuff. The thing I wanted to see was that they have built a system to really parallelize gradient descent: basically unlimited data (the Internet), an unlimited-size model (billions of parameters), running in parallel across an arbitrary number of machines (2,000 in the talk). NOT done over MapReduce.
Two neat things: it's asynchronous, so the model servers at the bottom each run gradient descent on their own portion of the model and send their parameter DELTAs back up. And it works out that even though different servers are working on different versions and views of the parameters, it still converges.
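To make the delta idea concrete, here's a toy sketch of how I understand it (names like ParameterServer and push_delta are my own illustration, nothing from their actual system): several workers each fetch a possibly stale copy of the parameters, compute a gradient on their own data shard, and push back only the update.

```python
import numpy as np
from threading import Thread, Lock

class ParameterServer:
    """Holds the global parameters; workers send deltas, not full copies."""
    def __init__(self, dim):
        self.params = np.zeros(dim)
        self.lock = Lock()

    def fetch(self):
        with self.lock:
            return self.params.copy()   # may already be stale when used

    def push_delta(self, delta):
        with self.lock:
            self.params += delta        # apply the worker's parameter DELTA

def worker(server, data_shard, lr=0.01, steps=100):
    for _ in range(steps):
        w = server.fetch()                      # stale view of the model
        x, y = data_shard[np.random.randint(len(data_shard))]
        grad = (w @ x - y) * x                  # squared-error gradient, one example
        server.push_delta(-lr * grad)           # send only the update, asynchronously

# Toy problem: recover w_true from linear data split across "machines".
rng = np.random.default_rng(0)
w_true = np.array([2.0, -3.0, 0.5])
data = [(x, w_true @ x) for x in rng.normal(size=(1000, 3))]
shards = [data[i::4] for i in range(4)]         # 4 workers, 4 data shards

server = ParameterServer(dim=3)
threads = [Thread(target=worker, args=(server, s)) for s in shards]
for t in threads: t.start()
for t in threads: t.join()
print(server.params)                            # lands close to w_true despite stale reads
```

Even with four threads racing on the same parameter vector and reading stale values, the final weights land close to w_true, which is a small-scale version of the convergence result they reported.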
They also did an L-BFGS version, which I thought would be faster but actually wasn't, because they also came up with a new analytical method for adjusting the learning rate of gradient descent that was pretty slick.
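They didn't go into much detail that I caught, but the learning-rate trick sounds like an adaptive per-parameter rule in the AdaGrad family (my guess, not confirmed): each parameter's step size shrinks with the root of its accumulated squared gradients, so no hand-tuned global schedule is needed.

```python
import numpy as np

def adagrad_update(w, grad, hist, lr=0.1, eps=1e-8):
    """Per-parameter step: parameters with large past gradients get smaller steps.
    (AdaGrad-style rule; my assumption about the trick described in the talk.)"""
    hist += grad ** 2                        # accumulate squared gradients per parameter
    w -= lr * grad / (np.sqrt(hist) + eps)   # analytically scaled learning rate
    return w, hist
```

In an asynchronous setting this is appealing because every parameter effectively gets its own schedule, instead of one global rate that has to work for all of them.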
They also had a new version of deep learning (which gives some people convulsions, but I think is pretty cool) that was a relaxed version of sparse autoencoding. Deep learning uses unsupervised learning to find new features that are then fed into supervised learning, so the features themselves are learned instead of hand-engineered!
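For context on the sparse autoencoding they relaxed: the vanilla version minimizes reconstruction error plus a penalty that keeps hidden units mostly quiet. Here's a bare-bones sketch of that standard objective (not their relaxed variant, which I'd have to read the paper to reproduce):

```python
import numpy as np

def sparse_autoencoder_loss(X, W, b, W2, b2, rho=0.05, beta=3.0):
    """Reconstruction error plus a sparsity penalty that pushes the average
    activation of each hidden unit toward a small target rho."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden activations (sigmoid)
    X_hat = H @ W2 + b2                              # linear reconstruction
    recon = np.mean(np.sum((X_hat - X) ** 2, axis=1))
    rho_hat = H.mean(axis=0)                         # average activation per hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)          # KL-divergence sparsity penalty
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + beta * kl
```

The sparsity penalty is what forces each hidden unit to specialize, and the hidden activations H are the learned features you then hand to the supervised stage.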