Thoughts and Links: July 2013

Learning a New Programming Language

So I wrote an R script over 2 days that is mostly your standard file-manipulation stuff: look at the file system, look at headers, concatenate data from .lzo files into an XDF DataFrame. I also threw in multi-threading for good measure.
We originally farmed out the work to the 'services' branch of a company, and the fellow had billed 8 hours before I pulled the plug on that. I figured the non-threaded version would take 4 hours tops for someone who does this sort of thing on a daily basis. Having never looked at R before in my life, I did all of the above in 2 days (most of the time went to finding the right API, and threading was a bit tricky). That is partially a pat on the back, but I'm also amazed at how easy it is to pick up a new language these days. With a little knowledge of what you want and a proper search (Stack Overflow, etc.), you can pick up the basics in no time. We've certainly come a long way since "Learn C in 21 days".

Follow up on LSRN (Random Projections)

This is someone's honors thesis for their BS… I think my honors thesis was: if most of my class is made up of a bunch of losers, I can ride their wave of incompetence to a diploma.
http://cseweb.ucsd.edu/~akmenon/HonoursThesis.pdf

-JD

GC

Another good link from HighScalability, this one on GC params.
http://mechanical-sympathy.blogspot.com/2013/07/java-garbage-collection-distilled.html

RAID

From the HighScalability blog:

"Crazy lessons from GoDaddy: there is an inherent drawback to using a battery-backed write cache: Many RAID controllers, like our Dell PERC cards, go through a battery learning cycle which calibrates the capacity of the battery to ensure it does not unexpectedly fail. For us, this cycle occurs every 90 days. When a battery learning cycle begins, it fully charges, discharges, and then charges again, realigning the true capacity of the battery. While it performs the learning process, you cannot rely on it to sustain the cache in the event of a power failure."

...I need a backup battery for my battery, that backs up my RAID array, which creates a backup of my data.

LSRN Solver

So many cool projects, so little time.

"We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to min_{x ∈ R^n} ‖Ax − b‖_2, where A ∈ R^{m×n} with m ≫ n or m ≪ n, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is only involved in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size ⌈γ min(m, n)⌉ × min(m, n), where γ is moderately larger than 1, e.g., γ = 2. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results demonstrate that on a shared-memory machine, LSRN outperforms LAPACK's DGELSD on large dense problems, and MATLAB's backslash (SuiteSparseQR) on sparse problems. Further experiments demonstrate that LSRN scales well on an Amazon Elastic Compute Cloud cluster."

http://arxiv.org/pdf/1109.5981v2.pdf
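
To make the abstract a bit more concrete, here is a rough single-machine sketch of the idea in Python with NumPy, for the tall case (m ≫ n) only. This is my reading of the abstract, not the authors' code. The function name lsrn_overdetermined and the parameters rcond, tol, and max_iter are made up for illustration. I also swapped in a plain CGLS loop where the paper uses LSQR or the Chebyshev semi-iterative method, and there is no parallelism, no Tikhonov term, and no handling of the m ≪ n case.

```python
import math

import numpy as np


def lsrn_overdetermined(A, b, gamma=2.0, rcond=1e-12, tol=1e-12,
                        max_iter=100, seed=0):
    """Sketch of LSRN-style preconditioning for min ||Ax - b||_2 with m >> n.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)

    # Random normal projection: G is s x m with s = ceil(gamma * n).
    s = math.ceil(gamma * n)
    G = rng.standard_normal((s, m))
    A_tilde = G @ A  # small s x n matrix

    # SVD of the projected matrix; drop (near-)zero singular values so a
    # rank-deficient A is handled. rcond is an assumed cutoff.
    _, sigma, Vt = np.linalg.svd(A_tilde, full_matrices=False)
    r = int(np.sum(sigma > rcond * sigma[0]))
    N = Vt[:r].T / sigma[:r]  # n x r right preconditioner, N = V * Sigma^-1

    # CGLS on the preconditioned problem min ||(A N) y - b||_2, started at
    # y = 0. (Stand-in for LSQR/Chebyshev from the paper.)
    y = np.zeros(r)
    res = np.array(b, dtype=float)   # residual b - A N y
    g = N.T @ (A.T @ res)            # gradient direction (A N)^T res
    p = g.copy()
    gg = g @ g
    gg0 = gg
    for _ in range(max_iter):
        if gg <= (tol ** 2) * gg0:
            break
        q = A @ (N @ p)
        alpha = gg / (q @ q)
        y += alpha * p
        res -= alpha * q
        g = N.T @ (A.T @ res)
        gg_new = g @ g
        p = g + (gg_new / gg) * p
        gg = gg_new

    # Map back to the original variables.
    return N @ y
```

Calling lsrn_overdetermined(A, b) on a tall dense A should agree with np.linalg.lstsq(A, b, rcond=None)[0] up to the iteration tolerance. Note that A only shows up in matrix-matrix and matrix-vector products, which is the property the abstract leans on for sparse matrices and linear operators.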

Thoughts and Links

Tuesday, July 23, 2013

Learning a New Programming Language

Follow up on LSRN (Random Projections)

Monday, July 22, 2013

GC

RAID

Wednesday, July 17, 2013

LSRN Solver
