Tuesday, June 19, 2012

Hadooop!

Yet another reason I think Hadoop and MapReduce are a bit of a fad.

http://research.microsoft.com/pubs/163083/hotcbp12%20final.pdf

This paper makes the point that most analytics workloads would fit in memory on a small cluster, yet we jump through hoops to stream data on and off disk and end up with convoluted programming idioms (like Cascading) as a result. Our largest dataset is 33 GB LZO-compressed. Even at 330 GB uncompressed, it would fit in memory across our cluster. If all processing happened in memory, we would be orders of magnitude faster.
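
As a back-of-envelope check (a minimal sketch; the node count, RAM per node, and the ~10x LZO expansion ratio are illustrative assumptions, not measured figures):

    # Back-of-envelope: does the decompressed dataset fit in cluster RAM?
    # All figures below are assumptions for illustration, not measurements.

    compressed_gb = 33        # our largest dataset, LZO-compressed
    expansion_ratio = 10      # assumed ~10x size increase when decompressed
    nodes = 16                # hypothetical cluster size
    ram_per_node_gb = 64      # hypothetical RAM per node

    uncompressed_gb = compressed_gb * expansion_ratio  # 330 GB
    cluster_ram_gb = nodes * ram_per_node_gb           # 1024 GB

    print(f"Uncompressed dataset:  {uncompressed_gb} GB")
    print(f"Aggregate cluster RAM: {cluster_ram_gb} GB")
    print("Fits in memory" if uncompressed_gb < cluster_ram_gb
          else "Does not fit")

Even with generous assumptions about compression, a modest cluster has several times the RAM needed to hold the whole dataset.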
