Thursday, August 01, 2013

Ceph vs HDFS


For the data nerds out there:
I've been wondering for a while whether it would be better to dump HDFS and replace it with Ceph. HDFS just feels so broken, fragile, and abandoned. It's as if everyone jumped on the HDFS bandwagon at some point in 2008 and had jumped off again by 2010, leaving just a few lonely holdouts to gaze at the Jira carnage.

http://ceph.com/ceph-storage/file-system/

Ceph now has data locality for Hadoop jobs:
http://www.mail-archive.com/ceph-users@lists.ceph.com/msg02306.html

The one thing MapR has over Cloudera is a nice POSIX interface. I suspect Ceph's is even better, AND it's 100% open source. I'm going to give it one year before this becomes mainstream. By then, enough people will have blazed the trail to make this a sure thing.

-JD

3 Comments:

Anonymous said...

Ceph replacing HDFS: do you still believe in that?

6:05 PM  
Blogger jerdavis said...

Well, it seems my prediction was pretty far off. :)
These days, if I'm doing big-data-type work, I'm in EMR, using Spark, with S3 as the source and destination.
That's mostly out of convenience, since I don't run my own cluster anymore. I have no idea where the high-end perf has gone these days.

6:14 PM  
Blogger jerdavis said...

I'm actually surprised this account still exists.

6:17 PM  
