Thoughts and Links: Hadoop

Wednesday, May 23, 2012

Hadoop

I'm becoming less and less enchanted with Hadoop as time goes by.
The whole ecosystem of Hadoop software seems half-baked. There's one good core idea: HDFS plus distributed computation. Then there is a raft of marginal software associated with it.

Hive: On the plus side, you get some SQL and ad hoc query goodness. On the minus side, it's a marginal subset of SQL, with bugs.
HBase: I haven't heard anyone say this is baked. In fact, most people get bitten by something or other.
Cascading: A somewhat useful abstraction that (while not too buggy) doesn't quite do it for me. I'm writing way too much code to do simple things, and I have to go out of my way to not lose the metadata (Fields/Typing).

I'm hoping that Cliff Click's 0xData doesn't completely suck.
http://0xdata.com/

In my dream world Postgres gets a few new features, and I never have to think about this again.

4 Comments:

bryann said...: hadoop is such a shit show. we already have a way way way better distributed computer here but hadoop is the new black so everyone has to learn it and it just sucks.; 8:34 AM
bryann said...: have you seen this guy?

http://www.youtube.com/watch?v=yOt5Zavslig

OS version of the MPP architecture we license from IBM, sits on top of postgres.; 8:45 AM
jerdavis said...: Cool, I hadn't seen this one. Is it reliable, or does it take a lot of care and feeding?

It feels like with Postgres-XC and this, we are _just_ about there. I know there was some talk again about getting parallel queries into the mainline. Probably won't happen for a couple of years though.
I can see why they had to, but setting up multiple clusters on the same box seems sub-optimal.; 9:59 AM
bryann said...: i have no personal experience with it, but i want something that's easier to port our existing SQL into. hadoop is just not happening with all this legacy db2 sql sproc code.

the boner-killer for me on this one comes at about 17 minutes when he says there are no functions or stored procs. i still might give it a go for a bit and see how it pans out - most of our procs are either just scripts or loops of scripts.

our hadoop dev cluster is mostly idle so i figure i can throw this on there in parallel and do a quick proof of concept. i'll let you know if it's a nightmare.; 2:49 AM
