Hadoop
I'm becoming less and less enchanted with Hadoop as time goes by.
The whole ecosystem of Hadoop software seems half-baked. There's one good core idea: HDFS and distributed computation. Then there's a raft of marginal software associated with it.
Hive: On the plus side, you get some SQL and ad hoc query goodness. On the minus side, it's a marginal subset of SQL, and a buggy one.
HBase: I haven't heard anyone say this is fully baked. In fact, most people get bitten by something or other.
Cascading: A somewhat useful abstraction that, while not too buggy, doesn't quite do it for me. I'm writing way too much code to do simple things, and I have to go out of my way not to lose the metadata (Fields/typing).
I'm hoping that Cliff Click's 0xData doesn't completely suck.
http://0xdata.com/
In my dream world, Postgres gets a few new features and I never have to think about this again.
4 Comments:
hadoop is such a shit show. we already have a way, way, way better distributed computer here, but hadoop is the new black, so everyone has to learn it, and it just sucks.
have you seen this guy?
http://www.youtube.com/watch?v=yOt5Zavslig
it's an open source version of the MPP architecture we license from IBM, and it sits on top of postgres.
Cool, I hadn't seen this one. Is it reliable, or does it take a lot of care and feeding?
It feels like with Postgres-XC and this, we are _just_ about there. I know there was some talk again about getting parallel queries into mainline Postgres. It probably won't happen for a couple of years, though.
I can see why they had to, but setting up multiple clusters on the same box seems suboptimal.
i have no personal experience with it, but i want something that's easier to port our existing SQL into. hadoop is just not happening with all this legacy db2 SQL sproc code.
the boner-killer for me on this one comes at about 17 minutes in, when he says there are no functions or stored procs. i still might give it a go for a bit and see how it pans out; most of our procs are either just scripts or loops of scripts.
our hadoop dev cluster is mostly idle, so i figure i can throw this on there in parallel and do a quick proof of concept. i'll let you know if it's a nightmare.