Thursday, October 24, 2013

I'm a scientist!

When I started college I actually wanted to be a Bio-Chemist. I think this stemmed from my general love of all things science (and Bio-Chemistry has TWO sciences right in the name!).
Turns out I'm not very good at chemistry, but along the way I was exposed to the classic chemistry lab notebook. The idea behind the lab notebook is that you write out all the steps you are going to perform ahead of time, then record any actions you actually performed (oops, I added 20ml instead of 200ml, etc.) and any observations (liquid turned blue, a small explosion occurred, etc.).
Turns out I wasn't very good at keeping a lab notebook either (bad penmanship being an obvious problem).
Lately I've noticed that in my current job (AS A DATA SCIENTIST) I've basically been keeping a lab notebook of sorts. This time it's more along the lines of:
Did this query : Select xyz : got this result: 1234 rows
Did this query : Select zzz : didn't see any result with yyy. etc.
Currently this is just a very cluttered text document that I replicate on DropBox, but I could see how some more structure/organization (search, dates, indexing by project, ?version control?, etc.) could be very useful.
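Here is a rough sketch of what that structure might look like: an append-only file with one JSON entry per line, each one dated and tagged by project, plus a simple search. The file layout, the field names, and the `log_entry` / `search` helpers are all made up for illustration; this is not an existing tool.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_entry(path, project, query, observation):
    """Append one dated entry; earlier entries are never rewritten."""
    entry = {
        "when": datetime.now(timezone.utc).isoformat(),
        "project": project,
        "query": query,
        "observation": observation,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def search(path, text=None, project=None):
    """Return entries matching a project and/or a case-insensitive substring."""
    results = []
    if not Path(path).exists():
        return results
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            entry = json.loads(line)
            if project is not None and entry["project"] != project:
                continue
            haystack = entry["query"] + " " + entry["observation"]
            if text is not None and text.lower() not in haystack.lower():
                continue
            results.append(entry)
    return results
```

Since the file is plain text and only ever appended to, it would still sync fine over DropBox and diff cleanly under version control.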

------

Your notebook will serve as a permanent record of your experimental work. It will contain the information you need to complete your work efficiently and safely, and you will use the information contained in your notebook to write laboratory reports explaining your results. For these reasons, it is important that your notebook be complete and accurate. As a general rule, a good notebook is one from which someone else can repeat your experimental work in the same way that you have done it.

I. General Guidelines:

1. Your notebook must be bound, the pages numbered, and have a carbon copy.

2. Write your name, the course name, and section # on the cover or front page.

3. Always use permanent ink, not pencil.

4. Write it down NOW. Your notebook is a log of what you do as you do it.

5. Use complete sentences.

6. Write everything in your notebook. Weights, temperatures, everything! When recording experimental data, always include units.

7. Do not erase! If you make an error, draw a single line through it, and continue. The original statement should still be legible.

8. Never remove original pages from your notebook. You may remove carbon copies.

9. Date every page as you use it.

-- 

also:

http://www.dartmouth.edu/~chemlab/info/notebooks/how_to.html

--

And of course it went electronic..

Monday, October 21, 2013

"Secure DropBox Alternative"

Yet another Secure DropBox Alternative.. Client Side Keys! BLAH BLAH.
http://www.filosync.com

These people are all missing the boat. First company to let you compile your own client software wins.

Sunday, October 20, 2013

I love me some Mongo hate




"Being able to walk with them doesn't change the fact that as of right now, MongoDB is clown shoes."

But seriously, the author makes a great point:

But actually, that's the Tao-like genius of MongoDB – having absolutely nothing new. Most databases are built with some killer idea: the consistency protocol for Cassandra, the crazy data structures of Redis, or the data-processing abilities of Hadoop. MongoDB has mmap, and by "has", I mean "uses" (but hey, possession is nine tenths of the law). Not having to design your own caching algorithms or write strategies, and using the simplest possible implementations of everything else, lets you get to market quickly and focus on marketing your benchmarks, consulting to bring your customers up to Web Scale, responding to haters, or learning about concurrency.

Monday, October 14, 2013

Cassandra and The trouble with timestamps




"This behavior violates strong, eventual, causal, read-your-writes, session, and monotonic write consistency, and depending on how you interpret "seen", violates monotonic read consistency as well."

Thank you kind author!
I think Cassandra is a really neat piece of software, and it's certainly come a long way. But this timestamp issue has always given me pause: not only can I get completely broken results, I can't even audit when it happened, is happening, or will happen.

I'm sure this has happened: 
"We have a redundant array of nodes, storing 3 copies of data across the cluster. Each node runs RAID 10 ensuring high performance and provides excellent data protection, requiring a minimum of 6 drive failures before losing data!" 
"Unfortunately last week there was a single hardware failure in the system clock on a single node, so all our data is garbage."

OTOH, I can imagine saying something similar for a lot of software I've written if the RTC is truly broken, and I'm sure Oracle has similar issues. Still, Cassandra timestamp failures seem to be in a different class.
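To make the worry concrete, here is a toy model of last-write-wins conflict resolution by timestamp. It is a simplified sketch, not Cassandra's actual code; the `lww_write` helper and the numbers are invented for illustration.

```python
def lww_write(cell, value, timestamp):
    """Keep the incoming value only if its timestamp is newer than the stored one."""
    if cell is None or timestamp > cell[1]:
        return (value, timestamp)
    return cell


cell = None
# Node A's clock is broken and runs one hour (3600 s) fast.
cell = lww_write(cell, "old value", timestamp=1000 + 3600)
# Node B writes ten seconds later in real time, with a correct clock.
cell = lww_write(cell, "new value", timestamp=1010)
print(cell[0])  # prints "old value"
```

The later write loses, and nothing left in the stored cell says a write was ever dropped, which is the "can't audit it" part.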

-JD

Wednesday, October 09, 2013

Leap Motion



Leap Motion looks pretty cool, I might get one. I expect we might see this as standard equipment someday…
One very small detail:




"Don't hunch over your device. Keep its field of view clear from obstructions, including yourself. Don't bend your elbows and your wrists with your arms close together. Don't hold your arms straight ahead of you in the air. Don't rest your elbows on a surface with your elbows pointed out to the side."

You know, the most natural motion possible that everyone has been using since they started using a computer...