Isaac Su / 2017.07.03

Not Google

Notable excerpts from You Are Not Google

Regarding the Dynamo paper and Cassandra:

Having read the Dynamo paper, and knowing Cassandra to be a close derivative, I understood that these distributed databases prioritize write availability (Amazon wanted the “add to cart” action to never fail).

… they did this by compromising consistency, as well as basically every feature present in a traditional RDBMS.

Regarding Service-Oriented Architecture:

But by the time Amazon decided to move to SOA, they had around 7,800 employees and did over $3 billion in sales.

Regarding GFS and MapReduce:

But do you need to read and write back to literally thousands of disks? How much data do you have exactly? GFS and MapReduce were created to deal with the problem of computing over the entire web, such as… rebuilding a search index over the entire web.

Perhaps you have read the GFS and MapReduce papers and appreciate that part of the problem for Google wasn’t capacity but throughput: they distributed storage because it was taking too long to stream bytes off disk. But what’s the throughput of the devices you’ll be using in 2017?
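To make the throughput question concrete, here is a rough back-of-the-envelope sketch of how long one full pass over a dataset takes at a given streaming rate. The function name `scan_seconds`, the 1 TB dataset size, and both device rates are hypothetical placeholders chosen for illustration, not measurements from the article; substitute your own numbers.

```python
def scan_seconds(data_bytes: float, bytes_per_second: float, devices: int = 1) -> float:
    """Seconds to stream data_bytes once, split evenly across `devices` read in parallel.

    Idealized model: ignores seek time, network, and coordination overhead.
    """
    if bytes_per_second <= 0 or devices < 1:
        raise ValueError("throughput must be positive and devices must be >= 1")
    return data_bytes / (bytes_per_second * devices)


TB = 10**12
MB = 10**6

if __name__ == "__main__":
    data = 1 * TB  # assumed dataset size; replace with your own
    # Assumed per-device streaming rates, for illustration only.
    rates = [("100 MB/s device", 100 * MB), ("2000 MB/s device", 2000 * MB)]
    for label, rate in rates:
        minutes = scan_seconds(data, rate) / 60
        print(f"{label}: {minutes:.1f} minutes for one full pass")
```

If a single device already streams your whole dataset in an acceptable time, distributing storage to gain throughput may not buy you much.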