I stumbled upon an interesting post by David J. DeWitt and Michael Stonebraker entitled
For those not familiar with MapReduce, it is a programming model developed by Google for distributed computing on extremely large sets of data. You can read Google's original paper outlining the technique here.
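To make the model a little more concrete, here is a minimal single-process sketch of the classic word-count example. It is only an illustration of the map, group, and reduce steps, not Google's implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, and a real framework would spread this work across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for what a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog the end"]))
```

The programmer only writes the map and reduce functions; the grouping in between, along with distribution and fault handling, is what the framework is meant to take care of.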
The authors of this post make some excellent points, most of which center on the importance of well-defined structure and abstraction for data. While I certainly agree with a number of them, it's hard to ignore the simplicity and effectiveness of MapReduce. After all, Google reportedly uses it to process 20 petabytes of data a day.