Thursday, September 1, 2011

Super Crunchers Book Review & Thoughts

I. Ayres, Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart. New York: Bantam, 2007.


I found this to be a fascinating read. I knew about much of the data collection that was going on but did not really grasp the magnitude of the random trials being run all around me. I work in a very data-driven environment (high tech) and was shocked to learn that other professions are JUST NOW starting to use regression and other data analysis techniques. The fact that physicians are just starting to use data mining, regression, correlation, etc. to help diagnose patients was sort of baffling. For some reason, I thought that when my general doctor left the room and returned after 15 minutes, she was actually looking up my symptoms in some master database. I never asked how she reached her conclusions, but after reading this book I'm going to start asking about the origins of her diagnoses. I found the chapters with medical field examples to be the most interesting, since I sort of already knew about the things airlines and credit card companies were doing.

During the course of this book, I often thought about the ethical issues surrounding the methods presented, especially the random trials. Denying patients potentially helpful drugs for the sake of conducting trials presents some ethical dilemmas. One could argue that this 'holding back' of critical care is for the greater good, helping future patients get the statistically better forms of treatment. A similar dilemma presents itself when conducting random trials in impoverished sections of society. In one case, crucial financial help may be denied for the sake of a random trial, with no information provided as to why the recipient was denied assistance. In other cases, impoverished families are given all sorts of benefits. Although I agree that efficient use of money is of the utmost importance, it's a bit depressing to think aid is being withheld.

Another potential ethical issue was presented regarding companies displaying, or burying in this case, warranty information on their web sites. Companies are using random trials to determine whether they should be up front and honest about warranty information, or whether they are better off (in terms of sales) burying this information on their website, essentially hiding it from the consumer. On the other hand, the author presented several cases where corruption and cheating were actually exposed using supercrunching. In one example, a study determined through extensive data analysis that some basketball games were subtly rigged during the last few minutes to affect the point spread. It's encouraging to think about how data mining and analysis techniques can help reduce corruption, cheating, and selective targeting.

Dr. Ayres concludes his book with a chapter about the 2SD rule. This is the basic rule that roughly 95% of a normally distributed population falls within two standard deviations of the mean. He provides some nice examples of how the rule applies to any normal distribution, including real-world examples related to basketball point spreads, adult heights, and political races.
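For anyone who wants to see the 2SD rule in numbers, here is a minimal Python sketch. It is not taken from the book; the function names and the height figures (a mean of 70 inches and a standard deviation of 3 inches) are made up purely for illustration. It uses the standard normal-distribution identity that the share of values within k standard deviations of the mean is erf(k / sqrt(2)).

```python
import math


def share_within(k_sd: float) -> float:
    """Fraction of a normal distribution within k_sd standard deviations of its mean."""
    return math.erf(k_sd / math.sqrt(2))


def two_sd_range(mean: float, sd: float) -> tuple:
    """Interval that should hold roughly 95% of a normally distributed population."""
    return (mean - 2 * sd, mean + 2 * sd)


if __name__ == "__main__":
    # About 95.45% of a normal population lies within two standard deviations.
    print(f"Share within 2 SD: {share_within(2):.4f}")

    # Hypothetical heights (illustrative numbers only, not from the book).
    low, high = two_sd_range(mean=70.0, sd=3.0)
    print(f"Roughly 95% of heights fall between {low} and {high} inches")
```

Note that the rule only holds this neatly when the data really is close to normally distributed; for skewed data the two-standard-deviation range can cover noticeably more or less than 95%.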

As a data-driven engineering type, I would have loved to see some charts and figures supporting the examples and cases presented, especially to illustrate the correlations and regressions. The book did spark enough interest for me to go and research some of the cited cases myself out of curiosity.

I found the book to be an easy read but sort of inconsistent in explaining some of the more technical aspects. In the beginning of the book the author did a good job explaining data mining and analysis at a level even 'non-techies' could understand. He went into a fair amount of detail explaining the methods so that readers from most backgrounds could understand the foundation of the text. However, when regression was introduced, it wasn't explained very well and no graphical examples were shown. The author writes "A regression is a statistical procedure that takes raw historical data and estimates how various causal factors influence a single variable of interest." Beyond this short explanation, regression as a procedure or method is not described. Given the number of non-technical explanations provided earlier in the book, I expected a better foundation to be provided to the reader. Personally, I know what regression is and how to apply it, but others reading this book could benefit from a bit more explanation and background. Ayres also fell a bit short when explaining standard deviation. He explained the term but did not provide an example of how standard deviation is calculated. He mentioned, in more than one section, how easy the calculation is in Excel but never showed the manual calculation.
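To give a sense of what that missing background might look like, here is a short Python sketch of the two calculations done by hand: a sample standard deviation and a one-variable least-squares regression line. This is an illustration under my own assumptions, not the author's method; the function names and the tiny data sets are hypothetical. The standard deviation uses the n - 1 divisor, which is the sample version that Excel's STDEV function computes.

```python
import math


def sample_std(values):
    """Sample standard deviation computed step by step (n - 1 divisor)."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared distances from the mean.
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data, made up for illustration only.
    print(f"Sample standard deviation: {sample_std([2, 4, 4, 4, 5, 5, 7, 9]):.3f}")

    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(f"Fitted line: y = {slope} * x + {intercept}")
```

A real regression of the kind described in the book would typically involve many predictors and a statistics package, but the one-variable case shows the basic idea: find the line that minimizes the squared distance between the predictions and the historical data.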

Overall, I left this book with somewhat mixed feelings. I was very excited about the future potential of supercrunching in the medical field. In contrast, I'm a little scared about where we might be headed in terms of commercially driven supercrunching. Although these methods can help both the advocate and the big corporations, the privatization of huge data concentrators means that the folks with the most money (companies) have access to the best and biggest data sets. My worry is that the advocate is going to be outgunned by the folks looking for ways to make more and more money. With examples like Enron and the recent housing "bubble" (crash), we've seen what happens in a greed-driven environment. I'm finishing up this review on an international flight from Houston to Costa Rica. My mixed feelings about supercrunching continue as I'm stuck in the last row of the plane with a seat that doesn't recline and a shortage of food by the time the cart finally makes it to me. The flight was overbooked, with not one empty seat and hardly any extra room for carry-on bags. I'm typing these last few sentences with one hand because space is so confined these days that one has no hope of actually using two hands to type on the computer. All for the sake of higher profits.

I'd highly recommend this book to just about anyone. It's a great read and very interesting.
