August 01, 2005

Having previously given the world the Statistically Improbable Phrase, Amazon has now come up with some new analyses to soak up the CPU cycles: Capitalized Phrases, Books on Related Topics, Concordance, and Text Stats. The Concordance and Text Stats are particularly fascinating. For example, the concordance for the book Microsoft Office Project 2003 Step by Step shows that the most popular word is "project", followed by "task", "resource", "click", and "work" (I wonder if Amazon has a patent on "showing a concordance indicating frequency of words via font size"). The Text Stats show the Fog, Flesch, and Flesch-Kincaid indices, as well as complexity, character/word/sentence counts, and the "Fun stats" of words per dollar and words per ounce.
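
For the curious, a concordance like this is not hard to approximate. The sketch below is only a guess at the general idea, not Amazon's actual method: it counts word frequencies and scales each word's font size linearly with its count. The function names (concordance, font_sizes), the point sizes, and the tokenizing rule are all made up for illustration, and it skips the filtering of common words like "the" that a real concordance presumably does.

```python
import re
from collections import Counter


def concordance(text, top_n=100):
    """Return the top_n most frequent words as (word, count) pairs.

    Hypothetical helper: words are runs of letters, optionally with one
    internal apostrophe ("don't"), compared case-insensitively.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words).most_common(top_n)


def font_sizes(entries, min_pt=10, max_pt=36):
    """Map each word to a font size scaled linearly with its count.

    The least frequent word gets min_pt and the most frequent gets max_pt.
    These point sizes are arbitrary assumptions.
    """
    if not entries:
        return {}
    counts = [count for _, count in entries]
    lowest, highest = min(counts), max(counts)
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts tie
    return {
        word: min_pt + (count - lowest) * (max_pt - min_pt) / span
        for word, count in entries
    }


if __name__ == "__main__":
    sample = "Click the task. Assign a resource to the task. Click the project."
    top = concordance(sample, top_n=5)
    for word, size in font_sizes(top).items():
        print(f"{word}: {size:.0f}pt")
```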

The Project 2003 book has 6,466 words per dollar (WPD) and 3,186 words per ounce (WPO). I don't know if those are good deals or not. By comparison, Who Moved My Cheese? has 1,020 WPD and 1,632 WPO. A paperback of War and Peace has an astonishing 53,181 WPD (no WPO, I guess because there's no weight listed). Green Eggs and Ham, which famously was written with only 50 unique words, shows a concordance of only 25 words (dang, I was thinking maybe the site would crash on a book with fewer than 100 words), with "eat" in front with 25 occurrences, and "green", "eggs" and "ham" all tied (not surprisingly) with 11 each. You only get 63 WPD, but isn't your little moppet worth it?
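
The "Fun stats" are presumably just division: word count over price, and word count over shipping weight. Here is a minimal sketch under that assumption. The fun_stats function and the sample book (100,000 words, $20.00, 32 ounces) are invented for illustration and are not figures from any Amazon page.

```python
def fun_stats(word_count, price_dollars, weight_ounces=None):
    """Return (words per dollar, words per ounce), rounded to whole words.

    Words per ounce is None when no weight is given, which is my guess
    at why some listings show no WPO at all.
    """
    wpd = round(word_count / price_dollars)
    wpo = round(word_count / weight_ounces) if weight_ounces else None
    return wpd, wpo


if __name__ == "__main__":
    # Hypothetical book: 100,000 words, $20.00, 32 ounces.
    print(fun_stats(100_000, 20.00, 32.0))
    # Same book with no weight listed.
    print(fun_stats(100_000, 20.00))
```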

