Desperately Seeking Sane Search Engines
June 8, 2001
I asked him what kind of searches he does. It turns out that he is usually looking for the Web site of some obscure organization whose name he already knows. In this situation, Google's method of using links as votes works better than the typical brute-force method.
But that's the simple case -- you know the organization you want to find and just don't know the URL of their site. What if your search is more complicated than that?
Unfortunately, the quality of the search results on Google and elsewhere is dismal. In my view, a typical search engine might be classified as 90 percent useless, while Google brings that figure down to about 85 percent.
A search on Amazon reveals at least a dozen books that are dedicated solely to teaching novices how to use search engines. That is, how to phrase your query so that the search engine software will actually look for what you want it to look for.
One of the reasons you need all those books is the fact that there are another dozen books out there on how to design a Web site so it is ranked highly by search engines. You've got people competing to trick the search engines, which makes searching for what you want even harder. This is progress?
Note that in this discussion, I am only dealing with Web pages that have static text. I haven't even approached the bigger problem that is looming: many pages are now generated dynamically from a database. Either the data is hidden from search engine Web crawlers, or the crawlers grab whatever data they happen to be presented with when they visit the site and wind up indexing data that is no longer there.
The Holy Grail of Web searching is to be able to type in a question and get an answer. This is the only option that will make Web searching accessible to the non-technical among us.
There are sites out there that claim to answer questions. AskJeeves lets you type in a question. Sounds good, right? However, I suspect that it simply strips off most of the words in the question and focuses on the nouns. To test this, I asked Jeeves "how can I find a hotel in Atlanta, GA?", "hotel Atlanta," "when do try huh if via hotel in to if Atlanta?" and "what's a good crack hotel in Atlanta?"
In all four cases, the main two responses were the same: a way to book hotels in Atlanta and a Yellow Pages listing of hotels in Atlanta. The only difference was that the first question also elicited various other general hotel-related responses, which only leads me to believe that Jeeves is looking for stock phrases like "How can I find a hotel?"
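To make that suspicion concrete, here is a minimal sketch of the kind of keyword stripping I am guessing at. It is not how AskJeeves actually works; the KNOWN_TERMS vocabulary and the keywords function are hypothetical names invented for this illustration. The sketch simply throws away every word it does not recognize as a topic word.

```python
import re

# Hypothetical vocabulary of topic words the engine "recognizes".
# This is an assumption for illustration, not a real AskJeeves word list.
KNOWN_TERMS = {"hotel", "atlanta"}

def keywords(question):
    """Lowercase the question, split it into words, keep only known terms."""
    words = re.findall(r"[a-z]+", question.lower())
    return [w for w in words if w in KNOWN_TERMS]

QUESTIONS = [
    "how can I find a hotel in Atlanta, GA?",
    "hotel Atlanta",
    "when do try huh if via hotel in to if Atlanta?",
    "what's a good crack hotel in Atlanta?",
]

if __name__ == "__main__":
    for q in QUESTIONS:
        print(q, "->", keywords(q))
```

Under this assumption, all four of my questions collapse to the same two words, "hotel" and "atlanta," which would explain why the main responses never changed.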
I looked around AltaVista, another search engine, and found a page detailing why its search technology is so great. It listed some "technological wizardry" (AltaVista's term) that it uses, which turns out to be simply a bunch of tie-breakers for ordering pages:
None of these tests amounts to rocket science. More importantly, none of them deals with the main issue. The fundamental problem is that search engines have no way of understanding the content on a page.
Whatever tricks search engines use to decide how to order the pages they find, the finding itself is done by brute force: the engine looks for the words the user entered into the query line on every page it knows about on the Web. Search engines do not attempt to throw out pages because they don't "answer" the question the user has "asked."
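A rough sketch of that split between finding and ordering might look like the following. Everything here is an assumption made for illustration: the PAGES and LINKS data, the example URLs, and the function names are hypothetical, and the ordering step is a plain count of inbound links, a simplified stand-in for the "links as votes" idea rather than any engine's real ranking formula.

```python
# Hypothetical mini-index: page URL -> page text.
PAGES = {
    "a.example": "cheap hotel deals in atlanta",
    "b.example": "atlanta hotel guide",
    "c.example": "hotel california lyrics",
    "d.example": "my atlanta hotel was terrible",
}

# Hypothetical link graph: page URL -> pages it links to.
LINKS = {
    "a.example": ["b.example"],
    "c.example": ["b.example", "d.example"],
    "d.example": ["b.example"],
}

def matching_pages(query, pages):
    """Brute force: keep every page whose text contains all query words."""
    terms = query.lower().split()
    return [url for url, text in pages.items()
            if all(term in text.lower().split() for term in terms)]

def inbound_votes(links):
    """Count each distinct inbound link from another page as one vote."""
    votes = {}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                votes[target] = votes.get(target, 0) + 1
    return votes

def search(query, pages, links):
    """Find pages by word match, then order them by vote count."""
    votes = inbound_votes(links)
    hits = matching_pages(query, pages)
    # Most votes first; ties are broken alphabetically by URL.
    return sorted(hits, key=lambda url: (-votes.get(url, 0), url))

if __name__ == "__main__":
    print(search("hotel atlanta", PAGES, LINKS))
```

Note that the page complaining about a terrible hotel stay matches just as well as the hotel guide. The ordering step can push it down the list, but nothing in the matching step asks whether the page answers the question.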
Yahoo! tries to get around this issue by having humans do the categorization, but that approach cannot keep up with the explosion of Web sites.
Microsoft Word has a somewhat amusing feature called AutoSummarize, which will try to condense a document down to as little as ten sentences. It may be a step in the right direction, but right now it is still too primitive to use.
Unfortunately, until search engines can start to understand what a Web page is saying, they will remain even more primitive.
Adam Barr worked at Microsoft for over 10 years before leaving in April 2000. His book about his time there, Proudly Serving My Corporate Masters, was published in December 2000. He lives in Redmond, Washington.