OPINION:
Desperately Seeking Sane Search Engines

Contributed by Adam Barr
osOpinion.com
June 8, 2001


A good friend of mine likes to rave about Google, the search engine. "It's great!" he says. "It always finds what I want."

I asked him what kind of searches he does. It turns out that he is usually looking for the Web site of some obscure organization whose name he already knows. In this situation, Google's method of using links as votes works better than the typical brute-force method.
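
To make "links as votes" concrete, here is a minimal sketch. It is a deliberate simplification, not Google's actual algorithm: every inbound link counts as exactly one vote, and the tiny Web in `LINKS`, the `.example` page names, and the function names are all invented for illustration.

```python
from collections import Counter

# Hypothetical miniature Web: each page maps to the pages it links to.
LINKS = {
    "fan-page.example": ["knitting-guild.example"],
    "blog.example": ["knitting-guild.example", "fan-page.example"],
    "directory.example": ["knitting-guild.example", "blog.example"],
    "knitting-guild.example": [],
}

def count_votes(links):
    """Count each distinct inbound link as one vote for the target page."""
    votes = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # a page cannot vote for itself
                votes[target] += 1
    return votes

def rank_by_votes(candidates, links):
    """Order candidate pages by vote count, most-linked first (ties by name)."""
    votes = count_votes(links)
    return sorted(candidates, key=lambda page: (-votes[page], page))

print(rank_by_votes(list(LINKS), LINKS))
```

The obscure organization's own site floats to the top because everyone who mentions it links to it, which is exactly the case my friend keeps hitting.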

But that's the simple case -- you know the organization you want to find and just don't know the URL of their site. What if your search is more complicated than that?

In my view, a typical search engine might be classified as 90 percent useless, and Google brings that figure down only to about 85 percent. The quality of the search results, on Google and elsewhere, is dismal.

Searching for Search Tips

A search on Amazon reveals at least a dozen books that are dedicated solely to teaching novices how to use search engines. That is, how to phrase your query so that the search engine software will actually look for what you want it to look for.

One of the reasons you need all those books is the fact that there are another dozen books out there on how to design a Web site so it is ranked highly by search engines. You've got people competing to trick the search engines, which makes searching for what you want even harder. This is progress?

Seek and You Shall Find What?

Note that in this discussion, I am dealing only with Web pages that have static text. I haven't even approached the bigger problem that is looming: many pages are now generated dynamically from a database. That data is either hidden from search engine Web crawlers, or else the crawlers grab whatever data they happen to be presented with when they visit the site and wind up indexing data that is no longer there.

The Holy Grail of Web searching is to be able to type in a question and get an answer. This is the only option that will make Web searching accessible to the non-technical among us.

There are sites out there that claim to answer questions. AskJeeves lets you type in a question. Sounds good, right? However, I suspect that it simply strips off most of the words in the question and focuses on the nouns. To test this, I asked Jeeves "how can I find a hotel in Atlanta, GA?", "hotel Atlanta," "when do try huh if via hotel in to if Atlanta?" and "what's a good crack hotel in Atlanta?"

In all four cases, the main two responses were the same: a way to book hotels in Atlanta and a Yellow Pages listing of hotels in Atlanta. The only difference was that the first question also elicited various other general hotel-related responses, which only leads me to believe that Jeeves is looking for stock phrases like "How can I find a hotel?"
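
My suspicion can be sketched in a few lines. This is only a guess at the behavior, not Ask Jeeves's real implementation: the `KNOWN_NOUNS` vocabulary and the `extract_keywords` function are hypothetical, and a real service would presumably use a far larger word list.

```python
import re

# Hypothetical vocabulary of nouns the engine "knows"; every other word is discarded.
KNOWN_NOUNS = {"hotel", "atlanta", "flight", "restaurant"}

def extract_keywords(question):
    """Lowercase the question, split it into words, and keep only known nouns."""
    words = re.findall(r"[a-z]+", question.lower())
    return sorted(set(words) & KNOWN_NOUNS)

QUESTIONS = [
    "how can I find a hotel in Atlanta, GA?",
    "hotel Atlanta",
    "when do try huh if via hotel in to if Atlanta?",
    "what's a good crack hotel in Atlanta?",
]

for question in QUESTIONS:
    print(extract_keywords(question))
```

All four questions reduce to the same two keywords, "atlanta" and "hotel", so an engine built this way would have no means of telling a sensible question from gibberish.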

Tied in Knots

I looked around AltaVista, another search engine, and found a page detailing why its search technology is so great. It listed some "technological wizardry" (AltaVista's term) that it uses, which turns out to be simply a bunch of tie-breakers for ordering pages:

  • First, pages where your phrase is in the title

  • Second, pages where the phrase is nearer to the beginning

  • Third, text-heavy pages over graphics

  • Fourth, pages that have a lot of links on them
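
The four tie-breakers above can be sketched as a single sort key. This is my own illustration, not AltaVista's code: the `Page` fields, the words-per-image measure of "text-heavy," and the function names are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Page:
    # All fields are hypothetical stand-ins for whatever a real index stores.
    url: str
    title: str
    text: str
    image_count: int
    link_count: int

def tie_break_key(page, phrase):
    """Build a sort key. Tuples compare left to right, so each entry only
    matters when every earlier entry is equal."""
    phrase = phrase.lower()
    position = page.text.lower().find(phrase)
    if position < 0:
        position = len(page.text)  # phrase absent from the body: sort last
    words_per_image = len(page.text.split()) / (page.image_count + 1)
    return (
        phrase not in page.title.lower(),  # 1. phrase in the title (False sorts first)
        position,                          # 2. phrase nearer the beginning
        -words_per_image,                  # 3. text-heavy over graphics-heavy
        -page.link_count,                  # 4. more links on the page
    )

def order_pages(pages, phrase):
    """Return the pages sorted by the four tie-breakers, best first."""
    return sorted(pages, key=lambda page: tie_break_key(page, phrase))
```

Nothing in the key looks at what a page means; it only measures where a string of characters happens to sit.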

None of these tests amounts to rocket science. More to the point, none of them deals with the main issue. The fundamental problem is that search engines have no way of understanding the content on a page.

Whatever tricks search engines use to decide how to order the pages they find, the finding of the pages itself is done by brute force: the engine looks for the words the user entered into the query line on every page that it knows about on the Web. It makes no attempt to throw out pages that fail to "answer" the question the user has "asked."
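
Here is a minimal sketch of that brute-force matching, under my own assumptions: the three pages in `PAGES` and the function name are made up, and a real engine presumably consults a prebuilt index rather than rescanning every page, though the matching criterion is the same in spirit.

```python
# Hypothetical miniature index: URL -> page text.
PAGES = {
    "hotels.example/atlanta": "Book a hotel in Atlanta tonight",
    "news.example/fire": "Fire closes Atlanta hotel for a week",
    "recipes.example/soup": "A hearty word soup for cold evenings",
}

def brute_force_search(query, pages):
    """Return every page that contains all of the query's words,
    checking the pages one at a time."""
    wanted = set(query.lower().split())
    hits = []
    for url, text in pages.items():
        if wanted <= set(text.lower().split()):
            hits.append(url)
    return hits

print(brute_force_search("hotel Atlanta", PAGES))
```

Both the booking page and the news story about a hotel fire match "hotel Atlanta," although only one of them helps somebody who wants a room.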

Word Soup

Yahoo! tries to get around this issue by having humans do the categorization, but that approach cannot keep up with the explosion of Web sites.

Microsoft Word has a somewhat amusing feature called AutoSummarize, which will try to condense a document down to as few as ten sentences. It may be a step in the right direction, but right now it is still too primitive to use.

Unfortunately, until search engines can start to understand what a Web page is saying, they will remain even more primitive than that.


Author's background:
Adam Barr worked at Microsoft for over 10 years before leaving in April 2000. His book about his time there, Proudly Serving My Corporate Masters, was published in December 2000. He lives in Redmond, Washington.
