Search engines are the key to finding specific information on the vast expanse of the World Wide Web.
Without sophisticated search engines, it would be virtually impossible
to locate anything on the Web unless you already knew its specific URL. But do you know how search engines work? And do you know what makes some search engines more effective than others?
When people use the term search engine in relation to the Web, they
are usually referring to the actual search forms that search through
databases of HTML documents initially gathered by a robot.
There are basically three types of search engines: those that are powered by robots (called crawlers, ants, or spiders), those that are powered by human submissions, and those that are a hybrid of the two.
Crawler-based search engines use automated software agents (called crawlers) that visit a Web site, read the information on the site itself, read the site's meta tags,
and follow the links that the site connects to, indexing
all linked Web sites as well. The crawler returns all that
information to a central depository, where the data is indexed. The
crawler will periodically return to the sites to check for any
information that has changed. The frequency with which this happens is
determined by the administrators of the search engine.
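
The crawl loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PAGES dictionary is a made-up, in-memory stand-in for the Web (a real crawler would fetch pages over HTTP and revisit them on a schedule), and the names PageParser, crawl, and depository are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "Web": URL -> HTML. A real crawler would fetch over HTTP.
PAGES = {
    "http://a.example/": (
        '<html><head><meta name="keywords" content="search, engines"></head>'
        '<body>Home <a href="http://b.example/">B</a></body></html>'
    ),
    "http://b.example/": (
        '<html><body>About crawlers <a href="http://a.example/">A</a> '
        '<a href="http://c.example/">C</a></body></html>'
    ),
    "http://c.example/": "<html><body>Leaf page</body></html>",
}


class PageParser(HTMLParser):
    """Collects visible text, meta tags, and outgoing links from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {}
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(start_url, pages):
    """Visit start_url, follow its links breadth-first, and return what was read."""
    depository = {}              # the "central depository" of gathered data
    queue = deque([start_url])
    seen = {start_url}           # avoid visiting the same URL twice
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:         # unreachable page: nothing to record
            continue
        parser = PageParser()
        parser.feed(html)
        depository[url] = {
            "text": " ".join(parser.text),
            "meta": parser.meta,
            "links": parser.links,
        }
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return depository
```

Calling crawl("http://a.example/", PAGES) starts at one page and reaches the other two only by following links, which is why a page that nothing links to may never be found by a crawler.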
Human-powered search engines rely on humans to submit information
that is subsequently indexed and catalogued. Only information that is
submitted is put into the index.
In both cases, when you query a search engine to locate information,
you're actually searching through the index that the search engine has
created; you are not searching the Web itself. These indices are giant
databases of information that is collected, stored, and subsequently searched.
This explains why sometimes a search on a commercial search engine, such
as Yahoo! or Google, will return results that are, in fact, dead links.
Since the search results are based on the index, if the index hasn't
been updated since a Web page became invalid, the search engine still treats
the page as an active link even though it no longer is. It will
remain that way until the index is updated.
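
To make the dead-link effect concrete, here is a minimal sketch of an index that is built once and then queried after a page has disappeared. The data and the function names (build_index, search) are hypothetical, and real engines use far more elaborate index structures.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs that contain every word of the query, using only the index."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up snapshot of the Web at indexing time.
live_web = {
    "http://a.example/": "search engines index pages",
    "http://b.example/": "crawlers index the web",
}
index = build_index(live_web)

# The page is taken down AFTER the index was built.
del live_web["http://b.example/"]

hits = search(index, "index")
dead_links = {url for url in hits if url not in live_web}
```

The search never consults live_web, so the removed page is still returned until build_index is run again.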
So why will the same search on different search engines produce
different results? Part of the answer is that not
all indices are exactly the same. Each depends on what the
spiders found or what the humans submitted. But more important, not every
search engine uses the same algorithm
to search through the indices. The algorithm is what the search engines
use to determine the relevance of the information in the index to what
the user is searching for.
One of the elements that a search engine algorithm scans for is the
frequency and location of keywords on a Web page. Pages on which a keyword
appears more often are typically considered more relevant. But search engine
technology is becoming increasingly sophisticated in its attempts to discourage what
is known as keyword stuffing, or spamdexing.
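
As a rough illustration of scoring by keyword frequency and location, the sketch below counts a keyword in a page's title and body, weighting title hits more heavily. The weight, the cap on body hits (a crude stand-in for discouraging keyword stuffing), and the function name keyword_score are all assumptions made for this example; no actual search engine's formula is implied.

```python
import re


def keyword_score(title, body, keyword, title_weight=3.0, body_cap=5):
    """Score a page for one keyword by where and how often it appears.

    title_weight and body_cap are illustrative values, not real engine settings.
    """
    kw = keyword.lower()
    title_words = re.findall(r"[a-z0-9]+", title.lower())
    body_words = re.findall(r"[a-z0-9]+", body.lower())
    title_hits = title_words.count(kw)
    # Repeating a word beyond body_cap earns nothing extra.
    body_hits = min(body_words.count(kw), body_cap)
    return title_weight * title_hits + body_hits


honest = keyword_score("Search engines", "How search works", "search")
stuffed = keyword_score("Spam", "search " * 50, "search")
```

With these made-up settings, the page that repeats the keyword fifty times in its body scores only slightly higher than the page that uses it once in the title and once in the body.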
Another common element that algorithms analyze is the way that pages
link to other pages in the Web. By analyzing how pages link to each
other, an engine can both determine what a page is about (if the
keywords of the linked pages are similar to the keywords on the original
page) and whether that page is considered "important" and deserving of a
boost in ranking. Just as the technology is becoming increasingly
sophisticated at ignoring keyword stuffing, it is also becoming more savvy
about Webmasters who build artificial links into their sites in order to
inflate their ranking.
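
One way to picture how links can signal importance is a simplified PageRank-style iteration, sketched below. The article does not say which link-analysis method any engine uses, so treat this as an assumption chosen for illustration; the function name link_rank, the damping value, and the three-page link graph are hypothetical.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Estimate page importance from link structure.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score to all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Made-up link graph: "a" and "b" both link to "c"; "c" links back to "a".
ranks = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, "c" ends up with the highest score because two pages link to it, and "b" the lowest because nothing links to it.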