Elangeni Online Services

Search Engine Features Chart

This chart is correct as of Nov. 5, 1997. Open Text is no longer listed, as it no longer crawls much of the web or has the strategic alliances it once had. Northern Light has been added, but information on this new search engine is still being compiled.

Most search engine comparison charts are made for search engine users. The search engine features chart below is designed primarily for webmasters who care about how search engines index their sites. It provides a summary of important factors and features that can affect how a site is indexed.

Although designed for webmasters, search engine users will also find portions of the search engine comparison chart useful in determining how fresh and complete the different search engines are.

Please note that in a few places on the chart, a dash (-) is used to denote unknown or unresearched answers.

Size of the Search Engine

The larger a search engine is, in terms of pages indexed, the more likely it is to include pages from your web site. Because the actual numbers can be misleading, as explained below, search engines are categorized as big, medium or small.

Expect to find most of your pages in a big search engine, some to many of your pages in a medium search engine and few or none of your pages in a small search engine. Why might a page not be included? See the section about depth, below.

The figures shown are the last reported to me or reported elsewhere. Take them with a grain of salt. Some search engines may accidentally keep two copies of a web page but not take duplication into account when quoting numbers. There are also other factors that make comparisons difficult.

Pages Crawled Per Day

This shows how many pages a search engine can index per day. The more it can crawl, the more likely it can maintain a fresh index. However, this is not the only way to measure freshness. Search engines may learn how frequently pages change, or use other methods, to make the most of a smaller crawling capacity.
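A rough way to relate crawl capacity to index size is to divide one by the other: the result is a lower bound on how many days a full recrawl would take. The Python sketch below does this with the size and crawl-rate estimates from the chart. It assumes, unrealistically, that every page is revisited equally often, and where the chart gives a range (Lycos: 6 to 10 million a day) it uses the upper crawl rate.

```python
# Lower bound on the days needed to recrawl a whole index, assuming (unrealistically)
# that every page is revisited equally often. Figures are the chart's estimates,
# in millions of pages; where the chart gives a range, the upper crawl rate is used.
CHART_FIGURES = {
    # engine: (pages indexed, pages crawled per day)
    "AltaVista": (100, 10),
    "Excite": (55, 3),
    "HotBot": (80, 10),  # chart: "up to 10 million" per day
    "Lycos": (30, 10),   # chart: 6 to 10 million per day
}


def min_days_for_full_recrawl(pages_millions: float, per_day_millions: float) -> float:
    """Index size divided by daily crawl capacity."""
    return pages_millions / per_day_millions


for engine, (size, rate) in CHART_FIGURES.items():
    print(f"{engine}: at least {min_days_for_full_recrawl(size, rate):.1f} days")
```

Treat the output as a floor, not a forecast: duplicate copies, failed fetches and engines that revisit popular or frequently changing pages more often all break the uniform-revisit assumption.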

Freshness

The web is constantly changing, so it's easy for search engine listings to become out of date. How out of date varies widely: some listings may be only days old, while others may be months old, or older.

There are various reasons why this occurs. Some search engines "instantly" index any page submitted to them, as explained below. It takes longer for them to return and gather non-submitted pages. Search engines also may crawl the "popular" parts of the web more frequently than other portions.

Freshness shows the age of listings for each search engine, from best-case to worst-case scenarios.

Date

Some search engines show the date when a web page was added. This provides a clue as to how fresh or stale the search engine's listings may be. Kudos to these search engines. The others leave you guessing about freshness.

File date means that the date of the file is shown, rather than the date it was added to the index. For example, imagine you created a file on Aug. 1, 1997, and it was spidered on Sept. 1, 1997. A search engine showing the file date would list Aug. 1, not Sept. 1.

Submitted Pages

Ideally, a search engine will find your pages as it follows links while crawling the web. Realistically, your pages will appear much faster if you submit them directly to the engine. This shows how soon to expect a page you submitted to appear in the search engine's listings.

Non-Submitted Pages

Once a page has been submitted, a search engine will usually find other pages from the site by following links from it. However, some engines take longer to gather these "non-submitted" pages, often because they "instantly" index a submitted page and then add the rest of the site to the schedule for future crawling.

The chart shows how soon to expect other pages from your site to appear once you've submitted a single page, assuming nothing prevents the engine from finding them, such as frames or image maps, as explained below.

Depth

This is closely related to non-submitted pages. It indicates how many pages beyond the submitted page a search engine will gather. Search engines operate in one of two ways:

No Limit: These search engines will diligently try to gather everything they find at a web site. They may not get every page, but that remains the general goal.

Sample: These search engines gather a sample of web pages from a web site. Some gather a bigger sample than others. Use the size listed as a guide to how large a sample you can expect each search engine to have gathered. Usually, the more popular a site is, the more likely it will be better represented in the search engine.

Keep in mind that part of the web remains unindexed due to physical hurdles. Frames, image maps and dynamically generated pages can all cause information to be missed.

Frames Support

Can the search engine follow frame links? If it can't, the search engine is probably missing much of your site.

Image Maps

Can the search engine follow client-side image maps? As with frames, if the search engine cannot follow image maps, it is probably missing much of your site.

Password Protected Sites

Some search engines can enter a password protected site, if you arrange for them to have a user name and password. Why do this? You may want people to discover you have content that matches their query. They'll still need to fill out the appropriate registration information at your site to access it, but at least they'll know it exists.

Link Popularity

All search engines can determine the popularity of a page by analyzing how many links there are to it from other pages. Some engines use this as a means to determine which pages they will include in the index.
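At its simplest, link popularity is a count of how many other pages link to a page. The Python sketch below counts inbound links over a small, made-up crawl. The input format (a dict mapping each page to the pages it links to) is an assumption for illustration; the engines did not publish how they measured popularity.

```python
from collections import Counter


def inbound_link_counts(links: dict[str, list[str]]) -> Counter:
    """Count how many distinct other pages link to each page.

    `links` maps each crawled page to the pages it links to (a hypothetical format).
    Self-links and repeated links from the same page are ignored.
    """
    counts: Counter = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


crawl = {
    "a.html": ["b.html", "c.html", "c.html"],  # the second link to c.html is not counted
    "b.html": ["c.html"],
    "c.html": ["c.html", "a.html"],  # the self-link is not counted
}
print(inbound_link_counts(crawl))
```

An engine that uses popularity to decide which pages to include might, hypothetically, keep only pages whose count meets some threshold.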

Learns Frequency

A number of search engines can learn how often your pages change. A site that changes often will be visited more often. Those that change infrequently get infrequent visits.
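One plausible way to learn how often a page changes is to compare a fingerprint of each fetch with the previous one and adjust the revisit interval. The Python sketch below halves the interval when the page has changed and doubles it when it hasn't. The halving/doubling rule, the SHA-1 fingerprint and the 1-to-90-day bounds are all assumptions, not any engine's documented behavior.

```python
import hashlib

MIN_DAYS = 1.0   # hypothetical shortest revisit interval
MAX_DAYS = 90.0  # hypothetical longest revisit interval


def fingerprint(content: bytes) -> str:
    """Keep a short digest of the page instead of the whole page."""
    return hashlib.sha1(content).hexdigest()


def schedule_next_visit(interval_days: float, old_fingerprint: str,
                        new_content: bytes) -> tuple[float, str]:
    """Halve the interval if the page changed since the last visit, else double it."""
    new_fingerprint = fingerprint(new_content)
    if new_fingerprint != old_fingerprint:
        interval_days /= 2
    else:
        interval_days *= 2
    return max(MIN_DAYS, min(MAX_DAYS, interval_days)), new_fingerprint


interval, fp = 8.0, fingerprint(b"<html>version 1</html>")
interval, fp = schedule_next_visit(interval, fp, b"<html>version 2</html>")  # changed: 4.0
interval, fp = schedule_next_visit(interval, fp, b"<html>version 2</html>")  # unchanged: 8.0
print(interval)
```

Pages that keep changing converge toward frequent visits, and static pages drift toward the longest interval, which matches the behavior described above.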

Keep Out

This indicates how you can tell the search engines to keep out of your site. All of the major engines respect the robots.txt exclusion standard, which tells them not to index a site or parts of a site. Some also support the meta robots tag, which can tell a crawler "noindex" for a particular page. For more information about robots.txt, see the Robots Exclusion Standard page at http://info.webcrawler.com/mak/projects/robots/exclusion.html
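Python's standard library includes a robots.txt parser, urllib.robotparser, which makes it easy to check what a given set of rules allows. The sketch below parses a hypothetical robots.txt for www.example.com that keeps every crawler out of /private/ and keeps Scooter (AltaVista's crawler, per the chart) out of the whole site. The per-page alternative is a tag such as <meta name="robots" content="noindex"> in the page's head.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: all crawlers stay out of /private/,
# and Scooter stays out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Scooter
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent, url in [
    ("Scooter", "http://www.example.com/index.html"),
    ("Slurp", "http://www.example.com/index.html"),
    ("Slurp", "http://www.example.com/private/report.html"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

can_fetch returns False, True and False for the three checks: Scooter matches its own record, while Slurp falls through to the catch-all * record.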

Redirection

Some sites redirect visitors from one web address to another.

The chart shows which URL is associated with your listing, if you perform redirection. This is important, because if the search engine indexes the redirected page, you could have a problem with visitors locating it should it be moved or changed at a later date.

Stop Words

Some search engines leave out certain words when they index a page, or ignore them during a query. These "stop words" are excluded to save storage space or to speed up searches.

For the webmaster, it's important to consider stop words when crafting your pages. For example, AltaVista will ignore the word web in a search for web developer, so there's little sense in trying to improve your ranking under those keywords.
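Stop-word removal amounts to a filter applied to the words of a query (or of a page). In the Python sketch below, the stop list is hypothetical; it includes "web" only to mirror the AltaVista example above.

```python
# Hypothetical stop list; each engine kept its own, and none is reproduced here.
STOP_WORDS = {"a", "an", "and", "of", "the", "web"}


def query_terms(query: str) -> list[str]:
    """Lowercase the query, split it on whitespace and drop stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(query_terms("Web developer"))  # ['developer']
```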

Relevancy Boosters

All the search engines use the location and frequency of keywords on a web page as the basis for ranking pages in response to a query. The exact mechanism differs slightly for each engine.

In addition to location and frequency, some engines may give a page a relevancy boost based on link popularity or other factors. These boosts help a little, but they don't guarantee a place at the top. The most linked-to page on the web may still perform poorly if another page is more relevant to the particular query.
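The Python sketch below shows the general shape of location/frequency ranking with a small link-popularity boost. Every weight in it is hypothetical, since no engine published its formula; the point is only that a modest boost cannot make up for a page that barely mentions the query term.

```python
import re

# All weights are hypothetical; no engine published its ranking formula.
TITLE_WEIGHT = 3.0   # a keyword in the title counts more than one in the body
EARLY_WEIGHT = 2.0   # keywords near the top of the body count extra
EARLY_WORDS = 50     # size of the "top of the page" region, in words
LINK_BOOST = 0.1     # small boost per inbound link


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(keyword: str, title: str, body: str, inbound_links: int = 0) -> float:
    """Location/frequency score for one keyword, plus a small link-popularity boost."""
    keyword = keyword.lower()
    total = TITLE_WEIGHT * tokenize(title).count(keyword)
    for position, word in enumerate(tokenize(body)):
        if word == keyword:
            total += EARLY_WEIGHT if position < EARLY_WORDS else 1.0
    return total + LINK_BOOST * inbound_links


relevant = score("stamps", "Rare Stamps", "Stamps for collectors. We buy and sell stamps.")
popular = score("stamps", "Coin Shop", "Coins for collectors. One page on stamps.",
                inbound_links=20)
print(relevant > popular)  # True
```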

Spam Penalty

All major search engines penalize sites that attempt to "spam" the engines in order to improve their position. One common technique is "stacking" or "stuffing" words on a page. This is where a word is repeated many times in a row. There are a number of other techniques. I don't approve of them, so you won't find them listed here. In general, they don't work well, and they often make a page look stupid and unprofessional.

If the search engines spot a spamming technique, they may downgrade a page's ranking or exclude it from listings altogether. One easy way search engines discover such pages is through "spam narking," when people complain about pages using spam.
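Keyword stacking is easy to detect mechanically, which is one reason it works poorly. The Python sketch below finds the longest run of one word repeated back to back and flags the text when that run exceeds a hypothetical threshold; the engines' actual spam filters were not published.

```python
import re

MAX_RUN = 3  # hypothetical: more than three identical words in a row is suspicious


def longest_repeat_run(text: str) -> tuple[str, int]:
    """Return the word repeated most times back to back, and the length of that run."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    best_word, best_run, run = "", 0, 0
    for i, word in enumerate(words):
        run = run + 1 if i > 0 and word == words[i - 1] else 1
        if run > best_run:
            best_word, best_run = word, run
    return best_word, best_run


def looks_stacked(text: str) -> bool:
    """Flag text whose longest back-to-back repetition exceeds MAX_RUN."""
    return longest_repeat_run(text)[1] > MAX_RUN


print(looks_stacked("cheap cheap cheap cheap cheap flights"))  # True
print(looks_stacked("Cheap flights to Paris"))                 # False
```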

Meta Tag Support

Many believe all search engines acknowledge keywords and descriptions placed in meta tags. In reality, only some do. "Partial" indicates that the search engine indexes the text of the tags, but the tags do not control the description and have no special meaning attached to them.

Titles

This shows how the search engines generate a title for your listing.

Descriptions

This shows how the search engines generate a description for your listing.
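Several entries in the Description row of the chart read "Meta tag, or first N characters after <body> tag." The Python sketch below implements that rule with the standard library's html.parser. The 200-character default matches InfoSeek's entry in the chart (WebCrawler's is 275); details such as whitespace collapsing are assumptions rather than any engine's documented behavior.

```python
from html.parser import HTMLParser
from typing import Optional


class DescriptionParser(HTMLParser):
    """Collect the meta description and the text that follows the <body> tag.

    Script and style contents are not filtered out in this sketch.
    """

    def __init__(self) -> None:
        super().__init__()
        self.meta_description: Optional[str] = None
        self.in_body = False
        self.body_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.meta_description = attributes.get("content")
        elif tag == "body":
            self.in_body = True

    def handle_data(self, data):
        if self.in_body:
            self.body_text.append(data)


def make_description(html: str, limit: int = 200) -> str:
    """Use the meta description if present, else the first `limit` characters of body text."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    if parser.meta_description:
        return parser.meta_description.strip()
    body = " ".join(" ".join(parser.body_text).split())  # collapse runs of whitespace
    return body[:limit]
```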

Results At A Time

How many results you can display at one time. Defaults are shown in bold. Sometimes you may need to use a special power search page to change the default, but in most cases, you do not.

Display Options

Shows the different ways you can display results, with the default listed first. Most search engines let you view either page titles only or titles with a description.

URL Status Check

This shows whether you can determine if a web page has been indexed by the search engine. "Displays listing" means that you can easily search for a particular page and see exactly how it appears in the index. This is marked as "semi" for HotBot, as it is not so easy to specify a particular URL. "Reports if indexed" means that there is a URL status check form that will tell you if the page is in the index. However, you can't easily see the actual listing.

Site Removal

Sometimes web pages are removed or sites shifted to a new domain. Some search engines may continue to find the "old" pages unless certain measures are taken. These measures are noted on the chart for each search engine.

Crawler Name

Each search engine uses a "crawler" or "spider" agent to gather web pages. Most have nicknames, which often appear in the crawler's host name. You can tell if you've been visited by a crawler by checking your access logs for these names. In addition, spiders often report an agent name: instead of saying Mozilla, as the Netscape browser does, a spider reports its own name. For example, Excite's spider identifies itself as "Architext."
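Checking access logs for crawler visits can be scripted. The Python sketch below counts log lines that mention one of the crawler nicknames from the chart. The nickname-to-engine mapping comes from the chart, but the exact agent strings each crawler sends are an assumption, and the sample log lines are made up; plain substring matching can also miscount if a nickname happens to appear in a requested URL.

```python
from collections import Counter

# Crawler nicknames from the chart, lowercased. The exact agent strings
# each crawler sends may differ, so treat this mapping as an assumption.
CRAWLER_NAMES = {
    "scooter": "AltaVista",
    "architext": "Excite",
    "slurp": "HotBot",
    "sidewinder": "InfoSeek",
    "t-rex": "Lycos",
    "gulliver": "Northern Light",
    "spidey": "WebCrawler",
}


def crawler_visits(log_lines: list[str]) -> Counter:
    """Count log lines whose text mentions a known crawler nickname."""
    visits: Counter = Counter()
    for line in log_lines:
        lowered = line.lower()
        for nickname, engine in CRAWLER_NAMES.items():
            if nickname in lowered:
                visits[engine] += 1
                break  # count each line at most once
    return visits


# Made-up access-log lines; the agent strings are invented for illustration.
sample_log = [
    '10.0.0.1 - - [05/Nov/1997:10:00:00 -0500] "GET / HTTP/1.0" 200 1520 "-" "Scooter"',
    '10.0.0.2 - - [05/Nov/1997:10:05:00 -0500] "GET /a.html HTTP/1.0" 200 980 "-" "Architext Spider"',
    '10.0.0.3 - - [05/Nov/1997:10:07:00 -0500] "GET / HTTP/1.0" 200 1520 "-" "Mozilla/3.0 (Win95; I)"',
]
print(crawler_visits(sample_log))
```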

Indexes ALT Text / Comment Text

Shows if the search engine indexes ALT text associated with images or text in comment tags.

Stemming

Shows whether the search engine will also search for variations of a word based on its stem. For example, entering "swim" might also find "swims" and "swimming."
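Engines that stem use their own rules (a well-known published algorithm is Porter's), and their details were not disclosed. The Python sketch below is a deliberately crude suffix stripper, not any engine's method, that is just enough to make "swim," "swims" and "swimming" match.

```python
def crude_stem(word: str) -> str:
    """Strip one common suffix; a toy, not Porter's algorithm or any engine's stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three letters of stem.
        if not word.endswith(suffix) or len(word) - len(suffix) < 3:
            continue
        # Leave words such as "pass" and "glass" alone.
        if suffix == "s" and word.endswith("ss"):
            continue
        stem = word[: -len(suffix)]
        # Undo a doubled final consonant ("swimm" -> "swim"),
        # but not a doubled l or s ("fall", "pass").
        if suffix != "s" and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
            stem = stem[:-1]
        return stem
    return word


def stem_matches(query_word: str, page_words: list[str]) -> list[str]:
    """Return the page words that share the query word's stem."""
    target = crude_stem(query_word)
    return [w for w in page_words if crude_stem(w) == target]


print(stem_matches("swim", ["swims", "swimming", "swam", "swimmer"]))  # ['swims', 'swimming']
```

Irregular forms such as "swam" and derived words such as "swimmer" slip through a rule this simple, which is why real stemmers carry far larger rule sets.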

| Search Engine | AltaVista | Excite | HotBot | InfoSeek | Lycos | Northern Light | WebCrawler |
|---|---|---|---|---|---|---|---|
| Size (millions of pages) | Big (100) | Big (55) | Big (80) | Medium (30) | Medium (30) | Big (30 to 50) | Small (2) |
| Pages crawled per day | 10 million | 3 million | Up to 10 million | - | 6 to 10 million | - | - |
| Freshness | 1 day to 3 months | 1 to 3 weeks | 1 day to 2 weeks | Minutes to 2 months | 1 to 2 weeks | 2 weeks | Updated weekly |
| Date | Yes | No | File date | No | Yes (via detailed display) | File date | No |

Crawling
--- Factors that affect if and when a page is indexed ---

| Search Engine | AltaVista | Excite | HotBot | InfoSeek | Lycos | Northern Light | WebCrawler |
|---|---|---|---|---|---|---|---|
| Submitted pages | 1 day | 3 weeks | 1 to 2 days | Within minutes | 1 to 2 weeks | 2 weeks | 1 to 3 weeks |
| Non-submitted pages | 1 to 3 months | 3 weeks | 2 weeks | 1 to 2 months | 1 to 2 weeks | 2 weeks | Not added, in most cases |
| Depth | No limit | No limit | No limit | Sample | Sample | No limit | Sample |
| Frames support | No | No | No | Yes | Yes | - | No |
| Image maps | Yes | No | No | Yes | No | - | Yes |
| Password-protected sites | No | Yes | No | Yes | Yes | - | No |
| Link popularity | No | No | Yes | No | Yes | No | Yes |
| Learns frequency | Yes | No | Yes | Yes | No | Yes | No |
| Keep out | robots.txt | robots.txt (both in future) | Both | robots.txt | robots.txt | Both | Both |
| Redirection | Redirected URL used | Redirected URL used | - | Redirected URL used | - | - | Redirected URL used |

Ranking
--- Factors that affect how a page is ranked ---

| Search Engine | AltaVista | Excite | HotBot | InfoSeek | Lycos | Northern Light | WebCrawler |
|---|---|---|---|---|---|---|---|
| Stop words | Yes | Yes | Yes | No | Yes | - | No |
| Relevancy boosters | None | 3 or 4 star review | Keywords in meta tag | Keywords in meta tag | None | None | Keywords in titles, link popularity |
| Spam penalty | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

Display
--- Factors that affect how a page is listed ---

| Search Engine | AltaVista | Excite | HotBot | InfoSeek | Lycos | Northern Light | WebCrawler |
|---|---|---|---|---|---|---|---|
| Meta tag support | Yes | No | Yes | Yes | Partial | Partial | Yes |
| Title | Page title, otherwise "No title" | Page title, otherwise "Untitled" | Page title, otherwise URL | Page title, otherwise first line on page | Page title, otherwise first line on page | - | Page title, otherwise URL |
| Description | Meta tag, or first few lines on page | Sentences grouped by concept; most dominant sentences extracted | Meta tag, or first few lines on page | Meta tag, or first 200 characters after <body> tag | Created based on content | First 25 HTML words, including title | Meta tag, or first 275 characters after <body> tag |
| Results at a time | 10 | 10, 20, 30, 40, 50 | 10, 25, 50, 75, 100 | 10, 20 (titles only) | 5, 10, 15, 20, 30, 40, 50 | 25 | 10, 25, 100 |
| Display options | Standard, Compact, Text-Only | Summaries, Titles only, Sort by site | Full (4 lines), Brief (1 line), Titles only | Summaries, Titles only | Standard, Summary, Detailed | None | Titles only, Summaries |

Other

| Search Engine | AltaVista | Excite | HotBot | InfoSeek | Lycos | Northern Light | WebCrawler |
|---|---|---|---|---|---|---|---|
| URL status check | Displays listing | None | Semi-displays listing | Displays listing | Reports if indexed | None | Reports if indexed |
| Site removal | Remove pages and resubmit | Remove site or install robots.txt | Install robots.txt | Remove and resubmit site or install robots.txt | - | - | Remove page, resubmit using Dead URL form |
| Crawler name | Scooter | Architext Spider | Slurp the Web Hound | Sidewinder | T-Rex | Gulliver | Spidey |
| Indexes ALT text | Yes | No | No | Yes | Yes | - | Yes |
| Indexes comments | No | No | Yes | Yes | No | - | No |
| Stemming | No | No | No | Yes | Yes | - | No |


© This information is compliments of
Search Engine Watch
http://searchenginewatch.com/