Google Indexing Versus Caching
By: Nidhi Gupta
The crucial difference between indexing and caching is that indexing means making something searchable and caching means reprinting content. Google’s library scanning program makes things searchable in Google Print but does not reprint them.
Indexing the words on a web page isn’t much different from indexing the words on a printed page. If you publish a site, Google reads the whole site into its cache and then lets you find things in it. Generally people who publish sites know this, and want Google to do this.

Google’s index and its cache are two different things, and it’s critical, absolutely critical, that they not be confused like this.
When any search engine visits a web page, it effectively makes a copy of that page, which is stored in the index. But the index literally breaks apart the page. It stores where words were located, whether they were in bold, what other words they were near, whether they were in a hyperlink and so on.
Nothing in the index is anything you as a human being could read. The index may be described as a “big book of the web,” but it’s not, really. It’s more like a giant spreadsheet, where all the words of a page are in one row, each word in a different column, then the next page in the row below that, and so on.
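To make the "spreadsheet" picture concrete, here is a minimal Python sketch of what such an index row might look like. It is only an illustration, not how Google or any other engine actually stores its index. The names `WordRecorder`, `index_page`, `build_index` and `search` are made up for this example, and the only signals recorded are the ones mentioned above: word position, bold, and whether the word sits inside a hyperlink.

```python
from html.parser import HTMLParser
import re


class WordRecorder(HTMLParser):
    """Breaks a page into word records instead of keeping readable text."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # tags currently open around the text
        self.records = []    # one record per word, in page order

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Drop the most recent matching open tag, if there is one.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        for word in re.findall(r"[A-Za-z0-9']+", data):
            self.records.append({
                "word": word.lower(),
                "position": len(self.records),
                "bold": any(t in ("b", "strong") for t in self.open_tags),
                "in_link": "a" in self.open_tags,
            })


def index_page(html):
    """Return the list of word records for one page."""
    recorder = WordRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.records


def build_index(pages):
    """One 'row' per page: each URL maps to its list of word records."""
    return {url: index_page(html) for url, html in pages.items()}


def search(index, word):
    """Return (url, position) pairs for every occurrence of a word."""
    word = word.lower()
    return [(url, rec["position"])
            for url, row in index.items()
            for rec in row
            if rec["word"] == word]
```

Calling `build_index` on a dictionary of URL to HTML gives you rows of word records that can be searched but not read as a page, which is the point of the comparison: the index answers "where does this word appear?" rather than reproducing the content.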

Aside from the index, Google, Yahoo, MSN and Ask Jeeves also make “cached” copies of pages available. You can see a copy of the exact page the search engine spidered. These cached pages are kept separate from the index. They are useful when a page is down, or when a copyright holder wants to see if someone has stolen and cloaked their content to feed to a spider. But the legality of showing such cached pages is also in question. No one to date has challenged them in court. The reason seems to be that Google, which mainstreamed cached copies, lets site owners opt out of caching if they want.
All major search engines also let you opt out of being in their indexes, which is a completely different thing and another reason why the index shouldn’t be confused with the cache. To take Google as an example, you can:
- Have your page listed in the index (available to be found through searches) and have your page available as a cached copy
- Have your page listed in the index but not cached
- Have your page NOT listed in the index and thus also not cached.
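The article doesn't say how a site owner makes these choices. One common mechanism is the robots meta tag, whose `noindex` and `noarchive` values ask a crawler to leave a page out of the index or to skip the cached copy. The Python sketch below maps the three options above onto that tag. The function name `robots_meta` and its parameters are hypothetical, chosen only for this illustration.

```python
def robots_meta(indexed=True, cached=True):
    """Return the robots meta tag for one of the three options above.

    Hypothetical helper for illustration; it returns an empty string
    when no tag is needed.
    """
    if not indexed:
        # Not listed in the index, and thus also not cached.
        return '<meta name="robots" content="noindex">'
    if not cached:
        # Listed in the index, but no cached copy is shown.
        return '<meta name="robots" content="noarchive">'
    # Indexed and cached is the default behaviour, so no tag is required.
    return ""
```

For example, `robots_meta(indexed=True, cached=False)` returns the `noarchive` tag, which would go in the page's `<head>`.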
The ability to opt out of the index is another reason why we really haven’t had a major search engine sued over web search indexing. In addition, site owners generally want to be indexed, so they can get traffic. In fact, the reason so many are upset over the current indexing update at Google is that they feel changes are causing them to lose traffic. But whether it is LEGAL to do this type of indexing (as opposed to caching) still really hasn’t been tested.
So indexing and caching are NOT the same. Dave writes:






