Yahoo!’s TrustRank Approach To The Spam Problem
TrustRank is an attempt to counter the web spamming activities that threaten to deceive search engines' ranking algorithms.
It propagates trust among web pages in the same manner that PageRank propagates authority. Tests indicate, however, that combining trust and distrust values demotes spam sites more effectively than using trust values alone.
The Assumption
A link between two pages implies a conveyance of trust from the source page to the target page. Linking to a page is a vote of confidence from the source that the target can provide content of value to the user. The approach rests on the ideal that good sites point only to similarly good sites and will not knowingly refer people to spam sites. The trust that people place in these good sites is then propagated through the link structure of the web.
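The propagation idea above can be illustrated with a minimal sketch. It assumes trust spreads like a PageRank score that is restarted only at a hand-picked set of trusted seed pages. The function name `trust_rank`, the damping value of 0.85, the iteration count, and the toy link graph are all assumptions made for illustration, not details taken from Yahoo!'s implementation.

```python
def trust_rank(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from seed pages along outgoing links (illustrative sketch)."""
    # Collect every page that appears as a source or as a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Seed pages share the initial trust equally; all other pages start at zero.
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        # A fixed share of trust is always re-injected at the seeds.
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for source, targets in links.items():
            if not targets:
                continue
            # Each page splits its damped trust evenly among the pages it links to.
            share = damping * trust[source] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical toy graph: three good pages link to each other,
# and two spam pages link to each other and to a good page.
links = {
    "good_a": ["good_b"],
    "good_b": ["good_a", "good_c"],
    "good_c": ["good_a"],
    "spam_x": ["spam_y", "good_a"],
    "spam_y": ["spam_x"],
}
scores = trust_rank(links, seeds={"good_a"})
```

In this toy graph no good page links to a spam page, so the spam pages never receive any trust, while trust fades with distance from the seed among the good pages. Real link graphs are messier, which is why the choice of seed pages matters.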
There are various proposals for countering semantic cloaking. One proposal suggests comparing copies of a page from both the browser's perspective and the crawler's perspective. It may be necessary to get two or more copies from each side to detect cloaking. Another suggests a two-step process that would require fewer resources. The first step uses heuristics to filter out web pages that cannot be cloaking. All the pages that have not been eliminated go through the second step for inspection: features are extracted from about four copies, and a classifier determines whether semantic cloaking is being done. However, no ideal solution has yet been found to effectively curb semantic cloaking. It is a technique that should not be practiced by anyone who wants to maintain good business ethics, and it continues to undermine the search engines' attempts to provide users with the information they actually need.
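The two-step process described above might be sketched as follows. Everything specific here is an assumption for illustration: the term-set comparison used as the step-one heuristic, the two features extracted from the four copies, and the fixed threshold that stands in for a trained classifier. The function names are hypothetical as well.

```python
def terms(text):
    """Reduce a page copy to its set of lowercase terms."""
    return set(text.lower().split())


def may_cloak(browser_copy, crawler_copy):
    """Step 1: cheap filter. Copies with identical term sets cannot be cloaking."""
    return terms(browser_copy) != terms(crawler_copy)


def extract_features(browser_copies, crawler_copies):
    """Step 2a: compare two copies per side (four in total)."""
    b1, b2 = (terms(c) for c in browser_copies)
    c1, c2 = (terms(c) for c in crawler_copies)
    # Terms shown consistently to one side and never to the other.
    # Ordinary dynamic content changes between copies on the same side,
    # so it does not survive the intersection.
    crawler_only = (c1 & c2) - (b1 | b2)
    browser_only = (b1 & b2) - (c1 | c2)
    return {"crawler_only": len(crawler_only), "browser_only": len(browser_only)}


def looks_cloaked(features, threshold=3):
    """Step 2b: stand-in for a trained classifier (a simple threshold rule)."""
    return features["crawler_only"] >= threshold


def detect(browser_copies, crawler_copies):
    """Run the filter first, and the costlier inspection only on survivors."""
    if not may_cloak(browser_copies[0], crawler_copies[0]):
        return False
    return looks_cloaked(extract_features(browser_copies, crawler_copies))


# Hypothetical page copies for illustration.
store_browser = ["welcome to our shoe store", "welcome to our shoe store today"]
store_crawler = ["welcome to our shoe store cheap pills casino loans"] * 2
news_browser = ["news headline alpha", "news headline beta"]
news_crawler = ["news headline gamma", "news headline delta"]

cloaked = detect(store_browser, store_crawler)
dynamic = detect(news_browser, news_crawler)
```

Fetching two copies per side is what lets the sketch separate a page that merely changes on every request from one that consistently shows the crawler something the browser never sees.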






