Topic Sensitive TrustRank

I believe I found the paper on Topic Sensitive TrustRank [PDF] from Bill.

The thesis of the paper is that TrustRank is fundamentally flawed because it is biased toward topical communities that are overrepresented in the seed set of trusted sites. Topics that are overrepresented in seed sets are often commercial in nature and are also heavily targeted by search spammers. Thus overweighting those seeds may also overweight many spammy topics and spammy pages.

By using a directory such as DMOZ or the Yahoo! Directory to offer seed sites, and using those directory categories to categorize topic sensitive TrustRank scores, the belief is that overall relevancy can be improved while shifting the focus away from the overrepresented topics that occur in a smaller seed set.

Since using DMOZ or the Yahoo! Directory as a seed set would vastly increase the seed set size, it would be impractical to manually review all seeds. So you take the top half of trusted domains (as determined by topical TrustRank) from each topic to use as seeds. Weight each seed's voting power by its PageRank and let this topic sensitive TrustRank happily propagate through the web.
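Here is a rough Python sketch of the steps above as I read them, not the paper's actual implementation. The function names (biased_rank, topical_trustrank), the 0.85 damping factor, the fixed iteration count, and the choice to simply add the per-topic scores together are all my own assumptions. It also assumes every linked page appears as a key in the graph.

```python
def biased_rank(graph, seed_weights, damping=0.85, iterations=50):
    """PageRank whose random jumps land only on the weighted seeds.

    graph maps each page to the list of pages it links to; every link
    target must also be a key. damping and iterations are assumed values.
    """
    nodes = list(graph)
    total = sum(seed_weights.values())
    teleport = {n: seed_weights.get(n, 0.0) / total for n in nodes}
    score = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Trust is split evenly across a page's outlinks.
                share = damping * score[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # A page with no outlinks hands its trust back to the seeds.
                for m in nodes:
                    nxt[m] += damping * score[n] * teleport[m]
        score = nxt
    return score


def topical_trustrank(graph, topic_seeds):
    """Hypothetical helper: one trust run per topic, summed per page."""
    # Ordinary PageRank is the same walk with a uniform jump to every page.
    pagerank = biased_rank(graph, {n: 1.0 for n in graph})
    combined = {n: 0.0 for n in graph}
    for seeds in topic_seeds.values():
        # First pass: topical trust with all directory seeds weighted equally.
        first = biased_rank(graph, {s: 1.0 for s in seeds})
        # Keep the top half of the seeds by that topical trust score.
        ranked = sorted(seeds, key=lambda s: first[s], reverse=True)
        kept = ranked[:max(1, len(ranked) // 2)]
        # Second pass: each kept seed votes in proportion to its PageRank.
        scores = biased_rank(graph, {s: pagerank[s] for s in kept})
        for n, value in scores.items():
            combined[n] += value  # summing topics is an assumption
    return combined


# Made-up toy web: four linked pages plus an isolated two-page spam ring.
graph = {
    "a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"],
    "spam": ["spam2"], "spam2": ["spam"],
}
topic_seeds = {"news": ["a", "b"], "shopping": ["c", "d"]}
combined = topical_trustrank(graph, topic_seeds)
```

In this toy graph no seed links into the spam ring, so those two pages end up with no trust at all, while every topic gets the same total amount of trust to hand out regardless of how many seeds it started with.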

Published: May 15, 2006 by Aaron Wall


May 15, 2006 - 9:49pm

Hi Aaron,

I mistakenly linked to another paper twice in my post about a series of search papers out of Lehigh University, when I should have included that one.

So while I may have led you to the list of publications from Brian D. Davison and his Ph.D. students at Lehigh, you did identify the importance of that paper on your own.

One of the other papers that they will be presenting at the 15th annual conference on the WWW also takes a critical look at TrustRank - Propagating Trust and Distrust to Demote Web Spam (pdf). It critiques one of the other assumptions that TrustRank uses, which states that the more links there are on a trusted page, the less the pages being pointed to should be trusted - sort of a trust splitting and attenuation approach. It also explores using something like BadRank to create distrust scores which could be used in conjunction with a TrustRank approach.

I really liked their paper on cloaking (pdf) too, which will also be presented at the conference next week.

May 15, 2006 - 9:59pm

Yeah...not sure how I found that paper. I was reading your site...then somehow I was at that paper...then I went back to link to the post linking to that paper and couldn't find it ;)

Great stuff you are posting Bill. I wish I had a bit more time to read through all the great stuff you are digging up.

I printed out about 4 research papers today, but need to finish rewriting my book before I start reading more.

May 15, 2006 - 10:39pm

You possibly clicked through the link I had to Brian Davison's home page, and looked at his list of publications, from one of my posts last night. I've changed it so that I have a link to that paper now, instead of two links to their paper on "Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings."

Thanks for your kind words. I hear you on finding the time to keep up with all of this stuff and keep up with work, too.

