The TOPHITS Model for Higher-order Web Link Analysis

Abstract

As the size of the web increases, it becomes more and more important to analyze link structure while also considering context. Multilinear algebra provides a novel tool for incorporating anchor text and other information into the authority computation used by link analysis methods such as HITS. Our recently proposed TOPHITS method uses the PARAFAC model, a higher-order analogue of the matrix singular value decomposition, to analyze a three-way representation of web data. We compute hubs and authorities together with the terms that are used in the anchor text of the links between them. Adding a third dimension to the data greatly extends the applicability of HITS because the TOPHITS analysis can be performed in advance and offline. Like HITS, the TOPHITS model reveals latent groupings of pages, but TOPHITS also includes latent term information. In this paper, we describe a faster mathematical algorithm for computing the TOPHITS model on sparse data, and we use web data to compare HITS and TOPHITS. We also discuss how the TOPHITS model can be used in queries, such as computing context-sensitive authorities and hubs. We describe different query response methodologies and present experimental results.
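
To make the three-way idea concrete, the following is a minimal sketch of a rank-R PARAFAC fit by plain alternating least squares on a tiny dense page x page x term array. It is not the faster sparse algorithm described in the paper. The toy links, the binary tensor entries, and the names `khatri_rao`, `unfold`, and `cp_als` are assumptions made for illustration only.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row index is j * K + k."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def unfold(X, mode):
    """Mode-n unfolding with the remaining axes kept in their original order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, n_iter=100, seed=0):
    """Fit X ~ sum_r w[r] * h_r (outer) a_r (outer) t_r by dense ALS."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            # Gram matrix of the Khatri-Rao product, computed cheaply
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(X, mode) @ kr @ np.linalg.pinv(gram)
    # Move the column norms into a weight vector, as in an SVD
    weights = np.ones(rank)
    for m in range(3):
        norms = np.linalg.norm(factors[m], axis=0)
        norms[norms == 0] = 1.0
        factors[m] = factors[m] / norms
        weights = weights * norms
    return weights, factors

# Hypothetical toy data: X[i, j, k] = 1 if page i links to page j
# with anchor term k (binary entries are an assumption of this sketch).
X = np.zeros((5, 5, 3))
X[0, 2, 0] = 1.0  # pages 0 and 1 link to page 2 using term 0
X[1, 2, 0] = 1.0
X[3, 4, 1] = 1.0  # page 3 links to page 4 using terms 1 and 2
X[3, 4, 2] = 1.0

weights, (hubs, authorities, terms) = cp_als(X, rank=2)

# Rebuild the tensor from the factors and measure the relative fit error
approx = np.einsum("r,ir,jr,kr->ijk", weights, hubs, authorities, terms)
rel_err = np.linalg.norm(X - approx) / np.linalg.norm(X)
print("weights:", weights)
print("relative error:", rel_err)
```

Each column triple (`hubs[:, r]`, `authorities[:, r]`, `terms[:, r]`) plays the role of one latent grouping: hub scores, authority scores, and the anchor terms associated with them. A dense array is used here only to keep the sketch short; web-scale data would need a sparse representation.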

Publication
In Proceedings of Link Analysis, Counterterrorism and Security 2006
Date
2006-04-22
Citation
T. Kolda, B. Bader. The TOPHITS Model for Higher-order Web Link Analysis. In Proceedings of Link Analysis, Counterterrorism and Security 2006, Sixth SIAM International Conference on Data Mining, SDM06, Bethesda, MD (2006-04-22), 2006. http://www.siam.org/meetings/sdm06/workproceed/Link Analysis/21Tamara_Kolda_SIAMLACS.pdf

Keywords

PARAFAC, multilinear algebra, link analysis, higher-order SVD

BibTeX

@inproceedings{KoBa06,  
author = {Tamara Kolda and Brett Bader}, 
title = {The {TOPHITS} Model for Higher-order Web Link Analysis}, 
booktitle = {Proceedings of Link Analysis, Counterterrorism and Security 2006}, 
eventtitle = {Sixth SIAM International Conference on Data Mining, SDM06},
venue = {Bethesda, MD},
eventdate = {2006-04-22}, 
year = {2006},	
url = {http://www.siam.org/meetings/sdm06/workproceed/Link Analysis/21Tamara_Kolda_SIAMLACS.pdf},
}