TY - GEN
T1 - Fractional pagerank crawler
T2 - 14th International Conference on Database Systems for Advanced Applications, DASFAA 2009
AU - Alam, Md Hijbul
AU - Ha, JongWoo
AU - Lee, Sang-Geun
PY - 2009
Y1 - 2009
N2 - Crawling important pages early is a well studied problem. However, the availability of different types of framework for publishing web content greatly increases the number of web pages. Therefore, the crawler should be fast enough to prioritize and download the important pages. As the importance of a page is not known before or during its download, the crawler needs a great deal of time to approximate the importance to prioritize the download of the web pages. In this research, we propose Fractional PageRank crawlers that prioritize the downloaded pages for the purpose of discovering important URLs early during the crawl. Our experiments demonstrate that they improve the running time dramatically while crawling the important pages early.
AB - Crawling important pages early is a well studied problem. However, the availability of different types of framework for publishing web content greatly increases the number of web pages. Therefore, the crawler should be fast enough to prioritize and download the important pages. As the importance of a page is not known before or during its download, the crawler needs a great deal of time to approximate the importance to prioritize the download of the web pages. In this research, we propose Fractional PageRank crawlers that prioritize the downloaded pages for the purpose of discovering important URLs early during the crawl. Our experiments demonstrate that they improve the running time dramatically while crawling the important pages early.
UR - http://www.scopus.com/inward/record.url?scp=67650099403&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650099403&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-00887-0_52
DO - 10.1007/978-3-642-00887-0_52
M3 - Conference contribution
AN - SCOPUS:67650099403
SN - 9783642008863
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 590
EP - 594
BT - Database Systems for Advanced Applications - 14th International Conference, DASFAA 2009, Proceedings
Y2 - 21 April 2009 through 23 April 2009
ER -