Fractional pagerank crawler: Prioritizing URLs efficiently for crawling important pages early

Md Hijbul Alam, JongWoo Ha, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Crawling important pages early is a well-studied problem. However, the availability of different types of frameworks for publishing web content greatly increases the number of web pages. Therefore, the crawler should be fast enough to prioritize and download the important pages. As the importance of a page is not known before or during its download, the crawler needs a great deal of time to approximate the importance in order to prioritize the download of web pages. In this research, we propose Fractional PageRank crawlers that prioritize the downloaded pages for the purpose of discovering important URLs early during the crawl. Our experiments demonstrate that they improve the running time dramatically while crawling the important pages early.

Original language: English
Title of host publication: Database Systems for Advanced Applications - 14th International Conference, DASFAA 2009, Proceedings
Pages: 590-594
Number of pages: 5
Publication status: Published - 2009
Event: 14th International Conference on Database Systems for Advanced Applications, DASFAA 2009 - Brisbane, QLD, Australia
Duration: 2009 Apr 21 → 2009 Apr 23

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5463
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 14th International Conference on Database Systems for Advanced Applications, DASFAA 2009
Country/Territory: Australia
City: Brisbane, QLD
Period: 09/4/21 → 09/4/23

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
