Web robot detection based on pattern-matching technique

  • Shinil Kwon
  • Young Gab Kim*
  • Sungdeok Cha
  • *Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    25 Citations (Scopus)

    Abstract

    In web robot detection, it is important to find features that capture characteristics common to diverse robots, in order to differentiate robots from humans. Existing approaches employ fairly simple features (e.g. an empty referrer field or the interval between successive requests), which often fail to reflect web robots' behaviour accurately, so false alarms may occur unacceptably often. In this paper we propose a fresh approach that expresses the behaviour of interactive users and various web robots as sequences of request types, called request patterns. Previous proposals have primarily targeted the detection of text crawlers, but our approach also works well on many other web robots, such as image crawlers, email collectors and link checkers. In an empirical evaluation on more than 1 billion requests collected at www.microsoft.com, our approach achieved 94% accuracy in web robot detection, as measured by F-measure. A decision tree algorithm proposed by Tan and Kumar was also applied to the same data. The comparison shows that the proposed approach is more accurate, and that real-time detection of web robots is feasible.
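    The abstract does not list the request types or the patterns used in the paper, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a hypothetical request-type alphabet (H for pages, I for images, S for stylesheets and scripts, D for HEAD requests, O for anything else), hypothetical robot patterns written as regular expressions over that alphabet, and the standard F-measure (harmonic mean of precision and recall) for scoring detection. None of these names or thresholds come from the paper.

```python
import re
from os.path import splitext
from urllib.parse import urlparse

# Hypothetical request-type alphabet (NOT the paper's actual taxonomy).
EXTENSION_TYPES = {
    "": "H", ".html": "H", ".htm": "H", ".aspx": "H",
    ".gif": "I", ".jpg": "I", ".jpeg": "I", ".png": "I",
    ".css": "S", ".js": "S",
}

# Hypothetical robot signatures: a whole session must match one of these.
ROBOT_PATTERNS = {
    "text crawler": re.compile(r"H{5,}"),   # many pages, no embedded resources
    "image crawler": re.compile(r"I{5,}"),  # many images, no pages
    "link checker": re.compile(r"D{3,}"),   # repeated HEAD requests
}


def request_type(method: str, url: str) -> str:
    """Map a single request to a one-letter request-type symbol."""
    if method.upper() == "HEAD":
        return "D"
    ext = splitext(urlparse(url).path)[1].lower()
    return EXTENSION_TYPES.get(ext, "O")


def request_pattern(session):
    """Encode a session, a list of (method, url) pairs, as a symbol string."""
    return "".join(request_type(method, url) for method, url in session)


def classify(session) -> str:
    """Return the first robot label whose pattern matches the whole session."""
    pattern = request_pattern(session)
    for label, regex in ROBOT_PATTERNS.items():
        if regex.fullmatch(pattern):
            return label
    return "human"


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    browser = [("GET", "/index.html"), ("GET", "/style.css"),
               ("GET", "/logo.png"), ("GET", "/about.html")]
    crawler = [("GET", f"/page{i}.html") for i in range(6)]
    print(request_pattern(browser), classify(browser))
    print(request_pattern(crawler), classify(crawler))
    print(round(f_measure(tp=90, fp=5, fn=10), 4))
```

    Matching the whole session with `fullmatch` keeps a human visit that happens to contain a few consecutive page requests from being flagged; a real detector would need richer patterns (and would have to handle email collectors, which this sketch does not model).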

    Original language: English
    Pages (from-to): 118-126
    Number of pages: 9
    Journal: Journal of Information Science
    Volume: 38
    Issue number: 2
    Publication status: Published - 2012 Apr

    Bibliographical note

    Funding Information:
    The authors would like to thank the Microsoft Corporation, and MSRA UR in particular, for its generous support, without which the research reported in this paper could not have been performed. MSRA provided us with raw data as well as a research grant. This research was supported by the National IT Industry Promotion Agency (NIPA) under the programme of Software Engineering Technologies Development.

    Keywords

    • human pattern
    • pattern analysis
    • web robot detection
    • web robot pattern

    ASJC Scopus subject areas

    • Information Systems
    • Library and Information Sciences
