Learning to detect malicious URLs

Abstract

Malicious Web sites are a cornerstone of Internet criminal activities. The dangers of these sites have created a demand for safeguards that protect end-users from visiting them. This article explores how to detect malicious Web sites from the lexical and host-based features of their URLs. We show that this problem lends itself naturally to modern algorithms for online learning. Online algorithms not only process large numbers of URLs more efficiently than batch algorithms, but they also adapt more quickly to new features in the continuously evolving distribution of malicious URLs. We develop a real-time system for gathering URL features and pair it with a real-time feed of labeled URLs from a large Web mail provider. From these features and labels, we are able to train an online classifier that detects malicious Web sites with 99% accuracy over a balanced dataset.
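
The idea of learning online from lexical URL features can be illustrated with a minimal sketch. It is not the authors' system: the tokenization rule, the `lexical_features` helper, the `OnlinePerceptron` class, and the example URLs are all assumptions made for illustration, host-based features are omitted, and a plain perceptron stands in for whichever online algorithm is preferred.

```python
import re


def lexical_features(url):
    """Turn a URL into sparse bag-of-words features (hypothetical scheme)."""
    # Assumption: split on common URL delimiters and keep non-empty tokens.
    tokens = [t for t in re.split(r"[/.?=\-_&:]+", url.lower()) if t]
    feats = {"tok=" + t: 1.0 for t in tokens}
    feats["bias"] = 1.0  # constant feature so the model can learn an offset
    return feats


class OnlinePerceptron:
    """Mistake-driven linear classifier over sparse features.

    Labels are +1 (malicious) and -1 (benign). Weights live in a dict,
    so tokens never seen before can be added as the stream evolves.
    """

    def __init__(self):
        self.w = {}

    def score(self, feats):
        return sum(self.w.get(k, 0.0) * v for k, v in feats.items())

    def predict(self, feats):
        return 1 if self.score(feats) > 0 else -1

    def update(self, feats, label):
        # Adjust weights only when the current prediction is wrong.
        if self.predict(feats) != label:
            for k, v in feats.items():
                self.w[k] = self.w.get(k, 0.0) + label * v


# Hypothetical labeled stream: each URL is processed once, in arrival order.
stream = [
    ("http://paypal.com.secure-login.example/verify", 1),
    ("http://example.org/docs/index.html", -1),
]

model = OnlinePerceptron()
for url, label in stream:
    model.update(lexical_features(url), label)
```

Because each example triggers at most one sparse update and is then discarded, the model never needs the full dataset in memory, and a token that first appears late in the stream simply gains a new weight entry. This is the property the abstract refers to when it contrasts online algorithms with batch ones.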

Document Details

Document Type: Pub Defense Publication
Publication Date: Apr 01, 2011
Source ID: 10.1145/1961189.1961202

Entities

People

Geoffrey M. Voelker
Justin Ma
Lawrence K. Saul
Stefan Savage

Organizations

National Science Foundation
Office of Naval Research
University of California, Berkeley
University of California, San Diego
