Learning to Detect Phishing Emails

Abstract

An increasing number of emails, commonly known as phishing emails, purport to be from a trusted entity and attempt to deceive users into providing account or identity information. Traditional spam filters do not adequately detect these undesirable emails, which causes problems for both consumers and businesses wishing to do business online. From a learning perspective, this is a challenging problem. At first glance, it appears to be a simple text classification problem, but the classification is confounded by the fact that the class of phishing emails is nearly identical to the class of real emails. We propose PILFER, a new method for detecting these malicious emails. By incorporating features specifically designed to highlight the deceptive methods used to fool users, we are able to accurately classify over 92% of phishing emails, while maintaining a false positive rate on the order of 0.1%. These results are obtained on a dataset of approximately 860 phishing emails and 6950 non-phishing emails. The accuracy of PILFER on this dataset is significantly better than that of SpamAssassin, a widely used spam filter.
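
To make the two headline figures concrete, the following minimal sketch shows how a detection rate and a false positive rate are computed from a classifier's confusion counts. The class name `ConfusionCounts` and the specific counts are hypothetical: they were chosen only to be consistent with the approximate dataset sizes and rates quoted in the abstract, and are not results taken from the report.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a binary phishing classifier (hypothetical helper)."""
    true_positives: int   # phishing emails flagged as phishing
    false_negatives: int  # phishing emails that were missed
    false_positives: int  # legitimate emails wrongly flagged
    true_negatives: int   # legitimate emails correctly passed

    def detection_rate(self) -> float:
        # Fraction of all phishing emails that were caught.
        return self.true_positives / (self.true_positives + self.false_negatives)

    def false_positive_rate(self) -> float:
        # Fraction of all legitimate emails that were wrongly flagged.
        return self.false_positives / (self.false_positives + self.true_negatives)


# Illustrative counts only (assumed, not reported): about 860 phishing
# and about 6950 non-phishing emails, as in the abstract.
counts = ConfusionCounts(
    true_positives=792,
    false_negatives=68,
    false_positives=7,
    true_negatives=6943,
)

print(f"detection rate:      {counts.detection_rate():.3f}")
print(f"false positive rate: {counts.false_positive_rate():.4f}")
```

Under these assumed counts, a false positive rate near 0.1% corresponds to only a handful of legitimate emails being flagged out of several thousand, which is why the two rates are reported separately rather than as a single accuracy figure.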

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 2006
Accession Number: ADA456046

Entities

People

  • Anthony Tomasic
  • Ian Fette
  • Norman Sadeh

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • Cyber

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Applied Computer Science
  • Artificial Intelligence Software
  • Bayesian Networks
  • Classification
  • Computer Science
  • Data Sets
  • Detection
  • Electronic Mail
  • Learning
  • Machine Learning
  • Network Protocols
  • Phishers
  • Standards
  • Supervised Machine Learning
  • Web Browsers

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Cybersecurity
  • Educational Psychology

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms