On Mining Web Access Logs

Abstract

The proliferation of information on the World Wide Web has made the personalization of this information space a necessity. One possible approach to web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. In the absence of any a priori knowledge, unsupervised classification or clustering methods seem ideally suited to analyzing the semi-structured log data of user accesses. In this paper, we define the notion of a user session, as well as a dissimilarity measure between two web sessions that captures the organization of a web site. To extract user access profiles, we cluster the user sessions based on their pairwise dissimilarities using a robust fuzzy clustering algorithm that we have developed. We report the results of experiments with our algorithm and show that it extracts interesting user profiles. We also show that it outperforms association-rule-based approaches for this task.

Document Details

Document Type
Technical Report
Publication Date
May 01, 2000
Accession Number
ADA461525

Entities

People

  • Anupam Joshi
  • Raghu Krishnapuram

Organizations

  • University of Maryland, Baltimore County

Tags

Communities of Interest

  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Abstracts
  • Access Time
  • Algorithms
  • Clustering
  • Computations
  • Computer Science
  • Data Mining
  • Data Sets
  • Electrical Engineering
  • Engineering
  • Hierarchies
  • Information Operations
  • Information Retrieval
  • Military Research
  • Network Protocols
  • Websites
  • World Wide Web

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Artificial Intelligence
  • Cybersecurity

Technology Areas

  • Space