Efficient Video Similarity Measurement and Search

Abstract

The amount of information on the world wide web has grown enormously since its creation in 1990. Duplication of content is inevitable because there is no central management on the web. Studies have shown that many similar versions of the same text documents can be found throughout the web. This redundancy problem is more severe for multimedia content such as web video sequences, as they are often stored in multiple locations and different formats to facilitate downloading and streaming. Similar versions of the same video can also be found, unknown to content creators, when web users modify and republish original content using video editing tools. Identifying similar content can benefit many web applications and content owners. For example, it will reduce the number of similar answers to a web search and identify inappropriate use of copyright content. In this dissertation, we present a system architecture and corresponding algorithms to efficiently measure, search, and organize similar video sequences found on any large database such as the web. We first introduce a class of randomized algorithms, called ViSig, to estimate video similarity. The basic idea is to summarize each video sequence into a small set of video frames, or a signature, that is most similar to a set of predefined random images. Theoretical and experimental results show that video similarity can be reliably estimated by the ViSig method. Even though a small signature is sufficient to estimate similarity, each frame in the signature is represented by a high-dimensional vector. Similarity search on a large database of high-dimensional vectors is a challenging problem from a computational viewpoint. To solve this problem, we propose a novel non-linear feature extraction technique that can be used in a fast similarity search system.
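The signature idea described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes that each frame is already represented as a feature vector, that the predefined random images are random "seed" vectors in the same feature space, and that two videos are compared by counting the seeds whose selected frames lie within a threshold of each other. The function names, the Euclidean distance, the threshold `eps`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def visig_signature(frames, seeds):
    """Pick, for every seed vector, the frame of the video closest to it.

    frames: array of shape (n_frames, dim), one feature vector per frame.
    seeds:  array of shape (n_seeds, dim), the shared random seed vectors.
    Returns an array of shape (n_seeds, dim): the signature.
    """
    # Pairwise Euclidean distances, shape (n_seeds, n_frames).
    dists = np.linalg.norm(seeds[:, None, :] - frames[None, :, :], axis=2)
    # Index of the nearest frame for each seed.
    nearest = np.argmin(dists, axis=1)
    return frames[nearest]


def estimate_similarity(sig_a, sig_b, eps):
    """Fraction of seeds whose two signature frames are within eps.

    This comparison rule is an assumption of the sketch, not a quote
    from the abstract.
    """
    d = np.linalg.norm(sig_a - sig_b, axis=1)
    return float(np.mean(d <= eps))


# Synthetic data (hypothetical): 16 seeds, 8-dimensional frame features.
rng = np.random.default_rng(0)
seeds = rng.random((16, 8))
video_a = rng.random((50, 8))        # features in [0, 1)
video_c = rng.random((50, 8)) + 5.0  # shifted far away from video_a

sig_a = visig_signature(video_a, seeds)
sig_c = visig_signature(video_c, seeds)

same = estimate_similarity(sig_a, sig_a, eps=0.1)       # identical signatures
different = estimate_similarity(sig_a, sig_c, eps=0.1)  # disjoint feature ranges
```

Because both videos are summarized against the same seeds, only the small signatures need to be stored and compared, not every pair of frames. Each signature frame is still a high-dimensional vector, which is the search cost the abstract goes on to address.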

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2002
Accession Number
ADA608331

Entities

People

  • Sen-ching Cheung

Organizations

  • University of California, Berkeley

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Computational Complexity
  • Computers
  • Data Science
  • Databases
  • Digital Media
  • Electrical Engineering
  • Factor Analysis
  • Feature Extraction
  • Information Retrieval
  • Information Science
  • Lists (Data Structures)
  • Statistical Algorithms
  • Statistics
  • Theses
  • Two Dimensional
  • World Wide Web

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Neural Network Machine Learning.
  • Snow Cover Descriptors for Reptiles and Their Illustrations.

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Learning Algorithms