Script-Independent Text Line Segmentation in Freestyle Handwritten Documents

Abstract

Text line segmentation in freestyle handwritten documents remains an open problem in document analysis. Curvilinear text lines and small gaps between neighboring text lines challenge algorithms developed for machine-printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map in which each element represents the probability that the underlying pixel belongs to a text line. The level set method is then used to determine the boundaries of neighboring text lines by evolving an initial estimate. Unlike most connected-component-based methods, the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents in diverse scripts, such as Arabic, Chinese, Korean, and Hindi, demonstrate that our algorithm consistently outperforms previous methods. Further experiments show that the proposed algorithm is robust to scale change, rotation, and noise.
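
The probability-map step described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the abstract does not say which density estimator is used, so the Gaussian kernel, the wider smoothing along the horizontal axis, the parameter values, and the function names `gaussian_kernel` and `text_line_probability_map` are all assumptions made for illustration. The level set evolution is not shown; a fixed threshold stands in for it only to make the map visible as regions.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def text_line_probability_map(ink, sigma_x=8.0, sigma_y=2.0):
    """Smooth a binary ink mask (1 = ink) into a density map scaled to [0, 1].

    Assumption: anisotropic Gaussian smoothing, wider along x, so that
    characters on the same line merge while neighboring lines stay apart.
    The image must be at least as large as each kernel (6 * sigma + 1).
    """
    ink = np.asarray(ink, dtype=float)
    kx = gaussian_kernel(sigma_x)
    ky = gaussian_kernel(sigma_y)
    # Separable convolution: smooth every row, then every column.
    smoothed = np.apply_along_axis(
        lambda row: np.convolve(row, kx, mode="same"), 1, ink)
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, ky, mode="same"), 0, smoothed)
    peak = smoothed.max()
    return smoothed / peak if peak > 0 else smoothed

# Synthetic page: two horizontal strokes standing in for two text lines.
page = np.zeros((40, 100))
page[10, 10:90] = 1
page[30, 10:90] = 1

prob = text_line_probability_map(page)

# Stand-in for the level set step: a fixed threshold, NOT boundary evolution.
line_mask = prob > 0.5
```

In this toy input the map peaks along the two strokes and falls to zero midway between them, which is the kind of initial estimate that a boundary-evolution method could then refine.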

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2006
Accession Number
ADA460371

Entities

People

  • David S. Doermann
  • Stefan Jaeger
  • Yefeng Zheng
  • Yi Li

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Computational Science
  • Computer Vision
  • Computers
  • Data Sets
  • Databases
  • Detection
  • Fluid Mechanics
  • Geometry
  • Human Behavior
  • Orientation (Direction)
  • Probability
  • Probability Density Functions
  • Random Variables
  • Statistical Analysis
  • Statistics
  • Two Dimensional

Fields of Study

  • Computer science

Readers

  • Calculus or Mathematical Analysis
  • Computational Linguistics
  • Image Processing and Computer Vision