Towards a Statistical Analysis of Genetic Sequences Data with Particular Reference to Protein Sequences.

Abstract

This report develops a variety of character matrices as graphical tools for the visual examination of genetic sequences, in particular protein sequences. The NNC, PNC, BNC1, BNC2, and BNC3 matrices are designed to filter noise without severely suppressing signals in the CC matrix. The Matrix Smear of a character matrix is introduced as a measure of signals and noise in the matrix. The asymptotic distribution of the smears of the CC and NNC matrices is derived under the independence model. The asymptotic result is used in conjunction with exact confidence intervals from diagonal smears to partially automate the visual examination of character matrices. A generalized likelihood ratio procedure is developed to fully automate the detection of signals in two protein sequences. A simulation study shows the procedure to be powerful and robust in detecting signals of length 9 and success probability .90 implanted within noisy binary strings of length 291 and success probability .15. Originator-supplied keywords include: Genetic sequences, DNA, Matrix Smear, Character Matrix Graphics.
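The simulation setting described in the abstract can be illustrated with a short Python sketch. This is not the report's procedure, which this record does not reproduce. It is a generic windowed generalized likelihood ratio scan written under stated assumptions: the signal length is taken as known, the statistic compares a two-rate Bernoulli model (inside and outside a window) with a single-rate model, and the names simulate_string, bernoulli_loglik, and glr_scan are hypothetical.

```python
import math
import random


def simulate_string(n=291, p_noise=0.15, sig_len=9, p_signal=0.90,
                    start=None, rng=None):
    """Noisy binary string with one implanted signal segment.

    Defaults follow the figures quoted in the abstract; the uniformly
    random placement of the signal is an assumption of this sketch.
    """
    rng = rng or random.Random()
    if start is None:
        start = rng.randrange(0, n - sig_len + 1)
    bits = [1 if rng.random() < p_noise else 0 for _ in range(n)]
    for i in range(start, start + sig_len):
        bits[i] = 1 if rng.random() < p_signal else 0
    return bits, start


def bernoulli_loglik(k, m):
    """Maximized Bernoulli log-likelihood for k successes in m trials."""
    if m == 0 or k == 0 or k == m:
        return 0.0
    p = k / m
    return k * math.log(p) + (m - k) * math.log(1 - p)


def glr_scan(bits, w):
    """Slide a window of width w and return (best statistic, best start).

    The statistic is twice the log likelihood ratio of "one rate inside
    the window, another outside" against "one rate everywhere". Only
    windows richer in successes than the rest of the string are scored.
    Returns (0.0, None) if no window qualifies.
    """
    n = len(bits)
    total = sum(bits)
    null_ll = bernoulli_loglik(total, n)
    inside = sum(bits[:w])
    best_stat, best_start = 0.0, None
    for s in range(n - w + 1):
        if s > 0:
            # Update the running count as the window moves one step right.
            inside += bits[s + w - 1] - bits[s - 1]
        outside = total - inside
        if inside * (n - w) > outside * w:
            alt_ll = (bernoulli_loglik(inside, w)
                      + bernoulli_loglik(outside, n - w))
            stat = 2.0 * (alt_ll - null_ll)
            if stat > best_stat:
                best_stat, best_start = stat, s
    return best_stat, best_start


if __name__ == "__main__":
    bits, true_start = simulate_string(rng=random.Random(0))
    stat, found_start = glr_scan(bits, 9)
    print("implanted at", true_start, "detected at", found_start,
          "statistic", round(stat, 2))
```

In practice a detection threshold for the statistic would have to be calibrated, for example by running the scan on many signal-free strings; the record gives no threshold, so none is assumed here.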

Document Details

Document Type: Technical Report
Publication Date: Mar 01, 1985
Accession Number: ADA153605

Entities

People

  • S. P. Arsenis

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Algorithms
  • Amino Acids
  • Cells
  • Chemistry
  • Coding
  • Detection
  • False Alarms
  • Genetic Code
  • Intervals
  • Molecules
  • Probability
  • Sequences
  • Simulations
  • Statistical Analysis
  • Statistical Inference
  • Statistics
  • Two Dimensional

Readers

  • Computer Programming and Software Development
  • Computer Vision
  • Oncology and Biomarker-Based Cancer Detection

Technology Areas

  • Biotechnology