Towards a Statistical Analysis of Genetic Sequences Data with Particular Reference to Protein Sequences.
Abstract
This report develops a variety of character matrices as graphical tools for the visual examination of genetic sequences, in particular protein sequences. The NNC, PNC, BNC1, BNC2, and BNC3 matrices are designed to filter noise in the CC matrix without severely suppressing its signals. The Matrix Smear of a character matrix is introduced as a measure of the signals and noise in the matrix. The asymptotic distributions of the smears of the CC and NNC matrices are derived under the independence model. This asymptotic result is used together with exact confidence intervals from diagonal smears to partially automate the visual examination of character matrices. A generalized likelihood ratio procedure is developed to fully automate the detection of signals in two protein sequences. In a simulation study, the procedure proved powerful and robust in detecting signals of length 9 and success probability 0.90 implanted within noisy binary strings of 291 characters with success probability 0.15.
Originator-supplied keywords include: Genetic sequences, DNA, Matrix Smear, Character Matrix Graphics.
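The abstract does not spell out the generalized likelihood ratio procedure. The following is a minimal sketch of one generic way to set up that kind of test, and it is not necessarily the report's method. It assumes that a signal is a single contiguous segment whose success probability exceeds the background rate. One reading is that the binary string holds the match/mismatch indicators along one diagonal of a character-comparison matrix for two sequences, but that interpretation is an assumption. The simulation defaults mirror the numbers quoted above: 291 characters, background probability 0.15, and an implanted segment of length 9 with probability 0.90. The names `simulate`, `glr_scan`, `min_len`, and `max_len` are hypothetical, as are the Monte Carlo threshold and the overlap rule used to score a detection.

```python
import math
import random

def bernoulli_loglik(k, n, p):
    """Log-likelihood of k successes in n Bernoulli(p) trials.

    Zero-count terms are skipped, so p = 0 or p = 1 is safe whenever p is the
    maximum-likelihood estimate k / n.
    """
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def simulate(n=291, p_noise=0.15, sig_len=9, p_signal=0.90, rng=None):
    """Hypothetical simulator: Bernoulli(p_noise) background with one implanted
    segment of Bernoulli(p_signal) characters. Returns (bits, segment start)."""
    rng = rng or random.Random()
    bits = [1 if rng.random() < p_noise else 0 for _ in range(n)]
    start = rng.randrange(n - sig_len + 1)
    for j in range(start, start + sig_len):
        bits[j] = 1 if rng.random() < p_signal else 0
    return bits, start

def glr_scan(bits, min_len=2, max_len=30):
    """Maximise the log generalized likelihood ratio of 'one segment has its own,
    higher success probability' against 'the whole string shares one probability'.
    Returns (statistic, start, length); (0.0, None, None) if no segment qualifies."""
    n = len(bits)
    best = (0.0, None, None)
    if n == 0:
        return best
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)  # prefix[i] = number of successes in bits[:i]
    total = prefix[-1]
    null_ll = bernoulli_loglik(total, n, total / n)  # single-probability model
    for start in range(n):
        for length in range(min_len, min(max_len, n - start) + 1):
            k_in = prefix[start + length] - prefix[start]
            n_out, k_out = n - length, total - k_in
            p_in = k_in / length
            p_out = k_out / n_out if n_out else 0.0
            if p_in <= p_out:
                continue  # one-sided: only segments richer in successes count as signal
            alt_ll = bernoulli_loglik(k_in, length, p_in) + bernoulli_loglik(k_out, n_out, p_out)
            stat = alt_ll - null_ll
            if stat > best[0]:
                best = (stat, start, length)
    return best

if __name__ == "__main__":
    rng = random.Random(1985)
    trials = 50
    # Calibrate an approximate 5% threshold on pure-noise strings (p_signal = p_noise).
    null_stats = sorted(glr_scan(simulate(p_signal=0.15, rng=rng)[0])[0] for _ in range(trials))
    threshold = null_stats[int(0.95 * trials)]
    hits = 0
    for _ in range(trials):
        bits, start = simulate(rng=rng)
        stat, s, length = glr_scan(bits)
        # Count a detection only if the flagged window overlaps the implanted one.
        if stat > threshold and s < start + 9 and start < s + length:
            hits += 1
    print(f"threshold={threshold:.2f}  estimated power={hits / trials:.2f}")
```

The scan uses prefix sums, so each candidate window costs constant time. The cap on window length (`max_len`) is only there to bound the run time. With 50 trials the estimated power is coarse. More trials, or a threshold derived from the asymptotic theory described in the report, would give a firmer answer.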
Document Details
- Document Type: Technical Report
- Publication Date: Mar 01, 1985
- Accession Number: ADA153605
Entities
People
- S. P. Arsenis
Organizations
- Massachusetts Institute of Technology