FASTC: a file format for multi‐character sequence data

Abstract

Here, we define a sequence file format that allows for multi‐character elements (FASTC). The format is derived from the FASTA format and the custom alphabet format of POY4/5. It is more general than either of these formats and can represent a broad variety of sequence‐type data. This format should be useful for analyses involving datasets encoded as linear streams, such as gene synteny, comparative linguistics, temporal gene expression and development, complex animal behaviours, and general biological time‐series data.
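
As a rough illustration of what a multi-character element sequence might look like in practice, the sketch below reads FASTA-style records whose elements are whole tokens rather than single characters. It is not the FASTC grammar from the paper: the abstract does not give the syntax, so the `>` header convention (borrowed from FASTA), the whitespace-separated elements, the function name `parse_fastc_like`, and the sample taxa and behaviour labels are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def parse_fastc_like(text: str) -> Dict[str, List[str]]:
    """Parse FASTA-style records whose elements are multi-character tokens.

    Assumed layout (hypothetical, not taken from the FASTC specification):
      - a line starting with '>' opens a record; its first word is the identifier
      - every following non-blank line holds whitespace-separated elements
    """
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("header line has no identifier")
            current = words[0]
            if current in records:
                raise ValueError(f"duplicate identifier: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("element data found before the first header")
        else:
            # Each whitespace-separated token is one sequence element.
            records[current].extend(line.split())
    return records


# Hypothetical behavioural time series for two taxa.
example = """>taxonA
walk pause groom walk
feed
>taxonB
walk groom feed
"""

records = parse_fastc_like(example)
for name, elements in records.items():
    print(name, elements)
```

Treating each token as one element is what lets a single record hold gene names, phonemes, or behaviour labels instead of single-letter nucleotide or amino-acid codes.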

Document Details

Document Type
Pub Defense Publication
Publication Date
Feb 12, 2019
Source ID
10.1111/cla.12370

Entities

People

  • Alexander J. Washburn
  • Ward C. Wheeler

Organizations

  • American Museum of Natural History
  • Defense Advanced Research Projects Agency

Tags

Fields of Study

  • Biology

Readers

  • Computer Science
  • Molecular and genetic basis of cancer
  • Wave Propagation and Nonlinear Chaotic Dynamics