FASTA format

From WikiMD's Wellness Encyclopedia

FASTA format is a text-based format for representing nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows sequence names and comments to precede the sequences. The name "FASTA" derives from the FASTA software package for sequence alignment and searching, first developed in the 1980s by David J. Lipman and William R. Pearson. Today, FASTA format is widely used in bioinformatics for sequence alignment and sequence database searches, and it is read and written by many bioinformatics programs and databases.

Format

A record in FASTA format begins with a single-line description, followed by one or more lines of sequence data. The description line is distinguished from the sequence data by a greater-than (>) symbol at its beginning. The word immediately following the ">" symbol is the identifier of the sequence, and the rest of the line is the description (both are optional). A sequence ends when another line starting with ">" appears, which marks the start of the next sequence, or when the file ends.
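The rules above can be sketched as a small parser. This is a minimal illustration, not a reference implementation: the names `split_header` and `parse_fasta` are hypothetical, and the sketch assumes the whole input is available as one string, that blank lines can be ignored, and that sequence data may span several lines.

```python
def split_header(header):
    """Split a header (without the leading ">") into (identifier, description)."""
    parts = header.split(None, 1)          # split at the first run of whitespace
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


def parse_fasta(text):
    """Yield (identifier, description, sequence) for each record in FASTA text."""
    header = None   # header of the record being read, or None before the first ">"
    chunks = []     # sequence lines collected for the current record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # assumption: blank lines carry no data
        if line.startswith(">"):
            if header is not None:
                # A new ">" line ends the previous record.
                yield (*split_header(header), "".join(chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)            # sequence data for the current record
    if header is not None:
        # End of input ends the last record.
        yield (*split_header(header), "".join(chunks))
```

Because both the identifier and the description are optional, the sketch returns empty strings for whichever part is missing instead of raising an error.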

Example

>seq1 Two different sequences
GATCAGTAGC
>seq2 Another sequence
TTAGGATCTG

In this example, there are two sequences. The first sequence has an identifier of "seq1" and a description of "Two different sequences". The sequence "GATCAGTAGC" follows the description. The second sequence is identified by "seq2" with a description of "Another sequence" and has the sequence "TTAGGATCTG".
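Going the other way, the two records described here can be written back out as FASTA text. The following is a sketch under stated assumptions: `format_fasta` is a hypothetical helper, and the default wrap width of 60 characters per sequence line is a common convention, not something the format description above requires.

```python
def format_fasta(records, width=60):
    """Return FASTA text for an iterable of (identifier, description, sequence)."""
    lines = []
    for identifier, description, sequence in records:
        # Header line: ">" plus the identifier, then the description if present.
        header = ">" + identifier
        if description:
            header += " " + description
        lines.append(header)
        # Wrap the sequence into lines of at most `width` characters
        # (assumed convention; a single long line is also commonly accepted).
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


example = [
    ("seq1", "Two different sequences", "GATCAGTAGC"),
    ("seq2", "Another sequence", "TTAGGATCTG"),
]
print(format_fasta(example), end="")
```

With the default width, both ten-letter sequences fit on one line each, so the printed text matches the four-line example shown earlier.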

Usage

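FASTA format is used for a variety of purposes in bioinformatics, including sequence alignment, sequence database searches, and the exchange of sequence data between bioinformatics software and databases.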

Advantages and Limitations

The simplicity of FASTA format is a major advantage, making it easy to create, edit, and parse with basic text-processing tools. However, this simplicity also means that FASTA format lacks the ability to represent complex annotations and features of sequences, such as gene locations, exons, and introns. For more complex annotations, formats such as GenBank format or GFF (General Feature Format) are more appropriate.
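To illustrate how little is needed to process the format, the sketch below reports the length of each sequence using nothing but line-by-line string handling. The function name `sequence_lengths` is hypothetical, and the sketch assumes the input is held in a single string. Only the header text and the residues are available to it; feature annotations such as exon positions cannot be recovered because the format does not store them.

```python
def sequence_lengths(text):
    """Return a list of (header, length) pairs, one per FASTA record."""
    records = []                           # each entry is [header, running length]
    for line in text.splitlines():
        if line.startswith(">"):
            # A header line starts a new record with zero residues so far.
            records.append([line[1:].strip(), 0])
        elif records:
            # Any other line adds its residues to the most recent record.
            records[-1][1] += len(line.strip())
    return [(header, length) for header, length in records]
```

The number of records is simply the length of the returned list, which is the same count a basic text tool would give by counting lines that start with ">".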

Credits: Most images are courtesy of Wikimedia Commons, and templates are from Wikipedia, licensed under CC BY-SA or similar.

Contributors: Prab R. Tumpati, MD