FASTA format
FASTA format is a text-based format for representing nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows sequence names and comments to precede the sequences. The name "FASTA" derives from the FASTA software package for sequence alignment and database searching, first developed in the 1980s by David J. Lipman and William R. Pearson. Today, FASTA format is widely used across bioinformatics software and sequence databases.
Format
A sequence in FASTA format begins with a single-line description, followed by one or more lines of sequence data. The description line is distinguished from the sequence data by a greater-than symbol (">") at the start of the line. The word immediately following the ">" is the identifier of the sequence, and the rest of the line is the description (both are optional). A sequence ends at the next line that starts with ">", which marks the start of another sequence, or at the end of the file.
Example
>seq1 Two different sequences
GATCAGTAGC
>seq2 Another sequence
TTAGGATCTG
In this example, there are two sequences. The first sequence has an identifier of "seq1" and a description of "Two different sequences". The sequence "GATCAGTAGC" follows the description. The second sequence is identified by "seq2" with a description of "Another sequence" and has the sequence "TTAGGATCTG".
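The rules above can be sketched as a small parser. This is a minimal illustration in Python, not a reference implementation: the names `FastaRecord` and `parse_fasta` are hypothetical, and the sketch assumes that blank lines can be skipped and that sequence data spread over several lines should be joined into one string.

```python
from typing import Iterator, List, NamedTuple


class FastaRecord(NamedTuple):
    """One sequence entry (hypothetical helper type for this sketch)."""
    identifier: str
    description: str
    sequence: str


def _make_record(header: str, chunks: List[str]) -> FastaRecord:
    # The first word of the header is the identifier; the rest is the description.
    parts = header.split(maxsplit=1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return FastaRecord(identifier, description, "".join(chunks))


def parse_fasta(text: str) -> Iterator[FastaRecord]:
    """Yield one FastaRecord for each '>' description line in text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                yield _make_record(header, chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # The end of the input closes the last record.
    if header is not None:
        yield _make_record(header, chunks)


example = """>seq1 Two different sequences
GATCAGTAGC
>seq2 Another sequence
TTAGGATCTG
"""

records = list(parse_fasta(example))
for record in records:
    print(record.identifier, record.description, record.sequence, sep=" | ")
```

Run on the two-record example, the sketch yields one record per ">" line, with the identifier, description, and sequence separated as described.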
Usage
FASTA format is used for a variety of purposes in bioinformatics, including:
- Sequence alignment: Tools such as BLAST (Basic Local Alignment Search Tool) and Clustal accept sequences in FASTA format as input, and some alignment tools can also write their results in FASTA format.
- Sequence database searches: Databases such as GenBank, EMBL, and Swiss-Prot allow users to download sequences in FASTA format.
- Molecular biology software: Many software tools for sequence analysis, gene prediction, and other tasks accept sequences in FASTA format.
Advantages and Limitations
The simplicity of FASTA format is a major advantage, making it easy to create, edit, and parse with basic text-processing tools. However, this simplicity also means that FASTA format lacks the ability to represent complex annotations and features of sequences, such as gene locations, exons, and introns. For more complex annotations, formats such as GenBank format or GFF (General Feature Format) are more appropriate.
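As a rough illustration of how little is needed to create a FASTA record, the sketch below builds one with plain string operations. The function name `format_fasta` is hypothetical, and splitting the sequence into fixed-width lines (60 characters by default here) is an assumption made for readability, not a rule stated in this article.

```python
def format_fasta(identifier: str, sequence: str,
                 description: str = "", width: int = 60) -> str:
    """Return one FASTA record as text (hypothetical helper for this sketch)."""
    # Description line: ">" + identifier, then the optional description.
    header = f">{identifier} {description}".rstrip()
    # Assumption: break the sequence into lines of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"


# A deliberately small width to make the line splitting visible.
record_text = format_fasta("seq1", "GATCAGTAGC", "Two different sequences", width=4)
print(record_text, end="")
```

Nothing beyond the description line and the raw sequence is written, which is also why annotations such as gene locations have no place in the output.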
Contributors: Prab R. Tumpati, MD