FASTA format

FASTA format is a text-based format for representing nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows sequence names and comments to precede the sequences. The name "FASTA" derives from the FASTA software package, first developed in the 1980s by David J. Lipman and William R. Pearson for sequence alignment and searching. Today, FASTA format is widely used in bioinformatics for sequence alignment and sequence database searches, and it is supported by many kinds of bioinformatics software and databases.

Format

A sequence in FASTA format begins with a single-line description, followed by one or more lines of sequence data. The description line is distinguished from the sequence data by a greater-than symbol (>) at the beginning. The word following the ">" symbol is the identifier of the sequence, and the rest of the line is the description (both are optional). A sequence ends at the next line that starts with ">", which marks the start of another sequence, or at the end of the file.
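The rules above can be sketched as a small reader. This is a minimal illustration rather than a reference implementation: the function name parse_fasta is hypothetical, and skipping blank lines and ignoring any sequence data that appears before the first ">" line are assumptions made here, not requirements stated by the format description.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description_line, sequence) pairs from lines of FASTA text.

    The description line is returned without its leading ">".
    """
    header = None  # description line of the record being read
    chunks = []    # sequence lines collected for that record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        if line.startswith(">"):
            # A new ">" line closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
        # assumption: sequence data before the first ">" line is ignored
    # The end of the input closes the last record.
    if header is not None:
        yield header, "".join(chunks)
```

Because the function accepts any iterable of lines, it could be given an open file object or a list of strings; sequence data split across several lines is joined into one string per record.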

Example

>seq1 Two different sequences
GATCAGTAGC
>seq2 Another sequence
TTAGGATCTG

In this example, there are two sequences. The first sequence has an identifier of "seq1" and a description of "Two different sequences". The sequence "GATCAGTAGC" follows the description. The second sequence is identified by "seq2" with a description of "Another sequence" and has the sequence "TTAGGATCTG".
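As a rough sketch, the split between identifier and description described for this example could be done as follows. The helper name split_header is hypothetical, and treating the first run of whitespace as the separator is an assumption based on the example.

```python
from typing import Tuple


def split_header(header_line: str) -> Tuple[str, str]:
    """Split a FASTA description line into (identifier, description).

    Either part may come back as an empty string, since both are optional.
    """
    # Drop the leading ">" if it is present.
    text = header_line[1:] if header_line.startswith(">") else header_line
    # Assumption: the identifier is the first whitespace-delimited word.
    parts = text.strip().split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


print(split_header(">seq1 Two different sequences"))
# prints ('seq1', 'Two different sequences')
```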

Usage

FASTA format is used for a variety of purposes in bioinformatics, including:

Sequence alignment: Tools like BLAST (Basic Local Alignment Search Tool) and Clustal accept input sequences in FASTA format, and some alignment tools can also write their results in it.
Sequence database searches: Databases such as GenBank, EMBL, and Swiss-Prot allow users to download sequences in FASTA format.
Molecular biology software: Many software tools for sequence analysis, gene prediction, and other tasks accept sequences in FASTA format.

Advantages and Limitations

The simplicity of FASTA format is a major advantage, making it easy to create, edit, and parse with basic text-processing tools. However, this simplicity also means that FASTA format lacks the ability to represent complex annotations and features of sequences, such as gene locations, exons, and introns. For more complex annotations, formats such as GenBank format or GFF (General Feature Format) are more appropriate.
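To illustrate how little is needed to create FASTA text, here is a minimal sketch of a writer. The function name format_fasta and its parameters are hypothetical, and wrapping the sequence at 60 characters per line is a common convention assumed here, not something the format description above requires.

```python
def format_fasta(identifier: str, sequence: str,
                 description: str = "", width: int = 60) -> str:
    """Return one FASTA record as text, ending with a newline."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Description line: ">" + identifier, then the optional description.
    header = f">{identifier} {description}".rstrip()
    # Assumption: wrap the sequence into lines of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


print(format_fasta("seq1", "GATCAGTAGC", "Two different sequences"), end="")
```

The result is ordinary text, so it can be written to a file or concatenated with other records without any special library.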
