Sequence database

From WikiMD's Wellness Encyclopedia

A sequence database is a specialized database designed to store biological sequence information, such as DNA, RNA, and protein sequences. These databases are essential tools in bioinformatics, supporting activities such as gene identification, phylogenetics, and the study of evolution. They are broadly categorized as primary, secondary, or composite, according to where their data come from and how heavily the data are annotated.

Primary Sequence Databases

Primary sequence databases are repositories that store raw sequence data generated from sequencing projects. They provide minimal annotation, such as the source of the sequence (the organism), the name of the submitter, and the date of submission. The most prominent examples include:

  • GenBank: A comprehensive public database of nucleotide sequences and supporting bibliographic and biological annotation. It is maintained by the National Center for Biotechnology Information (NCBI).
  • EMBL Bank: Maintained by the European Molecular Biology Laboratory (today through its European Bioinformatics Institute, as part of the European Nucleotide Archive), this database collects nucleotide sequences from around the world and offers free access to the data.
  • DDBJ: The DNA Data Bank of Japan, which collects DNA sequences from researchers and provides free access.
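
The minimal annotation described above can be pictured as a small record type. The sketch below is illustrative only: the class name `PrimaryRecord`, its field names, and the example values are assumptions made for this example, not the actual schema of GenBank, EMBL Bank, or DDBJ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PrimaryRecord:
    """Hypothetical primary-database entry: raw sequence plus minimal annotation."""

    accession: str       # made-up identifier, not a real accession number
    organism: str        # source of the sequence
    submitter: str       # who deposited the sequence
    submitted_on: date   # date of submission
    sequence: str        # raw nucleotide sequence

    def length(self) -> int:
        """Number of bases in the stored sequence."""
        return len(self.sequence)


# Example values are invented for illustration.
record = PrimaryRecord(
    accession="EXAMPLE0001",
    organism="Escherichia coli",
    submitter="A. Researcher",
    submitted_on=date(2020, 1, 15),
    sequence="ATGGCGTTAGC",
)
print(record.organism, record.length())
```

Real records carry more fields than this, but the point is the same: a primary entry says little more than what the sequence is, where it came from, and who submitted it.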

Secondary Sequence Databases

Secondary sequence databases contain curated sequences that have been processed and annotated with additional information, such as the function of the protein, domain structure, and evolutionary relationships. These databases often derive their content from primary databases but add value through curation and annotation. Examples include:

  • UniProt: A comprehensive resource for protein sequence and annotation data, providing detailed information about the biological function of proteins.
  • Pfam: A database of protein families, each represented by multiple sequence alignments and hidden Markov models.
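
To give a feel for how a family can be summarized from a multiple sequence alignment, the sketch below computes per-column residue frequencies. This is a deliberately simplified stand-in, not a hidden Markov model and not how Pfam builds its models; the function name `column_frequencies` and the toy alignment are assumptions made for this example.

```python
from collections import Counter


def column_frequencies(alignment):
    """Return, for each alignment column, the relative frequency of each residue.

    Gap characters ('-') are ignored when counting.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(width):
        residues = [row[col] for row in alignment if row[col] != "-"]
        counts = Counter(residues)
        total = sum(counts.values())
        # A column made up entirely of gaps yields an empty dictionary.
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


# Toy alignment of four made-up peptide fragments.
toy_alignment = ["MKV-L", "MKI-L", "MRVAL", "MKVAL"]
profile = column_frequencies(toy_alignment)
for position, freqs in enumerate(profile, start=1):
    print(position, freqs)
```

Columns where one residue dominates hint at conserved positions. A profile hidden Markov model extends this idea with explicit states for insertions and deletions.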

Composite Sequence Databases

Composite sequence databases integrate data from both primary and secondary databases to provide a more comprehensive view of sequence information. They may also include specialized tools for data analysis. An example is the NCBI Entrez system, which allows users to search and retrieve information from multiple databases simultaneously.
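
As a rough illustration of sending one query to several Entrez databases, the sketch below builds search URLs for NCBI's E-utilities `esearch` endpoint. It only constructs the URLs and makes no network request. The helper name `build_search_urls`, the example term, and the choice of databases are assumptions made for this example, and NCBI's own documentation should be checked for current parameters and usage limits.

```python
from urllib.parse import urlencode

# Address of the E-utilities "esearch" endpoint. Using it here is an assumption
# about how a reader might reach Entrez programmatically.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_urls(term, databases, max_results=5):
    """Build one esearch URL per database for the same query term."""
    urls = {}
    for db in databases:
        query = urlencode({"db": db, "term": term, "retmax": max_results})
        urls[db] = f"{ESEARCH_URL}?{query}"
    return urls


# Same term, two databases: nucleotide records and protein records.
urls = build_search_urls("insulin", ["nucleotide", "protein"])
for db, url in urls.items():
    print(db, url)
```

Each URL could then be fetched with any HTTP client. Keeping URL construction separate from fetching makes the query logic easy to inspect and test offline.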

Applications and Importance

Sequence databases are indispensable in bioinformatics and molecular biology. They support a wide range of research, including:

  • Identifying new genes and determining their function.
  • Studying genetic variation and its implications for health and disease.
  • Tracing the evolutionary history of organisms and genes.
  • Developing new drugs and vaccines.
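
Studies of genetic variation often start by comparing a sample sequence against a reference retrieved from a database. The sketch below is a minimal illustration that assumes the two sequences are already aligned and of equal length. The function names and the example sequences are invented for this example, and real analyses rely on dedicated alignment tools.

```python
def differences(seq_a, seq_b):
    """List (position, base_a, base_b) for every mismatch between two
    equal-length, already-aligned sequences. Positions are 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


def percent_identity(seq_a, seq_b):
    """Percentage of positions at which the two sequences agree."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = len(differences(seq_a, seq_b))
    return 100.0 * (len(seq_a) - mismatches) / len(seq_a)


# Made-up sequences that differ at a single position.
reference = "ATGGCGTTAGC"
sample = "ATGGCATTAGC"

print(differences(reference, sample))
print(round(percent_identity(reference, sample), 2))
```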

Challenges and Future Directions

As sequencing technologies become faster and cheaper, sequence databases must cope with rapidly growing data volumes while maintaining quality control and keeping annotations accurate. Efforts are ongoing to improve the accuracy and completeness of sequence annotations, integrate data from diverse sources, and develop more powerful tools for data analysis and visualization.

Contributors: Prab R. Tumpati, MD