The FASTA file format is used for representing one or more nucleotide or amino acid sequences as a continuous string of characters. Sequences are annotated with a comment line, which starts with the >
character, that precedes each sequence. The comment line is typically formatted in a uniform way, dictated by the sequence's source database or generating software. For example:
>gi|62241013|ref|NP_001014431.1| RAC-alpha serine/threonine-protein kinase [Homo sapiens]
MSDVAIVKEGWLHKRGEYIKTWRPRYFLLKNDGTFIGYKERPQDVDQREAPLNNFSVAQCQLMKTERPRP
NTFIIRCLQWTTVIERTFHVETPEEREEWTTAIQTVADGLKKQEEEEMDFRSGSPSDNSGAEEMEVSLAK
PKHRVTMNEFEYLKLLGKGTFGKVILVKEKATGRYYAMKILKKEVIVAKDEVAHTLTENRVLQNSRHPFL
TALKYSFQTHDRLCFVMEYANGGELFFHLSRERVFSEDRARFYGAEIVSALDYLHSEKNVVYRDLKLENL
MLDKDGHIKITDFGLCKEGIKDGATMKTFCGTPEYLAPEVLEDNDYGRAVDWWGLGVVMYEMMCGRLPFY
NQDHEKLFELILMEEIRFPRTLGPEAKSLLSGLLKKDPKQRLGGGSEDAKEIMQHRFFAGIVWQHVYEKK
LSPPFKPQVTSETDTRYFDEEFTAQMITITPPDQDDSMECVDSERRPHFPQFSYSASGTA
The above example illustrates the amino acid sequence of an isoform of the human AKT1 genes, as fetched from the NCBI protein database. The header line specifies that this sequence may be identified with the GI ID 62241013
and the protein transcript ID NP_001014431.1
. This protein is named RAC-alpha serine/threonine-protein kinase
and is derived from the species, Homo sapiens
.