Bioinformatics tutorial: Linearize a FASTA sequence with AWK

Example

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < input.fa

One can read this awk script as follows:

If the current line ($0) starts like a FASTA header (^>), we print a newline if this is not the first sequence (N>0?"\n":""), followed by the line itself ($0), followed by a tab (\t). We then increment the sequence counter (N++) and skip to the next line (next;).
If the current line ($0) does not start like a FASTA header, the second block applies: it has no pattern, so it matches every line that reaches it. We just print the whole line without a trailing newline, so consecutive sequence lines are joined together.
At the end (END), we print a single newline to terminate the last sequence.
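
To see the effect on real input, here is a minimal sketch that runs the same awk script on a small made-up FASTA file. The record names (seq1, seq2), the sequences, and the temporary file are assumptions for illustration only.

```shell
# Write a tiny two-record FASTA file to a temporary location (made-up sample data).
fasta_file=$(mktemp)
cat > "$fasta_file" <<'EOF'
>seq1 first record
ACGT
TTGA
>seq2
GGCC
EOF

# Same awk script as above: each record becomes "header<TAB>sequence" on one line.
linearized=$(awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < "$fasta_file")

# Show the result: two lines, one per record, with the sequence lines joined.
printf '%s\n' "$linearized"

# Remove the temporary file.
rm -f "$fasta_file"
```

Because each record now sits on a single tab-separated line, it can be passed through line-oriented tools such as sort or grep, and converted back to FASTA afterwards by turning the tab into a newline, for example with tr '\t' '\n'.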
