last update: 2015-11-09
Count FASTA records (everyone knows)
grep -c '^>' file.fasta
Remove empty FASTA records
perl -ne ' if (/^>/) { $h = $_; $s =~ s/\s+//g; print "$h$s\n" if length($s) > 0; $s = ""; } else { $s .= $_ } END{$s =~ s/\s+//g; print "$h$s\n" if length($s) > 0;}' t.fa > t2.fa
An easier solution (fasta2tab and tab2fasta are available on GitHub):
fasta2tab t.fa -l | awk -F'\t' '$3 > 0' | tab2fasta -l 70 > t2.fa
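If fasta2tab is not installed, the same filter can be sketched with plain awk. This is a minimal sketch, not the fasta2tab implementation; the function name drop_empty_fasta is made up for illustration. Like the perl one-liner above, it writes each sequence on a single line.

```shell
# drop_empty_fasta (hypothetical helper): print only records whose
# sequence is non-empty. Usage: drop_empty_fasta file.fasta > out.fasta
drop_empty_fasta() {
  awk '
    # print the buffered record, but only if it has a header and sequence
    function emit() { if (hdr != "" && seq != "") print hdr "\n" seq }
    /^>/ { emit(); hdr = $0; seq = ""; next }
    # strip whitespace (including stray carriage returns) and join lines
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { emit() }
  ' "$1"
}
```

Usage: drop_empty_fasta t.fa > t2.fa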
Extract the first 500 bp of every sequence in a FASTA file with seqret, one of the EMBOSS apps:
seqret -auto -sbegin1 1 -send1 500 -sequence file.fasta -outseq new_file.fasta
Keep only sequences longer than 500 bp (this filters by length; it does not trim):
fasta2tab t.fa -l | awk -F'\t' '$3 > 500' | tab2fasta -l 70 > t2.fa
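The fasta2tab pipeline above selects records by length ($3 > 500) and does not trim anything. To actually cut every sequence down to its first N bases without EMBOSS, a plain awk sketch could look like this. The function name fasta_head_bp is an assumption for illustration; output sequences are written on one line.

```shell
# fasta_head_bp (hypothetical helper): keep at most the first N bases
# of every record. Usage: fasta_head_bp N file.fasta > out.fasta
fasta_head_bp() {
  awk -v n="$1" '
    # print the buffered header and the first n characters of its sequence
    function emit() { if (hdr != "") print hdr "\n" substr(seq, 1, n) }
    /^>/ { emit(); hdr = $0; seq = ""; next }
    # strip whitespace and join the wrapped sequence lines
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { emit() }
  ' "$2"
}
```

Usage: fasta_head_bp 500 file.fasta > new_file.fasta. Sequences shorter than N are passed through unchanged.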
Reverse-complement
seqret -sequence seq.fasta -outseq seq.rc.fa -srev
fasta2tab -rc seq.fasta > seq.rc.fa
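For short sequences, reverse-complement can also be sketched in plain awk with no extra tools. The function name fasta_revcomp is made up here. Only A, C, G, T and N (upper and lower case) are complemented, and any other character is copied through unchanged, which is an assumption and not how seqret treats IUPAC ambiguity codes. The character-by-character loop is slow on chromosome-sized input.

```shell
# fasta_revcomp (hypothetical helper): reverse-complement every record.
# Usage: fasta_revcomp seq.fasta > seq.rc.fa
fasta_revcomp() {
  awk '
    # walk the string backwards, complementing known bases
    function revcomp(s,    i, c, out) {
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        out = out ((c in comp) ? comp[c] : c)
      }
      return out
    }
    function emit() { if (hdr != "") print hdr "\n" revcomp(seq) }
    BEGIN {
      # complement table; N stays N
      split("A C G T N a c g t n", from, " ")
      split("T G C A N t g c a n", to, " ")
      for (k in from) comp[from[k]] = to[k]
    }
    /^>/ { emit(); hdr = $0; seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { emit() }
  ' "$1"
}
```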
Split one FASTA file into several with awk (one file per record, named after the full header line):
awk '/^>/ {OUT=substr($0,2) ".fa"}; OUT {print >OUT}' dna.fa
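The one-liner above uses the whole header line as the filename and keeps every output file open, which can hit the open-file limit on inputs with many records. A hedged variant that names files after the first word of the header and closes each file before starting the next might look like this. The function name split_fasta and the optional output directory argument are assumptions. It appends with >>, so rerunning it into the same directory will duplicate records, and IDs containing "/" are not handled.

```shell
# split_fasta (hypothetical helper): write each record to <outdir>/<id>.fa,
# where <id> is the first word of the header without the leading ">".
# Usage: split_fasta dna.fa [outdir]
split_fasta() {
  awk -v dir="${2:-.}" '
    /^>/ {
      if (out != "") close(out)       # release the previous file handle
      out = dir "/" substr($1, 2) ".fa"
    }
    out != "" { print >> out }        # append header and sequence lines
  ' "$1"
}
```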
Rename the sequence header of each single-sequence FASTA file to its filename prefix
for f in *.fa; do prefix=$(basename "$f" .fa); export prefix; fasta2tab "$f" | perl -ne '/\t(.+)/; print "$ENV{prefix}\t$1\n" ' | tab2fasta -l 70 > "$f.rename" && mv "$f.rename" "$f"; done
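The same rename can be sketched without fasta2tab by rewriting only the header line with awk. The function name rename_by_filename is made up for illustration. Unlike the pipeline above, it leaves the sequence line wrapping untouched, and it replaces every header in the file, so it is only meant for single-sequence files.

```shell
# rename_by_filename (hypothetical helper): replace the header of each
# given .fa file with the filename prefix, editing the file in place.
# Usage: rename_by_filename *.fa
rename_by_filename() {
  for f in "$@"; do
    prefix=$(basename "$f" .fa)
    # write to a temporary file first; overwrite only if awk succeeded
    awk -v name="$prefix" '/^>/ { print ">" name; next } { print }' "$f" \
      > "$f.rename" && mv "$f.rename" "$f"
  done
}
```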
More