Extract chains' and polypeptides' sequences from PDB or mmCIF files.
Only residues with structural information are saved. Residues missing from the structure can split a chain into more than one polypeptide.
Two files are generated per input structure: one with each individual polypeptide sequence, and another with whole-chain sequences, in which the polypeptide sequences within each chain are concatenated.
Output files are in FASTA format, with an ID composed of
The chain sequence files are particularly useful when you need to align target sequences against structural templates for molecular similarity modeling (the reason I wrote this program in the first place).
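To illustrate the difference between polypeptide and chain sequences, here is a minimal sketch, not the program's actual code. It assumes a chain is given as a list of (residue number, three-letter name) pairs and treats any jump in residue numbering as a missing stretch. The helper name `split_polypeptides` and the example chain are hypothetical, and the real script may detect breaks differently (for example, by peptide-bond distance).

```python
# Illustrative only: a gap in residue numbering stands in for "missing residues".
# split_polypeptides and chain_a are hypothetical names, not part of struct2seq.

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def split_polypeptides(residues):
    """Split one chain into polypeptide sequences at numbering gaps.

    residues: list of (residue_number, three_letter_name) in file order.
    Returns a list of one-letter sequences, one per contiguous segment.
    """
    segments = []
    current = []
    previous = None
    for number, name in residues:
        # A non-consecutive residue number starts a new polypeptide.
        if previous is not None and number != previous + 1:
            segments.append("".join(current))
            current = []
        current.append(THREE_TO_ONE.get(name, "X"))  # X for unknown residues
        previous = number
    if current:
        segments.append("".join(current))
    return segments


# Residues 4-6 are missing from this made-up chain.
chain_a = [(1, "MET"), (2, "ALA"), (3, "GLY"), (7, "SER"), (8, "LEU")]

polypeptides = split_polypeptides(chain_a)   # goes to the polypeptide file
chain_sequence = "".join(polypeptides)       # goes to the chain file

print(polypeptides)    # ['MAG', 'SL']
print(chain_sequence)  # MAGSL
```

In this sketch the chain sequence is simply the polypeptide sequences joined in order, with no placeholder for the missing residues, which matches the concatenation described above.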
struct2seq [-h] [-v] structfile [structfile ...]
This is a Python script, so you can run the script file directly or put a symbolic link to it in any directory on your PATH (e.g. /usr/local/bin). The second option is recommended.
struct2seq 3WKV.pdb
struct2seq 5zcs.cif 3WKV.pdb 5zd5.cif