Project author: lskatz

Project description:
:deciduous_tree: Create a tree using Mash distances
Language: Perl
Project URL: git://github.com/lskatz/mashtree.git
Created: 2016-05-12T19:34:53Z
Project community: https://github.com/lskatz/mashtree

License: GNU General Public License v3.0


mashtree

Create a tree using Mash distances.

For simple usage, see mashtree --help. This is an example command:

    mashtree *.fastq.gz > tree.dnd

For confidence values, use mashtree_bootstrap.pl or mashtree_jackknife.pl. Run either with --help for usage.

Two modes: fast or accurate

Input files: fastq files are interpreted as raw read files. Fasta,
GenBank, and EMBL files are interpreted as genome
assemblies. Compressed versions of any of the above file types
are also accepted. You can compress with gz, bz2, or zip.

Output files: Newick (.dnd). If --outmatrix is supplied, then
a distance matrix too.

See the documentation on the algorithms for more information.

Faster

    mashtree --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd

More accurate

You can get a more accurate tree with the minimum abundance finder: simply
pass --mindepth 0. This step helps ignore rare kmers, which are
more likely to be read errors.

    mashtree --mindepth 0 --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd
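
To illustrate why a minimum depth helps, here is a minimal Python sketch (not Mashtree's code) that counts kmers across reads and keeps only those seen at least `min_depth` times. The function names, the toy reads, and the threshold of 2 are hypothetical. Mashtree delegates sketching to Mash, and how it chooses the threshold under --mindepth 0 is not modeled here.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-length substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def filter_by_depth(counts, min_depth):
    """Keep only kmers observed at least min_depth times."""
    return {kmer for kmer, n in counts.items() if n >= min_depth}

# Toy data: three identical reads plus one read with a single-base error.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTACCT"]
counts = count_kmers(reads, k=4)
kept = filter_by_depth(counts, min_depth=2)
dropped = set(counts) - kept  # kmers that appear only in the erroneous read
print(sorted(dropped))
```

In this toy case the only kmers dropped are the ones created by the erroneous base, which is the effect the minimum depth is meant to have on real reads.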

Adding confidence values

Mashtree can add confidence values using jackknifing. For each
jackknife tree, 50% of hashes are used. Confidence values are calculated from
the jackknife trees using BioPerl. When using this method, you can pass
flags to mashtree after a double dash (--), as in the example below.

Added in version 0.40.

    mashtree_jackknife.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.jackknife.dnd
    mashtree_jackknife.pl --help # additional usage help
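
The resampling idea can be sketched in Python as follows. This is an illustration under stated assumptions, not Mashtree's implementation: sketches are modeled as plain sets of integer hashes, `mash_distance` and `jackknife_hashes` are hypothetical helper names, the Jaccard index is computed directly on the sets, and the 50% subset is drawn from the pooled hashes of both samples (whether Mashtree samples per sketch or across sketches is not stated in this document). The distance formula, D = -(1/k) ln(2j / (1 + j)), is the one published for Mash.

```python
import math
import random

def mash_distance(hashes_a, hashes_b, k):
    """Mash distance from the Jaccard index j: -(1/k) * ln(2j / (1 + j))."""
    union = hashes_a | hashes_b
    if not union:
        raise ValueError("both hash sets are empty")
    j = len(hashes_a & hashes_b) / len(union)
    if j == 0:
        return 1.0  # no shared hashes: report the maximum distance
    if j == 1:
        return 0.0  # identical sets
    return -math.log(2 * j / (1 + j)) / k

def jackknife_hashes(hashes, fraction, rng):
    """Return a random subset holding `fraction` of the hashes."""
    n_keep = int(len(hashes) * fraction)
    return set(rng.sample(sorted(hashes), n_keep))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
sketch_a = set(range(0, 1000))     # toy hash sets, not real sketches
sketch_b = set(range(100, 1100))

full = mash_distance(sketch_a, sketch_b, k=21)
replicates = []
for _ in range(10):
    # Assumption: one shared 50% subset of the pooled hashes per replicate.
    kept = jackknife_hashes(sketch_a | sketch_b, 0.5, rng)
    replicates.append(mash_distance(sketch_a & kept, sketch_b & kept, k=21))

print(f"full distance: {full:.5f}")
print(f"replicate range: {min(replicates):.5f} to {max(replicates):.5f}")
```

Each replicate distance would feed one jackknife tree. The spread of the replicate distances around the full distance is what ultimately shows up as higher or lower confidence on the branches.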

Bootstrapping was added in version 0.55. This runs mashtree itself multiple times, each
time with a random seed.

    mashtree_bootstrap.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.bootstrap.dnd
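
Confidence values from replicate trees are conventionally the share of replicates in which a clade of the main tree reappears. Below is a minimal Python sketch of that bookkeeping, assuming each tree has already been reduced to a set of clades (each clade a frozenset of leaf names). The `clade_support` helper and the toy trees are hypothetical. This document states only that BioPerl computes the values for the jackknife trees, so this is not a description of Mashtree's own code.

```python
def clade_support(reference_clades, replicate_trees):
    """Percent of replicate trees that contain each reference clade."""
    if not replicate_trees:
        raise ValueError("need at least one replicate tree")
    support = {}
    for clade in reference_clades:
        hits = sum(1 for tree in replicate_trees if clade in tree)
        support[clade] = 100.0 * hits / len(replicate_trees)
    return support

# Toy example with four samples; each tree is a set of clades.
ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})
bd = frozenset({"B", "D"})

reference = {ab, cd}
replicates = [{ab, cd}, {ab, cd}, {ac, bd}, {ab, cd}]

support = clade_support(reference, replicates)
for clade in sorted(support, key=sorted):
    print(sorted(clade), support[clade])
```

Here three of the four replicates recover both reference clades, so each clade gets a support of 75.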

Usage

    Usage: mashtree [options] *.fastq *.fasta *.gbk *.msh > tree.dnd
    NOTE: fastq files are read as raw reads;
          fasta, gbk, and embl files are read as assemblies;
          Input files can be gzipped.
    --tempdir            ''   If specified, this directory will not be
                              removed at the end of the script and can
                              be used to cache results for future
                              analyses.
                              If not specified, a dir will be made for you
                              and then deleted at the end of this script.
    --numcpus            1    This script uses Perl threads.
    --outmatrix          ''   If specified, will write a distance matrix
                              in tab-delimited format
    --file-of-files           If specified, mashtree will try to read
                              filenames from each input file. The file of
                              files format is one filename per line. This
                              file of files cannot be compressed.
    --outtree                 If specified, the tree will be written to
                              this file and not to stdout. Log messages
                              will still go to stderr.
    --version                 Display the version and exit

    TREE OPTIONS
    --truncLength        250  How many characters to keep in a filename
    --sort-order         ABC  For neighbor-joining, the sort order can
                              make a difference. Options include:
                              ABC (alphabetical), random, input-order

    MASH SKETCH OPTIONS
    --genomesize         5000000
    --mindepth           5    If mindepth is zero, then it will be
                              chosen in a smart but slower method,
                              to discard lower-abundance kmers.
    --kmerlength         21
    --sketch-size        10000
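
The --file-of-files format described in the usage text (one filename per line, uncompressed) is easy to generate. Below is a minimal Python sketch, assuming hypothetical input names and a hypothetical list file called samples.fofn. Only the one-filename-per-line layout comes from the usage text.

```python
from pathlib import Path
import tempfile

def write_file_of_files(filenames, list_path):
    """Write one filename per line, as plain uncompressed text."""
    list_path = Path(list_path)
    list_path.write_text("".join(f"{name}\n" for name in filenames))
    return list_path

# Hypothetical sample names, written to a temporary directory.
workdir = Path(tempfile.mkdtemp())
samples = ["sample1.fastq.gz", "sample2.fastq.gz", "reference.fasta"]
fofn = write_file_of_files(samples, workdir / "samples.fofn")
print(fofn.read_text(), end="")
```

Going by the usage text, the list file would then take the place of the input files, along the lines of `mashtree --file-of-files samples.fofn > tree.dnd`.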

Installation

Please see INSTALL.md

Further documentation

For perl library help, run perldoc on a .pm file, e.g., perldoc lib/Mashtree/Db.pm.

For executable help run --help, e.g., mashtree_bootstrap.pl --help.

For more information and help, please see the docs folder.

For more information on plugins, see the plugins folder. (in development)

For more information on contributions, please see CONTRIBUTING.md.

References

Citation

JOSS

Katz, L. S., Griswold, T., Morrison, S., Caravas, J., Zhang, S., den Bakker, H. C., Deng, X., and Carleton, H. A. (2019). Mashtree: a rapid comparison of whole genome sequence files. Journal of Open Source Software, 4(44), 1762, https://doi.org/10.21105/joss.01762

Poster

Katz, L. S., Griswold, T., & Carleton, H. A. (2017, October 8-11). Generating WGS Trees with Mashtree. Poster presented at the American Society for Microbiology Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines, Washington, DC. Poster number 27.