Project author: ericniso

Project description:
Project for the course of Elementi di Bioinformatica AY 2017/2018
Language: C
Repository: git://github.com/ericniso/hague.git
Created: 2017-12-26T13:54:04Z
Project page: https://github.com/ericniso/hague

License: MIT License

Hague

An adjacency-list-based implementation of de Bruijn graphs

Build requirements

  • C compiler
  • make
  • zlib
  • gengetopt

Tested on Ubuntu 16.04 and on Windows 10 using Bash on Ubuntu

Build

Download from GitHub

  $ git clone git@github.com:ericniso/hague.git
  $ cd hague

Download submodules

  $ git submodule init
  $ git submodule update

Compile the binaries; the executable is placed in the bin folder

  $ make bin

Compile the debug binaries; the executable is placed in the debug folder

  $ make debug

Compile the shared library; the library file is placed in the lib folder

  $ make lib

Compile all of the above targets

  $ make all

Test

After compiling the source code, we recommend running the automated tests to verify that the build succeeded

  $ make test

Documentation

Docs can be generated using

  $ make doc

This generates the source code documentation in both HTML and LaTeX in the docs folder

Usage

  $ hague -f "/path/to/fasta/file" -k "k-mer-length"

The input file may be plain or gzip-compressed (.gz)

This command builds the de Bruijn graph and prints it in CSV format. Here is an example output:

  $ hague -f ~/home/nicolaas/chr1_KI270707v1_random.fa.gz -k 10 | head -n 5
  Source, Target, Label
  AGGGGTCTG, GGGGTCTGC, AGGGGTCTGC
  GGGGTCTGC, GGGTCTGCT, GGGGTCTGCT
  GGGTCTGCT, GGTCTGCTT, GGGTCTGCTT
  GGTCTGCTT, GTCTGCTTA, GGTCTGCTTA

From left to right we can see:

  • Source is the source node, whose key is a (k-1)-mer
  • Target is the node that the source node points to through an edge
  • Label is the edge label connecting the previous two nodes, and represents the corresponding k-mer string
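
As a rough illustration of the three columns, the following C sketch derives one CSV row from a k-mer: the source is its first k-1 characters and the target is its last k-1 characters. The function names kmer_edge_row and print_edges are hypothetical and are not part of hague. The sketch emits one row per k-mer occurrence; how hague itself handles repeated k-mers is not covered here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not hague's API): writes the CSV row
 * "source, target, label" for the k-mer starting at seq[pos].
 * Returns 0 on success, -1 if the k-mer does not fit in seq
 * or the row does not fit in out.
 */
int kmer_edge_row(const char *seq, size_t pos, size_t k,
                  char *out, size_t out_size)
{
    assert(seq != NULL && out != NULL);

    size_t len = strlen(seq);
    if (k < 2 || pos + k > len)
        return -1;

    const char *kmer = seq + pos;
    /* source = first k-1 characters, target = last k-1 characters */
    int n = snprintf(out, out_size, "%.*s, %.*s, %.*s",
                     (int)(k - 1), kmer,
                     (int)(k - 1), kmer + 1,
                     (int)k, kmer);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Prints the CSV header, then one row per k-mer of seq. */
void print_edges(const char *seq, size_t k, FILE *fp)
{
    char row[256]; /* large enough for small k only; a sketch, not a limit of hague */
    size_t len = strlen(seq);

    fprintf(fp, "Source, Target, Label\n");
    for (size_t pos = 0; pos + k <= len; pos++)
        if (kmer_edge_row(seq, pos, k, row, sizeof row) == 0)
            fprintf(fp, "%s\n", row);
}
```

For example, print_edges("AGGGGTCTGCT", 10, stdout) would print the header followed by the first two rows of the sample output above.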

To write the output to a file, pass a filename with the -o option:

  $ hague -f "/path/to/fasta/file" -k "k-mer-length" -o "/path/to/output/file"

Superstring reconstruction is available as an additional feature, enabled by adding the -w option:

  $ hague -f "/path/to/fasta/file" -k "k-mer-length" -w [-o "/path/to/output/file"]

This feature is still a work in progress: it only works if the graph is Eulerian or semi-Eulerian
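
To make the Eulerian condition concrete, here is a minimal C sketch of the standard degree test for directed graphs: a graph is Eulerian when every node has equal in- and out-degree, and semi-Eulerian when exactly one node has one extra outgoing edge and exactly one node has one extra incoming edge. The names euler_kind and classify_by_degree are illustrative and do not come from hague. The sketch assumes the graph is connected and does not check it.

```c
#include <assert.h>
#include <stddef.h>

/* Result of the degree test; names are illustrative, not hague's. */
enum euler_kind { NOT_EULERIAN = 0, SEMI_EULERIAN = 1, EULERIAN = 2 };

/*
 * Classifies a directed graph by its degrees only.
 * in_deg[i] and out_deg[i] are the in- and out-degree of node i.
 * Connectivity is assumed and NOT verified here.
 */
enum euler_kind classify_by_degree(const unsigned *in_deg,
                                   const unsigned *out_deg, size_t n)
{
    assert(n == 0 || (in_deg != NULL && out_deg != NULL));

    size_t starts = 0; /* nodes with out-degree = in-degree + 1 */
    size_t ends = 0;   /* nodes with in-degree = out-degree + 1 */

    for (size_t i = 0; i < n; i++) {
        if (out_deg[i] == in_deg[i])
            continue;
        if (out_deg[i] == in_deg[i] + 1)
            starts++;
        else if (in_deg[i] == out_deg[i] + 1)
            ends++;
        else
            return NOT_EULERIAN; /* imbalance larger than one edge */
    }

    if (starts == 0 && ends == 0)
        return EULERIAN;      /* a closed Eulerian walk can exist */
    if (starts == 1 && ends == 1)
        return SEMI_EULERIAN; /* an open Eulerian walk can exist */
    return NOT_EULERIAN;
}
```

A simple chain of nodes, each pointing only to the next one, is classified as SEMI_EULERIAN by this test: the first node has one extra outgoing edge and the last node has one extra incoming edge.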

Authors

Eric Nisoli, Lorenzo Mammana

License

MIT License