R package for Enrichment Depletion Logos (EDLogos) and String Logos
Logolas is an R package for creating Enrichment Depletion Logo (EDLogo) plots with
string symbols. An EDLogo highlights both enrichment and depletion of symbols, whereas
standard logo plots, such as those produced by the seqLogo package,
are biased towards highlighting enrichment. Logolas also generalizes logo
plots to use both characters and strings as symbols.
If you find a bug, please create an
issue.
This code has been tested in …
Copyright (c) 2018-2019, Kushal Dey.
All source code and software in this repository are made available
under the terms of the GNU General Public
License. See the
LICENSE file for the full text of the license.
If you find this R package useful for your work, please cite
our paper, published in BMC Bioinformatics:
Dey, K.K., Xie, D. and Stephens, M., 2018. A new sequence logo plot
to highlight enrichment and depletion. BMC Bioinformatics. 19:473
https://doi.org/10.1186/s12859-018-2489-3.
The most recent version of Logolas is available from GitHub and can be installed using the devtools R package.
First, install the following Bioconductor packages:
source("https://bioconductor.org/biocLite.R")
biocLite(c("Biostrings","BiocStyle","Biobase","seqLogo","ggseqlogo"))
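On recent R and Bioconductor releases the biocLite script has been superseded by the BiocManager package, so the two lines above may fail. A minimal equivalent sketch, assuming BiocManager is available from CRAN for your R version and that the same five packages are still wanted:

```r
# Install BiocManager from CRAN if it is not already present.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the same Bioconductor dependencies listed above.
BiocManager::install(c("Biostrings", "BiocStyle", "Biobase", "seqLogo", "ggseqlogo"))
```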
Then install Logolas as follows
library(devtools)
install_github("kkdey/Logolas",build_vignettes = TRUE)
Once you have installed the package, load the package in R by entering
library(Logolas)
To get an overview of the package, enter
help(package = "Logolas")
Next, try creating a few plots using the logomaker function.
First, create a standard logo plot in Logolas, analogous to those produced by the
seqLogo and ggseqlogo R packages.
sequence <- c("CTATTGT","CTCTTAT","CTATTAA","CTATTTA", "CTATTAT","CTTGAAT",
"CTTAGAT","CTATTAA","CTATTTA","CTATTAT", "CTTTTAT","CTATAGT",
"CTATTTT","CTTATAT","CTATATT","CTCATTT", "CTTATTT","CAATAGT",
"CATTTGA","CTCTTAT","CTATTAT","CTTTTAT", "CTATAAT","CTTAGGT",
"CTATTGT","CTCATGT","CTATAGT", "CTCGTTA","CTAGAAT","CAATGGT")
logomaker(sequence,type = "Logo")
The corresponding EDLogo plot highlights the depletion of T in the middle, which is not
visually clear in the standard logo plot.
logomaker(sequence, type = "EDLogo")
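To see where enrichment or depletion at a position comes from, it can help to tabulate the per-position base counts by hand. The sketch below uses only base R and the first four sequences of the vector above. The names seqs, pfm and ppm are illustrative and not part of Logolas, and this is not necessarily how logomaker prepares its input internally.

```r
# Illustrative subset: the first four sequences from the example vector.
seqs <- c("CTATTGT", "CTCTTAT", "CTATTAA", "CTATTTA")
bases <- c("A", "C", "G", "T")

# Split each sequence into characters: one row per sequence, one column per position.
chars <- do.call(rbind, strsplit(seqs, ""))

# Count each base at each position (rows = bases, columns = positions).
pfm <- vapply(
  seq_len(ncol(chars)),
  function(j) as.vector(table(factor(chars[, j], levels = bases))),
  FUN.VALUE = integer(length(bases))
)
rownames(pfm) <- bases
colnames(pfm) <- seq_len(ncol(chars))

# Convert counts to per-position proportions; each column sums to 1.
ppm <- prop.table(pfm, margin = 2)
print(round(ppm, 2))
```

A base whose proportion at a position falls well below its background frequency is a candidate for depletion, which is the kind of signal the EDLogo plot is designed to make visible.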
One can also apply EDLogo to amino acid motifs, whose symbols extend beyond the A, C, G and T of
DNA motifs.
Here we create an EDLogo plot of the amino acid sequences at N-glycosylation sites, with a user-specified
background bg
chosen to be the median positional weight of each amino acid in the context around the
glycosylation site [data from UniProtKB].
data("N_Glycosyl_sequences")
bg <- apply(N_Glycosyl_sequences, 1, function(x) return(median(x)))
bg <- bg/sum(bg)
logomaker(N_Glycosyl_sequences, type = "EDLogo", bg=bg)
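The background computation above takes the median of each row (one row per symbol, one column per position) and rescales the result to sum to 1. A small sketch of the same two steps on a made-up matrix, assuming that row-per-symbol layout; the matrix toy and its values are hypothetical and are not taken from the N_Glycosyl_sequences data:

```r
# Hypothetical positional weights: 3 symbols (rows) by 4 positions (columns).
toy <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.3, 0.3, 0.4,
    0.6, 0.3, 0.1),
  nrow = 3,
  dimnames = list(c("N", "S", "P"), paste0("pos", 1:4))
)

# Median weight of each symbol across positions.
bg <- apply(toy, 1, median)

# Normalize so the background is a probability vector over symbols.
bg <- bg / sum(bg)
print(bg)
```

Using the median rather than the mean keeps a single strongly enriched position from inflating a symbol's background weight.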
EDLogo highlights the Asn (N) -X- Ser (S)/Thr (T) -X motif at the center, where X is depleted for the amino acid Pro (P).
Logolas allows the symbols in the logo plot to be a combination of strings and characters, or purely strings. Examples of both are shown below.
One example is a mutation signature, with the mismatch type at the center and flanking bases on either side (data from Shiraishi et al 2015).
data(mutation_sig)
logomaker(mutation_sig, type = "EDLogo", color_type = "per_symbol", color_seed = 2000)
An EDLogo plot can also show the enrichment and depletion of histone marks in different parts of the genome (data from Koch et al 2007).
data(histone_marks)
logomaker(histone_marks$mat, bg = histone_marks$bgmat, type = "EDLogo")
Finally, please walk through some more detailed examples in the
vignette:
vignette("Logolas")
This was the R command used to generate the vignette PDF file from the
R Markdown source:
rmarkdown::render("Logolas.Rmd", output_format = "pdf_document")
This software was developed by Kushal Dey,
Dongyue Xie and
Matthew Stephens at the University
of Chicago. For any questions or comments, please contact Kushal Dey
at kkdey@uchicago.edu.
The authors would like to acknowledge Oliver Bembom, the author of the seqLogo
package, which served as an inspiration and starting point for this
software. The authors also thank Peter Carbonetto, Edward Wallace and John Blischak
for helpful discussions and feedback.