Project author: lindenb

Project description:
My XSLT sandbox
Language: XSLT
Repository: git://github.com/lindenb/xslt-sandbox.git
Created: 2011-03-03T20:29:30Z
Project page: https://github.com/lindenb/xslt-sandbox


My XSLT sandbox.

Examples

Saving a github-wiki page so I can use it in a blog:

  $ curl -s "https://github.com/lindenb/jvarkit/wiki/Illuminadir" |\
      xsltproc --html ./github2html.xsl - > file.html

Saving a github-wiki page to LaTeX:

  $ curl -s "https://github.com/lindenb/jvarkit/wiki/Illuminadir" |\
      xsltproc --html ./github2html.xsl - > file.html

Transforming (X)HTML to LaTeX:

  $ curl -s "https://github.com/lindenb/jvarkit/wiki/Illuminadir" |\
      xsltproc --html ./stylesheets/github/github2tex.xsl - > tmp.tex && \
      pdflatex tmp.tex && \
      evince tmp.pdf

Insert Blast results into sqlite3:

  $ xsltproc --novalid blast2sqlite.xsl blast.xml | sqlite3 blast.sqlite3

Convert Blast to HTML (see also http://www.biostars.org/p/6635/ ):

  $ xsltproc --novalid blast2html.xsl blast.xml > result.html

Convert KEGG XML (KGML) to GEXF (see also http://www.biostars.org/p/85763/ ):

  $ xsltproc --novalid kgml2gexf.xsl "http://kgmlreader.googlecode.com/svn/trunk/KGMLReader/testData/kgml/non-metabolic/organisms/hsa/hsa04060.xml" > result.gexf

Insert PubMed records into a sqlite3 database:

  $ xsltproc --novalid stylesheets/bio/ncbi/pubmed2sqlite.xsl pubmed_result.xml | sqlite3 jeter.db

Convert PubMed to JSON:

  $ xsltproc --novalid stylesheets/bio/ncbi/pubmed2json.xsl pubmed_result.xml | python -mjson.tool

Create a simple Blast dot plot (see http://www.biostars.org/p/85258/ “Make a dotplot from blast alignment” ):

  $ xsltproc --novalid stylesheets/bio/ncbi/pubmed2sqlite.xsl pubmed_result.xml | sqlite3 jeter.db

Transform an NCBI taxonomy to Graphviz dot:

  $ xsltproc taxon2dot.xsl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=9606,9913,30521,562,2157" |\
      dot -oout.png -Tpng

Get the number of children for each term in gene-ontology (see https://www.biostars.org/p/102699/ “How to determine the terminal GO terms within GO DAG” )

  $ curl "http://archive.geneontology.org/latest-termdb/go_daily-termdb.rdf-xml.gz" |\
      gunzip -c |\
      xsltproc --novalid go2countchildren.xsl go.rdf - > count.tsv

Extract HTML form:

  $ curl -L google.com | xsltproc --html stylesheets/html/html2curl.xsl -
  '&ie=ISO-8859-1&hl=fr&source=hp&q=&btnG=Recherche%20Google&btnI=J'ai%20de%20la%20chance&gbv=1'
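The string printed above is a URL-encoded list of form fields. As a hedged illustration of how to read it, the sketch below decodes such a string with Python's standard `urllib.parse`. The helper name `decode_form_fields` is hypothetical and not part of this repository, and the input is a shortened copy of the output shown above.

```python
from urllib.parse import parse_qsl


def decode_form_fields(query):
    """Split a URL-encoded query string into (name, value) pairs.

    Empty fields such as 'q=' are kept, and a leading '&' is ignored.
    """
    return parse_qsl(query.lstrip("&"), keep_blank_values=True)


if __name__ == "__main__":
    # Shortened copy of the html2curl output shown above.
    sample = "&ie=ISO-8859-1&hl=fr&q=&btnG=Recherche%20Google"
    for name, value in decode_form_fields(sample):
        print(f"{name}\t{value!r}")
```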

Convert NCBI EInfo to HTML:

  $ xsltproc --novalid \
      stylesheets/bio/ncbi/einfo2html.xsl \
      http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi > index.html

Convert Blast XML to an HTML matrix:

  $ xsltproc --novalid \
      stylesheets/bio/ncbi/blast2matrix.xsl \
      blastn.xml > blast.html

Convert NCBI Taxonomy to Newick:

  $ xsltproc stylesheets/bio/ncbi/taxon2newick.xsl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=9606,10090,9031,7227,562"
  (((((((((((((((((((((((((((((((Homo_sapiens)Homo)Homininae)Hominidae)Hominoidea)Catarrhini)Simiiform
  es)Haplorrhini)Primates,((((((((Mus_musculus)Mus)Mus)Murinae)Muridae)Muroidea)Sciurognathi)Rodentia)
  Glires)Euarchontoglires)Boreoeutheria)Eutheria)Theria)Mammalia,(((((((((((((((Gallus_gallus)Gallus)P
  hasianinae)Phasianidae)Galliformes)Galloanserae)Neognathae)Aves)Coelurosauria)Theropoda)Saurischia)D
  inosauria)Archosauria)Archelosauria)Sauria)Sauropsida)Amniota)Tetrapoda)Dipnotetrapodomorpha)Sarcopt
  erygii)Euteleostomi)Teleostomi)Gnathostomata)Vertebrata)Craniata)Chordata)Deuterostomia,((((((((((((
  (((((((((((((((((Drosophila_melanogaster)melanogaster_subgroup)melanogaster_group)Sophophora)Drosoph
  ila)Drosophiliti)Drosophilina)Drosophilini)Drosophilinae)Drosophilidae)Ephydroidea)Acalyptratae)Schi
  zophora)Cyclorrhapha)Eremoneura)Muscomorpha)Brachycera)Diptera)Endopterygota)Neoptera)Pterygota)Dico
  ndylia)Insecta)Hexapoda)Pancrustacea)Mandibulata)Arthropoda)Panarthropoda)Ecdysozoa)Protostomia)Bila
  teria)Eumetazoa)Metazoa)Opisthokonta)Eukaryota,((((((Escherichia_coli)Escherichia)Enterobacteriaceae
  )Enterobacteriales)Gammaproteobacteria)Proteobacteria)Bacteria)cellular_organisms);
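A Newick string like the one above can be checked quickly without a tree library. The following is a minimal sketch, assuming the format shown here (nested parentheses, labels without branch lengths or quoting, a trailing semicolon). `newick_leaves` is a hypothetical helper, not something shipped with the stylesheets. It relies on the fact that a leaf label directly follows `(` or `,`, while an internal node label follows `)`.

```python
import re
from typing import List


def newick_leaves(newick: str) -> List[str]:
    """Return the leaf labels of a simple Newick string.

    Assumes labels contain no parentheses, commas or semicolons
    and that there are no branch lengths.
    """
    # Remove line breaks in case the string was hard-wrapped.
    flat = "".join(newick.split())
    # A leaf label is any label that directly follows '(' or ','.
    return re.findall(r"[(,]([^(),;]+)", flat)


if __name__ == "__main__":
    # Shortened, hand-made example in the same style as the output above.
    example = "(((Homo_sapiens)Homo)Homininae,((Mus_musculus)Mus)Murinae)root;"
    print(newick_leaves(example))
```

With the five taxon identifiers requested in the command above, such a check should list five leaves.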

Get all the child terms in the Disease Ontology ( http://disease-ontology.org/ ) under DOID:2914 (immune system disease):

  $ curl "http://www.berkeleybop.org/ontologies/doid.owl" |\
      xsltproc --stringparam ID "DOID:2914" do_children.xsl -

  #ID LABEL URI DESCRIPTION
  DOID:2914 immune system disease http://purl.obolibrary.org/obo/DOID_7 A disease of anatomical entity that is located_in the immune system.
  DOID:0060056 hypersensitivity reaction disease http://purl.obolibrary.org/obo/DOID_2914
  DOID:1205 hypersensitivity reaction type I disease http://purl.obolibrary.org/obo/DOID_0060056 An immune system disease that is an exaggerated immune response to allergens, such as insect venom, dust mites, pollen, pet dander, drugs or some foods.
  DOID:3044 food allergy http://purl.obolibrary.org/obo/DOID_1205 A hypersensitivity reaction type I disease that is an abnormal response to a food, triggered by the body's immune system.
  DOID:0060057 gluten allergic reaction http://purl.obolibrary.org/obo/DOID_3044
  DOID:3660 wheat allergic reaction http://purl.obolibrary.org/obo/DOID_3044
  DOID:4376 milk allergic reaction http://purl.obolibrary.org/obo/DOID_3044 A food allergy that results in adverse immune reaction to one or more of the proteins in cow's milk and/or the milk of other animals, which are normally harmless to the non-allergic individual.
  DOID:4377 egg allergy http://purl.obolibrary.org/obo/DOID_3044 A food allergy that is an allergy or hypersensitivity to dietary substances from the yolk or whites of eggs, causing an overreaction of the immune system which may lead to severe physical symptoms.
  DOID:4378 peanut allergic reaction http://purl.obolibrary.org/obo/DOID_3044 A food allergy that is an allergy or hypersensitivity to dietary substances from peanuts causing an overreaction of the immune system which in a small percentage of people may lead to severe physical symptoms.
  (...)

Generation of new java KNIME nodes

see https://github.com/lindenb/xslt-sandbox/wiki/Knime2java

Draw a Manhattan plot in SVG

see https://github.com/lindenb/xslt-sandbox/wiki/ManhattanPlot

Plot

Create a Makefile from an apache-maven pom.xml to download the required jars:

  $ xsltproc pom2make.xsl "http://central.maven.org/maven2/org/eclipse/jetty/jetty-server/9.3.0.M2/jetty-server-9.3.0.M2.pom" 2> /dev/null

  lib.dir=lib
  all.jars = $(addprefix ${lib.dir}/,$(sort org/eclipse/jetty/jetty-server/9.3.0.M2/jetty-server-9.3.0.M2.jar javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar org/eclipse/jetty/jetty-http/9.3.0.M2/jetty-http-9.3.0.M2.jar org/eclipse/jetty/jetty-util/9.3.0.M2/jetty-util-9.3.0.M2.jar javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar org/eclipse/jetty/jetty-io/9.3.0.M2/jetty-io-9.3.0.M2.jar org/eclipse/jetty/jetty-util/9.3.0.M2/jetty-util-9.3.0.M2.jar javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar org/eclipse/jetty/jetty-jmx/9.3.0.M2/jetty-jmx-9.3.0.M2.jar org/eclipse/jetty/jetty-util/9.3.0.M2/jetty-util-9.3.0.M2.jar javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar))
  .PHONY:all
  all: ${all.jars}
  ${all.jars} :
  	mkdir -p $(dir $@) && curl -o $@ "http://central.maven.org/maven2/$(patsubst ${lib.dir}/%,%,$@)"

see https://github.com/lindenb/xslt-sandbox/wiki/Maven2Make
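The jar paths in the generated Makefile follow the standard Maven repository layout: dots in the group id become slashes, followed by the artifact id, the version, and `artifactId-version.jar`. The sketch below only illustrates that mapping. `maven_jar_path` is a hypothetical helper written for this example, and it says nothing about how pom2make.xsl resolves dependencies internally.

```python
def maven_jar_path(group_id: str, artifact_id: str, version: str) -> str:
    """Build the repository-relative jar path for a set of Maven coordinates."""
    return "/".join([
        group_id.replace(".", "/"),        # org.eclipse.jetty -> org/eclipse/jetty
        artifact_id,
        version,
        f"{artifact_id}-{version}.jar",
    ])


if __name__ == "__main__":
    # Coordinates taken from the Makefile output shown above.
    print(maven_jar_path("org.eclipse.jetty", "jetty-server", "9.3.0.M2"))
    print(maven_jar_path("javax.servlet", "javax.servlet-api", "3.1.0"))
```

The same jar appears several times in `all.jars` because it is reached through more than one dependency. GNU make's `$(sort ...)` removes the duplicates, so each file is downloaded once.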

Plotting project Tycho data:

see https://github.com/lindenb/xslt-sandbox/wiki/Tycho

http://i.imgur.com/gBZCVTX.png

see https://github.com/lindenb/xslt-sandbox/wiki/PubmedTrending

http://i.imgur.com/a1VAdCa

XML to dot

Show an XML tree as Graphviz dot:

  $ curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&retmode=xml&rettype=fasta" |\
      xsltproc xml2dot.xsl -

  digraph G
  {
  idp0 [label="<ROOT>"];
  idp5184 [label="TSeqSet",shape=oval]
  idp5120 [label="TSeq",shape=oval]
  idp228784 [label="TSeq_seqtype",shape=oval]
  idp234608 [label="@value=nucleotide",shape=box]
  idp234608 -> idp228784;
  idp228784 -> idp5120;
  (...)
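To make the shape of that output easier to follow, here is a rough Python equivalent of the idea: one oval node per element, one box node per attribute, and edges pointing from child to parent as in the listing above. This is a sketch under assumptions, not a port of xml2dot.xsl. The function name `xml_to_dot` and the `n0`, `n1`, ... identifiers are invented for the example, text nodes and the `<ROOT>` node are omitted, and double quotes inside labels are not escaped.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_dot(xml_text: str) -> str:
    """Render an XML document as a Graphviz digraph (child -> parent edges)."""
    ids = count()
    lines = ["digraph G {"]

    def visit(elem, parent_id):
        node_id = f"n{next(ids)}"
        lines.append(f'  {node_id} [label="{elem.tag}",shape=oval];')
        if parent_id is not None:
            lines.append(f"  {node_id} -> {parent_id};")
        # Attributes become box-shaped nodes attached to their element.
        for name, value in elem.attrib.items():
            attr_id = f"n{next(ids)}"
            lines.append(f'  {attr_id} [label="@{name}={value}",shape=box];')
            lines.append(f"  {attr_id} -> {node_id};")
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(xml_to_dot('<TSeq><TSeq_seqtype value="nucleotide"/></TSeq>'))
```

The printed text can be piped to `dot -Tpng` in the same way as the stylesheet output.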

Blast to fasta

see https://www.biostars.org/p/14913/

  $ xsltproc --novalid blast2fasta.xsl blastn.xml

PSI/Biogrid to SQL

  $ xsltproc -o tmp.sql psi2sql.xslt BIOGRID-ALL-3.4.129.psi.xml
  $ sqlite3 db.sqlite3 < tmp.sql

Find the interactions for B4DG32 ( http://www.uniprot.org/uniprot/B4DG32 ):

  $ sqlite3 -header db.sqlite3 'select distinct I1.shortLabel,I2.shortLabel from interaction as L,interaction2interactor as I2I1, interaction2interactor as I2I2, interactor as I1 ,interactor as I2, xref as X1 where X1.pk="B4DG32" and X1.interactor_pk=I1.pk and I2I1.interaction_pk = L.pk and I2I1.interactor_pk = I1.pk and I2I2.interaction_pk = L.pk and I2I2.interactor_pk = I2.pk'
  shortLabel|shortLabel
  SH2D3C|BCAR1
  SH2D3C|EFS
  SH2D3C|EGFR
  SH2D3C|LYN
  SH2D3C|NEDD9
  SH2D3C|SH2D3C
  SH2D3C|SNCAIP
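The query joins `interaction2interactor` twice so that both partners of each interaction can be reached from a single accession. The sketch below replays the same join from Python against an in-memory toy database. Table and column names are taken from the query above, but the column types and the handful of rows are assumptions made up for illustration. The real schema is whatever psi2sql.xslt emits.

```python
import sqlite3

# Same join as the command-line query, written with explicit JOINs
# and a bound parameter instead of a quoted literal.
QUERY = """
SELECT DISTINCT I1.shortLabel, I2.shortLabel
FROM interaction AS L
JOIN interaction2interactor AS I2I1 ON I2I1.interaction_pk = L.pk
JOIN interaction2interactor AS I2I2 ON I2I2.interaction_pk = L.pk
JOIN interactor AS I1 ON I1.pk = I2I1.interactor_pk
JOIN interactor AS I2 ON I2.pk = I2I2.interactor_pk
JOIN xref AS X1 ON X1.interactor_pk = I1.pk
WHERE X1.pk = ?
ORDER BY 2
"""


def toy_db():
    """Create a tiny in-memory database (hypothetical rows, assumed column types)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE interactor(pk INTEGER PRIMARY KEY, shortLabel TEXT);
        CREATE TABLE interaction(pk INTEGER PRIMARY KEY);
        CREATE TABLE interaction2interactor(interaction_pk INTEGER, interactor_pk INTEGER);
        CREATE TABLE xref(pk TEXT, interactor_pk INTEGER);
        INSERT INTO interactor VALUES (1, 'SH2D3C'), (2, 'BCAR1'), (3, 'EGFR');
        INSERT INTO interaction VALUES (10), (11);
        INSERT INTO interaction2interactor VALUES (10, 1), (10, 2), (11, 1), (11, 3);
        INSERT INTO xref VALUES ('B4DG32', 1);
    """)
    return conn


def partners(conn, accession):
    """Return (query label, partner label) pairs for one xref accession."""
    return conn.execute(QUERY, (accession,)).fetchall()


if __name__ == "__main__":
    for row in partners(toy_db(), "B4DG32"):
        print("|".join(row))
```

Because both aliases range over every participant of an interaction, the queried protein is also paired with itself, which is why a `SH2D3C|SH2D3C` row shows up in the real output. Adding a condition such as `I1.pk <> I2.pk` would drop it.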

Get the coverage of a Blast query:

  $ xsltproc stylesheets/bio/ncbi/blast2coverage.xsl blastn.xml

Output:

  #ID DEF POS LENGTH CONSENSUS DEPTH
  gi|9626372|ref|NC_001422.1| Enterobacteria phage phiX174 sensu lato, complete genome 1 5386 GGGGGGGGGGGGGGGGGG 18
  gi|9626372|ref|NC_001422.1| Enterobacteria phage phiX174 sensu lato, complete genome 2 5386 AAAAAAAAAAAAAAAAAA 18
  gi|9626372|ref|NC_001422.1| Enterobacteria phage phiX174 sensu lato, complete genome 3 5386 GGGGGGGGGGGGGGGGGG 18
  gi|9626372|ref|NC_001422.1| Enterobacteria phage phiX174 sensu lato, complete genome 4 5386 TTTTTTTTTTTTTTTTTT 18
  gi|9626372|ref|NC_001422.1| Enterobacteria phage phiX174 sensu lato, complete genome 5 5386 TTTTTTTTTTTTTTTTTT 18
  gi|9626372|ref|NC_001422.1| Enterobacteria phage phiX174 sensu lato, complete genome 6 5386 TTTTTTTTTTTTTTTTTT 18
  gi|9626372|ref|NC_001422.1| Enterobacteria phage phiX174 sensu lato, complete genome 7 5386 TTTTTTTTTTTTTTTTTT 18
  gi|9626372|ref|NC_001422.1| Enterobacteria phage phiX174 sensu lato, complete genome 8 5386 AAAAAAAAAAAAAAAAAA 18
  gi|9626372|ref|NC_001422.1| Enterobacteria phage phiX174 sensu lato, complete genome 9 5386 TTTTTTTTTTTTTTTTTT 18
  (...)

Convert GenBank gbc to GTF

Should work with simple GenBank files (tested with a simple virus).

  $ xsltproc --novalid stylesheets/bio/ncbi/gb2gtf.xsl input.gbc.xml | sort -t ' ' -k4,4n > out.gtf

Contribute

License

The project is licensed under the MIT license.

Author

Pierre Lindenbaum PhD @yokofakun