Polymorphic Edge Detection - An efficient polymorphism detector for NGS data
Polymorphic Edge Detection (PED) is the analysis flow for DNA polymorphism detection from short reads of next generation sequencer (NGS). I developed two methods to detect polymorphisms based on detection of the polymorphic edge. One is based on bidirectional alignment and the other is based on comparison of k-mers. Examples of PED result and useful information are shown in Web pages (English) (Japanese) (Paper)(Blog).
DNA polymorphism is any difference of DNA sequence between individuals. These differences are single nucleotide polymorphism (SNP), insertion, deletion, inversion, translocation and copy number variation. On the non-polymorphic region, sequences between two individuals are completely same. At the position of SNP, or at the beginning of other polymorphisms, the nucleotide must be different between individuals.
Chr11 80443004
|
TTTTTAATTGAAAAGGCATTAAGCTGGGTCTATGCAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTAGATAGGTAGAAAAAAAAAACCACTATCAGCAACA Reference sequence matching from 5'-end
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| | | | |||||||| | | |
TTTTTAATTGAAAAGGCATTAAGCTGGGTCTATGCAGTGTGTGTGTGTGTGTGTGTGTGTGTGTAGATAGGTAGAAAAAAAAAACCACTATCAGCAACAGT Short read sequence
||| || | | | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTTAATTGAAAAGGCATTAAGCTGGGTCTATGCAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTAGATAGGTAGAAAAAAAAAACCACTATCAGCAACAGT Reference sequence matching from 3'-end
|
Chr11 80442977
Short read sequence is aligned with reference sequence from both 5’- and 3’-ends. Positions indicated over and bellow of the alignment are first mismatched base, i.e., polymorphic edge. The bidirectional alignment clearly indicates two bases (GT) deletion in the short read. The bidirectional can detect not only deletion but also SNP, insertion, inversion and translocation.
Individual_A AAATGGTACATTTATATTAT
Individual_B AAATGGTACATTTATATTAC
All short reads from Individual_A and Individual_B are sliced to k-mer (e.g. k = 20) in each position. For example, the Individual_A has the k-mer sequence of AAATGGTACATTTATATTAT but does not have AAATGGTACATTTATATTAC. On the other hand, the Individual_B has the AAATGGTACATTTATATTAC but does not have AAATGGTACATTTATATTAT. The last base of k-mer of Individual_A is T, and Individual_B is C. The last base of k-mers must be SNP or edge of insertion, deletion, inversion, translocation or copy number variation. The k-mer method detects edges of polymorphism by difference of last base of k-mers. This method enables to detect polymorphisms by direct comparison of NGS data.
perl download.pl accession=SRR11542244
perl ped.pl target=SRR11542244,ref=SARS-CoV-2
or
docker run -v `pwd`:/work -w /ped akiomiyao/ped perl download.pl accession=SRR11542244,wd=/work
docker run -v `pwd`:/work -w /ped akiomiyao/ped perl ped.pl target=SRR11542244,ref=SARS-CoV-2,wd=/work
Run time of ped.pl is only two minutes for one accession using a standard desktop computer installed Linux (Ubuntu).
If you want to analyze your private sequences,
cd ped
mkdir your_sample_name
mkdir your_sample_name/read
cp somewhere/read_data.fastq your_sample_name/read
perl ped.pl target=your_sample_name,ref=SARS-CoV-2
or
docker run -v `pwd`:/work -w /ped akiomiyao/ped perl ped.pl target=your_sample_name,ref=SARS-CoV-2,wd=/work
Target name for ped.pl is the directory name.
Detailed Link for COVID-19 analysis
The ped.pl is a multithreaded (multiprocess) script, suitable for the multi-core CPU like as 4 or 8 cores.
Of course, the ped.pl can run with the 2 or single core machine, but slow.
The ped.pl runs on Linux (or FreeBSD) machine and Mac with at least 4 GB RAM and 1 TB hard disk (or SSD).
Following is a demonstration of spontaneous SNPs and SVs detection from a Caenorhabditis elegans with 250-times repeated generations.
cd ped
perl download.pl accession=ERR3063486
perl download.pl accession=ERR3063486
perl ped.pl target=ERR3063487,control=ERR3063486,ref=WBcel235
Installation of fastq-dump and ped scripts is described below.
The docker container for Linux includes fastq-dump and ped scripts.
docker run -v `pwd`:/work -w /ped akiomiyao/ped perl download.pl accession=ERR3063486,wd=/work
docker run -v `pwd`:/work -w /ped akiomiyao/ped perl download.pl accession=ERR3063487,wd=/work
docker run -v `pwd`:/work -w /ped akiomiyao/ped perl ped.pl target=ERR3063487,control=ERR3063486,ref=WBcel235,wd=/work
For analyses of metagenome or mixed genome (e.g. SARS-CoV-2 data from a patient), using count data is recommended.
File name Description
ERR3063487.aln Bidirectional alignment
ERR3063487.bi.primer Primer data for PCR
ERR3063487.bi.snp SNP data (original format)
ERR3063487.bi.snp.count SNP data (Showing snp counts from aln data)
ERR3063487.index Index file for alignemt search
ERR3063487.log Process log
ERR3063487.report Log of ped.pl
ERR3063487.sv Structural variation data
ERR3063487.sv.count Structural variation data (Showing snp counts from aln data)
ERR3063487.sv.primer Primer data for PCR
ERR3063487.vcf SNP and SV data (vcf format, for IGV)
ERR3063487.full.vcf SNP and SV data (vcf format, full output)
ERR3063487.count.vcf SNP and SV data (vcf format, full output with unverified data)
git clone https://github.com/akiomiyao/ped.git
sudo apt install git (Ubontu)
sudo yum install git (CentOS)
sudo pkg install git (FreeBSD)
git pull
sudo apt install curl (Ubontu)
sudo yum install curl (CentOS)
sudo pkg install curl (FreeBSD)
or
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo apt install docker
sudo apt install docker.io
sudo docker pull akiomiyao/ped
sudo docker stats
sudo docker kill Container_ID
After the new login, docker commands can be execute with your account.
sudo usermod -a -G docker your_username
Name Description
97103 Water melon (Citrullus lanatus subsp. vulgaris) cv. 97213v2
Asagao1.2 Asagao (Ipomoea nil) Japanese morning glory
B73v4 Corn (Zea mays B73) RefGen v4
Bomo Silkworm (Bombyx mori) Genome assembly (Nov.2016)
Bsubtilis Bacillus subtilis subsp. subtilis str. 168 (NC_000913.3)
Camarosa1.0 Strawberry (Fragaria x ananassa) Camarosa genome assembly v1.0
Camellia20200506 Black tea (Cammellia sinensis) assembly
CharlestonGray2 Water melon (Citrullus lanatus subsp. vulgaris) cv. Charleston Gray v2
CsinensisHz2 Sweet orange (Citrus sinensis) Hzau v2.0
Ecoli Escherichia coli str. K-12 substr. MG1655 (NC_000913.3)
Fielder1 Wheat (Triticum aestivum L. cv. Fielder) Version 1
GRCm38 Mouse (Mus musculus) Genome Reference Consortium Mouse Build 38
Gifu1.2 Lotus japonicus Gifu
Gmax275v2.0 Soybean (Glycine max) genome project assemble version 2
HBV Hepatitis B virus (strain ayw, NC_003977.2)
IBSC2 Barley (Hordeum vulgare L. cv. Molex) Release 47
IRGSP1.0 Rice (Olyza sativa L. cv. Nipponbare) version 1.0
IWGSC1.0 Wheat (Triticum aestivum L. cv. Chinese Spring) Version 1.0
KOD1 Thermococcus kodakarensis KOD1 (NC_006624.1)
LJ3 Lotus japonicus MG20 v3.0 (Download from https://lotus.au.dk/data/download into LJ3 directory)
Lactuca_sativa Lettuce (Lactuca sativa)
RIB40 Aspergillus oryzae RIB40 (ASM18445v3)
Reikou2.3 Strawberry Reikou genome v2.3
SARS-CoV-2 Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, NC_045512.2)
SL3 Tomato (Solanum lycopersicum cv. Heinz 1706) Build 3.0
SScrofa11.1 Pig (Sus scrofa) Release-97
TAIR10 Arabidopsis thaliana version TAIR10
UMD3.1 Cow (Bos taurus L1 Dominette 01449) UMD 3.1
Vcholerae Vibrio cholerae O1 biovar El Tor str. N16961
WBcel235 Caenorhabditis elegans WBcel235
danRer11 Zebrafish (Danio rerio) Genome Reference Consortium Zebrafish Build 11
dmel626 Drosophila melanogaster
hg19 Human (Homo sapiens) Genome Reference Consortium Human Build 19
hg38 Human (Homo sapiens) Genome Reference Consortium Human Build 38
sacCer3 Saccharomyces cerevisiae (UCSC sacCer3)
If fetch the fasta file is failed by the script, fetch the file separately and save in the reference directory and run the script.
Specify only reference, ped.pl will make the reference data only.
For example,
perl ped.pl ref=reference,file=fasta_file_name
mkdir IRGSP1.0
cp somewhere/IRGSP-1.0_genome.fasta.gz IRGSP1.0
perl ped.pl ref=IRGSP1.0,file=IRGSP-1.0_genome.fasta.gz
ERR3063487 specific SNPs against ERR3063486 will be detected.
perl ped.pl target=ERR3063487,control=ERR3063486,ref=WBcel235,method=kmer
perl ped.pl target=ERR3063487,ref=WBcel235,method=kmer
perl ped.pl target=ERR3063487,control=ERR3063486,method=kmer
A part of SNP list of the bidirectional method is
1 3189273 T C 50 0 15 9 H
1 3189333 T C 50 0 14 0 R
1 3189345 A G 50 0 13 17 H
1 3189429 G A 50 0 15 0 R
1 3189498 G A 50 0 0 39 M
1 3189503 T G 50 0 2 1 R
1 3189527 T G 50 0 10 0 R
1 3189609 T A 50 0 41 1 R
1 3189704 C G 50 0 0 11 M
1 3189741 A G 50 0 0 23 M
1 3189819 A G 50 0 0 4 R
1 3189833 C T 50 0 0 46 M
Column 1: Chromosome number
Column 2: Position of SNP
Column 3: Reference base at the SNP position
Column 4: Alternative base
Column 5: Number of detected reads with alternative base
Column 6: Number of reads in the control sort_uniq file with control type polymorphism
Column 7: Number of reads in the control sort_uniq file with target type polymorphism
Column 8: Number of reads in the target sort_uniq file with control type polymorphism
Column 9: Number of reads in the target sort_uniq file with target type polymorphism
Column 10: Genotype (M: homozygous, H: heterozygous, R: reference type, N: not applicable)
Following columns will be appeared in the primer file.
Column 11: Left primer sequence
Column 12: Right primer sequence
Column 13: Estimated size of the amplified fragment
Column 14: Upstream and downstream sequence around the SNP
The algorithm of detection primer sequences has been developed by my experience of PCR experiment.
Column 1: Chromosome number of junction detected by 5’ to 3’ matching
Column 2: Position of junction detected by 5’ to 3’ matching
Column 3: Chromosome number of junction detected by 3’ to 5’ matching
Column 4: Position of junction detected by 3’ to 5’ matching
Column 5: Direction
Column 6: Type of polymorphism (insertion, deletion, inversion and translocation)
Column 7: Size of insertion or deletion
Column 8: Number of reads in the control sortuniq file with control type polymorphism
Column 9: Number of reads in the control sort_uniq file with target type polymorphism
Column 10: Number of reads in the target sort_uniq file with control type polymorphism
Column 11: Number of reads in the target sort_uniq file with target type polymorphism
Column 12: Genotype (M: homozygous, H: heterozygous, R: reference type, N: not applicable)
Column 13: Sequence between junctions ( is no sequence within junctions)
Following columns will be appeared in the primer file.
Column 14: Left primer sequence
Column 15: Right primer sequence
Column 16: Estimated size of the amplified fragment
Column 17: Upstream and downstream sequences around the junction
The algorithm of detection primer sequences has been developed by my experience of PCR experiment.
- A part of SNP result by the *k*-mer method is
X 14544549 ACAGTTGTATTTTTCAATT A A G r 0 0 0 13 0 27 0 0 13 0 0 30 M
X 14544549 TACACCACTGTAAGTCAAC A A G f 13 0 0 0 0 0 27 0 13 0 0 30 M
X 14592687 AAAAATTTGGATTTTTGGA G G GT r 0 15 0 0 20 20 0 1 5 0 10 0 R
X 14592707 AAAAATTTGGATTTTTGGA G G AG f 0 0 15 0 20 0 20 0 5 0 10 0 R
X 14615799 ACCCCTATATATAGTGTTT T T AT r 25 0 0 0 16 0 0 12 26 0 18 0 R
X 14615819 ACCCCTATATATAGTGTTT T T AT f 0 0 0 25 12 0 0 16 26 0 23 0 R
X 14624654 GTTGTGCTTTATTTATTTG A A AT r 0 0 0 19 15 0 0 20 19 0 19 0 R
X 14624674 GTTGTGCTTTATTTATTTG A A AG f 18 0 0 0 18 0 16 0 19 0 17 0 R
X 14632040 ATTTTTTTCACAAACAAGG T T A r 26 0 0 0 0 0 0 23 21 0 0 21 M
X 14632040 CTTAAATCGGAGAACAAAT T T A f 0 0 0 25 22 0 0 0 21 0 0 21 M
X 14682371 TCAGTTCAATCACGATAAA A A AT r 0 0 0 19 10 0 0 19 24 0 20 0 R
Column 1: Chromosome number
Column 2: Position of SNP
Column 3: (k-1)-mer (k = 20)
Column 4: Base of reference at the position of SNP
Column 5: Base of control
Column 6: Base of target
Column 7: Direction of k-mer sequence on the reference
Column 8: Number of k-mer with A at the end of k-mer in the control
Column 9: Number of k-mer with C at the end of k-mer in the control
Column 10: Number of k-mer with G at the end of k-mer in the control
Column 11: Number of k-mer with T at the end of k-mer in the control
Column 12: Number of k-mer with A at the end of k-mer in the target
Column 13: Number of k-mer with C at the end of k-mer in the target
Column 14: Number of k-mer with G at the end of k-mer in the target
Column 15: Number of k-mer with T at the end of k-mer in the target
Column 16: Number of reads in the control sort_uniq file with control type base
Column 17: Number of reads in the control sort_uniq file with target type base
Column 18: Number of reads in the target sort_uniq file with control type base
Column 19: Number of reads in the target sort_uniq file with target type base
Column 20: Genotype (M: homozygous, H: heterozygous, R: reference type, N: not applicable)
Following columns will be appeared in the primer file.
Column 21: Left primer sequence
Column 22: Right primer sequence
Column 23: Estimated size of the amplified fragment
Column 24: Upstream and downstream sequence around the SNP
The algorithm of detection primer sequences has been developed by my experience of PCR experiment.
## Detection of polymorphisms between control and target
- SNPs between ERR3063486 (wild-type) and ERR3063487 (mutant)
I 27950 A T 31 0 0 17 M
I 892680 C G 8 1 5 3 H
I 1196268 T A 9 0 8 4 H
I 1380502 A T 22 0 0 13 M
I 1728826 T G 8 0 6 3 H
I 3203676 T G 16 1 11 5 H
I 3407954 C A 28 0 0 23 M
I 3656814 A C 8 1 8 4 H
I 5001132 T G 8 1 6 3 H
I 6324213 G T 24 0 0 21 M
I 7249395 T G 19 1 7 4 H
I 7263091 T G 14 1 9 6 H
I 9136539 T A 20 0 0 16 M
I 10137843 T G 6 0 9 4 H
I 11168832 A C 5 1 5 3 H
I 14097862 A T 17 1 9 13 H
II 86891 A G 7 0 8 4 H
II 271122 C A 16 0 6 3 H
II 1179320 C A 11 0 0 17 M
II 2500482 C A 15 0 0 24 M
II 3552886 A C 15 1 5 3 H
II 3624396 A C 12 1 9 5 H
II 3648966 C T 9 0 0 27 M
II 3771956 G C 19 0 0 16 M
II 3935824 A C 7 0 6 3 H
II 3979685 T G 9 0 5 3 H
II 8284226 C A 19 0 0 20 M
II 8447001 T G 10 0 9 4 H
II 8553707 T A 21 0 0 6 M
II 9410187 C T 23 0 0 18 M
II 9937543 T G 21 0 0 18 M
II 10629519 A C 17 1 12 8 H
II 10685303 T A 21 0 0 16 M
II 12732768 A C 13 1 6 3 H
II 14096056 T G 6 0 5 4 H
III 198231 T A 16 0 0 18 M
III 3091906 T G 19 0 9 4 H
III 3643248 G C 6 0 6 3 H
III 4824486 C G 33 0 0 23 M
III 7532164 T G 8 1 6 3 H
III 9723566 G A 25 0 0 23 M
III 11532166 C T 15 0 1 18 M
III 13092063 C T 13 0 6 6 H
III 13273147 A T 15 1 6 4 H
III 13350315 A C 14 0 7 4 H
IV 554740 A C 11 1 9 4 H
IV 876681 A G 20 0 7 4 H
IV 1159003 A C 13 0 7 6 H
IV 1327294 G A 26 0 0 31 M
IV 2417073 G A 32 0 0 21 M
IV 2858610 C T 21 0 0 14 M
IV 3931877 G A 13 0 0 20 M
IV 4298928 A C 11 1 9 4 H
IV 5200355 G T 27 0 0 23 M
IV 6481216 C G 17 0 0 8 M
IV 6796145 C T 16 0 0 19 M
IV 6967218 G A 25 0 0 25 M
IV 8053747 T A 23 0 0 19 M
IV 8951120 T G 10 1 13 6 H
IV 9709645 C A 22 0 0 27 M
IV 10195034 T G 14 1 8 4 H
IV 13672183 C T 18 0 0 18 M
IV 14217812 A C 9 0 6 3 H
IV 14760376 T G 9 0 5 3 H
IV 16870604 A C 12 1 8 4 H
V 7011 T G 10 1 11 5 H
V 974819 A C 6 0 9 4 H
V 3052728 A C 9 1 10 5 H
V 3277678 G A 22 0 1 19 M
V 3786240 T A 23 0 0 17 M
V 4261370 T G 14 0 9 4 H
V 5134172 C T 19 0 17 10 H
V 7816318 A C 20 1 8 4 H
V 9771516 A T 25 0 18 12 H
V 10513400 A C 11 0 9 4 H
V 11310419 G T 15 0 0 18 M
V 15880652 T C 20 1 6 3 H
V 19657843 T A 25 0 0 24 M
V 19718778 G A 11 0 0 11 M
V 19733914 A C 6 0 6 3 H
V 19742410 T G 6 1 5 3 H
V 20202209 T G 5 0 5 3 H
V 20413571 T G 13 1 6 3 H
X 2843588 A C 13 0 9 4 H
X 4194330 T C 23 0 0 22 M
X 4247242 T G 7 1 6 3 H
X 5446454 C T 21 0 0 7 M
X 6994561 A T 29 0 0 29 M
X 7299494 A G 7 0 6 3 H
X 7586849 T G 8 1 8 4 H
X 10486673 A T 31 0 0 23 M
X 12500491 T G 7 1 6 3 H
X 14544549 A G 13 0 0 30 M
X 14632040 T A 21 0 0 21 M
X 14735899 T G 8 0 5 3 H
X 15395882 A G 17 0 13 6 H
X 15815152 T G 13 1 5 3 H
X 16861146 C T 9 0 0 11 M
X 16964164 T G 14 0 5 3 H
- Structural variations between ERR3063486 (wild-type) and ERR3063487 (mutant)
I 834776 I 834746 f deletion -12 6 0 7 4 H TCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT
I 1251597 I 1251573 f deletion -2 5 0 8 4 H TGTGTGTGTGTGTGTGTGTGTGTGT
I 1412142 I 1411952 f insertion 102 12 1 6 3 H CCCCCCGCTGACCCCAAACCAATATCCCGTCAAAAAACGAAAATTCATATTTTTCTTAATCTACAGTAATCCTACAGTGCCCCTACA
I 1560682 I 1560674 f deletion -1 17 0 0 19 M TTTTTTTT
I 1560682 I 1561254 r inversion N 13 0 0 6 M TTAAAGGTGGTGTGGTCGAATTTTTTTT
I 2160919 I 2160898 f deletion -1 9 0 8 4 H TTTTTTTTCAAAAAAAAAAAA
I 2384261 I 2384244 f insertion 1 14 1 8 4 H CAAAAAAAAAAAAAA
I 2715230 III 12981863 r translocation N 5 1 5 3 H CGTATTGCACAGCACATTTGACGCGCAAAAT
I 5081495 I 5081478 f deletion -2 7 1 6 4 H TTTTTTTTTTTTTTTTTT
I 8028077 I 8028066 f deletion -1 6 0 6 3 H TTTTTTTTTTT
I 8028078 I 8028065 f insertion 1 6 0 6 3 H TTTTTTTTTTT
I 9825014 I 9825007 f insertion 1 23 0 8 13 H AAAAA
I 10622455 IV 9192034 f translocation N 5 1 6 3 H GTTCAAATAAAAATATTTTTTT
I 10887954 I 10887958 f deletion -6 11 0 1 10 M A
I 11005509 I 11005489 f deletion -1 5 1 0 5 M AAATTTTTTTTTTTTTTTTT
I 12856424 I 12856415 f deletion -1 14 0 8 5 H TTTTTTTTT
I 13734806 I 13734788 f deletion -1 5 0 6 4 H TTTTTTTTTTTTTTTTTT
II 191614 II 191601 f insertion 1 11 0 9 4 H AAAAAAAAAAA
II 948033 II 948099 f deletion -69 20 0 1 15 M AT
II 1777672 II 1770125 f insertion 7506 8 1 9 4 H ATGGTGAGTAGCCGGTAATTTCATAGTTATTGAAATTTGA
II 2361947 II 2361931 f deletion -1 10 1 0 15 M TTTTTCTTTTTTTTTT
II 4463297 V 1000739 r translocation N 6 0 6 4 H TTTCGATTTTCCAGAAAATCAAAAAAAAA
II 4895056 V 18778607 r translocation N 5 0 5 3 H TTCTACGTTTTGCAATGTGTTTTTT
II 5473562 II 3675168 r inversion N 8 1 5 3 H TTTTACTCAGTTATGTTTTTTTT
II 12746320 III 4520340 f translocation N 6 1 5 3 H TGTAAAATTGTTTTTTTTT
II 13112130 V 20740040 f translocation N 6 0 7 4 H AAAAAAAAACGCATGCATTTTTCG
II 13327324 II 13327697 r inversion N 6 1 6 5 H TTTTGACACTTTTTAGTAATAAATGCAAAAAAAATCAACAAAAATAGACTAAACATTGTAAAAACTGTAAAAACTAAGAGAAAAAAT
III 1663181 III 1663357 f deletion -209 6 1 7 5 H TTTTTTCCAGAAATTAATATTTCTAGAAAAAT
III 2300741 X 15839886 r translocation N 19 0 11 6 H TTAAAGGTGGAGTAGCGCCAGTGGGAAAATTGCTTTAAAACATGCCTATGGTACCACAATGACCAAATATCAT
III 2520715 X 97782 f translocation N 7 1 6 5 H TATTTTTTCGCCATTTTTTTT
III 2985966 IV 876821 f translocation N 5 1 6 3 H AAAAAAATTTTTTTTTT
III 3566956 III 3566962 f deletion -48 8 1 6 4 H TCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT
III 6231151 III 6231140 f deletion -1 14 0 6 3 H AAAAAAAAAAA
III 10402397 III 10402387 f deletion -2 5 1 7 4 H TTTTTTTTTTT
III 11280814 X 2272016 r translocation N 12 0 6 3 H TTTTTTTCAAAAAAAAAAAAA
III 12543896 III 12543907 f deletion -62 8 1 9 4 H AAATTTCCGGAAAACATGCAAATTGCCAGAATTGAAAATTTCCGGCAAAT
III 13419473 III 13419443 f insertion 6 5 1 10 5 H TGTGTGTGTGTGTGTGTGTGTGT
IV 1786345 IV 1786337 f deletion -1 15 0 0 12 M AAAAAAAA
IV 2112309 IV 2112287 f deletion -1 7 1 5 3 H TTTTTTGTTTTTTTTTTTTTTT
IV 2314588 IV 2314573 f deletion -1 10 1 8 4 H TTTTTTTTTTTTTTT
IV 2445289 IV 2445276 f deletion -1 15 0 5 3 H TTTTTTTTTTTTT
IV 3192017 IV 3192001 f deletion -1 10 1 8 5 H AAAAAAAAAAAAAAAA
IV 3192018 IV 3192000 f insertion 1 10 0 7 4 H AAAAAAAAAAAAAAAA
IV 3336297 IV 3336328 f deletion -31 21 0 0 23 M A
IV 3867139 IV 3867116 f deletion -2 11 1 6 3 H ATATATATATATATATATATATAT
IV 4399486 IV 4399441 f insertion 2 8 0 10 6 H GAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA
IV 4489131 IV 4489122 f deletion -1 16 0 1 17 M TTTTTTTTT
V 1027708 V 1027700 f deletion -39 9 1 10 5 H GCCTATGGCCTACGCCTATGGCCTACGCCTATGGCCTACGCCTATG
V 1989616 V 1989642 f deletion -3 6 1 6 3 H GATAAAAACTACTTGGATAAATGA
V 9026732 V 9026728 f deletion -3 6 1 7 4 H ATTATT
V 11884926 II 3151004 r translocation N 5 1 6 5 H GTGCCGAGTGCCGATCGGCACAATGTG
V 18672436 IV 2681633 r translocation N 7 1 5 3 H GGGAAAATTGCTTTAAAACATGCCTATGGTACTACAA
V 18890234 V 18892184 r inversion N 6 1 6 3 H TTTTTATTGAAAACTAGTATAAAAATATA
V 20123760 V 20123893 f deletion -150 12 0 7 4 H GGGGTTCGAACCCCGG
X 99547 X 99523 f deletion -1 8 1 9 5 H AAAAAAAATTTTTTTTTTTTTTTT
X 438457 X 438446 f insertion 1 12 0 5 5 H TTTTTTTTT
X 562164 X 562155 f insertion 1 22 0 0 22 M AAAAAAA
X 1522918 X 1522902 f deletion -1 6 1 5 3 H AAAAAAAAAAAAAAAA
X 2498483 X 2498466 f deletion -1 6 1 5 3 H TTTTTTTTTTTTTTTTT
X 3823840 X 3823828 f deletion -1 10 0 8 4 H AAAAAAAAAAAA
X 5885390 X 5885377 f deletion -1 11 0 10 5 H TTTTTTTTTTTTT
X 7312229 X 11435923 r inversion N 6 0 5 4 H TATTCACCCCGTTCGACTGTGCAATGGGTTTAATCTATTCACTTTGTAAATCAAAGAATCGACGACCGCCTCCTGAA
X 10023790 III 7850798 r translocation N 5 0 6 3 H ATATCAAAATTTCATTTTTTTT
X 14258766 III 303520 f translocation N 6 0 9 5 H TCACAAAATTCTTTGGCCGCCCCAAGTGTCCTAACTCGAAG
## Detection of Copy Number Variation
perl ped.pl target=ttm2,control=ttm5,ref=IRGSP1.0,method=cnv
```
Akio Miyao, Ph.D. miyao@affrc.go.jp
Institute of Crop Science / National Agriculture and Food Research Organization
2-1-2, Kannondai, Tsukuba, Ibaraki 305-8518, Japan
Version 1.6 Scripts for grid engine have been removed.
Version 1.5 Threads in ped.pl have been changed to forking process.
Version 1.4 Add clipping of short reads for RT-PCR data. Add application of CAVID-19 analysis.
Version 1.3 Update for search.pl for confirmation of alignment. Improvement of making sort_uniq data.
Version 1.2 sort_uniq files are divided to 64 subfiles by first three nucleotide sequence. Remake of reference data is required.
Version 1.1 sort_uniq files are compressed by gzip. Requirement of disk space is reduced but requires more CPU time.
Version 1.0 Original version for PED paper.
Cite this article as: Polymorphic edge detection (PED): two efficient methods of polymorphism detection from next-generation sequencing data
Akio Miyao, Jianyu Song Kiyomiya, Keiko Iida, Koji Doi, Hiroshi Yasue
BMC Bioinformatics. 2019 20(1):362.
URL https://doi.org/10.1186/s12859-019-2955-6
PDF https://rdcu.be/bH7e8
NARO NON-COMMERCIAL LICENSE AGREEMENT Version 1.0
This license is for ‘Non-Commercial’ use of software for polymorphic edge detection (PED)
Copyright (C) 2017 National Agriculture and Food Research Organization. All rights reserved.
Patent JP 7122006, 7166638