Helicobacter pylori: Difference between revisions
Revision as of 19:03, 20 October 2009
Data
Wustl
- http://gordonlab.wustl.edu/MengWu/WU_JBC_2009.html
- http://pathogenomics.bham.ac.uk/blog/2009/09/tips-for-de-novo-bacterial-genome-assembly/
- http://www.jbc.org/content/early/2009/09/01/jbc.M109.052738.abstract
NCBI complete genomes
- Genome info
   id               len    gc%  organism
1  NC_000915.1  1667867  38.87  Helicobacter pylori 26695
2  NC_000921.1  1643831  39.19  Helicobacter pylori J99
3  NC_008086.1  1596366  39.08  Helicobacter pylori HPAG1
4  NC_010698.2  1608548  38.91  Helicobacter pylori Shi470
5  NC_011333.1  1652982  38.89  Helicobacter pylori G27
6  NC_011498.1  1673813  38.81  Helicobacter pylori P12
7  NC_012973.1  1576758  39.16  Helicobacter pylori B38
- nucmer -c 40 => ~200 alignments & 93-95% identity between genomes
- SNPs are mostly substitutions
- Alignment info (NC_000915 0cvg regions): 5-10% of each genome is unique
   .                    elem  min   q1   q2    q3    max  mean   n50     sum
1  NC_000915-NC_000915    72   45  178  294  1890  10467  1013  1893   72976  # longest alignment has been removed
2  NC_000915-NC_000921   197    2   81  203   495  17816   644  2146  126988
3  NC_000915-NC_008086   151    3  103  242   894  26862   951  3103  143652
4  NC_000915-NC_010698   206    2  115  283   706  12779   726  1941  149744
5  NC_000915-NC_011333   138    2  111  260   695   7457   688  1941   95063
6  NC_000915-NC_011498   157    2   83  185   565   5362   505  1357   79337
7  NC_000915-NC_012973   140    2  108  239   526  37389  1018  5729  142568
- NC_000915 vs NC_000915 : nucmer -c 40
Align len
.             elem  min   q1   q2    q3    max  mean   n50    sum
nucmer -c 20   484   20   21   26    82  10467   196  1892  95007
nucmer -c 40    72   45  178  294  1890  10467  1013  1893  72976

Align %id
.      elem    min     q1      q2      q3     max  mean  n50       sum
-c 20   484  63.72  91.89  100.00  100.00  100.00    95  100  46014.71
-c 40    72  76.71  85.06   92.86   99.92  100.00    92   93    6639.6
Media:NC 000915-NC 000915.20.png, Media:NC 000915-NC 000915.40.png
NCBI SRA
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=sra&term=SRP001104 (24 data sets; 10 not loaded)
- http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP001104
Other
Assemblies
Wustl
nl  assembly                                           ctgs  min   q1    q2     q3    max  mean    n50      sum  reads
 1  HPKX_1039_AG0C1.scarf.assembled-29-21               233  100  318  1700   9639  77743  7085  19912  1650899      0
 2  HPKX_1039_AG0C2.solexa.txt.assembled-23-12          420  101  341  1684   5301  36588  3914   9412  1644043   5.2M
 3  HPKX_1039_AG4C1.scarf.assembled-27-22               271  100  217  1421   8273  90368  6115  17093  1657230      0
 4  HPKX_1039_AG4C2.solexa.txt.assembled-25-12          365  100  301  1595   5658  51523  4522  11890  1650547   6.8M
 5  HPKX_1172_AG0C1_090424.solexa.txt.assembled-25-30   217  107  557  3370  10683  58848  7099  15527  1540507   8.6M
 6  HPKX_1172_AG0C2_2lanes.assembled-21-8              1170  100  264   717   1768  11444  1319   2661  1543511   7.2M
 7  HPKX_1172_AG4C1_090424.solexa.txt.assembled-23-20   377  103  355  2178   6166  35180  4169   9160  1571948   8.5M
 8  HPKX_1172_AG4C2.solexa.txt.assembled-25-15          317  100  274  1540   6256  37505  4987  14946  1581161   6.0M
 9  HPKX_1259_NL0C1.scarf.assembled-21-17              1704  100  264   598   1274   7953   936   1606  1595297      0
10  HPKX_1259_NL0C2.solexa.txt.assembled-23-12          410  102  240   928   4863  32792  3882  11295  1591864   3.6M
11  HPKX_1259_NL4C1.scarf.assembled-27-23               283  100  224  1098   6814  98400  5634  18624  1594699      0
12  HPKX_1259_NL4C2.solexa.txt.assembled-23-12          455  102  222   874   4348  32792  3520  11010  1601950   6.3M
13  HPKX_1379_NL0C1.scarf.assembled-27-22               295  100  230  1243   7551  59556  5539  15858  1634019      0
14  HPKX_1379_NL0C2.solexa.txt.assembled-23-12          416  100  216  1000   5177  53581  3931  11219  1635644   6.3M
15  HPKX_1379_NL4C1.scarf.assembled-25-23               328  100  227  1084   6601  61090  4996  14203  1638925      0
16  HPKX_1379_NL4C2.solexa.txt.assembled-25-20          291  100  231  1501   6751  64227  5539  15080  1612046   4.6M
17  HPKX_345_AG4C1.scarf.assembled-27-22                251  100  241  1272   8265  97643  6534  19718  1640151      0
24  HPKX_345_AG4C2                                        .                                                      12.1M
18  HPKX_345_NL0C1.scarf.assembled-25-30                305  100  243  1146   6718  59632  5360  15874  1634815      0
19  HPKX_345_NL0C2_090424.solexa.txt.assembled-25-26    283  100  254  2009   8300  59229  5629  13524  1593064  11.1M
20  HPKX_438_AG0C1.scarf.assembled-27-25                267  100  348  1710   8311  87876  6071  16918  1620975     0M
21  HPKX_438_AG0C2.solexa.txt.assembled-23-18           407  102  396  1777   5455  31183  3963   8830  1613167   6.3M
22  HPKX_438_CA4C1.scarf.assembled-27-26                237  100  348  1580   8856  97139  6845  19582  1622487      0
23  HPKX_438_CA4C2.solexa.txt.assembled-23-11           485  101  363  1502   4408  35471  3332   7779  1616183   4.1M
CBCB
nl  assembly         ctgs  min   q1    q2    q3    max  mean    n50      sum  reads
 2  HPKX_1039_AG0C2   335   45   91  1376  5910  51990  4822  14468  1615385
 4  HPKX_1039_AG4C2   371   45   76   297  5375  90459  4472  18459  1659148
 5  HPKX_1172_AG0C1   516   45   81   244  2838  44992  3075  12094  1587142
 6  HPKX_1172_AG0C2  1163   45  213   474   991   6591   740   1243   861451
 7  HPKX_1172_AG4C1  2214   45  148   287   530   3776   405    589   898322
 8  HPKX_1172_AG4C2   332   45   74   280  4800  50005  4741  20671  1574270
10  HPKX_1259_NL0C2   332   45   74   280  4800  50005  4741  20671  1574270
12  HPKX_1259_NL4C2   472   45   69   137  2017  59943  3381  18488  1596084
14  HPKX_1379_NL0C2   472   45   69   137  2017  59943  3381  18488  1596084
16  HPKX_1379_NL4C2   391   45   75   153  2314  79083  4184  22491  1635979
24  HPKX_345_AG4C2   1098   45  244   724  1799  25718  1367   2745  1501014
19  HPKX_345_NL0C2   1436   45  236   567  1260   8990   965   1796  1387059
21  HPKX_438_AG0C2    410   45   90   562  4467  49129  3962  13859  1624766
23  HPKX_438_CA4C2    398   45  161  1435  5137  37278  4068  12278  1619323
velvet_0.7.55:
- Fastq vs Fasta: no difference
- velvetg . -exp_cov auto
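The stat columns used in the tables on this page (elem/ctgs, min, q1, q2, q3, max, mean, n50, sum) can be recomputed from a list of contig lengths. The sketch below is only an illustration: the function names are made up here, the quartile rule (a simple rank pick) and the rounded mean are assumptions, and the script that actually produced the tables is not shown on this page.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


def summarize(lengths):
    """Return the summary columns used in the tables as a dict."""
    ordered = sorted(lengths)
    count = len(ordered)
    if count == 0:
        raise ValueError("no contig lengths given")

    def quantile(fraction):
        # Assumption: simple rank pick; the original tool may interpolate differently.
        return ordered[min(count - 1, int(fraction * count))]

    total = sum(ordered)
    return {
        "elem": count,
        "min": ordered[0],
        "q1": quantile(0.25),
        "q2": quantile(0.50),
        "q3": quantile(0.75),
        "max": ordered[-1],
        "mean": round(total / count),  # tables show integer means
        "n50": n50(ordered),
        "sum": total,
    }
```

Usage: `summarize(contig_lengths)` where `contig_lengths` is a list of integers, e.g. read from the FASTA headers or sequence lengths of a velvet contigs file.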
23 HPKX_438_CA4C2.solexa.txt.assembled-23-11
Reads
- 4.1M Solexa 36bp unpaired
- cvg =~ 80X
- ~9% of the reads contain at least one N
Quality QC:
.              elem                          min  q1  q2  q3  max  mean  n50       sum
Ncount      4107397  3725799<=0  381598>0      0   0   0   0   35     1   34   5614614
avgQuality  4107397  118980<=20  3988417>20    0  19  26  29   34    22   28  91690471
Ncount==0 and avgQuality>=20 => 3013939 filtered reads (73%)
pos     elem  min  q1  q2  q3  max  mean  n50        sum
  0  4107397    0  32  33  33   33    28   33  116108100
  1  4107397    0  30  33  34   34    27   33  114845690
  5  4107397    0  27  32  33   34    26   33  109832819
 10  4107397    0  23  31  33   34    25   32  102860584
 20  4107397    0  17  28  31   34    22   30   92139539
 30  4107397    0   2  21  28   34    17   27   70231217
 32  4107397    0   2  19  26   34    15   26   63156733
 35  4107397    0   2   2  25   34    13   26   55261361
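The read filter above (Ncount==0 and avgQuality>=20) can be sketched as follows. This is a minimal illustration, not the script that was used: the function names are hypothetical, the input is assumed to be 4-line FASTQ records, and the quality offset of 64 is an assumption about the Solexa/Illumina encoding of these files (use 33 for Sanger-style FASTQ).

```python
from itertools import islice


def fastq_records(lines):
    """Yield (header, sequence, quality) from an iterable of 4-line FASTQ records."""
    lines = iter(lines)
    while True:
        chunk = list(islice(lines, 4))
        if len(chunk) < 4:
            return  # end of input (or a truncated last record)
        header, seq, _plus, qual = (line.rstrip("\n") for line in chunk)
        yield header, seq, qual


def mean_quality(qual, offset=64):
    """Average per-base quality; offset=64 is an assumption about the encoding."""
    return sum(ord(char) - offset for char in qual) / len(qual)


def filter_reads(lines, min_avg_quality=20, offset=64):
    """Keep reads with no N and an average quality of at least min_avg_quality."""
    for header, seq, qual in fastq_records(lines):
        if "N" in seq.upper():
            continue
        if mean_quality(qual, offset) < min_avg_quality:
            continue
        yield header, seq, qual
```

Usage: pass an open file handle, e.g. `kept = list(filter_reads(open("reads.fastq")))`, where the file name is a placeholder.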
12mer counts: too much error???
meryl -C -B -m 12 -s prefix.seq -o prefix.12mers
meryl -Dh -s prefix.12mers | sort -nk2 -r | more
1   1876075  0.3452  0.0196
2   1009161  0.5308  0.0407
...
9     36227  0.7772  0.1017
10    25866  0.7819  0.1044
48    20812  0.8729  0.2727  # read cvg ???
49    20726  0.8768  0.2833
...
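To make the histogram columns easier to read, here is a small sketch that builds the same kind of table from a list of reads. It assumes the four columns are: k-mer multiplicity, number of distinct k-mers with that multiplicity, cumulative fraction of distinct k-mers, and cumulative fraction of all k-mer occurrences. That reading matches the numbers above but is an interpretation, and this is not how meryl is implemented; the function names are made up.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k=12):
    """Rows of (multiplicity, distinct k-mers, cumulative distinct frac, cumulative total frac)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1

    histogram = Counter(counts.values())
    distinct = len(counts)
    total = sum(counts.values())
    rows = []
    seen_distinct = 0
    seen_total = 0
    for multiplicity in sorted(histogram):
        n_kmers = histogram[multiplicity]
        seen_distinct += n_kmers
        seen_total += n_kmers * multiplicity
        rows.append((multiplicity, n_kmers, seen_distinct / distinct, seen_total / total))
    return rows
```

With error-free reads at uniform coverage, most distinct k-mers would sit near the coverage multiplicity; a large mass at multiplicity 1 or 2, as in the output above, is what sequencing errors typically produce.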
Velvet (all reads)
Ctg stats for different velveth hash_lengths:
hash  #ctgs  min   q1    q2    q3    max  mean    n50      sum
19      908   37  161   770  2289  21014  1732   3905  1572704
21      457   41   84   580  4156  49037  3548  12777  1621652
23      398   45  161  1435  5137  37278  4068  12278  1619323  (CBCB best*)
27      769   53  341  1163  2731  18704  2109   4319  1622389
?       485  101  363  1502  4408  35471  3332   7779  1616183  (WUSTL)
CBCB best* read cvg =~ 23; repeats at higher cvg
.    #ctgs  min  q1  q2  q3  max  mean  n50  sum
cvg    398   13  21  23  25  139    30   25    .
Velvet (filtered reads)
Hash_len=23
Ctg stats
filter          #ctgs  min   q1    q2    q3    max  mean    n50      sum   #reads  0cvg(all 6 genomes)
all               398   45  161  1435  5137  37278  4068  12278  1619323  4107397  765345
noN               420   45  115  1194  4898  37173  3863  11880  1622651  3725799  759343
avgqual20+        424   45  150  1302  4797  36335  3828  11749  1623136  3069665  757697
noN.avgqual20+    453   45  137  1093  4394  36335  3586  11461  1624653  3013939  756829  !!! least seq missing
AMOScmp
Ref : NC_000915
Ctg stats:
params               #ctgs  min   q1   q2   q3   max  mean  n50      sum  #readsInCtgs
-l 16 -c 32 -ovl 10   9533   36   59   96  168  3160   136  185  1302448  1,123,025 (~25%)  1,023,929:0SNP 154,958:1SNP 8,532:2SNP ...
-l 8 -c 24 -ovl 10    4429   36   62  152  430  5518   350  806  1554095  2,438,762 (~50%)  1,159,669:0SNP 977,687:1SNP 422,738:2SNP ...
-l 8 -c 24 -ovl 5     3880   36   61  158  492  5883   400  966  1553027  2,438,762 (~50%)
nucmer 0cvg stats:
params       #gaps  min  q1  q2  q3   max  mean  n50     sum
-l 16 -c 32   8708    2  10  19  41  9286    44   91  388608
-l 8 -c 24    2650    2   8  17  45  2347    56  207  150099
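The 0cvg (zero-coverage) regions are the stretches of the reference that no alignment touches. A minimal sketch of that computation is given below, assuming the alignments have already been parsed into (start, end) pairs on the reference, 1-based and inclusive as in nucmer coordinate output. The function name is hypothetical, and any minimum gap length applied in the tables above is not reproduced here.

```python
def zero_coverage_gaps(ref_length, intervals):
    """Return uncovered (start, end) regions of a reference, 1-based inclusive.

    intervals: (start, end) alignment coordinates on the reference; reversed
    pairs (start > end) are accepted and normalised.
    """
    gaps = []
    next_uncovered = 1
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_length:
        gaps.append((next_uncovered, ref_length))
    return gaps
```

The gap lengths (`end - start + 1`) are what a #gaps/min/q1/.../sum row would then summarise.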
24 HPKX_345_AG4C2
Reads
- 12.1M Solexa 36bp unpaired
- cvg =~ 120X ?
Velvet
Ctg stats :
hash  #ctgs  min   q1   q2    q3    max  mean   n50      sum
23     1098   45  244  724  1799  25718  1367  2745  1501014
NC_011498
NC_011498.1 1673813 38.81
Reads
- 1.67M Simulated reads 36bp; unpaired; 100% correct;
- the reads were generated by breaking the genome into 36bp segments (35bp ovl) => 36X cvg
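The read simulation described above amounts to sliding a 36 bp window along the genome one base at a time. A minimal sketch, assuming the genome is treated as a linear string (whether the original simulation wrapped around the circular chromosome is not stated); the function names are made up for illustration:

```python
def tile_reads(genome, read_len=36, step=1):
    """Yield error-free reads; step=1 gives read_len - 1 bases of overlap."""
    for start in range(0, len(genome) - read_len + 1, step):
        yield genome[start:start + read_len]


def coverage(n_reads, read_len, genome_len):
    """Average depth of coverage: total read bases divided by genome length."""
    return n_reads * read_len / genome_len
```

For a genome of 1673813 bp this gives 1673813 - 35 reads, i.e. about 1.67M reads, and `coverage(1673813 - 35, 36, 1673813)` is just under 36, in line with the 36X figure.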
Velvet
- Ctg stats :
hash  #ctgs  min  q1   q2    q3    max  mean    n50      sum
23      292   45  67  164  3422  73108  5654  33268  1651121
Euler-sr
- Ctg stats :
vertex_size  #ctgs  min  q1   q2    q3    max  mean    n50      sum  #misassemblies   0cvg
23             366   24  36   92   931  83748  4596  41745  1682410               .
25             343   26  39   98  1125  83752  5075  42087  1740988               4  27377
27             331   28  43  109  1125  83756  5016  41753  1660506               4  27392
AMOScmp
- Ref : NC_000915
- Ctg stats:
params                              #ctgs  min  q1   q2    q3    max  mean   n50      sum  #readsInCtgs    misassemblies(<95%length match)
nucmer -l 16 -c 32 -ovl 10           8569   36  66  114   203   2897   164   231  1405371  836827 (~50%)   39
soap -v 5 -g 3 -s 12 -f 2; -ovl 10   1787   37  99  305  1129  14043   881  2283  1574554  1437078 (~85%)  69
soap -v 5 -g 0 -s 12 -f 2; -ovl 10   1789   37  98  304  1128  14043   880  2283  1574516  1437077         64
soap -v 3 -g 0 -s 12 -f 2; -ovl 10   3646   36  89  214   532   7184   424   857  1548982                  55
soap -v 3 -g 0 -s 12 -f 2; -ovl 20   4957   36  81  174   389   4783   316   580  1567357  1353104         49
- 0cvg stats
params                              #gaps  min  q1  q2  q3   max  mean  n50     sum
nucmer -l 16 -c 32 -ovl 10           8026    2   8  15  34  5384    36   75  291959
soap -v 5 -g 3 -s 12 -f 2; -ovl 10   1042    2   6  19  46  5355    93  746   97176
minimus* on velvet contigs
.                                       ctgs  min   q1    q2     q3     max   mean    n50      sum
velvet                                   292   45   67   164   3422   73108   5654  33268  1651121

.                                  ctgs+sing  min   q1    q2     q3     max   mean    n50      sum  misas.
minimus2(delta-filter -1; OVL=20)        191   45  108   465  10421  117862   8631  33268  1648580  17(6)   # ctgs-vs-ctgs; OVL=20

minimus3(delta-filter -q; OVL=20)        251   45   68   214   5486   73108   6573  33268  1650065  11()    # ctgs-vs-ref => ctgs-vs-ctgs
minimus3(delta-filter -q; OVL=5)         191   45   71   227   9222   73108   8631  41743  1648698  6(1)

minimus3( OVL=20)                        204   45  115   560  10394  117862   8072  33268  1646865  5(1)
minimus3( OVL=5)                         172   45  134   611  12957  122309   9572  37367  1646538  8(4)

minimus3(all ref; OVL=20)                231   45   99   357   7424  117862   7138  33268  1648917  5(1)
minimus3(all ref; OVL=5)                 150   45  140  1177  15439  118850  10959  41743  1643998  15(7)