Helicobacter pylori
Data
Wustl
NCBI complete genomes
- Genome info
.  id           len      gc%    organism
1  NC_000915.1  1667867  38.87  Helicobacter pylori 26695
2  NC_000921.1  1643831  39.19  Helicobacter pylori J99
3  NC_008086.1  1596366  39.08  Helicobacter pylori HPAG1
4  NC_010698.2  1608548  38.91  Helicobacter pylori Shi470
5  NC_011333.1  1652982  38.89  Helicobacter pylori G27
6  NC_011498.1  1673813  38.81  Helicobacter pylori P12
7  NC_012973.1  1576758  39.16  Helicobacter pylori B38
- nucmer -c 40 => ~200 alignments & 93-95% identity between genomes
- SNPs are mostly substitutions
- Alignment info (NC_000915 0cvg regions): 5-10% of genomes are unique
   .                    elem  min   q1   q2    q3    max  mean   n50     sum
1  NC_000915-NC_000915    72   45  178  294  1890  10467  1013  1893   72976  # longest alignment has been removed
2  NC_000915-NC_000921   197    2   81  203   495  17816   644  2146  126988
3  NC_000915-NC_008086   151    3  103  242   894  26862   951  3103  143652
4  NC_000915-NC_010698   206    2  115  283   706  12779   726  1941  149744
5  NC_000915-NC_011333   138    2  111  260   695   7457   688  1941   95063
6  NC_000915-NC_011498   157    2   83  185   565   5362   505  1357   79337
7  NC_000915-NC_012973   140    2  108  239   526  37389  1018  5729  142568
- NC_000915 vs NC_000915 : nucmer -c 40
Align len
.             elem  min   q1   q2    q3    max  mean   n50    sum
nucmer -c 20   484   20   21   26    82  10467   196  1892  95007
nucmer -c 40    72   45  178  294  1890  10467  1013  1893  72976
Align %id
.             elem    min     q1      q2      q3     max  mean  n50       sum
-c 20          484  63.72  91.89  100.00  100.00  100.00    95  100  46014.71
-c 40           72  76.71  85.06   92.86   99.92  100.00    92   93    6639.6
Media:NC 000915-NC 000915.20.png, Media:NC 000915-NC 000915.40.png
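The stats tables on this page all use the same columns (elem, min, q1, q2, q3, max, mean, n50, sum). The script that produced them is not shown, so the following is only a minimal sketch of how such a row could be computed from a list of lengths. The function names and the quartile convention are assumptions. The mean is truncated to an integer because that appears consistent with the tables (e.g. 72976/72 is about 1013.6 and is reported as 1013).

```python
from statistics import mean


def n50(lengths):
    """Largest length L such that elements of length >= L hold at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 of an empty list is undefined")


def quartile(sorted_vals, frac):
    # Simple index-based pick; the convention used by the original tool is unknown.
    idx = min(len(sorted_vals) - 1, int(frac * len(sorted_vals)))
    return sorted_vals[idx]


def summarize(lengths):
    """Return one table row (as a dict) for a list of contig/alignment/gap lengths."""
    vals = sorted(lengths)
    return {
        "elem": len(vals),
        "min": vals[0],
        "q1": quartile(vals, 0.25),
        "q2": quartile(vals, 0.50),
        "q3": quartile(vals, 0.75),
        "max": vals[-1],
        "mean": int(mean(vals)),  # truncated, as the tables appear to be
        "n50": n50(vals),
        "sum": sum(vals),
    }


if __name__ == "__main__":
    print(summarize([2, 2, 2, 3, 3, 4, 8, 8]))
```

For "100bp+" rows, the input list would first be restricted to lengths of at least 100.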
NCBI SRA
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=sra&term=SRP001104 (24 data sets; 10 not loaded)
- http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP001104
Other
Assemblies
Wustl
velvet contigs 100bp+ stats:
nl  assembly                                           ctgs  min   q1    q2     q3    max  mean    n50      sum  reads     0cvg
1   HPKX_1039_AG0C1.scarf.assembled-29-21               233  100  318  1700   9639  77743  7085  19912  1650899   6.4m   674288
2   HPKX_1039_AG0C2.solexa.txt.assembled-23-12          420  101  341  1684   5301  36588  3914   9412  1644043   5.2M   729736
3   HPKX_1039_AG4C1.scarf.assembled-27-22               271  100  217  1421   8273  90368  6115  17093  1657230   6.0m   687296
4   HPKX_1039_AG4C2.solexa.txt.assembled-25-12          365  100  301  1595   5658  51523  4522  11890  1650547   6.8M   708607
5   HPKX_1172_AG0C1_090424.solexa.txt.assembled-25-30   217  107  557  3370  10683  58848  7099  15527  1540507   8.6M  1068848
6   HPKX_1172_AG0C2_2lanes.assembled-21-8              1170  100  264   717   1768  11444  1319   2661  1543511   7.2M  1110550  95.8%id to HP_HPAG1
7   HPKX_1172_AG4C1_090424.solexa.txt.assembled-23-20   377  103  355  2178   6166  35180  4169   9160  1571948   8.5M  1106858
8   HPKX_1172_AG4C2.solexa.txt.assembled-25-15          317  100  274  1540   6256  37505  4987  14946  1581161   6.0M   812671
9   HPKX_1259_NL0C1.scarf.assembled-21-17              1704  100  264   598   1274   7953   936   1606  1595297   4.2m   963211
10  HPKX_1259_NL0C2.solexa.txt.assembled-23-12          410  102  240   928   4863  32792  3882  11295  1591864   3.6M   824474
11  HPKX_1259_NL4C1.scarf.assembled-27-23               283  100  224  1098   6814  98400  5634  18624  1594699   6.4m   797825
12  HPKX_1259_NL4C2.solexa.txt.assembled-23-12          455  102  222   874   4348  32792  3520  11010  1601950   6.3M   833155
13  HPKX_1379_NL0C1.scarf.assembled-27-22               295  100  230  1243   7551  59556  5539  15858  1634019   6.1m   730236
14  HPKX_1379_NL0C2.solexa.txt.assembled-23-12          416  100  216  1000   5177  53581  3931  11219  1635644   6.3M   754915
15  HPKX_1379_NL4C1.scarf.assembled-25-23               328  100  227  1084   6601  61090  4996  14203  1638925   5.0m   716540
16  HPKX_1379_NL4C2.solexa.txt.assembled-25-20          291  100  231  1501   6751  64227  5539  15080  1612046   4.6M   785276
17  HPKX_345_AG4C1.scarf.assembled-27-22                251  100  241  1272   8265  97643  6534  19718  1640151   4.5m   727208
18  HPKX_345_AG4C2                                        .    .    .     .      .      .     .      .        .  12.1M        .
19  HPKX_345_NL0C1.scarf.assembled-25-30                305  100  243  1146   6718  59632  5360  15874  1634815   5.7m   759194
20  HPKX_345_NL0C2_090424.solexa.txt.assembled-25-26    283  100  254  2009   8300  59229  5629  13524  1593064  11.1M  1067288
21  HPKX_438_AG0C1.scarf.assembled-27-25                267  100  348  1710   8311  87876  6071  16918  1620975   5.8m   755933
22  HPKX_438_AG0C2.solexa.txt.assembled-23-18           407  102  396  1777   5455  31183  3963   8830  1613167   6.3M   804474
23  HPKX_438_CA4C1.scarf.assembled-27-26                237  100  348  1580   8856  97139  6845  19582  1622487   6.0m   742559
24  HPKX_438_CA4C2.solexa.txt.assembled-23-11           485  101  363  1502   4408  35471  3332   7779  1616183   4.1M   801123
CBCB
velvet contigs 100bp+ stats:
nl   assembly         ctgs  min   q1    q2     q3    max  mean    n50      sum  reads     0cvg  qual  comment
2    HPKX_1039_AG0C2   417  101  730  2132   5402  34482  3937   8041  1642106   2.9M   722358  q28   little worse
4    HPKX_1039_AG4C2   250  100  287  2134   9314  90459  6604  18459  1651103   6.7M   703430  q00
5    HPKX_1172_AG0C1   213  100  229  1521   9559  93714  7423  22783  1581145   7.1M   795376  q20
6*   HPKX_1172_AG0C2  1122  100  466   900   1639   7935  1243   1857  1395595   3.1M  1915029  q30   worse
7    HPKX_1172_AG4C1   313  100  373  2061   7176  67930  5131  13160  1606127   6.0M   811067  q30
8    HPKX_1172_AG4C2   218  102  280  1561  10678  79935  7225  22481  1575172   5.1M   802832  q20
10   HPKX_1259_NL0C2   271  102  296  1687   7254  78486  5869  17821  1590749   3.1M   811063  q20
12   HPKX_1259_NL4C2   311  101  214   901   6216  78488  5137  17546  1597850   5.5M   816013  q20
14   HPKX_1379_NL0C2   237  101  213  1039   8905  79512  6874  22931  1629236   3.9M   744144  q20
16   HPKX_1379_NL4C2   230  101  225  1440   9348  79083  7055  21887  1622754   4.8M   742039  q20
18   HPKX_345_AG4C2   1130  101  389   846   1732  14790  1349   2295  1525450   6.5M  1283074  q20   missing
20   HPKX_345_NL0C2    260  101  216  1515   8697  59766  6286  18718  1634563   9.1M   750135  q20
22   HPKX_438_AG0C2    272  102  433  2271   7395  53896  5945  14906  1617041   4.7M   768889  q20
24   HPKX_438_CA4C2    356  102  432  2029   6132  36335  4546  11461  1618724   3.0M   775835  q20
Files: /fs/szasmg3/dpuiu/Helicobacter_pylori/HPKX_*/velvet/
velvet-merged contigs 100bp+ stats (merged based on alignments to the 7 complete genomes; minOVL=5bp):
nl   assembly         ctgs  min   q1    q2     q3    max  mean    n50      sum
2    HPKX_1039_AG0C2   230  101  575  2665  10199  81516  7129  17062  1639841
4    HPKX_1039_AG4C2   192  100  318  2402  12716  95840  8601  21767  1651530
5    HPKX_1172_AG0C1   161  100  229  1602  13093  98369  9826  33745  1582047
6    HPKX_1172_AG0C2   275  100  774  2840   7251  45605  5732  12775  1576564  # merged AMOScmp(6**) and velvet(6*) contigs; did not use the complete genomes for alignments; minOVL=40bp
7    HPKX_1172_AG4C1   204  103  415  2527  12920  76550  7865  19077  1604640
8    HPKX_1172_AG4C2   183  102  256  1344  13094  79936  8612  25128  1576072
10   HPKX_1259_NL0C2   198  102  296  1152  10414  91140  8036  24879  1591324
12   HPKX_1259_NL4C2   255  101  216   729   8686  78488  6272  20602  1599364
14   HPKX_1379_NL0C2   183  101  226  1039  11932  96425  8912  32446  1630904
16   HPKX_1379_NL4C2   167  101  262  1511  13732  95996  9720  32534  1623349
18   HPKX_345_AG4C2    751  101  416  1129   2583  24913  2025   3967  1520951
20   HPKX_345_NL0C2    208  101  239  1581  10973  87982  7862  23967  1635408
22   HPKX_438_AG0C2    198  102  393  2767  10090  96481  8165  23774  1616761
24   HPKX_438_CA4C2    224  102  433  2320  10298  53648  7220  19679  1617399
Files:
/fs/szasmg3/dpuiu/Helicobacter_pylori/HPKX_*/velvet/minimus3/
/fs/szasmg3/dpuiu/Helicobacter_pylori/HPKX_*/minimus2/
AMOScmp contigs 100bp+ stats:
nl   assembly         ctgs  min   q1    q2    q3    max  mean    n50      sum  reads  0cvg  qual  comment
2    HPKX_1039_AG0C2   286  100  341  2402  8354  90182  5718  12867  1635626   4.7M     .  q00   HPKX_1039_AG4C2 ref
6**  HPKX_1172_AG0C2   367  100  300  1727  5675  37710  4294  10457  1575949   5.5M     .  q00   HPKX_1172_AG0C1 ref
Files: /fs/szasmg3/dpuiu/Helicobacter_pylori/HPKX_*/AMOScmp
velvet_0.7.55:
- Fastq vs Fasta: no difference
- velvetg . -exp_cov auto
6 HPKX_1172_AG0C2
Reads:
- all : 7.1M
- q30+: 3.1M
- aligned by soap Helicobacter pylori HPAG1 : 4.8M
- aligned by soap Helicobacter pylori HPKX_1172_AG0C1 : 5.5M
velvet
.          elem  min   q1   q2    q3   max  mean   n50      sum  reads     0cvg  qual
ctgs       1239   45  346  799  1528  7935  1132  1834  1403538   3.1M  1889408  q30
ctgs.100+  1122  100  466  900  1639  7935  1243  1857  1395595   3.1M  1915029  q30
AMOScmp-shortReads (ref HP_HPAG1)
.          elem  min   q1   q2    q3    max  mean   n50      sum  reads     0cvg  qual
ctgs.all   1334   36   78  238  1283  16118  1146  3978  1529868   4.8M  1137259  q00
ctgs.100+   905  100  223  728  2152  16133  1662  4073  1504146      .        .  q00
Directory:
/fs/szasmg3/dpuiu/Helicobacter_pylori/HPKX_1172_AG0C2.6/AMOScmp.HP_HPAG1
AMOScmp-shortReads (ref 5 HPKX_1172_AG0C1)
.          elem  min   q1    q2    q3    max  mean    n50      sum  reads  0cvg  qual
ctgs.all    392   37  227  1470  5481  37710  4024  10457  1577557   5.5M     .  q00
ctgs.100+   367  100  300  1727  5675  37710  4294  10457  1575949
ref         213  100  229  1521  9559  93714  7423  22783  1581145
Directory:
/fs/szasmg3/dpuiu/Helicobacter_pylori/HPKX_1172_AG0C2.6/AMOScmp.HPKX_1172_AG0C1
18 HPKX_345_AG4C2
Reads
- 12.1M Solexa 36bp unpaired
- cvg =~ 120X ?
Velvet
Ctg stats :
hash  #ctgs  min   q1   q2    q3    max  mean   n50      sum
23     1098   45  244  724  1799  25718  1367  2745  1501014
24 HPKX_438_CA4C2.solexa.txt.assembled-23-11
Reads
- 4.1M Solexa 36bp unpaired
- cvg =~ 80X
- ~9% of the reads contain at least one N
Quality QC:
.           elem     min  q1  q2  q3  max  mean  n50  sum
Ncount      4107397    0   0   0   0   35     1   34  5614614   # 3725799<=0 381598>0
avgQuality  4107397    0  19  26  29   34    22   28  91690471  # 118980<=20 3988417>20
Ncount==0 and avgQuality>=20 => 3013939 reads pass the filter (73%)
pos  elem     min  q1  q2  q3  max  mean  n50  sum
0    4107397    0  32  33  33   33    28   33  116108100
1    4107397    0  30  33  34   34    27   33  114845690
5    4107397    0  27  32  33   34    26   33  109832819
10   4107397    0  23  31  33   34    25   32  102860584
20   4107397    0  17  28  31   34    22   30  92139539
30   4107397    0   2  21  28   34    17   27  70231217
32   4107397    0   2  19  26   34    15   26  63156733
35   4107397    0   2   2  25   34    13   26  55261361
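The read filter noted above (Ncount==0 and avgQuality>=20) can be sketched in a few lines. This is only an illustration, not the script that was actually used. It assumes 4-line FASTQ records, per-base scores encoded with an ASCII offset of 64 (common for Solexa/Illumina data of that period, but not confirmed here), and no-calls written as N. The function names are hypothetical.

```python
def mean_quality(qual, offset=64):
    """Average per-base score of one read; offset=64 is an assumption."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_read(seq, qual, min_avg_qual=20, offset=64):
    """True if the read has no N and its average quality is >= min_avg_qual."""
    return "N" not in seq.upper() and mean_quality(qual, offset) >= min_avg_qual


def filter_fastq(lines, min_avg_qual=20, offset=64):
    """Yield (header, seq, qual) for each 4-line FASTQ record that passes the filter."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        if keep_read(seq, qual, min_avg_qual, offset):
            yield header.strip(), seq, qual


if __name__ == "__main__":
    demo = ["@r1", "ACGT", "+", "hhhh", "@r2", "ACNT", "+", "hhhh"]
    for record in filter_fastq(demo):
        print(record)
```

The fraction kept on this data set is 3013939 / 4107397, i.e. about 73%.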
12mer counts: too much error???
meryl -C -B -m 12 -s prefix.seq -o prefix.12mers
meryl -Dh -s prefix.12mers | sort -nk2 -r | more
1   1876075  0.3452  0.0196
2   1009161  0.5308  0.0407
...
9   36227    0.7772  0.1017
10  25866    0.7819  0.1044
48  20812    0.8729  0.2727  # read cvg ???
49  20726    0.8768  0.2833
...
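For small inputs, the first two columns of the histogram above (multiplicity, number of distinct 12-mers with that multiplicity) can be reproduced without meryl. The sketch below is a plain-Python stand-in, not meryl itself. It assumes that `-C` means canonical k-mers (a k-mer and its reverse complement counted together) and skips any k-mer containing N. Function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_counts(reads, k=12):
    """Count canonical k-mers over all reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def multiplicity_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen exactly that many times."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    hist = multiplicity_histogram(kmer_counts(["ACGTA", "ACGTA"], k=3))
    for multiplicity, n_distinct in hist.items():
        print(multiplicity, n_distinct)
```

In such a histogram, sequencing errors usually show up as a large number of k-mers at multiplicity 1 or 2, while genomic k-mers tend to cluster around a peak at higher multiplicity.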
Velvet (all reads)
Ctg stats for different velveth hash_lengths:
hash  #ctgs  min   q1    q2    q3    max  mean    n50      sum
19      908   37  161   770  2289  21014  1732   3905  1572704
21      457   41   84   580  4156  49037  3548  12777  1621652
23      398   45  161  1435  5137  37278  4068  12278  1619323  (CBCB best*)
27      769   53  341  1163  2731  18704  2109   4319  1622389
?       485  101  363  1502  4408  35471  3332   7779  1616183  (WUSTL)
CBCB best* read cvg =~ 23; repeats at higher cvg
.    #ctgs  min  q1  q2  q3  max  mean  n50  sum
cvg    398   13  21  23  25  139    30   25    .
Velvet (filtered reads)
Hash_len=23
Ctg stats
filter          #ctgs  min   q1    q2    q3    max  mean    n50      sum   #reads  0cvg(all 6 genomes)
all               398   45  161  1435  5137  37278  4068  12278  1619323  4107397  765345
noN               420   45  115  1194  4898  37173  3863  11880  1622651  3725799  759343
avgqual20+        424   45  150  1302  4797  36335  3828  11749  1623136  3069665  757697
noN.avgqual20+    453   45  137  1093  4394  36335  3586  11461  1624653  3013939  756829  !!! least seq missing
AMOScmp
Ref : NC_000915
Ctg stats:
params               #ctgs  min  q1   q2   q3   max  mean  n50      sum  #readsInCtgs
-l 16 -c 32 -ovl 10   9533   36  59   96  168  3160   136  185  1302448  1,123,025 (~25%)  1,023,929:0SNP 154,958:1SNP 8,532:2SNP ...
-l 8 -c 24 -ovl 10    4429   36  62  152  430  5518   350  806  1554095  2,438,762 (~50%)  1,159,669:0SNP 977,687:1SNP 422,738:2SNP ...
-l 8 -c 24 -ovl 5     3880   36  61  158  492  5883   400  966  1553027  2,438,762 (~50%)
nucmer 0cvg stats:
params       #gaps  min  q1  q2  q3   max  mean  n50     sum
-l 16 -c 32   8708    2  10  19  41  9286    44   91  388608
-l 8 -c 24    2650    2   8  17  45  2347    56  207  150099
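The "0cvg" (zero-coverage) rows list the reference stretches that no alignment covers. How those gaps were extracted is not recorded here, so the following is only a sketch. It assumes alignment coordinates on the reference are available as 1-based inclusive (start, end) pairs, for example parsed from an alignment coordinate listing. The `min_gap` parameter and the function name are hypothetical.

```python
def uncovered_regions(ref_len, alignments, min_gap=1):
    """Return 1-based inclusive (start, end) reference intervals not covered by any alignment.

    alignments: iterable of (start, end) pairs; reversed pairs are normalized.
    min_gap: shortest gap to report (hypothetical knob, not from the notes).
    """
    gaps = []
    pos = 1  # first reference position not yet known to be covered
    for start, end in sorted((min(s, e), max(s, e)) for s, e in alignments):
        if start - pos >= min_gap:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if ref_len - pos + 1 >= min_gap:
        gaps.append((pos, ref_len))
    return gaps


if __name__ == "__main__":
    demo = [(10, 20), (15, 30), (50, 40), (90, 100)]
    for start, end in uncovered_regions(100, demo):
        print(start, end, end - start + 1)
```

The gap lengths (end - start + 1) are what the #gaps, min, q1, ..., sum columns would then summarize.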
NC_011498
NC_011498.1 1673813 38.81
Reads
- 1.67M simulated reads; 36bp; unpaired; 100% correct
- the reads were generated by breaking the genome into 36bp segments (35bp ovl) => 36X cvg
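The read simulation described above can be sketched as a sliding window with step 1. This is a minimal illustration that treats the genome as linear (no wrap-around across the origin of the circular chromosome, which is an assumption) and uses hypothetical function names. With a genome of 1673813 bp and 36bp reads this gives 1673813 - 36 + 1 = 1673778 reads, i.e. about 1.67M reads and just under 36X nominal coverage.

```python
def tile_reads(genome, read_len=36, step=1):
    """Yield error-free reads of read_len bases, starting every `step` bases."""
    for i in range(0, len(genome) - read_len + 1, step):
        yield genome[i:i + read_len]


def nominal_coverage(n_reads, read_len, genome_len):
    """Average depth: total read bases divided by genome length."""
    return n_reads * read_len / genome_len


if __name__ == "__main__":
    genome_len = 1673813  # NC_011498.1
    n_reads = genome_len - 36 + 1
    print(n_reads, round(nominal_coverage(n_reads, 36, genome_len), 3))
```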
Velvet
- Ctg stats :
hash  #ctgs  min  q1   q2    q3    max  mean    n50      sum
23      292   45  67  164  3422  73108  5654  33268  1651121
Euler-sr
- Ctg stats :
vertex_size  #ctgs  min  q1   q2    q3    max  mean    n50      sum  #misassemblies  0cvg
23             366   24  36   92   931  83748  4596  41745  1682410  .
25             343   26  39   98  1125  83752  5075  42087  1740988  4               27377
27             331   28  43  109  1125  83756  5016  41753  1660506  4               27392
AMOScmp
- Ref : NC_000915
- Ctg stats:
params                              #ctgs  min  q1   q2    q3    max  mean   n50      sum  #readsInCtgs    misassemblies(<95%length match)
nucmer -l 16 -c 32 -ovl 10           8569   36  66  114   203   2897   164   231  1405371  836827 (~50%)   39
soap -v 5 -g 3 -s 12 -f 2; -ovl 10   1787   37  99  305  1129  14043   881  2283  1574554  1437078 (~85%)  69
soap -v 5 -g 0 -s 12 -f 2; -ovl 10   1789   37  98  304  1128  14043   880  2283  1574516  1437077         64
soap -v 3 -g 0 -s 12 -f 2; -ovl 10   3646   36  89  214   532   7184   424   857  1548982  .               55
soap -v 3 -g 0 -s 12 -f 2; -ovl 20   4957   36  81  174   389   4783   316   580  1567357  1353104         49
- 0cvg stats
params                              #gaps  min  q1  q2  q3   max  mean  n50     sum
nucmer -l 16 -c 32 -ovl 10           8026    2   8  15  34  5384    36   75  291959
soap -v 5 -g 3 -s 12 -f 2; -ovl 10   1042    2   6  19  46  5355    93  746   97176
minimus* on velvet contigs
.                                  ctgs       min   q1    q2     q3     max   mean    n50      sum
velvet                             292         45   67   164   3422   73108   5654  33268  1651121
.                                  ctgs+sing  min   q1    q2     q3     max   mean    n50      sum  misas.
minimus2(delta-filter -1; OVL=20)  191         45  108   465  10421  117862   8631  33268  1648580  17(6)   # ctgs-vs-ctgs; OVL=20
minimus3(delta-filter -q; OVL=20)  251         45   68   214   5486   73108   6573  33268  1650065  11()    # ctgs-vs-ref => ctgs-vs-ctgs
minimus3(delta-filter -q; OVL=5)   191         45   71   227   9222   73108   8631  41743  1648698  6(1)
minimus3(OVL=20)                   204         45  115   560  10394  117862   8072  33268  1646865  5(1)
minimus3(OVL=5)                    172         45  134   611  12957  122309   9572  37367  1646538  8(4)
minimus3(all ref; OVL=20)          231         45   99   357   7424  117862   7138  33268  1648917  5(1)
minimus3(all ref; OVL=5)           150         45  140  1177  15439  118850  10959  41743  1643998  15(7)