Helicobacter pylori
Data
Wustl
- http://gordonlab.wustl.edu/MengWu/WU_JBC_2009.html
- http://pathogenomics.bham.ac.uk/blog/2009/09/tips-for-de-novo-bacterial-genome-assembly/
- http://www.jbc.org/content/early/2009/09/01/jbc.M109.052738.abstract
NCBI complete genomes
- Genome info
#   id            len (bp)  gc%    strain
1   NC_000915.1   1667867   38.87  Helicobacter pylori 26695
2   NC_000921.1   1643831   39.19  Helicobacter pylori J99
3   NC_008086.1   1596366   39.08  Helicobacter pylori HPAG1
4   NC_010698.2   1608548   38.91  Helicobacter pylori Shi470
5   NC_011333.1   1652982   38.89  Helicobacter pylori G27
6   NC_011498.1   1673813   38.81  Helicobacter pylori P12
7   NC_012973.1   1576758   39.16  Helicobacter pylori B38
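- ~200 alignments per genome pair, at 93-95% identity
- Differences between genomes are mostly substitutions (SNPs) rather than indels
- Alignment info (0cvg = regions of NC_000915 not covered by any alignment to the other genome): 5-10% of NC_000915 is unique with respect to each other genome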
#   pair                 elem  min  q1   q2   q3   max    mean  n50   sum
1   NC_000915-NC_000915  .     .    .    .    .    .      .     .     .
2   NC_000915-NC_000921  197   2    81   203  495  17816  644   2146  126988
3   NC_000915-NC_008086  151   3    103  242  894  26862  951   3103  143652
4   NC_000915-NC_010698  206   2    115  283  706  12779  726   1941  149744
5   NC_000915-NC_011333  138   2    111  260  695  7457   688   1941  95063
6   NC_000915-NC_011498  157   2    83   185  565  5362   505   1357  79337
7   NC_000915-NC_012973  140   2    108  239  526  37389  1018  5729  142568
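The 0cvg regions are stretches of NC_000915 left uncovered by the alignments to the other genome. A minimal sketch of how they could be derived, assuming the alignments have already been reduced to 1-based inclusive (start, end) intervals on NC_000915 (for example from nucmer coordinates; that parsing step and the function names here are hypothetical):

```python
def zero_coverage_gaps(intervals, ref_len):
    """Return 1-based inclusive (start, end) regions of the reference that are
    not covered by any of the given 1-based inclusive alignment intervals."""
    gaps = []
    next_uncovered = 1  # first reference position not yet known to be covered
    for start, end in sorted(intervals):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len:
        gaps.append((next_uncovered, ref_len))
    return gaps


def gap_summary(gaps, ref_len):
    """Number of gaps, total gap length and fraction of the reference uncovered."""
    total = sum(end - start + 1 for start, end in gaps)
    return len(gaps), total, total / ref_len


# Toy example: a 100 bp reference with overlapping and disjoint alignments.
alns = [(1, 30), (25, 50), (61, 95)]
gaps = zero_coverage_gaps(alns, 100)
print(gaps)                    # [(51, 60), (96, 100)]
print(gap_summary(gaps, 100))  # (2, 15, 0.15)
```

Sorting first lets one pass merge overlapping alignments; the "sum" column above divided by the NC_000915 length gives the unique fraction.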
NCBI SRA
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=sra&term=SRP001104 (24 data sets; 10 not loaded)
- http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP001104
Other
Assemblies
Wustl
#   assembly                                          ctgs  min  q1   q2    q3     max    mean  n50    sum      reads
1   HPKX_1039_AG0C1.scarf.assembled-29-21             233   100  318  1700  9639   77743  7085  19912  1650899  0
2   HPKX_1039_AG0C2.solexa.txt.assembled-23-12        420   101  341  1684  5301   36588  3914  9412   1644043  5.2M
3   HPKX_1039_AG4C1.scarf.assembled-27-22             271   100  217  1421  8273   90368  6115  17093  1657230  0
4   HPKX_1039_AG4C2.solexa.txt.assembled-25-12        365   100  301  1595  5658   51523  4522  11890  1650547  6.8M
5   HPKX_1172_AG0C1_090424.solexa.txt.assembled-25-30 217   107  557  3370  10683  58848  7099  15527  1540507  8.6M
6   HPKX_1172_AG0C2_2lanes.assembled-21-8             1170  100  264  717   1768   11444  1319  2661   1543511  7.2M
7   HPKX_1172_AG4C1_090424.solexa.txt.assembled-23-20 377   103  355  2178  6166   35180  4169  9160   1571948  8.5M
8   HPKX_1172_AG4C2.solexa.txt.assembled-25-15        317   100  274  1540  6256   37505  4987  14946  1581161  6.0M
9   HPKX_1259_NL0C1.scarf.assembled-21-17             1704  100  264  598   1274   7953   936   1606   1595297  0
10  HPKX_1259_NL0C2.solexa.txt.assembled-23-12        410   102  240  928   4863   32792  3882  11295  1591864  3.6M
11  HPKX_1259_NL4C1.scarf.assembled-27-23             283   100  224  1098  6814   98400  5634  18624  1594699  0
12  HPKX_1259_NL4C2.solexa.txt.assembled-23-12        455   102  222  874   4348   32792  3520  11010  1601950  6.3M
13  HPKX_1379_NL0C1.scarf.assembled-27-22             295   100  230  1243  7551   59556  5539  15858  1634019  0
14  HPKX_1379_NL0C2.solexa.txt.assembled-23-12        416   100  216  1000  5177   53581  3931  11219  1635644  6.3M
15  HPKX_1379_NL4C1.scarf.assembled-25-23             328   100  227  1084  6601   61090  4996  14203  1638925  0
16  HPKX_1379_NL4C2.solexa.txt.assembled-25-20        291   100  231  1501  6751   64227  5539  15080  1612046  4.6M
17  HPKX_345_AG4C1.scarf.assembled-27-22              251   100  241  1272  8265   97643  6534  19718  1640151  0
18  HPKX_345_NL0C1.scarf.assembled-25-30              305   100  243  1146  6718   59632  5360  15874  1634815  0
19  HPKX_345_NL0C2_090424.solexa.txt.assembled-25-26  283   100  254  2009  8300   59229  5629  13524  1593064  11.1M
20  HPKX_438_AG0C1.scarf.assembled-27-25              267   100  348  1710  8311   87876  6071  16918  1620975  0
21  HPKX_438_AG0C2.solexa.txt.assembled-23-18         407   102  396  1777  5455   31183  3963  8830   1613167  6.3M
22  HPKX_438_CA4C1.scarf.assembled-27-26              237   100  348  1580  8856   97139  6845  19582  1622487  0
23  HPKX_438_CA4C2.solexa.txt.assembled-23-11         485   101  363  1502  4408   35471  3332  7779   1616183  4.1M
24  HPKX_345_AG4C2                                    .     .    .    .     .      .      .     .      .        12.1M
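The assembly tables report contig count, length quartiles, mean, N50 and total length. A sketch of those statistics for a list of contig lengths; the quartile convention (plain index into the sorted list) is an assumption, since the original scripts are not shown, while truncating the mean to an integer matches the tabulated values (e.g. 1650899 / 233 = 7085.4, listed as 7085):

```python
def contig_stats(lengths):
    """Summary of non-empty list of contig lengths, in the column order used on
    this page: ctgs, min, q1, q2, q3, max, mean, n50, sum.

    Assumption: quartiles are read off the sorted list by simple indexing."""
    s = sorted(lengths)
    n = len(s)
    total = sum(s)
    # N50: walking from the longest contig down, the length at which the
    # running total first reaches half of the assembly size.
    running = 0
    n50 = 0
    for length in reversed(s):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "ctgs": n,
        "min": s[0],
        "q1": s[n // 4],
        "q2": s[n // 2],
        "q3": s[(3 * n) // 4],
        "max": s[-1],
        "mean": total // n,  # truncated, as in the tables
        "n50": n50,
        "sum": total,
    }


stats = contig_stats([100, 200, 300, 400, 1000])
print(stats["n50"], stats["mean"])  # 1000 400
```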
CBCB
velvet_0.7.55:
- FASTQ vs FASTA input: no difference
- velvetg . -exp_cov auto
23 HPKX_438_CA4C2.solexa.txt.assembled-23-11
Reads
- 4.1M Solexa 36bp unpaired
- cvg =~ 80X
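Read coverage here is the usual back-of-envelope estimate: number of reads times read length, divided by genome length. A sketch (the function name is illustrative):

```python
def read_coverage(n_reads, read_len, genome_len):
    """Expected mean base coverage = total sequenced bases / genome length."""
    return n_reads * read_len / genome_len


# 4,107,397 reads of 36 bp over a genome of 1,616,183 bp
print(round(read_coverage(4107397, 36, 1616183), 1))  # 91.5
```

With the 4,107,397 reads counted in the QC table and the 1,616,183 bp WUSTL assembly of this sample standing in for the genome size, this gives about 91X of raw sequence.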
Read QC (4,107,397 reads):
Per read (number of Ns, average base quality):
            elem     <=0      >0       min  q1  q2  q3  max  mean  n50  sum
Ncount      4107397  3725799  381598   0    0   0   0   35   1     34   5614614
avgQuality  4107397  118980   3988417  0    19  26  29  34   22    28   91690471
Base quality by read position (0-based):
pos  elem     min  q1  q2  q3  max  mean  n50  sum
0    4107397  0    32  33  33  33   28    33   116108100
1    4107397  0    30  33  34  34   27    33   114845690
5    4107397  0    27  32  33  34   26    33   109832819
10   4107397  0    23  31  33  34   25    32   102860584
20   4107397  0    17  28  31  34   22    30   92139539
30   4107397  0    2   21  28  34   17    27   70231217
32   4107397  0    2   19  26  34   15    26   63156733
35   4107397  0    2   2   25  34   13    26   55261361
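The QC tables summarise, for each read, the number of Ns and the average base quality, and for each read position the base qualities across all reads. A sketch of how those values could be collected from FASTQ; the Phred+64 offset (Illumina 1.3+ style), the integer truncation of the per-read mean and all function names are assumptions, not the script used for the tables:

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from complete 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield seq, qual


def read_qc(records, offset=64):
    """Per-read N counts, per-read mean quality (truncated to int) and the
    quality values seen at each read position.

    offset=64 assumes Phred+64 quality strings; use 33 for Sanger FASTQ."""
    n_counts: List[int] = []
    avg_quals: List[int] = []
    by_pos: List[List[int]] = []
    for seq, qual in records:
        q = [ord(c) - offset for c in qual]
        n_counts.append(seq.upper().count("N"))
        avg_quals.append(sum(q) // len(q))
        for i, v in enumerate(q):
            if i == len(by_pos):
                by_pos.append([])
            by_pos[i].append(v)
    return n_counts, avg_quals, by_pos


def split_at_zero(values):
    """Counts for the '<=0' and '>0' columns of the per-read table."""
    le0 = sum(1 for v in values if v <= 0)
    return le0, len(values) - le0


fastq = ["@r1", "ACGN", "+", "hhhB", "@r2", "ACGT", "+", "hhhh"]
n_counts, avg_quals, by_pos = read_qc(read_fastq(fastq))
print(n_counts, avg_quals)      # [1, 0] [30, 40]
print(split_at_zero(n_counts))  # (1, 1)
```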
Velvet
Ctg stats for different velveth hash_lengths:
hash  #ctgs  min  q1   q2    q3    max    mean  n50    sum
19    908    37   161  770   2289  21014  1732  3905   1572704
21    457    41   84   580   4156  49037  3548  12777  1621652
23    398    45   161  1435  5137  37278  4068  12278  1619323  (CBCB best*)
27    769    53   341  1163  2731  18704  2109  4319   1622389
?     485    101  363  1502  4408  35471  3332  7779   1616183  (WUSTL)
CBCB best*: read cvg =~ 23 (median over contigs); repeats show up at higher cvg
       #ctgs  min  q1  q2  q3  max  mean  n50  sum
cvg    398    13   21  23  25  139  30    25   .
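Velvet reports contig coverage in k-mers rather than bases. If the cvg row above comes from Velvet's per-contig coverage (an assumption), it is not directly comparable to the ~80X read coverage; the Velvet manual relates the two as Ck = C * (L - k + 1) / L, with L the read length and k the hash length. A sketch of the conversion (function names are illustrative):

```python
def kmer_to_base_coverage(kmer_cov, read_len, k):
    """Invert Ck = C * (L - k + 1) / L to get base coverage C."""
    if k > read_len:
        raise ValueError("hash length must not exceed the read length")
    return kmer_cov * read_len / (read_len - k + 1)


def base_to_kmer_coverage(base_cov, read_len, k):
    """Ck = C * (L - k + 1) / L."""
    if k > read_len:
        raise ValueError("hash length must not exceed the read length")
    return base_cov * (read_len - k + 1) / read_len


print(round(kmer_to_base_coverage(23, 36, 23), 1))  # 59.1
print(round(base_to_kmer_coverage(80, 36, 23), 1))  # 31.1
```

With 36bp reads and hash 23, only 14 of each read's positions start a full k-mer, so a k-mer coverage of 23 corresponds to about 59X in bases.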
12-mer counts: too many sequencing errors?
meryl -C -B -m 12 -s prefix.seq -o prefix.12mers
meryl -Dh -s prefix.12mers | sort -nk2 -r | more
1   1876075  0.3452  0.0196
2   1009161  0.5308  0.0407
...
9   36227    0.7772  0.1017
10  25866    0.7819  0.1044
48  20812    0.8729  0.2727   # read cvg ???
49  20726    0.8768  0.2833
...
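A k-mer spectrum counts how many distinct k-mers occur once, twice, and so on; sequencing errors create many low-multiplicity k-mers, and the true coverage shows up as a later peak (the dump above flags multiplicity 48 as a possible read-coverage signal). A sketch that builds such a histogram from reads and locates the peak after the error tail, assuming canonical counting (each k-mer merged with its reverse complement) and skipping k-mers containing N; it illustrates the idea and is not meryl's implementation:

```python
from collections import Counter


def revcomp(s):
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmer_histogram(reads, k, canonical=True):
    """Map multiplicity -> number of distinct k-mers with that multiplicity."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:
                continue
            if canonical:
                kmer = min(kmer, revcomp(kmer))
            counts[kmer] += 1
    return Counter(counts.values())


def coverage_peak(hist):
    """Skip the decreasing error tail at low multiplicity, then return the
    multiplicity with the most distinct k-mers (None if there is no rise)."""
    if not hist:
        return None
    mults = sorted(hist)
    i = 0
    while i + 1 < len(mults) and hist[mults[i + 1]] <= hist[mults[i]]:
        i += 1
    if i == len(mults) - 1:
        return None
    return max(mults[i:], key=hist.get)


print(dict(kmer_histogram(["ACGTACGT"], 4)))                   # {2: 2, 1: 1}
print(coverage_peak({1: 100, 2: 40, 3: 10, 4: 15, 5: 30, 6: 20}))  # 5
```

A fixed "ignore multiplicity 1" cutoff would not work on the data above, where multiplicity 2 still outnumbers 48; walking to the first valley is a simple, noise-sensitive alternative.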
AMOScmp
Ref: NC_000915
Ctg stats:
params               #ctgs  min  q1  q2   q3   max   mean  n50  sum      #readsInCtgs
-l 16 -c 32 -ovl 10  9533   36   59  96   168  3160  136   185  1302448  1,123,025 (~25%)
-l 8 -c 24 -ovl 10   4429   36   62  152  430  5518  350   806  1554095  2,438,762 (~50%)
-l 8 -c 24 -ovl 5    3880   36   61  158  492  5883  400   966  1553027  2,438,762 (~50%)
Reads by SNP count:
  -l 16 -c 32 -ovl 10:  1,023,929:0SNP  154,958:1SNP  8,532:2SNP  ...
  -l 8 -c 24 -ovl 10:   1,159,669:0SNP  977,687:1SNP  422,738:2SNP  ...
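The 0SNP/1SNP/2SNP figures appear to count reads by the number of mismatches against the reference. A sketch of such a tally, assuming each read has an ungapped placement given as a 0-based start on the reference, in the reference's orientation (reverse-strand reads already reverse-complemented); the input format and function names are hypothetical, not AMOScmp output:

```python
from collections import Counter


def mismatches(read, ref, pos):
    """Hamming distance between a read and the reference window at 0-based pos."""
    window = ref[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    return sum(1 for a, b in zip(read, window) if a != b)


def snp_histogram(placements, ref):
    """Count reads by mismatch number; placements are (read, 0-based start)."""
    return Counter(mismatches(read, ref, pos) for read, pos in placements)


ref = "ACGTACGTAC"
hist = snp_histogram([("ACGT", 0), ("ACGA", 4), ("TTTT", 2)], ref)
print(sorted(hist.items()))  # [(0, 1), (1, 1), (3, 1)]
```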
nucmer 0cvg stats:
params       #gaps  min  q1  q2  q3  max   mean  n50  sum
-l 16 -c 32  8708   2    10  19  41  9286  44    91   388608
-l 8 -c 24   2650   2    8   17  45  2347  56    207  150099
24 HPKX_345_AG4C2
Reads
- 12.1M Solexa 36bp unpaired
- cvg =~ 120X ?
Velvet
Ctg stats:
hash  #ctgs  min  q1   q2   q3    max    mean  n50   sum
23    1098   45   244  724  1799  25718  1367  2745  1501014
NC_011498
NC_011498.1: 1673813 bp, 38.81% GC (Helicobacter pylori P12)
Reads
- 1.67M simulated 36bp reads; unpaired; error-free (100% correct)
- The reads were generated by tiling the genome with 36bp segments, each overlapping the previous one by 35bp (one read starting at every position) => 36X cvg
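Assuming "one read starting at every position" of a linear genome (wrap-around at the origin is ignored), the simulation and its coverage can be sketched as follows; the function names are illustrative:

```python
def tile_reads(genome, read_len=36, step=1):
    """Error-free, unpaired reads starting every `step` bases; with step=1
    consecutive reads overlap by read_len - 1 bases."""
    return [genome[i:i + read_len]
            for i in range(0, len(genome) - read_len + 1, step)]


def tiled_coverage(genome_len, read_len):
    """Coverage of a step-1 tiling: (G - L + 1) reads of L bases over G bases."""
    return (genome_len - read_len + 1) * read_len / genome_len


reads = tile_reads("ACGTACGTAC", read_len=4)
print(len(reads), reads[0], reads[-1])     # 7 ACGT GTAC
print(round(tiled_coverage(1673813, 36), 2))  # 36.0
```

For NC_011498.1 this yields 1,673,778 reads, which matches the 1.67M above, and the coverage approaches the read length (36X) because every base except those near the ends is covered by exactly 36 reads.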
Velvet
- Ctg stats:
  hash  #ctgs  min  q1  q2   q3    max    mean  n50    sum
  23    292    45   67  164  3422  73108  5654  33268  1651121
Euler-sr
- Ctg stats:
  vertex_size  #ctgs  min  q1  q2   q3    max    mean  n50    sum      #misassemblies
  23           366    24   36  92   931   83748  4596  41745  1682410  .
  25           343    26   39  98   1125  83752  5075  42087  1740988  7
  27           331    28   43  109  1125  83756  5016  41753  1660506  6
AMOScmp
- Ref: NC_000915
- Ctg stats:
  params                              #ctgs  min  q1  q2   q3    max    mean  n50   sum      #readsInCtgs
  nucmer -l 16 -c 32 -ovl 10          8569   36   66  114  203   2897   164   231   1405371  836827 (~50%)
  soap -v 5 -g 3 -s 12 -f 2; -ovl 10  1787   37   99  305  1129  14043  881   2283  1574554  1437078 (~85%)
- 0cvg stats:
  params                              #gaps  min  q1  q2  q3  max   mean  n50  sum
  nucmer -l 16 -c 32 -ovl 10          8026   2    8   15  34  5384  36    75   291959
  soap -v 5 -g 3 -s 12 -f 2; -ovl 10  1042   2    6   19  46  5355  93    746  97176