Helicobacter pylori
Revision as of 20:20, 15 October 2009
Data
Wustl
- http://gordonlab.wustl.edu/MengWu/WU_JBC_2009.html
- http://pathogenomics.bham.ac.uk/blog/2009/09/tips-for-de-novo-bacterial-genome-assembly/
- http://www.jbc.org/content/early/2009/09/01/jbc.M109.052738.abstract
NCBI complete genomes
- Genome info
   id           len      gc%    organism
1  NC_000915.1  1667867  38.87  Helicobacter pylori 26695
2  NC_000921.1  1643831  39.19  Helicobacter pylori J99
3  NC_008086.1  1596366  39.08  Helicobacter pylori HPAG1
4  NC_010698.2  1608548  38.91  Helicobacter pylori Shi470
5  NC_011333.1  1652982  38.89  Helicobacter pylori G27
6  NC_011498.1  1673813  38.81  Helicobacter pylori P12
7  NC_012973.1  1576758  39.16  Helicobacter pylori B38
- Alignment info (NC_000915 zero-coverage regions): about 5-10% of the NC_000915 genome is unique relative to each other strain
   .                    elem  min  q1   q2   q3   max    mean  n50   sum
1  NC_000915-NC_000915  .     .    .    .    .    .      .     .     .
2  NC_000915-NC_000921  197   2    81   203  495  17816  644   2146  126988
3  NC_000915-NC_008086  151   3    103  242  894  26862  951   3103  143652
4  NC_000915-NC_010698  206   2    115  283  706  12779  726   1941  149744
5  NC_000915-NC_011333  138   2    111  260  695  7457   688   1941  95063
6  NC_000915-NC_011498  157   2    83   185  565  5362   505   1357  79337
7  NC_000915-NC_012973  140   2    108  239  526  37389  1018  5729  142568
NCBI SRA
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=sra&term=SRP001104 (24 data sets; 10 not loaded)
- http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP001104
Assemblies
Wustl
.   .                                                  ctgs  min  q1   q2    q3     max    mean  n50    sum      reads
1   HPKX_1039_AG0C1.scarf.assembled-29-21              233   100  318  1700  9639   77743  7085  19912  1650899  0
2   HPKX_1039_AG0C2.solexa.txt.assembled-23-12         420   101  341  1684  5301   36588  3914  9412   1644043  5.2M
3   HPKX_1039_AG4C1.scarf.assembled-27-22              271   100  217  1421  8273   90368  6115  17093  1657230  0
4   HPKX_1039_AG4C2.solexa.txt.assembled-25-12         365   100  301  1595  5658   51523  4522  11890  1650547  6.8M
5   HPKX_1172_AG0C1_090424.solexa.txt.assembled-25-30  217   107  557  3370  10683  58848  7099  15527  1540507  8.6M
6   HPKX_1172_AG0C2_2lanes.assembled-21-8              1170  100  264  717   1768   11444  1319  2661   1543511  7.2M
7   HPKX_1172_AG4C1_090424.solexa.txt.assembled-23-20  377   103  355  2178  6166   35180  4169  9160   1571948  8.5M
8   HPKX_1172_AG4C2.solexa.txt.assembled-25-15         317   100  274  1540  6256   37505  4987  14946  1581161  6.0M
9   HPKX_1259_NL0C1.scarf.assembled-21-17              1704  100  264  598   1274   7953   936   1606   1595297  0
10  HPKX_1259_NL0C2.solexa.txt.assembled-23-12         410   102  240  928   4863   32792  3882  11295  1591864  3.6M
11  HPKX_1259_NL4C1.scarf.assembled-27-23              283   100  224  1098  6814   98400  5634  18624  1594699  0
12  HPKX_1259_NL4C2.solexa.txt.assembled-23-12         455   102  222  874   4348   32792  3520  11010  1601950  6.3M
13  HPKX_1379_NL0C1.scarf.assembled-27-22              295   100  230  1243  7551   59556  5539  15858  1634019  0
14  HPKX_1379_NL0C2.solexa.txt.assembled-23-12         416   100  216  1000  5177   53581  3931  11219  1635644  6.3M
15  HPKX_1379_NL4C1.scarf.assembled-25-23              328   100  227  1084  6601   61090  4996  14203  1638925  0
16  HPKX_1379_NL4C2.solexa.txt.assembled-25-20         291   100  231  1501  6751   64227  5539  15080  1612046  4.6M
17  HPKX_345_AG4C1.scarf.assembled-27-22               251   100  241  1272  8265   97643  6534  19718  1640151  0
18  HPKX_345_NL0C1.scarf.assembled-25-30               305   100  243  1146  6718   59632  5360  15874  1634815  0
19  HPKX_345_NL0C2_090424.solexa.txt.assembled-25-26   283   100  254  2009  8300   59229  5629  13524  1593064  11.1M
20  HPKX_438_AG0C1.scarf.assembled-27-25               267   100  348  1710  8311   87876  6071  16918  1620975  0M
21  HPKX_438_AG0C2.solexa.txt.assembled-23-18          407   102  396  1777  5455   31183  3963  8830   1613167  6.3M
22  HPKX_438_CA4C1.scarf.assembled-27-26               237   100  348  1580  8856   97139  6845  19582  1622487  0
23  HPKX_438_CA4C2.solexa.txt.assembled-23-11          485   101  363  1502  4408   35471  3332  7779   1616183  4.1M
24  HPKX_345_AG4C2                                     .                                                         12.1M
CBCB
velvet_0.7.55:
- Fastq vs Fasta: no difference
- velvetg . -exp_cov auto
23 HPKX_438_CA4C2.solexa.txt.assembled-23-11
Reads
- 4.1M Solexa 36bp unpaired
- cvg =~ 80X
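As a rough cross-check on the coverage figure, nominal coverage can be computed as total sequenced bases divided by genome length. The sketch below is an illustration only: the function name is hypothetical, and using the NC_000915.1 length as the genome size is an assumption (the sequenced isolate may differ in size). The result is an upper bound, since it counts every read whether or not it ends up usable.

```python
def nominal_coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Expected base coverage = total sequenced bases / genome length."""
    return num_reads * read_len / genome_len


# 4.1M unpaired 36 bp reads (from the notes above).
# Genome length borrowed from NC_000915.1; this is an assumption, not a
# measured size for the sequenced isolate.
cov = nominal_coverage(4_100_000, 36, 1_667_867)
print(f"nominal coverage: {cov:.1f}X")
```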
Velvet
Ctg stats for different velveth hash_lengths:
hash  #ctgs  min  q1   q2    q3    max    mean  n50    sum
19    908    37   161  770   2289  21014  1732  3905   1572704
21    457    41   84   580   4156  49037  3548  12777  1621652
23    398    45   161  1435  5137  37278  4068  12278  1619323  (CBCB best*)
27    769    53   341  1163  2731  18704  2109  4319   1622389
?     485    101  363  1502  4408  35471  3332  7779   1616183  (WUSTL)
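For reference, the columns in the contig tables can be reproduced from a list of contig lengths along these lines. This is a minimal sketch, not the script that produced the numbers above: the function name is hypothetical, N50 is taken as the length of the contig at which the running sum (longest first) reaches half the total, and the quartile convention (`method="inclusive"`) is an assumption that may differ slightly from the original tool.

```python
from statistics import quantiles


def contig_stats(lengths):
    """Summarize contig lengths: count, min, quartiles, max, mean, N50, sum."""
    if not lengths:
        raise ValueError("no contigs")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)

    # N50: walk contigs from longest to shortest until at least half of the
    # total assembled length has been covered.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    # Quartile convention is an assumption; a single contig has no spread.
    if len(ordered) > 1:
        q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    else:
        q1 = q2 = q3 = ordered[0]

    return {
        "ctgs": len(ordered), "min": ordered[-1], "q1": q1, "q2": q2,
        "q3": q3, "max": ordered[0], "mean": round(total / len(ordered)),
        "n50": n50, "sum": total,
    }


# Toy input, not real assembly data.
stats = contig_stats([10, 20, 30, 40, 50])
print(stats)
```

In practice the lengths would come from the contig FASTA written by velvetg; the toy list here only exercises the arithmetic.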
CBCB best* (hash 23): per-contig read cvg =~ 23 (median); repeats show up at higher cvg
.    #ctgs  min  q1  q2  q3  max  mean  n50  sum
cvg  398    13   21  23  25  139  30    25   .
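Velvet works with k-mer coverage rather than base coverage; the Velvet manual relates the two as Ck = C * (L - k + 1) / L, where C is base coverage, L the read length, and k the hash length. The sketch below applies that relation to the numbers in these notes. Whether the cvg values in the table are k-mer coverage is an assumption here, and the function name is hypothetical.

```python
def kmer_coverage(base_cov: float, read_len: int, k: int) -> float:
    """Convert base coverage C to k-mer coverage Ck = C * (L - k + 1) / L."""
    if k > read_len:
        raise ValueError("k cannot exceed the read length")
    return base_cov * (read_len - k + 1) / read_len


# ~80X base coverage, 36 bp reads, hash length 23 (values from the notes).
ck = kmer_coverage(80, 36, 23)
print(f"expected k-mer coverage: {ck:.1f}")
```

With these inputs the expected k-mer coverage comes out a little above 31, which is in the same range as the mean cvg of 30 reported for the 398 contigs.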
12mer counts: too much error???
meryl -C -B -m 12 -s prefix.seq -o prefix.12mers
meryl -Dh -s prefix.12mers | sort -nk2 -r | more
1   1876075  0.3452  0.0196
2   1009161  0.5308  0.0407
...
9   36227    0.7772  0.1017
10  25866    0.7819  0.1044
48  20812    0.8729  0.2727   # read cvg ???
49  20726    0.8768  0.2833
...
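The histogram columns appear to be: k-mer frequency, number of distinct 12-mers with that frequency, cumulative fraction of distinct 12-mers, and cumulative fraction of all 12-mer occurrences. That reading is an assumption, but it can be checked against the first two rows, as in the sketch below (helper names are hypothetical). Under it, the first row implies roughly 5.4 million distinct 12-mers and about 96 million 12-mer occurrences in total.

```python
# First two histogram rows from the notes:
# (frequency, distinct k-mers, cumulative frac. distinct, cumulative frac. total)
rows = [(1, 1876075, 0.3452, 0.0196), (2, 1009161, 0.5308, 0.0407)]


def implied_totals(first_row):
    """Back out total distinct k-mers and total occurrences from row 1.

    Only valid when the row is the first histogram bin, so that its
    cumulative fractions equal its own share.
    """
    freq, count, frac_distinct, frac_total = first_row
    return count / frac_distinct, freq * count / frac_total


def cumulative_fractions(rows, n_distinct, n_total):
    """Recompute cumulative fractions for contiguous rows starting at bin 1."""
    out = []
    seen_distinct = 0
    seen_total = 0
    for freq, count, _, _ in rows:
        seen_distinct += count          # distinct k-mers seen so far
        seen_total += freq * count      # k-mer occurrences seen so far
        out.append((freq, seen_distinct / n_distinct, seen_total / n_total))
    return out


n_distinct, n_total = implied_totals(rows[0])
for freq, fd, ft in cumulative_fractions(rows, n_distinct, n_total):
    print(freq, round(fd, 4), round(ft, 4))
```

The recomputed second row agrees with the reported 0.5308 and 0.0407 to within rounding, which supports the column interpretation. Under that reading, 12-mers seen only once account for about 2% of all occurrences but over a third of the distinct 12-mers.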
AMOScmp
Ref : NC_000915
Ctg stats:
params               #ctgs  min  q1  q2   q3   max   mean  n50  sum      #readsInCtgs
-l 16 -c 32 -ovl 10  9533   36   59  96   168  3160  136   185  1302448  1,123,025
-l 8 -c 24 -ovl 10   4429   36   62  152  430  5518  350   806  1554095  2,438,762