Helicobacter pylori

From Cbcb
Revision as of 18:53, 16 October 2009 by Dpuiu

Data

Wustl

NCBI complete genomes

  • Genome info
       id             len     gc%
    1  NC_000915.1    1667867 38.87  Helicobacter pylori 26695
    2  NC_000921.1    1643831 39.19  Helicobacter pylori J99
    3  NC_008086.1    1596366 39.08  Helicobacter pylori HPAG1
    4  NC_010698.2    1608548 38.91  Helicobacter pylori Shi470
    5  NC_011333.1    1652982 38.89  Helicobacter pylori G27
    6  NC_011498.1    1673813 38.81  Helicobacter pylori P12
    7  NC_012973.1    1576758 39.16  Helicobacter pylori B38
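  • ~200 alignments and 93-95% identity between genome pairs
  • Differences between genomes are mostly substitutions rather than indels
  • Alignment info (NC_000915 0cvg regions, i.e. stretches of NC_000915 not covered by alignments to the other genome): 5-10% of the genome is unique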
       .                      elem  min  q1   q2   q3   max    mean  n50   sum
    1  NC_000915-NC_000915    .     .    .    .    .    .      .     .     .
    2  NC_000915-NC_000921    197   2    81   203  495  17816  644   2146  126988
    3  NC_000915-NC_008086    151   3    103  242  894  26862  951   3103  143652
    4  NC_000915-NC_010698    206   2    115  283  706  12779  726   1941  149744
    5  NC_000915-NC_011333    138   2    111  260  695  7457   688   1941  95063
    6  NC_000915-NC_011498    157   2    83   185  565  5362   505   1357  79337
    7  NC_000915-NC_012973    140   2    108  239  526  37389  1018  5729  142568
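The 0cvg regions are the complement of the union of the aligned intervals on the reference. A minimal sketch, assuming the alignments have already been reduced to 1-based (start, end) coordinates on NC_000915; the function names are hypothetical and not part of any aligner's output. As a check on the table, for the NC_000915-NC_000921 row 126988 / 1667867 ≈ 7.6% of the reference is unique.

```python
def zero_coverage_regions(ref_len, aligned):
    """Return 1-based inclusive (start, end) reference intervals that no
    alignment covers (the '0cvg' regions).

    ref_len -- reference length in bp
    aligned -- iterable of 1-based inclusive (start, end) reference
               coordinates; reversed pairs (start > end) are allowed
    """
    # Normalise orientation, then sweep the intervals in start order.
    spans = sorted((min(s, e), max(s, e)) for s, e in aligned)
    gaps = []
    next_uncovered = 1  # first reference base not yet covered
    for start, end in spans:
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len:
        gaps.append((next_uncovered, ref_len))
    return gaps


def percent_unique(ref_len, gaps):
    """Percentage of the reference that falls in 0cvg regions."""
    return 100.0 * sum(end - start + 1 for start, end in gaps) / ref_len


# Toy example: a 100bp reference with three (partly overlapping) alignments.
gaps = zero_coverage_regions(100, [(10, 40), (35, 60), (80, 70)])
print(gaps, percent_unique(100, gaps))  # [(1, 9), (61, 69), (81, 100)] 38.0
```

Sorting once and keeping a single "next uncovered base" pointer handles overlapping and nested alignments without an explicit interval-merge step.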

NCBI SRA

Other

Assemblies

Wustl

    .  .                                                    ctgs  min  q1   q2    q3     max    mean  n50    sum       reads     
    1  HPKX_1039_AG0C1.scarf.assembled-29-21                233   100  318  1700  9639   77743  7085  19912  1650899   0
    2  HPKX_1039_AG0C2.solexa.txt.assembled-23-12           420   101  341  1684  5301   36588  3914  9412   1644043   5.2M
    3  HPKX_1039_AG4C1.scarf.assembled-27-22                271   100  217  1421  8273   90368  6115  17093  1657230   0
    4  HPKX_1039_AG4C2.solexa.txt.assembled-25-12           365   100  301  1595  5658   51523  4522  11890  1650547   6.8M
    5  HPKX_1172_AG0C1_090424.solexa.txt.assembled-25-30    217   107  557  3370  10683  58848  7099  15527  1540507   8.6M
    6  HPKX_1172_AG0C2_2lanes.assembled-21-8                1170  100  264  717   1768   11444  1319  2661   1543511   7.2M
    7  HPKX_1172_AG4C1_090424.solexa.txt.assembled-23-20    377   103  355  2178  6166   35180  4169  9160   1571948   8.5M
    8  HPKX_1172_AG4C2.solexa.txt.assembled-25-15           317   100  274  1540  6256   37505  4987  14946  1581161   6.0M
    9  HPKX_1259_NL0C1.scarf.assembled-21-17                1704  100  264  598   1274   7953   936   1606   1595297   0  
   10  HPKX_1259_NL0C2.solexa.txt.assembled-23-12           410   102  240  928   4863   32792  3882  11295  1591864   3.6M
   11  HPKX_1259_NL4C1.scarf.assembled-27-23                283   100  224  1098  6814   98400  5634  18624  1594699   0
   12  HPKX_1259_NL4C2.solexa.txt.assembled-23-12           455   102  222  874   4348   32792  3520  11010  1601950   6.3M
   13  HPKX_1379_NL0C1.scarf.assembled-27-22                295   100  230  1243  7551   59556  5539  15858  1634019   0
   14  HPKX_1379_NL0C2.solexa.txt.assembled-23-12           416   100  216  1000  5177   53581  3931  11219  1635644   6.3M
   15  HPKX_1379_NL4C1.scarf.assembled-25-23                328   100  227  1084  6601   61090  4996  14203  1638925   0
   16  HPKX_1379_NL4C2.solexa.txt.assembled-25-20           291   100  231  1501  6751   64227  5539  15080  1612046   4.6M
   17  HPKX_345_AG4C1.scarf.assembled-27-22                 251   100  241  1272  8265   97643  6534  19718  1640151   0
   18  HPKX_345_NL0C1.scarf.assembled-25-30                 305   100  243  1146  6718   59632  5360  15874  1634815   0
   19  HPKX_345_NL0C2_090424.solexa.txt.assembled-25-26     283   100  254  2009  8300   59229  5629  13524  1593064   11.1M
   20  HPKX_438_AG0C1.scarf.assembled-27-25                 267   100  348  1710  8311   87876  6071  16918  1620975   0
   21  HPKX_438_AG0C2.solexa.txt.assembled-23-18            407   102  396  1777  5455   31183  3963  8830   1613167   6.3M
   22  HPKX_438_CA4C1.scarf.assembled-27-26                 237   100  348  1580  8856   97139  6845  19582  1622487   0
   23  HPKX_438_CA4C2.solexa.txt.assembled-23-11            485   101  363  1502  4408   35471  3332  7779   1616183   4.1M
   24  HPKX_345_AG4C2                                       .                                                          12.1M
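The summary columns used throughout this page (min, q1, q2, q3, max, mean, n50, sum) can be reproduced from a list of contig lengths. A minimal sketch: the quartile convention (indices n//4, n//2, 3n//4 of the sorted lengths) is an assumption, while the integer mean matches the tables, which appear to truncate (e.g. 149744 / 206 = 726.9 is listed as 726 in the alignment table).

```python
def contig_stats(lengths):
    """Summary row (elem, min, q1, q2, q3, max, mean, n50, sum) for a
    non-empty list of contig lengths."""
    s = sorted(lengths)
    n, total = len(s), sum(s)
    # N50: walk from the longest contig down until half the total is reached.
    acc, n50 = 0, 0
    for length in reversed(s):
        acc += length
        if 2 * acc >= total:
            n50 = length
            break
    return {
        "elem": n,
        "min": s[0],
        "q1": s[n // 4],        # assumed quartile convention
        "q2": s[n // 2],
        "q3": s[(3 * n) // 4],
        "max": s[-1],
        "mean": total // n,     # truncated, as in the tables on this page
        "n50": n50,
        "sum": total,
    }


print(contig_stats([100, 200, 300, 400, 1000]))
```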

CBCB

velvet_0.7.55:

  1. Fastq vs Fasta input: no difference
  2. velvetg . -exp_cov auto
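The velveth/velvetg calls behind these runs, including a hash-length sweep like the one in the Velvet tables, could be scripted roughly as follows. A sketch assuming unpaired reads in a single file; the helper names and the output directory naming are hypothetical. Velvet hash lengths must be odd and no larger than the compiled MAXKMERLENGTH (31 by default).

```python
import shutil
import subprocess


def velvet_commands(out_dir, hash_length, reads, fmt="fastq"):
    """Argument lists for one Velvet run on a single file of unpaired reads."""
    if hash_length % 2 == 0:
        raise ValueError("Velvet hash lengths must be odd")
    velveth = ["velveth", out_dir, str(hash_length), f"-{fmt}", "-short", reads]
    velvetg = ["velvetg", out_dir, "-exp_cov", "auto"]
    return velveth, velvetg


def run_hash_sweep(reads, hash_lengths=(19, 21, 23, 27), fmt="fastq"):
    """Run velveth + velvetg once per hash length (requires Velvet on PATH)."""
    if shutil.which("velveth") is None or shutil.which("velvetg") is None:
        raise RuntimeError("velveth/velvetg not found on PATH")
    for k in hash_lengths:
        out_dir = f"velvet_k{k}"  # hypothetical directory naming
        for cmd in velvet_commands(out_dir, k, reads, fmt):
            subprocess.run(cmd, check=True)
```

Calling run_hash_sweep("reads.fastq") would create velvet_k19 through velvet_k27, each holding Velvet's usual contigs.fa; passing fmt="fasta" covers the Fasta case, which made no difference here.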

23 HPKX_438_CA4C2.solexa.txt.assembled-23-11

Reads

  • 4.1M Solexa 36bp unpaired
  • cvg =~ 80X

Read quality control:

 .                  elem       <=0        >0         min    q1     q2     q3     max        mean       n50        sum
 Ncount             4107397    3725799    381598     0      0      0      0      35         1          34         5614614
 avgQuality         4107397    118980     3988417    0      19     26     29     34         22         28         91690471
 Ncount==0 and avgQuality>=20 => 3013939 reads pass the filter
 Per-position quality (pos = 0-based position in the 36bp read):
 pos                  elem       min    q1     q2     q3     max        mean       n50        sum
 0                    4107397    0      32     33     33     33         28         33         116108100
 1                    4107397    0      30     33     34     34         27         33         114845690
 5                    4107397    0      27     32     33     34         26         33         109832819
 10                   4107397    0      23     31     33     34         25         32         102860584
 20                   4107397    0      17     28     31     34         22         30         92139539
 30                   4107397    0      2      21     28     34         17         27         70231217
 32                   4107397    0      2      19     26     34         15         26         63156733
 35                   4107397    0      2      2      25     34         13         26         55261361
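The Ncount/avgQuality filter above can be sketched as follows. Assumptions: well-formed 4-line FASTQ records and an ASCII quality offset of 64 (Illumina 1.3+ style; older Solexa-scaled files share the offset but use a different score scale at low qualities, and Sanger-encoded files need offset=33). avgQuality is taken as the plain arithmetic mean of the per-base scores; the exact averaging behind the table is not stated.

```python
def read_stats(seq, qual, offset=64):
    """(Ncount, avgQuality) for one read; offset is the ASCII quality offset."""
    n_count = seq.upper().count("N")
    scores = [ord(c) - offset for c in qual]
    return n_count, sum(scores) / len(scores)


def passes_filter(seq, qual, offset=64, min_avg_q=20):
    """Keep reads with no Ns and mean quality >= min_avg_q."""
    n_count, avg_q = read_stats(seq, qual, offset)
    return n_count == 0 and avg_q >= min_avg_q


def parse_fastq(lines):
    """Yield (name, seq, qual) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def count_passing(lines, offset=64):
    """Number of reads that pass the Ncount==0 / avgQuality>=20 filter."""
    return sum(passes_filter(seq, qual, offset) for _, seq, qual in parse_fastq(lines))
```

With a file handle, count_passing(open("reads.fastq")) streams the records without loading them all into memory.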

Velvet

Ctg stats for different velveth hash_lengths:

 hash  #ctgs    min  q1   q2    q3    max    mean  n50    sum
 19    908      37   161  770   2289  21014  1732  3905   1572704
 21    457      41   84   580   4156  49037  3548  12777  1621652
 23    398      45   161  1435  5137  37278  4068  12278  1619323  (CBCB best*)
 27    769      53   341  1163  2731  18704  2109  4319   1622389

 ?     485      101  363  1502  4408  35471  3332  7779   1616183  (WUSTL)

CBCB best* (hash 23): median contig cvg =~ 23, which is Velvet's k-mer coverage rather than read coverage; repeats sit at higher cvg

 .     #ctgs    min  q1   q2    q3    max    mean  n50    sum
 cvg   398      13   21   23    25    139    30    25     .
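Velvet's contig coverage is a k-mer coverage Ck, related to nucleotide coverage C by Ck = C * (L - k + 1) / L for read length L (the relation given in the Velvet manual). A minimal sketch of the conversion in both directions; the function names are hypothetical. For the hash-23 run on 36bp reads, the median Ck of 23 corresponds to C = 23 * 36 / 14 ≈ 59X.

```python
def kmer_to_read_coverage(kmer_cov, read_len, k):
    """Nucleotide coverage C from Velvet k-mer coverage Ck, using
    Ck = C * (L - k + 1) / L with read length L."""
    if not 0 < k <= read_len:
        raise ValueError("need 0 < k <= read length")
    return kmer_cov * read_len / (read_len - k + 1)


def read_to_kmer_coverage(read_cov, read_len, k):
    """Inverse conversion: expected Ck for a given nucleotide coverage C."""
    if not 0 < k <= read_len:
        raise ValueError("need 0 < k <= read length")
    return read_cov * (read_len - k + 1) / read_len


# Median contig cvg 23 from the hash-23 run on 36bp reads:
print(round(kmer_to_read_coverage(23, 36, 23), 1))  # 59.1
```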

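12-mer counts: too much error? ~53% of the distinct 12-mers occur only once or twice, yet they account for only ~4% of all 12-mer occurrences (meryl -Dh columns: count, #distinct 12-mers, cumulative fraction of distinct, cumulative fraction of total)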

 meryl -C -B -m 12 -s prefix.seq -o prefix.12mers
 meryl -Dh -s prefix.12mers | sort -nk2 -r | more
 1       1876075 0.3452  0.0196
 2       1009161 0.5308  0.0407
 ...
 9       36227   0.7772  0.1017
 10      25866   0.7819  0.1044
 48      20812   0.8729  0.2727  # read cvg ??? 
 49      20726   0.8768  0.2833
 ...
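A small in-memory version of canonical k-mer counting (what meryl -C does) plus the count histogram, useful for sanity-checking the peak on a subset of reads. A sketch only: it skips k-mers containing N, and the "peak after the first trough" rule in coverage_peak is a simple heuristic, not something meryl computes.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_histogram(reads, k=12):
    """Return {multiplicity: number of distinct canonical k-mers}.
    k-mers containing N are skipped."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def coverage_peak(hist):
    """Heuristic: skip the decreasing low-count (error) tail, then return
    the multiplicity with the most distinct k-mers."""
    xs = sorted(hist)
    i = 0
    while i + 1 < len(xs) and hist[xs[i + 1]] <= hist[xs[i]]:
        i += 1
    return max(xs[i:], key=lambda x: hist[x])
```

On a 1.6 Mbp genome many 12-mers also recur by chance, so the genomic peak is broader than it would be for longer k.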

AMOScmp

Ref : NC_000915

Ctg stats:

 params                           #ctgs      min    q1     q2     q3     max        mean       n50        sum       #readsInCtgs
 -l 16 -c 32 -ovl 10              9533       36     59     96     168    3160       136        185        1302448   1,123,025 (~25%) 1,023,929:0SNP 154,958:1SNP 8,532:2SNP ...
 -l 8  -c 24 -ovl 10              4429       36     62     152    430    5518       350        806        1554095   2,438,762 (~50%) 1,159,669:0SNP 977,687:1SNP 422,738:2SNP ...
 -l 8  -c 24 -ovl 5               3880       36     61     158    492    5883       400        966        1553027   2,438,762 (~50%)

nucmer 0cvg stats (regions of the reference with 0 coverage):

 params                           #gaps      min    q1     q2     q3     max        mean       n50        sum
 -l 16 -c 32                      8708       2      10     19     41     9286       44         91         388608 
 -l 8  -c 24                      2650       2      8      17     45     2347       56         207        150099

24 HPKX_345_AG4C2

Reads

  • 12.1M Solexa 36bp unpaired
  • cvg =~ 120X ?

Velvet

Ctg stats:

 hash  #ctgs    min  q1   q2    q3    max    mean  n50    sum
 23    1098     45   244  724   1799  25718  1367  2745   1501014

NC_011498

NC_011498.1 1673813 38.81

Reads

  • 1.67M simulated 36bp reads; unpaired; error-free (100% correct)
  • reads were generated by cutting the genome into 36bp segments overlapping by 35bp (one read starting at every position) => 36X cvg
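The simulation amounts to a step-1 sliding window over the genome, and the coverage follows from C = N * L / G. A minimal sketch, assuming reads were taken from one strand only (the page does not say); the function names are hypothetical. For NC_011498.1 (1673813bp) this gives 1673778 reads and C ≈ 36X, matching the numbers above.

```python
def tile_reads(genome, read_len=36, step=1):
    """Error-free reads cut from one strand: a read starts every `step` bp,
    so consecutive reads overlap by read_len - step bp."""
    return [genome[i:i + read_len]
            for i in range(0, len(genome) - read_len + 1, step)]


def coverage(n_reads, read_len, genome_len):
    """Average coverage C = N * L / G."""
    return n_reads * read_len / genome_len


# NC_011498.1 is 1673813bp, so step-1 tiling gives 1673813 - 36 + 1 reads.
n = 1673813 - 36 + 1
print(n, round(coverage(n, 36, 1673813), 2))  # 1673778 36.0
```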

Velvet

  • Ctg stats:
 hash                     #ctgs    min  q1   q2    q3    max    mean  n50    sum
 23                       292      45   67   164   3422  73108  5654  33268  1651121

Euler-sr

  • Ctg stats:
 vertex_size              #ctgs    min  q1   q2    q3    max    mean  n50    sum        #misassemblies
 23                       366      24   36   92    931   83748  4596  41745  1682410    .
 25                       343      26   39   98    1125  83752  5075  42087  1740988    7
  27                       331      28   43   109   1125  83756  5016  41753  1660506    6

AMOScmp

  • Ref : NC_000915
  • Ctg stats:
 params                                   #ctgs    min  q1   q2    q3    max    mean  n50    sum        #readsInCtgs     misassemblies (<95% length match)
 nucmer -l 16 -c 32 -ovl 10               8569     36   66   114   203   2897   164   231    1405371    836827  (~50%)  39 
 soap -v 5 -g 3 -s 12 -f 2; -ovl 10       1787     37   99   305   1129  14043  881   2283   1574554    1437078 (~85%)  69
  • 0cvg stats
 params                                   #gaps    min  q1   q2    q3    max    mean  n50    sum
 nucmer -l 16 -c 32 -ovl 10               8026     2    8    15    34    5384   36    75     291959
 soap -v 5 -g 3 -s 12 -f 2; -ovl 10       1042     2    6    19    46    5355   93    746    97176
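One way to apply the "<95% length match" misassembly criterion, assuming it means that the longest single alignment of a contig to the reference covers less than 95% of the contig length (the page does not spell out the exact rule). The input dict is a hypothetical structure, e.g. built from parsed nucmer coordinates.

```python
def flag_misassemblies(contigs, min_pct=95):
    """Names of contigs whose longest single alignment to the reference
    covers less than min_pct percent of the contig length.

    contigs -- dict: name -> (contig_length, longest_aligned_length)
    """
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    return sorted(name for name, (length, aligned) in contigs.items()
                  if aligned * 100 < min_pct * length)


print(flag_misassemblies({"ctg1": (1000, 990), "ctg2": (1000, 900), "ctg3": (200, 190)}))  # ['ctg2']
```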