Ecoli germany: Difference between revisions

Latest revision as of 18:31, 14 June 2011

Finished genomes

 NC_011748   Escherichia coli 55989, complete genome;                              Length: 5,154,862 nt
 NC_011752   Escherichia coli 55989 plasmid 55989p, complete sequence;             Length: 72,482 nt

 NC_013353   Escherichia coli O103:H2 str. 12009, complete genome;                 Length: 5,449,314 nt
 NC_013354   Escherichia coli O103:H2 str. 12009 plasmid pO103, complete sequence; Length: 75,546 nt
 ...
 NC_004914   Stx2 converting phage II, complete genome;                            Length: 62,706 nt

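Data

  • BGI Sequences Genome of the Deadly E. Coli in Germany and Reveals New Super-Toxic Strain, June 2nd 2011: http://www.genomics.cn/en/news_show.php?type=show&id=644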

 .         elem         min  q1   q2   q3    max    mean  n50    sum
 run1      92370        5    99   106  109   123    102   107    9433083
 run2      122208       5    96   105  109   133    100   106    12248530
 run3      96765        5    100  106  110   129    103   107    9958873
 run4      222275       5    101  107  110   135    103   107    22924825
 run5      95750        5    92   103  108   133    97    104    9379125
 run1-5    629368       5    99   106  109   135    102   107    63944436  (12.2X)
 
 run6      79341        5    104  108  111   137    105   108    8355410
 run7      74388        5    97   105  109   129    100   106    7469876
 run1-7    783097       5    99   106  110   137    102   107    79769722  (15.2X)
 
 ctg       1217         100  251  938  5215  72019  4274  13577  5201850
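 ctg.v2    513          62   346  896  4903  204342 10330 53266  5299150

  • Escherichia coli O104:H4 TY-2482, whole genome shotgun sequencing project @NCBI: http://www.ncbi.nlm.nih.gov/nuccore/AFOG00000000
  • Wikipedia, Escherichia coli O104:H4: http://en.wikipedia.org/wiki/Escherichia_coli_O104:H4
  • Wikipedia, Shiga toxin: http://en.wikipedia.org/wiki/Shiga_toxin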
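
The coverage figures quoted in the read table (12.2X, 15.2X) can be roughly reproduced from the "sum" column. This is only a sketch: the genome size used on this page is not stated, so the code assumes chromosome plus plasmid of Escherichia coli 55989 (NC_011748 + NC_011752). With that assumption the values come out at about 12.2X and 15.3X, close to the quoted ones.

```python
# Reference lengths taken from the "Finished genomes" list (nt).
CHROMOSOME_LEN = 5_154_862  # NC_011748
PLASMID_LEN = 72_482        # NC_011752

# "sum" column of the read table (total sequenced bases).
RUN_SUMS = {"run1-5": 63_944_436, "run1-7": 79_769_722}


def coverage(total_bases: int, genome_len: int) -> float:
    """Average depth of coverage: sequenced bases divided by genome length."""
    return total_bases / genome_len


# Assumption (not stated on the page): genome size = chromosome + plasmid.
GENOME_LEN = CHROMOSOME_LEN + PLASMID_LEN

if __name__ == "__main__":
    for name, total in RUN_SUMS.items():
        print(f"{name}: {coverage(total, GENOME_LEN):.2f}X")
```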

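Links

  • http://omicsomics.blogspot.com/2011/05/ion-torrents-data-quality-is-pretty.html
  • http://pathogenomics.bham.ac.uk/blog/2011/05/first-look-at-ion-torrent-data-de-novo-assembly/
  • http://pathogenomics.bham.ac.uk/blog/2011/06/ehec-genome-assembly/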

EdgeBio Assembly

  • Used CLC & newbler
  • http://www.edgebio.com/data/ion/ecoli_bgi/

 read QC: FastQC 
 read trimming: CLC
 .                    elem       min    q1     q2     q3     max        mean       n50        sum          
 run                  629368     5      99     106    109    135        102        107        63944436 
 run.trimmed          617257     10     46     69     86     94         64         79         39745910   
 read alignment: CLC; 85% of untrimmed reads aligned
 6X+ cvg regions: 490 SNPs & 1,848 indels.
 De novo assembly of the 90K reads that did not map: 58K assembled together => 363 contigs ranging in size from 400 to 7800 bp.
 Contigs were blasted against the NCBI nr/nt database with an e-value cutoff of 0.01.
 Hits: E. coli strains (such as O83:H1 and O42), Shigella flexneri, Salmonella, and Cronobacter.
 De novo assembly => 3,297 contigs with an N25 of 4K and N50 of 2.5K.
 .             elem  min  q1   q2   q3     max    mean  n50    sum           
 ctg.denovo    363   200  329  638  1421   7816   1071  1753   388833

CBCB best assembly

run1-5

  • Stats
 .             elem  min  q1   q2   q3    max     mean   n50    sum     
 ctg           505   35   204  758  8927  185186  9823   41503  4960767 
 ctg.denovo    4261  31   41   58   104   2713    119    31     506842  
 total         4766  31   44   59   161   185186  1147   41503  5467609 
  • Method (run1-5):
 The reads were first assembled using Ecoli_55989 as the reference. The unmapped reads were then assembled de novo using ABySS.
 total reads: 629,368
 reads aligned to Ecoli_55989 using "bwa bwasw": 545,233 (the consensus was called using "samtools pileup")
 unaligned reads : 84,135
  • FTP file locations (run1-5)
 ftp://ftp.cbcb.umd.edu/pub/data/assembly/Ecoli_TY-2482/
 ftp://ftp.cbcb.umd.edu/pub/data/assembly/Ecoli_TY-2482/assemble.sh
 ftp://ftp.cbcb.umd.edu/pub/data/assembly/Ecoli_TY-2482/asm.summary
 ftp://ftp.cbcb.umd.edu/pub/data/assembly/Ecoli_TY-2482/asm.ctg.fasta
 ftp://ftp.cbcb.umd.edu/pub/data/assembly/Ecoli_TY-2482/asm.ctg.denovo.fasta
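
The split into aligned and unaligned reads described under Method (run1-5) can be illustrated on SAM output. This is a minimal sketch, not the contents of assemble.sh: it assumes the aligner output is plain-text SAM and uses the standard FLAG bit 0x4 (segment unmapped). The function name and the toy records are made up for illustration. The last lines check that the read counts reported on this page add up.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_by_mapping(sam_lines):
    """Return (aligned, unaligned) sets of read names from SAM text lines.

    A read counts as aligned if at least one of its records is mapped;
    header lines (starting with '@') are skipped.
    """
    aligned, seen = set(), set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED_FLAG:
            aligned.add(name)
    return aligned, seen - aligned


# Toy records (hypothetical), not data from the TY-2482 runs.
TOY_SAM = [
    "@SQ\tSN:NC_011748\tLN:5154862",
    "read1\t0\tNC_011748\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "read3\t16\tNC_011748\t900\t60\t5M\t*\t0\t0\tTTGCA\tIIIII",
]

if __name__ == "__main__":
    aligned, unaligned = split_by_mapping(TOY_SAM)
    print(sorted(aligned), sorted(unaligned))

    # Counts reported on this page for run1-5.
    total_reads, n_aligned, n_unaligned = 629_368, 545_233, 84_135
    print(n_aligned + n_unaligned == total_reads)
    print(f"aligned fraction: {n_aligned / total_reads:.1%}")
```

The unaligned names would then be used to pull the corresponding reads for the de novo step.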

run1-7

  • Stats
 .             elem  min  q1   q2   q3    max     mean   n50    sum            
 ctg           445   41   202  651  7622  185186  11156  46725  4964630        
 ctg.denovo    4704  31   39   57   88    2990    113    31     531947         
 total         5149  31   40   58   129   185186  1068   46725  5496577        

run1-7 using Escherichia coli 55989 & Stx2 converting phage II as reference

  • Stats:
 .            #ctgs maxCtg  sumCtg            
 NC_011748    387   185186  4922625        
 NC_011752    50    6554    41684          
 NC_004914*   38    16960   44334          
 denovo       3745  2990    445588
  • Location:
  /fs/szattic-asmg5/dpuiu/Ecoli_TY-2482/Assembly/sam.run1-7_phage

run1-7 using Escherichia coli 55989 & viral db as reference

  • Stats:
 ref          #ctgs maxCtg  sumCtg   refLen  refGC refDescription
 NC_011748*   408   185186  4914924  5154862 50.66 Escherichia coli 55989, complete genome
 denovo*      3504  2990    412557   
 NC_004914*   38    18079   52920    62706   49.9  Stx2 converting phage II, complete genome
 NC_011752*   51    6554    41755    72482   46.13 Escherichia coli 55989 plasmid 55989p, complete sequence
 NC_009514    48    2592    21527    47021   49.11 Phage cdtI, complete genome
 NC_005344    7     7217    11583    39043   47.46 Enterobacteria phage Sf6, complete genome
 NC_011357    15    1613    7044     62147   50.91 Stx2-converting phage 1717, complete prophage genome
 NC_002371    3     6346    6565     41724   47.09 Enterobacteria phage P22 virus, complete genome
 NC_011356    9     2036    5840     54896   51.12 Enterobacteria phage YYZ-2008, complete prophage genome
 NC_003444    2     2817    4159     37074   50.76 Enterobacteria phage SfV, complete genome
 NC_004813    10    769     3269     57930   50.6  Enterobacteria phage BP-4795, complete genome
 NC_008464    4     1036    2922     60238   49.06 Stx2-converting phage 86, complete genome
 NC_005856    5     1204    2246     94800   47.31 Enterobacteria phage P1, complete genome
 NC_001416    9     351     1880     48502   49.85 Enterobacteria phage lambda, complete genome
 NC_000924    4     1199    1776     61670   49.36 Enterobacteria phage 933W, complete genome
 NC_002167    2     1339    1496     39732   49.78 Enterobacteria phage HK97, complete genome
 NC_003525    2     1078    1349     61765   49.38 Stx2 converting phage I, complete genome
 NC_010392    4     470     1118     48491   51.09 Phage Gifsy-1, complete genome
 NC_005841    5     412     1117     41391   47.43 Enterobacteria phage ST104, complete genome
 NC_003356    4     471     1107     42575   49.35 Enterobacteria phage phiP27, complete genome
 NC_002730    2     467     862      38297   46.68 Enterobacteria phage HK620, complete genome
 NC_002166    1     672     672      40751   49.48 Enterobacteria phage HK022, complete genome
 NC_004313    1     493     493      40149   51.01 Salmonella phage ST64B, complete genome
 NC_011976    1     419     419      43016   47.26 Salmonella phage epsilon34, complete genome
 NC_001954    3     148     349      8454    43.7  Enterobacteria phage If1, complete genome
 NC_007804    1     272     272      39104   48.97 Escherichia phage phiV10, complete genome
 NC_001895    2     130     209      33593   50.16 Enterobacteria phage P2, complete genome
  • Location:
 /fs/szattic-asmg5/dpuiu/Ecoli_TY-2482/Assembly/sam.run1-7_phage.redo/
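
One way to read the table above is to divide sumCtg by refLen for each reference. The sketch below does that for the first few rows. It assumes contigs mapped to one reference do not overlap each other, which the page does not confirm, so the result is only an approximate fraction of each reference represented in the assembly.

```python
# First rows of the viral db table: ref, #ctgs, maxCtg, sumCtg, refLen.
ROWS = """\
NC_011748 408 185186 4914924 5154862
NC_004914 38 18079 52920 62706
NC_011752 51 6554 41755 72482
NC_009514 48 2592 21527 47021
"""


def covered_fractions(table_text: str) -> dict:
    """Map each reference accession to sumCtg / refLen."""
    fractions = {}
    for line in table_text.splitlines():
        ref, _n_ctgs, _max_ctg, sum_ctg, ref_len = line.split()
        fractions[ref] = int(sum_ctg) / int(ref_len)
    return fractions


if __name__ == "__main__":
    for ref, frac in covered_fractions(ROWS).items():
        print(f"{ref}\t{frac:.1%}")
```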

Other CBCB assemblies run1-5

 .                         elem       min    q1     q2     q3     max        mean       n50        sum            
 CA.ctg+deg                22395      64     107    159    253    2367       204        216        4567575        
 newbler.Ecoli_55989.ctg   8357       100    218    397    696    5038       534        639        4465330
 AMOScmp.Ecoli_55989.ctg   1321       65     425    1780   5257   40345      3774       7775       4985978
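
For reference, most columns of the stats tables on this page (elem, min, q2, max, mean, n50, sum) can be computed from a list of sequence lengths as sketched below. The tool that produced the tables is not named here, so this uses the common N50 definition (the largest length L such that sequences of length at least L hold half of the total bases) and may differ from the original in edge cases. q1 and q3 are left out because quantile conventions vary.

```python
from statistics import mean, median


def n50(lengths):
    """Largest length L such that sequences >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 of an empty list is undefined")


def summarize(lengths):
    """Summary row similar to the tables above (without q1 and q3)."""
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "q2": median(lengths),
        "max": max(lengths),
        "mean": round(mean(lengths)),
        "n50": n50(lengths),
        "sum": sum(lengths),
    }


if __name__ == "__main__":
    # Toy contig lengths, not taken from any assembly on this page.
    print(summarize([2, 3, 4, 5, 6]))
```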