Ecoli germany
Latest revision as of 18:31, 14 June 2011
Finished genomes
NC_011748 Escherichia coli 55989, complete genome; Length: 5,154,862 nt
NC_011752 Escherichia coli 55989 plasmid 55989p, complete sequence; Length: 72,482 nt
NC_013353 Escherichia coli O103:H2 str. 12009, complete genome; Length: 5,449,314 nt
NC_013354 Escherichia coli O103:H2 str. 12009 plasmid pO103, complete sequence; Length: 75,546 nt
...
NC_004914 Stx2 converting phage II, complete genome; Length: 62,706 nt
Data
- BGI Sequences Genome of the Deadly E. Coli in Germany and Reveals New Super-Toxic Strain, June 2nd 2011: http://www.genomics.cn/en/news_show.php?type=show&id=644
- BGI run1-7 & assembly: ftp://ftp.genomics.org.cn/pub/Ecoli_TY-2482/
.       elem    min  q1   q2   q3    max     mean   n50    sum
run1    92370   5    99   106  109   123     102    107    9433083
run2    122208  5    96   105  109   133     100    106    12248530
run3    96765   5    100  106  110   129     103    107    9958873
run4    222275  5    101  107  110   135     103    107    22924825
run5    95750   5    92   103  108   133     97     104    9379125
run1-5  629368  5    99   106  109   135     102    107    63944436 (12.2X)
run6    79341   5    104  108  111   137     105    108    8355410
run7    74388   5    97   105  109   129     100    106    7469876
run1-7  783097  5    99   106  110   137     102    107    79769722 (15.2X)
ctg     1217    100  251  938  5215  72019   4274   13577  5201850
ctg.v2  513     62   346  896  4903  204342  10330  53266  5299150
- Escherichia coli O104:H4 TY-2482, whole genome shotgun sequencing project @NCBI: http://www.ncbi.nlm.nih.gov/nuccore/AFOG00000000
- Wikipedia Escherichia_coli_O104:H4: http://en.wikipedia.org/wiki/Escherichia_coli_O104:H4
- Wikipedia Shiga_toxin: http://en.wikipedia.org/wiki/Shiga_toxin
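The (12.2X) and (15.2X) annotations in the run table are depth-of-coverage figures, i.e. total bases divided by genome size. The page does not say which genome size was used; the sketch below assumes 5.25 Mb (a hypothetical value that happens to reproduce both annotations) and also checks that the pooled rows are the sums of the individual runs. The names `RUNS`, `ASSUMED_GENOME_SIZE`, `pooled` and `coverage` are illustrative only.

```python
# Per-run read counts and base totals, copied from the table above.
RUNS = {
    "run1": (92370, 9433083),
    "run2": (122208, 12248530),
    "run3": (96765, 9958873),
    "run4": (222275, 22924825),
    "run5": (95750, 9379125),
    "run6": (79341, 8355410),
    "run7": (74388, 7469876),
}

# ASSUMPTION: the page does not state the genome size behind its "X" figures.
ASSUMED_GENOME_SIZE = 5_250_000


def pooled(names):
    """Return (reads, bases) summed over the named runs."""
    reads = sum(RUNS[n][0] for n in names)
    bases = sum(RUNS[n][1] for n in names)
    return reads, bases


def coverage(total_bases, genome_size=ASSUMED_GENOME_SIZE):
    """Average depth of coverage: sequenced bases per genome position."""
    return total_bases / genome_size


if __name__ == "__main__":
    for label, last in (("run1-5", 5), ("run1-7", 7)):
        names = [f"run{i}" for i in range(1, last + 1)]
        reads, bases = pooled(names)
        print(f"{label}\t{reads}\t{bases}\t{coverage(bases):.1f}X")
```

With a different assumed genome size the depth shifts accordingly, so the figures are best read as approximate.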
Links
- http://www.iontorrent.com/
- http://omicsomics.blogspot.com/2011/05/ion-torrents-data-quality-is-pretty.html
- http://pathogenomics.bham.ac.uk/blog/2011/05/first-look-at-ion-torrent-data-de-novo-assembly/
- http://pathogenomics.bham.ac.uk/blog/2011/06/ehec-genome-assembly/
EdgeBio Assembly
- Used CLC & newbler
- http://www.edgebio.com/data/ion/ecoli_bgi/
read QC: FastQC
read trimming: CLC
.            elem    min  q1  q2   q3   max  mean  n50  sum
run          629368  5    99  106  109  135  102   107  63944436
run.trimmed  617257  10   46  69   86   94   64    79   39745910
read alignment: CLC; 85% of untrimmed reads aligned
6X+ coverage regions: 490 SNPs & 1,848 indels.
De novo assembly of the 90K reads that did not map: 58K assembled together => 363 contigs ranging in size from 400 to 7800 bp.
Contigs were BLASTed against NR/NT at NCBI with an e-value threshold of 0.01.
Hits: E. coli strains (such as O83:H1 and O42), Shigella flexneri, Salmonella, and Cronobacter.
De novo assembly => 3,297 contigs with an N25 of 4K and N50 of 2.5K.
.           elem  min  q1   q2   q3    max   mean  n50   sum
ctg.denovo  363   200  329  638  1421  7816  1071  1753  388833
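The N25 and N50 figures quoted above follow the usual Nx idea: sort contigs from longest to shortest and report the length at which the running total first reaches x% of all assembled bases. A minimal sketch of that common definition is given below; it is not confirmed to be the exact convention used by the tools on this page, and the function name `nx` is our own.

```python
def nx(lengths, x):
    """Return the Nx length for a list of contig lengths.

    Nx is the length L such that contigs of length >= L together
    contain at least x percent of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= x * total:
            return length


if __name__ == "__main__":
    toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy lengths, not data from this page
    print("N25:", nx(toy, 25))
    print("N50:", nx(toy, 50))
```

By construction N25 is never smaller than N50, which matches the ordering of the two values reported for the 3,297-contig assembly.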
CBCB best assembly
run1-5
- Stats
.           elem  min  q1   q2   q3    max     mean  n50    sum
ctg         505   35   204  758  8927  185186  9823  41503  4960767
ctg.denovo  4261  31   41   58   104   2713    119   31     506842
total       4766  31   44   59   161   185186  1147  41503  5467609
- Method (run1-5):
The reads were first assembled using Ecoli_55989 as the reference. The unmapped reads were then assembled de novo using ABySS.
total reads: 629,368
reads aligned to Ecoli_55989 using "bwa bwasw": 545,233 (the consensus was called using "samtools pileup")
unaligned reads: 84,135
- Ftp file locations (run1-5)
ftp://ftp.cbcb.umd.edu/pub/data/assembly/Ecoli_TY-2482/
ftp://ftp.cbcb.umd.edu/pub/data/assembly/Ecoli_TY-2482/assemble.sh
ftp://ftp.cbcb.umd.edu/pub/data/assembly/Ecoli_TY-2482/asm.summary
ftp://ftp.cbcb.umd.edu/pub/data/assembly/Ecoli_TY-2482/asm.ctg.fasta
ftp://ftp.cbcb.umd.edu/pub/data/assembly/Ecoli_TY-2482/asm.ctg.denovo.fasta
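The method above hinges on separating reads that aligned to the reference from those that did not, so that the leftovers can go to the de novo assembler. The actual steps are in assemble.sh; the sketch below is not that script. It only illustrates one way to make the split from SAM text, using the standard SAM flag bit 0x4 (read unmapped), and then repeats the read accounting quoted above. The function names and the toy records are hypothetical.

```python
def is_unmapped(flag):
    """True if the SAM FLAG has bit 0x4 (segment unmapped) set."""
    return bool(flag & 0x4)


def split_sam(lines):
    """Split SAM records into (mapped_names, unmapped_names).

    Header lines (starting with '@') and blank lines are skipped.
    Each read name is reported once; a read counts as mapped if any
    of its records is mapped (an aligner may emit several per read).
    """
    mapped, unmapped = {}, {}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if is_unmapped(flag):
            unmapped.setdefault(name, None)
        else:
            mapped.setdefault(name, None)
    # Drop names that also have a mapped record.
    unmapped_only = [n for n in unmapped if n not in mapped]
    return list(mapped), unmapped_only


def aligned_percent(aligned, total):
    """Percentage of reads aligned, rounded to one decimal."""
    return round(100.0 * aligned / total, 1)


if __name__ == "__main__":
    toy_sam = [
        "@SQ\tSN:NC_011748\tLN:5154862",
        "r1\t0\tNC_011748\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
        "r3\t16\tNC_011748\t900\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    ]
    print(split_sam(toy_sam))
    # Read accounting from the method notes: total, aligned, unaligned.
    print(629368 - 545233, aligned_percent(545233, 629368))
```

The unmapped read names would then be used to pull the corresponding sequences out of the original read files before running the de novo step.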
run1-7
- Stats
.           elem  min  q1   q2   q3    max     mean   n50    sum
ctg         445   41   202  651  7622  185186  11156  46725  4964630
ctg.denovo  4704  31   39   57   88    2990    113    31     531947
total       5149  31   40   58   129   185186  1068   46725  5496577
run1-7 using Escherichia coli 55989 & Stx2 converting phage II as reference
- Stats:
.           #ctgs  maxCtg  sumCtg
NC_011748   387    185186  4922625
NC_011752   50     6554    41684
NC_004914*  38     16960   44334
denovo      3745   2990    445588
- Location:
/fs/szattic-asmg5/dpuiu/Ecoli_TY-2482/Assembly/sam.run1-7_phage
run1-7 using Escherichia coli 55989 & viral db as reference
- Stats:
ref         #ctgs  maxCtg  sumCtg   refLen   refGC  refDescription
NC_011748*  408    185186  4914924  5154862  50.66  Escherichia coli 55989, complete genome
denovo*     3504   2990    412557
NC_004914*  38     18079   52920    62706    49.9   Stx2 converting phage II, complete genome
NC_011752*  51     6554    41755    72482    46.13  Escherichia coli 55989 plasmid 55989p, complete sequence
NC_009514   48     2592    21527    47021    49.11  Phage cdtI, complete genome
NC_005344   7      7217    11583    39043    47.46  Enterobacteria phage Sf6, complete genome
NC_011357   15     1613    7044     62147    50.91  Stx2-converting phage 1717, complete prophage genome
NC_002371   3      6346    6565     41724    47.09  Enterobacteria phage P22 virus, complete genome
NC_011356   9      2036    5840     54896    51.12  Enterobacteria phage YYZ-2008, complete prophage genome
NC_003444   2      2817    4159     37074    50.76  Enterobacteria phage SfV, complete genome
NC_004813   10     769     3269     57930    50.6   Enterobacteria phage BP-4795, complete genome
NC_008464   4      1036    2922     60238    49.06  Stx2-converting phage 86, complete genome
NC_005856   5      1204    2246     94800    47.31  Enterobacteria phage P1, complete genome
NC_001416   9      351     1880     48502    49.85  Enterobacteria phage lambda, complete genome
NC_000924   4      1199    1776     61670    49.36  Enterobacteria phage 933W, complete genome
NC_002167   2      1339    1496     39732    49.78  Enterobacteria phage HK97, complete genome
NC_003525   2      1078    1349     61765    49.38  Stx2 converting phage I, complete genome
NC_010392   4      470     1118     48491    51.09  Phage Gifsy-1, complete genome
NC_005841   5      412     1117     41391    47.43  Enterobacteria phage ST104, complete genome
NC_003356   4      471     1107     42575    49.35  Enterobacteria phage phiP27, complete genome
NC_002730   2      467     862      38297    46.68  Enterobacteria phage HK620, complete genome
NC_002166   1      672     672      40751    49.48  Enterobacteria phage HK022, complete genome
NC_004313   1      493     493      40149    51.01  Salmonella phage ST64B, complete genome
NC_011976   1      419     419      43016    47.26  Salmonella phage epsilon34, complete genome
NC_001954   3      148     349      8454     43.7   Enterobacteria phage If1, complete genome
NC_007804   1      272     272      39104    48.97  Escherichia phage phiV10, complete genome
NC_001895   2      130     209      33593    50.16  Enterobacteria phage P2, complete genome
- Location:
/fs/szattic-asmg5/dpuiu/Ecoli_TY-2482/Assembly/sam.run1-7_phage.redo/
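One rough way to read the viral db table is to divide sumCtg by refLen for each reference, which gives the share of the reference length accounted for by contigs. This is our own reading aid, not a statistic reported on the page, and it assumes contigs placed on a reference do not overlap one another. A minimal sketch with a few rows copied from the table (`ROWS`, `contig_fraction` and `summarize` are illustrative names):

```python
# (ref, sumCtg, refLen) for a few rows of the table above.
ROWS = [
    ("NC_011748", 4914924, 5154862),  # Escherichia coli 55989 chromosome
    ("NC_004914", 52920, 62706),      # Stx2 converting phage II
    ("NC_011752", 41755, 72482),      # plasmid 55989p
    ("NC_009514", 21527, 47021),      # Phage cdtI
]


def contig_fraction(sum_ctg, ref_len):
    """sumCtg as a percentage of refLen, rounded to one decimal."""
    return round(100.0 * sum_ctg / ref_len, 1)


def summarize(rows):
    """Map each reference accession to its contig fraction in percent."""
    return {ref: contig_fraction(s, length) for ref, s, length in rows}


if __name__ == "__main__":
    for ref, pct in summarize(ROWS).items():
        print(f"{ref}\t{pct}%")
```

On these rows the chromosome and the Stx2 converting phage II come out far higher than the plasmid or Phage cdtI.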
Other CBCB assemblies run1-5
.                        elem   min  q1   q2    q3    max    mean  n50   sum
CA.ctg+deg               22395  64   107  159   253   2367   204   216   4567575
newbler.Ecoli_55989.ctg  8357   100  218  397   696   5038   534   639   4465330
AMOScmp.Ecoli_55989.ctg  1321   65   425  1780  5257  40345  3774  7775  4985978
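The stats rows used throughout this page (elem, min, q1, q2, q3, max, mean, sum) summarize the sequence lengths in a FASTA file. The tool that produced them is not named here, so the sketch below is only an approximation: it reads lengths from FASTA text and computes the same columns with Python's standard library. The quartile convention (`method="inclusive"`) is an assumption, the n50 column is left out to keep the example short, and the function names are our own. It needs Python 3.8 or later for `statistics.quantiles`.

```python
import statistics


def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA text.

    `lines` is any iterable of text lines (an open file works).
    """
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summary_row(lengths):
    """Compute elem/min/q1/q2/q3/max/mean/sum for a list of lengths."""
    if len(lengths) < 2:
        raise ValueError("need at least two sequences for quartiles")
    # ASSUMPTION: quartile convention; the page's tool may differ.
    q1, q2, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    total = sum(lengths)
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": max(lengths),
        "mean": total / len(lengths),
        "sum": total,
    }


if __name__ == "__main__":
    toy = [">a", "ACGT", "AC", ">b", "ACGTACGTAC", ">c", "AC"]  # toy records
    print(summary_row(fasta_lengths(toy)))
```

Running it on each assembler's contig file would give rows comparable to the ones above, up to differences in how quartiles and means are rounded.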