Methanobrevibacter smithii
Latest revision as of 18:16, 26 March 2010
Data
NCBI
1 complete + 2 draft assembly strains:
Methanobrevibacter smithii ATCC 35061
- complete: NC_009515.1 1,853,160bp 31.03%GC
- Ref: /fs/szdata/ncbi/ftp.ncbi.nih.gov/genomes/Bacteria/Methanobrevibacter_smithii_ATCC_35061/*fna
- Published: Genomic and metabolic adaptations of Methanobrevibacter smithii to the human gut PNAS
- Assembled: Phrap and PCAP
- IS elements:
cat NC_009515.gff | grep "IS%20element" | grep CDS | awk '{print $1,$2,$3,$4,$5,$6,$7,$5-$4+1}'
NC_009515.1 RefSeq CDS 504499 505509 . - 1011
NC_009515.1 RefSeq CDS 505464 505748 . - 285
NC_009515.1 RefSeq CDS 508688 508984 . + 297
NC_009515.1 RefSeq CDS 509066 509491 . + 426
NC_009515.1 RefSeq CDS 509745 509900 . + 156
NC_009515.1 RefSeq CDS 1542734 1543111 . - 378
NC_009515.1 RefSeq CDS 1543120 1543254 . - 135
NC_009515.1 RefSeq CDS 1543220 1543417 . - 198
- Repeats: Repeats.png (Media:methanobrevibacter_smithii.1con-1con.png)
~/bin/RepeatSearch.amos -D REPEATLEN=36 ms
. elem min q1 q2 q3 max mean n50 sum
repeats.36+ 317 36 43 63 104 3732 182.96 1363 57999
uniq.36+ 309 1 24 109 2195 97573 5809.58 42534 1795160
repeats.350+ 26 386 691 1386 1406 3732 1369.12 1390 35597
uniq.350+ 25 1 13517 52288 90848 258405 72702.48 131389 1817562
Methanobrevibacter smithii DSM 2374
- draft: NZ_ABYV02000000
- 1,727,775 bp
- 25 contigs: NZ_ABYV02000001 .. NZ_ABYV02000025
Methanobrevibacter smithii DSM 2375
- draft: NZ_ABYW00000000
- 1,704,865 bp
- 24 contigs: NZ_ABYW01000001 .. NZ_ABYW01000024
Methanobrevibacter smithii DSM 11975
- in progress
WUSTL (Gordon Lab)
- Location: /fs/szattic-asmg4/methanobrevibacter_smithii/Data
- 22 strains
- half-way through sequencing (should have all the data by early to mid-January)
- right now (--Dpuiu 15:19, 15 December 2009 (EST)):
- 10 strains sequenced by GAII Illumina (36mers) with 3-8 million reads per strain (coverage is 50-150x),
- 12 strains sequenced by 454-Titanium sequencing, with 20,000 to 90,000 reads per strain (coverage is ~5-20x).
- 7 strains sequenced by Illumina and 454
Illumina: 36 bp single reads
nl strain reads bases cvg
1 FR1LH1 5186537 186715332 100
2 FR1LH3 7211080 259598880 140
3 FR1LH6 4968262 178857432 96
4 TS145A 6536457 235312452 126
5 TS145B 8277390 297986040 160
. TS146e4 26899427 968379372 522 # 8417021 of the reads (~ 160X cvg) have qual==32
6 TS94-3 4886376 175909536 94
7 TS94-5 4785200 172267200 92
8 TS95-2 2896065 104258340 56
9 TS95-3 5064150 182309400 98
10 TS95-4 3557512 128070432 69
11 TS95-5 4559830 164153880 88
454: avg 342 bp single reads
nl strain reads bases cvg
1 TS145A 83667 28587141 15
2 TS145B 45203 15720997 8
3 TS146-3 49854 17608862 10
4 TS146e4 58633 18306825 10
5 TS146e5A 27844 8311560 4
6 TS146e5B 73182 23547825 13
7 TS147e8 68487 20662109 11
8 TS94-3 449545 157306990 85
9 TS94-5 76513 26734802 14
10 TS95-2 73255 25779806 14
11 TS95-4 85737 29231201 16
12 TS95-5 96757 35000794 19
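The cvg columns above appear to be total read bases divided by the reference genome size. A minimal sketch of that arithmetic, assuming the 1,853,160 bp ATCC 35061 genome as the denominator; the function name `coverage` is made up for illustration, and the original tables may have been rounded differently:

```shell
# Approximate fold coverage = total read bases / reference genome size.
# The result is truncated to an integer, so it is a lower bound.
coverage() {
    # $1 = total bases sequenced, $2 = genome size in bp
    awk -v bases="$1" -v size="$2" 'BEGIN { printf "%d\n", int(bases / size) }'
}

# Example: FR1LH1 Illumina reads (5186537 reads x 36 bp) against ATCC 35061
coverage 186715332 1853160
```

With truncation this gives 100 for FR1LH1, matching the Illumina table; a few 454 entries look rounded to the nearest integer instead.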
Illumina & 454: 7 strains have both Illumina & 454 reads
Illumina 454
--------------------------- ---------------------------
nl strain #reads #bases cvg #reads #bases cvg avg%idFinishedGenome
1 FR1LH1 5186537 186715332 100 . . . 98
2 FR1LH3 7211080 259598880 140 . . .
3 FR1LH6 4968262 178857432 96 . . .
4* TS145A 6536457 235312452 126 83667 28587141 15 98
5 TS145B 8277390 297986040 160 45203 15720997 8
6 TS146-3 . . . 49854 17608862 10
7 TS146e4 26899427 968379372 522 58633 18306825 10
8* TS146e5A . . . 27844 8311560 4 98
9 TS146e5B . . . 73182 23547825 13
10 TS147e8 . . . 68487 20662109 11
11* TS94-3 4886376 175909536 94 449545 157306990 85 92
12 TS94-5 4785200 172267200 92 76513 26734802 14
13 TS95-2 2896065 104258340 56 73255 25779806 14
14 TS95-3 5064150 182309400 98 . . .
15 TS95-4 3557512 128070432 69 85737 29231201 16
16 TS95-5 4559830 164153880 88 96757 35000794 19
nl strain IlluminaGC 454GC
1 FR1LH1 38.89 .
2 FR1LH3 33.33 .
3 FR1LH6 33.33 .
4 TS145A 41.67 31.54
5 TS145B 36.11 31.52
6 TS146-3 . 31.85
7 TS146e4 33.33 31.72
8 TS146e5A . 32.77
9 TS146e5B . 31.96
10 TS147e8 . 31.11
11 TS94-3 30.56 30.67
12 TS94-5 30.56 30.69
13 TS95-2 36.11 30.60
14 TS95-3 33.33 .
15 TS95-4 30.56 30.70
16 TS95-5 33.33 30.56
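A GC percentage like the ones in this table can be computed from a sequence file roughly as follows. This is only a sketch: it assumes FASTA input (FASTQ reads would need converting first), it ignores ambiguous bases such as N, and the function name `gc_percent` is hypothetical. How the values above were actually produced is not recorded on this page.

```shell
# Percent G+C over all A/C/G/T bases in a FASTA file.
gc_percent() {
    # $1 = FASTA file; header lines (starting with ">") are skipped
    awk '!/^>/ {
             line = $0
             gc += gsub(/[GCgc]/, "", line)   # gsub returns the number of matches
             at += gsub(/[ATat]/, "", line)
         }
         END { if (gc + at > 0) printf "%.2f\n", 100 * gc / (gc + at) }' "$1"
}

# Usage (file name is illustrative): gc_percent NC_009515.fna
```

For comparison, the finished ATCC 35061 genome is listed at 31.03% GC.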
Nucmer alignments of the finished genome and the newbler contigs suggest:
- TS145A.4 & TS146e5A.8 99% id
- TS145A.4 & TS94-3.11 92% id
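One way to get an average percent identity from nucmer output is to weight each alignment's identity by its length. The sketch below assumes MUMmer 3's `show-coords -T -H` tab-separated layout, with the reference alignment length in column 5 and percent identity in column 7; the file names and the function name `avg_identity` are hypothetical, and the exact commands used for the numbers above are not given here.

```shell
# Length-weighted mean percent identity over all alignments in a coords table.
avg_identity() {
    # $1 = tab-separated coords file (assumed: column 5 = LEN1, column 7 = %IDY)
    awk -F '\t' '{ len += $5; idy += $5 * $7 }
         END { if (len > 0) printf "%.1f\n", idy / len }' "$1"
}

# Usage, assuming MUMmer is installed (file names are illustrative):
#   nucmer --prefix=cmp NC_009515.fna contigs.fna
#   show-coords -T -H cmp.delta > cmp.coords
#   avg_identity cmp.coords
```

Overlapping or repeated alignments are counted as-is, so the figure is only a rough estimate.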
Assembly
- Available online at: ftp://ftp.cbcb.umd.edu/pub/data/dpuiu/Methanobrevibacter_smithii/
- newbler assemblies are generally the "best": longer contigs, fewer 0cvg regions ...
- CBCB velvet & newbler assemblies are slightly better than the WUSTL ones
- velvet contigs are slightly longer than the SOAPdenovo ones
- velvet assemblies contain slightly fewer bases than the SOAPdenovo ones & have more 0cvg regions (compared to the reference)
- the newbler (454) & velvet/SOAPdenovo (Illumina) assemblies can be merged further, which joins some contigs together
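Most of the summary columns in the assembly tables (contig count, min, max, mean, n50, sum) can be recomputed from a contig FASTA file along these lines. This is a sketch, not the tool used for the tables: the function name `contig_stats` is made up, quartiles and 0cvg are left out, and N50 follows the common definition (the contig length at which the running total, longest contig first, reaches half the assembly size).

```shell
# Print: count min max mean n50 sum for the sequences in a FASTA file.
contig_stats() {
    # $1 = FASTA file of contigs (multi-line sequences are supported)
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    sort -nr |
    awk '{ l[NR] = $1; sum += $1 }
         END {
             if (NR == 0) exit
             half = sum / 2
             for (i = 1; i <= NR; i++) {
                 run += l[i]
                 if (run >= half) { n50 = l[i]; break }
             }
             # l[1] is the longest contig, l[NR] the shortest
             printf "%d %d %d %.2f %d %d\n", NR, l[NR], l[1], sum / NR, n50, sum
         }'
}

# Usage (file name is illustrative): contig_stats TS145A.4.contigs.fa
```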
AMOScmp.Illumina (CBCB)
strain #ctgs min q1 q2 q3 max mean n50 sum 0cvg
FR1LH1.1 1684 36 55 109 650 33114 958 5106 1614205 275309
FR1LH3.2 1642 36 55 106 631 33115* 984 5331 1616391 273301
FR1LH6.3 1649 36 55 108 660 26387 979 5180 1615744 273654
TS145A.4 1918 36 59 124 805 28850 838 3234 1608620 284953
TS145B.5 1723 36 58 110 647 33114 936 4463 1612823 277233
TS94-3.11 ???
TS94-5.12 8859 36 56 85 155 10840 159 242 1408957 658124
TS95-2.13 9535 36 55 82 147 7508 144 209 1380166 695098
TS95-3.14 9152 36 56 85 154 10159 156 238 1436137 642175
TS95-4.15 8859 36 56 85 155 10840 159 242 1408957 658124
TS95-5.16 9238 36 55 84 152 10160 155 235 1435894 645655
SOAPdenovo.Illumina (CBCB)
strain #ctgs min q1 q2 q3 max mean n50 sum 0cvg
FR1LH1.1 774 45 75 137 2158 50285 2353 10320 1821758 218968
FR1LH3.2 767 45 73 108 977 56788* 2371 14018 1818948 221732
FR1LH6.3 757 45 74 129 1396 50013 2402 12294 1818674 219385
TS145A.4 1798 45 127 459 1381 13456 991 2225 1781905 230976
TS145B.5 1098 45 83 306 2231 20523 1631 4869 1791302 216380
TS94-3.11 1405 45 59 98 634 35690 1349 7940 1895871 249447
TS94-5.12 1562 45 58 101 706 25048 1216 6456 1899512 251003
TS95-2.13 6611 45 131 215 364 2994 281 375 1863491 404797
TS95-3.14 1687 45 58 85 388 25368 1184 6931 1997443 213660
TS95-4.15 1562 45 58 101 706 25048 1216 6456 1899512 251003
TS95-5.16 1259 45 63 85 285 43875 1576 12739 1984835 209415
velvet.Illumina (CBCB)
strain #ctgs min q1 q2 q3 max mean n50 sum 0cvg
FR1LH1.1 518 45 73 158 3131 67401 3491 16101 1808646 219880
FR1LH3.2 532 45 72 105 729 111372 3382 27042 1799368 227816
FR1LH6.3 547 45 72 111 992 85069 3309 20751 1810235 220469
TS145A.4 1002 45 398 1154 2228 14317 1575 2707 1578999 355737
TS145B.5 687 45 105 773 3424 31627 2567 6873 1763953 223089
TS146e4.7 375 45 71 129 1127 153113* 4740 40215 1777614 209640 # 522X cvg
TS146e4.7.filt 7641 45 99 143 218 1244 177 207 1355197 626004 # 150X cvg (only q32 reads)
TS94-3.11 738 45 63 102 755 99478 2526 17184 1864601 250197
TS94-5.12 751 45 63 98 887 51451 2455 15583 1844453 259266
TS95-2.13 7235 45 107 198 349 2547 265 370 1922674 382074
TS95-3.14 1018 45 68 85 341 60536 1934 15001 1969398 211351
TS95-4.15 751 45 63 98 887 51451 2455 15583 1844453 259266
TS95-5.16 881 45 65 76 350 47906 2226 15468 1961518 212351
velvet.Illumina (WUSTL)
strain elem min q1 q2 q3 max mean n50 sum
FR1LH1.1 403 101 282 1950 5891 41268 4388 10473 1768436
FR1LH3.2 274 100 267 1375 7601 88330* 6469 19398 1772541
FR1LH6.3 291 100 224 1225 7594 84927 5768 17024 1678650
TS145A.4 1127 101 409 1004 2075 11833 1479 2545 1667855
TS145B.5 445 100 721 2392 4817 20558 3396 5896 1511226
TS94-3.11 449 100 152 671 5121 50277 3958 12756 1777364
TS94-5.12 468 100 154 634 4768 44436 3818 12667 1786983
TS95-2.13 4476 100 179 303 531 3089 414 567 1853314
TS95-3.14 484 100 186 725 5581 49558 3918 11725 1896543
TS95-4.15 508 100 235 1211 5315 33145 3685 10173 1872243
TS95-5.16 469 100 203 927 5304 42207 4038 12739 1894003
MsmALI 209 102 371 3882 9790 58310 7882 21331 1647404
MsmPS-copy 497 100 341 1399 4492 43636 3497 8361 1738127
AMOScmp.454 (CBCB)
strain #ctgs min q1 q2 q3 max mean n50 sum 0cvg
TS146e4.7 213 71 1454 3954 9800 62523 7725 15060 1645592 208920 # original reads
TS146e4.7 168 56 1312 4609 13025 62537 9908 24440 1664607 194003 # alignment trimmed reads
CA.bog.454 (CBCB)
strain #ctgs min q1 q2 q3 max mean n50 sum 0cvg
TS145A.4 39 1135 12340 37904 61833 166391 44902 83237 1751181 227444
TS146e4.7 111 1039 5766 11729 21286 105399 16055 22905 1782182 200113
newbler.deNovo.454 (CBCB)
strain #ctgs min q1 q2 q3 max mean n50 sum 0cvg
TS145A.4 48 131 883 9320 56233 210376 37056 112225 1778716 196515
TS145B.5 117 104 2185 7850 22256 79935 15147 34389 1772299 203614
TS146-3.6 441 100 248 398 497 106331 4295 37197 1894514 202108
TS146e4.7 99 135 926 13013 24440 147105 17974 36272 1779476 198936
TS146e5A.8 1183 98 326 770 1953 19970 1446 2812 1710725 350637
TS146e5B.9 432 100 191 325 462 212980 4411 106361 1905607 195632
TS147e8.10 86 132 1662 8952 35506 122338 22610 55697 1944471 221774
TS94-3.11 723 96 238 361 456 395484* 2934 91958 2121818 224389
TS94-5.12 67 103 778 5001 46879 173674 28164 82955 1887011 226423
TS95-2.13 78 108 626 5843 45099 140027 25376 88829 1979337 187754
TS95-4.15 58 111 387 11498 59591 169655 34241 89436 1986032 186252
TS95-5.16 58 103 286 5198 59620 188214 34081 115574 1976737 186181
newbler.deNovo.454 (WUSTL)
strain #ctgs min q1 q2 q3 max mean n50 sum
TS145A.4 62 106 386 3049 46046 166070 28692 82515 1778941
TS145B.5 130 101 2104 7697 18636 79979 13620 34155 1770674
TS146-3.6 384 100 349 447 1527 83020 4885 28235 1876197
TS146e4.7 145 101 1901 7807 17092 73674 12238 24421 1774620
TS146e5A.8 1199 100 591 932 1529 7108 1199 1566 1438364 # much worse
TS146e5B.9 112 111 490 1598 20451 129166 16084 52376 1801428
TS147e8.10 136 100 1197 5950 17566 140484 14275 41840 1941435
TS94-3.11 567 111 332 408 471 284245* 3642 93923 2065443
TS94-5.12 87 130 375 1511 19610 189466 21686 71682 1886713
TS95-2.13 78 116 350 8420 41426 136894 25419 73086 1982724
TS95-4.15 60 112 373 2519 59614 200840 33056 115852 1983394
TS95-5.16 50 132 463 20422 73626 191648 39557 89465 1977893
newbler.refMapper.454 (CBCB)
strain #ctgs min q1 q2 q3 max mean n50 sum 0cvg
TS146e4.7 254 101 206 851 6803 72759 6408 25310 1627809 229435
best
Generated by merging newbler with velvet/SOAPdenovo contigs using minimus2
strain #ctgs min q1 q2 q3 max mean n50 sum 0cvg
FR1LH1.1 169 47 307 3469 13513 113154 10746 29975 1816093 210847
FR1LH3.2 159 47 162 2932 13757 113158 11288 37473 1794914 221522
FR1LH6.3 157 50 269 3785 15911 126755 11603 29272 1821689 211256
TS145A.4 48 131 883 9320 56233 210376 37056 112225 1778716 196515
TS145B.5 53 71 1018 11425 57301 147750 33590 83876 1780320 198279
TS146-3.6 441 100 248 398 497 106331 4295 37197 1894514 202108
TS146e4.7 99 135 926 13013 24440 147105 17974 36272 1779476 198936
TS146e5A.8 1183 98 326 770 1953 19970 1446 2812 1710725 350637
TS146e5B.9 432 100 191 325 462 212980 4411 106361 1905607 195632
TS147e8.10 86 132 1662 8952 35506 122338 22610 55697 1944471 221774
TS94-3.11 38 55 764 9841 66968 395587 49738 144195 1890074 223341
TS94-5.12 49 65 766 7044 51688 285237 38534 111850 1888200 225957
TS95-2.13 57 110 1383 9455 50972 165660 34754 94352 1981005 182973
TS95-3.14 408 47 141 419 5698 52172 4663 16369 1902753 202507
TS95-4.15 79 47 147 2317 16017 200685 26493 138013 2092967 179552
TS95-5.16 47 69 311 14482 77588 188308 42099 115630 1978688 184590
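The merge step could look roughly like the sketch below, which follows the usual AMOS minimus2 recipe: concatenate the two contig sets, convert them with `toAmos`, and pass the number of sequences in the first set as `REFCOUNT`. The function names, file names, and output prefix are hypothetical, and the exact options used for the table above are not recorded on this page.

```shell
# Number of sequences in a FASTA file (used as REFCOUNT for minimus2).
count_seqs() {
    grep -c '^>' "$1"
}

# Merge two contig sets with minimus2; requires AMOS (toAmos, minimus2) on PATH.
merge_contigs() {
    # $1 = 454 (newbler) contigs, $2 = Illumina (velvet/SOAPdenovo) contigs,
    # $3 = output prefix
    cat "$1" "$2" > "$3.seq"
    toAmos -s "$3.seq" -o "$3.afg"
    minimus2 "$3" -D REFCOUNT="$(count_seqs "$1")"
}

# Usage (illustrative): merge_contigs newbler.454.fa velvet.Illumina.fa merged
```

Putting the newbler contigs first makes them the reference set, so the shorter Illumina contigs are overlapped against them.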
Strains
TS145A.4
- 126x Illumina & 15x 454
. elem min q1 q2 q3 max mean n50 sum cvg
Illumina 6536457 36 36 36 36 36 36.00 36 235312452 126
454 85737 30 247 377 453 626 340.94 422 29231201 15
- Read assemblies:
. #ctgs min q1 q2 q3 max mean n50 sum 0cvg
AMOScmp.Illumina (soap2 -v5 -g0 -f2) 1918 36 59 124 805 28850 838 3234 1608620 284953
AMOScmp.Illumina (soap1 -v3 -g3 -f1 -c35) 1717 36 56 100 528 28868 964 4879 1656159 255687
SOAPdenovo.Illumina* 1798 45 127 459 1381 13456 991 2225 1781905 230976
velvet.Illumina 1002 45 398 1154 2228 14317 1575 2707 1578999 355737
AMOScmp.454 166 116 1110 3614 11776 110905 9952 24150 1652197 203385
CA.454 203 1002 2802 6431 12196 41845 8646 13047 1755208 245426
newbler.454** 48 131 883 9320 56233 210376 37056 112225 1778716 196515
SOAPdenovo.454 6531 45 103 179 306 1806 237 321 1553899 478469
velvet.454 21165 45 53 65 83 548 73 74 1561277 821841
AMOScmp.Illumina_454 (soap1 -v3 -g3 -f1 -c35) 274 36 46 86 4129 117510 6136 34524 1681406 186848
SOAPdenovo.Illumina_454 5795 45 75 184 425 3983 317 588 1840810 243391
velvet.Illumina_454 4229 45 135 269 507 3529 384 584 1626062 396716
Contig assemblies (no singletons):
. #ctgs min q1 q2 q3 max mean n50 sum 0cvg
minimus2.SOAPdenovo.Illumina-newbler.454 41 46 883 17898 61580 287497 43446 114143 1781287 197555 # 3 contigs don't contain any original newbler ctg
minimus2.velvet.Illumina-newbler.454 36 289 1802 38243 70184 210459 49402 113849 1778497 197611 # all contigs contain at least 1 original newbler ctg
Contig assemblies (including singletons):
. #ctgs min q1 q2 q3 max mean n50 sum 0cvg
minimus2.SOAPdenovo.Illumina-newbler.454 83 45 97 425 17898 287497 21689 114143 1800192 196885
minimus2.velvet.Illumina-newbler.454 61 45 677 2956 43814 210459 29821 113849 1819092 197424
TS146e5A.8
- 4x 454
- Read assemblies:
. #ctgs min q1 q2 q3 max mean n50 sum 0cvg
AMOScmp.454 736 56 675 1452 2902 17360 2092 3319 1540404 324145
newbler.454* 1183 98 326 770 1953 19970 1446 2812 1710725 350637
TS94-3.11
- 94x Illumina & 85x 454
. elem min q1 q2 q3 max mean n50 sum
454 449545 29 255 390 463 1190 349.92 433 157306990
Illumina 4886376 36 36 36 36 36 36.00 36 175909536
- Read assemblies:
. #ctgs min q1 q2 q3 max mean n50 sum 0cvg
SOAPdenovo.Illumina 1405 45 59 98 634 35690 1349 7940 1895871 249447
velvet.Illumina 738 45 63 102 755 99478 2526 17184 1864601 250197
AMOScmp.454 255 52 585 2580 8321 61854 6531 16349 1665535 225050
newbler.454 723 96 238 361 456 395484 2934 91958 2121818 224389
newbler.454.1000+ 44 1007 2480 9817 64051 395484 42923 106659 1888613 .
CA.454 55 1001 1777 20479 51987 188883 34348 81660 1889177 244396
newbler.454
- ReadStatus counts
. total %
Assembled 435637 96.91
Singleton 8178 1.82
PartiallyAssembled 4511 1
Outlier 856 0.19
Repeat 363 0.08
total 449545 100
- Location
/fs/szattic-asmg4/methanobrevibacter_smithii/Assembly.CBCB/TS94-3.11/newbler.454/
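The ReadStatus counts can be tallied from newbler's per-read status file with a short awk pass. The sketch assumes a tab-separated file with one header line, the read accession in column 1, and the status in column 2 (the layout I would expect from `454ReadStatus.txt`, but not confirmed on this page); the function name `read_status_counts` is hypothetical.

```shell
# Count reads per status and print: status count percent, largest first.
read_status_counts() {
    # $1 = tab-separated status file (assumed: header, then accession, status, ...)
    awk -F '\t' 'NR > 1 { n[$2]++; total++ }
         END {
             for (s in n) printf "%s %d %.2f\n", s, n[s], 100 * n[s] / total
         }' "$1" |
    sort -k2,2nr
}

# Usage (path is illustrative): read_status_counts newbler.454/454ReadStatus.txt
```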
CA.bog
. elem min q1 q2 q3 max mean n50 sum
ctg 55 1001 1777 20479 51987 188883 34348.67 81660 1889177
deg 922 64 341 457 515 13313 460.62 489 424691
- Location
/fs/szattic-asmg4/methanobrevibacter_smithii/Assembly.CBCB/TS94-3.11/CA.bog/