Metagenoms
= Web sites =
* [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi NCBI Bacterial Genomes] 647 complete, 1067 in progress
* [http://fames.jgi-psf.org/ JGI Fidelity of Analysis of Metagenomic Samples (FAMeS)]
* [http://fames.jgi-psf.org/cgi-bin/dataset_desc.pl?dataset=soil JGI Simulated High Complexity (SIMHC)]
* [http://nihroadmap.nih.gov/hmp/ Human Microbiome Project (HMP) @ NIH]
* [http://genome.wustl.edu/pub/organism/Microbes/Human_Gut_Microbiome/ WUSTL Human Gut Microbiome (HGMI)] 41 genomes
* [http://www.hgsc.bcm.tmc.edu/microbiome-index.xsp Baylor HMP] list of 353 genomes targeted for sequencing by the 4 centers
* [http://www.jcvi.org/cms/research/projects/hmp/overview/ JCVI HMP] 50 genomes (oral, skin, vagina)
* [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi JGI Integrated Microbial Genomes (IMG)]
* [http://img.jgi.doe.gov/m/doc/about_index.html JGI Integrated Microbial Genomes for Metagenomics (IMG/M)]
* [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=16729 HGMI at NIH]
* [ftp://ftp.ncbi.nih.gov/pub/TraceDB/16729_uncultured_bacteria/ HGMI TA] ~150K Sanger traces; trimming points are given
* [http://en.wikipedia.org/wiki/Superfamily Wikipedia Taxonomic Ranks]
 domain
 phylum
 class
 order
 family
 genus
 species
 strain
* Example: Pseudomonas aeruginosa
 domain:  Bacteria
 phylum:  Proteobacteria
 class:   Gammaproteobacteria
 order:   Pseudomonadales
 family:  Pseudomonadaceae
 genus:   Pseudomonas
 species: Pseudomonas aeruginosa group
 strain:  Pseudomonas aeruginosa
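The rank list and the example lineage map onto an ordered list of ranks plus a rank-to-name dictionary. The Python sketch below is illustrative only. `RANKS`, `LINEAGE`, `at_rank` and `path_to` are hypothetical names, and the lineage is copied verbatim from the example. Looking up a single rank is what summarizing reads per phylum amounts to.

```python
# Taxonomic ranks from most general to most specific, as listed above.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species", "strain"]

# Lineage of the example organism, copied from the list above.
LINEAGE = {
    "domain": "Bacteria",
    "phylum": "Proteobacteria",
    "class": "Gammaproteobacteria",
    "order": "Pseudomonadales",
    "family": "Pseudomonadaceae",
    "genus": "Pseudomonas",
    "species": "Pseudomonas aeruginosa group",
    "strain": "Pseudomonas aeruginosa",
}


def at_rank(lineage: dict[str, str], rank: str) -> str:
    """Return the taxon name at `rank` (hypothetical helper)."""
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank}")
    return lineage[rank]


def path_to(lineage: dict[str, str], rank: str) -> list[str]:
    """Return the taxon names from domain down to `rank`, inclusive."""
    stop = RANKS.index(rank) + 1
    return [lineage[r] for r in RANKS[:stop]]


print(at_rank(LINEAGE, "phylum"))                # Proteobacteria
print(" > ".join(path_to(LINEAGE, "class")))     # Bacteria > Proteobacteria > Gammaproteobacteria
```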
= Articles =
* [http://dnaresearch.oxfordjournals.org/cgi/reprint/dsm018v1 Kurokawa]
* [http://www.sciencemag.org/cgi/content/abstract/312/5778/1355 Gill]
* [http://www.nature.com/nmeth/journal/v4/n6/pdf/nmeth1043.pdf JGI]
* [http://www.nature.com/nature/journal/v449/n7164/full/nature06244.html HMP Nature Oct 2007]
* [http://www.genome.gov/Pages/Research/Sequencing/SeqProposals/HMPP_Proposal.pdf Human Microbiome Pilot Project (HMPP)]
* [http://www.genome.gov/Pages/Research/Sequencing/SeqProposals/HGMISeq.pdf Human Gut Microbiome Initiative (HGMI)] Need for more RefSeqs: sequence the genomes of 100 cultured representatives of the phylogenetic diversity in the human gut microbiota
* [http://www.nature.com/nrmicro/journal/v6/n6/pdf/nrmicro1901.pdf Nature Reviews: Microbiology in the post-genomic era]
* [http://nar.oxfordjournals.org/cgi/reprint/gkm846v1 IMG NAR 2007]
* [http://www.nature.com/nmeth/journal/v4/n1/pdf/nmeth976.pdf Accurate phylogenetic classification of variable-length DNA fragments, Nature Methods Jan 2007]
= HMP =
NIH Roadmap:
# Sequence the genomes of 200 microbes that have been isolated from the human body.
# Recruit a set of healthy donors and obtain samples from a set of body regions.
# Perform initial 16S rDNA gene metagenomic sequence analyses to estimate the complexity of the microbiota at these sites.
Centers: Baylor, Broad, JCVI, WUSTL
* ~10 times more bacterial cells than human cells in the body
* small-subunit (16S) ribosomal RNA gene-sequence-based surveys; the 16S gene:
** is found in all microorganisms
** has enough sequence conservation for accurate alignment
** has enough variation for phylogenetic analyses
* sites surveyed: skin, mouth, oesophagus, stomach, colon and vagina
* the largest reported data sets are for the gut
* most of the 10–100 trillion microorganisms in the human gastrointestinal tract live in the colon
* more than 90% of all phylogenetic types (phylotypes) of colonic bacteria belong to just 2 of the 70 known divisions (phyla) in the domain Bacteria: the Firmicutes and the Bacteroidetes
* Firmicutes (Gram-positive bacteria): 639 genome sequences
** Bacilli 472
** Clostridia 106
** Erysipelotrichi 1
** Mollicutes 60
** Thermolithobacteria
** unclassified Firmicutes sensu stricto
** environmental samples
* Bacteroidetes: 53 genome sequences
** Bacteroidetes (class) 22
** Flavobacteria 21
** Sphingobacteria 8
** unclassified Bacteroidetes 1
** environmental samples 1
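The class-level counts agree with the phylum totals: 472 + 106 + 1 + 60 = 639 for Firmicutes and 22 + 21 + 8 + 1 + 1 = 53 for Bacteroidetes. Classes listed without a count are left out. The small check below uses numbers copied from the lists, and the names are illustrative.

```python
# Genome-sequence counts per class, copied from the lists above.
# Firmicutes entries listed without a count are omitted.
FIRMICUTES = {"Bacilli": 472, "Clostridia": 106, "Erysipelotrichi": 1, "Mollicutes": 60}
BACTEROIDETES = {
    "Bacteroidetes (class)": 22,
    "Flavobacteria": 21,
    "Sphingobacteria": 8,
    "unclassified Bacteroidetes": 1,
    "environmental samples": 1,
}


def check_total(counts: dict[str, int], expected: int) -> bool:
    """True if the per-class counts sum to the stated phylum total."""
    return sum(counts.values()) == expected


print(check_total(FIRMICUTES, 639))    # True
print(check_total(BACTEROIDETES, 53))  # True
```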
* Actinobacteria
* In the colon, the differences between individuals are greater than the differences between sampling sites within one individual.
* Communities are usually stable over time.
= SIMHC =
== Data ==
=== Online ===
* 113 reference genomes: 89 complete, 24 incomplete
                  #elem     min      max     mean   median      n50          sum
 NC_chromosomes     103  943016  8264687  3569251  3481691  4326849  367,632,882
 NC_plasmids         70    3361   821788   147207    96488   300758   10,304,479
 NC_*               173    3361  8264687  2184609  1966858  4317977  377,937,361  # from 89 genomes and 70 plasmids; some genomes contain multiple chromosomes
 NZ_*              3505     185  1802798    26389     9051    73891   92,494,763  # from 24 genomes
 N*_total          3678     185  8264687   127904     9940  3561584  470,432,124
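The reference-sequence table reports the same summary columns as the read and contig tables further down. The Python sketch below shows how one such row can be computed from a list of sequence lengths. It makes three assumptions about conventions the page does not state for the original scripts:
* the common N50 definition, the largest length L such that sequences of length >= L hold at least half of all bases;
* a truncated integer mean;
* the lower median.

`n50`, `length_stats` and `by_prefix` are hypothetical names. `by_prefix` groups sequences by accession prefix, as the NC_* and NZ_* rows do.

```python
import statistics


def n50(lengths: list[int]) -> int:
    """Largest length L such that sequences of length >= L hold >= half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty input")


def length_stats(lengths: list[int]) -> dict[str, int]:
    """One table row: #elem, min, max, mean, median, n50, sum."""
    if not lengths:
        raise ValueError("empty input")
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) // len(lengths),      # assumption: truncated mean
        "median": statistics.median_low(lengths),  # assumption: lower median
        "n50": n50(lengths),
        "sum": sum(lengths),
    }


def by_prefix(seqs: dict[str, int]) -> dict[str, list[int]]:
    """Group sequence lengths by the 3-character accession prefix (e.g. 'NC_', 'NZ_')."""
    groups: dict[str, list[int]] = {}
    for accession, length in seqs.items():
        groups.setdefault(accession[:3], []).append(length)
    return groups


# Toy lengths, not taken from the data set.
print(length_stats([100, 200, 300, 400, 1000]))
# {'elem': 5, 'min': 100, 'max': 1000, 'mean': 400, 'median': 300, 'n50': 1000, 'sum': 2000}
```

Each group returned by `by_prefix` can be passed to `length_stats` to get one row per prefix.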
* 118084 Sanger reads from 3 insert libraries (small, medium, large)
  #reads  min   max  mean  median  n50          sum
  116771   43  3754   950     968  982  110,875,383
* More than 50% of reads (72634 of 116771, about 62%) are from Proteobacteria; Gammaproteobacteria
Phyla:
 Proteobacteria       72634
 Firmicutes           15021
 Actinobacteria       10744
 Cyanobacteria         6877
 Chlorobi              4982
 Euryarchaeota         3266
 Chloroflexi           1277
 Bacteroidetes         1161
 Deinococcus-Thermus    809
 Total               116771
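The phylum counts add up exactly to the 116771 total, so the Proteobacteria share can be stated precisely. The short sketch below uses the counts copied from the list (`PHYLA` is an illustrative name) and prints each phylum's share of all reads. Proteobacteria come to about 62%.

```python
# Read counts per phylum, copied from the list above.
PHYLA = {
    "Proteobacteria": 72634,
    "Firmicutes": 15021,
    "Actinobacteria": 10744,
    "Cyanobacteria": 6877,
    "Chlorobi": 4982,
    "Euryarchaeota": 3266,
    "Chloroflexi": 1277,
    "Bacteroidetes": 1161,
    "Deinococcus-Thermus": 809,
}

total = sum(PHYLA.values())
assert total == 116771  # matches the "Total" row

for phylum, reads in PHYLA.items():
    # Name, read count, and percentage of all reads.
    print(f"{phylum:20s} {reads:6d} {100 * reads / total:5.1f}%")
```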
* Organisms
 Burkholderia      12431  # Which one? B. cenocepacia AU 1054, B. AMMD, B. sp. 383, B. xenovorans LB400, B. cenocepacia HI2424, B. vietnamiensis G4
 Shewanella         9613
 Rhodopseudomonas   5279
 ...
 Ferroplasma         471
 Pediococcus         456
 Oenococcus          422  # Firmicutes, Bacilli, Lactobacillales
* Read coverage per organism ranges from 0.09X to 0.53X:
 0.53X: Moorella thermoacetica ATCC 39073 (1426 reads; NC_007644, 2,628,784 bp, 55.79% GC) (Firmicutes)
 0.09X: Xylella fastidiosa Ann-1
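The per-organism coverage figures can be sanity-checked with the usual estimate: fold coverage = number of reads × mean read length / genome length. Per-organism read lengths are not listed, so the sketch below assumes that formula and plugs in the overall mean read length of 950 bp from the read table. It gives about 0.52X for Moorella, close to the reported 0.53X. `coverage` is a hypothetical helper.

```python
def coverage(n_reads: int, mean_read_len: float, genome_len: int) -> float:
    """Approximate fold coverage: total read bases divided by genome length."""
    return n_reads * mean_read_len / genome_len


# Moorella thermoacetica ATCC 39073: 1426 reads; NC_007644 is 2,628,784 bp.
# 950 bp is the overall mean read length, used here as a stand-in (assumption).
cov = coverage(1426, 950, 2_628_784)
print(f"{cov:.2f}X")  # 0.52X
```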
=== Local ===
* 113 reference genomes: 89 complete, 24 incomplete
* 116771 Sanger reads from 3 insert libraries (small, medium, large)
* 118084 - 116771 = 1313 Moorella and Xylella reads were discarded!
* Read coverage per organism ...
 0.23X: Moorella thermoacetica ATCC 39073 (740 reads; NC_007644, 2,628,784 bp, 55.79% GC) (Firmicutes)
== Assemblies ==
Contig stats
                     #elem   min    max  mean  median   n50         sum   singl
 phrap-ctg           23398    73   8603  1289    1194  1341  30,163,430   66524
 arachne-ctg           578   240   6300  1878    1822  1985   1,085,508  115300
 jazz-scaff            860  1000  39837  3236    1105  7278   2,783,247  109080
 CA-scaff             4327  1000  48861  1682    1342  1456   7,278,044   76270  # OBT trimming of the reads
 CA-scf-unambiguous   4327  1000  26070  1473    1342  1403   6,374,650
 CA-ctg               4491  1000   5252  1419    1340  1388   6,374,863          # 4407 contained in references
 CA-deg              11611    66   5920   850     833   898   9,865,902
 AMOS-ctg            95372    72   7426   975     916   964  92,959,055    5661  # alignment-based trimming of the reads; casm-layout "-S -r" was used
 AMOS-ctg-plasmids    2078   159   7426  1007     911   972   2,092,115
 minimus2-ctg        15211    85   4699  1257    1205  1305  19,119,413   82535  # alignment-based trimming of the reads
CA:
* Longest scaffold: scf7180000043951, 48861 bp (26070 bp unambiguous), comes from NC_007968 (41221 bp, 38.26% GC), Psychrobacter cryohalolentis K5 plasmid 1
* Longest contig: ctg7180000029682, 5252 bp, comes from NC_008499 (35595 bp, 38.51% GC), Lactobacillus brevis ATCC 367 plasmid 2
* Longest degenerate contig: ctg7180000030915, 5920 bp, comes from NC_008608 (30722 bp, 56.32% GC), Pelobacter propionicus DSM 2379 plasmid pPRO2
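A scaffold's total length includes the gaps between its contigs, whereas its unambiguous length counts only called bases: 48861 bp versus 26070 bp for the longest CA scaffold. The minimal sketch below assumes gaps are written as runs of N in the scaffold sequence. That encoding is an assumption, not something this page states. `scaffold_lengths` is a hypothetical name.

```python
def scaffold_lengths(seq: str) -> tuple[int, int]:
    """Return (total length, unambiguous length) of a scaffold sequence.

    Assumes gap positions are encoded as 'N' or 'n' characters.
    """
    total = len(seq)
    gaps = seq.upper().count("N")
    return total, total - gaps


# Toy scaffold: two contigs joined by a 5 bp gap.
print(scaffold_lengths("ACGTACGT" + "NNNNN" + "GGCCA"))  # (18, 13)
```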
minimus2 on AMOScmp and CA (contigs + degenerate contigs): most of the CA assemblies were contained in AMOScmp contigs; the resulting stats are about the same as for AMOScmp.
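Containment, as used in the CA-ctg row comment and the minimus2 note above, can be decided from pairwise alignments by checking how much of each query contig is covered. The sketch below is a hypothetical illustration. The alignment record layout `(query_id, q_start, q_end, q_len)`, the 95% threshold and the function names are assumptions, not minimus2's actual criterion or output format.

```python
from collections import defaultdict

# Hypothetical alignment record: (query_id, q_start, q_end, q_len), 0-based half-open.
Alignment = tuple[str, int, int, int]


def covered_bases(intervals: list[tuple[int, int]]) -> int:
    """Total length covered by the union of half-open intervals."""
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def contained(alignments: list[Alignment], min_frac: float = 0.95) -> set[str]:
    """Query contigs whose aligned fraction is at least min_frac (assumed threshold)."""
    spans: dict[str, list[tuple[int, int]]] = defaultdict(list)
    lengths: dict[str, int] = {}
    for qid, start, end, qlen in alignments:
        spans[qid].append((start, end))
        lengths[qid] = qlen
    return {q for q, iv in spans.items() if covered_bases(iv) >= min_frac * lengths[q]}


alns = [
    ("ctgA", 0, 600, 1000),
    ("ctgA", 550, 1000, 1000),  # overlaps the first hit; together they span ctgA
    ("ctgB", 0, 500, 1000),     # only half of ctgB aligns
]
print(contained(alns))  # {'ctgA'}
```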
Latest revision as of 19:04, 9 June 2008