Metagenoms: Difference between revisions

From Cbcb

Revision as of 15:46, 6 June 2008

Web sites

Articles

HMP

NIH Roadmap

  1. Sequence the genomes of 200 microbes that have been isolated from the human body;
  2. Recruit a set of healthy donors and obtain samples from a set of body regions;
  3. Perform initial 16S rDNA gene metagenomic sequence analyses to estimate the complexity of the microbiota at these sites.

Centers: Baylor, Broad, JCVI, WUSTL

  • ~10 times more bacterial cells than human cells in the body
  • small-subunit (16S) ribosomal RNA gene-sequence-based surveys; the 16S gene:
     * is found in all microorganisms
     * has enough sequence conservation for accurate alignment
     * has enough variation for phylogenetic analyses
  • skin, mouth, oesophagus, stomach, colon and vagina
  • largest reported data sets are for the gut
  • most of the 10–100 trillion microorganisms in the human gastrointestinal tract live in the colon.
  • more than 90% of all phylogenetic types (phylotypes) of colonic bacteria belong to just 2 of the 70 known divisions (phyla) in the domain Bacteria: the Firmicutes and the Bacteroidetes.

Firmicutes (Gram-positive bacteria) : 639 Genome Sequences

   * Bacilli    472
   * Clostridia    106
   * Erysipelotrichi    1
   * Mollicutes    60
   * Thermolithobacteria   
   * unclassified Firmicutes sensu stricto   
   * environmental samples    

Bacteroidetes : 53 Genome Sequences

   * Bacteroidetes (class)    22
   * Flavobacteria    21
   * Sphingobacteria    8
   * unclassified Bacteroidetes    1
   * environmental samples    1

Actinobacteria

  • In the colon, the differences between individuals are greater than the differences between sampling sites within one individual
  • Communities are usually stable over time

SIMHC

Data (online)

  • 113 reference genomes : 89 complete, 24 incomplete
  • 118084 Sanger reads from 3 insert libs (small,med,large)
 #reads  min     max     mean    median  n50     sum
 116771  43      3754    950     968     982     110875383
  • More than 50% of reads are from Proteobacteria; Gammaproteobacteria
 Proteobacteria  73261
 Firmicutes      15707
 Actinobacteria  10744
 ...
  • read coverage of each organism: 0.09X to 0.53X
 0.53X: Moorella thermoacetica ATCC 39073 (1426 reads; NC_007644 2,628,784bp 55.79%GC) (Firmicutes)
 0.09X: Xylella fastidiosa Ann-1
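
The columns of the read table above (#reads, min, max, mean, median, n50, sum) can be recomputed from a list of sequence lengths. The sketch below is a minimal illustration, not the script actually used for this page; `n50` and `length_stats` are hypothetical helper names, and N50 is taken in its usual sense (the length L such that sequences of length >= L hold at least half of all bases). As a sanity check on the table itself, sum / #reads = 110875383 / 116771 is about 949.5, which agrees with the reported mean of 950.

```python
from statistics import mean, median


def n50(lengths):
    """Length L such that sequences of length >= L contain at least
    half of the total bases (usual N50 definition; assumed here)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one length")


def length_stats(lengths):
    """Summary columns in the same order as the tables on this page."""
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(mean(lengths)),
        "median": round(median(lengths)),
        "n50": n50(lengths),
        "sum": sum(lengths),
    }


# Toy input only; real input would be read or contig lengths from a FASTA file.
toy_lengths = [2, 3, 4, 5, 6]
stats = length_stats(toy_lengths)
print(stats)
```

The same function applies unchanged to contig or scaffold lengths, which is how the "Contig stats" columns further down are laid out.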

Data (local)

  • 113 reference genomes : 89 complete, 24 incomplete
  • 116771 Sanger reads from 3 insert libs (small,med,large)
  • 118084 - 116771 = 1313 Moorella and Xylella reads were discarded!
  • read coverage of each organism ...
 0.23X: Moorella thermoacetica ATCC 39073 (740 reads; NC_007644 2,628,784bp 55.79%GC) (Firmicutes)
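
A rough check of the coverage figures, assuming coverage = (number of reads × mean read length) / genome length. The per-organism read lengths are not given on this page, so the global mean of 950 bp is used as a stand-in; that is an assumption, and `coverage` is a hypothetical helper name. Under it the estimates land near, but not exactly on, the reported 0.53X (online, 1426 reads) and 0.23X (local, 740 reads), which would be expected if Moorella reads differ in length from the global mean.

```python
def coverage(num_reads, mean_read_len, genome_len):
    """Expected fold coverage: total read bases divided by genome size."""
    return num_reads * mean_read_len / genome_len


GENOME_LEN = 2_628_784   # Moorella thermoacetica ATCC 39073, NC_007644
MEAN_READ_LEN = 950      # assumption: global mean read length from the read table

online_cov = coverage(1426, MEAN_READ_LEN, GENOME_LEN)  # online data set
local_cov = coverage(740, MEAN_READ_LEN, GENOME_LEN)    # local data set

# Reads missing from the local copy relative to the online one.
discarded = 118084 - 116771

print(f"online ~{online_cov:.2f}X, local ~{local_cov:.2f}X, discarded {discarded}")
```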

Assemblies

Contig stats

              #elem   min     max     mean    median  n50     sum             singl
 phrap-ctg    23398   73      8603    1289    1194    1341    30,163,430      66524
 arachne-ctg  578     240     6300    1878    1822    1985    1,085,508       115300
 jazz-scaff   860     1000    39837   3236    1105    7278    2,783,247       109080
 CA-scaff     4327    1000    48861   1682    1342    1456    7,278,044       76270    # OBT trimming of the reads
 CA-ctg       4491    1000    5252    1419    1340    1388    6,374,863
 CA-deg       11611   66      5920    850     833     898     9,865,902
 AMOS-ctg     95372   72      7426    975     916     964     92,959,055      5661      # alignment based trimming of the reads; casm-layout "-S -r" was used
 minimus2-ctg 15211   85      4699    1257    1205    1305    19,119,413      82535     # alignment based trimming of the reads
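
One way to compare the assemblers in the table above is the fraction of reads left unassembled. The sketch below assumes that the "singl" column counts singleton reads and that the denominator is the 116771 reads of the local data set; neither is stated explicitly on this page. Rows without a "singl" value (CA-ctg, CA-deg) are left out.

```python
TOTAL_READS = 116771  # assumption: local read count is the input to every assembly

# Values copied from the "singl" column of the contig stats table.
SINGLETONS = {
    "phrap-ctg": 66524,
    "arachne-ctg": 115300,
    "jazz-scaff": 109080,
    "CA-scaff": 76270,
    "AMOS-ctg": 5661,
    "minimus2-ctg": 82535,
}


def singleton_fraction(singletons, total_reads):
    """Share of input reads that ended up in no contig or scaffold."""
    return singletons / total_reads


fractions = {
    name: singleton_fraction(count, TOTAL_READS)
    for name, count in SINGLETONS.items()
}

# Print from fewest to most singletons.
for name, frac in sorted(fractions.items(), key=lambda item: item[1]):
    print(f"{name:<13}{frac:6.1%}")
```

A low singleton fraction alone does not indicate a better assembly: at 0.09X to 0.53X coverage, most reads are not expected to overlap at all, so the contig length columns need to be read alongside it.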