Brugia malayi

From Cbcb

Revision as of 15:26, 7 October 2008

Articles

Genome Info

  • 6 chromosomes: 1-5, XY
  • ~90 Mb, 30% GC, 32% coding, 15% repeats

Other sequences

Genome Project

Brugia malayi has a diploid genome of approximately 110 Mb, organized into 6 pairs of chromosomes (five pairs of autosomes and one pair of sex chromosomes). In addition to the nuclear genome, B. malayi has a mitochondrial genome of about 14 kb, as well as the genome of its bacterial endosymbiont, Wolbachia sp. (1-2 Mb).

The B. malayi genome project was completed by The Institute for Genomic Research (TIGR). Whole Genome Shotgun (WGS) sequencing was used to obtain more than eight-fold coverage of the genome. The genome was assembled into approximately 8,200 scaffolds and deposited in GenBank. The WGS project accession is AAQA00000000; it comprises sequences AAQA01000001-AAQA01029808.
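The accession range can be checked against the contig counts on this page: AAQA01000001-AAQA01029808 spans 29,808 sequences, which equals the 26,879 good plus 2,929 contaminant contigs listed below. A minimal sketch, assuming the 4-letter prefix + 2-digit version + 6-digit serial layout that these accessions follow; the helper names are hypothetical.

```python
from __future__ import annotations

import re

# Assumed layout of a WGS contig accession: 4-letter project prefix,
# 2-digit assembly version, 6-digit contig serial (e.g. AAQA01000001).
WGS_ACCESSION = re.compile(r"^([A-Z]{4})(\d{2})(\d{6})$")


def parse_wgs_accession(acc: str) -> tuple[str, int, int]:
    """Split a WGS accession into (prefix, version, serial)."""
    match = WGS_ACCESSION.match(acc)
    if match is None:
        raise ValueError(f"not a 4+2+6 WGS accession: {acc!r}")
    prefix, version, serial = match.groups()
    return prefix, int(version), int(serial)


def count_in_range(first: str, last: str) -> int:
    """Number of accessions in an inclusive range such as
    AAQA01000001-AAQA01029808."""
    p1, v1, s1 = parse_wgs_accession(first)
    p2, v2, s2 = parse_wgs_accession(last)
    if (p1, v1) != (p2, v2) or s2 < s1:
        raise ValueError("endpoints are not an ordered range within one project version")
    return s2 - s1 + 1


if __name__ == "__main__":
    n = count_in_range("AAQA01000001", "AAQA01029808")
    print(n, n == 26_879 + 2_929)  # good + contaminant contigs; prints: 29808 True
```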

 * 26,879 good ctgs
 * 2,929 jird contaminant ctgs (e.g. AAQA01001321: 99% id hits to mouse)
 good ctg len (bp)
       #elem   min     max     mean    median  n50     sum
 all   26879   200     611244  3241    1005    19005   87119350
 10K+  1224    10036   611244  41018   23135   60727   50206329
 
 good ctg GC%
       #elem   min     max     mean    median  n50  
 all   26878   0.00    72.30   28.86   28.56   29.46
 10K+  1224    24.38   38.44   30.38   30.43   30.62
 
 contaminant ctg len (bp)
       #elem   min     max     mean    median  n50     sum
 all   2929    200     8994    740     675     763     2167588
 
 contaminant ctg GC%
       #elem   min     max     mean    median  n50  
 all   2929    18.09   75.96   44.1    43.59   44.80
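The table columns above (#elem, min, max, mean, median, n50, sum) can be recomputed from a list of contig lengths, and the "10K+" rows correspond to keeping only contigs of at least 10,000 bp. A sketch, assuming N50 is the largest length L such that contigs of length ≥ L cover at least half of the total; the page does not say which median convention or rounding produced its figures, and ignoring N and other ambiguity codes when computing GC% is an assumption.

```python
from __future__ import annotations

import statistics


def n50(lengths: list[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least
    half of the total length."""
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running total reaches the full sum")


def length_summary(lengths: list[int]) -> dict[str, float]:
    """One row of a 'ctg len' table: #elem, min, max, mean, median, n50, sum."""
    return {
        "#elem": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(statistics.mean(lengths)),
        "median": statistics.median(lengths),  # averages the middle pair if even
        "n50": n50(lengths),
        "sum": sum(lengths),
    }


def gc_percent(seq: str) -> float:
    """GC% over A/C/G/T only; N and other ambiguity codes are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


if __name__ == "__main__":
    lengths = [200, 1005, 3500, 12_000, 25_000]  # toy values, not Brugia data
    print(length_summary(lengths))                              # "all" row
    print(length_summary([n for n in lengths if n >= 10_000]))  # "10K+" row
    print(round(gc_percent("ATGCNNAT"), 2))
```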

Traces

 Libraries:
   * 2K: bulk of the sequence @TIGR
   * 15-20K @TIGR
   * 8,000 BAC clones @Children's Hospital Oakland Research Institute (!!! not submitted to the NCBI Trace Archive)

Trace summary:

  * all:          1,260,215
  * TRACE_TYPE_CODE
    * WGS:          1,258,277
    * TRANSPOSON:       1,437
    * PRIMER_WALK:        501
  * CENTER_NAME
    * TIGR:           856,624
    * JCVI:           403,591
  * NO BACS !!!; max INSERT_SIZE=23K
  * TIs: 1172642810, ... , 1174845185
  * SEQ_LIB_IDs: 1047111480027, ... , 1047174912885
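Both breakdowns in the trace summary account for every trace: 1,258,277 + 1,437 + 501 and 856,624 + 403,591 each equal 1,260,215. A small sketch for recomputing such tallies from trace metadata, assuming a hypothetical in-memory list of records keyed by Trace Archive field names (TRACE_TYPE_CODE, CENTER_NAME, INSERT_SIZE); how the metadata is fetched is not shown.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping

# Hypothetical record layout: one mapping per trace, keyed by Trace Archive
# field names such as TRACE_TYPE_CODE, CENTER_NAME and INSERT_SIZE.
Trace = Mapping[str, object]


def tally(traces: Iterable[Trace], field: str) -> Counter:
    """Count traces per value of one field (e.g. TRACE_TYPE_CODE)."""
    return Counter(t.get(field, "<missing>") for t in traces)


def max_insert_size(traces: Iterable[Trace]) -> int:
    """Largest INSERT_SIZE among traces that report one (ValueError if none do)."""
    return max(int(t["INSERT_SIZE"]) for t in traces if t.get("INSERT_SIZE"))


def breakdown_is_complete(total: int, breakdown: Mapping[str, int]) -> bool:
    """True when the per-category counts add up to the overall total."""
    return sum(breakdown.values()) == total


if __name__ == "__main__":
    # Counts copied from the trace summary on this page.
    total = 1_260_215
    by_type = {"WGS": 1_258_277, "TRANSPOSON": 1_437, "PRIMER_WALK": 501}
    by_center = {"TIGR": 856_624, "JCVI": 403_591}
    print(breakdown_is_complete(total, by_type))    # True
    print(breakdown_is_complete(total, by_center))  # True
```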

FRG file:

  • FRG.src: same as the TIs above
  • FRG.acc: 2 ..
  • DST.acc: 1260217, ... , 1260234
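To pull the `acc` and `src` values listed above out of an FRG file, a minimal reader is enough. This sketch assumes the Celera Assembler FRG message layout used by files with DST library messages: each message opens with `{` plus a type code and closes with `}`, single-line fields are `tag:value`, and multi-line fields such as `src:` or `seq:` run until a line containing only `.`. Nested records, if present, are skipped. It is not a full FRG validator, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterator


def _skip_nested(lines: Iterator[str]) -> None:
    """Consume lines up to the '}' that closes an already-opened nested record."""
    depth = 1
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("{"):
            depth += 1
        elif stripped == "}":
            depth -= 1
            if depth == 0:
                return


def parse_frg_messages(text: str) -> Iterator[tuple[str, dict[str, str]]]:
    """Yield (message_type, fields) for each top-level {XXX ... } message."""
    lines = iter(text.splitlines())
    for line in lines:
        if not line.startswith("{"):
            continue  # ignore anything between top-level messages
        msg_type = line[1:].strip()
        fields: dict[str, str] = {}
        for field_line in lines:
            stripped = field_line.strip()
            if stripped == "}":
                break  # end of this message
            if not stripped or stripped == ".":
                continue
            if stripped.startswith("{"):
                _skip_nested(lines)
                continue
            tag, _, value = stripped.partition(":")
            if value == "":
                # Multi-line field (src:, seq:, qlt:, ...) ends at a lone ".".
                block = []
                for cont in lines:
                    if cont.strip() == ".":
                        break
                    block.append(cont)
                value = "\n".join(block)
            fields[tag] = value
        yield msg_type, fields


def accessions_by_type(text: str) -> dict[str, list[str]]:
    """Collect the acc: value of every message, grouped by message type."""
    out: dict[str, list[str]] = {}
    for msg_type, fields in parse_frg_messages(text):
        if "acc" in fields:
            out.setdefault(msg_type, []).append(fields["acc"])
    return out
```

With a real file, `accessions_by_type(open(path).read())` would give the DST and FRG accession lists; the FRG `src` text is available as `fields["src"]`.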

Problems:

  • All library insert sizes are underestimated
  • The contaminant reads align at ~91-93% id to the contaminant ctgs, while the Mt/We reads align at 99% id to the finished Mt/We sequences. What %id threshold should be used to classify reads as contaminant?
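One way to check the underestimated insert sizes is to re-estimate each library's insert size from mates that land on the same contig and compare the result with the declared mean (the DST mea/std values in the FRG file). A minimal sketch, assuming properly oriented "innie" pairs (forward mate left of the reverse mate) and a hypothetical `Placement` record; the median/MAD trimming is a generic robust estimate, not the procedure of any particular assembler.

```python
from __future__ import annotations

import statistics
from typing import NamedTuple, Optional, Sequence


class Placement(NamedTuple):
    """Hypothetical record: where one read of a mate pair aligned."""
    contig: str
    start: int     # 0-based leftmost aligned position
    end: int       # exclusive rightmost aligned position
    forward: bool  # True if the read aligned on the forward strand


def insert_size(a: Placement, b: Placement) -> Optional[int]:
    """Outer distance of an innie pair on one contig, else None."""
    if a.contig != b.contig or a.forward == b.forward:
        return None
    fwd, rev = (a, b) if a.forward else (b, a)
    if fwd.start >= rev.end:
        return None  # mates point away from each other
    return rev.end - fwd.start


def robust_insert_estimate(sizes: Sequence[int], k: float = 3.0) -> tuple[float, float, int]:
    """Mean, sample stdev and count after dropping sizes more than k scaled
    MADs from the median (1.4826 * MAD approximates sigma for normal data)."""
    med = statistics.median(sizes)
    mad = 1.4826 * statistics.median(abs(s - med) for s in sizes)
    kept = [s for s in sizes if abs(s - med) <= k * mad] if mad > 0 else list(sizes)
    sd = statistics.stdev(kept) if len(kept) > 1 else 0.0
    return statistics.mean(kept), sd, len(kept)
```

Because only mates on the same contig are counted, a library whose inserts are long relative to contig length (e.g. the 15-20K library against ~3K average contigs) is sampled with a bias toward shorter inserts, so such estimates are best read as lower bounds.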

Assemblies

TIGR

  • 9X coverage, 856K Sanger traces => 8,200 scaffolds & 29,808 contigs (avg. scaffold ~10K, avg. contig ~3K)
  • "scaffolds totaling ~71 Mb of data with a further ~17.5 Mb of contigs not integrated into any scaffold (orphan contigs)" (Science 2007)

CBCB

 [Scaffolds]
 TotalScaffolds=10317
 TotalContigsInScaffolds=12753
 MeanContigsPerScaffold=1.24
 MinContigsPerScaffold=1
 MaxContigsPerScaffold=53
 [Contigs]
 TotalContigsInScaffolds=12753
 TotalBasesInScaffolds=77964006
 TotalVarRecords=87058
 MeanContigLength=6113
 MinContigLength=273
 MaxContigLength=376744
 N50ContigBases=24748
 [Reads]
 TotalReadsInput=1178192
 TotalUsableReads=1173016
 AvgClearRange=791
 ContigReads=663383(56.55%)
 BigContigReads=544689(46.43%)
 SmallContigReads=118694(10.12%)
 DegenContigReads=124230(10.59%)
 SurrogateReads=295861(25.22%)
 PlacedSurrogateReads=44577(3.80%)
 SingletonReads=134119(11.43%)
 ChaffReads=134119(11.43%)
 [Coverage]
 ContigsOnly=6.86
 Contigs_Surrogates=9.47
 Contigs_Degens_Surrogates=9.33
 AllReads=11.91
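The read percentages above appear to be relative to TotalUsableReads rather than TotalReadsInput (663,383 / 1,173,016 = 56.55%), BigContigReads + SmallContigReads equals ContigReads, and AllReads is close to TotalUsableReads × AvgClearRange / TotalBasesInScaffolds ≈ 11.90. The sketch below checks these relationships on an excerpt of the block, reading it with Python's configparser (interpolation disabled because values contain `%`). The coverage formula is an approximation inferred from the numbers, not the documented definition used by the statistics tool; the helper names are hypothetical.

```python
import configparser
import re

STATS = """\
[Contigs]
TotalBasesInScaffolds=77964006
[Reads]
TotalReadsInput=1178192
TotalUsableReads=1173016
AvgClearRange=791
ContigReads=663383(56.55%)
BigContigReads=544689(46.43%)
SmallContigReads=118694(10.12%)
DegenContigReads=124230(10.59%)
SurrogateReads=295861(25.22%)
PlacedSurrogateReads=44577(3.80%)
SingletonReads=134119(11.43%)
ChaffReads=134119(11.43%)
[Coverage]
AllReads=11.91
"""

COUNT_PCT = re.compile(r"^(\d+)\(([\d.]+)%\)$")


def load_stats(text: str) -> configparser.ConfigParser:
    # interpolation=None: values such as "663383(56.55%)" contain '%'.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep CamelCase keys as written
    # Strip the wiki's leading spaces, which configparser treats as continuations.
    parser.read_string("\n".join(line.strip() for line in text.splitlines()))
    return parser


def parse_count(value: str) -> int:
    """'663383(56.55%)' -> 663383; '1173016' -> 1173016."""
    m = COUNT_PCT.match(value)
    return int(m.group(1)) if m else int(value)


def check_read_percentages(stats: configparser.ConfigParser, tol: float = 0.005) -> list:
    """Keys whose reported % differs from count / TotalUsableReads by more than tol."""
    reads = stats["Reads"]
    usable = int(reads["TotalUsableReads"])
    bad = []
    for key, value in reads.items():
        m = COUNT_PCT.match(value)
        if m is None:
            continue  # plain numbers such as TotalReadsInput
        if abs(100.0 * int(m.group(1)) / usable - float(m.group(2))) > tol:
            bad.append(key)
    return bad


def approx_all_reads_coverage(stats: configparser.ConfigParser) -> float:
    """Approximate coverage: usable reads x average clear range / scaffold bases."""
    reads = stats["Reads"]
    bases = int(stats["Contigs"]["TotalBasesInScaffolds"])
    return int(reads["TotalUsableReads"]) * int(reads["AvgClearRange"]) / bases


if __name__ == "__main__":
    stats = load_stats(STATS)
    print(check_read_percentages(stats))  # []
    print(round(approx_all_reads_coverage(stats), 2))
```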

Files

 * /fs/szattic/asmg1/adelcher/Genomes/Brugia             : Art's files
 * /fs/sztmpscratch/cole/tarchive_download/brugia_malay  : Cole's files
 * /fs/szasmg3/dpuiu/Brugia_malayi/                      : Daniela's files