Brugia malayi: Difference between revisions

From Cbcb

Revision as of 19:03, 14 December 2009

Articles

Genome Info

  • 6 chromosomes: 1-5, XY
  • ~ 90M, 30% GC, 32% coding, 15% repeats

Other sequences

Genome Project

Brugia malayi has a diploid genome of approximately 110 Mb, organized in 6 pairs of chromosomes (five pairs of autosomes and one pair of sex chromosomes). In addition to the nuclear genome, B. malayi has a mitochondrial genome of about 14 kb and harbors the bacterial endosymbiont Wolbachia sp., whose genome is 1-2 Mb.

The B. malayi genome project has been completed by The Institute for Genomic Research. Whole Genome Shotgun sequencing was used to obtain more than eight-fold coverage of the genome. The complete genome was assembled into approximately 8200 scaffolds and deposited in GenBank. The accession for the WGS project is AAQA00000000 and consists of sequences AAQA01000001-AAQA01029808.

Traces

 Libraries:
   * 2K : bulk of the sequence @TIGR
   * 15-20 K @TIGR
   * 8,000 BAC clones @Children's Hospital Oakland Research Institute.  (!!! no NCBI TA submission)
   * 454 Titanium paired-end  (NEW)
     ** 500Mbp per run in reads of ~350bp each.
     ** reads: 80bp to 500bp
     ** 30% - 50% of the reads contain two mate tags per read
     ** Each mate tag is ~150bp and the tags are separated by the 42bp "recombi" XLR linker sequence
     ** lib mean,std =~ 28K,6K
     ** duplication problem: 2-4 identical copies of each pair???
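
The paired-end read layout described above (two mate tags separated by a linker, with some pairs present in several identical copies) could be handled roughly as in the sketch below. It is only an illustration: `LINKER` is a short made-up stand-in, not the real 42bp XLR linker sequence, the function names are hypothetical, and exact string matching is assumed where real reads would need error-tolerant matching.

```python
# Toy stand-in for the linker; NOT the real 42bp XLR linker sequence.
LINKER = "GATTACAGATTACA"


def split_mates(read, linker):
    """Split a read into (left_tag, right_tag) around the first linker hit.

    Returns None when the linker is absent or when either tag would be empty.
    Assumes exact matching; sequencing errors in the linker are not handled.
    """
    pos = read.find(linker)
    if pos == -1:
        return None
    left = read[:pos]
    right = read[pos + len(linker):]
    if not left or not right:
        return None
    return left, right


def dedup_pairs(pairs):
    """Keep only the first occurrence of each identical (left, right) pair."""
    seen = set()
    unique = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique
```

Reads without a linker hit come back as `None` and can be treated as unpaired fragments; only byte-identical pairs are collapsed, so near-duplicates would survive this filter.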

Trace summary:

  * all:          1,260,215 
  * TRACE_TYPE_CODE 
    * WGS:        1,258,277 
    * TRANSPOSON:     1,437 
    * PRIMER_WALK:      501
  * CENTER_NAME
    * TIGR:         856,624
    * JCVI:         403,591
  * NO BACS !!!; max INSERT_SIZE=23K
  • TI's: 1172642810, ... ,1174845185
  • SEQ_LIB_ID's : 1047111480027, ... , 1047174912885

FRG file:

  • FRG.src : same as TI's above
  • FRG.acc: 2 ..
  • DST.acc: 1260217, ... , 1260234

Problems:

  • All library insert sizes are underestimated
  • The contaminant reads align at ~91-93% id to the contaminant ctgs, while the Mt/We reads align at 99% id to the Mt/We finished seq. What %id threshold should be used for contaminants?
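
To make the threshold question concrete, here is a minimal sketch of filtering reads by percent identity. The read ids, the identity values, and both cutoffs are invented for illustration; nothing here picks the "right" threshold, it only shows how the choice changes which reads are flagged.

```python
def reads_at_or_above(hits, threshold):
    """Return the sorted ids of reads whose %id is >= threshold.

    hits: dict mapping read id -> percent identity of its best alignment
          against the reference set (contaminant ctgs, Mt, We, ...).
    """
    return sorted(rid for rid, pct_id in hits.items() if pct_id >= threshold)


# Hypothetical best-hit identities against the contaminant ctgs.
example_hits = {"r1": 92.1, "r2": 99.3, "r3": 85.0, "r4": 91.0}

strict = reads_at_or_above(example_hits, 99.0)   # cutoff that suits Mt/We
relaxed = reads_at_or_above(example_hits, 90.0)  # cutoff below the 91-93% range
```

With a cutoff near 99%, reads aligning at 91-93% would not be flagged, so any contaminant filter would need a cutoff below that range.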

Contigs

 * 26,879 good ctgs
 * 2,929 jird contaminants (Example: AAQA01001321 : mouse 99%id hits)
 good ctg len
       #elem   min     max     mean    median  n50     sum
 all   26879   200     611244  3241    1005    19005   87119350
 10K+  1224    10036   611244  41018   23135   60727   50206329
 
 good ctg GC%
       #elem   min     max     mean    median  n50  
 all   26878   0.00    72.30   28.86   28.56   29.46
 10K+  1224    24.38   38.44   30.38   30.43   30.62
 contaminant ctg len
       #elem   min     max     mean    median  n50     sum
 all   2929    200     8994    740     675     763     2167588
 
 contaminant ctg GC%
       #elem   min     max     mean    median  n50  
 all   2929    18.09   75.96   44.1    43.59   44.80
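
The columns in the length and GC% tables above could be produced along the lines of the sketch below. This is an assumption about how such numbers are usually computed, not the script that generated these tables: N50 is taken as the length at which the running sum of lengths, sorted longest first, reaches half the total; the "10K+" style filter is assumed to be an inclusive cutoff; and GC% is computed over all bases in the sequence.

```python
from statistics import median


def n50(lengths):
    """Length L such that elements of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def length_stats(lengths, min_len=0):
    """Summary row (#elem, min, max, mean, median, n50, sum) for lengths >= min_len."""
    kept = [x for x in lengths if x >= min_len]
    if not kept:
        return None
    return {
        "elem": len(kept),
        "min": min(kept),
        "max": max(kept),
        "mean": round(sum(kept) / len(kept)),
        "median": median(kept),
        "n50": n50(kept),
        "sum": sum(kept),
    }


def gc_percent(seq):
    """Percentage of G and C bases in seq (case-insensitive); 0.0 for empty input."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)
```

Calling `length_stats(lengths)` would give an "all" row and `length_stats(lengths, min_len=10000)` a "10K+" row, where `lengths` is a list of contig lengths.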

Assemblies

TIGR

  • 9X coverage, 856K Sanger traces => 8,200 scaff & 29,808 ctg (avg. scaff=~10K & avg ctg=~3K)
  • "scaffolds totaling ~71 Mb of data with a further ~17.5 Mb of contigs not integrated into any scaffold (orphan contigs)" (Science 2007)

CBCB

wgs 5.1 on filtered Sanger reads

               #elem   min     max     mean    median  n50     sum
 ctg           12753   273     376744  6113    1632    24765   77964006
 ctg(10K+)     1553    10015   376744  33868   23216   40618   52597039
 deg           9661    65      72494   1241    949     1008    11988997
 ctg+deg       22414   65      376744  4013    1201    19411   89953003
 ctg+deg(10K+) 1651    10015   376744  33019   22612   40060   54514638

 scf           10317   935     3890532 8019    1538    41772   82730474
 scf2K+        3656    2001    3890532 20189   5733    50293   73813083

 reads         1178192(100%)
 singletons    134119 (11.43%)

!!! better assembly than the published one

  • Location: /fs/szasmg3/dpuiu/Brugia_malayi/Assembly/Bm/2008_0826_CA

PITT

  • Best so far !?
  • Date: 110508
  • Stats:
        elem       min    q1     q2     q3     max        mean       n50        sum
 scf    3170       2000   2917   4483   14471  6534162    22916      112914     72643770

Files

 * /fs/szattic/asmg1/adelcher/Genomes/Brugia             : Art's files
 * /fs/sztmpscratch/cole/tarchive_download/brugia_malay  : Cole's files
 * /fs/szasmg3/dpuiu/Brugia_malayi/                      : Daniela's files