Brugia malayi

From Cbcb

Revision as of 14:34, 12 February 2010

Articles

Genome Info

  • 6 chromosomes: 1-5, X/Y; diploid genome ~110 Mbp
  • 30% GC
  • 32% coding, 15% repeats

Other sequences

 contaminants       min    q1     q2     q3     max        mean       n50        sum            
 2929               200    527    675    820    8994       740.04     762        2167588

Genome Project

Brugia malayi has a diploid genome of approximately 110 Mb, organized in 6 pairs of chromosomes (five pairs of autosomes and one pair of sex chromosomes). In addition to the nuclear genome, B. malayi carries a mitochondrial genome of about 14 kb and the genome of its bacterial endosymbiont Wolbachia sp. (1-2 Mb).

The B. malayi genome project was completed by The Institute for Genomic Research (TIGR). Whole Genome Shotgun (WGS) sequencing was used to obtain more than eight-fold coverage of the genome. The genome was assembled into approximately 8,200 scaffolds and deposited in GenBank. The WGS project accession is AAQA00000000, comprising sequences AAQA01000001-AAQA01029808. File location:

 /fs/szasmg3/dpuiu/Brugia_malayi/Data/Bm.fasta 
 ctgs               min    q1     q2     q3     max        mean       n50        sum            
 26879              200    836    1005   1495   611244     3241.17    18986      87119350   
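The statistics tables on this page list the count, min, quartiles (q1-q3), max, mean, N50 and sum of sequence lengths. A minimal Python sketch of how one such row can be computed; the quartile interpolation used by the original (unnamed) script is not documented, so `statistics.quantiles` is an assumption and may give slightly different q1/q3 values than those shown.

```python
import statistics

def n50(lengths):
    """Smallest length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty input")

def summarize(lengths):
    """Return the columns used in the length tables (quartile method is an assumption)."""
    s = sorted(lengths)
    q1, q2, q3 = statistics.quantiles(s, n=4)  # needs at least 2 values
    return {
        "seqs": len(s),
        "min": s[0],
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": s[-1],
        "mean": round(sum(s) / len(s), 2),
        "n50": n50(s),
        "sum": sum(s),
    }

print(summarize([2, 3, 4, 5, 6]))
```

N50 is computed by walking contigs from longest to shortest until half of the total length is covered, which is the usual definition.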

Data

Original Traces

 INSERT_SIZE    INSERT_STDEV  TRACE_TYPE_CODE  COUNT
 1000           300           PRIMERWALK       325     
 1000           300           WGS              13500   
 1258           377           PRIMERWALK       3       
 1258           377           WGS              305906  
 1415           424           WGS              389561  
 3123           936           PRIMERWALK       173     
 3123           936           TRANSPOSON       1437    
 3123           936           WGS              293782  
 6000           1800          WGS              3193    
 7158           2147          WGS              219306  
 17168          5150          WGS              14934   
 22419          6725          WGS              18094   
 23000          6900          WGS              1       
 total                                         1,260,215
  • TIs (trace IDs): 1172642810, ... ,1174845185
  • SEQ_LIB_IDs: 1047111480027, ... , 1047174912885
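The total of 1,260,215 traces is the sum of the COUNT column. A small sketch that re-adds the table rows and breaks the total down by trace type; the row tuples are copied from the table above, and the helper name is illustrative.

```python
from collections import Counter

# (insert_size, insert_stdev, trace_type, count), copied from the table above
TRACES = [
    (1000, 300, "PRIMERWALK", 325),
    (1000, 300, "WGS", 13500),
    (1258, 377, "PRIMERWALK", 3),
    (1258, 377, "WGS", 305906),
    (1415, 424, "WGS", 389561),
    (3123, 936, "PRIMERWALK", 173),
    (3123, 936, "TRANSPOSON", 1437),
    (3123, 936, "WGS", 293782),
    (6000, 1800, "WGS", 3193),
    (7158, 2147, "WGS", 219306),
    (17168, 5150, "WGS", 14934),
    (22419, 6725, "WGS", 18094),
    (23000, 6900, "WGS", 1),
]

def counts_by_type(rows):
    """Sum trace counts per TRACE_TYPE_CODE."""
    totals = Counter()
    for _size, _stdev, trace_type, count in rows:
        totals[trace_type] += count
    return totals

by_type = counts_by_type(TRACES)
print(dict(by_type))
print("total:", sum(by_type.values()))
```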

FRG file:

  • FRG.src : same as TI's above
  • FRG.acc: 2 ..
  • DST.acc: 1260217, ... , 1260234
  • Location
 /fs/szasmg3/dpuiu/Brugia_malayi/Data/nucmer_seq/Bm-all.frg 
 DST     15
 FRG     1178192
 LKG     530930
 
 seqs       min    q1     q2     q3     max        mean       n50        sum            
 1178192    65     645    771    850    1214       724        800        853847771  => 8X
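The "=> 8X" figure is consistent with dividing the summed read length by the ~110 Mbp genome size given under Genome Info. A minimal sketch; rounding to the nearest integer is an assumption about how the "8X" label was derived.

```python
# Genome size from "Genome Info" above (~110 Mbp); approximate.
GENOME_SIZE_BP = 110_000_000

def fold_coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Average sequencing depth: total read bases divided by genome size."""
    return total_bases / genome_size

# Summed read length from the Bm-all.frg statistics above.
cov = fold_coverage(853_847_771)
print(f"{cov:.2f}X (~{round(cov)}X)")
```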

Problems:

  • All library insert sizes are underestimated
  • The contaminant reads align at ~91-93% identity to the contaminant contigs, while the Mt/We reads align at 99% identity to the finished Mt/We sequences. What %identity threshold should be used to call a read a contaminant?
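One way to check the "insert sizes are underestimated" observation is to re-estimate each library's mean and standard deviation from mate pairs whose two reads align to the same contig in the expected orientation. A minimal sketch; the function name, its input (a list of observed mate distances in bp) and the median/MAD outlier cutoff are assumptions, not the method used for this project.

```python
import statistics

def estimate_insert(distances, max_mads=5.0):
    """Mean and sample stdev of mate-pair distances after discarding outliers.

    distances: observed insert sizes (bp) of pairs aligned to the same contig.
    Pairs more than `max_mads` median absolute deviations from the median are
    dropped (hypothetical outlier rule, meant to remove chimeric pairs).
    """
    med = statistics.median(distances)
    mad = statistics.median(abs(d - med) for d in distances)
    kept = [d for d in distances if abs(d - med) <= max_mads * mad]
    if len(kept) < 2:
        raise ValueError("too few pairs left to estimate a standard deviation")
    return statistics.mean(kept), statistics.stdev(kept)

mean, sd = estimate_insert([2900, 3000, 3100, 3050, 2950, 50000])
print(f"mean={mean} sd={sd:.1f}")
```

The median/MAD filter is used instead of a plain mean so that a few chimeric pairs with very large distances do not inflate the estimate.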

BACs

8,000 BAC clones at Children's Hospital Oakland Research Institute (CHORI). Note: no NCBI Trace Archive (TA) submission.

PITT FTP data

  • 3.21M 454 reads
  • 454 Titanium paired-end (NEW)
    • ~500 Mbp per run, in reads of ~350 bp each
    • read lengths: 80 bp to 500 bp
    • 30%-50% of the reads contain two mate tags
    • each mate tag is ~150 bp; the two tags are separated by the 42 bp "recombi" XLR linker sequence
    • library insert size: mean ~28 kb, std ~6 kb
    • duplication problem: 2-4 identical copies of each pair (cause unknown)
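With two ~150 bp mate tags separated by a 42 bp linker, each read has to be split at the linker before its tags can be used as mates. A minimal sketch using exact string matching; `split_at_linker`, the `min_tag` cutoff and the placeholder linker in the usage line are hypothetical, and the real 42 bp XLR linker sequence should be taken from the 454 documentation. A production tool would also tolerate sequencing errors in the linker.

```python
def split_at_linker(read, linker, min_tag=20):
    """Split a 454 paired-end read into its two mate tags.

    Returns (left_tag, right_tag) if the linker occurs exactly and both flanks
    are at least `min_tag` bases long; otherwise None (zero or one usable tag).
    """
    i = read.find(linker)
    if i < 0:
        return None
    left, right = read[:i], read[i + len(linker):]
    if len(left) < min_tag or len(right) < min_tag:
        return None
    return left, right

# Placeholder linker for illustration only; NOT the real 42 bp XLR linker.
EXAMPLE_LINKER = "GGGGCCCC"
print(split_at_linker("A" * 25 + EXAMPLE_LINKER + "T" * 30, EXAMPLE_LINKER))
```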

CBCB Location:

 /scratch1/brugia_malayi/Data/
 /fs/szattic-asmg4/brugia_malayi/Data/

FTP access:

 lftp -u bma 136.142.191.201
 pass: 6279
 user: bma
 # empty as of --Dpuiu 12:04, 8 January 2010 (EST)

Elodie's table: /scratch1/brugia_malayi/brugia-sequencing-summary.txt.csv

 #    elodie's date  protocol  platform  type            description                                run_name                                                                                                  Reads    Mates
 1    01/17/2008     WGS       Standard  Full run (2/2)  Mix of worms (calibration of the machine)  R_2008_01_31_18_01_35_FLX10070260_adminrig_ghedintestsample                                               ?        0
 2    07/01/2008     3Kb       Standard  Full            single worm (pUC contamination)            R_2008_08_06_13_52_29_FLX10070260_adminrig_080608_Ghedin-BrugiaLTPE1                                      492575   84341 
 3    09/11/2008     3kb       Standard  4/8 wells       single worm (pUC contamination)            R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1                          263421   49258 
 4    10/01/2008     3Kb       Standard  Full            Mix of worms (still pUC contamination)     R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest                             59711    5096  
 5    02/01/2009     WGS       Standard  1/4 wells       Mix of worms; regions 2 & 3 were myxoma    R_2009_02_27_16_11_34_FLX10070260_adminrig_022709_GHEDIN                                                  ?        0
 6    04/06/2009     WGS       Standard  1/4 wells       Mix of worms; with comp. bio run           R_2009_04_15_14_46_56_FLX10070260_adminrig_041509_GHEDIN_r1-WGS1_r2-LMW4_r3-pool2compbio_r4-pool3compbio  ?        0
 7    05/01/2009     20Kb      Titanium  7/8 wells       Mix of worms                               R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1                              631287   213524
 8    10/28/2009     20Kb      Titanium  Full            Mix of worms                               R_2009_10_22_15_30_12_FLX10070260_adminrig_102209_GHEDIN_Brugia20kb2                                      1095713  377547
 9    Pending        3 Kb      Titanium  Full            Mix of worms                               ?                                                                                                         ?        ?
 .    Total                                                                                                                                                                                                   2542707  729766
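A quick check of the Reads and Mates columns, assuming "Mates" counts reads that contain both mate tags (the table does not define it). Under that assumption the two 20 kb Titanium runs (7 and 8) come out at roughly a third of reads, within the 30%-50% range quoted under PITT FTP data. Runs with "?" are left out, as in the Total row; the helper name is illustrative.

```python
# (run #, reads, mates) for the runs with known counts in Elodie's table
RUNS = [
    (2, 492575, 84341),
    (3, 263421, 49258),
    (4, 59711, 5096),
    (7, 631287, 213524),
    (8, 1095713, 377547),
]

def mate_fraction(reads, mates):
    """Fraction of reads reported as mates (assumed: reads carrying both tags)."""
    return mates / reads

total_reads = sum(reads for _, reads, _ in RUNS)
total_mates = sum(mates for _, _, mates in RUNS)

for run, reads, mates in RUNS:
    print(f"run {run}: {mate_fraction(reads, mates):.1%} of reads are mates")
print(f"total: {total_reads} reads, {total_mates} mates")
```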

SFF files counts:

      run                                                                                                                                                                       count   linker
 1    R_2008_01_31_18_01_35_FLX10070260_adminrig_ghedintestsample/D_2008_01_31_18_01_35_FLX10070260_adminrig_FullAnalysis/sff/E4RA0X101.sff                                     272923
 1    R_2008_01_31_18_01_35_FLX10070260_adminrig_ghedintestsample/D_2008_01_31_18_01_35_FLX10070260_adminrig_FullAnalysis/sff/E4RA0X102.sff                                     261899

 2    R_2008_08_06_13_52_29_FLX10070260_adminrig_080608_Ghedin-BrugiaLTPE1/D_2009_02_12_22_12_04_j_SignalProcessing/sff/FEZH5RS01.sff                                           228204  flx
 2    R_2008_08_06_13_52_29_FLX10070260_adminrig_080608_Ghedin-BrugiaLTPE1/D_2009_02_12_22_12_04_j_SignalProcessing/sff/FEZH5RS02.sff                                           264371  flx

 3    R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1/FHAVB5T02.sff                                                                            86862   flx
 3    R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1/FHAVB5T03.sff                                                                            87488   flx
 3    R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1/FHAVB5T04.sff                                                                            89071   flx

 4    R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM01.sff                                                                               13695   flx
 4    R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM02.sff                                                                               14197   flx
 4    R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM03.sff                                                                               15515   flx
 4    R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM04.sff                                                                               16304   flx

 5    R_2009_02_27_16_11_34_FLX10070260_adminrig_022709_GHEDIN/FRLDXKV01.sff                                                                                                    18025

 6    R_2009_04_15_14_46_56_FLX10070260_adminrig_041509_GHEDIN_r1-WGS1_r2-LMW4_r3-pool2compbio_r4-pool3compbio/D_2009_04_16_14_19_21_morty_fullProcessing/FT9KOI001.sff         118490

 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY01.sff                           73807   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY02.sff                           91698   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY03.sff                           93878   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY04.sff                           90232   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY05.sff                           97065   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY06.sff                           94326   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY07.sff                           90281   tit

 8    R_2009_10_22_15_30_12_FLX10070260_adminrig_102209_GHEDIN_Brugia20kb2/F4H5CMB01.sff                                                                                        551263  tit
 8    R_2009_10_22_15_30_12_FLX10070260_adminrig_102209_GHEDIN_Brugia20kb2/F4H5CMB02.sff                                                                                        544450  tit

 total                                                                                                                                                                          3214044
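Per-file counts like those above can be read from the `number_of_reads` field of each SFF file's common header, without decoding any reads. A minimal sketch, assuming the standard big-endian SFF common-header layout; `sff_read_count` and `parse_read_count` are illustrative names. Summing `sff_read_count` over the listed files should reproduce the total.

```python
import struct

# SFF common header, big-endian, in field order:
# magic_number (uint32, ".sff"), version (4 bytes), index_offset (uint64),
# index_length (uint32), number_of_reads (uint32), header_length (uint16),
# key_length (uint16), number_of_flows_per_read (uint16), flowgram_format_code (uint8)
SFF_HEADER = struct.Struct(">I4sQIIHHHB")
SFF_MAGIC = 0x2E736666  # ".sff"

def parse_read_count(header):
    """Return number_of_reads from the raw bytes of an SFF common header."""
    if len(header) < SFF_HEADER.size:
        raise ValueError("truncated SFF header")
    magic, _version, _idx_off, _idx_len, n_reads, *_rest = SFF_HEADER.unpack(
        header[:SFF_HEADER.size]
    )
    if magic != SFF_MAGIC:
        raise ValueError("not an SFF file")
    return n_reads

def sff_read_count(path):
    """Read only the fixed-size common header of an SFF file."""
    with open(path, "rb") as fh:
        return parse_read_count(fh.read(SFF_HEADER.size))
```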

Frg files

Location:

 /fs/szattic-asmg4/brugia_malayi/Data/Frg/ 

 # clear-range (clr) lengths of the reads in the FRG files
           seqs         min    q1     q2     q3     max        mean       n50        sum            
 all       3,297,077    3      156    248    301    2043       244        275        806,091,347 => 8X
 mated       858,340    64     107    156    223    612        171        201        147,070,298      
 unmated   2,438,737    2      207    261    335    2042       268        286        655,723,972
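The clr statistics above are lengths of the clear ranges stored in the FRG records. A minimal sketch that extracts them, assuming each `{FRG` record is flat (no nested records), is closed by a line containing only `}`, and stores its clear range as `clr:begin,end`; the function name is hypothetical.

```python
def clear_range_lengths(lines):
    """Yield end - begin for every clr:begin,end field inside a {FRG ...} record."""
    in_frg = False
    for line in lines:
        line = line.strip()
        if line.startswith("{FRG"):
            in_frg = True
        elif line == "}":
            in_frg = False
        elif in_frg and line.startswith("clr:"):
            begin, end = (int(x) for x in line[4:].split(","))
            yield end - begin

sample = "{FRG\nact:A\nacc:1\nclr:10,310\n}\n{LKG\nact:A\n}\n{FRG\nact:A\nacc:2\nclr:0,156\n}\n"
print(list(clear_range_lengths(sample.splitlines())))
```

For a real file, pass the open file handle instead of `sample.splitlines()`; the resulting lengths can then be summarized with the same min/quartile/N50 statistics as the other tables.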

Contaminant search

          Sanger       454
 jird     31,501    197,420
 Mt        1,507      2,634
 We       49,014     23,249
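Per-source counts like these come from assigning each read to its best-matching contaminant reference. Given the identity gap noted under Problems (~91-93% for jird reads vs. 99% for Mt/We), a per-source threshold is one option. A minimal sketch; the input format `(read_id, source, pct_identity)`, the function name and the threshold values are all hypothetical.

```python
from collections import Counter

# Hypothetical per-source minimum %identity thresholds.
MIN_IDENTITY = {"jird": 90.0, "Mt": 98.0, "We": 98.0}

def tally_contaminants(hits, min_identity=MIN_IDENTITY):
    """Count reads assigned to each contaminant source.

    hits: iterable of (read_id, source, pct_identity) alignment records.
    Each read goes to the source of its highest-identity hit if that hit passes
    the source's threshold; unknown sources effectively require 100%.
    """
    best = {}
    for read_id, source, pct in hits:
        if read_id not in best or pct > best[read_id][1]:
            best[read_id] = (source, pct)
    counts = Counter()
    for source, pct in best.values():
        if pct >= min_identity.get(source, 100.0):
            counts[source] += 1
    return counts

hits = [("r1", "jird", 92.5), ("r2", "Mt", 99.4), ("r3", "We", 99.0),
        ("r3", "jird", 91.0), ("r4", "jird", 85.0)]
print(tally_contaminants(hits))
```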

Assemblies

TIGR

  • 9X coverage, 856K Sanger traces => 8,200 scaffolds and 29,808 contigs (average scaffold ~10 kb, average contig ~3 kb)
  • "scaffolds totaling ~71 Mb of data with a further ~17.5 Mb of contigs not integrated into any scaffold (orphan contigs)" (Science 2007)
  • NCBI accession AAQA00000000 (sequences AAQA01000001-AAQA01029808)
    • 26,879 good contigs
    • 2,929 jird contaminants (e.g. AAQA01001321: 99% identity hits to mouse)
 good ctg len
       #elem   min     max     mean    median  n50     sum
 all   26879   200     611244  3241    1005    19005   87119350
 10K+  1224    10036   611244  41018   23135   60727   50206329
 
 good ctg GC%
       #elem   min     max     mean    median  n50  
 all   26878   0.00    72.30   28.86   28.56   29.46
 10K+  1224    24.38   38.44   30.38   30.43   30.62
 contaminant ctg len
       #elem   min     max     mean    median  n50     sum
 all   2929    200     8994    740     675     763     2167588
 
 contaminant ctg GC%
       #elem   min     max     mean    median  n50  
 all   2929    18.09   75.96   44.1    43.59   44.80
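The GC% tables separate the jird contaminant contigs (mean ~44%) from the good contigs (mean ~29%). A minimal GC-content helper; whether the original numbers counted N and other ambiguity codes in the denominator is not stated, so excluding them here is an assumption.

```python
def gc_percent(seq):
    """GC content in percent over A/C/G/T only (N and other codes ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0

print(gc_percent("ATGCNN"))
```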

CBCB

  • test of wgs-assembler (Celera Assembler) 5.1 on the filtered Sanger reads
  • better assembly than the published TIGR one (contig N50 24,765 bp vs. 19,005 bp)
  • Location:
 /fs/szasmg3/dpuiu/Brugia_malayi/Assembly/Bm/2008_0826_CA
               #elem   min     max     mean    median  n50     sum
 ctg           12753   273     376744  6113    1632    24765   77964006
 ctg(10K+)     1553    10015   376744  33868   23216   40618   52597039
 deg           9661    65      72494   1241    949     1008    11988997
 ctg+deg       22414   65      376744  4013    1201    19411   89953003
 ctg+deg(10K+) 1651    10015   376744  33019   22612   40060   54514638

 scf           10317   935     3890532 8019    1538    41772   82730474
 scf(2K+)      3656    2001    3890532 20189   5733    50293   73813083

 reads         1178192(100%)
 singletons    134119 (11.43%)

PITT

  • Best assembly so far (scaffold N50 112,914 bp)
  • Date: 11/05/08
  • Location:
 /fs/szasmg3/dpuiu/Brugia_malayi/Assembly/PITT/brugia_assemblies_110508.fasta
  • Stats:
        elem       min    q1     q2     q3     max        mean       n50        sum
 scf    3170       2000   2917   4483   14471  6534162    22916      112914     72643770

Files

 /fs/szattic/asmg1/adelcher/Genomes/Brugia             : Art's files
 /fs/sztmpscratch/cole/tarchive_download/brugia_malay  : Cole's files
 /fs/szasmg3/dpuiu/Brugia_malayi/                      : Daniela's files

 /scratch1/brugia_malayi/Data/                         : ftp PITT data
 /fs/szattic-asmg4/brugia_malayi                       : ftp PITT data (as well)